 Annotation and lexicon contents
If you have not already done so in the previous steps,
write a simple script that extracts the labels from the annotation files
and checks them for inconsistencies:
- Cross-check the labels found against the documentation of the
labeling. Are all labels found documented? Are there any documented
labels that do not occur in the annotations?
 
- Report any digits or numerals that are not written in their full 
orthographic form.
 
- Report any punctuation used in the annotations. There should be
none, except for punctuation marks that are separated from other items
by white space and carry a special meaning (for instance, prosodic).
 
- Report any words that are written with an initial capital because
they are at the beginning of a sentence.
 
- Cross-check all words extracted from the transcripts with the 
spelling in the orthographic part of the lexicon.
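
The checks listed above could be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions, not a prescribed tool: it assumes one transcript per line with white-space separated tokens, labels written in angle brackets (such as <noise>), and a lexicon available as a set of orthographic forms. The names check_transcripts, LABEL_RE, ALLOWED_STANDALONE and PUNCT_RE are hypothetical and should be adapted to your own annotation conventions.

```python
import re

# Hypothetical conventions, not taken from any particular corpus:
# labels look like <noise>; the symbols below are allowed only when they
# stand alone (separated by white space) and carry a special meaning.
LABEL_RE = re.compile(r"^<[^<>\s]+>$")
ALLOWED_STANDALONE = {".", ",", "?"}
# Any character that is not a word character, white space, apostrophe or hyphen.
PUNCT_RE = re.compile(r"[^\w\s'-]")


def check_transcripts(lines, documented_labels, lexicon):
    """Run the label, digit, punctuation, capital and lexicon checks.

    lines: iterable of transcript strings (one utterance per line)
    documented_labels: set of labels listed in the labeling documentation
    lexicon: set of orthographic forms from the lexicon
    Returns a dict of findings; list entries are (line_number, token).
    """
    report = {
        "undocumented_labels": set(),
        "unused_labels": set(),
        "digits": [],
        "punctuation": [],
        "initial_capitals": [],
        "not_in_lexicon": [],
    }
    found_labels = set()
    for line_no, line in enumerate(lines, start=1):
        first_word_seen = False
        for token in line.split():
            if LABEL_RE.match(token):
                found_labels.add(token)
                continue
            if token in ALLOWED_STANDALONE:
                continue
            is_first_word = not first_word_seen
            first_word_seen = True
            if any(ch.isdigit() for ch in token):
                report["digits"].append((line_no, token))
            if PUNCT_RE.search(token):
                report["punctuation"].append((line_no, token))
            # Heuristic: a capitalised first word whose lower-case form is in
            # the lexicon was probably capitalised only for its position.
            if (is_first_word and token[0].isupper()
                    and token not in lexicon and token.lower() in lexicon):
                report["initial_capitals"].append((line_no, token))
            if token not in lexicon:
                report["not_in_lexicon"].append((line_no, token))
    report["undocumented_labels"] = found_labels - set(documented_labels)
    report["unused_labels"] = set(documented_labels) - found_labels
    return report
```

A token can appear in more than one list (a word with attached punctuation is also missing from the lexicon), which is intentional here: each list answers one of the questions above independently. The initial-capital test is only a heuristic and will miss sentence starts that are not at the beginning of a line.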
 
You might also check the timing information
in the label files for overlapping segments or gaps between segments, if
your reference does not allow them.
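
A timing check of this kind might look like the sketch below. It assumes that the segments of one label file have already been parsed into (start, end, label) tuples with times in seconds; the function name check_timing and the tolerance parameter are hypothetical, and parsing of your particular label file format is left out.

```python
def check_timing(segments, tolerance=0.0):
    """Report overlapping segments and gaps between segments.

    segments: iterable of (start, end, label) tuples, times in seconds
    tolerance: differences up to this size (in seconds) are ignored
    Returns (overlaps, gaps), each a list of (earlier, later) segment pairs.
    """
    ordered = sorted(segments, key=lambda seg: (seg[0], seg[1]))
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    # Track the segment that reaches furthest in time so far, so that a long
    # segment covering several later ones is compared against each of them.
    furthest = ordered[0]
    for current in ordered[1:]:
        delta = current[0] - furthest[1]
        if delta < -tolerance:
            overlaps.append((furthest, current))
        elif delta > tolerance:
            gaps.append((furthest, current))
        if current[1] > furthest[1]:
            furthest = current
    return overlaps, gaps
```

Whether overlaps or gaps count as errors depends on your reference: for a tier that must tile the signal, both lists should be empty, while for sparse event labels only the overlaps may be relevant. A small non-zero tolerance can absorb rounding differences in the time stamps.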
 
 
 
  
Angela Baumann
2004-06-03