
$\bigcirc$ Annotation, meta data and lexicon file format

Are all annotation files, meta data files and the lexicon parsable? If you are lucky, they contain XML with a corresponding DTD or XML Schema description in the documentation; if not, write a simple, crude parser to check them. Report any non-parsable formats, because files that cannot be parsed are essentially unusable.
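Where no DTD or schema exists, a crude check can be as small as counting fields. The sketch below assumes, purely for illustration, a lexicon with one entry per line and a fixed number of tab-separated fields. The function name check_fields and this file layout are hypothetical and not taken from any particular corpus. It is written for a POSIX shell with awk rather than for csh.

```shell
#!/bin/sh
# check_fields FILE N  (hypothetical helper, not part of any corpus toolkit)
# Crude parsability check: every line of FILE must consist of exactly
# N tab-separated fields. Offending lines are reported on stdout and the
# function returns a non-zero status if at least one was found.
check_fields() {
    awk -F '\t' -v n="$2" '
        NF != n {
            printf "%s:%d: expected %d fields, found %d\n", FILENAME, FNR, n, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

A call such as check_fields lexicon.txt 3 would then list every line of the (assumed) three-column lexicon that does not parse, and its exit status can be used to trigger a warning in the validation report.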

Do all annotation files, the meta data files and the lexicon have consistent line terminators? DOS requires a combination of CR (Hex 0D) followed by LF (Hex 0A), while UNIX requires only the LF (Hex 0A). Mixed usage of line terminators may be caused by working on mixed platforms, and it may cause problems when parsing the annotation files later.

A simple test for whether all lines in an annotation file are DOS-compatible would be (footnote 5.2):

  # map CR to '&', then look for lines that do not end in '&' (csh syntax)
  tr '\r' '&' < $file | grep -v '&$' > /dev/null
  if ( $status == 0 ) then 
    echo "WARNING: $file contains lines not DOS-compatible"
  endif
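To check for UNIX conformity, simply delete the grep option -v (footnote 5.3), so that the test succeeds on any line that ends in a CR, and adapt the warning text accordingly.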
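Both checks can also be combined into a single classification, which makes mixed files stand out directly. The following is only a sketch under assumptions: it is written for a POSIX shell rather than csh, the function name line_endings is hypothetical, and it relies on grep treating CR as an ordinary character, as grep does on UNIX systems. It counts all lines and the lines ending in CR, then compares the two counts.

```shell
#!/bin/sh
# line_endings FILE  (hypothetical helper)
# Prints "dos" if every line ends in CR LF, "unix" if no line does,
# "mixed" if both kinds occur, and "empty" if the file has no lines.
line_endings() {
    le_cr=$(printf '\r')                      # a literal CR character
    le_total=$(grep -c '' "$1")               # number of lines in the file
    le_crlf=$(grep -c "${le_cr}\$" "$1")      # lines whose last character is CR
    if [ "$le_total" -eq 0 ]; then
        echo empty
    elif [ "$le_crlf" -eq "$le_total" ]; then
        echo dos
    elif [ "$le_crlf" -eq 0 ]; then
        echo unix
    else
        echo mixed
    fi
}
```

Running line_endings over all annotation files, meta data files and the lexicon, and reporting every file that yields "mixed" (or a terminator different from the rest of the corpus), covers the consistency question asked above. Unlike the tr-based test, this variant is not misled by lines that genuinely end in the character '&'.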

Angela Baumann 2004-06-03