General Rules for Transcription
- Follow the `natural segmentation' of the corpus into individual
signal files, and create one transcription file, one line in a table,
or one entry in a database per signal file.
- Use a standard spelling and character coding.
- Use capital letters only where your spelling rules require them,
not at the beginning of sentences.
- If you use punctuation marks, always separate them from the
preceding word by a white space; in most cases it is even better to
omit punctuation completely.
- Use white space characters only to separate items in the
transcript. For instance, do not use a format in which a certain
number of blanks marks the beginning of a turn; this will lead to
severe problems in the parser.
- Do not allow any digits in the transcript; represent spoken
digits, cardinals, ordinals and other numbers by their written names,
e.g. `456' as `four hundred fifty six', `6th' as `sixth' or `72.5' as
`seventy two point five'.
- Use a format that is brief and readable. Unfortunately, formats
that are easy to parse, like XML, do not meet this requirement.
Therefore, you might consider using an intermediate format for the
transcription work and transforming it later into something like XML.
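
Several of the rules above can be checked automatically before a
transcript is passed to a parser. The following is a minimal sketch in
Python, not part of the original guidelines. The function name
check_transcript_line and the PUNCTUATION set are hypothetical, and the
sketch assumes that "white space only as a separator" means exactly one
blank between items, with no tabs and no leading or trailing blanks.

```python
import re

# Hypothetical punctuation set; adjust it to the spelling rules of your corpus.
PUNCTUATION = ".,;:?!"


def check_transcript_line(line):
    """Return a list of rule violations found in one transcript line."""
    problems = []
    # Rule: no digits; numbers must be spelled out as words.
    if re.search(r"\d", line):
        problems.append("contains digits")
    # Rule (as assumed here): exactly one blank between items,
    # no tabs, no leading or trailing blanks.
    if line != " ".join(line.split()):
        problems.append("irregular white space")
    # Rule: a punctuation mark is an item of its own, never attached to a word.
    for item in line.split():
        if len(item) > 1 and any(ch in PUNCTUATION for ch in item):
            problems.append("punctuation attached to word: " + item)
    return problems
```

For example, check_transcript_line("yes , i think so .") reports no
problems, while "yes, i think so" is flagged because the comma is
attached to the preceding word.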
BITS Projekt-Account
2004-06-01