Orthographic Transcription
The most basic type of annotation, and the one that turns a
collection of speech recordings into a speech corpus, is some form of
orthographic transcription. This can range from a simple chain of words
per recording item (based, for instance, on the script that was used
during the recording) to an extensive labeling of several different
semantic layers. What is included in the transcript depends on the
type of speech corpus and its intended usage. For example, a corpus of
read speech items recorded over the telephone network for training
automatic speech recognition algorithms does not need any elaborate
labeling of discourse events. A corpus of dialogue speech between two
or more persons that is intended for scientific investigation will
require much more effort.
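The two ends of this range can be pictured as one data structure with optional layers. The following is only a minimal sketch for illustration: the class name `Transcript`, its fields, the item identifiers, and the layer names and labels are all hypothetical and are not prescribed by this document or by any particular transcription standard.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """Orthographic transcript of one recording item (hypothetical structure)."""
    item_id: str
    words: list[str]  # the basic chain of words
    # Optional additional annotation layers, keyed by a layer name (assumed names).
    layers: dict[str, list[str]] = field(default_factory=dict)

    def is_basic(self) -> bool:
        """True if the transcript consists of the word chain only."""
        return not self.layers


# Read speech for recognizer training: the word chain alone may be enough.
read_item = Transcript("item_0001", "please call the office".split())

# Dialogue speech for scientific study: further layers are attached.
# The layer names and labels below are invented for this example.
dialogue_turn = Transcript(
    "dialogue_0001_turn_03",
    ["yes", "I", "think", "so"],
    layers={"speaker": ["B"], "discourse_events": ["laughter"]},
)
```

Keeping the word chain mandatory and the layers optional mirrors the point made above: every corpus needs the basic transcript, while the amount of additional labeling depends on the corpus type and its intended usage.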
BITS Projekt-Account
2004-06-01