

Vocabulary

Probably the simplest way to specify the spoken content is by vocabulary. The vocabulary usually follows more or less directly from the intended use of the corpus.

For instance, if the corpus will be used to train a speech recognizer on 11 German digits [4.1] and three command words, the content definition will most likely require that all 14 vocabulary items are equally distributed, with the same number of repetitions per speaker, e.g.

14 words x 500 speakers x 10 repetitions = 70000 tokens
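
The token count above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic under the equal-distribution assumption; the function names `token_count` and `tokens_per_item` are hypothetical and are not part of any corpus tool.

```python
def token_count(vocabulary_size, speakers, repetitions):
    """Total number of tokens when every speaker utters every
    vocabulary item the same number of times."""
    return vocabulary_size * speakers * repetitions


def tokens_per_item(speakers, repetitions):
    """Number of tokens collected for one single vocabulary item."""
    return speakers * repetitions


# Figures from the example: 11 digits + 3 command words = 14 items.
total = token_count(vocabulary_size=11 + 3, speakers=500, repetitions=10)
per_item = tokens_per_item(speakers=500, repetitions=10)

print(total)     # 70000
print(per_item)  # 5000
```

With an equal distribution, each of the 14 items is represented by 5000 tokens, which is the quantity that matters when judging whether a single word model has enough training material.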



BITS Projekt-Account 2004-06-01