

Examples

The third part of this cookbook describes the specifications of three prototypical speech corpora: WebCommand, SpeechDat and SmartKom.

WebCommand is an example of a low-cost, small-scale corpus production. SpeechDat illustrates the specifications of an international, commercial speech corpus production in the field of telephony. Finally, SmartKom is a good example of a complex scientific collection of multimodal data that includes speech.

              WebCommand       SpeechDat               SmartKom
Content       Commands         Diverse                 Dialogue
Language      English/French   13 European languages   German
Speakers      40               5000                    400
Type          Read             Read                    Spontaneous
Signal        Online           Telephone               Online
Channels      2                1                       9
Environment   Office           Field                   Studio
Size          9 GB             30 GB                   25 GB
Annotation    SpeechDat        SpeechDat               SK Transliteration

The examples are real, not fictitious, and are by no means meant as role models for an ideal corpus specification. The descriptions were taken from the actual corpus contents; missing or poorly designed contents are commented on accordingly.

To make it easier to relate each example to the rest of this cookbook, and to simplify comparisons between the different corpus types, the main description of each corpus is structured as a table that roughly follows chapter [*] of this cookbook.



BITS Projekt-Account 2004-06-01