Both the pre-validation and the final validation were carried out by the producer itself, although we recommend commissioning an independent third institution for both. In this case the deviation may be justified by the relatively small size of the corpus and the very constrained budget of the client.
In the following, the corpus specification of WebCommand will be presented in the manner of a check list. The elements of this check list have already been discussed in this order in chapter . If elements are not applicable for WebCommand, they are marked with 'n.a.'.
Speaker Profiles | Speakers are native speakers of British English or French and at least 18 years old. Gender distribution is 50:50, all dialects allowed, education level not specified |
Number of Speakers | At least 40 speakers had to be recorded, 20 for British English and 20 for French. The numbers of male and female speakers were to be as equal as possible in each language. |
Contents: | The contents of the corpus were specified by the client in the form of a plain text command list. The text corpus was fixed; that is, all speakers recorded in one recording room spoke the same corpus of 135 command words. There are four text corpora in total: one for each of the two recording environments (see below) in each of the languages British English and French. |
- Vocabulary | English: 163 words; French: 188 words |
- Domain | Control commands and names |
- Task | No task specified |
- Phonologic Distribution | No distribution specified |
Speaking Style: | |
- Read Speech | + |
- Answering Speech | - |
- Command/Control Speech | - |
- Non Prompted Speech | - |
- Spontaneous Speech | - |
- Neutral/Emotional | - |
Recording Setup: | On-site Recording |
- Acoustical environment | Each speaker is to be recorded on-site in two different recording rooms P and S on different days. The acoustical background consisted only of the hum of the recording device which was a regular Macintosh Desktop PC approx. 50 cm from the head of the speaker. The PCs were rated to be rather silent. |
- Script | Speakers read prompts from the CRT display in their native language |
- Background noise | no artificial background noise specified |
- Microphones | The speaker wears an ear-free headset Beyerdynamic NEM 192; a second microphone, a Beyerdynamic MCE 10, is mounted on the upper left corner of a dummy laptop case that the speaker holds with both hands on his/her lap to simulate free speaking. |
Technical Specifications: | |
- Sampling Rate | 22050 Hz |
- Sample Type and Width | Sample Type: linear, not compressed. |
- Number of Channels | Two-channel recording: left channel: Beyerdynamic NEM 192; right channel: Beyerdynamic MCE 10. |
- Signal File Format | File format: WAV stereo (RIFF) |
- Annotation File Format | SAM annotation files according to SpeechDat specifications and a summarized annotation table for each recording block. |
- Meta Data File Format | Table SPEAKER.TBL gives a mapping of the 4-digit speaker id to sex, age and mother tongue. Table SESSION.TBL contains a mapping of the 4-digit session id to speaker id, place of recording, microphone types, channel mapping and environment. The file SUMMARY.TXT contains the SpeechDat-compliant summary of recordings: for each recording session, all individual recordings are listed in one line. If a recording is missing, a '-' is listed instead of the three-digit prompt number. |
- Lexicon Format | Two-column plain text file: orthography and pronunciation coded in SAM-PA |
Corpus Structure: | |
- Structure | Recordings are stored in separate subdirectories for each combination of recording environment and language. The corpus contains 47 complete sessions (130 recordings per session). Care is taken that each speaker is recorded in complete sessions in each of the two recording rooms. Additional incomplete recording sessions are collected in the directories NOT_USED_FR (4 sessions) and NOT_USED_EN (7 sessions) respectively. Signal data are stored on DVD; a separate CDROM contains documentation, annotation files and pronunciation dictionaries. |
- Terminology | Session names are coded as SES#***, where # codes the combination of environment and language and *** encodes the session number; e.g. SES6013 is the 13th recording session of a French speaker in room P. A mapping from speaker IDs to sessions, as well as the speaker profile, can be found in the file SESSION.TBL. A recording file name is encoded as Q1#***YYY.WAV, where YYY denotes the three-digit number of the text prompt (000-129); e.g. Q16013051.WAV contains the two microphone signals, as a WAV stereo file, of the 52nd prompt of the 13th recording session of French speakers in room P. The channel assignment for the microphones is stored in the file SESSION.TBL. |
- Distribution Media | The corpus consists of two DVD-5 discs with a total size of 7.5 GByte plus a CD-ROM with the label files and documentation. One DVD holds the data of the British English speakers; the second DVD holds the data of the French speakers. |
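
The naming scheme given under Terminology can be decoded mechanically. The following Python sketch is only an illustration and not part of the corpus distribution: the function and type names are hypothetical, and it assumes that the environment/language code is a single digit, that session and prompt numbers are always three digits (as in the examples SES6013 and Q16013051.WAV), and that names are written in upper case. Since only the code 6 (French, room P) is documented above, the code is returned unchanged rather than being mapped to a room and a language.

```python
import re
from typing import NamedTuple

# Assumed patterns, derived from the examples SES6013 and Q16013051.WAV.
SESSION_RE = re.compile(r"^SES(\d)(\d{3})$")
RECORDING_RE = re.compile(r"^Q1(\d)(\d{3})(\d{3})\.WAV$")

MAX_PROMPT = 129  # prompts are numbered 000-129


class SessionName(NamedTuple):
    env_lang_code: str   # digit coding environment and language
    session_number: int  # e.g. 13 for SES6013


class RecordingName(NamedTuple):
    env_lang_code: str
    session_number: int
    prompt_number: int   # zero-based, so 51 is the 52nd prompt


def parse_session_name(name: str) -> SessionName:
    """Split a session name such as 'SES6013' into its parts."""
    match = SESSION_RE.match(name)
    if match is None:
        raise ValueError(f"not a session name: {name!r}")
    return SessionName(match.group(1), int(match.group(2)))


def parse_recording_name(name: str) -> RecordingName:
    """Split a recording file name such as 'Q16013051.WAV' into its parts."""
    match = RECORDING_RE.match(name)
    if match is None:
        raise ValueError(f"not a recording file name: {name!r}")
    prompt = int(match.group(3))
    if prompt > MAX_PROMPT:
        raise ValueError(f"prompt number out of range: {prompt}")
    return RecordingName(match.group(1), int(match.group(2)), prompt)


def session_of(recording: RecordingName) -> str:
    """Rebuild the session name that a recording belongs to."""
    return f"SES{recording.env_lang_code}{recording.session_number:03d}"
```

For example, parse_recording_name("Q16013051.WAV") yields the code 6, session number 13 and prompt number 51, and session_of() maps this result back to SES6013, which can then be looked up in SESSION.TBL.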