BAVARIAN ARCHIVE FOR SPEECH SIGNALS
University of Munich, Institute of Phonetics
Schellingstr. 3/II, 80799 Munich, Germany
bas@bas.uni-muenchen.de

COPYRIGHT University of Munich 2002. All rights reserved.
This corpus and software may not be disseminated further - not even
partly - without written permission of the copyright holders.

Additional Copyright Holders
Siemens Company, Perlach, Munich, Germany - 2002.

----------------------------------------------------------------------
WEBCOMMAND 1.1 - on-site recordings for webpad voice control
----------------------------------------------------------------------

This is the documentation for the WEBCOMMAND database created in
Jun - Aug 2002 as a subcontract to Siemens Company.

WEBCOMMAND contains recording sessions of native speakers from France
and Great Britain. All speakers read a list of 130 prompts from a
screen. They were recorded with two microphones: a high quality
headset and a high quality microphone fixed to a 'webpad' held on
the lap.

------------------- Contents of this file ------------------------

DVD directory structure
Recording situation
Naming conventions
Signal file formats
Transcription and error markers
Annotation format
Known errors
History

----------------- DVD directory structure --------------------------

The corpus consists of two DVD-5 with a total size of 7.5 GByte plus
a CD-ROM with the label files and documentation ('DOCCDROM'). On one
DVD (WebCommand_EN, #1) the British speakers are stored; on the
second DVD (WebCommand_FR, #2) the French speakers.

Recordings are situated in the 'BLOCK' directories:

BLOCK40 : British, room P, 26 sessions
BLOCK50 : British, room S, 26 sessions
BLOCK60 : French,  room P, 21 sessions
BLOCK70 : French,  room S, 22 sessions

The corpus contains 47 complete sessions (130 recordings per session).
Care was taken that each speaker was recorded in complete sessions in
each of the two recording rooms. Additional incomplete recording
sessions (speakers did not record a second session, or corrupted
sessions) are collected in the directories NOT_USED_FR (4 sessions)
and NOT_USED_EN (7 sessions) respectively.

The CD-ROM 'DOCCDROM' contains additional documents about the corpus
recording and annotation as well as pronunciation dictionaries:

PRON_FR.LEX     : Pronunciation dictionary, SAM-PA, French
PRON_EN.LEX     : Pronunciation dictionary, SAM-PA, English
TRANSCRP.PDF    : description of rules and conventions of
                  SpeechDat transcription (German)
TRANSCRP_EN.PDF : description of rules and conventions of
                  SpeechDat transcription (English)
PICS/           : Pictures of the recording setup
BLOCK##/        : SAM annotation files to recording block ##
REPORT.TXT      : this file
SAMEXPORT.TXT   : condensed summary of all SAM label files in one
                  table
SUMMARY.TXT     : SpeechDat-conformant summary of recordings: for
                  each recording session all individual recordings
                  are listed in one line. If a recording is missing,
                  a '-' is listed instead of the three-digit prompt
                  number.
SPEAKER.TBL     : mapping of 4-digit speaker id to sex, age and
                  mother tongue
SESSION.TBL     : mapping of 4-digit session id to speaker id, place
                  of recording, date of recording, microphone types,
                  channel mapping, environment

----------------- Recording Situation --------------------------

Each speaker (complete sessions only!) was recorded in two different
recording rooms P and S on different days. Each session consists of
130 prompts as given in the prompt lists doc/PROMPTS*.

The speaker wears an ear-free headset Beyerdynamic NEM 192; the second
mic is a Beyerdynamic MCE 10 mounted on the upper left corner of a
dummy laptop case that the user holds with both hands on his/her lap.
The recording setup is documented with photos in the directory PICS.
During the recording the user does not have to use the keyboard or
the mouse.
The acoustical environment of both rooms is a quiet office
environment. There is only one computer (Mac desktop mounted in front
of the speaker); no other noise sources.

The signal of the microphones is amplified by a Beyerdynamic MV 100
amplifier (headset mic + 20 dB, webpad mic + 20 dB) and then connected
to the standard Mic input of the recording Mac.

Each session starts with a short instruction of the speaker, then the
microphones are mounted by the supervisor and a short training session
(not recorded) of 5 prompts is performed. Then the supervisor leaves
the room for the rest of the session. The prompting and recording run
automatically; for each prompt a fixed time slot of 5.7 sec is
recorded. The timing is controlled by a 'red light' control: a red
light indicates not to speak, the yellow light indicates to get ready,
and then together with the green light the prompt is displayed and the
speaker reads from the screen. After the fixed recording time the red
light comes on again and the cycle starts anew.

Recording specs:

Minimum speakers per language:  20
Minimum speakers per sex:       20
Recording sessions per speaker: 2
Prompts per session:            130 (000-129)
Length per prompt:              5.7 sec
Sampling rate:                  22050 Hz
Bits per sample:                16
File format:                    WAV stereo
Head set:                       Beyerdynamic NEM 192, left channel
Webpad mic:                     Beyerdynamic MCE 10, right channel
Amplifier:                      Beyerdynamic MV 100, set to +20dB,
                                LF Cut off

----------------------- Naming conventions ----------------------

Session names are coded as follows:

SES####

where #### denotes the session number

Session numbers starting with '4' : British speaker, room P
Session numbers starting with '5' : British speaker, room S
Session numbers starting with '6' : French speaker, room P
Session numbers starting with '7' : French speaker, room S

e.g. SES6013 is the 13th recording session of a French speaker in
room P.
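
The session naming scheme above can be decoded mechanically. The
following Python sketch is an illustration only and is not part of the
corpus software; the function name decode_session and the returned
keys are hypothetical. The block directory name is assumed to follow
the BLOCK40 ... BLOCK70 pattern listed under 'DVD directory structure'.

```python
import re

# Leading digit of the session number -> (speaker language, room),
# as listed in the naming conventions above.
SESSION_PREFIX = {
    "4": ("British", "P"),
    "5": ("British", "S"),
    "6": ("French", "P"),
    "7": ("French", "S"),
}


def decode_session(name):
    """Split a session name such as 'SES6013' into its parts."""
    match = re.fullmatch(r"SES(\d{4})", name)
    if match is None:
        raise ValueError(f"not a session name: {name!r}")
    number = match.group(1)
    if number[0] not in SESSION_PREFIX:
        raise ValueError(f"unknown leading digit in {name!r}")
    language, room = SESSION_PREFIX[number[0]]
    return {
        "session": number,                # four-digit session number
        "block": f"BLOCK{number[0]}0",    # assumed directory pattern
        "language": language,
        "room": room,
        "index": int(number[1:]),         # running number within block
    }
```

For SES6013 this yields language 'French', room 'P', block 'BLOCK60'
and index 13, matching the example given above.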
A mapping from speaker IDs to sessions, as well as the speaker
profile, can be found in the file TABLE/SESSION.TBL

Each recording file is named as follows:

Q1####%%%.WAV

where: #### denotes the session number
       %%%  denotes the prompt number (000-129)

e.g. Q16013051.WAV contains the two microphone signals in a WAV
stereo file of the 52nd prompt of the 13th recording session of
French speakers in room P.

The channel assignment for the microphones is stored in the file
TABLE/SESSION.TBL

------------------------- Signal file formats ----------------------

All recording files are stored in WAV standard format. See specs
above for details.

-------------------------------------------------------------------------
Transcription and error markers

All recordings were annotated according to SpeechDat conventions. See
the document doc/TRANSCRP.PDF for details about this.

The transcription files (SAM label format) are stored on a separate
CD-ROM in a file system hierarchy that mirrors that of the signal
files, i.e. BLOCKxx/SESxxxx. The same information is also stored in a
semicolon delimited text file SAMEXPORT.TXT.

The SAM label names are the following (this is also the field order
of SAMEXPORT.TXT):

LHD  SAM Header specification
DBN  database name
SES  session number
CMT  comment
SRC  name of signal source file
DIR  directory path of signal file
CCD  corpus code of signal file
BEG  begin recording
END  end recording (in samples)
REP  recording place
RED  recording date
RET  recording time
CMT  comment
SAM  sample rate
SNB  sample number of bytes
SBF  byte order
QNT  quantization
NCH  number of channels
CMT  comment
SCD  speaker code
SEX  speaker gender
AGE  speaker age
ACC  speaker accent
CMT  comment
MIP  microphone position
MIT  microphone type
ENV  environment
CMT  comment
LBD  label file body
LBR  prompt text
LBO  transcription of utterance
ELF  end of label file

e.g.
LHD: SAM 6.0
DBN: Siemens WebCommand Database
SES: 6005
CMT: *** Recording data ***
SRC: Q16005004.WAV
DIR: BLOCK60/SES6005
CCD: 004
BEG: 0
END: 126064
REP: University of Munich, Phonetics Institute
RED: 04.07.2002
RET: 13:54:42
CMT: *** Signal data ***
SAM: 22054
SNB: 2
SBF: lo_hi
QNT: PCM
NCH: 2
CMT: *** Speaker data ***
SCD: 1005
SEX: F
AGE: 23
ACC: FR
CMT: *** Environment data ***
MIP: HEADSET=RIGHT, WEBPAD=LEFT
MIT: HEADSET=BEYERDYNAMIC_NEM_192,WEBPAD=BEYERDYNAMIC_MCE_10
ENV: P-ROOM
CMT: *** Label file body ***
LBD:
LBR: 0,126064,,,,appeler Nicolas Moulin
LBO: 0,63032,126064,appeler Nicolas Moulin
ELF:

-------------------------------------------------------------------------
Known errors

Remark: The subdirectories NOT_USED_* contain sessions that are
incomplete, either because speakers were not recorded a second time,
or because signal files were corrupted.

-------------------------------------------------------------------------
History

01.06.02 : start of recording
20.07.02 : start of validation
01.08.02 : end of recording
08.08.02 : end of validation
09.08.02 : delivery date 1.0
19.08.02 : delivery date 1.1 (update of DOCCDROM only)
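
Supplementary note on reading the SAM label files described under
'Transcription and error markers': the Python sketch below is an
illustration only and is not part of the corpus software. It assumes
that each label file holds one 'KEY: value' field per line; this
layout is an assumption and should be checked against the actual
files. All function names (parse_sam_label, first, parse_assignments,
expected_source_name) are hypothetical. Fields are kept as an ordered
list because the CMT key occurs several times.

```python
def parse_sam_label(text):
    """Return the fields of a SAM label file as (key, value) pairs.

    Assumes one 'KEY: value' field per line (an assumption, not
    confirmed by the corpus documentation). Only the first colon
    separates key and value, so times like '13:54:42' stay intact.
    """
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a label field: {line!r}")
        fields.append((key.strip(), value.strip()))
    return fields


def first(fields, key):
    """Return the value of the first field with the given key."""
    for name, value in fields:
        if name == key:
            return value
    raise KeyError(key)


def parse_assignments(value):
    """Turn 'HEADSET=RIGHT, WEBPAD=LEFT' into a dictionary."""
    result = {}
    for item in value.split(","):
        name, _, setting = item.partition("=")
        result[name.strip()] = setting.strip()
    return result


def expected_source_name(fields):
    """Build the signal file name Q1####%%%.WAV from SES and CCD."""
    return f"Q1{first(fields, 'SES')}{first(fields, 'CCD')}.WAV"
```

A possible consistency check is to compare expected_source_name(fields)
with the SRC field; for the example label file above both give
Q16005004.WAV. parse_assignments applied to the MIP field gives the
channel assignment of the two microphones for that recording.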