
WebCommand - Main Documentation

                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany

         COPYRIGHT University of Munich 2002. All rights reserved.   
    This corpus and software may not be disseminated further - not even
      partly - without a written permission of the copyright holders.  

                      Additional Copyright Holders
              Siemens Company, Perlach, Munich, Germany - 2002.


WEBCOMMAND 1.1 - on-site recordings for webpad voice control


This is the documentation for the WEBCOMMAND database, created from
June to August 2002 as a subcontract to Siemens Company.

WEBCOMMAND contains recording sessions of native speakers from
France and Great Britain. All speakers read a list of 130 prompts from
a screen. They were recorded with two microphones: a high-quality headset
and a high-quality microphone fixed to a 'webpad' held on the lap.

------------------- Contents of this file ------------------------

   DVD directory structure
   Recording situation
   Naming conventions
   Signal file formats
   Transcription and error markers
   Annotation format
   Known errors

----------------- DVD directory structure --------------------------

The corpus consists of two DVD-5 discs with a total size of 7.5 GByte plus a
CD-ROM with the label files and documentation ('DOCCDROM').

The British speakers are stored on one DVD (Webcommand_EN, #1), the
French speakers on the second DVD (Webcommand_FR, #2).

Recordings are located in the 'BLOCK' directories:

BLOCK40  :  British, room P, 26 sessions
BLOCK50  :  British, room S, 26 sessions
BLOCK60  :  French, room P, 21 sessions
BLOCK70  :  French, room S, 22 sessions

The corpus contains 47 complete sessions (130 recordings per session). 
Care was taken that each speaker was recorded in a complete session
in each of the two recording rooms.
Additional incomplete recording sessions (speakers who did not record a second
session, or corrupted sessions) are collected in the directories NOT_USED_FR
(4 sessions) and NOT_USED_EN (7 sessions) respectively.

The CD-ROM 'DOCCDROM' contains additional documents about the
corpus recording and annotation as well as pronunciation dictionaries:

PRON_FR.LEX     : Pronunciation dictionary, SAM-PA, French
PRON_EN.LEX     : Pronunciation dictionary, SAM-PA, English
TRANSCRP.PDF    : description of rules and conventions of SpeechDat
                  transcription (German)
TRANSCRP_EN.PDF : description of rules and conventions of SpeechDat
                  transcription (English)
PICS/           : Pictures of the recording setup
BLOCK##/        : SAM annotation files to recording block ##
REPORT.TXT      : this file
SAMEXPORT.TXT   : condensed summary of all SAM label files in one table
SUMMARY.TXT     : SpeechDat-conformant summary of recordings: for each recording
                  session, all individual recordings are listed in one line.
                  If a recording is missing, a '-' is listed instead of the
                  three-digit prompt number.
SPEAKER.TBL     : mapping of 4-digit speaker id to sex, age and mother tongue
SESSION.TBL     : mapping of 4-digit session id to speaker id, place of 
                  recording, date of recording, microphone types, channel
                  mapping, environment

----------------- Recording Situation  --------------------------

Each speaker (complete sessions only!) was recorded in two different
recording rooms, P and S, on different days. Each session consists
of 130 prompts as given in the prompt lists doc/PROMPTS*.
The speaker wears an ear-free headset Beyerdynamik NEM 192; the second mic
is a Beyerdynamik MCE 10 mounted on the upper left corner of a dummy
laptop case that the user holds with both hands on his/her lap.

The recording setup is documented with photos in the directory PICS.

During the recording the user does not have to use the keyboard or the
mouse. The acoustical environment of both rooms is a quiet office environment.
There is only one computer (a Mac desktop mounted in front of the speaker)
and no other noise source. The signal of each microphone is amplified by a
Beyerdynamik MV 100 amplifier (headset mic +20 dB, webpad mic +20 dB)
and then connected to the standard Mic input of the recording Mac.
Each session starts with a short instruction of the speaker; then the
microphones are mounted by the supervisor and a short training session
(not recorded) of 5 prompts is performed. Then the supervisor leaves the
room for the rest of the session. The prompting and recording run
automatically; for each prompt a fixed time slot of 5.7 sec is recorded.
The timing is controlled by a 'red light' control: a red light indicates
not to speak, the yellow light indicates to get ready, and then together
with the green light the prompt is displayed and the speaker reads it from
the screen. After the fixed recording time the red light comes on again and
the cycle starts anew.

Recording specs:

Minimum speakers per language:   20
Minimum speakers per sex:        20
Recording sessions per speaker:  2
Prompts per session:             130 (000-129)
Length per prompt:               5.7 sec
Sampling rate:                   22050 Hz
Bits per sample:                 16
File format:                     WAV stereo
Headset:                         Beyerdynamik NEM 192, left channel
Webpad mic:                      Beyerdynamik MCE 10, right channel
Amplifier:                       Beyerdynamik MV 100, set to +20 dB, LF Cut off
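
As a rough plausibility check, the figures above imply a nominal size for
each recording and each session. The sketch below only multiplies the specs
out; the constant and function names are illustrative and not part of the
corpus, and actual files may deviate slightly from the nominal 5.7 sec (the
WAV header is not counted either).

```python
# Nominal recording parameters, copied from the recording specs.
SAMPLE_RATE_HZ = 22050
BYTES_PER_SAMPLE = 2          # 16 bits per sample
CHANNELS = 2                  # stereo: headset (left), webpad mic (right)
PROMPT_LENGTH_SEC = 5.7
PROMPTS_PER_SESSION = 130


def nominal_frames_per_prompt() -> int:
    """Sample frames in one nominal 5.7 sec recording slot."""
    return round(PROMPT_LENGTH_SEC * SAMPLE_RATE_HZ)


def nominal_bytes_per_prompt() -> int:
    """Raw audio payload of one recording, excluding the WAV header."""
    return nominal_frames_per_prompt() * BYTES_PER_SAMPLE * CHANNELS


def nominal_bytes_per_session() -> int:
    """Raw audio payload of one complete session of 130 prompts."""
    return nominal_bytes_per_prompt() * PROMPTS_PER_SESSION


if __name__ == "__main__":
    print("frames per prompt :", nominal_frames_per_prompt())
    print("bytes per prompt  :", nominal_bytes_per_prompt())
    print("bytes per session :", nominal_bytes_per_session())
```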

----------------------- Naming conventions ----------------------

Session names are coded as follows:

SES####    where #### denotes the session number

Session numbers starting with '4' : British speaker, room P
Session numbers starting with '5' : British speaker, room S
Session numbers starting with '6' : French speaker, room P
Session numbers starting with '7' : French speaker, room S

e.g. SES6013 is the 13th recording session of a French speaker in
room P.
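
The session naming scheme can be decoded mechanically. The following is a
minimal sketch, not part of the corpus tools; the function name
decode_session and the returned dictionary layout are assumptions made for
illustration. The BLOCK directory is derived from the first digit of the
session number, following the BLOCK40 to BLOCK70 listing in the directory
structure section.

```python
import re

# First digit of the session number -> (language, room).
SESSION_PREFIX = {
    "4": ("British", "P"),
    "5": ("British", "S"),
    "6": ("French", "P"),
    "7": ("French", "S"),
}


def decode_session(name: str) -> dict:
    """Split a session name such as 'SES6013' into its coded parts."""
    match = re.fullmatch(r"SES(\d{4})", name)
    if match is None:
        raise ValueError(f"not a session name: {name!r}")
    number = match.group(1)
    if number[0] not in SESSION_PREFIX:
        raise ValueError(f"unknown session prefix in {name!r}")
    language, room = SESSION_PREFIX[number[0]]
    return {
        "session": number,              # 4-digit session number
        "language": language,
        "room": room,
        "block": f"BLOCK{number[0]}0",  # e.g. '6' -> BLOCK60
        "index": int(number[1:]),       # running number within the block
    }


if __name__ == "__main__":
    print(decode_session("SES6013"))
```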

A mapping from speaker IDs to sessions, as well as the speaker profile,
can be found in the file TABLE/SESSION.TBL.

Each recording file is named as follows:

Q1####%%%.WAV     where: #### denotes the session number
                         %%%  denotes the prompt number (000-129)

e.g. Q16013051.WAV is a WAV stereo file containing the two microphone
signals of the 52nd prompt (number 051) of the 13th recording session of
French speakers
in room P. The channel assignment for the microphones is stored in the
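
The recording file names described in this section can be taken apart with a
regular expression, which also makes it easy to check a session for missing
prompts. This is a sketch under the naming rules given here; the helper names
parse_recording_name and missing_prompts are hypothetical and do not belong
to any delivered software.

```python
import re

# Q1 + 4-digit session number + 3-digit prompt number + .WAV
RECORDING_RE = re.compile(r"Q1(\d{4})(\d{3})\.WAV")
PROMPTS_PER_SESSION = 130  # prompt numbers 000-129


def parse_recording_name(filename: str) -> tuple:
    """Return (session number as string, prompt number as int)."""
    match = RECORDING_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a recording file name: {filename!r}")
    return match.group(1), int(match.group(2))


def missing_prompts(filenames, session: str) -> list:
    """List the prompt numbers of one session that have no file."""
    present = set()
    for name in filenames:
        try:
            ses, prompt = parse_recording_name(name)
        except ValueError:
            continue  # ignore files that do not follow the scheme
        if ses == session:
            present.add(prompt)
    return [p for p in range(PROMPTS_PER_SESSION) if p not in present]


if __name__ == "__main__":
    print(parse_recording_name("Q16013051.WAV"))
```

In practice the file list would come from a directory listing of one session
directory, for example via os.listdir.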

------------------------- Signal file formats ----------------------

All recording files are stored in WAV standard format.
See the recording specs above for details.
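
Since each file carries both microphones in one stereo signal, a common first
step is to separate the two channels. The sketch below uses only the Python
standard library; the function name read_channels is an assumption for
illustration. It relies on the channel assignment given in the recording
specs (headset left, webpad mic right) and returns the sample rate found in
the file header rather than assuming it.

```python
import array
import sys
import wave


def read_channels(path: str):
    """Read one stereo recording and return (rate, headset, webpad).

    headset and webpad are arrays of signed 16-bit samples taken from
    the left and right channel respectively.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit stereo WAV")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Stereo frames are interleaved: left, right, left, right, ...
    headset = samples[0::2]
    webpad = samples[1::2]
    return rate, headset, webpad


if __name__ == "__main__" and len(sys.argv) > 1:
    rate, headset, webpad = read_channels(sys.argv[1])
    print(rate, "Hz,", len(headset), "frames,",
          round(len(headset) / rate, 3), "sec")
```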


----------------- Transcription and error markers -----------------

All recordings were annotated according to SpeechDat conventions.
See the document doc/TRANSCRP.PDF for details about this. 

------------------------- Annotation format ----------------------

The transcription files (SAM label format) are stored
on a separate CD-ROM in a file system hierarchy that mirrors
that of the signal files, i.e. BLOCKxx/SESxxxx.

The same information is also stored in a semicolon-delimited text file.

The SAM label names are the following (this is also the field
order of SAMEXPORT.TXT):

LHD     SAM Header specification
DBN     database name
SES     session number
CMT     comment
SRC     name of signal source file
DIR     directory path of signal file
CCD     corpus code of signal file
BEG     begin recording
END     end recording (in samples)
REP     recording place
RED     recording date
RET     recording time
CMT     comment
SAM     sample rate
SNB     sample number of bytes
SBF     byte order
QNT     quantization
NCH     number of channels
CMT     comment
SCD     speaker code
SEX     speaker gender
AGE     speaker age
ACC     speaker accent
CMT     comment
MIP     microphone position
MIT     microphone type
ENV     environment
CMT     comment
LBD     label file body
LBR     prompt text
LBO     transcription of utterance
ELF     end of label file
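
Given this field order, the semicolon-delimited export can be read into one
record per recording. The sketch below rests on several assumptions that are
not confirmed by this document: that SAMEXPORT.TXT has no header row, that
each row holds exactly one label file, and that the columns follow the list
above one to one. The function name read_samexport is hypothetical. Because
CMT occurs several times, the comment columns are collected in a list.

```python
import csv
import io

# Column names in the listed order. The byte order key is written SBF,
# as it appears in the label files themselves.
FIELDS = (
    "LHD DBN SES CMT SRC DIR CCD BEG END REP RED RET CMT "
    "SAM SNB SBF QNT NCH CMT SCD SEX AGE ACC CMT "
    "MIP MIT ENV CMT LBD LBR LBO ELF"
).split()


def read_samexport(stream) -> list:
    """Read semicolon-delimited rows into dictionaries keyed by label name."""
    rows = []
    for record in csv.reader(stream, delimiter=";"):
        if not record:
            continue  # skip empty lines
        row = {"CMT": []}
        for name, value in zip(FIELDS, record):
            value = value.strip()
            if name == "CMT":
                row["CMT"].append(value)  # repeated key: keep all comments
            else:
                row[name] = value
        rows.append(row)
    return rows


if __name__ == "__main__":
    demo = ";".join(f"v{i}" for i in range(len(FIELDS)))
    print(read_samexport(io.StringIO(demo))[0]["SES"])
```

For a real file, pass an open file object instead of the StringIO used in
the demo; the character encoding of the export is not stated here and has to
be checked first.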


Example of a SAM label file (session 6005, prompt 004):

LHD: SAM 6.0
DBN: Siemens WebCommand Database
SES: 6005
CMT: *** Recording data ***
SRC: Q16005004.WAV
CCD: 004
BEG: 0
END: 126064
REP: University of Munich, Phonetics Institute
RED: 04.07.2002
RET: 13:54:42
CMT: *** Signal data ***
SAM: 22054
SNB: 2
SBF: lo_hi
NCH: 2
CMT: *** Speaker data ***
SCD: 1005
AGE: 23
CMT: *** Environment data ***
CMT: *** Label file body ***
LBR: 0,126064,,,,appeler Nicolas Moulin
LBO: 0,63032,126064,appeler Nicolas Moulin
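
A label file of this shape is a sequence of 'KEY: value' lines and can be
read with a few lines of code. The following is a minimal sketch; the
function names parse_sam_label and parse_lbo are illustrative. Reading the
three numbers of the LBO entry as begin, centre and end sample is an
assumption based on the values in the example, not something stated in this
document.

```python
def parse_sam_label(text: str) -> dict:
    """Map each label name to the list of its values, in file order."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so times like 13:54:42 survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'KEY: value' line
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def parse_lbo(value: str) -> tuple:
    """Split an LBO value into three sample positions and the text.

    Assumption: the numbers are begin, centre and end (in samples).
    """
    begin, centre, end, text = value.split(",", 3)
    return int(begin), int(centre), int(end), text


if __name__ == "__main__":
    example = (
        "SES: 6005\n"
        "RET: 13:54:42\n"
        "CMT: *** Signal data ***\n"
        "CMT: *** Speaker data ***\n"
        "LBO: 0,63032,126064,appeler Nicolas Moulin\n"
    )
    label = parse_sam_label(example)
    print(label["SES"][0], parse_lbo(label["LBO"][0]))
```

Every key maps to a list because CMT appears more than once per file; for
all other keys the single value is element 0.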


------------------------- Known errors ----------------------

Remark: The subdirectories NOT_USED_* contain sessions that are incomplete,
either because speakers were not recorded a second time, or because signal
files were corrupted. 



01.06.02 : start of recording
20.07.02 : start of validation
01.08.02 : end of recording
08.08.02 : end of validation
09.08.02 : delivery date 1.0
19.08.02 : delivery date 1.1 (update of DOCCDROM only)

Angela Baumann 2004-06-03