_/_/_/_/ _/_/ _/_/_/_/
_/ _/ _/ _/ _/ _/
_/ _/ _/ _/ _/
_/ _/ _/ _/ _/
_/_/_/_/ _/_/_/_/ _/_/_/
_/ _/ _/ _/ _/
_/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/
_/_/_/_/ _/ _/ _/_/_/_/
BAVARIAN ARCHIVE FOR SPEECH SIGNALS
University of Munich, Institute of Phonetics
Schellingstr. 3/II, 80799 Munich, Germany
bas@bas.uni-muenchen.de
COPYRIGHT University of Munich 2002. All rights reserved.
This corpus and software may not be disseminated further - not even
partly - without a written permission of the copyright holders.
Additional Copyright Holders
Siemens Company, Perlach, Munich, Germany - 2002.
----------------------------------------------------------------------
WEBCOMMAND 1.1 - on-site recordings for webpad voice control
----------------------------------------------------------------------
This is the documentation for the WEBCOMMAND database, created in
June - August 2002 as a subcontract to Siemens Company.
WEBCOMMAND contains recording sessions of native speakers from
France and Great Britain. All speakers read a list of 130 prompts from
a screen. They were recorded with two microphones: a high-quality headset
and a high-quality microphone fixed to a 'webpad' held on the lap.
------------------- Contents of this file ------------------------
DVD directory structure
Recording situation
Naming conventions
Signal file formats
Transcription and error markers
Annotation format
Known errors
History
----------------- DVD directory structure --------------------------
The corpus consists of two DVD-5 discs with a total size of 7.5 GByte plus a
CD-ROM with the label files and documentation ('DOCCDROM').
One DVD (WebCommand_EN, #1) holds the British speakers; the
second DVD (WebCommand_FR, #2) holds the French speakers.
Recordings are situated in the 'BLOCK' directories:
BLOCK40 : british, room P, 26 sessions
BLOCK50 : british, room S, 26 sessions
BLOCK60 : french, room P, 21 sessions
BLOCK70 : french, room S, 22 sessions
The corpus contains 47 complete sessions (130 recordings per session).
Care was taken that each speaker was recorded in a complete session
in each of the two recording rooms.
Additional incomplete recording sessions (speakers did not record a second
session, or corrupted sessions) are collected in the directories NOT_USED_FR
(4 sessions) and NOT_USED_EN (7 sessions) respectively.
The CDROM 'DOCCDROM' contains additional documents about the
corpus recording and annotation as well as pronunciation dictionaries:
PRON_FR.LEX : Pronunciation dictionary, SAM-PA, french
PRON_EN.LEX : Pronunciation dictionary, SAM-PA, english
TRANSCRP.PDF : description of rules and conventions of SpeechDat
transcription (German)
TRANSCRP_EN.PDF : description of rules and conventions of SpeechDat
transcription (English)
PICS/ : Pictures of the recording setup
BLOCK##/ : SAM annotation files to recording block ##
REPORT.TXT : this file
SAMEXPORT.TXT : condensed summary of all SAM label files in one table
SUMMARY.TXT : SpeechDat-conformant summary of recordings: for each recording
session all individual recordings are listed in one line.
If a recording is missing, a '-' is listed instead of the
three-digit prompt number.
SPEAKER.TBL : mapping of 4-digit speaker id to sex, age and mother tongue
SESSION.TBL : mapping of 4-digit session id to speaker id, place of
recording, date of recording, microphone types, channel
mapping, environment
----------------- Recording Situation --------------------------
Each speaker (complete sessions only!) was recorded in two different
recording rooms, P and S, on different days. Each session consists
of 130 prompts as given in the prompt lists doc/PROMPTS*.
The speaker wears an ear-free Beyerdynamic NEM 192 headset; the second mic
is a Beyerdynamic MCE 10 mounted on the upper left corner of a dummy
laptop case that the user holds with both hands on his/her lap.
The recording setup is documented with photos in the directory PICS.
During the recording the user does not have to use the keyboard or the
mouse. The acoustic environment of both rooms is a quiet office environment.
There is only one computer (a Mac desktop mounted in front of the speaker)
and no other noise source. The signal of each microphone is amplified by a
Beyerdynamic MV 100 amplifier (headset mic +20 dB, webpad mic +20 dB)
and then fed to the standard mic input of the recording Mac.
Each session starts with a short briefing of the speaker; then the
microphones are mounted by the supervisor and a short training session
(not recorded) of 5 prompts is performed. Then the supervisor leaves the
room for the rest of the session. Prompting and recording run
automatically; for each prompt a fixed time slot of 5.7 sec was recorded.
The timing is controlled by a 'red light' control: a red light indicates
not to speak, the yellow light indicates to get ready, and then, together
with the green light, the prompt is displayed and the speaker reads it from
the screen. After the fixed recording time the red light comes on again and
the cycle starts anew.
Recording specs:
Minimum speakers per language: 20
Minimum speakers per sex: 20
Recording sessions per speaker 2
Prompts per session: 130 (000-129)
Length per prompt: 5.7 sec
Sampling rate: 22050 Hz
Bits per sample: 16
File format: WAV stereo
Headset: Beyerdynamic NEM 192, left channel
Webpad mic: Beyerdynamic MCE 10, right channel
Amplifier: Beyerdynamic MV 100, set to +20dB, LF Cut off
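The recording specs can be checked against the header of a signal file.
The following is a minimal sketch in Python using only the standard
'wave' module; it is not part of the corpus software. The function name
check_wav and the duration tolerance of 0.1 sec are assumptions made for
this illustration. Nominally, 5.7 sec at 22050 Hz is 125685 sample frames
per channel; the label example in this file lists 126064 samples, so an
exact match on length should not be expected.

```python
import wave

# Values taken from the "Recording specs" list above.
EXPECTED_RATE = 22050        # Hz
EXPECTED_SAMPLE_WIDTH = 2    # bytes, i.e. 16 bits per sample
EXPECTED_CHANNELS = 2        # stereo: headset + webpad mic
NOMINAL_SECONDS = 5.7        # fixed time slot per prompt


def check_wav(source, tolerance_s=0.1):
    """Return a list of deviations from the recording specs.

    `source` is a file name or a binary file object. An empty list
    means the header matches the specs. `tolerance_s` is an assumed
    allowance for the recording length, not a documented value.
    """
    problems = []
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        if rate != EXPECTED_RATE:
            problems.append(f"sampling rate is {rate} Hz")
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {w.getsampwidth()} bytes")
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channel(s)")
        duration = w.getnframes() / rate
    if abs(duration - NOMINAL_SECONDS) > tolerance_s:
        problems.append(f"duration is {duration:.2f} sec")
    return problems
```

Calling check_wav on a recording file returns an empty list if the
header agrees with the specs, otherwise one short message per deviation.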
----------------------- Naming conventions ----------------------
Session names are coded as follows:
SES#### where #### denotes the session number
Session numbers starting with '4' : british speaker, room P
Session numbers starting with '5' : british speaker, room S
Session numbers starting with '6' : french speaker, room P
Session numbers starting with '7' : french speaker, room S
e.g. SES6013 is the 13th recording session of a french speaker in
room P.
A mapping from speaker IDs to sessions, as well as the speaker profile
can be found in the file TABLE/SESSION.TBL
Each recording file is named as follows:
Q1####%%%.WAV where: #### denotes the session number
%%% denotes the prompt number (000-129)
e.g. Q16013051.WAV contains the two microphone signals in a WAV stereo
file of the 52nd prompt of the 13th recording session of french speakers
in room P. The channel assignment for the microphones is stored in the
file TABLE/SESSION.TBL
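The naming scheme above can be decoded mechanically. The following
Python sketch is an illustration only: the names parse_recording_name
and Recording are hypothetical, and the directory rule (BLOCK plus first
session digit plus '0', then SES####) is inferred from the block list and
the DIR field of the label example in this file.

```python
import re
from typing import NamedTuple

# First digit of the session number -> (language, room), as listed above.
SESSION_PREFIX = {
    "4": ("british", "P"),
    "5": ("british", "S"),
    "6": ("french", "P"),
    "7": ("french", "S"),
}

# Q1, four-digit session number, three-digit prompt number, .WAV
_NAME_RE = re.compile(r"^Q1(\d{4})(\d{3})\.WAV$", re.IGNORECASE)


class Recording(NamedTuple):
    session: str    # four-digit session number, e.g. "6013"
    prompt: int     # prompt number, 0..129
    language: str   # "british" or "french"
    room: str       # "P" or "S"
    directory: str  # inferred location, e.g. "BLOCK60/SES6013"


def parse_recording_name(filename):
    """Decode a file name of the form Q1####%%%.WAV."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recording file name: {filename!r}")
    session, prompt = match.group(1), int(match.group(2))
    if session[0] not in SESSION_PREFIX:
        raise ValueError(f"unknown session prefix in {filename!r}")
    if prompt > 129:
        raise ValueError(f"prompt number out of range in {filename!r}")
    language, room = SESSION_PREFIX[session[0]]
    directory = f"BLOCK{session[0]}0/SES{session}"
    return Recording(session, prompt, language, room, directory)
```

For example, parse_recording_name("Q16013051.WAV") yields session "6013",
prompt 51, language "french", room "P" and directory "BLOCK60/SES6013".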
------------------------- Signal file formats ----------------------
All recording files are stored in WAV standard format.
See specs above for details.
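Since both microphone signals share one stereo file, it is often
convenient to separate them. The following Python sketch (standard
library only) shows one way to do this for 16-bit stereo WAV data. The
function names split_channels and write_mono are hypothetical. Which
channel carries the headset must be taken from SESSION.TBL or the MIP
label; the default "left" used here is only an assumption based on the
recording specs.

```python
import array
import wave


def split_channels(source, headset_channel="left"):
    """Split a 16-bit stereo WAV into two mono PCM byte strings.

    `source` is a file name or a binary file object.
    Returns (headset_pcm, webpad_pcm, sampling_rate).
    """
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 2 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo data")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())

    # Samples are interleaved: left, right, left, right, ...
    # Only 2-byte items are regrouped, so byte order is left untouched.
    samples = array.array("h")
    samples.frombytes(frames)
    left, right = samples[0::2], samples[1::2]

    if headset_channel == "left":
        headset, webpad = left, right
    elif headset_channel == "right":
        headset, webpad = right, left
    else:
        raise ValueError("headset_channel must be 'left' or 'right'")
    return headset.tobytes(), webpad.tobytes(), rate


def write_mono(target, pcm_bytes, rate):
    """Write 16-bit mono PCM data to a new WAV file."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm_bytes)
```

The two byte strings returned by split_channels can be passed to
write_mono to obtain one mono WAV file per microphone.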
-------------------------------------------------------------------------
Transcription and error markers
All recordings were annotated according to SpeechDat conventions.
See the document doc/TRANSCRP.PDF for details about this.
The transcription files (SAM label format) are stored
on a separate CD-ROM in a file system hierarchy that mirrors
that of the signal files, i.e. BLOCKxx/SESxxxx.
The same information is also stored in a semicolon delimited text file
SAMEXPORT.TXT.
The SAM label names are the following (this is also the field
order of SAMEXPORT.TXT):
LHD SAM Header specification
DBN database name
SES session number
CMT comment
SRC name of signal source file
DIR directory path of signal file
CCD corpus code of signal file
BEG begin recording
END end recording (in samples)
REP recording place
RED recording date
RET recording time
CMT comment
SAM sample rate
SNB sample number of bytes
SBF byte order
QNT quantization
NCH number of channels
CMT comment
SCD speaker code
SEX speaker gender
AGE speaker age
ACC speaker accent
CMT comment
MIP microphone position
MIT microphone type
ENV environment
CMT comment
LBD label file body
LBR prompt text
LBO transcription of utterance
ELF end of label file
e.g.
LHD: SAM 6.0
DBN: Siemens WebCommand Database
SES: 6005
CMT: *** Recording data ***
SRC: Q16005004.WAV
DIR: BLOCK60/SES6005
CCD: 004
BEG: 0
END: 126064
REP: University of Munich, Phonetics Institute
RED: 04.07.2002
RET: 13:54:42
CMT: *** Signal data ***
SAM: 22054
SNB: 2
SBF: lo_hi
QNT: PCM
NCH: 2
CMT: *** Speaker data ***
SCD: 1005
SEX: F
AGE: 23
ACC: FR
CMT: *** Environment data ***
MIP: HEADSET=RIGHT, WEBPAD=LEFT
MIT: HEADSET=BEYERDYNAMIC_NEM_192,WEBPAD=BEYERDYNAMIC_MCE_10
ENV: P-ROOM
CMT: *** Label file body ***
LBD:
LBR: 0,126064,,,,appeler Nicolas Moulin
LBO: 0,63032,126064,appeler Nicolas Moulin
ELF:
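A label file such as the example above can be read with a few lines of
code. The following Python sketch is an illustration, not a complete
SAM parser: the function names parse_sam_label and parse_assignments are
hypothetical, and it assumes that every line has the form 'KEY: value'
as in the example. Because CMT occurs several times, each key maps to a
list of values.

```python
def parse_sam_label(text):
    """Parse 'KEY: value' lines into a dict of key -> list of values.

    Only the first ':' separates key and value, so values such as
    the recording time '13:54:42' are kept intact.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'KEY: value' line; skipped in this sketch
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def parse_assignments(value):
    """Turn 'HEADSET=RIGHT, WEBPAD=LEFT' into a dict (MIP and MIT fields)."""
    result = {}
    for part in value.split(","):
        name, sep, setting = part.partition("=")
        if sep:
            result[name.strip()] = setting.strip()
    return result
```

With the example above, parse_assignments applied to the MIP value gives
the channel assignment of the two microphones for that recording.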
-------------------------------------------------------------------------
Known errors
Remark: The subdirectories NOT_USED_* contain sessions that are incomplete,
either because speakers were not recorded a second time, or because signal
files were corrupted.
-------------------------------------------------------------------------
History
01.06.02 : start of recording
20.07.02 : start of validation
01.08.02 : end of recording
08.08.02 : end of validation
09.08.02 : delivery date 1.0
19.08.02 : delivery date 1.1 (update of DOCCDROM only)