                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


Information on SpeechDat(II) Data Sets
======================================

Version 	1.0

K. Proell 13.09.2004

This document contains information regarding the usage of the German
SpeechDat(II) speech corpus for ASR or other experiments where a defined
distinction between training, development and test set is necessary.

-----------------------------------------------------------------------

Division and basic numbers
--------------------------

The official SpeechDat(II) training and test sessions of the fixed
network database are used, with the official test sessions split further
into a development set and a test set. The subsets of the mobile data
are defined here using the official SpeechDat algorithm.

Basic figures for the subsets (LEX = number of lexical entries,
SPEAKERS = number of speakers):

SET           WORDS    TURNS    LEX      SPEAKERS
-------------------------------------------------
TRAIN_FIX     843384   150867   23246    3500
DEV_FIX       37886    6421     802*     250
DEV_MOBIL     32168    7027     947*     250
TEST_FIX      34086    6403     807**    250
TEST_MOBIL    32070    7085     921**    250
-------------------------------------------------
*  The combined DEV_FIXMOBIL lexicon has 1179 words.
** The combined TEST_FIXMOBIL lexicon has 1179 words.

The training set includes all utterance types:

TYPE			CORPUS CODE
-----------------------------------
isolated digit items		I
digit/number strings		B,C
natural number(s)		N
money amounts			M
yes/no questions		Q
dates				D
times				T
application keywords/keyphrases	A
word spotting phrase		E
directory assistance names	O
spellings			L
phonetically rich words		W
phonetically rich sentences	S
partner specific material*	Y
-----------------------------------
* speaker gender question, birthdate request, speaker region question, today's date

Utterance types O,W,S are excluded from the development and test sets.
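
The exclusion rule above can be sketched in Python. This is only an
illustration: it assumes that each item is identified by an item code
whose first letter is the corpus code from the table (for example "S3"
for a phonetically rich sentence). The function name and the example
codes are hypothetical.

```python
# Corpus codes excluded from the development and test sets (see above).
EXCLUDED_CODES = {"O", "W", "S"}


def keep_for_dev_or_test(item_code: str) -> bool:
    """Return True if an item may appear in a development or test set.

    Assumption: the first character of item_code is the corpus code,
    e.g. "S3" -> "S". This mapping is illustrative only.
    """
    return item_code[:1].upper() not in EXCLUDED_CODES


# Hypothetical item codes of one recording session.
items = ["A1", "B1", "O2", "S3", "W1", "Q1"]
kept = [code for code in items if keep_for_dev_or_test(code)]
print(kept)
```

Applied to the training set, no such filter is needed, since all
utterance types are used there.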

Example ASR Results
-------------------

Using the subsets defined above, we currently (Sept 2004) obtain the
following word accuracies (WA) with an HTK recognizer and a bigram
language model trained solely on the training corpus:

Trained on TRAIN_FIX; tested on the DEV_FIX and DEV_MOBIL sets with
lexicon DEV_FIXMOBIL.lex (1179 lexical entries in total):

DEV_FIX:	WA = 68.61%
DEV_MOBIL:	WA = 48.14%
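
For reference, a minimal sketch of how a word accuracy figure is
commonly computed. It assumes that WA corresponds to the accuracy
reported by the HTK scoring tool HResults, i.e. (N - D - S - I) / N with
N reference words and D deletions, S substitutions and I insertions.
This document does not state the formula explicitly, and the counts in
the example are invented.

```python
def word_accuracy(n_ref: int, deletions: int, substitutions: int,
                  insertions: int) -> float:
    """Word accuracy in percent: 100 * (N - D - S - I) / N.

    Insertions are counted as errors, so the value can become
    negative for very poor recognition output.
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Invented error counts, for illustration only.
wa = word_accuracy(n_ref=1000, deletions=50, substitutions=200, insertions=30)
print(f"WA = {wa:.2f}%")
```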

Test "HOME and PUBLIC environments" on mobile network data (lexicon DEV_FIXMOBIL.lex):
The calls of the mobile development set are divided into two parts
according to the environment of the call ("HOME": home; "PUBLIC":
public, street, vehicle).

DEV_MOBIL_HOME (15462 words):	WA = 58.45%
DEV_MOBIL_PUBLIC (16705 words):	WA = 38.59%
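
As a plausibility check, the two environment results can be combined
into a word-weighted average, which should come close to the overall
DEV_MOBIL accuracy given further above. The sketch uses only the numbers
from this document; the function name is illustrative, and weighting by
word count assumes that WA is a ratio over the number of reference words.

```python
def pooled_wa(parts):
    """Word-weighted mean of (word_count, accuracy_percent) pairs."""
    total_words = sum(words for words, _ in parts)
    return sum(words * wa for words, wa in parts) / total_words


# (number of words, WA in percent) for the HOME and PUBLIC parts.
parts = [(15462, 58.45), (16705, 38.59)]
combined = pooled_wa(parts)
print(f"combined WA = {combined:.2f}%")
```

The combined value rounds to 48.14%, in agreement with the accuracy
reported for the complete DEV_MOBIL set.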

Test "Noiseless" on fixed and mobile network data (lexicon DEV_FIXMOBIL.lex):
Utterances containing mispronunciations, unintelligible speech,
truncations, stationary noise [sta] or intermittent noise [int] are
excluded from the development sets.

DEV_FIX_noiseless (15462 words):	WA = 70.32%
DEV_MOBIL_noiseless (13163 words):	WA = 55.79%
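
The "noiseless" selection can be sketched as a simple filter over the
transcriptions. The noise markers [sta] and [int] are the ones named
above. The markers used here for mispronunciation and unintelligible
speech ("*", which also matches "**") and for truncation ("~") are an
assumption about the transcription conventions and should be checked
against the corpus documentation. The sample transcriptions are invented.

```python
# Noise markers named in this document.
NOISE_MARKERS = ("[sta]", "[int]")

# ASSUMED markers: "*" (mispronunciation; also matches "**" for
# unintelligible speech) and "~" (truncation).
ASSUMED_MARKERS = ("*", "~")


def is_noiseless(transcription: str) -> bool:
    """Return True if the transcription contains none of the markers."""
    markers = NOISE_MARKERS + ASSUMED_MARKERS
    return not any(marker in transcription for marker in markers)


# Invented example transcriptions keyed by a hypothetical utterance id.
utterances = {
    "utt1": "ja",
    "utt2": "[sta] nein",
    "utt3": "drei *vier sechs",
    "utt4": "sieben ~acht",
    "utt5": "[int] zwei",
}
clean = [utt_id for utt_id, text in utterances.items() if is_noiseless(text)]
print(clean)
```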

Some more details for those who are interested:
12 standard MFCCs + energy, each with velocity and acceleration
  coefficients (39 features)
Diagonal covariance matrices
3-5 states per phoneme
40 phoneme classes (extended German SAMPA) + garbage + voice garbage +
  silence (43 models)
Models initialized using the flat start procedure
Re-estimation and splitting of mixtures after 6 iterations on total TRAIN;
  testing after every two iterations on DEV_FIX (61 iterations)
  Optimal performance with 256 mixtures per state
Language model weight fixed to 6.5 (option -s); word end penalty -15
  (option -p); beam search width 100.0
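
The decoding settings above might translate into a call of the HTK
decoder HVite along the following lines. This is a sketch only: the
options -s and -p are those named above, mapping the beam width to the
pruning option -t is an assumption, and all file names except
DEV_FIXMOBIL.lex are placeholders. The command is only assembled and
printed, not executed.

```python
import shlex


def hvite_command(hmm_defs, word_net, dictionary, hmm_list, script_file,
                  out_mlf, lm_scale=6.5, word_penalty=-15.0, beam=100.0):
    """Build an HVite argument list with the settings from this document."""
    return [
        "HVite",
        "-H", hmm_defs,           # model definitions
        "-S", script_file,        # list of feature files to recognize
        "-i", out_mlf,            # output label file
        "-w", word_net,           # word network built from the bigram
        "-s", str(lm_scale),      # language model weight
        "-p", str(word_penalty),  # word end penalty
        "-t", str(beam),          # ASSUMPTION: beam width given via -t
        dictionary,
        hmm_list,
    ]


# Placeholder file names, apart from the lexicon named in this document.
cmd = hvite_command("hmmdefs", "bigram.net", "DEV_FIXMOBIL.lex",
                    "models.list", "dev_fix.scp", "dev_fix_rec.mlf")
print(shlex.join(cmd))
```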

No tests have been run on the TEST sets so far.