Bavarian Archive for Speech Signals
Siemens 1000 (Strange Corpus 7) - SI1000 (SC7)

This corpus contains 1000 utterances taken from a newspaper corpus and read in dictation style by five male and five female speakers. The data can be used either for ASR training of dictation systems (SI1000) or for evaluating speaker adaptation methods (SC7). For the latter purpose, the sentence corpus spoken by each speaker is divided into an adaptation set and a test set:

SC7 adaptation set: Utterances 001 - 200
SC7 test set: Utterances 201 - 1000
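The split above can be expressed as a small helper. This is a minimal sketch: the function name `sc7_subset` and the use of plain integer utterance numbers (1 to 1000 per speaker) are assumptions for illustration, not part of the corpus distribution.

```python
def sc7_subset(utterance_number: int) -> str:
    """Return the SC7 subset ('adaptation' or 'test') for an utterance number.

    Hypothetical helper: assumes each speaker's utterances are numbered 1..1000.
    """
    if not 1 <= utterance_number <= 1000:
        raise ValueError(f"utterance number out of range: {utterance_number}")
    # Utterances 001-200 form the adaptation set.
    if utterance_number <= 200:
        return "adaptation"
    # Utterances 201-1000 form the test set.
    return "test"


if __name__ == "__main__":
    adaptation = [n for n in range(1, 1001) if sc7_subset(n) == "adaptation"]
    test = [n for n in range(1, 1001) if sc7_subset(n) == "test"]
    print(len(adaptation), len(test))  # 200 800
```

Per speaker, this yields 200 adaptation utterances and 800 test utterances.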

Tests should be carried out over all 10 speakers to avoid statistical outliers. (Note that if this corpus is used to evaluate speaker adaptation algorithms, the underlying speech recognizer must not be trained on either the adaptation set or the test set of this corpus.)
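Both requirements in the paragraph above can be checked mechanically before reporting results. The sketch below is hypothetical: it assumes utterances are identified by `(speaker_code, utterance_number)` pairs, that the recognizer's training list uses the same identifiers, and that per-speaker results are given as word error counts and reference word counts. The corpus documentation prescribes none of these details, and a pooled word error rate is only one possible metric.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical identifier for one SI1000/SC7 recording: (speaker code, utterance number).
UtteranceId = Tuple[str, int]

NUM_SPEAKERS = 10  # five male and five female speakers


def check_no_training_overlap(training_ids: Iterable[UtteranceId],
                              corpus_ids: Iterable[UtteranceId]) -> None:
    """Raise if any corpus utterance (adaptation or test set) was used to train the recognizer."""
    overlap = set(training_ids) & set(corpus_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} corpus utterances found in recognizer training data")


def pooled_error_rate(per_speaker: Dict[str, Tuple[int, int]]) -> float:
    """Pool (errors, reference_words) over all speakers and return the overall error rate.

    Refuses to report a figure unless results for all ten speakers are present.
    """
    if len(per_speaker) != NUM_SPEAKERS:
        raise ValueError(f"expected {NUM_SPEAKERS} speakers, got {len(per_speaker)}")
    errors = sum(e for e, _ in per_speaker.values())
    words = sum(w for _, w in per_speaker.values())
    return errors / words
```

Pooling error and word counts, rather than averaging per-speaker rates, weights each speaker by the amount of test material, which keeps a single unusual speaker from dominating the result.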

General Corpus Documentation

Contents of the Corpus

Audio files

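Speaker CS - Sentence 024
Tarifvereinbarungen , die eine bestimmte Preissteigerung vorwegnähmen , seien unverantwortlich .
(English: Wage agreements that anticipate a certain price increase are irresponsible.)
Speaker PG - Sentence 037
3. der Bundeshaushalt muß nicht nur kurzfristig in Ordnung gebracht werden .
(English: 3. the federal budget must be put in order not only in the short term.)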

Revalidation report

Availability and Costs

Freely available.
Siemens 1000 - SI1000 (SC7)
5 CD-ROMs (ISO 9660) + postage
