Bavarian Archive for Speech Signals
Verbmobil I - VM1
Recording 1993 - 1996
The Verbmobil corpus consists of two major parts. VM I contains
non-overlapping dialog recordings in the scheduling domain (with the
exception of VMS1.0) and is distributed in two flavors:
The original edition (VM##.0) contains the unchanged data as used
within the Verbmobil project. The BAS edition (VM##.1) contains
a re-validated version where errors and inconsistencies were fixed
and all available symbolic information was added to the corpus.
The latter consists of the BAS Partitur Files, pronunciation dictionaries,
The volumes VM6, VM8 and VM13 contain American English and 'Denglish'
(= English spoken by native Germans); the volumes VM16-19 contain
Japanese recordings that largely conform to the VM I standards
(formats, transliteration, etc.);
the volumes VM9-11 were never distributed because
the copyright holder did not allow their use within the Verbmobil
The term 'spontaneous dialogue' here refers to a complete appointment negotiation between
two speakers. In most cases, more than one negotiation was recorded
for each pair of speakers.
885 speakers participated in 1422 recordings. The total corpus amounts to
9 GB of data containing 23750 conversational turns, distributed on 15 CD-Rs.
The general documentation for the Verbmobil 1 volumes
Information regarding different partitions and pricing
Suggested definition of training, development and test subsets for
the German VM 1 corpus
History for all VMI volumes:
- 05/30/2001 : New edition of all BAS Partitur Files (BPF) based on the
latest error update. This includes a completely new MAUS annotation.
Furthermore, additional previously unpublished tiers were added to the
distribution, such as Dialog Act Annotation, Syntactic-prosodic Labeling,
Prosodic Labeling and Parts-of-Speech Tagging.
- 06/08/2001 : Edition of the VM Bonus CD-ROM (VMBONUS) with additional
data and documentation that do not fit into the regular VM volumes;
Edition of the VM Lexicon Database of the University of Bielefeld.
- 13.12.2001 : Errors in BPF tier PRO fixed
- 14.03.2002 : Format error in link list of BPF tier PRO fixed
- 30.01.03 : vm_ger.lex completely rebuilt:
The German pronunciation dictionary of VM I+II now contains only the
word items as they appear in the ORT tier of the BPF files.
The transcription was also unified to a more consistent
concept of a 'canonical form'.
- 19.08.03 : New edition of all BAS Partitur Files (BPF) of German signal data
based on the latest error update:
Some minor bugs in the POS, LMA and SAP tiers fixed.
Completely redone pronunciation list for German (vm_ger.lex)
according to the new 'Transliteration Conventions for Canonical
Based on the new pronunciation the following tiers in the BPF
files have been re-calculated:
- 20.08.03 : New tier TLN integrated: the TLN tier contains the translation
of the recorded utterance. The translations were produced
manually by the University of Tuebingen, Prof. Hinrichs.
The integrated data are also stored on the volume VMBONUS
Please note that the orthographic representation of Japanese
(romaji) in these translations follows the original form used
in the original Japanese pronunciation list (vm_jap_org.lex).
However, it was never checked whether these two data sets (lexicon
and translations) are in fact compatible. Use with caution!
For details about the TLN tier please refer to the BPF documentation
- 09.09.03 : Published the defined training, development and test subsets
Verbmobil data from the BAS edition may also be ordered in language groups,
thus simplifying the processing of the data, e.g.:
- all German dialogues
- all American dialogues
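For readers who already hold individual volumes, the naming scheme and volume
numbers given further up can be used to sort volumes into such language groups.
The following is only a minimal sketch in Python, not a BAS tool: the function
names describe_volume and group_by_language are hypothetical, and it assumes
that every distributed volume not listed as English or Japanese is German,
which this page does not state explicitly. Names that do not follow the VM##.#
pattern (for example VMS1.0 or VMBONUS) are rejected rather than guessed at.

```python
import re

# Volume numbers taken from the volume description on this page.
ENGLISH_VOLUMES = {6, 8, 13}            # American English and 'Denglish'
JAPANESE_VOLUMES = {16, 17, 18, 19}
UNDISTRIBUTED_VOLUMES = {9, 10, 11}     # never distributed

# Matches names of the form VM<number>.<edition>, e.g. "VM6.1".
VOLUME_NAME = re.compile(r"VM(\d+)\.([01])")


def describe_volume(name):
    """Return (volume number, edition, language group) for a volume name."""
    match = VOLUME_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a VM##.# volume name: {name!r}")
    number = int(match.group(1))
    # .0 is the original edition, .1 the re-validated BAS edition.
    edition = "original" if match.group(2) == "0" else "BAS"
    if number in UNDISTRIBUTED_VOLUMES:
        raise ValueError(f"VM{number} was never distributed")
    if number in ENGLISH_VOLUMES:
        language = "English"
    elif number in JAPANESE_VOLUMES:
        language = "Japanese"
    else:
        language = "German"  # assumption: not stated explicitly on this page
    return number, edition, language


def group_by_language(names):
    """Group volume names by language, keeping the input order."""
    groups = {}
    for name in names:
        _, _, language = describe_volume(name)
        groups.setdefault(language, []).append(name)
    return groups
```

For example, group_by_language(["VM1.1", "VM6.1", "VM17.1"]) would place VM1.1
in the (assumed) German group, VM6.1 in the English group and VM17.1 in the
Japanese group.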
Questions and orders: