BAS
Bavarian Archive for Speech Signals
Verbmobil II - VM2
Verbmobil II
Recordings 1997 - 2000
The Verbmobil II corpus contains dialogue recordings with overlapping
turns. Furthermore, the scheduling domain was extended and a new
annotation system was developed for the transliteration of the
dialogues (the VM I data were converted into this new format and added to
the BAS edition of VM I). The original edition of VM II differs from the
BAS edition in that entire dialogues, including the pauses between turns, were
recorded in two channels. For compatibility with VM I, the signals in the
BAS edition are cut into single files, each representing one turn. The
original edition is not listed in this catalogue, but interested parties
may order the original CDs from BAS under the same conditions as the BAS
volumes.
445 speakers participated in 810 recordings (not counting the emotional
speech recordings on volumes 63-65). The total VM II corpus amounts to
17.6 GB of data containing 58961 conversational turns, distributed over
39 CD-Rs (plus 3 CD-Rs of emotional speech).
More detailed information regarding VM II can be found here.
Information on the different partitions and pricing of the VM corpus
can be found here.
A possible definition of training, development and test subsets for
the German VM corpus can be found here.
History for all VMII volumes:
- 05/30/2001 : New edition of all BAS Partitur Files (BPF) based on the
latest error update. This includes a completely new MAUS annotation.
Furthermore, additional previously unpublished tiers were added to the
distribution, such as Syntax Trees, Dialogue Act Annotation, Syntactic-prosodic Labeling,
Prosodic Labeling and Part-of-Speech Tagging.
- 06/08/2001 : Edition of the VM Bonus CDROM (VMBONUS) with additional
data and documentation that does not fit into the regular VM volumes;
Edition of the VM Lexicon Database of the University of Bielefeld.
- 07/10/2001 : Added BPF tiers LBP and LBG
- 12/13/2001 : Errors in BPF tier PRO fixed
- 03/14/2002 : Format error in link list of BPF tier PRO fixed
- 08/01/2002 : Errors in MAU tier of g002acn2_033_AAK.par and
g426acn1_020_AMA.par fixed
- 30.01.03 : vm_ger.lex completely rebuilt:
The German pronunciation dictionary of VM I+II now contains only the
word items as they appear in the ORT tier of the BPF files.
Also, the transcription was unified according to a more consistent
concept of a 'canonical form'.
- 19.08.03 : New edition of all BAS Partitur Files (BPF) of German signal data
based on the latest error update:
Some minor bugs in the POS, LMA and SAP tiers fixed.
Completely redone pronunciation list for German (vm_ger.lex)
according to the new 'Transliteration Conventions for Canonical
German' (www.bas.uni-muenchen.de/Bas/BasGermanPronunciation/).
Based on the new pronunciation, the following tiers in the BPF
files have been recalculated: KAN, MAU
- 20.08.03 : New tier TLN integrated: the TLN tier contains the translation
of the recorded utterance. The translations were produced
manually by the University of Tuebingen, Prof. Hinrichs.
The integrated data are also stored on the volume VMBONUS.
Please note that the orthographic representation of Japanese
(romaji) in these translations is the original form as used
in the original Japanese pronunciation list (vm_jap_org.lex).
However, it was never checked whether these two data sets (lexicon
and translations) are in fact compatible. Use with caution!
For details about the TLN tier, please refer to the BPF documentation:
www.bas.uni-muenchen.de/Bas/BasFormatseng.html
- 09.09.03 : Published defined training, development and test subsets
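The history entries above refer to several BPF tiers (ORT, KAN, MAU, PRO, POS, TLN and others). As a rough illustration of how such tiers might be separated when reading a Partitur file, the following Python sketch groups lines by their leading label. It assumes that every tier line starts with a three-character label followed by a colon; the function name group_bpf_tiers and the sample lines are made up for illustration and are not taken from a real VM file. The BPF documentation remains the authoritative description of the format.

```python
from collections import defaultdict


def group_bpf_tiers(lines):
    """Group the lines of a Partitur file by their leading tier label.

    Assumption: a tier line has the form 'XXX: content', where XXX is a
    three-character alphanumeric label. Lines that do not fit this
    pattern are skipped.
    """
    tiers = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        # Split at the first colon only; the content itself may contain colons.
        label, sep, rest = line.partition(":")
        if sep and len(label) == 3 and label.isalnum():
            tiers[label].append(rest.strip())
    return dict(tiers)


# Made-up sample lines, for illustration only.
sample = [
    "ORT: 0 guten",
    "ORT: 1 Tag",
    "KAN: 0 gu:t@n",
    "KAN: 1 ta:k",
]
tiers = group_bpf_tiers(sample)
```

With a real file, the lines could be read with open(path, encoding=...) and passed to the function; the correct encoding depends on the distribution and is not specified here.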
Volumes:
- Verbmobil - VM CD 15.1 - VM15.1 (new edition)
German - 19 spontaneous dialogues (19 close mic, 19 room mic, 19 phone line (GSM)) - 3117 turns - transliteration (VM II format) - NIST headers - Partitur Files
- Verbmobil - VM CD 20.1 - VM20.1 (new edition)
German - 31 spontaneous dialogues (10 close mic, 28 room mic, 10 phone line (GSM) recordings) - 1947 turns - transliteration (VM II format) - NIST headers - Partitur Files
- Verbmobil - VM CD 21.1 - VM21.1 (new edition)
German - 38 spontaneous dialogues (38 close mic, 20 room mic, 22 phone line (GSM) recordings) - 2880 turns - transliteration (VM II format) - NIST headers - Partitur Files
- Verbmobil - VM CD 22.1 - VM22.1 (BAS edition)
German - 33 spontaneous dialogues (33 close mic, 22 room mic, 27 phone
line (GSM) recordings) - 2674 turns - transliteration (VM II Format)
- Verbmobil - VM CD 23.1 - VM23.1 (BAS edition)
American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (GSM) recordings) - 2459 turns - transliteration (VM II Format)
- Verbmobil - VM CD 24.1 - VM24.1 (BAS edition)
German - 34 spontaneous dialogues (34 close mic, 19 room mic, 20 phone
line (GSM) recordings) - 2830 turns - transliteration (VM II Format)
- Verbmobil - VM CD 25.1 - VM25.1 (BAS edition)
Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1654 turns - transliteration (VM II Format)
- Verbmobil - VM CD 26.1 - VM26.1 (BAS edition)
Japanese - 16 spontaneous dialogues (16 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1319 turns - transliteration (VM II Format)
- Verbmobil - VM CD 27.1 - VM27.1 (BAS edition)
Japanese - 24 spontaneous dialogues (24 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1149 turns - transliteration (VM II Format)
- Verbmobil - VM CD 28.1 - VM28.1 (BAS edition)
American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (GSM) recordings) - 2409 turns - transliteration (VM II Format)
- Verbmobil - VM CD 29.1 - VM29.1 (BAS edition)
German - 25 spontaneous dialogues (25 close mic, 20 room mic, 20 phone
line (GSM) recordings) - 2708 turns - transliteration (VM II Format)
- Verbmobil - VM CD 30.1 - VM30.1 (BAS edition)
German - 33 spontaneous dialogues (33 close mic, 21 room mic, 25 phone
line (GSM) recordings) - 4176 turns - transliteration (VM II Format)
- Verbmobil - VM CD 31.1 - VM31.1 (BAS edition)
American English - 32 spontaneous dialogues (32 close mic, 0 room mic, 0 phone line (GSM) recordings) - 2512 turns - transliteration (VM II Format)
- Verbmobil - VM CD 32.1 - VM32.1 (BAS edition)
Multilingual English/German - 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (GSM) recordings) - 992 turns - transliteration (VM II Format)
- Verbmobil - VM CD 33.1 - VM33.1 (BAS edition)
Japanese - 25 spontaneous dialogues (25 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1050 turns - transliteration (VM II Format)
- Verbmobil - VM CD 34.1 - VM34.1 (BAS edition)
Japanese - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1437 turns - transliteration (VM II Format)
- Verbmobil - VM CD 35.1 - VM35.1 (BAS edition)
Japanese - 27 spontaneous dialogues (27 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1645 turns - transliteration (VM II Format)
- Verbmobil - VM CD 38.1 - VM38.1 (BAS edition)
German - 33 spontaneous dialogues (33 close mic, 28 room mic, 28 phone
line (GSM) recordings) - 5115 turns - transliteration (VM II Format)
- Verbmobil - VM CD 39.1 - VM39.1 (BAS edition)
German - 28 spontaneous dialogues (28 close mic, 17 room mic, 20 phone
line (GSM) recordings) - 3360 turns - transliteration (VM II Format)
- Verbmobil - VM CD 42.1 - VM42.1 (BAS edition)
American English - 20 spontaneous dialogues (20 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1874 turns - transliteration (VM II Format)
- Verbmobil - VM CD 43.1 - VM43.1 (BAS edition)
American English - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (GSM) recordings) - 633 turns - transliteration (VM II Format)
- Verbmobil - VM CD 44.1 - VM44.1 (BAS edition)
Japanese - 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone
line (GSM) recordings) - 920 turns - transliteration (VM II Format)
- Verbmobil - VM CD 45.1 - VM45.1 (BAS edition)
Japanese - 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (GSM) recordings) - 1293 turns - transliteration (VM II Format)
- Verbmobil - VM CD 46.1 - VM46.1 (BAS edition)
Multilingual Japanese/German - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (GSM) recordings) - 607 turns - transliteration (VM II Format)
- Verbmobil - VM CD 47.1 - VM47.1 (BAS edition)
Multilingual with human interpreter (3 channels) English/German - 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (GSM) recordings) - 902 turns - transliteration (VM II Format)
- Verbmobil - VM CD 48.1 - VM48.1 (BAS edition)
German - 28 spontaneous dialogues (28 close mic, 23 room mic, 27 phone line (GSM) recordings) - 4516 turns - transliteration (VM II Format)
- Verbmobil - VM CD 49.1 - VM49.1 (BAS edition)
German - 24 spontaneous dialogues (24 close mic, 12 room mic, 12 phone line (GSM) recordings) - 2597 turns - transliteration (VM II Format)
- Verbmobil - VM CD 50.1 - VM50.1 (BAS edition)
American English - 8 spontaneous dialogues (8 close mic, 0 room mic, 0 phone line (GSM) recordings) - 679 turns - transliteration (VM II Format)
- Verbmobil - VM CD 51.1 - VM51.1 (BAS edition)
Multilingual German/English with human interpreter (3 channels) - 15 spontaneous dialogues (15 close mic, 0 room mic, 0 phone line (GSM) recordings) - 873 turns - transliteration (VM II Format)
- Verbmobil - VM CD 52.1 - VM52.1 (BAS edition)
Multilingual German/English with human interpreter (3 channels) - 13 spontaneous dialogues (13 close mic, 0 room mic, 0 phone line (GSM) recordings) - 728 turns - transliteration (VM II Format)
- Verbmobil - VM CD 53.1 - VM53.1 (BAS edition)
German - 16 spontaneous dialogues (16 close mic, 8 room mic, 8 phone line (GSM) recordings) - 1771 turns - transliteration (VM II Format)
- Verbmobil - VM CD 55.1 - VM55.1 (BAS edition)
Multilingual German/English with human interpreter (3 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (GSM) recordings) - 518 turns - transliteration (VM II Format)
- Verbmobil - VM CD 56.1 - VM56.1 (BAS edition)
Multilingual German/English with human interpreter (3 channels) - 12 spontaneous dialogues (12 close mic, 0 room mic, 0 phone line (GSM) recordings) - 620 turns - transliteration (VM II Format)
- Verbmobil - VM CD 57.1 - VM57.1 (BAS edition)
Multilingual German/Japanese with 2 human interpreters (4 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (GSM) recordings) - 702 turns - transliteration (VM II Format)
- Verbmobil - VM CD 58.1 - VM58.1 (BAS edition)
Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (GSM) recordings) - 421 turns - transliteration (VM II Format)
- Verbmobil - VM CD 59.1 - VM59.1 (BAS edition)
Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (GSM) recordings) - 354 turns - transliteration (VM II Format)
- Verbmobil - VM CD 60.1 - VM60.1 (BAS edition)
Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings) - 501 turns - transliteration (VM II Format)
- Verbmobil - VM CD 61.1 - VM61.1 (BAS edition)
Japanese - 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone line (GSM) recordings) - 946 turns - transliteration (VM II Format)
- Verbmobil - VM CD 62.1 - VM62.1 (BAS edition)
Japanese - 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (GSM) recordings) - 981 turns - transliteration (VM II Format)
- Verbmobil - VM CD 63.0 - VM63.0 (original edition)
German - 14 WOZ dialogs designed to evoke emotions (mainly anger) - transliteration, emotion labeling
- Verbmobil - VM CD 64.0 - VM64.0 (original edition)
German - 13 WOZ dialogs designed to evoke emotions (mainly anger) - transliteration, emotion labeling
- Verbmobil - VM CD 65.0 - VM65.0 (original edition)
German - 13 WOZ dialogs designed to evoke emotions (mainly anger) - transliteration, emotion labeling
- Verbmobil - VM Bonus CD - VMBONUS (BAS edition)
Additional data and documentation not included in the regular VM volumes
- Verbmobil - VM Lexicon Database - VMLEX (BAS edition)
Verbmobil Lexicon Database of the University of Bielefeld
Verbmobil data from the BAS edition may also be ordered in language groups, e.g.:
- all German dialogues
- all American dialogues
- etc.
thus simplifying the processing of the data.
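For readers who want to see how large such a language group would be, the volume descriptions in this catalogue can be tallied by language. The Python sketch below is only an illustration: it assumes that each description has been joined onto a single line of the form "<language> - <n> spontaneous dialogues (...) - <m> turns - ...", and the names LINE_RE and turns_per_language are hypothetical. The three sample descriptions are copied from volumes VM48.1, VM49.1 and VM50.1 above.

```python
import re
from collections import Counter

# Assumed single-line layout: "<language> - <n> spontaneous dialogues (...) - <m> turns"
LINE_RE = re.compile(
    r"^(?P<language>.+?) - (?P<dialogues>\d+) spontaneous dialogues"
    r".*? - (?P<turns>\d+) turns",
    re.IGNORECASE,
)


def turns_per_language(descriptions):
    """Sum the turn counts of the given volume descriptions per language."""
    totals = Counter()
    for text in descriptions:
        match = LINE_RE.match(text)
        if match:  # descriptions in another layout are ignored
            totals[match.group("language")] += int(match.group("turns"))
    return dict(totals)


descriptions = [
    "German - 28 spontaneous dialogues (28 close mic, 23 room mic, 27 phone line (GSM) recordings) - 4516 turns - transliteration (VM II Format)",
    "German - 24 spontaneous dialogues (24 close mic, 12 room mic, 12 phone line (GSM) recordings) - 2597 turns - transliteration (VM II Format)",
    "American English - 8 spontaneous dialogues (8 close mic, 0 room mic, 0 phone line (GSM) recordings) - 679 turns - transliteration (VM II Format)",
]
totals = turns_per_language(descriptions)
print(totals)  # {'German': 7113, 'American English': 679}
```

Entries without a turn count, such as the emotional speech volumes, do not match the pattern and are skipped.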
Questions and orders:
Florian Schiel