BAVARIAN ARCHIVE FOR SPEECH SIGNALS
University of Munich, Institute of Phonetics
Schellingstr. 3/II, 80799 Munich, Germany
bas@phonetik.uni-muenchen.de

COPYRIGHT University of Munich 1998. All rights reserved.
This corpus and software may not be disseminated further, not even
partly, without the written permission of the copyright holders.

Additional Copyright Holders

----------------------------------------------------------------------
VERBMOBIL II Dialog Database (BAS Edition)
12.01.98 / 18.07.2020 (CLARIN Repo Version 3)
----------------------------------------------------------------------

The BAS edition (X.1.X) of the VERBMOBIL II (VMII) speech data
collection contains the same data as the original VMII edition, but in
a format adapted to that of the VERBMOBIL I collection, to simplify
joint use of the whole database. The following deviations from the
original VMII edition (X.0.X) should be noted:

- Signals cut into turn-length signals
- Transliteration and BAS Partitur files added on CD
- Additional documentation of formats

445 speakers participated in 810 recordings (not counting the emotional
speech recordings on volumes 63-65). The total VMII corpus amounts to
17.6 GB of data containing 58961 conversational turns, distributed over
39 CD-Rs (+ 3 CD-Rs of emotional speech).

Differences from Verbmobil I (VMI):

- multilingual dialogue recordings (volumes 32, 46, 47, 51, 52, 55, 56,
  57, 58, 59)
- up to 3 synchronous channels per speaker: headset, desktop, phone
  (one partner on a landline, one on GSM)
- extended domain: scheduling, travel, leisure-time planning
- no push-to-talk: crosstalk possible

A detailed description of the Verbmobil projects can be found in the
book:

Wahlster W (ed.): Verbmobil: Foundations of Speech-to-Speech
Translation; Berlin, Heidelberg, Germany: Springer, 2000.
http://www.springer.com/computer/ai/book/978-3-540-67783-3

------------------- Contents of this file ------------------------

CD directory structure
Naming conventions
Signal file formats
Nist-Header field definitions for VMII
Marker file format
Speaker protocol format
Recording protocol format
General known errors across all recordings

----------------- CD directory structure --------------------------

Dialogues are located in the "data" directory, speaker information
files in the "spr" directory. The directory name of a dialogue is the
dialogue name, i.e. the first 5 characters of a dialogue signal file
name (for example: e013a).

The directory "doc" contains additional documents about the corpus
recording and annotation, as well as pronunciation dictionaries for
all three languages:

trllex_d.ps|pdf       : detailed description of annotation TR2 (German)
trllex_e_html/        : detailed description of annotation TR2 (English)
vm_eng.lex            : English pronunciation dictionary
vm_ger.lex            : German pronunciation dictionary
vm_jap.lex            : Japanese pronunciation dictionary
vm2.map               : mapping of BPF tiers to each utterance; this
                        file is useful to find all utterances that
                        contain a certain BPF tier.
dialogsperspeaker.txt : number of recorded dialogs per speaker
partitur.pdf|ps       : first publication about BPF (German)
VM*.pdf|ps            : original VERBMOBIL Memos, TechDocs, Reports
                        (see Section 'Additional Documentation' below)
pardoc/               : copy of the BAS Partitur File definition (WWW)
sets/                 : definition of training, development and test sets

----------------------- Naming conventions ----------------------

Dialog names are coded as follows (see also doc/VMMemo-131-97.ps for
a more detailed description):

1st character        : [g,e,j,m,n] recorded language
                       g(erman), e(nglish), j(apanese),
                       m(ultilingual), n(oise)
2nd to 4th character : dialogue number, e.g. 001
5th character        : scenario
                       a(main), b(information desk),
                       c(remote maintenance), d(VM1), n(noise),
                       e(end-to-end evaluation, do not use!)
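Because the dialog name is purely positional, it can be decoded character by character. The following Python sketch assumes only the codes listed above; the names `parse_dialog_name` and `DialogName` are hypothetical and not part of any BAS tool.

```python
# Sketch: decode a VMII dialog name such as "g012a" according to the
# positional coding above. Names and types here are illustrative only.
from dataclasses import dataclass

LANGUAGES = {"g": "german", "e": "english", "j": "japanese",
             "m": "multilingual", "n": "noise"}
SCENARIOS = {"a": "main", "b": "information desk",
             "c": "remote maintenance", "d": "VM1", "n": "noise",
             "e": "end-to-end evaluation"}


@dataclass(frozen=True)
class DialogName:
    language: str
    number: int
    scenario: str


def parse_dialog_name(name: str) -> DialogName:
    """Decode the first 5 characters of a dialog or file name."""
    if len(name) < 5:
        raise ValueError(f"dialog name too short: {name!r}")
    lang, digits, scen = name[0], name[1:4], name[4]
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language code {lang!r}")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"dialogue number must be 3 digits: {digits!r}")
    if scen not in SCENARIOS:
        raise ValueError(f"unknown scenario code {scen!r}")
    return DialogName(LANGUAGES[lang], int(digits), SCENARIOS[scen])
```

Since every signal and BPF file name starts with the dialog name, the same function also works on full file names such as 'm129ach1_054_QXY_ENG.nis'; dialogs with scenario 'e' can then be filtered out, as recommended above.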
Turn names consist of the dialog name (characters 1-5) followed by:

6th character        : technical definition of recording
                       c(lose), r(oom), t(elephone)
7th character        : detailed description of recording means
                       (microphone)
                       telephone: m(obile), p(hone, analog),
                                  w(ireless), d(ect)
                       close:     h(eadset), n(eckband microphone),
                                  c(lip microphone)
                       room:      r(oom)
8th character        : channel coding [1..n]
9th character        : '_'
10th - 12th character: turn number starting with '000'
13th character       : '_'
14th - 16th character: speaker ID [A-Z][A-Z][A-Z]

The extensions code the contents of the file:

.nis  NIST SPHERE file
.trl  transliteration in Verbmobil 2 format (TR2)
.spr  speaker protocol file (7-bit ASCII)
.rpr  recording session protocol (7-bit ASCII)
.par  symbolic information in BAS Partitur Format (BPF)
.16   NIST SPHERE file, full-length recording headset/room mic
.al   ALAW file, full-length recording telephone channel

Each recording consists of a set of files like the following:

Type                         Name     Location
signals                      .nis     data//
recording session protocol   .rpr     data//
speaker protocol file        _        spr/
transliteration              .trl     trl/
Bas Partitur Files (BPF)     .par     par//

Notes:

+ Noise recordings (for example n001n on CD 20.0) are handled
  differently. For these files there is no speaker or recording
  protocol.
+ In multilingual recordings the language of the interpreter changes
  during the recording. Therefore the spoken language was coded into
  the signal file name by appending '_' and a language code to the
  body, e.g. 'm129ach1_054_QXY_ENG.nis'.
  Note that this is not done in the turn markers within the
  transliteration files, nor in the file names of the BPFs!
  Also note that the end-to-end evaluation recordings on volume 39.1
  (5th character of the dialog name is 'e') do not have such a coding
  (and should not be used anyway!).

------------------------- Signal file formats ----------------------

a. Physical signal characteristics

   Signal files containing room or close microphone data are coded in
   the following format: 16 bit, 16 kHz, mono, linearly coded, little
   endian (Intel byte order).
   Files containing telephone data are coded: 8 bit A-law, 8 kHz, mono.

b. Logical signal characteristics

   Each signal file contains one turn of a dialogue session of one
   speaker.

----------------- Nist-Header field definitions for VMII --------------

The signal files begin with a header following the NIST conventions.
It has a minimum size of 1024 bytes and consists of ASCII characters.
The format is as follows:

key                type    description         (possible) value(s)
------------------------------------------------------------------------
database_id        string  database            VERBMOBIL2
database_version   string  version             1.0
scenario_language  string  recorded language   [german|english|japanese|
                                                multi_english_japanese|
                                                multi_german_japanese|
                                                multi_english_german|
                                                multi_german|
                                                multi_japanese|
                                                multi_english|noise]
scenario_id        string  scenario            [main|information_desk|
                                                remote_maintenance|vm1|no]
dialog_id          string  # of dialog         000-999 *see below
speaker_id         string  speaker             AAA-ZZZ
recording_site     string  site                [CMU|LMU|ATR|UBN|UHH]
recording_medium   string  rec. medium         [telephone|room|close]
recmed_spec        string  spec. of            [mobile|wireless|analog|
                           mic/tel-type         neckband|dect|
                                                headset|clip]
sample_coding      string  coding              [alaw|ulaw|pcm]
sample_n_bytes     int     bytes/sample        1,2,...
channel_count      int     # of channels       1,2,...
sample_count       int     # of samples
sample_byte_format string  little/big endian,  [01|10|1]
                           one byte
sample_rate        int     samp. freq
scenario_date      string  logical date of     YYMMDD, 980101
                           recording

The remaining bytes are filled with spaces.
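The header layout above (a 'NIST_1A' line, the header size, then 'key -type value' lines up to 'end_head') and the physical signal characteristics can be combined into a small reader. This is a hypothetical sketch, not the NIST sphere package: it assumes '-i' marks an integer, '-r' a real and '-sN' a string, treats sample_byte_format '01' as little endian, and decodes A-law with the classic ITU-T G.711 expansion (an assumption here, since the README only names the coding).

```python
# Sketch: parse a VMII NIST SPHERE header and decode the samples after it.
# Function names are illustrative; they are not part of any NIST/BAS tool.
import struct


def parse_nist_header(data: bytes) -> tuple[dict, int]:
    """Return (fields, header_size) parsed from a NIST SPHERE header."""
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(lines[1])              # e.g. b"   1024"
    fields = {}
    for raw in data[:header_size].decode("ascii").split("\n")[2:]:
        line = raw.strip()
        if line == "end_head":
            break
        if not line:
            continue
        parts = line.split(maxsplit=2)       # key, -type, value
        key, typ = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if typ == "-i":
            fields[key] = int(value)
        elif typ == "-r":
            fields[key] = float(value)
        else:                                # "-sN": string of length N
            fields[key] = value
    return fields, header_size


def alaw_to_linear(a: int) -> int:
    """Expand one 8-bit A-law byte to a 16-bit linear value (G.711)."""
    a ^= 0x55
    t = (a & 0x0F) << 4
    seg = (a & 0x70) >> 4
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a & 0x80 else -t


def decode_samples(data: bytes) -> tuple[dict, list[int]]:
    """Return (header fields, samples as linear integers)."""
    fields, header_size = parse_nist_header(data)
    body = data[header_size:]
    coding = fields.get("sample_coding")
    if coding == "pcm" and fields.get("sample_n_bytes") == 2:
        order = "<" if fields.get("sample_byte_format") == "01" else ">"
        n = len(body) // 2
        samples = list(struct.unpack(f"{order}{n}h", body[: 2 * n]))
    elif coding == "alaw":
        samples = [alaw_to_linear(b) for b in body]
    else:
        raise ValueError(f"unsupported sample_coding: {coding!r}")
    # Trust sample_count if present; otherwise use everything after the header.
    return fields, samples[: fields.get("sample_count", len(samples))]
```

Reading a turn then amounts to `decode_samples(open(path, "rb").read())`; for the 16 kHz close/room channels the duration in seconds is `len(samples) / fields["sample_rate"]`.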
Example header:

NIST_1A
   1024
database_id -s10 VERBMOBIL2
database_version -s3 1.0
scenario_language -s6 german
scenario_id -s4 main
dialog_id -s3 010
speaker_id -s3 ABA
recording_site -s3 LMU
recording_medium -s5 close
recmed_spec -s8 neckband
sample_coding -s3 pcm
sample_n_bytes -i 2
channel_count -i 1
sample_count -i 124798
sample_byte_format -s2 01
sample_rate -i 16000
scenario_date -s6 980101
end_head

If necessary, software for extracting information from the header,
editing header information etc. can be obtained from the NIST FTP
server at: ftp://jaguar.ncsl.nist.gov/pub/ . The source package (for
Unix) is called "sphere_2.6a.tar.Z".

----------------------- Speaker protocol format --------------------

1)-3) obligatory speaker information:

 1) id                      use your own unambiguous speaker code,
                            upper case
 2) sex                     m,f
 3) date_of_birth           six digits: year month day;
                            15th Febr. 1972 = 720215

4)-15) optional speaker information:

 4) own_native_language     ISO 639-2 codes
 5) native_language_father
 6) native_language_mother
 7) primary_school          county/city of primary school years
 8) dialect                 region in which the speaker lived most of
                            the time, and of which the speaker's
                            accent/dialect is characteristic
 9) education               highest educational degree
10) profession
11) height                  with measuring unit, no blank: 172cm
12) weight                  with measuring unit, no blank: 65kg
13) smoker                  y/n/former
14) right_left_handed       r/l/ambi
15) comments                when present: at the end of the document,
                            line feeds are allowed

Tag and value are separated by a tab!
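Since tag and value are separated by a tab, a speaker protocol can be loaded into a dictionary in a few lines. This sketch assumes that lines without a tab continue the previous value (to cover the multi-line comments field); `parse_protocol` is a hypothetical name. The recording protocol format described further below states the same tab convention, so the same reader should apply there as well.

```python
# Sketch: read a *.spr (or *.rpr) protocol into a dict, assuming one
# "tag<TAB>value" pair per line; lines without a tab are treated as
# continuation lines of the previous value (e.g. multi-line comments).
def parse_protocol(text: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    last_tag = None
    for line in text.splitlines():
        if "\t" in line:
            tag, value = line.split("\t", 1)
            last_tag = tag.strip()
            fields[last_tag] = value.strip()
        elif line.strip() and last_tag is not None:
            # Continuation of a multi-line value such as "comments".
            fields[last_tag] += "\n" + line.strip()
    return fields
```

The files are 7-bit ASCII, so `open(path, encoding="ascii").read()` should be sufficient as input.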
Example:

id                      FGR
sex                     f
date_of_birth           541215
own_native_language     deu
native_language_father  deu
native_language_mother  deu
primary_school          K"oln
dialect                 Rhein
education               Universit"at
profession              Lehrer
height                  168cm
weight                  58kg
smoker                  n
right_left_handed       r

-----------------------Recording protocol format ----------------------

 1) session_no              XXX, digits
 2) dialogue_name           dialogue directory name: g076a
 3) recording_date          XxYyZz; year month day -> 971512
 4) scenario_date           logical date (scenario date)
 5) recording_by            name of the person who carried out the
                            recording
 6) recording_site          location of recording, e.g. LMU, CMU, ATR,
                            UHH, UBN
 7) scenario_id             scenario: a (main), b (information desk),
                            c (remote maintenance), d (VM1),
                            e (end-to-end)
 8) no_speakers             2 to 9
 9) speaker1_id             use your own unambiguous code,
                            3 upper case letters
10) speaker2_id
11) speaker3_id
12) speaker1_language       XY: X - language spoken during recording -
                            g(erman), e(nglish), j(apanese);
                            Y - 0 (native), 1-3 (non-native);
                            in multilingual dialogues the interpreter
                            speaks two languages in a dialogue,
                            e.g. g0,e1
13) speaker2_language
14) speaker3_language
15) speaker1_recmed_spec    XYZ: XY used microphone - h(eadset),
                            n(eckband mic), c(lip mic), r(oom);
                            Z used telephone - m(obile telephone),
                            p(hone, analog), w(ireless), d(ect)
16) speaker2_recmed_spec
17) speaker3_recmed_spec
18) speaker1_micbrand       comma-separated list of the types of
                            microphones used; use underscores within
                            names, e.g. beyer_dynamics_115
19) speaker2_micbrand
20) speaker3_micbrand
21) comments                when present: at the end; line feeds allowed

optional only: 1), 4) and 21)
12)-14): apply only if the dialogue is multilingual

Tag and value are separated by a tab!
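The compact codes in speakerN_language and speakerN_recmed_spec can be expanded with lookup tables built from the definitions above. A minimal sketch, assuming comma-separated language entries of one letter plus one digit, and that microphone and telephone letters can be told apart because the two sets ({h,n,c,r} and {m,p,w,d}) do not overlap; the function names are hypothetical.

```python
# Sketch: expand speakerN_language and speakerN_recmed_spec values of a
# recording protocol. Tables follow the definitions above; names are
# illustrative only.
LANG = {"g": "german", "e": "english", "j": "japanese"}
MICS = {"h": "headset", "n": "neckband mic", "c": "clip mic", "r": "room"}
PHONES = {"m": "mobile telephone", "p": "analog phone",
          "w": "wireless", "d": "dect"}


def parse_language(value: str) -> list[tuple[str, int]]:
    """'e1,g0' -> [('english', 1), ('german', 0)]; level 0 = native."""
    result = []
    for code in value.split(","):
        code = code.strip()
        result.append((LANG[code[0]], int(code[1:])))
    return result


def parse_recmed_spec(value: str) -> tuple[list[str], list[str]]:
    """'rnp' -> (['room', 'neckband mic'], ['analog phone'])."""
    unknown = set(value) - set(MICS) - set(PHONES)
    if unknown:
        raise ValueError(f"unknown recmed_spec letters: {sorted(unknown)}")
    mics = [MICS[c] for c in value if c in MICS]
    phones = [PHONES[c] for c in value if c in PHONES]
    return mics, phones
```

Applied to the examples that follow, 'rnp' yields a room and a neckband microphone plus an analog phone, and the interpreter's 'e1,g0' yields non-native English and native German.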
Example1 (monolingual):

session_no            3
dialogue_name         g012a
recording_date        970707
scenario_date         970601
recording_by          DO
recording_site        LMU
scenario_id           a
no_speakers           2
speaker1_id           ABA
speaker2_id           ABD
speaker1_recmed_spec  rnp
speaker2_recmed_spec  rnw
speaker1_micbrand     beyer_dynamic_mce_10, beyer_dynamic_nem_191
speaker2_micbrand     beyer_dynamic_mce_10, beyer_dynamic_nem_191

Example2 (multilingual):

dialogue_name         m888a
recording_date        981001
scenario_date         981001
recording_site        UHH
scenario_id           a
no_speakers           3
speaker1_id           QZX
speaker2_id           HCB
speaker3_id           HBK
speaker1_language     e0
speaker2_language     e1,g0
speaker3_language     g0
speaker1_recmed_spec  h
speaker2_recmed_spec  h
speaker3_recmed_spec  h
speaker1_micbrand     beyer_dynamic_nem_194
speaker2_micbrand     beyer_dynamic_nem_194
speaker3_micbrand     beyer_dynamic_nem_194

----------------------------------emuDB---------------------------------

Starting in January 2013, the BAS edition was extended by an emuR DB,
which was later (2016) added to the BAS CLARIN Repository; if the root
directory of this corpus (where this README resides) is called
'VM2_emuDB', then you are dealing with this emuDB variant.
This emuDB comprises only the cut turns of the headset mic channel
(*.wav) from the BAS Edition X.1 and all BPF annotations in the form of
emuR-compatible *_annot.json files. That way, Emu queries can be
performed on the cut turn signal files and their corresponding
multiple annotations. The room microphone and phone channels, where
they exist, were copied as *.wav into the bundle of the headset mic;
that way these channels are available as such, but they cannot be
queried in the emuDB. Note that some dialogs consist only of room mic
recordings; in this case the annotation is based on the room mic
channel, and therefore bundles with the proper VM2 name were created.
Later (2020) the emuDB was augmented by the full-length headset mic
recordings (one channel per speaker), the full Verbmobil
transliteration *.trl, the *.mar file with the segmentation into turns,
and the recording protocol *.rpr; all these files are stored in the
emuDB session directory but are not part of the emuDB, i.e. it is not
possible to query these files.

Example file set for dialog 'g001a':

g001a_ses/g001ac.mar  : segmentation in turns: begin_sample,
                        end_sample, turn_name
g001a_ses/g001ac.trl  : VM2 transliteration (see doc/trllex_e_html/)
g001a_ses/g001a.rpr   : VM2 recording protocol
g001acn1_AAJ.wav      : full-length recording speaker 1
                        (speaker ID is AAJ)
g001acn2_AAK.wav      : full-length recording speaker 2
                        (speaker ID is AAK)

-------------------------------------------------------------------------
General known errors across all recording volumes

The SUP tier in the BAS Partitur Files does not handle the following
case gracefully: if a non-word item (= breath, pause or noise) is
passively superposed right after an actively superimposing word, the
SUP tier is not reliable. There might be entries with an active and a
passive superposition in the same SUP line. The reason for this is
that the BPF concept does not allow the handling of non-word items,
because they cannot be referred to from other tiers.
The error described above occurs in the following turns:

VM21.1/par/g203a/g203acn2_031_AHJ.par
VM21.1/par/g203a/g203atp2_031_AHJ.par
VM21.1/par/g215a/g215acn1_046_AHP.par
VM21.1/par/g215a/g215atm1_046_AHP.par
VM23.1/par/e028a/e028ach1_040_PNP.par
VM24.1/par/g249a/g249acn2_017_AIG.par
VM25.1/par/j006a/j006ach1_196_BAD.par
VM25.1/par/j010a/j010ach1_235_BAH.par
VM27.1/par/j034a/j034ach1_010_BAZ.par
VM29.1/par/g415a/g415acn1_037_ALK.par
VM29.1/par/g415a/g415acn2_052_AKI.par
VM29.1/par/g415a/g415atm1_037_ALK.par
VM29.1/par/g415a/g415atm2_052_AKI.par
VM30.1/par/g367a/g367acn1_006_AKS.par
VM30.1/par/g367a/g367atm1_006_AKS.par
VM30.1/par/g370a/g370acn2_020_AKT.par
VM30.1/par/g370a/g370atm2_020_AKT.par
VM31.1/par/e003a/e003ach2_059_ANV.par
VM38.1/par/g376a/g376acn2_035_AKX.par
VM38.1/par/g376a/g376atm2_035_AKX.par
VM38.1/par/g378a/g378acn1_084_AKY.par
VM38.1/par/g378a/g378atm1_084_AKY.par
VM39.1/par/g333a/g333acn1_030_AJZ.par
VM39.1/par/g333a/g333atm1_030_AJZ.par
VM44.1/par/j155a/j155ach2_055_PBQ.par
VM49.1/par/g446a/g446acn1_029_AMM.par
VM49.1/par/g446a/g446atm1_029_AMM.par
VM49.1/par/g448a/g448acn2_070_AMN.par
VM49.1/par/g448a/g448atm2_070_AMN.par
VM49.1/par/g450a/g450acn1_054_AMO.par
VM49.1/par/g450a/g450atm1_054_AMO.par
VM49.1/par/g624b/g624bch1_055_BHM.par

The end-to-end evaluation recordings on volume 39.1 (5th character of
the dialog name is 'e') do not have the speaker language encoded in
the signal file name, unlike regular multilingual VM recordings.
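The affected BPF files listed above are named after their turns, so recording type, channel, turn number and speaker can be read off the file name using the turn naming conventions. A minimal sketch; the regular expression, lookup tables and `parse_turn` are assumptions for illustration, not BAS tooling. It also accepts the optional language suffix of multilingual signal files (e.g. '_ENG').

```python
# Sketch: decode a VMII turn name (or a path ending in one) into its
# fields, following the turn naming conventions. Names are illustrative.
import re
from pathlib import PurePosixPath

TURN_RE = re.compile(
    r"(?P<dialog>[gejmn]\d{3}[abcden])"          # characters 1-5
    r"(?P<rec>[crt])(?P<means>[mpwdhncr])"       # characters 6-7
    r"(?P<channel>\d)"                           # character 8
    r"_(?P<turn>\d{3})_(?P<speaker>[A-Z]{3})"    # characters 9-16
    r"(?:_(?P<lang>[A-Z]+))?"                    # optional language suffix
)
RECORDING = {"c": "close", "r": "room", "t": "telephone"}
MEANS = {
    "t": {"m": "mobile", "p": "phone, analog", "w": "wireless", "d": "dect"},
    "c": {"h": "headset", "n": "neckband microphone", "c": "clip microphone"},
    "r": {"r": "room"},
}


def parse_turn(path: str) -> dict:
    stem = PurePosixPath(path).stem              # drop directories and extension
    m = TURN_RE.fullmatch(stem)
    if m is None:
        raise ValueError(f"not a VMII turn name: {stem!r}")
    means = MEANS[m["rec"]].get(m["means"])
    if means is None:
        raise ValueError(f"means {m['means']!r} invalid for {RECORDING[m['rec']]}")
    return {
        "dialog": m["dialog"],
        "recording": RECORDING[m["rec"]],
        "means": means,
        "channel": int(m["channel"]),
        "turn": int(m["turn"]),
        "speaker": m["speaker"],
        "language": m["lang"],                   # None without a suffix
    }
```

For example, the second entry of the list decodes to dialog g203a, telephone channel 2 (analog phone), turn 31, speaker AHJ.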
The following speakers (mostly from the end-to-end evaluation) do not
have speaker protocols (*.spr):

HBM HDF HKB HKD HKK HKL HKM HKN HKO QYY QZN

-------------------------------------------------------------------------
Additional Documentation

Copies of the original Verbmobil Memos, TechDocs and Reports relevant
to the corpus are stored in the subdir doc:

TechDok-34-95:   Automatic Conversion of American Dialog Data into VM
                 Compatible Format
TechDok-36-95:   Transliterationslexikon (VERBMOBIL I)
Memo-90-95:      Partiturformat für die Darstellung unterschiedlicher
                 Repräsentationsebenen von gesprochener Sprache
Memo-95-95:      Das Münchener AUtomatische Segmentationssystem (MAUS)
Memo-96-95:      Regelsystem zur Generierung von Aussprachevarianten
Memo-111-96:     Aussprachevarianten in der Verbmobil-Transliteration -
                 Regeln zur konsistenteren Verschriftung
TechDok-56-97:   Transliteration spontansprachlicher Daten - Lexikon der
                 Transliterationskonventionen - VERBMOBIL II
Memo-128-97:     The technical Setup for Dialog Recordings in VMII and
                 Problems caused by Mobile Phones
Memo-129-97:     The conventions for phonetic transcription and
                 segmentation of German used for the Munich Verbmobil
                 corpus
Memo-131-97:     File Names, Formats and Structures in VERBMOBIL II
VMReport-226-98: Dialogue Acts in VERBMOBIL II - Second Edition
VMTechDok-71-99: VMII Szenario A und B: Instruktionen für alle
                 Sprachstellungen

Copies of some publications can be found in the subdir doc:

partitur.pdf : First publication of BAS Partitur Format (BPF)

-------------------------------------------------------------------------
Main History (only events that concern all VM volumes)

...

12.03.98 : Removed the characters '=%*_' from the German dictionary
           vm_ger.lex and from the ORT tier in the BAS Partitur files.
16.08.00 : Converted VMI signal files from Phondat2 into NIST and
           placed them into directory /DATA. File names were adapted
           to VMII naming conventions.
           Update of all BPFs, now following the VMII naming
           conventions; old BPFs are retained for backward
           compatibility; tier TRL is no longer contained in the new
           BPFs (TR2 is now used throughout the whole VM corpus!)
30.05.01 : New edition of all BAS Partitur Files (BPF) based on the
           latest error update. This includes a completely new MAUS
           annotation. Furthermore, additional previously unpublished
           tiers were added to the distribution, such as Syntax Trees,
           Dialogue Act Annotation, Syntactic-prosodic Labeling,
           Prosodic Labeling and Part-of-Speech Tagging.
08.06.01 : Edition of the VM Bonus CDROM (VMBONUS) with additional data
           and documentation that do not fit into the regular VM
           volumes; edition of the VM Lexicon Database of the
           University of Bielefeld (VMLEX).
10.07.01 : Tiers LBP and LBG added to the BAS Partitur Files
30.01.03 : vm_ger.lex completely rebuilt: the German pronunciation
           dictionary of VM I+II now contains only the word items as
           they appear in the ORT tier of the BPF files. Also, the
           transcription was unified to a more consistent concept of a
           'canonical form'. For instance:
           - /R/ and /r/ were unified to /r/, because it was not clear
             how these two allophones were used by different
             transcribers
           - /a:6/ was replaced by /ar/
19.08.03 : New edition of all BAS Partitur Files (BPF) of German signal
           data based on the latest error update: some minor bugs in
           the POS, LMA and SAP tiers were fixed. Completely redone
           pronunciation list for German (vm_ger.lex) according to the
           new 'Transliteration Conventions for Canonical German'
           (www.bas.uni-muenchen.de/Bas/BasGermanPronunciation/).
           Based on the new pronunciation, the following tiers in the
           BPF files have been recalculated: KAN, MAU
20.08.03 : New tier TLN integrated: the TLN tier contains the
           translation of the recorded utterance. The translations
           were produced manually by the University of Tuebingen,
           Prof. Hinrichs.
           The integrated data are also stored on the volume VMBONUS.
           Please note that the orthographic representation of Japanese
           (romaji) in these translations is in the original form used
           in the original Japanese pronunciation list
           (vm_jap_org.lex). However, it was never checked whether
           these two data sets (lexicon and translations) are in fact
           compatible. Use with caution!
           For details about the TLN tier please refer to the BPF
           documentation: www.bas.uni-muenchen.de/Bas/BasFormatseng.html
04.09.12 : Changed language descriptions in speaker protocols (*.spr)
           to ISO 639-2 codes.
17.01.13 : CLARIN Repo Version 1
10.06.13 : CLARIN Repo Version 2: found ISO 8859 characters in BPF
           files -> converted to UTF-8
10.06.16 : CLARIN Repo Version 3:
           - bug fix in *.par and *.ags (SUP tier) of the following
             turns:
             e011ach2_069_NMW
             e011ach1_070_ANV
             e011ach1_071_ANV
             e011ach1_072_ANV
             e011ach1_073_ANV
             g335acn2_021_AKA, g335atm2_021_AKA
             g335acn1_022_AJZ, g335atm1_022_AJZ
             e087ach2_064_PNP
             e087ach1_065_SMG
             g514arr1_018_BFD
             g514arr2_019_BFH
             g514arr2_020_BFH
             j155ach2_055_PBQ
             j155ach1_056_BBS
           - extended the corpus by an emuDB component to enable emuR
             usage; this includes the addition of *.wav files parallel
             to *.nis, and the addition of the complete emuDB structure
             (/vdata/BAS/VM2_total_emuDB/VM2_emuDB/). This emuDB is
             *not* part of the CD-R/DVD-R distribution but can be
             accessed only via the BAS CLARIN Repository
             (http://hdl.handle.net/11858/00-1779-0000-0006-BF00-E).
             The NIST SPHERE signal files were removed from the BAS
             CLARIN Repository.
           - the structure of the BAS CLARIN Repo version changed with
             this version 3 as follows: in the German part (dialogs
             g?????) all multi-channel recordings (if available) were
             pooled in the close-microphone bundle, i.e. the former
             bundles for the room microphone and telephone channels are
             no longer present. The emuDB treats the close-microphone
             channel as the primary channel; the other channels (if
             present) are included in the bundle directory but not
             loaded in the EMU-SDMS.
18.07.20 : CLARIN Repo Version: added documentation /doc, speaker
           metadata /spr and this file to the root dir VM2_emuDB;
           added headset mic recordings of the complete dialog to the
           session directories, together with transliteration files
           (*.trl), recording protocols *.rpr and turn marker files
           (*.mar, = the turn segmentation that resulted in the
           turn-based signal files of the emuDB); these full-length
           recordings (two channels, one per speaker) are not part of
           the emuDB (= cannot be queried), but are provided as
           additional material for researchers interested in the total
           dialog structure.
           Added missing room mic/telephone channels to headset mic
           bundles.
           Added missing recording protocols *.rpr to CLARINDocu.zip,2.

Downloads:
ftp://ftp.bas.uni-muenchen.de/pub/BAS/VM/partitur.tgz
ftp://ftp.bas.uni-muenchen.de/pub/BAS/VM/vm_ger.lex