                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS 

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


         COPYRIGHT University of Munich 1998. All rights reserved.   
    This corpus and software may not be disseminated further - not even
      partly - without written permission of the copyright holders.  

                      Additional Copyright Holders

----------------------------------------------------------------------

VERBMOBIL II Dialog Database  (BAS Edition)

12.01.98 / 18.07.2020 (CLARIN Repo Version 3)

----------------------------------------------------------------------

The BAS edition (X.1.X) of the VERBMOBIL II (VMII) speech data collection 
contains the same data, but in a format adapted to that of the VERBMOBIL I
collection to simplify joint use of the whole database. Note the following
deviations from the original VMII edition (X.0.X):

- Signals cut into turn-length signals
- Transliteration and BAS Partitur files added on CD
- Additional Documentation of Formats

445 speakers participated in 810 recordings (not counting the emotional 
speech recordings on volumes 63-65). The total VMII corpus amounts to 
17.6 GB of data containing 58961 conversational turns distributed over 
39 CD-Rs (plus 3 CD-Rs of emotional speech).

Differences from Verbmobil I (VMI):

- multilingual dialogue recordings (volumes 32, 46, 47, 51, 52, 55, 56, 57, 58, 59)
- up to 3 synchronous channels per speaker: headset, desktop, phone (one partner landline, 
  one GSM)
- extended domain: scheduling, travel, leisure-time planning
- no push-to-talk: crosstalk is possible

A detailed description of the Verbmobil projects can be found in the 
book:
Wahlster W (ed.): Verbmobil: Foundations of Speech-to-Speech Translation; 
Berlin, Heidelberg, Germany: Springer, 2000.
http://www.springer.com/computer/ai/book/978-3-540-67783-3

------------------- Contents of this file ------------------------

   CD directory structure
   Naming conventions
   Signal file formats
   Nist-Header field definitions for VMII
   Marker file format
   Speaker protocol format
   Recording protocol format
   emuDB
   General known errors across all recording volumes
   Additional Documentation
   Main History


----------------- CD directory structure --------------------------


Dialogues are located in the "data" directory, speaker information 
files in the "spr" directory.

Each dialogue directory is named after its dialogue, i.e. the first 5 
characters of the dialogue's signal file names (for example: e013a).

The directory "doc" contains additional documents about the 
corpus recording and annotation as well as pronunciation dictionaries
for all three languages:

trllex_d.ps|pdf		: detailed description of annotation TR2 (German)
trllex_e_html/          : detailed description of annotation TR2 (English)
vm_eng.lex		: English pronunciation dictionary
vm_ger.lex		: German pronunciation dictionary
vm_jap.lex		: Japanese pronunciation dictionary
vm2.map			: mapping of BPF tiers to each utterance
			  this file is useful to find all utterances that
			  contain a certain BPF tier.
dialogsperspeaker.txt	: number of recorded dialogs per speaker

partitur.pdf|ps		: first publication about BPF (German)
VM*.pdf|ps              : original VERBMOBIL Memos, TechDocs, Reports 
                          (see Section 'Additional Documentation' below)

pardoc/  		: copy of the BAS Partitur File definition (WWW)
sets/			: definition of training, development and test sets

----------------------- Naming conventions ----------------------


Dialog names are coded as follows:
(see also doc/VMMemo-131-97.ps for a more detailed description)

1st character: 
      <lang> [g,e,j,m,n] recorded language
      g(erman), e(nglish), j(apanese), m(ultilingual), n(oise)

2nd to 4th character:
      dialogue number, e.g. 001

5th character:
      scenario
      a(main), b(information desk), c(remote maintenance), 
      d(VM1), n(noise), e(end-to-end evaluation, do not use!)

Turn names consist of the dialog name (char 1-5) and the following:

6th character:
      technical definition of recording
      c(lose), r(oom), t(elephone)   

7th character:

      detailed description of recording means (microphone)
      telephone:
        m(obile), p(hone,analog), w(ireless), d(ect)

      close:
        h(eadset), n(eckband microphone), c(lip microphone)

      room:
        r(oom)

8th character:
      channel coding
      [1..n]        
9th character: '_'

10th - 12th character:
      turn number starting with '000'

13th character: '_'       

14th - 16th character:
      <sp_id> speaker ID [A-Z][A-Z][A-Z]
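The character positions above are regular enough to check mechanically. The following Python sketch is an illustration only, not a BAS tool: the regular expression, the `TurnName` class and `parse_turn_name` are hypothetical names built from the list above, and the sketch covers neither noise recordings nor the '_<LANG>' suffix of multilingual signal files described in the notes further down.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern assembled from the character positions listed above.
TURN_RE = re.compile(
    r"(?P<lang>[gejmn])"         # 1st: recorded language
    r"(?P<number>\d{3})"         # 2nd-4th: dialogue number
    r"(?P<scenario>[abcdne])"    # 5th: scenario
    r"(?P<medium>[crt])"         # 6th: close / room / telephone
    r"(?P<means>[hncrmpwd])"     # 7th: microphone or telephone type
    r"(?P<channel>\d)"           # 8th: channel coding
    r"_(?P<turn>\d{3})"          # 9th '_', 10th-12th: turn number
    r"_(?P<speaker>[A-Z]{3})"    # 13th '_', 14th-16th: speaker ID
)


@dataclass(frozen=True)
class TurnName:
    lang: str
    number: str
    scenario: str
    medium: str
    means: str
    channel: int
    turn: int
    speaker: str

    @property
    def dialog(self) -> str:
        """Characters 1-5, i.e. the dialogue name such as 'g203a'."""
        return self.lang + self.number + self.scenario


def parse_turn_name(name: str) -> TurnName:
    """Split a 16-character turn name into its coded fields."""
    m = TURN_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a plain VMII turn name: {name!r}")
    g = m.groupdict()
    return TurnName(g["lang"], g["number"], g["scenario"], g["medium"],
                    g["means"], int(g["channel"]), int(g["turn"]), g["speaker"])
```

For example, `parse_turn_name("g203acn2_031_AHJ")` yields dialogue 'g203a', close-talking neckband channel 2, turn 31, speaker AHJ. The pattern does not check that the 7th character fits the 6th (e.g. that 'm' only occurs with 't').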


The extensions code the contents of the file:
       .nis  NIST SPHERE file
       .trl  transliteration in Verbmobil 2 format (TR2)
       .spr  speaker protocol file (7-bit ASCII)
       .rpr  recording session protocol (7-bit ASCII)
       .par  symbolic information in BAS Partitur Format (BPF)
       .16   NIST SPHERE file, full-length recording headset/room mic
       .al   ALAW file, full-length recording telephone channel

Each recording consists of a set of files like the following:

Type				Name			Location

signals				<turn>.nis		data/<dialog>/
recording session protocol	<dialog>.rpr		data/<dialog>/
speaker protocol file		<lang>_<sp_id>		spr/
transliteration			<char 1-6 of turn>.trl	trl/
BAS Partitur Files (BPF)	<turn>.par		par/<dialog>/
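Given a turn name, the locations in the table above follow by string slicing. A minimal Python sketch, assuming the CD edition layout shown in the table and taking `<lang>` to be the first character of the turn name; `turn_files` is a hypothetical helper. The speaker protocol name is built exactly as the table states it, without an extension, so check it against your copy.

```python
from pathlib import PurePosixPath
from typing import Dict


def turn_files(turn: str) -> Dict[str, PurePosixPath]:
    """Locations of the files belonging to one turn (CD edition layout)."""
    dialog = turn[:5]    # characters 1-5: dialogue name
    lang = turn[0]       # character 1: recorded language (assumed <lang>)
    sp_id = turn[13:16]  # characters 14-16: speaker ID
    return {
        "signal": PurePosixPath("data", dialog, f"{turn}.nis"),
        "recording_protocol": PurePosixPath("data", dialog, f"{dialog}.rpr"),
        "speaker_protocol": PurePosixPath("spr", f"{lang}_{sp_id}"),
        "transliteration": PurePosixPath("trl", f"{turn[:6]}.trl"),
        "partitur": PurePosixPath("par", dialog, f"{turn}.par"),
    }
```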
	

Notes:
+ Noise recordings (for example n001n on CD 20.0) are handled differently.
  For those files there is no speaker and recording protocol.
+ In multilingual recordings the language of the interpreter changes
  during the recording. Therefore the spoken language was coded into the 
  signal file name by appending '_<LANG>' to the body, e.g. 
  'm129ach1_054_QXY_ENG.nis'.
  Note that this is done neither in the turn markers within the
  transliteration files nor in the file names of the BPFs!
  Also note that the end-to-end evaluation recordings on volume 39.1 
  (5th char in dialog name is 'e') do not have such a coding 
  (and should not be used anyway!).
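To get from a multilingual signal file to its BPF, the '_<LANG>' suffix has to be dropped. A Python sketch, assuming the suffix is always '_' followed by three upper-case letters as in '_ENG' (the full list of codes is not given in this README); `split_signal_stem` and `bpf_name_for_signal` are hypothetical names. Names without a suffix, such as those of the end-to-end evaluation recordings, pass through unchanged.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumed form: a 16-character turn name, '_', three capital letters.
LANG_SUFFIX_RE = re.compile(r"(?P<turn>.{16})_(?P<lang>[A-Z]{3})")


def split_signal_stem(stem: str) -> Tuple[str, Optional[str]]:
    """Return (turn name, language code or None) for a signal file stem."""
    m = LANG_SUFFIX_RE.fullmatch(stem)
    if m is None:
        return stem, None
    return m.group("turn"), m.group("lang")


def bpf_name_for_signal(signal_path: str) -> str:
    """BPF file name for a signal file; BPF names never carry '_<LANG>'."""
    turn, _lang = split_signal_stem(PurePosixPath(signal_path).stem)
    return f"{turn}.par"
```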

------------------------- Signal file formats ----------------------

 a. Physical signal characteristics

Signal files containing room or close microphone data are coded 
in the following format: 16 bit, 16 kHz, mono, linearly coded, 
little endian (Intel byte order).
Files containing telephone data are coded: 8 bit A-law, 8 kHz, mono.

 b. Logical signal characteristics

Each signal file contains one turn of one speaker in a dialogue 
session.
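Once the header has been skipped, the sample bytes can be turned into integers as below. This Python sketch is an illustration under assumptions: the function names are hypothetical, and the A-law decoder follows the widely used G.711 reference algorithm (as in Sun's public `g711.c`), written out by hand because the standard library's `audioop` module was removed in Python 3.13.

```python
import struct
from typing import List


def pcm16le_to_ints(body: bytes) -> List[int]:
    """Room/close mic data: 16 bit signed linear PCM, little endian."""
    n = len(body) // 2
    return list(struct.unpack(f"<{n}h", body[: 2 * n]))


def alaw_to_linear(code: int) -> int:
    """Decode one A-law byte to a linear value in the 16-bit range."""
    code ^= 0x55                      # undo the alternate-bit inversion
    mantissa = (code & 0x0F) << 4     # 4 quantization bits
    segment = (code & 0x70) >> 4      # 3 segment (exponent) bits
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    return value if code & 0x80 else -value  # bit 7 set means positive


def alaw_bytes_to_ints(body: bytes) -> List[int]:
    """Telephone data: 8 bit A-law, one byte per sample."""
    return [alaw_to_linear(b) for b in body]
```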


----------------- Nist-Header field definitions for VMII --------------


The signal files begin with a header following the NIST conventions.
It has a minimum size of 1024 bytes and consists of ASCII characters.
The format is as follows:


key                     type    description     (possible) value(s)
------------------------------------------------------------------------
database_id             string  database        VERBMOBIL2
database_version        string  version         1.0
scenario_language       string  recorded        [german|english|japanese|
                                language         multi_english_japanese|
                                                 multi_german_japanese|
                                                 multi_english_german|
                                                 multi_german|
                                                 multi_japanese|
                                                 multi_english|noise]
scenario_id             string  scenario        [main|information_desk|
                                                 remote_maintenance|vm1|no]
dialog_id               string  # of dialog     000-999  *see below
speaker_id              string  speaker         AAA-ZZZ
recording_site          string  site            [CMU|LMU|ATR|UBN|UHH]
recording_medium        string  rec. medium     [telephone|room|close]
recmed_spec             string  spec. of        [mobile|wireless|analog|
                                mic/tel-type     neckband|dect|
                                                 headset|clip]
sample_coding           string  coding          [alaw|ulaw|pcm]
sample_n_bytes          int     bytes/sample    1,2,...
channel_count           int     # of channels   1,2,...
sample_count            int     # of samples
sample_byte_format      string  little/big      [01|10|1]
                                endian, one byte
sample_rate             int     samp. freq      [16000|8000]
scenario_date           string  logical date    YYMMDD, e.g. 980101
                                of recording


The remaining bytes are filled with spaces.

Example header: 

NIST_1A
   1024
database_id -s10 VERBMOBIL2
database_version -s3 1.0
scenario_language -s6 german
scenario_id -s4 main
dialog_id -s3 010
speaker_id -s3 ABA
recording_site -s3 LMU
recording_medium -s5 close
recmed_spec -s8 neckband
sample_coding -s3 pcm
sample_n_bytes -i 2
channel_count -i 1
sample_count -i 124798
sample_byte_format -s2 01
sample_rate -i 16000
scenario_date -s6 980101
end_head
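The header is plain text, so it can also be read without the NIST tools. A minimal Python sketch, assuming only the `-i`, `-r` and `-sN` field types (only `-i` and `-sN` occur in the example above) and line-feed line endings; `parse_nist_header` and `read_nist` are hypothetical names, not part of the SPHERE package.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]


def parse_nist_header(data: bytes) -> Dict[str, Value]:
    """Parse the ASCII header of a .nis file into a dict.

    '-i' fields become int, '-r' fields float, and '-sN' fields are cut
    to their declared length N. The size from line 2 is stored under
    'header_size'.
    """
    lines = data.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    fields: Dict[str, Value] = {"header_size": int(lines[1])}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            return fields
        if not line:
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"malformed header line: {line!r}")
        key, kind, value = parts
        if kind == "-i":
            fields[key] = int(value)
        elif kind == "-r":
            fields[key] = float(value)
        elif kind.startswith("-s"):
            fields[key] = value[: int(kind[2:])]
        else:
            raise ValueError(f"unknown field type {kind!r} in {line!r}")
    raise ValueError("no end_head line found")


def read_nist(path: str) -> Tuple[Dict[str, Value], bytes]:
    """Return (header fields, raw sample bytes) of one SPHERE file."""
    with open(path, "rb") as f:
        raw = f.read()
    # Line 2 holds the total header size (at least 1024 bytes).
    size = int(raw[:1024].split(b"\n")[1].decode("ascii"))
    return parse_nist_header(raw[:size]), raw[size:]
```

The fields `sample_coding`, `sample_n_bytes` and `sample_byte_format` then tell how to interpret the returned sample bytes (for the example header: `pcm`, 2 bytes, `01` = little endian).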


If necessary, software for extracting information from the header, 
editing header information etc. can be obtained from the NIST 
FTP server at: 
        ftp://jaguar.ncsl.nist.gov/pub/

The source package (for Unix) is called "sphere_2.6a.tar.Z".
 

----------------------- Speaker protocol format --------------------

1)-3) obligatory speaker information:

1)   id                        use your own unambiguous speaker 
                               code, upper case
2)   sex                       m,f
3)   date_of_birth             six digits: year month day (YYMMDD); 
                               15th Feb. 1972 = 720215

4)-15) optional speaker information:

4)   own_native_language       ISO 639-2 codes
5)   native_language_father
6)   native_language_mother
7)   primary_school            county/city where the speaker attended
                               primary school
8)   dialect                   region in which the speaker lived most 
                               of the time and whose accent/dialect
                               is characteristic of the speaker
9)   education                 highest educational degree
10)  profession
11)  height                    with measuring unit, no blank: 172cm
12)  weight                    with measuring unit, no blank: 65kg
13)  smoker                    y/n/former 
14)  right_left_handed         r/l/ambi
15)  comments                  when present: at the end of the document,
                               line feeds are allowed
                          
Tag and value are separated by a tab!

Example:

id	FGR
sex	f
date_of_birth	541215
own_native_language	deu
native_language_father	deu
native_language_mother	deu
primary_school	K"oln
dialect	Rhein
education	Universit"at
profession	Lehrer
height	168cm
weight	58kg
smoker	n
right_left_handed	r
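Because each line is a tag, a tab and a value, a protocol file maps naturally onto a dictionary. A Python sketch with a hypothetical `parse_protocol` helper; it treats every line after `comments` as part of the comment (comments may contain line feeds) and, as an added assumption for robustness, falls back to splitting on whitespace when a line contains no tab. The recording protocol described in the next section uses the same tag/value layout.

```python
from typing import Dict, List, Optional


def parse_protocol(text: str) -> Dict[str, str]:
    """Parse a tag/value protocol (*.spr or *.rpr) into a dict."""
    fields: Dict[str, str] = {}
    comment: Optional[List[str]] = None
    for line in text.splitlines():
        if comment is not None:        # everything after 'comments'
            comment.append(line)
            continue
        if not line.strip():
            continue
        if "\t" in line:
            tag, _, value = line.partition("\t")
        else:                          # assumed fallback: whitespace
            tag, _, value = line.strip().partition(" ")
        tag, value = tag.strip(), value.strip()
        if tag == "comments":
            comment = [value]
        else:
            fields[tag] = value
    if comment is not None:
        fields["comments"] = "\n".join(comment).strip()
    return fields
```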


-----------------------Recording protocol format ----------------------


1)  session_no            XXX, digits 
2)  dialogue_name         dialogue directory name: g076a
3)  recording_date        YYMMDD: year month day, e.g. 971215
4)  scenario_date         logical_date (scenario-date)
5)  recording_by          name of person who carried out the recording
6)  recording_site        location of recording, e.g. LMU,CMU,ATR,UHH,UBN
7)  scenario_id           scenario: a (main), b (information desk), 
                          c (remote maintenance), d(VM1), e(end-to-end)
8)  no_speakers           2 to 9
9)  speaker1_id           use your own unambiguous code, 3 upper 
                          case letters
10) speaker2_id
11) speaker3_id
12) speaker1_language     XY: X - language spoken during recording - 
                          g(erman), e(nglish), j(apanese); 
                          Y - 0 (native), 1-3 (non-native);
                          in multilingual dialogues the interpreter
                          speaks two languages, e.g. g0,e1 
13) speaker2_language
14) speaker3_language
15) speaker1_recmed_spec  XYZ: XY used microphone - h(headset), 
                          n(eckband mic),c(lip mic), r(oom), 
                          Z used telephone - m(obile telephone), 
                          p(hone, analog), w(ireless), d(ect)
16) speaker2_recmed_spec
17) speaker3_recmed_spec
18) speaker1_micbrand     comma separated list of types of used 
                          microphones; use underscores within names,
                          e.g. beyer_dynamics_115
19) speaker2_micbrand
20) speaker3_micbrand
21) comments              when present: at the end; line feeds allowed
  

Only 1), 4) and 21) are optional.
12)-14) apply only if the dialogue is multilingual.


Tag and value are separated by a tab!

Example1 (monolingual):

session_no	3
dialogue_name	g012a
recording_date	970707
scenario_date	970601
recording_by	DO
recording_site	LMU
scenario_id	a
no_speakers	2
speaker1_id	ABA
speaker2_id	ABD
speaker1_recmed_spec	rnp
speaker2_recmed_spec	rnw
speaker1_micbrand	beyer_dynamic_mce_10, beyer_dynamic_nem_191
speaker2_micbrand	beyer_dynamic_mce_10, beyer_dynamic_nem_191

Example2 (multilingual):

dialogue_name   m888a
recording_date  981001
scenario_date   981001
recording_site  UHH
scenario_id     a
no_speakers     3
speaker1_id     QZX
speaker2_id     HCB
speaker3_id     HBK
speaker1_language       e0
speaker2_language       e1,g0
speaker3_language       g0
speaker1_recmed_spec    h
speaker2_recmed_spec    h
speaker3_recmed_spec    h
speaker1_micbrand       beyer_dynamic_nem_194
speaker2_micbrand       beyer_dynamic_nem_194
speaker3_micbrand       beyer_dynamic_nem_194
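Fields 12)-17) pack several codes into one value. A Python sketch for decoding them, with hypothetical helper names; it relies on the microphone letters (h, n, c, r) and telephone letters (m, p, w, d) being disjoint, as they are in the list above, and it returns the nativeness level 0-3 as an integer (0 = native). Unknown language letters raise a `KeyError`.

```python
from typing import List, Tuple

LANGUAGES = {"g": "german", "e": "english", "j": "japanese"}
MICROPHONES = {"h": "headset", "n": "neckband", "c": "clip", "r": "room"}
TELEPHONES = {"m": "mobile", "p": "analog", "w": "wireless", "d": "dect"}


def parse_speaker_language(value: str) -> List[Tuple[str, int]]:
    """'e1,g0' -> [('english', 1), ('german', 0)]."""
    result = []
    for item in value.split(","):
        item = item.strip()
        result.append((LANGUAGES[item[0]], int(item[1:])))
    return result


def parse_recmed_spec(value: str) -> Tuple[List[str], List[str]]:
    """'rnp' -> (['room', 'neckband'], ['analog'])."""
    mics: List[str] = []
    phones: List[str] = []
    for ch in value:
        if ch in MICROPHONES:
            mics.append(MICROPHONES[ch])
        elif ch in TELEPHONES:
            phones.append(TELEPHONES[ch])
        else:
            raise ValueError(f"unknown recmed_spec code {ch!r} in {value!r}")
    return mics, phones
```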

----------------------------------emuDB---------------------------------

In January 2013 the BAS edition was extended by an emuR database (emuDB), which was 
later (2016) added to the BAS CLARIN Repository; if the root dir of this corpus
(where this README resides) is called 'VM2_emuDB', then you are dealing with this 
emuDB variant.
This emuDB comprises only the cut turns of the headset mic channel (*.wav) from the 
BAS Edition X.1 and all BPF annotations in the form of emuR-compatible *_annot.json files.
That way EMU queries can be performed on the cut turn signal files and 
their corresponding annotation tiers.

The room microphone and phone channels, if they exist, were copied as *.wav into the
bundle of the headset mic; that way these channels are available as such, but they
cannot be queried in the emuDB. Note that some dialogs consist only of room mic
recordings; in this case the annotation is based on the room mic channel, and therefore
the bundles were created with the proper VM2 names of the room mic turns.
 
Later (2020) the emuDB was augmented by the full-length headset mic recordings (one channel
per speaker), the full Verbmobil transliteration *.trl, the *.mar with the 
segmentation into turns and the recording protocol *.rpr; all these files 
are stored in the emuDB session directory, but are not part of the emuDB,
i.e. it is not possible to query these files. Example file set for dialog 'g001a':
g001a_ses/g001ac.mar : segmentation in turns: begin_sample, end_sample, turn_name  
g001a_ses/g001ac.trl : VM2 transliteration (see doc/trllex_e_html/)
g001a_ses/g001a.rpr : VM2 recording protocol 
g001acn1_AAJ.wav : full length recording speaker 1 (speaker Id is AAJ)
g001acn2_AAK.wav : full length recording speaker 2 (speaker Id is AAK)
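The full-length files can be related to the cut turns via the *.mar segmentation. The exact column layout of *.mar is only summarized above, so this Python sketch assumes one turn per line with begin sample, end sample and turn name separated by whitespace or commas, no header line, and an inclusive end sample; the helper names are hypothetical, and the mapping from a turn name to its full-length file is inferred from the two example file names. Comparing one cut with the corresponding turn *.wav is a cheap way to confirm the end-sample convention.

```python
import wave
from typing import List, Tuple


def parse_mar(text: str) -> List[Tuple[int, int, str]]:
    """Turn *.mar text into (begin_sample, end_sample, turn_name) tuples."""
    turns = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        turns.append((int(parts[0]), int(parts[1]), parts[2]))
    return turns


def full_length_wav(turn: str) -> str:
    """e.g. 'g001acn1_005_AAJ' -> 'g001acn1_AAJ.wav' (inferred pattern)."""
    return f"{turn[:8]}_{turn[13:16]}.wav"


def cut_turn(wav_path: str, begin: int, end: int) -> bytes:
    """Raw frames from sample `begin` to `end`, end assumed inclusive."""
    with wave.open(wav_path, "rb") as w:
        w.setpos(begin)
        return w.readframes(end - begin + 1)
```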

-------------------------------------------------------------------------

General known errors across all recording volumes

The SUP tier in the BAS Partitur Files does not handle the following case
gracefully: if a non-word item (= breath, pause or noise) is passively 
superposed right after an actively superposing word, the SUP tier is not
reliable. There might be entries with both an active and a passive
superposition in the same SUP line. The reason is that the BPF concept
cannot handle non-word items here, because they cannot be referred to
from other tiers.
The error described above occurs in the following turns:
 VM21.1/par/g203a/g203acn2_031_AHJ.par
 VM21.1/par/g203a/g203atp2_031_AHJ.par
 VM21.1/par/g215a/g215acn1_046_AHP.par
 VM21.1/par/g215a/g215atm1_046_AHP.par
 VM23.1/par/e028a/e028ach1_040_PNP.par
 VM24.1/par/g249a/g249acn2_017_AIG.par
 VM25.1/par/j006a/j006ach1_196_BAD.par
 VM25.1/par/j010a/j010ach1_235_BAH.par
 VM27.1/par/j034a/j034ach1_010_BAZ.par
 VM29.1/par/g415a/g415acn1_037_ALK.par
 VM29.1/par/g415a/g415acn2_052_AKI.par
 VM29.1/par/g415a/g415atm1_037_ALK.par
 VM29.1/par/g415a/g415atm2_052_AKI.par
 VM30.1/par/g367a/g367acn1_006_AKS.par
 VM30.1/par/g367a/g367atm1_006_AKS.par
 VM30.1/par/g370a/g370acn2_020_AKT.par
 VM30.1/par/g370a/g370atm2_020_AKT.par
 VM31.1/par/e003a/e003ach2_059_ANV.par
 VM38.1/par/g376a/g376acn2_035_AKX.par
 VM38.1/par/g376a/g376atm2_035_AKX.par
 VM38.1/par/g378a/g378acn1_084_AKY.par
 VM38.1/par/g378a/g378atm1_084_AKY.par
 VM39.1/par/g333a/g333acn1_030_AJZ.par
 VM39.1/par/g333a/g333atm1_030_AJZ.par
 VM44.1/par/j155a/j155ach2_055_PBQ.par
 VM49.1/par/g446a/g446acn1_029_AMM.par
 VM49.1/par/g446a/g446atm1_029_AMM.par
 VM49.1/par/g448a/g448acn2_070_AMN.par
 VM49.1/par/g448a/g448atm2_070_AMN.par
 VM49.1/par/g450a/g450acn1_054_AMO.par
 VM49.1/par/g450a/g450atm1_054_AMO.par
 VM49.1/par/g624b/g624bch1_055_BHM.par

The end-to-end evaluation recordings on volume 39.1 (5th char in 
dialog name is 'e') do not encode the speaker language in the signal
file name, unlike the regular multilingual VM recordings.

The following speakers (mostly from the end-to-end evaluation) do not 
have speaker protocols (*.spr):
HBM
HDF
HKB
HKD
HKK
HKL
HKM
HKN
HKO
QYY
QZN


-------------------------------------------------------------------------

Additional Documentation

Copies of the original Verbmobil Memos, TechDocs and Reports 
relevant to the corpus are stored in the subdir doc:

    TechDok-34-95: Automatic Conversion of American Dialog Data into VM Compatible Format
    TechDok-36-95: Transliterationslexikon (VERBMOBIL I)
    Memo-90-95: Partiturformat für die Darstellung unterschiedlicher Repräsentationsebenen von gesprochener Sprache
    Memo-95-95: Das Münchener AUtomatische Segmentationssystem (MAUS)
    Memo-96-95: Regelsystem zur Generierung von Aussprachevarianten
    Memo-111-96: Aussprachevarianten in der Verbmobil-Transliteration - Regeln zur konsistenteren Verschriftung
    TechDok-56-97: Transliteration spontansprachlicher Daten - Lexikon der Transliterationskonventionen - VERBMOBIL II
    Memo-128-97: The technical Setup for Dialog Recordings in VMII and Problems caused by Mobile Phones
    Memo-129-97: The conventions for phonetic transcription and segmentation of German used for the Munich Verbmobil corpus
    Memo-131-97: File Names, Formats and Structures in VERBMOBIL II
    VMReport-226-98: Dialogue Acts in VERBMOBIL II - Second Edition
    VMTechDok-71-99: VMII Szenario A und B: Instruktionen für alle Sprachstellungen 

Copies of some publications can be found in the subdir doc:

partitur.pdf : First publication of BAS Partitur Format (BPF)


-------------------------------------------------------------------------

Main History (only events that concern all VM volumes)

...
12.03.98 : Removed the following characters from the German dictionary
           vm_ger.lex and from the ORT tier in the BAS Partitur files:
           '=%*_'
16.08.00 : Converted VMI signal files from Phondat2 into NIST and
           placed them into directory /DATA. File names were adapted
           to VMII naming conventions.
           Update of all BPFs to the VMII naming conventions; old BPFs
           are retained for backward compatibility; the TRL tier is no
           longer contained in the new BPFs (TR2 is now used throughout
           the whole VM corpus!)
30.05.01 : New edition of all BAS Partitur Files (BPF) based on the latest
           error update. This includes a completely new MAUS annotation.
           Furthermore, additional previously unpublished tiers were
           added to the distribution, such as Syntax Trees, Dialogue Act
           Annotation, Syntactic-prosodic Labeling, Prosodic Labeling and
           Part-of-Speech Tagging.
08.06.01 : Edition of the VM Bonus CDROM (VMBONUS) with additional data
           and documentation that does not fit into the regular VM
           volumes; Edition of the VM Lexicon Database of the University
           of Bielefeld (VMLEX).
10.07.01 : Tiers LBP and LBG added to the BAS Partitur Files	   
30.01.03 : vm_ger.lex completely rebuilt:
           The German pronunciation dictionary of VM I+II now contains only
           the word items as they appear in the ORT tier of the BPF files.
           Also the transcription was unified to a more consistent 
           concept of a 'canonical form'.
           For instance:
           - /R/ and /r/ were unified to /r/ because it was not clear 
             how these two allophones were used by different transcribers
           - /a:6/ was replaced by /ar/
19.08.03 : New edition of all BAS Partitur Files (BPF) of German signal data
           based on the latest error update:
           Some minor bugs in the POS, LMA and SAP tiers fixed.
           Completely redone pronunciation list for German (vm_ger.lex)
           according to the new 'Transliteration Conventions for Canonical
           German' (www.bas.uni-muenchen.de/Bas/BasGermanPronunciation/)
           Based on the new pronunciation the following tiers in the BPF
           files have been re-calculated:
           KAN, MAU
20.08.03 : New tier TLN integrated : the TLN tier contains the translation
           of the recorded utterance. The translations were produced
           manually by the University of Tuebingen, Prof. Hinrichs.
           The integrated data are also stored on the volume VMBONUS.
           Please note that the orthographic representation of Japanese
           (romaji) in these translations is of the original form as used
           in the original Japanese pronunciation list (vm_jap_org.lex).
           However, it was never checked whether these two data sets (lexicon
           and translations) are in fact compatible. Use with caution!
           For details about the TLN tier please refer to the BPF documentation
           www.bas.uni-muenchen.de/Bas/BasFormatseng.html
04.09.12 : changed language descriptions in speaker protocols (*.spr) to
           ISO 639-2 codes.
17.01.13 : CLARIN Repo Version 1
10.06.13 : CLARIN Repo Version 2: 
           found ISO8859 characters in BPF files -> fixed to UTF-8
10.06.16 : CLARIN Repo Version 3:
           - bug fix in *.par and *.ags, SUP tier, of the following turns:
           e011ach2_069_NMW
           e011ach1_070_ANV
           e011ach1_071_ANV
           e011ach1_072_ANV
           e011ach1_073_ANV
           g335acn2_021_AKA, g335atm2_021_AKA
           g335acn1_022_AJZ, g335atm1_022_AJZ
           e087ach2_064_PNP
           e087ach1_065_SMG
           g514arr1_018_BFD
           g514arr2_019_BFH
           g514arr2_020_BFH
           j155ach2_055_PBQ
           j155ach1_056_BBS
           - extended corpus by an emuDB component to enable emuR usage; 
           this includes the addition of *.wav files parallel to *.nis, 
           and the addition of complete emuDB structure 
           (/vdata/BAS/VM2_total_emuDB/VM2_emuDB/). This emuDB is *not*
           part of the CD-R/DVD-R distribution but can be accessed only
           via the BAS CLARIN Repository
           (http://hdl.handle.net/11858/00-1779-0000-0006-BF00-E).
           The NIST SPHERE signal files were removed from the BAS CLARIN 
           Repository. 
           - the structure of the BAS CLARIN Repo version changed with 
           version 3 as follows: in the German part (dialogs g?????)
           all multi-channel recordings (if available) were pooled in 
           the close-microphone bundle, i.e. the former bundles for room 
           microphone and telephone channel are no longer present. The emuDB
           treats the close-microphone channel as primary channel; the other 
           channels (if present) are included in the bundle directory but 
           not loaded in the EMU-SDMS.
18.07.20   CLARIN Repo Version : added documentation /doc and speaker metadata
           /spr and this file to root dir VM2_emuDB,
           added headset mic recordings of the complete dialog to the session directories
           together with transliteration files (*.trl), recording protocol *.rpr 
           and turn marker files (*.mar, = a turn segmentation that resulted
           in the turn-based signal files of the emuDB); these full-length 
           recordings (two channels, one per speaker) are not part of the emuDB
           (= cannot be queried), but are provided as additional material for
           researchers interested in the total dialog structure.
           added missing room mic/telephone channels to headset mic bundles.
           added missing recording protocols *.rpr to CLARINDocu.zip,2.
           

Downloads:

ftp://ftp.bas.uni-muenchen.de/pub/BAS/VM/partitur<date>.tgz
ftp://ftp.bas.uni-muenchen.de/pub/BAS/VM/vm_ger.lex