                 BAVARIAN ARCHIVE FOR SPEECH SIGNALS

            University of Munich, Institute of Phonetics
             Schellingstr. 3/II, 80799 Munich, Germany
                   bas@phonetik.uni-muenchen.de

COPYRIGHT University of Munich. All rights reserved.
This corpus and software may not be disseminated further - not even
partly - without a written permission of the copyright holders.

Additional Copyright Holders
University of Hamburg 1999

----------------------------------------------------------------------
           VERBMOBIL II Dialog Database (BAS Edition)
                    24.6.1999 / 15.09.99
----------------------------------------------------------------------

The BAS edition (X.1.X) of the VERBMOBIL II speech data collection
contains the same data as the original edition, in a format adapted to
that of the VERBMOBIL I collection, to simplify joint use of the whole
database.

The following deviations from the original edition (X.0.X) should be
noted:

- Signals cut into turn-length signals
- Transliteration and BAS Partitur files added on CD
- Additional documentation of formats

------------------- Contents of this file ------------------------

        CD directory structure and naming
        Naming conventions
        Signal file formats
        Nist-Header field definitions for VMII
        Marker file format
        Speaker protocol format
        Recording protocol format

----------------- CD directory structure and naming --------------

Directories

        README   : this file
        README.X : specific information regarding this volume
        data     : signal files, cut in turns
        spr      : speaker information
        trl      : transliteration files
        par      : BAS partitur files
        doc      : documentation of formats, etc.

Each dialogue directory is named after the dialogue name, i.e. the
first 6 characters of the dialogue's signal file names (for example:
e013ac).

----------------------- Naming conventions ----------------------

File names do not follow the "DOS" (8.3) convention; the CDs therefore
use the Rock Ridge extensions to the ISO 9660 format.

The names are coded as follows:

        1st character:           [g,e,j,m] recorded language
                                 g(erman), e(nglish), j(apanese),
                                 m(ultilingual)
        2nd to 4th character:    dialogue number, e.g. 001
        5th character:           scenario
                                 a(main), b(information desk),
                                 c(remote maintenance), d(VM1)
        6th character:           technical definition of recording
                                 c(lose), r(oom), t(elephone)
        7th character:           detailed description of recording
                                 means (microphone)
                                 telephone: m(obile), p(hone,analog),
                                            w(ireless), d(ect)
                                 close:     h(eadset),
                                            n(eckband microphone),
                                            c(lip microphone)
                                 room:      r(room)
        8th character:           channel coding [1..n]
        9th character:           '_'
        10th to 12th character:  turn number (starting with '000')
        13th character:          '_'
        14th to 16th character:  speaker ID

The extensions code the contents of the file:

        .nis    NIST SPHERE 16 kHz 16 Bit linear signal
                            alaw signal, 8 kHz
                            ulaw signal, 8 kHz
        .trl    transliteration
        .par    BAS partitur file
        .spr    speaker protocol file
        .rpr    recording session protocol

------------------------- Signal file formats ----------------------

a. Physical signal characteristics

   Signal files containing room or close microphone data are coded in
   the following format: 16 bit, 16 kHz, mono, linearly coded,
   little endian (Intel byte order).
   Files containing telephone data are coded: 8 bit alaw, 8 kHz, mono.

b. Logical signal characteristics

   Each signal file contains one turn of a dialogue session of one
   speaker. Turns may overlap.

----------------- Nist-Header field definitions for VMII --------------

The signal files begin with a header following the NIST conventions.
The header has a fixed size of 1024 bytes and consists of ASCII
characters.
The format is as follows:

key                type    description             (possible) value(s)
------------------------------------------------------------------------
database_id        string  database                VERBMOBIL2
database_version   string  version                 1.0
scenario_language  string  recorded language       [german|english|
                                                   japanese|multilingual]
scenario_id        string  scenario                [main|information_desk|
                                                   remote_maintenance|vm1]
dialog_id          string  # of dialog             000-999
speaker_id         string  speaker                 AAA-ZZZ
recording_site     string  site                    [CMU|LMU|ATR|UBN|UHH]
recording_medium   string  rec. medium             [telephone|room|close]
recmed_spec        string  spec. of mic/tel-type   [mobile|wireless|analog|
                                                   neckband|dect|
                                                   headset|clip]
sample_coding      string  coding                  [alaw|ulaw|linear]
sample_n_bytes     int     bytes/sample            1,2,...
channel_count      int     # of channels           1,2,...
sample_count       int     # of samples
sample_byte_format string  little/big endian,      [01|10|1]
                           one byte
sample_rate        int     samp. freq
scenario_date      string  logical date of         YYMMDD, e.g. 980101
                           recording

Added for Edition X.1.X:

sample_sig_bits    int     number of significant bits (8 or 16)
turn_id            string  turn name (prefix of signal file)

The remaining bytes are filled with spaces.

Example header:

NIST_1A
   1024
database_id -s10 VERBMOBIL2
database_version -s3 1.0
scenario_language -s6 german
scenario_id -s4 main
dialog_id -s3 010
speaker_id -s3 ABA
recording_site -s3 LMU
recording_medium -s5 close
recmed_spec -s8 neckband
sample_coding -s6 linear
sample_n_bytes -i 2
channel_count -i 1
sample_count -i 124798
sample_byte_format -s2 01
sample_rate -i 16000
scenario_date -s8 980101
sample_sig_bits -i 16
turn_id -s16 g024acn2_013_AES
end_head

If necessary, software for extracting information from the header,
editing header information, etc. can be obtained from the NIST ftp
server at: ftp://jaguar.ncsl.nist.gov/pub/ .
The source package (for Unix) is called "sphere_2.6a.tar.Z".
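
For quick inspection without the NIST tools, the header layout described
above can be read with a few lines of Python. The following is only a
minimal sketch, not part of the corpus software: the function name
parse_nist_header is hypothetical, and it assumes exactly what this
README states (a 1024-byte ASCII header, one "key type value" entry per
line, terminated by "end_head"). It converts -i fields to integers and
-r fields to floats, and keeps every other field as a string. The
declared string length is not checked.

```python
def parse_nist_header(raw: bytes) -> dict:
    """Parse the fixed 1024-byte NIST header of a signal file (sketch)."""
    # Assumption from this README: the header is 1024 bytes of ASCII text.
    text = raw[:1024].decode("ascii")
    lines = text.split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST header")

    fields = {}
    # lines[1] holds the header size; the fields start on the third line.
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        key, type_flag, value = line.split(None, 2)
        if type_flag == "-i":
            fields[key] = int(value)
        elif type_flag == "-r":
            fields[key] = float(value)
        else:
            # "-sN": string field; N is the declared length (not verified)
            fields[key] = value
    return fields


# Build a header from a few entries of the example above, padded with
# spaces to 1024 bytes as the README describes.
sample_header = (
    "NIST_1A\n   1024\n"
    "database_id -s10 VERBMOBIL2\n"
    "sample_coding -s6 linear\n"
    "sample_n_bytes -i 2\n"
    "sample_count -i 124798\n"
    "sample_rate -i 16000\n"
    "turn_id -s16 g024acn2_013_AES\n"
    "end_head\n"
).ljust(1024).encode("ascii")

header_fields = parse_nist_header(sample_header)
```

For a real file, the first 1024 bytes read in binary mode would be passed
to the function; the signal samples follow directly after the header.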

----------------------- Speaker protocol format --------------------

1)-3) obligatory speaker information:

 1) id                      use your own unambiguous speaker code,
                            upper case
 2) sex                     m,f
 3) date_of_birth           six digits: year month day;
                            15th Febr. 1972 = 720215

4)-15) optional speaker information:

 4) own_native_language
 5) native_language_father
 6) native_language_mother
 7) primary_school          county/city of primary school years
 8) dialect                 region in which speaker lived most of the
                            time; of which your accent/dialect is
                            characteristic
 9) education               highest educational degree
10) profession
11) height                  with measuring unit, no blank: 172cm
12) weight                  with measuring unit, no blank: 65kg
13) smoker                  y/n/former
14) right_left_handed       r/l/ambi
15) comments                when present: at the end of the document,
                            line feeds are allowed

Tag and value are separated by a tab!

Example:

id                      FGR
sex                     f
date_of_birth           541215
own_native_language     g
native_language_father  g
native_language_mother  g
primary_school          K"oln
dialect                 Rhein
education               Universit"at
profession              Lehrer
height                  168cm
weight                  58kg
smoker                  n
right_left_handed       r

----------------------- Recording protocol format ----------------------

 1) session_no              XXX, digits
 2) dialogue_name           dialogue directory name: g076a
 3) recording_date          XxYyZz; year month day -> 971512
 4) scenario_date           logical_date (scenario-date)
 5) recording_by            name of person who carried out the
                            recording
 6) recording_site          location of recording, e.g.
                            LMU,CMU,ATR,UHH,UBN
 7) scenario_id             scenario: a (main), b (information desk),
                            c (remote maintenance), d (VM1)
 8) no_speakers             2 to 9
 9) speaker1_id             use your own unambiguous code, 3 upper
                            case letters
10) speaker2_id
11) speaker3_id
12) speaker1_language       XY: X - language spoken during recording -
                            g(erman), e(nglish), j(apanese);
                            Y - 0 (native), 1-3 (non-native)
13) speaker2_language
14) speaker3_language
15) speaker1_recmed_spec    XYZ: XY used microphone - h(headset),
                            n(eckband mic), c(lip mic), r(oom);
                            Z used telephone - m(obile telephone),
                            p(hone, analog), w(ireless), d(ect)
16) speaker2_recmed_spec
17) speaker3_recmed_spec
18) speaker1_micbrand       comma separated list of types of used
                            microphones, use underline within names,
                            e.g. beyer_dynamics_115
19) speaker2_micbrand
20) speaker3_micbrand
21) comments                when present: at the end; line feeds
                            allowed

optional only: 1), 4) and 21)
12)-14): applies only if dialogue is multilingual

Tag and value are separated by a tab!

Example:

session_no              3
dialogue_name           g012a
recording_date          970707
scenario_date           970601
recording_by            DO
recording_site          LMU
scenario_id             a
no_speakers             2
speaker1_id             ABA
speaker2_id             ABD
speaker1_recmed_spec    rnp
speaker2_recmed_spec    rnw
speaker1_micbrand       beyer_dynamic_mce_10, beyer_dynamic_nem_191
speaker2_micbrand       beyer_dynamic_mce_10, beyer_dynamic_nem_191

---------------------------- further documentation ---------------------

In the subdir "doc" you will find the following useful files:

trllex_d.ps : Extensive documentation of the VERBMOBIL II
              Transliteration format (only in German)
par*en.htm  : Latest documentation of the BAS Partitur Format (BPF)
              in English
par*de.htm  : in German
              (for up-to-date documentation of the BPF please refer to:
              www.phonetik.uni-muenchen.de/Bas/BasFormatseng.html)
vm_ger.lex  : German pronunciation dictionary covering the total
              VERBMOBIL collection (transcripts in extended German
              SAM-PA)
vm_eng.lex  : English pronunciation dictionary covering the total
              VERBMOBIL collection (SAM-PA)

For a definition of the SAM Phonetic Alphabet (SAM-PA) refer to:
www.phon.ucl.ac.uk/home/sampa/home.htm

You might also find additional information at the following links:

http://www.dfki.uni-sb.de/verbmobil/                : Project Page
http://www.phonetik.uni-muenchen.de/Verbmobil.html  : Munich Group
http://www.phonetik.uni-muenchen.de/Bas             : BAS Home Page
http://www.phonetik.uni-muenchen.de                 : IPSK
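
The file naming scheme given under "Naming conventions" is regular enough
to decode mechanically. The following Python sketch is an illustration
only and not part of the corpus software. The function name
decode_turn_name and the dictionary names are hypothetical. It assumes
that the channel code is a single digit and that speaker IDs are three
upper case letters (AAA-ZZZ, as in the header field definitions).

```python
import re

# Code tables copied from the "Naming conventions" section of this README.
LANGUAGES = {"g": "german", "e": "english", "j": "japanese",
             "m": "multilingual"}
SCENARIOS = {"a": "main", "b": "information desk",
             "c": "remote maintenance", "d": "VM1"}
MEDIA = {"c": "close", "r": "room", "t": "telephone"}
# The 7th character is only meaningful relative to the 6th (the medium).
DEVICES = {
    "telephone": {"m": "mobile", "p": "phone, analog",
                  "w": "wireless", "d": "dect"},
    "close": {"h": "headset", "n": "neckband microphone",
              "c": "clip microphone"},
    "room": {"r": "room"},
}

# Assumption: single-digit channel code, upper case speaker ID.
TURN_NAME = re.compile(
    r"^(?P<language>[gejm])(?P<dialogue>\d{3})(?P<scenario>[abcd])"
    r"(?P<medium>[crt])(?P<device>[a-z])(?P<channel>\d)"
    r"_(?P<turn>\d{3})_(?P<speaker>[A-Z]{3})$"
)


def decode_turn_name(name: str) -> dict:
    """Decode a signal file name such as 'g024acn2_013_AES.nis' (sketch)."""
    stem = name.split(".", 1)[0]  # drop the extension, if any
    match = TURN_NAME.match(stem)
    if match is None:
        raise ValueError("name does not follow the convention: " + name)
    parts = match.groupdict()
    medium = MEDIA[parts["medium"]]
    if parts["device"] not in DEVICES[medium]:
        raise ValueError("device code does not fit medium: " + name)
    return {
        "dialogue_name": stem[:6],  # also the dialogue directory name
        "language": LANGUAGES[parts["language"]],
        "dialogue_number": parts["dialogue"],
        "scenario": SCENARIOS[parts["scenario"]],
        "medium": medium,
        "device": DEVICES[medium][parts["device"]],
        "channel": int(parts["channel"]),
        "turn": int(parts["turn"]),
        "speaker_id": parts["speaker"],
    }


decoded = decode_turn_name("g024acn2_013_AES.nis")
```

The device table is keyed by medium because the same letter is reused:
'c' means "close" as the 6th character but "clip microphone" as the 7th.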