BAVARIAN ARCHIVE FOR SPEECH SIGNALS
University of Munich, Institute of Phonetics
Schellingstr. 3/II, 80799 Munich, Germany
bas@phonetik.uni-muenchen.de

PH@TTSESSIONZ
GERMAN DATABASE OF ADOLESCENT SPEECH

Ph@ttSessionz SPEECH DATABASE COLLECTION
Version 1.0.2 - 2008/08/26

Copyright(C) 2007, 2008 by
Institute of Phonetics and Speech Processing
University of Munich, Munich Germany

Compiled by:
Chr. Draxler
Department of Phonetics and Speech Communication
University of Munich
Schellingstr. 3/II
D 80799 Munich
+49/89/2180 2807 tel
+49/89/2180 5790 fax
draxler@phonetik.uni-muenchen.de


1) OVERVIEW

The Ph@ttSessionz speech database contains recordings of 864 adolescent
speakers of German (age range 12-20). The recordings were performed via
the WWW in public schools (Gymnasium) in 41 locations in Germany.

The speech material recorded is a superset of the German SpeechDat-II
and RVG-I corpora.

Version 1.0.X : subset of prompt items 43 - 85 (up to 43)
                30 phonetically rich sentences
                11 telephone numbers
Version 1.1.X : complete prompt items (up to 138)

A session consists of up to 138 recording items, with both read and
non-scripted speech. The read speech material comprises isolated digits,
digit sequences, numbers, time and date expressions, spellings, person,
company and geographical names, and phonetically rich sentences.

The non-scripted speech consists of short and long text production
items. The short text production items are questions on the current
date or prompts for descriptions, e.g. of how to get from home to the
train station, or of the speaker's clothes; for the long items the
speaker was asked to talk about the last holidays, the favorite subject
at school, etc.

ITEMCODE  | DESCRIPTION                  | COUNT
----------+------------------------------+-------
01-12     | single digit                 | 861
13-30     | number                       | 856
31-42     | command                      | 858
43-72     | phonetically rich sentence   | 860
73-85     | telephone number             | 855
B1-B3     | digit string, all digits     | 856
C1-C3     | digit string, credit card    | 850
C4-C6     | digit string, PIN code       | 856
D1-D3     | date expression              | 857
L0        | spelling, arbitrary sequence | 734
L1-L5     | spelling, geographical name  | 858
L6-L8     | spelling, person name        | 859
L7-L8     | spelling, person name        | 858
L9        | spelling, arbitrary sequence | 859
O1-O3     | geographical name            | 857
O4-O6     | company name                 | 856
O7-O9     | person name                  | 856
P00-P10   | phonetics test sentence      | 119
T1-T3     | time expression              | 853
X1-X5     | text production, short       | 822
Y1,Y3,Y4  | text production, long        | 764

NOTE 1: items P00-P10 were added later during the project and hence
have not been recorded at every site.

NOTE 2: version 1.0.X of the Ph@ttSessionz database contains only the
phonetically rich sentences (item codes 43-72) and the telephone
numbers (item codes 73-85).


2) DATABASE ORGANIZATION

The database is organized in recording sessions. Each session
corresponds to a directory, and each recording is stored in a separate
audio file in WAV format.

The file nomenclature is

    NNNN"/AAA"NNNNII[I]"_"C".WAV"

with
    NNNN    the four digit session code
    II[I]   a two- or three-character item code (see table above)
    C       the recording channel:
            0 = close talk microphone
            1 = table top microphone (see below)

The file hierarchy on the distribution media is as follows (all text
files are coded in UTF-8):

/- README.TXT                            this file
/- COPYRIGHT.TXT
/- DOC ---+- SAMPALEX.PDF                SAM-PA symbols used
          +- at3031_english.pdf          table top microphone
          +- 060628_MPre_UG_EN01.pdf     USB A/D converter
          +- Opus54_DB_E.pdf             headset microphone
          +- TRANSCRIPTION.PDF           rules of transcription
/- TABLE -+- CONTENTS.TBL                summary transcripts
          +- LEXICON.TBL                 pronunciation dictionary
          +- METADATA.TBL                speaker information
          +- PH100TRN.TBL                training set definition
          +- PH100TST.TBL                test set definition
/- SOURCE +- DEFTEST.PL                  script to create sets
/- DATA --+- 2000 -+- AAA200001_0.wav    recordings
          |        +- AAA200001_1.wav
          |        +- AAA200001_0.par    BPF file
          |        +...
          :
          +- 4999 -+- AAA499901_0.wav
                   +- AAA499901_1.wav
                   +- AAA499901_1.par

NOTE 1: the audio files may be distributed in a gzip-compressed archive
file split into separate chunks. These chunks are named audio_xx with
xx a two-character code starting at "aa", followed by "ab", "ac", etc.
To concatenate the contents of the chunks use the following command

    % cat {all chunk files in correct order separated by blanks} > audio.tgz

Then decompress and extract the compressed tar file.

NOTE 2: the file name extension mappings are

    .WAV   RIFF WAVE audio file, mono, 22.05kHz, 16bit PCM
    .PDF   Adobe Portable Document Format
    .PL    perl script
    .TXT   UTF-8 plain text file with Unix line breaks (line feed)
    .TBL   tab-delimited UTF-8 table file with Unix line breaks
           (line feed)
    .PAR   BAS Partitur Format (BPF) file (7 bit ASCII) with tiers
           ORT,KAN,TRN,MAU
           (see http://www.bas.uni-muenchen.de/Bas/BasFormatseng.html;
           a copy of this file can be found in DOC/HTML)

NOTE 3: BPF files

BPF files are only provided for channel 0 since both channels are
synchronous.

The ORT tier (orthographic) of the BPF files contains two labels that
are not regular words but nevertheless represent sounds produced by the
vocal apparatus of the speaker and are therefore allowed in the ORT
tier:

    '[spk]' : non-speech sounds produced by the speaker, e.g. cough,
              throat clear, blow, laugh...
              (these sounds are labelled by the speech garbage model
              '' in the MAU tier)
    '[fil]' : hesitation
              (these sounds are modelled by the phoneme sequence /QE:/
              in the MAU tier)

Furthermore, the ORT tier contains all word fragments of the
orthographic transcription. Fragments are usually marked with either
'*' or '~'; in the ORT tier they appear without these markers.

Please note that neither the two special labels nor the fragments are
part of the official pronunciation dictionary TABLE/LEXICON.TBL.
The table TABLE/LEXICON.ORT provides a complete list of items as used
in the BPF ORT tiers.

The TRN tier contains begin and duration of the utterance in samples.

The MAU tier contains a phonemic segmentation of the signal based on
the orthographic transcript and the utterance segmentation. The
phonemic inventory is that of the extended German SAM-PA as defined in
DOC/GermanSAMPA.txt. Garbage sounds such as cough, laugh etc. are
modelled by the symbol ''.
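
The MAU tier description above can be illustrated with a minimal Python
sketch that reads MAU entries and converts sample offsets to seconds.
It rests on assumptions that this README does not confirm: that each
tier entry is one line of the form "MAU: begin duration word-index
label", as in the general BPF description referenced in NOTE 2, and
that begin and duration are sample counts at 22050 Hz. The names
Segment and mau_segments are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

SAMPLE_RATE = 22050  # Hz, the sampling rate stated for the WAV files


class Segment(NamedTuple):
    begin_s: float     # segment start in seconds
    duration_s: float  # segment duration in seconds
    word_index: int    # index of the word the segment belongs to
    label: str         # SAM-PA label (may be empty)


def mau_segments(lines: Iterable[str],
                 sample_rate: int = SAMPLE_RATE) -> Iterator[Segment]:
    """Yield one Segment per well-formed MAU line.

    ASSUMPTION: a MAU line looks like "MAU: <begin> <duration>
    <word-index> <label>" with begin and duration given in samples.
    """
    for line in lines:
        if not line.startswith("MAU:"):
            continue  # other tiers (ORT, KAN, TRN) and header lines
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed entry, skip rather than guess
        try:
            begin, duration, word_index = (int(f) for f in fields[1:4])
        except ValueError:
            continue  # non-numeric fields, skip
        label = fields[4] if len(fields) > 4 else ""
        yield Segment(begin / sample_rate, duration / sample_rate,
                      word_index, label)
```

A caller would typically pass an open .par file (7 bit ASCII) to
mau_segments and skip the files listed under KNOWN ERRORS.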

The following directories contain documentation and related
information:

DOC :
    GermanSAMPA.txt            Extended German SAM-PA table
    TRANSCRIPTION.PDF          validation and transcription handbook
    at3031_english.pdf         Audio-Technica AT3031 data sheet
    060628_MPre_UG_EN01.pdf    M-Audio mobile pre user guide
    Opus54_DB_E.pdf            Beyerdynamic opus 54 data sheet

SOURCE : contains the following Unix formatted ISO 8859-1 files
    DEFTEST.PL      perl script to define training and test sets

TABLE : contains the following UTF-8 encoded plain text files
    CONTENTS.TBL    the prompts and annotations file with
                    tab-delimited fields
                      SESSION ITEMCODE DESCRIPTION PROMPT
                      SEGMENT_BEGIN SEGMENT_END ANNOTATION
                    SEGMENT_BEGIN and SEGMENT_END are given in
                    milliseconds

    LEXICON.TBL     the lexicon file covering regular words with the
                    following tab-delimited fields
                      ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION
                    Note that the pronunciation is coded in extended
                    German SAMPA as defined in DOC/GermanSAMPA.txt

    LEXICON.ORT     the lexicon file also covering word fragments with
                    the following tab-delimited fields
                      ORTHOGRAPHY SAM-PRONUNCIATION
                    Note that the pronunciation is coded in extended
                    German SAMPA as defined in DOC/GermanSAMPA.txt

    METADATA.TBL    the speaker information file with the following
                    tab-separated fields
                      SESSION ZIPCODE CITY REC_DATE REC_TIME SEX
                      DIALECT SMOKER HEIGHT WEIGHT AGE COUNT
                    This file is used to generate the training and
                    test sets.

    PH100TRN.TBL    664 session numbers for training set
    PH100TST.TBL    200 session numbers for test set


3) SIGNAL QUALITY

The following recording equipment was used:

1) Beyerdynamic opus54 close-talk microphone
2) Audio-Technica AT3031 table microphone
3) m-audio mobile pre USB A/D converter

The recording quality is 22.05 kHz 16 bit, and the files are in mono
.wav format. The file suffix "_0" identifies the close-talk microphone
channel, the suffix "_1" the table top microphone.
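
The file naming scheme and the channel suffixes can be decoded
mechanically. The following Python sketch splits a file name into
session code, item code and channel. The names NAME_RE, Recording and
parse_name are hypothetical, and the character set allowed for item
codes (digits and letters) is an assumption derived from the item code
table in section 1.

```python
import re
from typing import NamedTuple

# "AAA" + four-digit session code + two- or three-character item code
# + "_" + channel digit + extension.
# ASSUMPTION: item codes consist of digits and letters only.
NAME_RE = re.compile(
    r"^AAA(?P<session>\d{4})(?P<item>[0-9A-Z]{2,3})"
    r"_(?P<channel>[01])\.(?P<ext>wav|par)$",
    re.IGNORECASE,
)

# Channel meaning as given in this README
CHANNELS = {0: "close-talk microphone", 1: "table top microphone"}


class Recording(NamedTuple):
    session: str  # four-digit session code, e.g. "2000"
    item: str     # item code, e.g. "43" or "P00"
    channel: int  # 0 or 1
    ext: str      # "wav" or "par"


def parse_name(name: str) -> Recording:
    """Split a file name such as 'AAA200001_0.wav' into its parts."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a Ph@ttSessionz file name: {name!r}")
    return Recording(
        session=match["session"],
        item=match["item"].upper(),
        channel=int(match["channel"]),
        ext=match["ext"].lower(),
    )
```

For example, parse_name("AAA200001_0.wav") yields session "2000",
item "01" and channel 0, and CHANNELS[0] gives the microphone for that
channel.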


4) INSTRUCTIONS TO RECORDING STAFF AND SPEAKERS

A description of the instructions to the recording staff is given in
file DOC/HTML/manual_eng.html


5) HISTORY

2008-08-19 : edition 1.0.1 via BAS
2008-08-26 : edition 1.0.2
             added BAS Partitur Format files (BPF) with MAUS
             segmentation (tiers ORT,KAN,TRN,MAU)


6) REFERENCES

Papers on Ph@ttSessionz and the tools used (SpeechRecorder and
WebTranscribe) were published at numerous international conferences:

@inproceedings{Draxler_2006_b,
    Address = {St. Petersburg},
    Author = {Chr. Draxler},
    Booktitle = {Proc. of Specom},
    Title = {Web-Based Speech Data Collection and Annotation},
    Year = {2006}}

@inproceedings{Draxler_Jaensch_2006,
    Address = {Genova},
    Author = {Chr. Draxler and K. J{\"a}nsch},
    Booktitle = {Proc. of LREC},
    Title = {Speech Recordings in Public Schools in {Germany} - the Perfect Show Case for Web-based Recordings and Annotation},
    Year = {2006}}

@inproceedings{Draxler_2006_a,
    Address = {Pittsburgh, PA},
    Author = {Chr. Draxler},
    Booktitle = {Proc. of Interspeech},
    Title = {Exploring the Unknown -- Collecting 1000 speakers over the Internet for the Ph@ttSessionz Database of Adolescent Speakers},
    Year = {2006}}

@inproceedings{Draxler_2005,
    Address = {Karlsbad, Czech Republic},
    Author = {Chr. Draxler},
    Booktitle = {Proceedings of TSD 2005},
    Title = {WebTranscribe -- An Extensible Web-based Speech Annotation Framework},
    Year = {2005}}

@inproceedings{DraxlerJaensch2005,
    Author = {Chr. Draxler and K. J{\"a}nsch},
    Booktitle = {Proceedings of DAGA 2005},
    Title = {{SpeechRecorder} -- Mehrkanal Sprachaufnahmen {\"u}ber das {WWW}},
    Year = {2005}}

@inproceedings{Steffen_et_al_2005,
    Author = {A. Steffen and Chr. Draxler and A. Baumann and S. Schmidt},
    Booktitle = {Proceedings of DAGA 2005},
    Title = {{Ph@ttSessionz}: {A}ufbau einer {D}atenbank mit {J}ugendsprache},
    Year = {2005}}

@inproceedings{Draxler_Steffen_2005,
    Address = {Lisbon, Portugal},
    Author = {Chr. Draxler and A. Steffen},
    Booktitle = {Proceedings of Interspeech 2005},
    Title = {Ph@ttSessionz: Recording 1000 Adolescent Speakers in Schools in Germany},
    Year = {2005}}

@inproceedings{DraxlerJaensch2004,
    Address = {Lisbon},
    Author = {Chr. Draxler and K. J{\"a}nsch},
    Booktitle = {Proceedings of 4th Intl. Conference on Language Resources and Evaluation},
    Pages = {559-562},
    Title = {SpeechRecorder -- a Universal Platform Independent Multi-Channel Audio Recording Software},
    Year = {2004}}

@inproceedings{Draxler1998,
    Address = {Granada},
    Author = {Chr. Draxler},
    Booktitle = {Proceedings of LREC},
    Title = {{WWWSigTranscribe} -- A {J}ava {E}xtension of the {WWWTranscribe} {T}oolbox},
    Year = {1998}}

@inproceedings{Draxler1997,
    Address = {Rhodos},
    Author = {Chr. Draxler},
    Booktitle = {Proc. of {Eurospeech}},
    Title = {{WWWTranscribe} -- A {Modular} {T}ranscription {S}ystem {B}ased on the {W}orld {W}ide {W}eb},
    Year = {1997}}


7) KNOWN ERRORS

As mentioned earlier, not all recording sessions contain all 138
prompted recordings. Missing items are primarily caused by errors
during the recording session. Please refer to the annotation table
TABLE/CONTENTS.TBL for a precise reference on what recordings are
contained for each recorded session in this edition.

Version 1.0.X : 281 recordings are missing (37152-36871)

The word 'Sie' (polite form of address) is transcribed as 'sie' in the
transcriptions. It therefore cannot be distinguished from the regular
German pronoun 'sie'.
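
To find out which recordings are missing from a given session, the
rows of TABLE/CONTENTS.TBL can be grouped by session. The Python
sketch below does this for version 1.0.X (item codes 43-85). It
assumes that the ITEMCODE column holds the plain two-digit codes, that
any row whose first field is not numeric (e.g. a header row) can be
skipped, and that every session has at least one row; none of this is
confirmed by this README. The names EXPECTED_ITEMS and missing_items
are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set

# Item codes contained in version 1.0.X: sentences 43-72, numbers 73-85
EXPECTED_ITEMS: Set[str] = {f"{n:02d}" for n in range(43, 86)}


def missing_items(rows: Iterable[Sequence[str]]) -> Dict[str, Set[str]]:
    """Map each session code to the expected item codes it lacks.

    ASSUMPTION: rows are the tab-split records of CONTENTS.TBL, with
    SESSION in column 0 and ITEMCODE in column 1.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if len(row) < 2 or not row[0].isdigit():
            continue  # header row, blank line or malformed record
        seen[row[0]].add(row[1].strip())
    return {
        session: EXPECTED_ITEMS - items
        for session, items in seen.items()
        if EXPECTED_ITEMS - items
    }
```

The rows can be produced with the standard csv module, e.g.
csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE) on the file
opened with encoding "utf-8".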

The following BPF files contain corrupt TRN and MAU tiers caused by
incorrect manual segmentation of the utterance:

AAA207965_0.par
AAA209863_0.par
AAA214249_0.par
AAA215059_0.par
AAA215075_0.par
AAA215144_0.par
AAA215153_0.par
AAA215169_0.par
AAA215274_0.par
AAA223370_0.par
AAA223657_0.par
AAA223668_0.par
AAA224175_0.par
AAA228573_0.par
AAA230066_0.par
AAA238985_0.par
AAA241451_0.par
AAA243053_0.par
AAA243055_0.par
AAA243470_0.par
AAA259945_0.par
AAA261451_0.par
AAA263760_0.par
AAA263853_0.par
AAA264855_0.par
AAA272475_0.par
AAA276961_0.par
AAA342251_0.par
AAA344544_0.par
AAA345347_0.par
AAA345366_0.par
AAA350354_0.par
AAA350375_0.par
AAA355446_0.par
AAA364760_0.par
AAA364849_0.par
AAA372956_0.par
AAA372968_0.par
AAA376952_0.par
AAA377878_0.par
AAA384680_0.par
AAA385077_0.par
AAA385273_0.par
AAA386161_0.par
AAA387461_0.par
AAA388655_0.par
AAA388667_0.par
AAA389356_0.par
AAA389559_0.par
AAA389748_0.par
AAA389964_0.par
AAA389969_0.par
AAA389975_0.par
AAA394375_0.par
AAA450954_0.par

This will be fixed in one of the next versions; the complete set of
BPF files may then be downloaded from our server (users will be
notified by email).
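
Until corrected files are available, the BPF files listed above can be
excluded when processing the corpus. A minimal Python sketch, assuming
the list has been copied into a text string or file with one or more
names per line; the function names load_corrupt_names and usable_bpf
are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def load_corrupt_names(text: str) -> Set[str]:
    """Collect all whitespace-separated tokens ending in '.par'."""
    return {tok.lower() for tok in text.split()
            if tok.lower().endswith(".par")}


def usable_bpf(paths: Iterable[str], corrupt: Set[str]) -> List[str]:
    """Return the paths whose file name is not in the corrupt set.

    Only the final path component is compared, case-insensitively, so
    'DATA/2079/AAA207965_0.par' matches the entry 'AAA207965_0.par'.
    """
    return [p for p in paths
            if PurePosixPath(p).name.lower() not in corrupt]
```

For example, with corrupt = load_corrupt_names("AAA207965_0.par"), the
call usable_bpf(["DATA/2079/AAA207965_0.par",
"DATA/2079/AAA207943_0.par"], corrupt) keeps only the second path.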