_/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS 

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


        PH@TTSESSIONZ GERMAN DATABASE OF ADOLESCENT SPEECH
                    
                          Ph@ttSessionz
                      
                    SPEECH DATABASE COLLECTION
                          
                          
                    Version 1.0.2 - 2008/08/26
                    Copyright(C) 2007, 2008 by
            Institute of Phonetics and Speech Processing
                    University of Munich, Munich
                             Germany

Compiled by: Chr. Draxler
             Department of Phonetics and Speech Communication
             University of Munich
             Schellingstr. 3/II
             D 80799 Munich

             +49/89/2180 2807 tel
             +49/89/2180 5790 fax

             draxler@phonetik.uni-muenchen.de


1) OVERVIEW

The Ph@ttSessionz speech database contains recordings of
864 adolescent speakers of German (age range 12-20). The recordings were
performed via the WWW in public schools (Gymnasium) in 41 locations in
Germany. The speech material recorded is a superset of the German
SpeechDat-II and RVG-I corpora.

Version 1.0.X : subset of prompt items 43 - 85 (up to 43)
                30 phonetically rich sentences
                11 telephone numbers
Version 1.1.X : complete prompt items (up to 138)

A session consists of up to 138 recording items, with both read and
non-scripted speech. The read speech material comprises isolated digits,
digit sequences, numbers, time and date expressions, spellings, person,
company and geographical names, and phonetically rich sentences. The
non-scripted speech consists of short and long text production items.
The short text production items are questions on the current date or
prompts for descriptions, e.g. on how to get from home to the train
station, or the speaker's clothes; for the long items the speaker was
asked to talk about the last holidays, or the favorite subject at
school, etc. 

 ITEMCODE |         DESCRIPTION          | COUNT 
----------+------------------------------+-------
 01-12    | single digit                 |   861
 13-30    | number                       |   856
 31-42    | command                      |   858
 43-72    | phonetically rich sentence   |   860
 73-85    | telephone number             |   855
 B1-B3    | digit string, all digits     |   856
 C1-C3    | digit string, credit card    |   850
 C4-C6    | digit string, PIN code       |   856
 D1-D3    | date expression              |   857
 L0       | spelling, arbitrary sequence |   734
 L1-L5    | spelling, geographical name  |   858
 L6-L8    | spelling, person name        |   859
 L7-L8    | spelling, person name        |   858
 L9       | spelling, arbitrary sequence |   859
 O1-O3    | geographical name            |   857
 O4-O6    | company name                 |   856
 O7-O9    | person name                  |   856
 P00-P10  | phonetics test sentence      |   119
 T1-T3    | time expression              |   853
 X1-X5    | text production, short       |   822
 Y1,Y3,Y4 | text production, long        |   764

NOTE 1: items P00-P10 were added later during the project and hence have
not been recorded at every site.

NOTE 2: version 1.0.X of the Ph@ttSessionz database contains only the
phonetically rich sentences (item codes 43-72) and the telephone numbers 
(item codes 73-85).


2) DATABASE ORGANIZATION

The database is organized in recording sessions. Each session
corresponds to a directory, and each recording is stored in a separate
audio file in WAV format.  The file nomenclature is

NNNN"/AAA"NNNNII[I]"_"C".WAV"

with NNNN  the four digit session code 
     II[I] a two- or three-character item code (see table above)
     C     the recording channel: 0 = close talk microphone
                                  1 = table top microphone (see below)
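
The naming scheme above can be taken apart mechanically. The following
Python sketch is an illustration only and is not part of the
distribution; the names Recording and parse_name are hypothetical, and
the pattern assumes that item codes consist of digits and upper-case
letters as in the item table.

```python
import re
from typing import NamedTuple


class Recording(NamedTuple):
    session: str    # four-digit session code (NNNN)
    item: str       # two- or three-character item code (II[I])
    channel: int    # 0 = close talk, 1 = table top microphone
    extension: str  # "wav" or "par"


# "AAA" + session + item + "_" + channel + extension, as described above.
_NAME = re.compile(r"^AAA(\d{4})([0-9A-Z]{2,3})_([01])\.(wav|par)$",
                   re.IGNORECASE)


def parse_name(filename: str) -> Recording:
    """Split a file name such as AAA200001_0.wav into its components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a Ph@ttSessionz file name: {filename!r}")
    session, item, channel, extension = match.groups()
    return Recording(session, item.upper(), int(channel), extension.lower())
```

For example, parse_name("AAA200001_0.wav") yields session "2000",
item "01" and channel 0.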

The file hierarchy on the distribution media is as follows
(all text files are coded in UTF-8):

/- README.TXT                                    this file
/- COPYRIGHT.TXT
/- DOC ---+- SAMPALEX.PDF                        SAM-PA symbols used
          +- at3031_english.pdf                  table top microphone
          +- 060628_MPre_UG_EN01.pdf             USB A/D converter
          +- Opus54_DB_E.pdf                     headset microphone
          +- TRANSCRIPTION.PDF                   rules of transcription
/- TABLE -+- CONTENTS.TBL                        summary transcripts
          +- LEXICON.TBL                         pronunciation dictionary
          +- METADATA.TBL                        speaker information
          +- PH100TRN.TBL                        training set definition
          +- PH100TST.TBL                        test set definition
/- SOURCE +- DEFTEST.PL                          script to create sets
/- DATA --+- 2000 -+- AAA200001_0.wav            recordings
          |        +- AAA200001_1.wav
          |        +- AAA200001_0.par            BPF file
          |        +...
          :
          +- 4999 -+- AAA499901_0.wav
                   +- AAA499901_1.wav
                   +- AAA499901_0.par

NOTE 1: the audio files may be distributed in a gzip-compressed archive
file split into separate chunks. These chunks are named audio_xx with
xx a two-character code starting at "aa", followed by "ab", "ac", etc.

To concatenate the chunks, use the following command:

% cat {all chunk files in correct order separated by blanks} > audio.tgz

Then decompress and extract the compressed tar file.
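
On systems without cat, the same steps can be sketched in Python. This
is an illustration under assumptions: the function names join_chunks and
extract are hypothetical, and the chunks are assumed to lie in one
directory and to sort correctly by name (audio_aa, audio_ab, ...).

```python
import shutil
import tarfile
from pathlib import Path


def join_chunks(chunk_dir, archive):
    """Concatenate audio_aa, audio_ab, ... into one archive file."""
    chunks = sorted(Path(chunk_dir).glob("audio_??"))
    if not chunks:
        raise FileNotFoundError(f"no audio_xx chunks in {chunk_dir}")
    with open(archive, "wb") as out:
        for chunk in chunks:
            with open(chunk, "rb") as src:
                shutil.copyfileobj(src, out)
    return chunks


def extract(archive, target):
    """Decompress and unpack the gzip-compressed tar file."""
    Path(target).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
```

A call such as join_chunks(".", "audio.tgz") followed by
extract("audio.tgz", "DATA") corresponds to the two steps above.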


NOTE 2: the file name extension mappings are

.WAV     RIFF WAVE audio file, mono, 22.05kHz, 16bit PCM
.PDF     Adobe Portable Document Format
.PL      perl script
.TXT     UTF-8 plain text file with Unix line breaks (line feed)
.TBL     tab-delimited UTF-8 table file with Unix line breaks (line feed)
.PAR     BAS Partitur Format (BPF) file (7 bit ASCII) with tiers
         ORT,KAN,TRN,MAU
         (see http://www.bas.uni-muenchen.de/Bas/BasFormatseng.html;
         a copy of this file can be found in DOC/HTML)

NOTE 3: BPF files
BPF files are only provided for channel 0, since both channels are
synchronous.
The ORT tier (orthographic) of the BPF files contains two labels that are
not regular words but nevertheless represent sounds produced by the vocal
apparatus of the speaker and are therefore allowed in the ORT tier:
'[spk]' : non-speech sounds produced by the speaker, e.g. cough, throat
          clear, blow, laugh... (these sounds are labelled by the speech
          garbage model '<usb>' in the MAU tier)
'[fil]' : hesitation (these sounds are modelled by the phoneme sequence
          /QE:/ in the MAU tier)
Furthermore, the ORT tier contains all word fragments of the orthographic
transcription. In the transcription these are usually marked with either
'*' or '~'; in the ORT tier they appear without the markers.
Please note that neither the two special labels nor the fragments are
part of the official pronunciation dictionary TABLE/LEXICON.TBL.
The table TABLE/LEXICON.ORT provides a complete list of the items used
in the BPF ORT tiers.
The TRN tier contains the start and duration of the utterance in samples.
The MAU tier contains a phonemic segmentation of the signal based on the 
orthographic transcript and the utterance segmentation. The phonemic 
inventory is that of the extended German SAM-PA as defined in 
DOC/GermanSAMPA.txt. Garbage sounds such as cough, laugh etc. are modelled 
by the symbol '<usb>'.
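
As an illustration of how the tiers can be read, the following Python
sketch groups the body lines of a BPF file by tier label. It is a sketch
under assumptions, not a reference parser: it assumes that the header
ends with a line starting with "LBD:", that every body line has the form
"TIER: field field ...", and that MAU lines carry the four fields start,
duration, word index and label, as described in the BAS format
documentation. The function names are hypothetical.

```python
from collections import defaultdict


def read_bpf_tiers(text):
    """Return {tier label: list of field lists} for the body of a BPF file."""
    tiers = defaultdict(list)
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the LBD: line is header.
            in_body = line.startswith("LBD:")
            continue
        label, sep, rest = line.partition(":")
        if sep and rest.strip():
            tiers[label.strip()].append(rest.split())
    return dict(tiers)


def mau_segments(tiers):
    """Convert MAU entries to (start, duration, word index, label) tuples.

    Start and duration are given in samples.
    """
    return [(int(start), int(duration), int(word), label)
            for start, duration, word, label in tiers.get("MAU", [])]
```

The tiers of one file can then be obtained with
read_bpf_tiers(open(path, encoding="ascii").read()); entries labelled
'<usb>' in the MAU tier can be filtered out by their label.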


The following directories contain documentation and related information:

DOC    : GermanSAMPA.txt         Extended German SAM-PA table
         TRANSCRIPTION.PDF       validation and transcription handbook
         at3031_english.pdf      Audio-Technica AT3031 data sheet
         060628_MPre_UG_EN01.pdf M-Audio mobile pre user guide
         Opus54_DB_E.pdf         Beyerdynamic opus 54 data sheet
         
SOURCE : contains the following Unix-formatted ISO 8859-1 file

         DEFTEST.PL   perl script to define training and test sets

		 
TABLE  : contains the following UTF-8 encoded plain text files

         CONTENTS.TBL the prompts and annotations file with
                      tab-delimited fields
                      
                      SESSION ITEMCODE DESCRIPTION PROMPT SEGMENT_BEGIN SEGMENT_END ANNOTATION
                      
                      SEGMENT_BEGIN and SEGMENT_END are given in milliseconds
                      
         LEXICON.TBL  the lexicon file covering regular words
                      with the following tab-delimited fields

                      ORTHOGRAPHY FREQUENCY SAM-PRONUNCIATION

                      Note that the pronunciation is coded in extended
                      German SAMPA as defined in DOC/GermanSAMPA.txt

         LEXICON.ORT  the lexicon file also covering word fragments
                      with the following tab-delimited fields

                      ORTHOGRAPHY SAM-PRONUNCIATION

                      Note that the pronunciation is coded in extended
                      German SAMPA as defined in DOC/GermanSAMPA.txt


         METADATA.TBL the speaker information file with the following
                      tab-delimited fields

                      SESSION	ZIPCODE	CITY	REC_DATE	REC_TIME	SEX	DIALECT	SMOKER	HEIGHT	WEIGHT	AGE	COUNT

                      This file is used to generate the training and
                      test sets.

         PH100TRN.TBL 664 session numbers for training set

         PH100TST.TBL 200 session numbers for test set
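
The tables can be read with any tool that handles tab-delimited text.
The following Python sketch is an illustration only. It assumes that
CONTENTS.TBL has the seven fields listed above, that the set definition
files contain one session number per line, and that a header row, if
present, starts with the word SESSION; none of this is guaranteed by
this document. All function names are hypothetical.

```python
import csv

CONTENTS_FIELDS = ["SESSION", "ITEMCODE", "DESCRIPTION", "PROMPT",
                   "SEGMENT_BEGIN", "SEGMENT_END", "ANNOTATION"]
SAMPLE_RATE = 22050  # Hz, see the .WAV entry in NOTE 2


def read_contents(lines):
    """Yield one dict per CONTENTS.TBL row from an iterable of lines."""
    reader = csv.DictReader(lines, fieldnames=CONTENTS_FIELDS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["SESSION"] == "SESSION":
            continue  # skip a header row if the file has one
        yield row


def ms_to_samples(milliseconds):
    """Convert a SEGMENT_BEGIN or SEGMENT_END value to a sample index."""
    return round(float(milliseconds) * SAMPLE_RATE / 1000)


def read_sessions(lines):
    """Collect session numbers from PH100TRN.TBL or PH100TST.TBL lines."""
    return {line.strip() for line in lines if line.strip()}


def split_rows(rows, train_sessions, test_sessions):
    """Partition CONTENTS.TBL rows into training and test rows."""
    rows = list(rows)
    train = [r for r in rows if r["SESSION"] in train_sessions]
    test = [r for r in rows if r["SESSION"] in test_sessions]
    return train, test
```

The files should be opened with encoding="utf-8" and newline="" before
being passed to read_contents.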


3) SIGNAL QUALITY

The following recording equipment was used:

1) Beyerdynamic opus 54 close-talk microphone
2) Audio-Technica AT3031 table top microphone
3) M-Audio mobile pre USB A/D converter

The recording quality is 22.05 kHz 16 bit, and the files are in mono
.wav format. The file suffix "_0" identifies the close-talk microphone
channel, the suffix "_1" the table top microphone.
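
The stated signal format can be verified with the Python standard
library. This is a minimal sketch; the function name
has_expected_format is hypothetical.

```python
import wave

EXPECTED = (1, 22050, 2)  # channels, sampling rate in Hz, bytes per sample


def has_expected_format(path):
    """Return True if the file is mono, 22.05 kHz, 16 bit PCM."""
    with wave.open(str(path), "rb") as wav:
        actual = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
    return actual == EXPECTED
```

The function can be applied to both the "_0" and the "_1" file of a
recording.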

4) INSTRUCTIONS TO RECORDING STAFF AND SPEAKERS

A description of the instructions to the recording staff is given in the
file DOC/HTML/manual_eng.html.


5) HISTORY

2008-08-19 : edition 1.0.1 via BAS
2008-08-26 : edition 1.0.2 added BAS Partitur Format files (BPF) 
             with MAUS segmentation (tiers ORT,KAN,TRN,MAU)

6) REFERENCES

Papers on Ph@ttSessionz and the tools used (SpeechRecorder and
WebTranscribe) were published at numerous international conferences:


@inproceedings{Draxler_2006_b,
	Address = {St. Petersburg},
	Author = {Chr. Draxler},
	Booktitle = {Proc. of Specom},
	Title = {Web-Based Speech Data Collection and Annotation},
	Year = {2006}}


@inproceedings{Draxler_Jaensch_2006,
	Address = {Genova},
	Author = {Chr. Draxler and K. J{\"a}nsch},
	Booktitle = {Proc. of LREC},
	Title = {Speech Recordings in Public Schools in {Germany} - the Perfect Show Case for Web-based Recordings and Annotation},
	Year = {2006}}


@inproceedings{Draxler_2006_a,
	Address = {Pittsburgh, PA},
	Author = {Chr. Draxler},
	Booktitle = {Proc. of Interspeech},
	Title = {Exploring the Unknown -- Collecting 1000 speakers over the Internet for the Ph@ttSessionz Database of Adolescent Speakers},
	Year = {2006}}


@inproceedings{Draxler_2005,
	Address = {Karlsbad, Czech Republic},
	Author = {Chr. Draxler},
	Booktitle = {Proceedings of TSD 2005},
	Title = {WebTranscribe -- An Extensible Web-based Speech Annotation Framework},
	Year = {2005}}


@inproceedings{DraxlerJaensch2005,
	Author = {Chr. Draxler and K. J{\"a}nsch},
	Booktitle = {Proceedings of DAGA 2005},
	Title = {{SpeechRecorder} -- Mehrkanal Sprachaufnahmen {\"u}ber das {WWW}},
	Year = {2005}}


@inproceedings{Steffen_et_al_2005,
	Author = {A. Steffen and Chr. Draxler and A. Baumann and S. Schmidt},
	Booktitle = {Proceedings of DAGA 2005},
	Title = {{Ph@ttSessionz}: {A}ufbau einer {D}atenbank mit {J}ugendsprache},
	Year = {2005}}


@inproceedings{Draxler_Steffen_2005,
	Address = {Lisbon, Portugal},
	Author = {Chr. Draxler and A. Steffen},
	Booktitle = {Proceedings of Interspeech 2005},
	Title = {Ph@ttSessionz: Recording 1000 Adolescent Speakers in Schools in Germany},
	Year = {2005}}


@inproceedings{DraxlerJaensch2004,
	Address = {Lisbon},
	Author = {Chr. Draxler and K. J{\"a}nsch},
	Booktitle = {Proceedings of the 4th Intl. Conference on Language Resources and Evaluation},
	Pages = {559-562},
	Title = {SpeechRecorder -- a Universal Platform Independent Multi-Channel Audio Recording Software},
	Year = {2004}}


@inproceedings{Draxler1998,
	Address = {Granada},
	Author = {Chr. Draxler},
	Booktitle = {Proceedings of LREC},
	Title = {{WWWSigTranscribe} -- A {J}ava {E}xtension of the {WWWTranscribe} {T}oolbox},
	Year = {1998}}


@inproceedings{Draxler1997,
	Address = {Rhodos},
	Author = {Chr. Draxler},
	Booktitle = {Proc. of {Eurospeech}},
	Title = {{WWWTranscribe} -- A {Modular} {T}ranscription {S}ystem {B}ased on the {W}orld {W}ide {W}eb},
	Year = {1997}}

7) KNOWN ERRORS

As mentioned earlier, not all recording sessions contain all 138 prompted
recordings. Missing items are primarily caused by errors during the
recording session. Please refer to the annotation table TABLE/CONTENTS.TBL
for a precise list of the recordings contained in each session of this
edition.
Version 1.0.X : 281 recordings are missing (37152-36871)
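
The figures in parentheses can be checked with a short calculation. The
reading assumed here is that 37152 is the number of recordings expected
for 864 sessions with 43 items each, and 36871 the number actually
present; this interpretation is an assumption, only the arithmetic is
certain.

```python
SESSIONS = 864         # speakers, one session each (see OVERVIEW)
ITEMS_PER_SESSION = 43  # item codes 43-85 in version 1.0.X

expected = SESSIONS * ITEMS_PER_SESSION  # recordings if nothing were missing
present = 36871                          # second figure given above
missing = expected - present
```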

The word 'Sie' (polite form of address) is transcribed as 'sie' in the
transcriptions. It therefore cannot be distinguished from the regular
German pronoun 'sie'.

The following BPF files contain corrupt TRN and MAU tiers caused by
incorrect manual segmentation of the utterance:
AAA207965_0.par
AAA209863_0.par
AAA214249_0.par
AAA215059_0.par
AAA215075_0.par
AAA215144_0.par
AAA215153_0.par
AAA215169_0.par
AAA215274_0.par
AAA223370_0.par
AAA223657_0.par
AAA223668_0.par
AAA224175_0.par
AAA228573_0.par
AAA230066_0.par
AAA238985_0.par
AAA241451_0.par
AAA243053_0.par
AAA243055_0.par
AAA243470_0.par
AAA259945_0.par
AAA261451_0.par
AAA263760_0.par
AAA263853_0.par
AAA264855_0.par
AAA272475_0.par
AAA276961_0.par
AAA342251_0.par
AAA344544_0.par
AAA345347_0.par
AAA345366_0.par
AAA350354_0.par
AAA350375_0.par
AAA355446_0.par
AAA364760_0.par
AAA364849_0.par
AAA372956_0.par
AAA372968_0.par
AAA376952_0.par
AAA377878_0.par
AAA384680_0.par
AAA385077_0.par
AAA385273_0.par
AAA386161_0.par
AAA387461_0.par
AAA388655_0.par
AAA388667_0.par
AAA389356_0.par
AAA389559_0.par
AAA389748_0.par
AAA389964_0.par
AAA389969_0.par
AAA389975_0.par
AAA394375_0.par
AAA450954_0.par
This will be fixed in one of the next versions; the complete set of BPF
files will then be available for download from our server (users will be
notified by email).
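
Until then, the affected files can be skipped when processing the BPF
files. The following Python sketch is an illustration only: the function
name usable_bpf_files is hypothetical, and the set CORRUPT_BPF shows just
the first two names; it has to be extended with the full list above.

```python
from pathlib import Path

# First two entries of the list above; add the remaining names before use.
CORRUPT_BPF = {"AAA207965_0.par", "AAA209863_0.par"}


def usable_bpf_files(data_dir, corrupt=CORRUPT_BPF):
    """Return all .par files below data_dir that are not listed as corrupt."""
    skip = {name.lower() for name in corrupt}
    return sorted(path for path in Path(data_dir).rglob("*.par")
                  if path.name.lower() not in skip)
```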