                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                  BAVARIAN ARCHIVE FOR SPEECH SIGNALS

              University of Munich, Institute of Phonetics
              Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


                      BITS UNIT SELECTION CORPUS
                           DVD-ROM Database
                          
                          
                         Copyright(C) 2005 by
                 Bavarian Archive for Speech Signals
                     University of Munich, Germany


Version 1.7
Short Name: BAS BITS-US
Corpus Date: 2006/06/08
Modification Date: 2006/08/25

The BITS synthesis corpus consists of two parts: a set of logatome
recordings for controlled diphone synthesis and a set of sentence
recordings for unit selection techniques. BITS stands for "BAS
Infrastructures for Technical Speech Processing" and was funded by
the German Ministry of Science and Education during 2003-2005.  

This README file reports on the BITS Unit Selection Corpus which consists of
6732 recordings stored on 4 DVDs. Each DVD contains the recordings, the
annotation files and the meta data files of one of the four professional
speakers, and the entire corpus' documentation.

Note that all documentation files are encoded in UTF-8 unless stated
otherwise.


Table of contents
-----------------
1.) Introduction
2.) Speakers
2.1.) Recruitment of Speakers
2.2.) Speakers Profile
3.) Recording
4.) Recording Procedure
5.) Annotation Files
5.1.) Phonemic Annotation
5.2.) Prosodic Annotation
6.) File Nomenclature
7.) Structure of each DVD
8.) Other Documentation / Meta Data Files
9.) Contact
10.) History


Introduction
------------

The BITS Unit Selection Corpus consists of a set of sentence recordings for
concatenative 'unit selection' speech synthesis.

Speech synthesis using concatenative techniques is maturing to a point
where standard procedures are being implemented in a variety of products.
However, because of the considerable costs, most small and medium-sized
companies as well as university labs cannot afford to produce the required
speech resources on their own. Although some public domain German diphone
voices are available for research purposes (e.g. MBROLA), there is a clear
lack of publicly available synthesis resources. The BITS synthesis corpus,
recorded and produced by BAS, is intended to fill this gap. The work was
funded by the German Ministry of Education and Science (grant no. 01 IV B01).


Speakers
--------

Recruitment of speakers:
------------------------
45 speakers were invited to a casting. They were asked to read 90
logatomes that contained a subset of our diphone set, so that three target
sentences covering nearly all German phonemes could be synthesised. Based
on a ranking according to naturalness and pleasantness, 10 speakers were
selected as nominees. After an overall evaluation by specialists in speech
synthesis and by the BITS group, the best four speakers (two male and two
female) were chosen for the final recordings. More information about the
recruitment of the speakers can be found in:
/DOC/HTML/Specification_Unit_Selection_corpus.pdf

Speakers Profile:
-----------------
Four professional speakers, aged between 40 and 45, were recorded.
All speakers were of German nationality and had foreign language
competence in at least English. More information about the speakers can be
found in the table /DOC/SPRK.TBL.

The table has the following tab-separated columns:
ID          : speaker id (SES200[1-4])
Sex         : M = male, W = female
Age         : age of the speaker at the time of the recordings
Name        : full name
Nationality : nationality of the speaker
Size        : body height in cm
Weight      : weight in kg
ACC         : accent of the speaker, determined by the federal state
              in which the speaker started school
Edu         : education of the speaker
PoL         : current place of living
Prof        : current occupation
FL          : foreign languages
              ENG - English
              FR  - French
              I   - Italian
              EL  - Greek
Smk         : smoker (y = yes, n = no, cas = occasionally)
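
The column layout above can be read with a few lines of Python. This is
only a minimal sketch and not part of the corpus tools: it assumes that
the 13 columns appear in the order listed, that the file has no header
row and that exactly one tab separates two fields. The sample row is
invented.

```python
import csv
import io

# Column names in the order listed above (assumed to match the file).
COLUMNS = ["ID", "Sex", "Age", "Name", "Nationality", "Size", "Weight",
           "ACC", "Edu", "PoL", "Prof", "FL", "Smk"]

def read_speaker_table(stream):
    """Return one dict per non-empty, tab-separated row."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, (f.strip() for f in row)))
            for row in reader if any(f.strip() for f in row)]

# Invented sample row, for illustration only.
sample = ("SES2001\tM\t42\tN. N.\tGerman\t180\t80\tBY\tUniversity"
          "\tMunich\tSpeaker\tENG\tn\n")
speakers = read_speaker_table(io.StringIO(sample))
print(speakers[0]["ID"], speakers[0]["Sex"], speakers[0]["FL"])
```

For the real table, pass a file opened with encoding="utf-8" instead of
the StringIO object.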

Recording:
----------
The speech signal was recorded in three channels (0 : headset microphone,
1 : laryngograph signal and 2 : room microphone). The sampling rate is
48 kHz with 16 bit quantization. All signals were recorded via a Yamaha 02R
digital sound mixer directly to hard disk using the multi-channel recording
software SpeechRecorder
(www.phonetik.uni-muenchen.de/Bas/software/speechrecorder/).

 - Channel 0 : close-talk microphone (Beyerdynamic NEM 192) positioned 7 cm
   to the right of the mid-sagittal plane at the height of the upper lip.
 - Channel 1 : laryngograph signal (LaryngoGraph PCLX)
 - Channel 2 : large membrane condenser microphone (Neumann Type TLM 103)
   60 cm from the mouth.

The channels were separated into standard WAV format files; no further
processing was performed, to avoid any undesired degradation of the signals.
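
The stated signal format can be checked with the Python standard library.
The following is a minimal sketch, not a corpus tool; the helper names
describe_wav and matches_corpus_format are made up for this example, and
the self-test uses a synthetic file of silence rather than corpus data.

```python
import os
import struct
import tempfile
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return (w.getnchannels(), rate, 8 * w.getsampwidth(),
                w.getnframes() / rate)

def matches_corpus_format(path):
    """True if the file is 48 kHz, 16 bit, as stated above."""
    _, rate, bits, _ = describe_wav(path)
    return rate == 48000 and bits == 16

# Self-test on a synthetic one-second file of silence (not corpus data).
demo = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(struct.pack("<h", 0) * 48000)
print(describe_wav(demo), matches_corpus_format(demo))
```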
 

Recording Procedure:
--------------------

The speaker was seated in an insulated room with low reverberation.  The
positions of the chair and room microphone were marked on the floor.
Before the recordings the speaker was asked to put on the headset
microphone and the laryngograph electrodes.  During the session the speech
prompts were displayed through a window using the program "SpeechRecorder".
Three supervisors monitored the recording, and a prompt was repeated until
all three supervisors agreed on its quality.
More information about the recording procedure can be found in:
/DOC/HTML/Preparation_and_Execution_of_Recordings_8_2.pdf

Annotation Files
----------------

Results of manual annotation are stored in the directory ANNOT/SES####
(#### = speaker number) in three different file formats:
- BAS Partitur Format (BPF, *.par) with the following tiers:
    ORT : orthographic representation of the prompted sentence
    KAN : canonical pronunciation (SAM-PA) of the sentence
    SAP : segmentation of phonemes in augmented German SAM-PA
          (see details below)
    PRM : prosodic labelling in GToBI 'light' (see below)
    Note that the label for lip smack (§) is replaced by '\S' in the
    BPF files.
    (see DOC/README.PAR for more details)
- Annotation Graphs (XML, *.ags)
    This is basically the same information in XML form. See the ag.dtd
    and metadata.dtd in directory DOC as well as
    http://agtk.sourceforge.net/ for details.
- TextGrid (Praat, *.TextGrid)
    The original segmentation results in the Praat format. The two tiers
    contain the phonemic and prosodic labelling.
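
As a starting point for reading the BPF files, the lines can be grouped by
tier. The sketch below assumes only that each line starts with a tier key
followed by ':' (as in 'ORT:' or 'KAN:'); it does not interpret the fields
of a tier, whose layout is described in DOC/README.PAR. The sample
fragment is invented and the function names are made up for this example.

```python
from collections import defaultdict

def group_bpf_tiers(text):
    """Group the lines of a BPF file by their tier key (e.g. 'ORT')."""
    tiers = defaultdict(list)
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip():
            tiers[key.strip()].append(rest.strip())
    return dict(tiers)

def restore_lip_smack(label):
    """Map the BPF escape '\\S' back to the lip smack label '§'."""
    return label.replace("\\S", "§")

# Invented fragment for illustration; not copied from the corpus.
sample = "ORT: 0 guten\nORT: 1 Tag\nKAN: 0 g|'u:|t|@|n\nKAN: 1 t|'a:|k\n"
tiers = group_bpf_tiers(sample)
print(tiers["ORT"])
```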

Phonemic Annotation:
--------------------

For the phonetic annotation all sentences were segmented in a first pass
with MAUS into German SAM-PA. (More about the SAMPA encoding used for the
annotation under /DOC/HTML/Conventions_for_segmentation_8_5e.pdf)

Phonemic annotations are stored both in the original Praat TextGrid
files and in the SAP tier of the BAS Partitur Format (BPF) files.

In a second pass a group of ten to twelve trained phoneticians manually
corrected the pre-segmented sentences.
In a third pass three phoneticians whose segmentations were consistent with
each other corrected the segmentations again.
In a last step all segmentations were reviewed by the team supervisor.

The following rules of annotation were used:
- the placing of boundaries is primarily based on auditory judgement.
- the boundaries of segments are always placed at positive zero-crossings
  of the oscillogram.
- the placement of the boundaries is checked against sonagram and
  oscillogram.
- in transitions where both of two adjacent phonemes can be heard, the
  boundary is placed in the middle of this transition (50% rule).
- voiced (periodic) elements start with the first clearly identifiable
  glottal pulse.
- the boundaries of segments with low intensity (e.g. /h/, aspiration) are
  set where the signal can be clearly distinguished from the background
  noise.
- breathing noises, if clearly recognised, are cut off from the
  frication or aspiration.
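
The zero-crossing rule can be illustrated with a small sketch. It is not
the procedure the annotators used (they worked manually); it only shows
how a boundary candidate could be moved to the nearest positive
zero-crossing. A positive zero-crossing is assumed here to be a sample
index where the signal changes from a negative value to a value >= 0.

```python
def positive_zero_crossings(samples):
    """Indices i where the signal rises through zero between i-1 and i."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0 <= samples[i]]

def snap_to_positive_zero_crossing(samples, index):
    """Return the positive zero-crossing nearest to index.

    If the signal has no positive zero-crossing, index is returned as is.
    """
    crossings = positive_zero_crossings(samples)
    if not crossings:
        return index
    return min(crossings, key=lambda c: abs(c - index))

# Toy signal, for illustration only.
signal = [3, 1, -2, -4, -1, 2, 5, 2, -3, -1, 4]
print(positive_zero_crossings(signal))
print(snap_to_positive_zero_crossing(signal, 7))
```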

Special labels aside from standard German SAM-PA:
Q : glottal stop (SAM-PA: ?)
~ : preceding vowel is nasalized
§ : preceding phoneme contains an audible lip smack
    (replaced by '\S' in the BPF annotation files)
q : preceding vowel is glottalized

More information about the annotation of the corpus can be found in:
/DOC/HTML/Conventions_for_segmentation_8_5e.pdf

Prosodic Annotation:
--------------------

For the prosodic annotation of the BITS-US corpus a reduced subset of the
GToBI set was used (GToBI Light). This choice was motivated by the
experience of colleagues at IMS Stuttgart with the reproducibility
of the GToBI tag set across different human labellers.

The reduced set includes the following tags:

Boundaries:	-	intermediate phrase boundary (ip)
		% 	general phrase boundary or intonation phrase boundary (IP)
		H%	high boundary tone IP
		-?	uncertain: ip present ?
		%?	uncertain: - or % ?
		H%?	uncertain: % or H% ?

Accents:	H*L	fall
		L*H	rise
		H*	high target on accented syllable
		..L	low trail tone
		*L	low target on accented syllable
		..H	high trail tone
		*?	uncertain: accentuation ?
		x?	uncertain about label x

More information about the label types, rules of placement, the
annotation procedure and certain problem types can be found in:
/DOC/HTML/Prosodic_annotation_8_4e.pdf
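
The reduced tag set is small enough to be checked mechanically. The
sketch below classifies a single tag as boundary or accent and reports
whether it carries the uncertainty marker '?'. It assumes that each label
in the annotation files is exactly one of the tags listed above, without
extra whitespace; the function name is made up for this example.

```python
BOUNDARIES = {"-", "%", "H%"}
ACCENTS = {"H*L", "L*H", "H*", "..L", "*L", "..H"}

def classify_label(label):
    """Return (kind, uncertain) for one GToBI Light tag.

    kind is 'boundary', 'accent' or 'unknown'.
    """
    uncertain = label.endswith("?")
    core = label[:-1] if uncertain else label
    if core in BOUNDARIES:
        return ("boundary", uncertain)
    # '*?' marks an uncertain accentuation without a tone label.
    if core in ACCENTS or (uncertain and core == "*"):
        return ("accent", uncertain)
    return ("unknown", uncertain)

for tag in ["H*L", "H%?", "*?", "-"]:
    print(tag, classify_label(tag))
```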


File Nomenclature:
------------------

The names of both audio and annotation files have the following form:

US####%%%%_$   with   #### : speaker id 1001 - 1004
                             (corresponds to the id 2001 to 2004 in the
                              BITS logatome corpus)
                      %%%% : sentence id (see table /DOC/BITS-US.TBL)
                      $    : channel 0 - 2

File name extension mappings:

.TextGrid       Praat label file with interval tiers
.wav            Audio file
.txt            Text file
.html           HTML file
.par            BAS Partitur Format file
.ags            Annotation Graph (XML) file
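
The naming scheme above can be split into its parts with a regular
expression. This is a minimal sketch; the names CorpusName and parse_name
are made up for this example, and the file name used is invented.

```python
import re
from typing import NamedTuple, Optional

class CorpusName(NamedTuple):
    speaker: str   # four digits, e.g. '1001'
    item: str      # four-digit id from BITS-US.TBL
    channel: int   # 0, 1 or 2

_NAME = re.compile(r"^US(\d{4})(\d{4})_([0-2])(?:\.\w+)?$")

def parse_name(filename: str) -> Optional[CorpusName]:
    """Split 'US####%%%%_$' (with or without extension); None if no match."""
    m = _NAME.match(filename)
    if m is None:
        return None
    return CorpusName(m.group(1), m.group(2), int(m.group(3)))

# Invented file name, for illustration only.
print(parse_name("US10010042_0.wav"))
```

The ids are kept as strings so that leading zeros are preserved.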


Structure of each DVD
---------------------

Each DVD contains the following:

README          : this file
DATA/           : the recordings of one of the four speakers
ANNOT/          : the annotation files of one of the four speakers:
                  Praat annotation files (*.TextGrid), BPF files (*.par),
                  Annotation Graph files (*.ags)
DOC/            : documentation files


The DOC/ directory contains the following:

README.PAR      : brief documentation of the BAS Partitur Format (BPF)
SPRK.TBL        : speaker profiles (see above)
KNOWN-ERRORS    : list of known errors that cannot be fixed

BITS-US.TBL     : sentence corpus list (see below)
BITS-US.XML     : sentence corpus with additional information (see below)
README.BITS-US  : documentation of the sentence lists
LEXICON.TBL     : pronunciation dictionary (see below)
MAPPING.TBL     : mapping of orthographic strings between prompted
                  and 'spelled out' orthography (see below)
PHONEME.TBL     : list of phonemic SAM-PA symbols (see below)

DOC.HTML        : start of main documentation
HTML/           : main documentation files

PUBLICATIONS/   : publications


Other Documentation / Meta Data Files
-------------------------------------

BITS-US.TBL - sentence corpus list

This file contains a 4-column table describing the recorded sentences
of the corpus:

<ID>;<KAN>;<PROMPT>;<ORT>

where:  <ID>          : sentence id 0001-1683
        <KAN>         : SAM-PA transcript of the sentence; words are
                        separated by a blank, phonemes by a '|';
                        ' denotes the lexical accent
        <PROMPT>      : prompt text as displayed to the speakers
        <ORT>         : orthographic form of the sentence;
                        in contrast to <PROMPT>, no punctuation is
                        used and numbers and abbreviations are spelled
                        out. The file MAPPING.TBL contains a list of
                        mappings between <PROMPT> and <ORT>.
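
A line of this table can be split as follows. This is a minimal sketch
under the assumption that none of the four fields contains a ';' itself
(which may not hold for every prompt) and that phonemes within a word are
separated by '|', as introduced in version 1.5. The sample line is
invented and the function name is made up for this example.

```python
def parse_sentence_line(line):
    """Split one '<ID>;<KAN>;<PROMPT>;<ORT>' line into a dict.

    Assumes that none of the fields contains a ';' itself.
    """
    fields = line.rstrip("\r\n").split(";")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    sent_id, kan, prompt, ort = fields
    # Words are separated by blanks, phonemes within a word by '|'.
    words = [w.split("|") for w in kan.split()]
    return {"id": sent_id, "kan": words, "prompt": prompt, "ort": ort}

# Invented line for illustration; not taken from the corpus.
entry = parse_sentence_line("0001;g|'u:|t|@|n t|'a:|k;Guten Tag!;guten Tag\n")
print(entry["id"], entry["kan"])
```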

BITS-US.XML - sentence corpus list

This file contains the sentence corpus together with additional information 
such as 'part-of-speech' tags and prosodic hypothesis labelings.
Also see README.BITS-US.


LEXICON.TBL - pronunciation dictionary

This file contains a two-column list of orthographic words and their
respective canonical pronunciation coded in extended German SAM-PA:
<PROMPT>	<KAN>
Please note:
- the orthographic string is coded as in the prompt text displayed to the
  speakers; to obtain a spelled-out version of the orthographic string use
  the mapping table in MAPPING.TBL
- the SAM-PA coding follows the extended German SAM-PA coding as described in
  /DOC/HTML/Conventions_for_segmentation_8_5e.pdf
  Phonemes are separated by a '|';
  lexical accents in non-function words are coded by a preceding '.
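
A lexicon entry can be decomposed along these lines. The sketch assumes
that the two columns are separated by a tab and that the accent mark '
is attached to the beginning of the accented phoneme symbol; neither is
guaranteed by this description. The entries are invented and the function
name is made up for this example.

```python
def parse_lexicon_line(line):
    """Return (word, phonemes, accent_index) for one '<PROMPT>\\t<KAN>' line.

    accent_index is the position of the phoneme carrying the accent
    mark, or None if the entry has no accent mark.
    """
    word, kan = line.rstrip("\r\n").split("\t", 1)
    phonemes, accent_index = [], None
    for i, p in enumerate(kan.strip().split("|")):
        if p.startswith("'"):
            accent_index = i
            p = p[1:]
        phonemes.append(p)
    return word, phonemes, accent_index

# Invented entries for illustration; not taken from the corpus.
print(parse_lexicon_line("Botschaft\tb|'o:|t|S|a|f|t"))
print(parse_lexicon_line("und\tU|n|t"))
```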


PHONEME.TBL - list of phonemic SAM-PA symbols


Contact
-------

For questions, remarks, bug reports etc. please contact
Florian Schiel          schiel@bas-services.de
                        +49-89-2180-5751

History
-------

15.03.06 : Version 1.0
27.04.06 : Version 1.1 : Documentation re-worked
                         BAS Partitur Format files added
03.05.06 : Version 1.3 : changed all plain text files to UTF-8 coding
                         Documentation re-worked
                         meta files removed
08.06.06 : Version 1.4 : bug fix in LEXICON.TBL :
                         /pfuI/ -> /pfUI/
                         /be:'Qa:t@/ -> /be:Q'a:t@/
                         added phoneme list DOC/PHONEME.TBL
                         changed canonical SAM-PA transcripts in
                          - LEXICON.TBL
                          - KAN tier of BPF files (*.par)
                          - BITS-US.TBL
                          - BITS-US.XML
                         in such a way that the phonemic units are
                         separated by blanks, e.g.
                         /bo:tSaft6/ -> /b o: t S a f t 6/
12.07.06 : Version 1.5 : inserted phoneme separator '|' in
                         - BPF files (tier KAN)
                         - LEXICON.TBL
                         - BITS-US.TBL
                         - BITS-US.XML
09.08.06 : Version 1.6 : Bug fix: the BPF files contained an 8-bit
                         character '§' denoting the 'lip smack'.
                         This did not conform to BPF and was replaced by
                         the corresponding LaTeX code '\S'.
25.08.06 : Version 1.7 : Bug fix in AGS file creation; all *.ags files
                         were regenerated