BAS
Bavarian Archive for Speech Signals
File Formats


This page was last updated 01/09/04

This page contains descriptions and definitions of the BAS file formats.
An extensive overview of all existing linguistic annotation systems can be found here.

Signal Files

PhonDat 1
PhonDat 2
NIST SPHERE

Segment/Label Files

S0-Format (Word Label)
S1-Format (Phoneme Label)
S2-Format (Automatic Segmentation)
BAS Partitur Format

General
History
Definition of Structure
Definition of Tiers

'Canonical Pronunciation' - KAN
Orthography - ORT
Verbmobil Transliteration - TRL
Verbmobil Transliteration II - TR2
Superimposed Speech - SUP
Phonetic Segmentation PhonDat - PHO
Phonetic Segmentation SAM-PA - SAP
Automatic Segmentation SAM-PA - MAU
Word Segmentation - WOR
Dialogact Segmentation - DAS
Prosodic Segmentation - PRB
Symbolic prosodic Segmentation - PRS
Noise Labeling - NOI
Signal-based Prosodic accents labeling - LBP
Signal-based Prosodic boundaries labeling - LBG
Syntactic-prosodic labeling - PRO
Syntactic trees - SYN,FUN,LEX
Parts of Speech - POS
Lemmata - LMA
Phonetic Segmentation IPA - IPA
Segmentation in turns/sentences/chunks/etc - TRN
SmartKom Transliteration - TRS
SmartKom Gesture Labeling - GES
SmartKom User State Labeling holistic - USH
SmartKom User State Labeling by mimic expression - USM
SmartKom User State Labeling Occlusions - OCC
SmartKom Meta Linguistic Features - USP
Translation - TLN

SAM Format
AGS Format

Signal Files

PhonDat 1

Signal files with a PhonDat 1 header contain a binary header of constant length (512 bytes). The signal samples (2 bytes per sample) start after this header and are always in LoHi byte order (Intel format). The header has a defined structure with information such as sampling frequency, resolution in bits, etc. The header is ILS-compatible.

For reading and writing please use the software delivered with the corpus (module header.c).

A detailed description of the binary header structure can be found here.
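As a rough illustration of the layout described above, the following Python sketch skips the 512-byte header and decodes the 2-byte LoHi samples. It does not interpret any header field (use header.c for that), the function name is hypothetical, and treating the samples as signed integers is an assumption not stated on this page.

```python
import struct

HEADER_BYTES = 512  # constant header length of a PhonDat 1 file


def decode_phondat1_samples(data):
    """Return the samples that follow the 512-byte header as a list of ints."""
    payload = data[HEADER_BYTES:]
    count = len(payload) // 2  # 2 bytes per sample; a trailing odd byte is ignored
    # "<" selects LoHi (little-endian) order; "h" is a signed 16-bit integer.
    # Signedness is an assumption of this sketch.
    return list(struct.unpack("<%dh" % count, payload[: 2 * count]))
```

A file would be read in binary mode, e.g. `decode_phondat1_samples(open(path, "rb").read())`.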

PhonDat 2

PhonDat 2 is an extension of the PhonDat 1 format. After the binary header of 512 bytes additional blocks of 512 bytes follow which contain the orthography and canonical transcript of the utterance (SAM-PA).
The PhonDat 2 header can be identified by the version number (2) in the binary part.

For reading and writing please use the software delivered with the corpus (module header.c).

A detailed description of the binary header structure and the following header blocks can be found here.

NIST - SPHERE

The NIST - SPHERE speech header format was defined by the 'National Institute of Standards and Technology, USA'. It is used in many American speech corpora.

A detailed description of the NIST/SPHERE formats can be found here.

Only a few BAS corpora contain data with NIST headers. However, in all BAS corpora you will find tools (nist2pho, pho2nist) to transform PhonDat to NIST and vice versa.

Segment/Label Files

S0 Format

The S0 Format contains word labels of utterances longer than a single word. The format was defined in the German PhonDat project. The label files are in ASCII and have the same prefix as the corresponding signal files. The extension is .S0.

Syntax:


<file> = <Name of segment file> CR
         <Orthography> CR
         oend CR
         <Canonical form> CR
         kend CR
         hend CR
         <list of word segments> 

<list of word segments> = <begin sample> <marker> CR
                                ...

<begin sample> = number of first sample 

<marker> = '#c:' (beginning of first word)  OR
           <canonical word form> (as read from the lexicon)  OR
           '.' (end of last word)

<Name of segment file> = any valid filename

<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:

Umlaut  LaTeX   7 Bit ASCII (dec)       German Char Set (hex)
Ae      "A      [ (91)                  C4
Ue      "U      ] (93)                  DC
Oe      "O      \ (92)                  D6
ae      "a      { (123)                 E4
ue      "u      } (125)                 FC
oe      "o      | (124)                 F6
ss      "s      ~ (126)                 DF

<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. Symbols used are the German subset of
SAM-PA, with the
following changes to SAM-PA:

Q       glottal stop
q       glottalization (not in canonical forms!)
'       main stress
"       secondary stress
#       compositum marker (optional)
+       function word marker (suffix, optional)

Words are separated by two blanks, phonemic labels are separated by
one blank.

Remarks:

Word boundaries are in rising zero crossings only.
Differing pronunciations are not marked.
Pauses or silence not marked.
Missing words are marked with a '-' after the marker. The following word has the same begin sample.
Replaced words are marked with a '-' after the marker. The replacing word has the same begin sample.
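The S0 syntax above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: 'CR' is treated as any line terminator, the keywords oend, kend and hend each stand alone on a line, and the '-' suffix for missing or replaced words is left inside the marker. The function name and the sample content are invented for illustration and do not come from a BAS corpus.

```python
def parse_s0(text):
    """Split the text of an S0 label file into name, orthography, words and segments."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    o_end = lines.index("oend")   # closes the orthography
    k_end = lines.index("kend")   # closes the canonical form
    h_end = lines.index("hend")   # closes the header part
    canonical = "  ".join(lines[o_end + 1:k_end])
    segments = []
    for ln in lines[h_end + 1:]:
        if ln.strip():
            begin, marker = ln.split(None, 1)  # <begin sample> <marker>
            segments.append((int(begin), marker))
    return {
        "name": lines[0],
        "orthography": " ".join(lines[1:o_end]),
        # words are separated by two blanks, phonemic labels by one blank
        "words": [w.split(" ") for w in canonical.split("  ") if w],
        "segments": segments,
    }


# Invented sample, only to show the expected layout.
SAMPLE = (
    "demo.S0\n"
    "guten Tag\n"
    "oend\n"
    "g u: t @ n  t a: k\n"
    "kend\n"
    "hend\n"
    "1200 #c:\n"
    "1200 g u: t @ n\n"
    "5400 t a: k\n"
    "9000 .\n"
)
```

Calling `parse_s0(SAMPLE)` yields the two canonical words as lists of phonemic labels and four (begin sample, marker) pairs.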

S1 Format

The S1 Format contains the phonological segmentation of the utterance. The format was defined in the German PhonDat project. The label files are in ASCII and have the same prefix as the corresponding signal files. The extension is .S1.

Syntax:


<file> = <Name of segment file> CR
         <Orthography> CR
         oend CR
         <Canonical form> CR
         kend CR
         <Transcription> CR
         hend CR
         <list of phoneme segments> 

<list of phoneme segments> = <begin sample> <marker> CR
                                   ...

<begin sample> = number of first sample 

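<marker> = '#c:' (beginning of first word)  OR
           '#p:' (pause) OR
           '#v:' (mis-pronunciation) OR
           <segment> OR
           <word boundary segment> OR
           <compound boundary segment> OR
           <punctuation>

<segment> = $<sampa string> (ordinary segment)

<word boundary segment> = ##<sampa string>

<compound boundary segment> = $#<sampa string>

<sampa string> = any string of <extended
German SAM-PA symbols>

<punctuation> = '#.' OR '#,' OR '#?' OR '#!'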

<Name of segment file> = any valid filename

<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:

Umlaut  LaTeX   7 Bit ASCII (dec)       German Char Set (hex)
Ae      "A      [ (91)                  C4
Ue      "U      ] (93)                  DC
Oe      "O      \ (92)                  D6
ae      "a      { (123)                 E4
ue      "u      } (125)                 FC
oe      "o      | (124)                 F6
ss      "s      ~ (126)                 DF

<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. Symbols used are the German subset of
SAM-PA, with the
following changes to SAM-PA:

Q       glottal stop
q       glottalization (not in canonical forms!)
'       main stress
"       secondary stress
#       compositum marker (optional)
+       function word marker (suffix, optional)

Words are separated by two blanks, phonemic labels are separated by
one blank.

<Extended German SAM-PA symbols> =
See here for a complete table of extended SAM-PA symbols.
In addition to the defined German SAM-PA symbols we use the following
symbols:
~               : nasalization, e.g. E~
Q               : glottal stop (instead of ? in SAM-PA)
'               : canonical main stress (of word)
"               : canonical secondary stress (of word)
q               : glottalization
%               : uncertain boundary, e.g. $%a:
-               : deviation from the canonical form:
                  replacement:  a:-A
                  elision:      a:-
                  insertion:    -A
=               : realization of two syllables in a diphthong, e.g. E:=6
+               : function word (placed after last segment)

Remarks:

Segment boundaries are in rising zero crossings only.
The vocalized /r/ is usually represented by a diphthong/triphthong (consisting of the preceding vowel/diphthong and the vocalized /6/) rather than by two segments. Examples: /d i:6/ (dir), /g e: h OY6/ (geheuer)

S2 Format

The S2 format contains an automatically generated phonological annotation of the signal.
The format is largely the same as PhonDat 1, with the following differences:

The transcript is not given; only the list of segments.
The list of segments has a third column which contains the time-normalized log likelihood of the segment, calculated during the Viterbi alignment.

BAS Partitur Format

General

Most file formats for segmental information about speech signals have the following disadvantages:

they are not easy to extend (without rewriting software that uses the older format).
they are not easy to process with UNIX standard tools.
they mix different description levels (which leads to technical and conceptual problems).

Therefore a new open format based on the SAM Label Format was developed at BAS which avoids most of these problems. In this format all levels of description are described independently but time-aligned, like the individual parts of a score. Hence this format was called 'BAS Partitur Format' ('Partitur' is German for 'score').

In the future all BAS corpora will be distributed with the new BAS Partitur Format if they contain segmental information of any kind. The formerly used formats will be retained but not updated any further.

A first draft of the BAS Partitur Format (1995) can be found here. An up-to-date publication of version 1.2 can be downloaded here (1998).

The BAS Partitur Format has the following features:

SAM compatible structure and entries.
easy to extend by simple UNIX cat
open format, that is extensions to the format can be implemented without necessary alterations to the software reading the older format.
time-aligned independent description of as many different levels of the speech signal as necessary. For instance: orthography, canonical transcript, phonology, phonetics, prosody, dialog acts, syntax tagging, semantics, ...
Symbolic links between the independent levels allow logical assignments aside to the physical time scale. These links are based on the word units of the utterance.

History

1.0   : 01.09.95 Preliminary Definition of the BAS Partitur Format
        PLEASE DO NOT USE THIS VERSION ANY MORE!
1.1   : 01.06.96 First Definition structured in classes
1.2   : 28.08.96 Label ELF: removed from definition
        (tool par-1.1-to-1.2 transforms 1.1 files into 1.2 files)
1.2.1 : ??
1.2.2 : tier DAS added
1.2.3 : tier TR2, SUP added
1.2.4 : tier PRS added
1.2.5 : tier NOI added
1.2.6 : distinction between symbolic links to word groups (list of word
        numbers separated by commas) and symbolic links to events between
        words (e.g. noises; number pairs separated by a semicolon)
        changed class definition of class 1, 4 and 5 accordingly
        changed tier definition NOI
1.2.7 : 12.09.00 Tier LBP and LBG added
1.2.8 : 11.05.01 Tier PRO,POS,LMA,SYN,FUN,LEX added
1.2.9 : 07.08.01 Tier IPA added
1.2.10 : 29.08.01 Tier TRN added
1.2.11 : 28.11.01 Tier TRS added
1.2.12 : 20.07.02 : Tiers GES,USH,USM,OCC,USP added
1.2.13 : 22.10.02 : Tier GES: definition of gestures extended
                    Tier TLN added

Definition of Structure 1.2

A Partitur file has the same prefix as the corresponding signal file (8 bytes for ISO 9660 compatibility) but the extension .par.

The content is exclusively 7-bit ASCII (to guarantee portability to all platforms). Each line starts with a three-byte label followed by a colon, which defines the syntax and semantics of the rest of the line. The following units of the line are separated by 'white space' (blank, tab).

The Partitur file is structured into a header and a body (as SAM description files are). The header stretches from the beginning of the file to the label LBD:; the body from the label LBD: to the end of the file, where the last line has to be closed by a 'new line' (the final SAM label ELF: was omitted from the BAS Partitur Format since it prevents effective processing of the Partitur files).

The header contains SAM-compatible lines of general information. The following entries are compulsory:

LHD: Partitur file version
REP: Place of recording
SNB: Number of Bytes per Sample
SAM: Sampling Frequency in Hz
SBF: Byte order (Intel 01, Motorola 10)
SSB: Bit Resolution
NCH: Number of Channels
SPN: Speaker ID
LBD:

Example:

LHD: Partitur 1.2
REP: Muenchen
SNB: 2
SAM: 16000
SBF: 01
SSB: 16
NCH: 1
SPN: PS1
LBD:

The following entries are optional; other entries are tolerated as long as they do not conflict with the compulsory and optional entries:

FIL: SAM File Type
TYP: Type of SAM Label File
DBN: Corpus Name
VOL: Number of Volume
DIR: Directory in Volume
SRC: Name of speech file
BEG: Beginning of labeling sequence
END: End of labeling sequence
RED: Date of Recording
RET: Duration
RCC: Recording Conditions
CMT: Comment
SPI: Speaker Information
PCF: Name of Protocol File
PCN: Protocol Number
EXP: Name of Segmenter
SYS: Labeling system
DAT: Date of Labeling
SPA: SAM-PA Version

All header labels are SAM-compatible.
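The header rules above (three-byte label, colon, value, terminated by LBD:) can be checked mechanically. The following Python sketch collects the header entries and reports missing compulsory labels. The function name is hypothetical, and keeping only the last value of a repeated label is a simplification of this sketch, not part of the format definition.

```python
COMPULSORY = ("LHD", "REP", "SNB", "SAM", "SBF", "SSB", "NCH", "SPN")


def parse_partitur_header(text):
    """Return (header entries, missing compulsory labels) for a Partitur file."""
    header = {}
    for line in text.splitlines():
        label, colon, value = line.partition(":")
        if not colon:
            continue              # not a "XXX:" line, ignore it
        label = label.strip()
        if label == "LBD":
            break                 # LBD: closes the header, the body follows
        header[label] = value.strip()  # a repeated label overwrites the earlier one
    missing = [label for label in COMPULSORY if label not in header]
    return header, missing


EXAMPLE = """LHD: Partitur 1.2
REP: Muenchen
SNB: 2
SAM: 16000
SBF: 01
SSB: 16
NCH: 1
SPN: PS1
LBD:
"""
```

For the example header shown earlier, `parse_partitur_header(EXAMPLE)` returns all eight compulsory entries and an empty list of missing labels.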

The body starts after the label LBD: and stretches to the end of the file. It contains the tiers of the Partitur. Each tier is identified by a unique label. The order of tiers as well as the order of lines within a tier is not significant.

There are 5 basic classes of tiers:

Class 1: Tiers with symbolic relation
A line of this tier contains:
- the label
- a comma-separated list of integers that link the item to one or more words, or
  a pair of integers separated by a semicolon that refers to an event between those two words
- a string with the label information
The symbolic links (relations) refer to a reference tier which numbers the word units beginning with zero. (The choice of word units as the basis for symbolic relations is arbitrary.)
The string itself may have an internal syntax, which is defined in the tier definition.
Example:
TRL: 6,7 mit'm
NOI: 4;5 #Klopfen

Class 2: Tiers with time relation, time consuming
A line of this tier contains:
- the label
- two integers denoting the beginning and duration of the event
- a string containing the label information
The meaning of the integers is defined by the tier definition (possible units are samples, milliseconds, etc.).
The string itself may have an internal syntax, which is defined in the tier definition.
Example:
SAP: 13456 345 aU

Class 3: Tiers with time relation, not time consuming
A line of this tier contains:
- the label
- one integer denoting the time position of the event
- a string containing the label information
The meaning of the integer is defined by the tier definition (possible units are samples, milliseconds, etc.).
The string itself may have an internal syntax, which is defined in the tier definition.
Example:
PRB: 13456 TON: P*; FUN: PA

Class 4: Tiers with time and symbolic relation, time consuming
A line of this tier contains:
- the label
- two integers denoting the beginning and duration of the event
- a comma-separated list of integers that link the item to one or more words, or
  a pair of integers separated by a semicolon that refers to an event between those two words
- a string containing the label information
The meaning of the integers is defined by the tier definition (possible units are samples, milliseconds, etc.).
The symbolic links (relations) refer to a reference tier which numbers the word units beginning with zero. (The choice of word units as the basis for symbolic relations is arbitrary.)
The string itself may have an internal syntax, which is defined in the tier definition.
Example:
SAP: 13456 345 9 aU

Class 5: Tiers with time and symbolic relation, not time consuming
A line of this tier contains:
- the label
- one integer denoting the time position of the event
- a comma-separated list of integers that link the item to one or more words, or
  a pair of integers separated by a semicolon that refers to an event between those two words
- a string containing the label information
The meaning of the integer is defined by the tier definition (possible units are samples, milliseconds, etc.).
The symbolic links (relations) refer to a reference tier which numbers the word units beginning with zero. (The choice of word units as the basis for symbolic relations is arbitrary.)
The string itself may have an internal syntax, which is defined in the tier definition.
Example:
PRB: 13456 13 TON: P*; FUN: PA

Remarks:

If the symbolic relation in a tier is not (or not yet) known, the symbolic marker is set to -1.
The same symbolic relation may occur in different lines of a tier (for example if a non-articulatory noise occurs before the first utterance).
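The five line layouts can be handled by one small parser once the class of a tier is known. The sketch below numbers the five classes 1 to 5 in the order listed above and takes the tier-to-class table from the tier definitions on this page. The function names and the shape of the returned dictionary are choices of this sketch, not part of the format. Because some class examples above reuse a label with a different layout (e.g. SAP without links), the class can also be passed explicitly.

```python
# Class of each tier as stated in the tier definitions on this page.
TIER_CLASS = {
    "KAN": 1, "ORT": 1, "TRL": 1, "TR2": 1, "SUP": 1, "DAS": 1, "PRS": 1,
    "NOI": 1, "PRO": 1, "SYN": 1, "FUN": 1, "LEX": 1,
    "LBP": 3, "LBG": 3,
    "PHO": 4, "SAP": 4, "MAU": 4, "WOR": 4,
    "PRB": 5,
}
TIME_FIELDS = {1: 0, 2: 2, 3: 1, 4: 2, 5: 1}  # number of leading time integers
LINKED = {1, 4, 5}                            # classes that carry symbolic links


def parse_links(field):
    """'6,7' -> links to words 6 and 7; '4;5' -> event between words 4 and 5."""
    if ";" in field:
        left, right = field.split(";")
        return {"between": (int(left), int(right))}
    return {"words": [int(n) for n in field.split(",")]}


def parse_body_line(line, tier_class=None):
    """Split one body line into label, time integers, symbolic links and label string."""
    label, rest = line.split(":", 1)
    label = label.strip()
    cls = tier_class if tier_class is not None else TIER_CLASS[label]
    n_time = TIME_FIELDS[cls]
    n_fields = n_time + (1 if cls in LINKED else 0)
    parts = rest.strip().split(None, n_fields)
    return {
        "label": label,
        "times": [int(p) for p in parts[:n_time]],
        "links": parse_links(parts[n_time]) if cls in LINKED else None,
        "text": parts[n_fields] if len(parts) > n_fields else "",
    }
```

An unknown symbolic relation (-1) passes through unchanged as the integer -1 in the list of word links.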

Definition of Tiers (version 1.2.2)

'Vorschlagstranskription' (canonical pronunciation) KAN: class 1
Synopsis:
KAN: (symbolic link) (transcript)
This tier contains a list of the spoken words within the utterance annotated in extended German SAM-PA ('canonical pronunciation').
The whole utterance is segmented into word units, where everything that is produced by the articulatory organs of the speaker and can be regarded as 'speech' counts as a word. Under this definition hesitations are words, whereas laughing, coughs, etc. are not. This separation is not always clear-cut, but the choice of word units is arbitrary in any case. The main point is to have a unique reference tier for symbolic relations in other tiers. Another problem is posed by reduced forms that are written as one unit in the orthography, e.g. mit'm. In these cases the full form is restored (in this example /mIt de:m/), because some of these reductions should be automatically accessible.
Example:

KAN: 0 j'a:
KAN: 1 Qalzo:+
KAN: 2 QE:m
KAN: 3 h'OYt@
KAN: 4 Qo:d6+
KAN: 5 m'O6g@n
Orthography ORT: class 1
Synopsis:
ORT: (symbolic link) (orthography)

The tier 'Orthography' contains the orthographic (lexical) forms corresponding to the units in the tier 'Vorschlagstranskription' (see above).
Words are not capitalized at the beginning of an utterance or of a sentence within an utterance (except nouns, of course). German 'Umlauts' and other letters that do not conform to 7-bit ASCII are written in the form used for lexical access. Therefore the coding might differ between speech corpora, e.g. ISO-8859 or LaTeX coding.
This tier is used for an easy lexicon reference; therefore no additional markers except lexical words are allowed. There is no punctuation in this tier. Lexical words include items that are contained in the KAN tier (eg. hesitations, word breaks).
Example:
```
ORT: 0  ja
ORT: 1  also
ORT: 2  <"ahm>
ORT: 3  heute
ORT: 4  oder
ORT: 5  morgen
```
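Where a corpus uses the LaTeX convention for umlauts (as in `<"ahm>` above), the orthography can be converted to Unicode for display. The following Python sketch uses the LaTeX column of the umlaut table given for the S0 and S1 formats; the function name is hypothetical. It should only be applied to orthographic tiers, since `"` also marks secondary stress in the SAM-PA based tiers.

```python
# LaTeX-style umlaut codes as listed in the umlaut table of the S0/S1 formats.
LATEX_UMLAUTS = {
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"s': "ß",
}


def decode_latex_umlauts(text):
    """Replace LaTeX-style umlaut codes in an orthographic string by Unicode letters."""
    for code, char in LATEX_UMLAUTS.items():
        text = text.replace(code, char)
    return text
```

For instance, `decode_latex_umlauts('<"ahm>')` gives `<ähm>`; the Partitur file itself remains 7-bit ASCII.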
Verbmobil Transliteration TRL: class 1
Synopsis:
TRL: (list of symbolic links) (transliteration) class 1

The tier 'Verbmobil Transliteration' contains the transliteration of the utterance according to the VM conventions 3.0. The transliteration is segmented into the units of the KAN tier (see above). Therefore multiple references may occur (e.g. if a reduced form of two words is written as one unit in the transliteration). Each segment covers the scope from the beginning of the referenced unit(s) to the beginning of the next referenced unit(s). As a result, the first line of this tier may contain no referenced unit. In this case the line is aligned to the first unit.
A detailed description of the Verbmobil I transliteration format can be found here (German only!).
Example:
```
TRL: 0  <Schmatzen>
TRL: 0  ja ,
TRL: 1  also
TRL: 2  <"ahm>
TRL: 3  heute
TRL: 4  oder
TRL: 5  morgen .
```
Verbmobil Transliteration II TR2: class 1
Synopsis:
TR2: (list of symbolic links) (transliteration) class 1

The tier 'Verbmobil II Transliteration' contains the transliteration of the utterance according to the Verbmobil II conventions. A new improved format was necessary because the VM I format was not parsable. For more information about the VM II format see here.
Our partner at CMU kindly provided an English translation also.
The transliteration is segmented into the units of the KAN tier (see above) by starting a new line after each unit. Exceptions are punctuation marks and pronunciation comments, which are kept together with the preceding unit (this is just for better readability).
Example:
```
    TR2: 25 ~Weihnachten
    TR2: 26 ist
    TR2: 27 das
    TR2: 28 sowieso
    TR2: 29 immer
    TR2: 30 etwas
    TR2: 31 schwierig ,
    TR2: 32 und
    TR2: 33 <"ahm>
    TR2: 34 in
    TR2: 35 der
    TR2: 36 #zweiten
    TR2: 37 Dezemberwoche
    TR2: 38 bin
    TR2: 39 ich
    TR2: 40 in
    TR2: 41 ~M"unchen
    TR2: 42 auf
    TR2: 43 dem 
    TR2: 44 Kongre"s .
    TR2: 45 also
    TR2: 46 bliebe
    TR2: 47 noch
```
Superimposed Speech SUP: class 1
Synopsis:
SUP: (list of symbolic links) (utterance-id) (transliteration) class 1

In multi-party recordings such as those of the Verbmobil II project, the speech of the currently recorded speaker may be superimposed by another dialog partner (cross talk). To denote this the tier SUP was added to the format. It gives the transliteration of the 'foreign' speaker together with symbolic links to the parts of the recorded speaker's speech to which these superimposed events are assigned. The item 'utterance-id' gives the name of the corresponding BAS Partitur file containing the superimposing part of speech. The tier SUP is currently only used in combination with the tier TR2. For a detailed discussion of superimposed speech in the Verbmobil II project see here.
Example:
```
TR2: 0 ich
TR2: 1 w"urde
TR2: 2 vorschlagen ,
TR2: 3 da"s
TR2: 4 wir9@
TR2: 5 dann9@
TR2: 6 <:<#> hinfliegen:> ,
TR2: 7 <:<#> ich:>
TR2: 8 hab'
TR2: 9 jetzt 
TR2: 10 aber
TR2: 11 <:<#Rascheln> grade:>
TR2: 12 <:<#Rascheln> keine:>
TR2: 13 Unterlagen
TR2: 14 da . <#>
SUP: 4,5 g002acn2_028_AAK.par	@9ja
```
In this example the speaker is superimposed during the words 4 and 5 by the single word 'ja' of another speaker. The latter occurs in the BAS Partitur file 'g002acn2_028_AAK.par'.

Phonetic Segmentation PhonDat PHO: class 4

Synopsis:

PHO: (begin) (duration) (list of symbolic links) (label string)

This tier contains a fully time-consuming segmentation into phonemic units (extended German SAM-PA, broad phonetic transcript). The first number denotes the beginning of the segment in samples counted from the beginning of the speech file; the second number the duration of the segment in samples.
The conventions of labeling and segmentation are briefly described here.

Synopsis of label string

<label string> = '#c:' (beginning of first word)  OR
           '#p:' (pause) OR
           '#v:' (mis-pronunciation) OR
           <segment> OR
           <word boundary segment> OR
           <compound boundary segment> OR
           <punctuation>

<segment> = $<sampa string> (ordinary segment)

<word boundary segment> = ##<sampa string>

<compound boundary segment> = $#<sampa string>

<sampa string> = any string of <extended German SAM-PA symbols>

<punctuation> = '#.' OR '#,' OR '#?' OR '#!'

A definition of extended German SAM-PA can be found here.

Example:

PHO: 2473	0	0	#c:
PHO: 2473	1100	0	##d
PHO: 3573	0	0	$a-@
PHO: 4126	2007	0	$s
PHO: 6133	0	0	$-+
PHO: 6133	1130	1	##g
PHO: 7263	1206	1	$e:
PHO: 8496	937	1	$t
PHO: 9433	0	2	##Q-
PHO: 9433	0	2	$-q
PHO: 9433	2698	2	$aU
PHO: 12131	1178	2	$x
PHO: 13309	0	2	$-+
PHO: 13309	962	3	##n
PHO: 14271	1675	3	$I
PHO: 15946	4308	3	$C
PHO: 18579	0	3	$t-
PHO: 18579	0	3	$-+
PHO: 18579	5467	3	#p:
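The label string grammar above can be turned into a small classifier. This Python sketch returns the kind of a PHO label string and the SAM-PA string it carries. The function name and the returned kind names are choices of this sketch; the prefixes are those of the synopsis above.

```python
SPECIAL = {
    "#c:": "beginning of first word",
    "#p:": "pause",
    "#v:": "mis-pronunciation",
}
PUNCTUATION = ("#.", "#,", "#?", "#!")


def classify_pho_label(label):
    """Return (kind, sampa string) for a PHO label string."""
    if label in SPECIAL:
        return SPECIAL[label], ""
    if label in PUNCTUATION:
        return "punctuation", label[1:]
    if label.startswith("##"):
        return "word boundary segment", label[2:]
    if label.startswith("$#"):              # must be tested before the plain "$"
        return "compound boundary segment", label[2:]
    if label.startswith("$"):
        return "segment", label[1:]
    raise ValueError("not a valid PHO label string: %r" % label)
```

The SAM-PA string is returned as written, so deviations from the canonical form such as `a-@` or `-+` still have to be interpreted by the caller.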

Phonetic Segmentation SAM-PA SAP: class 4
Synopsis:
SAP: (begin) (duration) (list of symbolic links) (label string)

This tier contains a segmentation into phonemic units (extended German SAM-PA, broad phonetic transcript). In contrast to the PHO tier (see above) this segmentation is not strictly time consuming. That is, there might be pauses in the signal that are not labeled (which happens frequently in spontaneous speech). The first number denotes the beginning of the segment in samples counted from the beginning of the speech file; the second number the duration of the segment in samples.
The conventions of labeling and segmentation are briefly described here.
Example:
```
SAP:	549	867	0	Q%<
SAP:	1416	1242	0	aU
SAP:	2658	1136	0	f
SAP:	3794	408	1	v
SAP:	4202	852	1	i:
SAP:	5054	433	1	d
SAP:	5487	1686	1	6%>
SAP:	7173	828	1	h%<%>
SAP:	8001	864	1	2:-9%<%>
SAP:	8865	1015	1	r-6%<
SAP:	9880	0	1	@-
SAP:	9880	1732	1	n
```
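Since the SAP tier may leave stretches of the signal unlabeled, it can be useful to list them. The Python sketch below does this for (begin, duration) pairs sorted by begin. It assumes that a segment ends at begin + duration, so that the next segment may start exactly there, as in the example above (549 + 867 = 1416). The function name is hypothetical.

```python
def find_unlabeled_gaps(segments):
    """Return (gap begin, gap length) for every unlabeled stretch between segments.

    segments: list of (begin, duration) pairs in samples, sorted by begin.
    """
    gaps = []
    for (begin, duration), (next_begin, _) in zip(segments, segments[1:]):
        end = begin + duration
        if next_begin > end:          # the next segment starts later: unlabeled signal
            gaps.append((end, next_begin - end))
    return gaps
```

The first segments of the example above, `[(549, 867), (1416, 1242), (2658, 1136)]`, follow each other directly, so no gap is reported for them.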
Automatic Phonetic Segmentation by MAUS MAU: Class 4
Definition:
MAU: (begin) (duration) (list of symbolic links) (label string)

This tier contains an automatically generated phonetic-phonologic segmentation in units of SAM-PA. Some of these tiers are produced in close cooperation with Technical University of Munich (Dr. G. Ruske).
A detailed description of the MAUS system can be found here.
The first number is the start of the segment counted in samples from the beginning of the file; the second number is the length of the segment in samples.
The segmentation is justified and has no relation to the tier 'Vorschlagstranskription' as done in the tier SAP (however, there are symbolic links to the words).
The units are extended German SAM-PA. Additional labels are <nib> (non-speech event) and <p:> (pause). These labels always get the symbolic link -1 (no link).
Furthermore, events that clearly stem from the speaker, but cannot be classified (e.g. non-understandable words) are labelled and segmented as <usb>. The latter receive a symbolic link as other word events.
Example:
```
MAU: 0 676 -1 <p:>
MAU: 677 7861 -1 <nib>
MAU: 8539 450 0 g
MAU: 8990 2436 0 u:
MAU: 11427 1740 0 t
MAU: 13168 958 1 d
MAU: 14127 1298 1 a
MAU: 15426 3820 1 n
MAU: 19247 303 2 n
MAU: 19551 1785 2 e:
MAU: 21337 624 2 m
MAU: 21962 636 2 n
MAU: 22599 501 3 v
```
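The symbolic links make it easy to derive word-level spans from the MAU segments. The following Python sketch groups the segments by their link and skips the unlinked labels (-1). It assumes one integer link per line, as in the example above, and reports the end of a word as begin + duration of its last segment. The function name is hypothetical.

```python
def mau_word_spans(lines):
    """Map each word number to (first begin, last begin + duration, list of labels)."""
    spans = {}
    for line in lines:
        _, begin, duration, link, label = line.split(None, 4)
        begin, duration, link = int(begin), int(duration), int(link)
        if link < 0:
            continue                      # <p:> and <nib> carry no word link
        end = begin + duration
        if link in spans:
            first, last, labels = spans[link]
            spans[link] = (min(first, begin), max(last, end), labels + [label])
        else:
            spans[link] = (begin, end, [label])
    return spans
```

Applied to the example above, word 0 consists of the segments g, u: and t and spans samples 8539 to 13167.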
Word Segmentation WOR: Class 4
Definition:
WOR: (begin) (duration) (list of symbolic links) (label string)

This tier contains a segmentation of the utterance into words or word equivalents. The segmentation need not be justified. The 'label string' may contain orthographic or pronunciation information (e.g. in SAM-PA). A '-' at the end of 'label string' denotes a missing word with reference to the tier KAN. A '-' as last character in 'label string' denotes an inserted word.
The symbolic links give the relation to the KAN tier. Note that inserted words have a symbolic link to the previous word in the KAN tier.
Dialog Act Segmentation DAS: Class 1
Definition:
DAS: (list of symbolic links) (marker string)

This tier contains a segmentation into dialog acts according to the ongoing work of the 'Deutsches Forschungszentrum für Künstliche Intelligenz' (DFKI), Saarbrücken, Germany. Each marker covers a portion of the speech signal that is denoted by the symbolic links to the reference tier KAN.
Example:
```
DAS: 0,1,2,3,4,5 @(SUGGEST_SUPPORT_DATE BA)
DAS: 6,7,8,9 @(DELIBERATE_EXPLICITE BA)
DAS: 10,11,12,13,14,15,16,17,18,19,20 @(SUGGEST_SUPPORT_DATE BA)
```
In this example the marker SUGGEST_SUPPORT_DATE covers the words 0 to 5 in the reference tier. The term 'BA' denotes a dialog act from speaker 'B' to speaker 'A', where speaker 'A' is always the speaker that initiates the dialog.
A more detailed description of the markers and the principles of segmentation can be found here.
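A DAS line can be split into its word links, the dialog act and the two speaker letters. The Python sketch below assumes that every marker has the shape `@(NAME XY)` seen in the example above; markers of a different shape raise an error. The function name and the keys of the result are choices of this sketch.

```python
import re

# "@(" + dialog act name + blank + two speaker letters + ")"
DAS_MARKER = re.compile(r"@\((\S+)\s+(\S)(\S)\)")


def parse_das_line(line):
    """Split a DAS line into word links, dialog act, speaking and addressed speaker."""
    _, links, marker = line.split(None, 2)
    match = DAS_MARKER.fullmatch(marker.strip())
    if match is None:
        raise ValueError("unexpected DAS marker: %r" % marker)
    return {
        "words": [int(n) for n in links.split(",")],
        "act": match.group(1),
        "from": match.group(2),   # speaker who performs the dialog act
        "to": match.group(3),     # speaker who is addressed
    }
```

For the second line of the example this yields the act DELIBERATE_EXPLICITE from speaker B to speaker A, covering words 6 to 9.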
Prosodic Segmentation PRB: Class 5
Definition:
PRB: (sample) (list of symbolic links) (marker string)

This tier contains the prosodic segmentation (by hand) according to GTobi done by the Technical University of Braunschweig.
The first number gives the time of the prosodic event measured in samples from beginning of the file.
The symbolic links give the relation to the KAN tier.
The label string describes the prosodic event itself. A concise description of the labeling convention (GTobi) can be found here (Sorry: only in German).
Example:
```
PRB:    54212    5   TON: H*; FUN: NA
PRB:    63269    7   TON: L+H*; FUN: EK
PRB:    76371    8   BRE: B3; TON: L-L%
PRB:    79967    8   TON: L*+H; FUN: PA
```
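The PRB marker string in the example consists of `KEY: value` pairs separated by semicolons. A minimal Python sketch for splitting it follows. The function name is hypothetical, and the meaning of the keys (TON, FUN, BRE) is defined by the GTobi conventions, not by this code.

```python
def parse_prb_marker(marker):
    """Split a PRB marker string such as 'TON: H*; FUN: NA' into a dictionary."""
    fields = {}
    for part in marker.split(";"):
        if not part.strip():
            continue                       # tolerate a trailing semicolon
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields
```

The third example line, `BRE: B3; TON: L-L%`, becomes a dictionary with the keys BRE and TON.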
Symbolic prosodic segmentation PRS: Class 1
Definition:
PRS: (list of symbolic links) (marker string)

This tier contains a symbolic prosodic segmentation and labeling (by hand) into 3 boundary markers and 3 accent markers (close to GTobi).
The symbolic links give the relation to the word event order.
The label string describes the prosodic event itself. Boundary markers (B3, B2, B9) are linked to two words acting as left and right neighbors of the boundary. Accent markers (PA, NA, EK) refer to the word where the accent was labeled. No syllable information is provided.
Definition of Marker Strings:
B3 : full intonational boundary with strong intonational marking, often with pauses or lengthening or change of speed
B2 : intermediate phrase boundary with weak marking, weaker intonational marking than B3
B9 : 'agrammatical' boundary, e.g. hesitations, repairs, unintended pauses
PA : main accents (phrase accent) carried by one word; in rare cases there can be two or more words marked together
NA : secondary accent for accentuated words without PA
EK : emphatic or contrastive accents

Example:
```
PRS:    0       EK
PRS:    4;5     B2
PRS:    7       NA
PRS:    9       NA
PRS:    11      NA
PRS:    11;12   B3
PRS:    13      EK
PRS:    14      EK
PRS:    15      PA
PRS:    17      NA
PRS:    17;18   B2
PRS:    18      NA
PRS:    19;20   B3
PRS:    23      EK
PRS:    23;24   B3
PRS:    25      EK
PRS:    27      PA
```

Noise Labeling NOI: Class 1

Definition:

NOI: (single or pair of symbolic links) (marker string)

This tier contains a noise labeling with reference to the defined word chain. Two different types of noise are possible: a simple noise occurring between two words is denoted by a semicolon-separated pair of symbolic links to these words (e.g. '5;6'); a noise that superimposes a single word is marked with a single symbolic link denoting the superimposed word (e.g. '5').
The marker string contains a blank-separated list of noise labels. The labels are drawn from the VMII TRL transliteration format:

<A> <B>                 : breathing
<P>                     : distinct silence within an utterance
<%>                     : unintelligible muttering
<Schmatzen> <Smack>     : lip smack
<Schlucken> <Swallow>   : swallow
<R"auspern> <Throat>    : throat clear
<Husten> <Cough>        : cough
<Lachen> <Laugh>        : laugh
<Ger"ausch> <Noise>     : other articulatory noise
<#Klopfen> <#Knock>     : knock
<#Rascheln> <#Rustle>   : rustle
<#Quietschen> <#Squeak> : creak
<#Klicken> <#Click>     : click noise
<#Mikrowind>            : blowing into microphone
<#Mikrobe>              : noise caused by touching, knocking,
                          rubbing against the microphone
<#>                     : other technical noise

For example:

NOI:    5       <Lachen>          # word 5 is superimposed by a laugh
NOI:    5;6     <A>               # between word 5 and word 6 a distinct
                                  # breathing was recorded

Signal-based prosodic accent labeling LBP: Class 3
Definition:
LBP: (sample) (marker string)

This tier contains a manually labeled accent marker according to GTobi. There is no link to the word order. The labeling was done during the German Verbmobil 2 project by the Technical University of Braunschweig.
The following three accent classes were used:
```
PA    phrase accent
NA    secondary accent
EK    emphatic or contrastive accents
```
For example:
```
LBP: 1651 PA
```

Signal-based prosodic boundary labeling LBG: Class 3

Definition:

LBG: (sample) (marker string)

This tier contains a manually labeled boundary marker according to GTobi. There is no link to the word order. The labeling was done during the German Verbmobil 2 project by the Technical University of Braunschweig.
The following 5 boundary classes were used:

B9    irregular boundary: 'agrammatical' boundary, e.g. hesitations,
      repairs, unintended pauses
B2    intermediate phrase boundary with weak marking, weaker intonational
      marking than B3
B3    intonational boundary with strong intonational marking, no
      question
B3QH  B3, semantically a question, with high tone
B3QL  B3, semantically a question, with low tone

For example:

LBG: 6586 B3
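
Since LBP and LBG entries carry only a sample position and a class label, reading them is straightforward. The sketch below is an illustration, not BAS code: the function name `parse_prosodic_event` is hypothetical, and the default sampling rate of 16000 Hz is an assumption that must be replaced by the rate of the actual signal file.

```python
ACCENT_CLASSES = {"PA", "NA", "EK"}
BOUNDARY_CLASSES = {"B9", "B2", "B3", "B3QH", "B3QL"}
ALLOWED = {"LBP": ACCENT_CLASSES, "LBG": BOUNDARY_CLASSES}


def parse_prosodic_event(line, sample_rate=16000):
    """Parse an 'LBP: (sample) (marker)' or 'LBG: (sample) (marker)' line.

    sample_rate is an assumed value; use the rate of the signal file.
    """
    tier, sample, label = line.split()
    tier = tier.rstrip(":")
    if tier not in ALLOWED:
        raise ValueError(f"unexpected tier: {tier!r}")
    if label not in ALLOWED[tier]:
        raise ValueError(f"unknown {tier} class: {label!r}")
    position = int(sample)
    return {
        "tier": tier,
        "sample": position,
        "seconds": position / sample_rate,  # time of the event in the signal
        "label": label,
    }
```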

Syntactic-prosodic labeling PRO: Class 1
Definition:
PRO: (symbolic link) (marker string)

This tier contains a manually labeled prosodic accent and boundary annotation based on the linguistic information (the chain of words). Consequently it only contains links to the spoken words of the utterance but not to the signal itself. The labeling was done during the German Verbmobil 2 project by the Technical University of Erlangen in cooperation with the University of Munich.
A detailed description of the labeling system as well as the used categories can be found here (definition of labels can be found in table 12 on pp. 15-15 of the document). For example:
```
PRO: 6;7        SS2
PRO: 13;14      AC1
PRO: 14;15      AC1
PRO: 15;16      AC1
PRO: 18;19      SC3
PRO: 24;25      IRB
PRO: 25;26      AC1
PRO: 26;27      AC1
PRO: 27;28      AC1
PRO: 28;29      IWE
PRO: 28;29      IZB
PRO: 31         SM3
```
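
As the example shows, the same link may occur on several PRO lines (here '28;29'). A reader therefore has to collect labels per link rather than overwrite them. The following is a minimal sketch under that assumption; the function name `collect_pro` is hypothetical.

```python
from collections import defaultdict


def collect_pro(lines):
    """Group PRO labels by their symbolic link.

    A link is either a single word number ('31') or a
    semicolon-separated pair of word numbers ('28;29').
    """
    labels = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] != "PRO:":
            continue  # skip lines of other tiers
        words = tuple(int(n) for n in fields[1].split(";"))
        labels[words].append(fields[2])
    return dict(labels)
```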

Syntactic trees SYN: FUN: LEX: Class 1

Definition:

SYN: (symbolic link) (marker string)

FUN: (symbolic link) (marker string)

LEX: (symbolic link) (marker string)

These tiers contain a computer-readable representation of a syntactic tree of the utterance. The tiers SYN, FUN and LEX describe different aspects of this tree, namely syntactic node, function and word class (see below). They may also be used independently. The labeling was done during the German Verbmobil 2 project by the University of Tübingen.

An overview of the treebanks of Verbmobil II (6 pages) can be found here.

A detailed description of the labeling system as well as the categories used can be found here for German, English and Japanese.

Representation of Syntax Trees in the BAS Partitur Format (BPF)
===============================================================

In the BAS Partitur Format the syntax trees are represented in three
tiers. The terminal (lexical) categories are listed in the LEX
tier. Syntactical categories of higher order are listed in the SYN
tier. Grammatical functions referring to both LEX and SYN are
listed in the FUN tier. The LEX and SYN entries represent the nodes
and the FUN entries represent the edges of the syntax tree.


Lexical Categories:
-------------------

Definition:

LEX: (symbolic link) (label string)

This tier represents the lexical categories of the words. The words
are represented by symbolic links. Hesitations, neologisms and
unintelligible parts of an utterance have not been annotated.

Example:

LEX:    0               0       PDS
LEX:    1               0       VMFIN
LEX:    2               0       CARD
LEX:    3               0       NN
LEX:    4               0       ADJD
LEX:    5               0       VVINF

The label string contains 

(1) a tag for the lexical category, e.g. CARD (cardinal number) for word 2.
(2) an index indicating whether the node is terminal, branching or
non-branching. The LEX tier represents only terminal nodes, therefore
the index is always 0 (see the SYN and FUN tiers for further information
on this index).


LEX labels used in syntax trees of German dialogues:

UNKNOWN unknown tag 
--      
ADJA    attributive adjective
ADJD    adverbial or predicative adjective
ADV     adverb
APPR    preposition; circumposition left
APPRART preposition with article
APPO    postposition
APZR    circumposition right
ART     definite or indefinite article
CARD    cardinal number
FM      foreign language material
ITJ     interjection
KOUI    subordinating conjunction with "zu" and infinitive
KOUS    subordinating conjunction with sentence
KON     coordinative conjunction
KOKOM   particle of comparison, no clause
NN      noun
NE      proper noun
PDS     substituting demonstrative pronoun
PDAT    attributive demonstrative pronoun
PIS     substituting indefinite pronoun
PIAT    attributive indefinite pronoun without determiner
PIDAT   attributive indefinite pronoun with determiner
PPER    irreflexive personal pronoun
PPOSS   substituting possessive pronoun
PPOSAT  attributive possessive pronoun
PRELS   substituting relative pronoun
PRELAT  attributive relative pronoun
PRF     reflexive personal pronoun
PWS     substituting interrogative pronoun
PWAT    attributive interrogative pronoun
PWAV    adverbial interrogative or relative pronoun
PAV     (replaced by PROP)
PTKZU   "zu" + infinitive
PTKNEG  negation particle
PTKVZ   separated verb particle
PTKANT  answer particle
PTKA    particle with adjective or adverb
TRUNC   truncated word - first part
VVFIN   finite main verb
VVIMP   imperative, main verb
VVINF   infinitive, main verb
VVIZU   infinitive with "zu", main
VVPP    past participle, main
VAFIN   finite verb, aux
VAIMP   imperative, aux
VAINF   infinitive, aux
VAPP    past participle, aux
VMFIN   finite verb, modal
VMINF   infinitive, modal
VMPP    past participle, modal
XY      non-word containing special characters
$,      comma
$.      sentence-final punctuation
$(      sentence internal punctuation marks
PROP    NEW: pronominal adverb ("dafür")
BS      letter (e. g. spelling)


LEX labels used in syntax trees of English dialogues:

UNKNOWN        unknown tag
--             
CC             Coordinating conjunction
CD             Cardinal number
DT             Determiner
EX             Existential there
FW             Foreign word
IN             Preposition or subordinating conjunction
JJ             Adjective
JJR            Adjective, comparative
JJS            Adjective, superlative
LS             List item marker
MD             Modal
NN             Noun, singular or mass
NNS            Noun, plural
NP             Proper noun, singular
NPS            Proper noun, plural
PDT            Predeterminer
POS            Possessive ending
PP             Personal pronoun
PP$            Possessive pronoun
RB             Adverb
RBR            Adverb, comparative
RBS            Adverb, superlative
RP             Particle
SYM            Symbol
TO             to
UH             Interjection
VB             Verb, base form
VBD            Verb, past tense
VBG            Verb, gerund or present participle
VBN            Verb, past participle
VBP            Verb, non-3rd person singular present
VBZ            Verb, 3rd person singular present
WDT            Wh-determiner
WP             Wh-pronoun
WP$            Possessive wh-pronoun
WRB            Wh-adverb
,              Comma
.              Sentence-final punctuation
____________________________________________________________________________


Syntactical Categories:
-----------------------

Definition:

SYN: (list of symbolic links) (label string)

This tier contains syntactical categories of constituents of phrases,
topological fields and clauses. Analogous to the LEX tier hesitations,
neologisms and unintelligible parts of an utterance have not been
annotated. Therefore it is possible that some turns have a LEX and a FUN tier,
but do not have a SYN tier.

Example:

SYN:    0               1       NX
SYN:    0               2       VF
SYN:    0,1,2,3,4,5     0       SIMPX
SYN:    1               1       VXFIN
SYN:    1               2       LK
SYN:    2               1       ADJX
SYN:    2,3             0       NX
SYN:    2,3,4           0       MF
SYN:    4               1       ADJX
SYN:    5               1       VXINF
SYN:    5               2       VC

Each label string contains two kinds of information:

(1) The syntactical category of a constituent that spans over the
words represented by the list of symbolic links. Thus the words 2 and
3 belong to the nominal phrase NX, which again is part of the middle
field MF that is finally part of the Simplex clause SIMPX.

(2) An index indicating whether the node is terminal or branching or
non-branching. Branching nodes as well as the terminal nodes of the
LEX tier get the index 0. For non-branching nodes the numbers are
incremented by 1 for each level. Thus the position of a node in the
syntax tree is unambiguously defined. 


SYN:          _____________________SIMPX_____________
             /        /              |               \ 
SYN:        /        /            __MF(0)__           \
           /        /            /         \           \
SYN:     VF(2)    LK(2)        NX(0)        \         VC(2)    
          |        |         /      \        |          |
SYN:     NX(1)  VXFIN(1)  ADJX(1)    |     ADJX(1)   VXINF(1)
          |        |        |        |       |          |
LEX:    PDS(0)  VMFIN(0)  CARD(0)  NN(0)   ADJD(0)   VVINF(0)

symbolic   0        1        2        3       4          5  
links

For word 1 in the LEX tier the index is 0, because VMFIN is a terminal
node. For the node VXFIN the index is incremented to 1: like the node
VMFIN it only refers to word 1 and is therefore non-branching. For node
LK the index is incremented by 1 again, to 2, for the same reason. The
index of node SIMPX is 0, because it has a branch to LK but also several
branches to other nodes. The edge label positions in the syntax
tree of the FUN tier can be obtained in a similar way.
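
The index rule makes it possible to recover the tree from the flat LEX and SYN entries. The sketch below shows one plausible reading of the rule and is not the conversion used by the BAS tools: a node with index n is attached to the node with the same word list and index n+1 if such a node exists, otherwise to the smallest index-0 node that spans strictly more words. The function names `parse_node` and `find_parents` are hypothetical.

```python
def parse_node(line):
    """Split a line such as 'SYN: 2,3 0 NX' into (word set, index, label)."""
    _tier, links, index, label = line.split()
    return frozenset(int(n) for n in links.split(",")), int(index), label


def find_parents(lines):
    """Map each (words, index) node to the key of its parent, or None for the root."""
    labels = {}
    for line in lines:
        words, index, label = parse_node(line)
        labels[(words, index)] = label
    parents = {}
    for words, index in labels:
        if (words, index + 1) in labels:
            # Non-branching chain: same words, next index level.
            parents[(words, index)] = (words, index + 1)
            continue
        # Otherwise attach to the smallest branching node (index 0)
        # that spans strictly more words.
        above = [key for key in labels if key[1] == 0 and words < key[0]]
        parents[(words, index)] = min(above, key=lambda key: len(key[0]), default=None)
    return labels, parents


# LEX and SYN entries of the example above.
EXAMPLE = """\
LEX: 0 0 PDS
LEX: 1 0 VMFIN
LEX: 2 0 CARD
LEX: 3 0 NN
LEX: 4 0 ADJD
LEX: 5 0 VVINF
SYN: 0 1 NX
SYN: 0 2 VF
SYN: 0,1,2,3,4,5 0 SIMPX
SYN: 1 1 VXFIN
SYN: 1 2 LK
SYN: 2 1 ADJX
SYN: 2,3 0 NX
SYN: 2,3,4 0 MF
SYN: 4 1 ADJX
SYN: 5 1 VXINF
SYN: 5 2 VC
""".splitlines()
```

Applied to `EXAMPLE`, this attaches VMFIN to VXFIN, VXFIN to LK and LK to SIMPX, which matches the tree drawn above. Utterances with unannotated words may need additional handling.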


SYN labels used in syntax trees of German dialogues:

--       (must always be "--")
NX      noun chunk
PX      prepositional phrase
SIMPX   simplex clause
VXFIN   finite verb phrase
MF      middle field (Mittelfeld)
VC      verb complex (Verbkomplex)
NF      final field (Nachfeld)
LK      left sentence bracket (Linke Satzklammer)
VF      initial field (Vorfeld)
ADVX    adverbial chunk
ADJX    adjectival chunk
P-SIMPX paratactic construction of simplex clauses (Parataktische Verknuepfung zweier SIMPX)
R-SIMPX relative clause (Relativsatz)
VXINF   infinite verb phrase
DM      discourse marker
MVC     conjunct consisting of MF and VC (Konjunkt, bestehend aus MF und VC)
PARORD  field of non-coordinative particles (Feld f. nicht-koord. beiordnende Partikeln) (V2)
C       complementizer field (Feld f. Komplementierer bei Verb-letzt-Saetzen)
KOORD   field of coordinative particles (Feld f. koordinierende Partikeln (und, oder, aber usw.))
LV      topological field for resumptive construction (topologisches Feld fuer Linksversetzungen)
LKMVC   conjunct consisting of LK, MF and VC (Konjunkt, bestehend aus LK, MF, VC)
LKM     conjunct consisting of LK, MF (Konjunkt, bestehend aus LK, MF)
MVCN    conjunct consisting of MF, VC and NF (Konjunkt, bestehend aus MF, VC, NF)
MN      conjunct consisting of MF and NF (Konjunkt, bestehend aus MF, NF)
DP      determiner phrase (e.g. "gar keine")
KONX    complex of conjuncts (Konjunktionskomplex ("und zwar" in VF))
VLKM    conjunct consisting of VF, LK and MF (Konjunkt, bestehend aus VF, LK, MF)
VLKMVC  conjunct consisting of VF, LK, MF and VC (Konjunkt, bestehend aus VF, LK, MF, VC)
LKMVCN  conjunct consisting of LK, MF, VC and NF (Konjunkt, bestehend aus LK, MF, VC, NF)
LKMN    conjunct consisting of LK, MF and NF (Konjunkt, bestehend aus LK, MF, NF)
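LKVCN   conjunct consisting of LK, VC and N (Konjunkt, bestehend aus LK, VC, N)
VCN     conjunct consisting of VC and N (Konjunkt, bestehend aus VC und N)
FKOORD  coordination consisting of conjuncts of fields (komplexe Felderkoordination)
LKN     conjunct consisting of LK and N (Konjunkt, bestehend aus LK und N)
CMVCN   conjunct consisting of C, MF, VC and NF (Konjunkt, bestehend aus C, MF, VC und NF)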


SYN labels used in syntax trees of English dialogues:

--       (must always be "--")
AP      Adjective Phrase
APS     Adj-headed sm.clause
ADVP    Adverb Phrase
ADVPD   Adverb DATE-Phrase
CMP     Complementizer
CMP-WH  Complementizer,WH-
CNJ     Conjunction(single)
CNJ1    Conjunction(1 of 2)
CNJ2    Conjunction(2 of 2)
DG      Degree(non-wh)
DG-WH   Degree-WH(how...)
DGP     Degree Phrase
DT-ART  Det,Article(the,a)
DT-DM   Det,Demonstrative
DT-QNT  Det,Quantifier(every)
DT-R    Det,Rel.clause
DT-WH   Det,Wh-(which,whose)
DTP     Det.Phrase
N       Noun,Common
-        do not use this
CNUM    N,Cardinal Number
ONUM    N,Ordinal Number
NP      Noun Phrase
NPS     Noun-headed sm.clause
NPD     Noun DATE-phrase
NPT     Noun TIME-phrase
PR-DM   PR,Demonstrative
PR-WH   PR,WH-
PR-R    PR,Relative
PP      Prepositional Phrase
PPS     Prep-headed sm.clause
SUGG    Suggestion("How about Tuesday?")
S       Sentence(VP w/subject)
V-G     Verb,gerund
V-PRP   Verb,present participle
V-PSS   Verb,passive participle
VP      Verb Phrase(S if sub Vs sister)


Grammatical Functions:
----------------------

Definition:

FUN: (list of symbolic links) (label string)

The FUN tier contains the grammatical functions that refer to the
syntactical and lexical categories listed in the SYN and LEX tier.


Example:

FUN:    0               0       HD
FUN:    0               1       ON
FUN:    0               2       -
FUN:    0,1,2,3,4,5     0       --
FUN:    1               0       HD
FUN:    1               1       HD
FUN:    1               2       -
FUN:    2               0       HD
FUN:    2               1       -
FUN:    2,3             0       V-MOD
FUN:    2,3,4           0       -
FUN:    3               0       HD
FUN:    4               0       HD
FUN:    4               1       MOD
FUN:    5               0       HD
FUN:    5               1       OV
FUN:    5               2       --

The label string contains the grammatical function of the word or
constituent in the syntax tree (see LEX and SYN tier) that has the
same list of symbolic links and the same index. Word 3 as part of the
constituent NX (see SYN tier) has the function HD (head), and NX has
the function V-MOD (modifier of a verb).
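
Because a FUN entry is matched to its node by the pair (list of symbolic links, index), the three tiers can be joined with a simple dictionary lookup. The following is a minimal sketch; the function name `join_functions` is hypothetical, and the link list is compared as written, which assumes that the same node is always spelled with the same link order.

```python
def join_functions(lines):
    """Attach FUN labels to LEX/SYN nodes sharing the same links and index.

    Returns {(links, index): (category, function)}; function is None
    if no FUN entry exists for a node.
    """
    nodes, functions = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        tier, links, index, label = fields
        key = (links, int(index))
        if tier == "FUN:":
            functions[key] = label
        elif tier in ("LEX:", "SYN:"):
            nodes[key] = label
    return {key: (category, functions.get(key)) for key, category in nodes.items()}
```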


FUN labels used in syntax trees of German dialogues:

--	 Not bound
HD       Head
ON       Nominative object(=subject)
-        Shalt not be bound
OD       Dative object
MOD      Ambiguous modifier
ON-MOD   Modifier of subjects
OA-MOD   Modifier of accusative objects
OD-MOD   Modifier of dative objects
OPP      Prepositional object (obligatorisches PP-Objekt)
OV       Verbal object
ONK      Nominative object conjunct (Nominativ-Objekt-Konjunkt)
OAK      Accusative object conjunct (Akkusativ-Objekt-Konjunkt)
VPT      Separable verb prefix
MOD-MOD  Modifier of other Modifier
APP      Apposition
-        Not bound
PRED     Predicate
OA       Accusative object
V-MOD    Modifier of a Verb
V-MODK   Conjunct of the verb modifier (Konjunkt des Verb-Modifikators)
OPP-MOD  Not bound
PRED-MOD Mod. of predicate
FOPP     Optional prepositional object
OS       Sentential object
OADVP    Adverbial object
FOPP-MOD Modifier of optional prepositional object
OADJP    ADJP object
OADVPMOD Modifier of ADVP object
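OADJPK   Conjunct of the ADJP object modifier (Konjunkt des ADJP-Objekt-Modifikators)
FOPPK    Optional PP object conjunct (fakul. PP-Objekt-Konjunkt)
PREDK    Predicative conjunct (Praedikativ-Konjunkt)
MOD-MODK Conjunct of the modifying modifier (Konjunkt des modif. Modifikators)
MODK     Ambiguous modifier conjunct (nicht-eind. Modifikator-Konjunkt)
OPP-MODK Conjunct of the obligatory PP object (Konjunkt d. obl. PP-Objekts)
PREDMODK Conjunct of the predicative (Konjunkt d. Praedikativs)
OPPK     Obligatory PP object conjunct (obligatorisches PP-Objekt-Konjunkt)
OADVPK   Conjunct of the ADVP object modifier (Konjunkt des ADVP-Obj.-Modif.)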


FUN labels used in syntax trees of English dialogues:

--      Not bound
HD      Head
COMP    Complement
SPR     Specifier
SBJ     Subject
SBQ     Subject,WH-
SBR     Subject,REL
ADJ     Adjunct
ADJ?    Adjunct?
FLL     Filler
FLQ     Filler,WH-
FLR     Filler,REL
MRK     Marker
-       For intentionally empty edge labels


The annotations were prepared in the NeGra format by the University of
Tübingen and were afterwards converted into the Partitur
format. This conversion may have introduced minor changes.
To view the syntax trees, the Partitur files have to be converted into
the NeGra format with the Perl program "bas2negra.pl" (included in the
standard BAS software package on each BAS CDROM).
The Java program TIGERSearch, developed by Wolfgang
Lezius during the TIGER project at the IMS Stuttgart, can be used to
search and visualize the trees. Since autumn 2001 TIGERSearch can be
downloaded from the following website:

http://www.ims.uni-stuttgart.de/projekte/TIGER/

For example:

SYN:    0       1       DM
SYN:    1       1       NX
SYN:    1       2       VF
SYN:    1,2,3,4,5       0       SIMPX
SYN:    2       1       VXFIN
SYN:    2       2       LK
SYN:    3       1       ADVX
SYN:    3,4,5   0       MF
SYN:    4       1       NX
SYN:    5       1       ADVX
SYN:    7       1       VXFIN
SYN:    7       2       LK
SYN:    7,8,9,10,11     0       SIMPX
SYN:    8       1       NX
SYN:    8,9,10,11       0       MF
SYN:    9,10,11 0       NX
SYN:    10      1       NX
SYN:    10,11   0       NX
SYN:    11      1       NX
FUN:    0       0       -
FUN:    0       1       --
FUN:    1       0       HD
FUN:    1       1       ON
FUN:    1       2       -
FUN:    1,2,3,4,5       0       --
FUN:    2       0       HD
FUN:    2       1       HD
FUN:    2       2       -
FUN:    3       0       HD
FUN:    3       1       MOD
FUN:    3,4,5   0       -
FUN:    4       0       HD
FUN:    4       1       OA
FUN:    5       0       HD
FUN:    5       1       V-MOD
FUN:    7       0       HD
FUN:    7       1       HD
FUN:    7       2       -
FUN:    7,8,9,10,11     0       --
FUN:    8       0       HD
FUN:    8       1       ON
FUN:    8,9,10,11       0       -
LEX:    0       0       PTKANT
LEX:    1       0       PPER
LEX:    2       0       VAFIN
LEX:    3       0       ADV
LEX:    4       0       NN
LEX:    5       0       ADV
LEX:    7       0       VVFIN
LEX:    8       0       PPER
LEX:    9       0       ART
LEX:    10      0       NN
LEX:    11      0       NE

Parts of Speech POS: Class 1
Definition:
POS: (symbolic link) (marker string)

This tier contains an automatically generated lexical tagging of all words of the utterance. The class system is based on the STTS (Stuttgart-Tübingen-TagSet) like the LEX tier (but the LEX tier was annotated manually!). The labeling was done during the German Verbmobil 2 project by the Technical University of Stuttgart.
A detailed description of the labeling system as well as the categories used can be found here for German (pp. 17 - 19) and English (pp. 48 - 49). Furthermore, some examples for each German category can be found here (only in German).
For example:
```
POS:    0       ITJ
POS:    1       PPER
POS:    2       VAFIN
POS:    3       ADV
POS:    4       NN
POS:    5       ADV
POS:    7       VVFIN
POS:    8       PPER
POS:    9       ART
POS:    10      NN
POS:    11      NE
```
Lemmata LMA: Class 1
Definition:
LMA: (symbolic link) (marker string)

This tier contains automatically derived lemmas for each word in the BPF. The labeling was done during the German Verbmobil 2 project by the Technical University of Stuttgart.
For example:
```
LMA:    0       nein
LMA:    1       pper
LMA:    2       haben
LMA:    3       hier
LMA:    4       Unterlage
LMA:    5       da
LMA:    7       kennen
LMA:    8       pper
LMA:    9       d
LMA:    10      Hotel
LMA:    11      Maritim
```
Please note that all personal pronouns are annotated with 'pper' and all articles with 'd'.
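
POS and LMA both refer to the same word numbers, so they can be combined per word. The sketch below is illustrative only; the function names `tier_by_word` and `pos_and_lemma` are hypothetical, and the code assumes that each link in these two tiers is a single word number, as in the examples above.

```python
def tier_by_word(lines, tier):
    """Collect {word number: label} for one tier such as 'POS' or 'LMA'."""
    table = {}
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[0] == tier + ":":
            table[int(fields[1])] = fields[2].strip()
    return table


def pos_and_lemma(lines):
    """Return {word number: (POS tag, lemma)}; lemma is None if missing."""
    pos = tier_by_word(lines, "POS")
    lemma = tier_by_word(lines, "LMA")
    return {word: (tag, lemma.get(word)) for word, tag in sorted(pos.items())}
```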
Phonetic Segmentation IPA IPA: Class 2
Definition:
IPA: (begin) (duration) (label string)

This tier contains a phonetic segmentation and labeling according to IPA.
The first number denotes the beginning of a segment counted in samples from the beginning of the file, the second number denotes the duration of the segment in samples. The remainder of the line must contain a list of comma-separated IPA numbers (at least one), optionally followed by a list of corresponding SAM-PA symbols.
IPA chart with IPA numbers
IPA chart with symbols
For example:
```
 IPA:    4856    1228    322     @
 IPA:    10629   564     317
 IPA:    11805   991     319     I
 IPA:    12797   1142    138     C
 IPA:    13940   1534    302     e
 IPA:    15475   895     110     g
 IPA:    16371   777     322     @
 IPA:    17149   758     155     l
 IPA:    17908   1497    305
 IPA:    19406   1204    116     n
 IPA:    20611   589     104     d
 IPA:    21201   1018    322     @
 IPA:    22220   1185    103     t
 
```
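
An IPA line can be read as follows. This is a minimal sketch with a hypothetical function name `parse_ipa`; the optional SAM-PA part is returned as an unparsed string because its separator is not specified here.

```python
def parse_ipa(line):
    """Parse 'IPA: (begin) (duration) (IPA numbers) [SAM-PA symbols]'.

    Returns (begin, duration, list of IPA numbers, SAM-PA string or None).
    """
    fields = line.split()
    if len(fields) < 4 or fields[0] != "IPA:":
        raise ValueError(f"not an IPA line: {line!r}")
    begin, duration = int(fields[1]), int(fields[2])
    # At least one IPA number is required; several are comma-separated.
    numbers = [int(n) for n in fields[3].split(",")]
    sampa = " ".join(fields[4:]) or None
    return begin, duration, numbers, sampa
```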
Segmentation in turns/sentences/chunks/etc TRN: Class 4
Definition:
TRN: (begin) (duration) (symbolic link) (label string)

This tier contains a segmentation of longer recordings into turns, sentences or similar longer events, that contain more than one word.
The first number denotes the beginning of a segment counted in samples from the beginning of the file, the second number denotes the duration of the segment in samples. The symbolic link contains a list of comma separated word numbers that are contained in the segment. The rest of the line may contain an optional label (e.g. a turn number).
For example:
```
TRN:    132736  144640  0,1,2,3,4,5,6,7 002
```
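
A class 4 entry such as TRN combines a signal segment with a list of word numbers. The following sketch shows how such a line might be read; the function name `parse_trn` and the dictionary keys are assumptions made for illustration.

```python
def parse_trn(line):
    """Parse 'TRN: (begin) (duration) (symbolic link) [label]'."""
    fields = line.split(None, 4)
    if len(fields) < 4 or fields[0] != "TRN:":
        raise ValueError(f"not a TRN line: {line!r}")
    label = fields[4].strip() if len(fields) > 4 else ""
    return {
        "begin": int(fields[1]),        # first sample of the segment
        "duration": int(fields[2]),     # length of the segment in samples
        "words": [int(n) for n in fields[3].split(",")],
        "label": label or None,         # optional, e.g. a turn number
    }
```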

SmartKom Transliteration TRS class 1

Synopsis:

TRS: (list of symbolic links) (transliteration)

The tier 'Smartkom Transliteration' contains the transliteration of a whole Man Machine Dialogue recorded in the SmartKom data collection. For more background information about the SmartKom data collection see here.
A detailed description of the underlying transliteration format can be found here.
The transliteration is segmented into the units of the KAN tier (see above) by starting a new line after each unit. Exceptions are punctuations and pronunciation comments that are kept together with the last unit (this is just for a better readability).

Example:

TRS:    0       <:<#> ja:> [NA] [B2] ,
TRS:    1       ich
TRS:    2       h"atte
TRS:    3       <:<#> gern:> [NA]
TRS:    4       +/die/+ [B9] 
TRS:    5       die
TRS:    6       Sehensw"urdigkeiten [PA]
TRS:    7       von
TRS:    8       ~Heidelberg  [NA] [B3 fall] .
TRS:    9       gibt [NA]
TRS:    10      es
TRS:    11      hier
TRS:    12      vielleicht
TRS:    13      Cafeterias [PA] [B3 rise] ? <#>
TRS:    14      was
TRS:    15      f"ur
TRS:    16      Hotels [NA]
TRS:    17      gibt [PA]
TRS:    18      es [B3 cont] ?
TRS:    19      @1mhm [NA] [B3 cont] .
TRS:    20      kannst 
TRS:    21      was
TRS:    22      andres [PA]
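
The example shows bracketed markers such as [NA] or [B3 fall] attached to the units. Their meaning is defined in the transliteration documentation referenced above; the sketch below only separates them from the rest of the unit. The function name `parse_trs` and the regular expression are assumptions for illustration, not part of the format definition.

```python
import re

# Anything enclosed in square brackets, e.g. '[NA]' or '[B3 fall]'.
BRACKETED = re.compile(r"\[([^\]]+)\]")


def parse_trs(line):
    """Parse 'TRS: (list of symbolic links) (transliteration)'.

    Returns (word numbers, transliteration text, bracketed markers).
    """
    fields = line.split(None, 2)
    if len(fields) < 3 or fields[0] != "TRS:":
        raise ValueError(f"not a TRS line: {line!r}")
    words = [int(n) for n in fields[1].split(",")]
    text = fields[2].rstrip()
    return words, text, BRACKETED.findall(text)
```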

SmartKom Gesture Labeling GES class 2
Synopsis:
GES: (begin) (duration) (label string)
This tier contains a manual segmentation and annotation of 2D gestures as recorded in the SmartKom data collection. All gestures that occur within the range of the SIVIT camera are labelled. Additionally, emotional gestures that occur elsewhere are labeled. For more background information about the SmartKom data collection see here.
The first number denotes the beginning of a gestural event in samples from the beginning of the recording (in SmartKom: 16 kHz); the second number the duration in samples.
The 'label string' consists of 8 columns separated by TAB, optionally followed by a comment string:
- category of the gesture
- label of the gesture
- finger, hand or pen used for the gesture
- reference word
- reference zone
- reference object reached/not reached
- begin of the stroke in samples
- duration of the stroke in samples
- optional comment
For a detailed description of the labeling system see here; the following is a brief description of the 8 label categories (possible values of labels are quoted in ''):
- Category of the gesture: The following three broad intentional (functional) categories are used:
  - I-gesture 'I-Geste': a request, e.g. pointing, circling
  - U-gesture 'U-Geste': a supporting gesture that is not a request, it prepares a request, e.g. reading, searching
  - R-gesture 'R-Geste': an unidentifiable gesture or an emotional gesture
- Label of the gesture: Within each broad category the following labels are used to further specify the gesture:
  - I-gesture
    - long pointing with/without (+/-) touching the display 'I - deut +' 'I - deut -'
    - short pointing (19 frames or less) with/without touching the display 'I - tipp +' 'I - tipp -'
    - circling/marking with/without touching the display 'I - kreis +' 'I - kreis -'
    - complex gestures within/outside the display area 'I - frei +' 'I - frei -'
    - non-identifiable gestures 'nicht erkennbar'
  - U-gestures
    - reading/moving hand 'U - les - k'
    - searching 'U - such - k'
    - counting 'U - zähl - k'
    - pondering/moving hand 'U - überleg - k'
    - reading/hand not moving 'U - les - p'
    - pondering/hand not moving 'U - überleg - p'
    - non-identifiable gestures 'nicht erkennbar'
  - R-gestures
    - emotional within/without display 'R -emot +' 'R - emot -'
    - unidentifiable gestures 'R - UFO'
- Finger, hand or pen used:
  This string is either '[FINGER] re|li [TOOL]' or 'nicht erkennbar', if the pointing method cannot be determined.
  - The (optional) string FINGER denotes one of the five finger types used by the person in the gesture:
    - 'Zeige' = index finger
    - 'Mittel' = middle finger
    - 'Ring' = ring finger
    - 'Kleiner' = little finger
    - 'Daumen' = thumb
    If more than one finger or a pen is used, the string [FINGER] is empty.
  - 're' denotes right (German 'rechts'); 'li' denotes left (German 'links')
  - The string [TOOL] is mandatory and is either set to 'Hand' if the gesture is done by the hand (and optionally one finger) or set to 'Stift' (German for 'pen') if the gesture is performed using a pen. In the latter case the string [FINGER] is always empty.
  For instance 'Zeige re Hand' denotes the index finger of the right hand; 'li Hand' denotes the left hand (more than one finger used); 'li Stift' denotes a gesture performed with the left hand holding a pen.
- Reference word: The spoken word or phrase that is time-aligned to the gesture.
  The string may have one of the three forms:
  - 'Phrase' : the gesture takes place during the utterance of 'Phrase'. ('Phrase' may contain more than one word.)
  - '� Word' : The gesture is before the utterance of 'Word'.
  - 'Word �' : The gesture takes place after the utterance of 'Word'
  The reference word is only labelled in I-gestures.
- Reference zone: Part of the display where the gesture roughly takes place:
  - 'Mitte' = center of display
  - 'links unten' = lower left corner
  - 'links oben' = upper left corner
  - 'rechts unten' = lower right corner
  - 'rechts oben' = upper right corner
  - 'gesamtes Display' : if the gesture stretches across more than one of the above regions (large gestures).
  The reference zone is only labelled in I- and U-gestures.
- Reference object: Did the gesture reach the intended object:
  - 'Treffer' : the gesture did reach the object.
  - 'oberhalb' : the gesture did reach a region above the object.
  - 'unterhalb' : the gesture did reach a region below the object.
  - 'rechts' : the gesture did reach a region right of the object.
  - 'links' : the gesture did reach a region left of the object.
  - 'leer' : the gesture did reach a region where no object is located.
  The reference object is only labelled in I-gestures.
- Begin/Duration Stroke: Onset and duration in samples of the stroke, i.e. the most important part of the gesture, which is marked separately. The stroke is only segmented in I-gestures.
- Optional comment:
  This may be free text or - more often - one of the following remarks (codes):
  - 'Anfang schwer zu bestimmen' : begin of gesture is unsure
  - 'Ende schwer zu bestimmen' : end of gesture is unsure
  - 'Morphologie schwer bestimmbar, weil verdeckt' : form of gesture is difficult to classify because gesture was masked
  - 'Stroke schwer bestimmbar, weil verdeckt' : time of stroke is difficult to mark because gesture was masked
  - 'Stroke unklar' : cannot segment stroke
  - 'Doppelklick' : double pointing gesture ("double click")
  - 'Mehrfachklicks' : multiple rapid pointings in one gesture
  - 'Wiederholungsgeste' : repeated gesture
  - 'Geste durch Synthese-Ausgabe des Systems abgebrochen' : gesture interrupted by system voice output
  - 'Geste durch Display-Ausgabe des Systems abgebrochen' : gesture interrupted by system display output
  - 'Geste durch Versuchsende abgebrochen' : gesture interrupted because of end of recording
  - 'Stroke unklar, weil Sivit-Strom fehlt' : stroke cannot be segmented because no Sivit video available
  - 'Label unsicher, weil Audio fehlt' : label of gesture unsure because no audio available
  - 'Label unsicher, weil Beamer Output fehlt' : label of gesture unsure because no system output available
  - 'Kamera hat zu spät mit der Aufzeichnung begonnen' : gesture incomplete because the video recording started during the first gesture
  - 'Sprache - Gestik Mismatch' : different inputs from voice and gesture
Example:
```
GES:    1072000 23040   I-Geste I - tipp +      Zeige li Hand           links oben      Treffer 1078400 12160
GES:    1959680 114560  R-Geste R - emot -      re Hand                         1078400 12160   Überlegung/Nachdenken
GES:    2166400 16000   I-Geste I - tipp +      Zeige li Hand           links oben      rechts  2171520 7680
GES:    2641280 12800   I-Geste I - tipp +      Zeige re Hand    � Schlo"s       rechts unten    Treffer 2647680 5120
GES:    3093120 14080   I-Geste I - tipp +      Zeige re Hand           links unten     Treffer 3098240 7040
GES:    3351680 7040    R-Geste R - UFO re Hand                         3098240 7040
GES:    4029440 22400   I-Geste I - tipp +      Zeige li Hand           links oben      rechts  4035840 10240
```
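
Because several columns of the label string may be empty or contain blanks, a GES line has to be split at TAB characters rather than at arbitrary white space. The following is a minimal sketch; the function name `parse_ges` and the column names in `GES_COLUMNS` are hypothetical and chosen only for this illustration.

```python
GES_COLUMNS = (
    "category", "label", "pointer", "ref_word",
    "ref_zone", "ref_object", "stroke_begin", "stroke_duration",
)


def parse_ges(line):
    """Parse 'GES: (begin) (duration) (label string)'.

    The label string is expected to hold 8 TAB-separated columns,
    optionally followed by a comment.
    """
    tier, begin, duration, rest = line.rstrip("\n").split(None, 3)
    if tier != "GES:":
        raise ValueError(f"not a GES line: {line!r}")
    columns = rest.split("\t")  # keeps empty columns
    if len(columns) < len(GES_COLUMNS):
        raise ValueError(f"expected 8 columns, got {len(columns)}")
    record = dict(zip(GES_COLUMNS, columns))
    record["comment"] = "\t".join(columns[len(GES_COLUMNS):]) or None
    record["begin"] = int(begin)
    record["duration"] = int(duration)
    for key in ("stroke_begin", "stroke_duration"):
        record[key] = int(record[key]) if record[key] else None
    return record
```

Empty columns, for instance a missing reference word, are returned as empty strings.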
SmartKom User State Annotation (holistic) USH class 2
Synopsis:
USH: (begin) (duration) (label string)
This tier type contains information on user-states (interesting emotional and cognitive states) that occurred in a SmartKom recording session. For more background information about the SmartKom data collection see here.

The whole session is segmented (no gaps). For each segment begin (begin) and duration (duration) are given in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the labels described below, optionally followed by a TAB-separated rating. For a detailed description of the labeling system see here; the following is a brief description of the 7 label categories (the verbose values of labels are quoted in ''):
1. neutral 'Neutral'
2. joy/gratification (being successful) 'Freude/Erfolg'
3. anger/irritation 'Ärger/Mißerfolg'
4. helplessness 'Ratlosigkeit'
5. pondering/reflecting 'Überlegen/Nachdenken'
6. surprise 'Überraschung/Verwunderung'
7. unidentifiable episodes 'Restklasse'
The labels are assigned according to the impression of the labeler. Not only the facial expression but also the voice quality and other contextual information are considered. The mere use of words with emotional content, without an emotional expression, is NOT considered an indicator of the respective emotion/user-state.
The intensity of a user-state is given after the label classes 2-6 by the following rating:
- strong 'stark'
- weak 'schwach'
Example:
```
USH:    0       205440  Freude/Erfolg   schwach
USH:    205440  30720   Neutral
USH:    236160  37760   Freude/Erfolg   schwach
USH:    273920  192000  Neutral
USH:    465920  78720   Überlegen/Nachdenken    stark
USH:    544640  295680  Neutral
USH:    840320  49920   Ärger/Mißerfolg schwach
USH:    890240  42880   Neutral
USH:    933120  21760   Überraschung/Verwunderung       schwach
USH:    954880  97920   Ratlosigkeit    schwach
USH:    1052800 542720  Neutral
```
See also tiers USM, USP and OCC.
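
Since the whole session is segmented without gaps, each segment should begin exactly where the previous one ends. The sketch below parses user-state lines and reports violations of this property; the function names `parse_user_state` and `find_gaps` are hypothetical. The default of 16000 Hz is the SmartKom sampling rate stated above.

```python
def parse_user_state(line, sample_rate=16000):
    """Parse 'USH: (begin) (duration) (label) [rating]' (also works for USM)."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("USH:", "USM:"):
        raise ValueError(f"not a USH/USM line: {line!r}")
    begin, duration = int(fields[1]), int(fields[2])
    return {
        "begin": begin,
        "duration": duration,
        "seconds": duration / sample_rate,  # segment length in seconds
        "label": fields[3],
        "rating": fields[4] if len(fields) > 4 else None,
    }


def find_gaps(segments):
    """Return (expected begin, actual begin) for every break in the chain."""
    return [
        (prev["begin"] + prev["duration"], nxt["begin"])
        for prev, nxt in zip(segments, segments[1:])
        if prev["begin"] + prev["duration"] != nxt["begin"]
    ]
```

An empty result from `find_gaps` means that the segments are contiguous.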
SmartKom User State Annotation (facial expression) USM class 2
Synopsis:
USM: (begin) (duration) (label string)
This tier type contains information on user-states (interesting emotional and cognitive states) that occurred in a SmartKom recording session. In contrast to the USH tier only the video signal of the face is available.
For more background information about the SmartKom data collection see here.

The whole session is segmented (no gaps). For each segment begin (begin) and duration (duration) are given in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the labels described below, optionally followed by a TAB-separated rating. For a detailed description of the labeling system see here; the following is a brief description of the 7 label categories (the verbose values of labels are quoted in ''):
1. neutral 'Neutral'
2. joy/gratification (being successful) 'Freude/Erfolg'
3. anger/irritation 'Ärger/Mißerfolg'
4. helplessness 'Ratlosigkeit'
5. pondering/reflecting 'Überlegen/Nachdenken'
6. surprise 'Überraschung/Verwunderung'
7. unidentifiable episodes 'Restklasse'
The labels are assigned according to the impression of the labeler. ONLY the facial expression is considered, NOT the voice quality or other contextual information. This annotation was performed by a different group of labelers than the USH annotation. It may therefore be used to investigate the influence of the speech input on user-state judgements.
The intensity of a user-state is given after the label classes 2-6 by the following rating:
- strong 'stark'
- weak 'schwach'
Example:
```
USM:    0       205440  Freude/Erfolg   schwach
USM:    205440  30720   Neutral
USM:    236160  37760   Freude/Erfolg   schwach
USM:    273920  192000  Neutral
USM:    465920  78720   Überlegen/Nachdenken    schwach
USM:    544640  295680  Neutral
USM:    840320  49920   Ärger/Mißerfolg schwach
USM:    890240  42880   Neutral
USM:    933120  119680  Überlegen/Nachdenken    schwach
USM:    1052800 542720  Neutral
USM:    1595520 59520   Überlegen/Nachdenken    schwach
USM:    1655040 157440  Neutral
USM:    1812480 143360  Überlegen/Nachdenken    schwach
USM:    1955840 58880   Ärger/Mißerfolg stark
USM:    2014720 89600   Neutral
USM:    2104320 559360  Ärger/Mißerfolg schwach
USM:    2663680 263680  Neutral
USM:    2927360 28800   Ärger/Mißerfolg schwach
```
See also tiers USH, USP and OCC.
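Since USH and USM were labeled by different groups, one possible use is to measure how often the two tiers agree. The sketch below is only an illustration: it assumes that the two excerpts come from the same session, the helper names `read_segments`, `has_no_gaps` and `agreement` are hypothetical, and the agreement measure (share of samples with identical labels, ignoring the rating) is one simple choice among many. It also checks the "no gaps" property stated above.

```python
def read_segments(lines):
    """Return (begin, end, label) triples in samples; the rating is dropped."""
    segments = []
    for line in lines:
        _key, begin, duration, rest = line.split(None, 3)
        words = rest.split()
        if words[-1] in ("stark", "schwach"):
            words.pop()
        segments.append((int(begin), int(begin) + int(duration), " ".join(words)))
    return segments


def has_no_gaps(segments):
    """True if every segment starts exactly where the previous one ends."""
    return all(prev[1] == cur[0] for prev, cur in zip(segments, segments[1:]))


def agreement(a, b):
    """Fraction of the jointly covered samples on which both tiers agree."""
    same = total = 0
    for a_begin, a_end, a_label in a:
        for b_begin, b_end, b_label in b:
            overlap = min(a_end, b_end) - max(a_begin, b_begin)
            if overlap > 0:
                total += overlap
                same += overlap if a_label == b_label else 0
    return same / total


ush = read_segments([
    "USH:    840320  49920   Ärger/Mißerfolg schwach",
    "USH:    890240  42880   Neutral",
    "USH:    933120  21760   Überraschung/Verwunderung       schwach",
    "USH:    954880  97920   Ratlosigkeit    schwach",
])
usm = read_segments([
    "USM:    840320  49920   Ärger/Mißerfolg schwach",
    "USM:    890240  42880   Neutral",
    "USM:    933120  119680  Überlegen/Nachdenken    schwach",
])
print(has_no_gaps(ush), has_no_gaps(usm))
print(f"agreement: {agreement(ush, usm):.3f}")
```

The nested loop is quadratic in the number of segments, which should be acceptable for single sessions; a merge over the two sorted lists would be the obvious refinement.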
SmartKom occlusion in the facial video OCC class 2
Synopsis:
OCC: (begin) (duration) (label string)

This tier contains an additional segmentation and labeling of the SmartKom facial video recording. All occlusions of the face or part of the face by the hand, pen or other objects are segmented and classified here. This tier might be very useful for the automatic processing of the facial video signal.
Begin (begin) and duration (duration) of the occlusion are given in samples counted from the beginning of the recording (SmartKom: 16 kHz).
The label string contains one of the following 10 classes:
- 'Hand im Gesicht' : hand in face
- 'Hand im Gesicht/Mund' : hand in face in the area of the mouth
- 'Hand im Gesicht/Nase' : hand in face in the area of the nose
- 'Hand im Gesicht/Augen' : hand in face in the area of the eyes
- 'Stift im Gesicht' : pen in face
- 'Stift im Gesicht/Mund' : pen in face in the area of the mouth
- 'Stift im Gesicht/Nase' : pen in face in the area of the nose
- 'Stift im Gesicht/Augen' : pen in face in the area of the eyes
- 'Teilweise nicht im Bild' : face partly not in the range of the recording camera
- 'Objekt im Gesicht' : other object than hand or pen in the area of the face
Example:
```
OCC:    380800  18560   Teilweise nicht im Bild
OCC:    458880  58240   Teilweise nicht im Bild
OCC:    1167360 7680    Teilweise nicht im Bild
OCC:    1173120 14720   Hand im Gesicht
OCC:    1201920 11520   Teilweise nicht im Bild
OCC:    2000000 12160   Hand im Gesicht/Mund
OCC:    2567040 57600   Teilweise nicht im Bild
OCC:    2709120 40960   Hand im Gesicht/Mund
OCC:    2947840 33280   Hand im Gesicht
OCC:    2955520 9600    Teilweise nicht im Bild
OCC:    2981120 35840   Teilweise nicht im Bild
OCC:    3528960 10880   Hand im Gesicht
OCC:    4001920 10240   Hand im Gesicht
OCC:    4103680 20480   Teilweise nicht im Bild
```
See also tiers USH, USP and USM.
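Unlike the user-state tiers, OCC segments may overlap, as in the example above where a 'Hand im Gesicht' segment begins before the preceding segment ends. The following sketch, with the hypothetical helpers `parse_occ` and `merge_intervals`, merges overlapping or touching occlusions and sums the occluded time. It is an illustration under the assumption that any occlusion class counts as "occluded".

```python
SAMPLE_RATE_HZ = 16000  # SmartKom sampling rate as stated in the tier description


def parse_occ(line):
    """Return (begin, end, label) in samples; the label may contain blanks."""
    key, begin, duration, label = line.split(None, 3)
    if key != "OCC:":
        raise ValueError(f"not an OCC line: {line!r}")
    return int(begin), int(begin) + int(duration), " ".join(label.split())


def merge_intervals(intervals):
    """Merge overlapping or touching (begin, end) intervals."""
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return merged


occlusions = [parse_occ(line) for line in [
    "OCC:    1167360 7680    Teilweise nicht im Bild",
    "OCC:    1173120 14720   Hand im Gesicht",
    "OCC:    1201920 11520   Teilweise nicht im Bild",
]]
merged = merge_intervals((begin, end) for begin, end, _label in occlusions)
occluded_s = sum(end - begin for begin, end in merged) / SAMPLE_RATE_HZ
print(merged)
print(f"occluded: {occluded_s} s")
```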
SmartKom Meta-Linguistic Features USP class 4
Synopsis:
USP: (begin) (duration) (list of symbolic links) (label string)

This tier contains a segmentation and labeling of the SmartKom audio recording. The meta-linguistic features used in this tier are the feature set for voice-based user state detection (see tier USH for details about the SmartKom user state categories). The USP tier is a word-aligned extract from the original SmartKom TRP annotation files. It contains all information from the TRP files, so the TRP annotation does not have to be aligned to the base TRS tier first. More information regarding the TRP annotation scheme can be found here (only in German). For more background information about the SmartKom data collection see here.

Begin (begin) and duration (duration) of the event are given in samples counted from the beginning of the recording (SmartKom: 16 kHz). Please note that in some cases NOT the event itself but the word in which the event takes place is segmented. See the special notes on the individual labels below.
The symbolic links refer to the word(s) in question.
The label string contains one of the following 9 classes.
Label codes:
(If not stated otherwise the segment is the duration of the complete word.)
- CLEAR_ART : clear articulated speech.
  Speaker tries to speak Standard German ('Hochdeutsch'); no dialectal variations, but not yet hyper-articulated; comparable to a trained radio announcer.
- HYPER_ART : hyperarticulated speech.
  Unnatural emphasis on clear speech; like speaking to a person with poor language skills.
- EMPHASIS : emphatic accentuation.
- STRONG_EMPH : strong emphatic accentuation.
  Very strong accentuation of a syllable, e.g. 'MOOONtag'
- LENGTH_SYLL : lengthening of a syllable
- PAUSE_PHRASE : irregular pause on a phrasal level.
  Unusual pausing between semantic units; not pauses between sentences or between main clause and subordinate clause (unless they are very long).
  In this case the segment covers the word before the pause plus the pause.
- PAUSE_WORD : irregular pause on a word unit level.
  Pause between words where usually no pause should occur.
  In this case the segment covers the word before the pause plus the pause.
- PAUSE_SYLL : irregular pause on a syllable level.
  In this case the segment covers the word in which the pause occurs between syllables.
- LAUGHTER : speech overlapped by laughter, sigh or the like.
  Only words that are affected by laughter, strong breathing etc; no laughter alone.
  In this case the segment covers the word which is overlapped.
Label rules:
- Before starting to label, the labeler listens to the whole dialog to get a feeling for what is 'normal' speech for that particular user, and then tries to identify deviations from this 'normal' speech.
- One word may have more than one label; each label is then annotated in a separate line. If one of the labels denotes a pause on phrase or word level, the segments may differ, although they refer to the same word.
- Pauses on word or phrase level are annotated as lying between the adjacent words, e.g.
```
USP:    3678656 14144   48;49   PAUSE_WORD
```
- The USP tier does not contain the information to which part of a compound word a label refers. In principle, this information can be retrieved from the original TRP label files.
- Filled pauses are treated as words.
Example:
```
USP:    79552   6704    0       EMPHASIS
USP:    426176  8768    6       STRONG_EMPH
USP:    426176  8768    6       CLEAR_ART
USP:    435952  10160   7       CLEAR_ART
USP:    806560  6592    9       LENGTH_SYLL
USP:    814624  4832    10      LENGTH_SYLL
USP:    819776  17184   11      EMPHASIS
USP:    1356896 6000    13      LENGTH_SYLL
USP:    1785232 11808   20      LENGTH_SYLL
USP:    1798064 7808    21      LENGTH_SYLL
USP:    2449632 7376    23      LENGTH_SYLL
USP:    2470016 10736   27      LENGTH_SYLL
USP:    2470016 14800   27;28   PAUSE_WORD
USP:    2794160 12080   31      LENGTH_SYLL
USP:    3221632 5440    41      CLEAR_ART
USP:    3678656 8528    48      LENGTH_SYLL
USP:    3678656 14144   48;49   PAUSE_WORD
USP:    3694576 3824    49      EMPHASIS
USP:    4170960 11344   53      LENGTH_SYLL
USP:    4186192 4464    54      EMPHASIS
```
See also tiers USH, OCC and USM.
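The rules above can be illustrated with a short sketch. The function `parse_usp` is hypothetical; it reads the link field as one word number, or as two numbers joined by ';' for a pause between adjacent words. Two choices in the snippet are assumptions rather than part of the format: pause labels are filed under the word before the pause, and the pause length is estimated by subtracting that word's own segment duration (taken from another label on the same word) from the pause segment, which covers the word plus the pause.

```python
from collections import defaultdict

SAMPLE_RATE_HZ = 16000  # SmartKom sampling rate as stated in the tier description


def parse_usp(line):
    """Parse one USP line into a dict; links like '48;49' yield two word numbers."""
    key, begin, duration, links, label = line.split()
    if key != "USP:":
        raise ValueError(f"not a USP line: {line!r}")
    return {
        "begin": int(begin),
        "duration": int(duration),
        "words": tuple(int(i) for i in links.split(";")),
        "label": label,
    }


entries = [parse_usp(line) for line in [
    "USP:    3678656 8528    48      LENGTH_SYLL",
    "USP:    3678656 14144   48;49   PAUSE_WORD",
    "USP:    3694576 3824    49      EMPHASIS",
]]

# Collect all labels per word; a pause is filed under the word preceding it.
labels_by_word = defaultdict(list)
for entry in entries:
    labels_by_word[entry["words"][0]].append(entry["label"])

# Pause segment = word 48 plus pause, so subtract the duration of word 48.
word, pause = entries[0], entries[1]
pause_s = (pause["duration"] - word["duration"]) / SAMPLE_RATE_HZ

print(dict(labels_by_word))
print(f"estimated pause after word 48: {pause_s} s")
```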
Translation TLN class 1
Synopsis:
TLN: (list of symbolic links) (label string)

This tier contains a translation of the recorded speech into another language.
The list of symbolic links marks the area that is covered by the following translation within the recording. Translations may therefore be spread in chunks over more than one TLN line; even overlapping areas are possible, if necessary.
The label string starts with a marker giving the language pair of the translation in the form '##>%%', where '##' is the international language code of the source language and '%%' is the code of the target language, e.g. 'DE>EN' for German to English. After this marker, separated by a single TAB, follows the orthographic form of the translation without punctuation. The coding of special characters may vary, as in the tier ORT (see above).
Example:
```
ORT:    0       okay
ORT:    1       thank
ORT:    2       you
ORT:    3       bye
TLN:    0,1,2,3 EN>DE    gut danke tschüs
```
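A minimal sketch of how a TLN line could be resolved against the ORT tier is shown below. The function `parse_tln` and the dictionary layout are hypothetical; the snippet assumes comma-separated word links as in the example and accepts a TAB or blanks after the language marker.

```python
def parse_tln(line):
    """Split a TLN line into word links, language pair and translation text."""
    key, links, rest = line.split(None, 2)
    if key != "TLN:":
        raise ValueError(f"not a TLN line: {line!r}")
    pair, text = rest.split(None, 1)
    source, target = pair.split(">")
    return {
        "links": [int(i) for i in links.split(",")],
        "source": source,
        "target": target,
        "text": text.strip(),
    }


# ORT tier of the example: word number -> orthographic form
ort = {0: "okay", 1: "thank", 2: "you", 3: "bye"}

tln = parse_tln("TLN:    0,1,2,3 EN>DE    gut danke tschüs")
covered = " ".join(ort[i] for i in tln["links"])
print(f"{tln['source']}: {covered}")
print(f"{tln['target']}: {tln['text']}")
```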

SAM

The SAM Format was defined in the ESPRIT "SAM" Project No 2589 : 'Speech Input and Output Assessment Methodologies and Standardization'. Only very few BAS corpora contain SAM Format files.
On each BAS CDROM you will find scripts (sam2pho, pho2sam) for the conversion of SAM into PhonDat and vice versa.

A detailed description of the SAM format can be found here.

AGS - Annotation Graphs

Bird et al (LDC) use an abstract and very general data model called 'annotation graphs' to represent all kinds of annotations in the ATLAS project. The BAS Partitur Format (BPF) can be represented as an annotation graph as well.
Since LDC also provides software modules for designing new annotation tools based on this model, they defined an SGML-based format (based on ATLAS Level 0, v1.1b3) to store and exchange such annotation data (AGS).
On each BAS CDROM you will find the script par2ags.pl that transforms a BAS Partitur Format (BPF) file into an AGS file. A DTD for the AGS format can be found here.
Some BAS corpora are already shipped with both formats, BPF and AGS.

Florian Schiel
