This page was last updated 01/09/04
For reading and writing please use the
software
delivered with the corpus (module header.c).
A detailed description of the binary header structure can be found
here.
For reading and writing please use the
software
delivered with the corpus (module header.c).
A detailed description of the binary header structure and the
following header blocks can be found
here.
A detailed description of the NIST/SPHERE formats can be found
here.
Only a few BAS corpora contain data with NIST headers.
However, in all BAS corpora you will find
tools (
Syntax:
Remarks:
Syntax:
Remarks:
In the future all BAS corpora will be distributed with the new BAS Partitur
Format, if they contain segmental information of any kind. The formerly used
formats will be retained but no longer updated.
A first draft of the BAS Partitur Format (1995) can be found
here. An up-to-date
publication of version 1.2 can be
downloaded here (1998).
The BAS Partitur Format has the following features:
The content is exclusively 7-bit ASCII (to guarantee portability to all
platforms). Each line starts with a three-byte label followed by a colon,
which defines the syntax and semantics of the rest of the line. The following
units of the line are separated by white space (blank, tab).
The Partitur file is structured into a header and a body (like SAM
description files are). The header stretches from the beginning of the file to the
label
The header contains SAM-compatible lines of general information.
The following entries are compulsory:
LHD: Partitur file version
Example:
The following entries are optional; other
entries are tolerated as long as they do not conflict with the compulsory
and optional entries:
FIL: SAM File Type
All header labels are SAM-compatible.
The body starts after the label
There are 5 basic classes of tiers:
A line of this tier contains:
Example:
A line of this tier contains:
Example:
A line of this tier contains:
Example:
A line of this tier contains:
Example:
A line of this tier contains:
Example:
Synopsis:
This tier contains a list of the spoken words within the utterance annotated
in extended German SAM-PA
('canonical pronunciation').
Example:
Synopsis:
The tier 'Orthography' contains the orthographic (lexical) forms
corresponding to the units in the tier 'Vorschlagstranskription' (see above).
Example:
Synopsis:
The tier 'Verbmobil Transliteration' contains the transliteration
of the utterance according to the VM conventions 3.0. The transliteration
is segmented into the units of the KAN tier (see above).
Therefore multiple references may occur (e.g. if a reduced form of two
words is written as one unit in the transliteration). Each segment
covers the scope from the beginning of the referenced unit(s) to the beginning
of the next referenced unit(s). As a result,
the first line of this tier may contain no referenced unit.
In this case the line is aligned to the first unit.
A detailed description of the Verbmobil I transliteration format can
be found here (German only!).
Example:
Synopsis:
The tier 'Verbmobil II Transliteration' contains the transliteration
of the utterance according to the Verbmobil II conventions.
A new improved format was necessary because the VM I format was not
parsable. For more information about the VM II format see
here.
Our partner at CMU kindly provided an English translation also.
The transliteration
is segmented into the units of the KAN tier (see above) by starting a new
line after each unit. Exceptions are punctuation marks and pronunciation
comments, which are kept together with the preceding unit (purely for better
readability).
Example:
Synopsis:
In multi-party recordings, as in the Verbmobil II project, it may happen
that the speech of the currently recorded speaker is superimposed
by another dialog partner (cross talk). To denote this the tier
Example:
Synopsis:
This tier contains a gap-free segmentation of the whole signal into phonemic units
(extended German SAM-PA, broad phonetic transcript).
The first number denotes the beginning of the segment in samples counted
from the beginning of the speech file; the second number the duration
of the segment in samples.
Synopsis of label string
A definition of extended German SAM-PA can be found
here.
Example:
Synopsis:
This tier contains a segmentation into phonemic units
(extended German SAM-PA, broad phonetic transcript).
In contrast to the PHO tier (see above) this segmentation does not necessarily
cover the whole signal. That is, there might be pauses in the signal that are not
labeled (which happens frequently in spontaneous speech).
The first number denotes the beginning of the segment in samples counted
from the beginning of the speech file; the second number the duration
of the segment in samples.
Example:
Definition:
This tier contains an automatically generated phonetic-phonologic
segmentation in units of SAM-PA.
Some of these tiers are produced in close cooperation with
Technical University of Munich (Dr. G. Ruske).
The first number is the start of the segment counted in samples from the
beginning of the file; the second number is the length of the segment
in samples.
Example:
Definition:
This tier contains a segmentation of the utterance into words or word equivalents.
The segmentation need not be justified. The 'label string' may contain
orthographic or pronunciation information (e.g. in SAM-PA). A '-' at the
end of 'label string' denotes a missing word with reference to the tier KAN.
A '-' as last character in 'label string' denotes an inserted word.
Definition:
This tier contains a segmentation in dialog acts according to the
ongoing work of the
'Deutsches Forschungszentrum für Künstliche
Intelligenz', Saarbrücken, Germany (DFKI). Each marker covers
a portion of the speech signal that is denoted by the symbolic links
to the reference tier
Example:
Definition:
This tier contains the prosodic segmentation (by hand) according
to GTobi done by the
Technical University of Braunschweig.
Example:
Definition:
This tier contains a symbolic prosodic segmentation and labeling (by hand)
into 3 boundary markers and 3 accent markers (close to GTobi).
The symbolic links give the relation to the word event order.
Example:
Definition:
This tier contains a noise labeling with reference to the defined word chain.
Two different types of noise are possible: a simple noise occurring between
two words is denoted by a semicolon-separated pair of symbolic links to these
words (e.g. '5;6'); a noise that superimposes a single word is marked with
a single symbolic link denoting the superimposed word (e.g. '5').
For example:
Definition:
This tier contains a manually labeled accent marker according to GTobi.
There is no link to the word order. The labeling was done during the
German Verbmobil 2 project by the Technical University of Braunschweig.
Definition:
This tier contains a manually labeled boundary marker according to GTobi.
There is no link to the word order. The labeling was done during the
German Verbmobil 2 project by the Technical University of Braunschweig.
Definition:
This tier contains a manually labeled prosodic accent and boundary
annotation based on the linguistic information (the chain of words).
Consequently it only contains links to the spoken words of the utterance
but not to the signal itself.
The labeling was done during the
German Verbmobil 2 project by the Technical University of Erlangen in
cooperation with the University of Munich.
A detailed description of the labeling system as well as the used
categories can be found
here
(definition of labels can be found in table 12 on pp. 15-15 of the document).
For example:
Definition:
This tier contains a computer-readable representation of a syntactic
tree of the utterance. The tiers SYN, FUN and LEX are describing different
aspects of this tree, such as syntactic node, function and word class (see
below). They may also be exploited independently.
The labeling was done during the
German Verbmobil 2 project by the University of Tübingen.
An overview of the treebanks of Verbmobil II (6 pages) can be found
here.
A detailed description of the labeling system as well as the used
categories can be found here for
German,
English
and
Japanese .
For example:
Definition:
This tier contains an automatically generated lexical tagging of all words
of the utterance. The class system is based on the STTS
(Stuttgart-Tübingen-TagSet) like the LEX tier (but the LEX tier was
annotated manually!).
The labeling was done during the
German Verbmobil 2 project by the Technical University of Stuttgart.
A detailed description of the labeling system as well as the used
categories can be found here for
German
(pp. 17 - 19) and
English
(pp. 48 - 49). Furthermore, some examples for each German category
can be found here
(only in German).
For example:
Definition:
This tier contains automatically derived lemmas for each word in the
BPF.
The labeling was done during the
German Verbmobil 2 project by the Technical University of Stuttgart.
For example:
Definition:
This tier contains a phonetic segmentation and labeling according to
IPA.
For example:
Definition:
This tier contains a segmentation of longer recordings into turns, sentences or similar longer events, that contain more than one word.
For example:
Synopsis:
The tier 'Smartkom Transliteration' contains the transliteration
of a whole Man Machine Dialogue recorded in the SmartKom data collection.
For more background information about the SmartKom data collection see
here.
Example:
TRS: 5 die
TRS: 6 Sehensw"urdigkeiten [PA]
TRS: 7 von
TRS: 8 ~Heidelberg [NA] [B3 fall] .
TRS: 9 gibt [NA]
TRS: 10 es
TRS: 11 hier
TRS: 12 vielleicht
TRS: 13 Cafeterias [PA] [B3 rise] ? <#>
TRS: 14 was
TRS: 15 f"ur
TRS: 16 Hotels [NA]
TRS: 17 gibt [PA]
TRS: 18 es [B3 cont] ?
TRS: 19 @1mhm [NA] [B3 cont] .
TRS: 20 kannst
TRS: 21 was
TRS: 22 andres [PA]
Synopsis:
This tier contains a manual segmentation and annotation of 2D gestures as
recorded in the
SmartKom data collection.
All gestures that occur within the range of the SIVIT camera are
labelled. Additionally, emotional gestures that occur elsewhere are labeled.
For more background information about the SmartKom data collection see
here.
For a detailed description of the labeling system see here; the following is a brief description of the 8 label categories (possible values of labels are quoted in ''):
Example:
Synopsis:
This tier type contains information on user-states (interesting emotional
and cognitive states) that occurred in a
SmartKom recording
session.
For more background information about the SmartKom data collection see
here.
The whole session is segmented (no gaps).
For each segment begin (begin) and duration (duration) are given
in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the
labels described below, optionally followed by a TAB-separated rating.
For a detailed description of the labeling system see
here; the following is a brief
description of the 7 label categories (the verbose values of labels
are quoted in ''):
The intensity of a user-state is given after the label classes 2-6
by the following rating:
Example:
Synopsis:
This tier type contains information on user-states (interesting emotional
and cognitive states) that occurred in a
SmartKom recording session. In contrast to the USH tier only the
video signal of the face is available.
The whole session is segmented (no gaps).
For each segment begin (begin) and duration (duration) are given
in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the
labels described below, optionally followed by a TAB-separated rating.
For a detailed description of the labeling system see
here; the following is a brief
description of the 7 label categories (the verbose values of labels
are quoted in ''):
The intensity of a user-state is given after the label classes 2-6
by the following rating:
Example:
Synopsis:
This tier contains an additional segmentation and labeling to the
SmartKom facial video recording. All occlusions of the face or
part of the face by the hand, pen or other objects are segmented and
classified here. This tier might be very useful for the automatic
processing of the facial video signal.
Begin (begin) and duration (duration) of the occlusion are given in
samples counted from the beginning of the recording (SmartKom: 16 kHz).
Example:
Synopsis:
This tier contains a segmentation and labeling to the
SmartKom audio recording. The meta-linguistic features used
in this tier are the feature set for a voice based user state
detection (see tier USH for details about SmartKom user state categories).
The USP tier is a word-aligned extract from the original SmartKom
TRP annotation files. It contains all information from the TRP files
without the trouble that TRP has to be aligned to the base TRS tier first.
More information regarding the TRP annotation scheme can be found
here (only in German).
For more background information about the SmartKom data collection see
here.
Begin (begin) and duration (duration) of the event are given in
samples counted from the beginning of the recording (SmartKom: 16 kHz).
Please note that in some cases NOT the event but the word in which the
event takes place is segmented. See the special notes on the
individual labels below.
Label codes:
Label rules:
Example:
A detailed description of the SAM format can be found
here.
PhonDat 1
Signal files with PhonDat 1 Header contain a binary header
of constant length (512 bytes). The signal samples (2 bytes per sample)
start after
this header and are always in LoHi byte order (Intel format).
The header contains a defined structure with information such as
sampling frequency, resolution in bits, etc. The header is ILS compatible.
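The layout described above (a 512-byte header followed by 2-byte samples in LoHi order) can be sketched in Python as follows. This is only an illustration: it skips the header without interpreting any of its fields, it assumes the samples are signed, and the function name `read_phondat1_samples` is hypothetical. For real work the header.c module delivered with the corpus remains the reference.

```python
import struct

HEADER_SIZE = 512  # constant PhonDat 1 header length stated in the text


def read_phondat1_samples(data: bytes) -> list[int]:
    """Decode the samples that follow a PhonDat 1 header.

    Assumption: samples are signed 16-bit integers. The text only
    states 2 bytes per sample in LoHi (little-endian) byte order.
    """
    payload = data[HEADER_SIZE:]
    count = len(payload) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, payload[: count * 2]))
```

Note that this sketch does not handle PhonDat 2 files, where additional 512-byte blocks follow the binary header.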
PhonDat 2
PhonDat 2 is an extension of the PhonDat 1 format.
After the binary header of 512 bytes additional blocks of
512 bytes follow which contain the orthography and canonical transcript of
the utterance (SAM-PA).
The PhonDat 2 header can be identified by the version number (2) in the
binary part.
NIST - SPHERE
The NIST - SPHERE speech header format was defined by the
'National Institute of Standards and Technology, USA'.
It is used in many American speech corpora.
nist2pho, pho2nist
)
to transform PhonDat to NIST and vice versa.
Segment/Label Files
S0 Format
The S0 Format contains word labels of utterances longer than a single word.
The format was defined in the German PhonDat project.
The label files are in ASCII and have the same prefix as the corresponding
signal files. The extension is .S0
.
<file> = <Name of segment file> CR
<Orthography> CR
oend CR
<Canonical form> CR
kend CR
hend CR
<list of word segments>
<list of word segments> = <begin sample> <marker> CR
...
<begin sample> = number of first sample
<marker> = '#c:' (beginning of first word) OR
<canonical word form> (as read from the lexicon) OR
'.' (end of last word)
<Name of segment file> = any valid filename
<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:
Umlaut   LaTeX   7 Bit ASCII (dec)   German Char Set (hex)
Ae       "A      [ (91)              C4
Ue       "U      ] (93)              DC
Oe       "O      \ (92)              D6
ae       "a      { (123)             E4
ue       "u      } (125)             FC
oe       "o      | (124)             F6
ss       "s      ~ (126)             DF
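As an illustration of the LaTeX convention in the table above, the following Python sketch converts the seven listed sequences in an orthographic string to the corresponding Unicode characters. The names `LATEX_TO_UNICODE` and `latex_umlauts_to_unicode` are hypothetical, and the sketch assumes that a double quote in the orthography is only ever used for these sequences.

```python
# Mapping taken from the LaTeX column of the umlaut table.
LATEX_TO_UNICODE = {
    '"A': "Ä", '"U': "Ü", '"O': "Ö",
    '"a': "ä", '"u': "ü", '"o': "ö",
    '"s': "ß",
}


def latex_umlauts_to_unicode(text: str) -> str:
    """Replace LaTeX-style umlaut sequences with Unicode characters."""
    for latex, char in LATEX_TO_UNICODE.items():
        text = text.replace(latex, char)
    return text
```

This should only be applied to orthographic strings; in canonical forms the double quote marks secondary stress.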
<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. The symbols used are the German subset of
SAM-PA, with the
following changes to SAM-PA:
Q Glottal Stop
q Glottalization (not in canonical forms!)
' main stress
" secondary stress
# compositum marker (optional)
+ function word marker (suffix, optional)
Words are separated by two blanks, phonemic labels are separated by
one blank.
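The separator rule above (two blanks between words, one blank between phonemic labels) can be sketched in Python like this. The function name `split_canonical` and the sample string used in the comment are illustrative and not taken from a corpus file.

```python
def split_canonical(canonical: str) -> list[list[str]]:
    """Split a canonical form into words and their phonemic labels.

    Words are separated by two blanks, labels within a word by one.
    Example input (made up): "Q a l z o:  h OY t @"
    """
    words = [w for w in canonical.strip().split("  ") if w.strip()]
    return [w.split() for w in words]
```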
marker
.
The following word has the same begin sample
.
marker
The replacing word has the same begin sample
.
S1 Format
The S1 Format contains the phonological segmentation of the
utterance. The format was defined in the German PhonDat project.
The label files are in ASCII and have the same prefix as the corresponding
signal files. The extension is .S1
.
<file> = <Name of segment file> CR
<Orthography> CR
oend CR
<Canonical form> CR
kend CR
<Transcription> CR
hend CR
<list of phoneme segments>
<list of phoneme segments> = <begin sample> <marker> CR
...
<begin sample> = number of first sample
/d i:6/
(dir), /g e: h OY6/
(geheuer)
S2 Format
The S2 format contains an automatically generated phonological
annotation of the signal.
The format is quite the same as PhonDat 1 with the following alterations:
BAS Partitur Format
General
Most formats of files with segmental information to speech signal have
the disadvantage that
Therefore a new open format based on the SAM Label Format was developed
at BAS which avoids most of the mentioned problems.
In this format all levels of description are described
independently but time-aligned, like the individual parts of a score. Hence
this format was called 'BAS Partitur Format' ('Partitur' is German for 'score').
cat
History
1.0 : 01.09.95 Preliminary Definition of the BAS Partitur Format
PLEASE DO NOT USE THIS VERSION ANY MORE!
1.1 : 01.06.96 First Definition structured in classes
1.2 : 28.08.96 Label ELF: removed from definition
(tool par-1.1-to-1.2 transforms 1.1 files into 1.2 files)
1.2.1 : ??
1.2.2 : tier DAS added
1.2.3 : tier TR2, SUP added
1.2.4 : tier PRS added
1.2.5 : tier NOI added
1.2.6 : distinction between symbolic links to word groups (list of word
numbers separated by commas) and symbolic links to events between
words (e.g. noises, number pairs separated by semicolon)
changed class definition of class 1, 4 and 5 accordingly
changed tier definition NOI
1.2.7 : 12.09.00 Tier LBP and LBG added
1.2.8 : 11.05.01 Tier PRO,POS,LMA,SYN,FUN,LEX added
1.2.9 : 07.08.01 Tier IPA added
1.2.10 : 29.08.01 Tier TRN added
1.2.11 : 28.11.01 Tier TRS added
1.2.12 : 20.07.02 : Tiers GES,USH,USM,OCC,USP added
1.2.13 : 22.10.02 : Tier GES: definition of gestures extended
Tier TLN added
Definition of Structure 1.2
A Partitur file has the same prefix as the corresponding signal file
(8 Bytes for ISO 9660 compatibility) but the extension .par
.LBD:
; the body from the label LBD:
to the
end of file where the last line has to be closed by a 'new line'
(the final SAM label ELF:
was omitted for the BAS Partitur
Format since it prevents effective processing of the Partitur files).
REP: Place of recording
SNB: Number of Bytes per Sample
SAM: Sampling Frequency in Hz
SBF: Byteorder (Intel 01, Motorola 10)
SSB: Bit Resolution
NCH: Number of Channels
SPN: Speaker ID
LBD:
LHD: Partitur 1.2
REP: Muenchen
SNB: 2
SAM: 16000
SBF: 01
SSB: 16
NCH: 1
SPN: PS1
LBD:
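A minimal Python sketch of how such a file could be split into header and body at the LBD: label is given below. The function name `split_partitur` is hypothetical, and the sketch assumes that each header label occurs only once (a repeated label would overwrite the earlier value).

```python
def split_partitur(text: str) -> tuple[dict[str, str], list[str]]:
    """Split Partitur file content into a header dict and body lines.

    The header runs from the start of the file up to the 'LBD:' label;
    everything after that label is returned as the body.
    """
    header: dict[str, str] = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # tolerate blank lines in this sketch
        label, _, value = line.partition(":")
        if label == "LBD":
            return header, lines[i + 1:]
        header[label] = value.strip()
    raise ValueError("no LBD: label found")
```

With the example header above, `header["SAM"]` would be the string `"16000"`; converting values to numbers is left to the caller.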
TYP: Type of SAM Label File
DBN: Corpus Name
VOL: Number of Volume
DIR: Directory in Volume
SRC: Name of speech file
BEG: Beginning of labeling sequence
END: End of labeling sequence
RED: Date of Recording
RET: Duration
RCC: Recording Conditions
CMT: Comment
SPI: Speaker Information
PCF: Name of Protocol File
PCN: Protocol Number
EXP: Name of Segmenter
SYS: Labeling system
DAT: Date of Labeling
SPA: SAM-PA Version
LBD:
and stretches to
the end of the file. It contains the tiers of the Partitur.
Each tier is identified by a unique label. The order of tiers as well as the
order of lines within a tier is not significant.
The symbolic links (relations) refer to a reference tier which numbers the
word units beginning with zero
(the choice of word units as symbolic relations is arbitrary!).
a pair of integers separated by a semicolon referring to an event between those
two words
The string itself may have an internal syntax which is defined in the
tier definition.
TRL: 6,7 mit'm
NOI: 4;5 #Klopfen
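The two kinds of symbolic link shown in these examples (a comma-separated list of word numbers, or a semicolon-separated pair denoting an event between two words) could be told apart as sketched below. The function name `parse_links` and the result tags "words" and "between" are illustrative choices, not part of the format.

```python
def parse_links(field: str) -> tuple[str, tuple[int, ...]]:
    """Parse the symbolic-link field of a Partitur line.

    '6,7' -> links to words 6 and 7
    '4;5' -> an event between word 4 and word 5
    """
    if ";" in field:
        left, right = field.split(";")
        return ("between", (int(left), int(right)))
    return ("words", tuple(int(n) for n in field.split(",")))
```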
The meaning of the integers is defined by the tier definition
(possible are samples, millisecs, etc.)
The string itself may have an internal syntax which is defined in the
tier definition.
SAP: 13456 345 aU
The meaning of the integer is defined by the tier definition
(possible are samples, millisecs, etc.)
The string itself may have an internal syntax which is defined in the
tier definition.
PRB: 13456 TON: P*; FUN: PA
The meaning of the integers is defined by the tier definition
(possible are samples, millisecs, etc.)
a pair of integers separated by a semicolon referring to an event between those
two words
The symbolic links (relations) refer to a reference tier which numbers the
word units beginning with zero
(the choice of word units as symbolic relations is arbitrary!).
The string itself may have an internal syntax which is defined in the
tier definition.
SAP: 13456 345 9 aU
The meaning of the integer is defined by the tier definition
(possible are samples, millisecs, etc.)
a pair of integers separated by a semicolon referring to an event between those
two words
The symbolic links (relations) refer to a reference tier which numbers the
word units beginning with zero
(the choice of word units as symbolic relations is arbitrary!).
The string itself may have an internal syntax which is defined in the
tier definition.
PRB: 13456 13 TON: P*; FUN: PA
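The five class examples above differ only in how many integer fields precede the label string and in whether a symbolic-link field is present. The following Python sketch parses a line given its class. The table `CLASS_LAYOUT` is inferred from the examples and tier definitions in this document, and the names used are hypothetical.

```python
# Per class: (number of leading integer fields, has symbolic-link field).
# Inferred from the class examples; not an official table.
CLASS_LAYOUT = {
    1: (0, True),   # links, string
    2: (2, False),  # begin, duration, string
    3: (1, False),  # sample, string
    4: (2, True),   # begin, duration, links, string
    5: (1, True),   # sample, links, string
}


def parse_tier_line(line: str, tier_class: int):
    """Split a body line into (label, integers, links, content)."""
    n_ints, has_links = CLASS_LAYOUT[tier_class]
    label, rest = line.split(":", 1)
    n_fields = n_ints + (1 if has_links else 0)
    parts = rest.strip().split(None, n_fields)
    numbers = [int(p) for p in parts[:n_ints]]
    links = parts[n_ints] if has_links else None
    content = parts[n_fields] if len(parts) > n_fields else ""
    return label, numbers, links, content
```

The link field is returned as a raw string so that comma lists and semicolon pairs can be interpreted by the caller.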
Remarks:
-1
.
Definition of Tiers (version 1.2.2)
KAN:
class 1
KAN: (symbolic link) (transcript)
The segmentation of the whole utterance is done in word units, where
everything counts as a word that is produced by the articulatory
organs of the speaker and can be seen as 'speech'.
Following this definition hesitations are words, whereas laughing, coughs,
etc. are not. This separation isn't always clear, but on the other
hand the selection of word units is arbitrary as well. The main point
is a unique reference tier for symbolic relations in other tiers.
Another problem is the reduction of words that are annotated in the
orthographic form, e.g. mit'm. In these cases the reduction
is restored to its full form (in this example /mIt de:m/). The reason for this
is that some of these reductions should be automatically
accessible.
KAN: 0 j'a:
KAN: 1 Qalzo:+
KAN: 2 QE:m
KAN: 3 h'OYt@
KAN: 4 Qo:d6+
KAN: 5 m'O6g@n
ORT:
class 1
ORT: (symbolic link) (orthography)
Words are not capitalized at the beginning of an utterance or of a sentence
within an utterance (except nouns, of course). German 'Umlauts' and other
letters that do not conform to 7 Bit ASCII are written in the form used for
lexical access. Therefore the coding might differ between speech
corpora, e.g. ISO-8859 or LaTeX coding.
This tier is used for easy lexicon reference; therefore no additional
markers except lexical words are allowed. There is no punctuation in this
tier. Lexical words include items that are contained in the KAN tier (e.g.
hesitations, word breaks).
ORT: 0 ja
ORT: 1 also
ORT: 2 <"ahm>
ORT: 3 heute
ORT: 4 oder
ORT: 5 morgen
TRL:
class 1
TRL: (list of symbolic links) (transliteration)
class 1
TRL: 0 <Schmatzen>
TRL: 0 ja ,
TRL: 1 also
TRL: 2 <"ahm>
TRL: 3 heute
TRL: 4 oder
TRL: 5 morgen .
TR2:
class 1
TR2: (list of symbolic links) (transliteration)
class 1
TR2: 25 ~Weihnachten
TR2: 26 ist
TR2: 27 das
TR2: 28 sowieso
TR2: 29 immer
TR2: 30 etwas
TR2: 31 schwierig ,
TR2: 32 und
TR2: 33 <"ahm>
TR2: 34 in
TR2: 35 der
TR2: 36 #zweiten
TR2: 37 Dezemberwoche
TR2: 38 bin
TR2: 39 ich
TR2: 40 in
TR2: 41 ~M"unchen
TR2: 42 auf
TR2: 43 dem
TR2: 44 Kongre"s .
TR2: 45 also
TR2: 46 bliebe
TR2: 47 noch
SUP:
class 1
SUP: (list of symbolic links) (utterance-id) (transliteration)
class 1SUP
was added to the format. It gives the transliteration of the 'foreign'
speaker together with the symbolic markers indicating to which parts of speech of the
recorded speaker these superimposed events are assigned. The item
'utterance-id' gives the name of the corresponding BAS Partitur file
containing the superimposing part of speech.
The tier SUP
is currently only used in combination with the
tier TR2
. For a detailed discussion of superimposed speech in the
Verbmobil II project please click
here.
TR2: 0 ich
TR2: 1 w"urde
TR2: 2 vorschlagen ,
TR2: 3 da"s
TR2: 4 wir9@
TR2: 5 dann9@
TR2: 6 <:<#> hinfliegen:> ,
TR2: 7 <:<#> ich:>
TR2: 8 hab'
TR2: 9 jetzt
TR2: 10 aber
TR2: 11 <:<#Rascheln> grade:>
TR2: 12 <:<#Rascheln> keine:>
TR2: 13 Unterlagen
TR2: 14 da . <#>
SUP: 4,5 g002acn2_028_AAK.par @9ja
In this example the speaker is superimposed during the words 4 and 5 by the
single word 'ja' of another speaker. The latter occurs in the BAS Partitur
file 'g002acn2_028_AAK.par'.
PHO:
class 4
PHO: (begin) (duration) (list of symbolic links) (label string)
The conventions of labeling and segmentation are briefly described
here.
<label string> = '#c:' (beginning of first word) OR
'#p:' (pause) OR
'#v:' (mis-pronunciation) OR
<segment> OR
<word boundary segment> OR
<compound boundary segment> OR
<punctuation>
<segment> = $<sampa string> (ordinary segment)
<word boundary segment> = ##<sampa string>
<compound boundary segment> = $#<sampa string>
<sampa string> = any string of <extended German SAM-PA symbols>
<punctuation> = '#.' OR '#,' OR '#?' OR '#!'
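The grammar of the PHO label string can be turned into a small classifier, sketched here in Python. The function name `classify_pho_label` and the returned category names are hypothetical; the prefixes and special markers are those listed in the grammar above.

```python
def classify_pho_label(label: str) -> str:
    """Return the grammar category of a PHO label string."""
    special = {
        "#c:": "first_word_begin",
        "#p:": "pause",
        "#v:": "mispronunciation",
    }
    if label in special:
        return special[label]
    if label in ("#.", "#,", "#?", "#!"):
        return "punctuation"
    # Check the two-character prefixes before the plain '$' prefix.
    if label.startswith("##"):
        return "word_boundary_segment"
    if label.startswith("$#"):
        return "compound_boundary_segment"
    if label.startswith("$"):
        return "segment"
    raise ValueError("not a valid PHO label string: %r" % label)
```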
PHO: 2473 0 0 #c:
PHO: 2473 1100 0 ##d
PHO: 3573 0 0 $a-@
PHO: 4126 2007 0 $s
PHO: 6133 0 0 $-+
PHO: 6133 1130 1 ##g
PHO: 7263 1206 1 $e:
PHO: 8496 937 1 $t
PHO: 9433 0 2 ##Q-
PHO: 9433 0 2 $-q
PHO: 9433 2698 2 $aU
PHO: 12131 1178 2 $x
PHO: 13309 0 2 $-+
PHO: 13309 962 3 ##n
PHO: 14271 1675 3 $I
PHO: 15946 4308 3 $C
PHO: 18579 0 3 $t-
PHO: 18579 0 3 $-+
PHO: 18579 5467 3 #p:
SAP:
class 4
SAP: (begin) (duration) (list of symbolic links) (label string)
The conventions of labeling and segmentation are briefly described
here.
SAP: 549 867 0 Q%<
SAP: 1416 1242 0 aU
SAP: 2658 1136 0 f
SAP: 3794 408 1 v
SAP: 4202 852 1 i:
SAP: 5054 433 1 d
SAP: 5487 1686 1 6%>
SAP: 7173 828 1 h%<%>
SAP: 8001 864 1 2:-9%<%>
SAP: 8865 1015 1 r-6%<
SAP: 9880 0 1 @-
SAP: 9880 1732 1 n
MAU:
Class 4
MAU: (begin) (duration) (list of symbolic links) (label string)
A detailed description of the MAUS system can be found
here.
The segmentation is justified and has
no relation to the tier 'Vorschlagstranskription' as done in the tier SAP
(however, there are symbolic links to the words).
The units are extended German SAM-PA.
Additional labels are <nib>
(non-speech event) and
<p:>
(pause). These labels always get the symbolic
link -1
(no link).
Furthermore, events that clearly stem from the speaker, but cannot be
classified (e.g. non-understandable words) are labelled and segmented
as <usb>
. The latter receive a symbolic link as other
word events.
MAU: 0 676 -1 <p:>
MAU: 677 7861 -1 <nib>
MAU: 8539 450 0 g
MAU: 8990 2436 0 u:
MAU: 11427 1740 0 t
MAU: 13168 958 1 d
MAU: 14127 1298 1 a
MAU: 15426 3820 1 n
MAU: 19247 303 2 n
MAU: 19551 1785 2 e:
MAU: 21337 624 2 m
MAU: 21962 636 2 n
MAU: 22599 501 3 v
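To show how the symbolic links in this example can be used, the following Python sketch groups the MAU segments by the word they belong to and skips segments with link -1 (no link). The function name `phonemes_by_word` is hypothetical, and the sketch assumes a single word number per line, as in the example above.

```python
from collections import defaultdict


def phonemes_by_word(lines: list[str]) -> dict[int, list[str]]:
    """Collect MAU label strings per word number, skipping link -1."""
    words: dict[int, list[str]] = defaultdict(list)
    for line in lines:
        label, begin, duration, link, symbol = line.split(None, 4)
        if label != "MAU:" or link == "-1":
            continue  # other tiers, pauses and non-speech events
        words[int(link)].append(symbol.strip())
    return dict(words)
```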
WOR:
Class 4
WOR: (begin) (duration) (list of symbolic links) (label string)
The symbolic links give the relation to the KAN tier. Note that inserted
words have a symbolic link to the previous word in the KAN tier.
DAS:
Class 1
DAS: (list of symbolic links) (marker string)
KAN
.
DAS: 0,1,2,3,4,5 @(SUGGEST_SUPPORT_DATE BA)
DAS: 6,7,8,9 @(DELIBERATE_EXPLICITE BA)
DAS: 10,11,12,13,14,15,16,17,18,19,20 @(SUGGEST_SUPPORT_DATE BA)
In this example the marker SUGGEST_SUPPORT_DATE
covers the words 0 to 5 in the reference tier. The term 'BA' denotes
a dialog act from speaker 'B' to speaker 'A', where speaker 'A' is
always the speaker that initiates the dialog.
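A DAS line of the shape shown in this example could be decomposed as sketched below in Python. The regular expression assumes that every marker string looks exactly like the three example lines, i.e. '@(' followed by the act name, a blank, two speaker letters and ')'. Real data may contain other forms, and the names `DAS_PATTERN` and `parse_das_line` are hypothetical.

```python
import re

# Assumed shape, taken only from the examples: @(ACT_NAME XY)
DAS_PATTERN = re.compile(r"^DAS:\s+([\d,]+)\s+@\((\S+)\s+([A-Z])([A-Z])\)\s*$")


def parse_das_line(line: str) -> tuple[list[int], str, str, str]:
    """Return (word numbers, dialog act, from-speaker, to-speaker)."""
    match = DAS_PATTERN.match(line)
    if match is None:
        raise ValueError("unexpected DAS line: %r" % line)
    links, act, source, target = match.groups()
    return [int(n) for n in links.split(",")], act, source, target
```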
A more detailed description of the markers and the principles
of segmentation can be found
here.
PRB:
Class 5
PRB: (sample) (list of symbolic links) (marker string)
The first number gives the time of the prosodic event measured in
samples from beginning of the file.
The symbolic links give the relation to the KAN tier.
The label string describes the prosodic event itself. A concise
description of the labeling convention (GTobi) can be found
here (Sorry: only in German).
PRB: 54212 5 TON: H*; FUN: NA
PRB: 63269 7 TON: L+H*; FUN: EK
PRB: 76371 8 BRE: B3; TON: L-L%
PRB: 79967 8 TON: L*+H; FUN: PA
PRS:
Class 1
PRS: (list of symbolic links) (marker string)
The label string describes the prosodic event itself. Boundary markers
(B3, B2, B9) are linked to two words acting as left and right neighbors
of the boundary.
Accent markers (PA, NA, EK) refer to the word where the accent was
labeled. No syllable
information is provided.
Definition of Marker Strings:
B3 : full intonational boundary with strong intonational marking,
often with pauses or lengthening or change of speed
B2 : intermediate phrase boundary with weak marking, weaker intonational
marking than B3
B9 : 'agrammatical' boundary, e.g. hesitations, repairs, unintended pauses
PA : main accents (phrase accent) carried by one word; in rare cases
there can be two or more words marked together
NA : secondary accent for accentuated words without PA
EK : emphatic or contrastive accents
PRS: 0 EK
PRS: 4;5 B2
PRS: 7 NA
PRS: 9 NA
PRS: 11 NA
PRS: 11;12 B3
PRS: 13 EK
PRS: 14 EK
PRS: 15 PA
PRS: 17 NA
PRS: 17;18 B2
PRS: 18 NA
PRS: 19;20 B3
PRS: 23 EK
PRS: 23;24 B3
PRS: 25 EK
PRS: 27 PA
NOI:
Class 1
NOI: (single or pair of symbolic links) (marker string)
The marker string contains a blank-separated list of noise labels. The
labels are drawn from the VMII TRL transliteration format:
<A> <B>                  : breathing
<P>                      : distinct silence within an utterance
<%>                      : not understandable muttering
<Schmatzen> <Smack>      : lip smack
<Schlucken> <Swallow>    : swallow
<R"auspern> <Throat>     : throat clear
<Husten> <Cough>         : cough
<Lachen> <Laugh>         : laugh
<Ger"ausch> <Noise>      : other articulatory noise
<#Klopfen> <#Knock>      : knock
<#Rascheln> <#Rustle>    : rustle
<#Quietschen> <#Squeak>  : creak
<#Klicken> <#Click>      : click noise
<#Mikrowind>             : blowing into microphone
<#Mikrobe>               : noise caused by touching, knocking,
                           rubbing against the microphone
<#>                      : other technical noise
NOI: 5 <Lachen> # word 5 is superimposed by a laugh
NOI: 5;6 <A> # between word 5 and word 6 a distinct
# breathing was recorded
LBP:
Class 3
LBP: (sample) (marker string)
The following three accent classes were used:
PA phrase accent
NA secondary accent
EK emphatic or contrastive accents
For example:
LBP: 1651 PA
LBG:
Class 3
LBG: (sample) (marker string)
The following 5 boundary classes were used:
B9 irregular boundary: 'agrammatical' boundary, e.g. hesitations,
repairs, unintended pauses
B2 intermediate phrase boundary with weak marking, weaker intonational
marking than B3
B3 intonational boundary with strong intonational marking, no
question
B3QH B3, semantically a question, with high tone
B3QL B3, semantically a question, with low tone
For example:
LBG: 6586 B3
PRO:
Class 1
PRO: (symbolic link) (marker string)
PRO: 6;7 SS2
PRO: 13;14 AC1
PRO: 14;15 AC1
PRO: 15;16 AC1
PRO: 18;19 SC3
PRO: 24;25 IRB
PRO: 25;26 AC1
PRO: 26;27 AC1
PRO: 27;28 AC1
PRO: 28;29 IWE
PRO: 28;29 IZB
PRO: 31 SM3
SYN: FUN: LEX:
Class 1
SYN: (symbolic link) (marker string)
FUN: (symbolic link) (marker string)
LEX: (symbolic link) (marker string)
Representation of Syntax Trees in the BAS Partitur Format (BPF)
===============================================================
In the BAS Partitur Format the syntax trees are represented in three
tiers. The terminal (lexical) categories are listed in the LEX
tier. Syntactic categories of higher order are listed in the SYN
tier. Grammatical functions referring to both LEX and SYN are
listed in the FUN tier. The LEX and the SYN entries refer to the nodes
and FUN represents the edges of the syntax tree.
Lexical Categories:
-------------------
Definition:
LEX: (symbolic link) (label string)
This tier represents the lexical categories of the words. The words
are represented by symbolic links. Hesitations, neologisms and
unintelligible parts of an utterance have not been annotated.
Example:
LEX: 0 0 PDS
LEX: 1 0 VMFIN
LEX: 2 0 CARD
LEX: 3 0 NN
LEX: 4 0 ADJD
LEX: 5 0 VVINF
The label string contains
(1) a tag for the lexical category, e.g. CARD (cardinal number) for word 2.
(2) an index indicating whether the node is terminal, branching or
non-branching. The LEX tier represents only terminal nodes; therefore
the index is always 0 (see the SYN and FUN tiers for further information
on this index).
LEX labels used in syntax trees of German dialogues:
UNKNOWN unknown tag
--
SYN: 0 1 DM
SYN: 1 1 NX
SYN: 1 2 VF
SYN: 1,2,3,4,5 0 SIMPX
SYN: 2 1 VXFIN
SYN: 2 2 LK
SYN: 3 1 ADVX
SYN: 3,4,5 0 MF
SYN: 4 1 NX
SYN: 5 1 ADVX
SYN: 7 1 VXFIN
SYN: 7 2 LK
SYN: 7,8,9,10,11 0 SIMPX
SYN: 8 1 NX
SYN: 8,9,10,11 0 MF
SYN: 9,10,11 0 NX
SYN: 10 1 NX
SYN: 10,11 0 NX
SYN: 11 1 NX
FUN: 0 0 -
FUN: 0 1 --
FUN: 1 0 HD
FUN: 1 1 ON
FUN: 1 2 -
FUN: 1,2,3,4,5 0 --
FUN: 2 0 HD
FUN: 2 1 HD
FUN: 2 2 -
FUN: 3 0 HD
FUN: 3 1 MOD
FUN: 3,4,5 0 -
FUN: 4 0 HD
FUN: 4 1 OA
FUN: 5 0 HD
FUN: 5 1 V-MOD
FUN: 7 0 HD
FUN: 7 1 HD
FUN: 7 2 -
FUN: 7,8,9,10,11 0 --
FUN: 8 0 HD
FUN: 8 1 ON
FUN: 8,9,10,11 0 -
LEX: 0 0 PTKANT
LEX: 1 0 PPER
LEX: 2 0 VAFIN
LEX: 3 0 ADV
LEX: 4 0 NN
LEX: 5 0 ADV
LEX: 7 0 VVFIN
LEX: 8 0 PPER
LEX: 9 0 ART
LEX: 10 0 NN
LEX: 11 0 NE
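The example above suggests that each FUN line shares its symbolic link and index with exactly one LEX or SYN line. Under that assumption (it is read off the example, not stated as a rule), nodes and edge labels can be paired by the key (link, index). The function name read_tree is hypothetical.

```python
def read_tree(lines):
    """Return (nodes, edges), both keyed by (symbolic link, index)."""
    nodes, edges = {}, {}
    for line in lines:
        label, link, index, tag = line.split()
        key = (link, int(index))
        if label == "FUN:":
            edges[key] = tag                        # grammatical function
        else:                                       # LEX: or SYN:
            nodes[key] = (label.rstrip(":"), tag)   # tier and category
    return nodes, edges

# A few lines taken from the example above (word 1 and its clause).
sample = [
    "SYN: 1 1 NX", "SYN: 1 2 VF", "SYN: 1,2,3,4,5 0 SIMPX",
    "FUN: 1 0 HD", "FUN: 1 1 ON", "FUN: 1 2 -", "FUN: 1,2,3,4,5 0 --",
    "LEX: 1 0 PPER",
]
nodes, edges = read_tree(sample)
for key, (tier, category) in sorted(nodes.items()):
    print(key, tier, category, edges.get(key))
```

In this excerpt every node finds a FUN entry under the same key; a missing entry would show up as None.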
POS:
Class 1
POS: (symbolic link) (marker string)
POS: 0 ITJ
POS: 1 PPER
POS: 2 VAFIN
POS: 3 ADV
POS: 4 NN
POS: 5 ADV
POS: 7 VVFIN
POS: 8 PPER
POS: 9 ART
POS: 10 NN
POS: 11 NE
LMA:
Class 1
LMA: (symbolic link) (marker string)
LMA: 0 nein
LMA: 1 pper
LMA: 2 haben
LMA: 3 hier
LMA: 4 Unterlage
LMA: 5 da
LMA: 7 kennen
LMA: 8 pper
LMA: 9 d
LMA: 10 Hotel
LMA: 11 Maritim
Please note that all personal pronouns were annotated with 'pper' and
all articles were annotated with 'd'.
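Since POS and LMA both refer to words through the same word numbers, the two tiers can be joined on that number. This is a minimal sketch, assuming each link is a single word number as in the examples above; tier_by_word is a hypothetical helper.

```python
def tier_by_word(lines, label):
    """Map word number -> marker string for one class 1 tier."""
    table = {}
    for line in lines:
        name, link, marker = line.split(None, 2)
        if name == label + ":":
            table[int(link)] = marker.strip()
    return table

lines = [
    "POS: 3 ADV", "POS: 4 NN", "POS: 5 ADV",
    "LMA: 3 hier", "LMA: 4 Unterlage", "LMA: 5 da",
]
pos = tier_by_word(lines, "POS")
lma = tier_by_word(lines, "LMA")
# Print part-of-speech tag and lemma side by side for each word.
for word in sorted(pos.keys() & lma.keys()):
    print(word, pos[word], lma[word])
```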
IPA:
Class 2
IPA: (begin) (duration) (label string)
The first number denotes the beginning of a segment counted in samples
from the beginning of the file, the second number denotes the duration of
the segment in samples. The remainder of the line must contain a list of
comma-separated IPA numbers (at least one), optionally followed by a list
of corresponding SAM-PA symbols.
IPA chart with IPA numbers
IPA chart with symbols
IPA: 4856 1228 322 @
IPA: 10629 564 317
IPA: 11805 991 319 I
IPA: 12797 1142 138 C
IPA: 13940 1534 302 e
IPA: 15475 895 110 g
IPA: 16371 777 322 @
IPA: 17149 758 155 l
IPA: 17908 1497 305
IPA: 19406 1204 116 n
IPA: 20611 589 104 d
IPA: 21201 1018 322 @
IPA: 22220 1185 103 t
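The line layout described above can be read as follows. This is a sketch only: it assumes that the optional SAM-PA symbols are the remaining whitespace-separated fields, which matches the examples but is not spelled out in the description. The name parse_ipa is hypothetical.

```python
def parse_ipa(line):
    """Split an IPA line into (begin, duration, IPA numbers, SAM-PA symbols)."""
    fields = line.split()
    if len(fields) < 4 or fields[0] != "IPA:":
        raise ValueError("not a complete IPA line: " + line)
    begin, duration = int(fields[1]), int(fields[2])
    numbers = [int(n) for n in fields[3].split(",")]   # at least one IPA number
    symbols = fields[4:]                               # optional; may be empty
    return begin, duration, numbers, symbols

print(parse_ipa("IPA: 4856 1228 322 @"))
print(parse_ipa("IPA: 10629 564 317"))
```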
TRN:
Class 4
TRN: (begin) (duration) (symbolic link) (label string)
The first number denotes the beginning of a segment counted in samples
from the beginning of the file, the second number denotes the duration of
the segment in samples. The symbolic link contains a list of comma-separated
word numbers that are contained in the segment. The rest of the line may
contain an optional label (e.g. a turn number).
TRN: 132736 144640 0,1,2,3,4,5,6,7 002
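A class 4 line combines a time segment with a list of word numbers. The sketch below reads the example line and converts sample counts to seconds. The sampling rate is not given in this section; the value 16000 is used purely for illustration and should be taken from the signal file header in practice. parse_trn and to_seconds are hypothetical names.

```python
def parse_trn(line):
    """Split a TRN line into (begin, duration, word numbers, label)."""
    _, begin, duration, link, *rest = line.split(None, 4)
    words = [int(n) for n in link.split(",")]
    label = rest[0].strip() if rest else ""    # the label is optional
    return int(begin), int(duration), words, label

def to_seconds(samples, rate):
    """Convert a sample count to seconds for a sampling rate in Hz."""
    return samples / rate

begin, duration, words, label = parse_trn(
    "TRN: 132736 144640 0,1,2,3,4,5,6,7 002")
rate = 16000  # assumption for illustration only
print(words, label, to_seconds(begin, rate), to_seconds(duration, rate))
```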
TRS
Class 1
TRS: (list of symbolic links) (transliteration)
A detailed description of the underlying transliteration format can be found
here.
The transliteration
is segmented into the units of the KAN tier (see above) by starting a new
line after each unit. Exceptions are punctuation marks and pronunciation
comments, which are kept together with the preceding unit (purely for better
readability).
TRS: 0 <:<#> ja:> [NA] [B2] ,
TRS: 1 ich
TRS: 2 h"atte
TRS: 3 <:<#> gern:> [NA]
TRS: 4 +/die/+ [B9]
GES
Class 2
GES: (begin) (duration) (label string)
The first number denotes the beginning of a gestural event in samples from the
beginning of the recording (in SmartKom: 16 kHz); the second number denotes the
duration in samples.
The 'label string' consists of 8 columns separated by TAB, optionally followed by a comment string:
This string is either '[FINGER] re|li [TOOL]' or 'nicht erkennbar', if the
pointing method cannot be determined.
For instance 'Zeige re Hand' denotes the index finger of the right hand;
'li Hand' denotes the left hand (more than one finger used); 'li Stift'
denotes a gesture performed with the left hand holding a pen.
If more than one finger or a pen is used, the string [FINGER] is empty.
The string may have one of the three forms:
The reference word is only labelled in I-gestures.
The reference zone is only labelled in I- and U-gestures.
The reference object is only labelled in I-gestures.
This may be free text or - more often - one of the following remarks (codes):
GES: 1072000 23040 I-Geste I - tipp + Zeige li Hand links oben Treffer 1078400 12160
GES: 1959680 114560 R-Geste R - emot - re Hand 1078400 12160 Überlegung/Nachdenken
GES: 2166400 16000 I-Geste I - tipp + Zeige li Hand links oben rechts 2171520 7680
GES: 2641280 12800 I-Geste I - tipp + Zeige re Hand § Schlo"s rechts unten Treffer 2647680 5120
GES: 3093120 14080 I-Geste I - tipp + Zeige re Hand links unten Treffer 3098240 7040
GES: 3351680 7040 R-Geste R - UFO re Hand 3098240 7040
GES: 4029440 22400 I-Geste I - tipp + Zeige li Hand links oben rechts 4035840 10240
USH
Class 2
USH: (begin) (duration) (label string)
The labels are assigned according to the impression of the labeler.
Not only the facial expression but also the voice quality and other
contextual information are considered. The mere use of words with
emotional content, without an emotional expression, is NOT considered
an indicator of the respective emotion/user state.
USH: 0 205440 Freude/Erfolg schwach
USH: 205440 30720 Neutral
USH: 236160 37760 Freude/Erfolg schwach
USH: 273920 192000 Neutral
USH: 465920 78720 Überlegen/Nachdenken stark
USH: 544640 295680 Neutral
USH: 840320 49920 Ärger/Mißerfolg schwach
USH: 890240 42880 Neutral
USH: 933120 21760 Überraschung/Verwunderung schwach
USH: 954880 97920 Ratlosigkeit schwach
USH: 1052800 542720 Neutral
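Because every USH segment carries its duration in samples, the total time per user state can be summed directly. A minimal sketch follows; the 16 kHz rate is the SmartKom value quoted in the GES section and is assumed here to apply to USH as well. samples_per_label is a hypothetical name.

```python
from collections import defaultdict

SAMPLE_RATE = 16000  # assumption: the SmartKom rate quoted in the GES section

def samples_per_label(lines):
    """Sum the segment durations (in samples) for each label string."""
    totals = defaultdict(int)
    for line in lines:
        _, begin, duration, label = line.split(None, 3)
        totals[label.strip()] += int(duration)
    return dict(totals)

lines = [
    "USH: 0 205440 Freude/Erfolg schwach",
    "USH: 205440 30720 Neutral",
    "USH: 236160 37760 Freude/Erfolg schwach",
    "USH: 273920 192000 Neutral",
]
for label, samples in samples_per_label(lines).items():
    print(label, samples, samples / SAMPLE_RATE)
```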
See also tiers USM, USP and
OCC.
USM
Class 2
USM: (begin) (duration) (label string)
For more background information about the SmartKom data collection see
here.
The labels are assigned according to the impression of the labeler.
ONLY the facial expression but NOT the voice quality or other
contextual information is considered.
This annotation was performed by a different labeler group than the
USH annotation. Therefore this annotation may be used for an
investigation of the influence of speech input on user state judgements.
USM: 0 205440 Freude/Erfolg schwach
USM: 205440 30720 Neutral
USM: 236160 37760 Freude/Erfolg schwach
USM: 273920 192000 Neutral
USM: 465920 78720 Überlegen/Nachdenken schwach
USM: 544640 295680 Neutral
USM: 840320 49920 Ärger/Mißerfolg schwach
USM: 890240 42880 Neutral
USM: 933120 119680 Überlegen/Nachdenken schwach
USM: 1052800 542720 Neutral
USM: 1595520 59520 Überlegen/Nachdenken schwach
USM: 1655040 157440 Neutral
USM: 1812480 143360 Überlegen/Nachdenken schwach
USM: 1955840 58880 Ärger/Mißerfolg stark
USM: 2014720 89600 Neutral
USM: 2104320 559360 Ärger/Mißerfolg schwach
USM: 2663680 263680 Neutral
USM: 2927360 28800 Ärger/Mißerfolg schwach
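One way to compare the USM labels with the USH labels is to measure for how many samples both tiers carry the same label string. The sketch below does this for a few lines copied from the two examples; it assumes that both examples describe the same recording and that a segment covers the samples from begin up to, but not including, begin + duration. Neither assumption is stated in the text, and the function names are hypothetical.

```python
def parse_segments(lines):
    """Return a list of (start, end, label), end = begin + duration."""
    segments = []
    for line in lines:
        _, begin, duration, label = line.split(None, 3)
        begin = int(begin)
        segments.append((begin, begin + int(duration), label.strip()))
    return segments

def agreement(tier_a, tier_b):
    """Return (samples with equal labels, samples covered by both tiers)."""
    same = both = 0
    for a_start, a_end, a_label in tier_a:
        for b_start, b_end, b_label in tier_b:
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap > 0:
                both += overlap
                if a_label == b_label:
                    same += overlap
    return same, both

ush = parse_segments([
    "USH: 273920 192000 Neutral",
    "USH: 465920 78720 Überlegen/Nachdenken stark",
    "USH: 544640 295680 Neutral",
])
usm = parse_segments([
    "USM: 273920 192000 Neutral",
    "USM: 465920 78720 Überlegen/Nachdenken schwach",
    "USM: 544640 295680 Neutral",
])
same, both = agreement(ush, usm)
print(same, both, same / both)
```

Here the two tiers differ only in the strength attribute of the middle segment, which the exact string comparison counts as a disagreement.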
See also tiers USH, USP and
OCC.
OCC
Class 2
OCC: (begin) (duration) (label string)
The label string contains one of the following 10 classes:
OCC: 380800 18560 Teilweise nicht im Bild
OCC: 458880 58240 Teilweise nicht im Bild
OCC: 1167360 7680 Teilweise nicht im Bild
OCC: 1173120 14720 Hand im Gesicht
OCC: 1201920 11520 Teilweise nicht im Bild
OCC: 2000000 12160 Hand im Gesicht/Mund
OCC: 2567040 57600 Teilweise nicht im Bild
OCC: 2709120 40960 Hand im Gesicht/Mund
OCC: 2947840 33280 Hand im Gesicht
OCC: 2955520 9600 Teilweise nicht im Bild
OCC: 2981120 35840 Teilweise nicht im Bild
OCC: 3528960 10880 Hand im Gesicht
OCC: 4001920 10240 Hand im Gesicht
OCC: 4103680 20480 Teilweise nicht im Bild
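In the example above some OCC segments share samples, e.g. the segment starting at 1167360 is still running when the one at 1173120 begins. A sketch for finding such pairs follows, assuming a segment covers the samples from begin up to, but not including, begin + duration. overlapping_pairs is a hypothetical name.

```python
def overlapping_pairs(segments):
    """Return pairs of (begin, duration, label) segments that share samples."""
    ordered = sorted(segments)
    pairs = []
    for i, (b1, d1, l1) in enumerate(ordered):
        for b2, d2, l2 in ordered[i + 1:]:
            if b2 >= b1 + d1:
                break              # later segments start even later
            pairs.append(((b1, d1, l1), (b2, d2, l2)))
    return pairs

segments = [
    (1167360, 7680, "Teilweise nicht im Bild"),
    (1173120, 14720, "Hand im Gesicht"),
    (1201920, 11520, "Teilweise nicht im Bild"),
]
for first, second in overlapping_pairs(segments):
    print(first, "overlaps", second)
```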
See also tiers USH, USP and
USM.
USP
Class 4
USP: (begin) (duration) (list of symbolic links) (label string)
The symbolic link refers to the word in question.
The label string contains one of the following 9 classes.
(If not stated otherwise the segment is the duration of the complete word.)
Speaker tries to speak Standard German ('Hochdeutsch'); no dialectal
variations; but not yet hyper-articulated; comparable to a trained radio announcer.
Un-natural emphasis on clear speech; like speaking to a person with bad
language skills.
Very strong accentuation of a syllable, e.g. 'MOOONtag'
Unusual pausing between semantic units; not pauses between sentences or
between main clause and subordinate clause (unless they are very long).
In this case the segment covers the word before the pause plus the pause.
Pause between words where usually no pause should occur.
In this case the segment covers the word before the pause plus the pause.
In this case the segment covers the word in which the pause occurs between syllables.
Only words that are affected by laughter, strong breathing etc; no laughter
alone.
In this case the segment covers the word which is overlapped.
USP: 3678656 14144 48;49 PAUSE_WORD
USP: 79552 6704 0 EMPHASIS
USP: 426176 8768 6 STRONG_EMPH
USP: 426176 8768 6 CLEAR_ART
USP: 435952 10160 7 CLEAR_ART
USP: 806560 6592 9 LENGTH_SYLL
USP: 814624 4832 10 LENGTH_SYLL
USP: 819776 17184 11 EMPHASIS
USP: 1356896 6000 13 LENGTH_SYLL
USP: 1785232 11808 20 LENGTH_SYLL
USP: 1798064 7808 21 LENGTH_SYLL
USP: 2449632 7376 23 LENGTH_SYLL
USP: 2470016 10736 27 LENGTH_SYLL
USP: 2470016 14800 27;28 PAUSE_WORD
USP: 2794160 12080 31 LENGTH_SYLL
USP: 3221632 5440 41 CLEAR_ART
USP: 3678656 8528 48 LENGTH_SYLL
USP: 3678656 14144 48;49 PAUSE_WORD
USP: 3694576 3824 49 EMPHASIS
USP: 4170960 11344 53 LENGTH_SYLL
USP: 4186192 4464 54 EMPHASIS
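A short sketch of reading USP lines and counting how often each class occurs. As in the examples, a link may name one word ('27') or two ('27;28'); the code only collects the numbers. parse_usp is a hypothetical name.

```python
from collections import Counter
import re

def parse_usp(line):
    """Split a USP line into (begin, duration, word numbers, label)."""
    _, begin, duration, link, label = line.split(None, 4)
    words = [int(n) for n in re.split(r"[;,]", link)]
    return int(begin), int(duration), words, label.strip()

lines = [
    "USP: 2470016 10736 27 LENGTH_SYLL",
    "USP: 2470016 14800 27;28 PAUSE_WORD",
    "USP: 2794160 12080 31 LENGTH_SYLL",
]
# Count the occurrences of each label in the excerpt.
counts = Counter(parse_usp(line)[3] for line in lines)
print(counts.most_common())
```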
See also tiers USH, OCC and
USM.
SAM
The SAM Format was defined in the
ESPRIT "SAM" Project No 2589 : 'Speech Input and Output
Assessment Methodologies and Standardization'. Only very few BAS
corpora contain SAM Format files.
On each BAS CDROM you will find scripts (sam2pho, pho2sam)
for the conversion of SAM into PhonDat and vice versa.
AGS - Annotation Graphs
Bird et al. (LDC) use an abstract and very general data model called
'annotation graphs' to represent all kinds of annotations in the
ATLAS project.
The BAS Partitur Format (BPF) can be represented as
an annotation graph as well.
Since LDC also provides software modules for designing new annotation
tools based on this model, they defined an SGML-based format (based on
ATLAS Level 0, v1.1b3) to
store and exchange such annotation data (AGS).
On each BAS CDROM you will find the script
par2ags.pl that transforms a BAS Partitur Format (BPF) file
into an AGS file. A DTD for the AGS format can be found
here.
Some BAS corpora are already shipped with both formats, BPF and AGS.
Florian Schiel