This page was last updated 2020-07-03
This page contains descriptions and definitions of the file formats recommended or accepted by BAS.
Aside from the types listed below, BAS will support all non-proprietary file formats recommended
by CLARIN.
media-corpus: XSD Schema
media-session: XSD Schema
Signal files with PhonDat 1 Header contain a binary header
of constant length (512 bytes). The signal samples (2 bytes per sample)
start after
this header and are always in LoHi byte order (Intel format).
The header contains a defined structure with information such as
sampling frequency, resolution in bits, etc. The header is ILS compatible.
For reading and writing please use the
software
delivered with the corpus (module header.c).
A detailed description of the binary header structure can be found
here.
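Because the PhonDat 1 header has a fixed length of 512 bytes and the samples are 2-byte values in LoHi (little-endian) order, the signal can be read without decoding the header at all. The following is a minimal sketch, not the corpus software: the function names are hypothetical, and it assumes the samples are signed 16-bit linear PCM (check the header's resolution field if in doubt). It applies to PhonDat 1 only, since PhonDat 2 appends further 512-byte blocks after the header.

```python
import struct

HEADER_SIZE = 512  # PhonDat 1: binary header of constant length


def decode_phondat1(raw):
    """Return the samples of a PhonDat 1 file given as bytes.

    Assumes signed 16-bit linear samples; the header fields are not decoded.
    """
    body = raw[HEADER_SIZE:]        # samples start right after the header
    n = len(body) // 2              # 2 bytes per sample
    # '<' selects little-endian (LoHi / Intel) order, 'h' a signed 16-bit int
    return list(struct.unpack("<%dh" % n, body[: 2 * n]))


def read_phondat1(path):
    """Hypothetical convenience wrapper: read a file and decode its samples."""
    with open(path, "rb") as f:
        return decode_phondat1(f.read())
```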
PhonDat 2 is an extension of the PhonDat 1 format.
After the binary header of 512 bytes additional blocks of
512 bytes follow which contain the orthography and canonical transcript of
the utterance (SAM-PA).
For reading and writing please use the
software
delivered with the corpus (module header.c).
A detailed description of the binary header structure and the
following header blocks can be found
here.
A detailed description of the NIST/SPHERE formats can be found
here.
Some BAS corpora contain data with NIST headers. To transform NIST/SPHERE
into other standard formats we recommend
SoX, e.g.:
The S0 Format contains word labels of utterances longer than a single word.
The format was defined in the German PhonDat project.
The label files are in ASCII and have the same prefix as the corresponding
signal files. The extension is .S0.
Syntax:
Remarks:
The S1 Format contains the phonological segmentation of the
utterance. The format was defined in the German PhonDat project.
The label files are in ASCII and have the same prefix as the corresponding
signal files. The extension is .S1.
Syntax:
Remarks:
The S2 format contains an automatically generated phonological
annotation of the signal.
All BAS corpora that contain segmental information of any kind will be distributed in BAS Partitur
Format (BPF). The formats used previously
will be retained but not updated any further.
The BPF will also be used as internal annotation file format for the BAS WebServices.
A publication of version 1.2 can be
downloaded here (1998).
The BAS Partitur Format has the following features:
As in the SAM standard, BPF files are of type text/plain. Allowed encodings
are 7-bit ASCII or UTF-8. For historical reasons, some BPF tiers also allow LaTeX coding. BPF files usually have the extension '*.par' or '*.PAR' and the mimetype
'text/plain-bas'. BPF files are 'line oriented', that is, information is structured in lines and
optimized for line-processing UNIX tools such as grep, sed, gawk etc. An XML version
of the Annotation Graph concept proposed by Liberman (ATLAS format) can be used
to handle the same information as the BPF file in XML. These files usually have
the extension '*.ags' or '*.AGS' and the mimetype 'text/xml'. The DTD of this
file type can be found here.
The content is exclusively 7-bit ASCII or UTF-8 (to guarantee portability to all
platforms); depending on the label type, the labels may contain
special characters, which are coded in UTF-8 (or LaTeX in some tiers).
Each line starts with a three-byte label followed by a colon,
which defines the syntax and semantics of the rest of the line. The remaining
units of the line are separated by white space (blank, tab).
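Since every line starts with a three-character label and a colon, followed by white-space separated units, a line can be split with a few string operations. This is a hedged sketch with a hypothetical function name. Because label strings in some tiers may themselves contain blanks, the optional `n_fields` argument stops splitting after that many units and keeps the rest of the line intact as the last unit.

```python
def split_bpf_line(line, n_fields=None):
    """Split a BPF line into its three-character label and its units.

    If n_fields is given, at most n_fields units are returned and the last
    one keeps any embedded blanks (useful for label strings).
    """
    label, sep, rest = line.rstrip("\r\n").partition(":")
    if not sep or len(label) != 3:
        raise ValueError("not a BPF line: %r" % line)
    if n_fields is None:
        return label, rest.split()
    return label, rest.split(None, n_fields - 1)
```

For example, `split_bpf_line("KAN: 3 h 'OY t @", 2)` keeps the blank-separated transcript together as one unit.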
The Partitur file is structured into a header and a body (like SAM
description files are). The header starts with the line labelled
The header contains SAM-compatible lines of general information.
The following entries are compulsory:
LHD: Partitur file version
The following entries are recommended:
REP: Place of recording
Example:
The following entries are optional; aside from these other
entries are tolerated as long as they do not conflict with compulsory
and optional entries:
FIL: SAM File Type
The body starts after the label
There are 5 basic classes of tiers:
A line of this tier contains three fields:
Example:
A line of this tier contains four fields:
Example:
A line of this tier contains three fields:
Example:
A line of this tier contains five fields:
Example:
A line of this tier contains four fields:
Example:
Synopsis:
This tier contains a list of the spoken words within the utterance annotated
in SAMPA (if a SAMPA definition exists for that language) or in X-SAMPA
(in some old German speech corpora the extended
German SAM-PA
is used). Phonetic symbols can either be concatenated ('glutinated') into one string in column 3 or
be separated by blanks (but no mixed forms); the latter
is recommended.
Example:
Definition:
This tier contains the canonical transcription of the words within the
utterance in SAMPA (if available for that language), IPA, or ARPABET
(eng-US only). In contrast to the KAN tier, KSS additionally contains
syllable boundary and primary word stress markers. All symbols are
blank-separated.
Example:
Definition:
This tier contains the segmentation of the words within the utterance
into morphs and their classes. The morph sequence and its class
sequence are separated by a semicolon. Segments within each sequence
are blank-separated. The morphological class inventory is documented
here: https://www.bas.uni-muenchen.de/Bas/readme_mrp_inventory.txt
Example:
Synopsis: This tier contains a list of the uttered words in a syllabified
canonical form. The transcription is given in the SAMPA variant of the
respective language. Syllables are separated by a dot '.'. The SAMPA
symbols may be separated by blanks (recommended). Ambisyllabic consonants are
assigned to the preceding syllable. Example:
Definition:
The tier contains the phonemic transcript of the spoken words
of the utterance. In contrast to the KAN tier these will
deviate from the canonical (citation) pronunciation form
since speakers rarely speak in citation pronunciation.
Example:
Synopsis:
The tier 'Orthography' contains the orthographic (lexical) forms
corresponding to the units in the tier 'Vorschlagstranskription' (see above).
Example:
Synopsis:
The tier 'Verbmobil Transliteration' contains the transliteration
of the utterance according to the VM conventions 3.0. The transliteration
is segmented into the units of the KAN tier (see above).
Therefore multiple references may occur (e.g. if a reduced form of two
words is written as one unit in the transliteration). Each segment
covers the scope from the beginning of the referenced unit(s) to the beginning
of the next referenced unit(s). As a consequence,
the first line of this tier may contain no referenced unit.
In this case the line is aligned to the first unit.
A detailed description of the Verbmobil I transliteration format can
be found here (German only!).
Example:
Synopsis:
The tier 'Verbmobil II Transliteration' contains the transliteration
of the utterance according to the Verbmobil II conventions.
A new improved format was necessary because the VM I format was not
parsable. For more information about the VM II format see
here.
Our partner at CMU kindly also provided an English translation.
The transliteration
is segmented into the units of the KAN tier (see above) by starting a new
line after each unit. Exceptions are punctuation marks and pronunciation
comments, which are kept together with the preceding unit (this is only for better
readability).
Example:
Definition:
This tier represents the exact original transcript, i.e. if you concatenate
the label strings you obtain the exact text form of the transcript.
Newlines are encoded as '\n', tabs as '\t' and other white space as '\s'.
This tier can, for instance, be the result of
an optimized mapping of the reference tier ORT to the original transcript
(as produced, for example, by the web service 'subtitle'). In combination
with a MAUS segmentation this tier can be the basis for automatic subtitle
generation, or for indexing original transcript strings.
Example:
Synopsis:
In multi-party recording as in the Verbmobil II project it may happen
that the speech of the currently recorded speaker is actively super-imposed
by another dialog partner (cross talk). To denote this the tier
Example:
Synopsis:
This tier contains a complete, gap-free segmentation into phonemic units
(extended German SAM-PA, broad phonetic transcript).
The first number denotes the beginning of the segment in samples counted
from the beginning of the speech file; the second number the duration
of the segment in samples.
Synopsis of label string
A definition of extended German SAM-PA can be found
here.
Example:
Synopsis:
This tier contains a segmentation into phonemic units
in SAMPA/X-SAMPA (broad phonetic transcript).
In contrast to the PHO tier (see above), this segmentation does not strictly
cover the whole signal. That is, there might be pauses in the signal that are not
labeled (which happens frequently in spontaneous speech).
The first number denotes the beginning of the segment in samples counted
from the beginning of the speech file; the second number the duration
of the segment in samples.
Example:
Definition:
This tier contains an automatically generated phonetic-phonologic
segmentation in units of SAM-PA.
Some of these tiers are produced in close cooperation with the
Technical University of Munich (Dr. G. Ruske).
The first number is the start of the segment counted in samples from the
beginning of the file; the second number is the length of the segment
in samples.
Example:
Definition:
This tier contains a segmentation of the utterance into words or word equivalents.
The segmentation need not be justified. The 'label string' may contain
orthographic or pronunciation information (e.g. in SAM-PA). A '-' at the
end of 'label string' denotes a missing word with respect to the tier KAN.
A '-' as last character in 'label string' denotes an inserted word.
Definition:
This tier contains a segmentation in dialog acts according to the
ongoing work of the
'Deutsches Forschungszentrum für Künstliche Intelligenz', Saarbrücken, Germany (DFKI). Each marker covers
a portion of the speech signal that is denoted by the symbolic links
to the reference tier
Example:
Definition:
This tier contains the prosodic segmentation (by hand) according
to GTobi done by the
Technical University of Braunschweig.
Example:
Definition:
This tier contains a symbolic prosodic segmentation and labeling (by hand)
into 3 boundary markers and 3 accent markers (close to GTobi).
The symbolic links give the relation to the word event order.
Example:
Definition:
This tier contains a noise labeling with reference to the defined word chain.
Two different types of noise are possible: simple noise occurring between
two words is denoted by a semicolon-separated pair of symbolic links to these
words (e.g. '5;6'); noise that superimposes a single word is marked with
a single symbolic link denoting the superimposed word (e.g. '5').
For example:
Definition:
This tier contains a manually labeled accent marker according to GTobi.
There is no link to the word order. The labeling was done during the
German Verbmobil 2 project by the Technical University of Braunschweig.
Definition:
This tier contains a manually labeled accent marker according to GTobi.
There is no link to the word order. The labeling was done during the
German Verbmobil 2 project by the Technical University of Braunschweig.
Definition:
This tier contains a manually labeled prosodic accent and boundary
annotation based on the linguistic information (the chain of words).
Consequently it only contains links to the spoken words of the utterance
but not to the signal itself.
The labeling was done during the
German Verbmobil 2 project by the Technical University of Erlangen in
cooperation with the University of Munich.
A detailed description of the labeling system as well as the
categories used can be found
here (for German)
(definition of labels can be found in table 12 on pp. 15-16 of the document)
and here (for English).
For example:
Definition:
This tier contains a computer-readable representation of a syntactic
tree of the utterance. The tiers SYN, FUN and LEX describe different
aspects of this tree, such as syntactic node, function and word class (see
below). They may also be used independently.
The labeling was done during the
German Verbmobil 2 project by the University of Tübingen.
An overview of the treebanks of Verbmobil II (6 pages) can be found
here.
A detailed description of the labeling system as well as the categories
used can be found here for German, English and Japanese.
For example:
Definition:
This tier contains an automatically generated lexical tagging of all words
of the utterance. The class system is based on the STTS
(Stuttgart-Tübingen-TagSet), like the LEX tier (but the LEX tier was
annotated manually!).
The labeling was done during the
German Verbmobil 2 project by the Technical University of Stuttgart.
A detailed description of the labeling system as well as the categories
used can be found here for German (pp. 17-19) and English (pp. 48-49).
Furthermore, some examples for each German category can be found here
(only in German).
For example:
Definition:
This tier contains automatically derived lemmas for each word in the
BPF.
The labeling was done during the
German Verbmobil 2 project by the Technical University of Stuttgart.
For example:
Definition:
This tier contains a phonetic segmentation and labeling according to
IPA.
For example:
Definition:
This tier contains a segmentation of longer recordings into turns, sentences or similar longer events, that contain more than one word.
For example:
Synopsis:
The tier 'Smartkom Transliteration' contains the transliteration
of a whole Man Machine Dialogue recorded in the SmartKom data collection.
For more background information about the SmartKom data collection see
here.
Example:
Synopsis:
This tier contains a manual segmentation and annotation of 2D gestures as
recorded in the
SmartKom data collection.
All gestures that occur within the range of the SIVIT camera are
labelled. Additionally, emotional gestures that occur elsewhere are labeled.
For more background information about the SmartKom data collection see
here.
For a detailed description of the labeling system see here; the following is a brief description of the 8 label categories (possible values of labels are quoted in ''):
Example:
Synopsis:
This tier type contains information on user-states (interesting emotional
and cognitive states) that occurred in a
SmartKom recording
session.
For more background information about the SmartKom data collection see
here.
The whole session is segmented (no gaps).
For each segment begin (begin) and duration (duration) are given
in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the
labels described below, optionally followed by a TAB-separated rating.
For a detailed description of the labeling system see
here; the following is a brief
description of the 7 label categories (the verbose values of labels
are quoted in ''):
The intensity of a user-state is given after the label classes 2-6
by the following rating:
Example:
Synopsis:
This tier type contains information on user-states (interesting emotional
and cognitive states) that occurred in a
SmartKom recording session. In contrast to the USH tier only the
video signal of the face is available.
The whole session is segmented (no gaps).
For each segment begin (begin) and duration (duration) are given
in samples from the beginning of the recording (SmartKom: 16kHz).
In the label string (label string) each segment is assigned to one of the
labels described below, optionally followed by a TAB-separated rating.
For a detailed description of the labeling system see
here; the following is a brief
description of the 7 label categories (the verbose values of labels
are quoted in ''):
The intensity of a user-state is given after the label classes 2-6
by the following rating:
Example:
Synopsis:
This tier contains an additional segmentation and labeling to the
SmartKom facial video recording. All occlusions of the face or
part of the face by the hand, pen or other objects are segmented and
classified here. This tier might be very useful for the automatic
processing of the facial video signal.
Begin (begin) and duration (duration) of the occlusion are given in
samples counted from the beginning of the recording (SmartKom: 16 kHz).
Example:
Synopsis:
This tier contains a segmentation and labeling to the
SmartKom audio recording. The meta-linguistic features used
in this tier are the feature set for a voice based user state
detection (see tier USH for details about SmartKom user state categories).
The USP tier is a word-aligned extract from the original SmartKom
TRP annotation files. It contains all information from the TRP files
without the need to first align TRP to the base TRS tier.
More information regarding the TRP annotation scheme can be found
here (only in German).
For more background information about the SmartKom data collection see
here.
Begin (begin) and duration (duration) of the event are given in
samples counted from the beginning of the recording (SmartKom: 16 kHz).
Please note that in some cases NOT the event itself but the word in which the
event takes place is segmented. See the special notes on the
individual labels below.
Label codes:
Label rules:
Example:
Synopsis:
This tier contains a translation of the recorded speech into another language.
The list of symbolic links marks the area that is covered by the following translation
within the recording. Translations may therefore be spread in chunks over more than
one TLN line; even overlapping areas are possible, if necessary.
Example:
Synopsis:
This tier contains a prosodic labelling as being used in German Speech
Synthesis Projects at IMS, University of Stuttgart and at BAS, Munich.
This simplified version of the German Tobi standard uses only either an
accent or a boundary marker in each labelled point-in-time. This format -
called 'GTobi light' - was developed by IMS Stuttgart for the usage in
unit selection speech synthesis techniques. In contrast to the standard
GTobi either an accent tone or a boundary type from a closed inventory
may be labelled; free combinations of tone (TON:), accent type (FUN:) and
boundary type (BRE:) as in GTobi are not allowed here, although some
boundary markers do in fact contain information about the tone structure.
A detailed description of the label inventory can be found
in the documentation
of the German BITS synthesis corpora, part B.
Example:
Synopsis:
This tier contains a transliteration of the German SmartWeb
corpus project. It uses a subset of the
SmartKom transliteration set
(TRS), extended by 2 additional off-talk markers, a
pronunciation coding in SAM-PA and two additional time markers, which
allow segmenting the dialogue into turn-like chunks.
The following SmartKom tags are used within TRW:
Example:
Synopsis:
This tier contains a syllable segmentation based on the automatic
phonemic segmentation by MAUS (see tier MAU).
Starting with the transcript in SAM-PA extracted from the MAU tier we
first search for minima of sonority as possible syllable boundaries between
syllable nuclei, and then re-adjust these boundaries according to
the rule set published by K. Kohler. The resulting syllabification is then
mapped back to the MAU segmentation to obtain the start and duration of
each listed syllable.
Example:
Synopsis:
This tier contains the result of a token-based speaker diarization of the recorded speech.
The list of symbolic links marks the area that is covered by the following speaker label
within the recording. Each word token can only be assigned to one speaker label.
Example:
A description of the SAM format can be found
here.
PhonDat 1
This format is outdated and should not be used any longer!
PhonDat 2
This format is outdated and should not be used any longer!
The PhonDat 2 header can be identified by the version number (2) in the
binary part.
NIST - SPHERE
The NIST - SPHERE speech header format was defined by the
'National Institute of Standards and Technology, USA'.
It is used in many speech corpora originating in the United States.
sox -t sph input.nist output.wav
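When only the header information is needed, the plain-text part of a NIST/SPHERE header can also be read directly. The sketch below relies on our understanding of the SPHERE layout, which this page does not describe: the file starts with the line NIST_1A, then the header size in bytes (commonly 1024), then lines of the form `name -type value`, terminated by `end_head`. The function name is hypothetical; consult the NIST description linked above for the authoritative definition.

```python
def parse_sphere_header(raw):
    """Parse the ASCII header of a NIST/SPHERE file given as bytes.

    Returns (header_size, fields); -i values become int, -r values float,
    and -sN (string) values are kept as str.
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST/SPHERE header")
    header_size = int(lines[1].strip())
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip empty or malformed lines
        name, typ, value = parts
        if typ == "-i":
            fields[name] = int(value)
        elif typ == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value
    return header_size, fields
```

Reading the first 1024 bytes of a file is usually enough; if the returned header size is larger, re-read that many bytes.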
Segment/Label Files
S0 Format
This format is outdated and should not be used any longer!
(File extension: .S0)
<file> = <Name of segment file> CR
<Orthography> CR
oend CR
<Canonical form> CR
kend CR
hend CR
<list of word segments>
<list of word segments> = <begin sample> <marker> CR
...
<begin sample> = number of first sample
<marker> = '#c:' (beginning of first word) OR
<canonical word form> (as read from the lexicon) OR
'.' (end of last word)
<Name of segment file> = any valid filename
<Orthography> =
The orthographic string contains the standard orthography or a
transliteration with additional markers of the spoken utterance.
German Umlauts are represented either by LaTeX
convention or by 7 bit ASCII signs or by German Character set
coding used by DEC and Sun:
Umlaut LaTeX 7 Bit ASCII (dec) German Char Set (hex)
Ae "A [ (91) C4
Ue "U ] (93) DC
Oe "O \ (92) D6
ae "a { (123) E4
ue "u } (125) FC
oe "o | (124) F6
ss "s ~ (126) DF
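The LaTeX column of the table can be decoded with a simple substitution table. This is a sketch with a hypothetical function name. It deliberately ignores the 7-bit ASCII column, because characters such as `~`, `[` or `|` also serve as markers in transliteration tiers (e.g. `~Weihnachten` in TR2), and it should only be applied to orthographic strings: in canonical forms `"` marks secondary stress.

```python
# LaTeX-style umlaut codes from the table, mapped to Unicode characters
LATEX_UMLAUTS = {
    '"A': "Ä", '"U': "Ü", '"O': "Ö",
    '"a': "ä", '"u': "ü", '"o': "ö",
    '"s': "ß",
}


def decode_latex_umlauts(text):
    """Replace LaTeX umlaut codes such as '"a' with the Unicode character."""
    for code, char in LATEX_UMLAUTS.items():
        text = text.replace(code, char)
    return text
```

For files in the German character set column, Python's built-in `latin-1` codec maps these byte values directly (e.g. byte C4 to Ä).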
<Canonical form> =
The canonical string contains the expected citation forms of the
words in the utterance. Note that this is NOT a transcription of the
signal. Symbols are taken from the German subset of
SAM-PA, with the
following changes to SAM-PA:
Q Glottal Stop
q Glottalization (not in canonical forms!)
' main stress
" secondary stress
# compositum marker (optional)
+ function word marker (suffix, optional)
Words are separated by two blanks, phonemic labels are separated by
one blank.
marker
.
The following word has the same begin sample
.
marker
The replacing word has the same begin sample
.
S1 Format
This format is outdated and should not be used any longer!
(File extension: .S1)
<file> = <Name of segment file> CR
<Orthography> CR
oend CR
<Canonical form> CR
kend CR
<Transcription> CR
hend CR
<list of phoneme segments>
<list of phoneme segments> = <begin sample> <marker> CR
...
<begin sample> = number of first sample
/d i:6/ (dir), /g e: h OY6/ (geheuer)
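The S0 and S1 syntax definitions differ only in the transcription line that S1 inserts between `kend` and `hend`, so one reader can handle both. The following is a hedged sketch with hypothetical names: it assumes every terminator (`oend`, `kend`, `hend`) stands on its own line, and it uses `str.splitlines()`, which also accepts the CR-only line ends shown in the syntax.

```python
def parse_s_file(text):
    """Parse a PhonDat S0/S1 label file given as a string.

    Returns (header, segments): header holds the file name and the
    orthography, canonical and transcription lines (transcription stays
    empty for S0); segments is a list of (begin_sample, marker) tuples.
    """
    lines = text.splitlines()
    header = {"file": lines[0], "orthography": [], "canonical": [],
              "transcription": []}
    next_part = {"oend": "canonical", "kend": "transcription"}
    part, i = "orthography", 1
    while i < len(lines):
        line = lines[i].strip()
        i += 1
        if line == "hend":          # end of header, segments follow
            break
        if line in next_part:       # section terminator: switch section
            part = next_part[line]
            continue
        header[part].append(lines[i - 1])
    segments = []
    for line in lines[i:]:
        if line.strip():
            begin, marker = line.split(None, 1)
            segments.append((int(begin), marker.strip()))
    return header, segments
```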
S2 Format
This format is out-dated and should not be used any longer!
The format is essentially the same as PhonDat 1, with the following alterations:
BAS Partitur Format
General
Most formats of files with segmental information to speech signal have
the disadvantage that
Therefore, a new open format based on the SAM Label Format was developed
at BAS, which avoids most of the problems mentioned.
In this format all levels of description should be described
independently but time aligned like the single parts of a score. Hence
this format was called 'BAS Partitur Format' ('Partitur' is German for 'musical score').
cat
Files and Mimetype
History
1.0 : 01.09.95 Preliminary Definition of the BAS Partitur Format
PLEASE DO NOT USE THIS VERSION ANY MORE!
1.1 : 01.06.96 First Definition structured in classes
1.2 : 28.08.96 Label ELF: removed from definition
(tool par-1.1-to-1.2 transforms 1.1 files into 1.2 files)
1.2.1 : ??
1.2.2 : tier DAS added
1.2.3 : tier TR2, SUP added
1.2.4 : tier PRS added
1.2.5 : tier NOI added
1.2.6 : distinction between symbolic links to word groups (list of word
numbers separated by commas) and symbolic links to events between
words (e.g. noises, number pairs separated by semicolon)
changed class definition of class 1, 4 and 5 accordingly
changed tier definition NOI
1.2.7 : 12.09.00 Tier LBP and LBG added
1.2.8 : 11.05.01 Tier PRO,POS,LMA,SYN,FUN,LEX added
1.2.9 : 07.08.01 Tier IPA added
1.2.10 : 29.08.01 Tier TRN added
1.2.11 : 28.11.01 Tier TRS added
1.2.12 : 20.07.02 : Tiers GES,USH,USM,OCC,USP added
1.2.13 : 22.10.02 : Tier GES: definition of gestures extended
Tier TLN added
1.2.14 : 21.04.06 : Tier PRM added
1.2.15 : 21.02.07 : Tier TRW added
1.2.16 : 21.09.09 : Tier MAS added
1.3 : 05.10.12 : Allow UTF-8 coding of label content
1.3.1 : 11.05.17 : added header entries MAO (MAUS options) and GPO (G2P options)
1.3.2 : 27.06.17 : added header entry SAO (Speech Recognition options)
1.3.3 : 20.07.17 : added type 1 tier TRO
1.3.4 : 13.10.17 : added type 1 tier SPK
1.3.5 : 26.08.19 : added type 2 tier SPD
1.3.6 : 03.02.20 : added type 2 tier VAD
1.4 : 10.05.20 : made header definition stricter
Definition of Structure 1.X
A Partitur file has the same prefix as the corresponding signal file
but the extension .par
.LHD:
(usually in the first line of the file) to the line
labelled with LBD:
; the body from the label LBD:
to the
end of file (EOF) where the last line has to be closed by a 'line terminator' symbol
(the final SAM label ELF:
was omitted for the BAS Partitur
Format since it prevents effective processing of the Partitur files).
SAM: Sampling Frequency in Hz
LBD:
SNB: Number of Bytes per Sample
SBF: Byteorder (Intel 01, Motorola 10)
SSB: Bit Resolution
NCH: Number of Channels
SPN: Speaker ID
LHD: Partitur 1.3
REP: Muenchen
SNB: 2
SAM: 16000
SBF: 01
SSB: 16
NCH: 1
SPN: PS1
LBD:
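A file like the example above can be split into header and body by scanning up to the `LBD:` line. The sketch below is hypothetical and simplified: it treats LHD:, SAM: and LBD: as the compulsory entries (our reading of the lists above) and keeps only the first value of a header entry that occurs more than once.

```python
def split_bpf(text):
    """Split BPF text into a header dict and a list of body lines.

    The header runs from the LHD: line up to and including the LBD: line.
    """
    header, body = {}, []
    in_header = True
    for line in text.splitlines():
        if not line.strip():
            continue
        if in_header:
            label, _, value = line.partition(":")
            header.setdefault(label, value.strip())  # first value wins
            if label == "LBD":
                in_header = False                    # body starts next
        else:
            body.append(line)
    missing = [k for k in ("LHD", "SAM", "LBD") if k not in header]
    if missing:
        raise ValueError("missing compulsory header entries: " + ", ".join(missing))
    return header, body
```

The sampling frequency from `SAM:` is what converts sample positions in the body into seconds.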
TYP: Type of SAM Label File
DBN: Corpus Name
VOL: Number of Volume
DIR: Directory in Volume
SRC: Name of speech file
BEG: Beginning of labeling sequence
END: End of labeling sequence
RED: Date of Recording
RET: Duration
RCC: Recording Conditions
CMT: Comment
SPI: Speaker Information
PCF: Name of Protocol File
PCN: Protocol Number
EXP: Name of Segmenter
SYS: Labeling system
DAT: Date of Labeling
SPA: SAM-PA Version
MAO: MAUS version and option list (paired value list)
GPO: G2P version and option list (paired value list)
SAO: Speech recognition program, version and option list (paired value list)
LBD:
and stretches to
the end of file. It contains the tiers of the BPF file.
Each tier is identified by a unique label. The order of tiers as well as the
order of lines within a tier is not significant.
These three items are separated by white spaces.
a pair of integers separated by a semicolon, referring to an event between those
two words
The symbolic links (relations) number the
word tokens beginning with zero.
(the choice of word tokens as symbolic relations in BPF is arbitrary!).
The label string itself has a special syntax which is defined in the
tier definition; the label string may contain white space.
TRL: 6,7 mit'm
NOI: 4;5 #Klopfen
GES: 2348000 4999 I-Geste ...
GES: 2353000 2999 G-Geste ...
PRB: 13456 TON: P*; FUN: PA
a pair of integers separated by a semicolon, referring to an event between those
two words
SAP: 13456 345 9 aU
a pair of integers separated by a semicolon, referring to an event between those
two words
PRB: 13456 13 TON: P*; FUN: PA
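The five tier classes differ only in which fields precede the label string, so a table of field names per class is enough to parse any tier line. This is a hedged sketch with hypothetical names. Link fields follow the conventions described here: a comma-separated list of word numbers, a semicolon pair for an event between two words, and -1 for 'no link' (as the MAU tier uses it). The tier-to-class table is an excerpt of the tier definitions on this page, not a complete list.

```python
# Fields that precede the label string, per BPF tier class
CLASS_FIELDS = {
    1: ("links",),
    2: ("begin", "duration"),
    3: ("time",),
    4: ("begin", "duration", "links"),
    5: ("time", "links"),
}

# Excerpt of tier classes as given in the tier definitions on this page
TIER_CLASS = {"KAN": 1, "ORT": 1, "TRL": 1, "KAS": 1, "MRP": 1,
              "PHO": 4, "SAP": 4, "MAU": 4, "WOR": 4}


def parse_links(field):
    """'-1' -> no link, '4;5' -> event between words 4 and 5, '6,7' -> words 6 and 7."""
    if field == "-1":
        return {"words": [], "between": None}
    if ";" in field:
        a, b = field.split(";")
        return {"words": [], "between": (int(a), int(b))}
    return {"words": [int(w) for w in field.split(",")], "between": None}


def parse_tier_line(line, tier_class):
    """Parse one body line of the given class into a dict of named fields."""
    label, _, rest = line.partition(":")
    names = CLASS_FIELDS[tier_class]
    # split off the leading fields only; the label string may contain blanks
    parts = rest.strip().split(None, len(names))
    entry = {"tier": label}
    for name, value in zip(names, parts):
        entry[name] = parse_links(value) if name == "links" else int(value)
    entry["label"] = parts[len(names)] if len(parts) > len(names) else ""
    return entry
```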
Remarks:
-1
.
Definition of Tiers
KAN:
class 1KAN: (symbolic link) (transcript)
The segmentation of the whole utterance is done in word units, where
everything counts as a word that is produced by the articulatory
organs of the speaker and can be interpreted as 'speech'.
Following this definition, hesitations are words, while laughing, coughs,
etc. are not. This separation isn't always clear, but on the other
hand the selection of word units is arbitrary as well. The main point
is a unique reference tier for symbolic relations in other tiers.
Another problem is the reduction of words that are annotated in the
orthographic form, e.g. mit'm. In these cases the reduction
is restored (in this example /mIt de:m/). The reason for this
is that some of these reductions should be automatically
accessible.
KAN: 0 j' a:
KAN: 1 Q a l z o:+
KAN: 2 Q E: m
KAN: 3 h 'OY t @
KAN: 4 Q o: d 6+
KAN: 5 m 'O6 g @ n
KSS:
Class 1
KSS: (symbolic link) (transcript)
KSS: 0 d ' e:6
KSS: 1 b ' U n . d @ s . t a: k
KSS: 2 h ' a t
KSS: 3 z ' aI . n @
KSS: 4 d e . b ' a . t @
KSS: 5 ? ' y: . b 6
KSS: 6 d ' i:
KSS: 7 r e . g ' i: . r U N s . ? E6 . k l E: . r U N
KSS: 8 f ' O6 t . g @ . z E t s t
MRP:
Class 1
MRP: (symbolic link) (transcript)
MRP: 0 d er;ART INFL
MRP: 1 bund es tag;NN FG NN
MRP: 2 hat;V
MRP: 3 sein e;PPOS INFL
MRP: 4 debatte;NN
MRP: 5 über;ADP
MRP: 6 d ie;ART INFL
MRP: 7 reg ier ung s er klär ung;V SFX SFX FG PRFX V SFX
MRP: 8 fort ge setz t;PTKVZ PRFX V SFX
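Because the morph sequence and the class sequence are separated by a semicolon and both are blank-separated, each MRP label can be turned into (morph, class) pairs. A minimal sketch with a hypothetical function name; it expects the label string only (for instance as returned by splitting off the tier label and the symbolic link).

```python
def parse_mrp(label):
    """Split an MRP label 'morphs;classes' into (morph, class) pairs."""
    morphs, _, classes = label.partition(";")
    morphs, classes = morphs.split(), classes.split()
    if len(morphs) != len(classes):
        raise ValueError("morph and class sequences differ in length")
    return list(zip(morphs, classes))
```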
KAS:
class 1
KAS: (symbolic link) (transcript)
KAS: 0 v i:6
KAS: 1 m Y s . @ n
KAS: 2 d a n
KAS: 3 d i: . z @
KAS: 4 f i l . j a: . l @
KAS: 5 Q I n
KAS: 6 h a n . o: . f 6
KAS: 7 b @ . z u: . x @ n
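With the recommended blank-separated form, syllables in a KAS label are delimited by stand-alone '.' tokens. The sketch below (hypothetical name) groups the SAMPA symbols per syllable; it does not handle the glued form, and when applied to KSS labels the stress mark "'" would appear as an ordinary symbol.

```python
def kas_syllables(label):
    """Split a blank-separated KAS label into syllables (lists of SAMPA symbols)."""
    syllables, current = [], []
    for symbol in label.split():
        if symbol == ".":           # syllable boundary
            syllables.append(current)
            current = []
        else:
            current.append(symbol)
    if current:
        syllables.append(current)
    return syllables
```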
PTR:
class 1PTR: (symbolic link) (transcript)
PTR: 0 j a:
PTR: 1 a l z O
PTR: 2 @ m
PTR: 3 h OY t @
PTR: 4 o: d 6
PTR: 5 m O6 N
ORT:
class 1ORT: (symbolic link) (orthography)
Words are not capitalized at the beginning of an utterance or sentence
within an utterance (except nouns, of course). German 'Umlauts' and other
letters not conforming to 7-bit ASCII are written as they are used for
lexical access. Therefore the coding might differ between speech
corpora, e.g. ISO-8859 or LaTeX coding.
This tier is used for an easy lexicon reference; therefore no additional
markers except lexical words are allowed. There is no punctuation in this
tier. Lexical words include items that are contained in the KAN tier (eg.
hesitations, word breaks).
ORT: 0 ja
ORT: 1 also
ORT: 2 <"ahm>
ORT: 3 heute
ORT: 4 oder
ORT: 5 morgen
TRL:
class 1TRL: (list of symbolic links) (transliteration)
class 1
TRL: 0 <Schmatzen>
TRL: 0 ja ,
TRL: 1 also
TRL: 2 <"ahm>
TRL: 3 heute
TRL: 4 oder
TRL: 5 morgen .
TR2:
class 1TR2: (list of symbolic links) (transliteration)
class 1
TR2: 25 ~Weihnachten
TR2: 26 ist
TR2: 27 das
TR2: 28 sowieso
TR2: 29 immer
TR2: 30 etwas
TR2: 31 schwierig ,
TR2: 32 und
TR2: 33 <"ahm>
TR2: 34 in
TR2: 35 der
TR2: 36 #zweiten
TR2: 37 Dezemberwoche
TR2: 38 bin
TR2: 39 ich
TR2: 40 in
TR2: 41 ~M"unchen
TR2: 42 auf
TR2: 43 dem
TR2: 44 Kongre"s .
TR2: 45 also
TR2: 46 bliebe
TR2: 47 noch
TRO:
class 1TRO: (list of symbolic links) (transliteration)
class 1
TRO: 67 Roten\s
TRO: 68 Himmel.\s\n
TRO: 69 Mein\s
TRO: 70 Blick\s
TRO: 71 folgte\s
TRO: 72 dem\s
TRO: 73 2.\s
TRO: 74 Raumschiff,\s\n
TRO: 75 wie\s
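As described in the TRO definition, concatenating the label strings restores the original transcript once the white-space escapes are decoded. A sketch with hypothetical names; decoding '\s' to a plain blank is our assumption, since the definition only says that '\s' stands for "other white space".

```python
import re

# Escapes used in TRO labels; '\s' -> blank is an assumption
_ESCAPES = {"n": "\n", "t": "\t", "s": " "}


def tro_to_text(labels):
    """Concatenate TRO label strings and decode the white-space escapes."""
    joined = "".join(labels)
    return re.sub(r"\\([nts])", lambda m: _ESCAPES[m.group(1)], joined)
```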
SUP:
class 1SUP: (list of symbolic links) (utterance-id) (transliteration)
class 1SUP
was added to the format. It gives the transliteration of the 'foreign'
speaker together with the symbolic markers indicating the parts of speech of the
recorded speaker to which these superimposed events are assigned. The item
'utterance-id' gives the name of the corresponding BAS Partitur file
containing the superimposing part of speech.
The tier SUP
is currently only used in combination with the
tier TR2
. For a detailed discussion of superimposed speech in the
Verbmobil II project please click
here.
TR2: 0 ich
TR2: 1 w"urde
TR2: 2 vorschlagen ,
TR2: 3 da"s
TR2: 4 wir9@
TR2: 5 dann9@
TR2: 6 <:<#> hinfliegen:> ,
TR2: 7 <:<#> ich:>
TR2: 8 hab'
TR2: 9 jetzt
TR2: 10 aber
TR2: 11 <:<#Rascheln> grade:>
TR2: 12 <:<#Rascheln> keine:>
TR2: 13 Unterlagen
TR2: 14 da . <#>
SUP: 4,5 g002acn2_028_AAK.par @9ja
In this example the speaker is superimposed during the words 4 and 5 by the
single word 'ja' of another speaker. The latter occurs in the BAS Partitur
file 'g002acn2_028_AAK.par'.
PHO:
class 4PHO: (begin) (duration) (list of symbolic links) (label string)
The conventions of labeling and segmentation are briefly described
here.
<label string> = '#c:' (beginning of first word) OR
'#p:' (pause) OR
'#v:' (mis-pronunciation) OR
<segment> OR
<word boundary segment> OR
<compound boundary segment> OR
<punctuation>
<segment> = $<sampa string> (ordinary segment)
<word boundary segment> = ##<sampa string>
<compound boundary segment> = $#<sampa string>
<sampa string> = any string of <extended German SAM-PA symbols>
<punctuation> = '#.' OR '#,' OR '#?' OR '#!'
PHO: 2473 0 0 #c:
PHO: 2473 1100 0 ##d
PHO: 3573 0 0 $a-@
PHO: 4126 2007 0 $s
PHO: 6133 0 0 $-+
PHO: 6133 1130 1 ##g
PHO: 7263 1206 1 $e:
PHO: 8496 937 1 $t
PHO: 9433 0 2 ##Q-
PHO: 9433 0 2 $-q
PHO: 9433 2698 2 $aU
PHO: 12131 1178 2 $x
PHO: 13309 0 2 $-+
PHO: 13309 962 3 ##n
PHO: 14271 1675 3 $I
PHO: 15946 4308 3 $C
PHO: 18579 0 3 $t-
PHO: 18579 0 3 $-+
PHO: 18579 5467 3 #p:
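The label-string grammar of the PHO tier can be applied by checking the exact special labels first and then the prefixes, longest first (so that '##' is not mistaken for another '#' form). A hedged sketch with a hypothetical function name; the returned kind names paraphrase the grammar.

```python
SPECIAL = {"#c:": "beginning of first word", "#p:": "pause",
           "#v:": "mispronunciation"}
PUNCTUATION = {"#.", "#,", "#?", "#!"}


def classify_pho_label(label):
    """Return (kind, SAM-PA string or punctuation) for a PHO label string."""
    if label in SPECIAL:
        return SPECIAL[label], ""
    if label in PUNCTUATION:
        return "punctuation", label[1:]
    if label.startswith("##"):
        return "word boundary segment", label[2:]
    if label.startswith("$#"):
        return "compound boundary segment", label[2:]
    if label.startswith("$"):
        return "segment", label[1:]
    raise ValueError("unknown PHO label: %r" % label)
```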
SAP:
Class 4
SAP: (begin) (duration) (list of symbolic links) (label string)
As an example, the conventions of labeling and segmentation of German are briefly described
here.
SAP: 549 867 0 Q%<
SAP: 1416 1242 0 aU
SAP: 2658 1136 0 f
SAP: 3794 408 1 v
SAP: 4202 852 1 i:
SAP: 5054 433 1 d
SAP: 5487 1686 1 6%>
SAP: 7173 828 1 h%<%>
SAP: 8001 864 1 2:-9%<%>
SAP: 8865 1015 1 r-6%<
SAP: 9880 0 1 @-
SAP: 9880 1732 1 n
MAU:
Class 4
MAU: (begin) (duration) (list of symbolic links) (label string)
A detailed description of the MAUS system can be found
here.
The segmentation is justified and, unlike the tier SAP, has no relation to the
tier 'Vorschlagstranskription' (proposed transcription); however, there are
symbolic links to the words.
The units are extended German SAM-PA.
Additional labels are <nib> (non-speech event) and <p:> (pause).
These labels always get the symbolic link -1 (no link).
Furthermore, events that clearly stem from the speaker but cannot be
classified (e.g. non-understandable words) are labelled and segmented
as <usb>. These receive a symbolic link like other word events.
MAU: 0 676 -1 <p:>
MAU: 677 7861 -1 <nib>
MAU: 8539 450 0 g
MAU: 8990 2436 0 u:
MAU: 11427 1740 0 t
MAU: 13168 958 1 d
MAU: 14127 1298 1 a
MAU: 15426 3820 1 n
MAU: 19247 303 2 n
MAU: 19551 1785 2 e:
MAU: 21337 624 2 m
MAU: 21962 636 2 n
MAU: 22599 501 3 v
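In the MAU example above, every segment begins exactly one sample after begin + duration of the previous segment. For example, 0 + 676 = 676, and the next segment begins at 677. The PHO and SAP examples follow a different pattern: there, the next begin equals begin + duration. The +1 rule is therefore only an observation from this MAU example, not a stated rule.

The Python sketch below checks a MAU segment list for gaps under that observed convention. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_mau(line: str) -> Tuple[int, int, List[int], str]:
    """Return (begin, duration, links, label) for one MAU line."""
    _tier, begin, duration, links, label = line.split(maxsplit=4)
    return int(begin), int(duration), [int(n) for n in links.split(",")], label


def find_gaps(lines: List[str]) -> List[Tuple[int, int]]:
    """List (last sample, next begin) wherever a segment does not start one sample
    after the previous segment's begin + duration (convention seen in the example)."""
    gaps = []
    prev_last = None
    for line in lines:
        begin, duration, _links, _label = parse_mau(line)
        if prev_last is not None and begin != prev_last + 1:
            gaps.append((prev_last, begin))
        prev_last = begin + duration  # assumed last sample of this segment
    return gaps


example = ["MAU: 0 676 -1 <p:>", "MAU: 677 7861 -1 <nib>", "MAU: 8539 450 0 g"]
print(find_gaps(example))  # prints: []
```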
WOR:
Class 4
WOR: (begin) (duration) (list of symbolic links) (label string)
The symbolic links give the relation to the KAN tier. Note that inserted
words have a symbolic link to the previous word in the KAN tier.
DAS:
Class 1
DAS: (list of symbolic links) (marker string)
The symbolic links refer to the words of the reference tier
KAN.
DAS: 0,1,2,3,4,5 @(SUGGEST_SUPPORT_DATE BA)
DAS: 6,7,8,9 @(DELIBERATE_EXPLICITE BA)
DAS: 10,11,12,13,14,15,16,17,18,19,20 @(SUGGEST_SUPPORT_DATE BA)
In this example the marker SUGGEST_SUPPORT_DATE
covers the words 0 to 5 in the reference tier. The term 'BA' denotes
a dialog act from speaker 'B' to speaker 'A', where speaker 'A' is
always the speaker that initiates the dialog.
A more detailed description of the markers and the principles
of segmentation can be found
here.
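A DAS marker packs the dialog act, the speaker and the addressee into a single string. The sketch below separates them. It assumes two things that are inferred from the examples only: act names consist of upper-case letters and underscores, and each speaker is identified by a single letter. `parse_das` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumption: e.g. "@(SUGGEST_SUPPORT_DATE BA)" -> act, speaker letter, addressee letter.
DAS_MARKER = re.compile(r"@\((?P<act>[A-Z_]+) (?P<speaker>[A-Z])(?P<addressee>[A-Z])\)")


def parse_das(line: str) -> Tuple[List[int], str, str, str]:
    """Return (word numbers, dialog act, speaker, addressee) for one DAS line."""
    _tier, links, marker = line.split(maxsplit=2)
    match = DAS_MARKER.fullmatch(marker)
    if match is None:
        raise ValueError(f"unexpected DAS marker: {marker!r}")
    words = [int(n) for n in links.split(",")]
    return words, match["act"], match["speaker"], match["addressee"]


print(parse_das("DAS: 6,7,8,9 @(DELIBERATE_EXPLICITE BA)"))
# prints: ([6, 7, 8, 9], 'DELIBERATE_EXPLICITE', 'B', 'A')
```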
PRB:
Class 5
PRB: (sample) (list of symbolic links) (marker string)
The first number gives the time of the prosodic event measured in
samples from the beginning of the file.
The symbolic links give the relation to the KAN tier.
The label string describes the prosodic event itself. A concise
description of the labeling convention (GToBI) can be found
here (sorry: only in German).
PRB: 54212 5 TON: H*; FUN: NA
PRB: 63269 7 TON: L+H*; FUN: EK
PRB: 76371 8 BRE: B3; TON: L-L%
PRB: 79967 8 TON: L*+H; FUN: PA
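Every PRB example above uses a `;`-separated list of `KEY: value` pairs, with the keys TON, FUN and BRE. The sketch below turns such a label into a dictionary. It assumes that this pattern holds for all PRB labels. `parse_prb` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def parse_prb(line: str) -> Tuple[int, List[int], Dict[str, str]]:
    """Return (sample, word links, {key: value}) for one PRB line.

    Assumption: the label string is a ';'-separated list of 'KEY: value' pairs.
    """
    _tier, sample, links, label = line.split(maxsplit=3)
    fields = {}
    for part in label.split(";"):
        key, _sep, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return int(sample), [int(n) for n in links.split(",")], fields


print(parse_prb("PRB: 76371 8 BRE: B3; TON: L-L%"))
# prints: (76371, [8], {'BRE': 'B3', 'TON': 'L-L%'})
```

To convert the sample value into seconds, divide by the sampling rate of the corpus.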
PRS:
Class 1
PRS: (list of symbolic links) (marker string)
The label string describes the prosodic event itself. Boundary markers
(B3, B2, B9) are linked to two words acting as left and right neighbors
of the boundary.
Accent markers (PA, NA, EK) refer to the word where the accent was
labeled. No syllable
information is provided.
Definition of Marker Strings:
B3 : full intonational boundary with strong intonational marking,
often with pauses or lengthening or change of speed
B2 : intermediate phrase boundary with weak marking, weaker intonational
marking than B3
B9 : 'agrammatical' boundary, e.g. hesitations, repairs, unintended pauses
PA : main accents (phrase accent) carried by one word; in rare cases
there can be two or more words marked together
NA : secondary accent for accentuated words without PA
EK : emphatic or contrastive accents
PRS: 0 EK
PRS: 4;5 B2
PRS: 7 NA
PRS: 9 NA
PRS: 11 NA
PRS: 11;12 B3
PRS: 13 EK
PRS: 14 EK
PRS: 15 PA
PRS: 17 NA
PRS: 17;18 B2
PRS: 18 NA
PRS: 19;20 B3
PRS: 23 EK
PRS: 23;24 B3
PRS: 25 EK
PRS: 27 PA
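Because boundary markers link a pair of neighbouring words and accent markers link the accented word, a reader can sort PRS entries by marker class. The sketch below does this. It splits links on `;` as in the examples and also accepts `,`, which is an assumption for the rare multi-word accents. `split_prs` is a hypothetical name.

```python
import re
from typing import List, Tuple

BOUNDARIES = {"B2", "B3", "B9"}
ACCENTS = {"PA", "NA", "EK"}


def split_prs(lines: List[str]) -> Tuple[List[Tuple[List[int], str]], List[Tuple[int, int, str]]]:
    """Separate PRS lines into accents (words, marker) and boundaries (left, right, marker)."""
    accents: List[Tuple[List[int], str]] = []
    boundaries: List[Tuple[int, int, str]] = []
    for line in lines:
        _tier, links, marker = line.split(maxsplit=2)
        # The examples use ';' between linked words; ',' is accepted too (assumption).
        words = [int(n) for n in re.split(r"[;,]", links)]
        if marker in BOUNDARIES:
            left, right = words  # a boundary links its left and right neighbour
            boundaries.append((left, right, marker))
        elif marker in ACCENTS:
            accents.append((words, marker))
        else:
            raise ValueError(f"unknown PRS marker: {marker!r}")
    return accents, boundaries
```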
NOI:
Class 1
NOI: (single or pair of symbolic links) (marker string)
The marker string contains a blank-separated list of noise labels. The
labels are drawn from the VMII TRL transliteration format:
<A> <B> : Breathing
<P> : distinct silence within an utterance
<%> : not understandable muttering
<Schmatzen> <Smack> : lip smack
<Schlucken> <Swallow> : swallow
<R"auspern> <Throat> : throat clear
<Husten> <Cough> : cough
<Lachen> <Laugh> : laugh
<Ger"ausch> <Noise> : other articulatory noise
<#Klopfen> <#Knock> : knock
<#Rascheln> <#Rustle> : rustle
<#Quietschen> <#Squeak> : creak
<#Klicken> <#Click> : click noise
<#Mikrowind> : blowing into microphone
<#Mikrobe> : noise caused by touching, knocking,
rubbing against the microphone
<#> : other technical noise
NOI: 5 <Lachen> # word 5 is superimposed by a laugh
NOI: 5;6 <A> # between word 5 and word 6 a distinct
# breathing was recorded
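The number of symbolic links decides how a NOI entry is read. A single word means the word is superimposed by the noise. A pair means the noise lies between the two words.

The sketch below applies this rule and collects the `<...>` labels. Trailing text outside angle brackets is ignored, so the `#` comments of the example above do no harm. `parse_noi` is a hypothetical name.

```python
import re
from typing import List, Tuple

NOISE_LABEL = re.compile(r"<[^>]+>")


def parse_noi(line: str) -> Tuple[str, Tuple[int, ...], List[str]]:
    """Return (position, word numbers, noise labels) for one NOI line."""
    _tier, links, rest = line.split(maxsplit=2)
    words = tuple(int(n) for n in links.split(";"))
    position = "superimposed" if len(words) == 1 else "between"
    return position, words, NOISE_LABEL.findall(rest)


print(parse_noi("NOI: 5;6 <A> # between word 5 and word 6"))
# prints: ('between', (5, 6), ['<A>'])
```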
LBP:
Class 3
LBP: (sample) (marker string)
The following three accent classes were used:
PA phrase accent
NA secondary accent
EK emphatic or contrastive accents
For example:
LBP: 1651 PA
LBG:
Class 3
LBG: (sample) (marker string)
The following 5 boundary classes were used:
B9 irregular boundary: 'agrammatical' boundary, e.g. hesitations,
repairs, unintended pauses
B2 intermediate phrase boundary with weak marking, weaker intonational
marking than B3
B3 intonational boundary with strong intonational marking, no
question
B3QH B3, semantically a question, with high tone
B3QL B3, semantically a question, with low tone
For example:
LBG: 6586 B3
PRO:
Class 1
PRO: (symbolic link) (marker string)
PRO: 6;7 SS2
PRO: 13;14 AC1
PRO: 14;15 AC1
PRO: 15;16 AC1
PRO: 18;19 SC3
PRO: 24;25 IRB
PRO: 25;26 AC1
PRO: 26;27 AC1
PRO: 27;28 AC1
PRO: 28;29 IWE
PRO: 28;29 IZB
PRO: 31 SM3
SYN: FUN: LEX:
Class 1
SYN: (symbolic link) (marker string)
FUN: (symbolic link) (marker string)
LEX: (symbolic link) (marker string)
Representation of Syntax Trees in the BAS Partitur Format (BPF)
===============================================================
In the BAS Partitur Format the syntax trees are represented in three
tiers. The terminal (lexical) categories are listed in the LEX
tier. Syntactic categories of higher orders are listed in the SYN
tier. Grammatical functions referring to both LEX and SYN are
listed in the FUN tier. The LEX and the SYN entries refer to the nodes
and FUN represents the edges of the syntax tree.
Lexical Categories:
-------------------
Definition:
LEX: (symbolic link) (label string)
This tier represents the lexical categories of the words. The words
are represented by symbolic links. Hesitations, neologisms and
unintelligible parts of an utterance have not been annotated.
Example:
LEX: 0 0 PDS
LEX: 1 0 VMFIN
LEX: 2 0 CARD
LEX: 3 0 NN
LEX: 4 0 ADJD
LEX: 5 0 VVINF
The label string contains
(1) a tag for the lexical category, e.g. CARD (cardinal number) for word 2.
(2) an index indicating whether the node is terminal, branching or
non-branching. The LEX tier represents only terminal nodes; therefore
the index is always 0 (see the SYN and FUN tiers for further information
on this index).
LEX labels used in syntax trees of German dialogues:
UNKNOWN unknown tag
--
SYN: 0 1 DM
SYN: 1 1 NX
SYN: 1 2 VF
SYN: 1,2,3,4,5 0 SIMPX
SYN: 2 1 VXFIN
SYN: 2 2 LK
SYN: 3 1 ADVX
SYN: 3,4,5 0 MF
SYN: 4 1 NX
SYN: 5 1 ADVX
SYN: 7 1 VXFIN
SYN: 7 2 LK
SYN: 7,8,9,10,11 0 SIMPX
SYN: 8 1 NX
SYN: 8,9,10,11 0 MF
SYN: 9,10,11 0 NX
SYN: 10 1 NX
SYN: 10,11 0 NX
SYN: 11 1 NX
FUN: 0 0 -
FUN: 0 1 --
FUN: 1 0 HD
FUN: 1 1 ON
FUN: 1 2 -
FUN: 1,2,3,4,5 0 --
FUN: 2 0 HD
FUN: 2 1 HD
FUN: 2 2 -
FUN: 3 0 HD
FUN: 3 1 MOD
FUN: 3,4,5 0 -
FUN: 4 0 HD
FUN: 4 1 OA
FUN: 5 0 HD
FUN: 5 1 V-MOD
FUN: 7 0 HD
FUN: 7 1 HD
FUN: 7 2 -
FUN: 7,8,9,10,11 0 --
FUN: 8 0 HD
FUN: 8 1 ON
FUN: 8,9,10,11 0 -
LEX: 0 0 PTKANT
LEX: 1 0 PPER
LEX: 2 0 VAFIN
LEX: 3 0 ADV
LEX: 4 0 NN
LEX: 5 0 ADV
LEX: 7 0 VVFIN
LEX: 8 0 PPER
LEX: 9 0 ART
LEX: 10 0 NN
LEX: 11 0 NE
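In the example above, every FUN entry has the same (symbolic link, index) pair as exactly one LEX or SYN entry. This suggests that each FUN label names the edge leading into that node. The sketch below pairs nodes with edge labels on that basis. The key matching is an assumption drawn from the example, not a stated rule, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[Tuple[int, ...], int]  # (linked word numbers, index)


def read_tier(lines: List[str], tier: str) -> Dict[Key, str]:
    """Map (links, index) -> label for all lines of one tier (LEX, SYN or FUN)."""
    table = {}
    for line in lines:
        name, links, index, label = line.split(maxsplit=3)
        if name == tier + ":":
            table[(tuple(int(n) for n in links.split(",")), int(index))] = label
    return table


def label_nodes(lines: List[str]) -> Dict[Key, Tuple[str, Optional[str]]]:
    """Attach to each LEX/SYN node the FUN label sharing its (links, index) key (assumption)."""
    functions = read_tier(lines, "FUN")
    nodes = {**read_tier(lines, "LEX"), **read_tier(lines, "SYN")}
    return {key: (category, functions.get(key)) for key, category in sorted(nodes.items())}


example = ["LEX: 1 0 PPER", "SYN: 1 1 NX", "FUN: 1 0 HD", "FUN: 1 1 ON"]
print(label_nodes(example))
# prints: {((1,), 0): ('PPER', 'HD'), ((1,), 1): ('NX', 'ON')}
```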
POS:
Class 1
POS: (symbolic link) (marker string)
POS: 0 ITJ
POS: 1 PPER
POS: 2 VAFIN
POS: 3 ADV
POS: 4 NN
POS: 5 ADV
POS: 7 VVFIN
POS: 8 PPER
POS: 9 ART
POS: 10 NN
POS: 11 NE
LMA:
Class 1
LMA: (symbolic link) (marker string)
LMA: 0 nein
LMA: 1 pper
LMA: 2 haben
LMA: 3 hier
LMA: 4 Unterlage
LMA: 5 da
LMA: 7 kennen
LMA: 8 pper
LMA: 9 d
LMA: 10 Hotel
LMA: 11 Maritim
Please note that all personal pronouns were annotated with 'pper' and
all articles were annotated with 'd'.
IPA:
Class 2
IPA: (begin) (duration) (label string)
The first number denotes the beginning of a segment counted in samples
from the beginning of the file, the second number denotes the duration of
the segment in samples. The remainder of the line must contain a list of
comma-separated IPA numbers (at least one), optionally followed by a list
of corresponding SAM-PA symbols.
IPA chart with IPA numbers
IPA chart with symbols
IPA: 4856 1228 322 @
IPA: 10629 564 317
IPA: 11805 991 319 I
IPA: 12797 1142 138 C
IPA: 13940 1534 302 e
IPA: 15475 895 110 g
IPA: 16371 777 322 @
IPA: 17149 758 155 l
IPA: 17908 1497 305
IPA: 19406 1204 116 n
IPA: 20611 589 104 d
IPA: 21201 1018 322 @
IPA: 22220 1185 103 t
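The IPA definition requires at least one IPA number per segment and makes the SAM-PA symbols optional. Lines such as `IPA: 10629 564 317` carry numbers only.

The sketch below reads both variants. It assumes that the SAM-PA symbols, when present, form one comma-separated field after the numbers; the examples show only single symbols, so this is an assumption. `parse_ipa` is a hypothetical name.

```python
from typing import List, Tuple


def parse_ipa(line: str) -> Tuple[int, int, List[int], List[str]]:
    """Return (begin, duration, IPA numbers, SAM-PA symbols) for one IPA line."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("an IPA line needs at least one IPA number")
    begin, duration = int(fields[1]), int(fields[2])
    numbers = [int(n) for n in fields[3].split(",")]
    # Assumption: optional SAM-PA symbols are one comma-separated field.
    symbols = fields[4].split(",") if len(fields) > 4 else []
    return begin, duration, numbers, symbols


print(parse_ipa("IPA: 4856 1228 322 @"))  # prints: (4856, 1228, [322], ['@'])
print(parse_ipa("IPA: 10629 564 317"))    # prints: (10629, 564, [317], [])
```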
TRN:
Class 4
TRN: (begin) (duration) (symbolic link) (label string)
The first number denotes the beginning of a segment counted in samples
from the beginning of the file, the second number denotes the duration of
the segment in samples. The symbolic link contains a comma-separated list of
the word numbers contained in the segment. The rest of the line may contain
an optional label (e.g. a turn number).
TRN: 132736 144640 0,1,2,3,4,5,6,7 002
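Begin and duration are given in samples, so converting a TRN segment to seconds needs the sampling rate of the recording. This section does not state that rate.

In the sketch below the caller passes the rate explicitly, and 16000 Hz in the usage line is only an illustrative value. The end is taken as begin + duration, which is an assumption since the examples on this page differ by one sample between tiers. `parse_trn` is a hypothetical name.

```python
from typing import List, Tuple


def parse_trn(line: str, sample_rate: int) -> Tuple[float, float, List[int], str]:
    """Return (start in s, end in s, word numbers, optional label) for one TRN line.

    Assumption: the segment ends at begin + duration samples.
    """
    _tier, begin, duration, links, *label = line.split()
    start = int(begin) / sample_rate
    end = (int(begin) + int(duration)) / sample_rate
    words = [int(n) for n in links.split(",")]
    return start, end, words, " ".join(label)


print(parse_trn("TRN: 132736 144640 0,1,2,3,4,5,6,7 002", sample_rate=16000))
# prints: (8.296, 17.336, [0, 1, 2, 3, 4, 5, 6, 7], '002')
```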
TRS
Class 1
TRS: (list of symbolic links) (transliteration)
A detailed description of the underlying transliteration format can be found
here.
The transliteration
is segmented into the units of the KAN tier (see above) by starting a new
line after each unit. Exceptions are punctuation marks and pronunciation
comments, which are kept together with the preceding unit (for better
readability).
TRS: 0 <:<#> ja:> [NA] [B2] ,
TRS: 1 ich
TRS: 2 h"atte
TRS: 3 <:<#> gern:> [NA]
TRS: 4 +/die/+ [B9] <P>
TRS: 5 die
TRS: 6 Sehensw"urdigkeiten [PA]
TRS: 7 von
TRS: 8 ~Heidelberg <!1 Heidelber'> [NA] [B3 fall] .
TRS: 9 gibt [NA]
TRS: 10 es
TRS: 11 hier
TRS: 12 vielleicht
TRS: 13 Cafeterias [PA] [B3 rise] ? <#>
TRS: 14 was
TRS: 15 f"ur
TRS: 16 Hotels [NA]
TRS: 17 gibt [PA]
TRS: 18 es [B3 cont] ?
TRS: 19 @1mhm [NA] [B3 cont] .
TRS: 20 kannst <!1 kanns'>
TRS: 21 was
TRS: 22 andres [PA]
The same tier was also used in the German
SmartWeb Project.
See TRW tier.
GES
Class 2
GES: (begin) (duration) (label string)
The first number denotes the beginning of a gestural event in samples from the
beginning of the recording (in SmartKom: 16 kHz); the second number denotes the
duration in samples.
The 'label string' consists of 8 columns separated by TAB, optionally followed by a comment string:
This string is either '[FINGER] re|li [TOOL]' or 'nicht erkennbar' (not recognizable) if the
pointing method cannot be determined.
For instance, 'Zeige re Hand' denotes the index finger ('Zeigefinger') of the right ('re') hand;
'li Hand' denotes the left ('li') hand (more than one finger used); 'li Stift'
denotes a gesture performed with the left hand holding a pen ('Stift').
If more than one finger or a pen is used, the string [FINGER] is empty.
The string may have one of the three forms:
The reference word is only labelled in I-gestures.
The reference zone is only labelled in I- and U-gestures.
The reference object is only labelled in I-gestures.
This may be free text or, more often, one of the following remarks (codes):
GES: 1072000 23039 I-Geste I - tipp + Zeige li Hand links oben Treffer 1078400 12159
GES: 1959680 114559 R-Geste R - emot - re Hand 1078400 12159 "Uberlegung/Nachdenken
GES: 2166400 15999 I-Geste I - tipp + Zeige li Hand links oben rechts 2171520 7679
GES: 2641280 12799 I-Geste I - tipp + Zeige re Hand § Schlo"s rechts unten Treffer 2647680 5119
GES: 3093120 14079 I-Geste I - tipp + Zeige re Hand links unten Treffer 3098240 7039
GES: 3351680 7039 R-Geste R - UFO re Hand 3098240 7039
GES: 4029440 22399 I-Geste I - tipp + Zeige li Hand links oben rechts 4035840 10239
USH
Class 2
USH: (begin) (duration) (label string)
The labels are assigned according to the impression of the labeler.
Not only the facial expression but also the voice quality and other
contextual information are considered. The mere use of words with
emotional content, without an emotional expression, is NOT considered
an indicator of the respective emotion/user state.
USH: 0 205439 Freude/Erfolg schwach
USH: 205440 30719 Neutral
USH: 236160 37759 Freude/Erfolg schwach
USH: 273920 191999 Neutral
USH: 465920 78719 "Uberlegen/Nachdenken stark
USH: 544640 295679 Neutral
USH: 840320 49919 "Arger/Mi"serfolg schwach
USH: 890240 42879 Neutral
USH: 933120 21759 "Uberraschung/Verwunderung schwach
USH: 954880 97919 Ratlosigkeit schwach
USH: 1052800 542719 Neutral
See also tiers USM, USP and
OCC.
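To see how long each user state lasts, the duration fields of USH (or USM) segments can be summed per label. The sketch below does this and converts the totals to seconds. It assumes these tiers use the 16 kHz SmartKom sample timeline stated for GES. It sums the raw duration field as given, which differs by one sample per segment if the +1 convention seen in the example applies. `seconds_per_state` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE_RATE = 16_000  # assumption: SmartKom rate as stated for the GES tier


def seconds_per_state(lines: List[str]) -> Dict[str, float]:
    """Sum the duration field of USH/USM segments per label and convert to seconds."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        # Labels may contain blanks (e.g. 'Freude/Erfolg schwach'), hence maxsplit=3.
        _tier, _begin, duration, label = line.split(maxsplit=3)
        totals[label] += int(duration)
    return {label: samples / SAMPLE_RATE for label, samples in totals.items()}
```

Applied to USH and USM for the same recording, this gives a rough comparison of the two labeler groups.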
USM
Class 2
USM: (begin) (duration) (label string)
For more background information about the SmartKom data collection see
here.
The labels are assigned with respect to the impression of the labeler.
ONLY the facial expression but NOT the voice quality or other
contextual information is considered.
This annotation was performed by a different labeler group than the
USH annotation. Therefore this annotation may be used to investigate
the influence of speech input on user-state judgements.
USM: 0 205439 Freude/Erfolg schwach
USM: 205440 30719 Neutral
USM: 236160 37759 Freude/Erfolg schwach
USM: 273920 191999 Neutral
USM: 465920 78719 "Uberlegen/Nachdenken schwach
USM: 544640 295679 Neutral
USM: 840320 49919 "Arger/Mi"serfolg schwach
USM: 890240 42879 Neutral
USM: 933120 119679 "Uberlegen/Nachdenken schwach
USM: 1052800 542719 Neutral
USM: 1595520 59519 "Uberlegen/Nachdenken schwach
USM: 1655040 157439 Neutral
USM: 1812480 143359 "Uberlegen/Nachdenken schwach
USM: 1955840 58879 "Arger/Mi"serfolg stark
USM: 2014720 89599 Neutral
USM: 2104320 559359 "Arger/Mi"serfolg schwach
USM: 2663680 263679 Neutral
USM: 2927360 28799 "Arger/Mi"serfolg schwach
See also tiers USH, USP and
OCC.
OCC
Class 2
OCC: (begin) (duration) (label string)
The label string contains one of the following 10 classes:
OCC: 380800 18559 Teilweise nicht im Bild
OCC: 458880 58239 Teilweise nicht im Bild
OCC: 1167360 7679 Teilweise nicht im Bild
OCC: 1173120 14719 Hand im Gesicht
OCC: 1201920 11519 Teilweise nicht im Bild
OCC: 2000000 12159 Hand im Gesicht/Mund
OCC: 2567040 57599 Teilweise nicht im Bild
OCC: 2709120 40959 Hand im Gesicht/Mund
OCC: 2947840 33279 Hand im Gesicht
OCC: 2955520 9599 Teilweise nicht im Bild
OCC: 2981120 35839 Teilweise nicht im Bild
OCC: 3528960 10879 Hand im Gesicht
OCC: 4001920 10239 Hand im Gesicht
OCC: 4103680 20479 Teilweise nicht im Bild
See also tiers USH, USP and
USM.
USP
Class 4
USP: (begin) (duration) (list of symbolic links) (label string)
The symbolic links refer to the word in question.
The label string contains one of the following 9 classes.
(If not stated otherwise, the segment covers the duration of the complete word.)
Speaker tries to speak Standard German ('Hochdeutsch'); no dialectal
variations, but not yet hyper-articulated; comparable to a trained radio announcer.
Unnatural emphasis on clear speech; like speaking to a person with poor
language skills.
Very strong accentuation of a syllable, e.g. 'MOOONtag'
Unusual pausing between semantic units; not pauses between sentences or
between main clause and subordinate clause (unless they are very long)
In this case the segment covers the word before the pause plus the pause.
Pause between words where usually no pause should occur.
In this case the segment covers the word before the pause plus the pause.
In this case the segment covers the word in which the pause occurs between syllables.
Only words that are affected by laughter, strong breathing etc; no laughter
alone.
In this case the segment covers the word which is overlapped.
USP: 79552 6704 0 EMPHASIS
USP: 426176 8768 6 STRONG_EMPH
USP: 426176 8768 6 CLEAR_ART
USP: 435952 10160 7 CLEAR_ART
USP: 806560 6592 9 LENGTH_SYLL
USP: 814624 4832 10 LENGTH_SYLL
USP: 819776 17184 11 EMPHASIS
USP: 1356896 6000 13 LENGTH_SYLL
USP: 1785232 11808 20 LENGTH_SYLL
USP: 1798064 7808 21 LENGTH_SYLL
USP: 2449632 7376 23 LENGTH_SYLL
USP: 2470016 10736 27 LENGTH_SYLL
USP: 2470016 14800 27;28 PAUSE_WORD
USP: 2794160 12080 31 LENGTH_SYLL
USP: 3221632 5440 41 CLEAR_ART
USP: 3678656 8528 48 LENGTH_SYLL
USP: 3678656 14144 48;49 PAUSE_WORD
USP: 3694576 3824 49 EMPHASIS
USP: 4170960 11344 53 LENGTH_SYLL
USP: 4186192 4464 54 EMPHASIS
See also tiers USH, OCC and
USM.
TLN
Class 1
TLN: (list of symbolic links) (label string)
The label string contains a marker giving the language pair of the translation in the
form '##>%%', where '##' is the international language code of the source language and
'%%' is the code of the target language, e.g. from German to English: 'DE>EN'.
After this marker, separated by a single TAB, follows the orthographic form of the
translation without punctuation. The coding of special characters may differ from that
in the tier ORT (see above).
ORT: 0 okay
ORT: 1 thank
ORT: 2 you
ORT: 3 bye
TLN: 0,1,2,3 EN>DE gut danke tschüs
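The TLN label string combines the language-pair marker and the translation, separated by a single TAB. The sketch below splits them. It first separates the tier name and links on any whitespace, then splits the remainder at the first TAB as the definition prescribes. `parse_tln` is a hypothetical name.

```python
from typing import List, Tuple


def parse_tln(line: str) -> Tuple[List[int], str, str, str]:
    """Return (word numbers, source language, target language, translation)."""
    _tier, links, rest = line.split(maxsplit=2)
    pair, translation = rest.split("\t", 1)  # marker and translation are TAB-separated
    source, target = pair.split(">")
    return [int(n) for n in links.split(",")], source, target, translation


print(parse_tln("TLN: 0,1,2,3 EN>DE\tgut danke tschüs"))
# prints: ([0, 1, 2, 3], 'EN', 'DE', 'gut danke tschüs')
```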
PRM
Class 3
PRM: (point-in-time) (label string)
PRM: 98160 L*H
PRM: 108665 -
PRM: 132414 H*L
PRM: 158400 %?
TRW
Class 1
TRW: (list of symbolic links) (label string)
Additional tags:
<ROT>
read off-talk; speaker reads from display
<POT>
paraphrased off-talk; speaker repeats
information with his/her own words (usually to a second speaker)
<SOT>
spontaneous off-talk; speaker converses with
a (human) third party
<OOT>
other off-talk; thinking aloud etc.
For example:
weitere<POT> ber"uhmte<POT> Sehensw"urdigkeiten<POT> in%<POT> ~Berlin<POT>
sind<POT> der<POT> ~Alexanderplatz<POT> , der<POT> Funkturm<POT> ,
das<POT> ~Brandenburger+Tor<POT> und<Z><SOT> das<SOT> letzte<SOT>
hab'<SOT> ich<SOT> vergessen<SOT> .
In contrast to the TRS pronunciation comments, where only an orthographic coding
of the deviation from the canonical pronunciation is possible
(e.g. haben wir <!2 hama>), here
an extra SAM-PA coding string is added in the tag, e.g.
haben wir <!2 hama#ha:m6>.
In the following time tags, ###.### signifies the time from the start of the
recording in seconds, with millisecond precision:
<ZA ###.###>
<ZE ###.###>
TRW: 0 <ZA 211.619> wurde<POT>
TRW: 1 #zw"olf<POT>
TRW: 2 irgendwann<POT>
TRW: 3 von<POT> <P>
TRW: 4 <%> . <PP>
TRW: 5 <"ah>
TRW: 6 's<POT>
TRW: 7 wurde<POT>
TRW: 8 #zw"olf<POT>
TRW: 9 #drei"sig<POT>
TRW: 10 von<POT>
TRW: 11 ~Otto<POT>
TRW: 12 dem<POT>
TRW: 13 <%>
TRW: 14 und<POT>
TRW: 15 ~Heinrich<Z><POT>
TRW: 16 irgendjemandem<POT>
TRW: 17 gegr"undet<POT> .
TRW: 18 ~Heinrich<POT>
TRW: 19 der<Z><POT> ,
TRW: 20 keine<SOT>
TRW: 21 Ahnung<SOT> ,
TRW: 22 und<POT>
TRW: 23 ~Otto<POT> ,
TRW: 24 was<SOT>
TRW: 25 wei"s<SOT>
TRW: 26 ich<SOT> <;ungrammatisch> . <PP>
TRW: 27 #zw"olf<POT> , <P>
TRW: 28 ne<OOT> . <ZE 233.342>
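The `<ZA ...>` and `<ZE ...>` tags and the off-talk tags can be extracted with simple patterns. The sketch below reads the tag values as seconds, which is an interpretation suggested by the example values 211.619 and 233.342. The pattern deliberately does not match the plain `<Z>` tag. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

TIME_TAG = re.compile(r"<Z([AE]) (\d+\.\d+)>")   # <ZA ###.###> or <ZE ###.###>, not <Z>
OFF_TALK = re.compile(r"<(ROT|POT|SOT|OOT)>")


def time_span(lines: List[str]) -> Tuple[float, float]:
    """Return the <ZA ...> and <ZE ...> values (interpreted as seconds)."""
    times: Dict[str, float] = {}
    for line in lines:
        for kind, value in TIME_TAG.findall(line):
            times[kind] = float(value)
    return times["A"], times["E"]


def off_talk_counts(lines: List[str]) -> Counter:
    """Count the occurrences of each off-talk tag."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(OFF_TALK.findall(line))
    return counts


trw = ["TRW: 0 <ZA 211.619> wurde<POT>", "TRW: 28 ne<OOT> . <ZE 233.342>"]
start, end = time_span(trw)
print(round(end - start, 3))  # prints: 21.723
```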
MAS
Class 4
MAS: (begin sample) (duration sample) (list of symbolic links) (label string)
MAS: 53600 1920 0 'smar
MAS: 55520 10560 0 ta
MAS: 66080 1680 0 kUs
MAS: 67760 11120 1 'vEl
MAS: 78880 960 1 C@
MAS: 79840 1600 2 'li:
MAS: 81440 6880 2 plINs
MAS: 88320 1600 2 'far
MAS: 89920 1920 2 b@
MAS: 91840 1760 3 'has
MAS: 93600 1120 4 'du:
MAS: 220256 480 5 m
MAS: 220736 11040 6 'mi:6
MAS: 231776 2560 7 'maI
MAS: 234336 2240 7 n@
MAS: 236576 4160 8 'fra:
MAS: 240736 2080 8 g@
MAS: 242816 1600 9 b@
MAS: 244416 5440 9 'ant
MAS: 249856 4160 9 'vO6
MAS: 254016 2400 9 t@n
SPK
Class 1
SPK: (list of symbolic links) (label string)
ORT: 0 okay
ORT: 1 bye
ORT: 2 good
ORT: 3 bye
SPK: 0 speaker001
SPK: 1 speaker001
SPK: 2,3 speaker002
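Because SPK points at word numbers, its speaker labels can be joined with the ORT tier through the shared symbolic links. The sketch below groups the words of each speaker in word order. `words_by_speaker` is a hypothetical name, and it assumes every ORT word has an SPK entry.

```python
from collections import defaultdict
from typing import Dict, List


def words_by_speaker(lines: List[str]) -> Dict[str, List[str]]:
    """Join the ORT and SPK tiers via their symbolic links (word numbers)."""
    words: Dict[int, str] = {}
    speaker_of: Dict[int, str] = {}
    for line in lines:
        tier, links, label = line.split(maxsplit=2)
        numbers = [int(n) for n in links.split(",")]
        if tier == "ORT:":
            words[numbers[0]] = label
        elif tier == "SPK:":
            for n in numbers:
                speaker_of[n] = label
    grouped: Dict[str, List[str]] = defaultdict(list)
    for n in sorted(words):
        grouped[speaker_of[n]].append(words[n])
    return dict(grouped)


example = ["ORT: 0 okay", "ORT: 1 bye", "ORT: 2 good", "ORT: 3 bye",
           "SPK: 0 speaker001", "SPK: 1 speaker001", "SPK: 2,3 speaker002"]
print(words_by_speaker(example))
# prints: {'speaker001': ['okay', 'bye'], 'speaker002': ['good', 'bye']}
```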
SAM
The SAM Format was defined in the
ESPRIT "SAM" Project No 2589 : 'Speech Input and Output
Assessment Methodologies and Standardization'. Only very few BAS
corpora contain SAM Format files.
On each BAS CDROM you will find
scripts (sam2pho, pho2sam)
for the conversion of SAM into PhonDat and vice versa.
AGS - Annotation Graphs
Bird et al. (LDC) use an abstract and very general data model called
'annotation graphs' to represent all kinds of annotations in the
ATLAS project.
The BAS Partitur Format (BPF) can be represented as
an annotation graph as well.
Since LDC also provides software modules for designing new annotation
tools based on this model, they defined an SGML-based format (based on
ATLAS Level 0, v1.1b3) to
store and exchange such annotation data (AGS).
On each BAS CDROM you will find the script
par2ags.pl that transforms a BAS Partitur Format (BPF) file
into an AGS file. A DTD for the AGS format can be found
here.
Some BAS corpora are already shipped with both formats, BPF and AGS.
Florian Schiel