Susanne Burger
Institut für Phonetik und sprachliche Kommunikation
München
(Lexicon version of the "Handbuch zur Datenaufnahme und
Transliteration in TP14 von VERBMOBIL -3.0"
Kohler, Lex, Pätzold, Scheffers, Simpson, Thon
Kiel, September 1994)
A Postscript-Version of the Handbuch is available at the IPDS
Kiel.
Memo 111: Aussprachevarianten in der VERBMOBIL-Transliteration - Regeln zur konsistenteren Verschriftung (Burger, Kachelrieß, München, August 1996)
This memo proposes rules for transliterating the VERBMOBIL pronunciation comments more consistently at the orthographic level.
The memo is available only in German; an English version is coming soon.
German umlauts and ß (Eszett) are transliterated in LaTeX format, e.g. "a for ä and "s for ß.
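The LaTeX-style escapes that appear in the examples of this document (such as "a in J"ansch or "s in mu"s) can be mapped back to ordinary characters. The following is a minimal sketch, not part of the conventions; the names `LATEX_TO_UNICODE` and `decode_umlauts` are hypothetical, and it assumes that a double quote only ever occurs as part of such an escape.

```python
# Mapping from the LaTeX-style escapes used in TRL files to Unicode.
# Assumption: a double quote occurs only as part of one of these escapes.
LATEX_TO_UNICODE = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    '"s': "ß",
}


def decode_umlauts(text: str) -> str:
    """Replace every LaTeX-style umlaut/Eszett escape by its Unicode character."""
    for escape, char in LATEX_TO_UNICODE.items():
        text = text.replace(escape, char)
    return text
```

Applied to `gr"u"s Gott`, this sketch yields `grüß Gott`.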
Only the period, comma and question mark are allowed. Contrary to Duden conventions, lowercase letters are used after periods and question marks. Nouns are always capitalized. There is always a blank before and after a punctuation mark. In general, punctuating spontaneous speech is difficult because sentences are rarely grammatically well-formed.
Symbol: " . "
Location: at the end of a sentence; between blanks
Example:
... so , guten Tag , mein Name ist <!1 is'> J"ansch . <"ah> wir hatten bereits telefoniert<Z> . ...
Note:
When in doubt a decision should be based on:
Symbol: " ? "
Location: at the end of a sentence; between blanks
Example:
... <A> wie schaut 's denn aus , den darauffolgenden Sonntag , den <:<#Mikrobe> neunundzwanzigsten:> bei Ihnen ? geht 's da ? ...
Note:
When in doubt a decision should be based on:
Symbol: " , "
Location: between sentence parts, subordinate clauses; between blanks
Example:
... <"ahm> morgen , Freitag , <h"as> wie ich seh' , <"ah> mu"s ich feststellen , da"s ich <"ah> "uberhaupt keine Zeit hab' . ...
Note:
When in doubt a decision should be based on:
e.g.:
a) I could , I've always time on Wednesdays .
b) I could . <P> <A> I've always time on Wednesdays .
capital letter small letter :> <A> (only in case of exhalation!) <Z> <%> <!n ..> <;..>
A transliteration file consists of a header
Example:
; Dialog N057K
; zuletzt bearbeitet am 23.5.94
; Tonqualit"at: (allgemeine Kommentare zu
; Sprechern oder zur Aufnahmequalit"at des
; Dialogs)
and a text part,
which is separated from the header by an empty line.
Every turn of the dialogue is transliterated in the text part. A turn
starts with the speaker code, containing three capital letters and a turn
number, starting at 000. After the turn number there has to be a colon,
followed by a blank.
The text of the turn follows. Words are not hyphenated at the end of a line,
and every new line within a turn begins with eight blanks. The line after a
turn may hold a comment referring to that turn; a semicolon marks the beginning
of each comment. Turns are separated by empty lines.
Example:
AAP000: so , guten Tag , mein Name ist <!1 is'> J"ansch . <"ah> wir hatten bereits
        telefoniert<Z> , mein Name J<Z>"ansch , $J $"A $N $S $C $H , wegen
        <:<#Mikrobe> eines Arbeitstreffens:> .

BBP001: gr"u"s Gott , mein Name ist G<Z>"urtner , <A> <"ahm>
        $G $"U $R $T $N $E $R <A> . <A> <"ahm>
;Brummen "uber gesamtem Turn

AAP002: ja , <:<#Mikrobe> ich kuck' jetzt:> mal nach bei mir , wann ich <A>
        einen <!1 ein'> Termin frei h"atte . <A> das <:<#Mikrowind> erste:>
        w"are <Schmatzen> in der Woche oder die Tage vom vierzehnten Juli
        bis zum achtzehnten <:<#Mikrobe> Juli:> . <A> ginge das bei
        <:<#> Ihnen:> ?

BBP003: ...
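The turn layout described above (speaker code, turn number, colon, blank, continuation lines with eight blanks, comment lines with a semicolon, empty lines between turns) can be read mechanically. The following is only a sketch under stated assumptions: the names `Turn` and `parse_text_part` are hypothetical, and a three-digit turn number is assumed from the examples (AAP000, CLK087).

```python
import re
from dataclasses import dataclass, field

# Three capital letters, a turn number, a colon and a blank.
# Assumption: turn numbers always have exactly three digits.
TURN_LABEL = re.compile(r"^([A-Z]{3})(\d{3}): ")


@dataclass
class Turn:
    speaker: str
    number: int
    text: str
    comments: list = field(default_factory=list)


def parse_text_part(text_part: str) -> list:
    """Split the text part of a TRL file into Turn objects."""
    turns = []
    # Turns are separated by empty lines.
    for block in re.split(r"\n\s*\n", text_part.strip("\n")):
        lines = block.split("\n")
        match = TURN_LABEL.match(lines[0])
        if match is None:
            raise ValueError(f"not a turn label: {lines[0]!r}")
        pieces = [lines[0][match.end():].strip()]
        comments = []
        for line in lines[1:]:
            if line.startswith(";"):
                # Comment referring to the turn just transliterated.
                comments.append(line[1:].strip())
            elif line.startswith(" " * 8):
                # Continuation line of the same turn.
                pieces.append(line.strip())
            else:
                raise ValueError(f"unexpected line: {line!r}")
        turns.append(
            Turn(match.group(1), int(match.group(2)), " ".join(pieces), comments)
        )
    return turns
```

The sketch rejects any line that is neither a continuation nor a comment, which makes layout errors in a file visible early.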
symbol............meaning
$.................spelling
*.................non-word/mispronunciation
<"ah>.............
<"ahm>............
<hm>..............
<h"as>............hesitation/filled pause
<Schmatzen>.......
<Schlucken>.......
<R"auspern>.......
<Husten>..........
<Lachen> .........
<Ger"ausch>.......nonverbal articulatory noises
<Z>...............lengthening
<A>...............breathing
<P>...............pause/silence
li_blank..........word interruption: left boundary (underscore,blank)
blank_re..........word interruption: right boundary (blank,underscore)
=.................word fragment
/-................truncation
+/................self-repairs: left boundary
/+................self-repairs: right boundary
%.................words which are difficult to understand
<%>...............unintelligible speech
<;..>.............comment
<;T>..............technical break
<! ..>............variants of pronunciation
<#Klicken>........
<#Klingeln>.......
<#Klopfen>........
<#Mikrobe>........
<#Mikrowind>......
<#Rascheln>.......
<#Quietschen>.....
<#>...............non-articulatory noises
<:................beginning of overlay
:>................end of overlay
(.................beginning of overlapped speech ( and (@
).................end of overlapped speech ) and @)
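For symbols that stand alone between blanks, the table above can be turned into a simple lookup. The following sketch is illustrative only; `TOKEN_CLASSES` and `classify` are hypothetical names, and symbols that attach to a word or span several tokens (such as <Z>, <!n ..> or <:..:>) are deliberately left out.

```python
import re

# (pattern, category) pairs checked in order; categories follow the symbol table.
# Assumption: the input is a single blank-delimited token.
TOKEN_CLASSES = [
    (re.compile(r'^<("ah|"ahm|hm|h"as)>$'), "hesitation"),
    (re.compile(r'^<(Schmatzen|Schlucken|R"auspern|Husten|Lachen|Ger"ausch)>$'),
     "articulatory noise"),
    (re.compile(r"^<#\w*>$"), "non-articulatory noise"),
    (re.compile(r"^<A>$"), "breathing"),
    (re.compile(r"^<P>$"), "pause"),
    (re.compile(r"^<%>$"), "unintelligible"),
    (re.compile(r"^<;T>$"), "technical break"),
    (re.compile(r"^\$"), "spelling"),
    (re.compile(r"^\*"), "non-word"),
    (re.compile(r"^%"), "hard to understand"),
]


def classify(token: str) -> str:
    """Return the category of a stand-alone token, or a fallback label."""
    for pattern, category in TOKEN_CLASSES:
        if pattern.search(token):
            return category
    return "word or punctuation"
```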
The text part of the TRL files contains many special symbols, which were defined to transliterate the phenomena of spontaneous speech more precisely.
All symbols in use are described below, each with short examples from the
original TRL files.
A definition of a symbol consists of
Example:
... mein Name ist J<Z>"ansch , $J $"A $N $S $C $H ...
Symbol: $
Every spelled letter is written as a capital letter and separated by
a blank from the following spelled letter.
$ is set before the capital letter. There is no blank between $ and the
spelled letter.
( +/ - blank
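As an illustration of the spelling rule, the spelled letters of an utterance can be collected with a regular expression. This is a sketch, not part of the conventions; `SPELLED_LETTER` and `spelled_letters` are hypothetical names, and the optional double quote covers LaTeX-style umlauts such as $"A.

```python
import re

# "$" directly followed by one capital letter, optionally written with a
# LaTeX-style umlaut escape; the lookahead requires a blank or the end of text.
SPELLED_LETTER = re.compile(r'\$("?[A-Z])(?= |$)')


def spelled_letters(text: str) -> list:
    """Return the spelled letters of an utterance in order of occurrence."""
    return SPELLED_LETTER.findall(text)
```

For the example above, the sketch returns the six letters J, "A, N, S, C, H.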
Example:
... *haarknapp <:<#Rascheln> um einen <!1 ein'> Tag verfehlt ...
... was *exkursieren Sie denn ? ...
Symbol: *
non-words are:
A non-word is marked with a * . There is no blank between the asterisk and the non-word.
( +/ <;T> blank
Example:
... <"ah> wir hatten bereits telefoniert<Z> ...
... gu<Z>t , <"ahm> wie w"ar' es bei Ihnen am<Z> neunzehnten Juli ...
... ich denke/- <hm> also bei mir ginge es sehr gut ...
... w"urd' ich sagen , <A> <h"as> wenn wir ...
Symbol: <"ah> <"ahm> <hm> <h"as>
Hesitations are transliterated between pointed brackets.
( <;T> blank
Example:
... Mittagessen <P> <Schmatzen> und<Z> , na ja ...
... ordentlich planen mu"s und <"ah> <Schlucken> wann w"urde Ihnen ...
... einverstanden . <P> <R"auspern> auf Wiedersehen ...
... +/je/+ <Husten> je schneller wir das machen ...
... bei <:<Lachen> mir:> terminlich sehr ung"unstig ...
... dann/- <"ah> <A> <Ger"ausch> das sind sechs Termine ...
Symbol: <Schmatzen> <Schlucken> <R"auspern> <Husten> <Lachen> <Ger"ausch>
Articulatory noises are transliterated at the time of their production
between blanks.
Other verbal and nonverbal productions can be overlaid
with articulatory noises.
( <: blank
Example:
... ich h"atte Zeit +/am<Z>/+ <A> <Schmatzen> ab Dienstag ...
... ich dachte ger<Z>ade ...
Symbol: <Z>
<Z> is added immediately after the letter(s) representing the lengthened sound, regardless of where in the word the lengthening occurred.
small letter capital letter
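Because <Z> is attached directly to the lengthened letter, the plain word and the position of the lengthening can both be recovered from a token. The following is a minimal sketch; `split_lengthening` is a hypothetical name, and the returned indices refer to the token as written, with LaTeX-style escapes still in place.

```python
MARK = "<Z>"


def split_lengthening(token: str):
    """Return (plain word, indices of the letters directly followed by <Z>)."""
    parts = token.split(MARK)
    plain = "".join(parts)
    positions = []
    offset = 0
    # Every part except the last one ends with a lengthened letter.
    for part in parts[:-1]:
        offset += len(part)
        positions.append(offset - 1)
    return plain, positions
```

For `ger<Z>ade` the sketch returns the word `gerade` and index 2, the position of the lengthened r.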
Example:
... ist G<Z>"urtner , <A> <"ahm> $G $"U $R $T $N $E $R <A> . <A> ...
Symbol: <A>
Most of the time, <A> marks inhalation and is therefore transcribed
after the punctuation mark.
In case of clear exhalation at the end of a sentence or sentence part,
breathing can also be marked before the punctuation mark.
( blank
Example:
... ich habe <P> M"oglichkeiten dazu ...
Symbol: <P>
In case of a coincidence of <P> and punctuation mark, the pause is set after the punctuation mark.
( blank
Example:
... Ver_ <A> _pflichtungen ...
... statt_ +/f=/+ <h"as> _findet ...
Symbol: _
Sometimes words are interrupted.
In that case an underscore is set immediately after the interruption, without
a blank in between, and another underscore immediately before the continuation.
All events occurring during the interruption are transliterated between
blanks and between the two word parts.
- <Z> capital letter small letter
blank
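The two underscores make it possible to rejoin an interrupted word and to list what happened during the interruption. The following is a sketch under assumptions: `INTERRUPTED` and `rejoin_interrupted` are hypothetical names, and word parts are assumed to consist of letters, the LaTeX umlaut quote and the apostrophe.

```python
import re

# Left word part + "_", a blank, the intervening events, a blank,
# "_" + right word part.
INTERRUPTED = re.compile(r"""([A-Za-z"']+)_ (.*?) _([A-Za-z"']+)""")


def rejoin_interrupted(text: str) -> list:
    """Return (rejoined word, events during the interruption) pairs."""
    return [
        (left + right, events)
        for left, events, right in INTERRUPTED.findall(text)
    ]
```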
Example:
... +/sieb=/+ siebzehnter ...
... im Ja=/- also ich sag' Ihnen jetzt ...
Symbol: =
In case of a complete break-off of a word without continuation the break-off position is marked with an = added at the break-off. There is no blank between the break-off and the equals sign.
<Z> capital letter small letter
Example:
... ab dem dritten August <A> bis zum/- <P> Moment , ich ...
... ja , ich hab' da eigentlich/- also ich bin vom neunzehnten bis ...
Symbol: /-
Truncation refers to the case in which a topic has been broken off while
speaking and a new topic is initiated after the break-off.
The break-off position is marked with /- without a preceding blank.
There is no punctuation mark after /- .
= <A> (only in case of exhalation!) <Z> <;T> capital letter small letter
Example:
... +/am/+ <:<#Mikrobe> am Donnerstag:> kann ich erst ...
... +/im Sep=/+ im September ...
... die Woche +/von/+ <"ah> mit Freitag ...
... also +/das/+ das/+ das zweite ...
Symbol: +/../+
In a self-repair, the reparandum is either repeated or corrected after the
break-off.
The break-off position is marked with /+ without a blank in between.
The stretch that is repeated or corrected after the break-off starts
with +/ without a following blank, so that +/ and /+ enclose the reparandum
like brackets.
( blank
= <A> (only in case of exhalation!) <Z> <;T> capital letter small letter
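Since +/ and /+ enclose the reparandum, the repaired material can be listed or removed with a non-greedy pattern. This is a sketch only; `REPARANDUM`, `reparanda` and `without_reparanda` are hypothetical names, and well-formed, non-nested +/../+ pairs are assumed.

```python
import re

# "+/" ... "/+" plus one optional following blank.
# Assumption: self-repair brackets are well-formed and not nested.
REPARANDUM = re.compile(r"\+/(.*?)/\+ ?")


def reparanda(text: str) -> list:
    """Return the bracketed reparanda in order of occurrence."""
    return [match.group(1) for match in REPARANDUM.finditer(text)]


def without_reparanda(text: str) -> str:
    """Return the text with all bracketed reparanda removed."""
    return REPARANDUM.sub("", text)
```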
Example:
... %eins , %zwei , %drei , %vier ...
... wann h"atten Sie %da bitte Zeit ...
Symbol: %
Words which are difficult to understand are prefixed with % without a blank in between.
( +/ <;T> blank
Example:
... aber <A> <"ah> <%> <"ahm> wie w"ar's denn ...
... <%> fr"uher geht 's leider nicht ...
Symbol: <%>
Unintelligible items are transliterated with <%> between blanks.
( blank
Example:
; Tonqualit"at: viele Nebenger"ausche ...
;gesamter Turn verrauscht ...
... zum Beispiel <;"ubersteuert> ...
... einen Termin vereinbaren <;heiser> ...
Symbol: ;
A comment relating to the whole dialogue is written in front of the
dialogue in the header. A comment line starts with a semicolon. After the
comment follows an empty line.
A comment about a turn is prefixed with a semicolon at the beginning of
the line and added after the contribution without an empty line.
Symbol: <;..>
Local comments are inserted in the text after the position concerned and after a blank, prefixed with a semicolon and set in pointed brackets.
blank
Example:
... ab ein U<;T> ...
... <;T>neunzehnten is<Z>t etwas schwierig ...
... erste Woche ginge <A> <;T> <#Klicken> <A> <"ah> Montag ...
... unsere Termine alle untergebra<;T> <;T>ke ...
Symbol: <;T>
Depending on when the recording button is activated, a speech recording
may be truncated at the beginning or at the end, or it may be interrupted
during a dialogue contribution.
In case of a technical disturbance during a lexical unit, <;T> is
added before or after the unit without a blank in between. A blank is placed
between two consecutive <;T>.
If no lexical unit is involved <;T> is written between blanks.
<: +/ <h"as> <Z> capital letter small letter blank
Example:
... damit w"ar' das <!1 des> eigentlich klar ...
... dann kommen Sie <!2 komm' Se> doch ...
... wenn wir <!2 wemma> 's die Woche noch machen ...
Symbol: <!n ..>
Dialectal variants, different speaking styles and other deviations are first transliterated in correct orthography according to the German Duden. The utterance actually produced follows the correct version after a blank and is placed in pointed brackets. Inside the brackets, the variant is introduced by an exclamation mark, a digit giving the number of lexical units concerned, and a blank.
A more detailed description of the rules for the transliteration
of pronunciation variants at the orthographic level can be found in:
Verbmobil-Memo 111: Aussprachevarianten in
der VERBMOBIL-Transliteration - Regeln zur konsistenteren Verschriftung
(Burger, Kachelrieß, München, August 1996).
(Available only in German; an English version is coming soon.)
blank
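The digit in <!n ..> says how many preceding lexical units the variant replaces, so canonical form and produced form can be paired up. The following is a sketch, not a definitive reader; `VARIANT` and `pronunciation_variants` are hypothetical names, and it assumes that the n tokens directly before the bracket are plain words with no other symbols in between.

```python
import re

# "<!", the number of lexical units, a blank, the produced variant, ">".
VARIANT = re.compile(r"<!(\d+) ([^>]*)>")


def pronunciation_variants(text: str) -> list:
    """Return (canonical words, produced variant) pairs."""
    pairs = []
    for match in VARIANT.finditer(text):
        count = int(match.group(1))
        # Assumption: the last `count` blank-separated tokens before the
        # bracket are exactly the lexical units the variant refers to.
        preceding = text[:match.start()].split()
        pairs.append((" ".join(preceding[-count:]), match.group(2)))
    return pairs
```

For `dann kommen Sie <!2 komm' Se> doch`, the sketch pairs `kommen Sie` with `komm' Se`.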
Example:
... de<Z>m <:<#Klicken> neunundzwanzigsten:> August ...
... <:<#Klingeln> am Mittwoch st"anden:> ...
... <:<#Klopfen> gern so machen:> ...
... <#Mikrobe> das tut mir leid ...
... ich k"onnte Ihnen <:<#Mikrowind> vorschlagen:> ...
... nach Berlin <P> ausmachen . <#Rascheln> <"ahm> ...
... der gesamte <:<#Quietschen> Rest des Mais:> ...
... k"ame mir gelegen , <A> <#> allerdings ...
Symbol: <#Klicken> <#Klingeln> <#Klopfen> <#Mikrobe> <#Mikrowind> <#Rascheln> <#Quietschen> <#>
<#Klicken>: the noise of pushing the record button
<#Klingeln>: telephone ring
<#Klopfen>: e.g. knocking on the table
<#Mikrobe>: touching the microphone
<#Mikrowind>: blowing into the microphone
<#Rascheln>: e.g. paper rustle
<#Quietschen>: e.g. chair squeaking
<#>: non-articulatory noises, which can not be categorized by one
of the other classes.
Non-articulatory noises are transliterated with a prefixed # and in
pointed brackets.
Other verbal and nonverbal productions can be overlaid
with non-articulatory noises.
( <: blank
Example:
... de<Z>m <:<#Klicken> neunundzwanzigsten:> August ...
... bei <:<Lachen> mir:> terminlich sehr ung"unstig ...
... <:<#Klopfen> <Lachen> k"amen mir die Monate:> April ...
... <:<Lachen> ich habe:> <:<#Klopfen> <Lachen> heute:> keine Zeit ...
Symbol: <:..:>
Non-articulatory noises or nonverbal articulatory productions overlaying
lexical units or other events are set in brackets together with the concerned
lexical units or the concerned events.
<: is followed by the overlay, a blank and the overlaid production.
:> closes the overlaid utterance. There is no blank between the overlaid
production and the closing bracket.
In case of several overlays, the following order of events must be observed:
- non-articulatory noises,
- articulatory noises,
- lexical units.
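The overlay brackets can be taken apart into the overlaying noises and the overlaid production. This is a minimal sketch; `OVERLAY` and `overlays` are hypothetical names, and every bracketed item directly after <: is treated as part of the overlay, which is an assumption of the sketch rather than a rule of the conventions.

```python
import re

# "<:", one or more bracketed noises each followed by a blank,
# the overlaid production, ":>".
OVERLAY = re.compile(r"<:((?:<[^>]+> )+)(.*?):>")


def overlays(text: str) -> list:
    """Return (list of overlaying noises, overlaid production) pairs."""
    result = []
    for match in OVERLAY.finditer(text):
        noises = match.group(1).split()
        result.append((noises, match.group(2)))
    return result
```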
( blank
/+ /- - <.. > <#.. > <h"as> <A> <Z> <%> <;T> capital letter small letter
Example:
PEG004: wollen wir mal die Termine ausmachen , (f"ur die n"achsten@) Monate ?
CLK005: (@ja gerne).. gerne.

CLK087: nee , nicht den elften , den<Z> (f"unfzehnten <A>@)..
PEG088: (@f"unfzehnten) und den (achtundzwanzigsten@)..
CLK089: (@genau <A>)..

HAH008: nein , da mu"s ich zu einem (Besuch@) nach (Leipzig@)..
TIS009: (@m).. (@aha)..

HAH008: nein, da mu"s ich zu einem (Besuch@) nach Leipzig..
TIS009: (@m , aha)..
Symbol: (@ or ( , @) or )
blank
:> /+ /- <Ger"ausch>.. <#>.. <h"as> <A> <P> <Z> <%> <;T> capital letter small letter
K.Kohler, G.Lex, M.Pätzold, M.Scheffers, A.Simpson, W.Thon: Handbuch zur Datenaufnahme und Transliteration in TP14 von VERBMOBIL -3.0 , Verbmobil Report , Nr. 11 , September 1994.
A.Batliner, S.Burger, A.Kießling: Außergrammatische Phänomene in der Spontansprache: Gegenstandsbereich, Beschreibung, Merkmalinventar, Verbmobil-Report , Nr. 57 , Februar 1994.
S.Burger, E.Kachelrieß: Aussprachevarianten in der VERBMOBIL-Transliteration - Regeln zur konsistenteren Verschriftung, Verbmobil-Memo, Nr. 111, August 1996.
The VERBMOBIL Project August 1995
Copyright © 1995 Institut für Phonetik
und Sprachliche Kommunikation, Universität München.
Parts of this document are Copyright © Institut für Phonetik
und digitale Sprachverarbeitung der Christian-Albrechts-Universität
zu Kiel.
This page and all other pages which belonged to this server may be copied,
printed and distributed to other parties, under the condition that the
pages are distributed as shown here. Parts of pages or extended pages may
not be distributed further without permission of the author.
Susanne Burger burger@phonetik.uni-muenchen.de