Lexicon of the Conventions for Transliteration of Spontaneous Speech

Susanne Burger


Institut für Phonetik und sprachliche Kommunikation
München

(Lexicon version of the "Handbuch zur Datenaufnahme und Transliteration in TP14 von VERBMOBIL -3.0",
Kohler, Lex, Pätzold, Scheffers, Simpson, Thon;
Kiel, September 1994)
A PostScript version of the Handbuch is available at the IPDS Kiel.


Contents

    1. General remarks on the transliteration

      1.1 Punctuation

      1.2 File structure

    2. Symbols

    3. References



1. General remarks on the transliteration:


1.1 Punctuation:

Period

Symbol: " . "

Location: at the end of a sentence; between blanks

Example:

... so , guten Tag , mein Name ist <!1 is'> J"ansch . 
    <"ah> wir hatten bereits telefoniert<Z> . ...

Note:

When in doubt a decision should be based on:

  • grammar
  • intonation
  • pause, breathing
  • beginning of a new topic

    Question mark

    Symbol: " ? "

    Location: at the end of a sentence; between blanks

    Example:

    ... <A> wie schaut 's denn aus , den darauffolgenden 
        Sonntag , den <:<#Mikrobe> neunundzwanzigsten:> 
        bei Ihnen ? geht 's da ? ...

    Note:

    When in doubt a decision should be based on:

  • interrogatives
  • syntax
  • intonation
  • context

    Comma

    Symbol: " , "

    Location: between sentence parts, subordinate clauses; between blanks

    Example:

    ... <"ahm> morgen , Freitag , <h"as> wie ich seh' , 
        <"ah> mu"s ich feststellen , da"s ich <"ah> 
        "uberhaupt keine Zeit hab' . ... 

    Note:

    When in doubt a decision should be based on:

  • grammar
  • introductory particles at the beginning of the subordinate clause
  • intonation
  • related topics

    e.g.:

    a) I could , I always have time on Wednesdays .
    b) I could . <P> <A> I always have time on Wednesdays .

    Symbols allowed before blank and punctuation mark:

    capital letter
    small letter
    :>
    <A> (only in case of exhalation!)
    <Z>
    <%>
    <!n ..>
    <;..> 
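
The "between blanks" rule for the three punctuation marks can be checked mechanically. The following is a minimal sketch, not part of the conventions themselves; the function name is hypothetical, and the check deliberately ignores the ".." seen in the overlapping-speech examples as well as any punctuation inside local comments.

```python
PUNCTUATION = {".", "?", ","}

def misplaced_punctuation(line):
    """Return tokens in which a punctuation mark is glued to other characters."""
    bad = []
    for token in line.split():
        if token in PUNCTUATION:
            continue  # a lone mark between blanks follows the convention
        if any(ch in PUNCTUATION for ch in token):
            bad.append(token)
    return bad
```

Called on a line such as `so, guten Tag.`, the sketch reports both `so,` and `Tag.`; a line transliterated with blanks around each mark yields an empty list.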


    1.2 File structure:


    2. Symbols:


    About the lexicon items


    Spelling:

    Example:

    ... mein Name ist J<Z>"ansch , $J $"A $N $S $C $H ...

    Symbol: $

    Every spelled letter is written as a capital letter and separated by a blank from the following spelled letter.
    $ is set before the capital letter. There is no blank between $ and the spelled letter.

    Symbols allowed before $ :

    (
    +/
    -
    blank
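
As an illustration of the spelling rule, spelled letters can be collected with a small pattern. This is a sketch under assumptions: the function name is hypothetical, and the optional `"` before the capital letter is taken only from the `$"A` token in the example above.

```python
import re

# "$" directly followed by one capital letter, then a blank or the end of the
# text. The optional '"' covers tokens such as $"A from the example.
SPELLED_LETTER = re.compile(r'\$("?[A-Z])(?=\s|$)')

def spelled_letters(text):
    """Return the spelled letters of a transliterated line, in order."""
    return SPELLED_LETTER.findall(text)
```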


    Non-words:

    Example:

    ... *haarknapp <:<#Rascheln> um einen <!1 ein'> Tag 
        verfehlt ... 
    ... was *exkursieren Sie denn ? ...

    Symbol: *

    Non-words are:

  • neologisms,
  • mispronunciations,
  • foreign-language words that cannot be found in the German Duden,
  • foreign names.

    A non-word is marked with a * . There is no blank between the asterisk and the non-word.

    Symbols allowed before * :

    (
    +/
    <;T>
    blank


    Hesitations:

    Example:

    ... <"ah> wir hatten bereits telefoniert<Z> ...
    ... gu<Z>t , <"ahm> wie w"ar' es bei Ihnen am<Z> 
        neunzehnten Juli ...
    ... ich denke/- <hm> also bei mir ginge es sehr 
        gut ...
    ... w"urd' ich sagen , <A> <h"as> wenn wir ...

    Symbol: <"ah> <"ahm> <hm> <h"as>

    Hesitations are transliterated between pointed brackets.

  • <"ah>: vocalic articulation, independent of the vowel quality
  • <"ahm>: vocalic articulation + nasal articulation
  • <hm>: nasal articulation
  • <h"as>: rare articulations that cannot be assigned to one of the other classes,
    e.g. /brrt/ /pf/ /puh/ etc.

    Symbols allowed before <h"as> .. :

    (
    <;T>
    blank


    Articulatory noises:

    Example:

    ... Mittagessen <P> <Schmatzen> und<Z> , na ja ...
    ... ordentlich planen mu"s und <"ah> <Schlucken> wann 
        w"urde Ihnen ...
    ... einverstanden . <P> <R"auspern> auf 
        Wiedersehen ...
    ... +/je/+ <Husten> je schneller wir das machen ...
    ... bei <:<Lachen> mir:> terminlich sehr 
        ung"unstig ... 
    ... dann/- <"ah> <A> <Ger"ausch> das sind sechs 
        Termine ...

    Symbol: <Schmatzen> <Schlucken> <R"auspern> <Husten> <Lachen> <Ger"ausch>

  • <Schmatzen>: munching
  • <Schlucken>: swallowing
  • <R"auspern>: throat clearing
  • <Husten>: coughing
  • <Lachen>: laughing
  • <Ger"ausch>: articulatory noises that cannot be assigned to one of the other classes.

    Articulatory noises are transliterated at the time of their production between blanks.
    Other verbal and nonverbal productions can be overlaid with articulatory noises.

    Symbols allowed before <Ger"ausch> .. :

    (
    <:
    blank


    Lengthening:

    Example:

    ... ich h"atte Zeit +/am<Z>/+ <A> <Schmatzen> ab 
        Dienstag ...
    ... ich dachte ger<Z>ade ...

    Symbol: <Z>

    <Z> is added immediately after the letter(s) representing the lengthened sound, regardless of where in the word the lengthening occurred.

    Symbols allowed before <Z> :

    small letter
    capital letter


    Breathing:

    Example:

    ... ist G<Z>"urtner , <A> <"ahm> $G $"U $R $T $N 
        $E $R <A> . <A> ...

    Symbol: <A>

    Most of the time <A> marks inhalation and is thus transcribed after the punctuation mark.
    In case of clear exhalation at the end of a sentence or sentence part, breathing can also be marked before the punctuation mark.

    Symbols allowed before <A> :

    (
    blank


    Pauses:

    Example:

    ... ich habe <P> M"oglichkeiten dazu ...

    Symbol: <P>

    In case of a coincidence of <P> and punctuation mark, the pause is set after the punctuation mark.

    Symbols allowed before <P> :

    (
    blank


    Word interruption:

    Example:

    ... Ver_ <A> _pflichtungen ... 
    ... statt_ +/f=/+ <h"as> _findet ...

    Symbol: _

    Sometimes words are interrupted.
    In this case an underscore is set immediately after the interruption, without a blank in between, and another underscore immediately before the continuation.
    All events occurring during the interruption are transliterated between blanks and between the two word parts.

    Symbols allowed before li_blank (left underscore) :

    -
    <Z>
    capital letter
    small letter

    Symbols allowed before blank_re (right underscore) :

    blank
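
The two underscores make it possible to rejoin an interrupted word and to list the events that fell into the interruption. A minimal sketch, assuming one interruption per text and at least one event between the word parts; the names are hypothetical.

```python
import re

# first part + "_", blank, events, blank, "_" + continuation
INTERRUPTED = re.compile(r'(\S+)_\s+(.*?)\s+_(\S+)')

def rejoin_interrupted(text):
    """Return (joined_word, events) for the first interrupted word, or None."""
    match = INTERRUPTED.search(text)
    if match is None:
        return None
    first, events, rest = match.groups()
    return first + rest, events.split()
```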


    Word fragments:

    Example:

    ... +/sieb=/+ siebzehnter ... 
    ... im Ja=/- also ich sag' Ihnen jetzt ...

    Symbol: =

    In case of a complete break-off of a word without continuation the break-off position is marked with an = added at the break-off. There is no blank between the break-off and the equals sign.

    Symbols allowed before = :

    <Z>
    capital letter
    small letter


    Truncation:

    Example:

    ... ab dem dritten August <A> bis zum/- <P> Moment , 
        ich ... 
    ... ja , ich hab' da eigentlich/- also ich bin vom 
        neunzehnten bis ...

    Symbol: /-

    Truncation refers to the case in which a topic is broken off while speaking and a new topic is started after the break-off.
    The break-off position is marked with /- , without a preceding blank.
    There is no punctuation mark after /- .

    Symbols allowed before /- :

    =
    <A> (only in case of exhalation!)
    <Z>
    <;T>
    capital letter
    small letter


    Self-repairs:

    Example:

    ... +/am/+ <:<#Mikrobe> am Donnerstag:> kann ich 
        erst ... 
    ... +/im Sep=/+ im September ... 
    ... die Woche +/von/+ <"ah> mit Freitag ... 
    ... also +/das/+ das/+ das zweite ...

    Symbol: +/../+

    In a self-repair the speaker breaks off and then either repeats or corrects the reparandum.
    The break-off position is marked with /+ , without a blank in between.
    The stretch that is repeated or corrected after the break-off starts with +/ , without a following blank, so that +/../+ encloses the reparandum like a pair of brackets.

    Symbols allowed before +/ :

    (
    blank

    Symbols allowed before /+ :

    =
    <A> (only in case of exhalation!)
    <Z>
    <;T>
    capital letter
    small letter
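
Since +/../+ brackets the reparandum, the repaired stretches can be extracted or removed with a non-greedy pattern. This is only a sketch with hypothetical function names; it assumes that self-repairs are not nested.

```python
import re

REPARANDUM = re.compile(r'\+/(.+?)/\+')

def reparanda(text):
    """Return the bracketed reparanda in order of occurrence."""
    return REPARANDUM.findall(text)

def without_reparanda(text):
    """Return the text with all reparanda removed and blanks normalized."""
    return " ".join(REPARANDUM.sub(" ", text).split())
```

Removing the reparanda yields the utterance as the speaker finally intended it, for example `im September` from `+/im Sep=/+ im September`.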


    Words which are difficult to understand:

    Example:

    ... %eins , %zwei , %drei , %vier ... 
    ... wann h"atten Sie %da bitte Zeit ...

    Symbol: %

    Words which are difficult to understand are prefixed with % , without a blank in between.

    Symbols allowed before % :

    (
    +/
    <;T>
    blank
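
The one-character prefixes $ , * and % all attach directly to the following word, so they can be split off in the same way. The sketch below is an illustration only: the function name and the English labels are assumptions, and it does not handle a prefix that follows ( , +/ or <;T> inside the same token.

```python
MARKERS = {
    "$": "spelled letter",
    "*": "non-word",
    "%": "hard to understand",
}

def split_marker(token):
    """Return (label, word); label is None if the token has no prefix marker."""
    if len(token) > 1 and token[0] in MARKERS:
        return MARKERS[token[0]], token[1:]
    return None, token
```

Note that the tag <%> for unintelligible speech starts with < and is therefore left untouched.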


    Unintelligible speech:

    Example:

    ... aber <A> <"ah> <%> <"ahm> wie w"ar's denn ... 
    ... <%> fr"uher geht 's leider nicht ...

    Symbol: <%>

    Unintelligible items are transliterated with <%> between blanks.

    Symbols allowed before <%> :

    (
    blank


    Comments:

    Example:

    ; Tonqualit"at: viele Nebenger"ausche ... 
    ;gesamter Turn verrauscht ...
    ... zum Beispiel <;"ubersteuert> ... 
    ... einen Termin vereinbaren <;heiser> ...

    Symbol: ;

    A comment relating to the whole dialogue is written in the header, in front of the dialogue. A comment line starts with a semicolon. An empty line follows the comment.
    A comment about a turn is prefixed with a semicolon at the beginning of the line and added after the contribution without an empty line.

    Symbol: <;..>

    Local comments are inserted in the text after the position they refer to and after a blank, prefixed with a semicolon and set in pointed brackets.

    Symbols allowed before <;..> :

    blank


    Technical break:

    Example:

    ... ab ein U<;T> ... 
    ... <;T>neunzehnten is<Z>t etwas schwierig ... 
    ... erste Woche ginge <A> <;T> <#Klicken> <A> 
        <"ah> Montag ... 
    ... unsere Termine alle untergebra<;T> <;T>ke ...

    Symbol: <;T>

    Depending on when the recording button is activated, a speech recording may be truncated at the beginning or at the end, and may also be interrupted during a dialogue contribution.
    In case of a technical disturbance during a lexical unit, <;T> is added before or after the unit without a blank in between. A blank is placed between two consecutive <;T>.
    If no lexical unit is involved <;T> is written between blanks.

    Symbols allowed before <;T> :

    <:        
    +/
    <h"as>
    <Z>
    capital letter
    small letter
    blank


    Pronunciation variants:

    Example:

    ... damit w"ar' das <!1 des> eigentlich klar ... 
    ... dann kommen Sie <!2 komm' Se> doch ... 
    ... wenn wir <!2 wemma> 's die Woche noch machen ... 

    Symbol: <!n ..>

    Dialectal variants, different speaking styles or other deviations are transliterated in correct orthography according to the German Duden. The utterance as actually produced is transliterated after the correct version and after a blank, and is placed in pointed brackets. An exclamation mark, followed by a digit giving the number of lexical units concerned and a further blank, introduces the pronunciation variant.

    A more detailed description of the regularities for the transliteration of pronunciation variants at the orthographic level can be found in:
    Verbmobil-Memo 111: Aussprachevarianten in der VERBMOBIL-Transliteration - Regeln zur konsistenteren Verschriftung (Burger, Kachelrieß, München, August 1996).
    (Sorry, only in German, but the English version is coming soon!)

    Symbols allowed before <!n ..> :

    blank
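
The digit in <!n ..> says how many preceding lexical units the variant replaces, which is enough to pair each variant with its orthographic form. A minimal sketch, assuming that a lexical unit corresponds to one whitespace-separated token and that no other tag stands between the units and the variant; the names are hypothetical.

```python
import re

VARIANT = re.compile(r'<!(\d+) ([^>]*)>')

def pronunciation_variants(text):
    """Return (orthographic_form, produced_form) pairs found in the text."""
    pairs = []
    for match in VARIANT.finditer(text):
        count = int(match.group(1))
        preceding = text[:match.start()].split()
        start = max(0, len(preceding) - count)  # the n tokens before the tag
        pairs.append((" ".join(preceding[start:]), match.group(2)))
    return pairs
```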


    Non-articulatory noises:

    Example:

    ... de<Z>m <:<#Klicken> neunundzwanzigsten:> 
        August ... 
    ... <:<#Klingeln> am Mittwoch st"anden:> ...
    ... <:<#Klopfen> gern so machen:> ... 
    ... <#Mikrobe> das tut mir leid ... 
    ... ich k"onnte Ihnen <:<#Mikrowind> vorschlagen:> ... 
    ... nach Berlin <P> ausmachen . <#Rascheln> <"ahm> ... 
    ... der gesamte <:<#Quietschen> Rest des Mais:> ... 
    ... k"ame mir gelegen , <A> <#> allerdings ...

    Symbol: <#Klicken> <#Klingeln> <#Klopfen> <#Mikrobe> <#Mikrowind> <#Rascheln> <#Quietschen> <#>

    <#Klicken>: the noise of pushing the record button
    <#Klingeln>: telephone ring
    <#Klopfen>: e.g. knocking on the table
    <#Mikrobe>: touching the microphone
    <#Mikrowind>: blowing into the microphone
    <#Rascheln>: e.g. paper rustle
    <#Quietschen>: e.g. chair squeaking
    <#>: non-articulatory noises that cannot be assigned to one of the other classes.

    Non-articulatory noises are transliterated with a prefixed # and in pointed brackets.
    Other verbal and nonverbal productions can be overlaid with non-articulatory noises.

    Symbols allowed before <#>.. :

    (
    <:
    blank
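
The tags in pointed brackets introduced so far fall into a small number of classes that can be told apart by their spelling. The following sketch groups them accordingly; the function names and the English class labels are assumptions made for illustration, not part of the conventions.

```python
import re

HESITATIONS = {'<"ah>', '<"ahm>', '<hm>', '<h"as>'}
ARTICULATORY = {'<Schmatzen>', '<Schlucken>', '<R"auspern>',
                '<Husten>', '<Lachen>', '<Ger"ausch>'}
FIXED = {'<A>': 'breathing', '<P>': 'pause', '<Z>': 'lengthening',
         '<%>': 'unintelligible', '<;T>': 'technical break'}

def find_tags(text):
    """Return all pointed-bracket tags; the overlay brackets <: and :> are skipped."""
    return re.findall(r'<[^<>]*>', text)

def classify_tag(tag):
    """Map one tag to the class described in the sections above."""
    if tag in HESITATIONS:
        return 'hesitation'
    if tag in ARTICULATORY:
        return 'articulatory noise'
    if tag in FIXED:
        return FIXED[tag]
    if tag.startswith('<#'):
        return 'non-articulatory noise'
    if tag.startswith('<;'):
        return 'local comment'
    if tag.startswith('<!'):
        return 'pronunciation variant'
    return 'unknown'
```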


    Overlay:

    Example:

    ... de<Z>m <:<#Klicken> neunundzwanzigsten:> 
        August ... 
    ... bei <:<Lachen> mir:> terminlich sehr 
        ung"unstig ... 
    ... <:<#Klopfen> <Lachen> k"amen mir die Monate:> 
        April ... 
    ... <:<Lachen> ich habe:> <:<#Klopfen> <Lachen> 
        heute:> keine Zeit ... 

    Symbol: <:..:>

    Non-articulatory noises or nonverbal articulatory productions that overlay lexical units or other events are set in brackets together with the lexical units or events concerned.
    <: is followed by the overlay, a blank and the overlaid production. :> closes the overlaid utterance. There is no blank between the overlaid production and the closing bracket.
    In case of several overlays the following order of events has to be obeyed:
    - non-articulatory noises,
    - articulatory noises,
    - lexical units.

    Symbols allowed before <: :

    (
    blank

    Symbols allowed before :> :

    /+
    /-
    -
    <.. >
    <#.. >
    <h"as>
    <A>
    <Z>
    <%>
    <;T>
    capital letter
    small letter
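
The overlay notation can be taken apart into the overlaying noises and the overlaid production, and the prescribed order of the noises can be checked. This is a sketch under assumptions: overlays are not nested, the overlay and the overlaid production are on one line, and the function names are hypothetical.

```python
import re

# "<:", one or more noise tags each followed by a blank, the overlaid
# production, then ":>" without a preceding blank.
OVERLAY = re.compile(r'<:((?:<[^<>]*>\s+)+)(.*?):>')

def parse_overlays(text):
    """Return (noises, overlaid_production) for every overlay in the text."""
    result = []
    for match in OVERLAY.finditer(text):
        noises = re.findall(r'<[^<>]*>', match.group(1))
        result.append((noises, match.group(2)))
    return result

def order_ok(noises):
    """True if all non-articulatory noises (<#...>) precede the articulatory ones."""
    seen_articulatory = False
    for noise in noises:
        if noise.startswith('<#'):
            if seen_articulatory:
                return False
        else:
            seen_articulatory = True
    return True
```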


    Overlapping speech (only for dialogues without button pushing):

    Example:

    PEG004: wollen wir mal die Termine ausmachen , 
            (f"ur die n"achsten@)  Monate ?
    CLK005: (@ja gerne).. gerne.
    CLK087: nee , nicht den elften , den<Z> 
            (f"unfzehnten <A>@)..
    PEG088: (@f"unfzehnten) und den 
            (achtundzwanzigsten@)..
    CLK089: (@genau <A>)..
    HAH008: nein , da mu"s ich zu einem (Besuch@) nach 
            (Leipzig@)..
    TIS009: (@m).. (@aha)..
    HAH008: nein , da mu"s ich zu einem (Besuch@) nach 
            Leipzig..
    TIS009: (@m , aha)..

    Symbols allowed before ( resp. (@ :

    blank

    Symbols allowed before ) resp. @) :

    :>
    /+
    /-
    <Ger"ausch>..
    <#>..
    <h"as>
    <A>
    <P>
    <Z>
    <%>
    <;T>
    capital letter
    small letter
    


    3. References:

    K. Kohler, G. Lex, M. Pätzold, M. Scheffers, A. Simpson, W. Thon: Handbuch zur Datenaufnahme und Transliteration in TP14 von VERBMOBIL -3.0, Verbmobil-Report, Nr. 11, September 1994.

    A. Batliner, S. Burger, A. Kießling: Außergrammatische Phänomene in der Spontansprache: Gegenstandsbereich, Beschreibung, Merkmalinventar, Verbmobil-Report, Nr. 57, February 1994.

    S. Burger, E. Kachelrieß: Aussprachevarianten in der VERBMOBIL-Transliteration - Regeln zur konsistenteren Verschriftung, Verbmobil-Memo, Nr. 111, August 1996.


    The VERBMOBIL Project August 1995


    Copyright © 1995 Institut für Phonetik und Sprachliche Kommunikation, Universität München.
    Parts of this document are Copyright © Institut für Phonetik und digitale Sprachverarbeitung der Christian-Albrechts-Universität zu Kiel.
    This page and all other pages which belonged to this server may be copied, printed and distributed to other parties, under the condition that the pages are distributed as shown here. Parts of pages or extended pages may not be distributed further without permission of the author.


    Susanne Burger burger@phonetik.uni-muenchen.de