Revalidation report for the SmartKom Database

Authors

Florian Schiel, Katerina Louka

Affiliation  

BAS Bayerisches Archiv für Sprachsignale
Institut für Phonetik
Universität München

Postal address

Schellingstr. 3
D 80799 München

E-mail

schiel@phonetik.uni-muenchen.de
bas@phonetik.uni-muenchen.de

Telephone

+49-89-2180-2758

Fax

+49-89-2800362

Corpus Version

1.0

Date

06.12.2004

Status

final

Comment

 

Validation Guidelines

Florian Schiel: The Validation of Speech Corpora, Bastard Verlag, 2003, www.bas.uni-muenchen.de/Forschung/BITS/TP2/Cookbook 

Validation results of the SmartKom Corpus:

Summary

The SmartKom multi-modal corpus was produced in the years 1999 - 2003 at the Bavarian Archive for Speech Signals (BAS) located at the
University of Munich (LMU). The corpus was 100% funded by the German Ministry for Education and Science and is therefore freely available
for all kinds of usage except re-distribution to third parties.

The primary aim of the corpus was the empirical study of human - computer interaction (HCI) in a number of different tasks (domains) and technical
setups (scenarios). See the file "/doc/papers/LREC2002-Overview.ps" for a detailed description of the corpus production.

Introduction and Corpus Description

This document summarizes the results of an in-house validation of the SmartKom speech corpus, carried out in 2004 within the 'BITS' project by the Institute of Phonetics of the Ludwig-Maximilians-University Munich.

The corpus contains 466 sessions. Each session consists of one recording of
approx. 4.5 min length with one person. Sessions are stored on numbered DVDs.

The primary aim of the corpus was the empirical study of human - computer
interaction (HCI) in a number of different tasks (domains) and technical setups (scenarios).

The BAS edition of the SMARTKOM multi-modal data collection contains the same data as the original project edition, newly
structured and validated against basic BAS guidelines.

I.) Validation of Documentation

The General Documentation directory contains the following documentation files for the SmartKom corpus which can be found under: doc/

README               General documentation
DTD/                 Document type definitions for recording protocols and speaker profiles
german-sampa.txt     Definition of extended German SAM-PA as used in most German speech resources
pardoc/              Copy of the BAS Partitur File definition (HTML: start with "index.html")
quicktime/           QuickTime installation archive for Macintosh and Windows systems
readme.ges           Format description of the 2D gesture labelling files *.ges
readme.mar           Format description of the turn segment files *.mar
readme.par           Format description of the BAS Partitur Format (BPF) files *.par
readme.trl           Format description of the transliteration files *.trl
readme.trp           Format description of the prosodic segmentation files *.trp
readme.ush           Format description of the user state label files *.ush
readme.usm           Format description of the user state label files *.usm
readme.woz.german    Subject instructions in German
session-statistics   Listing of all available channels and annotations as well as some recording and speaker features
sk_ger.lex           Pronunciation dictionary (SAM-PA) for all SK data
techdocs/            Project reports in German
trl-coding/          A copy of the English version of the transliteration conventions in SmartKom
webpages.zip         Web pages of the user interface

·         Administrative Information:

Validating person: n. a.

Date of validation: n. a.

Contact for requests regarding the corpus: ok

Number and type of media: DVD ok

Content of each medium: no information

Copyright statement and intellectual property rights (IPR): ok

·         Technical information:

Layout of media: Information about file system type and directory structure:
DVD
DVD nomenclature: dvd-<DVD number><DVD version number>

The root directory of each DVD contains the following:

   
readme.##.V   specific readme for each DVD
data          signal files of the sessions on the DVD
doc           documents about the corpus recording, annotation and pronunciation dictionary
annot         subdirectories for each annotation type
meta          speaker profiles and recording protocols for all SK recordings


File nomenclature
Explanation of the codes used (no white space in file names!):
<Type of Recording><Session Number><_><Technical scenario><Primary task><Recording Channel><_><Turn numbering><_><Speaker ID>.<extension> ok

Type of Recording:
b : biometric data
w : Wizard-of-Oz   
d : demo session
p : test session
v : evaluation session

Technical scenario:
p : Public
m : Mobile
h : Home

Primary task:
k : cinema
t : touristic planning
f : TV guide
r : restaurant
n : navigation
v : VCR programming
m : music jukebox
a : phone
x : fax

Recording channel:
a   : clip-on microphone, channel 1 Sennheiser ME104
b   : clip-on microphone, channel 2 Sennheiser ME104
h   : headset microphone Sennheiser ME104
1-4 : microphone array 4 channels Sennheiser ME104
d   : directional microphone Sennheiser ME 66
w   : system output
p   : playback background noise front
q   : playback background noise back
t   : tableau coordinates
s   : SIVIT coordinates
i   : infrared video of interaction area
m   : front capture camera
l   : left lateral capture
o   : system display capture
g   : synchronized video streams


Extensions:
.ags  BPF represented as an annotation graph (XML)
.avi  video file AVI (channels g,o)
.ges  gesture labeling file
.mov  video file DV (channels i,l,m)
.par  BAS Partitur Format file (BPF)
.qt   QuickTime file (master frame file)
.rpr  recording session protocol
.spr  speaker protocol file
.trl  transliteration
.trp  user state labeling file ('prosody')
.ush  user state labeling file ('holistic')
.usm  user state labeling file ('mimic')
.wav  RIFF audio file (channels 1,2,3,4,a,b,d,h,p,q,w)
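
The code tables above can be applied mechanically. The following is a minimal Python sketch, not part of the corpus tools, that decodes only the leading part of a name (type of recording, session number, technical scenario, primary task), as it appears in session identifiers such as "w167_mt". The function name decode_session_id is hypothetical, and the three-digit session number is an assumption based on the identifiers quoted in this report.

```python
import re

# Code tables taken from the nomenclature lists in this report.
RECORDING_TYPES = {
    "b": "biometric data",
    "w": "Wizard-of-Oz",
    "d": "demo session",
    "p": "test session",
    "v": "evaluation session",
}
SCENARIOS = {"p": "Public", "m": "Mobile", "h": "Home"}
TASKS = {
    "k": "cinema",
    "t": "touristic planning",
    "f": "TV guide",
    "r": "restaurant",
    "n": "navigation",
    "v": "VCR programming",
    "m": "music jukebox",
    "a": "phone",
    "x": "fax",
}

# Assumption: session numbers have exactly three digits, as in "w167_mt".
SESSION_ID = re.compile(r"^([bwdpv])(\d{3})_([pmh])([ktfrnvmax])$")


def decode_session_id(session_id):
    """Split a session identifier into its documented code fields."""
    match = SESSION_ID.match(session_id)
    if match is None:
        raise ValueError(f"not a valid session identifier: {session_id!r}")
    rec_type, number, scenario, task = match.groups()
    return {
        "recording_type": RECORDING_TYPES[rec_type],
        "session_number": int(number),
        "scenario": SCENARIOS[scenario],
        "task": TASKS[task],
    }


if __name__ == "__main__":
    print(decode_session_id("w167_mt"))
```

Recording channel, turn numbering, speaker ID and extension are left out on purpose, because their exact field widths are not specified here.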


Formats of signals and annotation files:
If non standard formats are used it is common to give a full description or to convert into a standard format: ok

- RIFF audio file
- Video file AVI
- Video file DV
- QuickTime file

Coding:  .wav, .avi, .mov, .qt

Compression: n. a.

Sampling rate: 16 kHz ok

Valid bits per sample (values other than 8, 16, 24 should be reported): ALAW coding: bits/samp, PCM coding, 16 bit ok

Used bytes per sample: 2 bytes/samp ok

Multiplexed signals: (exact de-multiplexing algorithm; tools) n.a.
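
The stated signal parameters (16 kHz sampling rate, 2 bytes per sample) can be checked against the header of each RIFF audio file. The following is a minimal sketch using the Python standard library module wave. The function name check_wav_header is hypothetical, and the sketch is not the procedure used for this validation.

```python
import wave

# Values taken from the technical information in this report.
EXPECTED_RATE_HZ = 16000
EXPECTED_BYTES_PER_SAMPLE = 2


def check_wav_header(path):
    """Return a list of deviations from the documented format (empty if none)."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse as PCM WAVE.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_BYTES_PER_SAMPLE:
        problems.append(
            f"sample width is {width} bytes, expected {EXPECTED_BYTES_PER_SAMPLE} bytes"
        )
    return problems
```

An empty result means the header agrees with the documentation; it says nothing about the quality of the signal itself.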

·         Database contents:

Clearly stated purpose of the recordings:
Empirical Study of Human-Computer interaction (README.doc, /doc/papers/LREC2002-Overview.ps)

Speech type(s): (multi-party conversations, human-human dialogues, read sentences, connected and/or isolated digits, isolated words etc.) ok

Instruction to speakers in full copy: ok (more information under /doc/techdocs/)

·         Linguistic contents of prompted speech:

Specifications of the individual text items:  n.a.

Specification for the prompt sheet design or specification of the design of the speech prompts:  n.a.

Example prompt sheet or example sound file from the speech prompting: n.a.

·         Linguistic contents of non-prompted speech:

Multi-party:(number of speakers, topics discussed, type of setting - formal/informal) ok

Human-human dialogues: (type of dialogues, e.g. problem solving, information seeking, chat etc., relation between speakers, topic(s) discussed, type of setting, scenarios)  n.a.

Human-machine dialogues: (domain(s), topic(s), dialogues strategy followed by the machine, e.g. system driven, mixed initiative, type of system, e.g. test, operational service, Wizard-of-Oz) ok (README.DOC)

·         Speaker information:

Speaker recruitment strategies:  ok (more information under /doc/papers/ and /doc/techdocs/)

Number of speakers: 461 ok

Distribution of speakers over sex, age, dialect regions: ok (more information under: /doc/dtd/readme.spr)
Description/definition of dialect regions: ok (more information under: /doc/dtd/readme.spr)

·         Recording platform and recording conditions:

Recording platform: ok

Position and type of microphones: ok
- Company name and type id: Sennheiser ME104, Sennheiser ME 66
- Electret, dynamic, condenser: no information
- Directional properties:  ok (readme.doc, /doc/techdocs/TechDok-NR-07.ps)
- Mounting:  ok (readme.doc, /doc/techdocs/TechDok-NR-07.ps)

Position of speakers: (distance to microphone) ok (readme.doc, /doc/techdocs/TechDok-NR-07.ps)

Bandwidth: (if other than zero to half of sampling rate) ok

Number of channels and channel separation:  ok  (readme.doc)

Acoustical environment:  ok (more information under /doc/techdocs/)

 

·         Annotation (BAS Partitur Format Files):

Unambiguous spelling standard used in annotations: ok

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.): given

Distinction of homographs which are no homophones: n.a.

Character set used in annotations: ok

Any other language dependent information as abbreviations etc: given

Annotation manual, guidelines, instructions: ok – (readme.par, doc/pardoc/index.html, doc/papers/Schiel-02-LREC-WS.ps)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information

Annotation (Orthographic transliteration):

Unambiguous spelling standard used in annotations: ok

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.):  ok

Distinction of homographs which are no homophones: n.a.

Character set used in annotations: ok

Any other language dependent information as abbreviations etc: given

Annotation manual, guidelines, instructions: ok – (readme.trl, /doc/techdocs/TechDok-NR-02.ps, /doc/papers/Beringer-01-verona.ps, /doc/papers/Oppermann-01-EUROSPEECH.ps, /doc/papers/Siepmann-01-ISCA.pdf)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information


Annotation (Annotation 2D Gesture):

Unambiguous spelling standard used in annotations: n.a.

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.):  n.a.

Distinction of homographs which are no homophones: n.a.

Character set used in annotations:  n.a.

Any other language dependent information as abbreviations etc:  n.a.

Annotation manual, guidelines, instructions: ok – (readme.ges, doc/techdocs/TechDok-NR-14.ps, doc/papers/Steininger-London-01.ps, doc/papers/Steininger-Verona-01.pdf)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information


Annotation (Prosodic labeling for User State):

Unambiguous spelling standard used in annotations: n.a.

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.):  ok

Distinction of homographs which are no homophones: n.a.

Character set used in annotations:  ok

Any other language dependent information as abbreviations etc:  n.a.

Annotation manual, guidelines, instructions: ok – (readme.trp, readme.trl, doc/techdocs/TechDok-NR-17.ps, doc/papers/Steininger-02-LREC.pdf)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information


Annotation (User State - interesting emotional and cognitive state):

Unambiguous spelling standard used in annotations: n.a.

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.):  n.a.

Distinction of homographs which are no homophones: n.a.

Character set used in annotations:  ok

Any other language dependent information as abbreviations etc:  n.a.

Annotation manual, guidelines, instructions: ok – (readme.ush, doc/papers/Steininger-02-LREC.pdf, doc/papers/Steininger-02-LREC-WS.pdf)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information


Annotation (User States labeled without the audio information):

Unambiguous spelling standard used in annotations: n.a.

Labeling symbols: ok

List of non-standard spellings (dialectal variation, names etc.):  n.a.

Distinction of homographs which are no homophones: n.a.

Character set used in annotations:  n.a.

Any other language dependent information as abbreviations etc:  n.a.

Annotation manual, guidelines, instructions: ok – (readme.usm, doc/techdocs/TechDok-NR-17.ps, doc/papers/Steininger-02-LREC.pdf, doc/papers/Steininger-02-LREC-WS.pdf)

Description of quality assurance procedures: no information

Selection of annotators: no information

Training of annotators: no information

Annotation tools used: no information

·         Lexicon:

Format: ok

Text-to-phoneme procedure: ok

Explanation or reference to the phoneme set: ok. (/doc/german-sampa.txt)

Phonological or higher order phenomena accounted in the phonemic transcriptions: ok

·         Statistical information:

Frequency of sub-word units: phonemes (diphones, triphones, syllables,...): n.a.

Word frequency table: n.a.

·         Others:

Any other essential language-dependent information or convention: given.

Indication of how many files were double-checked by the producer together with percentage of detected errors: no information

          Status of documentation:  good

II.) Automatic validation

The following list contains all validation steps with the methodology and results.

Completeness of signal files:   Refer to "/doc/session-statistics"

Completeness of meta data files:
ok

Completeness of annotation files: not ok.

d005_pk
d006_pk
d007_pk
w167_mt
w382_px
w149_mn
w271_hf
w382_px
p024_pk
w029_pk
w068_pk
w109_mt
w110_mt
w111_mt
w112_mt
w149_mn
w150_mt
w167_mt
w196_hf
w212_hv
w271_hf
w272_hv
w298_hv
w306_hv
w321_hm
w376_px
w382_px
w411_pa
p024_pk
w029_pk
w068_pk
w109_mt
w110_mt
w111_mt
w112_mt
w149_mn
w150_mt
w167_mt
w196_hf
w212_hv
w271_hf
w272_hv
w298_hv
w306_hv
w321_hm
w376_px
w382_px
w411_pa
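
A completeness check of this kind can be sketched as follows. This is a minimal Python illustration, not the script used for this validation. It assumes that each annotation file is named "<session identifier>.<extension>" (for example "w167_mt.trl"), which this report does not state explicitly, and the function name missing_annotations is hypothetical.

```python
def missing_annotations(session_ids, present_files, extensions):
    """Map each annotation extension to the sessions that lack that file.

    session_ids   -- iterable of session identifiers, e.g. "w167_mt"
    present_files -- iterable of file names found in the annotation directories
    extensions    -- iterable of annotation extensions without dot, e.g. "trl"
    """
    present = set(present_files)
    sessions = sorted(set(session_ids))
    missing = {}
    for ext in extensions:
        # Assumption: annotation files are named <session id>.<extension>.
        absent = [s for s in sessions if f"{s}.{ext}" not in present]
        if absent:
            missing[ext] = absent
    return missing


if __name__ == "__main__":
    report = missing_annotations(
        ["d005_pk", "w167_mt"],
        ["d005_pk.trl", "w167_mt.trl", "d005_pk.ges"],
        ["trl", "ges"],
    )
    print(report)
```

Reporting the result per extension keeps track of which annotation type is missing for which session.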

Correctness of file names: ok.

Empty files: none

Status of signal, annotation and meta data files: ok

Cross checks of meta information: ok

Cross checks of summary listings: ok

Annotation and lexicon contents: The SAM-PA forms in the annotation differ from the SAM-PA forms in the lexicon, or the SAM-PA is incorrectly annotated:
h"Und6t#Qaxt#Unt#z'i:ptsIC
v"e:g@#Unt#pl'Ets@#t"u:6
fi:6h'OxtsaIt@nUntaIn
S'OpIN#z"Ent6
rEstor'a:~#kategor"i:@n
Q'aIn#Unt#tsv"antsIC
axa
f'i:r#Unt#tsv"antsICst6
rEstor'a:~s
f'Ynf#Unt#tsv"antsICst@n
QIt'a:li@n
z'Eks#Unt#f"I6tsIC
das#fr'a:g@#Unt#Q'antv"O6t#Sp"i:l
St'at#InfO6matsj"o:n
b"ESt#Of#n'aInti:s
dr'aI#Unt#tsv"antsIC
n'OYn#Unt#n"OYntsIC
gro:sbrit'ani@n
tur'Ist@n#InfO6matsj"o:n
f'a:st#fu:t#restor"a:~
f'i:r#Unt#tsv"antsIC
SIfsQaUsflU
Q'aIn#Unt#tsv"antsICst6
gU6m'e:#rEstor"a:~
kant'a:t@#be:ve:f"aU#Q'aIn#hUnd6t#z"i:b@n#Unt#f"I6tsIC#kor'a:l
b"u:t@n#Unt#b'In@n#n'a:xrICt@n
S'Ifs#aUsfl"y:g@
S'Ifs#aUsfl"y:g@n
Q'aIn#Unt#n"OYntsIC
rEStor'a:~#kategor"i:
z'a:lomOn#Unt#di:#k"2:nIgIn#fOn#z'a:ba
rEstor'a:~m"E:sIC
tsv'aI#Unt#f"I6tsIC
d'e:Ii#z"o:p
f'Ynf#Unt#dr"aIsIC
rEStor'a:~#Qy:b6z"ICt
das#f"Ynft@#elem'Ent
kant'a:t@#be:ve:f"aU#Q'aIn#hUnd6t#z"i:b@n#Unt#f"I6tsIC
Q'axt#Unt#n"OYntsIC
f'i:r#Unt#dr"aIsIC
Q'axt#Unt#tsv"antsICst@n
rEstor'a:~#b@z"u:xs
ha:#Unt#'QEm
f'asnaxt#am#n"Eka:r
Q'axt#Unt#Q"axtsIC
QEm"e:#Unt#j'a:gua:r
rEstor'a:~#b@z"u:x@s
h"Und6t#zEks#Unt#f'I6tsIC
rEstor'a:~
t'It@l
rEStor'a:~#b@z"u:x
Q'aIn#Unt#z"i:ptsIC
fi:6#h'OxtsaIt@n#Unt#aIn#t'o:d@s#f"al
das#l"e:b@n#Ist#S'2:n
n"EkarSt'aIna:x
p'ark#anl"a:g@
S'Ifs#aUsfl"u:k
z'Ende#t"It@l
tsv'aI#Unt#tsv"antsIC
Q'aIn#Unt#dr"aIsICst6
f'Ynf#Unt#f"I6tsIC
draI#k'2:nICs#Str"a:s@
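
A cross-check of this kind can be sketched as a simple membership test. The following minimal Python sketch assumes that the SAM-PA forms have already been extracted as plain strings from the annotation files and from sk_ger.lex; reading those files is not shown, because their formats are described in the corpus documentation and not in this report. The function name and the example data are hypothetical.

```python
def forms_missing_from_lexicon(annotation_forms, lexicon_forms):
    """Return annotation forms absent from the lexicon, in order of first occurrence."""
    lexicon = set(lexicon_forms)
    seen = set()
    missing = []
    for form in annotation_forms:
        if form not in lexicon and form not in seen:
            seen.add(form)
            missing.append(form)
    return missing


if __name__ == "__main__":
    # Hypothetical data: which forms the real lexicon contains is not known here.
    annotation = ["t'It@l", "axa", "rEstor'a:~", "axa"]
    lexicon = ["t'It@l", "rEstor'a:~"]
    print(forms_missing_from_lexicon(annotation, lexicon))
```

The comparison is an exact string match, so forms that differ only in stress marks or word boundary symbols are reported as well.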

III.) Manual Validation

For 10% of the 'usable' data, the audio files were checked against the corresponding par files. 6.52% of the checked data contained errors.
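
The sampling and the error rate can be illustrated with a short Python sketch. It is not the procedure used here: the report gives only percentages, so the counts in the example are hypothetical, and the function names are assumptions.

```python
import random


def draw_sample(items, fraction=0.10, seed=0):
    """Draw a reproducible random sample covering the given fraction of items."""
    ordered = sorted(items)
    size = max(1, round(len(ordered) * fraction))
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn
    return sorted(rng.sample(ordered, size))


def error_rate_percent(checked, erroneous):
    """Percentage of checked items that contained errors, rounded to two decimals."""
    if checked <= 0:
        raise ValueError("at least one item must be checked")
    return round(100.0 * erroneous / checked, 2)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(error_rate_percent(46, 3))
```

With these hypothetical counts (3 erroneous items among 46 checked) the function returns 6.52; the actual counts behind the reported figure are not stated.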

IV.) Other Relevant Observations

none

V.) Comments for Improvement

The revalidation repaired some data (lexicon, README). The errors found in the manual validation could not be repaired.

VI.) Result

The corpus is ok. The corpus is well documented and the error rate is very low.