``The SmartKom multi-modal corpus was produced in the years 1999 - 2003 at the Bavarian Archive for Speech Signals (BAS) located at the University of Munich (LMU). The corpus was 100% funded by the German Ministry for Education and Science and is therefore freely available for all kinds of usage except re-distribution to third parties.
The primary aim of the corpus was the empirical study of Human - Computer interaction (HCI) in a number of different tasks (domains) and technical setups (scenarios).''
(from the corpus documentation)
In the SmartKom data collection, subjects were recorded while using a self-explanatory, user-adaptive man-machine interface (MMI). The MMI is simulated using a Wizard-of-Oz (WOZ) setup; it interprets speech and gesture input and analyses the facial expression of the user.
The total corpus consists of a number of speech channels, four video
channels, the output of a graphics tablet or finger-point detector, and a
separate multi-modal biometric data collection.
The resulting video data and multi-channel recorded spontaneous speech data serve as a basis
for research and development of speech recognition, gesture recognition
and the user model of SmartKom.
In the following, only the speech part of the WOZ data collection is described.