The Munich Automatic Segmentation System MAUS

Contact: Florian Schiel

What is meant by SEGMENTATION?

The aim of the phonetic sciences is to analyze the correlation between linguistic categories (e.g. word, syllable, phone) and the corresponding signals (e.g. acoustic signal, spectrum, articulatory signal, neuronal signals). Usually, the categories are mapped concretely onto the corresponding sections of the signal, according to the aim of the analysis in question. This results in a partition of the signal into segments, known as segmentation (and labeling).
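To make the notion of a partition concrete, here is a minimal Python sketch of a segmentation as a list of labeled time intervals. The names Segment and is_partition, the label symbols and all time values are made up for illustration; they are not taken from MAUS or its file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One labeled stretch of the signal (hypothetical structure, not a MAUS format)."""
    label: str    # category label, e.g. a phone symbol
    start: float  # start time in seconds
    end: float    # end time in seconds


def is_partition(segments, signal_start, signal_end, tol=1e-9):
    """Return True if the segments cover the signal without gaps or overlaps."""
    if not segments:
        return False
    ordered = sorted(segments, key=lambda s: s.start)
    # The first segment must begin where the signal begins,
    # and the last segment must end where the signal ends.
    if abs(ordered[0].start - signal_start) > tol:
        return False
    if abs(ordered[-1].end - signal_end) > tol:
        return False
    # Every segment must have a positive duration.
    if any(seg.end <= seg.start for seg in ordered):
        return False
    # Each segment must start exactly where its predecessor ends.
    for prev, nxt in zip(ordered, ordered[1:]):
        if abs(prev.end - nxt.start) > tol:
            return False
    return True


# Made-up example: four phone segments covering 0.45 s of signal.
segmentation = [
    Segment("h", 0.00, 0.08),
    Segment("a", 0.08, 0.21),
    Segment("l", 0.21, 0.27),
    Segment("o", 0.27, 0.45),
]
print(is_partition(segmentation, 0.0, 0.45))
```

Removing one of the inner segments leaves a gap, so the check fails for the remaining list. This is the property that distinguishes a segmentation from an arbitrary collection of labeled intervals.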

Because such an analysis is subjective (it depends on the observer and on the thesis being investigated, so that different analyses require different descriptions), segmentations are produced manually according to the aspects relevant to the analysis. These data are carefully produced and of the highest possible reliability, which is an absolute prerequisite for experimental phonetic work.

Good training and experience are needed to produce careful and reliable manual segmentations, and the work is extremely time-consuming (real-time factor up to 400, i.e. up to 400 minutes of work for one minute of speech). Therefore, high-quality segmentations can usually only be produced for a small amount of data.

In computational linguistics and digital speech processing, especially in automatic speech recognition (ASR), a large amount of segmented data is needed. Producing manual segmentations for this purpose is uneconomical. Therefore, automatic procedures have been developed to segment large amounts of data in a relatively short time. This is only possible at the cost of reduced segmentation quality, which can be traced back on the one hand to the imprecise analysis of the acoustic signal and on the other hand to a missing hypothesis space of the possible pronunciations of a language.

With MAUS, large amounts of segmented material can be offered for research and development in the area of technical speech processing, while taking into account phonetic information about pronunciation variation. Conversely, successes in the field of speech processing have led to significant improvements in automatic segmentation.

Short Description of MAUS

Input: speech signal and related orthographic representation

Output: automatically produced segmentation and labeling on the phonemic level

Video: Introduction to MAUS

Technical implementation:

Processing steps:

Download of MAUS

The MAUS download package comprises a number of scripts and binaries to be run under Linux. It can also be run under Windows using Cygwin (not contained in the package).
The main scripts of the package are: and other tools to convert and display S&Ls.
For legal reasons, the software package only contains parameter files for German. To process languages other than German, use the webservices or the MAUS Web API (see below).

Other software packages needed:


MAUS Webservices

Instead of installing the MAUS package locally on your computer, you can use the MAUS webservices. The input files are uploaded to the BAS CLARIN server and processed by MAUS, and the result is returned to your local computer.
Advantages are:

A full description of the different webservice calls and their parameters can be found in the corresponding CMDI file. Examples of webservice calls can be retrieved with the following curl command:

curl -X GET

A very easy way to use the MAUS webservice is the CSH wrapper maus.web, which simulates the original MAUS script maus but internally calls the webservice. No local installation is required; you just need a CSH on your computer.

WebMAUS -- a comfortable web-interface

An even easier alternative to the webservices is the new web interface WebMAUS:

This web application is structured in three parts:

Publications on MAUS

Conference papers:

Verbmobil Memos (German):

Dissertations (German):