Here we briefly describe the requirements for an annotated speech corpus to be used in MAUS training (e.g. for a new language or dialect). MAUS uses two statistical models that are language dependent: the acoustic model (AM), in the form of phonetic HMMs, and the pronunciation model (PM), in the form of statistically weighted re-write rules. Depending on the amount of annotated speech you have for the new language, you can train the AM, the PM, or both.

As a general rule, speech data for MAUS training should be annotated in a known format (e.g. praat *.TextGrid, ELAN *.eaf, BPF *.par or Transcriber) and in a known phonetic alphabet (e.g. IPA or SAMPA). If the corpus consists of long recordings (e.g. interviews or dialogues), the transcription should contain a 'chunk segmentation', i.e. the begin and end of each chunk of transcribed speech are marked on the time line.

To train the AM you will need manually segmented and labelled (in some phonetic alphabet) speech of approx. 100 speakers of both genders, preferably spontaneous speech (as in a map task recording). Second best is read speech, but each speaker should then have read different sentences, so that the covered vocabulary comprises at least 1000 words. Third best is a speech corpus without phonetic segmentation but with a phonetic transcription per recording. Fourth best is a speech corpus with just the orthographic transcript per recording (or per chunk within a recording). If none of these options is available, as a last resort you can map the phoneme symbols of the new language to existing phoneme AMs of other languages; e.g. if your new language requires a long open /a:/, you may use that of the German AM.

To train the PM you will need basically the same data as for the AM training, but additionally a hierarchical relation of each segmented/transcribed phone to a word token.
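The last-resort fallback mentioned above, mapping phoneme symbols of the new language onto existing phoneme AMs of other languages, amounts to a simple lookup table. The following is a minimal sketch; all symbols, language codes and names are illustrative assumptions in SAMPA-like notation, not an official MAUS mapping:

```python
# Hypothetical fallback table: phoneme symbols of a new language mapped to
# acoustic models (language, symbol) of languages MAUS already supports.
# Entries are illustrative only, not an official MAUS mapping.
PHONE_FALLBACK = {
    "a:": ("deu", "a:"),  # long open /a:/ borrowed from the German AM
    "E":  ("deu", "E"),
    "T":  ("eng", "T"),   # dental fricative borrowed from an English AM
}

def fallback_model(phone):
    """Return the (language, symbol) of the borrowed AM, or None if unmapped."""
    return PHONE_FALLBACK.get(phone)
```

In practice every phoneme of the new language needs such an entry; phonemes with no acoustically close counterpart in any supported language are exactly the cases where real AM training data becomes necessary.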
As an example of this phone-to-word relation: if a recording contains the words 'hello world' transcribed as the phone segments /h E l O v 2: 6 l d/, we need a link from the symbols /h E l O/ to the word token 'hello' and from /v 2: 6 l d/ to the token 'world'. Such a hierarchical linking of phones to words can be encoded either directly, in a format that allows hierarchical annotation (e.g. Emu, BPF, EAF, annotation graphs), or indirectly, by using two time-synchronized annotation layers in which the begin of a word segment exactly matches the begin of the first phone within that word, and the end of a word segment exactly matches the end of the last phone segment of that word (e.g. praat TextGrid, Transcriber).

The training and setup of a new language in MAUS is not a trivial routine task. We therefore offer to do the MAUS extension for you, if you can provide us with the necessary speech data (as outlined above). Please do not hesitate to contact us in case you need help (bas@bas.uni-muenchen.de).
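The indirect encoding via two time-synchronized tiers can be turned into explicit phone-to-word links programmatically. Below is a minimal sketch, assuming each tier is a list of (begin, end, label) tuples with times in seconds; the function name and the tolerance parameter are our own illustration, not part of any MAUS tool:

```python
# Sketch: derive phone-to-word links from two time-synchronized tiers.
# Assumes word boundaries coincide exactly (up to a small tolerance) with
# the boundaries of the first and last phone of each word.
def link_phones_to_words(word_tier, phone_tier, tol=1e-4):
    """Return a list of (word, [phone labels]) pairs."""
    links = []
    for w_begin, w_end, word in word_tier:
        phones = [label for begin, end, label in phone_tier
                  if begin >= w_begin - tol and end <= w_end + tol]
        links.append((word, phones))
    return links

# Usage with the 'hello world' example (times are made up for illustration):
words = [(0.0, 0.4, "hello"), (0.5, 1.0, "world")]
phones = [(0.0, 0.1, "h"), (0.1, 0.2, "E"), (0.2, 0.3, "l"), (0.3, 0.4, "O"),
          (0.5, 0.6, "v"), (0.6, 0.75, "2:"), (0.75, 0.85, "6"),
          (0.85, 0.95, "l"), (0.95, 1.0, "d")]
# -> [('hello', ['h', 'E', 'l', 'O']), ('world', ['v', '2:', '6', 'l', 'd'])]
```

This is also a cheap sanity check on a corpus: if a word segment ends up with no phones, or a phone falls inside no word, the two tiers are not properly synchronized and the indirect encoding will fail.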