                     _/_/_/_/         _/_/         _/_/_/_/
                    _/      _/       _/ _/        _/      _/
                   _/      _/       _/  _/       _/
                  _/      _/       _/   _/       _/
                 _/_/_/_/         _/_/_/_/        _/_/_/
                _/      _/       _/     _/             _/
               _/      _/       _/      _/             _/
              _/      _/       _/       _/    _/      _/
             _/_/_/_/         _/        _/     _/_/_/_/


                   BAVARIAN ARCHIVE FOR SPEECH SIGNALS 

               University of Munich, Institute of Phonetics
               Schellingstr. 3/II, 80799 Munich, Germany
                      bas@phonetik.uni-muenchen.de


  COPYRIGHT Florian Schiel, University of Munich 1998. All rights reserved.   
    This corpus and software may not be disseminated further - not even
      partly - without a written permission of the copyright holders.  

                      Additional Copyright Holders

----------------------------------------------------------------------

Munich AUtomatic Segmentation (MAUS)

BAS Distribution Package   MAUS 

19.08.03 / 21.04.17

----------------------------------------------------------------------

GENERAL REMARKS

The script maus reads a string of phonemic symbols as defined in the param
file KANINVENTAR, reads a signal from the file signal.nis and 
performs a MAUS segmentation according to these inputs.
The resulting segmentation is either written into a BPF MAU tier file
or into a Praat compatible TextGrid file or an Emu compatible structure.

SYNOPSIS and OPTIONS

Basic maus script:
Please refer to the initial comments block in the maus script. Simply call
'maus | less' to read them.

To process a complete corpus use the script maus.corpus. Simply call
'maus.corpus | less' to read the usage message.

To adapt the HMM to a speech corpus (e.g. the speech of one speaker or
a group of speakers or a new language) use the script maus.iter.


CONSTRAINTS

There are a number of constraints on how to use this script; please
read the following carefully:

- never run several maus.corpus processes in parallel on the same speech 
data set, even if you define different output directories. maus handles
different processes gracefully, but maus.corpus and maus.iter do not.
- always check the output produced by the scripts for the key word
'ERROR'. If it occurs, the results are usually not correct. A good way 
to use maus is to redirect stdout and stderr into a log file.
The key word 'WARNING' indicates that a maus script has found a situation 
where things may go wrong, or that it wants to alert the user to a default 
mechanism. In most cases this does not cause the output to be formally
incorrect, but the segmentation may not be the one you intended.
- parameter sets for languages other than German provided in this package
are often produced by users of maus and sent in for distribution. In some 
cases these parameter sets use existing acoustical models and map them 
to a phoneme set in a different language. For many European languages this 
works surprisingly well (although we don't have any hard data about this).
Also, in some cases the non-German parameter sets contain the German-based
statistical and phonological rule sets (see remarks in the READMEs within 
the parameter sets). Since these rules usually do not fit other 
languages, it is recommended to use the option MODUS=align to ignore
these rule sets and perform a simple forced alignment.
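
The log check recommended above can be sketched as a small shell function.
This is only an illustration: the function name scan_maus_log and the log
file name are assumptions and not part of the MAUS package; the only facts
taken from this README are the key words 'ERROR' and 'WARNING'.

```shell
# Hypothetical helper (not part of MAUS): summarize a log written by maus,
# maus.corpus or maus.iter. Prints the number of lines containing ERROR and
# WARNING; returns non-zero if at least one ERROR line was found.
scan_maus_log() {
  log=$1
  [ -r "$log" ] || return 2
  # grep -c prints 0 but exits with status 1 when nothing matches, so the
  # exit status is discarded and only the printed count is used.
  errors=$(grep -c 'ERROR' "$log" || true)
  warnings=$(grep -c 'WARNING' "$log" || true)
  echo "errors=$errors warnings=$warnings"
  [ "$errors" -eq 0 ]
}

# Example (sh syntax for the redirection):
#   ./maus v=1 SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
#      BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par > maus.log 2>&1
#   scan_maus_log maus.log || echo "please inspect maus.log"
```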

PHONEME SYMBOLS IN INPUT

The string of phonemic input symbols must not contain any symbols other than
those defined in PARAM.<lng>/KANINVENTAR. You may alter KANINVENTAR, but then
you also have to take care of a number of other resource files that depend on 
KANINVENTAR (not recommended).
The symbol '#' may be used between words to indicate optional pauses
between the words, but only in KANSTR. This is highly recommended. When
reading from a BPF file (option BPF) these optional pauses are inserted
automatically.

Human noise can be modelled by the symbol '<usb>' anywhere in the input string; 
'<nib>' can be used for other noise.


SILENCE INTERVALS

Automatic modelling:
Maus will automatically insert optional silence models (HMM '#') between words
(see option MINPAUSLEN) and output these as 'detached' silence 
segments '<p:>' (with word number -1) if they exceed MINPAUSLEN times 10msec.
The same is true for utterance initial/final silence ('<' and '>'), which used
to be non-optional HMMs (before maus 3.33); the option
NOINITIALFINALSILENCE=true suppresses even these if a user wants to be sure
that no silence interval is recognized at the beginning of a recording
(e.g. to avoid confusions with initial plosives).

Manual modelling:
Intra-word silence intervals can be modelled by inserting the symbols
'<p:>' (optional silence) or '<p>' (enforced silence, minimum length is 30msec) 
in the canonical input string ('#' in the phonological input will be ignored
because in some phonological forms it marks a compound boundary! This is not the 
case for option KANSTR, though!); e.g. /ba:n<p:>hof/ will model an optional
silence interval between /n/ and /h/; in the MAUS output these models appear
as '<p:>' segments (or do not appear at all, IPA [(...)]). Intra-word silence 
intervals are always linked to the word number in which they appear.
If an optional '<p:>' is the only symbol within a word, it will be modelled 
by a non-optional silence model (HMM '<') because HTK cannot model words
that consist only of a tee-model; it will appear as a single segment '<p:>' linked
to that 'silence word'. 
It is allowed to model a 'silence word' as /<p>/ or /<...>/ (where '...' is an arbitrary 
string without blanks, but not one of 'usb' or 'nib') in the KAN
input tier; both will model a non-optional silence model and both will produce 
a '<p:>' in the phonetic output (IPA [(...)]) that has a word link, and the 'word' appears 
as a numbered word in the ORT/KAN tiers (see TAGS PASSING below).

To summarize: 
('#' symbolizes word boundaries here,
 '<' '>' utterance begin/end)

KAN input       MODEL            ORT/KAN OUTPUT  MAU OUTPUT
#<nib>#         non-human noise  '<nib>'         segment /<nib>/ (IPA [(.)]) with word number
#<usb>#         human noise      '<usb>'         segment /<usb>/ (IPA [(..)]) with word number
#<...>#         silence word     '<...>'         segment /<p:>/ with word number
#<p>#           silence word     '<p>'           segment /<p:>/ with word number
#...<nib>...#   non-human noise  '...<nib>...'   segment /...<nib>.../ (IPA [(.)]) with word number
#...<usb>...#   human noise      '...<usb>...'   segment /...<usb>.../ (IPA [(..)]) with word number
#...<p>...#     non-optional sil '...<p>...'     segment /...<p:>.../ (IPA [(...)]) with word number
#...<p:>...#    optional sil     '...<p:>...'    segment /...<p:>.../ (IPA [(...)]) with word number or deleted
#               (word boundary)  -               segment /<p:>/ (IPA [(...)]) with word number -1 or deleted
<               (initial sil)    -               segment /<p:>/ (IPA [(...)]) with word number -1
>               (final sil)      -               segment /<p:>/ (IPA [(...)]) with word number -1
(the last three lines are not possible inputs, but are modelled automatically!)


TAGS PASSING from KAN tier to MAUS OUTPUT

The use of a '<...>' as a word in the phonological input (see preceding paragraph) 
can be used to pass 'tags' from the transcript to the output of MAUS, because 
such 'words' will appear in the MAUS output ORT/KAN levels. The drawback is that 
a small silence interval (30msec minimum) must be modelled for this 'tag word' in the phonetic level.

ADAPT MAUS TO OTHER LANGUAGES

To adapt MAUS to another language, several parameter files and 
programs must be adapted: The set of phonemic symbols used in the input,
the MAUS internal symbol set, the mapping functions between them, the 
Hidden Markov Models used for the search, the mapping from MAUS internal
symbols to HMM and the rule set.

If a new language set PARAM.<lng> is to be defined (<lng> being a language
tag as in RFC 5646), do the following:
           - copy the standard German set dir PARAM to PARAM.<lng>
           - within the new set dir adapt the following files:
             KANINVENTAR : define here the set of phonemes used 
             in the canonical input and MAUS output.
             KANINVENTAR must be sorted by descending string length!
             GRAPHINVENTAR : this is usually just a copy of KANINVENTAR
             with all symbols starting with a number replaced by a masked
             symbol string and the extra symbol '#' used for internal word 
             boundary modelling (a '#' in KANINVENTAR does not hurt, but will
             be ignored in the KAN input). Typical examples are:
             r\ -> r-
             6 -> P6
             9 -> P9
             3 -> P3
             2: -> P2:   etc.
           - store the acoustical models for the new language in MMF.mmf; this can be 
             either new HMMs trained on a segmented speech corpus (if available) or
             a set of standard HMMs (e.g. the SUPERHMM set in subdir HMM).
           - store the list of HMM names (~h "..." entries in MMF.mmf) in the file
             HMMINVENTAR (see example); note that the HMM names need not match the 
             phoneme names in GRAPHINVENTAR.
           - define a mapping of the phoneme names (1st column) to the HMM names 
             (2nd column) in the file DICT; be sure to use the phoneme names as listed 
             in GRAPHINVENTAR, not as in KANINVENTAR. For example the entry
             T s
             will cause MAUS to acoustically model an English voiceless 'th' (/T/) like 
             a /s/.
	   - If you can provide a (non-statistical) pronunciation rule set for 
             the new language, put it into the file <name>.nrul (see an example in 
             regeln9.nrul). 
             Then call maus with the option RULESET=.../<name>.nrul
             The synopsis for a phonological rule (one line of the RULESET) is:
             (leftcontext)-(pattern)-(rightcontext)>(leftcontext)-(replacement)-(rightcontext)
             all (...) can be comma-separated strings of SAM-PA symbols (including the 
             utterance-initial symbol /</ and the utterance-medial word 
             boundary /#/, but NOT the utterance-end symbol />/!) or even the empty 
             string (meaning all contexts), e.g. 
             P2:-C-s,t>P2:-k-s,t
             -> a /C/ in context 2: ... st can be replaced by a /k/
             -a-#>-A-#
             -> all word final /a/ can be uttered as /A/
             See more examples in PARAM/regeln9.nrul.
           - If you can provide a statistical pronunciation rule set for
             the new language, put it into the file <name>.rul (see an example in
             rml-0.95.rul).
             Then call maus with the option RULESET=.../<name>.rul
             The synopsis for a statistical rule (one line of the RULESET) is:
             (leftcontext),(pattern),(rightcontext)>(leftcontext),(replacement),(rightcontext) ln(P(r|l,p,r)) 0.000000
             (leftcontext)/(rightcontext) must be single phoneme symbols that match on both sides
             of the rule (including the utterance-initial symbol /</ and the word boundary /#/, but 
             NOT the utterance-end symbol />/!); the pattern/replacement can be comma-separated 
             strings of SAM-PA symbols (including the utterance-medial word boundary /#/, but 
             NOT the utterance-initial/end symbols /<>/!) or the empty string;
             ln(P(r|l,p,r)) is the (natural) log conditional probability for a replacement r given 
             the context and pattern l,p,r. The last column is always '0.000000', e.g.
             t,E,t>t,e:,t -0.916293 0.000000
             -> an /E/ is replaced by an /e:/ in the context t ... t with 40% probability
                (exp(-0.916293) = 0.40).
             g,@,n,t>g,N,t -1.292769 0.000000
             -> reduces /@n/ to /N/ in context g ... t
             I,n,#>I,# -5.190177 0.000000
             -> deletes word-final /n/ with left context /I/
             See more examples in PARAM/rml-0.95.rul.
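
The relation between a replacement probability and the log value in a
statistical rule line can be checked with a few lines of shell. This is a
minimal sketch; the function names prob_to_ln and ln_to_prob are
hypothetical and not part of MAUS, and awk is assumed to be available.

```shell
# Hypothetical helpers (not part of MAUS). LC_ALL=C forces a decimal point
# regardless of the user's locale.

# Probability (0 < p <= 1) -> natural log, six decimals as in a *.rul file.
prob_to_ln() {
  LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.6f\n", log(p) }'
}

# Natural log value from a *.rul line -> probability, four decimals.
ln_to_prob() {
  LC_ALL=C awk -v l="$1" 'BEGIN { printf "%.4f\n", exp(l) }'
}

# Example: the rule "t,E,t>t,e:,t -0.916293 0.000000" from above
#   ln_to_prob -0.916293     prints 0.4000
#   prob_to_ln 0.4           prints -0.916291
```

The small difference between -0.916291 and the value -0.916293 in the
example rule is below the precision that matters for the 40% reading.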

Remark regarding MAUS rule sets:
A context or pattern string within a MAUS rule as discussed above is parsed in a somewhat sloppy 
(but robust) way: for a sequence of characters that should encode a single phonemic symbol 
(i.e. enclosed by '-...-' or ',...,' respectively) such as ',abcd,', MAUS first checks whether
the complete sequence 'abcd' represents a symbol in GRAPHINVENTAR. If not, MAUS tries to parse the 
character sequence as a sequence of valid phonemic symbols. E.g. if 'ab' and 'cd' are included in 
GRAPHINVENTAR but 'abcd' is not, the sequence is interpreted as ',ab,cd,'. Parsing is applied 
left-to-right and by maximum local string length. For instance the above example will not lead to 
',a,bcd,' even if 'a' and 'bcd' are valid symbols, because 'ab' and 'cd' are also valid symbols.
Only if the character sequence cannot be parsed into a sequence of valid phonemic symbols, MAUS
will issue a warning message, such as:
File: CrlkontWordVar.cc, Line 46
error in rule 1 (#sa>), discarding
and will ignore this rule for the further processing (rules are counted starting with 0). 
This peculiar way of parsing rules may lead to unexpected results if 
- the order of symbols in GRAPHINVENTAR is not by descending string length
- the rules contain combinations of phonemic symbols that are valid but not intended.
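
The parsing behaviour described in this remark can be reproduced outside of
MAUS to test a rule string before using it. The following is only a sketch
under assumptions: it is not the code MAUS uses, the function names are
hypothetical, the inventory file is assumed to contain one symbol per line,
and 'maximum local string length' is implemented as an explicit
longest-match-first search.

```shell
# Hypothetical helpers (not part of MAUS).
# Assumption: INVENTORY holds one symbol per line (first field is used).

# Print the inventory sorted by descending symbol length.
sort_inventory() {
  awk 'NF { print length($1) "\t" $1 }' "$1" | sort -k1,1nr | cut -f2-
}

# parse_symbols INVENTORY STRING
# Split STRING into inventory symbols, left to right, longest match first.
# Prints the symbols comma-separated; returns 1 if STRING cannot be parsed.
# Note: awk -v interprets backslash escapes, so pass masked symbols
# (GRAPHINVENTAR style, e.g. r- instead of r\).
parse_symbols() {
  awk -v s="$2" '
    NF { sym[$1] = 1; if (length($1) > maxlen) maxlen = length($1) }
    END {
      if (s in sym) { print s; exit 0 }   # whole string is one symbol
      out = ""
      while (length(s) > 0) {
        n = length(s); if (n > maxlen) n = maxlen
        while (n >= 1 && !(substr(s, 1, n) in sym)) n--
        if (n == 0) exit 1                # no valid symbol starts here
        out = (out == "" ? substr(s, 1, n) : out "," substr(s, 1, n))
        s = substr(s, n + 1)
      }
      print out
    }' "$1"
}

# Example with an inventory containing a, ab, cd and bcd:
#   parse_symbols inventory.txt abcd     prints ab,cd
```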


SIGNAL PROPERTIES

This script is intended to work for mono NIST and WAV sound files with
16 kHz sampling rate and 16 bit linear (FIXRATE), because the HMMs were
trained on this type of data.
MAUS will automatically resample the signal / convert to mono using sox;
to prevent this automatic conversion set the option 'allowresamp' to 'no'.
The script will complain if you try to use other sampling rates or HMMs
trained with other sampling rates. Note that ALL kinds of re-sampling
deteriorate the signals!

If you use WAV signal files as input, the tool sox must be installed 
on your computer. 
Then sox will always produce a suitable input file for maus regardless
of what you give as an input file.
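
If you prefer to do the conversion yourself (for instance together with
allowresamp=no), a call like the following could be used. This is a sketch
and not the command line that maus issues internally; the function name
to_maus_wav is hypothetical, while the sox options shown are standard sox
format options.

```shell
# Hypothetical helper (not part of MAUS): convert any sox-readable file to
# mono, 16 kHz, 16 bit linear PCM.
#   -r sample rate, -c channels, -b bits per sample, -e sample encoding
# The options are placed before the output file name, so they describe the
# output format.
to_maus_wav() {
  sox "$1" -r 16000 -c 1 -b 16 -e signed-integer "$2"
}

# Example:
#   to_maus_wav recording_44k_stereo.wav recording_16k_mono.wav
```

Keep in mind the remark above: this conversion is also a re-sampling step
and will alter the signal.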

MAUS CACHE

Maus can check the cache $TEMP for existing *.htk files
with the same name and take these instead of performing the frontend
processing anew (this is done to save time on larger corpora). 
Use the option CLEAN=0 (default is 1) to get this effect, but keep in mind
that your signals must not be altered then.

INTER-WORD SILENCE

The silence model '#' in the HMM set must be a tee-model.
HVite will always complain about the 'words' '#' or '&' that are 
tee-words. It is safe to ignore these warnings.


EXAMPLES

Simply calling maus will issue a long and detailed usage message:

% ./maus 

# usage: maus SIGNAL=signal.nis|wav BPF=signal.par [OUT=maustier.mau][OUTFORMAT=mau|TextGrid][CLEAN=1][PARAM=parameter-dir][CANONLY=no][allowresamp=no][WEIGHT=weight][INSPROB=insprob][STARTWORD=0][ENDWORD=999999]
# usage: maus SIGNAL=signal.nis|wav KANSTR="a: b e: t s e:" [OUT=maustier.mau][OUTFORMAT=mau|TextGrid][CLEAN=1][PARAM=parameter-dir][CANONLY=no][allowresamp=no][WEIGHT=weight][INSPROB=insprob]


# General remarks:
...

The following call will read the canonical pronunciation from a BPF file
and segment the signal in EXAMPLES/GERMAN/g001acn1_000_AAJ.nis using classical 
MAUS into the file EXAMPLES/GERMAN/g001acn1_000_AAJ.mau:

% ./maus v=1 SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
   BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par

The following call will do the same but write the resulting MAU tier into 
the file 'Result.mau':

% ./maus v=1 OUT=Result.mau SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
   BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par

The following call will do the same but instead of a BPF tier it will 
create a praat compatible TextGrid file 'Result.TextGrid':

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
   BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par

The next call will write two additional tiers into the TextGrid output
with a word segmentation and a canonical transcript of the words

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav INSORTTEXTGRID=yes\
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par INSKANTEXTGRID=yes

The following call will do the same but instead of a TextGrid it will 
create Emu compatible files 'Result.hlb' and 'Result.phonetic':

% ./maus v=1 OUT=Result OUTFORMAT=emu \
   SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
   BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par

If you want the output files to have the same name and location as the
signal file, simply omit the option OUT=...

The next call will use a TRN tier in the input BPF to restrict the
search on a segment given there; by doing this long initial or final silence
intervals are being ignored by maus; this can also be used to selectively
segment only parts of a longer recording; note however that the timing
of the results is always based on the total signal.

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav USETRN=yes\
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par 

The following call will do the same but the canonical string that
MAUS uses will start with the 5th word and end with the 9th word 
of the BPF file (counting starts with 0):

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
   BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par STARTWORD=4 ENDWORD=8
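
Because STARTWORD and ENDWORD count from 0, it is easy to be off by one.
The following sketch converts 1-based word positions into the two options;
the function name word_range_opts is hypothetical and not part of MAUS.

```shell
# Hypothetical helper (not part of MAUS): translate 1-based word positions
# (first word = 1) into the 0-based STARTWORD/ENDWORD options.
word_range_opts() {
  first=$1
  last=$2
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "word_range_opts: need 1 <= first <= last" >&2
    return 1
  fi
  echo "STARTWORD=$((first - 1)) ENDWORD=$((last - 1))"
}

# Example: 5th to 9th word, as in the call above
#   word_range_opts 5 9     prints STARTWORD=4 ENDWORD=8
```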

The following call will read the canonical pronunciation from the 
command line instead of from a KAN BPF tier; please note the usage 
of blanks and quotes!

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g001acn2_075_AAK.nis \
   KANSTR="f i: r Q U n t t s v a n t s I C s t 6 # Q aU f # f Y n f Q U n t t s v a n t s I C s t @ n # j u: n i: # d i: n s t a: k # Q aU f # m I t v O x"

Note that the symbol '#' may be used to indicate possible pauses between words.
This might improve the quality of your MAUS output. Optional pauses are 
inserted automatically when reading from a BPF file instead of from the
command line (see option BPF).
Initial and final pauses are also inserted automatically.

The next call will use a WAV sound file as input instead of SPHERE NIST. 
Maus will automatically recognize this but it will only work if the WAV
sound file contains a mono signal with 16 kHz/16 bit sampling rate:

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav \
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par

The next call uses a WAV input with a different sampling rate; maus will 
detect this and re-sample the signal; note that a mau output file
will be still based on the original sampling rate.

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav allowresamp=yes\
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par

The next call will not clean the $TEMP area after processing; the 
preprocessed signal file (*.htk) and all intermediate files up to 
the result of the Viterbi alignment (*.rec) will remain.
In a possible identical later call the *.htk file will be recycled thus
saving processing time.

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid CLEAN=0 \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav \
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par

The next call will not use the MAUS method but do a forced alignment 
to the given canonical pronunciation. Note that when using CANONLY=yes
the maus script will not require the C++ program word_var-2.0, which might
be useful on platforms where this program does not compile at installation:

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid CLEAN=1 \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav \
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par CANONLY=yes

Finally, the last example uses a different parameter set than classical 
MAUS; be very careful when designing such a parameter set:

% ./maus v=1 OUT=Result.TextGrid OUTFORMAT=TextGrid CLEAN=1 \
   SIGNAL=EXAMPLES/GERMAN/g046acn1_037_AFI.wav \
   BPF=EXAMPLES/GERMAN/g046acn1_037_AFI.par PARAM=MyParamDir


PARAMETERS

See the file PARAM/README for details about the parameter files that 
maus needs and about their somewhat complicated relationship to each other.

In this package there are the following parameter sets:

PARAM		: classical MAUS with statistically learned rule set
PARAM.<lang>	: MAUS adapted for language <lang>
PARAM.MAN	: phonological MAUS with a hand-crafted phonological
 		  rule set without statistics
PARAM.sampa     : language independent parameter set (forced alignment only!)

Default is 'PARAM'; to use other parameter sets use the option 'PARAM'
to define the directory where the parameter files reside.

All parameter sets use a set of context-free phoneme HMMs stored in 
the file MMF.mmf. It always contains a special model for
articulated noise (<usb>), for instance for non-understandable words,
cough, throat clear, laughs or hesitations, for non-articulated noise
(<nib>), for optional silence (#) and non-optional silence (<p:>).

The models are plain left-to-right HMMs with 3-5 states and 5 Gaussian 
mixtures per state (diagonal covariance matrices).
They were trained on the manually segmented and labelled part of the 
Verbmobil data set.
A 3-state HMM without leap transitions implies a minimum duration of 
3 x 10msec = 30msec of the corresponding phonetic segment. This may lead to 
'ceiling effects' in duration analysis when using forced-alignment mode 
in MAUS, since the Viterbi search is then forced to model a segment of at
least 30msec. We do not apply shorter minimum durations in our HMM sets (e.g. 
by introducing leap transitions) because the decision whether a phonemic
segment is there or not should be modelled in the pronunciation model, not 
in the acoustic model. However, for certain investigations you might consider 
replacing the standard HMM set of a language by a customized HMM set 
with shorter minimum durations. See also:

Katarina Bartkova, Denis Jouvet. Impact of frame rate on automatic speech-text alignment for
corpus-based phonetic studies. ICPhS'2015 - 18th International Congress of Phonetic Sciences,
Aug 2015, Glasgow, United Kingdom. Proceedings ICPhS 2015.

for a discussion of minimum duration modelling in automatic phonetic segmentation 
systems.


HISTORY

See file HISTORY in this dir.


EXIT CODES

0 : everything seems ok (but we never know, do we?)
1 : serious error
2 : probably just a signal file with the wrong coding
3 : the BPF contains no KAN tier - doin' nothin'
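
In a calling script the exit codes listed above can be turned into readable
messages. This is a minimal sketch; the function name describe_maus_exit is
hypothetical, and the messages merely paraphrase the list above.

```shell
# Hypothetical helper (not part of MAUS): map a maus exit code to a message.
describe_maus_exit() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "serious error" ;;
    2) echo "signal file probably has the wrong coding" ;;
    3) echo "BPF contains no KAN tier, nothing done" ;;
    *) echo "exit code not listed in this README" ;;
  esac
}

# Example:
#   ./maus v=1 SIGNAL=EXAMPLES/GERMAN/g001acn1_000_AAJ.nis \
#      BPF=EXAMPLES/GERMAN/g001acn1_000_AAJ.par
#   describe_maus_exit $?
```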


POSSIBLE PROBLEMS

Check out the section 'KNOWN BUGS' in file HISTORY


PROCESSING A CORPUS WITH MAUS

Use the wrapper script maus.corpus.
This script reads a list of signal files from a file SLIST, searches for
corresponding BAS Partitur Format (BPF) files to each signal file and 
performs a MAUS segmentation. Please refer to the remarks in the header 
of maus.corpus for detailed usage.
Very useful is the option OUTDIR='#APPEND#': resulting MAU tiers are 
automatically inserted into the input BPF files.
Option CREATETRN=yes or CREATETRN=force will call the speech detector
wav2trn to create a TRN tier in the BPF input file (force will overwrite
existing TRN tiers!). The maus script is then called with option 
USETRN=yes and segments only speech within the detected boundaries.
You may use the simple script txt2par in this package to create BPF 
files from simple two-column TXT files:
- create TXT files with the same name as the sound files with one word 
per line and orthography in the 1st column and transcript (SAM-PA) in 
the second column.
- call txt2par in the dir (creates BPF files in the same dir)
- make list 
ls *.wav > SLIST.txt
- call maus.corpus
maus.corpus SLIST=SLIST.txt BPFDIR=<dir> ...
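
Before calling txt2par it can help to check that the TXT files are complete.
The following sketch assumes that the TXT files carry the extension .txt and
that columns are separated by blanks or tabs; both function names are
hypothetical and not part of this package.

```shell
# Hypothetical helpers (not part of MAUS).
# Assumption: each sound file x.wav has its TXT file named x.txt.

# List all *.wav files in DIR that have no TXT file of the same name.
wav_without_txt() {
  for wav in "$1"/*.wav; do
    [ -e "$wav" ] || continue            # pattern matched nothing
    txt="${wav%.wav}.txt"
    [ -f "$txt" ] || echo "$wav"
  done
}

# Print file:line for every line that has an orthography column but no
# transcript column.
short_txt_lines() {
  awk 'NF == 1 { print FILENAME ":" FNR }' "$@"
}

# Example:
#   wav_without_txt .
#   short_txt_lines *.txt
```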

USING ITERATIVE MAUS

Use the wrapper script maus.iter (also see HISTORY.ITERATIVE for details).
Iterative MAUS denotes a variant of maus.corpus where the acoustical HMM 
of maus are iteratively adapted to the MAUS segmentation of the target 
material. You will need at least 20 min of target material preferably 
of one single speaker or of a speaker group with common features (e.g.
a dialect). See the remarks in the header of maus.iter for usage.
To build a seed model set for maus.iter you may use the tools in 
subdir HMM (see the README there).

USING THE VISUALIZING TOOL GRAPHVIS

graphvis is a Motif binary that should plot a lattice file *.slf on screen.
The lattice file contains the pronunciation graph used by MAUS: the nodes
contain phonemic symbols while the arcs contain probabilities.

Usage: graphvis if=file.slf iv=inventar

The lattice file file.slf can be obtained by running maus with option 
CLEAN=0 and then looking for the last *.slf file in the cache MAUSTEMP.
As inventar use the file PARAM/KANINVENTAR
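
Finding the lattice file can be sketched as follows. The function name
latest_slf is hypothetical, 'last' is interpreted here as 'most recently
modified', and the cache directory has to be passed explicitly because its
location depends on your installation.

```shell
# Hypothetical helper (not part of MAUS): print the most recently modified
# *.slf file in the given cache directory (nothing if there is none).
# Assumes file names without newlines.
latest_slf() {
  ls -t "$1"/*.slf 2>/dev/null | head -n 1
}

# Example (cache_dir stands for your cache directory):
#   graphvis if="$(latest_slf "$cache_dir")" iv=PARAM/KANINVENTAR
```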