
Check out the file HISTORY.ITERATIVE for the development history of the
iterative MAUS technique.
Check the development status table at the end of this document for the
current development status of the individual supported languages.
Also check out PARAM.<lng>/README for details about the relationship
between phoneme inventory filters, rule set and HMM corpus.

HISTORY

05.03.03 : Re-engineering the MAUS software:
           maus produces exactly one segmentation from a NIST file
	   See disclaimers in USAGE
06.03.03 : Tested the possibility of tee-words in HTK
           A 'tee-word' is a word that allows a jump from the
           virtual first node of the first HMM to the virtual last node
           of the last model. This would be helpful to insert 'dummy'
           words into the lattice that occur in the output but do not
           consume any frames.
           In our case this would be the '#' model indicating a word
           boundary.
           Modified the '#' model to be a tee-model; HVite issues a
           warning about a tee-word but performs the alignment correctly (at
           least for the test sentence!). The rec file then contains
           segments with zero duration indicating the word boundaries.
           If MINPAUSLEN in maus is set to 1, maus even detects
           small pauses of 1 frame length.
           Verified the results on one test sentence -> ok
           Used a longer sentence -> ok
           mau2TextGrid converts the MAU tier into a praat TextGrid;
           built into maus as an option -> ok
           Version 1.0 maus
07.03.03 : maus integrated in MkVMPar -> ok
           test on VM1.1 -> 
	   Error in DICT : 'word' 'i' had HMM 'i' which does not exist
	   Interesting that this error occurs only if the lattice contains
	   the 'word' 'i'; that means that HVite does not parse the entire 
	   dictionary on start-up.
           Fixed DICT and re-tested the failing turns -> ok
	   Test of V2.1 -> some errors
	   Test of VM15.1 -> handling of other codings and other languages 
	   missing
	   Version 1.1 should do that
	   Test of VM15.1 -> ok
17.03.03 : Version 1.3 maus :
           Silence intervals shorter than 3 frames are deleted between words
           because they are not perceivable. If the following phoneme is a
           plosive, the silence is added to the plosive; if not, the silence
           is spread equally onto both segments.
           CLEAN is restricted to its own files, that is, the process will not
           delete any files in $TEMP other than those it has created.
           However, two instances of maus might still work on identically
           named files. Therefore the semaphore check is still active.
	   Test of all German VM volumes on linux35:/scratch/PARTEST -> ok
           Lots of bug fixes -> ok
18.03.03 : Version 1.4 maus :
           New option PARAM allows selecting a parameter set (default is
           $SOURCE/PARAM with the statistical rule set for German) -> ok
	   New parameter set PARAM.MAN with phonological rule set -> ok
19.03.03 : Verified that the pause handling works -> ok
           Set the minimum length of pauses between words to 50 msec.
14.08.03 : /y/ inserted as an allowed symbol in KANINV and GRAPHINV.
           DICT maps /y/ to /y:/.
           Reason: new conventions for the canonical pronunciation also allow
           /y/; therefore it occurs in the lexicon.
21.08.03 : 1.6 maus : Added support for WAV signal files; the input file
           must have the extension 'wav' or 'WAV'. The WAV file is simply
           converted into SPH (NIST) by sox.
22.08.03 : 1.7 maus : Added option 'CANONLY=yes' that causes the script to 
           perform a simple forced alignment to the signal without using 
	   the MAUS method.
           Added options WEIGHT and INSPROB which are essentially the HVite
           options -s and -p passed through. Since we do not yet know which
           values might be optimal, we stick to the theory and set the
           defaults to s: 1.0 and p: 0.0.
12.09.03 : 1.8 maus : Optimized the parameters WEIGHT and INSPROB to 7.0 and 0.0
           respectively (see comments in maus for details)
15.09.03 : 1.9 maus : Added option allowresamp=yes
           If set to 'yes', maus will try to re-sample input signals that
           are not 16 kHz using the polyphase method of sox.
09.12.03 : 1.11 maus : Bugfix. In the TextGrid output all interval indices 
           were set to '1'. Strangely enough this error did not show up
           when loading the TextGrid into praat ...
20.01.04 : 1.0 maus.corpus
           -> ok
21.01.04 : 1.12 maus : Bug fix in maus. When an unknown coding was found in the
           signal file, the CLEAN semaphore was not removed from the cache.
           -> ok
           Changed cache handling: all temporary files written by maus are
           prefixed by the process id of maus, and at the end all files
           with that id are removed if option CLEAN is set to 1.
           The semaphore is therefore no longer necessary -> ok
22.01.04 : 1.13 maus : changed help message output. To get a help message simply
           type 'maus'.
           1.0 maus.corpus : changed help message output.
           To get a help message simply type 'maus.corpus'.
23.01.04 : 1.14 maus : maus may now also process BPF files that have no KAN
           tier but have an ORT tier. This works only if 'create_kan' and
           'mk_pron' are installed.
29.01.04 : 1.15 maus : MMF can be defined from command line now and does not 
           need to be in dir PARAM any more
30.01.04 : /a:~/ inserted as allowed symbol in KANINV and GRAPHINV.
           DICT maps /a:~/ to /a:/.
	   Reason: New conventions for Standard German Pronunciation allow
	   /a:~/; therefore it may show up in BPF files or lexica.
06.04.04 : 1.17 maus, kan2mlf.awk : New options STARTWORD and ENDWORD allow
           selecting only a portion of the input BPF file.
08.04.04 : 1.18 maus : parameter files KANINVENTAR, GRAPHINVENTAR and DICT
           extended by 'foreign' phonemes that might appear in German when
           foreign words are uttered. These phonemes are mapped to their
           nearest German symbols for HMM modelling but passed as is to the
           segmentation output. Therefore a /T/ in the input will be internally
           modelled by /s/ but shows up in the output as label /T/ again.
26.08.04 : 1.19 maus : Since the histograms over segment boundary deviations
           show a rather distinctive shift of 10 msec of the MAUS boundaries
           to an earlier position (that is: the MAUS boundaries are
           10 msec too early), we introduce a new option MAUSSHIFT (default
           10) that shifts the MAUS boundaries by the given value in msec.
           (This involves changes in maus.iter, maus.corpus, maus and
           rec2mau.awk)
           Also changed rec2mau.awk so that plosives are recognized by their
           first char only. That way plosive labels like /k_s/ are treated
           correctly if there is a preceding inter-word silence that will
           be spread. (/k_s/ denotes the silence interval of a /k/ plosive)
12.07.06 : 1.20 adapted word_var-2.0 to current Linux distribution SuSE 9.0
           Sources and necessary libraries are in ./word_var
	   To compile a new binary issue:
	   make word_var
	   make install
	   Note that this is still a dynamically linked binary.
19.07.06 : 1.21 adapted word_var to be a statically linked binary.
           cd ./word_var.src
           make word_var
	   make install
17.04.07 : 1.22 added subdir 'ipkclib' containing the header ipkclib.h and the
           library libipkclib.a for compilation on operating systems other than Linux.
           Added some hints about that in the documentation files.
08.06.08 : 1.24 added option INSORTTIER=no
           If set to 'yes' and option OUTFORMAT is set to 'TextGrid' and input
	   is read from a BPF, maus will try to identify either an ORT tier
	   or - if that fails - a KAN tier (must be there as input!) and 
	   write an additional interval section into the TextGrid file 
	   containing the word segmentation based on the underlying MAUS
	   segmentation. The tier is called either 'ORT:' or 'KAN:' 
           respectively; it contains unlabelled segments where MAUS labelled
           a silence interval and segments labelled with either the
           orthography or the canonical transcript of the words. If set to
           'no', the regular TextGrid output with one interval section is
           produced.
09.06.08 : 1.25 added option USETRN=no
           If set to 'yes', maus will search the input BPF for a TRN tier
           that segments the utterance within the recording. maus will cut
           out that segment and run the MAUS segmentation only within the
           cut-out segment. Afterwards the segment times in the final mau or
           TextGrid file are re-calculated to account for the offset and the
           cut-off at the end.
           1.26 added option INSKANTEXTGRID=no
           Analogous to INSORTTEXTGRID. If both options are set to 'yes',
           first the orthographic tier, then the canonical transcript tier
           are exported to the TextGrid file. If the source BPF file does
           not contain an ORT tier, two canonical transcript tiers are
           exported.
12.06.08 : 1.27 : changed behaviour of MINPAUSLEN:
           If both adjacent segments of a deleted inter-word silence are 
	   plosives, the deleted interval is spread equally to both plosives
	   (before that only the word-initial plosive was enlarged).
           If the word-final segment is a plosive, the deleted interval is
           added entirely to that final plosive (before, the interval was
           spread equally to the word-final plosive and the word-initial
           non-plosive).
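           The following is a minimal Python sketch of the inter-word silence
           handling as described in the 1.3 and 1.27 entries, together with
           the first-character plosive test of 1.19. It is a reconstruction
           from these notes, not the rec2mau.awk code; the function names,
           the argument layout and the small PLOSIVES set are hypothetical,
           and durations are plain integers (frames or samples).

```python
# Assumed, deliberately small plosive set for illustration only.
PLOSIVES = set("pbtdkg")


def is_plosive(label: str) -> bool:
    """Plosives are recognized by their first character only (entry 1.19),
    so a closure label such as 'k_s' also counts as a plosive."""
    return label[:1] in PLOSIVES


def redistribute_pause(prev_label: str, prev_dur: int, pause_dur: int,
                       next_label: str, next_dur: int, min_pause: int):
    """Return (prev_dur, pause_dur, next_dur) after MINPAUSLEN handling."""
    if pause_dur >= min_pause:
        return prev_dur, pause_dur, next_dur  # long enough: keep the pause
    prev_p, next_p = is_plosive(prev_label), is_plosive(next_label)
    if prev_p and not next_p:
        # word-final plosive only: it absorbs the whole deleted interval (1.27)
        return prev_dur + pause_dur, 0, next_dur
    if next_p and not prev_p:
        # word-initial plosive only: it absorbs the whole interval (1.3)
        return prev_dur, 0, next_dur + pause_dur
    # both plosives (1.27) or no plosive (1.3): spread equally;
    # an odd remainder goes to the first segment (an assumption)
    half = pause_dur // 2
    return prev_dur + pause_dur - half, 0, next_dur + half


if __name__ == "__main__":
    print(redistribute_pause("k", 10, 2, "a", 5, 3))  # prints (12, 0, 5)
```

           How an odd remainder is split is not stated in the notes; the
           sketch simply gives it to the first segment.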
28.06.08 : 1.3 maus.corpus : If the option OUTDIR is set to '#APPEND#' the 
           script will insert the resulting mau tier into the source BPFs.
	   This requires maus version 2.0 or higher.
02.07.08 : 2.0 : re-engineered the handling of times and sampling rates:
           The overall behaviour of maus is the same as before with one 
	   important exception: the timing information in the temporary 
	   or final mau output files are not based on the model sampling rate
	   any more, but are always based on the sampling rate of the input
           signal file. Therefore scripts that post-process the mau output
           and use input signals with a sampling rate other than the HMM
           sampling rate must be adapted to this new behaviour.
	   Furthermore the HMM sampling rate is no longer fixed but is read from
	   the HCopy config file PRECONFIGNIST; that way other sampling rates
	   can be used in the HMM and maus will automatically adapt to that
	   (still the user has to take care that the config file PRECONFIGNIST
	   matches the HMMs used!)
	   The new version makes it easier to include mau output directly into
	   source BPF files that are not sampled in the HMM sampling rate,
	   as for instance is done in maus.corpus if you use the option 
	   OUTDIR=#APPEND#.
23.07.08 : 2.3 maus : Some bug fixes and the option RULESET added
05.08.08 : 1.5 maus.corpus : changed option BPFDIR from REQ to OPT.
	   If BPFDIR is empty the script will search for the BPF in the 
	   same location as the signal file. 
13.08.08 : 2.4 maus : fixed some problems with the location of intermediate
           signal files: now they are all in the TEMP area and will be cleaned
	   up after maus. Before that you might have found '..._trim.nis' or
	   '..._resamp.nis' files in the location of the input signal file
	   after running maus.
           Fixed a small problem in mau2TextGridORT.awk: if the KAN tier
           of the input BPF contains secondary lexical stress markers (") then
           a corrupt TextGrid was created by maus. Now these markers are
           simply deleted, until I find out how to use '"' within a praat label.
13.08.08 : 1.6 maus.corpus : introduced option CREATETRN=no|yes|force
01.11.08 : 2.5 maus : introduced options INSORTTEXTGRID INSKANTEXTGRID
25.02.09 : 2.6 maus : Bug fix : version 2.5 did not work with other PARAM sets
                                'dummy.rul' was default set to 'dummy'
                      Inserted a fixed locale LANG = en_US.UTF-8 because
                      scripts called by maus produce output with floating
                      point numbers formatted with a comma instead of a dot if
                      the locale of the environment is, for instance, de_DE.
02.03.09 : 2.7 maus : An optional word-internal silence interval '<p:>' is now 
	   allowed in the input. To force MAUS to model a silence interval 
	   the symbol '<' should be used. The symbols '#', '&' and '<p:>'
	   all model silence intervals that can be of zero length or can be
           deleted if shorter than a threshold defined by option MINPAUSLEN.
08.03.09 : 1.7 maus.corpus : fixed minor bug : if option CREATETRN=yes and
           a single BPF in the corpus already had a TRN tier, then the option
	   was deactivated for the rest of the corpus. Now the rest will
	   be checked as before.
11.03.09 : 2.8 maus : the mapping scripts kan2mlf.awk (which maps the
           canonical input phoneme string to the MAUS internal phoneme set)
           and rec2mau.awk (which maps the internal phoneme set back to
           the input phoneme set) depend on the sets stored in the
           PARAM dir. For consistency, the scripts are therefore moved to
           the PARAM set of files and can be adapted individually to
           different sets.
	   To summarize:
	   If a new language set PARAM.LANGUAGE is defined, do the following:
	   - copy the standard German set PARAM to PARAM.LANGUAGE
	   - within the new set adapt the following files:
	     KANINVENTAR (the set of phonemes used in the canonical input and 
	     MAUS output)
             kan2mlf.awk (the mapping script from KANINVENTAR to GRAPHINVENTAR)
	     rec2mau.awk (the reverse mapping)
	   - in case that you add/change/delete models in the HMM set you'll 
	     need to adapt also the following files:
	     GRAPHINVENTAR (the set of MAUS internal phonemes)
	     HMMINVENTAR (the set of used HMMs)
	     DICT (the mapping from GRAPHINVENTAR to HMMINVENTAR)
	     MMF.mmf (the HTK HMM set that matches HMMINVENTAR)
	   A good example for such a new language set is PARAM.HUNGARIAN.
16.03.09 : 2.9 maus : Audio input ALAW raw 8kHz (extensions al, AL, dea, DEA)
           allowed now. ALAW samples are converted to PCM/16kHz.
30.03.09 : 2.10 maus : TextGrid files created with MAUS missed the line
           'item []:' in the header. Although praat didn't seem to notice, 
	   other programs like Emu need this redundant entry for some reason.
	   The new maus version does create this header entry.
21.04.09 : 2.11 maus : praat has a bug that makes boundaries dysfunctional
           (e.g. they cannot be moved) if the segment end and the following
           segment begin are not exactly the same float number. We changed
           the TextGrid export so that end and begin are always exactly the
           same number.
03.06.09 : PARAM.HUNGARIAN : added virtual model 'geminate /t/' modelled by /t/
28.07.09 : Added provisional support for English PARAM.ENGLISH (see README there)
           Warning: maus will issue warnings if used with this param set and the 
	   German rule set (default), but these warnings can be ignored.
           Re-structured and extended EXAMPLES dir according to supported languages
29.07.09 : 2.12 maus : Re-worked provisional support for Hungarian
05.08.09 : 2.13 maus : added output into Emu format files: OUTFORMAT=emu	   
07.12.09 : 1.9 maus.corpus : bug-fix when maus.corpus was called with a file
           list SLIST=.. that contained no path information
	   improved security for temporary files handled by different instances of 
	   maus.corpus on the same host
	   removed a 'bug' that caused maus.corpus to write the resulting files
           into the location of BPFDIR=... instead of the location of the signal
	   files if OUTDIR=... was not set.
22.01.10 : 2.14 maus : bug fix: when called without BPF but with USETRN=yes an error
           was issued. Now the option USETRN=yes is ignored in this case.
           2.14 maus 1.10 maus.corpus : changed method to define the rule set; the
	   default link RULESET.rul in PARAM was removed. The default rule set is 
	   now the statistical rule set rml-0.95.rul.
01.07.10 : 2.15 maus : call with options INSORTTEXTGRID or INSKANTEXTGRID and
           without option BPF was not handled gracefully
13.07.10 : 2.16 maus : in rare cases the temporal boundaries in a TextGrid
           result file may differ slightly between the MAU and the ORT|KAN 
           tiers. This does not bother praat, but it does affect other programs
           that read the TextGrid to build a hierarchy (e.g. Emu). This
           behaviour was fixed in this version.
27.10.10 : 2.17 maus : moved ITALIAN from BETA to RELEASED	   
20.11.10 : 2.18 maus : fixed some minor bugs in the export to Emu files
                       forced all outputs to be strictly consecutive segments.
                       Although this is not necessary according to the
                       BPF format, we found that many import routines require this.
24.11.10 : 2.19 maus : fixed some problems when creating Emu output within 
                       maus.corpus: if option OUTFORMAT=emu is selected, then 
		       maus will create 2 files *.hlb and *.phonetic with the
		       same name as the signal file either in the dir of a given
                       file name in option OUT, or, if OUT is not given, in the
		       dir of the signal file. Note that the actual filename
		       in OUT will be discarded.
15.03.11 : 2.20 maus : fixed minor bug: when USETRN=yes and KANSTR!="" an error 
                       occurred because the script looked for a BPF to read the TRN
26.04.11 : 2.21 maus : maus issues a warning if Emu files are produced containing 
                       non-Emu-conform SAM-PA labels such as '{'.
08.07.11 : added simple script 'txt2par' to create BPF input files for the
           usage in maus.corpus. Simply provide TXT files with the same name
           as the sound files, with one word per line, the orthography in the
           1st column and the transcript (SAM-PA) in the second column.
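           A minimal Python sketch of the kind of conversion txt2par performs,
           under stated assumptions: the first whitespace-separated token is
           the orthography and the rest of the line is the transcript, empty
           lines are skipped, word indices start at 0, and only the ORT and
           KAN tiers are emitted (any BPF header lines the real txt2par may
           write are omitted). The function name is hypothetical.

```python
def txt_to_bpf_tiers(lines):
    """Turn 'orthography transcript' lines into BPF ORT and KAN tier lines."""
    ort, kan = [], []
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip empty lines
        if len(parts) < 2:
            raise ValueError(f"no transcript for word {parts[0]!r}")
        index = len(ort)  # word index, assumed to start at 0
        ort.append(f"ORT: {index} {parts[0]}")
        kan.append(f"KAN: {index} {parts[1].strip()}")
    return ort + kan


if __name__ == "__main__":
    sample = ["Guten g'u:t@n\n", "\n", "Tag t'a:k\n"]
    print("\n".join(txt_to_bpf_tiers(sample)))
```

           Keeping the rest of the line as the transcript lets SAM-PA symbols
           stay blank-separated, as in the KAN tier of LANGUAGE=sampa input.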
15.12.11 : 2.22 maus : technical changes that do not change the functionality:
           all references to developer paths removed.
20.12.11 : 2.23 maus : technical changes that do not change the 
           functionality: when called with CANONLY=yes, the script does not call
	   word_var-2.0 but uses the simple HVite aligner instead; bug fix in
	   par2emu: temporary files were not deleted on error exit; input BPF 
	   is filtered for '\r' (DOS files); sox resampling re-formulated so 
	   that no automatic dithering takes place (the dithering took place if
	   input signals had a higher sampling rate than 16000Hz; this caused 
           a small amount of white noise to be added to the signal, which in turn
	   caused maus to produce randomly fluctuating segmentation results.)
	   added functionality check CHECK/maus.check
24.01.12 : 2.24 maus : changed default setting for allowresamp to 'yes'	   
27.01.12 : 2.25 maus : changed behavior of options PARAM and RULESET: if the
           option has no path before the file name, the script not only checks
           in the local directory but also in the SOURCE directory for the given
           file. This way, for instance, the language can be changed by simply
           setting PARAM=PARAM.HUNGARIAN wherever maus is called.
23.02.12 : 2.26 maus : added option PRINTINV 	   
28.02.12 : 2.27 maus : added an additional error report to stdout and a definite
           error exit code 1 for the case that the sub-script kan2mlf issues
	   an error (before 2.27 only an error message was printed to stderr, but
	   the main script continued.)
05.03.12 : 2.28 maus : added a better error message for a missing rule file	   
           (not distributed!)
08.03.12 : 2.29 maus : added webservice options into the documentation, so that users
           of the webservice based help function can associate the options.
21.03.12 : 1.13 maus.corpus : option PARAM is searched for in SOURCE if not found
           locally (same behavior as in maus!)
10.04.12 : 2.30 maus : added option value OUTFORMAT=mau-append	   
12.04.12 : 2.31 maus : added option value OUTFORMAT=EMU
29.05.12 : 2.32 maus German : the symbol /Q/ is still accepted as a glottal
           stop, but so is /?/ (SAM-PA), and maus will now always produce a /?/
           in the output regardless of whether the input was /Q/ or /?/.
25.06.12 : 2.33 maus : added Dutch language (German rule set, German HMM)	   
           bug fix: due to an HTK bug the last segment could in rare cases
	   have a negative length.
26.06.12 : 2.34 maus : fixed some inconsistencies in the PARAM dirs: now every 
           language should per default use the best suited rule set, with the
	   exception of PORTUGUESE.BETA which should be used with the option
	   RULESET=regeln9.nrul 
05.07.12 : 1.14 maus.corpus : bug fix: if the option OUTDIR was set to a different 
           location than the location of the signal files, result files already
	   present in the location of the signal file were overwritten (because 
	   maus writes into that location by default). 
06.07.12 : 1.12 maus.iter : multiple bug fixes related to problems when phoneme
           symbols contain curly brackets '{}'. It still does not work with 
	   backslashes in the phoneme symbol name, e.g. 'r\'. We do not have a solution 
	   for such phoneme sets yet. The only case where this happened until now
           was Australian English where we map the (only) /r/ allophone /r\/ to /r/
	   and do not map it back in the output!
11.10.12 : 2.35 maus : added language Australian English (rule set and HMM trained
           to a subset (5421 samples) of AusTalk)
18.10.12 : 2.36 maus, 1.15 maus.corpus : added option LANGUAGE=iso639 that overrides PARAM
25.10.12 : 2.37 maus : added option '--version', improved some help texts
07.11.12 : 2.38 maus : stricter handling of boolean parameters: boolean options such as
           USETRN=yes can only handle the following values: '0,1,yes,no,true,false' and their
	   capitalized variants (e.g. 'TRUE').
	   All other values cause an error exit 1.
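           A minimal Python sketch of the stricter boolean check of 2.38. The
           accepted values are taken from the entry above; treating
           'capitalized variants' as a case-insensitive match is an assumption
           (it also accepts mixed case such as 'yEs'), and the function names
           are hypothetical.

```python
import sys

# Values accepted by the 2.38 check; the case-insensitive comparison is an
# assumption standing in for "capitalized variants".
TRUE_VALUES = {"1", "yes", "true"}
FALSE_VALUES = {"0", "no", "false"}


def parse_bool(name: str, value: str) -> bool:
    """Map a boolean option value to True/False or raise ValueError."""
    normalized = value.lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"option {name}={value} is not a boolean value")


def parse_bool_or_exit(name: str, value: str) -> bool:
    """Like parse_bool, but print an error and exit with status 1."""
    try:
        return parse_bool(name, value)
    except ValueError as err:
        print(f"ERROR: {err}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    print(parse_bool_or_exit("USETRN", "TRUE"))  # prints True
```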
08.11.12 : 2.39 maus : added special language mode LANGUAGE='sampa' which allows the 
           language independent segmentation of arbitrary inputs coded in SAM-PA.
16.11.12 : 2.40 maus : added CLIPS trained rule sets and HMMs for Italian	   
19.11.12 : 2.41 maus : bug fix: if the rule set allows the complete deletion of a
           word, this was only represented correctly in the BPF (mau) output. TextGrid
           output now handles this case: the deleted word no longer appears in the
           ORT and KAN tiers. emu and EMU output are also supported: the word
           still appears in the word/cano tier but owns no segment in the phonetic tier.
09.01.13 : 2.42 maus : LaTeX Umlauts in the BPF input tier ORT are transcoded to UTF-8 
           in the TextGrid output (former coding was ISO8859). The reason for this is
	   that praat cannot handle LaTeX Umlaut encoding as label names gracefully.
	   Emu or mau outputs are not transcoded, that is e.g. an ISO8859 or LaTeX 
	   encoded input is passed as such to the Emu output files.
	   The options STARTWORD and ENDWORD do not work properly with  
	   Emu output; therefore an error message is issued from this version on, if 
	   these options are selected together with OUTFORMAT=emu|EMU.
           maus checks the command line for unknown options and terminates with an
           ERROR message if it finds one.
           If OUTFORMAT is set to emu|EMU and the input BPF does not contain a SAM
           entry, maus adds the sampling rate of the signal file to the input BPF.
09.01.13 : 1.0 maus.trn : a simple script to exemplify the combined usage of the maus
           options START/ENDWORD and USETRN: by providing an input BPF with a chunk 
	   segmentation coded in TRN entries (see format definition BPF TRN) this script
           repeatedly calls maus with partial segmentations within a chunk of the
	   input signal, and concatenates the results into the input BPF. Works only
	   with mau output.
11.01.13 : 1.16 maus.corpus : checks the command line for unknown options and
           terminates with an ERROR message if it finds one.
15.01.13 : 2.43 maus : option OUTFORMAT=emu|EMU : if a BPF file named as the input 
           signal file is in the location of the signal file, but not the intended BPF 
	   input, maus transformed this file into the Emu result output instead of 
	   the intended BPF input + newly created MAU tier. this is a very rare case,
           but if it happens, from this version on a warning is issued and the BPF in
           the location of the signal file is overwritten.
25.01.13 : 2.44 maus : bugs in Dutch parameter set : glottal stop in input caused error,
           misleading warnings about disfunctional rules -> rules removed.
28.01.13 : 2.45 maus : introduced language specific default options. These are stored 
           in a csh script called 'DEFAULTS' in the parameter dir (e.g. PARAM.ITALIAN).
	   Language specific defaults are used, if no option is given on the command line.
	   If no language specific default is given in the DEFAULTS file the global 
	   default defined in the maus script is being used (e.g. WEIGHT = 7.0).
06.02.13 : 2.46 maus : Fixed bug in SAMPA inventory: some geminates were not defined
08.02.13 : 1.2 maus.web : introducing a new wrapper to the package that replaces 
           the locally installed maus script by calling the new CLARIN WebMAUS 
	   service instead. By using 'maus.web' instead of 'maus' you can use the maus
	   package without any local installation (and the hassle that comes with 
           that; see INSTALL in this directory). Simply replace the 'maus' calls
           in all scripts by 'maus.web'.
	   maus.web validates on the standard benchmark (see CHECK/...). Use
	   CHECK/maus.check.web to verify that on your computer.
14.03.13 : 2.47 maus : WAV input files with a bit resolution other than 16 or more
           than 1 channel are automatically converted to 16 bit, mono.
18.03.13 : all scripts : replaced '$?' by '$status' and 'gawk' by 'awk' to be
           compatible with different Linux installations (e.g. Ubuntu)
18.03.13 : 2.48 maus : introduced option USETRN=force, a pre-segmentation to 
           cut off leading and trailing silence is done with the helper wav2trn;
	   if the helper is not installed a WARNING will be issued and the script 
	   proceeds without pre-segmentation.
21.03.13 : 2.49 maus : adapted English parameter set to Australian English set
           (former: cloned German set).
           Parameter set SAMPA: restricted the HMM set source to languages that have
           trained HMMs; set up a complete benchmark for all SAMPA symbols.
22.03.13 : 2.50 maus : introduced chunk segmentation: if the TRN tier of the input
           BPF contains a chunk segmentation (as defined in the BPF format), maus
	   will recognize this and perform a chunk segmentation using the helper 
	   maus.trn. This works only with OUTFORMAT=mau-append, that is the results
	   are overwritten in a MAU tier of the input BPF.
25.03.13 : 2.51 maus : added helper par2TextGrid that is a general tool to transform
           BPF (MAU,[SAM,ORT,KAN]) into a 1-3 layer TextGrid file. Chunk segmentation
	   mode extended to all output formats.
	   1.3 maus.trn : extended output formats to mau|TextGrid|emu|EMU to 
	   make chunk segmentation mode fully compatible to maus. Some restrictions
	   still apply:
	   - overlapping chunks cannot be processed for TextGrid|emu|EMU output
           because these formats do not allow segments with negative time.
           - emu|EMU output requires that the tiers KAN and TRN in the input BPF
           are matched; other formats tolerate a partial TRN (covering only a subset
	   of the KAN tier).
11.04.13 : 2.53 maus : script checks whether the loaded rule file is a dummy file 
           (named 'dummy.rul') which indicates that for the selected language there
           exists no valid rule set. If the option CANONLY=false, i.e. the script should
           use a rule set, a WARNING message is issued to prevent unintended usage
           of a dummy rule set.
03.06.13 : 2.54 maus : added special Hungarian SAM-PA symbols /J-/ and /c/ to SAM-PA
           parameter set.
03.07.13 : 2.55 maus : 
           - Removed arbitrary inter-word silence model /&/ from MAUS inventories,
           because it interferes with the SAMPA vowel /&/.
	   - Changed HMM names in ITALIAN of sub-phonemic segments '*cl' and '*rl' to 
	   '*_cl' and '*_rl' to be conform with KANINVENTAR. This has no effect on 
	   normal operation but simplifies the automatic generation of the language
	   independent set SAMPA.
	   - Reduced experimental SAMPA set ESTONIAN to Wells definition of Estonian 
	   SAM-PA + extra diphthongs + extra French/English sounds.
           - Removed SAMPA symbols from KANINVENTAR that had the nasalization
           diacritic BEFORE the lengthening, e.g. /a~:/; henceforth only the
           following order of diacritics will be supported: lengthening (:) ->
           nasalisation (~) -> palatalisation (_j) -> aspiration (_h). E.g.
           /a:~_h/ would be allowed, but not /a:_h~/ /a~_h:/ etc.
           - Reworked the SAMPA language set completely: the set should now cover all
	   known SAMPA symbols derived from Wells SAMPA page, German and English wikipedia.
           The full definition of basic SAMPA symbols is in PARAM.SAMPA/SAMPA.dia;
	   language specific extensions are defined in SAMPA.dia (e.g. diacritics).
	   SUPERHMM.* still defines all trained HMM that MAUS knows about. SAMPA.map
	   maps SAMPA symbols that have no trained HMM to existing HMMs. The script 
	   mk_set creates the complete set anew (e.g. after adding HMMs to SUPERHMM)
	   - created a parallel UTF-8 table KANINVENTAR.inv in each language parameter
	   set that describes the used SAMPA symbols of that language in more detail;
	   this table is output if the option PRINTINV=true.
	   - created list of plosives that are handled specifically at word boundaries
	   (see script PARAM.<LANG>/rec2mau.awk). At the moment all these language
	   specific lists are linked to the list in PARAM.SAMPA/PLOSIVES. So, if a 
	   plosive is added to the latter, all languages (if they use this plosive),
	   will treat it specially at word boundaries.
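           The diacritic ordering rule of the 2.55 entry can be checked
           mechanically. The Python sketch below, with a hypothetical function
           name, strips diacritics from the right end of a SAMPA symbol and
           requires each one to rank strictly lower in the supported order than
           the one stripped before it. It only validates the order; it does not
           check the base symbol against KANINVENTAR.

```python
# Supported diacritic order since 2.55, from innermost to outermost.
DIACRITIC_ORDER = [":", "~", "_j", "_h"]


def has_valid_diacritic_order(symbol: str) -> bool:
    """True if the trailing diacritics of a SAMPA symbol follow DIACRITIC_ORDER."""
    rest = symbol
    limit = len(DIACRITIC_ORDER)  # rank the next stripped mark must stay below
    while True:
        rank = next((i for i, mark in enumerate(DIACRITIC_ORDER)
                     if rest.endswith(mark)), None)
        if rank is None:
            break  # no more trailing diacritics
        if rank >= limit:
            return False  # out of order or repeated, e.g. /a~:/ or /a::/
        limit = rank
        rest = rest[:-len(DIACRITIC_ORDER[rank])]
    return rest != ""  # a bare diacritic is not a valid symbol


if __name__ == "__main__":
    for s in ["a:~_h", "a:_h~", "a~_h:", "a~:", "d_j"]:
        print(s, has_valid_diacritic_order(s))
```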
08.07.13 : 2.56 maus : added language POLISH (iso639-3: pol): SAMPA set as defined by Wells 
           1996; cloned HMM from German, Italian and Australian models; no rules.
	   Completed IPA column (3) in KANINVENTAR.inv set descriptions.
09.07.13 : 2.57 maus : added new option OUTIPA (boolean); if set maus will use UTF-8 IPA
           symbols in all segmental output tiers instead of SAM-PA.
15.07.13 : 2.58 maus : Missing vowel /1/ in SAMPA set; added SAMPA symbols 
           /s`,z`,g_j,p_j,x_j,ts`,dz`/ to Polish SAMPA set.	   
18.07.13 : 2.59 maus : Added symbols /ddz,ddz_cl,ddz_rl/ to SAMPA set and to Italian set.	   
           2.60 maus : Added lots of symbols to Hungarian set to satisfy different users.
21.08.13 : 2.61 maus : non-human noise model was missing in some languages - fixed.	   
03.09.13 : 2.62 maus : Added SAM-PA symbols /d_j/, /i~/ and /u~/ to LANGUAGE sampa set.
04.09.13 : 2.63 maus : maus.learn may create insertion rules that double phonetic symbols 
           and that are not handled by word_var gracefully; inserted a warning in maus.learn 
	   and removed such nonsense rules from the Italian rule sets.
09.09.13 : 2.64 maus : KAN tier output in TextGrids contains only first SAMPA symbol if 
           used with LANGUAGE=sampa (SAMPA symbols are separated by a blank in KAN tier!)
	   (the same error happened with multiple label entries in the ORT tier) -> fixed.
20.09.13 : 2.65 maus : changed the WEIGHT option for LANGUAGE=eng to 1.0 (the same as 
           LANGUAGE=aus) so that eng and aus deliver exactly the same results (this makes
	   sense since PARAM.ENGLISH is identical to PARAM.AUSTRALIAN for now).
	   Adapted maus.web to the new JSON format return of the webservices.
22.10.13 : 2.66 maus 1.6 maus.trn : Bug fix: helper maus.trn did not pass through errors
           detected by its helper maus; an error in the segmentation of a single chunk 
           was therefore only reported to stdout, and the exit code of maus remained 0
           although the output file was corrupt.
15.11.13 : 2.67 maus : added 6 more diphthongs to AUSTRALIANENGLISH set	   
28.11.13 : 1.9 maus.web : changed to XML results format
05.12.13 : 1.17 maus.corpus : added option MMF to override usage of default 
           HMM set in $PARAM/MMF.mmf (we need that for maus.iter!)
           2.68 maus : added language New Zealand English (LANGUAGE=nze)
19.12.13 : changed default value for option WEIGHT in LANGUAGE=aus|nze|eng from 
           1.0 to 5.0 because users report excessive application of unlikely
	   rules.
23.12.13 : 2.69 maus : added a new (non-default) rule set rml.AUSTRALIAN.20131223.rul 
           for better consistency with the g2p -lng aus method by U. Reichel which is 
	   used in WebMAUS Basic. Basically the canonical pronunciation for the 
	   rule learning algorithm is now produced by the g2p method instead of 
           manually encoded. Whether this is a better way remains to be seen; 
           at the moment the quality of the g2p method is rather poor, probably 
           because of the UNILEX input. The rule training can be repeated almost 
           automatically in case the g2p method improves in the future, or
           we get more transcribed material. I noticed a lot of 'certainty rules',
           i.e. rules that are always observed for a given context in the data (which 
	   indicates that the phonological encoding deviates from the phonetic encoding,
	   e.g. /V/ is always realized as [6]), and some 'reverse phonological rules',
	   i.e. rules that describe a reversed reduction process, e.g. /Sn/>/S@n/
	   (which indicates that the g2p output is too phonetic). Nevertheless the usage
	   of this (non-default) rule set might improve results when using maus in 
	   conjunction with g2p on Australian English data, for instance in WebMAUS Basic.
           To apply this optional rule set use the option RULESET=rml.AUSTRALIAN.20131223.rul.
22.01.14 : extended helper tool par2TextGrid so that 'shared phonemes' are possible
           (= a phonetic segment that is assigned to two words), and to handle arbitrary
	   phonetic tiers, not just MAU tiers.
06.03.14 : maus.trn 1.7 : bug fix: with OUTFORMAT=TextGrid the SAMPLERATE was determined 
           incorrectly from the signal file if it was not given by a SAM entry in the 
           input BPF.
24.03.14 : maus 2.70 : changed HVite option to FORCEOUT=F to prevent partial results; 
           HVite now exits with 1 and the error message 'no tokens survived to final node' 
           is displayed.
24.04.14 : maus 2.71 : added simple check for KAN tier having at least 3 columns	   
14.05.14 : maus 2.72 : added the following symbols to SAMPA phoneme set: NN ww Q: I: U: Y:
           required by Swiss German.
02.06.14 : maus 2.73 : added option NOINITIALFINALSILENCE=no : if set to 'yes', maus
           will suppress the automatic modelling of initial/final silence intervals
27.06.14 : maus 2.74 : added language Georgian (kat): basic alignment 
24.07.14 : maus 2.75 : bug fix: if option ENDWORD was set to 0 for languages
           eng,aus,nze,por,pol,nld the script erroneously tried to set 
	   ENDWORD to 999999.
04.08.14 : maus 2.76 : bug fix: in rare cases the resulting segmentation was not
           exactly consecutive, which caused praat to mis-treat TextGrid output
	   produced by maus. Now all kinds of output formats should always 
           produce exactly matching segmental boundaries.
27.08.14 : maus.corpus 1.18 : added option OUTIPA to pass on to maus;
           removed default setting of MMF, because it interferes with the 
	   setting of LANGUAGE: if MMF is not set as option on the command line
	   the maus script will set it to the correct HMM set depending on 
	   the setting of either PARAM or LANGUAGE (LANGUAGE overwrites PARAM!);
           if MMF is set on command line, maus will use that MMF ignoring 
	   LANGUAGE or PARAM setting.
05.09.14 : maus 2.77 : echo of command line call is now restricted to 
           verbose level > 0 (before 2.77 the echo was independent of verbose 
	   level)
05.09.14 : maus.trn 1.18 : echo of command line call is now restricted to 
           verbose level > 0 (before 1.18 the echo was independent of verbose 
	   level)
06.10.14 : maus 2.78 : editing in comments; added option value 
           OUTFORMAT=legacyEMU identical to OUTFORMAT=EMU.
06.10.14 : maus 2.79 : extended Hungarian phoneme set by 
           /zz,ZZ,NN,FF,xx,dd_j,xx_j/
09.10.14 : maus 2.80 : added language AMERICANENGLISH (eng-US) with 
           HMM training and pronunciation rule set training based on AUSTALK.
           added rfc5646 language codes to option LANGUAGE (old iso639 pseudo 
           codes including 'sampa' are retained for backward compatibility
           until further notice).
           added option INFORMAT=bpf|bpf-sampa to replace iso639 pseudo
           code 'sampa'.
10.10.14 : maus 2.81 : replaced eng-US rule set by set trained on TIMIT.
15.12.14 : maus 2.82 : trained HMM Estonian on BABEL/PhED (HMMLEARN/ESTONIAN);
           trained Estonian rule sets on PhED, part SKK0;
           fixed some errors in conjunction with OUTFORMAT=legacyEMU in maus.web
           maus.trn and CHECK scripts; currently maus and maus.trn still support 
           EMU as well as legacyEMU; maus.web only supports legacyEMU.
23.12.14 : maus 2.83 : bug fix in default rule set for Estonian; it is still not 
           clear whether this bug may not appear again in other input contexts; 
           this might require some more tweaking...
08.01.15 : maus 2.84 : added provisional LANGUAGE=fin-FI; cloned HMM; no rule set;
           since there is no defined SAM-PA for Finnish, we use the festival 
           SAM-PA set.
02.02.15 : maus 2.85 : added error reporting if sox signal file conversion to
           16bit PCM, 1 channel fails.
06.02.15 : maus 2.86 : added option MODUS (default 'standard'); if set to 'bigram', maus runs
           a free phone recognition on the signal without BPF input. For this, a
           phone bigram lattice (option LATBIGRAM) and a compatible mapping table
           from the symbols used in the bigram to the HMMs (option DICTBIGRAM) 
           must be present (defaults are PARAM.<LANGUAGE>/DICT.bigram and LAT.bigram).
           OUTFORMAT is restricted to mau and TextGrid (tier MAU only!). Note 
           that WEIGHT influences the impact of the bigram.
09.02.15 : maus 2.88 : deprecated option CANONLY=true; now implemented as MODUS=align;
           for backwards compatibility reasons CANONLY=true still works (if MODUS 
           is not set), but a warning is being issued. Option MODUS=bigram
           overrules CANONLY (as before). 
23.02.15 : maus 2.90 : Trained Hungarian HMM (45) on the BEA corpus (min 100 instances
           per class); the remaining 21 models stay cloned models from the German
           HMM set. (Also replaced former cloned HMM for Hungarian in SUPERHHH set).
           Trained a rule set for Hungarian on a BEA corpus fragment (approx. 16000
           annotated words), prune=20, smoothing (3200 rules). The rules probably reflect
           mainly systematic differences between the phonological coding (G2P output)
           and the BEA transcription rules, e.g. G2P often predicts 'd_j' but BEA 
           consistently uses 'J-' in segmentation etc. 
           To do: discuss systematic differences and possibly improve G2P for Hungarian,
           then run rule set training again on BEA.
25.02.15 : maus 2.91 : Cleaning up silence modelling: since G2P now allows passing of 
           <...> in the transcription, it is possible to insert <nib> and <usb>  
           directly into the txt input where noise or human noise should be enforced.
           Since in many languages <p:> was modelled as an optional HMM (T-model), we 
           harmonize the usage of the silence HMMs for all languages: 
           <p:>, < and > as non-optional and # as optional silence model.
           Fixed bug in AMERICANENGLISH : the optional inter-word silence
           model '#' was not modelled by an optional HMM '<p:>' (T-model). Now '#' is
           truly optional.
           Fixed bug in DUTCH,ESTONIAN,FINNISH,GEORGIAN : 
             only optional silence model was applied even for 
             'real' silence intervals '<p:>', '<' and '>'.
           Fixed bug in NEWZEALANDENGLISH,POLISH,PORTUGUESE.EUROPE,SAMPA,SPANISH : 
             <p:> was optional T-model, now a real silence model.
02.03.15 : maus 2.92 : Technical change: moved rec2mau.awk from PARAM to SOURCE, since
           it does not need any language specific programming any more.
03.03.15 : maus 2.93 : Removed symbols N and J- from HUNGARIAN phoneme set; fixed buggy 
           phone alignment in BEA corpus; re-training of HMM and pronunciation model 
           HUNGARIAN
09.03.15 : maus 2.94 : Trained Georgian HMM on Corpus provided by Zakharia Pourtskhvanidze
           28 phonemes trained, 24 symbols cloned (mostly needed for foreign words)
11.03.15 : maus 2.95 : Changed default rule set for eng-AU to the newer set of Dec 2013, 
           (phonology derived by G2P) since the former rule set of Oct 2012 contained 
           some very strange rules that are probably caused by a faulty pronunciation dictionary 
           for eng-AU we used at that time. 
           Added MINNI service for ita-IT, eng-US, eng-AU
12.03.15 : maus 2.96 : Added MINNI service for hun-HU, ekk-EE
16.03.15 : maus 2.97 : Bug fix eng-* : the rule set used for eng-GB, eng-AU and eng-NZ 
           (= default rule set of eng-AU) caused a severe internal error in the program
           word_var when a rule was applied whose context probabilities add up to 1.0
           or more. This is a general weakness caused by the fact that MAUS uses a float 
           mantissa of only 7 digits when calculating log probs or probabilities. The same effect might 
           be observed in other languages as well (very rarely, though). 
           To fix this problem, the rule learning algorithm maus.learn now subtracts a 
           DISCOUNT value of 0.000001 from each conditional probability log(P(...)), to make 
           sure that the sum of probs is always less than 1.0.
           The default rule set for eng-AU (and other eng-* that point to that) is now
           trained on the AUSTALK corpus (95 speakers, 59 sentences each), with pruning 
           set to 20 and no smoothing and unlikely rules removed manually.
17.03.15 : maus 2.98 : Added new option OUTSYMBOL=sampa|ipa|manner|place to map phonetic 
           symbols in output (default: SAM-PA) to IPA (UTF-8) or IPA manner (vowel, plosive etc.)
           or IPA place of articulation (bilabial, dental, etc.). The mapping is derived from 
           tables PARAM.SAMPA/SAMPA.inv and SAMPA.dia from columns 3 (ipa), 7 (manner) and 8 (place).
           Note that applying OUTSYMBOL!=sampa is causing non-standard output in combination 
           of OUTFORMAT=mau|mau-append (BPF output), since BPF tier MAUS is only defined for SAM-PA.
           Deprecated option OUTIPA (still functional, but superseded by OUTSYMBOL!=sampa).
18.03.15 : maus 2.99 : re-calculated German phonotactic bigram model (MODUS=bigram) using DARPA
           backoff bigram language modelling with default discounting (HTK HLStats). The former
           bigram model was based on a non-discounted, non-backoff bigram model, which caused a 
           large proportion of bigrams to be effectively impossible (prob = 0). The now fixed bigram 
           is produced exactly with the same parameters and methods as in the other languages. 
           Bug fix: in MODUS=bigram numerical SAM-PA symbols in output had a leading 'P' - fixed
23.03.15 : maus 2.100 : added option value OUTFORMAT=par|PAR as aliases to option value 
           OUTFORMAT=mau-append; this is merely done because most users are not familiar with 
           the BPF tier concept.
           added option value OUTFORMAT=csv : this is equivalent to option value 'mau' (default),
           but the default output file name gets the extension 'csv' instead of 'mau'; this should 
           ease the use of simple table output of maus in spreadsheet software.
10.04.15 : maus 2.101 : extended phoneme set of Finnish by /d/ and /d:/.
           added converter mausbpf2emuR from MAUS output BPF (OUTFORMAT=par) *.par to 
           emu DB *_annot.json file.
           added wrapper mausbpfDB2emuRDB to create complete emu DB from MAUS output BPF collection.
22.04.15 : maus 2.102 : bug fix: in some places 'cvs' instead of 'csv' was coded.
24.04.15 : maus 2.103 : in mausbpf2emuDB incompatible level names (to legacyEMU) were used, fixed.
24.04.15 : maus 2.104 : added HMMs for eng-GB; PARAM.ENGLISH (which was a fake to AUSTRALIANENGLISH)
           is now obsolete; new is PARAM.BRITISHENGLISH
27.04.15 : maus 2.105 : language specific options (defined in PARAM.<lang>/DEFAULTS) are read, if
           the option value is 'default' or the empty string.
           Set global value for WEIGHT to 1.0 (was 7.0). 
29.04.15 : maus.trn 1.10 : check for missing KAN/TRN tier, negative times or negative word numbers 
           in TRN tier before starting processing and issue proper error messages
06.05.15 : maus 2.106 : eng-GB : new rule set trained on AIX-MARSEC corpus with prune=10 and nosmooth
13.05.15 : maus.trn 1.11 : added pre-test to check TRN entries for impossible short chunks = chunks 
           that contain more phonemes than can fit into the speech signal, assuming that each 
           phoneme has a minimum duration of 20msec. In that case maus.trn throws an error before starting
           the segmentation.
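The pre-test amounts to a simple duration budget: a chunk is rejected when its phoneme count multiplied by the assumed minimum phone duration exceeds the chunk length. A minimal Python sketch of that arithmetic, assuming the chunk length is given in samples together with the sample rate; the function name and parameters are hypothetical and not taken from maus.trn (maus.trn 1.14 later raised the assumed minimum to 30msec, hence the parameter).

```python
def is_impossibly_short(n_phonemes: int, n_samples: int, sample_rate: int,
                        min_phone_ms: int = 20) -> bool:
    """Return True if n_phonemes cannot fit into a chunk of n_samples.

    Hypothetical helper illustrating the maus.trn pre-test: every phoneme
    is assumed to last at least min_phone_ms milliseconds.
    """
    # Compare in integer arithmetic to avoid float rounding:
    # (n_phonemes * min_phone_ms) ms  >  (n_samples / sample_rate) s
    required = n_phonemes * min_phone_ms * sample_rate
    available = n_samples * 1000
    return required > available


# A 0.5 s chunk at 16 kHz (8000 samples) holds at most 25 phonemes of 20 ms.
print(is_impossibly_short(25, 8000, 16000))  # False
print(is_impossibly_short(26, 8000, 16000))  # True
# With the stricter 30 ms estimate only 16 phonemes fit.
print(is_impossibly_short(17, 8000, 16000, min_phone_ms=30))  # True
```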
08.06.15 : maus 2.107 : fixed a very rare bug: if the signal is really bad, the Viterbi aligner may 
           skip an entire word, if the word is composed of just one phoneme. This leads to a gap in the 
           word order of the output BPF which is formally ok, but most tools (including the par2TextGrid)
           expect a consecutive order of word numbers. Hence the TextGrid output might be wrong in such a 
           case. Fixed by changing par2TextGrid to calculate the number of words from the segments and 
           not from the maximum link number.
23.06.15 : maus 2.108 : added language Swiss German (gsw-CH); HMM partially (40/84) trained on 
           ETH Zuerich corpora (thanks Volker Dellwo); missing phonemes cloned from 
           other languages; no pronunciation model.
26.06.15 : maus 2.109 : added MODUS=bigram (MINNI) to LANGUAGE gsw-CH based on phonetic 
           segmentations in TEVOID etc corpora of ETH Zuerich.
01.07.15 : maus 2.110 : set default RULESET for LANGUAGE gsw-CH to a phonological rule set 
           SwissGerman.nrul that reflects possible effects caused by other Swiss German 
           dialects than Zuerich; since the rule set has no probabilities, all variants have the 
           same probability (experimental) 
07.07.15 : maus 2.111 : Bug fix in LANGUAGE=kat-GE : input files *.par with /ts_>/ or /tS_>/ produced 
           an empty result due to a mismatch in the Georgian PARAM set (DICT) 
13.07.15 : maus 2.112 : follow-up bug to version 2.107 : there was another bug in one of the AWK helpers 
           of par2TextGrid, causing par2TextGrid to stop at a word that has no phoneme assigned -> fixed
22.07.15 : maus 2.113 : extended gsw-CH HMM set by a few additional trainable symbols and a virtual
           symbol /kx/; updated MODUS=bigram service as well.
01.09.15 : maus 2.114 : Swiss German (Dieth): deleted pronunciation rule *-e-# > *-@-# because this 
           is now the default pronunciation produced by the Dieth variant of G2P.
11.09.15 : maus 2.115 : Bug fix - when using chunk segmentation mode (USETRN=true with more than one TRN
           tier in input BPF) and option NOINITIALFINALSILENCE=true, the resulting segmentation was corrupt
           in all OUTFORMATS - fixed.
02.10.15 : maus 2.116 : added language support Russian rus-RU (thanks to Daniil Kocharov 
           and Alexander Belyy)
19.10.15 : maus 2.117 : added language support French fra-FR (thanks to Nina Pörner & Uwe Reichel)
15.12.15 : maus 2.118 : gsw-CH added pronunciation rules -{-r>-E-r and -{:-r>-E:-r (thanks to 
           Hanna Ruch, University of Zurich)
18.12.15 : maus.corpus 1.19 : bug fix USETRN=force was not passed to the maus script and caused an error
           maus 2.119 : KAN tier may contain optional white spaces in regular languages (before: only required
           in LANGUAGE=sampa); this allows users to use KAN strings in input BPF that were created 
           with separated phonemic symbols. Test phase only for LANGUAGE=deu.
12.01.16 : maus 2.120 : bug fix in ITALIAN : due to an unsorted phoneme inventory in GRAPHINVENTAR the rule 
           application was buggy; instead of using a replacement rule such as #,s,a>#,ts,a where a word-initial
           /s/ is replaced by the affricate /ts/, a /t/ was inserted before /s/. 
13.01.16 : maus 2.121 : extended optional white spaces (see 2.119) to all languages.
           MODUS=bigram support (MINNI) for LANGUAGE=fra-FR.
22.01.16 : maus.trn 1.13 : changed error message for impossible short chunk: 
           now the starting sample of the chunk is reported.
29.01.16 : maus.trn 1.14 : changed minimum estimated duration per phone HMM to 30msec
           for impossible short chunk check. This causes fewer HVite errors where no result is calculated
           because the signal does not fit into the pronunciation model (which is an awkward error message!)
18.02.16 : maus.trn 1.15 : bug fix: if option OUTFORMAT=par|mau-append and option OUT= was set to the input BPF 
           (effectively the same as leaving OUT empty), the input BPF was incomplete and contained only the MAU 
           tier -> fixed 
29.02.16 : maus 2.122 : added virtual symbol /{:u/ to SAMPA language set and gsw-CH language set
02.03.16 : maus 2.123 : added virtual symbols /A:/ /Ai/ and clone /6.deu/ to gsw-CH language set
03.03.16 : maus 2.124 : added LANGUAGE option values gsw-CH-BE ... gsw-CH-ZH, all pointing to PARAM.SWISSGERMAN
03.03.16 : maus 2.125 : bug fix: option PRINTINV did not work for LANGUAGE=gsw-CH* 
25.04.16 : maus 2.126 : extended par2TextGrid helper for handling syllabic tiers (such as MAS) created by 
           webservice Pho2Syl
28.04.16 : maus 2.127 : introduced new OUTFORMAT=emuR producing an Emu compatible *_annot.json file
29.04.16 : maus 2.128 : bug fix in MODUS=bigram: leading/trailing segments '!ENTER'/'!EXIT' are now 
           correctly labelled as '<p:>'.
27.07.16 : maus 2.129 : changed deprecated options sox -s -2 into sox -e signed-integer -b 16 to avoid 
           sox warnings 
03.08.16 : maus 2.130 : due to a very nasty bug in the UNIX job control the command 'cut' cannot be used
           reliably in parallel called scripts (as we do it on the webMAUS server). To avoid these problems
           all usage of 'cut' is replaced by 'awk' in the maus script and all helper scripts
04.08.16 : maus 2.131 : follow up to 2.130 : set interpreter from /bin/csh to /bin/tcsh because we found that 
           in Ubuntu the bug does not appear in the tcsh, only in the csh (?)
31.08.16 : maus 2.132 : helper mausbpfDB2emuRDB creates emuDB in directory named <emuTemplateName>_emuDB 
           instead of <emuTemplateName>; the ZIP file name remains the same <emuTemplateName>.zip
01.09.16 : maus 2.133 : pol-PL replaced cloned HMM set by (partially) trained HMM on CLARIN-PL-STUDIO corpus
           (thanks to Danijel Korzinek); added pol-PL MINNI support. 
           added eng-UK MINNI support.
19.09.16 : maus 2.134 : added SAMPA symbol /pS_j/ as clone of /tS_j/ to Russian SAMPA set,
           added MINNI service for language rus-RU
22.09.16 : maus 2.135 : bug fix : when running with USETRN=force (pre-segmentation enforced) and with input 
           signals that in fact have energy to the very last sample, the script issued a misleading warning
           from the sox trim operation that had no effect on the (valid) output -> misleading warning removed.
10.10.16 : maus 2.136 : added option RELAXMINDUR=false; when set, maus relaxes the minimum duration per segment to 
           10msec for short/lax vowels and consonants, and to 20msec for other vowels and diphthongs; note that 
           this mode is operational and often leads to impossibly short vowel and glottal segments; however,
           for investigations that target a certain consonant class only, setting this option might prevent
           the ceiling effect in the measured duration distribution at 30msec.
           added option BPFTHRESHOLD=10000; if a BPF input file contains more KAN: lines than this threshold, 
           the script exits with an error message, because it is unlikely that the script will return a reasonable
           result in a manageable time (caused by the quadratic increase of processing time with length). 
           added option GETBPFTHRESHOLD=FALSE; if set, the script will return a single number BPFTHRESHOLD to stdout.
25.10.16 : maus 2.137 : set BPFTHRESHOLD=3000 after consultations with Nina Pörner.
27.10.16 : maus 2.138 : added BPF=file.csv input; file.csv is a two-column, ';'-separated spreadsheet CSV table
           with UTF-8 orthography in the 1st and pronunciation encoding in the 2nd column; other extensions than 
           par|PAR|csv|CSV are not accepted any more.
07.11.16 : maus 2.139 : the pre-validation on BPFTHRESHOLD (see 2.136) prevented large BPF input files with chunk 
           segmentation from being processed (since the *total* number of words in KAN was validated). We changed this,
           so that each chunk is pre-validated individually: when USETRN=true and at least one chunk in the BPF 
           input file has more than BPFTHRESHOLD words, an ERROR is thrown. 
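The difference between the old whole-file check and the new per-chunk check can be illustrated with a small Python sketch. It assumes the word links of each TRN chunk have already been extracted into lists; the function name and data layout are hypothetical, and the default threshold 3000 is the BPFTHRESHOLD value set in 2.137.

```python
def validate_chunks(chunks: list[list[int]], threshold: int = 3000) -> None:
    """Raise ValueError if any single chunk links more words than threshold.

    Hypothetical sketch of the per-chunk check of maus 2.139: only the
    word count of each chunk is compared, not the total over the file.
    """
    for i, word_links in enumerate(chunks):
        if len(word_links) > threshold:
            raise ValueError(
                f"chunk {i} has {len(word_links)} words (> BPFTHRESHOLD={threshold})"
            )


# Two chunks of 2500 words each: 5000 words in total would have failed the
# old whole-file check, but each chunk passes the per-chunk check.
chunks = [list(range(0, 2500)), list(range(2500, 5000))]
validate_chunks(chunks)  # no error raised
```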
10.11.16 : maus 2.140 : helper mausbpfDB2emuRDB extended to accept *_annot.json instead of *.par as input;
           this allows to build an emuDB based on already created *_annot.json files.
22.11.16 : maus 2.141 : changed PARAM dirs naming and structure:  a language specific parameter dir is now 
           named 'PARAM.<rfc5646>' (e.g. 'PARAM.eng-AU') or 'PARAM.<iso639-3>' (e.g. 'PARAM.eng'); the latter are 
           usually just copies of a rfc5646 directory (e.g. 'eng' is a copy of 'eng-GB'). LANGUAGE codes
           'aus' and 'nze' are not supported any longer; 'sampa' is used for the language independent mode. 
01.12.16 : maus 2.142 : language pol-PL: rebuilt HMMs and statistical model (LAT) for MODUS=bigram processing, 
           because training corpus has been improved.
           trained rule set from CLARIN-PL Studio corpus: since the corpus was transcribed half-automatically,
           it is not quite clear whether the learned rules really model processes in Polish or rather
           systematic differences between the phonological form produced by G2P and the way the corpus 
           has been segmented; however the rule sets look quite reasonable. Setting the default rule set to 
           POLISH.smooth.prune20.rul (531 derived from 136 basic rules with minimum occurrence of 20); other 
           available rules sets are: 
           POLISH.smooth.prune5.rul
           POLISH.nosmooth.prune5.rul
           POLISH.nosmooth.prune20.rul
           POLISH.smooth.prune50.rul
           POLISH.nosmooth.prune50.rul
           chunker 0.1 : new service 'Chunker' added to the MAUS package (thanks to Nina Poerner); 
           the tool is called by the command 'chunker'; the software, benchmarks and 
           data reside in the subdir 'Chunker'.
           Added alias 'emuDB' for OUTFORMAT=emuR.
13.12.16 : maus 2.143 : bug fix: maus reported an error when a word was modelled by a single '<p:>'
           in the BPF input, KAN tier. In fact since version 2.90 all languages model '<p:>' not as a
           T-model any more, so a single '<p:>' is allowed. What is not allowed is a word modelled by a 
           single '#' or '&' model, since these are skippable (optional) T-models of silence which can only 
           be used within words.
14.12.16 : chunker 0.2 : bug fixes: the word-based recognition did not use the trained bigram
           but rather a uni-gram model which led to very poor chunk segmentations e.g. in French,
           the signal was not re-sampled to 16kHz before ASR (leading to slightly decreased ASR rates),
           the energy feature of the HTK ASR frontend was not normalized (leading to very bad 
           ASR rates on weak signals); 
           added new method based on a factor automaton ('force', experimental).
15.12.16 : maus 2.144 : added LANGUAGE Maltese support; only forced alignment using cloned HMMs; SAM-PA 
           set defined by Ruben van de Vijver.                   
16.12.16 : maus 2.145 : experimental feature (LANGUAGE=deu-DE only!): unknown tags '<...>' are modeled
           by non-optional silence; this allows passing arbitrary tags to the ORT/KAN tier output.
           chunker 0.3 : bug fix: input and output file can be the same.
           chunker 0.4 : bug fix: error codes were not passed through (always 0)
22.12.16 : maus.trn 1.17 : bug fix: could not process into OUTFORMAT=emuR|emuDB (*_annot.json).
           removed BPFTHRESHOLD=9999999999 in maus.trn internal MAUS calls.
03.01.17 : maus 3.0 : major upgrade
           * moved this file and other documentation into sub-dir 'DOCU'.
           * bug fix fin-FI : the SAMPA symbol /d:/ was wrongly defined as IPA /b:/; HMM set (link to 
             SUPERHMM.mmf) missed HMMs that were used in the DICT mapping (D.use).
           * added language Catalan cat-ES
           * new design rules regarding HMM sets and phonemic/phonetic symbols for all languages
             1. KANINVENTAR (allowed SAM-PA symbols in KAN input)
                *must* contain the symbols '<p:>' '<nib>' '<usb>' '<' '>'; if a language
                requires SAMPA /P/ (labiodental approximant) use the alternate symbol /v\/.
             2. GRAPHINVENTAR (symbol set for internal processing)
                *must* contain all symbols of KANINVENTAR and the symbol '#'; 
                symbols with leading numerals *must* be masked with 'P' (e.g. /P6/, /P2:I/); 
                symbols with trailing '\' *must* be replaced by symbols with trailing '-' 
                (e.g. /r\/ -> /r-/); GRAPHINVENTAR *must not* contain the symbol 'P'.
             3. DICT (mapping from symbols to HMM) *must* contain the mappings 
                # #
                <p:> #
                < <
                > > or > < (in case we have only one non-optional silence HMM, see point 4)
                All symbols in 1st column *must* match GRAPHINVENTAR; all symbols in 2nd column 
                *must* match HMMINVENTAR (and therefore MMF.mmf). HMM names (2nd column) can be chosen
                arbitrarily (e.g. a:.deu-AT)
             4. MMF.mmf (and HMMINVENTAR) *must* contain
                - an optional silence HMM (t-model) named '#'
                - two non-optional silence HMMs named '<' and '>', or
                  one non-optional silence HMM named '<'
             The helper script kan2mlf.awk and rec2mau.awk are responsible for the mapping from 
             KANINVENTAR (input) to GRAPHINVENTAR, and for the mapping from HMMINVENTAR to 
             phonetic output. The helper script check_param_sets can be used to check all
             language sets for accordance to these rules.
           * add the possibility to use backslash in input symbols (e.g. /h\/).
             Up to now MAUS did not accept backslash and languages that require the usage of X-SAMPA 
             symbols (e.g. /J\/) were only accepted as /J-/, requiring for instance G2P to map these 
             symbols for MAUS input. This caused changes to individual languages:
              eng-US now accepts /h\/ (X-SAMPA) as well as /h-/ (MAUS internal symbol)
              hun-HU now accepts /J\/ (X-SAMPA) as well as /J-/ (MAUS internal symbol)
              eng-NZ now accepts /r/ and /r\/ as input (modelled by the same acoustic NZE model, though)
              eng-AU now accepts /r/, /R/ and /r\/ as input (modelled by the same acoustic AE model, though)
              in analogy all X-SAMPA symbols in the language-independent set with trailing backslash 
              are now recognized by MAUS (and the old form with '-'!).
           * added chunker benchmark deu-DE short to maus benchmark CHECK/maus.checklist.
           * added new option INSYMBOL=sampa|ipa that allows IPA symbols instead of SAMPA/X-SAMPA
             in input files.
           * deprecated option INFORMAT.
           * BPFTHRESHOLD is now compared to number of KAN lines and number of word links in single TRN
             line, if USETRN=true; this allows correct pre-validation of chunks in maus.trn, and the 
             BPFTHRESHOLD=9999999999 in maus.trn internal MAUS calls can be removed.
           * re-worked and harmonized (across languages) the modelling of silence and noise:
             Automatic modelling:
             Maus will automatically insert optional silence models (HMM '#') between words
             (see option MINPAUSLEN) and output these as 'detached' silence 
             segments '<p:>' (with word number -1) if they exceed MINPAUSLEN times 10msec.
             The same is true for utterance initial/final silence, but these are modelled 
             non-optional (HMMs '<' and '>'), and therefore have a minimum length; to 
             suppress this use NOINITIALFINALSILENCE=true.
             
             Manual modelling:
             Intra-word silence intervals can be modelled by inserting the symbols
             '<p:>' (optional silence) or '<' (enforced silence) 
             in the canonical input string ('#' in the phonological input will be ignored
             because in some phonological forms it marks a compound boundary! This is not the 
             case for option KANSTR, though!); e.g. /ba:n<p:>hof/ will model an optional
             silence interval between /n/ and /h/; in the MAUS output these models appear
             as '<p:>' segments (or do not appear at all). Intra-word silence intervals are always linked
             to the word number in which they appear.
             If an optional '<p:>' is the only symbol within a word, it will be modelled 
             by a non-optional silence model (HMM '<') because HTK cannot model words
             that consist only of a t-model; it will appear as a single segment '<p:>' linked
             to that 'silence word'. 
             It is allowed to model a 'silence word' as /</ or /<...>/ (where '...' is an arbitrary 
             string without blanks, but not one of 'usb' or 'nib') in the KAN
             input tier; both will model a non-optional silence model and both will produce 
             a '<p:>' in the phonetic output that has a word link, and the 'word' appears 
             as a numbered word in the ORT/KAN tiers (see TAGS PASSING below).
             
             To summarize: 
             ('#' symbolize word boundaries here,
              '<' '>' utterance begin/end)
             
             KAN input       MODEL            ORT/KAN OUTPUT  MAU OUTPUT
             #<nib>#         non-human noise  '<nib>'         segment /<nib>/ with word number
             #<usb>#         human noise      '<usb>'         segment /<usb>/ with word number
             #<...>#         silence word     '<...>'         segment /<p:>/ with word number
             #...<nib>...#   non-human noise  '...<nib>...'   segment /...<nib>.../ with word number
             #...<usb>...#   human noise      '...<usb>...'   segment /...<usb>.../ with word number
             #...<...#       non-optional sil '...<...'       segment /...<p:>.../ with word number
             #...<p:>...#    optional sil     '...<p:>...'    segment /...<p:>.../ with word number or deleted
             #               (word boundary)  -               segment /<p:>/ with word number -1 or deleted
             <               (initial sil)    -               segment /<p:>/ with word number -1
             >               (final sil)      -               segment /<p:>/ with word number -1
             (the last three lines are not possible inputs, but are modelled automatically!)
           * added tags passing feature
           Unknown tags '<...>' given as words (not embedded in other symbols!) in the input 
           KAN tier are modeled by non-optional silence; this allows arbitrary tags to be passed to 
           the ORT/KAN tier output of MAUS, e.g. a speaker ID etc.
           To pass such tags through G2P from the orthographic input use the g2p.pl option -com yes.
03.01.17 : maus 3.1 : suppressed warnings from helper programs, unless debug level is v > 0  
04.01.17 : maus 3.2 : added MINNI service for cat-ES
05.01.17 : maus 3.3 : replace provisional parameter set for spa-ES by new set trained 
           on GLISSANDO News corpus (thanks to Juan Maria Garrido for providing GLISSANDO
           and Bernhard Jackl for the MAUS training).
           added MINNI service for spa-ES
05.01.17 : maus 3.4 : fixed KANINVENTAR.inv for languages spa and cat,
           added some clones to the spa-ES and cat-ES HMM set that are more suitable 
           chunker 0.5 : 
           - Fixed a bug that led to a seg fault in cases where the KAN
             tier's first entry is a tag or pause.
           - Moved this History section to a separate file HISTORY.
           - Help page now lists all available languages.
           - Set MAXNUMTHREADS option default to 1 in master.config.
13.01.17 : maus 3.5 : [internal: setting SOURCE variable automatically;
           this requires the scripts to reside in the installation dir; 
           symbolic links to these scripts work fine; changed the pre-installation
           script mk_distribution and the makefile in 
           dist:/share/local/sources/ips_utils/makefile] 
19.01.17 : maus 3.6, maus.trn 1.18 : if option USETRN was set to true and the input
           BPF contained a TRN tier with more than 2 lines, the script did not remove 
           temporary files in TEMP (CLEAN=0 instead of default CLEAN=1).
           maus 3.7 : revised version of mlt-MT phonetic symbol set (now 70) 
           to work with revised version g2p.pl 1.54. Patch of wrong HMM entry in 
           PARAM.rus-RU/DICT.bigram that prohibited the usage of MINNI modus for 
           Russian.
           Added option value 'bpf' for OUTFORMAT as a synonym for 'par' to 
           conform to BALLOON services (that use 'bpf').
26.01.17 : maus 3.8 : changed top level name in emuDB output (*_annot.json) 
           from 'utterance' to 'bundle' to conform with EMU-SMDB nomenclature.
27.01.17 : maus 3.9 : bug fix: if OUTFORMAT=emuDB and the input BPF contained
           blank separated phonetic symbols in the KAN tier ('bpfs' style), then 
           in the output *_annot.json file the label 'cano' contained only the 
           first phonetic symbol of the KAN tier -> fixed. 
03.02.17 : maus 3.10 : bug fix in LANGUAGE=eng-AU and MODUS=bigram : MINNI service 
           did not work, caused by a buggy setting in DICT.bigram -> fixed
07.02.17 : maus 3.11 : there were complaints that the last segment delivered by maus
           does not end exactly at the end of the signal file; although this is not 
           a requirement for all annotation formats that maus produces, we 
           implemented a fix, so that the last segment delivered by MAUS always ends
           exactly at the end of the signal.
16.02.17 : maus 3.11 : changed level/attribute names in emuDB output: 'word' -> 'ORT', 
           'cano' -> 'KAN', 'phonetic' -> 'MAU'; the idea is that names that consist
           of three capital letters are syntactically defined in the BPF standard, see:
           http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html#Partitur
24.02.17 : chunker 0.10 (see Chunker/HISTORY)
           maus 3.12 : added missing SAMPA symbol /N/ for hun-HU
28.02.17 : maus 3.13 : patched missing input symbol /4/ for spa-ES
14.03.17 : maus 3.14 : bug fix: in LANGUAGE=mlt-MT the phonemes /r/ /tts/ and /hh/ did 
           not work in input.
10.04.17 : maus.pipe 1.0 : wrapper for 'pipeline' services, WARNING: Alpha.
13.04.17 : maus.trn 1.19 : check and removal of temporary files improved; added missing OUTFORMAT=bpf
13.04.17 : maus.pipe 1.2 : restructured tool; added pipe 'G2P_CHUNKER_MAUS'
17.04.17 : maus.pipe 1.3 : enforce CHUNKER parameter insymbols='sampa' for pipe 'G2P_CHUNKER_MAUS'; added PHO2SYL pipe service
           maus.pipe 1.4 : extended to all possible pipes; re-worked CMDI option descriptions; improved error reporting about missing options
18.04.17 : maus 3.15 : bug fix: script attempted to remove existing temporary files in $TEMP, 
           if they existed; this failed if the temp file is owned by a different user, even if the 
           temp file has full rights (666); now the script does not remove the temp file but simply tries to 
           overwrite it (which always works for 666 rights).
18.04.17 : maus.pipe 1.5 : several bug fixes; changed temporary file handling
19.04.17 : maus.pipe 1.6 : removed pipe CHUNKPREP_G2P; default test for rate on rate=1
20.04.17 : maus.pipe 1.7 : added pre-check of mandatory parameter OUTFORMAT; the user now gets an error
           message *before* the pipe starts if the last service does not support OUTFORMAT; that way
           the user does not have to wait until the end of the pipe to get an error.
21.04.17 : maus 3.16 : added language Basque eus eus-ES eus-FR (all the same model): clone model 
21.04.17 : maus.pipe 1.8 : bug fix: if PHO2SYL is asked to produce a TextGrid (which it can do), 
           the maus.pipe script issues an error -> fixed.
26.04.17 : maus.pipe 1.9 : fixed several small bugs concerning error reporting; added pre-check for 
           compatibility of TEXT input file extension to first service in pipeline; synchronized options
           tier_G2P (tgitem) and tier_CHUNKPREP (tier) into a common option InputTierName;
27.04.17 : maus.pipe 1.10 : deprecated parameter 'rate'; sample rate is now determined from input SIGNAL;
           since we only allow pipelines that require a SIGNAL, this is a convenient way to get rid 
           of a parameter most users don't understand anyway.
01.05.17 : maus 3.17 : extended helper mausbpfDB2emuDB so that either *.wav|nis|nist|sph can be input instead 
           of only *.wav; this can be relevant in webservices that batch process a mixture of *.wav and 
           NIST/SPHERE signal files.
08.04.17 : maus 3.18 : suppress warning that occurs if the file given in parameter OUT coincidentally is the 
           same file that MAUS uses to create a *_annot.json file internally and then moves to OUT; 
           extended BPF-to-emuR conversion (scripts mausbpfDB2emuDB, mausbpf2emuR) to handle optional 
           syllable tiers KAS/MAS; this is not necessary for conversion of maus output, but maus.pipe
           uses this conversion for the pipeline-final PHO2SYL service, and the web API uses it to build 
           the emuDB ZIP after batch processing; option emuRtemplate of script 
           mausbpfDB2emuDB replaced by option emuRDBname=<baseNameEmuDB> (the old option is still recognized
           for backward compatibility); Config template is now in-document; removed template 
           from package.
11.05.17 : maus 3.19 : changed the way MAUS parses the phonological input (KAN: tier): up to now we
           assumed that the KAN tier is encoded in SAMPA, which implies left-right parsable phoneme
           sets. We also allowed blank separated KAN strings where each phoneme symbol is already 
           separated by a blank to allow language independent SAMPA; technically we deleted all blanks
           from the input and then parsed left-right. With the advent of more and more
           languages that do not have a SAMPA definition (e.g. eng-SC) and at the same time do not allow
           left-right parsing, we decided to make this somewhat sloppy convention more strict: if the 
           KAN tier contains a 4th column (the glutinated KAN string is usually in the third), we assume
           that the phonological input is blank separated, and we do not try to left-right parse it,
           but simply check whether each blank separated symbol is part of the language phoneme set.
           Consequently, some languages in MAUS can only be processed with blank separated input (but this 
           is not a major issue, since G2P outputs blank separated KAN tiers by default anyway). Faulty 
           encodings where the KAN input is basically not blank separated but for some reason there are 
           two 'words' in the KAN tier line (such as Italian 'sedia' /sedj a/) are no longer supported.
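           Illustration (not MAUS code): a minimal Python sketch of the two parsing paths described
           in the 3.19 entry. The greedy longest-match strategy for glued strings, the function name
           parse_kan_line and the tiny EXAMPLE_INVENTORY are assumptions chosen for this example;
           the real phoneme sets are defined per language in the PARAM.<lng> directories.

```python
# Hypothetical inventory for illustration only; real sets come from PARAM.<lng>.
EXAMPLE_INVENTORY = {"b", "a:", "a", "n", "h", "O", "f", "s", "e", "d", "j"}


def parse_kan_line(line, inventory):
    """Split a BPF 'KAN:' line into phoneme symbols (sketch, not MAUS code)."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "KAN:":
        raise ValueError("not a KAN line: " + line)
    symbols = fields[2:]
    if len(symbols) > 1:
        # A 4th column exists: treat input as blank separated and only validate.
        unknown = [s for s in symbols if s not in inventory]
        if unknown:
            raise ValueError("unknown symbols: " + " ".join(unknown))
        return symbols
    # One glued string: assumed greedy longest-match, left-to-right parse.
    text = symbols[0]
    longest = max(len(s) for s in inventory)
    result, i = [], 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + n] in inventory:
                result.append(text[i:i + n])
                i += n
                break
        else:
            raise ValueError("cannot parse %r at position %d" % (text, i))
    return result
```

           With this sketch the Italian example /sedj a/ takes the blank separated path and is
           rejected, because 'sedj' is not a single symbol of the phoneme set.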
11.05.17 : maus 3.20 : added a primitive 'used options protocol' for 
           OUTFORMAT=par|mau-append|legacyEmu|emu|EMU|emuR|emuDB : in case of some form of BPF output,
           a header entry 'MAO:' is added to the BPF header containing a list of named value pairs 
           'OPTION=value'; in case of a legacy Emu file, the options are added as labels to the top
           level bundle; in case of an emuDB _annot.json file a list of named value pairs 'OPTION=value'
           is stored in a level attribute 'MAO' to level 'bundle'.
12.05.17 : maus 3.21 : bug fix: the deu-DE phoneme set was lacking the 4 affricates /ts,tS,dZ,pf/ (which are 
           in the German SAMPA definition by Wells); the reason this never caused a problem is that the sloppy
           way MAUS parsed the KAN input string (before 3.19) simply split those into single phonemes. 
           Now we added those 4 affricates to the deu-DE HMM set.
18.05.17 : maus 3.22 : added some filters to remove LF from the input BPF file to avoid unexpected effects 
           when these are transferred to output formats other than BPF.
31.05.17 : maus 3.23 : bug fix in mausbpf2emuDB : blanks were deleted in label strings before transferring to 
           _annot.json; now white space sequences are condensed to a single blank in the label string; 
           backslashes were not masked in _annot.json files; now they are masked as required.
           Some languages had incomplete KANINVENTAR.inv lists (phones with trailing backslashes were 
           missing, this did not affect MAUS processing).
           Extended helper mausbpf2emuR so that flat hierarchy (only phonetic tier as from MINNI output) is 
           processed correctly; extended mausbpfDB2emuDB to handle flat hierarchies and added schema check
           for DBConfig.
           Added maus OUTFORMAT=bpf|par|emuDB|emuR for MODUS=bigram (minimal header with SAM entry), so that MINNI 
           can produce emuDB as other services.
           Languages mlt, eus, spa had some minor problems with optional silence modelling.
01.06.17 : maus.pipe 1.13 : added pipeline MINNI_PHO2SYL
02.06.17 : maus 3.24 : MODUS=bigram : until this version an input BPF was ignored. Now if a BPF is given, the 
           MINNI result is added to the input BPF (possibly replacing an existing MAU tier) as in the other modi.
           This works only for OUTFORMAT=par|bpf|mau-append.
           Added option createDBConfigOnly=<path+name> to helper mausbpfDB2emuDB; if set, the script only writes 
           the DBConfig.json file to <path+name> and exits.
07.06.17 : maus.pipe 1.14 : G2P option 'com' was not implemented
09.06.17 : maus 3.25 : changed validators for _annot.json and _DBconfig.json in helpers mausbpf2emuR and
           mausbpfDB2emuDB to more stable servers and set option validate=true as default. 
           bug: temporary dirs $TEMP/$PID__BPFDIR were not cleaned up -> fixed
           maus.pipe 1.15 : PIPE=MINNI_PHO2SYL : the MAS tier was not converted into the _annot.json file
           correctly; currently maus will pass on existing tiers in the input BPF and just add/replace the 
           MAU tier; therefore in a PIPE (which requires a TEXT input) where the BPF input already contains 
           ORT/KAN etc. tiers, the result will be a BPF that has partly hierarchical tiers and partly not
           (MAS from MINNI and consequently a MAS without links); this poses a problem when converting to 
           emuDB output; in this version MAU and MAS without links will be converted, but any other tiers in 
           the input BPF will not.
22.06.17 : maus.pipe 1.16 : MAUS option INSYMBOL was missing; this caused the PIPE=MAUS_PHO2SYL to report unknown 
           phoneme symbols in input when INSYMBOL=ipa
22.06.17 : maus 3.26 : changed OUTFORMAT=csv : instead of simply producing the MAU tier in a file with extension 
           '*.csv', now a real CSV spreadsheet file with ';' separated 6 columns is created: 
           ORT;KAN;MAU;TOKEN;BEGIN;DURATION
           Note that the structure is fixed, even if a column is empty (e.g. in MODUS=bigram where no info 
           regarding ORT or KAN are in the output); segments with token number -1 (not linked) have empty fields
           ORT and KAN.
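           Illustration (not MAUS code): a Python sketch of the 6-column layout introduced in 3.26,
           written with the standard csv module and ';' as delimiter. The header row, the shape of the
           segments/words arguments and the function name segments_to_csv are assumptions; BEGIN and
           DURATION are copied through in whatever unit the caller supplies. Note that this layout was
           replaced by an 11-column table in maus 5.0.

```python
import csv
import io

HEADER = ["ORT", "KAN", "MAU", "TOKEN", "BEGIN", "DURATION"]


def segments_to_csv(segments, words):
    """Write MAU segments as ';' separated rows (illustrative sketch).

    segments: iterable of (token, mau_label, begin, duration)
    words:    dict mapping token number -> (ort, kan)
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(HEADER)
    for token, mau, begin, duration in segments:
        # Unlinked segments (token -1) have no word, so ORT and KAN stay empty.
        ort, kan = words.get(token, ("", ""))
        writer.writerow([ort, kan, mau, token, begin, duration])
    return buf.getvalue()
```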
23.06.17 : callHavenOnDemandASR 1.1 : wrapper to perform automatic transcription via HPE ASR
           callHavenOnDemandASR 1.2 : APIKey can be given as empty string on command line, then the internal 
           default APIKey is used; introduced check of signal length less than 30min; bug in file size check:
           1MB > 1GB.
27.06.17 : maus 3.27 : added acoustic models for nld-NL and nld-BE based on phonetic transcripts from the CGN corpus (NL + VL).
           callHavenOnDemandASR 1.3 : longer files did not receive results due to bug in polling routine: fixed
           Note: each poll costs API units; therefore I built in some 30sec delays to avoid larger costs.
28.06.17 : maus 3.28 : added basic language support for Romanian ron-RO 
28.06.17 : maus 3.29 : fixed missing three lines in PARAM.ron-RO/KANINVENTAR.inv
30.06.17 : maus.trn 1.21 : changed code (same functionality) to tolerate input files with hashes ('#') in the file name
04.07.17 : maus 3.30 : added full MAUS and MINNI service for nld-NL based on CGN corpus
           added acoustic modelling and MINNI service for nld-BE based on CGN corpus
05.07.17 : maus.pipe 1.17 : added mapping file maus.pipe.G2P.mapping that allows pipes with mixed language settings:
           If MAUS supports a language, but BALLOON does not, the alternate language for BALLOON tools is read from this 
           mapping file
           maus 3.31 : added full support for nld-BE: pronunciation model is trained with reference to nld-NL, so that 
           systematic deviations from nld-NL to nld-BE are covered by MAUS; call G2P with lng = nld-NL for nld-BE 
14.07.17 : maus 3.32 : added basic support (forced alignment) for Australian Aboriginal Languages (aus-AU) 
           Unsolved problem: the input symbol 'r\`' cannot be processed; therefore input BPF containing this symbol fail.
16.07.17 : maus 3.33 : changed utterance initial/final silence modelling: models are now optional silence 
           models (HMM #...); added explicit silence model <p> to replace the awkward usage of '<' as non-optional
           silence model; retain option NOINITIALFINALSILENCE=true to suppress even the optional models.
18.07.17 : maus 3.34 : fixed bug in rec2mau.awk that caused zero length initial/final silence intervals *not* to be
           suppressed (they had a minimum length of 1 frame); now zero length silence intervals are suppressed. 
19.07.17 : runASR 1.1 : added OUTFORMAT=emuDB
20.07.17 : maus 3.35 : added LANGUAGE=eng-SC; re-calculated cloned HMM for por-PT, set schwa-elision rule set as default
           for por-PT
21.07.17 : maus.pipe 1.18 : changed handling of 'mixed language pipes': up to now only G2P changed its language option 
           depending on the mapping in maus.pipe.G2P.mapping. But since the same problem can happen in the other direction,
           i.e. the pipe is called with gsw-CH-BE but MAUS knows only gsw-CH, we added the mapping maus.pipe.MAUS.mapping.
26.07.17 : maus 3.36 : extended database Pan_AUS for aus-AU acoustic training; enabled MODUS=bigram (MINNI) for aus-AU
           the default MAUSSHIFT=10 value was set to 0; the value of 10msec shift (which was used for most languages) 
           contradicts our newest findings (see BA thesis of B. Jackl 2017, LMU Munich) that the systematic shift of MAUS
           segment boundaries is caused by a bias in the training material of the acoustic model of MAUS; the value 10msec
           is therefore only valid for the German MAUS set, but not for most other languages (which were trained on other 
           language data); therefore starting with this version only the German (10), Catalonian (-4) and Spanish (-4) have 
           specific MAUSSHIFT values, all other languages use the default of 0.
           Helper mausbpfDB2emuDB extended to handle *_annot.json and *.par files that contain only an ORT tier as delivered
           by runASR. 
28.07.17 : maus 3.37 : re-training of acoustic model of aus-AU on PanAUS 0.5.1
03.08.17 : maus 3.38 : bug-fix: in the (probably rare) case that maus is called with USETRN=force and the pre-segmentation
           estimates a TRNOFFSET=0, an initial silence segment with negative duration -1 was created. 
16.08.17 : maus 3.39 : bug fix in mausbpf2emuR : MAS tiers in recordings with only one word were not processed
           bug fix in maus : some temporary files were created without chmod 666 and therefore not removable
18.08.17 : maus.pipe 1.20 : added G2P option imap; if set, lng=und is set automatically without changing LANGUAGE
28.08.17 : maus 3.40 : enabled the disabled WARNING that is issued if INS{ORT|KAN}TEXTGRID=true is set but has no effect
31.08.17 : maus.pipe 1.21 : check input TEXT file if empty before starting the PIPE to avoid confusing ERROR messages
07.09.17 : maus.pipe 1.22 : bug in LANGUAGE mapping caused LANGUAGE=gsw-CH to fail -> fixed (and patched 1.21)
           maus 3.41 : disabled WARNING 'options INS***TEXTGRID have no effect'
09.09.17 : maus 3.42 : added basic alignment service for language nor-NO based on corpus 'NB Tale'
                       (thanks to Johanna Cronenberg)
           maus 3.43 : added MINNI service to nor-NO
12.09.17 : maus 3.44 : added basic service for jpn-JP based on CSJ corpus
13.09.17 : maus 3.45 : added MINNI service for jpn-JP based on CSJ corpus with merged sub-phonemic plosives
                       (i.e. MINNI does not recognize *_cl and *_rl)
24.09.17 : maus 3.46 : Japanese phonemes with embedded backslash '\' were not 
           handled correctly; the conversion from X-SAMPA (KANINVENTAR) to internal
           symbol set (GRAPHINVENTAR) now replaces all backslashes by '-',
           not only trailing backslashes, e.g. 'N\N\' becomes internally 'N-N-'
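           Illustration (not MAUS code): the 3.46 change expressed in Python. Both function names are
           hypothetical; trailing_only_mapping models the old behaviour as described above (only a
           trailing backslash replaced) to show why 'N\N\' was mishandled.

```python
def trailing_only_mapping(symbol):
    """Old behaviour (before 3.46): replace only a trailing backslash."""
    return symbol[:-1] + "-" if symbol.endswith("\\") else symbol


def xsampa_to_internal(symbol):
    """Behaviour since 3.46: replace every backslash by '-'."""
    return symbol.replace("\\", "-")
```

           For 'N\N\' the old mapping leaves the embedded backslash in place ('N\N-'), whereas the
           new mapping yields 'N-N-'.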
28.09.17 : maus 4.0 : major update due to several internal re-codings; new features are:
           - introduce video processing: unknown extensions are treated as video input, audiotrack
             is extracted from video (using ffmpeg) and processed as input, if possible, the 
             original sampling rate of the audiotrack is being used, otherwise output is based on 
             16000Hz sampling rate.
           - maus reports all ERRORS and WARNINGS now to stderr instead of stdout
           - internal: conversion to NIST deprecated; all internal processing now based on RIFF WAVE
           maus.pipe 2.0 : major update due to several internal re-codings; reports all ERRORS 
           and WARNINGS now to stderr instead of stdout
           maus.trn 1.22 : reports all ERRORS and WARNINGS now to stderr instead of stdout
           par2Textgrid 1.3 : reports all ERRORS and WARNINGS now to stderr instead of stdout
09.10.17 : maus 4.1 : some bug fixes caused by internal re-coding, added WARNING when signals with 
           less than 16kHz are processed.
11.10.17 : maus.pipe 2.2 : removed buggy video conversion: all services now process video on their own
17.10.17 : maus.pipe 2.3 : added ASR option 'diarization'
19.10.17 : maus.pipe 2.4 : added quota pre-check for pipes with MAUS
26.10.17 : maus 4.2 : added symbols /O:, I:, 6:/ to language aus-AU 
01.11.17 : maus 4.3 : re-coded mausbpfDB2emuDB, mausbpf2emuR plus helpers: code is now generic, so 
           that all combinations of BPF tiers are transformed
03.11.17 : maus 4.4 : bug in mausbpfDB2emuDB: *_annot files were not analysed correctly, fixed;
           extended OUTFORMAT=csv by a 7th column carrying the speaker diarization (if in input BPF, 
           otherwise column SPEAKER is empty).
06.11.17 : maus.pipe 2.5 : bug in emuR output of service PHO2SYL, fixed; changed default G2P option to '-com yes' 
06.11.17 : maus 4.5 : bug in helper mausbpf2emuR : input BPF without class 4 but class1mult BPF tiers caused a syntax error in output, fixed
09.11.17 : maus.pipe 2.6 : changed module PHO2SYL: depending on BPF input from the pipe a syllabification
           of KAN (-> KAS) or a syllabification of MAU|SAP|PHO (in that order, first found is used) or both are performed; 
           the (senseless) option 'phontier_PHO2SYL' is now obsolete in maus.pipe, but still accepted by the script
           maus.pipe 2.7 : bug in OUTFORMAT=TextGrid and PIPE=*_PHO2SYL fixed
10.11.17 : maus.pipe 2.8 : changed temporary file storage to unique file names
17.11.17 : maus 4.6 : removed misleading Swiss German variants gsw-CH-* from LANGUAGE set;
           simplified LANGUAGE to PARAM dir mapping (less maintenance required: only the PARAM 
           dirs define which languages are supported, as with chunker); adapted maus.pipe for
           MAUS and CHUNKER processing accordingly.
21.11.17 : maus.pipe 2.9 : added checks for file type and existence of TEXT, RULESET and imap before 
           starting the pipe; TEXT input is ignored for pipes that do not require TEXT input and a WARNING 
           is issued.
22.11.17 : maus 4.7 : added pronunciation model nor-NO based on NB Tale corpus
27.11.17 : maus.pipe 2.10 : bug when called without TEXT argument: wrong ERROR message, fixed.
07.12.17 : maus 4.8 : new improved version of language spa-ES : the GLISSANDO corpus re-labelled,
           acoustic and pronunciation models re-trained.
08.12.17 : maus.pipe 2.11 : added pipes CHUNKER_MAUS and CHUNKER_MAUS_PHO2SYL
01.02.18 : maus 4.9 : bug in LANGUAGE=spa-ES : the acoustic model used a non-optional silence
           model for optional inter-word silence modelling instead of an optional silence model; 
           this caused very bad segmentation results for spa-ES; this error is probably relevant 
           only for maus version 4.8 (7. Dec 2017 - 30. Jan 2018)
02.02.18 : maus 4.10 : added cross check for RULESET extension (rul|nrul) vs. types (statistical|phonological)
           maus 4.11 : added option PRESEG to replace deprecated USETRN=force; that way pre-segmentation can 
           be applied to chunks (USETRN=true PRESEG=true); until next rollout USETRN=force still works;
           for a short period (2.2.-16.2.18) there was a bug in this version that in very rare cases caused 
           the service WebMAUS Basic to crash; this was patched without a new version on 16.2.18, 09:30.
03.02.18 : maus.pipe 2.12 : added MAUS option PRESEG (default is false)
26.02.18 : maus.pipe 2.13 : added correct handling of PHO2SYL -lng 'und' option when pipe has either 
           LANGUAGE=sampa or the G2P service uses an imap 
30.09.18 : maus 4.13, maus.trn 2.1 : maus now correctly distinguishes between a proper single TRN entry
           with word number list and an improper TRN as output by wav2trn.
           There is a problem with the eng-US rule sets trained on TIMIT: it seems that the 
           rule sets with pruning = 5 contain so called 'replacement rules' with a ln() = -0.000001 that 
           effectively always apply to the left-hand context of the rule. word_var-2.0 sometimes crashes,
           when such a context appears in the input. Since I could not figure out what the problem is 
           (the rules look perfectly normal), I replaced the ln() = -0.000001 by a lower probability, and then 
           the error vanished. A lower prob. than 1.0 for a rule makes sense anyway, since the acoustics 
           should in the end decide whether the replacement is applicable.
           Changed both rule sets with pruning = 5 accordingly; copies of the old versions are retained
           in files *.20181002
03.10.18 : maus 4.14, maus.trn 2.2, mausbpfDB2emuRDB, par2emu : made file names of temporary files unique; there
           have been problems with temporary files that were left by debugging on the server; this fix should 
           solve this problem in the future.
10.10.18 : maus 4.15 : fixed INSYMBOL=ipa : when the input IPA contained symbols that are not actually IPA (e.g.
           'I' instead of 'ɪ'), the script simply ignored these symbols so that the output missed a phoneme. From 
           this version on maus issues an error as soon as any symbol appears in the canonical input that is not 
           defined by the mapping tables IPATABLE1 and IPATABLE2.
07.11.18 : maus 4.17 : added LANGUAGE=tha-TH; basic forced alignment
           added 'c_h' 'ts\' 'ts\_h' to PARAM.SAMPA/PLOSIVES
09.11.18 : maus 4.18 : added WARNING for the case that a phonetic symbol in the output cannot be mapped to
           ipa, manner or place (option OUTSYMBOL) because of missing information in IPATABLES.
           added MINNI for tha-TH based on phonemic transcripts in LOTUS; changed handling of phonological input:
11.11.18 : maus 4.19 : trailing tone markers ('..._1 - ..._5; e.g. in Thai) are deleted from input, since MAUS does not 
           differentiate between tones.
           re-worked OUTSYMBOL=place tables, KANINVENTAR.inv tables, aggregated new HMM to SUPERHMMs
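           Illustration (not MAUS code): a Python sketch of the tone-marker stripping in 4.19. The
           regular expression and the name strip_tone are assumptions; the pattern is anchored to a
           trailing '_1' to '_5' so that other diacritic suffixes such as the aspiration mark in
           'ts\_h' are left untouched.

```python
import re

# Hypothetical pattern: a trailing underscore followed by a tone digit 1-5.
TONE_SUFFIX = re.compile(r"_[1-5]$")


def strip_tone(symbol):
    """Remove a trailing tone marker from a phoneme symbol, if present."""
    return TONE_SUFFIX.sub("", symbol)
```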
13.11.18 : maus 4.20 : added language deu-LU by extending and cloning deu-DE inventar and HMM set; the extension
           was based on the phoneme set of Peter Gilles, but all deu-DE symbols are still maintained. 
14.11.18 : maus 4.21 : changed language code deu-LU to ltz-LU; updated PARAM.SAMPA for new symbols in ltz-LU
15.11.18 : maus 4.22 : set a default phonological rule set for tha-TH that allows replacement of canonical
           /r/ by /l/ in any context, and replaced the Thai /r/ HMM by the Italian /r/ HMM
18.11.18 : maus.pipe 3.0 : added service SUBTITLE and changed structure : 
           partial pipes ..._MAUS[_SUBTITLE][_PHO2SYL] are seen as one 
           building block because SUBTITLE and PHO2SYL only appear in pipes that contain MAUS and always after MAUS; 
           this makes the code of maus.pipe much shorter and better maintainable
22.11.18 : maus 4.23 : added (default) phonological rule set ltz-LU_manualRules.nrul kindly provided by 
           Peter Gilles, University of Luxembourg; ltz-LU is now using this rule set instead of the statistical 
           rule set of German; in case you want to use the German rule set use option RULESET=deu-DE_rml-0.95.rul.
23.11.18 : maus 4.24 : ltz-LU : added phoneme /d_0/ (devoiced alveolar plosive)
28.11.18 : maus 4.25 : ltz-LU : added/corrected rules in the phonological rule set ltz-LU_manualRules.nrul
08.12.18 : maus 5.0 : major update because output format CSV has changed (not backwards compatible!)
           OUTFORMAT=csv has been extended from a 6-column table to an 11-column table; CSV now contains 
           data from the BPF tiers ORT,KAN,TRO,KAS,SPK,MAS,MAU,TRN; conversion is now performed by external 
           helper mausbpf2csv which is part of the MAUS distribution (and can be used as a conversion tool
           on its own)
           maus.pipe 4.0 : major update because output format CSV has changed (not backwards compatible!);
           enabled emuDB|emuR|csv output for pipes ending on ..._SUBTITLE; enabled csv output for pipes
           ending on ..._PHO2SYL; now almost all output formats are possible for almost all pipes.
12.12.18 : maus.pipe 4.2 : added '-verb 0' to G2P service;
             bug fix: some pipes reported ERROR but returned exit 0 - fixed;
             bug fix: G2P reported a WARNING because it got an empty -imap option - fixed
13.12.18 : maus 5.1 : added LANGUAGE=swe-SE (cloned from Norwegian, no pronunciation model);
             bug fix: if the last label in the MAUS result started with a '{' and the last segment
             needed correction, maus terminated with a shell ERROR -> fixed
16.12.18 : maus 5.2 : added Albanian LANGUAGE=sqi-AL: mainly cloned from Hungarian
27.12.18 : maus.pipe 4.3 : changed LANGUAGE mapping in modules (internal)
           callGoogleASR 2.3 : fixed the way quotas are printed 
           maus 5.3 : added missing optional silence '<p:>' to language spa-ES
02.01.19 : maus 5.4 : disabled WARNING that the signal is extracted from a video input because 
           this in combination with maus.trn produces very long WARNING output.
04.01.19 : maus 5.5 : fixed a bug in option INSYMBOL=ipa : in some rare cases a wrong IPA->SAMPA
             mapping was applied which caused an 'unknown symbol' ERROR;
             option INSYMBOL=ipa : KAN tier is passed to output as IPA (was transformed into 
             SAMPA in earlier versions);
             input MP4 with more than one soundtrack caused ERROR: now the default soundtrack is selected;
             if there are multiple soundtracks, the script checks whether LANGUAGE matches the default soundtrack
             and gives a WARNING if there is a mismatch;
           maus.pipe 4.4 : input MP4 with more than one soundtrack caused ERROR; now the default soundtrack is selected
07.01.19 : maus 5.6 : the pre-processed SIGNAL is now passed onto maus.trn, not the original SIGNAL;
           this avoids that the maus calls in maus.trn repeat e.g. the extraction of a soundtrack
           from video input over and over again.   
           maus.pipe 4.5 : a video input is not passed through the pipe as video any more but rather
           as the  default soundtrack (extracted by ffmpeg); this avoids that services in the 
           pipe extract the soundtrack over and over again, and - even worse - might extract 
           different tracks.
14.01.19 : maus 5.7 : using installed HTK tools instead of copies in the distribution SOURCE dir
20.01.19 : maus.pipe 4.6 : bug fix in PHO2SYL language mapping
27.01.19 : maus 5.8 : added SAMPA /X\/ (uvular fricative) to SAMPA inventory
01.02.19 : maus.pipe 4.7 : changed CHUNKER call so that signals with capital extensions 
           (e.g. '.WAV') are accepted by chunker
19.02.19 : maus.pipe 5.0 : internal re-organisation of sources, functionality the same 
06.03.19 : maus.pipe 5.1 : added module ANONYMIZER
09.03.19 : maus.pipe 5.2 : changed module SUBTITLE so that in case no original transcript is
           given via the TEXT input to the pipe, the transcript is either recovered from 
           TRN (CHUNKPREP in PIPE) or from TRL|TR2|TRS (input to the PIPE is BPF) tier(s) or - 
           if everything fails - from the ORT tier. If there is 
           a module ANONYMIZER before SUBTITLE in the pipe, the original/recovered transcript
           is anonymized according to the list in ATERMS before passing it to SUBTITLE.
11.03.19 : par2Textgrid 1.5 : all existing BPF tiers (ORT,KAN,MAU|SAP|PHO|IPA,MAS,TRN) in input
           are converted by default; no WARNING if a tier is not present; added TRN tier
11.03.19 : maus 5.9 : OUTFORMAT=TextGrid : all BPF input tiers are passed on to par2TextGrid
13.03.19 : maus.pipe 5.3 : added video input support for AVI and FLV
           maus 5.10 : bug in helper par2TextGrid 1.5 caused errors in TextGrid TRN tier  
14.03.19 : maus.pipe 5.4 : fixed bug in SUBTITLE module when recovering original transcript from BPF input
26.03.19 : par2TextGrid 2.1 : complete re-write of par2TextGrid; this is a non-backwards compatible
           update!
           par2TextGrid is now a general-purpose tool to convert most types of BPF files into standard
           praat TextGrid. All BPF tiers that are currently supported are converted automatically; for
           backwards-compatibility the options INSORTTEXTGRID=false and INSKANTEXTGRID=false are still
           recognized and the tiers are suppressed accordingly. The main new feature is that BPF files with 
           parallel class 4 time layers (e.g. MAU and SAP and WOR in the same file) are now possible: 
           the output TextGrid then contains blocks of intrinsically synchronous layers that
           are all derived from one class 4 BPF tier, e.g. if the input BPF contains SAP and MAU and 
           ORT and KAN, the TextGrid will have the layers ORT-SAP, KAN-SAP, SAP, ORT-MAU, KAN-MAU, and MAU.
           maus.pipe 5.5 : extended OUTFORMAT support using par2TextGrid 2.1 in pipes that end with 
           CHUNKER, PHO2SYL or SUBTITLE. 
28.03.19 : maus.pipe 5.6 : removed original transcript recovery from TR* BPF tiers: only ORT is used!
           Some minor bug fixes in mausbpf2emuR and mausbpf2csv.
20.04.19 : maus.pipe 6.0 : major update introducing option 'Keep everything' (KEEP=true): 
           output a ZIP archive instead of the normal output of the last service; this ZIP then contains 
           not only the output of the pipe, but also the input data (marked in the file name with 
           '_INPUT'), a <signal>_README.txt describing the input file names and pipeline options, as 
           well as intermediate results of the pipeline that would otherwise be lost, because they 
           cannot be passed through the rest of the pipe (e.g. an anonymized version of the input video 
           produced by ANONYMIZER; intermediate results are marked in the file name with 
           '_<producing-service>', e.g. if the input was 'Signal1.mp4' then an intermediate result 
           produced by the ANONYMIZER service would be named 'Signal1_ANONYMIZER.mp4').
25.04.19 : maus.pipe 6.1 : added option '--list-pipes' (the name says everything)
           maus 5.12 : removed (German) g2p fall-back in maus; improved video processing;  
           removed some out-of-date WARNINGS
27.04.19 : maus 5.13 : added phonemic symbols /Nm/, /kp/ (= double articulated), /e_r/, /o_r/ (raised)
           to (X-)SAMPA phoneme inventory
04.05.19 : maus 5.14, maus.trn 3.0 : re-worked maus.trn to process chunks in parallel
           maus.pipe 6.2 : bug fix : pipes starting with CHUNKER crashed
06.05.19 : maus.trn 3.1 : bug fix : the pre-screening for chunks too short to be processed
           did not work properly for KAN tiers with blank-separated SAMPA strings - fixed
           Crashed sub maus jobs were not handled gracefully (just wait for time-out) - fixed
           Chunks that could not be processed are now labelled (in the MAU tier) as 'chunkNotProcessed>'
           in segments of length 10 samples at the beginning of the chunk (word segments are
           accordingly mapped to these very short segments, but the word labelling stays
           as in the input, as does the TRN tier). Sub maus jobs that do not terminate cause the
           service to wait for the time-out (currently 1600sec)
           maus.trn 3.2 : improved checking of sub maus jobs; when very many sub jobs crashed, the 
           main process could 'hang' forever, because the number of still running jobs (which had in 
           fact crashed) was determined incorrectly - fixed
           Changed minimum average phone duration limit for pre-screening chunks in maus.trn to 
           40msec (was 30msec).
07.05.19 : maus.trn 3.3 : improved pre-screening : when RELAXMINDUR is set, the pre-screening average 
           phone duration is set to 10msec; improved ERROR message of screening; 
08.05.19 : maus.trn 3.4 : improved multi-threading and added option MULTITHREADING=true
           maus 5.15 : set default statistical rule set for eng-SC to rml.prune50smooth.rul, which is a 
           robust set of rules (a rule must be seen at least 50 times); the rule sets with lower
           pruning thresholds (10,20) are faulty; added option MULTITHREADING=true
           maus.trn 3.5 : built-in time-out for forking jobs (2h) in case more sub jobs 
           'hang' than MAXFORK (which could result in an indefinite 'hang')
           maus 5.16 : added language afr-ZA, forced alignment only
11.05.19 : maus.pipe 6.3 : when a pipe has no TEXT input, the SUBTITLE service reconstructs
           the transcript from the ORT tier, but it can be that MAUS processed only a part 
           of the ORT tier, if a subset was defined in TRN and USETRN==true was set.
           Starting from this version the reconstruction is then constrained to the 
           ORT subset as defined in the TRN tier.
12.05.19 : maus.pipe 6.4 : changed G2P -oform (output format) to 'bpfs' (KAN tiers contain 
           blank-separated phoneme symbols)
17.05.19 : maus 5.17 : changed deu_DE default MAUSSHIFT from 10.0 to 7.13 after re-validation
18.05.19 : maus 5.18 : introduced flexible frame rate for segment boundaries (option TARGETRATE);
           TARGETRATE defaults to 100000 units of 100nsec (= 10msec, backwards compatible), but can 
           be reduced to a minimum of 10000 (= 1msec frame period), if for instance segmental 
           analyses require a more fine-grained quantization. Note though that increasing the 
           frame rate *does not* improve average MAUS accuracy (tested on German VM 
           benchmark only!), nor does it improve the boundary deviation histogram.
28.05.19 : maus 5.19 : Albanian: phoneme /4/ was missing -> fixed
04.06.19 : maus.pipe 6.5 : added MAUS option TARGETRATE 
06.06.19 : maus.pipe 6.6 : emuDB output file in KEEP ZIP had the wrong extension '._annot.json' -> fixed
10.06.19 : maus.pipe 6.7 : replaced media file pre-processing by a call to 'audioEnhance'; this enables
           pipes to process MP3 input, bit resolutions other than 16bit, and multi-channel files.
11.06.19 : maus.pipe 6.8 : wrong content in KEEP=true ZIP output, if ANONYMIZER is last service -> fixed
11.06.19 : maus 5.20 : bug fix: with the introduction of AUDIOENHANCE in maus.pipe, maus did not insert 
           the correct bundle name in emuR output *_annot.json when in a pipeline -> fixed
13.06.19 : maus 5.21 : added a number of X-SAMPA symbols to LANGUAGE=sampa on request of the DoReCo project
19.06.19 : maus 5.22 : added X-SAMPA /dz\/ to LANGUAGE=sampa
           maus.pipe 6.9 : changed options INSORTTEXTGRID and INSKANTEXTGRID to 'true'
24.06.19 : maus 5.23 : added X-SAMPA /@e/ /@:e/ to LANGUAGE=sampa
           maus.pipe 6.10 : added AudioEnhance option NOISEPROFILE
30.06.19 : maus 5.24 : added X-SAMPA /J_+/ /@`/ /z=/ to LANGUAGE=sampa
10.07.19 : maus 5.25 : bug fix : when using a 'nrul' set and the LANGUAGE contains X-SAMPA symbols
           that contain '\', the rules are not used correctly, i.e. symbols with '\' can be in the 
           output although no rules are predicting them; from this version on X-SAMPA symbols 
           that contain '\' are removed from the symbol set before calling the variant generator;
           this does not change any processing of X-SAMPA symbols, since rules containing such 
           symbols cannot be used in 'nrul' sets anyway (because of '-' being the context separator)
11.07.19 : maus 5.26 : transcription tags of the form '<...>' that are passed through G2P (-com yes)
           and are modelled as explicit silence ('<p>') so that they are passed through maus as well,
           may now contain the characters '<>' in the tag string, e.g. '<<x>tag>'.
           maus.trn 3.6 : bug fix : when running in chunk segmentation mode (USETRN=true) and NOINITIALFINALSILENCE=true
           a silence interval was inserted at the end of some chunks; from this version on the final
           segment always fits to the end of the chunk
13.07.19 : maus 5.27 : added X-SAMPA symbols for DoReCo
           maus 5.28 maus.trn 3.7 : bug fix : silence intervals of sample length 1 between 
           consecutive chunks removed
14.07.19 : maus 5.29 maus.trn 3.8 : bug fix : the pre-check for impossibly short chunks did not consider the reduced 
           frame rate (via option TARGETRATE)
15.07.19 : patch in maus 5.29 maus.trn 3.8 : wav2trn calculates duration *2 samples* too high!
           maus.pipe 6.12 : option TARGETRATE was not passed on to module MAUS 
           maus.trn 3.9 : changed length of dummy MAU segments of non-processed chunks so that they
           cover the complete chunk (and are therefore better visible in sound editors) 
27.07.19 : maus.pipe 6.13 : added ASR option ACCESSCODE
30.08.19 : maus 5.30 : added 8 new X-SAMPA symbols to LANGUAGE=sampa
06.09.19 : maus 5.31 : bug fix in Georgian phoneme set: model /c_>/ was missing leading to an ERROR when in BPF input
23.09.19 : maus 5.32 : added 7 new X-SAMPA symbols to LANGUAGE=sampa
01.10.19 : maus 5.33 : integrated output format conversion using annotConv
03.10.19 : maus.pipe 7.0 : pipes with ASR are no longer supported without AAI authentication;
           LANGUAGE: added variants of English and Spanish that are supported by ASR module and mapped these to
           supported English and Spanish variants in modules G2P, PHO2SYL and MAUS
08.10.19 : mausbpf2emuR 5.33 : added BPF class 2 tier support (SPD IPA)
           mausbpf2csv 5.33 : added BPF class 2 tier support (SPD)
11.10.19 : annotConv 1.3 : added fallback option SAMPLERATE (just needed if the sample 
           rate cannot be determined from the input BPF)
           mausbpf2eaf 1.3 : enabled SPD conversion as singular tier
14.10.19 : annotConv 1.4 : bug fix : exit code 0 after ERROR in EAF fixed
21.10.19 : maus.pipe 7.1 : allow OUTFORMAT=eaf for pipes ending with MAUS or ANONYMIZER; this is just
           a preliminary fix; the next step will be the introduction of annotConv into maus.pipe
30.10.19 : maus.pipe 8.0 : replaced all output conversions by annotConv; stream-lined list of 
           recognized output format descriptions: deprecated emuR, PAR, BPF, textgrid, tg, TG, CSV, EAF
           mau-append
05.11.19 : maus 5.34 : re-enabled already deprecated option INSORTTEXTGRID and INSKANTEXTGRID after
           user complaints
07.11.19 : maus 5.35 : added option ADDSEGPROB=false; if set to true, the frame-normalized natural log 
           Viterbi likelihood is appended to the phonetic symbol in the MAU tier (separated by blank)
           Note that setting this option will break the BPF standard, and must not be used in a
           pipeline in which the MAUS result is processed further (e.g. PHO2SYL).
           maus.pipe 8.1 : added option ADDSEGPROB=false for pipes that end on MAUS
11.11.19 : maus.pipe 8.2 : fixed typo in variable name 'OUTPFORMAT'; fixed missing extension handling
           in KEEP=true.
24.11.19 : maus 5.36 : added HMM '<usb>' to German set (was a link to '<nib>')
30.11.19 : maus 5.37 : bug fix: if both options INSORTTEXTGRID and INSKANTEXTGRID
           were set, output formats other than TextGrid caused an ERROR.
09.12.19 : maus 5.38 : added tone markers to tha-TH processing; syllable nuclei carrying 
           a tone marker '..._1 - ..._5' are processed as nuclei without markers, but 
           the tone marker is carried over to the MAUS output.
13.12.19 : maus 5.39 : added missing phoneme /@:/ to tha-TH phoneme inventory
           maus.pipe 8.3 : added a list ASRNONTOKENLANGUAGES that defines languages like tha-TH
           for which the ASR does not deliver word-tokenized output but rather the complete
           utterance in one string; for these the following G2P module will pass the 'word'
           through the usual word tokenization (iform txt).
17.12.19 : maus 5.40 : extended the KANINVENTAR.inv list by the language specific README; 
           fixed missing KANINVENTAR.inv tables for Swiss dialects gsw-CH-**;
           fixed non-masked numericals in GRAPHINVENTAR/DICT in the languages sampa,ron-RO,
           ltz-LU and jpn-JP.
           maus.pipe 8.4 : added jpn-JP to ASRNONTOKENIZEDLANGUAGES; since Google and Watson
           behave differently on jpn-JP, the extraction of txt from the ASR BPF was made more 
           strict: only if the BPF really contains only one 'word' (= the total utterance) is the
           text extracted and passed on to G2P as iform txt
23.12.19 : maus 5.41 : added /G/ to phoneme set of ltz-LU
26.12.19 : maus 5.42 : fixed buggy entry in tha-TH KANINVENTAR.inv;
           fixed OUTSYMBOL=ipa|manner|place for tha-TH
           tone markers '..._1' etc. are treated in IPA as in SAMPA.
27.12.19 : maus 5.43 : added syllabic variants l= m= n= to LANGUAGEs eng-AU and eng-NZ for 
           consistency with the pho2syl service;
           added syllabic variant l=` to LANGUAGE nor-NO for consistency with the pho2syl service.
           maus.pipe 8.5 : added new option '-embed maus' to all pho2syl_wrapper.pl calls
28.12.19 : maus.pipe 8.6 : enabled the usage of option OUTSYMBOL=sampa|ipa|x-sampa|maus-sampa|arpabet
           for PIPEs with last service PHO2SYL
01.01.20 : maus.pipe 8.7 : fixed SUBTITLE problem with LANGUAGEs jpn-JP and tha-TH: subtitle texts are
           now taken from the word-tokenization instead of the input TEXT; this has the disadvantage
           that subtitles have no punctuation and are possibly in another script than TEXT, but at 
           least we get a usable result.
09.01.20 : maus 5.44 : bug fix : '0' was not masked with 'P' in ltz-LU and swe-SE
           bug fix: /d_0/ was not in acoustic model of ltz-LU
           bug fix: the default phonological pronunciation model of ltz-LU prevented
           the phonemes /s\/ and /z\/ from being passed to the MAU output; switched to default forced alignment
17.01.20 : maus 5.45 : extended tha-TH phoneme set by un-lengthened schwa /@/
           maus.pipe 8.8 : bug fix: the extraction of ASR results for tha-TH did not work 
           because for instance Google ASR tokenizes the ASR result between digits and does not
           put the total result in one string; fixed this by extracting the complete ORT 
           layer from the ASR BPF result file and concatenating it into a txt file which is then
           passed on to G2P.
25.01.20 : maus.pipe 8.10 : enabled G2P option 'syl=yes'; when set the KAN tier will contain '.' 
           syllable boundaries and G2P outsym maus-sampa is switched to sampa
           maus 5.47 : enabled KAN tier input with syllable markers '.' (which are ignored by MAUS)
03.02.20 : maus 5.47 : added 8 new phoneme symbols (for language Sanzhi Dargwa) to language independent phoneme set
04.02.20 : maus 5.48 : added closure only phonemes t_cl, p_cl and k_cl (clones from Italian) to
           the tha-TH phoneme set (experimental)
05.02.20 : maus 5.49 : added new phoneme symbol dZ_j to language independent phoneme set
06.02.20 : maus 5.50 : added 'error tone _8' to tha-TH phoneme set 
08.02.20 : maus.pipe 8.11 : bug fix : LANGUAGE=sampa was not translated to -lng und in CHUNKPREP module
19.03.20 : maus 5.51 : bug fix in tha-TH : tone variants of schwa /@/ and closure models were missing
25.03.20 : maus.trn 3.10 : disabled pre-screening of chunk lengths; some users preferred a 
           <notProcessedChunk> marking in the MAU tier to a full ERROR.
30.03.20 : maus 5.52 : added language Icelandic isl-IS
           maus 5.53 : adjusted isl-IS phoneme mapping 
02.04.20 : maus 5.54 : added phoneme symbol 'q_h' to language independent set; added Icelandic phonemes to 
           language independent set
11.04.20 : maus 5.55 : bug fix: X-SAMPA diacritic 'advanced' /_+/ was removed from KAN input because 
           function words are (sometimes) marked with a trailing '+' in KAN. Now only trailing '+'
           without a preceding '_' are removed.
14.04.20 : maus 5.56 : added /4/ to the eng-AU inventory
15.04.20 : maus 5.57 : eng-AU : mapped /4/ to eng-US /4/
30.04.20 : maus 5.58 : added phoneme symbols /i_?\/ and /x:/ to language independent set
01.05.20 : maus.pipe 8.12 : adapted to new G2P 1.108: -embed maus does no longer disable the options syl and stress
           added G2P option stress=no
           maus 5.59 : bug fix : special characters in KAN tier ".#'\"+" were not suppressed in 
           blank-separated KAN strings.
10.05.20 : maus 5.60 : changed language specific IPATABLE = KANINVENTAR.inv to PARAM.SAMPA/KANINVENTAR.inv
           maus.pipe 8.14 : bug fix in maus.pipe.MAUS : KAN tiers with stress markers were not translated to IPA correctly
16.05.20 : maus 5.61 : added phoneme symbol /s_>/ (alveolar ejective fricative) to language independent set
18.05.20 : maus 5.62 : changed IPATABLE back to language specific table KANINVENTAR.inv because the language 
           independent table (see version 5.60) led to ambiguous mappings (e.g. IPA u: -> SAMPA uu)
19.05.20 : maus.pipe 8.15 : bug fix: using INSYMBOL=ipa and OUTSYMBOL=ipa in parallel led to a mapping ERROR
           changed IPATABLE back to language specific table KANINVENTAR.inv
20.05.20 : maus 5.63 : added LANGUAGE=und as alias for LANGUAGE=sampa;
           the following symbols are ignored in IPA input (KAN) but passed on to the KAN output: ˈˌ.#"'+
21.05.20 : maus.pipe 8.16 : enabled chunker option 'maus'
23.05.20 : maus.pipe 8.17 : enabled option USEREMAIL
30.05.20 : maus.pipe 8.19 : integrated textEnhance service; added new option USEAUDIOENHANCE
31.05.20 : maus.pipe 8.20 : set default for LEFT_BRACKET = "#" because the interface cannot pass "#" as value
04.06.20 : maus.pipe 8.21 : adapted textEnhance call with option '--infile'; fixed icsiarg parser so that value
           '{}' can be passed to the script
11.06.20 : maus 5.65 : bug fix in MAU IPA output mapping
           maus.pipe 8.22 : bug fix in KAN IPA output mapping 
13.06.20 : maus 5.66 : added phoneme symbols S_> and x_> to language independent phoneme set
15.06.20 : maus.pipe 8.23 : modified SUBTITLE module to produce subtitles based on TRO tier in BPF input,
           if a TRO tier is present; this makes sense, if another module (e.g. ASR) in the pipe produces 
           original text with punctuation, which would be lost if we create subtitles only based on the ORT tier.
           Bug fix: textEnhance was applied to non-txt input to G2P
02.07.20 : maus 5.67 : added audioEnhance pre-processing on all input media formats except *.wav
10.07.20 : maus.pipe 8.24 : allow odt doc docx pdf rtf as input formats
16.07.20 : maus.pipe 8.25 : add G2P option 'except=exceptionDictionary'
28.07.20 : maus 5.68 : added check for non-ASCII characters in RULESET file (not allowed: ERROR)
30.07.20 : maus 5.69 : added 51 new X-SAMPA symbols to the Language Independent Set (LANGUAGE=sampa); mostly clicks.
24.08.20 : maus 5.70 : added 3 new X-SAMPA symbols to the Language Independent Set (LANGUAGE=sampa): r\=` ts\: s\:
29.08.20 : maus 5.71 : inserted a leading '%' comment marker in all lines of KANINVENTAR.inv that are not part of the CSV
29.08.20 : maus.pipe 8.26 : fixed bug : MAUS RULESET was not saved in KEEP dir when MAUS was last service in pipe
31.08.20 : maus 5.72 : fixed KANINVENTAR.inv tables: some lines had trailing TABs
07.09.20 : maus.pipe 8.27 : fixed bug : KEEP=true did not work for PIPE=ASR_... types; TEXTENHANCE output copy 
           in KEEP ZIP had the wrong extension ".wav" (now it is '.txt')
11.09.20 : maus 5.73 : set BPFTHRESHOLD=3000 to test
19.09.20 : maus.pipe 8.28 : bug: non-TXT input was copied to TEXTENHANCE output in KEEP although no textEnhance was used
08.10.20 : maus 5.74 : added three new symbols to language independent set: ou y2 ie (Dolgan language)
15.10.20 : maus 5.75 : added three new geminate symbols to language independent set: J\J\ d`d` g_wg_w
21.10.20 : maus.pipe 8.29 : option BRACKETS={} did not work ('{}' were deleted in option) -> fixed
23.10.20 : maus 5.76 : expanded the SAMPA encoding of geminated consonants (which is ambiguous) to all possible forms to ease
           the use of Language Independent mode, e.g. ddS, dSdS and d:S are all the same model
24.10.20 : maus.pipe 8.30 : added SUBTITLE option value OUTFORMAT='vtt'
01.11.20 : maus 5.77 : fixed PARAM.SAMPA/mk_set with 'set noglob' command (no bug, just better code!)
05.11.20 : maus 5.78 : added 4 new symbols to language independent set: 1~ 1:~ h~ j~ (Texistepec Popoluca)
11.11.20 : maus.pipe 8.31 : delete '\t' and '\n' from reconstructed transcript from TRO before passing it to subtitle
16.11.20 : maus 5.79 : added 8th column with minimum duration to phoneme table of 'Language Independent' set
18.11.20 : maus 5.80 : added option RELAXMINDURTHREE: like RELAXMINDUR this option causes the HMM models to be set to
           a lower minimum duration, here 3 states in each model (= 30msec for the standard frame rate); note
           that setting this option might ease the analysis of segment lengths, since there is a uniform
           floor effect at 30msec for each phoneme class, but it will also deteriorate the segmental 
           accuracy of maus, since the constraints for longer segments such as affricates are waived.
19.11.20 : maus 5.81 : bug fix in sqi-AL (Albanian) phoneme set: /c/ was missing
20.11.20 : maus.pipe 8.32 : added new maus option RELAXMINDURTHREE
09.12.20 : maus.pipe 9.0 : added new pipeline 'ASR_SUBTITLE'; this pipeline will only work properly with ASR services
           that produce a WOR tier (a word alignment)
10.12.20 : maus 5.82 : bug fixes in Language Independent phone set table KANINVENTAR.inv
04.01.21 : maus 5.83 : added symbols s\_h @\ 3\ G\ to Language Independent Set
11.01.21 : maus 5.84 : bug fix : options ADDSEGPROB and OUTSYMBOL were incompatible -> fixed 
17.01.21 : maus 5.85 : added symbol r= to Language Independent Set; bug fix in mausbpf2emuR: ids in the *_annot.json
           had gaps or duplicates (which caused an error when loading the DB)
20.01.21 : maus 5.86 : added symbols l_t m_t n_t to Language Independent Set
04.02.21 : maus 5.87 : bug fix in mausbpf2emuR : links to special levels with zero length segments (PHO,SAM) and gaps were wrong 
18.02.21 : maus 5.88 : re-newed cloning in language por-PT to current SUPERHMM with preference language spa (was cloned from German)
21.02.21 : maus 5.90 : added language far-IR (forced alignment and MINNI); no pronunciation model (yet) 
24.02.21 : maus 5.91 : added symbol q_w to Language Independent Set
02.03.21 : maus.pipe 9.1 : bug fix in maus.pipe.ASR : runASR was called with corrupt arguments, which prevented 'quota exceeded' codes from being passed
06.03.21 : maus.pipe 9.2 : bug fix : pipes ending with MAUS and SUBTITLE reported error when called with OUTFORMAT=exb|tei although this works
10.03.21 : maus 5.92 : changed mapping of untrained geminate phoneme symbols in the Language Independent Set (LANGUAGE=sampa) so 
           that the variants 'x:' and 'xx' always point to the same HMM
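
The TARGETRATE (maus 5.18) and RELAXMINDURTHREE (maus 5.80) entries above rest on the same
arithmetic: TARGETRATE is given in units of 100nsec, so 10000 units are 1msec, and a model
that must pass through N states, each consuming at least one frame, cannot be shorter than
N frames. The following Python sketch illustrates this; the function names are hypothetical
(not part of maus), and it assumes the HMM topology stays unchanged when TARGETRATE is lowered.

```python
# Hypothetical helpers illustrating the TARGETRATE arithmetic; not part of maus.

UNITS_PER_MS = 10_000         # TARGETRATE is given in units of 100 nsec
DEFAULT_TARGETRATE = 100_000  # = 10 msec, the backwards-compatible default
MIN_TARGETRATE = 10_000       # = 1 msec, the documented minimum


def frame_period_ms(targetrate: int = DEFAULT_TARGETRATE) -> float:
    """Frame period (= boundary quantization step) in msec for a TARGETRATE value."""
    if targetrate < MIN_TARGETRATE:
        raise ValueError(f"TARGETRATE must be >= {MIN_TARGETRATE} (1 msec)")
    return targetrate / UNITS_PER_MS


def min_segment_ms(n_states: int, targetrate: int = DEFAULT_TARGETRATE) -> float:
    """Shortest possible segment for a model with n_states states,
    assuming (as in this sketch) that every state consumes at least one frame."""
    return n_states * frame_period_ms(targetrate)


def snap_to_frame_ms(t_ms: float, targetrate: int = DEFAULT_TARGETRATE) -> float:
    """Round a time in msec to the nearest frame boundary."""
    period = frame_period_ms(targetrate)
    return round(t_ms / period) * period


if __name__ == "__main__":
    print(frame_period_ms())          # 10.0 (default)
    print(frame_period_ms(10_000))    # 1.0 (minimum)
    print(min_segment_ms(3))          # 30.0: RELAXMINDURTHREE floor at the default rate
    print(min_segment_ms(3, 10_000))  # 3.0 at TARGETRATE=10000
```

Under these assumptions the 30msec floor mentioned for RELAXMINDURTHREE shrinks to 3msec at
TARGETRATE=10000, while boundaries are quantized to 1msec instead of 10msec.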

DEVELOPMENT STATUS TABLE

The following table gives an overview of the current status of the individual 
languages supported by maus. Support for a language <lng> comes in levels of 
increasing modelling complexity. The gold standard is full acoustic and pronunciation 
modelling for MAUS and a bigram model for MINNI; the minimum support is a 
mapping of the <lng> phoneme set to existing phonemes of other languages (SUPERHMM,
see HMM/README for details).

lng    trained HMM on:   trained pronunciation on:     trained MINNI bigram on:

afr-ZA cloned from nld   -                             -	
aus-AU Pan_AUS           -                             Pan_AUS
cat-ES GLISSANDO News    GLISSANDO News                GLISSANDO News (292000 phones)
deu-DE KielCorpus 1      KielCorpus 1                  BASStat (2269063 phones)
ekk-EE BABEL+PhED        PhED (147925 words)           PhED+BABEL (684952 phones)
eng-GB AIX-MARSEC        AIX-MARSEC (53682 words)      AIX-MARSEC (206280 phones)
eng-US TIMIT             TIMIT (54384 words)           TIMIT (213704 phones)
eng-AU AUSTALK           AUSTALK (71332 words)         AUSTALK (248790 phones)
eng-NZ WatsonCorpus      (AUSTALK)                     -
eus-ES SUPERHMM          -                             -
eus-FR SUPERHMM          -                             -
far-IR ETH-Zurich        -                             ETH-Zurich (83216 phones)
fin-FI SUPERHMM          -                             -
fra-FR Rhapsodie         Rhapsodie                     Rhapsodie (100658 phones)
gsw-CH TEVIOS etc.       -                             TEVIOS etc. (243795 phones)
hun-HU BEACorpus         BEACorpus (41986 words)       BEACorpus (211326 phones)
ita-IT CLIPS-MT-MANUAL   CLIPS-MT-MANUAL (47341 words) CLIPS-MT-MANUAL (90704 phones)
jpn-JP CSJ               -                             CSJ (1550398 phones)
kat-GE ZakhariaCorpus    -                             -
ltz-LU Kiel Corpus 1     Kiel Corpus 1                 -
mlt-MT SUPERHMM          -                             -     
nld-BE CGN (*.fon)       (German)                      CGN *.pho (274483 phones)
nld-NL CGN (*.fon)       CGN *.awd - *.pho (71738 w.)  CGN *.pho (293545 phones)	
nor-NO NB Tale Corpus    -                             NB Tale (298372 phones)
pol-PL CLARIN-PL-STUDIO  CLARIN-PL-STUDIO (305000 w.)  CLARIN-PL-STUDIO (2mio phones)
por-PT SUPERHMM          -                             -
ron-RO SUPERHMM          -                             -
rus-RU INTAS Corpus      -                             INTAS Corpus (53856 phones)
spa-ES GLISSANDO News    GLISSANDO News                G2P/GLISSANDO (480000 phones)
sqi-AL SUPERHMM (hun!)   -                             -
swe-SE NB Tale Corpus    -                             -
tha-TH LOTUS             -                             LOTUS



KNOWN BUGS / PROBLEMS / INSTALLATION ISSUES

- on some LINUX systems 'mawk' might be installed instead of GNU awk.
  Then you will most likely get an error message like:
  awk: run time error: regular expression compile failed (missing operand)
  awk: run time error: regular expression compile failed (missing operand)
  Try installing GNU awk instead.
- The usage of the SAM-PA symbol '?' for glottal stop will not work in 
  the option KANSTR="? i: g @ n" because the shell can't handle the '?'.
  However, it works if the input is read from a BPF file with the option 
  BPF=file.par.
- The table of Extended German SAM-PA lists some non-standard symbols 
  %< and %> for uncertain word boundaries, which are not recognized by MAUS.
  The symbols '#' and '<p:>' are recognized and modelled as optional
  silence intervals; the symbols '<' and '>' are recognized and modelled
  as non-optional silence intervals.
  Please note that the symbol '#' is also used in the option 
  KANSTR="..." to mark word boundaries.
- The tier KAN in the BPF input file must not contain any 'silence words',
  that is, words that are entirely encoded as a single optional silence model, e.g.
  KAN:  0   <p:>
  If you must model such silence words, use the non-optional silence model '<p>' instead.
- If running in 'chunk segmentation' mode (= USETRN=1 and more than one
  TRN entry in the input BPF), overlapping chunks cannot be processed for 
  TextGrid|emu|EMU output because these formats do not allow segments 
  with negative time.
- If running in 'chunk segmentation' mode (= USETRN=1 and more than one
  TRN entry in the input BPF) and the TRN tier in the input BPF does not 
  cover the entire KAN tier (= they are not synchronous, e.g. the TRN 
  describes only a subset of words), this will not work for 
  OUTFORMAT=emu|EMU, and the script will terminate with an ERROR; other 
  formats tolerate a partial TRN (covering only a subset of the KAN tier).
- If a phoneme set requires coding of quantities with an intermediate ':',
  such as Estonian Q III 'C:C', there is a rare bug in the pronunciation module
  that issues an error, if 
  + the rule set contains two rules of the form L,C:C,R>L,X,R and L,C:C,R>L,Y,Z,R
    where L,R,X,Y,Z are arbitrary symbols, and
  + the input BPF contains the sequence L,C:C,R
  We take care that the rule sets delivered in the MAUS package do not contain 
  any such rules.
- maus reports:
  .../word_var-2.0: Befehl nicht gefunden.
  or
  .../word_var-2.0: command not found
  word_var and also graphvis are very old binaries compiled with ELF 5.
  Most likely your Linux system is missing one of the following libraries:
  libXm.so.2.0
  libm.so.5
  libg++.so.27
  Since version 1.20 we replaced the word_var-2.0 binary by a new
  binary compiled under SuSE 9.0
  Also we added the sources for word_var in the subdir ./word_var
- If you run into problems when creating TextGrid output:
  some 'awk' installations print floating-point numbers with a comma
  instead of a dot (e.g. 1,2435 instead of 1.2435). This can be
  caused by an LC_* or LANG environment variable set to a locale with
  a decimal comma. Since maus expects awk to print floats with dots,
  this may break the TextGrid output; setting LC_ALL=C (or LC_NUMERIC=C)
  in the environment of maus should avoid the problem.
- Rule sets (option RULESET=*.nrul|rul) distinguish between three 'boundary'
  symbols:
  /</ : utterance initial
  />/ : utterance end
  /#/ : utterance medial word boundary (= not the utterance initial or end!) 
  So, the logical structure of an utterance looks like:
  < wordfirst # ... # wordlast >
  /</ and /#/ may be used in rules, but due to a bug that cannot easily be fixed 
  the symbol />/ may not (since it is part of the rule syntax). E.g. if we want 
  to model the possible insertion of a /?/ before a word-initial /a:/, then 
  the phonological rules would look like:
  <-a:-><-?,a:-
  #-a:->#-?,a:-
  If we want the rule to be applied only to an utterance-initial /a:/, we simply use
  <-a:-><-?,a:-
  If we use the utterance final symbol />/ in the same fashion, such as
  -a:->>-a:,n->  (after an utterance final /a:/ an /n/ may be inserted), this 
  leads to an error (because the first '>' is interpreted as the rule '>' and 
  not the utterance end symbol!).
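
The '>' problem described in the last item above, and the ASCII-only requirement for
RULESET files (maus 5.68 in the HISTORY), can be caught before maus is called. The
following Python sketch is a hypothetical pre-check, not a tool shipped with maus; it
assumes one rule per line and only checks that each rule contains exactly one '>' and
no non-ASCII characters.

```python
import sys


def check_rule(rule: str) -> list[str]:
    """Return the problems found in a single rule (an empty list means none found)."""
    problems = []
    # maus rejects RULESET files containing non-ASCII characters (since maus 5.68).
    if not rule.isascii():
        problems.append("contains non-ASCII characters")
    # '>' separates the left and right side of a rule, so it may occur only once;
    # this is why the utterance-end symbol />/ cannot be written in a rule.
    n_sep = rule.count(">")
    if n_sep == 0:
        problems.append("no rule separator '>'")
    elif n_sep > 1:
        lhs, rhs = rule.split(">", 1)  # the first '>' is taken as the rule separator
        problems.append(f"{n_sep} '>' symbols: would be read as {lhs!r} > {rhs!r}")
    return problems


def check_ruleset(path: str) -> int:
    """Print problems for every non-empty line of a rule file; return the number of bad lines."""
    bad = 0
    # errors="replace" turns undecodable bytes into U+FFFD, which is then flagged as non-ASCII
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            rule = line.strip()
            if not rule:
                continue
            problems = check_rule(rule)
            if problems:
                bad += 1
                print(f"{path}:{lineno}: {rule}: {'; '.join(problems)}")
    return bad


if __name__ == "__main__":
    # usage: python check_rules.py my.nrul [more files...]
    sys.exit(1 if sum(check_ruleset(p) for p in sys.argv[1:]) else 0)
```

For the failing example above, check_rule("-a:->>-a:,n->") reports three '>' symbols
and shows that the rule would be split into '-a:-' and '>-a:,n->'.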

