HISTORY runASR 
19.07.17 : runASR 1.0 : first beta test with ASRType=callHavenOnDemandASR
          Note that this service is experimental and can be terminated any time.
          runASR 1.1 : added OUTFORMAT=emuDB by conversion in wrapper
24.07.17 : callHavenOnDemandASR 1.6 : make test for multiple calls more robust:
           script uses user specific STATFILE in tmp; checks for additional 
           entries in STATFILE while waiting.
           script uses a lock file to avoid parallel running processes; restrict
           to host 'linux22' and user 'tomcat7'.
25.07.17 : runASR 1.2 : enabled pipelining in service maus.pipe 
02.08.17 : callHavenOnDemandASR 1.7 : added a time-out to remove locking files
           that are older than the HPE API processing time-out (2h) to prevent 
           'hanging' locking files after a crash.
25.07.17 : runASR 1.3 : minor bug fixes
31.07.17 : runASR 1.4 : improved error handling
05.09.17 : maus.pipe 1.22 : handle video formats for ASR service
21.09.17 : callGoogleASR 1.0 : introduce wrapper to google speech cloud
           runASR 2.0 : introduce wrapper to google speech cloud
29.09.17 : callGoogleASR 1.1 : implemented monthly quota check; calls that would 
           exceed the monthlyQuotaSec quota, are rejected; moved log file from 
           /var/lib/googleASR/googleASR.stat to /r22/CLARIN/googleASR.stat (backup!)
05.10.17 : callHavenOnDemandASR 1.8 : ffmpeg video conversion retains original sampling 
           frequency of soundtrack if possible; otherwise 16000Hz are used
           callGoogleASR 1.2 : added video support compatible with callHavenOnDemand
06.10.17 : callHavenOnDemandASR 1.9 : fixed  ffmpeg options for video conversion
06.10.17 : callGoogleASR 1.3 : fixed  ffmpeg options for video conversion
09.10.17 : callGoogleASR 1.4 : replace json parsing by more robust jq
12.10.17 : callGoogleASR 1.5 : round real length of signal in secs up to next 15secs interval
           to match Google billing strategy (every begun 15secs interval is billed); adapted
           statistics file by rounding up.
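The rounding described in the entry above can be sketched as follows. This is
only an illustration in Python, not the code of callGoogleASR; the function
name billed_seconds and the treatment of zero-length signals are assumptions.

```python
import math

# Billing interval taken from the entry above: every begun 15sec interval is billed.
BILLING_INTERVAL_SEC = 15


def billed_seconds(duration_sec: float) -> int:
    """Round a signal length in seconds up to the next full billing interval.

    Hypothetical helper for illustration; empty signals are assumed to cost nothing.
    """
    if duration_sec <= 0:
        return 0
    intervals = math.ceil(duration_sec / BILLING_INTERVAL_SEC)
    return intervals * BILLING_INTERVAL_SEC


if __name__ == "__main__":
    for length in (0.5, 15.0, 15.1, 61.0):
        print(length, "->", billed_seconds(length))
```

Logging the rounded value rather than the real length keeps the statistics
file comparable with the billed seconds.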
13.10.17 : callWatsonASR 1.0 : unlimited support for IBM Watson 
15.10.17 : runASR 2.1 : introduce wrapper for IBM Watson
           callWatsonASR 1.1 : add speaker diarization support
21.10.17 : callHavenOnDemandASR 1.11 : now inserts silence intervals 
           <p:> in WOR tier output
02.11.17 : runASR 2.2 : replaced helper asrbpf2emuR by the new generic version of mausbpf2emuR (maus 4.2)
03.11.17 : callWatsonASR 1.2 : bug in csv production: column SPK was out of sync if silence intervals were 
           inserted in WOR tier, fixed
           callEMLASR 1.0 : first version
           runASR 2.3 : introduce wrapper for European Media Lab (EML); add OUTFORMAT=txt as simple text format
04.11.17 : runASR 2.4 : added options 'autoSelect' and 'allServices'
09.11.17 : runASR 2.5 : improved ASR service selection according to valid combinations of LANGUAGE and diarization
12.11.17 : runASR 2.6 : re-worked all call*ASR helpers to improve consistent logging into linux22:/r22/CLARIN
19.11.17 : callEMLASR 1.2 : changed upload URL and call-back-uri from Test server to Prod server
21.11.17 : runASR 2.7 : fixed bugs in ASRType=allServices when OUTFORMAT!=txt; allow relaxed diarization option 
           in ASRType=allServices and ASRType=autoSelect (falling back to diarization=false)
24.11.17 : callEMLASR 1.3 : changed all REST calls to "https" to be sure that encryption is used for user data;
           changed EML queue to 'eml-transcribe-bas' and changed/added project names in callEMLASR.lng-codes
           (languages eng-GB,ita-IT,spa-ES)
06.12.17 : runASR 2.8 : ASRType=allServices and OUTFORMAT!=txt now throws an error; temporary BPF files for 
           ASRType=allServices are now stored safely in /tmp not in OUT:r.par.
11.01.18 : runASR 2.9 : changed options OUT and OUTFORMAT to be mandatory (to avoid confusions in wrappers)
26.01.18 : runASR 3.0 : added service callLSTDutchASR: this service has no quotas but only processes nld-NL;
           no diarization support (ignored); word segmentation support; there are three language variants: 
           nld-NL-GN - daily conversations, nld-NL-OH - oral history interviews, nld-NL-PR - parliament talks
           callLSTDutchASR : first version 2.0
31.01.18 : callLSTDutchASR 2.1 : OUTFORMAT=bpf : WOR tier contained power of 10 numbers as
           start samples (e.g. 1.234e+06) fixed
           callLSTDutchASR 2.1 : WOR contained silence interval gaps -> insert '<p:>' intervals with word link '-1'
20.02.18 : callHavenOnDemandASR 1.14 : activated lock file check again because of race conditions in the backend
           processing, when more than 3 files are processed ('too many calls to this project'), and added a further
           time delay of 30secs after successfully locking to avoid a new curl call following a last poll curl
           call from the previous process. This probably slows down the usage considerably, especially if you
           upload many short files.
21.02.18 : callHavenOnDemandASR 1.15 : still encountering race condition errors; improved locking again
24.09.18 : runASR 3.1 : name of call... service is simply passed through; no need to adapt runASR in the future
           for new call services
           runASR 3.2 : bug fix : in most service calls NIST SPHERE format sound files were not processed
02.10.18 : callLSTDutchASR 2.4 : new access code to LST services
04.10.18 : callGoogleASR 2.0 : implemented longrunningspeech (= asynchronous) mode for signals longer 
           than 60sec; set max length of single file to 600sec and monthly quota to 14400sec; files>60s are
           now uploaded to Google Cloud Storage, then passed to the Google Cloud Speech API, and then 
           deleted from the Storage; we will have to pay for storage as well as speech recognition 
           beyond 60min per month. To stop billing, the monthlyQuota must be set down to 60min *and* the 
           maxFileSizeSec to 60sec (= synchronous mode) *and* the Cloud Storage bucket 'florian-schiel-speech3'
           must be deleted (using for instance the Storage Browser). 
19.10.18 : callWatsonASR 1.5 : improved handling of parallel calls; this should reduce the number of parallel 
           calls to the Watson server; there seems to exist an (undocumented) limit of parallel processed
           calls per user at Watson; with 1.5 it should become very unlikely that this happens.
           callEMLASR 1.6 : improved handling of parallel calls as above; we don't know about any call frequency
           limits, but we do this to be safe in the future and to keep the code fairly consistent.
           callHavenOnDemand 1.17 : improved handling of parallel calls as above
           callGoogleASR 2.1 : improved handling of parallel calls as above
           callWebASR 0.3 : improved handling of parallel calls as above
24.10.18 : callLSTASR : unsolved problem : multiple (>2) parallel calls to the LST server in a narrow
           time window cause a '404 not found' error in most calls, when retrieving the result XML 
           file, although the polling call did not report any errors; informed LST developer group
26.10.18 : callWebASR 0.5 : got a new system name 'Oral History Transcription Basic' that is supposed 
           to deliver the result faster; implemented this and reworked the *.lng-codes handling so that 
           WebASR can now be called with four RCF codes: eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE (lectures);
           tested eng-GB-OHFAST: instead of 30min processing time now it is 5min, and the ASR result is 
           deteriorated (in my example; no systematic evaluation). 
           Looked at the word time alignment in the results XML file, but the alignment is too bad to be
           usable; therefore we did not implement an extraction into a WOR tier output.
29.10.18 : callLSTDutchASR 2.6 : solved all problems with LST ASR by inserting the unique project name into 
           the name of the uploaded signal file; it seems that the LST server is not able to separate files
           from different projects, and therefore all kinds of missing file/result mixtures etc. happen 
           if several calls using the same signal file name happen at the same time.
07.11.18 : callHavenOnDemand 1.18 : map transcript output '<Music/Noise>' to '<nib>' to conform with 
           other ASR and postprocessing (e.g. G2P and MAUS recognize '<nib>' as a noise model)
15.11.18 : callWebASR 0.5 : changed WebASR account from 'kisler@..' to 'bas@bas.uni-muenchen.de'; the 
           password stays the same.         
14.12.18 : callGoogleASR 2.2 : bug fix : in longer signals only the first element of the array 'results ([0])'
           in the response JSON file from the Google server was parsed to output; now all results elements
           are concatenated and sent to output; we still use just the first 'alternatives' element within
           each 'results' element, because we assume that this is the one with the highest confidence measure.
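The fix described in the entry above can be sketched as follows. This is only
an illustration in Python (the script itself parses the JSON with jq, see
callGoogleASR 1.4); the function name join_transcripts and the sample response
are made up, while the 'results', 'alternatives' and 'transcript' keys follow
the Google response JSON.

```python
def join_transcripts(response: dict) -> str:
    """Concatenate the first alternative of every 'results' element.

    Hypothetical helper for illustration: the first alternative is assumed
    to be the one with the highest confidence, as stated in the entry above.
    """
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results elements that carry no transcript
        text = alternatives[0].get("transcript", "").strip()
        if text:
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up response with two 'results' elements, as returned for longer signals.
    sample = {
        "results": [
            {"alternatives": [{"transcript": "guten morgen", "confidence": 0.9},
                              {"transcript": "guter morgen"}]},
            {"alternatives": [{"transcript": " wie geht es", "confidence": 0.8}]},
        ]
    }
    print(join_transcripts(sample))
```

Before the fix only the element results[0] would have reached the output.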
10.01.19 : runASR 3.5 : changed server ID check in all call* scripts: the regex pattern 'webapp.*\.phonetik'
           must match the output of 'hostname -A'; that way we can have parallel running servers with aliases
           'webapp' 'webapp2' etc.  
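A minimal sketch of the server ID check described in the entry above, written
in Python for illustration only (the call* scripts are not Python). The
function name is_allowed_host and the host names in the example are
hypothetical; only the regex pattern is taken from the entry.

```python
import re

# Pattern from the entry above; it must match somewhere in the output of 'hostname -A'.
SERVER_ID_PATTERN = re.compile(r"webapp.*\.phonetik")


def is_allowed_host(hostname_output: str) -> bool:
    """Return True if the 'hostname -A' output contains an allowed server alias."""
    return SERVER_ID_PATTERN.search(hostname_output) is not None


if __name__ == "__main__":
    # Hypothetical 'hostname -A' outputs (space separated host names).
    print(is_allowed_host("webapp2.phonetik.example.org "))
    print(is_allowed_host("otherhost.phonetik.example.org "))
```

Because the pattern is not anchored, any alias that starts with 'webapp'
('webapp', 'webapp2', ...) passes the check.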
10.01.19 : runASR 3.6 : changed name of local partition to a generic name '/srv/webapp'
18.01.19 : runASR 3.7 : fixed a bug in all call* ASR scripts that prevented calling call* scripts via the 
           PATH variable.
25.01.19 : callGoogleASR 2.7 : there were sporadic errors due to a fast removal of the input signal from
           google cloud bucket in asynchronous mode - fixed by delaying the removal
05.02.19 : runASR 3.8 : disabled HavenOnDemand ASR, since for two weeks now the server does not respond 
           nor can we login to the server system to check our account.
18.02.19 : callWatsonASR 1.9 : fixed a bug in the script that caused an uncontrolled crash when 
           the first word in a result[].alternatives[] array was a meta tag like '[atmen]'.
           change CSV output to new maus 5.x format.
18.02.19 : callGoogleASR 2.8 : change CSV output to new maus 5.x format.
18.02.19 : callLSTDutchASR 2.10 : change CSV output to new maus 5.x format.
18.02.19 : callWebASR 0.9 : change CSV output to new maus 5.x format.
05.03.19 : runASR 4.0 : terminated service HavenOnDemand
22.04.19 : callWatsonASR 1.10 : bug fix: sometimes start/duration in OUTFORMAT=bpf, tier WOR were not 
           given in integer but scientific format (1.9089e+05)
01.06.19 : EML service does not work; disabled EML in Prod CMDI and in runASR (list 'asrWrappers') until resolved
08.06.19 : callEMLASR : adapted CSV output format to new 11-column standard
18.06.19 : runASR 4.1 : added ASRType callLSTEnglishASR
21.06.19 : runASR 4.2 : added retrieval of error.log file from project in callLST*ASR if the call fails;
           changing all ERROR/WARNING outputs in callLST*ASR to 'append' (>> /dev/stderr, was 'write' before)
           callEMLASR 1.11 : replaced signal pre-processing by audioEnhance
           callGoogleASR 2.9 : replaced signal pre-processing by audioEnhance
           callLSTDutchASR 2.11 : replaced signal pre-processing by audioEnhance
           callLSTEnglishASR 0.2 : replaced signal pre-processing by audioEnhance
           callWatsonASR 1.11 : replaced signal pre-processing by audioEnhance
27.06.19 : callEMLASR 1.12 : bug fix: in BPF and emuR output the wrong sampling rate of 16000Hz was used, 
           instead of the sampling rate of the input signal
23.07.19 : callLSTEnglishASR 0.3 : adapted code for proper error handling of short or absent speech in signals
27.07.19 : runASR 4.3, callGoogleASR 2.10 : added licensing for quota over-draft (ACCESSCODE)
05.10.19 : runASR 5.0, ... : added output conversion by annotConv; replaced OUTFORMAT=json|xml by
           OUTFORMAT=native
18.10.19 : callWatsonASR 1.14 : added new credentials valid 1 year with monthly quota of 500min 
           (new IBM Academics Program); script now uses both accounts alternately and observes
           30000sec/month quota and understands option '--checkQuota'.
30.11.19 : runASR 5.1, callWatsonASR 1.16 : implemented usage of ACCESSCODE for Watson ASR
01.01.20 : callGoogleASR 2.13 : extended monthly quota to 450h, extended file quota to 3h
21.01.20 : callWatsonASR 2.0 : stopped old account and switched to new account with 30000sec monthly quota
           fixed bug when video was processed, the BPF output had no header.
           callGoogleASR 2.14 : fixed bug when video was processed, the BPF output had no header.
           callLSTDutchASR 2.14 : fixed bug when video was processed, the BPF output had no header.
           callLSTEnglishASR 0.7 : fixed bug when video was processed, the BPF output had no header.
07.04.20 : runASR 5.4, callGoogleASR 2.15 : bug fix: if the result transcript contained a '?',
           the conversion caused an error and an empty file was returned
15.06.20 : callGoogleASR 2.16 : enabled punctuation in ASR results
           runASR 5.5 callGoogleASR 2.16 : produce TRO tier in bpf output with punctuation and '\s' at the 
           end of each token; produce txt output with punctuation, if the ASR method produces a TRO tier
17.06.20 : adapted all call*ASR scripts to tolerate numberSpeakDiar option
20.06.20 : runASR 5.7 callGoogleASR 2.17 : add speaker diarization, output in SPK, and alignment, output in WOR
           adapted CSV generation, bug in Google service produces double sentence output which is filtered, 
           added re-ordering of speaker labels and optional label match via 'speakMatch' (as in SpeakDiar service).
           adapted all call*ASR scripts to 14-column CSV output.
21.06.20 : runASR 5.8 added option 'speakMatch'
22.06.20 : callLSTDutchASR 2.16 added punctuation output into TRO tier
           callLSTEnglishASR 0.9 added punctuation output into TRO tier
           callWatsonASR 2.2 : added re-ordering of speaker labels and 'speakMatch' support
28.06.20 : runASR 5.9 : moved re-ordering/matching of speaker labels from call*ASR scripts to runASR
29.06.20 : runASR 5.10 : generic csv format conversion in runASR based on bpf produced by call*ASR scripts;
           basically the call*ASR scripts are now called only with OUTFORMAT=bpf and =native
06.07.20 : callFraunhoferASR 2.0 : adapted to update in test server Fraunhofer; bug fix error codes
24.09.20 : callEMLASR 1.16 : removed string conversion from iso8859 to UTF-8; obviously t
           callFraunhoferASR 2.3 : bug fix : ORT tier contained punctuation -> removed punctuation from
           ORT tier and added TRO tier with punctuation to output
08.10.20 : callFraunhoferASR 2.4 : if MPEG7 without transcript is returned, do not issue ERROR but a WARNING 
           and create empty result file; removed XML formatting of error output since FhG server sometimes
           returns HTML formatted error messages; set poll-timeout down from 3h to 10min, since the FhG
           server often seems to 'hang' and then we wait 3h for a null result.
11.10.20 : callWatsonASR 2.5 : extended list of languages by eng-AU and fra-CA; activated 'opt-out' to prevent
           user data logging
28.10.20 : callLSTDutchASR 3.0 : some changes at LST server incorporated; functionality is the same 
30.10.20 : callEMLASR 1.17 : replaced wget calls by curl calls in polling to avoid wget log file output
11.11.20 : callFraunhoferASR 2.4 : bug fix : in the TRO tier of BPF output the '\s' was missing after each token
           runASR 5.12 : bug fix : OUTFORMAT=native caused an unspecified error
19.11.20 : callWatsonASR 2.6 : bug fix : script returned generic script error
21.11.20 : callFraunhoferASR 2.6 : bug fix : some race conditions (long files) caused an endless writing loop
                                   added TRN tier creation based on the ASR chunking of result
01.12.20 : callWatsonASR 2.7 : implemented endpoint change from watsonplatform.net to watson.cloud.ibm.com
05.12.20 : callLSTEnglishASR 1.1 : some changes at LST server incorporated; functionality is the same
26.01.21 : callFraunhofer 2.7 : fixed quotas: file size 2GB, file length 5h (our limit, not FhG's)
28.01.21 : callFraunhofer 2.8 : bug fix: input files that took longer than 60sec to upload to the internal
           BAS server (approx. larger than 600MB) caused a generic curl error 
10.02.21 : runASR 5.14 : bug fix : WARNING was issued when using an ACCESSCODE, but billing was correct
02.03.21 : runASR 5.15 : exceed quota codes are logged only if a result is returned; the awkward second entry
           in the log files after an error occurred is obsolete.
22.03.21 : runASR 5.17 : added option '--checkQuotaSecs' to all call... scripts that display quota
02.04.21 : runASR 5.18 : bug fix in Google and Watson: no STATFILE logging since 22.3.21 -> fixed
           added ACCESSCODE logging to Fraunhofer although we do not bill Fraunhofer;
           ASRType=autoSelect now checks quotas of ASR services so that no service is selected that has not 
           enough free quota
03.04.21 : callWatsonASR 2.13 : curl call to IBM server randomly returns error code 92 ('Stream error in HTTP/2 framing layer');
           the only fix that seems to work is repeating the call; added a loop with expanding wait times
           between calls (0.1 .... 5sec) over 100 trials; if another exit code is returned or 100 unsuccessful
           trials are reached, an ERROR is issued.
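The retry loop described in the entry above can be sketched as follows. This
is only an illustration in Python, not the code of callWatsonASR: the function
name retry_on_code is hypothetical, and doubling the wait time up to the cap is
an assumption, since the entry only gives the range (0.1 ... 5sec) and the
number of trials.

```python
import time
from typing import Callable


def retry_on_code(call: Callable[[], int],
                  retry_code: int = 92,
                  max_trials: int = 100,
                  first_wait: float = 0.1,
                  max_wait: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Repeat 'call' while it returns 'retry_code', with expanding wait times.

    'call' stands for the curl call and returns its exit code. The function
    returns the last exit code: any other code ends the loop at once, and
    'retry_code' is returned only after 'max_trials' unsuccessful trials.
    The caller decides whether the returned code is an ERROR.
    """
    wait = first_wait
    code = retry_code
    for trial in range(1, max_trials + 1):
        code = call()
        if code != retry_code:
            return code
        if trial < max_trials:
            sleep(wait)
            wait = min(wait * 2, max_wait)  # assumed schedule: double up to the cap
    return code


if __name__ == "__main__":
    # Simulated curl call: fails twice with code 92, then succeeds.
    codes = iter([92, 92, 0])
    print(retry_on_code(lambda: next(codes)))
```

Later entries reduce the number of trials (callWatsonASR 2.14, 2.15); in this
sketch that corresponds to a smaller max_trials.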
19.06.21 : callFraunhoferASR 2.11 : new asrServerURL and new credentials after the old version did not work any longer
           caused by an expired certificate on the server
23.06.21 : callFraunhoferASR 2.12 : new model names after the old versions were no longer available; the service now
           uses the most recent models of deu-DE and eng-US ASR regardless of the version number
10.07.21 : callGoogleASR 2.27 : updated the Google Cloud SDK, which allows the upload of signals to the cloud bucket, to version 327.0.0
03.12.21 : callWatsonASR 2.14 : change '92 error' loop to 10 times; failed calls are now logged into STATFILE with 
           string 'FALILED' instead of ACCESSCODE in third column; adapted STATFILE analysis
04.12.21 : callWatsonASR 2.15 : change '92 error' loop to 1 time (to test whether the over-usage was caused by this)
16.12.21 : callFraunhoferASR 2.13 : set polling time-out to 10h (= double the max input length, was: 3h!)
25.01.22 : runASR 6.0 : added new service callAmberscriptASR for testing
26.01.22 : callAmberscript 1.1 : added new API key for testing the benchmark
11.02.22 : runASR 6.1, callAmberscriptASR 1.3 : added languages 
15.02.22 : callEMLASR 1.18 : bug in XML conversion of results: when speaker diarization was switched off, the created WOR tier
           was corrupt; languages eng-GB and eng-US do not deliver speaker diarization results: set them to 'nodia' in 
           callEMLASR.lng-codes.  
22.02.22 : callAmberscriptASR : bug : in the TRO tier the white space marker '\s' was missing, causing all words to be 'glued together' 
           e.g. in Subtitle WebVTT output 
25.02.22 : runASR 6.3 (+ all call...) : added option TROSpeakerID=false; if set and if a SPK and a TRO tier are delivered from
           the ASR service, speaker ID markers are added to the turn-starting words of the TRO tier
           callEMLASR 1.19 : added TRO tier output (= copy of ORT since EML does not produce punctuation)
26.02.22 : callGoogleASR 2.29 : added TRN tier output, if service produces SPK and TRO tier
26.02.22 : callWatsonASR 2.17 : added TRO,TRN tier output, if service produces SPK and TRO tier;
           fixed the random curl error return '92 - protocol error' problem by 
           removing the header 'Transfer-Encoding: chunked' from the curl call.
01.03.22 : runASR 6.4 : replaced the '\s' between speaker label and word in TRO tier by a blank, since service Subtitle seems 
           to regard '\s' only at the end of a word label but not within.
05.03.22 : callLSTDutchASR 3.2 : changed server name to https://webservices.cls.ru.nl/asr_nl/
15.03.22 : callWatsonASR 2.18 : changed the following languages to 'new generation models': eng-US|AU|GB, spa-ES, kor-KR, jap-JP, fra-FR;
           all new models support speaker diarization, have a significantly better quality and are 5 times faster 
07.07.22 : runASR 6.5 : removed the colon after the speaker ID (<INT:> -> <INT>) when inserting speaker diarization results into the TRO 
           tier (option TROSpeakerID=true) to conform with OH.D rules in WebVTT files
15.10.22 : runASR 6.6 : added a pre-check whether the present runASR call is already running on the server; if so, the call is terminated with an ERROR
           (this is a work-around to prevent multiple backend calls that occur sometimes on jobs that last longer than 600sec; since
           we must prevent multiple ASR service calls - since they cost money -, we do this pre-check in runASR until we find the bug);
           it follows that if a user uploads the exact same input file on different browsers or in different tabs of his browser and
           tries to run ASR, he will get an ERROR message, until the already running process is terminated.
19.10.22 : callWatsonASR 2.19 : switched to new account with free 1200000sec/month
21.11.22 : callFraunhoferASR 2.16 : bug fix : the internal monthly free quota check did not filter ACCESSQUOTA calls from the STATFILE, and therefore 
           calculated a far too high total of accumulated processed seconds
22.11.22 : callFraunhoferASR 2.17 : sometimes polling requests are not answered by the FhG server; when the time-out of 60
           is reached, curl returns error code 28, which terminated callFraunhoferASR; instead of exiting we now poll again after
           curl error 28 and it seems to work.
05.01.23 : callWatsonASR 2.20 : suddenly a new API key appeared in the IBM Cloud config and at the same time I got an automated email 
           that my '30 days trial is over'; the old API key does not work any longer, but the new one does, so I replaced it
29.04.23 : callFraunhoferASR 3.0 : first 'on-premise' version installed in offline mode; see linux11:/srv/webapp/CLARIN/FhG_audioMining# ls README_Flo.md for details
03.05.23 : callLSTEnglishASR 1.2 : adapted script to new server version: order of attributes in word token list has changed
27.05.23 : runASR 6.7 : added option USEWORDASTURN=true : will copy an ASR-generated WOR tier (if there) into a TRN tier; 
           this might then be used by a following MAUS process as time anchors and improve the phonetic alignment within words
           callFraunhoferASR 3.1 : changed initial poll time from 10 to 2sec and minimum polltime from 10 to 4sec
