HISTORY runASR

19.07.17 : runASR 1.0 : first beta test with ASRType=callHavenOnDemandASR
          Note that this service is experimental and can be terminated at any time.
          runASR 1.1 : added OUTFORMAT=emuDB by conversion in wrapper
24.07.17 : callHavenOnDemandASR 1.6 : make test for multiple calls more robust:
           script uses user specific STATFILE in tmp; checks for additional 
           entries in STATFILE while waiting.
           script uses a lock file to avoid parallel running processes; restricted
           to host 'linux22' and user 'tomcat7'.
25.07.17 : runASR 1.2 : enabled pipelining in service maus.pipe 
02.08.17 : callHavenOnDemandASR 1.7 : added a time-out to remove locking files
           that are older than the HPE API processing time-out (2h) to prevent 
           'hanging' locking files after a crash.
25.07.17 : runASR 1.3 : minor bug fixes
31.07.17 : runASR 1.4 : improved error handling
05.09.17 : maus.pipe 1.22 : handle video formats for ASR service
21.09.17 : callGoogleASR 1.0 : introduce wrapper to google speech cloud
           runASR 2.0 : introduce wrapper to google speech cloud
29.09.17 : callGoogleASR 1.1 : implemented monthly quota check; calls that would 
           exceed the monthlyQuotaSec quota are rejected; moved log file from 
           /var/lib/googleASR/googleASR.stat to /r22/CLARIN/googleASR.stat (backup!)
05.10.17 : callHavenOnDemandASR 1.8 : ffmpeg video conversion retains original sampling 
           frequency of soundtrack if possible; otherwise 16000Hz is used
           callGoogleASR 1.2 : added video support compatible with callHavenOnDemand
06.10.17 : callHavenOnDemandASR 1.9 : fixed ffmpeg options for video conversion
06.10.17 : callGoogleASR 1.3 : fixed ffmpeg options for video conversion
09.10.17 : callGoogleASR 1.4 : replace json parsing by more robust jq
12.10.17 : callGoogleASR 1.5 : round real length of signal in secs up to the next 15sec interval
           to match Google billing strategy (every started 15sec interval is billed); adapted
           statistics file by rounding up.
13.10.17 : callWatsonASR 1.0 : unlimited support for IBM Watson 
15.10.17 : runASR 2.1 : introduce wrapper for IBM Watson
           callWatsonASR 1.1 : add speaker diarization support
21.10.17 : callHavenOnDemandASR 1.11 : now inserts silence intervals 
           <p:> in WOR tier output
02.11.17 : runASR 2.2 : replaced helper asrbpf2emuR by the new generic version of mausbpf2emuR (maus 4.2)
03.11.17 : callWatsonASR 1.2 : bug in csv production: column SPK was out of sync if silence intervals were 
           inserted in WOR tier, fixed
           callEMLASR 1.0 : first version
           runASR 2.3 : introduce wrapper for European Media Lab (EML); add OUTFORMAT=txt as simple text format
04.11.17 : runASR 2.4 : added options 'autoSelect' and 'allServices'
09.11.17 : runASR 2.5 : improved ASR service selection according to valid combinations of LANGUAGE and diarization
12.11.17 : runASR 2.6 : re-worked all call*ASR helpers to improve consistent logging into linux22:/r22/CLARIN
19.11.17 : callEMLASR 1.2 : changed upload URL and call-back-uri from Test server to Prod server
21.11.17 : runASR 2.7 : fixed bugs in ASRType=allServices when OUTFORMAT!=txt; allow relaxed diarization option 
           in ASRType=allServices and ASRType=autoSelect (falling back to diarization=false)
24.11.17 : callEMLASR 1.3 : changed all REST calls to "https" to be sure that encryption is used for user data;
           changed EML queue to 'eml-transcribe-bas' and changed/added project names in callEMLASR.lng-codes
           (languages eng-GB,ita-IT,spa-ES)
06.12.17 : runASR 2.8 : ASRType=allServices and OUTFORMAT!=txt now throws an error; temporary BPF files for 
           ASRType=allServices are now stored safely in /tmp not in OUT:r.par.
11.01.18 : runASR 2.9 : changed options OUT and OUTFORMAT to be mandatory (to avoid confusion in wrappers)
26.01.18 : runASR 3.0 : added service callLSTDutchASR: this service has no quotas but only processes nld-NL;
           no diarization support (ignored); word segmentation support; there are three language variants: 
           nld-NL-GN - daily conversations, nld-NL-OH - oral history interviews, nld-NL-PR - parliament talks
           callLSTDutchASR : first version 2.0
31.01.18 : callLSTDutchASR 2.1 : OUTFORMAT=bpf : WOR tier contained start samples in exponent
           notation (e.g. 1.234e+06); fixed
           callLSTDutchASR 2.1 : WOR contained silence interval gaps -> insert '<p:>' intervals with word link '-1'
20.02.18 : callHavenOnDemandASR 1.14 : activated lock file check again because of race conditions in the backend
           processing, when more than 3 files are processed ('too many calls to this project'), and added a further
           time delay of 30secs after successfully locking to avoid a new curl call following a last poll curl
           call from the previous process. This probably slows down the usage considerably, especially if you
           upload many short files.
21.02.18 : callHavenOnDemandASR 1.15 : still encountering race condition errors; improved locking again
24.09.18 : runASR 3.1 : name of call... service is simply passed through; no need to adapt runASR in the future
           for new call services
           runASR 3.2 : bug fix : in most service calls sound files in NIST SPHERE format were not processed
02.10.18 : callLSTDutchASR 2.4 : new access code to LST services
04.10.18 : callGoogleASR 2.0 : implemented longrunningspeech (= asynchronous) mode for signals longer 
           than 60sec; set max length of single file to 600sec and monthly quota to 14400sec; files>60s are
           now uploaded to Google Cloud Storage and then passed to Google Cloud Speech API and then 
           deleted from the Storage; we will have to pay for storage as well as speech recognition 
           beyond 60min per month. To stop billing the monthlyQuota must be set down to 60min *and* the 
           maxFileSizeSec to 60sec (=synchronous mode) *and* the Cloud Storage bucket 'florian-schiel-speech3'
           must be deleted (using for instance the Storage Browser). 
19.10.18 : callWatsonASR 1.5 : improved handling of parallel calls; this should reduce the number of parallel 
           calls to the Watson server; there seems to exist an (undocumented) limit on calls processed in
           parallel per user at Watson; with 1.5 it should become very unlikely that this limit is hit.
           callEMLASR 1.6 : improved handling of parallel calls as above; we don't know about any call frequency
           limits, but we do this to be safe in the future and to keep the code fairly consistent.
           callHavenOnDemand 1.17 : improved handling of parallel calls as above
           callGoogleASR 2.1 : improved handling of parallel calls as above
           callWebASR 0.3 : improved handling of parallel calls as above
24.10.18 : callLSTASR : unsolved problem : multiple (>2) parallel calls to the LST server in a narrow
           time window cause a '404 not found' error in most calls, when retrieving the result XML 
           file, although the polling call did not report any errors; informed LST developer group
26.10.18 : callWebASR 0.5 : got a new system name 'Oral History Transcription Basic' that is supposed 
           to deliver the result faster; implemented this and reworked the *.lng-codes handling so that 
           WebASR can now be called with four RFC codes: eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE (lectures);
           tested eng-GB-OHFAST: processing time is now 5min instead of 30min, and the ASR result is 
           worse (in my example; no systematic evaluation). 
           Looked at the word time alignment in the results XML file, but the alignment is too bad to be
           usable; therefore we did not implement an extraction into a WOR tier output.
29.10.18 : callLSTDutchASR 2.6 : solved all problems with LST ASR by inserting the unique project name into 
           the name of the uploaded signal file; it seems that the LST server is not able to separate files
           from different projects, and therefore all kinds of missing file/result mixtures etc. happen 
           if several calls using the same signal file name happen at the same time.
07.11.18 : callHavenOnDemand 1.18 : map transcript output '<Music/Noise>' to '<nib>' to conform with 
           other ASR and postprocessing (e.g. G2P and MAUS recognize '<nib>' as a noise model)
15.11.18 : callWebASR 0.5 : changed WebASR account from 'kisler@..' to 'bas@bas.uni-muenchen.de'; the 
           password stays the same.         
14.12.18 : callGoogleASR 2.2 : bug fix : in longer signals only the first element of the array 'results' ([0])
           in the response JSON file from the Google server was parsed to output; now all results elements
           are concatenated and sent to output; we still use just the first 'alternatives' element within
           each 'results' element, because we assume that this is the one with the highest confidence measure.
10.01.19 : runASR 3.5 : changed server ID check in all call* scripts: the regex pattern 'webapp.*\.phonetik'
           must match the output of 'hostname -A'; that way we can have parallel running servers with aliases
           'webapp' 'webapp2' etc.  
10.01.19 : runASR 3.6 : changed name of local partition to a generic name '/srv/webapp'
18.01.19 : runASR 3.7 : fixed a bug in all call* ASR scripts that prevented call* scripts from being called
           via the PATH variable.
25.01.19 : callGoogleASR 2.7 : there were sporadic errors due to a too fast removal of the input signal from
           the Google Cloud bucket in asynchronous mode - fixed by delaying the removal
05.02.19 : runASR 3.8 : disabled HavenOnDemand ASR, since for two weeks now the server has not responded, 
           nor can we log in to the server system to check our account.
18.02.19 : callWatsonASR 1.9 : fixed a bug in the script that caused an uncontrolled crash when 
           the first word in a result[].alternatives[] array was a meta tag like '[atmen]'.
           change CSV output to new maus 5.x format.
18.02.19 : callGoogleASR 2.8 : change CSV output to new maus 5.x format.
18.02.19 : callLSTDutchASR 2.10 : change CSV output to new maus 5.x format.
18.02.19 : callWebASR 0.9 : change CSV output to new maus 5.x format.

