=================================================
==== Use case: long recording with raw text =====
=================================================

What we have: A long audio file (*.wav) and the corresponding orthographic transcription (*.txt).
The recording is too long to be segmented with WebMAUS directly (> 3000 words), and we do not have a manual chunk segmentation.
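
To check whether a transcript is over the limit mentioned above, a rough word count is enough. The following is a minimal sketch: it counts whitespace-separated tokens that contain at least one letter or digit, which is only an approximation of the tokenization the web services apply. The function names are made up for this example.

```python
import re
from pathlib import Path

# Limit quoted in this use case; the services' own tokenization may differ.
WEBMAUS_WORD_LIMIT = 3000


def count_words(text):
    """Count whitespace-separated tokens containing a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def needs_chunking(text, limit=WEBMAUS_WORD_LIMIT):
    """Return True if the text has more words than the given limit."""
    return count_words(text) > limit


if __name__ == "__main__":
    path = Path("audiobook/fraubovary.txt")
    if path.exists():
        # Assumption: the transcript is stored as UTF-8.
        text = path.read_text(encoding="utf-8")
        print(count_words(text), "words; chunking needed:", needs_chunking(text))
```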

What we want: Complete segmentation into words and phones.
Intermediate goal: Segmentation into short chunks, so that MAUS can do the complete segmentation.

Solution: Web Interfaces G2P + Chunker + WebMAUS General

Notes: 

* the audio book chapter used in this use case is provided by the LibriVox project (librivox.org)
* you can perform all three processing steps described in this use case in one step by using the Pipeline service with option 
    PIPE = G2P_CHUNKER_MAUS
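
The Pipeline service can also be called from a script instead of the web interface. The sketch below is an illustration only. The endpoint URL, the form field names (SIGNAL, TEXT, LANGUAGE, OUTFORMAT), the language code "deu-DE" and the layout of the XML response are assumptions that are not stated in this use case; check them against the help page of the web services before relying on them. Only the option PIPE = G2P_CHUNKER_MAUS is taken from the note above. The third-party package `requests` is needed only for the actual upload.

```python
import xml.etree.ElementTree as ET

# Assumption: REST endpoint of the Pipeline service (not given in this text).
PIPELINE_URL = (
    "https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPipeline"
)


def pipeline_fields(language="deu-DE", out_format="TextGrid"):
    """Form fields for the request; all names except PIPE are assumptions."""
    return {
        "PIPE": "G2P_CHUNKER_MAUS",  # option named in this use case
        "LANGUAGE": language,        # assumed code for German (DE)
        "OUTFORMAT": out_format,     # assumed field name
    }


def parse_response(xml_text):
    """Extract (success, download link) from the assumed XML response."""
    root = ET.fromstring(xml_text)
    success = (root.findtext("success") or "").strip().lower() == "true"
    link = (root.findtext("downloadLink") or "").strip()
    return success, link


def run_pipeline(wav_path, txt_path):
    """Upload both files and return (success, download link)."""
    import requests  # third-party; imported here so the helpers work without it

    with open(wav_path, "rb") as wav, open(txt_path, "rb") as txt:
        response = requests.post(
            PIPELINE_URL,
            data=pipeline_fields(),
            files={"SIGNAL": wav, "TEXT": txt},  # assumed field names
            timeout=3600,  # long recordings can take a while
        )
    response.raise_for_status()
    return parse_response(response.text)
```

A call would look like `run_pipeline("audiobook/fraubovary.wav", "audiobook/fraubovary.txt")`.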

=================================================
=============== Procedure =======================
=================================================

* go to http://clarin.phonetik.uni-muenchen.de/BASWebServices

=================== G2P =========================

* go to G2P

* upload audiobook/fraubovary.txt (by dropping the file in the grey area and clicking 'Upload')

* choose the following options (leaving all other options at defaults):
    Language=German (DE)
    Tool embedding=maus

* click "Run Web Service"

* download the result and save it as audiobook/fraubovary.par

* OPTIONAL:
    Open fraubovary.par with a text editor.
    You will find approximately 3500 lines starting with the letters "KAN", 
    and the same number of lines starting with the letters "ORT".
    The "ORT" lines contain the normalized and tokenized wordforms of the text.
    The "KAN" lines contain canonical pronunciation forms of all words (in SAMPA encoding).
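
The optional check above can also be done with a few lines of Python instead of a text editor. This is a minimal sketch that counts the lines per tier label (the text before the first colon); the function name is made up for this example, and the file is assumed to be UTF-8.

```python
from collections import Counter
from pathlib import Path


def tier_counts(par_text):
    """Count lines per tier label, e.g. 'ORT' or 'KAN'."""
    counts = Counter()
    for line in par_text.splitlines():
        label, colon, _rest = line.partition(":")
        if colon and label.isalnum():
            counts[label] += 1
    return counts


if __name__ == "__main__":
    path = Path("audiobook/fraubovary.par")
    if path.exists():
        counts = tier_counts(path.read_text(encoding="utf-8"))
        print("ORT lines:", counts["ORT"])
        print("KAN lines:", counts["KAN"])
        # Every word should have both an orthographic and a canonical form.
        print("same number:", counts["ORT"] == counts["KAN"])
```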

================= Chunker =======================

* go to Chunker

* upload audiobook/fraubovary.wav and audiobook/fraubovary.par

* (depending on your internet connection, the upload may take a while)

* choose the following options (leaving all other options at defaults):
    Language=German (DE)

* click "Run Web Service". This will take a few minutes.

* download the output and save it as audiobook/fraubovary.par (make sure to replace the old version!)

* OPTIONAL: 
    Open fraubovary.par with a text editor. 
    When you scroll all the way down, you should find lines that start with the letters "TRN:".
    Every line corresponds to one chunk.
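
To get an overview of the chunks without scrolling through the file, the "TRN:" lines can be read with a short script. The sketch below assumes the usual layout of this tier: start and duration in samples, a comma-separated list of word indices, then the chunk text. That layout is not spelled out in this use case, so treat it as an assumption; the class and function names are made up for this example.

```python
from pathlib import Path
from typing import List, NamedTuple


class Chunk(NamedTuple):
    start: int               # first sample of the chunk
    duration: int            # length in samples
    word_indices: List[int]  # indices of the words inside the chunk
    text: str                # chunk text, may be empty


def read_chunks(par_text):
    """Parse all 'TRN:' lines (assumed: start, duration, links, text)."""
    chunks = []
    for line in par_text.splitlines():
        if not line.startswith("TRN:"):
            continue
        parts = line.split(None, 4)
        if len(parts) < 4:
            continue  # skip lines that do not match the assumed layout
        links = [int(i) for i in parts[3].split(",")]
        text = parts[4] if len(parts) > 4 else ""
        chunks.append(Chunk(int(parts[1]), int(parts[2]), links, text))
    return chunks


if __name__ == "__main__":
    path = Path("audiobook/fraubovary.par")
    if path.exists():
        chunks = read_chunks(path.read_text(encoding="utf-8"))
        print(len(chunks), "chunks")
        if chunks:
            print("longest chunk:", max(len(c.word_indices) for c in chunks), "words")
```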

================== WebMAUS ======================

* go to WebMAUS General

* upload audiobook/fraubovary.wav and audiobook/fraubovary.par

* (again, the upload may take a while)

* choose the following options (leaving all other options at defaults):
    Language=German (DE)
    Chunk segmentation=true
    Output format=TextGrid (or the output format of your choice)
    ORT tier in TextGrid=true
    KAN tier in TextGrid=true

* click "Run Web Service". This will take a few minutes.

* download the resulting file, or click on the EmuWebApp symbol to inspect the result.
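
If you downloaded a TextGrid, a quick sanity check is to count the intervals on each tier. The sketch below reads the long (verbose) Praat TextGrid text format with simple pattern matching. It is not a full TextGrid parser: labels that span several lines are not handled, and the file name and encoding in the example are assumptions.

```python
import re
from pathlib import Path


def read_intervals(textgrid_text):
    """Return {tier name: [(xmin, xmax, label), ...]} for interval tiers."""
    tiers = {}
    current = None
    bounds = {}
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        name = re.fullmatch(r'name = "(.*)"', line)
        if name:
            current = tiers.setdefault(name.group(1), [])
            continue
        bound = re.fullmatch(r"(xmin|xmax) = (\S+)", line)
        if bound:
            # Tier-level bounds are overwritten by the interval bounds below.
            bounds[bound.group(1)] = float(bound.group(2))
            continue
        label = re.fullmatch(r'text = "(.*)"', line)
        if label and current is not None:
            current.append((bounds["xmin"], bounds["xmax"], label.group(1)))
    return tiers


if __name__ == "__main__":
    path = Path("audiobook/fraubovary.TextGrid")  # assumed file name
    if path.exists():
        # Assumption: the file is UTF-8 encoded.
        tiers = read_intervals(path.read_text(encoding="utf-8"))
        for tier_name, intervals in tiers.items():
            print(tier_name, len(intervals), "intervals")
```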
