Example
As an example, appendix C contains the specification for the WebCommand
speech corpus, and appendix D contains the original main corpus
documentation file. Based on these data, a validator and a client might
come up with the following `validation contract', which specifies the
reference and the validation check list:
between .... (validator)
and .... (client)
- The client and the validator agree that the validator will
perform a validation of the speech corpus `WebCommand' (corpus) with regard
to common rules of best practice in the field of SLR production. The corpus
and the software tools that were used for the annotation will be
provided by the client (or a third party such as the producer),
including all specifications and documentation as originally
delivered to the client.
- The validation process is based on the specification of the corpus
(technical annex ...). The
validator will deliver the results in a confidential report to the
client. The report should describe errors and deviations from the
specification in enough detail that the client is able to correct
them (if possible).
- The validation will cover the following:
  - formal checks for completeness, terminology, readability and
    parsability of signal files, metadata and annotation files
  - check for superfluous files in all locations of the SLR
  - check of the technical specifications of the signal files, including
    checks for empty signals, clipped signals and corrupt signal files
  - speaker distribution as stated in the specification; the documented
    sex is checked for 50% of the speakers, selected at random
  - completeness of documentation
  - consistency of speech corpus with documentation
  - readability (on Windows and Macintosh) of documentation files
  - manual validation of a minimum of 10% randomly selected
    transcription files, including adherence to the specified prompt
    texts. Errors to be reported are: typos (the `Duden' being the
    reference), mismatches between the transcript and the spoken
    utterance in the recording, and wrong noise markers.
  - formal check and completeness of the lexicon (coverage), and check
    for mismatches with the spelling in the transcripts
  - manual validation of a minimum of 15% randomly selected lexical
    entries. Errors to be reported are: typos, mismatches with the
    spelling in the transcripts, and inconsistent canonical
    pronunciations.
  - readability of distribution media on Windows, Macintosh and Linux
- Time plan of validation:
Begin: ....
Intermediate report: ....
Final report: ....
- Compensation
....
- Confidentiality and legal matters
....
Signature Validator Signature Client
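
The sampling rates named in the check list (50% of the speakers, a minimum of 10% of the transcription files, a minimum of 15% of the lexical entries) and the lexicon coverage check could be prepared along the following lines. This is only a minimal sketch and not part of the contract: the function names, the fixed random seed, the rounding-up rule and the whitespace tokenisation are all assumptions made for illustration.

```python
import random


def sample_minimum_percent(items, percent, seed=0):
    """Return a random sample holding at least `percent` percent of `items`.

    Assumption: a fixed seed is used so that validator and client can
    reproduce exactly the same selection later on.
    """
    items = sorted(items)  # fixed order, so the seed alone decides the sample
    if not items:
        return []
    # Integer ceiling division: round up so the agreed minimum is never undercut.
    size = (len(items) * percent + 99) // 100
    size = min(len(items), max(1, size))
    return random.Random(seed).sample(items, size)


def uncovered_words(transcripts, lexicon):
    """Return the words used in the transcripts that have no lexicon entry.

    Assumption: words are separated by whitespace; noise markers and case
    normalisation are not handled here.
    """
    used = {word for line in transcripts for word in line.split()}
    return sorted(used - set(lexicon))
```

With, say, 25 hypothetical transcription files, `sample_minimum_percent(files, 10)` selects 3 of them, since 2.5 is rounded up to meet the agreed minimum. An empty list from `uncovered_words` would indicate that the lexicon covers every word form found in the transcripts.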
Angela Baumann
2004-06-03