Intended audience

This document should act as a guideline for speech corpus validation. It may be used as introductory reading for the newbie or as a reference and/or check list for the experienced scientist/engineer. More specifically, it will most likely be used by

  1. producers of speech corpora (quality control)
  2. institutions that are about to invest in a speech corpus / speech corpus production and want to perform their own validation
  3. institutions that do external validations for other parties

If the validation is not carried out for in-house purposes but is initiated by an external producer / buyer / client, we will refer to this producer / buyer / client as the `client' for the remainder of this document, whereas the institution that performs the validation is referred to as the `validator'. The person / institution that actually produces the speech corpus in question will be referred to as the `producer'. Note that in some cases all three might be the same.

The cookbook is not intended to be used for the quality assessment of speech corpora. If you are interested in this - much more difficult - task, please refer to [2].

Furthermore, the document does not cover basic knowledge of Digital Speech Processing, nor more specialized topics such as the above-mentioned applications in the field of SLP. We recommend referring to the document The Production of Speech Corpora ([5]) for details about best practice in this closely related topic.

At the end of many chapters you will find a check list where all the main points to follow are listed in an abbreviated form. If you do not understand the contents of these lists, you can find the sections describing each topic in more detail by following the references given for each keyword. All check lists (including chapter 2 of [2]) are summarized in appendix A.

Angela Baumann 2004-06-03