
Speech Corpus Production

This part of the cookbook describes the entire process of speech corpus production in roughly chronological order. Figure 3.1 shows the major steps of the process and how they relate to one another on a time axis running from top to bottom.

Figure 3.1: Typical schedule of a speech corpus production
As the figure shows, some steps have a strict order because they rely on results or data produced in the previous step, while others may be carried out in parallel. For example, it does not make sense to start creating the pronunciation dictionary before the annotation is finished, because you need a basic transcription to create the dictionary. On the other hand, in many corpus productions, collection, post-processing and annotation run in parallel to save time.
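
The ordering constraints described above can be written down as a small dependency graph. The following Python sketch is only an illustration, not part of the cookbook's method: it covers just the four steps named in this paragraph, the dictionary `DEPENDS_ON` and the function `parallel_stages` are hypothetical names, and treating collection, post-processing and annotation as mutually independent is a simplifying assumption that reflects their running in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed, simplified model: each step maps to the steps whose results it needs.
# Only the dependency of the dictionary on the annotation is stated in the text.
DEPENDS_ON = {
    "collection": set(),
    "post-processing": set(),
    "annotation": set(),
    "pronunciation dictionary": {"annotation"},
}


def parallel_stages(depends_on):
    """Group steps into stages; steps within one stage may overlap in time."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        # All steps whose prerequisites are finished can start now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(DEPENDS_ON)
for number, steps in enumerate(stages, start=1):
    print(f"Stage {number}: {', '.join(steps)}")
```

Adding further steps of your own production to the dictionary, together with the steps they rely on, yields a rough plan of which tasks can be scheduled side by side and which must wait.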

Figure 3.1 also shows the ideal of external validation by an independent validation institution at two or more points in time. In most cases insufficient funding rules out such a design; you should then at least perform an in-house validation.

All the tasks shown are discussed in detail in the following chapters. At the end of each chapter you will find a checklist to help with your own speech corpus production.



BITS Projekt-Account 2004-06-01