
Sharing Model

The cost of producing a speech corpus ranges from EUR 20,000 for a small monolingual read-speech corpus to several million EUR for a large multilingual, multimodal Wizard-of-Oz (WOZ) corpus. In almost all cases it therefore makes sense to share these corpora.

For the production of a shared corpus, the obvious organizational form is collaboration: the partners form a consortium with the aim of creating a shared speech corpus, e.g. a multi-language corpus. Each partner is responsible for one part of the corpus, e.g. its own language, and in the end all parts are exchanged freely within the consortium. A very careful corpus design and strict monitoring by an independent partner outside the consortium are indispensable if the arrangement is to work out satisfactorily for all partners.

SpeechDat (M), SpeechDat (II) and SpeechDat Car were the first large corpus productions based on this sharing model; others may follow. See www.speechdat.org for details on the SpeechDat projects.


BITS Projekt-Account 2004-06-01