Next: Meta Data
Up: Legal Aspects, Contracts
Previous: BAS
Contents
The cost of producing a speech corpus ranges from EUR 20,000 for a small
monolingual read-speech corpus to several million EUR for a large
multilingual, multimodal Wizard-of-Oz (WOZ) corpus.
In almost all cases it makes sense to share these corpora.
- Small corpora are often highly innovative - sharing them
after a period of exclusive use generates revenue for the owner without
compromising the owner's competitive advantage.
- Large corpora are often too expensive for a single institution
to produce - a common specification, a distributed collection effort,
and a one-to-one exchange of corpus data help to reduce the cost
for each partner.
- In general, the value of a corpus multiplies with the number
of contexts (e.g. languages or recording environments) for
which it is available.
For the production of a shared corpus, the obvious organizational form is
collaboration: the partners form a consortium with the aim of creating
a shared speech corpus, e.g. a multi-language corpus. Each partner is
responsible for one part of the corpus, e.g. its own language, and in the
end all parts are exchanged freely within the consortium. A very careful
corpus design and strict monitoring by an independent partner outside the
consortium are indispensable if the arrangement is to work out
satisfactorily for all partners.
SpeechDat (M), SpeechDat (II) and SpeechDat Car were the first large
corpus productions based on this sharing model; others might follow. See
www.speechdat.org for details about the SpeechDat projects.
BITS Projekt-Account
2004-06-01