The SmartKom Speech Corpus is a special case of corpus production within a
scientific project. Because the outcome of the overall project cannot be
defined in detail at the beginning, the specifications for the corpus
production tend to be imprecise and open-ended. However, this can also be
considered an advantage, because the corpus production can then be adapted
to the needs of the project partners.
This kind of corpus production faces three major problems:
- Ideally, the corpus production should start before the other partners
begin their work. That way, the necessary data are available when they are
needed rather than only at the end of the project. In most cases, however,
this is not possible, both because of the funding structure and because it
is almost impossible to define beforehand exactly which type of data will
be needed.
- A data collection that adapts to the progress of a scientific project
tends to yield many different and inconsistent data types. For example, if
the project needs an evaluation of specific modules and the data collection
provides highly specialized data for this purpose, these data may not be
easy to integrate into a single, uniform corpus. All differing data types
must therefore be documented in great detail to ensure that the corpus can
be reused in the future.
- In most cases, the funding for a scientific corpus production ends at the
same time as the scientific work. This is a problem because data are
produced up to the very last minute and are then not properly integrated
into the corpus. The solution is to arrange for a third party outside the
project to take care of the corpus after the scientific project has ended.
This institution must be funded independently of the project and must take
responsibility for the data over a longer period. In the case of SmartKom,
the BAS took over the data after the SmartKom project was finished.
BITS Projekt-Account
2004-06-01