Abstract

The questionnaire for this study was conducted over the Internet in December 2003. Experts in German-speaking countries were selectively targeted with 34 questions to determine the future requirements for speech resources (over the next 5 to 15 years) and the (subsequent) technical developments. Thirty-seven experts answered the questions; six of the respondents did not finish the questionnaire. Nevertheless, the opinions of all respondents were used in the evaluation. The numbers in parentheses in the text state how many experts shared the same opinion. At the end of the study, an overview shows how many participants in total responded to each question. The main topics are divided into five chapters.

Main aspects of the study at a glance

Part I: Future speech resources

In the future, corpora with non-grammatical speech (30), corpora with speakers of all age groups (generation corpora) (23), multilingual resources (e.g. German with English loanwords) (18) and second-language resources (non-native speakers of German) (18) will be needed. For dialects, it is recommended that a speech resource include only dialect-flavored standard language, and only in representative amounts.

Part II: Organization of cooperation between speech technology institutions

Closer cooperation between institutions is desired. Even if institutions collaborate more in the future, individual product development will continue for competitive reasons.

Part III: Organization of the recording and distribution of resources

Integrated resources should capture the user's gestures, facial expressions and emotions in the recorded data.

Part IV: Technical speech applications of the future

The technical speech applications of the future will be speech dialog systems between humans and machines, and applications in information management. In the military field, the most important applications will be surveillance, translation, evaluation and procurement of spoken (and written) content to improve strategic procedures in the event of conflict.

Part V: Basic research

Basic research should target not only speech applications but also the protection of rare and endangered languages. However, application orientation should remain a central aspect of basic research so that time is not wasted on purely theoretical work. In the coming years, research will concentrate on planning long-term corpora and on human speech development.


Summary of the overall results from the study

Part I: Future speech resources

The demand for corpora with non-grammatical speech is clearly rated as particularly important, with 30 matching opinions. According to 23 experts, the marketability of corpora with speakers from different generations will continue to grow. Multilingual and second-language resources are also in high demand, with 18 votes each. Against the latter two, there is the objection that regular updates of lexicon entries would make additional corpora unnecessary, because changes in language development would be captured automatically.
The results for the recording of emotion and biometrics are less clear-cut. Twenty-one experts consider work on the emotional component of speech important. However, some see problems in recording emotion and in processing it objectively afterwards, for example during annotation. Other respondents consider this topic simply overrated. Biometrics is considered only partly important for the future. Seventeen respondents believe that demand will be high, although some of them only on condition that access security and robustness are guaranteed. At present, however, this is only possible in combination with other modalities and procedures.
Dialects should be recorded only to a limited extent, as dialect-flavored standard language in the respective field of application. Within a speech resource, dialect should be covered only representatively and should not be the focus. If separate dialect corpora are created, they can be put to use dynamically.
When collecting child speech data, children of different age groups should be recorded. The experts divide them into seven stages. For speech acquisition research, it is recommended to begin the first investigations no later than 9 months after birth. For computer applications, it is sufficient to start at 5 years of age. Application scenarios mentioned were leisure, medicine, science, and the teaching and learning of language (21). The required speech material varies widely; it is described in detail in the main text.

Part II: Organization of cooperation between speech technology institutions


All questionnaire participants want closer cooperation between institutions, because cooperation would strengthen the entire speech technology infrastructure and raise the quality of the resources. The model of joint resource creation combined with individual product development will remain the standard in the future for competitive reasons. Furthermore, scientific institutions should be favored over companies by exempting them from license fees, provided that the resource is used for scientific purposes. Resources financed by public authorities should serve the general public and should therefore also be generally accessible.
Funding for the creation of speech corpora should come from both government and the private sector, although only the state, unlike companies, will be willing to support research purposes that are economically unattractive. The costs of creating resources can be recovered through marketing, but only in part. Legally, the protection of the speaker must be guaranteed before recording, and other safeguards must be put in writing.
General demand for mobile services, the desire to promote European unification and improvements in sensor and computer engineering certainly benefit the development of speech technology. On the other hand, the current economic crisis in Germany and the lack of fully developed systems on the market dampen demand.

Part III: Organization of the recording and distribution of speech resources


Integrated resources should capture the user's gestures, facial expressions and emotions, and include labels for them. In addition, precise metadata about the speaker should always be available, and the recording scenario should be described in detail.
Both telephone/cell phone recordings and studio recordings have their own justification. Studio recordings are better suited for speech synthesis and at the same time allow telephone quality to be simulated synthetically.
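To illustrate how telephone quality might be simulated from studio material, the following Python sketch band-limits a signal with a windowed-sinc FIR band-pass filter and then decimates it. It assumes a classic narrowband telephone channel of roughly 300 to 3400 Hz sampled at 8 kHz, and a studio sampling rate that is an integer multiple of 8 kHz (16 kHz by default). The function names are hypothetical, and codec, noise and transmission-channel effects are deliberately left out.

```python
import math


def windowed_sinc_lowpass(cutoff_hz, fs, taps=101):
    """Hamming-windowed sinc low-pass FIR coefficients, normalized to unity DC gain."""
    m = taps - 1
    fc = cutoff_hz / fs  # cutoff as a fraction of the sampling rate
    coeffs = []
    for n in range(taps):
        k = n - m / 2
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        coeffs.append(ideal * window)
    total = sum(coeffs)
    return [c / total for c in coeffs]


def simulate_telephone_band(samples, fs=16000, low_hz=300.0, high_hz=3400.0, taps=101):
    """Band-limit a studio signal to low_hz..high_hz and decimate it to 8 kHz (sketch only)."""
    if fs % 8000 != 0:
        raise ValueError("fs must be an integer multiple of 8000 Hz in this sketch")
    # Band-pass = low-pass(high_hz) minus low-pass(low_hz); its DC gain is zero.
    high = windowed_sinc_lowpass(high_hz, fs, taps)
    low = windowed_sinc_lowpass(low_hz, fs, taps)
    bandpass = [h - l for h, l in zip(high, low)]
    half = taps // 2
    filtered = []
    for i in range(len(samples)):
        acc = 0.0
        # Centered direct-form convolution; samples beyond the edges count as zero.
        for j, c in enumerate(bandpass):
            idx = i + half - j
            if 0 <= idx < len(samples):
                acc += c * samples[idx]
        filtered.append(acc)
    # Keep every (fs // 8000)-th sample to reach 8 kHz.
    return filtered[:: fs // 8000]
```

For example, a 0.1 s, 1 kHz tone sampled at 16 kHz (1600 samples) comes back as 800 samples at 8 kHz with essentially unchanged amplitude, while a constant (DC) offset is removed away from the signal edges. Because the upper cutoff lies below the 4 kHz Nyquist limit of the target rate, the filter also serves as the anti-aliasing step before decimation.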
Pronunciation models based on explicit pronunciation rules (lexica) and statistical models go hand in hand and should be combined in order to make the best use of the advantages of both approaches.
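One simple way to combine the two approaches is a lexicon lookup with a statistical fallback for words the lexicon does not cover. The Python sketch below assumes a tiny hypothetical lexicon in SAMPA-like notation and replaces a real statistical grapheme-to-phoneme model with a deliberately trivial one (the most frequent phoneme per letter, counted from one-to-one aligned training pairs). All names and entries are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical explicit pronunciation lexicon (word -> phoneme list); illustrative entries only.
LEXICON = {
    "hallo": ["h", "a", "l", "o:"],
    "sprache": ["S", "p", "r", "a:", "x", "@"],
}


def train_letter_model(aligned_pairs):
    """Toy statistical model: the most frequent phoneme for each letter.

    Assumes each training word is aligned one-to-one with its phoneme list.
    """
    counts = defaultdict(Counter)
    for letters, phonemes in aligned_pairs:
        for letter, phoneme in zip(letters, phonemes):
            counts[letter][phoneme] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def pronounce(word, lexicon, letter_model):
    """Prefer the explicit lexicon entry; fall back to the statistical model otherwise."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key], "lexicon"
    # Letters the model has never seen are passed through unchanged.
    return [letter_model.get(ch, ch) for ch in key], "statistical"


model = train_letter_model([
    ("mama", ["m", "a", "m", "a"]),
    ("oma", ["o:", "m", "a"]),
    ("tal", ["t", "a:", "l"]),
])
print(pronounce("Hallo", LEXICON, model))   # (['h', 'a', 'l', 'o:'], 'lexicon')
print(pronounce("Tomate", LEXICON, model))  # (['t', 'o:', 'm', 'a', 't', 'e'], 'statistical')
```

Keeping the lexicon authoritative preserves hand-checked pronunciations, while the statistical fallback covers new words; a real system would use a properly trained grapheme-to-phoneme model rather than per-letter counts.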

Part IV: Technical speech applications of the future


Above all other areas, speech dialog systems between humans and machines and innovations in information management will undoubtedly be the speech applications of the future.
In medicine, the most important applications are in therapy and training, as well as artificial speech support that helps people with speech and language impairments express themselves.
In the future, biometrics will remain limited to personalization and will not be used to identify a driver's physical condition.
In the military field, speech technology will mainly be used for surveillance, translation, evaluation and the conveyance of spoken messages, but it can also be applied to written material, in order to improve strategic procedures in the event of conflict.
Both new developments and modifications of existing products will be carried out in the future. The choice between them is made case by case and depends on already existing products and the precise needs of the user. Whether products such as dialog systems are newly developed or modified will depend on customer acceptance.

Part V: Basic research


The creation of new corpora should also include the protection of rare and endangered languages and should not be driven purely by applications. According to the survey, however, user orientation is an important basis for basic research. Rare languages can certainly have a "market value".
Outside Europe, the languages most worth protecting are those of small language groups in the former USSR, as well as African, Indian and South American languages. Within Europe, German dialects and Romani languages need to be preserved.
Corpora will soon have specific characteristics: naturalness, multimodality, multilinguality and portability. Above all, easily available data streams such as TV and radio will also be used for corpora.
For the construction of corpora, special higher-level coordinating institutions should be in charge, and project applications should be made jointly across institutions. Research planning will focus above all on building long-term corpora and on making progress toward unified standards.
Further research on human speech development will be an important field in the future.
The body responsible for conducting basic research should be the state at the national level (in the form of the BMBF and DFG) and the EU at the international level, through specific initiatives. State institutions should in particular control the evaluations and handle the finances, while universities and other research institutions should be responsible for carrying out the research. In the long term, industry will be indispensable as a sponsor, which means that it will also have a say in decision-making.