Telephone Recordings
For telephone recordings you need:
- An ISDN telephone account.
- Hardware that allows you to handle and record phone calls
(nowadays this will be an ISDN interface of some kind).
- A software library or DLL to access your hardware.
- A control program that allows you to model the recording session,
normally a simple chain of played back instructions from the server
and recordings of the speech of the calling speaker.
- Speech prompts recorded by a clear, easy-to-understand voice
(you may need a studio-like environment for this, or you can order them
from a supplier).
- A good `beep' for the prompting (suitable beep sounds are easy to find
on the Internet).
- Finally, the `script' itself, describing the session.
Unfortunately, no public-domain, ready-to-use software package
for setting up an ISDN speech recording server
is available. You could of course
buy a professional VoiceXML engine plus ISDN hardware, but
in most cases the investment is not justified. If you are lucky enough
to own a VoiceXML engine, simply design your recording session
as a VoiceXML document and run it on your hardware.
Here are some useful hints
if you are going to design your own server:
- If you are planning a single corpus recording, do not try to
develop a complete VoiceXML machine. Although some very
powerful tools for handling XML are available (especially for Java),
the effort is probably too high.
- Most manufacturers of low-cost ISDN cards provide libraries for the
API to their hardware. In most cases these APIs are compatible with the
Common ISDN API (CAPI). Also, they might provide some
demo applications for their cards that can easily be adapted to your
needs.
- Prompt the start of the recording by playing a short `beep' sound file.
- A silence detector during the recording can be used to avoid
empty recordings.
- Most ISDN interfaces will allow you to detect DTMF tones sent from
the calling phone. Make use of this capability to give the caller
better control during the recording session. For instance, the caller
might
- skip instructions that are already known
- request a help message
- repeat bad recordings
- A speech detector may be used to shorten the overall session time
by adjusting the individual recording times to the actual length of the
input.
However, a speech detector will not work reliably in
all situations, e.g. with loud environmental noise or with the technical noise that is
common in mobile phone connections.
If you plan to use an automatic speech detector, try to keep its
configuration simple, with at most 2 to 3 parameters to adjust (footnote 5.2). Then keep
at least half a second of signal before and after the detected speech in your recorded sound file.
- If you have to use a fixed recording interval for each speech
item, test your system with an extremely slow speaker.
The recording length has to be adjusted individually for every prompt. If
you simply set one very long fixed recording length for all recordings, the
speakers will have to wait a (subjectively) very long time between prompts and
will start to make other noises or even mutter to themselves.
- The raw data provided by the telephone company and recorded by your
ISDN card will be encoded either in ULAW (North America, Japan) or in ALAW
(Europe and most other regions) (footnote 5.3).
- Low-cost ISDN cards do not provide
`echo-canceling' (footnote 5.4). Thus the prompt beep
might be audible in the recording.
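The hints on raw data, empty recordings and padding can be combined in a small sketch. The following Python code is only an illustration under stated assumptions: the recording is available as raw ALAW bytes at 8000 samples per second, and the detector is a plain frame-energy threshold. The function names, the 20 ms frame length and the default threshold are hypothetical choices and do not belong to any ISDN library; the threshold in particular would have to be tuned on real calls.

```python
SAMPLE_RATE = 8000  # assumed: ISDN telephone speech, 8000 samples per second


def alaw_to_linear(byte):
    """Decode one G.711 ALAW byte to a signed linear sample."""
    byte ^= 0x55                    # undo the bit inversion applied by the encoder
    value = (byte & 0x0F) << 4      # 4 bit mantissa
    segment = (byte & 0x70) >> 4    # 3 bit segment number
    if segment == 0:
        value += 8
    else:
        value = (value + 0x108) << (segment - 1)
    return value if byte & 0x80 else -value


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_speech(samples, threshold=1.0e5, frame_ms=20, pad_s=0.5):
    """Return (start, end) sample indices of the detected speech plus padding.

    Returns None if no frame exceeds the threshold, i.e. the recording
    is treated as empty. Adjustable parameters: threshold, frame_ms, pad_s.
    """
    frame_len = SAMPLE_RATE * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    pad = int(pad_s * SAMPLE_RATE)  # half a second before and after by default
    start = max(0, active[0] * frame_len - pad)
    end = min(len(samples), (active[-1] + 1) * frame_len + pad)
    return start, end
```

A control program could decode a finished recording with `[alaw_to_linear(b) for b in raw_bytes]`, call `find_speech` on the result, and ask the caller to repeat the item when it returns None. Because a pure energy threshold reacts to any loud noise, this kind of detector shares the reliability limits mentioned above.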
To design your recording session you will need a number of pre-recorded
sound files for instructions, greetings, help messages, prompting etc.
Here are some hints for the production of these sound files:
- If you do not have the proper equipment and a studio, consider
ordering these sound files from a professional studio. Mention that you will
need the sound files for playback over the phone, so that the studio can add some
compression to the signals, which makes the speech much easier to
understand.
- Use a voice that is clear and easy to understand for your pre-recorded prompts
and instructions.
- Carefully remove any DC component from the prompt sound files; it
might cause a strong clicking noise when played in your telephone server.
- Adjust your recording level so as to avoid clipping the sound file.
Clipping (footnote 5.5) in
the original signal tends to be much more audible in the compressed form.
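Two of the hints above, removing the DC component and avoiding clipping, can be checked automatically before a prompt file goes into the server. The sketch below assumes the prompt has already been read into a list of signed 16 bit samples; the function names and the `min_run` parameter are hypothetical. Subtracting the mean only removes a constant offset; a drifting offset would need a high-pass filter instead.

```python
def remove_dc(samples):
    """Subtract the mean value so that the signal is centred on zero.

    Note: the result is not limited to the 16 bit range again; a signal
    that is close to full scale should be attenuated first.
    """
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [int(round(s - mean)) for s in samples]


def count_clipped(samples, full_scale=32767, min_run=3):
    """Count samples in runs of at least min_run consecutive full-scale values.

    Isolated full-scale samples are ignored; several in a row are a
    common sign of clipping.
    """
    clipped = 0
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
        else:
            if run >= min_run:
                clipped += run
            run = 0
    if run >= min_run:  # a run that reaches the end of the file
        clipped += run
    return clipped
```

A prompt for which `count_clipped` returns a value above zero is a candidate for re-recording at a lower level.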
Finally, here are some design hints for the `script' itself:
- At the beginning of the session you should clearly explain what is
going to happen and what the purpose of the data collection is. If
possible, insert a mechanism that allows the caller to skip these
messages by pressing a button.
- Insert informational voice messages into your script, e.g.
You have now completed more than half of the recording.
...
Please remember to speak only after the beep.
- Clarify the legal aspects of the speech recording. For instance, it
is a good idea to include a sound file like the following at the very
beginning of the script:
The recordings of your voice during this call will be used for the
development of future speech recognition techniques. For this purpose
your voice recordings will be distributed anonymously to scientists and
developers. If you do not agree to that, please hang up now.
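One way to keep these script hints manageable is to describe the session as plain data and let the control program walk through it step by step. The sketch below is a hypothetical layout, not an existing script format: the step kinds (`play`, `record`), the `skippable` flag and all file names are invented for illustration. It places the legal notice first, marks the introduction as skippable, gives every item its own recording length, and inserts the progress message once more than half of the items are done.

```python
def build_script(items):
    """Build the session as a list of steps (plain dictionaries).

    items: list of (prompt_file, max_seconds) pairs, one per speech item.
    """
    script = [
        # Legal notice at the very beginning; the caller cannot skip it.
        {"kind": "play", "file": "consent.wav", "skippable": False},
        # Explanation of the session; may be skipped by pressing a button.
        {"kind": "play", "file": "intro.wav", "skippable": True},
    ]
    total = len(items)
    progress_after = total // 2 + 1  # first item count that is more than half
    for number, (prompt_file, max_seconds) in enumerate(items, start=1):
        script.append({"kind": "play", "file": prompt_file, "skippable": True})
        script.append({"kind": "play", "file": "beep.wav", "skippable": False})
        script.append({
            "kind": "record",
            "file": f"item{number:03d}.raw",
            "max_seconds": max_seconds,  # individual length for this item
        })
        # Progress message, but not directly after the last item.
        if number == progress_after and number < total:
            script.append(
                {"kind": "play", "file": "halfway.wav", "skippable": True}
            )
    return script
```

Keeping the script as data means that prompts, recording lengths and informational messages can be changed without touching the code that drives the ISDN hardware.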
BITS Projekt-Account
2004-06-01