Signal File Formats
There exist quite a number of more or less standardized signal file
formats. In this document we will concentrate on the most common
formats in speech processing.
In most cases a signal file format consists of a so-called header,
which contains information on the signal (e.g. sampling frequency,
sample type and width, machine format, number of channels etc.), and a body,
which contains the digitized signal samples.
- RAW data
The simplest format: no header, only body with the digitized signal.
Disadvantage: you have to get the necessary specifications of the signal
from elsewhere. SAM, for instance,
uses raw signal files and stores the
signal information in a separate label file with the same base name and a different
file name extension.
Some corpora add an extension that 'defines' the specs of the contained
data. For instance:
- .dea, .al, .la : ALAW, 8 bits
- .deu, .ul, .lu : ULAW, 8 bits
- .raw, .pcm : anything is possible (no fixed convention)
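Because a raw file carries no header, any program reading it has to be told the sample format explicitly. The following is a minimal Python sketch, assuming 16 bit signed linear PCM with interleaved channels. The function name read_raw_pcm16 and its parameters are made up for illustration and do not come from SAM or any other tool mentioned here.

```python
import struct

def read_raw_pcm16(path, channels=1, little_endian=True):
    """Read headerless 16 bit signed PCM; every spec is supplied by the caller."""
    with open(path, "rb") as f:
        data = f.read()
    frame_size = 2 * channels          # bytes per sample frame (assumption: 16 bit)
    if len(data) % frame_size != 0:
        raise ValueError("file size does not fit 16 bit samples x channels")
    count = len(data) // 2
    # '<' = little endian, '>' = big endian; 'h' = signed 16 bit integer
    fmt = ("<" if little_endian else ">") + str(count) + "h"
    samples = struct.unpack(fmt, data)
    # De-interleave: channel c holds every channels-th sample starting at c
    return [list(samples[c::channels]) for c in range(channels)]
```

If the byte order or the channel count is guessed wrongly, the call still succeeds but returns noise, which is exactly the disadvantage of raw data described above.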
- NIST SPHERE
The NIST SPHERE format was defined by the Speech Group at the National
Institute of Standards and Technology, USA. It consists of a readable
header in plain text (7 bit US ASCII) followed by the signal data in binary form. Because of
its simple but nevertheless extensible format it is widely used in the
speech science community and in many speech corpora.
Most scientific
tools recognize NIST SPHERE automatically; commercial tools often do
not. Big advantage: since the header information is in plain text, it is very
easy to extract and insert values there (this is often a problem with binary
headers). Disadvantage: the header is padded to a fixed size that is stated in its
second line (typically 1024 bytes), so values can be edited in place only as long as
the header still fits; if it outgrows that size, the entire file has to be rewritten.
Common filename extensions: .nis or .nist
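To illustrate how easily a plain-text header can be read, here is a minimal Python sketch of a SPHERE header parser. It assumes the usual layout: a first line NIST_1A, a second line giving the header size in bytes, then lines of the form "name -type value" (-i integer, -r real, -sN string), terminated by end_head. The function name parse_sphere_header is made up for illustration. The sketch does no error recovery.

```python
def parse_sphere_header(raw):
    """Parse the header from the leading bytes of a NIST SPHERE file.

    Returns (header_size, fields). 'raw' must contain at least the header.
    """
    if raw[:7] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # Second line: header size in bytes, right-aligned in an 8 byte line
    header_size = int(raw[8:16].decode("ascii").strip())
    text = raw[16:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):   # skip blanks and comments
            continue
        name, type_code, value = line.split(None, 2)
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:                                   # "-sN": string of N bytes
            fields[name] = value
    return header_size, fields
```

The returned header size is also the offset at which the binary signal data starts, so the body can be read with a single seek.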
- WAVE, RIFF
The WAVE file format is a subset of Microsoft's RIFF specification for
the storage of multimedia files. A RIFF file starts out with a file
header followed by a sequence of data chunks. Advantage: most Windows-based
tools understand this format (and often only this one). Disadvantage: the
binary header is not easy to read or to manipulate.
Extensions: .wav
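The chunk structure can be made visible with a few lines of Python. The sketch below assumes the standard RIFF layout: a 12 byte file header (the tag RIFF, a little-endian 32 bit size, and a form type such as WAVE), followed by chunks that each consist of a 4 byte tag, a little-endian 32 bit length and the chunk data, padded to an even length. The function name list_riff_chunks is made up for illustration.

```python
import struct

def list_riff_chunks(raw):
    """Return (form_type, [(chunk_id, size), ...]) for the bytes of a RIFF file."""
    if len(raw) < 12 or raw[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    form_type = raw[8:12].decode("ascii", errors="replace")
    chunks = []
    pos = 12                                   # first chunk follows the file header
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack("<4sI", raw[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii", errors="replace"), size))
        pos += 8 + size + (size & 1)           # chunk data is padded to even length
    return form_type, chunks
```

For a plain PCM file one would expect at least a "fmt " chunk (the signal specs) and a "data" chunk (the samples). To actually read the specs, Python's standard wave module (getnchannels, getsampwidth, getframerate) is the more convenient route for uncompressed PCM files.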
- SHORTEN
Shorten is not a format but a compression algorithm developed by Tony
Robinson. It exploits the redundancy of about 50% in speech signals to
compress the data accordingly. The header is preserved if it is a standard
header known to shorten or if you tell the algorithm how long the header
part is.
Compressed speech files were a big hype in the nineties but then
storage media became so cheap that most people no longer see why
it is necessary to go
through all the hassle. Also, we found in the SpeechDat project that
compression by gzip reaches almost the same reduction as shorten
and has the advantage that most platforms can decompress data without any
additional software installed.
However, if you think it might be a good idea to
use compressed files and you want to use shorten, please inform Tony
Robinson about it.
Extensions: .shn
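Whether gzip is good enough for a particular corpus is easy to check on one's own data. The following Python sketch only measures the gzip side, using the standard gzip module. It does not reproduce the SpeechDat comparison, and the function name gzip_ratio is made up for illustration.

```python
import gzip

def gzip_ratio(data, level=9):
    """Return compressed size divided by original size for a byte string."""
    if not data:
        raise ValueError("empty input")
    return len(gzip.compress(data, compresslevel=level)) / len(data)
```

Applied to the bytes of a signal file, a result of 0.6 would mean that the gzipped file takes 60% of the original space. Comparing this with shorten requires running shorten itself on the same file.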
BITS Projekt-Account
2004-06-01