Signal File Formats


There are quite a number of more or less standardized signal file formats. This document concentrates on the formats most common in speech processing.
In most cases a signal file consists of a so-called header, which contains information about the signal (e.g. sampling frequency, sample type and width, machine format, number of channels), and a body, which contains the digitized signal samples.

RAW data
The simplest format: no header, only a body with the digitized signal. Disadvantage: you have to get the necessary specifications of the signal from elsewhere. SAM^4.7, for instance, uses raw signal files and stores the signal information in a separate label file with the same base name and a different file name extension.
Some corpora add an extension that 'defines' the specs of the contained data. For instance:
- .dea, .al, .la : A-law, 8 bits
- .deu, .ul, .lu : µ-law (u-law), 8 bits
- .raw, .pcm : anything is possible
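Because a raw file carries no specifications of its own, any reader has to be told them. The following Python sketch illustrates this under stated assumptions: the extension table simply mirrors the list above (corpora may deviate from it), and the function names guess_encoding and read_raw_pcm16 are hypothetical, chosen for this example only. The reader assumes 16-bit signed linear PCM, which the caller must know to be true from some external source.

```python
import os
import struct

# Extension conventions from the list above; a hint only, not a guarantee.
EXTENSION_HINTS = {
    ".dea": ("alaw", 8), ".al": ("alaw", 8), ".la": ("alaw", 8),
    ".deu": ("ulaw", 8), ".ul": ("ulaw", 8), ".lu": ("ulaw", 8),
}


def guess_encoding(filename):
    """Return (encoding, bits) for a known extension, or None if the
    extension says nothing definite (e.g. .raw or .pcm)."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_HINTS.get(ext)


def read_raw_pcm16(path, byte_order="<"):
    """Read a headerless file as 16-bit signed PCM samples.

    byte_order is '<' for little-endian or '>' for big-endian; it must
    come from outside the file, since a raw file does not record it.
    """
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack(f"{byte_order}{count}h", data[:count * 2]))
```

Reading the same file with the wrong byte order or sample width yields numbers without any error message, which is exactly the disadvantage described above.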
NIST SPHERE^4.8
The NIST SPHERE format was defined by the Speech Group at the National Institute of Standards and Technology, USA. It consists of a readable header in plain text (7 bit US ASCII) followed by the signal data in binary form. Because the format is simple yet extensible, it is widely used in the speech science community and in many speech corpora. Most scientific tools recognize NIST SPHERE automatically; many commercial tools do not. Big advantage: since the header information is in plain text, it is very easy to extract and insert values there (this is often a problem with binary headers). Disadvantage: the header occupies a fixed-size block at the start of the file (its length is stated in the header itself), so an edit that no longer fits into that block requires rewriting the entire file.
Common filename extensions: .nis or .nist
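To illustrate how easily values can be extracted from a plain-text header, here is a minimal Python sketch. It assumes the usual SPHERE layout: the first line reads NIST_1A, the second line gives the header size in bytes, and the following lines are "name -type value" triples (-i for integers, -r for reals, -sN for strings) up to a line reading end_head. The function names are hypothetical, and the sketch reads only the first 1024 bytes, so fields of a longer header would be missed.

```python
def parse_sphere_header(raw):
    """Parse the plain-text header found in the leading bytes of a file.

    Returns (header_size, fields), where fields maps names to values.
    """
    lines = raw.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(lines[1].strip())
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # -sN: string of length N, kept as text
            fields[name] = value
    return header_size, fields


def read_sphere_header(path):
    """Read and parse the header of the file at path (first 1024 bytes)."""
    with open(path, "rb") as f:
        return parse_sphere_header(f.read(1024))
```

The returned header size tells a reader where the binary signal data begins, so the body can be read by seeking to that offset.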
WAVE, RIFF^4.9
The WAVE file format is a subset of Microsoft's RIFF specification for the storage of multimedia files. A RIFF file starts with a file header followed by a sequence of data chunks. Advantage: most Windows-based tools understand this format (and often only this one). Disadvantage: the binary header is not easy to read or to manipulate.
Extensions: .wav
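Since the binary header is awkward to read by hand, it is usually left to a library. As a minimal sketch, the following uses the wave module from the Python standard library to report the header information of a WAVE file; the function name describe_wav and the dictionary keys are hypothetical. Note that this module handles uncompressed PCM files; other encodings may be rejected.

```python
import wave


def describe_wav(path):
    """Return the basic signal specifications stored in a WAVE header."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sampling_rate": w.getframerate(),
            "frames": w.getnframes(),
        }
```

The four values correspond to the header items named at the top of this section: number of channels, sample width, sampling frequency and the length of the signal.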
SHORTEN^4.10
Shorten is not a format but a compression algorithm developed by Tony Robinson. It exploits the redundancy in speech signals, which amounts to about 50%, to reduce the size of the data accordingly. The header is preserved if it is a standard header known to shorten or if you tell the algorithm how long the header part is.
Compressed speech files were a big hype in the late eighties but then storage media became so cheap that most people no longer see why it is necessary to go through all the hassle. Also, we found in the SpeechDat project that compression by gzip reaches almost the same reduction as shorten and has the advantage that most platforms can decompress data without any additional software installed.
However, if you think it might be a good idea to use compressed files and you want to use shorten, please inform Tony Robinson about it.
Extensions: .shn
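For the gzip alternative mentioned above, a minimal Python sketch using only the standard library might look as follows. The function name gzip_file is hypothetical, and the returned ratio is merely a measurement on the given file; it makes no claim about what reduction a particular speech corpus will reach.

```python
import gzip
import os
import shutil


def gzip_file(src, dst=None):
    """Compress src with gzip.

    Returns (dst, ratio), where ratio is compressed size divided by
    original size (1.0 for an empty input file).
    """
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    original = os.path.getsize(src)
    ratio = os.path.getsize(dst) / original if original else 1.0
    return dst, ratio
```

The result can be unpacked with any standard gzip tool, which is the portability advantage noted above.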


BITS Projekt-Account 2004-06-01