

Compression / Compatibility

By using standard compression algorithms you may reduce the total amount of speech data to 55-65% of its original size, depending on the technical specifications of your speech signals [12.3]. There also exist special compression algorithms for speech [12.4], but we found that they do not yield significantly better compression rates than standard algorithms (like gzip, zip) when used in lossless mode [12.5].
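To check what compression would gain on your own signals, you can measure it directly. The following is only a minimal sketch in Python using the standard gzip module; the function names gzip_ratio and file_gzip_ratio are our own illustrative choices, not part of any existing tool, and the result will vary with sampling rate, quantisation and recording conditions.

```python
import gzip


def gzip_ratio(data: bytes, level: int = 9) -> float:
    """Return the gzip-compressed size as a fraction of the original size.

    A value of 0.60 means the compressed data takes 60% of the original space.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = gzip.compress(data, compresslevel=level)
    return len(compressed) / len(data)


def file_gzip_ratio(path: str, level: int = 9) -> float:
    """Read a whole file into memory and return its gzip ratio.

    Assumes the file is small enough to fit in memory (hypothetical helper).
    """
    with open(path, "rb") as handle:
        return gzip_ratio(handle.read(), level)
```

Calling file_gzip_ratio on a few representative signal files (for example a hypothetical recording.wav) should give a rough idea of whether your data falls in the range quoted above.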

However, we do not recommend using compression at all. Working with uncompressed data directly from the distribution medium is much more convenient, and the reduction in costs does not justify the additional effort. Furthermore, compressing the data on your distribution media increases the probability of software incompatibilities on the user side.

If you're using a well-established standard medium like CD-R, it's very unlikely that you'll run into hardware compatibility problems. Large tapes and magneto-optical media, on the other hand, may require special hardware to read.

Avoid special hardware whenever possible, to prevent trouble with obsolete or unsupported hardware in the future. The author has seen cases where a valuable speech corpus could no longer be loaded because it had been produced on special DEC magneto-optical disks. The data were actually lost as a result.


BITS Projekt-Account 2004-06-01