The spoken content of a speech corpus is the second major feature that determines how the resource can be used. This feature is not fully independent of other specifications, such as the speaking style. There are four main approaches to defining the spoken content of a corpus: by vocabulary, by domain, by task, or by phonological distribution. In some cases these approaches are combined.


