Even if you are not using a re-annotation technique, we recommend that the validators create a copy of each validated transcript or annotation file and mark the errors found in it in a way that allows the errors to be extracted automatically later. For example, suppose the following is a piece of phonemic segmentation from the corpus:
  SAP: 2343 16574 h
  SAP: 18917 9780 OY
  SAP: 28697 2376 d
  SAP: 31073 3289 @

If the validator checking these data decides that the phoneme category /d/ is wrong, he adds a specially marked line to his copy of the annotation file:
  SAP: 2343 16574 h
  SAP: 18917 9780 OY
  SAP: 28697 2376 d
  ANN_ERROR: SAP: 28697 2376 t
  SAP: 31073 3289 @

This way the validator can provide detailed information about the errors to the client / producer, which is often required in the validation contract.
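The automatic extraction mentioned above can be very simple if the validators stick to the ANN_ERROR convention. The following sketch assumes the convention shown in the example, namely that each ANN_ERROR line directly follows the label it corrects; the function name and return format are illustrative, not part of any standard tool.

```python
def extract_errors(lines):
    """Collect every ANN_ERROR line together with the label it corrects.

    Assumes the convention that an ANN_ERROR line directly follows the
    erroneous label line. Returns (original_label, correction) pairs.
    """
    errors = []
    previous = None
    for line in lines:
        line = line.strip()
        if line.startswith("ANN_ERROR:"):
            correction = line[len("ANN_ERROR:"):].strip()
            errors.append((previous, correction))
        elif line:
            previous = line
    return errors

# The annotated copy from the example above:
annotated = [
    "SAP: 2343 16574 h",
    "SAP: 18917 9780 OY",
    "SAP: 28697 2376 d",
    "ANN_ERROR: SAP: 28697 2376 t",
    "SAP: 31073 3289 @",
]
print(extract_errors(annotated))
```

Running this over all annotated copies yields an error list per file that can be handed to the client / producer without any manual collation.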
Only employ validators who are native speakers of the corpus language. If you are working with a group of validators, try to achieve the same level of expertise across all of them. For instance, if you are validating the phonemic segmentation of speech signals, hire only well-trained phoneticians and have them take part in a dedicated training session to make sure everybody has the same conception of the potential errors found in the data.
Define an error scheme for each type of annotation, i.e. a closed set of error types together with their descriptions and examples. Test the scheme on a small-scale data set before the whole group of validators starts working.
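Encoding the error scheme in machine-readable form keeps the set genuinely closed: a validator's tooling can reject any error type that is not in the scheme. The codes and descriptions below are invented examples for phonemic segmentation, not taken from an existing standard.

```python
# Hypothetical error scheme for phonemic segmentation validation.
# Each error type pairs a code with a description and an example.
ERROR_SCHEME = {
    "WRONG_LABEL": "The phoneme category is wrong, e.g. /d/ instead of /t/.",
    "BOUNDARY": "A segment boundary deviates clearly from the signal.",
    "MISSING_SEG": "A phoneme realised in the signal has no segment.",
    "SPURIOUS_SEG": "A segment has no corresponding event in the signal.",
}

def check_error_code(code):
    """Enforce the closed set: reject any code outside the scheme."""
    if code not in ERROR_SCHEME:
        raise ValueError(f"Unknown error type: {code}")
    return code
```

During the small-scale test phase, unknown-code errors raised by such a check are a useful signal that the scheme itself is incomplete and needs another error type.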
For larger validation groups, use a database system to keep track of which data have already been validated. Use some kind of server/client architecture to automatically deal out data that have not yet been validated and to collect the results. A simple and very effective tool for this is WWWTranscribe. See appendix B for a short description of WWWTranscribe and how to obtain it.
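The bookkeeping such a server has to do is modest. The following sketch shows one way to track assignment and results with an SQLite table; the table layout, file names, and function names are assumptions for illustration and have nothing to do with WWWTranscribe's internals.

```python
import sqlite3

def init_db(conn, items):
    """Register all annotation files that still need validation."""
    conn.execute("CREATE TABLE work (item TEXT PRIMARY KEY, "
                 "validator TEXT, result TEXT)")
    conn.executemany("INSERT INTO work (item) VALUES (?)",
                     [(i,) for i in items])

def deal_out(conn, validator):
    """Assign the next unassigned item to a validator, or None if done."""
    row = conn.execute("SELECT item FROM work WHERE validator IS NULL "
                       "ORDER BY item LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("UPDATE work SET validator = ? WHERE item = ?",
                 (validator, row[0]))
    return row[0]

def collect(conn, item, result):
    """Store the validation result for an item."""
    conn.execute("UPDATE work SET result = ? WHERE item = ?", (result, item))

# Example session with two hypothetical annotation files:
conn = sqlite3.connect(":memory:")
init_db(conn, ["sig001.par", "sig002.par"])
first = deal_out(conn, "validator_A")
collect(conn, first, "ok")
```

A real server would add authentication and concurrency control, but the deal-out / collect cycle is the core of the architecture described above.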