To create the dictionary, you will most likely work through some of the following steps, depending on the resources you have:
Define the orthographic representation for your corpus and
transliterate your data or render your text material accordingly *
Create a complete list of unique words. Watch out for capital
letters at the beginning of sentences [9.8] *
Define the desired contents of each entry in your
dictionary *
Use automatic procedures to create as much content as
possible, such as lookup in existing dictionaries,
text-to-phoneme converters, part-of-speech taggers,
etc. (pass 1) **
Verify the contents of pass 1 and/or create information
manually from scratch and produce a corrected version of
the dictionary (pass 2) *
If possible, let this be done by one person for the complete
dictionary **
Have a second person repeat the last step for the complete
dictionary (pass 3) **
Automatically find the differences between pass 1 and pass 2 or
between pass 1 and pass 3 where pass 2 and pass 3 are not
consistent, and discuss these inconsistencies with a group of
experts to come up with the final version of the dictionary **
Repeat the last four steps for all content types that need manual
labeling/verification *
Use a simple parser to ensure that the final dictionary is coded
properly. In particular, look out for inconsistent use of blanks and
tab characters. You may also check for homophones and homographs and
verify that they are really valid for your language.
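
The final consistency check can be illustrated with a small script. The following is only a minimal sketch under assumptions that are not part of the procedure above: each entry is taken to be one line of the form ORTHOGRAPHY<TAB>PRONUNCIATION, with the phonemic symbols separated by single blanks, and the function name check_dictionary is hypothetical. Here "homograph" means one spelling listed with more than one pronunciation, and "homophone" means one pronunciation shared by several spellings.

```python
from collections import defaultdict


def check_dictionary(lines):
    """Return (errors, homographs, homophones) for tab-separated entries.

    Assumed (hypothetical) format: ORTHOGRAPHY<TAB>PRONUNCIATION, with the
    symbols of the pronunciation separated by single blanks.
    """
    errors = []                   # (line number, message) pairs
    by_word = defaultdict(set)    # orthography -> set of pronunciations
    by_pron = defaultdict(set)    # pronunciation -> set of orthographies

    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            errors.append((number, "expected exactly one tab"))
            continue
        word, pron = fields
        if not word or not pron:
            errors.append((number, "empty field"))
            continue
        if word != word.strip() or pron != pron.strip():
            errors.append((number, "leading or trailing blank"))
            continue
        if " " in word or "  " in pron:
            errors.append((number, "unexpected blank"))
            continue
        by_word[word].add(pron)
        by_pron[pron].add(word)

    # Candidates only: each one still has to be judged by a human.
    homographs = {w: sorted(p) for w, p in by_word.items() if len(p) > 1}
    homophones = {p: sorted(w) for p, w in by_pron.items() if len(w) > 1}
    return errors, homographs, homophones
```

Calling check_dictionary on the lines of the dictionary file yields the malformed lines plus two candidate lists. The script only reports these candidates; whether a homophone or homograph is really valid for the language remains a manual decision.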
Sources for existing pronunciation dictionaries may be ELDA [9.9], the LDC [9.10], or the BAS [9.11].
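
The automatic comparison of the passes described in the procedure above can be sketched in a few lines. This is one possible reading of that step, not a prescribed implementation: each pass is assumed to be available as a mapping from word to entry content, the function name find_disagreements is hypothetical, and the pronunciations in the usage example are purely illustrative. The function returns every word on which pass 2 and pass 3 disagree, together with the pass 1 content, so that the expert group sees all three versions side by side.

```python
def find_disagreements(pass1, pass2, pass3):
    """Compare three dictionary passes given as {word: entry} mappings.

    Returns the words on which the two manual passes (pass 2 and pass 3)
    are not consistent, with all three versions of the entry.
    A missing entry is reported as None.
    """
    conflicts = {}
    all_words = set(pass1) | set(pass2) | set(pass3)
    for word in sorted(all_words):
        automatic = pass1.get(word)
        first = pass2.get(word)
        second = pass3.get(word)
        if first != second:
            conflicts[word] = {
                "pass1": automatic,
                "pass2": first,
                "pass3": second,
            }
    return conflicts


if __name__ == "__main__":
    # Illustrative data only; not taken from any real dictionary.
    p1 = {"read": "r i: d", "live": "l I v"}
    p2 = {"read": "r i: d", "live": "l aI v"}
    p3 = {"read": "r i: d", "live": "l I v"}
    for word, versions in find_disagreements(p1, p2, p3).items():
        print(word, versions)
```

Entries that both annotators corrected in the same way do not show up in the result, which keeps the list for the expert discussion short.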