g2p.pl -v

g2p.pl -task apply -lng myLanguage -i myInputFile -iform myInputFormat \
       -o myOutputFile -oform myOutputFormat [-featset myFeatureSet] \
       [-syl mySyl] [-stress myStress] [-align myAlign] \
       [-map myMap] [-c myConfigFile] [-tgitem myTgItem] [-tgrate myTgRate] \
       [-ensemble myEnsemble] [-embed myEmbed]

g2p.pl -task train -lng myLanguage -c myConfigFile [-featset myFeatureSet]

-v: displays current version

myTask: train, apply

myLanguage: aus, deu, ekk, eng, fra, ita, kat, hun, nld, nze, pol, ron, slk, sqi, use
    Codes correspond to ISO 639-3 wherever available (exceptions: aus, nze,
    and use for Australian, New Zealand, and American English, respectively).
    Alternatively, RFC 5646 codes are accepted
    (e.g. eng-GB, eng-AU, eng-NZ, eng-US).

myInputFormat:
    txt  - plain text
    bpf  - BAS Partitur Format
    list - word list (different POS tagging strategy)
    tcf  - TCF format; element is obligatory
    tg   - TextGrid; long and short format supported

myOutputFormat:
    txt     - t r a n s c r i p t i o n t r a n s c r i p ...
    tab     - word;t r a n s c r i p t i o n
    exttab  - word;t r a n s c r i p t i o n;partOfSpeech;m o r p h s;m o r p h c l a s s e s
              Currently only supported for lng=deu|eng.
    bpf     - BAS Partitur Format; contains ORT and KAN tiers,
              no blanks between phonemes
    bpfs    - BAS Partitur Format; contains ORT and KAN tiers,
              blanks between phonemes
    extbpf  - BAS Partitur Format; contains ORT, KAN, POS, KSS, TRL, MRP tiers.
              KSS: full transcription
              TRL: ORT + punctuation
              MRP: m o r p h s;m o r p h c l a s s e s
    extbpfs - BAS Partitur Format. As extbpf, with blanks between phonemes
    lex     - word;t r a n s c r i p t i o n
              Words are unique and alphanumerically sorted.
    extlex  - output as for exttab, but unique and sorted.
              Currently only supported for lng=aus|deu|eng|nze|use.
    tcf     - TCF format with added transcriptions (for non-TCF input,
              elements and are generated from scratch)
    exttcf  - additional output of part of speech, morphs, and morph classes.
              Currently only supported for lng=aus|deu|eng|nze|use.
    tg      - TextGrid. If iform is tg, the same format (long or short) is
              returned. An item bas_trs is added that contains the
              transcription for each interval in the item myTgItem, which is
              to be specified via -tgitem myTgItem. Requires iform 'tg'.
    exttg   - extended TextGrid. If iform is tg, the same format (long or
              short) is returned. Items bas_trs, bas_pos, bas_m, and bas_mc
              are added that contain transcription, part of speech,
              morphemes, and morpheme classes, respectively, for each
              interval in the item myTgItem, which is to be specified via
              -tgitem myTgItem. Requires iform 'tg'.
              Currently only supported for lng=aus|deu|eng|nze|use.

myFeatureSet: feature set used for grapheme-phoneme conversion and for
    word stress determination
    extended (currently only supported for lng=deu|eng)

mySyl: syllabification
    yes|no

myStress: word stress assignment
    yes|no

myAlign: letter alignment of phonemes
    yes|no|maus
    If switched on, the grapheme string is also split letter-wise
    (multi-character letters are kept). No grapheme split for tcf input
    (original token content is kept).

myMap: from_to (e.g. deu_eng) - maps G2P output from the SAMPA inventory of
    lngX to that of lngY. lngY includes 'sampa', which is the MAUS general
    SAMPA inventory. Currently supported for:
        pol_sampa
        hun_sampa
        deu_maus (SAMPA subset for German MAUS, without /6/-diphthongs)
        hun_maus (SAMPA subset for Hungarian MAUS)
        {aus|deu|ekk|eng|fra|ita|kat|hun|nld|nze|pol|ron|slk|sqi|use}_ipa
        use_arpabet
    Instead of the 3-letter language code, an RFC 5646 code can be used,
    e.g. eng-US_arpabet.
    REMARK: if a bpf(s) output file is to be created as MAUS input, set the
    macro parameter -embed to 'maus'!

myConfigFile: default: language-dependent default config files are provided.

myTgItem: needed for iform=tg. Name of the TextGrid item from which the text
    is to be extracted.

myTgRate: needed for the combination of iform=tg and oform=bpf(s).
    Sample rate (in samples per second), so that time values of the TextGrid
    can be converted to sample values in the bpf.

myEnsemble:
    : sequential classification reducing the grapheme context until a
      mapping is found
    forest: parallel classification with maximum votes decision

myEmbed: |maus
    Macro parameter for task=apply. If set to 'maus', then
        -{syl|stress}: set to 'no'
        -align: if not 'no', set to 'maus'
        -map: set to 'myLanguage_maus'
        -oform: if not 'bpf(s)', set to 'bpf'

Uwe Reichel, IPS, 20141006
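
The option rewriting triggered by -embed maus can be illustrated with a small Python sketch. It is not taken from g2p.pl itself; the function name apply_embed_maus and the dictionary representation of the options are hypothetical, and the sketch assumes that an unset -align behaves like 'no'. It only mirrors the four rules listed under myEmbed.

```python
def apply_embed_maus(opts):
    """Return a copy of opts with the '-embed maus' overrides applied.

    opts maps option names without the leading dash (e.g. 'lng', 'oform')
    to their string values. This layout is an assumption for illustration.
    """
    resolved = dict(opts)
    if resolved.get("embed") != "maus":
        # The macro only acts when -embed is set to 'maus'.
        return resolved

    # -{syl|stress} set to 'no'
    resolved["syl"] = "no"
    resolved["stress"] = "no"

    # -align: if not 'no', set to 'maus'
    # (assumption: a missing -align is treated like 'no')
    if resolved.get("align", "no") != "no":
        resolved["align"] = "maus"

    # -map set to 'myLanguage_maus'
    resolved["map"] = f"{resolved['lng']}_maus"

    # -oform: if not 'bpf(s)', set to 'bpf'
    if resolved.get("oform") not in ("bpf", "bpfs"):
        resolved["oform"] = "bpf"

    return resolved


if __name__ == "__main__":
    user_opts = {"lng": "deu", "embed": "maus", "syl": "yes",
                 "align": "yes", "oform": "tab"}
    print(apply_embed_maus(user_opts))
```

Under these assumptions, a call with -lng deu -oform tab -align yes -embed maus ends up with syl and stress 'no', align 'maus', map 'deu_maus', and oform 'bpf', while an oform of 'bpfs' would be left untouched.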