Inverse Text Normalization
Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline. ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.
For example, “in nineteen seventy five” -> “in 1975” and “it costs one hundred and twenty three dollars” -> “it costs $123”.
For more details, see the tutorial NeMo/tutorials/text_processing/Inverse_Text_Normalization.ipynb, which can be opened in Google Colab.
The base class for every grammar is
This tool is designed as a two-stage application: 1. classification of the input into semiotic tokens and 2. verbalization into written form.
For every stage and every semiotic token class there is a corresponding grammar, e.g.
Together, they compose the final grammars ClassifyFst and VerbalizeFinalFst, which are compiled into weighted finite-state transducers (WFSTs) and used for inference.
- class nemo_text_processing.inverse_text_normalization.en.ClassifyFst(*args: Any, **kwargs: Any)
- class nemo_text_processing.inverse_text_normalization.en.VerbalizeFinalFst(*args: Any, **kwargs: Any)
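The two-stage design can be illustrated with a small pure-Python sketch. It is only a toy model of the idea, not the library's implementation: the real grammars are WFSTs, and the names `classify`, `verbalize`, `itn`, `UNITS`, and `TENS` below are hypothetical helpers that handle nothing beyond cardinals under one hundred.

```python
# Toy illustration of "classify, then verbalize". This is NOT the NeMo
# implementation; the lookup tables and function names are made up here.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def classify(text):
    """Stage 1: tag spans as ('cardinal', digits) or ('word', token)."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit word, e.g. "twenty" + "three".
            if i + 1 < len(words) and words[i + 1] in UNITS:
                value += UNITS[words[i + 1]]
                i += 1
            tokens.append(("cardinal", str(value)))
        elif word in UNITS:
            tokens.append(("cardinal", str(UNITS[word])))
        else:
            tokens.append(("word", word))
        i += 1
    return tokens


def verbalize(tokens):
    """Stage 2: render the tagged tokens in written form."""
    return " ".join(value for _tag, value in tokens)


def itn(text):
    """Run both stages in sequence."""
    return verbalize(classify(text))
```

Calling `itn("she is twenty three years old")` first produces a tagged token list containing `("cardinal", "23")` and then joins the tokens back into a sentence. Keeping the tagged intermediate form separate from the rendering step is what allows each semiotic class to have its own pair of grammars.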
Example prediction run:
python run_prediction.py --input=<INPUT_TEXT_FILE> --output=<OUTPUT_PATH> --language=<LANGUAGE> [--verbose]
The input is expected to be lower-cased. Punctuation marks are output with separating spaces after semiotic tokens, e.g. “i see, it is ten o’clock…” -> “I see, it is 10:00 . . .”. Inner-sentence white-space characters in the input are not preserved.
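Since lower-cased input is expected, it may help to prepare the input file beforehand. The sketch below is an assumed pre-processing step, not part of the tool: `prepare_line` is a hypothetical helper, and whether `run_prediction.py` lower-cases text on its own is not stated here. Collapsing whitespace loses nothing, because inner-sentence whitespace is not preserved in the output anyway.

```python
def prepare_line(line: str) -> str:
    """Lower-case a line and collapse whitespace runs to single spaces.

    Hypothetical helper for preparing ITN input; not part of the library.
    """
    # str.split() with no arguments splits on any whitespace run and
    # drops leading and trailing whitespace.
    return " ".join(line.lower().split())


def prepare_lines(lines):
    """Prepare every non-empty line of an input text."""
    return [prepare_line(line) for line in lines if line.strip()]
```

The resulting lines can then be written one per line to the file passed as `--input`.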
Data Cleaning for Evaluation
python clean_eval_data.py --input=<INPUT_TEXT_FILE>
Example evaluation run on the (cleaned) Google text normalization dataset:
python run_evaluation.py --input=./en_with_types/output-00001-of-00100 <--language LANGUAGE> [--cat CLASS_CATEGORY] [--filter]
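To make the evaluation input more concrete, here is a sketch of how such a file could be turned into (spoken, written) sentence pairs. It rests on assumptions about the dataset layout that are not stated above: one token per line with three tab-separated fields (semiotic class, written form, spoken form), `<eos>` rows ending a sentence, `<self>` meaning the spoken form equals the written one, and `sil` marking unspoken punctuation. The helpers `read_sentences` and `to_itn_pair` are hypothetical and unrelated to `run_evaluation.py`.

```python
def read_sentences(lines):
    """Group token rows into sentences, splitting on <eos> rows.

    Assumes rows of the form: CLASS<TAB>WRITTEN<TAB>SPOKEN.
    """
    sentences, current = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "<eos>":
            if current:
                sentences.append(current)
            current = []
        elif len(fields) == 3:
            current.append(tuple(fields))
        # Rows with an unexpected field count are skipped.
    if current:
        sentences.append(current)
    return sentences


def to_itn_pair(sentence):
    """Return (spoken_input, written_target) for one sentence."""
    spoken, written = [], []
    for _cls, written_tok, spoken_tok in sentence:
        written.append(written_tok)
        if spoken_tok == "<self>":
            spoken.append(written_tok)   # spoken form equals written form
        elif spoken_tok != "sil":
            spoken.append(spoken_tok)    # "sil" rows contribute no speech
    return " ".join(spoken), " ".join(written)


# Made-up sample in the assumed format, for illustration only.
SAMPLE = (
    "PLAIN\tIt\t<self>\n"
    "PLAIN\tcosts\t<self>\n"
    "MONEY\t$123\tone hundred twenty three dollars\n"
    "PUNCT\t.\tsil\n"
    "<eos>\t<eos>\n"
)
pairs = [to_itn_pair(s) for s in read_sentences(SAMPLE.splitlines())]
```

For ITN the spoken side serves as model input and the written side as the reference. The first field of each row is the kind of class label that an option such as `--cat` would plausibly select on.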