NeMo Text Normalization converts text from written form into its verbalized form. It is used as a preprocessing step before Text to Speech (TTS). It can also be used to preprocess Automatic Speech Recognition (ASR) training transcripts.
For example, “at 10:00” -> “at ten o’clock” and “it weighs 10kg.” -> “it weighs ten kilograms .”.
NeMo Text Normalization is based on WFST grammars. We also provide a deployment route to C++ using Sparrowhawk, an open-source version of Google Kestrel. See Text Processing Deployment for details.
For more details, see the tutorial NeMo/tutorials/text_processing/Text_Normalization.ipynb, which can be run in Google Colab.
The base class for every grammar is GraphFst.
This tool is designed as a two-stage application: 1. classification of the input into semiotic tokens and 2. verbalization into spoken form.
For every stage and every semiotic token class there is a corresponding grammar, e.g. taggers.CardinalFst and verbalizers.CardinalFst.
Together, they compose the final grammars ClassifyFst and VerbalizeFinalFst, which are compiled into WFSTs and used for inference.
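To make the two-stage design concrete, here is a minimal illustrative sketch in plain Python. It is not NeMo’s actual WFST implementation (the real grammars are finite-state transducers built with Pynini); the function names, the TIME-only classifier, and the tiny number table are all assumptions chosen for illustration.

```python
import re

# Toy number verbalizer; NeMo's real grammars cover far more via WFSTs.
_NUM_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
              6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
              11: "eleven", 12: "twelve"}

def classify(text):
    """Stage 1: tag each token with a semiotic class (here only TIME vs PLAIN)."""
    tokens = []
    for tok in text.split():
        m = re.fullmatch(r"(\d{1,2}):(\d{2})", tok)
        if m:
            tokens.append(("TIME", (int(m.group(1)), int(m.group(2)))))
        else:
            tokens.append(("PLAIN", tok))
    return tokens

def verbalize(tokens):
    """Stage 2: expand each tagged token into its spoken form."""
    out = []
    for cls, value in tokens:
        if cls == "TIME":
            hour, minute = value
            if minute == 0:
                out.append(f"{_NUM_WORDS[hour]} o'clock")
            else:
                out.append(f"{_NUM_WORDS[hour]} {_NUM_WORDS[minute]}")
        else:
            out.append(value)
    return " ".join(out)

def normalize(text):
    """Compose the two stages, mirroring ClassifyFst -> VerbalizeFinalFst."""
    return verbalize(classify(text))
```

For example, `normalize("at 10:00")` produces `"at ten o'clock"`. In NeMo itself the same composition happens by chaining WFSTs rather than Python functions.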
Example prediction run:
python run_prediction.py <--input INPUT_TEXT_FILE> <--output OUTPUT_PATH> <--language LANGUAGE> [--input_case INPUT_CASE]
INPUT_CASE specifies whether to treat the input as lower-cased or case-sensitive. By default, the input is treated as cased, since this is more informative, especially for abbreviations. Punctuation marks are output with separating spaces after semiotic tokens, e.g. “I see, it is 10:00…” -> “I see, it is ten o’clock . . .”.
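The punctuation-spacing behavior described above can be sketched as a small post-processing step. This is only an illustration of the convention, not NeMo’s grammar: the function name and the choice to handle tokens individually are assumptions, and in NeMo this separation applies to punctuation following normalized semiotic tokens.

```python
import re

def separate_punctuation(token):
    """Split trailing punctuation off a token with separating spaces,
    expanding an ellipsis character into three spaced periods.
    Illustrative sketch only, not part of the NeMo API."""
    token = token.replace("\u2026", "...")  # "…" -> "..."
    m = re.fullmatch(r"(.*?)([.,!?]+)", token)
    if not m:
        return token  # no trailing punctuation to split off
    word, punct = m.groups()
    return " ".join(filter(None, [word] + list(punct)))
```

For example, `separate_punctuation("10:00…")` yields `"10:00 . . ."`, matching the spaced periods in the example output above.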
Inner-sentence whitespace characters in the input are not preserved.
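The effect on whitespace can be approximated by collapsing inner runs of whitespace to single spaces. A minimal sketch (the function name is an assumption; NeMo performs this normalization as part of its processing pipeline rather than via this helper):

```python
def collapse_whitespace(text):
    # Runs of spaces/tabs inside a sentence collapse to a single space.
    return " ".join(text.split())
```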
Example evaluation run on Google’s text normalization dataset:
python run_evaluation.py --input=./en_with_types/output-00001-of-00100 --language=en [--cat CLASS_CATEGORY] [--input_case INPUT_CASE]