Text Normalization

NeMo Text Normalization converts text from written form into its verbalized form. It is used as a preprocessing step before Text to Speech (TTS). It could also be used for preprocessing Automatic Speech Recognition (ASR) training transcripts.

For example, “at 10:00” -> “at ten o’clock” and “it weighs 10kg.” -> “it weighs ten kilograms .”.

NeMo Text Normalization [] is based on WFST grammars []. We also provide a deployment route to C++ using Sparrowhawk [] – an open-source version of Google Kestrel []. See Text Processing Deployment for details.


The base class for every grammar is GraphFst. This tool is designed as a two-stage application: 1. classification of the input into semiotic tokens and 2. verbalization of those tokens into spoken form. For every stage and every semiotic token class there is a corresponding grammar, e.g. taggers.CardinalFst and verbalizers.CardinalFst. Together, they compose the final grammars ClassifyFst and VerbalizeFinalFst that are compiled into WFSTs and used for inference.
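The two-stage design can be sketched with a simplified, self-contained Python example. This is only an illustration of the classify-then-verbalize pipeline; the real taggers and verbalizers are compiled WFST grammars, not Python functions.

```python
import re

# Stage 1: classification -- tag each token with a semiotic class,
# loosely mimicking taggers.CardinalFst (only bare integers here).
def classify(text):
    return [("cardinal", tok) if re.fullmatch(r"\d+", tok) else ("plain", tok)
            for tok in text.split()]

# Stage 2: verbalization -- render each tagged token into spoken form,
# loosely mimicking verbalizers.CardinalFst (cardinals 0-99 only here).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def verbalize_cardinal(digits):
    n = int(digits)
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])

def normalize(text):
    return " ".join(verbalize_cardinal(tok) if cls == "cardinal" else tok
                    for cls, tok in classify(text))

print(normalize("it weighs 10 kg"))  # -> "it weighs ten kg"
```

In the actual tool, both stages are composed into the final ClassifyFst and VerbalizeFinalFst grammars rather than applied as separate Python passes.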

class nemo_text_processing.text_normalization.en.ClassifyFst(*args: Any, **kwargs: Any)


class nemo_text_processing.text_normalization.en.VerbalizeFinalFst(*args: Any, **kwargs: Any)



Example prediction run:

python run_prediction.py  <--input INPUT_TEXT_FILE> <--output OUTPUT_PATH> <--language LANGUAGE> [--input_case INPUT_CASE]

INPUT_CASE specifies whether to treat the input as lower-cased or case-sensitive. By default, the input is treated as cased, since this is more informative, especially for abbreviations. Punctuation marks are output with separating spaces after semiotic tokens, e.g. “I see, it is 10:00…” -> “I see, it is ten o’clock . . .”. Inner-sentence whitespace characters in the input are not maintained.
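The punctuation convention described above can be sketched as a small post-processing step. The helper below is hypothetical and not part of NeMo; it only demonstrates the documented output shape, where trailing punctuation after a verbalized semiotic token is spaced out and an ellipsis character expands to “. . .”.

```python
import re

def space_out_trailing_punctuation(verbalized):
    """Toy sketch (hypothetical helper, not NeMo code) of the documented
    output convention: sentence-final punctuation after a verbalized
    semiotic token is emitted with separating spaces."""
    m = re.match(r"(.*?) *([.…]+)$", verbalized)
    if not m:
        return verbalized
    body, punct = m.groups()
    punct = punct.replace("…", "...")    # expand the ellipsis character
    return body + " " + " ".join(punct)  # one space between each mark

print(space_out_trailing_punctuation("I see, it is ten o'clock…"))
# -> "I see, it is ten o'clock . . ."
```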


Example evaluation run on Google’s text normalization dataset []:

python run_evaluation.py  --input=./en_with_types/output-00001-of-00100 --language=en [--cat CLASS_CATEGORY] [--input_case INPUT_CASE]