Inverse Text Normalization

Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline. ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example, “in nineteen seventy” -> “in 1970” and “it costs one hundred and twenty three dollars” -> “it costs $123”.

NeMo ITN is based on WFST grammars. We also provide a deployment route to C++ using Sparrowhawk, an open-source version of Google Kestrel. See Text Processing Deployment for details.


The base class for every grammar is GraphFst. The tool is designed as a two-stage application: 1. classification of the input into semiotic tokens and 2. verbalization of those tokens into written form. For every stage and every semiotic token class there is a corresponding grammar, e.g. taggers.CardinalFst and verbalizers.CardinalFst. Together, these grammars compose the final grammars ClassifyFst and VerbalizeFinalFst, which are compiled into WFSTs and used for inference.

class nemo_text_processing.inverse_text_normalization.en.ClassifyFst(*args: Any, **kwargs: Any)


class nemo_text_processing.inverse_text_normalization.en.VerbalizeFinalFst(*args: Any, **kwargs: Any)



Example prediction run:

python  --input=<INPUT_TEXT_FILE> --output=<OUTPUT_PATH> --language=<LANGUAGE> [--verbose]

The input is expected to be lower-cased. Punctuation marks are output with separating spaces after semiotic tokens, e.g. “i see, it is ten o’clock…” -> “I see, it is 10:00 . . .”. Inner-sentence white-space characters in the input are not preserved.
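Since the input is expected to be lower-cased and inner white space is not preserved anyway, it can help to bring each line into that shape before writing the input file. The helper below is a minimal sketch of such a preparation step; prepare_line is a hypothetical name and not part of NeMo.

```python
def prepare_line(line: str) -> str:
    """Lower-case a line and collapse any run of white space to one space."""
    return " ".join(line.lower().split())


print(prepare_line("It  costs One Hundred\tDollars "))
```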

Data Cleaning for Evaluation

python  --input=<INPUT_TEXT_FILE>


Example evaluation run on the (cleaned) Google text normalization dataset:

python  --input=./en_with_types/output-00001-of-00100 <--language LANGUAGE> [--cat CLASS_CATEGORY] [--filter]