Inverse Text Normalization

Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline. ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example, “in nineteen seventy” -> “in 1970” and “it costs one hundred and twenty three dollars” -> “it costs $123”.
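The same grammars can also be exercised directly from Python through the InverseNormalizer wrapper. The sketch below is illustrative only; the exact import path and constructor arguments may differ between NeMo releases.

# Illustrative sketch; constructor arguments may vary across NeMo versions.
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

inverse_normalizer = InverseNormalizer()  # builds the tagger and verbalizer grammars once

print(inverse_normalizer.inverse_normalize("in nineteen seventy", verbose=False))
# -> "in 1970"
print(inverse_normalizer.inverse_normalize("it costs one hundred and twenty three dollars", verbose=False))
# -> "it costs $123"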

NeMo ITN [TEXTPROCESSING-ITN5] is based on WFST grammars [TEXTPROCESSING-ITN3]. We also provide a deployment route to C++ using Sparrowhawk [TEXTPROCESSING-ITN2] – an open-source version of Google Kestrel [TEXTPROCESSING-ITN1]. See Text Processing Deployment for details.

Classes

The base class for every grammar is GraphFst. The tool is designed as a two-stage application: (1) classification of the input into semiotic tokens and (2) verbalization into the written form. For every stage and every semiotic token class there is a corresponding grammar, e.g. taggers.CardinalFst and verbalizers.CardinalFst. Together, they compose the final grammars ClassifyFst and VerbalizeFinalFst, which are compiled into WFSTs and used for inference.
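As a toy illustration of this two-stage design (not the actual NeMo grammars), the snippet below builds a minimal tagger and verbalizer with pynini, the Python WFST library the grammars are written in, and composes them into a single spoken-to-written transducer. The tag format is a simplified stand-in for the real semiotic-token protocol.

# Toy tagger/verbalizer pair in pynini; a simplified stand-in, not the NeMo grammars.
import pynini

# Stage 1 "tagger": classify a spoken cardinal into a tagged semiotic token.
tagger = pynini.string_map([
    ("one", 'cardinal { integer: "1" }'),
    ("two", 'cardinal { integer: "2" }'),
    ("three", 'cardinal { integer: "3" }'),
])

# Stage 2 "verbalizer": strip the tag and emit the written form.
verbalizer = (
    pynini.cross('cardinal { integer: "', "")
    + pynini.union("1", "2", "3")
    + pynini.cross('" }', "")
)

# Composing the two stages gives the end-to-end spoken -> written WFST.
itn = tagger @ verbalizer
print(pynini.shortestpath("two" @ itn).string())  # -> "2"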

nemo_text_processing.inverse_text_normalization.ClassifyFst

alias of nemo_text_processing.inverse_text_normalization.

nemo_text_processing.inverse_text_normalization.VerbalizeFinalFst

alias of nemo_text_processing.inverse_text_normalization.

Prediction

Example prediction run:

python run_prediction.py  --input=<INPUT_TEXT_FILE> --output=<OUTPUT_PATH>  [--verbose]

The input is expected to be lower-cased. Punctuation marks are output with separating spaces after semiotic tokens, e.g. “i see, it is ten o’clock…” -> “I see, it is 10:00 . . .”. Inner-sentence white-space characters in the input are not maintained.
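For processing a file from Python rather than through run_prediction.py, a rough stand-in looks like the following; the file names are placeholders and the wrapper class is the same assumption as in the earlier sketch.

# Rough stand-in for run_prediction.py; "input.txt" / "output.txt" are placeholders.
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

inverse_normalizer = InverseNormalizer()

with open("input.txt") as fin, open("output.txt", "w") as fout:
    for line in fin:
        spoken = line.strip().lower()  # the grammars expect lower-cased input
        fout.write(inverse_normalizer.inverse_normalize(spoken, verbose=False) + "\n")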

Data Cleaning for Evaluation

python clean_eval_data.py  --input=<INPUT_TEXT_FILE>

Evaluation

Example evaluation run on (cleaned) Google’s text normalization dataset [TEXTPROCESSING-ITN4]:

python run_evaluation.py  --input=./en_with_types/output-00001-of-00100 [--cat CLASS_CATEGORY] [--filter]

References

TEXTPROCESSING-ITN1

Peter Ebden and Richard Sproat. The Kestrel TTS text normalization system. Natural Language Engineering, 21(3):333, 2015.

TEXTPROCESSING-ITN2

Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, and Richard Sproat. TTS for low resource languages: a Bangla synthesizer. In 10th Language Resources and Evaluation Conference, 2016.

TEXTPROCESSING-ITN3

Mehryar Mohri. Weighted Automata Algorithms, pages 213–254. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009. doi:10.1007/978-3-642-01492-5_6.

TEXTPROCESSING-ITN4

Richard Sproat and Navdeep Jaitly. RNN approaches to text normalization: a challenge. arXiv preprint arXiv:1611.00068, 2016.

TEXTPROCESSING-ITN5

Yang Zhang, Evelina Bakhturina, Kyle Gorman, and Boris Ginsburg. NeMo inverse text normalization: from development to production. 2021. arXiv:2104.05055.