NeMo Text Processing Deployment

This tool deploys NeMo Inverse Text Normalization (ITN) and NeMo Text Normalization (TN) to production [TOOLS-ITN_DEPLOY3]. It uses Sparrowhawk [TOOLS-ITN_DEPLOY2], an open-source version of Google Kestrel [TOOLS-ITN_DEPLOY1].

Requirements

nemo_text_processing package

Usage

The entry point script, export_grammar.sh, starts a Docker container that runs the production backend with the exported grammars plugged in.

Arguments:

  • GRAMMARS - tn_grammars or itn_grammars to export either TN or ITN grammars from nemo_text_processing.

  • LANGUAGE - en for English

  • INPUT_CASE - cased or lower_cased (lower_cased is supported only in TN grammars).

  • MODE - set to test to run the tests on the grammars inside the container.

For example:

# to export ITN grammars
bash export_grammar.sh --GRAMMARS=itn_grammars --LANGUAGE=en

# to export and test TN grammars
bash export_grammar.sh --GRAMMARS=tn_grammars --INPUT_CASE=cased --MODE=test --LANGUAGE=en
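The argument constraints listed above can be checked before the script is invoked. The following is a minimal sketch, assuming a hypothetical helper named validate_args that is not part of the NeMo tooling; it only encodes the values and the lower_cased restriction stated in the argument list.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; validate_args is not part of the NeMo tooling.
# Usage: validate_args GRAMMARS [INPUT_CASE]
validate_args() {
  local grammars="$1" input_case="$2"

  # GRAMMARS must be one of the two documented values.
  case "$grammars" in
    tn_grammars|itn_grammars) ;;
    *) echo "GRAMMARS must be tn_grammars or itn_grammars" >&2; return 1 ;;
  esac

  # INPUT_CASE may be omitted (as in the ITN export example);
  # otherwise it must be one of the two documented values.
  case "$input_case" in
    ""|cased|lower_cased) ;;
    *) echo "INPUT_CASE must be cased or lower_cased" >&2; return 1 ;;
  esac

  # lower_cased is documented as supported only in TN grammars.
  if [ "$input_case" = "lower_cased" ] && [ "$grammars" != "tn_grammars" ]; then
    echo "lower_cased is supported only with tn_grammars" >&2
    return 1
  fi
}
```

The helper returns a non-zero status on an invalid combination, so it can guard the real call with `&&`, for instance validate_args tn_grammars cased followed by the export_grammar.sh invocation.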

This script runs the following steps in sequence:

Exports the ClassifyFst and VerbalizeFst grammars from nemo_text_processing to OUTPUT_DIR/classify/tokenize_and_classify.far and OUTPUT_DIR/verbalize/verbalize.far, respectively.

python pynini_export.py <--output_dir OUTPUT_DIR> <--grammars GRAMMARS> <--input_case INPUT_CASE> <--language LANGUAGE>
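After the export step it can be useful to confirm that both FAR files landed where the later steps expect them. A minimal sketch, assuming a hypothetical helper named check_far_files (not part of the NeMo tooling) that only tests for the two paths named above:

```shell
#!/usr/bin/env bash
# Hypothetical helper; check_far_files is not part of the NeMo tooling.
# Usage: check_far_files OUTPUT_DIR
check_far_files() {
  local output_dir="$1" status=0 far

  # The two relative paths are the ones the export step is documented to write.
  for far in "classify/tokenize_and_classify.far" "verbalize/verbalize.far"; do
    # -s is true only if the file exists and is non-empty.
    if [ ! -s "$output_dir/$far" ]; then
      echo "missing or empty: $output_dir/$far" >&2
      status=1
    fi
  done

  return "$status"
}
```

The check reports every missing file rather than stopping at the first, which makes a partially failed export easier to diagnose.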

Builds the Docker image for the C++ production backend.

bash docker/build.sh

Plugs the grammars into the production backend by mounting the grammar directories classify/ and verbalize/ onto the Sparrowhawk grammar directory inside the container. Returns a Docker prompt.

# to launch container with the exported grammars
bash docker/launch.sh

# to launch container with the exported grammars and run tests on TN grammars
bash docker/launch.sh test_tn_grammars

# to launch container with the exported grammars and run tests on ITN grammars
bash docker/launch.sh test_itn_grammars

Runs TN or ITN inside the Docker container:

echo "two dollars fifty" | ../../src/bin/normalizer_main --config=sparrowhawk_configuration.ascii_proto

This returns $2.50 for ITN.
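To process several utterances, the same invocation can be repeated once per input line. A minimal sketch, assuming a hypothetical helper named normalize_lines; the NORMALIZER and SPARROWHAWK_CONFIG variables are illustrative names introduced here, with defaults taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; normalize_lines and the NORMALIZER / SPARROWHAWK_CONFIG
# variables are illustrative names, not part of NeMo or Sparrowhawk.
NORMALIZER="${NORMALIZER:-../../src/bin/normalizer_main}"
SPARROWHAWK_CONFIG="${SPARROWHAWK_CONFIG:-sparrowhawk_configuration.ascii_proto}"

# Reads utterances from stdin, one per line, and pipes each one through the
# normalizer using the same invocation as the single-sentence example.
normalize_lines() {
  local line
  # The extra -n test keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines.
    if [ -z "$line" ]; then
      continue
    fi
    printf '%s\n' "$line" | "$NORMALIZER" --config="$SPARROWHAWK_CONFIG"
  done
}
```

Inside the container this could be called as normalize_lines < utterances.txt, where utterances.txt is a hypothetical file with one sentence per line. Starting one process per line is slow for large inputs but relies only on the invocation shown in this document.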

References

TOOLS-ITN_DEPLOY1

Peter Ebden and Richard Sproat. The Kestrel TTS text normalization system. Natural Language Engineering, 21(3):333, 2015.

TOOLS-ITN_DEPLOY2

Alexander Gutkin, Linne Ha, Martin Jansche, Knot Pipatsrisawat, and Richard Sproat. TTS for low resource languages: a Bangla synthesizer. In 10th Language Resources and Evaluation Conference. 2016.

TOOLS-ITN_DEPLOY3

Yang Zhang, Evelina Bakhturina, Kyle Gorman, and Boris Ginsburg. NeMo inverse text normalization: from development to production. 2021. arXiv:2104.05055.