Grammar customization

Warning

TN/ITN has moved from the NVIDIA/NeMo repository to the standalone NVIDIA/NeMo-text-processing repository. All updates, discussions, and issues should go to the new repository.

All grammar development is done with the Pynini library. The grammars can be exported to .far files and used with Riva/Sparrowhawk; see Text Processing Deployment for details.

Steps to customize grammars

Install NeMo-TN from source
Run nemo_text_processing/text_normalization/normalize.py or nemo_text_processing/inverse_text_normalization/inverse_normalize.py with the --verbose flag to evaluate the current behavior on the target case; see the scripts and this tutorial for argument details
Modify existing grammars or add new grammars to cover the target case, following the Tutorial on how to write new grammars

Add new test cases here:

Run Python tests:

(optionally build grammars first and save to CACHE_DIR)
cd tests/nemo_text_processing &&
pytest <LANGUAGE>/test_*.py --cpu --tn_cache_dir=CACHE_DIR_WITH_FAR_FILES (optionally add the --run_audio_based flag to also run audio-based TN tests)

Run Sparrowhawk tests:

cd tools/text_processing_deployment &&
bash export_grammars.sh --GRAMMARS=<TN/ITN grammars> --LANGUAGE=<LANGUAGE> --MODE=test
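
When the two test runs above are repeated for several languages, it can help to assemble the command lines in one place. The following is a minimal sketch, not part of NeMo-text-processing: the helper names pytest_command and export_command are hypothetical, and the sample values ("en", "tn_grammars", "/tmp/tn_cache") are assumed placeholders. Only the flags already shown in the steps above are used.

```python
def pytest_command(language, cache_dir, run_audio_based=False):
    """Build the pytest invocation from the steps above as an argv list.

    Hypothetical helper: it only mirrors the flags shown in this document.
    Meant to be run from tests/nemo_text_processing.
    """
    cmd = [
        "pytest",
        # The glob stays literal here; a shell expands it when the
        # joined command string is executed.
        f"{language}/test_*.py",
        "--cpu",
        f"--tn_cache_dir={cache_dir}",
    ]
    if run_audio_based:
        # Optional: also run the audio-based TN tests.
        cmd.append("--run_audio_based")
    return cmd


def export_command(grammars, language, mode="test"):
    """Build the Sparrowhawk test invocation as an argv list.

    Hypothetical helper. Meant to be run from tools/text_processing_deployment.
    """
    return [
        "bash",
        "export_grammars.sh",
        f"--GRAMMARS={grammars}",
        f"--LANGUAGE={language}",
        f"--MODE={mode}",
    ]


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    print(" ".join(pytest_command("en", "/tmp/tn_cache")))
    print(" ".join(export_command("tn_grammars", "en")))
```

The helpers return lists rather than strings so the arguments can be inspected or extended before being joined. The printed strings are intended to be pasted into a shell from the directories named in the steps above.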

WFST TN/ITN resources can be found here.

Riva resources

Riva Text Normalization customization for TTS.

Riva ASR/Inverse Text Normalization customization.