Important
NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.
Grammar customization
Warning
TN/ITN has moved from the NVIDIA/NeMo repository to the standalone NVIDIA/NeMo-text-processing repository. All updates, discussions, and issues should go to the new repository.
All grammar development is done with the Pynini library. The grammars can be exported to .far files and used with Riva/Sparrowhawk; see Text Processing Deployment for details.
Steps to customize grammars
Install NeMo-TN from source
Run nemo_text_processing/text_normalization/normalize.py or nemo_text_processing/inverse_text_normalization/inverse_normalize.py with the --verbose flag to evaluate current behavior on the target case. See the argument details in the scripts and this tutorial.
Modify existing grammars or add new grammars to cover the target case, following the Tutorial on how to write new grammars.
Add new test cases here:
Run python tests:
(optionally build grammars first and save them to CACHE_DIR) cd tests/nemo_text_processing && pytest <LANGUAGE>/test_*.py --cpu --tn_cache_dir=CACHE_DIR_WITH_FAR_FILES (add the optional --run_audio_based flag to also run audio-based TN tests)
Run Sparrowhawk tests:
cd tools/text_processing_deployment && bash export_grammars.sh --GRAMMARS=<TN/ITN grammars> --LANGUAGE=<LANGUAGE> --MODE=test
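When adding test cases in the step above, it can help to check a case file for formatting mistakes before running the full test suite. The sketch below assumes that each line of a test case file holds one input and its expected output separated by `~`; that separator, the function name `parse_test_cases`, the file name, and the sample cases are assumptions made for illustration, so check the existing files in the repository for the actual layout.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def parse_test_cases(path: Path, sep: str = "~") -> List[Tuple[str, str]]:
    """Read (input, expected) pairs from a file with one `input<sep>expected` case per line."""
    pairs = []
    lines = path.read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        left, found, right = line.partition(sep)
        if not found:
            raise ValueError(f"{path}:{line_no}: missing '{sep}' separator")
        pairs.append((left, right))
    return pairs


# Hypothetical case file written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    case_file = Path(tmp) / "test_cases_example.txt"
    case_file.write_text("$5~five dollars\n2 kg~two kilograms\n", encoding="utf-8")
    cases = parse_test_cases(case_file)

print(cases)
```

A line without the separator raises a `ValueError` that names the file and line number, which makes malformed cases easy to locate.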
WFST TN/ITN resources can be found here.