Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the Migration Guide for information on getting started.

Grammar customization

Warning

TN/ITN has moved from the NVIDIA/NeMo repository to the standalone NVIDIA/NeMo-text-processing repository. All updates, discussions, and issues should go to the new repository.

All grammar development is done with the Pynini library. These grammars can be exported to .far files and used with Riva/Sparrowhawk; see Text Processing Deployment for details.

Steps to customize grammars

  1. Install NeMo-TN from source

  2. Run nemo_text_processing/text_normalization/normalize.py or nemo_text_processing/inverse_text_normalization/inverse_normalize.py with the --verbose flag to evaluate the current behavior on the target case. See the argument details in the scripts and in this tutorial

  3. Modify existing grammars or add new grammars to cover the target case, following the Tutorial on how to write new grammars

  4. Add new test cases here:
    • Run Python tests (optionally build the grammars first and save them to CACHE_DIR):

    cd tests/nemo_text_processing &&
    pytest <LANGUAGE>/test_*.py --cpu --tn_cache_dir=CACHE_DIR_WITH_FAR_FILES

    Add the optional --run_audio_based flag to also run the audio-based TN tests.
    
    • Run Sparrowhawk tests:

    cd tools/text_processing_deployment &&
    bash export_grammars.sh --GRAMMARS=<TN/ITN grammars> --LANGUAGE=<LANGUAGE> --MODE=test
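
The Python test run from step 4 can also be scripted, which may help when re-running the tests for a single language after each grammar change. The following is a minimal sketch, not part of the repository: the helper names (find_test_files, build_pytest_command, run_language_tests) are hypothetical, and only the flags named in the steps above (--cpu, --tn_cache_dir, --run_audio_based) are used. It assumes pytest is on the PATH and that the script is started from the repository root.

```python
import glob
import os
import subprocess


def find_test_files(language, tests_root="tests/nemo_text_processing"):
    """Return the <language>/test_*.py paths, relative to tests_root, sorted."""
    pattern = os.path.join(tests_root, language, "test_*.py")
    return sorted(os.path.relpath(path, tests_root) for path in glob.glob(pattern))


def build_pytest_command(test_files, cache_dir=None, run_audio_based=False):
    """Assemble the pytest argv from step 4 (hypothetical helper)."""
    command = ["pytest", *test_files, "--cpu"]
    if cache_dir is not None:
        # Directory that already holds the exported .far files.
        command.append(f"--tn_cache_dir={cache_dir}")
    if run_audio_based:
        # Optional: also run the audio-based TN tests.
        command.append("--run_audio_based")
    return command


def run_language_tests(language, tests_root="tests/nemo_text_processing", **options):
    """Run the tests for one language from inside tests_root; return the exit code."""
    test_files = find_test_files(language, tests_root)
    if not test_files:
        raise FileNotFoundError(f"no test_*.py files under {tests_root}/{language}")
    command = build_pytest_command(test_files, **options)
    # The glob is expanded in Python because subprocess does not use a shell here.
    return subprocess.run(command, cwd=tests_root, check=False).returncode
```

For example, run_language_tests("en", cache_dir="CACHE_DIR_WITH_FAR_FILES") would mirror the command in step 4, where "en" is only an illustrative language directory. An absolute cache_dir is safest, since pytest runs with tests_root as its working directory.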
    

WFST TN/ITN resources can be found here.

Riva resources