How to fine-tune a Riva NMT Multilingual model with NVIDIA NeMo#

This tutorial walks you through how to fine-tune a Riva NMT Multilingual model with NVIDIA NeMo.

NVIDIA Riva Overview#

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance.
Riva offers a rich set of speech and natural language understanding services such as:

  • Automatic speech recognition (ASR)

  • Text-to-Speech synthesis (TTS)

  • Neural Machine Translation (NMT)

  • A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.

In this tutorial, we will fine-tune a Riva NMT Multilingual model with NVIDIA NeMo.
To understand the basics of Riva NMT APIs, refer to the “How do I perform Language Translation using Riva NMT APIs with out-of-the-box models?” tutorial in Riva NMT Tutorials.

For more information about Riva, refer to the Riva developer documentation.
For more information about Riva NMT, refer to the Riva NMT documentation.

NVIDIA NeMo Overview#

NVIDIA NeMo is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures.

For more information about NeMo, refer to the NeMo product page and documentation. The open-source NeMo repository can be found here.

Fine-tuning Riva NMT Multilingual model with NVIDIA NeMo#

For this tutorial, we will be fine-tuning the Riva NMT Multilingual Any-to-En model on the Scielo English-Spanish-Portuguese dataset.

This tutorial covers fine-tuning only the NMT Multilingual model. Fine-tuning a multilingual model is a comparatively challenging task, for example because it requires assembling a balanced dataset that covers multiple languages. At this stage, multilingual fine-tuning is only supported with specific NeMo and PyTorch Lightning versions (PTL < 2.0). We suggest using the specific NeMo branch cloned in the setup steps below.

The fine-tuning process can be split into the following steps:

  1. Data download.

  2. Data preprocessing.

  3. Fine-tuning the NMT model with NeMo.

  4. Evaluate the fine-tuned NMT model with NeMo.

  5. Exporting the NeMo model.

  6. Deploying the fine-tuned NeMo NMT model on the Riva Speech Skills server.

Let’s walk through each of these steps in detail.

Requirements and Setup#

This tutorial needs to be run from inside a NeMo docker container. If you are not running this tutorial through a NeMo docker container, please refer to the Riva NMT Tutorials to get started.

Before we get into the Requirements and Setup, let us create a base directory for our work here.

import os
base_dir = "NMTFinetuning"
!mkdir $base_dir
base_dir = os.path.abspath(base_dir)
  1. Clone the NeMo GitHub repository.

NeMoBranch = "r1.19.0"
!git clone -b $NeMoBranch https://github.com/bpritam14/NeMo.git $base_dir/NeMo
!apt-get update && apt-get install -y libsndfile1 ffmpeg
%cd $base_dir/NeMo
!./reinstall.sh
%cd ..

Check CUDA installation.

import torch
torch.cuda.is_available()
WARNING: You may need to install `apex`.
!git clone https://github.com/NVIDIA/apex.git
%cd apex
!git checkout 57057e2fcf1c084c0fcc818f55c0ff6ea1b24ae2
!pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
%cd ..
  2. Install the nemo2riva library from the Riva Quick Start Guide.

# Install the `nemo2riva` library
!python3 -m pip install nemo2riva
  3. Install additional libraries required for this tutorial.

!python3 -m pip install scikit-learn

Step 1. Data download#

Let us download the Scielo English-Spanish-Portuguese dataset. Specifically, we are going to download the Moses version of the dataset, which consists of 3 files: en_pt_es.en, en_pt_es.pt, and en_pt_es.es. Each newline-separated entry in the en_pt_es.en file is a translation of the corresponding entries in the en_pt_es.es and en_pt_es.pt files, and vice versa.

data_dir = base_dir + "/data"
!mkdir $data_dir

# Download the Scielo dataset
!wget -P $data_dir https://figshare.com/ndownloader/files/14019293
# Untar the downloaded Scielo dataset
!tar -xvf $data_dir/14019293 -C $data_dir
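As a quick sanity check, we can print the first entry from each of the three files to confirm that they are parallel (line i of each file is a translation of line i of the others). This is a minimal sketch, assuming the archive above extracted the three en_pt_es.* files into the data directory:

# Sanity check: print the first parallel entry from each file.
# Assumes the tar archive above extracted en_pt_es.en, en_pt_es.es and en_pt_es.pt into data_dir.
for ext in ["en", "es", "pt"]:
    with open(f"{data_dir}/en_pt_es.{ext}") as f:
        print(ext, ":", f.readline().strip())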

Step 2. Data preprocessing#

Data preprocessing consists of multiple steps to improve the quality of the dataset. The NeMo documentation provides detailed instructions about the 8-step data preprocessing for NMT. NeMo also provides a Jupyter notebook that takes users programmatically through the different preprocessing steps. Note that depending on the dataset, some or all preprocessing steps can be skipped.

To simplify the fine-tuning process in the Riva NMT program, we have provided 3 preprocessing scripts through the NeMo repository. The input to these scripts is a pair of parallel corpus data files (one for the source and one for the target language). In this tutorial, we are using the Moses version of the Scielo dataset, which directly provides the source (en_pt_es.en) and target (en_pt_es.es) data files. If the dataset does not directly provide these files, we first need to generate them from the dataset before using the preprocessing scripts.

The scripts below expose a number of parameters, the most common of which are:

  • input-src: Path to the input file which contains text in source language.

  • input-tgt: Path to the input file which contains text in target language.

  • output-src: File path where the normalized and tokenized source language’s data is to be saved.

  • output-tgt: File path where the normalized and tokenized target language’s data is to be saved.

  • source-lang: Source language’s language code.

  • target-lang: Target language’s language code.

Parameters specific to each script are covered in their respective sections.

a. Language filtering#

The language filtering preprocessing script is used to verify the language of the text in machine translation datasets, using the fastText language identification model. When run on a parallel corpus, it verifies both the source and the target language. Filtered data is stored in the files specified by output-src and output-tgt, and the removed lines are written to the files specified by removed-src and removed-tgt. If the language of a line cannot be detected (for example, a line that contains only a date), the line is removed.

This script exposes a number of parameters, the most common of which are:

  • removed-src: File path where the discarded data from source language is to be saved.

  • removed-tgt: File path where the discarded data from target language is to be saved.

  • fasttext-model: Path to fasttext model. The description and download links are here.

# Let us first download the fasttext model.
!wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O $data_dir/lid.176.bin
# Running the language filtering preprocessing script. 
!python $base_dir/NeMo/scripts/neural_machine_translation/filter_langs_nmt.py \
    --input-src $data_dir/en_pt_es.en \
    --input-tgt $data_dir/en_pt_es.es \
    --output-src $data_dir/en_es_preprocessed1.en \
    --output-tgt $data_dir/en_es_preprocessed1.es \
    --removed-src $data_dir/en_es_garbage1.en \
    --removed-tgt $data_dir/en_es_garbage1.es \
    --source-lang en \
    --target-lang es \
    --fasttext-model $data_dir/lid.176.bin

# Run similarly for en and pt too (or other languages as needed)
!python $base_dir/NeMo/scripts/neural_machine_translation/filter_langs_nmt.py \
    --input-src $data_dir/en_pt_es.en \
    --input-tgt $data_dir/en_pt_es.pt \
    --output-src $data_dir/en_pt_preprocessed1.en \
    --output-tgt $data_dir/en_pt_preprocessed1.pt \
    --removed-src $data_dir/en_pt_garbage1.en \
    --removed-tgt $data_dir/en_pt_garbage1.pt \
    --source-lang en \
    --target-lang pt \
    --fasttext-model $data_dir/lid.176.bin
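Under the hood, this script relies on the fastText language identification model (lid.176.bin) that we downloaded above. As an optional illustration, the following minimal sketch shows how that model classifies a single line; it assumes the fasttext Python package is available (install it with pip install fasttext if needed):

# Illustrative only: classify one line with the fastText language identification model.
# Assumes lid.176.bin was downloaded above and the `fasttext` package is installed.
import fasttext

lid_model = fasttext.load_model(data_dir + "/lid.176.bin")
labels, probabilities = lid_model.predict("Este es un ejemplo de una oración en español.")
print(labels[0], probabilities[0])  # expected output similar to: __label__es 0.99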

b. Length filtering#

The length filtering script is a multi-processed script for filtering a parallel corpus to remove sentences that are shorter than a minimum length or longer than a maximum length. It also filters based on the length ratio between source and target sentences, as illustrated in the sketch after the parameter list below.

This script exposes a number of parameters, the most common of which are:

  • removed-src: File path where the discarded data from source language is to be saved.

  • min-length: Minimum sequence length.

  • max-length: Maximum sequence length.

  • ratio: Maximum allowed ratio between the lengths of the source and target sentences.
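The core keep/discard criterion can be summarized in a few lines of Python. The following is an illustrative sketch of our reading of the filter's behavior, not the script itself:

# Illustrative sketch of the length/ratio filtering criterion (not the actual script).
def keep_pair(src_line, tgt_line, min_length=1, max_length=512, ratio=1.3):
    src_len, tgt_len = len(src_line.split()), len(tgt_line.split())
    # Discard pairs where either side is too short or too long.
    if not (min_length <= src_len <= max_length and min_length <= tgt_len <= max_length):
        return False
    # Discard pairs whose source/target length ratio is too skewed in either direction.
    return max(src_len / tgt_len, tgt_len / src_len) <= ratio

print(keep_pair("a short sentence", "una oración corta"))  # True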

# Running the length filtering preprocessing script.
!python $base_dir/NeMo/scripts/neural_machine_translation/length_ratio_filter.py \
    --input-src $data_dir/en_es_preprocessed1.en \
    --input-tgt $data_dir/en_es_preprocessed1.es \
    --output-src $data_dir/en_es_preprocessed2.en \
    --output-tgt $data_dir/en_es_preprocessed2.es \
    --removed-src $data_dir/en_es_garbage2.en \
    --removed-tgt $data_dir/en_es_garbage2.es \
    --min-length 1 \
    --max-length 512 \
    --ratio 1.3

# Run similarly for en and pt too (or other languages as needed)
!python $base_dir/NeMo/scripts/neural_machine_translation/length_ratio_filter.py \
    --input-src $data_dir/en_pt_preprocessed1.en \
    --input-tgt $data_dir/en_pt_preprocessed1.pt \
    --output-src $data_dir/en_pt_preprocessed2.en \
    --output-tgt $data_dir/en_pt_preprocessed2.pt \
    --removed-src $data_dir/en_pt_garbage2.en \
    --removed-tgt $data_dir/en_pt_garbage2.pt \
    --min-length 1 \
    --max-length 512 \
    --ratio 1.3

c. Tokenization and normalization#

The tokenization and normalization script normalizes and tokenizes the input source and target language data.

!python $base_dir/NeMo/scripts/neural_machine_translation/preprocess_tokenization_normalization.py \
    --input-src $data_dir/en_es_preprocessed2.en \
    --input-tgt $data_dir/en_es_preprocessed2.es \
    --output-src $data_dir/en_es_final.en \
    --output-tgt $data_dir/en_es_final.es \
    --source-lang en \
    --target-lang es

!python $base_dir/NeMo/scripts/neural_machine_translation/preprocess_tokenization_normalization.py \
    --input-src $data_dir/en_pt_preprocessed2.en \
    --input-tgt $data_dir/en_pt_preprocessed2.pt \
    --output-src $data_dir/en_pt_final.en \
    --output-tgt $data_dir/en_pt_final.pt \
    --source-lang en \
    --target-lang pt
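For reference, the Moses-style punctuation normalization and tokenization applied by this step looks roughly like the following. This is a minimal sketch using the sacremoses package (an assumption on our part; install it with pip install sacremoses if it is not already present):

# Illustrative sketch of Moses-style normalization and tokenization on one sentence.
# Assumes the `sacremoses` package is installed (pip install sacremoses).
from sacremoses import MosesPunctNormalizer, MosesTokenizer

normalizer = MosesPunctNormalizer(lang="en")
tokenizer = MosesTokenizer(lang="en")

sentence = "The model's accuracy (on the test set) was “surprisingly” good!"
normalized = normalizer.normalize(sentence)
print(tokenizer.tokenize(normalized, return_str=True))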

d. Training, validation, and test split#

For the last step of data preprocessing, we are going to split our dataset into training, validation, and test sets.
This is an optional step: many datasets already come with such splits, but the Scielo dataset used in this tutorial does not, so we use scikit-learn to create the split.

"""
    Read all final files into memory
"""
def read_data_from_file(filename):
    with open(filename) as f:
        lines = f.readlines()
    return lines
    
en_es_final_en = read_data_from_file(data_dir + "/en_es_final.en")
en_es_final_es = read_data_from_file(data_dir + "/en_es_final.es")
en_pt_final_en = read_data_from_file(data_dir + "/en_pt_final.en")
en_pt_final_pt = read_data_from_file(data_dir + "/en_pt_final.pt")

print("Number of entries in the final Scielo English-Spanish dataset = ", len(en_es_final_en))
print("Number of entries in the final Scielo English-Portugese dataset = ", len(en_pt_final_en))
"""
    Split the dataset into train, test, and val sets using scikit-learn's train_test_split
"""
from sklearn.model_selection import train_test_split

test_ratio = 0.10         # 10% of the full dataset is held out for test
validation_ratio = 0.11   # 11% of the remaining data (roughly 10% of the full dataset) is used for validation
train_ratio = (1.0 - test_ratio) * (1.0 - validation_ratio)  # ~0.80 of the full dataset (for reference; unused below)

en_es_final_en_trainval, en_es_final_en_test, en_es_final_es_trainval, en_es_final_es_test = \
    train_test_split(en_es_final_en, en_es_final_es, test_size=test_ratio, random_state=1)

en_es_final_en_train, en_es_final_en_val, en_es_final_es_train, en_es_final_es_val = \
    train_test_split(en_es_final_en_trainval, en_es_final_es_trainval, test_size=validation_ratio, random_state=1)

en_pt_final_en_trainval, en_pt_final_en_test, en_pt_final_pt_trainval, en_pt_final_pt_test = \
    train_test_split(en_pt_final_en, en_pt_final_pt, test_size=test_ratio, random_state=1)

en_pt_final_en_train, en_pt_final_en_val, en_pt_final_pt_train, en_pt_final_pt_val = \
    train_test_split(en_pt_final_en_trainval, en_pt_final_pt_trainval, test_size=validation_ratio, random_state=1)


print("Number of entries in the final Scielo English-Spanish training, validation and test dataset are {}, {} and {}".format(len(en_es_final_en_train),len(en_es_final_en_val),len(en_es_final_en_test)))
print("Number of entries in the final Scielo English-Portugese training, validation and test dataset are {}, {} and {}".format(len(en_pt_final_en_train),len(en_pt_final_en_val),len(en_pt_final_en_test)))
"""
    Write the train, test and val data into files
"""
en_es_final_en_train_filename = "en_es_final_train.en"
en_es_final_en_val_filename = "en_es_final_val.en"
en_es_final_en_test_filename = "en_es_final_test.en"
en_es_final_es_train_filename = "en_es_final_train.es"
en_es_final_es_val_filename = "en_es_final_val.es"
en_es_final_es_test_filename = "en_es_final_test.es"

en_es_final_en_train_filepath = data_dir + "/" + en_es_final_en_train_filename
en_es_final_en_val_filepath = data_dir + "/" + en_es_final_en_val_filename
en_es_final_en_test_filepath = data_dir + "/" + en_es_final_en_test_filename
en_es_final_es_train_filepath = data_dir + "/" + en_es_final_es_train_filename
en_es_final_es_val_filepath = data_dir + "/" + en_es_final_es_val_filename
en_es_final_es_test_filepath = data_dir + "/" + en_es_final_es_test_filename


en_pt_final_en_train_filename = "en_pt_final_train.en"
en_pt_final_en_val_filename = "en_pt_final_val.en"
en_pt_final_en_test_filename = "en_pt_final_test.en"
en_pt_final_pt_train_filename = "en_pt_final_train.pt"
en_pt_final_pt_val_filename = "en_pt_final_val.pt"
en_pt_final_pt_test_filename = "en_pt_final_test.pt"

en_pt_final_en_train_filepath = data_dir + "/" + en_pt_final_en_train_filename
en_pt_final_en_val_filepath = data_dir + "/" + en_pt_final_en_val_filename
en_pt_final_en_test_filepath = data_dir + "/" + en_pt_final_en_test_filename
en_pt_final_pt_train_filepath = data_dir + "/" + en_pt_final_pt_train_filename
en_pt_final_pt_val_filepath = data_dir + "/" + en_pt_final_pt_val_filename
en_pt_final_pt_test_filepath = data_dir + "/" + en_pt_final_pt_test_filename

def write_data_to_file(data, filename):
    with open(filename, "w") as f:
        for data_entry in data:
            f.write(data_entry)
    
write_data_to_file(en_es_final_en_train, en_es_final_en_train_filepath)
write_data_to_file(en_es_final_en_val, en_es_final_en_val_filepath)
write_data_to_file(en_es_final_en_test, en_es_final_en_test_filepath)
write_data_to_file(en_es_final_es_train, en_es_final_es_train_filepath)
write_data_to_file(en_es_final_es_val, en_es_final_es_val_filepath)
write_data_to_file(en_es_final_es_test, en_es_final_es_test_filepath)  


write_data_to_file(en_pt_final_en_train, en_pt_final_en_train_filepath)
write_data_to_file(en_pt_final_en_val, en_pt_final_en_val_filepath)
write_data_to_file(en_pt_final_en_test, en_pt_final_en_test_filepath)
write_data_to_file(en_pt_final_pt_train, en_pt_final_pt_train_filepath)
write_data_to_file(en_pt_final_pt_val, en_pt_final_pt_val_filepath)
write_data_to_file(en_pt_final_pt_test, en_pt_final_pt_test_filepath)    

Step 3. Fine-tuning the NMT model with NeMo.#

NeMo provides the fine-tuning script needed to fine-tune a multilingual NMT NeMo model. We can use this script to launch training.

We start by downloading the out-of-the-box (OOTB) any-to-English multilingual NMT NeMo model from NGC. This is the model that we will be fine-tuning on the Scielo dataset.

Download the model#

# Create directory to hold model
model_dir = base_dir + "/model"
!mkdir $model_dir

# Download the NMT model from NGC using wget command
!wget -O $model_dir/megatronnmt_any_en_500m_1.0.0.zip --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatronnmt_any_en_500m/versions/1.0.0/zip 

# Unzip the downloaded model zip file.
!unzip $model_dir/megatronnmt_any_en_500m_1.0.0.zip -d $model_dir/pretrained_ckpt

# Alternate way to download the model from NGC using NGC CLI (Please make sure to install and setup NGC CLI):
#!cd $model_dir && ngc registry model download-version "nvidia/nemo/megatronnmt_any_en_500m:1.0.0"

Download Tokenizer#

tokenizer_dir = base_dir + "/tokenizer"
!mkdir $tokenizer_dir

# Download the raw tokenizer model file (the ?raw=true suffix makes GitHub serve the binary file rather than the HTML page)
!wget -O $tokenizer_dir/spm_64k_all_32_langs_plus_en_nomoses.model "https://github.com/aishwaryac-nv/tutorials/blob/aishwaryac/add-nmt-tutorials/nmt_configs/spm_64k_all_32_langs_plus_en_nomoses.model?raw=true"

The NeMo NMT finetuning script exposes a number of parameters:

  • trainer.precision: Type of precision used. In this tutorial, we use 32 (FP32), matching the training command below.

  • trainer.devices: Number of GPUs to allocate for fine-tuning.

  • trainer.max_epochs: The maximum number of epochs to run finetuning for.

  • trainer.max_steps: The maximum number of steps to run finetuning for. max_steps can override max_epochs, as we do in this tutorial.

  • trainer.val_check_interval: This parameter decides the number of training steps to perform before running validation on the entire validation dataset.

  • model.make_vocab_size_divisible_by: Pads the tokenizer vocabulary so that its size is divisible by this value. In our case, the vocabulary size is 64128.

  • model.pretrained_model_path: Path to the local OOTB .nemo model.

  • model.train_ds.tgt_file_name: Path to the training dataset’s target language’s data file(s). In our case, this is a list of files.

  • model.train_ds.src_file_name: Path to the training dataset’s source language’s data file(s). In our case, this is a list of files.

  • model.train_ds.tokens_in_batch: Number of tokens in a single training batch. Note that this is not the number of data entries in a training batch, but the number of tokens.

  • model.validation_ds.tgt_file_name: Path to the validation dataset’s target language’s data file(s). In our case, this is a list of files.

  • model.validation_ds.src_file_name: Path to the validation dataset’s source language’s data file(s). In our case, this is a list of files.

  • model.test_ds.tgt_file_name: Path to the test dataset’s target language data file. This parameter does not accept multiple files in this version, so we pass a single file here; the full evaluation is done separately afterwards (Step 4).

  • model.test_ds.src_file_name: Path to the test dataset’s source language data file. As above, only a single file is supported here.

  • model.encoder_tokenizer.model: Path to the encoder tokenizer model. In this tutorial, it is $tokenizer_dir/spm_64k_all_32_langs_plus_en_nomoses.model.

  • model.decoder_tokenizer.model: Path to the decoder tokenizer model. In this tutorial, it is the same file, $tokenizer_dir/spm_64k_all_32_langs_plus_en_nomoses.model.

  • exp_manager.create_wandb_logger: Set to true if logging with Weights & Biases (wandb); otherwise, this parameter is optional.

  • exp_manager.wandb_logger_kwargs.name: Name of the experiment if using wandb.

  • exp_manager.wandb_logger_kwargs.project: Name of the project if using wandb.

  • exp_manager.resume_if_exists: Set to true to resume training from an existing checkpoint in the experiment directory.

  • exp_manager.exp_dir: Path to the experiment directory, which serves as the working directory for NeMo finetuning.

  • exp_manager.checkpoint_callback_params.monitor: The metric to monitor. Use val_sacreBLEU_avg when fine-tuning on multiple languages (or val_sacreBLEU_es-en when fine-tuning on a single language pair such as es-en).

  • exp_manager.checkpoint_callback_params.mode: The mode (min or max) for the monitored metric; for BLEU we use max.

  • exp_manager.checkpoint_callback_params.save_top_k: Number of best checkpoints to keep.

  • exp_manager.checkpoint_callback_params.save_best_model: Flag to indicate whether the best model (according to the monitored metric) should be saved.

Note: ++model.pretrained_language_list=None: remove this override if you are fine-tuning in the en-to-any direction.

# Formatting to avoid Hydra errors; each file list is passed as a single comma-separated string
train_src_files=[str(en_es_final_es_train_filepath) + ', ' + str(en_pt_final_pt_train_filepath)]
train_tgt_files=[str(en_es_final_en_train_filepath) + ', ' + str(en_pt_final_en_train_filepath)]
val_src_files=[str(en_es_final_es_val_filepath) + ', ' + str(en_pt_final_pt_val_filepath)] 
val_tgt_files=[str(en_es_final_en_val_filepath) + ', ' + str(en_pt_final_en_val_filepath)] 
# Surface full Hydra stack traces if the command below fails
%env HYDRA_FULL_ERROR=1
!python $base_dir/NeMo/examples/nlp/machine_translation/megatron_nmt_training.py \
  trainer.precision=32 \
  trainer.devices=1 \
  trainer.max_epochs=5 \
  trainer.max_steps=200000 \
  trainer.val_check_interval=5000 \
  trainer.log_every_n_steps=5000 \
  ++trainer.replace_sampler_ddp=False \
  model.multilingual=True \
  model.pretrained_model_path=$model_dir/pretrained_ckpt/megatronnmt_any_en_500m.nemo \
  model.micro_batch_size=1 \
  model.global_batch_size=2 \
  model.encoder_tokenizer.library=sentencepiece \
  model.decoder_tokenizer.library=sentencepiece \
  model.encoder_tokenizer.model=$tokenizer_dir/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.decoder_tokenizer.model=$tokenizer_dir/spm_64k_all_32_langs_plus_en_nomoses.model \
  model.src_language=['es, pt'] \
  model.tgt_language=en \
  model.train_ds.src_file_name=$train_src_files \
  model.train_ds.tgt_file_name=$train_tgt_files \
  model.test_ds.src_file_name=$en_es_final_es_test_filepath \
  model.test_ds.tgt_file_name=$en_es_final_en_test_filepath \
  model.validation_ds.src_file_name=$val_src_files \
  model.validation_ds.tgt_file_name=$val_tgt_files \
  model.optim.lr=0.00001 \
  model.train_ds.concat_sampling_probabilities=['0.1, 0.1'] \
  ++model.pretrained_language_list=None \
  +model.optim.sched.warmup_steps=500 \
  ~model.optim.sched.warmup_ratio \
  exp_manager.resume_if_exists=True \
  exp_manager.resume_ignore_no_checkpoint=True \
  exp_manager.create_checkpoint_callback=True \
  exp_manager.checkpoint_callback_params.monitor=val_sacreBLEU_avg \
  exp_manager.checkpoint_callback_params.mode=max \
  exp_manager.checkpoint_callback_params.save_top_k=5 \
  +exp_manager.checkpoint_callback_params.save_best_model=true

Step 4. Evaluate the fine-tuned NMT model with NeMo.#

Now that we have a fine-tuned model, we need to check how well it performs.
We run inference with the NeMo-provided script nmt_transformer_infer_megatron.py on a small subset of the test dataset, first with the OOTB model and then with the fine-tuned model, and then compare the translations from the two models.

The NeMo inference script nmt_transformer_infer_megatron.py supports multiple input parameters, the most important of which are:

  • model_file: Path to the .nemo model to run inference on.

  • srctext: Path to the text file containing newline-separated input samples to run inference on.

  • tgtout: Path to the text file where translations are to be saved

  • source_lang: Source language’s language code.

  • target_lang: Target language’s language code.

  • batch_size: Batch size for inference.

  • trainer.precision: Precision used for inference.

Below, we run inference with this script on both the OOTB and the fine-tuned model.

First, let us create a working directory for evaluation.

eval_dir = base_dir + "/eval"
!mkdir $eval_dir

We pick a small subset of the test data for inference and write it into a file.

infer_input_data_en = en_pt_final_en_test[:10]
infer_input_data_pt = en_pt_final_pt_test[:10]

infer_input_data_pt_filename = "infer_input_data_pt.pt"
infer_input_data_pt_filepath = eval_dir + "/" + infer_input_data_pt_filename

with open(infer_input_data_pt_filepath, "w") as f:
    for infer_input_data_pt_entry in infer_input_data_pt:
        f.write(infer_input_data_pt_entry)

Let us run inference on the NeMo NMT OOTB model.

infer_ootbmodel_output_data_en_filename = "infer_ootbmodel_output_data_en.en"
infer_ootbmodel_output_data_en_filepath = eval_dir + "/" + infer_ootbmodel_output_data_en_filename

!python $base_dir/NeMo/examples/nlp/machine_translation/nmt_transformer_infer_megatron.py \
    model_file=$model_dir/pretrained_ckpt/megatronnmt_any_en_500m.nemo \
    srctext=$infer_input_data_pt_filepath \
    tgtout=$infer_ootbmodel_output_data_en_filepath \
    source_lang=pt \
    target_lang=en \
    batch_size=10 \
    trainer.precision=32

Now we run inference on the fine-tuned NeMo NMT model.
Note: Be sure to set the model_file parameter below to point to the fine-tuned .nemo checkpoint, which is saved in the fine-tuning experiment directory (by default, under nemo_experiments/megatron_nmt/checkpoints/).

infer_finetuned_output_data_en_filename = "infer_finetuned_output_data_en.en"
infer_finetuned_output_data_en_filepath = eval_dir + "/" + infer_finetuned_output_data_en_filename

!python $base_dir/NeMo/examples/nlp/machine_translation/nmt_transformer_infer_megatron.py \
    model_file=$model_dir/pretrained_ckpt/megatronnmt_any_en_500m.nemo \
    srctext=$infer_input_data_pt_filepath \
    tgtout=$infer_finetuned_output_data_en_filepath \
    source_lang=pt \
    target_lang=en \
    batch_size=10 \
    trainer.precision=32

Let us display the translations from both the OOTB and fine-tuned models for our inference test subset. Since we ran inference on only 10 examples, this is just a qualitative comparison; for a larger test set, you can compute BLEU scores instead, as shown in the sketch after the comparison below.

with open(infer_ootbmodel_output_data_en_filepath) as f:
    infer_ootbmodel_output_data_en = f.readlines()

with open(infer_finetuned_output_data_en_filepath) as f:
    infer_finetuned_output_data_en = f.readlines()
    
for infer_input_data_pt_entry, infer_input_data_en_entry, infer_ootbmodel_output_data_en_entry, infer_finetuned_output_data_en_entry in \
    zip(infer_input_data_pt, infer_input_data_en, infer_ootbmodel_output_data_en, infer_finetuned_output_data_en):
    print("Portuguese: ", infer_input_data_pt_entry)
    print("Portuguese-English Translation - Ground Truth: ", infer_input_data_en_entry)
    print("Portuguese-English Translation - OOTB model Generated:     ", infer_ootbmodel_output_data_en_entry)
    print("Portuguese-English Translation - Finetuned model Generated:", infer_finetuned_output_data_en_entry)
    print("------------------------")

Step 5. Exporting the NeMo model#

NeMo and Riva allow you to export your fine-tuned model in a format that can be deployed using NVIDIA Riva, a highly performant application framework for multi-modal conversational AI services using GPUs.

Export to Riva#

Riva provides the nemo2riva tool, which can be used to convert a .nemo model to a .riva model. This tool is available through the Riva Quick Start Guide and was installed during the Requirements and Setup step above. Update the path below to point to your custom fine-tuned model; by default, it is saved as nemo_experiments/megatron_nmt/checkpoints/megatron_nmt.nemo.
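If you are unsure where the fine-tuned checkpoint was written, a small glob over the experiment directory can help locate it. This is a sketch that assumes the default experiment directory; adjust the path if you set exp_manager.exp_dir to something else:

# List .nemo checkpoints produced by the fine-tuning run (default experiment directory assumed).
import glob
print(glob.glob("nemo_experiments/megatron_nmt/**/*.nemo", recursive=True))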

!nemo2riva --out $model_dir/megatronnmt_custom_any_en_500m.riva <saved_custom_nemo_model_path>

Step 6. Deploying the fine-tuned NeMo NMT model on the Riva Speech Skills server.#

The NeMo-finetuned NMT model needs to be deployed on the Riva Speech Skills server for inference.
Please follow the “How to deploy a NeMo-finetuned NMT model on Riva Speech Skills server?” tutorial from the Riva NMT Tutorials; that notebook covers deploying the .riva file obtained in Step 5 on the Riva Speech Skills server.