Overview¶
Riva handles deployments of full pipelines, which can be composed of one or more NVIDIA TAO Toolkit models and other pre-/post-processing components. Additionally, TAO Toolkit models have to be exported to an efficient inference engine and optimized for the target platform. For these reasons, the Riva server cannot use NVIDIA NeMo or TAO Toolkit models directly, because each of them represents only a single model.
The process of gathering all the required artifacts (for example, models, files, configurations, and user settings) and generating the inference engines is referred to as Riva model repository generation. The Riva ServiceMaker Docker image has all the tools necessary to generate the Riva model repository and can be pulled from NGC as follows:
docker pull nvcr.io/nvidia/riva/riva-speech:1.6.0-beta-servicemaker
The Riva model repository generation is done in two phases:
Phase 1: The build phase. During the build phase, all the necessary artifacts (models, files, configurations, and user settings) required to deploy a Riva service are gathered together into an intermediate file called RMIR (Riva Model Intermediate Representation). For more information, continue to the next section.
Phase 2: The deploy phase. During the deploy phase, the RMIR file is converted into the Riva model repository and the neural networks in TAO Toolkit or NeMo format are exported and optimized to run on the target platform. The deploy phase should be executed on the physical cluster on which the Riva server will be deployed. For more information, refer to the Riva Deploy section.
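Together, the two phases correspond to the riva-build and riva-deploy commands described later in this section. As a rough sketch, the end-to-end flow for an ASR pipeline might look like the following; the file names (asr.riva, asr.rmir) and the encryption key are placeholders for illustration only.
# Phase 1 (build): gather artifacts into an intermediate RMIR file
riva-build speech_recognition /servicemaker-dev/asr.rmir:my_encryption_key /servicemaker-dev/asr.riva:my_encryption_key
# Phase 2 (deploy): convert the RMIR file into the Riva model repository on the target platform
riva-deploy /servicemaker-dev/asr.rmir:my_encryption_key /data/models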

Model Development with TAO Toolkit¶
Models trained with NVIDIA TAO Toolkit normally have the .tao format. To use these models in Riva, users need to convert the model checkpoints to the .riva format for building and deploying with Riva ServiceMaker, using the tao export subcommand.

TAO Export for TAO Toolkit¶
Follow the TAO Toolkit Launcher Quick Start Guide instructions to set up TAO Toolkit.
Configure the TAO Toolkit launcher. The TAO Toolkit launcher uses Docker containers for training and export tasks. The launcher instance can be configured in the ~/.tao_mounts.json file. Configuration requires mounting at least three separate directories where data, specification files, and results are stored. A sample is provided below.

{
    "Mounts": [
        {
            "source": "~/tao/data",
            "destination": "/data"
        },
        {
            "source": "~/tao/specs",
            "destination": "/specs"
        },
        {
            "source": "~/tao/results",
            "destination": "/results"
        },
        {
            "source": "~/.cache",
            "destination": "/root/.cache"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        }
    }
}
Convert the TAO Toolkit checkpoints to the Riva format using tao … export. The example below demonstrates exporting a Jasper model trained in NeMo, where:

- -m is used to specify the Jasper model checkpoints location
- -e is used to specify the path to an experiment spec file
- -r indicates where the experiment results (logs, output, model checkpoints, etc.) are stored

tao speech_to_text export -m /data/asr/jasper.tao -e /specs/asr/speech_to_text/export.yaml -r /results/asr/speech_to_text/
Here is an example experiment spec file (export.yaml):

# Path and name of the input .nemo/.tao archive to be loaded/exported.
restore_from: /data/asr/jasper.tao
# Name of output file (will land in the folder pointed to by -r)
export_to: jasper.riva
Note that TAO Toolkit comes with default experiment spec files that can be pulled by calling:
tao speech_to_text download_specs -o /specs/asr/speech_to_text/ -r /results/asr/speech_to_text/download_specs/
Besides speech_to_text from the ASR domain, TAO Toolkit also supports several conversational AI tasks from the NLP domain:

- intent_slot_classification
- punctuation_and_capitalization
- question_answering
- text_classification
- token_classification
More details can be found by running tao --help.
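For example, an NLP model such as a punctuation and capitalization checkpoint can be exported with the same pattern. The paths and file names below are illustrative only and assume the directory mounts from the sample ~/.tao_mounts.json shown earlier.

tao punctuation_and_capitalization export -m /data/nlp/punctuation.tao -e /specs/nlp/punctuation_and_capitalization/export.yaml -r /results/nlp/punctuation_and_capitalization/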
Model Development with NeMo¶
NeMo is an open source PyTorch-based toolkit for research in conversational AI. While TAO Toolkit is the recommended path for typical users of Riva, some developers may prefer to use NeMo because it exposes more of the model and PyTorch internals. Riva supports the ability to import models trained in NeMo.
For more information, refer to the NeMo project page.
NeMo2Riva Export for NeMo¶
Models trained in NVIDIA NeMo have the .nemo format. To use these models in Riva, users need to convert the model checkpoints to the .riva format for building and deploying with Riva ServiceMaker, using the nemo2riva tool. The nemo2riva tool is currently packaged and available via Riva Quickstart.
Follow the NeMo installation instructions to set up a NeMo environment, version 1.1.0 or greater. From within your NeMo 1.1.0 environment, run:
pip3 install nvidia-pyindex
pip3 install nemo2riva-1.3.0_beta-py3-none-any.whl
nemo2riva --out /NeMo/<MODEL_NAME>.riva /NeMo/<MODEL_NAME>.nemo
To export the HiFi-GAN model from NeMo to the Riva format, run the following command after configuring the NeMo environment:
nemo2riva --out /NeMo/hifi.riva /NeMo/tts_hifigan.nemo
For additional information and usage, run:
nemo2riva --help
Usage:
nemo2riva [-h] [--out OUT] [--validate] [--schema SCHEMA]
[--format FORMAT] [--verbose VERBOSE] [--key KEY] source
When converting NeMo models to the Riva EFF input format, passing the input .nemo file as a parameter creates a .riva file next to the .nemo input.

If no --format is passed, the Riva-preferred format for the supplied model architecture is selected automatically. The format can also be derived from a schema if the --schema argument is supplied, or if nemo2riva is able to find the schema for the NeMo model among known models; there is a set of YAML files in the nemo2riva/validation_schemas directory, or you can add your own.

If the --key argument is passed, the model graph in the output EFF file is encrypted with that key.
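For example, to force an ONNX export and encrypt the resulting model graph, the two flags can be combined as shown below; the key value is a placeholder and <MODEL_NAME> stands for your own model file.

nemo2riva --out /NeMo/<MODEL_NAME>.riva --format ONNX --key my_encryption_key /NeMo/<MODEL_NAME>.nemo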
positional arguments:
- source: Source .nemo file

optional arguments:
- -h, --help: Show this help message and exit
- --out OUT: Location to write resulting Riva EFF input to (default: None)
- --validate: Validate using schemas (default: False)
- --schema SCHEMA: Schema file to use for validation (default: None)
- --format FORMAT: Force specific export format: ONNX|TS|CKPT (default: None)
- --verbose VERBOSE: Verbose level for logging, numeric (default: None)
- --key KEY: Encryption key or file, default is None (default: None)
Riva Build¶
The riva-build tool is responsible for deployment preparation. Its only output is an intermediate format (called an RMIR) of an end-to-end pipeline for the supported services within Riva. This tool can take multiple different types of models as inputs. Currently, the following pipelines are supported:
- speech_recognition (for ASR)
- speech_synthesis (for TTS)
- qa (for question answering)
- token_classification (for token-level classification, for example, Named Entity Recognition)
- intent_slot (for joint intent and slot classification)
- text_classification
- punctuation
Launch an interactive session inside the Riva ServiceMaker image.
docker run --gpus all -it --rm -v <artifact_dir>:/servicemaker-dev -v <riva_repo_dir>:/data --entrypoint="/bin/bash" nvcr.io/nvidia/riva/riva-speech:1.6.0-beta-servicemaker
where:

- <artifact_dir> is the folder or Docker volume that contains the .riva file and other artifacts required to prepare the Riva model repository.
- <riva_repo_dir> is the folder or Docker volume where the Riva model repository is generated.
Run the riva-build command from within the container:

riva-build <pipeline> /servicemaker-dev/<rmir_filename>:<encryption_key> /servicemaker-dev/<riva_filename>:<encryption_key> <optional_args>
where:

- <pipeline> must be one of the following:
  - speech_recognition
  - speech_synthesis
  - qa
  - token_classification
  - intent_slot
  - text_classification
  - punctuation
- <rmir_filename> is the name of the RMIR file that will be generated.
- <riva_filename> is the name of the riva file(s) to use as input.
- <args> are optional arguments that can be used to configure the Riva service. The following section covers the different ways the ASR, NLP, and TTS services can be configured.
- <encryption_key> is optional. In the case where the .riva file was generated without an encryption key, the input/output files can be specified with <riva_filename> instead of <riva_filename>:<encryption_key>.
By default, if a file named <rmir_filename> already exists, it will not be overwritten. To force the <rmir_filename> to be overwritten, one can use the -f or --force argument. For example, riva-build <pipeline> -f ...
Riva Deploy¶
The riva-deploy tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It is responsible for performing the following functions:
Function 1: Adds the Triton Inference Server custom backends for pre- and post-processing specifically for the given model.
Function 2: Generates the TensorRT engine for the input model.
Function 3: Generates the Triton Inference Server configuration files for each of the modules (pre-processing, post-processing, and the model).
Function 4: Creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.
The Riva model repository can be generated from the Riva .rmir file(s) with the following command:
riva-deploy /servicemaker-dev/<rmir_filename>:<encryption_key> /data/models
By default, if the destination folder (/data/models/ in the example above) already exists, it will not be overwritten. To force the destination folder to be overwritten, one can use the -f or --force parameter. For example, riva-deploy -f ...