Overview

Jarvis handles deployments of full pipelines, which can be composed of one or more NVIDIA Transfer Learning Toolkit (TLT) models and other pre-/post-processing components. Additionally, the TLT models have to be exported to an efficient inference engine and optimized for the target platform. Therefore, the Jarvis server cannot use NVIDIA NeMo or TLT models directly because they represent only a single model.

The process of gathering all the required artifacts (for example, models, files, configurations, and user settings) and generating the inference engines will be referred to as the Jarvis model repository generation. The Jarvis ServiceMaker Docker image has all the tools necessary to generate the Jarvis model repository and can be pulled from NGC as follows:

docker pull nvcr.io/nvidia/jarvis/jarvis-speech:1.1.0-beta-servicemaker

The Jarvis model repository generation is done in two phases:

Phase 1: The build phase. During the build phase, all the necessary artifacts (models, files, configurations, and user settings) required to deploy a Jarvis service are gathered together into an intermediate file called JMIR (Jarvis Model Intermediate Representation). For more information, continue to the next section.

Phase 2: The deploy phase. During the deploy phase, the JMIR file is converted into the Jarvis model repository and the neural networks in TLT or NeMo format are exported and optimized to run on the target platform. The deploy phase should be executed on the physical cluster on which the Jarvis server will be deployed. For more information, refer to the Jarvis Deploy section.
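
For example, for a single ASR model, the two phases map onto one jarvis-build call followed by one jarvis-deploy call, both described in detail later in this section. The file names and encryption key below are placeholders:

# Build phase: package the .ejrvs model and its configuration into a JMIR file.
jarvis-build speech_recognition /servicemaker-dev/asr.jmir:<encryption_key> /servicemaker-dev/jasper.ejrvs:<encryption_key>

# Deploy phase: convert the JMIR file into a Jarvis model repository on the target platform.
jarvis-deploy /servicemaker-dev/asr.jmir:<encryption_key> /data/models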

Jarvis ServiceMaker workflow

Model Development with NeMo

NeMo is an open source PyTorch-based toolkit for research in conversational AI. While TLT is the recommended path for typical users of Jarvis, some developers may prefer to use NeMo because it exposes more of the model and PyTorch internals. Jarvis supports the ability to import models trained in NeMo.

For more information, refer to the NeMo project page.

Model Development with TLT

Models trained with frameworks such as NVIDIA NeMo and the NVIDIA Transfer Learning Toolkit (TLT) normally have the .nemo and .tlt formats, respectively. To use those models in Jarvis, users need to convert the model checkpoints to the .ejrvs format for building and deploying with Jarvis ServiceMaker.

TLT export to Jarvis

Note

If you trained your model with the recent NeMo release (1.0.0b4), you can directly use tlt export from the TLT launcher to export the NeMo models to the format required by Jarvis. For older NeMo releases (1.0.0b1-b3), this export path might work for some but not all models; use it at your own risk. For even older NeMo releases (before 1.0.0b1), it will not work due to missing artifacts that Jarvis ServiceMaker requires.

TLT Export for NeMo/TLT

  1. Follow the TLT Launcher Quick Start Guide instructions to set up.

  2. Configure the TLT launcher. The TLT launcher uses Docker containers for training and export tasks. The launcher instance can be configured in the ~/.tlt_mounts.json file. Configuration requires mounting at least three separate directories where data, specification files, and results are stored. A sample is provided below.

    {
        "Mounts":[
            {
                "source": "~/tlt/data",
                "destination": "/data"
            },
            {
                "source": "~/tlt/specs",
                "destination": "/specs"
            },
            {
                "source": "~/tlt/results",
                "destination": "/results"
            },
            {
                "source": "~/.cache",
                "destination": "/root/.cache"
            }
        ],
        "DockerOptions":{
            "shm_size": "16G",
            "ulimits": {
                "memlock": -1,
                "stack": 67108864
            }
        }
    }
    
  3. Convert either NeMo or TLT checkpoints to the Jarvis format using tlt export. The example below demonstrates exporting a Jasper model trained in NeMo, where:

    • -m is used to specify the Jasper model checkpoints location

    • -e is used to specify the path to an experiment spec file

    • -r indicates where the experiment results (logs, output, model checkpoints, etc.) are stored

    tlt speech_to_text export -m /data/asr/jasper.tlt -e /specs/asr/speech_to_text/export.yaml -r /results/asr/speech_to_text/
    

    Here is an example experiment spec file (export.yaml):

    # Path and name of the input .nemo/.tlt archive to be loaded/exported.
    restore_from: /data/asr/jasper.tlt
    
    # Name of output file (will land in the folder pointed by -r)
    export_to: jasper.ejrvs
    

    Note that TLT comes with default experiment spec files that can be pulled by calling:

    tlt speech_to_text download_specs -o /specs/asr/speech_to_text/ -r /results/asr/speech_to_text/download_specs/
    

Besides speech_to_text from the ASR domain, TLT also supports several conversational AI tasks from the NLP domain:

  • intent_slot_classification

  • punctuation_and_capitalization

  • question_answering

  • text_classification

  • token_classification

More details can be found by running tlt --help.
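
For example, exporting a model for one of the NLP tasks would presumably follow the same pattern as the speech_to_text example above; the model name and paths below are illustrative placeholders, not taken from the original guide:

tlt question_answering export -m /data/nlp/qa.tlt -e /specs/nlp/question_answering/export.yaml -r /results/nlp/question_answering/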

Jarvis Build

The jarvis-build tool is responsible for deployment preparation. Its only output is an intermediate format (called a JMIR) of an end-to-end pipeline for the supported services within Jarvis. This tool can take several different types of models as input. Currently, the following pipelines are supported:

  • speech_recognition (for ASR)

  • speech_synthesis (for TTS)

  • qa (for question answering)

  • token_classification (for token level classification, for example, Named Entity Recognition)

  • intent_slot (for joint intent and slot classification)

  • text_classification

  • punctuation

  1. Launch an interactive session inside the Jarvis ServiceMaker image.

    docker run --gpus all -it --rm -v <artifact_dir>:/servicemaker-dev -v <jarvis_repo_dir>:/data --entrypoint="/bin/bash" nvcr.io/nvidia/jarvis/jarvis-speech:1.1.0-beta-servicemaker
    

    where:

    • <artifact_dir> is the folder or Docker volume that contains the Jarvis .ejrvs file and other artifacts required to prepare the Jarvis model repository.

    • <jarvis_repo_dir> is the folder or Docker volume where the Jarvis model repository is generated.

  2. Run the jarvis-build command from within the container (a complete example follows the parameter descriptions below).

    jarvis-build <pipeline> /servicemaker-dev/<jmir_filename>:<encryption_key> /servicemaker-dev/<ejrvs_filename>:<encryption_key> <optional_args>
    

    where:

    • <pipeline> must be one of the following:

      • speech_recognition

      • speech_synthesis

      • qa

      • token_classification

      • intent_slot

      • text_classification

      • punctuation

    • <jmir_filename> is the name of the JMIR file that will be generated.

    • <ejrvs_filename> is the name of the .ejrvs file(s) to use as input.

    • <optional_args> are optional arguments that can be used to configure the Jarvis service. The next section covers the different ways the ASR, NLP, and TTS services can be configured.

    • <encryption_key> is optional. In the case where the .ejrvs file was generated without an encryption key, the input/output files can be specified with <ejrvs_filename> instead of <ejrvs_filename>:<encryption_key>.
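
For example, with an ASR model, steps 1 and 2 above might look as follows; the host directories, file names, and encryption key are placeholders to be replaced with your own values:

docker run --gpus all -it --rm -v ~/jarvis-artifacts:/servicemaker-dev -v ~/jarvis-model-repo:/data --entrypoint="/bin/bash" nvcr.io/nvidia/jarvis/jarvis-speech:1.1.0-beta-servicemaker

jarvis-build speech_recognition /servicemaker-dev/asr.jmir:<encryption_key> /servicemaker-dev/jasper.ejrvs:<encryption_key>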

Jarvis Deploy

The jarvis-deploy tool takes as input one or more Jarvis Model Intermediate Representation (JMIR) files and a target model repository directory. It is responsible for performing the following functions:

Function 1: Adds the Triton Inference Server custom backends for pre- and post-processing specifically for the given model.

Function 2: Generates the TensorRT engine for the input model.

Function 3: Generates the Triton Inference Server configuration files for each of the modules (pre-processing, post-processing, and the model).

Function 4: Creates an ensemble configuration specifying the execution pipeline and finally writes all those assets to the output model repository directory.

The Jarvis model repository can be generated from the Jarvis .jmir file(s) with the following command:

jarvis-deploy /servicemaker-dev/<jmir_filename>:<encryption_key> /data/models

NeMo to Jarvis

There is a dedicated version of the NVIDIA NeMo toolkit (1.0.0b4) that supports Jarvis. In this release, you can export selected models (Jasper and QuartzNet only) directly from NeMo.

  1. Export your .nemo file to the .enemo format using the convasr_to_enemo.py script.

    python convasr_to_enemo.py --nemo_file=/NeMo/QuartzNet15x5Base-En.nemo --onnx_file=output/quartz.onnx  --enemo_file=/NeMo/quartznet_asr.enemo
    
  2. Follow the Jarvis Build documentation, using quartznet_asr.enemo instead of an .ejrvs file for the build phase, as sketched below.
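
    For instance, the build command from the Jarvis Build section would then take the .enemo file as its input. Since the export command above does not specify an encryption key, the key suffix can be omitted; the JMIR file name is a placeholder:

    jarvis-build speech_recognition /servicemaker-dev/quartznet_asr.jmir /servicemaker-dev/quartznet_asr.enemo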