Quick Start Guide

Abstract: This NVIDIA Jarvis 0.2 EA Quick Start Guide is intended for system administrators and software developers who want to learn about the Jarvis AI Services deployment process. This guide explains the components of the system and enumerates the steps required to deploy on a single workstation or server, as well as how to deploy in a production capacity using Kubernetes on an NVIDIA GPU-enabled compute cluster.

Overview

The Jarvis API Services server deployment is made up of three components:

  • Jarvis Speech Server - middleman server that provides APIs for the speech-related Jarvis AI Services: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS). Users make requests directly to the Jarvis Speech Server.

  • Jarvis Vision Server - middleman server that provides APIs for the vision-related Jarvis AI Services.

  • Triton Inference Server - serves the deployed deep learning models; the Jarvis Speech and Vision servers forward inference requests to it.

The remainder of this document describes how to set up each component individually for local deployment, starting with the configuration of the model repository. The end of the document addresses deployment on Kubernetes using a Helm chart.

Prerequisites

Before you begin using Jarvis AI Services, ensure that you meet the following prerequisites.

  1. You have access and are logged into NVIDIA GPU Cloud (NGC). For step-by-step instructions, see the NGC Getting Started Guide.

  2. You have access to a Turing T4 or Volta V100 GPU. For more information, see the Jarvis Services Support Matrix.

  3. You have Docker installed with support for NVIDIA GPUs.

    a. For DGX users, see Preparing to use NVIDIA Containers.

    b. For users other than DGX, Docker >= 19.03 is required. A quick way to verify that Docker can access your GPU is shown below.
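
For example, running nvidia-smi inside a CUDA base container should display your GPU (the image and tag below are illustrative; any CUDA container image available to you will do):

# Image tag is illustrative -- substitute any CUDA image present on your system.
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi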

Models Available For Deployment

There are two ways you can deploy Jarvis Services:

  • You can use our quick start scripts to set up the servers and deploy the Jarvis services using Docker by following the steps in this chapter.

  • You can use a Helm chart. Included in the NGC Helm Repository is a chart designed to automate the steps for push-button deployment to a Kubernetes cluster.

When using either of the push-button deployment options, Jarvis uses our pre-trained models from NGC. You can also fine-tune custom models with Neural Modules (NeMo). Creating a model repository from a fine-tuned NeMo model is a more advanced workflow. For step-by-step instructions, refer to the applicable Jarvis AI Services User Guide: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), or Text-to-Speech (TTS).

Automatic Speech Recognition

For Jarvis 0.2 EA, we provide pre-trained English models that perform offline and streaming recognition. The Jarvis ASR service can automatically add punctuation, output word timestamps, and return top-n transcripts.

Natural Language Processing

Jarvis 0.2 EA contains a variety of pre-trained models to demonstrate the NLP APIs, including sequence & token classification, intent classification models for different query domains, and a sequence domain classifier. All of the NLP models are intended to be examples of potential use cases and are all optional.

Text-to-Speech

Jarvis 0.2 EA contains a state-of-the-art speech synthesis model based on Tacotron 2 and WaveGlow, implemented as a Triton custom backend for performance. Two versions of the TTS model are available: one optimized for batch inference and one optimized for streaming inference. If both are deployed, the Jarvis Speech API server chooses between them based on whether the client requests batch or online inference.

Local Deployment Using Quick Start Scripts

This release of Jarvis includes quick start scripts to help you get started with Jarvis Services. These scripts are meant for deploying the services locally for testing and for running our example applications. The scripts can be downloaded from the File Browser tab for Jarvis Quick Start, or via the command line with the NGC CLI tool by running:

ngc registry model-script download-version ea-2-jarvis/jarvis_quickstart:ea2

Configuring

General configuration for the services deployed using the quick start script is done by editing the file config.sh. By default, the configuration file is set to launch all available services on the supported GPU (T4 or V100) which is selected automatically based on the GPUs available in the system.

Important: By default, the Jarvis Speech Services API server will listen on port 50051.

All of the configuration options are documented within the configuration file itself. Follow the instructions in the config.sh file to change the default deployment behavior of the script. Advanced users can select which specific models to deploy for each service by commenting out the lines corresponding to the pre-built model configuration files.
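
As a sketch, the pattern looks like the following hypothetical excerpt (the variable and model names below are placeholders; use the actual entries listed in your config.sh):

# Placeholder names -- use the entries from your config.sh.
# Comment out a model's line to skip deploying it.
models_nlp=(
    "jarvis_intent_example"
#    "jarvis_ner_example"    # commented out: not deployed
)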

Downloading Required Models and Containers From NGC

The jarvis_init.sh script downloads all required models and containers from NGC and generates the model repository. You will need to provide an NGC API key for it to work. The key can be provided either through the environment variable NGC_API_KEY or as a configuration file (which is automatically generated by running ngc config set).

If the NGC key cannot be automatically discovered from your environment, the init script will prompt you to enter it.
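
For example, either of the following makes the key discoverable (replace the placeholder with your actual key):

# Option 1: export the key so jarvis_init.sh can read it from the environment
export NGC_API_KEY=<your_ngc_api_key>

# Option 2: store credentials with the NGC CLI, which prompts for the key
ngc config set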

Run the script with the command bash jarvis_init.sh. Upon successful completion of this command, users should see the following output:

Logging into NGC Docker registry if necessary...
Pulling required Docker images if necessary...
 > Pulling Jarvis Speech Server images.
 > Pulling nvcr.io/ea-2-jarvis/jarvis-speech:ea2. This may take some time...
Jarvis initialization complete. Run bash jarvis_start.sh to launch services.

Launching the Servers and Client Container

After downloading the required models and containers, the Jarvis Services servers can be started by running bash jarvis_start.sh. This will launch a Triton container to serve the models and the Jarvis Services API servers for speech and vision.

Example output:

Starting Triton container
 > Waiting for Triton server to load all models...retrying in 10 seconds
 > Waiting for Triton server to load all models...retrying in 10 seconds
 > Waiting for Triton server to load all models...retrying in 10 seconds
 > Triton server is ready...
Starting Jarvis Speech Services
Starting Jarvis Vision Services

To verify that the servers have started correctly, check that the output of docker logs jarvis-triton shows:

I0428 03:14:46.464529 1 grpc_server.cc:1973] Started GRPCService at 0.0.0.0:8001
I0428 03:14:46.464569 1 http_server.cc:1443] Starting HTTPService at 0.0.0.0:8000
I0428 03:14:46.507043 1 http_server.cc:1458] Starting Metrics Service at 0.0.0.0:8002

and that the output of docker logs jarvis-speech shows:

I0428 03:14:50.440943 1 jarvis_server.cc:66] TTS Server connected to Triton Inference Server at jarvis-triton:8001
I0428 03:14:50.440943 1 jarvis_server.cc:66] NLP Server connected to Triton Inference Server at jarvis-triton:8001
I0428 03:14:50.440951 1 jarvis_server.cc:68] ASR Server connected to Triton Inference Server at jarvis-triton:8001
I0428 03:14:50.440955 1 jarvis_server.cc:71] Jarvis Conversational AI Server listening on 0.0.0.0:50051

To start a container with sample clients for each service, run bash jarvis_start_client.sh. From inside the client container, users can try the different services using the provided Jupyter notebooks by simply running:

jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks

Stopping

To shut down the Jarvis Services server containers, run bash jarvis_stop.sh.

Removing

To clean up the local Jarvis installation, run bash jarvis_destroy.sh. This will stop and remove all Jarvis-related containers, as well as delete the Docker volume used to store model files. The Docker images themselves will not be removed.

Using Helm To Deploy Jarvis AI Services on Kubernetes

Included in the NGC Helm Repository is a chart designed to automate the steps for push-button deployment to a Kubernetes cluster.

The Jarvis AI Services Helm Chart can be used to deploy ASR, NLP, TTS, and Vision services automatically. The Helm chart performs a number of functions:

  • Pulls Docker images from NGC for Jarvis Speech Server, Jarvis Vision Server, Triton Inference Server, and utility containers for downloading and converting models.

  • Downloads requested model artifacts from NGC as configured in the values.yaml file.

  • Generates the Triton Inference Server model repository.

  • Starts the Triton Inference Server with the appropriate configuration.

  • Starts the Jarvis Speech and/or Vision Servers as configured in a Kubernetes pod.

  • Exposes the Triton Inference Server and Jarvis Servers as Kubernetes services.

Example pre-trained models are released with Jarvis for each of the services. The Helm chart comes pre-configured for downloading and deploying all of these models.

Note: The Helm chart configuration can be modified for your use case by modifying the values.yaml file. In this file, you can change settings related to which models to deploy, where to store them, and how to expose the services.

  1. To download and modify the Helm chart for your use, fetch it from NGC:

    export NGC_API_KEY=<ngc_api_key>
    
    helm fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2-ea.tgz \
        --username=\$oauthtoken --password=$NGC_API_KEY --untar
    

    The result of the above operation will be a new directory called jarvis-api in your current working directory. Within that directory is a values.yaml file which can be modified to suit your use case (see Kubernetes Secrets and Jarvis Settings).

  2. After the values.yaml file has been updated to reflect the deployment requirements, Jarvis can be deployed to the Kubernetes cluster:

    helm install jarvis-api
    
  3. Configure the deployment. The following sections point out a few key areas of the values.yaml file and considerations for deployment. Consult the individual service documentation for more details, as well as the Helm chart’s values.yaml file, which contains inline comments explaining the configuration options.

Kubernetes Secrets

The Helm deployment uses two Kubernetes secrets to obtain access to NGC: one for Docker images, and another for model artifacts. By default, these are named jarvis-ea-regcred and jarvis-ngc-read, respectively. The names of the secrets can be modified in the values.yaml file.

Docker images

  1. Set up a secret for pulling containers from NGC by logging in to NGC with Docker on your local machine:

    docker login nvcr.io
    > $oauthtoken
    > $NGC_API_KEY
    
  2. Configure the jarvis-ea-regcred secret based on your local machine’s Docker credentials:

    kubectl create secret generic jarvis-ea-regcred \
        --from-file=.dockerconfigjson=$HOME/.docker/config.json \
        --type=kubernetes.io/dockerconfigjson
    

Model artifacts

Set up a secret which contains your NGC API Key (for pulling model artifacts) by running:

kubectl create secret generic jarvis-ngc-read --from-literal=key=$NGC_API_KEY

assuming your NGC API Key is set as the $NGC_API_KEY environment variable.
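
To confirm that both secrets exist before installing the chart (the names below are the defaults; adjust if you renamed them in values.yaml), run:

kubectl get secret jarvis-ea-regcred jarvis-ngc-read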

Jarvis Settings

The values.yaml for Jarvis is intended to provide maximal flexibility in deployment configurations.

The replicaCount field is used to configure the number of identical instances (or pods) of the services that are deployed. When load-balanced appropriately, increasing this number (as resources permit) will enable horizontal scaling for increased load.

Individual speech services (ASR, NLP, or TTS) may be disabled by changing the jarvis.speechServices.[asr|nlp|tts] key to false.

Prebuilt models not required for your deployment can be deleted from the list in modelRepoGenerator.ngcModelConfigs. NVIDIA recommends removing unused models and disabling unused services to reduce deployment time and GPU memory usage.
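
As a sketch, the keys above can also be overridden at install time with Helm's --set flag instead of editing values.yaml (the values shown are illustrative):

# Values are illustrative; key paths are those described in this guide.
helm install jarvis-api \
    --set replicaCount=2 \
    --set jarvis.speechServices.tts=false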

By default, models are downloaded from NGC, optimized for TensorRT (if necessary) before the service starts, and stored in an ephemeral location. When the pod terminates, these model artifacts are deleted and the storage is freed for other workloads. This behavior is controlled by the modelDeployVolume field and its default value emptyDir: {}. See the Kubernetes Volumes documentation for alternative options that can be used for persistent storage.

Note:

  • Persistent storage should only be used in homogeneous deployments where the GPU models are identical.

  • Currently provided models nearly fill a T4’s memory. We recommend running a subset of models/services if using a single GPU.

Running The Jarvis Client And Transcribing Audio Files

For ASR, the following commands can be run from inside the Jarvis client container to perform streaming and offline transcription of audio files.

  1. For offline recognition, run:

    /usr/local/bin/jarvis_asr_client --audio_file=/work/wav/test/1272-135031-0000.wav
    
  2. For streaming recognition, run:

    /usr/local/bin/jarvis_streaming_asr_client --audio_file=/work/wav/test/1272-135031-0000.wav
    

Running The Jarvis Client And Converting Text To Audio Files

From within the Jarvis Client container, synthesize the audio files by running:

jarvis_tts_client --voice_name=ljspeech --text="Hello, this is a speech synthesizer." \
    --audio_file=/work/wav/output.wav

The audio files are stored in the /work/wav directory.

The streaming API can be tested by using the command-line option --online=true. However, there is no perceptible difference between the two options with the command-line client, since it saves the entire audio to a WAV file.
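
For example, combining the flags above (the output filename is illustrative):

# Output filename is illustrative.
jarvis_tts_client --online=true --voice_name=ljspeech \
    --text="Hello, this is a speech synthesizer." \
    --audio_file=/work/wav/output_online.wav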

Integrating Jarvis AI Services Into Your Own Application

All Jarvis AI Services described in this document are exposed using gRPC to maximize compatibility with existing software infrastructure and ease integration. gRPC officially supports twelve languages, including C++, Java, Python, and Golang, with unofficial support for many others.

The gRPC services and messages/data structures are defined with protocol buffer definition files, which are packaged with this release. Using these files, you can generate Jarvis AI Services bindings to any supported language of your choice. The generated code can be compiled into your application, with the only additional dependency being the gRPC library.
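
As a sketch, Python bindings can be generated from the packaged definition files with the standard gRPC tools. The directory and .proto file names below are assumptions; substitute the paths of the files shipped with this release:

# File and directory names are assumptions -- use the .proto files from this release.
pip install grpcio-tools
python -m grpc_tools.protoc -I protos \
    --python_out=. --grpc_python_out=. \
    protos/jarvis_nlp.proto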

Python Example

These steps show how you can integrate Jarvis AI Services into your own application using Python as an example. Full API documentation for all services is included in the Jarvis 0.2 EA release.

  1. Install the jarvis_api Python wheel from the Jarvis Quick Start scripts.

    pip install jarvis_api-0.10.0_ea2-py3-none-any.whl
    
  2. With the Python bindings successfully installed, integrate the Jarvis services in your application. For example, to classify the intent of a query, you would add:

# required imports
import grpc
import jarvis_api.jarvis_nlp_pb2 as jnlp
import jarvis_api.jarvis_nlp_pb2_grpc as jnlp_srv

# establish connection to Jarvis server and initialize client
channel = grpc.insecure_channel('localhost:50051')
jarvis_nlp = jnlp_srv.JarvisNLPStub(channel)

# make request
req = jnlp.AnalyzeIntentRequest(query="How is the weather today in New England")
resp = jarvis_nlp.AnalyzeIntent(req)

print(resp.intent.class_name)
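
This example assumes the server launched by jarvis_start.sh is listening on its default port, 50051 (see Configuring above); adjust the channel address if your deployment differs.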

Support

After you’re up and running, refer to the Jarvis AI Services User Guides (ASR, NLP, and TTS) for additional information and support.