Getting Started#

Prerequisites#

Setup#

  • NVIDIA AI Enterprise License: Riva NMT NIM is available for self-hosting under the NVIDIA AI Enterprise (NVAIE) License.

  • NVIDIA GPU(s): Riva NMT NIM runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. See the Supported Models section for more information.

  • CPU: x86_64 architecture only for this release

  • OS: any Linux distributions which:

  • CUDA Drivers: Follow the installation guide. We recommend:

    • Using a network repository as part of a package manager installation, skipping the CUDA toolkit installation as the libraries are available within the NIM container

    • Installing the open kernels for a specific version:

      Major Version   EOL         Data Center & RTX/Quadro GPUs   GeForce GPUs
      > 550           TBD         X                               X
      550             Feb 2025    X                               X
      545             Oct 2023    X                               X
      535             June 2026   X
      525             Nov 2023    X
      470             Sept 2024   X

  1. Install Docker.

  2. Install the NVIDIA Container Toolkit.

After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.

To ensure that your setup is correct, run the following command:

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

This command should produce output similar to the following, where you can confirm the CUDA driver version and available GPUs.

   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA H100 80GB HBM3          On  |   00000000:1B:00.0 Off |                    0 |
   | N/A   36C    P0            112W /  700W |   78489MiB /  81559MiB |      0%      Default |
   |                                         |                        |             Disabled |
   +-----------------------------------------+------------------------+----------------------+

   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   |  No running processes found                                                             |
   +-----------------------------------------------------------------------------------------+
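
If you want to check the driver version from a script, for example to compare it against the driver table earlier in this section, the header line of the nvidia-smi output can be parsed. The following is a minimal Python sketch and not part of the NIM tooling. The names parse_nvidia_smi_header and driver_major are hypothetical, and the sketch assumes the header keeps the "Driver Version: ... CUDA Version: ..." layout shown above.

```python
import re
from typing import Tuple

# Matches the header line printed by nvidia-smi, for example:
# "| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4 |"
_HEADER = re.compile(
    r"Driver Version:\s*(\d+(?:\.\d+)*)\s+CUDA Version:\s*(\d+(?:\.\d+)*)"
)


def parse_nvidia_smi_header(output: str) -> Tuple[str, str]:
    """Return (driver_version, cuda_version) found in nvidia-smi output."""
    match = _HEADER.search(output)
    if match is None:
        raise ValueError("no 'Driver Version' / 'CUDA Version' header found")
    return match.group(1), match.group(2)


def driver_major(driver_version: str) -> int:
    """Return the major driver version, e.g. 550 for '550.54.14'."""
    return int(driver_version.split(".")[0])
```

To use it, capture the text printed by the docker run command above (for example with subprocess.run(..., capture_output=True, text=True)) and pass its stdout to parse_nvidia_smi_header.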

Installing WSL2 for Windows#

Certain downloadable NIMs can be used on an RTX Windows system with Windows Subsystem for Linux (WSL). To enable WSL2, perform the following steps.

  1. Be sure your computer can run WSL2 as described in the Prerequisites section of the WSL2 documentation.

  2. Enable WSL2 on your Windows computer by following the steps in Install WSL command. By default, these steps install the Ubuntu distribution of Linux. For alternative installations, see Change the default Linux distribution installed.

NGC Authentication#

Generate an API key#

To access NGC resources, you need an NGC API key. You can generate a key here: Generate Personal Key.

When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. You can include more services if you plan to reuse this key for other purposes.

Note

Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.

Export the API key#

Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.

If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:

export NGC_API_KEY=<value>

Run one of the following commands to make the key available at startup:

# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc

# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc

Note

Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
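
As an illustration of the file-based option, the following Python sketch resolves the key either from the NGC_API_KEY environment variable or from the file named by NGC_API_KEY_FILE. The function name load_ngc_api_key and the precedence (environment variable first, then file) are assumptions made for this example, as is the file containing nothing but the key.

```python
import os
from pathlib import Path
from typing import Mapping


def load_ngc_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the key from NGC_API_KEY, else from the file named by NGC_API_KEY_FILE."""
    key = env.get("NGC_API_KEY", "").strip()
    if key:
        return key
    key_file = env.get("NGC_API_KEY_FILE", "").strip()
    if key_file:
        # Assumption: the file holds only the key, optionally followed by a newline.
        return Path(key_file).expanduser().read_text(encoding="utf-8").strip()
    raise RuntimeError("set NGC_API_KEY or NGC_API_KEY_FILE")
```

Keeping the key in a file with restrictive permissions avoids writing it into shell startup files such as ~/.bashrc.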

Docker Login to NGC#

To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:

echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a user name and password.

Launching the NIM#

The following command launches a Docker container on any of the supported GPUs.

docker run -it --rm --name=riva-nmt \
  --runtime=nvidia \
  --gpus '"device=0"' \
  --shm-size=8GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_TAGS_SELECTOR=name=megatron-riva-1b \
  -e NIM_HTTP_API_PORT=9000 \
  -e NIM_GRPC_API_PORT=50051 \
  -p 9000:9000 \
  -p 50051:50051 \
  nvcr.io/nim/nvidia/riva-nmt:1.3.0

Supported Models#

Model              Language              Model Type   Compute Capability (CC)   GPU Memory
megatron-riva-1b   Supported Languages   prebuilt     >=7.0                     9 GB

Running Inference#

Note

Depending on your network speed, it may take up to 30 minutes from the time the Docker container is started for it to be ready and start accepting requests.

  1. Open a new terminal and run the following command to check if the service is ready to handle inference requests:

curl -X 'GET' 'http://localhost:9000/v1/health/ready'

If the service is ready, you get a response similar to the following.

{"status":"ready"}
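
Because startup can take a long time, it can be convenient to poll the readiness endpoint instead of re-running curl by hand. The following Python sketch does this with the standard library only. The names fetch_status and wait_until_ready, the 10-second polling interval, and the 5-second request timeout are assumptions chosen for this example. The URL and the "ready" status value are taken from the check above, and the 30-minute default timeout mirrors the note at the start of this section.

```python
import json
import time
import urllib.request
from typing import Callable

READY_URL = "http://localhost:9000/v1/health/ready"


def fetch_status(url: str) -> str:
    """Return the "status" field of the health response, or "" on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, HTTP errors, timeouts, and malformed JSON
        # are all treated as "not ready yet".
        return ""
    return payload.get("status", "") if isinstance(payload, dict) else ""


def wait_until_ready(
    url: str = READY_URL,
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    fetch: Callable[[str], str] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the endpoint until it reports "ready" or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling wait_until_ready() with no arguments blocks until the service answers with {"status":"ready"} or 30 minutes have passed, and returns True or False accordingly.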

  2. Install the Riva Python client.

Riva uses gRPC APIs. You can download proto files from Riva gRPC Proto files and compile them to a target language using the protoc compiler. You can find Riva clients in C++ and Python at the following locations.

Install Riva Python client

sudo apt-get install python3-pip
pip install nvidia-riva-client

Download Riva sample client

git clone https://github.com/nvidia-riva/python-clients.git

  3. Run Text-to-Text translation inference:

python3 python-clients/scripts/nmt/nmt.py --server 0.0.0.0:50051 --text "This will become German words" --source-language-code en --target-language-code de

The above command translates the text from English to German and produces the following output.

## Das werden deutsche Wörter

The sample client supports a number of options when making a translation request to the gRPC endpoint, as described below.

  • --text - Text to translate.

  • --list-models - List available models and supported languages.

  • --source-language-code - Source language code. Refer to the Supported Languages section for supported language codes.

  • --target-language-code - Target language code. Refer to the Supported Languages section for supported language codes.
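
When the sample client is driven from another program, building the argument list explicitly avoids shell quoting problems with the text to translate. The sketch below assembles the same command line as the example above, using only the options listed here. The helper name build_nmt_command is hypothetical, and the script path assumes the python-clients repository was cloned into the current directory as shown earlier.

```python
import sys
from typing import List

# Path of the sample script inside the cloned python-clients repository.
NMT_SCRIPT = "python-clients/scripts/nmt/nmt.py"


def build_nmt_command(
    text: str,
    source: str,
    target: str,
    server: str = "0.0.0.0:50051",
) -> List[str]:
    """Return the argument list for one translation request with the sample client."""
    return [
        sys.executable, NMT_SCRIPT,
        "--server", server,
        "--text", text,
        "--source-language-code", source,
        "--target-language-code", target,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains the python-clients checkout.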

Supported Languages#

Riva NMT NIM supports translation between any pair of the following 36 languages.

  • Simplified Chinese (zh-CN)

  • Traditional Chinese (zh-TW)

  • Russian (ru)

  • German (de)

  • European Spanish (es-ES)

  • LATAM Spanish (es-US)

  • French (fr)

  • Danish (da)

  • Greek (el)

  • Finnish (fi)

  • Hungarian (hu)

  • Italian (it)

  • Lithuanian (lt)

  • Latvian (lv)

  • Dutch (nl)

  • Norwegian (no)

  • Polish (pl)

  • European Portuguese (pt-PT)

  • Brazilian Portuguese (pt-BR)

  • Romanian (ro)

  • Slovak (sk)

  • Swedish (sv)

  • Japanese (ja)

  • Hindi (hi)

  • Korean (ko)

  • Estonian (et)

  • Slovenian (sl)

  • Bulgarian (bg)

  • Ukrainian (uk)

  • Croatian (hr)

  • Arabic (ar)

  • Vietnamese (vi)

  • Turkish (tr)

  • Indonesian (id)

  • Czech (cs)

  • Thai (th)

Runtime Parameters for the Container#

The docker run flags used in the launch command are described below.

  • -it - Equivalent to --interactive + --tty (see Docker docs).

  • --rm - Delete the container after it stops (see Docker docs).

  • --name=<container-name> - Give a name to the NIM container. Use any preferred value.

  • --runtime=nvidia - Ensure NVIDIA drivers are accessible in the container.

  • --gpus '"device=0"' - Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs.

  • --shm-size=8GB - Allocate host memory for multi-GPU communication.

  • -e NGC_API_KEY=$NGC_API_KEY - Provide the container with the token necessary to download the appropriate models and resources from NGC. See NGC Authentication.

  • -e NIM_HTTP_API_PORT=<port> - Specify the port to use for the HTTP endpoint. The port can have any value except 8000. The default is 9000.

  • -e NIM_GRPC_API_PORT=<port> - Specify the port to use for the gRPC endpoint. The default is 50051.

  • -p 9000:9000 - Forward the port where the NIM HTTP server is published inside the container to access it from the host system. The left-hand side of : is the host system ip:port (9000 here), while the right-hand side is the container port where the NIM HTTP server is published. The container port can be any value except 8000.

  • -p 50051:50051 - Forward the port where the NIM gRPC server is published inside the container to access it from the host system. The left-hand side of : is the host system ip:port (50051 here), while the right-hand side is the container port where the NIM gRPC server is published.
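
The port settings have to agree with each other: the value given to NIM_HTTP_API_PORT or NIM_GRPC_API_PORT must match the container side of the corresponding -p mapping. The following Python sketch builds the launch command from a single pair of port numbers so the values cannot drift apart, and rejects HTTP port 8000. The helper name build_docker_run is hypothetical, and mapping each port to the same number on the host is a simplification made for this example.

```python
from typing import List


def build_docker_run(
    http_port: int = 9000,
    grpc_port: int = 50051,
    gpu: str = "0",
    name: str = "riva-nmt",
    image: str = "nvcr.io/nim/nvidia/riva-nmt:1.3.0",
) -> List[str]:
    """Return a docker run argument list with consistent port settings."""
    if http_port == 8000:
        raise ValueError("the HTTP port can have any value except 8000")
    return [
        "docker", "run", "-it", "--rm", f"--name={name}",
        "--runtime=nvidia",
        # Same value the shell form --gpus '"device=0"' passes to docker.
        "--gpus", f'"device={gpu}"',
        "--shm-size=8GB",
        # Without "=value", docker copies NGC_API_KEY from the calling environment.
        "-e", "NGC_API_KEY",
        "-e", "NIM_TAGS_SELECTOR=name=megatron-riva-1b",
        "-e", f"NIM_HTTP_API_PORT={http_port}",
        "-e", f"NIM_GRPC_API_PORT={grpc_port}",
        "-p", f"{http_port}:{http_port}",
        "-p", f"{grpc_port}:{grpc_port}",
        image,
    ]
```

For example, build_docker_run(http_port=9100) yields a command that sets NIM_HTTP_API_PORT=9100 and publishes 9100:9100.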

Model Caching#

On initial startup, the container will download the models from NGC. You can skip this download step on future runs by caching the model locally using a cache directory as shown below.

# Create the cache directory on the host machine:
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p $LOCAL_NIM_CACHE
chmod 777 $LOCAL_NIM_CACHE

# Run the container with the cache directory mounted in the appropriate location:
docker run -it --rm --name=riva-nmt \
      --runtime=nvidia \
      --gpus '"device=0"' \
      --shm-size=8GB \
      -e NGC_API_KEY \
      -e NIM_HTTP_API_PORT=9000 \
      -e NIM_GRPC_API_PORT=50051 \
      -p 9000:9000 \
      -p 50051:50051 \
      -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
      nvcr.io/nim/nvidia/riva-nmt:1.3.0

On subsequent runs, the models will be loaded from cache.
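
To get a rough indication of whether the cache directory was populated by the first run, you can count the files it contains. The Python sketch below is only an illustration: the function name cache_summary is hypothetical, and a non-empty directory is assumed, not guaranteed, to mean that the models were cached successfully.

```python
from pathlib import Path
from typing import Tuple


def cache_summary(cache_dir: str = "~/.cache/nim") -> Tuple[int, int]:
    """Return (number of files, total size in bytes) found under cache_dir."""
    root = Path(cache_dir).expanduser()
    # A missing directory is reported as an empty cache.
    files = [p for p in root.rglob("*") if p.is_file()] if root.is_dir() else []
    return len(files), sum(p.stat().st_size for p in files)
```

A result of (0, 0) after the container has finished starting suggests that the volume mount did not take effect or that the directory is not writable from inside the container.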

Stopping the Container#

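The following commands stop and remove the running Docker container. If the container was started with --rm, as in the commands above, Docker removes it automatically once it stops, so docker rm is only needed for containers started without that flag.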

docker stop riva-nmt
docker rm riva-nmt