Date: 2026-03-03
Getting Started#
Prerequisites#
Setup#
NVIDIA AI Enterprise License: Riva ASR NIM is available for self-hosting under the NVIDIA AI Enterprise (NVAIE) License.
NVIDIA GPU(s): Riva ASR NIM runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. Refer to the Supported Models for more information.
CPU: x86_64 architecture only for this release
OS: any Linux distribution that:
Is supported by the NVIDIA Container Toolkit
Has glibc >= 2.35 (check the output of ld -v)
CUDA Drivers: Follow the installation guide. We recommend:
Using a network repository as part of a package manager installation, skipping the CUDA toolkit installation as the libraries are available within the NIM container
Installing the open kernels for a specific version:
| Major Version | EOL | Data Center & RTX/Quadro GPUs | GeForce GPUs |
|---|---|---|---|
| > 550 | TBD | X | X |
| 550 | Feb 2025 | X | X |
| 545 | Oct 2023 | X | X |
| 535 | June 2026 | X | |
| 525 | Nov 2023 | X | |
| 470 | Sept 2024 | X | |
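The glibc requirement in the OS prerequisite can also be checked from Python. The following is a minimal sketch, not part of the NIM tooling: the helper names version_tuple and glibc_is_supported are hypothetical, and platform.libc_ver() reports the C library that the running Python binary is linked against, which is assumed here to match the system glibc.

```python
import platform
import re


def version_tuple(text):
    """Convert a dotted version such as '2.35' into a tuple of ints, (2, 35)."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def glibc_is_supported(found, minimum="2.35"):
    """Return True when the found glibc version is at least the minimum."""
    return version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    # On glibc-based Linux systems this typically returns ('glibc', '<version>').
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(lib, version, "supported:", glibc_is_supported(version))
    else:
        print("glibc not detected; check the version manually")
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating 2.4 as newer than 2.35.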
Install Docker.
Install the NVIDIA Container Toolkit.
After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.
To ensure your setup is correct, run the following command:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
This command produces output similar to the following, where you can confirm the CUDA driver version and available GPUs.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 36C P0 112W / 700W | 78489MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
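If you want to confirm the driver version in a script rather than by eye, the header line of the nvidia-smi output can be parsed. This is a sketch that assumes the header keeps the "Driver Version: ... CUDA Version: ..." layout shown above; parse_smi_header is a hypothetical helper name.

```python
import re

# Matches the header line printed by nvidia-smi, as in the sample output above.
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>\d+(?:\.\d+)*)\s+"
    r"CUDA Version:\s*(?P<cuda>\d+(?:\.\d+)*)"
)


def parse_smi_header(text):
    """Return driver and CUDA versions found in nvidia-smi output, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    driver = match.group("driver")
    return {
        "driver": driver,
        "driver_major": int(driver.split(".")[0]),
        "cuda": match.group("cuda"),
    }
```

The driver_major field can then be compared against the Major Version column of the driver table in the prerequisites.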
Installing WSL2 for Windows#
Refer to the NIM on WSL2 documentation for setup instructions.
NGC Authentication#
Generate an API key#
To access NGC resources, you need an NGC API key. You can generate a key here: Generate Personal Key.
When creating an NGC API Personal key, ensure at least “NGC Catalog” is selected from the “Services Included” drop-down. More services can be included if this key is to be reused for other purposes.
Note
Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.
Export the API key#
Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.
If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Note
More secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
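A script that launches the NIM can resolve the key from either location. The sketch below is an illustration only: load_ngc_api_key is a hypothetical helper, and checking the NGC_API_KEY variable before the file named by NGC_API_KEY_FILE is an assumed order of precedence, not something the NIM defines.

```python
import os


def load_ngc_api_key(environ=None):
    """Return the NGC API key from NGC_API_KEY, or from the file in NGC_API_KEY_FILE."""
    env = os.environ if environ is None else environ

    # Assumption: an exported NGC_API_KEY takes precedence over the key file.
    key = env.get("NGC_API_KEY", "").strip()
    if key:
        return key

    key_file = env.get("NGC_API_KEY_FILE", "").strip()
    if key_file:
        with open(os.path.expanduser(key_file), "r", encoding="utf-8") as handle:
            key = handle.read().strip()
        if key:
            return key

    raise RuntimeError(
        "Set NGC_API_KEY, or point NGC_API_KEY_FILE at a file containing the key"
    )
```

Keeping the key in a file readable only by your user avoids writing it into shell startup files.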
Docker Login to NGC#
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates you will authenticate with an API key, not a username and password.
Launching the NIM#
Models are available in two formats:
Prebuilt: Prebuilt models use TensorRT engines for optimized inference. ONNX or PyTorch models are used where a TensorRT engine is not available. You can download and use them directly on the corresponding GPU.
RMIR: This intermediate model format requires an additional deployment step before you can use it. You can optimize this model with TensorRT and deploy it on any supported GPU. This model format is chosen automatically if a prebuilt model is not available for your GPU.
NIM automatically downloads the prebuilt model on supported GPUs, or generates an optimized model on the fly from the RMIR model on other GPUs.
For WSL2 deployment, supported models are listed in the Supported Models section. We recommend using the profile with the smallest batch size because it is optimized for lower memory usage. Smaller batch size profiles may not be available for all models.
Refer to the Supported Models section to choose the desired model and set the appropriate values for CONTAINER_ID and NIM_TAGS_SELECTOR.
NIM_TAGS_SELECTOR allows you to select a specific profile from the available options.
For example, the following commands deploy the Parakeet 1.1b en-US model profile with all modes, using the default batch size.
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=all"
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
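The NIM_TAGS_SELECTOR value used above is a comma-separated list of key=value pairs. When profiles are chosen programmatically, the string can be assembled and parsed as sketched below. The helper names build_selector and parse_selector are hypothetical, and rejecting commas or equals signs inside keys and values is an assumption made to keep the string unambiguous.

```python
def build_selector(tags):
    """Join profile properties into a 'key=value,key=value' selector string."""
    pairs = []
    for key, value in tags.items():
        key, value = str(key), str(value)
        # Assumption: keys and values never contain the separators themselves.
        if not key or not value or any(ch in key + value for ch in ",="):
            raise ValueError(f"invalid selector entry: {key!r}={value!r}")
        pairs.append(f"{key}={value}")
    return ",".join(pairs)


def parse_selector(selector):
    """Split a selector string back into a dict of profile properties."""
    tags = {}
    for pair in selector.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"invalid selector entry: {pair!r}")
        tags[key.strip()] = value.strip()
    return tags
```

For example, build_selector({"name": "parakeet-1-1b-ctc-en-us", "mode": "all"}) produces the selector string exported in the commands above.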
If deploying in a WSL2 environment, you may need to use podman instead of docker in the respective commands.
Depending on your network speed, it could take up to 30 minutes for the Docker container to be ready and start accepting requests.
Note
The Parakeet CTC 1.1b English (en-US) model occasionally throws a Too many open files error. Add --ulimit nofile=2048:2048 to the docker run command as a workaround.
Running Inference#
Open a new terminal and run the following command to check whether the service is ready to handle inference requests:
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
If the service is ready, you get a response similar to the following.
{"status":"ready"}
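Because startup can take a while, it can be convenient to poll the readiness endpoint until it reports ready. The sketch below uses only the Python standard library. The function names, the 10-second polling interval, and the injectable fetch parameter are illustrative choices; the default timeout of 1800 seconds mirrors the 30-minute upper bound mentioned earlier.

```python
import json
import time
import urllib.request


def fetch_status(url):
    """Return the 'status' field of the JSON health response, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return None
    return payload.get("status") if isinstance(payload, dict) else None


def wait_until_ready(url, timeout_s=1800.0, interval_s=10.0, fetch=fetch_status):
    """Poll the readiness endpoint until it reports 'ready' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == "ready":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example call, assuming the ports used in this guide:
# wait_until_ready("http://localhost:9000/v1/health/ready")
```

Passing fetch as a parameter keeps the polling loop easy to test without a running container.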
Install the Riva Python client.
Riva uses gRPC APIs. You can download the proto files from Riva gRPC Proto files and compile them to a target language using the Protoc compiler. Riva clients in C++ and Python are available at the following locations.
Install Riva Python client
sudo apt-get install python3-pip
pip install -U nvidia-riva-client
Download Riva sample client
git clone https://github.com/nvidia-riva/python-clients.git
Run Speech-to-Text (STT) inference.
Riva ASR supports mono, 16-bit audio in WAV, OPUS, and FLAC formats. If you do not have a speech file available, you can use a sample speech file embedded in the Docker container launched in the previous section.
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
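If you use your own recording, you can check that a WAV file is mono and 16-bit before sending it. This is a minimal sketch based on the standard wave module, so it covers WAV input only (not OPUS or FLAC); describe_wav and is_mono_16bit are hypothetical helper names.

```python
import wave


def describe_wav(path):
    """Read basic format information from a WAV file header."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / float(frame_rate),
        }


def is_mono_16bit(path):
    """Return True when the file has one channel and 16-bit samples."""
    info = describe_wav(path)
    return info["channels"] == 1 and info["sample_width_bits"] == 16
```

For example, is_mono_16bit("en-US_sample.wav") checks the file copied by the command above.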
Streaming transcription using gRPC and Realtime Websocket API
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word streaming in its name.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
Input speech file is streamed to the service chunk-by-chunk.
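To make "chunk-by-chunk" concrete, the following sketch reads a WAV file in fixed-size pieces, which is the general shape of what a streaming client sends. It illustrates the idea only and is not the implementation used by transcribe_file.py; iter_wav_chunks and the default of 1600 frames per chunk are assumptions.

```python
import wave


def iter_wav_chunks(path, chunk_frames=1600):
    """Yield successive blocks of raw audio, at most chunk_frames frames each."""
    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                # An empty read means the end of the file was reached.
                break
            yield data
```

With 16 kHz mono 16-bit audio, 1600 frames correspond to 100 ms of speech and 3200 bytes per chunk.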
Offline transcription using gRPC and HTTP API
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word offline in its name.
Input speech file is sent to the service in one shot.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code en-US --automatic-punctuation \
--input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en \
-F file="@en-US_sample.wav"
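The curl command above sends a multipart/form-data request with a language field and a file field. A rough Python equivalent using only the standard library is sketched below. The helper names are hypothetical, the application/octet-stream part type is an assumption, and the response is returned as raw text because its schema is not described here.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, language,
                    url="http://0.0.0.0:9000/v1/audio/transcriptions"):
    """POST an audio file to the HTTP transcription endpoint; return the raw body."""
    with open(path, "rb") as handle:
        audio = handle.read()
    body, content_type = build_multipart(
        {"language": language}, "file", os.path.basename(path), audio
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call, assuming the NIM from this guide is running:
# print(transcribe_file("en-US_sample.wav", "en"))
```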
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/vi-VN_sample.wav .
Streaming transcription using gRPC and Realtime Websocket API
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word streaming in its name.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code vi-VN --automatic-punctuation \
--input-file vi-VN_sample.wav
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code vi-VN --automatic-punctuation \
--input-file vi-VN_sample.wav
Input speech file is streamed to the service chunk-by-chunk.
Offline transcription using gRPC and HTTP API
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word offline in its name.
Input speech file is sent to the service in one shot.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code vi-VN --automatic-punctuation \
--input-file vi-VN_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=vi \
-F file="@vi-VN_sample.wav"
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/es-US_sample.wav .
Streaming transcription using gRPC and Realtime Websocket API
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word streaming in its name.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
Input speech file is streamed to the service chunk-by-chunk.
Offline transcription using gRPC and HTTP API
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word offline in its name.
Input speech file is sent to the service in one shot.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=es \
-F file="@es-US_sample.wav"
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/zh-CN_sample.wav .
Streaming transcription using gRPC and Realtime Websocket API
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word streaming in its name.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code zh-CN --automatic-punctuation \
--input-file zh-CN_sample.wav
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code zh-CN --automatic-punctuation \
--input-file zh-CN_sample.wav
Input speech file is streamed to the service chunk-by-chunk.
Offline transcription using gRPC and HTTP API
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word offline in its name.
Input speech file is sent to the service in one shot.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code zh-CN --automatic-punctuation \
--input-file zh-CN_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=zh \
-F file="@zh-CN_sample.wav"
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/zh-TW_sample.wav .
Streaming transcription using gRPC and Realtime Websocket API
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word streaming in its name.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code zh-TW --automatic-punctuation \
--input-file zh-TW_sample.wav
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code zh-TW --automatic-punctuation \
--input-file zh-TW_sample.wav
Input speech file is streamed to the service chunk-by-chunk.
Offline transcription using gRPC and HTTP API
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
The command above queries the available ASR models and prints them to the console. You should see a model with the word offline in its name.
Input speech file is sent to the service in one shot.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code zh-TW --automatic-punctuation \
--input-file zh-TW_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=zh-TW \
-F file="@zh-TW_sample.wav"
Copy a sample audio file from the NIM container to the host machine, or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
The Parakeet TDT NIM supports only the offline API.
Ensure that the NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Transcription using gRPC and HTTP APIs
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code en-US \
--input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en-US \
-F file="@en-US_sample.wav"
The Parakeet 1.1b RNNT Multilingual model supports streaming speech-to-text transcription in multiple languages. The model identifies the spoken language and provides the transcript in that language.
Transcription using gRPC API
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
Streaming mode example
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
Input speech file is streamed to the service chunk-by-chunk.
# Transcribe English speech
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code multi --automatic-punctuation \
--input-file en-US_sample.wav
# Transcribe French speech
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code multi --automatic-punctuation \
--input-file fr-FR_sample.wav
Offline mode example
Ensure NIM offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Input speech file is sent to the service in one shot.
Transcription using gRPC, HTTP, and Realtime APIs
# Transcribe English speech
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code multi --automatic-punctuation \
--input-file en-US_sample.wav
# Transcribe French speech
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code multi --automatic-punctuation \
--input-file fr-FR_sample.wav
# Transcribe English speech
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
-F file="@en-US_sample.wav"
# Transcribe French speech
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
-F file="@fr-FR_sample.wav"
# Transcribe English speech
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code multi --automatic-punctuation \
--input-file en-US_sample.wav
# Transcribe French speech
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code multi --automatic-punctuation \
--input-file fr-FR_sample.wav
Transcription using gRPC API
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/es-US_sample.wav .
Streaming mode example
Ensure NIM with streaming mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--list-models
Input speech file is streamed to the service chunk-by-chunk.
python3 python-clients/scripts/asr/transcribe_file.py \
--server 0.0.0.0:50051 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
Offline mode example
Ensure NIM with offline mode model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Input speech file is sent to the service in one shot.
Transcription using gRPC, HTTP, and Realtime APIs
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=es \
-F file="@es-US_sample.wav"
python3 python-clients/scripts/asr/realtime_asr_client.py \
--server 0.0.0.0:9000 \
--language-code es-US --automatic-punctuation \
--input-file es-US_sample.wav
Whisper supports transcription in multiple languages. Refer to Supported Languages for the list of all available languages and their corresponding codes. Specifying the input language as multi enables automatic language detection. Specifying the correct language is recommended because it improves accuracy and latency. The Whisper model has punctuation enabled by default.
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
Ensure NIM with Whisper Large v3 model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Transcription using gRPC and HTTP APIs
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language en --input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en \
-F file="@en-US_sample.wav"
When the language code is not known beforehand, pass the language code multi. The model predicts the language for each 30-second chunk and returns it to the client. The following commands print the transcript along with the predicted language.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language-code multi \
--input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=multi \
-F file="@en-US_sample.wav"
Note
The Whisper model supports offline mode only.
Whisper supports translation from multiple languages to the English language. Refer to Supported Languages for the list of all available languages and corresponding code. Specifying the input language as multi enables auto language detection. Specifying the correct input language is recommended because it improves accuracy and latency.
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
Translation using gRPC and HTTP APIs
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language fr --input-file fr-FR_sample.wav \
--custom-configuration task:translate
curl -s http://0.0.0.0:9000/v1/audio/translations -F language=fr \
-F file="@fr-FR_sample.wav"
Note
The Whisper model supports offline mode only.
Canary supports transcription in ar-AR, cs-CZ, da-DK, de-DE, en-GB, en-US, es-ES, es-US, fr-CA, fr-FR, he-IL, hi-IN, it-IT, ja-JP, ko-KR, nb-NO, nl-NL, nn-NO, pl-PL, pt-BR, pt-PT, ru-RU, sv-SE, th-TH, tr-TR, zh-CN languages. Specifying the input language is required. The Canary model has punctuation enabled by default.
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/en-US_sample.wav .
Ensure NIM with the Canary model is deployed.
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--list-models
Transcription using gRPC and HTTP APIs
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language en-US --input-file en-US_sample.wav
curl -s http://0.0.0.0:9000/v1/audio/transcriptions -F language=en-US \
-F file="@en-US_sample.wav"
Note
The Canary model supports offline mode only.
Canary supports translation from en-US to ar-AR, bg-BG, cs-CZ, da-DK, de-DE, el-GR, en-US, et-EE, fi-FI, fr-FR, hi-IN, hr-HR, hu-HU, id-ID, it-IT, ja-JP, ko-KR, lt-LT, lv-LV, nb-NO, nl-NL, pl-PL, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sl-SI, sv-SE, th-TH, tr-TR, uk-UA, vi-VN, zh-CN languages and ar-AR, cs-CZ, da-DK, de-DE, es-ES, es-US, fr-CA, fr-FR, he-IL, hi-IN, it-IT, ja-JP, ko-KR, nb-NO, nl-NL, nn-NO, pl-PL, pt-BR, pt-PT, ru-RU, sv-SE, tr-TR, zh-CN to en-US.
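The supported language pairs listed above can be encoded as a small lookup, which is handy for validating a request before sending it. The sketch below copies the two lists from the paragraph above; can_translate and custom_configuration are hypothetical helper names, and the generated string follows the target_language:<code>,task:translate pattern used with --custom-configuration in this section.

```python
# Target languages when translating from en-US (copied from the list above).
FROM_ENGLISH_TARGETS = frozenset(
    "ar-AR bg-BG cs-CZ da-DK de-DE el-GR en-US et-EE fi-FI fr-FR hi-IN hr-HR "
    "hu-HU id-ID it-IT ja-JP ko-KR lt-LT lv-LV nb-NO nl-NL pl-PL pt-BR pt-PT "
    "ro-RO ru-RU sk-SK sl-SI sv-SE th-TH tr-TR uk-UA vi-VN zh-CN".split()
)

# Source languages when translating to en-US (copied from the list above).
TO_ENGLISH_SOURCES = frozenset(
    "ar-AR cs-CZ da-DK de-DE es-ES es-US fr-CA fr-FR he-IL hi-IN it-IT ja-JP "
    "ko-KR nb-NO nl-NL nn-NO pl-PL pt-BR pt-PT ru-RU sv-SE tr-TR zh-CN".split()
)


def can_translate(source, target):
    """Return True when the source-to-target pair appears in the lists above."""
    if source == "en-US":
        return target in FROM_ENGLISH_TARGETS
    if target == "en-US":
        return source in TO_ENGLISH_SOURCES
    return False


def custom_configuration(target):
    """Build the value passed to --custom-configuration for a translation request."""
    return f"target_language:{target},task:translate"
```

Every supported pair has en-US on one side, so a pair such as fr-FR to de-DE is rejected.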
Copy an example audio file from the NIM container to the host machine or use your own.
docker cp $CONTAINER_ID:/opt/riva/wav/fr-FR_sample.wav .
docker cp $CONTAINER_ID:/opt/riva/examples/asr_lib/1272-135031-0000.wav .
Translation to English
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language fr-FR --input-file fr-FR_sample.wav \
--custom-configuration target_language:en-US,task:translate
curl -s http://0.0.0.0:9000/v1/audio/translations -F language=fr-FR \
-F target_language=en-US -F file="@fr-FR_sample.wav"
Translation from English
python3 python-clients/scripts/asr/transcribe_file_offline.py \
--server 0.0.0.0:50051 \
--language en-US --input-file 1272-135031-0000.wav \
--custom-configuration target_language:fr-FR,task:translate
curl -s http://0.0.0.0:9000/v1/audio/translations -F language=en-US \
-F target_language=fr-FR -F file="@1272-135031-0000.wav"
Note
The Canary model supports offline mode only.
Refer to the Customization page for Riva ASR NIM features demonstrated with the sample Python clients, and for additional information on customizing model behavior. To build your own application in Python, refer to the Python code, or try the Riva ASR Jupyter notebook for a more interactive guide.
Runtime Parameters for the Container#
| Flags | Description |
|---|---|
| -it | Run interactively with a terminal attached (--interactive + --tty). |
| --rm | Delete the container after it stops (see Docker docs). |
| --name=$CONTAINER_ID | Give a name to the NIM container. Use any preferred value. |
| --runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
| --gpus '"device=0"' | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs. |
| --shm-size=8GB | Allocate host memory for multi-GPU communication. |
| -e NGC_API_KEY | Provide the container with the token necessary to download adequate models and resources from NGC. See NGC Authentication. |
| -e NIM_HTTP_API_PORT=9000 | Specify the port to use for the HTTP endpoint. The port can be any value except 8000. Default: 9000. |
| -e NIM_GRPC_API_PORT=50051 | Specify the port to use for the gRPC endpoint. Default: 50051. |
| -p 9000:9000 | Forward the port where the NIM HTTP server is published inside the container so that it can be accessed from the host system. |
| -p 50051:50051 | Forward the port where the NIM gRPC server is published inside the container so that it can be accessed from the host system. |
| -e NIM_TAGS_SELECTOR | Use this to filter tags in the auto profile selector. This can be a list of key-value pairs, where the key is the profile property name and the value is the desired property value. |
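When the container is launched from a script, the flags described above can be assembled into an argument list for subprocess. The sketch below mirrors the docker run command from Launching the NIM. build_docker_run_args is a hypothetical helper, and passing the selector value inline (rather than inheriting it from the environment) is a choice made for this example.

```python
def build_docker_run_args(container_id, tags_selector, http_port=9000,
                          grpc_port=50051, gpu_device="0", image_tag="latest"):
    """Return the docker run command as a list suitable for subprocess.run."""
    if http_port == 8000:
        # The HTTP port can be any value except 8000.
        raise ValueError("NIM_HTTP_API_PORT must not be 8000")
    return [
        "docker", "run", "-it", "--rm",
        f"--name={container_id}",
        "--runtime=nvidia",
        # The inner double quotes are part of the value, as in the shell command.
        "--gpus", f'"device={gpu_device}"',
        "--shm-size=8GB",
        "-e", "NGC_API_KEY",  # value is inherited from the calling environment
        "-e", f"NIM_HTTP_API_PORT={http_port}",
        "-e", f"NIM_GRPC_API_PORT={grpc_port}",
        "-p", f"{http_port}:{http_port}",
        "-p", f"{grpc_port}:{grpc_port}",
        "-e", f"NIM_TAGS_SELECTOR={tags_selector}",
        f"nvcr.io/nim/nvidia/{container_id}:{image_tag}",
    ]


# Example, assuming NGC_API_KEY is exported in the current environment:
# import subprocess
# subprocess.run(build_docker_run_args(
#     "parakeet-1-1b-ctc-en-us", "name=parakeet-1-1b-ctc-en-us,mode=all"))
```

Building a list instead of a single shell string avoids quoting problems with the --gpus value.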
Model Caching#
On initial startup, the container will download the models from NGC. You can skip this download step on future runs by caching the model locally using a cache directory as shown below.
# Create the cache directory on the host machine:
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p $LOCAL_NIM_CACHE
chmod 777 $LOCAL_NIM_CACHE
# Set the appropriate values
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=prebuilt"
# Run the container with the cache directory mounted in the appropriate location:
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
On subsequent runs, the models will be loaded from cache.
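The cache directory setup can also be done from Python, for example in a deployment script. The sketch below mirrors the mkdir -p and chmod 777 commands above; prepare_cache_dir and cache_is_populated are hypothetical helpers, and treating a non-empty directory as "already cached" is a simplifying assumption.

```python
import os


def prepare_cache_dir(path="~/.cache/nim"):
    """Create the cache directory if needed and make it writable by the container."""
    full = os.path.abspath(os.path.expanduser(path))
    os.makedirs(full, exist_ok=True)
    # Same permissions as `chmod 777` in the commands above.
    os.chmod(full, 0o777)
    return full


def cache_is_populated(path):
    """Return True when the directory exists and contains at least one entry."""
    full = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(full):
        return False
    with os.scandir(full) as entries:
        return any(True for _ in entries)
```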
Deploy the RMIR models and export the generated models for future use.
# Create the export directory on the host machine:
export NIM_EXPORT_PATH=~/nim_export
mkdir -p $NIM_EXPORT_PATH
chmod 777 $NIM_EXPORT_PATH
# Set the appropriate values
export CONTAINER_ID=parakeet-1-1b-ctc-en-us
export NIM_TAGS_SELECTOR="name=parakeet-1-1b-ctc-en-us,mode=str,model_type=rmir"
# Run the container with the export directory mounted in the appropriate location:
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Once model deployment is complete, the container terminates with the following log output.
INFO:inference:Riva model generation completed
INFO:inference:Models exported to /opt/nim/export
INFO:inference:Exiting container
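An automation script can watch the container log for the export message before starting the next run. The sketch below assumes the log lines keep exactly the format shown above, which may differ between releases; exported_path is a hypothetical helper name.

```python
# Prefix of the log line that reports the export location (from the sample above).
EXPORT_PREFIX = "INFO:inference:Models exported to "


def exported_path(log_lines):
    """Return the export path reported in the log, or None if it is not present."""
    for line in log_lines:
        line = line.strip()
        if line.startswith(EXPORT_PREFIX):
            return line[len(EXPORT_PREFIX):].strip()
    return None
```

A None result means the export message has not appeared yet, so the export should not be treated as complete.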
For subsequent runs, use the following command with NIM_DISABLE_MODEL_DOWNLOAD=true. It loads the exported models instead of downloading the models from NGC again.
# Run the container with the export directory mounted in the appropriate location:
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Stopping the Container#
The following commands stop and remove the running Docker container. Set the CONTAINER_ID variable to the value that was used when starting the container.
docker stop $CONTAINER_ID
docker rm $CONTAINER_ID