Getting Started#
Prerequisites#
Setup#
NVIDIA AI Enterprise License: Riva TTS NIM is available for self-hosting under the NVIDIA AI Enterprise (NVAIE) License.
NVIDIA GPU(s): Riva TTS NIM runs on any NVIDIA GPU with sufficient GPU memory, but some model/GPU combinations are optimized. See the Supported Models for more information.
CPU: x86_64 architecture only for this release
OS: any Linux distribution that:
Is supported by the NVIDIA Container Toolkit
Has glibc >= 2.35 (check the output of ld -v)
CUDA Drivers: Follow the installation guide. We recommend:
Using a network repository as part of a package manager installation, skipping the CUDA toolkit installation as the libraries are available within the NIM container
Installing the open kernel modules for a specific version:
| Major Version | EOL | Data Center & RTX/Quadro GPUs | GeForce GPUs |
|---|---|---|---|
| > 550 | TBD | X | X |
| 550 | Feb 2025 | X | X |
| 545 | Oct 2023 | X | X |
| 535 | June 2026 | X | |
| 525 | Nov 2023 | X | |
| 470 | Sept 2024 | X | |
Install Docker.
Install the NVIDIA Container Toolkit.
After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.
To ensure that your setup is correct, run the following command:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
This command should produce output similar to the following, where you can confirm the CUDA driver version and available GPUs.
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:1B:00.0 Off | 0 |
| N/A 36C P0 112W / 700W | 78489MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
NGC Authentication#
Generate an API key#
To access NGC resources, you need an NGC API key. You can generate a key here: Generate Personal Key.
When creating an NGC API Personal key, ensure that at least “NGC Catalog” is selected from the “Services Included” dropdown. Additional services can be included if you plan to reuse this key for other purposes.
Note
Personal keys allow you to configure an expiration date, revoke or delete the key using an action button, and rotate the key as needed. For more information about key types, refer to the NGC User Guide.
Export the API key#
Pass the value of the API key to the docker run command in the next section as the NGC_API_KEY environment variable to download the appropriate models and resources when starting the NIM.
If you’re not familiar with how to create the NGC_API_KEY environment variable, the simplest way is to export it in your terminal:
export NGC_API_KEY=<value>
Run one of the following commands to make the key available at startup:
# If using bash
echo "export NGC_API_KEY=<value>" >> ~/.bashrc
# If using zsh
echo "export NGC_API_KEY=<value>" >> ~/.zshrc
Note
Other, more secure options include saving the value in a file, so that you can retrieve it with cat $NGC_API_KEY_FILE, or using a password manager.
Docker Login to NGC#
To pull the NIM container image from NGC, first authenticate with the NVIDIA Container Registry with the following command:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the username and NGC_API_KEY as the password. The $oauthtoken username is a special name that indicates that you will authenticate with an API key and not a username and password.
Launching the NIM#
Models are available in two formats:
Prebuilt: Prebuilt models use TensorRT engines for optimized inference. ONNX or PyTorch models are used in cases where a TensorRT engine is not available. You can download prebuilt models and use them directly on the corresponding GPU.
RMIR: This intermediate model format requires an additional deployment step before you can use it. You can optimize this model with TensorRT and deploy it on any supported GPU. This format is chosen automatically if a prebuilt model is not available for your GPU.
Riva TTS NIM automatically downloads the prebuilt model on supported GPUs, or generates an optimized model on the fly from the RMIR model on other GPUs.
Please refer to the Supported Models section to choose the desired model. Afterward, set CONTAINER_ID and NIM_TAGS_SELECTOR appropriately in the commands below.
For example, the following commands deploy the Magpie TTS Multilingual model.
export CONTAINER_ID=magpie-tts-multilingual
export NIM_TAGS_SELECTOR=name=magpie-tts-multilingual
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-e NIM_TAGS_SELECTOR \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Note
It may take up to 30 minutes for the Docker container to be ready and start accepting requests, depending on your network speed.
Running Inference#
Open a new terminal and run the following command to check if the service is ready to handle inference requests:
curl -X 'GET' 'http://localhost:9000/v1/health/ready'
If the service is ready, you get a response similar to the following.
{"status":"ready"}
Install the Riva Python client
Riva uses gRPC APIs. You can download the proto files from Riva gRPC Proto files and compile them to a target language using the Protoc compiler. You can find Riva clients for C++ and Python at the following locations.
Install Riva Python client
sudo apt-get install python3-pip
pip install nvidia-riva-client
Download Riva sample client
git clone https://github.com/nvidia-riva/python-clients.git
Query available TTS Models and Voices
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --list-voices
This command shows the following output when the Magpie TTS Multilingual model is deployed. The output is truncated for brevity.
{
"en-US,es-US,fr-FR": {
"voices": [
"Magpie-Multilingual.EN-US.Female.Neutral",
"Magpie-Multilingual.EN-US.Female.Calm",
"Magpie-Multilingual.EN-US.Female.Fearful",
...
"Magpie-Multilingual.FR-FR.Male.Male-1",
"Magpie-Multilingual.FR-FR.Female.Female-1",
"Magpie-Multilingual.FR-FR.Female.Angry",
...
"Magpie-Multilingual.ES-US.Male.Male-1",
"Magpie-Multilingual.ES-US.Female.Female-1",
"Magpie-Multilingual.ES-US.Female.Neutral",
"Magpie-Multilingual.ES-US.Male.Neutral",
...
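If you want the same voice list from your own code rather than through talk.py, the sketch below assumes the GetRivaSynthesisConfig RPC that the sample script's --list-voices option calls, exposed through the nvidia-riva-client package, and simply prints the raw configuration response; treat the exact module paths as assumptions to verify against your installed client version.
# list_voices.py: query the deployed TTS models and voices over gRPC (sketch).
import riva.client
import riva.client.proto.riva_tts_pb2 as rtts

# Connect to the gRPC port published by the NIM container (NIM_GRPC_API_PORT above).
auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

# Each entry in the response describes a deployed model, its language codes, and voice names.
config = tts.stub.GetRivaSynthesisConfig(rtts.RivaSynthesisConfigRequest())
print(config)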
Run Text-to-Speech (TTS) inference
You can run the following command to synthesize speech from text. The synthesized speech is saved in output.wav.
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
--language-code en-US \
--text "Hello, this is a speech synthesizer." \
--voice Magpie-Multilingual.EN-US.Female.Female-1 \
--output output.wav
The sample client supports the following options for making a speech synthesis request to the gRPC endpoint.
--text - Text input to synthesize. SSML tags can be provided with the text; see customization for further information.
--language-code - The language of the input text, for example en-US. Refer to Supported Models for available languages.
--list-voices - List the available voices. Use this argument on its own to query the available voices before running inference.
--voice - The voice name to use. You can determine the value from the output of the --list-voices option.
The above section demonstrates the Riva TTS NIM features using the sample Python clients. To build your own application in Python, you can refer to the Python client code or try out the Riva TTS Jupyter Notebook for an interactive guide.
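As a starting point, the following is a minimal offline-synthesis sketch built directly on the nvidia-riva-client API rather than the sample script; the server address, voice name, and sample rate are the same values assumed in the examples above, so adjust them to match your deployment.
# synthesize.py: minimal offline TTS request using nvidia-riva-client (sketch).
import wave

import riva.client

# Connect to the gRPC port published by the NIM container (NIM_GRPC_API_PORT above).
auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

sample_rate_hz = 44100
response = tts.synthesize(
    text="Hello, this is a speech synthesizer.",
    voice_name="Magpie-Multilingual.EN-US.Female.Female-1",  # pick any voice from --list-voices
    language_code="en-US",
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    sample_rate_hz=sample_rate_hz,
)

# response.audio holds raw 16-bit mono PCM; wrap it in a WAV container.
with wave.open("output.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(sample_rate_hz)
    out.writeframes(response.audio)
print("Wrote output.wav")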
Runtime Parameters for the Container#
| Flags | Description |
|---|---|
| -it | Run the container interactively with a pseudo-terminal (see Docker docs). |
| --rm | Delete the container after it stops (see Docker docs). |
| --name=$CONTAINER_ID | Give a name to the NIM container. Use any preferred value. |
| --runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
| --gpus '"device=0"' | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs. |
| --shm-size=8GB | Allocate host memory for multi-GPU communication. |
| -e NGC_API_KEY | Provide the container with the token necessary to download the appropriate models and resources from NGC. See NGC Authentication. |
| -e NIM_HTTP_API_PORT=9000 | Specify the port to use for the HTTP endpoint. The port can have any value except 8000. Default 9000. |
| -e NIM_GRPC_API_PORT=50051 | Specify the port to use for the gRPC endpoint. Default 50051. |
| -p 9000:9000 | Forward the port where the NIM HTTP server is published inside the container to access from the host system. The left-hand side of the mapping is the host port; the right-hand side must match NIM_HTTP_API_PORT. |
| -p 50051:50051 | Forward the port where the NIM gRPC server is published inside the container to access from the host system. The left-hand side of the mapping is the host port; the right-hand side must match NIM_GRPC_API_PORT. |
| -e NIM_TAGS_SELECTOR | Use this to filter tags in the auto profile selector. This can be a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example: name=magpie-tts-multilingual,model_type=prebuilt. |
Model Caching#
On initial startup, the container will download the models from NGC. You can skip this download step on future runs by caching the model locally using a cache directory as shown below.
# Create the cache directory on the host machine
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p $LOCAL_NIM_CACHE
chmod 777 $LOCAL_NIM_CACHE
# Set appropriate value for container ID
export CONTAINER_ID=magpie-tts-multilingual
# Set the appropriate values for NIM_TAGS_SELECTOR.
export NIM_TAGS_SELECTOR="name=magpie-tts-multilingual,model_type=prebuilt"
# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
On subsequent runs, the models will be loaded from cache.
RMIR models need to be deployed before you can use them. You need to deploy the models once and export the generated models for later use.
# Create the export directory on the host machine
export NIM_EXPORT_PATH=~/nim_export
mkdir -p $NIM_EXPORT_PATH
chmod 777 $NIM_EXPORT_PATH
# Set appropriate value for container ID
export CONTAINER_ID=riva-tts
# Set NIM_TAGS_SELECTOR for the desired <model> from the Supported Models table
export NIM_TAGS_SELECTOR="name=fastpitch-hifigan-en-us,model_type=rmir"
# Run the container with the export directory mounted in the appropriate location
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Once the model deployment is complete, the container terminates with the following log.
INFO:inference:Riva model generation completed
INFO:inference:Models exported to /opt/nim/export
INFO:inference:Exiting container
Subsequent runs can be made with the following command, setting NIM_DISABLE_MODEL_DOWNLOAD=true. The exported models are loaded instead of downloading models from NGC.
# Run the container with the cache directory mounted in the appropriate location
docker run -it --rm --name=$CONTAINER_ID \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=8GB \
-e NGC_API_KEY \
-e NIM_TAGS_SELECTOR \
-e NIM_DISABLE_MODEL_DOWNLOAD=true \
-e NIM_HTTP_API_PORT=9000 \
-e NIM_GRPC_API_PORT=50051 \
-p 9000:9000 \
-p 50051:50051 \
-v $NIM_EXPORT_PATH:/opt/nim/export \
-e NIM_EXPORT_PATH=/opt/nim/export \
nvcr.io/nim/nvidia/$CONTAINER_ID:latest
Stopping the Container#
The following commands stop and remove the running Docker container.
docker stop $CONTAINER_ID
docker rm $CONTAINER_ID