Support Matrix#
This documentation describes the software and hardware that Riva TTS NIM supports.
Hardware#
NVIDIA Riva TTS NIM is supported on NVIDIA Volta or later GPUs (Compute Capability >= 7.0). When selecting models to deploy, make sure their memory requirements do not exceed the available GPU memory; 16 GB or more of VRAM is recommended.
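As a rough pre-flight check, the hardware requirement above can be tested against the output of nvidia-smi. The sketch below assumes a driver recent enough that nvidia-smi accepts the compute_cap query field. The function names and the 0.5 GB tolerance (GPUs often report slightly less than their nominal memory) are illustrative choices and are not part of the NIM.

```python
import subprocess

MIN_COMPUTE_CAPABILITY = (7, 0)  # Volta or later, per the requirement above
RECOMMENDED_VRAM_GB = 16         # recommended VRAM, per the requirement above
VRAM_TOLERANCE_GB = 0.5          # assumption: allow for slightly under-nominal reporting


def parse_gpu_line(line):
    """Parse one CSV line of the form 'name, compute_cap, memory_total_mib'."""
    name, compute_cap, mem_mib = [field.strip() for field in line.split(",")]
    major, minor = (int(part) for part in compute_cap.split("."))
    return name, (major, minor), int(mem_mib) / 1024  # MiB -> GiB


def check_gpu(line):
    """Return (name, supported, meets_recommended_vram) for one GPU line."""
    name, capability, mem_gb = parse_gpu_line(line)
    supported = capability >= MIN_COMPUTE_CAPABILITY
    enough_vram = mem_gb >= RECOMMENDED_VRAM_GB - VRAM_TOLERANCE_GB
    return name, supported, enough_vram


def query_gpus():
    """Check every local GPU; requires nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [check_gpu(line) for line in out.splitlines() if line.strip()]
```

Calling query_gpus() on the target host returns one tuple per GPU. A False in the second position means the GPU is below Compute Capability 7.0, and a False in the third position means it is below the recommended memory size but may still be able to run the smaller models.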
GPUs Supported#
| GPU | Precision |
|---|---|
| A30, A100 | FP16 |
| H100 | FP16 |
| A2, A10, A16, A40 | FP16 |
| L4, L40, GeForce RTX 40xx | FP16 |
| GeForce RTX 50xx | FP16 |
Software#
Linux operating systems (Ubuntu 22.04 or later recommended)
NVIDIA Driver >= 535
NVIDIA Docker >= 23.0.1
Supported Models#
Riva TTS NIM supports the following models.
NIM automatically downloads the prebuilt model if one is available for the target GPU (GPUs with Compute Capability >= 8.0). On other GPUs (Compute Capability >= 7.0), it generates an optimized model on the fly from an RMIR model.
| Model | Languages Supported | Inference Mode | Publisher | WSL Support | Model Format |
|---|---|---|---|---|---|
| Magpie TTS Multilingual | English (en-US), Spanish (es-US), French (fr-FR), German (de-DE) | Streaming & Offline | NVIDIA | ❌ | Prebuilt |
| Magpie TTS Zeroshot | English (en-US) | Streaming & Offline | NVIDIA | ❌ | Prebuilt |
| Magpie TTS Flow | English (en-US) | Offline | NVIDIA | ✅ | Prebuilt |
| Fastpitch HifiGAN en-US | English (en-US) | Streaming & Offline | NVIDIA | ❌ | Prebuilt, RMIR |
Note
All models use FP16 precision.
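The prebuilt versus RMIR rule stated above can be written out as a small decision function. This is only one reading of that rule: the NIM makes this choice internally, the function name is hypothetical, and the source does not say what happens when a model has no RMIR format and the GPU is below Compute Capability 8.0, so raising an error in that case is an assumption.

```python
def select_model_format(compute_capability, available_formats):
    """Pick 'Prebuilt' or 'RMIR' for a GPU, following the rule described above.

    compute_capability: (major, minor) tuple, for example (8, 6)
    available_formats: formats listed for the model in the table above
    """
    if compute_capability < (7, 0):
        raise ValueError("GPUs below Compute Capability 7.0 are not supported")
    # Prebuilt models target GPUs with Compute Capability >= 8.0.
    if compute_capability >= (8, 0) and "Prebuilt" in available_formats:
        return "Prebuilt"
    # Other supported GPUs build an optimized model from RMIR.
    if "RMIR" in available_formats:
        return "RMIR"
    # Assumption: the source does not describe this case.
    raise ValueError("No model format listed for this GPU")
```

For example, select_model_format((7, 5), {"Prebuilt", "RMIR"}) selects the RMIR path, while the same model on a (8, 6) GPU selects the prebuilt one.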
Magpie TTS Multilingual#
This model supports text-to-speech in English (en-US), Spanish (es-US), French (fr-FR), and German (de-DE).
To use this model, set CONTAINER_ID to magpie-tts-multilingual. Then, set NIM_TAGS_SELECTOR to one of the values from the following table as required. For more information, refer to Launching the NIM.
Model Profiles#
| Model Profile | Inference Mode | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| | offline and streaming | 4.8 | 5.8 |
Available Voices#
Magpie-Multilingual.EN-US.Sofia (Default, Recommended for Female speaker)
Magpie-Multilingual.EN-US.Sofia.Calm
Magpie-Multilingual.EN-US.Sofia.Fearful
Magpie-Multilingual.EN-US.Sofia.Happy
Magpie-Multilingual.EN-US.Sofia.Angry
Magpie-Multilingual.EN-US.Sofia.Neutral
Magpie-Multilingual.EN-US.Ray.Calm
Magpie-Multilingual.EN-US.Ray.Fearful
Magpie-Multilingual.EN-US.Ray.Happy
Magpie-Multilingual.EN-US.Ray.Neutral
Magpie-Multilingual.EN-US.Ray.Angry
Magpie-Multilingual.EN-US.Ray.Disgusted
Magpie-Multilingual.EN-US.Ray (Recommended for Male speaker)
Magpie-Multilingual.EN-US.Mia
Magpie-Multilingual.ES-US.Isabela (Recommended for Female speaker)
Magpie-Multilingual.ES-US.Isabela.Neutral
Magpie-Multilingual.ES-US.Isabela.Angry
Magpie-Multilingual.ES-US.Isabela.Happy
Magpie-Multilingual.ES-US.Isabela.Calm
Magpie-Multilingual.ES-US.Isabela.Pleasant_Surprise
Magpie-Multilingual.ES-US.Isabela.Sad
Magpie-Multilingual.ES-US.Diego (Recommended for Male speaker)
Magpie-Multilingual.ES-US.Diego.Neutral
Magpie-Multilingual.ES-US.Diego.Angry
Magpie-Multilingual.ES-US.Diego.Happy
Magpie-Multilingual.ES-US.Diego.Calm
Magpie-Multilingual.ES-US.Diego.Pleasant_Surprise
Magpie-Multilingual.ES-US.Diego.Sad
Magpie-Multilingual.ES-US.Diego.Disgust
Magpie-Multilingual.FR-FR.Louise (Recommended for Female speaker)
Magpie-Multilingual.FR-FR.Louise.Angry
Magpie-Multilingual.FR-FR.Louise.Calm
Magpie-Multilingual.FR-FR.Louise.Disgust
Magpie-Multilingual.FR-FR.Louise.Sad
Magpie-Multilingual.FR-FR.Louise.Happy
Magpie-Multilingual.FR-FR.Louise.Fearful
Magpie-Multilingual.FR-FR.Louise.Neutral
Magpie-Multilingual.FR-FR.Pascal (Recommended for Male speaker)
Magpie-Multilingual.FR-FR.Pascal.Neutral
Magpie-Multilingual.FR-FR.Pascal.Angry
Magpie-Multilingual.FR-FR.Pascal.Calm
Magpie-Multilingual.FR-FR.Pascal.Sad
Magpie-Multilingual.DE-DE.Jason
Magpie-Multilingual.DE-DE.Leo
Magpie-Multilingual.DE-DE.Aria
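The voice names listed above appear to follow a dotted pattern of model, language, speaker, and an optional emotion. The sketch below splits a name on that assumption; the pattern is inferred from the list rather than documented, and Voice and parse_voice are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class Voice(NamedTuple):
    model: str              # for example "Magpie-Multilingual"
    language: str           # for example "EN-US"
    speaker: str            # for example "Sofia"
    emotion: Optional[str]  # for example "Calm", or None when absent


def parse_voice(name):
    """Split a voice name of the form model.language.speaker[.emotion]."""
    parts = name.strip().split(".")
    if len(parts) not in (3, 4):
        raise ValueError(f"Unexpected voice name: {name!r}")
    model, language, speaker = parts[:3]
    emotion = parts[3] if len(parts) == 4 else None
    return Voice(model, language, speaker, emotion)


def voices_for_language(names, language):
    """Keep only the voice names whose language field matches, ignoring case."""
    return [n for n in names if parse_voice(n).language.upper() == language.upper()]
```

A helper like this can be used to filter the list by language before presenting voice choices to a user, for example voices_for_language(all_names, "fr-FR").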
Magpie TTS Zeroshot#
This model supports synthesizing speech in English (en-US) from input text and an audio prompt of three to ten seconds. A set of built-in voices is also available.
To use this model, set CONTAINER_ID to magpie-tts-zeroshot. Then, set NIM_TAGS_SELECTOR to one of the values from the following table, based on your requirements. For more information, refer to Launching the NIM.
Note
Access to the Magpie TTS Zeroshot model is restricted. Apply for access.
Model Profiles#
| Model Profile | Inference Mode | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| | offline and streaming | 3.8 | 4.8 |
Available Voices#
Magpie-ZeroShot.Female-1 (Default, Recommended for Female speaker)
Magpie-ZeroShot.Female-Neutral
Magpie-ZeroShot.Female-Angry
Magpie-ZeroShot.Female-Fearful
Magpie-ZeroShot.Female-Calm
Magpie-ZeroShot.Female-Happy
Magpie-ZeroShot.Male-1 (Recommended for Male speaker)
Magpie-ZeroShot.Male-Calm
Magpie-ZeroShot.Male-Neutral
Magpie-ZeroShot.Male-Angry
Magpie-ZeroShot.Male-Fearful
Magpie TTS Flow#
This offline-only model synthesizes speech in English (en-US) from input text. In addition to the built-in voices, it supports speech synthesis from an audio prompt of three to ten seconds, provided together with a transcript of that prompt.
To use this model, set CONTAINER_ID to magpie-tts-flow. Then, set NIM_TAGS_SELECTOR to one of the values from the following table, based on your requirements. Refer to Launching the NIM for details.
Note
Access to the Magpie TTS Flow model is restricted. Apply here to request access.
Model Profiles#
| Model Profile | Inference Mode | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| | offline | 7.2 | 5.1 |
Available Voices#
English-US-Magpie-Flow.Female-1 (Default, Recommended for Female speaker)
English-US-Magpie-Flow.Female.Calm
English-US-Magpie-Flow.Female.Fearful
English-US-Magpie-Flow.Female.Happy
English-US-Magpie-Flow.Female.Neutral
English-US-Magpie-Flow.Female.Angry
English-US-Magpie-Flow.Female.Disgusted
English-US-Magpie-Flow.Female.Sad
English-US-Magpie-Flow.Male.Calm
English-US-Magpie-Flow.Male.Fearful
English-US-Magpie-Flow.Male.Happy
English-US-Magpie-Flow.Male.Neutral
English-US-Magpie-Flow.Male.Angry
English-US-Magpie-Flow.Male.Disgusted
English-US-Magpie-Flow.Male.Sad
English-US-Magpie-Flow.Male-1 (Recommended for Male speaker)
Fastpitch HifiGAN en-US#
This model supports text-to-speech in English (en-US).
To use this model, set CONTAINER_ID to riva-tts. Then, set NIM_TAGS_SELECTOR to one of the values from the following table as required. Refer to Launching the NIM for details.
Model Profiles#
| Model Profile | Inference Mode | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| | offline and streaming | 1.4 | 2 |
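The CONTAINER_ID values and GPU memory figures from the sections above can be collected in one place, which makes it easier to prepare the environment and to sanity-check memory before deployment. The sketch below is illustrative only. It assumes the variables are set with shell export statements, that memory use adds up when several models share one GPU, and that the NIM_TAGS_SELECTOR value is supplied by the caller from the relevant Model Profiles table. The helper names are hypothetical.

```python
import shlex

# CONTAINER_ID and GPU memory (GB) as listed in the sections above.
MODELS = {
    "Magpie TTS Multilingual": {"container_id": "magpie-tts-multilingual", "gpu_gb": 5.8},
    "Magpie TTS Zeroshot": {"container_id": "magpie-tts-zeroshot", "gpu_gb": 4.8},
    "Magpie TTS Flow": {"container_id": "magpie-tts-flow", "gpu_gb": 5.1},
    "Fastpitch HifiGAN en-US": {"container_id": "riva-tts", "gpu_gb": 2.0},
}


def export_lines(model, tags_selector):
    """Build shell export statements for one model.

    tags_selector must be a value taken from that model's Model Profiles table.
    """
    container_id = MODELS[model]["container_id"]
    return [
        f"export CONTAINER_ID={shlex.quote(container_id)}",
        f"export NIM_TAGS_SELECTOR={shlex.quote(tags_selector)}",
    ]


def fits_on_gpu(models, available_gpu_gb):
    """Assumption: GPU memory needs add up when models share one GPU."""
    required = sum(MODELS[name]["gpu_gb"] for name in models)
    return required <= available_gpu_gb
```

With these figures, Magpie TTS Multilingual and Magpie TTS Zeroshot together need about 10.6 GB, which fits on a 16 GB GPU, whereas all four models together need about 17.7 GB and do not.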