Support Matrix

This page describes the hardware and software that Riva TTS NIM supports.

Hardware

NVIDIA Riva TTS NIM is supported on NVIDIA Volta or later GPUs (Compute Capability >= 7.0). When selecting models to deploy, make sure they fit within the available memory; 16 GB or more of VRAM is recommended.
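As a rough pre-deployment check, the GPU Memory figures from the Model Profiles tables on this page can be summed and compared with the GPU's VRAM. This is a minimal sketch under stated assumptions: it assumes the memory needs of models deployed on the same GPU simply add up, and it ignores memory already used by other processes. The helper name `check_deployment` is hypothetical and not part of the NIM.

```python
# GPU memory (GB) per model profile, copied from the Model Profiles tables on this page.
GPU_MEMORY_GB = {
    "name=magpie-tts-multilingual": 5.8,
    "name=magpie-tts-zeroshot": 4.8,
    "name=magpie-tts-flow": 5.1,
    "name=fastpitch-hifigan-en-us": 2.0,
}

MIN_COMPUTE_CAPABILITY = (7, 0)  # Volta or later
RECOMMENDED_VRAM_GB = 16.0


def check_deployment(profiles, compute_capability, vram_gb):
    """Return a list of issues for running `profiles` on one GPU (empty list: none found).

    Hypothetical helper: assumes per-profile GPU memory simply adds up and
    ignores memory already used by other processes.
    """
    issues = []
    if compute_capability < MIN_COMPUTE_CAPABILITY:
        issues.append("GPU must be Volta or later (Compute Capability >= 7.0)")
    needed_gb = sum(GPU_MEMORY_GB[p] for p in profiles)
    if needed_gb > vram_gb:
        issues.append(f"profiles need {needed_gb:.1f} GB but the GPU has {vram_gb:.1f} GB")
    if vram_gb < RECOMMENDED_VRAM_GB:
        issues.append(f"{vram_gb:.1f} GB VRAM is below the recommended {RECOMMENDED_VRAM_GB:.0f} GB")
    return issues


# Multilingual + Flow on a 10 GB card: 5.8 + 5.1 = 10.9 GB does not fit.
print(check_deployment(["name=magpie-tts-multilingual", "name=magpie-tts-flow"], (8, 6), 10.0))
```

The check reports every issue instead of stopping at the first, so a single call shows both a capacity problem and a below-recommendation VRAM size.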

GPUs Supported

GPU	Precision
A30, A100	FP16
H100	FP16
A2, A10, A16, A40	FP16
L4, L40, GeForce RTX 40xx	FP16
GeForce RTX 50xx	FP16

Software

Linux operating systems (Ubuntu 22.04 or later recommended)
NVIDIA Driver >= 535
NVIDIA Docker >= 23.0.1
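The minimum versions above can be checked by comparing dotted version numbers as tuples of integers. This is a sketch that assumes purely numeric version strings (a suffix such as `-ce` would need to be stripped first); the installed versions would come from your system, for example the driver version that `nvidia-smi` reports. The names `meets_minimum` and `MINIMUM_VERSIONS` are hypothetical.

```python
# Minimum versions from the Software requirements above.
MINIMUM_VERSIONS = {
    "nvidia_driver": "535",
    "nvidia_docker": "23.0.1",
}


def parse_version(text):
    """Convert a dotted numeric version such as "535.104.05" to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(component, installed):
    """True if `installed` is at or above the documented minimum for `component`.

    Tuples compare element by element, so (535, 104, 5) >= (535,) is True
    and (23, 0, 0) >= (23, 0, 1) is False.
    """
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[component])


print(meets_minimum("nvidia_driver", "550.54.15"))  # True
print(meets_minimum("nvidia_docker", "20.10.21"))   # False
```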

Supported Models

Riva TTS NIM supports the following models.

NIM automatically downloads the prebuilt model if one is available for the target GPU (Compute Capability >= 8.0). On other GPUs (Compute Capability >= 7.0), it generates an optimized model on the fly from the RMIR model.

Model	Languages Supported	Inference Mode	Publisher	WSL Support	Model Format
Magpie TTS Multilingual	English (en-US), Spanish (es-US), French (fr-FR), German (de-DE)	Streaming & Offline	NVIDIA	❌	Prebuilt
Magpie TTS Zeroshot	English (en-US)	Streaming & Offline	NVIDIA	❌	Prebuilt
Magpie TTS Flow	English (en-US)	Offline	NVIDIA	✅	Prebuilt
Fastpitch HifiGAN en-US	English (en-US)	Streaming & Offline	NVIDIA	❌	Prebuilt, RMIR
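To pick a model programmatically, the table above can be transcribed into a small data structure and filtered by language, inference mode, and WSL support. The `TTSModel` class and `find_models` function are illustrative names only; they are not part of the NIM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    name: str
    languages: tuple[str, ...]
    streaming: bool  # True if streaming is supported in addition to offline
    wsl: bool
    formats: tuple[str, ...]


# Rows transcribed from the Supported Models table above.
MODELS = (
    TTSModel("Magpie TTS Multilingual", ("en-US", "es-US", "fr-FR", "de-DE"), True, False, ("Prebuilt",)),
    TTSModel("Magpie TTS Zeroshot", ("en-US",), True, False, ("Prebuilt",)),
    TTSModel("Magpie TTS Flow", ("en-US",), False, True, ("Prebuilt",)),
    TTSModel("Fastpitch HifiGAN en-US", ("en-US",), True, False, ("Prebuilt", "RMIR")),
)


def find_models(language, need_streaming=False, need_wsl=False):
    """Return the names of models that support `language` and the requested features."""
    return [
        m.name
        for m in MODELS
        if language in m.languages
        and (m.streaming or not need_streaming)
        and (m.wsl or not need_wsl)
    ]


print(find_models("en-US", need_wsl=True))  # ['Magpie TTS Flow']
```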

Note

All models use FP16 precision.
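The download-or-build behavior described under Supported Models can be read as the decision sketched below, taking a model's Model Format entries as input. This is a simplification under stated assumptions: it treats "Prebuilt" as usable on any GPU with Compute Capability >= 8.0, whereas the NIM itself decides per GPU whether a prebuilt model exists. Under this reading, a model without an RMIR entry has no fallback on Compute Capability 7.x GPUs; that is an inference from the table, not a documented statement. The function name is hypothetical.

```python
def select_model_format(compute_capability, formats):
    """Return "Prebuilt" or "RMIR" for a GPU and a model's Model Format entries.

    compute_capability: (major, minor), e.g. (8, 6) for an A10.
    formats: the Model Format column for the model, e.g. ("Prebuilt", "RMIR").
    """
    if compute_capability < (7, 0):
        raise ValueError("Riva TTS NIM requires Compute Capability >= 7.0")
    if compute_capability >= (8, 0) and "Prebuilt" in formats:
        return "Prebuilt"  # download the prebuilt model
    if "RMIR" in formats:
        return "RMIR"  # generate an optimized model on the fly from RMIR
    raise ValueError("no prebuilt model and no RMIR fallback for this GPU")


print(select_model_format((7, 5), ("Prebuilt", "RMIR")))  # RMIR
```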

Magpie TTS Multilingual

The model supports text-to-speech in English (en-US), Spanish (es-US), French (fr-FR), and German (de-DE).

To use this model, set CONTAINER_ID to magpie-tts-multilingual. Then, set NIM_TAGS_SELECTOR to one of the values from the following table as required. For more information, refer to Launching the NIM.

Model Profiles

Model Profile (Selected using `NIM_TAGS_SELECTOR`)	Inference Mode	CPU Memory (GB)	GPU Memory (GB)
`name=magpie-tts-multilingual`	offline and streaming	4.8	5.8
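When scripting a deployment, the CONTAINER_ID and NIM_TAGS_SELECTOR values given in each model section of this page can be kept in one mapping and exported into the environment before running the launch command from Launching the NIM (not reproduced here). `LAUNCH_SETTINGS` and `launch_env` are hypothetical names used only for this sketch.

```python
import os

# (CONTAINER_ID, NIM_TAGS_SELECTOR) for each model, from the model sections of this page.
LAUNCH_SETTINGS = {
    "Magpie TTS Multilingual": ("magpie-tts-multilingual", "name=magpie-tts-multilingual"),
    "Magpie TTS Zeroshot": ("magpie-tts-zeroshot", "name=magpie-tts-zeroshot"),
    "Magpie TTS Flow": ("magpie-tts-flow", "name=magpie-tts-flow"),
    "Fastpitch HifiGAN en-US": ("riva-tts", "name=fastpitch-hifigan-en-us"),
}


def launch_env(model, base=None):
    """Return a copy of `base` (default: os.environ) with both variables set for `model`."""
    env = dict(os.environ if base is None else base)
    env["CONTAINER_ID"], env["NIM_TAGS_SELECTOR"] = LAUNCH_SETTINGS[model]
    return env


env = launch_env("Magpie TTS Multilingual")
print(env["CONTAINER_ID"], env["NIM_TAGS_SELECTOR"])
```

Copying the environment rather than mutating `os.environ` keeps the settings scoped to one launch; the returned dictionary can be passed as the `env` argument of `subprocess.run`.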

Available Voices

English (en-US)

Magpie-Multilingual.EN-US.Sofia (Default, Recommended for Female speaker)
Magpie-Multilingual.EN-US.Sofia.Calm
Magpie-Multilingual.EN-US.Sofia.Fearful
Magpie-Multilingual.EN-US.Sofia.Happy
Magpie-Multilingual.EN-US.Sofia.Angry
Magpie-Multilingual.EN-US.Sofia.Neutral
Magpie-Multilingual.EN-US.Ray.Calm
Magpie-Multilingual.EN-US.Ray.Fearful
Magpie-Multilingual.EN-US.Ray.Happy
Magpie-Multilingual.EN-US.Ray.Neutral
Magpie-Multilingual.EN-US.Ray.Angry
Magpie-Multilingual.EN-US.Ray.Disgusted
Magpie-Multilingual.EN-US.Ray (Recommended for Male speaker)
Magpie-Multilingual.EN-US.Mia

Spanish (es-US)

Magpie-Multilingual.ES-US.Isabela (Recommended for Female speaker)
Magpie-Multilingual.ES-US.Isabela.Neutral
Magpie-Multilingual.ES-US.Isabela.Angry
Magpie-Multilingual.ES-US.Isabela.Happy
Magpie-Multilingual.ES-US.Isabela.Calm
Magpie-Multilingual.ES-US.Isabela.Pleasant_Surprise
Magpie-Multilingual.ES-US.Isabela.Sad
Magpie-Multilingual.ES-US.Diego (Recommended for Male speaker)
Magpie-Multilingual.ES-US.Diego.Neutral
Magpie-Multilingual.ES-US.Diego.Angry
Magpie-Multilingual.ES-US.Diego.Happy
Magpie-Multilingual.ES-US.Diego.Calm
Magpie-Multilingual.ES-US.Diego.Pleasant_Surprise
Magpie-Multilingual.ES-US.Diego.Sad
Magpie-Multilingual.ES-US.Diego.Disgust

French (fr-FR)

Magpie-Multilingual.FR-FR.Louise (Recommended for Female speaker)
Magpie-Multilingual.FR-FR.Louise.Angry
Magpie-Multilingual.FR-FR.Louise.Calm
Magpie-Multilingual.FR-FR.Louise.Disgust
Magpie-Multilingual.FR-FR.Louise.Sad
Magpie-Multilingual.FR-FR.Louise.Happy
Magpie-Multilingual.FR-FR.Louise.Fearful
Magpie-Multilingual.FR-FR.Louise.Neutral
Magpie-Multilingual.FR-FR.Pascal (Recommended for Male speaker)
Magpie-Multilingual.FR-FR.Pascal.Neutral
Magpie-Multilingual.FR-FR.Pascal.Angry
Magpie-Multilingual.FR-FR.Pascal.Calm
Magpie-Multilingual.FR-FR.Pascal.Sad

German (de-DE)

Magpie-Multilingual.DE-DE.Jason
Magpie-Multilingual.DE-DE.Leo
Magpie-Multilingual.DE-DE.Aria
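The Magpie TTS Multilingual voice names listed above follow the pattern `Magpie-Multilingual.<LOCALE>.<Speaker>[.<Emotion>]`. That pattern is inferred from the lists, not from a documented specification, so treat this parser as a sketch; the Zeroshot and Flow voices use different naming patterns and are not covered. `MultilingualVoice` and `parse_multilingual_voice` are hypothetical names.

```python
from typing import NamedTuple, Optional


class MultilingualVoice(NamedTuple):
    model: str              # "Magpie-Multilingual"
    locale: str             # upper-case locale segment, e.g. "ES-US"
    speaker: str            # e.g. "Isabela"
    emotion: Optional[str]  # e.g. "Pleasant_Surprise", or None for the base voice

    def language_code(self):
        """Locale in the form used elsewhere on this page, e.g. "ES-US" -> "es-US"."""
        language, region = self.locale.split("-")
        return f"{language.lower()}-{region}"


def parse_multilingual_voice(name):
    """Split a Magpie TTS Multilingual voice name into its parts."""
    parts = name.split(".")
    if len(parts) not in (3, 4) or parts[0] != "Magpie-Multilingual":
        raise ValueError(f"unexpected voice name: {name!r}")
    emotion = parts[3] if len(parts) == 4 else None
    return MultilingualVoice(parts[0], parts[1], parts[2], emotion)


voice = parse_multilingual_voice("Magpie-Multilingual.ES-US.Isabela.Pleasant_Surprise")
print(voice.speaker, voice.emotion, voice.language_code())
```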

Magpie TTS Zeroshot

This model synthesizes speech in English (en-US) from input text and an audio prompt of three to ten seconds. A set of built-in voices is also available.
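Before sending an audio prompt, a client can check that it falls within the three-to-ten-second window. The sketch below measures a PCM WAV file with Python's standard `wave` module; it assumes the prompt is a WAV file and does not check other properties, such as sample rate or encoding, that the service may also require. The function names are hypothetical.

```python
import io
import wave

MIN_PROMPT_SECONDS = 3.0
MAX_PROMPT_SECONDS = 10.0


def prompt_duration_seconds(wav_bytes):
    """Duration of a PCM WAV file given as bytes: frame count / frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio_prompt(wav_bytes):
    """Return the prompt duration, or raise ValueError if it is outside 3 to 10 seconds."""
    duration = prompt_duration_seconds(wav_bytes)
    if not MIN_PROMPT_SECONDS <= duration <= MAX_PROMPT_SECONDS:
        raise ValueError(f"audio prompt is {duration:.2f} s; expected 3 to 10 s")
    return duration
```

For a file on disk, pass the bytes returned by `open(path, "rb").read()`.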

To use this model, set CONTAINER_ID to magpie-tts-zeroshot. Then, set NIM_TAGS_SELECTOR to one of the values from the following table, based on your requirements. For more information, refer to Launching the NIM.

Note

Access to the Magpie TTS Zeroshot model is restricted. Apply for access.

Model Profiles

Model Profile (Selected using `NIM_TAGS_SELECTOR`)	Inference Mode	CPU Memory (GB)	GPU Memory (GB)
`name=magpie-tts-zeroshot`	offline and streaming	3.8	4.8

Available Voices

English (en-US)

"Magpie-ZeroShot.Female-1", (Default, Recommended for Female speaker)
"Magpie-ZeroShot.Female-Neutral",
"Magpie-ZeroShot.Female-Angry",
"Magpie-ZeroShot.Female-Fearful",
"Magpie-ZeroShot.Female-Calm",
"Magpie-ZeroShot.Female-Happy",
"Magpie-ZeroShot.Male-1", (Recommended for Male speaker)
"Magpie-ZeroShot.Male-Calm",
"Magpie-ZeroShot.Male-Neutral",
"Magpie-ZeroShot.Male-Angry",
"Magpie-ZeroShot.Male-Fearful"

Magpie TTS Flow

This offline-only model synthesizes speech in English (en-US) from input text. In addition to built-in voices, it supports speech synthesis from an audio prompt of three to ten seconds together with a transcript of that prompt.
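Because Magpie TTS Flow is offline-only and its prompt-based mode needs both an audio prompt and its transcript, a client could reject incomplete requests before calling the service. The `FlowRequest` class below is a hypothetical client-side structure, not the NIM's actual request format; it tracks the audio prompt only by its duration in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowRequest:
    text: str
    prompt_seconds: Optional[float] = None   # length of the audio prompt, if one is used
    prompt_transcript: Optional[str] = None  # transcript of that audio prompt
    streaming: bool = False

    def validate(self):
        """Raise ValueError if the request cannot be served by Magpie TTS Flow."""
        if not self.text.strip():
            raise ValueError("text to synthesize is empty")
        if self.streaming:
            raise ValueError("Magpie TTS Flow supports offline synthesis only")
        if self.prompt_seconds is None:
            if self.prompt_transcript:
                raise ValueError("transcript given without an audio prompt")
            return  # built-in voice: nothing more to check
        if not 3.0 <= self.prompt_seconds <= 10.0:
            raise ValueError("audio prompt must be three to ten seconds long")
        if not (self.prompt_transcript and self.prompt_transcript.strip()):
            raise ValueError("an audio prompt needs its transcript")


FlowRequest("Hello!", prompt_seconds=4.5, prompt_transcript="This is my voice.").validate()
```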

To use this model, set CONTAINER_ID to magpie-tts-flow. Then, set NIM_TAGS_SELECTOR to one of the values from the following table, based on your requirements. Refer to Launching the NIM for details.

Note

Access to the Magpie TTS Flow model is restricted. Apply here to request access.

Model Profiles

Model Profile (Selected using `NIM_TAGS_SELECTOR`)	Inference Mode	CPU Memory (GB)	GPU Memory (GB)
`name=magpie-tts-flow`	offline	7.2	5.1

Available Voices

English (en-US)

English-US-Magpie-Flow.Female-1 (Default, Recommended for Female speaker)
English-US-Magpie-Flow.Female.Calm
English-US-Magpie-Flow.Female.Fearful
English-US-Magpie-Flow.Female.Happy
English-US-Magpie-Flow.Female.Neutral
English-US-Magpie-Flow.Female.Angry
English-US-Magpie-Flow.Female.Disgusted
English-US-Magpie-Flow.Female.Sad
English-US-Magpie-Flow.Male.Calm
English-US-Magpie-Flow.Male.Fearful
English-US-Magpie-Flow.Male.Happy
English-US-Magpie-Flow.Male.Neutral
English-US-Magpie-Flow.Male.Angry
English-US-Magpie-Flow.Male.Disgusted
English-US-Magpie-Flow.Male.Sad
English-US-Magpie-Flow.Male-1 (Recommended for Male speaker)

Fastpitch HifiGAN en-US

The model supports text-to-speech in English (en-US).

To use this model, set CONTAINER_ID to riva-tts. Then, set NIM_TAGS_SELECTOR to one of the values from the following table, based on your requirements. For more information, refer to Launching the NIM.

Model Profiles

Model Profile (Selected using `NIM_TAGS_SELECTOR`)	Inference Mode	CPU Memory (GB)	GPU Memory (GB)
`name=fastpitch-hifigan-en-us`	offline and streaming	1.4	2