Voices and Emotional Styles
Each TTS model ships with a catalog of built-in voices. This page explains how voice names are structured, which speakers support emotional style variants, and how to discover and select voices at runtime.
Naming Convention
Voice names follow a hierarchical pattern that encodes the model, locale, speaker, and optional emotional style.
Magpie TTS Multilingual

```text
Magpie-Multilingual.{LOCALE}.{Speaker}[.{Emotion}]
```
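| Segment | Description | Examples |
|---|---|---|
| Magpie-Multilingual | Model prefix | Always Magpie-Multilingual |
| {LOCALE} | Language and region in uppercase | EN-US, FR-FR, VI-VN |
| {Speaker} | Speaker identity | Aria, Pascal, Long |
| {Emotion} | Optional emotional style | Happy, Calm, Fearful |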
When you omit the emotion segment (for example, Magpie-Multilingual.EN-US.Aria), the model uses a default neutral style for that speaker.
Examples:

- Magpie-Multilingual.EN-US.Aria: Aria with the default style
- Magpie-Multilingual.EN-US.Aria.Happy: Aria with a happy emotional style
- Magpie-Multilingual.FR-FR.Pascal.Calm: Pascal (French) with a calm style
- Magpie-Multilingual.VI-VN.Long.Fearful: Long (Vietnamese) with a fearful style
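The naming pattern can be checked mechanically when you build or validate voice names in client code. The following is a minimal sketch. `parse_multilingual_voice` and `MultilingualVoice` are hypothetical helpers, not part of the Python clients. The regular expression also rests on assumptions: a locale is two uppercase letters, a hyphen, and two more uppercase letters, and speaker and emotion segments contain only ASCII letters.

```python
import re
from typing import NamedTuple, Optional


class MultilingualVoice(NamedTuple):
    locale: str
    speaker: str
    emotion: Optional[str]  # None means the speaker's default neutral style


# Assumed shape of Magpie-Multilingual.{LOCALE}.{Speaker}[.{Emotion}]
_PATTERN = re.compile(
    r"^Magpie-Multilingual\.([A-Z]{2}-[A-Z]{2})\.([A-Za-z]+)(?:\.([A-Za-z]+))?$"
)


def parse_multilingual_voice(name: str) -> MultilingualVoice:
    """Split a Magpie TTS Multilingual voice name into its segments."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Magpie-Multilingual voice name: {name!r}")
    locale, speaker, emotion = match.groups()
    return MultilingualVoice(locale, speaker, emotion)


print(parse_multilingual_voice("Magpie-Multilingual.EN-US.Aria"))
# MultilingualVoice(locale='EN-US', speaker='Aria', emotion=None)
print(parse_multilingual_voice("Magpie-Multilingual.FR-FR.Pascal.Calm"))
# MultilingualVoice(locale='FR-FR', speaker='Pascal', emotion='Calm')
```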
Magpie TTS Zeroshot

```text
Magpie-ZeroShot.{Gender}-{Style}
```
Built-in voices use a {Gender}-{Style} pattern. Female-1 and Male-1 are the recommended defaults.
Examples: Magpie-ZeroShot.Female-1, Magpie-ZeroShot.Male-Angry
Magpie TTS Flow

```text
English-US-Magpie-Flow.{Gender}[.{Emotion}]
```
This pattern is similar to Zeroshot, except that a dot (not a hyphen) separates the emotion from the gender. Default voices use {Gender}-1.
Examples: English-US-Magpie-Flow.Female-1, English-US-Magpie-Flow.Male.Happy
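Zeroshot and Flow names differ only in the separator before the style or emotion, and that difference is easy to get wrong. The following minimal sketch builds both forms from the patterns above. `zeroshot_voice` and `flow_voice` are hypothetical helper names. The fallback to `{Gender}-1` follows the documented defaults.

```python
from typing import Optional


def zeroshot_voice(gender: str, style: str = "1") -> str:
    """Magpie-ZeroShot.{Gender}-{Style}: a hyphen joins gender and style."""
    return f"Magpie-ZeroShot.{gender}-{style}"


def flow_voice(gender: str, emotion: Optional[str] = None) -> str:
    """English-US-Magpie-Flow.{Gender}[.{Emotion}]: a dot separates the emotion.

    With no emotion, fall back to the documented default {Gender}-1.
    """
    if emotion is None:
        return f"English-US-Magpie-Flow.{gender}-1"
    return f"English-US-Magpie-Flow.{gender}.{emotion}"


print(zeroshot_voice("Female"))         # Magpie-ZeroShot.Female-1
print(zeroshot_voice("Male", "Angry"))  # Magpie-ZeroShot.Male-Angry
print(flow_voice("Female"))             # English-US-Magpie-Flow.Female-1
print(flow_voice("Male", "Happy"))      # English-US-Magpie-Flow.Male.Happy
```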
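Emotional Style Availability
Not all speakers have emotional variants, and the available emotions differ by model, speaker, and locale. Emotion names must match exactly. For example, the disgust style is named Disgust in Multilingual but Disgusted in Flow.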
Emotions by Model

| Emotion | Multilingual | Zeroshot | Flow |
|---|---|---|---|
| Angry | ✅ | ✅ | ✅ |
| Calm | ✅ | ✅ | ✅ |
| Fearful | ✅ | ✅ | ✅ |
| Happy | ✅ | ✅ | ✅ |
| Neutral | ✅ | ✅ | ✅ |
| Sad | ✅ | ❌ | ✅ |
| Disgusted | ❌ | ❌ | ✅ |
| Disgust | ✅ (ES-US, FR-FR, VI-VN only) | ❌ | ❌ |
| PleasantSurprise | ✅ (ES-US only) | ❌ | ❌ |
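To validate an emotion before sending a request, you can encode the Emotions by Model table as data. The following is a minimal sketch. `EMOTIONS_BY_MODEL` and `model_supports` are hypothetical names, and the dictionary is transcribed from the table above. It checks only model-level availability. Whether a particular speaker offers an emotion is a separate question, covered by the speaker table.

```python
from typing import Dict, Optional, Set

# Transcribed from the "Emotions by Model" table. None means the table places
# no locale restriction on the emotion; a set lists the only locales allowed.
EMOTIONS_BY_MODEL: Dict[str, Dict[str, Optional[Set[str]]]] = {
    "Multilingual": {
        "Angry": None, "Calm": None, "Fearful": None, "Happy": None,
        "Neutral": None, "Sad": None,
        "Disgust": {"ES-US", "FR-FR", "VI-VN"},
        "PleasantSurprise": {"ES-US"},
    },
    "Zeroshot": {e: None for e in ("Angry", "Calm", "Fearful", "Happy", "Neutral")},
    "Flow": {
        e: None
        for e in ("Angry", "Calm", "Fearful", "Happy", "Neutral", "Sad", "Disgusted")
    },
}


def model_supports(model: str, emotion: str, locale: Optional[str] = None) -> bool:
    """True if the table lists `emotion` for `model` (and for `locale`, if given)."""
    emotions = EMOTIONS_BY_MODEL.get(model, {})
    if emotion not in emotions:
        return False
    allowed = emotions[emotion]
    return allowed is None or locale is None or locale in allowed


print(model_supports("Multilingual", "Disgust", "FR-FR"))  # True
print(model_supports("Multilingual", "Disgust", "EN-US"))  # False
print(model_supports("Flow", "Disgust"))                   # False (Flow uses Disgusted)
```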
Multilingual Speakers with Emotional Variants
In the Magpie TTS Multilingual model, emotional styles are available only for specific speaker-locale combinations.
| Speaker | EN-US | ES-US | FR-FR | DE-DE | ZH-CN | VI-VN | IT-IT |
|---|---|---|---|---|---|---|---|
| Aria | 6 emotions | – | – | – | – | – | – |
| Jason | 4 emotions | – | – | – | – | – | – |
| Leo | 5 emotions | – | – | – | – | – | – |
| Mia | 5 emotions | – | – | – | – | – | – |
| Ray | 5 emotions | – | – | – | – | – | – |
| Sofia | 5 emotions | – | – | – | – | – | – |
| Diego | – | 6 emotions | – | – | – | – | – |
| Isabela | – | 7 emotions | – | – | – | – | – |
| Pascal | – | – | 6 emotions | – | – | – | – |
| Long | – | – | – | – | – | 7 emotions | – |
Speaker-locale pairs without emotional variants (for example, Diego in EN-US, or Louise in any locale) produce speech in the default neutral style.
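Because emotional variants depend on the speaker-locale pair and not on the speaker alone, a lookup keyed on both is a natural way to represent this table. The following is a minimal sketch. `EMOTION_COUNTS` and `emotion_count` are hypothetical names, and the counts are transcribed from the table above. Any pair absent from the map is treated as neutral-only.

```python
from typing import Dict, Tuple

# Transcribed from the speaker table: (speaker, locale) -> number of emotional
# styles. Pairs not listed have no emotional variants (default neutral style).
EMOTION_COUNTS: Dict[Tuple[str, str], int] = {
    ("Aria", "EN-US"): 6, ("Jason", "EN-US"): 4, ("Leo", "EN-US"): 5,
    ("Mia", "EN-US"): 5, ("Ray", "EN-US"): 5, ("Sofia", "EN-US"): 5,
    ("Diego", "ES-US"): 6, ("Isabela", "ES-US"): 7,
    ("Pascal", "FR-FR"): 6, ("Long", "VI-VN"): 7,
}


def emotion_count(speaker: str, locale: str) -> int:
    """Number of emotional styles for a speaker-locale pair (0 means neutral only)."""
    return EMOTION_COUNTS.get((speaker, locale), 0)


for speaker, locale in [("Aria", "EN-US"), ("Diego", "ES-US"), ("Diego", "EN-US")]:
    print(speaker, locale, emotion_count(speaker, locale))
# Aria EN-US 6
# Diego ES-US 6
# Diego EN-US 0
```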
Discover Voices at Runtime
To list the voices that a deployed TTS NIM is currently serving, use one of the following clients.

gRPC, using the Python client:

```bash
python3 python-clients/scripts/tts/talk.py \
  --server 0.0.0.0:50051 \
  --list-voices
```

HTTP:

```bash
curl -sS http://localhost:9000/v1/audio/list_voices | jq
```

Realtime client:

```bash
python3 python-clients/scripts/tts/realtime_tts_client.py \
  --server localhost:9000 \
  --list-voices
```
The output lists voice names grouped by language code. Use these exact names in the --voice parameter when synthesizing speech.
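If you need the voice list inside a program instead of in a terminal, you can call the same HTTP endpoint that the curl command uses. The following is a minimal sketch using only the Python standard library. `fetch_voices` and `voices_by_language` are hypothetical helpers. The exact JSON shape is an assumption: each top-level key is taken to be a language code (or a group of codes), and each value either a list of voice names or an object with a "voices" list. Check the actual response from your deployment first.

```python
import json
import urllib.request
from typing import Dict, List

# Endpoint taken from the curl example above; adjust host and port as needed.
LIST_VOICES_URL = "http://localhost:9000/v1/audio/list_voices"


def voices_by_language(payload: dict) -> Dict[str, List[str]]:
    """Normalize a list_voices response into {language key: [voice names]}.

    Assumption: each value is a list of names or an object with a "voices" list.
    """
    result: Dict[str, List[str]] = {}
    for lang, entry in payload.items():
        voices = entry.get("voices", []) if isinstance(entry, dict) else entry
        result[lang] = list(voices)
    return result


def fetch_voices(url: str = LIST_VOICES_URL) -> Dict[str, List[str]]:
    """GET the list_voices endpoint and normalize the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return voices_by_language(json.load(response))
```

With a running NIM, `fetch_voices()` returns a dictionary whose values are the exact names to pass to --voice.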
Select a Voice
Pass the full voice name to the --voice flag. If you omit --voice, the server selects the first available voice for the specified --language-code.
```bash
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
  --language-code en-US \
  --text "Hello from Aria with a happy tone." \
  --voice Magpie-Multilingual.EN-US.Aria.Happy \
  --output output.wav
```
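When you drive talk.py from a script, building the argument list explicitly makes the optional --voice flag easy to handle. The following is a minimal sketch. `talk_command` is a hypothetical helper, and it uses only the flags shown in the command above. Leaving out `voice` drops --voice, so the server falls back to its default voice for the language code.

```python
import shlex
from typing import List, Optional


def talk_command(text: str, language_code: str, output: str,
                 voice: Optional[str] = None,
                 server: str = "0.0.0.0:50051") -> List[str]:
    """Build a talk.py argument list; omit --voice to let the server choose."""
    argv = ["python3", "python-clients/scripts/tts/talk.py",
            "--server", server,
            "--language-code", language_code,
            "--text", text]
    if voice is not None:
        argv += ["--voice", voice]
    argv += ["--output", output]
    return argv


cmd = talk_command("Hello from Aria with a happy tone.", "en-US", "output.wav",
                   voice="Magpie-Multilingual.EN-US.Aria.Happy")
print(shlex.join(cmd))  # shell-quoted form, useful for logging
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with the free-form --text value.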
Cross-Language Accents
The Magpie TTS Multilingual model allows mixing a voice from one locale with text in another language to produce accented speech. For example, synthesize English text with a French-accented voice:
```bash
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
  --language-code en-US \
  --text "This English text is spoken with a French accent." \
  --voice Magpie-Multilingual.FR-FR.Pascal \
  --output output.wav
```
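Accented speech results when the voice's locale segment differs from the --language-code value. The following minimal sketch compares the two so that a client can flag such requests deliberately. `is_accented` is a hypothetical helper. It assumes the locale is the second dot-separated segment of a Magpie-Multilingual name, and it compares case-insensitively because the voice uses EN-US while the flag uses en-US.

```python
def is_accented(voice: str, language_code: str) -> bool:
    """True when a Magpie-Multilingual voice's locale differs from the text language.

    Assumes the name follows Magpie-Multilingual.{LOCALE}.{Speaker}[.{Emotion}].
    """
    parts = voice.split(".")
    if len(parts) < 3 or parts[0] != "Magpie-Multilingual":
        raise ValueError(f"not a Magpie-Multilingual voice name: {voice!r}")
    return parts[1].upper() != language_code.upper()


print(is_accented("Magpie-Multilingual.FR-FR.Pascal", "en-US"))      # True
print(is_accented("Magpie-Multilingual.EN-US.Aria.Happy", "en-US"))  # False
```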
Complete Voice Reference
For the full list of all available voices per model and locale, refer to the TTS support matrix.