Troubleshooting NVIDIA TTS NIM Microservice Issues#

This page covers troubleshooting issues specific to the NVIDIA TTS NIM microservices. For issues shared across all NVIDIA Speech NIM microservices, see Common Issues.

gRPC Message Size Limit Exceeded#

Symptom#

Offline synthesis fails with a gRPC error when synthesizing long text. The error message indicates the response exceeds the maximum message size.

Cause#

gRPC limits message size to 4 MB by default. Long input text can produce audio that exceeds this limit when returned as a single response in offline mode.

Solution#

Use streaming synthesis instead of offline synthesis. Streaming returns audio in chunks and handles arbitrarily long text.

```
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
  --language-code en-US \
  --text "Your long input text here" \
  --voice Magpie-Multilingual.EN-US.Aria \
  --stream \
  --output output.wav
```

Invalid or Unrecognized Voice Name#

Symptom#

The synthesis request returns an error indicating the voice name is invalid or not found.

Cause#

Voice names are model-specific. For example, Magpie TTS Multilingual voices follow the format Magpie-Multilingual.LOCALE.Speaker. Using an incorrect name, misspelling a voice, or requesting a voice that belongs to a different model triggers this error.

Solution#

List the available voices for the deployed model:

```
python3 python-clients/scripts/tts/talk.py \
  --server 0.0.0.0:50051 \
  --list-voices
```

Use the exact voice name from the list. Voice name formats vary by model:

| Model | Format | Example |
| --- | --- | --- |
| Magpie TTS Multilingual | `Magpie-Multilingual.LOCALE.Speaker` | `Magpie-Multilingual.EN-US.Aria` |
| Magpie TTS Zeroshot | `Magpie-ZeroShot.Speaker` | `Magpie-ZeroShot.Female-1` |
| Magpie TTS Flow | `English-US-Magpie-Flow.Speaker` | `English-US-Magpie-Flow.Female-1` |
| RAD-TTS HiFi-GAN | `English-US-RadTTS.Speaker` | `English-US-RadTTS.Female-1` |

Verify you are using a voice that matches the deployed model. For example, Magpie-ZeroShot.Female-1 is only available on the Magpie TTS Zeroshot model. See the TTS support matrix for all available voices per model.
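
A quick client-side sanity check can catch a voice that belongs to a different model before the request is sent. The sketch below is illustrative only: the prefixes are copied from the table above, while the helper names `voice_matches_model` and `guess_model` are hypothetical and not part of any NVIDIA client library. It checks the name prefix only, so the output of `--list-voices` remains the authoritative source.

```python
from typing import Optional

# Voice-name prefixes per model, copied from the table on this page.
VOICE_PREFIXES = {
    "Magpie TTS Multilingual": "Magpie-Multilingual",
    "Magpie TTS Zeroshot": "Magpie-ZeroShot",
    "Magpie TTS Flow": "English-US-Magpie-Flow",
    "RAD-TTS HiFi-GAN": "English-US-RadTTS",
}


def voice_matches_model(voice: str, model: str) -> bool:
    """Return True if `voice` starts with the prefix listed for `model`."""
    prefix = VOICE_PREFIXES[model] + "."
    # Require at least one character after the prefix (locale or speaker).
    return voice.startswith(prefix) and len(voice) > len(prefix)


def guess_model(voice: str) -> Optional[str]:
    """Return the model whose prefix matches `voice`, or None if none does."""
    for model in VOICE_PREFIXES:
        if voice_matches_model(voice, model):
            return model
    return None
```

For example, `guess_model("Magpie-ZeroShot.Female-1")` points at the Zeroshot model, so sending that voice to a Magpie TTS Multilingual deployment would be expected to fail.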

Streaming HTTP Output Is Raw Audio (Not WAV)#

Symptom#

The audio file produced by the streaming HTTP endpoint (/v1/audio/synthesize_online) cannot be played or sounds like static. Audio players report an invalid or unrecognized format.

Cause#

The streaming HTTP API returns raw LPCM audio data without a WAV header. Opening the raw file directly in an audio player fails because the player cannot determine the sample rate, bit depth, or channel count.

Solution#

Convert the raw output to WAV using sox. Match the sample rate to your model: 22050 Hz for Magpie models, 44100 Hz for RAD-TTS.

```
curl -sS http://localhost:9000/v1/audio/synthesize_online --fail-with-body \
  -F language=en-US \
  -F text="Your text here" \
  -F voice=Magpie-Multilingual.EN-US.Aria \
  -F sample_rate_hz=22050 \
  --output output.raw
```

```
sox -b 16 -e signed -c 1 -r 22050 output.raw output.wav
```

Match the -r value to the sample_rate_hz used in the request.
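
If sox is not available, the same header can be added with Python's standard `wave` module. This is a minimal sketch, assuming the raw data is 16-bit signed, little-endian, mono LPCM (the same parameters passed to sox above). The function name `raw_pcm_to_wav` is hypothetical.

```python
import wave


def raw_pcm_to_wav(raw_path: str, wav_path: str, sample_rate_hz: int = 22050) -> None:
    """Wrap headerless 16-bit signed mono LPCM in a WAV container.

    Assumes little-endian samples; pass the same sample rate that was
    used as sample_rate_hz in the synthesis request.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    if len(pcm) % 2 != 0:
        # 16-bit samples are 2 bytes each; an odd length suggests a truncated file.
        raise ValueError("raw file length is not a multiple of 2 bytes")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)              # mono
        w.setsampwidth(2)              # 16-bit samples
        w.setframerate(sample_rate_hz)
        w.writeframes(pcm)
```

Calling `raw_pcm_to_wav("output.raw", "output.wav", 22050)` would mirror the sox command shown above.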

Voice Cloning Audio Prompt Rejected#

Symptom#

The voice cloning request fails or returns an error indicating the audio prompt is invalid.

Cause#

The audio prompt does not meet the requirements. Magpie TTS Zeroshot and Magpie TTS Flow require a reference audio file with the following properties:

- Duration of 3 to 5 seconds (recommended).
- 16-bit mono WAV format.
- 22.05 kHz sample rate.

Solution#

Verify the audio prompt format:
```
sox --info reference.wav
```
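
The same checks can be scripted instead of read off the sox output by eye. The sketch below uses Python's standard `wave` module and the requirements listed under Cause. The function name `check_prompt` and its message wording are hypothetical. Note that `wave.open` raises `wave.Error` for files that are not uncompressed PCM WAV, which is itself a sign the prompt needs converting.

```python
import wave
from typing import List


def check_prompt(path: str, rate_hz: int = 22050,
                 min_s: float = 3.0, max_s: float = 5.0) -> List[str]:
    """Return a list of problems found in the audio prompt; empty means OK."""
    problems = []
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        sample_width = w.getsampwidth()   # bytes per sample
        rate = w.getframerate()
        duration = w.getnframes() / rate  # seconds
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    if rate != rate_hz:
        problems.append(f"expected {rate_hz} Hz, found {rate} Hz")
    if not min_s <= duration <= max_s:
        # The 3 to 5 second range is a recommendation on this page.
        problems.append(f"duration {duration:.2f} s is outside {min_s} to {max_s} s")
    return problems
```

An empty list from `check_prompt("reference.wav")` means the file matches the listed format; otherwise each entry names one property to fix.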

Convert the audio to the required format if needed:

```
sox input.wav -r 22050 -c 1 -b 16 reference.wav
```

Trim the audio to at most 5 seconds by keeping only the first 5 seconds:
```
sox reference.wav trimmed.wav trim 0 5
```

For Magpie TTS Flow, also provide the --zero_shot_transcript parameter with the exact transcript of the audio prompt.