Troubleshooting NVIDIA TTS NIM Microservice Issues#
This page covers troubleshooting issues specific to the NVIDIA TTS NIM microservices. For issues shared across all NVIDIA Speech NIM microservices, see Common Issues.
gRPC Message Size Limit Exceeded#
Symptom#
Offline synthesis fails with a gRPC error when synthesizing long text. The error message indicates the response exceeds the maximum message size.
Cause#
gRPC limits message size to 4 MB by default. Long input text can produce audio that exceeds this limit when returned as a single response in offline mode. TTS output is capped at 20 seconds of audio per request.
Solution#
Use streaming synthesis instead of offline synthesis. Streaming returns audio in chunks and handles arbitrarily long text.
python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 \
--language-code en-US \
--text "Your long input text here" \
--voice Magpie-Multilingual.EN-US.Aria \
--stream \
--output output.wav
Invalid or Unrecognized Voice Name#
Symptom#
The synthesis request returns an error indicating the voice name is invalid or not found.
Cause#
Voice names are model-specific, and each model uses its own naming format. For example, Magpie TTS Multilingual voices follow the format Magpie-Multilingual.LOCALE.Speaker. Using an incorrect name, misspelling a voice, or requesting a voice that belongs to a different model triggers this error.
Solution#
List the available voices for the deployed model:
python3 python-clients/scripts/tts/talk.py \
--server 0.0.0.0:50051 \
--list-voices
Use the exact voice name from the list. Voice name formats vary by model:
| Model | Format | Example |
| --- | --- | --- |
| Magpie TTS Multilingual | Magpie-Multilingual.LOCALE.Speaker | Magpie-Multilingual.EN-US.Aria |
| Magpie TTS Zeroshot | Magpie-ZeroShot.Speaker | Magpie-ZeroShot.Female-1 |
| Magpie TTS Flow | English-US-Magpie-Flow.Speaker | English-US-Magpie-Flow.Female-1 |
| RAD-TTS HiFi-GAN | English-US-RadTTS.Speaker | English-US-RadTTS.Female-1 |

Verify you are using a voice that matches the deployed model. For example, Magpie-ZeroShot.Female-1 is only available on the Magpie TTS Zeroshot model. See the TTS support matrix for all available voices per model.
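When the error comes from a typo, comparing the requested name against the listed voices can point to the intended one. The following is a minimal sketch using Python's standard difflib module. The suggest_voices helper is hypothetical and not part of the NIM or the Python clients, and the sample list only reuses the example names from this page; replace it with the names your deployment actually reports.

```python
import difflib


def suggest_voices(requested, available, limit=3):
    """Return names from `available` that match or closely resemble `requested`."""
    if requested in available:
        # Exact match: the name is valid for this deployment.
        return [requested]
    # Otherwise return the closest candidates, best match first.
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)


# Example names from this page (assumption: paste the real --list-voices output here).
available_voices = [
    "Magpie-Multilingual.EN-US.Aria",
    "Magpie-ZeroShot.Female-1",
    "English-US-Magpie-Flow.Female-1",
    "English-US-RadTTS.Female-1",
]

# A misspelled request; the closest listed name is printed first.
print(suggest_voices("Magpie-Multilingual.EN-US.Ariaa", available_voices))
```

An empty result means nothing in the list is close to the requested name, which usually indicates a voice from a different model.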
Streaming HTTP Output Is Raw Audio (Not WAV)#
Symptom#
The audio file produced by the streaming HTTP endpoint (/v1/audio/synthesize_online) cannot be played or sounds like static. Audio players report an invalid or unrecognized format.
Cause#
The streaming HTTP API returns raw LPCM audio data without a WAV header. Opening the raw file directly in an audio player fails because the player cannot determine the sample rate, bit depth, or channel count.
Solution#
Convert the raw output to WAV using sox. Match the sample rate to your model: 22050 Hz for Magpie models, 44100 Hz for RAD-TTS.
curl -sS http://localhost:9000/v1/audio/synthesize_online --fail-with-body \
-F language=en-US \
-F text="Your text here" \
-F voice=Magpie-Multilingual.EN-US.Aria \
-F sample_rate_hz=22050 \
--output output.raw
sox -b 16 -e signed -c 1 -r 22050 output.raw output.wav
Match the -r value to the sample_rate_hz used in the request.
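If sox is not installed, the same conversion can be sketched with Python's standard wave module. The raw_pcm_to_wav helper below is illustrative, not part of the NIM tooling. It assumes the raw file holds 16-bit signed little-endian mono samples, mirroring the sox options above, and it only adds a WAV header without resampling.

```python
import wave


def raw_pcm_to_wav(raw_path, wav_path, sample_rate_hz=22050,
                   channels=1, sample_width_bytes=2):
    """Wrap headerless PCM in a WAV container and return the duration in seconds.

    Assumes little-endian signed samples; the sample data is copied unchanged.
    """
    with open(raw_path, "rb") as raw_file:
        pcm = raw_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bytes)  # 2 bytes = 16-bit
        wav_file.setframerate(sample_rate_hz)      # must match sample_rate_hz in the request
        wav_file.writeframes(pcm)
    return len(pcm) / (sample_rate_hz * channels * sample_width_bytes)


# Usage (uncomment after the curl request has written output.raw):
# raw_pcm_to_wav("output.raw", "output.wav", sample_rate_hz=22050)
```

If the resulting file plays too fast or too slow, the sample rate passed here most likely differs from the one used in the request.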
Voice Cloning Audio Prompt Rejected#
Symptom#
The voice cloning request fails or returns an error indicating the audio prompt is invalid.
Cause#
The audio prompt does not meet the requirements. Magpie TTS Zeroshot and Magpie TTS Flow require a reference audio file that is:
3 to 5 seconds in duration (recommended).
16-bit mono WAV format.
22.05 kHz sample rate.
Solution#
Verify the audio prompt format:
sox --info reference.wav
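The same check can be scripted. This is a minimal sketch using Python's standard wave module; the check_audio_prompt helper is hypothetical, and its thresholds simply restate the requirements listed above. Note that wave.open raises wave.Error for WAV files that are not plain PCM, which is itself a sign that the file needs converting.

```python
import wave


def check_audio_prompt(path, expected_rate_hz=22050,
                       min_seconds=3.0, max_seconds=5.0):
    """Return a list of problems found; an empty list means the file passes."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()  # bytes per sample
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate != expected_rate_hz:
        problems.append(f"expected {expected_rate_hz} Hz, found {rate} Hz")
    if not min_seconds <= duration <= max_seconds:
        # The 3 to 5 second range is a recommendation, so treat this as a warning.
        problems.append(f"duration {duration:.2f} s is outside the recommended range")
    return problems


# Usage (uncomment once reference.wav exists):
# print(check_audio_prompt("reference.wav") or "audio prompt looks OK")
```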
Convert the audio to the required format if needed:
sox input.wav -r 22050 -c 1 -b 16 reference.wav
Trim the audio to 3–5 seconds:
sox reference.wav trimmed.wav trim 0 5
For Magpie TTS Flow, provide the --zero_shot_transcript parameter with the exact transcript of the audio prompt.