Batch Synthesis from Text Files#
The WebSocket realtime client (realtime_tts_client.py) supports synthesizing speech from text files and processing multiple lines in parallel. This is useful for converting large text datasets, generating audio for multiple prompts, or benchmarking throughput.
Prerequisites#
A deployed TTS NIM microservice. Refer to the TTS tutorial for deployment steps.
An installed NVIDIA Riva Python client.
Prepare a Text File#
The realtime client accepts two input file formats.
Plain Text Format#
Each line becomes a separate synthesis request. Empty lines are skipped.
Welcome to NVIDIA speech synthesis.
This is the second line of text to synthesize.
Each line produces a separate audio file.
Pipe-Separated Format#
Each line contains an identifier and text separated by a pipe (|). The client extracts the text portion after the pipe.
audio_001|Welcome to NVIDIA speech synthesis.
audio_002|This is the second line of text to synthesize.
audio_003|Each line produces a separate audio file.
This format is common in speech dataset pipelines where each line corresponds to a labeled audio sample.
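The two formats can be handled by one small parsing routine. The following is a minimal sketch of that logic, not the client's actual code: the function name `extract_texts` is hypothetical, and splitting on the first pipe only (so that later pipes stay in the text) is an assumption.

```python
from typing import Iterable, List


def extract_texts(lines: Iterable[str]) -> List[str]:
    """Return the text to synthesize for each non-empty input line."""
    texts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # empty lines are skipped
        # Assumption: split on the first pipe only; the real client
        # may treat lines with several pipes differently.
        _, sep, after = line.partition("|")
        text = after.strip() if sep else line
        if text:
            texts.append(text)
    return texts


sample = [
    "audio_001|Welcome to NVIDIA speech synthesis.",
    "",
    "This is a plain-text line.",
]
print(extract_texts(sample))
```

Running a routine like this over an input file before a large batch is a cheap way to confirm how many synthesis requests the file will produce.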
Synthesize from a Text File#
Pass the file path with --input-file. Each non-empty line is synthesized independently.
python3 python-clients/scripts/tts/realtime_tts_client.py \
--server localhost:9000 \
--language-code en-US \
--voice Magpie-Multilingual.EN-US.Aria \
--input-file input.txt \
--output output.wav
With a single request (default), the client processes lines sequentially. Each line produces a separate WAV file named with a numeric index:
output0.wav: first line
output1.wav: second line
output2.wav: third line
The index is appended before the file extension of the --output filename.
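The naming rule can be reproduced in a few lines, which helps when a downstream script needs to locate the generated files. This is a sketch of the rule as described above, assuming zero-based indexing as in the example; `indexed_output_path` is a hypothetical helper, not part of the client.

```python
from pathlib import Path


def indexed_output_path(output: str, index: int) -> str:
    """Insert the line index before the file extension of the output name."""
    path = Path(output)
    # "output.wav" with index 0 becomes "output0.wav"
    return str(path.with_name(f"{path.stem}{index}{path.suffix}"))


for i in range(3):
    print(indexed_output_path("output.wav", i))
```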
Process Lines in Parallel#
Use --num-parallel-requests to synthesize multiple lines concurrently. The client opens multiple WebSocket connections and limits concurrency with a semaphore.
python3 python-clients/scripts/tts/realtime_tts_client.py \
--server localhost:9000 \
--language-code en-US \
--voice Magpie-Multilingual.EN-US.Aria \
--input-file input.txt \
--num-parallel-requests 4 \
--output output.wav
This processes up to 4 lines simultaneously, reducing total synthesis time for large files.
Note
Each parallel request opens a separate WebSocket connection. Set --num-parallel-requests based on your server capacity and GPU memory. Higher values increase throughput but also increase GPU memory and CPU usage.
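The semaphore pattern mentioned above can be illustrated with `asyncio`. This is a simplified sketch, not the client's implementation: `fake_synthesize` is a hypothetical stand-in that sleeps instead of opening a WebSocket connection, and the `stats` dictionary exists only to show that concurrency stays within the limit.

```python
import asyncio
from typing import Dict, List


async def fake_synthesize(index: int, text: str, stats: Dict[str, int]) -> str:
    """Hypothetical placeholder for one synthesis request."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # stands in for network and synthesis time
    stats["active"] -= 1
    return f"output{index}.wav"


async def synthesize_all(texts: List[str], limit: int, stats: Dict[str, int]) -> List[str]:
    # The semaphore allows at most `limit` requests to run at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(index: int, text: str) -> str:
        async with semaphore:
            return await fake_synthesize(index, text, stats)

    # gather preserves input order, so results line up with input lines.
    return await asyncio.gather(*(bounded(i, t) for i, t in enumerate(texts)))


stats = {"active": 0, "peak": 0}
files = asyncio.run(synthesize_all([f"line {i}" for i in range(10)], 4, stats))
print(files[:3], "peak concurrency:", stats["peak"])
```

All ten tasks are created up front, but only four hold the semaphore at any moment, which mirrors how a concurrency limit bounds the load placed on the server.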
Combine with Voice Cloning#
Batch synthesis works with zero-shot voice cloning. Add the audio prompt flags alongside --input-file.
python3 python-clients/scripts/tts/realtime_tts_client.py \
--server localhost:9000 \
--language-code en-US \
--input-file input.txt \
--zero-shot-audio-prompt-file prompt.wav \
--num-parallel-requests 2 \
--output output.wav
Combine with a Custom Dictionary#
Apply custom pronunciations to all lines in the batch by passing a dictionary file.
python3 python-clients/scripts/tts/realtime_tts_client.py \
--server localhost:9000 \
--language-code en-US \
--voice Magpie-Multilingual.EN-US.Aria \
--input-file input.txt \
--custom-dictionary custom_dict.txt \
--output output.wav
Refer to Customizing TTS Models for dictionary format details.
Play Audio in Real Time#
Use --play-audio to play each synthesized result through the system speakers as it completes. This works with or without --output.
python3 python-clients/scripts/tts/realtime_tts_client.py \
--server localhost:9000 \
--language-code en-US \
--text "Play this audio through the speakers." \
--play-audio
Note
Real-time playback requires PyAudio. Install it with pip install pyaudio.
Reference: Realtime Client Flags#
| Flag | Default | Description |
|---|---|---|
| --input-file | – | Path to a text file (plain or pipe-separated) |
| --num-parallel-requests | 1 | Number of concurrent WebSocket connections |
| --output | – | Output WAV file path (indexed for multiple lines) |
| --play-audio | | Play audio through system speakers |
| | | Output encoding |
| | | Output audio sample rate |
| | | Enable debug logging |