Batch Synthesis from Text Files

The WebSocket realtime client (realtime_tts_client.py) supports synthesizing speech from text files and processing multiple lines in parallel. This is useful for converting large text datasets, generating audio for multiple prompts, or benchmarking throughput.

Prerequisites

Prepare a Text File

The realtime client accepts two input file formats.

Plain Text Format

Each line becomes a separate synthesis request. Empty lines are skipped.

Welcome to NVIDIA speech synthesis.
This is the second line of text to synthesize.
Each line produces a separate audio file.

Pipe-Separated Format

Each line contains an identifier and text separated by a pipe (|). The client extracts the text portion after the pipe.

audio_001|Welcome to NVIDIA speech synthesis.
audio_002|This is the second line of text to synthesize.
audio_003|Each line produces a separate audio file.

This format is common in speech dataset pipelines where each line corresponds to a labeled audio sample.
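The two formats above can be handled by one small parsing routine. The following is a minimal sketch, not the client's actual implementation: the function name extract_texts is hypothetical, and splitting on the first pipe only (so that the text itself may contain a pipe) is an assumption.

```python
from typing import Iterable, Iterator


def extract_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text to synthesize from each non-empty input line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # empty lines are skipped
        # Assumption: split on the first pipe only, so the text may contain "|".
        _, sep, text = line.partition("|")
        # Pipe-separated line: keep the part after the pipe.
        # Plain line: keep the whole line.
        yield text.strip() if sep else line


sample = [
    "audio_001|Welcome to NVIDIA speech synthesis.",
    "",
    "This line has no identifier.",
]
print(list(extract_texts(sample)))
```

Running a routine like this over an input file before a large batch is a cheap way to preview which lines will be sent for synthesis.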

Synthesize from a Text File

Pass the file path with --input-file. Each non-empty line is synthesized independently.

python3 python-clients/scripts/tts/realtime_tts_client.py \
    --server localhost:9000 \
    --language-code en-US \
    --voice Magpie-Multilingual.EN-US.Aria \
    --input-file input.txt \
    --output output.wav

With the default of one parallel request (--num-parallel-requests 1), the client processes lines sequentially. Each line produces a separate WAV file whose name carries a zero-based numeric index:

  • output0.wav – first line
  • output1.wav – second line
  • output2.wav – third line

The index is appended before the file extension of the --output filename.
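The naming rule can be reproduced in a few lines, which is handy when a downstream script needs to predict the output file names. This is an illustrative sketch; indexed_output_path is a hypothetical helper, not a function exported by the client.

```python
from pathlib import Path


def indexed_output_path(output: str, index: int) -> str:
    """Insert a numeric index before the file extension of `output`."""
    path = Path(output)
    # stem is the name without its extension; suffix includes the dot.
    return str(path.with_name(f"{path.stem}{index}{path.suffix}"))


print([indexed_output_path("output.wav", i) for i in range(3)])
```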

Process Lines in Parallel

Use --num-parallel-requests to synthesize multiple lines concurrently. The client opens multiple WebSocket connections and limits concurrency with a semaphore.

python3 python-clients/scripts/tts/realtime_tts_client.py \
    --server localhost:9000 \
    --language-code en-US \
    --voice Magpie-Multilingual.EN-US.Aria \
    --input-file input.txt \
    --num-parallel-requests 4 \
    --output output.wav

This processes up to 4 lines at a time, which can reduce total synthesis time for large files when the server has spare capacity.

Note

Each parallel request opens a separate WebSocket connection. Set --num-parallel-requests based on your server capacity and GPU memory. Higher values can increase throughput, but they also increase GPU memory and CPU usage.
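The semaphore pattern mentioned above can be sketched with asyncio. This is an illustration of the general technique under stated assumptions, not the client's source: synthesize_all and fake_synthesize are hypothetical names, and fake_synthesize merely sleeps where a real implementation would open a WebSocket connection and stream audio.

```python
import asyncio


async def synthesize_all(texts, num_parallel_requests, synthesize):
    """Run synthesize(index, text) for every text, at most N at a time."""
    semaphore = asyncio.Semaphore(num_parallel_requests)

    async def worker(index, text):
        # Waits here while num_parallel_requests workers are already running.
        async with semaphore:
            return await synthesize(index, text)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(worker(i, t) for i, t in enumerate(texts)))


async def fake_synthesize(index, text):
    # Hypothetical stand-in for one WebSocket synthesis request.
    await asyncio.sleep(0.01)
    return f"output{index}.wav"


print(asyncio.run(synthesize_all(["a", "b", "c"], 2, fake_synthesize)))
```

All tasks are created up front, but only as many as the semaphore allows run their request at once, which keeps the number of open connections bounded.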

Combine with Voice Cloning

Batch synthesis works with zero-shot voice cloning. Add the audio prompt flags, such as --zero-shot-audio-prompt-file, alongside --input-file.

python3 python-clients/scripts/tts/realtime_tts_client.py \
    --server localhost:9000 \
    --language-code en-US \
    --input-file input.txt \
    --zero-shot-audio-prompt-file prompt.wav \
    --num-parallel-requests 2 \
    --output output.wav

Combine with a Custom Dictionary

Apply custom pronunciations to all lines in the batch by passing a dictionary file.

python3 python-clients/scripts/tts/realtime_tts_client.py \
    --server localhost:9000 \
    --language-code en-US \
    --voice Magpie-Multilingual.EN-US.Aria \
    --input-file input.txt \
    --custom-dictionary custom_dict.txt \
    --output output.wav

Refer to Customizing TTS Models for dictionary format details.
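When batch runs are launched from a script, the command lines shown in this section can be assembled programmatically. The sketch below uses only flags that appear on this page; build_command and its parameter names are hypothetical, and the script path is assumed to be relative to the current working directory.

```python
import shlex

CLIENT = "python-clients/scripts/tts/realtime_tts_client.py"


def build_command(input_file, output, *, server="localhost:9000",
                  language_code="en-US", voice=None, prompt_file=None,
                  custom_dictionary=None, num_parallel_requests=1):
    """Assemble the argv list for one batch run of the realtime client."""
    cmd = ["python3", CLIENT,
           "--server", server,
           "--language-code", language_code,
           "--input-file", input_file,
           "--num-parallel-requests", str(num_parallel_requests),
           "--output", output]
    # Optional flags are appended only when a value was supplied.
    if voice:
        cmd += ["--voice", voice]
    if prompt_file:
        cmd += ["--zero-shot-audio-prompt-file", prompt_file]
    if custom_dictionary:
        cmd += ["--custom-dictionary", custom_dictionary]
    return cmd


cmd = build_command("input.txt", "output.wav",
                    voice="Magpie-Multilingual.EN-US.Aria",
                    custom_dictionary="custom_dict.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the client; building a list rather than a single string avoids shell quoting problems with file names that contain spaces.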

Play Audio in Real Time

Use --play-audio to play each synthesized result through the system speakers as it completes. This works with or without --output.

python3 python-clients/scripts/tts/realtime_tts_client.py \
    --server localhost:9000 \
    --language-code en-US \
    --text "Play this audio through the speakers." \
    --play-audio

Note

Real-time playback requires PyAudio. Install it with pip install pyaudio.
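A wrapper script can check whether PyAudio is importable before it adds --play-audio to the command line. This is a small sketch using the standard library; playback_available is a hypothetical helper name, and the check only confirms that the module is installed, not that an audio output device is present.

```python
import importlib.util


def playback_available() -> bool:
    """Return True if the pyaudio module can be found by the import system."""
    return importlib.util.find_spec("pyaudio") is not None


# Only request playback when the dependency is installed.
extra_flags = ["--play-audio"] if playback_available() else []
print(extra_flags)
```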

Reference: Realtime Client Flags

| Flag | Default | Description |
|---|---|---|
| --input-file | | Path to a text file (plain or pipe-separated) |
| --num-parallel-requests | 1 | Number of concurrent WebSocket connections |
| --output / -o | | Output WAV file path (indexed for multiple lines) |
| --play-audio | false | Play audio through system speakers |
| --encoding | LINEAR_PCM | Output encoding (LINEAR_PCM or OGGOPUS) |
| --sample-rate-hz | 44100 | Output audio sample rate |
| --debug | false | Enable debug logging |
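When benchmarking throughput, it can be useful to compare the total duration of the generated audio with the wall-clock time of the run. The sketch below reads durations from WAV headers with the standard wave module. The helper names are hypothetical, and the approach assumes LINEAR_PCM output written as standard WAV files with a complete header; it does not apply to OGGOPUS output.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        # Frame count divided by frames per second gives seconds of audio.
        return wav.getnframes() / wav.getframerate()


def total_duration(paths) -> float:
    """Sum the durations of several WAV files."""
    return sum(wav_duration_seconds(p) for p in paths)
```

For example, total_duration(["output0.wav", "output1.wav"]) divided by the elapsed time of the batch gives a rough real-time factor for the run.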