Date: 2026-03-03

Evaluation Process

This section presents the latency and throughput numbers of the Riva text-to-speech (TTS) service on different GPUs. The performance of the TTS service was measured for varying numbers of parallel streams. Each parallel stream performed 20 iterations over 10 input strings from the LJSpeech dataset. Each stream sends a request to the Riva server and waits until all audio chunks have been received before sending the next request. We measured the latency to the first audio chunk, the latency between successive audio chunks, and the overall throughput.
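As a rough illustration of the three quantities described above, the following Python sketch computes them from per-request timestamps. It is not the code used by the Riva performance client. The `RequestTrace` type and its field names are hypothetical, and the throughput definition used here (seconds of audio produced per second of wall-clock time) is an assumption, since this section does not define the throughput metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timestamps for one streaming TTS request (hypothetical structure)."""
    sent_at: float            # time the request was sent, in seconds
    chunk_times: List[float]  # arrival time of each audio chunk, in seconds
    audio_seconds: float      # total duration of the audio returned, in seconds


def first_chunk_latency(trace: RequestTrace) -> float:
    """Time from sending the request to receiving the first audio chunk."""
    return trace.chunk_times[0] - trace.sent_at


def inter_chunk_latencies(trace: RequestTrace) -> List[float]:
    """Gaps between the arrival times of successive audio chunks."""
    times = trace.chunk_times
    return [later - earlier for earlier, later in zip(times, times[1:])]


def throughput(traces: List[RequestTrace]) -> float:
    """Assumed definition: audio seconds generated per wall-clock second."""
    start = min(t.sent_at for t in traces)
    end = max(t.chunk_times[-1] for t in traces)
    total_audio = sum(t.audio_seconds for t in traces)
    return total_audio / (end - start)
```

Because each stream waits for the last chunk before sending its next request, the traces of a single stream never overlap in time, while traces from different parallel streams do. The wall-clock window in `throughput` therefore spans all streams together.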

The following diagram shows how the latencies are measured.

We used the Riva TTS performance client (riva_tts_perf_client, provided in the Riva image) to measure performance. You can find the client’s source code in the Riva C++ Clients repository.

The following command was used to generate the tables below:

riva_tts_perf_client \
    --num_parallel_requests=<num_streams> \
    --num_iterations=<20*num_streams> \
    --online=true \
    --text_file=$test_file \
    --write_output_audio=false
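To repeat the measurement for several stream counts, the command line can be generated programmatically. The Python sketch below only builds and prints the commands and does not run the client. The helper name `build_perf_command`, the stream counts (1, 4, 8), and the file name `test.txt` are illustrative assumptions. The flags are those shown in the command above.

```python
import shlex
from typing import List


def build_perf_command(num_streams: int, text_file: str,
                       iterations_per_stream: int = 20) -> List[str]:
    """Build the riva_tts_perf_client argument list for one stream count.

    The total iteration count is iterations_per_stream * num_streams,
    matching the <20*num_streams> placeholder in the command above.
    """
    return [
        "riva_tts_perf_client",
        f"--num_parallel_requests={num_streams}",
        f"--num_iterations={iterations_per_stream * num_streams}",
        "--online=true",
        f"--text_file={text_file}",
        "--write_output_audio=false",
    ]


if __name__ == "__main__":
    # Hypothetical sweep; substitute the stream counts you want to test.
    for num_streams in (1, 4, 8):
        print(shlex.join(build_perf_command(num_streams, "test.txt")))
```

Returning a list of arguments rather than a single string allows the result to be passed directly to `subprocess.run` without shell quoting concerns.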