Evaluation Process

This section shows the latency and throughput numbers for streaming and offline configurations of the Riva ASR service on different GPUs.

In streaming mode, the client and the server used audio chunks of the same duration. See the Results section for the chunk size value to use.

The Riva streaming client, riva_streaming_asr_client, which is provided in the Riva image, was used with the --simulate_realtime flag to simulate transcription from a microphone. Each stream performed three iterations over a sample audio file (1272-135031-0000.wav) from the LibriSpeech dev-clean dataset.

You can get the source code for the riva_streaming_asr_client at Riva C++ Clients.

The following command was used to measure performance:

riva_streaming_asr_client \
    --chunk_duration_ms=<chunk_duration> \
    --simulate_realtime=true \
    --automatic_punctuation=true \
    --num_parallel_requests=<num_streams> \
    --word_time_offsets=false \
    --print_transcripts=false \
    --interim_results=false \
    --num_iterations=<3*num_streams> \
    --audio_file=1272-135031-0000.wav \
    --output_filename=/tmp/output.json
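When sweeping over several stream counts, it can help to generate the command rather than edit it by hand, so that --num_iterations always stays at three times the number of streams. The following is a minimal sketch, not part of the Riva tooling: the function name build_streaming_cmd is hypothetical, it only prints the command (a dry run), and the values passed in the last line are illustrative rather than recommended settings.

```shell
# Hypothetical helper: prints the streaming benchmark command for a given
# number of streams and chunk duration. It does not run the client.
build_streaming_cmd() {
  num_streams="$1"   # number of parallel streams
  chunk_ms="$2"      # chunk duration in ms (take the value from the Results section)

  # Each stream iterates three times over the sample file.
  num_iterations=$((3 * num_streams))

  echo "riva_streaming_asr_client" \
    "--chunk_duration_ms=${chunk_ms}" \
    "--simulate_realtime=true" \
    "--automatic_punctuation=true" \
    "--num_parallel_requests=${num_streams}" \
    "--word_time_offsets=false" \
    "--print_transcripts=false" \
    "--interim_results=false" \
    "--num_iterations=${num_iterations}" \
    "--audio_file=1272-135031-0000.wav" \
    "--output_filename=/tmp/output.json"
}

# Illustrative call: 8 streams, 160 ms chunks (example values only).
build_streaming_cmd 8 160
```

To actually run the benchmark inside the Riva image, the printed line could be passed to the shell, for example with eval "$(build_streaming_cmd 8 160)".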

The riva_streaming_asr_client command returns the following latency measurements:

intermediate latency: latency of responses returned with is_final == false

final latency: latency of responses returned with is_final == true

latency: the overall latency of all returned responses. This is the value reported in the tables that follow.
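If you want to summarize latencies yourself, a small helper can compute the average and a few percentiles. The sketch below assumes you have already extracted per-response latencies in milliseconds, one value per line; this section does not describe the client's output format, so that extraction step is left out. The function name latency_summary and the nearest-rank percentile method are choices made for this example, not something the Riva client is documented to use.

```shell
# Hypothetical helper: reads one latency value (ms) per line on stdin and
# prints the average plus nearest-rank percentiles.
latency_summary() {
  sort -n | awk '
    # Store the sorted samples and accumulate their sum.
    { v[NR] = $1; sum += $1 }

    # Nearest-rank percentile: the sample at position ceil(p * N / 100).
    function pct(p,   idx) {
      idx = int((p * NR + 99) / 100)
      if (idx < 1) idx = 1
      return v[idx]
    }

    END {
      if (NR == 0) { print "no samples"; exit 1 }
      printf "avg=%.1f p50=%s p90=%s p95=%s p99=%s\n",
             sum / NR, pct(50), pct(90), pct(95), pct(99)
    }'
}
```

For example, latency_summary < latencies.txt would summarize a file of extracted values, where latencies.txt is a hypothetical file name.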

The following diagrams are a schematic representation of the different latencies measured by the Riva streaming ASR client.

The following command was used to measure maximum throughput in offline mode:

riva_asr_client \
    --automatic_punctuation=true \
    --num_parallel_requests=32 \
    --word_time_offsets=false \
    --print_transcripts=false \
    --num_iterations=96 \
    --audio_file=1272-135031-0000x5.wav \
    --output_filename=/tmp/output.json
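Note that 96 iterations across 32 parallel requests works out to three iterations per request, which appears to mirror the streaming setup. This section does not define how throughput is computed. One common way to express it, assumed here, is a real-time factor: total seconds of audio transcribed divided by wall-clock seconds. The sketch below shows that arithmetic only. The function name rtfx is hypothetical, and the numbers in the example call are placeholders, not measured results.

```shell
# Hypothetical helper: real-time factor, assuming
#   throughput = (audio seconds per file * number of iterations) / wall-clock seconds
# Usage: rtfx AUDIO_SECONDS_PER_FILE NUM_ITERATIONS WALL_SECONDS
rtfx() {
  awk -v a="$1" -v n="$2" -v w="$3" 'BEGIN {
    if (w <= 0) { print "wall time must be positive"; exit 1 }
    printf "%.1f\n", (a * n) / w
  }'
}

# Placeholder values: a 10-second file, 96 iterations, 4 seconds of wall time.
rtfx 10 96 4
```

The audio duration per file and the wall-clock time both have to come from your own run, for example by timing the riva_asr_client invocation.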