Automatic Speech Recognition (Latest)

Performance

This section shows the latency and throughput numbers for streaming and offline configurations of the Riva ASR service on different GPUs.

In streaming mode, the client and the server used audio chunks of the same duration. The chunk sizes used are listed with each results table below.
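For reference, assuming the 16 kHz sample rate of the LibriSpeech recordings, a 960 ms chunk corresponds to 15,360 audio samples per request and a 160 ms chunk to 2,560 samples.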

The Riva streaming client riva_streaming_asr_client, provided in the Riva image, was used with the --simulate_realtime flag to simulate transcription from a microphone, where each stream performed three iterations over a sample audio file (1272-135031-0000.wav) from the LibriSpeech dev-clean dataset. You can download the LibriSpeech datasets from https://www.openslr.org/12.

You can get the source code for the riva_streaming_asr_client at https://github.com/nvidia-riva/cpp-clients.

The following command was used to measure performance:


riva_streaming_asr_client \
    --chunk_duration_ms=<chunk_duration> \
    --simulate_realtime=true \
    --automatic_punctuation=true \
    --num_parallel_requests=<num_streams> \
    --word_time_offsets=false \
    --print_transcripts=false \
    --interim_results=false \
    --num_iterations=<3*num_streams> \
    --audio_file=1272-135031-0000.wav \
    --output_filename=/tmp/output.json

The riva_streaming_asr_client command returns the following latency measurements:

  • intermediate latency: latency of responses returned with is_final == false

  • final latency: latency of responses returned with is_final == true

  • latency: the overall latency across all returned responses, both intermediate and final. This is the value tabulated in the following tables; a sketch of how the summary statistics can be computed follows this list.
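
The tables below report the average and the 50th, 90th, 95th, and 99th percentiles of these latencies. As a minimal sketch of how such summary statistics can be computed, assuming you have already extracted a list of per-response latencies (in milliseconds) from the client output, something like the following Python could be used; the list contents here are placeholders, not measured values:

import statistics

def summarize_latencies(latencies_ms):
    """Return the summary statistics reported in the tables below:
    average latency and the p50/p90/p95/p99 percentiles."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }

# Placeholder latencies (ms) for illustration only.
print(summarize_latencies([18.0, 20.0, 21.0, 25.0, 36.0, 100.0]))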

The following diagrams are a schematic representation of the different latencies measured by the Riva streaming ASR client.

[Figure: riva-asr-latencies.png — latencies measured by the Riva streaming ASR client]

The following command was used to measure maximum throughput in offline mode:


riva_asr_client \
    --automatic_punctuation=true \
    --num_parallel_requests=32 \
    --word_time_offsets=false \
    --print_transcripts=false \
    --num_iterations=96 \
    --audio_file=1272-135031-0000x5.wav \
    --output_filename=/tmp/output.json

where 1272-135031-0000x5.wav is the 1272-135031-0000.wav audio file concatenated five times. You can get the source code for the riva_asr_client at https://github.com/nvidia-riva/cpp-clients.
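
If you need to create the concatenated file yourself, a minimal sketch in Python is shown below, assuming a 16-bit PCM WAV copy of 1272-135031-0000.wav is available locally (LibriSpeech itself is distributed as FLAC, so a conversion step may be required first):

import wave

# Build 1272-135031-0000x5.wav by repeating the source clip five times.
with wave.open("1272-135031-0000.wav", "rb") as src:
    params = src.getparams()
    frames = src.readframes(src.getnframes())

with wave.open("1272-135031-0000x5.wav", "wb") as dst:
    dst.setparams(params)
    for _ in range(5):
        dst.writeframes(frames)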

Latencies and throughput measurements for streaming and offline configurations are reported in the following tables. Throughput is reported in RTFX, the ratio of the duration of audio transcribed to the computation time.
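For example, if 3,600 seconds of audio are transcribed in 20 seconds of wall-clock computation time (illustrative numbers, not taken from the tables), the throughput is 3600 / 20 = 180 RTFX.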

Note

The values in the tables are averages over three trials. Each value is rounded to the last significant digit implied by the standard deviation computed over those three trials. If the standard deviation is less than 0.1% of the average, the value is rounded as if the standard deviation were 0.1% of the average.
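
One possible reading of this rounding rule, expressed as a small Python sketch (an illustration of the convention, not code used to produce the tables):

import math

def round_by_std(mean, std):
    # Treat a standard deviation below 0.1% of the mean as 0.1% of the mean.
    std = max(std, 0.001 * abs(mean))
    # Round the mean at the decimal position of the leading digit of the std.
    return round(mean, -math.floor(math.log10(std)))

print(round_by_std(126.83, 4.2))  # -> 127.0
print(round_by_std(63.72, 0.4))   # -> 63.7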

For specifications of the hardware on which these measurements were collected, see the Hardware Specifications section.

Chunk size (ms): 960
Language model: n-gram
Maximum effective # of streams with n-gram language model: 490

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               20      18.4    20      36      100     0.999
64              80      83      95      140     200     63.7
128             119     100     155     230     296     126.8
256             190     170     270     400     530     252
384             260     245     378     580     800     374.5
512             346     330     496     850     1200    494

Chunk size (ms): 160
Language model: n-gram
Maximum effective # of streams with n-gram language model: 104

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               18      16.7    17.4    20      40      0.999
8               21      19.8    21      30      50.4    7.99
16              29      26      40      41      70      15.96
32              41      45      51      55      110     31.9
48              53      58      71      73      160     47.75
64              65.5    72      83      85      210     63.6

Language model: n-gram

# of streams    Throughput (RTFX)
1               200
32              2000

Chunk size (ms): 960
Language model: n-gram
Maximum effective # of streams with n-gram language model: 490

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               30      29.1    40      50      100     0.999
64              123     130     160     240     260     63.5
128             185     165     240     360     430     126.4
256             300     266     430     630     830     249.4
384             460     445     770     1100    1560    368
512             720     650     1400    1550    2150    483

Chunk size (ms): 160
Language model: n-gram
Maximum effective # of streams with n-gram language model: 104

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               24      22.7    23.7    25      50      0.999
8               32.7    31      33      51      72.7    7.98
16              44      40.8    50      63      110     15.94
32              59      60      73      75      180     31.8
48              79      90      93      100     240     47.6
64              100     109     114     160     310     63.4

Language model: n-gram

# of streams    Throughput (RTFX)
1               180
32              1330

Language model: n-gram

# of streams    Throughput (RTFX)
1               180
32              1440

Chunk size (ms): 960
Language model: n-gram
Maximum effective # of streams with n-gram language model: 490

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               30      27      40      50      100     0.999
64              120     130     150     200     240     63.5
128             170     150     220     310     380     126.5
256             270     250     390     540     700     250.5

Chunk size (ms): 160
Language model: n-gram
Maximum effective # of streams with n-gram language model: 104

# of streams    Latency (ms)                            Throughput (RTFX)
                avg     p50     p90     p95     p99
1               25      23      29      30      50      0.999
8               31      29      35.5    46      70      7.98
16              44      40      56      60      100     15.95
32              60      62      76      80      150     31.84
48              80      86      100     112     227     47.7

Hardware Specifications

GPU: NVIDIA DGX A100 40 GB
CPU:
  Model: AMD EPYC 7742 64-Core Processor
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 2250
  CPU min MHz: 1500
RAM:
  Model: Micron DDR4 36ASF8G72PZ-3G2B2 3200MHz
  Configured Memory Speed: 2933 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA H100 80GB HBM3
CPU:
  Model: Intel(R) Xeon(R) Platinum 8480CL
  Thread(s) per core: 2
  Socket(s): 2
  Core(s) per socket: 56
  NUMA node(s): 2
  CPU max MHz: 3800
  CPU min MHz: 800
RAM:
  Model: Micron DDR5 MTC40F2046S1RC48BA1 4800MHz
  Configured Memory Speed: 4400 MT/s
  RAM Size: 32x64GB (2048GB Total)

GPU: NVIDIA L40
CPU:
  Model: AMD EPYC 7763 64-Core Processor
  Thread(s) per core: 1
  Socket(s): 2
  Core(s) per socket: 64
  NUMA node(s): 8
  Frequency boost: enabled
  CPU max MHz: 3529
  CPU min MHz: 1500
RAM:
  Model: Samsung DDR4 M393A4K40DB3-CWE 3200MHz
  Configured Memory Speed: 3200 MT/s
  RAM Size: 16x32GB (512GB Total)
