Performance References

This section provides latency and throughput benchmarks for each NVIDIA Speech NIM microservice. All benchmarks use the performance clients from the Riva C++ Clients repository. Each per-service performance page includes:

  • Evaluation process: The client tool, dataset, and command used to generate measurements.

  • Results: Latency and throughput tables across supported GPUs (A10, A30, A100, H100, L4, L40, and others).

  • Hardware specifications: The GPU and system configurations used during benchmarking.
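The results tables report latency and throughput. As a rough illustration of how per-request measurements can be reduced to summary numbers, the sketch below computes a mean, nearest-rank percentiles, and a requests-per-second throughput. It is a hypothetical helper, not part of the Riva clients. The names `summarize` and `Summary`, the nearest-rank percentile method, and the definition of throughput as completed requests per wall-clock second are all assumptions. The Riva performance clients may define and report these metrics differently, so read each service page for the exact definitions used.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    # Hypothetical summary record; field names are illustrative only.
    mean_ms: float
    p50_ms: float
    p90_ms: float
    p99_ms: float
    throughput_rps: float


def percentile(sorted_vals, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_vals) / 100)
    return sorted_vals[max(rank, 1) - 1]


def summarize(latencies_ms, wall_clock_s):
    """Reduce per-request latencies (ms) measured over a run lasting
    wall_clock_s seconds into summary statistics.

    Assumption: throughput = completed requests / wall-clock seconds.
    """
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    vals = sorted(latencies_ms)
    if not vals:
        raise ValueError("no samples")
    return Summary(
        mean_ms=sum(vals) / len(vals),
        p50_ms=percentile(vals, 50),
        p90_ms=percentile(vals, 90),
        p99_ms=percentile(vals, 99),
        throughput_rps=len(vals) / wall_clock_s,
    )


if __name__ == "__main__":
    # Ten synthetic requests completed over a 2-second run.
    print(summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 2.0))
```

Tail percentiles such as p90 and p99 are usually more informative than the mean for latency-sensitive speech workloads, because a few slow requests can dominate the user experience while barely moving the average.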

ASR NIM Performance

Streaming and offline latency and throughput benchmarks for ASR models across supported GPUs.

NMT NIM Performance

Translation latency and throughput benchmarks for NMT models across supported GPUs and language pairs.

TTS NIM Performance

First-chunk latency, inter-chunk latency, and throughput benchmarks for TTS models across supported GPUs.
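For streaming TTS, first-chunk latency and inter-chunk latency can be derived from the arrival times of the audio chunks in a single response. The sketch below assumes first-chunk latency is the time from sending the request to receiving the first audio chunk, and inter-chunk latency is the gap between consecutive chunk arrivals. These definitions, the function name `chunk_latencies`, and the use of millisecond timestamps are assumptions for illustration; the TTS performance page and its client tool define the metrics actually reported.

```python
def chunk_latencies(request_sent_ms, chunk_arrivals_ms):
    """Compute streaming TTS latencies for one request (hypothetical helper).

    request_sent_ms:   timestamp (ms) when the synthesis request was sent.
    chunk_arrivals_ms: timestamps (ms) of each audio chunk, in arrival order.

    Returns (first_chunk_ms, inter_chunk_ms), where inter_chunk_ms lists
    the gap between each pair of consecutive chunks.
    """
    if not chunk_arrivals_ms:
        raise ValueError("no audio chunks received")
    if chunk_arrivals_ms[0] < request_sent_ms:
        raise ValueError("first chunk arrived before the request was sent")
    # Arrival times must be non-decreasing for the gaps to be meaningful.
    pairs = list(zip(chunk_arrivals_ms, chunk_arrivals_ms[1:]))
    if any(b < a for a, b in pairs):
        raise ValueError("chunk timestamps must be non-decreasing")

    first_chunk_ms = chunk_arrivals_ms[0] - request_sent_ms
    inter_chunk_ms = [b - a for a, b in pairs]
    return first_chunk_ms, inter_chunk_ms


if __name__ == "__main__":
    # Request sent at t=0 ms; three chunks arrive at 250, 350, and 500 ms.
    print(chunk_latencies(0, [250, 350, 500]))
```

First-chunk latency governs how quickly playback can start, while inter-chunk latency indicates whether later chunks keep up with playback once audio is flowing.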
