Performance References

This section provides latency and throughput benchmarks for each NVIDIA Speech NIM microservice. All benchmarks use performance clients from the Riva C++ Clients repository. Each page includes:

Evaluation process: The client tool, dataset, and command used to generate measurements.
Results: Latency and throughput tables across supported GPUs (A10, A30, A100, H100, L4, L40, and others).
Hardware specifications: The GPU and system configurations used during benchmarking.

ASR NIM Performance

Streaming and offline latency and throughput benchmarks for ASR models across supported GPUs.

NMT NIM Performance

Translation latency and throughput benchmarks for NMT models across supported GPUs and language pairs.

TTS NIM Performance

First-chunk latency, inter-chunk latency, and throughput benchmarks for TTS models across supported GPUs.
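
As a rough illustration of how these three metrics relate, the sketch below derives them from chunk arrival timestamps for a single streaming request. It is not taken from the Riva performance clients. The function names and the definitions are assumptions for illustration: first-chunk latency as the time from request to the first audio chunk, inter-chunk latency as the mean gap between consecutive chunks, and throughput as seconds of audio generated per second of wall-clock time. The TTS performance page remains the authoritative source for how each metric is actually measured.

```python
from statistics import mean


def chunk_latency_metrics(request_time, chunk_times):
    """Summarize one streaming request (hypothetical helper).

    request_time: time the request was sent, in seconds.
    chunk_times: arrival time of each audio chunk, in seconds, in order.
    """
    if not chunk_times:
        raise ValueError("need at least one chunk arrival time")

    # Assumed definition: delay between sending the request and
    # receiving the first audio chunk.
    first_chunk = chunk_times[0] - request_time

    # Assumed definition: mean gap between consecutive chunk arrivals.
    gaps = [later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])]
    inter_chunk = mean(gaps) if gaps else 0.0

    return {"first_chunk_s": first_chunk, "inter_chunk_s": inter_chunk}


def throughput_rtfx(audio_seconds, wall_seconds):
    """Assumed definition: seconds of audio produced per wall-clock second."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds
```

For example, a request sent at t = 0.0 s whose chunks arrive at 0.5 s, 1.0 s, and 1.5 s has a first-chunk latency of 0.5 s and a mean inter-chunk latency of 0.5 s under these definitions.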
