Performance References
This section provides latency and throughput benchmarks for each NVIDIA Speech NIM microservice. All benchmarks use performance clients from the Riva C++ Clients repository. Each page includes:
Evaluation process: The client tool, dataset, and command used to generate measurements.
Results: Latency and throughput tables across supported GPUs (A10, A30, A100, H100, L4, L40, and others).
Hardware specifications: The GPU and system configurations used during benchmarking.
The benchmark pages cover the following model types:
ASR: Streaming and offline latency and throughput benchmarks across supported GPUs.
NMT: Translation latency and throughput benchmarks across supported GPUs and language pairs.
TTS: First-chunk latency, inter-chunk latency, and throughput benchmarks across supported GPUs.
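To make the streaming metrics named above concrete, the following is a minimal Python sketch of how first-chunk latency, inter-chunk latency, and throughput could be derived from timestamps recorded for one streaming request. It is illustrative only and is not the Riva performance client: the function name `summarize_stream`, its parameters, and the definition of throughput as seconds of audio produced per second of wall-clock time are assumptions made for this example. The individual benchmark pages define how each metric is actually measured.

```python
from statistics import mean


def summarize_stream(request_time, chunk_times, chunk_audio_seconds):
    """Summarize one streaming request (hypothetical helper, not a Riva API).

    request_time: time the request was sent, in seconds.
    chunk_times: arrival time of each audio chunk, in seconds, ascending.
    chunk_audio_seconds: duration of audio carried by each chunk, in seconds.
    """
    if not chunk_times or len(chunk_times) != len(chunk_audio_seconds):
        raise ValueError("need one audio duration per received chunk")

    # Time from sending the request until the first chunk arrives.
    first_chunk_latency = chunk_times[0] - request_time

    # Average gap between consecutive chunk arrivals (0.0 for a single chunk).
    gaps = [later - earlier for earlier, later in zip(chunk_times, chunk_times[1:])]
    inter_chunk_latency = mean(gaps) if gaps else 0.0

    # Assumed throughput definition: audio seconds produced per wall-clock second.
    wall_time = chunk_times[-1] - request_time
    if wall_time <= 0:
        raise ValueError("last chunk must arrive after the request was sent")
    throughput = sum(chunk_audio_seconds) / wall_time

    return {
        "first_chunk_latency": first_chunk_latency,
        "inter_chunk_latency": inter_chunk_latency,
        "throughput": throughput,
    }
```

A caller would record `request_time` just before sending the request, append a timestamp each time a chunk arrives, and pass both lists to `summarize_stream`. Aggregating the per-request results into averages or percentiles across many requests is left out of this sketch.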