NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.

Fully Customizable

Flexibility at every step: modify model architectures, fine-tune models on your data, customize pipelines, and deploy on any platform.

State-of-the-Art Models

Built on a decade of AI innovations by NVIDIA across hardware, model architectures, training techniques, inference optimizations, and deployment solutions.

Real-time Performance Optimizations

Continued optimizations across the entire stack, from models to software to hardware, have delivered a 12X performance gain over the previous generation.

Flexible and Scalable Deployments

Supports scaling to hundreds of thousands of concurrent users in the cloud, on premises, and at the edge.

Data Ownership and Privacy

Data is processed on premises or in your own cloud.

NVIDIA Riva Skills 2.16.0 is a toolkit for production-grade conversational AI inference.

The Riva Speech server exposes a simple API for performing speech recognition, speech synthesis, and a variety of natural language processing (NLP) inference tasks.


  • State-of-the-art pretrained models available from NGC

  • Fully custom trained models with NVIDIA NeMo

  • Helm-managed cloud deployment

  • Streaming and batch speech recognition

  • Streaming and batch speech synthesis

  • NLP punctuation and capitalization models
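
The difference between the streaming and batch modes listed above can be illustrated with a minimal sketch in plain Python. It does not use the Riva client API. The function names `batch_request` and `streaming_requests`, the 16 kHz 16-bit mono PCM format, and the 100 ms chunk size are all illustrative assumptions, not values taken from Riva.

```python
from typing import Iterator, List


def batch_request(audio: bytes) -> List[bytes]:
    """Batch (offline) mode: the whole utterance travels as one payload."""
    return [audio]


def streaming_requests(
    audio: bytes,
    sample_rate_hz: int = 16000,   # assumed sample rate
    chunk_ms: int = 100,           # assumed chunk duration
    bytes_per_sample: int = 2,     # assumed 16-bit mono PCM
) -> Iterator[bytes]:
    """Streaming mode: yield fixed-duration chunks as they become available."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    for start in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield audio[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # one second of silence
    print(len(batch_request(one_second)), "batch payload")
    print(len(list(streaming_requests(one_second))), "streaming chunks")
```

In batch mode the server can only start work once the full payload has arrived. In streaming mode it can return partial results while later chunks are still being sent, which is what makes real-time use possible.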