NVIDIA® Riva is an SDK for building multimodal conversational systems. You can use Riva to build and deploy AI applications that fuse vision, speech, sensors, and services to achieve conversational AI use cases specific to a domain of expertise. It offers a complete workflow to build, train, and deploy AI systems that can use visual cues, such as gestures and gaze, along with speech in context.

Riva offers pretrained speech models in NVIDIA NGC™ that can be fine-tuned with NVIDIA NeMo™ on a custom dataset, accelerating the development of domain-specific models. These models can be exported, optimized, and deployed as a speech service on premises or in the cloud with a single command using Helm charts.

Riva's high-performance inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server, which are both part of the NVIDIA AI platform. Refer to the following Riva product documentation for more information.

These documents provide information regarding the current NVIDIA Riva 2.13.0 release.

The best way to get started with Riva is through the tutorials.

Before you run any Riva tutorial, ensure that you meet the minimum requirements and complete setup. The tutorials walk you through the basics of the Riva Speech Skills ASR services, including ASR services for new languages; how to deploy Riva; and the basics of the Riva Speech Skills TTS services.
Speech Services

You can use Riva to access highly optimized services for Automatic Speech Recognition (ASR), for use cases such as real-time transcription and virtual assistants; Text-to-Speech (TTS), to generate human-like speech; and Natural Language Processing (NLP), for text and token classification.

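ASR takes an audio stream or audio buffer as input and returns one or more text transcripts, along with optional metadata. Speech recognition in Riva is a GPU-accelerated compute pipeline optimized for both performance and accuracy. Riva supports two recognition modes: offline (batch) and streaming. In offline mode, the complete audio is submitted in a single request and the transcript is returned after it has been processed; in streaming mode, audio is sent in chunks as it is captured, and Riva can return interim transcripts before the final result.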
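The difference between the two recognition modes is mostly in the shape of the request. The following is a minimal sketch, assuming 16 kHz, 16-bit mono linear-PCM audio and 100 ms chunks; `FakeRecognizer`, `recognize_batch`, and `recognize_stream` are hypothetical stand-ins, not the Riva client API, and they only count bytes so the two modes can be compared without a running server.

```python
from typing import Iterator, Tuple


def chunk_audio(pcm: bytes, sample_rate_hz: int = 16000,
                bytes_per_sample: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split a mono linear-PCM buffer into fixed-duration chunks for streaming."""
    # 16000 samples/s * 2 bytes/sample * 100 ms / 1000 = 3200 bytes per chunk
    chunk_bytes = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]  # last chunk may be shorter


class FakeRecognizer:
    """Hypothetical ASR backend (NOT Riva's API): reports byte counts, not words."""

    def recognize_batch(self, pcm: bytes) -> str:
        # Offline/batch mode: the whole buffer arrives in one request,
        # and a single final transcript comes back.
        return f"final transcript for {len(pcm)} bytes"

    def recognize_stream(self, chunks: Iterator[bytes]) -> Iterator[Tuple[bool, str]]:
        # Streaming mode: chunks arrive over time; interim results are
        # yielded as (is_final, text) pairs, followed by one final result.
        received = 0
        for chunk in chunks:
            received += len(chunk)
            yield (False, f"interim transcript after {received} bytes")
        yield (True, f"final transcript for {received} bytes")


if __name__ == "__main__":
    one_second = bytes(32000)  # 1 s of silent 16 kHz, 16-bit mono audio
    asr = FakeRecognizer()
    print(asr.recognize_batch(one_second))
    for is_final, text in asr.recognize_stream(chunk_audio(one_second)):
        print("FINAL" if is_final else "interim", text)
```

One second of 16 kHz, 16-bit mono audio is 16,000 × 2 = 32,000 bytes, so the streaming path sends ten 3,200-byte chunks and receives ten interim results before the final one, while the batch path sends everything at once and receives a single result.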
The Riva TTS service uses a two-stage pipeline: the first model generates a mel-spectrogram from the input text, and the second model generates speech audio from that mel-spectrogram. Together, these models form a TTS system that lets you synthesize natural-sounding speech from raw transcripts without any additional information, such as the patterns or rhythms of speech.
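The data flow between the two stages can be sketched with toy models. This is only an illustration of the pipeline's structure, not Riva's models: `text_to_mel` and `mel_to_audio` are hypothetical rule-based stand-ins for the first (spectrogram) and second (waveform) models, and the constants `N_MELS`, `FRAMES_PER_CHAR`, `HOP_LENGTH`, and `SAMPLE_RATE` are assumed values chosen for the example.

```python
import math
from typing import List

N_MELS = 80           # assumed number of mel bins per spectrogram frame
FRAMES_PER_CHAR = 5   # hypothetical: toy "duration" of every character
HOP_LENGTH = 256      # assumed audio samples generated per spectrogram frame
SAMPLE_RATE = 22050   # assumed output sample rate in Hz


def text_to_mel(text: str) -> List[List[float]]:
    """Stage 1 (toy stand-in): map each character to a few mel frames."""
    frames: List[List[float]] = []
    for ch in text:
        energy = (ord(ch) % 32) / 32.0  # placeholder for a learned prediction, in [0, 1)
        frames.extend([[energy] * N_MELS for _ in range(FRAMES_PER_CHAR)])
    return frames


def mel_to_audio(mel: List[List[float]]) -> List[float]:
    """Stage 2 (toy stand-in): emit HOP_LENGTH waveform samples per frame."""
    samples: List[float] = []
    for i, frame in enumerate(mel):
        amplitude = sum(frame) / len(frame)
        for j in range(HOP_LENGTH):
            t = (i * HOP_LENGTH + j) / SAMPLE_RATE
            samples.append(amplitude * math.sin(2 * math.pi * 220.0 * t))
    return samples


def synthesize(text: str) -> List[float]:
    """Chain the two stages: text -> mel-spectrogram -> waveform."""
    return mel_to_audio(text_to_mel(text))


if __name__ == "__main__":
    audio = synthesize("hi")
    print(len(audio), "samples,", round(len(audio) / SAMPLE_RATE, 3), "seconds")
```

For "hi", the toy first stage emits 2 × 5 = 10 frames, so the second stage produces 10 × 256 = 2,560 samples (about 0.12 s at 22,050 Hz). The design point the sketch preserves is that the second stage never sees the text: it works only from the intermediate spectrogram, which is why the two models can be swapped or tuned independently.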
Riva supports basic NLP text and token classification, with classifiers that can be modified and deployed using NVIDIA NeMo.
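The two kinds of classification differ in the shape of their output: text classification assigns one label to the whole input, while token classification assigns one label to each token. The sketch below shows only those output shapes; `classify_text` and `classify_tokens` are hypothetical keyword and capitalization rules standing in for the neural classifiers that Riva serves, and the labels `weather`, `other`, `LOC`, and `O` are assumptions chosen for the example.

```python
from typing import List, Tuple


def classify_text(text: str) -> str:
    """Text classification: one label for the whole input (toy keyword rule)."""
    return "weather" if "weather" in text.lower() else "other"


def classify_tokens(text: str) -> List[Tuple[str, str]]:
    """Token classification: one label per token (toy capitalization rule)."""
    labeled: List[Tuple[str, str]] = []
    for i, token in enumerate(text.split()):
        word = token.strip("?.,!")
        # Treat capitalized words after the first as location names ("LOC"),
        # everything else as outside any entity ("O").
        label = "LOC" if i > 0 and word[:1].isupper() else "O"
        labeled.append((word, label))
    return labeled


if __name__ == "__main__":
    query = "What is the weather in Santa Clara?"
    print(classify_text(query))
    print(classify_tokens(query))
```

On the query "What is the weather in Santa Clara?", the text classifier returns the single label `weather`, while the token classifier returns seven (word, label) pairs, tagging only "Santa" and "Clara" as `LOC`.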
NVIDIA Riva translation is a neural machine translation framework that translates text between language pairs, that is, from a source language to a target language.
Support and Services
The NVIDIA Enterprise Support and Services Guide provides information about using NVIDIA Enterprise support and services. It is intended for NVIDIA’s potential and existing enterprise customers. This user guide is a non-binding document and should be used to obtain information about NVIDIA Enterprise branded support and services.