NVIDIA Riva

NVIDIA® Riva is an SDK for building multimodal conversational systems. It is used to build and deploy AI applications that fuse vision, speech, sensors, and services to support conversational AI use cases specific to a domain of expertise. Riva offers a complete workflow to build, train, and deploy AI systems that can use visual cues such as gestures and gaze along with speech in context.

Riva offers pretrained speech models in NVIDIA NGC™ that can be fine-tuned with NVIDIA NeMo™ on a custom data set, accelerating the development of domain-specific models. Models can be easily exported, optimized, and deployed as a speech service on premises or in the cloud with a single command using Helm charts.

Riva's high-performance inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server, which are both part of the NVIDIA AI platform. Refer to the following Riva product documentation for more information.

NVIDIA Riva Documentation

These documents provide information regarding the current NVIDIA Riva 2.13.0 release.

Tutorials

The best way to get started with Riva is to work through the tutorials.

Requirements and Setup

Ensure you meet the minimum requirements and complete setup before you run any Riva tutorial.

Automatic Speech Recognition (ASR)

These tutorials walk you through the basics of Riva Speech Skills ASR services.

ASR - New Language Adaptation

These tutorials walk you through the basics of Riva Speech Skills ASR services for new languages.

Cloud Deployment

These tutorials walk you through how to deploy Riva.

Text-To-Speech (TTS)

These tutorials walk you through the basics of Riva Speech Skills TTS services.

Speech Services

You can use Riva to access highly optimized Automatic Speech Recognition (ASR) services for use cases like real-time transcription and virtual assistants, Text-To-Speech (TTS) services to generate human-like speech, and Natural Language Processing (NLP) for text and token classification functionality.

ASR

ASR takes an audio stream or audio buffer as input and returns one or more text transcripts, along with additional optional metadata. Speech recognition in Riva is a GPU-accelerated compute pipeline, with optimized performance and accuracy. Riva supports offline/batch and streaming recognition modes.
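
To make the difference between the two recognition modes concrete, the following is a minimal sketch of how the same audio might be presented to a recognizer in each mode. It does not call the Riva API; `chunk_audio`, its parameters, the 100 ms chunk size, and the 16 kHz 16-bit mono format are assumptions chosen for illustration only.

```python
from typing import Iterator


def chunk_audio(pcm: bytes, sample_rate_hz: int, chunk_ms: int,
                bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive fixed-duration chunks of mono PCM audio.

    Hypothetical helper for illustration; not part of the Riva client.
    """
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of 16-bit mono silence at 16 kHz (illustrative input only).
audio = bytes(16000 * 2)

# Offline/batch mode: the whole audio buffer would go into a single request.
batch_request = audio

# Streaming mode: the same audio would be sent as a sequence of small chunks,
# so that transcripts can be returned while audio is still arriving.
streaming_requests = list(chunk_audio(audio, sample_rate_hz=16000, chunk_ms=100))
```

In this sketch the streaming path only changes how the audio is delivered; the bytes themselves are identical to the batch buffer.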

TTS

The Riva TTS service is implemented as a two-stage pipeline. The first model generates a mel-spectrogram from the input text, and the second model generates speech from that mel-spectrogram. Together, the two models form a TTS system that enables you to synthesize natural-sounding speech from raw transcripts without any additional information such as patterns or rhythms of speech.
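
The two-stage data flow described above can be sketched as a simple composition of two functions. This is a toy illustration under stated assumptions, not the Riva implementation or API: `TwoStageTTS`, `toy_spectrogram_model`, `toy_vocoder`, and the constants `N_MELS` and `HOP` are hypothetical names and values, and the stand-in models emit silence rather than real speech.

```python
from dataclasses import dataclass
from typing import Callable, List

Mel = List[List[float]]   # spectrogram: frames x mel bins
Audio = List[float]       # waveform samples

N_MELS = 80   # assumed number of mel bins per frame
HOP = 256     # assumed number of waveform samples per spectrogram frame


def toy_spectrogram_model(text: str) -> Mel:
    """Stand-in for stage 1: emits one all-zero frame per input character."""
    return [[0.0] * N_MELS for _ in text]


def toy_vocoder(mel: Mel) -> Audio:
    """Stand-in for stage 2: emits HOP silent samples per spectrogram frame."""
    return [0.0] * (len(mel) * HOP)


@dataclass
class TwoStageTTS:
    """Chains a text-to-spectrogram model with a spectrogram-to-audio model."""
    spectrogram_model: Callable[[str], Mel]
    vocoder: Callable[[Mel], Audio]

    def synthesize(self, text: str) -> Audio:
        mel = self.spectrogram_model(text)   # stage 1: text -> mel-spectrogram
        return self.vocoder(mel)             # stage 2: mel-spectrogram -> audio


tts = TwoStageTTS(toy_spectrogram_model, toy_vocoder)
samples = tts.synthesize("hello")
```

Keeping the two stages behind separate callables mirrors the description: either model could be swapped without changing the other, as long as the intermediate mel-spectrogram format is agreed.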

Natural Language Processing (NLP)

Riva supports basic NLP text and token classification functionality, with classifiers that can be modified and deployed using NVIDIA NeMo.

Translation

Riva Translation

NVIDIA Riva translation is a neural-network-based framework that translates text between language pairs, that is, from one language to another.

Support and Services

Enterprise Support and Services User Guide.pdf

The NVIDIA Enterprise Support and Services Guide provides information for using NVIDIA Enterprise Support and services. This document is intended for NVIDIA’s potential and existing enterprise customers. This User Guide is a non-binding document and should be used to obtain information about NVIDIA Enterprise-branded support and services.
