NVIDIA® Riva is an SDK for building multimodal conversational AI systems. With Riva, you can build and deploy AI applications that fuse vision, speech, sensors, and services to address domain-specific conversational AI use cases. It offers a complete workflow to build, train, and deploy AI systems that can use visual cues, such as gestures and gaze, along with speech in context.
Riva offers pretrained speech models in NVIDIA NGC™ that can be fine-tuned with NVIDIA NeMo™ on a custom data set, accelerating the development of domain-specific models. Models can be easily exported, optimized, and deployed as a speech service on premises or in the cloud with a single command using Helm charts.
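As a rough sketch of the deployment step above, the Riva speech service can be installed into a Kubernetes cluster from the chart hosted on NGC. The exact chart version, repository URL, and value overrides below are assumptions and vary by release; check the NGC catalog for the current chart.

```shell
# Hypothetical deployment sketch -- chart version and values are placeholders.
# Requires an NGC API key with access to the Riva charts.
export NGC_API_KEY="<your-ngc-api-key>"

# Fetch the Riva API Helm chart from NGC (replace <version> with a real release).
helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-<version>.tgz \
    --username='$oauthtoken' --password="$NGC_API_KEY"

# Deploy the speech service with a single command.
helm install riva-api riva-api/
```

Once the release is up, the service exposes a gRPC endpoint that the Riva client libraries connect to.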
Riva's high-performance inference is powered by NVIDIA TensorRT™ optimizations and served using the NVIDIA Triton™ Inference Server, both of which are part of the NVIDIA AI platform. Refer to the Riva product documentation for more information.
The best way to get started with Riva is through the tutorials.
You can use Riva to access highly optimized Automatic Speech Recognition (ASR) services for use cases like real-time transcription and virtual assistants, Text-to-Speech (TTS) services to generate human-like speech, and Natural Language Processing (NLP) services for tasks such as text and token classification.
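As an illustration of calling the ASR service described above, the sketch below uses the Riva Python client (`nvidia-riva-client`) to transcribe a WAV file. It assumes a Riva server is already running at `localhost:50051`; the service and parameter names follow the Riva Python client but may differ between releases, and the `transcribe_file` helper is a name introduced here for illustration.

```python
# Hypothetical sketch: offline transcription with the Riva ASR service.
# Assumes the nvidia-riva-client package and a Riva server at localhost:50051.

def transcribe_file(audio_path, server="localhost:50051"):
    """Send a WAV file to a Riva ASR endpoint and return the top transcript."""
    # Deferred import so this module still loads when the client isn't installed.
    import riva.client

    auth = riva.client.Auth(uri=server)
    asr = riva.client.ASRService(auth)

    # Recognition settings: PCM-encoded English audio, one hypothesis,
    # with automatic punctuation enabled.
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )

    with open(audio_path, "rb") as f:
        audio_bytes = f.read()

    # Offline (batch) recognition; Riva also offers a streaming API for
    # real-time transcription.
    response = asr.offline_recognize(audio_bytes, config)
    return response.results[0].alternatives[0].transcript
```

The TTS and NLP services follow the same pattern: create an `Auth` object for the server, construct the corresponding service client, and issue a request.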