
Overview

Welcome to the trial of NVIDIA Riva on NVIDIA LaunchPad!

NVIDIA Riva is a GPU-accelerated SDK (software development kit) for building Speech AI applications that are customized for your use case and deliver real-time performance. Riva is built on a decade of AI innovations by NVIDIA across hardware, model architectures, training techniques, inference optimizations, and deployment solutions.

As speech-based applications are adopted globally, solutions need to interact with humans across many languages. Speech AI apps need to understand industry-specific jargon and respond naturally in real time. Riva includes world-class automatic speech recognition (ASR) and text-to-speech (TTS) services that run in real time.

Try NVIDIA Riva Automatic Speech Recognition

In this demo, you'll see Riva speech recognition deliver highly accurate transcription in real time. You can provide an input through your microphone or upload a .wav file from your device.

The duration of each sample is limited to 30 seconds.
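
Before uploading a clip, it can help to check locally that it fits within the 30-second limit. The following is a minimal sketch using only Python's standard `wave` module. The function names and the silent test clip are hypothetical helpers for illustration and are not part of Riva or the demo.

```python
import io
import wave

MAX_SAMPLE_SECONDS = 30.0  # limit stated for the demo above


def wav_duration_seconds(source):
    """Return the duration of a WAV file in seconds.

    `source` may be a path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_demo_limit(source, limit=MAX_SAMPLE_SECONDS):
    """Return True when the clip is no longer than `limit` seconds."""
    return wav_duration_seconds(source) <= limit


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory 16-bit mono WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    short_clip = io.BytesIO(make_silent_wav(2.0))
    long_clip = io.BytesIO(make_silent_wav(31.0))
    print(fits_demo_limit(short_clip))  # True
    print(fits_demo_limit(long_clip))   # False
```

In practice you would pass the path of your own recording to `fits_demo_limit` instead of the generated silence.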

Try NVIDIA Riva Text-to-Speech

If you're looking to add voice to your interactive virtual assistant, modern home device, or reading assistant for the visually impaired or for people with a reading disability, try Riva's out-of-the-box English female or male voice.

Hear the natural-sounding and expressive voices created using Riva's state-of-the-art neural speech synthesis models.

Your use of Riva Voice Recognition and Riva Text-to-Speech is subject to our Terms of Use. Your data will be used to improve NVIDIA products and services.

In this lab, you will learn how to use NVIDIA Riva speech AI services on LaunchPad to automatically detect and transcribe speech and to synthesize human-like voices. To that end, the lab includes Jupyter notebooks within the NVIDIA NGC Riva samples container that walk you through using Riva’s Python API to interact with its speech services. It also includes applications that showcase Riva’s streaming APIs.
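
To give a rough idea of what a call through the Python API can look like, here is a minimal sketch of offline (non-streaming) transcription. It assumes the `nvidia-riva-client` package is installed and that a Riva server is reachable at `localhost:50051`. Both the address and the helper names `transcribe_file` and `join_transcripts` are assumptions made for illustration, and the lab notebooks may structure this differently or set additional configuration fields.

```python
from types import SimpleNamespace


def join_transcripts(response):
    """Concatenate the top alternative of each result in a recognition response."""
    return "".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )


def transcribe_file(path, uri="localhost:50051", language_code="en-US"):
    """Send a whole audio file to a Riva server for offline recognition.

    Requires the nvidia-riva-client package and a running Riva server;
    the default `uri` is an assumption, not a value given by this lab.
    """
    import riva.client  # imported lazily so join_transcripts works without it

    auth = riva.client.Auth(uri=uri)
    asr_service = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code=language_code,
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    with open(path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    response = asr_service.offline_recognize(audio_bytes, config)
    return join_transcripts(response)


if __name__ == "__main__":
    # Stand-in object shaped like a recognition response, so the helper
    # can be tried without a server.
    fake_response = SimpleNamespace(
        results=[
            SimpleNamespace(alternatives=[SimpleNamespace(transcript="Hello there. ")]),
            SimpleNamespace(alternatives=[SimpleNamespace(transcript="How are you?")]),
        ]
    )
    print(join_transcripts(fake_response))
```

Keeping the transcript-joining logic separate from the network call makes it possible to test the post-processing without a live server.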

This lab automates the setup of Riva’s speech AI server for your convenience, so you can focus on exploring the speech APIs through the Riva samples container, which contains all the dependencies needed to query the server.

As an introductory lab, it does not cover customizing the ASR or TTS models. Customizing these pipelines is explored in the advanced labs.

If you would like to learn how to set up the default Riva Speech AI server yourself, instructions are provided in the Setup a Speech AI Server with Speech Recognition and Text-to-speech Models section. That guide is entirely optional for the purposes of this lab.

Note

This lab should take roughly 90 minutes to complete.