Customizing Text-to-Speech With NVIDIA Riva

Overview

Welcome to the trial of NVIDIA Riva on NVIDIA LaunchPad!

NVIDIA Riva is a GPU-accelerated SDK (software development kit) for building Speech AI applications that are customized for your use case and deliver real-time performance. Riva is built on a decade of AI innovations by NVIDIA across hardware, model architectures, training techniques, inference optimizations, and deployment solutions.

As speech-based applications are adopted globally, solutions need to interact with people across many languages. Speech AI apps need to understand industry-specific jargon and respond naturally in real time. Riva includes world-class automatic speech recognition (ASR) and text-to-speech (TTS) that run in real time.

In this lab, you will learn how to customize the NVIDIA Riva text-to-speech (TTS) service on LaunchPad. The lab includes Jupyter notebooks that walk you through the different ways you can modify the speech synthesis pipeline, from customizing SSML tags to training the spectrogram generator and vocoder models.
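Several of the customizations above involve SSML (Speech Synthesis Markup Language). The following minimal Python sketch shows how an SSML string could be built with the standard library's ElementTree before it is sent to a TTS service. The `<speak>` root and the `<prosody>` element with its `rate` and `pitch` attributes come from the W3C SSML specification. The function name `build_ssml` is hypothetical and is not part of any Riva client. Which attribute values Riva actually honors is an assumption you should verify against the Riva TTS documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, rate: str = "default", pitch: str = "default") -> str:
    """Wrap plain text in a <speak> root containing one <prosody> element.

    Hypothetical helper for illustration only. "default" is the W3C SSML
    keyword for both attributes; the range of values Riva accepts for
    rate and pitch is an assumption to check in the Riva docs.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    # ElementTree escapes &, < and > in text content when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello from Riva.", rate="110%", pitch="+1"))
```

Building the markup with an XML library rather than by string concatenation ensures that characters such as `<` or `&` in user text are escaped. The resulting string would typically be passed as the text input of a synthesis request.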

Before you customize TTS, the very first tutorial walks you through setting up the initial Riva Speech AI pipeline with pretrained models downloaded from NVIDIA NGC. Learning this process is essential for tailoring the pipeline to your use case.

If you would like to learn how to set up the default Riva Speech AI server, the instructions are provided in the Setup a Speech AI Server with Speech Recognition and Text-to-speech Models section. That guide is entirely optional for the purposes of this lab.

Note

This lab should take roughly 15 hours to complete.