Text-To-Speech (Latest)

Riva TTS NIM Overview

Riva TTS NIM APIs provide easy access to state-of-the-art text-to-speech (TTS) models capable of synthesizing English speech from text with exceptional accuracy. The TTS pipeline comprises two models: FastPitch, a non-autoregressive, transformer-based spectrogram generator that predicts duration and pitch, and HiFi-GAN, a GAN-based vocoder. Riva TTS NIM models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.

The model architectures are described in the papers FastPitch: Parallel Text-to-Speech with Pitch Prediction and HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.
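To make the two-stage design concrete, the following is a minimal toy sketch of the data flow, not the Riva or NeMo implementation. It assumes per-token feature vectors and integer durations are already available. The names `length_regulate` and `toy_vocoder` and the `hop_length` parameter are hypothetical, chosen only for illustration. The first function shows how a non-autoregressive generator can expand token-level features to frame rate by using predicted durations. The second is a stand-in for a vocoder that maps each frame to a fixed number of audio samples.

```python
from typing import List, Sequence


def length_regulate(
    token_features: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token's feature vector once per predicted output frame."""
    if len(token_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: List[List[float]] = []
    for features, duration in zip(token_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # All frames are produced without feeding earlier outputs back in,
        # which is what makes this style of generation non-autoregressive.
        frames.extend(list(features) for _ in range(duration))
    return frames


def toy_vocoder(frames: Sequence[Sequence[float]], hop_length: int = 4) -> List[float]:
    """Placeholder vocoder: emit hop_length constant samples per frame.

    A real vocoder such as HiFi-GAN is a trained neural network; this
    stand-in only illustrates the frame-to-sample upsampling.
    """
    samples: List[float] = []
    for frame in frames:
        level = sum(frame) / len(frame)
        samples.extend([level] * hop_length)
    return samples


# Hypothetical inputs: three tokens with 2-dimensional features.
token_features = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.7]]
durations = [2, 1, 3]  # predicted frames per token (made-up values)

frames = length_regulate(token_features, durations)
waveform = toy_vocoder(frames, hop_length=4)
```

With these made-up inputs, the three tokens expand to six frames, and the placeholder vocoder turns those into 24 samples. In a real pipeline the frames would be mel-spectrogram frames computed by the trained generator, with predicted pitch folded into the features before decoding.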

Enterprise-Ready Features

Riva TTS NIM comes with enterprise-ready features, such as a high-performance inference server, flexible integration, and enterprise-grade security.

  • State-of-the-art accuracy: Superior performance across diverse sources and domains.

  • Open source and extensible: Built on NVIDIA NeMo, allowing for seamless integration and customization.

  • Pre-trained checkpoints: Ready-to-use models for inference or fine-tuning.

  • Permissive license: Model checkpoints are released under the CC-BY-4.0 license and can be used in any commercial application.

Riva TTS NIM can be tried out at this link.

© Copyright 2024, NVIDIA Corporation. Last updated on Aug 6, 2024.