Riva TTS NIM Overview
Riva TTS NIM APIs provide easy access to state-of-the-art text-to-speech (TTS) models capable of synthesizing English speech from text with exceptional accuracy. The pipeline comprises two models: a non-autoregressive, transformer-based spectrogram generator (the FastPitch model), which predicts token durations and pitch to produce a mel spectrogram, and a GAN-based vocoder (the HiFi-GAN model), which converts that spectrogram into an audio waveform. Riva TTS NIM models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.
The model architectures are described in the papers FastPitch: Parallel Text-to-Speech with Pitch Prediction and HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.
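What makes FastPitch non-autoregressive is its length regulator: each input token's encoding is repeated according to its predicted duration (in mel frames), so the whole spectrogram can be generated in parallel rather than frame by frame. The vocoder then turns each mel frame into a fixed number of audio samples. The following is a minimal pure-Python sketch of that data flow, not Riva's implementation. The function names `length_regulate` and `num_output_samples` and the hop length of 256 samples per frame are illustrative assumptions, and pitch conditioning is omitted for brevity.

```python
from typing import List, Sequence

# Assumed hop length: audio samples emitted by the vocoder per mel frame.
# The value used by the actual Riva TTS NIM models may differ.
HOP_LENGTH = 256


def length_regulate(token_features: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Repeat each token's feature vector by its predicted duration in frames.

    Hypothetical helper illustrating FastPitch-style duration-based upsampling:
    the result has sum(durations) frames, one per mel-spectrogram frame.
    """
    if len(token_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: List[List[float]] = []
    for features, duration in zip(token_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for each frame so frames do not alias one another.
        frames.extend(list(features) for _ in range(duration))
    return frames


def num_output_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Hypothetical helper: a vocoder maps each mel frame to hop_length samples."""
    return num_frames * hop_length


if __name__ == "__main__":
    # Three toy tokens with 2-dimensional encodings and durations 2, 1, 3.
    encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    durations = [2, 1, 3]
    frames = length_regulate(encodings, durations)
    print(len(frames))                      # 6
    print(num_output_samples(len(frames)))  # 1536
```

Because the number of frames is known as soon as the durations are predicted, the output length in samples is fixed before any audio is generated, which is what allows both stages to run in parallel on the GPU.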
Enterprise-Ready Features
Riva TTS NIM comes with enterprise-ready features, such as a high-performance inference server, flexible integration, and enterprise-grade security.
State-of-the-art accuracy: Superior performance across diverse sources and domains.
Open-source and extensibility: Built on NVIDIA NeMo, allowing for seamless integration and customization.
Pre-trained checkpoints: Ready-to-use models for inference or fine-tuning.
Permissive license: Model checkpoints are released under the CC-BY-4.0 license and can be used in commercial applications with attribution.
Riva TTS NIM can be tried out at this link.