NVIDIA Speech NIM Microservices Release Notes#
This page lists changes, fixes, and known issues for each NVIDIA Speech NIM microservices release.
All Speech NIM microservice updates are released together as a collection and follow calendar versioning YY.MM.n, where YY is the year, MM is the month, and n is the patch number within that cycle.
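As an illustration of the YY.MM.n scheme, the following minimal sketch parses a release string into comparable parts. The names `Release` and `parse_release` are hypothetical helpers for this example and are not part of any NVIDIA tooling; the sketch also assumes two-digit year and month fields, as in 26.02.0.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    """A calendar-versioned release: two-digit year, month, patch number."""
    year: int
    month: int
    patch: int


# Assumed shape: two digits, two digits, then one or more digits (YY.MM.n).
_RELEASE_PATTERN = re.compile(r"(\d{2})\.(\d{2})\.(\d+)")


def parse_release(version: str) -> Release:
    """Parse a YY.MM.n string; raise ValueError if it does not fit the scheme."""
    match = _RELEASE_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"not a YY.MM.n version: {version!r}")
    year, month, patch = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {version!r}")
    return Release(year, month, patch)
```

Because `Release` is a tuple, releases compare in chronological order, so `parse_release("26.02.1") > parse_release("26.02.0")` holds.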
Release 26.02.0#
Highlights#
Consolidated the previously independent NVIDIA Riva ASR, TTS, and NMT NIMs into a single collection, NVIDIA Speech NIM Microservices, that follows unified calendar versioning (YY.MM.n).
Launched the new NVIDIA Speech NIM microservices documentation alongside the consolidation and rebranding. It is a comprehensive guide to the NVIDIA Speech NIM microservices for ASR, TTS, and NMT.
ASR NIM#
Key Features#
Renamed and expanded Parakeet 0.6b TDT to support two model types: English-only (type=default, parakeet-tdt-0.6b-v2) and multilingual (type=multi, parakeet-tdt-0.6b-v3) with 25 European languages. Use CONTAINER_ID=parakeet-0.6b-tdt and language code multi for auto language detection.
Added three model types to Parakeet 1.1b RNNT Multilingual: Default (auto language detection), Prompt (improved accuracy, client-specified language), and Indic (optimized for Indic languages). Expanded language support table including Bengali (bn-IN), Tamil (ta-IN).
Extended word boosting to Parakeet TDT and Parakeet RNNT models. RNNT/TDT use boost score range 0.5–2.0 (CTC uses 20–100). Added custom pronunciation using word boosting with explicit tokenization for CTC models.
Improved latency and throughput for Silero VAD and Sortformer diarizer for Parakeet 1.1b CTC and Parakeet 1.1b RNNT NIMs.
Added VAD-based end-of-utterance detection for Parakeet 1.1b RNNT NIM.
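The word boosting score ranges listed above differ by decoder family, so a CTC-style score passed to a transducer model (or the reverse) falls far outside the expected range. The following minimal sketch shows a client-side sanity check. `BOOST_RANGES` and `check_boost_score` are hypothetical names for this example, not part of the NIM client API, and the ranges are copied from these release notes.

```python
# Boost score ranges as stated in these release notes (inclusive bounds assumed).
BOOST_RANGES = {
    "ctc": (20.0, 100.0),
    "rnnt": (0.5, 2.0),
    "tdt": (0.5, 2.0),
}


def check_boost_score(model_family: str, score: float) -> float:
    """Return the score unchanged if it fits the family's range, else raise."""
    try:
        low, high = BOOST_RANGES[model_family.lower()]
    except KeyError:
        raise ValueError(f"unknown model family: {model_family!r}") from None
    if not low <= score <= high:
        raise ValueError(
            f"boost score {score} is outside {low}-{high} for {model_family}"
        )
    return score
```

For example, `check_boost_score("tdt", 1.5)` passes, while `check_boost_score("rnnt", 20)` raises because 20 is a CTC-style score.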
Known Issues#
The Parakeet 1.1b RNNT Multilingual model generates spaces after every character in the transcript for languages such as Japanese. To generate output without spaces, pass the language_code=ja-JP parameter from the client.
The Parakeet 1.1b RNNT Multilingual model has speaker diarization enabled for all profiles. The mode=all profiles require up to 50 GB of GPU memory. For GPUs with lower memory, deploy only one or two modes instead of all modes.
Transducer models (Parakeet RNNT, Parakeet TDT) can emit identical start/end timestamps for words when multiple tokens share the same timestamp.
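Downstream code that relies on word durations (for example, subtitle alignment) may want to detect the identical start/end timestamps described above. The following minimal sketch assumes word timings have already been copied into a simple tuple. The `Word` type and `zero_duration_words` function are hypothetical and do not mirror the actual response schema.

```python
from typing import Iterable, List, NamedTuple


class Word(NamedTuple):
    """Hypothetical container for one recognized word and its timing."""
    text: str
    start_ms: int
    end_ms: int


def zero_duration_words(words: Iterable[Word]) -> List[Word]:
    """Return the words whose end timestamp is not after their start timestamp."""
    return [word for word in words if word.end_ms <= word.start_ms]
```

How to handle flagged words (for instance, borrowing time from a neighboring word) depends on the application and is left open here.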
TTS NIM#
Key Features and Enhancements#
Extended language support for Magpie TTS Multilingual to Hindi (hi-IN) and Japanese (ja-JP).
Added emotional voice variants (Angry, Calm, Fearful, Happy, Neutral, Sad, PleasantSurprised, Disgusted) for Magpie TTS Multilingual across supported languages.
Magpie TTS Multilingual supports the DGX Spark platform (support extended from Riva TTS NIM Release 1.10.0).
Default model profile is now batch_size=8 for all hardware (removed batch_size=1 default for Blackwell).
Added performance benchmarks for Magpie TTS Multilingual on B200 and DGX Spark. Updated benchmarks for A100, H100, and L40.
Known Issues#
Audio prompts for zeroshot models (Magpie TTS Zeroshot and Magpie TTS Flow) must be mono, 16-bit WAV format at 22.05 kHz or higher, with a duration of 3–10 seconds.
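The audio prompt constraints above can be checked before a file is sent to the service. The following minimal sketch uses Python's standard `wave` module. The function name `check_audio_prompt` and the constant names are hypothetical, and the sketch assumes the 3–10 second bounds are inclusive and that the file is uncompressed PCM WAV.

```python
import wave
from typing import List

MIN_SAMPLE_RATE_HZ = 22050   # 22.05 kHz or higher
MIN_SECONDS = 3.0            # assumed inclusive lower bound
MAX_SECONDS = 10.0           # assumed inclusive upper bound


def check_audio_prompt(path: str) -> List[str]:
    """Return a list of problems found in the WAV file; empty means it passes."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width_bytes = wav.getsampwidth()
        sample_rate = wav.getframerate()
        frame_count = wav.getnframes()

    duration = frame_count / sample_rate if sample_rate else 0.0
    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width_bytes != 2:
        problems.append(f"expected 16-bit samples, found {8 * sample_width_bytes}-bit")
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.2f} s is outside 3-10 s")
    return problems
```

An empty result means the file meets the stated constraints. `wave.open` raises `wave.Error` for files that are not valid PCM WAV, which callers may want to catch and report separately.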
Support Matrix and Compatibility Updates#
The following list summarizes the updated models and their support matrices:
Updated profiles, memory requirements, and customization support for the following ASR models:
Updated language support, voice catalog with emotional variants, batch size defaults, and DGX Spark support for the following TTS models:
To find the latest support matrix for the NVIDIA Speech NIM microservices, refer to Support Matrix.
Previous Releases#
With the introduction of the NVIDIA Speech NIM microservices documentation beginning with release 26.02.0, the previous NVIDIA Riva NIM documentation has been officially deprecated. To access the deprecated documentation, refer to the following links: