NVIDIA Speech NIM Microservices Release Notes#

This page lists changes, fixes, and known issues for each NVIDIA Speech NIM microservices release.

All Speech NIM microservice updates are released together as a collection and follow calendar versioning YY.MM.n, where YY is the year, MM is the month, and n is the patch number within that cycle.
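
The YY.MM.n scheme can be parsed and compared programmatically. The following is a minimal sketch, not part of any NVIDIA tooling: the names SpeechNimVersion and parse_version are hypothetical, and the two-digit year and month fields are an assumption based on the 26.02.0 example.

```python
import re
from typing import NamedTuple


class SpeechNimVersion(NamedTuple):
    """Hypothetical container for a YY.MM.n release identifier."""
    year: int   # two-digit year (YY)
    month: int  # month (MM), 1-12
    patch: int  # patch number within the cycle (n)


# Assumption: YY and MM are always two digits, as in "26.02.0".
_VERSION_RE = re.compile(r"(\d{2})\.(\d{2})\.(\d+)")


def parse_version(text: str) -> SpeechNimVersion:
    """Parse a string such as '26.02.0' into its three numeric fields."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YY.MM.n version: {text!r}")
    year, month, patch = (int(group) for group in match.groups())
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return SpeechNimVersion(year, month, patch)
```

Because a NamedTuple compares field by field, parsed versions sort chronologically, so parse_version("26.02.1") > parse_version("26.02.0") evaluates to True.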


Release 26.02.0#

Highlights#

  • Consolidated the previously independent NVIDIA Riva ASR, TTS, and NMT NIMs into a single collection, NVIDIA Speech NIM Microservices, that follows unified calendar versioning (YY.MM.n).

  • Launched the new NVIDIA Speech NIM microservices documentation alongside the consolidation and rebranding. It is a comprehensive guide to the NVIDIA Speech NIM microservices for ASR, TTS, and NMT.

ASR NIM#

Key Features#

  • Renamed and expanded Parakeet 0.6b TDT to support two model types: English-only (type=default, parakeet-tdt-0.6b-v2) and multilingual (type=multi, parakeet-tdt-0.6b-v3), which covers 25 European languages. For automatic language detection, set CONTAINER_ID=parakeet-0.6b-tdt and use the language code multi.

  • Added three model types to Parakeet 1.1b RNNT Multilingual: Default (auto language detection), Prompt (improved accuracy, client-specified language), and Indic (optimized for Indic languages). Expanded the language support table, including Bengali (bn-IN) and Tamil (ta-IN).

  • Extended word boosting to Parakeet TDT and Parakeet RNNT models. RNNT and TDT models use a boost score range of 0.5–2.0; CTC models use 20–100. Added custom pronunciation for CTC models through word boosting with explicit tokenization.

  • Improved the latency and throughput of Silero VAD and the Sortformer diarizer in the Parakeet 1.1b CTC and Parakeet 1.1b RNNT NIMs.

  • Added VAD-based end-of-utterance detection for Parakeet 1.1b RNNT NIM.
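
Because the boost score ranges differ by decoder family, a client can check a score before sending a request. The following is a minimal sketch under stated assumptions: BOOST_RANGES and check_boost_score are hypothetical names, not part of the Speech NIM client API, and the range endpoints are treated as inclusive, which these notes do not confirm.

```python
# Boost score ranges quoted in these release notes, keyed by decoder family.
# Assumption: both endpoints of each range are valid values.
BOOST_RANGES = {
    "ctc": (20.0, 100.0),
    "rnnt": (0.5, 2.0),
    "tdt": (0.5, 2.0),
}


def check_boost_score(decoder: str, score: float) -> float:
    """Return the score unchanged if it fits the decoder's range, else raise."""
    key = decoder.lower()
    if key not in BOOST_RANGES:
        raise ValueError(f"unknown decoder family: {decoder!r}")
    low, high = BOOST_RANGES[key]
    if not low <= score <= high:
        raise ValueError(
            f"boost score {score} is outside {low}-{high} for {key.upper()} models"
        )
    return score
```

A score tuned for a CTC model, such as 50, is rejected for an RNNT or TDT model, which catches the most likely mistake when moving a boosted word list between model types.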

Known Issues#

  • The Parakeet 1.1b RNNT Multilingual model generates spaces after every character in the transcript for languages such as Japanese. To generate output without spaces, pass the language_code=ja-JP parameter from the client.

  • The Parakeet 1.1b RNNT Multilingual model has speaker diarization enabled for all profiles. The mode=all profiles use up to 50 GB of GPU memory. On GPUs with less memory, deploy only one or two modes instead of all modes.

  • Transducer models (Parakeet RNNT, Parakeet TDT) can emit identical start/end timestamps for words when multiple tokens share the same timestamp.
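
Downstream code that computes word durations, for example for subtitle alignment, may want to detect such words. The following is a minimal sketch: the Word type and its start_ms and end_ms fields are hypothetical stand-ins for whatever word-level timing structure the client receives, not the actual response schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word-level result; field names are illustrative only."""
    text: str
    start_ms: int
    end_ms: int


def zero_duration_words(words: List[Word]) -> List[Word]:
    """Return the words whose start and end timestamps are identical."""
    return [word for word in words if word.start_ms == word.end_ms]
```

How to handle the flagged words (for instance, widening them to the next word's start time) depends on the application and is left to the caller.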

TTS NIM#

Key Features and Enhancements#

  • Extended language support for Magpie TTS Multilingual to Hindi (hi-IN) and Japanese (ja-JP).

  • Added emotional voice variants (Angry, Calm, Fearful, Happy, Neutral, Sad, PleasantSurprised, Disgusted) for Magpie TTS Multilingual across supported languages.

  • Magpie TTS Multilingual supports the DGX Spark platform (support extended from Riva TTS NIM Release 1.10.0).

  • The default model profile is now batch_size=8 on all hardware (the batch_size=1 default for Blackwell has been removed).

  • Added performance benchmarks for Magpie TTS Multilingual on B200 and DGX Spark. Updated benchmarks for A100, H100, and L40.

Known Issues#

  • Audio prompts for zeroshot models (Magpie TTS Zeroshot and Magpie TTS Flow) must be mono, 16-bit WAV files sampled at 22.05 kHz or higher, with a duration of 3–10 seconds.
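
These constraints can be checked locally before a prompt is submitted. The following is a minimal sketch that uses only the Python standard library wave module; check_audio_prompt is a hypothetical helper, and it assumes the duration bounds are inclusive and that the file is uncompressed PCM WAV.

```python
import wave
from typing import List


def check_audio_prompt(path: str) -> List[str]:
    """Return a list of problems found in a WAV prompt (empty if none)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()          # samples per second
        frames = wav.getnframes()

    duration = frames / rate if rate else 0.0
    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate < 22050:
        problems.append(f"expected 22.05 kHz or higher, found {rate} Hz")
    # Assumption: a prompt of exactly 3 or exactly 10 seconds is acceptable.
    if not 3.0 <= duration <= 10.0:
        problems.append(f"expected 3-10 seconds, found {duration:.2f} seconds")
    return problems
```

An empty list means the file meets all four stated requirements; otherwise each entry names one requirement that the file does not meet.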

Support Matrix and Compatibility Updates#

The following list summarizes the updated models and their support matrices:

To find the latest support matrix for the NVIDIA Speech NIM microservices, refer to Support Matrix.

Previous Releases#

With the introduction of the NVIDIA Speech NIM microservices documentation beginning with release 26.02.0, the previous NVIDIA Riva NIM documentation has been officially deprecated. To access the deprecated documentation, refer to the following links: