Release Notes#

Release 1.5.0#

Added the Parakeet-TDT-0.6b-v2 English ASR model for accurate offline transcription with punctuation, capitalization, and word timestamp support.
Updated the Parakeet 1.1b CTC English (en-US) NIM to support Speaker Diarization.
Added language detection support for the Whisper Large v3 and Parakeet 1.1b RNNT Multilingual models.
Added a profanity filter for the Whisper Large v3 and Parakeet 1.1b RNNT Multilingual models.
Improved throughput for the Parakeet 1.1b RNNT Multilingual model.

The Whisper Large v3 NIM does not support NVIDIA B200 GPUs.
The Whisper Large v3 NIM uses OpenAI’s Whisper model, which may result in hallucinations, inaccuracies, and incorrect inverse text normalizations in some cases.
The profanity filter may not work for non-English languages in some cases.
The Parakeet 0.6b TDT model uses chunked inference and can sometimes produce incorrect transcripts compared to the NeMo model. Refer to Model Accuracy for details.
Canary models may produce incorrect translations for certain languages such as Spanish (es-ES) and Korean (ko-KR).
ASR HTTP API (Whisper) accepts only file, model, and language parameters in the request. All other parameters are ignored.
ASR gRPC API (Whisper)
- Supports only the offline Recognize API.
- Does not support customization parameters (e.g. profanity_filter, enable_word_time_offsets, enable_automatic_punctuation, verbatim_transcripts) in the RecognitionConfig message.
SSL/TLS encryption does not support TTS APIs. MTLS encryption is not supported in this release.