Release Notes#

Release 1.6.0#

Key Features and Enhancements#

Known Issues#

  • Sortformer speaker diarization can occasionally produce incorrect speaker tags (like overlapping speech) in low latency mode.

  • Profiles with vad=silero use Silero VAD for endpointing, which can result in the following issues:

    • Reduce ASR transcription accuracy.

    • Increase latency in streaming-throughput mode

    • Lower throughput in offline mode.

  • The Parakeet CTC 0.6b Mandarin (zh-CN) model can occasionally insert double punctuation marks.

  • The Parakeet CTC 0.6b Vietnamese (vi-VN) model can occasionally generate inaccurate timestamps.

Release 1.5.0#

Key Features and Enhancements#

Known Issues#

  • The Whisper Large v3 NIM does not support NVIDIA B200 GPUs.

  • The Whisper Large v3 NIM uses OpenAI’s Whisper model, which may result in hallucinations, inaccuracies, and incorrect inverse text normalizations in some cases.

  • The profanity filter may not work for non-English languages in some cases.

  • The Parakeet 0.6b TDT model uses chunked inference and can sometimes produce incorrect transcripts compared to the NeMo model. Refer to Model Accuracy for details.

  • Canary models may produce incorrect translations for certain languages such as Spanish (es-ES) and Korean (ko-KR).

  • ASR HTTP API (Whisper) accepts only file, model, and language parameters in the request. All other parameters are ignored.

  • ASR gRPC API (Whisper)

    • Supports only the offline Recognize API.

    • Does not support customization parameters (e.g. profanity_filter, enable_word_time_offsets, enable_automatic_punctuation, verbatim_transcripts) in the RecognitionConfig message.

  • SSL/TLS encryption does not support TTS APIs. MTLS encryption is not supported in this release.