About NVIDIA ASR NIM Microservice

The NVIDIA Automatic Speech Recognition (ASR) NIM microservice converts spoken audio into text. It packages pre-trained NeMo models with the full NVIDIA inference stack (TensorRT, Triton) into self-contained containers that handle model download, optimization, and serving.

ASR NIMs support two inference modes:

  • Streaming: Returns partial transcripts as audio arrives. Use for real-time applications such as live captioning and voice assistants.

  • Offline: Processes the full audio and returns a complete transcript. Use for batch processing of recorded files.
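
The difference between the two modes can be illustrated from the client side. The following is a minimal sketch, not the NIM client API: the helper `iter_chunks`, the 16 kHz sample rate, the 16-bit mono sample format, and the 100 ms chunk size are all assumptions chosen for illustration. It only shows how the same audio would be presented as one payload (offline) or as a sequence of short chunks (streaming).

```python
from typing import Iterator


def iter_chunks(
    pcm: bytes,
    sample_rate_hz: int = 16000,   # assumed sample rate
    chunk_ms: int = 100,           # assumed chunk duration
    bytes_per_sample: int = 2,     # assumed 16-bit mono PCM
) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio, in order."""
    chunk_bytes = sample_rate_hz * chunk_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence stands in for real recorded audio.
audio = bytes(16000 * 2)

# Offline: the full audio is submitted at once and one transcript comes back.
offline_payload = audio

# Streaming: the same audio is submitted chunk by chunk, so a service
# can return partial transcripts before the audio has ended.
streaming_payloads = list(iter_chunks(audio))
```

Smaller chunks generally mean partial results can arrive sooner at the cost of more requests; the chunk size a given deployment expects is not specified here.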

Available Models

ASR NIMs ship multiple model families optimized for different use cases. Choose based on your language, latency, and capability requirements.

| Model                      | Languages                                         | Modes               | Key Capability                                      |
|----------------------------|---------------------------------------------------|---------------------|-----------------------------------------------------|
| Parakeet CTC               | English, Vietnamese, Spanish, Mandarin, Taiwanese | Streaming + Offline | Low-latency transcription across multiple languages |
| Parakeet TDT v2            | English                                           | Offline             | Word-level timestamps                               |
| Parakeet RNNT Multilingual | 25+ languages                                     | Streaming + Offline | Auto language detection across 25+ languages        |
| Conformer CTC              | Spanish                                           | Streaming + Offline | Spanish transcription                               |
| Whisper Large v3           | 100+ languages                                    | Offline             | Transcription and translation to English            |
| Canary 1b                  | 26 languages                                      | Offline             | Transcription and bidirectional translation         |

For GPU memory requirements and all available model profiles, refer to the ASR support matrix.

Choosing a Model

  • Single language, low latency: Parakeet CTC models provide the fastest streaming transcription for their supported languages.

  • Word-level timestamps: Parakeet TDT v2 returns start and end times for each word.

  • Many languages, one model: Parakeet RNNT Multilingual covers 25+ languages with automatic language detection in both streaming and offline modes.

  • Maximum language coverage: Whisper Large v3 supports 100+ languages (offline only).

  • Translation: Whisper Large v3 translates any of its supported languages to English. Canary 1b supports bidirectional translation across 26 languages.
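
The selection guidance above can be written down as a small lookup. This is a hypothetical helper, not part of any NVIDIA tooling: the `Model` class, the capability tag names, and the `candidates` function are invented for illustration, and the data only restates the modes and key capabilities listed in the table. A model that lacks a tag here is simply not listed with that capability in this section; check the ASR support matrix before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    modes: frozenset          # subset of {"streaming", "offline"}
    capabilities: frozenset   # hypothetical tags derived from the table


BOTH = frozenset({"streaming", "offline"})
OFFLINE = frozenset({"offline"})

MODELS = [
    Model("Parakeet CTC", BOTH, frozenset({"low-latency"})),
    Model("Parakeet TDT v2", OFFLINE, frozenset({"word-timestamps"})),
    Model("Parakeet RNNT Multilingual", BOTH,
          frozenset({"language-detection"})),
    Model("Conformer CTC", BOTH, frozenset()),
    Model("Whisper Large v3", OFFLINE, frozenset({"translation"})),
    Model("Canary 1b", OFFLINE,
          frozenset({"translation", "bidirectional-translation"})),
]


def candidates(capability: str, mode: str = "offline") -> list:
    """Return names of models listed with the capability in the given mode."""
    return [
        m.name
        for m in MODELS
        if mode in m.modes and capability in m.capabilities
    ]


print(candidates("word-timestamps"))
print(candidates("translation"))
print(candidates("language-detection", mode="streaming"))
```

Language coverage is left out of the lookup on purpose, since the table gives only counts ("25+", "100+") for several models rather than full language lists.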

Next Steps