About NVIDIA ASR NIM Microservice
The NVIDIA Automatic Speech Recognition (ASR) NIM microservice converts spoken audio into text. It packages pre-trained NeMo models with the full NVIDIA inference stack (TensorRT, Triton) into self-contained containers that handle model download, optimization, and serving.
ASR NIMs support two inference modes:
Streaming: Returns partial transcripts as audio arrives. Use for real-time applications such as live captioning and voice assistants.
Offline: Processes the full audio and returns a complete transcript. Use for batch processing of recorded files.
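The practical difference between the two modes is how a client hands audio to the service. The sketch below is a client-side illustration only and does not call any NIM API. The helper name `iter_audio_chunks`, the 16 kHz sample rate, the 16-bit mono PCM format, and the 100 ms chunk length are all assumptions chosen for the example. In streaming mode a client would send each chunk as it becomes available, while in offline mode it would send the whole buffer in one request.

```python
from typing import Iterator


def iter_audio_chunks(
    pcm: bytes,
    sample_rate_hz: int = 16_000,  # assumed sample rate
    bytes_per_sample: int = 2,     # assumed 16-bit mono PCM
    chunk_ms: int = 100,           # assumed chunk duration
) -> Iterator[bytes]:
    """Yield fixed-duration slices of a raw PCM buffer (hypothetical helper)."""
    chunk_bytes = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk parameters must give a positive chunk size")
    for start in range(0, len(pcm), chunk_bytes):
        # The final slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


# One second of silence: 16,000 samples at 2 bytes each.
audio = bytes(16_000 * 2)

# Streaming-style: many small requests, each of which could produce a partial transcript.
streaming_requests = list(iter_audio_chunks(audio))

# Offline-style: a single request carrying the full recording.
offline_request = audio
```

Smaller chunks generally mean partial transcripts can arrive sooner at the cost of more requests. A suitable chunk size depends on the deployment and is not specified here.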
Available Models
ASR NIMs ship multiple model families optimized for different use cases. Choose based on your language, latency, and capability requirements.
| Model | Languages | Modes | Key Capability |
|---|---|---|---|
| Parakeet CTC | English, Vietnamese, Spanish, Mandarin, Taiwanese | Streaming + Offline | Low-latency transcription across multiple languages |
| Parakeet TDT v2 | English | Offline | Word-level timestamps |
| Parakeet RNNT Multilingual | 25+ languages | Streaming + Offline | Auto language detection across 25+ languages |
|  | Spanish | Streaming + Offline | Spanish transcription |
| Whisper Large v3 | 100+ languages | Offline | Transcription and translation to English |
| Canary 1b | 26 languages | Offline | Transcription and bidirectional translation |
For GPU memory requirements and all available model profiles, refer to the ASR support matrix.
Choosing a Model
Single language, low latency: Parakeet CTC models provide the fastest streaming transcription for their supported languages.
Word-level timestamps: Parakeet TDT v2 returns start and end times for each word.
Many languages, one model: Parakeet RNNT Multilingual covers 25+ languages with automatic language detection in both streaming and offline modes.
Maximum language coverage: Whisper Large v3 supports 100+ languages (offline only).
Translation: Whisper translates any supported language to English. Canary 1b supports bidirectional translation across 26 languages.
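The guidance above can be sketched as a small decision helper. This is a hypothetical function, not part of any NVIDIA API: the name `choose_model_family` and its parameters are assumptions, and the returned strings are the model names used on this page, not container or profile identifiers. It also simplifies the choice by ignoring which specific languages each model supports, so treat the result as a starting point and confirm it against the support matrix.

```python
from typing import Optional

# Models that this page lists as offline only.
OFFLINE_ONLY = {"Parakeet TDT v2", "Whisper Large v3", "Canary 1b"}


def choose_model_family(
    streaming: bool = False,
    word_timestamps: bool = False,
    multilingual: bool = False,
    translation: Optional[str] = None,  # None, "to_english", or "bidirectional"
) -> str:
    """Map requirements to a model name from this page (illustrative only)."""
    if translation not in (None, "to_english", "bidirectional"):
        raise ValueError(f"unknown translation mode: {translation!r}")

    if translation == "bidirectional":
        model = "Canary 1b"
    elif translation == "to_english":
        model = "Whisper Large v3"
    elif word_timestamps:
        model = "Parakeet TDT v2"
    elif multilingual:
        model = "Parakeet RNNT Multilingual"
    else:
        model = "Parakeet CTC"

    # Reject combinations that the table above does not cover.
    if streaming and model in OFFLINE_ONLY:
        raise ValueError(f"{model} is listed as offline only")
    return model


print(choose_model_family(streaming=True))
print(choose_model_family(streaming=True, multilingual=True))
print(choose_model_family(translation="to_english"))
```

Raising an error for unsupported combinations, such as streaming translation, keeps the helper from silently returning a model that the table lists as offline only.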
Next Steps
Deploy and Run ASR Models: Step-by-step deployment and inference commands for each model.
Customize ASR Models: Word boosting, custom vocabularies, and deploying fine-tuned NeMo checkpoints.
ASR Tutorial: Deploy your first ASR NIM from scratch.