# Support Matrix
This documentation describes the software and hardware that Riva ASR NIM supports.
## Hardware
NVIDIA Riva ASR NIM is supported on NVIDIA Volta or later GPUs (Compute Capability >= 7.0). When selecting models to deploy, avoid exceeding the available GPU memory; 16 GB or more of VRAM is recommended.
### GPUs Supported

| GPU | Precision |
|---|---|
| V100 | FP16 |
| A30, A100 | FP16 |
| H100 | FP16 |
| A2, A10, A16, A40 | FP16 |
| L4, L40, GeForce RTX 40xx | FP16 |
| GeForce RTX 50xx | FP16 |
## Software
- Linux operating systems (Ubuntu 22.04 or later recommended)
- NVIDIA Driver >= 535
- NVIDIA Docker >= 23.0.1
## Supported Models
Riva ASR NIM supports the following models. NIM automatically downloads the prebuilt model when one is available for the target GPU (GPUs with Compute Capability >= 8.0); on other supported GPUs (Compute Capability >= 7.0), it generates an optimized model on the fly from the RMIR model.
| Model | Publisher | WSL2 Support |
|---|---|---|
|  | NVIDIA | ✅ |
|  | NVIDIA | ✅ |
|  | NVIDIA | ❌ |
|  | NVIDIA | ❌ |
|  | NVIDIA | ❌ |
|  | NVIDIA | ❌ |
|  | OpenAI | ❌ |
The environment variable `NIM_TAGS_SELECTOR` specifies the desired model and inference mode as comma-separated key-value pairs. Some ASR models support multiple inference modes tuned for different use cases: streaming low latency (`str`), streaming high throughput (`str-thr`), and offline (`ofl`). Setting the mode to `all` deploys all applicable inference modes.
Note
All models use FP16 precision.
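As an illustration, the two variables can be combined into a launch command along the following lines. This is a hedged sketch, not the authoritative command: the image registry path, tag, and the `NGC_API_KEY` credential variable are assumptions based on common NIM conventions. Refer to Launching the NIM for the exact invocation.

```shell
# Illustrative sketch only; image path, tag, and NGC_API_KEY are assumptions.
# See "Launching the NIM" for the authoritative command.
export CONTAINER_ID=parakeet-0-6b-ctc-en-us

docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TAGS_SELECTOR="mode=ofl" \
  nvcr.io/nim/nvidia/${CONTAINER_ID}:latest
```

Here `mode=ofl` selects the offline inference mode; substitute `str`, `str-thr`, or `all` as described above.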
### Parakeet 0.6b CTC English
To use this model, set `CONTAINER_ID` to `parakeet-0-6b-ctc-en-us`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 3 | 5.8 |
|  | streaming | 1024 | 3 | 4 |
|  | streaming-throughput | 1024 | 3 | 5 |
|  | all | 1024 | 5.3 | 11.5 |
|  | offline | 1 | 3 | 3 |
|  | streaming | 1 | 3 | 3 |
Note
Profiles with a Batch Size of 1 are optimized for the lowest memory usage and support only a single session at a time. These profiles are recommended for WSL2 deployment or scenarios with a single inference request client.
### Parakeet 1.1b CTC English
To use this model, set `CONTAINER_ID` to `parakeet-1-1b-ctc-en-us`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 5 | 6.7 |
|  | streaming | 1024 | 5 | 5 |
|  | streaming-throughput | 1024 | 5 | 5.9 |
|  | all | 1024 | 7.6 | 14 |
|  | offline | 1024 | 5.5 | 7.3 |
|  | streaming | 1024 | 5.5 | 5.6 |
|  | streaming-throughput | 1024 | 5.5 | 6.5 |
|  | all | 1024 | 8.2 | 16 |
#### Speech Recognition with VAD-Based End of Utterance
The profiles with `vad=silero` use Silero VAD to detect the start and end of an utterance. VAD-based end-of-utterance detection is more accurate than the acoustic-model-based detection used in the other profiles; it is also more robust to noise and generates fewer spurious transcripts.
### Parakeet 1.1b RNNT Multilingual
The Parakeet 1.1b RNNT Multilingual model supports streaming speech-to-text transcription in multiple languages. The model identifies the spoken language and provides the transcript in that language.
Supported languages: en-US, en-GB, es-ES, ar-AR, es-US, pt-BR, fr-FR, de-DE, it-IT, ja-JP, ko-KR, ru-RU, hi-IN, he-IL, nb-NO, nl-NL, cs-CZ, da-DK, fr-CA, pl-PL, sv-SE, th-TH, tr-TR, pt-PT, and nn-NO.
Recommended languages: en-US, en-GB, es-ES, ar-AR, es-US, pt-BR, fr-FR, de-DE, it-IT, ja-JP, ko-KR, ru-RU, and hi-IN.
To use this model, set `CONTAINER_ID` to `parakeet-1-1b-rnnt-multilingual`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 5.6 | 9 |
|  | streaming | 1024 | 5.3 | 7.2 |
|  | streaming-throughput | 1024 | 5.5 | 7.4 |
|  | all | 1024 | 9 | 21 |
### Conformer CTC Spanish
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 2 | 5.8 |
|  | streaming | 1024 | 2 | 3.6 |
|  | streaming-throughput | 1024 | 2 | 4.2 |
|  | all | 1024 | 3.1 | 9.8 |
### Canary 1b Multilingual
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 6.5 | 13.4 |
### Canary 0.6b Turbo Multilingual
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 5.3 | 12.2 |
### Whisper Large v3 Multilingual
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the following table as needed. For further instructions, refer to Launching the NIM.
| Profile | Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|---|
|  | offline | 1024 | 4.3 | 12.5 |