Support Matrix#
This documentation describes the software and hardware that Riva ASR NIM supports.
Hardware#
NVIDIA Riva ASR NIM is supported on NVIDIA Volta or later GPUs (compute capability >= 7.0). When selecting models to deploy, take care not to exceed the available GPU memory; at least 16 GB of VRAM is recommended.
GPUs Supported#
| GPU | Precision |
|---|---|
| V100 | FP16 |
| A30, A100 | FP16 |
| H100 | FP16 |
| A2, A10, A16, A40 | FP16 |
| L4, L40, GeForce RTX 40xx | FP16 |
| GeForce RTX 50xx | FP16 |
Software#
Linux operating systems (Ubuntu 22.04 or later recommended)
NVIDIA Driver >= 535
NVIDIA Docker >= 23.0.1
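A quick pre-flight check can confirm the driver requirement before pulling the container. This is a minimal sketch: the `driver_ok` helper is hypothetical, and on a real host the installed version would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` rather than the hard-coded sample string used here.

```shell
# Minimum driver major version documented for Riva ASR NIM.
min_driver=535

driver_ok() {
  # $1: a driver version string such as "550.54.15".
  # Compares only the major component against the documented minimum.
  [ "${1%%.*}" -ge "$min_driver" ]
}

# Example with a sample version string (on a real host, query nvidia-smi instead):
if driver_ok "550.54.15"; then
  echo "driver OK"          # prints "driver OK"
else
  echo "driver too old" >&2
fi
```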
Supported Models#
Riva ASR NIM supports the following models.
NIM automatically downloads a prebuilt, optimized model when one is available for the target GPU (GPUs with compute capability >= 8.0); on other supported GPUs (compute capability >= 7.0), it generates an optimized model on the fly from the RMIR model.
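The compute-capability cutoffs above can be expressed as a small helper. This is an illustrative sketch, not part of NIM itself: `deploy_path` is a hypothetical name, and on a real host the capability would come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` rather than a literal argument.

```shell
# Decide which deployment path applies for a given compute capability:
#   >= 8.0 -> prebuilt engine is downloaded
#   >= 7.0 -> optimized model is generated on the fly from RMIR
#   <  7.0 -> GPU is not supported
deploy_path() {
  # $1: compute capability string, e.g. "8.6"
  major=${1%%.*}
  if [ "$major" -ge 8 ]; then
    echo "prebuilt"
  elif [ "$major" -ge 7 ]; then
    echo "rmir"
  else
    echo "unsupported"
  fi
}

deploy_path "8.6"   # prints "prebuilt"
deploy_path "7.5"   # prints "rmir"
```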
| Model | Publisher | WSL2 Support |
|---|---|---|
| Parakeet 0.6b CTC English | NVIDIA | ✅ |
| Parakeet 1.1b CTC English | NVIDIA | ❌ |
| Conformer CTC Spanish | NVIDIA | ❌ |
| Canary 1b Multilingual | NVIDIA | ❌ |
| Canary 0.6b Turbo Multilingual | NVIDIA | ❌ |
| Whisper Large v3 Multilingual | OpenAI | ❌ |
The environment variable `NIM_TAGS_SELECTOR` specifies the desired model and inference mode as comma-separated key-value pairs. Some ASR models support multiple inference modes tuned for different use cases: streaming low latency (`str`), streaming high throughput (`str-thr`), and offline (`ofl`). Setting the mode to `all` deploys every applicable inference mode.
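As a sketch of the key-value format, assuming `mode` is the tag key (the text above names only the mode values, not the key itself), one might validate the mode before exporting the variable. The `valid_mode` helper is hypothetical:

```shell
# Accept only the inference-mode values documented for NIM_TAGS_SELECTOR.
valid_mode() {
  case "$1" in
    str|str-thr|ofl|all) return 0 ;;
    *) return 1 ;;
  esac
}

mode=ofl
if valid_mode "$mode"; then
  # Assumption: "mode" is the key name in the comma-separated key-value pairs.
  export NIM_TAGS_SELECTOR="mode=${mode}"
  echo "$NIM_TAGS_SELECTOR"   # prints "mode=ofl"
fi
```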
Note
All models use FP16 precision.
Parakeet 0.6b CTC English#
To use this model, set `CONTAINER_ID` to `parakeet-0-6b-ctc-en-us`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 3 | 5.8 |
| streaming | 1024 | 3 | 4 |
| streaming-throughput | 1024 | 3 | 5 |
| all | 1024 | 5.3 | 11.5 |
| offline | 1 | 3 | 3 |
| streaming | 1 | 3 | 3 |
Note
Profiles with a Batch Size of 1 are optimized for the lowest memory usage and support only a single session at a time. These profiles are recommended for WSL2 deployment or scenarios with a single inference request client.
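Putting the pieces together, a launch for this model might look like the sketch below. The registry path `nvcr.io/nim/nvidia/...`, the `:latest` tag, the gRPC port, and the `mode=str` selector are assumptions for illustration; take the authoritative command from Launching the NIM.

```shell
# Hypothetical launch sketch for the Parakeet 0.6b CTC English NIM.
CONTAINER_ID=parakeet-0-6b-ctc-en-us
export NIM_TAGS_SELECTOR="mode=str"   # low-latency streaming profile

# Build the command as a string here so it can be inspected before running.
launch_cmd="docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TAGS_SELECTOR \
  -p 50051:50051 \
  nvcr.io/nim/nvidia/${CONTAINER_ID}:latest"

echo "$launch_cmd"
```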
Parakeet 1.1b CTC English#
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 5 | 6.7 |
| streaming | 1024 | 5 | 5 |
| streaming-throughput | 1024 | 5 | 5.9 |
| all | 1024 | 7.6 | 14 |
Conformer CTC Spanish#
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 2 | 5.8 |
| streaming | 1024 | 2 | 3.6 |
| streaming-throughput | 1024 | 2 | 4.2 |
| all | 1024 | 3.1 | 9.8 |
Canary 1b Multilingual#
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 6.5 | 13.4 |
Canary 0.6b Turbo Multilingual#
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 5.3 | 12.2 |
Whisper Large v3 Multilingual#
To use this model, set `CONTAINER_ID` to `riva-asr`. Choose a value for `NIM_TAGS_SELECTOR` from the table below as needed. For further instructions, refer to Launching the NIM.

| Inference Mode | Batch Size | CPU Memory (GB) | GPU Memory (GB) |
|---|---|---|---|
| offline | 1024 | 4.3 | 12.5 |