Support Matrix#
Models#
NeMo Retriever Text Embedding NIM supports the following models:
| Model Name | Model ID | Max Tokens | Publisher | Parameters (millions) | Embedding Dimension | Dynamic Embeddings |
|---|---|---|---|---|---|---|
| Llama-3.2-NV-EmbedQA-1B-v2 | nvidia/llama-3.2-nv-embedqa-1b-v2 | 8192 | NVIDIA | 1236 | 2048 | yes |
| NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 512 | NVIDIA | 335 | 1024 | no |
| NV-EmbedQA-Mistral7B-v2 | nvidia/nv-embedqa-mistral-7b-v2 | 512 | NVIDIA | 7110 | 4096 | no |
| Snowflake’s Arctic-embed-l | snowflake/arctic-embed-l | 512 | Snowflake | 335 | 1024 | no |
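The Model ID is the value you pass in the `model` field of an embedding request. The following is a minimal sketch of such a request against a locally running Text Embedding NIM; it assumes the default OpenAI-compatible `/v1/embeddings` endpoint on port 8000 and the `input_type` request field for distinguishing queries from passages, so adjust the URL and model ID for your deployment.

```python
import requests

# Assumption: a Text Embedding NIM is running locally and exposes the
# OpenAI-compatible /v1/embeddings endpoint on port 8000 (the default).
url = "http://localhost:8000/v1/embeddings"

payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",  # any Model ID from the table above
    "input": ["What GPUs can run the 1B embedding model?"],
    "input_type": "query",  # use "passage" when embedding documents for indexing
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # 2048 for llama-3.2-nv-embedqa-1b-v2, per the table above
```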
Supported Hardware#
Llama-3.2-NV-EmbedQA-1B-v2#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 & FP8 |
| H100 HBM3 | 80 | FP16 & FP8 |
| H100 NVL | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 & FP8 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 3.6 | FP16 | 20.2 |
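The only hardware requirement for the non-optimized configuration is enough memory on a single GPU, or enough aggregate memory across multiple homogeneous GPUs. The sketch below, which assumes the `pynvml` bindings (the nvidia-ml-py package) are installed, checks a host against a required amount of GPU memory; the 3.6 GB figure from the table above is used only as an example, and the check itself is illustrative rather than part of the NIM software.

```python
import pynvml

REQUIRED_GB = 3.6  # example: GPU memory requirement from the table above

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    totals = []
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        totals.append(mem.total / 1024**3)  # bytes -> GiB

    single_ok = any(t >= REQUIRED_GB for t in totals)
    aggregate_ok = sum(totals) >= REQUIRED_GB
    print(f"{count} GPU(s), per-GPU memory (GiB): {[round(t, 1) for t in totals]}")
    print("Meets requirement on a single GPU:", single_ok)
    print("Meets requirement in aggregate:   ", aggregate_ok)
finally:
    pynvml.nvmlShutdown()
```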
NV-EmbedQA-E5-v5#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| H100 NVL | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |
NV-EmbedQA-Mistral7B-v2#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 16 | FP16 | 30 |
Snowflake’s Arctic-embed-l#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |
Software#
NVIDIA Driver#
Release 1.2.0 uses Triton Inference Server 24.08. Refer to the Triton Inference Server Release Notes for the NVIDIA driver versions it supports.
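Before cross-checking the Triton release notes, you can print the driver currently installed on the host. The sketch below assumes the `pynvml` bindings are available and simply reports the driver and CUDA driver versions; it does not encode a specific minimum version, since that comes from the Triton release notes.

```python
import pynvml

# Report the installed NVIDIA driver version (and the CUDA version it supports)
# so it can be compared against the Triton Inference Server 24.08 release notes.
pynvml.nvmlInit()
try:
    driver = pynvml.nvmlSystemGetDriverVersion()
    if isinstance(driver, bytes):          # older pynvml releases return bytes
        driver = driver.decode()
    cuda = pynvml.nvmlSystemGetCudaDriverVersion_v2()  # e.g. 12040 -> CUDA 12.4
    print("NVIDIA driver version:", driver)
    print("CUDA driver version:  ", f"{cuda // 1000}.{(cuda % 1000) // 10}")
finally:
    pynvml.nvmlShutdown()
```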
NVIDIA Container Toolkit#
Your Docker environment must support NVIDIA GPUs. Refer to the NVIDIA Container Toolkit documentation for more information.
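One way to confirm that containers can see your GPUs is to run a throwaway container with GPU access and call `nvidia-smi` inside it, as in the sketch below. The flags and the `ubuntu` base image are assumptions based on the Container Toolkit's standard verification workflow, not part of the NIM itself; substitute an image your environment already has.

```python
import subprocess

# Sanity check: run nvidia-smi inside a disposable container with GPU access.
# Assumes Docker and the NVIDIA Container Toolkit are installed and configured.
cmd = ["docker", "run", "--rm", "--gpus", "all", "ubuntu", "nvidia-smi"]
result = subprocess.run(cmd, capture_output=True, text=True)

if result.returncode == 0:
    print("GPU access from containers looks good:")
    print(result.stdout)
else:
    print("GPU access failed; check the NVIDIA Container Toolkit installation.")
    print(result.stderr)
```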