Support Matrix
NeMo Retriever Text Embedding NIM supports the following models:
| Model Name | Model ID | Max Tokens | Publisher | Parameters (millions) | Embedding Dimension |
|---|---|---|---|---|---|
| NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 512 | NVIDIA | 335 | 1024 |
| NV-EmbedQA-Mistral7B-v2 | nvidia/nv-embedqa-mistral-7b-v2 | 512 | NVIDIA | 7110 | 4096 |
| Snowflake’s Arctic-embed-l | snowflake/arctic-embed-l | 512 | Snowflake | 335 | 1024 |
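The Model ID is the value you pass as the `model` field when requesting embeddings. The sketch below is only an illustration: it assumes a NIM instance is already running locally and exposing an OpenAI-compatible `/v1/embeddings` endpoint on port 8000, and that the `input_type` field is accepted; verify the host, port, route, and parameters against your deployment guide.

```python
# Minimal sketch: request embeddings from a locally deployed Text Embedding NIM.
# Assumptions to verify against your deployment docs: host/port, the
# OpenAI-compatible /v1/embeddings route, and the "input_type" field.
import requests

response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "nvidia/nv-embedqa-e5-v5",  # any Model ID from the table above
        "input": ["How many GPUs do I need for NV-EmbedQA-Mistral7B-v2?"],
        "input_type": "query",  # assumed: "query" for questions, "passage" for documents
    },
    timeout=30,
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # 1024 for NV-EmbedQA-E5-v5, per the table above
```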
NV-EmbedQA-E5-v5
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration
Disk Space covers both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate GPU memory | 2 | FP16 | 17 |
NV-EmbedQA-Mistral7B-v2
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP8 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP8 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration
Disk Space covers both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate GPU memory | 16 | FP16 | 30 |
Snowflake’s Arctic-embed-l
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration
Disk Space covers both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate GPU memory | 2 | FP16 | 17 |
NVIDIA Driver
Release 1.0.0 uses Triton Inference Server 24.05. Refer to the Triton Inference Server Release Notes for the supported NVIDIA driver versions.
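As a quick sanity check (a sketch, not part of the official requirements), you can print the installed driver version with `nvidia-smi` and compare it against the driver versions listed in the Triton 24.05 release notes:

```python
# Sketch: print the installed NVIDIA driver version (one line per GPU) so it can
# be compared against the driver support listed in the Triton 24.05 release notes.
import subprocess

driver_version = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
).strip()
print(f"Installed NVIDIA driver: {driver_version}")
```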
NVIDIA Container Toolkit
Your Docker environment must support NVIDIA GPUs. Refer to the NVIDIA Container Toolkit documentation for more information.
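One way to confirm that your Docker environment can expose GPUs to containers is to run `nvidia-smi` inside a GPU-enabled container. The sketch below uses the Docker SDK for Python (`pip install docker`); the CUDA base image tag is only an example, not a requirement from this page.

```python
# Sketch: verify that Docker + the NVIDIA Container Toolkit can expose GPUs to a container.
# The CUDA image tag below is only an example; any CUDA base image works.
import docker

client = docker.from_env()
output = client.containers.run(
    "nvidia/cuda:12.3.2-base-ubuntu22.04",
    "nvidia-smi",
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)
print(output.decode())
```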