Support Matrix for NeMo Retriever Text Embedding NIM

This documentation describes the software and hardware that NeMo Retriever Text Embedding NIM supports.

Models

NeMo Retriever Text Embedding NIM supports the following models:

Model Name	Model ID	Max Tokens	Publisher	Parameters (millions)	Embedding Dimension	Dynamic Embeddings Supported	Model Card
Llama-3.2-NV-EmbedQA-1B-v2	nvidia/llama-3.2-nv-embedqa-1b-v2	8192	NVIDIA	1236	2048	yes	Link
NV-EmbedQA-E5-v5	nvidia/nv-embedqa-e5-v5	512	NVIDIA	335	1024	no	Link
NV-EmbedQA-Mistral7B-v2	nvidia/nv-embedqa-mistral-7b-v2	512	NVIDIA	7110	4096	no	Link
Snowflake’s Arctic-embed-l	snowflake/arctic-embed-l	512	Snowflake	335	1024	no	Link
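As a rough illustration of how the Model ID and Dynamic Embeddings Supported columns might be used together, the following shell sketch builds a JSON request body and adds a reduced embedding size only for the model that the table marks as supporting dynamic embeddings. The helper names `supports_dynamic_embeddings` and `build_embedding_request` are hypothetical. The field names (`model`, `input`, `input_type`, `dimensions`) and the endpoint in the final comment are assumptions based on an OpenAI-style embeddings API, so check them against the API reference for your release.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and the JSON field names are assumed.

# Succeeds for models that the table above marks with "yes" under
# Dynamic Embeddings Supported.
supports_dynamic_embeddings() {
  case "$1" in
    nvidia/llama-3.2-nv-embedqa-1b-v2) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a JSON request body for one input string.
# $1 = model ID, $2 = text, $3 = optional reduced embedding dimension.
build_embedding_request() {
  model=$1
  text=$2
  dims=${3:-}
  # Escape backslashes and double quotes only; other control characters
  # (for example newlines) are not handled in this sketch.
  escaped=$(printf '%s' "$text" | sed 's/\\/\\\\/g; s/"/\\"/g')
  if [ -n "$dims" ] && supports_dynamic_embeddings "$model"; then
    printf '{"model":"%s","input":["%s"],"input_type":"query","dimensions":%s}\n' \
      "$model" "$escaped" "$dims"
  else
    printf '{"model":"%s","input":["%s"],"input_type":"query"}\n' \
      "$model" "$escaped"
  fi
}

build_embedding_request nvidia/llama-3.2-nv-embedqa-1b-v2 "What is a NIM?" 1024
build_embedding_request nvidia/nv-embedqa-e5-v5 "What is a NIM?" 1024

# Assumed usage against a locally running NIM (host, port, and path are assumptions):
# build_embedding_request nvidia/nv-embedqa-e5-v5 "hello" |
#   curl -s -X POST http://localhost:8000/v1/embeddings \
#     -H 'Content-Type: application/json' -d @-
```

The sketch silently drops the requested dimension for models that the table marks with "no". A stricter variant could print an error instead.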

Supported Hardware

Llama-3.2-NV-EmbedQA-1B-v2

GPU	GPU Memory (GB)	Precision
A100 PCIe	40 & 80	FP16
A100 SXM4	40 & 80	FP16
H100 PCIe	80	FP16 & FP8
H100 HBM3	80	FP16 & FP8
H100 NVL	80	FP16 & FP8
L40s	48	FP16 & FP8
A10G	24	FP16
L4	24	FP16 & FP8
GeForce RTX 4090 (Beta)	24	FP16
NVIDIA RTX 6000 Ada Generation (Beta)	48	FP16
GeForce RTX 5080 (Beta)	16	FP16
GeForce RTX 5090 (Beta)	32	FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs	GPU Memory	Precision	Disk Space
Any single NVIDIA GPU that has sufficient memory, or multiple homogeneous NVIDIA GPUs that have sufficient memory in total.	3.6	FP16	9
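One way to check the "sufficient memory" condition on a given host is to sum the memory that `nvidia-smi` reports and compare it with the GPU Memory value from the table. The following is a minimal sketch, not an official check. The helper name `memory_is_sufficient` is hypothetical, and treating 1 GB as 1024 MiB is an assumption chosen so that the threshold errs on the strict side.

```shell
#!/bin/sh
# Sketch only: memory_is_sufficient is a hypothetical helper.

# Read per-GPU memory values in MiB (one per line) from stdin, sum them, and
# compare the total with a requirement given in GB.
# Assumption: 1 GB is treated as 1024 MiB.
memory_is_sufficient() {
  required_gb=$1
  awk -v need="$required_gb" '
    { total += $1 }
    END {
      if (total >= need * 1024) print "sufficient"
      else print "insufficient"
    }
  '
}

# 3.6 is the GPU Memory value listed for the non-optimized configuration above.
# Summing across GPUs is only meaningful when the GPUs are homogeneous.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    memory_is_sufficient 3.6
fi
```

The query uses total memory. If other processes already occupy the GPU, querying `memory.free` instead gives a more realistic picture.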

NV-EmbedQA-E5-v5

GPU	GPU Memory (GB)	Precision
A100 PCIe	40 & 80	FP16
A100 SXM4	40 & 80	FP16
H100 PCIe	80	FP16
H100 HBM3	80	FP16
H100 NVL	80	FP16
L40s	48	FP16
A10G	24	FP16
L4	24	FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs	GPU Memory	Precision	Disk Space
Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory	2	FP16	17

NV-EmbedQA-Mistral7B-v2

GPU	GPU Memory (GB)	Precision
A100 PCIe	80	FP16
A100 SXM4	80	FP16
H100 HBM3	80	FP8
H100 HBM3	80	FP16
L40s	48	FP8
L40s	48	FP16
A10G	24	FP16
L4	24	FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs	GPU Memory	Precision	Disk Space
Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory	16	FP16	30
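The Disk Space column can be checked in a similar way before pulling the container and model. The sketch below compares the free space that `df` reports with a requirement in GB. The helper names `disk_is_sufficient` and `available_kb` are hypothetical, and 1 GB is treated as 1024 × 1024 KiB as an assumption. Which path to check is also an assumption: it depends on where your Docker image store and model cache live, so the example uses the current directory only as a placeholder.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print "sufficient" when avail_kb (1 KiB blocks) covers required_gb.
# Assumption: 1 GB is treated as 1024 * 1024 KiB.
disk_is_sufficient() {
  avail_kb=$1
  required_gb=$2
  awk -v have="$avail_kb" -v need="$required_gb" 'BEGIN {
    if (have >= need * 1024 * 1024) print "sufficient"
    else print "insufficient"
  }'
}

# Print the available space, in 1 KiB blocks, on the file system holding a path.
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# 30 is the Disk Space value listed for the non-optimized configuration above.
# "." is a placeholder path; substitute the location that will hold the data.
disk_is_sufficient "$(available_kb .)" 30
```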

Snowflake’s Arctic-embed-l

GPU	GPU Memory (GB)	Precision
A100 PCIe	80	FP16
A100 SXM4	80	FP16
H100 HBM3	80	FP16
L40s	48	FP16
A10G	24	FP16
L4	24	FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs	GPU Memory	Precision	Disk Space
Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory	2	FP16	17

Software

NVIDIA Driver

Releases prior to 1.4.0-rtx use Triton Inference Server 24.08. For NVIDIA driver support, refer to the Triton Release Notes for that version.

Release 1.4.0-rtx uses Triton Inference Server 25.01. For NVIDIA driver support, refer to the Triton Release Notes for that version.
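To compare the installed driver with the minimum that the Triton Release Notes list, something like the following sketch may help. The helper `version_ge` is hypothetical, and `MIN_DRIVER` is a placeholder environment variable that you set yourself from the Release Notes. No minimum version is assumed here.

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper, and MIN_DRIVER is a
# placeholder that you set yourself from the Triton Release Notes.

# Succeeds when dotted version $1 is greater than or equal to dotted version $2.
# Components are compared numerically, left to right; missing components
# count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = x[i] + 0
      yi = y[i] + 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Report the driver version of the first GPU that nvidia-smi lists.
  installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  echo "Installed NVIDIA driver: $installed"
  if [ -n "${MIN_DRIVER:-}" ]; then
    if version_ge "$installed" "$MIN_DRIVER"; then
      echo "Driver meets the minimum of $MIN_DRIVER"
    else
      echo "Driver is older than $MIN_DRIVER"
    fi
  fi
fi
```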

If issues arise when you start the NIM containers, first confirm that your NVIDIA driver is current, and then run the following commands to install the NVIDIA Container Toolkit and configure Docker to use it.

# Add the NVIDIA Container Toolkit signing key and apt repository.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
   sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
   sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart Docker.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

NVIDIA Container Toolkit

Your Docker environment must support NVIDIA GPUs. For more information, refer to the NVIDIA Container Toolkit documentation.
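A quick way to see whether Docker is set up for NVIDIA GPUs is to look for a runtime named `nvidia` in the output of `docker info`. The sketch below assumes that the runtime was registered under that name, which is what `nvidia-ctk runtime configure --runtime=docker` normally does. The helper `has_nvidia_runtime` is hypothetical.

```shell
#!/bin/sh
# Sketch only: has_nvidia_runtime is a hypothetical helper.

# Succeeds when the JSON text on stdin contains a key named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "Docker lists the nvidia runtime"
    # As a further check, run nvidia-smi inside a throwaway container
    # (this pulls the ubuntu image if it is not already present):
    # docker run --rm --gpus all ubuntu nvidia-smi
  else
    echo "Docker does not list the nvidia runtime"
  fi
fi
```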