Support Matrix#
Models#
NeMo Retriever Text Embedding NIM supports the following models:
| Model Name | Model ID | Max Tokens | Publisher | Parameters (millions) | Embedding Dimension | Dynamic Embeddings |
|---|---|---|---|---|---|---|
| Llama-3.2-NV-EmbedQA-1B-v2 | nvidia/llama-3.2-nv-embedqa-1b-v2 | 8192 | NVIDIA | 1236 | 2048 | yes |
| NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 512 | NVIDIA | 335 | 1024 | no |
| NV-EmbedQA-Mistral7B-v2 | nvidia/nv-embedqa-mistral-7b-v2 | 512 | NVIDIA | 7110 | 4096 | no |
| Snowflake’s Arctic-embed-l | snowflake/arctic-embed-l | 512 | Snowflake | 335 | 1024 | no |
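The Model ID is the value you pass in the `model` field of an embedding request. The following is a minimal sketch of such a request against a locally running Text Embedding NIM; it assumes the default OpenAI-compatible `/v1/embeddings` endpoint on port 8000 and the `input_type` request field for distinguishing queries from passages, so adjust the URL and model ID for your deployment.

```python
import requests

# Assumption: a Text Embedding NIM is running locally and exposes the
# OpenAI-compatible /v1/embeddings endpoint on port 8000 (the default).
url = "http://localhost:8000/v1/embeddings"

payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",  # any Model ID from the table above
    "input": ["What GPUs can run the 1B embedding model?"],
    "input_type": "query",  # use "passage" when embedding documents for indexing
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # 2048 for llama-3.2-nv-embedqa-1b-v2, per the table above
```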
Supported Hardware#
Llama-3.2-NV-EmbedQA-1B-v2#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 & FP8 |
| H100 HBM3 | 80 | FP16 & FP8 |
| H100 NVL | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 & FP8 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 3.6 | FP16 | 20.2 |
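The only hardware requirement for the non-optimized configuration is enough memory on a single GPU, or enough aggregate memory across multiple homogeneous GPUs. The sketch below, which assumes the `pynvml` bindings (the nvidia-ml-py package) are installed, checks a host against a required amount of GPU memory; the 3.6 GB figure from the table above is used only as an example, and the check itself is illustrative rather than part of the NIM software.

```python
import pynvml

REQUIRED_GB = 3.6  # example: GPU memory requirement from the table above

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    totals = []
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        totals.append(mem.total / 1024**3)  # bytes -> GiB

    single_ok = any(t >= REQUIRED_GB for t in totals)
    aggregate_ok = sum(totals) >= REQUIRED_GB
    print(f"{count} GPU(s), per-GPU memory (GiB): {[round(t, 1) for t in totals]}")
    print("Meets requirement on a single GPU:", single_ok)
    print("Meets requirement in aggregate:   ", aggregate_ok)
finally:
    pynvml.nvmlShutdown()
```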
NV-EmbedQA-E5-v5#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| H100 NVL | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |
NV-EmbedQA-Mistral7B-v2#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 16 | FP16 | 30 |
Snowflake’s Arctic-embed-l#
| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |
Non-optimized configuration#
The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.
| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |
Software#
NVIDIA Driver#
Release 1.2.0 uses Triton Inference Server 24.08. Refer to the Triton Inference Server Release Notes for the NVIDIA driver versions it supports.
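Before cross-checking the Triton release notes, you can print the driver currently installed on the host. The sketch below assumes the `pynvml` bindings are available and simply reports the driver and CUDA driver versions; it does not encode a specific minimum version, since that comes from the Triton release notes.

```python
import pynvml

# Report the installed NVIDIA driver version (and the CUDA version it supports)
# so it can be compared against the Triton Inference Server 24.08 release notes.
pynvml.nvmlInit()
try:
    driver = pynvml.nvmlSystemGetDriverVersion()
    if isinstance(driver, bytes):          # older pynvml releases return bytes
        driver = driver.decode()
    cuda = pynvml.nvmlSystemGetCudaDriverVersion_v2()  # e.g. 12040 -> CUDA 12.4
    print("NVIDIA driver version:", driver)
    print("CUDA driver version:  ", f"{cuda // 1000}.{(cuda % 1000) // 10}")
finally:
    pynvml.nvmlShutdown()
```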
NVIDIA Container Toolkit#
Your Docker environment must support NVIDIA GPUs. Refer to the NVIDIA Container Toolkit documentation for more information.
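One way to confirm that containers can see your GPUs is to run a throwaway container with GPU access and call `nvidia-smi` inside it, as in the sketch below. The flags and the `ubuntu` base image are assumptions based on the Container Toolkit's standard verification workflow, not part of the NIM itself; substitute an image your environment already has.

```python
import subprocess

# Sanity check: run nvidia-smi inside a disposable container with GPU access.
# Assumes Docker and the NVIDIA Container Toolkit are installed and configured.
cmd = ["docker", "run", "--rm", "--gpus", "all", "ubuntu", "nvidia-smi"]
result = subprocess.run(cmd, capture_output=True, text=True)

if result.returncode == 0:
    print("GPU access from containers looks good:")
    print(result.stdout)
else:
    print("GPU access failed; check the NVIDIA Container Toolkit installation.")
    print(result.stderr)
```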