Support Matrix for NeMo Retriever Text Embedding NIM#

This documentation describes the software and hardware that NeMo Retriever Text Embedding NIM supports.

Models#

NeMo Retriever Text Embedding NIM supports the following models:

| Model Name | Model ID | Max Tokens | Publisher | Parameters (millions) | Embedding Dimension | Dynamic Embeddings Supported | Model Card |
|---|---|---|---|---|---|---|---|
| Llama-3.2-NV-EmbedQA-1B-v2 | nvidia/llama-3.2-nv-embedqa-1b-v2 | 8192 | NVIDIA | 1236 | 2048 | yes | Link |
| NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 512 | NVIDIA | 335 | 1024 | no | Link |
| NV-EmbedQA-Mistral7B-v2 | nvidia/nv-embedqa-mistral-7b-v2 | 512 | NVIDIA | 7110 | 4096 | no | Link |
| Snowflake’s Arctic-embed-l | snowflake/arctic-embed-l | 512 | Snowflake | 335 | 1024 | no | Link |
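Each model in the table is addressed by its Model ID when you send requests to a running NIM. As an illustrative sketch only (the helper name and the `input_type` request field are assumptions here, not part of this support matrix), a request body for the NIM's OpenAI-compatible embeddings endpoint could be assembled like this, using the model IDs and token limits from the table above:

```python
# Illustrative sketch: build a request body for a Text Embedding NIM.
# Model IDs and max-token limits come from the table above; the helper
# name and the "input_type" field are assumptions for illustration.

# Max input tokens per model, from the support matrix above.
MAX_TOKENS = {
    "nvidia/llama-3.2-nv-embedqa-1b-v2": 8192,
    "nvidia/nv-embedqa-e5-v5": 512,
    "nvidia/nv-embedqa-mistral-7b-v2": 512,
    "snowflake/arctic-embed-l": 512,
}

def build_embedding_request(model_id, texts, input_type="query"):
    """Assemble a JSON body for an embeddings request (hypothetical helper)."""
    if model_id not in MAX_TOKENS:
        raise ValueError(f"unknown model: {model_id}")
    return {
        "model": model_id,
        "input": texts,
        # Retriever embedding models distinguish queries from passages.
        "input_type": input_type,
    }

payload = build_embedding_request(
    "nvidia/llama-3.2-nv-embedqa-1b-v2",
    ["What is the GPU memory of an L40s?"],
)
print(payload["model"])  # → nvidia/llama-3.2-nv-embedqa-1b-v2
```

With a NIM container running locally, such a payload would typically be POSTed to its embeddings endpoint; the host, port, and path depend on how the container was launched.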

Supported Hardware#

Llama-3.2-NV-EmbedQA-1B-v2#

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 & FP8 |
| H100 HBM3 | 80 | FP16 & FP8 |
| H100 NVL | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 & FP8 |
| GeForce RTX 4090 (Beta) | 24 | FP16 |
| NVIDIA RTX 6000 Ada Generation (Beta) | 48 | FP16 |
| GeForce RTX 5080 (Beta) | 16 | FP16 |
| GeForce RTX 5090 (Beta) | 32 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any single NVIDIA GPU that has sufficient memory, or multiple homogeneous NVIDIA GPUs that have sufficient memory in total | 3.6 | FP16 | 9 |
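The non-optimized rule above (one GPU with enough memory, or several identical GPUs whose memory is sufficient in total) can be sketched as a simple check. The function name is hypothetical; the 3.6 GB default is the value listed for Llama-3.2-NV-EmbedQA-1B-v2 in the table above:

```python
# Illustrative check of the non-optimized memory rule: a single GPU with
# enough memory, or multiple homogeneous (identical) GPUs whose memory is
# sufficient in aggregate. Function name is hypothetical.

def meets_memory_requirement(gpu_memories_gb, required_gb=3.6):
    """gpu_memories_gb: per-GPU memory (GB) of the GPUs assigned to the NIM."""
    if not gpu_memories_gb:
        return False
    # "Homogeneous" means every GPU is the same size/model.
    if len(gpu_memories_gb) > 1 and len(set(gpu_memories_gb)) != 1:
        return False
    return sum(gpu_memories_gb) >= required_gb

print(meets_memory_requirement([24]))    # single 24 GB GPU → True
print(meets_memory_requirement([2, 2]))  # two identical 2 GB GPUs → True
print(meets_memory_requirement([2, 4]))  # mixed GPUs, not homogeneous → False
```

The same check applies to the other models in this document with their respective GPU Memory values (for example, `required_gb=16` for NV-EmbedQA-Mistral7B-v2).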

NV-EmbedQA-E5-v5#

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| H100 NVL | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |

NV-EmbedQA-Mistral7B-v2#

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 16 | FP16 | 30 |

Snowflake’s Arctic-embed-l#

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space |
|---|---|---|---|
| Any single NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |

Software#

NVIDIA Driver#

Releases prior to 1.4.0-rtx use Triton Inference Server 24.08, and release 1.4.0-rtx uses Triton Inference Server 25.01. Refer to the Triton Inference Server Release Notes for the NVIDIA driver versions that each Triton release supports.

If issues arise when you start the NIM containers, run the following commands to install the latest NVIDIA Container Toolkit and configure Docker to use it.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
   sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
   sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

NVIDIA Container Toolkit#

Your Docker environment must support NVIDIA GPUs. Refer to the NVIDIA Container Toolkit documentation for more information.
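One quick way to confirm that Docker can schedule NVIDIA GPUs after running `nvidia-ctk runtime configure --runtime=docker` is to check that an `nvidia` runtime is registered, for example via `docker info --format '{{json .Runtimes}}'`. The sketch below parses that output; the helper name and the sample output string are illustrative assumptions:

```python
import json

def has_nvidia_runtime(docker_runtimes_json):
    """Return True if an 'nvidia' runtime appears in the JSON output of
    `docker info --format '{{json .Runtimes}}'` (hypothetical helper)."""
    runtimes = json.loads(docker_runtimes_json)
    return "nvidia" in runtimes

# Sample output (illustrative) after configuring the NVIDIA runtime:
sample = ('{"nvidia": {"path": "nvidia-container-runtime"}, '
          '"runc": {"path": "runc"}}')
print(has_nvidia_runtime(sample))  # → True
```

If the check fails, rerun the installation commands in the NVIDIA Driver section above and restart the Docker daemon.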