Text Embedding (Latest)

Support Matrix

NeMo Retriever Text Embedding NIM supports the following models:

| Model Name | Model ID | Max Tokens | Publisher | Parameters (millions) | Embedding Dimension |
|---|---|---|---|---|---|
| NV-EmbedQA-E5-v5 | nvidia/nv-embedqa-e5-v5 | 512 | NVIDIA | 335 | 1024 |
| NV-EmbedQA-Mistral7B-v2 | nvidia/nv-embedqa-mistral-7b-v2 | 512 | NVIDIA | 7110 | 4096 |
| Snowflake’s Arctic-embed-l | snowflake/arctic-embed-l | 512 | Snowflake | 335 | 1024 |
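The limits in the matrix above can be turned into a simple client-side check. The following is a minimal sketch, not part of the NIM itself: the names `ModelSpec`, `SUPPORTED_MODELS`, and `validate_embedding` are hypothetical, and the token count is assumed to come from whatever tokenizer the caller already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of the support matrix (hypothetical helper type)."""
    max_tokens: int
    embedding_dim: int
    parameters_millions: int


# Values copied from the support matrix; keyed by Model ID.
SUPPORTED_MODELS = {
    "nvidia/nv-embedqa-e5-v5": ModelSpec(512, 1024, 335),
    "nvidia/nv-embedqa-mistral-7b-v2": ModelSpec(512, 4096, 7110),
    "snowflake/arctic-embed-l": ModelSpec(512, 1024, 335),
}


def validate_embedding(model_id, embedding, token_count):
    """Return a list of problems; an empty list means the values match the matrix."""
    spec = SUPPORTED_MODELS.get(model_id)
    if spec is None:
        return [f"unsupported model: {model_id}"]
    problems = []
    if token_count > spec.max_tokens:
        problems.append(
            f"input has {token_count} tokens, limit is {spec.max_tokens}"
        )
    if len(embedding) != spec.embedding_dim:
        problems.append(
            f"embedding has {len(embedding)} values, expected {spec.embedding_dim}"
        )
    return problems


if __name__ == "__main__":
    # A 1024-value vector is the wrong size for the Mistral-based model.
    print(validate_embedding("nvidia/nv-embedqa-mistral-7b-v2", [0.0] * 1024, 100))
```

A check like this is mainly useful when sizing a vector store column or index, where a dimension mismatch would otherwise surface only at insert time.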

NV-EmbedQA-E5-v5

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 40 & 80 | FP16 |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 PCIe | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |

NV-EmbedQA-Mistral7B-v2

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP8 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP8 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 16 | FP16 | 30 |

Snowflake’s Arctic-embed-l

| GPU | GPU Memory (GB) | Precision |
|---|---|---|
| A100 PCIe | 80 | FP16 |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 |
| L40s | 48 | FP16 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|---|---|---|---|
| Any NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 2 | FP16 | 17 |
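As a rough sanity check on the non-optimized GPU memory figures for the three models, the weights-only footprint can be estimated from the parameter counts in the support matrix: FP16 stores each parameter in 2 bytes. The sketch below is an illustration under assumptions, not how the documented values were derived. It ignores activations, runtime overhead, and the GB versus GiB distinction, and the names `weight_memory_gb`, `NON_OPTIMIZED_MIN_GPU_MEMORY_GB`, and `fits_non_optimized` are hypothetical.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}

# Minimum GPU memory (GB) from the non-optimized configuration tables.
NON_OPTIMIZED_MIN_GPU_MEMORY_GB = {
    "nvidia/nv-embedqa-e5-v5": 2,
    "nvidia/nv-embedqa-mistral-7b-v2": 16,
    "snowflake/arctic-embed-l": 2,
}


def weight_memory_gb(parameters_millions, precision="FP16"):
    """Weights-only estimate in decimal GB; excludes activations and overhead."""
    return parameters_millions * 1e6 * BYTES_PER_PARAM[precision] / 1e9


def fits_non_optimized(model_id, gpu_memory_gb, gpu_count=1):
    """True if the aggregate memory of identical GPUs meets the documented minimum."""
    return gpu_memory_gb * gpu_count >= NON_OPTIMIZED_MIN_GPU_MEMORY_GB[model_id]


if __name__ == "__main__":
    print(round(weight_memory_gb(335), 2))    # 335M parameters at FP16
    print(round(weight_memory_gb(7110), 2))   # 7110M parameters at FP16
    print(fits_non_optimized("nvidia/nv-embedqa-mistral-7b-v2", 24))
```

Under these assumptions the 335M-parameter models need about 0.67 GB for weights and the 7110M-parameter model about 14.22 GB, which sits below the documented 2 GB and 16 GB minimums and leaves some headroom for everything the estimate leaves out.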

NVIDIA Driver

Release 1.0.0 uses Triton Inference Server 24.05. For NVIDIA driver support, refer to the Triton Inference Server release notes.

NVIDIA Container Toolkit

Your Docker environment must support NVIDIA GPUs. For more information, refer to the NVIDIA Container Toolkit documentation.
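One common way to confirm that Docker can reach the GPUs is to run `nvidia-smi` inside a throwaway container with `--gpus all`. The sketch below wraps that command in Python. It assumes the `docker` CLI is on the `PATH` and that the `ubuntu` image can be pulled; the function names `gpu_smoke_test_command` and `docker_sees_gpus` are hypothetical, not part of any NVIDIA tool.

```python
import shutil
import subprocess


def gpu_smoke_test_command(image="ubuntu"):
    """Build the docker command that runs nvidia-smi in a disposable container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def docker_sees_gpus(image="ubuntu", timeout_s=120):
    """Return True if the container started and nvidia-smi exited successfully."""
    if shutil.which("docker") is None:
        return False  # docker CLI is not installed or not on PATH
    try:
        result = subprocess.run(
            gpu_smoke_test_command(image),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker can access NVIDIA GPUs:", docker_sees_gpus())
```

A `False` result only says that the check did not succeed; the cause may be a missing toolkit, a missing driver, a permissions problem, or a failed image pull, so the command's own output is the place to look next.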

© Copyright 2024, NVIDIA Corporation. Last updated on Oct 29, 2024.