Text Embedding (Latest)

Support Matrix

NeMo Retriever Text Embedding NIM supports the following models:

Model Name                  Model ID                         Max Tokens  Publisher
NV-EmbedQA-E5-v5            nvidia/nv-embedqa-e5-v5          512         NVIDIA
NV-EmbedQA-Mistral7B-v2     nvidia/nv-embedqa-mistral-7b-v2  512         NVIDIA
Snowflake’s Arctic-embed-l  snowflake/arctic-embed-l         512         Snowflake
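
The Max Tokens column can be used as a pre-flight check before text is sent for embedding. The following is a minimal sketch, not part of the NIM itself: the `MAX_TOKENS` mapping copies the values from the table above, and `fits_token_limit` is a hypothetical helper name. It assumes the caller has already counted tokens with the tokenizer that matches the chosen model.

```python
# Maximum input tokens per model ID, copied from the support table.
MAX_TOKENS = {
    "nvidia/nv-embedqa-e5-v5": 512,
    "nvidia/nv-embedqa-mistral-7b-v2": 512,
    "snowflake/arctic-embed-l": 512,
}


def fits_token_limit(model_id: str, token_count: int) -> bool:
    """Return True if token_count is within the listed maximum for model_id.

    Hypothetical helper: the token count must come from the model's own
    tokenizer, which this sketch does not provide.
    """
    if model_id not in MAX_TOKENS:
        raise ValueError(f"Unsupported model ID: {model_id}")
    return 0 <= token_count <= MAX_TOKENS[model_id]
```

A caller might use the result to decide whether to split or truncate a passage before requesting an embedding.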

NV-EmbedQA-E5-v5

GPU        GPU Memory (GB)  Precision
A100 PCIe  80               FP16
A100 SXM4  80               FP16
H100 HBM3  80               FP16
L40s       48               FP16
A10G       24               FP16
L4         24               FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs: Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory
GPU Memory: 2
Precision: FP16
Disk Space: 17

NV-EmbedQA-Mistral7B-v2

GPU        GPU Memory (GB)  Precision
A100 PCIe  80               FP16
A100 SXM4  80               FP16
H100 HBM3  80               FP8
H100 HBM3  80               FP16
L40s       48               FP8
L40s       48               FP16
A10G       24               FP16
L4         24               FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs: Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory
GPU Memory: 16
Precision: FP16
Disk Space: 30

Snowflake’s Arctic-embed-l

GPU        GPU Memory (GB)  Precision
A100 PCIe  80               FP16
A100 SXM4  80               FP16
H100 HBM3  80               FP16
L40s       48               FP16
A10G       24               FP16
L4         24               FP16

Non-optimized configuration

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

GPUs: Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory
GPU Memory: 2
Precision: FP16
Disk Space: 17
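
The non-optimized requirements for the three models can be checked against a host before deployment. The following is a minimal sketch under stated assumptions: the figures in `REQUIREMENTS` are the GPU Memory and Disk Space values listed in this document, while `NonOptimizedRequirement` and `meets_non_optimized_requirements` are hypothetical names, and comparing GPU names is only a rough stand-in for "homogeneous".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonOptimizedRequirement:
    gpu_memory_gb: int   # minimum (aggregate) GPU memory, in GB
    disk_space_gb: int   # container plus model, in GB


# Values copied from the non-optimized configuration entries in this document.
REQUIREMENTS = {
    "nvidia/nv-embedqa-e5-v5": NonOptimizedRequirement(2, 17),
    "nvidia/nv-embedqa-mistral-7b-v2": NonOptimizedRequirement(16, 30),
    "snowflake/arctic-embed-l": NonOptimizedRequirement(2, 17),
}


def meets_non_optimized_requirements(model_id, gpus, free_disk_gb):
    """Check a host against the non-optimized requirements for model_id.

    gpus is a list of (gpu_name, memory_gb) tuples. Assumption: GPUs with
    the same name are treated as homogeneous.
    """
    req = REQUIREMENTS[model_id]
    if not gpus:
        return False
    names = {name for name, _ in gpus}
    if len(names) > 1:
        return False  # mixed GPU models are not covered by the stated requirement
    total_memory_gb = sum(memory for _, memory in gpus)
    return total_memory_gb >= req.gpu_memory_gb and free_disk_gb >= req.disk_space_gb
```

For example, two 8 GB GPUs of the same model reach the 16 GB aggregate listed for NV-EmbedQA-Mistral7B-v2, while a single 8 GB GPU does not.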

NVIDIA Driver

Release 1.0.0 uses Triton Inference Server 24.05. For NVIDIA driver requirements, refer to the Triton Inference Server Release Notes.
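
One way to compare an installed driver against the minimum given in the Triton release notes is sketched below. This is an illustration only: the helper names are hypothetical, the minimum version is a parameter that must be taken from the release notes (it is not stated in this document), and the sketch assumes `nvidia-smi` is on the PATH.

```python
import subprocess


def parse_driver_version(text: str) -> tuple:
    """Turn a dotted driver version string into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_versions() -> list:
    """Query nvidia-smi for the driver version reported by each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def driver_at_least(installed: str, minimum: str) -> bool:
    """Return True if the installed version is not older than the minimum."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)
```

A deployment script could call `installed_driver_versions()` and pass each result to `driver_at_least` together with the minimum read from the release notes.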

NVIDIA Container Toolkit

Your Docker environment must support NVIDIA GPUs. For more information, refer to the NVIDIA Container Toolkit documentation.
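
A quick way to confirm that Docker can see the GPUs is to run `nvidia-smi` inside a throwaway container. The sketch below wraps that check in Python; the function names are hypothetical, the `ubuntu` image is an assumption chosen for illustration, and the check only reports whether the command exited successfully.

```python
import shutil
import subprocess


def gpu_check_command(image: str = "ubuntu") -> list:
    """Build a docker command that runs nvidia-smi with all GPUs exposed."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def docker_supports_gpus(image: str = "ubuntu", timeout_s: int = 120) -> bool:
    """Return True if a container started with --gpus all can run nvidia-smi."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not installed or not on PATH
    try:
        result = subprocess.run(
            gpu_check_command(image),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

If the function returns False on a machine with NVIDIA GPUs, the NVIDIA Container Toolkit installation is a likely place to look first.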

© Copyright 2024, NVIDIA Corporation. Last updated on Jul 23, 2024.