Text Reranking (Latest)

Support Matrix

| Model Name               | Model ID                         | Max Tokens | Publisher |
|--------------------------|----------------------------------|------------|-----------|
| NV-RerankQA-Mistral4B-v3 | nvidia/nv-rerankqa-mistral-4b-v3 | 512        | NVIDIA    |

Note that when truncate is set to END, any query/passage pair that exceeds the maximum token length is truncated from the right, starting with the passage.
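The sketch below illustrates sending a reranking request with truncate set to END so over-length pairs are truncated rather than rejected. The host, port, endpoint path, and payload field names are assumptions based on a typical local deployment; refer to Using Reranking for the authoritative request format.

```python
# Minimal sketch: rerank passages with truncate="END" so that any
# query/passage pair longer than the 512-token limit is truncated from
# the right, passage first. Host, port, and payload shape are assumed
# from a typical local NIM deployment.
import requests

payload = {
    "model": "nvidia/nv-rerankqa-mistral-4b-v3",
    "query": {"text": "Which GPU precisions does the reranker support?"},
    "passages": [
        {"text": "H100 HBM3 GPUs can run the model in FP8 or FP16."},
        {"text": "A very long passage would be cut from the right before the query is touched."},
    ],
    "truncate": "END",  # truncate over-length pairs instead of returning an error
}

response = requests.post("http://localhost:8000/v1/ranking", json=payload)
response.raise_for_status()
print(response.json())  # passages with relevance scores
```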

NV-RerankQA-Mistral4B-v3

| GPU       | GPU Memory (GB) | Precision |
|-----------|-----------------|-----------|
| A100 PCIe | 80              | FP16      |
| A100 SXM4 | 80              | FP16      |
| H100 HBM3 | 80              | FP8       |
| H100 HBM3 | 80              | FP16      |
| L40S      | 48              | FP8       |
| L40S      | 48              | FP16      |
| A10G      | 24              | FP16      |
| L4        | 24              | FP16      |

Non-optimized configuration

The Disk Space value covers both the container and the model.

| GPUs | GPU Memory (GB) | Precision | Disk Space (GB) |
|------|-----------------|-----------|-----------------|
| Any NVIDIA GPU with sufficient GPU memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 9 | FP16 | 23 |

NVIDIA Driver

Release 1.0.0 uses Triton Inference Server 24.05. Refer to the Triton Inference Server Release Notes for the supported NVIDIA driver versions.

NVIDIA Container Toolkit

Your Docker environment must support NVIDIA GPUs. Refer to the NVIDIA Container Toolkit documentation for installation and configuration details.
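One quick way to confirm that Docker can reach the GPUs through the NVIDIA Container Toolkit is to run nvidia-smi inside a CUDA base container, as in the sketch below. The image tag is an assumption; any CUDA image that ships nvidia-smi will do.

```python
# Minimal sketch: verify GPU access from Docker by running nvidia-smi
# inside a CUDA base container. The image tag is illustrative.
import subprocess

result = subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all",
     "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
if result.returncode != 0:
    raise SystemExit("Docker could not access the NVIDIA GPUs; check the NVIDIA Container Toolkit installation.")
```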
