Support Matrix for NeMo Retriever Text Reranking NIM#

This documentation describes the software and hardware that NeMo Retriever Text Reranking NIM supports.

CPU#

Text Reranking NIM requires the following:

Models#

| Model Name | Model ID | Max Tokens | Publisher | Model Card |
| --- | --- | --- | --- | --- |
| Llama-3.2-NV-RerankQA-1B-v2 | nvidia/llama-3-2-nv-rerankqa-1b-v2 | 8192 (optimized models) | NVIDIA | Link |
| NV-RerankQA-Mistral4B-v3 | nvidia/nv-rerankqa-mistral-4b-v3 | 512 | NVIDIA | Link |

When truncate is set to END, any query/passage pair that exceeds the maximum token length is truncated from the right, starting with the passage.
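The truncation rule can be illustrated with a minimal sketch. It assumes one reading of the rule: tokens are dropped from the end of the passage first, and the query is trimmed only if the passage is already empty. The function name `truncate_pair` and the use of plain token lists are hypothetical; the NIM's actual tokenizer and its handling of special tokens are not shown here.

```python
def truncate_pair(query_tokens, passage_tokens, max_tokens):
    """Trim a query/passage pair from the right, passage first.

    Illustrative only: real tokenization and special tokens are ignored.
    """
    overflow = len(query_tokens) + len(passage_tokens) - max_tokens
    if overflow <= 0:
        # The pair already fits, so nothing is removed.
        return list(query_tokens), list(passage_tokens)

    # Drop tokens from the end of the passage first.
    keep_passage = max(len(passage_tokens) - overflow, 0)
    # Trim the query only when the passage alone could not absorb the overflow.
    keep_query = max_tokens - keep_passage
    return list(query_tokens[:keep_query]), list(passage_tokens[:keep_passage])


query = [101, 102, 103]
passage = [201, 202, 203, 204]
print(truncate_pair(query, passage, max_tokens=5))
```

With a limit of 5 tokens, the three query tokens are kept and the passage is cut to its first two tokens.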

Supported Hardware#

Llama-3.2-NV-RerankQA-1B-v2#

| GPU | GPU Memory (GB) | Precision |
| --- | --- | --- |
| A100 SXM4 | 40 & 80 | FP16 |
| H100 HBM3 | 80 | FP16 & FP8 |
| H100 NVL | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 & FP8 |
| B200 | 180 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space | Max Tokens |
| --- | --- | --- | --- | --- |
| Any NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 3.6 | FP16 | 9.5 | 4096 |

Warning

The non-optimized configuration has a smaller maximum token length (4096) than the other profiles (8192).

NV-RerankQA-Mistral4B-v3#

| GPU | GPU Memory (GB) | Precision |
| --- | --- | --- |
| A100 SXM4 | 80 | FP16 |
| H100 HBM3 | 80 | FP16 & FP8 |
| L40s | 48 | FP16 & FP8 |
| A10G | 24 | FP16 |
| L4 | 24 | FP16 |

Non-optimized configuration#

The GPU Memory and Disk Space values are in GB; Disk Space is for both the container and the model.

| GPUs | GPU Memory | Precision | Disk Space |
| --- | --- | --- | --- |
| Any NVIDIA GPU with sufficient memory, or multiple homogeneous NVIDIA GPUs with sufficient aggregate memory | 9 | FP16 | 23 |

Memory Footprint#

You can control the NIM’s memory footprint by setting the maximum allowed batch size and sequence length. For more information, refer to Memory Footprint.

The following tables list the valid configurations and the approximate memory footprint of each for the model.

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 3.02 |
| 8 | 8192 | 7.94 |
| 16 | 8192 | 13.56 |
| 30 | 1024 | 4.48 |
| 30 | 2048 | 7.12 |
| 30 | 4096 | 11.92 |
| 30 | 8192 | 23.41 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 3.02 |
| 8 | 8192 | 7.94 |
| 16 | 8192 | 13.56 |
| 30 | 1024 | 4.48 |
| 30 | 2048 | 7.12 |
| 30 | 4096 | 11.92 |
| 30 | 8192 | 23.41 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 3.7 |
| 8 | 8192 | 8.63 |
| 16 | 1024 | 4.16 |
| 16 | 2048 | 5.56 |
| 16 | 4096 | 8.13 |
| 16 | 8192 | 14.25 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 4.52 |
| 8 | 8192 | 9.44 |
| 16 | 8192 | 15.06 |
| 30 | 1024 | 5.98 |
| 30 | 2048 | 8.62 |
| 30 | 4096 | 13.42 |
| 30 | 8192 | 24.91 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 1.84 |
| 8 | 8192 | 4.94 |
| 16 | 8192 | 8.47 |
| 30 | 1024 | 2.6 |
| 30 | 2048 | 4.02 |
| 30 | 4096 | 7.09 |
| 30 | 8192 | 14.65 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 4.52 |
| 8 | 8192 | 9.44 |
| 16 | 8192 | 15.06 |
| 30 | 1024 | 5.98 |
| 30 | 2048 | 8.62 |
| 30 | 4096 | 13.42 |
| 30 | 8192 | 24.91 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 1.84 |
| 8 | 8192 | 4.94 |
| 16 | 8192 | 8.47 |
| 30 | 1024 | 2.6 |
| 30 | 2048 | 4.02 |
| 30 | 4096 | 7.09 |
| 30 | 8192 | 14.65 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 3.2 |
| 8 | 8192 | 8.13 |
| 16 | 1024 | 3.66 |
| 16 | 2048 | 5.06 |
| 16 | 4096 | 7.63 |
| 16 | 8192 | 13.75 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 2.08 |
| 8 | 8192 | 5.94 |
| 16 | 1024 | 2.04 |
| 16 | 2048 | 2.8 |
| 16 | 4096 | 4.44 |
| 16 | 8192 | 8.47 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 3.02 |
| 8 | 8192 | 7.94 |
| 16 | 8192 | 13.56 |
| 30 | 1024 | 4.48 |
| 30 | 2048 | 7.12 |
| 30 | 4096 | 11.92 |
| 30 | 8192 | 23.41 |

| Max Batch Size | Max Sequence Length | Approximate GPU Memory Size (GB) |
| --- | --- | --- |
| 1 | 8192 | 1.83 |
| 8 | 8192 | 4.94 |
| 16 | 8192 | 8.47 |
| 30 | 1024 | 2.6 |
| 30 | 2048 | 4.02 |
| 30 | 4096 | 7.09 |
| 30 | 8192 | 14.65 |
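One way to use a footprint table is to pick the largest configuration that fits a given GPU memory budget. The sketch below hard-codes the rows of the last table above; the names `Config`, `fitting_configs`, and `largest_batch_config` are hypothetical helpers, not part of the NIM. Because the footprints are approximate, it is probably wise to leave some headroom in the budget.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    max_batch_size: int
    max_seq_length: int
    approx_gpu_memory_gb: float


# Rows copied from the last memory footprint table above.
CONFIGS = [
    Config(1, 8192, 1.83),
    Config(8, 8192, 4.94),
    Config(16, 8192, 8.47),
    Config(30, 1024, 2.6),
    Config(30, 2048, 4.02),
    Config(30, 4096, 7.09),
    Config(30, 8192, 14.65),
]


def fitting_configs(budget_gb: float, min_seq_length: int = 0) -> List[Config]:
    """Return the configurations whose approximate footprint fits the budget."""
    return [
        c for c in CONFIGS
        if c.approx_gpu_memory_gb <= budget_gb and c.max_seq_length >= min_seq_length
    ]


def largest_batch_config(budget_gb: float, min_seq_length: int = 0) -> Optional[Config]:
    """Prefer the largest batch size, then the longest sequence length."""
    candidates = fitting_configs(budget_gb, min_seq_length)
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.max_batch_size, c.max_seq_length))


print(largest_batch_config(8.0))
print(largest_batch_config(8.0, min_seq_length=8192))
```

With an 8 GB budget, the first call selects batch size 30 at sequence length 4096 (7.09 GB). Requiring the full 8192-token sequence length lowers the choice to batch size 8 (4.94 GB).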

Software#

NVIDIA Driver#

Release 1.0.0 uses Triton Inference Server 24.05. For the NVIDIA driver versions that this Triton release supports, refer to the Triton Release Notes.

NVIDIA Container Toolkit#

Your Docker environment must support NVIDIA GPUs. For more information, refer to NVIDIA Container Toolkit.
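A quick way to confirm that Docker can see the GPUs is to run `nvidia-smi` inside a container. The sketch below wraps the sample workload command commonly used to verify a Container Toolkit installation. The helper names are hypothetical, and the `ubuntu` image is an assumption; any image works as long as the toolkit injects the driver utilities.

```python
import shutil
import subprocess


def gpu_check_command(image="ubuntu"):
    """Build the docker command that runs nvidia-smi inside a container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def docker_sees_gpus(image="ubuntu", timeout=120):
    """Return True if a container started with --gpus all can run nvidia-smi."""
    if shutil.which("docker") is None:
        # Docker is not installed or not on PATH.
        return False
    try:
        result = subprocess.run(
            gpu_check_command(image),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

# Example: call docker_sees_gpus() on the host before deploying the NIM.
```

A `False` result only indicates that this particular check failed. The cause may be a missing toolkit, a missing driver, or an image pull error.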