Optimization for NVIDIA NeMo Retriever Reranking NIM

Date: 2026-03-07

Use this documentation to learn about optimization for NVIDIA NeMo Retriever Reranking NIM.

NVIDIA NeMo Retriever Reranking NIM (NeMo Retriever Reranking NIM) automatically leverages model- and hardware-specific optimizations intended to improve the performance of the models.

The NIM uses the TensorRT backend for Triton Inference Server to run optimized inference for common models across a range of NVIDIA GPUs. If no optimized engine exists for the GPU SKU in use, the NIM falls back to a GPU-agnostic ONNX backend that uses the CUDA Execution Provider.
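The fallback rule above can be pictured with a small sketch. This is not the NIM's actual selection code: the function name `select_backend`, the backend labels, and the SKU strings are all hypothetical and serve only to illustrate the decision.

```python
from typing import Set


def select_backend(gpu_sku: str, skus_with_engine: Set[str]) -> str:
    """Pick a backend label for the detected GPU SKU.

    Illustrative assumption: `skus_with_engine` holds the SKUs for which
    an optimized TensorRT engine is available. The labels "tensorrt" and
    "onnx" are placeholders, not identifiers used by the NIM.
    """
    if gpu_sku in skus_with_engine:
        # An optimized engine exists for this SKU, so use TensorRT.
        return "tensorrt"
    # No optimized engine: fall back to the GPU-agnostic ONNX backend
    # (CUDA Execution Provider).
    return "onnx"


# Hypothetical SKU names, chosen only for the example.
engines = {"sku-a", "sku-b"}
print(select_backend("sku-a", engines))
print(select_backend("sku-z", engines))
```

In practice the NIM makes this choice automatically at startup, so no user code is needed; the sketch only shows the order of preference.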

The NIM includes multiple optimization profiles, each tailored to the floating-point precision types that a given SKU supports.
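As a rough illustration of how profiles might be matched to a SKU's supported precisions, consider the sketch below. The `Profile` class, the profile names, and the precision strings are hypothetical; the real profile format and matching logic are not described here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for an optimization profile."""
    name: str
    precision: str  # e.g. "fp16"; the values are illustrative only


def compatible_profiles(profiles: List[Profile],
                        supported_precisions: Set[str]) -> List[Profile]:
    """Keep only the profiles whose precision the SKU supports."""
    return [p for p in profiles if p.precision in supported_precisions]


# Made-up profiles and precisions, used only for the example.
available = [
    Profile("profile-fp16", "fp16"),
    Profile("profile-fp8", "fp8"),
]
for profile in compatible_profiles(available, {"fp16"}):
    print(profile.name)
```

The filter preserves the input order, so any preference among compatible profiles would have to be encoded separately.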