Troubleshoot NVIDIA NeMo Retriever Reranking NIM

Use this documentation to troubleshoot issues that arise when you use NVIDIA NeMo Retriever Reranking NIM.

list-model-profiles command is not available

Starting in version 2.0.0, NeMo Retriever Reranking NIM automatically selects an optimized inference pipeline at startup. No user action is required to select profiles, and the list-model-profiles command is not available. For details, refer to Automatic Pipeline Selection.

To request a specific precision, use NIM_PRECISION. For details, refer to Precision Override.

NIM fails to start

In some cases, a NIM fails to start when you run the docker run command. Some NIM models require that you accept the license terms on NGC before you can pull the container image and model assets. To resolve this issue, browse to the model page on the NGC Catalog, read the license terms, and then click Accept Terms. For details, refer to Get Started.

NIM fails to start with out-of-memory error

In some cases, a NIM fails at startup with an out-of-memory error.

The runtime pre-allocates memory according to the selected inference pipeline and the configured input limits. Each engine instance multiplies the VRAM requirement, and different NIMs require widely different amounts of VRAM.

To resolve this issue, use one of the following options:

  • Ensure your GPU has sufficient VRAM for the model. Refer to the support matrix for approximate memory requirements by compute capability.

  • When you run a NIM on a GPU with limited VRAM, lower the NIM_MAX_BATCH_SIZE and NIM_MAX_SEQ_LEN environment variables so that the runtime pre-allocates less memory.

  • On GPUs without enough VRAM for multiple engines, set NIM_ENGINE_COUNT=1.
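
The environment variables named in the options above are passed to the container with the -e flag of docker run. The following Python sketch only assembles and prints such a command so that you can review it before running it. The helper docker_run_command, the numeric limits, and the IMAGE_PLACEHOLDER string are illustrative assumptions, not values from this documentation. Substitute the image reference and any other flags from Get Started, and choose limits that suit your GPU and workload.

```python
import shlex


def docker_run_command(image, env, extra_args=()):
    """Return a `docker run` argument list that passes each env entry via -e.

    Hypothetical helper for illustration; it is not part of any NIM tooling.
    """
    args = ["docker", "run", "--rm", "--gpus", "all"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    # Any further flags (ports, API key, cache mounts) go before the image.
    args += list(extra_args)
    args.append(image)
    return args


# Illustrative limits only; these numbers are assumptions, not recommendations.
low_vram_env = {
    "NIM_MAX_BATCH_SIZE": 8,
    "NIM_MAX_SEQ_LEN": 512,
    "NIM_ENGINE_COUNT": 1,
}

# IMAGE_PLACEHOLDER stands in for the image reference from Get Started.
command = docker_run_command("IMAGE_PLACEHOLDER", low_vram_env)

# Print the command for review instead of executing it.
print(shlex.join(command))
```

To launch the container from the same script, pass the list to subprocess.run(command, check=True) after you replace the placeholder. Building the command as a list, rather than as one string, avoids shell quoting problems if a value contains spaces.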