Troubleshoot NVIDIA NeMo Retriever Embedding NIM#

Use this documentation to troubleshoot issues that arise when you use NVIDIA NeMo Retriever Embedding NIM.

list-model-profiles command fails#

Some older NIMs don’t support the list-model-profiles command, including the following:

  • nv-embedqa-mistral-7b-v2

  • arctic-embed-l

NIM fails to start#

In some cases, a NIM fails to start when you run the docker run command. Some NIM models require that you accept the license terms on NGC before you can pull the container image and model assets. To resolve this issue, browse to the model page on the NGC Catalog, read the license terms, and then click Accept Terms. For details, refer to Get Started.

NIM fails to start with out-of-memory error#

In some cases, a NIM fails to start and reports an out-of-memory error.

TensorRT pre-allocates memory for the maximum input size defined by the loaded TensorRT profiles. Each additional model instance multiplies the VRAM requirement, and different NIMs require widely different amounts of VRAM.

To resolve this issue, use one of the following options:

  • When you run a TensorRT profile on a GPU with limited VRAM, lower the NIM_TRITON_MAX_BATCH_SIZE and NIM_TRITON_MAX_SEQ_LENGTH environment variables to reduce the maximum input size that TensorRT pre-allocates memory for.

  • On GPUs without enough VRAM for multiple model instances, run only a single instance of the embedder.
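
As an illustration of the first option, the following minimal Python sketch assembles a docker run command that sets both environment variables. The variable names come from the text above. The image reference, the values 8 and 512, and the helper name build_docker_run_args are hypothetical placeholders. Other options that a real deployment needs, such as credentials and port mappings, are omitted.

```python
import shlex


def build_docker_run_args(image, max_batch_size, max_seq_length):
    """Return a `docker run` argument list that sets both size limits."""
    if max_batch_size < 1 or max_seq_length < 1:
        raise ValueError("batch size and sequence length must be positive")
    return [
        "docker", "run", "--rm", "--gpus", "all",
        # Variable names are from the documentation; values are placeholders.
        "-e", f"NIM_TRITON_MAX_BATCH_SIZE={max_batch_size}",
        "-e", f"NIM_TRITON_MAX_SEQ_LENGTH={max_seq_length}",
        image,
    ]


# Hypothetical image reference; substitute the NIM image that you actually run.
args = build_docker_run_args("example.invalid/embedding-nim:latest", 8, 512)
command = shlex.join(args)
print(command)
```

Suitable values depend on the model and the GPU, so treat the numbers here as a starting point to tune rather than as recommendations.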