Troubleshoot NVIDIA NeMo Retriever Embedding NIM#
Use this documentation to troubleshoot issues that arise when you use NVIDIA NeMo Retriever Embedding NIM.
list-model-profiles command fails#
Some older NIMs don’t support the list-model-profiles command,
including the following:
nv-embedqa-mistral-7b-v2
arctic-embed-l
NIM fails to start#
In some cases, a NIM fails to start when you run the docker run command.
Some NIM models require that you accept the license terms on NGC before you can pull the container image and model assets.
To resolve this issue, browse to the model page on the NGC Catalog, read the license terms, and then click Accept Terms.
For details, refer to Get Started.
Docker pull fails with error from registry: Incorrect Repository Format#
Some NeMo Retriever Embedding NIM container image tags fail to pull on Docker Engine 29.5.x when the Docker containerd image store is enabled.
The issue can occur before the NIM container starts.
The pull fails with the message error from registry: Incorrect Repository Format.
To check whether your Docker daemon is using the containerd image store, run the following command.
docker info --format '{{json .DriverStatus}}'
If the output includes io.containerd.snapshotter.v1, and docker version reports Docker Engine 29.5.x, use one of the following workarounds.
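The two checks above can be combined into one small script. The following is a minimal sketch, not an official tool: the function name needs_pull_workaround is hypothetical, and the sketch assumes that the engine version is read with docker version --format '{{.Server.Version}}'.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) only when both conditions hold.
#   $1: output of  docker info --format '{{json .DriverStatus}}'
#   $2: output of  docker version --format '{{.Server.Version}}'
needs_pull_workaround() {
    # Condition 1: the containerd image store (snapshotter) is in use.
    case "$1" in
        *io.containerd.snapshotter.v1*) ;;
        *) return 1 ;;
    esac
    # Condition 2: the engine is a 29.5.x release.
    case "$2" in
        29.5.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the local daemon only when the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
    driver_status=$(docker info --format '{{json .DriverStatus}}' 2>/dev/null || true)
    engine_version=$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)
    if needs_pull_workaround "$driver_status" "$engine_version"; then
        echo "Affected configuration: containerd image store on Docker Engine ${engine_version}."
    else
        echo "Not the affected configuration."
    fi
fi
```

The function takes the two command outputs as arguments, so you can also run it against output copied from another host.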
Option 1: Use Docker Engine 29.4.3#
Use an earlier Docker Engine version, such as 29.4.3. Change the Docker Engine version, restart Docker, and then retry the pull.
Option 2: Pull by exact manifest digest#
On linux/amd64 systems, you can keep Docker Engine 29.5.x and pull the current image by the exact linux/amd64 manifest digest.
The following commands use jq to extract the linux/amd64 manifest digest for nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:2.0.0.
Do not use the top-level image-index digest.
Run the following commands to pull by digest and tag the image locally.
AMD64_DIGEST=$(docker buildx imagetools inspect nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:2.0.0 --format '{{json .}}' \
| jq -r '.manifest.manifests[] | select(.platform.os == "linux" and .platform.architecture == "amd64") | .digest')
docker pull nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2@${AMD64_DIGEST}
docker tag nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2@${AMD64_DIGEST} nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:2.0.0
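If the jq filter matches nothing, AMD64_DIGEST is empty and the docker pull command fails with a less obvious error. The following sketch shows one way to guard against that before pulling. The function name is_sha256_digest is hypothetical, and the check assumes the digest has the form sha256: followed by 64 lowercase hexadecimal characters.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for a well-formed sha256 digest string.
is_sha256_digest() {
    # The value must start with the "sha256:" prefix.
    case "$1" in
        sha256:*) hex=${1#sha256:} ;;
        *) return 1 ;;
    esac
    # The remainder must be exactly 64 characters long.
    [ "${#hex}" -eq 64 ] || return 1
    # Reject anything that is not a lowercase hexadecimal character.
    case "$hex" in
        *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

# Check the variable set by the inspect command before you pull.
if is_sha256_digest "${AMD64_DIGEST:-}"; then
    echo "Digest looks valid: ${AMD64_DIGEST}"
else
    echo "AMD64_DIGEST is empty or malformed; rerun the inspect command before pulling." >&2
fi
```

Because a newline is not a hexadecimal character, the check also fails if the filter returns more than one digest.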
NIM fails to start with out-of-memory error#
In some cases, a NIM fails to start with an out-of-memory error.
NIMs pre-allocate memory based on the maximum input size that they might process. Each additional model instance multiplies the VRAM requirement, and VRAM requirements vary widely between NIMs.
To resolve this issue, use one of the following options:
When you run a NIM on a small VRAM card, adjust the NIM_MAX_BATCH_SIZE and NIM_MAX_SEQ_LENGTH environment variables.
On GPUs without enough VRAM for multiple model instances, run only a single instance of the embedder.
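As an illustration of the first option, the following sketch passes both environment variables to docker run. The values 8 and 512 are assumptions chosen only for the example, not recommendations. The run and start_nim helpers are hypothetical. The remaining flags (--gpus all, -e NGC_API_KEY, -p 8000:8000) are also assumptions; take the exact command for your NIM from Get Started.

```shell
#!/bin/sh
# Illustrative values (assumptions); tune them for your GPU and workload.
NIM_IMAGE="nvcr.io/nim/nvidia/llama-nemotron-embed-vl-1b-v2:2.0.0"
MAX_BATCH_SIZE=8
MAX_SEQ_LENGTH=512

# Hypothetical wrapper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: starts the NIM with reduced batch size and sequence length.
start_nim() {
    run docker run --rm --gpus all \
        -e NGC_API_KEY \
        -e NIM_MAX_BATCH_SIZE="$MAX_BATCH_SIZE" \
        -e NIM_MAX_SEQ_LENGTH="$MAX_SEQ_LENGTH" \
        -p 8000:8000 \
        "$NIM_IMAGE"
}

start_nim
```

If the NIM still runs out of memory, lower one value at a time so that you can tell which setting has the larger effect on your hardware.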