Troubleshoot NVIDIA NIM for LLMs#

Use this documentation to troubleshoot issues that arise when you work with NVIDIA NIM for Large Language Models (LLMs).

Missing Files in LLM-NIM Deployment#

Supported Model Formats explains the expected folder structures for the remote and local model deployment options in LLM-NIMs. This section describes common failure cases.

Note

Use list-model-profiles --model $NIM_MODEL_NAME to inspect model compatibility and debug model artifacts.
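
For example, to inspect which profiles are compatible with a locally mounted checkpoint, run the command inside the NIM container (the path below is illustrative):

# Illustrative path; point NIM_MODEL_NAME at your mounted model directory.
export NIM_MODEL_NAME=/models/llama3-8b-instruct
list-model-profiles --model $NIM_MODEL_NAME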

Missing Hugging Face config.json#

NIM loads the Hugging Face configuration file to derive metadata such as model architecture, context length, and batch size for vLLM, TensorRT-LLM, and SGLang backends. If local models or remote URIs (after download and caching) do not contain the Hugging Face config.json, NIM raises the following exception:

ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Missing HuggingFace configuration file named config.json.

Note

For unified Hugging Face checkpoints, hf_quant_config.json should be present in the folder along with config.json.

To fix this issue, download config.json for the deployed model from Hugging Face Hub and place it in the root of the model directory. For quantized GGUF formats, this file is usually available in the corresponding full-precision repository referenced in the model card.

For example, the QuantFactory Meta-Llama-3-8B-Instruct-GGUF repository contains only GGUF files for deployment. Download the tokenizer and config.json from the full-precision Meta-Llama-3-8B-Instruct repository.
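
A minimal sketch using huggingface-cli, assuming the model root is /models/llama3-8b-instruct (adjust the repository ID and local path for your deployment; gated repositories also require an access token):

# Download the Hugging Face configuration and tokenizer files from the
# full-precision repository into the root of the local model directory.
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  config.json tokenizer.json tokenizer_config.json \
  --local-dir /models/llama3-8b-instruct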

Missing TensorRT-LLM Checkpoint or Engine Configuration#

TensorRT-LLM checkpoint and engine conversion scripts write pretrained and engine configuration files alongside the safetensors or engine files. NIM expects these configuration files in the trtllm_ckpt and trtllm_engine subfolders, as described in Supported Model Formats. If they are missing, NIM raises the following exception:

Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
 ├── config.json
 └── trtllm_ckpt
...
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.

Ensure that the TensorRT-LLM config.json files are placed in the respective subfolders depending on the model format type.
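
For reference, a layout that satisfies this expectation for a TensorRT-LLM checkpoint might look like the following (file names are illustrative; refer to Supported Model Formats for the authoritative structure):

/models/llama3-8b-instruct
├── config.json              # Hugging Face configuration
└── trtllm_ckpt
    ├── config.json          # TensorRT-LLM pretrained configuration
    └── rank0.safetensors    # checkpoint weights, one file per rank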

Missing weights#

Hugging Face safetensors#

NIM requires Hugging Face safetensors weight files to be present in the model directory, either locally or after downloading and caching from a remote URI. These weight files must follow the naming convention:

*.safetensors

If these files are not found, NIM will raise an error during model initialization.

Note

Looking for HuggingFace safetensors weights. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
 ├── config.json
...
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.

Hugging Face GGUF#

NIM requires GGUF weight files to be present in the model directory, either locally or after downloading and caching from a remote URI. These weight files must follow the naming convention:

*.gguf

If these files are not found, NIM will raise an error during model initialization.

Note

Looking for HuggingFace safetensors weights. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf weights. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
 ├── config.json
...
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.

TensorRT-LLM Checkpoint#

NIM requires TensorRT-LLM checkpoint files to be present in the model directory, either locally or after downloading and caching from a remote URI. These checkpoint files must follow the naming convention:

trtllm_ckpt/rank.*.safetensors

If these files are not found, NIM will raise an error during model initialization.

Note

Looking for TRTLLM checkpoint. Missing TRTLLM rank.*safetensors in /models/llama3-8b-instruct.
Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
 ├── config.json
 └── trtllm_ckpt
Found following files in /models/llama3-8b-instruct/trtllm_ckpt
 └── config.json
...
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.

TensorRT-LLM Engine#

NIM requires TensorRT-LLM engine files to be present in the model directory, either locally or after downloading and caching from a remote URI. These engine files must follow the naming convention:

trtllm_engine/rank.*.engine

If these files are not found, NIM will raise an error during model initialization.

Note

Looking for TRTLLM engine. Missing TRTLLM rank.*engine files in /models/llama3-8b-instruct.
Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
 ├── config.json
 └── trtllm_engine
Found following files in /models/llama3-8b-instruct/trtllm_engine
 └── config.json
...
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
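
Before starting the container, you can confirm which weight files NIM will discover by listing the model directory against each expected naming convention (paths are illustrative):

# At least one of these patterns must match for NIM to detect a supported format.
ls /models/llama3-8b-instruct/*.safetensors                   # hf-safetensor
ls /models/llama3-8b-instruct/*.gguf                          # gguf
ls /models/llama3-8b-instruct/trtllm_ckpt/rank*.safetensors   # trtllm-ckpt
ls /models/llama3-8b-instruct/trtllm_engine/rank*.engine      # trtllm-engine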

Incorrect or Incomplete Files#

Hugging Face config.json: Missing Model Architecture#

If you deploy a model from Hugging Face or local storage and see the following error, ensure that a valid Hugging Face config.json file with the required model architecture exists:

ValueError: Model architecture is None. Please check for valid model architecture in HF configuration file or contact NIM support.

Note

NIM throws this exception because it cannot filter the supported backends if the model architecture is missing.
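
To verify quickly, inspect the architectures field of the configuration file (path is illustrative):

# Print the declared architecture list; None or a missing key indicates
# an invalid configuration file.
python3 -c "import json; print(json.load(open('/models/llama3-8b-instruct/config.json')).get('architectures'))"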

Unknown Weights Format#

If you see the error ValueError: Found unknown format of weights in /path/to/model. Expected one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']., verify that the model weights are in one of the supported formats listed in the error message. For more information, refer to Model Formats.

Guided Decoding Fails to Recognize Whitespace Pattern#

Some regular expressions fail to compile with the outlines guided decoding backend. You might see a message such as "guided_whitespace_pattern": "*invalid*pattern*". In this case, use the xgrammar backend instead, which supports a wider range of regular expressions than outlines. For more information, refer to Structured Generation.
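
How you select the backend depends on your NIM release. A minimal sketch, assuming the backend is chosen through an environment variable named NIM_GUIDED_DECODING_BACKEND (verify the exact variable name and supported values in the Structured Generation documentation for your version):

# Assumed variable name; confirm against your release's configuration reference.
export NIM_GUIDED_DECODING_BACKEND=xgrammar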

vLLM profile fails to deploy#

If the vLLM profile fails to deploy, it is likely due to insufficient GPU resources. You can try the following troubleshooting options:

  • Allocate more GPU resources.

  • Reduce the value of NIM_MAX_MODEL_LEN. Start by setting it to 70,000. If deployment continues to fail, try lowering the value further. For more information, refer to Configure Your NIM with NVIDIA NIM for LLMs.
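
For example, a sketch of a container launch that lowers the maximum model length (the image name and tag are placeholders; use your NIM container image):

# Reduce NIM_MAX_MODEL_LEN to shrink the KV cache and fit smaller GPUs.
docker run --rm --gpus all \
  -p 8000:8000 \
  -e NGC_API_KEY \
  -e NIM_MAX_MODEL_LEN=70000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest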

Trust remote code#

If you see an exception like the following that prompts you to set --trust-remote-code, set NIM_FORCE_TRUST_REMOTE_CODE so that NIM can read custom models and configurations.

RuntimeError: Failed to load the model config. If the model is a custom model not yet available in the HuggingFace transformers library, consider setting `trust_remote_code=True` in LLM or using the `--trust-remote-code` flag in the CLI.
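
For example (the value shown assumes the variable is a standard boolean-style toggle):

# Allow NIM to load custom model code and configurations.
export NIM_FORCE_TRUST_REMOTE_CODE=1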

Skipping NIM_SERVED_MODEL_NAME#

When you deploy a local model in LLM-NIMs, curl requests expect the model name to be the absolute path to the local model directory.

export NIM_MODEL_NAME=/models/llama/custom/llama3-8b-instruct/

An example request logged by NIM after startup then includes the path as the model name:

curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/models/llama/custom/llama3-8b-instruct/",
    "prompt": "hello world!",
    "top_p": 1,
    "n": 1,
    "max_tokens": 15,
    "stream": true,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }'

To avoid sending this long path in every request, set NIM_SERVED_MODEL_NAME to a custom model name.

export NIM_SERVED_MODEL_NAME=llama3-8b-instruct
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3-8b-instruct",
    "prompt": "hello world!",
    "top_p": 1,
    "n": 1,
    "max_tokens": 15,
    "stream": true,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }'