Troubleshoot NVIDIA NIM for LLMs#
Use this documentation to troubleshoot issues that arise when you work with NVIDIA NIM for Large Language Models (LLMs).
Missing files in LLM-NIM deployment#
Supported model formats explains the expected folder structures for remote and local model deployment options in LLM-NIMs. This section describes some common failure cases.
Note
list-model-profiles --model $NIM_MODEL_NAME can be used to inspect model compatibility and debug common issues in model artifacts.
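For example, this is a minimal sketch of running the utility inside an already-running NIM container; the container name is a placeholder for your deployment:
# Inspect model compatibility for the deployed model
docker exec -it <nim-container> list-model-profiles --model $NIM_MODEL_NAME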
Missing HuggingFace config.json#
The HuggingFace configuration file is loaded to derive metadata such as model architecture, context length, and batch size for the vLLM, TRTLLM, and SGLang backends. If local models or remote URIs (after download and caching) do not contain the HuggingFace config.json, NIM throws the following exception:
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Missing HuggingFace configuration file named config.json.
Note
For unified HuggingFace checkpoints, hf_quant_config.json should be present in the folder along with the config.json file.
To fix this issue, download the config.json file from HuggingFace Hub for the deployed model and place it in the root directory. For quantized GGUF formats, this file is usually present in the corresponding full-precision repositories mentioned in the model cards.
For example, the QuantFactory Meta-Llama-3-8B-Instruct-GGUF repository contains only GGUF files for deployment; the tokenizer and config.json should be downloaded from the full-precision Meta-Llama-3-8B-Instruct repository.
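For instance, the file can be fetched with the huggingface-cli tool. This is a minimal sketch, assuming the huggingface_hub package is installed and /models/llama3-8b-instruct is your model root:
# Download only config.json from the full-precision repository into the model root
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct config.json --local-dir /models/llama3-8b-instruct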
Missing TRTLLM checkpoint / engine configuration#
TRTLLM checkpoint and engine conversion scripts dump pretrained and engine configuration files along with the safetensors and engine files. NIM expects these configuration files to be part of the trtllm_ckpt and trtllm_engine sub-folders, as described in the TRTLLM checkpoint / engine sections of Supported model formats. If they are missing, NIM throws the following exception:
Note
Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
├── config.json
└── trtllm_ckpt
…
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
Ensure that the TRTLLM config.json files are placed in the respective sub-folders, depending on the model format type.
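For reference, an illustrative layout is shown below. The weight file names are examples only; the exact names depend on your conversion scripts and on Supported model formats:
/models/llama3-8b-instruct/
├── config.json               (HuggingFace configuration)
├── trtllm_ckpt/
│   ├── config.json           (TRTLLM pretrained configuration)
│   └── rank0.safetensors
└── trtllm_engine/
    ├── config.json           (TRTLLM engine configuration)
    └── rank0.engine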
Missing weights#
HuggingFace safetensors#
NIM requires HuggingFace safetensors weight files to be present in the model directory, either locally or after downloading and caching from a remote URI. These weight files must follow the naming convention:
*.safetensors
If these files are not found, NIM will raise an error during model initialization.
Note
Looking for HuggingFace safetensors weights. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
├── config.json
…
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
HuggingFace GGUF#
NIM requires GGUF weight files to be present in the model directory, either locally or after downloading and caching from a remote URI. These weight files must follow the naming convention:
*.gguf
If these files are not found, NIM will raise an error during model initialization.
Note
Looking for HuggingFace safetensors weights. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf weights. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
├── config.json
…
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
TRTLLM checkpoint#
NIM requires TRTLLM checkpoint files to be present in the model directory, either locally or after downloading and caching from a remote URI. These checkpoint files must follow the naming convention:
trtllm_ckpt/rank.*.safetensors
If these files are not found, NIM will raise an error during model initialization.
Note
Looking for TRTLLM checkpoint. Missing TRTLLM rank.*safetensors in /models/llama3-8b-instruct.
Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
├── config.json
└── trtllm_ckpt
Found following files in /models/llama3-8b-instruct/trtllm_ckpt
└── config.json
…
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
TRTLLM engine#
NIM requires TensorRT-LLM engine files to be present in the model directory, either locally or after downloading and caching from a remote URI. These engine files must follow the naming convention:
trtllm_engine/rank.*.engine
If these files are not found, NIM will raise an error during model initialization.
Note
Looking for TRTLLM engine. Missing TRTLLM rank.*engine files in /models/llama3-8b-instruct.
Looking for HuggingFace safetensors. Missing safetensors weights in /models/llama3-8b-instruct.
Looking for HuggingFace gguf. Missing gguf weights in /models/llama3-8b-instruct.
Found following files in /models/llama3-8b-instruct
├── config.json
└── trtllm_engine
Found following files in /models/llama3-8b-instruct/trtllm_engine
└── config.json
…
ValueError: Invalid repository ID or local directory specified: /models/llama3-8b-instruct/. Expected model format to be one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf']. Please check NIM documentation for supported model formats and folder structures.
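If the trtllm_engine folder contains only a configuration file, the engine files were likely never built. As a sketch, assuming the TensorRT-LLM tooling is installed and trtllm_ckpt is fully populated (available flags vary by TensorRT-LLM version):
# Build rank.*.engine files from an existing TRTLLM checkpoint
trtllm-build --checkpoint_dir /models/llama3-8b-instruct/trtllm_ckpt \
    --output_dir /models/llama3-8b-instruct/trtllm_engine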
Incorrect / incomplete files#
HuggingFace config.json - Lacks Model Architecture#
If you are deploying a model from HuggingFace or local storage and see the following error, ensure that a valid HuggingFace config.json file with the required model architecture configuration exists:
ValueError: Model architecture is None. Please check for valid model architecture in HF configuration file or contact NIM support.
Note
NIM throws this exception because it cannot filter the supported backends if the model architecture is missing.
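As a quick check, verify that the configuration declares an architecture. The path and value below are examples (Llama 3 checkpoints declare LlamaForCausalLM):
# Should print a non-empty list such as "architectures": ["LlamaForCausalLM"]
grep -A 2 '"architectures"' /models/llama3-8b-instruct/config.json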
Unknown Weights Format#
If you see the error ValueError: Found unknown format of weights in /path/to/model. Expected one of ['hf-safetensor', 'trtllm-engine', 'trtllm-ckpt', 'gguf'], verify that the model weights are in one of the supported formats listed in the error message. For more information, refer to Model Format.
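A quick way to see which weight files are present is to list the model directory; the path is an example:
# List candidate weight files in the model root and its sub-folders
ls -R /models/llama3-8b-instruct | grep -E '\.(safetensors|gguf|engine)$'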
Guided Decoding Fails to Recognize Whitespace Pattern#
Using the outlines backend for regular expressions can cause them to fail to compile. You might see a message such as "guided_whitespace_pattern": "*invalid*pattern*".
In this case, use the xgrammar backend instead, which supports a wider range of regular expressions than outlines.
For more information, refer to Structured Generation.
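As an illustration only, a regex-constrained request could look like the following sketch. The nvext payload and its field names here are assumptions, not a confirmed API; refer to Structured Generation for the authoritative request schema and backend selection:
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3-8b-instruct",
    "prompt": "The capital of France is",
    "max_tokens": 10,
    "nvext": {"guided_regex": "[A-Z][a-z]+"}
  }'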
vLLM profile fails to deploy#
If the vLLM profile fails to deploy, it is likely due to insufficient GPU resources. You can try the following troubleshooting options:
Allocate more GPU resources.
Reduce the value of NIM_MAX_MODEL_LEN. Start by setting it to 70,000. If deployment continues to fail, try lowering the value further, as shown in the sketch after this list. For more information, refer to Configure Your NIM with NVIDIA NIM for LLMs.
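A minimal sketch of lowering the context length at container launch; the image name and remaining flags are placeholders for your deployment:
# Start at 70,000 and lower further if deployment still fails
export NIM_MAX_MODEL_LEN=70000
docker run ... -e NIM_MAX_MODEL_LEN <nim-image>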
Trust remote code#
If you see exceptions like the following, prompting you to set --trust-remote-code, set NIM_FORCE_TRUST_REMOTE_CODE so that NIM can read custom models and configurations.
RuntimeError: Failed to load the model config. If the model is a custom model not yet available in the HuggingFace transformers library, consider setting `trust_remote_code=True` in LLM or using the `--trust-remote-code` flag in the CLI.
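A minimal sketch of enabling this at container launch; the image name, the truthy value format, and remaining flags are placeholders for your deployment:
# Allow NIM to load custom model code and configurations
docker run ... -e NIM_FORCE_TRUST_REMOTE_CODE=1 <nim-image>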
Skipping NIM_SERVED_MODEL_NAME#
For local model deployments in LLM-NIMs, you might see that curl requests expect the model name to be the absolute path of the local model:
export NIM_MODEL_NAME=/models/llama/custom/llama3-8b-instruct/
An example request logged by NIM after startup should contain the following:
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "/models/llama/custom/llama3-8b-instruct/",
    "prompt": "hello world!",
    "top_p": 1,
    "n": 1,
    "max_tokens": 15,
    "stream": true,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }'
It is recommended to set NIM_SERVED_MODEL_NAME to a custom model name to avoid having a long path sent in every request.
export NIM_SERVED_MODEL_NAME=llama3-8b-instruct
curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3-8b-instruct",
    "prompt": "hello world!",
    "top_p": 1,
    "n": 1,
    "max_tokens": 15,
    "stream": true,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }'