Troubleshoot NVIDIA NIM for LLMs

Use this documentation to troubleshoot issues that arise when you work with NVIDIA NIM for large language models (LLMs).

Guided decoding fails to recognize whitespace pattern

When you use the outlines backend for guided decoding, some regular expressions fail to compile. You might see a message that includes the rejected pattern, such as "guided_whitespace_pattern": "*invalid*pattern*". In this case, use the xgrammar backend instead, which supports a wider range of regular expressions than outlines. For more information, refer to Structured Generation with NVIDIA NIM for LLMs.
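A quick first-pass check is to confirm that the pattern is a valid regular expression at all before sending it in a request. The sketch below uses Python's re module for this; note that outlines and xgrammar each support their own subset of regex syntax, so local compilation is a necessary but not sufficient check. The helper name is hypothetical, not part of NIM:

```python
import re

def is_valid_regex(pattern: str) -> bool:
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A simple whitespace pattern compiles cleanly.
print(is_valid_regex(r"\s+"))               # → True

# A leading "*" has nothing to repeat, so compilation fails --
# the same kind of pattern a guided-decoding backend rejects.
print(is_valid_regex("*invalid*pattern*"))  # → False
```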

vLLM profile fails to deploy

If the vLLM profile fails to deploy, the most likely cause is insufficient GPU resources. Try the following troubleshooting options:

  • Allocate more GPU resources.

  • Reduce the value of NIM_MAX_MODEL_LEN. Start by setting it to 70,000, and lower the value further if deployment still fails. For more information, refer to Configure Your NIM with NVIDIA NIM for LLMs.
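One way to apply this setting is through the container environment at launch. The following docker run sketch is illustrative only: the image name is a placeholder, not a value from this guide, so substitute the details of your own deployment.

```shell
# Hypothetical launch command; replace the image reference with the
# NIM container you deploy. NIM_MAX_MODEL_LEN caps the model context
# length to reduce GPU memory pressure.
docker run -it --rm \
    --gpus all \
    -e NGC_API_KEY \
    -e NIM_MAX_MODEL_LEN=70000 \
    -p 8000:8000 \
    nvcr.io/nim/<org>/<model>:<tag>
```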

Cannot bind on address

By default, NIMs are launched on port 8000 on the host system; see the -p parameter in Docker Run Parameters. Do not configure other processes to use this port (for example, by setting VLLM_PORT=8000); otherwise, you receive an error of the form Can't bind on address ('0.0.0.0', 8000): address already in use.

If another process must use the default port number, change the host port that the NIM publishes by using the -p parameter for docker run.
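To see this failure mode in isolation, you can reproduce the same operating-system error outside of NIM by binding one port twice. This minimal Python sketch (not part of NIM) asks the OS for a free port, binds it, and then attempts a second bind on the same port:

```python
import socket

def demonstrate_port_conflict() -> str:
    """Bind the same port twice to reproduce 'address already in use'."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("0.0.0.0", 0))        # port 0: the OS picks a free port
    first.listen()
    port = first.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("0.0.0.0", port))  # same port -> EADDRINUSE
        return "no conflict (unexpected)"
    except OSError as exc:
        # Mirrors the NIM error format from this section.
        return f"Can't bind on address ('0.0.0.0', {port}): {exc.strerror}"
    finally:
        second.close()
        first.close()

print(demonstrate_port_conflict())
```

Remapping the container port (for example, -p 8001:8000) avoids this collision on the host while the NIM continues to listen on port 8000 inside the container.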