Environment Variables#
This page documents all environment variables supported by NIM LLM.
Set variables using -e flags when you run the container:
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/cache:/opt/nim/.cache \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_SERVER_PORT=8000 \
  -e NIM_LOG_LEVEL=INFO \
  -e NGC_API_KEY \
  -e HF_TOKEN \
  <image>
Logging#
The following variables control log format and verbosity:
- NIM_LOG_LEVEL: str | None#
  Controls the verbosity of NIM log output. Accepts standard Python logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
  - Default: None (uses application default)
  - Type: string
  - Example: NIM_LOG_LEVEL=DEBUG
- NIM_JSONL_LOGGING: bool#
  Enables structured JSON Lines (JSONL) log output.
  - Default: False
  - Type: boolean
  - Example: NIM_JSONL_LOGGING=true
For usage details and examples, refer to Logging and Observability.
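The two logging variables combine naturally with the docker run pattern shown at the top of this page. A sketch that enables verbose, machine-parseable logs for troubleshooting (the image name remains a placeholder):

```shell
# Run with DEBUG-level, JSON Lines formatted log output.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_LOG_LEVEL=DEBUG \
  -e NIM_JSONL_LOGGING=true \
  -e NGC_API_KEY \
  <image>
```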
Model Configuration#
The following variables control model selection, model loading, and related runtime behavior:
- NIM_MODEL_PROFILE: str | None#
  Selects which model profile to use. Profiles define a validated combination of model variant, precision, and parallelism settings for a given GPU configuration. Run list-model-profiles inside the container to see available profiles and their IDs.
  - Default: None (auto-selected based on detected GPU hardware)
  - Type: string
  - Example: NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
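To see which profile IDs are valid for your hardware before pinning one, you can run list-model-profiles as the container command. A sketch (the exact invocation may vary by image; the image name is a placeholder):

```shell
# Print the model profiles compatible with the detected GPUs.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  <image> \
  list-model-profiles
```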
- NIM_MODEL_PATH: str | None#
  Model source URI or local filesystem path. Accepts hf://, ngc://, and modelscope:// prefixes for remote repositories, or a local directory path. When set, a runtime manifest is generated from this URI instead of using the baked-in container manifest.
  - Default: None (uses baked-in manifest and NIM_MODEL_PROFILE)
  - Type: string
  - Example: NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct
- NIM_SERVED_MODEL_NAME: str | None#
  Overrides the served model name returned in API responses. When set, the /v1/models endpoint and response metadata use this name instead of the default model identifier.
  - Default: None (uses the model’s own identifier)
  - Type: string
  - Example: NIM_SERVED_MODEL_NAME=my-llama
- NIM_MAX_MODEL_LEN: int | None#
  Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.
  - Default: None (uses the model’s default from its config)
  - Type: positive integer
  - Example: NIM_MAX_MODEL_LEN=4096
- NIM_TENSOR_PARALLEL_SIZE: int | None#
  Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.
  - Default: None (auto-detected from profile)
  - Type: positive integer
  - Example: NIM_TENSOR_PARALLEL_SIZE=2
- NIM_PIPELINE_PARALLEL_SIZE: int | None#
  Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.
  - Default: None (auto-detected from profile)
  - Type: positive integer
  - Example: NIM_PIPELINE_PARALLEL_SIZE=2
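Profile auto-selection normally chooses the parallelism layout for you, but both overrides can be set explicitly. A sketch that splits a model across two GPUs with tensor parallelism (assumes two GPUs are visible to the container):

```shell
# Override auto-detected parallelism: tensor-parallel across 2 GPUs.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_TENSOR_PARALLEL_SIZE=2 \
  -e NIM_PIPELINE_PARALLEL_SIZE=1 \
  -e HF_TOKEN \
  <image>
```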
- NIM_NUM_COMPUTE_NODES: int | None#
  Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).
  - Default: None (single-node operation)
  - Type: positive integer
  - Example: NIM_NUM_COMPUTE_NODES=2
- NIM_REPOSITORY_OVERRIDE: str | None#
  Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.
  - Default: None (downloads from the URI specified in the manifest)
  - Type: string
  - Example: NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models
- NIM_DISABLE_MODEL_DOWNLOAD: bool#
  Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.
  - Default: False
  - Type: boolean
  - Example: NIM_DISABLE_MODEL_DOWNLOAD=true
- NIM_TRUST_CUSTOM_CODE: bool#
  Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.
  - Default: False
  - Type: boolean
  - Example: NIM_TRUST_CUSTOM_CODE=true
Server#
The following variables control server and health-check ports:
- NIM_SERVER_PORT: int | None#
  Port for the external-facing HTTP API server.
  - Default: None (uses container default)
  - Type: integer
  - Example: NIM_SERVER_PORT=9000
- NIM_HEALTH_PORT: int | None#
  Port for the proxy health endpoints (/v1/health/live and /v1/health/ready).
  - Default: None (defaults to NIM_SERVER_PORT)
  - Type: integer
  - Example: NIM_HEALTH_PORT=8001
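When overriding the API port, publish the same port with -p so it is reachable from the host. A sketch with separate API and health ports:

```shell
# Serve the API on 9000 and health checks on 8001.
docker run -d --rm --gpus all \
  -p 9000:9000 -p 8001:8001 \
  -e NIM_SERVER_PORT=9000 \
  -e NIM_HEALTH_PORT=8001 \
  -e NGC_API_KEY \
  <image>
```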
LoRA and PEFT#
The following variables control LoRA and PEFT adapter discovery and refresh behavior:
- NIM_PEFT_SOURCE: str | None#
  URI for the LoRA adapter source (local path or NGC URI).
  - Default: None (LoRA disabled)
  - Type: string
  - Example: NIM_PEFT_SOURCE=/adapters
- NIM_PEFT_REFRESH_INTERVAL: int | None#
  Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.
  - Default: None (dynamic reloading disabled)
  - Type: positive integer
  - Example: NIM_PEFT_REFRESH_INTERVAL=30
- NIM_PEFT_API_TIMEOUT_SECS: float | None#
  Timeout in seconds for dynamic LoRA adapter API calls.
  - Default: 30.0
  - Type: positive float
  - Example: NIM_PEFT_API_TIMEOUT_SECS=60
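A sketch that mounts a host directory of adapters into the container and enables periodic rescanning (the host path is an assumption for illustration):

```shell
# Mount LoRA adapters and poll for new or removed ones every 30 seconds.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/adapters:/adapters \
  -e NIM_PEFT_SOURCE=/adapters \
  -e NIM_PEFT_REFRESH_INTERVAL=30 \
  -e NGC_API_KEY \
  <image>
```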
Model Cache#
The following variable controls the local model cache location:
- NIM_CACHE_PATH: str#
  Directory path for NIM’s local model and artifact cache.
  - Default: /opt/nim/.cache
  - Type: string
  - Example: NIM_CACHE_PATH=/mnt/models/.cache
Authentication#
The following variables provide credentials for authenticated model downloads:
- NGC_API_KEY: str | None#
  API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required when downloading models from ngc:// repositories.
  - Default: None
  - Type: string
  - Example: NGC_API_KEY=nvapi-...
- NGC_CLI_API_KEY: str | None#
  Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.
  - Default: None
  - Type: string
  - Example: NGC_CLI_API_KEY=nvapi-...
- HF_TOKEN: str | None#
  Authentication token for Hugging Face Hub. Required for downloading private or gated models from hf:// repositories.
  - Default: None
  - Type: string
  - Example: HF_TOKEN=hf_...
- MODELSCOPE_API_TOKEN: str | None#
  Authentication token for ModelScope. Required for authenticated downloads from modelscope:// repositories and to avoid rate limiting.
  - Default: None
  - Type: string
  - Example: MODELSCOPE_API_TOKEN=...
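Passing -e NGC_API_KEY without a value forwards the variable from your shell environment, which keeps secrets out of the command line itself. A sketch (token values are placeholders):

```shell
# Export credentials once, then forward them into the container by name.
export NGC_API_KEY=nvapi-...   # placeholder value
export HF_TOKEN=hf_...         # placeholder value
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NGC_API_KEY \
  -e HF_TOKEN \
  <image>
```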
SSL and TLS#
The following variables control TLS termination at the nginx proxy layer:
- NIM_SSL_MODE: str#
  Controls TLS termination at the nginx proxy.
  - DISABLED – plain HTTP (default)
  - TLS – server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH
  - MTLS – mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH
  - Default: DISABLED
  - Example: NIM_SSL_MODE=TLS
- NIM_SSL_KEY_PATH: str | None#
  Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or MTLS.
  - Default: None
  - Type: string
  - Example: NIM_SSL_KEY_PATH=/etc/ssl/private/server.key
- NIM_SSL_CERTS_PATH: str | None#
  Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or MTLS.
  - Default: None
  - Type: string
  - Example: NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt
- NIM_SSL_CA_CERTS_PATH: str | None#
  Path to the CA certificate file for client verification. Required when NIM_SSL_MODE is MTLS.
  - Default: None
  - Type: string
  - Example: NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
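The key and certificate must exist inside the container, so mount them from the host. A sketch enabling server-side TLS (the host and container paths are assumptions):

```shell
# Terminate TLS at the nginx proxy using a host-provided key and cert.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/certs:/etc/ssl/nim:ro \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/nim/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/nim/server.crt \
  -e NGC_API_KEY \
  <image>
```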
CORS#
These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.
- NIM_CORS_ALLOW_ORIGINS: str#
  Comma-separated list of allowed request origins, or * for any origin.
  - Default: *
  - Type: string
  - Example: NIM_CORS_ALLOW_ORIGINS=https://example.com
- NIM_CORS_ALLOW_METHODS: str#
  Comma-separated list of allowed HTTP methods for CORS requests.
  - Default: GET, POST, PUT, DELETE, PATCH, OPTIONS
  - Type: string
  - Example: NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS
- NIM_CORS_ALLOW_HEADERS: str#
  Comma-separated list of allowed request headers for CORS requests.
  - Default: Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id
  - Type: string
  - Example: NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization
- NIM_CORS_EXPOSE_HEADERS: str#
  Comma-separated list of response headers exposed to the browser in CORS responses.
  - Default: X-Request-Id
  - Type: string
  - Example: NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id
- NIM_CORS_MAX_AGE: str#
  Duration in seconds that browsers may cache CORS preflight responses.
  - Default: 3600
  - Type: string
  - Example: NIM_CORS_MAX_AGE=7200
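A sketch restricting CORS to a single origin with a reduced method set; note the quoting, since the method list contains spaces:

```shell
# Allow browser requests only from https://example.com.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_CORS_ALLOW_ORIGINS=https://example.com \
  -e NIM_CORS_ALLOW_METHODS="GET, POST, OPTIONS" \
  -e NGC_API_KEY \
  <image>
```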
Advanced#
The following variables control advanced argument handling and runtime behavior:
- NIM_PASSTHROUGH_ARGS: str | None#
  Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).
  - Default: None
  - Type: string
  - Example: NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
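Because the value contains spaces, keep it quoted when passing with -e. A sketch forwarding extra vLLM flags (the specific flags shown are illustrative):

```shell
# Forward extra vLLM flags as a single quoted string.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128" \
  -e NGC_API_KEY \
  <image>
```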
- NIM_STRICT_ARG_PROCESSING: bool#
  Enables strict configuration processing. When true, conflicting configuration overrides (e.g., a CLI argument overwriting an environment variable) raise errors instead of warnings.
  - Default: False
  - Type: boolean
  - Example: NIM_STRICT_ARG_PROCESSING=true
- NIM_DISABLE_CUDA_GRAPH: bool#
  Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.
  - Default: False
  - Type: boolean
  - Example: NIM_DISABLE_CUDA_GRAPH=true