Environment Variables
This page documents all environment variables supported by NIM LLM.
Set variables using -e flags when you run the container:
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_SERVER_PORT=8000 \
  -e NIM_LOG_LEVEL=INFO \
  -e NGC_API_KEY \
  -e HF_TOKEN \
  <image>
Logging
The following variables control log format and verbosity:

- NIM_LOG_LEVEL: str | None
  Controls the verbosity of NIM log output. Accepts standard Python logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
  - Default: None (uses application default)
  - Type: string
  - Example: NIM_LOG_LEVEL=DEBUG

- NIM_JSONL_LOGGING: bool
  Enables structured JSON Lines (JSONL) log output.
  - Default: False
  - Type: boolean
  - Example: NIM_JSONL_LOGGING=true

For usage details and examples, refer to Logging and Observability.
Model Configuration
The following variables control model selection, model loading, and related runtime behavior:

- NIM_MODEL_PROFILE: str | None
  Selects which model profile to use. Profiles define a validated combination of model variant, precision, and parallelism settings for a given GPU configuration. Run list-model-profiles inside the container to see available profiles and their IDs.
  - Default: None (auto-selected based on detected GPU hardware)
  - Type: string
  - Example: NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4

- NIM_MODEL_PATH: str | None
  Model source URI or local filesystem path. Accepts hf://, ngc://, and modelscope:// prefixes for remote repositories, or a local directory path. When set, a runtime manifest is generated from this URI instead of using the baked-in container manifest.
  - Default: None (uses the baked-in manifest and NIM_MODEL_PROFILE)
  - Type: string
  - Example: NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct

- NIM_SERVED_MODEL_NAME: str | None
  Overrides the served model name returned in API responses. When set, the /v1/models endpoint and response metadata use this name instead of the default model identifier.
  - Default: None (uses the model's own identifier)
  - Type: string
  - Example: NIM_SERVED_MODEL_NAME=my-llama

- NIM_MAX_MODEL_LEN: int | None
  Overrides the maximum sequence length (context window) for the model. Values larger than the model's trained maximum may cause errors.
  - Default: None (uses the model's default from its configuration)
  - Type: positive integer
  - Example: NIM_MAX_MODEL_LEN=4096

- NIM_TENSOR_PARALLEL_SIZE: int | None
  Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.
  - Default: None (auto-detected from profile)
  - Type: positive integer
  - Example: NIM_TENSOR_PARALLEL_SIZE=2

- NIM_PIPELINE_PARALLEL_SIZE: int | None
  Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.
  - Default: None (auto-detected from profile)
  - Type: positive integer
  - Example: NIM_PIPELINE_PARALLEL_SIZE=2
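The two degrees multiply: in typical vLLM-based deployments, one model replica occupies NIM_TENSOR_PARALLEL_SIZE × NIM_PIPELINE_PARALLEL_SIZE GPUs, so the container must be granted at least that many. As a sketch (the image name is a placeholder), a four-GPU launch with 2-way tensor and 2-way pipeline parallelism might look like:

```shell
# Hypothetical 4-GPU launch: 2-way tensor x 2-way pipeline parallelism
docker run -d --rm --gpus 4 \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -e NIM_TENSOR_PARALLEL_SIZE=2 \
  -e NIM_PIPELINE_PARALLEL_SIZE=2 \
  -e NGC_API_KEY \
  <image>
```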
- NIM_NUM_COMPUTE_NODES: int | None
  Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).
  - Default: None (single-node operation)
  - Type: positive integer
  - Example: NIM_NUM_COMPUTE_NODES=2

- NIM_REPOSITORY_OVERRIDE: str | None
  Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.
  - Default: None (downloads from the URI specified in the manifest)
  - Type: string
  - Example: NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models

- NIM_DISABLE_MODEL_DOWNLOAD: bool
  Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.
  - Default: False
  - Type: boolean
  - Example: NIM_DISABLE_MODEL_DOWNLOAD=true

- NIM_TRUST_CUSTOM_CODE: bool
  Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.
  - Default: False
  - Type: boolean
  - Example: NIM_TRUST_CUSTOM_CODE=true
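Putting the model-selection variables together, here is a hedged sketch of serving a gated Hugging Face model under a custom name with a reduced context window (the image name is a placeholder):

```shell
# Pull a gated HF model, serve it as "my-llama", and cap the context at 4096 tokens
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_SERVED_MODEL_NAME=my-llama \
  -e NIM_MAX_MODEL_LEN=4096 \
  -e HF_TOKEN \
  <image>
```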
Server
The following variables control server and health-check ports:

- NIM_SERVER_PORT: int | None
  Port for the external-facing HTTP API server.
  - Default: None (uses container default)
  - Type: integer
  - Example: NIM_SERVER_PORT=9000

- NIM_HEALTH_PORT: int | None
  Port for the proxy health endpoints (/v1/health/live and /v1/health/ready).
  - Default: None (defaults to NIM_SERVER_PORT)
  - Type: integer
  - Example: NIM_HEALTH_PORT=8001
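Note that these variables only move the ports inside the container; the docker -p mappings must match them. A sketch (image name is a placeholder):

```shell
# Serve the API on 9000 and health checks on 8001; publish both ports
docker run -d --rm --gpus all \
  -p 9000:9000 \
  -p 8001:8001 \
  -e NIM_SERVER_PORT=9000 \
  -e NIM_HEALTH_PORT=8001 \
  -e NGC_API_KEY \
  <image>
```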
LoRA and PEFT
The following variables control LoRA and PEFT adapter discovery and refresh behavior:

- NIM_PEFT_SOURCE: str | None
  URI for the LoRA adapter source (local path or NGC URI).
  - Default: None (LoRA disabled)
  - Type: string
  - Example: NIM_PEFT_SOURCE=/adapters

- NIM_PEFT_REFRESH_INTERVAL: int | None
  Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.
  - Default: None (dynamic reloading disabled)
  - Type: positive integer
  - Example: NIM_PEFT_REFRESH_INTERVAL=30

- NIM_PEFT_API_TIMEOUT_SECS: float | None
  Timeout in seconds for dynamic LoRA adapter API calls.
  - Default: 30.0
  - Type: positive float
  - Example: NIM_PEFT_API_TIMEOUT_SECS=60
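A minimal sketch of local LoRA serving, assuming adapters are staged in a host directory (the host path and image name are placeholders):

```shell
# Mount host adapters read-only and poll for changes every 30 seconds
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/adapters:/adapters:ro \
  -e NIM_PEFT_SOURCE=/adapters \
  -e NIM_PEFT_REFRESH_INTERVAL=30 \
  -e NGC_API_KEY \
  <image>
```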
Model Cache
The following variable controls the model cache location inside the container:

- NIM_CACHE_PATH: str
  Directory path for the model and artifact cache inside the NIM container.
  - Default: /opt/nim/.cache
  - Type: string
  - Example: NIM_CACHE_PATH=/mnt/models/.cache
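If you relocate the cache, point the host volume mount at the new path as well; otherwise downloads land in the container's writable layer and are lost on restart. A sketch (host path and image name are placeholders):

```shell
# Host cache directory mounted at the overridden cache path
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /mnt/models/.cache:/mnt/models/.cache \
  -e NIM_CACHE_PATH=/mnt/models/.cache \
  -e NGC_API_KEY \
  <image>
```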
Authentication
The following variables provide credentials for authenticated model downloads:

- NGC_API_KEY: str | None
  API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required when downloading models from ngc:// repositories.
  - Default: None
  - Type: string
  - Example: NGC_API_KEY=nvapi-...

- NGC_CLI_API_KEY: str | None
  Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.
  - Default: None
  - Type: string
  - Example: NGC_CLI_API_KEY=nvapi-...

- HF_TOKEN: str | None
  Authentication token for Hugging Face Hub. Required for downloading private or gated models from hf:// repositories.
  - Default: None
  - Type: string
  - Example: HF_TOKEN=hf_...

- MODELSCOPE_API_TOKEN: str | None
  Authentication token for ModelScope. Required for authenticated downloads from modelscope:// repositories and to avoid rate limiting.
  - Default: None
  - Type: string
  - Example: MODELSCOPE_API_TOKEN=...
SSL and TLS
NIM uses TLS in two distinct directions. Inbound TLS secures client connections to the NIM inference API (nginx layer). Outbound TLS secures connections the container itself makes when downloading model artifacts from NGC, Hugging Face, or a corporate registry such as JFrog Artifactory.
Important
The NIM_SSL_* variables below configure inbound TLS only. They do not
affect outbound model downloads. To trust a corporate Certificate Authority (CA)
for outbound connections, see Outbound TLS (Model Downloads).
Inbound TLS (NIM API)
The following variables control TLS termination at the nginx proxy layer:

- NIM_SSL_MODE: str | None
  Controls TLS termination at the nginx proxy.
  - DISABLED – plain HTTP (default)
  - TLS – server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH
  - MTLS – mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH
  - Default: DISABLED
  - Example: NIM_SSL_MODE=TLS

- NIM_SSL_KEY_PATH: str | None
  Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or MTLS.
  - Default: None
  - Example: NIM_SSL_KEY_PATH=/etc/ssl/private/server.key

- NIM_SSL_CERTS_PATH: str | None
  Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or MTLS.
  - Default: None
  - Example: NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt

- NIM_SSL_CA_CERTS_PATH: str | None
  Path to the CA certificate file for client verification. Required when NIM_SSL_MODE is MTLS.
  - Default: None
  - Example: NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
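A hedged sketch of enabling server-side TLS, assuming the key and certificate are staged in a host directory (paths, mount point, and image name are placeholders):

```shell
# Mount key/cert read-only and terminate TLS at the nginx proxy
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/certs:/etc/ssl/nim:ro \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/nim/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/nim/server.crt \
  -e NGC_API_KEY \
  <image>

# Verify from the host, trusting the issuing CA explicitly
curl --cacert /path/to/certs/ca.crt https://localhost:8000/v1/models
```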
Outbound TLS (Model Downloads)
When the NIM container downloads models from a registry that uses a certificate signed by a private or corporate CA, you must provide that CA certificate to the container. This applies to two common scenarios:
Corporate registry with private CA — for example, a JFrog Artifactory instance whose TLS certificate is signed by your organization’s internal CA (no proxy involved).
TLS-inspecting proxy — a corporate proxy that decrypts and re-encrypts HTTPS traffic using a corporate CA.
In both cases, set SSL_CERT_FILE to a CA bundle that includes the corporate
CA so that outbound TLS verification succeeds. A proxy (HTTPS_PROXY) is
not required for SSL_CERT_FILE to take effect.
- SSL_CERT_FILE: str | None
  Path to a PEM-format CA certificate or bundle file inside the container. OpenSSL and the model download pipeline (nim_sdk, reqwest, and native-tls) use this file to verify server certificates during outbound HTTPS connections. Can be used with or without HTTPS_PROXY.
  Warning: Setting SSL_CERT_FILE replaces the container's default trust store. If you point it at a file containing only your corporate CA, connections to public endpoints (such as api.ngc.nvidia.com) will fail because the public CAs are no longer trusted. If you also need to reach public endpoints, use a combined bundle that includes both the default CAs and your corporate CA.
  - Default: None (uses /etc/ssl/certs/ca-certificates.crt)
  - Type: string
  - Example: SSL_CERT_FILE=/etc/ssl/certs/custom-ca-bundle.pem

- REQUESTS_CA_BUNDLE: str | None
  Same purpose as SSL_CERT_FILE but specific to the Python requests library. Some internal components (such as proxy validation in nimlib) use requests; setting this variable ensures those paths also trust the corporate CA. When in doubt, set both SSL_CERT_FILE and REQUESTS_CA_BUNDLE to the same combined bundle.
  - Default: None
  - Type: string
  - Example: REQUESTS_CA_BUNDLE=/etc/ssl/certs/custom-ca-bundle.pem
Create a combined CA bundle (one-time, on the host):
To add your corporate CA without losing trust in public CAs, concatenate the container's default bundle with your corporate CA certificate:

# Extract the default CA bundle from the container
docker run --rm --entrypoint bash \
  ${NIM_LLM_MODEL_SPECIFIC_IMAGE}:2.0.3 \
  -c 'cat /etc/ssl/certs/ca-certificates.crt' > combined-ca-bundle.pem

# Append your corporate CA
cat /path/to/corporate-ca.pem >> combined-ca-bundle.pem
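The combined bundle can then be mounted into the container and wired up through both trust-store variables (a sketch; the in-container path is arbitrary and the image name is a placeholder):

```shell
# Mount the combined bundle read-only and point both variables at it
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v "$(pwd)/combined-ca-bundle.pem:/etc/ssl/certs/custom-ca-bundle.pem:ro" \
  -e SSL_CERT_FILE=/etc/ssl/certs/custom-ca-bundle.pem \
  -e REQUESTS_CA_BUNDLE=/etc/ssl/certs/custom-ca-bundle.pem \
  -e NGC_API_KEY \
  <image>
```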
See also: Air-Gap Deployment: CA Certificate Injection.
CORS
These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.

- NIM_CORS_ALLOW_ORIGINS: str | None
  Comma-separated list of allowed request origins, or * for any origin.
  - Default: *
  - Example: NIM_CORS_ALLOW_ORIGINS=https://example.com

- NIM_CORS_ALLOW_METHODS: str | None
  Allowed HTTP methods for CORS requests.
  - Default: GET, POST, PUT, DELETE, PATCH, OPTIONS
  - Example: NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS

- NIM_CORS_ALLOW_HEADERS: str | None
  Allowed request headers for CORS requests.
  - Default: Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id
  - Example: NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization

- NIM_CORS_EXPOSE_HEADERS: str | None
  Response headers that are exposed to the browser in CORS responses.
  - Default: X-Request-Id
  - Example: NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id

- NIM_CORS_MAX_AGE: str | None
  Duration in seconds that browsers may cache CORS preflight responses.
  - Default: 3600
  - Example: NIM_CORS_MAX_AGE=7200
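You can confirm the effective policy with a browser-style preflight request against a running instance (a sketch assuming a local deployment on port 8000):

```shell
# Send an OPTIONS preflight and inspect the Access-Control-* response headers
curl -si -X OPTIONS http://localhost:8000/v1/chat/completions \
  -H 'Origin: https://example.com' \
  -H 'Access-Control-Request-Method: POST' \
  -H 'Access-Control-Request-Headers: Content-Type'
```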
AWS SageMaker
The following variable controls SageMaker BYOC (Bring Your Own Container) compatibility mode. When active, NIM listens on port 8080 and implements the GET /ping health check and POST /invocations inference endpoints required by SageMaker real-time inference.

- NIM_SAGEMAKER_MODE: str | None
  Controls AWS SageMaker real-time inference compatibility mode.
  - 1 – Force SageMaker mode on. NIM listens on port 8080 and exposes GET /ping (health) and POST /invocations (inference, proxied to /v1/chat/completions).
  - 0 – Suppress SageMaker mode even when SageMaker environment variables are present. Use this to run NIM on a SageMaker instance without activating the protocol adapter.
  - (unset) – Auto-detect: SageMaker mode is enabled automatically if any of SAGEMAKER_MULTI_MODEL, SAGEMAKER_REGION, or SAGEMAKER_BIND_TO_PORT is present in the environment. These variables are injected by the SageMaker host agent and are not present in other environments.
  - Default: (unset) – auto-detect from SageMaker environment signals
  - Example: NIM_SAGEMAKER_MODE=1
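Outside of SageMaker, you can still exercise the adapter locally by forcing it on and probing both endpoints (a sketch; the image and model names are placeholders):

```shell
# Force SageMaker mode, then hit the two required endpoints
docker run -d --rm --gpus all \
  -p 8080:8080 \
  -e NIM_SAGEMAKER_MODE=1 \
  -e NGC_API_KEY \
  <image>

curl -s http://localhost:8080/ping
curl -s http://localhost:8080/invocations \
  -H 'Content-Type: application/json' \
  -d '{"model": "my-llama", "messages": [{"role": "user", "content": "Hello"}]}'
```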
Advanced
The following variables control advanced argument handling and runtime behavior:

- NIM_PASSTHROUGH_ARGS: str | None
  Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).
  - Default: None
  - Type: string
  - Example: NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"

- NIM_STRICT_ARG_PROCESSING: bool
  Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.
  - Default: False
  - Type: boolean
  - Example: NIM_STRICT_ARG_PROCESSING=true

- NIM_DISABLE_CUDA_GRAPH: bool
  Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.
  - Default: False
  - Type: boolean
  - Example: NIM_DISABLE_CUDA_GRAPH=true