Environment Variables#

This page documents all environment variables supported by NIM LLM. Set variables using -e flags when you run the container:

docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/cache:/opt/nim/.cache \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_SERVER_PORT=8000 \
  -e NIM_LOG_LEVEL=INFO \
  -e NGC_API_KEY \
  -e HF_TOKEN \
  <image>

Logging#

The following variables control log format and verbosity:

NIM_LOG_LEVEL: str | None#

Controls the verbosity of NIM log output. Accepts standard Python logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.

Default:

None (uses application default)

Type:

string

Example:

NIM_LOG_LEVEL=DEBUG

NIM_JSONL_LOGGING: bool#

Enables structured JSON Lines (JSONL) log output.

Default:

False

Type:

boolean

Example:

NIM_JSONL_LOGGING=true
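With JSONL output enabled, each log line is a self-contained JSON object, which keeps logs machine-parseable even when truncated. The sketch below shows why that matters for log pipelines; the field names (timestamp, level, message) are illustrative assumptions, not the exact NIM log schema:

```python
import json

# Two illustrative JSONL log lines; the field names are assumptions,
# not the exact schema NIM emits.
raw = "\n".join([
    '{"timestamp": "2024-01-01T00:00:00Z", "level": "INFO", "message": "server started"}',
    '{"timestamp": "2024-01-01T00:00:01Z", "level": "DEBUG", "message": "profile selected"}',
])

# Each line parses independently, so a truncated log loses at most one
# record and the rest stays usable by downstream collectors.
records = [json.loads(line) for line in raw.splitlines()]
levels = [r["level"] for r in records]
print(levels)  # ['INFO', 'DEBUG']
```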

For usage details and examples, refer to Logging and Observability.

Model Configuration#

The following variables control model selection, model loading, and related runtime behavior:

NIM_MODEL_PROFILE: str | None#

Selects which model profile to use. Profiles define a validated combination of model variant, precision, and parallelism settings for a given GPU configuration. Run list-model-profiles inside the container to see available profiles and their IDs.

Default:

None (auto-selected based on detected GPU hardware)

Type:

string

Example:

NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
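To inspect available profiles before starting the server, you can run the list-model-profiles utility as a one-off container command. A sketch, assuming the same image and NGC credentials as the run command at the top of this page:

```shell
# Print available profile IDs and their compatibility with the detected GPUs.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  <image> \
  list-model-profiles
```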

NIM_MODEL_PATH: str | None#

Model source URI or local filesystem path. Accepts hf://, ngc://, and modelscope:// prefixes for remote repositories, or a local directory path. When set, a runtime manifest is generated from this URI instead of using the baked-in container manifest.

Default:

None (uses baked-in manifest and NIM_MODEL_PROFILE)

Type:

string

Example:

NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct

NIM_SERVED_MODEL_NAME: str | None#

Overrides the served model name returned in API responses. When set, the /v1/models endpoint and response metadata use this name instead of the default model identifier.

Default:

None (uses the model’s own identifier)

Type:

string

Example:

NIM_SERVED_MODEL_NAME=my-llama
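A quick way to confirm the override took effect is to query /v1/models and then use the new name in a request. A sketch with curl, assuming the server listens on localhost:8000 and exposes the OpenAI-compatible /v1/chat/completions route:

```shell
# With NIM_SERVED_MODEL_NAME=my-llama set, the model "id" field in this
# listing should report the override instead of the default identifier.
curl -s http://localhost:8000/v1/models

# Requests then reference the overridden name in the "model" field.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-llama", "messages": [{"role": "user", "content": "Hello"}]}'
```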

NIM_MAX_MODEL_LEN: int | None#

Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.

Default:

None (uses model’s default from config)

Type:

positive integer

Example:

NIM_MAX_MODEL_LEN=4096

NIM_TENSOR_PARALLEL_SIZE: int | None#

Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.

Default:

None (auto-detected from profile)

Type:

positive integer

Example:

NIM_TENSOR_PARALLEL_SIZE=2

NIM_PIPELINE_PARALLEL_SIZE: int | None#

Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.

Default:

None (auto-detected from profile)

Type:

positive integer

Example:

NIM_PIPELINE_PARALLEL_SIZE=2
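The two parallelism degrees compose multiplicatively: the runtime generally expects tensor-parallel size × pipeline-parallel size to equal the number of visible GPUs. A sketch for a single node with four GPUs, using the same image placeholder as the example at the top of this page:

```shell
# 4 GPUs total = 2-way tensor parallel x 2-way pipeline parallel
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_TENSOR_PARALLEL_SIZE=2 \
  -e NIM_PIPELINE_PARALLEL_SIZE=2 \
  -e NGC_API_KEY \
  <image>
```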

NIM_NUM_COMPUTE_NODES: int | None#

Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).

Default:

None (single-node operation)

Type:

positive integer

Example:

NIM_NUM_COMPUTE_NODES=2

NIM_REPOSITORY_OVERRIDE: str | None#

Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.

Default:

None (downloads from the URI specified in the manifest)

Type:

string

Example:

NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models

NIM_DISABLE_MODEL_DOWNLOAD: bool#

Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.

Default:

False

Type:

boolean

Example:

NIM_DISABLE_MODEL_DOWNLOAD=true
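For example, a worker node in a two-node deployment might mount the pre-staged shared cache and skip the download entirely. A sketch, assuming /shared/nim-cache is the pre-staged path; any additional leader/worker coordination settings your deployment requires are omitted here:

```shell
# Worker node: weights are pre-staged on a shared filesystem, so skip
# the download and reuse the shared cache.
docker run -d --rm --gpus all \
  -v /shared/nim-cache:/opt/nim/.cache \
  -e NIM_DISABLE_MODEL_DOWNLOAD=true \
  -e NIM_NUM_COMPUTE_NODES=2 \
  <image>
```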

NIM_TRUST_CUSTOM_CODE: bool#

Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.

Default:

False

Type:

boolean

Example:

NIM_TRUST_CUSTOM_CODE=true

Server#

The following variables control server and health-check ports:

NIM_SERVER_PORT: int | None#

Port for the external-facing HTTP API server.

Default:

None (uses container default)

Type:

integer

Example:

NIM_SERVER_PORT=9000

NIM_HEALTH_PORT: int | None#

Port for the proxy health endpoints (/v1/health/live and /v1/health/ready).

Default:

None (defaults to NIM_SERVER_PORT)

Type:

integer

Example:

NIM_HEALTH_PORT=8001

LoRA and PEFT#

The following variables control LoRA and PEFT adapter discovery and refresh behavior:

NIM_PEFT_SOURCE: str | None#

URI for the LoRA adapter source (local path or NGC URI).

Default:

None (LoRA disabled)

Type:

string

Example:

NIM_PEFT_SOURCE=/adapters

NIM_PEFT_REFRESH_INTERVAL: int | None#

Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.

Default:

None (dynamic reloading disabled)

Type:

positive integer

Example:

NIM_PEFT_REFRESH_INTERVAL=30
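Conceptually, the watcher behaves like a periodic directory scan that diffs the adapter set between polls. The Python sketch below illustrates that polling pattern; it is not NIM's internal implementation, and scan_adapters and watch are hypothetical names:

```python
import os
import time


def scan_adapters(peft_source: str) -> set[str]:
    """Return the set of adapter directory names under the PEFT source."""
    return {
        name for name in os.listdir(peft_source)
        if os.path.isdir(os.path.join(peft_source, name))
    }


def watch(peft_source: str, interval: int, cycles: int) -> None:
    """Poll the PEFT source every `interval` seconds, reporting changes."""
    known = scan_adapters(peft_source)
    for _ in range(cycles):
        time.sleep(interval)  # NIM_PEFT_REFRESH_INTERVAL plays this role
        current = scan_adapters(peft_source)
        for name in sorted(current - known):
            print(f"adapter added: {name}")
        for name in sorted(known - current):
            print(f"adapter removed: {name}")
        known = current
```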

NIM_PEFT_API_TIMEOUT_SECS: float | None#

Timeout in seconds for dynamic LoRA adapter API calls.

Default:

30.0

Type:

positive float

Example:

NIM_PEFT_API_TIMEOUT_SECS=60

Model Cache#

The following variable controls the local model cache location:

NIM_CACHE_PATH: str#

Directory path for NIM’s local model and artifact cache.

Default:

/opt/nim/.cache

Type:

string

Example:

NIM_CACHE_PATH=/mnt/models/.cache

Authentication#

The following variables provide credentials for authenticated model downloads:

NGC_API_KEY: str | None#

API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required when downloading models from ngc:// repositories.

Default:

None

Type:

string

Example:

NGC_API_KEY=nvapi-...

NGC_CLI_API_KEY: str | None#

Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.

Default:

None

Type:

string

Example:

NGC_CLI_API_KEY=nvapi-...

HF_TOKEN: str | None#

Authentication token for Hugging Face Hub. Required for downloading private or gated models from hf:// repositories.

Default:

None

Type:

string

Example:

HF_TOKEN=hf_...

MODELSCOPE_API_TOKEN: str | None#

Authentication token for ModelScope. Required for authenticated downloads from modelscope:// repositories and to avoid rate limiting.

Default:

None

Type:

string

Example:

MODELSCOPE_API_TOKEN=...

SSL and TLS#

The following variables control TLS termination at the nginx proxy layer:

NIM_SSL_MODE: str#

Controls TLS termination at the nginx proxy.

  • DISABLED – plain HTTP (default)

  • TLS – server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH

  • MTLS – mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH

Default:

DISABLED

Type:

string

Example:

NIM_SSL_MODE=TLS

NIM_SSL_KEY_PATH: str | None#

Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_KEY_PATH=/etc/ssl/private/server.key

NIM_SSL_CERTS_PATH: str | None#

Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt

NIM_SSL_CA_CERTS_PATH: str | None#

Path to the CA certificate file for client verification. Required when NIM_SSL_MODE is MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
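Putting the three path variables together: for local testing you can generate a self-signed certificate with openssl, mount the files into the container, and point the variables at the mounted paths. A sketch, not production guidance; the paths and image placeholder are assumptions:

```shell
# Generate a self-signed key and certificate for testing only.
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout server.key -out server.crt -subj "/CN=localhost"

# Mount the files read-only and enable server-side TLS.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v "$(pwd)/server.key:/etc/ssl/private/server.key:ro" \
  -v "$(pwd)/server.crt:/etc/ssl/certs/server.crt:ro" \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/private/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt \
  <image>
```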

CORS#

These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.

NIM_CORS_ALLOW_ORIGINS: str#

Comma-separated list of allowed request origins, or * for any origin.

Default:

*

Type:

string

Example:

NIM_CORS_ALLOW_ORIGINS=https://example.com

NIM_CORS_ALLOW_METHODS: str#

Comma-separated list of allowed HTTP methods for CORS requests.

Default:

GET, POST, PUT, DELETE, PATCH, OPTIONS

Type:

string

Example:

NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS

NIM_CORS_ALLOW_HEADERS: str#

Comma-separated list of allowed request headers for CORS requests.

Default:

Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id

Type:

string

Example:

NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization

NIM_CORS_EXPOSE_HEADERS: str#

Comma-separated list of response headers exposed to the browser in CORS responses.

Default:

X-Request-Id

Type:

string

Example:

NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id

NIM_CORS_MAX_AGE: int#

Duration in seconds that browsers may cache CORS preflight responses.

Default:

3600

Type:

positive integer

Example:

NIM_CORS_MAX_AGE=7200
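You can verify the effective CORS policy by simulating a browser preflight with curl and inspecting the response headers. A sketch, assuming the server is reachable on localhost:8000:

```shell
# Simulate a browser preflight; the response should include
# Access-Control-Allow-Origin, Access-Control-Allow-Methods, and
# Access-Control-Max-Age headers reflecting the variables above.
curl -si -X OPTIONS http://localhost:8000/v1/chat/completions \
  -H "Origin: https://example.com" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: Content-Type, Authorization"
```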

Advanced#

The following variables control advanced argument handling and runtime behavior:

NIM_PASSTHROUGH_ARGS: str | None#

Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).

Default:

None

Type:

string

Example:

NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
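The value is a single string that gets tokenized into individual CLI arguments. The Python sketch below illustrates shell-style splitting with the standard shlex module; it shows the convention such passthrough strings rely on, not NIM's actual parser:

```python
import shlex

# NIM_PASSTHROUGH_ARGS as it would appear in the environment.
passthrough = "--enable-prefix-caching --max-num-seqs 128"

# Shell-style tokenization: a quoted value containing spaces would survive
# as a single argument, where a naive str.split() would break it apart.
argv = shlex.split(passthrough)
print(argv)  # ['--enable-prefix-caching', '--max-num-seqs', '128']
```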

NIM_STRICT_ARG_PROCESSING: bool#

Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.

Default:

False

Type:

boolean

Example:

NIM_STRICT_ARG_PROCESSING=true

NIM_DISABLE_CUDA_GRAPH: bool#

Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.

Default:

False

Type:

boolean

Example:

NIM_DISABLE_CUDA_GRAPH=true