Environment Variables#

This page documents all environment variables supported by NIM LLM. Set variables using -e flags when you run the container:

docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v /path/to/cache:/opt/nim/.cache \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -e NIM_SERVER_PORT=8000 \
  -e NIM_LOG_LEVEL=INFO \
  -e NGC_API_KEY \
  -e HF_TOKEN \
  <image>

Logging#

The following variables control log format and verbosity:

NIM_LOG_LEVEL: str | None#

Controls the verbosity of NIM log output. Accepts standard Python logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.

Default:

None (uses application default)

Type:

string

Example:

NIM_LOG_LEVEL=DEBUG

NIM_JSONL_LOGGING: bool#

Enables structured JSON Lines (JSONL) log output.

Default:

False

Type:

boolean

Example:

NIM_JSONL_LOGGING=true
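With JSONL output enabled, each log line is a self-contained JSON object, which keeps logs machine-parseable even when truncated. The sketch below shows why that matters for log pipelines; the field names (timestamp, level, message) are illustrative assumptions, not the exact NIM log schema:

```python
import json

# Two illustrative JSONL log lines; the field names are assumptions,
# not the exact schema NIM emits.
raw = "\n".join([
    '{"timestamp": "2024-01-01T00:00:00Z", "level": "INFO", "message": "server started"}',
    '{"timestamp": "2024-01-01T00:00:01Z", "level": "DEBUG", "message": "profile selected"}',
])

# Each line parses independently, so a truncated log loses at most one
# record and the rest stays usable by downstream collectors.
records = [json.loads(line) for line in raw.splitlines()]
levels = [r["level"] for r in records]
print(levels)  # ['INFO', 'DEBUG']
```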

For usage details and examples, refer to Logging and Observability.

Model Configuration#

The following variables control model selection, model loading, and related runtime behavior:

NIM_MODEL_PROFILE: str | None#

Selects which model profile to use. Profiles define a validated combination of model variant, precision, and parallelism settings for a given GPU configuration. Run list-model-profiles inside the container to see available profiles and their IDs.

Default:

None (auto-selected based on detected GPU hardware)

Type:

string

Example:

NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
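To inspect available profiles before starting the server, you can run the list-model-profiles utility as a one-off container command. A sketch, assuming the same image and NGC credentials as the run command at the top of this page:

```shell
# Print available profile IDs and their compatibility with the detected GPUs.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  <image> \
  list-model-profiles
```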

NIM_MODEL_PATH: str | None#

Model source URI or local filesystem path. Accepts hf://, ngc://, and modelscope:// prefixes for remote repositories, or a local directory path. When set, a runtime manifest is generated from this URI instead of using the baked-in container manifest.

Default:

None (uses baked-in manifest and NIM_MODEL_PROFILE)

Type:

string

Example:

NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct

NIM_SERVED_MODEL_NAME: str | None#

Overrides the served model name returned in API responses. When set, the /v1/models endpoint and response metadata use this name instead of the default model identifier.

Default:

None (uses the model’s own identifier)

Type:

string

Example:

NIM_SERVED_MODEL_NAME=my-llama
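A quick way to confirm the override took effect is to query /v1/models and then use the new name in a request. A sketch with curl, assuming the server listens on localhost:8000 and exposes the OpenAI-compatible /v1/chat/completions route:

```shell
# With NIM_SERVED_MODEL_NAME=my-llama set, the model "id" field in this
# listing should report the override instead of the default identifier.
curl -s http://localhost:8000/v1/models

# Requests then reference the overridden name in the "model" field.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-llama", "messages": [{"role": "user", "content": "Hello"}]}'
```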

NIM_MAX_MODEL_LEN: int | None#

Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.

Default:

None (uses model’s default from config)

Type:

positive integer

Example:

NIM_MAX_MODEL_LEN=4096

NIM_TENSOR_PARALLEL_SIZE: int | None#

Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.

Default:

None (auto-detected from profile)

Type:

positive integer

Example:

NIM_TENSOR_PARALLEL_SIZE=2

NIM_PIPELINE_PARALLEL_SIZE: int | None#

Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.

Default:

None (auto-detected from profile)

Type:

positive integer

Example:

NIM_PIPELINE_PARALLEL_SIZE=2
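The two parallelism degrees compose multiplicatively: the runtime generally expects tensor-parallel size × pipeline-parallel size to equal the number of visible GPUs. A sketch for a single node with four GPUs, using the same image placeholder as the example at the top of this page:

```shell
# 4 GPUs total = 2-way tensor parallel x 2-way pipeline parallel
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -e NIM_TENSOR_PARALLEL_SIZE=2 \
  -e NIM_PIPELINE_PARALLEL_SIZE=2 \
  -e NGC_API_KEY \
  <image>
```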

NIM_NUM_COMPUTE_NODES: int | None#

Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).

Default:

None (single-node operation)

Type:

positive integer

Example:

NIM_NUM_COMPUTE_NODES=2

NIM_REPOSITORY_OVERRIDE: str | None#

Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.

Default:

None (downloads from the URI specified in the manifest)

Type:

string

Example:

NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models

NIM_DISABLE_MODEL_DOWNLOAD: bool#

Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.

Default:

False

Type:

boolean

Example:

NIM_DISABLE_MODEL_DOWNLOAD=true
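For example, a worker node in a two-node deployment might mount the pre-staged shared cache and skip the download entirely. A sketch, assuming /shared/nim-cache is the pre-staged path; any additional leader/worker coordination settings your deployment requires are omitted here:

```shell
# Worker node: weights are pre-staged on a shared filesystem, so skip
# the download and reuse the shared cache.
docker run -d --rm --gpus all \
  -v /shared/nim-cache:/opt/nim/.cache \
  -e NIM_DISABLE_MODEL_DOWNLOAD=true \
  -e NIM_NUM_COMPUTE_NODES=2 \
  <image>
```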

NIM_TRUST_CUSTOM_CODE: bool#

Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.

Default:

False

Type:

boolean

Example:

NIM_TRUST_CUSTOM_CODE=true

Server#

The following variables control server and health-check ports:

NIM_SERVER_PORT: int | None#

Port for the external-facing HTTP API server.

Default:

None (uses container default)

Type:

integer

Example:

NIM_SERVER_PORT=9000

NIM_HEALTH_PORT: int | None#

Port for the proxy health endpoints (/v1/health/live and /v1/health/ready).

Default:

None (defaults to NIM_SERVER_PORT)

Type:

integer

Example:

NIM_HEALTH_PORT=8001

LoRA and PEFT#

The following variables control LoRA and PEFT adapter discovery and refresh behavior:

NIM_PEFT_SOURCE: str | None#

URI for the LoRA adapter source (local path or NGC URI).

Default:

None (LoRA disabled)

Type:

string

Example:

NIM_PEFT_SOURCE=/adapters

NIM_PEFT_REFRESH_INTERVAL: int | None#

Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.

Default:

None (dynamic reloading disabled)

Type:

positive integer

Example:

NIM_PEFT_REFRESH_INTERVAL=30
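Conceptually, the watcher behaves like a periodic directory scan that diffs the adapter set between polls. The Python sketch below illustrates that polling pattern; it is not NIM's internal implementation, and scan_adapters and watch are hypothetical names:

```python
import os
import time


def scan_adapters(peft_source: str) -> set[str]:
    """Return the set of adapter directory names under the PEFT source."""
    return {
        name for name in os.listdir(peft_source)
        if os.path.isdir(os.path.join(peft_source, name))
    }


def watch(peft_source: str, interval: int, cycles: int) -> None:
    """Poll the PEFT source every `interval` seconds, reporting changes."""
    known = scan_adapters(peft_source)
    for _ in range(cycles):
        time.sleep(interval)  # NIM_PEFT_REFRESH_INTERVAL plays this role
        current = scan_adapters(peft_source)
        for name in sorted(current - known):
            print(f"adapter added: {name}")
        for name in sorted(known - current):
            print(f"adapter removed: {name}")
        known = current
```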

NIM_PEFT_API_TIMEOUT_SECS: float | None#

Timeout in seconds for dynamic LoRA adapter API calls.

Default:

30.0

Type:

positive float

Example:

NIM_PEFT_API_TIMEOUT_SECS=60

Model Cache#

The following variable controls the local model cache location:

NIM_CACHE_PATH: str#

Directory path for NIM’s local model and artifact cache.

Default:

/opt/nim/.cache

Type:

string

Example:

NIM_CACHE_PATH=/mnt/models/.cache

Authentication#

The following variables provide credentials for authenticated model downloads:

NGC_API_KEY: str | None#

API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required when downloading models from ngc:// repositories.

Default:

None

Type:

string

Example:

NGC_API_KEY=nvapi-...

NGC_CLI_API_KEY: str | None#

Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.

Default:

None

Type:

string

Example:

NGC_CLI_API_KEY=nvapi-...

HF_TOKEN: str | None#

Authentication token for Hugging Face Hub. Required for downloading private or gated models from hf:// repositories.

Default:

None

Type:

string

Example:

HF_TOKEN=hf_...

MODELSCOPE_API_TOKEN: str | None#

Authentication token for ModelScope. Required for authenticated downloads from modelscope:// repositories and to avoid rate limiting.

Default:

None

Type:

string

Example:

MODELSCOPE_API_TOKEN=...

SSL and TLS#

The following variables control TLS termination at the nginx proxy layer:

NIM_SSL_MODE: str#

Controls TLS termination at the nginx proxy.

  • DISABLED – plain HTTP (default)

  • TLS – server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH

  • MTLS – mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH

Default:

DISABLED

Type:

string

Example:

NIM_SSL_MODE=TLS

NIM_SSL_KEY_PATH: str | None#

Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_KEY_PATH=/etc/ssl/private/server.key

NIM_SSL_CERTS_PATH: str | None#

Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt

NIM_SSL_CA_CERTS_PATH: str | None#

Path to the CA certificate file for client verification. Required when NIM_SSL_MODE is MTLS.

Default:

None

Type:

string

Example:

NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
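Putting the three path variables together: for local testing you can generate a self-signed certificate with openssl, mount the files into the container, and point the variables at the mounted paths. A sketch, not production guidance; the paths and image placeholder are assumptions:

```shell
# Generate a self-signed key and certificate for testing only.
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout server.key -out server.crt -subj "/CN=localhost"

# Mount the files read-only and enable server-side TLS.
docker run -d --rm --gpus all \
  -p 8000:8000 \
  -v "$(pwd)/server.key:/etc/ssl/private/server.key:ro" \
  -v "$(pwd)/server.crt:/etc/ssl/certs/server.crt:ro" \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/private/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt \
  <image>
```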

CORS#

These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.

NIM_CORS_ALLOW_ORIGINS: str#

Comma-separated list of allowed request origins, or * for any origin.

Default:

*

Type:

string

Example:

NIM_CORS_ALLOW_ORIGINS=https://example.com

NIM_CORS_ALLOW_METHODS: str#

Comma-separated list of allowed HTTP methods for CORS requests.

Default:

GET, POST, PUT, DELETE, PATCH, OPTIONS

Type:

string

Example:

NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS

NIM_CORS_ALLOW_HEADERS: str#

Comma-separated list of allowed request headers for CORS requests.

Default:

Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id

Type:

string

Example:

NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization

NIM_CORS_EXPOSE_HEADERS: str#

Comma-separated list of response headers exposed to the browser in CORS responses.

Default:

X-Request-Id

Type:

string

Example:

NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id

NIM_CORS_MAX_AGE: int#

Duration in seconds that browsers may cache CORS preflight responses.

Default:

3600

Type:

positive integer

Example:

NIM_CORS_MAX_AGE=7200
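You can verify the effective CORS policy by simulating a browser preflight with curl and inspecting the response headers. A sketch, assuming the server is reachable on localhost:8000:

```shell
# Simulate a browser preflight; the response should include
# Access-Control-Allow-Origin, Access-Control-Allow-Methods, and
# Access-Control-Max-Age headers reflecting the variables above.
curl -si -X OPTIONS http://localhost:8000/v1/chat/completions \
  -H "Origin: https://example.com" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: Content-Type, Authorization"
```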

Advanced#

The following variables control advanced argument handling and runtime behavior:

NIM_PASSTHROUGH_ARGS: str | None#

Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).

Default:

None

Type:

string

Example:

NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
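The value is a single string that gets tokenized into individual CLI arguments. The Python sketch below illustrates shell-style splitting with the standard shlex module; it shows the convention such passthrough strings rely on, not NIM's actual parser:

```python
import shlex

# NIM_PASSTHROUGH_ARGS as it would appear in the environment.
passthrough = "--enable-prefix-caching --max-num-seqs 128"

# Shell-style tokenization: a quoted value containing spaces would survive
# as a single argument, where a naive str.split() would break it apart.
argv = shlex.split(passthrough)
print(argv)  # ['--enable-prefix-caching', '--max-num-seqs', '128']
```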

NIM_STRICT_ARG_PROCESSING: bool#

Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.

Default:

False

Type:

boolean

Example:

NIM_STRICT_ARG_PROCESSING=true

NIM_DISABLE_CUDA_GRAPH: bool#

Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.

Default:

False

Type:

boolean

Example:

NIM_DISABLE_CUDA_GRAPH=true