Environment Variables#
This page documents all environment variables supported by
NIM VLM. Set variables using -e flags when you run the container:
docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_LOG_LEVEL=INFO \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
Logging#
The following variables control log format and verbosity:
NIM_LOG_LEVEL#
str | None
Controls the verbosity of NIM log output. Accepts standard Python logging
levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
Default: None (uses application default)
Type: string
Example:
NIM_LOG_LEVEL=DEBUG
NIM_JSONL_LOGGING#
bool
Enables structured JSON Lines (JSONL) log output.
Default: False
Type: boolean
Example:
NIM_JSONL_LOGGING=true
Model Configuration#
The following variables control model selection, model loading, and related runtime behavior:
NIM_MODEL_PROFILE#
str | None
Selects which model profile to use. Profiles define a validated combination of
model variant, precision, and parallelism settings for a given GPU
configuration. Run list-model-profiles inside the container to see
available profiles and their IDs.
Default: auto-selected based on detected GPU hardware
Example:
NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
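The `list-model-profiles` utility mentioned above can be invoked by overriding the container command. A minimal sketch, reusing the image from the example at the top of this page (the exact output format may vary between NIM releases):

```shell
docker run --rm --gpus=all \
  -e NGC_API_KEY \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0 \
  list-model-profiles
```

Copy the profile ID of interest from the output into `NIM_MODEL_PROFILE`.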
NIM_MODEL_PATH#
str | None
Model source URI or local filesystem path. Accepts hf://, ngc://, and
modelscope:// prefixes for remote repositories, or a local directory path.
When set, a runtime manifest is generated from this URI instead of using the
baked-in container manifest.
Default: None (uses baked-in manifest and NIM_MODEL_PROFILE)
Type: string
Example:
NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct
NIM_SERVED_MODEL_NAME#
str | None
Overrides the served model name returned in API responses. When set, the
/v1/models endpoint and response metadata use this name instead of the
default model identifier.
Default: None (uses the model’s own identifier)
Type: string
Example:
NIM_SERVED_MODEL_NAME=my-llama
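To confirm the override took effect, query the models endpoint after startup (assuming the server is listening on the default port 8000; the response follows the OpenAI-compatible schema):

```shell
curl -s http://localhost:8000/v1/models
# The "id" field of the returned model entry should read "my-llama"
# instead of the underlying model identifier.
```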
NIM_MAX_MODEL_LEN#
int | None
Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.
Default: None (uses the model’s default from config)
Type: positive integer
Example:
NIM_MAX_MODEL_LEN=4096
NIM_TENSOR_PARALLEL_SIZE#
int | None
Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.
Default: None (auto-detected from profile)
Type: positive integer
Example:
NIM_TENSOR_PARALLEL_SIZE=2
NIM_PIPELINE_PARALLEL_SIZE#
int | None
Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.
Default: None (auto-detected from profile)
Type: positive integer
Example:
NIM_PIPELINE_PARALLEL_SIZE=2
NIM_NUM_COMPUTE_NODES#
int | None
Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).
Default: None (single-node operation)
Example:
NIM_NUM_COMPUTE_NODES=2
NIM_REPOSITORY_OVERRIDE#
str | None
Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.
Default: None (downloads from the URI specified in the manifest)
Example:
NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models
NIM_DISABLE_MODEL_DOWNLOAD#
bool
Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.
Default: False
Example:
NIM_DISABLE_MODEL_DOWNLOAD=true
NIM_TRUST_CUSTOM_CODE#
bool
Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.
Default: False
Type: boolean
Example:
NIM_TRUST_CUSTOM_CODE=true
Server#
The following variables control server and health-check ports:
NIM_SERVER_PORT#
int | None
Port for the external-facing HTTP API server.
Default: None (uses container default)
Type: integer
Example:
NIM_SERVER_PORT=9000
NIM_HEALTH_PORT#
int | None
Port for the proxy health endpoints (/v1/health/live and
/v1/health/ready).
Default: None (defaults to NIM_SERVER_PORT)
Type: integer
Example:
NIM_HEALTH_PORT=8001
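With the example value above, the health endpoints move to port 8001. A quick probe sketch (assuming the container also publishes that port, e.g. with `-p 8001:8001`):

```shell
curl -s http://localhost:8001/v1/health/live
curl -s http://localhost:8001/v1/health/ready
```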
LoRA and PEFT#
The following variables control LoRA and PEFT adapter discovery and refresh behavior:
NIM_PEFT_SOURCE#
str | None
URI for the LoRA adapter source (local path or NGC URI).
Default: None (LoRA disabled)
Type: string
Example:
NIM_PEFT_SOURCE=/adapters
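When using a local path, the adapter directory on the host must also be mounted into the container at that path. A minimal sketch, where `$LOCAL_PEFT_DIRECTORY` is a hypothetical host directory containing the staged adapters:

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_PEFT_SOURCE=/adapters \
  -v "$LOCAL_PEFT_DIRECTORY:/adapters" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```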
NIM_PEFT_REFRESH_INTERVAL#
int | None
Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.
Default: None (dynamic reloading disabled)
Type: positive integer
Example:
NIM_PEFT_REFRESH_INTERVAL=30
NIM_PEFT_API_TIMEOUT_SECS#
float | None
Timeout in seconds for dynamic LoRA adapter API calls.
Default: 30.0
Type: positive float
Example:
NIM_PEFT_API_TIMEOUT_SECS=60
Model Cache#
The following variable controls the model cache location inside the container:
NIM_CACHE_PATH#
str
Directory path for the model and artifact cache inside the NIM container.
Default: /opt/nim/.cache
Type: string
Example:
NIM_CACHE_PATH=/mnt/models/.cache
Authentication#
The following variables provide credentials for authenticated model downloads:
NGC_API_KEY#
str | None
API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required
when downloading models from ngc:// repositories.
Default: None
Example:
NGC_API_KEY=nvapi-...
NGC_CLI_API_KEY#
str | None
Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and
NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.
Default: None
Example:
NGC_CLI_API_KEY=nvapi-...
HF_TOKEN#
str | None
Authentication token for Hugging Face Hub. Required for downloading private or
gated models from hf:// repositories.
Default: None
Example:
HF_TOKEN=hf_...
MODELSCOPE_API_TOKEN#
str | None
Authentication token for ModelScope. Required for authenticated downloads from
modelscope:// repositories and to avoid rate limiting.
Default: None
Example:
MODELSCOPE_API_TOKEN=...
SSL and TLS#
The following variables control TLS termination at the nginx proxy layer:
NIM_SSL_MODE#
str
Controls TLS termination at the nginx proxy.
DISABLED: plain HTTP (default)
TLS: server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH
MTLS: mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH
Default: DISABLED
Example:
NIM_SSL_MODE=TLS
NIM_SSL_KEY_PATH#
str | None
Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or
MTLS.
Default: None
Example:
NIM_SSL_KEY_PATH=/etc/ssl/private/server.key
NIM_SSL_CERTS_PATH#
str | None
Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or
MTLS.
Default: None
Example:
NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt
NIM_SSL_CA_CERTS_PATH#
str | None
Path to the CA certificate file for client verification. Required when
NIM_SSL_MODE is MTLS.
Default: None
Example:
NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
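Putting the three SSL variables together, a server-side TLS deployment might look like the following sketch. The host certificate locations are assumptions; mount them read-only at the paths the variables point to:

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/private/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt \
  -v "/path/on/host/server.key:/etc/ssl/private/server.key:ro" \
  -v "/path/on/host/server.crt:/etc/ssl/certs/server.crt:ro" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```

For MTLS, additionally set and mount `NIM_SSL_CA_CERTS_PATH`.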
CORS#
These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.
NIM_CORS_ALLOW_ORIGINS#
str
Comma-separated list of allowed request origins, or * for any origin.
Default: *
Example:
NIM_CORS_ALLOW_ORIGINS=https://example.com
NIM_CORS_ALLOW_METHODS#
str
Allowed HTTP methods for CORS requests.
Default: GET, POST, PUT, DELETE, PATCH, OPTIONS
Example:
NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS
NIM_CORS_ALLOW_HEADERS#
str
Allowed request headers for CORS requests.
Default: Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id
Example:
NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization
NIM_CORS_EXPOSE_HEADERS#
str
Response headers that are exposed to the browser in CORS responses.
Default: X-Request-Id
Example:
NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id
NIM_CORS_MAX_AGE#
int
Duration in seconds that browsers may cache CORS preflight responses.
Default: 3600
Example:
NIM_CORS_MAX_AGE=7200
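The effective CORS policy can be inspected with a manual preflight request (assuming the server runs on localhost:8000); the Access-Control-* response headers should reflect the values configured above:

```shell
curl -s -i -X OPTIONS http://localhost:8000/v1/models \
  -H "Origin: https://example.com" \
  -H "Access-Control-Request-Method: GET"
```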
Advanced#
The following variables control advanced argument handling and runtime behavior:
NIM_PASSTHROUGH_ARGS#
str | None
Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).
Default: None
Type: string
Example:
NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
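Because the value is a single string, quote it when passing it with docker run. A sketch (the specific vLLM flags shown are only illustrative; consult the vLLM CLI reference for valid arguments):

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```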
NIM_STRICT_ARG_PROCESSING#
bool
Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.
Default: False
Type: boolean
Example:
NIM_STRICT_ARG_PROCESSING=true
NIM_DISABLE_CUDA_GRAPH#
bool
Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.
Default: False
Type: boolean
Example:
NIM_DISABLE_CUDA_GRAPH=true