Environment Variables#

This page documents all environment variables supported by NIM VLM. Set variables using -e flags when you run the container:

docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_LOG_LEVEL=INFO \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0

Logging#

The following variables control log format and verbosity:

NIM_LOG_LEVEL#

str | None

Controls the verbosity of NIM log output. Accepts standard Python logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.

  • Default: None (uses application default)

  • Type: string

  • Example: NIM_LOG_LEVEL=DEBUG

NIM_JSONL_LOGGING#

bool

Enables structured JSON Lines (JSONL) log output.

  • Default: False

  • Type: boolean

  • Example: NIM_JSONL_LOGGING=true

Model Configuration#

The following variables control model selection, model loading, and related runtime behavior:

NIM_MODEL_PROFILE#

str | None

Selects which model profile to use. Profiles define a validated combination of model variant, precision, and parallelism settings for a given GPU configuration. Run list-model-profiles inside the container to see available profiles and their IDs.

  • Default: auto-selected based on detected GPU hardware

  • Example: NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
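
To see which profile IDs are available for your hardware, you can invoke the list-model-profiles utility as the container command (image name reused from the example at the top of this page; yours may differ):

```shell
# Print the model profiles built into this NIM container,
# including which ones are compatible with the detected GPUs
docker run --rm --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0 \
  list-model-profiles
```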

NIM_MODEL_PATH#

str | None

Model source URI or local filesystem path. Accepts hf://, ngc://, and modelscope:// prefixes for remote repositories, or a local directory path. When set, a runtime manifest is generated from this URI instead of using the baked-in container manifest.

  • Default: None (uses baked-in manifest and NIM_MODEL_PROFILE)

  • Type: string

  • Example: NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct
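
For a gated Hugging Face model, NIM_MODEL_PATH can be combined with HF_TOKEN (documented under Authentication below). A sketch, reusing the hf:// URI from the example above and the image name from the top of this page:

```shell
# Generate a runtime manifest from a Hugging Face repository
# instead of using the baked-in container manifest
docker run --gpus=all \
  -e HF_TOKEN=$HF_TOKEN \
  -e NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```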

NIM_SERVED_MODEL_NAME#

str | None

Overrides the served model name returned in API responses. When set, the /v1/models endpoint and response metadata use this name instead of the default model identifier.

  • Default: None (uses the model’s own identifier)

  • Type: string

  • Example: NIM_SERVED_MODEL_NAME=my-llama

NIM_MAX_MODEL_LEN#

int | None

Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.

  • Default: None (uses model’s default from config)

  • Type: positive integer

  • Example: NIM_MAX_MODEL_LEN=4096

NIM_TENSOR_PARALLEL_SIZE#

int | None

Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.

  • Default: None (auto-detected from profile)

  • Type: positive integer

  • Example: NIM_TENSOR_PARALLEL_SIZE=2

NIM_PIPELINE_PARALLEL_SIZE#

int | None

Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.

  • Default: None (auto-detected from profile)

  • Type: positive integer

  • Example: NIM_PIPELINE_PARALLEL_SIZE=2

NIM_NUM_COMPUTE_NODES#

int | None

Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).

  • Default: None (single-node operation)

  • Example: NIM_NUM_COMPUTE_NODES=2
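
One way this combines with NIM_DISABLE_MODEL_DOWNLOAD (documented below) in a two-node deployment with a shared model cache, shown as docker run flags (a sketch; the leader/worker role configuration itself depends on your deployment tooling):

```shell
# Leader node: downloads the model to the shared cache
-e NIM_NUM_COMPUTE_NODES=2

# Worker nodes: same total node count, skip the download
# (model files are pre-staged on the shared filesystem)
-e NIM_NUM_COMPUTE_NODES=2 -e NIM_DISABLE_MODEL_DOWNLOAD=true
```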

NIM_REPOSITORY_OVERRIDE#

str | None

Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.

  • Default: None (downloads from the URI specified in the manifest)

  • Example: NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models

NIM_DISABLE_MODEL_DOWNLOAD#

bool

Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.

  • Default: False

  • Example: NIM_DISABLE_MODEL_DOWNLOAD=true

NIM_TRUST_CUSTOM_CODE#

bool

Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.

  • Default: False

  • Type: boolean

  • Example: NIM_TRUST_CUSTOM_CODE=true

Server#

The following variables control server and health-check ports:

NIM_SERVER_PORT#

int | None

Port for the external-facing HTTP API server.

  • Default: None (uses container default)

  • Type: integer

  • Example: NIM_SERVER_PORT=9000

NIM_HEALTH_PORT#

int | None

Port for the proxy health endpoints (/v1/health/live and /v1/health/ready).

  • Default: None (defaults to NIM_SERVER_PORT)

  • Type: integer

  • Example: NIM_HEALTH_PORT=8001
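
Assuming the container is published with -p 8000:8000 as in the example at the top of this page, the health endpoints can be probed with curl (the exact response body may vary by NIM version):

```shell
# Liveness: succeeds once the server process is up
curl -s http://localhost:8000/v1/health/live

# Readiness: succeeds only after the model is loaded and ready to serve
curl -s http://localhost:8000/v1/health/ready
```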

LoRA and PEFT#

The following variables control LoRA and PEFT adapter discovery and refresh behavior:

NIM_PEFT_SOURCE#

str | None

URI for the LoRA adapter source (local path or NGC URI).

  • Default: None (LoRA disabled)

  • Type: string

  • Example: NIM_PEFT_SOURCE=/adapters

NIM_PEFT_REFRESH_INTERVAL#

int | None

Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.

  • Default: None (dynamic reloading disabled)

  • Type: positive integer

  • Example: NIM_PEFT_REFRESH_INTERVAL=30
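
A minimal sketch of serving local LoRA adapters with dynamic refresh, assuming adapters are staged under ./loras on the host (the mount path and image name are illustrative):

```shell
# Mount host adapters into the container and poll for changes every 30s
docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$(pwd)/loras:/adapters" \
  -e NIM_PEFT_SOURCE=/adapters \
  -e NIM_PEFT_REFRESH_INTERVAL=30 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```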

NIM_PEFT_API_TIMEOUT_SECS#

float | None

Timeout in seconds for dynamic LoRA adapter API calls.

  • Default: 30.0

  • Type: positive float

  • Example: NIM_PEFT_API_TIMEOUT_SECS=60

Model Cache#

The following variable controls the model cache location inside the container:

NIM_CACHE_PATH#

str

Directory path for the model and artifact cache inside the NIM container.

  • Default: /opt/nim/.cache

  • Type: string

  • Example: NIM_CACHE_PATH=/mnt/models/.cache

Authentication#

The following variables provide credentials for authenticated model downloads:

NGC_API_KEY#

str | None

API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required when downloading models from ngc:// repositories.

  • Default: None

  • Example: NGC_API_KEY=nvapi-...

NGC_CLI_API_KEY#

str | None

Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.

  • Default: None

  • Example: NGC_CLI_API_KEY=nvapi-...

HF_TOKEN#

str | None

Authentication token for Hugging Face Hub. Required for downloading private or gated models from hf:// repositories.

  • Default: None

  • Example: HF_TOKEN=hf_...

MODELSCOPE_API_TOKEN#

str | None

Authentication token for ModelScope. Required for authenticated downloads from modelscope:// repositories and to avoid rate limiting.

  • Default: None

  • Example: MODELSCOPE_API_TOKEN=...

SSL and TLS#

The following variables control TLS termination at the nginx proxy layer:

NIM_SSL_MODE#

str | None

Controls TLS termination at the nginx proxy.

  • DISABLED — plain HTTP (default)

  • TLS — server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH

  • MTLS — mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH

  • Default: DISABLED

  • Example: NIM_SSL_MODE=TLS

NIM_SSL_KEY_PATH#

str | None

Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or MTLS.

  • Default: None

  • Example: NIM_SSL_KEY_PATH=/etc/ssl/private/server.key

NIM_SSL_CERTS_PATH#

str | None

Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or MTLS.

  • Default: None

  • Example: NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt

NIM_SSL_CA_CERTS_PATH#

str | None

Path to the CA certificate file for client verification. Required when NIM_SSL_MODE is MTLS.

  • Default: None

  • Example: NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
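
Putting the three variables together, a server-side TLS run might look like the following (certificate paths are illustrative; mount them read-only into the container):

```shell
# Terminate TLS at the nginx proxy using a mounted key and certificate
docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v /host/certs:/etc/ssl/nim:ro \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/nim/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/nim/server.crt \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```

For MTLS, additionally mount the CA certificate and set NIM_SSL_CA_CERTS_PATH.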

CORS#

These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.

NIM_CORS_ALLOW_ORIGINS#

str | None

Comma-separated list of allowed request origins, or * for any origin.

  • Default: *

  • Example: NIM_CORS_ALLOW_ORIGINS=https://example.com

NIM_CORS_ALLOW_METHODS#

str | None

Allowed HTTP methods for CORS requests.

  • Default: GET, POST, PUT, DELETE, PATCH, OPTIONS

  • Example: NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS

NIM_CORS_ALLOW_HEADERS#

str | None

Allowed request headers for CORS requests.

  • Default: Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id

  • Example: NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization

NIM_CORS_EXPOSE_HEADERS#

str | None

Response headers that are exposed to the browser in CORS responses.

  • Default: X-Request-Id

  • Example: NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id

NIM_CORS_MAX_AGE#

str | None

Duration in seconds that browsers may cache CORS preflight responses.

  • Default: 3600

  • Example: NIM_CORS_MAX_AGE=7200
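
You can verify the effective CORS policy with a preflight request against a running server; the standard Access-Control-* response headers should reflect your settings (localhost:8000 assumes the port mapping from the example at the top of this page):

```shell
# Send a CORS preflight (OPTIONS) request and inspect the CORS headers
curl -si -X OPTIONS http://localhost:8000/v1/models \
  -H "Origin: https://example.com" \
  -H "Access-Control-Request-Method: POST" \
  | grep -i "access-control-"
```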

Advanced#

The following variables control advanced argument handling and runtime behavior:

NIM_PASSTHROUGH_ARGS#

str | None

Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).

  • Default: None

  • Type: string

  • Example: NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
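
Because the whole value is passed as one string, quote it once on the docker command line so the shell does not split it (the flags shown are standard vLLM engine arguments; availability depends on the vLLM version bundled in the container):

```shell
# Forward extra vLLM CLI arguments through a single environment variable
docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```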

NIM_STRICT_ARG_PROCESSING#

bool

Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.

  • Default: False

  • Type: boolean

  • Example: NIM_STRICT_ARG_PROCESSING=true

NIM_DISABLE_CUDA_GRAPH#

bool

Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.

  • Default: False

  • Type: boolean

  • Example: NIM_DISABLE_CUDA_GRAPH=true