Environment Variables#
This page documents all environment variables supported by
NIM VLM. Set variables using -e flags when you run the container:
docker run --gpus=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e NIM_LOG_LEVEL=INFO \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
Logging#
The following variables control log format and verbosity:
NIM_LOG_LEVEL#
str | None
Controls the verbosity of NIM log output. Accepts standard Python logging
levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
Default: None (uses application default)
Type: string
Example:
NIM_LOG_LEVEL=DEBUG
NIM_JSONL_LOGGING#
bool
Enables structured JSON Lines (JSONL) log output.
Default: False
Type: boolean
Example:
NIM_JSONL_LOGGING=true
Model Configuration#
The following variables control model selection, model loading, and related runtime behavior:
NIM_MODEL_PROFILE#
str | None
Selects which model profile to use. Profiles define a validated combination of
model variant, precision, and parallelism settings for a given GPU
configuration. Run list-model-profiles inside the container to see
available profiles and their IDs.
Default: auto-selected based on detected GPU hardware
Example:
NIM_MODEL_PROFILE=07cd4f2bddd7a14ca84bab0a32602889fd0ae0eb76dc2eb0fc32594d065011a4
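The `list-model-profiles` utility mentioned above can be invoked by overriding the container command. A minimal sketch, reusing the image from the example at the top of this page (the exact output format may vary between NIM releases):

```shell
docker run --rm --gpus=all \
  -e NGC_API_KEY \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0 \
  list-model-profiles
```

Copy the profile ID of interest from the output into `NIM_MODEL_PROFILE`.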
NIM_MODEL_PATH#
str | None
Model source URI or local filesystem path. Accepts hf://, ngc://, and
modelscope:// prefixes for remote repositories, or a local directory path.
When set, a runtime manifest is generated from this URI instead of using the
baked-in container manifest.
Default: None (uses baked-in manifest and NIM_MODEL_PROFILE)
Type: string
Example:
NIM_MODEL_PATH=hf://meta-llama/Llama-3.1-8B-Instruct
NIM_SERVED_MODEL_NAME#
str | None
Overrides the served model name returned in API responses. When set, the
/v1/models endpoint and response metadata use this name instead of the
default model identifier.
Default: None (uses the model’s own identifier)
Type: string
Example:
NIM_SERVED_MODEL_NAME=my-llama
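To confirm the override took effect, query the models endpoint after startup (assuming the server is listening on the default port 8000; the response follows the OpenAI-compatible schema):

```shell
curl -s http://localhost:8000/v1/models
# The "id" field of the returned model entry should read "my-llama"
# instead of the underlying model identifier.
```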
NIM_MAX_MODEL_LEN#
int | None
Overrides the maximum sequence length (context window) for the model. Values larger than the model’s trained maximum may cause errors.
Default: None (uses the model’s default from config)
Type: positive integer
Example:
NIM_MAX_MODEL_LEN=4096
NIM_TENSOR_PARALLEL_SIZE#
int | None
Overrides the tensor parallelism degree. Splits model layers across the specified number of GPUs for inference.
Default: None (auto-detected from profile)
Type: positive integer
Example:
NIM_TENSOR_PARALLEL_SIZE=2
NIM_PIPELINE_PARALLEL_SIZE#
int | None
Overrides the pipeline parallelism degree. Distributes model stages across the specified number of GPUs for inference.
Default: None (auto-detected from profile)
Type: positive integer
Example:
NIM_PIPELINE_PARALLEL_SIZE=2
NIM_NUM_COMPUTE_NODES#
int | None
Total number of compute nodes for multi-node inference. In multi-node deployments, set this on both the leader and worker nodes to the total node count (leader + workers).
Default: None (single-node operation)
Example:
NIM_NUM_COMPUTE_NODES=2
NIM_REPOSITORY_OVERRIDE#
str | None
Redirects model downloads to an external repository while preserving the NIM manifest semantics. The container still uses the baked-in manifest for profile selection, but fetches model files from the overridden source.
Default: None (downloads from the URI specified in the manifest)
Example:
NIM_REPOSITORY_OVERRIDE=s3://my-bucket/models
NIM_DISABLE_MODEL_DOWNLOAD#
bool
Skips model download during container startup. Useful in multi-node deployments where worker nodes use a pre-staged shared filesystem and only the leader node needs to download.
Default: False
Example:
NIM_DISABLE_MODEL_DOWNLOAD=true
NIM_TRUST_CUSTOM_CODE#
bool
Allows dynamic module loading for custom model code. Required for models that ship custom tokenizer or modeling files.
Default: False
Type: boolean
Example:
NIM_TRUST_CUSTOM_CODE=true
Server#
The following variables control server and health-check ports:
NIM_SERVER_PORT#
int | None
Port for the external-facing HTTP API server.
Default: None (uses container default)
Type: integer
Example:
NIM_SERVER_PORT=9000
NIM_HEALTH_PORT#
int | None
Port for the proxy health endpoints (/v1/health/live and
/v1/health/ready).
Default: None (defaults to NIM_SERVER_PORT)
Type: integer
Example:
NIM_HEALTH_PORT=8001
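With the example value above, the health endpoints move to port 8001. A quick probe sketch (assuming the container also publishes that port, e.g. with `-p 8001:8001`):

```shell
curl -s http://localhost:8001/v1/health/live
curl -s http://localhost:8001/v1/health/ready
```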
LoRA and PEFT#
The following variables control LoRA and PEFT adapter discovery and refresh behavior:
NIM_PEFT_SOURCE#
str | None
URI for the LoRA adapter source (local path or NGC URI).
Default: None (LoRA disabled)
Type: string
Example:
NIM_PEFT_SOURCE=/adapters
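When using a local path, the adapter directory on the host must also be mounted into the container at that path. A minimal sketch, where `$LOCAL_PEFT_DIRECTORY` is a hypothetical host directory containing the staged adapters:

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_PEFT_SOURCE=/adapters \
  -v "$LOCAL_PEFT_DIRECTORY:/adapters" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```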
NIM_PEFT_REFRESH_INTERVAL#
int | None
Polling interval in seconds for the dynamic LoRA watcher. When set, NIM periodically checks the PEFT source for new or removed adapters.
Default: None (dynamic reloading disabled)
Type: positive integer
Example:
NIM_PEFT_REFRESH_INTERVAL=30
NIM_PEFT_API_TIMEOUT_SECS#
float | None
Timeout in seconds for dynamic LoRA adapter API calls.
Default: 30.0
Type: positive float
Example:
NIM_PEFT_API_TIMEOUT_SECS=60
Model Cache#
The following variable controls the model cache location inside the container:
NIM_CACHE_PATH#
str
Directory path for the model and artifact cache inside the NIM container.
Default: /opt/nim/.cache
Type: string
Example:
NIM_CACHE_PATH=/mnt/models/.cache
Authentication#
The following variables provide credentials for authenticated model downloads:
NGC_API_KEY#
str | None
API key for authenticated model downloads from NGC (NVIDIA GPU Cloud). Required
when downloading models from ngc:// repositories.
Default: None
Example:
NGC_API_KEY=nvapi-...
NGC_CLI_API_KEY#
str | None
Backward-compatible NGC credential source. When both NGC_CLI_API_KEY and
NGC_API_KEY are set, NGC_CLI_API_KEY takes precedence.
Default: None
Example:
NGC_CLI_API_KEY=nvapi-...
HF_TOKEN#
str | None
Authentication token for Hugging Face Hub. Required for downloading private or
gated models from hf:// repositories.
Default: None
Example:
HF_TOKEN=hf_...
MODELSCOPE_API_TOKEN#
str | None
Authentication token for ModelScope. Required for authenticated downloads from
modelscope:// repositories and to avoid rate limiting.
Default: None
Example:
MODELSCOPE_API_TOKEN=...
SSL and TLS#
The following variables control TLS termination at the nginx proxy layer:
NIM_SSL_MODE#
str
Controls TLS termination at the nginx proxy.
DISABLED: plain HTTP (default)
TLS: server-side TLS; requires NIM_SSL_KEY_PATH and NIM_SSL_CERTS_PATH
MTLS: mutual TLS; additionally requires NIM_SSL_CA_CERTS_PATH
Default: DISABLED
Example:
NIM_SSL_MODE=TLS
NIM_SSL_KEY_PATH#
str | None
Path to the SSL private key file. Required when NIM_SSL_MODE is TLS or
MTLS.
Default: None
Example:
NIM_SSL_KEY_PATH=/etc/ssl/private/server.key
NIM_SSL_CERTS_PATH#
str | None
Path to the SSL certificate file. Required when NIM_SSL_MODE is TLS or
MTLS.
Default: None
Example:
NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt
NIM_SSL_CA_CERTS_PATH#
str | None
Path to the CA certificate file for client verification. Required when
NIM_SSL_MODE is MTLS.
Default: None
Example:
NIM_SSL_CA_CERTS_PATH=/etc/ssl/certs/ca.crt
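Putting the three SSL variables together, a server-side TLS deployment might look like the following sketch. The host certificate locations are assumptions; mount them read-only at the paths the variables point to:

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_SSL_MODE=TLS \
  -e NIM_SSL_KEY_PATH=/etc/ssl/private/server.key \
  -e NIM_SSL_CERTS_PATH=/etc/ssl/certs/server.crt \
  -v "/path/on/host/server.key:/etc/ssl/private/server.key:ro" \
  -v "/path/on/host/server.crt:/etc/ssl/certs/server.crt:ro" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```

For MTLS, additionally set and mount `NIM_SSL_CA_CERTS_PATH`.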
CORS#
These variables configure Cross-Origin Resource Sharing (CORS) policy at the nginx proxy layer.
NIM_CORS_ALLOW_ORIGINS#
str
Comma-separated list of allowed request origins, or * for any origin.
Default: *
Example:
NIM_CORS_ALLOW_ORIGINS=https://example.com
NIM_CORS_ALLOW_METHODS#
str
Allowed HTTP methods for CORS requests.
Default: GET, POST, PUT, DELETE, PATCH, OPTIONS
Example:
NIM_CORS_ALLOW_METHODS=GET, POST, OPTIONS
NIM_CORS_ALLOW_HEADERS#
str
Allowed request headers for CORS requests.
Default: Content-Type, Authorization, X-Request-Id, X-Session-Id, X-Correlation-Id
Example:
NIM_CORS_ALLOW_HEADERS=Content-Type, Authorization
NIM_CORS_EXPOSE_HEADERS#
str
Response headers that are exposed to the browser in CORS responses.
Default: X-Request-Id
Example:
NIM_CORS_EXPOSE_HEADERS=X-Request-Id, X-Correlation-Id
NIM_CORS_MAX_AGE#
int
Duration in seconds that browsers may cache CORS preflight responses.
Default: 3600
Example:
NIM_CORS_MAX_AGE=7200
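The effective CORS policy can be inspected with a manual preflight request (assuming the server runs on localhost:8000); the Access-Control-* response headers should reflect the values configured above:

```shell
curl -s -i -X OPTIONS http://localhost:8000/v1/models \
  -H "Origin: https://example.com" \
  -H "Access-Control-Request-Method: GET"
```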
Advanced#
The following variables control advanced argument handling and runtime behavior:
NIM_PASSTHROUGH_ARGS#
str | None
Passes additional vLLM CLI arguments as a single string. Useful in environments where direct CLI arguments are not available (e.g., container orchestrators).
Default: None
Type: string
Example:
NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128"
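Because the value is a single string, quote it when passing it with docker run. A sketch (the specific vLLM flags shown are only illustrative; consult the vLLM CLI reference for valid arguments):

```shell
docker run --gpus=all \
  -e NGC_API_KEY \
  -e NIM_PASSTHROUGH_ARGS="--enable-prefix-caching --max-num-seqs 128" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-content-safety:2.0.0
```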
NIM_STRICT_ARG_PROCESSING#
bool
Enables strict configuration processing. When true, conflicting configuration overrides (e.g., CLI overwriting an environment variable) raise errors instead of warnings.
Default: False
Type: boolean
Example:
NIM_STRICT_ARG_PROCESSING=true
NIM_DISABLE_CUDA_GRAPH#
bool
Disables CUDA graph optimization. May reduce GPU memory usage at the cost of inference throughput.
Default: False
Type: boolean
Example:
NIM_DISABLE_CUDA_GRAPH=true