Notes on NIM Container Variants#
Some NIMs are built with packages that differ from the standard base Docker container. These NIMs can better access features specific to a particular model, or can run on GPUs before those GPUs are fully supported in the main source code branch. These NIMs, also known as NIM container variants, are designated by the -variant suffix in their version tag name.
NIM container variants have important underlying differences from NIMs built with the standard base container, and the differences vary by model. This page documents these differences with respect to the features and functionality of LLM NIM container version 1.14.0. Refer to the following:
Llama-3.1-8b-Instruct-DGX-Spark#
Deployment#
Refer to the NGC catalog page for more information. You can also view the Llama-3.1-8b-Instruct-DGX-Spark deployment guide on build.nvidia.com.
Environment Variables#
Not Supported#
The following environment variables aren’t currently supported:
NIM_SCHEDULER_POLICY
NIM_TOKENIZER_MODE: Defaults to fast mode
NIM_CUSTOM_GUIDED_DECODING_BACKENDS
NIM_GUIDED_DECODING_BACKEND
NIM_KV_CACHE_HOST_MEM_FRACTION
NIM_ENABLE_KV_CACHE_HOST_OFFLOAD
NIM_ENABLE_PROMPT_LOGPROBS
NIM_MAX_CPU_LORAS
NIM_MAX_GPU_LORAS
NIM_PEFT_REFRESH_INTERVAL
NIM_PEFT_SOURCE
NIM_RELAX_MEM_CONSTRAINTS
NIM_CUSTOM_MODEL_NAME
NIM_DISABLE_OVERLAP_SCHEDULING
NIM_ENABLE_DP_ATTENTION
NIM_LOW_MEMORY_MODE
NIM_MANIFEST_ALLOW_UNSAFE: No longer required
NIM_NUM_KV_CACHE_SEQ_LENS
NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True
SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead
NIM_FT_MODEL
NIM_DISABLE_CUDA_GRAPH: Defaults to False
NIM_FORCE_DETERMINISTIC
NIM_REWARD_LOGITS_RANGE
NIM_REWARD_MODEL
NIM_REWARD_MODEL_STRING
Note
Most of these variables are not used with an SGLang backend.
New Additions#
The following new environment variables are supported:
Note
Some variables may not be applicable to every model (for example, not all models support tool calling or thinking).
NIM_GPU_MEM_FRACTION: Sets the GPU memory usage as a fraction of the maximum amount (from 0.0 to 1.0). For example, this is set to 60 GB by default (NIM_GPU_MEM_FRACTION=0.5) for this NIM.
NIM_TAGS_SELECTOR: Filters tags in the automatic profile selector. You can use a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example, set NIM_TAGS_SELECTOR="profile=latency" to automatically select the latency profile, or set NIM_TAGS_SELECTOR="tp=4" to select a throughput profile that runs on 4 GPUs.
REASONING_PARSER: Set to 1 to turn thinking on.
TOOL_CALL_PARSER: Set to 1 to turn tool calling on.
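The key-value filtering that NIM_TAGS_SELECTOR performs can be sketched in a few lines. This is an illustrative sketch of the matching logic, not NIM source code, and the profile entries below are made-up stand-ins rather than actual NIM profile data.

```python
def parse_selector(selector: str) -> dict:
    """Parse a comma-separated list of key=value pairs, e.g. "tp=4,profile=latency"."""
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {k.strip(): v.strip() for k, v in pairs}

def select_profiles(profiles: list[dict], selector: str) -> list[dict]:
    """Keep only the profiles whose properties match every key-value pair."""
    wanted = parse_selector(selector)
    return [p for p in profiles
            if all(p.get(k) == v for k, v in wanted.items())]

# Illustrative profile data (not real NIM profiles).
profiles = [
    {"profile": "latency", "tp": "1"},
    {"profile": "throughput", "tp": "4"},
]

print(select_profiles(profiles, "tp=4"))
# → [{'profile': 'throughput', 'tp': '4'}]
```

Every pair in the selector must match, so a selector such as "profile=latency,tp=4" would select nothing from the sample data above.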
API Compatibility#
The following API features are not supported:
logprobs
suffix
Guided decoding (including guided_whitespace_pattern and structured_generation)
Echo and role configuration
Reward
Llama API
nvext
However, nvext features are supported using different parameters in the top-level payload.
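When adapting existing client code to this variant, one defensive approach is to strip the unsupported request fields listed above before sending the payload. This is an illustrative sketch, not NIM client code; the model name is a placeholder, and the unsupported-field names are taken from the list on this page.

```python
# Request fields this variant does not support, per the list above.
UNSUPPORTED = {"logprobs", "suffix", "echo", "nvext",
               "guided_whitespace_pattern", "structured_generation"}

def build_payload(model: str, prompt: str, **extra) -> dict:
    """Assemble a chat completions payload, dropping unsupported options."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    for key, value in extra.items():
        if key in UNSUPPORTED:
            continue  # drop fields this variant rejects
        payload[key] = value
    return payload

# Placeholder model name; logprobs is silently removed.
payload = build_payload("meta/llama-3.1-8b-instruct", "Hello",
                        temperature=0.2, logprobs=True)
print(sorted(payload))
# → ['max_tokens', 'messages', 'model', 'temperature']
```

Silently dropping fields keeps requests portable across variants; logging each dropped key instead may be preferable during debugging.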
Security Features#
No changes to security features. These models maintain the same security features and capabilities as standard models.
Usage Changes and Features#
The container docker run command doesn't support the -u $(id -u) parameter.
For air gap deployment, add the following parameters to the docker run command:
-e NIM_DISABLE_MODEL_DOWNLOAD=1 \
-v <local-model-path>:<model-weight-path> \
-e NIM_MODEL_PATH=<model-weight-path> \
No other changes to usage and features are needed.
Qwen3 Next 80B A3B Thinking#
Environment Variables#
Not Supported#
The following environment variables aren’t currently supported:
NIM_MAX_MODEL_LEN
NIM_SCHEDULER_POLICY
NIM_TOKENIZER_MODE: Defaults to fast mode
NIM_CUSTOM_GUIDED_DECODING_BACKENDS
NIM_GUIDED_DECODING_BACKEND
NIM_KV_CACHE_HOST_MEM_FRACTION
NIM_ENABLE_KV_CACHE_HOST_OFFLOAD
NIM_ENABLE_KV_CACHE_REUSE
NIM_ENABLE_PROMPT_LOGPROBS
NIM_MAX_CPU_LORAS
NIM_MAX_GPU_LORAS
NIM_PEFT_REFRESH_INTERVAL
NIM_PEFT_SOURCE
NIM_RELAX_MEM_CONSTRAINTS
NIM_CUSTOM_MODEL_NAME
NIM_DISABLE_OVERLAP_SCHEDULING
NIM_ENABLE_DP_ATTENTION
NIM_LOW_MEMORY_MODE
NIM_MANIFEST_ALLOW_UNSAFE: No longer required
NIM_NUM_KV_CACHE_SEQ_LENS
NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True
SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead
NIM_FT_MODEL
NIM_DISABLE_CUDA_GRAPH: Defaults to False
NIM_FORCE_DETERMINISTIC
NIM_REWARD_LOGITS_RANGE
NIM_REWARD_MODEL
NIM_REWARD_MODEL_STRING
Note
Most of these variables are not used with an SGLang backend.
New Additions#
The following new environment variables are supported:
Note
Some variables may not be applicable to every model (for example, not all models support tool calling or thinking).
NIM_TAGS_SELECTOR: Filters tags in the automatic profile selector. You can use a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example, set NIM_TAGS_SELECTOR="profile=latency" to automatically select the latency profile, or set NIM_TAGS_SELECTOR="tp=4" to select a throughput profile that runs on 4 GPUs.
DISABLE_RADIX_CACHE: Set to 1 to disable KV cache reuse.
NIM_ENABLE_MTP: Set to 1 to enable the LLM to generate several tokens at once, boosting speed, efficiency, and reasoning.
REASONING_PARSER: Set to 1 to turn thinking on.
TOOL_CALL_PARSER: Set to 1 to turn tool calling on.
NIM_CONFIG_FILE: Specifies a configuration YAML file for advanced parameter tuning. Use this file to override the default NIM configuration values. You must convert the hyphens in server argument names to underscores. For example, the following SGLang command arguments:
python -m sglang.launch_server --model-path XXX --tp-size 4 \
  --context-length 262144 --mem-fraction-static 0.8
are defined by the following content in the configuration YAML file:
tp_size: 4
context_length: 262144
mem_fraction_static: 0.8
Default value: None.
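The hyphen-to-underscore conversion described above is mechanical, so a small helper can generate the YAML keys from an existing SGLang command line. This is a convenience sketch, not part of NIM; it assumes the simple flag/value alternation that sglang.launch_server arguments follow.

```python
def args_to_config(argv: list[str]) -> dict:
    """Turn SGLang CLI tokens like ["--tp-size", "4"] into {"tp_size": "4"}."""
    config = {}
    key = None
    for token in argv:
        if token.startswith("--"):
            key = token[2:].replace("-", "_")
            config[key] = True  # bare flag until a value follows
        elif key is not None:
            config[key] = token
            key = None
    return config

args = ["--tp-size", "4", "--context-length", "262144",
        "--mem-fraction-static", "0.8"]
print(args_to_config(args))
# → {'tp_size': '4', 'context_length': '262144', 'mem_fraction_static': '0.8'}
```

Dumping the resulting dict with a YAML serializer yields a file suitable for NIM_CONFIG_FILE; note that values stay strings here, which YAML tooling can coerce as needed.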
API Compatibility#
The following API features are not supported:
logprobs
suffix
Guided decoding (including guided_whitespace_pattern and structured_generation)
Echo and role configuration
Reward
Llama API
nvext
However, nvext features are supported using different parameters in the top-level payload.
Security Features#
No changes to security features. These models maintain the same security features and capabilities as standard models.
Usage Changes and Features#
The container docker run command doesn't support the -u $(id -u) parameter.
For air gap deployment, add the following parameters to the docker run command:
-e NIM_DISABLE_MODEL_DOWNLOAD=1 \
-v <local-model-path>:<model-weight-path> \
-e NIM_MODEL_PATH=<model-weight-path> \
No other changes to usage and features are needed.
Qwen3-32B NIM for DGX Spark#
Deployment#
Refer to the NGC catalog page for more information. You can also view the Qwen3-32B NIM for DGX Spark deployment guide on build.nvidia.com.
Environment Variables#
Not Supported#
The following environment variables aren’t currently supported:
NIM_SCHEDULER_POLICY
NIM_TOKENIZER_MODE: Defaults to fast mode
NIM_CUSTOM_GUIDED_DECODING_BACKENDS
NIM_GUIDED_DECODING_BACKEND
NIM_KV_CACHE_HOST_MEM_FRACTION
NIM_ENABLE_KV_CACHE_HOST_OFFLOAD
NIM_ENABLE_PROMPT_LOGPROBS
NIM_MAX_CPU_LORAS
NIM_MAX_GPU_LORAS
NIM_PEFT_REFRESH_INTERVAL
NIM_PEFT_SOURCE
NIM_RELAX_MEM_CONSTRAINTS
NIM_CUSTOM_MODEL_NAME
NIM_DISABLE_OVERLAP_SCHEDULING
NIM_ENABLE_DP_ATTENTION
NIM_LOW_MEMORY_MODE
NIM_MANIFEST_ALLOW_UNSAFE: No longer required
NIM_NUM_KV_CACHE_SEQ_LENS
NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True
SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead
NIM_FT_MODEL
NIM_DISABLE_CUDA_GRAPH: Defaults to False
NIM_FORCE_DETERMINISTIC
NIM_REWARD_LOGITS_RANGE
NIM_REWARD_MODEL
NIM_REWARD_MODEL_STRING
Note
Most of these variables are not used with an SGLang backend.
New Additions#
The following new environment variables are supported:
Note
Some variables may not be applicable to every model (for example, not all models support tool calling or thinking).
NIM_GPU_MEM_FRACTION: Sets the GPU memory usage as a fraction of the maximum amount (from 0.0 to 1.0). For example, this is set to 108 GB by default (NIM_GPU_MEM_FRACTION=0.9) for this NIM.
NIM_TAGS_SELECTOR: Filters tags in the automatic profile selector. You can use a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example, set NIM_TAGS_SELECTOR="profile=latency" to automatically select the latency profile, or set NIM_TAGS_SELECTOR="tp=4" to select a throughput profile that runs on 4 GPUs.
REASONING_PARSER: Set to 1 to turn thinking on.
TOOL_CALL_PARSER: Set to 1 to turn tool calling on.
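The relationship between NIM_GPU_MEM_FRACTION and the resulting memory budget is simple multiplication. Working backward from the defaults on this page (0.9 yielding 108 GB here, and 0.5 yielding 60 GB for the Llama-3.1-8b variant) implies a 120 GB total budget; that total is inferred from these figures, not stated elsewhere. A quick sketch:

```python
def nim_gpu_mem_gb(total_gb: float, fraction: float) -> float:
    """GPU memory the NIM will use, in GB, given the total budget and fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return total_gb * fraction

print(nim_gpu_mem_gb(120, 0.9))  # → 108.0 (this NIM's default)
print(nim_gpu_mem_gb(120, 0.5))  # → 60.0 (Llama-3.1-8b variant's default)
```

Raising the fraction leaves more room for KV cache at the cost of memory available to other processes on the system.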
API Compatibility#
The following API features are not supported:
logprobs
suffix
Guided decoding (including guided_whitespace_pattern and structured_generation)
Echo and role configuration
Reward
Llama API
nvext
However, nvext features are supported using different parameters in the top-level payload.
Security Features#
No changes to security features. These models maintain the same security features and capabilities as standard models.
Usage Changes and Features#
The container docker run command doesn't support the -u $(id -u) parameter.
For air gap deployment, add the following parameters to the docker run command:
-e NIM_DISABLE_MODEL_DOWNLOAD=1 \
-v <local-model-path>:<model-weight-path> \
-e NIM_MODEL_PATH=<model-weight-path> \
No other changes to usage and features are needed.