Notes on NIM Container Variants#

Some NIMs are built with packages that vary from the standard base Docker container. These NIMs can better access features specific to a particular model or can run on GPUs before they are fully supported in the main source code branch. These NIMs, also known as NIM container variants, are designated by the -variant suffix in their version tag name.

NIM container variants differ in important ways from NIMs built with the standard base container, and the differences vary by model. This page documents these differences with respect to the features and functionality of LLM NIM container version 1.15.0. Refer to the following sections for each model:

DeepSeek-V3.1-Terminus#

Environment Variables#

Not Supported#

The following environment variables aren’t currently supported, except where an entry notes otherwise:

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_FORCE_DETERMINISTIC

  • NIM_FT_MODEL

  • NIM_GUIDED_DECODING_BACKEND: The SGLang backend supports "xgrammar", "outlines", "llguidance", and "none", but does not support custom guided decoding backends. Refer to Custom Guided Decoding Backends.

  • NIM_JSONL_LOGGING

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_MAX_CPU_LORAS

  • NIM_MAX_GPU_LORAS

  • NIM_NUM_KV_CACHE_SEQ_LENS

  • NIM_PEFT_REFRESH_INTERVAL

  • NIM_PEFT_SOURCE

  • NIM_PER_REQ_METRICS_ENABLE

  • NIM_REWARD_LOGITS_RANGE

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

  • NIM_SCHEDULER_POLICY: Supported, but SGLang accepts a different set of values ("lpm", "random", "fcfs", "dfs-weight", "lof", and "priority") than LLM NIM container version 1.14.0.

  • SSL_CERT_FILE: Set both NIM_SSL_CERT_PATH and SSL_CERT_FILE to the same location

Note

Most of these variables are not used with an SGLang backend.
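
As a rough illustration of how the list above might be used, the following Python sketch checks a set of environment variables before launching the container. The function and constant names (check_env, UNSUPPORTED_VARS, and so on) are hypothetical and are not part of NIM. The variable names and allowed values are copied from the list above, and entries that carry a note are checked separately.

```python
# Hypothetical preflight check; names and values are copied from the list above.
UNSUPPORTED_VARS = frozenset({
    "NIM_CUSTOM_GUIDED_DECODING_BACKENDS",
    "NIM_CUSTOM_MODEL_NAME",
    "NIM_DISABLE_OVERLAP_SCHEDULING",
    "NIM_ENABLE_KV_CACHE_HOST_OFFLOAD",
    "NIM_ENABLE_PROMPT_LOGPROBS",
    "NIM_FORCE_DETERMINISTIC",
    "NIM_FT_MODEL",
    "NIM_JSONL_LOGGING",
    "NIM_KV_CACHE_HOST_MEM_FRACTION",
    "NIM_MAX_CPU_LORAS",
    "NIM_MAX_GPU_LORAS",
    "NIM_NUM_KV_CACHE_SEQ_LENS",
    "NIM_PEFT_REFRESH_INTERVAL",
    "NIM_PEFT_SOURCE",
    "NIM_PER_REQ_METRICS_ENABLE",
    "NIM_REWARD_LOGITS_RANGE",
    "NIM_REWARD_MODEL",
    "NIM_REWARD_MODEL_STRING",
})
SCHEDULER_POLICIES = frozenset(
    {"lpm", "random", "fcfs", "dfs-weight", "lof", "priority"}
)
GUIDED_BACKENDS = frozenset({"xgrammar", "outlines", "llguidance", "none"})


def check_env(env):
    """Return a list of warnings for a mapping of environment variables."""
    warnings = [
        f"{name} is not supported by this variant"
        for name in sorted(env)
        if name in UNSUPPORTED_VARS
    ]
    policy = env.get("NIM_SCHEDULER_POLICY")
    if policy is not None and policy not in SCHEDULER_POLICIES:
        warnings.append(f"NIM_SCHEDULER_POLICY={policy!r} is not an SGLang value")
    backend = env.get("NIM_GUIDED_DECODING_BACKEND")
    if backend is not None and backend not in GUIDED_BACKENDS:
        warnings.append(f"NIM_GUIDED_DECODING_BACKEND={backend!r} is not supported")
    # Both certificate variables are expected to point to the same location.
    cert_file = env.get("SSL_CERT_FILE")
    cert_path = env.get("NIM_SSL_CERT_PATH")
    if (cert_file or cert_path) and cert_file != cert_path:
        warnings.append("Set SSL_CERT_FILE and NIM_SSL_CERT_PATH to the same location")
    return warnings
```

For example, check_env(dict(os.environ)) could be called from a launch script (after importing os) to report problems before docker run is invoked.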

New Additions#

The following new environment variables are supported:

Note

Some variables may not be applicable to every model (for example, not all models support tool calling or thinking).

  • TOOL_CALL_PARSER / NIM_TOOL_CALL_PARSER

  • REASONING_PARSER / NIM_REASONING_PARSER

  • NIM_CHAT_TEMPLATE

  • NIM_TAGS_SELECTOR

API Compatibility#

The following API features have differences according to the backend used:

  • Error handling: Many variables lack error handling, so invalid input can cause failures.

  • Structured output

    • vLLM uses guided_json, guided_choice, and guided_regex followed by a string.

    • SGLang uses response_format(json), such as the following:

      response_format={ "type": "json_schema", "schema": json string, }
      
  • include_stop_str_in_output and continuous_usage_stats are not supported by SGLang.

  • Tool calling with streams

    • For SGLang, the second-to-last chunk contains the complete tool call content.

    • For vLLM, all chunks contain streamed tool call content.

  • top_logprobs

    • For TRT-LLM and SGLang, the final chunk has empty content and no top_logprobs; it only signals the end of the stream (that is, "finish_reason": "stop").

    • For vLLM, the final chunk contains content.

  • Setting a stop word

    • For vLLM and TRT-LLM, stop_reason is used.

    • For SGLang, matched_stop is used.

  • Echo configuration

    • SGLang supports boolean or integer (1 or 0) input.

    • vLLM supports boolean or null input.
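
The structured output and stop word differences listed above can be hidden behind a small client-side helper. The following Python sketch is one possible approach, not an official client. The helper names and the backend labels ("vllm", "trtllm", "sglang") are hypothetical, the response_format shape is copied from the example above, and the sketch assumes the stop field appears on each choice object of the response.

```python
import json

BACKENDS = ("vllm", "trtllm", "sglang")  # hypothetical labels for this sketch


def structured_output_params(backend, schema):
    """Return request fields that ask for output matching a JSON schema."""
    if backend == "vllm":
        # vLLM: guided_json followed by the schema as a string.
        return {"guided_json": json.dumps(schema)}
    if backend == "sglang":
        # SGLang: response_format in the shape shown above.
        return {
            "response_format": {
                "type": "json_schema",
                "schema": json.dumps(schema),
            }
        }
    raise ValueError(f"no structured output mapping for backend {backend!r}")


def matched_stop_word(backend, choice):
    """Read the stop word that ended generation from one response choice."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    # SGLang reports matched_stop; vLLM and TRT-LLM report stop_reason.
    field = "matched_stop" if backend == "sglang" else "stop_reason"
    return choice.get(field)
```

A caller would merge the returned fields into the request payload, for example payload.update(structured_output_params("sglang", schema)).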

The following API features only have support at the function level:

  • logprobs

  • Guided decoding (including guided_whitespace_pattern and structured_generation)

  • Role configuration

The following API features are not supported:

  • Reward

  • Llama API

  • Structured output (guided_json, guided_choice, and guided_regex): Use response_format instead

  • nvext

Although the nvext object is not supported, its features are available through different parameters in the top-level payload.

Metrics#

The output of v1/metrics differs according to the backend used (SGLang versus vLLM). Metric naming conventions also differ. For example, SGLang includes a prefix for each metric.

Additional metrics related to GPU resources have been added.

The following metrics are not supported in the v1/metrics output:

  • Request success rate metrics:

    • request_success_total

    • request_failure_total

    • request_finish_total
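
Because metric names differ between backends, dashboards or alerts that depend on specific names may need a normalization step. The following Python sketch parses Prometheus-style text, strips an optional prefix, and reports which of the request counters listed above are absent. The function names are hypothetical, and the prefix is left as a parameter because the exact prefix string is an assumption to verify against the actual v1/metrics output.

```python
REQUEST_COUNTERS = frozenset({
    "request_success_total",
    "request_failure_total",
    "request_finish_total",
})


def metric_names(exposition_text, prefix=""):
    """Collect metric names from Prometheus text, stripping an optional prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if not parts:
            continue
        name = parts[0]
        if prefix and name.startswith(prefix):
            name = name[len(prefix):]
        names.add(name)
    return names


def missing_request_counters(names):
    """Return the request counters from the list above that are absent."""
    return sorted(REQUEST_COUNTERS - set(names))
```

The text passed to metric_names would typically be the body of a GET request to the v1/metrics endpoint.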

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

For air gap deployment, add the following parameters to the docker run command:

-e NIM_DISABLE_MODEL_DOWNLOAD=1 -v :/opt/nim/workspace/ \
-v <local-model-path>:<model-weight-path>

No other changes to usage and features are needed.

GPT-OSS-120B#

Environment Variables#

Not Supported#

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_CUDA_GRAPH: Defaults to False

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_DP_ATTENTION

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_FORCE_DETERMINISTIC

  • NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True

  • NIM_FT_MODEL

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_LOW_MEMORY_MODE

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

  • NIM_REWARD_LOGITS_RANGE

  • NIM_SCHEDULER_POLICY

  • NIM_SERVED_MODEL_NAME: Only a single name is supported

  • NIM_TOKENIZER_MODE: Defaults to fast mode

  • SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

GPT-OSS-20B#

Environment Variables#

Not Supported#

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_CUDA_GRAPH: Defaults to False

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_DP_ATTENTION

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_FORCE_DETERMINISTIC

  • NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True

  • NIM_FT_MODEL

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_LOW_MEMORY_MODE

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

  • NIM_REWARD_LOGITS_RANGE

  • NIM_SCHEDULER_POLICY

  • NIM_SERVED_MODEL_NAME: Only a single name is supported

  • NIM_TOKENIZER_MODE: Defaults to fast mode

  • SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

Llama-3.1-8b-Instruct-DGX-Spark#

This NIM container variant was released with LLM NIM container version 1.14 and uses the 1.0.0-variant tag. For more information, refer to the 1.14 version of this page.

Nemotron 3 Nano#

Environment Variables#

Not Supported#

The following environment variables aren’t currently supported:

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_CUDA_GRAPH: Defaults to False

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_DP_ATTENTION

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_KV_CACHE_REUSE

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_FORCE_DETERMINISTIC

  • NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True

  • NIM_FT_MODEL

  • NIM_JSONL_LOGGING

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_LOW_MEMORY_MODE

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_MAX_CPU_LORAS

  • NIM_MAX_GPU_LORAS

  • NIM_MAX_MODEL_LEN

  • NIM_NUM_KV_CACHE_SEQ_LENS

  • NIM_PEFT_REFRESH_INTERVAL

  • NIM_PEFT_SOURCE

  • NIM_PER_REQ_METRICS_ENABLE

  • NIM_RELAX_MEM_CONSTRAINTS

  • NIM_REWARD_LOGITS_RANGE

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

  • NIM_SCHEDULER_POLICY

  • NIM_SERVED_MODEL_NAME: Only a single name is supported

  • NIM_TOKENIZER_MODE: Defaults to fast mode

  • NIM_USE_API_TRANSFORM_SHIM

  • SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead

Note

Most of these variables are not used with an SGLang backend.

New Additions#

The following new environment variables are supported:

Note

Some variables may not be applicable to every model (for example, not all models support tool calling or thinking).

  • NIM_TAGS_SELECTOR: Filters tags in the automatic profile selector. You can use a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example, set NIM_TAGS_SELECTOR="profile=latency" to automatically select the latency profile, or set NIM_TAGS_SELECTOR="tp=4" to select a throughput profile that supports 4 GPUs.

  • DISABLE_RADIX_CACHE: Set to 1 to disable KV cache reuse.

  • NIM_ENABLE_MTP: Set to 1 to enable multi-token prediction (MTP), in which the LLM generates several tokens at once, boosting speed, efficiency, and reasoning.

  • REASONING_PARSER: Set to 1 to turn thinking on.

  • TOOL_CALL_PARSER: Set to 1 to turn tool calling on.

  • NIM_CONFIG_FILE: Specifies a configuration YAML file for advanced parameter tuning. Use this file to override the default NIM configuration values. You must convert the hyphens in server argument names to underscores. For example, the following SGLang command arguments:

    python -m sglang.launch_server --model-path XXX --tp-size 4 \
      --context-length 262144 --mem-fraction-static 0.8
    

    are defined by the following content in the configuration YAML file:

    tp_size: 4
    context_length: 262144
    mem_fraction_static: 0.8
    

    Default value: None.
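
The hyphen-to-underscore conversion described for NIM_CONFIG_FILE can be automated. The following Python sketch is a minimal illustration under stated assumptions: the helper names are hypothetical, every flag is assumed to take exactly one value, and the output is limited to flat "key: value" lines like those in the example above. It is not a general YAML writer.

```python
def _parse_scalar(text):
    """Interpret a command-line value as int or float when possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def cli_args_to_config(args):
    """Convert ['--tp-size', '4', ...] into {'tp_size': 4, ...}."""
    if len(args) % 2 != 0:
        raise ValueError("expected flag/value pairs")
    config = {}
    for flag, value in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        # Hyphens in server argument names become underscores in the file.
        config[flag[2:].replace("-", "_")] = _parse_scalar(value)
    return config


def to_yaml_lines(config):
    """Render a flat mapping as 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in config.items())
```

Passing the three tuning arguments from the example above (--tp-size 4, --context-length 262144, and --mem-fraction-static 0.8) produces the same three lines shown in the configuration file.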

API Compatibility#

The following API features are not supported:

  • logprobs

  • suffix

  • Guided decoding (including guided_whitespace_pattern and structured_generation)

  • Echo and role configuration

  • Reward

  • Llama API

  • nvext

Although the nvext object is not supported, its features are available through different parameters in the top-level payload.

Security Features#

Security features are unchanged. These models maintain the same security features and capabilities as standard models, and no additional security limitations or modifications apply.

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

For air gap deployment, add the following parameters to the docker run command:

-e NIM_DISABLE_MODEL_DOWNLOAD=1 \
-v <local-model-path>:<model-weight-path> \
-e NIM_MODEL_PATH=<model-weight-path>

No other changes to usage and features are needed.
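
For scripted deployments, the air gap parameters above can be assembled programmatically. The following Python sketch is a minimal illustration: the function names, the image name, and the paths in the usage note are hypothetical placeholders, and other docker run options that a deployment needs (such as GPU and port settings) are intentionally omitted.

```python
import shlex


def air_gap_args(local_model_path, model_weight_path):
    """Return the docker run parameters for air gap deployment listed above."""
    return [
        "-e", "NIM_DISABLE_MODEL_DOWNLOAD=1",
        "-v", f"{local_model_path}:{model_weight_path}",
        "-e", f"NIM_MODEL_PATH={model_weight_path}",
    ]


def docker_run_command(image, extra_args):
    """Join the pieces into a shell-safe command string for review or logging."""
    return shlex.join(["docker", "run", *extra_args, image])
```

For example, docker_run_command("example-image:tag", air_gap_args("/data/models/nano", "/opt/models/nano")) returns a single command line that can be inspected before it is executed.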

NVIDIA-Nemotron-Nano-9B-v2-DGX-Spark#

This NIM container variant was released with LLM NIM container version 1.14 and uses the 1.0.0-variant tag. For more information, refer to the 1.14 version of this page.

Qwen3-Next-80B-A3B-Instruct#

Environment Variables#

Not Supported#

The following environment variables aren’t currently supported:

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_CUDA_GRAPH: Defaults to False

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_DP_ATTENTION

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True

  • NIM_FT_MODEL

  • NIM_JSONL_LOGGING

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_LOW_MEMORY_MODE

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_MAX_CPU_LORAS

  • NIM_MAX_GPU_LORAS

  • NIM_NUM_KV_CACHE_SEQ_LENS

  • NIM_PEFT_REFRESH_INTERVAL

  • NIM_PEFT_SOURCE

  • NIM_RELAX_MEM_CONSTRAINTS

  • NIM_REPOSITORY_OVERRIDE

  • NIM_REWARD_LOGITS_RANGE

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

  • NIM_TOKENIZER_MODE: Defaults to fast mode

  • SSL_CERT_FILE: Set both NIM_SSL_CERT_PATH and SSL_CERT_FILE to the same location

Note

Most of these variables are not used with an SGLang backend.

API Compatibility#

The following API features have differences according to the backend used:

  • Error handling: Many variables lack error handling, so invalid input can cause failures.

  • Structured output

    • vLLM uses guided_json, guided_choice, and guided_regex followed by a string.

    • SGLang uses response_format(json), similar to the following:

      response_format={ "type": "json_schema", "schema": json string, }
      
  • include_stop_str_in_output and continuous_usage_stats are not supported by SGLang.

  • Tool calling with streams

    • For SGLang, the second-to-last chunk contains the complete tool call content.

    • For vLLM, all chunks contain streamed tool call content.

  • top_logprobs

    • For TRT-LLM and SGLang, the final chunk has empty content and no top_logprobs; it only signals the end of the stream (that is, "finish_reason": "stop").

    • For vLLM, the final chunk contains content.

  • Setting a stop word

    • For vLLM and TRT-LLM, stop_reason is used.

    • For SGLang, matched_stop is used.

  • Echo configuration

    • SGLang supports boolean or integer (1 or 0) input.

    • vLLM supports boolean or null input.
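
A client that streams tool calls can be written so that it works with either chunking behavior described above. The following Python sketch accumulates tool call fragments by index, so it gives the same result whether the content arrives in one chunk (SGLang) or spread across many chunks (vLLM). It assumes OpenAI-compatible streaming chunks that have already been parsed into dictionaries. The function name is hypothetical.

```python
def collect_tool_calls(chunks):
    """Merge streamed tool call fragments into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for part in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    part.get("index", 0), {"name": "", "arguments": ""}
                )
                function = part.get("function") or {}
                if function.get("name"):
                    slot["name"] = function["name"]
                # Argument text may arrive whole or in pieces; append either way.
                slot["arguments"] += function.get("arguments") or ""
    return [calls[index] for index in sorted(calls)]
```

Chunks with an empty delta, such as a final chunk that only carries a finish_reason, are skipped naturally because they contain no tool_calls entries.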

The following API features only have support at the function level:

  • logprobs

  • Guided decoding (including guided_whitespace_pattern and structured_generation)

  • Role configuration

The following API features are not supported:

  • Reward

  • Llama API

  • Structured output (guided_json, guided_choice, and guided_regex): Use response_format instead

  • nvext

Although the nvext object is not supported, its features are available through different parameters in the top-level payload.

Metrics#

The output of v1/metrics differs according to the backend used (SGLang versus vLLM). Metric naming conventions also differ. For example, SGLang includes a prefix for each metric.

Additional metrics related to GPU resources have been added.

The following metrics are not supported in the v1/metrics output:

  • Request success rate metrics:

    • request_success_total

    • request_failure_total

    • request_finish_total

  • KV cache metrics

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

For air gap deployment, add the following parameters to the docker run command:

-e NIM_DISABLE_MODEL_DOWNLOAD=1 -v :/opt/nim/workspace/ \
-v <local-model-path>:<model-weight-path>

No other changes to usage and features are needed.

Qwen3 Next 80B A3B Thinking#

This NIM container variant was released with LLM NIM container version 1.14 and uses the 1.0.0-variant tag. For more information, refer to the 1.14 version of this page.

Qwen3-32B NIM for DGX Spark#

This NIM container variant was released with LLM NIM container version 1.14 and uses the 1.0.0-variant tag. For more information, refer to the 1.14 version of this page.