Models with Functional Limitations#

The following LLM-specific NIM containers are known to have functional differences from most other models and are subject to the limitations shown on this page:

Environment Variables#

Not Supported#

The following environment variables aren’t currently supported:

  • NIM_MAX_MODEL_LEN

  • NIM_SCHEDULER_POLICY

  • NIM_TOKENIZER_MODE: Defaults to fast mode

  • NIM_CUSTOM_GUIDED_DECODING_BACKENDS

  • NIM_GUIDED_DECODING_BACKEND

  • NIM_KV_CACHE_HOST_MEM_FRACTION

  • NIM_ENABLE_KV_CACHE_HOST_OFFLOAD

  • NIM_ENABLE_KV_CACHE_REUSE

  • NIM_ENABLE_PROMPT_LOGPROBS

  • NIM_MAX_CPU_LORAS

  • NIM_MAX_GPU_LORAS

  • NIM_PEFT_REFRESH_INTERVAL

  • NIM_PEFT_SOURCE

  • NIM_RELAX_MEM_CONSTRAINTS

  • NIM_CUSTOM_MODEL_NAME

  • NIM_DISABLE_OVERLAP_SCHEDULING

  • NIM_ENABLE_DP_ATTENTION

  • NIM_LOW_MEMORY_MODE

  • NIM_MANIFEST_ALLOW_UNSAFE: No longer required

  • NIM_NUM_KV_CACHE_SEQ_LENS

  • NIM_FORCE_TRUST_REMOTE_CODE: Defaults to True

  • SSL_CERT_FILE: Use NIM_SSL_CERT_PATH instead

  • NIM_FT_MODEL

  • NIM_DISABLE_CUDA_GRAPH: Defaults to False

  • NIM_FORCE_DETERMINISTIC

  • NIM_REWARD_LOGITS_RANGE

  • NIM_REWARD_MODEL

  • NIM_REWARD_MODEL_STRING

Note

Most of these variables are not used with an SGLang backend.

New Additions#

The following environment variables are only available to the models with functional limitations listed at the start of this page.

  • NIM_TAGS_SELECTOR: Filters tags in the automatic profile selector. You can use a list of key-value pairs, where the key is the profile property name and the value is the desired property value. For example, set NIM_TAGS_SELECTOR="profile=latency" to automatically select the latency profile. Or set NIM_TAGS_SELECTOR="tp=4" to select a throughput profile that supports 4 GPUs.

  • DISABLE_RADIX_CACHE: Set to 1 to disable KV cache reuse.

  • NIM_ENABLE_MTP: Set to 1 to enable the LLM to generate several tokens at once, boosting speed, efficiency, and reasoning.

  • REASONING_PARSER: Set to 1 to turn thinking on.

  • TOOL_CALL_PARSER: Set to 1 to turn tool calling on.

  • NIM_CONFIG_FILE: Specifies a configuration YAML file for advanced parameter tuning. Use this file to overwrite the default NIM configuration values. You must convert the hyphens in server argument names to underscores. For example, the following SGLang command arguments:

    python -m sglang.launch_server --model-path XXX --tp-size 4 \
      --context-length 262144 --mem-fraction-static 0.8
    

    are defined by the following content in the configuration YAML file:

    tp_size: 4
    context_length: 262144
    mem_fraction_static: 0.8
    

    Default value: None.

API Compatibility#

The following API features are not supported:

  • logprobs

  • suffix

  • Guided decoding (including guided_whitespace_pattern and structured_generation)

  • Echo and role configuration

  • Reward

  • Llama API

  • nvext

nvext features are supported using different parameters in the top-level payload.

Security Features#

No changes to security features. These models maintain the same security features and capabilities as standard models. No additional security limitations or modifications apply.

Usage Changes and Features#

The container docker run command doesn’t support the -u $(id -u) parameter.

For air gap deployment, add the following parameters to the docker run command:

-e NIM_DISABLE_MODEL_DOWNLOAD=1 \
-v <local-model-path>:<model-weight-path> \
-e NIM_MODEL_PATH=<model-weight-path> \

No other changes to usage and features are needed. These models follow standard usage patterns and workflows. No changes to standard usage procedures are required beyond the differences specified on this page.