NIM LLM 1.x to 2.0 Migration Guide#
NIM LLM 2.0 is a new architecture built on vLLM as the sole inference backend, and it therefore differs significantly from the multi-backend NIM LLM 1.x architecture. To ease migration, this guide highlights the key areas that require changes from the user's perspective. A migration checklist is presented below; each item links to the relevant 2.0 documentation for full details.
Migration Checklist#
General#
Update the image tag to 2.0.x.
Review the new vLLM-based architecture.
CLI#
There are minor changes and additions to the CLI. Refer to CLI Reference for more details on the new CLI.
Notably, the `nim-run` CLI command is replaced with the new `nim-serve` command.
Configuration#
Migrate to the new configuration system. Refer to Advanced Configuration for details.
Audit environment variables against the 2.0 environment variable reference; many 1.x variables have been removed and replaced by natively supported vLLM engine arguments.
Configure the vLLM engine by passing CLI arguments directly to `nim-serve`. Refer to the `nim-serve` section of CLI Reference for an example, and to the vLLM engine arguments for a full description of available configuration options.
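As a concrete sketch, a `nim-serve` launch line that forwards engine arguments might be assembled as below. The flags shown (`--max-model-len`, `--gpu-memory-utilization`, `--tensor-parallel-size`) are standard vLLM engine arguments; check the vLLM engine argument reference for the full list and defaults.

```python
# Sketch: assembling a `nim-serve` invocation that passes vLLM engine
# arguments straight through to the backend.
cmd = [
    "nim-serve",
    "--max-model-len", "8192",           # cap the model's context window
    "--gpu-memory-utilization", "0.90",  # fraction of GPU memory vLLM may use
    "--tensor-parallel-size", "2",       # shard the model across 2 GPUs
]
print(" ".join(cmd))
```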
API and Inference Requests#
Review the new API Reference and migrate clients to match the new vLLM-based request and response structure.
Notably, the `nvext` extension object in the request body is removed. Move sampling parameters from `nvext` to top-level request fields. Refer to vLLM sampling parameters for the full list of supported sampling parameters.
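A minimal sketch of that client-side change is shown below. `top_k` and `repetition_penalty` are examples drawn from the vLLM sampling parameter list; which parameters your 1.x clients actually placed under `nvext` will vary, and the model name is hypothetical.

```python
# Sketch: hoisting sampling parameters out of the removed `nvext`
# extension object and into top-level request fields.

# 1.x-style request body (no longer accepted in 2.0)
old_body = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "nvext": {"top_k": 40, "repetition_penalty": 1.1},
}

# 2.0-style request body: drop `nvext`, promote its fields to the top level
new_body = {k: v for k, v in old_body.items() if k != "nvext"}
new_body.update(old_body["nvext"])
print(new_body)
```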
Tool Calling#
Configure tool calling explicitly by passing engine arguments directly to vLLM. Refer to vLLM tool calling documentation for details.
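As a sketch, the relevant engine arguments can be forwarded the same way as other vLLM options. `--enable-auto-tool-choice` and `--tool-call-parser` are vLLM flags; the parser name depends on the model family (for example `hermes` or `llama3_json`), so confirm the right value in the vLLM tool calling documentation.

```python
# Sketch: enabling tool calling by forwarding vLLM's tool-calling
# engine arguments; the parser name is model-family specific.
cmd = [
    "nim-serve",
    "--enable-auto-tool-choice",      # let the model decide when to call tools
    "--tool-call-parser", "hermes",   # parser matching the model's tool-call format
]
print(" ".join(cmd))
```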
Models and Profiles#
Replace the `NIM_MODEL_NAME` environment variable with `NIM_MODEL_PATH` to configure the model to download and serve. Refer to Model Download and Model-Free NIM for additional details.
Automatic profile selection now uses model memory estimation heuristics to choose a suitable profile. Refer to Model Profiles and Selection for details.
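The environment variable swap can be sketched as below; the model path value is hypothetical, and the accepted value formats are described in Model Download and Model-Free NIM.

```python
import os

# Sketch: migrating the model-selection environment variable.
# 1.x used NIM_MODEL_NAME; 2.0 uses NIM_MODEL_PATH instead.
env = dict(os.environ)
env.pop("NIM_MODEL_NAME", None)                 # 1.x variable, removed in 2.0
env["NIM_MODEL_PATH"] = "/opt/models/my-model"  # hypothetical local model path
print(env["NIM_MODEL_PATH"])
```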
LoRA Support#
Convert LoRA adapters to HuggingFace PEFT format; NeMo-format adapters are no longer supported.
Refer to Fine-Tuning with LoRA for details on configuring NIM for LoRA.
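A quick way to audit adapters before migrating is to check for the conventional HuggingFace PEFT layout: an `adapter_config.json` alongside weights in `adapter_model.safetensors` (or the older `adapter_model.bin`). The helper below is a sketch of that check, demonstrated against a stub directory.

```python
import json
import tempfile
from pathlib import Path

# Sketch: verifying that a LoRA adapter directory follows the
# HuggingFace PEFT layout (config file plus a weights file).
def looks_like_peft_adapter(path: Path) -> bool:
    has_config = (path / "adapter_config.json").is_file()
    has_weights = any(
        (path / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights

# Minimal demonstration with a stub adapter directory
tmp = Path(tempfile.mkdtemp())
(tmp / "adapter_config.json").write_text(json.dumps({"peft_type": "LORA"}))
(tmp / "adapter_model.safetensors").write_bytes(b"")
print(looks_like_peft_adapter(tmp))  # True
```

NeMo-format adapter directories will fail this check and must be converted before use with 2.0.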
Observability and Logging#
Migrate metric scraping to the metrics exposed by vLLM.
Use `NIM_LOG_LEVEL` and `NIM_JSONL_LOGGING` for logging configuration. Refer to Logging and Observability — Metrics for more details.
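Dashboards and alert rules written against 1.x metric names need updating, since vLLM exposes its Prometheus metrics under the `vllm:` prefix. The sketch below parses a mocked `/metrics` scrape; the two metric names shown are examples of vLLM's exported gauges and counters, so check the live endpoint for the full set.

```python
# Sketch: extracting vLLM metric samples from a Prometheus text-format
# scrape. The payload below mimics a response from the /metrics endpoint.
sample_scrape = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 2.0
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total 1024.0
"""

# Skip HELP/TYPE comment lines; keep "name value" sample lines.
vllm_metrics = {
    line.split()[0]: float(line.split()[1])
    for line in sample_scrape.splitlines()
    if line and not line.startswith("#")
}
print(vllm_metrics)
```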