NIM LLM 1.x to 2.0 Migration Guide#

NIM LLM 2.0 uses a new architecture built on vLLM as the sole inference backend. As a result, it differs significantly from the previous multi-backend NIM LLM 1.x architecture. This guide highlights the main areas that require changes during migration and links to the relevant 2.0 documentation for full details.

Migration Checklist#

Use the following checklist to review the main migration tasks.

General#

Start with the high-level changes described in the sections that follow.

CLI#

Review the CLI changes that affect deployment and startup behavior:

  • The CLI has minor changes and additions. Refer to CLI Reference for details.

  • Notably, the nim-run CLI command is replaced with the new nim-serve command.
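As a minimal sketch, the change to the startup command looks as follows; any additional flags are deployment-specific and omitted here:

```shell
# NIM LLM 1.x startup command (removed in 2.0):
#   nim-run
# NIM LLM 2.0 replacement:
nim-serve
```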

Configuration#

Review configuration changes before you update deployment manifests or startup commands:

  • Migrate to the new configuration system. Refer to Advanced Configuration for details.

  • Audit environment variables against the 2.0 environment variable reference. Many 1.x variables have been removed and replaced by natively supported vLLM engine arguments.

  • Configure the vLLM engine by passing CLI arguments directly to nim-serve. Refer to the nim-serve section of CLI Reference for an example. Refer to the vLLM engine arguments for a full description of available configuration options.
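For example, standard vLLM engine arguments such as `--max-model-len` and `--tensor-parallel-size` can be passed directly on the command line. The values below are illustrative only; confirm flag availability against the vLLM engine arguments reference:

```shell
# Pass vLLM engine arguments directly to nim-serve (example values):
nim-serve \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --tensor-parallel-size 2
```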

API and Inference Requests#

Update client integrations to match the new request and response structure:

  • Review the API Reference and update clients to match the vLLM-based request and response structure.

  • Notably, the nvext extension object in the request body is removed. Move sampling parameters from nvext to top-level request fields.

  • Refer to vLLM sampling parameters for the full list of supported sampling parameters.
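As an illustrative sketch (the endpoint, port, and model name below are placeholders), a 1.x request that nested sampling parameters under `nvext`, such as `"nvext": {"top_k": 20}`, becomes a 2.0 request with those parameters at the top level:

```shell
# 2.0 request: sampling parameters (here top_k and repetition_penalty,
# both standard vLLM sampling parameters) sit at the top level of the
# request body instead of inside the removed nvext object.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "top_k": 20,
        "repetition_penalty": 1.1
      }'
```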

Tool Calling#

Review tool calling changes if your deployment uses function or tool invocation.

Models and Profiles#

Review model selection and profile behavior before you migrate existing deployments:

  • Replace the NIM_MODEL_NAME environment variable with NIM_MODEL_PATH to configure the model to download and serve. Refer to Model Download and Model-Free NIM for additional details.

  • Automatic profile selection now uses model memory estimation heuristics to choose a suitable profile. Refer to Model Profiles and Selection for details.
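A minimal sketch of the environment variable change (the path below is a placeholder):

```shell
# 1.x (removed):
#   export NIM_MODEL_NAME=my-model
# 2.0: NIM_MODEL_PATH points at the model to download and serve.
export NIM_MODEL_PATH=/opt/models/my-model
```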

LoRA Support#

Review these changes if you deploy LoRA adapters:

  • Convert LoRA adapters to Hugging Face PEFT format. NeMo-format adapters are no longer supported.

  • Refer to Fine-Tuning with LoRA for details on configuring NIM for LoRA.
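Assuming the standard Hugging Face PEFT layout, a converted adapter directory contains an `adapter_config.json` and the adapter weights. The sketch below creates empty placeholder files only to show the expected structure:

```shell
# Expected PEFT-format adapter layout (placeholder files for illustration):
mkdir -p my-adapter
touch my-adapter/adapter_config.json
touch my-adapter/adapter_model.safetensors
```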

Observability and Logging#

Update logging and metrics collection to match the 2.0 runtime: