# NIM LLM 1.x to 2.0 Migration Guide
NIM LLM 2.0 uses a new architecture built on vLLM as the sole inference backend. As a result, it differs significantly from the previous multi-backend NIM LLM 1.x architecture. This guide highlights the main areas that require changes during migration and links to the relevant 2.0 documentation for full details.
## Migration Checklist
Use the following checklist to review the main migration tasks.
### General
Start with these high-level changes:
- Update the image tag to `2.0.1`.
- Review the new vLLM-based architecture.
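When auditing manifests, a small helper can rewrite the image tag mechanically. This is an illustrative sketch only; the repository path below is a placeholder, not a confirmed NIM image name.

```python
def bump_nim_tag(image_ref: str, new_tag: str = "2.0.1") -> str:
    """Replace the tag portion of a `repository:tag` image reference.

    The repository path used in the example is hypothetical.
    """
    repo, sep, _old_tag = image_ref.rpartition(":")
    # rpartition returns ("", "", image_ref) when there is no ":"
    if not sep:
        repo = image_ref
    return f"{repo}:{new_tag}"

print(bump_nim_tag("example.registry.io/nim/llm:1.8.0"))
```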
### CLI
Review the CLI changes that affect deployment and startup behavior:
- There are minor changes and additions to the CLI. Refer to CLI Reference for more details on the new CLI.
- Notably, the `nim-run` CLI command is replaced with the new `nim-serve` command.
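The rename is mechanical: the command name changes while existing arguments are passed through. A minimal sketch of rewriting a 1.x invocation (the `--port` flag shown is hypothetical, used only to illustrate pass-through):

```python
def migrate_argv(argv: list[str]) -> list[str]:
    """Rename the 1.x `nim-run` entrypoint to the 2.0 `nim-serve` command.

    Only the command token is rewritten; all other arguments pass
    through unchanged.
    """
    return ["nim-serve" if tok == "nim-run" else tok for tok in argv]

print(migrate_argv(["nim-run", "--port", "8000"]))
# → ['nim-serve', '--port', '8000']
```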
### Configuration
Review configuration changes before you update deployment manifests or startup commands:
- Migrate to the new configuration system. Refer to Advanced Configuration for details.
- Audit environment variables against the 2.0 environment variable reference. Many 1.x variables have been removed and replaced by natively supported vLLM engine arguments.
- Configure the vLLM engine by passing CLI arguments directly to `nim-serve`. Refer to the `nim-serve` section of CLI Reference for an example, and to the vLLM engine arguments for a full description of available configuration options.
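As a sketch of the engine-argument pass-through, the helper below assembles a `nim-serve` command line from vLLM-style engine arguments. `--max-model-len` and `--tensor-parallel-size` are standard vLLM engine arguments; whether a given flag is accepted by `nim-serve` in your deployment should be checked against the CLI Reference.

```python
def build_serve_command(engine_args: dict[str, object]) -> list[str]:
    """Build a `nim-serve` argv that forwards vLLM engine arguments.

    Keys use vLLM's flag naming (underscores become hyphens); a value
    of True becomes a bare flag. Flag availability is assumed, not
    confirmed -- consult the CLI Reference for your version.
    """
    argv = ["nim-serve"]
    for key, value in engine_args.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv

print(build_serve_command({"max_model_len": 8192, "tensor_parallel_size": 2}))
```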
### API and Inference Requests
Update client integrations to match the new request and response structure:
- Review the new API Reference and migrate clients to match the new vLLM-based request and response structure.
- Notably, the `nvext` extension object in the request body is removed. Move sampling parameters from `nvext` to top-level request fields. Refer to vLLM sampling parameters for the full list of supported sampling parameters.
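A minimal sketch of the request-body change, lifting `nvext` sampling parameters (here `top_k`, which vLLM supports as a top-level sampling parameter) to the top level. The model name is an example value; which parameters vLLM accepts is listed in its sampling parameters reference.

```python
def migrate_request(body: dict) -> dict:
    """Lift 1.x `nvext` sampling parameters to top-level 2.0 fields.

    Existing top-level fields win on conflict; the `nvext` object is
    removed from the migrated body.
    """
    migrated = dict(body)
    nvext = migrated.pop("nvext", None) or {}
    for key, value in nvext.items():
        migrated.setdefault(key, value)
    return migrated

old = {"model": "my-model", "prompt": "Hello", "nvext": {"top_k": 10}}
print(migrate_request(old))
# → {'model': 'my-model', 'prompt': 'Hello', 'top_k': 10}
```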
### Tool Calling
Review tool calling changes if your deployment uses function or tool invocation:
- Configure tool calling explicitly by passing engine arguments directly to vLLM. Refer to vLLM tool calling documentation for details.
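For reference, vLLM's serving layer enables tool calling via the `--enable-auto-tool-choice` and `--tool-call-parser` flags. The sketch below only assembles those arguments; the correct parser value (for example `hermes` or `llama3_json`) depends on the model, so check the vLLM tool calling documentation for your model family.

```python
def tool_calling_args(parser: str) -> list[str]:
    """vLLM engine arguments that enable automatic tool calling.

    The parser value is model-dependent; "hermes" below is only an
    example -- verify the right value for your model in the vLLM docs.
    """
    return ["--enable-auto-tool-choice", "--tool-call-parser", parser]

print(tool_calling_args("hermes"))
```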
### Models and Profiles
Review model selection and profile behavior before you migrate existing deployments:
- Replace the `NIM_MODEL_NAME` environment variable with `NIM_MODEL_PATH` to configure the model to download and serve. Refer to Model Download and Model-Free NIM for additional details.
- Automatic profile selection now uses model memory estimation heuristics to choose a suitable profile. Refer to Model Profiles and Selection for details.
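The variable rename can be scripted when updating many manifests. A sketch, assuming a plain rename with the old value carried over (the model identifier shown is only an example value):

```python
def migrate_env(env: dict[str, str]) -> dict[str, str]:
    """Rename the 1.x NIM_MODEL_NAME variable to 2.0's NIM_MODEL_PATH.

    An explicitly set NIM_MODEL_PATH wins over a migrated value.
    Other variables pass through, but audit them separately against
    the 2.0 environment variable reference, since many were removed.
    """
    migrated = dict(env)
    if "NIM_MODEL_NAME" in migrated:
        old_value = migrated.pop("NIM_MODEL_NAME")
        migrated.setdefault("NIM_MODEL_PATH", old_value)
    return migrated

print(migrate_env({"NIM_MODEL_NAME": "meta/llama-3.1-8b-instruct"}))
```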
### LoRA Support
Review these changes if you deploy LoRA adapters:
- Convert LoRA adapters to Hugging Face PEFT format. NeMo-format adapters are no longer supported.
- Refer to Fine-Tuning with LoRA for details on configuring NIM for LoRA.
### Observability and Logging
Update logging and metrics collection to match the 2.0 runtime:
- Migrate metric scraping to use the metrics that vLLM exposes.
- Use `NIM_LOG_LEVEL` and `NIM_JSONL_LOGGING` for logging configuration. Refer to Logging and to Observability Metrics for more details.
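To illustrate what these two variables control, here is a client-side sketch that interprets them the same way: `NIM_LOG_LEVEL` selects the level and `NIM_JSONL_LOGGING` switches to one-JSON-object-per-line records. The accepted values (`"1"` for enabling JSONL) and the record fields are assumptions of this sketch, not the documented NIM log schema.

```python
import json
import logging
import os

def configure_logging() -> logging.Logger:
    """Configure a logger from NIM_LOG_LEVEL and NIM_JSONL_LOGGING.

    Illustrative only: the env var semantics mirror the guide, but the
    "1"-means-enabled convention and JSON fields are assumptions.
    """
    level = os.environ.get("NIM_LOG_LEVEL", "INFO").upper()
    jsonl = os.environ.get("NIM_JSONL_LOGGING", "0") == "1"

    handler = logging.StreamHandler()
    if jsonl:
        class JsonlFormatter(logging.Formatter):
            def format(self, record: logging.LogRecord) -> str:
                # One JSON object per line, the JSONL convention.
                return json.dumps(
                    {"level": record.levelname, "message": record.getMessage()}
                )
        handler.setFormatter(JsonlFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger("nim-client")
    logger.handlers = [handler]
    logger.setLevel(level)
    return logger
```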