NIM LLM 1.x to 2.0 Migration Guide#

NIM LLM 2.0 is a new architecture built on vLLM as the sole inference backend, and it therefore differs significantly from the multi-backend NIM LLM 1.x architecture. To ease migration, this guide highlights the key areas that require changes from the user's perspective. The checklist below covers each area; every item links to the relevant 2.0 documentation for full details.


Migration Checklist#

General#

CLI#

  • There are minor changes and additions to the CLI. Refer to CLI Reference for more details on the new CLI.

  • Notably, the nim-run CLI command is replaced with the new nim-serve command.

Configuration#

  • Migrate to the new configuration system. Refer to Advanced Configuration for details.

  • Audit environment variables against the 2.0 environment variable reference; many 1.x variables have been removed and replaced by natively supported vLLM engine arguments.
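As a starting point for the audit, you can list every `NIM_`-prefixed variable set in the current environment and compare the output against the 2.0 reference (remember to check deployment scripts and container manifests as well; this one-liner only sees the current shell):

```shell
# List all NIM_* environment variables currently set, sorted for easy diffing
# against the 2.0 environment variable reference.
env | grep '^NIM_' | sort
```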

  • Configure the vLLM engine by passing CLI arguments directly to nim-serve. Refer to the nim-serve section of CLI Reference for an example. Refer to the vLLM engine arguments for a full description of available configuration options.
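For illustration, a hypothetical `nim-serve` invocation might forward standard vLLM engine arguments like the following; the flag values and exact pass-through behavior are assumptions here, so consult the `nim-serve` section of CLI Reference and the vLLM engine-argument documentation for the authoritative list:

```shell
# Sketch: pass vLLM engine arguments directly on the nim-serve command line.
# Values shown are illustrative, not recommendations.
nim-serve \
  --max-model-len 8192 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.90
```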

API and Inference Requests#

  • Review the new API Reference and migrate clients to match the new vLLM-based request and response structure.

  • Notably, the nvext extension object in the request body is removed. Move sampling parameters from nvext to top-level request fields.

  • Refer to vLLM sampling parameters for the full list of supported sampling parameters.
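To make the `nvext` change concrete, the sketch below shows a 1.x-style request body and a small helper that promotes `nvext` entries to top-level fields. The helper name and the particular `nvext` keys are illustrative assumptions; the authoritative field list is in the 2.0 API Reference:

```python
# Sketch: lift 1.x-style `nvext` sampling parameters to top-level 2.0 fields.
def migrate_request_body(body: dict) -> dict:
    """Return a copy of `body` with `nvext` entries promoted to top level."""
    migrated = {k: v for k, v in body.items() if k != "nvext"}
    for key, value in body.get("nvext", {}).items():
        migrated.setdefault(key, value)  # existing top-level fields win
    return migrated

old_body = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "nvext": {"top_k": 40, "repetition_penalty": 1.1},
}
new_body = migrate_request_body(old_body)
# new_body carries top_k and repetition_penalty as top-level fields
```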

Tool Calling#

Models and Profiles#

  • Replace the NIM_MODEL_NAME environment variable with NIM_MODEL_PATH to configure the model to download and serve. Refer to Model Download and Model-Free NIM for additional details.
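In deployment scripts, the swap looks like the following; the model path is a placeholder, and supported path formats are described in Model Download and Model-Free NIM:

```shell
# Sketch: replace the removed NIM_MODEL_NAME variable with NIM_MODEL_PATH.
unset NIM_MODEL_NAME                    # removed in 2.0
export NIM_MODEL_PATH=/models/my-model  # placeholder path; new in 2.0
```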

  • Automatic profile selection now uses model memory estimation heuristics to choose a suitable profile. Refer to Model Profiles and Selection for details.

LoRA Support#

  • Convert LoRA adapters to HuggingFace PEFT format; NeMo-format adapters are no longer supported.

  • Refer to Fine-Tuning with LoRA for details on configuring NIM for LoRA.
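After converting an adapter, it can help to sanity-check the directory layout before pointing NIM at it. A HuggingFace PEFT adapter directory contains an `adapter_config.json` plus weights in `adapter_model.safetensors` (or the older `adapter_model.bin`); the helper below is an illustrative check, not part of NIM:

```python
# Sketch: verify a directory follows the standard HuggingFace PEFT adapter
# layout (adapter_config.json plus an adapter_model weights file).
from pathlib import Path

def looks_like_peft_adapter(adapter_dir: str) -> bool:
    d = Path(adapter_dir)
    has_config = (d / "adapter_config.json").is_file()
    has_weights = any(
        (d / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```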

Observability and Logging#