NIM LLM 1.x to 2.0 Migration Guide#
NIM LLM 2.0 is a new architecture built on vLLM as the sole inference backend, and it therefore differs significantly from the multi-backend NIM LLM 1.x architecture. To ease migration, this guide highlights the key areas that require changes from the user's perspective. A migration checklist is presented below; each item links to the relevant 2.0 documentation for full details.
Migration Checklist#
General#
Update the image tag to 2.0.x.
Review the new vLLM-based architecture.
CLI#
There are minor changes and additions to the CLI. Refer to CLI Reference for more details on the new CLI.
Notably, the `nim-run` CLI command is replaced with the new `nim-serve` command.
Configuration#
Migrate to the new configuration system. Refer to Advanced Configuration for details.
Audit environment variables against the 2.0 environment variable reference; many 1.x variables have been removed and replaced by natively supported vLLM engine arguments.
Configure the vLLM engine by passing CLI arguments directly to `nim-serve`. Refer to the `nim-serve` section of CLI Reference for an example, and to the vLLM engine arguments for a full description of available configuration options.
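As a concrete sketch, a `nim-serve` launch line that forwards engine arguments might be assembled as below. The flags shown (`--max-model-len`, `--gpu-memory-utilization`, `--tensor-parallel-size`) are standard vLLM engine arguments; check the vLLM engine argument reference for the full list and defaults.

```python
# Sketch: assembling a `nim-serve` invocation that passes vLLM engine
# arguments straight through to the backend.
cmd = [
    "nim-serve",
    "--max-model-len", "8192",           # cap the model's context window
    "--gpu-memory-utilization", "0.90",  # fraction of GPU memory vLLM may use
    "--tensor-parallel-size", "2",       # shard the model across 2 GPUs
]
print(" ".join(cmd))
```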
API and Inference Requests#
Review the new API Reference and migrate clients to match the new vLLM-based request and response structure.
Notably, the `nvext` extension object in the request body is removed. Move sampling parameters from `nvext` to top-level request fields. Refer to vLLM sampling parameters for the full list of supported sampling parameters.
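A minimal sketch of that client-side change is shown below. `top_k` and `repetition_penalty` are examples drawn from the vLLM sampling parameter list; which parameters your 1.x clients actually placed under `nvext` will vary, and the model name is hypothetical.

```python
# Sketch: hoisting sampling parameters out of the removed `nvext`
# extension object and into top-level request fields.

# 1.x-style request body (no longer accepted in 2.0)
old_body = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "nvext": {"top_k": 40, "repetition_penalty": 1.1},
}

# 2.0-style request body: drop `nvext`, promote its fields to the top level
new_body = {k: v for k, v in old_body.items() if k != "nvext"}
new_body.update(old_body["nvext"])
print(new_body)
```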
Tool Calling#
Configure tool calling explicitly by passing engine arguments directly to vLLM. Refer to vLLM tool calling documentation for details.
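As a sketch, the relevant engine arguments can be forwarded the same way as other vLLM options. `--enable-auto-tool-choice` and `--tool-call-parser` are vLLM flags; the parser name depends on the model family (for example `hermes` or `llama3_json`), so confirm the right value in the vLLM tool calling documentation.

```python
# Sketch: enabling tool calling by forwarding vLLM's tool-calling
# engine arguments; the parser name is model-family specific.
cmd = [
    "nim-serve",
    "--enable-auto-tool-choice",      # let the model decide when to call tools
    "--tool-call-parser", "hermes",   # parser matching the model's tool-call format
]
print(" ".join(cmd))
```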
Models and Profiles#
Replace the `NIM_MODEL_NAME` environment variable with `NIM_MODEL_PATH` to configure the model to download and serve. Refer to Model Download and Model-Free NIM for additional details.
Automatic profile selection now uses model memory estimation heuristics to choose a suitable profile. Refer to Model Profiles and Selection for details.
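The environment variable swap can be sketched as below; the model path value is hypothetical, and the accepted value formats are described in Model Download and Model-Free NIM.

```python
import os

# Sketch: migrating the model-selection environment variable.
# 1.x used NIM_MODEL_NAME; 2.0 uses NIM_MODEL_PATH instead.
env = dict(os.environ)
env.pop("NIM_MODEL_NAME", None)                 # 1.x variable, removed in 2.0
env["NIM_MODEL_PATH"] = "/opt/models/my-model"  # hypothetical local model path
print(env["NIM_MODEL_PATH"])
```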
LoRA Support#
Convert LoRA adapters to HuggingFace PEFT format; NeMo-format adapters are no longer supported.
Refer to Fine-Tuning with LoRA for details on configuring NIM for LoRA.
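A quick way to audit adapters before migrating is to check for the conventional HuggingFace PEFT layout: an `adapter_config.json` alongside weights in `adapter_model.safetensors` (or the older `adapter_model.bin`). The helper below is a sketch of that check, demonstrated against a stub directory.

```python
import json
import tempfile
from pathlib import Path

# Sketch: verifying that a LoRA adapter directory follows the
# HuggingFace PEFT layout (config file plus a weights file).
def looks_like_peft_adapter(path: Path) -> bool:
    has_config = (path / "adapter_config.json").is_file()
    has_weights = any(
        (path / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights

# Minimal demonstration with a stub adapter directory
tmp = Path(tempfile.mkdtemp())
(tmp / "adapter_config.json").write_text(json.dumps({"peft_type": "LORA"}))
(tmp / "adapter_model.safetensors").write_bytes(b"")
print(looks_like_peft_adapter(tmp))  # True
```

NeMo-format adapter directories will fail this check and must be converted before use with 2.0.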
Observability and Logging#
Migrate metric scraping to the metrics exposed by vLLM.
Use `NIM_LOG_LEVEL` and `NIM_JSONL_LOGGING` for logging configuration. Refer to Logging and Observability — Metrics for more details.
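Dashboards and alert rules written against 1.x metric names need updating, since vLLM exposes its Prometheus metrics under the `vllm:` prefix. The sketch below parses a mocked `/metrics` scrape; the two metric names shown are examples of vLLM's exported gauges and counters, so check the live endpoint for the full set.

```python
# Sketch: extracting vLLM metric samples from a Prometheus text-format
# scrape. The payload below mimics a response from the /metrics endpoint.
sample_scrape = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 2.0
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total 1024.0
"""

# Skip HELP/TYPE comment lines; keep "name value" sample lines.
vllm_metrics = {
    line.split()[0]: float(line.split()[1])
    for line in sample_scrape.splitlines()
    if line and not line.startswith("#")
}
print(vllm_metrics)
```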