Dynamo vLLM integrates vLLM engines into Dynamo’s distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with vLLM’s native engine arguments. Dynamo leverages vLLM’s native KV cache events, NIXL-based transfer mechanisms, and metric reporting to enable KV-aware routing and P/D disaggregation.
We recommend using uv to install:
This installs Dynamo with the compatible vLLM version.
We have public images available on NGC Catalog:
For development, use the devcontainer which has all dependencies pre-installed.
Start infrastructure services for local development:
Launch an aggregated serving deployment: