Compatibility | NVIDIA Dynamo Documentation

Compatibility by version

Dynamo v1.3.0 (GA release)

Released Jul 20, 2026 · Release notes · UCX 1.20.x

Backend	Version	NIXL	CUDA	Driver
SGLang	0.5.14	1.3.0	13.0	580.xx+
TensorRT-LLM	1.3.0rc19	1.0.1	13.1	580.xx+
vLLM	0.23.0	1.1.0	13.0	580.xx+

GPU: Blackwell, Hopper, Ada Lovelace, Ampere

OS: Ubuntu 24.04, Ubuntu 22.04, CentOS Stream 9 (experimental)

Arch: x86_64, ARM64 (Ubuntu 24.04 only)

CUDA 12 container images discontinued; EFA variants go multi-arch as -efa; GA wheels published as 1.3.0.post1 (containers stay :1.3.0); UCX 1.20.x.

Backend versions listed are the versions tested and supported for the selected release. TensorRT-LLM does not support Python 3.11.

For extended driver compatibility beyond the listed minimums, including forward compatibility and cuda-compat packages, see the CUDA Compatibility documentation.

See Release Artifacts for the full artifact inventory — container images, wheels, Helm charts, and crates — and Model Early Access Builds for per-model early access container builds.

Find a Compatible Release

Pick your backend and the CUDA generation your host driver supports to see which Dynamo releases you can run — and what to pull for the current one.

What runs where

Releases that match your backend and driver

Backend: SGLang, TensorRT-LLM, vLLM

CUDA driver situation: CUDA 13 driver (580.xx+), CUDA 12 driver (575.xx+)

SGLang · CUDA 13 driver (580.xx+)

Release	CUDA	Driver	Notes
v1.3.0	13.0	580.xx+	Current
v1.2.1	13.0	580.xx+
v1.2.0	13.0	580.xx+
v1.1.1	13.0	580.xx+
v1.1.0	13.0	580.xx+
v1.0.2	13.0	580.xx+
v1.0.1	13.0	580.xx+
v1.0.0	13.0	580.xx+
v0.8.1	13.0	580.xx+	Experimental image
v0.8.0	13.0	580.xx+	Experimental image

SGLang · CUDA 12 driver (575.xx+)

Release	CUDA	Driver
v1.2.1	12.9	575.xx+
v1.2.0	12.9	575.xx+
v1.1.1	12.9	575.xx+
v1.1.0	12.9	575.xx+
v1.0.2	12.9	575.xx+
v1.0.1	12.9	575.xx+
v1.0.0	12.9	575.xx+
v0.9.1	12.9	575.xx+
v0.9.0	12.9	575.xx+
v0.8.1	12.9	575.xx+
v0.8.0	12.9	575.xx+
v0.7.1	12.8	570.xx+
v0.7.0	12.9	575.xx+

TensorRT-LLM · CUDA 13 driver (580.xx+)

Release	CUDA	Driver	Notes
v1.3.0	13.1	580.xx+	Current
v1.2.1	13.1	580.xx+
v1.2.0	13.1	580.xx+
v1.1.1	13.1	580.xx+
v1.1.0	13.1	580.xx+
v1.0.2	13.1	580.xx+
v1.0.1	13.1	580.xx+
v1.0.0	13.1	580.xx+
v0.9.1	13.0	580.xx+
v0.9.0	13.0	580.xx+
v0.8.1	13.0	580.xx+
v0.8.0	13.0	580.xx+
v0.7.1	13.0	580.xx+
v0.7.0	13.0	580.xx+

TensorRT-LLM · CUDA 12 driver (575.xx+)

No release ships TensorRT-LLM for a CUDA 12 driver; TensorRT-LLM ships CUDA 13 images only.

vLLM · CUDA 13 driver (580.xx+)

Release	CUDA	Driver	Notes
v1.3.0	13.0	580.xx+	Current
v1.2.1	13.0	580.xx+
v1.2.0	13.0	580.xx+
v1.1.1	13.0	580.xx+
v1.1.0	13.0	580.xx+
v1.0.2	13.0	580.xx+
v1.0.1	13.0	580.xx+
v1.0.0	13.0	580.xx+
v0.8.1	13.0	580.xx+	Experimental image
v0.8.0	13.0	580.xx+	Experimental image

vLLM · CUDA 12 driver (575.xx+)

Release	CUDA	Driver
v1.2.1	12.9	575.xx+
v1.2.0	12.9	575.xx+
v1.1.1	12.9	575.xx+
v1.1.0	12.9	575.xx+
v1.0.2	12.9	575.xx+
v1.0.1	12.9	575.xx+
v1.0.0	12.9	575.xx+
v0.9.1	12.9	575.xx+
v0.9.0	12.9	575.xx+
v0.8.1	12.9	575.xx+
v0.8.0	12.9	575.xx+
v0.7.1	12.9	575.xx+
v0.7.0	12.8	570.xx+

Driver floors from the CUDA & driver history; pull commands shown for the current release only.
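
The backend-and-driver lookup described in this section can be sketched in a few lines of Python. This is only an illustration, not part of Dynamo: the `Image` tuple, the `IMAGES` excerpt, and the `runnable` helper are hypothetical names, the rows are hand-copied from a handful of entries above, and the sketch assumes that a host driver at or above an image's listed floor can run that image.

```python
from typing import List, NamedTuple


class Image(NamedTuple):
    release: str
    backend: str
    cuda: str
    driver_floor: int  # driver branch taken from the "NNN.xx+" floor, e.g. 580


# Hand-copied excerpt of the release lists (hypothetical structure, not an API).
IMAGES = [
    Image("v1.3.0", "SGLang", "13.0", 580),
    Image("v1.2.1", "SGLang", "12.9", 575),
    Image("v1.3.0", "TensorRT-LLM", "13.1", 580),
    Image("v1.2.1", "TensorRT-LLM", "13.1", 580),
    Image("v1.3.0", "vLLM", "13.0", 580),
    Image("v1.2.1", "vLLM", "12.9", 575),
]


def driver_branch(version: str) -> int:
    """Extract the branch number from a driver version such as '575.57.08'."""
    return int(version.split(".")[0])


def runnable(backend: str, driver_version: str) -> List[Image]:
    """Return the excerpt rows for `backend` whose driver floor the host meets."""
    branch = driver_branch(driver_version)
    return [
        img for img in IMAGES
        if img.backend == backend and img.driver_floor <= branch
    ]


if __name__ == "__main__":
    for img in runnable("SGLang", "575.57.08"):
        print(img.release, "CUDA", img.cuda)
```

With a 575-branch driver the TensorRT-LLM query returns an empty list, which mirrors the statement that TensorRT-LLM ships CUDA 13 images only.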

Platform Notes

Dynamo ships multi-arch (x86_64 + ARM64) container images. Wheels are built in a manylinux_2_28-compatible environment and validated on CentOS Stream 9 and Ubuntu 22.04/24.04; other Linux distributions are expected to work but are not officially verified.

Cloud Service Providers

Amazon Linux 2023 (AWS) · x86_64 · Supported

AL2023 TensorRT-LLM limitation: running the AL2023 container locally with docker run --network host ... triggers a known issue in the TensorRT-LLM framework, caused by a bug in mpi4py. Instead of --network host, use an explicit networking configuration that maps only the required ports (4222 for NATS, 2379/2380 for etcd, 8000 for the frontend).
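
As a rough sketch of the workaround, the snippet below assembles a `docker run` command line that publishes only the ports named above through Docker's `-p HOST:CONTAINER` flag. The helper name `docker_run_command` and the image name are placeholders invented for this example, and the sketch assumes each service listens on the same port number inside the container; adjust the mappings to your deployment.

```python
import shlex
from typing import List

# Ports named in the note above; the grouping by service is only for readability.
REQUIRED_PORTS = {
    "NATS": [4222],
    "etcd": [2379, 2380],
    "frontend": [8000],
}


def docker_run_command(image: str) -> List[str]:
    """Build a `docker run` argv that publishes only the required ports."""
    argv = ["docker", "run"]
    for ports in REQUIRED_PORTS.values():
        for port in ports:
            # -p HOST:CONTAINER publishes one port instead of sharing the host network.
            argv += ["-p", f"{port}:{port}"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    # "example/dynamo-trtllm:al2023" is a placeholder, not a published image name.
    print(shlex.join(docker_run_command("example/dynamo-trtllm:al2023")))
```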

Feature Support

Feature support by backend

Legend: Supported · Caveat · Experimental · Not supported

SGLang: 8 / 13 · TRT-LLM: 9 / 13 · vLLM: 12 / 13

Feature	Markers and notes
Disaggregated Serving	¹
KV-Aware Routing
SLA-Based Planner
KV Block Manager	²
Multimodal (Image)	³ ⁴ ⁵
Multimodal (Video)	— ⁶
Multimodal (Audio)	— ⁷
Request Migration	⁸
Request Cancellation	⁹ ¹⁰
LoRA	— ¹¹
Tool Calling
Speculative Decoding	¹² ¹³
Dynamo Snapshot	—


1. Disaggregated Serving · vLLM: Prefill/decode separation with NIXL KV transfer
2. KV Block Manager · SGLang: Work in progress across all combinations
3. Multimodal (Image) · SGLang: Not compatible with KV-aware routing. Disagg patterns: EPD, E/PD, E/P/D (not traditional EP/D)
4. Multimodal (Image) · TRT-LLM: Image URLs + pre-computed embeddings. Disagg: EP/D + E/P/D. KV-aware routing via dedicated MM Router Worker (requires KV event publishing)
5. Multimodal (Image) · vLLM: With KV-aware routing, image-aware routing on documented paths
6. Multimodal (Video) · vLLM: Video input with frame sampling
7. Multimodal (Audio) · vLLM: Qwen2-Audio, experimental
8. Request Migration · TRT-LLM: Work in progress with multimodal
9. Request Cancellation · SGLang: Remote-prefill-phase cancellation not supported in disaggregated mode
10. Request Cancellation · TRT-LLM: Engine temporarily not notified of cancellations; resources for cancelled requests are not freed (known issue)
11. LoRA · vLLM: Dynamic load/unload; KV-aware routing supports adapter affinity
12. Speculative Decoding · SGLang: Code hooks exist; no examples or docs yet
13. Speculative Decoding · vLLM: Eagle3

Superscripts reference the numbered notes above; full per-backend detail follows.

Per-Backend Detail

vLLM

vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference including video and audio.

Source: docs/backends/vllm/README.md

Feature	Supported?	Notes
Disaggregated Serving	✓	Prefill/decode separation with NIXL KV transfer
KV-Aware Routing	✓
SLA-Based Planner	✓
KV Block Manager	✓
Multimodal	✓	Image + video; audio experimental (Qwen2-Audio). With KV-aware routing, image-aware routing on documented paths (Source)
Request Migration	✓
Request Cancellation	✓
LoRA	✓	Dynamic load/unload; KV-aware routing supports adapter affinity
Tool Calling	✓
Speculative Decoding	✓	Eagle3 (Source)

Feature Interactions

Pairwise feature-by-feature compatibility within each backend. Each cell reports whether the row feature works together with the column feature. A — marks the diagonal or a combination that does not apply; blank cells are the mirror of the populated lower triangle.

Legend: ✅ Supported · 🚧 Work in Progress / Experimental / Limited
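
Because only the lower triangle is populated, looking up a pair means ordering the two features so that the later one selects the row. The sketch below illustrates that with the first five rows of the vLLM table; `FEATURES`, `LOWER`, and `interaction` are hypothetical names chosen for this example rather than anything Dynamo provides.

```python
from typing import Dict, List

# First five features, in the column order used by the tables below.
FEATURES = [
    "Disaggregated Serving",
    "KV-Aware Routing",
    "SLA-Based Planner",
    "KV Block Manager",
    "Multimodal",
]

# Lower triangle (diagonal included), copied from the first five vLLM rows.
LOWER: Dict[str, List[str]] = {
    "Disaggregated Serving": ["—"],
    "KV-Aware Routing": ["✅", "—"],
    "SLA-Based Planner": ["✅", "✅", "—"],
    "KV Block Manager": ["✅", "✅", "✅", "—"],
    "Multimodal": ["✅", "✅¹", "—", "✅", "—"],
}


def interaction(a: str, b: str) -> str:
    """Return the cell for features `a` and `b`, in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i < j:
        # The upper triangle mirrors the lower one, so swap into row >= column.
        i, j = j, i
    return LOWER[FEATURES[i]][j]


if __name__ == "__main__":
    print(interaction("KV-Aware Routing", "Multimodal"))
```

The lookup is symmetric by construction, so `interaction("Multimodal", "KV-Aware Routing")` returns the same cell.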

vLLM Feature Interactions

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	✅	✅	✅	—
Multimodal	✅	✅¹	—	✅	—
Request Migration	✅	✅	✅	✅	✅	—
Request Cancellation	✅	✅	✅	✅	✅	✅	—
LoRA	✅	✅²	—	✅	—	✅	✅	—
Tool Calling	✅	✅	✅	✅	✅	✅	✅	✅	—
Speculative Decoding	✅	✅	—	✅	—	✅	✅	—	✅	—

Notes:

1. Multimodal + KV-Aware Routing: Image-aware KV routing is supported in the documented vLLM paths. The default Rust frontend path supports model families handled by llm-multimodal; the Python chat-processor path delegates to vLLM’s multimodal processor. (Source)

2. KV-Aware LoRA Routing: vLLM supports routing requests based on LoRA adapter affinity.

3. Audio Support: vLLM supports audio models like Qwen2-Audio (experimental). (Source)

4. Video Support: vLLM supports video input with frame sampling. (Source)

5. Speculative Decoding: Eagle3 support documented. (Source)

SGLang Feature Interactions

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	🚧	🚧	🚧	—
Multimodal	✅²	¹	—	🚧	—
Request Migration	✅	✅	✅	🚧	✅	—
Request Cancellation	🚧³	✅	✅	🚧	🚧	✅	—
LoRA				🚧				—
Tool Calling	✅	✅	✅	🚧	✅	✅	✅		—
Speculative Decoding	🚧	🚧	—	🚧	—	🚧	—		🚧	—

Notes:

1. Multimodal + KV-Aware Routing: Not supported. (Source)

2. Multimodal Patterns: Supports simple Aggregated EPD, E/PD, and E/P/D patterns. Traditional Disagg EP/D is not supported. (Source)

3. Request Cancellation: Cancellation during the remote prefill phase is not supported in disaggregated mode. (Source)

4. Speculative Decoding: Code hooks exist (spec_decode_stats in publisher), but no examples or documentation yet.

TensorRT-LLM Feature Interactions

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	✅	✅	✅	—
Multimodal	✅¹	✅²	—	✅	—
Request Migration	✅	✅	✅	✅	🚧	—
Request Cancellation	✅³	✅³	✅³	✅³	✅³	✅³	—
LoRA								—
Tool Calling	✅	✅	✅	✅	✅	✅	✅		—
Speculative Decoding	✅	✅	—	✅	—	✅	✅		✅	—

Notes:

1. Multimodal Disaggregation: Supports EP/D (Traditional) and E/P/D (Full Disaggregation) image flows, including image URLs and pre-computed embeddings. (Source)

2. Multimodal + KV-Aware Routing: The native Rust frontend routes supported models using image-aware KV overlap. TRT-LLM workers must publish KV events with block reuse enabled. (Source)

3. Request Cancellation: Due to known issues, the TensorRT-LLM engine is temporarily not notified of request cancellations, meaning allocated resources for cancelled requests are not freed.