
Copyright © 2026, NVIDIA Corporation.


Feature Matrix


This page summarizes which key Dynamo features are supported on each backend (SGLang, TensorRT-LLM, and vLLM), both individually and in combination.

Updated for Dynamo v1.0.0

Legend:

  • ✅ : Supported
  • 🚧 : Work in Progress / Experimental / Limited

Quick Comparison

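| Feature | SGLang | TensorRT-LLM | vLLM | Source |
|---|---|---|---|---|
| Disaggregated Serving | ✅ | ✅ | ✅ | Design Doc |
| KV-Aware Routing | ✅ | ✅ | ✅ | Router Doc |
| SLA-Based Planner | ✅ | ✅ | ✅ | Planner Doc |
| KV Block Manager | 🚧 | ✅ | ✅ | KVBM Doc |
| Multimodal (Image) | ✅ | ✅ | ✅ | Multimodal Doc |
| Multimodal (Video) | | | ✅ | Multimodal Doc |
| Multimodal (Audio) | | | 🚧 | Multimodal Doc |
| Request Migration | ✅ | 🚧 | ✅ | Migration Doc |
| Request Cancellation | 🚧 | ✅ | ✅ | Backend READMEs |
| LoRA | | | ✅ | K8s Guide |
| Tool Calling | ✅ | ✅ | ✅ | Tool Calling Doc |
| Speculative Decoding | 🚧 | ✅ | ✅ | Backend READMEs |

Dynamo Snapshot is marked ✅ for two of the three backends (see Snapshot Docs).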
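
To check support programmatically, for example in a deployment script that picks a backend, the Quick Comparison can be transcribed into a lookup table. The following is a hypothetical helper, not part of Dynamo: `Status`, `QUICK_COMPARISON`, `status`, and `backends_supporting` are illustrative names. The single-symbol rows (Multimodal Video and Audio, LoRA) are assigned to the vLLM column following the vLLM section of this page, and the Dynamo Snapshot row is left out.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    """Legend symbols used in the feature matrix."""

    SUPPORTED = "✅"
    WIP = "🚧"  # Work in Progress / Experimental / Limited
    NONE = ""  # empty cell: no support listed


# Column order of the Quick Comparison table.
BACKENDS: Tuple[str, ...] = ("SGLang", "TensorRT-LLM", "vLLM")

S, W, N = Status.SUPPORTED, Status.WIP, Status.NONE

# Hypothetical transcription of the Quick Comparison rows (Dynamo v1.0.0).
# Dynamo Snapshot is omitted; see the Snapshot Docs for its backends.
QUICK_COMPARISON: Dict[str, Tuple[Status, Status, Status]] = {
    "Disaggregated Serving": (S, S, S),
    "KV-Aware Routing": (S, S, S),
    "SLA-Based Planner": (S, S, S),
    "KV Block Manager": (W, S, S),
    "Multimodal (Image)": (S, S, S),
    "Multimodal (Video)": (N, N, S),
    "Multimodal (Audio)": (N, N, W),
    "Request Migration": (S, W, S),
    "Request Cancellation": (W, S, S),
    "LoRA": (N, N, S),
    "Tool Calling": (S, S, S),
    "Speculative Decoding": (W, S, S),
}


def status(feature: str, backend: str) -> Status:
    """Return the legend status of one feature on one backend."""
    return QUICK_COMPARISON[feature][BACKENDS.index(backend)]


def backends_supporting(feature: str, allow_wip: bool = False) -> List[str]:
    """List backends marked ✅ for a feature, optionally including 🚧 entries."""
    accepted = {Status.SUPPORTED, Status.WIP} if allow_wip else {Status.SUPPORTED}
    return [b for b, s in zip(BACKENDS, QUICK_COMPARISON[feature]) if s in accepted]
```

For example, `backends_supporting("Request Migration")` returns `["SGLang", "vLLM"]`, while passing `allow_wip=True` also includes TensorRT-LLM, whose migration support is marked 🚧.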

1. vLLM Backend

vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, and LoRA adapters, plus multimodal inference that covers video input and experimental audio support.

Source: docs/backends/vllm/README.md

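| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
|---|---|---|---|---|---|---|---|---|---|---|
| Disaggregated Serving | — |
| KV-Aware Routing | ✅ | — |
| SLA-Based Planner | ✅ | ✅ | — |
| KV Block Manager | ✅ | ✅ | ✅ | — |
| Multimodal | ✅ | ¹ | — | ✅ | — |
| Request Migration | ✅ | ✅ | ✅ | ✅ | ✅ | — |
| Request Cancellation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
| LoRA | ✅ | ✅² | — | ✅ | — | ✅ | ✅ | — |
| Tool Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
| Speculative Decoding | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | — | ✅ | — |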

Notes:

  1. Multimodal + KV-Aware Routing: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. (Source)
  2. KV-Aware LoRA Routing: vLLM supports routing requests based on LoRA adapter affinity.
  3. Audio Support: vLLM supports audio models like Qwen2-Audio (experimental). (Source)
  4. Video Support: vLLM supports video input with frame sampling. (Source)
  5. Speculative Decoding: Eagle3 support is documented. (Source)
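
The per-backend tables are lower-triangular: the cell in row R and column C describes features R and C used together, so a pair is always read from the row of the feature listed later, and "—" on the diagonal marks a feature paired with itself. A hypothetical sketch of that lookup over the vLLM table follows; `FEATURES`, `VLLM_ROWS`, `combination`, and `split_cell` are illustrative names, and note markers are kept as superscript digits (¹, ²) pointing to the numbered notes above.

```python
from typing import List, Optional, Tuple

# Row and column order shared by the per-backend tables.
FEATURES: List[str] = [
    "Disaggregated Serving", "KV-Aware Routing", "SLA-Based Planner",
    "KV Block Manager", "Multimodal", "Request Migration",
    "Request Cancellation", "LoRA", "Tool Calling", "Speculative Decoding",
]

# Lower triangle of the vLLM table: row i holds columns 0..i (diagonal last).
VLLM_ROWS: List[List[str]] = [
    ["—"],
    ["✅", "—"],
    ["✅", "✅", "—"],
    ["✅", "✅", "✅", "—"],
    ["✅", "¹", "—", "✅", "—"],
    ["✅", "✅", "✅", "✅", "✅", "—"],
    ["✅", "✅", "✅", "✅", "✅", "✅", "—"],
    ["✅", "✅²", "—", "✅", "—", "✅", "✅", "—"],
    ["✅"] * 8 + ["—"],
    ["✅", "✅", "—", "✅", "—", "✅", "✅", "—", "✅", "—"],
]

# Superscript note markers mapped to note numbers.
SUPERSCRIPT_NOTES = {"¹": 1, "²": 2, "³": 3}


def combination(a: str, b: str) -> str:
    """Return the raw cell for two distinct features (argument order does not matter)."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    if i == j:
        raise ValueError("a feature is not paired with itself")
    # Lower-triangular storage: the later feature selects the row.
    return VLLM_ROWS[max(i, j)][min(i, j)]


def split_cell(cell: str) -> Tuple[str, Optional[int]]:
    """Separate a cell into its symbol and an optional note number."""
    if cell and cell[-1] in SUPERSCRIPT_NOTES:
        return cell[:-1], SUPERSCRIPT_NOTES[cell[-1]]
    return cell, None
```

For instance, `combination("Multimodal", "KV-Aware Routing")` returns `"¹"`, which sends the reader to note 1 (the router falls back to random/round-robin routing for multimodal requests).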

2. SGLang Backend

SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.

Source: docs/backends/sglang/README.md

FeatureDisaggregated ServingKV-Aware RoutingSLA-Based PlannerKV Block ManagerMultimodalRequest MigrationRequest CancellationLoRATool CallingSpeculative Decoding
Disaggregated Serving—
KV-Aware Routing✅—
SLA-Based Planner✅✅—
KV Block Manager🚧🚧🚧—
Multimodal✅21—🚧—
Request Migration✅✅✅🚧✅—
Request Cancellation🚧3✅✅🚧🚧✅—
LoRA🚧—
Tool Calling✅✅✅🚧✅✅✅—
Speculative Decoding🚧🚧—🚧—🚧—🚧—

Notes:

  1. Multimodal + KV-Aware Routing: Not supported. (Source)
  2. Multimodal Patterns: Supports E/PD and E/P/D only (requires separate vision encoder). Does not support simple Aggregated (EPD) or Traditional Disagg (EP/D). (Source)
  3. Request Cancellation: Cancellation during the remote prefill phase is not supported in disaggregated mode. (Source)
  4. Speculative Decoding: Code hooks exist (spec_decode_stats in publisher), but no examples or documentation yet.
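
Note 2 uses the multimodal disaggregation notation in which E, P, and D stand for the encode, prefill, and decode stages, and "/" separates stages that run in different workers. Under that reading, the SGLang rule reduces to "the vision encoder must be a worker of its own". The sketch below is a hypothetical pre-deployment check, not a Dynamo API; `parse_pattern` and `sglang_supports_pattern` are illustrative names.

```python
from typing import List

STAGES = "EPD"  # Encode, Prefill, Decode


def parse_pattern(pattern: str) -> List[str]:
    """Split a pattern such as "E/PD" into worker groups, e.g. ["E", "PD"]."""
    groups = pattern.split("/")
    # Each stage must appear exactly once and no worker group may be empty.
    if any(not g for g in groups) or sorted("".join(groups)) != sorted(STAGES):
        raise ValueError(f"not a valid E/P/D pattern: {pattern!r}")
    return groups


def sglang_supports_pattern(pattern: str) -> bool:
    """Per note 2: SGLang needs the encoder (E) in a separate worker."""
    return "E" in parse_pattern(pattern)
```

With this check, `E/PD` and `E/P/D` pass, while aggregated `EPD` and traditional `EP/D` are rejected, matching note 2.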

3. TensorRT-LLM Backend

TensorRT-LLM focuses on maximum inference performance, with full KVBM integration and robust disaggregated serving support.

Source: docs/backends/trtllm/README.md

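| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
|---|---|---|---|---|---|---|---|---|---|---|
| Disaggregated Serving | — |
| KV-Aware Routing | ✅ | — |
| SLA-Based Planner | ✅ | ✅ | — |
| KV Block Manager | ✅ | ✅ | ✅ | — |
| Multimodal | ✅¹ | ² | — | ✅ | — |
| Request Migration | ✅ | ✅ | ✅ | ✅ | 🚧 | — |
| Request Cancellation | ✅³ | ✅³ | ✅³ | ✅³ | ✅³ | ✅³ | — |
| LoRA | | | | | | | | — |
| Tool Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | — |
| Speculative Decoding | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | | ✅ | — |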

Notes:

  1. Multimodal Disaggregation: Fully supports EP/D (Traditional) pattern. E/P/D (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. (Source)
  2. Multimodal + KV-Aware Routing: Not supported. The KV router currently tracks token-based blocks only. (Source)
  3. Request Cancellation: Due to known issues, the TensorRT-LLM engine is temporarily not notified of request cancellations, so resources allocated to cancelled requests are not freed.
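
Note 2 here and note 1 in the vLLM section give the same reason for the missing Multimodal + KV-Aware Routing support: the router tracks blocks by token IDs only. The sketch below illustrates why that is not enough, using a hypothetical chained block hash rather than Dynamo's actual hashing scheme. The block size, the placeholder token ID, and the `media_digest` parameter are assumptions. An image in a prompt is typically represented by placeholder tokens that look the same for every image, so two requests with different images produce identical token-only hashes, while mixing an image digest into the affected blocks separates them.

```python
import hashlib
from typing import List, Optional

BLOCK_SIZE = 4  # hypothetical; real engines use their own KV block size
IMAGE_PLACEHOLDER = 32000  # hypothetical token ID standing in for image features


def block_hashes(
    token_ids: List[int],
    block_size: int = BLOCK_SIZE,
    media_digest: Optional[str] = None,
    placeholder: Optional[int] = None,
) -> List[str]:
    """Chain-hash each full block so that a hash also identifies its whole prefix.

    If media_digest is given, it is mixed into every block containing the
    placeholder token: the kind of image hash a token-only router lacks.
    """
    hashes: List[str] = []
    parent = ""
    full_len = len(token_ids) - len(token_ids) % block_size  # ignore partial tail block
    for start in range(0, full_len, block_size):
        block = token_ids[start:start + block_size]
        payload = parent + "|" + ",".join(map(str, block))
        if media_digest is not None and placeholder in block:
            payload += "|" + media_digest
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


# A text prompt followed by one image; the image appears only as placeholders,
# so this token list is the same no matter which image was attached.
REQUEST_TOKENS = [101, 2023, 2003, 1037] + [IMAGE_PLACEHOLDER] * BLOCK_SIZE
```

With token-only hashing, `block_hashes(REQUEST_TOKENS)` is identical for a request carrying one photo and a request carrying a different photo, so the router cannot tell which worker really holds matching KV blocks. Passing different `media_digest` values together with `placeholder=IMAGE_PLACEHOLDER` keeps the text-only first block shared and makes the image block differ.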