# Dynamo Feature Compatibility Matrices
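This document provides compatibility matrices for key Dynamo features across the supported backends (vLLM, SGLang, and TensorRT-LLM): a quick per-backend comparison, followed by a pairwise feature matrix for each backend.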
Updated for Dynamo v0.9.0
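Legend:

- ✅ : Supported
- 🚧 : Work in Progress / Experimental / Limited
- Numbered markers in a cell refer to the notes below that backend's matrix (marker 1 is the first note, and so on).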
## Quick Comparison
| Feature | vLLM | TensorRT-LLM | SGLang | Source |
|---|---|---|---|---|
| Disaggregated Serving | ✅ | ✅ | ✅ | |
| KV-Aware Routing | ✅ | ✅ | ✅ | |
| SLA-Based Planner | ✅ | ✅ | ✅ | |
| KV Block Manager | ✅ | ✅ | 🚧 | |
| Multimodal (Image) | ✅ | ✅ | ✅ | |
| Multimodal (Video) | ✅ | | | |
| Multimodal (Audio) | 🚧 | | | |
| Request Migration | ✅ | 🚧 | ✅ | |
| Request Cancellation | ✅ | ✅ | 🚧 | Backend READMEs |
| LoRA | ✅ | | | |
| Tool Calling | ✅ | ✅ | ✅ | |
| Speculative Decoding | ✅ | ✅ | 🚧 | Backend READMEs |
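The Quick Comparison table is effectively a feature-by-backend lookup. As a reading aid, here is a minimal Python sketch that encodes three of its rows and lists the backends at a given support level. The names `Status`, `QUICK_COMPARISON`, and `backends_with` are hypothetical and not part of Dynamo; blank cells are modeled as `None` because the legend does not assign them a symbol.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """Legend symbols used on this page."""
    SUPPORTED = "✅"
    WIP = "🚧"  # Work in Progress / Experimental / Limited


# Hypothetical encoding of selected Quick Comparison rows.
# None marks a blank cell; the legend does not define a symbol for it.
QUICK_COMPARISON: dict[str, dict[str, Optional[Status]]] = {
    "KV Block Manager": {
        "vLLM": Status.SUPPORTED,
        "TensorRT-LLM": Status.SUPPORTED,
        "SGLang": Status.WIP,
    },
    "Multimodal (Video)": {
        "vLLM": Status.SUPPORTED,
        "TensorRT-LLM": None,
        "SGLang": None,
    },
    "Request Migration": {
        "vLLM": Status.SUPPORTED,
        "TensorRT-LLM": Status.WIP,
        "SGLang": Status.SUPPORTED,
    },
}


def backends_with(feature: str, include_wip: bool = False) -> list[str]:
    """Return backends whose cell for `feature` is ✅ (and 🚧 if include_wip)."""
    accepted = {Status.SUPPORTED}
    if include_wip:
        accepted.add(Status.WIP)
    row = QUICK_COMPARISON[feature]
    return [backend for backend, status in row.items() if status in accepted]


print(backends_with("KV Block Manager"))                    # ['vLLM', 'TensorRT-LLM']
print(backends_with("KV Block Manager", include_wip=True))  # ['vLLM', 'TensorRT-LLM', 'SGLang']
```

Keeping 🚧 as a separate status, rather than folding it into ✅, preserves the distinction the legend draws between full and experimental or limited support.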
## 1. vLLM Backend
vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference including video; audio support is experimental.
Source: docs/backends/vllm/README.md
| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
|---|---|---|---|---|---|---|---|---|---|---|
| Disaggregated Serving | — | | | | | | | | | |
| KV-Aware Routing | ✅ | — | | | | | | | | |
| SLA-Based Planner | ✅ | ✅ | — | | | | | | | |
| KV Block Manager | ✅ | ✅ | ✅ | — | | | | | | |
| Multimodal | ✅ | ¹ | — | ✅ | — | | | | | |
| Request Migration | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | | |
| Request Cancellation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
| LoRA | ✅ | ✅² | — | ✅ | — | ✅ | ✅ | — | | |
| Tool Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | |
| Speculative Decoding | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | — | ✅ | — |
Notes:

1. Multimodal + KV-Aware Routing: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. (Source)
2. KV-Aware LoRA Routing: vLLM supports routing requests based on LoRA adapter affinity.
3. Audio Support: vLLM supports audio models such as Qwen2-Audio (experimental). (Source)
4. Video Support: vLLM supports video input with frame sampling. (Source)
5. Speculative Decoding: Eagle3 support is documented. (Source)
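Each backend matrix is symmetric and only its lower triangle is filled in, so the cell for two features sits on the row of whichever feature comes later in the column order. The Python sketch below shows that lookup using a few vLLM cells copied from the table; `FEATURES`, `VLLM_CELLS`, and `pair_status` are hypothetical names, and off-diagonal `—` cells are returned verbatim because the legend does not define them.

```python
# Column order shared by the three backend matrices on this page.
FEATURES = [
    "Disaggregated Serving",
    "KV-Aware Routing",
    "SLA-Based Planner",
    "KV Block Manager",
    "Multimodal",
    "Request Migration",
    "Request Cancellation",
    "LoRA",
    "Tool Calling",
    "Speculative Decoding",
]

# A few lower-triangle cells of the vLLM matrix, keyed (row, column).
# Only cells without footnote markers are included here.
VLLM_CELLS = {
    ("KV Block Manager", "Disaggregated Serving"): "✅",
    ("Request Migration", "Multimodal"): "✅",
    ("LoRA", "Multimodal"): "—",
    ("Speculative Decoding", "Tool Calling"): "✅",
}


def pair_status(cells: dict[tuple[str, str], str], a: str, b: str) -> str:
    """Return the matrix cell for features a and b, given in either order.

    The feature that comes later in FEATURES is the row, because only the
    lower triangle is filled in. Returns "" for a pair not encoded in `cells`.
    """
    if a == b:
        return "—"  # diagonal: a feature paired with itself
    row, col = sorted((a, b), key=FEATURES.index, reverse=True)
    return cells.get((row, col), "")


print(pair_status(VLLM_CELLS, "Tool Calling", "Speculative Decoding"))  # ✅
print(pair_status(VLLM_CELLS, "Multimodal", "LoRA"))                    # —
```

Normalizing the argument order inside the helper means callers never need to know which half of the matrix holds the data.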
## 2. SGLang Backend
SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.
Source: docs/backends/sglang/README.md
| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
|---|---|---|---|---|---|---|---|---|---|---|
| Disaggregated Serving | — | | | | | | | | | |
| KV-Aware Routing | ✅ | — | | | | | | | | |
| SLA-Based Planner | ✅ | ✅ | — | | | | | | | |
| KV Block Manager | 🚧 | 🚧 | 🚧 | — | | | | | | |
| Multimodal | ✅² | ¹ | — | 🚧 | — | | | | | |
| Request Migration | ✅ | ✅ | ✅ | 🚧 | ✅ | — | | | | |
| Request Cancellation | 🚧³ | ✅ | ✅ | 🚧 | 🚧 | ✅ | — | | | |
| LoRA | 🚧 | — | | | | | | | | |
| Tool Calling | ✅ | ✅ | ✅ | 🚧 | ✅ | ✅ | ✅ | — | | |
| Speculative Decoding | 🚧 | 🚧 | — | 🚧 | — | 🚧 | — | 🚧 | — | |
Notes:

1. Multimodal + KV-Aware Routing: Not supported. (Source)
2. Multimodal Patterns: Supports only E/PD and E/P/D, both of which require a separate vision encoder. Simple Aggregated (EPD) and Traditional Disagg (EP/D) are not supported. (Source)
3. Request Cancellation: Cancellation during the remote prefill phase is not supported in disaggregated mode. (Source)
4. Speculative Decoding: Code hooks exist (`spec_decode_stats` in the publisher), but there are no examples or documentation yet.
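The multimodal pattern names group three stages: E (encode, run by the vision encoder), P (prefill), and D (decode), with `/` separating stages that run on different workers, so EPD is aggregated, EP/D is traditional disaggregation, and E/P/D is full disaggregation. The Python sketch below assumes that reading; `parse_pattern`, `has_separate_encoder`, and `SGLANG_MM_PATTERNS` are hypothetical names, and the supported set is copied from the Multimodal Patterns note above.

```python
# Hypothetical helpers. E = encode (vision encoder), P = prefill, D = decode;
# "/" separates stages that run on different workers.
def parse_pattern(pattern: str) -> list[set[str]]:
    """Split a pattern such as "E/PD" into per-worker stage groups."""
    groups = pattern.split("/")
    if sorted("".join(groups)) != ["D", "E", "P"] or "" in groups:
        raise ValueError(f"expected E, P and D exactly once: {pattern!r}")
    return [set(group) for group in groups]


def has_separate_encoder(pattern: str) -> bool:
    """True if the encode stage has a worker of its own."""
    return {"E"} in parse_pattern(pattern)


# Patterns the SGLang Multimodal Patterns note lists as supported.
SGLANG_MM_PATTERNS = {"E/PD", "E/P/D"}

for pattern in ["EPD", "EP/D", "E/PD", "E/P/D"]:
    print(pattern, has_separate_encoder(pattern), pattern in SGLANG_MM_PATTERNS)
```

For these four patterns the two booleans agree: every pattern listed as supported places the encoder on its own worker, which matches the note's requirement for a separate vision encoder.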
## 3. TensorRT-LLM Backend
TensorRT-LLM targets maximum inference performance, with full KV Block Manager (KVBM) integration and robust disaggregated serving support.
Source: docs/backends/trtllm/README.md
| Feature | Disaggregated Serving | KV-Aware Routing | SLA-Based Planner | KV Block Manager | Multimodal | Request Migration | Request Cancellation | LoRA | Tool Calling | Speculative Decoding |
|---|---|---|---|---|---|---|---|---|---|---|
| Disaggregated Serving | — | | | | | | | | | |
| KV-Aware Routing | ✅ | — | | | | | | | | |
| SLA-Based Planner | ✅ | ✅ | — | | | | | | | |
| KV Block Manager | ✅ | ✅ | ✅ | — | | | | | | |
| Multimodal | ✅¹ | ² | — | ✅ | — | | | | | |
| Request Migration | 🚧³ | ✅ | ✅ | ✅ | 🚧 | — | | | | |
| Request Cancellation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | | |
| LoRA | — | | | | | | | | | |
| Tool Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — | | |
| Speculative Decoding | ✅ | ✅ | — | ✅ | — | ✅ | ✅ | ✅ | — | |
Notes:

1. Multimodal Disaggregation: Fully supports the EP/D (Traditional) pattern. E/P/D (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. (Source)
2. Multimodal + KV-Aware Routing: Not supported. The KV router currently tracks token-based blocks only. (Source)
3. Request Migration: Supported on Decode/Aggregated workers only; Prefill workers do not support migration. (Source)
4. Speculative Decoding: Llama 4 + Eagle support is documented. (Source)
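The multimodal KV-routing notes for vLLM and TensorRT-LLM come down to what goes into a block hash. In the Python sketch below, which illustrates the idea and is not Dynamo's actual hashing scheme (the block size, SHA-256 chaining, placeholder token ID, and the helpers `block_hashes` and `request_block_hashes` are all assumptions), two requests with the same text but different images hash identically when only token IDs are hashed, so a token-based router cannot tell their KV caches apart. Mixing an image digest into the hash separates them; per the notes, the Dynamo KV router does not support image/video hashes yet.

```python
import hashlib

BLOCK_SIZE = 4         # assumption: tokens per KV block, tiny for illustration
IMAGE_PLACEHOLDER = 0  # assumption: token ID that stands in for image content


def block_hashes(token_ids: list[int], salt: bytes = b"") -> list[str]:
    """Chain-hash each full block of token IDs, prefix-cache style.

    Each block's hash also covers the previous block's hash, so equal hashes
    imply an equal prefix. `salt` seeds the chain; it is empty when only
    token IDs are hashed.
    """
    hashes: list[str] = []
    parent = salt
    full = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for start in range(0, full, BLOCK_SIZE):
        block = token_ids[start:start + BLOCK_SIZE]
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        hashes.append(digest.hex())
        parent = digest
    return hashes


def request_block_hashes(image: bytes, tokens: list[int], include_image: bool) -> list[str]:
    """Hash a multimodal request; the image affects the result only if include_image."""
    salt = hashlib.sha256(image).digest() if include_image else b""
    return block_hashes(tokens, salt)


# Same text prompt behind the same image placeholder tokens (hypothetical IDs).
prompt_tokens = [IMAGE_PLACEHOLDER] * BLOCK_SIZE + [101, 7, 42, 9]

# Token-only hashing: the two images are invisible to the hash.
cat = request_block_hashes(b"cat pixels", prompt_tokens, include_image=False)
dog = request_block_hashes(b"dog pixels", prompt_tokens, include_image=False)
print(cat == dog)  # True

# Image-aware hashing (hypothetical): the digests now differ.
cat = request_block_hashes(b"cat pixels", prompt_tokens, include_image=True)
dog = request_block_hashes(b"dog pixels", prompt_tokens, include_image=True)
print(cat == dog)  # False
```

With token-only hashes, a router would score both requests as sharing the same cached prefix, which is why the vLLM note describes a fallback to random/round-robin routing for multimodal requests instead.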