Dynamo Feature Compatibility Matrices

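This document summarizes which key Dynamo features each supported backend (vLLM, SGLang, and TensorRT-LLM) supports, first in a quick cross-backend comparison and then in a per-backend matrix showing how those features combine.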

Updated for Dynamo v0.9.0

Legend:

  • ✅ : Supported

  • 🚧 : Work in Progress / Experimental / Limited

Quick Comparison

Feature

vLLM

TensorRT-LLM

SGLang

Source

Disaggregated Serving

Design Doc

KV-Aware Routing

Router Doc

SLA-Based Planner

Planner Doc

KV Block Manager

🚧

KVBM Doc

Multimodal (Image)

Multimodal Doc

Multimodal (Video)

Multimodal Doc

Multimodal (Audio)

🚧

Multimodal Doc

Request Migration

🚧

Migration Doc

Request Cancellation

🚧

Backend READMEs

LoRA

K8s Guide

Tool Calling

Tool Calling Doc

Speculative Decoding

🚧

Backend READMEs

1. vLLM Backend

vLLM offers the broadest feature coverage in Dynamo, with full support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference including video and audio.

Source: docs/backends/vllm/README.md

Feature

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

Multimodal

Request Migration

Request Cancellation

LoRA

Tool Calling

Speculative Decoding

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

Multimodal

1

Request Migration

Request Cancellation

LoRA

2

Tool Calling

Speculative Decoding

Notes:

  1. Multimodal + KV-Aware Routing: The KV router uses token-based hashing and does not yet support image/video hashes, so it falls back to random/round-robin routing. (Source)

  2. KV-Aware LoRA Routing: vLLM supports routing requests based on LoRA adapter affinity.

  3. Audio Support: vLLM has experimental support for audio models such as Qwen2-Audio. (Source)

  4. Video Support: vLLM supports video input with frame sampling. (Source)

  5. Speculative Decoding: Eagle3 support is documented. (Source)
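Note 1 can be pictured with a small routing sketch. This is an illustration under stated assumptions, not Dynamo's router: `BLOCK_SIZE`, `Worker`, `Router`, and `block_hashes` are hypothetical names, and SHA-256 from Python's `hashlib` stands in for whatever hash the KV router actually uses. Text-only requests go to the worker sharing the longest prefix of token-block hashes. Requests carrying image or video parts skip that lookup and are assigned round-robin, because the media content is not reflected in the token hashes.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def block_hashes(token_ids):
    """Return one chained hash per full block, so each hash identifies a whole prefix."""
    hashes = []
    prev = ""
    full_len = len(token_ids) - len(token_ids) % BLOCK_SIZE
    for start in range(0, full_len, BLOCK_SIZE):
        block = ",".join(str(t) for t in token_ids[start:start + BLOCK_SIZE])
        prev = hashlib.sha256(f"{prev}|{block}".encode()).hexdigest()
        hashes.append(prev)
    return hashes


@dataclass
class Worker:
    name: str
    cached: set = field(default_factory=set)  # block hashes this worker holds


def prefix_overlap(worker, hashes):
    """Count the leading blocks of the request already cached on the worker."""
    matched = 0
    for h in hashes:
        if h not in worker.cached:
            break
        matched += 1
    return matched


class Router:
    """Hypothetical router: token-hash affinity with a round-robin fallback."""

    def __init__(self, workers):
        self.workers = list(workers)
        self._next = 0  # round-robin cursor for the fallback path

    def pick(self, token_ids, has_media=False):
        if has_media:
            # Image/video content is not captured by the token hashes,
            # so skip the cache lookup and rotate through the workers.
            worker = self.workers[self._next % len(self.workers)]
            self._next += 1
            return worker
        hashes = block_hashes(token_ids)
        # Ties resolve to the earliest worker in the list.
        return max(self.workers, key=lambda w: prefix_overlap(w, hashes))
```

In this sketch the fallback loses cache affinity for multimodal requests but still spreads them evenly, which is the trade-off the note describes.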

2. SGLang Backend

SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.

Source: docs/backends/sglang/README.md

Feature

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

Multimodal

Request Migration

Request Cancellation

LoRA

Tool Calling

Speculative Decoding

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

🚧

🚧

🚧

Multimodal

2

1

🚧

Request Migration

🚧

Request Cancellation

🚧3

🚧

🚧

LoRA

🚧

Tool Calling

🚧

Speculative Decoding

🚧

🚧

🚧

🚧

🚧

Notes:

  1. Multimodal + KV-Aware Routing: Not supported. (Source)

  2. Multimodal Patterns: Only E/PD and E/P/D are supported, and both require a separate vision encoder. Simple Aggregated (EPD) and Traditional Disagg (EP/D) are not supported. (Source)

  3. Request Cancellation: Cancellation during the remote prefill phase is not supported in disaggregated mode. (Source)

  4. Speculative Decoding: Code hooks exist (spec_decode_stats in publisher), but there are no examples or documentation yet.
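Notes 2 and 3 reduce to two small lookups that a deployment script could run before launch. The sketch below is a hypothetical pre-flight check, not an SGLang or Dynamo API: the names `SGLANG_MULTIMODAL_PATTERNS`, `Phase`, `multimodal_pattern_supported`, and `cancellation_supported` are invented for illustration. Treating every phase other than remote prefill as cancellable is an assumption of the sketch, since the notes only state the unsupported case.

```python
from enum import Enum

# Pattern support as listed in note 2 (True = supported).
SGLANG_MULTIMODAL_PATTERNS = {
    "E/PD": True,    # separate encoder, combined prefill + decode
    "E/P/D": True,   # separate encoder, prefill, and decode
    "EPD": False,    # simple aggregated
    "EP/D": False,   # traditional disaggregated
}


class Phase(Enum):
    REMOTE_PREFILL = "remote_prefill"
    DECODE = "decode"


def multimodal_pattern_supported(pattern):
    """Look up a deployment pattern; unknown names are rejected rather than guessed."""
    if pattern not in SGLANG_MULTIMODAL_PATTERNS:
        raise ValueError(f"unknown multimodal pattern: {pattern!r}")
    return SGLANG_MULTIMODAL_PATTERNS[pattern]


def cancellation_supported(disaggregated, phase):
    """Per note 3, only remote prefill in disaggregated mode is ruled out.

    Assumption: all other combinations are treated as cancellable here.
    """
    return not (disaggregated and phase is Phase.REMOTE_PREFILL)
```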

3. TensorRT-LLM Backend

TensorRT-LLM delivers maximum inference performance and optimization, with full KVBM integration and robust disaggregated serving support.

Source: docs/backends/trtllm/README.md

Feature

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

Multimodal

Request Migration

Request Cancellation

LoRA

Tool Calling

Speculative Decoding

Disaggregated Serving

KV-Aware Routing

SLA-Based Planner

KV Block Manager

Multimodal

1

2

Request Migration

🚧3

🚧

Request Cancellation

LoRA

Tool Calling

Speculative Decoding

Notes:

  1. Multimodal Disaggregation: The EP/D (Traditional) pattern is fully supported. E/P/D (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. (Source)

  2. Multimodal + KV-Aware Routing: Not supported. The KV router currently tracks token-based blocks only. (Source)

  3. Request Migration: Supported on Decode/Aggregated workers only. Prefill workers do not support migration. (Source)

  4. Speculative Decoding: Llama 4 + Eagle support is documented. (Source)
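Note 3 restricts migration by worker role. As a hedged illustration (the `WorkerRole` enum and the `migration_candidates` helper are hypothetical, not part of Dynamo or TensorRT-LLM), a scheduler could filter eligible workers like this:

```python
from enum import Enum


class WorkerRole(Enum):
    PREFILL = "prefill"
    DECODE = "decode"
    AGGREGATED = "aggregated"


# Roles that note 3 lists as supporting request migration.
MIGRATABLE_ROLES = frozenset({WorkerRole.DECODE, WorkerRole.AGGREGATED})


def supports_migration(role):
    """Prefill workers are excluded; decode and aggregated workers qualify."""
    return role in MIGRATABLE_ROLES


def migration_candidates(workers):
    """Given a {name: role} mapping, return the eligible worker names, sorted."""
    return sorted(name for name, role in workers.items() if supports_migration(role))
```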


Source References