Copyright © 2026, NVIDIA Corporation.

SGLang

Use the Latest Release

We recommend using the latest stable release of Dynamo to avoid breaking changes.


Dynamo SGLang integrates SGLang engines into Dynamo’s distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang’s native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).

Installation

Install Latest Release

We recommend using uv to install:

$uv venv --python 3.12 --seed
$uv pip install "ai-dynamo[sglang]"

This installs Dynamo with the compatible SGLang version.

Install for Development

Requires Rust and the CUDA toolkit (nvcc).
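Before running the commands below, it can help to confirm that the toolchain is on your PATH. The following is a hypothetical helper, not part of Dynamo. The tool names it checks (`cargo`/`rustc` for Rust, `nvcc` for the CUDA toolkit, plus `uv` and `git`) are assumptions drawn from the requirements above.

```shell
# Hypothetical helper (not a Dynamo tool): print the space-separated list of
# commands that are NOT resolvable on PATH; succeed only if none are missing.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` is the POSIX way to check whether a command exists
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
  [ -z "$missing" ]
}

# Assumed prerequisite list: Rust (cargo, rustc), CUDA toolkit (nvcc), uv, git
check_dev_prereqs() {
  if missing=$(missing_tools cargo rustc nvcc uv git); then
    echo "All development prerequisites found."
  else
    echo "Missing: $missing" >&2
    return 1
  fi
}
```

After sourcing the snippet, run `check_dev_prereqs`; it returns non-zero and lists anything that still needs to be installed.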

$# install dynamo
$uv venv --python 3.12 --seed
$uv pip install maturin nixl
$cd $DYNAMO_HOME/lib/bindings/python
$maturin develop --uv
$cd $DYNAMO_HOME
$uv pip install -e .
$# install sglang
$git clone https://github.com/sgl-project/sglang.git
$cd sglang && uv pip install -e "python"

This setup also works well for development with coding agents: give the agent the paths to both repositories and to the virtual environment, and have it rerun these commands as it makes changes.
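A minimal rebuild script for that loop might look like the following. It is a sketch, not a Dynamo tool: the function names, the `SGLANG_HOME` variable (pointing at the SGLang clone), and the `DRY_RUN` switch are hypothetical, and the script assumes the virtual environment created above is already active.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Re-run the development install steps from above.
# Assumes DYNAMO_HOME = Dynamo checkout, SGLANG_HOME = SGLang checkout (hypothetical name).
rebuild_dev() {
  if [ -z "${DYNAMO_HOME:-}" ] || [ -z "${SGLANG_HOME:-}" ]; then
    echo "Set DYNAMO_HOME and SGLANG_HOME first" >&2
    return 1
  fi
  # Rebuild the Rust-backed Python bindings with maturin
  (cd "$DYNAMO_HOME/lib/bindings/python" && run maturin develop --uv) || return 1
  # Re-run the editable installs of Dynamo and SGLang
  (cd "$DYNAMO_HOME" && run uv pip install -e .) || return 1
  (cd "$SGLANG_HOME" && run uv pip install -e python) || return 1
}
```

For example, `DRY_RUN=1 rebuild_dev` prints the three commands without running them; omit `DRY_RUN` to execute them.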

Docker

Build and run the container:
$cd $DYNAMO_ROOT
$python container/render.py --framework sglang --output-short-filename
$docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
$docker run \
> --gpus all -it --rm \
> --network host --shm-size=10G \
> --ulimit memlock=-1 --ulimit stack=67108864 \
> --ulimit nofile=65536:65536 \
> --cap-add CAP_SYS_PTRACE --ipc host \
> dynamo:latest-sglang
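The same `docker run` invocation can be wrapped in a shell function with each flag annotated, which makes it easier to reuse with a different image tag or command. This is a hypothetical convenience wrapper: `IMAGE` and `DRY_RUN` are illustrative variables, not Dynamo settings, and the flag notes describe general Docker behavior rather than documented Dynamo requirements.

```shell
# Hypothetical wrapper around the docker run command above.
# Flag notes (general Docker semantics):
#   --gpus all                  expose all host GPUs to the container
#   -it --rm                    interactive TTY; remove the container on exit
#   --network host              share the host network stack (e.g. localhost:8000)
#   --shm-size=10G              size of the container's /dev/shm
#   --ulimit memlock=-1         no limit on locked (pinned) memory
#   --ulimit stack=67108864     64 MiB stack size limit
#   --ulimit nofile=65536:65536 soft:hard limit on open file descriptors
#   --cap-add CAP_SYS_PTRACE    allow ptrace (used by debuggers and profilers)
#   --ipc host                  share the host IPC namespace
# Extra arguments become the container command; DRY_RUN=1 only prints the command.
run_dynamo_sglang() {
  image="${IMAGE:-dynamo:latest-sglang}"
  # "$@" here still expands to the caller's arguments before `set --` replaces them
  set -- docker run \
    --gpus all -it --rm \
    --network host --shm-size=10G \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE --ipc host \
    "$image" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `run_dynamo_sglang bash` opens a shell in the container, and `IMAGE=dynamo:my-tag run_dynamo_sglang` starts a different build.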

Feature Support Matrix

| Feature | Status | Notes |
|---|---|---|
| Disaggregated Serving | ✅ | Prefill/decode separation with NIXL KV transfer |
| KV-Aware Routing | ✅ | |
| SLA-Based Planner | ✅ | |
| Multimodal Support | ✅ | Images via the EPD, E/PD, and E/P/D patterns |
| Diffusion Models | ✅ | LLM diffusion, image, and video generation |
| Request Cancellation | ✅ | Full support in aggregated mode; decode-only in disaggregated mode |
| Graceful Shutdown | ✅ | Unregisters from discovery, then applies a grace period |
| Observability | ✅ | Metrics, tracing, and Grafana dashboards |
| KVBM | ❌ | Planned |

Quick Start

Python / CLI Deployment

Start infrastructure services for local development:

$docker compose -f deploy/docker-compose.yml up -d

Launch an aggregated serving deployment:

$cd $DYNAMO_HOME/examples/backends/sglang
$./launch/agg.sh
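Model loading can take a while, so a request sent immediately after launch may fail. A minimal polling sketch follows. It assumes the frontend serves the OpenAI-compatible `/v1/models` route on port 8000 and that the model ID appears in that listing once a worker is ready; `wait_for_model` is a hypothetical helper name.

```shell
# Hypothetical readiness check: poll a models listing until it mentions the
# given model ID, or give up after a timeout (seconds).
# Assumed default URL: the frontend's OpenAI-compatible /v1/models route.
wait_for_model() {
  model="$1"
  url="${2:-http://localhost:8000/v1/models}"
  timeout="${3:-300}"
  start=$(date +%s)
  while :; do
    # -f: fail on HTTP errors; -s: no progress output; cap each attempt at 5 s
    if body=$(curl -fs --max-time 5 "$url" 2>/dev/null); then
      # Look for the quoted model ID anywhere in the JSON response
      case "$body" in
        *"\"$model\""*) echo "Ready: $model"; return 0 ;;
      esac
    fi
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $model at $url" >&2
      return 1
    fi
    sleep 2
  done
}
```

For example, run `wait_for_model Qwen/Qwen3-0.6B` before sending the verification request.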

Verify the deployment:

$curl localhost:8000/v1/chat/completions \
> -H "Content-Type: application/json" \
> -d '{
> "model": "Qwen/Qwen3-0.6B",
> "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
> "stream": true,
> "max_tokens": 30
> }'
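To send other prompts without hand-editing the JSON, a small helper can build the request body. This sketch is hypothetical: the function names are illustrative, it sets `"stream": false` so the reply arrives as a single JSON object (assuming the frontend honors non-streaming requests, as OpenAI-compatible servers generally do), and its escaping covers only backslashes and double quotes, so prompts must fit on a single line.

```shell
# Escape backslashes first, then double quotes, for embedding in a JSON string.
# Control characters (tabs, newlines) are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a chat-completions request body: chat_payload MODEL PROMPT [MAX_TOKENS]
chat_payload() {
  model=$(json_escape "$1")
  prompt=$(json_escape "$2")
  max_tokens="${3:-30}"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "stream": false, "max_tokens": %s}\n' \
    "$model" "$prompt" "$max_tokens"
}

# Send the request to the local frontend; curl reads the body from stdin (-d @-)
chat() {
  chat_payload "$@" | curl -s localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @-
}
```

For example: `chat Qwen/Qwen3-0.6B 'Summarize the benefits of disaggregated serving' 50`.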

Kubernetes Deployment

You can deploy SGLang with Dynamo on Kubernetes using a DynamoGraphDeployment. For more details, see the SGLang Kubernetes Deployment Guide.

Next Steps

  • Reference Guide: Worker types, architecture, and configuration
  • Examples: All deployment patterns with launch scripts
  • Disaggregation: P/D architecture and KV transfer details
  • Diffusion: LLM, image, and video diffusion models
  • Observability: Metrics, tracing, and Grafana dashboards
  • Deploying SGLang with Dynamo on Kubernetes: Kubernetes deployment guide