Copyright © 2026, NVIDIA Corporation.

Diffusion

Dynamo SGLang supports three types of diffusion-based generation: LLM diffusion (text generation via iterative refinement), image diffusion (text-to-image), and video generation (text-to-video). Each uses a different worker flag and handler, but all integrate with SGLang’s DiffGenerator.

Overview

| Type | Worker Flag | API Endpoint |
|------|-------------|--------------|
| LLM Diffusion | --dllm-algorithm <algo> | /v1/chat/completions, /v1/completions |
| Image Diffusion | --image-diffusion-worker | /v1/images/generations |
| Video Generation | --video-generation-worker | /v1/videos |

If you see a cuDNN version mismatch error on startup (for example, "cuDNN frontend 1.8.1 requires cuDNN lib >= 9.5.0"), set SGLANG_DISABLE_CUDNN_CHECK=1 before launching. This commonly happens when PyTorch ships a cuDNN version older than the one SGLang requires for Conv3d operations.
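When the worker is started from a Python wrapper rather than an interactive shell, the same variable can be set on the child process environment. The following is a minimal sketch, not part of Dynamo: the helper names `sglang_env` and `launch` are hypothetical, and it assumes DYNAMO_HOME is set and that the launch scripts live under examples/backends/sglang as in the commands on this page.

```python
import os
import subprocess


def sglang_env(disable_cudnn_check=True, base_env=None):
    """Copy an environment and optionally set SGLANG_DISABLE_CUDNN_CHECK=1."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_cudnn_check:
        env["SGLANG_DISABLE_CUDNN_CHECK"] = "1"
    return env


def launch(script, env):
    """Start a launch script from the SGLang examples directory.

    `script` is a path relative to that directory, for example
    "./launch/image_diffusion.sh". This function is only defined here,
    never called, so importing the sketch has no side effects.
    """
    workdir = os.path.join(env["DYNAMO_HOME"], "examples", "backends", "sglang")
    return subprocess.Popen(["bash", script], cwd=workdir, env=env)
```

Calling `launch("./launch/image_diffusion.sh", sglang_env())` would then start the worker with the check disabled, without changing the environment of the calling process.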

LLM Diffusion

Diffusion Language Models generate text through iterative refinement rather than autoregressive token-by-token generation. The model starts with masked tokens and progressively replaces them with predictions, refining low-confidence tokens each step.
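The refinement loop described above can be illustrated with a toy example. This is a sketch only and not SGLang's algorithm: `toy_predict` is a hypothetical stand-in for a real model that returns a token with a random confidence, and the unmasking schedule (commit an equal share of the remaining positions each step) is an assumption chosen for simplicity.

```python
import random

MASK = "<mask>"


def toy_predict(position, rng):
    """Hypothetical stand-in for a model: returns (token, confidence)."""
    return f"tok{position}", rng.random()


def iterative_unmask(length, steps, seed=0):
    """Start fully masked and commit the most confident predictions each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Predict every position that is still masked.
        preds = {i: toy_predict(i, rng) for i in masked}
        # Commit an equal share of the remaining positions (ceiling division),
        # so the final step always fills whatever is left.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)
        most_confident = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = preds[i][0]
        # Low-confidence positions stay masked and are re-predicted next step.
    return tokens
```

Unlike autoregressive decoding, positions are filled in confidence order rather than left to right, and the number of model calls is set by `steps` rather than by the sequence length.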

LLM diffusion is auto-detected: when --dllm-algorithm is set, the worker automatically uses DiffusionWorkerHandler without needing a separate flag. For more details on diffusion algorithms, see the SGLang Diffusion Language Models documentation.

Launch

$ cd $DYNAMO_HOME/examples/backends/sglang
$ ./launch/diffusion_llada.sh

See the launch script for configuration options.

Test

$ curl -X POST http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "inclusionAI/LLaDA2.0-mini-preview",
      "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
      "temperature": 0.7,
      "max_tokens": 512
    }'
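The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `send` are hypothetical, the host, port, and model are copied from the curl command above, and reading `choices[0]["message"]["content"]` from the result assumes the response follows the usual OpenAI chat completion shape.

```python
import json
import urllib.request


def build_chat_request(
    prompt,
    base_url="http://localhost:8001",
    model="inclusionAI/LLaDA2.0-mini-preview",
    temperature=0.7,
    max_tokens=512,
):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send a prepared request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a worker running, `send(build_chat_request("Hello"))` returns the decoded response as a dictionary.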

Image Diffusion

Image diffusion workers generate images from text prompts using SGLang’s DiffGenerator. Generated images are returned as either URLs (when using --media-output-fs-url for storage) or base64 data, in an OpenAI-compatible response format.

Launch

$ cd $DYNAMO_HOME/examples/backends/sglang
$ ./launch/image_diffusion.sh

Generated images can be stored locally (--fs-url file:///tmp/images) or in S3 (--fs-url s3://bucket). Pass --http-url to set the base URL for serving stored images. See the launch script for all configuration options.

Test

$ curl http://localhost:8000/v1/images/generations \
    -H "Content-Type: application/json" \
    -d '{
      "model": "black-forest-labs/FLUX.1-dev",
      "prompt": "Explain why Roger Federer is considered one of the greatest tennis players of all time",
      "size": "1024x1024",
      "response_format": "url",
      "nvext": {
        "num_inference_steps": 15
      }
    }'
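Because images come back either as URLs or as base64 data, a client usually needs to handle both cases. The sketch below assumes the response mirrors the OpenAI Images API, with a `data` list whose items carry either a `url` or a `b64_json` field; those field names are an assumption about the response shape, and `extract_images` is a hypothetical helper.

```python
import base64


def extract_images(response):
    """Return ("url", str) or ("bytes", bytes) tuples, one per generated image.

    Assumes an OpenAI-style body: {"data": [{"url": ...} or {"b64_json": ...}]}.
    """
    results = []
    for item in response.get("data", []):
        if item.get("url"):
            results.append(("url", item["url"]))
        elif item.get("b64_json"):
            results.append(("bytes", base64.b64decode(item["b64_json"])))
    return results
```

URL entries can be fetched or passed along as-is, while byte entries can be written straight to a file opened in binary mode.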

Video Generation

Video generation workers produce videos from text or image prompts using SGLang’s DiffGenerator with frame-to-video encoding. Both text-to-video (T2V) and image-to-video (I2V) workflows are supported.

Launch

$ cd $DYNAMO_HOME/examples/backends/sglang
$ ./launch/text-to-video-diffusion.sh

Use --wan-size 1b (default, 1 GPU) or --wan-size 14b (2 GPUs). See the launch script for all configuration options.

Test

$ curl http://localhost:8000/v1/videos \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "Roger Federer winning his 19th grand slam",
      "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
      "seconds": 2,
      "size": "832x480",
      "response_format": "url",
      "nvext": {
        "fps": 8,
        "num_frames": 17,
        "num_inference_steps": 50
      }
    }'
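The request body above can also be assembled programmatically, which makes it easier to sanity-check the frame settings before sending. This is a sketch under stated assumptions: the helper names are hypothetical, the field names and default values are copied from the curl command, and the duration helper is plain arithmetic (frames divided by frame rate), not a statement about how the worker reconciles `seconds`, `fps`, and `num_frames`.

```python
def clip_duration_seconds(num_frames, fps):
    """Playback length implied by a frame count and frame rate."""
    return num_frames / fps


def build_video_payload(
    prompt,
    model="Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    seconds=2,
    size="832x480",
    fps=8,
    num_frames=17,
    num_inference_steps=50,
):
    """Build the JSON body for POST /v1/videos, mirroring the curl example."""
    width, height = (int(part) for part in size.split("x"))
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid size: {size!r}")
    return {
        "prompt": prompt,
        "model": model,
        "seconds": seconds,
        "size": size,
        "response_format": "url",
        "nvext": {
            "fps": fps,
            "num_frames": num_frames,
            "num_inference_steps": num_inference_steps,
        },
    }
```

For the sample values, 17 frames at 8 fps play for 2.125 seconds, which is close to the requested 2 seconds.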

See Also

  • Examples: Launch scripts for all deployment patterns
  • Reference Guide: Worker types and argument reference
  • SGLang Diffusion LMs (upstream): SGLang diffusion documentation