
Copyright © 2026, NVIDIA Corporation.


Video Diffusion Support (Experimental)


For general TensorRT-LLM features and configuration, see the Reference Guide.


Dynamo supports video generation with diffusion models through the --modality video_diffusion flag, and image generation through the --modality image_diffusion flag.

Requirements

  • TensorRT-LLM with visual_gen: The visual_gen module is part of TensorRT-LLM (tensorrt_llm._torch.visual_gen). Install TensorRT-LLM following the official instructions.
  • dynamo-runtime with multimodal API: The Dynamo runtime must include ModelType.Videos (for video diffusion) or ModelType.Images (for image diffusion) support. Ensure you're using a compatible version.
  • imageio with ffmpeg (video diffusion only): Required to encode generated frames as MP4 video. The Dynamo TRT-LLM runtime container ships an LGPL-only ffmpeg CLI built with the NVIDIA NVENC H.264 encoder (h264_nvenc) and libvpx_vp9 for WebM, and points imageio at it via IMAGEIO_FFMPEG_EXE=/usr/local/bin/ffmpeg. The GPL-encumbered ffmpeg binary normally bundled in the imageio-ffmpeg PyPI wheel is not installed. If you're running outside the container, install the Python wrapper without the bundled binary and point it at your own ffmpeg:
    $pip install --no-binary imageio-ffmpeg "imageio[ffmpeg]"
    $export IMAGEIO_FFMPEG_EXE=/path/to/your/ffmpeg
    MP4 output requires an NVIDIA GPU at runtime (NVENC is a hardware encoder).
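If you supply your own ffmpeg, it can help to confirm that the binary imageio will pick up actually lists the NVENC encoder before launching a video worker. The following is a minimal sketch, not part of Dynamo: the function name check_imageio_ffmpeg is hypothetical, and it relies only on ffmpeg's standard -encoders listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): verify that the ffmpeg binary
# referenced by IMAGEIO_FFMPEG_EXE exists and lists the h264_nvenc encoder.
check_imageio_ffmpeg() {
  local exe="${IMAGEIO_FFMPEG_EXE:-}"

  if [ -z "$exe" ]; then
    echo "IMAGEIO_FFMPEG_EXE is not set" >&2
    return 1
  fi
  if [ ! -x "$exe" ]; then
    echo "IMAGEIO_FFMPEG_EXE=$exe is not an executable file" >&2
    return 1
  fi
  # 'ffmpeg -hide_banner -encoders' prints one line per compiled-in encoder.
  # This only proves h264_nvenc was built in; NVENC still needs an NVIDIA GPU at runtime.
  if ! "$exe" -hide_banner -encoders 2>/dev/null | grep -qw 'h264_nvenc'; then
    echo "$exe does not list the h264_nvenc encoder" >&2
    return 1
  fi
  echo "ok: $exe lists h264_nvenc"
}

# Example: export IMAGEIO_FFMPEG_EXE=/path/to/your/ffmpeg && check_imageio_ffmpeg
```

A passing check does not guarantee encoding will succeed (for example, if no GPU is visible), but it catches the common case of imageio silently pointing at an ffmpeg build without NVENC.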

Supported Models

| Diffusers Pipeline | Description | Example Model |
|---|---|---|
| WanPipeline | Wan 2.1/2.2 Text-to-Video | Wan-AI/Wan2.1-T2V-1.3B-Diffusers |
| FluxPipeline | FLUX Text-to-Image | black-forest-labs/FLUX.1-dev |

The pipeline type is auto-detected from the model's model_index.json, so no --model-type flag is needed.
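Because detection is driven by model_index.json, you can check ahead of time which pipeline a downloaded checkpoint declares. Diffusers records the pipeline class in that file's _class_name field; the sketch below assumes Dynamo keys on the same field and that the model is already in a local directory. The function name pipeline_class is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Diffusers pipeline class recorded in a local
# model directory's model_index.json (e.g. WanPipeline or FluxPipeline).
# Uses python3's json module so no extra tools such as jq are required.
pipeline_class() {
  local model_dir="$1"
  local index="$model_dir/model_index.json"

  if [ ! -f "$index" ]; then
    echo "no model_index.json in $model_dir" >&2
    return 1
  fi
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["_class_name"])' "$index"
}

# Example: pipeline_class /path/to/Wan2.1-T2V-1.3B-Diffusers
```

If the printed class is not one of the pipelines in the Supported Models table, the worker is unlikely to handle that checkpoint in this release.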

Quick Start

Video Diffusion

Launch worker

$python -m dynamo.trtllm \
> --modality video_diffusion \
> --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
> --media-output-fs-url file:///tmp/dynamo_media

API Endpoint

Video generation uses the /v1/videos endpoint:

$curl -X POST http://localhost:8000/v1/videos \
> -H "Content-Type: application/json" \
> -d '{
> "prompt": "A cat playing piano",
> "model": "wan_t2v",
> "seconds": 4,
> "size": "832x480",
> "nvext": {
> "fps": 24
> }
> }'
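Hand-writing the JSON body breaks as soon as a prompt contains quotes or newlines. A minimal sketch that builds the same body shown in the curl example above with proper escaping is below; video_request_body is a hypothetical helper, it uses python3 only for JSON encoding, and it sends exactly the fields from that example and nothing more.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a /v1/videos request body with proper JSON
# escaping. Fields mirror the documented curl example; values come from the caller.
# Arguments: prompt model seconds size fps
video_request_body() {
  local prompt="$1" model="$2" seconds="$3" size="$4" fps="$5"
  python3 - "$prompt" "$model" "$seconds" "$size" "$fps" <<'EOF'
import json
import sys

prompt, model, seconds, size, fps = sys.argv[1:6]
print(json.dumps({
    "prompt": prompt,
    "model": model,
    "seconds": int(seconds),
    "size": size,
    "nvext": {"fps": int(fps)},
}))
EOF
}

# Example: pipe the body into curl, which reads it from stdin with -d @-
#   video_request_body "A cat playing piano" wan_t2v 4 832x480 24 |
#     curl -X POST http://localhost:8000/v1/videos \
#       -H "Content-Type: application/json" -d @-
```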

Image Diffusion

Launch worker

$python -m dynamo.trtllm \
> --modality image_diffusion \
> --model-path black-forest-labs/FLUX.1-dev \
> --media-output-fs-url file:///tmp/dynamo_media

API Endpoint

Image generation uses the /v1/images/generations endpoint:

$curl -X POST http://localhost:8000/v1/images/generations \
> -H "Content-Type: application/json" \
> -d '{
> "prompt": "A cat playing piano",
> "model": "black-forest-labs/FLUX.1-dev",
> "size": "256x256"
> }'
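Both endpoints take the output resolution as a size string. The --default-width (832) and --default-height (480) defaults listed under Configuration Options, together with the video example's "832x480", suggest WIDTHxHEIGHT ordering, as in the OpenAI images API. The sketch below assumes that ordering to validate a size string on the client before sending a request; parse_size is hypothetical, and the server's own validation may be stricter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a "WIDTHxHEIGHT" size string (e.g. "832x480")
# into two integers, rejecting anything that is not two positive integers.
parse_size() {
  local size="$1"

  case "$size" in
    *x*) ;;
    *) echo "size must look like WIDTHxHEIGHT: $size" >&2; return 1 ;;
  esac
  local width="${size%%x*}" height="${size#*x}"

  # Each part must be a non-empty run of digits.
  case "$width" in ''|*[!0-9]*) echo "invalid width in size: $size" >&2; return 1 ;; esac
  case "$height" in ''|*[!0-9]*) echo "invalid height in size: $size" >&2; return 1 ;; esac
  if [ "$width" -eq 0 ] || [ "$height" -eq 0 ]; then
    echo "size must be positive: $size" >&2
    return 1
  fi
  echo "$width $height"
}

# Example: read -r w h <<< "$(parse_size 256x256)"
```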

Configuration Options

| Flag | Description | Default |
|---|---|---|
| --media-output-fs-url | Filesystem URL for storing generated media | file:///tmp/dynamo_media |
| --default-height | Default image/video height, in pixels | 480 |
| --default-width | Default image/video width, in pixels | 832 |
| --default-num-frames | Default number of frames per generated video | 81 |
| --default-num-images-per-prompt | Default number of images per prompt | 1 |
| --enable-teacache | Enable the TeaCache optimization | False |
| --disable-torch-compile | Disable torch.compile | False |
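When --media-output-fs-url uses the file:// scheme, generated media ends up under a local directory. A quick way to locate the newest output is sketched below. It assumes outputs are written as regular files somewhere under that directory (the exact file naming is not documented here), handles only file:// URLs, and relies on GNU find; latest_media is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified file under the
# directory named by a file:// media URL (defaults to the flag's default value).
# Searches recursively because the layout inside the directory is not assumed.
latest_media() {
  local url="${1:-file:///tmp/dynamo_media}"

  case "$url" in
    file://*) ;;
    *) echo "only file:// URLs are handled here: $url" >&2; return 1 ;;
  esac
  local dir="${url#file://}"
  if [ ! -d "$dir" ]; then
    echo "media directory does not exist: $dir" >&2
    return 1
  fi
  # GNU find: emit "<mtime-seconds> <path>" per file, sort numerically, keep the newest.
  local newest
  newest="$(find "$dir" -type f -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-)"
  if [ -z "$newest" ]; then
    echo "no files in $dir" >&2
    return 1
  fi
  echo "$newest"
}

# Example: latest_media file:///tmp/dynamo_media
```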

Limitations

  • Diffusion support is experimental and not recommended for production use.
  • Only text-to-video and text-to-image are supported in this release (image-to-video is planned).
  • Requires a GPU with enough VRAM for the chosen diffusion model.