
Copyright © 2026, NVIDIA Corporation.

Multimodal Model Serving

Deploy multimodal models with image, video, and audio support in Dynamo

Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text.

Security Requirement: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation (vLLM, SGLang, TRT-LLM) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.

Key Features

Dynamo improves latency and throughput for vision-and-language workloads through the following features, which can be used together or separately depending on your workload characteristics:

| Feature | Description |
| --- | --- |
| Embedding Cache | CPU-side LRU cache that skips re-encoding repeated images |
| Encoder Disaggregation | Separate vision encoder worker for independent scaling |
| Multimodal KV Routing | MM-aware KV cache routing for optimal worker selection |
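
To make the Embedding Cache idea concrete, here is a minimal sketch of a CPU-side LRU cache that skips re-encoding an image it has already seen. It is not Dynamo's implementation: the `EmbeddingCache` class, the SHA-256 content key, and the `fake_encoder` stand-in are all assumptions chosen for illustration.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Hypothetical LRU cache keyed by a hash of the raw image bytes."""

    def __init__(self, encode: Callable[[bytes], List[float]], capacity: int = 128):
        self._encode = encode            # stand-in for the vision encoder
        self._capacity = capacity
        self._entries: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, image_bytes: bytes) -> List[float]:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        embedding = self._encode(image_bytes)  # expensive path, taken once per image
        self._entries[key] = embedding
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return embedding


def fake_encoder(image_bytes: bytes) -> List[float]:
    # Placeholder for a real encoder call; returns a dummy one-element vector.
    return [float(len(image_bytes))]


cache = EmbeddingCache(fake_encoder, capacity=2)
cache.get(b"cat")
cache.get(b"cat")  # second request is served from the cache
```

Keying on a content hash rather than on the URL means the same image fetched from two different locations still hits the cache. Whether Dynamo keys its cache this way is not stated here.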

Support Matrix

| Stack | Image | Video | Audio |
| --- | --- | --- | --- |
| vLLM | ✅ | 🧪 | 🧪 |
| TRT-LLM | ✅ | ❌ | ❌ |
| SGLang | ✅ | 🧪 | ❌ |

Status: ✅ Supported | 🧪 Experimental | ❌ Not supported

Security: URL Validation

All multimodal loaders route remote fetches through a shared URL policy (dynamo.common.multimodal.url_validator). Only https:// and data: URLs are allowed by default, private / internal IPs are blocked, and local file access is disabled. Every HTTP redirect hop is re-validated against the policy.
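
The default policy described above can be approximated with the Python standard library. The sketch below is illustrative only and does not reproduce the API of `dynamo.common.multimodal.url_validator`: the function names `is_allowed` and `chain_is_allowed` are hypothetical, and "private / internal" is approximated as "any address that is not globally routable".

```python
import ipaddress
import socket
from typing import Iterable
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https", "data"}  # the defaults described above


def _resolves_to_public_ips(host: str) -> bool:
    """True only if every address the host resolves to is globally routable."""
    try:
        infos = socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return False
    # Drop any IPv6 zone suffix (for example "%eth0") before parsing.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_global for addr in addresses
    )


def is_allowed(url: str) -> bool:
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False              # malformed URL
    if parts.scheme not in ALLOWED_SCHEMES:
        return False              # rejects http://, file://, and bare paths
    if parts.scheme == "data":
        return True               # inline payload, nothing to fetch
    return host is not None and _resolves_to_public_ips(host)


def chain_is_allowed(hops: Iterable[str]) -> bool:
    """Re-validate every URL in a redirect chain, not only the first."""
    return all(is_allowed(hop) for hop in hops)
```

For example, `is_allowed("https://169.254.169.254/latest/meta-data/")` returns `False` because the address is link-local. A production validator would also need to connect to the exact IP it validated, since a hostname can resolve differently between the check and the fetch. This sketch does not do that.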

Two environment variables loosen the defaults for non-public deployments:

| Variable | Default | Effect |
| --- | --- | --- |
| DYN_MM_ALLOW_INTERNAL | 0 | Set to 1 to allow http://, private / internal IPs, and explicit ports. Intended for on-prem or local-dev setups where media lives on an internal network. |
| DYN_MM_LOCAL_PATH | (empty) | Absolute directory prefix. When set, file:// URIs and bare paths are allowed if they resolve inside this prefix. |

Never set DYN_MM_ALLOW_INTERNAL=1 on public-facing deployments. It opens SSRF paths to cloud metadata endpoints (AWS IMDS, GCE, Azure) and other internal services.
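
As a rough illustration of how the two variables could be interpreted, the sketch below reads them from an environment mapping and checks that a local path resolves inside the configured prefix. The helper names `allow_internal` and `resolve_local_media` are hypothetical, and the exact parsing rules Dynamo applies (for example, which values count as "set") are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional
from urllib.parse import unquote, urlsplit


def allow_internal(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only the literal value "1" loosens the default policy.
    return env.get("DYN_MM_ALLOW_INTERNAL", "0") == "1"


def resolve_local_media(
    uri_or_path: str, env: Mapping[str, str] = os.environ
) -> Optional[Path]:
    """Return the resolved path if it lies inside DYN_MM_LOCAL_PATH, else None."""
    prefix = env.get("DYN_MM_LOCAL_PATH", "")
    if not prefix:
        return None  # local file access stays disabled when the variable is empty
    root = Path(prefix).resolve()
    parts = urlsplit(uri_or_path)
    raw = unquote(parts.path) if parts.scheme == "file" else uri_or_path
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are judged by where they actually land.
    candidate = Path(raw).resolve()
    inside = candidate == root or root in candidate.parents
    return candidate if inside else None
```

Comparing resolved path components rather than raw string prefixes matters here: with a prefix of `/data/media`, a string check would wrongly accept `/data/media-private/secret.png`, while the `parents` check rejects it.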

Example Workflows

Reference implementations for deploying multimodal models:

  • vLLM multimodal examples (image, video)
  • TRT-LLM multimodal examples
  • SGLang multimodal examples

Backend Documentation

Detailed deployment guides, configuration, and examples for each backend:

  • vLLM Multimodal
  • TensorRT-LLM Multimodal
  • SGLang Multimodal