Multimodal Model Serving

Deploy multimodal models with image, video, and audio support in Dynamo

Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text.

Security Requirement: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation (vLLM, SGLang, TRT-LLM) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.
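Once multimodal processing is enabled, requests follow the standard OpenAI-compatible chat format, with images passed as `image_url` content parts. The sketch below builds such a payload; the model name, the base64 data-URL encoding, and the frontend address in the comment are illustrative assumptions, not Dynamo-specific requirements.

```python
import base64
import json

# Placeholder image bytes; in practice, read these from a file.
image_bytes = b"\x89PNG\r\n\x1a\n"

# Encode the image as a base64 data URL (one common way to inline images).
image_b64 = base64.b64encode(image_bytes).decode()

# OpenAI-compatible chat payload mixing text and image content parts.
# The model name here is an example, not a required value.
payload = {
    "model": "llava-hf/llava-1.5-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

print(json.dumps(payload, indent=2))
# Assuming a local frontend, send with e.g.:
# requests.post("http://localhost:8000/v1/chat/completions", json=payload)
```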

Key Features

Dynamo improves latency and throughput for vision-and-language workloads through the following features, which can be used together or independently depending on your workload characteristics:

| Feature | Description |
|---------|-------------|
| Embedding Cache | CPU-side LRU cache that skips re-encoding repeated images |
| Encoder Disaggregation | Separate vision encoder worker for independent scaling |
| Multimodal KV Routing | MM-aware KV cache routing for optimal worker selection |
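To make the embedding-cache idea concrete, here is a minimal sketch of a CPU-side LRU cache keyed by a hash of the image bytes, so repeated images skip the vision encoder. This is a hypothetical illustration of the technique, not Dynamo's actual implementation; the class name and capacity default are invented for the example.

```python
import hashlib
from collections import OrderedDict


class EmbeddingCache:
    """Illustrative CPU-side LRU cache for vision embeddings.

    Keys are content hashes of the raw image bytes, so the same image
    submitted twice hits the cache instead of the vision encoder.
    (Hypothetical sketch, not Dynamo's internal implementation.)
    """

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._cache: OrderedDict[str, object] = OrderedDict()

    @staticmethod
    def _key(image_bytes: bytes) -> str:
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, image_bytes: bytes):
        key = self._key(image_bytes)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        return None  # cache miss: caller runs the encoder and calls put()

    def put(self, image_bytes: bytes, embedding) -> None:
        key = self._key(image_bytes)
        self._cache[key] = embedding
        self._cache.move_to_end(key)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used
```

Hashing the bytes rather than a URL means the cache also deduplicates identical images that arrive under different names.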

Support Matrix

| Stack | Image | Video | Audio |
|-------|-------|-------|-------|
| vLLM | 🧪 | 🧪 | |
| TRT-LLM | | | |
| SGLang | | | |

Status: ✅ Supported | 🧪 Experimental | ❌ Not supported

Example Workflows

Reference implementations for deploying multimodal models:

Backend Documentation

Detailed deployment guides, configuration, and examples for each backend: