Feature Guides | NVIDIA Dynamo Documentation

Use these guides after you have Dynamo running and want to improve serving behavior, operate a deployment, or adapt Dynamo to a new workload.

Recommended path

Most deployments start with the core performance loop:

Step	Guide	Use when
1	KV Cache Aware Routing	You want requests routed to workers that already hold useful KV cache.
2	Disaggregated Serving	You need to scale prefill and decode workers independently.
3	KV Cache Offloading	You need usable cache capacity beyond GPU memory.
4	Benchmarking	You want to compare configurations before moving to production.

Where to go next

Goal	Start with
Make serving more resilient	Fault Tolerance
Monitor local deployments	Observability (Local)
Reproduce traffic without a full engine	Mocker Engine Simulation
Add structured model outputs	Tool Calling and Reasoning
Build agent workloads	Agents
Serve specialized workloads	LoRA Adapters, Multimodal, and Diffusion

For cluster deployments, pair these guides with the Kubernetes Deployment docs. You can explore the same features locally, then express them through Dynamo’s Kubernetes-native CRDs and operator when you move to a shared GPU cluster.