Feature Guides

Start with Dynamo's core serving optimizations, then branch into operations and model capabilities.

Use these guides after you have Dynamo running and want to improve serving behavior, operate a deployment, or adapt Dynamo to a new workload.

Most deployments start with the core performance loop:

| Step | Guide | Use when |
| --- | --- | --- |
| 1 | KV Cache Aware Routing | Route requests to workers that already hold useful KV cache. |
| 2 | Disaggregated Serving | Scale prefill and decode workers independently. |
| 3 | KV Cache Offloading | Extend usable cache capacity beyond GPU memory. |
| 4 | Benchmarking | Compare configurations before you move to production. |

Where to go next

| Goal | Start with |
| --- | --- |
| Make serving more resilient | Fault Tolerance |
| Monitor local deployments | Observability (Local) |
| Reproduce traffic without a full engine | Mocker Engine Simulation |
| Add structured model outputs | Tool Calling and Reasoning |
| Build agent workloads | Agents |
| Serve specialized workloads | LoRA Adapters, Multimodal, and Diffusion |

For cluster deployments, pair these guides with the Kubernetes Deployment docs. You can explore the same features locally, then express them through Dynamo’s Kubernetes-native CRDs and operator when you move to a shared GPU cluster.