High-level guide to Dynamo Kubernetes deployments. Start here, then dive into specific guides.
Kubernetes Namespace: The K8s namespace where your DynamoGraphDeployment resource is created.
Examples: dynamo-system, dynamo-cloud, team-a-namespace
Dynamo Namespace: The logical namespace used by Dynamo components for service discovery.
Set in the .spec.services.<ServiceName>.dynamoNamespace field
Examples: my-llm, production-model, dynamo-dev
These are independent. A single Kubernetes namespace can host multiple Dynamo namespaces, and vice versa.
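To make the independence of the two namespace kinds concrete, here is a minimal Python sketch that groups Dynamo namespaces by Kubernetes namespace. The manifests are hypothetical, pared-down dicts used only for illustration: `metadata.namespace` is the standard Kubernetes field, `dynamoNamespace` follows the `.spec.services.<ServiceName>.dynamoNamespace` path, and the deployment and service names are invented.

```python
from collections import defaultdict

# Hypothetical, pared-down DynamoGraphDeployment manifests (illustration only).
manifests = [
    {
        "metadata": {"name": "chat", "namespace": "team-a-namespace"},
        "spec": {"services": {
            "Frontend": {"dynamoNamespace": "my-llm"},
            "Worker": {"dynamoNamespace": "my-llm"},
        }},
    },
    {
        "metadata": {"name": "chat-dev", "namespace": "team-a-namespace"},
        "spec": {"services": {"Frontend": {"dynamoNamespace": "dynamo-dev"}}},
    },
    {
        "metadata": {"name": "chat-prod", "namespace": "dynamo-system"},
        "spec": {"services": {"Frontend": {"dynamoNamespace": "my-llm"}}},
    },
]


def namespace_pairs(manifests):
    """Return two maps: K8s namespace -> Dynamo namespaces, and the reverse."""
    k8s_to_dynamo = defaultdict(set)
    dynamo_to_k8s = defaultdict(set)
    for manifest in manifests:
        k8s_ns = manifest["metadata"]["namespace"]
        for service in manifest["spec"]["services"].values():
            dyn_ns = service["dynamoNamespace"]
            k8s_to_dynamo[k8s_ns].add(dyn_ns)
            dynamo_to_k8s[dyn_ns].add(k8s_ns)
    return dict(k8s_to_dynamo), dict(dynamo_to_k8s)


k8s_to_dynamo, dynamo_to_k8s = namespace_pairs(manifests)
# One Kubernetes namespace hosting two Dynamo namespaces:
print(sorted(k8s_to_dynamo["team-a-namespace"]))  # ['dynamo-dev', 'my-llm']
# One Dynamo namespace appearing in two Kubernetes namespaces:
print(sorted(dynamo_to_k8s["my-llm"]))  # ['dynamo-system', 'team-a-namespace']
```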
Before deploying the platform, run the pre-deployment checks to confirm that the cluster is ready. See the pre-deployment checks for details.
For Shared/Multi-Tenant Clusters:
If your cluster has namespace-restricted Dynamo operators, add this flag to step 3:
For more details or customization options (including multinode deployments), see Installation Guide for Dynamo Kubernetes Platform.
Each backend has deployment examples and configuration options:
For SLA-based autoscaling, see SLA Planner Quick Start Guide.
Dynamo provides two main Kubernetes Custom Resources for deploying models:
DynamoGraphDeploymentRequest (DGDR) is the recommended approach for generating optimal configurations. It provides a high-level interface where you specify:
Dynamo automatically handles profiling and writes an optimized DGD spec to the DGDR's status. Perfect for:
Note: DGDR generates a DGD spec which you can then use to deploy.
DynamoGraphDeployment (DGD) is a lower-level interface that defines your complete inference pipeline:
Use this when you need fine-grained control or have already completed profiling.
Refer to the API Reference and Documentation for more details.
For detailed technical specifications of Dynamo’s Kubernetes resources:
When creating a deployment, select the architecture pattern that best fits your use case:
Use agg.yaml as the base configuration
Use agg_router.yaml to enable scalable, load-balanced inference
Use disagg_router.yaml for maximum throughput and modular scalability
You can run the Frontend on one machine (e.g., a CPU node) and workers on different machines (GPU nodes). The Frontend serves as a framework-agnostic HTTP entry point that:
Exposes the /v1/chat/completions endpoint
Example structure:
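As a rough illustration of such a structure, the Python sketch below builds a skeletal deployment with one Frontend and one worker service and prints it as JSON. Only `kind`, `dynamoNamespace`, `replicas`, and `resources.limits` are named in this guide; the `apiVersion` value, the `gpu` key, the `Worker` service name, and the `build_deployment` helper are assumptions made for the example and should be checked against the API Reference.

```python
import json


def build_deployment(name, k8s_namespace, dynamo_namespace):
    """Build a skeletal DynamoGraphDeployment with a Frontend and one worker."""
    return {
        "apiVersion": "nvidia.com/v1alpha1",  # assumed; confirm in the API Reference
        "kind": "DynamoGraphDeployment",
        "metadata": {"name": name, "namespace": k8s_namespace},
        "spec": {
            "services": {
                # HTTP entry point; can be scheduled on a CPU node.
                "Frontend": {
                    "dynamoNamespace": dynamo_namespace,
                    "replicas": 1,
                },
                # Hypothetical worker service name; scheduled on GPU nodes.
                "Worker": {
                    "dynamoNamespace": dynamo_namespace,
                    "replicas": 1,
                    "resources": {"limits": {"gpu": "1"}},  # key name assumed
                },
            }
        },
    }


manifest = build_deployment("my-deployment", "team-a-namespace", "my-llm")
print(json.dumps(manifest, indent=2))
```

Both services share one Dynamo namespace so that the Frontend and the worker can discover each other. Because kubectl accepts JSON manifests as well as YAML, output like this could be saved to a file and applied once the field names have been verified.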
Worker command examples per backend:
Key customization points include:
Adjust resources.limits
Set replicas for number of worker instances
Set DYN_ROUTER_MODE=kv in Frontend envs
Add the --is-prefill-worker flag for disaggregated prefill workers
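The customization points above can be sketched as a small Python helper that patches a spec held as a dict. This is an illustration under stated assumptions, not the platform's tooling: the `customize` function, the `PrefillWorker` service name, the `gpu` key under `resources.limits`, the name/value layout of `envs`, and the placement of `args` directly on the service are all hypothetical.

```python
import copy


def customize(spec, worker, *, gpus=None, replicas=None,
              kv_routing=False, prefill=False):
    """Return a copy of `spec` with the selected customization points applied."""
    out = copy.deepcopy(spec)  # leave the caller's spec untouched
    services = out["services"]
    target = services[worker]

    if gpus is not None:
        # resources.limits: GPU count for the worker (key name assumed).
        limits = target.setdefault("resources", {}).setdefault("limits", {})
        limits["gpu"] = str(gpus)

    if replicas is not None:
        # replicas: number of worker instances.
        target["replicas"] = replicas

    if kv_routing:
        # DYN_ROUTER_MODE=kv in the Frontend envs, replacing any earlier value.
        envs = services["Frontend"].setdefault("envs", [])
        envs[:] = [e for e in envs if e["name"] != "DYN_ROUTER_MODE"]
        envs.append({"name": "DYN_ROUTER_MODE", "value": "kv"})

    if prefill:
        # --is-prefill-worker for a disaggregated prefill worker.
        args = target.setdefault("args", [])
        if "--is-prefill-worker" not in args:
            args.append("--is-prefill-worker")

    return out


base = {"services": {"Frontend": {}, "PrefillWorker": {}}}
tuned = customize(base, "PrefillWorker", gpus=2, replicas=4,
                  kv_routing=True, prefill=True)
print(tuned["services"]["PrefillWorker"])
print(tuned["services"]["Frontend"])
```

Working on a deep copy keeps the base configuration reusable, so several variants can be derived from the same starting point.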