High-level guide to Dynamo Kubernetes deployments. Start here, then dive into specific guides.
Kubernetes Namespace: The K8s namespace where your DynamoGraphDeployment resource is created.
dynamo-system, team-a-namespaceDynamo Namespace: The logical namespace used by Dynamo components for service discovery.
.spec.services.<ServiceName>.dynamoNamespace fieldmy-llm, production-model, dynamo-devThese are independent. A single Kubernetes namespace can host multiple Dynamo namespaces, and vice versa.
Before you begin, ensure you have the following tools installed:
Verify your installation:
For detailed installation instructions, see the Prerequisites section in the Installation Guide.
Before deploying the platform, run the pre-deployment checks to ensure the cluster is ready:
This validates kubectl connectivity, StorageClass configuration, and GPU availability. See pre-deployment checks for more details.
v0.9.0 Helm Chart Issue: The initial v0.9.0 dynamo-platform Helm chart sets the operator image to v0.7.1 instead of v0.9.0. Use RELEASE_VERSION=0.9.0-post1 or add --set dynamo-operator.controllerManager.manager.image.tag=0.9.0 to your helm install command.
For Shared/Multi-Tenant Clusters:
If your cluster has namespace-restricted Dynamo operators, add this flag to step 2:
For more details or customization options (including multinode deployments), see Installation Guide for Dynamo Kubernetes Platform.
Each backend has deployment examples and configuration options:
Follow the Deploying Your First Model guide for a complete end-to-end
walkthrough using DynamoGraphDeploymentRequest (DGDR) — Dynamo’s recommended path that
handles profiling and configuration automatically.
The tutorial deploys Qwen/Qwen3-0.6B with vLLM and walks you through every step: creating
the DGDR, watching the profiling lifecycle, and sending your first inference request.
For SLA-based autoscaling, see SLA Planner Guide.
Dynamo provides two main Kubernetes Custom Resources for deploying models:
The recommended approach for generating optimal configurations. DGDR provides a high-level interface where you specify:
Dynamo automatically handles profiling and generates an optimized DGD spec in the status. Perfect for:
Note: DGDR generates a DGD spec which you can then use to deploy.
A lower-level interface that defines your complete inference pipeline:
Use this when you need fine-grained control or have already completed profiling.
Refer to the API Reference and Documentation for more details.
For detailed technical specifications of Dynamo’s Kubernetes resources:
When creating a deployment, select the architecture pattern that best fits your use case:
agg.yaml as the base configurationagg_router.yaml to enable scalable, load-balanced inferencedisagg_router.yaml for maximum throughput and modular scalabilityYou can run the Frontend on one machine (e.g., a CPU node) and workers on different machines (GPU nodes). The Frontend serves as a framework-agnostic HTTP entry point that:
/v1/chat/completions endpointExample structure:
Worker command examples per backend:
Key customization points include:
resources.limitsreplicas for number of worker instancesDYN_ROUTER_MODE=kv in Frontend envs--disaggregation-mode prefill flag for disaggregated prefill workers