This guide walks you through installing everything needed to deploy models with Dynamo on Kubernetes. Follow the steps in order — each builds on the previous one.
Before you begin, make sure you have:
Cloud provider GPU drivers: The GPU Operator (Step 1) installs GPU drivers for you. When creating your cluster’s GPU node pools, do not enable provider-managed GPU driver installation (for example, skip the AKS GPU driver installation, and don’t set gpu-driver-version=latest in GKE’s --accelerator flag). If your nodes already have provider-managed drivers, see Step 1 for how to handle this.
Verify your tools:
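The tool-check commands are not preserved in this copy of the guide. The following is a minimal sketch that assumes kubectl and helm are the CLIs to verify; pass any other tools your prerequisites require as extra arguments. It reports which tools are on PATH and fails if any are missing.

```bash
#!/usr/bin/env bash
# Sketch: confirm required CLIs are installed before starting the install.
# Assumption: the tool names passed as arguments are the ones you need.

# check_tools prints "ok: <tool>" for each tool found on PATH and
# "missing: <tool>" (to stderr) for each one that is not.
# Returns 1 if any tool is missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_tools kubectl helm && kubectl version --client && helm version
```

Once both tools are present, `kubectl cluster-info` confirms that kubectl can reach the cluster you intend to install into.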
Every Dynamo deployment requires two Helm charts: the GPU Operator (Step 1) and the Dynamo Platform (Step 2). Everything else is optional. Decide what optional components you need before starting so you can install them in Step 3.
Grove + KAI Scheduler — Grove is the default multinode orchestrator. The operator returns a hard error on multinode deployments if neither Grove nor LeaderWorkerSet (LWS) is available. KAI Scheduler is optional but recommended alongside Grove for GPU-aware scheduling. See Grove for details.
Network Operator / RDMA — Without RDMA, disaggregated inference falls back to TCP automatically, but with severe performance degradation (~98s TTFT vs ~200-500ms with RDMA). Required for any production disaggregated deployment. Setup is cloud-provider-specific — see the Disaggregated Communication Guide and your cloud provider guide.
kube-prometheus-stack — Required for the Planner’s sla optimization mode (it reads live TTFT/ITL metrics from Prometheus). Also required for KEDA/HPA-based autoscaling. The Planner’s throughput mode can function without it using internal queue depth signals, but metrics-driven features will not work. See Metrics for details.
Shared storage — Prevents each pod from downloading model weights independently. Without it, large models (>70B) take hours to download per pod, and many replicas will hit HuggingFace rate limits. Not enforced by the operator — this is an operational concern. See Model Caching for the full walkthrough.
The NVIDIA GPU Operator automates deployment of all NVIDIA software components needed to provision GPUs — drivers, container toolkit, device plugin, and monitoring.
If your GPU nodes already have provider-managed drivers installed (e.g., you used GKE’s --accelerator gpu-driver-version=latest), set driver.enabled=false in the GPU Operator install command (--set driver.enabled=false). This prevents the operator from conflicting with the existing drivers.
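The following is a minimal sketch of the GPU Operator install using NVIDIA's public Helm repository, with these assumptions:

- The release name and namespace (both gpu-operator) are assumed.
- PROVIDER_MANAGED_DRIVERS is a hypothetical switch. Set it to true on nodes that already have provider-managed drivers to append --set driver.enabled=false.
- DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: install the NVIDIA GPU Operator with Helm.
# Assumptions: release "gpu-operator" in namespace "gpu-operator";
# PROVIDER_MANAGED_DRIVERS=true (hypothetical switch) when nodes already have drivers.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_gpu_operator() {
  set -- --namespace gpu-operator --create-namespace --wait
  # Skip the operator's own driver install if the provider already installed one.
  if [ "${PROVIDER_MANAGED_DRIVERS:-false}" = "true" ]; then
    set -- "$@" --set driver.enabled=false
  fi
  run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
  run helm repo update
  run helm install gpu-operator nvidia/gpu-operator "$@"
}

# Usage: PROVIDER_MANAGED_DRIVERS=true install_gpu_operator
```

For reproducible installs, consider pinning a chart version with Helm's --version flag.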
Some cloud providers require additional GPU Operator configuration. See your provider guide for details:
LD_LIBRARY_PATH and ldconfig init requirements
Verify the GPU Operator is running:
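The verification commands are not preserved in this copy. The following is a minimal sketch that assumes the operator was installed into the gpu-operator namespace. It does two things:

- Lists the operator's pods. Validator pods normally finish as Completed rather than Running.
- Sums the nvidia.com/gpu resource that nodes advertise once the device plugin is up.

```bash
#!/usr/bin/env bash
# count_gpus sums per-node GPU counts read from stdin, one value per line.
# Nodes without GPUs produce empty lines, which are skipped.
count_gpus() {
  awk 'NF { total += $1 } END { print total + 0 }'
}

# check_gpu_operator lists operator pods and fails if no node advertises a GPU.
# Assumption: the operator runs in the namespace given as $1 (default gpu-operator).
check_gpu_operator() {
  local ns="${1:-gpu-operator}" gpus
  kubectl get pods -n "$ns"
  gpus="$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
    count_gpus)"
  echo "Allocatable GPUs: $gpus"
  [ "$gpus" -gt 0 ]
}

# Usage: check_gpu_operator gpu-operator
```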
Set your environment variables:
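The original variable list is not preserved here. The following sketch assumes the later commands read NAMESPACE and RELEASE_VERSION; both names are illustrative. Note these placeholders:

- dynamo-system is a placeholder namespace.
- You must fill in the release version yourself.

The sketch also includes a small helper that fails early when a required variable is empty.

```bash
#!/usr/bin/env bash
# Illustrative values; replace them with your own before installing.
export NAMESPACE="${NAMESPACE:-dynamo-system}"
export RELEASE_VERSION="${RELEASE_VERSION:-}"   # set to the Dynamo release you plan to install

# require_env fails (naming each empty variable on stderr) if any listed
# variable is unset or empty. Arguments must be valid variable names.
require_env() {
  local name value status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: require_env NAMESPACE RELEASE_VERSION
```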
All helm install commands can be customized with your own values file: helm install ... -f your-values.yaml
Shared/Multi-Tenant Clusters: If a cluster-wide Dynamo operator is already running, do not install another one. Check with:
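The following is a minimal sketch of that check. It assumes the operator's Deployment name contains dynamo-operator, which is a heuristic, so confirm the naming in your cluster. It lists Deployments in all namespaces and prints matches as namespace/name.

```bash
#!/usr/bin/env bash
# Sketch: find existing Dynamo operators cluster-wide.
# Assumption: the operator Deployment's name contains "dynamo-operator".

# filter_operator_deployments reads "NAMESPACE NAME" lines on stdin and prints
# namespace/name for every Deployment whose name matches the assumed pattern.
filter_operator_deployments() {
  awk '$2 ~ /dynamo-operator/ { print $1 "/" $2 }'
}

find_dynamo_operators() {
  kubectl get deployments --all-namespaces --no-headers \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name |
    filter_operator_deployments
}

# Usage: find_dynamo_operators
```

Any output means an operator already exists, so do not install another one. `helm list --all-namespaces` shows which release installed it.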
Namespace-restricted mode (namespaceRestriction.enabled=true) is deprecated and will be removed in a future release. Use the default cluster-wide mode for all new deployments.
Verify the Dynamo platform is running:
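The following is a minimal sketch that assumes the platform was installed into $NAMESPACE, with dynamo-system as a placeholder default. It does two things:

- Shows the pods in the namespace.
- Blocks until every Deployment there reports Available.

StatefulSets in the namespace, if any, are not covered by the wait, so check them in the pod listing. DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: confirm the Dynamo platform components came up.
# Assumption: $NAMESPACE (or $1) is the install namespace; dynamo-system is a placeholder.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

wait_for_dynamo_platform() {
  local ns="${1:-${NAMESPACE:-dynamo-system}}"
  run kubectl get pods -n "$ns"
  # Wait up to 5 minutes for every Deployment in the namespace to be Available.
  run kubectl wait --for=condition=Available deployment --all -n "$ns" --timeout=300s
}

# Usage: wait_for_dynamo_platform "$NAMESPACE"
```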
The Dynamo install command above includes commented flags for each optional component. Install the component first, then uncomment the corresponding flag before running helm install in Step 2 (or run helm upgrade --reuse-values with the flag if you’ve already installed Dynamo).
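The following is a sketch of the helm upgrade --reuse-values path. This copy of the guide does not preserve the release name or chart reference, so both are arguments. Pass the same values you used with helm install, followed by the key=value pairs from the flags you would otherwise uncomment. The key shown in the usage line is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: add --set values to an existing release while keeping its current values.
# Set DRY_RUN=1 to print the command instead of running it.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# enable_optional_flags RELEASE CHART KEY=VALUE...
enable_optional_flags() {
  if [ "$#" -lt 3 ]; then
    echo "usage: enable_optional_flags RELEASE CHART KEY=VALUE..." >&2
    return 2
  fi
  local release="$1" chart="$2" kv
  shift 2
  # Rewrite the remaining arguments as "--set key=value" pairs.
  for kv in "$@"; do
    shift
    set -- "$@" --set "$kv"
  done
  run helm upgrade "$release" "$chart" --namespace "${NAMESPACE:-dynamo-system}" \
    --reuse-values "$@"
}

# Usage (hypothetical key): enable_optional_flags <release> <chart> some.key=true
```

--reuse-values starts from the values stored with the last release and merges the new --set values on top. This is why the previously configured options survive the upgrade.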
Multinode deployments require either Grove + KAI Scheduler or an alternative orchestrator setup (LeaderWorkerSet + Volcano) to enable gang scheduling for workloads that span multiple nodes. See the Multinode Deployment Guide for details on orchestrator selection and configuration.
There are two ways to enable Grove and KAI Scheduler, controlled by which flags you uncomment in the Dynamo install command:
install=true: Dynamo installs and manages Grove/KAI as bundled subcharts. This is the simplest path and is recommended for development and testing.
enabled=true: Tells Dynamo that Grove/KAI are already installed and externally managed. Use this when you install Grove/KAI separately (e.g., to manage their lifecycle independently or share them across namespaces). Recommended for production.
For the enabled=true path, install Grove and KAI Scheduler separately first. See the Grove installation guide and KAI Scheduler deployment guide for instructions.
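The following is a minimal sketch that emits the --set arguments for either mode. GROVE_KEY and KAI_KEY default to grove and kai-scheduler. These are hypothetical value prefixes, so replace them with the exact keys from the commented flags in your Dynamo install command.

```bash
#!/usr/bin/env bash
# Sketch: choose Grove/KAI flags for the bundled (install=true) or
# external (enabled=true) mode.
# Assumption: GROVE_KEY and KAI_KEY are hypothetical value prefixes.
GROVE_KEY="${GROVE_KEY:-grove}"
KAI_KEY="${KAI_KEY:-kai-scheduler}"

# grove_kai_flags bundled|external prints one "--set key=true" argument pair per line.
grove_kai_flags() {
  local suffix
  case "$1" in
    bundled) suffix=install ;;
    external) suffix=enabled ;;
    *) echo "usage: grove_kai_flags bundled|external" >&2; return 2 ;;
  esac
  printf '%s %s.%s=true\n' --set "$GROVE_KEY" "$suffix" --set "$KAI_KEY" "$suffix"
}

# Usage: helm install <release> <chart> ... $(grove_kai_flags bundled)
```

The output is meant to be word-split into a helm command. This is safe only because the generated arguments contain no spaces.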
Compatibility matrix:
If you are not using Grove for multinode, you can use LeaderWorkerSet (LWS) (>= v0.7.0) with Volcano for gang scheduling. Both must be installed before deploying multinode workloads.
See the LWS docs and Volcano docs for configuration options, and the Multinode Deployment Guide for orchestrator selection.
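The following is a sketch of installing both components. It follows the install methods that the LWS and Volcano projects document (a release manifest and a Helm chart, respectively), so check their docs for current versions and options. v0.7.0 is used only because it is the minimum this guide states. The version check rejects anything older before applying the manifest.

```bash
#!/usr/bin/env bash
# Sketch: install LeaderWorkerSet (>= v0.7.0) and Volcano for gang scheduling.
# Set DRY_RUN=1 to print the commands instead of running them.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# version_ge A B succeeds when version A >= version B (a leading "v" is ignored).
version_ge() {
  local a="${1#v}" b="${2#v}"
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# install_lws applies the release manifest for the requested LWS version.
install_lws() {
  local version="${1:-v0.7.0}"
  if ! version_ge "$version" v0.7.0; then
    echo "error: LWS $version is older than the required v0.7.0" >&2
    return 1
  fi
  run kubectl apply --server-side -f \
    "https://github.com/kubernetes-sigs/lws/releases/download/${version}/manifests.yaml"
}

# install_volcano installs Volcano from its Helm repository.
install_volcano() {
  run helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
  run helm repo update
  run helm install volcano volcano-sh/volcano --namespace volcano-system --create-namespace
}

# Usage: install_lws v0.7.0 && install_volcano
```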
RDMA setup is cloud-provider-specific. See the Disaggregated Communication Guide for transport options, UCX configuration, and performance expectations, and your cloud provider guide for setup instructions:
Install Prometheus before running the Dynamo install command so you can set the endpoint in one pass:
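The following is a minimal sketch that installs kube-prometheus-stack and derives an in-cluster Prometheus URL. It assumes the release is named prometheus. With that name, the chart's Prometheus Service is prometheus-kube-prometheus-prometheus on port 9090. A different release name changes the Service name, so the URL helper only parametrizes the namespace.

The podMonitorSelectorNilUsesHelmValues=false setting makes Prometheus select PodMonitors regardless of the chart's release label. It is included on the assumption that the operator-created PodMonitors do not carry that label.

```bash
#!/usr/bin/env bash
# Sketch: install kube-prometheus-stack (release "prometheus") and print its endpoint.
# Set DRY_RUN=1 to print the commands instead of running them.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# prometheus_endpoint [NAMESPACE] prints the in-cluster URL for release "prometheus".
prometheus_endpoint() {
  local ns="${1:-monitoring}"
  echo "http://prometheus-kube-prometheus-prometheus.${ns}.svc.cluster.local:9090"
}

install_prometheus() {
  local ns="${1:-monitoring}"
  run helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  run helm repo update
  # Select PodMonitors even if they lack this release's "release" label.
  run helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace "$ns" --create-namespace \
    --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
}

# Usage: install_prometheus monitoring && prometheus_endpoint monitoring
```

Confirm the Service name with `kubectl get svc -n monitoring` before using the printed URL as the Prometheus endpoint.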
Then uncomment the prometheusEndpoint line in the Dynamo install command. The Dynamo operator automatically creates PodMonitors for its components. See Metrics for dashboard setup and available metrics, and Logging for the Grafana Loki + Alloy logging stack.
Set up a ReadWriteMany PVC so all pods share downloaded model weights instead of each downloading independently. No Dynamo chart flags are needed — storage is configured in your deployment spec. Setup is cloud-provider-specific:
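The following is a minimal sketch of a ReadWriteMany PersistentVolumeClaim, generated as YAML so it can be piped into kubectl. The claim name and default size are placeholders. Size the claim to fit the models you plan to cache. The StorageClass must be an RWX-capable class from your cloud provider guide.

```bash
#!/usr/bin/env bash
# Sketch: print a ReadWriteMany PVC manifest for shared model weights.
# Assumptions: name, size, and StorageClass are placeholders you supply.

# pvc_manifest NAME SIZE STORAGE_CLASS
pvc_manifest() {
  local name="${1:-model-cache}" size="${2:-100Gi}" class="${3:-}"
  if [ -z "$class" ]; then
    echo "usage: pvc_manifest NAME SIZE STORAGE_CLASS" >&2
    return 2
  fi
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${class}
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage: pvc_manifest model-cache 500Gi <rwx-class> | kubectl apply -n "$NAMESPACE" -f -
```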
For large clusters with frequent model updates, consider ModelExpress for P2P model distribution and ModelStreamer for direct streaming from object storage. See Model Caching for the full walkthrough including the download Job, mount configuration, and ModelExpress setup.
Run the pre-deployment check script to validate your cluster is ready for deployments:
This checks kubectl connectivity, default StorageClass configuration, GPU node availability, and GPU Operator status. See Pre-Deployment Checks for details.
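The following is not the pre-deployment script itself. It is a sketch of one of the checks listed above, detecting the default StorageClass. It assumes the standard storageclass.kubernetes.io/is-default-class annotation marks the default.

```bash
#!/usr/bin/env bash
# Sketch: report the cluster's default StorageClass(es).

# default_storage_classes reads "NAME<TAB>ANNOTATION" lines on stdin and
# prints the names whose is-default-class annotation is "true".
default_storage_classes() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

check_default_storage_class() {
  local defaults
  defaults="$(kubectl get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    default_storage_classes)"
  if [ -z "$defaults" ]; then
    echo "no default StorageClass found" >&2
    return 1
  fi
  echo "default StorageClass: $defaults"
}

# Usage: check_default_storage_class
```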
Your cluster is ready. Follow the Deployment Overview to choose between applying a tuned DGD recipe, creating a DGD directly, or using DGDR to generate one.
“VALIDATION ERROR: Cannot install cluster-wide Dynamo operator”
Cause: Attempting cluster-wide install on a shared cluster with existing namespace-restricted operators.
Solution: Migrate the existing namespace-restricted operators to cluster-wide mode. Namespace-restricted mode is deprecated.
CRDs already exist
Cause: Installing CRDs on a cluster where they’re already present (common on shared clusters).
Solution: CRDs are installed automatically by the Helm chart. If you encounter conflicts, check existing CRDs with kubectl get crd | grep dynamo.
Pods not starting?
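The troubleshooting steps for this case are not preserved here. The following is a minimal sketch of a first pass: it lists pods in the namespace that are neither Running nor Completed, then shows recent events. Inspect any flagged pod with `kubectl describe pod` and `kubectl logs`, adding `--previous` for a container that has restarted. Pods that are Running but not Ready are not flagged by this filter.

```bash
#!/usr/bin/env bash
# Sketch: surface pods that are stuck or crashing.
# Assumption: $NAMESPACE (or $1) is the namespace to inspect.

# unhealthy_pods reads `kubectl get pods --no-headers` output on stdin and prints
# "<name> <status>" for pods whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

show_unhealthy_pods() {
  local ns="${1:-${NAMESPACE:-dynamo-system}}"
  kubectl get pods -n "$ns" --no-headers | unhealthy_pods
  # Recent events often explain Pending pods (e.g., insufficient GPUs or unbound PVCs).
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Usage: show_unhealthy_pods "$NAMESPACE"
```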
Bitnami etcd “unrecognized” image?
Add to the helm install command:
Clean uninstall?
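The original uninstall commands are not preserved here. The following is a minimal sketch with these assumptions:

- The release name dynamo-platform is hypothetical. Check the actual name with `helm list -n "$NAMESPACE"`.
- Deleting the namespace removes everything else in it, so skip that step on a shared namespace.
- CRDs are left alone. List them with `kubectl get crd | grep dynamo` first, because deleting a CRD also deletes every custom resource of that kind in all namespaces.

```bash
#!/usr/bin/env bash
# Sketch: remove the Dynamo platform release and its namespace.
# Assumptions: namespace from $1 or $NAMESPACE; release name from $2 (hypothetical default).
# Set DRY_RUN=1 to print the commands instead of running them.

# run executes its arguments, or only prints them when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

uninstall_dynamo() {
  local ns="${1:-${NAMESPACE:-dynamo-system}}" release="${2:-dynamo-platform}"
  run helm uninstall "$release" --namespace "$ns"
  # Removes everything left in the namespace; skip this if the namespace is shared.
  run kubectl delete namespace "$ns" --ignore-not-found
}

# Usage: DRY_RUN=1 uninstall_dynamo "$NAMESPACE" <release>   # review, then rerun without DRY_RUN
```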
If you need to contribute to Dynamo or use the latest unreleased features from the main branch: