Installation Guide for Dynamo Kubernetes Platform#
Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestration and scaling, using the Dynamo Kubernetes Platform.
Quick Start Paths#
The platform is installed using the Dynamo Kubernetes Platform Helm chart.
Path A: Production Install. Install from published artifacts on your existing cluster → Jump to Path A
Path B: Local Development. Set up Minikube first → Minikube Setup → then follow Path A
Path C: Custom Development. Build from source for customization → Jump to Path C
The default values used by any helm install command can be overridden, either by editing the chart's values.yaml file or by passing in your own values file:
helm install ... -f your-values.yaml
and/or by setting individual values as flags on the helm install command:
helm install ... --set "your-value=your-value"
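As an illustration of combining the two override styles, the sketch below writes a small values file and prints a helm install command that uses it together with a --set flag. The file name your-values.yaml comes from the example above and the keys are the ones documented in the tips of this guide; CHART is a hypothetical placeholder for whichever chart archive or directory you install from, so the command is printed for review rather than executed.

```shell
# Write an override file; the keys mirror --set flags documented in this guide.
VALUES_FILE="${VALUES_FILE:-your-values.yaml}"
cat > "$VALUES_FILE" <<'EOF'
grove:
  enabled: true
kai-scheduler:
  enabled: true
EOF

# CHART is a placeholder (assumption): substitute your chart archive or directory.
CHART="${CHART:-<chart>}"

# Print the command instead of running it, so it can be checked first.
HELM_CMD="helm install dynamo-platform $CHART -f $VALUES_FILE --set dynamo-operator.namespaceRestriction.enabled=true"
echo "$HELM_CMD"
```

When the same key appears both in a values file and in a --set flag, Helm gives the --set value precedence.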
Prerequisites#
# Required tools
kubectl version --client # v1.24+
helm version # v3.0+
docker version # Running daemon
# Set your inference runtime image
export RELEASE_VERSION=0.x.x # any version of Dynamo 0.3.2+ listed at https://github.com/ai-dynamo/dynamo/releases
export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:${RELEASE_VERSION}
# Also available: sglang-runtime, tensorrtllm-runtime
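The minimum versions listed above can be checked in a script rather than by eye. The helper below is a minimal sketch: version_ge is a hypothetical name, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Succeed when version $1 is at least version $2; a leading "v" is ignored.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() (
  have="${1#v}"
  want="${2#v}"
  # The smaller of the two versions sorts first; it must be the required one.
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
)
```

For example, `version_ge "$(helm version --template '{{.Version}}')" 3.0 || echo "helm is older than v3.0"` compares the installed Helm client against the minimum stated above.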
Tip
No cluster? See Minikube Setup for local development.
Path A: Production Install#
Install from NGC-published artifacts in three steps.
# 1. Set environment
export NAMESPACE=dynamo-system
export RELEASE_VERSION=0.x.x # any version of Dynamo 0.3.2+ listed at https://github.com/ai-dynamo/dynamo/releases
# 2. Install CRDs
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
# 3. Install Platform
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} --create-namespace
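A common slip in the steps above is leaving RELEASE_VERSION at the 0.x.x placeholder, which produces a chart URL that does not exist. The sketch below builds the chart URL from the same pattern used above and refuses placeholder versions; chart_url is a hypothetical helper name, not part of Dynamo or Helm.

```shell
# Print the NGC download URL for a Dynamo chart archive.
# Fails if the version is empty or still contains the "x" placeholder.
chart_url() {
  chart="$1"
  version="$2"
  case "$version" in
    ""|*x*)
      echo "set RELEASE_VERSION to a real release (got '$version')" >&2
      return 1
      ;;
  esac
  echo "https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/${chart}-${version}.tgz"
}
```

It can stand in for the literal URL in the fetch step, for example `helm fetch "$(chart_url dynamo-crds "${RELEASE_VERSION}")"`.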
Tip
For multinode deployments, you need to enable Grove and Kai Scheduler. You can install them manually or through the dynamo-platform helm install command. Grove and Kai Scheduler are NOT installed by default by that command; to enable their installation, set the following flags in the helm install command:
--set "grove.enabled=true"
--set "kai-scheduler.enabled=true"
Tip
By default, Model Express Server is not used. To use an existing Model Express Server, set modelExpressURL to that server's URL in the helm install command:
--set "dynamo-operator.modelExpressURL=http://model-express-server.model-express.svc.cluster.local:8080"
Tip
By default, the Dynamo Operator is installed cluster-wide and monitors all namespaces. To restrict the operator to a single namespace (the helm release namespace by default), set namespaceRestriction.enabled to true. To restrict it to a different namespace, also set the targetNamespace property.
--set "dynamo-operator.namespaceRestriction.enabled=true"
--set "dynamo-operator.namespaceRestriction.targetNamespace=dynamo-namespace" # optional
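The optional flags from the tips above can be assembled from a few switches instead of being edited into the command by hand. This is a minimal sketch: platform_flags and the variables ENABLE_GROVE, ENABLE_KAI, RESTRICT_NAMESPACE, and MODEL_EXPRESS_URL are hypothetical names chosen for illustration, while the --set keys are the ones documented above.

```shell
# Print one "--set key=value" line per optional feature that is switched on.
# The toggle variable names are assumptions made for this example.
platform_flags() {
  if [ "${ENABLE_GROVE:-false}" = "true" ]; then
    echo "--set grove.enabled=true"
  fi
  if [ "${ENABLE_KAI:-false}" = "true" ]; then
    echo "--set kai-scheduler.enabled=true"
  fi
  if [ "${RESTRICT_NAMESPACE:-false}" = "true" ]; then
    echo "--set dynamo-operator.namespaceRestriction.enabled=true"
  fi
  if [ -n "${MODEL_EXPRESS_URL:-}" ]; then
    echo "--set dynamo-operator.modelExpressURL=${MODEL_EXPRESS_URL}"
  fi
}
```

Appending an unquoted `$(platform_flags)` to the platform install command from step 3 passes the selected flags through; with no toggles set, nothing is added and the defaults apply.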
Path C: Custom Development#
Build and deploy from source for customization.
# 1. Set environment
export NAMESPACE=dynamo-system
export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo # or your registry; no trailing slash, later commands append /dynamo-operator
export DOCKER_USERNAME='$oauthtoken'
export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
export IMAGE_TAG=${RELEASE_VERSION} # RELEASE_VERSION as set in Prerequisites
# 2. Build operator
cd deploy/cloud/operator
# 2.1 Alternative 1: Build and push the operator image for multiple platforms
docker buildx create --name multiplatform --driver docker-container --bootstrap
docker buildx use multiplatform
docker buildx build --platform linux/amd64,linux/arm64 -t "${DOCKER_SERVER}/dynamo-operator:${IMAGE_TAG}" --push .
# 2.2 Alternative 2: Build and push the operator image for a single platform
docker build -t "${DOCKER_SERVER}/dynamo-operator:${IMAGE_TAG}" . && docker push "${DOCKER_SERVER}/dynamo-operator:${IMAGE_TAG}"
cd -
# 3. Create namespace and secrets to be able to pull the operator image (only needed if you pushed the operator image to a private registry)
kubectl create namespace ${NAMESPACE}
kubectl create secret docker-registry docker-imagepullsecret \
--docker-server=${DOCKER_SERVER} \
--docker-username=${DOCKER_USERNAME} \
--docker-password=${DOCKER_PASSWORD} \
--namespace=${NAMESPACE}
# 4. Install CRDs
helm upgrade --install dynamo-crds ./crds/ --namespace default
# 5. Install Platform
helm dep build ./platform/
helm install dynamo-platform ./platform/ \
--namespace ${NAMESPACE} \
--create-namespace \
--set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
--set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
--set "dynamo-operator.imagePullSecrets[0].name=docker-imagepullsecret"
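The build and install commands above join DOCKER_SERVER with /dynamo-operator, so a registry value that ends in a slash yields a double slash in the image name. The sketch below shows one way to make that join tolerant of either form; image_ref is a hypothetical helper name.

```shell
# Join a registry prefix, image name, and tag into one image reference.
# A single trailing slash on the registry is dropped so the result has no "//".
image_ref() {
  registry="${1%/}"
  echo "${registry}/${2}:${3}"
}
```

For instance, `docker build -t "$(image_ref "$DOCKER_SERVER" dynamo-operator "$IMAGE_TAG")" .` tags the image the same way whether or not DOCKER_SERVER carries a trailing slash.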
Verify Installation#
# Check CRDs
kubectl get crd | grep dynamo
# Check operator and platform pods
kubectl get pods -n ${NAMESPACE}
# Expected: dynamo-operator-* and etcd-* pods Running
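To turn the pod check above into something a script can act on, the pod listing can be filtered for anything that is not yet healthy. A minimal sketch, assuming the default kubectl get pods column order (NAME, READY, STATUS, ...); pods_not_ready is a hypothetical helper name.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod whose STATUS column (field 3) is neither Running nor Completed.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

Running `kubectl get pods -n "${NAMESPACE}" --no-headers | pods_not_ready` prints nothing once every pod is Running; any names it does print are candidates for the Troubleshooting steps.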
Next Steps#
Deploy Model/Workflow
# Example: Deploy a vLLM workflow with Qwen3-0.6B using aggregated serving
kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
# Port forward and test
kubectl port-forward svc/agg-vllm-frontend 8000:8000 -n ${NAMESPACE}
curl http://localhost:8000/v1/models
Explore Backend Guides
Optional:
SLA Planner Deployment Guide (for advanced SLA-aware scheduling and autoscaling)
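When testing the deployed workflow from the first step, the frontend may not answer immediately after the port-forward starts, so a single curl can fail even though the deployment is healthy. The sketch below retries the request a few times; wait_for_http is a hypothetical helper name, and the attempt count and delay defaults are arbitrary.

```shell
# Poll a URL until it returns a successful HTTP status, or give up.
# Arguments: URL, max attempts (default 10), seconds between attempts (default 3).
wait_for_http() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output; -f makes curl fail on HTTP error statuses.
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

With the port-forward running in another terminal, `wait_for_http http://localhost:8000/v1/models && curl http://localhost:8000/v1/models` waits for the endpoint and then prints the model list.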
Troubleshooting#
Pods not starting?
kubectl describe pod <pod-name> -n ${NAMESPACE}
kubectl logs <pod-name> -n ${NAMESPACE}
HuggingFace model access?
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
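If HF_TOKEN is unset when the command above runs, the secret is created with an empty token and the failure only shows up later when a model download is rejected. A small guard, sketched here under the hypothetical name require_env, can catch that before the secret is created.

```shell
# Succeed only if every named environment variable is set and non-empty;
# otherwise list the missing names on stderr and fail.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required variables:$missing" >&2
    return 1
  fi
}
```

Prefixing the secret creation with `require_env HF_TOKEN NAMESPACE &&` stops the command from running until both variables are set.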
Bitnami etcd “unrecognized” image?
ERROR: Original containers have been substituted for unrecognized ones. Deploying this chart with non-standard containers is likely to cause degraded security and performance, broken chart features, and missing environment variables.
You may encounter this error during helm install because Bitnami has moved its container images to a new, secured Docker repository.
To work around it, add the following flags to the helm install command:
--set "etcd.image.repository=bitnamilegacy/etcd" --set "etcd.global.security.allowInsecureImages=true"
Clean uninstall?
./uninstall.sh # Removes all CRDs and platform