Azure Spot VMs offer significant cost savings for GPU workloads but can be evicted by Azure at any time. This guide covers the configuration required to schedule Dynamo on Spot VM node pools.
When a node pool uses Spot VMs, AKS automatically applies the following taint to all nodes in that pool:
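The taint documented by AKS for Spot node pools is `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`. As a quick check, the sketch below lists the taints on nodes that carry the matching `kubernetes.azure.com/scalesetpriority=spot` label. It assumes `kubectl` already targets the AKS cluster; the function name `show_spot_taints` is only illustrative.

```shell
# Taint that AKS places on Spot node pool nodes (key=value:effect).
SPOT_TAINT="kubernetes.azure.com/scalesetpriority=spot:NoSchedule"

# Print each Spot node with the key, value, and effect of its taints.
# Assumes kubectl is configured for the target cluster.
show_spot_taints() {
  kubectl get nodes \
    -l kubernetes.azure.com/scalesetpriority=spot \
    -o custom-columns='NAME:.metadata.name,KEY:.spec.taints[*].key,VALUE:.spec.taints[*].value,EFFECT:.spec.taints[*].effect'
}
```

Calling `show_spot_taints` on a cluster with a Spot pool should show the `kubernetes.azure.com/scalesetpriority` key on every node in that pool.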
This prevents standard workloads from landing on Spot nodes by default. Any pod that should run on a Spot node must explicitly tolerate this taint.
Add the following toleration to any workload that should run on Spot nodes:
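A toleration that matches the AKS Spot taint (`kubernetes.azure.com/scalesetpriority=spot:NoSchedule`) could look like the fragment below. It is written out through a shell heredoc so it can be copied into a pod spec under `spec.tolerations`; the file name `spot-toleration.yaml` is an assumption made for illustration, not something the chart expects.

```shell
# Write the Spot toleration as a YAML fragment.
# The output file name is hypothetical; paste the fragment under
# spec.tolerations of any pod that should be schedulable on Spot nodes.
cat > spot-toleration.yaml <<'EOF'
tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
EOF
```

A toleration only permits scheduling onto Spot nodes; it does not force pods there. Pinning a workload to Spot nodes would additionally need a node selector or affinity rule.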
The Dynamo platform Helm chart includes a pre-built values file for Spot VM deployments, `examples/deployments/AKS/values-aks-spot.yaml`, which adds the required toleration to all Dynamo components.
Install Dynamo with the Spot values file:
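A minimal sketch of the install step, wrapped in a function so the parameters are explicit. The release name, namespace, and chart reference are placeholders chosen for illustration (the source does not state them); only the values file path comes from this guide.

```shell
# Hypothetical defaults; override them to match your environment.
RELEASE="${RELEASE:-dynamo-platform}"
NAMESPACE="${NAMESPACE:-dynamo-system}"
CHART="${CHART:-./dynamo-platform.tgz}"   # local path or repo reference to the chart

# Spot values file shipped with the Dynamo repository.
SPOT_VALUES="examples/deployments/AKS/values-aks-spot.yaml"

# Install the platform chart with the Spot tolerations applied.
install_dynamo_spot() {
  helm install "$RELEASE" "$CHART" \
    --namespace "$NAMESPACE" --create-namespace \
    -f "$SPOT_VALUES"
}
```

Run `install_dynamo_spot` from the repository root so the relative path to the values file resolves.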
To upgrade an existing installation:
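The upgrade can be sketched the same way with `helm upgrade`. As before, the release name, namespace, and chart reference are assumed placeholders and should match whatever the existing installation used.

```shell
# Hypothetical defaults; these must match the existing release.
RELEASE="${RELEASE:-dynamo-platform}"
NAMESPACE="${NAMESPACE:-dynamo-system}"
CHART="${CHART:-./dynamo-platform.tgz}"   # local path or repo reference to the chart

# Spot values file shipped with the Dynamo repository.
SPOT_VALUES="examples/deployments/AKS/values-aks-spot.yaml"

# Upgrade the existing release in place, layering the Spot values on top.
upgrade_dynamo_spot() {
  helm upgrade "$RELEASE" "$CHART" \
    --namespace "$NAMESPACE" \
    -f "$SPOT_VALUES"
}
```

If the original install used other custom values files, pass them again with additional `-f` flags, because `helm upgrade` does not reuse previous values unless asked to.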
Add a Spot GPU node pool to an existing AKS cluster:
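One way to do this with the Azure CLI is sketched below. The resource group, cluster name, pool name, VM size, and node count are placeholders for illustration; substitute a GPU SKU that has Spot capacity and quota in your region.

```shell
# Hypothetical names and sizes; replace with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"
NODEPOOL_NAME="${NODEPOOL_NAME:-gpuspot}"        # lowercase alphanumeric, 12 chars max
VM_SIZE="${VM_SIZE:-Standard_NC24ads_A100_v4}"   # example GPU SKU

# Add a Spot-priority GPU node pool to an existing AKS cluster.
add_spot_gpu_pool() {
  az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$NODEPOOL_NAME" \
    --node-vm-size "$VM_SIZE" \
    --node-count 1 \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1
}
```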
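`--spot-max-price -1` means pay up to the on-demand price, so nodes are not evicted for price reasons, only when Azure needs the capacity back (recommended). `--eviction-policy Delete` removes evicted nodes from the pool; use `Deallocate` if you want to preserve node state across evictions, keeping in mind that deallocated nodes still count against your compute quota.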