Copyright © 2026, NVIDIA Corporation.

AKS Spot VMs


Azure Spot VMs run on Azure's unused capacity at a significant discount, which makes them attractive for GPU workloads, but Azure can evict them at any time, for example when it needs that capacity back. This guide covers the configuration required to schedule Dynamo on Spot VM node pools.

How AKS Taints Spot Nodes

When a node pool uses Spot VMs, AKS automatically applies the following taint to all nodes in that pool:

kubernetes.azure.com/scalesetpriority=spot:NoSchedule

This prevents standard workloads from landing on Spot nodes by default. Any pod that should run on a Spot node must explicitly tolerate this taint.
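To make the effect of this taint concrete, here is a simplified Python model of how the Kubernetes scheduler matches tolerations against NoSchedule and NoExecute taints. It is a sketch for illustration only, not code from Dynamo, AKS, or Kubernetes: Taint, Toleration, parse_taint, tolerates, and can_schedule are hypothetical names, and details such as tolerationSeconds are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every key
    operator: str = "Equal"  # "Equal" (key and value must match) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def parse_taint(text: str) -> Taint:
    """Parse a taint written as key=value:effect."""
    key_value, effect = text.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod fits only if every hard (NoSchedule/NoExecute) taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


spot = parse_taint("kubernetes.azure.com/scalesetpriority=spot:NoSchedule")
spot_toleration = Toleration(
    key="kubernetes.azure.com/scalesetpriority",
    operator="Equal",
    value="spot",
    effect="NoSchedule",
)
print(can_schedule([], [spot]))                 # False: a standard pod is kept off
print(can_schedule([spot_toleration], [spot]))  # True: the toleration lets it in
```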

Required Toleration

Add the following toleration to any workload that should run on Spot nodes:

tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
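A toleration allows a pod onto Spot nodes but does not force it there; the scheduler can still place the pod on regular nodes. AKS also labels Spot nodes with kubernetes.azure.com/scalesetpriority=spot, so a nodeSelector on that label pins a pod to Spot capacity. The Python sketch below builds such a Pod manifest as JSON, which kubectl apply -f - accepts; the spot_pod helper, the pod name, and the image are hypothetical placeholders.

```python
import json

SPOT_KEY = "kubernetes.azure.com/scalesetpriority"

# The toleration from the YAML above, expressed as a Kubernetes API object.
SPOT_TOLERATION = {
    "key": SPOT_KEY,
    "operator": "Equal",
    "value": "spot",
    "effect": "NoSchedule",
}


def spot_pod(name: str, image: str, require_spot: bool = False) -> dict:
    """Build a minimal Pod manifest that may run on AKS Spot nodes.

    The toleration only permits scheduling onto Spot nodes. Setting
    require_spot adds a nodeSelector on the AKS spot label so the pod is
    placed only on Spot nodes.
    """
    spec = {
        "containers": [{"name": name, "image": image}],
        "tolerations": [SPOT_TOLERATION],
    }
    if require_spot:
        spec["nodeSelector"] = {SPOT_KEY: "spot"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Pipe the output to the cluster, e.g.: python spot_pod.py | kubectl apply -f -
    print(json.dumps(spot_pod("demo", "nginx", require_spot=True), indent=2))
```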

Deploying Dynamo on Spot Nodes

The Dynamo platform Helm chart includes a pre-built values file for Spot VM deployments, examples/deployments/AKS/values-aks-spot.yaml, which adds the required toleration to all Dynamo components:

  • Dynamo operator controller manager
  • Webhook CA inject and cert generation jobs
  • etcd
  • NATS
  • MPI SSH key generation job
  • Other core Dynamo platform pods

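Install Dynamo with the Spot values file (the command expects values-aks-spot.yaml in the current directory and RELEASE_VERSION set to the release you are installing):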

helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace dynamo-system \
  --create-namespace \
  -f ./values-aks-spot.yaml

To upgrade an existing installation:

helm upgrade dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace dynamo-system \
  -f ./values-aks-spot.yaml
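After installing or upgrading, it can be worth confirming that the pods in dynamo-system actually carry the toleration. The following is a minimal Python sketch, assuming kubectl is on the PATH and configured for the cluster; fetch_pods, tolerates_spot, and pods_missing_spot_toleration are hypothetical helpers, not part of Dynamo.

```python
import json
import subprocess

SPOT_KEY = "kubernetes.azure.com/scalesetpriority"


def tolerates_spot(tol: dict) -> bool:
    """True if a pod toleration covers the spot:NoSchedule taint."""
    if tol.get("effect") not in (None, "", "NoSchedule"):
        return False
    if tol.get("operator") == "Exists":
        return tol.get("key") in (None, "", SPOT_KEY)
    # operator defaults to Equal when omitted
    return tol.get("key") == SPOT_KEY and tol.get("value") == "spot"


def pods_missing_spot_toleration(pod_list: dict) -> list[str]:
    """Names of pods in a kubectl pod list (-o json) that lack the toleration."""
    missing = []
    for pod in pod_list.get("items", []):
        tolerations = pod.get("spec", {}).get("tolerations") or []
        if not any(tolerates_spot(t) for t in tolerations):
            missing.append(pod["metadata"]["name"])
    return missing


def fetch_pods(namespace: str = "dynamo-system") -> dict:
    """Run kubectl get pods -o json (kubectl must be on PATH and configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Example (requires cluster access); an empty list means every pod is covered:
#   print(pods_missing_spot_toleration(fetch_pods()))
```

The default tolerations Kubernetes adds to every pod, such as node.kubernetes.io/not-ready:NoExecute, do not count here because they target a different key and effect.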

Creating a Spot GPU Node Pool

Add a Spot GPU node pool to an existing AKS cluster:

az aks nodepool add \
  --resource-group <RESOURCE_GROUP> \
  --cluster-name <CLUSTER_NAME> \
  --name spotgpunp \
  --node-count 2 \
  --node-vm-size Standard_NC24ads_A100_v4 \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --skip-gpu-driver-install

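--spot-max-price -1 means you pay the current Spot price up to the on-demand price, and nodes are never evicted for price reasons, only when Azure needs the capacity back (recommended). --eviction-policy Delete removes evicted nodes from the pool; use Deallocate instead if you want to preserve node state (disks) across evictions, keeping in mind that deallocated nodes still count against your compute quota. --skip-gpu-driver-install tells AKS not to install the NVIDIA GPU driver on the nodes, so the driver must be provided another way, for example by the NVIDIA GPU Operator.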
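The two eviction-related flags can be summarized as a toy Python model. This sketch only restates Azure's documented Spot behavior for illustration: SpotPoolSettings, evicted_for_price, and on_eviction are hypothetical names, and the model covers price-based eviction only, not Azure's capacity-driven evictions, which can happen regardless of the settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotPoolSettings:
    max_price: float      # --spot-max-price; -1 caps the price at the on-demand rate
    eviction_policy: str  # --eviction-policy: "Delete" or "Deallocate"


def evicted_for_price(settings: SpotPoolSettings, current_spot_price: float) -> bool:
    """Price-based eviction only; Azure can still evict for capacity at any time."""
    if settings.max_price == -1:
        return False  # never evicted because of price
    return current_spot_price > settings.max_price


def on_eviction(settings: SpotPoolSettings) -> str:
    """Describe what happens to an evicted node under each eviction policy."""
    if settings.eviction_policy == "Delete":
        return "node VM and its disks are deleted and the node leaves the pool"
    if settings.eviction_policy == "Deallocate":
        return ("node VM is stopped and deallocated; its disks are kept "
                "and it still counts toward compute quota")
    raise ValueError(f"unknown eviction policy: {settings.eviction_policy!r}")


recommended = SpotPoolSettings(max_price=-1, eviction_policy="Delete")
print(on_eviction(recommended))
```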

See Also

  • Azure Spot VMs overview
  • Use Spot VMs in AKS