AKS GPU Setup
Kubernetes Version Requirement
AICR requires Kubernetes 1.34 or later on AKS. This is driven by DRA (Dynamic Resource Allocation), which is included in every AICR recipe.
The core DRA APIs (resource.k8s.io) graduated to GA (stable v1) in
Kubernetes 1.34. No AKS-specific feature flag is needed — DRA is enabled out of
the box once you’re on 1.34+.
You can verify DRA is available after the upgrade by listing the API resources
in the resource.k8s.io group. The expected output includes deviceclasses,
resourceclaims, resourceclaimtemplates, and resourceslices.
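The check above can be scripted. The following is a minimal sketch, assuming `kubectl` is on the PATH and pointed at the upgraded cluster; the function name `check_dra_resources` is hypothetical and not part of AICR.

```shell
# Hypothetical helper: confirm the four DRA resource kinds are served.
# Exit status: 0 = all present, 1 = at least one missing, 2 = kubectl failed.
check_dra_resources() (
  # "-o name" prints one "<resource>.<group>" entry per line.
  listing="$(kubectl api-resources --api-group=resource.k8s.io -o name)" || exit 2
  missing=0
  for r in deviceclasses resourceclaims resourceclaimtemplates resourceslices; do
    if ! printf '%s\n' "$listing" | grep -q "^${r}\."; then
      echo "missing: ${r}" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

Calling `check_dra_resources && echo "DRA APIs available"` after the upgrade gives a quick pass/fail signal. Any missing resource kind is reported on stderr.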
Note: AKS does not allow skipping minor versions during an upgrade. If your cluster is on 1.32, you must upgrade to 1.33 first, then to 1.34.
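To make the stepwise upgrade concrete, here is a small sketch that lists the minor versions to pass through. The function name `upgrade_path` is hypothetical, and it assumes both versions share the same major version.

```shell
# Hypothetical helper: print each minor version between the current and
# target versions, one per line, excluding the current one.
upgrade_path() (
  major="${1%%.*}"
  from_minor="$(printf '%s' "$1" | cut -d. -f2)"
  to_minor="$(printf '%s' "$2" | cut -d. -f2)"
  m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    echo "${major}.${m}"
    m=$((m + 1))
  done
)

# Example (resource group and cluster names are placeholders):
#   for v in $(upgrade_path 1.32 1.34); do
#     az aks upgrade --resource-group my-rg --name my-aks --kubernetes-version "$v"
#   done
```

Depending on your cluster configuration, `az aks upgrade` may need a full patch version rather than a bare minor version. `az aks get-upgrades` shows the versions available to the cluster.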
Dynamic Resource Allocation (DRA)
All AICR recipes include the nvidia-dra-driver-gpu component, which advertises
GPUs via the Kubernetes DRA API instead of the legacy device plugin. DRA provides
structured GPU device advertisement, claim-based allocation, and integration with
gang scheduling.
Feature Gate Details
On AKS 1.34, DRA is GA. You do not need to pass any custom API server flags or register an AKS preview feature.
CLI Override
You can control DRA settings when bundling:
Device Plugin vs DRA
Both device-plugin and DRA are enabled by default, but only one should be used per node. Using both concurrently causes GPU over-admission: the two systems advertise all GPUs independently, so the scheduler may admit more GPU pods than there are physical GPUs. For example, on a node with 8 GPUs, each system advertises 8, so up to 16 single-GPU pods could be admitted.
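One way to spot nodes exposed to this problem is to look for nodes that advertise GPUs through both paths. The sketch below is a diagnostic, not part of AICR. It assumes the device plugin exposes the `nvidia.com/gpu` extended resource and that the DRA driver publishes ResourceSlices under the driver name `gpu.nvidia.com`. Both names are assumptions to verify against your cluster, and the function name is hypothetical.

```shell
# Assumed DRA driver name; override via the environment if yours differs.
DRA_DRIVER="${DRA_DRIVER:-gpu.nvidia.com}"

# Hypothetical helper: print nodes that advertise GPUs via both the device
# plugin (allocatable nvidia.com/gpu) and DRA (a ResourceSlice for DRA_DRIVER).
nodes_with_both_gpu_paths() (
  plugin_nodes="$(kubectl get nodes --no-headers \
    -o 'custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | awk '$2 != "<none>" && $2 != "0" { print $1 }')"
  kubectl get resourceslices --no-headers \
    -o 'custom-columns=NODE:.spec.nodeName,DRIVER:.spec.driver' \
    | awk -v d="$DRA_DRIVER" '$2 == d { print $1 }' | sort -u \
    | while read -r node; do
        if printf '%s\n' "$plugin_nodes" | grep -qxF -- "$node"; then
          echo "$node"
        fi
      done
)
```

Empty output from `nodes_with_both_gpu_paths` suggests that no node is advertising GPUs twice. Any node it prints is a candidate for over-admission.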
For DRA-only (recommended):
For device-plugin-only (legacy):
GPU Driver Setup
AKS GPU nodepools install NVIDIA drivers by default. This conflicts with the GPU Operator, which also installs drivers by default. Use one of the approaches below to avoid the conflict.
Recommended: Let GPU Operator Manage the Driver
Create nodepools with --gpu-driver none so AKS skips its driver installation
and the GPU Operator handles it:
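A minimal sketch of such a nodepool creation follows. The wrapper name `add_gpu_nodepool` and its argument order are hypothetical; only the `--gpu-driver none` setting comes from this guide. The resource group, cluster, nodepool name, and VM size are supplied by the caller.

```shell
# Hypothetical wrapper: create a GPU nodepool without the AKS-managed driver.
# Usage: add_gpu_nodepool <resource-group> <cluster> <pool-name> <vm-size> [count]
add_gpu_nodepool() {
  az aks nodepool add \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --node-vm-size "$4" \
    --node-count "${5:-1}" \
    --gpu-driver none
}

# Example (all values are placeholders):
#   add_gpu_nodepool my-rg my-aks gpunp Standard_NC24ads_A100_v4 2
```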
No changes to AICR recipes are needed — this is the default configuration.
Alternative: Use the AKS-Managed Driver
If you prefer the AKS-managed driver (e.g., for driver version pinning by AKS), disable the GPU Operator driver:
Or add to your values override file: