Azure Lustre CSI Driver for AKS
This guide covers installing and configuring the Azure Lustre CSI driver on an AKS cluster so that Dynamo workloads can use Azure Managed Lustre (AMLFS) filesystems for high-performance model storage.
Prerequisites
AKS cluster requirements
- Kubernetes 1.21 or later
- Node pools must use the Ubuntu OS SKU — Windows and Azure Linux (CBL Mariner) nodes are not supported
- AKS is the only supported Kubernetes distribution (self-managed clusters are not supported)
Tools
- Azure CLI (az)
- kubectl
Network connectivity
The AKS cluster must be able to reach the AMLFS filesystem over the network. Two topologies are supported:
- VNet peering: Deploy AKS in its own VNet and peer it with the AMLFS VNet. The AKS infrastructure VNet lives in the auto-created resource group MC_<aks-rg>_<aks-name>_<region>.
- Shared VNet: Use AKS’s “Bring your own VNet” feature and deploy AKS in a dedicated subnet inside the AMLFS VNet. Do not use the same subnet as AMLFS.
Step 1: Connect to your AKS cluster
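A typical way to fetch credentials and confirm access is sketched below. The resource group and cluster names are hypothetical placeholders; substitute your own.

```shell
# Placeholder values (assumptions): replace with your resource group and cluster name.
AKS_RG="my-aks-rg"
AKS_NAME="my-aks"

# Merge the cluster's credentials into the local kubeconfig.
az aks get-credentials --resource-group "$AKS_RG" --name "$AKS_NAME"

# Confirm that kubectl can reach the cluster.
kubectl get nodes
```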
Step 2: Install the CSI driver
There is no Helm chart. Install via the provided shell script:
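A minimal sketch of the install, assuming the script is the `install-driver.sh` published under `deploy/` in the upstream kubernetes-sigs/azurelustre-csi-driver repository. The URL and branch below are assumptions; confirm them against the driver's repository before running.

```shell
# Assumed location of the install script in the upstream driver repository.
INSTALL_URL="https://raw.githubusercontent.com/kubernetes-sigs/azurelustre-csi-driver/main/deploy/install-driver.sh"

# Download first so the script can be reviewed before it runs.
curl -fsSL "$INSTALL_URL" -o install-driver.sh

# Run it against the cluster selected by the current kubectl context.
bash install-driver.sh
```

Downloading to a file rather than piping straight into a shell leaves a copy of exactly what was executed.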
The script deploys the CSI controller (2-replica Deployment) and node plugin (DaemonSet) into kube-system, and waits for them to become ready.
Verify the installation:
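One way to check the result, sketched with plain kubectl queries. The `grep` filter on "azurelustre" is an assumption about how the driver's resources are named.

```shell
# Registered CSI drivers; the Azure Lustre driver is expected to appear in this list.
kubectl get csidrivers

# Controller Deployment and node DaemonSet in kube-system.
# Filtering on "azurelustre" avoids depending on exact resource names.
kubectl get deployments,daemonsets --namespace kube-system | grep azurelustre

# Driver pods should all reach the Running state.
kubectl get pods --namespace kube-system --output wide | grep azurelustre
```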
Step 3: Configure storage
There are two provisioning modes depending on whether your AMLFS filesystem already exists.
Option A: Static provisioning (existing AMLFS filesystem)
Use this when you want to bring your own Azure Managed Lustre filesystem. If you don’t have one yet, create it first, then configure the CSI driver to use it.
Create an Azure Managed Lustre filesystem
1. Register the resource provider (first time only):
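A sketch of the registration, assuming the Azure Managed Lustre resource type lives under the Microsoft.StorageCache provider namespace:

```shell
# Register the provider that hosts Azure Managed Lustre (assumed: Microsoft.StorageCache).
az provider register --namespace Microsoft.StorageCache

# Registration is asynchronous; repeat until this prints "Registered".
az provider show --namespace Microsoft.StorageCache --query registrationState --output tsv
```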
2. Validate your subnet before creating the filesystem:
The subnet must be dedicated to AMLFS (do not share with AKS nodes or other resources) and sized to hold the filesystem. Check requirements first:
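The size check could look like the sketch below. The SKU and capacity are hypothetical example values, and the flag names are assumptions; confirm them with `az amlfs get-subnets-size --help`.

```shell
# The amlfs commands ship as an Azure CLI extension.
az extension add --name amlfs

# Ask how many IP addresses a filesystem of this SKU and capacity (in TiB) needs.
# The result is reported as filesystemSubnetSize.
az amlfs get-subnets-size \
  --sku AMLFS-Durable-Premium-250 \
  --storage-capacity 16
```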
3. Create a dedicated subnet for AMLFS:
AMLFS requires its own subnet — it cannot share the subnet used by AKS nodes. Create a new subnet in the AKS VNet (or in a peered VNet):
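To find out which VNet the cluster uses, a query along these lines reports the first node pool's subnet ID under a field named `vnet`. This is a reconstruction: the cluster names are placeholders, and reading only the first node pool is an assumption.

```shell
# Placeholder values (assumptions): replace with your resource group and cluster name.
AKS_RG="my-aks-rg"
AKS_NAME="my-aks"

# Print the subnet ID of the first node pool as "vnet".
# The value is null when AKS manages the VNet itself.
az aks show --resource-group "$AKS_RG" --name "$AKS_NAME" \
  --query "{vnet: agentPoolProfiles[0].vnetSubnetId}" --output json
```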
If vnet is non-null, your cluster uses Azure CNI with a custom VNet — use that VNet name and resource group below.
If vnet is null, AKS manages its own VNet in the node resource group. Find it:
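A sketch of the lookup, using placeholder cluster names:

```shell
# Placeholder values (assumptions): replace with your resource group and cluster name.
AKS_RG="my-aks-rg"
AKS_NAME="my-aks"

# The node resource group is the auto-created MC_... group.
NODE_RG=$(az aks show --resource-group "$AKS_RG" --name "$AKS_NAME" \
  --query nodeResourceGroup --output tsv)

# List the VNets in that group together with their address space.
az network vnet list --resource-group "$NODE_RG" \
  --query "[].{name:name, addressSpace:addressSpace.addressPrefixes}" --output json
```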
List existing subnets to find a free CIDR range:
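For example, with the VNet name and resource group as placeholders:

```shell
# Placeholder values (assumptions): the VNet found in the previous step.
VNET_RG="my-vnet-rg"
VNET_NAME="my-aks-vnet"

# Show each subnet with its address prefix so a free range can be chosen.
az network vnet subnet list --resource-group "$VNET_RG" --vnet-name "$VNET_NAME" \
  --query "[].{name:name, prefix:addressPrefix}" --output table
```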
Pick a non-overlapping CIDR within the VNet’s address space. The filesystemSubnetSize value from get-subnets-size is the number of IPs required. Azure also reserves 5 IPs per subnet, so add those when sizing the prefix. For example, a filesystemSubnetSize of 8 needs 13 IPs, so use a /28 (16 addresses) or a larger subnet.
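The sizing arithmetic can be sketched as a small shell helper. The function name `amlfs_subnet_prefix` is hypothetical; it returns the longest prefix (smallest subnet) whose address count covers the required IPs plus the 5 reserved by Azure.

```shell
# Smallest subnet (longest prefix) that holds filesystemSubnetSize + 5 addresses.
amlfs_subnet_prefix() {
  local required=$(( $1 + 5 ))   # filesystemSubnetSize plus Azure-reserved IPs
  local prefix=32
  local size=1
  # Double the subnet size until it covers the requirement.
  while [ "$size" -lt "$required" ]; do
    size=$(( size * 2 ))
    prefix=$(( prefix - 1 ))
  done
  echo "$prefix"
}

amlfs_subnet_prefix 8    # prints 28
```

This gives a lower bound only; choosing a larger subnet leaves headroom if the filesystem is later recreated at a higher capacity.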
Then create the dedicated AMLFS subnet:
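A sketch of the subnet creation. The VNet name, resource group, and CIDR are hypothetical; the subnet name `amlfs-subnet` matches the resource ID used later in this guide.

```shell
# Placeholder values (assumptions): replace with your VNet and a free CIDR range.
VNET_RG="my-vnet-rg"
VNET_NAME="my-aks-vnet"
AMLFS_CIDR="10.224.16.0/24"

# Create the subnet that will be dedicated to AMLFS.
az network vnet subnet create \
  --resource-group "$VNET_RG" \
  --vnet-name "$VNET_NAME" \
  --name amlfs-subnet \
  --address-prefixes "$AMLFS_CIDR"

# Print the full resource ID of the new subnet.
az network vnet subnet show \
  --resource-group "$VNET_RG" \
  --vnet-name "$VNET_NAME" \
  --name amlfs-subnet \
  --query id --output tsv
```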
Use the full subnet resource ID in the next step:
/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<VNET_RESOURCE_GROUP>/providers/Microsoft.Network/virtualNetworks/<AKS_VNET_NAME>/subnets/amlfs-subnet
4. Create the filesystem:
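A hedged sketch of the create call. The names, region, SKU, capacity, zone, and maintenance window are example values, and the flag spellings are assumptions; check them against `az amlfs create --help`.

```shell
# Placeholder values (assumptions): replace all of these.
AMLFS_RG="my-amlfs-rg"
AMLFS_NAME="my-amlfs"
LOCATION="eastus"
SUBNET_ID="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<VNET_RESOURCE_GROUP>/providers/Microsoft.Network/virtualNetworks/<AKS_VNET_NAME>/subnets/amlfs-subnet"

# Start provisioning and return immediately.
az amlfs create \
  --name "$AMLFS_NAME" \
  --resource-group "$AMLFS_RG" \
  --location "$LOCATION" \
  --sku AMLFS-Durable-Premium-250 \
  --storage-capacity 16 \
  --zones "[1]" \
  --maintenance-window "{dayOfWeek:Friday,timeOfDayUtc:'22:00'}" \
  --filesystem-subnet "$SUBNET_ID" \
  --no-wait

# Poll the provisioning state until it reports success.
az amlfs show --name "$AMLFS_NAME" --resource-group "$AMLFS_RG" \
  --query provisioningState --output tsv
```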
This takes 10–20 minutes. Use --no-wait to return immediately and poll with az amlfs show.
Available SKUs:
5. Get the MGS IP address:
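One way to read the address from the CLI, assuming the value is exposed as `clientInfo.mgsAddress` in the `az amlfs show` output (filesystem names are placeholders):

```shell
# Placeholder values (assumptions): replace with your filesystem's resource group and name.
AMLFS_RG="my-amlfs-rg"
AMLFS_NAME="my-amlfs"

# Print only the MGS address.
az amlfs show --name "$AMLFS_NAME" --resource-group "$AMLFS_RG" \
  --query clientInfo.mgsAddress --output tsv
```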
Use the mgsAddress value in the StorageClass below. Alternatively, find it in the Azure portal under your filesystem’s Client connection pane.
StorageClass:
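A minimal sketch that writes a static-provisioning StorageClass manifest to a file. The provisioner name, the `mgs-ip-address` and `fs-name` parameters, and the mount options follow the upstream driver's examples as best recalled and should be treated as assumptions; the class name, file name, and IP address are hypothetical.

```shell
# Placeholder (assumption): the mgsAddress reported for your filesystem.
MGS_IP="10.0.2.4"

# Write a StorageClass manifest that points at the existing filesystem.
cat > amlfs-static-sc.yaml <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: amlfs-static
provisioner: azurelustre.csi.azure.com
parameters:
  mgs-ip-address: "${MGS_IP}"
  fs-name: "lustrefs"
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - noatime
  - flock
EOF
```

Apply it with `kubectl apply -f amlfs-static-sc.yaml` once the values have been checked.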
PersistentVolumeClaim:
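A matching claim could be sketched as follows. It assumes a StorageClass named `amlfs-static` exists; the claim name, file name, and requested size are hypothetical.

```shell
# Write a PVC manifest bound to the (assumed) amlfs-static StorageClass.
cat > amlfs-static-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: amlfs-static-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: amlfs-static
  resources:
    requests:
      storage: 16Ti
EOF
```

Apply it with `kubectl apply -f amlfs-static-pvc.yaml`, then check binding with `kubectl get pvc amlfs-static-pvc`. ReadWriteMany is chosen here because a Lustre filesystem is typically shared by many pods at once.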
Option B: Dynamic provisioning (auto-create AMLFS filesystem)
Requires driver v0.3.0 or later. The driver creates an AMLFS filesystem automatically when the PVC is created; provisioning takes 10 minutes or more.
Additional IAM permissions required on the kubelet managed identity (grant before creating the PVC):
Alternatively, assign the broader roles: Reader at subscription scope, Contributor at the target resource group, and Network Contributor at the VNet scope.
Available SKUs:
StorageClass:
PersistentVolumeClaim:
Troubleshooting
Pod stuck in ContainerCreating
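A generic first pass at diagnosis, using standard kubectl commands. The pod name and namespace are hypothetical placeholders.

```shell
# Placeholder values (assumptions): the stuck pod and its namespace.
POD_NAME="my-dynamo-pod"
NAMESPACE="default"

# The Events section at the end usually names the failing mount or attach step.
kubectl describe pod "$POD_NAME" --namespace "$NAMESPACE"

# Recent events in the namespace, oldest first.
kubectl get events --namespace "$NAMESPACE" --sort-by=.lastTimestamp
```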
PVC stuck in Pending (dynamic provisioning)
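A sketch for inspecting a pending claim. The claim name and namespace are placeholders, and the controller Deployment name `csi-azurelustre-controller` is an assumption; list the Deployments in kube-system to confirm it.

```shell
# Placeholder values (assumptions): the pending claim and its namespace.
PVC_NAME="my-amlfs-pvc"
NAMESPACE="default"

# Provisioning errors are reported as events on the claim.
kubectl describe pvc "$PVC_NAME" --namespace "$NAMESPACE"

# Recent controller logs (Deployment name is assumed).
kubectl logs --namespace kube-system deployment/csi-azurelustre-controller \
  --all-containers --tail=100
```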
Node cannot mount — verify Ubuntu OS SKU:
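Two ways to check the node OS, sketched with placeholder cluster names:

```shell
# Placeholder values (assumptions): replace with your resource group and cluster name.
AKS_RG="my-aks-rg"
AKS_NAME="my-aks"

# OS image reported by each node; supported nodes report an Ubuntu image.
kubectl get nodes \
  --output custom-columns='NAME:.metadata.name,OS_IMAGE:.status.nodeInfo.osImage'

# OS SKU configured on each node pool.
az aks nodepool list --resource-group "$AKS_RG" --cluster-name "$AKS_NAME" \
  --query "[].{name:name, osSku:osSku}" --output table
```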