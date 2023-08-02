Connect to AKS Cluster

    az account set --subscription <subscription id>
    az aks get-credentials --resource-group <resource group name> --name <aks cluster name>
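It can help to confirm that `az aks get-credentials` actually pointed kubectl at the new cluster. The function below is a minimal sketch. Its name, `check_aks_connection`, is hypothetical, and it assumes the default `kubectl get nodes` column layout (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: print the active kubectl context and fail if no node is Ready.
check_aks_connection() {
  echo "Current context: $(kubectl config current-context)"

  # Column 2 of `kubectl get nodes --no-headers` is STATUS; count nodes that are exactly "Ready".
  ready=$(kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }')
  echo "Ready nodes: ${ready}"

  # Non-zero exit status when the cluster is unreachable or no node is Ready.
  [ "${ready}" -gt 0 ]
}
```

Run `check_aks_connection` after fetching the credentials. A non-zero exit status usually means the wrong subscription or cluster was selected.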





Configuring Kubernetes Pods to Access GPU Resources

Install the NVIDIA GPU Operator with Helm:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
    helm repo update
    helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator
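The GPU Operator starts several pods in the gpu-operator namespace, and these can take a few minutes to settle. The hypothetical helper below lists the pods whose STATUS is neither Running nor Completed. It is a sketch that assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print GPU Operator pods that are not yet Running or Completed.
gpu_operator_pending_pods() {
  # Column 1 is the pod name and column 3 is STATUS in `kubectl get pods --no-headers`.
  kubectl get pods -n gpu-operator --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

For example, `until [ -z "$(gpu_operator_pending_pods)" ]; do sleep 10; done` waits until no pod is pending. The loop also exits immediately if kubectl returns no pods at all, so check the namespace first.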

Once the daemon set is running on the GPU-powered worker nodes, use the following command to verify that each node has allocatable GPUs.

    kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

The output lists every node in the cluster together with the number of allocatable GPUs on that node. Nodes without GPUs show <none>.
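On larger clusters, a single total is easier to check than the per-node list. The sketch below reuses the same custom-columns query and adds up the numeric GPU values, skipping `<none>`. The function name `count_allocatable_gpus` is hypothetical.

```shell
# Hypothetical helper: sum the allocatable nvidia.com/gpu count across all nodes.
count_allocatable_gpus() {
  # Same query as above, without the header row. Nodes without GPUs report <none>,
  # so only purely numeric values in column 2 are added to the total.
  kubectl get nodes --no-headers \
    "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" |
    awk '$2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
}
```

Compare the result with the number of GPUs you expect from your GPU node pool's VM size.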

AKS, the NFS server, and the DSVM need to be on the same VNet. You can create a VNet first and select it when creating each resource. In my case, I create AKS first, then find its VNet and use it when creating the NFS server and the DSVM. Finding the VNet of an AKS cluster is not straightforward, so this section explains how.

First, open portal.azure.com and go to the Virtual Machine Scale Sets page.





Find the scale set that belongs to your AKS cluster. It is in the resource group named MC_<resource group of your AKS>_<AKS name>_<location>. Click it.

The scale set's page shows the VNet name. You can also configure firewall rules from the Networking tab.
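If you prefer the Azure CLI to the portal, the same VNet can usually be found through the cluster's node resource group (the MC_ group above). The sketch below assumes the cluster uses the VNet that AKS created for it. If you supplied your own VNet at creation time, that VNet is in your own resource group and this lookup will not find it. The function name `find_aks_vnet` is hypothetical.

```shell
# Hypothetical helper: print the VNet name(s) in an AKS cluster's node resource group.
# Usage: find_aks_vnet <resource group name> <aks cluster name>
find_aks_vnet() {
  # nodeResourceGroup is the MC_<resource group>_<cluster>_<location> group that AKS manages.
  node_rg=$(az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup -o tsv) || return 1

  # List the VNets in that group; an AKS-created VNet is normally the only one there.
  az network vnet list --resource-group "$node_rg" --query "[].name" -o tsv
}
```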

Install NGINX Ingress Controller

Run the following commands on your local machine:

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm repo update
    helm install ingress-nginx ingress-nginx/ingress-nginx
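With the release name ingress-nginx, the chart creates a LoadBalancer Service named ingress-nginx-controller in the current namespace. Azure assigns it a public IP after a short delay. The hypothetical helper below polls for that IP. The default attempt count and interval are arbitrary choices for this sketch.

```shell
# Hypothetical helper: wait until the ingress controller Service has an external IP.
# Usage: wait_for_ingress_ip [attempts] [seconds between attempts]
wait_for_ingress_ip() {
  attempts=${1:-30}
  interval=${2:-10}
  while [ "$attempts" -gt 0 ]; do
    ip=$(kubectl get service ingress-nginx-controller \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "No external IP assigned yet" >&2
  return 1
}
```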





NFS Server

An NFS server is required. It can be a managed service such as Azure NFS (PaaS) or NFS installed on a VM (IaaS). You only need one of the following two options; do not set up both.

Azure NFS Server

1. Create a storage account in the Azure portal, in the same resource group and location as the AKS cluster.
2. Add the virtual network of the AKS cluster you just created to the storage account.
3. Create a file share and configure it. Do not create a private endpoint; configure a service endpoint instead.
4. Add the virtual network of the AKS cluster to the file share.

More details are available at the following link: https://learn.microsoft.com/en-us/azure/storage/files/storage-files-quick-create-use-linux
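Before connecting the share to Kubernetes, it can be worth checking that it mounts from a VM on the same VNet (for example, the DSVM). The sketch below uses the NFS 4.1 target form and mount options that Azure Files documents for Linux clients. The function name and default mount point are hypothetical, and the VM needs an NFS client installed (for example, the nfs-common package on Ubuntu).

```shell
# Hypothetical helper: mount an Azure Files NFS share to test connectivity.
# Usage: mount_azure_nfs <storage account name> <file share name> [mount point]
mount_azure_nfs() {
  account=$1
  share=$2
  mount_point=${3:-/mnt/azure-nfs-test}

  # Azure Files NFS targets have the form <account>.file.core.windows.net:/<account>/<share>.
  sudo mkdir -p "$mount_point"
  sudo mount -t nfs "${account}.file.core.windows.net:/${account}/${share}" "$mount_point" \
    -o vers=4,minorversion=1,sec=sys
}
```

If the mount hangs, the usual cause is that the VM's subnet has not been added to the storage account's virtual network settings.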

VM-Based NFS Server

If you do not set up Azure NFS, you can set up a VM-based NFS server instead. See the Bare-Metal Setup and AWS EKS Setup sections for details.

Storage Provisioner

Below is an example that installs the NFS Subdir External Provisioner against Azure NFS. nfs.server takes only the server host name or IP, and nfs.path takes the exported path. For a VM-based NFS server, set nfs.server to the VM's IP address and nfs.path to its exported path (for example, /mnt/nfs_share).

    helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
    helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
        --set nfs.server=<storage account name>.file.core.windows.net \
        --set nfs.path=/<storage account name>/<file share name>
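One way to check that the provisioner works is to create a small PersistentVolumeClaim against its StorageClass. This sketch assumes the chart's default StorageClass name, nfs-client (confirm it with `kubectl get storageclass`). The function name, default claim name, and size are hypothetical.

```shell
# Hypothetical helper: create a small test PVC on the NFS StorageClass.
# Usage: create_test_pvc [claim name]
create_test_pvc() {
  claim=${1:-nfs-test-claim}
  # nfs-client is assumed to be the provisioner's StorageClass (the chart default).
  kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${claim}
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
}
```

After `create_test_pvc`, `kubectl get pvc nfs-test-claim` should eventually report Bound. Remove the claim afterwards with `kubectl delete pvc nfs-test-claim`.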





Image Pull Secret for nvcr.io

In this example, replace ngc-api-key and ngc-email with your NGC API key and email address, and set the deployment namespace.

    kubectl create secret docker-registry 'imagepullsecret' \
        --docker-server='nvcr.io' \
        --docker-username='$oauthtoken' \
        --docker-password='ngc-api-key' \
        --docker-email='ngc-email' \
        --namespace='default'

Where: