Deploy Kubernetes#

Kubernetes Node Setup#

This section outlines the configuration steps required on the BCM head node to provision the three Kubernetes control plane nodes.

First, create the software image, define the category for the Kubernetes nodes, and assign the disk layout.

Add the kernel modules needed for bonding and the Mellanox (mlx5_core) driver.

cmsh
[bcm10-headnode1]% softwareimage
[bcm10-headnode1->softwareimage]% clone default-image k8s-control-plane-image
[bcm10-headnode1->softwareimage*[k8s-control-plane-image*]]% commit
[bcm10-headnode1->softwareimage[k8s-control-plane-image]]% kernelmodules
[bcm10-headnode1->softwareimage[k8s-control-plane-image]->kernelmodules]% add mlx5_core
[bcm10-headnode1->softwareimage[k8s-control-plane-image]->kernelmodules]% add bonding
[bcm10-headnode1->softwareimage*[k8s-control-plane-image*]->kernelmodules*[mlx5_core*]]% commit
[bcm10-headnode1->softwareimage[k8s-control-plane-image]]% category
[bcm10-headnode1->category]% clone default k8s-control-plane
[bcm10-headnode1->category*[k8s-control-plane*]]% set softwareimage k8s-control-plane-image
[bcm10-headnode1->category*[k8s-control-plane*]]% set disksetup /cm/local/apps/cmd/etc/htdocs/disk-setup/x86_64-slave-one-big-partition-ext4.xml
[bcm10-headnode1->category*[k8s-control-plane*]]% commit
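
Optionally, the new image and category can be sanity-checked from the head node shell before any nodes are built; a minimal sketch using cmsh's one-liner (-c) mode:

# Confirm the cloned image carries the bonding and mlx5_core modules
cmsh -c "softwareimage; use k8s-control-plane-image; kernelmodules; list"
# Confirm the category points at the new image and disk layout
cmsh -c "category; use k8s-control-plane; show" | grep -i -e software -e disk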

Create the first Kubernetes node, knode-01, by cloning the default node01 and moving it to the k8s-control-plane category.

cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]% clone node01 knode-01
[bcm10-headnode1->device*[knode-01*]]% set category k8s-control-plane
[bcm10-headnode1->device*[knode-01*]]% commit

Add and configure the IPMI (BMC) interface and the bonded managementnet (internalnet) interface.

Note: The interface names will vary depending on the hardware vendor of the node appliance.

cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]% use knode-01
[bcm10-headnode1->device[knode-01]]% interfaces
[bcm10-headnode1->device[knode-01]->interfaces]% add bmc ipmi0 10.160.6.4 ipminet
[bcm10-headnode1->device*[knode-01*]->interfaces*[ipmi0*]]% commit
[bcm10-headnode1->device[knode-01]->interfaces]% add physical ens2f1np1; add physical ens1f1np1
[bcm10-headnode1->device*[knode-01*]->interfaces*[ens1f1np1*]]% commit
[bcm10-headnode1->device[knode-01]->interfaces]% add bond bond0 10.184.94.4 managementnet
[bcm10-headnode1->device*[knode-01*]->interfaces*[bond0*]]% append interfaces ens2f1np1 ens1f1np1
[bcm10-headnode1->device*[knode-01*]->interfaces*[bond0*]]% remove bootif
[bcm10-headnode1->device*[knode-01*]->interfaces*[bond0*]]% ..
[bcm10-headnode1->device*[knode-01*]->interfaces*]% ..
[bcm10-headnode1->device*[knode-01*]]% set provisioninginterface bond0
[bcm10-headnode1->device*[knode-01*]]% commit
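
Before cloning, the interface layout can be double-checked from the head node shell; for example:

# Confirm bond0 exists and is set as the provisioning interface
cmsh -c "device; use knode-01; interfaces; list"
cmsh -c "device; use knode-01; get provisioninginterface"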

Clone knode-01 to create the two additional knodes.

[bcm10-headnode1]% device
[bcm10-headnode1->device]% foreach --clone knode-01 -n knode-02..knode-03 --next-ip ()
[bcm10-headnode1->device*]% commit
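
With --next-ip, each clone should receive the next free IP address on the configured networks. A quick check that all three nodes now exist in the category:

# List all nodes in the category with their assigned IPs
cmsh -c "device; list -c k8s-control-plane"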

Set the MAC addresses for each of the knodes so that they can PXE boot from BCM. Refer to the site survey for details.

cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]% use knode-01
[bcm10-headnode1->device[knode-01]]% interfaces
[bcm10-headnode1->device[knode-01]->interfaces]% set ens2f1np1 mac 00:CC:EE:FF:77:88
[bcm10-headnode1->device*[knode-01*]->interfaces*]% set ens1f1np1 mac 00:CC:EE:FF:77:99
[bcm10-headnode1->device*[knode-01*]->interfaces*]% exit
[bcm10-headnode1->device*[knode-01*]]% set mac 00:CC:EE:FF:77:88
[bcm10-headnode1->device*[knode-01*]]% commit

Repeat these steps for all three knodes so that BCM has the interface-to-MAC mapping needed for PXE boot and provisioning.
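
If you prefer to script this, the same assignments can be driven with cmsh one-liners from the head node; a minimal sketch for knode-02, using hypothetical MAC addresses that must be replaced with the values from your site survey:

# Hypothetical MACs for illustration only -- use the site survey values
cmsh -c "device; use knode-02; interfaces; \
set ens2f1np1 mac 00:CC:EE:FF:88:88; set ens1f1np1 mac 00:CC:EE:FF:88:99; \
exit; set mac 00:CC:EE:FF:88:88; commit"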

Power the nodes on; BCM will provision the Kubernetes nodes via PXE boot.

cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]% power on -c k8s-control-plane
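
Provisioning can take several minutes. While waiting, the power state can be polled from the same cmsh device mode (the -c category filter should work for status just as it does for power on):

# Still in cmsh device mode
power status -c k8s-control-plane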

Ensure all the nodes are up:

[bcm10-headnode1->device]% list -c k8s-control-plane
Type          Hostname (key)  MAC                Category           IP           Network        Status
------------  --------------  -----------------  -----------------  -----------  -------------  ------
PhysicalNode  knode-01        9E:1D:D5:41:E4:96  k8s-control-plane  10.184.94.4  managementnet  [ UP ]
PhysicalNode  knode-02        52:FD:DB:48:5C:4D  k8s-control-plane  10.184.94.5  managementnet  [ UP ]
PhysicalNode  knode-03        C2:6F:59:F9:75:6C  k8s-control-plane  10.184.94.6  managementnet  [ UP ]
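
Optionally, confirm each node is reachable over the management network before proceeding (this assumes passwordless root SSH to the nodes, which BCM configures by default):

# Quick reachability check from the head node
for n in knode-01 knode-02 knode-03; do ssh "$n" hostname; done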

Kubernetes Deployment#

This section addresses configuration steps to be performed on the BCM head node.

In the root shell, run the Kubernetes setup script.

cm-kubernetes-setup

Select Deploy to continue.

Select the newest version certified for NVIDIA AI Enterprise (i.e., the version marked with an asterisk).

(Optional) Enter a private registry server here if required.

Keep the default settings for the cluster.

Select yes to allow the cluster to be used from the head node.

Select managementnet (internalnet) for Kubernetes networking.

Select the three knodes we just provisioned for the control plane.

Select the dgx-h100 category for Kubernetes workers.

Skip selecting individual worker nodes.

Select the three Kubernetes nodes to be the etcd nodes.

Click “OK” in the symlink dialog.

Accept the default values for the main Kubernetes components.

Select the Calico CNI plugin.

Select no to skip installing the Kyverno Policy Engine.

(Optional) If an NVAIE license has been provided, select yes and enter the details on the following page. Otherwise, select no.

Select the following operators to install:

  • NVIDIA GPU Operator

  • cm-jupyter-kernel-operator

  • cm-kubernetes-mpi-operator

  • Network Operator

  • Kubeflow Training Operator

  • Prometheus Adapter

  • Prometheus Operator Stack

  • (Optional) MetalLB - if you plan to expose services directly from the DGX nodes, such as NIM/inference workloads

Select the latest GPU Operator version certified for NVIDIA AI Enterprise.

Select the latest Network Operator version certified for NVIDIA AI Enterprise.

Leave the custom YAML file blank for the GPU Operator.

Enable CDI (Container Device Interface) and NFD (Node Feature Discovery).

Leave the Network Operator custom YAML configuration file name blank.

Select the following configuration options for the Network Operator:

  • NFD - Node Feature discovery

  • SRIOV - SRIOV Network Operator

  • CR - Deploy NIC Cluster Policy CR

  • IPoIB - Enable IP over Infiniband CNI

  • All secondaryNetwork components - to deploy RDMA interfaces to the containers

If MetalLB is enabled, define the MetalLB IP pool by allocating a free IP range from the internal network (or as appropriate for your site). Refer to the MetalLB configuration guide for further details.
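
Once the deployment finishes, the pool can be verified with kubectl; a minimal sketch, noting that the IPAddressPool resource is the MetalLB default CRD and may differ with the MetalLB version deployed:

# List the configured MetalLB address pools across all namespaces
kubectl get ipaddresspools.metallb.io -A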

Deploy all the addons.

Select yes for the default Ingress port.

Leave the ports as default.

Choose yes to install the Permission Manager.

Select both enabled and default for the local storage path.

Leave the storage path as default.

Choose to save the configuration and deploy.

Keep the default file path for the config file and continue.

Note: This file contains the configuration for the Kubernetes cluster.
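
If the cluster ever needs to be redeployed, the saved file can be fed back to the setup tool to skip the interactive wizard; a sketch, where the path is an assumption and should match the location shown in the dialog:

# Re-run the deployment non-interactively from the saved configuration
cm-kubernetes-setup -c /root/cm-kubernetes-setup.conf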

BCM will now deploy Kubernetes to the nodes. Once the setup has completed, verify that all the nodes are online.

root@bcm10-headnode1:~# kubectl get node -o wide
NAME       STATUS   ROLES                  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
dgx-01     Ready    worker                 14d   v1.30.9   10.184.94.11   <none>        Ubuntu 22.04.4 LTS   5.15.0-1063-nvidia   containerd://1.7.21
dgx-02     Ready    worker                 14d   v1.30.9   10.184.94.12   <none>        Ubuntu 22.04.4 LTS   5.15.0-1063-nvidia   containerd://1.7.21
dgx-03     Ready    worker                 14d   v1.30.9   10.184.94.13   <none>        Ubuntu 22.04.4 LTS   5.15.0-1063-nvidia   containerd://1.7.21
dgx-04     Ready    worker                 14d   v1.30.9   10.184.94.14   <none>        Ubuntu 22.04.4 LTS   5.15.0-1063-nvidia   containerd://1.7.21
knode-01   Ready    control-plane,master   14d   v1.30.9   10.184.94.4    <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.21
knode-02   Ready    control-plane,master   14d   v1.30.9   10.184.94.5    <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.21
knode-03   Ready    control-plane,master   14d   v1.30.9   10.184.94.6    <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.21

Validate the Kubernetes cluster by checking that the pods are in the “Running” state and ensuring that both the GPU operator and network operator pods are active.

root@bcm10-headnode1:~# kubectl get pods -A | grep "network\|gpu"
gpu-operator       gpu-feature-discovery-7l4bz   1/1   Running   0   13h
gpu-operator       gpu-feature-discovery-bpzzq   1/1   Running   0   13h
– Output removed for brevity –
network-operator   cni-plugins-ds-gx97k          1/1   Running   0   13h
network-operator   cni-plugins-ds-hw6sr          1/1   Running   0   13h
network-operator   kube-multus-ds-kkbwz          1/1   Running   0   13h
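
As an optional, deeper sanity check, a throwaway CUDA pod can confirm that GPUs are actually schedulable; a minimal sketch, where the sample image tag is an assumption and should be replaced with a CUDA sample image available to your cluster:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    # Example image tag; substitute one reachable from your registry
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
kubectl logs -f cuda-vectoradd     # should end with "Test PASSED"
kubectl delete pod cuda-vectoradd  # clean up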

Add a Kubernetes user#

Add a new user named ‘k8suser’ in BCM.

root@bcm10-headnode1:~# cmsh
[bcm10-headnode1]% user; list
Name (key)  ID (key)  Primary group  Secondary groups
----------  --------  -------------  ----------------
cmsupport   1000      cmsupport
[bcm10-headnode1]% user
[bcm10-headnode1->user]% add k8suser
[bcm10-headnode1->user*[k8suser*]]% set password bcm123
[bcm10-headnode1->user*[k8suser*]]% commit

Add the new user to Kubernetes:

cm-kubernetes-setup --add-user k8suser

Switch to the new user:

su - k8suser
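
As a final check, the new user should be able to query the cluster; a sketch, noting that the exact module name depends on the installed BCM/Kubernetes version:

# Module name may differ per BCM/Kubernetes version
module load kubernetes
kubectl get nodes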