Kubernetes Updates#

Prerequisites#

Kubernetes upgrades must be performed one minor version at a time, so the upgrade process may require multiple control plane and worker node upgrade passes. For example, an upgrade from 1.28 to 1.31 follows the path 1.28 > 1.29 > 1.30 > 1.31.

Upgrade the control plane nodes first, followed by the worker nodes. Worker node upgrades can take advantage of the N-2 kubelet version skew policy, so the worker nodes only need to be upgraded once for every two control plane upgrades.
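
For example, the control plane version and the kubelet version running on each node can be compared to confirm how much skew remains before the workers must be upgraded:

    # API server / control plane version
    kubectl version
    
    # Kubelet version currently running on each node
    kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion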

Kubernetes operators can be updated at any time, as long as the operator versions are compatible with the running Kubernetes version. Service interruption is expected during these updates. Run:ai updates can also occur at any time, provided the Kubernetes version prerequisites are met.

  1. Validate the status of the Kubernetes cluster.

    Note

    This example demonstrates a lab cluster with a single etcd master node. The expected result is that all master nodes are in sync and that there are no warnings in the output.

    # Load Kubernetes module which allows etcdctl commands to run
    module load kubernetes
    
    # Validate control plane nodes are online
    kubectl get node
    
    # Check the databases are in-sync and there are no warnings
    etcdctl --write-out=table endpoint status
    etcdctl --write-out=table endpoint health
    
  2. Obtain the latest etcd management scripts (detailed steps can be found in the BCM KB) and take a backup of the control plane leader (a quick check of the backup location is shown after this list).

    # BCM 10+
    # Obtain the scripts
    wget -O /cm/local/apps/cmd/scripts/cm-etcd-manage https://support2.brightcomputing.com/etcd/cm-etcd-manage
    # The download URL for cm-kubeadm-manage is listed in the same BCM KB article
    wget -O /cm/local/apps/cmd/scripts/cm-kubeadm-manage <CM-KUBEADM-MANAGE-URL>
    
    # Change the scripts to be executable
    chmod +x /cm/local/apps/cmd/scripts/cm-etcd-manage
    chmod +x /cm/local/apps/cmd/scripts/cm-kubeadm-manage
    
    # Create alias for the scripts
    alias cm-etcd-manage='/cm/local/apps/cmd/scripts/cm-etcd-manage'
    alias cm-kubeadm-manage='/cm/local/apps/cmd/scripts/cm-kubeadm-manage'
    
    # Back up the control plane leader
    cm-etcd-manage --kube-cluster default backup /cm/shared/etcd-backups
    
  3. Validate the current version of Kubernetes.

    kubectl version
    
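Before moving on to the control plane updates, it can be helpful to confirm that the etcd backup taken in step 2 was written to the shared location used above:

    ls -lh /cm/shared/etcd-backups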

Control Plane Node Updates#

  1. Validate which repository is currently configured.

    # From BCM headnode
    cat /cm/images/<k8s-image>/etc/apt/sources.list.d/kubernetes.list
    
  2. Modify the Kubernetes repository to the new version and validate that it was changed correctly. In this case, the version is being migrated from 1.28 to 1.29.

    # From BCM headnode
    sed -i 's/1\.28/1.29/g' /cm/images/<k8s-image>/etc/apt/sources.list.d/kubernetes.list
    
    # Validate the change
    cat /cm/images/<k8s-image>/etc/apt/sources.list.d/kubernetes.list
    
  3. Update the control plane node image to the Kubernetes version specified in the previous step.

    cm-chroot-sw-img /cm/images/<k8s-image>/
    
    # Refresh the package cache; the madison option lists available versions and their origin repos
    # The expected result is that the new Kubernetes version is displayed
    apt update; apt-cache madison kubeadm
    
    # By default BCM puts the Kubernetes packages on hold to prevent unattended upgrades
    apt-mark unhold kubeadm kubelet kubectl
    
    apt update && apt install -y kubeadm kubelet kubectl
    
    # Put the Kubernetes packages back on hold
    apt-mark hold kubeadm kubelet kubectl
    
    exit
    
  4. Update the first control plane node.

    export host=knode001
    cmsh -c "device use $host; imageupdate -w --wait"
    
    ### Verify that Kubeadm has been updated to the desired version
    ssh $host kubeadm version
    
    ### Review the suggested upgrade plan
    ssh $host kubeadm upgrade plan
    
    ### Perform the actual upgrade
    ### This command should only be run once, on the first control plane node to be updated
    ### Do not run it more than once
    ssh $host kubeadm upgrade apply <VERSION>
    
    ### Drain the node, restart kubelet with the new binaries, then uncordon
    kubectl drain $host --ignore-daemonsets
    ssh $host sudo systemctl daemon-reload
    ssh $host sudo systemctl restart kubelet
    kubectl uncordon $host
    
    ### Verify that the control plane node has been updated
    kubectl get node
    
    ### Verify that there are no failing pods
    kubectl get po -A | grep -Ev 'Runn|Comp'
    
  5. Update the subsequent control plane nodes.

    export host=knode002
    cmsh -c "device use $host; imageupdate -w --wait"
    
    ### Verify that Kubeadm has been updated to the desired version
    ssh $host kubeadm version
    
    ### Perform the actual upgrade
    ### On the remaining control plane nodes, use "kubeadm upgrade node" rather than "upgrade apply"
    ssh $host kubeadm upgrade node
    
    ### Drain the node, restart kubelet with the new binaries, then uncordon
    kubectl drain $host --ignore-daemonsets
    ssh $host sudo systemctl daemon-reload
    ssh $host sudo systemctl restart kubelet
    kubectl uncordon $host
    
    ### Verify that the control plane node has been updated
    kubectl get node
    
    ### Verify that there are no failing pods
    kubectl get po -A | grep -Ev 'Runn|Comp'
    
  6. At this point, the control plane is running the Kubernetes version specified in step 2. Repeat the entire process for each subsequent minor version, as shown below. Keep in mind that the worker nodes must remain within two minor releases of the control plane (N-2), so it may be necessary to update the worker nodes before performing any additional control plane updates.
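
For example, to stage the next hop from 1.29 to 1.30, repeat the repository change from step 2 with the next minor version before rerunning steps 3 through 5:

    # From BCM headnode
    sed -i 's/1\.29/1.30/g' /cm/images/<k8s-image>/etc/apt/sources.list.d/kubernetes.list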

Worker Node Updates#

The workflow is similar to the one used for the secondary control plane nodes in the previous section.

  1. Update the software image(s) used by the Kubernetes workers to the new version of Kubernetes.

    # From BCM headnode
    sed -i 's/1\.28/1.29/g' /cm/images/<worker-image>/etc/apt/sources.list.d/kubernetes.list
    cm-chroot-sw-img /cm/images/<worker-image>/
    # Refresh the package cache; the madison option lists available versions and their origin repos
    # The expected result is that the new Kubernetes version is displayed
    apt update; apt-cache madison kubeadm
    
    # By default BCM puts the Kubernetes packages on hold to prevent unattended upgrades
    apt-mark unhold kubeadm kubelet kubectl
    
    apt update && apt install -y kubeadm kubelet kubectl
    
    # Put the Kubernetes packages back on hold
    apt-mark hold kubeadm kubelet kubectl
    
    exit
    
  2. Update the worker nodes in batches of up to 16 to honor the Pod Disruption Budget requirements (see the PDB check after this list).

    export batch1='dgx-[01-15]'
    cmsh -c "device imageupdate -n $batch1 -w --wait"
    
    ### Verify that Kubeadm has been updated to the desired version
    pdsh -w $batch1 kubeadm version | dshbak -c
    
    ### Perform the actual upgrade
    pdsh -w $batch1 kubeadm upgrade node
    
    ### Drain each node, restart kubelet with the new binaries, then uncordon
    for host in dgx-{01..15}; do kubectl drain $host --ignore-daemonsets; done
    
    ### Restart kubelet in parallel
    pdsh -w $batch1 systemctl daemon-reload
    pdsh -w $batch1 systemctl restart kubelet
    
    ### Uncordon each drained node
    for host in dgx-{01..15}; do kubectl uncordon $host; done
    
    ### Repeat the above steps for the remaining DGX nodes
    
  3. Repeat the previous steps until all worker nodes, and any other worker software images, have been updated.
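
To review which Pod Disruption Budgets constrain how many pods can be evicted at once when sizing each drain batch (referenced in step 2), run:

    kubectl get pdb -A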

Operator Updates#

Kubernetes operators are deployed via BCM with Helm charts. NVIDIA recommends that administrators upgrade Kubernetes operators after all control plane and worker nodes have been upgraded to the desired version of Kubernetes. Operator updates are applied system-wide by deleting the existing pods and recreating them at the new version, so outages are expected during this time.

Prerequisites#

  1. Validate the current versions of the operators installed on the cluster and update the Helm repositories.

    helm list -A
    helm repo update
    
  2. Capture the customized values supplied to all of the installed Helm charts.

    mkdir values
    cd values
    helm list -A --output json | jq -r '.[] | "\(.namespace) \(.name) \(.chart)"' |
      while read namespace name chart; do
        version="${chart##*-}"
        filename="${name}_${version}.yaml"
        helm get values -n "$namespace" "$name" > "$filename"
      done
    

GPU Operator#

Follow the latest instructions in the documentation for the target GPU Operator version.

helm upgrade gpu-operator nvidia/gpu-operator -n gpu-operator --disable-openapi-validation --version=<NEW_VERSION> -f values/gpu-operator_<OLD_VERSION>.yaml
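
After the upgrade, a quick check that the operator pods redeploy cleanly and the ClusterPolicy reconciles (the namespace and resource names assume the default GPU Operator deployment shown above):

kubectl get pods -n gpu-operator
kubectl get clusterpolicies.nvidia.com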

Network Operator#

Follow the latest instructions in the documentation for the target Network Operator version.

# Validate the NicClusterPolicy status returns ready
kubectl get -n network-operator nicclusterpolicies.mellanox.com nic-cluster-policy -o yaml

# Deactivate the SR-IOV pods
kubectl -n network-operator scale deployment network-operator-sriov-network-operator --replicas=0

# Capture the existing SR-IOV node policies
kubectl get sriovnetworknodepolicies.sriovnetwork.openshift.io -n network-operator -o yaml > sriovnetworknodepolicies.yaml

# Delete the SR-IOV node policies
kubectl -n network-operator delete sriovnetworknodepolicies.sriovnetwork.openshift.io --all

# Download Helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/charts/network-operator-24.7.0.tgz

# Extract the chart
ls network-operator-*.tgz | xargs -n 1 tar xf

# Deploy the CRDs
kubectl apply -f network-operator/crds -f network-operator/charts/sriov-network-operator/crds

# Update to the desired version
helm upgrade -n network-operator network-operator nvidia/network-operator --version=<VERSION> -f values/network-operator_<OLD_VERSION>.yaml --force

# Add back the node policies
kubectl apply -f sriovnetworknodepolicies.yaml
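
Once the node policies are restored, confirm that the operator pods are running and that the NicClusterPolicy reports ready again:

kubectl get pods -n network-operator
kubectl get -n network-operator nicclusterpolicies.mellanox.com nic-cluster-policy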

MetalLB#

Follow the latest instructions in the documentation for the target MetalLB version.

helm upgrade metallb metallb/metallb -n metallb-system --version=<NEW-VERSION> -f values/metallb_<OLD_VERSION>.yaml
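
A quick post-upgrade check that the MetalLB controller and speaker pods return to a running state:

kubectl get pods -n metallb-system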

Prometheus Stack#

Follow the latest instructions in the documentation for the target Prometheus stack version. The upstream guidance requires stepping through each intermediate chart version, and most steps require updating the CRDs to a supported version before upgrading the Helm chart. A sample update is outlined below.

# Update CRDs
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_prometheusagents.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_scrapeconfigs.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/<VERSION>/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml

# Update helm
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n prometheus --version=<NEW-VERSION>  -f kube-prometheus-stack_<OLD-VERSION>.yaml
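
After the chart upgrade, a quick check that the CRDs were applied and the stack pods are healthy:

kubectl get crd | grep monitoring.coreos.com
kubectl get pods -n prometheus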

Kube State Metrics#

Follow the latest instructions in the documentation for the target Kube State Metrics version. Unless the system was originally installed at a version older than v2, it is safe to upgrade directly with Helm.

helm upgrade kube-state-metrics prometheus-community/kube-state-metrics -n kube-system --version=<VERSION> -f kube-state-metrics_<OLD-VERSION>.yaml

Kubeflow MPI#

Follow the latest instructions in the documentation for the target Kubeflow MPI Operator version.

# Uninstall the v0.4.0 version of the MPI Operator
kubectl delete -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.4.0/deploy/v2beta1/mpi-operator.yaml

# Uninstall the v1.5.0 version of the training operator
kubectl delete -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.5.0"

# Install the v1.7.0 version of the training operator
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.7.0"

# Delete the old MPIJob CRD (the name below assumes the standard MPI Operator CRD)
kubectl delete crd mpijobs.kubeflow.org

# Apply the MPI Operator v0.6.0 (v2beta1) manifest
kubectl create -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.6.0/deploy/v2beta1/mpi-operator.yaml
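
To confirm the new operator is running (the namespace and CRD name below assume the default MPI Operator manifest):

kubectl get pods -n mpi-operator
kubectl get crd mpijobs.kubeflow.org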

Calico#

Calico was deployed with BCM, so its version must be changed through BCM (cmsh).

  1. Verify the current version of the Calico server on the BCM headnode.

    calicoctl version
    
    # Sample output
    Client Version:    v3.26.3
    Git commit:        bdb7878af
    Cluster Version:   v3.29.2
    Cluster Type:      typha,kdd,k8s,bgp,kubeadm
    
  2. Take a backup of the current Calico configuration.

    cmsh
    
    [demeter-headnode-01]% kubernetes
    [demeter-headnode-01->kubernetes[default]]% appgroups
    [demeter-headnode-01->kubernetes[default]->appgroups]% use system
    [demeter-headnode-01->kubernetes[default]->appgroups[system]]% applications
    [demeter-headnode-01->kubernetes[default]->appgroups[system]->applications]% use calico
    [demeter-headnode-01->kubernetes[default]->appgroups[system]->applications[calico]]% get config > /root/calico_<OLD-VERSION>.yaml
    
  3. Modify the configuration file to the desired version of Calico by replacing the versions in the file.

    # Enter file edit mode.
    # VI commands can be used to replace the current version with the desired version
    [demeter-headnode-01->kubernetes[default]->appgroups[system]->applications[calico]]% set config
    
    # Replace the current version with the desired version.
    # In the example, v3.29.2 will be replaced with v3.29.4 throughout the entire file.
    :%s/v3.29.2/v3.29.4/g
    # Write to file and exit
    :wq
    
    # Commit the changes
    [demeter-headnode-01->kubernetes*[default*]->appgroups*[system*]->applications[calico]]% commit
    
  4. Validate that the pods were redeployed and that the Calico server version is at the expected release.

    kubectl get pods -n kube-system | grep calico
    
    # Validate calico server version
    calicoctl version
    

Run:ai Updates#

Validate the suggested update path with the Run:ai SA. Below are the general steps to update a Run:ai workload-managed system; the full procedure can be found on the Run:ai website. Before starting the upgrade process, validate that the GUI is accessible with an administrator account.

  1. Capture the customized values supplied to all of the installed Helm charts.

    mkdir values
    cd values
    helm list -A --output json | jq -r '.[] | "\(.namespace) \(.name) \(.chart)"' |
      while read namespace name chart; do
        version="${chart##*-}"
        filename="${name}_${version}.yaml"
        helm get values -n "$namespace" "$name" > "$filename"
      done
  2. Update the Run:ai control plane.

    helm repo update
    
    helm upgrade runai-backend -n runai-backend runai-backend/control-plane --version "<NEW-VERSION>" -f values/runai_control_plane_<OLD-VERSION>.yaml --reset-then-reuse-values
    
    # Validate all the pods in the runai-backend namespace are up and running
    kubectl get pods -n runai-backend
  3. Update the Run:ai cluster.

    1. Log into the Web GUI as an administrator.

    2. Select Cluster on the left-hand side, and select the existing cluster.

    3. On the new screen, select the Run:ai version that matches the previously deployed control plane version, set "Cluster Location" to "same as control plane," and then click CONTINUE.

    4. The next screen displays the commands necessary to update the cluster from the BCM headnode.

      ### The control plane URL, client secret, cluster UID, and version values below are examples;
      ### copy the exact command generated by the GUI for your cluster.
      helm repo add runai https://runai.jfrog.io/artifactory/api/helm/op-charts-prod
      helm repo update
      helm upgrade -i runai-cluster runai/runai-cluster -n runai \
        --set controlPlane.url=<CONTROL-PLANE-FQDN> \
        --set controlPlane.clientSecret=EZQUsaZ6L9Oyq0l6v3tW8QnYMX6LAkef \
        --set cluster.uid=438122b4-a392-41aa-b190-970ae72ab252 \
        --set cluster.url=<CONTROL-PLANE-FQDN> \
        --version=2.19.x --create-namespace
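
      ### After the upgrade completes, validate that the Run:ai cluster pods are running
      ### (the runai namespace matches the helm upgrade command above)
      kubectl get pods -n runai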