Install Kubernetes

Introduction

Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes includes support for GPUs, along with NVIDIA enhancements, so users can easily configure and use GPU resources for accelerating AI and HPC workloads.

There are many ways to install upstream Kubernetes with NVIDIA-supported components, such as drivers, plugins, and runtimes. This document describes a few methods for getting started with Kubernetes. Use the links below to choose the option that you would like to follow:

  1. Option 1: Using DeepOps

  2. Option 2: Using Kubeadm to install Kubernetes

    • Option 2-a: Use the NVIDIA GPU Operator to automate/manage the deployment of the NVIDIA software components

    • Option 2-b: Set up the NVIDIA software components as pre-requisites before running applications

Option 1: Installing Kubernetes Using DeepOps

Use DeepOps to automate deployment, especially for a cluster of many worker nodes. DeepOps is a modular collection of Ansible scripts that automate the deployment of Kubernetes, Slurm, or a hybrid combination of the two across your nodes. It also installs the necessary GPU drivers, the NVIDIA Container Toolkit for Docker (nvidia-docker2), and various other dependencies for GPU-accelerated work. Encapsulating best practices for NVIDIA GPUs, it can be customized or run as individual components, as needed.

Use the following procedure to install Kubernetes using DeepOps:

  1. Pick a provisioning node to deploy from. This is where the DeepOps Ansible scripts run from and is often a development laptop that has a connection to the target cluster. On this provisioning node, clone the DeepOps repository with the following command:

    $ git clone https://github.com/NVIDIA/deepops.git
    
  2. Optionally, check out a recent release tag with the following command:

    $ cd deepops \
       && git checkout tags/20.10
    

    If you do not explicitly use a release tag, then the latest development code is used, and not an official release.

  3. Follow the instructions in the DeepOps Kubernetes Deployment Guide to install Kubernetes.

Option 2: Installing Kubernetes Using Kubeadm

Note

The method described in this section is an alternative to using DeepOps. If you have deployed using DeepOps, then skip this section.

For a less scripted approach, especially for smaller clusters or where there is a desire to learn the components that make up a Kubernetes cluster, use Kubeadm.

A Kubernetes cluster is composed of master nodes and worker nodes. The master nodes run the control plane components of Kubernetes, which allow your cluster to function properly. These components include the API Server (the front end that clients such as the kubectl CLI talk to), etcd (which stores the cluster state), and others.

Use CPU-only (GPU-free) master nodes to run the control plane components: the Scheduler, the API Server, and the Controller Manager. Control plane components can have some impact on your CPU-intensive tasks, and conversely, CPU- or HDD/SSD-intensive workloads can have an impact on your control plane components.

Using kubeadm, this document walks through the steps for installing a single-node Kubernetes cluster (where we untaint the control plane so it can run GPU pods), but the cluster can be scaled easily with additional nodes.
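
When you are ready to add worker nodes later, a join command generated on the control plane is typically all that is required. As a minimal sketch (the address, token, and hash below are placeholders):

# On the control plane node, print a join command with a fresh token:
$ kubeadm token create --print-join-command

# On each new worker node, run the printed command, which looks like:
$ sudo kubeadm join <control-plane-ip>:6443 --token <token> \
   --discovery-token-ca-cert-hash sha256:<hash>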

Step 0: Before You Begin

Before proceeding to install the components, check that all Kubernetes prerequisites have been satisfied. These prerequisites include:

  • Check network adapters and required ports

  • Disable swap on the nodes so that kubelet can work correctly (a quick check is sketched after this list)

  • Install a supported container runtime such as Docker, containerd or CRI-O
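
As a quick sanity check for the swap prerequisite, you can confirm that swap is off and disable it persistently if not. The sed expression below is a sketch, so review /etc/fstab before editing:

# Swap totals should read 0 once disabled.
$ free -h | grep -i swap

# Disable swap now and comment out swap entries so the change survives reboots.
$ sudo swapoff -a \
   && sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab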

Depending on your Linux distribution, refer to the steps below:

Ubuntu LTS

This section provides steps for setting up K8s on Ubuntu 18.04 and 20.04 LTS distributions.

Step 1: Install a Container Engine

NVIDIA supports running GPU containers with Docker and other CRI-compliant runtimes such as containerd or CRI-O.

To use Docker, follow the steps in this guide to install it. To use containerd instead, follow the steps below.

First, install some prerequisites for containerd:

$ sudo apt-get update \
   && sudo apt-get install -y apt-transport-https \
      ca-certificates curl software-properties-common

The overlay and br_netfilter kernel modules need to be loaded:

$ cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
$ sudo modprobe overlay \
   && sudo modprobe br_netfilter
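
To confirm that both modules are now loaded:

$ lsmod | grep -e overlay -e br_netfilter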

Set up the required sysctl parameters and make them persistent:

$ cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
$ sudo sysctl --system

Now set up the Docker repository, which hosts the containerd.io package:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

Install containerd:

$ sudo apt-get update \
   && sudo apt-get install -y containerd.io

Create a default config.toml:

$ sudo mkdir -p /etc/containerd \
   && sudo containerd config default | sudo tee /etc/containerd/config.toml

Configure containerd to use the systemd cgroup driver with runc by editing /etc/containerd/config.toml and adding this line under the runc options section:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
   SystemdCgroup = true
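
If your generated config.toml already contains a SystemdCgroup = false entry under that section (recent containerd releases emit one), the same edit can be made non-interactively; this one-liner is a sketch under that assumption:

$ sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml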

Now restart the daemon:

$ sudo systemctl restart containerd

Step 2: Install Kubernetes Components

First, install some dependencies:

$ sudo apt-get update \
   && sudo apt-get install -y apt-transport-https curl

Add the package repository keys:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

And the repository:

$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

Update the package listing and install kubelet, kubeadm, and kubectl:

$ sudo apt-get update \
   && sudo apt-get install -y -q kubelet kubectl kubeadm
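
Optionally, hold these packages at their installed versions so that an automatic upgrade does not unexpectedly move the cluster to a newer Kubernetes release:

$ sudo apt-mark hold kubelet kubeadm kubectl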

Note

If you’re using containerd as the CRI runtime, then follow these steps:

  1. Configure the cgroup driver for kubelet:

    $ sudo mkdir -p /etc/systemd/system/kubelet.service.d/

    $ cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
    [Service]
    Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock --cgroup-driver=systemd"
    EOF
    
  2. Restart kubelet:

    $ sudo systemctl daemon-reload \
       && sudo systemctl restart kubelet
    

Now initialize the control plane using kubeadm:

$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16

Finish the kubeadm setup by configuring kubectl access for your user:

$ mkdir -p $HOME/.kube \
   && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config \
   && sudo chown $(id -u):$(id -g) $HOME/.kube/config
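
You can now verify that the node has registered with the API server (it may report NotReady until networking is configured in the next step):

$ kubectl get nodes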

Step 3: Configure Networking

Now, set up networking with Calico:

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

Untaint the control plane, so it can be used to schedule GPU pods in our simplistic single-node cluster:

$ kubectl taint nodes --all node-role.kubernetes.io/master-
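
To confirm that the taint was removed, check that the Taints field of the node now shows <none>:

$ kubectl describe node | grep -i taints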

Your cluster should now be ready to schedule containerized applications.

CentOS

Follow the steps in this section for setting up K8s on CentOS 7/8.

Note

If you’re using CentOS 7/8 on a cloud IaaS platform such as EC2, then you may need to do some additional setup as listed here:

  1. Choose an official CentOS image for your EC2 region: https://wiki.centos.org/Cloud/AWS

  2. Install some of the prerequisites. On CentOS 8, use dnf:

    $ sudo dnf install -y tar bzip2 make automake gcc gcc-c++ \
       pciutils elfutils-libelf-devel libglvnd-devel \
       iptables firewalld bind-utils \
       vim wget

    On CentOS 7, use yum instead:

    $ sudo yum install -y tar bzip2 make automake gcc gcc-c++ \
       pciutils elfutils-libelf-devel libglvnd-devel \
       iptables firewalld bind-utils \
       vim wget
    
  3. Update the running kernel to ensure you're running the latest updates. On CentOS 8:

    $ sudo dnf update -y

    On CentOS 7:

    $ sudo yum update -y
    
  4. Reboot your VM

    $ sudo reboot
    

Step 0: Configuring the System

Disable Nouveau

For a successful install of the NVIDIA driver, the Nouveau drivers must first be disabled.

Determine if the nouveau driver is loaded:

$ lsmod | grep -i nouveau

Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0

Regenerate the kernel initramfs:

$ sudo dracut --force

Reboot the system before proceeding with the next step.

For the remainder of this section, we will follow the general steps for using kubeadm. For convenience, let's enter an interactive sudo session, since most of the remaining commands require root privileges:

$ sudo -i

Disable SELinux

$ setenforce 0 \
   && sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

Bridged traffic and iptables

As mentioned in the kubeadm documentation, ensure that the br_netfilter module is loaded:

$ modprobe br_netfilter

Ensure net.bridge.bridge-nf-call-iptables is configured correctly:

$ cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

and reload the sysctl configuration:

$ sysctl --system

Firewall and required ports

The network plugin requires certain ports to be open on the control plane and worker nodes. See this table for more information on the purpose of these port numbers.

Ensure that firewalld is running:

$ systemctl status firewalld

and if required, start firewalld:

$ systemctl start firewalld \
   && systemctl enable firewalld

Now open the ports:

$ firewall-cmd --permanent --add-port=6443/tcp \
   && firewall-cmd --permanent --add-port=2379-2380/tcp \
   && firewall-cmd --permanent --add-port=10250/tcp \
   && firewall-cmd --permanent --add-port=10251/tcp \
   && firewall-cmd --permanent --add-port=10252/tcp \
   && firewall-cmd --permanent --add-port=10255/tcp

It's also required to add the docker0 interface to the public zone and allow for docker0 ingress and egress:

$ nmcli connection modify docker0 connection.zone public \
   && firewall-cmd --zone=public --add-masquerade --permanent \
   && firewall-cmd --zone=public --add-port=443/tcp

Reload the firewalld configuration and dockerd for the settings to take effect:

$ firewall-cmd --reload \
   && systemctl restart docker
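
To verify that the ports are now open, list the active ports in the default zone:

$ firewall-cmd --list-ports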

Optionally, before we install the Kubernetes control plane, test your container networking using a simple ping command:

$ docker run busybox ping google.com

Disable swap

Kubelet requires swap to be off, so disable it on your system:

$ swapoff -a
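
To keep swap disabled across reboots, also comment out any swap entries in /etc/fstab (a minimal sketch; review the file before editing):

$ sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab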

Step 1: Install Docker

Follow the steps in this guide to install Docker on CentOS 7/8.

Step 2: Install Kubernetes Components

Add the Kubernetes package repository listing to the package manager configuration:

$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

Install the components. On CentOS 8, use dnf:

$ dnf install -y kubelet kubectl kubeadm

On CentOS 7, use yum instead:

$ yum install -y kubelet kubectl kubeadm

Ensure that kubelet is started across system reboots:

$ systemctl enable kubelet \
   && systemctl start kubelet

Now use kubeadm to initialize the control plane:

$ kubeadm init --pod-network-cidr=192.168.0.0/16

At this point, feel free to exit from the interactive sudo session that we started with.

Configure Directories

To start using the cluster, run the following as a regular user:

$ mkdir -p $HOME/.kube \
   && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config \
   && sudo chown $(id -u):$(id -g) $HOME/.kube/config

If you’re using a simplistic cluster (or just testing), you can untaint the control plane node so that it can also run containers:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

At this point, your cluster should look similar to the following:

$ kubectl get pods -A
NAMESPACE     NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-f9fd979d6-46hmf                                 0/1     Pending   0          23s
kube-system   coredns-f9fd979d6-v7v4d                                 0/1     Pending   0          23s
kube-system   etcd-ip-172-31-54-109.ec2.internal                      0/1     Running   0          38s
kube-system   kube-apiserver-ip-172-31-54-109.ec2.internal            1/1     Running   0          38s
kube-system   kube-controller-manager-ip-172-31-54-109.ec2.internal   0/1     Running   0          37s
kube-system   kube-proxy-xd5zg                                        1/1     Running   0          23s
kube-system   kube-scheduler-ip-172-31-54-109.ec2.internal            0/1     Running   0          37s

Step 3: Configure Networking

For the purposes of this document, we will use Calico as a network plugin to configure networking in our Kubernetes cluster. Due to an issue with Calico and iptables on CentOS, let’s modify the configuration before deploying the plugin.

Download the calico configuration:

$ curl -fOSsL https://docs.projectcalico.org/manifests/calico.yaml

Then add the following environment variable to the env section of the calico-node container in the manifest:

- name: FELIX_IPTABLESBACKEND
  value: "NFT"

Save the modified file and then deploy the plugin:

$ kubectl apply -f ./calico.yaml

After a few minutes, you can list the pods again to see that the networking has been configured:

$ kubectl get pods -A
NAMESPACE     NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5c6f6b67db-wmts9                1/1     Running   0          99s
kube-system   calico-node-fktnf                                       1/1     Running   0          100s
kube-system   coredns-f9fd979d6-46hmf                                 1/1     Running   0          3m22s
kube-system   coredns-f9fd979d6-v7v4d                                 1/1     Running   0          3m22s
kube-system   etcd-ip-172-31-54-109.ec2.internal                      1/1     Running   0          3m37s
kube-system   kube-apiserver-ip-172-31-54-109.ec2.internal            1/1     Running   0          3m37s
kube-system   kube-controller-manager-ip-172-31-54-109.ec2.internal   1/1     Running   0          3m36s
kube-system   kube-proxy-xd5zg                                        1/1     Running   0          3m22s
kube-system   kube-scheduler-ip-172-31-54-109.ec2.internal            1/1     Running   0          3m36s

To verify that networking has been set up successfully, let's use the multitool container:

$ kubectl run multitool --image=praqma/network-multitool --restart Never

and then run a simple ping command to ensure that DNS resolution works correctly:

$ kubectl exec multitool -- bash -c 'ping google.com'
PING google.com (172.217.9.206) 56(84) bytes of data.
64 bytes from iad30s14-in-f14.1e100.net (172.217.9.206): icmp_seq=1 ttl=53 time=0.569 ms
64 bytes from iad30s14-in-f14.1e100.net (172.217.9.206): icmp_seq=2 ttl=53 time=0.548 ms
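
Once DNS resolution is verified, the test pod can be removed:

$ kubectl delete pod multitool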

Step 4: Set Up NVIDIA Software

At this point, you should have a working Kubernetes control plane with worker nodes attached to your cluster. We can proceed to configure the NVIDIA software on the worker nodes. As described at the beginning of this document, there are two options:

NVIDIA GPU Operator

Use the NVIDIA GPU Operator to automatically setup and manage the NVIDIA software components on the worker nodes.

This is the preferred way as it provides a 1-click install experience.

Install NVIDIA Dependencies

The GPU worker nodes in the Kubernetes cluster need to be enabled with the following components:

  1. NVIDIA drivers

  2. NVIDIA Container Toolkit

  3. NVIDIA Kubernetes Device Plugin (and optionally GPU Feature Discovery plugin)

  4. (Optional) DCGM-Exporter to gather GPU telemetry and integrate into a monitoring stack such as Prometheus

Let’s walk through these steps.

Install NVIDIA Drivers

This section provides a summary of the steps for installing the driver using the apt package manager on Ubuntu LTS.

Note

For complete instructions on setting up NVIDIA drivers, visit the quickstart guide at https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html. The guide covers a number of pre-installation requirements and steps on supported Linux distributions for a successful install of the driver.

Install the kernel headers and development packages for the currently running kernel:

$ sudo apt-get install linux-headers-$(uname -r)

Set up the CUDA network repository and ensure packages on the CUDA network repository have priority over the Canonical repository:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g') \
   && wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin \
   && sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600

Install the CUDA repository GPG key:

$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub \
   && echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list

Update the apt repository cache and install the driver using the cuda-drivers or cuda-drivers-<branch-number> meta-package. Use the --no-install-recommends option for a lean driver install without any dependencies on X packages. This is particularly useful for headless installations on cloud instances:

$ sudo apt-get update \
   && sudo apt-get -y install cuda-drivers
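
To verify that the driver is installed correctly (a reboot may be required first so that the new kernel modules are loaded):

$ nvidia-smi
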
Install NVIDIA Container Toolkit (nvidia-docker2)

First, set up the stable repository for the NVIDIA runtime and the GPG key:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Depending on the container engine, you need to install different packages: nvidia-docker2 for Docker, or nvidia-container-runtime for containerd. The Docker steps are described first.

For Docker, install the nvidia-docker2 package (and its dependencies) after updating the package listing:

$ sudo apt-get update \
   && sudo apt-get install -y nvidia-docker2

Since Kubernetes does not support the --gpus option with Docker yet, the nvidia runtime should be set up as the default container runtime for Docker on the GPU node. This can be done by adding the default-runtime line to the Docker daemon config file, which is usually located on the system at /etc/docker/daemon.json:

{
   "default-runtime": "nvidia",
   "runtimes": {
      "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
      }
   }
}

Restart the Docker daemon to complete the installation after setting the default runtime:

$ sudo systemctl restart docker

At this point, a working setup can be tested by running a base CUDA container:

$ sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

You should observe an output as shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

For containerd, install the nvidia-container-runtime package (and its dependencies) after updating the package listing:

$ sudo apt-get update \
   && sudo apt-get install -y nvidia-container-runtime

Next, containerd’s configuration file (config.toml) needs to be updated to set the default runtime to nvidia. The new configuration changes are shown in the patch below:

--- config.toml 2020-12-17 19:13:03.242630735 +0000
+++ /etc/containerd/config.toml 2020-12-17 19:27:02.019027793 +0000
@@ -70,7 +70,7 @@
   ignore_image_defined_volumes = false
   [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
-      default_runtime_name = "runc"
+      default_runtime_name = "nvidia"
      no_pivot = false
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
@@ -94,6 +94,15 @@
         privileged_without_host_devices = false
         base_runtime_spec = ""
         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+            SystemdCgroup = true
+       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
+          privileged_without_host_devices = false
+          runtime_engine = ""
+          runtime_root = ""
+          runtime_type = "io.containerd.runc.v1"
+          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+            BinaryName = "/usr/bin/nvidia-container-runtime"
+            SystemdCgroup = true
   [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"

Finally, restart containerd:

$ sudo systemctl restart containerd
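
As an optional check, you can run a GPU container directly through containerd with the ctr CLI; this sketch reuses the CUDA base image from the Docker test above:

$ sudo ctr image pull docker.io/nvidia/cuda:11.0-base \
   && sudo ctr run --rm -t --gpus 0 docker.io/nvidia/cuda:11.0-base cuda-test nvidia-smi
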
Install NVIDIA Device Plugin

To use GPUs in Kubernetes, the NVIDIA Device Plugin is required. The NVIDIA Device Plugin is a daemonset that automatically enumerates the number of GPUs on each node of the cluster and allows pods to be run on GPUs.

The preferred method to deploy the device plugin is as a daemonset using Helm. First, install Helm:

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
   && chmod 700 get_helm.sh \
   && ./get_helm.sh

Add the nvidia-device-plugin helm repository:

$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin \
   && helm repo update

Deploy the device plugin:

$ helm install --generate-name nvdp/nvidia-device-plugin

For more user-configurable options while deploying the daemonset, refer to the documentation.

At this point, all the pods should be deployed:

$ kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS      RESTARTS   AGE
kube-system   calico-kube-controllers-5fbfc9dfb6-2ttkk   1/1     Running     3          9d
kube-system   calico-node-5vfcb                          1/1     Running     3          9d
kube-system   coredns-66bff467f8-jzblc                   1/1     Running     4          9d
kube-system   coredns-66bff467f8-l85sz                   1/1     Running     3          9d
kube-system   etcd-ip-172-31-81-185                      1/1     Running     4          9d
kube-system   kube-apiserver-ip-172-31-81-185            1/1     Running     3          9d
kube-system   kube-controller-manager-ip-172-31-81-185   1/1     Running     3          9d
kube-system   kube-proxy-86vlr                           1/1     Running     3          9d
kube-system   kube-scheduler-ip-172-31-81-185            1/1     Running     4          9d
kube-system   nvidia-device-plugin-1595448322-42vgf      1/1     Running     2          9d
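
You can also confirm that the node now advertises GPU resources in its capacity:

$ kubectl describe node | grep nvidia.com/gpu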

To test whether CUDA jobs can be deployed, run a sample CUDA vectorAdd application. The pod spec, which requests 1 GPU, is shown for reference below:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
         nvidia.com/gpu: 1

Save this podspec as gpu-pod.yaml. Now, deploy the application:

$ kubectl apply -f gpu-pod.yaml

Check the status of the pod to ensure it completed successfully:

$ kubectl get pods gpu-operator-test
NAME                READY   STATUS      RESTARTS   AGE
gpu-operator-test   0/1     Completed   0          9d

And check the logs of the gpu-operator-test pod:

$ kubectl logs gpu-operator-test
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
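
Once the test passes, the pod can be deleted:

$ kubectl delete pod gpu-operator-test
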
GPU Telemetry

Refer to the DCGM-Exporter documentation to get started with integrating GPU metrics into a Prometheus monitoring system.