Install Kubernetes

Install containerd

Download a release tarball (named cri-containerd-cni-<version>-linux-<arch>.tar.gz). Compared to the plain containerd tarball, the cri-containerd-cni tarball also includes the systemd service file, the runtime shims, the crictl tool, and the CNI plugins.

  1. Download the release tarball (v1.4.3 is used as an example):

    $ export VERSION=1.4.3 \
       && curl -fOsSL https://github.com/containerd/containerd/releases/download/v${VERSION}/cri-containerd-cni-${VERSION}-linux-amd64.tar.gz

  2. Install containerd by extracting the tarball onto the root filesystem:

    $ sudo tar --no-overwrite-dir -C / -xzf cri-containerd-cni-${VERSION}-linux-amd64.tar.gz

  3. Reload systemd and start the containerd service:

    $ sudo systemctl daemon-reload \
       && sudo systemctl start containerd
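To verify the installation, you can check that the service is active and query the client and server versions (the ctr CLI is included in the cri-containerd-cni tarball):

$ sudo systemctl is-active containerd \
   && sudo ctr version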

Disable swap:

$ sudo swapoff -a

Let iptables see bridged traffic:

$ sudo modprobe br_netfilter

$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

$ sudo sysctl --system
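You can confirm that the settings have taken effect:

$ sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward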

Configure the cgroup driver

With containerd installed, manually configure the systemd cgroup driver for the runtime and the kubelet, as described in https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd.
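For reference, those instructions enable the systemd cgroup driver for containerd's runc runtime in /etc/containerd/config.toml (a default configuration file can be generated with containerd config default | sudo tee /etc/containerd/config.toml); restart containerd after making the change:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true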

Install kubelet, kubectl and kubeadm, then create a systemd drop-in so that the kubelet uses containerd:

$ sudo vim /etc/systemd/system/kubelet.service.d/0-containerd.conf

[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock --cgroup-driver=systemd"

Reload systemd and restart the kubelet:

$ sudo systemctl daemon-reload \
   && sudo systemctl restart kubelet

Finally, install the nvidia-container-runtime so that containerd can run GPU containers.
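As an illustrative sketch for Ubuntu, the runtime can be installed from NVIDIA's package repository (repository layout per the nvidia-container-runtime project; adjust for your distribution):

$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list \
   && sudo apt-get update \
   && sudo apt-get install -y nvidia-container-runtime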

Introduction

Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes includes support for GPUs, along with enhancements that let users easily configure and consume GPU resources for accelerating workloads such as deep learning. This document describes two methods for installing upstream Kubernetes with NVIDIA-supported components, such as drivers, plugins, and the container runtime: a method using DeepOps and a method using Kubeadm.

To set up orchestration and scheduling in your cluster, it is highly recommended that you use DeepOps. DeepOps is a modular collection of Ansible scripts that automates the deployment of Kubernetes, Slurm, or a hybrid combination of the two across your nodes. It also installs the necessary GPU drivers, the NVIDIA Container Toolkit for Docker (nvidia-docker2), and various other dependencies for GPU-accelerated work. Encapsulating best practices for NVIDIA GPUs, it can be customized or run as individual components, as needed.

With Kubeadm, this document walks through the steps for installing a single-node Kubernetes cluster (where the control plane is untainted so it can run GPU pods), but the cluster can easily be scaled out with additional nodes.

Installing Kubernetes Using DeepOps

Use DeepOps to automate deployment, especially for a cluster of many worker nodes. Use the following procedure to install Kubernetes using DeepOps:

  1. Pick a provisioning node to deploy from. This is where the DeepOps Ansible scripts run from and is often a development laptop that has a connection to the target cluster. On this provisioning node, clone the DeepOps repository with the following command:

    $ git clone https://github.com/NVIDIA/deepops.git
    
  2. Optionally, check out a recent release tag with the following command:

    $ cd deepops \
       && git checkout tags/20.10
    

    If you do not explicitly use a release tag, then the latest development code is used, and not an official release.

  3. Follow the instructions in the DeepOps Kubernetes Deployment Guide to install Kubernetes.

Installing Kubernetes Using Kubeadm

Note

The method described in this section is an alternative to using DeepOps. If you have deployed using DeepOps, then skip this section.

For a less scripted approach, especially for smaller clusters or where there is a desire to learn the components that make up a Kubernetes cluster, use Kubeadm.

A Kubernetes cluster is composed of master nodes and worker nodes. The master nodes run the control plane components of Kubernetes, which allow your cluster to function properly. These components include the API Server (the front end for the kubectl CLI), etcd (which stores the cluster state), and others.

Use CPU-only (GPU-free) master nodes to run the control plane components: the Scheduler, the API Server, and the Controller Manager. Control plane components can have some impact on your CPU-intensive tasks, and conversely, CPU- or HDD/SSD-intensive workloads can have an impact on your control plane components.

Before You Begin

Before proceeding to install the components, check that all Kubernetes prerequisites have been satisfied. These prerequisites include:

  • Check network adapters and required ports

  • Disable swap on the nodes so that kubelet can work correctly

  • Install a supported container runtime such as Docker, containerd or CRI-O

Depending on your Linux distribution, refer to the steps below:

Ubuntu LTS

This section provides steps for setting up K8s on Ubuntu 18.04 and 20.04 LTS distributions.

Install Docker

Follow the steps in this guide to install Docker on Ubuntu.
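If you prefer, Docker also provides a convenience script that sets up the repository and installs the engine (review the script before running it):

$ curl -fsSL https://get.docker.com -o get-docker.sh \
   && sudo sh get-docker.sh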

Install Kubernetes components

First, install some dependencies:

$ sudo apt-get update \
   && sudo apt-get install -y apt-transport-https curl

Add the package repository keys:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

And the repository:

$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

Update the package listing, install the required packages, and initialize the control plane using kubeadm:

$ sudo apt-get update \
   && sudo apt-get install -y -q kubelet kubectl kubeadm \
   && sudo kubeadm init --pod-network-cidr=192.168.0.0/16

Finish the configuration setup with Kubeadm:

$ mkdir -p $HOME/.kube \
   && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config \
   && sudo chown $(id -u):$(id -g) $HOME/.kube/config

Configure networking

Now, setup networking with Calico:

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

Untaint the control plane, so it can be used to schedule GPU pods in our simplistic single-node cluster:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

Your cluster should now be ready to schedule containerized applications.
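A quick way to confirm this is to check that the node reports a Ready status:

$ kubectl get nodes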

CentOS

Follow the steps in this section for setting up K8s on CentOS 7/8.

Note

If you’re using CentOS 7/8 on a cloud IaaS platform such as EC2, then you may need to do some additional setup as listed here:

  1. Choose an official CentOS image for your EC2 region: https://wiki.centos.org/Cloud/AWS

  2. Install some of the prerequisites:

    On CentOS 8:

    $ sudo dnf install -y tar bzip2 make automake gcc gcc-c++ \
       pciutils elfutils-libelf-devel libglvnd-devel \
       iptables firewalld bind-utils \
       vim wget
    

    On CentOS 7:

    $ sudo yum install -y tar bzip2 make automake gcc gcc-c++ \
       pciutils elfutils-libelf-devel libglvnd-devel \
       iptables firewalld bind-utils \
       vim wget
    
  3. Update the kernel to ensure you're running the latest updates:

    On CentOS 8:

    $ sudo dnf update -y
    

    On CentOS 7:

    $ sudo yum update -y
    
  4. Reboot your VM

    $ sudo reboot
    

Disable Nouveau

For a successful install of the NVIDIA driver, the Nouveau drivers must first be disabled.

Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0
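Equivalently, the file can be created with a heredoc:

$ cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF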

Regenerate the kernel initramfs:

$ sudo dracut --force

Reboot the system before proceeding with the rest of this guide.
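After the reboot, verify that Nouveau is no longer loaded (the command should produce no output):

$ lsmod | grep nouveau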

Install Docker

Follow the steps in this guide to install Docker on CentOS 7/8.

Configuring the system

For the remainder of this section, we will follow the general steps for using kubeadm. For convenience, let's enter an interactive sudo session, since most of the remaining commands require root privileges:

$ sudo -i

Disabling SELinux

$ setenforce 0 \
   && sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

Bridged traffic and iptables

As mentioned in the kubeadm documentation, ensure that the br_netfilter module is loaded:

$ modprobe br_netfilter

Ensure net.bridge.bridge-nf-call-iptables is configured correctly:

$ cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

and reload the sysctl configuration:

$ sysctl --system

Firewall and required ports

The network plugin requires certain ports to be open on the control plane and worker nodes. See this table for more information on the purpose of these port numbers.

Ensure that firewalld is running:

$ systemctl status firewalld

and if required, start firewalld:

$ systemctl --now enable firewalld

Now open the ports:

$ firewall-cmd --permanent --add-port=6443/tcp \
   && firewall-cmd --permanent --add-port=2379-2380/tcp \
   && firewall-cmd --permanent --add-port=10250/tcp \
   && firewall-cmd --permanent --add-port=10251/tcp \
   && firewall-cmd --permanent --add-port=10252/tcp \
   && firewall-cmd --permanent --add-port=10255/tcp
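You can confirm that the permanent configuration now lists these ports:

$ firewall-cmd --permanent --list-ports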

It's also required to add the docker0 interface to the public zone and to allow docker0 ingress and egress:

On CentOS 8:

$ nmcli connection modify docker0 connection.zone public \
   && firewall-cmd --zone=public --add-masquerade --permanent \
   && firewall-cmd --zone=public --add-port=443/tcp

On CentOS 7:

$ firewall-cmd --zone=public --add-masquerade --permanent \
   && firewall-cmd --zone=public --add-port=443/tcp

Reload the firewalld configuration and dockerd for the settings to take effect:

$ firewall-cmd --reload \
   && systemctl restart docker

Optionally, before we install the Kubernetes control plane, test your container networking using a simple ping command:

$ docker run busybox ping google.com

Disable swap

Disable swap on your system, which is required for the kubelet to work correctly:

$ swapoff -a
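Note that swapoff -a only disables swap until the next reboot; to make the change persistent, also comment out any swap entries in /etc/fstab, for example:

$ sed -i '/ swap / s/^/#/' /etc/fstab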

Install Kubernetes components

Add the Kubernetes repository to the package manager configuration:

$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

Install the components:

On CentOS 8:

$ dnf install -y kubelet kubectl kubeadm

On CentOS 7:

$ yum install -y kubelet kubectl kubeadm

Ensure that kubelet is started across system reboots:

$ systemctl --now enable kubelet

Now use kubeadm to initialize the control plane:

$ kubeadm init --pod-network-cidr=192.168.0.0/16

At this point, feel free to exit from the interactive sudo session that we started with.

Configure directories

To start using the cluster, run the following as a regular user:

$ mkdir -p $HOME/.kube \
   && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config \
   && sudo chown $(id -u):$(id -g) $HOME/.kube/config

If you’re using a simplistic cluster (or just testing), you can untaint the control plane node so that it can also run containers:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

At this point, your cluster will look similar to the following:

$ kubectl get pods -A
NAMESPACE     NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-f9fd979d6-46hmf                                 0/1     Pending   0          23s
kube-system   coredns-f9fd979d6-v7v4d                                 0/1     Pending   0          23s
kube-system   etcd-ip-172-31-54-109.ec2.internal                      0/1     Running   0          38s
kube-system   kube-apiserver-ip-172-31-54-109.ec2.internal            1/1     Running   0          38s
kube-system   kube-controller-manager-ip-172-31-54-109.ec2.internal   0/1     Running   0          37s
kube-system   kube-proxy-xd5zg                                        1/1     Running   0          23s
kube-system   kube-scheduler-ip-172-31-54-109.ec2.internal            0/1     Running   0          37s
Configure networking

For the purposes of this document, we will use Calico as a network plugin to configure networking in our Kubernetes cluster. Due to an issue with Calico and iptables on CentOS, let’s modify the configuration before deploying the plugin.

Download the calico configuration:

$ curl -fOSsL https://docs.projectcalico.org/manifests/calico.yaml

And add the following configuration option to the environment section of the calico-node container:

- name: FELIX_IPTABLESBACKEND
  value: "NFT"

Save the modified file and then deploy the plugin:

$ kubectl apply -f ./calico.yaml

After a few minutes, you can see that the networking has been configured:

$ kubectl get pods -A
NAMESPACE     NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5c6f6b67db-wmts9                1/1     Running   0          99s
kube-system   calico-node-fktnf                                       1/1     Running   0          100s
kube-system   coredns-f9fd979d6-46hmf                                 1/1     Running   0          3m22s
kube-system   coredns-f9fd979d6-v7v4d                                 1/1     Running   0          3m22s
kube-system   etcd-ip-172-31-54-109.ec2.internal                      1/1     Running   0          3m37s
kube-system   kube-apiserver-ip-172-31-54-109.ec2.internal            1/1     Running   0          3m37s
kube-system   kube-controller-manager-ip-172-31-54-109.ec2.internal   1/1     Running   0          3m36s
kube-system   kube-proxy-xd5zg                                        1/1     Running   0          3m22s
kube-system   kube-scheduler-ip-172-31-54-109.ec2.internal            1/1     Running   0          3m36s

To verify that networking has been set up successfully, let's use the multitool container:

$ kubectl run multitool --image=praqma/network-multitool --restart Never

and then run a simple ping command to ensure that the DNS servers can be detected correctly:

$ kubectl exec multitool -- bash -c 'ping google.com'
PING google.com (172.217.9.206) 56(84) bytes of data.
64 bytes from iad30s14-in-f14.1e100.net (172.217.9.206): icmp_seq=1 ttl=53 time=0.569 ms
64 bytes from iad30s14-in-f14.1e100.net (172.217.9.206): icmp_seq=2 ttl=53 time=0.548 ms
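
Once verified, the test pod can be cleaned up:

$ kubectl delete pod multitool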