1. Introduction

Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes on NVIDIA GPUs includes support for GPUs and enhancements to Kubernetes so users can easily configure and use GPU resources for accelerating workloads such as deep learning. This document serves as a step-by-step guide to installing Kubernetes and using it with NVIDIA GPUs.

2. Supported Platforms

This release of Kubernetes is supported on the following platforms. Note that there are certain prerequisites that must be satisfied before proceeding to install Kubernetes. These are detailed in the “Before you begin” section.

On-Premises

  • NVIDIA DGX systems (see Before You Begin: DGX Systems for prerequisites).

Cloud

  • NVIDIA GPU Cloud virtual machine images available on Amazon EC2 and Google Cloud Platform.

3. Before You Begin: DGX Systems

Installation of Kubernetes on DGX systems has certain requirements that must be met. This section provides information on those prerequisites.

3.1. Hardware and Software Requirements

Installation of Kubernetes on DGX requires the following hardware and software:

3.2. Upgrading to the NVIDIA Container Runtime for Docker

Each DGX worker node requires the NVIDIA Container Runtime for Docker (nvidia-docker2) to use the NVIDIA Device Plugin feature. To ensure a smooth upgrade to nvidia-docker2, follow the steps detailed in the Upgrading to the NVIDIA Container Runtime for Docker Guide.

3.3. DGX Station

On DGX Station (and other Ubuntu 16.04 desktop systems), there is a known issue with Kubernetes 1.9.7 on Ubuntu 16.04 Desktop where the kube-dns service will not run. To work around this issue, take one of the following actions, depending on the DNS resolver service you are using. On most Ubuntu 16.04 desktop systems, NetworkManager is the DNS resolver service and the procedure in For NetworkManager applies.

3.3.1. For NetworkManager

  1. Find the active interface.
    $ route | grep '^default' | grep -o '[^ ]*$'

    (Alternately, use ifconfig.)

  2. Determine the nameservers. For <interface>, use the active interface listed in the output of the previous command.
    $ nmcli device show <interface> | grep IP4.DNS

    For example:

    $ nmcli device show enp2s0f0 | grep IP4.DNS
    IP4.DNS[1]:                             192.0.2.0
    IP4.DNS[2]:                             192.0.2.1
    IP4.DNS[3]:                             192.0.2.2
    IP4.DNS[4]:                             192.0.2.3
    IP4.DNS[5]:                             192.0.2.4
  3. Copy /etc/resolv.conf to /etc/customResolv.conf and modify /etc/customResolv.conf to include only the nameservers listed in the output of the previous command (see the example after this procedure).
  4. Add the following line to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
    Environment="KUBELET_RESOLVER_ARGS=--resolv-conf=/etc/customResolv.conf"
  5. Start kubelet.
    $ sudo systemctl start kubelet
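
For reference, a minimal /etc/customResolv.conf based on the example output above would contain only nameserver entries, one per line:

nameserver 192.0.2.0
nameserver 192.0.2.1
nameserver 192.0.2.2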

3.3.2. For systemd-resolved

  1. Add the following line to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
    Environment="KUBELET_RESOLVER_ARGS=--resolv-conf=/run/systemd/resolve/resolv.conf"
  2. Start kubelet.
    $ sudo systemctl start kubelet

4. Installing Kubernetes

Kubernetes can be deployed through different mechanisms. NVIDIA recommends using kubeadm to deploy Kubernetes.

4.1. Master Nodes

The master nodes run the control plane components of Kubernetes. These include the API server (the front end for the kubectl CLI), etcd (which stores the cluster state), and other components. Master nodes need to be set up with the following three components, of which only the kubelet has been customized with changes from NVIDIA:
  • Kubelet
  • Kubeadm
  • Kubectl
We recommend that your master nodes not be equipped with GPUs and that they run only the control plane components, such as the following:
  • Scheduler
  • API-server
  • Controller Manager

4.1.1. Installing and Running Kubernetes

Before proceeding to install the components, check that all the Kubernetes prerequisites have been satisfied. These include the following:
  • Check network adapters and required ports.
  • Disable swap so that the kubelet works correctly (see the example below).
  • Install dependencies such as the Docker container runtime. To install Docker on Ubuntu, follow the official instructions provided by Docker.
Note: If you are setting up a single node GPU cluster for development purposes or you want to run jobs on the master nodes as well, then you must install the NVIDIA driver and the NVIDIA Container Runtime for Docker. See the section on worker node prerequisites for more information.
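
For example, swap can be disabled on Ubuntu as follows (a minimal sketch; review the /etc/fstab change before applying it on your system):

$ sudo swapoff -a
$ sudo sed -i '/ swap / s/^/#/' /etc/fstab

The sed command comments out swap entries in /etc/fstab so that swap remains disabled after a reboot.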

Install the required components on your master node with the following procedures.

4.1.1.1. Setting Up the Repository

  1. Add the official GPG keys and the repository.
    $ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/kubernetes/gpgkey | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/kubernetes/ubuntu16.04/nvidia-kubernetes.list |\
               sudo tee /etc/apt/sources.list.d/nvidia-kubernetes.list
  2. Update the package index.
    $ sudo apt update

4.1.1.2. Initializing the Master

  1. Install the specific versions of the components provided by NVIDIA.
    $ sudo apt install -y kubectl=1.9.7+nvidia kubelet=1.9.7+nvidia kubeadm=1.9.7+nvidia
  2. Start kubelet.
    $ sudo systemctl start kubelet
  3. You can check the status of kubelet by using the following command. Note that initialization of the kubelet will fail due to a missing CA certificate. The certificate is generated as part of initializing the cluster using kubeadm in the next step.
    $ sudo systemctl status kubelet
  4. Start the cluster. (You may choose to save the token and the hash of the CA certificate printed by kubeadm init, as you will need these later to join worker nodes to the cluster.)
    $ sudo kubeadm init --ignore-preflight-errors=all --config /etc/kubeadm/config.yml
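
The output of kubeadm init ends with a kubeadm join command that contains the token and the CA certificate hash. Record it; a command of the following shape (placeholders, not literal values) is what you will run later on each worker node:

kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>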

4.1.2. Setting Up Your Cluster

  1. Set up your user account to administer the cluster.
    $ mkdir -p $HOME/.kube
    $ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
    $ sudo chown $(id -u):$(id -g) $HOME/.kube/config
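
At this point you can confirm that kubectl can reach the API server (a quick sanity check that is not part of the original steps):

$ kubectl cluster-info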

Kubernetes clusters need a pod network addon installed. Flannel is recommended for multiple reasons:

  • Recommended by Kubernetes
  • Used in production clusters
  • Integrates well with the CRI-O runtime

For more information and other networking options refer to: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network.

  1. Run the following command to deploy Flannel.
    $ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
  2. Check that all the control plane components are running on the master node.
    $ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
    kube-system   etcd-ip-172-31-88-152                      1/1       Running   0          1m
    kube-system   kube-apiserver-ip-172-31-88-152            1/1       Running   4          1m
    kube-system   kube-controller-manager-ip-172-31-88-152   1/1       Running   1          1m
    kube-system   kube-dns-d4b8fb74c-dg8zf                   3/3       Running   0          1m
    kube-system   kube-flannel-ds-62hzb                      1/1       Running   0          1m
    kube-system   kube-proxy-96xln                           1/1       Running   0          1m
    kube-system   kube-scheduler-ip-172-31-88-152            1/1       Running   0          1m

4.1.3. Single Node Cluster

  1. This step is optional. If you are setting up a single node cluster for development purposes or you want to run jobs on the master nodes as well, you need to set Kubernetes to allow this:
    $ kubectl taint nodes --all node-role.kubernetes.io/master-
  2. After untainting the master node, the NVIDIA device plugin pod appears on the master node.
    $ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                       READY     STATUS    RESTARTS   AGE
    kube-system   etcd-ip-172-31-88-152                      1/1       Running   0          1m
    kube-system   kube-apiserver-ip-172-31-88-152            1/1       Running   4          1m
    kube-system   kube-controller-manager-ip-172-31-88-152   1/1       Running   1          1m
    kube-system   kube-dns-d4b8fb74c-dg8zf                   3/3       Running   0          1m
    kube-system   kube-flannel-ds-62hzb                      1/1       Running   0          1m
    kube-system   kube-proxy-96xln                           1/1       Running   0          1m
    kube-system   kube-scheduler-ip-172-31-88-152            1/1       Running   0          1m
    kube-system   nvidia-device-plugin-daemonset-hqhjr       1/1       Running   0          1m
    Note: The Kubernetes master node is now functional.

4.2. Worker Nodes

Prerequisites

  • The worker nodes must be provisioned with the NVIDIA driver. The recommended way is to use your package manager to install the cuda-drivers package (or equivalent). When no packages are available, use an official "runfile" installer downloaded from the NVIDIA driver downloads site (see the sketch after this list).
  • Ensure that a supported version of Docker is installed before proceeding to install the NVIDIA Container Runtime for Docker (via nvidia-docker2).
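
As an illustration, on Ubuntu 16.04 the driver can be installed from the package manager roughly as follows, assuming the NVIDIA CUDA network repository has already been added to the system:

$ sudo apt update
$ sudo apt install -y cuda-drivers
$ nvidia-smi

The nvidia-smi command should list the GPUs in the node if the driver loaded correctly.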

DGX and NGC Images

On DGX systems installed with nvidia-docker version 1.0.1, NVIDIA provides an option to upgrade the existing system environment to the NVIDIA Container Runtime. Follow the instructions in the Upgrading to the NVIDIA Container Runtime for Docker Guide to upgrade your environment, then skip the following section and proceed to installing Kubernetes on the worker nodes.

If you are using the NGC images on AWS or GCP, then you may skip the following section and proceed to installing Kubernetes on the worker nodes.

4.2.1. Install NVIDIA Container Runtime for Docker 2.0

NVIDIA Container Runtime for Docker (nvidia-docker2) is the supported way to run GPU containers. It is more stable than nvidia-docker 1.0 and required for use of the Device Plugin feature in Kubernetes.

4.2.1.1. Uninstalling Old Versions

  1. Stop and remove any containers that use nvidia-docker 1.0 volumes.
    $ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
  2. Remove nvidia-docker 1.0.
    $ sudo apt-get purge -y nvidia-docker

4.2.1.2. Setting Up the Repository

  1. Add the official GPG keys.
    $ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  2. Update the package index.
    $ sudo apt update

4.2.1.3. Install NVIDIA Container Runtime

  1. Install the nvidia-docker2 package.
    $ sudo apt-get install -y nvidia-docker2
  2. Reload the Docker daemon configuration.
    $ sudo pkill -SIGHUP dockerd
  3. Check that nvidia-docker2 is properly installed.
    $ sudo docker info | grep nvidia
    Runtimes: runc nvidia
  4. Check that the new runtime is functional.
    $ docker run -it --rm --runtime nvidia nvidia/cuda nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.125                Driver Version: 384.125                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P0    20W / 300W |     10MiB / 16152MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
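
Note that the nvidia-docker2 package registers the nvidia runtime in /etc/docker/daemon.json. If your setup requires the NVIDIA runtime to be Docker's default (the Kubernetes device plugin typically expects this when Docker is the container runtime), the file would look roughly like the following sketch; verify it against the file installed by your package:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

After editing the file, restart Docker (for example, sudo systemctl restart docker) so the change takes effect.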
    

4.2.2. Install and Run Kubernetes

Complete the following procedures on the GPU node to install the Kubernetes components needed for a worker (non-master) node:
Note: The NVIDIA Device Plugin feature is enabled by default in the kubelet service file.

First, follow the steps in Setting Up the Repository, and then continue to the steps in Initializing the Worker.

4.2.2.1. Setting Up the Repository

  1. Add the official GPG keys and the repository.
    $ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/kubernetes/gpgkey | sudo apt-key add -
    $ curl -s -L https://nvidia.github.io/kubernetes/ubuntu16.04/nvidia-kubernetes.list |\
               sudo tee /etc/apt/sources.list.d/nvidia-kubernetes.list
  2. Update the package index.
    $ sudo apt update

4.2.2.2. Initializing the Worker

  1. Install the specific versions of the components provided by NVIDIA.
    $ sudo apt install -y kubectl=1.9.7+nvidia kubelet=1.9.7+nvidia kubeadm=1.9.7+nvidia
  2. Start Kubelet.
    $ sudo systemctl start kubelet
  3. Before joining the worker to the cluster, retrieve the token and CA certificate hash you recorded when kubeadm init was run on the master node. Alternatively, run the following command on the master node to generate a new token and print the complete join command.
    $ sudo kubeadm token create --print-join-command
  4. Join the worker node to the cluster with a command similar to the following. (The command below is an example that will not work for your installation. To convert to a real command, replace the relevant parameters with the information from step 3.)
    $ sudo kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash> --ignore-preflight-errors=all

4.2.3. Check Your Cluster State

  1. Run the following commands on the master node and make sure your GPU worker nodes appear and transition to the Ready state. It may take a few minutes for the status to change.
    $ kubectl get all --all-namespaces
  2. Describe the nodes in your cluster to ensure that your worker nodes are visible and report their GPUs (see the example after this list).
    $ kubectl describe nodes
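
In the output of kubectl describe nodes, each GPU worker node should advertise the nvidia.com/gpu resource under Capacity and Allocatable, for example (illustrative; the count depends on the GPUs in the node):

    Capacity:
     nvidia.com/gpu:  8
    Allocatable:
     nvidia.com/gpu:  8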

4.3. Run GPU Tasks


Enable GPUs

Make sure GPU support has been properly set up by running a simple CUDA container. One is provided in the artifacts you downloaded (you will need a GPU with at least 8 GB of memory). There are also other examples available in the examples directory.
  1. Start the CUDA sample workload.
    $ kubectl create -f /etc/kubeadm/examples/pod.yml
  2. When the pod is running, you can execute the nvidia-smi command inside the container.
    $ kubectl exec -it gpu-pod nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.125                Driver Version: 384.125                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P0    20W / 300W |     10MiB / 16152MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

5. Using the New Features

Kubernetes on NVIDIA GPUs includes the following features that are not yet available in the upstream release of Kubernetes:

  • GPU attributes exposed in a node
  • Scheduling improvements
  • GPU monitoring
  • Support for the CRI-O runtime preview feature

5.1. Exposing GPU Attributes In a Node

Nodes now expose the attributes of their GPUs, which can be inspected by querying the Kubernetes API at the node endpoint. The GPU attributes currently advertised are:

  • GPU memory
  • GPU ECC
  • GPU compute capabilities
Inspect GPU attributes in a node with the following commands:
$ kubectl proxy --port=8000 &
$ curl -s http://localhost:8000/api/v1/nodes | grep -B 7 -A 3 gpu-memory
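
The attributes are reported as node capacity entries in the API response. A representative fragment is sketched below; the resource names follow those used elsewhere in this guide, but the exact keys and values depend on your GPUs and driver:

    "capacity": {
      "nvidia.com/gpu": "8",
      "nvidia.com/gpu-memory": "16152",
      ...
    }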

5.2. Scheduling Improvements

Pods can now specify device selectors based on the attributes that are advertised on the node. These can be specified at the container level. For example:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-base
      command: ["sleep"]
      args: ["100000"]
      extendedResourceRequests: ["nvidia-gpu"]
  extendedResources:
    - name: "nvidia-gpu"
      resources:
        limits:
          nvidia.com/gpu: 1
      affinity:
        required:
          - key: "nvidia.com/gpu-memory"
            operator: "Gt"
            values: ["8000"] # change value to appropriate mem for GPU
  1. Create the pod and check its status.
    $ kubectl create -f /etc/kubeadm/examples/pod.yml
  2. List the pods running in the cluster.
    $ kubectl get pods
  3. Run the nvidia-smi command inside the container.
    $ kubectl exec -it gpu-pod nvidia-smi

5.3. Monitoring GPUs in Kubernetes

To monitor the health and get metrics from GPUs in Kubernetes, an integrated monitoring stack is provided. The stack uses the NVIDIA Datacenter GPU Manager (DCGM), Prometheus (using Prometheus Operator), and Grafana for visualizing the various metrics.

Setting Up Monitoring

To set up monitoring, follow these steps.

  1. Label GPU nodes.
    $ kubectl label nodes <gpu-node-name> hardware-type=NVIDIAGPU
  2. Ensure that the label has been applied.
    $ kubectl get nodes --show-labels
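
You can also filter nodes by the label directly (an optional check, not part of the original steps):

$ kubectl get nodes -l hardware-type=NVIDIAGPU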

Initialize Prometheus Operator

The Prometheus Operator provides definitions and deployment of Prometheus instances. More information about the Prometheus Operator is available on the project site.

  1. Install the Python virtual environment.
    $ sudo apt install -y python3 python3-pip unzip
    $ sudo -H pip3 install virtualenv
  2. Download the Prometheus Operator v0.18.1 release archive (which includes kube-prometheus) and extract it.
    $ wget https://github.com/coreos/prometheus-operator/archive/v0.18.1.zip
    $ unzip -q v0.18.1.zip
  3. Set up DCGM and the dashboard configuration.
    $ cd prometheus-operator-*/contrib/kube-prometheus
    $ cp /etc/kubeadm/dcgm/node-exporter-daemonset.yaml manifests/node-exporter/node-exporter-daemonset.yaml
    $ cp /etc/kubeadm/dcgm/nodes.dashboard.py assets/grafana/nodes.dashboard.py
  4. Deploy the custom kube-prometheus.
    $ sudo ./hack/scripts/generate-manifests.sh
    $ sudo ./hack/cluster-monitoring/deploy
  5. Check the status of the pods. Note that it may take a few minutes for the components to initialize and start running.
    $ kubectl get pods -n monitoring
    Output is similar to the following:
    NAME                                   READY     STATUS    RESTARTS   AGE
    alertmanager-main-0                    2/2       Running   0          1h
    alertmanager-main-1                    2/2       Running   0          1h
    alertmanager-main-2                    2/2       Running   0          1h
    grafana-6f6d695747-72k6x               1/1       Running   0          1h
    kube-state-metrics-5d76466899-9cnv5    4/4       Running   0          1h
    node-exporter-hxms7                    3/3       Running   0          1h
    prometheus-k8s-0                       2/2       Running   0          1h
    prometheus-k8s-1                       2/2       Running   0          1h
    prometheus-operator-85c46d75d7-qx8wg   1/1       Running   0          1h
  6. Ensure that there are no errors in the logs.
    $ NODE_EXPORTER_POD=`kubectl get pods -n monitoring | grep node-exporter | tr -s ' ' | cut -f 1 -d ' '`
    $ GRAFANA_POD=`kubectl get pods -n monitoring | grep grafana | tr -s ' ' | cut -f 1 -d ' '`
    $ kubectl logs -n monitoring $NODE_EXPORTER_POD -c nvidia-dcgm-exporter
    $ kubectl logs -n monitoring $GRAFANA_POD
  7. Forward the port for Prometheus, so that the Prometheus dashboard can be viewed in a browser.
    $ kubectl -n monitoring port-forward prometheus-k8s-0 9090 &
  8. Forward the port for Grafana.
    $ kubectl -n monitoring port-forward $(kubectl get pods -n monitoring -lapp=grafana -ojsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') 3000 &
  9. Open a browser window and go to http://localhost:3000 to view the Nodes Dashboard in Grafana. To view the GPU metrics in Grafana, run the NBody CUDA sample provided in the examples directory.
    $ kubectl create -f /etc/kubeadm/examples/nbody.yml

5.4. CRI-O Runtime Preview Feature Support

CRI-O is a lightweight container runtime for Kubernetes that serves as an alternative to Docker. It is an implementation of the Kubernetes CRI (Container Runtime Interface) that enables the use of OCI (Open Container Initiative) compatible runtimes. For more information on CRI-O, visit the CRI-O website.

Install CRI-O

  1. Add the repository.
    $ sudo add-apt-repository -y ppa:alexlarsson/flatpak
  2. Update the apt package index.
    $ sudo apt-get update
  3. Install the dependencies.
    $ sudo apt-get install -y btrfs-tools git libassuan-dev libdevmapper-dev \
         libglib2.0-dev libc6-dev libgpgme11-dev libgpg-error-dev libseccomp-dev \
         libselinux1-dev pkg-config go-md2man runc libostree-dev libapparmor-dev
  4. Install Golang and set up the PATH and GOPATH environment variables for the current user and the root user.
    $ wget https://dl.google.com/go/go1.9.3.linux-amd64.tar.gz
    $ sudo tar -C /usr/local -xzf go1.9.3.linux-amd64.tar.gz
    
    $ export GOPATH=~/go
    $ export PATH=/usr/local/go/bin:/usr/bin:$GOPATH/bin:$PATH
    
    $ echo 'GOPATH=~/go' >> ~/.bashrc
    $ echo 'PATH=/usr/local/go/bin:/usr/bin:$GOPATH/bin:$PATH' >> ~/.bashrc
    
    $ echo "GOPATH=$HOME/go" | sudo tee --append /root/.bashrc
    $ echo 'PATH=/usr/local/go/bin:/usr/bin:$GOPATH/bin:$PATH' | sudo tee --append /root/.bashrc
  5. Install the CRI-O tools.
    $ git clone https://github.com/projectatomic/skopeo $GOPATH/src/github.com/projectatomic/skopeo
    $ cd $GOPATH/src/github.com/projectatomic/skopeo && make binary-local
  6. Install and set up CRI-O.
    $ mkdir -p $GOPATH/src/github.com/kubernetes-incubator && cd $_
    $ git clone https://github.com/kubernetes-incubator/cri-tools && cd cri-tools
    $ git checkout v1.0.0-alpha.0 && make
    $ sudo cp ~/go/bin/crictl /usr/bin/crictl
    
    $ cd $GOPATH/src/github.com/kubernetes-incubator
    $ git clone https://github.com/kubernetes-incubator/cri-o && cd cri-o
    $ git checkout release-1.9
    $ make install.tools && make && sudo make install
  7. Set up runc container runtime.
    $ wget https://github.com/opencontainers/runc/releases/download/v1.0.0-rc4/runc.amd64
    $ chmod +x runc.amd64
    $ sudo mv runc.amd64 /usr/bin/runc
  8. Set up the different policies and hooks.
    $ sudo mkdir -p /etc/containers /etc/crio /etc/cni/net.d/
    $ sudo mkdir -p /usr/share/containers/oci/hooks.d /etc/containers/oci/hooks.d/
    
    $ sudo cp seccomp.json /etc/crio/seccomp.json
    $ sudo cp test/policy.json /etc/containers/policy.json
    $ sudo cp contrib/cni/* /etc/cni/net.d/
    $ sudo cp /etc/kubeadm/crio/nvidia.json /etc/containers/oci/hooks.d/
    $ sudo cp /etc/kubeadm/crio/nvidia-crio-hook.sh /usr/bin/nvidia-crio-hook.sh

Run the CRI-O Service

  1. Create a systemd service for CRI-O.
    $ sudo sh -c 'echo "[Unit]
    Description=OCI-based implementation of Kubernetes Container Runtime Interface
    Documentation=https://github.com/kubernetes-incubator/cri-o
    
    [Service]
    ExecStart=/usr/local/bin/crio --log-level debug
    Restart=on-failure
    RestartSec=5
    
    [Install]
    WantedBy=multi-user.target" > /etc/systemd/system/crio.service'
  2. Start the systemd service.
    $ sudo systemctl enable crio
    $ sudo systemctl start crio
  3. Ensure that the daemon is running.
    $ sudo crictl --runtime-endpoint /var/run/crio/crio.sock info
    Output is similar to the following:
    {
      "status": {
        "conditions": [
          {
            "type": "RuntimeReady",
            "status": true,
            "reason": "",
            "message": ""
          },
          {
            "type": "NetworkReady",
            "status": true,
            "reason": "",
            "message": ""
          }
        ]
      }
    }

Configure the Kubelet to Use CRI-O

  1. Replace the kubelet systemd drop-in file at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. The new version should have the following contents:
    [Service]
    Wants=docker.socket crio.service
    
    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
    Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
    Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
    Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
    Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
    Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
    Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
    
    Environment="KUBELET_CRIO_ARGS=--container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-request-timeout=10m"
    
    ExecStart=
    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS $KUBELET_CRIO_ARGS
  2. Reload and restart Kubelet.
    $ sudo systemctl daemon-reload
    $ sudo systemctl restart kubelet
  3. Join your cluster with the Kubeadm join command.
    $ sudo kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash> --skip-preflight-checks
  4. Alternatively, you can initialize the cluster using kubeadm.
    $ sudo kubeadm init --ignore-preflight-errors=all --config /etc/kubeadm/config.yml
    $ mkdir -p $HOME/.kube
    $ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
    $ sudo chown -R $USER:$USER $HOME/.kube/
    
    $ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
  5. If using a single node cluster, you may need to untaint the master.
    $ kubectl taint nodes --all node-role.kubernetes.io/master-
  6. Check if all the control plane components are running on the master.
    $ kubectl get pods --all-namespaces
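
To confirm that the kubelet registered the node with CRI-O rather than Docker, you can inspect the runtime reported by the node (a hedged check; the exact output format varies by Kubernetes version):

$ kubectl describe nodes | grep "Container Runtime Version"

The value should begin with cri-o:// for nodes running the CRI-O runtime.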

Enable GPUs

Make sure GPU support has been properly set up by running a simple CUDA container. One is provided in the artifacts you downloaded (you will need a GPU with at least 8 GB of memory). There are also other examples available in the examples directory.
  1. Start the CUDA sample workload.
    $ kubectl create -f /etc/kubeadm/examples/pod.yml
  2. When the pod is running, you can execute the nvidia-smi command inside the container.
    $ kubectl exec -it gpu-pod nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.125                Driver Version: 384.125                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P0    20W / 300W |     10MiB / 16152MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

6. Troubleshooting

6.1. Package Installation

This topic describes common package installation issues and workarounds for those issues.

apt-update Warnings or Package Installation Errors

Check if the official GPG keys and repositories have been added correctly.

To Verify if Kubernetes (kubelet, kubectl, and kubeadm) is Installed Properly

Verify correct installation with the following commands.

$ dpkg -l '*kube*' | grep +nvidia
$ ls /etc/kubeadm

6.2. Cluster Initialization

This topic describes common cluster initialization issues and workarounds for those issues.

If the initialization of the cluster (kubeadm init) fails

Check status and logs with the following commands:

$ sudo systemctl status kubelet 
$ sudo journalctl -xeu kubelet
$ kubectl cluster-info dump

Check for commonly encountered network failures during the initialization process:

  • Check if the port is already in use with netstat -lptn.
  • Check for timeouts with ufw status (firewall).

Restart kubeadm with the following commands. (It might be necessary to start kubelet before re-running kubeadm init.)

$ sudo systemctl start kubelet
$ sudo kubeadm init --ignore-preflight-errors=all --config /etc/kubeadm/config.yml
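
If a previous kubeadm init attempt left state behind, it may also be necessary to reset the node before reinitializing (standard kubeadm behavior, not specific to this guide):

$ sudo kubeadm reset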

Remove the cluster administration configuration for your user account with the following command:

$ rm -r $HOME/.kube

Set up the account again as described in section 4.1.2 of this document.

Verify User Account Permissions for Cluster Administration ($HOME/.kube)

Ensure that your user account owns the cluster administration files with the following command:
$ sudo chown -R $(id -u):$(id -g) $HOME/.kube

Determine If the Pod Status is Other Than "Running"

Determine pod status with the following commands:

$ kubectl describe pod POD_NAME
$ kubectl get events --all-namespaces
$ kubectl logs -n NAMESPACE POD_NAME -c CONTAINER

Validation Errors

When running a GPU pod, you may encounter the following error:

error: error validating "/etc/kubeadm/pod.yml": error validating data:
ValidationError(Pod.spec.extendedResources[0]): unknown field "resources" in
 io.k8s.api.core.v1.PodExtendedResource

Ensure that you have installed the Kubernetes components (kubectl, kubelet and kubeadm) from NVIDIA and not upstream Kubernetes.

6.3. Monitoring Issues

This section describes common monitoring issues and workarounds.

Common Grafana Errors

Check if the Grafana pod is deployed and running:
$ kubectl get pods -n monitoring | grep grafana

Check for any errors when generating the manifests:
$ sudo ./hack/scripts/generate-manifests.sh
$ echo $?

The echo command should return 0. If it does not, the Python virtualenv installation may have failed.

Port Forwarding Errors

Check whether port 3000 is already in use with the netstat -lptn command.

Kill the port-forwarding process with the following commands:
$ jobs | grep grafana 
$ kill %JOB_ID

Destroying Pods in the Monitoring Namespace

Destroy pods in the monitoring namespace with the following commands:
$ sudo ./hack/cluster-monitoring/teardown
$ kubectl delete namespace monitoring

Notices

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE, AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE (INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for any specified use without further testing or modification. Testing of all parameters of each product is not necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and fit for the application planned by customer and to do the necessary testing for the application in order to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and/or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license, either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.