OpenShift on NVIDIA GPU Accelerated Clusters

This document serves as a guide to installing Red Hat OpenShift 4.1 and using it with NVIDIA GPUs.

1. Introduction

Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications.

Red Hat OpenShift is a security-focused, enterprise-grade, hardened Kubernetes platform for deploying and managing Kubernetes clusters at scale, developed and supported by Red Hat.

Red Hat OpenShift 4.1 includes enhancements to Kubernetes so that users can easily configure and use GPU resources for accelerating workloads such as deep learning.

2. Requirements

  • Red Hat Subscription

    Installing and running OpenShift requires a Red Hat account and additional subscriptions.

  • Internet Access

To perform subscription management, including obtaining the legal entitlements for your Red Hat software, your systems must have direct internet access during cluster installation.

  • Hardware

    Deployment system
      Description: System for executing the deployment.
      Minimum number of systems: BYO
      Recommended specs: any macOS or Linux system with 300 MB of disk space

    Load balancer
      Description: The load balancer allows external access to the OpenShift Container Platform cluster and distributes work across the nodes of the cluster.
      Minimum number of systems: BYO or 1
      Recommended specs: Xeon Gold 5118 (12 cores, 16.5 MB cache, 2.3 GHz/3.2 GHz) / 32 GB RAM (distributed) / single 40/50 GbE NIC / 800 GB SAS SSD

    Bootstrap
      Description: Because each machine in the cluster requires information about the cluster when it is provisioned, OpenShift Container Platform uses a temporary bootstrap machine during initial configuration to provide the required information to the permanent control plane deployed on the masters. After the cluster machines initialize, the bootstrap machine is destroyed and can be reallocated.
      Minimum number of systems: 1 (temporary)

    Master (control plane)
      Description: The master nodes run the services that are required to control the Kubernetes cluster. In OpenShift Container Platform, the master machines are the control plane.
      Minimum number of systems: 3
      Recommended specs: dual Xeon Gold 5118 (12 cores, 16.5 MB cache, 2.3 GHz/3.2 GHz) / 32 GB RAM (distributed) / single 40/50 GbE NIC / 800 GB SAS SSD

    Worker (compute)
      Description: The worker nodes are where the actual workloads requested by Kubernetes users run and are managed. The worker nodes advertise their capacity, and the scheduler, which is part of the master services, determines on which nodes to start containers and Pods.
      Minimum number of systems: 2+
      Recommended specs: OEM NGC-Ready T4 or V100 systems with dual 50 GbE NICs, DGX-1, or DGX-2

  • Network Connectivity

    All systems require access to DHCP and DNS servers. The DNS records the cluster expects are sketched after this list.

  • HTTP server

    An HTTP server that is accessible from the deployment system is necessary for deploying the systems. A minimal way to serve the files is sketched after this list.

  • PXE or iPXE server

    A PXE or iPXE server is needed for deploying the systems; an example boot entry is sketched after this list. Note that Red Hat also provides instructions for deployment without PXE, but that method requires additional steps to set up the systems.

  • RHCOS

    The compressed RHCOS metal BIOS and UEFI images, along with the kernel and initramfs files from the Red Hat portal, are needed for deploying the systems. These should be placed on your HTTP server.

  • (Optional) NFS Storage

    For production clusters, access to NFS is necessary for setting up a persistent volume (PV) of at least 100 GiB; an example PV definition follows this list.
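
OpenShift 4.1 expects a specific set of DNS records for the cluster. The BIND-style zone fragment below is a minimal sketch, assuming a hypothetical cluster named mycluster under the domain example.com with placeholder addresses; consult the Red Hat documentation for the authoritative list.

; API load balancer (external and internal API endpoints).
api.mycluster.example.com.      IN A   192.168.1.5
api-int.mycluster.example.com.  IN A   192.168.1.5
; Ingress load balancer (wildcard for application routes).
*.apps.mycluster.example.com.   IN A   192.168.1.6
; One record per master for etcd, plus the SRV records used for etcd discovery.
etcd-0.mycluster.example.com.   IN A   192.168.1.10
etcd-1.mycluster.example.com.   IN A   192.168.1.11
etcd-2.mycluster.example.com.   IN A   192.168.1.12
_etcd-server-ssl._tcp.mycluster.example.com. IN SRV 0 10 2380 etcd-0.mycluster.example.com.
_etcd-server-ssl._tcp.mycluster.example.com. IN SRV 0 10 2380 etcd-1.mycluster.example.com.
_etcd-server-ssl._tcp.mycluster.example.com. IN SRV 0 10 2380 etcd-2.mycluster.example.com.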
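
If no dedicated web server is available, Python's built-in HTTP server is a quick way to serve the RHCOS images and Ignition files during deployment. This is a convenience sketch, not a production setup, and the directory path is hypothetical:

$ cd /srv/openshift-artifacts   # directory holding the RHCOS images and Ignition files
$ python3 -m http.server 8080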
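
An iPXE boot entry for RHCOS has roughly the following shape. This is a sketch assuming a hypothetical HTTP server at 192.168.1.2:8080 and generic RHCOS file names; check the Red Hat documentation for the exact kernel arguments, and point coreos.inst.ignition_url at the Ignition file for each machine type (bootstrap.ign, master.ign, or worker.ign).

#!ipxe
# Fetch the RHCOS installer kernel and pass it the install arguments.
kernel http://192.168.1.2:8080/rhcos-installer-kernel ip=dhcp rd.neednet=1 console=tty0 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.1.2:8080/rhcos-metal-bios.raw.gz coreos.inst.ignition_url=http://192.168.1.2:8080/bootstrap.ign
initrd http://192.168.1.2:8080/rhcos-installer-initramfs.img
boot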
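
For the optional NFS storage, a persistent volume for the image registry can be declared along the following lines (a sketch, assuming a hypothetical export nfs.example.com:/exports/registry):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 100Gi                # the documented minimum for the registry
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com       # hypothetical NFS server
    path: /exports/registry       # hypothetical export path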

3. Installation

[Figure: the process of creating the compute machines]

The following is a high-level overview of the installation process.
  1. Configure networking (DHCP and system to system communication).
  2. Provision two layer-4 load balancers and DNS (an HAProxy sketch follows this list).
  3. Download the installer to the deployment system.
  4. Create the installation configuration file (a sample install-config.yaml follows this list).
  5. Generate Ignition files for the systems (bootstrap, masters, and workers) and upload them to the HTTP server (see the commands after this list).
    Note: The bootstrap Ignition file is only valid for 24 hours because of the certificates it contains.
  6. Configure systems and images for PXE boot.
  7. Create the bootstrap, master and compute systems.
  8. Create the cluster by starting the installation process.
  9. Remove the bootstrap machine from the load balancer.
  10. Log in to the cluster, export the Kubernetes credentials for safekeeping, approve pending CSRs (certificate signing requests) if necessary, and confirm that all operators come online (example commands follow this list).
  11. Configure image registry storage (see the patch command after this list).
    • For non-production servers, an empty directory can be used.
    • For production servers, a persistent volume (PV) is necessary (100GiB minimum, NFS).
  12. The installation is complete when all components report as available and the Kubernetes API server is communicating with the cluster's Pods.
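
For step 2, HAProxy is one common choice for the layer-4 load balancers. The fragment below is a minimal sketch assuming hypothetical host names under mycluster.example.com; it forwards the Kubernetes API (port 6443) and the machine config server (port 22623) to the bootstrap and master machines. Ingress traffic (ports 80 and 443) is balanced across the workers using the same pattern.

# Sketch of a layer-4 (mode tcp) HAProxy configuration; host names are hypothetical.
frontend api
    bind *:6443
    mode tcp
    default_backend api
backend api
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.mycluster.example.com:6443 check   # remove in step 9
    server master-0 master-0.mycluster.example.com:6443 check
    server master-1 master-1.mycluster.example.com:6443 check
    server master-2 master-2.mycluster.example.com:6443 check
frontend machine-config
    bind *:22623
    mode tcp
    default_backend machine-config
backend machine-config
    mode tcp
    balance roundrobin
    server bootstrap bootstrap.mycluster.example.com:22623 check  # remove in step 9
    server master-0 master-0.mycluster.example.com:22623 check
    server master-1 master-1.mycluster.example.com:22623 check
    server master-2 master-2.mycluster.example.com:22623 check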
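
For step 4, an install-config.yaml for a user-provisioned bare-metal cluster (platform "none") has roughly the following shape; the base domain, cluster name, pull secret, and SSH key are placeholders to replace with your own values.

apiVersion: v1
baseDomain: example.com            # placeholder base domain
metadata:
  name: mycluster                  # placeholder cluster name
compute:
- name: worker
  hyperthreading: Enabled
  replicas: 0                      # workers are provisioned manually on bare metal
controlPlane:
  name: master
  hyperthreading: Enabled
  replicas: 3
networking:
  networkType: OpenShiftSDN
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '<pull secret from the Red Hat portal>'
sshKey: '<SSH public key for the core user>'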
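
For step 5, the installer consumes install-config.yaml and emits the Ignition files, which are then copied to the HTTP server. The work directory and server path below are hypothetical.

$ ./openshift-install create ignition-configs --dir=mycluster
$ scp mycluster/*.ign user@http-server:/var/www/html/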
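
For step 10, the credentials, pending CSRs, and operator status can be handled with commands along these lines (the installation directory is hypothetical):

$ export KUBECONFIG=mycluster/auth/kubeconfig
$ oc get csr
$ oc adm certificate approve <csr_name>
$ oc get clusteroperators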
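
For step 11, storage for the image registry is configured through the image registry operator. For a non-production cluster, the following patch selects an empty directory (all images are lost if the registry pod restarts):

$ oc patch configs.imageregistry.operator.openshift.io cluster \
    --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'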

The detailed installation instructions are maintained by Red Hat on their documentation website; for more information, refer to the NVIDIA blog about setting up OpenShift on VMs.

4. GPU support

Note: The functionality described in this section is considered pre-release, and the manual instructions below will be replaced with operators in the coming months.

The Node Feature Discovery Operator (NFD) detects hardware features and configurations in the OpenShift cluster, such as CPU type and extensions or, in our case, NVIDIA GPUs.
$ git clone https://github.com/openshift/cluster-nfd-operator
$ make -C cluster-nfd-operator deploy
After the installation completes, the NVIDIA GPU shows up in the feature list of the worker nodes under vendor ID 0x10de, as shown below.
$ oc describe node worker-0 | grep 10de
feature.node.kubernetes.io/pci-10de.present=true
The Special Resource Operator (SRO) activates when a specific hardware component is detected in the cluster and installs the drivers and other software that component requires. For NVIDIA GPUs, it manages the installation of all required NVIDIA drivers and software components.
$ git clone https://github.com/zvonkok/special-resource-operator
$ make -C special-resource-operator deploy
To validate the installation, use the following nvidia-smi.yaml file to define a Kubernetes Pod that allocates a single GPU and runs the nvidia-smi command.
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  containers:
  - name: nvidia-smi
    image: nvidia/cuda
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1    # allocate a single GPU to the pod
      requests:
        nvidia.com/gpu: 1

Use oc create -f nvidia-smi.yaml to create and run the pod, and monitor the progress of the pod creation with oc describe pod nvidia-smi.

When the pod completes, the output of the nvidia-smi command can be viewed with oc logs nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
| N/A   36C    P0    41W / 300W |      0MiB / 16130MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The pod can be deleted with oc delete pod nvidia-smi.