What is Kubernetes?
Kubernetes is an open-source container orchestration platform that simplifies the work of a DevOps engineer. Applications are deployed on Kubernetes as logical units that are easy to manage and upgrade, with zero downtime through rolling upgrades and high availability through replication. NVIDIA AI Enterprise applications are available as containers and can be deployed in a cloud-native way on Kubernetes. For example, deploying Triton Inference Server on Kubernetes brings these same benefits to AI in the enterprise. To easily manage GPU resources in the cluster, the NVIDIA GPU Operator is used.
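As a sketch, a containerized inference server could be deployed with a manifest like the one below. The names, image tag, and replica count are illustrative; the key point is that once the GPU Operator is installed, GPUs appear as a schedulable `nvidia.com/gpu` resource, and the `RollingUpdate` strategy plus replication give the zero-downtime and high-availability behavior described above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-inference-server     # illustrative name
spec:
  replicas: 2                       # replication for high availability
  selector:
    matchLabels:
      app: triton
  strategy:
    type: RollingUpdate             # zero-downtime rolling upgrades
  template:
    metadata:
      labels:
        app: triton
    spec:
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:22.04-py3   # example tag
        resources:
          limits:
            nvidia.com/gpu: 1       # GPU resource exposed via the GPU Operator
```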
Setting up HPC clusters is often difficult, and AI and HPC applications carry complex software dependencies. Enterprises usually dedicate infrastructure specifically to running these HPC applications. At the same time, Kubernetes is increasingly being adopted in datacenters for general-purpose enterprise applications, and it is an effective general-purpose system for container orchestration. Containers make complex software dependencies easy to manage, so containerizing AI and HPC software simplifies both setup and maintenance. Finally, orchestrating these HPC applications on the same platform as the rest of our applications brings flexibility and optimal use of resources.
Kubernetes operators act as automated Site Reliability Engineers for their applications. For example, installing a MySQL operator makes it easy to set up a containerized MySQL database server on Kubernetes, and the MPI Operator makes it easy to run allreduce-style distributed training. Similarly, the GPU Operator allows Kubernetes cluster administrators to manage GPU nodes just like CPU nodes. The AI practitioner does not need to concern themselves with installing the GPU Operator; that is handled by the DevOps admin maintaining the cluster, and for this lab the GPU Operator has been installed on your cluster automatically. Together, the GPU and MPI Operators enable your Kubernetes cluster to run HPC-style workloads on the GPU.
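With the MPI Operator installed, a distributed training run is described as an `MPIJob` resource. The following is a minimal sketch; the `apiVersion` shown matches recent releases of the Kubeflow MPI Operator but may differ on older installs, and the image name and training script are hypothetical placeholders:

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: allreduce-training          # illustrative name
spec:
  slotsPerWorker: 1                 # one MPI slot (GPU) per worker pod
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: launcher
            image: my-registry/horovod-training:latest   # hypothetical image
            command: ["mpirun", "-np", "2", "python", "train.py"]
    Worker:
      replicas: 2                   # two workers for allreduce-style training
      template:
        spec:
          containers:
          - name: worker
            image: my-registry/horovod-training:latest   # hypothetical image
            resources:
              limits:
                nvidia.com/gpu: 1   # one GPU per worker
```

The operator launches the worker pods, wires up the MPI hostfile, and runs `mpirun` from the launcher, so the practitioner only describes the desired job shape.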
Organizations are made up of teams with varying hardware and software requirements depending on the applications they work on. For example, a vision engineering team might be deploying a new object detection model that requires a certain hardware configuration (10 CPU cores, 64 GB of memory, and 10 GB of GPU memory). The team also does not want the application to share GPU memory with other applications on the cluster; instead, it wants a slice of the GPU with guaranteed performance. The QA team might have similar requirements but want a separate Kubernetes cluster for testing. The NVIDIA AI Enterprise stack on VMware Tanzu solves this problem by creating GPU-accelerated Kubernetes clusters on demand on virtual machines.
Each virtual machine Kubernetes node is created from a hardware template called a VM class. A VM class is essentially a T-shirt size for the hardware allocated to the VM (number of CPU cores, main memory, GPU memory, and so on). Virtual machines created this way get a completely isolated slice of the GPU, called a vGPU. In this lab, you will use a time-sliced vGPU, where GPU memory is statically partitioned among the VMs and the GPU cores are time-sliced. Tanzu creates Kubernetes clusters on these virtual machines, which leads to optimal use of GPU resources and true OS-level isolation.
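To make the VM-class idea concrete, a Tanzu Kubernetes cluster request might look like the sketch below. This assumes the `v1alpha1` `TanzuKubernetesCluster` API of vSphere with Tanzu; the cluster name, Kubernetes version, storage class, and VM class names (`best-effort-medium`, `gpu-medium`) are placeholders that would map to classes defined by the infrastructure admin:

```yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: gpu-cluster                 # illustrative cluster name
spec:
  distribution:
    version: v1.21                  # example Kubernetes release
  topology:
    controlPlane:
      count: 1
      class: best-effort-medium     # CPU-only VM class for the control plane
      storageClass: vsan-default    # hypothetical storage class name
    workers:
      count: 2
      class: gpu-medium             # hypothetical VM class with a vGPU profile
      storageClass: vsan-default
```

Each worker node is realized as a VM sized by its VM class, so the vision team and the QA team from the example above can each get their own isolated, GPU-accelerated cluster from the same physical hardware.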