NVIDIA AI Enterprise offers the flexibility to run AI workloads within VMs. If you prefer containers, upstream Kubernetes is also supported. VMware Tanzu support is upcoming.

With Kubernetes, IT administrators can automate the deployment, scaling, and management of containerized AI applications and frameworks.

What is Kubernetes?

Kubernetes is an open-source container orchestration platform that simplifies the work of DevOps engineers. Applications are deployed on Kubernetes as logical units that are easy to manage, can be upgraded with zero downtime (rolling upgrades), and are kept highly available through replication. Deploying Triton Inference Server on Kubernetes brings these same benefits to AI in the enterprise. The NVIDIA GPU Operator is used to manage GPU resources in the Kubernetes cluster.
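
As a rough illustration of how replication and rolling upgrades are expressed, the sketch below builds a Kubernetes Deployment manifest as a Python dictionary and prints it as JSON. The function name, the application name "inference-demo", and the image reference are hypothetical placeholders, not values from NVIDIA AI Enterprise. The maxUnavailable and maxSurge settings are just one possible choice for a zero-downtime rollout.

```python
import json


def make_deployment(name, image, replicas=3):
    """Build an apps/v1 Deployment manifest as a plain dict.

    `name` and `image` are caller-supplied; the values used below are
    hypothetical placeholders for illustration only.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Replication: Kubernetes keeps this many pod copies running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Rolling upgrade: never take a pod down before its
            # replacement is ready, adding at most one extra pod at a time.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical name and image, chosen only for the example.
manifest = make_deployment("inference-demo", "registry.example.com/inference:1.0")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f, assuming the image reference is replaced with a real one.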

What is Helm?

Helm is a package manager for applications that run on Kubernetes. It plays a role similar to that of DEB/RPM packages on Linux, or JAR/WAR archives for Java-based applications. Helm charts help you define, install, and upgrade even the most complex Kubernetes applications.
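
To make the install-and-upgrade workflow concrete, the following sketch assembles a helm upgrade --install command line in Python without running it. The helper name and the release, chart, and namespace values are hypothetical. The flags shown (--install, --namespace, --create-namespace, --set) are standard Helm 3 options.

```python
import shlex


def helm_upgrade_install(release, chart, namespace, values=None):
    """Return the argv for installing a chart, or upgrading it if the
    release already exists. Nothing is executed here."""
    argv = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace, "--create-namespace",
    ]
    # Each override becomes a --set key=value pair; sorted for stable output.
    for key, value in sorted((values or {}).items()):
        argv += ["--set", f"{key}={value}"]
    return argv


# Hypothetical release, chart, and namespace names for illustration.
cmd = helm_upgrade_install(
    "my-release", "example-repo/my-chart", "demo", {"replicaCount": 2}
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where the helm client is installed and the chart repository has already been added.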

What is the NVIDIA GPU Operator?

The GPU Operator allows administrators of Kubernetes clusters to manage GPU nodes just like CPU nodes in the cluster. Instead of providing a special OS image for GPU nodes, administrators can deploy a standard OS image for both CPU and GPU nodes and then rely on the GPU Operator to provide the required software components for GPUs. These components include the NVIDIA drivers, the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, and DCGM-based monitoring, among others.
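
Once the device plugin is running, workloads ask for GPUs through the nvidia.com/gpu extended resource. The sketch below builds a minimal Pod manifest that requests one GPU. The function name, pod name, and image reference are hypothetical placeholders, and the example assumes a cluster where the GPU Operator has already been deployed.

```python
import json


def make_gpu_pod(name, image, gpus=1):
    """Build a v1 Pod manifest that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resource advertised by the NVIDIA device
                    # plugin; GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for the example.
gpu_pod = make_gpu_pod("gpu-demo", "registry.example.com/cuda-sample:1.0")
print(json.dumps(gpu_pod, indent=2))
```

The scheduler places such a pod only on a node that advertises enough free GPUs, which is what lets GPU nodes be managed alongside CPU nodes without a special OS image.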


What is the NVIDIA Network Operator?

The NVIDIA Network Operator uses Kubernetes custom resources and the Operator framework to configure fast networking, RDMA, and GPUDirect. Its goal is to install the host networking components required to enable RDMA and GPUDirect in a Kubernetes cluster. It does so by configuring a high-speed data path for I/O-intensive workloads on a secondary network in each cluster node.