About the NVIDIA GPU Operator


Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, InfiniBand adapters, and other devices through the device plugin framework. However, configuring and managing nodes with these hardware resources requires setting up multiple software components, such as drivers, container runtimes, and other libraries, which is difficult and error-prone. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labeling using GPU Feature Discovery (GFD), DCGM-based monitoring, and others.


Documentation

Browse the following documents for getting-started instructions, platform support, and release notes.

Getting Started

The Installing the NVIDIA GPU Operator guide includes information on installing the GPU Operator in a Kubernetes cluster.

Release Notes

Refer to Release Notes for information about releases.

Platform Support

The Platform Support page describes the supported platform configurations.

Pod Security Context of the Operator and Operands

Several of the NVIDIA GPU Operator operands, such as the driver containers and container toolkit, require the following elevated privileges:

  • privileged: true

  • hostPID: true

  • hostIPC: true

The elevated privileges are required for the following reasons:

  • Access to the host file system and hardware devices, such as NVIDIA GPUs.

  • Restart system services such as containerd.

  • Permit users to list all GPU clients using the nvidia-smi utility.
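As a rough illustration of where these three settings sit in a Kubernetes pod specification, the sketch below models a pod spec as a Python dictionary and reports which elevated settings are enabled. In the Kubernetes Pod API, hostPID and hostIPC are pod-level fields, while privileged is set per container under securityContext. The container name, image reference, and the elevated_settings helper are hypothetical and are not taken from the GPU Operator manifests.

```python
def elevated_settings(pod_spec):
    """Return the names of the elevated settings enabled in a pod spec dict."""
    found = []
    # hostPID and hostIPC are fields of the pod spec itself.
    if pod_spec.get("hostPID"):
        found.append("hostPID")
    if pod_spec.get("hostIPC"):
        found.append("hostIPC")
    # privileged is set per container, under that container's securityContext.
    for container in pod_spec.get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            found.append("privileged")
            break
    return found


# Hypothetical pod spec fragment; the name and image are placeholders only.
example_pod_spec = {
    "hostPID": True,
    "hostIPC": True,
    "containers": [
        {
            "name": "example-operand",
            "image": "example.invalid/operand:tag",
            "securityContext": {"privileged": True},
        }
    ],
}

print(elevated_settings(example_pod_spec))
```

A check of this kind can help an administrator audit which workloads in a namespace run with elevated privileges. It is only a sketch and does not replace an admission policy.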

Only the Kubernetes cluster administrator needs to access or manage the Operator namespace. As a best practice, establish proper security policies and prevent any other users from accessing the Operator namespace.

Licenses and Contributing

The NVIDIA GPU Operator source code is licensed under Apache 2.0, and contributions are accepted with a Developer Certificate of Origin (DCO). See the contributing document for more information on how to contribute and on the release artifacts.

The NVIDIA GPU Operator includes components governed by the following NVIDIA End User License Agreements. By installing and using the GPU Operator, you accept the terms and conditions of these licenses.

Since the underlying images may include components licensed under open-source licenses such as GPL, the sources for these components are archived on the CUDA opensource index.

The following table outlines the licenses for the components.

| Component | Artifact Type | Artifact Licenses | Source Code License |
|---|---|---|---|
| NVIDIA GPU Operator | Helm Chart | Apache 2.0 | Apache 2.0 |
| NVIDIA GPU Operator | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA GPU Feature Discovery | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA GPU Driver | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE and NVIDIA GPU Driver | Apache 2.0 |
| NVIDIA Container Toolkit | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA Kubernetes Device Plugin | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA MIG Manager for Kubernetes | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| Validator for NVIDIA GPU Operator | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA DCGM | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | NVIDIA Data Center GPU Manager License |
| NVIDIA DCGM Exporter | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA Driver Manager for Kubernetes | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA KubeVirt GPU Device Plugin | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | BSD 3-Clause “New” or “Revised” License |
| NVIDIA vGPU Device Manager | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA FS | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE and NVIDIA GPU Driver | GPL v2 |
| NVIDIA Confidential Computing Manager for Kubernetes | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |
| NVIDIA Kata Manager for Kubernetes | Image | NVIDIA DEEP LEARNING CONTAINER LICENSE | Apache 2.0 |