Platform Support
NVIDIA GPU Operator Versioning
To understand the NVIDIA GPU Operator life cycle policy, it is important to know how the NVIDIA GPU Operator is versioned.
As of September 2022, the NVIDIA GPU Operator is versioned following a calendar schema. NVIDIA GPU Operator v22.9.0 is the first release to follow calendar versioning, and NVIDIA GPU Operator 1.11 is therefore the last release to follow the old versioning schema.
To see how to interpret an NVIDIA GPU Operator release that follows calendar versioning, consider v22.9.0 as an example.
The first two segments of the version, in the format YY.MM, represent the major version and indicate when the NVIDIA GPU Operator was initially released. In this example, the NVIDIA GPU Operator was released in September 2022. Zero padding is omitted from the month so that the version remains compatible with semantic versioning.
The third segment, '.0' in this example, represents a dot release. Dot releases typically include fixes for bugs or CVEs, but can also include minor features such as support for a new NVIDIA GPU driver.
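The scheme above can be sketched with a small, hypothetical parser (an illustration only, not part of any NVIDIA tooling):

```python
def parse_operator_version(version: str) -> tuple[int, int, int]:
    """Parse a calendar version such as 'v22.9.0' into (year, month, patch)."""
    major, minor, patch = version.lstrip("v").split(".")
    # YY maps to 20YY; the month carries no zero padding, which keeps the
    # string a valid semantic version (semver forbids leading zeros).
    return 2000 + int(major), int(minor), int(patch)

print(parse_operator_version("v22.9.0"))  # (2022, 9, 0)
```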
NVIDIA GPU Operator Life Cycle
The NVIDIA GPU Operator life cycle policy provides a predictable support policy and timeline of when new NVIDIA GPU Operator versions are released.
Starting with NVIDIA GPU Operator v23.3.0, a new major GPU Operator version is released every three months. The next major release of the NVIDIA GPU Operator is therefore scheduled for June 2023 and will be named v23.6.0.
Every major release of the NVIDIA GPU Operator, starting with v23.3.0, is maintained for six months. Bug fixes and CVE patches are released throughout those six months, while minor feature updates are released only within the first three months.
This life cycle allows users to run a given NVIDIA GPU Operator version for up to six months, and it provides a three-month window in which to plan the transition to the next major version.
The product life cycle and versioning are subject to change in the future.
Note
Upgrades are only supported within a major release or to the next major release.
| GPU Operator Version | Status | Details |
| --- | --- | --- |
| 23.9.x | Generally Available | Enters maintenance when v23.12.0 is released. |
| 23.6.x | Maintenance | Enters EOL when v23.12.0 is released. |
| 23.3.x and lower, 1.11.x and lower | EOL | |
GPU Operator Component Matrix
The following table shows the operands and default operand versions that correspond to a GPU Operator version.
When post-release testing confirms support for newer versions of operands, these updates are identified as recommended updates to a GPU Operator version. Refer to Upgrading the NVIDIA GPU Operator for more information.
| Component | Version |
| --- | --- |
| NVIDIA GPU Operator | v23.9.1 |
| NVIDIA GPU Driver | |
| NVIDIA Driver Manager for K8s | |
| NVIDIA Container Toolkit | |
| NVIDIA Kubernetes Device Plugin | |
| DCGM Exporter | |
| Node Feature Discovery | v0.14.2 |
| NVIDIA GPU Feature Discovery for Kubernetes | |
| NVIDIA MIG Manager for Kubernetes | |
| DCGM | |
| Validator for NVIDIA GPU Operator | v23.9.1 |
| NVIDIA KubeVirt GPU Device Plugin | |
| NVIDIA vGPU Device Manager | v0.2.4 |
| NVIDIA GDS Driver ¹ | |
| NVIDIA Kata Manager for Kubernetes | v0.1.2 |
| NVIDIA Confidential Computing Manager for Kubernetes | v0.1.1 |
¹ This release of the GDS driver requires that you use the NVIDIA Open GPU Kernel module driver for the GPUs. Refer to GPUDirect RDMA and GPUDirect Storage for more information.
Note
The driver version can differ when you use NVIDIA vGPU, because it depends on the driver version that you download from the NVIDIA vGPU Software Portal.
The GPU Operator is supported on all active NVIDIA datacenter production drivers. Refer to Supported Drivers and CUDA Toolkit Versions for more information.
Supported NVIDIA Data Center GPUs and Systems
The following NVIDIA data center GPUs are supported on x86 based platforms:
| Product | Architecture |
| --- | --- |
| NVIDIA GH200 ¹ | NVIDIA Grace Hopper |
¹ NVIDIA GH200 systems require the NVIDIA Open GPU Kernel module driver. You can install the open kernel modules by specifying the `driver.useOpenKernelModules=true` argument to the `helm` command. Refer to Chart Customization Options for more information.
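As an illustrative sketch, the flag can be passed at install time; the release name, namespace, and chart reference below are assumptions, not taken from this page:

```shell
# Enable the open GPU kernel modules when installing the GPU Operator.
# Release name and namespace are illustrative.
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.useOpenKernelModules=true
```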
| Product | Architecture |
| --- | --- |
| NVIDIA H800 | NVIDIA Hopper |
| NVIDIA DGX H100 | NVIDIA Hopper and NVSwitch |
| NVIDIA HGX H100 | NVIDIA Hopper and NVSwitch |
| NVIDIA H100, NVIDIA H100 NVL | NVIDIA Hopper |
| NVIDIA L40, NVIDIA L40S | NVIDIA Ada |
| NVIDIA L4 | NVIDIA Ada |
| NVIDIA DGX A100 | A100 and NVSwitch |
| NVIDIA HGX A100 | A100 and NVSwitch |
| NVIDIA A800 | NVIDIA Ampere |
| NVIDIA A100 | NVIDIA Ampere |
| NVIDIA A100X | NVIDIA Ampere |
| NVIDIA A40 | NVIDIA Ampere |
| NVIDIA A30 | NVIDIA Ampere |
| NVIDIA A30X | NVIDIA Ampere |
| NVIDIA A16 | NVIDIA Ampere |
| NVIDIA A10 | NVIDIA Ampere |
| NVIDIA A2 | NVIDIA Ampere |
Note
The Hopper (H100) GPU is supported only on x86 servers.
The GPU Operator supports DGX A100 with DGX OS 5.1+ and Red Hat OpenShift using Red Hat Core OS. For installation instructions, see Pre-Installed NVIDIA GPU Drivers and NVIDIA Container Toolkit for DGX OS 5.1+ and Introduction for Red Hat OpenShift.
| Product | Architecture |
| --- | --- |
| NVIDIA T4 | Turing |
| NVIDIA V100 | Volta |
| NVIDIA P100 | Pascal |
| NVIDIA P40 | Pascal |
| NVIDIA P4 | Pascal |
| Product | Architecture |
| --- | --- |
| NVIDIA RTX A6000 | NVIDIA Ampere / Ada |
| NVIDIA RTX A5000 | NVIDIA Ampere |
| NVIDIA RTX A4500 | NVIDIA Ampere |
| NVIDIA RTX A4000 | NVIDIA Ampere |
| NVIDIA RTX 8000 | Turing |
| NVIDIA RTX 6000 | Turing |
| NVIDIA RTX 5000 | Turing |
| NVIDIA RTX 4000 | Turing |
| NVIDIA T1000 | Turing |
| NVIDIA T600 | Turing |
| NVIDIA T400 | Turing |
Supported ARM Based Platforms
The following NVIDIA data center GPUs are supported:
| Product | Architecture |
| --- | --- |
| NVIDIA A100X | Ampere |
| NVIDIA A30X | Ampere |
| AWS EC2 G5g instances | Turing |
In addition to the products specified in the preceding table, any ARM based system that meets the following requirements is supported:
NVIDIA GPUs connected to the PCI bus.
A supported operating system such as Ubuntu or Red Hat Enterprise Linux.
Note
The GPU Operator only supports platforms using discrete GPUs. NVIDIA Jetson, or other embedded products with integrated GPUs, are not supported.
The R520 Data Center Driver is not supported for ARM.
Supported Deployment Options, Hypervisors, and NVIDIA vGPU Based Products
The GPU Operator has been validated in the following scenarios:
| Deployment Options |
| --- |
| Bare Metal |
| Virtual machines with GPU Passthrough |
| Virtual machines with NVIDIA vGPU based products |
Hypervisors (On-premises)
| Hypervisors |
| --- |
| VMware vSphere 7 and 8 |
| Red Hat Enterprise Linux KVM |
| Red Hat Virtualization (RHV) |
NVIDIA vGPU based products
| NVIDIA vGPU based products |
| --- |
| NVIDIA vGPU (NVIDIA AI Enterprise) |
| NVIDIA vCompute Server |
| NVIDIA RTX Virtual Workstation |
Note
The GPU Operator is supported with NVIDIA vGPU 12.0+.
Supported Operating Systems and Kubernetes Platforms
The GPU Operator has been validated in the following scenarios:
Note
The Kubernetes community supports only the last three minor releases as of v1.17. Older releases may be supported through enterprise distributions of Kubernetes such as Red Hat OpenShift.
| Operating System | Kubernetes | Red Hat OpenShift | VMware vSphere with Tanzu | Rancher Kubernetes Engine 2 | HPE Ezmeral Runtime Enterprise | Canonical MicroK8s |
| --- | --- | --- | --- | --- | --- | --- |
| Ubuntu 20.04 LTS ¹ | 1.22—1.28 | | 7.0 U3c, 8.0 U2 | 1.22—1.28 | | |
| Ubuntu 22.04 LTS ¹ | 1.22—1.28 | | 8.0 U2 | | | 1.26 |
| Red Hat Core OS | | 4.9—4.14 | | | | |
| Red Hat Enterprise Linux 8.4, 8.6—8.9 | 1.22—1.28 | | | 1.22—1.28 | | |
| Red Hat Enterprise Linux 8.4, 8.5 | | | | | 5.5 | |
¹ For Ubuntu 22.04 LTS, kernel version 5.15 is an LTS ESM kernel. For Ubuntu 20.04 LTS, kernel versions 5.4 and 5.15 are LTS ESM kernels. The GPU Driver containers support these Linux kernels. Refer to the kernel release schedule on Canonical's Ubuntu kernel lifecycle and enablement stack page for more information.
NVIDIA recommends disabling the automatic Linux kernel updates performed by the `unattended-upgrades` package, to prevent an upgrade to an unsupported kernel version.
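One way to do this on Ubuntu, as a sketch (the file path is the distribution default and the package names shown are typical examples, not taken from this page), is to blacklist the kernel packages in the unattended-upgrades configuration:

```
// /etc/apt/apt.conf.d/50unattended-upgrades (excerpt)
// Prevent unattended-upgrades from upgrading the kernel packages.
Unattended-Upgrade::Package-Blacklist {
    "linux-generic";
    "linux-image-generic";
    "linux-headers-generic";
};
```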
Note
Red Hat OpenShift Container Platform is supported on the AWS (G4, G5, P3, P4, P5), Azure (NC-T4-v3, NC-v3, ND-A100-v4), and GCP (T4, V100, A100) based instances.
| Operating System | Amazon EKS Kubernetes | Google GKE Kubernetes | Microsoft Azure Kubernetes Service |
| --- | --- | --- | --- |
| Ubuntu 20.04 LTS | 1.25, 1.26 | 1.24, 1.25 | 1.25 |
| Ubuntu 22.04 LTS | 1.25, 1.26 | 1.24, 1.25 | 1.25 |
The following operating systems and Kubernetes platforms are validated for virtual machines with NVIDIA vGPU:

| Operating System | Kubernetes | Red Hat OpenShift | VMware vSphere with Tanzu | Rancher Kubernetes Engine 2 |
| --- | --- | --- | --- | --- |
| Ubuntu 20.04 LTS | 1.22—1.28 | | 7.0 U3c, 8.0 U2 | 1.22, 1.23, 1.24, 1.25 |
| Ubuntu 22.04 LTS | 1.22—1.28 | | 8.0 U2 | |
| Red Hat Core OS | | 4.9—4.14 | | |
| Red Hat Enterprise Linux 8.4, 8.6—8.9 | 1.22—1.28 | | | 1.22—1.28 |
Supported Container Runtimes
The GPU Operator has been validated in the following scenarios:
| Operating System | Containerd 1.4—1.7 | CRI-O |
| --- | --- | --- |
| Ubuntu 20.04 LTS | Yes | Yes |
| Ubuntu 22.04 LTS | Yes | Yes |
| CentOS 7 | Yes | No |
| Red Hat Core OS (RHCOS) | No | Yes |
| Red Hat Enterprise Linux 8 | Yes | Yes |
Note
The GPU Operator has been validated with version 2 of the containerd config file.
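For reference, a minimal version 2 config file looks like the following sketch. The `nvidia` runtime entries mirror what the NVIDIA Container Toolkit typically adds; the exact binary path is an assumption:

```toml
# /etc/containerd/config.toml -- version 2 schema (sketch)
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
```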
Support for KubeVirt and OpenShift Virtualization
Red Hat OpenShift Virtualization is based on KubeVirt.
| Operating System | Kubernetes | KubeVirt (GPU Passthrough) | KubeVirt (vGPU) | OpenShift Virtualization (GPU Passthrough) | OpenShift Virtualization (vGPU) |
| --- | --- | --- | --- | --- | --- |
| Ubuntu 20.04 LTS | 1.22—1.28 | 0.36+ | 0.59.1+ | | |
| Ubuntu 22.04 LTS | 1.22—1.28 | 0.36+ | 0.59.1+ | | |
| Red Hat Core OS | | | | 4.11—4.14 | 4.13, 4.14 |
You can run GPU passthrough and NVIDIA vGPU in the same cluster as long as you use a software version that meets both requirements.
NVIDIA vGPU is incompatible with KubeVirt v0.58.0, v0.58.1, and v0.59.0, as well as with OpenShift Virtualization 4.12.0—4.12.2.
Starting with KubeVirt v0.58.2 and v0.59.1, and with OpenShift Virtualization 4.12.3 and 4.13, you must set the `DisableMDEVConfiguration` feature gate.
Refer to GPU Operator with KubeVirt or NVIDIA GPU Operator with OpenShift Virtualization for more information.
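As a sketch, the feature gate can be set on the KubeVirt custom resource; the resource name and namespace shown are the common defaults and are assumptions here:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt        # default name; adjust to your deployment
  namespace: kubevirt   # default namespace; adjust to your deployment
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - DisableMDEVConfiguration
```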
Support for GPUDirect RDMA
The following operating systems and NVIDIA GPU Drivers are supported with GPUDirect RDMA:
Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
Red Hat OpenShift 4.9 and higher with Network Operator 23.10.0
For information about configuring GPUDirect RDMA, refer to GPUDirect RDMA and GPUDirect Storage.
Support for GPUDirect Storage
The following operating systems and NVIDIA GPU Drivers are supported with GPUDirect Storage:
Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
Red Hat OpenShift Container Platform 4.11 and higher
Note
Version v2.17.5 and higher of the NVIDIA GPUDirect Storage kernel driver, `nvidia-fs`, requires the NVIDIA Open GPU Kernel module driver. You can install the open kernel modules by specifying the `driver.useOpenKernelModules=true` argument to the `helm` command. Refer to Chart Customization Options for more information.
GPUDirect Storage is not supported with secure boot. The supported storage types are local NVMe and remote NFS.
Additional Supported Container Management Tools
Helm v3
Red Hat Operator Lifecycle Manager (OLM)