Appendix
Install GPU Operator in Proxy Environments
Introduction
This page describes how to successfully deploy the GPU Operator in clusters behind an HTTP proxy. By default, the GPU Operator requires internet access for the following reasons:
Container images need to be pulled during GPU Operator installation.
The driver container needs to download several OS packages prior to driver installation.
To address these requirements, all Kubernetes nodes as well as the driver container need proper configuration in order to direct traffic through the proxy.
This document demonstrates how to configure the GPU Operator so that the driver container can successfully download packages behind an HTTP proxy. Since configuring Kubernetes and container runtime components to use a proxy is not specific to the GPU Operator, we do not include those instructions here.
The instructions for OpenShift are different, so skip the section titled HTTP Proxy Configuration for OpenShift if you are not running OpenShift.
Prerequisites
The Kubernetes cluster is configured with HTTP proxy settings (the container runtime must be configured to use the HTTP proxy).
HTTP Proxy Configuration for OpenShift
For OpenShift, it is recommended to use the cluster-wide Proxy object to provide proxy information for the cluster. Please follow the procedure described in Configuring the cluster-wide proxy in the Red Hat OpenShift documentation. The GPU Operator will automatically inject proxy-related environment variables into the driver container based on information present in the cluster-wide Proxy object.
Note
GPU Operator v1.8.0 does not work well on Red Hat OpenShift when a cluster-wide Proxy object is configured, causing constant restarts of the driver container. This will be fixed in the upcoming patch release v1.8.2.
HTTP Proxy Configuration
First, get the values.yaml file used for GPU Operator configuration:
$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/v1.7.0/deployments/gpu-operator/values.yaml
Note
Replace v1.7.0 in the above command with the version you want to use.
Specify driver.env in values.yaml with appropriate HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables (in both uppercase and lowercase).
driver:
  env:
  - name: HTTPS_PROXY
    value: http://<example.proxy.com:port>
  - name: HTTP_PROXY
    value: http://<example.proxy.com:port>
  - name: NO_PROXY
    value: <example.com>
  - name: https_proxy
    value: http://<example.proxy.com:port>
  - name: http_proxy
    value: http://<example.proxy.com:port>
  - name: no_proxy
    value: <example.com>
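Once the driver pod is running, you can optionally confirm that the proxy variables were picked up. This is a minimal check, assuming the driver DaemonSet runs under its default name, nvidia-driver-daemonset, in the gpu-operator namespace (recent GPU Operator releases deploy all operands there):
$ kubectl -n gpu-operator exec ds/nvidia-driver-daemonset -- env | grep -i _proxy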
Note
Proxy-related environment variables are automatically injected by the GPU Operator into the driver container to indicate the proxy information used when downloading necessary packages. If an HTTPS proxy server is set up, change the values of HTTPS_PROXY and https_proxy to use https instead.
Deploy GPU Operator
Download and deploy the GPU Operator Helm chart with the updated values.yaml.
Fetch the chart from the NGC repository. v1.10.0 is used as an example in the command below:
$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.10.0.tgz
Install the GPU Operator with the updated values.yaml:
$ helm install --wait gpu-operator \
-n gpu-operator --create-namespace \
gpu-operator-v1.10.0.tgz \
-f values.yaml
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator
Install GPU Operator in Air-gapped Environments
Introduction
This page describes how to successfully deploy the GPU Operator in clusters with restricted internet access. By default, the GPU Operator requires internet access for the following reasons:
Container images need to be pulled during GPU Operator installation.
The driver container needs to download several OS packages prior to driver installation.
To address these requirements, it may be necessary to create a local image registry and/or a local package repository so that the necessary images and packages are available for your cluster. In subsequent sections, we detail how to configure the GPU Operator to use local image registries and local package repositories. If your cluster is behind a proxy, also follow the steps from Install GPU Operator in Proxy Environments.
Different steps are required for different environments with varying levels of internet connectivity. The supported use cases/environments are listed in the below table:
| Use Case | Network Flow: Pulling Images | Network Flow: Pulling Packages |
|---|---|---|
| 1. HTTP Proxy with full Internet access | K8s node -> HTTP Proxy -> Internet Image Registry | Driver container -> HTTP Proxy -> Internet Package Repository |
| 2. HTTP Proxy with limited Internet access | K8s node -> HTTP Proxy -> Internet Image Registry | Driver container -> HTTP Proxy -> Local Package Repository |
| 3a. Full Air-Gapped (w/ HTTP Proxy) | K8s node -> Local Image Registry | Driver container -> HTTP Proxy -> Local Package Repository |
| 3b. Full Air-Gapped (w/o HTTP Proxy) | K8s node -> Local Image Registry | Driver container -> Local Package Repository |
Note
For Red Hat OpenShift deployments in air-gapped environments (use cases 2, 3a and 3b), see the documentation here.
Note
Ensure that Kubernetes nodes can successfully reach the local DNS server(s). Public name resolution for the image registry and package repositories is mandatory for use cases 1 and 2.
Before proceeding to the next sections, get the values.yaml file used for GPU Operator configuration.
$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/v1.7.0/deployments/gpu-operator/values.yaml
Note
Replace v1.7.0 in the above command with the version you want to use.
Local Image Registry
Without internet access, the GPU Operator requires all images to be hosted in a local image registry that is accessible to all nodes in the cluster. To allow the GPU Operator to work with a local registry, users can specify the local repository, image, and tag along with pull secrets in values.yaml.
Pulling and pushing container images to local registry
To pull the correct images from the NVIDIA registry, you can leverage the fields repository, image, and version specified in the file values.yaml.
The general syntax for the container image is <repository>/<image>:<version>.
If the version is not specified, you can retrieve the information from the NVIDIA NGC catalog (https://ngc.nvidia.com/catalog) by checking the available tags for an image.
An example is shown below with the gpu-operator container image:
operator:
  repository: nvcr.io/nvidia
  image: gpu-operator
  version: "v1.9.0"
For instance, to pull the gpu-operator image version v1.9.0, use the following instruction:
$ docker pull nvcr.io/nvidia/gpu-operator:v1.9.0
There is one caveat with regards to the driver image: the version field must be appended with the name of the OS running on the worker node.
driver:
  repository: nvcr.io/nvidia
  image: driver
  version: "470.82.01"
To pull the driver image for Ubuntu 20.04:
$ docker pull nvcr.io/nvidia/driver:470.82.01-ubuntu20.04
To pull the driver image for CentOS 8:
$ docker pull nvcr.io/nvidia/driver:470.82.01-centos8
To push the images to the local registry, simply tag the pulled images by prefixing the image with the image registry information. Using the above examples, this will result in:
$ docker tag nvcr.io/nvidia/gpu-operator:v1.9.0 <local-registry>/<local-path>/gpu-operator:v1.9.0
$ docker tag nvcr.io/nvidia/driver:470.82.01-ubuntu20.04 <local-registry>/<local-path>/driver:470.82.01-ubuntu20.04
Finally, push the images to the local registry:
$ docker push <local-registry>/<local-path>/gpu-operator:v1.9.0
$ docker push <local-registry>/<local-path>/driver:470.82.01-ubuntu20.04
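For clusters that need more than a couple of images, a small loop can keep the pull/tag/push steps consistent. This is a minimal sketch using the two images from the examples above; extend the list and adjust the registry path for your environment:
$ export LOCAL_REGISTRY=<local-registry>/<local-path>
$ for img in gpu-operator:v1.9.0 driver:470.82.01-ubuntu20.04; do
      docker pull nvcr.io/nvidia/${img}
      docker tag nvcr.io/nvidia/${img} ${LOCAL_REGISTRY}/${img}
      docker push ${LOCAL_REGISTRY}/${img}
  done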
Update values.yaml with the local registry information in the repository field.
Note
Replace <repo.example.com:port> below with your local image registry URL and port.
Sample of values.yaml for GPU Operator v1.9.0:
operator:
  repository: <repo.example.com:port>
  image: gpu-operator
  version: 1.9.0
  imagePullSecrets: []
  initContainer:
    image: cuda
    repository: <repo.example.com:port>
    version: 11.4.2-base-ubi8
validator:
  image: gpu-operator-validator
  repository: <repo.example.com:port>
  version: 1.9.0
  imagePullSecrets: []
driver:
  repository: <repo.example.com:port>
  image: driver
  version: "470.82.01"
  imagePullSecrets: []
  manager:
    image: k8s-driver-manager
    repository: <repo.example.com:port>
    version: v0.2.0
toolkit:
  repository: <repo.example.com:port>
  image: container-toolkit
  version: 1.7.2-ubuntu18.04
  imagePullSecrets: []
devicePlugin:
  repository: <repo.example.com:port>
  image: k8s-device-plugin
  version: v0.10.0-ubi8
  imagePullSecrets: []
dcgmExporter:
  repository: <repo.example.com:port>
  image: dcgm-exporter
  version: 2.3.1-2.6.0-ubuntu20.04
  imagePullSecrets: []
gfd:
  repository: <repo.example.com:port>
  image: gpu-feature-discovery
  version: v0.4.1
  imagePullSecrets: []
nodeStatusExporter:
  enabled: false
  repository: <repo.example.com:port>
  image: gpu-operator-validator
  version: "1.9.0"
migManager:
  enabled: true
  repository: <repo.example.com:port>
  image: k8s-mig-manager
  version: v0.2.0-ubuntu20.04
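One rough way to sanity-check the edited file before going offline is to render the chart locally and inspect the image-related fields that will actually be used. This assumes you have already fetched the gpu-operator-v1.9.0.tgz chart, as shown in the Deploy GPU Operator section below:
$ helm template gpu-operator-v1.9.0.tgz -f values.yaml | grep -E 'repository:|image:|version:'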
Local Package Repository
The driver container deployed as part of the GPU Operator requires certain packages to be available as part of the driver installation. In restricted internet access or air-gapped installations, users are required to create a local mirror repository for their OS distribution and make the following packages available:
Note
KERNEL_VERSION is the kernel version running on the GPU node. GCC_VERSION is the gcc version matching the one used to build that kernel. A quick way to determine both is shown after the package lists below.
ubuntu:
linux-headers-${KERNEL_VERSION}
linux-image-${KERNEL_VERSION}
linux-modules-${KERNEL_VERSION}
centos:
elfutils-libelf.x86_64
elfutils-libelf-devel.x86_64
kernel-headers-${KERNEL_VERSION}
kernel-devel-${KERNEL_VERSION}
kernel-core-${KERNEL_VERSION}
gcc-${GCC_VERSION}
rhel/rhcos:
kernel-headers-${KERNEL_VERSION}
kernel-devel-${KERNEL_VERSION}
kernel-core-${KERNEL_VERSION}
gcc-${GCC_VERSION}
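To determine the values to substitute for KERNEL_VERSION and GCC_VERSION, the following commands, run on a GPU node, are usually sufficient (the gcc version used to build the kernel is reported in /proc/version):
$ uname -r
$ cat /proc/version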
For example, for Ubuntu these packages can be found at archive.ubuntu.com, so this would be the mirror that needs to be replicated locally for your cluster. Using apt-mirror, these packages will be automatically mirrored to your local package repository server. For CentOS, reposync can be used to create the local mirror.
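As an illustration only, a minimal /etc/apt/mirror.list for apt-mirror might look like the following; the suites and components you need depend on your node OS version, and base_path is an assumption about where the mirror is stored:
set base_path /var/spool/apt-mirror
set nthreads 20
deb http://archive.ubuntu.com/ubuntu focal main universe
deb http://archive.ubuntu.com/ubuntu focal-updates main universe
deb http://archive.ubuntu.com/ubuntu focal-security main universe
clean http://archive.ubuntu.com/ubuntu
Then run apt-mirror to populate the mirror:
$ apt-mirror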
Once all of the above required packages are mirrored to the local repository, repo lists need to be created following distribution-specific documentation. A ConfigMap containing the repo list file needs to be created in the namespace where the GPU Operator gets deployed.
An example of repo list is shown below for Ubuntu 20.04 (access to local package repository via HTTP):
custom-repo.list:
deb [arch=amd64] http://<local pkg repository>/ubuntu/mirror/archive.ubuntu.com/ubuntu focal main universe
deb [arch=amd64] http://<local pkg repository>/ubuntu/mirror/archive.ubuntu.com/ubuntu focal-updates main universe
deb [arch=amd64] http://<local pkg repository>/ubuntu/mirror/archive.ubuntu.com/ubuntu focal-security main universe
An example of repo list is shown below for CentOS 8 (access to local package repository via HTTP):
custom-repo.repo:
[baseos]
name=CentOS Linux $releasever - BaseOS
baseurl=http://<local pkg repository>/repos/centos/$releasever/$basearch/os/baseos/
gpgcheck=0
enabled=1
[appstream]
name=CentOS Linux $releasever - AppStream
baseurl=http://<local pkg repository>/repos/centos/$releasever/$basearch/os/appstream/
gpgcheck=0
enabled=1
[extras]
name=CentOS Linux $releasever - Extras
baseurl=http://<local pkg repository>/repos/centos/$releasever/$basearch/os/extras/
gpgcheck=0
enabled=1
Create the ConfigMap:
$ kubectl create configmap repo-config -n gpu-operator --from-file=<path-to-repo-list-file>
Once the ConfigMap is created using the above command, update values.yaml with this information to let the GPU Operator mount the repo configuration within the driver container and pull the required packages. Based on the OS distribution, the GPU Operator will automatically mount this ConfigMap into the appropriate directory.
driver:
  repoConfig:
    configMapName: repo-config
If self-signed certificates are used for an HTTPS-based internal repository, a ConfigMap needs to be created for those certificates and provided during the GPU Operator install. Based on the OS distribution, the GPU Operator will automatically mount this ConfigMap into the appropriate directory.
$ kubectl create configmap cert-config -n gpu-operator --from-file=<path-to-pem-file1> --from-file=<path-to-pem-file2>
driver:
  certConfig:
    name: cert-config
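These values can also be supplied on the Helm command line instead of editing values.yaml. Appended to the install command shown in the next section, the flags would look like this (a sketch using the ConfigMap names from above):
--set driver.repoConfig.configMapName=repo-config \
--set driver.certConfig.name=cert-config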
Deploy GPU Operator
Download and deploy the GPU Operator Helm chart with the updated values.yaml.
Fetch the chart from the NGC repository. v1.9.0 is used in the command below:
$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.9.0.tgz
Install the GPU Operator with the updated values.yaml:
$ helm install --wait gpu-operator \
-n gpu-operator --create-namespace \
gpu-operator-v1.9.0.tgz \
-f values.yaml
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator
Considerations when Installing with Outdated Kernels in Cluster
The driver container deployed as part of the GPU Operator requires certain packages to be available as part of the driver installation. On GPU nodes where the running kernel is not the latest, the driver container may fail to find the right version of these packages (e.g. kernel-headers, kernel-devel) that correspond to the running kernel version. In the driver container logs, you will most likely see the following error message: Could not resolve Linux kernel version.
In general, upgrading your system to the latest kernel should fix this issue. If that is not an option, the following workaround can be used to successfully deploy the GPU Operator when GPU nodes in your cluster are not running the latest kernel.
Add Archived Package Repositories
The workaround is to find the package archive containing packages for your outdated kernel and to add this repository to the package manager running inside the driver container. To achieve this, we can simply mount a repository list file into the driver container using a ConfigMap. The ConfigMap containing the repository list file needs to be created in the gpu-operator namespace.
Let us demonstrate this workaround via an example. The system used in this example is running CentOS 7 with an outdated kernel:
$ uname -r
3.10.0-1062.12.1.el7.x86_64
The official archive for older CentOS packages is https://vault.centos.org/. Typically, most archived CentOS repositories are found in /etc/yum.repos.d/CentOS-Vault.repo, but they are disabled by default. If the appropriate archive repository were enabled, the driver container would resolve the kernel version and be able to install the correct versions of the prerequisite packages.
We can simply drop in a replacement of /etc/yum.repos.d/CentOS-Vault.repo to ensure the appropriate CentOS archive is enabled. For the kernel running in this example, the CentOS-7.7.1908 archive contains the kernel-headers version we are looking for. Here is our example drop-in replacement file:
[C7.7.1908-base]
name=CentOS-7.7.1908 - Base
baseurl=http://vault.centos.org/7.7.1908/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
[C7.7.1908-updates]
name=CentOS-7.7.1908 - Updates
baseurl=http://vault.centos.org/7.7.1908/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
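Optionally, before wiring this into the cluster, you can spot-check from any machine with internet access that the archive really contains headers for the kernel in question; the x86_64 Packages path below is an assumption inferred from the baseurl above, and the kernel version is the one from this example:
$ curl -s http://vault.centos.org/7.7.1908/updates/x86_64/Packages/ | grep kernel-headers-3.10.0-1062.12.1.el7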
Once the repo list file is created, we can create a ConfigMap for it:
$ kubectl create configmap repo-config -n gpu-operator --from-file=<path-to-repo-list-file>
Once the ConfigMap is created using the above command, update values.yaml with this information, to let the GPU Operator mount the repo configuration within the driver container to pull required packages.
For Ubuntu:
driver:
  repoConfig:
    configMapName: repo-config
    destinationDir: /etc/apt/sources.list.d
For RHEL/CentOS/RHCOS:
driver:
  repoConfig:
    configMapName: repo-config
    destinationDir: /etc/yum.repos.d
Deploy GPU Operator with the updated values.yaml:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
-f values.yaml
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator
Customizing NVIDIA GPU Driver Parameters during Installation
The NVIDIA Driver kernel modules accept a number of parameters which can be used to customize the behavior of the driver. Most of the parameters are documented in the NVIDIA Driver README. By default, the GPU Operator loads the kernel modules with default values.
Starting with v1.10, the GPU Operator provides the ability to pass custom parameters to the kernel modules that get loaded as part of the NVIDIA Driver installation (e.g. nvidia, nvidia-modeset, nvidia-uvm, and nvidia-peermem).
To pass custom parameters, execute the following steps.
Create a configuration file named <module>.conf, where <module> is the name of the kernel module the parameters are for. The file should contain parameters as key-value pairs, one parameter per line.
In the below example, we are passing one parameter to the nvidia module, which disables the use of GSP firmware.
$ cat nvidia.conf
NVreg_EnableGpuFirmware=0
Create a ConfigMap for the configuration file. If multiple modules are being configured, pass multiple files when creating the ConfigMap.
$ kubectl create configmap kernel-module-params -n gpu-operator --from-file=nvidia.conf=./nvidia.conf
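For example, if you were also configuring the nvidia-uvm module, the same ConfigMap could carry both files; nvidia-uvm.conf below is a hypothetical second file following the <module>.conf naming convention:
$ kubectl create configmap kernel-module-params -n gpu-operator \
    --from-file=nvidia.conf=./nvidia.conf \
    --from-file=nvidia-uvm.conf=./nvidia-uvm.conf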
Install the GPU Operator and set driver.kernelModuleConfig.name to the name of the ConfigMap containing the kernel module parameters.
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.kernelModuleConfig.name="kernel-module-params"
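After the driver loads, you can optionally verify that the parameter took effect. This is a hedged check that assumes the driver DaemonSet uses its default name, nvidia-driver-daemonset, and that the module exposes its registry parameters under /proc/driver/nvidia/params:
$ kubectl -n gpu-operator exec ds/nvidia-driver-daemonset -- grep EnableGpuFirmware /proc/driver/nvidia/params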
Installing Precompiled and Canonical Signed Drivers on Ubuntu 20.04
The GPU Operator supports deploying NVIDIA precompiled and signed drivers from Canonical on Ubuntu 20.04. This is required when nodes are enabled with Secure Boot. To use these drivers, the GPU Operator needs to be installed with the option --set driver.version=<DRIVER_BRANCH>-signed.
$ helm install --wait gpu-operator \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.version=<DRIVER_BRANCH>-signed
The currently supported DRIVER_BRANCH values are 470 and 510, which will install the latest drivers available on that branch for the currently running kernel version.
The following packages are used in this case by the driver container:
linux-objects-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION} - Linux kernel nvidia modules.
linux-signatures-nvidia-${KERNEL_VERSION} - Linux kernel signatures for nvidia modules.
linux-modules-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION} - Meta package for nvidia driver modules, signatures and kernel interfaces.
nvidia-utils-${DRIVER_BRANCH}-server - NVIDIA driver support binaries.
nvidia-compute-utils-${DRIVER_BRANCH}-server - NVIDIA compute utilities (includes nvidia-persistenced).
Note
Before upgrading the kernel on the worker nodes, please ensure that the above packages are available for that kernel version; otherwise, the upgrade will cause driver installation failures.
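A hedged pre-flight check from an Ubuntu worker node, using the package naming scheme listed above: before rebooting into a new kernel, confirm that a signed module package exists for it (the branch and <target-kernel-version> placeholder are examples to adjust):
$ apt-cache search linux-modules-nvidia-510-server | grep <target-kernel-version>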
Running KubeVirt VMs with the GPU Operator
Introduction
Note
This feature is introduced as a technical preview with the GPU Operator 1.11.0 release and is not ready for production use. Please submit feedback and bug reports here. We encourage contributions in our GitLab repository.
KubeVirt is a virtual machine management add-on to Kubernetes that allows you to run and manage VMs in a Kubernetes cluster. It eliminates the need to manage separate clusters for VM and container workloads, as both can now coexist in a single Kubernetes cluster.
Up until this point, the GPU Operator only provisioned worker nodes for running GPU-accelerated containers. Now, the GPU Operator can also be used to provision worker nodes for running GPU-accelerated VMs.
The prerequisites needed for running containers and VMs with GPU(s) differ, with the primary difference being the drivers required. For example, the datacenter driver is needed for containers, the vfio-pci driver is needed for GPU passthrough, and the NVIDIA vGPU Manager is needed for creating vGPU devices.
The GPU Operator can now be configured to deploy different software components on worker nodes depending on what GPU workload is configured to run on those nodes. Consider the following example.
Node A receives the following software components:
NVIDIA Datacenter Driver - to install the driver
NVIDIA Container Toolkit - to ensure containers can properly access GPUs
NVIDIA Kubernetes Device Plugin - to discover and advertise GPU resources to kubelet
NVIDIA DCGM and DCGM Exporter - to monitor the GPU(s)
Node B receives the following software components:
VFIO Manager - to load vfio-pci and bind it to all GPUs on the node
Sandbox Device Plugin - to discover and advertise the passthrough GPUs to kubelet
Node C receives the following software components:
NVIDIA vGPU Manager - to install the driver
NVIDIA vGPU Device Manager - to create vGPU devices on the node
Sandbox Device Plugin - to discover and advertise the vGPU devices to kubelet
Limitations
This feature is a Technical Preview and is not ready for production use.
Trying out this feature requires a fresh install of GPU Operator 1.11 with necessary fields set in ClusterPolicy as detailed in this document. The instructions in this document are not valid if upgrading from 1.10 to 1.11.
Enabling / disabling this feature post-install is not supported.
MIG-backed vGPUs are not supported.
A GPU worker node can run GPU workloads of a particular type - containers, VMs with GPU Passthrough, and VMs with vGPU - but not a combination of any of them.
Install the GPU Operator
To enable this functionality, install the GPU Operator and set the following parameter in ClusterPolicy: sandboxWorkloads.enabled=true.
Note
The term sandboxing refers to running software in a separate isolated environment, typically for added security (i.e. a virtual machine). We use the term sandbox workloads to signify workloads that run in a virtual machine, irrespective of the virtualization technology used.
Partition the Cluster Based on the GPU Workload
When sandbox workloads are enabled (sandboxWorkloads.enabled=true), a worker node can run GPU workloads of a particular type (containers, VMs with GPU passthrough, or VMs with vGPU) but not a combination of them. As illustrated in the Introduction, the GPU Operator will deploy a specific set of operands on a worker node depending on the workload type configured. For example, a node which is configured to run containers will receive the NVIDIA Datacenter Driver, while a node which is configured to run VMs with vGPU will receive the NVIDIA vGPU Manager.
To set the GPU workload configuration for a worker node, apply the node label nvidia.com/gpu.workload.config=<config>, where the valid config values are container, vm-passthrough, and vm-vgpu.
If the node label nvidia.com/gpu.workload.config does not exist on the node, the GPU Operator will assume the default GPU workload configuration, container. To override the default GPU workload configuration, set the following value in ClusterPolicy during install: sandboxWorkloads.defaultWorkload=<config>.
Consider the following example, where the GPU Operator is installed with the options sandboxWorkloads.enabled=true and sandboxWorkloads.defaultWorkload=container:
A node without the nvidia.com/gpu.workload.config label receives the default configuration, container, and is provisioned for containerized GPU workloads.
A node labeled nvidia.com/gpu.workload.config=vm-passthrough is provisioned for VMs with GPU passthrough.
A node labeled nvidia.com/gpu.workload.config=vm-vgpu is provisioned for VMs with vGPU.
Deployment Scenarios
Running VMs with GPU Passthrough
This section runs through the deployment scenario of running VMs with GPU Passthrough. We will first deploy the GPU Operator, such that our worker node will be provisioned for GPU Passthrough, then we will deploy a KubeVirt VM which requests a GPU.
By default, to provision GPU Passthrough, the GPU Operator will deploy the following components:
VFIO Manager - to load vfio-pci and bind it to all GPUs on the node
Sandbox Device Plugin - to discover and advertise the passthrough GPUs to kubelet
Sandbox Validator - to validate the other operands
Install the GPU Operator
Follow the below steps.
Label the worker node explicitly for GPU passthrough workloads:
$ kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-passthrough
Install the GPU Operator with sandbox workloads enabled:
$ helm install gpu-operator nvidia/gpu-operator -n gpu-operator \
--set sandboxWorkloads.enabled=true
The following operands get deployed. Ensure all pods are in a running state and all validations succeed with the sandbox-validator component:
$ kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
...
nvidia-sandbox-device-plugin-daemonset-4mxsc 1/1 Running 0 40s
nvidia-sandbox-validator-vxj7t 1/1 Running 0 40s
nvidia-vfio-manager-thfwf 1/1 Running 0 78s
The vfio-manager pod will bind all GPUs on the node to the vfio-pci driver:
$ lspci -nnk -d 10de:
3b:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:1482]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
86:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:2236] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:1482]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
The sandbox-device-plugin will discover and advertise these resources to kubelet. In this example, we have two A10 GPUs:
$ kubectl describe node <node-name>
...
Capacity:
...
nvidia.com/GA102GL_A10: 2
...
Allocatable:
...
nvidia.com/GA102GL_A10: 2
...
Note
The resource name is currently constructed by joining the device and device_name columns from the PCI IDs database. For example, the entry for A10 in the database reads 2236 GA102GL [A10], which results in the resource name nvidia.com/GA102GL_A10.
Update the KubeVirt CR
Next, we will update the KubeVirt Custom Resource, as documented in the KubeVirt user guide, so that the passthrough GPUs are permitted and can be requested by a KubeVirt VM. Note, replace the values for pciVendorSelector and resourceName to correspond to your GPU model. We set externalResourceProvider=true to indicate that this resource is being provided by an external device plugin, in this case the sandbox-device-plugin which is deployed by the Operator.
Note
To find the device ID for a particular GPU, search by device name in the PCI IDs database.
$ kubectl edit kubevirt -n kubevirt
...
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - GPU
    permittedHostDevices:
      pciHostDevices:
      - externalResourceProvider: true
        pciVendorSelector: 10DE:2236
        resourceName: nvidia.com/GA102GL_A10
...
Create a VM
We are now ready to create a VM. Let's create a sample VM using the following simple VMI spec (saved as vmi-gpu.yaml), which requests an nvidia.com/GA102GL_A10 resource:
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-gpu
  name: vmi-gpu
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      gpus:
      - deviceName: nvidia.com/GA102GL_A10
        name: gpu1
      rng: {}
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: docker.io/kubevirt/fedora-cloud-container-disk-demo:devel
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
$ kubectl apply -f vmi-gpu.yaml
virtualmachineinstance.kubevirt.io/vmi-gpu created
$ kubectl get vmis
NAME AGE PHASE IP NODENAME READY
vmi-gpu 13s Running 192.168.47.210 cnt-server-2 True
Let’s console into the VM and verify we have a GPU. Refer here for installing virtctl.
$ ./virtctl console vmi-gpu
Successfully connected to vmi-gpu console. The escape sequence is ^]
vmi-gpu login: fedora
Password:
[fedora@vmi-gpu ~]$ sudo yum install -y -q pciutils
[fedora@vmi-gpu ~]$ lspci -nnk -d 10de:
06:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:1482]
Running VMs with vGPU
This section runs through the deployment scenario of running VMs with vGPU. We will first deploy the GPU Operator, such that our worker node will be provisioned for vGPU, then we will deploy a KubeVirt VM which requests a vGPU.
By default, to provision vGPU, the GPU Operator will deploy the following components:
NVIDIA vGPU Manager - installs vGPU Manager on the node
NVIDIA vGPU Device Manager - creates vGPU devices on the node after vGPU Manager is installed
Sandbox Device Plugin - to discover and advertise the vGPU devices to kubelet
Sandbox Validator - to validate the other operands
Build the vGPU Manager Image
Building the vGPU Manager container and pushing it to a private registry is a prerequisite. To fulfill this prerequisite, follow the below steps.
Download the vGPU Software from the NVIDIA Licensing Portal.
Login to the NVIDIA Licensing Portal and navigate to the Software Downloads section.
The NVIDIA vGPU Software is located in the Software Downloads section of the NVIDIA Licensing Portal.
The vGPU Software bundle is packaged as a zip file. Download and unzip the bundle to obtain the NVIDIA vGPU Manager for Linux (the NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run file).
Next, clone the driver container repository and build the driver image with the following steps.
Open a terminal and clone the driver container image repository.
$ git clone https://gitlab.com/nvidia/container-images/driver
$ cd driver
Change to the vgpu-manager directory for your OS. We use Ubuntu 20.04 as an example.
$ cd vgpu-manager/ubuntu20.04
Note
For RedHat OpenShift, run cd vgpu-manager/rhel to use the rhel folder instead.
Copy the NVIDIA vGPU Manager .run file from your extracted zip file:
$ cp <local-driver-download-directory>/*-vgpu-kvm.run ./
Set the following environment variables:
PRIVATE_REGISTRY - name of the private registry used to store the driver image
VERSION - NVIDIA vGPU Manager version downloaded from the NVIDIA Software Portal
OS_TAG - this must match the Guest OS version. In the below example ubuntu20.04 is used. For RedHat OpenShift this should be set to rhcos4.x where x is the supported minor OCP version.
$ export PRIVATE_REGISTRY=my/private/registry VERSION=510.73.06 OS_TAG=ubuntu20.04
Build the NVIDIA vGPU Manager image.
$ docker build \
--build-arg DRIVER_VERSION=${VERSION} \
-t ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG} .
Push NVIDIA vGPU Manager image to your private registry.
$ docker push ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG}
Install the GPU Operator
Follow the below steps.
Label the worker node explicitly for vGPU workloads:
$ kubectl label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-vgpu
Create a configuration file named config.yaml for the vGPU Device Manager. This file contains a list of vGPU device configurations. Each named configuration contains a list of desired vGPU types. The vGPU Device Manager reads the configuration file and applies a specific named configuration when creating vGPU devices on the node. Download the comprehensive example file as a starting point, and modify as needed:
$ wget -O config.yaml https://raw.githubusercontent.com/NVIDIA/vgpu-device-manager/main/examples/config-example.yaml
Optionally, label the worker node explicitly with a vGPU devices config. More information on the vGPU devices config is detailed in the Apply a New vGPU Device Configuration section below.
$ kubectl label node <node-name> --overwrite nvidia.com/vgpu.config=<config-name>
Create a namespace for GPU Operator:
$ kubectl create namespace gpu-operator
Create a ConfigMap for the vGPU devices config:
$ kubectl create cm vgpu-devices-config -n gpu-operator --from-file=config.yaml
Install the GPU Operator with sandbox workloads enabled and specify the vGPU Manager image built previously:
$ helm install gpu-operator nvidia/gpu-operator -n gpu-operator \
--set sandboxWorkloads.enabled=true \
--set vgpuManager.repository=<path to private repository> \
--set vgpuManager.image=vgpu-manager \
--set vgpuManager.version=<driver version>
The following operands get deployed. Ensure all pods are in a running state and all validations succeed with the sandbox-validator component.
$ kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
...
nvidia-sandbox-device-plugin-daemonset-kkdt9 1/1 Running 0 9s
nvidia-sandbox-validator-jcpgw 1/1 Running 0 9s
nvidia-vgpu-device-manager-8mgg8 1/1 Running 0 89s
nvidia-vgpu-manager-daemonset-fpplc 1/1 Running 0 2m41s
This worker node has two A10 GPUs. Assuming the node has not been labeled explicitly with nvidia.com/vgpu.config=<config-name>, the default configuration will be used. Since the default configuration in vgpu-devices-config only lists the A10-24C vGPU type for the A10 GPU, the vgpu-device-manager will only create vGPU devices of this type.
A10-24C is the largest vGPU type supported on the A10 GPU, and only one vGPU device of this type can be created per physical GPU. We should see two vGPU devices created:
$ ls -l /sys/bus/mdev/devices
total 0
lrwxrwxrwx 1 root root 0 Jun 7 00:18 9adc60ea-98a7-41b6-b17b-9b3e0d210c7a -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:00.4/9adc60ea-98a7-41b6-b17b-9b3e0d210c7a
lrwxrwxrwx 1 root root 0 Jun 7 00:18 f9033b86-ccee-454b-8b20-dd7912d95bfd -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.4/f9033b86-ccee-454b-8b20-dd7912d95bfd
The sandbox-device-plugin will discover and advertise these resources to kubelet. In this example, we have two A10 GPUs and therefore two A10-24C vGPU devices.
$ kubectl describe node
...
Capacity:
...
nvidia.com/NVIDIA_A10-24C: 2
...
Allocatable:
...
nvidia.com/NVIDIA_A10-24C: 2
...
Update the KubeVirt CR
Next, we will update the KubeVirt Custom Resource, as documented in the KubeVirt user guide, so that these vGPU devices are permitted and can be requested by a KubeVirt VM. Note, replace the values for mdevNameSelector and resourceName to correspond to your vGPU type. We set externalResourceProvider=true to indicate that this resource is being provided by an external device plugin, in this case the sandbox-device-plugin which is deployed by the Operator.
$ kubectl edit kubevirt -n kubevirt
...
spec:
  certificateRotateStrategy: {}
  configuration:
    developerConfiguration:
      featureGates:
      - GPU
    permittedHostDevices:
      mediatedDevices:
      - externalResourceProvider: true
        mdevNameSelector: NVIDIA A10-24C
        resourceName: nvidia.com/NVIDIA_A10-24C
...
We are now ready to create a VM. Let's create a sample VM using a simple VMI spec which requests an nvidia.com/NVIDIA_A10-24C resource:
$ cat vmi-vgpu.yaml
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-vgpu
  name: vmi-vgpu
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      gpus:
      - deviceName: nvidia.com/NVIDIA_A10-24C
        name: vgpu1
      rng: {}
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: docker.io/kubevirt/fedora-cloud-container-disk-demo:devel
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
$ kubectl apply -f vmi-vgpu.yaml
virtualmachineinstance.kubevirt.io/vmi-vgpu created
$ kubectl get vmis
NAME AGE PHASE IP NODENAME READY
vmi-vgpu 10s Running 192.168.47.205 cnt-server-2 True
Let’s console into the VM and verify we have a GPU. Refer here for installing virtctl.
$ ./virtctl console vmi-vgpu
Successfully connected to vmi-vgpu console. The escape sequence is ^]
vmi-vgpu login: fedora
Password:
[fedora@vmi-vgpu ~]$ sudo yum install -y -q pciutils
[fedora@vmi-vgpu ~]$ lspci -nnk -d 10de:
06:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:14d4]
Apply a New vGPU Device Configuration
We can apply a specific vGPU device configuration on a per-node basis by setting the nvidia.com/vgpu.config node label. It is recommended to set this node label prior to installing the GPU Operator if you do not want the default configuration applied.
Switching the vGPU device configuration assumes that no VMs with vGPU are currently running on the node. Any existing VMs will have to be shut down or migrated first.
To apply a new configuration after GPU Operator install, simply update the node label:
$ kubectl label node <node-name> --overwrite nvidia.com/vgpu.config=A10-4C
After the vGPU Device Manager finishes applying the new configuration, all pods should return to the Running state.
$ kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
...
nvidia-sandbox-device-plugin-daemonset-brtb6 1/1 Running 0 10s
nvidia-sandbox-validator-ljnwg 1/1 Running 0 10s
nvidia-vgpu-device-manager-8mgg8 1/1 Running 0 30m
nvidia-vgpu-manager-daemonset-fpplc 1/1 Running 0 31m
We now see 12 vGPU devices on the node, as 6 A10-4C devices can be created per A10 GPU.
$ ls -ltr /sys/bus/mdev/devices
total 0
lrwxrwxrwx 1 root root 0 Jun 7 00:47 87401d9a-545b-4506-b1be-d4d30f6f4a4b -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.5/87401d9a-545b-4506-b1be-d4d30f6f4a4b
lrwxrwxrwx 1 root root 0 Jun 7 00:47 78597b11-282f-496c-a4d0-19220310039c -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.4/78597b11-282f-496c-a4d0-19220310039c
lrwxrwxrwx 1 root root 0 Jun 7 00:47 0d093db4-2c57-40ce-a1f0-ef4d410c6db8 -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.6/0d093db4-2c57-40ce-a1f0-ef4d410c6db8
lrwxrwxrwx 1 root root 0 Jun 7 00:47 f830dbb1-0eb5-4294-af32-c68028e2ae35 -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:00.7/f830dbb1-0eb5-4294-af32-c68028e2ae35
lrwxrwxrwx 1 root root 0 Jun 7 00:47 a5a11713-e683-4372-bebf-82219c58ce24 -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:01.1/a5a11713-e683-4372-bebf-82219c58ce24
lrwxrwxrwx 1 root root 0 Jun 7 00:47 1a48c902-07f1-4a19-b3b0-b89ce35ad025 -> ../../../devices/pci0000:3a/0000:3a:00.0/0000:3b:01.0/1a48c902-07f1-4a19-b3b0-b89ce35ad025
lrwxrwxrwx 1 root root 0 Jun 7 00:47 b8de2bbe-a41a-440e-9276-f7b56dc35138 -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:01.1/b8de2bbe-a41a-440e-9276-f7b56dc35138
lrwxrwxrwx 1 root root 0 Jun 7 00:47 afd7a4bb-d638-4489-bb41-6e03fc5c75b5 -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:01.0/afd7a4bb-d638-4489-bb41-6e03fc5c75b5
lrwxrwxrwx 1 root root 0 Jun 7 00:47 98175f96-707b-4167-ada5-869110ead3ab -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:00.5/98175f96-707b-4167-ada5-869110ead3ab
lrwxrwxrwx 1 root root 0 Jun 7 00:47 6e93ea61-9068-4096-b20c-ea30a72c1238 -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:00.7/6e93ea61-9068-4096-b20c-ea30a72c1238
lrwxrwxrwx 1 root root 0 Jun 7 00:47 537ce645-32cc-46d0-b7f0-f90ead840957 -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:00.6/537ce645-32cc-46d0-b7f0-f90ead840957
lrwxrwxrwx 1 root root 0 Jun 7 00:47 4eb167bc-0e15-43f3-a218-d74cc9d162ff -> ../../../devices/pci0000:85/0000:85:02.0/0000:86:00.4/4eb167bc-0e15-43f3-a218-d74cc9d162ff
Check the new vGPU resources are advertised to kubelet:
$ kubectl describe node
...
Capacity:
...
nvidia.com/NVIDIA_A10-4C: 12
...
Allocatable:
...
nvidia.com/NVIDIA_A10-4C: 12
...
Following previous instructions, we can now create a VM with an A10-4C vGPU attached.
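Relative to the earlier vmi-vgpu.yaml, only the requested device name changes (and the KubeVirt CR must permit the new type, i.e. mdevNameSelector: NVIDIA A10-4C with resourceName: nvidia.com/NVIDIA_A10-4C). A minimal sketch of the updated gpus stanza:
      gpus:
      - deviceName: nvidia.com/NVIDIA_A10-4C
        name: vgpu1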