NVIDIA vGPU

This document provides an overview of the workflow for getting started with the GPU Operator and NVIDIA vGPU.

Note

NVIDIA vGPU is only supported with the NVIDIA Virtual (vGPU) Software License Server (vCS). The NVIDIA License System is only supported with NVIDIA AI Enterprise.

High Level Workflow

This section outlines the high-level workflow for using the GPU Operator with NVIDIA vGPU.

  1. Download the vGPU Software and the latest NVIDIA vGPU driver catalog file.

  2. Clone the driver container source repository to build a private driver image.

  3. Build the driver container image.

  4. Push the driver container image to your private repository.

  5. Create a ConfigMap in the gpu-operator-resources namespace with the vGPU license configuration file.

  6. Create an ImagePullSecret in the gpu-operator-resources namespace for your private repository.

  7. Install the GPU Operator.

Detailed Workflow

Download the vGPU Software and latest NVIDIA vGPU driver catalog file from the NVIDIA Licensing Portal.

  1. Log in to the NVIDIA Licensing Portal and navigate to the “Software Downloads” section.

  2. The NVIDIA vGPU Software is located in the “Software Downloads” section of the NVIDIA Licensing Portal.

  3. The NVIDIA vGPU driver catalog file is located in the “Additional Software” section.

  4. The vGPU Software bundle is packaged as a zip file. Download and unzip the bundle to obtain the NVIDIA vGPU Linux guest driver (the NVIDIA-Linux-x86_64-<version>-grid.run file).

Clone the driver container repository and build the driver image

  • Open a terminal and clone the driver container image repository

$ git clone https://gitlab.com/nvidia/container-images/driver
$ cd driver

  • Change to the OS directory under the driver directory.

$ cd ubuntu20.04

Note

For Red Hat OpenShift, run cd rhel to use the rhel folder instead.

  • Copy the NVIDIA vGPU guest driver from your extracted zip file and the NVIDIA vGPU driver catalog file into the drivers directory

$ cp <local-driver-download-directory>/*-grid.run drivers
$ cp vgpuDriverCatalog.yaml drivers
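
Before building, it can help to confirm that both files landed in the drivers directory. The following is a minimal sketch, not part of the driver repository: check_drivers_dir is a hypothetical helper name, and it assumes the guest driver kept its *-grid.run file name.

```shell
# Hypothetical pre-build check: confirm both inputs are in the drivers directory.
# check_drivers_dir is an illustrative name, not part of the driver repository.
check_drivers_dir() {
  dir=${1:-drivers}
  # Expand the glob into the function's positional parameters.
  # If nothing matches, "$1" stays the literal pattern and the -f test fails.
  set -- "$dir"/*-grid.run
  if [ ! -f "$1" ]; then
    echo "missing: $dir/*-grid.run" >&2
    return 1
  fi
  if [ ! -f "$dir/vgpuDriverCatalog.yaml" ]; then
    echo "missing: $dir/vgpuDriverCatalog.yaml" >&2
    return 1
  fi
  echo "ok: $(basename "$1") and vgpuDriverCatalog.yaml found in $dir"
}
```

Run check_drivers_dir from the OS directory (for example ubuntu20.04); a non-zero exit status means one of the two files still needs to be copied.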

  • Build the driver container image

Set the private registry name:

$ export PRIVATE_REGISTRY=<private registry name>

Set OS_TAG to match the guest OS version. Refer to OS Support for the list of supported OS distributions. The example below uses ubuntu20.04; for Red Hat OpenShift, use rhcos4.x, where x is the supported minor OCP version.

$ export OS_TAG=ubuntu20.04

Set the driver container image version to a user defined version number. For example, 1.0.0:

$ export VERSION=1.0.0

Note

VERSION can be any user-defined value. Make a note of it; the same value is needed in the operator installation command.

Set VGPU_DRIVER_VERSION to the version of the Linux guest vGPU driver that you downloaded from the NVIDIA Licensing Portal. In this example, the 460.32.03 driver has been downloaded. Note that the -grid suffix must be appended to the value, as shown:

$ export VGPU_DRIVER_VERSION=460.32.03-grid
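
The version string, including the -grid suffix, can also be read from the downloaded file name instead of being typed by hand. The sketch below assumes the file follows the NVIDIA-Linux-x86_64-<version>-grid.run naming described earlier; vgpu_version_from_file is a hypothetical helper name.

```shell
# Sketch: derive VGPU_DRIVER_VERSION from the guest driver file name.
# vgpu_version_from_file is a hypothetical helper name.
vgpu_version_from_file() {
  name=$(basename "$1")
  name=${name#NVIDIA-Linux-x86_64-}   # strip the fixed prefix
  printf '%s\n' "${name%.run}"         # strip the extension, keep "-grid"
}

# Usage, assuming a single *-grid.run file was copied into drivers/:
# export VGPU_DRIVER_VERSION=$(vgpu_version_from_file drivers/*-grid.run)
```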

Note

The GPU Operator automatically selects a compatible guest driver version from the drivers bundled with the driver image. If the version check is disabled by passing --build-arg DISABLE_VGPU_VERSION_CHECK=true when building the driver image, the VGPU_DRIVER_VERSION value is used as the default.
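
Because the build and push commands depend on four environment variables, an unset variable produces a malformed image name. The sketch below fails early in that case and prints the image reference the build will use. driver_image_ref is a hypothetical helper, not part of the driver repository.

```shell
# Sketch: fail early if a build variable is unset, then print the image name.
# driver_image_ref is a hypothetical helper, not part of the driver repository.
driver_image_ref() (
  # ${VAR:?message} aborts this subshell with an error when VAR is unset or empty.
  : "${PRIVATE_REGISTRY:?set PRIVATE_REGISTRY first}"
  : "${VERSION:?set VERSION first}"
  : "${OS_TAG:?set OS_TAG first}"
  : "${VGPU_DRIVER_VERSION:?set VGPU_DRIVER_VERSION first}"
  printf '%s/driver:%s-%s\n' "$PRIVATE_REGISTRY" "$VERSION" "$OS_TAG"
)
```

The function body runs in a subshell, so a missing variable ends only the check and not the calling terminal session.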

Build the driver container image

$ sudo docker build \
  --build-arg DRIVER_TYPE=vgpu \
  --build-arg DRIVER_VERSION=$VGPU_DRIVER_VERSION \
  -t ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG} .

  • Push the driver container image to your private repository

$ sudo docker login ${PRIVATE_REGISTRY} --username=<username>   # enter the password at the prompt
$ sudo docker push ${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG}
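
For scripted environments where no one can answer a password prompt, docker login can read the password from standard input through its --password-stdin option. This is a sketch under stated assumptions: REGISTRY_USER and REGISTRY_PASSWORD_FILE are hypothetical variable names chosen for this example, and the function calls docker without sudo, so adjust it to match how Docker is run on your system.

```shell
# Sketch: log in without an interactive prompt, then push.
# REGISTRY_USER and REGISTRY_PASSWORD_FILE are hypothetical names chosen for this example.
push_driver_image() {
  # The password is read from a file on stdin, so it never appears in the process list.
  docker login "${PRIVATE_REGISTRY}" --username "${REGISTRY_USER}" --password-stdin \
    < "${REGISTRY_PASSWORD_FILE}" || return 1
  docker push "${PRIVATE_REGISTRY}/driver:${VERSION}-${OS_TAG}"
}
```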

  • Install the GPU Operator.

Create an NVIDIA vGPU license file named gridd.conf with the following content:

# Description: Set License Server Address
# Data type: string
# Format:  "<address>"
ServerAddress=<license server address>

Replace <license server address> with the address of your license server.

Note

Optionally, if a backup (secondary) license server is configured, add its address on a separate line: BackupServerAddress=<backup license server address>
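
The file can also be generated from the command line, which is convenient when the server addresses come from another script. The following sketch writes gridd.conf in the current directory with the content shown above; write_gridd_conf is a hypothetical helper, and its second argument (the backup server) is optional.

```shell
# Sketch: write gridd.conf from command-line arguments.
# write_gridd_conf is a hypothetical helper; the second argument is optional.
write_gridd_conf() {
  server=$1
  backup=${2:-}
  {
    echo "# Description: Set License Server Address"
    echo "# Data type: string"
    echo "# Format:  \"<address>\""
    echo "ServerAddress=${server}"
    # Only emit the backup entry when a secondary server was given.
    if [ -n "$backup" ]; then
      echo "BackupServerAddress=${backup}"
    fi
  } > gridd.conf
}

# Usage: write_gridd_conf <license server address> [<backup license server address>]
```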

Create a ConfigMap named licensing-config from the gridd.conf file created above:

$ kubectl create namespace gpu-operator-resources
$ kubectl create configmap licensing-config \
  -n gpu-operator-resources --from-file=gridd.conf

Create an image pull secret:

$ export REGISTRY_SECRET_NAME=registry-secret
$ kubectl create secret docker-registry ${REGISTRY_SECRET_NAME} \
  --docker-server=${PRIVATE_REGISTRY} --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email-id> -n gpu-operator-resources

Note

Make a note of the secret name stored in REGISTRY_SECRET_NAME; it is needed in the operator installation command.

  • Install GPU Operator via the Helm chart

Refer to the Install NVIDIA GPU Operator section for the GPU Operator installation command and the options for vGPU.
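
The values noted in the earlier steps (private registry, VERSION, the image pull secret name, and the licensing-config ConfigMap) are passed to the chart at install time. The sketch below only prints candidate --set flags for review and does not run helm. The driver.* value names are assumptions about the chart and may differ between GPU Operator releases, so verify them against the Install NVIDIA GPU Operator section or the output of helm show values for your chart. vgpu_helm_flags is a hypothetical helper name.

```shell
# Sketch: print the vGPU-related --set flags for review before running helm.
# The driver.* value names are assumptions about the chart; verify them first.
vgpu_helm_flags() {
  printf '%s\n' \
    "--set driver.repository=${PRIVATE_REGISTRY}" \
    "--set driver.version=${VERSION}" \
    "--set driver.imagePullSecrets={${REGISTRY_SECRET_NAME}}" \
    "--set driver.licensingConfig.configMapName=licensing-config"
}
```

Printing the flags first makes it easy to spot an empty variable before it ends up in a release.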