Precompiled Driver Containers#

About Precompiled Driver Containers#

Containers with precompiled drivers do not require internet access to download Linux kernel header files, GCC compiler tooling, or operating system packages.

Using precompiled drivers also avoids the burst of compute demand that is required to compile the kernel drivers with the conventional driver containers.

These two benefits are valuable to most sites, but are especially beneficial to sites with restricted internet access or sites with resource-constrained hardware.

Limitations and Restrictions#

Support for deploying the driver containers with precompiled drivers is limited to hosts with the x86_64 architecture and operating system versions listed in the Supported Precompiled Drivers table.

For information about using precompiled drivers with OpenShift Container Platform, refer to Precompiled Drivers for the NVIDIA GPU Operator for RHCOS.
NVIDIA supports precompiled driver containers for the most recently released long-term servicing branch (LTSB) driver branch.
NVIDIA builds images for the aws, azure, generic, nvidia, and oracle kernel variants. If your hosts run a different kernel variant, you can build a precompiled driver image and use your own container registry.
Precompiled driver containers do not support NVIDIA vGPU or GPUDirect Storage (GDS).

Determining if a Precompiled Driver Container is Available#

The precompiled driver containers are named according to the following pattern:

<driver-branch>-<linux-kernel-version>-<os-tag>

For example, 525-5.15.0-69-generic-ubuntu22.04.

Use one of the following ways to check if a driver container is available for your Linux kernel and driver branch:

Use a web browser to access the NVIDIA GPU Driver page of the NVIDIA GPU Cloud registry at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/driver/tags. Use the search field to filter the tags by your operating system version.

Use the NGC CLI tool to list the tags for the driver container:

$ ngc registry image info nvidia/driver

Example Output

Image Repository Information
  Name: driver
  Display Name: NVIDIA GPU Driver
  Short Description: Provision NVIDIA GPU Driver as a Container.
  Built By: NVIDIA
  Publisher: NVIDIA
  Multinode Support: False
  Multi-Arch Support: True
  Logo: https://assets.nvidiagrid.net/ngc/logos/Infrastructure.png
  Labels: Multi-Arch, NVIDIA AI Enterprise Supported, Infrastructure Software, Kubernetes Infrastructure
  Public: Yes
  Last Updated: Apr 20, 2023
  Latest Image Size: 688.87 MB
  Latest Tag: 525-5.15.0-69-generic-ubuntu22.04
  Tags:
      525-5.15.0-69-generic-ubuntu22.04
      525-5.15.0-70-generic-ubuntu22.04
      ...

Enabling Precompiled Driver Container Support During Installation#

Refer to the common instructions for installing the Operator with Helm at Installing the NVIDIA GPU Operator. Specify the --set driver.usePrecompiled=true and --set driver.version=<driver-branch> arguments like the following example command:

$ helm install --wait gpu-operator \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator \
     --version=v25.3.4 \
     --set driver.usePrecompiled=true \
     --set driver.version="<driver-branch>"

Specify a value like 525 for <driver-branch>. Refer to Common Chart Customization Options for information about other installation options.

Enabling Support After Installation#

Perform the following steps to enable support for precompiled driver containers:

Enable support by modifying the cluster policy:

$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
   -p='[
     {"op":"replace", "path":"/spec/driver/usePrecompiled", "value":true},
     {"op":"replace", "path":"/spec/driver/version", "value":"<driver-branch>"}
   ]'

Specify a value like 525 for <driver-branch>.

Example Output

clusterpolicy.nvidia.com/cluster-policy patched

Optional: Confirm that the driver daemon set pods terminate:

$ kubectl get pods -n gpu-operator

Example Output

NAME                                                              READY   STATUS        RESTARTS   AGE
pod/gpu-feature-discovery-pzzr8                                   2/2     Running       0          19m
pod/gpu-operator-859cb64846-57hfn                                 1/1     Running       0          47m
pod/gpu-operator-node-feature-discovery-master-6d6649d597-7l8bj   1/1     Running       0          10d
pod/gpu-operator-node-feature-discovery-worker-v86vj              1/1     Running       0          10d
pod/nvidia-container-toolkit-daemonset-6ltbv                      1/1     Running       0          19m
pod/nvidia-cuda-validator-62w6r                                   0/1     Completed     0          17m
pod/nvidia-dcgm-exporter-fh5wz                                    1/1     Running       0          19m
pod/nvidia-device-plugin-daemonset-rwslh                          2/2     Running       0          19m
pod/nvidia-device-plugin-validator-gq4ww                          0/1     Completed     0          17m
pod/nvidia-driver-daemonset-xqrxk                                 1/1     Terminating   0          20m
pod/nvidia-operator-validator-78mzv                               1/1     Running       0          19m

Confirm that the driver container pods are running:

$ kubectl get pods -l app=nvidia-driver-daemonset -n gpu-operator

Example Output

NAME                                                          READY   STATUS    RESTARTS   AGE
nvidia-driver-daemonset-5.15.0-69-generic-ubuntu22.04-thbts   1/1     Running   0          44s

Ensure that the pod names include a Linux kernel semantic version number like 5.15.0-69-generic.

Disabling Support for Precompiled Driver Containers#

Perform the following steps to disable support for precompiled driver containers:

Disable support by modifying the cluster policy:

$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[
      {"op": "replace", "path": "/spec/driver/usePrecompiled", "value":false},
      {"op": "replace", "path": "/spec/driver/version", "value":"550.90.07"},
    ]'

Example Output

clusterpolicy.nvidia.com/cluster-policy patched

Confirm that the conventional driver container pods are running:

$ kubectl get pods -l app=nvidia-driver-daemonset -n gpu-operator

Example Output

NAME                            READY   STATUS    RESTARTS   AGE
nvidia-driver-daemonset-qwprp   1/1     Running   0          10m

Ensure that the pod names do not include a Linux kernel semantic version number.

Building a Custom Driver Container Image#

If a precompiled driver container for your Linux kernel variant is not available, you can perform the following steps to build and run a container image.

Note

NVIDIA provides limited support for custom driver container images.

Prerequisites

You have access to a private container registry, such as NVIDIA NGC Private Registry, and can push container images to the registry.
Your build machine has access to the internet to download operating system packages.
You know a CUDA version, such as 12.1.0, that you want to use. The CUDA version only specifies which base image is used to build the driver container. The version does not have any correlation to the version of CUDA that is associated with or supported by the resulting driver container.

One way to find a supported CUDA version for your operating system is to access the NVIDIA GPU Cloud registry at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags and view the tags. Use the search field to filter the tags, such as base-ubuntu22.04. The filtered results show the CUDA versions, such as 12.1.0, 12.0.1, 12.0.0, and so on.
You know the GPU driver branch, such as 525, that you want to use.

Procedure

Clone the driver container repository and change directory into the repository:

$ git clone https://github.com/NVIDIA/gpu-driver-container.git

$ cd gpu-driver-container

Change directory to the operating system name and version under the driver directory:
```
$ cd ubuntu22.04/precompiled
```
Set environment variables for building the driver container image.
- Specify your private registry URL:
```
$ export PRIVATE_REGISTRY=<private-registry-url>
```
- Specify the KERNEL_VERSION environment variable that matches your kernel variant, such as 5.15.0-1033-aws:
```
$ export KERNEL_VERSION=5.15.0-1033-aws
```
- Specify the version of the CUDA base image to use when building the driver container:
```
$ export CUDA_VERSION=12.1.0
```
- Specify the driver branch, such as 525:
```
$ export DRIVER_BRANCH=525
```
- Specify the OS_TAG environment variable to identify the guest operating system name and version:
```
$ export OS_TAG=ubuntu22.04
```
  The value must match the guest operating system version.

Build the driver container image:

$ sudo docker build \
    --build-arg KERNEL_VERSION=$KERNEL_VERSION \
    --build-arg CUDA_VERSION=$CUDA_VERSION \
    --build-arg DRIVER_BRANCH=$DRIVER_BRANCH \
    -t ${PRIVATE_REGISTRY}/driver:${DRIVER_BRANCH}-${KERNEL_VERSION}-${OS_TAG} .

Push the driver container image to your private registry.
- Log in to your private registry:
```
$ sudo docker login ${PRIVATE_REGISTRY} --username=<username>
```
  Enter your password when prompted.
- Push the driver container image to your private registry:
```
$ sudo docker push ${PRIVATE_REGISTRY}/driver:${DRIVER_BRANCH}-${KERNEL_VERSION}-${OS_TAG}
```

Next Steps

To use the custom driver container image, follow the steps for enabling support during or after installation.

If you have not already installed the GPU Operator, in addition to the --set driver.usePrecompiled=true and --set driver.version=${DRIVER_BRANCH} arguments for Helm, also specify the --set driver.repository="$PRIVATE_REGISTRY" argument.

If the container registry is not public, you need to create an image pull secret in the GPU Operator namespace and specify it in the --set driver.imagePullSecrets=<pull-secret> argument.

If you already installed the GPU Operator, specify the private registry for the driver in the cluster policy:
```
$ kubectl patch clusterpolicies.nvidia.com/cluster-policy --type='json' \
    -p='[{"op": "replace", "path": "/spec/driver/repository", "value":"$PRIVATE_REGISTRY"}]'
```