Install GPU Operator in Proxy Environments

Introduction

This page describes how to successfully deploy the GPU Operator in clusters behind an HTTP proxy. By default, the GPU Operator requires internet access for the following reasons:

Container images need to be pulled during GPU Operator installation.

The driver container needs to download several OS packages prior to driver installation.

Tip

Using Precompiled Driver Containers removes the need for the driver containers to download operating system packages.

To address these requirements, all Kubernetes nodes and the driver container must be configured to route traffic through the proxy.

This document demonstrates how to configure the GPU Operator so that the driver container can download packages from behind an HTTP proxy. Because configuring Kubernetes and container runtime components to use a proxy is not specific to the GPU Operator, those instructions are not included here.

The instructions for OpenShift are different. If you are not running OpenShift, skip the section titled HTTP Proxy Configuration for OpenShift.

Prerequisites

The Kubernetes cluster is configured with HTTP proxy settings, and the container runtime on each node is configured to use the HTTP proxy.

HTTP Proxy Configuration for OpenShift

For OpenShift, use the cluster-wide Proxy object to provide proxy information for the cluster. Follow the procedure described in Configuring the cluster-wide proxy in the Red Hat OpenShift public documentation. The GPU Operator automatically injects proxy-related environment variables into the driver container based on the information in the cluster-wide Proxy object.

Note

GPU Operator v1.8.0 does not work well on Red Hat OpenShift when a cluster-wide Proxy object is configured: the driver container restarts constantly. A fix is planned for the upcoming v1.8.2 patch release.

HTTP Proxy Configuration

First, get the values.yaml file used for GPU Operator configuration:

$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/v1.7.0/deployments/gpu-operator/values.yaml

Note

Replace v1.7.0 in the command above with the GPU Operator version you intend to install, so that values.yaml matches the chart version.
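
To avoid editing the version by hand in several places, the URL can be built from a single version argument. The following is a minimal sketch: the function name values_url is hypothetical and not part of any NVIDIA tooling, and the URL pattern is simply the one used in the curl command above.

```shell
# Hypothetical helper (not part of the GPU Operator tooling): print the raw
# GitHub URL of values.yaml for a given release tag, following the URL
# pattern of the curl command shown earlier.
values_url() {
  version="$1"
  printf 'https://raw.githubusercontent.com/NVIDIA/gpu-operator/%s/deployments/gpu-operator/values.yaml\n' "$version"
}

# Example: print the URL for the v1.7.0 tag.
values_url v1.7.0

# To download the file (requires network access, not executed here):
#   curl -sO "$(values_url v1.7.0)"
```

Passing the same tag that you later use for the Helm chart keeps the two files in step.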

In values.yaml, set driver.env to include the appropriate HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables, in both uppercase and lowercase:

driver:
   env:
   - name: HTTPS_PROXY
     value: http://<example.proxy.com:port>
   - name: HTTP_PROXY
     value: http://<example.proxy.com:port>
   - name: NO_PROXY
     value: <example.com>
   - name: https_proxy
     value: http://<example.proxy.com:port>
   - name: http_proxy
     value: http://<example.proxy.com:port>
   - name: no_proxy
     value: <example.com>

Note

The GPU Operator automatically injects these proxy-related environment variables into the driver container, which uses them when downloading the required packages.
If an HTTPS proxy server is set up, change the values of HTTPS_PROXY and https_proxy to use https instead of http.
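
Because the same values are repeated in uppercase and lowercase, the driver.env block can be generated rather than typed by hand. The sketch below assumes a hypothetical helper named gen_driver_env. The proxy URLs and the NO_PROXY list in the example call are placeholders, not recommended values. The HTTPS proxy URL is a separate argument so that it can use the https scheme when an HTTPS proxy server is set up.

```shell
# Hypothetical helper: print a driver.env block for values.yaml.
#   $1 = URL for HTTP_PROXY and http_proxy
#   $2 = URL for HTTPS_PROXY and https_proxy
#   $3 = comma-separated list for NO_PROXY and no_proxy
gen_driver_env() {
  http_url="$1"
  https_url="$2"
  no_proxy_list="$3"

  printf 'driver:\n   env:\n'
  # Emit each setting twice: once uppercase, once lowercase.
  for name in HTTPS_PROXY https_proxy; do
    printf '   - name: %s\n     value: %s\n' "$name" "$https_url"
  done
  for name in HTTP_PROXY http_proxy; do
    printf '   - name: %s\n     value: %s\n' "$name" "$http_url"
  done
  for name in NO_PROXY no_proxy; do
    printf '   - name: %s\n     value: %s\n' "$name" "$no_proxy_list"
  done
}

# Example with placeholder values (assumptions, replace with your own):
gen_driver_env "http://proxy.example.com:3128" \
               "http://proxy.example.com:3128" \
               "localhost,127.0.0.1"
```

The output is meant to be merged into the existing driver section of values.yaml rather than appended to the end of the file, since values.yaml already contains a driver key.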

Deploy GPU Operator

Download and deploy the GPU Operator Helm chart with the updated values.yaml.

Fetch the chart from the NGC repository. v1.10.0 is used as an example in the command below:

$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.10.0.tgz

Install the GPU Operator with the updated values.yaml:

$ helm install --wait gpu-operator \
     -n gpu-operator --create-namespace \
     gpu-operator-v1.10.0.tgz \
     -f values.yaml

Check the status of the pods to ensure all the containers are running:

$ kubectl get pods -n gpu-operator
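
Two small filters can make this check easier to read on a busy cluster. The following is a sketch under stated assumptions: the function names pods_not_ready and missing_proxy_vars are hypothetical, the first assumes the default kubectl get pods table layout with STATUS as the third column, and it treats Completed as healthy on the assumption that some pods in the namespace run to completion. The driver pod name in the usage comment is a placeholder.

```shell
# Reads `kubectl get pods` table output on stdin and prints the names of pods
# whose STATUS column is neither Running nor Completed. The header row is skipped.
pods_not_ready() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Reads `env` output on stdin and prints each of the six proxy variables
# configured in driver.env that does not appear in it.
missing_proxy_vars() {
  env_dump=$(cat)
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
    printf '%s\n' "$env_dump" | grep -q "^${name}=" || printf '%s\n' "$name"
  done
}

# Usage against a live cluster (not executed here; <driver-pod-name> is a
# placeholder for the name of the driver pod shown by kubectl get pods):
#   kubectl get pods -n gpu-operator | pods_not_ready
#   kubectl exec -n gpu-operator <driver-pod-name> -- env | missing_proxy_vars
```

Empty output from both filters suggests that every pod has reached a healthy state and that the driver container received all six proxy variables.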