Install GPU Operator in Proxy Environments
Introduction
This page describes how to deploy the GPU Operator in clusters behind an HTTP proxy. By default, the GPU Operator requires internet access for the following reasons:
- Container images need to be pulled during GPU Operator installation.
- The driver container needs to download several OS packages prior to driver installation.
To address these requirements, all Kubernetes nodes as well as the driver container need proper configuration
in order to direct traffic through the proxy.
This document demonstrates how to configure the GPU Operator so that the driver container can
download packages from behind an HTTP proxy. Configuring Kubernetes and container runtime components to use
a proxy is not specific to the GPU Operator, so those instructions are not included here.
The instructions for OpenShift are different. If you are not running OpenShift, skip the section titled HTTP Proxy Configuration for OpenShift.
Prerequisites
- The Kubernetes cluster is configured with HTTP proxy settings (the container runtime must also be configured to use the HTTP proxy).
HTTP Proxy Configuration for OpenShift
For OpenShift, use the cluster-wide Proxy object to provide proxy information for the cluster.
Follow the procedure described in Configuring the cluster-wide proxy
in the Red Hat OpenShift public documentation. The GPU Operator automatically injects proxy-related environment variables into the driver container
based on the information in the cluster-wide Proxy object.
Note
- GPU Operator v1.8.0 does not work well on Red Hat OpenShift when a cluster-wide Proxy object is configured: the driver container restarts constantly. This will be fixed in an upcoming patch release, v1.8.2.
HTTP Proxy Configuration
First, get the values.yaml file used for GPU Operator configuration:
$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/v1.7.0/deployments/gpu-operator/values.yaml
Note
Replace v1.7.0 in the above command with the version you want to use.
In values.yaml, set driver.env to the appropriate HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables
(in both uppercase and lowercase):
driver:
   env:
   - name: HTTPS_PROXY
     value: http://<example.proxy.com:port>
   - name: HTTP_PROXY
     value: http://<example.proxy.com:port>
   - name: NO_PROXY
     value: <example.com>
   - name: https_proxy
     value: http://<example.proxy.com:port>
   - name: http_proxy
     value: http://<example.proxy.com:port>
   - name: no_proxy
     value: <example.com>
Note
- The GPU Operator automatically injects the proxy-related environment variables into the driver container to indicate the proxy information used when downloading the necessary packages.
- If an HTTPS proxy server is set up, change the values of HTTPS_PROXY and https_proxy to use https instead.
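Because the same proxy values must be repeated in uppercase and lowercase, the driver.env block is easy to mistype by hand. The following is a minimal sketch, not part of the GPU Operator tooling, of a shell helper that prints the block from three arguments. The function names, the example proxy address, and the output file name proxy-values.yaml are all hypothetical; the sketch also assumes the values contain no double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: print a driver.env block for the GPU Operator Helm chart.
# Arguments (placeholders supplied by the caller):
#   $1  proxy URL for HTTP traffic
#   $2  proxy URL for HTTPS traffic
#   $3  comma-separated hosts/domains that bypass the proxy
# Assumption: values contain no double quotes or backslashes.

# Print one name/value list entry, indented like the example above.
emit_env_entry() {
    printf '   - name: %s\n' "$1"
    printf '     value: "%s"\n' "$2"
}

# Print the full block: uppercase names first, then lowercase duplicates.
gen_driver_env() {
    printf 'driver:\n'
    printf '   env:\n'
    emit_env_entry HTTPS_PROXY "$2"
    emit_env_entry HTTP_PROXY  "$1"
    emit_env_entry NO_PROXY    "$3"
    emit_env_entry https_proxy "$2"
    emit_env_entry http_proxy  "$1"
    emit_env_entry no_proxy    "$3"
}

# Example (hypothetical proxy address and file name):
#   gen_driver_env http://proxy.example.com:3128 \
#                  http://proxy.example.com:3128 \
#                  localhost,127.0.0.1 > proxy-values.yaml
```

One possible way to use the result is to keep it in its own file and pass that file to helm install with an additional -f flag after values.yaml; when several -f files are given, Helm gives precedence to the later ones, so the generated driver.env list would replace the one in values.yaml.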
Deploy GPU Operator
Download and deploy the GPU Operator Helm chart with the updated values.yaml.
Fetch the chart from the NGC repository. v1.10.0 is used as an example in the command below; use the same version as the values.yaml file you downloaded:
$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.10.0.tgz
Install the GPU Operator with the updated values.yaml:
$ helm install --wait gpu-operator \
     -n gpu-operator --create-namespace \
     gpu-operator-v1.10.0.tgz \
     -f values.yaml
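The values.yaml download and the chart download each embed a version string, and the two examples on this page happen to use different versions. As a hedged sketch, the helper below derives both URLs from a single variable so that they cannot drift apart. The variable and function names are hypothetical, and the sketch assumes that both URL patterns shown on this page hold for the release tag you choose.

```shell
#!/bin/sh
# Sketch: derive both download URLs from one release tag.
# GPU_OPERATOR_VERSION is a hypothetical variable name; v1.10.0 is only
# the example version used on this page.
GPU_OPERATOR_VERSION=${GPU_OPERATOR_VERSION:-v1.10.0}

# URL of the values.yaml file for the given release tag.
values_url() {
    printf 'https://raw.githubusercontent.com/NVIDIA/gpu-operator/%s/deployments/gpu-operator/values.yaml\n' "$1"
}

# URL of the packaged Helm chart for the given release tag.
chart_url() {
    printf 'https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-%s.tgz\n' "$1"
}

# Usage (commented out because it needs network access):
#   curl -sO "$(values_url "$GPU_OPERATOR_VERSION")"
#   helm fetch "$(chart_url "$GPU_OPERATOR_VERSION")"
```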
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator
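Reading the pod list by eye works for a one-off install. For scripted installs, the check could be automated along the lines of the sketch below, which reads the output of kubectl get pods with the --no-headers flag on standard input. The function name is hypothetical, and the sketch assumes the default column layout (NAME, READY, STATUS, ...); it treats a pod as healthy when its status is Completed, or Running with all containers ready.

```shell
#!/bin/sh
# Sketch: succeed only if every pod listed on stdin looks healthy.
# Input: output of `kubectl get pods -n gpu-operator --no-headers`.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
pods_all_ready() {
    awk '
        # Finished pods (for example one-shot validators) count as healthy.
        $3 == "Completed" { next }
        {
            # READY has the form ready/total, for example 1/1.
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) {
                print "not ready: " $1 " (" $3 ", " $2 ")"
                bad = 1
            }
        }
        # Fail on any unhealthy pod, and also when no pods were listed.
        END { exit ((bad || NR == 0) ? 1 : 0) }
    '
}

# Usage (commented out because it needs a cluster):
#   kubectl get pods -n gpu-operator --no-headers | pods_all_ready
```

A driver container that cannot reach the proxy would typically show up here as a pod that is not Running or not fully ready, at which point the pod logs are the next place to look.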