Install GPU Operator in Proxy Environments
This page describes how to deploy the GPU Operator in clusters behind an HTTP proxy. By default, the GPU Operator requires internet access for the following reasons:
Container images need to be pulled during GPU Operator installation.
The driver container needs to download several OS packages prior to driver installation.
To address these requirements, all Kubernetes nodes as well as the driver container must be configured to direct traffic through the proxy.
This document demonstrates how to configure the GPU Operator so that the driver container can download packages from behind an HTTP proxy. Since configuring Kubernetes and container runtime components to use a proxy is not specific to the GPU Operator, those instructions are not included here.
The instructions for OpenShift are different, so skip the section titled HTTP Proxy Configuration for OpenShift if you are not running OpenShift.
Prerequisite: the Kubernetes cluster is configured with HTTP proxy settings (the container runtime must also be configured to use the HTTP proxy).
HTTP Proxy Configuration for OpenShift
For OpenShift, it is recommended to use the cluster-wide Proxy object to provide proxy information for the cluster.
Follow the procedure described in Configuring the cluster-wide proxy in the Red Hat OpenShift public documentation. The GPU Operator automatically injects proxy-related environment variables into the driver container based on the information present in the cluster-wide Proxy object.
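Before relying on the automatic injection, it can help to confirm what the cluster-wide Proxy object currently contains. The following is a minimal sketch, assuming the `oc` CLI is installed and logged in to the cluster; the function name `show_cluster_proxy` is hypothetical and only wraps a standard `oc get` call.

```shell
# Hypothetical helper: print the cluster-wide Proxy object as YAML.
# Assumes the `oc` CLI is available and authenticated against the cluster.
show_cluster_proxy() {
  oc get proxy/cluster -o yaml
}
```

Running `show_cluster_proxy` should show the proxy settings that the GPU Operator reads from the Proxy object.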
GPU Operator v1.8.0 does not work well on Red Hat OpenShift when a cluster-wide Proxy object is configured: the driver container restarts constantly. This will be fixed in an upcoming patch release, v1.8.2.
HTTP Proxy Configuration
First, get the up-to-date values.yaml file used for GPU Operator configuration:
$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/gpu-operator/values.yaml
The above command retrieves the latest values.yaml. If you want to use a specific GPU Operator version, replace master in the URL with the appropriate version tag, for example v1.7.0.
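As a rough sketch of fetching values.yaml for a specific version, the helper below builds the download URL from a version tag. It assumes that tagged revisions keep values.yaml at the same path as master; the function name `values_url` is hypothetical, and v1.7.0 is only the example version mentioned in the text.

```shell
# Hypothetical helper: print the raw values.yaml URL for a given version tag.
# Assumption: tags use the same repository path layout as the master branch.
values_url() {
  version="$1"
  printf '%s\n' "https://raw.githubusercontent.com/NVIDIA/gpu-operator/${version}/deployments/gpu-operator/values.yaml"
}
```

With this helper defined, `curl -sO "$(values_url v1.7.0)"` would download the file for that tag.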
Next, set driver.env in values.yaml with the appropriate HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables (in both uppercase and lowercase).
driver:
  env:
  - name: HTTPS_PROXY
    value: http://<example.proxy.com:port>
  - name: HTTP_PROXY
    value: http://<example.proxy.com:port>
  - name: NO_PROXY
    value: <example.com>
  - name: https_proxy
    value: http://<example.proxy.com:port>
  - name: http_proxy
    value: http://<example.proxy.com:port>
  - name: no_proxy
    value: <example.com>
The GPU Operator automatically injects these proxy-related environment variables into the driver container, where they supply the proxy information used when downloading the necessary packages.
If an HTTPS proxy server is set up, change the values of HTTPS_PROXY and https_proxy to use https instead.
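Because the variables are expected in both uppercase and lowercase, it is easy to miss one. The following is a minimal sketch of a sanity check, assuming the entries are written as `- name: <VAR>` lines as in the example above; the function name `check_proxy_env` is hypothetical and is not part of the GPU Operator tooling.

```shell
# Hypothetical helper: report proxy variable names missing from a values file.
# Assumes each variable appears on its own "- name: <VAR>" line.
check_proxy_env() {
  file="$1"
  missing=0
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy; do
    # grep is case-sensitive by default, so both spellings are checked separately.
    if ! grep -q -- "- name: ${name}[[:space:]]*\$" "$file"; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  # Exit status 0 means all six names were found, 1 means at least one is missing.
  return "$missing"
}
```

Calling `check_proxy_env values.yaml` prints one line per missing variable name and returns a non-zero status if any is absent. It only checks names, not whether the proxy values themselves are correct.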
Deploy GPU Operator
Download and deploy the GPU Operator Helm chart with the updated values.yaml.
Fetch the latest version of the chart from the NGC repository. v1.8.1 is used in the command below:
$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.8.1.tgz
Install the GPU Operator with the updated values.yaml:
$ helm install --wait gpu-operator \
    gpu-operator-v1.8.1.tgz \
    -f values.yaml
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator-resources
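When many pods are listed, a small filter can make it easier to spot the ones that are not yet healthy. The sketch below assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE) with a header row; the function name `not_ready_pods` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose STATUS column is neither Running nor Completed.
# Assumes the default column layout, with STATUS as the third column.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}
```

For example, `kubectl get pods -n gpu-operator-resources | not_ready_pods` prints nothing once every pod is either Running or Completed.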