Install GPU Operator in Proxy Environments
Introduction
This page describes how to deploy the GPU Operator in clusters behind an HTTP proxy. By default, the GPU Operator requires internet access for the following reasons:
Container images need to be pulled during GPU Operator installation.
The driver container needs to download several OS packages prior to driver installation.
Tip
Using Precompiled Driver Containers removes the need for the driver containers to download operating system packages.
To address these requirements, all Kubernetes nodes as well as the driver container need proper configuration in order to direct traffic through the proxy.
This document demonstrates how to configure the GPU Operator so that the driver container can successfully download packages behind an HTTP proxy. Since configuring Kubernetes/container runtime components to use a proxy is not specific to the GPU Operator, we do not include those instructions here.
The instructions for OpenShift are different, so skip the section titled HTTP Proxy Configuration for OpenShift if you are not running OpenShift.
Prerequisites
The Kubernetes cluster is configured with HTTP proxy settings (the container runtime must be configured to use the HTTP proxy).
HTTP Proxy Configuration for OpenShift
For OpenShift, it is recommended to use the cluster-wide Proxy object to provide proxy information for the cluster. Follow the procedure described in Configuring the cluster-wide proxy in the Red Hat OpenShift public documentation. The GPU Operator automatically injects proxy-related environment variables into the driver container based on the information present in the cluster-wide Proxy object.
Note
GPU Operator v1.8.0 does not work well on Red Hat OpenShift when a cluster-wide Proxy object is configured: it causes constant restarts of the driver container. This will be fixed in an upcoming patch release, v1.8.2.
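Before installing, it can help to confirm what the cluster-wide Proxy object currently contains. The following is a minimal sketch, assuming the oc CLI is installed and logged in to the cluster. The function name show_cluster_proxy is hypothetical.

```shell
# Hypothetical helper: print the cluster-wide Proxy object.
# Assumes the `oc` CLI is on PATH and authenticated against the cluster.
show_cluster_proxy() {
  # The cluster-wide Proxy object is named "cluster".
  oc get proxy/cluster -o yaml
}
```

The spec section of the output should list the httpProxy, httpsProxy, and noProxy settings, if they are configured.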
HTTP Proxy Configuration
First, get the values.yaml file used for GPU Operator configuration:
$ curl -sO https://raw.githubusercontent.com/NVIDIA/gpu-operator/v1.7.0/deployments/gpu-operator/values.yaml
Note
Replace v1.7.0 in the above command with the version you want to use.
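To keep the values.yaml version aligned with the chart you plan to deploy, the version can be held in a single variable. The following is a minimal sketch: GPU_OPERATOR_VERSION and values_yaml_url are hypothetical names, the URL pattern is taken from the command above, and it assumes a tag with that name exists in the repository.

```shell
# Hypothetical helper: build the values.yaml URL for a given release tag.
# The URL pattern mirrors the curl command shown above.
values_yaml_url() {
  printf 'https://raw.githubusercontent.com/NVIDIA/gpu-operator/%s/deployments/gpu-operator/values.yaml\n' "$1"
}

# Example use (commented out so the block can be sourced without network access):
# GPU_OPERATOR_VERSION=v1.10.0
# curl -sO "$(values_yaml_url "$GPU_OPERATOR_VERSION")"
```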
Specify driver.env in values.yaml with the appropriate HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables (in both uppercase and lowercase).
driver:
  env:
  - name: HTTPS_PROXY
    value: http://<example.proxy.com:port>
  - name: HTTP_PROXY
    value: http://<example.proxy.com:port>
  - name: NO_PROXY
    value: <example.com>
  - name: https_proxy
    value: http://<example.proxy.com:port>
  - name: http_proxy
    value: http://<example.proxy.com:port>
  - name: no_proxy
    value: <example.com>
Note
Proxy-related environment variables are automatically injected by the GPU Operator into the driver container to indicate the proxy information used when downloading necessary packages.
If an HTTPS proxy server is set up, change the values of HTTPS_PROXY and https_proxy to use https instead.
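Because the same two values are repeated across six driver.env entries, generating the fragment can reduce copy-paste mistakes. The following is a minimal sketch, not part of the GPU Operator tooling: render_driver_proxy_env is a hypothetical helper that prints a driver.env fragment for one proxy URL and one no-proxy list.

```shell
# Hypothetical helper: print a driver.env fragment for values.yaml.
# Usage: render_driver_proxy_env <proxy-url> <no-proxy-list>
# The body runs in a subshell so its variables do not leak to the caller.
render_driver_proxy_env() (
  proxy_url="$1"
  no_proxy_list="$2"
  printf 'driver:\n  env:\n'
  # Emit every variable in both uppercase and lowercase.
  for name in HTTPS_PROXY HTTP_PROXY NO_PROXY https_proxy http_proxy no_proxy; do
    case "$name" in
      NO_PROXY|no_proxy) value="$no_proxy_list" ;;
      *) value="$proxy_url" ;;
    esac
    printf '  - name: %s\n    value: "%s"\n' "$name" "$value"
  done
)
```

For example, render_driver_proxy_env http://proxy.example.com:3128 example.com prints a fragment that can be merged into values.yaml by hand. The host, port, and domain here are placeholders. This sketch uses one URL for both HTTP and HTTPS traffic, so adapt it if the two differ.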
Deploy GPU Operator
Download and deploy the GPU Operator Helm chart with the updated values.yaml.
Fetch the chart from the NGC repository. v1.10.0 is used as an example in the command below:
$ helm fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v1.10.0.tgz
Install the GPU Operator with the updated values.yaml:
$ helm install --wait gpu-operator \
     -n gpu-operator --create-namespace \
     gpu-operator-v1.10.0.tgz \
     -f values.yaml
Check the status of the pods to ensure all the containers are running:
$ kubectl get pods -n gpu-operator
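Once the driver pod is running, one way to confirm that the proxy settings reached the driver container is to list its environment. The following is a minimal sketch: filter_proxy_env is a hypothetical helper, and <driver-pod-name> is a placeholder for the driver pod name reported by the command above.

```shell
# Hypothetical helper: keep only proxy-related variables from
# `env`-style input (NAME=value lines) read on stdin.
filter_proxy_env() {
  # Matches HTTP_PROXY, HTTPS_PROXY, and NO_PROXY in either case.
  grep -iE '^(https?|no)_proxy='
}

# Example use against a cluster (commented out; requires cluster access):
# kubectl exec -n gpu-operator <driver-pod-name> -- env | filter_proxy_env
```

If the pod runs more than one container, you may need to add the -c option to kubectl exec to select the driver container.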