NVIDIA GPUs with Google Anthos
Changelog
- 3/22/2021 (author: PR):
Fixed URLs
- 11/30/2020 (author: PR/DF):
Added information on Anthos on bare metal
- 11/25/2020 (author: PR):
Migrated docs to new format
- 8/14/2020 (author: PR):
Initial Version
Introduction
Google Cloud’s Anthos is a modern application management platform that lets users build, deploy, and manage applications anywhere in a secure, consistent manner. The platform provides a consistent development and operations experience across deployments while reducing operational overhead and improving developer productivity. Anthos runs in hybrid and multi-cloud environments spanning Google Cloud and on-premises infrastructure, and is generally available on Amazon Web Services (AWS). Support for Anthos on Microsoft Azure is in preview. For more information on Anthos, see the product overview.
Systems with NVIDIA GPUs can be deployed in various configurations for use with Google Cloud’s Anthos. The purpose of this document is to provide users with steps on getting started with using NVIDIA GPUs with Anthos in these various configurations.
Deployment Configurations
Anthos can be deployed in different configurations. Depending on your deployment, choose one of the sections below to get started with NVIDIA GPUs in Google Cloud’s Anthos:
Supported Platforms
GPUs
The following GPUs are supported:
NVIDIA A100, T4 and V100
DGX Systems
The following NVIDIA DGX systems are supported:
NVIDIA DGX A100
NVIDIA DGX-2 and DGX-1 (Volta)
Linux Distributions
The following Linux distributions are supported:
Ubuntu 18.04.z, 20.04.z LTS
For more information on the Anthos Ready platforms, visit this page.
Getting Support
For support issues related to using GPUs with Anthos, please open a ticket on the NVIDIA GPU Operator GitHub project. Your feedback is appreciated.
DGX customers can visit the NVIDIA DGX Systems Support Portal.
Anthos Clusters on Bare Metal with NVIDIA DGX Systems and GPU-Accelerated Servers
Anthos on bare metal with NVIDIA DGX A100 systems or NVIDIA GPU-accelerated servers enables a consistent development and operational experience across deployments, while reducing expensive overhead and improving developer productivity. Refer to the Anthos documentation for more information on Anthos cluster environments.
Installation Flow
The basic steps described in this document follow this workflow:
Configure nodes
Ensure each node (including the control plane) meets the prerequisites, including time synchronization, correct versions of Docker, and other conditions.
Configure networking (Optional)
Ensure network connectivity between the control plane and the nodes; ideally the VIPs, the control plane, and the nodes in the cluster are on the same network subnet.
Configure an admin workstation and set up Anthos to create the cluster
Set up the cluster using Anthos on bare-metal
Setup NVIDIA software on GPU nodes
Set up the NVIDIA software components on the GPU nodes to ensure that your cluster can run CUDA applications.
At the end of the installation flow, you should have a user cluster with GPU-enabled nodes that you can use to deploy applications.
Configure Nodes
These steps are required on each node in the cluster (including the control plane).
Time Synchronization
Ensure
apparmor
is stopped:$ apt-get install -y apparmor-utils policycoreutils
$ systemctl --now enable apparmor \ && systemctl stop apparmor
Synchronize the time on each node:
Check the current time
$ timedatectl
Local time: Fri 2020-11-20 10:38:06 PST
Universal time: Fri 2020-11-20 18:38:06 UTC
RTC time: Fri 2020-11-20 18:38:08
Time zone: US/Pacific (PST, -0800)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
Configure the NTP server in /etc/systemd/timesyncd.conf:
NTP=time.google.com
Adjust the system clock:
$ timedatectl set-local-rtc 0 --adjust-system-clock
Restart the service
$ systemctl restart systemd-timesyncd.service
Verify the synchronization with the time server
$ timedatectl
Local time: Fri 2020-11-20 11:03:22 PST
Universal time: Fri 2020-11-20 19:03:22 UTC
RTC time: Fri 2020-11-20 19:03:22
Time zone: US/Pacific (PST, -0800)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Test Network Connectivity
Ensure that hostnames can be resolved (for example, using nslookup or ping):
$ systemctl restart systemd-resolved \
    && ping us.archive.ubuntu.com
ping: us.archive.ubuntu.com: Temporary failure in name resolution
If name resolution fails, check the nameserver in /etc/resolv.conf:
$ cat <<EOF > /etc/resolv.conf
nameserver 8.8.8.8
EOF
And re-test ping:
$ ping us.archive.ubuntu.com
PING us.archive.ubuntu.com (91.189.91.38) 56(84) bytes of data.
64 bytes from banjo.canonical.com (91.189.91.38): icmp_seq=1 ttl=49 time=73.4 ms
64 bytes from banjo.canonical.com (91.189.91.38): icmp_seq=2 ttl=49 time=73.3 ms
64 bytes from banjo.canonical.com (91.189.91.38): icmp_seq=3 ttl=49 time=73.4 ms
Install Docker
Follow these steps to install Docker. On DGX systems, Docker may already be installed using the docker-ce package. In this case, use docker.io as the base installation package for Docker to ensure a successful cluster setup with Anthos.
Stop services using docker:
$ systemctl stop kubelet \
    && systemctl stop docker \
    && systemctl stop containerd \
    && systemctl stop containerd.io
Purge the existing Docker and nvidia-docker2 packages, if any:
$ systemctl stop run-docker-netns-default.mount \
    && systemctl stop docker.haproxy
$ dpkg -r nv-docker-options \
    && dpkg --purge nv-docker-options \
    && dpkg -r nvidia-docker2 \
    && dpkg --purge nvidia-docker2 \
    && dpkg -r docker-ce \
    && dpkg --purge docker-ce \
    && dpkg -r docker-ce-cli \
    && dpkg -r containerd \
    && dpkg --purge containerd \
    && dpkg -r containerd.io \
    && dpkg --purge containerd.io
Re-install Docker:
$ apt-get update \
    && apt-get install -y apt-transport-https \
        ca-certificates \
        curl \
        software-properties-common \
        inetutils-traceroute \
        conntrack
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable"
$ apt-get update \
    && apt-get install -y docker.io
$ systemctl --now enable docker
Install nvidia-docker on GPU Nodes
Note
This step should be performed on the GPU nodes only.
For DGX systems, re-install nvidia-docker2 from the DGX repositories:
$ apt-get install -y nvidia-docker2
Since Kubernetes does not yet support the --gpus option with Docker, the nvidia runtime should be set up as the default container runtime for Docker on the GPU node. This can be done by adding the default-runtime line to the Docker daemon config file, which is usually located on the system at /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart the Docker daemon to complete the installation after setting the default runtime:
$ sudo systemctl restart docker
For non-DGX systems, refer to the NVIDIA Container Toolkit installation guide to set up nvidia-docker2.
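As a quick check that the default runtime exposes the GPU, you can run nvidia-smi in a container. This is a minimal sketch; the CUDA base image tag shown here is an example, so substitute one available in your environment:
$ docker run --rm nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
The command should list the GPUs on the node without passing the --gpus flag, confirming that nvidia is the default runtime.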
Configure Networking (Optional)
Note
The following steps are provided as a reference for configuring the network so that the control plane and the nodes are on the same subnet by using tunnels and DNAT. If the nodes in your cluster are on the same subnet, then you may skip this step.
In the example below:
The control plane is at 10.117.29.41
The GPU node or admin workstation is at 10.110.20.149
The control plane VIP is 10.0.0.8
If the machines are on a different subnet from each other or from the control plane VIP, then tunnel routes can be used to establish connectivity. There are two scenarios to consider:
If the machines are on the same subnet but the VIP is on a different subnet, add the correct IP route (using ip route add 10.0.0.8 via <control-plane-ip>) from the GPU node or admin workstation.
If the machines and the VIP are on different subnets, a tunnel is also needed for the route command above to succeed, where <control-plane-ip> is the control plane tunnel address 192.168.210.1.
Control Plane
Setup tunneling:
$ ip tunnel add tun0 mode ipip local 10.117.29.41 remote 10.110.20.149
$ ip addr add 192.168.210.1/24 dev tun0
$ ip link set tun0 up
Update DNAT to support the control plane VIP over the tunnel:
$ iptables -t nat -I PREROUTING -p udp -d 192.168.210.1 --dport 6081 -j DNAT --to-destination 10.117.29.41
GPU Node or Admin Workstation
Establish connectivity with the control plane:
$ ip tunnel add tun1 mode ipip local 10.110.20.149 remote 10.117.29.41
$ ip addr add 192.168.210.2/24 dev tun1
$ ip link set tun1 up
$ ip route add 10.0.0.8/32 via 192.168.210.1
Setup DNAT:
$ iptables -t nat -I OUTPUT -p udp -d 10.117.29.41 --dport 6081 -j DNAT --to-destination 192.168.210.1
Configure Admin Workstation
Configure the admin workstation prior to setting up the cluster.
Download the Google Cloud SDK:
$ wget https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-314.0.0-linux-x86_64.tar.gz \
&& tar -xf google-cloud-sdk-314.0.0-linux-x86_64.tar.gz
$ google-cloud-sdk/install.sh
Install the Anthos authentication components:
$ gcloud components install anthos-auth
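As a simple sanity check (not required by the install flow), you can confirm that the SDK and the Anthos authentication component are available, assuming gcloud is on your PATH after running install.sh:
$ gcloud version
$ gcloud components list | grep -i anthos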
See the Anthos installation overview for detailed instructions on installing Anthos in an on-premise environment and setting up your cluster.
Setup NVIDIA Software on GPU Nodes
Once the Anthos cluster has been set up, you can proceed to deploy the NVIDIA software components on the GPU nodes.
NVIDIA Drivers
Note
DGX systems include the NVIDIA drivers. This step can be skipped.
For complete instructions on setting up NVIDIA drivers, visit the quickstart guide at https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html. The guide covers a number of pre-installation requirements and steps on supported Linux distributions for a successful install of the driver.
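After following the driver installation guide, a quick way to confirm that the driver is loaded on a GPU node is to run nvidia-smi on the host (shown here as a sanity check; the exact output depends on your GPUs and driver version):
$ nvidia-smi
The command should list each GPU along with the installed driver version.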
NVIDIA Device Plugin
To use GPUs in Kubernetes, the NVIDIA Device Plugin is required. The NVIDIA Device Plugin is a daemonset that automatically enumerates the number of GPUs on each node of the cluster and allows pods to be run on GPUs.
The preferred method to deploy the device plugin is as a daemonset using helm.
Add the nvidia-device-plugin helm repository:
$ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin \
&& helm repo update
Deploy the device plugin:
$ helm install --generate-name nvdp/nvidia-device-plugin
For more user-configurable options while deploying the daemonset, refer to the device plugin README.
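Once the device plugin pods are running, the GPUs should be advertised as an allocatable resource on the GPU nodes. A quick check (the node name variable below is a placeholder):
$ GPU_NODE_NAME=<name of a GPU node>
$ kubectl describe node $GPU_NODE_NAME | grep "nvidia.com/gpu"
The nvidia.com/gpu resource should appear under Capacity and Allocatable with a non-zero count.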
Node Feature Discovery
To detect the hardware and system configuration of the nodes, we will deploy the Node Feature Discovery (NFD) add-on:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/node-feature-discovery/v0.6.0/nfd-master.yaml.template
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/node-feature-discovery/v0.6.0/nfd-worker-daemonset.yaml.template
See the NFD documentation for more information on NFD.
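To confirm that NFD is labeling nodes, you can list the discovered feature labels (a simple check; the exact labels depend on your hardware):
$ kubectl get nodes --show-labels | tr ',' '\n' | grep feature.node.kubernetes.io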
Anthos Clusters with VMware and NVIDIA GPU Accelerated Servers
Anthos running on-premise has requirements for which vSphere versions are supported along with network and storage requirements. Please see the Anthos version compatibility matrix for more information: https://cloud.google.com/anthos/gke/docs/on-prem/versioning-and-upgrades#version_compatibility_matrix.
This guide assumes that the user already has an installed Anthos on-premise cluster in a vSphere environment. Please see https://cloud.google.com/anthos/gke/docs/on-prem/how-to/install-overview-basic for detailed instructions for installing Anthos in an on-premise environment.
Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, Infiniband adapters and other devices through the device plugin framework. However, configuring and managing nodes with these hardware resources requires configuration of multiple software components such as drivers, container runtimes or other libraries which are difficult and prone to errors. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs.
In the VMware vSphere configuration, Anthos uses the NVIDIA GPU Operator to configure GPU nodes in the Kubernetes cluster so that the nodes can be used to schedule CUDA applications. The GPU Operator itself is deployed using Helm. The rest of this section provides users with steps on getting started.
Configuring PCIe Passthrough
For the GPU to be accessible to the VM, you must first enable PCI passthrough on the ESXi host. This can be done from the vSphere client. It requires a reboot of the ESXi host to complete the process, so the host should be put into maintenance mode and any VMs running on the ESXi host evacuated to another host. If you only have a single ESXi host, then the VMs will need to be restarted after the reboot.
From the vSphere client, select an ESXi host from the Inventory of VMware vSphere Client. In the Configure tab, click Hardware > PCI Devices. This will show you the passthrough-enabled devices (you will most likely find none at this time).

Click CONFIGURE PASSTHROUGH to launch the Edit PCI Device Availability window. Look for the GPU device and select the checkbox next to it (the GPU device will be recognizable as having NVIDIA Corporation in the Vendor Name view). Select the GPU devices (you may have more than one) and click OK.

At this point, the GPU(s) will appear as Available (pending). You will need to select Reboot This Host and complete the reboot before proceeding to the next step.

It is a VMware best practice to reboot an ESXi host only when it is in maintenance mode and after all the VMs have been migrated to other hosts. If you have only 1 ESXi host, then you can reboot without migrating the VMs, though shutting them down gracefully first is always a good idea.

Once the server has rebooted, make sure to exit maintenance mode (if it was used) or restart the VMs that had to be stopped (when only a single ESXi host is used).
Adding GPUs to a Node
Creating a Node Pool for the GPU Node
Note
This is an optional step.
Node pools are a good way to specify pools of Kubernetes worker nodes that may have different or unique attributes. In this case, we create a node pool containing the workers that will have a GPU manually assigned to them. See managing node pools in the Google GKE documentation for more information regarding node pools with Anthos on-premise.
First, edit your user cluster config.yaml file on the admin workstation and add an additional node pool:
- name: user-cluster1-gpu
  cpus: 4
  memoryMB: 8192
  replicas: 1
  labels:
    hardware: gpu
After adding the node pool to your configuration, use the gkectl update command to push the change:
$ gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
    --config [USER_CLUSTER_CONFIG]
Reading config with version "v1"
Update summary for cluster user-cluster1-bundledlb:
Node pool(s) to be created: [user-cluster1-gpu]
Do you want to continue? [Y/n]: Y
Updating cluster "user-cluster1-bundledlb"...
Creating node MachineDeployment(s) in user cluster... DONE
Done updating the user cluster
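Once the update completes, you can confirm that the new pool's nodes are registered and carry the hardware: gpu label defined on the node pool (here [USER_CLUSTER_KUBECONFIG] stands for the path to your user cluster kubeconfig):
$ kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -l hardware=gpu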
Add GPUs to Nodes in vSphere
Select an existing user-cluster node to add a GPU to (if you created a node pool with the previous step then you would choose a node from that pool). Make sure that this VM is on the host with the GPU (if you have vMotion enabled this could be as simple as right clicking on the VM and selecting Migrate).
To configure a PCI device on a virtual machine, from the Inventory in vSphere Client, right-click the virtual machine and select Power->Power Off.

After the VM is powered off, right-click the virtual machine and click Edit Settings.

Within the Edit Settings window, click ADD NEW DEVICE.

Choose PCI Device from the dropdown.

You may need to select the GPU or if it’s the only device available it may be automatically selected for you. If you don’t see the GPU, it’s possible your VM is not currently on the ESXi host with the passthrough device configured.

Expand the Memory section and make sure to select the option for Reserve all Guest Memory (All locked).

Click OK.
Before the VM can be started, the VM/Host Rule for VM anti-affinity must be deleted. (Note that this step may not be necessary if your cluster's config.yaml contains antiAffinityGroups.enabled: False.)
From the vSphere Inventory list, click on the cluster, then the Configure tab, and under Configuration select VM/Host Rules. Select the rule containing your node and delete it.

Now you can power on the VM, right click on the VM and select Power>Power On.

If vSphere presents you with Power On Recommendations then select OK.

The following steps should be performed from your admin workstation or another Linux system that can use kubectl to work with the cluster.
Install NVIDIA GPU Operator
Install Helm
The preferred method to deploy the GPU Operator is using helm.
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
&& chmod 700 get_helm.sh \
&& ./get_helm.sh
Now, add the NVIDIA Helm repository:
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update
Install the GPU Operator
The GPU Operator Helm chart offers a number of customizable options that can be configured depending on your environment.

Chart Customization Options
Commonly used options are listed below. These options can be used with --set when installing via Helm; refer to the NVIDIA GPU Operator documentation for the complete list of chart parameters.
Parameter | Description | Default
---|---|---
daemonsets.annotations | Map of custom annotations to add to all GPU Operator managed pods. | {}
daemonsets.labels | Map of custom labels to add to all GPU Operator managed pods. | {}
driver.enabled | By default, the Operator deploys NVIDIA drivers as a container on the system. Set this value to false when using the Operator on systems with pre-installed drivers. | true
driver.repository | The images are downloaded from NGC. Specify another image repository when using custom driver images. | nvcr.io/nvidia
driver.rdma.enabled | Controls whether the driver daemonset should build and load the nvidia-peermem kernel module. | false
driver.rdma.useHostMofed | Indicates if MOFED is directly pre-installed on the host. This is used to build and load the nvidia-peermem kernel module. | false
driver.version | Version of the NVIDIA datacenter driver supported by the Operator. | Depends on the version of the Operator. See the Component Matrix for more information on supported drivers.
mig.strategy | Controls the strategy to be used with MIG on supported NVIDIA GPUs. Options are either mixed or single. | single
migManager.enabled | The MIG manager watches for changes to the MIG geometry and applies reconfiguration as needed. By default, the MIG manager only runs on nodes with GPUs that support MIG (e.g. A100). | true
nfd.enabled | Deploys the Node Feature Discovery plugin as a daemonset. Set this variable to false if NFD is already running in the cluster. | true
operator.defaultRuntime | DEPRECATED as of v1.9. | docker
psp.enabled | The GPU Operator deploys PodSecurityPolicies if enabled. | false
toolkit.enabled | By default, the Operator deploys the NVIDIA Container Toolkit (nvidia-docker2 stack) as a container on the system. Set this value to false when using the Operator on systems with a pre-installed NVIDIA Container Toolkit. | true
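For example, a hypothetical install that disables the bundled Node Feature Discovery deployment (useful if NFD is already running in the cluster) and keeps the single MIG strategy could look like the following; adjust the flags to your environment:
$ helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set nfd.enabled=false \
    --set mig.strategy=single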
Namespace
Prior to GPU Operator v1.9, the operator was installed in the default namespace, while all operands were installed in the gpu-operator-resources namespace.
Starting with GPU Operator v1.9, both the operator and its operands get installed in the same namespace. The namespace is configurable and is determined during installation. For example, to install the GPU Operator in the gpu-operator namespace:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
If a namespace is not specified during installation, all GPU Operator components will be installed in the default namespace.
Operands
By default, the GPU Operator operands are deployed on all GPU worker nodes in the cluster. GPU worker nodes are identified by the presence of the label feature.node.kubernetes.io/pci-10de.present=true, where 0x10de is the PCI vendor ID assigned to NVIDIA.
To disable operands from getting deployed on a GPU worker node, label the node with nvidia.com/gpu.deploy.operands=false:
$ kubectl label nodes $NODE nvidia.com/gpu.deploy.operands=false
Common Deployment Scenarios
In this section, we present some common deployment recipes when using the Helm chart to install the GPU Operator.
Bare-metal/Passthrough with default configurations on Ubuntu
In this scenario, the default configuration options are used:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
Note
For installing on Secure Boot systems or using Precompiled modules refer to Precompiled Driver Containers.
Bare-metal/Passthrough with default configurations on Red Hat Enterprise Linux
In this scenario, the default configuration options are used:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
Note
When using RHEL8 with Kubernetes, SELinux has to be enabled (either in permissive or enforcing mode) for use with the GPU Operator. Additionally, network restricted environments are not supported.
Bare-metal/Passthrough with default configurations on CentOS
In this scenario, the CentOS toolkit image is used:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set toolkit.version=1.7.1-centos7
Note
For CentOS 8 systems, use toolkit.version=1.7.1-centos8.
Replace 1.7.1 toolkit version used here with the latest one available here.
NVIDIA vGPU
Note
The GPU Operator with NVIDIA vGPUs requires additional steps to build a private driver image prior to install. Refer to the document NVIDIA vGPU for detailed instructions on the workflow and required values of the variables used in this command.
The command below will install the GPU Operator with its default configuration for vGPU:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.repository=$PRIVATE_REGISTRY \
--set driver.version=$VERSION \
--set driver.imagePullSecrets={$REGISTRY_SECRET_NAME} \
--set driver.licensingConfig.configMapName=licensing-config
NVIDIA AI Enterprise
Refer to GPU Operator with NVIDIA AI Enterprise.
Bare-metal/Passthrough with pre-installed NVIDIA drivers
In this example, the user has already pre-installed NVIDIA drivers as part of the system image:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false
Bare-metal/Passthrough with pre-installed drivers and NVIDIA Container Toolkit
In this example, the user has already pre-installed the NVIDIA drivers and the NVIDIA Container Toolkit (nvidia-docker2) as part of the system image.
Note
These steps should be followed when using the GPU Operator v1.9+ on DGX A100 systems with DGX OS 5.1+.
Before installing the operator, ensure that the following configurations are modified depending on the container runtime configured in your cluster.
Docker:
Update the Docker configuration to add nvidia as the default runtime. The nvidia runtime should be set up as the default container runtime for Docker on GPU nodes. This can be done by adding the default-runtime line to the Docker daemon config file, which is usually located on the system at /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart the Docker daemon to complete the installation after setting the default runtime:
$ sudo systemctl restart docker
Containerd:
Update containerd to use nvidia as the default runtime and add the nvidia runtime configuration. This can be done by adding the configuration below to /etc/containerd/config.toml and restarting the containerd service.
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
Restart the containerd daemon to complete the installation after setting the default runtime:
$ sudo systemctl restart containerd
Install the GPU operator with the following options:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false \
--set toolkit.enabled=false
Bare-metal/Passthrough with pre-installed NVIDIA Container Toolkit (but no drivers)
In this example, the user has already pre-installed the NVIDIA Container Toolkit (nvidia-docker2) as part of the system image.
Before installing the operator, ensure that the following configurations are modified depending on the container runtime configured in your cluster.
Docker:
Update the Docker configuration to add nvidia as the default runtime. The nvidia runtime should be set up as the default container runtime for Docker on GPU nodes. This can be done by adding the default-runtime line to the Docker daemon config file, which is usually located on the system at /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart the Docker daemon to complete the installation after setting the default runtime:
$ sudo systemctl restart docker
Containerd:
Update containerd to use nvidia as the default runtime and add the nvidia runtime configuration. This can be done by adding the configuration below to /etc/containerd/config.toml and restarting the containerd service.
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
Restart the containerd daemon to complete the installation after setting the default runtime:
$ sudo systemctl restart containerd
Configure the toolkit to use the root directory of the driver installation as /run/nvidia/driver, which is the path mounted by the driver container:
$ sudo sed -i 's/^#root/root/' /etc/nvidia-container-runtime/config.toml
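As an optional check, you can confirm that the root option is now uncommented in the toolkit configuration; assuming the stock configuration file layout, the output should look similar to the line shown in the comment:
$ grep "^root" /etc/nvidia-container-runtime/config.toml   # expect: root = "/run/nvidia/driver"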
Once these steps are complete, install the GPU operator with the following options (which will provision a driver):
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set toolkit.enabled=false
Custom driver image (based off a specific driver version)
If you want to use custom driver container images (e.g. version 465.27), then you need to build a new driver container image. Follow these steps:
Rebuild the driver container by specifying the $DRIVER_VERSION argument when building the Docker image. For reference, the driver container Dockerfiles are available on the Git repo here.
Build the container using the appropriate Dockerfile. For example:
$ docker build --pull \
    --build-arg DRIVER_VERSION=465.27 \
    -t nvidia/driver:465.27-ubuntu20.04 \
    --file Dockerfile .
Ensure that the driver container is tagged as shown in the example by using the driver:<version>-<os> schema.
Specify the new driver image and repository by overriding the defaults in the Helm install command. For example:
$ helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.repository=docker.io/nvidia \
    --set driver.version="465.27"
Note that these instructions are provided for reference and evaluation purposes. Not using the standard releases of the GPU Operator from NVIDIA would mean limited support for such custom configurations.
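Note that the image referenced by driver.repository and driver.version must be reachable by the cluster. A minimal sketch of tagging and pushing the freshly built image to a private registry (registry.example.com is a placeholder; substitute your own registry and set driver.repository accordingly):
$ docker tag nvidia/driver:465.27-ubuntu20.04 registry.example.com/nvidia/driver:465.27-ubuntu20.04
$ docker push registry.example.com/nvidia/driver:465.27-ubuntu20.04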
Custom configuration for runtime containerd
When containerd is the container runtime used, the following configuration options are used with the container-toolkit deployed with GPU Operator:
toolkit:
  env:
  - name: CONTAINERD_CONFIG
    value: /etc/containerd/config.toml
  - name: CONTAINERD_SOCKET
    value: /run/containerd/containerd.sock
  - name: CONTAINERD_RUNTIME_CLASS
    value: nvidia
  - name: CONTAINERD_SET_AS_DEFAULT
    value: "true"
These options are defined as follows:
- CONTAINERD_CONFIG: The path on the host to the containerd config you would like to have updated with support for the nvidia-container-runtime. By default this will point to /etc/containerd/config.toml (the default location for containerd). It should be customized if your containerd installation is not in the default location.
- CONTAINERD_SOCKET: The path on the host to the socket file used to communicate with containerd. The operator will use this to send a SIGHUP signal to the containerd daemon to reload its config. By default this will point to /run/containerd/containerd.sock (the default location for containerd). It should be customized if your containerd installation is not in the default location.
- CONTAINERD_RUNTIME_CLASS: The name of the Runtime Class you would like to associate with the nvidia-container-runtime. Pods launched with a runtimeClassName equal to CONTAINERD_RUNTIME_CLASS will always run with the nvidia-container-runtime. The default CONTAINERD_RUNTIME_CLASS is nvidia.
- CONTAINERD_SET_AS_DEFAULT: A flag indicating whether to set nvidia-container-runtime as the default runtime used to launch all containers. When set to false, only containers in pods with a runtimeClassName equal to CONTAINERD_RUNTIME_CLASS will be run with the nvidia-container-runtime. The default value is true.
For Rancher Kubernetes Engine 2 (RKE2), set the following in the ClusterPolicy.
toolkit:
  env:
  - name: CONTAINERD_CONFIG
    value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
  - name: CONTAINERD_SOCKET
    value: /run/k3s/containerd/containerd.sock
  - name: CONTAINERD_RUNTIME_CLASS
    value: nvidia
  - name: CONTAINERD_SET_AS_DEFAULT
    value: "true"
These options can be passed to the GPU Operator at install time, as shown below:
helm install -n gpu-operator --create-namespace \
nvidia/gpu-operator $HELM_OPTIONS \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true
Proxy Environments
Refer to the section Install GPU Operator in Proxy Environments for more information on how to install the Operator on clusters behind an HTTP proxy.
Air-gapped Environments
Refer to the section Install NVIDIA GPU Operator in Air-Gapped Environments for more information on how to install the Operator in air-gapped environments.
Multi-Instance GPU (MIG)
Refer to the document GPU Operator with MIG for more information on how to use the Operator with Multi-Instance GPU (MIG) on NVIDIA Ampere products. For guidance on configuring MIG support for the NVIDIA GPU Operator in an OpenShift Container Platform cluster, see the user guide.
KubeVirt / OpenShift Virtualization
Refer to the document GPU Operator with KubeVirt for more information on how to use the GPU Operator to provision GPU nodes for running KubeVirt virtual machines with access to GPU. For guidance on using the GPU Operator with OpenShift Virtualization, refer to the document NVIDIA GPU Operator with OpenShift Virtualization.
Outdated Kernels
Refer to the section Considerations when Installing with Outdated Kernels in Cluster for more information on how to install the Operator successfully when nodes in the cluster are not running the latest kernel.
Verify GPU Operator Install
Once the Helm chart is installed, check the status of the pods to ensure all the containers are running and the validation is complete:
$ kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-crrsq 1/1 Running 0 60s
gpu-operator-7fb75556c7-x8spj 1/1 Running 0 5m13s
gpu-operator-node-feature-discovery-master-58d884d5cc-w7q7b 1/1 Running 0 5m13s
gpu-operator-node-feature-discovery-worker-6rht2 1/1 Running 0 5m13s
gpu-operator-node-feature-discovery-worker-9r8js 1/1 Running 0 5m13s
nvidia-container-toolkit-daemonset-lhgqf 1/1 Running 0 4m53s
nvidia-cuda-validator-rhvbb 0/1 Completed 0 54s
nvidia-dcgm-5jqzg 1/1 Running 0 60s
nvidia-dcgm-exporter-h964h 1/1 Running 0 60s
nvidia-device-plugin-daemonset-d9ntc 1/1 Running 0 60s
nvidia-device-plugin-validator-cm2fd 0/1 Completed 0 48s
nvidia-driver-daemonset-5xj6g 1/1 Running 0 4m53s
nvidia-mig-manager-89z9b 1/1 Running 0 4m53s
nvidia-operator-validator-bwx99 1/1 Running 0 58s
We can now proceed to running some sample GPU workloads to verify that the Operator (and its components) are working correctly.
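As a quick initial check before the application examples below, you can run a minimal pod that requests a GPU and runs nvidia-smi. This is a sketch; the pod name and CUDA base image tag are assumptions, so substitute an image available to your cluster:
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
Wait for the pod to reach the Completed state, then view the logs and clean up:
$ kubectl logs gpu-smoke-test
$ kubectl delete pod gpu-smoke-test
The logs should show the familiar nvidia-smi table listing the GPU assigned to the pod.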
Running GPU Applications
Jupyter Notebooks
This section of the guide walks through how to run a sample Jupyter notebook on the Kubernetes cluster.
Create the pod and the service for the notebook:
$ LOADBALANCERIP=<ip address to be used to expose the service>
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: tf-notebook
  labels:
    app: tf-notebook
spec:
  type: LoadBalancer
  loadBalancerIP: $LOADBALANCERIP
  ports:
  - port: 80
    name: http
    targetPort: 8888
    nodePort: 30001
  selector:
    app: tf-notebook
---
apiVersion: v1
kind: Pod
metadata:
  name: tf-notebook
  labels:
    app: tf-notebook
spec:
  securityContext:
    fsGroup: 0
  containers:
  - name: tf-notebook
    image: tensorflow/tensorflow:latest-gpu-jupyter
    resources:
      limits:
        nvidia.com/gpu: 1
    ports:
    - containerPort: 8888
      name: notebook
EOF
View the logs of the tf-notebook pod to obtain the token:
$ kubectl logs tf-notebook
[I 19:07:43.061 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 19:07:43.423 NotebookApp] Serving notebooks from local directory: /tf
[I 19:07:43.423 NotebookApp] The Jupyter Notebook is running at:
[I 19:07:43.423 NotebookApp] http://tf-notebook:8888/?token=fc5d8b9d6f29d5ddad62e8c731f83fc8e90a2d817588d772
[I 19:07:43.423 NotebookApp] or http://127.0.0.1:8888/?token=fc5d8b9d6f29d5ddad62e8c731f83fc8e90a2d817588d772
[I 19:07:43.423 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:07:43.429 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/nbserver-1-open.html
    Or copy and paste one of these URLs:
        http://tf-notebook:8888/?token=fc5d8b9d6f29d5ddad62e8c731f83fc8e90a2d817588d772
     or http://127.0.0.1:8888/?token=fc5d8b9d6f29d5ddad62e8c731f83fc8e90a2d817588d772
[I 19:08:24.180 NotebookApp] 302 GET / (172.16.20.30) 0.61ms
[I 19:08:24.182 NotebookApp] 302 GET /tree? (172.16.20.30) 0.57ms
From a web browser, navigate to http://<LOADBALANCERIP> and enter the token where prompted to log in. Depending on your environment, you may not have web browser access to the exposed service; in that case you may be able to use SSH port forwarding/tunneling to achieve this (see the tunneling example below).
Once logged in, click on the tensorflow-tutorials folder and then on the first file, classification.ipynb.
This will launch a new tab with the Notebook loaded. You can now run through the Notebook by clicking on the Run button. The notebook will step through each section and execute the code as you go. Continue pressing Run until you reach the end of the notebook and observe the execution of the classification program.
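If you cannot reach the LoadBalancer IP directly from your browser, an SSH tunnel through a host that can reach it is one option. This is a sketch; the user and jump host below are placeholders:
$ ssh -L 8888:<LOADBALANCERIP>:80 <user>@<jump-host>
With the tunnel open, browse to http://localhost:8888 and enter the token as above.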
Once the notebook is complete, you can check the logs of the tf-notebook
pod to confirm it was using the GPU:=========snip=============== [I 19:17:58.116 NotebookApp] Saving file at /tensorflow-tutorials/classification.ipynb 2020-05-21 19:21:01.422482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1 2020-05-21 19:21:01.436767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.437469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:13:00.0 name: Tesla P4 computeCapability: 6.1 coreClock: 1.1135GHz coreCount: 20 deviceMemorySize: 7.43GiB deviceMemoryBandwidth: 178.99GiB/s 2020-05-21 19:21:01.438477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1 2020-05-21 19:21:01.462370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10 2020-05-21 19:21:01.475269: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10 2020-05-21 19:21:01.478104: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10 2020-05-21 19:21:01.501057: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10 2020-05-21 19:21:01.503901: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10 2020-05-21 19:21:01.544763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7 2020-05-21 19:21:01.545022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.545746: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.546356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0 2020-05-21 19:21:01.546705: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA 2020-05-21 19:21:01.558283: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2194840000 Hz 2020-05-21 19:21:01.558919: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f6f2c000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2020-05-21 19:21:01.558982: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2020-05-21 19:21:01.645786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.646387: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x53ab350 initialized for platform CUDA (this does not guarantee that XLA will be used). 
Devices: 2020-05-21 19:21:01.646430: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P4, Compute Capability 6.1 2020-05-21 19:21:01.647005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.647444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: pciBusID: 0000:13:00.0 name: Tesla P4 computeCapability: 6.1 coreClock: 1.1135GHz coreCount: 20 deviceMemorySize: 7.43GiB deviceMemoryBandwidth: 178.99GiB/s 2020-05-21 19:21:01.647523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1 2020-05-21 19:21:01.647570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10 2020-05-21 19:21:01.647611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10 2020-05-21 19:21:01.647647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10 2020-05-21 19:21:01.647683: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10 2020-05-21 19:21:01.647722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10 2020-05-21 19:21:01.647758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7 2020-05-21 19:21:01.647847: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.648311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.648720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0 2020-05-21 19:21:01.649158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1 2020-05-21 19:21:01.650302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-21 19:21:01.650362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 2020-05-21 19:21:01.650392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N 2020-05-21 19:21:01.650860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.651341: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2020-05-21 19:21:01.651773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7048 MB memory) -> physical GPU (device: 0, name: Tesla P4, pci bus id: 0000:13:00.0, compute capability: 6.1) 2020-05-21 19:21:03.601093: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10 [I 19:21:58.132 
NotebookApp] Saving file at /tensorflow-tutorials/classification.ipynb
Uninstall and Cleanup
You can remove the tf-notebook pod and service with the following commands:
$ kubectl delete pod tf-notebook
$ kubectl delete svc tf-notebook
You can remove the GPU operator with the command:
$ helm uninstall -n gpu-operator $(helm list -n gpu-operator | grep gpu-operator | awk '{print $1}')
release "gpu-operator-1590086955" uninstalled
You can now stop the VM, remove the PCI device, remove the memory reservation, and restart the VM.
You do not need to remove the PCI passthrough device from the host.
Known Issues
This section outlines some known issues with using Google Cloud’s Anthos with NVIDIA GPUs.
Attaching a GPU to an Anthos on-prem worker node requires manually editing the VM from vSphere. These changes will not survive the Anthos on-prem upgrade process: when the node with the GPU is deleted as part of the upgrade, the new VM replacing it will not have the GPU added, and the GPU must be added back to the new VM manually. While the NVIDIA GPU appears to handle this event gracefully, the workload backed by the GPU may need to be started again manually.
Attaching a GPU to a VM means that the VM can no longer be migrated to another ESXi host: the VM is essentially pinned to the ESXi host that hosts the GPU. The vMotion and VMware HA features cannot be used.
VMs that use a PCI passthrough device require their full memory allocation to be locked. This will trigger a Virtual machine memory usage alarm on the VM, which can safely be ignored.