Title: NVIDIA GPU Operator with Google GKE

URL Source: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/google-gke.html

Published Time: Wed, 10 Sep 2025 21:07:20 GMT

About Using the Operator with Google GKE
----------------------------------------

There are two ways to use the NVIDIA GPU Operator with Google Kubernetes Engine (GKE). You can use the Google driver installer to install and manage the NVIDIA GPU driver on the nodes, or you can use the Operator and NVIDIA Driver Manager to manage the driver and the other NVIDIA software components.

The choice depends on the node operating system and on whether you want the Operator to manage all of the software components.

| Approach | Supported OS | Summary |
| --- | --- | --- |
| Google Driver Installer | Container-Optimized OS, Ubuntu with containerd | The Google driver installer manages the NVIDIA GPU Driver. NVIDIA GPU Operator manages the other software components. |
| NVIDIA Driver Manager | Ubuntu with containerd | NVIDIA GPU Operator manages the lifecycle and upgrades of the driver and the other NVIDIA software. |

The preceding information applies to GKE Standard node pools. The GPU Operator is not supported with GKE Autopilot; to run GPU workloads on Autopilot, refer to [Deploy GPU workloads in Autopilot](https://cloud.google.com/kubernetes-engine/docs/how-to/autopilot-gpus.md).

Prerequisites
-------------

*   You installed and initialized the Google Cloud CLI. Refer to [gcloud CLI overview](https://cloud.google.com/sdk/gcloud.md) in the Google Cloud documentation.

*   You have a Google Cloud project to use for your GKE cluster. Refer to [Creating and managing projects](https://cloud.google.com/resource-manager/docs/creating-managing-projects.md) in the Google Cloud documentation.

*   You have the project ID for your Google Cloud project. Refer to [Identifying projects](https://cloud.google.com/resource-manager/docs/creating-managing-projects.md#identifying_projects) in the Google Cloud documentation.

*   You know the machine type for the node pool and that the machine type is supported in your region and zone. Refer to [GPU platforms](https://cloud.google.com/compute/docs/gpus.md) in the Google Cloud documentation.
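
If you want to check these prerequisites from a shell, the following sketch uses standard `gcloud` commands. The helper name `check_gke_gpu_prereqs` and the example zone are assumptions; the last command lists the GPU types that Compute Engine offers in the given zone, which you can compare against the GPU that your machine type needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the CLI, project, and GPU-availability prerequisites.
# Usage: check_gke_gpu_prereqs <project-id> <zone>
check_gke_gpu_prereqs() {
  local project="$1" zone="$2"

  # Stop early if the gcloud CLI is not on the PATH.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud CLI not found; install and initialize it first" >&2
    return 1
  fi

  # Print the project ID if the project exists and you can access it.
  gcloud projects describe "$project" --format="value(projectId)" || return 1

  # List the GPU (accelerator) types that Compute Engine offers in the zone.
  gcloud compute accelerator-types list \
    --project "$project" \
    --filter="zone:$zone" \
    --format="value(name)"
}
```

For example, `check_gke_gpu_prereqs my-project us-west1-b`, where `my-project` is a placeholder for your project ID.
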

Using the Google Driver Installer
---------------------------------

Perform the following steps to create a GKE cluster with the `gcloud` CLI and use the Google driver installer to manage the GPU driver. You can create a node pool that uses either a Container-Optimized OS node image or an Ubuntu node image.

1.   Create the node pool. Refer to [Running GPUs in GKE Standard clusters](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus.md#create) in the GKE documentation.

When you create the node pool, specify the following additional `gcloud` command-line options to disable GKE features that are not supported with the Operator:

*   `--node-labels="gke-no-default-nvidia-gpu-device-plugin=true"`

    This node label disables the GKE GPU device plugin daemon set on GPU nodes.

*   `--accelerator type=...,gpu-driver-version=disabled`

    This option prevents GKE from automatically installing the GPU driver on GPU nodes.
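
For reference, here is a minimal sketch of a node pool creation command that includes both options, wrapped in a hypothetical helper, `create_operator_gpu_node_pool`. The pool name, machine type (`n1-standard-4`), and GPU type (`nvidia-tesla-t4`) are example assumptions; substitute values that are offered in every zone the node pool uses. The sketch uses `--location` like the cluster commands on this page; older `gcloud` releases might require `--zone` or `--region` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a GPU node pool for the GPU Operator while the
# Google driver installer (not GKE's automatic install) provides the driver.
# Usage: create_operator_gpu_node_pool <cluster> <pool> <location> [image-type]
create_operator_gpu_node_pool() {
  local cluster="$1" pool="$2" location="$3"
  local image_type="${4:-COS_CONTAINERD}"   # or UBUNTU_CONTAINERD

  # Example machine and GPU types; confirm availability in your zones.
  gcloud container node-pools create "$pool" \
    --cluster "$cluster" \
    --location "$location" \
    --machine-type "n1-standard-4" \
    --image-type "$image_type" \
    --num-nodes 1 \
    --node-labels="gke-no-default-nvidia-gpu-device-plugin=true" \
    --accelerator "type=nvidia-tesla-t4,count=1,gpu-driver-version=disabled"
}
```

For example, `create_operator_gpu_node_pool demo-cluster gpu-pool us-west1 UBUNTU_CONTAINERD` creates an Ubuntu node pool instead of the default Container-Optimized OS one.
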

2.   Get the authentication credentials for the cluster:

```console
$ gcloud container clusters get-credentials demo-cluster --location us-west1
```

3.   Optional: Verify that you can connect to the cluster:

```console
$ kubectl get nodes -o wide
```
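
To go one step further and confirm that GPU nodes exist and carry the device plugin opt-out label from step 1, you can select nodes by the `cloud.google.com/gke-accelerator` label that GKE applies to GPU nodes. The helper name below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list only GPU nodes, with columns for the GPU type and
# for the device plugin opt-out label. An empty value in the opt-out column
# means the node was created without the label.
list_gpu_nodes() {
  kubectl get nodes \
    -l cloud.google.com/gke-accelerator \
    -L cloud.google.com/gke-accelerator \
    -L gke-no-default-nvidia-gpu-device-plugin
}
```
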
4.   Create the namespace for the NVIDIA GPU Operator:

```console
$ kubectl create ns gpu-operator
```

5.   Create a file, such as `gpu-operator-quota.yaml`, with contents like the following example:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-operator-quota
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
```

6.   Apply the resource quota:

```console
$ kubectl apply -n gpu-operator -f gpu-operator-quota.yaml
```

7.   Optional: View the resource quota:

```console
$ kubectl get -n gpu-operator resourcequota
```

_Example Output_

```text
NAME                 AGE   REQUEST
gpu-operator-quota   38s   pods: 0/100
```
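
As we understand GKE's default admission configuration, Pods that use the `system-node-critical` or `system-cluster-critical` priority class are admitted outside the `kube-system` namespace only when a ResourceQuota with a matching `PriorityClass` scope exists, and the GPU Operator components use these priority classes; that is why the quota is created before the Operator is installed. The hypothetical helper below prints the quota's used and hard Pod counts, which is handy for checking the quota after the Operator starts its Pods.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<used>/<hard>" Pod counts for the Operator's quota.
# Usage: quota_pod_usage [namespace] [quota-name]
quota_pod_usage() {
  local ns="${1:-gpu-operator}" name="${2:-gpu-operator-quota}"
  kubectl get resourcequota "$name" -n "$ns" \
    -o jsonpath='{.status.used.pods}/{.status.hard.pods}{"\n"}'
}
```

Immediately after the quota is applied, this prints `0/100`, matching the example output; the used count grows as the Operator's Pods start.
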
8.   Install the Google driver installer daemon set.

For Container-Optimized OS:

```console
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```

For Ubuntu, the manifest to apply depends on the GPU model and the node version. Refer to the **Ubuntu** tab in [Manually install NVIDIA GPU drivers](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus.md#installing_drivers) in the GKE documentation.
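
Before you install the Operator, it can help to wait until the driver installer has finished on every GPU node. The sketch below assumes that the applied manifest creates a DaemonSet named `nvidia-driver-installer` in the `kube-system` namespace; check the manifest you applied and adjust the name if it differs. The helper name and the default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the driver installer DaemonSet is rolled out.
# Assumes the DaemonSet name and namespace described above.
# Usage: wait_for_driver_installer [timeout]
wait_for_driver_installer() {
  local timeout="${1:-10m}"
  kubectl rollout status daemonset/nvidia-driver-installer \
    -n kube-system --timeout="$timeout"
}
```
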

9.   Install the Operator using Helm:

```console
$ helm install --wait --generate-name \
    -n gpu-operator \
    nvidia/gpu-operator \
    --version=v25.3.3 \
    --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
    --set toolkit.installDir=/home/kubernetes/bin/nvidia \
    --set cdi.enabled=true \
    --set cdi.default=true \
    --set driver.enabled=false
```

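The `hostPaths.driverInstallDir` and `toolkit.installDir` values point the Operator at the driver installation path and set the NVIDIA Container Toolkit installation path, both to `/home/kubernetes/bin/nvidia`. On GKE node images, this directory is writable and is a stateful location for storing the NVIDIA runtime binaries. The `driver.enabled=false` value is set because the Google driver installer, not the Operator, manages the driver.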

To configure MIG with NVIDIA MIG Manager, specify the following additional Helm command arguments:

```console
--set 'migManager.env[0].name=WITH_REBOOT' \
--set-string 'migManager.env[0].value=true'
```
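
The Helm command in step 9 references the `nvidia/gpu-operator` chart, so the NVIDIA Helm repository must already be configured under the name `nvidia`. The following sketch adds the repository and then runs the same install, optionally with the MIG settings. The helper name and its `mig` argument are hypothetical; the chart values are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the NVIDIA Helm repository, then install the GPU
# Operator with the GKE settings used with the Google driver installer.
# Usage: install_gpu_operator_gke [mig]
install_gpu_operator_gke() {
  local args=(
    --wait --generate-name
    -n gpu-operator
    nvidia/gpu-operator
    --version=v25.3.3
    --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia
    --set toolkit.installDir=/home/kubernetes/bin/nvidia
    --set cdi.enabled=true
    --set cdi.default=true
    --set driver.enabled=false
  )
  # Optional MIG Manager setting; quoted so the shell does not treat [0] as a glob.
  if [[ "${1:-}" == "mig" ]]; then
    args+=(--set 'migManager.env[0].name=WITH_REBOOT'
           --set-string 'migManager.env[0].value=true')
  fi

  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || return 1
  helm repo update || return 1
  helm install "${args[@]}"
}
```

Building the arguments in an array keeps each `--set` value as a single word, so the optional MIG values can be appended without string concatenation.
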

Using NVIDIA Driver Manager
---------------------------

Perform the following steps to create a GKE cluster with the `gcloud` CLI and use the Operator and NVIDIA Driver Manager to manage the GPU driver. The steps create the cluster with a node pool that uses an Ubuntu with containerd node image.

1.   Create the cluster by running a command that is similar to the following example:

```console
$ gcloud beta container clusters create demo-cluster \
    --project <project-id> \
    --location us-west1 \
    --release-channel "regular" \
    --machine-type "n1-standard-4" \
    --accelerator "type=nvidia-tesla-t4,count=1" \
    --image-type "UBUNTU_CONTAINERD" \
    --node-labels="gke-no-default-nvidia-gpu-device-plugin=true" \
    --disk-type "pd-standard" \
    --disk-size "1000" \
    --no-enable-intra-node-visibility \
    --metadata disable-legacy-endpoints=true \
    --max-pods-per-node "110" \
    --num-nodes "1" \
    --logging=SYSTEM,WORKLOAD \
    --monitoring=SYSTEM \
    --enable-ip-alias \
    --default-max-pods-per-node "110" \
    --no-enable-master-authorized-networks \
    --tags=nvidia-ingress-all
```

Creating the cluster requires several minutes.

2.   Get the authentication credentials for the cluster:

```console
$ USE_GKE_GCLOUD_AUTH_PLUGIN=True \
    gcloud container clusters get-credentials demo-cluster --location us-west1
```
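
The kubeconfig entry that `get-credentials` writes relies on the `gke-gcloud-auth-plugin` binary for authentication. A hypothetical helper that checks for the plugin and installs it through the gcloud component manager if it is missing is sketched below; if you installed the gcloud CLI through an OS package manager, install the plugin through the same package manager instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure gke-gcloud-auth-plugin is available before
# running get-credentials, because kubectl uses it to authenticate to GKE.
ensure_gke_auth_plugin() {
  if command -v gke-gcloud-auth-plugin >/dev/null 2>&1; then
    # Already installed: print its version.
    gke-gcloud-auth-plugin --version
  else
    # Install through the gcloud component manager. This works only for gcloud
    # installs that use the component manager; package-manager installs need
    # the matching OS package instead.
    gcloud components install gke-gcloud-auth-plugin
  fi
}
```
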
3.   Optional: Verify that you can connect to the cluster:

```console
$ kubectl get nodes -o wide
```

4.   Create the namespace for the NVIDIA GPU Operator:

```console
$ kubectl create ns gpu-operator
```

5.   Create a file, such as `gpu-operator-quota.yaml`, with contents like the following example:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-operator-quota
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
```

6.   Apply the resource quota:

```console
$ kubectl apply -n gpu-operator -f gpu-operator-quota.yaml
```

7.   Optional: View the resource quota:

```console
$ kubectl get -n gpu-operator resourcequota
```

_Example Output_

```text
NAME                  AGE     REQUEST
gke-resource-quotas   6m56s   count/ingresses.extensions: 0/100, count/ingresses.networking.k8s.io: 0/100, count/jobs.batch: 0/5k, pods: 2/1500, services: 1/500
gpu-operator-quota    38s     pods: 0/100
```

8.   Install the Operator. Refer to [install the NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html.md#install-gpu-operator).
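
With NVIDIA Driver Manager, the Operator installs and manages the driver, so the chart's default `driver.enabled=true` applies. The linked installation page is authoritative; the sketch below is a minimal version that assumes chart defaults and the same chart version as the Google driver installer procedure (`v25.3.3`). It then lists the Operator Pods with their priority classes, which shows whether they use the `system-node-critical` or `system-cluster-critical` classes that `gpu-operator-quota` scopes. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the GPU Operator with chart defaults (driver
# management enabled), then list its Pods with their priority classes.
install_gpu_operator_with_driver() {
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || return 1
  helm repo update || return 1
  helm install --wait --generate-name \
    -n gpu-operator \
    nvidia/gpu-operator \
    --version=v25.3.3 || return 1

  # Pods with a system-*-critical priority class count against gpu-operator-quota.
  kubectl get pods -n gpu-operator \
    -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName,STATUS:.status.phase
}
```
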

