Create the Cluster Policy Instance

NVIDIA AI Enterprise 2.0 or later

Next, create the cluster policy, which maintains the policy resources used to create pods in the cluster.

  1. In the OpenShift Container Platform web console, from the side menu, select Operators > Installed Operators, and click NVIDIA GPU Operator.

  2. Select the ClusterPolicy tab, then click Create ClusterPolicy.

    Note

    The platform assigns the default name gpu-cluster-policy.


  3. Expand the drop-down for Driver config and then Licensing Config. In the text box labeled Config Map Name, enter the name of the licensing config map that was previously created (e.g., licensing-config). Select the NLS Enabled checkbox. Refer to the screenshot below for parameter examples and modify the values accordingly.

    Important

    The licensing config map was previously created in Step 2 of Create CLS License Config Map.

    openshift-cluster1.png


  4. Scroll down to the Driver section and specify the repository path, image name, and NVIDIA vGPU driver version. Refer to the screenshot below for parameter examples and modify the values accordingly.

    openshift-cluster4.png


  5. Expand the Advanced configuration menu and specify the imagePullSecret (e.g., gpu-operator-secret).

    Important

    The image pull secret was previously created in Step 3 of Create CLS License Config Map.

    openshift-cluster2.png


  6. Click Create.
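The values entered in steps 3 through 5 can be read back from the CLI once the ClusterPolicy exists. The following is a minimal sketch, not an official procedure: the function name show_driver_settings is hypothetical, and the field paths under .spec.driver are assumptions based on the GPU Operator ClusterPolicy resource, so they may differ between operator versions.

```shell
# Print the driver settings stored in a ClusterPolicy.
# ASSUMPTION: the field paths below match the installed ClusterPolicy CRD.
# The function name and the default policy name argument are illustrative.
show_driver_settings() {
  policy="${1:-gpu-cluster-policy}"

  # Build the JSONPath template one field per line for readability.
  template='repository={.spec.driver.repository}{"\n"}'
  template="$template"'image={.spec.driver.image}{"\n"}'
  template="$template"'version={.spec.driver.version}{"\n"}'
  template="$template"'configMapName={.spec.driver.licensingConfig.configMapName}{"\n"}'
  template="$template"'nlsEnabled={.spec.driver.licensingConfig.nlsEnabled}{"\n"}'
  template="$template"'imagePullSecrets={.spec.driver.imagePullSecrets[*]}{"\n"}'

  oc get clusterpolicy "$policy" -o jsonpath="$template"
}

# Usage (requires a logged-in oc session):
#   show_driver_settings                      # uses gpu-cluster-policy
#   show_driver_settings my-cluster-policy    # hypothetical custom name
```

If a field prints as empty, the corresponding value was either left unset in the console or is stored under a different path in the installed operator version.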

The GPU Operator will proceed to install all the required components to set up the NVIDIA GPUs in the OpenShift cluster.

The status of the newly deployed ClusterPolicy gpu-cluster-policy for the NVIDIA GPU Operator changes to State:ready when the installation succeeds.

openshift-cluster3.png
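The same state can also be polled from the CLI instead of watching the web console. This is a minimal sketch under stated assumptions: it assumes the state shown in the console corresponds to the .status.state field of the ClusterPolicy resource, and the function name wait_for_cluster_policy, the 600-second timeout, and the 10-second interval are illustrative choices rather than documented values.

```shell
# Poll a ClusterPolicy until its state is "ready" or a timeout expires.
# ASSUMPTION: the console's State value is exposed as .status.state.
# Arguments (all optional): policy name, timeout in seconds, poll interval in seconds.
wait_for_cluster_policy() {
  policy="${1:-gpu-cluster-policy}"
  timeout_s="${2:-600}"
  interval_s="${3:-10}"
  elapsed=0

  while [ "$elapsed" -le "$timeout_s" ]; do
    # Errors (for example, the resource not existing yet) yield an empty state.
    state="$(oc get clusterpolicy "$policy" -o jsonpath='{.status.state}' 2>/dev/null)"
    if [ "$state" = "ready" ]; then
      echo "ClusterPolicy $policy is ready"
      return 0
    fi
    echo "ClusterPolicy $policy state: ${state:-unknown}; retrying in ${interval_s}s"
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done

  echo "Timed out waiting for ClusterPolicy $policy" >&2
  return 1
}

# Usage (requires a logged-in oc session):
#   wait_for_cluster_policy gpu-cluster-policy 900 15
```

The function returns a non-zero exit status on timeout, so it can gate later steps in a script.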

To verify the ClusterPolicy installation from the CLI, use:

$ oc get nodes -o=custom-columns='Node:metadata.name,GPUs:status.capacity.nvidia\.com/gpu'


This lists each node and the number of GPUs it has available to Kubernetes.

Example output:

$ oc get nodes -o=custom-columns='Node:metadata.name,GPUs:status.capacity.nvidia\.com/gpu'
Node                           GPUs
nvaie-ocp-7rfr8-master-0       <none>
nvaie-ocp-7rfr8-master-1       <none>
nvaie-ocp-7rfr8-master-2       <none>
nvaie-ocp-7rfr8-worker-7x5km   1
nvaie-ocp-7rfr8-worker-9jgmk   <none>
nvaie-ocp-7rfr8-worker-jntsp   1
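On larger clusters it can help to reduce this listing to the GPU nodes and a total. The sketch below filters the two-column output shown above; the function name summarize_gpu_nodes is hypothetical, and it assumes the column layout produced by the custom-columns command in this section (a header row, then node name and GPU count, with <none> for nodes without GPUs).

```shell
# Read the "Node GPUs" listing on stdin, print only nodes that report a
# numeric GPU count, and finish with the total across all nodes.
# ASSUMPTION: input has a header row followed by "<node> <count|<none>>" rows.
summarize_gpu_nodes() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ { print $1 ": " $2; total += $2 }
       END { print "Total GPUs: " (total + 0) }'
}

# Usage (requires a logged-in oc session):
#   oc get nodes -o=custom-columns='Node:metadata.name,GPUs:status.capacity.nvidia\.com/gpu' \
#     | summarize_gpu_nodes
```

Rows whose GPU column is <none> are skipped, so control plane nodes and CPU-only workers do not appear in the summary.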


© Copyright 2024, NVIDIA. Last updated on Apr 2, 2024.