Enable the GPU Operator Dashboard

Prerequisites

  • Helm is installed.

  • OpenShift Container Platform 4.10 or later.
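The prerequisites can be checked from a shell before starting. The following is a minimal sketch and not part of the official procedure: `require_cmd` and `version_at_least` are hypothetical helper names, and the usage comments assume the cluster version is read from the ClusterVersion resource in a logged-in `oc` session.

```shell
#!/bin/sh
# Hypothetical helpers for checking the prerequisites; names are illustrative.

# Succeed when the named command is found on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing required command: $1" >&2
    return 1
  }
}

# Succeed when dotted version $1 is at least $2.
# Only the major and minor fields are compared (for example 4.12.3 vs 4.10).
version_at_least() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  if [ "$have_major" -gt "$want_major" ]; then return 0; fi
  if [ "$have_major" -lt "$want_major" ]; then return 1; fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Example usage (assumes a logged-in oc session):
#   require_cmd helm && require_cmd oc
#   ocp_version=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   version_at_least "$ocp_version" 4.10 || echo "OpenShift 4.10 or later is required" >&2
```

The comparison is done on numeric fields rather than on strings, because a plain string comparison would rank 4.9 above 4.10.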

Follow these steps to display GPU usage information on the cluster utilization screen of the OpenShift Container Platform web console.

Enable the NVIDIA GPU Operator usage information

  1. Add the Helm repo:

    $ helm repo add rh-ecosystem-edge https://rh-ecosystem-edge.github.io/console-plugin-nvidia-gpu
    
  2. Update the repo:

    $ helm repo update
    
  3. Install the Helm chart in the default NVIDIA GPU Operator namespace:

    $ helm install -n nvidia-gpu-operator console-plugin-nvidia-gpu rh-ecosystem-edge/console-plugin-nvidia-gpu
    
    NAME: console-plugin-nvidia-gpu
    LAST DEPLOYED: Thu Apr 14 09:35:36 2022
    NAMESPACE: nvidia-gpu-operator
    STATUS: deployed
    REVISION: 1
    NOTES:
    View the Console Plugin NVIDIA GPU deployed resources by running the following command:
    
    $ kubectl -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu
    
    Enable the plugin by running the following command:
    
    $ kubectl patch consoles.operator.openshift.io cluster --patch '[{"op": "add", "path": "/spec/plugins/-", "value": "console-plugin-nvidia-gpu" }]' --type=json
    
  4. View the deployed resources:

    $ oc -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu
    
  5. Check whether the plugins field is already set:

    $ oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}"
    
    1. If the command prints nothing (the field is not set), create the field with the plugin as its only entry:

      $ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-plugin-nvidia-gpu"] } }' --type=merge
      
    2. If the field is already set, append the plugin to the existing list so that other enabled plugins are preserved:

      $ oc patch consoles.operator.openshift.io cluster --patch '[{"op": "add", "path": "/spec/plugins/-", "value": "console-plugin-nvidia-gpu" }]' --type=json
      
  6. In the side menu of the OpenShift Container Platform web console, navigate to Home > Overview.

    The Cluster utilization window now displays the GPU-related graphs.

    Figure: GPU overview dashboard.
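The choice made in step 5 can be sketched as a small script. This is an illustrative sketch under stated assumptions, not the documented procedure: `choose_patch` and `enable_plugin` are hypothetical function names, and the script assumes that `oc get` prints an empty string when `.spec.plugins` is unset and a JSON array otherwise. The `oc` commands themselves are the ones shown in step 5.

```shell
#!/bin/sh
# Hypothetical helper script; function names are illustrative.
PLUGIN=console-plugin-nvidia-gpu

# Print the patch strategy for a given value of .spec.plugins:
#   merge - the field is not set, so create the list
#   json  - the list exists, so append with a JSON Patch "add"
#   none  - the plugin is already listed
choose_patch() {
  case "$1" in
    "")              echo merge ;;
    *"\"$PLUGIN\""*) echo none ;;
    *)               echo json ;;
  esac
}

# Read the current plugins field and apply the matching patch.
# Requires a logged-in oc session with permission to patch the console operator.
enable_plugin() {
  current=$(oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}")
  case "$(choose_patch "$current")" in
    merge)
      oc patch consoles.operator.openshift.io cluster --type=merge \
        --patch "{ \"spec\": { \"plugins\": [\"$PLUGIN\"] } }"
      ;;
    json)
      oc patch consoles.operator.openshift.io cluster --type=json \
        --patch "[{\"op\": \"add\", \"path\": \"/spec/plugins/-\", \"value\": \"$PLUGIN\" }]"
      ;;
    none)
      echo "$PLUGIN is already enabled"
      ;;
  esac
}

# Example usage:
#   enable_plugin
```

The two patch types are kept separate because a merge patch replaces the whole plugins list, while the JSON Patch `add` operation on `/spec/plugins/-` appends to a list that must already exist.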

The NVIDIA GPU Operator dashboards

The following table provides a brief description of the displayed dashboards.

Dashboard              Description
---------------------  --------------------------------------------------------------------
GPU                    Number of available GPUs.
GPU Power Usage        Power usage in watts for each GPU.
GPU Encoder/Decoder    Percentage of GPU workload dedicated to video encoding and decoding.