Enable the GPU Operator Dashboard
Prerequisites
Install Helm
OpenShift Container Platform 4.10+
Follow this guidance to provide GPU usage information in the cluster utilization screen in the OpenShift Container Platform web console.
Enable the NVIDIA GPU Operator usage information
Add the
helmrepo:$ helm repo add rh-ecosystem-edge https://rh-ecosystem-edge.github.io/console-plugin-nvidia-gpuUpdate the repo:
$ helm repo updateInstall the
helmchart in the default NVIDIA GPU Operator namespace:$ helm install -n nvidia-gpu-operator console-plugin-nvidia-gpu rh-ecosystem-edge/console-plugin-nvidia-gpuNAME: console-plugin-nvidia-gpu LAST DEPLOYED: Thu Apr 14 09:35:36 2022 NAMESPACE: nvidia-gpu-operator STATUS: deployed REVISION: 1 NOTES: View the Console Plugin NVIDIA GPU deployed resources by running the following command: $ kubectl -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu Enable the plugin by running the following command: $ kubectl patch consoles.operator.openshift.io cluster --patch '[{"op": "add", "path": "/spec/plugins/-", "value": "console-plugin-nvidia-gpu" }]' --type=json
View the deployed resources:
$ oc -n nvidia-gpu-operator get all -l app.kubernetes.io/name=console-plugin-nvidia-gpu
Verify the plugins field is specified:
$ oc get consoles.operator.openshift.io cluster --output=jsonpath="{.spec.plugins}"
If it is not specified, then run the following to enable the plugin:
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-plugin-nvidia-gpu"] } }' --type=merge
If it is specified, then run the following to enable the plugin:
$ oc patch consoles.operator.openshift.io cluster --patch '[{"op": "add", "path": "/spec/plugins/-", "value": "console-plugin-nvidia-gpu" }]' --type=json
In the OpenShift Container Platform web console from the side menu, navigate to Home > Overview.
The
Cluster utilizationwindow now displays the GPU related graphs.
The NVIDIA GPU Operator dashboards
The following table provides a brief description of the displayed dashboards.
Dashboard |
Description |
|---|---|
GPU |
Number of available GPUs. |
GPU Power Usage |
Power usage in watts for each GPU. |
GPU Encoder/Decoder |
Percentage of GPU workload dedicated to video encoding and decoding. |