Idle NVIDIA A100, NVIDIA A40, and NVIDIA A10 GPUs show 100% GPU utilization
Description
The nvidia-smi command shows 100% GPU utilization for NVIDIA A100, NVIDIA A40, and NVIDIA A10 GPUs even if no vGPUs have been configured or no VMs are running. On Linux with KVM hypervisors, a GPU is affected by this issue only if the sriov-manage script has not been run to enable the virtual functions for the GPU in the sysfs file system.
[root@host ~]# nvidia-smi
Fri Jun 13 11:45:28 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05    Driver Version: 580.65.05    CUDA Version: 13.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-PCIE-40GB      On   | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    97W / 250W |      0MiB / 40537MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Workaround
On Linux with KVM hypervisors, run the sriov-manage script to enable the virtual functions for the GPU in the sysfs file system, as explained in the Virtual GPU Software User Guide.
On VMware vSphere, boot any VMs that are configured with a vGPU that resides on the GPU.
After this workaround has been completed, the nvidia-smi command shows 0% GPU utilization for affected GPUs when they are idle.
[root@host ~]# nvidia-smi
Fri Jun 13 11:47:38 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05    Driver Version: 580.65.05    CUDA Version: 13.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-PCIE-40GB      On   | 00000000:5E:00.0 Off |                    0 |
| N/A   50C    P0    97W / 250W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
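To check several GPUs for this symptom before and after applying the workaround, the comparison shown in the two nvidia-smi listings can be scripted. The following is a minimal sketch, not part of the official tooling. The function name suspect_idle_gpus is hypothetical, and the rule it applies (100% utilization reported while 0 MiB of GPU memory is in use) is an assumption derived from the sample output above, so a GPU that is genuinely busy without allocated memory would also be flagged.

```shell
#!/usr/bin/env bash
# Sketch: flag GPUs that report 100% utilization while no GPU memory is in use.
#
# Input (stdin): one CSV line per GPU in the form "index, utilization, memory",
# for example as produced by:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits
# Output (stdout): the index of each suspect GPU, one per line.
#
# suspect_idle_gpus is a hypothetical helper name; the 100% / 0 MiB rule is a
# heuristic based on the sample output in this issue, not an NVIDIA-defined test.
suspect_idle_gpus() {
    # Fields are separated by a comma followed by optional spaces.
    awk -F', *' '$2 == 100 && $3 == 0 { print $1 }'
}
```

Piping the query output through the function, for example `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits | suspect_idle_gpus`, prints nothing when no GPU matches the heuristic. Running it again after the workaround gives a quick confirmation that the affected GPUs now report 0% when idle.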
Status
Open
Ref. #
200605527