Known Issues

Refer to the section for each version to see which issues are open in that version.

DGX-1, DGX-2: GPU MIG Partitions do not return output fields

Issue

When MIG is enabled and a MIG partition is created for the GPU, the following command returns no output for non-device-specific fields: dcgmi dmon -e 1,2,3,4,5

Explanation

This issue affects EL 8 with:

  • Driver Version: 470.141.03

  • CUDA Version: 11.4.152

  • DCGM: 2.4.5
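
To check whether a system matches the configuration listed above, a small shell helper along the following lines may be useful. This is a minimal sketch, not an NVIDIA-provided tool: the function names matches_affected_config and installed_driver_version are hypothetical, and the DCGM version is passed in by hand rather than parsed from any tool output.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not part of any NVIDIA tool.

# Versions copied from the list in this section.
AFFECTED_DRIVER="470.141.03"
AFFECTED_DCGM="2.4.5"

# matches_affected_config DRIVER_VERSION DCGM_VERSION
# Returns 0 only when both values equal the versions listed in this section.
matches_affected_config() {
  [ "$1" = "$AFFECTED_DRIVER" ] && [ "$2" = "$AFFECTED_DCGM" ]
}

# installed_driver_version
# Prints the driver version that nvidia-smi reports for the first GPU.
installed_driver_version() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}
```

On a system with the driver installed, matches_affected_config "$(installed_driver_version)" 2.4.5 succeeds only for the affected combination. The command from the issue description can then be run for a single sample with dcgmi dmon -e 1,2,3,4,5 -c 1 to see whether any values are returned.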

DGX-1, DGX-2: Log displays CEC error

Issue

The DGX A100/A800 Firmware Update Container log may show error messages such as "Unable to send RAW command (channel=0x0 netfn=0x3c lun=0x0 cmd=0xf rsp=0xd3): Destination unavailable".

This error will be displayed when running supported commands and may be safely ignored.
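
When reviewing a saved copy of the log, it can help to hide this message so that other errors stand out. The following is a minimal sketch that assumes the log text is supplied on standard input. The function name drop_known_cec_message is hypothetical, and only the exact message quoted above is matched, so variants with different values remain visible.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical; the message is copied from this section.

KNOWN_CEC_MESSAGE='Unable to send RAW command (channel=0x0 netfn=0x3c lun=0x0 cmd=0xf rsp=0xd3): Destination unavailable'

# drop_known_cec_message
# Reads log text on stdin and writes to stdout every line that does not
# contain the known message.
drop_known_cec_message() {
  # -F matches the message as a fixed string, -v inverts the match.
  # grep exits with status 1 when no lines remain; treat that as success.
  grep -vF -- "$KNOWN_CEC_MESSAGE" || [ "$?" -eq 1 ]
}
```

For example, drop_known_cec_message < update.log prints the remaining lines, where update.log is a placeholder for wherever the container output was saved.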

DGX-1: NVSM show controllers SerialNumber shows “NOT_SET”

Issue

After rebooting, nvsm show controllers may display a blank serial number.

Explanation

This issue is specific to the DGX-1 platform with the MegaRAID controller and can be remedied by restarting the nvsm service after 30 minutes. To restart the service, run: systemctl restart nvsm
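
The check and the restart can be combined in a short script. The following is a minimal sketch under stated assumptions: the function name serial_number_unset is hypothetical, and it assumes that nvsm show controllers prints the property as a line of the form SerialNumber = value, which should be confirmed against the actual output on the system.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the assumed "SerialNumber = value" line
# format are illustrative and should be checked against real nvsm output.

# serial_number_unset
# Reads `nvsm show controllers` output on stdin. Returns 0 when any
# SerialNumber line has an empty value or the value NOT_SET, otherwise 1.
serial_number_unset() {
  awk -F'=' '
    /SerialNumber/ {
      value = $2
      gsub(/[[:space:]"]/, "", value)   # strip whitespace and ASCII quotes
      if (value == "" || value == "NOT_SET") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}
```

For example, nvsm show controllers | serial_number_unset && systemctl restart nvsm restarts the service only when the serial number is missing. The 30-minute timing described above still applies.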