Resolved Issues#

The following issues, previously listed as known issues, have been resolved.

Issues with ConnectX-7 Network (Cluster) Card Firmware#

Issue#

If the NVIDIA® ConnectX®-7 Network (Cluster) Card firmware version 28.39.3560 is currently installed on your DGX H100/H200 system, you might encounter the following issues:

  • After a long runtime, one or more GPUs might fall off the bus, and the nvidia-smi command fails to run. A power cycle recovers the system and restores all GPUs, after which the system again runs without issues for a long time.

  • After a reboot or power cycle, one or more OSFP ports on the DGX system might remain in the Down state.

Resolution#

To prevent these issues, NVIDIA recommends updating the firmware of the following ConnectX-7 network cards to version 28.42.1000:

NVIDIA ConnectX-7 Card     Version for the 24.09.1 Release     Recommended Version
Network (cluster) card     28.39.3560                          28.42.1000
Network (storage) card     28.39.3560                          28.42.1000

For more information, refer to DGX H100/H200 - Update for ConnectX-7 Networking Cards Available.
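To see whether a system is still below the recommended version, the installed firmware can be compared against 28.42.1000. The following is a minimal sketch, not an NVIDIA-provided tool. It assumes GNU sort (for the -V option) and reads the standard Linux sysfs attribute /sys/class/infiniband/<device>/fw_ver. The helper name version_lt is hypothetical, and the loop does not filter by card model, so it reports every RDMA device it finds.

```shell
#!/bin/sh
# Sketch: flag RDMA devices whose firmware is older than the recommended
# ConnectX-7 version. Assumes GNU sort (-V) and the sysfs fw_ver attribute.

RECOMMENDED="28.42.1000"

# Returns 0 (true) when version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Walk every RDMA device exposed by the kernel. If none is present,
# the unexpanded glob is not readable and the loop prints nothing.
for fw_file in /sys/class/infiniband/*/fw_ver; do
    [ -r "$fw_file" ] || continue
    dev=$(basename "$(dirname "$fw_file")")
    current=$(cat "$fw_file")
    if version_lt "$current" "$RECOMMENDED"; then
        echo "$dev: firmware $current is older than $RECOMMENDED"
    else
        echo "$dev: firmware $current is at or above $RECOMMENDED"
    fi
done
```

Any device reported as older than the recommended version is a candidate for the update described in the referenced notice.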

Platform DGX H200 Not Supported#

Issue#

On DGX H200 systems with nvfwupd version 2.0.1 installed, the following error message might appear when you update the firmware using the nvfwupd command:

Platform dgxh200 not supported.

Explanation#

Starting with nvfwupd version 2.0.1, the server type is required to update the firmware on new DGX platforms. An enhanced solution to automatically detect the server type for DGX platforms will be available in a future release.

Status#

Resolved in nvfwupd version 2.0.4.

The ipmitool dcmi power reading Command Returns 0 Power Reading Value#

Issue#

When you use the ipmitool dcmi power reading command to report power consumption data, the command reports 0 Watts for the instantaneous power reading, as shown in the following example:

$ sudo ipmitool -I lanplus -H IPaddress -U user -P password dcmi power reading
Instantaneous power reading:                             0 Watts
Minimum during sampling period:                          0 Watts
Maximum during sampling period:                       7852 Watts
Average power reading over sample period:             1885 Watts
IPMI timestamp:                             Jan 12 09:20:45 2024
Sampling period:                                00000005 Seconds
Power reading state is:                                activated
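The symptom in the example output can be checked with a script. The following is a minimal sketch that reads the command output on standard input and reports whether the instantaneous reading is 0 Watts while the average over the sample period is nonzero. The function name power_reading_is_stuck_at_zero and this particular heuristic are assumptions made for illustration, not part of ipmitool or the BMC firmware.

```shell
#!/bin/sh
# Sketch: detect a zero instantaneous reading in the output of
# `ipmitool dcmi power reading`, supplied on stdin.
# Returns 0 when the instantaneous reading is 0 Watts and the average
# reading is greater than 0; returns 1 otherwise.
power_reading_is_stuck_at_zero() {
    awk -F: '
        /Instantaneous power reading/ { split($2, f, " "); inst = f[1]; have_inst = 1 }
        /Average power reading/       { split($2, f, " "); avg = f[1];  have_avg = 1 }
        END { exit !(have_inst && have_avg && inst + 0 == 0 && avg + 0 > 0) }
    '
}

# Example against a live BMC (IPaddress, user, and password are placeholders):
#   sudo ipmitool -I lanplus -H IPaddress -U user -P password dcmi power reading |
#       power_reading_is_stuck_at_zero && echo "instantaneous reading is 0 Watts"
```

Requiring a nonzero average helps distinguish this reporting problem from a system that is genuinely drawing no measurable power.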

Status#

Resolved in version 24.09.1.

GPUs Show Exclamation Mark in BMC Web Interface#

Issue#

When you view the GPUs from the BMC web interface, the GPUs are shown with an exclamation mark.

Explanation#

The icon is a false positive. You can view the results of the nvsm show health command to confirm that the GPU status is healthy.

Status#

Resolved in version 1.1.3.

BMC LDAP Fields Do Not Support Space or Slash Characters#

Issue#

The BMC LDAP settings do not support space or slash characters in the bind DN or search base. For example, the following DN fails because of the space in "Echo Studios":

DC=Echo Studios,DC=com
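On affected versions, a DN can be checked before it is entered in the BMC LDAP settings. The following is a minimal sketch. The function name dn_has_unsupported_chars is hypothetical, and the check assumes that "slash" means the forward slash character.

```shell
#!/bin/sh
# Sketch: return 0 when a DN contains a space or a forward slash,
# the characters that affected BMC versions reject; return 1 otherwise.
dn_has_unsupported_chars() {
    case "$1" in
        *" "*|*/*) return 0 ;;
        *)         return 1 ;;
    esac
}

if dn_has_unsupported_chars "DC=Echo Studios,DC=com"; then
    echo "DN contains a space or slash"
fi
```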

Status#

Resolved in version 24.09.1.

NVMe Information Not Visible in BMC Web Interface#

Issue#

In some cases, the NVMe information is not visible in the BMC web interface.

Status#

Resolved in version 24.09.1.