Resolved Issues
The following issues that were previously identified as known issues have been resolved.
Issues with ConnectX-7 Network (Cluster) Card Firmware
Issue
If NVIDIA® ConnectX®-7 Network (Cluster) Card firmware version 28.39.3560 is installed on your DGX H100/H200 system, you might encounter the following issues:
- After a long runtime on a DGX H100/H200 system, one or more GPUs might fall off the bus, and the nvidia-smi command fails to run. After a power cycle, the system recovers, all GPUs are operational, and the system again runs without issues for a long time.
- After a reboot or power cycle, one or more OSFP ports on the DGX system might remain in the Down state.
Resolution
To prevent these issues, NVIDIA recommends updating the firmware of the following ConnectX-7 network cards to version 28.42.1000:
| NVIDIA ConnectX-7 Card | Version for the 24.09.1 Release | Recommended Version |
|---|---|---|
| Network (cluster) card | 28.39.3560 | 28.42.1000 |
| Network (storage) card | 28.39.3560 | 28.42.1000 |
For more information, refer to DGX H100/H200 - Update for ConnectX-7 Networking Cards Available.
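The version comparison that the table implies can be sketched in Python. This is a minimal sketch, not an NVIDIA tool. It assumes that you have already read the installed firmware version string, for example from the firmware-version field that ethtool -i reports for a ConnectX-7 interface. The names RECOMMENDED_FW, parse_version, and needs_update are hypothetical.

```python
# Minimal sketch (not an NVIDIA tool): compare a ConnectX-7 firmware version
# string with the recommended version 28.42.1000.
# Assumption: the installed version string was obtained separately, for
# example from the firmware-version field reported by `ethtool -i <iface>`.

RECOMMENDED_FW = "28.42.1000"


def parse_version(text: str) -> tuple:
    """Convert '28.39.3560' (optionally followed by extra text) to (28, 39, 3560)."""
    token = text.strip().split()[0]  # keep only the first token, drop anything after it
    return tuple(int(part) for part in token.split("."))


def needs_update(installed: str, recommended: str = RECOMMENDED_FW) -> bool:
    """Return True when the installed firmware is older than the recommended one."""
    return parse_version(installed) < parse_version(recommended)


if __name__ == "__main__":
    for installed in ("28.39.3560", "28.42.1000"):
        status = "update recommended" if needs_update(installed) else "up to date"
        print(f"{installed}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes. For example, the string "28.9" sorts after "28.42", but the tuple (28, 9) correctly sorts before (28, 42).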
Platform DGX H200 Not Supported
Issue
On DGX H200 systems with nvfwupd version 2.0.1 installed, the following error message might appear when you use the nvfwupd command to update the firmware:
Platform dgxh200 not supported.
Explanation
Starting with nvfwupd version 2.0.1, the server type is required to update the firmware on new DGX platforms. An enhanced solution to automatically detect the server type for DGX platforms will be available in a future release.
Status
Resolved in nvfwupd version 2.0.4.
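The version range in this section can be expressed as a small check. This is a hypothetical helper, not part of nvfwupd. It assumes that versions 2.0.2 and 2.0.3 behave like 2.0.1, which these notes do not state. The notes name only 2.0.1 as affected and 2.0.4 as fixed.

```python
# Hypothetical helper, not part of nvfwupd: report whether an nvfwupd version
# falls in the range associated with "Platform dgxh200 not supported."
# Assumption: 2.0.2 and 2.0.3 behave like 2.0.1; the notes only name 2.0.1
# as affected and 2.0.4 as the fixed version.

FIRST_AFFECTED = (2, 0, 1)
FIXED_IN = (2, 0, 4)


def version_tuple(version: str) -> tuple:
    """Convert '2.0.1' to (2, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def may_hit_platform_error(version: str) -> bool:
    """Return True if the version is at least 2.0.1 but older than 2.0.4."""
    return FIRST_AFFECTED <= version_tuple(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("2.0.1", "2.0.4"):
        print(v, may_hit_platform_error(v))
```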
The ipmitool dcmi power reading Command Returns 0 Power Reading Value
Issue
When you use the ipmitool dcmi power reading command to report power consumption data, the command reports 0 Watts for the instantaneous power reading, as shown in the following example:
$ sudo ipmitool -I lanplus -H IPaddress -U user -P password dcmi power reading
Instantaneous power reading: 0 Watts
Minimum during sampling period: 0 Watts
Maximum during sampling period: 7852 Watts
Average power reading over sample period: 1885 Watts
IPMI timestamp: Jan 12 09:20:45 2024
Sampling period: 00000005 Seconds
Power reading state is: activated
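To detect this symptom in scripted monitoring, you could parse the command output. The following Python sketch is an illustration only. It assumes the output format shown in the example above. Treating a 0-Watt instantaneous reading alongside a nonzero average as the symptom is a heuristic, not an NVIDIA-defined check. The function names are hypothetical.

```python
import re

# Sample output copied from the example above.
SAMPLE = """\
Instantaneous power reading: 0 Watts
Minimum during sampling period: 0 Watts
Maximum during sampling period: 7852 Watts
Average power reading over sample period: 1885 Watts
IPMI timestamp: Jan 12 09:20:45 2024
Sampling period: 00000005 Seconds
Power reading state is: activated
"""

# Matches lines of the form "<label>: <integer> Watts".
WATTS_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<value>\d+)\s+Watts\s*$")


def parse_power_readings(output: str) -> dict:
    """Map each '... Watts' label to its integer value; other lines are skipped."""
    readings = {}
    for line in output.splitlines():
        match = WATTS_LINE.match(line)
        if match:
            readings[match.group("label").strip()] = int(match.group("value"))
    return readings


def looks_like_zero_reading_bug(output: str) -> bool:
    """Heuristic: instantaneous reading is 0 W while the average is nonzero."""
    readings = parse_power_readings(output)
    instantaneous = readings.get("Instantaneous power reading")
    average = readings.get("Average power reading over sample period", 0)
    return instantaneous == 0 and average > 0


if __name__ == "__main__":
    print(parse_power_readings(SAMPLE))
    print(looks_like_zero_reading_bug(SAMPLE))
```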
Status
Resolved in version 24.09.1.
GPUs Show Exclamation Mark in BMC Web Interface
Issue
When you view the GPUs from the BMC web interface, the GPUs are shown with an exclamation mark icon.
Explanation
The icon is a false positive. To confirm that the GPU status is healthy, view the results of the nvsm show health command.
Status
Resolved in version 1.1.3.
BMC LDAP Fields Do Not Support Space or Slash Characters
Issue
The BMC LDAP settings do not support space or slash characters in the bind DN or search base. For example, the following DN results in a failure:
DC=Echo Studios,DC=com
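If you are auditing LDAP settings on a system that has not yet been upgraded, you can flag affected values with a simple check. This minimal sketch reports which values contain a space or a slash. The dictionary keys bind_dn and search_base, the CN=bmc-reader entry, and the function name affected_fields are hypothetical placeholders, not the BMC's actual setting names.

```python
# Minimal sketch: flag LDAP values containing characters (space, slash) that
# the BMC LDAP settings did not support before version 24.09.1.
# The field names used below are hypothetical placeholders.

UNSUPPORTED_BEFORE_FIX = (" ", "/")


def affected_fields(settings: dict) -> list:
    """Return the names of fields whose values contain a space or a slash."""
    return [
        name
        for name, value in settings.items()
        if any(ch in value for ch in UNSUPPORTED_BEFORE_FIX)
    ]


if __name__ == "__main__":
    ldap_settings = {
        "bind_dn": "CN=bmc-reader,DC=Echo Studios,DC=com",  # hypothetical bind DN
        "search_base": "DC=Echo Studios,DC=com",  # DN from the example above
    }
    print(affected_fields(ldap_settings))  # ['bind_dn', 'search_base']
```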
Status
Resolved in version 24.09.1.
NVMe Information Not Visible in BMC Web Interface
Issue
In some cases, the NVMe information is not visible in the BMC web interface.
Status
Resolved in version 24.09.1.