Version 1.1.3

Highlights

  • Platform and firmware updates:

    • Added support for Gen5 NVMe drives.

    • Fixed an issue with the U.2 drive temperature sensor.

    • Updated the power supply firmware.

    • Included the latest GPU tray firmware.

    • Included the latest network (cluster and storage) card firmware.

    • Added support for securing KCS.

  • The nvfwupd command is updated with the following enhancements:

    • Support for abbreviated firmware update package names.

    • Enhanced the show_update_progress output to provide a full status report for Redfish.

    • Support for a custom log file path.

    • The command now exits with exit code 1 on any update failure or tool failure.
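Because nvfwupd exits with code 1 on any update or tool failure, automation scripts can rely on its exit status. The following Python sketch is a minimal, hypothetical wrapper: the function name run_firmware_update is illustrative, and the command passed to it is a placeholder, because the actual nvfwupd arguments are described in the firmware update procedure rather than in these notes. It treats any nonzero exit code as a failure, which is a conservative assumption.

```python
import subprocess
import sys
from typing import Sequence


def run_firmware_update(cmd: Sequence[str]) -> bool:
    """Run an update command and report success based on its exit status.

    Hypothetical helper: these release notes state only that nvfwupd exits
    with code 1 on any update or tool failure. Any nonzero exit code is
    treated as a failure here, as a conservative assumption.
    """
    if not cmd:
        raise ValueError("cmd must contain at least the executable name")
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's own output so the failure can be diagnosed.
        print(f"update failed with exit code {result.returncode}", file=sys.stderr)
        if result.stdout:
            print(result.stdout, file=sys.stderr)
        if result.stderr:
            print(result.stderr, file=sys.stderr)
        return False
    return True
```

Call it with the full nvfwupd command line as a list of strings, taken from the firmware update procedure; the arguments are deliberately not reproduced here. Checking for any nonzero code, rather than only 1, also catches failures such as the executable not starting correctly.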

BMC Fixes

  • Fixed an issue where SEL logs might fill up for NVMe drives.

  • Fixed a low-occurrence issue where the HMC might not be visible in the BMC after a BMC reboot.

  • Added the ability to control IPMI visibility for the host (Allow All, Limited Command, Hide).

  • Increased the resolution of CPU and GPU energy telemetry reported via Redfish.

  • Improved reliability of Redfish inventory.

  • Improved the overall stability of telemetry collection and the handling of invalid or missing values.

  • General improvements to the WebUI.

Firmware Package Details

This firmware release supports the following hardware:

  • NVIDIA DGX H100

This firmware release supports the following operating systems:

  • NVIDIA DGX OS 6.1, 6.0.11, and higher

  • NVIDIA DGX Software for EL9.2, 23.12 and 23.08

  • NVIDIA DGX Software for EL8 23.08

Refer to the NVIDIA Base OS documentation for more information about the operating systems.

You can download firmware packages from the NVIDIA Enterprise Support Portal at https://enterprise-support.nvidia.com/s/.

Download the following firmware package files:

Component         Sample File Name
----------------  ---------------------------------
Combined Archive  DGXH100_1.1.3.tar
Motherboard Tray  nvfw_DGXH100_231206.1.0.fwpkg
GPU Tray          nvfw_HGX_DGXH100_231101.1.0.fwpkg

The combined archive includes the firmware for the system components, the firmware for the GPU tray, and the nvfwupd executable.

If you are updating from 1.1.1, the total update time is approximately:

  • 88 minutes for the CPU tray using sequential updates.

  • 33 minutes for the CPU tray using parallel updates.

  • 11 minutes for the GPU tray using parallel updates.

The following table lists the component firmware versions and the update time for each component when updating from 1.1.1.

Component                                 Version             Update time from 1.1.1 (minutes)
----------------------------------------  ------------------  --------------------------------
Host BMC                                  24.01.05            25
Host BMC EROT                             04.0026             2
SBIOS EROT                                04.0026             0
SBIOS                                     v1.01.03            7
Motherboard CPLD                          0.2.1.8             18
Midplane CPLD                             0.2.1.1             14
PSU (Delta ECD16020137)                   Primary 2.4         PSU_0: 2
                                          Secondary 2.1       PSU_1: 2
                                          Community 2.2       PSU_2: 2
                                                              PSU_3: 2
                                                              PSU_4: 2
                                                              PSU_5: 2
Broadcom Gen5 PCIe Switch (PEX89072-B01)  Switch 0: v0.0.7    Switch 0: 1
                                          Switch 1: v1.0.7    Switch 1: 1
Astera Labs Gen5 PCIe Retimer (PT5161L)   v2.07.19            Retimer 0: 3
                                                              Retimer 1: 3
Network (Cluster) Card - ConnectX-7       v28.39.1002
Network (Storage) Card - ConnectX-7       v28.39.1002
VBIOS (H100 80GB)                         96.00.89.00.01      GPU Tray (total): 11
NVSwitch (GPU Tray)                       96.10.4A.00.01
EROT (GPU Tray)                           02.0150
HMC (GPU Tray)                            HGX-22.10-1-rc57
FPGA (GPU Tray)                           2.37
PCIe Switch (GPU Tray)                    1.7.5F
Astera Labs Gen5 PCIe Retimer (GPU Tray)  2.07.20
  (PT5161L)
Intel 10G Ethernet                        v3.60
Intel 50G Ethernet                        v2.5
M.2 NVMe (Samsung PM9A3)                  GDC7502Q
M.2 NVMe (Micron 7450)                    E2MU200
U.2 Kioxia CM6                            1.0.7
U.2 Samsung (EVT2 PM1733)                 MPK95B5Q
U.2 Samsung (Gen5 PM1743)                 OPPA3B5Q
FRU                                       0.6
TPM                                       v15.21

The GPU tray update time of 11 minutes is a total for all GPU tray components, from VBIOS (H100 80GB) through the GPU tray PCIe retimer.

For the list of Host BMC changes, refer to DGX H100 System BMC Changes. For the list of SBIOS changes, refer to DGX H100 System SBIOS Changes.

Firmware Update Procedure

Refer to Firmware Update Steps.