Compute Tray Firmware Update Process
Method 1 - BCM/NVIDIA Mission Control Integrated Firmware Update for Compute Tray
To use the firmware update tool in BCM 11, an NVIDIA Mission Control-enabled license must be registered.
Place Firmware Update Packages in the Correct BCM Directory: /cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200
Copy the prod-signed.fwpkg images up to the BCM head node. The files must be placed in the following directory to be visible to the ‘firmware’ command:
scp <binary files> user@<headnode>:/cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200
Reference: BCM file directory structure for firmware updates.
/cm/local/apps/cmd/etc/htdocs/bios/firmware/
README.md  b200/  gb200/  gb200sw/  gh200/  h100/  ilo/
# The gb200 folder is for compute tray firmware, the gb200sw folder is
# for NVLink Switch firmware
Use the firmware info command in BCM to gather information on the current FW levels of the nodes. The output lists each file and its purpose.
cmsh;device;firmware info
[T06-HEAD-01->device]% firmware info
Device       Filename                                                   Component      Version                             State      Progress  Result  Size     Date
-----------  ---------------------------------------------------------  -------------  ----------------------------------  ---------  --------  ------  -------  --------------------
T06-HEAD-01  nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg    GB200-BMC      DGX-GBX00_0024_250215.1.0_custom    available  N/A               64MiB    2025-02-15, 16:39:41
T06-HEAD-01  nvfw_GB200-P4978_0004_250213.1.0_prod-signed.fwpkg         GB200-Switch   GB200-P4978_0004_250213.1.0         available  N/A               75MiB    2025-02-13, 10:23:28
T06-HEAD-01  nvfw_GB200-P4978_0006_250205.1.0_prod-signed.fwpkg         GB200-Switch   GB200-P4978_0006_250205.1.0         available  N/A               16.2MiB  2025-02-05, 15:11:49
T06-HEAD-01  nvfw_GB200-P4978_0007_250121.1.2_custom_prod-signed.fwpkg  GB200-Switch   GB200-P4978_0007_250121.1.2_custom  available  N/A               1.64MiB  2025-01-21, 13:55:30
T06-HEAD-01  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg    GB200-Compute  HGX-GBX00_0023_250223.1.1_custom    available  N/A               114MiB   2025-02-23, 20:20:42
Note
This will display the file names and target (such as GB200 or Switch) of all available firmware binaries. If the files do not show up with this command, they cannot be flashed by the update tool. The officially released packages will have a common filename structure starting with nvfw_DGX-GBX00_<identifier>_<date>.
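Before copying files to the head node, it can help to confirm that each one at least looks like a signed firmware package. The sketch below uses two hypothetical helpers, looks_like_fwpkg and filter_fwpkgs, that only check for the nvfw_ prefix and the _prod-signed.fwpkg suffix seen in the listing above. They are assumptions for illustration and do not validate package contents or signatures.

```shell
# Hypothetical helper: succeeds when a file name follows the
# nvfw_..._prod-signed.fwpkg pattern seen in the 'firmware info' listing.
looks_like_fwpkg() {
    case "$(basename "$1")" in
        nvfw_*_prod-signed.fwpkg) return 0 ;;
        *) return 1 ;;
    esac
}

# Print only the arguments whose names match the pattern, one per line.
filter_fwpkgs() {
    for f in "$@"; do
        if looks_like_fwpkg "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, filter_fwpkgs ./*.fwpkg prints the candidates that could then be passed to scp.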
Confirm GB200/GB300 Tray BMC Access/Connectivity.
The BMC of each node needs to be configured in BCM. This should be done at the category level. Ensure that no bmcsettings are added at the node level so that the compute trays inherit the settings from the category level.
Enter cmsh and show the current BMC settings for a given node or use the category level for GB200 compute trays since all of their default passwords are the same (for DGX).
#category level
category; use <dgx-category>;bmcsettings; show
#device level
device; use <device name>; bmcsettings; show
Only use the device level to confirm that nothing has been set.
If nothing has been set at the device level, the prompt marks the object with an asterisk, which indicates uncommitted changes:
[bcm11-headnode->device*[a08-p1-dgx-04-c18*]->bmcsettings*]%
#use this command to clear uncommitted changes
refresh
Populate the bmcsettings fields in the dgx-gb200 category if they are not already populated.
cmsh;category
use dgx-gb200;bmcsettings;
set username admin
set password <Password of choice>
set userid 1
set firmwaremanagemode gb200
commit
Note
It is critical that the firmware management mode here is set to gb200.
Test that the BMC is configured by reading the current FW versions.
#at the device level
cmsh; device
use <dgx-node-name>; firmware status
[maple->device[dgx-gb200-m07-c1]]% firmware status
Device            Filename  Component                 Version                   State    Progress  Result  Size  Date
----------------  --------  ------------------------  ------------------------  -------  --------  ------  ----  ----
dgx-gb200-m07-c1            CX7_0                     28.42.1270                current  N/A       N/A
dgx-gb200-m07-c1            CX7_1                     28.42.1270                current  N/A       N/A
dgx-gb200-m07-c1            CX7_2                     28.42.1270                current  N/A       N/A
dgx-gb200-m07-c1            CX7_3                     28.42.1270                current  N/A       N/A
dgx-gb200-m07-c1            FW_BMC_0                  GB200Nvl-24.12-8          current  N/A       N/A
dgx-gb200-m07-c1            FW_CPLD_0                 0x00 0x0b 0x03 0x04       current  N/A       N/A
dgx-gb200-m07-c1            FW_CPLD_1                 0x00 0x0b 0x03 0x04       current  N/A       N/A
dgx-gb200-m07-c1            FW_CPLD_2                 0x00 0x10 0x01 0x0f       current  N/A       N/A
dgx-gb200-m07-c1            FW_CPLD_3                 0x00 0x10 0x01 0x0f       current  N/A       N/A
dgx-gb200-m07-c1            FW_ERoT_BMC_0             01.03.0262.0000_n04       current  N/A       N/A
dgx-gb200-m07-c1            Full_FW_Image_NIC_Slot_4  32.42.1000                current  N/A       N/A
dgx-gb200-m07-c1            Full_FW_Image_NIC_Slot_7  32.42.1000                current  N/A       N/A
dgx-gb200-m07-c1            UEFI                      buildbrain-gcid-38635631  current  N/A       N/A
#At the category level to see all of the compute tray FW in one shot
cmsh; device;firmware -c dgx-gb200 status
#At the rack level
cmsh; device;firmware -r <rack location> status
As a validation step prior to executing the flash, a dry-run command is supported to show exactly what will be changing when the firmware is applied:
Perform a dry run of the BMC FW
cmsh;device; firmware flash nvfw_DGX-GBX00_0023_241223.1.0_custom_prod-signed.fwpkg --dry-run -n <device name>
The <device name> can contain a range expression to apply the change to multiple devices simultaneously:
dgx-gb200-r1-c[1-2] - This will run the command against both dgx-gb200-r1-c1 and dgx-gb200-r1-c2.
Device names can also be comma separated to run against multiple individual devices:
dgx-gb200-r1-c1,dgx-gb200-r1-c2
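To double-check which nodes a bracketed range covers before flashing, the range can be expanded by hand. The sketch below is an illustration only: expand_node_range is a hypothetical helper that mimics the c[1-2] form shown above for planning purposes, and it is not how cmsh itself resolves device names.

```shell
# Illustration only: expand a name such as dgx-gb200-r1-c[1-2] into the
# comma-separated list it stands for. Handles one [first-last] range of
# plain integers; zero-padded ranges are not handled.
expand_node_range() {
    pattern=$1
    case "$pattern" in
        *\[*-*\]*) ;;
        *) printf '%s\n' "$pattern"; return 0 ;;   # no range: pass through
    esac
    prefix=${pattern%%\[*}     # text before the opening bracket
    rest=${pattern#*\[}        # text after the opening bracket
    range=${rest%%\]*}         # the "first-last" part
    suffix=${rest#*\]}         # text after the closing bracket
    first=${range%-*}
    last=${range#*-}
    out=""
    i=$first
    while [ "$i" -le "$last" ]; do
        out="${out:+$out,}${prefix}${i}${suffix}"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

Running expand_node_range 'dgx-gb200-r1-c[1-2]' yields the comma-separated form shown above.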
Example: Dry run output
s03-p1-dgx-01-c06  HGX_FW_BMC_0        HGX_FW_BMC_0        GB200Nvl-25.01-D     GB200Nvl-25.01-E     no   install  good
s03-p1-dgx-01-c06  HGX_FW_CPLD_0       HGX_FW_CPLD_0       0.1C                 0.1C                 yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_CPU_0        HGX_FW_CPU_0        02.03.19             02.03.20             no   install  good
s03-p1-dgx-01-c06  HGX_FW_CPU_1        HGX_FW_CPU_1        02.03.19             02.03.20             no   install  good
s03-p1-dgx-01-c06  HGX_FW_ERoT_BMC_0   HGX_FW_ERoT_BMC_0   01.04.0008.0000_n04  01.04.0008.0000_n04  yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_ERoT_CPU_0   HGX_FW_ERoT_CPU_0   01.04.0008.0000_n04  01.04.0008.0000_n04  yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_ERoT_CPU_1   HGX_FW_ERoT_CPU_1   01.04.0008.0000_n04  01.04.0008.0000_n04  yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_ERoT_FPGA_0  HGX_FW_ERoT_FPGA_0  01.04.0008.0000_n04  01.04.0008.0000_n04  yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_ERoT_FPGA_1  HGX_FW_ERoT_FPGA_1  01.04.0008.0000_n04  01.04.0008.0000_n04  yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_FPGA_0       HGX_FW_FPGA_0       1.20                 1.20                 yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_FPGA_1       HGX_FW_FPGA_1       1.20                 1.20                 yes  skip     good
s03-p1-dgx-01-c06  HGX_FW_GPU_0        HGX_FW_GPU_0        97.00.82.00.13       97.00.82.00.19       no   install  good
s03-p1-dgx-01-c06  HGX_FW_GPU_1        HGX_FW_GPU_1        97.00.82.00.13       97.00.82.00.19       no   install  good
s03-p1-dgx-01-c06  HGX_FW_GPU_2        HGX_FW_GPU_2        97.00.82.00.13       97.00.82.00.19       no   install  good
s03-p1-dgx-01-c06  HGX_FW_GPU_3        HGX_FW_GPU_3        97.00.82.00.13       97.00.82.00.19       no   install  good
Ensure that the values that are going to be updated are the expected versions.
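When the dry run covers many nodes, filtering the output down to the rows that will actually change makes this review easier. The following is a minimal sketch, assuming the column order of the example output above, where the second-to-last column is the action (install or skip). The function name list_pending_installs is hypothetical.

```shell
# Read dry-run output on stdin and print "device component" for every row
# whose action column (second to last) is "install".
# Assumes the column layout of the example dry-run output above.
list_pending_installs() {
    awk 'NF >= 2 && $(NF-1) == "install" { print $1, $2 }'
}
```

A possible use is piping the dry-run command from the previous step into list_pending_installs and comparing the result against the components expected to change.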
Start the firmware update.
cmsh -c 'device; firmware flash nvfw_DGX-GBX00_0023_250614.1.0_custom_prod-signed.fwpkg -n <device name>'
Once the payload has been uploaded to the node, the Result column will show good.
[T06-HEAD-01->device]% firmware flash nvfw_DGX-GBX00_0023_250614.1.0_custom_prod-signed.fwpkg -n s03-p1-dgx-01-c{04..06}
Device             flashing file                                            Result
-----------------  -------------------------------------------------------  ------
s03-p1-dgx-01-c04  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  good
s03-p1-dgx-01-c05  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  good
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  good
When the command completes, check the status of the update repeatedly until it has finished. The output shows a percentage while flashing is in progress and a completion message once the flash has finished.
cmsh -c 'device; firmware status -n <device name>'
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_BMC_0  GB200Nvl-25.01-D  flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_CPU_0  02.03.19          flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_CPU_1  02.03.19          flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_0  97.00.82.00.13    flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_1  97.00.82.00.13    flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_2  97.00.82.00.13    flashing  0.0%  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_3  97.00.82.00.13    flashing  0.0%

s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_BMC_0  GB200Nvl-25.01-D -> GB200Nvl-25.01-E  pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_CPU_0  02.03.19 -> 02.03.20                  pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_CPU_1  02.03.19 -> 02.03.20                  pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_0  97.00.82.00.13 -> 97.00.82.00.19      pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_1  97.00.82.00.13 -> 97.00.82.00.19      pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_2  97.00.82.00.13 -> 97.00.82.00.19      pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
s03-p1-dgx-01-c06  nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  HGX_FW_GPU_3  97.00.82.00.13 -> 97.00.82.00.19      pending  N/A  success: medium-specific reset or dc power cycle or ac power cy+  114MiB
At the end of the BMC update, the administrator can AC cycle the GB200 node(s) to complete the BMC update, then proceed with updating other components.
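The repeated status check can be wrapped in a small polling loop. This is a sketch under assumptions rather than a BCM feature: still_flashing and wait_for_flash are hypothetical helpers, POLL_SECONDS is an invented tunable, and the loop only looks for the word "flashing" in the State column as it appears in the output above.

```shell
# Succeeds while any line of status output (read from stdin) still
# contains the state "flashing".
still_flashing() {
    grep -qw 'flashing'
}

# Re-run the status command given as arguments until no row is flashing.
# POLL_SECONDS (default 30) is an assumption of this sketch.
wait_for_flash() {
    while "$@" | still_flashing; do
        sleep "${POLL_SECONDS:-30}"
    done
}
```

For example: wait_for_flash cmsh -c 'device; firmware status -n <device name>'. The final status output should still be reviewed by hand, since the loop does not distinguish a successful flash from a failed one.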
Note
It is important to AC/AUX cycle the target host after the CPLD and BMC updates because the BMC has limited memory and cannot store another firmware package. AC cycling clears the memory and applies changes, allowing the HMC update to proceed successfully.
Perform the AC cycle after each .fwpkg has completed its firmware update.
Note
The GB200/GB300 compute tray has two levels of power:
Primary (system) power: This is the power supplied to the compute tray CPUs and GPUs. This must be powered off before the aux_cycle process.
Standby (AUX) power: This is the power supplied to the BMC and low-level components. Cycling standby power is an automated process that temporarily removes power from the compute tray, reinitializing all hardware components. The BMC will be unavailable for several minutes during the aux_cycle process. Once completed, the primary power can be toggled on again.
Power Cycle Method 1: AUX_PWR_CYCLE using Redfish
To perform the power cycle using Redfish API calls directly to the BMC:
From the head node, power down the system:
curl -k -u ${USER}:${PASS} -H "Content-Type: application/json" -X POST \
-d '{"ResetType": "ForceOff"}' \
https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset
Perform the AC power cycle (removal of auxiliary power):
curl -k -u ${USER}:${PASS} -H "Content-Type: application/json" -X POST \
-d '{"ResetType":"AuxPowerCycle"}' \
https://${BMCIP}/redfish/v1/Chassis/BMC_0/Actions/Oem/NvidiaChassis.AuxPowerReset
After the cycle, power on the system using Redfish:
curl -k -u ${USER}:${PASS} \
https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
-d '{"ResetType": "On"}' -X POST
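The three Redfish calls above can be chained into one sequence. The sketch below reuses the BMCIP, USER and PASS variables and the endpoints from the curl examples. CURL, OFF_WAIT_SECONDS and AUX_WAIT_SECONDS are assumptions added for this sketch, and the default wait times are placeholders rather than documented values.

```shell
# CURL can be overridden (for example with a stub) for testing.
CURL=${CURL:-curl}

# POST a JSON body to a Redfish path on the BMC.
# $1 = path below https://${BMCIP}, $2 = JSON body
redfish_post() {
    "$CURL" -k -u "${USER}:${PASS}" -H "Content-Type: application/json" \
        -X POST -d "$2" "https://${BMCIP}$1"
}

# Power off, AUX cycle, wait, then power on again.
aux_power_cycle() {
    redfish_post /redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
        '{"ResetType": "ForceOff"}' || return 1
    # Assumed pause so the host is fully off before the AUX cycle.
    sleep "${OFF_WAIT_SECONDS:-15}"
    redfish_post /redfish/v1/Chassis/BMC_0/Actions/Oem/NvidiaChassis.AuxPowerReset \
        '{"ResetType":"AuxPowerCycle"}' || return 1
    # The BMC is unavailable for several minutes during the cycle.
    sleep "${AUX_WAIT_SECONDS:-300}"
    redfish_post /redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
        '{"ResetType": "On"}'
}
```

A fixed sleep is the simplest option. Polling the BMC until it responds again would be more robust than a fixed delay.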
Examples: Powering on nodes using cmsh
While not part of the AC power cycle itself, the following commands can be used to power on nodes after the update process as needed:
Power on a single compute node:
cmsh;device;use <compute node under test>;power on
Power on multiple nodes in a category:
cmsh;device;foreach -c dgx-gb200 (power on)
Power on all nodes in a category:
cmsh;device;power on -c dgx-gb200
Power on specific nodes by name:
cmsh;device;power on -n <specific nodes>
Power Cycle Method 2: BCM “power auxcycle” Command (available in BCM 11.25.08 and later)
An AC power cycle can also be performed via the BCM command line within the device context.
Ensure the node is powered off:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power status
rf0 ...................... [ ON ] dgx-gb200-m06-c1
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power off
rf0 ...................... [ OFF ] dgx-gb200-m06-c1
Note
If the node is still ON when executing the power auxcycle command, an error message will be returned:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power auxcycle
rf0 ...................... [ FAILED ] dgx-gb200-m06-c1 (System power is not OFF)
After confirming the node is OFF, perform the auxiliary power cycle:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power auxcycle
rf0 ...................... [AUX CYCLE]
During auxcycle, the BMC will be unavailable for several minutes. “power status” will indicate failure until the process is complete:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power status
rf0 ...................... [ FAILED ] dgx-gb200-m06-c1 (Unable to establish session)
When auxcycle completes, the node status will return to OFF:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power status
rf0 ...................... [ OFF ] dgx-gb200-m06-c1
Power on the node:
[BCM11-HEAD-01->device[dgx-gb200-m06-c1]]% power on
rf0 ...................... [ ON ] dgx-gb200-m06-c1
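Waiting for the status to return to OFF before powering on can be scripted by parsing the bracketed state from the power status output. This is a sketch only: power_state and wait_for_off are hypothetical helpers, POLL_SECONDS is an invented tunable, and it assumes the status command prints a single line in the format shown above.

```shell
# Extract the bracketed state (ON, OFF, FAILED, AUX CYCLE) from a line of
# 'power status' output read from stdin.
power_state() {
    sed -n 's/.*\[ *\([A-Za-z ]*[A-Za-z]\) *\].*/\1/p'
}

# Re-run the status command given as arguments until it reports OFF.
# POLL_SECONDS (default 20) is an assumption of this sketch.
wait_for_off() {
    until [ "$("$@" | power_state)" = "OFF" ]; do
        sleep "${POLL_SECONDS:-20}"
    done
}
```

One possible invocation, assuming the non-interactive cmsh -c form accepts the same commands as the interactive session: wait_for_off cmsh -c 'device; use <device name>; power status'.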
If issues arise, debug output can help identify the root cause. Run the flash command with the debug options enabled:
cmsh -c 'device; firmware flash nvfw_DGX-GBX00_0023_241223.1.0_custom_prod-signed.fwpkg -n <device name> -v --debug'
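Debug output is easier to share and compare when it is also saved to a file. The helper below is a generic sketch, not a BCM feature: run_logged is a hypothetical name, and the log file path is whatever the caller passes in.

```shell
# Run a command and keep a copy of everything it prints (stdout and
# stderr) in a log file while still showing it on the terminal.
# $1 = log file, remaining arguments = command to run
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}
```

For example: run_logged flash-debug.log cmsh -c 'device; firmware flash <package> -n <device name> -v --debug'. The exit status returned is that of tee, not of the wrapped command.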
Method 2 - Stand Alone nvfwupd Tool for Compute Tray
If the license does not support NVIDIA Mission Control, the built-in cm-nvfwupd will not work. Download the latest standalone nvfwupd tool (v2.0.7 or later) from the Enterprise Portal (Announcement: nvfwupd tool version). This tool is used independently of BCM.
Note
These instructions only cover the update of a single compute tray. The stand-alone tool supports simultaneous upgrades of multiple systems and of multiple component types, such as compute trays and NVLink switches together. Refer to Chapters 17, 18 and 19 of the NVIDIA Firmware Update Guide that is included with the nvfwupd tool.
Get the correct FW update packages for the update. To see the full contents of a .fwpkg file, use the show_pkg_content command.
./nvfwupd show_pkg_content -p \
./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg
Get current state of the hardware with show_version.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} \
servertype=GB200 show_version -p ./nvfw_GB200-P4972_0012_250214.1.0_custom_prod-signed.fwpkg \
./nvfw_GB200-P4975_0011_250206.1.1_custom_recovery_prod-signed.fwpkg
System Model: GB200 NVL
Part number: 699-24764-0001-RC1
Serial number: 1334524170073
Packages: ['GB200-P4972_0012_250214.1.0_custom', 'GB200-P4975_0011_250206.1.1_custom_recovery']
Connection Status: Successful
Firmware Devices:
AP Name Sys Version Pkg Version Up-To-Date
---------------------- ---------------------- -------------------------- ----------
CX7_0 28.43.2108 N/A No
CX7_1 28.43.2108 N/A No
CX7_2 28.43.2108 N/A No
CX7_3 28.43.2108 N/A No
FW_BMC_0 GB200Nvl-25.01-D GB200Nvl-25.01-E No
FW_CPLD_0 0x00 0x0b 0x03 0x04 N/A No
FW_CPLD_1 0x00 0x0b 0x03 0x04 N/A No
FW_CPLD_2 0x00 0x10 0x01 0x0f N/A No
FW_CPLD_3 0x00 0x10 0x01 0x0f N/A No
FW_ERoT_BMC_0 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
Full_FW_Image_NIC_Slot_4 32.43.2408 N/A No
Full_FW_Image_NIC_Slot_7 32.43.2408 N/A No
UEFI buildbrain-gcid-39281046 N/A No
HGX_FW_BMC_0 GB200Nvl-25.01-D N/A No
HGX_FW_CPLD_0 0.1C N/A No
HGX_FW_CPU_0 02.03.19 N/A No
HGX_FW_CPU_1 02.03.19 N/A No
HGX_FW_ERoT_BMC_0 01.04.0008.0000_n04 01.03.0196.0001 Yes
HGX_FW_ERoT_CPU_0 01.04.0008.0000_n04 01.03.0196.0001 Yes
HGX_FW_ERoT_CPU_1 01.04.0008.0000_n04 01.03.0196.0001 Yes
HGX_FW_ERoT_FPGA_0 01.04.0008.0000_n04 01.03.0196.0001 Yes
HGX_FW_ERoT_FPGA_1 01.04.0008.0000_n04 01.03.0196.0001 Yes
HGX_FW_FPGA_0 1.20 N/A No
HGX_FW_FPGA_1 1.20 N/A No
HGX_FW_GPU_0 97.00.82.00.13 1.0.61.0 No
HGX_FW_GPU_1 97.00.82.00.13 1.0.61.0 No
HGX_FW_GPU_2 97.00.82.00.13 1.0.61.0 No
HGX_FW_GPU_3 97.00.82.00.13 1.0.61.0 No
HGX_InfoROM_GPU_0 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_1 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_2 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_3 G548.0201.00.06 N/A No
HGX_PCIeSwitchConfig_0 01151024 N/A No
------------------------------------------------------------------------------------
Error Code: 0
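The show_version table can be long, so it may help to filter it down to the components that the supplied packages would actually change. The sketch below assumes the layout shown above, where the last two columns are Pkg Version and Up-To-Date. The function name needs_update is hypothetical.

```shell
# Read show_version output on stdin and print the AP Name of every
# component that has a package version (not N/A) but is not up to date.
# Relies on the last two columns being "Pkg Version" and "Up-To-Date".
needs_update() {
    awk 'NF >= 4 && $NF == "No" && $(NF-1) != "N/A" { print $1 }'
}
```

Components listed with N/A in the Pkg Version column are skipped because the supplied packages carry no image for them.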
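Create payload JSON files for the BMC and the compute tray. The update commands in the following steps pass these files to -s as BMC_Full.json and Compute_Full.json, so save them under the names you intend to use there.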
Reference: UpdateBMC.json
{
"Targets": []
}
Reference: UpdateCompute.json
{
"Targets": ["/redfish/v1/Chassis/HGX_Chassis_0"]
}
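The two payload files can be generated in one step instead of being typed by hand. This is a minimal sketch: write_payloads is a hypothetical helper, and the file names follow the Reference labels above, so adjust them to match the names passed to update_fw -s.

```shell
# Write the two payload files shown above into a directory
# (default: the current directory).
write_payloads() {
    dir=${1:-.}
    cat > "$dir/UpdateBMC.json" <<'EOF'
{
    "Targets": []
}
EOF
    cat > "$dir/UpdateCompute.json" <<'EOF'
{
    "Targets": ["/redfish/v1/Chassis/HGX_Chassis_0"]
}
EOF
}
```

Calling write_payloads with no argument creates both files next to the nvfwupd binary if that is the current directory.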
Run the BMC update first.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
update_fw -s BMC_Full.json -p \
./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg
Power off the system, then do an AC Cycle.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
activate_fw -c PWR_OFF
# wait 15 seconds
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
activate_fw -c RF_AUX_PWR_CYCLE
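The power-off, wait and AUX cycle steps above can be combined so that the pause is not forgotten. The sketch below reuses the activate_fw commands exactly as shown. NVFWUPD and OFF_WAIT_SECONDS are assumptions added here, and BMCIP stands in for the <rf0 ip> placeholder.

```shell
# Path to the standalone tool; can be overridden (for example with a stub).
NVFWUPD=${NVFWUPD:-./nvfwupd}

# Run one activate_fw command against the BMC.
# $1 = activation command, such as PWR_OFF or RF_AUX_PWR_CYCLE
nvfwupd_activate() {
    "$NVFWUPD" -t "ip=${BMCIP}" "user=${USER}" "password=${PASS}" \
        servertype=GB200 activate_fw -c "$1"
}

# Power off, pause (15 seconds by default, as in the step above),
# then AUX power cycle.
power_off_and_aux_cycle() {
    nvfwupd_activate PWR_OFF || return 1
    sleep "${OFF_WAIT_SECONDS:-15}"
    nvfwupd_activate RF_AUX_PWR_CYCLE
}
```

The function stops before the AUX cycle if the power-off command fails, which avoids cycling standby power while the host is still on.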
Check if the BMC update was successful.
Reference: Successful BMC update.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 update_fw -s BMC_Full.json -p ./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg
Updating ip address: ip=XXXX
FW package:
['./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg']
Ok to proceed with firmware update? <Y/N>
y
{"@odata.id": "/redfish/v1/TaskService/Tasks/3", "@odata.type":
"#Task.v1_4_3.Task", "Id": "3", "TaskState": "Running", "TaskStatus":
"OK"}
FW update started, Task Id: 3
Wait for Firmware Update to Start...
TaskState: Running
PercentComplete: 20
TaskStatus: OK
TaskState: Running
PercentComplete: 40
TaskStatus: OK
TaskState: Running
PercentComplete: 60
TaskStatus: OK
TaskState: Completed
PercentComplete: 100
TaskStatus: OK
Firmware update successful!
Overall Time Taken: 0:13:01
Refer to ‘NVIDIA Firmware Update Document’ on activation steps for new firmware to take effect.
Perform the full compute tray flash (HGX). Ensure that the system is fully booted into its OS so that the GPU VBIOS updates can be applied.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 update_fw -s Compute_Full.json -p ./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg
As with the BMC update, power down the system and then perform an AUX power cycle.
Power on the machine, let it provision and boot, then check the firmware levels again.
./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 show_version -p ./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg ./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg
System Model: GB200 NVL
Part number: 692-13809-2404-RC1
Serial number: 1330125050101
Packages: ['DGX-GBX00_0024_250215.1.0_custom',
'HGX-GBX00_0023_250223.1.1_custom']
Connection Status: Successful
Firmware Devices:
AP Name Sys Version Pkg Version Up-To-Date
------------------------ ---------------------- -------------------------- ----------
CX7_0 28.43.2108 N/A No
CX7_1 28.43.2108 N/A No
CX7_2 28.43.2108 N/A No
CX7_3 28.43.2108 N/A No
FW_BMC_0 GB200Nvl-25.01-E GB200Nvl-25.01-E Yes
FW_CPLD_0 0x00 0x0b 0x03 0x04 N/A No
FW_CPLD_1 0x00 0x0b 0x03 0x04 N/A No
FW_CPLD_2 0x00 0x10 0x01 0x0f N/A No
FW_CPLD_3 0x00 0x10 0x01 0x0f N/A No
FW_ERoT_BMC_0 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
Full_FW_Image_NIC_Slot_4 32.43.2408 N/A No
Full_FW_Image_NIC_Slot_7 32.43.2408 N/A No
UEFI buildbrain-gcid-39556194 N/A No
HGX_FW_BMC_0 GB200Nvl-25.01-E GB200Nvl-25.01-E Yes
HGX_FW_CPLD_0 0.1C 0.1C Yes
HGX_FW_CPU_0 02.03.20 02.03.20 Yes
HGX_FW_CPU_1 02.03.20 02.03.20 Yes
HGX_FW_ERoT_BMC_0 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
HGX_FW_ERoT_CPU_0 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
HGX_FW_ERoT_CPU_1 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
HGX_FW_ERoT_FPGA_0 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
HGX_FW_ERoT_FPGA_1 01.04.0008.0000_n04 01.04.0008.0000_n04 Yes
HGX_FW_FPGA_0 1.20 1.20 Yes
HGX_FW_FPGA_1 1.20 1.20 Yes
HGX_FW_GPU_0 97.00.82.00.19 97.00.82.00.19 Yes
HGX_FW_GPU_1 97.00.82.00.19 97.00.82.00.19 Yes
HGX_FW_GPU_2 97.00.82.00.19 97.00.82.00.19 Yes
HGX_FW_GPU_3 97.00.82.00.19 97.00.82.00.19 Yes
HGX_InfoROM_GPU_0 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_1 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_2 G548.0201.00.06 N/A No
HGX_InfoROM_GPU_3 G548.0201.00.06 N/A No
HGX_PCIeSwitchConfig_0 01151024 N/A No
Applying and Verifying Firmware Update Success
First connect to the GB200 tray BMC OS, then:
Power off the host.
# Checks that the current status is on
curl -k -u ${USER}:${PASS} https://${BMCIP}/redfish/v1/Systems/System_0 | jq '."PowerState"'
# Shuts down the OS
# Graceful shutdown:
curl -k -u ${USER}:${PASS} \
https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
-d '{"ResetType": "GracefulShutdown"}' -X POST
# Force power off:
curl -k -u ${USER}:${PASS} \
https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
-d '{"ResetType": "ForceOff"}' -X POST
AC (AUX) cycle the node.
curl -k -u ${USER}:${PASS} \
https://${BMCIP}/redfish/v1/Chassis/BMC_0/Actions/Oem/NvidiaChassis.AuxPowerReset \
-d '{"ResetType":"AuxPowerCycleForce"}' -X POST
Wait for the BMC to ping again (should take 2-3 min). Once the BMC pings, bring the host back up.
# Checks that the current status is off (if it is 'on' no further action required)
curl -k -u ${USER}:${PASS} https://${BMCIP}/redfish/v1/Systems/System_0 | jq '."PowerState"'
# Power On
curl -k -u ${USER}:${PASS} \
https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
-d '{"ResetType": "On"}' -X POST
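The wait for the BMC to respond to ping before the power-on call can be scripted with a bounded retry loop. The helper below is a generic sketch: retry_until_ok is a hypothetical name, and the attempt count and delay in the example are placeholders chosen to cover the 2-3 minutes mentioned above with some margin.

```shell
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, retry_until_ok 30 10 ping -c 1 "${BMCIP}" tries for roughly five minutes and returns non-zero if the BMC never answers, so the power-on call can be skipped in that case.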
When the BMC and host are back up, validate that the firmware install was successful.
cmsh -c 'device; firmware status -n <device name>'