Overview

Firmware for a GB200 NVL72 rack can be updated with NVIDIA Base Command Manager (BCM) 11 once all GB200 compute trays, NVLink Switch trays, and power shelves are up in BCM. Follow the latest FW/SW recipe so that the updates succeed on both the compute trays and the NVLink Switch trays.

The processes described in this guide apply to DGX GB200 NVL72 software version 1.2.2 and later.

Note

FW packages for DGX SuperPOD GB200 are unique and differ from the reference GB200 architecture packages.

Reference: DGX GB200 Compute Tray Files Required for Update on DGX SuperPOD (As of BCM 11 1.2 GA)

DGX FW Recipe Version: 1.3 (DGX GB200 SW/FW Release Notes)

Component           Filename
------------------  ----------------------------------------------------
Compute BMC bundle  nvfw_DGX-GBX00_0023_<date>.*_custom_prod-signed.fwpkg
Compute HMC bundle  nvfw_HGX-GBX00_0023_<date>.*_custom_prod-signed.fwpkg
BF3                 fw-Bluefield-3-rel-*.bin
CX7                 fw-ConnectX7-rel-*.bin
Switch NVOS         nvos-amd64-*.bin
Switch BMC bundle   nvfw_GB200-P4978_0004.*.fwpkg
Switch BIOS bundle  nvfw_GB200-P4978_0006.*.fwpkg
Switch CPLD bundle  nvfw_GB200-P4978_0007.*.fwpkg
Powershelf PSU      NVIDIA_5500_APP_.*.tar
Powershelf PMC      common-pmc-3.*tar
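Before copying files to the head node, it can help to confirm that every file in the table has been downloaded. The following is a minimal bash sketch, assuming all files were staged in a single local directory. The glob patterns are an assumption, translated from the table by writing each .* as *, and check_recipe_files is a hypothetical helper name, not part of BCM or nvfwupd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which recipe files from the table are present
# in a local staging directory. The globs are assumptions derived from the
# table (".*" in the table is written as "*" here).

RECIPE_PATTERNS=(
  "Compute BMC bundle|nvfw_DGX-GBX00_0023_*_custom_prod-signed.fwpkg"
  "Compute HMC bundle|nvfw_HGX-GBX00_0023_*_custom_prod-signed.fwpkg"
  "BF3|fw-Bluefield-3-rel-*.bin"
  "CX7|fw-ConnectX7-rel-*.bin"
  "Switch NVOS|nvos-amd64-*.bin"
  "Switch BMC bundle|nvfw_GB200-P4978_0004*.fwpkg"
  "Switch BIOS bundle|nvfw_GB200-P4978_0006*.fwpkg"
  "Switch CPLD bundle|nvfw_GB200-P4978_0007*.fwpkg"
  "Powershelf PSU|NVIDIA_5500_APP_*.tar"
  "Powershelf PMC|common-pmc-3*tar"
)

# check_recipe_files DIR
# Prints "OK <component>: <file>" or "MISSING <component>" for each entry
# and returns 1 if at least one file is missing.
check_recipe_files() {
  local dir=$1 entry component pattern match missing=0
  for entry in "${RECIPE_PATTERNS[@]}"; do
    component=${entry%%|*}   # text before the first "|"
    pattern=${entry#*|}      # text after the first "|"
    # compgen -G lists the files matching the glob (nothing if none match)
    match=$(compgen -G "$dir/$pattern" | head -n 1)
    if [[ -n $match ]]; then
      echo "OK ${component}: ${match##*/}"
    else
      echo "MISSING ${component}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_recipe_files ~/gb200-fw (a hypothetical staging path) prints one line per component and returns non-zero while anything is still missing.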

Firmware updates for the GB200 compute trays can be done by:

  1. BCM 11 integrated firmware update tool

  2. Standalone nvfwupd tool

GB200 Compute Tray Firmware Update - General Steps

  1. Obtain the compute tray firmware packages.

  2. Ensure that the compute tray BMC has the “admin” user enabled and that its credentials are known. If the “admin” user does not exist or is disabled, create and enable it before updating the compute tray. BCM and any other rack management systems should migrate to “admin” as the default BMC account, because the previously used “root” account will be disabled. See Appendix A.1 before proceeding with the update.

  3. If using BCM for the firmware update:

    1. Place the files in /cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200.

    2. Confirm that in the compute tray category bmcsettings (for example, dgx-gb200), the firmware management mode is set to gb200.

    3. Check the nodes' current FW versions against the update packages.

    4. Do a dry run to confirm that the FW will update to the expected versions.

  4. Update the BMC package (Compute BMC bundle) first, then the compute tray package (Compute HMC bundle). AUX power cycle the trays after each component update is complete.
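The ordering in step 4 can be enforced with a small script. This bash sketch assumes, as in the table above, that Compute BMC bundles start with nvfw_DGX-GBX00_ and Compute HMC bundles with nvfw_HGX-GBX00_; order_compute_packages is a hypothetical helper name. It only sorts the file list; the AUX power cycle after each package remains a separate step.

```shell
#!/usr/bin/env bash
# order_compute_packages FILE...
# Hypothetical helper: prints compute tray packages in flash order, Compute
# BMC bundle (nvfw_DGX-GBX00_*) first, then Compute HMC bundle
# (nvfw_HGX-GBX00_*). Names matching neither prefix are reported on stderr.
order_compute_packages() {
  local f
  local -a bmc=() hmc=()
  for f in "$@"; do
    case ${f##*/} in              # match on the file name, not the path
      nvfw_DGX-GBX00_*) bmc+=("$f") ;;
      nvfw_HGX-GBX00_*) hmc+=("$f") ;;
      *) echo "skipping unrecognized package: $f" >&2 ;;
    esac
  done
  # Empty arrays produce no iterations, so nothing is printed for them
  for f in "${bmc[@]}" "${hmc[@]}"; do
    printf '%s\n' "$f"
  done
}
```

For example, order_compute_packages ./*.fwpkg prints the BMC bundle(s) first, so the packages can then be flashed one at a time in that order.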

NVLink Switch Tray Firmware Update - General Steps

  1. Obtain the NVLink Switch firmware.

  2. If using BCM for the firmware update:

    1. Place the files in /cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200sw.

    2. Confirm that in the NVLink Switch bmcsettings, the firmware management mode is set to gb200sw.

  3. Check the current NVLink Switch FW versions against the update packages.

  4. Do a dry run to confirm that the FW will update to the expected versions.

  5. Update the tray-level firmware first, in this order:

    1. BMC+FPGA+ERoT (Switch BMC bundle)

    2. CPLD1, CPLD2, CPLD3, CPLD4 (Switch CPLD bundle)

    3. SBIOS+ERoT (Switch BIOS bundle)

  6. Update NVOS (Switch NVOS) from within the OS, or use ZTP.

  7. AUX power cycle the trays after each component update is complete.
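The tray-level order in step 5 can be derived from the bundle numbers in the file table (0004 for BMC, 0007 for CPLD, 0006 for BIOS). This is a minimal bash sketch, assuming the filename patterns from that table. order_switch_packages is a hypothetical helper name, and it lists NVOS last because NVOS is updated from within the OS or through ZTP.

```shell
#!/usr/bin/env bash
# order_switch_packages FILE...
# Hypothetical helper: prints "<step> <component> <file>" for NVLink Switch
# files in the tray-level order above. Bundle numbers come from the file
# table; NVOS is listed last. Unrecognized names are reported on stderr.
order_switch_packages() {
  local f
  for f in "$@"; do
    case ${f##*/} in
      nvfw_GB200-P4978_0004*.fwpkg) echo "1 BMC+FPGA+ERoT $f" ;;
      nvfw_GB200-P4978_0007*.fwpkg) echo "2 CPLD1-4 $f" ;;
      nvfw_GB200-P4978_0006*.fwpkg) echo "3 SBIOS+ERoT $f" ;;
      nvos-amd64-*.bin)             echo "4 NVOS $f" ;;
      *) echo "skipping unrecognized file: $f" >&2 ;;
    esac
  done | sort -k1,1n   # numeric sort on the step number
}
```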

Compute Tray Firmware Update Process

Method 1 - BCM/NVIDIA Mission Control Integrated Firmware Update for Compute Tray

To use the firmware update tool in BCM 11, a license with NVIDIA Mission Control enabled must be registered.

  1. Place the firmware update packages in the correct BCM directory, /cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200. Copy the prod-signed.fwpkg images to the BCM head node; the files must be in this directory to be visible to the ‘firmware’ command:

    scp <binary files> user@<headnode>:/cm/local/apps/cmd/etc/htdocs/bios/firmware/gb200
    

Reference: BCM file directory structure for firmware updates.

    /cm/local/apps/cmd/etc/htdocs/bios/firmware/
    README.md  b200/  gb200/  gb200sw/  gh200/  h100/  ilo/
    
    # The gb200 folder is for compute tray firmware; the gb200sw folder is
    # for NVLink Switch firmware.
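Because compute tray and NVLink Switch packages belong in different folders, a small lookup helps avoid copying a package to the wrong one. This bash sketch assumes the filename prefixes from the file table: nvfw_DGX-GBX00_ and nvfw_HGX-GBX00_ for compute trays, and nvfw_GB200-P4978_ for NVLink Switch trays. bcm_fw_dir and BCM_FW_ROOT are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the BCM firmware folder for a package, based on
# the layout above (gb200 = compute tray, gb200sw = NVLink Switch).
BCM_FW_ROOT=/cm/local/apps/cmd/etc/htdocs/bios/firmware

# bcm_fw_dir FILE -> prints the target folder, or returns 1 if unknown
bcm_fw_dir() {
  case ${1##*/} in
    nvfw_DGX-GBX00_*|nvfw_HGX-GBX00_*) echo "$BCM_FW_ROOT/gb200" ;;
    nvfw_GB200-P4978_*)                echo "$BCM_FW_ROOT/gb200sw" ;;
    *) echo "no BCM firmware folder known for: $1" >&2; return 1 ;;
  esac
}

# Example (not run here): copy each package to the matching folder.
# for f in ./*.fwpkg; do
#   dest=$(bcm_fw_dir "$f") && scp "$f" "user@<headnode>:$dest/"
# done
```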

  2. Use the firmware info command in BCM to list the firmware packages available on the head node. The output shows each file, the component it targets, and its version.

    cmsh;device;firmware info
    
    [T06-HEAD-01->device]% firmware info
    
    Device        Filename                                             Component      Version                        State      Progress Result   Size     Date
    ------------- --------------------------------------------------- ------------- ------------------------------ ---------- -------- -------- -------- ---------------------
    T06-HEAD-01   nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg GB200-BMC   DGX-GBX00_0024_250215.1.0_custom available  N/A     64MiB    2025-02-15, 16:39:41
    T06-HEAD-01   nvfw_GB200-P4978_0004_250213.1.0_prod-signed.fwpkg       GB200-Switch GB200-P4978_0004_250213.1.0   available  N/A     75MiB    2025-02-13, 10:23:28
    T06-HEAD-01   nvfw_GB200-P4978_0006_250205.1.0_prod-signed.fwpkg       GB200-Switch GB200-P4978_0006_250205.1.0   available  N/A     16.2MiB  2025-02-05, 15:11:49
    T06-HEAD-01   nvfw_GB200-P4978_0007_250121.1.2_custom_prod-signed.fwpkg GB200-Switch GB200-P4978_0007_250121.1.2_custom available  N/A     1.64MiB  2025-01-21, 13:55:30
    T06-HEAD-01   nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg  GB200-Compute HGX-GBX00_0023_250223.1.1_custom available  N/A     114MiB   2025-02-23, 20:20:42
    

    Note: This displays the file names and targets (such as GB200-BMC, GB200-Compute, or GB200-Switch) of all available firmware binaries. If a file does not appear in this output, the update tool cannot flash it. The officially released packages share a common filename structure beginning with nvfw_, for example nvfw_DGX-GBX00_<identifier>_<date>.

  3. Confirm GB200 Tray BMC Access/Connectivity.

    1. The BMC of each node needs to be configured in BCM. This should be done at the category level. Ensure that no bmcsettings are added at the node level so that the compute trays inherit the settings from the category level.

    2. Enter cmsh and show the current BMC settings, either for a given node or at the category level. For DGX GB200 compute trays the category level is sufficient, because all trays share the same default password.

      #category level
      category; use <dgx-category>;bmcsettings; show
      
      #device level
      device; use <device name>; bmcsettings; show
      

      Use the device level only to confirm that nothing has been set there.

      If no device-level settings exist, cmsh marks the new, uncommitted bmcsettings object with an asterisk:

      [bcm11-headnode->device*[a08-p1-dgx-04-c18*]->bmcsettings*]%
      
      #use this command to clear the uncommitted changes
      refresh
      
    3. Populate the bmcsettings fields in the dgx-gb200 category if they are not already populated.

      cmsh;category use dgx-gb200;bmcsettings;
      set username admin
      set password <Password of choice>
      set userid 1
      set firmwaremanagemode gb200
      commit
      

      Note: It is critical that the firmware management mode here is set to gb200.

    4. Test that the BMC is configured by reading the current FW versions.

      #at the device level
      cmsh; device use <dgx-node-name>; firmware status
      
      [maple->device[dgx-gb200-m07-c1]]% firmware status
      
      Device Filename Component Version State Progress Result Size Date
      ----------------- --------------------------------
      dgx-gb200-m07-c1 CX7_0 28.42.1270 current N/A N/A
      dgx-gb200-m07-c1 CX7_1 28.42.1270 current N/A N/A
      dgx-gb200-m07-c1 CX7_2 28.42.1270 current N/A N/A
      dgx-gb200-m07-c1 CX7_3 28.42.1270 current N/A N/A
      dgx-gb200-m07-c1 FW_BMC_0 GB200Nvl-24.12-8 current N/A N/A
      dgx-gb200-m07-c1 FW_CPLD_0 0x00 0x0b 0x03 0x04 current N/A N/A
      dgx-gb200-m07-c1 FW_CPLD_1 0x00 0x0b 0x03 0x04 current N/A N/A
      dgx-gb200-m07-c1 FW_CPLD_2 0x00 0x10 0x01 0x0f current N/A N/A
      dgx-gb200-m07-c1 FW_CPLD_3 0x00 0x10 0x01 0x0f current N/A N/A
      dgx-gb200-m07-c1 FW_ERoT_BMC_0 01.03.0262.0000_n04 current N/A N/A
      dgx-gb200-m07-c1 Full_FW_Image_NIC_Slot_4 32.42.1000 current N/A N/A
      dgx-gb200-m07-c1 Full_FW_Image_NIC_Slot_7 32.42.1000 current N/A N/A
      dgx-gb200-m07-c1 UEFI buildbrain-gcid-38635631 current N/A N/A
      
      #At the category level to see all of the compute tray FW in one shot
      cmsh; device;firmware -c dgx-gb200 status
      
      #At the rack level
      cmsh; device;firmware -r <rack location> status
      
  4. As a validation step before flashing, use a dry run to show exactly what will change when the firmware is applied:

    1. Perform a dry run of the BMC FW

      cmsh;device; firmware flash nvfw_DGX-GBX00_0023_241223.1.0_custom_prod-signed.fwpkg --dry-run -n <device name>
      

      The <device name> can include a range expression to apply the change to multiple devices simultaneously:

      • dgx-gb200-r1-c[1-2] - This will run the command against both dgx-gb200-r1-c1 and dgx-gb200-r1-c2

      • Device names can also be comma separated to run against multiple individual devices: dgx-gb200-r1-c1,dgx-gb200-r1-c2

      Example: Dry run output

      s03-p1-dgx-01-c06 HGX_FW_BMC_0 HGX_FW_BMC_0 GB200Nvl-25.01-D GB200Nvl-25.01-E no install good
      s03-p1-dgx-01-c06 HGX_FW_CPLD_0 HGX_FW_CPLD_0 0.1C 0.1C yes skip good
      s03-p1-dgx-01-c06 HGX_FW_CPU_0 HGX_FW_CPU_0 02.03.19 02.03.20 no install good
      s03-p1-dgx-01-c06 HGX_FW_CPU_1 HGX_FW_CPU_1 02.03.19 02.03.20 no install good
      s03-p1-dgx-01-c06 HGX_FW_ERoT_BMC_0 HGX_FW_ERoT_BMC_0 01.04.0008.0000_n04 01.04.0008.0000_n04 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_ERoT_CPU_0 HGX_FW_ERoT_CPU_0 01.04.0008.0000_n04 01.04.0008.0000_n04 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_ERoT_CPU_1 HGX_FW_ERoT_CPU_1 01.04.0008.0000_n04 01.04.0008.0000_n04 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_ERoT_FPGA_0 HGX_FW_ERoT_FPGA_0 01.04.0008.0000_n04 01.04.0008.0000_n04 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_ERoT_FPGA_1 HGX_FW_ERoT_FPGA_1 01.04.0008.0000_n04 01.04.0008.0000_n04 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_FPGA_0 HGX_FW_FPGA_0 1.20 1.20 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_FPGA_1 HGX_FW_FPGA_1 1.20 1.20 yes skip good
      s03-p1-dgx-01-c06 HGX_FW_GPU_0 HGX_FW_GPU_0 97.00.82.00.13 97.00.82.00.19 no install good
      s03-p1-dgx-01-c06 HGX_FW_GPU_1 HGX_FW_GPU_1 97.00.82.00.13 97.00.82.00.19 no install good
      s03-p1-dgx-01-c06 HGX_FW_GPU_2 HGX_FW_GPU_2 97.00.82.00.13 97.00.82.00.19 no install good
      s03-p1-dgx-01-c06 HGX_FW_GPU_3 HGX_FW_GPU_3 97.00.82.00.13 97.00.82.00.19 no install good
      
    2. Ensure that the values that are going to be updated are the expected versions.

  5. Start the firmware update.

    cmsh -c 'device; firmware flash nvfw_DGX-GBX00_0023_250614.1.0_custom_prod-signed.fwpkg -n <device name>'
    
  6. Once the payload has been uploaded to the node, the Result column shows good.

    [T06-HEAD-01->device]% firmware flash nvfw_DGX-GBX00_0023_250614.1.0_custom_prod-signed.fwpkg -n s03-p1-dgx-01-c{04..06}
    
    Device              flashing file                                         Result
    ------------------  ---------------------------------------------------- --------
    s03-p1-dgx-01-c04   nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg good
    s03-p1-dgx-01-c05   nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg good
    s03-p1-dgx-01-c06   nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg good
    
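  7. When the command completes, check the status of the update until it has finished. While flashing is in progress, the status shows the percentage complete; when the flash has finished, each component shows its old and new versions in the pending state, with a success message requesting a reset or power cycle.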

      cmsh -c 'device; firmware status -n <device name>'
    
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_BMC_0 GB200Nvl-25.01-D flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_CPU_0 02.03.19 flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_CPU_1 02.03.19 flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_0 97.00.82.00.13 flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_1 97.00.82.00.13 flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_2 97.00.82.00.13 flashing 0.0% 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_3 97.00.82.00.13 flashing 0.0%
    
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_BMC_0 GB200Nvl-25.01-D -> GB200Nvl-25.01-E pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_CPU_0 02.03.19 -> 02.03.20 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_CPU_1 02.03.19 -> 02.03.20 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_0 97.00.82.00.13 -> 97.00.82.00.19 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_1 97.00.82.00.13 -> 97.00.82.00.19 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_2 97.00.82.00.13 -> 97.00.82.00.19 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
      s03-p1-dgx-01-c06 nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg HGX_FW_GPU_3 97.00.82.00.13 -> 97.00.82.00.19 pending N/A success: medium-specific reset or dc power cycle or ac power cy+ 114MiB
    
    At the end of the BMC update, AC cycle the GB200 node(s) to complete the BMC update, then proceed with updating the other components.

    Note: It is important to AC/AUX cycle the target host after the CPLD and BMC updates because the BMC has limited memory and cannot store another firmware package. AC cycling clears the memory and applies the changes, allowing the HMC update to proceed successfully.
    
  8. AC cycle the node after each .fwpkg has completed its firmware update.

    Power cycle method: AUX power cycle (AUX_PWR_CYCLE) through Redfish

    #use cmsh to power off the node
    cmsh;device;use <compute node under test>;power off
    
    #or to do multiples
    cmsh;device;foreach -c dgx-gb200 (power off)
    
    #Do this next to effectively AC Power cycle (removal of auxiliary power)
    curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
        -d '{"ResetType":"AuxPowerCycle"}' \
        https://<rf0 ip>/redfish/v1/Chassis/BMC_0/Actions/Oem/NvidiaChassis.AuxPowerReset
    
    #power on the node through Redfish, or use cmsh:
    cmsh;device;use <compute node under test>;power on
    
    #or to do multiples
    cmsh;device;foreach -c dgx-gb200 (power on)
    
    #or
    cmsh;device;power on -c dgx-gb200 #this does all nodes in the category
    cmsh;device;power on -n <specific nodes>
    
  9. If issues arise, debug output can help identify the root cause. Run the flash command with the -v and --debug options enabled:

    cmsh -c 'device; firmware flash nvfw_DGX-GBX00_0023_241223.1.0_custom_prod-signed.fwpkg -n <device name> -v --debug'
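The dry-run check in step 4 can be partly automated. The following bash sketch reads dry-run output on standard input and prints each component marked install with its current and new versions. It also counts the skipped components and flags any row whose result is not good. It assumes the column layout of the example dry-run output above: device, component, component, current version, new version, up-to-date flag, action, result. Because some version strings contain spaces, it reads the flag, action, and result from the end of the line. summarize_dry_run is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# summarize_dry_run < dry-run-output
# Hypothetical helper. Prints "INSTALL: <device> <component> <current> <new>"
# for rows that will be flashed and "NOT GOOD: ..." for rows whose result is
# not "good", then a summary line. Exits 1 if any row is not good.
summarize_dry_run() {
  awk '
    # Only consider data rows: the action column is "install" or "skip"
    NF >= 8 && ($(NF-1) == "install" || $(NF-1) == "skip") {
      if ($NF != "good") {
        bad++
        print "NOT GOOD: " $1 " " $2 " result=" $NF
        next
      }
      if ($(NF-1) == "install") {
        # Fields 4..NF-3 hold the current and new versions
        versions = ""
        for (i = 4; i <= NF - 3; i++)
          versions = versions (versions == "" ? "" : " ") $i
        print "INSTALL: " $1 " " $2 " " versions
      } else {
        skipped++
      }
    }
    END {
      printf "skipped: %d, not good: %d\n", skipped, bad
      exit (bad > 0)
    }
  '
}
```

For example, pipe the dry-run command from step 4 into summarize_dry_run and compare the INSTALL lines with the expected recipe versions.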
    

Method 2 - Stand Alone nvfwupd Tool for Compute Tray

If the license does not include NVIDIA Mission Control, the built-in cm-nvfwupd does not work. In that case, download the latest standalone nvfwupd tool (v2.0.7 or later) from the Enterprise Portal; see “Announcement: nvfwupd tool version 2.0.7”. This method is used independently of BCM.

NOTE: These instructions cover the update of a single compute tray only. The standalone tool supports simultaneous updates of multiple systems and of multiple components, such as compute trays and NVLink Switches together. Refer to Chapters 17, 18, and 19 of the NVIDIA Firmware Update Guide included with the nvfwupd tool.

Obtain the correct FW update packages. To see the full contents of a .fwpkg file, use the show_pkg_content command.

./nvfwupd show_pkg_content -p \
    ./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg

Get current state of the hardware with show_version.

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} \
    servertype=GB200 show_version -p ./nvfw_GB200-P4972_0012_250214.1.0_custom_prod-signed.fwpkg \
    ./nvfw_GB200-P4975_0011_250206.1.1_custom_recovery_prod-signed.fwpkg

System Model: GB200 NVL

Part number: 699-24764-0001-RC1

Serial number: 1334524170073

Packages: ['GB200-P4972_0012_250214.1.0_custom', 'GB200-P4975_0011_250206.1.1_custom_recovery']

Connection Status: Successful

Firmware Devices:

AP Name                Sys Version             Pkg Version                Up-To-Date
---------------------- ---------------------- -------------------------- ----------
CX7_0                  28.43.2108             N/A                        No
CX7_1                  28.43.2108             N/A                        No
CX7_2                  28.43.2108             N/A                        No
CX7_3                  28.43.2108             N/A                        No
FW_BMC_0               GB200Nvl-25.01-D       GB200Nvl-25.01-E           No
FW_CPLD_0              0x00 0x0b 0x03 0x04    N/A                        No
FW_CPLD_1              0x00 0x0b 0x03 0x04    N/A                        No
FW_CPLD_2              0x00 0x10 0x01 0x0f    N/A                        No
FW_CPLD_3              0x00 0x10 0x01 0x0f    N/A                        No
FW_ERoT_BMC_0          01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
Full_FW_Image_NIC_Slot_4 32.43.2408           N/A                        No
Full_FW_Image_NIC_Slot_7 32.43.2408           N/A                        No
UEFI                   buildbrain-gcid-39281046 N/A                      No
HGX_FW_BMC_0           GB200Nvl-25.01-D       N/A                        No
HGX_FW_CPLD_0          0.1C                   N/A                        No
HGX_FW_CPU_0           02.03.19               N/A                        No
HGX_FW_CPU_1           02.03.19               N/A                        No
HGX_FW_ERoT_BMC_0      01.04.0008.0000_n04    01.03.0196.0001            Yes
HGX_FW_ERoT_CPU_0      01.04.0008.0000_n04    01.03.0196.0001            Yes
HGX_FW_ERoT_CPU_1      01.04.0008.0000_n04    01.03.0196.0001            Yes
HGX_FW_ERoT_FPGA_0     01.04.0008.0000_n04    01.03.0196.0001            Yes
HGX_FW_ERoT_FPGA_1     01.04.0008.0000_n04    01.03.0196.0001            Yes
HGX_FW_FPGA_0          1.20                   N/A                        No
HGX_FW_FPGA_1          1.20                   N/A                        No
HGX_FW_GPU_0           97.00.82.00.13         1.0.61.0                   No
HGX_FW_GPU_1           97.00.82.00.13         1.0.61.0                   No
HGX_FW_GPU_2           97.00.82.00.13         1.0.61.0                   No
HGX_FW_GPU_3           97.00.82.00.13         1.0.61.0                   No
HGX_InfoROM_GPU_0      G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_1      G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_2      G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_3      G548.0201.00.06        N/A                        No
HGX_PCIeSwitchConfig_0 01151024               N/A                        No
------------------------------------------------------------------------------------
Error Code: 0
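When show_version lists many components, it helps to filter for the ones the supplied packages will change. This bash sketch reads show_version output on standard input. It prints each component whose Up-To-Date column is No and whose Pkg Version is not N/A. It assumes the column layout shown above and that Pkg Version is a single word. Because Sys Version may contain spaces, the last two columns are read from the end of the line. pending_updates is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# pending_updates < show_version-output
# Hypothetical helper. Prints "<AP name> -> <package version>" for each row
# whose Up-To-Date column is "No" and whose Pkg Version is not "N/A", i.e.
# components the given packages would change. Returns 1 if any are printed.
pending_updates() {
  awk '
    NF >= 4 && $NF == "No" && $(NF-1) != "N/A" {
      print $1 " -> " $(NF-1)
      n++
    }
    END { exit (n > 0) }
  '
}
```

For example, pipe the show_version command above into pending_updates. Running it again after the update and AUX power cycle should print nothing for the components covered by the packages.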

Create the JSON payload files for the BMC and compute tray updates.

Reference: BMC_Full.json

{
    "Targets": []
}

Reference: Compute_Full.json

{
    "Targets": ["/redfish/v1/Chassis/HGX_Chassis_0"]
}
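The two payload files can also be created from the shell. The update_fw commands in this section pass them as BMC_Full.json and Compute_Full.json, so this bash sketch writes the reference contents above under those names. write_payloads is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# write_payloads [DIR]
# Hypothetical helper: writes the two target payloads shown above into DIR
# (default: current directory) under the names used by update_fw below.
write_payloads() {
  local dir=${1:-.}
  # BMC bundle: empty target list, as in the reference payload
  cat > "$dir/BMC_Full.json" <<'EOF'
{
    "Targets": []
}
EOF
  # Compute tray (HMC) bundle: target the HGX chassis
  cat > "$dir/Compute_Full.json" <<'EOF'
{
    "Targets": ["/redfish/v1/Chassis/HGX_Chassis_0"]
}
EOF
}
```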

Run the BMC update first.

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
    update_fw -s BMC_Full.json -p ./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg

Power off the system, then do an AC Cycle.

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
    activate_fw -c PWR_OFF

# wait 15 seconds
sleep 15

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
    activate_fw -c RF_AUX_PWR_CYCLE
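The two activate_fw calls above can be wrapped so that the wait between them is never skipped. This bash sketch assumes, as in the commands above, that USER and PASS hold the BMC credentials and that nvfwupd is in the current directory. USER is also the login-name variable in most shells, so set it explicitly to the BMC user name. The NVFWUPD and AUX_WAIT variables and the function name aux_cycle are hypothetical. They let you change the tool path and the 15-second wait.

```shell
#!/usr/bin/env bash
# Hypothetical settings: path to the standalone tool and the wait (seconds)
# between powering off and the AUX power cycle.
NVFWUPD=${NVFWUPD:-./nvfwupd}
AUX_WAIT=${AUX_WAIT:-15}

# aux_cycle BMC_IP...
# For each tray: power off, wait, then AUX power cycle, using the same
# activate_fw commands as above. Stops at the first failing command.
aux_cycle() {
  local ip
  for ip in "$@"; do
    "$NVFWUPD" -t ip="$ip" user="${USER}" password="${PASS}" servertype=GB200 \
      activate_fw -c PWR_OFF || return 1
    sleep "$AUX_WAIT"
    "$NVFWUPD" -t ip="$ip" user="${USER}" password="${PASS}" servertype=GB200 \
      activate_fw -c RF_AUX_PWR_CYCLE || return 1
  done
}
```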

Check if the BMC update was successful.

Reference: Successful BMC update.

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 update_fw -s BMC_Full.json -p ./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg

Updating ip address: ip=XXXX

FW package:
['./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg']

Ok to proceed with firmware update? <Y/N>

y

{"@odata.id": "/redfish/v1/TaskService/Tasks/3", "@odata.type":
 "#Task.v1_4_3.Task", "Id": "3", "TaskState": "Running", "TaskStatus":
 "OK"}

FW update started, Task Id: 3

Wait for Firmware Update to Start...

TaskState: Running

PercentComplete: 20

TaskStatus: OK

TaskState: Running

PercentComplete: 40

TaskStatus: OK

TaskState: Running

PercentComplete: 60

TaskStatus: OK

TaskState: Completed

PercentComplete: 100

TaskStatus: OK

Firmware update successful!

Overall Time Taken: 0:13:01

Refer to ‘NVIDIA Firmware Update Document’ on activation steps for new firmware to take effect.

Next, flash the full compute tray package (HGX). Ensure that the system is fully booted into its OS so that the GPU VBIOS updates can be applied.

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 update_fw -s Compute_Full.json -p ./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg

As with the BMC update, power off the system and then perform an AUX power cycle.

Power on the machine, let it provision and boot, then check the firmware levels again:

./nvfwupd -t ip=<rf0 ip> user=${USER} password=${PASS} servertype=GB200 \
    show_version -p ./nvfw_DGX-GBX00_0024_250215.1.0_custom_prod-signed.fwpkg \
    ./nvfw_HGX-GBX00_0023_250223.1.1_custom_prod-signed.fwpkg

System Model: GB200 NVL

Part number: 692-13809-2404-RC1

Serial number: 1330125050101

Packages: ['DGX-GBX00_0024_250215.1.0_custom',
'HGX-GBX00_0023_250223.1.1_custom']

Connection Status: Successful

Firmware Devices:

AP Name                  Sys Version             Pkg Version                Up-To-Date
------------------------ ---------------------- -------------------------- ----------
CX7_0                    28.43.2108             N/A                        No
CX7_1                    28.43.2108             N/A                        No
CX7_2                    28.43.2108             N/A                        No
CX7_3                    28.43.2108             N/A                        No
FW_BMC_0                 GB200Nvl-25.01-E       GB200Nvl-25.01-E           Yes
FW_CPLD_0                0x00 0x0b 0x03 0x04    N/A                        No
FW_CPLD_1                0x00 0x0b 0x03 0x04    N/A                        No
FW_CPLD_2                0x00 0x10 0x01 0x0f    N/A                        No
FW_CPLD_3                0x00 0x10 0x01 0x0f    N/A                        No
FW_ERoT_BMC_0            01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
Full_FW_Image_NIC_Slot_4 32.43.2408             N/A                        No
Full_FW_Image_NIC_Slot_7 32.43.2408             N/A                        No
UEFI                     buildbrain-gcid-39556194 N/A                      No
HGX_FW_BMC_0             GB200Nvl-25.01-E       GB200Nvl-25.01-E           Yes
HGX_FW_CPLD_0            0.1C                   0.1C                       Yes
HGX_FW_CPU_0             02.03.20               02.03.20                   Yes
HGX_FW_CPU_1             02.03.20               02.03.20                   Yes
HGX_FW_ERoT_BMC_0        01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
HGX_FW_ERoT_CPU_0        01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
HGX_FW_ERoT_CPU_1        01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
HGX_FW_ERoT_FPGA_0       01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
HGX_FW_ERoT_FPGA_1       01.04.0008.0000_n04    01.04.0008.0000_n04        Yes
HGX_FW_FPGA_0            1.20                   1.20                       Yes
HGX_FW_FPGA_1            1.20                   1.20                       Yes
HGX_FW_GPU_0             97.00.82.00.19         97.00.82.00.19             Yes
HGX_FW_GPU_1             97.00.82.00.19         97.00.82.00.19             Yes
HGX_FW_GPU_2             97.00.82.00.19         97.00.82.00.19             Yes
HGX_FW_GPU_3             97.00.82.00.19         97.00.82.00.19             Yes
HGX_InfoROM_GPU_0        G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_1        G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_2        G548.0201.00.06        N/A                        No
HGX_InfoROM_GPU_3        G548.0201.00.06        N/A                        No
HGX_PCIeSwitchConfig_0   01151024               N/A                        No

Applying and Verifying Firmware Update Success

First connect to the GB200 tray BMC OS, then:

  1. Power off the host.

    # Check that the current power state is On
    curl -k -u "${USER}:${PASS}" https://${BMCIP}/redfish/v1/Systems/System_0 | jq '."PowerState"'
    
    # Shut down the OS, using either a graceful shutdown or a forced power off
    # Graceful shutdown:
    curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
        https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
        -d '{"ResetType": "GracefulShutdown"}'
    
    # Force power off:
    curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
        https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
        -d '{"ResetType": "ForceOff"}'
    
  2. AC (AUX) cycle the node.

    curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
        https://${BMCIP}/redfish/v1/Chassis/BMC_0/Actions/Oem/NvidiaChassis.AuxPowerReset \
        -d '{"ResetType":"AuxPowerCycleForce"}'
    
  3. Wait for the BMC to respond to ping again (this typically takes 2-3 minutes). Once it does, bring the host back up.

    # Check that the current power state is Off (if it is On, no further action is required)
    curl -k -u "${USER}:${PASS}" https://${BMCIP}/redfish/v1/Systems/System_0 | jq '."PowerState"'
    
    # Power on
    curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
        https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset \
        -d '{"ResetType": "On"}'
    
  4. When the BMC and host are back up, validate that the firmware install was successful.

    cmsh -c 'device; firmware status -n <device name>'
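Steps 3 and 4 involve waiting: first for the BMC to answer ping, then for the host to come up. A small retry loop avoids guessing the delay. This bash sketch is an assumption and is not part of BCM or nvfwupd. retry_until, bmc_pings, and power_state_is are hypothetical names. bmc_pings assumes Linux iputils ping, where -W is a timeout in seconds. power_state_is uses the same Redfish query and jq filter as step 1.

```shell
#!/usr/bin/env bash
# retry_until ATTEMPTS INTERVAL_S CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping INTERVAL_S seconds between tries.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
retry_until() {
  local attempts=$1 interval=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    if (( i < attempts )); then
      sleep "$interval"
    fi
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# True when the BMC answers a single ping (Linux iputils syntax).
bmc_pings() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# True when the Redfish PowerState of System_0 at BMC $1 equals $2 (e.g. On).
power_state_is() {
  local state
  state=$(curl -sk -u "${USER}:${PASS}" \
    "https://$1/redfish/v1/Systems/System_0" | jq -r '.PowerState') || return 1
  [[ $state == "$2" ]]
}

# Example (not run here): allow up to about 5 minutes for the BMC, power the
# host on, then wait for PowerState to report On.
# retry_until 60 5 bmc_pings "$BMCIP" &&
#   curl -k -u "${USER}:${PASS}" -H "Content-Type: application/json" -X POST \
#     -d '{"ResetType": "On"}' \
#     "https://${BMCIP}/redfish/v1/Systems/System_0/Actions/ComputerSystem.Reset" &&
#   retry_until 60 5 power_state_is "$BMCIP" On
```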