DGX A100 System Firmware Update Container Version 20.05.12

The DGX Firmware Update container version 20.05.12 is available.

  • Package name: nvfw-dgxa100_20.05.12_200603.tar.gz

  • Run file name: nvfw-dgxa100_20.05.12_200603.run

  • Image name: nvfw-dgxa100:20.05.12

Highlights and Changes in this Release

  • This release is supported with the following DGX OS software:

    • DGX OS 4.99.8 or later

  • Enabled BMC Secure Flash

  • Enabled PCI-Compliant DPC and AER error propagation

  • Implemented critical VBIOS fixes

Contents of the DGX A100 System Firmware Container

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component                          Version          Key Changes          Update Time

BMC (via CEC)                      00.12.05         Added to container   31 minutes
  • BMC now recognizes the level of CEC installed, and enforces Secure Flash if the CEC supports it.
  • Removed the ability to update the BMC via the UI.
  • Added micro-controller assist (MCA) SEL, downloadable from the UI.
  • Added Logs & Reports > Debug Log > Download Debug log control to BMC UI.

SBIOS                              0.23             Added to container   7 minutes
  • Removed Hidden Options and made TPM Configuration options visible
  • Fixed NVSM Show Health Errors related to DIMMs and DIMM population
  • Fixed system getting stuck at POST after enabling and then disabling drive encryption

Broadcom 88096 PCIe switch board   1.3              Added to container   8 minutes
  • Disabled hot-plug and hot-plug surprise capability

BMC CEC SPI                        v3.05            Added to container   8 minutes

PEX88064 Retimer                   0.13.0           Updated              7 minutes

PEX88080 Retimer                   0.13.0           Updated              7 minutes

NvSwitch BIOS                      92.10.12.00.01   No change            8 minutes

VBIOS                              92.00.19.00.01   Updated              7 minutes
  • Fixed Xid 64 (Row Remapper Error)

Updating Components with Secondary Images

Some firmware components provide a secondary image as backup. The following is the policy when updating those components:

  • SBIOS: The two images are referred to as active and inactive, where the active is the currently running image and the inactive is the backup image. The update container can only update the inactive image. After reboot, the updated image becomes the active image. You can perform the update again to update the current inactive image so that both images are updated.

  • BMC: The two images are referred to as active and inactive, where the active is the currently running image and the inactive is the backup image. The update container can only update the inactive image. After the update is completed, the updated image becomes the active image. You can perform the update again to update the current inactive image so that both images are updated.
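Because the container only writes to the inactive image, it can help to confirm which image slot is currently inactive before and after an update. The following is a minimal sketch, assuming the version report lists image rows in the form `1:Inactive Updatable ...`; the helper name `inactive_image_ids` is hypothetical and is not part of the firmware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of the firmware container).
# Reads version-report text on stdin and prints the numeric ID of every
# image row marked "Inactive", for example "1" for "1:Inactive Updatable".
# Assumption: image rows start with "<id>:Active" or "<id>:Inactive".
inactive_image_ids() {
    sed -n 's/^ *\([0-9][0-9]*\):Inactive.*/\1/p'
}
```

Piping a saved version report through `inactive_image_ids` prints one ID per section that has an inactive image, so a BMC section and an SBIOS section yield two lines.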

Instructions for Updating Firmware

This section describes how to update the system firmware using the firmware update container, including the transitional update that some systems require. The commands use the .run file, but you can also use the container image directly.

  1. Perform a transitional update if needed.

    Depending on the BMC and MB_CEC versions on the system, you may need to perform a transitional update before updating the BMC and SBIOS to the latest versions.

    1. Check if the transitional update is needed.

      $ sudo nvfw-dgxa100_20.05.12_200603.run run_script --command "fw_transition.py show_version"
      

      If a transitional update is needed, the following message appears.

      BMC/MB_CEC firmware needs update to Active/Inactive, secure boot mode
      This is a one-time update required for DGXA100. All future updates require BMC in this mode
      
      • If the one-time update is required, continue with the next step to perform the transitional update.

      • If the one-time update is not required, then skip to step 2.

    2. Perform the transitional update.

      $ sudo nvfw-dgxa100_20.05.12_200603.run run_script --command "fw_transition.py update_fw"
      $ sudo reboot
      
    3. Verify that BMC (both images) and the MB_CEC are up to date.

      $ sudo nvfw-dgxa100_20.05.12_200603.run run_script --command "fw_transition.py show_version"
      
  2. Check if other updates are needed.

    $ sudo nvfw-dgxa100_20.05.12_200603.run show_version
    
    • If there is “no” in any up-to-date column for updatable firmware, then continue with the next step.

    • If all up-to-date column entries are “yes”, then no updates are needed and no further action is necessary.

  3. Perform the final update for all firmware supported by the container and reboot the system.

    $ sudo nvfw-dgxa100_20.05.12_200603.run update_fw all
    
    $ sudo reboot
    

    Note

    The update_fw all command updates only the inactive BMC and SBIOS images. After the system reboots, the updated images become active. To bring the remaining (now inactive) images up to date as well, run nvfw-dgxa100_20.05.12_200603.run update_fw [BMC] [SBIOS] --inactive as needed.
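The check in step 1 can be scripted. The following is a minimal sketch, assuming that the transitional-update message shown in step 1 is written to standard output with that wording; the helper name `needs_transition` and the `RUN_FILE` variable are hypothetical and are not part of the firmware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of the firmware container).
# Reads the output of "fw_transition.py show_version" on stdin and
# returns 0 if the one-time transitional update message is present,
# non-zero otherwise.
needs_transition() {
    grep -qF "firmware needs update to Active/Inactive"
}

# Example usage, left commented out so the sketch never touches firmware.
# RUN_FILE is an assumed variable holding the path to the .run file.
# if sudo "$RUN_FILE" run_script --command "fw_transition.py show_version" \
#         | needs_transition; then
#     echo "Transitional update required: continue with step 1.2"
# else
#     echo "No transitional update required: skip to step 2"
# fi
```

The helper only inspects text, so it can be tried against a saved copy of the command output before it is used on a live system.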

To verify the update, run the following command.

$ sudo nvfw-dgxa100_20.05.12_200603.run show_version

Expected output:

 BMC DGX
=========
Image Id             Status  Location  Onboard Version  Manifest   up_to_date
0:Active   Boot      Online   Local     00.12.05        00.12.05         yes
1:Inactive Updatable          Local     00.12.05        00.12.05         yes

  CEC
============
                                     Onboard Version   Manifest    up-to-date
MB_CEC(enabled)                       3.05              3.05             yes

 SBIOS
=======
Image Id                   Method    Onboard Version   Manifest    up_to_date
0:Inactive Updatable       afulnx     0.24              0.24             yes
1:Active   Boot                       0.24              0.24             yes

 Video BIOS
============
Bus            Model                 Onboard Version    Manifest    up-to-date
0000:07:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:0f:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:47:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:4e:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:87:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:90:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:b7:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes
0000:bd:00.0   A100-SXM4-40GB        92.00.19.00.01     92.00.19.00.01    yes

  Switches
============
PCI Bus#                  Model       Onboard Version   Manifest     up-to-date
DGX - 0000:91:00.0(U261)  88064_Retimer  0.13.0          0.13.0            yes
DGX - 0000:88:00.0(U260)  88064_Retimer  0.13.0          0.13.0            yes
DGX - 0000:4f:00.0(U262)  88064_Retimer  0.13.0          0.13.0            yes

DGX - 0000:48:00.0(U225)  88080_Retimer  0.13.0          0.13.0            yes

DGX - 0000:c4:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes
DGX - 0000:c5:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes
DGX - 0000:c2:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes
DGX - 0000:c6:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes
DGX - 0000:c3:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes
DGX - 0000:c7:00.0        LR10        92.10.12.00.01    92.10.12.00.01     yes

DGX - 0000:01:00.0(U1)    PEX88096        1.3               1.3            yes
DGX - 0000:81:00.0(U3)    PEX88096        1.3               1.3            yes
DGX - 0000:41:00.0(U2)    PEX88096        1.3               1.3            yes
DGX - 0000:b1:00.0(U4)    PEX88096        1.3               1.3            yes
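Scanning this report by eye for a "no" entry is error-prone on a system with many components. The following is a minimal sketch, assuming that the up-to-date value is always the last whitespace-separated field of a row, as in the output above; the helper name `all_up_to_date` is hypothetical and is not part of the firmware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of the firmware container).
# Reads show_version output on stdin, prints every row whose last field
# is "no", and returns 0 only if no such row was found.
# Assumption: the up-to-date column is the last field on each row.
all_up_to_date() {
    awk '$NF == "no" { print; stale = 1 } END { exit (stale ? 1 : 0) }'
}

# Example usage, left commented out. RUN_FILE is an assumed variable
# holding the path to the .run file.
# if sudo "$RUN_FILE" show_version | all_up_to_date; then
#     echo "All firmware is up to date"
# else
#     echo "The components listed above still need an update"
# fi
```

Header rows, separator lines, and blank lines never end in "no", so they pass through the filter without being reported.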