DGX A100 System Firmware Changes

DGX A100 BMC Changes

Changes in 00.16.09

  • Fixed incorrect temperatures reported for sensors on the NVIDIA Networking ConnectX-6 single-port and dual-port VPI cards.
  • Fixed a bug to ensure that the BMC will boot to the latest version updated on the system.

  • Fixed SEL log not showing the correct BMC or SBIOS version after an update.

  • Added ability to set the BMC to local time instead of default UTC.

  • Added ability to sync local time to NTP servers (enable NTP time sync).

  • Removed unnecessary SEL log messages pointing to high CPU power consumption.

  • Fixed "/" character not allowed in BMC web UI LDAP Role Group settings.

  • Added authentication capabilities to the RESTful API.

  • Added new capabilities to identify firmware updates in the System Event Log (SEL) via the "NVIDIA-firmware" event.

    Adds SEL information for BMC (end), BIOS, CPLD, and PSU.

Changes in 00.14.17

  • Added support for second source SPI ROM.

Changes in 00.14.16

  • Fixed an issue where a cold boot might put the BMC in a non-bootable state.
  • Fixed BMC update failing with "Error flashing Inactive image 2: rc = 0x-9".
  • Fixed occasionally needing to log into the BMC WebUI twice.
  • Fixed the BMC dashboard system event filter not working.
  • Added ability to monitor Mellanox card transceiver temperatures and increase fan speeds.
  • Fixed inability to update the BMC after unexpected interruption.
  • Fixed missing memory, NIC and storage drive information.

Changes in 00.13.16

Changes in 00.13.04

  • Resolved increased fan speed that occurred when optional components were not installed, even when the system was idle.

DGX A100 SBIOS Changes

Changes in 1.09

  • Fixed an issue where changes in the boot order are not preserved after updating the SBIOS.
  • Fixed inability to enter the SBIOS Admin/User password from the Serial Over LAN (SOL) console.

  • Fixed PXE boot configuration not persisting; helpful for multiple DGX A100 nodes.

  • Added Memory correctable ECC Error leaky bucket; prevents unnecessary replacement of working system DIMMs.

  • Fixed SBIOS Setup > Main page showing incorrect Admin/User Access level.
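
The leaky bucket mentioned for correctable ECC errors can be illustrated with a minimal sketch. This is not the SBIOS implementation: the class name, the threshold, and the leak interval below are hypothetical values chosen only to show the general technique, in which each error raises a counter, the counter drains at a fixed rate, and a DIMM is flagged only when errors arrive faster than they drain.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Counts correctable errors and drains the count over time."""
    threshold: int = 10              # hypothetical: level at which a DIMM is flagged
    leak_interval_s: float = 3600.0  # hypothetical: one error drains per interval
    level: int = 0
    last_leak_s: float = 0.0

    def _leak(self, now_s: float) -> None:
        # Drain one error for each full interval elapsed since the last drain.
        drained = int((now_s - self.last_leak_s) // self.leak_interval_s)
        if drained > 0:
            self.level = max(0, self.level - drained)
            self.last_leak_s += drained * self.leak_interval_s

    def record_error(self, now_s: float) -> bool:
        """Record one correctable error; return True once the threshold is reached."""
        self._leak(now_s)
        self.level += 1
        return self.level >= self.threshold


# A burst of errors fills the bucket; widely spaced errors drain away.
burst = LeakyBucket(threshold=3, leak_interval_s=10.0)
burst_flags = [burst.record_error(t) for t in (0.0, 1.0, 2.0)]

sparse = LeakyBucket(threshold=3, leak_interval_s=10.0)
sparse_flags = [sparse.record_error(t) for t in (0.0, 15.0, 30.0, 45.0)]
```

In this sketch the burst reaches the threshold on its third error, while the sparse sequence never does. That is the behavior the release note describes: occasional correctable errors do not cause a working DIMM to be marked for replacement.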

Changes in 0.34

  • Removed warning message that occurred when the system contained DIMMs from different vendors.

Changes in 0.33

  • Fixed mishandling of correctable PCIe errors.

Changes in 0.30

  • Added support for HTTP boot.
  • Updated DSP/USP preset values to address PCIe advanced error reporting (AER) issues.
  • Changed the following default settings.
    • Determinism Control > [Manual]
    • Determinism Slider > [Power]
    • cTDP Control > [Manual]
    • cTDP > [240]
    • Package Power Limit Control > [Manual]
    • Package Power Limit > [240]
    • DF Cstates > [Disabled]

DGX A100 U.2 NVMe Changes

Changes in EPK9CB5Q

  • Fixed drive going into read-only mode if there is a sudden power cycle while performing a live firmware update.

  • Improved write performance while performing drive wear-leveling; shortens wear-leveling process time.

  • Fixed drive going into failed mode when a high number of uncorrectable ECC errors occurred.

DGX A100 Broadcom 88096 PCIe Switchboard Changes

Changes in 0.2.0

  • Fixed the incorrect setting of the switch's Upstream Port Number as Port 0.

Changes in 1.8

  • Implemented tuning to address PCIe advanced error reporting (AER) issues.

Changes in 1.3

  • Disabled hot-plug and hot-plug surprise capability.

DGX A100 Broadcom 880xx Retimer Changes

Release notes for the DGX A100 Broadcom 88080 and 88064 retimers.

Changes in 1.2f

  • Fixed an issue that caused NVQual to hang while loading the MODS driver.

Changes in 0.F.0

  • Improved error handling of downstream switches.

    This change modifies the PCIe topology and mapping. Refer to the DGX A100 User Guide for PCIe mapping details.

Changes in 0.13.0

  • Fixed DPC Notification behavior for Firmware First Platform.

A100 VBIOS Changes

Changes in

  • Added security protection to the I2C interface.

Changes in

  • Fixed an issue allocating the BAR1 size across resets.
  • Fixed MIG capability not being reported correctly if the driver is not loaded; for example, if accessed out-of-band.

Changes in

  • Expanded support for potential alternate HBM sources.

Changes in

  • Fixed Xid 64 (Row Remapper Error).

DGX A100 BMC CEC Changes

Changes in 3.28

  • Fixed the update progress output reporting "Update_timeout" for the motherboard CEC (MB_CEC) when using the .run file without Docker installed.
  • Fixed the user's configuration getting lost if the BMC update failed.

DGX A100 CEC1712 SPI Changes

Changes in 3.9

  • Fixed an issue that prevented a successful firmware update.

DGX A100 NVSwitch Firmware Changes

Changes in

  • Hardened the firmware for passthrough virtualization installations.

DGX A100 FPGA Release Notes

Changes in 2.A5

  • Fixed timing glitches that resulted in unexpected resets.
  • Improved timeout counter code.

DGX A100 Delta PSU Release Notes

Changes in 1.6/1.6/1.7

  • Fixed 0W reporting issue.