Version 21.06.8

The DGX-1 Firmware Update container version 21.06.8 is available.

  • Package name: nvfw-dgx1_21.06.8_210616.tar.gz
  • Image name: nvfw-dgx1:21.06.8
  • Run file name: nvfw-dgx1_21.06.8_210616.run
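
The package and image names above follow a common pattern. As a minimal sketch, the helper below derives the image name from the package file name. The function name image_from_package is hypothetical, and the pattern <name>_<version>_<build>.tar.gz is inferred only from the names listed here.

```shell
# Derive the image name (name:version) from a package file name of the
# assumed form <name>_<version>_<build>.tar.gz.
image_from_package() {
    base=${1##*/}            # strip any leading directory
    base=${base%.tar.gz}     # strip the archive suffix
    name=${base%%_*}         # text before the first underscore
    rest=${base#*_}          # text after the first underscore
    version=${rest%%_*}      # text before the next underscore
    printf '%s:%s\n' "$name" "$version"
}

# Example (assumes the package is a Docker image archive that
# "docker load" accepts; check the container user guide first):
#   sudo docker load -i nvfw-dgx1_21.06.8_210616.tar.gz
#   image_from_package nvfw-dgx1_21.06.8_210616.tar.gz
```

The derived name is the value to substitute wherever a command in these notes shows <container_name>.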

Highlights and Changes in this Release

  • This release is supported with the following DGX OS software.
    • DGX OS 4 (4.8 or later)
    • DGX OS 5 (May 6, 2021 Updated or later)
    • EL7-21.04 or later
    • EL8-20.11 or later
  • Before using the container to update firmware on systems installed with DGX OS release 5.0 or later, first stop certain NVIDIA services. See Special Instructions for all Updates.
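
The numeric minimums in the list above can be checked with a version-aware comparison. The following is a sketch only: the function names and the release labels dgxos4, el7, and el8 are hypothetical, and how you obtain the installed release version is left to you. The date-based DGX OS 5 requirement is not covered.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Check a release version against the minimums listed in these notes.
# Returns 0 if supported, 1 if too old, 2 for an unknown label.
release_supported() {
    case $1 in
        dgxos4) min=4.8 ;;
        el7)    min=21.04 ;;
        el8)    min=20.11 ;;
        *)      return 2 ;;
    esac
    version_ge "$2" "$min"
}

# Example:
#   release_supported dgxos4 4.10 && echo "supported"
```

Version sort is used instead of a plain string comparison so that 4.10 is treated as later than 4.8.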

Contents of the DGX-1 Firmware Update Container

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component                       Version          Key Changes
BMC                             3.38.30          No change from previous release.
SBIOS                           S2W_3A12         See DGX-1 SBIOS Changes.
SSD (Samsung SM863A), 1.92 TB   GXM1103Q         No change from previous release.
SSD (Samsung PM883), 1.92 TB    HXT7904Q         Added to the container.
SSD (Samsung SM883), 480 GB     HXM7904Q         Added to the container.
VBIOS (DGX-1 with V100, 16 GB)  88.00.18.00.01   No change from previous release.
VBIOS (DGX-1 with V100, 32 GB)  88.00.80.00.04   No change from previous release.
VBIOS (DGX-1 with P100)         86.00.41.00.05   No change from previous release.
PSU                             00.03.07         No change from previous release.
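
To compare a version reported by a system against the table above, the entries can be captured in a small lookup. This is an illustrative sketch: the component keys (bmc, ssd-pm883, and so on) and both function names are hypothetical labels, not identifiers used by the container.

```shell
# Print the firmware version this container carries for a component.
# The component keys are hypothetical labels chosen for this sketch.
expected_fw_version() {
    case $1 in
        bmc)            echo "3.38.30" ;;
        sbios)          echo "S2W_3A12" ;;
        ssd-sm863a)     echo "GXM1103Q" ;;
        ssd-pm883)      echo "HXT7904Q" ;;
        ssd-sm883)      echo "HXM7904Q" ;;
        vbios-v100-16g) echo "88.00.18.00.01" ;;
        vbios-v100-32g) echo "88.00.80.00.04" ;;
        vbios-p100)     echo "86.00.41.00.05" ;;
        psu)            echo "00.03.07" ;;
        *)              return 1 ;;
    esac
}

# Succeed when an observed version string matches the table entry.
# Returns 2 when the component key is unknown.
fw_is_current() {
    expected=$(expected_fw_version "$1") || return 2
    [ "$2" = "$expected" ]
}

# Example:
#   fw_is_current bmc 3.38.30 && echo "BMC is current"
```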

Special Notes

Note: If updating the BMC from any version earlier than 3.27.30, the update can take from 30 to 50 minutes to complete.
  • When an update to the BMC or PSU is initiated:
    • The BMC is cold reset to put it in a known good state before the update.
    • Additional logs are gathered for troubleshooting purposes and made available in /var/log/comp_fw_log.txt.

      The logs are gathered before the update and again when the update completes or fails.

  • (On DGX systems installed with DGX OS 4.99.x or earlier): To prevent NVSM services from interfering with BMC and PSU updates, the container stops the following services before applying the update:
    • nvsm-apis-gpumonitor
    • nvsm-apis-plugin-storage
    • nvsm-apis-selwatcher
    • nvsm-apis-plugin-memory
    • nvsm-apis-plugin-environment
    • nvsm-sys-dshm
    • nvsm-env-dshm
    • nvsm-storage-dshm
    The system health monitor is not available until the firmware update completes.
  • For the PSU update, the container implements a protective check which requires the system to be fully redundant (all four supplies are installed and in a healthy state) in order for the update to occur.

    If you are using only three of the four PSUs, you can override the full power redundancy requirement by setting DGX_MAX_PSU in the docker run command, as follows.

    sudo docker run --privileged -ti -v /:/hostfs <container_name> set_flags DGX_MAX_PSU=3 update_fw PSU
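
As a sketch of how the PSU override fits into the update command, the helper below prints (but does not run) the command for a given image and number of installed, healthy PSUs. The function name psu_update_cmd is hypothetical. The form without set_flags is an assumption, obtained by dropping the override from the command shown above.

```shell
# Print (do not run) the PSU update command for an image and PSU count.
# Only the 3-PSU form is taken from these notes; the 4-PSU form assumes
# the same command without the DGX_MAX_PSU override.
psu_update_cmd() {
    image=$1
    psu_count=$2
    base="sudo docker run --privileged -ti -v /:/hostfs $image"
    case $psu_count in
        4) printf '%s update_fw PSU\n' "$base" ;;
        3) printf '%s set_flags DGX_MAX_PSU=3 update_fw PSU\n' "$base" ;;
        *) echo "unsupported PSU count: $psu_count" >&2
           return 1 ;;
    esac
}

# Example:
#   psu_update_cmd nvfw-dgx1:21.06.8 3
```

Printing the command first gives you a chance to review it before running a privileged container.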

Special Instructions for Updating the BMC Using the Web UI

Before updating the BMC using the Web UI, refer to the following instructions to ensure the updates are successful.

BMC Updates via the Web UI

When Preserving Settings
  1. Navigate to Maintenance > Firmware Update, select IPMI, Network, and SEL, then proceed with updating the BMC.
  2. After updating the BMC, issue the following from the command line.
    $ sudo ipmitool raw 0x32 0x6 1
    $ sudo ipmitool mc reset cold
Note: You cannot preserve user settings when downgrading to a previous version of the BMC. Attempting to do so will result in a failure to log in to the BMC.

When Not Preserving Settings

Navigate to Maintenance > Firmware Update, clear all preservation items, then proceed with updating the BMC.

Known Issues

Unable to Log in to the BMC Web UI

Issue

You may not be able to log in to the BMC Web UI after updating to 3.38.30 (using the firmware update container or through the Web UI) or after a factory reset.

Explanation

If you encounter the issue, work around it by resetting the BMC from the command line with the following command.
$ sudo ipmitool mc reset cold
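
After a cold reset, the BMC does not respond for a while. The following sketch polls until a probe command succeeds or a timeout expires. The function name wait_for_bmc, the default probe (sudo ipmitool mc info), and the timing values in the example are assumptions, not values taken from these notes.

```shell
# Poll until a probe command succeeds or the timeout expires.
# Usage: wait_for_bmc TIMEOUT_SECONDS INTERVAL_SECONDS [PROBE_COMMAND...]
# The default probe, "sudo ipmitool mc info", is an assumption.
wait_for_bmc() {
    timeout_s=$1
    interval_s=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        set -- sudo ipmitool mc info
    fi
    elapsed=0
    # Run the probe; on failure, wait and retry until the timeout.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 0
}

# Example (timing values are illustrative):
#   sudo ipmitool mc reset cold
#   wait_for_bmc 300 10 && echo "BMC is responding"
```

Once the probe succeeds, try logging in to the BMC Web UI again.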

Unable to Update SBIOS via BMC Web UI with Preserve BIOS NVRAM Region

Issue

When attempting to update the SBIOS from version 3A10 to a later version using the BMC web UI and selecting Preserve BIOS NVRAM Region, the system will hang at a black screen and SBIOS settings will not be preserved.

Explanation

A security update involved removing the BIOS Shared SW Architecture (BSSA) option in the SBIOS. This changed the NVRAM mapping, so the NVRAM region cannot be preserved. To successfully update the SBIOS using the BMC web UI, do not select Preserve BIOS NVRAM Region.