DGX-2 System Firmware Update Container Version 19.03.1

The DGX Firmware Update container version 19.03.1 is available.

  • Package name: nvfw-dgx2_19.03.1.tar.gz
  • Image name: nvfw-dgx2:19.03.1
  • Run file name: nvfw-dgx2_19.03.1.run

Obtain the files from the NVIDIA Enterprise Support announcement DGX-2 System Firmware Update Container Version 19.03.1 (requires login).
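
These notes do not show how to load the downloaded package. The following is a minimal sketch, assuming the .tar.gz package is a Docker image archive that `docker load` accepts (see the DGX-2 User Guide for the authoritative steps). The helper names `image_from_package` and `load_image` are hypothetical; only the package and image names come from this document.

```shell
# Package name as listed in these release notes.
PKG="nvfw-dgx2_19.03.1.tar.gz"

# Derive the expected image tag from a package file name
# (nvfw-dgx2_19.03.1.tar.gz -> nvfw-dgx2:19.03.1).
image_from_package() {
    pkg_base="${1##*/}"            # strip any directory prefix
    pkg_base="${pkg_base%.tar.gz}" # strip the archive suffix
    printf '%s:%s\n' "${pkg_base%%_*}" "${pkg_base#*_}"
}

# Load the image, refusing to continue if the package file is missing.
# Assumption: the package is loadable with "docker load".
load_image() {
    if [ ! -f "$1" ]; then
        echo "missing package: $1" >&2
        return 1
    fi
    sudo docker load -i "$1"
}

# Usage on the DGX-2, from the directory holding the download:
#   load_image "$PKG" && sudo docker images "$(image_from_package "$PKG")"
```

Listing the image afterwards is a quick way to confirm that the tag matches the image name given above before running any update.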

Contents of the DGX-2 System Firmware Container

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component | Version | Key Changes
BMC | 01.04.03 | Added support for DGX-2H. Added support for MaxQ/MaxP power settings. Added option to not preserve the sensor data record when updating to a later version; this fixes an erroneous battery sensor error after previous updates. Note: If updating from versions earlier than 1.00.01, you must download and use the conf.bak file as explained in Special Instructions to complete the update.
SBIOS | 0.22 | See the section Changes in this Release.
M.2 NVMe (Samsung PM963) | CXV8601Q | No change
M.2 NVMe (Samsung PM983) | EDA7202Q | New (supports second-source component)
U.2 SSD (Micron) | 101008R0 | No change
U.2 SSD (Samsung) | EDA5202Q | New (supports second-source component)
VBIOS (DGX-2) | 88.00.6B.00.01 | No change
VBIOS (DGX-2H) | 88.00.6B.00.08 | New (supports DGX-2H VBIOS)
PSU | 2.7 | Fixed power-factor and load-balancing issues. Note: If also updating the BMC from a version earlier than 1.00.01, follow the steps under Special Instructions before updating the PSU firmware.
FPGA | 3.1 | New (added FPGA to container). Note: There are two FPGA images, Image-1:Rescue and Image-2:Primary. The Firmware Update Container updates the Primary FPGA image only.

Changes in this Release

  • See the contents table for the list of changes in individual components.
  • Added integration with NVSM (requires DGX OS Server 4.0.5 or later).

    This allows firmware to be updated using a .run file that simplifies the steps needed. See the DGX-2 User Guide for instructions on obtaining and using the .run file.

  • BMC Fixes
    • Fixed BMC Update Timeout issue.
    • Fixed BMC configuration backup/restore function not working properly.
    • Fixed system not shutting down when all fans in Fan Zone 2 or 3 are not detected.
    • Fixed system fans all running at 80% after hot-unplugging/hot-plugging a PSU.
    • Fixed system fans running at 80% after hot-plugging an NVMe drive.
    • Fixed system shutting down after hot-unplugging one of the fans.
    • Fixed system unable to boot after updating BMC image while one BMC module is removed.
    • Fixed incorrect SEL timestamp after executing ipmitool mc reset cold.
    • Fixed missing firmware information in the BMC dashboard. Information is available on the Maintenance->Firmware Information page.
    • Fixed missing DIMM information in the BMC dashboard.
    • Fixed blinking amber-colored power LED.
    • Fixed BMC update freeze while updating using Yafuflash.
    • Fixed issues responding to 3.3V/5V/12V sensors.
    • Fixed incorrect responses to GPU temperature assertion - Fan Zone 1 goes to 80% and DIMM temperature reports 'device disabled'.
    • The BMC now saves CPU MCA registers when it detects a fatal MCA error.
  • SBIOS Fixes
    • Fixed system failing to switch to backup SBIOS when initial boot fails.
    • Fixed enp6s0 network disappearing after enabling M.2 module hot plug in the SBIOS settings.
    • Fixed system unable to boot after replacing a DIMM.
    • Updated the boot recovery process for when the BMC remains unresponsive during boot. If the BMC reset fails, the system boots to the SBIOS setup menu.
    • Fixed the default PCIe Corrected Error Threshold Counter setting to be enabled.

Updating Components with Secondary Images

Some firmware components provide a secondary image as backup. The following is the policy when updating those components:
  • SBIOS: Only the primary image is updated.
  • BMC: Both primary and secondary (backup) images are updated.
  • FPGA: Only the primary image is updated.

Special Instructions for PSU and BMC Firmware Updates

To update the PSU firmware, you must first update the BMC firmware and then add a configuration file to the BMC. The configuration file is needed to support PSU firmware updates; without it, the PSU update fails.
These instructions are not needed before updating other firmware, such as the SBIOS, SSDs, or VBIOS.
  1. In addition to downloading the nvfw-dgx2_19.03.1.tar.gz container, download the conf.bak file from the NVIDIA Enterprise Support announcement DGX-2 System Firmware Update Container Version 19.03.1 (requires login).
  2. Refer to the DGX-2 User Guide "Updating Firmware" chapter for complete instructions on using the container.

    Perform the following steps before updating PSU firmware.

  3. Using the firmware update container, update the BMC only.
    $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2:19.03.1 update_fw BMC
  4. As the administrator, log in to the BMC dashboard, then navigate to Maintenance->Restore Configuration.

  5. Locate and select the conf.bak file downloaded in step 1 and then click Save.
  6. Now you can update other firmware. For example, to update all the downlevel firmware, issue the following.
    $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2:19.03.1 update_fw all
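
The ordering in the steps above can be sketched as a small wrapper. This is an illustration only: the function names `fw_update` and `update_in_order` and the `DRY_RUN` switch are hypothetical, and the conf.bak restore remains a manual step in the BMC dashboard. The docker command lines are the ones shown in steps 3 and 6.

```shell
IMAGE="nvfw-dgx2:19.03.1"

# Run update_fw for the given targets, or only print the command when
# DRY_RUN=1 (the default in this sketch, so nothing is flashed by accident).
fw_update() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sudo docker run --rm --privileged -ti -v /:/hostfs $IMAGE update_fw $*"
    else
        sudo docker run --rm --privileged -ti -v /:/hostfs "$IMAGE" update_fw "$@"
    fi
}

# BMC first, then pause for the manual conf.bak restore, then everything else.
update_in_order() {
    fw_update BMC || return 1
    if [ "${DRY_RUN:-1}" != "1" ]; then
        printf 'Restore conf.bak in the BMC dashboard, then press Enter: '
        read -r _reply
    fi
    fw_update all
}

# Usage: review the plan first, then run it for real.
#   update_in_order
#   DRY_RUN=0 update_in_order
```

Stopping on a failed BMC update keeps the PSU update from being attempted without the configuration file in place.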

Known Issues

VBIOS Not Updated During Combination Update

Issue

The VBIOS does not get updated when updating the VBIOS in conjunction with another component, for example by using the following options:

update_fw -f all

or

update_fw VBIOS [other]

Workaround

Update the VBIOS by itself.

$ sudo nvidia-docker run --privileged -ti -v /:/hostfs <container-name> update_fw VBIOS
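
One way to apply this workaround consistently is to split a requested target list so that the VBIOS is always updated in its own run. The sketch below only plans the runs and prints them; `plan_updates` is a hypothetical helper, and treating `all` as needing a separate VBIOS run afterwards is an assumption based on the issue description.

```shell
# Print one update_fw target list per line: the non-VBIOS targets first
# (if any), then VBIOS on a line of its own when it is needed.
plan_updates() {
    others=""
    want_vbios=0
    for target in "$@"; do
        case "$target" in
            VBIOS) want_vbios=1 ;;
            all)   want_vbios=1; others="${others:+$others }$target" ;;
            *)     others="${others:+$others }$target" ;;
        esac
    done
    if [ -n "$others" ]; then echo "$others"; fi
    if [ "$want_vbios" = "1" ]; then echo "VBIOS"; fi
}

# Usage: pass each printed line to a separate update_fw run, for example
#   plan_updates BMC VBIOS
# prints "BMC" and then "VBIOS".
```

Keeping the planning separate from the docker invocation makes it easy to review the runs before any firmware is touched.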

PSU May Not Get Powered On

Issue

When connecting AC input power to an individual PSU, the PSU may not get powered on. In this case, the green LEDs on the PSU do not light.

Action to Take

Unplug the power supply, wait for more than 60 seconds, then reconnect AC power. If there is still a failure, proceed with RMA.

VBIOS Not Updated on DGX KVM Host

Issue

On a DGX-2 System that has been converted to a DGX KVM host, the VBIOS will not get updated if the GPU is being used by a guest GPU VM.

Explanation

All guest GPU VMs must be stopped before running the container to update the VBIOS. To stop the VMs, run the following from the KVM host for each guest GPU VM.

virsh shutdown <vm-domain>
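
Because `virsh shutdown` only requests a graceful shutdown and returns immediately, it can help to wait until each guest is actually off before running the container. The following is a minimal sketch, assuming the guest GPU VM domain names are passed as arguments; `stop_gpu_vms` and the `TIMEOUT` setting (in seconds) are hypothetical and not part of these release notes.

```shell
# Request a graceful shutdown of each named guest, then poll its state
# with "virsh domstate" until it reports "shut off" or TIMEOUT expires.
stop_gpu_vms() {
    for vm in "$@"; do
        virsh shutdown "$vm" || return 1
        waited=0
        while [ "$(virsh domstate "$vm")" != "shut off" ]; do
            if [ "$waited" -ge "${TIMEOUT:-120}" ]; then
                echo "timeout waiting for $vm" >&2
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
        echo "$vm is shut off"
    done
}

# Usage on the KVM host, with your own domain names in place of the placeholder:
#   stop_gpu_vms <vm-domain>
```

If a guest does not shut down within the timeout, the function stops so that the VBIOS update is not started while a GPU is still in use.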

Backup SBIOS Version at 0.0

Issue

The BMC dashboard incorrectly reports the backup SBIOS version to be 0.0.

Explanation

Due to a limitation in the BMC software, the BMC cannot determine the version of the backup SBIOS because the backup SBIOS has not been run.

Note: Updating the SBIOS using the firmware update container does not resolve the issue as the container updates only the primary SBIOS and not the backup.