DGX-2 System Firmware Update Container Version 18.10.2

The DGX Firmware Update container version 18.10.2 is available.

  • Package name: nvfw-dgx2_18.10.2.tar.gz
  • Image name: nvfw-dgx2_18.10.2

Contents of the DGX-2 System Firmware Container

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component Version Key Changes
BMC 01.00.01
Note: To complete the update, you must download and use the conf.bak file as explained in Special Instructions.

  • Fixed an issue where updating the BMC through the dashboard erroneously preserved the configuration.
  • Fixed the Network Link Configuration and Network IP Settings pages on the BMC dashboard so that they reflect changes only after the changes are saved.
  • Added dual FPGA image container update support.
  • Added PSU firmware container update support.
  • Enhanced SMBPBI support for GPU sensors, thermal polling, and fan control to avoid anomalous GPU sensor readings and the thermal actions they trigger.
  • Added support to the BMC dashboard for updating FPGA Image #1.
  • Added VLAN support to the BMC dashboard.

SBIOS 0.17

  • Added SBIOS support for recovering a degraded PCIe link during system boot.
  • Enhanced debug capability through fully decoded MCA, memory, POST, and PCIe SEL events, enabling faster resolution of customer cases.
  • Added an in-memory PCIe topology to the SBIOS so that a full PCIe scan is no longer needed, which eliminates unexpected Unsupported Request errors (PCIe correctable errors).
  • Added Error Logging options (enable or disable verbose logging) to the SBIOS setup menu.
  • Added support for changing the boot order through the standard IPMI interface.

M.2 SSD (Samsung) CXV8601Q No change
U.2 SSD (Micron) 101008R0 No change
VBIOS 88.00.6B.00.01 No change
PSU 2.5
Note: Before updating the PSU firmware, be sure to first follow the steps provided under Special Instructions.

  • Fixed a power load-balancing issue at light loads.
  • Fixed the PDU showing a low power factor value, which affects the outlet wattage.
  • Fixed an issue in the COM firmware that could cause a bootloader failure when updating from older PSU firmware.

Changes in this Release

  • Added resiliency to the PSU firmware update
  • Added the ability to update firmware for individual PSU or NVMe units.

Special Instructions for PSU and BMC Firmware Updates

To update the PSU firmware, you must first update the BMC firmware and then restore a configuration file to the BMC. The PSU firmware update requires this configuration file and fails without it.
These instructions are not needed before updating other firmware, such as the SBIOS, SSDs, or VBIOS.
  1. In addition to downloading the nvfw-dgx2_18.10.2.tar.gz container, download the conf.bak file from the NVIDIA Enterprise Support portal.
  2. Follow the instructions in the "Updating Firmware" chapter (Chapter 9) of the DGX-2 User Guide, up to section 9.5.

    Then perform the following steps before updating the PSU firmware.

  3. Using the firmware update container, update the BMC only.
    $ sudo docker run --rm [-e auto=1] --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw BMC
  4. As the administrator, log in to the BMC dashboard, then navigate to Maintenance->Restore Configuration.

  5. Locate and select the conf.bak file downloaded in step 1 and then click Save.
  6. You can now update the remaining firmware. For example, to update all downlevel firmware, run the following command.
    $ sudo docker run --rm [-e auto=1] --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw all
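The BMC-first ordering in steps 3 through 6 can be captured in a small helper script. The following is a minimal sketch, not an NVIDIA-provided tool: the function name update_bmc_then_all is hypothetical, it assumes the image is already loaded as nvfw-dgx2_18.10.2, and it omits the optional -e auto=1 setting. Because steps 4 and 5 happen in the BMC dashboard, the script only pauses until you press Enter.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): BMC-first firmware update sequence.
# Assumption: the image is already loaded as nvfw-dgx2_18.10.2 and the
# caller can run sudo. The optional -e auto=1 setting is not used here.

update_bmc_then_all() {
  local image="${1:-nvfw-dgx2_18.10.2}"
  local reply

  # Step 3: update only the BMC firmware; stop if it fails.
  sudo docker run --rm --privileged -ti -v /:/hostfs "$image" update_fw BMC || return 1

  # Steps 4-5 are manual: restore conf.bak from the BMC dashboard.
  read -r -p "Restore conf.bak in the BMC dashboard (Maintenance->Restore Configuration), then press Enter: " reply || return 1

  # Step 6: update all remaining downlevel firmware, including the PSUs.
  sudo docker run --rm --privileged -ti -v /:/hostfs "$image" update_fw all
}
```

To use it, source the file in an interactive terminal and run update_bmc_then_all nvfw-dgx2_18.10.2. If the BMC update fails, the function returns without attempting the PSU update.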

Known Issues

PSU May Not Power On

Issue

When AC input power is connected to an individual PSU, the PSU may not power on. In this case, the green LEDs on the PSU do not light.

Action to Take

Unplug the power supply, wait more than 60 seconds, and then reconnect AC power. If the PSU still does not power on, proceed with an RMA.

BMC Update Timeout

Issue

The container update may hang and report a BMC update timeout.

Workaround

If the container does not recover, stop the container as follows:
  1. From another terminal session, find the CONTAINER ID of the firmware container instance.
    $ sudo docker ps | grep nvfw-dgx2

    Example output:

    CONTAINER ID    IMAGE                COMMAND                   CREATED          STATUS
    2e76a51fd85b    nvfw-dgx2_08.19.1    "/usr/bin/python /sr…"    5 seconds ago    Up 4 seconds
  2. Using the CONTAINER ID, terminate the instance.
    $ sudo docker kill <container-id>

    Example:

    $ sudo docker kill 2e76a51fd85b
  3. Determine whether the updates were performed by querying the currently installed firmware with the show_version option.
    $ sudo docker run --rm --privileged -v /:/hostfs <image-name> show_version
  4. If the BMC is still downlevel, force the BMC update by using the -f option.
    $ sudo docker run --rm --privileged -ti -v /:/hostfs <image-name> update_fw -f bmc
  5. If the issue still occurs, reboot the system and retry the update.
  6. If the issue still occurs, run nvsm dump health and submit the log files to NVIDIA Enterprise Support.
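Steps 1 and 2 can be combined into a short helper. This is a sketch, not part of the container: the function name kill_fw_containers is hypothetical, and instead of piping through grep it relies on the standard docker ps -q --filter ancestor=<image> option, which lists only the IDs of containers started from the given image.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): kill any running container started from
# the firmware update image. Pass the same image name given to `docker run`.

kill_fw_containers() {
  local image="${1:-nvfw-dgx2_18.10.2}"
  local ids id

  # -q prints only container IDs; the ancestor filter keeps containers
  # created from the given image.
  ids=$(sudo docker ps -q --filter "ancestor=${image}") || return 1
  if [ -z "$ids" ]; then
    echo "No running container found for ${image}"
    return 0
  fi

  # Container IDs contain no spaces, so word splitting is safe here.
  for id in $ids; do
    echo "Killing container ${id}"
    sudo docker kill "$id" > /dev/null || return 1
  done
}
```

After the function returns, continue with step 3 to check which firmware versions are installed.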

VBIOS Not Updated on DGX KVM Host

Issue

On a DGX-2 System that has been converted to a DGX KVM host, the VBIOS is not updated if the GPU is in use by a guest GPU VM.

Workaround

All guest GPU VMs must be stopped before running the container to update the VBIOS. To stop the VMs, run the following from the KVM host for each guest GPU VM.

    $ virsh shutdown <vm-domain>
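When several guest GPU VMs are running, the shutdown can be scripted. The following minimal sketch uses the standard virsh list --name and virsh shutdown commands. The function name stop_all_guests and the 300-second default timeout are assumptions, and the sketch assumes that every running domain on the host is a guest GPU VM that may be stopped.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): shut down every running libvirt guest on
# the KVM host and wait until none remain.
# Assumption: all running domains are guest GPU VMs that may be stopped;
# filter the list if other VMs must keep running.

stop_all_guests() {
  local timeout="${1:-300}"   # seconds to wait for the guests to power off
  local waited=0 vm

  # `virsh list --name` prints the names of running domains, one per line.
  while read -r vm; do
    if [ -n "$vm" ]; then
      virsh shutdown "$vm" < /dev/null || return 1
    fi
  done < <(virsh list --name)

  # `virsh shutdown` only requests a graceful shutdown and returns at once,
  # so poll until no running domains are listed.
  while [ -n "$(virsh list --name | tr -d '[:space:]')" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for guest VMs to stop" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "All guest VMs are stopped"
}
```

Run stop_all_guests as a user allowed to manage libvirt domains, and start the firmware update container only after it reports that all guest VMs are stopped. Polling matters because a guest can take some time to act on the shutdown request, or may ignore it.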