Version 18.10.2

The DGX Firmware Update container version 18.10.2 is available.

  • Package name: nvfw-dgx2_18.10.2.tar.gz

  • Image name: nvfw-dgx2_18.10.2
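
The package and image names above differ only by the .tar.gz suffix. The following is a minimal sketch of deriving one from the other and loading the image. It assumes the package is an archive that `docker load` accepts and that it sits in the current directory; the function names `image_from_package` and `load_firmware_image` are hypothetical helpers, not part of the container.

```shell
# Hypothetical helper: derive the image name from the package file name
# by stripping the ".tar.gz" suffix.
image_from_package() {
    basename "$1" .tar.gz
}

# Hypothetical helper: load the image from the downloaded package.
# Assumption: the package is a docker-loadable archive.
load_firmware_image() {
    sudo docker load -i "$1"
}
```

For example, `image_from_package nvfw-dgx2_18.10.2.tar.gz` prints `nvfw-dgx2_18.10.2`, the image name used in the commands in these notes.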

Contents of the DGX-2 System Firmware Container

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component    Version     Key Changes
BMC          01.00.01    See BMC Release Notes for the list of changes.

Changes in this Release

  • Added resiliency to the PSU firmware update.

  • Added the ability to update firmware for individual PSU or NVMe units.

Special Instructions for PSU and BMC Firmware Updates

To update the PSU firmware, you must first update the BMC firmware and then add a configuration file to the BMC. The configuration file is required to support PSU firmware updates; without it, the PSU update fails.

These instructions are not needed before updating other firmware, such as the SBIOS, SSDs, or VBIOS.

  1. In addition to downloading the nvfw-dgx2_18.10.2.tar.gz container, download the conf.bak file from the NVIDIA Enterprise Support portal.

  2. Refer to the DGX-2 User Guide “Updating Firmware” chapter for complete instructions on using the container.

    Perform the following steps before updating PSU firmware.

  3. Using the firmware update container, update the BMC only.

    sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw BMC
    
  4. As the administrator, log in to the BMC dashboard, then navigate to Maintenance->Restore Configuration.

  5. Locate and select the conf.bak file downloaded in step 1 and then click Save.

  6. You can now update the other firmware. For example, to update all downlevel firmware, run the following command.

    sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw all
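
The ordering in the steps above (BMC first, then the manual conf.bak restore, then everything else) can be sketched as a small wrapper script. This is an illustration under stated assumptions, not an NVIDIA-supplied tool: the function names `run_update` and `update_in_order` and the `DRY_RUN` variable are hypothetical, and the configuration restore remains a manual step in the BMC dashboard.

```shell
IMAGE="nvfw-dgx2_18.10.2"

# Hypothetical helper: run one update_fw target in the firmware container.
# With DRY_RUN=1 the command is printed instead of executed.
run_update() {
    target="$1"
    cmd="sudo docker run --rm --privileged -ti -v /:/hostfs $IMAGE update_fw $target"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Hypothetical helper: update the BMC only, wait for the operator to
# restore conf.bak through the BMC dashboard, then update all firmware.
update_in_order() {
    run_update BMC || return 1
    if [ "${DRY_RUN:-0}" != "1" ]; then
        printf 'Restore conf.bak in the BMC dashboard, then press Enter: '
        read -r reply
    fi
    run_update all
}
```

Setting `DRY_RUN=1` before calling `update_in_order` prints the two commands in order without touching the system, which is a convenient way to check the sequence first.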
    

Known Issues

PSU May Not Get Powered On

Issue

When AC input power is connected to an individual PSU, the PSU may not power on. In this case, the green LEDs on the PSU do not light.

Action to Take

Unplug the power supply, wait for more than 60 seconds, then reconnect AC power. If the PSU still does not power on, proceed with an RMA.

BMC Update Timeout

Issue

The container update may hang and report a BMC update timeout.

Workaround

If the container does not recover, stop the container as follows:

  1. From another terminal session, find the CONTAINER ID of the firmware container instance.

    sudo docker ps | grep nvfw-dgx2

    Example output:

    CONTAINER ID    IMAGE                 COMMAND                    CREATED           STATUS
    2e76a51fd85b    nvfw-dgx2_08.19.1     "/usr/bin/python /sr…"     5 seconds ago     Up 4 seconds

  2. Using the CONTAINER ID, terminate the instance.

    sudo docker kill <container-id>

    Example:

    sudo docker kill 2e76a51fd85b

  3. Determine whether the updates were performed by querying the currently installed firmware using the show_version option.

    sudo docker run --privileged -v /:/hostfs <image-name> show_version

  4. If the BMC is still downlevel, force the BMC update by using the -f option.

    sudo docker run --rm --privileged -ti -v /:/hostfs <image-name> update_fw -f BMC

  5. If the issue still occurs, reboot the system and try the update again.

  6. If the issue still occurs, run nvsm dump health and submit the log files to NVIDIA Enterprise Support.
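
The first two steps of the workaround (find the CONTAINER ID, then terminate the instance) can be combined in a short script. The following is a minimal sketch, assuming the default `docker ps` table layout in which CONTAINER ID is the first column and IMAGE is the second; the function names `firmware_container_ids` and `kill_firmware_containers` are hypothetical.

```shell
# Hypothetical helper: read `docker ps` table output on stdin and print the
# CONTAINER ID (column 1) of every row whose IMAGE (column 2) starts with
# the given prefix. The header row is skipped.
firmware_container_ids() {
    awk -v prefix="${1:-nvfw-dgx2}" 'NR > 1 && index($2, prefix) == 1 { print $1 }'
}

# Hypothetical helper: kill every running container started from a
# matching firmware image.
kill_firmware_containers() {
    sudo docker ps | firmware_container_ids "${1:-nvfw-dgx2}" | while read -r id; do
        sudo docker kill "$id" </dev/null
    done
}
```

Matching on the IMAGE column rather than grepping the whole line avoids killing an unrelated container whose command line happens to contain the string nvfw-dgx2.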

VBIOS Not Updated on DGX KVM Host


Issue

On a DGX-2 System that has been converted to a DGX KVM host, the VBIOS will not get updated if the GPU is being used by a guest GPU VM.

Explanation

All guest GPU VMs must be stopped before running the container to update the VBIOS. To stop the VMs, run the following from the KVM host for each guest GPU VM.

    virsh shutdown <vm-domain>
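
When several guest VMs are running, the shutdown can be looped. The following is a minimal sketch; the function name `shutdown_domains` and the `VIRSH` override variable are hypothetical. It assumes domain names are supplied one per line on stdin, for example from `virsh list --name --state-running`, which lists all running domains and not only guest GPU VMs, so filter the list first if other guests must stay up.

```shell
# Command used to talk to libvirt; overridable for testing (assumption).
VIRSH="${VIRSH:-virsh}"

# Hypothetical helper: read VM domain names from stdin, one per line,
# and request a shutdown of each. Blank lines are skipped.
shutdown_domains() {
    while read -r domain; do
        [ -n "$domain" ] || continue
        $VIRSH shutdown "$domain" </dev/null
    done
}
```

A typical invocation from the KVM host would be `virsh list --name --state-running | shutdown_domains`. Because `virsh shutdown` only requests a shutdown from the guest, confirm that the domains have actually stopped before running the container.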