Version 18.10.2
The DGX Firmware Update container version 18.10.2 is available.
Package name:
nvfw-dgx2_18.10.2.tar.gz
Image name:
nvfw-dgx2_18.10.2
Contents of the DGX-2 System Firmware Container
This container includes the firmware binaries and update utilities for the firmware listed in the following table.
Component | Version | Key Changes
BMC | 01.00.01 | See BMC Release Notes for the list of changes.
Changes in this Release
Added resiliency to the PSU firmware update.
Added the ability to update firmware for individual PSU or NVMe units.
Special Instructions for PSU and BMC Firmware Updates
To update the PSU firmware, first update the BMC firmware and then add a configuration file to the BMC. The configuration file is required to support PSU firmware updates; without it, the PSU update will fail.
These instructions are not needed before updating other firmware, such as the SBIOS, SSDs, or VBIOS.
In addition to downloading the nvfw-dgx2_18.10.2.tar.gz container, download the conf.bak file from the NVIDIA Enterprise Support portal.
Refer to the DGX-2 User Guide “Updating Firmware” chapter for complete instructions on using the container.
Perform the following steps before updating PSU firmware.
Using the firmware update container, update the BMC only.
sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw BMC
As the administrator, log in to the BMC dashboard, then navigate to Maintenance->Restore Configuration.
Locate and select the conf.bak file downloaded in step 1 and then click Save.
Now you can update other firmware. For example, to update all the downlevel firmware, issue the following.
sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgx2_18.10.2 update_fw all
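The update order above can be sketched as a small shell script. This is a minimal sketch and not part of the firmware container: the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here for illustration, and with the default `DRY_RUN=1` the commands are only printed. Restoring conf.bak remains a manual step in the BMC dashboard.

```shell
# Sketch of the update order described above.
# DRY_RUN is a hypothetical switch for illustration; it is not a container option.
IMAGE="nvfw-dgx2_18.10.2"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# First: update the BMC only.
update_bmc() {
    run sudo docker run --rm --privileged -ti -v /:/hostfs "$IMAGE" update_fw BMC
}

# Then (manual): restore conf.bak through the BMC dashboard.

# Last: update all remaining downlevel firmware, including the PSUs.
update_all() {
    run sudo docker run --rm --privileged -ti -v /:/hostfs "$IMAGE" update_fw all
}
```

Set `DRY_RUN=0` before calling `update_bmc` and `update_all` to run the commands for real, and call `update_all` only after the configuration file has been restored.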
Known Issues
PSU May Not Get Powered On
Issue
When connecting AC input power to an individual PSU, the PSU may not get powered on. This is indicated by the green LEDs on the PSU not lighting.
Action to Take
Unplug the power supply, wait for more than 60 seconds, then reconnect AC power. If there is still a failure, proceed with RMA.
BMC Update Timeout
Issue
The container update may hang and report a BMC update timeout.
Workaround
If the container does not recover, stop the container as follows:
From another terminal session, find the CONTAINER ID of the firmware container instance.
# sudo docker ps | grep nvfw-dgx2
**Example output:**
CONTAINER ID IMAGE COMMAND CREATED STATUS
2e76a51fd85b nvfw-dgx2_08.19.1 "/usr/bin/python /sr…" 5 seconds ago Up 4 seconds
Using the CONTAINER ID, terminate the instance.
# sudo docker kill <container-id>
**Example:**
# sudo docker kill 2e76a51fd85b
Determine whether the updates were performed by querying the currently installed firmware using the show_version option.
# sudo docker run --privileged -v /:/hostfs <image-name> show_version
If the BMC is still downlevel, then force the BMC update by using the -f option.
# sudo docker run --rm --privileged -ti -v /:/hostfs <image-name> update_fw -f BMC
If the issue still occurs, then reboot the system and try the update again.
If the issue still occurs, then run nvsm dump health and submit the log files to NVIDIA Enterprise Support.
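The first two workaround steps (finding the CONTAINER ID and terminating the instance) can be combined. The following is a minimal sketch: `firmware_container_ids` is a hypothetical helper name, and it assumes the `docker ps` column layout shown in the example output, with the CONTAINER ID in the first column and the IMAGE in the second.

```shell
# Read `docker ps` output on stdin and print the CONTAINER ID (first column)
# of every line whose IMAGE (second column) starts with nvfw-dgx2.
# The header line is skipped because its second field is "ID".
firmware_container_ids() {
    awk '$2 ~ /^nvfw-dgx2/ { print $1 }'
}

# Usage on a live system (not executed here):
#   for id in $(sudo docker ps | firmware_container_ids); do
#       sudo docker kill "$id"
#   done
```

Matching on the IMAGE column rather than on the whole line avoids killing an unrelated container whose command line happens to mention nvfw-dgx2.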
VBIOS Not Updated on DGX KVM Host
Issue
On a DGX-2 System that has been converted to a DGX KVM host, the VBIOS will not get updated if the GPU is being used by a guest GPU VM.
Explanation
All guest GPU VMs must be stopped before running the container to update the VBIOS. To stop the VMs, run the following from the KVM host for each guest GPU VM.
virsh shutdown <vm-domain>
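When several guest GPU VMs are running, the shutdown command can be repeated in a loop. This is a minimal sketch: `shutdown_gpu_vms` and the `VIRSH` override are hypothetical names introduced here, and the domain names passed to the function are placeholders for the actual guest GPU VM domains on the KVM host.

```shell
# Shut down each guest GPU VM named on the command line.
# VIRSH is a hypothetical override used for dry runs; it defaults to virsh.
shutdown_gpu_vms() {
    for vm in "$@"; do
        ${VIRSH:-virsh} shutdown "$vm"
    done
}

# Usage on the KVM host (domain names are placeholders, not executed here):
#   shutdown_gpu_vms <vm-domain-1> <vm-domain-2>
```

Setting `VIRSH=echo` prints the commands instead of running them, which is a convenient way to check the list of domains before stopping anything.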