Version 23.06.3
The DGX-1 Firmware Update container version 23.06.3 is available.
Package name:
nvfw-dgx1_23.6.3_230608.tar.gz
Image name:
nvfw-dgx1_23.6.3_230608
Run file name:
nvfw-dgx1_23.6.3_230608.run
Highlights and Changes in this Release
This release is supported with the following DGX OS software.
DGX OS 4.14 or later
DGX OS 5.4 or later
RHEL7-22.08 or later
RHEL8-22.08 or later
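As a rough pre-flight sketch, the DGX OS minimums above can be checked with a small dotted-version comparison. The helpers `version_ge` and `dgx_os_supported` are hypothetical names, not part of the container. Reading the installed version from `/etc/dgx-release` is an assumption about the host, and the RHEL entries are not covered.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted numeric version A >= B.
# Hypothetical helper, not shipped with the firmware update container.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; missing fields count as 0.
        fa=${a%%.*}; fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        [ "$fa" -gt "$fb" ] && return 0
        [ "$fa" -lt "$fb" ] && return 1
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# dgx_os_supported VERSION: apply the DGX OS minimums listed above.
# Assumed interpretation: 4.x needs >= 4.14, anything else needs >= 5.4.
dgx_os_supported() {
    case $1 in
        4.*) version_ge "$1" 4.14 ;;
        *)   version_ge "$1" 5.4 ;;
    esac
}

# Example (assumes /etc/dgx-release defines DGX_SWBUILD_VERSION):
#   . /etc/dgx-release
#   dgx_os_supported "$DGX_SWBUILD_VERSION" && echo "supported"
```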
Important
Updating the SBIOS with the 23.06.3 firmware update container resets all user BIOS settings to factory defaults.
Before using the container to update firmware on systems installed with DGX OS release 5.0 or later, first stop certain NVIDIA services. Refer to Special Instructions for Updating the BMC Using the Web UI.
The BMC and SBIOS updates include software security enhancements. Refer to the NVIDIA Security Bulletin for details.
Contents of the DGX-1 Firmware Update Container
This container includes the firmware binaries and update utilities for the firmware listed in the following table.
Component | Version | Key Changes |
---|---|---|
BMC | 3.39.30 | No change from previous release |
SBIOS | S2W_3A13 | No change from previous release |
SSD (Samsung SM863A) 1.92 TB | GXM1103Q | No change from previous release |
SSD (Samsung PM883) 1.92 TB | HXT7904Q | No change from previous release |
SSD (Samsung SM883) 480 GB | HXM7904Q | No change from previous release |
VBIOS (DGX-1 with V100, 16 GB) | 88.00.18.00.01 | No change from previous release |
VBIOS (DGX-1 with V100, 32 GB) | 88.00.80.00.04 | No change from previous release |
VBIOS (DGX-1 with P100) | 86.00.41.00.05 | No change from previous release |
PSU | 00.03.07 | No change from previous release |
Special Notes
Important
Updating the SBIOS with the 23.06.3 firmware update container resets all user BIOS settings to factory defaults.
Note
If updating the BMC from any version earlier than 3.27.30, the update can take from 30 to 50 minutes to complete.
When updates to the BMC or PSU are initiated:
The BMC is cold reset to put it in a known good state before the update.
Additional logs are gathered for troubleshooting purposes and made available in /var/log/comp_fw_log.txt. The logs are gathered before updating and upon completion of the update, or in the event of an update failure.
(On DGX systems installed with DGX OS 4.99.x or earlier): To prevent NVSM services from interfering with BMC and PSU updates, the container stops the following services before applying the update:
nvsm-apis-gpumonitor
nvsm-apis-plugin-storage
nvsm-apis-selwatcher
nvsm-apis-plugin-memory
nvsm-apis-plugin-environment
nvsm-sys-dshm
nvsm-env-dshm
nvsm-storage-dshm
The system health monitor is not available until the firmware update completes.
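To see whether these services are running before or after an update, a loop over the service names with `systemctl is-active` is one option. The following is a minimal sketch: `report_nvsm_services` is a hypothetical helper, and it treats nvsm-sys-dshm and nvsm-env-dshm as two separate services.

```shell
#!/bin/sh
# Services named above that the container stops on DGX OS 4.99.x or earlier.
NVSM_SERVICES="nvsm-apis-gpumonitor nvsm-apis-plugin-storage nvsm-apis-selwatcher nvsm-apis-plugin-memory nvsm-apis-plugin-environment nvsm-sys-dshm nvsm-env-dshm nvsm-storage-dshm"

# report_nvsm_services: print "<service> <state>" for each service.
# Hypothetical helper; reports "unknown" when systemctl is missing
# or returns no state.
report_nvsm_services() {
    for svc in $NVSM_SERVICES; do
        state=
        if command -v systemctl >/dev/null 2>&1; then
            state=$(systemctl is-active "$svc" 2>/dev/null)
        fi
        [ -n "$state" ] || state=unknown
        echo "$svc $state"
    done
}

# Example:
#   report_nvsm_services
```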
For the PSU update, the container implements a protective check which requires the system to be fully redundant (all four supplies are installed and in a healthy state) in order for the update to occur.
If you are using only three of the four PSUs, the full power redundancy requirement can be overridden with the Docker run environment variable (DGX_MAX_PSU) as follows.
docker run -e DGX_MAX_PSU=3 --privileged -ti -v /:/hostfs <container_name> update_fw
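For scripted updates, the choice between the default invocation and the three-PSU override could be wrapped as in the sketch below. `build_update_cmd` is a hypothetical helper that only prints the command. It rejects PSU counts other than 3 or 4 purely because the text documents no other case, and the image name is whatever you pass in.

```shell
#!/bin/sh
# build_update_cmd IMAGE [PSU_COUNT]: print the docker command for update_fw.
# Hypothetical helper; it prints the command and does not run it.
build_update_cmd() {
    image=$1
    psus=${2:-4}
    case $psus in
        4) env_opt="" ;;                    # fully redundant, no override
        3) env_opt="-e DGX_MAX_PSU=3 " ;;   # override described above
        *) echo "unsupported PSU count: $psus" >&2; return 1 ;;
    esac
    echo "docker run ${env_opt}--privileged -ti -v /:/hostfs $image update_fw"
}

# Example:
#   build_update_cmd "<container_name>" 3
```

Printing the command instead of executing it makes it easy to review the exact invocation before running it on a production system.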
Note
For container versions 21.06.8 and earlier, running in dockerless mode requires a Python 2 installation on the system. For versions 23.4.x and later, dockerless mode requires Python 3.
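The Python requirement in this note can be expressed as a small lookup. This is only a sketch: `ver_num` and `required_python` are hypothetical helpers, and container versions that fall between the two ranges are reported as `unknown` because the note does not cover them.

```shell
#!/bin/sh
# ver_num X.Y.Z: convert a dotted version to a comparable integer.
# awk reads fields such as "06" as decimal 6.
ver_num() {
    echo "$1" | awk -F. '{ printf "%d\n", $1 * 10000 + $2 * 100 + $3 }'
}

# required_python CONTAINER_VERSION: print the interpreter that dockerless
# mode needs according to the note above. Hypothetical helper.
required_python() {
    v=$(ver_num "$1")
    if [ "$v" -le "$(ver_num 21.06.8)" ]; then
        echo python2
    elif [ "$v" -ge "$(ver_num 23.4.0)" ]; then
        echo python3
    else
        echo unknown   # range not covered by the note
    fi
}

# Example: check that the interpreter for this release is installed.
#   command -v "$(required_python 23.06.3)" >/dev/null || echo "interpreter missing"
```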
Special Instructions for Updating the BMC Using the Web UI
Before updating the BMC using the Web UI, refer to the following instructions to ensure the updates are successful.
BMC Updates via the Web UI
When Preserving Settings
Navigate to Maintenance > Firmware Update, select IPMI, Network, and SEL, then proceed with updating the BMC.
After updating the BMC, issue the following from the command line.
$ sudo ipmitool raw 0x32 0x6 1
$ sudo ipmitool mc reset cold
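After a cold reset the BMC does not respond for some time. The text does not say how long, so a script may want to poll until it responds again. The sketch below uses a generic retry helper, `wait_until`, which is a hypothetical name. Using `ipmitool mc info` as the probe, and the retry count and delay in the example, are assumptions.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...: retry CMD until it succeeds or TRIES run out.
# Hypothetical helper; returns 0 on the first success, 1 if every try fails.
wait_until() {
    tries=$1; delay=$2; shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        # Sleep only if another attempt is still to come.
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example: probe the BMC every 10 seconds, up to 30 times.
#   wait_until 30 10 sudo ipmitool mc info || echo "BMC did not come back"
```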
Note
You cannot preserve user settings when downgrading to a previous version of the BMC. Attempting to do so will result in a failure to log in to the BMC.
When Not Preserving Settings
Navigate to Maintenance > Firmware Update, clear all preservation items, then proceed with updating the BMC.
Fixed Issues
The v23.06.3 update fixes the following issue:
The DGX-1 SBIOS install could fail with the following error message:
Rom image layout detected Rom Hole is redesigned.
Known Issues
Unable to update VBIOS
Issue
The system is unable to update the VBIOS when nvidia-dcgm.service will not stop. This is observed when using the latest Base OS with an older FWUC to update the VBIOS.
Workaround
First stop nvidia-dcgm.service, then run the VBIOS update using FWUC 21.06.8.
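The workaround could be scripted roughly as follows. This is a sketch under assumptions: `run` and `vbios_workaround` are hypothetical helpers, the image argument stands in for the FWUC 21.06.8 image name on your system, and the `docker run` line mirrors the invocation shown under Special Notes. The text gives no arguments for selecting the VBIOS target, so none are shown.

```shell
#!/bin/sh
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# vbios_workaround IMAGE: stop DCGM first, then start the update container.
# Hypothetical helper; IMAGE is a placeholder for the FWUC 21.06.8 image.
vbios_workaround() {
    run sudo systemctl stop nvidia-dcgm.service || return 1
    run sudo docker run --privileged -ti -v /:/hostfs "$1" update_fw
}

# Example: preview the commands without executing them.
#   DRY_RUN=1 vbios_workaround "<container_name>"
```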