DGX Station A100 Firmware Update Container Version 24.1.1#

The DGX Station A100 Firmware Update Container version 24.1.1 is available.

  • Package name: nvfw-dgxstationa100_24.1.1_240110.tar.gz

  • Run file name: nvfw-dgxstationa100_24.1.1_240110.run

  • Image name: nvfw-dgxstationa100:24.1.1

  • ISO image: DGXSTATIONA100_FWUI-24.1.1-2024-01-16-12-06-01.iso

  • PXE netboot: pxeboot-DGXSTATIONA100_FWUI-24.1.1.tgz

Highlights and Changes in This Release#

Operating System Support#

This release is supported with the following DGX OS software:

  • DGX OS 5.5

  • DGX OS 6.1.0

  • EL8-22.08

  • EL9-23.08

The following issues were fixed in this release:

Fixed BMC Issues#

  • The pump (fan 6) speed control is removed from the BMC, and the pump pulse-width modulation (PWM) is locked to 40 percent.

  • The BMC update includes software security enhancements. Refer to NVIDIA Security Bulletin DGX for details.

  • The following table lists potential security vulnerabilities that have been reported by AMI or third-party vendors. They are addressed in DGX Station A100 BMC version 2.09.00.

    • Affected BMC versions: All BMC versions prior to 2.09.00

    • Updated BMC version: 2.09.00

    • Firmware container version: 24.1.1

    Vendor (per NVD)        CVE IDs Addressed
    AMI                     CVE-2022-26872, CVE-2022-40258, CVE-2023-28863, CVE-2023-34472,
                            CVE-2023-34329, CVE-2023-34471, CVE-2023-34473, CVE-2023-34337
    MITRE                   CVE-2023-25191, CVE-2023-25192, CVE-2023-28863
    Nozomi Networks Inc.    CVE-2021-46279, CVE-2021-4228
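
The "prior to 2.09.00" condition can be checked with a version-aware string comparison. The following is a minimal sketch, assuming GNU `sort -V` is available. The names `version_lt`, `bmc_is_affected`, and `sbios_is_affected` are hypothetical helpers introduced here, and the installed version string is assumed to be copied from the show_version output. The SBIOS threshold of 10.20 comes from the SBIOS section of these notes, and the leading `L` that show_version prints for SBIOS is stripped before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 is strictly older
# than version $2, using the version ordering of GNU sort (-V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Thresholds are the fixed versions named in this release.
# The installed version passed as $1 is an assumption supplied by the caller.
bmc_is_affected() { version_lt "$1" "2.09.00"; }

# show_version prints the SBIOS version with a leading "L" (for example
# L10.20), so strip it before comparing.
sbios_is_affected() { version_lt "${1#L}" "10.20"; }
```

For example, `bmc_is_affected 2.08.00 && echo "update needed"` prints the message, while the same call with `2.09.00` prints nothing.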

Fixed SBIOS Issues#

  • The SBIOS update includes software security enhancements. Refer to NVIDIA Security Bulletin DGX for details.

  • The following table lists potential security vulnerabilities that have been reported by AMI or AMD. They are addressed in DGX Station A100 SBIOS version 10.20.

    • Affected SBIOS versions: All SBIOS versions prior to 10.20

    • Updated SBIOS version: 10.20

    • Firmware container version: 24.1.1

    Vendor (per NVD)        CVE IDs Addressed
    AMI                     CVE-2021-26316, CVE-2021-39298, CVE-2021-26402, CVE-2022-23813,
                            CVE-2022-23814, CVE-2021-26328, CVE-2020-10713, CVE-2020-34302,
                            CVE-2020-34303, CVE-2017-5715, CVE-2021-38578, CVE-2021-30004,
                            CVE-2023-28005, CVE-2021-33164, CVE-2014-4860, CVE-2014-4859,
                            CVE-2021-38575, CVE-2019-14586, CVE-2019-14559, CVE-2019-14584,
                            CVE-2019-14563, CVE-2019-14553, CVE-2019-14587, CVE-2021-38576
    AMD                     CVE-2020-12954, CVE-2020-12961, CVE-2021-26331, CVE-2021-46771,
                            CVE-2021-26335, CVE-2021-26315, CVE-2020-12946, CVE-2021-26353,
                            CVE-2021-26352, CVE-2021-26351, CVE-2021-26337, CVE-2021-26338,
                            CVE-2020-12951, CVE-2021-26390, CVE-2021-26370, CVE-2020-12944,
                            CVE-2021-26332, CVE-2020-12988, CVE-2021-26329, CVE-2021-26330,
                            CVE-2021-26321, CVE-2021-26323, CVE-2021-26324, CVE-2021-26325,
                            CVE-2021-26326, CVE-2021-26322, CVE-2021-26327, CVE-2021-26312,
                            CVE-2021-26408


Known Issues#

Contents of the DGX Station A100 System Firmware Update Container#

This container includes the firmware binaries and update utilities for the firmware listed in the following table.

Component                            Version           Key Changes
BMC                                  2.09.00           New update. Refer to DGX Station A100 BMC Changes.
SBIOS                                10.20             New update. Refer to DGX Station A100 SBIOS Changes.
Retimer                              1.0.125           No change
A100 VBIOS (80GB)                    92.00.38.00.01    No change
A100 VBIOS (40GB)                    92.00.48.00.01    No change
A800 VBIOS (80GB)                    92.00.AC.00.0D    No change
M.2 Micron 7300 MTFDHBG1T9TDF SSD    95420260          No change
U.2 KIOXIA CM6 SSD                   0107              New update
FPGA                                 2.71              No change
Storage Backplane                    0.3               No change
NVFlash                              5.821.0           New update

Updating the Firmware to Version 24.1.1#

This section explains how to update the firmware on the system by using the firmware update container. It includes instructions to complete a transitional update for systems that require the update.

Stop all unnecessary system activities.

Caution

While an update is in progress, do not place additional load on the system, such as Kubernetes jobs, other user jobs, or diagnostics. A high GPU workload can disrupt the firmware update and leave a component unusable.

The commands use the .run file, but you can also use any method described in Using the DGX Station A100 FW Update Utility.

  1. Determine whether updates are needed by checking the installed versions.

    $ sudo ./nvfw-dgxstationa100_24.1.1_240110.run show_version
    
    • If any up-to-date column shows no for updatable firmware, proceed to the next step.

    • If every up-to-date column shows yes, no updates are required and no further action is necessary.

  2. Stop the gdm3 service.

    $ sudo systemctl stop gdm3
    
  3. Complete the update for all firmware that is supported by the container.

    $ sudo ./nvfw-dgxstationa100_24.1.1_240110.run update_fw all
    

    Depending on the firmware that is updated, you might be prompted to reboot the system or power cycle the system:

    • If you are prompted to reboot, issue the following command:

      $ sudo reboot
      
    • If you are prompted to power cycle, issue the following command:

      $ sudo ipmitool chassis power cycle
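
The steps above can be collected into a small wrapper. This is only a sketch under stated assumptions: `RUN_FILE`, `DRY_RUN`, `run`, and `update_firmware` are hypothetical names introduced for illustration, the .run file is assumed to be in the current directory, and step 1 is assumed to have already shown that an update is needed. By default the script only prints the commands. Any reboot or power cycle that the utility asks for is left to the operator.

```shell
#!/usr/bin/env bash
# Sketch of the documented sequence. RUN_FILE and DRY_RUN are hypothetical
# variables; with DRY_RUN=1 (the default) commands are printed, not executed.
RUN_FILE="${RUN_FILE:-./nvfw-dgxstationa100_24.1.1_240110.run}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

update_firmware() {
  # Step 1: show installed and manifest versions for the operator.
  run sudo "$RUN_FILE" show_version
  # Step 2: stop the gdm3 service; step 3 runs only if that succeeded.
  run sudo systemctl stop gdm3 &&
    run sudo "$RUN_FILE" update_fw all
}
```

Calling `update_firmware` with the defaults prints the three commands prefixed with `+`, which allows the sequence to be reviewed before setting `DRY_RUN=0`.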
      

You can verify the update by issuing the following command:

$ sudo ./nvfw-dgxstationa100_24.1.1_240110.run show_version

Here is an example output for a DGX Station A100 40GB system:

BMC DGX Station A100
======================
Image Id              Status         Location      Onboard Version   Manifest  up-to-date
N/A                   Online         Local         2.09.00           2.09.00     yes

 FPGA
========
Onboard version     Manifest  up-to-date
2.71                  2.71       yes

 Storage Backplane
==================
Bus               Onboard Version   Manifest         up-to-date
N/A                     0.3             0.3              yes

 Retimer Loc.
=============
PCIe Slot#      Onboard Version   Manifest         up-to-date
Retimer@slot4       1.0.125       1.0.125             yes
Retimer@slot5       1.0.125       1.0.125             yes
Retimer@slot6       1.0.125       1.0.125             yes
Retimer@slot7       1.0.125       1.0.125             yes

 SBIOS
=======
Image Id                           Onboard Version   Manifest        up-to-date
N/A                                L10.20            L10.20             yes

 Video BIOS
============
Bus            Model                Onboard Version   Manifest         up-to-date
0000:01:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:47:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:81:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:c2:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes

 Mass Storage
==============
Drive Name/Slot    Model Number                Onboard Version    Manifest    up-to-date
nvme0n1            Micron 7300_MTFDHBG1T9TDF    95420260          95420260     yes
nvme1n1            Kioxia KCM6DRUL7T68            0107              0107       yes
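
The up-to-date column in this output can also be checked mechanically. The following is a minimal sketch, assuming the utility writes this layout to standard output with yes or no as the last field of every data row. `outdated_rows` is a hypothetical helper and is not part of the update utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads show_version output on stdin and prints every
# row whose last field (the up-to-date column) is "no".
# Exit status: 0 if at least one such row was found, 1 otherwise.
outdated_rows() {
  awk 'tolower($NF) == "no" { print; found = 1 }
       END { exit (found ? 0 : 1) }'
}
```

Under those assumptions, `sudo ./nvfw-dgxstationa100_24.1.1_240110.run show_version | outdated_rows` prints nothing and returns a nonzero status when every component is current, and otherwise prints the rows that still need an update.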