DGX Station A100 Firmware Update Container Release Notes

This document describes the key features, improvements, and known issues for the NVIDIA® DGX™ Station A100 System Firmware Update Container.

1. DGX Station A100 Firmware Overview

The NVIDIA DGX™ Station A100 System Firmware Update container is the preferred method for updating firmware on the DGX Station A100. It allows you to update the firmware to the latest released versions and uses the standard method to run Docker containers.

This document describes firmware components that can be updated, the known issues, and how to run this container.

For information about how to download and install the latest DGX Station A100 ISO, see Installing the DGX OS (Reimaging the System).

2. Using the DGX Station A100 FW Update Utility

The DGX Station A100 System Firmware Update utility is provided in a tarball and also as a .run file.

Forcefully Updating the VBIOS

Important: Run the following command only in a TTY console or an SSH console. Do not run it in a GUI-based terminal because the command shuts down the X Window System.
  1. Run the following command:
    $ sudo systemctl stop gdm3
Here are some examples:
  • NVSM
    $ nvsm(/system/localhost/firmware/install)
    • Set Flags to update_fw\ VBIOS\ -f
    • When setting the flags, an escape is needed before blank spaces.
  • Docker
    $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw VBIOS -f
  • A .run file
    $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw VBIOS -f

2.1. Using NVSM

This section describes how to use NVSM to run the firmware update container.

The DGX Station A100 system software includes Docker software that is required to run the container.
  1. Copy the tarball to a location on the DGX system.
  2. From the directory where you copied the tarball, enter the following command to load the container image.
    $ sudo docker load -i nvfw-dgxstationa100_22.2.1_220209.tar.gz
  3. To verify that the container image is loaded, enter the following command.
    $ sudo docker images 
    
    REPOSITORY            TAG 
    nvfw-dgxstationa100   22.2.1
  4. Using NVSM interactive mode, enter the firmware update module.
    $ sudo nvsm
    nvsm-> cd systems/localhost/firmware/install
  5. Set the flags that correspond to the action you want to take.
    $ nvsm(/system/localhost/firmware/install)-> set Flags=<option>

    See Command and Argument Summary for the list of common flags.

  6. Set the container image to run.
    $ nvsm(/system/localhost/firmware/install)-> set DockerImageRef=nvfw-dgxstationa100:22.2.1
  7. Run the command.
    $ nvsm(/system/localhost/firmware/install)-> start
    

2.2. Using docker run

The DGX Station A100 system software includes Docker software required to run the container.
  1. Copy the tarball to a location on the DGX system.
  2. From the directory where you copied the tarball, enter the following command to load the container image.
    $ sudo docker load -i nvfw-dgxstationa100_22.2.1_220209.tar.gz
  3. To verify that the container image is loaded, enter the following.
    $ sudo docker images 
    
    REPOSITORY              TAG 
    nvfw-dgxstationa100    22.2.1
  4. Use the following syntax to run the container image.
    $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 <command> [arg1] [arg2] ... [argn]
See Command and Argument Summary for the list of common commands and arguments.
Note: If you do not have the tarball file, but you do have the .run file, you can extract the tarball from the .run file by issuing the following command:
$ sudo ./nvfw-dgxstationa100_22.2.1_220209.run -x

2.3. Using the .run File

The update container is also available as a .run file. The .run file uses the Docker software, if the software is installed on the system, but the file can also be run without Docker.
  1. After obtaining the .run file, make the file executable.
    $ chmod +x ./nvfw-dgxstationa100_22.2.1_220209.run
  2. Use the following syntax to run the container image.
    $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run <command> [arg1] [arg2] ... [argn]
See Command and Argument Summary for the list of common commands and arguments.

2.4. Command and Argument Summary

Here are the most common commands and arguments:
  • To show the manifest:
    show_fw_manifest
    • An NVSM example:
      $ nvsm(/system/localhost/firmware/install)-> set Flags=show_fw_manifest
    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 show_fw_manifest
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run show_fw_manifest
  • To show the version information:
    show_version
    • An NVSM example:
      $ nvsm(/system/localhost/firmware/install)-> set Flags=show_version
    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 show_version
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run show_version
  • To check the onboard firmware against the manifest and update all down-level firmware:
    update_fw all
    • An NVSM example:
      $ nvsm(/system/localhost/firmware/install)-> set Flags=update_fw\ all

      For NVSM, when setting the flags, an escape is needed before blank spaces.

    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw all
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw all
  • To check the specified onboard firmware against the manifest and update if down-level:
    update_fw [fw] 
    Where [fw] corresponds to the specific firmware as listed in the manifest. Multiple components can be listed in the same command. The following are examples of updating the BMC and SBIOS.
    • An NVSM example:
      $ nvsm(/system/localhost/firmware/install)-> set Flags=update_fw\ BMC\ SBIOS

      For NVSM, when setting the flags, an escape is needed before blank spaces.

    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw BMC SBIOS
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw BMC SBIOS
  • To update the VBIOS:
    Important: Run the following command only in a TTY console or an SSH console. Do not run it in a GUI-based terminal because the command shuts down the X Window System.
    Run the following command.
    $ sudo systemctl stop gdm3
    Here are some additional examples:
    • An NVSM example:

      $ nvsm(/system/localhost/firmware/install)-> set Flags=update_fw\ VBIOS
      Remember: When you set the flags, add an escape before the blank spaces.
    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw VBIOS
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw VBIOS
  • To forcefully update the VBIOS:
    Important: Run the following command only in a TTY console or an SSH console. If you run the command in a GUI-based terminal, the X Window System shuts down.
    $ sudo systemctl stop gdm3
    Here are some additional examples:
    • An NVSM example:

      $ nvsm(/system/localhost/firmware/install)-> set Flags=update_fw\ VBIOS\ -f
      Note: For NVSM, when setting the flags, an escape is needed before blank spaces.
    • A Docker run example:
      $ sudo docker run --rm --network host --privileged -ti -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw VBIOS -f
    • A .run file example:
      $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw VBIOS -f
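
Because NVSM needs an escape before each blank space in the Flags value, the command words used with docker run or the .run file can be converted mechanically. The following is a minimal sketch, assuming a POSIX-style shell; nvsm_flags is a hypothetical helper name and is not part of NVSM or the firmware update container.

```shell
# nvsm_flags: hypothetical helper (not part of NVSM). Joins its arguments
# with a backslash-escaped space, the form used after "set Flags=".
nvsm_flags() {
    local out="" arg
    for arg in "$@"; do
        if [ -z "$out" ]; then
            out="$arg"
        else
            # Inside double quotes, \\ yields a single literal backslash.
            out="${out}\\ ${arg}"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, nvsm_flags update_fw VBIOS -f prints update_fw\ VBIOS\ -f, which can be pasted after set Flags= at the NVSM prompt.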

3. Using the DGX Station A100 Firmware Update ISO

This section describes how to use the DGX Station A100 firmware update ISO to efficiently update the firmware in a large fleet of DGX Station A100 systems.

3.1. About the Firmware Update Menu

Once the system boots up to the firmware update ISO, it sets up the environment and launches a firmware update menu. The menu can be used in the following three different modes:

  • Interactive

    This displays a text-based UI with the following choices of actions to take:

    • Start the firmware update container

      This runs the firmware update container using the update_fw all option.

    • Start the firmware update container with custom options
      This runs the firmware update container using custom arguments that you enter into a text box. Separate multiple arguments with a space. For example:
      update_fw BMC -f

      See Command and Argument Summary for available arguments.

    • Set up connection for automation and Exit

      This sets up an SSH connection (default user name is fwui and default password is fw_update) so you can run automation scripts from a different system. For example, this lets you use Ansible automation.

    • Exit
  • Non-interactive

    This reads the arguments from the kernel parameters (/proc/cmdline) and then runs the firmware update container automatically.

  • Automation

    This sets up an SSH connection. The default user name is fwui and default password is fw_update. From there you can use automation scripts (for example, Ansible) to perform the firmware update.

3.2. Booting to the Firmware Update ISO from a USB Flash Drive

This section describes how to boot to the DGX Station A100 firmware update ISO from a USB flash drive.

Basic Process

Download the ISO image and create a bootable USB drive that contains the ISO image.

Important: Do not use the virtual media from the BMC. If you use virtual media, the BMC will be reset during the update.

Updating the Firmware Automatically

To set up the firmware to update automatically when the system boots up:
  1. Edit the GRUB menu parameters in the ISO at BOOT/GRUB/GRUB.CFG as follows.

    Set fwuc-mode=noninteractive.

    Set the following parameters as needed.

    • fwuc-update_args=<arg1>,<arg2> ...
    • fwuc-extra_args=<extra-arg1> ...

    See Command and Argument Summary for available arguments.

    The following example boots the firmware update ISO in non-interactive mode, updates the SBIOS without first checking the installed version, and reboots the system after the update.

    menuentry "Start Firmware Update Environment (Non-interactive)" {
        linux /vmlinuz boot=live console=tty0 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fwuc-mode=noninteractive fwuc-update_args=update_fw,SBIOS,-f fwuc-extra_args=reboot-after-update
        initrd /initrd
    }
  2. Create a bootable USB drive that contains the updated ISO.
  3. Boot to the USB drive.
  4. If the FPGA firmware was updated, complete a DC power cycle by issuing the following command.
    $ sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle
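
The fwuc-update_args value in step 1 is the same command and arguments used elsewhere in this document, joined by commas instead of spaces. The following is a minimal sketch of that conversion; fwuc_update_args is a hypothetical helper name, not a tool shipped with the ISO.

```shell
# fwuc_update_args: hypothetical helper. Prints the kernel parameter for
# non-interactive mode by joining the container arguments with commas.
fwuc_update_args() {
    # "$*" joins the arguments with the first character of IFS;
    # IFS is changed only inside this function.
    local IFS=','
    printf 'fwuc-update_args=%s\n' "$*"
}
```

For example, fwuc_update_args update_fw SBIOS -f prints fwuc-update_args=update_fw,SBIOS,-f, which matches the parameter in the menuentry example.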

3.3. Booting to the Firmware Update ISO by PXE Boot

This section describes how to PXE boot to the DGX Station A100 firmware update ISO.

  1. See Setting Up DGX OS 5 for PXE Boot for more information about setting up the DGX Station A100 to PXE boot.
  2. Download the ISO image and mount it.
    $ sudo mount -o loop ~/DGXSTATIONA100_FWUI-22.2.1-2022-01-24-12-38-20.iso /mnt
    
  3. Copy the filesystem.squashfs, initrd, and vmlinuz files to the HTTP directory.
    $ sudo mkdir -p /local/http/firmware-update/
    $ sudo cp /mnt/live/filesystem.squashfs /local/http/firmware-update/
    $ sudo cp /mnt/{initrd,vmlinuz} /local/http/firmware-update/
    $ sudo umount /mnt
    

    The new /local/http folder structure should look like this:

    /local/http/
    ├── dgxbaseos-5.x.y
    │   ├── base_os_5.x.y.iso
    │   ├── initrd
    │   └── vmlinuz
    └── firmware-update
        ├── filesystem.squashfs
        ├── initrd
        └── vmlinuz
    
  4. Edit the /local/syslinux/efi64/pxelinux.cfg/default file to add the following menu option content for the Firmware Update OS.
    label Firmware Update Container
        menu label Firmware Update Container
        kernel http://${SERVER_IP}/firmware-update/vmlinuz
        initrd http://${SERVER_IP}/firmware-update/initrd
        append vga=788 initrd=initrd boot=live console=tty0 console=ttyS1,115200n8 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
    
    Important: If the system is booting from the LAN port connection (eno1), and the connections are not on the same domain, add live-netdev=eno1 to the append line.

    Example:

    append vga=788 initrd=initrd boot=live console=tty0 apparmor=0 live-netdev=eno1 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
  5. (Optional) To set up the boot configuration to run the container automatically when booting, edit the following parameters at pxelinux.cfg/default:

    Set fwuc-mode=noninteractive.

    Set the following parameters as needed.

    • fwuc-update_args=<arg1>,<arg2> ...
    • fwuc-extra_args=<extra-arg1> ...

    See Command and Argument Summary for available arguments.

    The following example boots the package in non-interactive mode and updates the SBIOS without first checking the installed version, then reboots the system after the update.

    append vga=788 initrd=initrd boot=live console=tty0 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 fwuc-mode=noninteractive fwuc-update_args=update_fw,SBIOS,-f fwuc-extra_args=reboot-after-update boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
  6. Change permissions on /local.
    $ sudo chmod 755 -R /local
  7. PXE boot by restarting the system using ipmitool.
    $ ipmitool -I lanplus -H <DGX-BMC-IP> -U <username> -P <password> chassis bootdev pxe options=efiboot
    $ ipmitool -I lanplus -H <DGX-BMC-IP> -U <username> -P <password> chassis power reset
    

    When the system PXE menu appears, select the Firmware Update Container option. If the system is set to update automatically, the firmware is updated after the system has booted. Otherwise, follow the instructions in the firmware update menu to update the firmware.

  8. If the FPGA was updated, then perform a DC power cycle by issuing the following command.
    $ sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle

4. DGX Station A100 Firmware Update Container Version 22.02.1

The DGX Station A100 Firmware Update Container version 22.02.1 is available.

  • Package name: nvfw-dgxstationa100_22.2.1_220209.tar.gz
  • Run file name: nvfw-dgxstationa100_22.2.1_220209.run
  • Image name: nvfw-dgxstationa100:22.2.1
  • ISO image: DGXSTATIONA100_FWUI-22.2.1-2022-02-15-10-20-25.iso
  • PXE netboot: pxeboot-DGXSTATIONA100_FWUI-22.2.1.tgz

Highlights and Changes in this Release

  • This release is supported with the following DGX OS software:
    • DGX OS 5.0.2 or later
    • EL7-21.04 or later
    • EL8-20.11 or later
  • The following issues were fixed in this release:
    • BMC
      • Fixed the issue where, after you ran the bmc mc cold reset command, the BMC generated two or three SEL entries with pre-initialized timestamps.
      • Fixed the issue where the severity of all system events, including audit SEL, was marked as Critical.
      • Fixed the issue where the USB and Built-in UEFI Boot could not be detected at the same time.
      • Fixed the issue where the sensor names, the threshold, and the sensor type were not in sync with the sensor list file.
      • Fixed the issue where, if your BMC web UI session timed out and you were locked out, you needed to log in twice to enter the web UI again.
      • Fixed the issue where, after you ran the sudo ipmitool sel clear command to clear the system event log, no log entry existed to help you verify that the SEL was cleared.
      • Fixed the issue where, after you entered an incorrect password five or more times, you were not informed that you had been locked out of the web UI.
      • Fixed the issue in the BMC web UI where, when you clicked Logs & Reports > Debug log, there was no Debug Log button.
    • SBIOS
      • Fixed the issue where the ECC Leaky Bucket Threshold help string stated that the range is 0 to 255, although the default value of this option is 1000.

Contents of the DGX Station A100 System Firmware Update Container

This container includes the firmware binaries and update utilities for the firmware in the following table:

Table 1. DGX Station A100 Firmware

Component                            Version
BMC                                  01.24.00
SBIOS                                10.16
Retimer                              1.0.125
VBIOS                                80GB: 92.00.38.00.01
                                     40GB: 92.00.48.00.01
M.2 Micron 7300 MTFDHBG1T9TDF SSD    95420260
U.2 KIOXIA CM6 SSD                   0105
FPGA                                 2.71
Storage Backplane                    0.3
NVFlash                              5.714.0

Important: When you update the Retimer, Backplane, and FPGA components with the firmware update container tarball, you must add the --network host argument. If you do not add this argument, the update will fail.

Here is an example of the command for a successful update:

$ sudo docker run --rm --network host -ti --privileged -v /:/hostfs nvfw-dgxstationa100:22.2.1 update_fw Backplane -f
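
One way to avoid forgetting the --network host argument is to generate the command line from a single template. The following is a minimal sketch that only prints the command for review and does not run it; fw_docker_cmd and FW_IMAGE are hypothetical names introduced for illustration.

```shell
# Image name taken from this release; adjust it for other releases.
FW_IMAGE="nvfw-dgxstationa100:22.2.1"

# fw_docker_cmd: hypothetical helper. Prints (does not run) the docker run
# command line for the given container arguments. --network host is always
# included, so Retimer, Backplane, and FPGA updates are covered.
fw_docker_cmd() {
    printf 'sudo docker run --rm --network host --privileged -ti -v /:/hostfs %s %s\n' \
        "$FW_IMAGE" "$*"
}
```

For example, fw_docker_cmd update_fw Backplane -f prints a command line equivalent to the one shown above, which you can then review and run manually.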

4.1. Updating the Firmware to Version 22.02.1

This section explains how to update the firmware on the system by using the firmware update container. It includes instructions to complete a transitional update for systems that require the update.

Before you begin, stop all unnecessary system activities.
CAUTION:
While an update is in progress, do not add additional loads on the system, such as Kubernetes jobs or other user jobs or diagnostics. A high GPU workload can disrupt the firmware update process and result in an unusable component.

The commands use the .run file, but you can also use any method described in Using the DGX Station A100 FW Update Utility.

  1. Determine whether updates are needed by checking the installed versions.
    $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run show_version
    • If there is a no in any up-to-date column for updatable firmware, proceed to the next step.
    • If all up-to-date column entries display a yes, no updates are required and no additional action is necessary.
  2. Stop the gdm3 service.
    $ sudo systemctl stop gdm3
  3. Complete the update for all firmware that is supported by the container.
    $ sudo ./nvfw-dgxstationa100_22.2.1_220209.run update_fw all

    Depending on the firmware that is updated, you might be prompted to reboot the system or power cycle the system:

    • If you are prompted to reboot, issue the following command:
      $ sudo reboot
    • If you are prompted to power cycle, issue the following command:
      $ sudo ipmitool chassis power cycle
You can verify the update by issuing the following command:
$ sudo ./nvfw-dgxstationa100_22.2.1_220209.run show_version

Here is an example output for a DGX Station A100 40GB system:

BMC DGX Station A100
======================
Image Id              Status         Location      Onboard Version   Manifest  up-to-date
N/A                   Online         Local         01.24.00          01.24.00     yes

 FPGA
========
Onboard version     Manifest  up-to-date
2.71                  2.71       yes

 Storage Backplane
==================
Bus               Onboard Version   Manifest         up-to-date
N/A                     0.3             0.3              yes

 Retimer Loc.
=============
PCIe Slot#      Onboard Version   Manifest         up-to-date
Retimer@slot4       1.0.125       1.0.125             yes
Retimer@slot5       1.0.125       1.0.125             yes
Retimer@slot6       1.0.125       1.0.125             yes
Retimer@slot7       1.0.125       1.0.125             yes

 SBIOS
=======
Image Id                           Onboard Version   Manifest        up-to-date
N/A                                L10.16            L10.16             yes

 Video BIOS
============
Bus            Model                Onboard Version   Manifest         up-to-date
0000:01:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:47:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:81:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes
0000:c2:00.0   A100-SXM4-40GB       92.00.48.00.01    92.00.48.00.01      yes

 Mass Storage
==============
Drive Name/Slot    Model Number                Onboard Version    Manifest    up-to-date
nvme0n1            Micron 7300_MTFDHBG1T9TDF    95420260          95420260     yes
nvme1n1            Kioxia KCM6DRUL7T68            0105              0105       yes
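
The check in step 1 can be scripted by looking at the last column of each row. The following is a minimal sketch, assuming that show_version writes tables in the layout shown in the sample output to standard output; needs_update is a hypothetical helper name and is not part of the firmware update container.

```shell
# needs_update: hypothetical helper. Reads show_version output on stdin and
# exits with status 0 if any row has "no" as its last field (the up-to-date
# column), or status 1 otherwise. Header and separator rows never end in
# "no", so they are ignored.
needs_update() {
    awk '$NF == "no" { found = 1 } END { exit !found }'
}
```

For example, sudo ./nvfw-dgxstationa100_22.2.1_220209.run show_version | needs_update && echo "updates needed" prints the message only when at least one component is reported as not up to date.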

4.2. DGX Station A100 Firmware Known Issues

This section provides a list of the known issues in version 22.02.1.

4.2.1. VBIOS Update Fails

Issue

VBIOS update fails on Red Hat Enterprise Linux 9 because system services or processes hold the resource that is to be upgraded.

Explanation

The following services (system processes) must be stopped manually for the firmware update to start:

  • process nvidia-persiste(pid 5372)
  • process nv-hostengine(pid 2723)
  • process cache_mgr_event(pid 5276)
  • process cache_mgr_main(pid 5278)
  • process dcgm_ipc(pid 5279)
If Xorg is holding the resources, stop the display manager by running the following command:
$ sudo systemctl stop <display-manager>
To determine the value of <display-manager>, run the following command:
$ cat /etc/X11/default-display-manager
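
The lookup above can be wrapped in a small function. The following is a minimal sketch, assuming that the file holds the full path of the display manager binary (for example, /usr/sbin/gdm3) and that the service name matches the binary name; display_manager_service is a hypothetical helper name.

```shell
# display_manager_service: hypothetical helper. Prints the service name
# derived from the first line of the default-display-manager file, for
# example /usr/sbin/gdm3 -> gdm3. Returns 1 if the file cannot be read.
# The optional argument overrides the default file path.
display_manager_service() {
    local file="${1:-/etc/X11/default-display-manager}"
    [ -r "$file" ] || return 1
    basename "$(head -n 1 "$file")"
}
```

The printed name can then be passed to sudo systemctl stop. If the function returns 1, the file is missing, as described for EL8-21.08 in a later known issue, and the display manager must be identified another way.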

4.2.2. The ISO Update Menu Does Not Display the Current VBIOS Version

Issue

When used in the interactive mode, the Firmware Update Menu does not display the current VBIOS version.

4.2.3. Cannot Use a Forward Slash When Creating LDAP Group Settings in the BMC

Issue

On the BMC dashboard, when you try to create group roles, you cannot include a forward slash (/) in the Group Name or Group Domain fields. For example, "Bay/Ships" will not work.

Explanation

NVIDIA is investigating the issue, and there is no workaround at this time.

4.2.4. Boot Options Do Not Persist After a BIOS Update

Issue

After you update the BIOS from version 9.28c to L10.16, the boot options that you previously set do not persist.

Explanation

NVIDIA is investigating the issue, and there is no workaround at this time.

4.2.5. In Dockerless Mode, the Onboard Version Displays an Unknown Version Status

Issue

In RHEL7-21.07, in dockerless mode, when you run the show_version command, the onboard version displays an unknown version status.

Explanation

nvipmitool is used to query the FPGA, Backplane, PSU, and Retimer firmware versions.

The tool that is bundled in the DGX Station A100 firmware update container works only on Ubuntu and not on RHEL. As a result, in dockerless mode, when the DGX system tries to locate nvipmitool, the unknown version string is displayed.

Workaround

Important: nvipmitool is now only bundled in the rhel7-r470-cuda-11-4 package and is not installed by default on RHEL7-21.10.

To use the tool in RHEL7-21.10 without upgrading to the R470 driver, run the following command:

$ sudo yum install https://international.download.nvidia.com/dgx/repos/rhel7-r470-cuda11-4/nvipmitool-1.0.60_rhel7_release-1.x86_64.rpm

4.2.6. Cannot Update the FPGA in Dockerless Mode

Issue

On RHEL7, when you try to update the FPGA, the update fails.

Workaround

There is no workaround at this time.

4.2.7. Cannot Force a Backplane Update in Dockerless Mode

Issue

On RHEL7, when you try to force a Backplane update, the update fails.

Workaround

There is no workaround at this time.

4.2.8. Cannot Force a Retimer Update in Dockerless Mode

Issue

On RHEL7, when you try to force a Retimer update, the update fails.

Workaround

There is no workaround at this time.

4.2.9. New FPGA Version Number Does Not Display After an Update

Issue

After you update the FPGA, and the DGX system reboots, the previous FPGA version is still displayed.

Explanation

For the FPGA update to take effect, a DC power cycle option is required, but currently only the Reboot after update option exists.

Workaround

Complete one of the following options:
  • After you complete the FPGA firmware version update, complete the following steps:
    1. Click BMC WebUI > Power Control.
    2. Power off the system.
    3. Click BMC WebUI > Power Control.
    4. Power on the system.
  • In a Command Prompt window, run the following command:
    $ sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle
    

4.2.10. Help Output Does Not Display Information for the Firmware Update Container Usage

Issue

In the interactive menu, when you click Show update container usage, only the information for the previous few components is displayed instead of the overall firmware update container usage information.

Workaround

There is no workaround at this time.

4.2.11. Incorrect Prompt Message When Upgrading on EL8-21.08

Issue

On EL8-21.08, if the Xorg processes are holding onto the resource that will be upgraded, after you issue cat /etc/X11/default-display-manager, the following incorrect message is displayed:

No such file or directory

Workaround

For a successful upgrade on EL8-21.08, before you run the update_fw all or update_fw VBIOS -f command, run the following command:

$ sudo systemctl stop gdm

4.2.12. After Unloading the NVIDIA Driver, ?? is Displayed as the VBIOS Onboard Version

Issue

On RHEL, after you unload the NVIDIA driver, and run the show_version command, ?? is displayed as the VBIOS onboard version.

Workaround

There is no workaround at this time.

4.2.13. When Updating the Backplane Firmware, a Corrupted Screen is Displayed

Issue

When you use the firmware update ISO to update the Backplane firmware with the --force argument with SBIOS L9.28C, a corrupted screen appears.

Workaround

Update your SBIOS firmware to L10.16 and then update the Backplane firmware.

4.2.14. Unable to Install SSD Firmware After Multiple Upgrade and Downgrade Attempts

Issue

After multiple attempts to upgrade or downgrade the firmware, the SSD nvme1n1 KCM6DRUL7T68 firmware installation fails. Here is an example of an error message:

Failed to install SSD nvme1n1 KCM6DRUL7T68 0105

Workaround

Run the firmware update container again.

4.2.15. When Updating BMC Firmware, the BMC WebUI/KVM Connection Fails

Issue

When you update the BMC firmware from the BMC KVM by using one of the following options, the BMC shuts down its web service:
  • Firmware update container
  • Firmware user interface (UI)

Additional Information

The shutdown causes the BMC web UI/KVM to disconnect, but this connection is established again after the update is complete.

Workaround

  1. Wait about 23 minutes and log in to the BMC web UI again.
  2. Start the BMC KVM and verify that the BMC firmware has successfully updated.

5. DGX Station A100 Firmware

Highlights and Changes in this Release

This release is supported with the following DGX OS software:
  • DGX OS 5.0.2 or later.

5.1. Contents of the DGX Station A100 Firmware Container

Here is a list of the firmware components for DGX Station A100.

Table 2. DGX Station A100 Firmware

Component            Version
BMC                  1.08.00
SBIOS                10.03
VBIOS                80GB: 92.00.38.00.01
                     40GB: 92.00.48.00.01
FPGA                 2.71
Storage Backplane    0.3
PSU                  3.8
Retimer              1.0.125

Notices