Using the DGX A100 Firmware Update ISO

This section describes how to use the DGX A100 firmware update ISO to efficiently update the firmware in a large fleet of DGX A100 systems.

About the Firmware Update Menu

After the system boots to the firmware update ISO, it sets up the environment and launches a firmware update menu. The menu can be used in one of the following three modes:

  • Interactive

    This displays a text-based UI with the following choices of actions to take:

    • Start the firmware update container

      This runs the firmware update container using the update_fw all option.

    • Start the firmware update container with custom options

      This runs the firmware update container using custom arguments that you enter into a text box. Separate multiple arguments with spaces. Example:

      update_fw BMC -f
      

      See List of Arguments for available arguments.

    • Set up connection for automation and Exit

      This sets up an SSH connection (default user name is fwui and default password is fw_update) so you can run automation scripts from a different system. For example, this lets you use Ansible automation.

    • Exit

  • Non-interactive

    This reads the arguments from the kernel parameters (/proc/cmdline) and then runs the firmware update container automatically. See “Updating the Firmware Automatically” in Booting to the Firmware Update ISO from a USB Flash Drive.

  • Automation

    This sets up an SSH connection. The default user name is fwui and default password is fw_update. From there you can use automation scripts (for example, Ansible) to perform the firmware update.
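
For the automation mode, a control host needs a list of the systems that were booted to the ISO. The following shell sketch prints a minimal Ansible INI inventory that uses the default credentials described above. The function name make_inventory, the group name dgx_fw, and the example addresses are hypothetical. ansible_user and ansible_password are standard Ansible connection variables; password-based SSH from Ansible typically also requires sshpass on the control host.

```shell
# Sketch: emit an Ansible INI inventory for systems booted in automation mode.
# make_inventory and the group name dgx_fw are hypothetical, not part of the ISO.
make_inventory() {
    echo "[dgx_fw]"
    for host in "$@"; do
        # Default credentials from this section: user fwui, password fw_update.
        echo "$host ansible_user=fwui ansible_password=fw_update"
    done
}
```

For example, make_inventory 192.0.2.10 192.0.2.11 > inventory.ini writes one inventory line per system, where the addresses are placeholders for your own DGX systems.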

Booting to the Firmware Update ISO from a USB Flash Drive

This section describes how to boot to the DGX A100 firmware update ISO from a USB flash drive.

Basic Process

Download the ISO image and create a bootable USB drive that contains the ISO image.

Important

Do not use the virtual media from the BMC as the BMC will be reset during the update.
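
This guide does not prescribe a tool for creating the bootable drive. On a Linux host, one common approach is dd, sketched below under that assumption. The helper name write_iso_to_usb is hypothetical, and the target device (for example /dev/sdX) is a placeholder that you must replace with the actual USB device, because dd overwrites whatever device it is given.

```shell
# Sketch: write an ISO image to a USB flash drive with dd.
# write_iso_to_usb is a hypothetical helper; it is not shipped with the ISO.
write_iso_to_usb() {
    iso=$1
    dev=$2
    # Refuse to run unless the image exists and the target is a block device.
    [ -f "$iso" ] || { echo "ISO not found: $iso" >&2; return 1; }
    [ -b "$dev" ] || { echo "Not a block device: $dev" >&2; return 1; }
    # dd overwrites the target device; conv=fsync flushes data before dd exits.
    sudo dd if="$iso" of="$dev" bs=4M conv=fsync
}
```

The two guard checks exist so that a mistyped image path or a regular file given as the target fails early instead of reaching dd.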

Updating the Firmware Automatically

Perform the following steps to set up the firmware to update automatically when the system boots.

  1. Edit the GRUB menu parameters within the ISO at BOOT/GRUB/GRUB.CFG as follows.

    Set fwuc-mode=noninteractive.

    Set the following parameters as needed.

    • fwuc-update_args=<arg1>,<arg2> ...
      
    • fwuc-extra_args=<extra-arg1> ...
      

    See List of Arguments for available arguments.

    The following example boots the firmware update ISO in non-interactive mode and then updates the SBIOS without first checking the installed version, then reboots the system after the update.

    menuentry "Start Firmware Update Environment (Non-interactive)" {
        linux /vmlinuz boot=live console=tty0 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fwuc-mode=noninteractive fwuc-update_args=update_fw,SBIOS,-f fwuc-extra_args=reboot-after-update
        initrd /initrd
    }
    
  2. Create a bootable USB drive that contains the updated ISO.

  3. Boot to the USB drive.

  4. If the NVMe drive firmware, the FPGA, or the CEC1712 (Delta_CEC) was updated, then perform a DC power cycle by issuing the following.

    $ sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle
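
In step 1, the fwuc-update_args value is the update command with its arguments joined by commas instead of spaces, as in the update_fw,SBIOS,-f example. The following shell sketch builds that kernel parameter from an ordinary argument list. The function name fwuc_update_args is hypothetical, and the sketch assumes that no argument contains a comma or a space.

```shell
# Sketch: build the fwuc-update_args kernel parameter from a command line.
# fwuc_update_args is a hypothetical helper name.
fwuc_update_args() {
    # "$*" joins the arguments with the first character of IFS (a comma here).
    # The subshell keeps the IFS change from leaking to the caller.
    ( IFS=,; printf 'fwuc-update_args=%s\n' "$*" )
}
```

For example, fwuc_update_args update_fw SBIOS -f prints fwuc-update_args=update_fw,SBIOS,-f, which matches the parameter in the menuentry example.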
    

Booting to the Firmware Update ISO by PXE Boot

This section describes how to PXE boot to the DGX A100 firmware update ISO.

Prerequisites

Refer to the following topics for information about enabling PXE boot on the DGX system:

Procedure

  1. Download the ISO image and then mount it.

    $ sudo mount -o loop ~/DGXA100_FWUI-23.12.1-2023-11-22-11-29-20.iso /mnt
    
  2. Copy the filesystem.squashfs, initrd, and vmlinuz files to the http directory.

    $ sudo mkdir -p /local/http/firmware-update/
    $ sudo cp /mnt/live/filesystem.squashfs /local/http/firmware-update/
    $ sudo cp /mnt/{initrd,vmlinuz} /local/http/firmware-update/
    $ sudo umount /mnt
    

    The new /local/http folder structure should look like this:

    /local/http/
    ├── dgxbaseos-5.x.y
    │   ├── base_os_5.x.y.iso
    │   ├── initrd
    │   └── vmlinuz
    └── firmware-update
        ├── filesystem.squashfs
        ├── initrd
        └── vmlinuz
    
  3. Edit the /local/syslinux/efi64/pxelinux.cfg/default file to add the following menu option content for the Firmware Update OS.

    label Firmware Update Container
        menu label Firmware Update Container
        kernel http://${SERVER_IP}/firmware-update/vmlinuz
        initrd http://${SERVER_IP}/firmware-update/initrd
        append vga=788 initrd=initrd boot=live console=tty0 console=ttyS1,115200n8 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
    

    Important

    If the system is booting from the LAN port connection (enp226s0), connections to slot 4 (enp225s0f0 and enp225s0f1) must be on the same domain as the LAN port. If they are not on the same domain, then add live-netdev=enp226s0 to the append line.

    Example:

    append vga=788 initrd=initrd boot=live console=tty0 apparmor=0 live-netdev=enp226s0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
    
  4. Optional: To run the container automatically at boot, edit the following parameters in pxelinux.cfg/default:

    Set fwuc-mode=noninteractive.

    Set the following parameters as needed.

    • fwuc-update_args=<arg1>,<arg2> ...
      
    • fwuc-extra_args=<extra-arg1> ...
      

    See List of Arguments for available arguments.

    The following example boots the package in non-interactive mode and updates the SBIOS without first checking the installed version, then reboots the system after the update.

    append vga=788 initrd=initrd boot=live console=tty0 apparmor=0 elevator=noop nvme-core.multipath=n nouveau.modeset=0 fwuc-mode=noninteractive fwuc-update_args=update_fw,SBIOS,-f fwuc-extra_args=reboot-after-update boot-live-env start-systemd-networkd fetch=http://${SERVER_IP}/firmware-update/filesystem.squashfs
    
  5. Change permissions on /local.

    $ sudo chmod 755 -R /local
    
  6. PXE boot by restarting the system using ipmitool.

    $ ipmitool -I lanplus -H <DGX-BMC-IP> -U <username> -P <password> chassis bootdev pxe options=efiboot
    $ ipmitool -I lanplus -H <DGX-BMC-IP> -U <username> -P <password> chassis power reset
    

    When the system PXE menu comes up, choose the Firmware Update Container option. If non-interactive mode is configured, the firmware is updated automatically once the system has booted. Otherwise, use the firmware update menu to update the firmware.

  7. If the NVMe drive firmware, the FPGA, or the CEC1712 (Delta_CEC) was updated, then perform a DC power cycle by issuing the following.

    $ sudo ipmitool -I lanplus -H ${BMC_IP} -U ${BMC_USER} -P ${BMC_PW} chassis power cycle
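
Before PXE booting in step 6, it can help to confirm that the three files copied in step 2 are present and readable in the HTTP directory. The following shell sketch performs that check. The function name check_fw_http_dir is hypothetical, and the default path assumes the /local/http/firmware-update directory used in this procedure.

```shell
# Sketch: confirm the files copied in step 2 exist before PXE booting.
# check_fw_http_dir is a hypothetical helper name.
check_fw_http_dir() {
    dir=${1:-/local/http/firmware-update}
    missing=0
    for f in filesystem.squashfs initrd vmlinuz; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when all three files are readable, 1 otherwise.
    return "$missing"
}
```

Running check_fw_http_dir with no argument checks the default directory and reports each missing file on standard error.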