Using the DGX A100 FW Update Utility

The NVIDIA DGX A100 System Firmware Update utility is provided both as a tarball and as a .run file. Copy the files to the DGX A100 system, then update the firmware using one of the following three methods:
  • Using NVSM, which provides convenient commands to update the firmware using the firmware update container
  • Using Docker to run the firmware update container
  • Using the .run file, which is a self-extracting package embedding the firmware update container tarball
CAUTION:
  • Stop all unnecessary system activities before attempting to update firmware.
  • Stop all GPU activity, including accessing nvidia-smi, as this can prevent the VBIOS from updating.
  • Do not add additional loads on the system (such as user jobs, diagnostics, or monitoring services) while an update is in progress. A high workload can disrupt the firmware update process and result in an unusable component.
  • When you initiate an update, the update software helps determine the activity state of the DGX system and warns you if it detects activity levels above a predetermined threshold. If you see this warning, you are strongly advised to reduce the workload before proceeding with the update.
Note: Fan speeds may increase while updating the BMC firmware. This is a normal part of the BMC firmware update process.
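The update software performs its own activity check, as described in the caution above. As an optional extra pre-flight step, the sketch below compares the 1-minute load average from /proc/loadavg against a threshold you choose. The function name load_below and any threshold you pass are hypothetical; this is not the check the update software performs, and it does not detect GPU activity, so it does not replace stopping GPU jobs and nvidia-smi.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of the firmware update utility.
# load_below THRESHOLD [LOADAVG_FILE]
# Succeeds (returns 0) if the 1-minute load average is below THRESHOLD,
# returns 1 if it is not, and returns 2 if the file cannot be read.
load_below() {
    local threshold="$1"
    local file="${2:-/proc/loadavg}"
    local load
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load _ < "$file" || return 2
    # Bash cannot compare floating-point numbers, so awk does the comparison.
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !((l + 0) < (t + 0)) }'
}
```

For example, `load_below 1.0 || echo "System is busy; reduce the workload before updating." >&2` prints a reminder when the load is at or above the chosen threshold. Pick a threshold that suits your system; the value 1.0 is only an illustration.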

Using NVSM

The NVIDIA DGX A100 system software includes the Docker software required to run the container.
  1. Copy the tarball to a location on the DGX system.
  2. From the directory where you copied the tarball, enter the following command to load the container image.
    $ sudo docker load -i nvfw-dgxa100_21.05.7_210519.tar.gz 
  3. To verify that the container image is loaded, enter the following.
    $ sudo docker images 
    
    REPOSITORY    TAG 
    nvfw-dgxa100  21.05.7
  4. Using NVSM interactive mode, enter the firmware update module.
    $ sudo nvsm
    nvsm-> cd systems/localhost/firmware/install
  5. Set the flags corresponding to the action you want to take.
    nvsm(/systems/localhost/firmware/install)-> set Flags=<option>

    See the Command and Argument List section below for the list of common flags.

  6. Set the container image to run.
    nvsm(/systems/localhost/firmware/install)-> set DockerImageRef=nvfw-dgxa100:21.05.7
    
  7. Run the command.
    nvsm(/systems/localhost/firmware/install)-> start
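NVSM requires each space in the Flags value to be escaped with a backslash (for example, update_fw\ BMC\ SBIOS). The sketch below builds that line from ordinary words so it can be pasted at the nvsm prompt in step 5. The helper name nvsm_flags is hypothetical and is not part of NVSM or the update utility; it only prints text and runs nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT part of NVSM.
# nvsm_flags WORD...
# Prints a "set Flags=..." line in which the words are joined by
# backslash-escaped spaces, as the NVSM firmware install module expects.
nvsm_flags() {
    local out="" word
    for word in "$@"; do
        if [ -z "$out" ]; then
            out="$word"
        else
            # "\\" inside double quotes yields a single literal backslash.
            out="$out\\ $word"
        fi
    done
    printf 'set Flags=%s\n' "$out"
}
```

For example, `nvsm_flags update_fw BMC SBIOS` prints `set Flags=update_fw\ BMC\ SBIOS`, which you then enter after the `nvsm(/systems/localhost/firmware/install)->` prompt.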
    

Using docker run

The NVIDIA DGX A100 system software includes the Docker software required to run the container.
  1. Copy the tarball to a location on the DGX system.
  2. From the directory where you copied the tarball, enter the following command to load the container image.
    $ sudo docker load -i nvfw-dgxa100_21.05.7_210519.tar.gz 
  3. To verify that the container image is loaded, enter the following.
    $ sudo docker images 
    
    REPOSITORY    TAG 
    nvfw-dgxa100  21.05.7
  4. Use the following syntax to run the container image.
    $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:21.05.7 <command> [arg1] [arg2] ... [argn]
See the Command and Argument List section below for the list of common commands and arguments.
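Steps 3 and 4 can be scripted. The sketch below checks for the image in the output of `docker images --format '{{.Repository}}:{{.Tag}}'` (a standard Docker CLI option) and prints, without running, the docker run line used in step 4. The helper names image_loaded and docker_fw_cmd are hypothetical; the image reference is the one used in this section.

```shell
#!/usr/bin/env bash
# Image reference used in the steps above.
IMAGE="nvfw-dgxa100:21.05.7"

# Hypothetical helper; NOT part of the update utility.
# image_loaded REF
# Reads "repository:tag" lines on stdin and succeeds if REF is one of them
# (exact, whole-line, fixed-string match).
image_loaded() {
    grep -Fqx -- "$1"
}

# Hypothetical helper; NOT part of the update utility.
# docker_fw_cmd COMMAND [ARG...]
# Prints the docker run command line for the update container; runs nothing.
docker_fw_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "usage: docker_fw_cmd COMMAND [ARG...]" >&2
        return 1
    fi
    printf 'sudo docker run --rm --privileged -ti -v /:/hostfs %s %s\n' "$IMAGE" "$*"
}
```

For example, `sudo docker images --format '{{.Repository}}:{{.Tag}}' | image_loaded "$IMAGE" && docker_fw_cmd show_version` prints the show_version command only if the image is loaded. Printing instead of executing lets you review the line before running it.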
Note: If you have the .run file but not the tarball, you can extract the tarball from the .run file by entering the following (after making the file executable, as described in Using the .run File):
    $ sudo ./nvfw-dgxa100_21.05.7_210519.run -x

Using the .run File

The update container is also available as a .run file. The .run file uses the Docker software if it is installed on the system, but can also be run without Docker installed.
  1. After obtaining the .run file, make the file executable.
    $ chmod +x nvfw-dgxa100_21.05.7_210519.run
  2. Use the following syntax to run the firmware update from the .run file.
    $ sudo ./nvfw-dgxa100_21.05.7_210519.run <command> [arg1] [arg2] ... [argn]
See the Command and Argument List section below for the list of common commands and arguments.
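Step 1 matters because `./file` fails if the file lacks execute permission. The sketch below checks that the .run file exists and is executable, then prints (does not run) the command from step 2. The helper name run_fw_cmd is hypothetical and is not part of the update utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT part of the update utility.
# run_fw_cmd RUNFILE COMMAND [ARG...]
# Prints the sudo command line for RUNFILE, or explains what is missing.
run_fw_cmd() {
    local runfile="$1"
    shift
    if [ ! -f "$runfile" ]; then
        echo "error: $runfile not found" >&2
        return 1
    fi
    if [ ! -x "$runfile" ]; then
        echo "error: $runfile is not executable; run: chmod +x $runfile" >&2
        return 1
    fi
    if [ "$#" -eq 0 ]; then
        echo "error: no command given (for example, show_version)" >&2
        return 1
    fi
    printf 'sudo %s %s\n' "$runfile" "$*"
}
```

For example, `run_fw_cmd ./nvfw-dgxa100_21.05.7_210519.run show_fw_manifest` prints `sudo ./nvfw-dgxa100_21.05.7_210519.run show_fw_manifest` once the file is executable, and prints the chmod hint otherwise.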

Command and Argument List

Common Commands and Arguments

The following are common commands and arguments.
  • Show the manifest
    show_fw_manifest
    • NVSM Example: nvsm(/systems/localhost/firmware/install)-> set Flags=show_fw_manifest
    • Docker Run Example: $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:21.05.7 show_fw_manifest
    • .run File Example: $ sudo ./nvfw-dgxa100_21.05.7_210519.run show_fw_manifest
  • Show version information
    show_version
    • NVSM Example: nvsm(/systems/localhost/firmware/install)-> set Flags=show_version
    • Docker Run Example: $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:21.05.7 show_version
    • .run File Example: $ sudo ./nvfw-dgxa100_21.05.7_210519.run show_version
  • Check the onboard firmware against the manifest and update all down-level firmware.
    update_fw all
    • NVSM Example: nvsm(/systems/localhost/firmware/install)-> set Flags=update_fw\ all

      For NVSM, escape each space in the Flags value with a backslash (\).

    • Docker Run Example: $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:21.05.7 update_fw all
    • .run File Example: $ sudo ./nvfw-dgxa100_21.05.7_210519.run update_fw all
  • Check the specified onboard firmware against the manifest and update if down-level.
    update_fw [fw] 
    Where [fw] corresponds to the specific firmware as listed in the manifest. Multiple components can be listed within the same command. The following are examples of updating the BMC and SBIOS.
    • NVSM Example: nvsm(/systems/localhost/firmware/install)-> set Flags=update_fw\ BMC\ SBIOS

      For NVSM, escape each space in the Flags value with a backslash (\).

    • Docker Run Example: $ sudo docker run --rm --privileged -ti -v /:/hostfs nvfw-dgxa100:21.05.7 update_fw BMC SBIOS
    • .run File Example: $ sudo ./nvfw-dgxa100_21.05.7_210519.run update_fw BMC SBIOS
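The examples above follow one pattern: the same command and arguments go into the NVSM Flags value (with escaped spaces), after the image name for docker run, or after the .run file name. The sketch below prints the equivalent line for each method. The helper name fw_cmd and the method names nvsm, docker, and run are hypothetical; the image and file names are the ones used in this section.

```shell
#!/usr/bin/env bash
IMAGE="nvfw-dgxa100:21.05.7"
RUNFILE="./nvfw-dgxa100_21.05.7_210519.run"

# Hypothetical helper; NOT part of the update utility.
# fw_cmd METHOD COMMAND [ARG...]
# Prints the invocation for METHOD = nvsm | docker | run; runs nothing.
fw_cmd() {
    local method="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: fw_cmd nvsm|docker|run COMMAND [ARG...]" >&2
        return 1
    fi
    case "$method" in
        nvsm)
            # Text to enter at the nvsm(/systems/localhost/firmware/install)-> prompt;
            # sed replaces each space with a backslash-escaped space.
            printf 'set Flags=%s\n' "$(printf '%s' "$*" | sed 's/ /\\ /g')"
            ;;
        docker)
            printf 'sudo docker run --rm --privileged -ti -v /:/hostfs %s %s\n' "$IMAGE" "$*"
            ;;
        run)
            printf 'sudo %s %s\n' "$RUNFILE" "$*"
            ;;
        *)
            echo "unknown method: $method" >&2
            return 1
            ;;
    esac
}
```

For example, `fw_cmd nvsm update_fw BMC SBIOS` prints `set Flags=update_fw\ BMC\ SBIOS`, and `fw_cmd run update_fw all` prints `sudo ./nvfw-dgxa100_21.05.7_210519.run update_fw all`.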

List of Arguments

Update flags: 
   Updates all, a specified combination, or an individual firmware component
   if the image currently on the device is older than the available version.
   syntax:
      update_fw  < firmware_components >
      update_fw < component [ -f | --force ] [ component options ] >
      
Update flag definitions:
   --force  Bypass the checks and update regardless of the version.
   all      Update firmware on all components.
            syntax: update_fw all
 
   SBIOS    Update the System BIOS firmware.
            syntax: update_fw SBIOS [ -a | --active ]
                                    [ -i | --inactive ]

   BMC      Update the firmware on all, or a specified Baseboard Management
            Controller.
            syntax: update_fw BMC [ -a | --active ]
                                  [ -i | --inactive ]
                                  [ -b | --bmc-access-path <BMC IP:login_id:password> ]
                                  [ -m | --intermediate-fw ]
                                  [ -t | --target-bmc <target BMC> ]
            where:
               --bmc-access-path <val>   Non-default access parameters to the BMC

   SSD      Update firmware on all, or a specified Solid State Drive.
            syntax: update_fw SSD [ -s | --select-ssd <SSD target> ]
            where:
               --select-ssd <target>  Name of the specific drive to update

   PSU      Update the firmware on all, or a specified Power Supply.
            syntax: update_fw PSU [ -s | --select-psu <PSU number> ] [ -S | --select-slot <PSU slot> ]
            where:
               --select-psu <target>  Name of the specific PSU to update.
               --select-slot <slot>   Name of the specific PSU slot to update.

   VBIOS    Update the Video BIOS firmware on all detected GPUs.
            It is not currently possible to update individual GPU devices.
            syntax: update_fw VBIOS                   

   FPGA     Update firmware on the FPGA devices on lower and upper GPU trays.
            syntax: update_fw FPGA                   

   SWITCH   Update firmware on one, a specific set, or all switch devices.
            syntax: update_fw SWITCH [ -s | --select-switch <switch-model[:BDF]> ]

   CEC      Update firmware on one or more CECs.
            syntax: update_fw CEC [ -s | --select-cec [ MB_CEC | Delta_CEC ] ]

   CPLD     Update the MB CPLD / MID CPLD firmware.
            syntax: update_fw CPLD [ -s | --select-cpld [ MB_CPLD | MID_CPLD ] ]
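Because a firmware update can take a while, it can help to catch a mistyped component name before starting. The sketch below checks update_fw component names against the names in the Update flag definitions above. The helper name check_components is hypothetical, and the exact, case-sensitive matching is a conservative assumption; the utility itself may accept other spellings. The check does not validate per-component options such as --select-psu.

```shell
#!/usr/bin/env bash
# Component names as listed under "Update flag definitions" above.
KNOWN_COMPONENTS="all SBIOS BMC SSD PSU VBIOS FPGA SWITCH CEC CPLD"

# Hypothetical helper; NOT part of the update utility.
# check_components NAME...
# Succeeds if every NAME exactly matches a listed component; otherwise
# reports each unknown name on stderr and fails.
check_components() {
    local name known ok status=0
    if [ "$#" -eq 0 ]; then
        echo "no component given" >&2
        return 1
    fi
    for name in "$@"; do
        ok=1
        for known in $KNOWN_COMPONENTS; do
            if [ "$name" = "$known" ]; then
                ok=0
                break
            fi
        done
        if [ "$ok" -ne 0 ]; then
            echo "unknown component: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_components BMC SBIOS` succeeds, while `check_components BMC GPU` reports `unknown component: GPU` and fails.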