Firmware Update Steps

Before You Begin

  • Stop all unnecessary system activity before you begin the firmware update.

  • Stop all GPU activity, and do not run the nvidia-smi command. GPU activity, including nvidia-smi queries, can prevent the VBIOS update.

  • Do not add load to the system, such as user jobs, diagnostics, or monitoring services, while an update is in progress. A high workload can disrupt the firmware update process and leave a component unusable.

  • When you begin the firmware update, the update software checks the activity state of the DGX system and warns you if activity levels exceed a predetermined threshold. If you see this warning, reduce the workload before proceeding with the firmware update.

  • Fan speeds might increase during the BMC firmware update. This increase in speed is a normal part of the BMC firmware update process.
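
As a rough pre-check for the guidance above, you can look at the system load before starting. The following is a minimal sketch, assuming a POSIX shell on a Linux host that exposes /proc/loadavg. The helper name load_is_below and the threshold of 1.0 are hypothetical; the threshold is not the one the update software uses, and CPU load average says nothing about GPU activity.

```shell
# load_is_below THRESHOLD LOAD
# Succeeds (exit status 0) when LOAD is numerically below THRESHOLD.
# Hypothetical helper for illustration; not part of nvfwupd.
load_is_below() {
    threshold=$1
    load=$2
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l + 0 < t + 0) }'
}

# Read the 1-minute load average when /proc/loadavg is available (Linux).
if [ -r /proc/loadavg ]; then
    current_load=$(cut -d ' ' -f 1 /proc/loadavg)
    # 1.0 is a placeholder threshold; choose one that suits your system.
    if load_is_below 1.0 "$current_load"; then
        echo "Load $current_load looks low enough to proceed."
    else
        echo "Load $current_load is high; reduce activity before updating." >&2
    fi
fi
```

This check is only a convenience. The warning from the update software remains the authoritative signal.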

Update Steps

  1. View the installed versions compared with the newly available firmware:

    nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> \
      show_version -p <mb-tray-package> <gpu-tray-package>
    
  2. Update the BMC.

    1. Create a file, such as update_bmc.json, with the following contents:

      {
          "Targets" :["/redfish/v1/UpdateService/FirmwareInventory/HostBMC_0"]
      }
      
    2. Run the following command to update the BMC:

      nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> update_fw \
        -p <mb-tray-package> -y -s update_bmc.json
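
The targets file can also be generated from the command line instead of an editor. The following is a minimal sketch, assuming a POSIX shell. The helper name write_targets_file is hypothetical and is not part of nvfwupd. With one or more target URIs it writes a "Targets" array; with none it writes empty braces ({}).

```shell
# write_targets_file OUT_FILE [TARGET_URI...]
# Hypothetical helper: writes a JSON targets file for use with the -s option.
# URIs are inserted verbatim, so they must not contain quotes or backslashes.
write_targets_file() {
    out_file=$1
    shift
    if [ "$#" -eq 0 ]; then
        # No explicit targets: write empty braces.
        printf '{}\n' > "$out_file"
        return
    fi
    targets=""
    for uri in "$@"; do
        if [ -n "$targets" ]; then
            targets="$targets, "
        fi
        targets="$targets\"$uri\""
    done
    printf '{\n    "Targets": [%s]\n}\n' "$targets" > "$out_file"
}

# Example: recreate the BMC targets file shown in this step.
write_targets_file update_bmc.json \
    "/redfish/v1/UpdateService/FirmwareInventory/HostBMC_0"
```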
      
  3. Reboot the BMC.

    ipmitool -H <bmc-ip-address> -U <user> -P <password> -I lanplus mc reset cold
    

    Wait a couple of minutes and then confirm the BMC is back online.

    • Use the system shell:

      ipmitool -H <bmc-ip-address> -U <user> -P <password> -I lanplus mc info
      
    • Alternatively, you can access the Web UI through a browser.
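
Rather than waiting a fixed time, you can poll until the BMC responds. The following is a minimal sketch, assuming a POSIX shell. The helper name wait_until and the variables BMC_IP, BMC_USER, and BMC_PASSWORD are hypothetical, and the attempt count and delay in the example are arbitrary.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success and 1 if every attempt fails.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes BMC_IP, BMC_USER, and BMC_PASSWORD are set by you):
#   wait_until 30 10 ipmitool -H "$BMC_IP" -U "$BMC_USER" -P "$BMC_PASSWORD" \
#       -I lanplus mc info && echo "BMC is back online"
```

Polling the same mc info command used for the manual check means that success here has the same meaning as the manual confirmation.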

  4. Update the components on the motherboard tray.

    During a one-shot firmware update, the BMC processes all components in the provided bundle. However, components that already match the bundle’s version are not updated.

    Ensure the system is powered on before updating the firmware.

    1. Create a file, such as mb_tray.json, with empty braces:

      {}
      
    2. Update the firmware:

      nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> update_fw \
        -p <mb-tray-package> -y -s mb_tray.json
      
  5. Update the components on the GPU tray.

    1. Create a file, such as gpu_tray.json, with empty braces:

      {}
      
    2. Update the firmware:

      nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> update_fw \
        -p <gpu-tray-latest-package> -y -s gpu_tray.json
      

      This step updates all components in the GPU tray in parallel, such as the VBIOS, NVSwitch, ERoTs, and FPGA.

    3. Verify that the background copy completed successfully by checking for "BackgroundCopyStatus": "Completed" in the output of the following command:

      curl -s -k -u <bmc-user>:<bmc-password> -H content-type:application/json \
           -X GET https://<bmc-ip-address>/redfish/v1/Chassis/HGX_ERoT_BMC_0 | jq
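
The check for "BackgroundCopyStatus": "Completed" can be scripted rather than read by eye. The following is a minimal sketch, assuming a POSIX shell with grep. The helper name background_copy_completed is hypothetical. It matches the key and value anywhere in the response text, because the exact position of the field in the JSON document is not assumed here.

```shell
# background_copy_completed
# Hypothetical helper: reads a Redfish JSON response on standard input and
# succeeds only if it contains "BackgroundCopyStatus": "Completed".
# Whitespace around the colon is optional, so both raw and jq-formatted
# output are accepted.
background_copy_completed() {
    grep -Eq '"BackgroundCopyStatus"[[:space:]]*:[[:space:]]*"Completed"'
}

# Example (assumes BMC_IP, BMC_USER, and BMC_PASSWORD are set by you):
#   curl -s -k -u "$BMC_USER:$BMC_PASSWORD" \
#        "https://$BMC_IP/redfish/v1/Chassis/HGX_ERoT_BMC_0" \
#       | background_copy_completed && echo "Background copy completed"
```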
      
  6. Perform an AC power cycle on the system with the Restore to Defaults option (0x1).

    ipmitool -H <bmc-ip-address> -U <user> -P <password> -I lanplus raw 0x3c 0x99 0x1
    

    The command resets the BMC/IPMI configuration and reverts the user credentials to the factory default. The default credentials are admin/admin.

    1. Configure the BMC IP address:

      • DHCP configuration

        If the BMC obtains the IP address via DHCP, ensure the IP address is correctly reassigned. If DHCP is not used, consult your network administrator to determine the assigned IP address.

      • Static IP configuration

        If you need to configure a static IP address, refer to Configuring a Static IP Address for the BMC.

    2. Set the BMC administrator password:

      ipmitool -H <bmc-ip-address> -U admin -P admin -I lanplus user set password 2 <new-password>
      

      For password setting information, refer to Username and Password Requirements.

  7. Confirm the firmware update is complete by viewing the installed versions again.

    After the system is operational again, rerun the following command to confirm that all firmware has been updated:

    nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> \
      show_version -p <mb-tray-package> <gpu-tray-package>
    
  8. Execute background copy commands for the BMC and the system BIOS.

    Note

    This step is required only when the firmware versions on the two flash partitions differ. If the versions are identical, the background copy operation fails. To check the firmware versions on both partitions, log in to the BMC Web UI and view the dashboard screen.

    1. BMC:

      Background copy Redfish API request:

      curl -k -u <bmc-user>:<password> --request POST --location 'https://<bmc-ip-address>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.CommitImage' \
           --header 'Content-Type: application/json' \
           --data '{
                   "Targets": ["/redfish/v1/UpdateService/FirmwareInventory/HostBMC_0"]
                   }'
      

      Example response:

      {
         "@odata.type":"#UpdateService.v1_11_0.UpdateService",
         "Messages":[
            {
               "@odata.type":"#Message.v1_0_8.Message",
               "Message":"A new task /redfish/v1/TaskService/Tasks/1 was created.",
               "MessageArgs":[
                  "/redfish/v1/TaskService/Tasks/1"
               ],
               "MessageId":"Task.1.0.New",
               "Resolution":"None",
               "Severity":"OK"
            },
            {
               "@odata.type":"#Message.v1_0_8.Message",
               "Message":"ActivateFirmware Action is initiated.",
               "MessageId":"UpdateService.1.0.StartActivateFirmware",
               "Resolution":"None",
               "Severity":"OK"
            }
         ]
      }
      

      Query the update status using the task ID from the response, which is 1 in this example:

      nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> show_update_progress -i 1
      

      When the status indicates 100% complete, proceed with the next step.

    2. SBIOS:

      Background copy Redfish API request:

      curl -k -u <bmc-user>:<password> --request POST --location 'https://<bmc-ip-address>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.CommitImage' \
           --header 'Content-Type: application/json' \
           --data '{
                   "Targets": ["/redfish/v1/UpdateService/FirmwareInventory/HostBIOS_0"]
                   }'
      

      Find the task ID in the response, which is usually 2, and use it to query the update status:

      nvfwupd -t ip=<bmc-ip-address> user=<bmc-username> password=<bmc-password> show_update_progress -i 2
      

      When the status indicates 100% complete, proceed with the next step.
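
Because the task ID is not guaranteed to be 1 or 2, it can be read from the CommitImage response instead of assumed. The following is a minimal sketch, assuming a shell with a grep that supports the -o option (GNU and BSD grep do). The helper name extract_task_id is hypothetical. It relies only on the task URI format shown in the example response.

```shell
# extract_task_id
# Hypothetical helper: reads a CommitImage response on standard input and
# prints the numeric ID from the first /redfish/v1/TaskService/Tasks/<id>
# URI it finds. Prints nothing if no task URI is present.
extract_task_id() {
    grep -o '/redfish/v1/TaskService/Tasks/[0-9][0-9]*' \
        | head -n 1 \
        | sed 's|.*/||'
}

# Example: save the response from the curl request to a file of your
# choosing (response.json here), then pass the extracted ID to
# show_update_progress with the -i option:
#   task_id=$(extract_task_id < response.json)
```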

  9. Update the NVMe drive firmware.

    For detailed instructions, refer to Updating NVMe Device Firmware.