About Firmware Updates

Firmware Updatable Components

An NVIDIA DGX H100 System has several firmware-updatable components. Some of these components are on the following two trays in the system:

  • The motherboard tray has components such as the CPUs, PCH, and BMC, as shown in the following figure:

    _images/dgx-h100-mb-tray-comp.png
  • The GPU tray has components such as the GPUs, NVSwitches, and HMC, as shown in the following figure:

    _images/dgx-h100-gpu-tray.png

You can update the firmware on NVIDIA DGX H100 System components either out-of-band (OOB) by using Redfish APIs, or from the host operating system by using command-line interface (CLI) commands.

Firmware Update Prerequisites

  • You can download firmware packages from the NVIDIA Enterprise Support Portal at https://enterprise-support.nvidia.com/s/.

  • You must know the BMC IP address, a user name, and a password. The sample commands in this document show admin for both the user name and the password.

  • You must have the nvfwupd executable or know how to use the Redfish API.
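
The prerequisites above can be checked before you start an update. The following is a minimal sketch and not part of the NVIDIA tooling: the function name check_prereqs and the variable names BMC_IP, BMC_USER, and BMC_PASS are hypothetical. It only confirms that the three values are set and that either nvfwupd or curl (as one possible Redfish client) is on the PATH. It does not contact the BMC.

```shell
#!/bin/sh
# Hypothetical helper: verify that the firmware update prerequisites are in place.
# BMC_IP, BMC_USER, and BMC_PASS are illustrative variable names, not names
# that nvfwupd or the BMC require.
check_prereqs() {
    missing=""
    # The BMC IP address, user name, and password must all be known.
    [ -n "${BMC_IP:-}" ]   || missing="$missing BMC_IP"
    [ -n "${BMC_USER:-}" ] || missing="$missing BMC_USER"
    [ -n "${BMC_PASS:-}" ] || missing="$missing BMC_PASS"
    # Either nvfwupd or a Redfish-capable client such as curl must be available.
    if ! command -v nvfwupd >/dev/null 2>&1 && ! command -v curl >/dev/null 2>&1; then
        missing="$missing nvfwupd-or-curl"
    fi
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Calling check_prereqs with none of the variables set reports each missing item and returns a non-zero status, so a wrapper script can stop before it attempts an update.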

Firmware Update Methods

Most of the sample commands in this document show how to update firmware by using the nvfwupd command. You can download the executable from the NVIDIA Enterprise Support Portal. Refer to About the nvfwupd Command for more information about the command.

You can run the nvfwupd command interactively to update systems. Most command examples in this document show this interactive approach. If you have several systems to update, you can create a JSON file that identifies the systems to update. Refer to Updating Multiple Systems for more information.

An alternative to the nvfwupd command is to update firmware by using the Redfish API. The BMC network interface provides remote management with Redfish APIs.

The Known Issues for updating firmware and the firmware update steps still apply when you use the Redfish API.

Refer to Redfish APIs Support in the NVIDIA DGX H100 User Guide for more information and sample commands. The sample commands show how to update firmware with the curl command.
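
As a rough illustration of the Redfish approach, the following sketch builds the URL of the firmware inventory collection and queries it with curl. The path /redfish/v1/UpdateService/FirmwareInventory comes from the standard DMTF Redfish schema. The helper names, the placeholder IP address, and the use of -k to skip TLS certificate verification are assumptions made for illustration. The user guide remains the reference for the commands that NVIDIA documents.

```shell
#!/bin/sh
# Hypothetical helper: build a Redfish URL for a BMC.
# $1 = BMC IP address or host name, $2 = resource path under /redfish/v1/
redfish_url() {
    printf 'https://%s/redfish/v1/%s\n' "$1" "$2"
}

# Hypothetical helper: request the firmware inventory collection.
# $1 = BMC IP address, $2 = user name, $3 = password
# -s silences progress output; -k skips TLS certificate verification, which
# assumes that the BMC presents a self-signed certificate.
list_firmware_inventory() {
    curl -s -k -u "$2:$3" "$(redfish_url "$1" UpdateService/FirmwareInventory)"
}

# Example call (not run here); 192.0.2.10 is a placeholder address:
#   list_firmware_inventory 192.0.2.10 admin admin
```

Keeping the URL construction in its own function makes it possible to reuse the same base path for other Redfish resources without repeating the scheme and prefix.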

Firmware Update Activation

After you update the firmware, you must perform one or more of the following tasks to activate it, depending on which components were updated:

  • BMC component

    Reset the BMC by running the following command:

    sudo ipmitool mc reset cold
    
  • PCIe Switch, PCIe Retimer, BIOS, and HGX (GPU Tray) components

    Perform a cold reset of the system by running the following command:

    sudo ipmitool chassis power cycle
    
  • EROT and CPLD components

    Perform an AC power cycle of the system by disconnecting power from all the power supplies and then restoring it, either manually or through an external PDU.

    Note

    An AC power cycle activates the firmware for all updated components.
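
The activation rules above can be captured in a small lookup helper. This is a sketch only: the function name activation_step and the component labels it accepts (BMC, PCIeSwitch, PCIeRetimer, BIOS, HGX, EROT, CPLD) are hypothetical shorthand for the component groups listed above, not identifiers used by nvfwupd or Redfish. The function prints the step and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print the activation step for an updated component.
# The component labels are illustrative, not nvfwupd or Redfish identifiers.
activation_step() {
    case "$1" in
        BMC)
            # The BMC is activated by a BMC cold reset.
            echo "sudo ipmitool mc reset cold" ;;
        PCIeSwitch|PCIeRetimer|BIOS|HGX)
            # These components are activated by a system power cycle.
            echo "sudo ipmitool chassis power cycle" ;;
        EROT|CPLD)
            # These components need AC power removed; there is no host command.
            echo "AC power cycle: unplug and reconnect all power supplies" ;;
        *)
            echo "unknown component: $1" >&2
            return 1 ;;
    esac
}
```

For example, activation_step BIOS prints sudo ipmitool chassis power cycle. Printing the step rather than running it leaves the decision about when to reset or power cycle the system with the operator.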