Upgrading BlueField Software Components Using PLDM
The PLDM firmware update protocol provides a standardized, out-of-band (OOB) method for upgrading firmware components on devices such as the NVIDIA® BlueField®-3. It enables a platform update agent—typically the server's BMC—to transfer firmware images to the target device.
Each PLDM image is specific to a given BlueField-3 SKU.
The BlueField-3 PLDM firmware image includes the following components:
NIC firmware
ATF/UEFI
BMC firmware
CEC firmware
The PLDM image does not include the Arm OS or DOCA software.
PLDM firmware update is supported in both NIC and DPU modes of operation.
The currently installed firmware must be BSP 4.11.0/DOCA 3.0.0 or later.
When operating in DPU mode, credentials for the DPU-BMC are required; see DPU-BMC Credentials.
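The minimum-version prerequisite above can be checked with a small shell helper. This is a minimal sketch, assuming GNU `sort -V` is available; the function name `version_ge` is hypothetical, and how the installed BSP version string is obtained is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Relies on GNU coreutils `sort -V` (version sort).
version_ge() {
    installed="$1"
    minimum="$2"
    # The lower of the two versions sorts first; if that is the minimum,
    # the installed version satisfies the requirement.
    lowest=$(printf '%s\n%s\n' "$installed" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}
```

For example, `version_ge "<installed BSP version>" "4.11.0"` returns success only when the installed version meets the minimum.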
After the platform BMC completes the PLDM firmware transfer and issues the ActivateFirmware command, use one of the following methods to apply the update:
NIC Mode
Cold Boot (Server AC/DC Power Cycle)
On the next power cycle, the firmware update is applied automatically during power-up.
Warm-Reboot
After ActivateFirmware is received, the next server warm reboot updates all BlueField components.
DPU Mode
When BlueField operates in DPU mode, Linux runs on the embedded Arm cores. In this mode, PLDM firmware updates are handled by the /etc/acpi/actions/bf-upgrade script, which is triggered via ACPI events.
DPU-BMC Credentials
To update the DPU-BMC and CEC firmware, specify the necessary credentials in /etc/bf-upgrade.conf on the Arm OS.
The bf-upgrade.conf file follows the same format as bf.cfg. For more details, refer to the "Customizing BlueField Software Deployment" section.
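As an illustration of what such a credentials file might contain, the sketch below writes two bf.cfg-style assignments to a file. The parameter names `BMC_USER` and `BMC_PASSWORD` are an assumption based on the bf.cfg format and should be confirmed against the "Customizing BlueField Software Deployment" section; the helper name `write_bmc_credentials` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write DPU-BMC credentials to a bf.cfg-style file.
# BMC_USER and BMC_PASSWORD are assumed parameter names; verify them in the
# bf.cfg documentation before relying on this.
write_bmc_credentials() {
    conf_path="$1"
    bmc_user="$2"
    bmc_password="$3"
    # Create the file readable by its owner only, since it holds a password.
    (
        umask 077
        printf 'BMC_USER="%s"\nBMC_PASSWORD="%s"\n' "$bmc_user" "$bmc_password" > "$conf_path"
    )
    # Tighten permissions in case the file already existed with a wider mode.
    chmod 600 "$conf_path"
}
```

On the Arm OS this could be invoked as root, for example `write_bmc_credentials /etc/bf-upgrade.conf root '<password>'`. Passwords containing double quotes would need escaping, which this sketch does not handle.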
Cold Boot (Server AC/DC Power Cycle)
On the next power cycle, the firmware update is applied automatically during power-up.
Ensure that the Arm cores are gracefully shut down before initiating the power cycle.
Warm-Reboot Options
A standard server warm reboot in DPU mode does not trigger an update unless the Arm OS has been shut down first. Administrators have two options:
Standard Warm-Reboot
Gracefully shut down the Arm OS (performed manually by the administrator).
Initiate a server warm reboot at a later time to reset and update the BlueField DPU NIC and Arm Complex.
Coordinated Reset (Server and DPU Together)
When enabled, the administrator can set a trigger that allows the next server warm reboot to reset and update the BlueField DPU NIC and Arm Complex.
This reduces the overall system downtime needed to apply a pending image.
Step 1: Enable Auto-Shutdown for the Embedded CPU (One-time non-volatile configuration)
mlxconfig -d /dev/mst/<device> set INT_CPU_AUTO_SHUTDOWN=1
This configuration activates the mechanism for a coordinated graceful shutdown and device reset during a server warm reboot (only if triggered by the administrator, see Step 2).
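To confirm that the setting was stored, the parameter can be queried with mlxconfig. The wrapper below is a minimal sketch; the function name `query_auto_shutdown` is hypothetical, and the exact output layout depends on the installed MFT version.

```shell
#!/bin/sh
# Hypothetical wrapper: print the stored INT_CPU_AUTO_SHUTDOWN setting.
# $1 is the MST device path, e.g. /dev/mst/<device>.
query_auto_shutdown() {
    mlxconfig -d "$1" query INT_CPU_AUTO_SHUTDOWN
}
```

Calling `query_auto_shutdown /dev/mst/<device>` should list the parameter together with its configured value.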
Step 2: Trigger the Coordinated Reset
After the PLDM update is complete and a pending firmware image exists, the administrator can choose a convenient time to set the trigger, so that the next server warm reboot also gracefully shuts down the Arm OS and resets the DPU in a single flow.
On the Arm OS, run the following command using the MFT mlxreg tool:
mlxreg -d /dev/mst/<device> -y --set "reset_trigger=c" --reg_name="MFRL"
This sets a flag so that the next warm reboot will:
Shut down the BlueField Arm cores
Reset the NIC, Arm Complex, and BMC
Boot from the new firmware image
If the reset trigger is not set, the BlueField device ignores warm-reboot events.
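Because a warm reboot has no effect on the device unless the trigger is set, it can be useful to read the MFRL register back before rebooting the server. The wrapper below is a minimal sketch using the mlxreg `--get` option; the function name `show_reset_register` is hypothetical, and the fields shown in the output depend on the firmware and MFT version.

```shell
#!/bin/sh
# Hypothetical wrapper: dump the current contents of the MFRL register.
# $1 is the MST device path, e.g. /dev/mst/<device>.
show_reset_register() {
    mlxreg -d "$1" --reg_name MFRL --get
}
```

Running `show_reset_register /dev/mst/<device>` on the Arm OS prints the register fields, which can then be inspected for the reset trigger value.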