Deploying Software Using BFB
NVIDIA® BlueField® devices support software deployment and upgrade through various BFB image types. For details on available image formats and their contents, refer to the "Types and Methods of Updating BlueField Software Image" page.
BlueField software and firmware can be deployed using one of two methods:
Update Flow | Description | Supported Image Types |
Offline Update Flow | Traditional method where the DPU or SuperNIC is taken out of service immediately when the update begins. The device reboots into maintenance mode, applies firmware, system image, and DOCA component updates, and reboots again to activate the new versions. Ensures a clean, immediate transition but involves downtime. This flow supports recovery as well. |
|
Deferred Update Flow | BlueField-3 supports a Deferred Update Flow, which enables administrators to update firmware and DOCA components without immediate service interruption. This capability allows a DPU or SuperNIC to continue servicing workloads while a new firmware bundle and user-space/kernel DOCA components are staged in the background. The new versions become active only after an activation command and reset are applied, minimizing downtime in production environments. |
|
BFB Installation Procedure
The BFB deployment process consists of these main stages:
Stage | Action | Description |
1 | Disable RShim (if applicable) | Ensure that the RShim interface is disabled on the host side where the given DPU resides to prevent interference with the BFB update process. |
2 | Transfer the BFB image | Initiate the image transfer using one of the supported methods:
|
3 | Track the update process and verify installation status through Redfish logs, BMC console, or CLI output. | |
4 | Apply the new version | Reboot the system to activate the new firmware and software. The specific reboot behavior depends on the selected update flow (offline or deferred). |
Update Flow
Changing Default Credentials Using bf.cfg
If installing the BF-Bundle BFB with BlueField Arm OS, Ubuntu users are prompted to change the default password (ubuntu) for the default user (ubuntu) upon first login. Login is not possible, even if the login prompt appears, until all services are up (i.e., the "DPU is ready" message appears in /dev/rshim0/misc).
Alternatively, Ubuntu users can provide a unique password that will be applied at the end of the BFB installation. This password must be defined in a bf.cfg configuration file. To set the password for the ubuntu user:
Create password hash. Run:
# openssl passwd -1
Password:
Verifying - Password:
$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1
Add the password hash in quotes to the bf.cfg file:
# vim bf.cfg
ubuntu_PASSWORD='$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1'
The bf.cfg file is used with the bfb-install script in the steps that follow.
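The two steps above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; write_bf_cfg is a hypothetical helper name introduced here for illustration and is not part of any NVIDIA tooling.

```shell
# Sketch only: write a bf.cfg that carries a pre-computed password hash.
# write_bf_cfg is a hypothetical helper, not an NVIDIA-provided command.
# $1 = password hash, $2 = output file (defaults to bf.cfg)
write_bf_cfg() {
    pw_hash=$1
    cfg=${2:-bf.cfg}
    # Single quotes around the value keep the '$' characters literal
    # when the file is later read as shell-style configuration.
    printf "ubuntu_PASSWORD='%s'\n" "$pw_hash" > "$cfg"
}

# Example usage (the password is passed as an argument instead of being
# typed at the prompt; -1 selects the same MD5-based scheme as above):
#   pw_hash=$(openssl passwd -1 'my-new-password')
#   write_bf_cfg "$pw_hash" bf.cfg
```

Passing the password on the command line leaves it in the shell history, so the interactive prompt shown earlier may be preferable on shared machines.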
Update Flow Image Transfer
Offline Update Flow
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"TransferProtocol":"SCP", "ImageURI":"<image_uri>","Targets":["redfish/v1/UpdateService/FirmwareInventory/DPU_OS"], "Username":"<username>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate
This command initiates a soft reset on the BlueField and then pushes the boot stream. For NVIDIA-supplied BFBs, the eMMC is flashed automatically once the boot stream is pushed. Upon success, a running message is received.
After the BMC boots, it may take a few seconds (6-8 seconds for NVIDIA® BlueField®-2, and 2 seconds for BlueField-3) until the BlueField BSP (DPU_OS) is up.
Deferred Update Flow
Supported at beta level.
Deferred update flow enables upgrading DOCA components on NVIDIA® BlueField® platforms running in DPU mode while keeping services operational throughout the process. The update is applied only after a coordinated reset, minimizing downtime.
Deferred Update Flow Prerequisites
Download the per-SKU fw-bundle BFB from the DOCA Downloads page.
Note: The installed firmware must be BSP 4.13.0 (DOCA 3.2.0) or later.
When operating in DPU mode, credentials for the DPU BMC must be specified in /etc/bf-upgrade.conf on the Arm OS, following the same format as bf.cfg. For more details, refer to "Customizing BlueField Software Deployment".
(Optional) To enable a coordinated BlueField reboot with the host reboot, perform the following configuration from the BlueField Arm OS:
mlxconfig -d /dev/mst/<device> set INT_CPU_AUTO_SHUTDOWN=1
Note: This must be configured in advance, as it requires a BlueField system-level reset to take effect.
Initiate Firmware Deferred Update Flow Transfer
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"TransferProtocol":"HTTP", "ImageURI":"<image_uri>","Targets":["redfish/v1/UpdateService/FirmwareInventory/DPU_OS"], "Username":"<username>", "Stage":true}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate
The Stage parameter is supported only when Targets is set to redfish/v1/UpdateService/FirmwareInventory/DPU_OS. A new deferred update request fails if a previous staging operation has not yet completed.
Example success message if the request is valid and a task is created:
{
"@odata.id": "/redfish/v1/TaskService/Tasks/<task_id>",
"@odata.type": "#Task.v1_4_3.Task", "Id": "<task_id>",
"TaskState": "Running", "TaskStatus": "OK"
}
Transfer Command Parameters
image_uri – contains both the remote server IP address and the full path to the <fw-bundle-sku*>.bfb file on the remote server, with one slash between the two fields (i.e., <remote_server_ip>/<full_path_of_bfb>).
Info: For example, if <remote_server_ip> is 10.10.10.10 and <full_path_of_bfb> is /tmp/file.bfb, then "ImageURI":"10.10.10.10//tmp/file.bfb".
TransferProtocol – set to one of SCP, HTTP, or HTTPS
Note: If using HTTPS, make sure the BMC has a certificate to authenticate the HTTPS server. If it does not, install a valid certificate:
curl -c cjar -b cjar -k -u root:'<password>' -X POST https://$bmc/redfish/v1/Managers/Bluefield_BMC/Truststore/Certificates -d @CAcert.json
username – username on the remote server (only required for SCP)
bmc_ip – BMC IP address
Stage – a value of True indicates a deferred flow; a value of False, or omitting this parameter, indicates an offline update flow
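Because the ImageURI format (one slash between the IP address and an absolute path, which yields a double slash) is easy to get wrong, the request body can be assembled by small helpers. This is a sketch under assumptions: image_uri and simple_update_body are hypothetical function names, and the body layout simply mirrors the curl examples on this page.

```shell
# image_uri: join the server IP and the absolute BFB path with one slash.
# $1 = remote server IP, $2 = full path of the BFB on the remote server
image_uri() {
    printf '%s/%s\n' "$1" "$2"
}

# simple_update_body: print the JSON body for UpdateService.SimpleUpdate.
# $1 = TransferProtocol (SCP, HTTP or HTTPS), $2 = ImageURI,
# $3 = remote username (only needed for SCP), $4 = Stage flag (true or false)
simple_update_body() {
    printf '{"TransferProtocol":"%s", "ImageURI":"%s","Targets":["redfish/v1/UpdateService/FirmwareInventory/DPU_OS"], "Username":"%s", "Stage":%s}\n' \
        "$1" "$2" "$3" "$4"
}

# Example usage (placeholders as in the commands above):
#   body=$(simple_update_body HTTP "$(image_uri 10.10.10.10 /tmp/file.bfb)" '<username>' true)
#   curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST \
#        -d "$body" https://<bmc_ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate
```

The helpers do no escaping, so they are only suitable for values that contain no double quotes or backslashes.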
Setting Up Secure Connection
Relevant only for SCP users with Redfish.
The following example shows how to retrieve the server's public key on Ubuntu 22.04; the procedure may differ on other OS distributions or versions.
Gather the public SSH host keys of the server holding the new.bfb file. Run the following against the server holding the new.bfb file ("Remote Server"):
Info: OpenSSH is required for this step.
ssh-keyscan -t <key_type> <remote_server_ip>
Where:
key_type – the type of key associated with the server storing the BFB file (e.g., ed25519)
remote_server_ip – the IP address of the server hosting the BFB file
Retrieve the remote server's public key from the response, and send the following Redfish command to the BlueField BMC:
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"RemoteServerIP":"<remote_server_ip>", "RemoteServerKeyString":"<remote_server_public_key>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.PublicKeyExchange
Where:
password – BlueField BMC password
remote_server_ip – the IP address of the server hosting the BFB file
remote_server_public_key – remote server's public key from the ssh-keyscan response, which contains both the type and the public key with one space between the two fields (i.e., "<type> <public_key>")
bmc_ip – BMC IP address
Extract the BMC public key information (i.e., "<type> <bmc_public_key> <username>@<hostname>") from the PublicKeyExchange response and append it to the authorized_keys file on the remote server holding the BFB file. This enables password-less key-based authentication for users.
{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "Please add the following public key info to ~/.ssh/authorized_keys on the remote server",
      "MessageArgs": [
        "<type> <bmc_public_key> root@dpu-bmc"
      ]
    },
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageArgs": [],
      "MessageId": "Base.1.15.0.Success",
      "MessageSeverity": "OK",
      "Resolution": "None"
    }
  ]
}
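The ssh-keyscan output has the form "<host> <type> <public_key>", while RemoteServerKeyString expects only "<type> <public_key>". The sketch below shows one way to strip the host field; key_string is a hypothetical helper name, and the sketch assumes a single key type was requested with -t.

```shell
# key_string: read ssh-keyscan output on stdin and print "<type> <public_key>"
# for the first key line. Lines starting with '#' (banner comments) are skipped.
key_string() {
    awk '!/^#/ && NF >= 3 { print $2 " " $3; exit }'
}

# Example usage (placeholders as in the commands above):
#   key=$(ssh-keyscan -t ed25519 <remote_server_ip> 2>/dev/null | key_string)
#   curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST \
#        -d "{\"RemoteServerIP\":\"<remote_server_ip>\", \"RemoteServerKeyString\":\"$key\"}" \
#        https://<bmc_ip>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.PublicKeyExchange
```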
Tracking Image Transfer Status and Progress
After receiving a success message for a valid SimpleUpdate request with a running task state, run the following Redfish command to track image transfer status and progress:
curl -k -u root:'<password>' -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/<task_id>
During the transfer, the PercentComplete value remains at 0 for offline update flow. If no errors occur, the TaskState is set to Running, and a keep-alive message is generated every 5 minutes. Once the transfer is completed, the PercentComplete is set to 100, and the TaskState is updated to Completed. Upon failure, a message is generated with the relevant resolution.
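The task can be polled in a loop until it leaves the Running state. The following is a minimal sketch, not a definitive implementation: task_state and wait_for_task are hypothetical helper names, the 30-second default interval is an assumption, and the sed-based extraction stands in for a proper JSON parser.

```shell
# task_state: print the TaskState value found in a task response on stdin.
task_state() {
    sed -n 's/.*"TaskState"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_task: poll a task until it is no longer Running.
# $1 = BMC IP, $2 = BMC password, $3 = task ID,
# $4 = poll interval in seconds (assumed default: 30)
wait_for_task() {
    while :; do
        state=$(curl -sk -u "root:$2" -X GET \
            "https://$1/redfish/v1/TaskService/Tasks/$3" | task_state)
        echo "TaskState: ${state:-<none>}"
        case $state in
            Running)   sleep "${4:-30}" ;;
            Completed) return 0 ;;
            # Exception, an unexpected state, or no response at all is
            # treated as a failure in this sketch.
            *)         return 1 ;;
        esac
    done
}

# Example usage: wait_for_task <bmc_ip> '<password>' <task_id>
```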
Example task response (excerpt):
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "Image 'new.bfb' is being transferred to '/dev/rshim0/boot'.",
"MessageArgs": [
"new.bfb",
"/dev/rshim0/boot"
],
"MessageId": "Update.1.0.TransferringToComponent",
"Resolution": "Transfer started",
"Severity": "OK"
},
…
"PercentComplete": 60,
"StartTime": "2024-06-10T19:39:03+00:00",
"TaskMonitor": "/redfish/v1/TaskService/Tasks/1/Monitor",
"TaskState": "Running",
"TaskStatus": "OK"
Installation Status and Activation
Tracking Offline Update Flow Installation Status
In the Offline Update Flow, once the image transfer finishes, users can use the RShim miscellaneous messages log dump to track the installation's progress and status.
Initiate request for dump download:
sudo curl -k -u root:'<password>' -d '{"DiagnosticDataType": "Manager"}' -X POST https://<ip_address>/redfish/v1/Managers/Bluefield_BMC/LogServices/Dump/Actions/LogService.CollectDiagnosticData
Where:
<ip_address> – BMC IP address
<password> – BMC password
Use the received task ID to poll for dump completion:
sudo curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<ip_address>/redfish/v1/TaskService/Tasks/<task_id>
Where:
<ip_address> – BMC IP address
<password> – BMC password
<task_id> – task ID received from the first command
Once dump is complete, download and review the dump:
sudo curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<ip_address>/redfish/v1/Managers/Bluefield_BMC/LogServices/Dump/Entries/<entry_id>/attachment --output </path/to/tar/log_dump.tar.xz>
Where:
<ip_address> – BMC IP address
<password> – BMC password
<entry_id> – the entry ID of the dump in redfish/v1/Managers/Bluefield_BMC/LogServices/Dump/Entries
</path/to/tar/log_dump.tar.xz> – path to download the log dump log_dump.tar.xz
Untar the file to review the logs. For example:
tar xvfJ log_dump.tar.xz
The log is contained in the rshim.log file. The log displays Reboot, finished, DPU is ready, or In Enhanced NIC mode when BFB installation completes.
Note: If the downloaded log file does not contain any of these strings, keep downloading the log file until they appear.
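Checking the extracted log for the completion strings can be automated with grep. This is a small sketch; bfb_install_finished is a hypothetical helper name, and the match is a plain, case-sensitive substring search for the four strings listed above.

```shell
# bfb_install_finished: succeed (exit 0) if the given rshim.log contains
# any of the strings that mark the end of BFB installation.
# $1 = path to the extracted rshim.log
bfb_install_finished() {
    grep -Eq 'Reboot|finished|DPU is ready|In Enhanced NIC mode' "$1"
}

# Example usage:
#   if bfb_install_finished rshim.log; then
#       echo "BFB installation completed"
#   else
#       echo "Not finished yet; download the dump again later"
#   fi
```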
When installation is complete, you may cross-check the reported BFB version against the version of the image you provided to verify a successful upgrade:
curl -k -u root:"<PASSWORD>" -H "Content-Type: application/json" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/DPU_OS
Example response:
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/DPU_OS",
"@odata.type": "#SoftwareInventory.v1_4_0.SoftwareInventory",
"Description": "Host image",
"Id": "DPU_OS",
"Members@odata.count": 1,
"Name": "Software Inventory",
"RelatedItem": [
  {
    "@odata.id": "/redfish/v1/Systems/Bluefield/Bios"
  }
],
"SoftwareId": "",
"Status": {
  "Conditions": [],
  "Health": "OK",
  "HealthRollup": "OK",
  "State": "Enabled"
},
"Updateable": true,
"Version": "DOCA_2.2.0_BSP_4.2.1_Ubuntu_22.04-8.23-07"
Deferred Update Flow
Checking Staging Status
Check the staging status after the transfer (i.e., the SimpleUpdate task) completes successfully. When staging succeeds, the reported status is com.nvidia.BF.Rshim.Status.Completed.
curl -k -u root:'<password>' -H "Content-Type: application/json" -X GET https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Oem/NvidiaManager.GetUpdateStatus
{
"UpdateStatus": "com.nvidia.BF.Rshim.Status.Completed"
}
The UpdateStatus can be:
'com.nvidia.BF.Rshim.Status.Invalid'
Note: This status indicates that RShim is not available on the BMC side.
'com.nvidia.BF.Rshim.Status.Idle'
'com.nvidia.BF.Rshim.Status.InProgress'
'com.nvidia.BF.Rshim.Status.Completed'
'com.nvidia.BF.Rshim.Status.Failed'
The default status is com.nvidia.BF.Rshim.Status.Idle, and it takes a while for the status to change from com.nvidia.BF.Rshim.Status.Idle to com.nvidia.BF.Rshim.Status.InProgress after the SimpleUpdate command is sent. The final status should be com.nvidia.BF.Rshim.Status.Completed or com.nvidia.BF.Rshim.Status.Failed.
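Since the status starts at Idle and only later moves to InProgress, a script should keep polling on both of those values and stop only on a final one. The sketch below illustrates that decision; update_status and staging_finished are hypothetical helper names, and treating Invalid as a failure is an assumption of this sketch.

```shell
# update_status: print the UpdateStatus value from a GetUpdateStatus
# response on stdin.
update_status() {
    sed -n 's/.*"UpdateStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# staging_finished: classify a status string.
# Returns 0 for Completed, 1 for Failed or Invalid, 2 for anything else
# (Idle or InProgress), meaning "keep polling".
staging_finished() {
    case $1 in
        *.Completed)        return 0 ;;
        *.Failed|*.Invalid) return 1 ;;
        *)                  return 2 ;;
    esac
}

# Example usage (placeholders as in the command above):
#   status=$(curl -sk -u root:'<password>' -X GET \
#       https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Oem/NvidiaManager.GetUpdateStatus | update_status)
#   staging_finished "$status"
```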
Activate the Firmware Components
Once staging is completed successfully, issue the Activate command. Activation is required to apply the new staged components:
curl -k -u root:'<password>' \
-H "Content-Type: application/json" \
-X POST https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Oem/NvidiaManager.Activate
Notes on BMC firmware activation:
Regular BFB bundle – BMC firmware is updated without needing this manual activation command.
PLDM BFB bundle – This activation command is required to apply the new BMC firmware.
DOCA Components Update
To complete an update to a new GA release, the DOCA components on the Arm OS must also be updated. Users may SSH into the DPU Arm OS and use standard Linux tools to update the DOCA components. See the section "Upgrading BlueField Using Standard Linux Tools" in the DOCA documentation for more details.
Applying New BFB Image
The following are different options for applying the new version:
Reset Type | Mode of Operation | Applying Reset Steps | Notes |
Cold Boot (AC/DC Power Cycle) |
|
|
|
Standard Warm Reboot |
|
|
|
Coordinated Reset (Server + DPU) | DPU Mode |
|
|
Verify New Components are Running
After the DPU reboots, check that the components have been updated:
curl -k -u root:'<password>' -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/DPU_NIC
curl -k -u root:'<password>' -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/DPU_ATF
curl -k -u root:'<password>' -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/DPU_UEFI
If RShim is disabled:
{
  "error": {
    "@Message.ExtendedInfo": [
      {
        "@odata.type": "#Message.v1_1_1.Message",
        "Message": "The requested resource of type Target named '/dev/rshim0/boot' was not found.",
        "MessageArgs": [
          "Target",
          "/dev/rshim0/boot"
        ],
        "MessageId": "Base.1.15.0.ResourceNotFound",
        "MessageSeverity": "Critical",
        "Resolution": "Provide a valid resource identifier and resubmit the request."
      }
    ],
    "code": "Base.1.15.0.ResourceNotFound",
    "message": "The requested resource of type Target named '/dev/rshim0/boot' was not found."
  }
}
If a username or any other required field is missing:
{
  "Username@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The create operation failed because the required property Username was missing from the request.",
      "MessageArgs": [
        "Username"
      ],
      "MessageId": "Base.1.15.0.CreateFailedMissingReqProperties",
      "MessageSeverity": "Critical",
      "Resolution": "Correct the body to include the required property with a valid value and resubmit the request if the operation failed."
    }
  ]
}
If host identity is not confirmed or the provided host key is wrong:
{
  "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
  "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.",
  "MessageArgs": [
    "<file_name>",
    "/dev/rshim0/boot"
  ],
  "MessageId": "Update.1.0.TransferFailed",
  "Resolution": "Unknown Host: Please provide server's public key using PublicKeyExchange",
  "Severity": "Critical"
}
…
"PercentComplete": 0,
"StartTime": "<start_time>",
"TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor",
"TaskState": "Exception",
"TaskStatus": "Critical"
Info: In this case, revoke the remote server key using the following Redfish command:
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"RemoteServerIP":"<remote_server_ip>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.RevokeAllRemoteServerPublicKeys
Where:
remote_server_ip – remote server's IP address
bmc_ip – BMC IP address
Then repeat the following steps:
If the BMC identity is not confirmed:
{
  "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
  "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.",
  "MessageArgs": [
    "<file_name>",
    "/dev/rshim0/boot"
  ],
  "MessageId": "Update.1.0.TransferFailed",
  "Resolution": "Unauthorized Client: Please use the PublicKeyExchange action to receive the system's public key and add it as an authorized key on the remote server",
  "Severity": "Critical"
}
…
"PercentComplete": 0,
"StartTime": "<start_time>",
"TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor",
"TaskState": "Exception",
"TaskStatus": "Critical"
Info: In this case, verify that the BMC key has been added correctly to the authorized_keys file on the remote server.
If SCP fails:
{
  "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
  "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.",
  "MessageArgs": [
    "<file_name>",
    "/dev/rshim0/boot"
  ],
  "MessageId": "Update.1.0.TransferFailed",
  "Resolution": "Failed to launch SCP",
  "Severity": "Critical"
}
…
"PercentComplete": 0,
"StartTime": "<start_time>",
"TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor",
"TaskState": "Exception",
"TaskStatus": "Critical"