Bare-metal Reprovisioning
Relevant for NVIDIA® BlueField®-3 and later in DPU mode only (not supported in NIC mode).
The BlueField-3 bare-metal network re-provisioning flow restores the BlueField-3 system without relying on external measures. It returns the system to its initial state so that the operational image can be reloaded.
To facilitate this approach, the BMC is responsible for maintaining and managing a golden image for the UEFI and the NIC. This allows the UEFI to retrieve the operational image from the network via protocols such as HTTP or PXE.
The following block diagram describes, at a high level, the system components and the data flow:
The entire flow of the network re-provisioning includes the following primary stages:
Initial provisioning of the golden images to the BMC.
Info: This process usually takes place during system manufacturing.
In-field update process enables the updating of golden images.
OOB network configuration involves configuring the network settings.
Recovering the system by reinstalling the golden images.
To begin the initial provisioning of the golden images, the BMC must be connected to the OOB network. The user copies the images from local storage to the BMC using a standard scp command over the network. Once the images are on the BMC, the user must log into the BMC and start the provisioning process, which transfers the golden images into the BMC's non-volatile storage using a dedicated utility provided on the BMC. The BMC must remain powered on and uninterrupted during this stage to avoid potential problems.
The current flow supports the provisioning of the golden images golden_image_arm and golden_image_nic.
To copy the golden images from the local environment into the BMC, run:
For golden_image_nic:
#host> scp <nic-golden-image-directory>/<nic-golden-image-filename> root@<bmc-ip>:/tmp/golden-image-nic/
For golden_image_arm:
#host> scp <arm-golden-image-directory>/<arm-golden-image-filename> root@<bmc-ip>:/tmp/golden-image-arm/
After the candidate image is copied into the BMC's volatile memory, the version is extracted from it and stored to support certain features.
The NIC firmware version is extracted from the NIC firmware image filename, which only works if it is in the standard format of official releases.
After copying the golden images to the BMC's /tmp/golden-image-nic directory or /tmp/golden-image-arm directory, the user must log into the BMC and execute the following commands to provision the golden images into the BMC's non-volatile storage:
For golden_image_nic:
#bmc> dpu_golden_image golden_image_nic -w /tmp/golden-image-nic/<nic-golden-image-filename>
For golden_image_arm:
#bmc> dpu_golden_image golden_image_arm -w /tmp/golden-image-arm/<arm-golden-image-filename>
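The copy and provision steps above follow the same pattern for both image types, so they can be generated by a small helper. The following Python sketch only builds the two command lines shown above and does not run them. The function name `provisioning_commands` and the example file name and address are illustrative assumptions, not part of the BMC tooling.

```python
import posixpath

# Staging directories on the BMC, as used in the commands above.
STAGING_DIRS = {
    "golden_image_nic": "/tmp/golden-image-nic/",
    "golden_image_arm": "/tmp/golden-image-arm/",
}


def provisioning_commands(image_type, local_path, bmc_ip):
    """Return (scp command for the host, dpu_golden_image command for the BMC)."""
    if image_type not in STAGING_DIRS:
        raise ValueError(f"unsupported image type: {image_type}")
    staging_dir = STAGING_DIRS[image_type]
    filename = posixpath.basename(local_path)
    # Step 1 (host): copy the image into the BMC staging directory.
    scp_cmd = ["scp", local_path, f"root@{bmc_ip}:{staging_dir}"]
    # Step 2 (BMC): write the staged image to non-volatile storage.
    write_cmd = ["dpu_golden_image", image_type, "-w", staging_dir + filename]
    return scp_cmd, write_cmd
```

For example, `provisioning_commands("golden_image_nic", "/images/nic.bin", "192.0.2.10")` returns the scp argument list for the host and the matching `dpu_golden_image ... -w` argument list to run on the BMC.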
Once the golden images have been provisioned to the BMC's non-volatile storage, the user must execute the following commands to verify the correctness of the images:
For golden_image_nic:
#bmc> dpu_golden_image -v golden_image_nic
#bmc> echo $?
Expected output is 0.
For golden_image_arm:
#bmc> dpu_golden_image -v golden_image_arm
#bmc> echo $?
Expected output is 0.
This feature is available only for Golden Images installed following the upgrade of the BMC firmware to version 24.07-14 or later.
To get the human-readable version (MAJOR.MINOR.PATCH.BUILD versioning scheme) of the golden images, run:
For golden_image_nic:
#bmc> dpu_golden_image golden_image_nic -V -H
For golden_image_arm:
#bmc> dpu_golden_image golden_image_arm -V -H
To get the sha256sum values of the golden images, run:
For golden_image_nic:
#bmc> dpu_golden_image golden_image_nic -V
For golden_image_arm:
#bmc> dpu_golden_image golden_image_arm -V
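One possible use of the reported checksum is to compare it with a SHA-256 digest computed over the original image file on the host. The sketch below uses Python's standard `hashlib` module. It assumes that the value reported by the BMC is the SHA-256 of the complete image file as it was copied, which this document does not state explicitly.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        # Read fixed-size chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string can then be compared by eye or in a script with the value printed by the `-V` command.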
This feature is available only for Golden Images installed following the upgrade of the BMC firmware to version 24.07-14 or later.
To get the human-readable version (MAJOR.MINOR.PATCH.BUILD versioning scheme) of the golden images over the Redfish interface, run:
For Arm golden image:
curl -k -u '<username>':'<password>' -H 'Content-type: application/json' -X GET 'https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/golden_image_arm'
For NIC golden image:
curl -k -u '<username>':'<password>' -H 'Content-type: application/json' -X GET 'https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/golden_image_nic'
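The same query can be scripted. The following Python sketch mirrors the curl requests above using only the standard library. It assumes that the response follows the standard Redfish SoftwareInventory schema and carries the version in a `Version` property; the exact response body is not shown in this document. The function names are illustrative.

```python
import base64
import json
import ssl
import urllib.request


def inventory_url(bmc_ip, image_type):
    """Build the FirmwareInventory URL for golden_image_arm or golden_image_nic."""
    return f"https://{bmc_ip}/redfish/v1/UpdateService/FirmwareInventory/{image_type}"


def extract_version(payload):
    """Return the 'Version' property of a FirmwareInventory JSON response, if any."""
    # Assumption: the version is exposed as the standard Redfish 'Version' property.
    return json.loads(payload).get("Version")


def fetch_version(bmc_ip, image_type, username, password):
    """GET the inventory resource; certificate checks are disabled, like curl -k."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        inventory_url(bmc_ip, image_type),
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, context=context) as response:
        return extract_version(response.read())
```

Disabling certificate verification matches the `-k` flag in the curl examples; in a production script it may be preferable to trust the BMC certificate explicitly instead.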
To initiate an update, run the following command from the host:
For NIC golden image:
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"TransferProtocol":"HTTP", "ImageURI":"<remote-server-ip>/<nic-golden-image-path>","Targets":["redfish/v1/UpdateService/FirmwareInventory/golden_image_nic"]}' https://<bmc-ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate
For Arm golden image:
curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"TransferProtocol":"HTTP", "ImageURI":"<remote-server-ip>/<arm-golden-image-path>","Targets":["redfish/v1/UpdateService/FirmwareInventory/golden_image_arm"]}' https://<bmc-ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate
Where:
ImageURI – the image URI format should be <remote-server-ip>/<golden-image-path>
bmc-ip – BMC IP address
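To make the structure of the request explicit, the following Python sketch builds the URL and JSON body used in the curl commands above. It only constructs the request and does not send it. The function name `simple_update_request` and the example addresses are illustrative assumptions.

```python
import json


def simple_update_request(bmc_ip, image_type, remote_server_ip, image_path):
    """Return (url, JSON body) for a golden-image SimpleUpdate request."""
    url = f"https://{bmc_ip}/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate"
    body = {
        "TransferProtocol": "HTTP",
        # ImageURI format from the text: <remote-server-ip>/<golden-image-path>
        "ImageURI": f"{remote_server_ip}/{image_path.lstrip('/')}",
        # Target path copied from the curl examples (no leading slash).
        "Targets": [f"redfish/v1/UpdateService/FirmwareInventory/{image_type}"],
    }
    return url, json.dumps(body)
```

Passing `golden_image_nic` or `golden_image_arm` as `image_type` selects which golden image the update targets.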
After initiating the update, a new task is created for monitoring the progress:
{
  "@odata.id": "/redfish/v1/TaskService/Tasks/0",
  "@odata.type": "#Task.v1_4_3.Task",
  "Id": "0",
  "TaskState": "Running",
  "TaskStatus": "OK"
}
To track the progress of the update:
curl -k -u root:'<password>' -X GET https://<bmc-ip>/redfish/v1/TaskService/Tasks/<task-id>
The update progress (PercentComplete) takes one of three values: 0, 10, or 100. After a successful update, the following output is expected:
"PercentComplete": 100, "TaskState": "Completed", "TaskStatus": "OK"
In case of a failure, it is recommended to reboot the BMC and retry the update.
Info: The golden image update may take 1-3 minutes.
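Tracking the task can be automated with a simple polling loop. The sketch below is a minimal example, assuming the caller supplies a `fetch_task` function that returns the task resource as a dictionary (for instance, the parsed JSON from the GET request above). The polling interval and the 5-minute timeout are arbitrary choices, and the failure states are taken from the standard Redfish TaskState values rather than from this document.

```python
import time

# Standard Redfish TaskState values treated here as terminal failures.
FAILED_STATES = {"Exception", "Killed", "Cancelled"}


def wait_for_task(fetch_task, poll_seconds=10, timeout_seconds=300, sleep=time.sleep):
    """Poll a Redfish task until it completes, fails, or the timeout expires."""
    waited = 0
    while True:
        task = fetch_task()
        state = task.get("TaskState")
        if state == "Completed":
            return task
        if state in FAILED_STATES:
            raise RuntimeError(f"update task ended in state {state!r}")
        if waited >= timeout_seconds:
            raise TimeoutError("update task did not complete in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

On a `RuntimeError` or `TimeoutError`, the recommendation above applies: reboot the BMC and retry the update.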
To enhance the system's security, a new mechanism has been introduced to control network connectivity over the OOB network. This new feature provides an IPMI command to disable any communication between the BlueField BMC, BlueField, and the OOB management network. A set of IPMI commands are introduced to selectively enable the network on each of the above interfaces. This permits the platform's RoT to have complete control over which network interfaces can be enabled and when.
This IPMI command can only be sent by the platform's RoT. Requests from the OOB network and from BlueField are blocked.
By default, the OOB interface is enabled. However, for the host BMC to gain control over this interface, it must disable it during the initial boot. Once disabled, the interface remains in that state regardless of BMC reboots or system cold boots.
For more details, refer to "OOB Network 3-Port Switch Control".
The re-provisioning flow is initiated using an IPMI command:
#bmc> ipmitool raw 0x32 0x99 <golden_image_timeout> <timeout_from_network> <verbosity_level> <halt_hard_reset>
This command is designed to be executed exclusively from within the BMC since it has a potentially disruptive impact on the system. When the command is executed, it extracts the golden images from the BlueField BMC's non-volatile memory and initiates the recovery process. Once the golden images are pushed to the RShim, the RShim console output is redirected to the BMC console, enabling the user to easily monitor the progress.
Upon successful completion of this command, both the BlueField NIC and Arm execute the designated GA image fetched from a preconfigured server.
golden_image_timeout – timeout value, in minutes, for updating the golden images. Enter 0 to use the default value (15).
timeout_from_network – timeout value, in minutes, for booting the operational image from the network. Enter 0 to use the default value (60).
verbosity_level – defines the type of messages that appear during the reprovisioning process:
0 – Quiet mode; only error messages appear on the screen
1 – Info mode; only error messages and re-provisioning process messages appear on the screen
2 – Full mode; all messages appear on the screen including BlueField RShim messages
halt_hard_reset (optional) – specifies whether to halt the reprovisioning process before the final hard reset of the BlueField. This hard reset is the last step of the reprovisioning process and is necessary to activate the NIC firmware installed from the network.
Allowed values:
0 – Perform the hard reset to complete the reprovisioning process (default behavior)
1 – Halt the reprovisioning process before performing the final hard reset of the BlueField
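The parameter rules above can be captured in a small validation helper. The following Python sketch builds the `ipmitool raw` argument list and rejects values outside the documented ranges. It does not execute the command. It assumes that each timeout is passed as a single raw byte (0-255), which this document does not specify, and the function name is illustrative.

```python
def reprovision_command(verbosity_level, golden_image_timeout=0,
                        timeout_from_network=0, halt_hard_reset=None):
    """Build the argument list for the re-provisioning IPMI raw command."""
    for name, value in (("golden_image_timeout", golden_image_timeout),
                        ("timeout_from_network", timeout_from_network)):
        # Assumption: each timeout (in minutes) must fit in one raw byte.
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must be between 0 and 255")
    if verbosity_level not in (0, 1, 2):
        raise ValueError("verbosity_level must be 0, 1, or 2")
    if halt_hard_reset not in (None, 0, 1):
        raise ValueError("halt_hard_reset must be 0 or 1 when given")
    args = ["ipmitool", "raw", "0x32", "0x99",
            str(golden_image_timeout), str(timeout_from_network),
            str(verbosity_level)]
    # halt_hard_reset is optional; omit it to keep the default behavior.
    if halt_hard_reset is not None:
        args.append(str(halt_hard_reset))
    return args
```

With the defaults, a timeout of 0 selects the documented default values (15 and 60 minutes), so `reprovision_command(2)` corresponds to `ipmitool raw 0x32 0x99 0 0 2`.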
Info: Reprovisioning messages have the following prefix: [<running date> GOLDEN-IMAGE-RECOVERY].
After BFB installation is complete, the BlueField BMC waits for a specific sequence of messages over the RShim log:
NIC firmware update done
Installation finished
Linux up
NIC firmware update done – This message indicates that the firmware update for the NIC subsystem has been successfully completed
Installation finished – This message signals the completion of the installation process for the BFB from the network
Linux up – Upon receiving this message, the BlueField BMC acknowledges that the Arm OS has booted up and is ready
BlueField BMC expects these messages in the specified order.
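The ordering requirement can be illustrated with a short check over captured log lines. The Python sketch below tests whether the three messages occur in the expected order, using a simple substring match. The actual matching logic used by the BlueField BMC is not described in this document, so this is only an approximation for offline log inspection.

```python
EXPECTED_MESSAGES = ("NIC firmware update done", "Installation finished", "Linux up")


def messages_in_order(log_lines, expected=EXPECTED_MESSAGES):
    """Return True if every expected message appears in log_lines, in order."""
    remaining = iter(log_lines)
    # any() advances the shared iterator, so the search for each message
    # resumes on the line after the previous match.
    return all(any(message in line for line in remaining) for message in expected)
```

A log in which "Linux up" appears before "Installation finished" fails the check, even though all three messages are present.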
Users can add custom entries to the RShim log from the BlueField Arm OS using the bfrshlog command. The syntax of the command is: bfrshlog <output>.
For example, to add the message "Linux up" to the RShim log, run:
bfrshlog "Linux up"
All output from the BlueField Arm console is redirected to the BlueField BMC console for monitoring purposes.
The steps of the re-provisioning process are printed with the [<running date> GOLDEN-IMAGE-RECOVERY] prefix, as follows:
[<running date> GOLDEN-IMAGE-RECOVERY] Checking pcie slot is in reset
[<running date> GOLDEN-IMAGE-RECOVERY] Read golden images from flash
[<running date> GOLDEN-IMAGE-RECOVERY] Set FNP to 0
[<running date> GOLDEN-IMAGE-RECOVERY] Checking rshim interface after SOC hard reset
[<running date> GOLDEN-IMAGE-RECOVERY] Starting ATF/UEFI golden image update
[<running date> GOLDEN-IMAGE-RECOVERY] Finished updating ATF/UEFI golden image
[<running date> GOLDEN-IMAGE-RECOVERY] Starting NIC FW golden image update
[<running date> GOLDEN-IMAGE-RECOVERY] Finished updating NIC FW golden image
[<running date> GOLDEN-IMAGE-RECOVERY] Stop Redfish server
[<running date> GOLDEN-IMAGE-RECOVERY] Configure Recovery image to boot from network
[<running date> GOLDEN-IMAGE-RECOVERY] set FNP to 1
[<running date> GOLDEN-IMAGE-RECOVERY] Booting BFB from network
[<running date> GOLDEN-IMAGE-RECOVERY] Start Redfish server
[<running date> GOLDEN-IMAGE-RECOVERY] Set boot option to default
If halt_hard_reset is 0:
[<running date> GOLDEN-IMAGE-RECOVERY] Finished programming image from network. Start DPU hard reset
If halt_hard_reset is 1:
[<running date> GOLDEN-IMAGE-RECOVERY] Finished programming image from network
[<running date> GOLDEN-IMAGE-RECOVERY] The Reprovisioning process was halted at user's request. To complete the process, please power cycle the device
When re-provisioning fails, one of the following error messages is printed, depending on the step that failed:
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: aborting process! PCIE is not in reset.
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: Reading golden_image_nic failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: Reading golden_image_arm failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: rshim has not started successfully
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: pushing ATF/UEFI golden image over rshim failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: programming of ATF/UEFI golden image failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: pushing NIC FW golden image over rshim failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: programming of NIC FW golden image failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: failed to configure image to boot from network
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: programming of image from network failed: NIC firmware update failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: programming of image from network failed: Installation failed
[<running date> GOLDEN-IMAGE-RECOVERY] ERROR: programming of image from network failed: Failed to get Linux up
Due to line buffering in the BlueField Arm console, buffered output lines receive the same timestamp value in <running date> when they are redirected to the BlueField BMC console.
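When the BMC console output is captured to a file, the re-provisioning messages can be separated from the RShim output by their prefix. The following Python sketch extracts those lines and flags the ones that start with `ERROR:`. The regular expression treats everything before `GOLDEN-IMAGE-RECOVERY` inside the brackets as the date, since the exact timestamp format is not given in this document.

```python
import re

# Matches "[<running date> GOLDEN-IMAGE-RECOVERY] <message>".
RECOVERY_LINE = re.compile(
    r"^\[(?P<date>.*?)\s*GOLDEN-IMAGE-RECOVERY\]\s*(?P<message>.*)$"
)


def recovery_messages(console_lines):
    """Yield (date, message, is_error) for each re-provisioning line."""
    for line in console_lines:
        match = RECOVERY_LINE.match(line.strip())
        if match:
            message = match.group("message")
            yield match.group("date"), message, message.startswith("ERROR:")
```

Because buffered Arm console lines share a timestamp, the extracted date should be read as the time of redirection rather than the time each line was produced.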