Bare-metal Reprovisioning
Relevant only for NVIDIA® BlueField®-3 and later. Not supported in NIC mode.
The bare-metal network re-provisioning flow restores a BlueField-3 system without relying on external measures. It returns the system to its initial state so that the operational image can be reloaded.
To facilitate this approach, the BMC is responsible for maintaining and managing a golden image for the UEFI and the NIC. This allows the UEFI to retrieve the operational image from the network via protocols such as HTTP or PXE.
The following block diagram describes, at a high level, the system components and the data flow:
The entire flow of the network re-provisioning includes the following primary stages:
Initial provisioning of the golden images to the BMC.
Note: This process usually takes place during system manufacturing.
In-field update of the golden images.
OOB network configuration.
Recovering the system by reinstalling the golden images.
Initial Provisioning of Golden Image to BMC
To begin the initial provisioning of the golden images, the BMC must be connected to the OOB network. The user copies the images from local storage to the BMC over the network using a standard scp command. Once the images are on the BMC, the user logs into the BMC and starts the provisioning process, which transfers the golden images into the BMC's non-volatile storage using a dedicated utility provided on the BMC. The BMC must remain powered on and uninterrupted during this stage to avoid potential problems.
The current flow supports the provisioning of the golden images golden_image_arm and golden_image_nic.
To copy the golden images from the local environment into the BMC, run:
For golden_image_nic:
#host> scp <nic-golden-image> root@<bmc-ip>:/tmp
For golden_image_arm:
#host> scp <arm-golden-image> root@<bmc-ip>:/tmp
After copying the golden images to the BMC's /tmp directory, the user must log into the BMC and execute the following commands to provision the golden images into the BMC's non-volatile storage:
For golden_image_nic:
#bmc> dpu_golden_image golden_image_nic -w /tmp/<nic-golden-image>
For golden_image_arm:
#bmc> dpu_golden_image golden_image_arm -w /tmp/<arm-golden-image>
Once the golden images have been provisioned to the BMC's non-volatile storage, the user must execute the following commands to verify the accuracy and correctness of the images:
For golden_image_nic:
#bmc> dpu_golden_image -v golden_image_nic
#bmc> echo $? # Expected Output: 0
For golden_image_arm:
#bmc> dpu_golden_image -v golden_image_arm
#bmc> echo $? # Expected Output: 0
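The write and verify steps above can be combined into one helper. The following is a minimal sketch, not a utility shipped with the BMC: the function names run_cmd and provision_golden_image and the DRY_RUN variable are hypothetical, and only the dpu_golden_image invocations shown above are used. With DRY_RUN=1 the commands are printed instead of executed, which allows the sequence to be reviewed before it touches non-volatile storage.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# provision_golden_image <golden_image_nic|golden_image_arm> <image-path>
# Writes the image to non-volatile storage, then verifies it.
# The body runs in a subshell so its variables do not leak to the caller.
provision_golden_image() (
    target=$1
    image=$2
    case "$target" in
        golden_image_nic|golden_image_arm) ;;
        *) echo "unsupported target: $target" >&2; exit 2 ;;
    esac
    run_cmd dpu_golden_image "$target" -w "$image" || exit 1
    # Verification is expected to return exit status 0 on success.
    run_cmd dpu_golden_image -v "$target" || exit 1
)
```

For example, running DRY_RUN=1 provision_golden_image golden_image_nic /tmp/<nic-golden-image> on the BMC prints the two commands without executing them.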
Golden Image Version Information
To identify the version of the golden images, read each image back to a file and compute its SHA-256 checksum:
For golden_image_nic:
#bmc> dpu_golden_image golden_image_nic -r /tmp/nic_image
#bmc> sha256sum /tmp/nic_image
For golden_image_arm:
#bmc> dpu_golden_image golden_image_arm -r /tmp/arm_image
#bmc> sha256sum /tmp/arm_image
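The checksum of a read-back image can be compared against a previously recorded value. The sketch below assumes a reference digest is available, for example one noted from an earlier read-back of a known-good BMC. The function names sha256_of and digest_matches are hypothetical. Whether a read-back image is byte-identical to the file originally written is not stated here, so comparing against the digest of the source file is left as an assumption to check.

```shell
# sha256_of <file>: print only the hex digest of a file.
sha256_of() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# digest_matches <file> <expected-hex-digest>
# Succeeds (exit status 0) only when the file's SHA-256 digest equals the
# expected value. The expected digest is supplied by the caller.
digest_matches() {
    [ "$(sha256_of "$1")" = "$2" ]
}
```

For example, after reading golden_image_nic back to /tmp/nic_image, digest_matches /tmp/nic_image <recorded-digest> returns 0 when the stored image matches the recorded version.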
OOB Network Configuration
To enhance the system's security, a mechanism is provided to control network connectivity over the OOB network. An IPMI command disables any communication between the DPU BMC, the DPU, and the OOB management network, and a set of IPMI commands selectively enables the network on each of these interfaces. This gives the platform's RoT complete control over which network interfaces are enabled and when.
This IPMI command can only be sent by the platform's RoT. Requests from the OOB network and from the DPU are blocked.
By default, the OOB interface is enabled. However, for the host BMC to gain control over this interface, it must disable it during the initial boot. Once disabled, the interface remains in that state regardless of BMC reboots or system cold boots.
netfunc | cmd | data | Description
0x32 | 0x97 | N/A | Get 3-port switch ports mode:
0x32 | 0x98 | 0x00 – All ports are allowed access to RJ45 | Set 3-port switch ports mode
In all these use cases, the internal pathway connecting the DPU and the BMC remains operational. This enables communication between the BMC and the DPU over the internal network.
Example for disabling the OOB network:
#bmc> ipmitool raw 0x32 0x98 0x1F
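The get and set commands from the table can be wrapped in a helper that rejects malformed data bytes before anything is sent. This is a sketch only: switch_mode_cmd is a hypothetical name, and the function prints the ipmitool command line rather than running it, since the meaning of data values other than those shown above is not covered here.

```shell
# switch_mode_cmd <get|set> [data-byte]
# Prints the ipmitool command line for the 3-port switch ports mode.
# The function does not send anything; the caller decides whether to run it.
switch_mode_cmd() (
    case "$1" in
        get)
            echo "ipmitool raw 0x32 0x97"
            ;;
        set)
            # Accept exactly one byte written as 0xNN.
            case "$2" in
                0x[0-9A-Fa-f][0-9A-Fa-f])
                    echo "ipmitool raw 0x32 0x98 $2"
                    ;;
                *)
                    echo "data byte must be written as 0xNN" >&2
                    exit 2
                    ;;
            esac
            ;;
        *)
            echo "usage: switch_mode_cmd get|set [0xNN]" >&2
            exit 2
            ;;
    esac
)
```

For example, switch_mode_cmd set 0x1F prints the same command as the example for disabling the OOB network.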
Golden Images Reprovisioning
The re-provisioning flow is initiated using an IPMI command:
#bmc> ipmitool raw 0x32 0x99
This command is designed to be executed exclusively from within the BMC since it has a potentially disruptive impact on the system. When the command is executed, it extracts the golden images from the DPU BMC's non-volatile memory and initiates the recovery process. Once the golden images are pushed to the RShim, the RShim console output is redirected to the BMC console, enabling the user to easily monitor the progress.
Upon successful completion of this command, both the DPU NIC and Arm execute the designated GA image fetched from a preconfigured server.
Arm OS Signal to DPU BMC When it Completes its Flow of Programming via RShim
After BFB installation is complete, the DPU BMC waits for a specific sequence of messages over the RShim log:
NIC firmware update done
Installation finished
Linux up
NIC firmware update done – This message indicates that the firmware update for the NIC subsystem has been successfully completed
Installation finished – This message signals the completion of the installation process for the BFB from the network
Linux up – Upon receiving this message, the DPU BMC acknowledges that the Arm OS has booted up and is ready
Make sure these messages are received in the correct sequence.
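The ordering requirement can be checked against captured RShim log text. The following sketch assumes the log has been saved or piped as plain text, one message per line. check_rshim_sequence is a hypothetical name, and the check is a simple in-order match that ignores unrelated lines between the three messages.

```shell
# check_rshim_sequence: read RShim log text on stdin.
# Succeeds only if the three completion messages appear in the documented
# order. Other lines between them are ignored.
check_rshim_sequence() {
    awk '
        BEGIN {
            want[1] = "NIC firmware update done"
            want[2] = "Installation finished"
            want[3] = "Linux up"
            next_idx = 1
        }
        # Advance only when the line contains the next expected message.
        next_idx <= 3 && index($0, want[next_idx]) { next_idx++ }
        END { exit (next_idx > 3 ? 0 : 1) }
    '
}
```

For example, check_rshim_sequence < <saved-rshim-log> returns 0 when all three messages were logged in order and 1 otherwise.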
Adding Entries to RShim Log from DPU Arm OS
Users can add custom entries to the RShim log from the DPU Arm OS using the bfrshlog command. The syntax of the command is: bfrshlog <output>.
For example, to add the message "Linux up" to the RShim log, run:
bfrshlog "Linux up"
Expected Output
All output from the DPU Arm console is redirected to the DPU BMC console for monitoring purposes.
The steps of the re-provisioning process are printed with the [Recovery] prefix and are outlined below:
[Recovery] Checking pcie slot is in reset
[Recovery] Read golden images from flash
[Recovery] Checking rshim interface
[Recovery] Set FNP to 0
[Recovery] Starting ATF/UEFI golden image update
[Recovery] Finished updating ATF/UEFI golden image
[Recovery] Starting NIC FW golden image update
[Recovery] Finished updating NIC FW golden image
[Recovery] Stop Redfish server
[Recovery] Configure Recovery image to boot from network
[Recovery] set FNP to 1
[Recovery] Booting BFB from network
[Recovery] Start Redfish server
[Recovery] Set boot option to default
[Recovery] Finished programming image from network. Start DPU hard reset
A failed update prints one of the following errors, depending on the step that failed:
[Recovery] ERROR: aborting process! PCIE is not in reset.
[Recovery] ERROR: Reading golden_image_nic failed
[Recovery] ERROR: Reading golden_image_arm failed
[Recovery] ERROR: rshim has not started successfully
[Recovery] ERROR: pushing ATF/UEFI golden image over rshim failed
[Recovery] ERROR: programming of ATF/UEFI golden image failed
[Recovery] ERROR: pushing NIC FW golden image over rshim failed
[Recovery] ERROR: programming of NIC FW golden image failed
[Recovery] ERROR: UEFI not exited boot
[Recovery] ERROR: programming of image from network failed
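Captured console output can be scanned for these messages to decide whether a recovery run succeeded. The sketch below assumes the console output is available as plain text. recovery_status is a hypothetical name, and the exit codes 0, 1, and 2 are conventions of this example rather than values defined by the BMC.

```shell
# recovery_status: read BMC console output on stdin.
# Exit status 1: a "[Recovery] ERROR:" line was found (the first one is printed).
# Exit status 0: the final "Finished programming image from network" step was seen.
# Exit status 2: neither appeared, so the run is incomplete or not captured.
recovery_status() {
    awk '
        index($0, "[Recovery] ERROR:") { print; found_error = 1; exit }
        index($0, "[Recovery] Finished programming image from network") { finished = 1 }
        END {
            if (found_error) exit 1
            exit (finished ? 0 : 2)
        }
    '
}
```

For example, recovery_status < <saved-console-log> prints the first error line, if any, and returns a status that a calling script can act on.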