Bare-metal Reprovisioning

NVIDIA BlueField BMC Software v23.07
Warning

Relevant only for NVIDIA® BlueField®-3 and later. Not supported in NIC mode.

The BlueField-3 bare-metal network re-provisioning flow restores the BlueField-3 system without relying on external measures. It returns the system to its initial state so that the operational image can be reloaded.

To facilitate this approach, the BMC is responsible for maintaining and managing a golden image for the UEFI and the NIC. This allows the UEFI to retrieve the operational image from the network via protocols such as HTTP or PXE.

The following block diagram shows, at a high level, the system components and the data flow:

network-reprovisioning.png

The entire flow of the network re-provisioning includes the following primary stages:

  1. Initial provisioning of the golden images to the BMC.

    Note

    This process usually takes place during system manufacturing.

  2. In-field update of the golden images.

  3. OOB network configuration.

  4. Recovering the system by reinstalling the golden images.

Initial Provisioning of Golden Image to BMC

Initial provisioning of the golden images requires the BMC to be connected to the OOB network. The user copies the images from local storage to the BMC over the network using a standard scp command. Once the images are on the BMC, the user logs into the BMC and starts the provisioning process, which transfers the golden images into the BMC's non-volatile storage using a dedicated utility provided within the BMC. The BMC must remain powered on and uninterrupted during this stage to avoid potential problems.

The current flow supports provisioning of two golden images: golden_image_arm and golden_image_nic.

To copy the golden images from the local environment into the BMC, run:

  • For golden_image_nic:

    #host> scp <nic-golden-image> root@<bmc-ip>:/tmp

  • For golden_image_arm:

    #host> scp <arm-golden-image> root@<bmc-ip>:/tmp

After copying the golden images to the BMC's /tmp directory, the user must log into the BMC and execute the following commands to provision the golden images into the BMC's non-volatile storage:

  • For golden_image_nic:

    #bmc> dpu_golden_image golden_image_nic -w /tmp/<nic-golden-image>

  • For golden_image_arm:

    #bmc> dpu_golden_image golden_image_arm -w /tmp/<arm-golden-image>

Once the golden images have been provisioned to the BMC's non-volatile storage, run the following commands to verify the images. An exit status of 0 indicates that verification passed:

  • For golden_image_nic:

    #bmc> dpu_golden_image -v golden_image_nic
    #bmc> echo $?
    # Expected Output: 0

  • For golden_image_arm:

    #bmc> dpu_golden_image -v golden_image_arm
    #bmc> echo $?
    # Expected Output: 0
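
The write and verify steps above can be chained so that a failed write or a failed verification is reported immediately. The following is a minimal sketch, assuming it runs on the BMC after the image has been copied to /tmp. The function name provision_golden_image and the GOLDEN_IMAGE_TOOL variable are hypothetical and are not part of the BMC software; the variable exists only so the function can be dry-run with a stub in place of dpu_golden_image.

```shell
# Sketch: write one golden image to BMC non-volatile storage, then verify it.
# provision_golden_image and GOLDEN_IMAGE_TOOL are hypothetical names.
provision_golden_image() {
    target="$1"      # golden_image_nic or golden_image_arm
    image_path="$2"  # for example /tmp/<nic-golden-image>
    tool="${GOLDEN_IMAGE_TOOL:-dpu_golden_image}"

    # Only the two targets named in this document are accepted.
    case "$target" in
        golden_image_nic|golden_image_arm) ;;
        *) echo "unsupported target: $target" >&2; return 2 ;;
    esac
    [ -f "$image_path" ] || { echo "image not found: $image_path" >&2; return 2; }

    # Same argument order as the commands shown above.
    "$tool" "$target" -w "$image_path" || { echo "write failed" >&2; return 1; }
    "$tool" -v "$target" || { echo "verification failed" >&2; return 1; }
    echo "$target provisioned and verified"
}
```

For example, provision_golden_image golden_image_nic /tmp/<nic-golden-image> would run both steps for the NIC image and return a non-zero status if either step fails.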

Golden Image Version Information

To get the version of the golden images, run:

  • For golden_image_nic:

    #bmc> dpu_golden_image golden_image_nic -r /tmp/nic_image
    #bmc> sha256sum /tmp/nic_image

  • For golden_image_arm:

    #bmc> dpu_golden_image golden_image_arm -r /tmp/arm_image
    #bmc> sha256sum /tmp/arm_image
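
One possible use of the checksum is to compare it against the checksum of a reference copy of the image. The sketch below assumes that the file produced by the -r option is byte-identical to the image that was originally written, which this document does not state. The function name same_sha256 is hypothetical.

```shell
# Sketch: return 0 when two files have the same SHA-256 checksum,
# 1 when they differ, and 2 when either file cannot be read.
same_sha256() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # sha256sum prints "<checksum>  <file>"; keep only the checksum field.
    sum_a=$(sha256sum "$1" | cut -d ' ' -f 1)
    sum_b=$(sha256sum "$2" | cut -d ' ' -f 1)
    [ "$sum_a" = "$sum_b" ]
}
```

For example, same_sha256 /tmp/nic_image /tmp/<nic-golden-image> compares the image read back from the BMC with the copy that was uploaded.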

OOB Network Configuration

To enhance the system's security, a mechanism has been introduced to control network connectivity over the OOB network. It provides an IPMI command to disable any communication between the DPU BMC, the DPU, and the OOB management network. A set of IPMI commands is introduced to selectively enable the network on each of these interfaces. This gives the platform's RoT complete control over which network interfaces can be enabled and when.

Warning

This IPMI command can only be sent by the platform's RoT. Requests from the OOB network and the DPU are blocked.

By default, the OOB interface is enabled. For the host BMC to gain control over this interface, the host BMC must disable the interface during the initial boot. Once disabled, the interface stays disabled across BMC reboots and system cold boots.

netfunc   cmd    data                           Description
0x32      0x97   N/A                            Get 3-port switch ports mode (returns one of the mode values below)
0x32      0x98   One of the mode values below   Set 3-port switch ports mode

Mode values:

  • 0x00 – All ports are allowed access to RJ45

  • 0x01 – Only BMC is allowed access to RJ45

  • 0x02 – Only DPU is allowed access to RJ45

  • 0x1F – Neither BMC nor DPU is allowed access to RJ45
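
The mode values listed above can be translated into their descriptions with a small helper, which is convenient when reading back the current mode. The following is a minimal sketch; the function name describe_switch_mode is hypothetical, and the assumption that the get command returns a single hex byte (such as " 1f") is not confirmed by this document.

```shell
# Sketch: map a 3-port switch mode byte to its description.
# Accepts forms such as 0x1F, 1f, or " 1f".
describe_switch_mode() {
    # Normalize: drop whitespace, lowercase, then strip an optional 0x prefix.
    mode=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    mode=${mode#0x}
    case "$mode" in
        00) echo "All ports are allowed access to RJ45" ;;
        01) echo "Only BMC is allowed access to RJ45" ;;
        02) echo "Only DPU is allowed access to RJ45" ;;
        1f) echo "Neither BMC nor DPU is allowed access to RJ45" ;;
        *)  echo "Unknown mode: $1" >&2; return 1 ;;
    esac
}
```

For example, describe_switch_mode "$(ipmitool raw 0x32 0x97)" would print the description of the current mode, provided the response has the assumed single-byte form.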

Warning

In all these use cases, the internal pathway connecting the DPU and the BMC remains operational. This enables communication between the BMC and the DPU over the internal network.

Example for disabling the OOB network:

#bmc> ipmitool raw 0x32 0x98 0x1F


Golden Images Reprovisioning

The re-provisioning flow is initiated using an IPMI command:

#bmc> ipmitool raw 0x32 0x99

This command is designed to be executed exclusively from within the BMC since it has a potentially disruptive impact on the system. When the command is executed, it extracts the golden images from the DPU BMC's non-volatile memory and initiates the recovery process. Once the golden images are pushed to the RShim, the RShim console output is redirected to the BMC console, enabling the user to easily monitor the progress.

Upon successful completion of this command, both the DPU NIC and Arm execute the designated GA image fetched from a preconfigured server.

Arm OS Signal to DPU BMC When it Completes its Flow of Programming via RShim

After BFB installation is complete, the DPU BMC waits for a specific sequence of messages over the RShim log:

NIC firmware update done
Installation finished
Linux up

  • NIC firmware update done – This message indicates that the firmware update for the NIC subsystem has been successfully completed

  • Installation finished – This message signals the completion of the installation process for the BFB from the network

  • Linux up – Upon receiving this message, the DPU BMC acknowledges that the Arm OS has booted up and is ready

Make sure these messages are received in the correct sequence.
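
When troubleshooting, the ordering requirement can be checked against a saved copy of the RShim log. The following is a minimal sketch, assuming the log has already been captured to a text file (how to capture it is outside the scope of this sketch). The function name rshim_sequence_ok is hypothetical. Other log lines may appear between the three messages.

```shell
# Sketch: return 0 when the three completion messages appear in the
# expected order in the given log file, and non-zero otherwise.
rshim_sequence_ok() {
    awk '
        BEGIN {
            want[1] = "NIC firmware update done"
            want[2] = "Installation finished"
            want[3] = "Linux up"
            next_idx = 1
        }
        # Advance only when the next expected message is found on this line.
        next_idx <= 3 && index($0, want[next_idx]) > 0 { next_idx++ }
        END { exit (next_idx > 3 ? 0 : 1) }
    ' "$1"
}
```

A log in which "Linux up" appears only before "Installation finished" fails the check, because each message is searched for only after the previous one has been seen.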

Adding Entries to RShim Log from DPU Arm OS

Users can add custom entries to the RShim log from the DPU Arm OS using the bfrshlog command. The syntax of the command is: bfrshlog <output>.

For example, to add the message "Linux up" to the RShim log, run:

bfrshlog "Linux up"


Expected Output

  • All output from the DPU Arm console is redirected to the DPU BMC console for monitoring purposes.

  • The steps of the re-provisioning process are printed with [Recovery] prefix and are outlined below:

[Recovery] Checking pcie slot is in reset
[Recovery] Read golden images from flash
[Recovery] Checking rshim interface
[Recovery] Set FNP to 0
[Recovery] Starting ATF/UEFI golden image update
[Recovery] Finished updating ATF/UEFI golden image
[Recovery] Starting NIC FW golden image update
[Recovery] Finished updating NIC FW golden image
[Recovery] Stop Redfish server
[Recovery] Configure Recovery image to boot from network
[Recovery] set FNP to 1
[Recovery] Booting BFB from network
[Recovery] Start Redfish server
[Recovery] Set boot option to default
[Recovery] Finished programming image from network. Start DPU hard reset

A failed update prints the following:

[Recovery] ERROR: aborting process! PCIE is not in reset.
[Recovery] ERROR: Reading golden_image_nic failed
[Recovery] ERROR: Reading golden_image_arm failed
[Recovery] ERROR: rshim has not started successfully
[Recovery] ERROR: pushing ATF/UEFI golden image over rshim failed
[Recovery] ERROR: programming of ATF/UEFI golden image failed
[Recovery] ERROR: pushing NIC FW golden image over rshim failed
[Recovery] ERROR: programming of NIC FW golden image failed
[Recovery] ERROR: UEFI not exited boot
[Recovery] ERROR: programming of image from network failed
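
The [Recovery] messages above lend themselves to a simple pass/fail check on a saved copy of the console output. The following is a minimal sketch, assuming the output has been captured to a text file. The function name recovery_status and the three result labels are hypothetical.

```shell
# Sketch: classify a captured re-provisioning log.
# Prints "failed: <message>" (status 1), "succeeded" (status 0),
# or "incomplete" (status 2) when neither marker is present.
recovery_status() {
    # Report the first error line, if any.
    err=$(grep -F '[Recovery] ERROR:' "$1" | head -n 1)
    if [ -n "$err" ]; then
        echo "failed: ${err#*ERROR: }"
        return 1
    fi
    # The final step of a successful run, as listed above.
    if grep -F -q '[Recovery] Finished programming image from network' "$1"; then
        echo "succeeded"
        return 0
    fi
    echo "incomplete"
    return 2
}
```

The "incomplete" result covers a log that ends before the final step without reporting an error, for example when the capture is taken while the flow is still running.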


© Copyright 2023, NVIDIA. Last updated on Sep 8, 2023.