Deploying BlueField Software Using BFB from BMC

1.0
Note

It is recommended to upgrade your BlueField product to the latest software and firmware versions available to benefit from new features and latest bug fixes.

Warning

This section assumes that a BlueField DPU has already been installed in a server according to the instructions detailed in the DPU's hardware user guide.

The following table lists an overview of the steps required to install Ubuntu BFB on your DPU:

Step

Procedure

Direct Link

1

Verify that RShim is already running on BMC

Ensure RShim is Running on BMC

2

Change the default credentials using bf.cfg file (optional)

Changing Default Credentials Using bf.cfg

3

Install the Ubuntu BFB image

BFB Installation

4

Verify installation completed successfully

Verify BFB is Installed

5

Upgrade the firmware on your DPU

Firmware Upgrade

Warning

It is is important to learn your DPU's device-id for performing some of the software installations or upgrades in this guide.

To determine the device ID of the DPUs on your setup, run:

Copy
Copied!
            

mst start mst status -v

Example output:

Copy
Copied!
            

MST modules: ------------ MST PCI module is not loaded MST PCI configuration module loaded PCI devices: ------------ DEVICE_TYPE MST PCI RDMA NET NUMA BlueField2(rev:1) /dev/mst/mt41686_pciconf0.1 3b:00.1 mlx5_1 net-ens1f1 0   BlueField2(rev:1) /dev/mst/mt41686_pciconf0 3b:00.0 mlx5_0 net-ens1f0 0   BlueField3(rev:1)       /dev/mst/mt41692_pciconf0.1   e2:00.1   mlx5_1          net-ens7f1np1             4   BlueField3(rev:1)       /dev/mst/mt41692_pciconf0     e2:00.0   mlx5_0          net-ens7f0np0             4

The device IDs for the BlueField-2 and BlueField-3 DPUs in this example are /dev/mst/mt41686_pciconf0 and /dev/mst/mt41692_pciconf0 respectively.

Display the current setting. Run:

Copy
Copied!
            

# cat /dev/rshim<N>/misc | grep DEV_NAME DEV_NAME usb-1.0

This output indicates that the RShim service is ready to use. If you do not receive this output:

  1. Restart RShim service:

    Copy
    Copied!
                

    sudo systemctl restart rshim

  2. Verify the current setting again. Run:

    Copy
    Copied!
                

    # cat /dev/rshim<N>/misc | grep DEV_NAME

    If DEV_NAME does not appear, then proceed to "RShim driver not loading on DPU with integrated BMC".

Ubuntu users are prompted to change the default password (ubuntu) for the default user (ubuntu) upon first login. Logging in will not be possible even if the login prompt appears until all services are up ("DPU is ready" message appears in /dev/rshim0/misc).

Warning

Attempting to log in before all services are up prints the following message: "Permission denied, please try again."

Alternatively, Ubuntu users can provide a unique password that will be applied at the end of the BFB installation. This password would need to be defined in a bf.cfg configuration file. To set the password for the ubuntu user:

  1. Create password hash. Run:

    Copy
    Copied!
                

    # openssl passwd -1 Password: Verifying - Password: $1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1

  2. Add the password hash in quotes to the bf.cfg file:

    Copy
    Copied!
                

    # vim bf.cfg ubuntu_PASSWORD='$1$3B0RIrfX$TlHry93NFUJzg3Nya00rE1'

    The bf.cfg file will be used with the bfb-install script in the following step.

    Warning

    Password policy:

    • Minimum password length – 8

    • At least one upper-case letter

    • At least one lower-case letter

    • At least one numerical character

Warning

For comprehensive list of the supported parameters to customize bf.cfg during BFB installation, refer to section "bf.cfg Parameters".

To update the software on the BlueField Arm cores, the DPU must be booted up without mounting the eMMC flash device. This requires an external boot flow where a BFB (which includes ATF, UEFI, Linux kernel, and initramfs) is pushed from an external host via USB or PCIe. On BlueField DPUs with an integrated BMC, the USB interface is internally connected to the BMC and is enabled by default. Therefore, you must verify that the RShim driver is running on the BMC. This provides the ability to push a bootstream over the USB interface to perform an external boot.

The BFB installation procedure consists of the following main stages:

  1. Enabling RShim on the BMC.

  2. Initiating the BFB update procedure by transferring the BFB image using one of the following options:

    1. Direct SCP.

      1. Running an SCP command.

    2. Redfish interface:

      1. Confirming the identity of the host and BMC—required only during first-time setup or after BMC factory reset.

      2. Sending an Update-Service Simple-Update request.

Enabling RShim on BMC

  1. Disable RShim on the host. Run on the host:

    Copy
    Copied!
                

    systemctl stop rshim systemctl disable rshim

  2. Enable RShim on the BMC using the Redfish interface:

    Copy
    Copied!
                

    curl -k -u root:'<password>' -H "Content-Type: application/json" -XPATCH -d '{ "BmcRShim": { "BmcRShimEnabled": true } }' https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Oem/Nvidia

    If this fails, repeat steps 1 and 2.

    Note

    To disable RShim on the BMC, set BmcRShimEnabled to false.

Transferring BFB Image

Since the BFB is too large to store on the BMC flash or tmpfs, the image must be written to the RShim device. This can be done by either running SCP directly or using the Redfish interface. This can be done by either running SCP directly or using the Redfish interface.

Direct SCP

Copy
Copied!
            

scp <path_to_bfb> root@<bmc_ip>:/dev/rshim0/boot


Redfish

The following is a simple sequence diagram illustrating the flow of the BFB installation process with detailed instructions outlining each step in this process afterwards.

redfish-transferring-bfb-image.png

  1. Confirm the identity of the host and BMC.

    Note

    Required only during first-time setup or after BMC factory reset.

    1. Run the following on the host:

      Copy
      Copied!
                  

      ssh-keyscan -t <key_type> <remote_server_ip>

      Where:

      • key_type – the type of key associated with the server storing the BFB file (e.g., ed25519)

      • remote_server_ip – the IP address of the server hosting the BFB file

    2. Retrieve the remote server's public key from the response and run the following on the host:

      Copy
      Copied!
                  

      curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"RemoteServerIP":"<remote_server_ip>", "RemoteServerKeyString":"<remote_server_public_key>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.PublicKeyExchange

      Where:

      • remote_server_ip – the IP address of the server hosting the BFB file

      • remote_server_public_key – remote server's public key from the ssh-keyscan response, which contains both the type and the public key with a space between the two fields (i.e., "<type> <public_key>").

      • bmc_ip – BMC IP address

    3. Extract the BMC public key information (i.e., "<type> <bmc_public_key> <username>@<hostname>") from the PublicKeyExchange response and append it to the authorized_keys file on the remote server. This enables passwordless key-based authentication for users.

      Copy
      Copied!
                  

      { "@Message.ExtendedInfo": [ { "@odata.type": "#Message.v1_1_1.Message", "Message": "Please add the following public key info to ~/.ssh/authorized_keys on the remote server", "MessageArgs": [ "<type> <bmc_public_key> root@dpu-bmc" ] }, { "@odata.type": "#Message.v1_1_1.Message", "Message": "The request completed successfully.", "MessageArgs": [], "MessageId": "Base.1.15.0.Success", "MessageSeverity": "OK", "Resolution": "None" } ] }

      Append "<bmc_key_type> <bmc_public_key> root@dpu-bmc".

    4. If the remote server public key must be revoked, use the following command before repeating the previous step:

      Copy
      Copied!
                  

      curl -k -u root:'<password>' -H "Content-Type: application/json" -X POST -d '{"RemoteServerIP":"<remote_server_ip>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/Oem/NvidiaUpdateService.RevokeAllRemoteServerPublicKeys

      Where:

      • remote_server_ip – remote server's IP address

      • bmc_ip – BMC IP address

  2. Run the following on the host:

    Copy
    Copied!
                

    curl -k -H root:'<password>' -H "Content-Type: application/json" -X POST -d '{"TransferProtocol":"SCP", "ImageURI":"<image_uri>","Targets":["redfish/v1/UpdateService/FirmwareInventory/DPU_OS"], "Username":"<username>"}' https://<bmc_ip>/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate

    Warning

    This command uses SCP for the image transfer, initiates a soft reset on the BlueField and then pushes the boot stream. For Ubuntu BFBs, the eMMC is flashed automatically once the bootstream is pushed. On success, a "running" message is received with the current task ID.

    Where:

    • image_uri – the image URI format should be <remote_server_ip>/<path_to_bfb>

    • username – username on the remote server

    • bmc_ip – BMC IP address

      Examples:

      • If RShim is disabled:

        Copy
        Copied!
                    

        {  "error": {    "@Message.ExtendedInfo": [ {        "@odata.type": "#Message.v1_1_1.Message",        "Message": "The requested resource of type Target named '/dev/rshim0/boot' was not found.",        "MessageArgs": [          "Target",          "/dev/rshim0/boot" ],        "MessageId": "Base.1.15.0.ResourceNotFound",        "MessageSeverity": "Critical",        "Resolution": "Provide a valid resource identifier and resubmit the request." } ],    "code": "Base.1.15.0.ResourceNotFound",    "message": "The requested resource of type Target named '/dev/rshim0/boot' was not found." }

      • If a username or any other required field is missing:

        Copy
        Copied!
                    

        { "Username@Message.ExtendedInfo": [ { "@odata.type": "#Message.v1_1_1.Message", "Message": "The create operation failed because the required property Username was missing from the request.", "MessageArgs": [ "Username" ], "MessageId": "Base.1.15.0.CreateFailedMissingReqProperties", "MessageSeverity": "Critical", "Resolution": "Correct the body to include the required property with a valid value and resubmit the request if the operation failed." } ] }

      • If the request is valid and a task is created:

        Copy
        Copied!
                    

        { "@odata.id": "/redfish/v1/TaskService/Tasks/<task_id>", "@odata.type": "#Task.v1_4_3.Task", "Id": "<task_id>", "TaskState": "Running", "TaskStatus": "OK" }

  3. Run the following on the host to track image transfer progress:

    Copy
    Copied!
                

    curl -k -u root:'<password>' -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/<task_id>

    Warning

    During the transfer, the PercentComplete value remains at 0. If no errors occur, the TaskState is set to Running, and a keep-alive message is generated every 5 minutes with the content "Transfer is still in progress (X minutes elapsed). Please wait". Once the transfer is completed, the PercentComplete is set to 100, and the TaskState is updated to Completed.

    Upon failure, a message is generated with the relevant resolution.

    Where:

    • bmc_ip – BMC IP address

    • task_id – task ID

      Examples:

      • If host identity is not confirmed:

        Copy
        Copied!
                    

        { "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry", "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.", "MessageArgs": [ "<file_name>, "/dev/rshim0/boot" ], "MessageId": "Update.1.0.TransferFailed", "Resolution": " Unknown Host: Please provide server's public key using PublicKeyExchange ", "Severity": "Critical" } … "PercentComplete": 0, "StartTime": "<start_time>", "TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor", "TaskState": "Exception", "TaskStatus": "Critical"

      • If the BMC identity is not confirmed:

        Copy
        Copied!
                    

        { "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry", "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.", "MessageArgs": [ "<file_name>", "/dev/rshim0/boot" ], "MessageId": "Update.1.0.TransferFailed", "Resolution": "Unauthorized Client: Please use the PublicKeyExchange action to receive the system's public key and add it as an authorized key on the remote server", "Severity": "Critical" } … "PercentComplete": 0, "StartTime": "<start_time>", "TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor", "TaskState": "Exception", "TaskStatus": "Critical"

      • If SCP fails:

        Copy
        Copied!
                    

        { "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry", "Message": "Transfer of image '<file_name>' to '/dev/rshim0/boot' failed.", "MessageArgs": [ "<file_name>", "/dev/rshim0/boot" ], "MessageId": "Update.1.0.TransferFailed", "Resolution": "Failed to launch SCP", "Severity": "Critical" } … "PercentComplete": 0, "StartTime": "<start_time>", "TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor", "TaskState": "Exception", "TaskStatus": "Critical"

      • The keep-alive message:

        Copy
        Copied!
                    

        { "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry", "Message": " <file_name>' is being transferred to '/dev/rshim0/boot'.", "MessageArgs": [ " <file_name>", "/dev/rshim0/boot" ], "MessageId": "Update.1.0.TransferringToComponent", "Resolution": "Transfer is still in progress (5 minutes elapsed): Please wait", "Severity": "OK" } … "PercentComplete": 0, "StartTime": "<start_time>", "TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor", "TaskState": "Running", "TaskStatus": "OK"

      • Upon success:

        Copy
        Copied!
                    

        { "@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry", "Message": "Device 'DPU' successfully updated with image '<file_name>'.", "MessageArgs": [ "DPU", "<file_name>" ], "MessageId": "Update.1.0.UpdateSuccessful", "Resolution": "None", "Severity": "OK" }, … "PercentComplete": 100, "StartTime": "<start_time>", "TaskMonitor": "/redfish/v1/TaskService/Tasks/<task_id>/Monitor", "TaskState": "Completed", "TaskStatus": "OK"

        Note

        Refer to section "BMC Dump Operations" under "BMC and BlueField Logs" for information on dumping the rshim.log which contains the current RShim miscellaneous messages.

Warning

For comprehensive list of the supported parameters to customize bf.cfg during BFB installation, refer to section "bf.cfg Parameters".

After installation of the Ubuntu OS is complete, the following note appears in /dev/rshim0/misc on first boot:

Copy
Copied!
            

... INFO[MISC]: Linux up INFO[MISC]: DPU is ready

"DPU is ready" indicates that all the relevant services are up and users can login the system.

After the installation of the Ubuntu 20.04 BFB, the configuration detailed in the following sections is generated.

Warning

Make sure all the services (including cloud-init) are started on BlueField before power cycling the host server.

BlueField OS image version is stored under /etc/mlnx-release in the DPU.

Copy
Copied!
            

# cat /etc/mlnx-release DOCA_v1.1_BlueField_OS_Ubuntu_20.04-<version>

Warning

mlxfwreset is not supported in this release. Please power cycle the host where mlxfwreset is requested.

To upgrade firmware:

  1. Set a temporary static IP on the host. Run:

    Copy
    Copied!
                

    sudo ip addr add 192.168.100.1/24 dev tmfifo_net0

  2. SSH to your DPU via 192.168.100.2 (preconfigured). The default credentials for Ubuntu are as follows.

    Username

    Password

    ubuntu

    Set during installation

    For example:

    Copy
    Copied!
                

    ssh ubuntu@192.168.100.2 Password: <unique-password>

  3. Upgrade the firmware on the DPU. Run:

    Copy
    Copied!
                

    sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update

    Example output:

    Copy
    Copied!
                

    Device #1: ----------   Device Type: BlueField-2 [...] Versions: Current Available FW <Old_FW> <New_FW>

    Warning

    Important! To apply NVConfig changes, stop here and follow the steps in section "Updating NVConfig Params".

  4. Power cycle the host for the changes to take effect.

  1. Reset the nvconfig params to their default values:

    Copy
    Copied!
                

    # sudo mlxconfig -d /dev/mst/<device-name> -y reset   Reset configuration for device /dev/mst/<device-name>? (y/n) [n] : y Applying... Done! -I- Please reboot machine to load new configurations.

  2. (Optional) Enable NVMe emulation. Run:

    Copy
    Copied!
                

    sudo mlxconfig -d <device-name> s NVME_EMULATION_ENABLE=1

  3. Skip this step if your BlueField DPU is Ethernet only. Please refer to section "Supported Platforms and Interoperability" under the Release Notes to learn your DPU type.

    If you have a VPI DPU, the default link type of the ports will be configured to IB. If you want to change the link type to Ethernet, please run the following configuration:

    Copy
    Copied!
                

    sudo mlxconfig -d <device-name> s LINK_TYPE_P1=2 LINK_TYPE_P2=2

  4. Power cycle the host for the mlxconfig settings to take effect.

© Copyright 2023, NVIDIA. Last updated on Nov 21, 2023.