image image image image image

On This Page

It is recommended to upgrade your BlueField product to the latest software and firmware versions available to benefit from new features and latest bug fixes.

This section assumes that you have installed the BlueField OS BFB on your NVIDIA® Converged Accelerator using any of the following guides:

NVIDIA® CUDA® (GPU driver) must be installed in order to use the GPU. For information on how to install CUDA on your Converged Accelerator, refer to NVIDIA CUDA Installation Guide for Linux.

Configuring Operation Mode

After installing the BFB, you may now select the mode you want your NVIDIA Converged Accelerator to operate in.

  • Standard (default) – the NVIDIA® BlueField® DPU and the GPU operate separately (GPU is owned by the host)
  • BlueField-X – the GPU is exposed to the DPU and is no longer visible on the host (GPU is owned by the DPU)

It is is important to learn your DPU's device-id for performing some of the software installations or upgrades in this guide.

To determine the device ID of the DPUs on your setup, run:

mst start
mst status -v

Example output:

MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE             MST                           PCI       RDMA            NET                       NUMA
BlueField2(rev:1)       /dev/mst/mt41686_pciconf0.1   3b:00.1   mlx5_1          net-ens1f1                0
 
BlueField2(rev:1)       /dev/mst/mt41686_pciconf0     3b:00.0   mlx5_0          net-ens1f0                0

BlueField3(rev:1)       /dev/mst/mt41692_pciconf0.1   e2:00.1   mlx5_1          net-ens7f1np1             4

BlueField3(rev:1)       /dev/mst/mt41692_pciconf0     e2:00.0   mlx5_0          net-ens7f0np0             4

The device IDs for the BlueField-2 and BlueField-3 DPUs in this example are /dev/mst/mt41686_pciconf0 and /dev/mst/mt41692_pciconf0 respectively.

BlueField-X Mode

  1. Run the following command from the host: 

    mlxconfig -d /dev/mst/<device-name> s PCI_DOWNSTREAM_PORT_OWNER[4]=0xF
  2. Power cycle the host for the configuration to take effect.

Standard Mode

To return the DPU from BlueField-X mode to Standard mode:

  1. Run the following command from the host: 

    mlxconfig -d /dev/mst/<device-name> s PCI_DOWNSTREAM_PORT_OWNER[4]=0x0
  2. Power cycle the host for the configuration to take effect.

Verifying Configured Operational Mode

Use the following command from host or from DPU:

$ sudo mlxconfig -d /dev/mst/<device-name> q PCI_DOWNSTREAM_PORT_OWNER[4]

Example of Standard mode output:

Device #1:
----------
 
[...]
 
Configurations:                              Next Boot
         PCI_DOWNSTREAM_PORT_OWNER[4]        DEVICE_DEFAULT(0)

Example of BlueField-X mode output:

Device #1:
----------
  
[...]
 
Configurations:                              Next Boot
         PCI_DOWNSTREAM_PORT_OWNER[4]        EMBEDDED_CPU(15)

Verifying GPU Ownership

The following are example outputs for when the DPU is configured to BlueField-X mode.

The GPU is no longer visible from the host:

root@host:~# lspci | grep -i nv
None

The GPU is now visible from the DPU:

ubuntu@dpu:~$ lspci | grep -i nv
06:00.0 3D controller: NVIDIA Corporation GA20B8 (rev a1)

CEC and BMC Firmware Operations

Firmware upgrade of BMC and CEC components using BMC can be performed from a remote server using the Redfish interface. The following table presents commands available for performing the upgrade:

No.

Function

Command

Required for BMC / CEC Update

Description

1

Establish Redfish connection session

export token=`curl -k -H "Content-Type: application/json" -X POST https://<bmc_ip>/login -d '{"username" : "root", "password" : "<password>"}' | grep token | awk '{print $2;}' | tr -d '"'`

Where:

  • bmc_ip – BMC IP address
  • password – Password of root account

BMC
CEC

Establish Redfish connection session

2

Trigger a secure firmware update

curl -k -H "X-Auth-Token: <token>" -H "Content-Type: application/octet-stream" -X POST -T <package_path> https://<bmc_ip>/redfish/v1/UpdateService

Where:

  • bmc_ip – BMC IP address
  • token – session token received when establishing connection
  • package_path – firmware update package path

BMC
CEC

Triggers the secure update and starts tracking the secure update progress

3

Track secure firmware update progress

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks

Find the current task ID in the response and use it for checking the progress:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/<task_id> | jq -r ' .PercentComplete'

Where:

  • bmc_ip – BMC IP address
  • token – session token received when establishing connection
  • task_id – Task ID

BMC
CEC

Tracks the firmware update progress

4

Reset/reboot a BMC

curl -k -H "X-Auth-Token: <token>" -H "Content-Type: application/json" -X POST -d '{"ResetType": "GracefulRestart"}' https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Manager.Reset

Where:

  • bmc_ip – BMC IP address
  • token – session token received when establishing connection

BMC

Resets/reboots the BMC

5

Fetch running BMC firmware version

For BlueField-3:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware | jq -r ' .Version'

Where:

  • bmc_ip – BMC IP address
  • token – session token received when establishing connection

BMC

Fetches the running firmware version from BMC

For BlueField-2:
curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory

Fetch the current firmware ID and then perform:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/<firmware_id>_BMC_Firmware | jq -r ' .Version'
Where:
  • bmc_ip – BMC IP address
  • token – session token received when establishing connection
6Fetch running CEC firmware version
curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/Bluefield_FW_ERoT | jq -r ' .Version'
Where:
  • bmc_ip – BMC IP address
  • token – session token received when establishing connection
CECFetches the running firmware version from CEC

BMC Update

After initiating the BMC secure update with the command #2 to from the previous table, a response similar to the following is received:

curl -k -H "X-Auth-Token: <token>" -H "Content-Type: application/octet-stream" -X POST -T <package_path> https://<bmc_ip>/redfish/v1/UpdateService

{
  "@odata.id": "/redfish/v1/TaskService/Tasks/0",
  "@odata.type": "#Task.v1_4_3.Task",
  "Id": "0",
  "TaskState": "Running"
}

Command #3 from the previous table can be used to track secure firmware update progress. For instance:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/0 | jq -r ' .PercentComplete'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current Dload  Upload   Total   Spent    Left  Speed
100  2123  100  2123    0     0  38600      0 --:--:-- --:--:-- --:--:-- 37910
20

Command #3 is used to verify the task has completed because during the update procedure the reboot option is disabled. When "PercentComplete" reaches 100, command #4 is used to reboot the BMC. For example:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/0 | jq -r ' .PercentComplete'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current Dload  Upload   Total   Spent    Left  Speed
100  3822  100  3822    0     0  81319      0 --:--:-- --:--:-- --:--:-- 81319
100

curl -k -H "X-Auth-Token: <token>" -H "Content-Type: application/octet-stream" -X POST -d '{"ResetType": "GracefulRestart"}' https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Manager.Reset
{
  "@Message.ExtendedInfo": [
    {
      "@odata.type": "#Message.v1_1_1.Message",
      "Message": "The request completed successfully.",
      "MessageArgs": [],
      "MessageId": "Base.1.13.0.Success",
      "MessageSeverity": "OK",
      "Resolution": "None"
    }
  ]
}

Command #5 can be used to verify the current BMC firmware version after reboot:

  • For BlueField-3:

    curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware | jq -r ' .Version'
    
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current Dload  Upload   Total   Spent    Left  Speed
    100   513  100   513    0     0   9679      0 --:--:-- --:--:-- --:--:--  9679
  • For BlueField-2:
    1. Fetch the firmware ID from FirmwareInventory:

      curl -k -H "X-Auth-Token: <token>" -X GET https:/<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/
      {
        "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory",
        "@odata.type": "#SoftwareInventoryCollection.SoftwareInventoryCollection",
        "Members": [
          {
            "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/8c8549f3_BMC_Firmware"
      …
    2. Use command #5 with the fetched firmware ID in the previous step:

      curl -k -H "X-Auth-Token: <token>" -X GET https:/<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/8c8549f3_BMC_Firmware | jq -r ' .Version'
      
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   471  100   471    0     0    622      0 --:--:-- --:--:-- --:--:--   621
      bmc-23.04

CEC Update

After initiating the BMC secure update with the command #2 to from the previous table, a response similar to the following is received:

curl -k -H "X-Auth-Token: <token>" -H "Content-Type: application/octet-stream" -X POST -T <package_path> https://<bmc_ip>/redfish/v1/UpdateService
{
  "@odata.id": "/redfish/v1/TaskService/Tasks/0",
  "@odata.type": "#Task.v1_4_3.Task",
  "Id": "0",
  "TaskState": "Running"
}

Command #3 can be used to track the progress of the CEC firmware update. For example:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/TaskService/Tasks/0 | jq -r ' .PercentComplete'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current Dload  Upload   Total   Spent    Left  Speed
100  2123  100  2123    0     0  38600      0 --:--:-- --:--:-- --:--:-- 37910
100

After the CEC secure update operation is complete, a power cycle or cold reset of the BlueField-3 DPU must be manually triggered to apply the changes once the update is finished.

Command #6 can be used to verify the current CEC firmware version after reboot:

curl -k -H "X-Auth-Token: <token>" -X GET https://<bmc_ip>/redfish/v1/UpdateService/FirmwareInventory/Bluefield_FW_ERoT | jq -r ' .Version'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   421  100   421    0     0   1172      0 --:--:-- --:--:-- --:--:--  1172
19-4

GPU Firmware

Get GPU Firmware

smbpbi: (See SMBPBI spec)

root@dpu:~# i2cset -y 3 0x4f 0x5c 0x05 0x08 0x00 0x80 s
root@dpu:~# i2cget -y 3 0x4f 0x5c ip 5
5: 0x04 0x05 0x08 0x00 0x5f
root@dpu:~# i2cget -y 3 0x4f 0x5d ip 5
5: 0x04 0x39 0x32 0x2e 0x30
root@dpu:~# 
root@dpu:~# 
root@dpu:~# i2cset -y 3 0x4f 0x5c 0x05 0x08 0x01 0x80 s
root@dpu:~# i2cget -y 3 0x4f 0x5c ip 5
5: 0x04 0x05 0x08 0x01 0x5f
root@dpu:~# i2cget -y 3 0x4f 0x5d ip 5
5: 0x04 0x30 0x2e 0x36 0x42
root@dpu:~# i2cset -y 3 0x4f 0x5c 0x05 0x08 0x02 0x80 s
root@dpu:~# i2cget -y 3 0x4f 0x5c ip 5
5: 0x04 0x05 0x08 0x02 0x5f
root@dpu:~# i2cget -y 3 0x4f 0x5d ip 5
5: 0x04 0x2e 0x30 0x30 0x2e
root@dpu:~# i2cset -y 3 0x4f 0x5c 0x05 0x08 0x03 0x80 s
root@dpu:~# i2cget -y 3 0x4f 0x5c ip 5
5: 0x04 0x05 0x08 0x03 0x5f
root@dpu:~# i2cget -y 3 0x4f 0x5d ip 5
5: 0x04 0x30 0x31 0x00 0x00
root@dpu:~#

39 32 2e 30 30 2e 36 42 2e 30 30 2e 30 31 00 00 → 92.00.6B.00.01

Updating GPU Firmware

root@dpu:~# scp root@10.23.201.227:/<path-to-fw-bin>/1004_0230_891__92006B0001-dbg-ota.bin /tmp/gpu_images/
root@10.23.201.227's password: 
1004_0230_891__92006B0001-dbg-ota.bin                                         100%  384KB 384.4KB/s   00:01    

root@dpu:~# cat /tmp/gpu_images/progress.txt 
TaskState="Running"
TaskStatus="OK"
TaskProgress="50"

root@dpu:~# cat /tmp/gpu_images/progress.txt 
TaskState="Running"
TaskStatus="OK"
TaskProgress="50"

root@dpu:~# cat /tmp/gpu_images/progress.txt 
TaskState=Frimware update succeeded.
TaskStatus=OK
TaskProgress=100