BlueField DPU Management

BlueField DPU BMC

The BlueField DPU platform incorporates an integrated baseboard management controller (BMC), the ASPEED AST2600. A BMC is a dedicated processor that monitors the physical state of a computer, network server, or other hardware device. It reads sensors and communicates with the system administrator through an independent connection, and is intended to enhance system reliability, availability, and serviceability. The on-board BMC provides security in untrusted platforms and is therefore needed in most DPU use cases.

Like the host BMC for the host, the DPU BMC is a trusted entity (with its own external root of trust, ERoT, to ensure that its firmware is secured). It enables provisioning and managing the BlueField DPU over a separate management network, using standard interfaces, protocols, and security mechanisms to manage the full lifecycle of the DPU. In addition, the DPU BMC can manage the DPU even when the DPU's OS is down, and it has a separate power input so that it can hard reset the DPU.

The main interface for the DPU BMC is a 1GbE RJ45 out-of-band (OOB) management port that connects to the cloud service provider's internal management Ethernet network or to the enterprise IT management network.

The DPU BMC allows managing the BlueField DPU as detailed in the following sections.

Remote Management Using Redfish Protocol

The BlueField DPU BMC supports Redfish, a suite of specifications that defines an industry-standard protocol providing a secure RESTful interface for managing servers, storage, networking, and converged infrastructure. Redfish replaces IPMI and provides the following advantages:

  • Human readable schemas

  • Interoperable, equally usable by apps, GUIs, and scripts

  • Extensible to add capabilities

  • Secured using HTTPS
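As an illustration of the RESTful model, the following minimal sketch builds an authenticated HTTPS GET for the Redfish service root (`/redfish/v1/`, the entry point defined by the DMTF specification). The BMC address, user name, and password are placeholders rather than documented defaults, and the use of HTTP Basic authentication is an assumption. Session-based authentication is equally valid in Redfish.

```python
import base64
import json
import urllib.request


def redfish_get_request(bmc_host: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated HTTPS GET for a Redfish resource."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{bmc_host}{path}",
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body. Requires a reachable BMC."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # 192.0.2.10 and the credentials are placeholders (assumptions).
    req = redfish_get_request("192.0.2.10", "/redfish/v1/", "admin", "example-password")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the sketch usable without a BMC on the network. A BMC that presents a self-signed certificate would additionally need a suitable TLS context passed to `urlopen`.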

Management Architecture

The following diagram illustrates the architecture and connectivity for managing the DPU.

[Figure: DPU management architecture and connectivity]


Management Interfaces

Note

See this page for a detailed description of the BlueField-3 DPU's interfaces.

Note

See here for a detailed description of the BlueField-2 DPU's interfaces.

The following table describes the interfaces available to manage the BlueField DPU.

Management Interface: OOB Management Port (1GbE RJ45)

Description: A dedicated, separate Ethernet interface to manage the DPU from the remote management controller (RMC)

Note

NVIDIA recommends using this interface as the main management interface.

Comment: Enables managing the BlueField DPU life cycle using the DPU's BMC. Supports Redfish commands to the DPU BMC (eth0). Recovery flows, monitoring, and configuration operations are all available through this interface. In addition, this physical interface allows users to SSH directly to the BlueField DPU (oob_net).

Warning

IPMI is supported for backward compatibility, but it is recommended to start new deployments with Redfish only.

Management Interface: SMBus (PCIe Golden Fingers)

Description: Enables PLDM/NC-SI over MCTP between the DPU and the host BMC

Comment: Enables the host BMC to monitor the BlueField DPU

Management Interface: PCIe

Description: PCIe interface between the DPU and host server

Comment: Enables the host to recover the BlueField DPU using RShim PCIe physical function (PF) when the host is trusted

Note

Unavailable while in zero-trust mode. Use the 1GbE OOB interface instead.


Recommended Management Approach

The DPU BMC allows managing the BlueField DPU over the 1GbE OOB interface using the Redfish protocol. The following functions are available:

  • BlueField DPU and DPU BMC upgrade and recovery

  • Monitoring of the BlueField DPU

  • BlueField DPU reset control (even when DPU OS is halted)

  • Setting BlueField UEFI configuration

  • Console interface to BlueField DPU

The following subsections describe the recommended management methods for specific tasks on the BlueField DPU.

BlueField DPU Update and Recovery

The NVIDIA BlueField DPU supports two image formats for software upgrades: the standard ISO format and the BlueField bootstream (BFB) file format.

  • Update and recovery

    • The BFB serves as both a comprehensive upgrade tool and a recovery solution for the DPU. To carry out these upgrade and recovery tasks, a remote management controller (RMC) controls the BlueField DPU's BMC using the Redfish protocol over the 1GbE OOB connection.

    • A pre-installed golden image can be used to flash and recover the DPU. This can be triggered by either the RMC or a trusted platform's BMC. For more information, refer to this page.

  • Update

    • A PXE server may be used to load an ISO image which contains the necessary updates. This can be accomplished over the 1GbE OOB interface or through the high-speed data ports using the DPU's UEFI.

Warning

After an upgrade/recovery, a system power cycle may be required to apply changes.
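As a rough sketch of how an RMC might ask a BMC to pull an image over Redfish, the following builds the JSON body for the DMTF-standard `UpdateService.SimpleUpdate` action. Whether the DPU BMC uses this action for BFB installation, and which transfer protocols and targets it accepts, are assumptions here. The image URI is a made-up example.

```python
import json

# Standard DMTF action path. Its use for BFB installation is an assumption.
SIMPLE_UPDATE_PATH = "/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate"

# Subset of the TransferProtocol values defined by the DMTF UpdateService schema.
TRANSFER_PROTOCOLS = {"HTTP", "HTTPS", "SCP", "SFTP", "TFTP", "FTP", "NFS", "CIFS"}


def simple_update_body(image_uri: str, transfer_protocol: str, targets=None) -> bytes:
    """Return the encoded JSON body for a SimpleUpdate POST."""
    if transfer_protocol not in TRANSFER_PROTOCOLS:
        raise ValueError(f"unsupported transfer protocol: {transfer_protocol}")
    body = {"ImageURI": image_uri, "TransferProtocol": transfer_protocol}
    if targets:
        # Optional list of resource URIs the image should be applied to.
        body["Targets"] = list(targets)
    return json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical image location, for illustration only.
    print(simple_update_body("https://images.example.com/dpu/image.bfb", "HTTPS").decode())
```

The resulting bytes would be sent as the body of an HTTPS POST to `SIMPLE_UPDATE_PATH` on the DPU BMC.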


DPU BMC Update

The DPU BMC can be updated by the RMC using Redfish over the 1GbE OOB port to the DPU BMC.

The DPU BMC update is A/B redundant and uses dual firmware flash. If both flash images fail to boot, the DPU BMC may be recovered from the platform's BMC using the SMBus or UART interfaces.

Please refer to this page for more information.

Warning

After an upgrade, a system power cycle may be required to apply changes.
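A Redfish client typically discovers where to upload a firmware image by reading the `UpdateService` resource, which may advertise a `MultipartHttpPushUri` and/or an `HttpPushUri` property (both defined by the DMTF schema). The sketch below picks one of them from an already-fetched resource. Which property the DPU BMC exposes is an assumption, and the sample resource is invented for illustration.

```python
def pick_push_uri(update_service: dict) -> str:
    """Return the firmware upload URI advertised by an UpdateService resource.

    Prefers the multipart endpoint and falls back to the plain HTTP push one.
    """
    for key in ("MultipartHttpPushUri", "HttpPushUri"):
        uri = update_service.get(key)
        if uri:
            return uri
    raise KeyError("UpdateService advertises no HTTP push URI")


if __name__ == "__main__":
    # Invented sample resource (assumption), not captured from a real BMC.
    sample = {"Id": "UpdateService", "HttpPushUri": "/redfish/v1/UpdateService/example-push"}
    print(pick_push_uri(sample))
```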


BlueField DPU Monitoring and Telemetry

The RMC may monitor the BlueField DPU and read its telemetry using Redfish over the 1GbE OOB port to the DPU BMC. Available data includes:

  • BlueField DPU temperatures (board, DDR, and ports), voltages, and link states

  • BlueField DPU FRU information about NIC FW, CPU, DDR, eMMC, network interface, etc.

  • Device sensor data record (SDR), sensor threshold and events, system event logs (SEL), etc.

Please refer to this page for more information.
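To show how telemetry read over Redfish might be processed on the RMC side, the sketch below summarizes a list of Redfish `Sensor` resources and flags readings at or above their upper critical threshold. The property names (`Reading`, `ReadingUnits`, `Thresholds`) follow the DMTF Sensor schema. The sensor names and values are invented sample data, not readings from a real DPU, and the resource paths under which the DPU BMC exposes its sensors are not assumed here.

```python
def over_threshold(sensor: dict, level: str = "UpperCritical") -> bool:
    """True if the sensor reading is at or above the named threshold."""
    reading = sensor.get("Reading")
    limit = sensor.get("Thresholds", {}).get(level, {}).get("Reading")
    if reading is None or limit is None:
        return False  # nothing to compare against
    return reading >= limit


def summarize(sensors: list) -> list:
    """Render one 'name: value unit' line per sensor."""
    lines = []
    for s in sensors:
        name = s.get("Name", s.get("Id", "?"))
        lines.append(f"{name}: {s.get('Reading')} {s.get('ReadingUnits', '')}".rstrip())
    return lines


# Invented sample data (assumption), shaped like Redfish Sensor resources.
SAMPLE_SENSORS = [
    {"Name": "example_board_temp", "Reading": 45.0, "ReadingUnits": "Cel",
     "Thresholds": {"UpperCritical": {"Reading": 95.0}}},
    {"Name": "example_ddr_temp", "Reading": 97.5, "ReadingUnits": "Cel",
     "Thresholds": {"UpperCritical": {"Reading": 95.0}}},
    {"Name": "example_voltage", "Reading": 12.1, "ReadingUnits": "V"},
]

if __name__ == "__main__":
    for line in summarize(SAMPLE_SENSORS):
        print(line)
    print("alarms:", [s["Name"] for s in SAMPLE_SENSORS if over_threshold(s)])
```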

BlueField DPU and DPU BMC Reset Control

The RMC may issue a reset to the BlueField DPU (soft or hard) or to the DPU BMC, both using Redfish over the 1GbE OOB port to the DPU BMC.

Please refer to these pages (1,2) for more information.
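In DMTF Redfish, a system reset and a BMC reset are both POST actions that carry a `ResetType` value: `ComputerSystem.Reset` on a member of `Systems` and `Manager.Reset` on a member of `Managers`. The sketch below builds both requests without sending them. The BMC address and the member IDs are placeholders, and the mapping of a soft reset to `GracefulRestart` and a hard reset to `ForceRestart` is an assumption about how the DPU BMC interprets these values. Authentication headers are omitted for brevity.

```python
import json
import urllib.request

# Maps a Redfish collection to the DMTF reset action defined on its members.
RESET_ACTIONS = {"Systems": "ComputerSystem.Reset", "Managers": "Manager.Reset"}


def reset_request(bmc_host: str, collection: str, member_id: str, reset_type: str) -> urllib.request.Request:
    """Build (but do not send) the POST that triggers a Redfish reset action."""
    action = RESET_ACTIONS[collection]
    url = f"https://{bmc_host}/redfish/v1/{collection}/{member_id}/Actions/{action}"
    return urllib.request.Request(
        url=url,
        data=json.dumps({"ResetType": reset_type}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and member IDs (assumptions).
    dpu = reset_request("192.0.2.10", "Systems", "example-system", "ForceRestart")
    bmc = reset_request("192.0.2.10", "Managers", "example-bmc", "GracefulRestart")
    for req in (dpu, bmc):
        print(req.get_method(), req.full_url, req.data.decode())
```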

BlueField DPU UEFI Configuration

BlueField DPU UEFI settings may be modified using Redfish over the 1GbE OOB port to the DPU BMC. This includes changing the default UEFI password (which is mandatory), setting the DPU to zero-trust mode, setting the date and time, etc.

Please refer to this page for more information.
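In the DMTF Redfish model, firmware settings are usually staged by PATCHing an `Attributes` object to the system's BIOS settings resource, and a firmware password is changed with the `Bios.ChangePassword` action. The sketch below only constructs the method, path, and body for each operation. The `Bios/Settings` path is the common convention rather than a confirmed DPU BMC path, and the system ID, attribute name, password name, and passwords are all hypothetical.

```python
def bios_settings_patch(system_id: str, attributes: dict) -> tuple:
    """Describe a PATCH that stages UEFI attribute changes for the next boot."""
    path = f"/redfish/v1/Systems/{system_id}/Bios/Settings"
    return ("PATCH", path, {"Attributes": dict(attributes)})


def bios_change_password(system_id: str, password_name: str,
                         old_password: str, new_password: str) -> tuple:
    """Describe a POST of the DMTF Bios.ChangePassword action."""
    path = f"/redfish/v1/Systems/{system_id}/Bios/Actions/Bios.ChangePassword"
    body = {
        "PasswordName": password_name,
        "OldPassword": old_password,
        "NewPassword": new_password,
    }
    return ("POST", path, body)


if __name__ == "__main__":
    # All identifiers and values below are hypothetical placeholders.
    print(bios_settings_patch("example-system", {"ExampleAttribute": "Enabled"}))
    print(bios_change_password("example-system", "ExamplePasswordName",
                               "old-example", "new-example"))
```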

Console Interface

The DPU console interface is accessible via the DPU BMC using Serial-over-LAN (SoL) over the 1GbE OOB port. The RMC may access the console interface of the BlueField DPU to track its boot progress.

Please refer to this page for more information.

© Copyright 2023, NVIDIA. Last updated on Jan 10, 2024.