NVIDIA BlueField BMC Software v23.09
Intelligent Platform Management Interface

The NVIDIA® BlueField® DPU provides management interfaces to the BMC and the BlueField device.

The BMC, based on the Intelligent Platform Management Interface (IPMI) standard, supports both dedicated out-of-band (OOB) interfaces and a serial port for accessing the BMC CLI.

External Host Retrieving Data from BMC Via LAN

The BMC is connected to an external host server via LAN. IPMItool commands may be issued from the external server to retrieve information from the BMC as follows:

ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN <ipmitool_arguments>
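When the same command prefix is reused in scripts, it can help to build the argument list in one place. The following is a minimal Python sketch, not part of the BMC software: the function names and the example address 192.0.2.10 are hypothetical, while the flags and the ADMIN/ADMIN credentials mirror the command above.

```python
import subprocess


def build_lan_command(bmc_ip, args, user="ADMIN", password="ADMIN"):
    """Return the argv list for an ipmitool call over IPMI LAN (lanplus, cipher suite 17)."""
    return ["ipmitool", "-C", "17", "-I", "lanplus",
            "-H", bmc_ip, "-U", user, "-P", password, *args]


def run_lan_command(bmc_ip, args):
    """Run the command and return its standard output.

    Requires ipmitool to be installed and the BMC to be reachable.
    """
    result = subprocess.run(build_lan_command(bmc_ip, args),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a real BMC.
    print(" ".join(build_lan_command("192.0.2.10", ["fru", "print"])))
```

Passing the arguments as a list avoids shell quoting issues when a sensor name or file path contains special characters.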

The sections below provide more details about the IPMItool commands which are supported.

FRU Reading

To retrieve FRU info, run:

ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN fru print <fru_id>

The <fru_id> argument is optional. The FRU ID of the BMC FRU EEPROM can be found by running the fru print command without an ID.

It is possible to dump the binary FRU data into a file. Run:

ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN fru read <fru_id> <filename>

Warning

The parameter <filename> is the absolute path to the file.
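A dumped file can be sanity-checked before further processing. The sketch below assumes the dump follows the layout of the IPMI Platform Management FRU Information Storage Definition, which begins with an 8-byte common header; this is an assumption about the file, and the function name is hypothetical.

```python
def parse_fru_common_header(data):
    """Decode the 8-byte common header at the start of a binary FRU dump."""
    if len(data) < 8:
        raise ValueError("FRU dump is shorter than the 8-byte common header")
    header = data[:8]
    # All eight header bytes, including the checksum byte, must sum to 0 modulo 256.
    if sum(header) % 256 != 0:
        raise ValueError("common header checksum mismatch")
    names = ["internal_use", "chassis_info", "board_info", "product_info", "multirecord"]
    # Each offset is stored in multiples of 8 bytes; 0 means the area is absent.
    offsets = {name: (value * 8 if value else None)
               for name, value in zip(names, header[1:6])}
    return {"format_version": header[0] & 0x0F, "area_offsets": offsets}


# Synthetic header used only for illustration: board area at byte 8, product area at byte 96.
example = bytes([0x01, 0x00, 0x00, 0x01, 0x0C, 0x00, 0x00, 0xF2])
print(parse_fru_common_header(example))
```

For a real dump, the bytes would come from reading the file in binary mode, for example `open(filename, "rb").read()`.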


System Event Log

The system event log (SEL) is a non-volatile repository for system events and certain system configuration information. Each SEL entry has a unique "record ID" field, which is used for retrieving log entries from the SEL. Record IDs are not required to be sequential or consecutive, so applications should not assume that SEL record IDs follow any particular numeric ordering.

Event logs record chassis events in the BMC software and can be read using IPMI commands.

If the SEL is full and a new event is raised, the oldest record is removed and the new one is placed at the end of the SEL.
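This overwrite policy behaves like a fixed-size queue. The following Python sketch only models the behavior described above and is not the BMC implementation; the capacity of 3 and the record names are made up for illustration.

```python
from collections import deque


def make_event_log(capacity):
    """Create a log that drops its oldest record once `capacity` is reached."""
    return deque(maxlen=capacity)


log = make_event_log(3)  # capacity chosen only for illustration
for record in ["event-1", "event-2", "event-3", "event-4"]:
    # Appending to a full deque evicts the leftmost (oldest) item.
    log.append(record)

print(list(log))  # ['event-2', 'event-3', 'event-4']
```

A consequence for log collectors is that records should be saved before the log fills up, since evicted entries cannot be recovered.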

The SEL remains accessible from the server through IPMI LAN access, even after a BlueField failure.

The following table lists the commands used to view event logs:

Command | Description
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel | Displays information about SEL
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel list | Displays list of events
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel elist | Displays extended info list of events
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel save <filename> | Saves SEL events to a file
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel clear | Clears SEL

SEL Messages

The following subsections detail the messages which are added to the BMC SEL and the scenarios that trigger them.

UEFI Boot

While the DPU UEFI is booting, messages describing the status of the UEFI boot are added to the BMC SEL.

SEL messages:

  • SMBus initialization

  • PCI resource configuration

  • System boot initiated

Example:

SEL Record ID          : 0037
 Record Type           : 02
 Timestamp             : 06:36:06 UTC 06:36:06 UTC
 Generator ID          : 0001
 EvM Revision          : 04
 Sensor Type           : System Firmwares
 Sensor Number         : 06
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : c207ff
 Description           : PCI resource configuration
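The Event Data bytes in a record like this one can be related to the Description field. The Python sketch below parses the "Key : Value" lines and decodes the event data under two assumptions that should be checked against the IPMI specification: the low nibble of the first byte is the event offset, and the second byte carries the System Firmware Progress code. The function names are hypothetical, and the code table covers only the three messages listed above.

```python
# Progress codes for sensor type 0Fh (System Firmware), event offset 02h.
FIRMWARE_PROGRESS = {
    0x07: "PCI resource configuration",
    0x0B: "SMBus initialization",
    0x13: "System boot initiated",
}

SAMPLE = """SEL Record ID          : 0037
 Record Type           : 02
 Sensor Type           : System Firmwares
 Event Data            : c207ff
 Description           : PCI resource configuration
"""


def parse_record(text):
    """Turn 'Key : Value' lines of a verbose SEL record into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def decode_firmware_progress(event_data_hex):
    """Return the progress message for a System Firmware event, or None."""
    data = bytes.fromhex(event_data_hex)
    offset = data[0] & 0x0F                 # low nibble of byte 1: event offset
    has_ext_code = (data[0] >> 6) == 0b11   # bits 7:6 = 11b: byte 2 holds a sensor-specific code
    if offset == 0x02 and has_ext_code:
        return FIRMWARE_PROGRESS.get(data[1], "unknown progress code 0x%02x" % data[1])
    return None


record = parse_record(SAMPLE)
print(decode_firmware_progress(record["Event Data"]))
```

For the sample record, the decoded message matches the Description line printed by IPMItool.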


IPMB Sensors

QSFP Sensors

Messages are added to the SEL when the status of a QSFP cable changes. The messages describe the event and the status of the sensor.

List of QSFP sensors:

  • p0_link – the QSFP 0 cable status

  • p1_link – the QSFP 1 cable status

SEL messages:

  • Config Error – the QSFP cable is down

  • Connected – the QSFP cable is up

Example:

SEL Record ID          : 003e
 Record Type           : 02
 Timestamp             : 07:08:28 UTC 07:08:28 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Cable / Interconnect
 Sensor Number         : 00
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data (RAW)      : 010f0f
 Event Interpretation  : Missing
 Description           : Config Error
 
Sensor ID              : p0_link (0x0)
 Entity ID             : 31.1
 Sensor Type (Discrete): Cable / Interconnect
 States Asserted       : Cable / Interconnect
                         [Config Error]


Temperature Sensors

Messages are added to the SEL if a temperature sensor reading crosses one of the sensor's thresholds. The messages include a description of the event, DPU FRU device description, DPU BMC device description, and the status of the sensor.

List of temperature sensors:

  • bluefield_temp – BlueField temperature

  • p0_temp – QSFP 0 cable temperature

  • p1_temp – QSFP 1 cable temperature

SEL messages:

  • Upper Critical going high – crossing an upper critical threshold

  • Upper Non-critical going high – crossing an upper non-critical threshold

  • Lower Critical going low – crossing a lower critical threshold

  • Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID          : 003c
 Record Type           : 02
 Timestamp             : 07:01:06 UTC 07:01:06 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Temperature
 Sensor Number         : 03
 Event Type            : Threshold
 Event Direction       : Assertion Event
 Event Data (RAW)      : 592802
 Trigger Reading       : 40.000degrees C
 Trigger Threshold     : 2.000degrees C
 Description           : Upper Critical going high
 
Sensor ID              : p0_temp (0x3)
 Entity ID             : 0.1
 Sensor Type (Threshold)  : Temperature
 Sensor Reading        : 40 (+/- 0) degrees C
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : -5.000
 Lower Non-Critical    : 0.000
 Upper Non-Critical    : 70.000
 Upper Critical        : 75.000
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      : 
 Event Enable          : Event Messages Disabled
 Assertions Enabled    : lnc- lcr- unc+ ucr+ 
 Deassertions Enabled  : lnc+ lcr+ unc- ucr- 
 
FRU Device Description : Nvidia-BMCMezz (ID 169)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : Nvidia-BMCMezz
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 
FRU Device Description : BlueField-3 Smar (ID 250)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : BlueField-3 SmartNIC Main Card
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 Product Manufacturer  : Nvidia
 Product Name          : BlueField-3 SmartNIC Main Card
 Product Part Number   : 900-9D3B6-00CV-AAA
 Product Version       : A3
 Product Serial        : MT2251XZ02W5
 Product Asset Tag     : 900-9D3B6-00CV-AAA
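The relation between a reading, the thresholds, and the resulting SEL message can be sketched as follows. This is an illustrative Python model, not the BMC logic: the function name is hypothetical, and it assumes that upper thresholds are compared with "greater than or equal" and lower thresholds with "less than or equal". The threshold values are the p0_temp values from the example above.

```python
def threshold_events(reading, lower_critical=None, lower_non_critical=None,
                     upper_non_critical=None, upper_critical=None):
    """Return the SEL descriptions a reading would assert, mildest first."""
    events = []
    if lower_non_critical is not None and reading <= lower_non_critical:
        events.append("Lower Non-critical going low")
    if lower_critical is not None and reading <= lower_critical:
        events.append("Lower Critical going low")
    if upper_non_critical is not None and reading >= upper_non_critical:
        events.append("Upper Non-critical going high")
    if upper_critical is not None and reading >= upper_critical:
        events.append("Upper Critical going high")
    return events


# Thresholds of p0_temp as reported in the example above.
P0_TEMP = dict(lower_critical=-5.0, lower_non_critical=0.0,
               upper_non_critical=70.0, upper_critical=75.0)

for value in (40, 72, 80, -6):
    print(value, threshold_events(value, **P0_TEMP))
```

Thresholds reported as "na" are simply left at None, so the same function also covers sensors that define only non-critical thresholds.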

ADC Sensors

Messages are added to the SEL if the sensor voltage crosses the sensor's thresholds. The messages include a description of the event, DPU FRU device description, DPU BMC device description, and the status of the sensor.

List of ADC sensors:

  • 1V_BMC

  • 1_2V_BMC

  • 1_8V

  • 1_8V_BMC

  • 2_5V

  • 3_3V

  • 3_3V_RGM

  • 5V

  • 12V_ATX

  • 12V_PCIe

  • DVDD

  • HVDD

  • VDD

  • VDDQ

  • VDD_CPU_L

  • VDD_CPU_R

SEL messages:

  • Upper Non-critical going high – crossing an upper non-critical threshold

  • Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID          : 0042
 Record Type           : 02
 Timestamp             : 09:20:50 UTC 09:20:50 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Voltage
 Sensor Number         : 06
 Event Type            : Threshold
 Event Direction       : Assertion Event
 Event Data (RAW)      : 50a9ff
 Trigger Reading       : 1.200Volts
 Trigger Threshold     : 1.810Volts
 Description           : Lower Non-critical going low
 
Sensor ID              : 1_2V_BMC (0x6)
 Entity ID             : 0.1
 Sensor Type (Threshold)  : Voltage
 Sensor Reading        : 1.200 (+/- 0) Volts
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : na
 Lower Non-Critical    : 1.143
 Upper Non-Critical    : 1.257
 Upper Critical        : na
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      : 
 Event Enable          : Event Messages Disabled
 Assertions Enabled    : lnc- unc+ 
 Deassertions Enabled  : lnc+ unc- 
 
FRU Device Description : Nvidia-BMCMezz (ID 169)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : Nvidia-BMCMezz
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 
FRU Device Description : BlueField-3 Smar (ID 250)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : BlueField-3 SmartNIC Main Card
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 Product Manufacturer  : Nvidia
 Product Name          : BlueField-3 SmartNIC Main Card
 Product Part Number   : 900-9D3B6-00CV-AAA
 Product Version       : A3
 Product Serial        : MT2251XZ02W5
 Product Asset Tag     : 900-9D3B6-00CV-AAA
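The "Assertions Enabled" and "Deassertions Enabled" lines use short tokens such as lnc- and unc+. The Python sketch below expands them into the descriptions used in SEL messages; it reflects the usual IPMItool abbreviations (threshold name followed by "-" for going low or "+" for going high), and the function name is hypothetical.

```python
THRESHOLD_NAMES = {
    "lnc": "Lower Non-critical",
    "lcr": "Lower Critical",
    "lnr": "Lower Non-recoverable",
    "unc": "Upper Non-critical",
    "ucr": "Upper Critical",
    "unr": "Upper Non-recoverable",
}
DIRECTIONS = {"-": "going low", "+": "going high"}


def expand_event_mask(value):
    """Expand a token string such as 'lnc- unc+' into full event descriptions."""
    return ["%s %s" % (THRESHOLD_NAMES[token[:-1]], DIRECTIONS[token[-1]])
            for token in value.split()]


# Value of "Assertions Enabled" for 1_2V_BMC in the example above.
print(expand_event_mask("lnc- unc+ "))
```

The two expanded names correspond to the two SEL messages listed for ADC sensors.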

Sensor Data Record (SDR) Repository

Supported SDR Commands

BMC software supports reading chassis sensor information using IPMItool.

The following table lists the commands used to read SDR data and to set sensor thresholds:

Command | Description
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sdr list | Displays sensor data repository entry readings and their status
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sdr elist | Displays extended sensor information
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sensor list | Displays sensors and thresholds in a wide table format
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sdr get <name> | Displays information for sensor data records specified by sensor ID
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sdr type <type> | Displays all records from the SDR repository of a specific type
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sensor get <sensor_name> | Displays information for sensors specified by name
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sensor reading <name>…<name> | Displays readings for sensors specified by name (only for numeric sensors)
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sensor thresh <sensor_name> upper <non_critical_value> <critical_value> 0 | Sets the upper thresholds of the specified sensor
ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sensor thresh <sensor_name> lower 0 <critical_value> <non_critical_value> | Sets the lower thresholds of the specified sensor

When setting thresholds:

  • If the original threshold value is >0, the new threshold values must be between 0 and 255

  • If the original threshold value is <0, the new threshold values must be between 0 and 127

If a threshold is crossed, a message is added to the Redfish event log, SEL, and journal.
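The argument order differs between the upper and lower threshold commands, which is easy to get wrong in scripts. The Python sketch below builds the arguments for either case; the function name is hypothetical, the sensor name and values in the example are placeholders, and the range check covers only the 0 to 255 case, not the narrower 0 to 127 range that applies when the original threshold is negative.

```python
def build_thresh_args(sensor_name, bound, non_critical, critical):
    """Return ipmitool arguments that set the upper or lower thresholds of a sensor."""
    for value in (non_critical, critical):
        if not 0 <= value <= 255:
            raise ValueError("threshold values are expected to be between 0 and 255")
    if bound == "upper":
        # Order: non-critical, critical, then 0 in the last position.
        values = [non_critical, critical, 0]
    elif bound == "lower":
        # Order: 0 in the first position, then critical, then non-critical.
        values = [0, critical, non_critical]
    else:
        raise ValueError("bound must be 'upper' or 'lower'")
    return ["sensor", "thresh", sensor_name, bound] + [str(v) for v in values]


# Placeholder values for illustration only.
print(" ".join(build_thresh_args("p0_temp", "upper", 70, 75)))
print(" ".join(build_thresh_args("bluefield_temp", "lower", 10, 5)))
```

The returned list is meant to be appended to the common `ipmitool -C 17 -I lanplus ...` prefix shown in the commands above.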

SDR Entry List

SDR contains information about the type and number of sensors. The following is a list of the available SDR information:

Managed Entity | ID | Sensor Name
SFP link status | 0x0, 0x1 | p0_link, p1_link
NIC thermal sensors | 0x2 | bluefield_temp
SFP temperature sensors | 0x3, 0x4 | p0_temp, p1_temp
NIC voltage sensors | 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14 | ADC voltage sensors: 1V_BMC, 1_2V_BMC, 1_8V, 1_8V_BMC, 2_5V, 3_3V, 3_3V_RGM, 5V, 12V_ATX, 12V_PCIe, DVDD, HVDD, VDD, VDDQ, VDD_CPU_L, VDD_CPU_R
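For scripts that need to look up a sensor by ID, the list above can be turned into a lookup table. The Python sketch below assumes that IDs are assigned consecutively in the order the names are listed; this pairing is an assumption, although it agrees with the IDs shown in the SEL examples in this document (p0_link at 0x0, p0_temp at 0x3, 1_2V_BMC at 0x6).

```python
SENSOR_NAMES = [
    "p0_link", "p1_link",                # SFP link status
    "bluefield_temp",                    # NIC thermal sensor
    "p0_temp", "p1_temp",                # SFP temperature sensors
    "1V_BMC", "1_2V_BMC", "1_8V", "1_8V_BMC", "2_5V", "3_3V", "3_3V_RGM", "5V",
    "12V_ATX", "12V_PCIe", "DVDD", "HVDD", "VDD", "VDDQ", "VDD_CPU_L", "VDD_CPU_R",
]

# Assumption: IDs start at 0x0 and follow the listed order.
SENSOR_IDS = {name: index for index, name in enumerate(SENSOR_NAMES)}
SENSORS_BY_ID = {index: name for name, index in SENSOR_IDS.items()}

print(hex(SENSOR_IDS["1_2V_BMC"]), SENSORS_BY_ID[0x14])
```

On a live system, the authoritative mapping is the one reported by the `sdr elist` command.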

Rebooting BlueField with BMC

BMC software enables resetting the BlueField.

To reset the main CPU, run:

ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN chassis power reset


BMC Retrieving Data from BlueField Via IPMB

The BMC can retrieve information on BlueField's sensors and FRUs via IPMI over IPMB protocol. IPMItool commands can be issued from the BMC using the following format:

ipmitool -I ipmb <ipmitool_arguments>

List of IPMI Supported Sensors

Sensor | Sensor ID | Description
bluefield_temp | 0 | Support NIC monitoring of BlueField's temperature
ddr0_0_temp | 1 | Support monitoring of DDR0 temp (on memory controller 0)
ddr0_1_temp | 2 | Support monitoring of DDR1 temp (on memory controller 0)
ddr1_0_temp | 3 | Support monitoring of DDR0 temp (on memory controller 1)
ddr1_1_temp | 4 | Support monitoring of DDR1 temp (on memory controller 1)
p0_temp | 5 | Port 0 temperature
p1_temp | 6 | Port 1 temperature
p0_link | 7 | Port 0 link status: 0x100 – connection OK; 0x200 – connection error
p1_link | 8 | Port 1 link status: 0x100 – connection OK; 0x200 – connection error
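The two link status values can be translated into readable text with a small lookup. This Python sketch is illustrative only: the function name is hypothetical, and any value other than the two documented ones is reported as unknown rather than guessed.

```python
LINK_STATUS = {0x100: "connection OK", 0x200: "connection error"}


def describe_link_status(raw):
    """Map a p0_link or p1_link value (int or hex string) to its documented meaning."""
    value = int(raw, 16) if isinstance(raw, str) else raw
    return LINK_STATUS.get(value, "unknown status 0x%x" % value)


print(describe_link_status("0x100"))
print(describe_link_status(0x200))
```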

List of IPMI Supported FRUs

FRU | ID | Description
update_timer | 0 | set_emu_param.service is responsible for collecting data on sensors and FRUs every 3 seconds. This regular update is required for sensors but not for FRUs, whose content is less susceptible to change. update_timer is used to sample the FRUs every hour instead. Users may need this timer if they are issuing several raw IPMItool FRU read commands, as it helps assess how many times large FRU data must be retrieved before the next FRU update. update_timer is a hexadecimal number.
fw_info | 1 | ConnectX firmware information, Arm firmware version, and MLNX_OFED version (in ASCII format)
nic_pci_dev_info | 2 | NIC vendor ID, device ID, subsystem vendor ID, and subsystem device ID (in ASCII format)
cpuinfo | 3 | CPU information reported in lscpu and /proc/cpuinfo (in ASCII format)
emmc_info | 8 | eMMC size, list of its partitions, and partitions usage (in ASCII format), followed by the eMMC CID, CSD, and extended CSD registers (in binary format). The ASCII data is separated from the binary data with the StartBinary marker.
qsfp0_eeprom | 9 | FRU for QSFP 0 EEPROM page 0 content (256 bytes in binary format)
qsfp1_eeprom | 10 | FRU for QSFP 1 EEPROM page 0 content (256 bytes in binary format). Applicable for dual-port devices only.
ip_addresses | 11 | This FRU is empty at start time. It can be used to write the BMC, port 0, and port 1 IP addresses to the BlueField, one per line, in the formats "BMC: XXX.XXX.XXX.XXX", "P0: XXX.XXX.XXX.XXX", and "P1: XXX.XXX.XXX.XXX". The size of the written file must be exactly 61 bytes.
eth0 | 13 | Network interface 0 information. Updated once every minute.
eth1 | 14 | Network interface 1 information. Updated once every minute. Applicable for dual-port devices only.
bf_uid | 15 | BlueField device UUID
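Two of the FRU payloads above have formats that are simple to handle in a script. The Python sketch below shows one way to produce the 61-byte ip_addresses content and to split emmc_info at the StartBinary marker. Several details are assumptions, not statements from this document: each octet is zero-padded to three digits, each of the three lines ends with a single newline (which gives 21 + 20 + 20 = 61 bytes), and the binary part of emmc_info starts immediately after the marker. The function names and the example addresses are hypothetical.

```python
import ipaddress


def format_ip_addresses(bmc, p0, p1):
    """Build the ip_addresses payload: three newline-terminated lines, 61 bytes in total."""
    def padded(address):
        # Validate the address, then zero-pad every octet to three digits (assumption).
        octets = str(ipaddress.IPv4Address(address)).split(".")
        return ".".join(octet.zfill(3) for octet in octets)

    lines = ["BMC: " + padded(bmc), "P0: " + padded(p0), "P1: " + padded(p1)]
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    if len(payload) != 61:
        raise ValueError("payload is %d bytes, expected 61" % len(payload))
    return payload


def split_emmc_info(data, marker=b"StartBinary"):
    """Split emmc_info content into its ASCII part and its binary part."""
    text, found, binary = data.partition(marker)
    if not found:
        raise ValueError("StartBinary marker not found")
    return text.decode("ascii", errors="replace"), binary


# Placeholder addresses from the documentation range 192.0.2.0/24.
print(format_ip_addresses("192.0.2.10", "192.0.2.11", "192.0.2.12").decode("ascii"))
```

If the BlueField side expects a different padding or line ending, only the `padded` helper and the newline need to change; the length check will flag any layout that no longer adds up to 61 bytes.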

Supported IPMI Commands

All the following commands are prepended with ipmitool on the command line.

Command | IPMItool Command | Relevant IPMI 2.0 Rev1.1 Spec Section
Get Device ID | mc info | 20.1
Broadcast "Get Device ID" | Part of "mc info" | 20.9
Get BMC Global Enables | mc getenables | 22.2
Get Device SDR Info | sdr info | 35.2
Get Device SDR | "sdr get", "sdr list" or "sdr elist" | 35.3
Get Sensor Hysteresis | sdr get <sensor_id> | 35.7
Set Sensor Threshold | sensor thresh <sensor-id> <threshold> <setting> | 35.8
Get Sensor Threshold | sdr get <sensor_id> | 35.9
Get Sensor Event Enable | sdr get <sensor_id> | 35.11
Get Sensor Reading | sensor reading <sensor_id> | 35.14
Get Sensor Type | sdr type <type> | 35.16
Read FRU Data | fru read <fru_number> <file_to_write_to> – provides FRU data | 34.2
Get SDR Repository Info | sdr info | 33.9
Get SEL Info | "sel" or "sel info" | 40.2
Get SEL Allocation Info | "sel" or "sel info" | 40.3
Get SEL Entry | "sel list" or "sel elist" | 40.5
Delete SEL Entry | sel delete <id> | 40.8
Clear SEL | sel clear | 40.9

BlueField Retrieving Data from BMC Via IPMB

The BMC has two IPMB modes: it can act as a requester or as a responder.

  • Requester Mode
    When used as a requester, the BMC sends IPMB request messages to the BlueField via SMBus 0. The BlueField then processes the request and sends a message back to the BMC.

  • Responder Mode
    When used as a responder, the BMC receives IPMB request messages from the BlueField on SMBus 0. It then processes the message and sends a response back to the BlueField.

Both modes are enabled automatically at boot time.
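To make the request and response exchange more concrete, the Python sketch below assembles an IPMB request frame as laid out in the IPMB specification: responder address, network function and LUN, a header checksum, requester address, sequence number and LUN, command, optional data, and a final checksum. It is an illustration only. The function names are hypothetical, and the two addresses in the example are placeholders, not the addresses used by the BMC or the BlueField.

```python
def checksum(data):
    """Two's-complement checksum: the bytes plus the checksum sum to 0 modulo 256."""
    return (-sum(data)) & 0xFF


def build_ipmb_request(rs_addr, rq_addr, netfn, cmd, data=b"", seq=0, rs_lun=0, rq_lun=0):
    """Assemble an IPMB request frame with its two checksums."""
    header = bytes([rs_addr, (netfn << 2) | rs_lun])
    body = bytes([rq_addr, (seq << 2) | rq_lun, cmd]) + bytes(data)
    return header + bytes([checksum(header)]) + body + bytes([checksum(body)])


# Get Device ID: network function 0x06 (Application), command 0x01.
# 0x30 (responder) and 0x20 (requester) are placeholder addresses.
frame = build_ipmb_request(rs_addr=0x30, rq_addr=0x20, netfn=0x06, cmd=0x01)
print(frame.hex())  # 3018b8200001df
```

In requester mode the BMC would be the sender of such a frame, and in responder mode it would be the receiver that validates both checksums before processing the command.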

For more information on how to use IPMI, please refer to the IPMI 2.0 standard.
