Date: 2023-11-15

The BMC is connected to an external host server via LAN. IPMItool commands may be issued from the external server to retrieve information from the BMC as follows:

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN <ipmitool_arguments>
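
Because every command shares the same connection options, scripts can wrap them once. The following is a minimal sketch, not part of the BMC software: the function name bmc_ipmi and the variables BMC_IP, BMC_USER, BMC_PASS, and IPMITOOL are assumptions introduced here for illustration.

```shell
# Hypothetical wrapper around the connection options shown above.
# BMC_IP, BMC_USER, BMC_PASS and IPMITOOL are names chosen for this sketch.
bmc_ipmi() {
    # IPMITOOL may point at another binary, or at "echo" for a dry run.
    "${IPMITOOL:-ipmitool}" -C 17 -I lanplus \
        -H "${BMC_IP:?set BMC_IP to the BMC address}" \
        -U "${BMC_USER:-ADMIN}" -P "${BMC_PASS:-ADMIN}" "$@"
}

# Example (requires a reachable BMC; 192.0.2.10 is a placeholder address):
#   BMC_IP=192.0.2.10; bmc_ipmi fru print
```

Setting IPMITOOL=echo prints the assembled arguments instead of contacting the BMC, which is a convenient way to check a script before running it.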

The sections below provide more details about the IPMItool commands which are supported.

To retrieve FRU info, run:

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN fru print <fru-id>

The <fru-id> argument is optional. The FRU ID of the BMC FRU EEPROM can be found by running the fru print command without an ID.

It is possible to dump the binary FRU data into a file. Run:

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN fru read <fru-id> <filename>

Warning: The <filename> parameter must be the absolute path to the file.
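
Since fru read expects an absolute path, a script that accepts relative file names can normalize them first. This is a small sketch under stated assumptions: abs_path is a hypothetical helper, and the FRU ID 0 and file name fru0.bin in the usage comment are placeholders.

```shell
# Hypothetical helper: turn a possibly relative file name into an absolute path.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already absolute, keep as-is
        *)  printf '%s/%s\n' "$PWD" "$1" ;;  # prefix the current directory
    esac
}

# Example (requires a reachable BMC; address, FRU ID and file name are placeholders):
#   ipmitool -C 17 -I lanplus -H 192.0.2.10 -U ADMIN -P ADMIN \
#       fru read 0 "$(abs_path fru0.bin)"
```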





The system event log (SEL) is a non-volatile repository for system events and certain system configuration information. Each SEL entry has a unique "record ID" field, which is used to retrieve log entries from the SEL. Record IDs are not required to be sequential or consecutive, so applications should not assume that they follow any particular numeric ordering.

Event logs record chassis events in the BMC software and can be read using IPMI commands.

If the SEL is full and a new event is raised, the oldest record is removed and the new one is placed at the end of the SEL.

The SEL remains accessible from the server through IPMI LAN access, even after a BlueField failure.

The following table lists the commands used to view event logs:

Command – Description

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sel – Displays information about SEL

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sel list – Displays list of events

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sel elist – Displays extended info list of events

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sel save <filename> – Saves SEL events to a file

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sel clear – Clears SEL
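
The output of sel list can be filtered in a script. The sketch below is illustrative only: it assumes the pipe-separated column layout that ipmitool commonly prints for sel list (record ID, date, time, sensor, description, direction), which may differ between ipmitool versions, and sel_by_sensor is a hypothetical helper name. Record IDs are treated as opaque strings, since they are not guaranteed to be ordered.

```shell
# Print "<record id> <event description>" for SEL list lines whose sensor
# column contains the given text. Assumed column layout (pipe-separated):
#   id | date | time | sensor | description | direction
sel_by_sensor() {
    awk -F'|' -v want="$1" '
        # Strip leading and trailing blanks from a column value.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        index($4, want) { print trim($1), trim($5) }
    '
}

# Example (requires a reachable BMC; 192.0.2.10 is a placeholder address):
#   ipmitool -C 17 -I lanplus -H 192.0.2.10 -U ADMIN -P ADMIN sel list \
#       | sel_by_sensor Temperature
```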

The following subsections detail the messages which are added to the BMC SEL and the scenarios that trigger them.

While the DPU UEFI is booting, messages describing the status of the UEFI boot are added to the BMC SEL.

SEL messages:

SMBus initialization

PCI resource configuration

System boot initiated

Example:

SEL Record ID : 0037
Record Type : 02
Timestamp : 06:36:06 UTC 06:36:06 UTC
Generator ID : 0001
EvM Revision : 04
Sensor Type : System Firmwares
Sensor Number : 06
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : c207ff
Description : PCI resource configuration
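
A record in this "Name : value" layout can be read field by field in a script. The following is a minimal sketch, assuming one field per line as in the example above; sel_field is a hypothetical helper name, and the sel get subcommand in the usage comment is the ipmitool subcommand that prints a single record by ID.

```shell
# Print the value of one named field from "Name : value" lines on stdin.
sel_field() {
    awk -v name="$1" '
        {
            pos = index($0, ":")            # first colon separates name and value
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) { print val; exit }
        }
    '
}

# Example (requires a reachable BMC; address and record ID are placeholders):
#   ipmitool -C 17 -I lanplus -H 192.0.2.10 -U ADMIN -P ADMIN sel get 0x37 \
#       | sel_field Description
```

Splitting on the first colon only keeps values that themselves contain colons, such as the timestamp, intact.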





Messages are added to the SEL in case of a change in the status of the QSFP cables. The messages describe the event and status of the sensor.

List of QSFP sensors:

p0_link – the QSFP 0 cable status

p1_link – the QSFP 1 cable status

SEL messages:

Config Error – the QSFP cable is down

Connected – the QSFP cable is up

Example:

SEL Record ID : 003e
Record Type : 02
Timestamp : 07:08:28 UTC 07:08:28 UTC
Generator ID : 0020
EvM Revision : 04
Sensor Type : Cable / Interconnect
Sensor Number : 00
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data (RAW) : 010f0f
Event Interpretation : Missing
Description : Config Error

Sensor ID : p0_link (0x0)
Entity ID : 31.1
Sensor Type (Discrete): Cable / Interconnect
States Asserted : Cable / Interconnect [Config Error]
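
The two QSFP messages map directly to a link state, so a monitoring script can reduce a record to a short summary. This is a sketch only, assuming one "Name : value" field per line as in the example above; qsfp_event is a hypothetical helper name.

```shell
# Summarize a QSFP SEL record read from stdin as "<sensor> <up|down|unknown>".
qsfp_event() {
    awk -F' *: *' '
        $1 ~ /^[ \t]*Description[ \t]*$/ { desc = $2 }
        $1 ~ /^[ \t]*Sensor ID[ \t]*$/   { split($2, a, " "); sensor = a[1] }
        END {
            # "Config Error" means the cable is down, "Connected" means it is up.
            if (desc == "Config Error")   state = "down"
            else if (desc == "Connected") state = "up"
            else                          state = "unknown"
            print sensor, state
        }
    '
}
```

Any description other than the two listed messages is reported as unknown rather than guessed.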





Messages are added to the SEL if a temperature sensor reading crosses one of the sensor's thresholds. The messages include a description of the event, the DPU FRU device description, the DPU BMC device description, and the status of the sensor.

List of temperature sensors:

bluefield_temp – BlueField temperature

p0_temp – QSFP 0 cable temperature

p1_temp – QSFP 1 cable temperature

SEL messages:

Upper Critical going high – crossing an upper critical threshold

Upper Non-critical going high – crossing an upper non-critical threshold

Lower Critical going low – crossing a lower critical threshold

Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID : 003c
Record Type : 02
Timestamp : 07:01:06 UTC 07:01:06 UTC
Generator ID : 0020
EvM Revision : 04
Sensor Type : Temperature
Sensor Number : 03
Event Type : Threshold
Event Direction : Assertion Event
Event Data (RAW) : 592802
Trigger Reading : 40.000degrees C
Trigger Threshold : 2.000degrees C
Description : Upper Critical going high

Sensor ID : p0_temp (0x3)
Entity ID : 0.1
Sensor Type (Threshold) : Temperature
Sensor Reading : 40 (+/- 0) degrees C
Status : ok
Lower Non-Recoverable : na
Lower Critical : -5.000
Lower Non-Critical : 0.000
Upper Non-Critical : 70.000
Upper Critical : 75.000
Upper Non-Recoverable : na
Positive Hysteresis : Unspecified
Negative Hysteresis : Unspecified
Assertion Events :
Event Enable : Event Messages Disabled
Assertions Enabled : lnc- lcr- unc+ ucr+
Deassertions Enabled : lnc+ lcr+ unc- ucr-

FRU Device Description : Nvidia-BMCMezz (ID 169)
Board Mfg Date : Tue Jan 3 23:16:00 2023 UTC
Board Mfg : Nvidia
Board Product : Nvidia-BMCMezz
Board Serial : MT2251XZ02W5
Board Part Number : 900-9D3B6-00CV-AAA

FRU Device Description : BlueField-3 Smar (ID 250)
Board Mfg Date : Tue Jan 3 23:16:00 2023 UTC
Board Mfg : Nvidia
Board Product : BlueField-3 SmartNIC Main Card
Board Serial : MT2251XZ02W5
Board Part Number : 900-9D3B6-00CV-AAA
Product Manufacturer : Nvidia
Product Name : BlueField-3 SmartNIC Main Card
Product Part Number : 900-9D3B6-00CV-AAA
Product Version : A3
Product Serial : MT2251XZ02W5
Product Asset Tag : 900-9D3B6-00CV-AAA
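
The "Assertions Enabled" field uses short codes that correspond to the SEL messages listed above. The sketch below expands the four codes that appear in that field; decode_events is a hypothetical helper name, and any other code is reported as not listed rather than interpreted.

```shell
# Expand the abbreviated threshold event codes from "Assertions Enabled"
# into the SEL message text listed in this section.
decode_events() {
    for ev in "$@"; do
        case "$ev" in
            lnc-) echo "Lower Non-critical going low" ;;
            lcr-) echo "Lower Critical going low" ;;
            unc+) echo "Upper Non-critical going high" ;;
            ucr+) echo "Upper Critical going high" ;;
            *)    echo "not listed: $ev" ;;
        esac
    done
}

# Example:
#   decode_events lnc- lcr- unc+ ucr+
```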

Messages are added to the SEL if the sensor voltage crosses the sensor's thresholds. The messages include a description of the event, DPU FRU device description, DPU BMC device description, and the status of the sensor.

List of ADC sensors:

1V_BMC

1_2V_BMC

1_8V

1_8V_BMC

2_5V

3_3V

3_3V_RGM

5V

12V_ATX

12V_PCIe

DVDD

HVDD

VDD

VDDQ

VDD_CPU_L

VDD_CPU_R

SEL messages:

Upper Non-critical going high – crossing an upper non-critical threshold

Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID : 0042
Record Type : 02
Timestamp : 09:20:50 UTC 09:20:50 UTC
Generator ID : 0020
EvM Revision : 04
Sensor Type : Voltage
Sensor Number : 06
Event Type : Threshold
Event Direction : Assertion Event
Event Data (RAW) : 50a9ff
Trigger Reading : 1.200Volts
Trigger Threshold : 1.810Volts
Description : Lower Non-critical going low

Sensor ID : 1_2V_BMC (0x6)
Entity ID : 0.1
Sensor Type (Threshold) : Voltage
Sensor Reading : 1.200 (+/- 0) Volts
Status : ok
Lower Non-Recoverable : na
Lower Critical : na
Lower Non-Critical : 1.143
Upper Non-Critical : 1.257
Upper Critical : na
Upper Non-Recoverable : na
Positive Hysteresis : Unspecified
Negative Hysteresis : Unspecified
Assertion Events :
Event Enable : Event Messages Disabled
Assertions Enabled : lnc- unc+
Deassertions Enabled : lnc+ unc-

FRU Device Description : Nvidia-BMCMezz (ID 169)
Board Mfg Date : Tue Jan 3 23:16:00 2023 UTC
Board Mfg : Nvidia
Board Product : Nvidia-BMCMezz
Board Serial : MT2251XZ02W5
Board Part Number : 900-9D3B6-00CV-AAA

FRU Device Description : BlueField-3 Smar (ID 250)
Board Mfg Date : Tue Jan 3 23:16:00 2023 UTC
Board Mfg : Nvidia
Board Product : BlueField-3 SmartNIC Main Card
Board Serial : MT2251XZ02W5
Board Part Number : 900-9D3B6-00CV-AAA
Product Manufacturer : Nvidia
Product Name : BlueField-3 SmartNIC Main Card
Product Part Number : 900-9D3B6-00CV-AAA
Product Version : A3
Product Serial : MT2251XZ02W5
Product Asset Tag : 900-9D3B6-00CV-AAA
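
The threshold comparison behind these messages can be sketched as follows. This is an illustration, not the BMC's implementation: voltage_status is a hypothetical helper, and treating a reading exactly equal to a threshold as within range is an assumption.

```shell
# Classify a voltage reading against its non-critical thresholds.
# Arguments: reading, lower non-critical, upper non-critical (all in volts).
# Assumption: a reading equal to a threshold counts as within range.
voltage_status() {
    awk -v v="$1" -v lnc="$2" -v unc="$3" 'BEGIN {
        if (v + 0 < lnc + 0)      print "Lower Non-critical going low"
        else if (v + 0 > unc + 0) print "Upper Non-critical going high"
        else                      print "within thresholds"
    }'
}

# Example with the 1_2V_BMC thresholds shown above:
#   voltage_status 1.200 1.143 1.257
```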

The BMC software supports reading chassis sensor information using IPMItool.

The following table lists commands which allow reading SDR data:

Command – Description

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sdr list – Displays sensor data repository entry readings and their status

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sdr elist – Displays extended sensor information

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sensor list – Displays sensors and thresholds in a wide table format

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sdr get <name> – Displays information for sensor data records specified by sensor ID

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sdr type <type> – Displays all records from the SDR repository of a specific type

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sensor get <sensor_name> – Displays information for sensors specified by name

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sensor reading <name>…<name> – Displays readings for sensors specified by name (only for numeric sensors)

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sensor thresh <sensor name> upper <non-critical value> <critical value> 0 – Sets the upper thresholds of a sensor

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN sensor thresh <sensor name> lower 0 <critical value> <non-critical value> – Sets the lower thresholds of a sensor

When setting thresholds with sensor thresh: if the original threshold value is >0, the new threshold values must be between 0 and 255. If the original threshold value is <0, the new threshold values must be between 0 and 127. If a threshold is crossed, a message is added to the Redfish event log, SEL, and journal.
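
The range rule for new threshold values can be checked before sensor thresh is called. The following is a sketch under stated assumptions: thresh_in_range is a hypothetical helper, and an original value of exactly 0, which the rule above does not cover, is treated like a positive value.

```shell
# Return success (0) if a proposed threshold value is in the allowed range.
# Arguments: original threshold value, proposed new value.
# Assumption: an original value of exactly 0 is handled like a positive one.
thresh_in_range() {
    awk -v orig="$1" -v new="$2" 'BEGIN {
        max = (orig + 0 < 0) ? 127 : 255    # 0-127 if original < 0, else 0-255
        exit !(new + 0 >= 0 && new + 0 <= max)
    }'
}

# Example:
#   if thresh_in_range 75 80; then echo "ok to apply"; fi
```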

The SDR contains information about the type and number of sensors. The following is a list of the available SDR information:

Managed Entity: ID – Sensor Name

SFP link status:

0x0 – p0_link

0x1 – p1_link

NIC thermal sensors:

0x2 – bluefield_temp

SFP temperature sensors:

0x3 – p0_temp

0x4 – p1_temp

NIC voltage sensors (ADC voltage sensors):

0x5 – 1V_BMC

0x6 – 1_2V_BMC

0x7 – 1_8V

0x8 – 1_8V_BMC

0x9 – 2_5V

0xa – 3_3V

0xb – 3_3V_RGM

0xc – 5V

0xd – 12V_ATX

0xe – 12V_PCIe

0xf – DVDD

0x10 – HVDD

0x11 – VDD

0x12 – VDDQ

0x13 – VDD_CPU_L

0x14 – VDD_CPU_R
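
A script that handles several sensors can use the sensor name to find its managed entity. This is a minimal sketch based only on the names listed above; sensor_group is a hypothetical helper name.

```shell
# Map a sensor name from the SDR list to its managed entity.
sensor_group() {
    case "$1" in
        p[01]_link)     echo "SFP link status" ;;
        bluefield_temp) echo "NIC thermal sensors" ;;
        p[01]_temp)     echo "SFP temperature sensors" ;;
        1V_BMC|1_2V_BMC|1_8V|1_8V_BMC|2_5V|3_3V|3_3V_RGM|5V|12V_ATX|12V_PCIe|DVDD|HVDD|VDD|VDDQ|VDD_CPU_L|VDD_CPU_R)
                        echo "NIC voltage sensors" ;;
        *)              echo "unknown" ;;
    esac
}

# Example:
#   sensor_group p0_temp
```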

BMC software enables resetting the BlueField.

To reset the main CPU, run:

ipmitool -C 17 -I lanplus -H <bmc_ip_addr> -U ADMIN -P ADMIN chassis power reset



