Date: 2024-04-02

The system event log (SEL) is a non-volatile repository for system events and certain system configuration information. Each SEL entry has a unique "record ID" field, which is used to retrieve log entries from the SEL. Record IDs are not required to be sequential or consecutive, so applications should not assume that SEL record IDs follow any particular numeric ordering.

Event logs record chassis events in the BMC software and can be read using IPMI commands.

If the SEL is full and a new event is raised, the oldest record is removed and the new one is placed at the end of the SEL.

The SEL remains accessible from the server through IPMI LAN access, even after a BlueField failure.

Users can dynamically configure the SEL Info event capacity through a Redfish command or an IPMI raw command. If the new capacity is smaller than the number of existing records, the oldest records are removed. Once the number of records reaches the capacity, the oldest record is removed each time a new record is added.
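The overwrite and capacity rules described above can be illustrated with a small model. This is only a sketch of the described behavior, not the BMC implementation; the class name `SelModel` and its methods are hypothetical.

```python
from collections import deque


class SelModel:
    """Toy model of the SEL overwrite policy (not the BMC implementation)."""

    def __init__(self, capacity):
        self.records = deque(maxlen=capacity)

    def add(self, record):
        # A deque with maxlen drops its oldest item when a new one is
        # appended to a full queue, mirroring "the oldest record is removed".
        self.records.append(record)

    def set_capacity(self, capacity):
        # When shrinking, keep only the newest `capacity` records.
        newest = list(self.records)[-capacity:] if capacity > 0 else []
        self.records = deque(newest, maxlen=capacity)


sel = SelModel(capacity=3)
for event in ["e1", "e2", "e3", "e4"]:
    sel.add(event)
print(list(sel.records))  # ['e2', 'e3', 'e4']

sel.set_capacity(2)
print(list(sel.records))  # ['e3', 'e4']
```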

    curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog",
      "@odata.type": "#LogService.v1_1_0.LogService",
      "Actions": {
        "#LogService.ClearLog": {
          "target": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Actions/LogService.ClearLog"
        }
      },
      "DateTime": "2023-09-27T14:28:50+00:00",
      "DateTimeLocalOffset": "+00:00",
      "Description": "System Event Log Service",
      "Entries": {
        "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries"
      },
      "Id": "EventLog",
      "Name": "Event Log Service",
      "Oem": {
        "Nvidia": {
          "@odata.type": "#NvidiaLogService.v1_0_0.NvidiaLogService",
          "LatestEntryID": "4",
          "LatestEntryTimeStamp": "2023-09-27T14:19:30+00:00"
        }
      },
      "OverWritePolicy": "WrapsWhenFull"
    }

    curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries",
      "@odata.type": "#LogEntryCollection.LogEntryCollection",
      "Description": "Collection of System Event Log Entries",
      "Members": [
        {
          "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/1",
          "@odata.type": "#LogEntry.v1_9_0.LogEntry",
          "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/1/attachment",
          "Created": "2023-09-27T14:18:39+00:00",
          "EntryType": "Event",
          "Id": "1",
          "Message": "12V_ATX sensor crossed a warning low threshold going low. Reading=6.048000 Threshold=10.400000.",
          "MessageArgs": [
            "12V_ATX",
            "6.048000",
            "10.400000"
          ],
          "MessageId": "OpenBMC.0.1.SensorThresholdWarningLowGoingLow",
          "Name": "System Event Log Entry",
          "Resolution": "",
          "Resolved": false,
          "Severity": "OK"
        }
        …
      ],
      "Members@odata.count": 1,
      "Name": "System Event Log Entries"
    }
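A client that consumes the entries collection typically only needs a few fields per entry. The following is a minimal sketch that parses a trimmed copy of the sample response above; the function name `summarize_entries` is hypothetical, and the set of fields kept is an arbitrary choice for illustration.

```python
import json

# Trimmed copy of the sample response shown above (most fields omitted).
SAMPLE = """
{
  "Members": [
    {
      "Id": "1",
      "Created": "2023-09-27T14:18:39+00:00",
      "MessageArgs": ["12V_ATX", "6.048000", "10.400000"],
      "MessageId": "OpenBMC.0.1.SensorThresholdWarningLowGoingLow",
      "Severity": "OK"
    }
  ],
  "Members@odata.count": 1
}
"""


def summarize_entries(collection):
    """Return one (id, created, message_id, args) tuple per log entry."""
    return [
        (m["Id"], m["Created"], m["MessageId"], tuple(m.get("MessageArgs", [])))
        for m in collection.get("Members", [])
    ]


entries = summarize_entries(json.loads(SAMPLE))
for entry_id, created, message_id, args in entries:
    print(entry_id, created, message_id, args)
```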

    curl -k -u root:'<password>' -H 'Content-Type: application/json' -X POST https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/Actions/LogService.ClearLog

    curl -k -u root:'<password>' -H 'Content-Type: application/json' -X POST https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Oem/Nvidia/SelCapacity -d '{"ErrorInfoCap":300 }'

    curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Oem/Nvidia/SelCapacity

    {
      "ErrorInfoCap": 300
    }
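For scripted use, the capacity-setting request above can also be assembled programmatically. The sketch below only builds the request object and does not send it; the helper name `build_set_capacity_request`, the address `192.0.2.10`, and the password are placeholders, and basic authentication is assumed because the curl examples use `-u`.

```python
import base64
import json
import urllib.request


def build_set_capacity_request(bmc_ip, user, password, capacity):
    """Build (but do not send) the POST that sets the SEL capacity."""
    url = (
        f"https://{bmc_ip}/redfish/v1/Managers/Bluefield_BMC"
        "/Actions/Oem/Nvidia/SelCapacity"
    )
    body = json.dumps({"ErrorInfoCap": capacity}).encode("utf-8")
    # Equivalent of curl's -u user:password (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder address and credentials, for illustration only.
req = build_set_capacity_request("192.0.2.10", "root", "example-password", 300)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would mean passing it to `urllib.request.urlopen`. The curl examples use `-k` to skip certificate verification, so a real client has to decide how to handle the BMC certificate.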

The following table lists the commands used to view event logs:

| Command | Description |
|---|---|
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel | Displays information about SEL |
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel list | Displays list of events |
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel elist | Displays extended info list of events |
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel save <filename> | Saves SEL events to a file |
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel clear | Clears SEL |

The following table lists the commands used to set and get the SEL capacity:

| Command | Description |
|---|---|
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN raw 0x0a 0x4a <capacity[0:7]> <capacity[8:15]> <capacity[16:23]> <capacity[24:31]> | Configures the SEL Info log capacity. The capacity is a 4-byte value, passed least significant byte first. For example, to set the capacity to 300 records, the value is 0x2c 0x01 0x00 0x00. |
| ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN raw 0x0a 0x4b | Gets the SEL Info log capacity |
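The byte order of the capacity argument is easy to get wrong, so a small helper can generate it. This is a minimal sketch; the function names are hypothetical. The decoder is simply the inverse of the encoder, and the source does not state the byte layout of the get command's response.

```python
def capacity_to_raw_args(capacity):
    """Encode a capacity as four hex byte strings, least significant first."""
    if not 0 <= capacity <= 0xFFFFFFFF:
        raise ValueError("capacity must fit in 4 bytes")
    return [f"0x{b:02x}" for b in capacity.to_bytes(4, "little")]


def raw_args_to_capacity(args):
    """Decode four hex byte strings (least significant first) to an integer."""
    return int.from_bytes(bytes(int(a, 16) for a in args), "little")


args = capacity_to_raw_args(300)
print(" ".join(args))              # 0x2c 0x01 0x00 0x00
print(raw_args_to_capacity(args))  # 300
```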

The following subsections detail the messages which are added to the BMC SEL and the scenarios that trigger them.

While the DPU UEFI is booting, messages that describe the status of the UEFI boot are added to the BMC SEL.

SEL messages:

SMBus initialization

PCI resource configuration

System boot initiated

Example:

    SEL Record ID           : 0037
    Record Type             : 02
    Timestamp               : 06:36:06 UTC 06:36:06 UTC
    Generator ID            : 0001
    EvM Revision            : 04
    Sensor Type             : System Firmware
    Sensor Number           : 06
    Event Type              : Sensor-specific Discrete
    Event Direction         : Assertion Event
    Event Data              : c207ff
    Description             : PCI resource configuration
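SEL records such as the one above are printed as "Key : Value" pairs, which makes them straightforward to post-process. The following is a minimal sketch of a parser, assuming one pair per line; the function name `parse_sel_record` is hypothetical, and the sample text is copied from the example above.

```python
def parse_sel_record(text):
    """Split 'Key : Value' lines into a dict, keeping the first key seen."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps such as
        # "06:36:06 UTC" stay intact in the value.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


SAMPLE = """\
SEL Record ID : 0037
Record Type : 02
Timestamp : 06:36:06 UTC 06:36:06 UTC
Generator ID : 0001
EvM Revision : 04
Sensor Type : System Firmware
Sensor Number : 06
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : c207ff
Description : PCI resource configuration
"""

record = parse_sel_record(SAMPLE)
print(record["SEL Record ID"], record["Sensor Type"], record["Description"])
```

Keeping the first occurrence of each key is a deliberate choice: longer records repeat keys such as the FRU fields, and a flat dict can hold only one value per key.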

Messages are added to the SEL when the status of a QSFP cable changes. The messages describe the event and the status of the sensor.

List of QSFP sensors:

P0_link – the QSFP 0 cable status

P1_link – the QSFP 1 cable status

SEL messages:



Config Error – the QSFP cable is down

Connected – the QSFP cable is up

Example:

    SEL Record ID           : 003e
    Record Type             : 02
    Timestamp               : 07:08:28 UTC 07:08:28 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : Cable / Interconnect
    Sensor Number           : 00
    Event Type              : Sensor-specific Discrete
    Event Direction         : Assertion Event
    Event Data (RAW)        : 010f0f
    Event Interpretation    : Missing
    Description             : Config Error

    Sensor ID               : p0_link (0x0)
    Entity ID               : 31.1
    Sensor Type (Discrete)  : Cable / Interconnect
    States Asserted         : Cable / Interconnect [Config Error]

Messages are added to the SEL if a temperature sensor reading crosses one of the sensor's thresholds. The messages include a description of the event, the DPU FRU device description, the DPU BMC device description, and the status of the sensor.

List of temperature sensors:

bluefield_temp – BlueField temperature

p0_temp – QSFP 0 cable temperature

p1_temp – QSFP 1 cable temperature

SEL messages:



Upper Critical going high – crossing an upper critical threshold

Upper Non-critical going high – crossing an upper non-critical threshold

Lower Critical going low – crossing a lower critical threshold

Lower Non-critical going low – crossing a lower non-critical threshold

Example:

    SEL Record ID           : 003c
    Record Type             : 02
    Timestamp               : 07:01:06 UTC 07:01:06 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : Temperature
    Sensor Number           : 03
    Event Type              : Threshold
    Event Direction         : Assertion Event
    Event Data (RAW)        : 592802
    Trigger Reading         : 40.000degrees C
    Trigger Threshold       : 2.000degrees C
    Description             : Upper Critical going high

    Sensor ID               : p0_temp (0x3)
    Entity ID               : 0.1
    Sensor Type (Threshold) : Temperature
    Sensor Reading          : 40 (+/- 0) degrees C
    Status                  : ok
    Lower Non-Recoverable   : na
    Lower Critical          : -5.000
    Lower Non-Critical      : 0.000
    Upper Non-Critical      : 70.000
    Upper Critical          : 75.000
    Upper Non-Recoverable   : na
    Positive Hysteresis     : Unspecified
    Negative Hysteresis     : Unspecified
    Assertion Events        :
    Event Enable            : Event Messages Disabled
    Assertions Enabled      : lnc- lcr- unc+ ucr+
    Deassertions Enabled    : lnc+ lcr+ unc- ucr-

    FRU Device Description  : Nvidia-BMCMezz (ID 169)
    Board Mfg Date          : Tue Jan 3 23:16:00 2023 UTC
    Board Mfg               : Nvidia
    Board Product           : Nvidia-BMCMezz
    Board Serial            : MT2251XZ02W5
    Board Part Number       : 900-9D3B6-00CV-AAA

    FRU Device Description  : BlueField-3 Smar (ID 250)
    Board Mfg Date          : Tue Jan 3 23:16:00 2023 UTC
    Board Mfg               : Nvidia
    Board Product           : BlueField-3 SmartNIC Main Card
    Board Serial            : MT2251XZ02W5
    Board Part Number       : 900-9D3B6-00CV-AAA
    Product Manufacturer    : Nvidia
    Product Name            : BlueField-3 SmartNIC Main Card
    Product Part Number     : 900-9D3B6-00CV-AAA
    Product Version         : A3
    Product Serial          : MT2251XZ02W5
    Product Asset Tag       : 900-9D3B6-00CV-AAA
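The raw event data in a threshold record can be decoded by hand. The sketch below follows the threshold event data layout of the IPMI specification as understood here: in byte 1, the low nibble is the event offset, and bits 7:6 and 5:4 indicate whether bytes 2 and 3 carry the raw trigger reading and the raw trigger threshold. The function name is hypothetical, the offset table lists only the messages named in this section, and raw values generally still need the sensor's conversion factors to become engineering units.

```python
# Threshold event offsets, limited to the messages listed in this section.
THRESHOLD_OFFSETS = {
    0x0: "Lower Non-critical going low",
    0x2: "Lower Critical going low",
    0x7: "Upper Non-critical going high",
    0x9: "Upper Critical going high",
}


def decode_threshold_event(event_data_hex):
    """Decode the 3-byte event data of a threshold event."""
    b1, b2, b3 = bytes.fromhex(event_data_hex)
    return {
        "description": THRESHOLD_OFFSETS.get(b1 & 0x0F, "unknown"),
        # Bits 7:6 == 01b: byte 2 holds the raw trigger reading.
        "raw_reading": b2 if (b1 >> 6) & 0x3 == 0x1 else None,
        # Bits 5:4 == 01b: byte 3 holds the raw trigger threshold.
        "raw_threshold": b3 if (b1 >> 4) & 0x3 == 0x1 else None,
    }


# Event data taken from the temperature example above.
decoded = decode_threshold_event("592802")
print(decoded)
```

For the example record this yields a raw reading of 40 and a raw threshold of 2, which matches the Trigger Reading and Trigger Threshold lines in the output above.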

Messages are added to the SEL if the sensor voltage crosses the sensor's thresholds. The messages include a description of the event, DPU FRU device description, DPU BMC device description, and the status of the sensor.

List of ADC sensors:

1V_BMC

1_2V_BMC

1_8V

1_8V_BMC

2_5V

3_3V

3_3V_RGM

5V

12V_ATX

12V_PCIe

DVDD

HVDD

VDD

VDDQ

VDD_CPU_L

VDD_CPU_R

SEL messages:



Upper Non-critical going high – crossing an upper non-critical threshold

Lower Non-critical going low – crossing a lower non-critical threshold

Example:

    SEL Record ID           : 0042
    Record Type             : 02
    Timestamp               : 09:20:50 UTC 09:20:50 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : Voltage
    Sensor Number           : 06
    Event Type              : Threshold
    Event Direction         : Assertion Event
    Event Data (RAW)        : 50a9ff
    Trigger Reading         : 1.200Volts
    Trigger Threshold       : 1.810Volts
    Description             : Lower Non-critical going low

    Sensor ID               : 1_2V_BMC (0x6)
    Entity ID               : 0.1
    Sensor Type (Threshold) : Voltage
    Sensor Reading          : 1.200 (+/- 0) Volts
    Status                  : ok
    Lower Non-Recoverable   : na
    Lower Critical          : na
    Lower Non-Critical      : 1.143
    Upper Non-Critical      : 1.257
    Upper Critical          : na
    Upper Non-Recoverable   : na
    Positive Hysteresis     : Unspecified
    Negative Hysteresis     : Unspecified
    Assertion Events        :
    Event Enable            : Event Messages Disabled
    Assertions Enabled      : lnc- unc+
    Deassertions Enabled    : lnc+ unc-

    FRU Device Description  : Nvidia-BMCMezz (ID 169)
    Board Mfg Date          : Tue Jan 3 23:16:00 2023 UTC
    Board Mfg               : Nvidia
    Board Product           : Nvidia-BMCMezz
    Board Serial            : MT2251XZ02W5
    Board Part Number       : 900-9D3B6-00CV-AAA

    FRU Device Description  : BlueField-3 Smar (ID 250)
    Board Mfg Date          : Tue Jan 3 23:16:00 2023 UTC
    Board Mfg               : Nvidia
    Board Product           : BlueField-3 SmartNIC Main Card
    Board Serial            : MT2251XZ02W5
    Board Part Number       : 900-9D3B6-00CV-AAA
    Product Manufacturer    : Nvidia
    Product Name            : BlueField-3 SmartNIC Main Card
    Product Part Number     : 900-9D3B6-00CV-AAA
    Product Version         : A3
    Product Serial          : MT2251XZ02W5
    Product Asset Tag       : 900-9D3B6-00CV-AAA

SEL messages:

    System boot initiated Initiated by warm reset

Example:

    SEL Record ID           : 0001
    Record Type             : 02
    Timestamp               : 01/10/24 14:25:07 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : System Boot Initiated
    Sensor Number           : 17
    Event Type              : Sensor-specific Discrete
    Event Direction         : Assertion Event
    Event Data              : 020000
    Description             : Initiated by warm reset

SEL messages:

    System boot initiated Initiated by hard reset

Example:

    SEL Record ID           : 0008
    Record Type             : 02
    Timestamp               : 01/10/24 14:33:01 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : System Boot Initiated
    Sensor Number           : 17
    Event Type              : Sensor-specific Discrete
    Event Direction         : Assertion Event
    Event Data              : 010000
    Description             : Initiated by hard reset

SEL messages:

    OS Critical Stop OS graceful shutdown

Example:

    SEL Record ID           : 000a
    Record Type             : 02
    Timestamp               : 01/10/24 14:34:45 UTC
    Generator ID            : 0020
    EvM Revision            : 04
    Sensor Type             : OS Critical Stop
    Sensor Number           : 18
    Event Type              : Sensor-specific Discrete
    Event Direction         : Assertion Event
    Event Data              : 030000
    Description             : OS graceful shutdown
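In the reset and shutdown examples above, the low nibble of the first event data byte selects the description for the given sensor type. The sketch below maps only the three combinations that appear in this section; the full tables are defined by the IPMI specification, and the function name `describe_discrete_event` is hypothetical.

```python
# (sensor type, event offset) -> description, limited to the events shown
# in this section. Other offsets are reported as unknown.
DISCRETE_EVENTS = {
    ("System Boot Initiated", 0x1): "Initiated by hard reset",
    ("System Boot Initiated", 0x2): "Initiated by warm reset",
    ("OS Critical Stop", 0x3): "OS graceful shutdown",
}


def describe_discrete_event(sensor_type, event_data_hex):
    """Look up the description for a sensor-specific discrete event."""
    # The event offset is the low nibble of the first event data byte.
    offset = bytes.fromhex(event_data_hex)[0] & 0x0F
    return DISCRETE_EVENTS.get((sensor_type, offset), f"unknown offset {offset:#x}")


print(describe_discrete_event("System Boot Initiated", "020000"))
print(describe_discrete_event("System Boot Initiated", "010000"))
print(describe_discrete_event("OS Critical Stop", "030000"))
```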

SEL messages:

    OS Critical Stop OS graceful shutdown

Example: