NVIDIA BlueField BMC Software v24.01
Date: 2024-04-02
System Log

System Event Log

The system event log (SEL) is a non-volatile repository for system events and certain system configuration information. Each SEL entry has a unique "record ID" field, which is used for retrieving log entries from the SEL. Record IDs are not required to be sequential or consecutive, so applications should not assume that the SEL record ID follows any particular numeric ordering.

Event logs record chassis events in the BMC software and can be read using IPMI commands.

If the SEL is full and a new event is raised, the oldest record is removed and the new one is placed at the end of the SEL.

The SEL can still be accessed on the server through IPMI LAN access, even after a BlueField failure.

Dynamic Configuration of SEL capacity

Users can dynamically configure the SEL Info event capacity through a Redfish command or an IPMI raw command. If the new capacity is smaller than the number of existing records, the oldest records are removed. Once the number of records reaches the capacity, the oldest record is removed to make room for each new one.
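
The removal rules above can be modelled with a short shell sketch. It only illustrates the described behavior and is not the BMC implementation; the function names sel_trim and sel_append are hypothetical, and records are represented as plain words listed oldest first.

```shell
# Hypothetical helper: keep only the newest CAP records.
# Usage: sel_trim CAP RECORD...   (records listed oldest first)
sel_trim() {
  cap=$1
  shift
  # Drop records from the front (the oldest) until the count fits the capacity.
  while [ "$#" -gt "$cap" ]; do
    shift
  done
  echo "$*"
}

# Hypothetical helper: append NEW to a log that is limited to CAP records.
# Usage: sel_append CAP NEW RECORD...
sel_append() {
  cap=$1
  new=$2
  shift 2
  sel_trim "$cap" "$@" "$new"
}
```

For instance, sel_trim 2 a b c d models lowering the capacity to 2 while four records exist, and sel_append 3 d a b c models a new event arriving while the log is already full.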

SEL Redfish Commands

Display SEL Information

curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/
{
  "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog",
  "@odata.type": "#LogService.v1_1_0.LogService",
  "Actions": {
    "#LogService.ClearLog": {
      "target": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Actions/LogService.ClearLog"
    }
  },
  "DateTime": "2023-09-27T14:28:50+00:00",
  "DateTimeLocalOffset": "+00:00",
  "Description": "System Event Log Service",
  "Entries": {
    "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries"
  },
  "Id": "EventLog",
  "Name": "Event Log Service",
  "Oem": {
    "Nvidia": {
      "@odata.type": "#NvidiaLogService.v1_0_0.NvidiaLogService",
      "LatestEntryID": "4",
      "LatestEntryTimeStamp": "2023-09-27T14:19:30+00:00"
    }
  },
  "OverWritePolicy": "WrapsWhenFull"
}

Display List of Events

curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries
{
  "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries",
  "@odata.type": "#LogEntryCollection.LogEntryCollection",
  "Description": "Collection of System Event Log Entries",
  "Members": [
    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/1",
      "@odata.type": "#LogEntry.v1_9_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/1/attachment",
      "Created": "2023-09-27T14:18:39+00:00",
      "EntryType": "Event",
      "Id": "1",
      "Message": "12V_ATX sensor crossed a warning low threshold going low. Reading=6.048000 Threshold=10.400000.",
      "MessageArgs": [
        "12V_ATX",
        "6.048000",
        "10.400000"
      ],
      "MessageId": "OpenBMC.0.1.SensorThresholdWarningLowGoingLow",
      "Name": "System Event Log Entry",
      "Resolution": "",
      "Resolved": false,
      "Severity": "OK"
    }
    …
  ],
  "Members@odata.count": 1,
  "Name": "System Event Log Entries"
}
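
Because record IDs are not guaranteed to be sequential, a client should take the IDs from the collection response rather than generate them. The following is a minimal sketch that assumes the response is pretty-printed with one key per line, as in the sample output above; the function name redfish_entry_ids is hypothetical.

```shell
# Hypothetical helper: print the "Id" value of every log entry read from stdin.
# Assumes pretty-printed JSON with one key per line (as in the sample output).
redfish_entry_ids() {
  # Match only lines whose key is exactly "Id" (so "MessageId" is skipped)
  # and print the quoted value.
  sed -n 's/^[[:space:]]*"Id":[[:space:]]*"\([^"]*\)".*$/\1/p'
}
```

The output of the curl command above can be piped into redfish_entry_ids. If jq is installed, jq -r '.Members[].Id' is a more robust alternative because it does not depend on how the JSON is laid out.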

Clear SEL

curl -k -u root:'<password>' -H 'Content-Type: application/json' -X POST https://<bmc_ip>/redfish/v1/Systems/Bluefield/LogServices/EventLog/Actions/LogService.ClearLog

Configure SEL Info Log Capacity

curl -k -u root:'<password>' -H 'Content-Type: application/json' -X POST https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Actions/Oem/Nvidia/SelCapacity -d '{"ErrorInfoCap":300 }'

Get SEL Info Log Capacity

curl -k -u root:'<password>' -H 'Content-Type: application/json' -X GET https://<bmc_ip>/redfish/v1/Managers/Bluefield_BMC/Oem/Nvidia/SelCapacity
{
"ErrorInfoCap": 300
}

SEL IPMI Commands

The following commands are used to view event logs:

  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel – displays information about the SEL
  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel list – displays the list of events
  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel elist – displays the extended info list of events
  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel save <filename> – saves the SEL events to a file
  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN sel clear – clears the SEL

The following commands are used to set and get the SEL capacity:

  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN raw 0x0a 0x4a <capacity[0:7]> <capacity[8:15]> <capacity[16:23]> <capacity[24:31]> – configures the SEL Info log capacity. The capacity is a 4-byte value passed least significant byte first. For example, to set the capacity to 300 records, the value should be '0x2c 0x01 0x00 0x00'.
  • ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN raw 0x0a 0x4b – gets the SEL Info log capacity
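
The 4-byte capacity argument of the raw 0x0a 0x4a command can be derived from a decimal value with a small helper. This is a sketch only; the function name sel_capacity_bytes is hypothetical, and the byte order follows the 300-record example given above.

```shell
# Hypothetical helper: print CAPACITY as four hex bytes,
# least significant byte first (bits 0-7, 8-15, 16-23, 24-31).
sel_capacity_bytes() {
  cap=$1
  # In shell arithmetic, >> binds tighter than &, so each term is (cap >> n) & 0xff.
  printf '0x%02x 0x%02x 0x%02x 0x%02x\n' \
    "$((cap & 0xff))" \
    "$((cap >> 8 & 0xff))" \
    "$((cap >> 16 & 0xff))" \
    "$((cap >> 24 & 0xff))"
}
```

Running sel_capacity_bytes 300 prints 0x2c 0x01 0x00 0x00. The result can be appended unquoted to the raw command, for example: ipmitool -C 17 -I lanplus -H <bmc_ip> -U ADMIN -P ADMIN raw 0x0a 0x4a $(sel_capacity_bytes 300)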

SEL Message Types

The following subsections detail the messages which are added to the BMC SEL and the scenarios that trigger them.

UEFI Boot

While the DPU UEFI is booting, messages describing the status of the UEFI boot are added to the BMC SEL.

SEL messages:

  • SMBus initialization
  • PCI resource configuration
  • System boot initiated

Example:

SEL Record ID          : 0037
 Record Type           : 02
 Timestamp             : 06:36:06 UTC 06:36:06 UTC
 Generator ID          : 0001
 EvM Revision          : 04
 Sensor Type           : System Firmware
 Sensor Number         : 06
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : c207ff
 Description           : PCI resource configuration

IPMB Sensors

QSFP Sensors

Messages are added to the SEL when the status of a QSFP cable changes. The messages describe the event and the status of the sensor.

List of QSFP sensors:

  • p0_link – the QSFP 0 cable status
  • p1_link – the QSFP 1 cable status

SEL messages:

  • Config Error – the QSFP cable is down
  • Connected – the QSFP cable is up

Example:

SEL Record ID          : 003e
 Record Type           : 02
 Timestamp             : 07:08:28 UTC 07:08:28 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Cable / Interconnect
 Sensor Number         : 00
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data (RAW)      : 010f0f
 Event Interpretation  : Missing
 Description           : Config Error
 
Sensor ID              : p0_link (0x0)
 Entity ID             : 31.1
 Sensor Type (Discrete): Cable / Interconnect
 States Asserted       : Cable / Interconnect
                         [Config Error]

Temperature Sensors

Messages are added to the SEL if a temperature sensor reading crosses one of the sensor's thresholds. The messages include a description of the event, the DPU FRU device description, the DPU BMC device description, and the status of the sensor.

List of temperature sensors:

  • bluefield_temp – BlueField temperature
  • p0_temp – QSFP 0 cable temperature
  • p1_temp – QSFP 1 cable temperature

SEL messages:

  • Upper Critical going high – crossing an upper critical threshold
  • Upper Non-critical going high – crossing an upper non-critical threshold
  • Lower Critical going low – crossing a lower critical threshold
  • Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID          : 003c
 Record Type           : 02
 Timestamp             : 07:01:06 UTC 07:01:06 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Temperature
 Sensor Number         : 03
 Event Type            : Threshold
 Event Direction       : Assertion Event
 Event Data (RAW)      : 592802
 Trigger Reading       : 40.000degrees C
 Trigger Threshold     : 2.000degrees C
 Description           : Upper Critical going high
 
Sensor ID              : p0_temp (0x3)
 Entity ID             : 0.1
 Sensor Type (Threshold)  : Temperature
 Sensor Reading        : 40 (+/- 0) degrees C
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : -5.000
 Lower Non-Critical    : 0.000
 Upper Non-Critical    : 70.000
 Upper Critical        : 75.000
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      : 
 Event Enable          : Event Messages Disabled
 Assertions Enabled    : lnc- lcr- unc+ ucr+ 
 Deassertions Enabled  : lnc+ lcr+ unc- ucr- 
 
FRU Device Description : Nvidia-BMCMezz (ID 169)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : Nvidia-BMCMezz
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 
FRU Device Description : BlueField-3 Smar (ID 250)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : BlueField-3 SmartNIC Main Card
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 Product Manufacturer  : Nvidia
 Product Name          : BlueField-3 SmartNIC Main Card
 Product Part Number   : 900-9D3B6-00CV-AAA
 Product Version       : A3
 Product Serial        : MT2251XZ02W5
 Product Asset Tag     : 900-9D3B6-00CV-AAA
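
To illustrate how a reading relates to the four threshold messages listed above, the following sketch compares a value against a sensor's numeric thresholds. It is a simplified model under stated assumptions, not the BMC's event logic: the function name classify_reading is hypothetical, hysteresis is ignored, all four thresholds must be numeric (not "na"), and a reading equal to a threshold is treated as having crossed it.

```shell
# Hypothetical helper: map a reading to the matching threshold message.
# Usage: classify_reading READING LOWER_CRIT LOWER_NONCRIT UPPER_NONCRIT UPPER_CRIT
classify_reading() {
  awk -v r="$1" -v lcr="$2" -v lnc="$3" -v unc="$4" -v ucr="$5" 'BEGIN {
    # Force numeric comparison regardless of the awk implementation.
    r += 0; lcr += 0; lnc += 0; unc += 0; ucr += 0
    # Check the most severe condition on each side first.
    if (r >= ucr)      print "Upper Critical going high"
    else if (r >= unc) print "Upper Non-critical going high"
    else if (r <= lcr) print "Lower Critical going low"
    else if (r <= lnc) print "Lower Non-critical going low"
    else               print "ok"
  }'
}
```

With the p0_temp thresholds shown in the sensor details above (-5.000, 0.000, 70.000, 75.000), classify_reading 72 -5.000 0.000 70.000 75.000 reports an upper non-critical crossing.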

ADC Sensors

Messages are added to the SEL if the sensor voltage crosses the sensor's thresholds. The messages include a description of the event, DPU FRU device description, DPU BMC device description, and the status of the sensor.

List of ADC sensors:

  • 1V_BMC
  • 1_2V_BMC
  • 1_8V
  • 1_8V_BMC
  • 2_5V
  • 3_3V
  • 3_3V_RGM
  • 5V
  • 12V_ATX
  • 12V_PCIe
  • DVDD
  • HVDD
  • VDD
  • VDDQ
  • VDD_CPU_L
  • VDD_CPU_R

SEL messages:

  • Upper Non-critical going high – crossing an upper non-critical threshold
  • Lower Non-critical going low – crossing a lower non-critical threshold

Example:

SEL Record ID          : 0042
 Record Type           : 02
 Timestamp             : 09:20:50 UTC 09:20:50 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Voltage
 Sensor Number         : 06
 Event Type            : Threshold
 Event Direction       : Assertion Event
 Event Data (RAW)      : 50a9ff
 Trigger Reading       : 1.200Volts
 Trigger Threshold     : 1.810Volts
 Description           : Lower Non-critical going low
 
Sensor ID              : 1_2V_BMC (0x6)
 Entity ID             : 0.1
 Sensor Type (Threshold)  : Voltage
 Sensor Reading        : 1.200 (+/- 0) Volts
 Status                : ok
 Lower Non-Recoverable : na
 Lower Critical        : na
 Lower Non-Critical    : 1.143
 Upper Non-Critical    : 1.257
 Upper Critical        : na
 Upper Non-Recoverable : na
 Positive Hysteresis   : Unspecified
 Negative Hysteresis   : Unspecified
 Assertion Events      : 
 Event Enable          : Event Messages Disabled
 Assertions Enabled    : lnc- unc+ 
 Deassertions Enabled  : lnc+ unc- 
 
FRU Device Description : Nvidia-BMCMezz (ID 169)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : Nvidia-BMCMezz
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 
FRU Device Description : BlueField-3 Smar (ID 250)
 Board Mfg Date        : Tue Jan  3 23:16:00 2023 UTC
 Board Mfg             : Nvidia
 Board Product         : BlueField-3 SmartNIC Main Card
 Board Serial          : MT2251XZ02W5
 Board Part Number     : 900-9D3B6-00CV-AAA
 Product Manufacturer  : Nvidia
 Product Name          : BlueField-3 SmartNIC Main Card
 Product Part Number   : 900-9D3B6-00CV-AAA
 Product Version       : A3
 Product Serial        : MT2251XZ02W5
 Product Asset Tag     : 900-9D3B6-00CV-AAA

System Commands

Warm Rebooting DPU

SEL messages:

System boot initiated
Initiated by warm reset

Example:

SEL Record ID          : 0001
 Record Type           : 02
 Timestamp             : 01/10/24 14:25:07 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : System Boot Initiated
 Sensor Number         : 17
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : 020000
 Description           : Initiated by warm reset

Hard Rebooting DPU

SEL messages:

System boot initiated
Initiated by hard reset

Example:

SEL Record ID          : 0008
 Record Type           : 02
 Timestamp             : 01/10/24 14:33:01 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : System Boot Initiated
 Sensor Number         : 17
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : 010000
 Description           : Initiated by hard reset

Shutting Down DPU

SEL messages:

OS Critical Stop
OS graceful shutdown

Example:

SEL Record ID          : 000a
 Record Type           : 02
 Timestamp             : 01/10/24 14:34:45 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : OS Critical Stop
 Sensor Number         : 18
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : 030000
 Description           : OS graceful shutdown

Updating BMC

SEL messages:

Version Change
Firmware or software change success, Mngmt SW agent change

Example:

SEL Record ID          : 0010
 Record Type           : 02
 Timestamp             : 01/10/24 15:48:01 UTC
 Generator ID          : 0020
 EvM Revision          : 04
 Sensor Type           : Version Change
 Sensor Number         : 19
 Event Type            : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data            : c70e00
 Description           : Firmware or software change success, Mngmt SW agent change

Redfish Event Log

System Commands

Adding BMC User

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/3",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/3/attachment",
      "Created": "2024-01-10T14:25:14+00:00",
      "EntryType": "Event",
      "Id": "3",
      "Message": "BMC User Create test0",
      "Modified": "2024-01-10T14:25:14+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }


Deleting BMC User

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/2",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/2/attachment",
      "Created": "2024-01-10T14:25:14+00:00",
      "EntryType": "Event",
      "Id": "2",
      "Message": "BMC User Delete test0",
      "Modified": "2024-01-10T14:25:14+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }


Renaming BMC User

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/2",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/2/attachment",
      "Created": "2024-01-10T14:25:14+00:00",
      "EntryType": "Event",
      "Id": "2",
      "Message": "BMC User Rename test0 To test1",
      "Modified": "2024-01-10T14:25:14+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }


Changing BMC IPv6

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/21",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/21/attachment",
      "Created": "2024-01-10T15:53:57+00:00",
      "EntryType": "Event",
      "Id": "21",
      "Message": "BMC IPv6 Address Change",
      "Modified": "2024-01-10T15:53:57+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }


Changing BMC IPv4

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/19",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/19/attachment",
      "Created": "2024-01-10T15:53:57+00:00",
      "EntryType": "Event",
      "Id": "19",
      "Message": "BMC IPv4 Address Change",
      "Modified": "2024-01-10T15:53:57+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }


Soft Resetting BMC

    {
      "@odata.id": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/17",
      "@odata.type": "#LogEntry.v1_13_0.LogEntry",
      "AdditionalDataURI": "/redfish/v1/Systems/Bluefield/LogServices/EventLog/Entries/17/attachment",
      "Created": "2024-01-10T15:52:46+00:00",
      "EntryType": "Event",
      "Id": "17",
      "Message": "BMC Soft Reset",
      "Modified": "2024-01-10T15:52:46+00:00",
      "Name": "System Event Log Entry",
      "Resolved": false,
      "Severity": "OK"
    }

