Logging
RShim logging uses an internal 1KB HW buffer to track boot progress and record important messages. The buffer is written by the NVIDIA® BlueField® Arm cores and displayed by the RShim driver on the USB/PCIe host machine. Starting in release 2.5.0, ATF supports RShim logging.
To display the RShim log messages:
Check the current DISPLAY_LEVEL in the file /dev/rshim0/misc.
# cat /dev/rshim0/misc
DISPLAY_LEVEL 0 (0:basic, 1:advanced, 2:log)
…
Set the DISPLAY_LEVEL to 2.
# echo "DISPLAY_LEVEL 2" > /dev/rshim0/misc
Log messages are displayed in the misc file.
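The steps above could also be scripted. The following is a minimal Python sketch, assuming the RShim driver exposes the misc file at /dev/rshim0/misc as shown and that the script runs with sufficient privileges. The function names are illustrative and are not part of any NVIDIA tool.

```python
MISC_PATH = "/dev/rshim0/misc"  # path used in the steps above


def set_display_level(level, misc_path=MISC_PATH):
    """Write 'DISPLAY_LEVEL <level>' to the misc file (0:basic, 1:advanced, 2:log)."""
    if level not in (0, 1, 2):
        raise ValueError("DISPLAY_LEVEL must be 0, 1, or 2")
    with open(misc_path, "w") as f:
        f.write(f"DISPLAY_LEVEL {level}\n")


def read_misc(misc_path=MISC_PATH):
    """Return the full text of the misc file."""
    with open(misc_path) as f:
        return f.read()
```

A typical use would be `set_display_level(2)` followed by `print(read_misc())`.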
The following is an example output for BlueField-2:
# cat /dev/rshim0/misc
...
---------------------------------------
             Log Messages
---------------------------------------
INFO[BL2]: start
INFO[BL2]: no DDR on MSS0
INFO[BL2]: calc DDR freq (clk_ref 53836948)
INFO[BL2]: DDR POST passed
INFO[BL2]: UEFI loaded
INFO[BL31]: start
INFO[BL31]: runtime
INFO[UEFI]: eMMC init
INFO[UEFI]: eMMC probed
INFO[UEFI]: PCIe enum start
INFO[UEFI]: PCIe enum end
The following is an example output for BlueField:
# cat /dev/rshim0/misc
...
---------------------------------------
             Log Messages
---------------------------------------
INFO[BL2]: start
INFO[BL2]: no DDR on MSS0
INFO[BL2]: calc DDR freq (clk_ref 53836948)
INFO[BL2]: DDR POST passed
INFO[BL2]: UEFI loaded
INFO[BL31]: start
INFO[BL31]: runtime
INFO[UEFI]: eMMC init
INFO[UEFI]: eMMC probed
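For automated checks, the log section of the misc output can be split into level, boot stage, and message text. The sketch below is an illustration only: it assumes one message per line after a "Log Messages" banner, and it accepts either square brackets or parentheses around the stage name, since both forms appear in the message tables. The regular expression and function name are not taken from the RShim driver.

```python
import re

# Matches "LEVEL[STAGE]: text" and "LEVEL(STAGE): text", e.g. "INFO[BL2]: start".
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Za-z]+)[\[(](?P<stage>\w+)[\])]:\s*(?P<text>.*)$"
)


def parse_log_messages(misc_text):
    """Return (level, stage, text) tuples found after the 'Log Messages' banner."""
    entries = []
    in_log = False
    for line in misc_text.splitlines():
        if "Log Messages" in line:
            in_log = True  # everything after the banner belongs to the log
            continue
        if not in_log:
            continue
        match = LOG_LINE.match(line)
        if match:  # separator lines and blanks simply do not match
            entries.append((match["level"], match["stage"], match["text"]))
    return entries
```

Lines that do not look like log messages, such as the dashed separators, are skipped rather than reported as errors.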
The following table details the ATF/UEFI messages for BlueField-2:
Message | Explanation | Action |
INFO[BL2]: start | BL2 started | Informational |
INFO[BL2]: no DDR on MSS<N> | DDR is not detected on memory controller <N> | Informational (depends on device) |
INFO[BL2]: calc DDR freq (clk_ref 156M, clk xxx) | DDR frequency is calculated based on reference clock 156M | Informational |
INFO[BL2]: calc DDR freq (clk_ref 100M, clk xxx) | DDR frequency is calculated based on reference clock 100M | Informational |
INFO[BL2]: calc DDR freq (clk_ref xxxx) | DDR frequency is calculated based on reference clock xxxx | Informational |
INFO[BL2]: DDR POST passed | BL2 DDR training passed | Informational |
INFO[BL2]: UEFI loaded | UEFI image is loaded successfully in BL2 | Informational |
ERR[BL2]: DDR init fail on MSS<N> | DDR initialization failed on memory controller <N> | Informational (depends on device) |
ERR[BL2]: image <N> bad CRC | Image with ID <N> is corrupted, which will cause a hang | Error message. Reset the device and retry. If the problem persists, retry with a different image. |
ERR[BL2]: DDR BIST failed | DDR BIST failed | Retry. Check the ATF boot messages to verify that the detected OPN is correct and is supported by this image. If it still fails, contact NVIDIA Support. |
ERR[BL2]: DDR BIST Zero Mem failed | DDR BIST failed in the zero-memory operation | Power-cycle and retry. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR frequency unsupported | DDR training is programmed with unsupported parameters | Check whether official FW is being used. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR min-sys(unknown) | System type cannot be determined, so the system boots as a minimal system | Check whether the OPN or PSID is supported. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR min-sys(misconf) | System type is misconfigured, so the system boots as a minimal system | Check whether the OPN or PSID is supported. If the problem persists, contact your NVIDIA FAE. |
Exception(BL2): syndrome = xxxxxxxx | Exception in BL2 with syndrome code and register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
PANIC(BL2): PC = xxx | Panic in BL2 with register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
ERR[BL2]: load/auth failed | Failed to load image (non-existent/corrupted), or image authentication failed when secure boot is enabled | Try again with the correct and properly signed image |
INFO[BL31]: start | BL31 started | Informational |
INFO[BL31]: runtime | BL31 enters the runtime state. This is the last BL31 message in a normal boot process. | Informational |
Exception(BL31): syndrome = xxxxxxxx | Exception in BL31 with syndrome code and register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
PANIC(BL31): PC = xxx | Panic in BL31 with register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
INFO[UEFI]: eMMC init | eMMC driver is initialized | Informational and should always be printed |
INFO[UEFI]: eMMC probed | eMMC card is initialized | Informational and should always be printed |
ASSERT(UEFI]: xxx : line-no | Runtime assert message in UEFI | Contact your NVIDIA FAE with this information. Usually the system is able to continue running. |
INFO[UEFI]: PCIe enum start | PCIe enumeration start | Informational |
INFO[UEFI]: PCIe enum end | PCIe enumeration end | Informational |
ERR[UEFI]: Synchronous Exception at xxxxxx | UEFI Exception with PC value reported | Contact your NVIDIA FAE with this information |
The following table details the ATF/UEFI messages for BlueField:
Message | Explanation | Action |
INFO[BL2]: start | BL2 started | Informational |
INFO[BL2]: no DDR on MSS<N> | DDR is not detected on memory controller <N> | Informational (depends on device) |
INFO[BL2]: calc DDR freq (clk_ref 156M, clk xxx) | DDR frequency is calculated based on reference clock 156M | Informational |
INFO[BL2]: calc DDR freq (clk_ref 100M, clk xxx) | DDR frequency is calculated based on reference clock 100M | Informational |
INFO[BL2]: calc DDR freq (clk_ref xxxx) | DDR frequency is calculated based on reference clock xxxx | Informational |
INFO[BL2]: DDR POST passed | BL2 DDR training passed | Informational |
INFO[BL2]: UEFI loaded | UEFI image is loaded successfully in BL2 | Informational |
ERR[BL2]: DDR init fail on MSS<N> | DDR initialization failed on memory controller <N> | Informational (depends on device) |
ERR[BL2]: image <N> bad CRC | Image with ID <N> is corrupted, which will cause a hang | Error message. Reset the device and retry. If the problem persists, retry with a different image. |
ERR[BL2]: DDR BIST failed | DDR BIST failed | Retry. Check the ATF boot messages to verify that the detected OPN is correct and is supported by this image. If it still fails, contact NVIDIA Support. |
ERR[BL2]: DDR BIST Zero Mem failed | DDR BIST failed in the zero-memory operation | Power-cycle and retry. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR frequency unsupported | DDR training is programmed with unsupported parameters | Check whether official FW is being used. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR min-sys(unknown) | System type cannot be determined, so the system boots as a minimal system | Check whether the OPN or PSID is supported. If the problem persists, contact your NVIDIA FAE. |
WARN[BL2]: DDR min-sys(misconf) | System type is misconfigured, so the system boots as a minimal system | Check whether the OPN or PSID is supported. If the problem persists, contact your NVIDIA FAE. |
Exception(BL2): syndrome = xxxxxxxx | Exception in BL2 with syndrome code and register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
PANIC(BL2): PC = xxx | Panic in BL2 with register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
ERR[BL2]: load/auth failed | Failed to load image (non-existent/corrupted), or image authentication failed when secure boot is enabled | Try again with the correct and properly signed image |
INFO[BL31]: start | BL31 started | Informational |
INFO[BL31]: runtime | BL31 enters the runtime state. This is the last BL31 message in a normal boot process. | Informational |
Exception(BL31): syndrome = xxxxxxxx | Exception in BL31 with syndrome code and register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
PANIC(BL31): PC = xxx | Panic in BL31 with register dump. The system hangs. | Capture the log, analyze the cause, and report to your NVIDIA FAE if needed |
INFO[UEFI]: eMMC init | eMMC driver is initialized | Informational and should always be printed |
INFO[UEFI]: eMMC probed | eMMC card is initialized | Informational and should always be printed |
ASSERT(UEFI]: xxx : line-no | Runtime assert message in UEFI | Contact your NVIDIA FAE with this information. Usually the system is able to continue running. |
ERR[UEFI]: Synchronous Exception at xxxxxx | UEFI Exception with PC value reported | Contact your NVIDIA FAE with this information |
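When boot logs are collected from many devices, the tables above can be turned into a small lookup that suggests an action per message. The sketch below covers only a handful of rows, and the action strings are paraphrased from the tables. The pattern list, the function name, and the assumption that <N> contains no spaces are illustrative choices, not part of any NVIDIA tool.

```python
import re

# A few rows from the message tables; extend as needed.
ACTIONS = [
    (re.compile(r"^ERR\[BL2\]: image \S+ bad CRC$"),
     "Reset the device and retry; if the problem persists, retry with a different image"),
    (re.compile(r"^ERR\[BL2\]: DDR BIST Zero Mem failed$"),
     "Power-cycle and retry; if the problem persists, contact your NVIDIA FAE"),
    (re.compile(r"^WARN\[BL2\]: DDR min-sys\((unknown|misconf)\)$"),
     "Check whether the OPN or PSID is supported"),
    (re.compile(r"^ERR\[BL2\]: load/auth failed$"),
     "Try again with the correct and properly signed image"),
]


def suggested_action(message):
    """Return the action for a log message, or None if it is not in the lookup."""
    message = message.strip()
    for pattern, action in ACTIONS:
        if pattern.match(message):
            return action
    if message.startswith("INFO["):
        return "Informational"  # every INFO row in the tables is informational
    return None
```

Returning None for unknown messages keeps the lookup honest: anything not covered should be checked against the full table by hand.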
During UEFI boot, the BlueField sends IPMI SEL messages over IPMB to the BMC in order to track boot progress and report errors. The BMC must be in responder mode to receive the log messages.
SEL Record Format
The following table presents the format of a standard SEL record (record type = 0x02).
Byte(s) | Field | Description |
1-2 | Record ID | ID used to access the SEL record. Filled in by the BMC. Initialized to zero when coming from UEFI. |
3 | Record Type | Record type |
4-7 | Timestamp | Time when the event was logged. Filled in by the BMC. Initialized to zero when coming from UEFI. |
8-9 | Generator ID | This value is always 0x0001 when coming from UEFI |
10 | EvM Rev | Event message format revision, which identifies the version of the standard the record uses |
11 | Sensor Type | Sensor type code for the sensor that generated the event |
12 | Sensor Number | Number of the sensor that generated the event |
13 | Event Dir / Event Type | [7] – 0b0 = Assertion, 0b1 = Deassertion; [6:0] – Event Type |
14 | Event Data 1 | [7:6] – Type of data in Event Data 2; [5:4] – Type of data in Event Data 3; [3:0] – Event Offset; offers more detailed event categories. See IPMI 2.0 Specification section 29.7 for more detail. |
15 | Event Data 2 | Data attached to the event. 0xFF for unspecified. |
16 | Event Data 3 | Data attached to the event. 0xFF for unspecified. |
See IPMI 2.0 Specification section 32.1 for more detail.
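To make the byte layout concrete, the following Python sketch unpacks a raw 16-byte standard SEL record into named fields. It assumes multi-byte fields are stored least-significant byte first, as IPMI specifies. The tuple and function names are illustrative, and the sample bytes in the usage note were assembled by hand from the `ipmitool sel get 0x7d` output in this document rather than captured from a device.

```python
import struct
from collections import namedtuple

SelRecord = namedtuple(
    "SelRecord",
    "record_id record_type timestamp generator_id evm_rev "
    "sensor_type sensor_number event_dir_type "
    "event_data1 event_data2 event_data3",
)

# "<" selects little-endian with no padding: 2+1+4+2 bytes, then seven single bytes.
SEL_FORMAT = "<HBIHBBBBBBB"


def unpack_sel_record(raw):
    """Unpack a 16-byte standard SEL record into a SelRecord."""
    if len(raw) != struct.calcsize(SEL_FORMAT):
        raise ValueError("a standard SEL record is 16 bytes long")
    return SelRecord(*struct.unpack(SEL_FORMAT, raw))
```

For example, `unpack_sel_record(bytes.fromhex("7d00 02 c68d0a00 0100 04 0f 06 6f c2 13 ff"))` yields record ID 0x7d, record type 0x02, and timestamp 691654.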
Possible SEL Field Values
BlueField UEFI implements a subset of the IPMI 2.0 SEL standard. Each field may have the following values:
Field | Possible Values | Description of Values |
Record Type | 0x02 | Standard SEL record. All events sent by UEFI are standard SEL records. |
Event Dir | 0b0 | All events sent by UEFI are assertion events |
Event Type | 0x6F | Sensor-specific discrete events. Events with this type do not deviate from the standard. |
Sensor Number | 0x06 | UEFI boot progress “sensor”. If value is 0x06, the sensor type will always be “System Firmware Progress” (0x0F). |
For Sensor Type, Event Offset, and Event Data 1-3 definitions, see Event Definitions below.
Event Definitions
Events are defined by a combination of Record Type, Event Type, Sensor Type, Event Offset (occupies Event Data 1), and sometimes Event Data 2 (referred to as the Event Extension if it defines sub-events).
The following tables list all currently implemented IPMI events (with Record Type = 0x02, Event Type = 0x6F).
Note that if an Event Data 2 or Event Data 3 value is not specified, it can be assumed to be Unspecified (0xFF).
Sensor Type | Sensor Type Code | Event Offset | Event Description, Actions to Take |
System Firmware Progress | 0x0F | 0x00 | System firmware error (POST error). Event Data 2: |
System Firmware Progress | 0x0F | 0x02 | System firmware progress: Informational message, no actions needed. Event Data 2: |
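The bit fields that identify an event can be separated with a few shifts and masks. The sketch below is illustrative only. It assumes the IPMI layout in which byte 13 carries Event Dir in bit 7 and Event Type in bits 6:0, and it includes only the two event offsets listed in the table above. The function names are hypothetical.

```python
# Event offsets listed for System Firmware Progress (sensor type 0x0F).
FIRMWARE_PROGRESS_OFFSETS = {
    0x00: "System firmware error (POST error)",
    0x02: "System firmware progress",
}


def decode_event_dir_type(byte13):
    """Split byte 13 into (event_dir, event_type): bit 7 and bits 6:0."""
    return byte13 >> 7, byte13 & 0x7F


def decode_event_data1(byte14):
    """Split Event Data 1 into (data2_type, data3_type, event_offset)."""
    return (byte14 >> 6) & 0x3, (byte14 >> 4) & 0x3, byte14 & 0xF


def describe_offset(event_offset):
    """Return the table description for an offset, or None if it is not listed."""
    return FIRMWARE_PROGRESS_OFFSETS.get(event_offset)
```

With the Event Data value c213ff shown in the ipmitool example in this document, `decode_event_data1(0xC2)` returns `(3, 0, 2)`, which maps to the system firmware progress offset.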
Reading IPMI SEL Log Messages
Log messages can be read from the BMC by sending it a “Get SEL Entry” command while it is in responder mode. The command can be issued either from a remote host or from the BlueField DPU itself once it has booted.
$ ipmitool sel list
7b | Pre-Init |0000691604| System Firmwares #0x06 | SMBus initialization | Asserted
7c | Pre-Init |0000691604| System Firmwares #0x06 | Hard-disk initialization | Asserted
7d | Pre-Init |0000691654| System Firmwares #0x06 | System boot initiated
$ ipmitool sel get 0x7d
SEL Record ID : 007d
Record Type : 02
Timestamp : 01/09/1970 00:07:34
Generator ID : 0001
EvM Revision : 04
Sensor Type : System Firmwares
Sensor Number : 06
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : c213ff
Description : System boot initiated
$ ipmitool sel clear
Clearing SEL. Please allow a few seconds to erase.
$ ipmitool sel list
SEL has no entries
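For scripted collection, the `ipmitool sel list` output can be split on its `|` separators. The following sketch assumes the column order seen in the sample output above (record ID, date, time, sensor, description, optional direction). The key names and both function names are illustrative and are not defined by ipmitool.

```python
import subprocess

SEL_LIST_KEYS = ("record_id", "date", "time", "sensor", "description", "direction")


def parse_sel_list_line(line):
    """Split one 'ipmitool sel list' line into a dict keyed by column name."""
    fields = [field.strip() for field in line.split("|")]
    entry = dict(zip(SEL_LIST_KEYS, fields))  # a missing direction is simply absent
    entry["record_id"] = int(entry["record_id"], 16)  # record IDs are printed in hex
    return entry


def read_sel_list():
    """Run 'ipmitool sel list' and parse every non-empty output line."""
    result = subprocess.run(
        ["ipmitool", "sel", "list"], capture_output=True, text=True, check=True
    )
    return [parse_sel_list_line(l) for l in result.stdout.splitlines() if l.strip()]
```

`read_sel_list()` needs ipmitool installed and a reachable BMC, while `parse_sel_list_line()` can be exercised offline against saved output.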
The ACPI Boot Error Record Table (BERT) is supported for logging the last boot error in Linux. Once Linux printk is enabled (e.g., by adding "kernel.printk=8" to /etc/sysctl.conf), the kernel automatically reports errors from the previous boot. The following is an example of such an error report:
[ 2.635539] BERT: Error records from previous boot:
[ 2.640434] [Hardware Error]: event severity: fatal
[ 2.645331] [Hardware Error]: Error 0, type: fatal
[ 2.650236] [Hardware Error]: section type: unknown, c6adf9e6-1108-4760-8827-003d059fe2e1
[ 2.658606] [Hardware Error]: section length: 0x35
[ 2.663580] [Hardware Error]: 00000000: 52524520 4645555b 203a5d49 0a0d0a0d ERR[UEFI]: ....
[ 2.672284] [Hardware Error]: 00000010: 636e7953 6e6f7268 2073756f 65637845 Synchronous Exce
[ 2.680987] [Hardware Error]: 00000020: 6f697470 7461206e 36783020 37313643 ption at 0x6C617
[ 2.689696] [Hardware Error]: 00000030: 34 37 30 0d 0a
...
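The payload in the hex dump can be converted back to text. Judging from the ASCII column, the 8-digit groups appear to be 32-bit words printed in little-endian order, while the trailing 2-digit groups are single bytes. The sketch below relies on that assumption, and the function name is illustrative.

```python
def decode_bert_dump(hex_groups):
    """Decode whitespace-separated hex groups from the dump into raw bytes."""
    out = bytearray()
    for group in hex_groups.split():
        chunk = bytes.fromhex(group)
        # Assumption: 4-byte groups are little-endian words, so reverse them
        # to recover memory order; shorter groups are already in order.
        out += chunk[::-1] if len(chunk) == 4 else chunk
    return bytes(out)
```

Applied to the first dump line, `decode_bert_dump("52524520 4645555b 203a5d49 0a0d0a0d")` returns `b" ERR[UEFI]: \r\n\r\n"`.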