The CRDUMP feature takes an automatic snapshot of the device CR-Space when the device's FW/HW fails to function properly.

Snapshot Triggers:

ConnectX-3 adapter family - the snapshot is triggered when the driver detects any of the following issues:

Critical event, such as a command timeout

Critical FW command failure

PCI errors

Internal FW error

ConnectX-4/ConnectX-5 adapter family - the snapshot is triggered after the firmware detects a critical issue that requires a recovery flow (see Reset Flow).

This snapshot can later be investigated and analyzed to track the root cause of the failure.

Currently, only the first snapshot is stored; it is exposed through a temporary virtual file. The virtual file is cleared upon driver reset.

When a critical event is detected, a message indicating CRDUMP collection is printed to the Linux log. The user should then back up the file referenced in the printed message. The file location format is:

For mlx4 driver: /proc/driver/mlx4_core/crdump/<pci address>

For mlx5 driver: /proc/driver/mlx5_core/crdump/<pci address>
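
The location format above can be expressed as a small helper. This is only a sketch: the function name crdump_path and the VALID_DRIVERS tuple are hypothetical names introduced for illustration, and the PCI address is assumed to be passed exactly as it appears in the log (for example 0000:00:05.0).

```python
# Hypothetical helper that builds the CRDUMP virtual file path
# from the location format described above.

# Driver names taken from the two path formats in the text.
VALID_DRIVERS = ("mlx4_core", "mlx5_core")


def crdump_path(driver: str, pci_address: str) -> str:
    """Return /proc/driver/<driver>/crdump/<pci address>."""
    if driver not in VALID_DRIVERS:
        raise ValueError(f"unsupported driver: {driver!r}")
    return f"/proc/driver/{driver}/crdump/{pci_address}"


if __name__ == "__main__":
    print(crdump_path("mlx4_core", "0000:00:05.0"))
```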

Example - the following message is printed to the log:

[257480.719070] mlx4_core 0000:00:05.0: Internal error detected:
[257480.726019] mlx4_core 0000:00:05.0: buf[00]: 0fffffff
[257480.732082] mlx4_core 0000:00:05.0: buf[01]: 00000000
....
[257480.806531] mlx4_core 0000:00:05.0: buf[0f]: 00000000
[257480.811534] mlx4_core 0000:00:05.0: device is going to be reset
[257482.781154] mlx4_core 0000:00:05.0: crdump: Crash snapshot collected to /proc/driver/mlx4_core/crdump/0000:00:05.0
[257483.789230] mlx4_core 0000:00:05.0: device was reset successfully
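
As an illustration, the snapshot location could be pulled out of such a log line with a short script. This is a minimal sketch: it assumes the message text matches the "crdump: Crash snapshot collected to" line in the example and that the path contains no spaces. The names CRDUMP_RE and find_crdump_path are hypothetical.

```python
import re
from typing import Optional

# Pattern based on the example log message; the wording of the
# message on other driver versions is an assumption.
CRDUMP_RE = re.compile(r"crdump: Crash snapshot collected to (\S+)")


def find_crdump_path(log_line: str) -> Optional[str]:
    """Return the snapshot path named in a log line, or None."""
    match = CRDUMP_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = (
        "[257482.781154] mlx4_core 0000:00:05.0: crdump: Crash snapshot "
        "collected to /proc/driver/mlx4_core/crdump/0000:00:05.0"
    )
    print(find_crdump_path(line))
```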

The snapshot should be copied with a standard Linux tool (such as cp) for future investigation.
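
Because the virtual file is cleared upon driver reset, the copy may also be scripted. The following is a sketch only, not a vendor-provided tool: the function name backup_snapshot, the "crdump_" file name prefix, and the destination directory are hypothetical. It reads the file in a plain loop rather than relying on its reported size, on the assumption that a virtual file under /proc may not report one.

```python
import os


def backup_snapshot(src: str, dest_dir: str, chunk_size: int = 65536) -> str:
    """Copy the snapshot at src into dest_dir and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Hypothetical naming scheme: prefix the PCI address with "crdump_".
    dest = os.path.join(dest_dir, "crdump_" + os.path.basename(src))
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            fout.write(chunk)
    return dest
```

For example, backup_snapshot("/proc/driver/mlx4_core/crdump/0000:00:05.0", "/var/tmp/crdump") would write the copy to /var/tmp/crdump/crdump_0000:00:05.0, where /var/tmp/crdump is an arbitrary directory chosen for this illustration.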