DMN is a bus driver (mlx4_bus.sys) feature that generates dumps and traces from various components, including hardware, firmware and software, upon internally detected issues (by the resiliency sensors) or user requests (mlxtool). DMN is unsupported on VFs.
DMN dumps are crucial for offline debugging. Once an issue is hit, the dumps can provide useful information about the NIC's state at the time of the failure. This includes hardware state dumps, firmware traces and various driver component state and resource dumps.
For information on the relevant registry keys for this feature, please refer to Configuration below.
DMN Triggers and APIs
The DMN feature supports the following triggering APIs:
- The mlxtool.exe tool can be used to trigger DMN by running the dump-me-now debug subcommand:
The BDFs (bus, device, function) of the installed Mellanox devices can be found using the command:
- An internal API between different driver components, in order to support generating DMN upon self-detected errors and failures (by the resiliency feature).
Dumps and Incident Folders
The DMN feature generates a directory per incident, where it places all of the needed NIC dump files.
The DMN incident directory name includes a timestamp, dump type, DMN event source and reason. It uses the following directory naming scheme:
dmn-<device name>-<type of DMN>-<source of DMN trigger>-<reason>--<timestamp>
In this example:
- The dump type is "general".
- The DMN was triggered internally by the self-healing feature.
- In this version of the driver the cause for the dump is not available in case of a self-healing trigger.
- The dump was created on April 13th, 2017 at 747 milliseconds after 7:49:02 AM.
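The naming scheme above can be split back into its fields when sorting or filtering incident folders offline. The following is a minimal sketch, not part of the driver or mlxtool. It assumes that the type, source and reason fields contain no hyphens and that the timestamp contains no double hyphen; the function name, the class name and the sample folder name used with it are hypothetical.

```python
from typing import NamedTuple


class DmnIncident(NamedTuple):
    device: str
    dump_type: str
    source: str
    reason: str
    timestamp: str  # kept as an opaque string; its exact format is not assumed


def parse_dmn_dir_name(name: str) -> DmnIncident:
    """Split 'dmn-<device>-<type>-<source>-<reason>--<timestamp>' into fields."""
    prefix = "dmn-"
    if not name.startswith(prefix):
        raise ValueError(f"not a DMN incident directory: {name!r}")
    # The timestamp is everything after the last double hyphen.
    body, sep, timestamp = name.rpartition("--")
    if not sep or not timestamp:
        raise ValueError(f"missing '--<timestamp>' suffix: {name!r}")
    # Split from the right so that a device name containing hyphens stays whole.
    parts = body[len(prefix):].rsplit("-", 3)
    if len(parts) != 4:
        raise ValueError(f"expected four fields before the timestamp: {name!r}")
    device, dump_type, source, reason = parts
    return DmnIncident(device, dump_type, source, reason, timestamp)
```

For instance, a hypothetical folder named `dmn-ExampleDev-general-selfhealing-NA--2017.04.13-07.49.02.747` would yield the device `ExampleDev`, the type `general`, the source `selfhealing`, the reason `NA`, and the remainder as the timestamp string.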
In this version of the driver, the DMN generates the following dump files upon a DMN event. Additional files will be added in the future:
- MST dump - Adapter's configuration space registers contents.
- Firmware commands dump - history of recent firmware commands and their status.
- EQ dump - Commands event queue contents.
- Firmware traces - Traces generated by the firmware, and collected by the driver.
- IOV objects state in case of SR-IOV setup.
In this version of the driver, the firmware traces that are logged into the driver's WPP session are not an actual part of the DMN dump, and should be collected separately by the user.
DMN incident dumps are created under the DMN root directory, which can be controlled via the registry. The default is
Cyclic DMN Mechanism
The driver manages the DMN incident dumps in a cyclic fashion, in order to limit the amount of disk space used for saving DMN dumps and to avoid low disk space conditions that can be caused by creating the dumps.
Rather than using a simple cyclic override scheme by replacing the oldest DMN incident folder every time it generates a new one, the driver allows the user to determine whether the first N incident folders should be preserved or not. This means that the driver will maintain a cyclic overriding scheme starting from a given index.
The two registry keys used to control this behavior are DumpMeNowTotalCount, which specifies the maximum number of allowed dumps under the DMN root folder, and DumpMeNowPreservedCountMin, which specifies the number of reserved incident folders that will not be overridden by the cyclic algorithm.
The following diagram illustrates how the cyclic scheme works, assuming DumpMeNowPreservedCountMin=2 and DumpMeNowTotalCount=16:
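The cyclic scheme can also be modelled in a few lines of code. The sketch below is an illustration only, not the driver's implementation: it assumes incident folders occupy numbered slots starting at 0, that the first DumpMeNowPreservedCountMin slots are written once and then kept, and that the remaining slots are reused in order. The function name `dmn_slot` is hypothetical.

```python
def dmn_slot(incident: int, total: int, preserved: int) -> int:
    """Return the slot used by the incident with the given 0-based index.

    total     -- models DumpMeNowTotalCount
    preserved -- models DumpMeNowPreservedCountMin
    """
    if not 0 <= preserved < total:
        raise ValueError("need 0 <= preserved < total")
    if incident < 0:
        raise ValueError("incident index must be non-negative")
    if incident < preserved:
        # The first 'preserved' incidents are written once and never replaced.
        return incident
    # Later incidents cycle through the remaining (total - preserved) slots.
    return preserved + (incident - preserved) % (total - preserved)


if __name__ == "__main__":
    # Slots used by the first 32 incidents with the values from the text.
    print([dmn_slot(i, total=16, preserved=2) for i in range(32)])
```

Under these assumptions, with DumpMeNowPreservedCountMin=2 and DumpMeNowTotalCount=16, incidents 0 through 15 fill slots 0 through 15, incident 16 replaces slot 2, incident 17 replaces slot 3, and slots 0 and 1 are never replaced.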
The registry keys for the DMN feature are located in:
The DMN dump is controlled by the following registry keys:
Dump Me Now Configurations
|Key Name||Key Type||Default||Values||Description|
|DumpMeNowDirectory||REG_SZ||\Systemroot\temp\Mlx4_Dump_Me_Now||File system path||Path to the root directory in which the DMN places its dumps. The path should be provided in kernel path style, which means prefixing the drive name with "\??\" (e.g. \??\C:\DMN_DIR).|
|DumpMeNowTotalCount|| || || ||The maximum number of allowed DMN dumps. Newer dumps beyond this number will override old ones.|
|DumpMeNowPreservedCountMin|| || || ||The number of DMN dumps that will be reserved and will never be overridden by newer DMN dumps.|
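The kernel path style required for DumpMeNowDirectory can be produced from an ordinary drive-letter path by adding the "\??\" prefix. The following helper is a minimal sketch for preparing the value before it is written to the registry. The function name `to_kernel_path` is hypothetical, and only drive-letter paths such as C:\DMN_DIR are handled.

```python
def to_kernel_path(win32_path: str) -> str:
    """Prefix a drive-letter path with '\\??\\' as DumpMeNowDirectory expects."""
    prefix = "\\??\\"
    if win32_path.startswith(prefix):
        # Already in kernel path style; return it unchanged.
        return win32_path
    has_drive = (
        len(win32_path) >= 3
        and win32_path[0].isascii()
        and win32_path[0].isalpha()
        and win32_path[1] == ":"
        and win32_path[2] == "\\"
    )
    if not has_drive:
        raise ValueError(f"expected an absolute drive-letter path: {win32_path!r}")
    return prefix + win32_path


if __name__ == "__main__":
    print(to_kernel_path("C:\\DMN_DIR"))
```

Rejecting relative paths up front is a design choice of this sketch, so that a malformed value is caught before it reaches the registry.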
Certain per-device resiliency registry keys determine when to trigger the DMN from the resiliency feature, and which sensors are allowed to do so.
The DMN-IOV detail level can be configured by the "DmnIovMode" value located in the device parameters registry key.
The default value is 2. The acceptable values are 0-4:
|0||The feature is disabled|
|1||Major IOV objects and their state will be listed|
|2||All VF hardware resources and their state will be listed in the dump (QPs, CQs, MTTs, etc.)|
|3||All QP to Ring mappings will be added (this produces a very large dump)|
|4||All IOV objects and their state will be listed|
Upon the success or failure of generating a dump, the DMN generates an event to the system event log. The following two events are used for that purpose:
|0x100||Info||<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request. Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx4_Dump_Me_Now or a folder that was set by the registry keyword HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters\DumpMeNowDirectory.|
<device name>: Failed to create a full dump me now.