Dump Me Now (DMN)

DMN is a bus driver (mlx4_bus.sys) feature that generates dumps and traces from various components, including hardware, firmware and software, upon issues detected internally (by the resiliency sensors) or upon user request (via mlxtool). DMN is not supported on VFs.

DMN dumps are crucial for offline debugging. When an issue occurs, the dumps provide useful information about the NIC's state at the time of the failure, including hardware state dumps, firmware traces, and state and resource dumps of various driver components.

For information on the registry keys relevant to this feature, refer to the configuration section below.

The DMN feature supports the following triggering APIs:

  1. The mlxtool.exe tool can be used to trigger DMN by running the dump-me-now debug subcommand:

    mlxtool.exe dbg dump-me-now <bus> <dev> <func>

    For example:

    > mlxtool.exe dbg dump-me-now 8 0 0

    The BDF (bus, device, function) of each installed Mellanox device can be found using the command:

    > mlxtool.exe show devices

  2. An internal API between driver components, which allows the resiliency feature to generate a DMN upon self-detected errors and failures.
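The user-request trigger can also be scripted. The following is a minimal Python sketch that only assembles the documented dbg dump-me-now command line and runs it through the standard subprocess module. The helper names are hypothetical; the sketch assumes mlxtool.exe is on the PATH and that the bus, device and function values are supplied exactly as mlxtool.exe show devices reports them.

```python
from __future__ import annotations

import subprocess

# Assumption: mlxtool.exe is installed and reachable through the PATH.
MLXTOOL = "mlxtool.exe"


def dump_me_now_command(bus: str | int, dev: str | int, func: str | int) -> list[str]:
    """Build the argv for 'mlxtool.exe dbg dump-me-now <bus> <dev> <func>'.

    The BDF parts are passed through verbatim (only converted to str), so
    supply them in the form that 'mlxtool.exe show devices' reports.
    """
    return [MLXTOOL, "dbg", "dump-me-now", str(bus), str(dev), str(func)]


def trigger_dump_me_now(bus: str | int, dev: str | int, func: str | int) -> None:
    """Run the DMN trigger; subprocess raises CalledProcessError on a non-zero exit code."""
    subprocess.run(dump_me_now_command(bus, dev, func), check=True)


if __name__ == "__main__":
    # Prints the command line equivalent to: > mlxtool.exe dbg dump-me-now 8 0 0
    print(" ".join(dump_me_now_command(8, 0, 0)))
```

Building the argument list separately from running it keeps the command easy to log or review before anything is executed on the host.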

The DMN feature generates a directory per incident, where it places all of the needed NIC dump files.

The DMN incident directory name includes a timestamp, dump type, DMN event source and reason. It uses the following directory naming scheme:
dmn-<device name>-<type of DMN>-<source of DMN trigger>-<reason>--<timestamp>

Example:
dmn-GENERAL-SH-NA-4.13.2017–07.49.02.747

In this example:

  1. The dump type is "GENERAL".

  2. The DMN was triggered internally by the self-healing (SH) feature.

  3. The reason is NA (not available): in this version of the driver, the cause of the dump is not reported for self-healing triggers.

  4. The dump was created on April 13, 2017, at 7:49:02.747 AM (747 milliseconds after 7:49:02 AM).
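Incident folders can be sorted or filtered by splitting their names back into fields. The Python sketch below is an illustration based on the naming scheme and example above, not a driver-provided parser. Because the scheme lists a device name field that the example lacks, the sketch treats the device name as optional, takes the last three fields before the timestamp as type, source and reason, and accepts either hyphens or an en dash between the date and the time. The names parse_incident_dir and DmnIncident are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Trailing timestamp such as "4.13.2017-07.49.02.747": month.day.year, then
# hours.minutes.seconds.milliseconds. The date/time separator is matched
# loosely (one or more "-" or en dash characters).
_TIMESTAMP = re.compile(
    r"(\d{1,2})\.(\d{1,2})\.(\d{4})[-\u2013]+(\d{2})\.(\d{2})\.(\d{2})\.(\d{3})$"
)


@dataclass
class DmnIncident:
    device: str      # empty when the name carries no device field
    dump_type: str   # e.g. "GENERAL"
    source: str      # e.g. "SH" (self-healing)
    reason: str      # e.g. "NA" (not available)
    created: datetime


def parse_incident_dir(name: str) -> DmnIncident:
    """Split a DMN incident directory name into its fields."""
    match = _TIMESTAMP.search(name)
    if not name.startswith("dmn-") or match is None:
        raise ValueError(f"not a DMN incident directory name: {name!r}")
    month, day, year, hh, mm, ss, ms = (int(g) for g in match.groups())
    fields = name[len("dmn-"):match.start()].strip("-").split("-")
    if len(fields) < 3:
        raise ValueError(f"missing type/source/reason fields: {name!r}")
    # Anything before the last three fields is treated as the device name,
    # which may itself contain hyphens.
    *device, dump_type, source, reason = fields
    return DmnIncident(
        device="-".join(device),
        dump_type=dump_type,
        source=source,
        reason=reason,
        created=datetime(year, month, day, hh, mm, ss, ms * 1000),
    )


if __name__ == "__main__":
    print(parse_incident_dir("dmn-GENERAL-SH-NA-4.13.2017\u201307.49.02.747"))
```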

In this version of the driver, the DMN generates the following dump files upon a DMN event. Additional files will be added in the future:

  1. MST dump - contents of the adapter's configuration space registers.

  2. Firmware commands dump - history of recent firmware commands and their status.

  3. EQ dump - contents of the commands event queue.

  4. Firmware traces - traces generated by the firmware and collected by the driver.

  5. IOV objects state - state of the IOV objects, in an SR-IOV setup.

    Warning

    In this version of the driver, the firmware traces that are logged into the driver's WPP session are not part of the DMN dump and should be collected separately by the user.

DMN incident dumps are created under the DMN root directory, which can be set through the DumpMeNowDirectory registry key. The default is \Systemroot\temp\Mlx4_Dump_Me_Now.

The driver manages the DMN incident dumps cyclically, to limit the disk space used for saving them and to avoid low disk space conditions caused by creating the dumps.

Rather than using a simple cyclic scheme that overwrites the oldest DMN incident folder every time a new one is generated, the driver lets the user choose whether the first N incident folders are preserved. The driver then applies the cyclic overwriting scheme only from index N onward.

The two registry keys that control this behavior are DumpMeNowTotalCount, which specifies the maximum number of dumps allowed under the DMN root folder, and DumpMeNowPreservedCountMin, which specifies the number of preserved incident folders that are never overwritten by the cyclic algorithm.

The following diagram illustrates how the cyclic scheme works, assuming DumpMeNowPreservedCountMin=2 and DumpMeNowTotalCount=16:

[Diagram: cyclic DMN incident folder scheme with DumpMeNowPreservedCountMin=2 and DumpMeNowTotalCount=16]
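The cyclic scheme described above can be modeled in a few lines. This Python sketch is an approximation built from the prose description only: it assumes slots 0 to DumpMeNowPreservedCountMin-1 are written once and kept, and that the remaining slots are reused oldest first. The driver's actual slot ordering may differ, and the function names are hypothetical.

```python
from __future__ import annotations


def slot_for_incident(n: int, total: int, preserved: int) -> int:
    """Return the 0-based folder slot used by the n-th (0-based) incident.

    Slots 0..preserved-1 are written once and never overwritten; slots
    preserved..total-1 are reused cyclically, oldest first.
    """
    if not 0 <= preserved < total:
        # Assumption: the behavior for preserved >= total is not documented.
        raise ValueError("expected 0 <= preserved < total")
    if n < preserved:
        return n
    return preserved + (n - preserved) % (total - preserved)


def surviving_incidents(count: int, total: int, preserved: int) -> list[int]:
    """Simulate `count` incidents and return the incident numbers still on disk."""
    slots: dict[int, int] = {}
    for n in range(count):
        slots[slot_for_incident(n, total, preserved)] = n  # newer overwrites older
    return sorted(slots.values())


if __name__ == "__main__":
    # DumpMeNowPreservedCountMin=2, DumpMeNowTotalCount=16, after 20 incidents:
    # incidents 0 and 1 are preserved; 16..19 reused the slots of 2..5.
    print(surviving_incidents(20, total=16, preserved=2))
```

With preserved set to 0 the model degenerates to the simple cyclic scheme in which the oldest folder is always replaced.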

The registry keys for the DMN feature are located in: HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters

The DMN dump is controlled by the following registry keys:

Dump Me Now Configurations

DumpMeNowDirectory
    Key type: REG_SZ
    Default: \Systemroot\temp\Mlx4_Dump_Me_Now
    Values: File system path
    Description: Path to the root directory in which the DMN places its dumps. The path should be provided in kernel path style, that is, with the drive name prefixed by "\??\" (e.g. \??\C:\DMN_DIR).

DumpMeNowTotalCount
    Key type: REG_SZ
    Default: 128
    Values: 0-0xFFFF
    Description: The maximum number of allowed DMN dumps. Newer dumps beyond this number overwrite older, non-preserved ones.

DumpMeNowPreservedCountMin
    Key type: REG_SZ
    Default: 8
    Values: 0-0xFFFF
    Description: The number of DMN dumps that are preserved and never overwritten by newer DMN dumps.
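The keys above can be set from a script. The Python sketch below validates the values against the documented ranges, converts a drive-letter path to the kernel path style, and writes the keys with the standard winreg module. It is a hedged illustration: the helper names are hypothetical, the counts are written as decimal REG_SZ strings only because the key type listed for them is REG_SZ, and the check that the preserved count stays below the total count is this sketch's own assumption.

```python
from __future__ import annotations

# Registry location of the DMN keys, under HKLM, as documented above.
PARAMS_SUBKEY = r"SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters"
MAX_COUNT = 0xFFFF


def to_kernel_path(path: str) -> str:
    r"""Convert a drive-letter path such as C:\DMN_DIR to \??\C:\DMN_DIR.

    Paths that already start with a backslash (e.g. \??\C:\... or the default
    \Systemroot\...) are assumed to be kernel style and returned unchanged.
    """
    if path.startswith("\\"):
        return path
    if len(path) >= 2 and path[1] == ":":
        return "\\??\\" + path
    raise ValueError(f"expected a drive-letter or kernel-style path: {path!r}")


def build_dmn_values(directory: str, total_count: int = 128,
                     preserved_min: int = 8) -> dict[str, str]:
    """Validate DMN settings and return them as REG_SZ strings."""
    for name, value in (("DumpMeNowTotalCount", total_count),
                        ("DumpMeNowPreservedCountMin", preserved_min)):
        if not 0 <= value <= MAX_COUNT:
            raise ValueError(f"{name} must be in 0-0xFFFF, got {value}")
    if preserved_min >= total_count:
        # Assumption: leave at least one folder slot for cyclic overwriting.
        raise ValueError("DumpMeNowPreservedCountMin should be below DumpMeNowTotalCount")
    return {
        "DumpMeNowDirectory": to_kernel_path(directory),
        "DumpMeNowTotalCount": str(total_count),
        "DumpMeNowPreservedCountMin": str(preserved_min),
    }


def write_dmn_values(values: dict[str, str]) -> None:
    """Write the values with winreg (Windows only; requires administrator rights)."""
    import winreg  # imported here so the helpers above also run on other OSes

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMS_SUBKEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        for name, data in values.items():
            winreg.SetValueEx(key, name, 0, winreg.REG_SZ, data)


if __name__ == "__main__":
    print(build_dmn_values(r"D:\DMN_DIR", total_count=16, preserved_min=2))
```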

Certain per-device resiliency registry keys determine when the resiliency feature triggers DMN and which sensors are allowed to trigger it.

DMN-IOV Configuration

The DMN-IOV detail level can be configured by the DmnIovMode value, which is located in the device parameters registry key.

The default value is 2. The acceptable values are 0-4:

DMN-IOV Configuration

Value   Description
0       The feature is disabled.
1       Major IOV objects and their state are listed.
2       All VF hardware resources and their state are listed in the dump (QPs, CQs, MTTs, etc.).
3       All QP-to-ring mappings are added (the huge dump).
4       All IOV objects and their state are listed.
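Tools that read DmnIovMode can map the raw number to a named level. This Python sketch encodes the table above as an IntEnum and applies the documented default of 2 when the value is absent. The member names are hypothetical, reading the value from the registry is left out because the exact per-device key path is not given here, and rejecting values outside 0-4 is this sketch's choice rather than documented driver behavior.

```python
from __future__ import annotations

from enum import IntEnum


class DmnIovMode(IntEnum):
    """DmnIovMode levels from the table above (member names are hypothetical)."""

    DISABLED = 0         # the feature is disabled
    MAJOR_OBJECTS = 1    # major IOV objects and their state
    VF_HW_RESOURCES = 2  # all VF hardware resources (QPs, CQs, MTTs, ...); default
    QP_TO_RING = 3       # adds all QP-to-ring mappings (the huge dump)
    ALL_OBJECTS = 4      # all IOV objects and their state


DEFAULT_IOV_MODE = DmnIovMode.VF_HW_RESOURCES


def effective_iov_mode(raw: int | None) -> DmnIovMode:
    """Return the mode in effect for a raw DmnIovMode value (None if unset)."""
    if raw is None:
        # Assumption: an absent value means the documented default, 2.
        return DEFAULT_IOV_MODE
    try:
        return DmnIovMode(raw)
    except ValueError:
        # The driver's handling of values outside 0-4 is not documented;
        # this sketch simply rejects them.
        raise ValueError(f"DmnIovMode must be 0-4, got {raw!r}") from None


if __name__ == "__main__":
    for raw in (None, 0, 3):
        print(raw, effective_iov_mode(raw).name)
```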

When dump generation succeeds or fails, the DMN writes an event to the system event log. The following two events are used for this purpose:

Event Logs

Event ID 0x100
    Severity: Info
    Message:
        <device name>: The dump was created at folder (DMN folder name), due to dump-me-now request.
        Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx4_Dump_Me_Now
        or a folder that was set by the registry keyword HKLM\SYSTEM\CurrentControlSet\Services\mlx4_bus\Parameters\DumpMeNowDirectory.

Event ID 0x101
    Severity: Error
    Message:
        <device name>: Failed to create a full dump me now.
        Dump me now root directory: <path to root DMN folder>
        Failure: <Failure description>
        Status: <status code>
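When collecting failure reports, the fields of the 0x101 event can be pulled out of its message text. This Python sketch assumes the message keeps the line layout shown above, with one "Label: value" pair per line after the first line; how the message text is retrieved from the event log is outside its scope, and the function name is hypothetical. The event IDs are also given in decimal (256 and 257) because some event log tools display them that way.

```python
from __future__ import annotations

# Event IDs from the table above.
DMN_DUMP_CREATED = 0x100  # 256, Info
DMN_DUMP_FAILED = 0x101   # 257, Error


def parse_failure_message(message: str) -> dict[str, str]:
    """Extract the device name and labeled fields from a 0x101 event message."""
    first, *rest = message.strip().splitlines()
    device, sep, _ = first.partition(": Failed to create a full dump me now")
    if not sep:
        raise ValueError("not a DMN dump failure message")
    fields = {"device": device.strip()}
    for line in rest:
        # Split on the first colon only, so paths such as \??\C:\... stay intact.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields
```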

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.