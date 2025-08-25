DMN generates dumps and traces from various components, including hardware, firmware and software, upon user requests, upon internally detected issues (by the resiliency sensors) and ND application requests via the extended NVIDIA® ND API.

DMN dumps are crucial for offline debugging. Once an issue is hit, the dumps can provide useful information about the NIC's state at the time of the failure. This includes hardware state dumps, firmware traces and various driver component state and resource dumps.

For information on the relevant registry keys for this feature, please refer to Dump Me Now (DMN) Registry Keys.

DMN supports three triggering APIs:

1. mlx5Cmd.exe can be used to trigger DMN by running the -Dmn sub-command:

    Mlx5Cmd -Dmn -hh | -Name <adapter name>

    Submit dump-me-now request

    Options:
        -hh                       Show this help screen
        -Name <adapter name>      Network adapter name
        -NoMstDump                Run DMN without mst dump
        -CoreDumpQP<QP number>    Run DMN with QP Core Dump

2. ND SPI NVIDIA® extension (defined in ndspi_ext_mlx.h):

   API function to generate a general DMN dump from an ND application:

    HRESULT
    Nd2AdapterControlDumpMeNow(
        __in IND2AdapterControl* pCtrl,
        __in HANDLE hOverlappedFile,
        __inout OVERLAPPED* pOverlapped
        );

   API function to generate a QP based DMN dump from an ND application. The function generates a dump that might include more information about the queue pair specified by its number.

    HRESULT
    Nd2AdapterControlDumpQpNow(
        __in IND2AdapterControl* pCtrl,
        __in HANDLE hOverlappedFile,
        __in ULONG Qpn,
        __inout OVERLAPPED* pOverlapped
        );

3. An internal API between different driver components, in order to support generating DMN upon self-detected errors and failures (by the resiliency feature).
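For scripted collection, the command-line trigger can be wrapped in a small helper. The following is a minimal sketch that only assembles the argument list from the options shown in the help text; the function names `build_dmn_command` and `trigger_dmn` are hypothetical, and the sketch assumes Mlx5Cmd is reachable on the PATH. The -CoreDumpQP option is deliberately left out, because the help text does not make its exact argument syntax clear.

```python
import subprocess


def build_dmn_command(adapter_name, no_mst_dump=False, exe="Mlx5Cmd"):
    """Assemble an 'Mlx5Cmd -Dmn' invocation as an argument list.

    Only the -Name and -NoMstDump options from the help text are used.
    'exe' is an assumption: adjust it to the tool's actual location.
    """
    if not adapter_name:
        raise ValueError("adapter_name is required")
    args = [exe, "-Dmn", "-Name", adapter_name]
    if no_mst_dump:
        args.append("-NoMstDump")
    return args


def trigger_dmn(adapter_name, no_mst_dump=False):
    """Run the command and return the process exit code."""
    cmd = build_dmn_command(adapter_name, no_mst_dump)
    return subprocess.run(cmd, check=False).returncode
```

Passing the arguments as a list avoids quoting problems with adapter names that contain spaces, such as "Ethernet 4".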

DMN generates a directory per incident, where it places all of the needed NIC dump files. There is a mechanism to limit the number of created Incident Directories. For further information, see Cyclic DMN Mechanism.

The DMN incident directory name includes a timestamp, dump type, DMN event source and reason. It uses the following directory naming scheme: dmn-<type of DMN>-<source of DMN trigger>-<reason>-<timestamp>

Example:

    dmn-GN-USR-NA-4.13.2017-07.49.02.747

In this example:

GN: The dump type is "General"

USR: The DMN was triggered by mlx5Cmd (user)

NA: In this version of the driver, the cause of the dump is not available when the dump is triggered by mlx5Cmd

4.13.2017-07.49.02.747: The dump was created on April 13th, 2017 at 07:49:02 AM and 747 milliseconds
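When sorting or filtering many incident directories offline, it can help to split the directory name back into its fields. The sketch below is one possible parser, not part of the driver or its tools. It assumes the timestamp layout is month.day.year-hour.minute.second.milliseconds, as the example suggests, and that the type and source fields contain no hyphens; the function name `parse_incident_dir` is hypothetical.

```python
import re
from datetime import datetime

# dmn-<type>-<source>-<reason>-<timestamp>, with the timestamp layout
# inferred from the single example in the text (an assumption).
_DIR_RE = re.compile(
    r"dmn-(?P<dump_type>[^-]+)-(?P<source>[^-]+)-(?P<reason>.+)-"
    r"(?P<month>\d{1,2})\.(?P<day>\d{1,2})\.(?P<year>\d{4})-"
    r"(?P<hour>\d{2})\.(?P<minute>\d{2})\.(?P<second>\d{2})\.(?P<ms>\d{3})"
)


def parse_incident_dir(name):
    """Split a DMN incident directory name into its fields."""
    m = _DIR_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a DMN incident directory name: {name!r}")
    created = datetime(
        int(m["year"]), int(m["month"]), int(m["day"]),
        int(m["hour"]), int(m["minute"]), int(m["second"]),
        int(m["ms"]) * 1000,  # milliseconds to microseconds
    )
    return {
        "dump_type": m["dump_type"],
        "source": m["source"],
        "reason": m["reason"],
        "created": created,
    }
```

For the example name, this yields dump type GN, source USR, reason NA and a creation time of 2017-04-13 07:49:02.747.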

In this version of the driver, the DMN generates the following dump files upon a DMN event:

IPoIB: The adapter’s IPoIB state

PDDR: The port diagnostics database

General

mst files

Registry

DMN incident dumps are created under the DMN root directory, which can be controlled via the registry. The root directory will include the port identification in its name.

The default is:

Host: "\Systemroot\temp\Mlx5_Dump_Me_Now-<b>-<d>-<f>"

VF: "\Systemroot\temp\Mlx5_Dump_Me_Now-<b>-<d>". See section Dump Me Now (DMN) Registry Keys.

Upon several types of events, the drivers can produce a set of files reflecting the current state of the adapter.

Automatic state dumps via DMN are done upon the following events:

| Event Type | Description | Provider | Default | Tag |
|---|---|---|---|---|
| CMD_FAILED | Command failure | Mlx5 | On | FAILED |
| CMD_TIMEOUT | Timeout reached on a command | Mlx5 | On | TOUT |
| RESILIENCY | Resiliency sensor was activated | Mlx5 | Off | RES |
| EQ_STUCK | Driver decided that an event queue is stuck | Mlx5 | On | EQ |
| TXCQ_STUCK | Driver decided that a transmit completion queue is stuck | Mlx5 | On | TXCQ |
| RXCQ_STUCK | Driver decided that a receive completion queue is stuck | Mlx5 | On | RXCQ |
| PORT_STATE | Adapter passed to “port up” state, “port down” state or “port unknown” state | Mlx5 | On | PORT |
| USER | User application asked to generate dump files | Mlx5 | N/A | USR |

where:

Provider: The driver creating the set of files.

Default: Whether or not the state dumps are created by default upon this event.

Tag: Part of the file name, used to identify the event that has triggered the state dump.

Dump events can be enabled/disabled by adding DWORD32 parameters into HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn> as follows:

Dump events can be disabled by adding the MstDumpMode parameter as follows:

    MstDumpMode 0

PORT_STATE events can be disabled by adding the EnableDumpOnUnknownLink, EnableDumpOnPortDown and EnableDumpOnPortUp parameters as follows:

    EnableDumpOnUnknownLink 0
    EnableDumpOnPortDown 0
    EnableDumpOnPortUp 0

Note: As of WinOF-2 v2.10, the registry keys above can be changed dynamically. If an illegal value is entered, the key falls back to its default value, not to the last value used.

EQ_STUCK, TXCQ_STUCK and RXCQ_STUCK events can be disabled by adding the DisableDumpOnEqStuck, DisableDumpOnTxCqStuck and DisableDumpOnRxCqStuck parameters as follows:

    DisableDumpOnEqStuck 1
    DisableDumpOnTxCqStuck 1
    DisableDumpOnRxCqStuck 1
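One way to apply such settings repeatably is to generate a .reg fragment and import it with the standard Windows registry tools. The sketch below builds that text for a set of DWORD values. It is an illustration only: the helper names `adapter_key` and `build_reg_fragment` are hypothetical, and the adapter instance subkey (the <nn> part of the path) must be supplied by the caller, since it differs per system.

```python
# Network adapter class key from the text; the instance subkey (<nn>)
# is system specific and is passed in by the caller.
CLASS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e972-e325-11ce-bfc1-08002be10318}"
)


def adapter_key(instance):
    """Return the full registry path for one adapter instance subkey."""
    return CLASS_KEY + "\\" + instance


def build_reg_fragment(key_path, values):
    """Render DWORD values as .reg file text for the given key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: value does not fit in a DWORD")
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "0001" is a made-up instance subkey used only for illustration.
    print(build_reg_fragment(
        adapter_key("0001"),
        {
            "DisableDumpOnEqStuck": 1,
            "DisableDumpOnTxCqStuck": 1,
            "DisableDumpOnRxCqStuck": 1,
        },
    ))
```

Generating a file rather than writing the registry directly keeps the change reviewable before it is imported on the target host.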

The set consists of two consecutive mstdump files. These files are created in the same directory as the DMN, and should be sent to NVIDIA® Support for analysis when debugging WinOF-2 driver problems.

Their names have the following format: <event_name>-<dump_mode>_<file_index>.txt

<event_name>

| Event name | Description |
|---|---|
| poll-tout-<OPCODE> | Timeout reached on command with polling mode, OPCODE is the command opcode in the driver. |
| wait-tout-<OPCODE> | Timeout reached on command while waiting, OPCODE is the command opcode in the driver. |
| poll-failed-<OPCODE> | Command with polling mode failed, OPCODE is the command opcode in the driver. |
| wait-failed-<OPCODE> | Command failed, OPCODE is the command opcode in the driver. |
| eth-eq-<EQN>-<EQ_IDX> | EQ stuck, EQN: EQ number, EQ_IDX: EQ index |
| eth-txcq-<CQN> | TXCQ is stuck, CQN is the CQ number |
| eth-rxcq-<CQN> | RXCQ is stuck, CQN is the CQ number |
| eth-<STATE> | PORT change event, STATE: [“up”, “down”, “none”] |
| oid | User application asked for the dump |
| BugCheck | Bug check event |
| resiliency | When resiliency flow is triggered |

<dump_mode>: The mode of collecting the mstdump: "crspace", "fast-crspace"

<file_index>: The file number of this type in the set

Example:

    Name: wait-failed-936-fast-crspace_1.txt
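The naming format can be split back into its parts when triaging a directory of mstdump files. The sketch below is a hypothetical helper, not a driver tool. It assumes the two dump mode names are spelled "crspace" and "fast-crspace", and it relies on the mode being the last hyphen-separated field before the underscore, since event names themselves contain hyphens.

```python
import re

# <event_name>-<dump_mode>_<file_index>.txt
# The event name is matched lazily so that "fast-crspace" is not
# mistaken for an event name ending in "-fast" plus mode "crspace".
_MSTDUMP_RE = re.compile(
    r"(?P<event>.+?)-(?P<mode>fast-crspace|crspace)_(?P<index>\d+)\.txt"
)


def parse_mstdump_name(filename):
    """Return (event_name, dump_mode, file_index) for an mstdump file."""
    m = _MSTDUMP_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"not an mstdump file name: {filename!r}")
    return m["event"], m["mode"], int(m["index"])
```

For the example above, the helper returns the event name wait-failed-936, the mode fast-crspace and the file index 1.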

The default number of sets of files for each event is 20. The other dump files are named as follows: <DumpType>.log

DumpType can be: PDDR, Registry, General, IPoIB, MiniportProfiling

The driver manages the DMN incident dumps in a cyclic fashion, in order to limit the amount of disk space used for saving DMN dumps and to avoid low disk space conditions that can be caused by creating the dumps.

Rather than using a simple cyclic override scheme by replacing the oldest DMN incident folder every time it generates a new one, the driver allows the user to determine whether the first N incident folders should be preserved or not. This means that the driver will maintain a cyclic overriding scheme starting from a given index.

The two registry keys used to control this behavior are DumpMeNowTotalCount, which specifies the maximum number of allowed dumps under the DMN root folder, and DumpMeNowPreservedCount, which specifies the number of reserved incident folders that will not be overridden by the cyclic algorithm.

The following diagram illustrates how the cyclic scheme works, assuming DumpMeNowPreservedCount=2 and DumpMeNowTotalCount=16:
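As a rough model of the scheme described above, the slot used by each new incident can be computed from the two registry values. This is a sketch under stated assumptions, not the driver's actual algorithm: it assumes slots are numbered from 0, that the first DumpMeNowPreservedCount slots are written once and never replaced, and that the remaining slots are overwritten in order, oldest first. The function name `incident_slot` is hypothetical.

```python
def incident_slot(n, total_count, preserved_count):
    """Return the folder slot used by the n-th incident (0-based).

    total_count     models DumpMeNowTotalCount
    preserved_count models DumpMeNowPreservedCount
    """
    if total_count <= 0 or not 0 <= preserved_count < total_count:
        raise ValueError("need 0 <= preserved_count < total_count")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < preserved_count:
        return n  # preserved slots are filled once, in order
    cyclic_slots = total_count - preserved_count
    # Past the preserved range, wrap around within the cyclic slots only.
    return preserved_count + (n - preserved_count) % cyclic_slots
```

With DumpMeNowPreservedCount=2 and DumpMeNowTotalCount=16, incidents 0 through 15 fill slots 0 through 15, incident 16 reuses slot 2, incident 17 reuses slot 3, and slots 0 and 1 are never overwritten.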

The DMN-IOV detail level can be configured by the "DmnIovMode" value that is located in the device parameters registry key. The default value is 2. The acceptable values are 0-4:

| Values | Description |
|---|---|
| 0 | The feature is disabled |
| 1 | Major IOV objects and their state will be listed |
| 2 | All VF hardware resources and their state will be listed in the dump (QPs, CQs, MTTs, etc.) |
| 3 | All QP-to-Ring mapping will be added (the huge dump) |
| 4 | All IOV objects and their state will be listed |

The DMN-PDDR can be configured by the "EnableDumpOnPortUp" and "EnableDumpOnPortDown" values that are located in the device parameters registry key.

The default values of the keys are as follows:

EnableDumpOnPortUp = 0 [capability disabled]

EnableDumpOnPortDown = 1 [capability enabled]

DMN generates an event to the system event log upon the success or failure of the dump file generation.

| Event ID | Message |
|---|---|
| 0x101 | <device name>: Failed to create a full dump me now. Dump me now root directory: <path to root DMN folder> Failure: <Failure description> Status: <status code> |

For a list of the DMN Warning events, see Reported Driver Events.