NVIDIA WinOF-2 Documentation v2.80

Reported Driver Events

The driver records events in the System log of the Windows Server event system. These events can be used to identify, diagnose, and predict sources of system problems.

To see the log of events, open System Event Viewer as follows:

  1. Right-click My Computer, click Manage, and then click Event Viewer,
     or click Start --> Run and enter "eventvwr.exe".

  2. In Event Viewer, select the System log.

The following events are recorded:

Event ID

Message

0x0002

<Adapter name>: Adapter failed to initialize due to FW initialization timeout.

0x0004

<Adapter name>: device has been configured to use RSS while Windows' TCP RSS is disabled. This configuration prevents the initialization and enabling of the port. You need to either enable Windows' TCP RSS, or configure the adapter's port to disable RSS. For further details, see the README file under the documentation folder.

0x0006

<Adapter name>: Maximum MTU supported by FW <L>.<Y>.<Z>(<q>) is smaller than the minimum value <K>.

0x0008

<Adapter name>: Adapter failed to complete FLR.

0x000C

<Adapter name>: device startup fails due to less than minimum MSI-X vectors available.

0x0042

<Adapter name>: FW health report - ver <Y>, hw 0x<Z>, rfr 0x<K>, callra 0x<L>, var[1] 0x<L>, synd <M>, ext_synd 0x<R>, exit_ptr 0x<G>.

0x0045

<Adapter name>: Driver startup fails because minimal IPoIB driver requirements are not supported by FW <Y><Z><F>

FW reported: IPoIB enhanced offloads are not supported
Please burn a firmware that supports the requirements and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com

0x0046

<Adapter name>: Driver startup fails because IPoIB driver is not supported <Y><Z>

IPoIB mode is supported only on physical adapter with RSS mode

0x0047

<Adapter name>: Driver startup fails because RDMA device initialization failed, failure <Y>.

0x004C

<Adapter name>: VF #<Y> reached the maximum number of allocated 4KB pages (<Z>). You could extend this limit by configuring the registry key "MaxFWPagesUsagePerVF".

For more details, please refer to the user manual document.

0x008a

<Adapter name>: Resiliency - Ignores error that was reported by sensor <Y>(0x<Z>) as a result of reaching the limit (<F>) of resets. Please clear the counters of the Resiliency feature.
For more details, please refer to WinOF-2 User Manual.

0x0095

Restart <Adapter name> as a result of error that was reported by sensor <Y>(0x<Z>)

Resiliency state:

  • Restarts count: <F>

  • Max restarts count: <L>

0x0096

Restart <Adapter name> as a result of error that was reported by sensor <Y>(0x<Z>)

Resiliency state:

  • Restarts count: <F>

0x010b

<Adapter name>: QUERY_HCA_CAP command fails with error <Y>.

The adapter card is dysfunctional. Most likely a FW problem. Please burn the last FW and restart the Mellanox ConnectX device.

0x010c

<Adapter name>: QUERY_ADAPTER command fails with error <Y>.

The adapter card is dysfunctional. Most likely a FW problem. Please burn the last FW and restart the Mellanox ConnectX device.

0x0130

<Adapter name>: FW command fails. op 0x<Y>, status 0x<Z>, errno <F>, syndrome 0x<L>.

0x0133

<Adapter name>: Execution of FW command fails. op 0x<Y>, errno <Z>.

0x014f

<Adapter name>: Driver startup fails because an insufficient number of Event Queues (EQs) is available.

(<Y> are required, <Z> are recommended, <M> are available)

0x0153

<Adapter name>: Driver startup has failed due to unsupported port type=<Y> configured on the device.

The driver supports Ethernet mode only, please refer to the Mellanox WinOF-2 User Manual for instructions on how to configure the correct mode.

0x0154

<Adapter name>: Driver startup fails because minimal driver requirements are not supported by FW <Y>.<Z>.<L>.
FW reported:

  • rss_ind_tbl_cap <Q>

  • vlan_cap <M>

  • max_rqs <F>

  • max_sqs <N>

  • max_tirs <O>

Please burn a firmware that supports the requirements and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com

0x0155

<Adapter name>: Driver startup fails because maximum flow table size that is supported by FW <Y>.<Z>.<L> is too small (<K> entries).

Please burn a firmware that supports a greater flow table size and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com.

0x0156

<Adapter name>: Driver startup fails because required receive WQE size is greater than the maximum WQEs size supported by FW <Y>.<Z>.<M>.

(<F> are required, <O> are supported)

0x0157

<Adapter name>: Driver startup fails because maximum WQE size that is supported by FW <Y>.<L>.<M> is too small (<K>).

Please burn a firmware that supports a greater WQE size and restart the Mellanox ConnectX device. For additional information, please refer to Support information on http://mellanox.com

0x0163

NDIS initiated reset on device <Adapter name> has failed.

0x0164

<Adapter name>: FW reported receive engine hang event.

0x0165

<Adapter name>: FW reported transmit engine hang event: vhca_id <Y>, transmit_engine_id <Z>, qpn 0x<F>.

0x016b

Restart <Adapter name> as a result of error that was reported by sensors <Y>(0x<Z>)

0x016e

<Adapter name>: Failed to open Channel Adapter.
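
The IDs in the table above are written in hexadecimal. The following is a minimal sketch for converting between that notation and a decimal value, assuming that Event Viewer displays the Event ID column in decimal. The `KNOWN_EVENTS` dictionary and both helper names are hypothetical and hold only a small sample of the entries listed above; this is not part of the driver or of any NVIDIA tool.

```python
# Sample of event IDs and messages copied from the table above.
# KNOWN_EVENTS, to_decimal and describe are illustrative names only.
KNOWN_EVENTS = {
    0x0002: "Adapter failed to initialize due to FW initialization timeout.",
    0x0008: "Adapter failed to complete FLR.",
    0x0164: "FW reported receive engine hang event.",
    0x016E: "Failed to open Channel Adapter.",
}


def to_decimal(event_id_hex: str) -> int:
    """Convert an ID written as in the table (for example "0x016e") to decimal."""
    return int(event_id_hex, 16)


def describe(event_id: int) -> str:
    """Return the table-style hex ID and the sample message for a decimal ID."""
    message = KNOWN_EVENTS.get(event_id, "not in this sample table")
    return f"0x{event_id:04X}: {message}"


if __name__ == "__main__":
    print(to_decimal("0x016e"))
    print(describe(356))
```

With such a helper, a decimal ID seen in the log (assumption: decimal display) can be matched back to the hexadecimal ID used in this document.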

Event ID

Message

0x0003

<Adapter name>: device has been requested for <Y> Virtual Functions (VFs), while it only supports <Z> VFs. Therefore, only <L> VFs will be allowed.

0x0005

<Adapter name>: Jumbo packet value read from registry (<Y>) is greater than the value supported by FW (<Z>). Therefore use the maximum value supported by FW(<q>).

0x0009

<Adapter name>: Jumbo packet value read from registry(<Y>) is invalid. Therefore use the default value (<Z>).

0x000A

<Adapter name>: Q_Key 0x<Y> is not supported. Only default Q_Key(0x<Z>) is supported by FW.

Note: The adapter will continue to work with the default Q_Key.

0x000F

<Adapter name>: device configures not to use RSS. This configuration may significantly affect the network performance.

0x0010

<Adapter name>: device reports an "Error event" on CQn #<Y>. Since the event type is:<Z>, the NIC will be reset. (The issue is reported in Function <K>).

0x0012

<Adapter name>: Resiliency - The current firmware does not support hardware reset. For more details, please refer to the user manual document.

0x0013

<Adapter name>: device reports a send=<Y> "CQE error" on cqn #<Z> qpn #<M> cqe_error->syndrome <L>, cqe_error->vendor_error_syndrome <N>, Opcode <O> Therefore, the NIC might be reset. (The issue is reported in Function <P>). For more information refer to details.

0x0014

<Adapter name>: device reports an "EQ stuck" on EQn <Y>. Attempting recovery.

0x0015

<Adapter name>: device reports a send completion handling timeout on TxQueue 0x<Y>. Attempting recovery.

0x0016

<Adapter name>: device reports a receive completion handling timeout on RxQueue 0x<Y>. Attempting recovery.

0x0017

<Adapter name>: detected that Head-of-Queue life limit value (<Y>) does not correspond with the Resiliency feature configuration - CheckForHangCQMaxNoProgress = <Z>, SHCheckForHangTimeInSeconds =<F>.

CheckForHangCQMaxNoProgress value is increased to <L>.
For more details, please refer to WinOF-2 User Manual.

0x0018

<Adapter name>: detected that Head of Queue feature is disabled. It is recommended to enable it in order to prevent the system from hanging.

For more details, please refer to WinOF-2 User Manual.

0x0019

<Adapter name>: <Y> value read from registry(<Z>) is invalid. Therefore use the default value (<F>).

0x001A

For more details, please refer to the user manual document.

0x001B

<Adapter name>: Shutting Down RDMA QPs with Excessive Retransmissions feature is not supported by FW <Y>.
For more details, please refer to the user manual document.

0x0020

Flow control on the device <Adapter name> was not enabled. Therefore, RoCE cannot function properly. To resolve this issue, please make sure that flow control is configured on both the hosts and switches in your network. For more details, please refer to the user manual.

0x0022

<Adapter name>: Setting QoS port default priority is not allowed on a virtual device. This adapter will use the default priority <Y>.

0x0023

<Adapter name>: failed to set port default priority to <Y>. This adapter will use the default priority <Z>.

0x0024

<Adapter name>: DCQCN is not allowed on a virtual device.

0x0025

Dcqcn was enabled for adapter <Adapter name> but FW <Y>.<Z>.<W> does not support it. Dcqcn congestion control will not be enabled for this adapter. Please burn a newer firmware. For more details, please refer to the user manual document.

0x0026

<Adapter name>: failed to set Dcqcn RP/NP congestion control parameters. This adapter will use default Dcqcn RP/NP congestion control values. Please verify the Dcqcn configuration and then restart the adapter.

0x0027

<Adapter name>: device is configured with a MAC address designated as a multicast address: <Y>.

Please configure the registry value NetworkAddress with another address, then restart the driver.

0x0029

<Adapter name>: failed to enable Dcqcn RP/NP congestion control for priority <Y>. This adapter will continue without Dcqcn <Y> congestion control for this priority. Please verify the Dcqcn configuration and then restart the adapter.

0x002C

The miniport driver initiates reset on device <Adapter name>.

0x002D

NDIS initiates reset on device <Adapter name>.

0x0034

<Adapter name>: Non-default PKey is not supported by FW.
For more details, please refer to the user manual document.

0x0035

<Adapter name>: According to the configuration under the "Jumbo Packets" advanced property, the MTU configured is <Y>. The effective MTU is the supplied value + 4 bytes (for the IPoIB header). This configuration exceeds the MTU reported by OpenSM, which is <Z>. This inconsistency may result in communication failures. Please change the MTU of IPoIB or OpenSM, and restart the driver.

0x0036

<Adapter name>: GRH-based is configured but IPoIB in Virtual Function (VF) is supported only with LID-based.
The link will stay down until LID-based is configured.

0x0043

<Adapter name>: RDMA device initialization failure <Y>. This adapter will continue running in Ethernet only mode.

0x0048

<Adapter name>: Dcbx is not supported by FW. For more details, please refer to the User Manual document.

0x0049

<Adapter name>: Head of queue Feature is not supported by the installed Firmware

0x004A

<Adapter name>: "RxUntaggedMapToLossless" registry key was enabled but the device is not configured for lossless traffic. please enable PFC or global pauses.

0x004B

<Adapter name>: Delay drop timer timed out for RQ Index 0x<Y>. Dropless mode feature is now disabled.

0x004D

<Adapter name>: Dropless mode entered. For more details, please refer to the User Manual document.

0x004E

<Adapter name>: Dropless mode exited. For more details, please refer to the User Manual document.

0x004F

<Adapter name>: RxUntaggedMapToLossless is enabled. Default priority changed from <Y> to <Z> in order to map traffic to lossless.

0x0050

<Adapter name>: Skipping device (bdf=<Y>:<Z>.<F>), Looks like it's a leftover from KDNET dedicated PF.

0x0051

<Adapter name>: (module <Y>) detects that the link is down. Bad cable was detected, error: <Z>.

Please replace the cable to continue working.

0x0052

<Adapter name>: (module <Y>) detects that the link is down. Cable is unplugged. Please connect the cable to continue working.

0x0053

<Adapter name>: (module <Y>) detected high temperature. Error: <Z>.

0x0054

<Adapter name>: (module <Y>) detects that the link is down. Cable is unsupported.

Please connect a supported cable to continue working.

0x0055

<Adapter name>: (module <Y>) detected bad/unreadable EEPROM.

0x0056

<Adapter name>: (module <Y>) detected an unknown error type.

0x0080

<Adapter name>: RDMA is disabled as a part of the healing policy.

For more details, please refer to the Resiliency section in the WinOF-2 User Manual.

0x0097

<Adapter name>: Failed to initialize Resiliency mechanism as a result of <Y> failure, error <Z>.

0x0107

<Adapter name>: Firmware version <Y>.<Z>.<F> is below the minimum FW version recommended for this driver.

Minimum recommended Firmware version for this driver: <Y>.<Z>.<F>
It is recommended to upgrade the FW, for more details, please refer to WinOF-2 User Manual.

0x0132

Too many IPs in-use for RRoCE.

<Adapter name>: RRoCE supports only <Y> IPs per port.

Please reduce the number of IPs to use the new IPs.

0x0158

<Adapter name>: CQ moderation is not supported by FW <Y>.<Z>.<L>.

0x0159

<Adapter name>: CQ to EQ remap is not supported by FW <Y>.<Z>.<L>.

0x015a

<Adapter name>: PCIe slot power capability was not advertised. Please make sure to use a PCIe slot that is capable of supplying the required power.

0x015b

<Adapter name>: Detected insufficient power on the PCIe slot (<n>W). Please make sure to use a PCIe slot that is capable of supplying the required power.

0x0160

<Adapter name>: VPort counters are not supported by FW <Y>.<Z>.<L>.

0x0161

<Adapter name>: LSO is not supported by FW <Y>.<Z>.<L>.

0x0162

<Adapter name>: Checksum offload is not supported by FW <Y>.<Z>.<L>.

0x0166

<Adapter name>: FW tracer is not supported.

0x0167

<Adapter name>: FW doesn't support trusted VFs, update FW to get more secured VFs.

0x0169

<Adapter name>: Failed to create full dump me now. Dump me now root directory: <Y>, Failure: <Z>, Status: <F>

0x016f

<Adapter name>: Failed to enable NDK with status <Y>.

0x0170

<Adapter name>: Failed to disable NDK with status <Y>.

0x0171

<Adapter name>: RoCE is disabled for the Virtual Functions (VFs) as the FW doesn't support it. For more details, please refer to the User Manual.

0x0173

<Adapter name>: Configuration value cannot be updated for value <Y>.

0x0174

<Adapter name>: Registry key DumpMeNowTotalCount must be greater than registry key DumpMeNowPreservedCount, setting new values: [DumpMeNowTotalCount: <Y> - DumpMeNowPreservedCount: <Z>].

0x0175

<Adapter name>: One or more network ports have been powered down due to insufficient/unadvertised power on the PCIe slot. Please refer to the card's user manual for power specifications or contact Mellanox support.

0x0176

<Adapter name>: (module <Y>) detects that Cable is plugged but the link is down.

0x0178

<Adapter name>: Device dynamic Registry configuration: < > invalid value, refer to user manual for acceptable values.

0x0181

<Adapter name>: Reducing the advertised MaxNumQueuePairs for vPorts to a power of two. Requested: <Y> Set: <Z>.

0x0182

<Adapter name>: Device reports a Send completion handling timeout on TxQueue 0x<Y> of VMQ <Z> . Attempting recovery.

0x0183

<Adapter name>: Device reports a Receive completion handling timeout on RxQueue 0x<Y> Rss table index <Z>VMQ <L> . Attempting recovery.

0x0184

<Adapter name> Firmware does not support the dynamic MSI-X allocation feature.

0x0186

<Adapter name>: DCQCN <X> values read from registry are invalid. Therefore use the default values.

0x0189

<Adapter name>: DCQCN <X> parameter was requested but FW <L>.<Y>.<Z> does not support it. Please burn a newer firmware.

For more details, please refer to the user manual document.

0x018a

<Adapter name>: <X>: QP attached to priority <Y>, which is lossy.

Why lossy: Configured neither PFC nor Global Pause.

Peer: <L>:<M>

Local: <N>:<O>

More: peer_qpn <P>, local_qpn <Q>

0x018b

<Adapter name>: <X>: QP attached to priority <Y>, which is lossy.

Why lossy: Configured PFC with no priorities.

Peer: <L>:<M>

Local: <N>:<O>

More: peer_qpn <P>, local_qpn <Q>

0x018c

<Adapter name>: <X>: QP attached to priority <Y>, which is lossy.

Why lossy: Configured PFC with wrong priority.

Peer: <L>:<M>

Local: <N>:<O>

More: peer_qpn <P>, local_qpn <Q>

0x018e

<Adapter name>: Striding RQ parameters are illegal. Striding RQ will be disabled.

Bytes per stride should be between 64-8192. Number of strides is: <X>. Receive buffer size is: <Y>.

0x0191

<Adapter name>: PCIe width/speed doesn’t match expected value. Expected speed: < > actual speed: < >. Expected width: < > actual width: < >.

0x0192

<Adapter name>: An attempt was made to enable Relaxed Ordering <Read/Write> for Ethernet but the firmware/adapter card does not support this feature or the feature was turned off by the host.
Please upgrade the relevant component or contact the host administrator if you are using an SR-IOV VF to enable this capability. To stop seeing this message in the future, disable it in the Windows Registry.

0x01A1

<Adapter name>: The firmware used does not support the "WQE too small" capability. Please update the firmware to enable it.

0x0193

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source USER.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0194

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source RESILIENCY.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0195

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source PORT.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0196

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source EQ STUCK.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0197

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source TX CQ STUCK.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0198

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source RX CQ STUCK.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x0199

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source CMD TIMEOUT.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x019A

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source CMD FAILED.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x019B

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source RESOURCE DUMP.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x019C

<device name>: The dump was created at folder (DMN folder name), due to dump-me-now request with source MP STATS.
Dump-me-now dumps are placed by default in folder %SystemRoot%\temp\Mlx5_Dump_Me_Now
or a folder that was set by the registry keyword HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\nnnn\DumpMeNowDirectory.

0x019D

<Adapter name>: Failed to add VXLAN UDP port <X> with status <Y>.

0x019E

<Adapter name>: dump-me-now is triggered due to request with source <X>.

Files were not generated since they were not required (Config dump mask=<Y>, Source dump mask=<Z>)

0x01A0

<Adapter name>: DecoupleVmSwitch feature cannot be enabled. Driver: <X>, Port Type: <Y>, FW supports SRIOV: <Z>.
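
Two of the events above describe simple arithmetic: event 0x0035 states that the effective IPoIB MTU is the configured "Jumbo Packets" value plus 4 bytes, and event 0x0181 states that MaxNumQueuePairs is reduced to a power of two. The sketch below illustrates both checks. All function names are hypothetical, and rounding down to the largest power of two not exceeding the request is an assumption based on the word "Reducing"; the driver's exact rule is not stated in this document.

```python
IPOIB_HEADER_BYTES = 4  # header size stated in the text of event 0x0035


def effective_ipoib_mtu(jumbo_packet_value: int) -> int:
    """Effective MTU = configured "Jumbo Packets" value + IPoIB header."""
    return jumbo_packet_value + IPOIB_HEADER_BYTES


def exceeds_opensm_mtu(jumbo_packet_value: int, opensm_mtu: int) -> bool:
    """True when the configuration would match the condition in event 0x0035."""
    return effective_ipoib_mtu(jumbo_packet_value) > opensm_mtu


def reduce_to_power_of_two(requested: int) -> int:
    """Largest power of two <= requested (assumed reading of event 0x0181)."""
    if requested < 1:
        raise ValueError("requested must be a positive integer")
    return 1 << (requested.bit_length() - 1)


if __name__ == "__main__":
    print(exceeds_opensm_mtu(4092, 4096))
    print(exceeds_opensm_mtu(4096, 4096))
    print(reduce_to_power_of_two(12))
```

Under these assumptions, a "Jumbo Packets" value of 4092 fits an OpenSM MTU of 4096, while 4096 does not, because the 4-byte header is added on top of the configured value.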

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.