NVIDIA WinOF-2 Documentation v2.90

RDMA Capabilities

Warning

This capability is supported in RoCE (Ethernet) only.

The driver offers a mechanism to detect excessive retransmissions on an RC connection and to close the connection in response. If the number of retransmissions caused by a Local Ack Timeout, NAK-Sequence Error, or Implied NAK within a specified period exceeds a specified threshold, the QP is handled as if the Retry Count defined by the IB specification had been exceeded.

To set this limit for all RC QPs, set the EXT_QP_MAX_RETRY_PERIOD registry key to the measurement period and the EXT_QP_MAX_RETRY_LIMIT registry key to the retransmission threshold. If either of these registry keys is set to 0x0, the feature is disabled.

Warning

When the threshold is exceeded during the measurement period, the following will occur:

  • The QP will be transitioned to an Error (ERR) state

  • The "Requester QP Transport Retries Exceeded Errors" counter will be incremented. See Mellanox WinOF-2 Diagnostics.
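The behavior described above can be modeled with a short sketch. This is a toy model, not driver or firmware code: the class name RetryMonitor, the state labels, and the use of a fixed (non-sliding) measurement window are assumptions made for illustration, since the source does not say how the period is tracked internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryMonitor:
    """Toy model of the per-QP excessive-retransmission check (hypothetical)."""
    limit: int    # plays the role of EXT_QP_MAX_RETRY_LIMIT
    period: int   # plays the role of EXT_QP_MAX_RETRY_PERIOD, in seconds
    state: str = "RTS"
    exceeded_errors: int = 0  # stands in for the diagnostics counter
    _window_start: Optional[float] = field(default=None, init=False, repr=False)
    _count: int = field(default=0, init=False, repr=False)

    @property
    def enabled(self) -> bool:
        # The feature is disabled if either key is 0x0.
        return self.limit != 0 and self.period != 0

    def on_retransmission(self, now: float) -> str:
        """Record one retransmission at time `now` (seconds); return the QP state."""
        if not self.enabled or self.state == "ERR":
            return self.state
        # Assumption: a new fixed window starts once the period has elapsed.
        if self._window_start is None or now - self._window_start >= self.period:
            self._window_start = now
            self._count = 0
        self._count += 1
        if self._count > self.limit:
            self.state = "ERR"         # QP transitions to the Error state
            self.exceeded_errors += 1  # counter is incremented
        return self.state


if __name__ == "__main__":
    qp = RetryMonitor(limit=3, period=1)
    for t in (0.0, 0.1, 0.2, 0.3):
        qp.on_retransmission(now=t)
    print(qp.state, qp.exceeded_errors)  # prints: ERR 1
```

In this model, four retransmissions inside one second exceed a threshold of 3, while the same number spread over several periods would leave the QP untouched.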

The Shutdown RDMA QPs feature is controlled per adapter, using registry keys.

Registry keys location: HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>

For more information on how to find the device index <nn>, refer to Finding the Index Value of the Network Interface.
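As an illustration of how the two keys could be written under that location, the sketch below only builds the corresponding `reg add` command lines as strings; it does not touch the registry. The helper name reg_add_commands and the index value "0007" are hypothetical, and the actual index must be looked up for the adapter in question.

```python
from typing import List

# Class key of network adapters, as given in the registry location above.
CLASS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e972-e325-11ce-bfc1-08002be10318}"
)


def reg_add_commands(index: str, limit: int, period: int) -> List[str]:
    """Build `reg add` command lines for both keys (hypothetical helper).

    `index` is the adapter's device index <nn>; it is passed through as-is.
    """
    key = CLASS_KEY + "\\" + index
    values = {
        "EXT_QP_MAX_RETRY_LIMIT": limit,
        "EXT_QP_MAX_RETRY_PERIOD": period,
    }
    return [
        f'reg add "{key}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]


if __name__ == "__main__":
    # "0007" is a made-up index used only for this example.
    for command in reg_add_commands("0007", limit=50, period=1):
        print(command)
```

The printed commands would need to be run from an elevated prompt on the Windows host; generating them as text first makes it easy to review the target key before changing anything.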

Key Name: EXT_QP_MAX_RETRY_LIMIT
Key Type: REG_DWORD
Values: [0-0xFFFF], Default = 50
Description: The retransmission threshold. If the number of retransmissions during EXT_QP_MAX_RETRY_PERIOD exceeds this value, the QP is closed due to a faulty connection. The 0x0 value indicates that the feature is disabled.
Note: As of WinOF-2 v2.10, this key can be changed dynamically. In any case of an illegal input, the value will fall back to the default value and not to the last value used.
Note: If the EXT_QP_MAX_RETRY_LIMIT value is set to 0, the EXT_QP_MAX_RETRY_PERIOD value must be set to 0 as well.

Key Name: EXT_QP_MAX_RETRY_PERIOD
Key Type: REG_DWORD
Values: [0-0xFFFF], Default = 1
Description: The period over which retransmissions are counted in order to declare the connection faulty and close the QP. The value is given in seconds. The 0x0 value indicates that the feature is disabled.
Note: As of WinOF-2 v2.10, this key can be changed dynamically. In any case of an illegal input, the value will fall back to the default value and not to the last value used.
Note: If the EXT_QP_MAX_RETRY_PERIOD value is set to 0, the EXT_QP_MAX_RETRY_LIMIT value must be set to 0 as well.

Note (both keys): EXT_QP_MAX_RETRY_LIMIT and EXT_QP_MAX_RETRY_PERIOD registry keys are supported only if the firmware supports this capability. If these keys are used, but not supported by the firmware, the following message is displayed to the user:
"<adapter name>: Shutting Down RDMA QPs with Excessive Retransmissions feature is not supported by FW <FW version>".
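The value rules for the two keys can be checked before they are written. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and "illegal input" is assumed here to mean a value outside [0-0xFFFF], which the source does not define precisely. It mirrors the documented fallback to the default value and flags the case where only one of the two keys is 0.

```python
DEFAULTS = {
    "EXT_QP_MAX_RETRY_LIMIT": 50,
    "EXT_QP_MAX_RETRY_PERIOD": 1,
}
MAX_VALUE = 0xFFFF


def effective_value(name: str, raw) -> int:
    """Return `raw` if it is an integer in [0, 0xFFFF], else the default.

    Assumption: out-of-range or non-integer input counts as illegal input.
    """
    is_int = isinstance(raw, int) and not isinstance(raw, bool)
    if is_int and 0 <= raw <= MAX_VALUE:
        return raw
    return DEFAULTS[name]


def check_pair(limit: int, period: int) -> list:
    """Return a list of human-readable problems with a (limit, period) pair."""
    problems = []
    for name, value in (("EXT_QP_MAX_RETRY_LIMIT", limit),
                        ("EXT_QP_MAX_RETRY_PERIOD", period)):
        if effective_value(name, value) != value:
            problems.append(f"{name}={value!r} is out of range [0-0xFFFF]")
    # If one key is 0, the other must be 0 as well.
    if (limit == 0) != (period == 0):
        problems.append("only one key is 0; set both to 0 to disable the feature")
    return problems


if __name__ == "__main__":
    print(check_pair(0, 1))
    print(effective_value("EXT_QP_MAX_RETRY_LIMIT", 0x10000))  # prints: 50
```

Running such a check ahead of time avoids a silent fallback to the default, which could otherwise look like the feature ignoring the configured value.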

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.