Lossless TCP

Operating Systems: Windows Server 2012, Windows Server 2012 R2, Windows 7 Client, Windows 8.1 Client and Windows Server 2016.

Inbound packets are stored in the data buffers, where they are split into 'Lossy' and 'Lossless' according to the priority field in the 802.1Q VLAN tag. In DSCP-based PFC, all traffic is directed to the 'Lossless' buffer. Packets are taken out of the packet buffer in the order in which they were stored and moved into processing, where a destination descriptor ring is selected. The packet is then scattered into the memory buffer pointed to by the first free descriptor.
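The classification step can be illustrated with a minimal sketch. It reads the 3-bit priority (PCP) from an 802.1Q tag and picks a buffer. The function names, the default lossless priority set {3}, and the choice to treat untagged frames as 'Lossy' are assumptions for illustration only; they are not adapter settings.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_priority(frame: bytes):
    """Return the 3-bit PCP of an 802.1Q-tagged Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-15 follow the destination and source MAC addresses.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci >> 13  # PCP occupies the top 3 bits of the TCI


def select_buffer(frame: bytes, lossless_priorities=frozenset({3}), dscp_based_pfc=False):
    """Pick the 'Lossless' or 'Lossy' buffer for an inbound frame."""
    if dscp_based_pfc:
        return "Lossless"  # in DSCP-based PFC all traffic goes to 'Lossless'
    # Assumption: untagged frames (priority None) are treated as 'Lossy'.
    return "Lossless" if vlan_priority(frame) in lossless_priorities else "Lossy"
```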

Figure: Lossless TCP

When the 'Lossless' packet buffer crosses the XOFF threshold, the adapter sends pause frames according to the port configuration: either 802.3x global pause, or per-priority 802.1Qbb pause (PFC), in which only the priorities configured as 'Lossless' are noted in the pause frame. Packets arriving while the buffer is full are dropped immediately.
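The threshold behavior can be pictured with a simplified model. This is a sketch under stated assumptions: the function name, the byte-based accounting, and the action strings are hypothetical, and the real adapter logic (including when pause frames are refreshed or released) is not described in this document.

```python
def receive(fill, packet_len, capacity, xoff_threshold, lossless_priorities=None):
    """Return (new_fill, action) for one packet arriving at the 'Lossless' buffer.

    lossless_priorities=None models a port configured for global pause;
    a set of priorities (0-7) models a port configured for PFC.
    """
    if fill + packet_len > capacity:
        return fill, "drop"  # buffer full: the packet is dropped immediately
    fill += packet_len
    if fill > xoff_threshold:
        if lossless_priorities is None:
            return fill, "send global pause"
        # One bit per priority; only 'Lossless' priorities are marked.
        vector = sum(1 << p for p in set(lossless_priorities))
        return fill, f"send PFC pause, priority vector 0b{vector:08b}"
    return fill, "store"
```

For example, with priorities 3 and 4 configured as 'Lossless', crossing the threshold yields a vector with bits 3 and 4 set.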

During packet processing, if the selected descriptor ring has no free descriptors, two modes for handling are available – drop mode and poll mode.

Drop Mode

In this mode, a packet arriving at a descriptor ring with no free descriptors is dropped once the adapter has verified that no descriptors are free. This isolates the network from host driver execution delays, and isolates the different software entities sharing the adapter (e.g. SR-IOV VMs) from one another.

Poll Mode

In this mode, a packet arriving at a descriptor ring with no free descriptors waits until a free descriptor is posted. All processing for this packet and the packets that follow it is halted while the free descriptor status is polled. This propagates backpressure into the Rx buffer, which accumulates incoming packets. When the XOFF threshold is crossed, the flow control mechanisms described earlier stop the remote transmitters, which prevents packets from being dropped.

Since this mode breaks the aforementioned isolation, the adapter offers a mitigation mechanism that limits the amount of time a packet may wait for a free descriptor, while halting all packet processing. When the allowed time expires the adapter reverts to the 'Drop Mode' behavior.
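The difference between the two modes, and the effect of the wait limit, can be summarized in a small decision function. This is only a sketch: the function name and return values are hypothetical, and it assumes that the waiting packet itself is dropped when the allowed time expires.

```python
def handle_no_descriptor(mode, posted_after_us, timeout_us=None):
    """Outcome for a packet that finds no free descriptor in its ring.

    mode            -- "drop" or "poll"
    posted_after_us -- time until the driver posts a descriptor (None = never)
    timeout_us      -- poll-mode wait limit (None = wait indefinitely)
    Returns (outcome, waited_us, mode_afterwards).
    """
    if mode == "drop":
        return "dropped", 0, "drop"
    # Poll mode: all packet processing halts while descriptor status is polled.
    if posted_after_us is not None and (timeout_us is None or posted_after_us <= timeout_us):
        return "delivered", posted_after_us, "poll"
    if timeout_us is None:
        return "stalled", None, "poll"  # no limit and no descriptor ever posted
    # Allowed time expired: the adapter reverts to drop mode behavior.
    return "dropped", timeout_us, "drop"
```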

Default Behavior

By default the adapter works in 'Drop Mode'. The adapter reverts to this mode upon initialization/restart.

  • The feature is not available for SR-IOV Virtual Functions

  • It is recommended that the feature be used only when the port is configured to maintain flow control.

  • It is recommended that the configured timeout not exceed the typical timeout values of management protocols, which are usually on the order of several seconds.

  • In order for the feature to effectively prevent packet drops, the DPC load duration needs to be lower than the TCP retransmission timeout.

  • The feature is only activated if neither of the ports is IB.

  • Operating Systems: Windows Server 2012, Windows Server 2012 R2, and Windows Server 2016

  • Firmware: 2.31.5050

This feature is controlled by two settings: the registry key DelayDropTimeout, which enables the Lossless TCP capability in hardware, and the Set OID OID_MLX_DROPLESS_MODE, which triggers the transition to and from Lossless (poll) mode.

Enabling Lossless TCP Using The Registry Key DelayDropTimeout

Registry key location:

HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\<nn>\DelayDropTimeout

For instructions on how to find the interface index <nn> in the registry, please refer to Finding the Index Value of the Network Interface.
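A minimal sketch of writing the key from Python on Windows follows. It uses the standard-library winreg module and the standard network adapter class GUID; the helper names and the example index "0007" are hypothetical. It requires administrator rights, and the new value still needs a restart of the network interface to take effect.

```python
NET_CLASS_KEY = (
    r"SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e972-e325-11ce-bfc1-08002be10318}"
)


def delay_drop_subkey(index: str) -> str:
    """Build the subkey path for interface index <nn>, e.g. "0007" (hypothetical)."""
    return NET_CLASS_KEY + "\\" + index


def set_delay_drop_timeout(index: str, value: int) -> None:
    """Write DelayDropTimeout as a REG_DWORD under the interface's subkey."""
    if not 0 <= value <= 65535:
        raise ValueError("DelayDropTimeout must be in the range 0-65535")
    import winreg  # Windows-only module, imported late so the helpers load elsewhere

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, delay_drop_subkey(index), 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "DelayDropTimeout", 0, winreg.REG_DWORD, value)
```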

Enabling Lossless TCP Using The Registry Key DelayDropTimeout

Key Name: DelayDropTimeout

Key Type: REG_DWORD

Values: 0 = disabled (default); 1-65535 = enabled

Description: Choosing a value between 1 and 65534 enables the feature and limits the amount of time a packet may wait for a free descriptor. The value is in units of 100 microseconds, with an inaccuracy of up to 2 units. The resulting time ranges from 100 microseconds to ~6.5 seconds. For example, DelayDropTimeout=3000 limits the wait time to 300 milliseconds (+/- 200 microseconds).

Choosing the value 65535 enables the feature with no limit on the amount of time a packet may wait for a free descriptor.

Note: Changing the value of the DelayDropTimeout registry key requires a restart of the network interface.
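The unit arithmetic for DelayDropTimeout can be checked with a short helper. The function names are hypothetical; the constants come from the description above (100-microsecond units, up to 2 units of inaccuracy, 65535 meaning no limit).

```python
UNIT_US = 100            # one DelayDropTimeout unit, in microseconds
INACCURACY_UNITS = 2     # stated inaccuracy of the timer
INFINITE = 65535         # value that disables the wait limit


def delay_drop_wait_us(value: int):
    """Nominal wait limit in microseconds; None if disabled, inf if unlimited."""
    if not 0 <= value <= 65535:
        raise ValueError("DelayDropTimeout must be in the range 0-65535")
    if value == 0:
        return None
    if value == INFINITE:
        return float("inf")
    return value * UNIT_US


def delay_drop_bounds_us(value: int):
    """(low, high) wait limit in microseconds for a finite, enabled value."""
    nominal = delay_drop_wait_us(value)
    if nominal is None or nominal == float("inf"):
        raise ValueError("bounds are only defined for values 1-65534")
    slack = INACCURACY_UNITS * UNIT_US
    return nominal - slack, nominal + slack
```

For DelayDropTimeout=3000 this gives a nominal 300000 microseconds (300 milliseconds), and the largest finite value, 65534, gives 6553400 microseconds.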


Entering/Exiting Lossless Mode Using Set OID OID_MLX_DROPLESS_MODE

To enter poll mode, the DelayDropTimeout registry value must be non-zero, and the OID_MLX_DROPLESS_MODE Set OID must be called with an Information Buffer containing 1.

  • OID_MLX_DROPLESS_MODE value: 0xFFA0C932

  • OID Information Buffer Size: 1 byte

  • OID Information Buffer Contents: 0 - exit poll mode; 1 - enter poll mode
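The request parameters listed above can be assembled as follows. This sketch only builds the OID value and the 1-byte Information Buffer and checks the DelayDropTimeout precondition; the function name is hypothetical, and the mechanism for actually submitting the Set OID to the driver is not covered in this document and is not shown.

```python
OID_MLX_DROPLESS_MODE = 0xFFA0C932


def dropless_mode_request(enter_poll_mode: bool, delay_drop_timeout: int):
    """Return (oid, information_buffer) for entering or exiting poll mode."""
    if enter_poll_mode and delay_drop_timeout == 0:
        # Poll mode can only be entered when DelayDropTimeout is non-zero.
        raise ValueError("DelayDropTimeout must be non-zero to enter poll mode")
    # 1-byte buffer: 1 = enter poll mode, 0 = exit poll mode
    return OID_MLX_DROPLESS_MODE, bytes([1 if enter_poll_mode else 0])
```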

To allow monitoring of state transitions, events are written to the event log with mlx4_bus as the source. The associated events are listed in the table below.

Lossless TCP Associated Events

Event ID: Event Description

0x0057: <Device Name> Dropless mode entered on port <X>. Packets will not be dropped.

0x0058: <Device Name> Dropless mode exited on port <X>. Drop mode entered; packets may now be dropped.

0x0059: <Device Name> Delay drop timeout occurred on port <X>. Drop mode entered; packets may now be dropped.
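A monitoring script could follow these events to track the current mode of a port. The sketch below assumes the event IDs have already been read from the event log by some other means; the function and table names are hypothetical.

```python
# Mode the port is in after each Lossless TCP event (IDs from the table above).
EVENT_MODE = {
    0x0057: "poll",  # dropless mode entered
    0x0058: "drop",  # dropless mode exited
    0x0059: "drop",  # delay drop timeout occurred
}


def port_mode(event_ids, initial="drop"):
    """Replay event IDs in order and return the resulting mode.

    The adapter starts in drop mode by default; unrelated event IDs are ignored.
    """
    mode = initial
    for event_id in event_ids:
        mode = EVENT_MODE.get(event_id, mode)
    return mode
```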

© Copyright 2023, NVIDIA. Last updated on Oct 26, 2023.