MLNX_OFED v4.5-1.0.1.0 Documentation

Out-of-Order (OOO) Data Placement Experimental Verbs

Warning

This feature is only supported on:

  • ConnectX-5 adapter cards and above

  • RC and XRC QPs

  • DC transport

In certain fabric configurations, InfiniBand packets for a given QP may take different paths through the network from source to destination, so they can arrive out of order. Instead of being dropped, these packets can now be handled, which avoids retransmission and thereby:

  • Achieving better network utilization

  • Decreasing latency

Data will be placed into host memory in an out-of-order manner when out-of-order messages are received.

User Space Application QPs

OOO Data Placement for user space application QPs can be enabled using the Linux environment variable MLX5_RELAXED_PACKET_ORDERING_ON.

To enable for all devices, set the variable to "all". Example:

$ export MLX5_RELAXED_PACKET_ORDERING_ON="all"

To enable for a specific device (in this example, mlx5_0):

$ export MLX5_RELAXED_PACKET_ORDERING_ON="mlx5_0"

To enable for multiple devices:

$ export MLX5_RELAXED_PACKET_ORDERING_ON="mlx5_0 mlx5_3 mlx5_4"
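
A mistyped device name in the list is easy to overlook, so it can help to validate the value before starting the application. The following is a minimal sketch, not part of MLNX_OFED: the function name ooo_check_devices is hypothetical, and it assumes that device names such as mlx5_0 appear as entries under /sys/class/infiniband. The directory is a parameter so that the function can be exercised without hardware.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_OFED): verify that every device named
# in a MLX5_RELAXED_PACKET_ORDERING_ON-style value exists under a directory.
# Assumption: RDMA devices are listed under /sys/class/infiniband.
# Returns 0 if the value is "all" or every name exists, 1 otherwise.
ooo_check_devices() {
    value="$1"
    dir="${2:-/sys/class/infiniband}"
    status=0

    # "all" names no individual device, so there is nothing to check.
    if [ "$value" = "all" ]; then
        return 0
    fi

    # The value is a space-separated list; rely on word splitting on purpose.
    for dev in $value; do
        if [ ! -e "$dir/$dev" ]; then
            printf 'unknown device: %s\n' "$dev" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, running ooo_check_devices "$MLX5_RELAXED_PACKET_ORDERING_ON" after the export reports each name that has no matching entry and returns a non-zero status if any name is missing.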


Kernel ULP QPs

OOO Data Placement for kernel QPs can be enabled by writing to the driver's debugfs entry. Example:

$ echo <0|1> > /sys/kernel/debug/mlx5/<pci-bus>/ooo/enable
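
On a host with several adapters or ports, the same value usually has to be written once per PCI function. The following is a minimal sketch of how that could be scripted. The function name ooo_set_all is hypothetical, and it assumes that debugfs is mounted at /sys/kernel/debug and that the caller has permission to write to it (typically root). The base directory is a parameter so that the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLNX_OFED): write 0 or 1 to the ooo/enable
# entry of every PCI function found under a base directory and print how many
# entries were updated.
# Assumption: debugfs is mounted at /sys/kernel/debug.
ooo_set_all() {
    value="$1"
    base="${2:-/sys/kernel/debug/mlx5}"

    # Accept only the two values shown in the documented command.
    case "$value" in
        0|1) ;;
        *) printf 'value must be 0 or 1\n' >&2; return 2 ;;
    esac

    count=0
    for entry in "$base"/*/ooo/enable; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$entry" ] || continue
        printf '%s\n' "$value" > "$entry" || return 1
        count=$((count + 1))
    done
    printf '%s\n' "$count"
}
```

Calling ooo_set_all 1 as root would then apply the setting to every entry that matches the path pattern above. A printed count of 0 means that no matching entry was found.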

To make this configuration persistent, update the configuration file /etc/infiniband/mlx5.conf as follows:

# Enable all mlx5 devices.
MLX5_RELAXED_PACKET_ORDERING_ON="all"

# Enable mlx5_0 only
MLX5_RELAXED_PACKET_ORDERING_ON="mlx5_0"

# Enable mlx5_0, mlx5_3 and mlx5_4
MLX5_RELAXED_PACKET_ORDERING_ON="mlx5_0 mlx5_3 mlx5_4"

# To Disable the feature, use MLX5_RELAXED_PACKET_ORDERING_OFF variable
# which supports same syntax as MLX5_RELAXED_PACKET_ORDERING_ON:
MLX5_RELAXED_PACKET_ORDERING_OFF="mlx5_1"


Notes

  • On the responder side, contents of the RDMA write buffer are guaranteed to be fully received only if one of the following events takes place:

    • Completion of the RDMA Write with immediate data

    • Arrival and completion of the subsequent Send message

    • Update of a memory element by a subsequent RDMA Atomic operation

  • On the requester side, contents of the RDMA read buffer are guaranteed to be fully received only if one of the following events takes place:

    • Completion of the RDMA Read Work Request (if completion is requested)

    • Completion of the subsequent Work Request

© Copyright 2023, NVIDIA. Last updated on Oct 23, 2023.