Changes and New Features

Linux Kernel Upstream Release Notes v5.17

The following are the changes and/or new features that have been added to this version of MLNX_OFED.

Feature/Change

Description

Adapters: ConnectX-4 and above

IP-in-IP RSS Offload

Added support for receive side scaling (RSS) offload in IP-in-IP (IPv4 and IPv6).

Devlink Port Support in Non-representor Mode

Added support for viewing the mlx5e physical devlink ports using the 'devlink port' command. This may also affect network interface names if a predictable naming scheme is configured: a suffix indicating the port number is added to the interface name.

Devlink Health State Notifications

Added support for receiving notifications on devlink health state changes when an error is reported or recovered by one of the reporters. These notifications can be seen using the userspace 'devlink monitor' command.

Legacy SR-IOV VF LAG Load Balancing

When VF LAG is in use and the firmware supports it, the Tx affinity of channels is distributed round-robin among the different ports, with all SQs of a channel sharing the same port affinity. This distributes traffic sent from a VF between the two ports, and also round-robins the starting port among VFs to distribute traffic originating from single-core VMs.
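As a rough illustration of the distribution described above, the following Python sketch models round-robin Tx affinity. It is not the driver's code: the function name, the 1-based port numbering, and the choice of offsetting the starting port by the VF index are assumptions made for the example.

```python
from typing import List


def tx_affinity(vf_index: int, num_channels: int, num_ports: int = 2) -> List[int]:
    """Return a 1-based port number for each channel of one VF.

    Hypothetical model: channels rotate over the ports, and the starting
    port is offset by the VF index so that single-channel VFs do not all
    land on the same port.
    """
    return [((vf_index + ch) % num_ports) + 1 for ch in range(num_channels)]


if __name__ == "__main__":
    for vf in range(4):
        print(f"VF{vf}: {tx_affinity(vf, num_channels=4)}")
```

In this model, a VF with a single channel (for example, a single-core VM) uses port 1 if its index is even and port 2 if it is odd, which is the effect of rotating the starting port among VFs.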

RDMA-CM DevX Support

Added support for DevX in RDMA-CM applications.

RoCEv2 Flow Label and UDP Source Port Definition

This feature allows the flow label and UDP source port to be defined in RoCE v2. These fields are used to create entropy for network routes (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE headers.
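To illustrate how a flow label can supply entropy for the UDP source port, the sketch below folds a 20-bit flow label into 14 bits and maps the result into the range 0xC000 to 0xFFFF. The folding scheme and the 0xC000 floor are assumptions modeled on the approach used in the upstream kernel; the function and constant names are hypothetical, and the exact driver behavior may differ.

```python
FLOW_LABEL_MASK = 0xFFFFF  # a flow label is 20 bits wide
UDP_SPORT_MIN = 0xC000     # assumed floor of the RoCE v2 UDP source port range


def flow_label_to_udp_sport(flow_label: int) -> int:
    """Fold a 20-bit flow label into a UDP source port in 0xC000..0xFFFF."""
    fl = flow_label & FLOW_LABEL_MASK
    low = fl & 0x03FFF            # lower 14 bits
    high = (fl & 0xFC000) >> 14   # upper 6 bits, shifted down
    return (low ^ high) | UDP_SPORT_MIN


if __name__ == "__main__":
    for label in (0x00000, 0x12345, 0xFFFFF):
        print(f"flow label {label:#07x} -> UDP sport {flow_label_to_udp_sport(label):#06x}")
```

Because different flow labels map to different source ports, devices that hash only on the outer IP/UDP headers can still spread RoCE flows over several paths.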

RDMA Tx Steering

Enabled RDMA Tx steering flow table. Rules in this flow table will allow for steering transmitted RDMA traffic.

Custom Parent-Domain Allocators for CQ

Enabled specific custom allocations for CQs.

mlx5dv Helper APIs for Tx Affinity Port Selection

Added support for the following mlx5dv helper APIs which enable the user application to query or set a RAW QP's Tx affinity port number in a LAG configuration.

  • mlx5dv_query_qp_lag_port

  • mlx5dv_modify_qp_lag_port

RDMA-CM Path Alignment

Added support for RoCE network path alignment between RDMA-CM message and QP data. The drivers and network components in RoCE calculate the same hash results for egress port selection both on the NICs and the switches.

IPoIB QP Number Creation

Enabled setting the QP number of an IPoIB PKey interface in Enhanced mode. This is done using the standard ip link add command while padding the hardware address of the newly created interface. The QP number occupies the 2nd through 4th bytes of the address. To enable the feature, the MKEY_BY_NAME configuration must first be enabled in NvConfig.
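The address layout described above can be sketched as follows. This is a minimal example, assuming a 20-byte IPoIB hardware address, big-endian byte order for the QP number, and zero padding for all remaining bytes; the function name is hypothetical.

```python
IPOIB_HW_ADDR_LEN = 20  # IPoIB link-layer addresses are 20 bytes long


def ipoib_hw_addr_with_qpn(qpn: int) -> str:
    """Build a zero-padded hardware address string carrying a 24-bit QPN."""
    if not 0 <= qpn <= 0xFFFFFF:
        raise ValueError("QP number must fit in 24 bits")
    addr = bytearray(IPOIB_HW_ADDR_LEN)   # every other byte is zero padding (assumption)
    addr[1:4] = qpn.to_bytes(3, "big")    # 2nd-4th bytes carry the QP number
    return ":".join(f"{b:02x}" for b in addr)


if __name__ == "__main__":
    print(ipoib_hw_addr_with_qpn(0x000123))
```

The returned string is intended as the hardware address value supplied to the ip link add command mentioned above.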

CQ and QP Context Exposure

Exposed QP, CQ and MR context in raw format via RDMA tool.

In-Driver xmit_more

Enabled the xmit_more feature by default in kernels that lack Rx bulking support (v4.19 and below) to ensure optimized IP forwarding performance when stress from Rx to Tx flow is insufficient.

In kernels with Rx bulking support, xmit_more is disabled in the driver by default, but can be enabled to achieve enhanced IP forwarding performance.

Relaxed Ordering

Relaxed ordering is a PCIe feature which allows flexibility in the transaction order over PCIe. This reduces the number of retransmissions on the lane, and increases performance by up to 4 times.

By default, mlx5e buffers are created with Relaxed Ordering support when firmware capabilities are on and the PCI subsystem reports that the CPU is not on the kernel's blocklist.

Note: Some CPUs which are not listed in the kernel's blocklist may suffer from a buggy implementation of relaxed ordering, in which case the user may experience a degradation in performance and even unexpected behavior. To turn off relaxed ordering and restore the previous behavior, run the setpci command as instructed here. Example:

"RlxdOrd-": setpci -s82:00.0 CAP_EXP+8.w=294e

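The value written in the setpci example depends on the current contents of the PCIe Device Control register (CAP_EXP+8), in which bit 4 is the Enable Relaxed Ordering bit. The sketch below shows how such a value can be derived by clearing that bit. The helper names and the starting value 0x295e are assumptions for illustration; on a real system, read the current value first (for example, with setpci -s 82:00.0 CAP_EXP+8.w).

```python
RELAXED_ORDERING_ENABLE = 1 << 4  # bit 4 of the PCIe Device Control register


def disable_relaxed_ordering(devctl: int) -> int:
    """Return the 16-bit Device Control value with relaxed ordering cleared."""
    return devctl & ~RELAXED_ORDERING_ENABLE & 0xFFFF


def setpci_command(bdf: str, devctl: int) -> str:
    """Format the setpci write for the given device and current register value."""
    return f"setpci -s {bdf} CAP_EXP+8.w={disable_relaxed_ordering(devctl):04x}"


if __name__ == "__main__":
    # 0x295e is a hypothetical current value with relaxed ordering enabled.
    print(setpci_command("82:00.0", 0x295E))
```

Clearing only bit 4 leaves the other Device Control settings, such as the maximum payload and read request sizes, unchanged.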
ODP Huge Pages Support

Enabled ODP Memory Region (MR) to work with huge pages by exposing IBV_ACCESS_HUGETLB access flag to indicate that the MR range is mapped by huge pages.

The flag is applicable only in conjunction with IBV_ACCESS_ON_DEMAND.

Offloaded Traffic Sniffer

Removed support for the Offloaded Traffic Sniffer feature. Its function is replaced by the upstream tcpdump tool.

Adapters: ConnectX-5 and above

Connection Tracking Offload

Added support for offloading TC filters containing connection tracking matches and actions.

Dual-Port RoCE Support

Enabled simultaneous operation of dual-port RoCE and Ethernet in SwitchDev mode.

IP-in-IP Tunnel Offload for Checksum and TSO

Added support for the driver to offload checksum and TSO in IP-in-IP tunnels.

Packet Pacing DevX Support

Enabled RiverMax to work over DevX with packet pacing functionality by exposing a few DV APIs from rdma-core that allocate and free a packet pacing index. For further details on usage, see the man pages for mlx5dv_pp_alloc() and mlx5dv_pp_free().

Software Steering Support for Memory Reclaiming

Added support for reclaiming device memory to the system when it is not in use. This feature is disabled by default and can be enabled by calling the mlx5dv_dr_domain_set_reclaim_device_memory() API.

SR-IOV Live Migration

[Beta] Added support for performing a live migration of a VM with an SR-IOV NIC VF attached to it, with minimal to no traffic disruption. This feature is supported in SwitchDev mode, enabling users to fully leverage VF TC/OVS offloads, where the failover inbox driver is in the Guest VM and the bonding driver is in the Hypervisor.

Note that you must use the latest QEMU and libvirt from the Upstream github.com sources.

Uplink Representor Modes

Removed support for new_netdev mode in SwitchDev mode. The new default behaviour is to always keep the NIC netdev.

OVS-DPDK Offload Statistics

Added support for dumping connection tracking offloaded statistics.

OVS-DPDK Connection Tracking Labels Exact Matching

Added support for labels exact matching in OVS-DPDK CT openflow rules.

Adapters: ConnectX-5 & ConnectX-6 Dx

OVS-DPDK LAG Support

Added support for LAG (modes 1, 2 and 4) with OVS-DPDK.

Adapters: ConnectX-6 and above

Get FEC Status on PAM4/50G

Allowed configuration of Reed Solomon and Low Latency Reed Solomon over PAM4 link modes.

RDMA-CM Enhanced Connection Establishment (ECE)

Added support for automatically enabling or disabling vendor-specific features during connection establishment between network nodes, which is performed over the RDMA-CM messaging interface.

RoCE Selective Repeat

This feature introduces a new QP retransmission mode in RoCE in which a dropped packet is recovered by re-sending only that packet instead of re-sending the entire PSN window (Go-Back-N protocol). This feature is enabled by default when RDMA-CM is used and both connection nodes support it.
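The difference between the two recovery modes can be shown with a simplified model. The sketch below is illustrative only: the function names are hypothetical, and it ignores PSN wraparound, acknowledgments and timers.

```python
from typing import List


def go_back_n_resend(sent_psns: List[int], dropped_psn: int) -> List[int]:
    """Go-Back-N: resend the dropped packet and every packet sent after it."""
    start = sent_psns.index(dropped_psn)
    return sent_psns[start:]


def selective_repeat_resend(sent_psns: List[int], dropped_psn: int) -> List[int]:
    """Selective Repeat: resend only the dropped packet."""
    if dropped_psn not in sent_psns:
        raise ValueError("dropped PSN was never sent")
    return [dropped_psn]


if __name__ == "__main__":
    window = [10, 11, 12, 13, 14]
    print("Go-Back-N resends:       ", go_back_n_resend(window, 12))
    print("Selective Repeat resends:", selective_repeat_resend(window, 12))
```

The saving grows with the number of packets in flight behind the dropped one, which is why selective repeat helps most on links with many outstanding packets.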

Adapters: ConnectX-6 Dx & BlueField-2

IPsec Full Offload

[Beta] Added support for IPsec full offload (VxLAN over ESP transport).

Hardware vDPA on OVS-DPDK

Added support for configuring hardware vDPA on OVS-DPDK. This support includes the option to fall back to Software vDPA in case the NIC installed in the system does not support hardware vDPA.

Adapters: ConnectX-6 Dx

IPsec Crypto Offloads

Support for IPsec Crypto Offloads feature over ConnectX-6 Dx devices and up is now at GA level.

TLS Tx Hardware Offload

Support for TLS Tx Hardware Offload feature over ConnectX-6 Dx devices and up is now at GA level.

TLS Rx Hardware Offload

[Alpha] Added support for hardware offload decryption of TLS Rx traffic over crypto-enabled ConnectX-6 Dx NICs and above.

Userspace Software Steering ConnectX-6 Dx Support

Support for software steering on ConnectX-6 Dx adapter cards in the user-space RDMA-Core library through the mlx5dv_dr API is now at GA level.

Kernel Software Steering ConnectX-6 Dx Support

[Beta] Added support for kernel software steering on ConnectX-6 Dx adapter cards.

Adapters: ConnectX-6 Lx

Adapters

Added support for ConnectX-6 Lx adapter cards.

Adapters: All

RDMA-Core Migration

As of MLNX_OFED v5.1, the legacy verbs libraries have been fully replaced by the RDMA-Core library.

For the list of new APIs used for various MLNX_OFED features, please refer to the Migration to RDMA-Core document.

Firmware Reactivation

Added support for safely inserting consecutive firmware images without the need to reset the NIC in between.

UCX-CUDA Support

UCX-CUDA is now supported on the following OSs and platforms:

OS              Platform
RedHat 7.6 ALT  PPC64LE
RedHat 7.7      x86_64
RedHat 7.8      PPC64LE/x86_64
RedHat 7.9      x86_64
RedHat 8.1      x86_64
RedHat 8.2      x86_64

HCOLL-CUDA

The hcoll package includes a CUDA plugin (hmca_gpu_cuda.so). As of MLNX_OFED v5.1, it is built on various platforms as the package hcoll-cuda. It will be installed by default if the system has CUDA 10.2 installed.

Notes:

  • If you install MLNX_OFED from a package repository, you will need to install the package hcoll-cuda explicitly to be able to use it.

  • HCOLL-CUDA is supported on the same OSs that include support for UCX-CUDA (listed in the table above), except for RedHat 8.1 and 8.2.

GPUDirect Storage (GDS)

[Beta] Added support for the new technology of GDS (GPUDirect Storage) which enables a direct data path between local or remote storage, such as NFS, NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. Both GPUDirect RDMA and GPUDirect Storage avoid extra copies through a bounce buffer in the CPU's memory. They enable the direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory, without burdening the CPU or GPU.

To enable the feature, run ./mlnxofedinstall --with-nfsrdma --with-nvmf --enable-gds --add-kernel-support

To get access to GDS Beta, please reach out to the GDS team at GPUDirectStorageExt@nvidia.com.

For the list of operating systems on which GDS is supported, see here.

Bug Fixes

See the "Bug Fixes" section.

For additional information on the new features, please refer to MLNX_OFED User Manual.

MLNX_OFED Verbs API Migration

As of the MLNX_OFED v5.0 release (Q1 2020), the MLNX_OFED Verbs API has migrated from the legacy version of the user space verbs libraries (libibverbs, libmlx5, etc.) to the Upstream version, rdma-core.

For the list of MLNX_OFED verbs APIs that have been migrated, refer to the Migration to RDMA-Core document.

© Copyright 2023, NVIDIA. Last updated on Oct 23, 2023.