NVIDIA MLNX_OFED Documentation Rev 5.4-1.0.3.0
Linux Kernel Upstream Release Notes v5.17

Changes and New Features

The following are the new features and changes that were added in this version.

Feature/Change

Description

ASAP2

Enlarge Switchdev Tables

[ConnectX-5 and above] Added support in OVS kernel for up to 128 matches (groups) per table and 16M entries per group.

Offloading Extended ct_state Flags

[ConnectX-5 and above] Added support to offload ct_state flags rpl, inv, and rel.

  • For rpl, support was added for both set and not set matching offload (i.e., +rpl and -rpl).

  • For inv and rel, support was added only for the not set option (i.e., -rel and -inv).
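The two rules above can be written as a small lookup table. The following is a minimal sketch, not driver code: the names OFFLOADABLE and is_offloadable are hypothetical, and the table covers only the three flags named in this entry.

```python
# Offload rules for the three ct_state flags named in this entry.
# Each flag maps to the match polarities that can be offloaded:
# "+" means the flag is set, "-" means it is not set.
OFFLOADABLE = {
    "rpl": {"+", "-"},  # both +rpl and -rpl
    "inv": {"-"},       # only -inv
    "rel": {"-"},       # only -rel
}


def is_offloadable(match: str) -> bool:
    """Return True if a single match such as '+rpl' or '-inv' can be offloaded.

    Raises ValueError for malformed input or for flags this sketch
    does not cover.
    """
    if len(match) < 2 or match[0] not in "+-":
        raise ValueError(f"expected '+flag' or '-flag', got {match!r}")
    sign, flag = match[0], match[1:]
    if flag not in OFFLOADABLE:
        raise ValueError(f"flag {flag!r} is not covered by this sketch")
    return sign in OFFLOADABLE[flag]
```

For example, is_offloadable("-rel") returns True, while is_offloadable("+inv") returns False.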

Core

Scalable Functions (Subfunctions)

[ConnectX-5 and above] Added support for scalable functions (also called subfunctions). The feature enables the user to create, configure, and deploy scalable functions (e.g., for RDMA and networking applications) and to assign them to a container when the container is started, using the mlxdevm tool.

A scalable function can also be deployed in an untrusted guest/host system from the NIC/DPU. This enables full configuration of the function and its representors from the NIC/DPU before the function is handed to a container running in a host system.

For more information, see https://github.com/Mellanox/scalablefunctions/wiki/MLNX_OFED-step-by-step-guide.

Scalable Function QoS

[ConnectX-5 and above] Added support for scalable function QoS and QoS group via mlxdevm's rate commands. Run "man mlxdevm port" for details.

Auxiliary Bus in mlx5 Driver

[ConnectX-4] Updated mlx5 driver to use auxiliary bus in order to integrate different driver components into driver core and optimize module load/unload sequences.

Installation

Script Removal from mlnx-ofa_kernel

[General] Moved all Python scripts and some other common scripts out of the mlnx-ofa_kernel packages. This removes the Python dependency from those packages and avoids unnecessary errors when rebuilding them for custom kernels.

Netdev

What-Just-Happened (WJH) in NICs

[ConnectX-4] Added support for WJH in NICs. WJH allows for visibility of dropped packets (i.e., receiving notice of drop counters increase, seeing content of the dropped packets, debugging, and more).

WJH is a service in devlink context and it is already implemented in the switch.

Note: Processing dropped packets (even for visibility purposes) may degrade performance and leaves the driver vulnerable to malicious attacks. The feature is disabled by default.

Supported traps:

  • VLAN mismatch: existing generic trap DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER

    Traps received packets with wrong VLAN tag

  • DMAC mismatch: new generic trap DEVLINK_TRAP_GENERIC_ID_DMAC_FILTER

    Traps received packets with wrong destination MAC

User-space support: devlink infrastructure (man7.org/linux/man-pages/man8/devlink-trap.8.html)

Devlink provides an infrastructure called devlink trap, which allows a device to register/unregister and to enable/disable traps. Devlink traps also provide trap grouping and policing. The trapped packets are monitored and then forwarded to the drop monitor, which sends notifications to user space about dropped packets.

Note: For this release, NIC WJH will not implement the policy.
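As an illustration of enabling and disabling a trap, the sketch below builds the devlink command line for the two supported traps. It assumes that the traps are exposed under the generic devlink names ingress_vlan_filter and dmac_filter, and that enabling a trap means switching its action from drop to trap. The helper name trap_set_cmd and the device address used in the usage note are hypothetical.

```python
import shlex

# devlink names assumed for the two traps listed above.
SUPPORTED_TRAPS = ("ingress_vlan_filter", "dmac_filter")


def trap_set_cmd(dev: str, trap: str, action: str = "trap") -> list:
    """Build the argv for 'devlink trap set DEV trap TRAP action ACTION'.

    action="trap" delivers the dropped packets to the drop monitor;
    action="drop" restores the default (feature disabled).
    """
    if trap not in SUPPORTED_TRAPS:
        raise ValueError(f"unsupported trap: {trap!r}")
    if action not in ("trap", "drop"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["devlink", "trap", "set", dev, "trap", trap, "action", action]


def as_shell(argv: list) -> str:
    """Render argv as a single shell-quoted command line."""
    return shlex.join(argv)
```

For a device at the hypothetical address pci/0000:08:00.0, as_shell(trap_set_cmd("pci/0000:08:00.0", "dmac_filter")) yields a command that can be pasted into a root shell. Running "devlink trap show" lists the traps that the device actually registered.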

ethtool Extended Link State

[General] Added ethtool extended link state to mlx5e.

ethtool can be used to get more information to help troubleshoot the state.

For example, if there is no link due to missing cable, run the following:

$ ethtool eth1

...

Link detected: no (No cable)

Besides the general extended state, drivers can pass additional information about the link state using the sub-state field.

Example:

$ ethtool eth1

...

Link detected: no (Autoneg, No partner detected)

The extended state is available only for some cases of no link. In other cases, ethtool will print only "Link detected: no" as it did before.
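A monitoring script can pick the extended state and sub-state out of the ethtool output. The following is a minimal sketch that assumes the "Link detected:" line has exactly the shapes shown in the examples above; the names LinkState and parse_link_detected are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LinkState(NamedTuple):
    detected: bool
    ext_state: Optional[str]     # e.g. "No cable", "Autoneg"
    ext_substate: Optional[str]  # e.g. "No partner detected"


# Matches "Link detected: yes|no", optionally followed by "(state[, sub-state])".
_LINK_RE = re.compile(r"Link detected:\s*(yes|no)(?:\s*\((.*)\))?\s*$")


def parse_link_detected(line: str) -> LinkState:
    """Parse one 'Link detected:' line from ethtool output."""
    m = _LINK_RE.search(line)
    if m is None:
        raise ValueError(f"not a 'Link detected' line: {line!r}")
    detected = m.group(1) == "yes"
    state = substate = None
    if m.group(2):
        # The first comma separates the extended state from the sub-state.
        parts = [p.strip() for p in m.group(2).split(",", 1)]
        state = parts[0]
        substate = parts[1] if len(parts) > 1 else None
    return LinkState(detected, state, substate)
```

When the extended state is not available, both ext_state and ext_substate are None, which matches the plain "Link detected: no" output.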

RDMA

DV "Signature API"

[ConnectX-5 and above] Added support for "Signature API" which, on supported devices, allows application-level data-integrity checks via a signature handover mechanism. Various signature types, including CRC32 and T10-DIF, can be automatically calculated and checked, stripped, or appended during the transfer at full wire speed.

ibv_query_qp_data_in_order() verb

[General] Added support for the ibv_query_qp_data_in_order() API. This API enables an application to check whether data on the given QP is guaranteed to arrive in order, which allows the application to poll for data instead of polling for completion.

Relaxed Ordering for Kernel ULPs

[ConnectX-4] Added support for enabling Relaxed Ordering for Kernel ULPs. Using relaxed ordering can improve performance in some setups. Since kernel ULPs are expected to support RO, it is enabled for them by default so they can benefit from it.

ah_to_qp Mapping

[ConnectX-6 Dx] Added support for mapping a QP to AH over DEVX API, which enables DC/UD QPs to use multiple CC algorithms in the same data center.

Steering UserSpace

Matching on RAW Tunnel Headers

[ConnectX-5 and above] Added DR support for matching on RAW tunnel headers using the misc5 parameters. This feature allows matching on each bit of the header, including reserved fields.

Software Steering Insertion Rate Optimizations

[ConnectX-6 Dx] Improved the insertion rate in software steering. This includes a multi-QP approach that skips areas in the code that may be needed for debug only.

Software Steering Rule Optimization

[ConnectX-6 Dx] Improved the rate of steering rule updates, insertions, and deletions. The feature includes definers, a multi-QP approach, and better memory usage.

Duplicate Rules Insertion

[ConnectX-5 and above] Added the ability to allow or prevent insertion of duplicate rules, so the user can choose one of the following behaviors:

1. Prevent duplicate rules, so that an attempt to insert an already-existing rule fails and can be detected.

2. Allow duplicate rules, to enable updating the rule's action (this will only take effect once the previous rule is deleted).

By default, duplicate rules are allowed.
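The two behaviors can be illustrated with a toy model. This is a sketch of the semantics described above only; it is not the software steering API, and the class RuleTable and its methods are hypothetical.

```python
class RuleTable:
    """Toy model of a steering table keyed by match value."""

    def __init__(self, allow_duplicates: bool = True):
        # Duplicates are allowed by default, as stated above.
        self.allow_duplicates = allow_duplicates
        self._rules = {}  # match -> list of actions, oldest first

    def insert(self, match, action):
        """Insert a rule; fail on a duplicate match if duplicates are prevented."""
        existing = self._rules.setdefault(match, [])
        if existing and not self.allow_duplicates:
            raise ValueError(f"duplicate rule for match {match!r}")
        existing.append(action)

    def delete_oldest(self, match):
        """Delete the oldest rule for this match."""
        actions = self._rules.get(match)
        if not actions:
            raise KeyError(match)
        actions.pop(0)
        if not actions:
            del self._rules[match]

    def lookup(self, match):
        """Return the action in effect: the oldest rule wins."""
        actions = self._rules.get(match)
        return actions[0] if actions else None
```

With duplicates allowed, inserting a second rule for the same match leaves lookup() returning the first action until delete_oldest() is called, which mirrors the note that the new action takes effect only once the previous rule is deleted.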

Improved Software Steering Rule Creation Stability

[ConnectX-6 Dx] All rule insertions now complete within a defined time, by using a defined (exported) Htable size and reducing the use of dynamic allocation.

For additional information on the new features, please refer to MLNX_OFED User Manual.

Feature/Change

Description

udev Rules

As of version 5.4, the driver is set so that udev rules will change the names of network interfaces created from NVIDIA adapters.

The udev rules are shipped to "/lib/udev/rules.d" and may be overridden by placing a file with the same name in "/etc/udev/rules.d".

Example: /etc/udev/rules.d/82-net-setup-link.rules
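The override behavior follows the standard udev precedence, in which a file in /etc/udev/rules.d replaces a file of the same name in /lib/udev/rules.d. A minimal sketch of that resolution, using a hypothetical helper name and taking the file lists as plain arguments so that it does not touch the file system:

```python
import posixpath


def effective_rules(lib_files, etc_files):
    """Map each rules file name to the path that takes effect.

    Files from /etc/udev/rules.d override files with the same name
    from /lib/udev/rules.d; all other files are kept as-is.
    """
    effective = {posixpath.basename(p): p for p in lib_files}
    # Entries added later win, so /etc entries replace same-name /lib entries.
    effective.update({posixpath.basename(p): p for p in etc_files})
    return effective
```

For example, passing the shipped /lib/udev/rules.d/82-net-setup-link.rules together with a local /etc/udev/rules.d/82-net-setup-link.rules maps the name 82-net-setup-link.rules to the /etc path.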

Network Interface Names, udevd

[ConnectX-4 and above] In MLNX_OFED 5.4 GA, ConnectX-4/5/6 Ethernet network interfaces are now provided with permanent names.

Prior to this release, the default interface names provided by the kernel and udevd (ethX) were kept as-is.

From this release onwards, interface names are generated via new udevd rules.

The generated names are now predictable, and the default names are automatically renamed to the predictable names by the udevd daemon, according to udev rules files installed by OFED.

The new interface names look as follows: en[P<domain>][p<bus number>]s<slot>f<function>

For example, a ConnectX device with PCI address: 0005:01:00.1 will be named enP5p1s0f1

The advantage of this naming scheme is that a device whose PCI address is 0005:01:00.1 always gets the same interface name, since the name now depends only on the device's location in the host (its PCI address).

(Previously there were race conditions which sometimes caused the same physical device to get a different interface name upon reboot).
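The mapping from PCI address to interface name can be sketched as follows. This is an illustration inferred from the example above, not the logic of the shipped udev rules. It assumes that the name has the form en[P<domain>]p<bus>s<slot>f<function>, that the numbers are rendered in decimal, that the P<domain> part is omitted for domain 0, and that the function suffix is always present. The function name pci_to_ifname is hypothetical.

```python
import re

# domain:bus:slot.function, e.g. 0005:01:00.1 (hexadecimal fields)
_PCI_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def pci_to_ifname(pci_addr: str) -> str:
    """Derive a predictable Ethernet interface name from a PCI address."""
    m = _PCI_RE.match(pci_addr)
    if m is None:
        raise ValueError(f"not a PCI address: {pci_addr!r}")
    domain, bus, slot = (int(x, 16) for x in m.group(1, 2, 3))
    func = int(m.group(4))
    # Assumption: the domain prefix appears only for a non-zero domain.
    prefix = f"P{domain}" if domain else ""
    return f"en{prefix}p{bus}s{slot}f{func}"
```

For the address used in the example, pci_to_ifname("0005:01:00.1") returns "enP5p1s0f1". Because the result is a pure function of the PCI address, it does not change across reboots.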

Note: Ethernet interface renaming for ConnectX-4/5/6 is performed only when eswitch is supported.

eswitch is supported on kernels starting from kernel version 4.9; for Linux distro kernels earlier than 4.9.0, eswitch is supported only on RHEL7.x and on XenServer 7.1 CU2.

Deprecated, OvS-DPDK

OvS-DPDK deprecated the command "ovs-appctl dpctl/dump-e2e-stats".

Instead, the command has been integrated into the existing command "ovs-appctl dpctl/offload-stats-show -m" (when e2e-cache is enabled).

OvS-DPDK

OvS-DPDK ct-ct-nat offload is now disabled by default.

A new knob in OvS was introduced: "ovs-vsctl set open_vswitch . other_config:ct-action-on-nat-conns=" (default value is false).

If disabled, ct-ct-nat configurations will not be fully offloaded, improving connection offloading rate for other cases (ct and ct-nat).

If enabled, ct-ct-nat configurations will be fully offloaded but ct and ct-nat offloading will be created more slowly.

mlnxofedinstall, udev, MLNX_OFED, umad

Before version 5.4, /etc/udev/rules.d/90-ib.rules could be edited automatically by the installation scripts if the options --umad-dev-rw or --umad-dev-na were used. As of version 5.4, those changes are made in /etc/udev/rules.d/91-ib-permissions.rules, which (if it exists) includes only the settings for those command-line options.

MLNX_OFED Verbs API Migration

As of the MLNX_OFED v5.0 release (Q1 2020), the MLNX_OFED Verbs APIs have migrated from the legacy version of the user-space verbs libraries (libibverbs, libmlx5, etc.) to the upstream version, rdma-core.

For the list of MLNX_OFED verbs APIs that have been migrated, refer to Migration to RDMA-Core document.

© Copyright 2023, NVIDIA. Last updated on Oct 23, 2023.