Release Notes Change Log History

This section lists the changes and new features of the three most recent major (GA) releases. For the history of older versions, refer to their dedicated release notes.

Supported Cards

Description

All HCAs

Supported in the following adapter cards unless specifically stated otherwise:

ConnectX-4 / ConnectX-4 Lx / ConnectX-5 / ConnectX-6 / ConnectX-6 Dx / ConnectX-6 Lx / ConnectX-7 / BlueField-2

ConnectX-6 Dx and above

Supported in the following adapter cards unless specifically stated otherwise:

ConnectX-6 Dx / ConnectX-6 Lx / ConnectX-7 / BlueField-2

ConnectX-6 and above

Supported in the following adapter cards unless specifically stated otherwise:

ConnectX-6 / ConnectX-6 Dx / ConnectX-6 Lx / ConnectX-7 / BlueField-2

ConnectX-5 and above

Supported in the following adapter cards unless specifically stated otherwise:

ConnectX-5 / ConnectX-6 / ConnectX-6 Dx / ConnectX-6 Lx / ConnectX-7 / BlueField-2

ConnectX-4 and above

Supported in the following adapter cards unless specifically stated otherwise:

ConnectX-4 / ConnectX-4 Lx / ConnectX-5 / ConnectX-6 / ConnectX-6 Dx / ConnectX-6 Lx / ConnectX-7 / BlueField-2

Feature/Change

Description

5.6-2.0.9.0

Operating Systems

Added support for the following Operating Systems: RHEL8.6, RHEL9.0, SLES15-SP4.

General

Bug fixes

Feature/Change

Description

5.6-1.0.3.3

General

New Adapter Card Support

Added support for ConnectX-7 adapter cards. ConnectX-7 has the same feature set as the ConnectX-6 adapter card.

ASAP2 Features

Bridge Spoof Check

[All HCAs] Added support for spoof check with TC flower rules on representors attached to bridge to mirror spoof check SR-IOV functionality.

Setting VF Group Rate Limit

[ConnectX-5 and above] Added support for setting VF group rate limit using Devlink command.

TC Flows on Shared Block

[ConnectX-5 and above] Added support for creation of TC flows on shared block of VF representors.

Flow Metering

[ConnectX-6 Dx and above] Added support for offloading OpenFlow Meters in OVS-DPDK.

Please note the following:

  • Meter offload can be applied only on port 0 and its VFs

  • Only one meter per flow is allowed

  • Only one meter band per meter is allowed

  • Only meter band type drop is supported

  • Meter-stats might not be accurate

Core Features

Firmware Reset

[BlueField-2] Added support for firmware reset in DPU NIC mode.

Increased Robustness of mlx5_core Driver Recovery

[All HCAs] Increased the firmware pre-initialization timeout from 2 minutes to 2 hours when waiting for firmware during driver health recovery. This allows the driver to passively recover from a firmware reset even if the reset takes an unusually long time. Additionally, added an exit clause to the wait-for-firmware loop, allowing an immediate response to a user-initiated device removal.

NetDev Features

Ethtool CQE Mode Control

[ConnectX-4 and above] Replaced the vendor-specific Ethtool API (priv-flag) with a standard Ethtool API (replaced 'ethtool --set-priv-flags ethX rx_cqe_moder on/off tx_cqe_moder on/off' with 'ethtool -C ethX cqe-mode-rx on/off cqe-mode-tx on/off'). This decreases the amount of vendor-specific configurations and aligns mlx5 driver with the upstream Ethtool API.
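As a rough sketch of the migration, the helper below only prints the standard-API command that replaces an old priv-flag invocation. The function name cqe_mode_cmd and the interface name eth0 are placeholders, and nothing is executed against a device.

```shell
#!/bin/sh
# Hypothetical helper: print the standard ethtool coalescing command that
# replaces the old rx_cqe_moder/tx_cqe_moder private flags.
# Arguments: <interface> <rx on|off> <tx on|off>
cqe_mode_cmd() {
    for v in "$2" "$3"; do
        case "$v" in
            on|off) ;;
            *) echo "value must be on or off: $v" >&2; return 1 ;;
        esac
    done
    echo "ethtool -C $1 cqe-mode-rx $2 cqe-mode-tx $3"
}

# Dry run: prints the command instead of running it.
cqe_mode_cmd eth0 on off
```

To apply the setting, run the printed command with root privileges on a kernel and ethtool version that support the cqe-mode-rx and cqe-mode-tx parameters.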

SyncE

[ConnectX-6 Dx] Added an indication in the SyncE daemon of whether the SyncE engine moved to holdover state due to a failure (the reason for the failure is displayed). In addition, added an indication of whether the SyncE engines collected enough frequency samples to move to holdover. Note: Not supported in ConnectX-6 Lx adapter cards.

RDMA Features

VFIO, CQ Interrupt Mode

[ConnectX-5 and above] Added support for VFIO applications to listen on and capture completion events via the Event Queue mechanism.

VFIO, Asynchronous Event

[ConnectX-5 and above] Added support for VFIO applications to listen on and capture device asynchronous events via the Event Queue mechanism.

Security

OVS-IPSec Full Offload

[BlueField-2] Added support for configuration of IPsec full offload using OVS by adding VXLAN tunnel to OVS with the PSK option.

Software Steering Features

Full Tunnel Header Matching

[ConnectX-6 Dx and above] Added support for using full-tunnel-header matching along with many other criteria within one matcher. This feature uses the new definer index, defined in the firmware, to build a matcher so that the full tunnel header matching can be used along with all other criteria.

Matching Granularity Change

[ConnectX-5 and above] Added support for matching granularity change. As a result, when creating an FDB flow with a VPORT destination, src_port matching no longer has to be added. An FDB flow can now match all vports and go to a VPORT destination. The new behavior matches that of firmware steering.

Installation

Installation

New options were added to the ofed_uninstall.sh script: --only-kernel and --only-user. Those can be used to uninstall only kernel packages or only user-space packages (the equivalent of kernel-only install or user-space-only install, respectively). This may be useful to keep different sets of kernel and user-space installations.

Running the uninstall script with both --only-kernel and --only-user produces an undefined result.
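Because combining the two options is undefined, a wrapper can reject that combination before the script is invoked. The following is a minimal sketch; uninstall_cmd is a hypothetical helper that only prints the resulting command line and does not uninstall anything.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse the undefined --only-kernel + --only-user
# combination, otherwise print the ofed_uninstall.sh command line.
uninstall_cmd() {
    want_kernel=0
    want_user=0
    for arg in "$@"; do
        case "$arg" in
            --only-kernel) want_kernel=1 ;;
            --only-user)   want_user=1 ;;
        esac
    done
    if [ "$want_kernel" -eq 1 ] && [ "$want_user" -eq 1 ]; then
        echo "--only-kernel and --only-user cannot be combined" >&2
        return 1
    fi
    echo "ofed_uninstall.sh${*:+ $*}"
}

# Dry run: prints the command instead of running it.
uninstall_cmd --only-kernel
```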

Feature/Change

Description

5.5-1.0.3.2

ASAP2 Features

Bridge Offloads with VLAN

[ConnectX-4 and above] Added support for bridge offloads with VLAN support that works on top of mlx5 representors in switchdev mode.

Supporting OVS Groups in Fast-Failover Mode

[ConnectX-6 Dx] Improved OVS failover through support for OVS groups in fast-failover mode + VF_LAG configuration with OVS.

Exposing Hairpin Queues Information

[ConnectX-6 Dx and BlueField-2] Added support for exposing hairpin out of buffer drop counter per device. This feature shows buffer drops related only to hairpin queues which were opened on the queried device.

To enable this counting mode (this must be done before any hairpin rules are created), use the following: echo "on <peer_devname>" > /sys/class/net/<dev>/hp_oob_cnt_mode where <peer_devname> is the peer device to which traffic arriving at the configured device is forwarded for transmission.

To read the drop counter, use the following: cat /sys/class/net/<dev>/hp_oob_cnt
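The two sysfs operations above can be wrapped as in the sketch below. The helper names and the SYSFS_NET variable are assumptions introduced for illustration; SYSFS_NET only exists so that the helpers can be pointed at a scratch directory instead of /sys/class/net.

```shell
#!/bin/sh
# SYSFS_NET is an assumption for testing; on a real system it stays at
# its default of /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Enable hairpin out-of-buffer drop counting on <dev> for traffic that is
# forwarded to <peer_devname>. Must be done before hairpin rules exist.
hp_oob_enable() {   # <dev> <peer_devname>
    echo "on $2" > "$SYSFS_NET/$1/hp_oob_cnt_mode"
}

# Print the current hairpin out-of-buffer drop counter of <dev>.
hp_oob_read() {     # <dev>
    cat "$SYSFS_NET/$1/hp_oob_cnt"
}

# Example (requires root and a supporting device):
#   hp_oob_enable <dev> <peer_devname> && hp_oob_read <dev>
```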

Linux Bridge Offload

[ConnectX-6 Dx and BlueField-2] Added bridge offloads to support bonding (VF LAG), attaching bond device to bridge instead of uplink representors.

VLAN Pop/Push

[ConnectX-6 Dx] Added OOB support for VLAN push on Rx (wire to VF) and VLAN pop on Tx (wire to VF) in switchdev mode.

Offload Forwarding to Multiple Destinations

[ConnectX-5 and above] Added support for offloading packet replication to up to 32 destinations through the use of a TC rule.

Slow Path Metering

[ConnectX-4 and above] Expanded the RDMA statistic tool to support setting vendor-specific optional counters dynamically using netlink.

Added to mlx5_ib the following optional counters:

cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts.

Example:

$ rdma statistic mode supported link rocep8s0f0/1

link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

$ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts

$ rdma statistic mode link rocep8s0f0/1

link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts

$ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

$ rdma statistic mode link rocep8s0f0/1

link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
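For scripting around the example above, the enabled counters can be pulled out of the tool's output. This is a small sketch that assumes the output keeps the "optional-counters <comma-separated list>" layout shown in the example; optional_counters is a hypothetical helper name.

```shell
#!/bin/sh
# Read "rdma statistic mode" output on stdin and print each optional
# counter on its own line.
optional_counters() {
    sed -n 's/.*optional-counters \([^ ]*\).*/\1/p' | tr ',' '\n'
}

# Sample line copied from the example above. On a real system, pipe in
# the output of: rdma statistic mode link <dev>/<port>
echo "link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts" | optional_counters
```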

Using Specific Interface for Tunnel Route Lookup

[ConnectX-5 and above] Added ability to use a specific interface for tunnel route lookup if tunnel was created with the current device.

Core Features

Subfunction Trust Configuration Enhancement

[ConnectX-5 and above] Added support via mlxdevm to mark a given PCI subfunction (SF) or virtual function (VF) as a trusted function. The device/firmware decides how to define privileges and access to resources.

Prevent VF Memory Exhaustion

[All HCAs] Added support for preventing VF memory exhaustion. This feature exposes a sysfs entry that lets the system administrator set a limit on each VF's memory consumption.

Note: Currently only supported on Ethernet.

BlueField NIC Separate Reset

[BlueField-2] Added support for resetting the NIC domain of BlueField-2 while keeping ARM alive.

Multiple Steering Priorities for FDB Rules

[ConnectX-6 Dx and BlueField-2] Added support for multiple flow steering priorities for FDB rules.

NetDev Features

Traffic Engineering: Hierarchical QoS

[ConnectX-5 and above] Added support for offloading the HTB qdisc to the NIC, allowing it to scale better by eliminating a single locking point. The configuration is done with the TC commands.

Note: Kernel 5.15 or higher is required. Limited to 256 nodes.
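As an illustration of configuring this with TC, the sketch below prints commands for a small offloaded HTB hierarchy. It assumes the upstream tc syntax in which the offload keyword is given on the root htb qdisc; the device name, class IDs, and rates are placeholders, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: print tc commands for an offloaded HTB hierarchy
# with one root class and one child class on <dev>.
htb_offload_cmds() {   # <dev>
    echo "tc qdisc replace dev $1 root handle 1: htb offload"
    echo "tc class add dev $1 parent 1: classid 1:1 htb rate 10gbit ceil 10gbit"
    echo "tc class add dev $1 parent 1:1 classid 1:10 htb rate 4gbit ceil 10gbit"
}

# Dry run: prints the commands instead of running them.
htb_offload_cmds eth0
```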

TLS RX Resynchronization Resiliency Feature Description

[ConnectX-6 Dx and above] Added support for driver resiliency against high load of RX resync operations.

Simultaneous PTP and CQE Compression

Added support for the activation of PTP and CQE compression simultaneously. Since CQE compression might harm the accuracy of the PTP, the feature enables PTP packets to be moved to a dedicated queue where they are not subjected to compression. However, this configuration conflicts with setting aRFS. Turning off CQE compression causes a hiccup in traffic, which may cause a loss of synchronization. To overcome this, restart the synchronization.

Note: This combination is supported only for Ethernet drivers. Other driver profiles, like IPoIB and representors, do not support this combination.

RDMA Features

ODP On Demand Synchronization

Added an option to prefetch an ODP MR without faulting. This enables updating the device page table with the CPU pages that are currently present, reducing page faults in the system.

DV API for AES-XTS

[ConnectX-6 and above] Added DV API that allows configuration of MKey with AES-XTS crypto offloads. The MKey can be configured for both crypto and signature offloads.

Huge Page Support for DEVX UMEMs

[ConnectX-4 and above] Added support to allow DEVX UMEM to be created with page sizes larger than 4K. For some device objects (e.g., RegEx) this is a must. In addition, a page size larger than 4K may need fewer MTTs, which may improve performance.

A new API, mlx5dv_devx_umem_reg_ex(), was added to request specific page sizes. It gives the application better control over the required UMEM page size. The API will be part of rdma-core V35.

ODP Locking Optimization

[ConnectX-4 and above] Added support for cleanup of the synchronize_srcu() from the ODP flow because it was a time-consuming part of dereg_mr.

Note: This only affects the driver and not the firmware.

Export Object IDs to Users

[ConnectX-4 and above] Extended support for the "rdma res show" command to SRQ and context resources.

Raw WQE

[ConnectX-5 and above] Added support for Raw WQE (mlx5dv_wr_raw_wqe). This feature allows applications to build a custom work request (WQE) that is not supported by the verbs or driver and post it on a normal QP. It is an extension of the IBV work request API (ibv_wr_*) with mlx5-specific features for sending a work request.

mlx5 Over VFIO

Added support for mlx5 user space driver over VFIO.

This feature enables an application to take full ownership of the opened device and run any firmware command (e.g., port up/down) without the risk of affecting other users.

The application looks and feels like a regular RDMA application over DEVX. It uses the verbs API to open/close a device and then mostly uses DEVX APIs to interact with the device.

New mlx5 DV APIs were added to get ibv_device for a given mlx5 PCI name and to manage device specific events.

For description of the relevant APIs and expected usage of those APIs, look up the following:

mlx5dv_get_vfio_device_list()

mlx5dv_vfio_get_events_fd()

mlx5dv_vfio_process_events()

Software Steering Features

system_image_guid to Group Bonding Interfaces

[ConnectX-4 and above] Added support for using system_image_guid to group bonding interfaces.

With some specific NICs, each interface may have a different PCIe domain, bus, device, or function ID. For interfaces with the same system_image_guid, the driver assumes they reside on the same physical device and uses native_port_id to distinguish their indexes. If this is unsupported, the driver falls back to the PCIe BDF.

Software Steering Dump File Parser Tool

[ConnectX-4 and above] The mlx_steering_dump tool parses software steering dump files, which include information about the domains, tables, matchers, and rules created by software steering (mlx5dv_dr API). It can be used offline by providing a dump file as input, or it can trigger a DPDK application (such as testpmd) to generate the dump and then parse it.

Installation Features

Multiple Development Headers Packages

Allowed installing multiple mlnx-ofa_kernel development headers packages (for different kernel versions of the same mlnx-ofa_kernel package version) side by side on the same system.

Kernel Module Signature

Added signature of kernel modules of EulerOS 2.0 SP8-SP10 (x86_64 and aarch64) builds of MLNX_OFED.

Enable sf-cfg-drv by Default in EulerOS2.0

Enabled SF_CFG (SF config dummy driver, --with-sf-cfg-drv) on EulerOS2.0 SP8 and SP10.

Feature/Change

Description

5.4-3.0.3.0

GPUDirect

Kernel Space GPUDirect from VM

[ConnectX-5 and above] Added support for kernel space GPUDirect from the VM. To use GDS with high performance in a VM, set the ATS capability in ib_alloc_mr.

ASAP2

CTC Metering

[ConnectX-6 Dx] Added support for per flow metering using OVS or TC, PPS, and BPS.

Slow Path Metering

[ConnectX-6 Dx] Added support for slow path metering on representor using OVS or TC, PPS, and BPS.

Core

VF LAG

[ConnectX-6 Dx and BlueField-2] Added support for physical port selection based on the hash function defined by the bond, so that different packets of the same flow egress from the same physical port.

In order to enable this feature, set this mode for both bonded devices through the below sysfs before the device is in switchdev mode:

echo "hash" > /sys/class/net/enp8s0f0/compat/devlink/lag_port_select_mode

In order to have the legacy behaviour (queue affinity based selection), echo the following:

echo "queue_affinity" > /sys/class/net/enp8s0f0/compat/devlink/lag_port_select_mode

This feature requires setting LAG_RESOURCE_ALLOCATION to 1 with mlxconfig.
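Since the mode has to be set on both bonded devices, a small loop helps avoid configuring only one of them. The sketch below assumes the sysfs path shown above; the helper name and the SYSFS_NET variable (used only to redirect the writes to a scratch directory) are introduced here for illustration.

```shell
#!/bin/sh
# SYSFS_NET is an assumption for testing; it defaults to /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Write the LAG port selection mode to every listed bonded device.
# Must be called before the devices are moved to switchdev mode.
set_lag_port_select_mode() {   # <hash|queue_affinity> <dev>...
    mode=$1
    shift
    case "$mode" in
        hash|queue_affinity) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    for dev in "$@"; do
        echo "$mode" > "$SYSFS_NET/$dev/compat/devlink/lag_port_select_mode" || return 1
    done
}

# Example (requires root; the second device name is a placeholder):
#   set_lag_port_select_mode hash enp8s0f0 enp8s0f1
```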

Single IRQ for PCI Function

[ConnectX-4 and above] Added support for single IRQ for PCI function. To use a high number of VFs, a large amount of IRQs is required which the device cannot always support. This feature enables VFs to function with a minimum of a single IRQ instead of two.

This is done via dynamic MSIX feature. In case dynamic MSIX feature is not supported (old kernels), the following configuration will probe all VFs with single IRQ:

$ mlxconfig -d <pci_dev> s NUM_VF_MSIX=0 STRICT_VF_MSIX_NUM=1

Netdev

ethtool EEPROM Support for DSFP

[ConnectX-6 Dx] Added support for reading DSFP module information. The change includes adding new options to ethtool netlink EEPROM module read API to read a specific page and bank.

RDMA

Dynamic VF MSI-X Allocation

[ConnectX-5 and above] Added support for dynamic assignment of MSI-X vector count.

The number of MSI-X vectors is a PCI property visible through lspci and is a read-only field configured by the device. A static assignment of MSI-X vectors does not let a newly created VF be used efficiently, because the device cannot know the future load and configuration under which that VF will run. The VFs are created on the hypervisor and forwarded to VMs that have different properties (for example, the number of CPUs).

To overcome the inefficient spread of such MSI-X vectors, the kernel can now instruct the device with the required number of vectors before the VF is initialized and bound to the driver.
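On upstream kernels this is exposed through the PCI sysfs attributes sriov_vf_total_msix (on the PF) and sriov_vf_msix_count (on each VF). The sketch below assumes those attributes are available in the running kernel; the helper name, the PCI_DEVICES variable, and the BDF addresses in the usage comment are placeholders.

```shell
#!/bin/sh
# PCI_DEVICES is an assumption for testing; it defaults to the real path.
PCI_DEVICES=${PCI_DEVICES:-/sys/bus/pci/devices}

# Assign <count> MSI-X vectors to a VF. The VF must not be bound to a
# driver yet. Rejects a count above the pool advertised by the PF.
set_vf_msix_count() {   # <pf_bdf> <vf_bdf> <count>
    total=$(cat "$PCI_DEVICES/$1/sriov_vf_total_msix") || return 1
    if [ "$3" -gt "$total" ]; then
        echo "requested $3 vectors, only $total available for VFs" >&2
        return 1
    fi
    echo "$3" > "$PCI_DEVICES/$2/sriov_vf_msix_count"
}

# Example (requires root):
#   set_vf_msix_count 0000:08:00.0 0000:08:00.2 8
```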

DV API for DMA GGA memcpy

[BlueField-2 and above] DMA memcpy is one of several Memory-to-Memory Offloads (MMO) available from BlueField-2 onwards. It utilizes the GGA modules on the DPU to perform DMA memcpy, thus improving performance. The memcpy can be done locally, on the same host, or between the host and the Arm.

To use this feature, expose DV API.

Steering UserSpace

Set DR Matcher Layout

[ConnectX-6 Dx] Added support for a new RDMA CORE DR API to set the DR matcher layout by calling mlx5dv_dr_matcher_set_layout.

Setting the matcher layout allows presetting the matcher size and increasing matcher rule capacity, as well as other performance improvements in case matcher size is known.

Flex Parsers misc4

[ConnectX-5 and ConnectX-6 Dx] Added ability to expose flex parsers 4-7 provided by misc4 to extend matching ability of flex parsers. Now all flex parsers can be matched at the same time.

Software encap Action

[ConnectX-5 and above] Added support for software encap action. Some deployments require more than 1M encap actions, but encap action creation currently uses devx, which is very slow at that scale. A way for software to create encap actions is therefore needed.

The encap reformat action creation in rdma-core can now be done via software, rather than devx. It will use the new ICM memory type of software encap and directly copy encap data there, then use the memory pointer for flow creation.

Bug Fixes

See Bug Fixes.

Feature/Change

Description

5.4-1.0.3.0

ASAP2

Enlarge Switchdev Tables

[ConnectX-5 and above] Added support in OVS kernel for up to 128 matches (groups) per table and 16M entries per group.

Offloading Extended ct_state Flags

[ConnectX-5 and above] Added support to offload ct_state flags rpl, inv, and rel.

  • For rpl, support was added for both set and not set matching offload (i.e., +rpl and -rpl).

  • For inv and rel, support was added only for the not set option (i.e., -rel and -inv).

Core

Scalable Functions (Subfunctions)

[ConnectX-5 and above] Added support for scalable functions (also called subfunctions). The feature enables the user to create, configure, and deploy scalable functions (e.g., RDMA and networking applications) and to assign them to a container when the container is started via the mlxdevm tool.

A scalable function can also be deployed in an untrusted guest/host system from the NIC/DPU. This enables full configuration of the function and its representors from the NIC/DPU before giving the function for a container to run in a host system.

For more information, see https://github.com/Mellanox/scalablefunctions/wiki/MLNX_OFED-step-by-step-guide.

Scalable Function QoS

[ConnectX-5 and above] Added support for scalable function QoS and QoS group via mlxdevm's rate commands. Run "man mlxdevm port" for details.

Auxiliary Bus in mlx5 Driver

[ConnectX-4 and above] Updated mlx5 driver to use auxiliary bus in order to integrate different driver components into driver core and optimize module load/unload sequences.

Installation

Script Removal from mlnx-ofa_kernel

[General] Moved all Python scripts and some other common scripts out of the mlnx-ofa_kernel packages. This removed the python dependency from that package when rebuilding it and avoided unnecessary errors when rebuilding them for custom kernels.

Netdev

What-Just-Happened (WJH) in NICs

[ConnectX-4 and above] Added support for WJH in NICs. WJH allows for visibility of dropped packets (i.e., receiving notice of drop counters increase, seeing content of the dropped packets, debugging, and more).

WJH is a service in devlink context and it is already implemented in the switch.

Note: Processing dropped packets (even for visibility purposes) may degrade performance and leaves the driver vulnerable to malicious attacks. The feature is disabled by default.

Supported traps:

  • VLAN mismatch: existing generic trap DEVLINK_TRAP_GENERIC_ID_DMAC_MISMATCH

    Traps received packets with wrong VLAN tag

  • DMAC mismatch: new generic trap DEVLINK_TRAP_GENERIC_ID_DMAC_MISMATCH

    Traps received packets with wrong destination MAC

User-space support: Devlink infrastructure (man7.org/linux/man-pages/man8/devlink-trap.8.html)

Devlink provides an infrastructure called devlink trap, which allows a device to register/unregister and to enable/disable traps. Devlink traps also provide trap grouping and policing. The trapped packets are monitored and then forwarded to the drop monitor. The drop monitor is used to send notifications to user space about dropped packets.

Note: For this release, NIC WJH will not implement the policy.
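Because the feature is disabled by default, a trap has to be switched from dropping to trapping before dropped packets become visible. The sketch below prints the devlink trap commands involved, following the syntax in the devlink-trap man page; the helper name is hypothetical, and the device handle and trap name are placeholders that must be taken from the output of "devlink trap show" on the target system.

```shell
#!/bin/sh
# Hypothetical helper: print the devlink commands that inspect a trap,
# enable trapping for it, and show it again with statistics.
wjh_trap_cmds() {   # <devlink_dev> <trap_name>
    echo "devlink trap show $1 trap $2"
    echo "devlink trap set $1 trap $2 action trap"
    echo "devlink -s trap show $1 trap $2"
}

# Dry run with placeholder arguments: prints the commands only.
wjh_trap_cmds pci/0000:08:00.0 TRAP_NAME
```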

ethtool Extended Link State

[General] Added ethtool extended link state to mlx5e.

ethtool can be used to get more information to help troubleshoot the state.

For example, if there is no link due to missing cable, run the following:

$ ethtool eth1

...

Link detected: no (No cable)

Besides the general extended state, drivers can pass additional information about the link state using the sub-state field.

Example:

$ ethtool eth1

...

Link detected: no (Autoneg, No partner detected)

The extended state is available only for some cases of no link. In other cases, ethtool will print only "Link detected: no" as it did before.
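For monitoring scripts, the extended state and sub-state can be split out of the "Link detected" line. The sketch below assumes the output layout shown in the examples above; link_state is a hypothetical helper name.

```shell
#!/bin/sh
# Read ethtool output on stdin and print the link status together with
# the extended state and sub-state, when they are present.
link_state() {
    sed -n \
        -e 's/^[[:space:]]*Link detected: \([a-z]*\) (\([^,]*\), \(.*\))$/link=\1; state=\2; substate=\3/p' \
        -e 's/^[[:space:]]*Link detected: \([a-z]*\) (\([^,]*\))$/link=\1; state=\2/p' \
        -e 's/^[[:space:]]*Link detected: \([a-z]*\)$/link=\1/p'
}

# Sample line copied from the example above. On a real system, use:
#   ethtool eth1 | link_state
printf '\tLink detected: no (Autoneg, No partner detected)\n' | link_state
```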

RDMA

DV "Signature API"

[ConnectX-5 and above] Added support for "Signature API" which, on supported devices, allows application-level data-integrity checks via a signature handover mechanism. Various signature types, including CRC32 and T10-DIF, can be automatically calculated and checked, stripped, or appended during the transfer at full wire speed.

ibv_query_qp_data_in_order() verb

[General] Added support for ibv_query_qp_data_in_order() API. This API enables an application to check if the given QP data is guaranteed to be in order, enabling poll for data instead of poll for completion.

Relaxed Ordering for Kernel ULPs

[ConnectX-4] Added support for enabling Relaxed Ordering for Kernel ULPs. Using relaxed ordering can improve performance in some setups. Since kernel ULPs are expected to support RO, it is enabled for them by default so they can benefit from it.

ah_to_qp Mapping

[ConnectX-6 Dx] Added support for mapping a QP to AH over DEVX API, which enables DC/UD QPs to use multiple CC algorithms in the same data center.

Steering UserSpace

Matching on RAW Tunnel Headers

[ConnectX-5 and above] Added DR support for matching on RAW tunnel headers using the misc5 parameters. This feature allows matching on each bit of the header, including reserved fields.

Software Steering Insertion Rate Optimizations

[ConnectX-6 Dx] Added support for better insertion rate in software steering. This includes multi-QP which skips areas in the code that may be for debug only.

Software Steering Rule Optimization

[ConnectX-6 Dx] Improved rate of updating steering rules, insertion, and deletion. The feature includes definers, multi-qp approach, and better memory usage.

Duplicate Rules Insertion

[ConnectX-5 and above] Added support for ability to allow or prevent insertion of duplicate rules, so the user can choose one of the following behaviors:

1. Prevent duplicate rules, so that inserting an already-existing rule fails and can be detected.

2. Allow duplicate rules, to enable updating the rule's action (this will only take effect once the previous rule is deleted).

By default, duplicate rules are allowed.

Improved Software Steering Rule Creation Stability

[ConnectX-6 Dx] All rule insertions now complete within a defined time by using a defined (exported) hash table size and less dynamic allocation.

Feature/Change

Description

5.3-1.0.0.1

MLX5DR SF

[ConnectX-5 and above] Added support for up to 512 SFs with the mlx5dv_dr API.

Dump Single Flow

[ConnectX-5 & ConnectX-6 Dx] Added support to dump single flow/rule with flow-id.

Local/Remote Mirroring

[ConnectX-5* & ConnectX-6 Dx/ BlueField* & BlueField-2] OVS-DPDK added support for local and remote mirroring for offloaded traffic.

*Enabling the port mirroring feature on a ConnectX-5 NIC and BlueField will break Connection Tracking.

Connection Tracking Replay State

[BlueField-2] Added support for matching on CT state replay.

Kernel TLS Offload

[BlueField-2] Added support for TX and RX kTLS offloads on the ARM in switchdev mode via a sub-function.

Increased Number of Virtio Functions

[ConnectX-6 Dx & BlueField-2] Added support for up to 504 Virtio functions. 512 total functions are supported, but some are consumed by PF, Host PF, and RSHIM.

VF Metering

[ConnectX-6 Dx] Added support for RX/TX metering per VF using sysfs API.

PTP Hardware Translation Offload

[ConnectX-6 Dx] Added support for the hardware clock device to be adjusted and provide timestamps which are translated into real-time nanoseconds. This can be used by the driver for PTP protocol.

For further information, see PTP Cyc2time Hardware Translation Offload section.

TLS Rx Hardware Offload

[ConnectX-6 Dx] Added GA-level support for hardware offload decryption of TLS Rx traffic over crypto-enabled ConnectX-6 Dx NICs and above. Note: Not supported in ConnectX-6 Lx adapter cards.

MLX5DR Match Definer

[ConnectX-6 Dx] Added support for match definers, which are used internally in the mlx5dv_dr API. Definers allow filtering on more packet fields, which improves the packet rate and accelerates the mlx5dv_dr API.

MLX5DR Packet OK and Checksum Checks

[ConnectX-6 Dx] Added support for new matching fields ipv4_checksum_ok, l4_checksum_ok, l3_ok, and l4_ok.

Pop VLAN on VF/SF Tx Direction

[ConnectX-6 Dx] Added support to pop VLAN on VF/SF Tx direction.

Connection Tracking Window Validation

[ConnectX-6 Dx, ConnectX-6 Lx, BlueField-2] Added support for creating and modifying an ASO connection tracking action. This action allows performing TCP connection tracking using hardware offloads.

Using this offload, the validity of the connection state of the incoming or outgoing packets on this TCP connection can be examined.

Also added the ability for an ASO CT action created on one GVMI to be used on a different GVMI.

Pyverbs

[All HCAs] Pyverbs are no longer being built for Debian 9.

Bug Fixes

See Bug Fixes.

Category

Description

5.2-2.2.0.0

DCT Support for Connection Establishment with RDMA_CM

[ConnectX-5 and above] [Alpha level] Added support for the dv APIs to allocate/deallocate a unique QP number that can be used as DC QPN in RDMA_CM connection establishment.

MXM

[All HCAs] The MXM package is deprecated and removed from MLNX_OFED.

Devlink Firmware Reset

[All HCAs] In MLNX_OFED v5.2-1.0.4.0, it was noted that MLNX_OFED did not include the latest iproute2 that provided support for this feature, and that the latest iproute2 must be installed from Github (see Release Notes Change Log History section). This note is no longer relevant.

Bug Fixes

See Bug Fixes.

5.2-1.0.4.0

Rx Multi-strides CQE Compression

[ConnectX-5 and above] Added CQE compression support for Rx multi-strides packets.

Multi-application QoS

[ConnectX-5 and above] Added support for configuring QoS on a single QP or on a group of QPs.

MPLS-over-UDP Hardware Offload Support

[ConnectX-5 and above] Added support for encap/decap hardware offload of IPv4 traffic over MPLS-over-UDP. This can be used in networks with MPLS routers to achieve more efficient routing.

Connection Tracking with Hairpin

[ConnectX-5 and above] Added support for adding connection tracking rules on VFs to forward traffic from one VF to the other.

sFlow Sampling Rules Offload

[ConnectX-5 and above] Added support for offloading sFlow sampling rules.

sFlow is an industry-standard technology for monitoring high-speed switched networks. Open vSwitch integrates sFlow to extend visibility into virtual servers, ensuring data center visibility and control.

mlx5dv_dr Software Steering Parallel Rules Insertion

[ConnectX-5 & ConnectX-6 Dx/ BlueField & BlueField-2] Added support for a locking mechanism to enable parallel insertion of rules into the software steering using the mlx5dv_dr API. The parallel insertion improves the insertion rate and takes place when adding Rx and Tx rules via the FDB domain.

mlx5dv_dr API Matching on Geneve Tunnel

[ConnectX-5 & ConnectX-6 Dx/ BlueField & BlueField-2] Added support in the mlx5dv_dr API for matching on the Geneve tunnel option using a dynamic flex parser. The option header consists of class, type, length and data. The parser should be configured using a devx command, after which a rule can be created to match on parser ID and data.

OVS-DPDK Geneve Encap/Decap

[ConnectX-5 & ConnectX-6 Dx/ BlueField & BlueField-2] Added support for Geneve tunneling offload, including matching on extension header.

OVS-DPDK Parallel Offloads

[ConnectX-5 & ConnectX-6 Dx/ BlueField & BlueField-2] Added support for parallel insertion and deletion of offloaded rules using multiple OVS threads.

GTP-U TEID Modification

[BlueField-2 & ConnectX-6 Dx] [Beta] Added support to modify GTP-U TEID. This support requires flex parser configuration.

OVS-DPDK E2E Cache Support

[BlueField-2 & ConnectX-6 Dx] [Beta] Improved performance of OVS Connection Tracking flows by enabling the merge of the multi-table flow matches and actions into one joint flow.

Tx Port Time-Stamping

[ConnectX-6 Dx and above] Transmitted packet timestamping accuracy can be improved when using a timestamp generated at the port level instead of a timestamp generated upon CQE creation. Tx port time-stamping better reflects the actual time of a packet's transmission.

This feature is disabled by default. The feature can be enabled or disabled using the following command.

ethtool --set-priv-flags <ifs-name> tx_port_ts on / off

For further information on this feature, please see Tx Port Time-Stamping.

Tunnel Rules Offload

[ConnectX-6 Dx and above] Added support for offloading tunnel rules when the source interface is VF (in addition to uplink) in the Hypervisor.

[ConnectX-6 Dx and above] Added support for offloading tunnel rules when the source interface is OpenvSwitch bridge (internal port).

Connection Tracking Mirroring Offload

[ConnectX-6 Dx and above] Added support for using Mirroring Offload with Connection Tracking.

mlx5dv_dr API ASO Flow Meter

[ConnectX-6 Dx and above] Added support for ASO flow meter using the mlx5dv_dr API, which allows for monitoring the packet rate for specific flows. When a packet hits a flow that is connected to a flow meter, the rate of packets through this meter is evaluated, and the packet is marked with a color copied into one of the C registers, according to the current rate compared to the reference rate.

mlx5dv_dr API ASO First Hit

[ConnectX-6 Dx and above] Added support for ASO first hit using the mlx5dv_dr API, which allows for tracking rule hits by packets. When a packet hits a rule with the ASO first hit action, a flag is set indicating this event, and the original value of the flag is copied to one of the C registers.

mlx5dv_dr API GTP-U Extension Header

[ConnectX-6 Dx and above] Added mlx5dv_dr API support for matching on a new field "gtpu_first_ext_dw_0". This field enables packet filtering based on the GTP-U first extension header (first dword only). To enable parsing of tunnel GTP-U extension header, run the following command.

./cloud_fw_reset.py FLEX_PARSER_PROFILE_ENABLE=3

IPsec Offload

[ConnectX-6 Lx and above] Added IPsec full offload support for extended sequence number, replay protection window and lifetime packet limit.

Firmware Upgrade

[All HCAs] Firmware upgrade during MLNX_OFED installation is now done on all supported devices simultaneously rather than consecutively.

RDMA-CM Disassociate Support

[All HCAs] Added support for connecting kernel and RDMA-CM in a reliable way based on device index.

New Query GID API

[All HCAs] Added support for a new query GID API that allows for querying a single GID entry by its port and GID index, or querying for all GID tables of a specific device. This API works over ioctl instead of sysfs, which accelerates the querying process.

Multi-Host Firmware Reset

[All HCAs] Added support for performing multi-host firmware reset in order to upgrade the device firmware.

Firmware reset loads the new firmware in case it was burnt on the flash and was pending activation, and reloads the current firmware image from the flash in case no new firmware was pending.

Firmware Live Patching

[All HCAs] [Alpha] Added support for firmware live patching in the driver. Live patching updates the firmware without the need to perform a firmware reset. However, it can only be applied in scenarios where the difference between the current and new firmware versions is minor, which is decided by the firmware itself.

Devlink Firmware Reset

[All HCAs] Added support in the devlink tool for performing firmware reset in order to upgrade the device firmware.

Firmware reset loads the new firmware in case it was burnt on the flash and was pending activation, and reloads the current firmware image from the flash in case no new firmware was pending.

For further information, please refer to the devlink man page.

Note: In order for the firmware reset to run successfully, the following conditions should be met.

  • Each function should have the driver up and active with a version that supports this feature.

  • None of the functions has the devlink parameter enable_remote_dev_reset set to False.

Command Interface Resiliency

[All HCAs] Added a resiliency mechanism for the driver to manually poll the command event queue (EQ) in case of a command timeout. If the resiliency mechanism finds an unhandled event queue entry (EQE) due to a lost interrupt, the driver handles it, after which the command interface returns to a healthy state.

Offloaded Traffic Sniffer

[All HCAs] Setting a sniffer private flag is deprecated and no longer required. In order to capture offloaded/RoCE traffic, tcpdump can now be run on the RDMA device.

Devlink Port Health Reporters

[All HCAs] Added per-port reporters to devlink health to manage per-port health activities. Users can now access the devlink port reporters by specifying the port index in addition to the device devlink name through the devlink health commands API. This update was first introduced in iproute2 v5.8. As part of this feature, mlx5e Tx and Rx reporters are now redefined as devlink port reporters. For examples, please see devlink-health manpage.

Memory Registration Optimization

[All HCAs] Optimized memory consumption of memory registration in huge page systems. As an example, in a 2MB huge page system, 600 MB would be saved for 100 GB memory registration.
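
To put the quoted figure in perspective, the arithmetic below works out what the 600 MB saving implies per page. It assumes the sizes are binary units (MiB/GiB) and that the baseline tracks the registration in 4 KiB pages; both are assumptions for this example and are not stated in the release note.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

registered = 100 * GiB
small_page = 4 * KiB     # assumed baseline page size
huge_page = 2 * MiB
saved = 600 * MiB        # saving quoted in the release note

small_entries = registered // small_page   # pages tracked at 4 KiB granularity
huge_entries = registered // huge_page     # pages tracked at 2 MiB granularity

# Implied per-page cost if the saving comes from no longer tracking 4 KiB pages.
bytes_per_small_page = saved / small_entries

print(small_entries, huge_entries, bytes_per_small_page)
```

Under these assumptions the saving corresponds to roughly 24 bytes of bookkeeping per 4 KiB page, while the number of tracked pages drops by a factor of 512.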

mlx5dv API

[All HCAs] Added support for mlx5dv API to modify the configured UDP source port for RoCE packets of a given RC/UC QP when QP is in RTS state.

Enhanced Tx Multi-packet WQE (MPWQE)

[All HCAs] Added support for accelerating Tx datapath by saving PCI bandwidth and CPU utilization. The savings are achieved by aggregating multiple packets into a single WQE. The feature is driven by xmit_more for certain traffic types, such as UDP.

Innova IPsec NIC Support

Removed support for the network adapter Innova IPsec (EN).

Bug Fixes

See Bug Fixes.

Category

Description

5.1-0.6.6.0

IP-in-IP RSS Offload

[ConnectX-4 and above] Added support for receive side scaling (RSS) offload in IP-in-IP (IPv4 and IPv6).

Devlink Port Support in Non-representor Mode

[ConnectX-4 and above] Added support for viewing the mlx5e physical devlink ports using the 'devlink port' command. This may also affect network interface names if a predictable naming scheme is configured, in which case a suffix indicating the port number is added to the interface name.

Devlink Health State Notifications

[ConnectX-4 and above] Added support for receiving notifications on devlink health state changes when an error is reported or recovered by one of the reporters. These notifications can be seen using the userspace ‘devlink monitor’ command.

Legacy SR-IOV VF LAG Load Balancing

[ConnectX-4 and above] When VF LAG is in use, the Tx affinity of channels is now assigned round-robin among the different ports, if supported by the firmware, so that all SQs of a channel share the same port affinity. This distributes traffic sent from a VF between the two ports, and also round-robins the starting port among VFs to distribute traffic originating from single-core VMs.
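
The distribution scheme can be sketched in Python as follows. This is a conceptual model only, not driver code: the function name, the 1-based port numbering, and the choice of rotating the starting port by VF index are assumptions made for illustration.

```python
def assign_tx_affinity(num_vfs, channels_per_vf, num_ports=2):
    """Return {vf: [port for each channel]} using round-robin with a rotating start port."""
    mapping = {}
    for vf in range(num_vfs):
        start = vf % num_ports  # rotate the starting port among VFs
        mapping[vf] = [(start + ch) % num_ports + 1 for ch in range(channels_per_vf)]
    return mapping


affinity = assign_tx_affinity(num_vfs=3, channels_per_vf=4)
print(affinity)

# With one channel per VF (single-core VMs), the rotating start still spreads the load.
single_core = assign_tx_affinity(num_vfs=4, channels_per_vf=1)
print(single_core)
```

Rotating the starting port matters for VMs with a single channel, which would otherwise all transmit on the same port.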

RDMA-CM DevX Support

[ConnectX-4 and above] Added support for DevX in RDMA-CM applications.

RoCEv2 Flow Label and UDP Source Port Definition

[ConnectX-4 and above] This feature provides flow label and UDP source port definition in RoCE v2. Those fields are used to create entropy for network routes (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE headers.
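
One way to derive a UDP source port from a 20-bit flow label is to fold the label into 14 bits and map it into a high port range. The sketch below is modeled on the folding helper in the upstream Linux RDMA core; the function name and the 0xC000 lower bound should be treated as assumptions rather than as the driver's exact behavior.

```python
ROCE_UDP_SPORT_MIN = 0xC000  # assumed lower bound of the source port range


def flow_label_to_udp_sport(flow_label):
    """Fold a 20-bit flow label into 14 bits and map it into 0xC000-0xFFFF."""
    low = flow_label & 0x03FFF            # lower 14 bits
    high = (flow_label & 0xFC000) >> 14   # upper 6 bits
    return (low ^ high) | ROCE_UDP_SPORT_MIN


for label in (0x00000, 0x12345, 0xFFFFF):
    print(hex(label), hex(flow_label_to_udp_sport(label)))
```

Because different flow labels yield different source ports, devices that hash on the UDP header can spread RoCE flows without parsing RoCE headers.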

RDMA Tx Steering

[ConnectX-4 and above] Enabled RDMA Tx steering flow table. Rules in this flow table will allow for steering transmitted RDMA traffic.

Custom Parent-Domain Allocators for CQ

[ConnectX-4 and above] Enabled specific custom allocations for CQs.

mlx5dv Helper APIs for Tx Affinity Port Selection

[ConnectX-4 and above] Added support for the following mlx5dv helper APIs which enable the user application to query or set a RAW QP's Tx affinity port number in a LAG configuration.

  • mlx5dv_query_qp_lag_port

  • mlx5dv_modify_qp_lag_port

RDMA-CM Path Alignment

[ConnectX-4 and above] Added support for RoCE network path alignment between RDMA-CM message and QP data. The drivers and network components in RoCE calculate the same hash results for egress port selection both on the NICs and the switches.

IPoIB QP Number Creation

[ConnectX-4 and above] Enabled setting the QP number of an IPoIB PKey interface in Enhanced mode. This is done using the standard ip link add command while padding the hardware address of the newly created interface. The QP number occupies the 2nd-4th bytes of the address. To enable the feature, the MKEY_BY_NAME configuration must first be enabled in the NvConfig.
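
The byte layout can be illustrated with a short Python helper that builds a padded hardware address string. The helper name is hypothetical, and both the 20-byte address length and the zero padding of the remaining bytes are assumptions for this sketch.

```python
def ipoib_hw_addr_with_qpn(qpn, length=20):
    """Return a colon-separated hardware address with the QPN in bytes 2-4 (1-based)."""
    if not 0 <= qpn < 1 << 24:
        raise ValueError("QP number must fit in 24 bits")
    addr = bytearray(length)              # remaining bytes are zero padding
    addr[1:4] = qpn.to_bytes(3, "big")    # 2nd-4th bytes carry the QP number
    return ":".join(f"{b:02x}" for b in addr)


hw_addr = ipoib_hw_addr_with_qpn(0x000123)
print(hw_addr)
```

The resulting string is the kind of value that would be supplied as the hardware address when creating the interface.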

CQ and QP Context Exposure

[ConnectX-4 and above] Exposed QP, CQ and MR context in raw format via RDMA tool.

In-Driver xmit_more

[ConnectX-4 and above] Enabled xmit_more feature by default in kernels that lack Rx bulking support (v4.19 and above) to ensure optimized IP forwarding performance when stress from Rx to Tx flow is insufficient.

In kernels with Rx bulking support, xmit_more is disabled in the driver by default, but can be enabled to achieve enhanced IP forwarding performance.

Relaxed Ordering

[ConnectX-4 and above] Relaxed ordering is a PCIe feature which allows flexibility in the transaction order over the PCIe. This reduces the number of retransmissions on the lane, and can increase performance by up to 4 times.

By default, mlx5e buffers are created with Relaxed Ordering support when firmware capabilities are on and the PCI subsystem reports that CPU is not on the kernel's blocklist.

Note: Some CPUs which are not listed in the kernel's blocklist may suffer from buggy implementation of relaxed ordering, in which case the user may experience a degradation in performance and even unexpected behavior. To turn off relaxed ordering and restore previous behavior, run setpci command as instructed here. Example:

"RlxdOrd-": setpci -s82:00.0 CAP_EXP+8.w=294e
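
In the example above, CAP_EXP+8 addresses the PCIe Device Control register, where bit 4 is the relaxed ordering enable bit. The Python sketch below shows how a value such as 294e can be derived by clearing only that bit. The starting value 0x295E is hypothetical; the other bits are device-specific, so the current register value should be read first rather than copied from this example.

```python
RELAXED_ORDERING_ENABLE = 1 << 4  # bit 4 of the PCIe Device Control register


def disable_relaxed_ordering(devctl):
    """Return the 16-bit Device Control value with the relaxed ordering bit cleared."""
    return devctl & ~RELAXED_ORDERING_ENABLE & 0xFFFF


def relaxed_ordering_enabled(devctl):
    """Report whether the relaxed ordering enable bit is set."""
    return bool(devctl & RELAXED_ORDERING_ENABLE)


before = 0x295E  # hypothetical value read back with relaxed ordering enabled
after = disable_relaxed_ordering(before)
print(f"{after:04x}", relaxed_ordering_enabled(after))
```

Clearing a single bit in this way leaves the remaining Device Control settings untouched.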

ODP Huge Pages Support

[ConnectX-4 and above] Enabled ODP Memory Region (MR) to work with huge pages by exposing IBV_ACCESS_HUGETLB access flag to indicate that the MR range is mapped by huge pages.

The flag is applicable only in conjunction with IBV_ACCESS_ON_DEMAND.

Offloaded Traffic Sniffer

[ConnectX-4 and above] Removed support for Offloaded Traffic Sniffer feature and replaced its function with Upstream solution tcpdump tool.

Connection Tracking Offload

[ConnectX-5 and above] Added support for offloading TC filters containing connection tracking matches and actions.

Dual-Port RoCE Support

[ConnectX-5 and above] Enabled simultaneous operation of dual-port RoCE and Ethernet in SwitchDev mode.

IP-in-IP Tunnel Offload for Checksum and TSO

[ConnectX-5 and above] Added support for the driver to offload checksum and TSO in IP-in-IP tunnels.

Packet Pacing DevX Support

[ConnectX-5 and above] Enabled RiverMax to work over DevX with packet pacing functionality by exposing a few DV APIs from rdma-core to enable allocating/destroying a packet pacing index. For further details on usage, see man page for: mlx5dv_pp_alloc() and mlx5dv_pp_free().

Software Steering Support for Memory Reclaiming

[ConnectX-5 and above] Added support for reclaiming device memory to the system when it is not in use. This feature is disabled by default and can be enabled using the mlx5dv_dr_domain_set_reclaim_device_memory() API.

SR-IOV Live Migration

[ConnectX-5 and above] [Beta] Added support for performing a live migration of a VM with an SR-IOV NIC VF attached to it, with minimal to no traffic disruption. This feature is supported in SwitchDev mode, enabling users to fully leverage VF TC/OVS offloads, where the failover inbox driver is in the Guest VM, and the bonding driver is in the Hypervisor.

Note that you must use the latest QEMU and libvirt from the Upstream github.com sources.

Uplink Representor Modes

[ConnectX-5 and above] Removed support for new_netdev mode in SwitchDev mode. The new default behaviour is to always keep the NIC netdev.

OVS-DPDK Offload Statistics

[ConnectX-5 and above] Added support for dumping connection tracking offloaded statistics.

OVS-DPDK Connection Tracking Labels Exact Matching

[ConnectX-5 and above] Added support for labels exact matching in OVS-DPDK CT openflow rules.

OVS-DPDK LAG Support

[ConnectX-5 & ConnectX-6 Dx] Added support for LAG (modes 1,2,4) with OVS-DPDK.

Get FEC Status on PAM4/50G

[ConnectX-6 and above] Allowed configuration of Reed Solomon and Low Latency Reed Solomon over PAM4 link modes.

RDMA-CM Enhanced Connection Establishment (ECE)

[ConnectX-6 and above] Added support for allowing automatic enabling/disabling of vendor specific features during connection establishment between network nodes, which is performed over RDMA-CM messaging interface.

RoCE Selective Repeat

[ConnectX-6 and above] This feature introduces a new QP retransmission mode in RoCE in which dropped packet recovery is done by re-sending only the dropped packet instead of re-sending the entire PSN window (Go-Back-N protocol). This feature is enabled by default when RDMA-CM is being used and both connection nodes support it.
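
The difference between the two recovery modes can be seen by counting what each one re-sends after a single drop. The Python sketch below is a simplified model with hypothetical function names; it ignores acknowledgements, timers and PSN wrap-around.

```python
def go_back_n_resends(first_psn, last_psn, lost):
    """Go-Back-N: resend everything from the first lost PSN to the end of the window."""
    in_window = [psn for psn in lost if first_psn <= psn <= last_psn]
    if not in_window:
        return []
    return list(range(min(in_window), last_psn + 1))


def selective_repeat_resends(first_psn, last_psn, lost):
    """Selective repeat: resend only the PSNs that were actually lost."""
    return sorted(psn for psn in lost if first_psn <= psn <= last_psn)


window = (100, 109)   # PSNs 100..109 were sent
lost = {103}          # a single packet was dropped
gbn = go_back_n_resends(*window, lost)
sr = selective_repeat_resends(*window, lost)
print(len(gbn), gbn)
print(len(sr), sr)
```

In this model a single drop costs seven retransmissions with Go-Back-N and one with selective repeat, and the gap grows with the window size.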

IPsec Full Offload

[ConnectX-6 Dx & BlueField-2] [Beta] Added support for IPsec full offload (VxLAN over ESP transport).

Hardware vDPA on OVS-DPDK

[ConnectX-6 Dx & BlueField-2] Added support for configuring hardware vDPA on OVS-DPDK. This support includes the option to fall back to Software vDPA in case the installed NIC does not support hardware vDPA.

IPsec Crypto Offloads

[ConnectX-6 Dx] Support for IPsec Crypto Offloads feature over ConnectX-6 Dx devices and up is now at GA level.

TLS Tx Hardware Offload

[ConnectX-6 Dx] Support for TLS Tx Hardware Offload feature over ConnectX-6 Dx devices and up is now at GA level.

TLS Rx Hardware Offload

[ConnectX-6 Dx] [Alpha] Added support for hardware offload decryption of TLS Rx traffic over crypto-enabled ConnectX-6 Dx NICs and above.

Userspace Software Steering ConnectX-6 Dx Support

[ConnectX-6 Dx] Support for software steering on ConnectX-6 Dx adapter cards in the user-space RDMA-Core library through the mlx5dv_dr API is now at GA level.

Kernel Software Steering ConnectX-6 Dx Support

[ConnectX-6 Dx] [Beta] Added support for kernel software steering on ConnectX-6 Dx adapter cards.

Adapters

[ConnectX-6 Lx] Added support for ConnectX-6 Lx adapter cards.

RDMA-Core Migration

[All HCAs] As of MLNX_OFED v5.1, Legacy verbs libraries have been fully replaced by RDMA-Core library.

For the list of new APIs used for various MLNX_OFED features, please refer to the Migration to RDMA-Core document.

Firmware Reactivation

[All HCAs] Added support for safely inserting consecutive firmware images without the need to reset the NIC in between.

UCX-CUDA Support

[All HCAs] UCX-CUDA is now supported on the following OSs and platforms.

OS                 Platform

RedHat 7.6 ALT     PPC64LE
RedHat 7.7         x86_64
RedHat 7.8         PPC64LE/x86_64
RedHat 7.9         x86_64
RedHat 8.1         x86_64
RedHat 8.2         x86_64

HCOLL-CUDA

[All HCAs] The hcoll package includes a CUDA plugin (hmca_gpu_cuda.so). As of MLNX_OFED v5.1, it is built on various platforms as the package hcoll-cuda. It will be installed by default if the system has CUDA 10.2 installed.

Notes:

  • If you install MLNX_OFED from a package repository, you will need to install the package hcoll-cuda explicitly to be able to use it.

  • HCOLL-CUDA is supported on the same OSs that include support for UCX-CUDA (listed in the table above), except for RedHat 8.1 and 8.2.

GPUDirect Storage (GDS)

[All HCAs] [Beta] Added support for the new technology of GDS (GPUDirect Storage) which enables a direct data path between local or remote storage, such as NFS, NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. Both GPUDirect RDMA and GPUDirect Storage avoid extra copies through a bounce buffer in the CPU's memory. They enable the direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory, without burdening the CPU or GPU.

To enable the feature, run ./mlnxofedinstall --with-nfsrdma --with-nvmf --enable-gds --add-kernel-support

To get access to GDS Beta, please reach out to the GDS team at GPUDirectStorageExt@nvidia.com.

For the list of operating systems on which GDS is supported, see here.

Category

Description

5.0-2.1.8.0

Kernel Software Managed Flow Steering (SMFS) Performance

[ConnectX-5 and above] Improved the performance of Kernel software steering by reducing its memory consumption.

NEO-Host SDK

[All HCAs] Added support for NEO-Host SDK installation on MLNX_OFED.

Bug Fixes

See Bug Fixes.

5.0-1.0.0.0

Adapters

[ConnectX-6 Dx] Added support for ConnectX-6 Dx adapter cards.

Userspace Software Steering ConnectX-6 Dx Support

[ConnectX-6 Dx] [Beta] Added support for software steering on ConnectX-6 Dx adapter cards in the user-space RDMA-Core library through the mlx5dv_dr API.

Virtual Output Queuing (VoQ) Counters

[ConnectX-6 Dx and above] Exposed rx_prio[p]_buf_discard, rx_prio[p]_wred_discard and rx_prio[p]_marked firmware counters that count the number of packets that were dropped due to insufficient resources.

IPsec Crypto Offloads

[ConnectX-6 Dx and above] [Beta] IPsec crypto offloads are now supported on ConnectX-6 Dx devices and up. The offload functions use the existing ip xfrm tool to activate offloads on the device. It supports transport/tunnel mode with AES-GCM IPsec scheme.

TLS TX Hardware Offload

[ConnectX-6 Dx and above] [Alpha] Added support for hardware offload encryption of TLS traffic.

Note: Not supported in ConnectX-6 Lx adapter cards.

VirtIO Acceleration through Datapath I/O Processor (vDPA)

[ConnectX-6 Dx and above] Added support to enable mapping the VirtIO access region (VAR) to be used for doorbells by vDPA applications. Specifically, the following DV APIs were introduced (see man page for more details):

  • mlx5dv_alloc_var()

  • mlx5dv_free_var()

Note: Not supported in ConnectX-6 Lx adapter cards.

Resource Allocation on External Memory

[ConnectX-5 and above] Added support to enable overriding mlx5 internal allocations in order to let applications allocate some resources on external memory, such as that of the GPU.

The above is achieved by extending the parent domain object with custom allocation callbacks. Currently supported verbs objects are: QP, DBR, RWQ, SRQ.

Hardware Clock Exposure

[ConnectX-5 and above] Added support for querying the adapter clock via mlx5dv_query_device.

ODP Diagnostic Counters

[ConnectX-5 and above] Added ODP diagnostics counters for the following items per MR (memory region) within IB/mlx5 driver:

  1. Page faults: Total number of faulted pages.

  2. Page invalidations: Total number of pages invalidated by the OS during all invalidation events. The translations can no longer be valid due to either non-present pages or mapping changes.

  3. Prefetched pages: When prefetching a page, a page fault is generated in order to bring the page to the main memory.

Devlink Health CR-Space Dump

[ConnectX-5 and above] Added the option to dump configuration space via the devlink tool in order to improve debug capabilities.

Multi-packet TX WQE Support for XDP Transmit Flows

[ConnectX-5 and above] The conventional TX descriptor (WQE or Work Queue Element) describes a single packet for transmission. Added driver support for the HW feature of multi-packet TX WQEs in XDP transmit flows. With this, the HW becomes capable of working with a new and improved WQE layout that describes several packets. In effect, this feature saves PCI bandwidth and transactions, and improves transmit packet rate.

OVS-Kernel ToS Rewrite

[ConnectX-5 and above] Added support for Type of Service (ToS) rewrite in the OVS-Kernel.

OVS-Kernel Mirroring

[ConnectX-5 and above] Added support for mirroring output in SwitchDev mode in the OVS-Kernel. The mirroring port may either be a local or a remote VF, using VxLAN or GRE encapsulations.

GENEVE Encap/Decap Rules Offload

[ConnectX-5 and above] Added support for GENEVE encapsulation/decapsulation rules offload.

GPRS Tunneling Protocol (GTP) Header

[ConnectX-5 and above] [Beta] Added support for matching (filtering) GTP header-based packets using mlx5dv_dr API over user-space RDMA-Core library.

Multi Packet Tx WQE Support for XDP Transmit Flows

[ConnectX-5 and above] Added driver support for the hardware feature of multi-packet Tx to work with a new and improved WQE layout that describes several packets instead of a single packet for XDP transmission flows. This saves PCI bandwidth and transactions, and improves transmit packet rate.

Userspace Software Steering Debugging API

[ConnectX-5 and above] [Beta] Added support for software steering to dump flows for debugging purposes in the user-space RDMA-Core library through the mlx5dv_dr API.

Kernel Software Steering for Connection Tracking (CT)

[ConnectX-5 and above] [Beta] Added support for updating CT rules using the software steering mechanism.

Kernel Software Steering Remote Mirroring

[ConnectX-5 and above] [Beta] Added support for updating remote mirroring rules using the software steering mechanism.

OVS-DPDK Support

[ConnectX-5 and BlueField] Added OVS-DPDK component as part of the MLNX_OFED package with hardware offload capabilities.

OVS-DPDK Connection Tracking

[ConnectX-5 and BlueField] [Beta] Added support for OvS-DPDK Connection Tracking hardware offload.

OVS-DPDK VirtIO Acceleration through VF Relay

[ConnectX-5 and BlueField] Added support for OVS-DPDK VirtIO Acceleration through VF Relay (also known as Software vDPA) forwarding of traffic from VF to Virtio VM and vice-versa.

OVS-DPDK VXLAN Encap/Decap

[ConnectX-5 and BlueField] Added support for OVS-DPDK VXLAN encapsulation and decapsulation hardware offload.

Discard Counters

[ConnectX-4 and above] Exposed rx_prio[p]_discards discard counters per priority that count the number of received packets dropped due to lack of buffers on the physical port.

MPLS Traffic

[ConnectX-4 and above] Added support for reporting TSO and CSUM offload capabilities for MPLS-tagged traffic, and allowed the kernel stack to use these offloads.

mlx5e Max Combined Channels

[ConnectX-4 and above] Increased the driver’s maximal combined channels value from 64 to 128 (however, note that the OOB value will not exceed 64).

128 is the upper bound. A lower maximal value may be seen on the host, depending on the number of cores and MSI-X vectors configured by the firmware.

RoCE Accelerator Counters

[ConnectX-4 and above] Added the following RoCE accelerator counters:

  • roce_adp_retrans - counts the number of adaptive retransmissions for RoCE traffic

  • roce_adp_retrans_to - counts the number of times RoCE traffic reached timeout due to adaptive retransmission

  • roce_slow_restart - counts the number of times RoCE slow restart was used

  • roce_slow_restart_cnps - counts the number of times RoCE slow restart generated CNP packets

  • roce_slow_restart_trans - counts the number of times RoCE slow restart changed state to slow restart

Migration to RDMA-Core

[All HCAs] The default installation of the userspace is now the RDMA-Core library instead of the legacy verbs. This achieves most of the legacy experimental verbs’ functionalities, and more.

For NVIDIA VMA or NVIDIA RiverMax, use experimental verbs (prefix “ibv_exp”).

For further information on the migration to RDMA-Core and the list of new APIs used for various MLNX_OFED features, please refer to the Migration to RDMA-Core document.

ibdev2netdev Tool Output

[All HCAs] ibdev2netdev tool output was changed such that the bonding device now points at the bond instead of the slave interface.

Memory Region

[All HCAs] Added support for the user to register memory regions with a relaxed ordering access flag. This can enhance performance, depending on architecture and scenario.

Devlink Health Reporters

[All HCAs] Added support for monitoring and recovering from errors that occur on the RX queue, such as cookie errors and timeout.

GSO Optimization

[All HCAs] Improved GSO (Generic Segmentation Offload) workload performance by decreasing doorbells usage to the minimum required.

TX CQE Compression

[All HCAs] Added support for TX CQE (Completion Queue Element) compression, which saves outgoing PCIe bandwidth by compressing CQEs together. The feature is disabled by default and is configurable via ethtool private flags.

Firmware Versions Query via Devlink

[All HCAs] Added the option to query for running and stored firmware versions using the devlink tool.

Firmware Flash Update via Devlink

[All HCAs] Added the option to update the firmware image in the flash using the devlink tool.

Usage: devlink dev flash <dev> file <file_name>.mfa2

For further information on how to perform this update, see "Updating Firmware Using ethtool/devlink and .mfa2 File" section in MFT User Manual.

Devlink Health WQE Dump

[All HCAs] Added support for WQE (Work Queue Element) dump, triggered by an error on Rx/Tx reporters. In addition, some dumps (not triggered by an error) can be retrieved by the user via devlink health reporters.

GENEVE Tunnel Stateless Offload

[All HCAs] Added support for GENEVE tunneled hardware offloads of TSO, CSUM and RSS.

TCP Segmentation and Checksum Offload

[All HCAs] Added TCP segmentation and checksum offload support for MPLS-tagged traffic.

Category

Description

4.7-3.2.9.0

Uplink Representor Modes

[ConnectX-5 and above] Added support for new_netdev and nic_netdev uplink representor modes.

For further information on how to configure these modes, please refer to Configuring Uplink Representor Mode.

mlx5_core

[ConnectX-5 and above] Added new mlx5_core module parameter "num_of_groups", which controls the number of large groups in the FDB flow table.

Note: The default value of num_of_groups may change per MLNX_OFED driver version. The following table lists the values that must be set when upgrading the MLNX_OFED version prior to driver load, in order to achieve the same OOB experience.

MLNX_OFED Version      num_of_groups Default Value

v4.7-3.2.9.0           4
v4.6-3.1.9.0.14        15
v4.6-3.1.9.0.15        15
v4.5-1.0.1.0.19        63

For further information, please refer to Performance Tuning Based on Traffic Patterns section in MLNX_OFED User Manual.

VFs Groups Minimum Bandwidth Rate

[ConnectX-5] Added support for setting a minimum bandwidth rate on a group of VFs (BW guarantee) to ensure this group is able to transmit at least the amount of bandwidth specified on the wire.

Direct Verbs Support for Batch Counters on Root Table

[ConnectX-5] Added support for mlx5dv_dr API to set batch counters for root tables.

Modify Header

[ConnectX-5 and BlueField] Added support for mlx5dv_dr_actions to support up to 32 modify actions.

mlx5dv_dr Memory Consumption

[ConnectX-5 and BlueField] Reduced the mlx5dv_dr API memory consumption by improving the memory allocator.

mlx5dv_dr Memory Allocation

[ConnectX-5 and BlueField] Reduced memory allocation time when using the mlx5dv_dr API. This is particularly significant for the first inserted rules on which memory is allocated.

Mediated Devices

[ConnectX-5 and BlueField] Added support for mediated devices, which allow the creation of accelerated devices without SR-IOV on the BlueField® system.

For further information on mediated devices and how to configure them, please refer to Mediated Devices section in MLNX_EN User Manual.

4.7-1.0.0.1

Counters Monitoring

[ConnectX-4 and above] Added support for monitoring selected counters and generating a notification event (Monitor_Counter_Change event) upon changes made to these counters.

The counters to be monitored are selected using the SET_MONITOR_COUNTER command.

Signature Offload Kernel Verbs Enhancements

[ConnectX-4 and above] Added a new API which enables posting a single WR that completes the Protection Information (PI) operation internally. This reduces CPU utilization for posting and processing multiple WRs and improves performance by choosing the optimal mkey for the hardware according to the buffer memory layout.

EEPROM Device Thresholds via Ethtool

[ConnectX-4 and above] Added support for reading additional EEPROM information from high pages of modules such as SFF-8436 and SFF-8636. Such information can be:

  1. Application Select table

  2. User writable EEPROM

  3. Thresholds and alarms

Ethtool dump works on active cables only (e.g. optic), but thresholds and alarms can be read with “offset” and “length” parameters in any cable by running: ethtool -m <DEVNAME> offset X length Y

Performance Improvements

[ConnectX-4 and above]

  • Updated Blueflame capability reporting to prevent redundant use of Blueflame when Write-combining is not supported.

  • Added Blueflame capabilities over VFs.

RDMA_RX RoCE Steering Support

[ConnectX-4 and above] Added the ability to create rules to steer RDMA traffic, with two destinations supported: DevX object and QP. Multiple priorities are also supported.

SRQ and XRC Support on On Demand Paging (ODP) Memory Region (MR)

[ConnectX-4 and above] Added support for using ODP MR with SRQ WQEs and XRC transport.

Indirect Mkey ODP

[ConnectX-4 and above] Added the ability to create indirect Mkeys with ODP support over DevX interface.

DevX Asynchronous Query Commands

[ConnectX-4 and above] Added support for running QUERY commands over the DevX interface in an asynchronous mode. This enables applications to issue many commands in parallel while firmware processes the commands.

Implicit ODP

[ConnectX-4 and above] Added support for reporting implicit ODP support to user applications in order to allow better granularity over ODP creation.

Devlink Health Utility

[ConnectX-4 and above] Added support for real-time alerting of functionality issues that may be found in a system component (reporter). This utility helps detect and recover from a problem with a PCI device. It provides a centralized status of drivers' health activities in the generic Devlink instance and, inter alia, supports the following:

  • Storing real-time error dumps

  • Performing automatic (configurable) real-time reporter recovery

  • Performing real-time reporter diagnosis

  • Indicating real-time reporter's health status

  • Providing admins with the ability to dump, diagnose and recover a reporter

  • Providing admins with the ability to configure a reporter

User-Mode Memory Registration (UMR)

[ConnectX-4 and above] Enabled registration of memory patterns that can be used for future RDMA operations.

GENEVE Tunnel Stateless Offload

[ConnectX-4 and above] Added support for Generic Network Virtualization Encapsulation (GENEVE) tunneled hardware offload of TSO, CSUM and RSS.

ODP Pre-fetch

[ConnectX-4 and above] Added support for pre-fetching a range of an on-demand paging (ODP) memory region (MR), thereby reducing latency by making pages present with RO/RW permissions before the actual IO is conducted.

Fragmented QPs Buffer

[ConnectX-4 and above] Added the ability to allocate a fragmented buffer to in-kernel QP creation requests, in cases of large QP size requests that used to fail due to low memory resources on the host.

Flow Counters Batch Query

[ConnectX-4 and above] Allowed flow counters created with the DevX interface to be attached to flows created with the raw flow creation API.

DevX Privilege Enforcement

[ConnectX-4 and above] Enforced DevX privilege by firmware. This enables future device functionality without the need to make driver changes unless a new privilege type is introduced.

DevX Interoperability APIs

[ConnectX-4 and above] Added support for modifying and/or querying for a verb object (including CQ, QP, SRQ, WQ, and IND_TBL APIs) via the DevX interface.

This enables interoperability between verbs and DevX.

Rx Hash Fields Configuration

[ConnectX-4 and above] Added the ability to configure Rx hash fields used for traffic spreading into Rx queues using ETHTOOL_SRXFH and ETHTOOL_GRXFH ethtool commands. Built-in Receive Side Scaling (RSS) profiles can now be changed on the following traffic types: UDP4, UDP6, TCP4 and TCP6. This configuration affects both outer and inner headers.

Equal Cost Multi-Path (ECMP)

[ConnectX-4 Lx and above] Added support for offloading ECMP rules by tracking software multipath route and related next-hops, and reflecting this as port affinity to the hardware.

VF LAG

[ConnectX-4 Lx and above] Added support for High Availability and load balancing for Virtual Functions of different physical ports in SwitchDev SR-IOV mode.

Uplink Representors

[ConnectX-4 Lx and above] Exposed PF (uplink) representors in SwitchDev mode, similarly to VF representors, as an infrastructure improvement for SmartNICs.

Userspace Software Steering for eSwitch

[ConnectX-5] Added software steering capabilities to the SR-IOV eSwitch. Software steering enables better rules insertion rate compared to the current firmware-based solution. This is achieved by performing calculations on the main CPU which allows for higher insertion rates.

Userspace Software Steering for NICs

[ConnectX-5] Added software steering capabilities to NIC Rx/Tx. Software steering enables better rules insertion rate compared to the current firmware-based solution. This is achieved by performing calculations on the main CPU which allows for higher insertion rates. This solution was designed to work with Virtio DPDK.

Note: Support will be enabled by default once the support for GID change is added.

ASAP2

[ConnectX-5 and above] Incorporated the documentation of Accelerated Switching And Packet Processing (ASAP2): Hardware Offloading for vSwitches into MLNX_OFED Release Notes and User Manual.

QP Counters and Firmware Errors per PID

[ConnectX-5 and above] QP counters and flow counters are now set per Process ID (PID) to allow better visibility of RDMA error states. Users will be able to manually tune the Q counter to monitor specific QPs, or automatically monitor QPs according to predefined criteria, such as the QP type.

ODP over DC

[ConnectX-5 and above] Added support for On-Demand Paging (ODP) over DC transport.

Address Translation Services

[ConnectX-5 and above] Added support for Address Translation Services (ATS) feature, which improves performance for virtualized PeerDirect applications by caching PA->MA translations and preventing PCI transactions from going to the root complex.

XDP Inline Transmission of Small Packets

[ConnectX-5 and above] When forwarding packets with XDP, a packet smaller than 256 bytes is now sent inline within its WQE Tx descriptor for better performance. The number of packets transmitted inline depends on CPU load, where lower load leads to a higher number of inline transmissions.

VLAN Rewrite

[ConnectX-5 and above] Added support for offloading VLAN ID modify operation, allowing the user to replace the VLAN tag of the incoming frame with a user-specified VLAN tag value.

CQE Padding

[ConnectX-5 and above] Added support for padding 64B CQEs to 128B cache lines to improve performance on 128B cache line systems, such as PPC.
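The layout difference can be illustrated with a minimal Python sketch that maps CQE indices to cache lines for a given CQE stride. The function names and the assumed ring layout (CQEs packed back to back from a cache-line-aligned base) are hypothetical and only serve the illustration; they are not the driver's data structures.

```python
def cqes_per_cache_line(cqe_stride: int, cache_line: int = 128) -> int:
    """Number of consecutive CQEs that start inside one cache line.

    Assumes CQEs are packed back to back from a cache-line-aligned base.
    """
    if cqe_stride <= 0 or cache_line <= 0:
        raise ValueError("sizes must be positive")
    # A CQE at least as large as the line never shares it with a neighbour.
    return max(1, cache_line // cqe_stride)


def cache_line_of(index: int, cqe_stride: int, cache_line: int = 128) -> int:
    """Cache-line number that holds the first byte of CQE number `index`."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return (index * cqe_stride) // cache_line


# Unpadded 64B CQEs on a 128B-line system: entries 0 and 1 share line 0.
unpadded = [cache_line_of(i, 64) for i in range(4)]
# CQEs padded to 128B: every entry starts on its own line.
padded = [cache_line_of(i, 128) for i in range(4)]
```

With a 64B stride two CQEs land in each 128B line, while a 128B stride gives each CQE a line of its own.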

XDP Multi-Packet Tx Work Queue Element (WQE)

[ConnectX-5 and above] Added support for Multi-Packet Tx WQEs in XDP transmit flows to work with a new and improved WQE layout that describes several packets. This saves PCI bandwidth and transactions, and improves the transmit packet rate.

ConnectX Device IDs

[ConnectX-6] Added support for the following new device IDs:

  • ConnectX-6 Dx (PF)

  • ConnectX Family mlx5Gen Virtual Function (VF)

    Note that every new device (adapter) VF will be identified with this device ID. Different VF models will be distinguished by their revision ID.

Ethtool 200Gbps

[ConnectX-6 and above] ConnectX-6 hardware introduces support for 200Gbps and 50Gbps-per-lane link modes. The driver supports full backward compatibility with previous configurations. Note that in order to advertise the newly added link modes, the full link-mode bitmap must be advertised, as described in the ethtool man page.

NOTE: This feature is firmware-dependent. Currently, ConnectX-6 Ethernet firmware supports up to 100Gbps only. Thus, this capability may not function properly using the current driver and firmware versions.
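As a rough illustration of what advertising the full bitmap involves, the following Python sketch ORs the per-link-mode bits into a single value and formats an ethtool command line. The helper names are hypothetical, and the bit positions for the 200Gbps modes are an assumption based on the upstream ethtool man page; verify them against the ethtool version installed on the system before use.

```python
# Bit positions of the 200Gbps link modes (assumption: taken from the
# upstream ethtool man page; check your installed version).
LINK_MODE_BITS = {
    "200000baseKR4/Full": 62,
    "200000baseSR4/Full": 63,
    "200000baseLR4_ER4_FR4/Full": 64,
    "200000baseDR4/Full": 65,
    "200000baseCR4/Full": 66,
}


def advertise_mask(modes):
    """OR together the bitmap values of the requested link modes."""
    mask = 0
    for mode in modes:
        if mode not in LINK_MODE_BITS:
            raise KeyError(f"unknown link mode: {mode}")
        mask |= 1 << LINK_MODE_BITS[mode]
    return mask


def advertise_command(dev, modes):
    """Format (but do not run) an 'ethtool -s <dev> advertise <mask>' line."""
    return f"ethtool -s {dev} advertise {advertise_mask(modes):#x}"
```

The sketch only builds the command string; whether the link actually comes up at the requested speed still depends on firmware support, as noted above.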

HDR Link Speed Exposure

[ConnectX-6 and above] Added support for HDR link speed in CapabilityMask2 field in port attributes.

QP Packet Based Credit Mode

[ConnectX-6 and above] Added support for an alternative end-to-end credit mode for QP creation. Credits transported from the responder to the requester are now issued per packet. This is particularly useful for sending large RDMA messages from the HCA to switches that are short on memory.

Device Emulation Infrastructure

[BlueField] Added support for Device Emulation in BlueField. This mechanism allows function-A to perform operations on behalf of function-B. The emulation manager creates a channel (named VHCA_TUNNEL general object) that acts as the direct command interface between the emulated function host and the HCA hardware. The emulation software creates this tunnel for every managed function and issues commands via the DevX general command interface.

Verbs Migration to RDMA-Core

[All HCAs] Legacy verbs remain the default userspace installation option in MLNX_OFED. However, as of MLNX_OFED v4.7, you can opt to install the full RDMA-Core based userspace by adding the --upstream-libs flag to the mlnxofedinstall script.

MLNX_OFED Installation via Repository

[All HCAs] The repository providing legacy verbs has been moved from RPMS or DEBS folders to RPMS/MLNX_LIBS and DEBS/MLNX_LIBS.

In addition, a new repository providing RDMA-Core based userspace has been added to RPMS/UPSTREAM_LIBS and DEBS/UPSTREAM_LIBS.

NFSoRDMA

[All HCAs] Added support for NFS over RDMA (NFSoRDMA) module over the OSs listed in NFSoRDMA Supported OSs section.

[All HCAs] As of MLNX_OFED v4.7, NFSoRDMA driver is no longer installed by default. In order to install it over a supported kernel, add the “--with-nfsrdma” installation option to the “mlnxofedinstall” script.

RDMA-CM QP Timeout Control

[All HCAs] Added a new option to rdma_set_option that allows applications to override the RDMA-CM's QP ACK timeout value.
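When choosing a value for this option, it helps to convert between the encoded timeout and wall-clock time. The Python sketch below assumes the encoding documented for the upstream librdmacm option RDMA_OPTION_ID_ACK_TIMEOUT, namely 4.096 * 2^timeout microseconds; both the option name and the formula should be checked against the installed rdma_set_option man page. The helper names are hypothetical, and the value 0, which verbs treat specially, is deliberately not modeled.

```python
def ack_timeout_usec(exponent: int) -> float:
    """Timeout in microseconds for an encoded value (assumed 5-bit, 1..31)."""
    if not 1 <= exponent <= 31:
        raise ValueError("exponent must be in the range 1..31")
    return 4.096 * (2 ** exponent)


def smallest_exponent_for(target_usec: float) -> int:
    """Smallest encoded value whose timeout is at least target_usec."""
    for exponent in range(1, 32):
        if ack_timeout_usec(exponent) >= target_usec:
            return exponent
    raise ValueError("target exceeds the largest encodable timeout")


# Encoded value needed to wait at least one second for an ACK.
one_second = smallest_exponent_for(1_000_000)
```

Because the encoding is exponential, each increment doubles the timeout, so the chosen value can overshoot the target by up to a factor of two.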

Object IDs Exportation

[All HCAs] Added a unique ID for each verbs object to allow direct query over rdma-tool and rdma-netlink for enhanced debuggability.

RDMA-CM Application Managed QP

[All HCAs] Added support for the RDMA application to manage its own QPs and use RDMA-CM only for exchanging Address information.

Bug Fixes

See Bug Fixes.

Category

Description

4.6-1.0.1.0

Devlink Configuration Parameters Tool

[ConnectX-3/ConnectX-3 Pro] Added support for a set of configuration parameters that can be changed by the user through the Devlink user interface.

ODP Pre-fetch

[ConnectX-4 and above] Added support for pre-fetching a range of an on-demand paging (ODP) memory region (MR), which reduces latency by making pages present with RO/RW permissions before the actual I/O is conducted.

DevX Privilege Enforcement

[ConnectX-4 and above] Enforced DevX privilege by firmware. This enables future device functionality without the need to make driver changes unless a new privilege type is introduced.

DevX Interoperability APIs

[ConnectX-4 and above] Added support for modifying and/or querying for a verb object (including CQ, QP, SRQ, WQ, and IND_TBL APIs) via the DevX interface.

This enables interoperability between verbs and DevX.

DevX Asynchronous Query Commands

[ConnectX-4 and above] Added support for running QUERY commands over the DevX interface in an asynchronous mode. This enables applications to issue many commands in parallel while firmware processes the commands.

DevX User-space PRM Handles Exposure

[ConnectX-4 and above] Exposed all PRM handles to user-space so DevX user application can mix verbs objects with DevX objects.

For example: take the cqn from the created ibv_cq and use it in a devx_create(QP) call.

Indirect Mkey ODP

[ConnectX-4 and above] Added the ability to create indirect Mkeys with ODP support over DevX interface.

XDP Redirect

[ConnectX-4 and above] Added support for XDP_REDIRECT feature for both ingress and egress sides. Using this feature, incoming packets on one interface can be redirected very quickly into the transmission queue of another capable interface. Typically used for load balancing.

RoCE Disablement

[ConnectX-4 and above] Added the option to disable RoCE traffic handling. This enables forwarding of traffic over UDP port 4791 that is handled as RoCE traffic when RoCE is enabled.

When RoCE is disabled, there is no GID table, only Raw Ethernet QP type is supported and RoCE traffic is handled as regular Ethernet traffic.

Forward Error Correction (FEC) Encoding

[ConnectX-4 and above] Added the ability to query, modify, and disable Forward Error Correction (FEC) encoding via ethtool.

RAW Per-Lane Counters Exposure

[ConnectX-4 and above] Exposed RAW error counters per cable-module lane via ethtool stats. The counters show the number of errors before FEC correction (if enabled).

For further information, please see phy_raw_errors_lane[i] under Physical Port Counters section in Understanding mlx5 ethtool Counters Community post.

VF LAG

[ConnectX-4 Lx and above] Added support for High Availability and load balancing for Virtual Functions of different physical ports in SwitchDev SR-IOV mode.

ASAP2 Offloading VXLAN Decapsulation with HW LRO

[ConnectX-5 and above] Added support for performing hardware Large Receive Offload (HW LRO) on VFs with HW-decapsulated VXLAN.

For further information on the VXLAN decapsulation feature, please refer to ASAP2 User Manual under nvidia.com/en-us/networking/ → Products → Software → ASAP2.

PCI Atomic Operations

[ConnectX-5 and above] Added the ability to run atomic operations on local memory without involving verbs API or compromising the operation's atomicity.

Equal-Cost Multi-Path (ECMP) Routing Offloading

[ConnectX-5 and above] Enabled Equal-Cost Multi-Path (ECMP) Routing offloading.

Equal-Cost Multi-Path (ECMP) is a forwarding mechanism for routing packets along multiple paths of equal cost with the goal to achieve almost equally distributed link load sharing.
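As a rough illustration of the forwarding idea, the Python sketch below hashes a flow's 5-tuple to pick one of several equal-cost next hops, so that all packets of one flow stay on the same path. The function name, the tuple layout and the use of CRC32 are assumptions made for this example; the hash that the hardware or kernel actually applies is not described in these notes.

```python
import zlib


def pick_path(flow, paths):
    """Pick a next hop for a flow among equal-cost paths.

    flow  -- hypothetical 5-tuple (src_ip, dst_ip, src_port, dst_port, proto)
    paths -- non-empty sequence of next-hop identifiers
    """
    if not paths:
        raise ValueError("at least one path is required")
    # Serialize the tuple deterministically, then hash it.
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]


paths = ["uplink-a", "uplink-b"]
flow = ("10.0.0.1", "10.0.0.2", 40000, 4791, "udp")
chosen = pick_path(flow, paths)
```

Hashing per flow rather than per packet keeps packets of one flow in order, at the cost of a load split that is only approximately equal.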

VXLAN over VLAN

[ConnectX-5 and above] VXLAN over VLAN enables the user to apply VXLAN offloads to VLAN-tagged tunnels, thus boosting system performance.

VLAN Rewrite

[ConnectX-5 and above] Rewriting VLAN tags allows the user to replace the VLAN tag of the incoming frame with a user-specified VLAN tag value.

Virtual Ethernet Port Aggregator (VEPA)

[ConnectX-5] Added support for activating/deactivating Virtual Ethernet Port Aggregator (VEPA) mode on a single virtual function (VF). To turn on VEPA on the second VF, run:

bridge link set dev <netdev> hwmode vepa

VFs Rate Limit

[ConnectX-5] Added support for setting a rate limit on groups of Virtual Functions rather than on an individual Virtual Function.

ConnectX-6 Support

[ConnectX-6] [Beta] Added support for ConnectX-6 (VPI only) adapter cards.

NOTE: In HDR installations that are built with remotely managed Quantum-based switches, the switch’s firmware must be upgraded to version 27.2000.1142 prior to upgrading the HCA’s (ConnectX-6) firmware to version 20.25.1500. When using ConnectX-6 HCAs with firmware v20.25.1500 and connecting them to Quantum-based switches, make sure the Quantum firmware version is 27.2000.1142 in order to avoid any critical link issues.

Ethtool 200Gbps

[ConnectX-6] ConnectX-6 hardware introduces support for 200Gbps and 50Gbps-per-lane link mode. MLNX_OFED supports full backward compatibility with previous configurations.

Note that in order to advertise the newly added link modes, the full link-mode bitmap must be advertised, as described in the ethtool man page. For the full bitmap list per link mode, please refer to the MLNX_OFED User Manual.

NOTE: This feature is firmware-dependent. Currently, ConnectX-6 Ethernet firmware supports up to 100Gbps only. Thus, this capability may not function properly using the current driver and firmware versions.

PCIe Power State

[ConnectX-6] Added support for the following PCIe power state indications to be printed to dmesg:

  1. Info message #1: PCIe slot power capability was not advertised.

  2. Warning message: Detected insufficient power on the PCIe slot (xxxW).

  3. Info message #2: PCIe slot advertised sufficient power (xxxW).

    When message 1 or 2 in the list above appears in dmesg, the user should make sure to use a PCIe slot that is capable of supplying the required power.

Message Signaled Interrupts-X (MSI-X) Vectors

[mlx5] Added support for using a single MSI-X vector for all control event queues instead of one MSI-X vector per queue in a virtual function driver. This frees extra MSI-X vectors to be used for completion event queue, allowing for additional traffic channels in the network device.

Send APIs

[mlx5] Introduced a new set of QP Send operations (APIs) which allows extensibility for new Send opcodes.

DC Data-path

[mlx5] Added DC QP data-path support using new Send APIs introduced in Direct Verbs (DV).

BlueField Support

[BlueField] BlueField is now fully supported as part of the NVIDIA OFED mainstream version sharing the same code baseline with all the adapters product line.

Representor Name Change

[BlueField] In SwitchDev mode:

  • Uplink representors are now called p0/p1

  • Host PF representors are now called pf0hpf/pf1hpf

  • VF representors are now called pf0vfN/pf1vfN
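Scripts that previously matched the old interface names may need to recognize the new scheme. The following Python sketch classifies a name according to the three patterns listed above; the function name, the returned tuple layout and the restriction to PF indices 0 and 1 are assumptions made for illustration.

```python
import re

# One pattern per representor kind from the list above.
_PATTERNS = [
    ("uplink", re.compile(r"^p(?P<pf>[01])$")),
    ("host_pf", re.compile(r"^pf(?P<pf>[01])hpf$")),
    ("vf", re.compile(r"^pf(?P<pf>[01])vf(?P<vf>\d+)$")),
]


def classify_representor(name):
    """Return (kind, pf_index, vf_index or None), or None if not a representor."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(name)
        if match:
            vf = match.groupdict().get("vf")
            return kind, int(match.group("pf")), int(vf) if vf is not None else None
    return None
```

For example, `classify_representor("pf0vf3")` yields the VF kind with PF index 0 and VF index 3, while a legacy-style name such as `enp3s0f0` is not recognized.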

ECPF Net Devices

[BlueField] In SwitchDev mode, net devices enp3s0f0 and enp3s0f1 are no longer created.

Setting Host MAC and Tx Rate Limit from ECPF

[BlueField] Expanded to support VFs as well as the host PFs.

RDMA-CM Application Managed QP

[All HCAs] Added support for the RDMA application to manage its own QPs and use RDMA-CM only for exchanging Address information.

RDMA-CM QP Timeout Control

[All HCAs] Added a new option to rdma_set_option that allows applications to override the RDMA-CM's QP ACK timeout value.

MLNX_OFED Verbs API

[All HCAs] As of the MLNX_OFED v5.0 release (Q1 2020), the MLNX_OFED Verbs API will be migrated from the legacy version of the user space verbs libraries (libibverbs, libmlx5, ...) to the upstream version, rdma-core.

More details are available in MLNX_OFED user manual under Installing Upstream rdma-core Libraries.

Bug Fixes

See Bug Fixes.

Category

Description

4.5-1.0.1.0

VFs per PF

[ConnectX-5] Increased the maximum number of virtual functions (VFs) that can be allocated to a physical function (PF) to 127.

SW-Defined UDP Source Port for RoCE v2

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] The UDP source port for RoCE v2 packets is now calculated by the driver rather than the firmware, achieving better distribution and less congestion. This mechanism works for RDMA-CM QPs only, and ensures that RDMA connection messages and data messages have the same UDP source port value.
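The driver's actual calculation is not described in these notes. Purely as an illustration of the idea, the Python sketch below derives a source port deterministically from a QP number pair, so that every message of one connection gets the same port while different connections spread over a port range. The mixing steps, the function name and the 0xC000-0xFFFF port range are all assumptions made for this example.

```python
def roce_v2_udp_sport(local_qpn: int, remote_qpn: int) -> int:
    """Hypothetical mapping of a QP number pair to a UDP source port."""
    if local_qpn < 0 or remote_qpn < 0:
        raise ValueError("QP numbers must be non-negative")
    # Mix the two QP numbers into one 64-bit value (illustrative only).
    value = (local_qpn * remote_qpn) & 0xFFFFFFFFFFFFFFFF
    value ^= value >> 20
    value ^= value >> 40
    # Keep 14 bits of entropy and force the result into 0xC000..0xFFFF.
    return 0xC000 | (value & 0x3FFF)


sport = roce_v2_udp_sport(0x12, 0x34)
```

Because the result depends only on the QP pair, connection-establishment and data messages of the same connection would carry the same source port under this scheme.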

Local Loopback Disable

[mlx5 Driver] Added the ability to manually disable Local Loopback regardless of the number of open user-space transport domains.

Adapter Cards

[ConnectX-6] Added support for ConnectX-6 Ready. For further information, please contact support at networking-support@nvidia.com.

NEO-Host

[All HCAs] Integrated NEO-Host for orchestration and management of host networking into MLNX_OFED package.

Bug Fixes

See Bug Fixes.

4.4-2.0.7.0

Bug Fixes

See Bug Fixes.

4.4-1.0.0.0

Adaptive Interrupt Moderation

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added support for adaptive Tx, which optimizes the moderation values of the Tx CQs on runtime for maximum throughput with minimum CPU overhead.

This mode is enabled by default.

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Updated Adaptive Rx to ignore ACK packets so that queues that only handle ACK packets remain with the default moderation.

Docker Containers [Beta]

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added support for Docker containers to run over Virtual RoCE and InfiniBand devices using SR-IOV mode.

VF Statistics

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Performed the following virtual function statistics changes:

  • Added tx_broadcast and tx_multicast counters

  • Included RDMA statistics for existing counters

Force TTL

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added support for setting a global TTL value for all RC QPs and rdma-cm QPs.

Firmware Tracer

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added a new mechanism for the device’s FW/HW to log important events into the event tracing system (/sys/kernel/debug/tracing) without requiring any NVIDIA-specific tool.

Note: This feature is enabled by default.

CR-Dump

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Accelerated the original cr-dump by optimizing the reading process of the device’s CR-Space snapshot.

RoCE ICRC Error Counter

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added support for a new counter that exposes the amount of corrupted RoCE packets that arrive with bad Invariant Cyclic Redundancy Code (ICRC).

VST Q-in-Q

[ConnectX-4/ConnectX-4 Lx] Added support for C-tag (0x8100) VLAN insertion to tagged packets in VST mode.
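To make the frame layout concrete, the following Python sketch inserts an outer 802.1Q C-tag (TPID 0x8100) in front of whatever follows the MAC addresses, including an existing VLAN tag. It is a software model for illustration only: the function name and parameters are hypothetical, and on the adapter the insertion is performed by the device.

```python
import struct

TPID_CTAG = 0x8100  # 802.1Q C-tag EtherType


def insert_ctag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Return the frame with a C-tag inserted after the two MAC addresses."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    if not 0 <= vid <= 0xFFF or not 0 <= pcp <= 7:
        raise ValueError("vid must fit in 12 bits and pcp in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    tag = struct.pack("!HH", TPID_CTAG, tci)
    # Bytes 0..11 are destination and source MAC; the tag goes right after.
    return frame[:12] + tag + frame[12:]


# A frame that already carries VLAN 5, followed by an IPv4 EtherType.
tagged = b"\x00" * 6 + b"\x02" * 6 + b"\x81\x00\x00\x05" + b"\x08\x00" + b"payload"
double_tagged = insert_ctag(tagged, vid=100, pcp=3)
```

The original tag and payload are left untouched and simply shift by four bytes.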

Ethernet Tunneling Over IPoIB Driver (eIPoIB)

[ConnectX-4] Re-added support for eth_ipoib driver, which provides a standard Ethernet interface to be used as a Physical Interface (PIF) into the Hypervisor virtual network, and serves one or more Virtual Interfaces (VIF).

OVS Offload using ASAP2

[ConnectX-4 Lx/ConnectX-5] Added support for NVIDIA Accelerated Switching And Packet Processing (ASAP2) technology, which allows OVS offloading by handling OVS data-plane, while maintaining OVS control-plane unmodified. OVS Offload using ASAP2 technology provides significantly higher OVS performance without the associated CPU load.

For further information, refer to ASAP2 Release Notes under nvidia.com/en-us/networking/ → Products → Software → ASAP2.

Upstream Libraries

[All HCAs] Added a repository repodata to support installing upstream libraries (based on upstream rdma-core), using the Operating System's standard package manager (yum, apt-get, etc.).

For further information, please refer to “Installing Upstream rdma-core Libraries” section in MLNX_OFED User Manual.

Note: This is intended only for DPDK users.

Installation

[All HCAs] Added support for new metadata packages that install userspace packages only (without any kernel packages), using the Operating System's standard package manager (yum, apt-get, etc.). These metadata packages have the suffix “-user-only”. For example: “mlnx-ofed-all-user-only”.

Bug Fixes

See Bug Fixes.

4.3-1.0.1.0

Multi-Packet Work Request (WR)

[ConnectX-5] Added support for the following multi-packet WR related verbs for control path:

  • ibv_exp_query_device

  • ibv_exp_create_srq

For further information on the use of these verbs, please refer to the Verbs man page.

Coherent Accelerator Processor Interface (CAPI) [beta]

[ConnectX-5] Added support for CAPI, an interface that enables ConnectX-5 adapter cards to provide the best performance for Power and OpenPower based platforms.

Tunneled Atomic

[ConnectX-5] Added support for RDMA atomic commands offload so that when an RDMA Write operation is issued, the payload indicates which atomic operation to perform, instead of being written to the Memory Region (MR).

Packet Pacing

[ConnectX-5] Added support for the following advanced burst control parameters:

  • max_burst_sz - for indicating the maximal burst size of packets

  • typical_pkt_sz - for improving the accuracy of the rate limiter

Erasure Coding Offload Verbs

[ConnectX-5] Added support for erasure coding offload software verbs (encode/decode/update API) supporting a number of redundancy blocks (m) greater than 4.

Virtual MAC

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Removed support for Virtual MAC feature.

RoCE LAG

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added out of box RoCE LAG support for RHEL 7.2 and RHEL 6.9.

Dropped Counters

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added a new counter rx_steer_missed_packets which provides the number of packets that were received by the NIC, yet were discarded/dropped since they did not match any flow in the NIC steering flow table.

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added the ability for SR-IOV counter rx_dropped to count the number of packets that were dropped while vport was down.

Relaxed Ordering (RSYNC)

[ConnectX-4/ ConnectX-4 Lx/ConnectX-5] Added support for RSYNC feature to ensure correct ordering of memory operations between the GPU and HCA.

Reset Flow

[mlx5 Driver] Added support for triggering software reset for firmware/driver recovery. When fatal errors occur, firmware can be reset and driver reloaded.

Striding RQ with HW Time-Stamping

[ConnectX-4 Lx/ConnectX-5] Added the option to retrieve the HW timestamp when polling for completions from a completion queue that is attached to a multi-packet RQ (Striding RQ).

4.2-1.2.0.0

DSCP Trust Mode

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for automatically setting the number of TC to 8 when the Trust state is changed to DSCP.

Receive Buffer

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added xon and xoff columns to the Receive Buffer configuration display.

4.2-1.0.0.0

Physical Address Memory Allocation

[mlx5 Driver] Added support to register a specific physical address range.

Innova IPsec Adapter Cards

[Innova IPsec EN] Added support for NVIDIA Innova IPsec EN adapter card, that provides security acceleration for IPsec-enabled networks.

Precision Time Protocol (PTP)

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for PTP feature over PKEY interfaces.

This feature allows for accurate synchronization between the distributed entities over the network. The synchronization is based on symmetric Round Trip Time (RTT) between the master and slave devices, and is enabled by default.

1PPS Time Synchronization

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for One Pulse Per Second (1PPS) over IPoIB interfaces.

Virtual MAC

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for Virtual MAC feature, which allows users to add up to 4 virtual MACs (VMACs) per VF. All traffic that is destined to the VMAC will be forwarded to the relevant VF instead of PF. All traffic going out from the VF with source MAC equal to VMAC will go to the wire also when Spoof Check is enabled.

For further information, please refer to “Virtual MAC” section in MLNX_OFED User Manual.

Receive Buffer

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added the option to change receive buffer size and cable length. Changing cable length will adjust the receive buffer's xon and xoff thresholds.

For further information, please refer to “Receive Buffer” section in MLNX_OFED User Manual.

GRE Tunnel Offloads

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for the following GRE tunnel offloads:

  • TSO over GRE tunnels

  • Checksum offloads over GRE tunnels

  • RSS spread for GRE packets

NVMEoF

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for the host side (RDMA initiator) in RedHat 7.2 and above.

Dropless Receive Queue (RQ)

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for the driver to notify the FW when SW receive queues are overloaded.

PFC Storm Prevention

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for configuring PFC stall prevention in cases where the device unexpectedly becomes unresponsive for a long period of time. PFC stall prevention disables flow control mechanisms when the device is stalled for a period longer than the default pre-configured timeout. Users now have the ability to change the default timeout by moving to auto mode.

For further information, please refer to “PFC Stall Prevention” section in MLNX_OFED User Manual.

Force DSCP

[ConnectX-4/ConnectX-4 Lx/ConnectX-5] Added support for Force DSCP, which enables setting a global traffic_class value for all RC QPs.

Q-in-Q

[ConnectX-5] Added support for Q-in-Q VST feature in ConnectX-5 adapter cards family.

Device Memory Programming [beta]

[ConnectX-5] Added support for on-chip memory allocation and usage in send/receive and RDMA operations at beta level.

Virtual Guest Tagging (VGT+)

[ConnectX-5] Added support for VGT+ in ConnectX-4/ConnectX-5 HCAs. This feature is an advanced mode of Virtual Guest Tagging (VGT), in which a VF is allowed to tag its own packets as in VGT, but is still subject to an administrative VLAN trunk policy. The policy determines which VLAN IDs are allowed to be transmitted or received. The policy does not determine the user priority, which is left unchanged.

For further information, please refer to “Virtual Guest Tagging (VGT+)” section in MLNX_OFED User Manual.
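The trunk policy check can be modeled with a minimal Python sketch that tests a packet's VLAN ID against a list of allowed ranges. The function name and the representation of the policy as inclusive (first, last) ranges are assumptions made for illustration; the actual policy is configured through the driver interface described in the User Manual.

```python
from typing import Iterable, Tuple


def vgt_plus_allows(vlan_id: int, trunk: Iterable[Tuple[int, int]]) -> bool:
    """Return True if vlan_id falls inside any allowed (first, last) range."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return any(first <= vlan_id <= last for first, last in trunk)


# Hypothetical policy: VLANs 10-20 and VLAN 100 are allowed for this VF.
policy = [(10, 20), (100, 100)]
allowed = vgt_plus_allows(15, policy)
blocked = vgt_plus_allows(21, policy)
```

Only the VLAN ID is examined; the user priority bits of the tag are not part of the decision and pass through unchanged, matching the description above.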

Tag Matching Offload

[ConnectX-5] Added support for hardware Tag Matching offload with Dynamically Connected Transport (DCT).

Shared Memory Region (MR)

[ConnectX-3/ConnectX-3 Pro] Removed support for Shared MR feature on ConnectX-3/ConnectX-3 Pro adapter cards. As a result of this change, the following API/flags should not be used:

  • ibv_exp_reg_shared_mr

  • access shared flags for ibv_exp_reg_mr (IBV_EXP_ACCESS_SHARED_MR_XXX)

CR-DUMP

[All HCAs] Added support for the driver to take an automatic snapshot of the device’s CR-Space in cases of critical failures.

For further information, please refer to “CRDUMP” section in MLNX_OFED User Manual.

Upstream Libraries

[All HCAs] Added the option to install upstream libraries (based on upstream rdma-core) for DPDK users only.

For further information, please refer to “Installing Upstream rdma-core Libraries” section in MLNX_OFED User Manual.

DiSNI

[All HCAs] Added the option to install libdisni package as part of MLNX_OFED.

For further information, please refer to section “Installing libdisni Package” in MLNX_OFED User Manual.

Service Scripts

[All HCAs] Added the ability to disable the ‘stop’ option in the openibd service script, by setting ALLOW_STOP=no in /etc/infiniband/openib.conf.

Starting from the next release, the ‘stop’ option will be disabled by default. To enable it, set ALLOW_STOP to ‘yes’ in the conf file, or run force-stop.
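To audit hosts before the default changes, a small script can report whether ‘stop’ is currently permitted. The Python sketch below parses the text of the configuration file as simple KEY=value lines; this parsing approach, the function name and the `default` parameter are assumptions for illustration, and the openibd script itself may interpret the file differently.

```python
def stop_allowed(conf_text: str, default: bool = True) -> bool:
    """Return whether ALLOW_STOP permits 'stop', given the conf file text.

    `default` is used when ALLOW_STOP is absent: True models the current
    release, False models the announced default of the next release.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, val = line.split("=", 1)
        if key.strip() == "ALLOW_STOP":
            value = val.strip().strip("'\"").lower()  # last assignment wins
    if value is None:
        return default
    return value == "yes"
```

A typical use would read /etc/infiniband/openib.conf and pass its contents to `stop_allowed`, choosing `default` according to the installed release.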

4.1-1.0.2.0

RoCE Diagnostics and ECN Counters

[mlx5 Driver] Added support for additional RoCE diagnostics and ECN congestion counters under /sys/class/infiniband/mlx5_0/ports/1/hw_counters/ directory.

For further information, refer to the Understanding mlx5 Linux Counters and Status Parameters Community post.

rx-fcs Offload (ethtool)

[mlx5 Driver] Added support for rx-fcs ethtool offload configuration. Normally, the FCS of the packet is truncated by the ASIC hardware before the packet is sent to the application socket buffer (skb). With ethtool, rx-fcs can be set so that the FCS is not truncated but passed to the application for analysis.

For more information and usage, refer to Understanding ethtool rx-fcs for mlx5 Drivers Community post.

DSCP Trust Mode

[mlx5 Driver] Added the option to enable PFC based on the DSCP value. With this solution, VLAN headers are no longer mandatory.

For further information, refer to the HowTo Configure Trust Mode on NVIDIA Adapters Community post.

RoCE ECN Parameters

[mlx5 Driver] ECN parameters have been moved to the following directory: /sys/kernel/debug/mlx5/<PCI BUS>/cc_params/

For more information, refer to the HowTo Configure DCQCN (RoCE CC) for ConnectX-4 (Linux) Community post.

Flow Steering Dump Tool

[mlx5 Driver] Added support for mlx_fs_dump, which is a python tool that prints the steering rules in a readable manner.

Secure Firmware Updates

[mlx5 Driver] Firmware binaries embedded in MLNX_OFED package now support Secure Firmware Updates. This feature provides devices with the ability to verify digital signatures of new firmware binaries, in order to ensure that only officially approved versions are installed on the devices.

For further information on this feature, refer to NVIDIA Firmware Tools (MFT) User Manual.

Enhanced IPoIB

[mlx5 Driver] Added support for Enhanced IPoIB feature, which enables better utilization of features supported in ConnectX-4 adapter cards, by optimizing IPoIB data path and thus, reaching peak performance in both bandwidth and latency.

Enhanced IPoIB is enabled by default.

PeerDirect

[mlx5 Driver] Added the ability to open a device and create a context while giving PCI peer attributes such as name and ID.

For further details, refer to the PeerDirect Programming Community post.

Probed VFs

[mlx5 Driver] Added the ability to disable probed VFs on the hypervisor. For further information, see HowTo Configure and Probe VFs on mlx5 Drivers Community post.

Local Loopback

[mlx5 Driver] Improved performance by having the mlx5 driver disable local loopback (unicast and multicast) by default while local loopback is not in use. The mlx5 driver keeps track of the number of transport domains that are opened by user-space applications. If more than one user-space transport domain is open, local loopback is automatically enabled.
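The rule described above amounts to a reference count with a threshold. The following Python sketch models it; the class and method names are hypothetical and do not correspond to driver symbols.

```python
class LoopbackTracker:
    """Models the rule: loopback is on only while more than one
    user-space transport domain (TD) is open."""

    def __init__(self) -> None:
        self.open_tds = 0

    @property
    def loopback_enabled(self) -> bool:
        return self.open_tds > 1

    def open_td(self) -> bool:
        """Register a newly opened TD and return the loopback state."""
        self.open_tds += 1
        return self.loopback_enabled

    def close_td(self) -> bool:
        """Register a closed TD and return the loopback state."""
        if self.open_tds == 0:
            raise RuntimeError("no transport domain is open")
        self.open_tds -= 1
        return self.loopback_enabled
```

A single application with one transport domain therefore runs with loopback disabled, and loopback turns on only once a second domain is opened.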

1PPS Time Synchronization (at alpha level)

[mlx5 Driver] Added support for One Pulse Per Second (1PPS), which is a time synchronization feature that allows the adapter to send or receive 1 pulse per second on a dedicated pin on the adapter card.

For further information on this feature, refer to the HowTo Test 1PPS on NVIDIA Adapters Community post.

Precision Time Protocol (PTP)

[mlx5 Driver] Added support for PTP feature in IPoIB offloaded devices.

This feature allows for accurate synchronization between the distributed entities over the network.

The synchronization is based on symmetric Round Trip Time (RTT) between the master and slave devices.

The feature is enabled by default.

For further information, refer to Running Linux PTP with ConnectX-4 Community post.

Fast Driver Unload

[mlx5 Driver] Added support for fast driver teardown in shutdown and kexec flows.

HCAs: ConnectX-5/ConnectX-5 Ex

NVMEoF Target Offload

[ConnectX-5/ConnectX-5 Ex] Added support for NVMe over fabrics (NVMEoF) offload, an implementation of the new NVMEoF standard target (server) side in hardware.

For further information on NVMEoF Target Offload, refer to HowTo Configure NVMEoF Target Offload.

MPI Tag Matching

[ConnectX-5/ConnectX-5 Ex] Added support for offloading MPI tag matching to HCA.

RDMA CM

[All HCAs] Changed the default RoCE mode on which RDMA CM runs to RoCEv2 instead of RoCEv1.

RDMA_CM session requires both the client and server sides to support the same RoCE mode. Otherwise, the client will fail to connect to the server.

For further information, refer to RDMA CM and RoCE Version Defaults Community post.

Lustre

[All HCAs] Added support for Lustre file system open-source project.

4.0-2.0.2.0

Operating Systems

Added support for Ubuntu v17.04.

4.0-2.0.0.1

PCIe Error Counting

[ConnectX-4/ConnectX-4 Lx] Added the ability to expose physical layer statistical counters to ethtool.

Multiprotocol Label Switching (MPLS) Tagged Packets Classification

[ConnectX-4/ConnectX-4 Lx] Enabled packet flow steering rules with IPv4/IPv6 classification (for raw packet QP (DPDK) only) to work on IPv4/IPv6 over MPLS (Ethertype 0x8847 and 0x8848) encapsulated packets.

RoCE VFs

[ConnectX-4/ConnectX-4 Lx] Added the ability to enable/disable RoCE on VFs.

RoCE LAG

[ConnectX-4/ConnectX-4 Lx] Added support for RoCE over LAG interface.

Standard ethtool

[ConnectX-4/ConnectX-4 Lx] Added support for flow steering and rx-all mode.

SR-IOV Bandwidth Share for Ethernet/RoCE (beta)

[ConnectX-4/ConnectX-4 Lx] Added the ability to guarantee the minimum rate of a certain VF in SR-IOV mode.

Adapter Cards

Added support for ConnectX-5 and ConnectX-5 Ex HCAs.

DSCP ConfigFS Control for RDMA-CM QPs

Added the ability to configure ToS/DSCP for RDMA-CM QPs only.

Soft RoCE (beta)

Added a software implementation of RoCE that allows RoCE to run on any Ethernet network adapter, whether or not it offers hardware acceleration.

NVMe over Fabrics (NVMEoF)

Installation of NVMEoF-related modules is disabled by default. To enable it, add the "--with-nvmf" installation option to the "mlnxofedinstall" script.

NFS over RDMA (NFSoRDMA)

Removed support for NFSoRDMA drivers. These drivers are no longer provided along with the MLNX_OFED package.

Customer Affecting Change

Description

Customer Affecting Changes 5.6-1.0.3.3

Interface Renaming, PF/VF, Udev

The OFED driver no longer performs Ethernet NetDev interface renaming for PFs and VFs.

The udev rules file which implemented renaming (82-net-setup-link.rules) and its supporting script vf-net-link-name.sh are no longer installed by default.

Renaming is thus performed by underlying mechanisms -- in udev, in the kernel, and in the BIOS.

Users who wish to continue using the OFED driver renaming mechanism must add the --copy-ifnames-udev option to the OFED install command.

To install these files at a later time, copy them from one of the following directories:

  • /usr/share/doc/mlnx-ofa_kernel (RHEL8 and newer)

  • /usr/share/doc/mlnx-ofa_kernel-[1-9]* (RHEL 7.X)

  • /usr/share/doc/packages/mlnx-ofa_kernel (SLES)

  • /usr/share/doc/mlnx-ofed-kernel-utils/examples (Debian-based releases)

Community Operating Systems

Starting with OFED 5.6, NVIDIA is introducing a new support model for OFED used on open-source community operating systems. The goal of this support model is to enable customers to use community-maintained variants of the Linux operating system, without being limited to the major distributions for which NVIDIA provides primary support. For more information, see the "Installation on Community Operating Systems" section in the user manual. For a list of supported community operating systems, see the "Supported Community Operating Systems" section in the release notes.

ar_mgr Subnet Manager Plugin

The ar_mgr subnet manager plugin is no longer supported.

For adaptive routing and SHIELD subnet manager configuration, please see the MLNX_OFED user manual.

Fabric Collector in UFM

Starting with UFM v6.7, Fabric Collector is no longer supported. For more information, see the UFM release notes.

OVS-DPDK—Partial Offload

Starting with OFED 5.6, OVS-DPDK does not support partial offload.

Customer Affecting Change

Description

5.5-1.0.3.2

Disabling RoCE While Using sysfs

When using sysfs to enable or disable RoCE on kernel 5.5 and later, the "devlink dev reload" command (provided by the devlink tool in iproute2) must be run for the RoCE status change to take effect.

Disable RoCE example:

1. echo 0 > /sys/bus/pci/devices/0000:08:00.0/roce_enable

2. devlink dev reload pci/0000:08:00.0

mlnx-ofa_kernel Installation

The source code for mlnx-ofa_kernel is no longer installed by default on RPM-based distributions (e.g., RHEL and SLES).

Notes:

• mlnx-ofa_kernel is included in the <> in the MLNX_OFED distributions under RPMS/ and may be manually installed from there.

• There is no change for deb-based distributions (Debian and Ubuntu). The full source is included, as before, in the package mlnx-ofed-kernel-dkms.

Software Encapsulation Compatibility

There is an encapL2 compatibility issue with accelerated reformat action creation using mlx5dv_dr API.

Accelerated reformat action is not allowed when using OFED 5.4 with firmware xx.32.1xxx and above, or when using OFED 5.5 with firmware lower than xx.32.1xxx. (OFED 5.4 and OFED 5.5 each work properly with their bundled firmware.)
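The compatibility rule above can be sketched as a small check. This is an illustration only: the function name is hypothetical, and it assumes that firmware versions have the form major.minor.subminor (for example 22.32.1010) and that "xx.32.1xxx" means a minor version of 32 with a subminor of 1000 or higher.

```python
def accelerated_reformat_allowed(ofed_version: str, firmware_version: str) -> bool:
    """Apply the OFED 5.4 / 5.5 firmware compatibility rule from the note.

    ofed_version: e.g. "5.4", "5.5", or a full string such as "5.5-1.0.3.2".
    firmware_version: e.g. "22.32.1010" (assumed major.minor.subminor).
    """
    ofed = tuple(int(p) for p in ofed_version.split("-")[0].split(".")[:2])
    _, fw_minor, fw_subminor = (int(p) for p in firmware_version.split("."))

    # Assumption: "xx.32.1xxx and above" means (minor, subminor) >= (32, 1000).
    new_firmware = (fw_minor, fw_subminor) >= (32, 1000)

    if ofed == (5, 4):
        return not new_firmware  # 5.4 pairs with the older firmware
    if ofed == (5, 5):
        return new_firmware      # 5.5 pairs with the newer firmware
    raise ValueError("rule is stated only for OFED 5.4 and 5.5")
```

For example, under these assumptions OFED 5.5 with firmware 22.32.1010 passes the check, while OFED 5.4 with the same firmware does not.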

xpmem in RHEL8

Added xpmem packages in RHEL8 builds.

Python3

Starting with OVS-DPDK 2.15, the minimum required Python version is 3, and OVS-DPDK can no longer be compiled using Python 2.

Customer Affecting Change

Description

Customer Affecting Changes 5.4-3.0.3.0

CUDA, UCX, HCOLL

For UCX-CUDA and hcoll-cuda, CUDA was upgraded from version 10.2 to 11.2.

Customer Affecting Change

Description

Customer Affecting Changes 5.4-1.0.3.0

udev Rules

As of version 5.4, the driver is set so that udev rules will change the names of network interfaces created from NVIDIA adapters.

The udev rules are shipped to "/lib/udev/rules.d" and may be overridden by placing a file with the same name in "/etc/udev/rules.d".

Example: /etc/udev/rules.d/82-net-setup-link.rules

Network Interface Names, udevd

[ConnectX-4 and above] In MLNX_OFED 5.4 GA, ConnectX-4/5/6 Ethernet network interfaces are now provided with permanent names.

Prior to this release, the default interface names provided by the kernel and udevd (ethX) remained as-is.

From this release onwards, interface names are generated via new udevd rules.

The generated names are now predictable, and the default names are automatically renamed to the predictable names by the udevd daemon, according to udev rules files installed by OFED.

The new interface names look as follows: en[P<domain>][p<bus number>]s<slot>f<function>

For example, a ConnectX device with PCI address: 0005:01:00.1 will be named enP5p1s0f1

The advantage of this naming scheme is that a device whose PCI address is 0005:01:00.1 will always get the same interface name, since the name now depends on the device's location in the host.

(Previously, race conditions sometimes caused the same physical device to get a different interface name upon reboot.)
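As a rough illustration of the mapping from PCI address to interface name, the following sketch reproduces the enP5p1s0f1 example above. It is not the udev rule shipped with OFED. The function name is hypothetical, and two details are assumptions: that each field is rendered in decimal, and that the P<domain> part is omitted when the PCI domain is 0.

```python
import re

# PCI address in the usual "domain:bus:slot.function" hexadecimal form.
_PCI_ADDRESS = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def predictable_name(pci_address: str) -> str:
    """Build an en[P<domain>]p<bus>s<slot>f<function> style name."""
    match = _PCI_ADDRESS.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    domain, bus, slot, function = (int(field, 16) for field in match.groups())

    # Assumption: the domain prefix appears only for a non-zero PCI domain.
    domain_part = f"P{domain}" if domain else ""
    return f"en{domain_part}p{bus}s{slot}f{function}"


print(predictable_name("0005:01:00.1"))  # enP5p1s0f1, as in the example above
```

Because the result is computed only from the PCI address, the same device yields the same name on every boot, which is the property the scheme is meant to provide.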

Note: Ethernet interface renaming for ConnectX-4/5/6 is performed only when eswitch is supported.

eswitch is supported on kernels starting from kernel version 4.9; for Linux distro kernels earlier than 4.9.0, eswitch is supported only on RHEL7.x and on XenServer 7.1 CU2.

Deprecated, OvS-DPDK

OvS-DPDK deprecated the command "ovs-appctl dpctl/dump-e2e-stats".

Instead, the command has been integrated into the existing command "ovs-appctl dpctl/offload-stats-show -m" (when e2e-cache is enabled).

OvS-DPDK

OvS-DPDK ct-ct-nat offload is now disabled by default.

A new knob in OvS was introduced: "ovs-vsctl set open_vswitch . other_config:ct-action-on-nat-conns=" (default value is false).

If disabled, ct-ct-nat configurations will not be fully offloaded, improving connection offloading rate for other cases (ct and ct-nat).

If enabled, ct-ct-nat configurations will be fully offloaded, but ct and ct-nat offloads will be created more slowly.

mlnxofedinstall, udev, MLNX_OFED, umad

Before version 5.4, the installation scripts could automatically edit /etc/udev/rules.d/90-ib.rules when the --umad-dev-rw or --umad-dev-na option was used. From version 5.4 onward, those changes are made in /etc/udev/rules.d/91-ib-permissions.rules, which (if it exists) includes only the settings for those command-line options.

© Copyright 2023, NVIDIA. Last updated on Nov 27, 2023.