NVIDIA MLNX_OFED Documentation Rev 5.4-3.7.7.0 LTS

Bug Fixes in This Version

The following table provides a list of bugs fixed in this version. For a list of old fixes, please see Bug Fixes History.

Internal Reference Number

Description

3657522

Description: Fixed installation failures that occurred when using Ubuntu v20.04 and Ubuntu v22.04 updated kernels.

Keywords: installation: Ubuntu

Discovered in Release: 5.4-3.7.5.0

Fixed in Release: 5.4-3.7.7.0

3359558

Description: On a rare occasion, when posting a NOP WQE to ICOSQ to trigger a hardware interrupt and start NAPI, a race condition is possible if NAPI is triggered by something else (e.g., TX) with bad timing.

Keywords: NOP, ICOSQ

Discovered in Release: 5.4-3.4.0.0

Fixed in Release: 5.4-3.7.5.0

3227951

Description: When offloading traffic in a setup with a VXLAN tunnel on one OVS bridge and the PF on another OVS bridge in old kernels, encap/decap offload did not work properly.

Keywords: VXLAN, OVS

Discovered in Release: 5.4-3.5.8.0

Fixed in Release: 5.4-3.6.8.1

3105835

Description: In SLES15SP4, openibd service may not run after a reboot, resulting in unloaded drivers.

Keywords: Installation, SLES15SP4

Discovered in Release: 5.4-3.4.0.0

Fixed in Release: 5.4-3.6.8.1

3157841

Description: Trying to offload more than 8K connections with Firmware Steering caused a call trace.

Keywords: Connection Tracking, Call Trace

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.4-3.5.8.0

3126857

Description: In some cases VF metering configuration failure caused a deadlock.

Keywords: VF Metering

Fixed in Release: 5.4-3.5.8.0

3051981

Description: Fixed deallocation of non-contiguous memory with a non-matching deallocation API.

Keywords: NetDev, Memory Allocation

Fixed in Release: 5.4-3.4.0.0

3069993

Description: Using hairpin tunnel traffic caused incorrect TC rules to be created.

Example:

tunnel(tun_id=0x65,src=10.10.11.3,dst=10.10.11.2,ttl=0/0,tp_dst=4789,flags(+key)),…,in_port(vxlan_sys_4789),…, actions:set(tunnel(tun_id=0x66,src=10.10.12.2,dst=10.10.12.3,tp_dst=4789,flags(key))),vxlan_sys_4789

Keywords: ASAP2, Hairpin, OVS, SwitchDev

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.4.0.0

3070027

Description: Traffic failed to pass when OVS bridge is configured with bond interface and IP is configured over the OVS internal (bridge) port.

Keywords: ASAP2, Bond, VF LAG, OVS, Internal Port

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.4.0.0

3070039

Description: In some cases, the firmware tracer did not work with NEO-Host.

Keywords: NEO-Host, Firmware Tracer

Fixed in Release: 5.4-3.4.0.0

3084468

Description: When there is a loaded 'non-mellanox' auxiliary device on the auxiliary bus, OFED driver load may fail and cause kernel panic.

To prevent this, the driver now verifies that the auxiliary devices it finds are not from other vendors before using them.

Keywords: Auxiliary Device

Fixed in Release: 5.4-3.4.0.0

3065052

Description: On rare occasions, the application did not use any raw WQE feature and unexpectedly got wc opcode IBV_WC_DRIVER2.

Keywords: RDMA, Raw WQE, IB_WC_DRIVER2

Fixed in Release: 5.4-3.4.0.0

3071120

Description: A locking issue in steering rules deletion, at times, could cause a deadlock while inserting or deleting new rules.

Keywords: RDMA, Deadlock, Steering

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.4.0.0

3065036

Description: In the rdma-core library, the CMA device was retrieved in the wrong way when libnl is not used.

Keywords: RDMA, libnl

Discovered in Release: 5.2-1.0.4.0

Fixed in Release: 5.4-3.4.0.0

2973603

Description: Gratuitous ARP during rdma_connect was not handled properly.

Keywords: RDMA, Gratuitous ARP

Fixed in Release: 5.4-3.4.0.0

2971704

Description: When using NFS over RDMA, rpcrdma.ko created some entry files under the /proc folder (e.g., “-rw-r--r-- 1 root root 0 . . .”).

Keywords: NFS over RDMA

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.4-3.4.0.0

3058627

Description: On some occasions, a locking issue in steering rules deletion could cause a deadlock while inserting or deleting new rules.

Keywords: RDMA, Steering Rules

Discovered in Release: 5.1-0.6.6.0

Fixed in Release: 5.4-3.4.0.0

3071110

Description: Leaving a multicast group (rdma_leave_multicast) used the wrong address and left the interface in the multicast group.

Keywords: RDMA, Multicast

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.4-3.4.0.0

3071097

Description: When the bond device was configured as active-backup, steering behaved differently when it was software-only (kernel) than when rules were also offloaded to the hardware.

Keywords: RDMA, SwitchDev

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.4-3.4.0.0

2736003

Description: Starting from GPU Driver version r465, nv_peer_mem was shipping in the GPU driver package under the name nvidia-peermem. Updating OFED required nvidia-peermem rebuild, otherwise it was stubbed out by the kernel.

Keywords: Installation, GPU Driver

Discovered in Release: 5.4-3.0.3.0

Fixed in Release: 5.4-3.1.0.0

2852904

Description: In version 5.4, there was some offload breakage when using OVS.

Keywords: TSO, UDP Tunnels

Fixed in Release: 5.4-3.1.0.0

2792480

Description: Running tcpdump on a bonding standby port resulted in the loss of the network.

Keywords: NetDev

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2696789

Description: Redesigned the locks around the peer MR invalidation flow to avoid a potential deadlock, because the Peer-direct patch could cause a deadlock due to lock inversion.

Notes:

  • For GPU drivers prior to r470, the user should update nv_peer_mem to the next version, probably 1.2.

  • For GPU drivers from r470 or later branches shipped with nvidia-peermem, the driver will have an option to update to newer releases which take advantage of the redesigned MLNX_OFED support.

Keywords: lock inversion, nv_peer_mem

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2739689

Description: A race that resulted in a CQE with an error caused errors in the UMR QP. To prevent the UMR QP from entering an error state, the MR deregistration flow was fixed (e.g., a Peer lkey is always revoked before it is destroyed).

Keywords: CQE, UMR

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2691656

Description: When using bonding, ibdev2netdev would sometimes match the InfiniBand device to the net device bonding interface, and sometimes to the underlying InfiniBand net device interface.

ibdev2netdev now skips InfiniBand net device bonding interfaces, and always matches InfiniBand devices to the underlying InfiniBand net device interfaces.

Keywords: ibdev2netdev, Bonding

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.4-3.0.3.0
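
The "skip bonding interfaces" behavior described for ibdev2netdev can be illustrated with a minimal Python sketch. It is not the ibdev2netdev implementation (which is a shell script); it only assumes that the Linux bonding driver exposes a `bonding` directory under `/sys/class/net/<interface>/` for bond masters. The function names `is_bonding_master` and `underlying_netdevs` are hypothetical.

```python
import os


def is_bonding_master(sysfs_net, ifname):
    """Return True if ifname looks like a bonding master under sysfs_net.

    Assumption: bond masters expose a 'bonding' directory in sysfs.
    """
    return os.path.isdir(os.path.join(sysfs_net, ifname, "bonding"))


def underlying_netdevs(sysfs_net="/sys/class/net"):
    """List net devices under sysfs_net, skipping bonding masters.

    Plain files such as 'bonding_masters' are ignored because only
    directory entries (including symlinks to directories) are devices.
    """
    return sorted(
        name
        for name in os.listdir(sysfs_net)
        if os.path.isdir(os.path.join(sysfs_net, name))
        and not is_bonding_master(sysfs_net, name)
    )
```

Calling `underlying_netdevs()` on a host with a bond over two InfiniBand ports would, under the assumption above, return the two port interfaces and omit the bond itself.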

2687643

Description: Fixed Decap flows inner IP_ECN match to take into account software modification of the match value according to RFC 6040, Section 4.2.

Keywords: decap, ASAP2, ECN, RoCE

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0
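
To show why the inner ECN value seen after decapsulation can differ from the value carried in the inner header on the wire, the following Python sketch encodes the default tunnel egress behavior as we read it from RFC 6040, Section 4.2. It is an illustration only, not the driver's or OVS's code; the function name `decap_ecn` is hypothetical.

```python
# ECN codepoints (two-bit field in the IP header).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def decap_ecn(inner, outer):
    """Return the ECN codepoint of the forwarded packet after decapsulation.

    Returns None when the packet should be dropped (outer CE arriving
    with an inner header that is not ECN-capable).
    """
    if outer == CE:
        # Congestion marking on the outer header propagates inward,
        # unless the inner transport is not ECN-capable.
        return None if inner == NOT_ECT else CE
    if outer == ECT1 and inner == ECT0:
        # ECT(1) on the outer header overrides an inner ECT(0).
        return ECT1
    # In all other combinations the inner codepoint is kept.
    return inner
```

For example, `decap_ecn(ECT0, CE)` yields `CE`, so a rule that matches on the original inner ECT(0) value would no longer match the packet after software decapsulation.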

2691081

Description: Removed metadata from the rpm package mlnx-ofa_kernel where it claimed to Provide an older version of rdma-core. This made sense in older versions, where installing rdma-core had to be avoided, but it no longer makes sense and caused problems for some users installing rdma-core-devel through meta-packages.

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2727062

Description: Removed manual build-time file list generation in mlnx-tools, keeping it only for python-installed files. The build also no longer guesses the version of python in use or the directory to which files are installed.

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2708220

Description: Removed useless build-time editing of uninstall.sh in ofed-scripts that caused the build to fail (in the case of --add-kernel-support) in some rare cases.

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2730547

Description: Some Dell OFED Factory Installation packages were missing dependencies. Removed the package rdma-core-devel from the Dell MLNX_OFED packages as it was not needed and some of its dependencies are not included.

Keywords: Installation, Dell

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2699662

Description: Fixed the MLNX_OFED build scripts to also build hcoll with CUDA support on RHEL8 x86_64 platforms.

Keywords: Installation, CUDA

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2686877

Description: Changing the MTU took too long. Reduced the number of calls to synchronize_net to one for all channels.

Keywords: mtu, synchronize_net

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2748328

Description: When trying to upgrade a kmp package, a conflict occurred and the user had to choose whether to replace the package. The fix avoids conflicts from the /usr/lib/rpm/kernel-module-subpackage script, which was changed in the builder. Building the packages with kmp enabled on the other image will cause the issue to reproduce.

Keywords: Upgrade, kmp Package

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2707023

Description: On Ubuntu and Debian systems, when installing openvswitch-switch (e.g., using --ovs-dpdk or --with-openvswitch), the installer missed a run-time dependency on libpcap0.8.

Keywords: Installation, Ubuntu, Debian

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2563366

Description: The full path to the directory that contains the installer must not contain a space or any similar white-space character, otherwise the installer will fail.

Keywords: Installation, White Space

Discovered in Release: 5.3-1.0.0.1

Fixed in Release: 5.4-3.0.3.0
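
On releases that still have the white-space limitation, the installer directory can be checked before running the installer. The following Python sketch is one hedged way to do so; the function name `path_has_whitespace` and the example paths are hypothetical and are not part of MLNX_OFED.

```python
import os


def path_has_whitespace(path):
    """Return True if the absolute form of path contains any white-space.

    Covers spaces, tabs, newlines, and other characters for which
    str.isspace() is true.
    """
    full = os.path.abspath(path)
    return any(ch.isspace() for ch in full)
```

For instance, `path_has_whitespace("/tmp/MLNX OFED/installer")` is true, which indicates a directory that the affected installer versions would reject.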

© Copyright 2023, NVIDIA. Last updated on Dec 18, 2023.