NVIDIA MLNX_OFED Documentation Rev 5.4-3.1.0.0
Linux Kernel Upstream Release Notes v5.17

Bug Fixes in This Version

The following table lists the bugs fixed in this MLNX_OFED version. For a list of earlier fixes, see Bug Fixes History.

Internal Reference Number

Description

2736003

Description: Starting with GPU driver version r465, nv_peer_mem ships in the GPU driver package under the name nvidia-peermem. Updating OFED required rebuilding nvidia-peermem; otherwise, it was stubbed out by the kernel.

Keywords: Installation, GPU Driver

Discovered in Release: 5.4-3.0.3.0

Fixed in Release: 5.4-3.1.0.0

2852904

Description: TSO, which was non-functional in 5.4, has been restored in this version.

Keywords: TSO, UDP Tunnels

Fixed in Release: 5.4-3.1.0.0

2792480

Description: Running tcpdump on a bonding standby port resulted in loss of network connectivity.

Keywords: NetDev

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2696789

Description: Redesigned the locking around the peer MR invalidation flow to avoid a potential deadlock. The Peer-direct patch could deadlock due to lock inversion.

Notes:

  • For GPU drivers prior to r470, the user should update nv_peer_mem to the next version (expected to be 1.2).

  • For GPU drivers from the r470 or later branches, which ship with nvidia-peermem, the driver will have an option to update to newer releases that take advantage of the redesigned MLNX_OFED support.

Keywords: lock inversion, nv_peer_mem

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0
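
Lock inversion, as named in the entry above, arises when two code paths take the same pair of locks in opposite orders. The Python sketch below is a generic illustration only, not the driver's code: the lock names and the acquire_in_order helper are hypothetical. It shows one common remedy, a fixed global acquisition order, under which both paths finish even though they name the locks in opposite orders.

```python
import threading
from contextlib import contextmanager

# Hypothetical locks standing in for two locks that both paths need.
mr_lock = threading.Lock()
peer_lock = threading.Lock()

# Fixed global rank: every path acquires lower-ranked locks first.
LOCK_RANK = {id(mr_lock): 0, id(peer_lock): 1}


@contextmanager
def acquire_in_order(*locks):
    """Acquire the given locks by rank and release them in reverse order."""
    ordered = sorted(locks, key=lambda lock: LOCK_RANK[id(lock)])
    for lock in ordered:
        lock.acquire()
    try:
        yield
    finally:
        for lock in reversed(ordered):
            lock.release()


completed = []


def invalidate_path():
    # This path names the locks in one order...
    for _ in range(1000):
        with acquire_in_order(mr_lock, peer_lock):
            pass
    completed.append("invalidate")


def dereg_path():
    # ...and this path names them in the opposite order, yet both
    # paths end up taking mr_lock before peer_lock.
    for _ in range(1000):
        with acquire_in_order(peer_lock, mr_lock):
            pass
    completed.append("dereg")


threads = [threading.Thread(target=invalidate_path),
           threading.Thread(target=dereg_path)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)
```

Had each path acquired the locks directly in the order it names them, the two threads could each hold one lock while waiting for the other, which is the inversion pattern.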

2739689

Description: A race that resulted in a CQE with an error caused the UMR QP to enter an error state. To prevent this, the MR deregistration flow was fixed (e.g., a Peer lkey is now always revoked before it is destroyed).

Keywords: CQE, UMR

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2691656

Description: When using bonding, ibdev2netdev would sometimes match the InfiniBand device to the bonding net device interface, and sometimes to the underlying InfiniBand net device interface.

ibdev2netdev now skips InfiniBand net device bonding interfaces and always matches InfiniBand devices to the underlying InfiniBand net device interfaces.

Keywords: ibdev2netdev, Bonding

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.4-3.0.3.0
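
The skip-bonding behavior described above can be sketched in Python. This is not how ibdev2netdev itself is implemented; the function names and the sysfs_root parameter are hypothetical. It relies only on the fact that a Linux bonding master exposes a bonding directory under its /sys/class/net entry.

```python
import os


def is_bonding_interface(netdev, sysfs_root="/sys"):
    """Return True if netdev is a bonding master.

    A bonding master has a 'bonding' directory under its sysfs node.
    """
    return os.path.isdir(
        os.path.join(sysfs_root, "class", "net", netdev, "bonding")
    )


def pick_underlying_netdevs(candidates, sysfs_root="/sys"):
    """Drop bonding interfaces from the candidates, keeping input order."""
    return [name for name in candidates
            if not is_bonding_interface(name, sysfs_root)]
```

Given candidate names such as bond0 and ib0, where only bond0 has a bonding directory, pick_underlying_netdevs returns just ib0, so the match is always reported against the underlying interface.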

2687643

Description: Fixed the inner IP_ECN match in decap flows to take into account software modification of the match value according to RFC 6040, Section 4.2.

Keywords: decap, ASAP2, ECN, RoCE

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0
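
RFC 6040, Section 4.2 defines how a tunnel egress combines the outer and inner ECN fields on decapsulation, so the inner value after processing can differ from the one that arrived. The Python sketch below models that default decapsulation table for illustration; the function and constant names are hypothetical and are not part of MLNX_OFED.

```python
# ECN codepoints (the two low-order bits of the IP TOS / traffic class).
NOT_ECT = 0b00
ECT1 = 0b01
ECT0 = 0b10
CE = 0b11


def decap_ecn(outer, inner):
    """Return the outgoing ECN codepoint after decapsulation.

    Returns None when the packet should be dropped
    (inner Not-ECT combined with outer CE).
    """
    if inner == NOT_ECT:
        # A Not-ECT inner header stays Not-ECT; CE on the outer
        # header cannot be propagated, so the packet is dropped.
        return None if outer == CE else NOT_ECT
    if outer == CE:
        # Congestion marked on the outer header is propagated.
        return CE
    if outer == ECT1 and inner == ECT0:
        # ECT(1) on the outer header overrides ECT(0) on the inner.
        return ECT1
    # Otherwise the inner codepoint is kept unchanged.
    return inner
```

For example, decap_ecn(CE, ECT0) yields CE, which illustrates why a match on the arriving inner ECN value has to account for the rewritten value.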

2691081

Description: Removed metadata from the mlnx-ofa_kernel rpm package that claimed to Provide an older version of rdma-core. This made sense in older versions, where installing rdma-core had to be avoided, but it no longer does, and it caused problems for some users installing rdma-core-devel through meta-packages.

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2727062

Description: Removed manual build-time file list generation in mlnx-tools, keeping it only for Python-installed files. The build also no longer guesses the Python version in use or the directory to which files are installed.

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2708220

Description: Removed unnecessary build-time editing of uninstall.sh in ofed-scripts, which in rare cases caused the build to fail (when using --add-kernel-support).

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2730547

Description: Some Dell OFED Factory Installation packages were missing dependencies. Removed the rdma-core-devel package from the Dell MLNX_OFED packages, as it was not needed and some of its dependencies were not included.

Keywords: Installation, Dell

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2699662

Description: Fixed the MLNX_OFED build scripts to also build hcoll with CUDA support on RHEL8 x86_64 platforms.

Keywords: Installation, CUDA

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2686877

Description: Changing the MTU took too long. Reduced the number of calls to synchronize_net to a single call for all channels.

Keywords: mtu, synchronize_net

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0
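
The effect of the change above can be modeled with a small Python sketch. It is a toy model, not the driver code: SyncCounter stands in for the kernel's synchronize_net, the channel dictionaries are hypothetical, and the per-channel pattern is an assumption about the earlier behavior.

```python
class SyncCounter:
    """Stand-in for synchronize_net(); it only counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1


def deactivate_per_channel(channels, synchronize_net):
    """Assumed earlier pattern: one barrier after each channel."""
    for ch in channels:
        ch["active"] = False
        synchronize_net()


def deactivate_batched(channels, synchronize_net):
    """Batched pattern: deactivate every channel, then one barrier."""
    for ch in channels:
        ch["active"] = False
    synchronize_net()


def make_channels(count):
    return [{"ix": i, "active": True} for i in range(count)]


per_channel, batched = SyncCounter(), SyncCounter()
deactivate_per_channel(make_channels(8), per_channel)
deactivate_batched(make_channels(8), batched)
print(per_channel.calls, batched.calls)  # prints: 8 1
```

Because each barrier waits for in-flight network processing to finish, issuing it once rather than once per channel removes a wait that would otherwise grow with the channel count.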

2748328

Description: Upgrading a kmp package produced a conflict and required the user to choose whether to replace the package. The fix avoids conflicts arising from the /usr/lib/rpm/kernel-module-subpackage script, which was changed in the builder. Building the packages with kmp enabled on the other image will cause the issue to reproduce.

Keywords: Upgrade, kmp Package

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2707023

Description: On Ubuntu and Debian systems, when openvswitch-switch was installed (e.g., using --ovs-dpdk or --with-openvswitch), the installer missed its run-time dependency on libpcap0.8.

Keywords: Installation, Ubuntu, Debian

Discovered in Release: 5.4-1.0.3.0

Fixed in Release: 5.4-3.0.3.0

2563366

Description: The installer failed if the full path to the directory containing it included a space or any similar white-space character.

Keywords: Installation, White Space

Discovered in Release: 5.3-1.0.0.1

Fixed in Release: 5.4-3.0.3.0
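
On releases that predate this fix, a pre-flight check of the installer directory could catch the condition described above. The following Python sketch is only an illustration; the path_has_whitespace helper is hypothetical and is not part of the installer.

```python
import os


def path_has_whitespace(path):
    """Return True if the absolute form of path contains any white space."""
    absolute = os.path.abspath(path)
    return any(ch.isspace() for ch in absolute)
```

A directory such as "/opt/my dir/MLNX_OFED" is flagged, while "/opt/MLNX_OFED" is not.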

© Copyright 2023, NVIDIA. Last updated on Oct 23, 2023.