Bug Fixes in This Version

NVIDIA MLNX_EN Documentation v5.8-3.0.7.0 LTS

Below are the bugs fixed in this version. For a list of fixes in previous versions, see Bug Fixes History.
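Each entry on this page carries a "Fixed in Release" field, so a quick way to check whether an installed release already contains a given fix is to compare release strings. The following is a minimal sketch, not an official tool. It assumes that release strings such as 5.8-3.0.7.0 can be compared component by component as integers and that a numerically later release on the same branch contains all earlier fixes; `parse_release` and `includes_fix` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a release string such as '5.8-3.0.7.0' into a tuple of integers."""
    # Drop any stray whitespace before splitting on '.' and '-'.
    cleaned = re.sub(r"\s+", "", version)
    return tuple(int(part) for part in re.split(r"[.-]", cleaned))


def includes_fix(installed: str, fixed_in: str) -> bool:
    """Assumption: a release contains a fix if it is numerically >= the fixing release."""
    return parse_release(installed) >= parse_release(fixed_in)


print(includes_fix("5.8-3.0.7.0", "5.8-1.0.1.1"))  # True
print(includes_fix("5.7-1.0.2.0", "5.8-1.0.1.1"))  # False
```

The comparison is only a heuristic: a release from a different branch may have a higher number without containing a fix that was delivered on the LTS branch.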

Internal Reference Number

Description

3484175

Description: The driver performed a recovery without checking whether the device was in a teardown or probe flow, causing the kernel to crash.

Keywords: Core, Recovery

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3440491

Description: High storage I/O latency occurred while establishing a large number of rdma_cm connections. The issue was resolved by setting the rdma_cm RoCE static rate to 0.

Keywords: RDMA, Static Rate

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.8-3.0.7.0

3491146

Description: "rdma res show qp" returns an unexpected "Invalid argument" error when there's a large number of QPs.

Keywords: RDMA Tool, QP

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.8-3.0.7.0

3428773

Description: A soft lockup caused a call trace. knem was upgraded to support RHEL 8.7 to avoid this issue.

Keywords: Installation, knem, RHEL 8.7

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3485679

Description: On some systems with multiple ConnectX adapters, the system may hang in the middle of the boot process after the mlx5_core driver is loaded.

Keywords: Installation, Boot, ConnectX Adapters

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-3.0.7.0

3440362

Description: Reading tir-dir or indir-tir from sysfs causes the kernel to crash.

Keywords: Sysfs

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3432282

Description: On some occasions, when the active channel is paired with a remote Non-Uniform Memory Access (NUMA) node, PCI retransmission may occur due to lack of ordering.

Added support for using Relaxed Ordering in VFs directly and in VFs assigned to QEMU. Relaxed Ordering improves performance on certain setups. Until now, it could be used only in PFs.

Keywords: Performance, VF

Discovered in Release: 5.8-2.0.3.0

Fixed in Release: 5.8-3.0.7.0

Internal Reference Number

Description

3344682

Description: If there are multiple encapsulations and not all neighbors are valid, the kernel panics.

Keywords: ASAP2, Kernel Panic

Fixed in Release: 5.8-2.0.3.0

3350185

Description: IRQ naming was incorrect for mlx5 interfaces.

From now on, IRQ names on an inactive channel are indexed from 0 to n-1.

Keywords: Core, IRQ Naming

Fixed in Release: 5.8-2.0.3.0

3333920

Description: Changing the traffic class via sysfs while modifying QPs in parallel causes a deadlock.

Keywords: RDMA, TC, Sysfs, QP

Fixed in Release: 5.8-2.0.3.0

Internal Reference Number

Description

3253500

Description: The redundant freeing of a list item could lead to memory corruption, potentially causing the application to crash or to handle traffic incorrectly.

Keywords: Steering, Memory Corruption, List, Pattern/Argument

Fixed in Release: 5.8-1.1.2.1

3214161

Description: The knem-dkms package explicitly requires GCC to build the knem driver (at install time). Under some circumstances, on Debian systems, the apt install method may result in a system that has only gcc-<version> (e.g., gcc-10) installed.

Keywords: Installation, Debian, GCC

Fixed in Release: 5.8-1.1.2.1

3230613

Description: Installing MLNX_OFED_LINUX on an Ubuntu system with CUDA (version < 11.6) may result in an automatic installation of the ucx-cuda package that will fail with an error message in the log file ucx-cuda.debinstall.log about missing dependencies.

Keywords: Installation, Ubuntu, CUDA

Fixed in Release: 5.8-1.1.2.1

3235521

Description: The host driver probe did not check whether SFs already existed on the device, so the host driver did not recreate those SFs.

Keywords: Core, Scalable Functions

Fixed in Release: 5.8-1.1.2.1

3228357

Description: If there are multiple encapsulations and not all neighbors are valid, the kernel panics.

Keywords: ASAP2, Encapsulation

Discovered in Release: 5.5-1.0.3.2, 5.7-1.0.2.0

Fixed in Release: 5.8-1.1.2.1

3232445

Description: When using BlueField with old kernels, multiple OVS meters do not work.

Keywords: ASAP2, BlueField, Meter, OVS, Offload

Fixed in Release: 5.8-1.1.2.1

Internal Reference Number

Description

3234066

Description: When IPsec full offload is configured, traffic stops and the connection is lost after approximately 30 minutes of sending traffic.

Keywords: Steering, SMFS, Matcher Disconnect

Fixed in Release: 5.8-1.0.1.1

3179535

Description: SMFS tries to merge flow rules that have the same matching criteria (and therefore share the same matcher) into one multi-destination rule.

If merging fails, the matcher is disconnected by mistake.

Keywords: Steering, SMFS, Matcher Disconnect

Fixed in Release: 5.8-1.0.1.1
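The merging idea described in entry 3179535 can be pictured with a toy model. The sketch below is purely illustrative and does not reflect the driver's data structures or code: it assumes a rule is a (match criteria, destination) pair, and `merge_rules`, the match tuples, and the destination names are all hypothetical.

```python
from collections import defaultdict


def merge_rules(rules):
    """Group (match_criteria, destination) pairs into multi-destination rules.

    Rules with identical match criteria end up as one entry whose value
    lists every destination, in the order the rules were given.
    """
    merged = defaultdict(list)
    for match, destination in rules:
        merged[match].append(destination)
    return dict(merged)


# Hypothetical rules: two share the same match criteria, one does not.
rules = [
    (("ipv4", "10.0.0.1"), "vport1"),
    (("ipv4", "10.0.0.1"), "vport2"),
    (("ipv4", "10.0.0.2"), "vport3"),
]
print(merge_rules(rules))
# {('ipv4', '10.0.0.1'): ['vport1', 'vport2'], ('ipv4', '10.0.0.2'): ['vport3']}
```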

3214198

Description: ibv_reg_mr for huge pages was optimized in kernels >= 5.12.

Keywords: RDMA, ibv_reg_mr

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

2984134

Description: Moving to SwitchDev mode while deleting a namespace on Linux 6.0 can sometimes cause a deadlock.

Keywords: RDMA, SwitchDev

Discovered in Release: 5.5-1.0.3.2

Fixed in Release: 5.8-1.0.1.1

3106228

Description: A net device validation issue prevented running IPv6 traffic using an RDMA communication manager between two interfaces on the same host in the same subnet.

Keywords: RDMA, IPv6, Communication Manager

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3151843

Description: The mlx5dv_mkey_check manpage contained an inaccurate description of the signature error handling flow.

Keywords: RDMA, manpage

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3229002

Description: Creating and deleting MRs caused a kernel slab cache leak.

Keywords: RDMA, Cache

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3236217

Description: The rdma res show cm_id command does not list all cm_ids when some of them are in LISTEN state.

Keywords: RDMA, cm_ids

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.8-1.0.1.1

3146128

Description: In older kernel versions, PTP was not supported over VLAN interfaces.

Keywords: NetDev, PTP, VLAN

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

2969772

Description: HW-GRO feature was blocked due to firmware limitations.

Keywords: NetDev, HW-GRO

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3096393

Description: STP packets failed to be transmitted.

Keywords: NetDev, STP

Discovered in Release: 5.5-1.0.3.2

Fixed in Release: 5.8-1.0.1.1

3236984

Description: Using sysfs to read the hash function that distributes traffic between the TIRs (Transport Interface Receive) occasionally caused the server to crash.

Keywords: NetDev, sysfs

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3126000

Description: Upgrading from version 5.6-2 to 5.7 failed.

Keywords: Installation

Discovered in Release: 5.6-2.0.9.0

Fixed in Release: 5.8-1.0.1.1

3230524

Description: Building with KMP enabled fails due to missing packages. OFED packages will now be built with KMP disabled.

Keywords: Installation, KMP

Fixed in Release: 5.8-1.0.1.1

3158725

Description: The install.pl script, used for (re)building kernel modules, used "kernel-source" as the name of the kernel source package on SLES systems.

Keywords: Installation, SLES

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3142212

Description: Starting with firmware version xx.34.0350, a new NVCONFIG parameter has been added to the ARM side only: MANAGEMENT_PF_MODE.

If this configuration is enabled, the user will see a PCI Function (PF) that fails to probe:

[    6.837102] mlx5_core 0000:03:00.2: mlx5_cmd_check:756:(pid 206): ENABLE_HCA(0x104) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6ca1f5)
[    6.864227] mlx5_core 0000:03:00.2: mlx5_peer_pf_init:40:(pid 206): Failed to enable peer PF HCA err(-22)
[    6.883453] mlx5_core 0000:03:00.2: mlx5_load:1129:(pid 206): Failed to init embedded CPU
[    8.261268] mlx5_core 0000:03:00.2: init_one:1365:(pid 206): mlx5_load_one failed with error code -22
[    8.280056] mlx5_core: probe of 0000:03:00.2 failed with error -22

Keywords: Installation

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1
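To check whether a host shows the probe failure quoted in entry 3142212, one option is to scan captured kernel log text for the final line of that sequence. This is a minimal sketch: it assumes the log text has already been collected (for example, from the kernel ring buffer) and that the message format matches the sample in this entry. `find_failed_probes` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re

# Matches lines such as:
#   mlx5_core: probe of 0000:03:00.2 failed with error -22
PROBE_FAILED = re.compile(
    r"mlx5_core: probe of "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"failed with error (?P<err>-?\d+)"
)


def find_failed_probes(log_text):
    """Return (pci_address, error_code) pairs for every failed mlx5_core probe."""
    return [
        (match.group("pci"), int(match.group("err")))
        for match in PROBE_FAILED.finditer(log_text)
    ]


sample = "[    8.280056] mlx5_core: probe of 0000:03:00.2 failed with error -22"
print(find_failed_probes(sample))  # [('0000:03:00.2', -22)]
```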

3174928

Description: Using a 1-CPU system could cause a command flush deadlock.

Keywords: Core

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3228721/3228357

Description: An incorrect termination table was used with the uplink-to-uplink forward rule.

Keywords: ASAP2, eSwitch

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3220120

Description: In old kernels, when a VXLAN tunnel is set up on one OVS bridge and PF is up on another OVS bridge, traffic does not offload as expected.

Keywords: ASAP2, VXLAN

Discovered in Release: 5.4-3.0.3.0

Fixed in Release: 5.8-1.0.1.1

© Copyright 2023, NVIDIA. Last updated on Nov 3, 2023.