NVIDIA MLNX_EN Documentation v5.8-4.1.5.0 LTS

Bug Fixes in This Version

Below are the bugs fixed in this version. For a list of fixes in previous versions, see Bug Fixes History.

Internal Reference Number

Description

3677957

Description: Fixed an issue that triggered a race condition when SR-IOV was enabled during a PF bring-up.

Keywords: SR-IOV, PF, Race

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3487533

Description: Fixed an issue that could cause a kernel crash when some reconfiguration items were executed in parallel with a reboot.

Keywords: IRQ map, kernel shutdown

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3499136

Description: Addressed a problem where the sysfs phy counters displayed outdated information.

Keywords: sysfs, PHY

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3485679

Description: Resolved a problem where the system boot process would hang when more than two Network Interface Cards were installed.

Keywords: Installation, Boot, ConnectX Adapters

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3521991

Description: Fixed an issue in SLES 15 SP4 where the openibd service failed to start automatically after system boot.

Keywords: openibd service

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3584069

Description: Fixed an issue where congestion control parameters/counters were not exposed when the device was in switchdev mode.

Keywords: Congestion Control

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3590514

Description: Fixed an issue that could cause a deadlock if the VF group metering sysfs directory was removed while the metering files were being read.

Keywords: VF group metering sysfs directory

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3569695

Description: Updated the NFSoRDMA driver to support Kernel v5.14.0-350.

Keywords: NFSoRDMA driver

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3625072

Description: The srp_daemon service is now disabled by default.

To enable it, run the "systemctl enable srp_daemon" command after the MLNX_OFED installation process completes.

Keywords: SRP

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0
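Because srp_daemon is now disabled by default, a provisioning script may want to enable it only on hosts that need SRP. The helper below is a minimal Python sketch, not a tool shipped with MLNX_OFED: it wraps the "systemctl enable srp_daemon" command from this note and first queries "systemctl is-enabled" so the enable step is skipped when the unit is already enabled. The function name ensure_srp_daemon_enabled and the injectable run parameter are hypothetical; the script must run as root after the driver package is installed.

```python
import subprocess
from typing import Callable, Sequence

# A command runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit code."""
    return subprocess.run(
        list(argv), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_srp_daemon_enabled(run: Runner = _run) -> bool:
    """Enable the srp_daemon unit if it is not already enabled (hypothetical helper).

    Returns True if 'systemctl enable srp_daemon' was invoked, False if the
    unit was already enabled. Raises RuntimeError if enabling fails.
    """
    # 'systemctl is-enabled' exits with 0 when the unit is enabled
    # (see systemctl(1) for the full list of zero-exit states).
    if run(["systemctl", "is-enabled", "srp_daemon"]) == 0:
        return False
    if run(["systemctl", "enable", "srp_daemon"]) != 0:
        raise RuntimeError("systemctl enable srp_daemon failed")
    return True
```

The run parameter exists only so the logic can be dry-run with a fake runner; in normal use, call ensure_srp_daemon_enabled() with no arguments.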

3705160

Description: Updated the rule actions STE apply flow to check if the rule domain is different from the ASO CT action domain when applying the ASO CT action.

Keywords: Software Steering

Discovered in Release: 5.8-3.0.7.0

Fixed in Release: 5.8-4.1.5.0

3484175

Description: The driver conducted a recovery without considering whether the device is in teardown or probe flow, causing the kernel to crash.

Keywords: Core, Recovery

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3440491

Description: High storage IO latency occurred while establishing a large number of rdma_cm connections. The issue was resolved by setting the rdma_cm RoCE static rate to 0.

Keywords: RDMA, Static Rate

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.8-3.0.7.0

3491146

Description: "rdma res show qp" returns an unexpected "Invalid argument" error when there is a large number of QPs.

Keywords: RDMA Tool, QP

Discovered in Release: 5.4-3.1.0.0

Fixed in Release: 5.8-3.0.7.0

3428773

Description: A soft lockup caused a call trace. To avoid this issue, knem was upgraded to support RHEL 8.7.

Keywords: Installation, knem, RHEL 8.7

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3485679

Description: In some systems with multiple ConnectX adapters, the system may hang in the middle of the boot process after the mlx5_core driver is loaded.

Keywords: Installation, Boot, ConnectX Adapters

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-3.0.7.0

3440362

Description: Reading tir-dir or indir-tir from sysfs causes the kernel to crash.

Keywords: Sysfs

Discovered in Release: 5.8-1.0.1.1

Fixed in Release: 5.8-3.0.7.0

3432282

Description: On some occasions, when the active channel is paired with a remote Non-Uniform Memory Access (NUMA) node, PCI retransmission may occur due to lack of ordering.

Added support for using Relaxed Ordering in VFs directly and in VFs assigned to QEMU. Relaxed Ordering improves performance on certain setups. Until now, it could be used only in PFs.

Keywords: Performance, VF

Discovered in Release: 5.8-2.0.3.0

Fixed in Release: 5.8-3.0.7.0

Internal Reference Number

Description

3344682

Description: If there are multiple encapsulations and not all neighbors are valid, the kernel will go into panic mode.

Keywords: ASAP2, Kernel Panic

Fixed in Release: 5.8-2.0.3.0

3350185

Description: IRQ naming was incorrect for mlx5 interfaces.

From now on, the IRQ naming on an inactive channel will be indexed from 0-(n-1).

Keywords: Core, IRQ Naming

Fixed in Release: 5.8-2.0.3.0

3333920

Description: Changing the traffic class via sysfs while modifying QPs in parallel causes a deadlock.

Keywords: RDMA, TC, Sysfs, QP

Fixed in Release: 5.8-2.0.3.0

Internal Reference Number

Description

3253500

Description: The redundant freeing of a list item could lead to memory corruption, potentially causing the application to crash or incorrect traffic handling.

Keywords: Steering, Memory Corruption, List, Pattern/Argument

Fixed in Release: 5.8-1.1.2.1

3214161

Description: The knem-dkms package explicitly requires GCC to build the knem driver (at install time). Under some circumstances, on Debian systems, the apt install method may result in a system that has only gcc-<version> (e.g., gcc-10) installed.

Keywords: Installation, Debian, GCC

Fixed in Release: 5.8-1.1.2.1

3230613

Description: Installing MLNX_OFED_LINUX on an Ubuntu system with CUDA (version < 11.6) may result in an automatic installation of the ucx-cuda package that will fail with an error message in the log file ucx-cuda.debinstall.log about missing dependencies.

Keywords: Installation, Ubuntu, CUDA

Fixed in Release: 5.8-1.1.2.1

3235521

Description: The host driver probe did not check whether SFs already existed in the device, so the host driver did not recreate those SFs.

Keywords: Core, Scalable Functions

Fixed in Release: 5.8-1.1.2.1

3228357

Description: If there are multiple encapsulations and not all neighbors are valid, the kernel will go into panic mode.

Keywords: ASAP2, Encapsulation

Discovered in Release: 5.5-1.0.3.2, 5.7-1.0.2.0

Fixed in Release: 5.8-1.1.2.1

3232445

Description: When using BlueField with old kernels, multiple OVS meters do not work.

Keywords: ASAP2, BlueField, Meter, OVS, Offload

Fixed in Release: 5.8-1.1.2.1

Internal Reference Number

Description

3234066

Description: When configuring IPsec full offload, after sending traffic for approximately 30 minutes, the traffic stops at some point and the connection gets lost.

Keywords: Steering, SMFS, Matcher Disconnect

Fixed in Release: 5.8-1.0.1.1

3179535

Description: SMFS will try to merge flow rules with the same matching criteria (as they share the same matcher) into one multi-destination rule.

If merging fails, the matcher is disconnected by mistake.

Keywords: Steering, SMFS, Matcher Disconnect

Fixed in Release: 5.8-1.0.1.1

3214198

Description: ibv_reg_mr for huge pages was optimized in kernel 5.12 and later.

Keywords: RDMA, ibv_reg_mr

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

2984134

Description: Moving to SwitchDev mode while deleting a namespace on Linux 6.0 can sometimes cause a deadlock.

Keywords: RDMA, SwitchDev

Discovered in Release: 5.5-1.0.3.2

Fixed in Release: 5.8-1.0.1.1

3106228

Description: A net device validation issue prevented running IPv6 traffic using the RDMA communication manager between two interfaces on the same host in the same subnet.

Keywords: RDMA, IPv6, Communication Manager

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3151843

Description: The mlx5dv_mkey_check manpage contained an inaccurate description of the signature error handling flow.

Keywords: RDMA, manpage

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3229002

Description: Creating and deleting MRs caused a kernel slab cache leak.

Keywords: RDMA, Cache

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3236217

Description: The rdma res show cm_id command does not list all cm_ids when some of them are in LISTEN state.

Keywords: RDMA, cm_ids

Discovered in Release: 5.0-1.0.0.0

Fixed in Release: 5.8-1.0.1.1

3146128

Description: In older kernel versions, PTP was not supported over VLAN interfaces.

Keywords: NetDev, PTP, VLAN

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

2969772

Description: HW-GRO feature was blocked due to firmware limitations.

Keywords: NetDev, HW-GRO

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3096393

Description: STP packets failed to be transmitted.

Keywords: NetDev, STP

Discovered in Release: 5.5-1.0.3.2

Fixed in Release: 5.8-1.0.1.1

3236984

Description: When using sysfs to read the hash function used to distribute the traffic between the TIRs (Transport Interface Receive), on occasion, the server crashed.

Keywords: NetDev, sysfs

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3126000

Description: Upgrading from version 5.6-2 to 5.7 failed.

Keywords: Installation

Discovered in Release: 5.6-2.0.9.0

Fixed in Release: 5.8-1.0.1.1

3230524

Description: Building with KMP enabled fails due to missing packages. OFED packages will now be built with KMP disabled.

Keywords: Installation, KMP

Fixed in Release: 5.8-1.0.1.1

3158725

Description: The script install.pl, used for (re)building kernel modules, used the name "kernel-source" as the package of the kernel-source on SLES systems.

Keywords: Installation, SLES

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3142212

Description: Starting from firmware version xx.34.0350, a new NVCONFIG parameter, MANAGEMENT_PF_MODE, has been added on the ARM side only.

If this configuration is enabled, the user will see a PCI Physical Function (PF) that failed to probe:


[    6.837102] mlx5_core 0000:03:00.2: mlx5_cmd_check:756:(pid 206): ENABLE_HCA(0x104) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6ca1f5)
[    6.864227] mlx5_core 0000:03:00.2: mlx5_peer_pf_init:40:(pid 206): Failed to enable peer PF HCA err(-22)
[    6.883453] mlx5_core 0000:03:00.2: mlx5_load:1129:(pid 206): Failed to init embedded CPU
[    8.261268] mlx5_core 0000:03:00.2: init_one:1365:(pid 206): mlx5_load_one failed with error code -22
[    8.280056] mlx5_core: probe of 0000:03:00.2 failed with error -22

Keywords: Installation

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1
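To check whether a host shows the probe failure described in this item, one option is to scan the kernel log for the final "probe of ... failed" line. The following is a minimal Python sketch, assuming the log format matches the excerpt in this item ("mlx5_core: probe of <PCI address> failed with error <code>"); other kernel versions may word the message differently, and the function name failed_mlx5_probes is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   mlx5_core: probe of 0000:03:00.2 failed with error -22
# The format is taken from the log excerpt in this item; it is an assumption
# that other kernels print the same wording.
_PROBE_FAIL = re.compile(
    r"mlx5_core: probe of "
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
    r" failed with error (?P<err>-?\d+)"
)


def failed_mlx5_probes(log_text: str) -> List[Tuple[str, int]]:
    """Return (PCI address, error code) for every failed mlx5_core probe."""
    return [
        (m.group("bdf"), int(m.group("err")))
        for m in _PROBE_FAIL.finditer(log_text)
    ]
```

The log text can come from the output of dmesg (which may require root on some distributions). An error code of -22 corresponds to -EINVAL on Linux.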

3174928

Description: Using a 1-CPU system may cause a command flush deadlock.

Keywords: Core

Discovered in Release: 5.6-1.0.3.3

Fixed in Release: 5.8-1.0.1.1

3228721/3228357

Description: An incorrect termination table was used with the uplink-to-uplink forward rule.

Keywords: ASAP2, eSwitch

Discovered in Release: 5.7-1.0.2.0

Fixed in Release: 5.8-1.0.1.1

3220120

Description: In old kernels, when a VXLAN tunnel is set up on one OVS bridge and PF is up on another OVS bridge, traffic does not offload as expected.

Keywords: ASAP2, VXLAN

Discovered in Release: 5.4-3.0.3.0

Fixed in Release: 5.8-1.0.1.1

© Copyright 2023, NVIDIA. Last updated on Dec 28, 2023.