NVIDIA MLNX_OFED Documentation v24.10-3.2.5.0 LTS

Known Issues

The following is a list of general limitations and known issues of the current version of the release.

Internal Ref. Number

Issue

4087492

Description: When using hardware LAG, the hardware's hash may differ from the kernel's software bond hash. This discrepancy can cause packets from the same stream to be sent out through different ports.

Keywords: LAG

Workaround: N/A

Discovered in Release: 24.07-0.6.1.0

3990230

Description: DOCA-HOST includes drivers for both the Ethernet and InfiniBand protocols. Installing the kernel part of DOCA-HOST (as with MLNX_OFED before it) replaces any InfiniBand drivers previously installed on your host server, such as ib_core and its dependencies, with the DOCA-HOST drivers.

Keywords: DOCA-HOST, kernel, inbox drivers

Workaround: To keep the non-DOCA drivers, use the relevant Inbox drivers: https://docs.nvidia.com/networking/software/adapter-software/index.html#linux-inbox-drivers-upstream-releases

Discovered in Release: 24.07-0.6.1.0

4040187

Description: An mlx5_core probe error may occur on RHEL 9.4 after unloading the mlx5_core module.

Keywords: mlx5_core

Workaround: Upgrade to kernel 5.14.0-427.20.1.el9_4 or newer to resolve the issue.

Discovered in Release: 24.07-0.6.1.0
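As a rough illustration of the workaround for issue 4040187, the following sketch compares the running kernel release against the minimum release named above. It assumes GNU `sort -V` is available; version sort only approximates RPM version ordering, and the helper name `version_ge` is hypothetical.

```shell
#!/bin/sh
# Minimum kernel release named in the workaround for issue 4040187.
MIN_KERNEL="5.14.0-427.20.1.el9_4"

# version_ge A B: succeed if release string A sorts at or after B.
# Uses GNU sort's version sort, which approximates (but is not) the
# RPM version comparison algorithm.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_ge "$(uname -r)" "$MIN_KERNEL"; then
    echo "Running kernel is at or above $MIN_KERNEL"
else
    echo "Running kernel is older than $MIN_KERNEL; consider upgrading"
fi
```

The check is only meaningful on RHEL 9.4 systems, since release strings from other distributions follow different schemes.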

4022803

Description: The fwctl subsystem is supported on default kernels only. Using "add-kernel-support" will not build and install fwctl.

Keywords: fwctl, kernel

Workaround: N/A

Discovered in Release: 24.07-0.6.1.0

4001184

Description: When using QEMU versions older than 8.2, interrupts may be lost, resulting in firmware command timeouts and undefined behavior in the driver.

Keywords: QEMU

Workaround: To avoid this issue, make sure:

Discovered in Release: 24.07-0.6.1.0

3856101

Description: In Debian 12, using dhcpcd instead of dhclient to configure the network interface (using NetworkManager) will result in an incorrect network interface configuration.

Keywords: dhcpcd, dhclient, Debian 12, NetworkManager

Workaround: Use dhclient to configure the network interface.

Discovered in Release: 24.04-0.6.6.0

3964215

Description: Driver might try to access privileged registers resulting in an error with syndrome.

Keywords: Unbind and bind the function or restart the driver.

Workaround: N/A

Discovered in Release: 24.04-0.6.6.0

3640907

Description: When using a kernel version lower than v5.5, application termination on PCIe Gen5 servers could lead to kernel problems, such as IOMMU call traces, because of a lack of support in the AMD IOMMU kernel component.

Keywords: PCIe Gen5, IOMMU, Call Trace

Workaround: To resolve the issue either:

  • Add kernel parameter cmdline "iommu=pt"

or

Discovered in Release: 24.04-0.6.6.0
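For the first option in the workaround for issue 3640907, a minimal sketch of how one might verify that the running kernel was booted with "iommu=pt". The helper name `has_kernel_param` is hypothetical; how the parameter is added persistently depends on the distribution's boot loader tooling and is not shown.

```shell
#!/bin/sh
# has_kernel_param PARAM [FILE]: succeed if PARAM appears as a whole
# word in the kernel command line (FILE defaults to /proc/cmdline).
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

if has_kernel_param "iommu=pt"; then
    echo "iommu=pt is set on the running kernel"
else
    echo "iommu=pt is not set on the running kernel"
fi
```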

3004304

Description: Setting the NVMe num_p2p_queues module parameter to a value greater than 0 may cause a harmless warning, "irq #XXX: nobody cared", followed by a Call Trace.

Keywords: NVMe, Call Trace, num_p2p_queues

Workaround: N/A

Discovered in Release: 24.01-0.3.3.1

3735400

Description: The NVMF connect command does not work on IB setups when AR (Adaptive Routing) is enabled, since PI (the Protection Information used by NVMF) and AR are not supported simultaneously.

Keywords: NVMF connect, PI, Adaptive Routing

Workaround: Disable AR in OpenSM or, alternatively, disable PI in nvme_rdma using the new module parameter.

Discovered in Release: 24.01-0.3.3.1

3774149

Description: In some cases, there could be a race condition between RDMA_WRITE and shared memory write, leading to the MPI receiving invalid data with large messages or collective operations between ranks on the same node.

Keywords: Race condition, RDMA_WRITE, shared memory write

Workaround: Set UCX_RNDV_SCHEME=get_zcopy to force using RDMA_READ protocol.

Discovered in Release: 24.01-0.3.3.1
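The workaround for issue 3774149 amounts to setting one environment variable before launching the job. The sketch below shows this in a shell session; the `mpirun -x` forwarding line assumes Open MPI, and `./my_mpi_app` is a hypothetical application name.

```shell
#!/bin/sh
# Force UCX to use the RDMA_READ-based rendezvous protocol, as the
# workaround for issue 3774149 suggests.
export UCX_RNDV_SCHEME=get_zcopy

# With Open MPI, the variable can be forwarded to all ranks with -x
# (./my_mpi_app is a hypothetical application):
#   mpirun -x UCX_RNDV_SCHEME -np 2 ./my_mpi_app
echo "UCX_RNDV_SCHEME=$UCX_RNDV_SCHEME"
```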

3565433

Description: An error may occur when creating a DCI due to oversized WQEs. This is caused by a loose enforcement of the allowed max quantity of SGEs.

Keywords: DCI, SGEs

Workaround: N/A

Discovered in Release: 24.01-0.3.3.1

3732632

Description: Geneve offload does not operate together with FLEX_PARSER.

Keywords: Geneve offload, FLEX_PARSER

Workaround: Make sure that the firmware is appropriately configured by verifying that the FLEX_PARSER_PROFILE_ENABLE mlxconfig flag is set to 0.

Discovered in Release: 24.01-0.3.3.1
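To verify the flag mentioned in the workaround for issue 3732632, one possible approach is to filter the output of `mlxconfig query`. The sketch below assumes the output lists each parameter name followed by its value in whitespace-separated columns; the helper name `flex_parser_profile` and the `$DEV` device placeholder are hypothetical.

```shell
#!/bin/sh
# flex_parser_profile: read `mlxconfig ... query` output on stdin and
# print the value column of FLEX_PARSER_PROFILE_ENABLE.
# Assumes "NAME VALUE" whitespace-separated columns in the output.
flex_parser_profile() {
    awk '$1 == "FLEX_PARSER_PROFILE_ENABLE" { print $2; exit }'
}

# Usage on a real system ($DEV is a placeholder for the device):
#   mlxconfig -d "$DEV" query | flex_parser_profile
# If the printed value is not 0, it can be changed with:
#   mlxconfig -d "$DEV" set FLEX_PARSER_PROFILE_ENABLE=0
# A firmware reset or reboot is needed before the new value takes effect.
```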

3644590

Description: When working in switchdev mode, the number of XFRM IN rules that can be added is limited to 2047.

Keywords: switchdev mode, XFRM IN rules

Workaround: N/A

Discovered in Release: 24.01-0.3.3.1

3563584

Description: In case of a steering loop, the packet would loop indefinitely, causing a device hang.

Keywords: Steering loop

Workaround: Enable firmware infinite loop protection.

Discovered in Release: 24.01-0.3.3.1

3678715

Description: Restarting the drivers using the openibd service while the nvme_rdma module is loaded may fail. This behavior is intentional, as unloading nvme_rdma during the driver restart can lead to connectivity issues in other applications within the setup.

Keywords: openibd service, nvme_rdma module

Workaround: Manually unload the nvme_rdma module before performing the driver restart. This can be achieved using the modprobe -r nvme_rdma command.

Discovered in Release: 23.10-1.1.9.0
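A minimal sketch of the pre-restart check implied by the workaround for issue 3678715: it looks for nvme_rdma in the list of loaded modules before a driver restart is attempted. The helper name `module_loaded` is hypothetical, and the restart command in the comment assumes the openibd init script is installed at its usual location.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed as a loaded
# module (FILE defaults to /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nvme_rdma; then
    echo "nvme_rdma is loaded; unload it first with: modprobe -r nvme_rdma"
else
    echo "nvme_rdma is not loaded; the driver restart can proceed"
fi

# The restart itself (requires root) would then be, for example:
#   /etc/init.d/openibd restart
```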

3676223

Description: When using kernel version 4.12 or above, it is advised to run the following command to avoid VF probing:

echo 0 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_drivers_autoprobe

Keywords: VF probing

Workaround: N/A

Discovered in Release: 23.10-1.1.9.0
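The command in issue 3676223 uses a fixed PCI address. The sketch below wraps it in a small function that takes the address as an argument; the function name `disable_vf_autoprobe` and the optional sysfs root argument (used only to make the function testable) are hypothetical.

```shell
#!/bin/sh
# disable_vf_autoprobe BDF [SYSFS_ROOT]: write 0 to the PF's
# sriov_drivers_autoprobe attribute so that newly created VFs are not
# automatically bound to a driver. SYSFS_ROOT defaults to /sys.
disable_vf_autoprobe() (
    f="${2:-/sys}/bus/pci/devices/$1/sriov_drivers_autoprobe"
    [ -w "$f" ] || { echo "not writable: $f" >&2; exit 1; }
    echo 0 > "$f"
)

# Usage on a real system (requires root; the address is an example):
#   disable_vf_autoprobe 0000:08:00.0
```

The attribute would normally be written before VFs are created on that PF.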

3682658

Description: When using an RDMA-CM user application with the AF_IB parameter, the kernel uses the first byte of the private data to set the CMA version. In this scenario, any user data written to this byte will be overwritten.

Keywords: RDMA-CM user application, AF_IB, private data

Workaround: Do not use AF_IB for the application's private data.

Discovered in Release: 23.10-0.5.5.0

3640082

Description: A potential null pointer dereference might occur due to a missing update in the PCI subsystem code when creating the maximum number of VFs.

All kernel versions lacking the following fix are impacted:

"PCI: Avoid enabling PCI atomics on VFs."

Keywords: Maximal VF number

Workaround: N/A

Discovered in Release: 23.10-0.5.5.0

3653417

Description: When offloading IPsec policy rules while in legacy mode, there are two options:

  1. Software steering - The software stack will handle the task, and no device offload will take place.

  2. Changing the steering mode to firmware steering will return unsupported.

Keywords: IPsec, legacy mode

Workaround: Perform a devlink reload after changing the steering mode.

Discovered in Release: 23.10-0.5.5.0

3612274

Description: Currently, only one of IPsec offload or TC offload is allowed on a specific interface. Offloading a TC rule to an interface will fail if an IPsec rule is already offloaded on it, and vice versa.

Keywords: IPsec offload, TC offload

Workaround: N/A

Discovered in Release: 23.10-0.5.5.0

3596126

Description: OVS mirroring of both egress and ingress together with modified TTL is not supported by ConnectX-5 cards, and may cause packet checksum issues and errors in the dmesg output.

Keywords: OVS mirroring, ConnectX-5

Workaround: N/A

Discovered in Release: 23.10-0.5.5.0

3538463

Description: A kernel ABI problem in SLES 15 SP4 may lead to issues during driver start. This impacts kernels from version 5.14.21-150400.24.11.1 up to and including version 5.14.21-150400.24.63.1 (July 2022 to May 2023). For more information, see https://www.suse.com/support/kb/doc/?id=000021137.

Keywords: Kernel ABI, SLES 15 SP4, driver start

Workaround: Upgrade to a kernel version newer than 5.14.21-150400.24.63.1 (May 2023).

Discovered in Release: 23.10-0.5.5.0

3637252

Description: When running on RHEL 7.6 with an excessive RDMA/RoCE workload, kernel warnings may be triggered.

Keywords: RHEL 7.6, RDMA, RoCE

Workaround: N/A

Discovered in Release: 23.10-0.5.5.0

3046655

Description: A package manager upgrade with zypper (on an SLES system) may prompt a question about vendor change from "Mellanox Technologies" to "OpenFabrics".

Keywords: Installation, SLES

Workaround: Either accept the prompted change, or add the /etc/zypp/vendors.d/mlnx_ofed file with the following content:

[main]

vendors = Mellanox,OpenFabrics

Discovered in Release: 23.07-0.5.0.0
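The second option in the workaround for issue 3046655 can be scripted. The following sketch writes the vendor file with the content given above; the function name `write_vendor_file` and its optional path argument (used so the function can be tried without root) are hypothetical.

```shell
#!/bin/sh
# write_vendor_file [PATH]: create the zypper vendor-equivalence file
# described in the workaround for issue 3046655.
# PATH defaults to /etc/zypp/vendors.d/mlnx_ofed (requires root).
write_vendor_file() {
    printf '[main]\nvendors = Mellanox,OpenFabrics\n' \
        > "${1:-/etc/zypp/vendors.d/mlnx_ofed}"
}

# Usage on a real SLES system, as root:
#   write_vendor_file
```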

3392477

Description: The ConnectX-7 firmware embedded in this MLNX_OFED version cannot be burnt using the MLNX_OFED installer script.

Keywords: ConnectX-7, MLNX_OFED installer script

Workaround: Download and install the dedicated firmware from the web: https://network.nvidia.com/support/firmware/connectx7ib/

Discovered in Release: 23.07-0.5.0.0

3532756

Description: The kernel may crash when restarting the driver while IPsec rules are configured.

Keywords: IPsec

Workaround: Flush the IPsec configuration before reloading the driver:

ip xfrm state flush

ip xfrm policy flush

Discovered in Release: 23.07-0.5.0.0

3472979

Description: When a large number of virtual functions are present, the output of the "ip link show" command may be truncated.

Keywords: virtual functions, ip link show

Workaround: N/A

Discovered in Release: 23.07-0.5.0.0

3413938

Description: When using the mlnx-sf script, repeatedly creating and deleting an SF with the same ID number under stress may cause the setup to hang due to a race between the create and delete commands.

Keywords: Hang; mlnx-sf

Workaround: N/A

Discovered in Release: 23.07-0.5.0.0

3461572

Description: Configuring Multiport Eswitch LAG mode can be performed only via devlink from this release onwards. The compat sysfs should not be used to configure mpesw LAG.

Keywords: Multiport Eswitch, compat sysfs, mpesw LAG

Workaround: N/A

Discovered in Release: 23.07-0.5.0.0

3464337

Description: Simultaneously adding or removing TC rules while operating on kernel version 6.3 could potentially result in stability issues.

Keywords: ASAP, rules, TC

Workaround: Make sure the following fix is part of the kernel: https://lore.kernel.org/netdev/20230504181616.2834983-3-vladbu@nvidia.com/T/

Discovered in Release: 23.07-0.5.0.0

3469484

Description: Mirror and connection tracking (CT) offload actions are not supported simultaneously if the kernel version does not support hardware miss to TC actions. Thus, when performing a CT offload test, the actual number of offloaded connections may be lower than expected.

Keywords: ASAP, CT offload

Workaround: The issue is tied to the following offending commit in the kernel tree:

net/sched: act_ct: offload UDP NEW connections

Make sure the kernel tree also includes the fix from https://www.spinics.net/lists/stable-commits/msg303536.html.

Discovered in Release: 23.07-0.5.0.0

3473331

Description: When performing a CT offload test, the actual number of offloaded connections may be lower than expected.

Keywords: ASAP, CT offload

Workaround: The fix is external to the driver. The offending commit is:

net/sched: act_ct: offload UDP NEW connections

Make sure the kernel tree includes the fix from https://www.spinics.net/lists/stable-commits/msg303536.html.

Discovered in Release: 23.07-0.5.0.0

3360710

Description: Configuring PFC in parallel with buffer size and prio2buffer commands may lead to misalignment between firmware and software regarding receive buffer ownership.

Keywords: NetDev, PFC, Buffer Size, prio2buffer

Workaround: First configure PFC on all ports, and only then apply the other required QoS configurations (i.e., buffer_size or prio2buffer).

Discovered in Release: 23.04-0.5.3.3

3413879

Description: OpenSM may not be started automatically if chkconfig was not installed before OpenSM is installed. Note, however, that chkconfig will fail to install if the directory (rather than symbolic link to directory) /etc/init.d already exists (e.g., from a previous installation of MLNX_OFED).

Keywords: Installation, OpenSM, chkconfig

Workaround: Install chkconfig before installing MLNX_OFED. If installing it fails, make sure /etc/init.d does not exist at the time of installing it.

Discovered in Release: 23.04-0.5.3.3

3424596

Description: On SLES 15.4, installing MLNX_OFED using a package repository (with zypper) may trigger an error message about a missing dependency for 'librte_eal.so.20.0()(64bit)'. This is because the inbox package libdpdk-20_0 is being uninstalled, as it is incompatible with the MLNX_OFED rdma-core packages.

Keywords: Installation, SLES 15.4

Workaround: Uninstall the relevant packages: 'zypper uninstall libdpdk-20_0' before installing MLNX_OFED. This will also remove the inbox openvswitch package.

Discovered in Release: 23.04-0.5.3.3

3433416

Description: On systems that were installed with MLNX_OFED 5.9 or older and include a CUDA package (ucx-cuda / hcoll-cuda), an upgrade to MLNX_OFED 23.04 using the package manager ("yum") method will fail. This is because MLNX_OFED up to 5.9 is built with CUDA 11. MLNX_OFED 23.04 is built with CUDA 12 and those CUDA versions are incompatible.

Keywords: Installation, CUDA, yum

Workaround: Remove the CUDA packages included with OFED (ucx-cuda, hcoll-cuda) before upgrading. This allows MLNX_OFED to be upgraded regardless of the CUDA version installed. To install them later, CUDA 12 must be installed on the system.

Discovered in Release: 23.04-0.5.3.3

3420831

Description: mlx-steering-dump is not supported on systems in which Python3 is not the default.

Keywords: mlx-steering-dump, Python3

Workaround: N/A

Discovered in Release: 23.04-0.5.3.3

3351989

Description: If the underlying persistent device name exceeds 15 characters in length, the operating system will not be able to perform renaming (i.e., the device name will remain "eth").

Keywords: Persistent Interface Names

Workaround: Add the --copy-ifnames-udev flag to the OFED installation command. Note that this flag is only applicable if the persistent name provided by the kernel, without the 'np' suffix, is 15 characters or fewer.

Discovered in Release: 23.04-0.5.3.3
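The length condition in the workaround for issue 3351989 can be checked ahead of installation. The sketch below assumes the 'np' suffix has the form "np" followed by digits at the end of the name (for example "np0"); the function name `fits_ifname` is hypothetical.

```shell
#!/bin/sh
# Longest interface name the condition in issue 3351989 allows.
IFNAME_MAX=15

# fits_ifname NAME: succeed if NAME, with a trailing "np<digits>"
# suffix removed, is IFNAME_MAX characters or fewer.
fits_ifname() (
    base="$(printf '%s' "$1" | sed 's/np[0-9][0-9]*$//')"
    [ "${#base}" -le "$IFNAME_MAX" ]
)

for name in enp8s0f0np0 enP65535p129s0f0np1; do
    if fits_ifname "$name"; then
        echo "$name: fits, --copy-ifnames-udev is applicable"
    else
        echo "$name: too long even without the np suffix"
    fi
done
```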

© Copyright 2025, NVIDIA. Last updated on Jul 2, 2025.