Known Issues
The following general limitations and known issues apply to the current release.
Internal Ref. Number | Issue |
3546668 | Description: On systems with a 64 KB page size, applications that open a large number of RDMA resources (UARs, QPs, CQs, etc.) might fail to create those resources due to a PCI BAR size limitation. |
Keywords: PCI BAR size limitation | |
Workaround: Increase the BAR size via mlxconfig to leave enough space for allocating all of the required RDMA resources. | |
Discovered in Release: 23.10-1.1.9.0 | |
3678715 | Description: Restarting the drivers with the openibd service while the nvme_rdma module is loaded may fail. This behavior is intentional, as unloading nvme_rdma during the driver restart can lead to connectivity issues in other applications within the setup. |
Keywords: openibd service, nvme_rdma module | |
Workaround: Manually unload the nvme_rdma module before performing the driver restart. This can be achieved using the | |
Discovered in Release: 23.10-1.1.9.0 | |
3676223 | Description: When using kernel version 4.12 or later, it is advised to run echo 0 > /sys/bus/pci/devices/0000\:08\:00.0/sriov_drivers_autoprobe to avoid VF probing. |
Keywords: VF probing | |
Workaround: N/A | |
Discovered in Release: 23.10-1.1.9.0 | |
3682658 | Description: When the RDMA-CM user application is used with the AF_IB parameter, the kernel uses the first byte of the private data to set the CMA version. In this scenario, any user data written to this byte will be overwritten. |
Keywords: RDMA-CM user application, AF_IB, private data | |
Workaround: Do not use AF_IB for the application's private data. | |
Discovered in Release: 23.10-0.5.5.0 | |
3640082 | Description: A potential null pointer dereference might occur due to a missing update in the PCI subsystem code when creating the maximum number of VFs. All kernel versions lacking the following fix are impacted: "PCI: Avoid enabling PCI atomics on VFs." |
Keywords: Maximal VF number | |
Workaround: N/A | |
Discovered in Release: 23.10-0.5.5.0 | |
3653417 | Description: When offloading IPsec policy rules while in legacy mode there are two options:
2. Changing the steering mode to firmware steering will return unsupported. |
Keywords: IPsec, legacy mode | |
Workaround: Perform a devlink reload after changing the steering mode. | |
Discovered in Release: 23.10-0.5.5.0 | |
3612274 | Description: Currently, a given interface supports either IPsec offload or TC offload, but not both. Offloading a TC rule to an interface will fail if an IPsec rule is already offloaded on it, and vice versa. |
Keywords: IPsec offload, TC offload | |
Workaround: N/A | |
Discovered in Release: 23.10-0.5.5.0 | |
3596126 | Description: OVS mirroring of both egress and ingress together with modified TTL is not supported by ConnectX-5 cards, and may cause packet checksum issues and errors in the dmesg output. |
Keywords: OVS mirroring, ConnectX-5 | |
Workaround: N/A | |
Discovered in Release: 23.10-0.5.5.0 | |
3538463 | Description: A kernel ABI problem in SLES 15 SP4 may lead to issues during driver start. This impacts kernels from version 5.14.21-150400.24.11.1 up to and including version 5.14.21-150400.24.63.1 (July 2022 to May 2023). For more information, see https://www.suse.com/support/kb/doc/?id=000021137. |
Keywords: Kernel ABI, SLES 15 SP4, driver start | |
Workaround: Upgrade to a kernel version newer than 5.14.21-150400.24.63.1 (May 2023). | |
Discovered in Release: 23.10-0.5.5.0 | |
3637252 | Description: When running on RHEL 7.6 with an excessive RDMA/RoCE workload, kernel warnings may be triggered. |
Keywords: RHEL 7.6, RDMA, RoCE | |
Workaround: N/A | |
Discovered in Release: 23.10-0.5.5.0 |
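
The affected kernel range given for issue 3538463 can be checked programmatically. The following is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes that the release string is either the full package version (for example 5.14.21-150400.24.63.1) or the shorter form printed by `uname -r` (for example 5.14.21-150400.24.63-default), in which case only the update number is compared. Only kernels in the 5.14.21-150400.24.x series are considered.

```python
import re

# Bounds taken from the issue description (inclusive on both ends).
FIRST_AFFECTED = (11, 1)   # 5.14.21-150400.24.11.1
LAST_AFFECTED = (63, 1)    # 5.14.21-150400.24.63.1

# Matches 5.14.21-150400.24.<update>[.<rebuild>]; the rebuild part is optional
# because `uname -r` is assumed to omit it.
_PATTERN = re.compile(r"^5\.14\.21-150400\.24\.(\d+)(?:\.(\d+))?")


def is_affected_sles15sp4_kernel(release: str) -> bool:
    """Return True if the kernel release falls in the affected range."""
    match = _PATTERN.match(release)
    if match is None:
        # Not a 5.14.21-150400.24.x kernel: outside the documented range.
        return False
    update = int(match.group(1))
    if match.group(2) is None:
        # Short form: compare the update number only (an assumption).
        return FIRST_AFFECTED[0] <= update <= LAST_AFFECTED[0]
    rebuild = int(match.group(2))
    return FIRST_AFFECTED <= (update, rebuild) <= LAST_AFFECTED


if __name__ == "__main__":
    import platform
    print(is_affected_sles15sp4_kernel(platform.release()))
```

A result of True suggests that the kernel should be upgraded before the driver is started, as described in the workaround.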
Internal Ref. Number | Issue |
3046655 | Description: A package manager upgrade with zypper (on an SLES system) may prompt a question about vendor change from "Mellanox Technologies" to "OpenFabrics". |
Keywords: Installation, SLES | |
Workaround: Either accept the prompted change, or add the /etc/zypp/vendors.d/mlnx_ofed file containing the following two lines: "[main]" and "vendors = Mellanox,OpenFabrics". | |
Discovered in Release: 23.07-0.5.0.0 | |
3392477 | Description: The ConnectX-7 firmware embedded in this MLNX_OFED version cannot be burnt using the MLNX_OFED installer script. |
Keywords: ConnectX-7, MLNX_OFED installer script | |
Workaround: Download and install the dedicated firmware from https://network.nvidia.com/support/firmware/connectx7ib/ | |
Discovered in Release: 23.07-0.5.0.0 | |
3532756 | Description: The kernel may crash when restarting the driver while IPsec rules are configured. |
Keywords: IPsec | |
Workaround: Flush the IPsec configuration before reloading the driver by running ip xfrm state flush followed by ip xfrm policy flush. | |
Discovered in Release: 23.07-0.5.0.0 | |
3472979 | Description: When a large number of virtual functions are present, the output of the |
Keywords: virtual functions, ip link show | |
Workaround: N/A | |
Discovered in Release: 23.07-0.5.0.0 | |
3413938 | Description: When using the mlnx-sf script, repeatedly creating and deleting an SF with the same ID number under stress may cause the setup to hang due to a race between the create and delete commands. |
Keywords: Hang; mlnx-sf | |
Workaround: N/A | |
Discovered in Release: 23.07-0.5.0.0 | |
3461572 | Description: Configuring Multiport Eswitch LAG mode can be performed only via devlink from this release onwards. The compat sysfs should not be used to configure mpesw LAG. |
Keywords: Multiport Eswitch, compat sysfs, mpesw LAG | |
Workaround: N/A | |
Discovered in Release: 23.07-0.5.0.0 | |
3464337 | Description: Simultaneously adding or removing TC rules while operating on kernel version 6.3 could potentially result in stability issues. |
Keywords: ASAP, rules, TC | |
Workaround: Make sure the following fix is part of the kernel: https://lore.kernel.org/netdev/20230504181616.2834983-3-vladbu@nvidia.com/T/ | |
Discovered in Release: 23.07-0.5.0.0 | |
3469484 | Description: Mirror and connection tracking (CT) offload actions are not supported simultaneously if the kernel version does not support hardware miss to TC actions. Thus, when performing a CT offload test, the actual number of offloaded connections may be lower than expected. |
Keywords: ASAP, CT offload | |
Workaround: The offending commit is "net/sched: act_ct: offload UDP NEW connections". To fix this issue, make sure that https://www.spinics.net/lists/stable-commits/msg303536.html is in the kernel tree. | |
Discovered in Release: 23.07-0.5.0.0 | |
3473331 | Description: When performing a CT offload test, the actual number of offloaded connections may be lower than expected. |
Keywords: ASAP, CT offload | |
Workaround: N/A | |
Discovered in Release: 23.07-0.5.0.0 | |
3499413 | Description: Due to the following kernel issue, under heavy load, some connections may not be offloaded, leading to performance issues: "net/sched: act_ct: offload UDP NEW connections." |
Keywords: ASAP, CT offload | |
Workaround: N/A | |
Discovered in Release: 23.07-0.5.0.0 |
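
The workaround for issue 3532756 (flushing the IPsec configuration before a driver reload) can be scripted. The sketch below is illustrative only: the function name and the injectable `run` parameter are assumptions introduced here so that the logic can be exercised without touching the system, and the script does not restart the driver itself. It runs the two `ip xfrm` flush commands named in the workaround, in order, and stops if either fails. Root privileges are required when it is run for real.

```python
import subprocess

# The two commands from the workaround, in the order they should run.
FLUSH_COMMANDS = [
    ["ip", "xfrm", "state", "flush"],
    ["ip", "xfrm", "policy", "flush"],
]


def flush_ipsec_config(run=subprocess.run):
    """Flush xfrm state and policy; raises if a command exits non-zero.

    `run` defaults to subprocess.run and is a hypothetical hook that allows
    a dry run or a test double to be substituted.
    """
    for command in FLUSH_COMMANDS:
        run(command, check=True)


if __name__ == "__main__":
    flush_ipsec_config()
```

Once both commands have completed, the driver can be reloaded in the usual way.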
Internal Ref. Number | Issue |
3360710 | Description: Configuring PFC in parallel with buffer size and prio2buffer commands may lead to a mismatch between firmware and software regarding receive buffer ownership. |
Keywords: NetDev, PFC, Buffer Size, prio2buffer | |
Workaround: First, configure PFC on all ports, and then perform other needed QoS (i.e., buffer_size or prio2buffer) configurations accordingly. | |
Discovered in Release: 23.04-0.5.3.3 | |
3413879 | Description: OpenSM may not be started automatically if chkconfig was not installed before OpenSM is installed. Note, however, that chkconfig will fail to install if the directory (rather than symbolic link to directory) /etc/init.d already exists (e.g., from a previous installation of MLNX_OFED). |
Keywords: Installation, OpenSM, chkconfig | |
Workaround: Install chkconfig before installing MLNX_OFED. If installing it fails, make sure /etc/init.d does not exist at the time of installing it. | |
Discovered in Release: 23.04-0.5.3.3 | |
3424596 | Description: On SLES 15.4, installing MLNX_OFED using a package repository (with zypper) may trigger an error message about a missing dependency for 'librte_eal.so.20.0()(64bit)'. This is because the inbox package libdpdk-20_0 is being uninstalled, as it is incompatible with the MLNX_OFED rdma-core packages. |
Keywords: Installation, SLES 15.4 | |
Workaround: Remove the relevant package with 'zypper remove libdpdk-20_0' before installing MLNX_OFED. This will also remove the inbox openvswitch package. | |
Discovered in Release: 23.04-0.5.3.3 | |
3433416 | Description: On systems that were installed with MLNX_OFED 5.9 or older and include a CUDA package (ucx-cuda / hcoll-cuda), an upgrade to MLNX_OFED 23.04 using the package manager ("yum") method will fail. This is because MLNX_OFED up to 5.9 is built with CUDA 11. MLNX_OFED 23.04 is built with CUDA 12 and those CUDA versions are incompatible. |
Keywords: Installation, CUDA, yum | |
Workaround: Remove the CUDA packages included with OFED (ucx-cuda, hcoll-cuda) before upgrading. This allows MLNX_OFED to be upgraded regardless of the installed CUDA version. To install these packages later, CUDA 12 must be installed on the system. | |
Discovered in Release: 23.04-0.5.3.3 | |
3420831 | Description: mlx-steering-dump is not supported on systems in which Python3 is not the default. |
Keywords: mlx-steering-dump, Python3 | |
Workaround: N/A | |
Discovered in Release: 23.04-0.5.3.3 | |
3351989 | Description: If the underlying persistent device name exceeds 15 characters in length, the operating system will not be able to perform renaming (i.e., the device name will remain "eth |
Keywords: Persistent Interface Names | |
Workaround: Add the --copy-ifnames-udev flag to the OFED installation command. Note that this flag is only applicable if the persistent name provided by the kernel, without the 'np | |
Discovered in Release: 23.04-0.5.3.3 |
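
For issue 3351989, candidate persistent names can be screened against the 15-character limit before installation. This is a minimal sketch under stated assumptions: the function name is hypothetical, the sample names are made up for illustration, and the check covers only the length condition described in the issue, not the renaming logic itself.

```python
# Maximum interface name length stated in the issue description.
MAX_IFNAME_LEN = 15


def names_too_long(candidates):
    """Return the candidate names that exceed the 15-character limit."""
    return [name for name in candidates if len(name) > MAX_IFNAME_LEN]


if __name__ == "__main__":
    # Hypothetical persistent names, used only to illustrate the check.
    samples = ["enp8s0f0np0", "enp129s0f0np0v10"]
    for name in names_too_long(samples):
        print(f"{name}: {len(name)} characters, renaming would not be possible")
```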