NVIDIA MLNX_OFED Documentation v5.6-1.0.3.3
Linux Kernel Upstream Release Notes v5.17

Known Issues

The following is a list of general limitations and known issues of the current version of the release. For the list of old known issues, please refer to the NVIDIA OFED Archived Known Issues file at http://www.mellanox.com/pdf/prod_software/MLNX_OFED_Archived_Known_Issues.pdf

Internal Ref. Number

Issue

3079038

Description: When there is a loaded 'non-mellanox' auxiliary device on the auxiliary bus, OFED driver load may fail and cause kernel panic.

Workaround: N/A

Keywords: Driver Load

Discovered in Release: 5.6-1.0.3.3

3066233

Description: On SLES15 systems that have both python3 and python2 installed, rebuilding kernel modules fails with an error in the mlnx-tools package, and specifically in the mlnx-tools build log, about missing ib2ibsetup.8.

Workaround: Uninstall python2 to allow rebuilding the kernel modules (on the system that builds them).

Keywords: Installation

Discovered in Release: 5.6-1.0.3.3

2998194

Description: On some systems with many (e.g., 64) virtual functions (VFs) attached to a ConnectX interface, 'ip link' may give an error message: "Error: Buffer too small for object." This applies to both IP commands: the inbox iproute package in RHEL8.x and the mlnx-iproute2 package from MLNX_OFED.

This is known to work well and not give an error in RHEL7.x kernel regardless of what user-space package is used (including user-space from RHEL8.x).

Workaround: N/A

Keywords: NetDev, RHEL, Virtual Functions

Discovered in Release: 5.6-1.0.3.3

3045436

Description: Rebooting the host while the Arm is down may block the shutdown flow till the Arm is up.

Workaround: Restart the driver on the host side before reboot.

Keywords: Reboot, Arm

Discovered in Release: 5.6-1.0.3.3

3040350

Description:

  1. When offload is enabled, removing a physical port from ovs-dpdk bridge requires restarting OVS service. Not doing so will result in wrong configuration of datapath rules.

  2. When offload is enabled, the physical port must be attached to a bridge.

Workaround:

  1. When removing a physical port from an ovs-dpdk bridge while offload is enabled, restart openvswitch after reattaching the port.

  2. Attach physical port to a bridge according to the desired topology.

Keywords: OVS-DPDK, Bridge, Offload

Discovered in Release: 5.6-1.0.3.3

2973726

Description: dec_ttl works only with ConnectX-6. It does not work with ConnectX-5.

Workaround: N/A

Keywords: OVS-DPDK, dec_ttl

Discovered in Release: 5.6-1.0.3.3

2971708

Description: If RoCE can be enabled or disabled via devlink, the sysfs interface for enabling or disabling RoCE should not be used.

In newer kernels, which allow RoCE to be enabled/disabled via Devlink, using sysfs to enable or disable RoCE can result in stack traces and possibly kernel crashes.

To determine if devlink can be used to enable or disable RoCE, execute the following command after starting OFED:

devlink dev param show | grep roce

Enable or disable RoCE ONLY via devlink if you see the following output line:

name enable_roce type generic

To enable or disable RoCE via devlink, first list the PCI interfaces accessible by devlink:

devlink dev show

For example:

$ devlink dev show
pci/0000:08:00.0
pci/0000:08:00.1

Then, to enable/disable RoCE on the first PCI interface, execute the following two commands:

$ devlink dev param set pci/0000:08:00.0 name enable_roce value <true | false> cmode driverinit
$ devlink dev reload pci/0000:08:00.0
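
The check-then-set sequence described in this entry could be scripted roughly as follows. This is a minimal sketch, not part of MLNX_OFED: the function names and the DRY_RUN variable are illustrative assumptions, and only the devlink commands shown above are used. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: function names and the DRY_RUN variable are illustrative assumptions.

# Succeeds if the text on stdin (expected: output of
# 'devlink dev param show') contains the enable_roce generic parameter.
roce_devlink_supported() {
    grep -q 'name enable_roce type generic'
}

# Print (DRY_RUN=1, the default) or execute the two devlink commands
# from this entry. $1 = devlink device handle, $2 = true|false.
set_roce() {
    dev=$1
    value=$2
    case $value in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 2 ;;
    esac
    for cmd in \
        "devlink dev param set $dev name enable_roce value $value cmode driverinit" \
        "devlink dev reload $dev"
    do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```

A possible use is `devlink dev param show | roce_devlink_supported && DRY_RUN=0 set_roce pci/0000:08:00.0 false`, so that the setting is only attempted when devlink reports the parameter.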

Workaround: N/A

Keywords: Enabling/Disabling RoCE

Discovered in Release: 5.6-1.0.3.3

3054413

Description: In the current release, the following OPNs/PSIDs should be manually upgraded:

MCX753106AS-HEA-N NVD0000000023

MCX75310AAS-HEA-N NVD0000000024

Workaround: N/A

Keywords: ConnectX-7, Upgrade

Discovered in Release: 5.6-1.0.3.3

2946873

Description: Moving to switchdev mode while deleting namespace may cause a deadlock.

Workaround: Unload mlx5_ib module before moving to Switchdev mode.

Keywords: ASAP2, Switchdev, Namespace

Discovered in Release: 5.6-1.0.3.3

2811957

Description: If a system is run from a network boot and is connected to the network storage through an NVIDIA ConnectX card, unloading the mlx5_core driver (such as running '/etc/init.d/openibd restart') will render the system unusable and should therefore be avoided.

Workaround: N/A

Keywords: Installation, mlx5_core

Discovered in Release: 5.6-1.0.3.3

2979243

Description: The kernel in CentOS 7.6alt (for non-x86 architectures) is different than that of RHEL 7.6alt. Some of the MLNX_OFED kernel modules that were built for the RHEL 7.6alt kernel will not load on a system with the CentOS 7.6alt kernel. If you want to install MLNX_OFED on such a system, you should use ./mlnxofedinstall --add-kernel-support to rebuild the kernel modules for the CentOS kernel.

Workaround: Use --add-kernel-support.

Keywords: Installation, CentOS

Discovered in Release: 5.6-1.0.3.3

3011440

Description: In Debian 11.2, Ubuntu 21.10, and Ubuntu 22.04, attempting to install an "exact" type of metapackage (such as mlnx-ofed-all-exact or mlnx-ofed-basic-exact) may fail with an error regarding the version of mstflint.

Workaround: Install also mstflint of the exact same version (e.g., apt install mlnx-ofed-all-exact mstflint=4.16.0-1.56xxxx).

Keywords: Installation, Debian, Ubuntu, MST

Discovered in Release: 5.6-1.0.3.3

3024520

Description: The option --copy-ifnames-udev copies some files under /etc (/etc/udev/rules.d/82-net-setup-link.rules and /etc/infiniband/vf-net-link-name.sh) that are never removed, neither when the installer is later run without this option nor upon uninstallation. Those scripts are merely examples. They are files under /etc to be maintained by the user.

Workaround: Remove the files, if needed.

Keywords: Installation

Discovered in Release: 5.6-1.0.3.3

3046601

Description: When rebuilding the kernel modules (--add-kernel-support), some kernel versions (specifically mainline 4.14) do not unset LDFLAGS properly. Rebuilding xpmem in such a case may fail with an error such as "unrecognized option '-Wl,-z,relro'" in the xpmem build log.

Workaround: Either disable building xpmem by adding --without-xpmem to the command line, or edit the kernel Makefile to make it unset LDFLAGS:

sed -i -e '/^export ARCH/iLDFLAGS :=' /lib/modules/$(uname -r)/Makefile

Note: The Makefile may be located elsewhere, such as the top-level directory of the kernel source directory.
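
Because the Makefile location varies, the same edit can be wrapped in a small helper that takes the path as an argument. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, GNU sed is assumed (as the one-line 'i' form in the command above already requires), and a backup copy is kept so the change can be reverted.

```shell
#!/bin/sh
# Sketch only: 'unset_ldflags_in_makefile' is a hypothetical helper name.
# Assumes GNU sed, as the one-line 'i' form in the original command does.

# Insert 'LDFLAGS :=' before the 'export ARCH' line of the Makefile in $1,
# unless such a line is already present. A backup is kept as <file>.orig.
unset_ldflags_in_makefile() {
    makefile=$1
    [ -f "$makefile" ] || { echo "no such file: $makefile" >&2; return 1; }
    if grep -q '^LDFLAGS :=$' "$makefile"; then
        return 0    # already patched, nothing to do
    fi
    cp -p "$makefile" "$makefile.orig" &&
        sed -i -e '/^export ARCH/iLDFLAGS :=' "$makefile"
}
```

For example, `unset_ldflags_in_makefile "/lib/modules/$(uname -r)/Makefile"` applies the edit to the path used in the workaround; running it a second time leaves the file unchanged.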

Keywords: Installation, SLES

Discovered in Release: 5.6-1.0.3.3

3046655

Description: A package manager upgrade with zypper (on a SLES system) may prompt a question about vendor change from "Mellanox Technologies" to "OpenFabrics".

Workaround: Either accept this when prompted or add the file /etc/zypp/vendors.d/mlnx_ofed with the following content:

[main]
vendors = Mellanox,OpenFabrics
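
Creating the file can be scripted as in the sketch below. The helper name is hypothetical; the path and the file content are the ones given in this workaround, with the section header and the vendors entry on separate lines. The optional argument exists only so the helper can be tried against a scratch location first.

```shell
#!/bin/sh
# Sketch only: 'write_zypp_vendor_file' is a hypothetical helper name.

# Write the vendor-equivalence file described in this entry.
# $1 = target path; defaults to the path named in the workaround.
write_zypp_vendor_file() {
    target=${1:-/etc/zypp/vendors.d/mlnx_ofed}
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<'EOF'
[main]
vendors = Mellanox,OpenFabrics
EOF
}
```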

Keywords: Installation, SLES

Discovered in Release: 5.6-1.0.3.3

3048411

Description: After installing OFED with rebuilt kernel modules, error messages indicating that the kernel module mlx5_ib failed to load (e.g., "mlx5_ib: Unknown symbol . . .") appear. These messages can be safely ignored because the module eventually loads.

Workaround: Run the command 'dracut -f' to update the initramfs.

Keywords: Installation

Discovered in Release: 5.6-1.0.3.3

3048444

Description: OFED installation using yum fails for packages built with the --add-kernel-support option (without KMP enabled) if the libfabric package is installed.

Workaround: Remove the libfabric package before OFED installation, or use the installation script.

Keywords: Installation, RHEL 8.5

Discovered in Release: 5.6-1.0.3.3

3015210

Description: OVS topology where the tunnel device is over a VF and the VF representor is connected to a bond is not supported.

Workaround: N/A

Keywords: ASAP2, ConnectX-6 Dx, Tunnel Over VF, LAG, Connection Tracking

Discovered in Release: 5.6-1.0.3.3

3028300

Description: OVS metering is not supported over kernel 5.17.

Workaround: N/A

Keywords: ASAP2, OVS, Meter, Kernel 5.17

Discovered in Release: 5.6-1.0.3.3

3044255

Description: Destroying mlxdevm group while SF is attached to it is not supported.

Workaround: N/A

Keywords: ASAP2, mlxdevm, QoS, Group, Scalable Functions, ConnectX-6 Dx

Discovered in Release: 5.6-1.0.3.3

2900346

Description: On Ubuntu OS, configuring different IP addresses with different subnets to both ports 0 and 1 is currently not supported. When trying to ping from port 0 on one BlueField-2 card to port 0 on the other BlueField-2 card, both port 0 and port 1 on the receiving side send a reply to the ARP request (a.k.a. ARP flux).

Workaround: N/A

Keywords: BlueField-2, Ubuntu, ARP Flux

Discovered in Release: 5.6-1.0.3.3

3046456

Description: Switching between SwitchDev mode and legacy mode quickly on BlueField-2 can prevent the driver from loading successfully and breaks its health recovery.

Workaround: Pause 60 seconds between state-altering commands to guarantee the driver health recovery is completed successfully.
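
One way to enforce the pause is to route every mode change through a helper that sleeps before returning. This is a minimal sketch: the function name and the DRY_RUN and PAUSE_SECONDS variables are illustrative assumptions, and the device handle in the usage line is a placeholder. By default the devlink command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: function name, DRY_RUN and PAUSE_SECONDS are illustrative assumptions.

# Change the eswitch mode of devlink device $1 to $2 (switchdev|legacy),
# then wait PAUSE_SECONDS (default 60) before returning, so that the next
# state-altering command cannot run too early.
set_eswitch_mode_paced() {
    dev=$1
    mode=$2
    case $mode in
        switchdev|legacy) ;;
        *) echo "mode must be switchdev or legacy" >&2; return 2 ;;
    esac
    cmd="devlink dev eswitch set $dev mode $mode"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
    else
        $cmd || return 1
    fi
    sleep "${PAUSE_SECONDS:-60}"
}
```

For example, `DRY_RUN=0 set_eswitch_mode_paced pci/0000:03:00.0 switchdev` followed by `DRY_RUN=0 set_eswitch_mode_paced pci/0000:03:00.0 legacy` keeps the two changes at least 60 seconds apart.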

Keywords: ASAP2, BlueField-2, Health Recovery

Discovered in Release: 5.6-1.0.3.3

2934149

Description: Adding vDPA ports over ConnectX-5 devices in ovs-dpdk is not supported and will cause a crash.

Workaround: N/A

Keywords: OVS-DPDK, ConnectX-5

Discovered in Release: 5.6-1.0.3.3

2934833

Description: Running I/O traffic and toggling both physical ports status (UP/DOWN) in a stressful manner on the receiving-end machine may cause traffic loss.

Workaround: N/A

Keywords: RDMA, Port Toggle

Discovered in Release: 5.6-1.0.3.3

2901514

Description: Relaxed Ordering is not working properly on Virtual Functions.

Workaround: N/A

Keywords: Relaxed Ordering, VF

Discovered in Release: 5.6-1.0.3.3

2688191

Description: The minimum Tx rate limit is not supported with link speed of 1Gb/s.

Workaround: N/A

Keywords: Rate Limit, 1Gb/s

Discovered in Release: 5.4-1.0.3.0

2870299

Description: Managing SFs is possible using the iproute2 with mlxdevm tool only.

Workaround: N/A

Keywords: Scalable Functions

Discovered in Release: 5.5-1.0.3.2

2869722

Description: OFED packages were built with DKMS disabled since building OFED with DKMS failed due to a problem in the DKMS package on UOS. --dkms flag should not be used.

Workaround: N/A

Keywords: Installation, DKMS

Discovered in Release: 5.5-1.0.3.2

2870367

Description: On UOS, IPoIB PKEY may require manual bring up after driver restart.

Workaround: N/A

Keywords: Installation, IPoIB, PKEY

Discovered in Release: 5.5-1.0.3.2

2836032

Description: When using the Software Steering mlx5dv_dr API to create rules containing encapsulation actions in MLNX_OFED v5.5-1.x.x.x, upgrade the firmware to the latest version. Otherwise, the maximum number of encapsulation actions that can be created will be limited to only 16K, and degradation of the rule insertion rate is expected compared to MLNX_OFED v5.4-x.x.x.x.

Workaround: N/A

Keywords: Software Steering

Discovered in Release: 5.5-1.0.3.2

2851639

Description: Enabling ARFS in legacy mode and then moving to switchdev mode is not supported and may cause unwanted behavior.

Workaround: N/A

Keywords: NetDev, ARFS

Discovered in Release: 5.5-1.0.3.2

2851639

Description: nvme and iser are not enabled on UOS ARM, because of missing UOS kernel support.

Workaround: N/A

Keywords: nvme, iser, UOS ARM

Discovered in Release: 5.5-1.0.3.2

2860855

Description: Building OFED on RHEL 8.4 with kmp disabled and then installing with yum fails due to some conflicting packages.

Workaround: Remove libfabric and librpmem packages before OFED installation, or add --allowerasing option to the installation command.

Keywords: Installation, RHEL 8.4, kmp, yum

Discovered in Release: 5.5-1.0.3.2

2865983

Description: OFED packages were built with kmp disabled. Building with kmp enabled fails due to missing packages.

Workaround: N/A

Keywords: Installation, kmp

Discovered in Release: 5.5-1.0.3.2

2658644

Description: Matching on ct_label is supported only on the lower 32 bits.

Workaround: N/A

Keywords: ASAP2, Connection Tracking

Discovered in Release: 5.4-1.0.3.0

2706345

Description: The number of RQs and TIRs allocated by the driver depends on the total number of MSI-X vectors allocated. The total number of TIRs supported by the device is in the 16K range. Each representor needs as many TIRs as there are CPUs, up to a maximum of 128.

Workaround: To use a large number of VFs, set PF_NUM_PF_MSIX to a smaller value of around 32.

Keywords: ASAP2, VF, PF_NUM_PF_MSIX

Discovered in Release: 5.4-1.0.3.0

2836997

Description: An automatic test that checks a flow meter rate fluctuation stays within a fixed threshold (e.g., 10%) may fail because meter precision is dependent on multiple factors (i.e., rate and burst values and shape of the traffic).

To pick the best configuration parameters for a flow meter, perform a couple of test measurements using different values of burst size against expected traffic workload and average the results over an extended period of time (tens of minutes).

Workaround: N/A

Keywords: ASAP2, Meter Threshold

Discovered in Release: 5.4-1.0.3.0

2863456

Description: SA limits by packet count (hard and soft) are supported only on traffic originating from the ECPF. Trying to configure them on VF traffic will remove the SA when the hard limit is hit; however, traffic could still pass as plain text due to the tunnel offload that is used in such a configuration.

Workaround: N/A

Keywords: ASAP2, IPsec Full Offload

Discovered in Release: 5.4-0.5.1.1

2657392

Description: OFED installation caused CIFS to break in RHEL8.4. A dummy module was added so that CIFS will be disabled after OFED installation in RHEL8.4.

Workaround: N/A

Keywords: Installation, RHEL8.4, CIFS

Discovered in Release: 5.4-0.5.1.1

2800993

Description: OpenMPI does not support running across different operating systems and/or CPU architectures.

Workaround: N/A

Keywords: OpenMPI

2399503

Description: Open vSwitch is not supported on the latest operating systems containing only Python3 support.

Workaround: N/A

Keywords: Python, Open vSwitch

2782406

Description: Running yum update will upgrade kylin-release to a higher version. The version of this package is used for kylin10sp2 detection so the script will detect kylin 10 instead of kylin10sp2 and use its repository by mistake.

Workaround: Because there are no special cases for kylin10sp2, the repository that was detected with adding --add-kernel-support to the installation command can be used.

Keywords: Upgrade, kylin

Discovered in Release: 5.4-3.0.3.0

2755632

Description: On dual port cards with SR-IOV, when one port link is configured to InfiniBand and the other port link is configured to Ethernet, the Ethernet port will not be able to support VST and QinQ.

Workaround: N/A

Keywords: SR-IOV, VST, QinQ

Discovered in Release: 5.4-3.0.3.0

2780436

Description: Non-default MTU (>1500) is not supported with IPsec crypto offload and may cause packet drops.

Workaround: N/A

Keywords: IPsec, Crypto Offload, MTU

Discovered in Release: 5.4-3.0.3.0

2726021

Description: Building packages on openEuler with kmp enabled requires kernel-rpm-macros package installed. kernel-rpm-macros-30-13.oe1 does not support -p option and kernel-rpm-macros-30-18.oe1 should be installed instead.

On kylin OS, the version of kernel-rpm-macros package does not support -p option needed to support kmp, so it will stay disabled.

Workaround: N/A

Keywords: Installation, openEuler

Discovered in Release: 5.4-3.0.3.0

2750653

Description: Running fragmented traffic in RHEL 8.3 (4.18.0-240.el8.x86_64) may cause call trace in build_skb.

Workaround: Update to RHEL 8.3 z-stream 4.18.0-240.22.1.el8_3.x86_64.

Keywords: RHEL 8.3, Kernel Panic, Call Trace, fr

Discovered in Release: 5.4-1.0.3.0

2629375

Description: Matching on CT label is only supported when matching on lower 32 bits. Full match on all 128 bits of CT label is not supported.

Workaround: N/A

Keywords: ASAP2, Connection Tracking, Label

Discovered in Release: 5.4-1.0.3.0

2707997

Description: Installation in the package manager mode under SLES 15.x may require user-intervention if the original libibverbs is installed.

Workaround: zypper install --force-resolution mlnx-ofed-all

Keywords: Installation, libibverbs

Discovered in Release: 5.4-1.0.3.0

2708531

Description: Installation in the package manager mode under SLES 15.x may require user-intervention if the original libopenvswitch is installed.

Workaround: zypper install --force-resolution mlnx-ofed-all

Keywords: Installation

Discovered in Release: 5.4-1.0.3.0

2703043

Description: Congested TCP lock for kTLS TX device offload traffic compromises the performance.

Workaround: Disable TCP selective acknowledgement: echo 0 > /proc/sys/net/ipv4/tcp_sack
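
Since this setting affects all TCP traffic on the host, it can be useful to record the previous value so it can be restored later. The following is a minimal sketch; the helper name is hypothetical, and the optional path argument exists only so the helper can be exercised against a scratch file.

```shell
#!/bin/sh
# Sketch only: 'disable_tcp_sack' is a hypothetical helper name.

# Write 0 to the tcp_sack knob and print the value it held before.
# $1 = path of the knob; defaults to the path used in the workaround.
disable_tcp_sack() {
    knob=${1:-/proc/sys/net/ipv4/tcp_sack}
    old=$(cat "$knob") || return 1
    echo 0 > "$knob" || return 1
    echo "$old"
}
```

For example, `old=$(disable_tcp_sack)` disables selective acknowledgement, and `echo "$old" > /proc/sys/net/ipv4/tcp_sack` restores the earlier setting once the kTLS workload is finished.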

Keywords: kTLS TX

Discovered in Release: 5.4-1.0.3.0

2676405

Description: If the package interface-rename is active (on XenServer, for example), the interface renaming by the OFED will not be done to eliminate conflicts.

Workaround: N/A

Keywords: Interface Renaming

Discovered in Release: 5.4-1.0.3.0

2687943

Description: Offload of rules which redirect from VF on one PF to VF on second PF is not supported on socket-direct devices.

Workaround: N/A

Keywords: ASAP2, Socket-Direct

Discovered in Release: 5.4-1.0.3.0

2678672

Description: When disabling switchdev mode, the qdisc in tunnel device cannot be destroyed and mlx5e_stats_flower() is still called by OVS resulting in NULL pointer panic and memory leak.

Workaround: N/A

Keywords: SwitchDev, mlx5, Tunnel Traffic

Discovered in Release: 5.4-1.0.3.0

2566548

Description: On PPC systems with EEH enabled, running a firmware sync reset (either by mlxfwreset with flag --sync 1 or by devlink dev reload action fw_activate) may cause the EEH to catch the PCI reset and take ownership of the flow. When run a few times in sequence, the EEH may also decide to disable the device.

Workaround: Administrator may disable EEH before running firmware sync reset on the device.

Keywords: PPC, EEH

Discovered in Release: 5.4-1.0.3.0

2617950

Description: TX port timestamp feature is supported for kernel versions 3.15 and greater. On older kernel versions, the feature will not be supported and ptp_tx_* counters will not increment.

Workaround: N/A

Keywords: Ethtool

Discovered in Release: 5.4-1.0.3.0

2390731

Description: On kernels 5.0 and below, ethtool does not display advertised/capability port speeds above 100Gb/s, even when supported.

Workaround: N/A

Keywords: Ethtool, Port Speed

Discovered in Release: 5.4-1.0.3.0

2687198

Description: Activating VF/SF LAG when at least one VF/SF is still bound may lead to an internal error in the firmware.

Workaround: Make sure all VFs/SFs are unbound prior to VF/SF LAG activation/deactivation.
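
Before activating or deactivating LAG, the VF part of this check can be done by walking the PF's virtfn links in sysfs. This is a minimal sketch under stated assumptions: the helper name is hypothetical, it relies on the standard sysfs layout in which each virtfnN entry links to the VF's PCI device directory, and it covers VFs only (SFs are not handled).

```shell
#!/bin/sh
# Sketch only: 'list_bound_vfs' is a hypothetical helper name; SFs are not covered.

# Print the PCI address of every VF of the PF device directory $1
# (for example /sys/class/net/<pf>/device) that still has a driver bound.
list_bound_vfs() {
    pf_dev=$1
    for vf in "$pf_dev"/virtfn*; do
        [ -e "$vf" ] || continue          # no VFs: the glob did not match
        if [ -e "$vf/driver" ]; then
            basename "$(readlink -f "$vf")"
        fi
    done
}
```

An empty result from `list_bound_vfs /sys/class/net/<pf>/device` suggests that no VF of that PF is bound; any address printed still needs to be unbound first.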

Keywords: VF, SF, Firmware, Binding

Discovered in Release: 5.4-1.0.3.0

2585575

Description: After disabling sync reset by setting enable_remote_dev_reset to false, running firmware sync reset a few times may lead to general protection fault and system may get stuck.

Workaround: N/A

Keywords: Firmware Upgrade

Discovered in Release: 5.3-1.0.0.1

2582565

Description: Conducting a firmware reset or unbinding the PF while in switchdev mode may cause a kernel crash.

Workaround: N/A

Keywords: SwitchDev, ASAP2, Unbind, Firmware Reset

Discovered in Release: 5.3-1.0.0.1

2587802

Description: PTP synchronization may be lost while using tx_port_ts private flag.

Workaround: Toggle the private flag on the affected interface:

ethtool --set-priv-flags <interface> tx_port_ts off

ethtool --set-priv-flags <interface> tx_port_ts on

Then restart the ptp4l application.
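
The off/on toggle can be wrapped in a helper that takes the interface name. This is a minimal sketch: the function name and the DRY_RUN variable are illustrative assumptions, and restarting ptp4l is deliberately left out because this entry does not say how the application is started. By default the ethtool commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: 'toggle_tx_port_ts' and DRY_RUN are illustrative assumptions.

# Turn the tx_port_ts private flag off and back on for interface $1.
# Restarting ptp4l is left to the caller, since how it is started
# (service, script, manual) is not stated in this entry.
toggle_tx_port_ts() {
    ifname=$1
    [ -n "$ifname" ] || { echo "usage: toggle_tx_port_ts <interface>" >&2; return 2; }
    for state in off on; do
        cmd="ethtool --set-priv-flags $ifname tx_port_ts $state"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```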

Keywords: PTP Synchronization

Discovered in Release: 5.3-1.0.0.1

2574943

Description: When running kernel 5.8 and below or RHEL 8.2 and below, sampled packets do not support tunnel information.

Workaround: N/A

Keywords: ASAP2, sFLOW

Discovered in Release: 5.3-1.0.0.1

2568417

Description: Upon upgrade to version 5.3, the package manager tool installs the new packages and then removes the old packages, and a depmod WARNING on "mlx5_fpga_tools" appears. This warning can be safely ignored. mlx5_fpga_tools is a module that existed in version 5.2 and was removed in 5.3.

Workaround: N/A

Keywords: Upgrade; mlx5_fpga_tools

Discovered in Release: 5.3-1.0.0.1

2506425

Description: When installing kmod packages on EulerOS 2.0SP9 or OpenEuler 20.03, the following error appears: "modprobe: FATAL: could not get modversions of ". This error can be safely ignored. It is caused by incorrectly adding directories to a list of modules processed by /usr/sbin/weak-modules.

Workaround: N/A

Keywords: Installation; modules; kmod

Discovered in Release: 5.3-1.0.0.1

2492509

Description: When installing the driver on OpenEuler or on EulerOS 2.0SP9, rebuilding the drivers (--add-kernel-support) with the --kmp option (to create kmod packages) generates packages that are uninstallable because they have a dependency on "/sbin/depmod" that the system does not provide. This dependency is created by a buggy kmod package building tool included with the distribution.

Workaround: N/A

Keywords: add-kernel-support

Discovered in Release: 5.3-1.0.0.1

2479327

Description: On SLES 12 SP5, if the kernel was upgraded to 4.12.14-122.46, it is not possible to rebuild kernel modules (--add-kernel-support) without upgrading gcc as well to at least 4.8.5-31.23.2.

Workaround: N/A

Keywords: Upgrade; SLES 12; add-kernel-support

Discovered in Release: 5.3-1.0.0.1

2584441

Description: On SLES 12 SP5, if the kernel was upgraded to 4.12.14-122.46, it is not possible to rebuild kernel modules (--add-kernel-support) without upgrading gcc as well to at least 4.8.5-31.23.2.

Workaround: N/A

Keywords: Upgrade; SLES 12; add-kernel-support

Discovered in Release: 5.3-1.0.0.1

2460865

Description: When setting MTU to low values, such as 68 bytes, packets may fail on oversize.

Workaround: N/A

Keywords: MTU

Discovered in Release: 5.3-1.0.0.1

2383318

Description: On kernels based on RedHat 7.2, the "tx_port_ts" feature, as set by ethtool --set-priv-flags, is disabled.

Workaround: N/A

Keywords: RedHat; tx_port_ts

Discovered in Release: 5.3-1.0.0.1

2575647

Description: An OvS-DPDK crash might occur while doing live-migration for VMs that use virtio-interfaces that are accelerated using OvS-DPDK vDPA ports.

Workaround: N/A

Keywords: OvS-DPDK vDPA, Live-migration

Discovered in Release: 5.3-1.0.0.1

2430071

Description: After reloading devlink in an IPoIB setup, the IB link may stay in the initialization state, and OpenSM must be run to bring the IB link to the active state.

Workaround: N/A

Keywords: IPoIB devlink reload

Discovered in Release: 5.2-2.2.0.0

2302786

Description: On EulerOS 2.0 SP9 systems, the kernel ABI (kABI) between the base vhulk2006 kernel and the errata vhulk2008 kernel has been changed. It is now not possible to install MLNX_OFED compiled with KMP on vhulk2006 kernel on a vhulk2008 system.

Workaround: Install MLNX_OFED with --add-kernel-support.

Keywords: EulerOS; kABI; installation; --add-kernel-support

Discovered in Release: 5.2-1.0.4.0

2398281

Description: A crash in the TLS Rx socket cleanup flow may occur due to a kernel issue where a wrong extra call to tls_dev_del is made.

Workaround: N/A

Keywords: TLS RX device offload

Discovered in Release: 5.2-1.0.4.0

2407415

Description: OpenEuler 20.03 Aarch64 with errata kernels 4.19.90-2011.6.0.0049.oe1.aarch64 and 4.19.90-2012.5.0.0054.oe1.aarch64 are incompatible with MLNX_OFED kmod-mlnx-ofa_kernel.

Workaround: Install MLNX_OFED with --add-kernel-support.

Keywords: OpenEuler; Aarch64; installation; --add-kernel-support

Discovered in Release: 5.2-1.0.4.0

2348077

Description: RDMA device name for VFs may change after resetting all VFs at once.

Workaround: Either reset interfaces one by one with a delay in between, or use a network interface naming scheme with predictable interface names, such as NAME_PCI or NAME_GUID. Copy /lib/udev/rules.d/60-rdma-persistent-naming.rules to /etc/udev/rules.d/ and edit the last line accordingly.

Note that this will change interface names.
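
The copy-and-edit step of this workaround might look like the sketch below. The helper name and the sed expression are assumptions: the sketch presumes that the last line of the stock rules file names the scheme as a NAME_<SCHEME> token (for example NAME_FALLBACK), which should be verified against the installed file before use. GNU sed is assumed for in-place editing.

```shell
#!/bin/sh
# Sketch only: helper name and the sed expression are assumptions.
# It assumes the last line of the stock rules file names the scheme as a
# NAME_<SCHEME> token (for example NAME_FALLBACK); check the file first.

# Copy the persistent-naming rules from directory $1 to directory $2 and
# replace the NAME_* token on the last line with $3 (NAME_PCI or NAME_GUID).
install_rdma_naming_rules() {
    src_dir=$1
    dst_dir=$2
    scheme=$3
    rules=60-rdma-persistent-naming.rules
    case $scheme in
        NAME_PCI|NAME_GUID) ;;
        *) echo "scheme must be NAME_PCI or NAME_GUID" >&2; return 2 ;;
    esac
    cp "$src_dir/$rules" "$dst_dir/$rules" || return 1
    sed -i -e "\$s/NAME_[A-Z]*/$scheme/" "$dst_dir/$rules"
}
```

For example, `install_rdma_naming_rules /lib/udev/rules.d /etc/udev/rules.d NAME_PCI` uses the directories named in the workaround.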

Keywords: RDMA; VF

Discovered in Release: 5.2-1.0.4.0

2381713

Description: esp4_offload and esp6_offload modules are expected to be loaded according to the list determined by the default kernel. However, these modules cannot be loaded when working over Debian 10 with non-default custom kernel as they are not included in it.

Workaround: Either install MLNX_OFED using --add-kernel-support, or rebuild the non-default custom kernel to include these modules.

Keywords: esp4_offload; esp6_offload; kernel, Debian

Discovered in Release: 5.2-1.0.4.0

2382898

Description: On kernel 4.14, there is no traffic for UDP or TCP with payload size larger than 1398 on GENEVE IPv6 over VLAN tag interface.

Workaround: N/A

Keywords: GENEVE; stag; VLAN; UDP

Discovered in Release: 5.2-1.0.4.0

2326155

Description: When toggling the link state while running RoCE traffic, the below warning may appear in the dmesg:

__ib_cache_gid_add: unable to add gid <gid> error=-28

Workaround: N/A

Keywords: RoCE; __ib_cache_gid_add

Discovered in Release: 5.2-1.0.4.0

2329654

Description: Running XDP over an IP tunnel may fail when working with kernels as old as version 4.14.

Workaround: N/A

Keywords: XDP, Kernel

Discovered in Release: 5.2-1.0.4.0

2249156

Description: Installing MLNX_OFED removes the qperf package if qperf was installed beforehand.

Workaround: Install (or re-install) the qperf package after installing MLNX_OFED.

Keywords: Installation; qperf

Discovered in Release: 5.2-1.0.4.0

2355956

Description: OFED installation requires kernel config CONFIG_DEBUG_INFO to be set.

Workaround: N/A

Keywords: Installation; CONFIG_DEBUG_INFO

Discovered in Release: 5.2-1.0.4.0

2362781

Description: Openibd may fail to unload the Inbox driver mlx5_ib on Ubuntu 18.04 PPC Boston server due to a bug in the Inbox drivers.

Workaround: N/A

Keywords: Openibd; Inbox; Ubuntu; mlx5_ib

Discovered in Release: 5.2-1.0.4.0

2367659

Description: Upgrading the MLNX_OFED version that is configured as a YUM repository may yield warning messages from depmod about unknown symbols, such as:

depmod: WARNING: /lib/modules/4.18.0-240.el8.x86_64/extra/iser/ib_iser.ko needs unknown symbol ib_fmr_pool_unmap

depmod: WARNING: /lib/modules/4.18.0-240.el8.x86_64/extra/srp/ib_srp.ko needs unknown symbol ib_create_qp_user

These warnings appear since the RPM packages upgrade occurs sequentially, and there is an upgrade dependency between some of the modules, which would create a state of upgrade inconsistency.

These warnings are temporary and can be ignored as eventually all modules will be upgraded, and the warnings will no longer appear.

Workaround: N/A

Keywords: YUM; RPM; symbol; depmod; ISER; SRP

Discovered in Release: 5.2-1.0.4.0

2385269

Description: The number of connections offloaded is limited to 100K when working with Kernel v5.9.

Workaround: N/A

Keywords: ASAP2; Connection Tracking; Kernel

Discovered in Release: 5.2-1.0.4.0

2393169

Description: Mirroring is not supported with Connection Tracking when the source port is a VxLAN device.

Workaround: N/A

Keywords: ASAP2; Connection Tracking; Mirroring

Discovered in Release: 5.2-1.0.4.0

2395082

Description: A call trace may take place when moving from SwitchDev mode back to Legacy mode in Kernel v5.9 due to a kernel issue in tcf_block_unbind.

Workaround: N/A

Keywords: ASAP2; SwitchDev; call trace; kernel; tcf_block_unbind

Discovered in Release: 5.2-1.0.4.0

2354899

Description: ODP is not supported on RHEL7.x systems when running over an ETH link layer with RoCE disabled.

Workaround: N/A

Keywords: ODP, RHEL, RoCE

Discovered in Release: 5.1-2.5.8.0

2338150

Description: Scatter to CQE feature should be disabled for the GPUDirect tests to work.

Workaround: Set the MLX5_SCATTER_TO_CQE environment variable to 0 before the ib_send_bw command. For example:

MLX5_SCATTER_TO_CQE=0 ib_send_bw -d <...>

Keywords: CQE, GPUDirect

Discovered in Release: 5.1-2.5.8.0

2295732

Description: Upgrading from legacy (mlnx-libs) to the current rdma-core based build using YUM (package manager) fails.

Workaround: To perform this upgrade, either use the installer script or uninstall the old packages and install the new packages.

Keywords: Legacy, mlnx-libs, rdma-core, installation

Discovered in Release: 5.1-2.5.8.0

2295735

Description: Upgrading from legacy (mlnx-libs) to the current rdma-core based build using the apt-get (package manager) fails.

Workaround: To perform this upgrade, either use the installer script or uninstall the old packages and install the new packages.

Keywords: Legacy, mlnx-libs, rdma-core, apt, apt-get, installation

Discovered in Release: 5.1-2.5.8.0

2248996

Description: Downgrading the firmware version for ConnectX-6 cards using "mlnx_ofed_install --fw-update-only --force-fw-update" fails.

Workaround: Manually downgrade the firmware version - please see Firmware Update Instructions.

Keywords: Firmware, ConnectX-6

Discovered in Release: 5.1-0.6.6.0

2175930

Description: When using OFED 5.1 on PPC architectures with kernels v5.5 or v5.6 and an old ethtool utility, a harmless warning call trace may appear in the dmesg due to mismatch between user space and kernel. The warning call trace mentions ethtool_notify.

Workaround: Update the ethtool utility to version 5.6 on such systems in order to avoid the call trace.

Keywords: PPC, ethtool_notify, kernel

Discovered in Release: 5.1-0.6.6.0

2198764

Description: If MLNX_OFED is installed on a Debian or Ubuntu system that is run in chroot environment, the openibd service will not be enabled. If the chroot files are being used as a base of a full system, the openibd service is left disabled.

Workaround: Currently, openibd is a sysv-init script that you can enable manually by running: update-rc.d openibd defaults

Keywords: chroot, Debian, Ubuntu, openibd

Discovered in Release: 5.1-0.6.6.0

2237134

Description: Running connection tracking (CT) with FW steering may cause CREATE_FLOW_TABLE command to fail with syndrome.

Workaround: Configure OVS to use a single handler-thread:

# ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=1

Keywords: Connection tracking, ASAP, OVS, FW steering

Discovered in Release: 5.1-0.6.6.0

2239894

Description: Running OpenVSwitch offload with high traffic throughput can cause low insertion rate due to high CPU usage.

Workaround: Reduce the number of combined channels of the uplink using "ethtool -L".

Keywords: Insertion rate, ASAP2

Discovered in Release: 5.1-0.6.6.0

2240671

Description: Header rewrite action is not supported over RHEL/CentOS 7.4.

Workaround: N/A

Keywords: ASAP, header rewrite, RHEL, RedHat, CentOS, OS

Discovered in Release: 5.1-0.6.6.0

2242546

Description: Tunnel offload (encap/decap) may cause kernel panic if nf_tables module is not probed.

Workaround: Make sure to probe the nf_tables module before inserting any rule.
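
A script that inserts offload rules could guard itself with a check like the one sketched here. The helper names and the SYSFS_ROOT override are illustrative assumptions; the check relies on loaded modules appearing as directories under /sys/module.

```shell
#!/bin/sh
# Sketch only: helper names and the SYSFS_ROOT override are illustrative assumptions.

# Succeeds if kernel module $1 has a directory under /sys/module,
# which is the case for loaded modules.
module_present() {
    [ -d "${SYSFS_ROOT:-/sys}/module/$1" ]
}

# Load nf_tables if it is not present yet; intended to run before any
# tunnel offload rule is inserted.
ensure_nf_tables() {
    module_present nf_tables || modprobe nf_tables
}
```

Calling `ensure_nf_tables || exit 1` at the top of such a script stops it before any rule is inserted when the module cannot be loaded.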

Keywords: Kernel v5.7, ASAP, kernel panic

Discovered in Release: 5.1-0.6.6.0

2143007

Description: IPsec packets are dropped during heavy traffic due to a bug in net/xfrm Linux Kernel.

Workaround: Make sure the Kernel is modified to apply the following patch: "xfrm: Fix double ESP trailer insertion in IPsec crypto offload".

Keywords: IPsec, xfrm

Discovered in Release: 5.1-0.6.6.0

2225952

Description: VF mirroring with TC policy skip_sw is not supported on RHEL/CentOS 7.4, 7.5 and 7.6 OSs.

Workaround: N/A

Keywords: ASAP2, Mirroring, RHEL, RedHat, OS

Discovered in Release: 5.1-0.6.6.0

2216521

Description: After upgrading MLNX_OFED from v5.0 or earlier, the installation prefix of the ibdev2netdev utility changes to /usr/sbin. As a result, the utility cannot be found while still in the same shell environment.

Workaround: After installing MLNX_OFED, log out and log in again to refresh the shell environment.

Keywords: ibdev2netdev

Discovered in Release: 5.1-0.6.6.0

2202520

Description: Rules with VLAN push/pop, encap/decap and header rewrite actions together are not supported.

Workaround: N/A

Keywords: ASAP2, SwitchDev, VLAN push/pop, encap/decap, header rewrite

Discovered in Release: 5.1-0.6.6.0

2210752

Description: Switching from Legacy mode to SwitchDev mode and vice-versa while TC rules exist on the NIC will result in failure.

Workaround: Before attempting to switch mode, make sure to delete all TC rules on the NIC or stop OpenvSwitch.
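The sequence could look roughly like the sketch below, which only prints the commands so they can be reviewed before being run. The function name switch_mode_plan, the interface names, and the PCI address are hypothetical placeholders. The sketch also assumes that the TC rules are attached to the ingress qdisc of each listed netdev; rules attached elsewhere would need to be removed separately.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# Function name, netdev names and PCI address are hypothetical placeholders.

# usage: switch_mode_plan <pci-address> <mode> <netdev>...
switch_mode_plan() {
    pci=$1; mode=$2; shift 2
    for dev in "$@"; do
        # Deleting the ingress qdisc also removes the TC filters attached to it.
        echo "tc qdisc del dev $dev ingress"
    done
    # Change the e-switch mode only after the rules are gone.
    echo "devlink dev eswitch set pci/$pci mode $mode"
}
```

For example, switch_mode_plan 0000:03:00.0 switchdev eth2 eth3 prints two tc commands followed by the devlink command.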

Keywords: ASAP2, Devlink, Legacy SR-IOV

Discovered in Release: 5.1-0.6.6.0

2125036/2125031

Description: Upgrading the MLNX_OFED from an UPSTREAM_LIBS based version to an MLNX_LIBS based version fails unless the driver is uninstalled and then re-installed.

Workaround: Make sure to uninstall and re-install MLNX_OFED to complete the upgrade.

Keywords: Installation, UPSTREAM_LIBS, MLNX_LIBS

Discovered in Release: 5.0-2.1.8.0

2105447

Description: hns_roce warning messages will appear in the dmesg after reboot on Euler2 SP3 OSs.

Workaround: N/A

Keywords: hns_roce, dmesg, Euler

Discovered in Release: 5.0-2.1.8.0

2110321

Description: Multiple driver restarts may cause IPoIB soft lockup.

Workaround: N/A

Keywords: Driver restart, IPoIB

Discovered in Release: 5.0-2.1.8.0

2112251

Description: On kernels 4.10-4.14, when a Geneve tunnel's remote endpoint is defined using IPv6, packets larger than the MTU are not fragmented, resulting in no traffic being sent.

Workaround: Define the Geneve tunnel's remote endpoint using IPv4.

Keywords: Kernel, Geneve, IPv4, IPv6, MTU, fragmentation

Discovered in Release: 5.0-2.1.8.0

2102902

Description: A kernel panic may occur over RH8.0-4.18.0-80.el8.x86_64 OS when opening kTLS offload connection due to a bug in kernel TLS stack.

Workaround: N/A

Keywords: TLS offload, mlx5e

Discovered in Release: 5.0-2.1.8.0

2111534

Description: A Kernel panic may occur over Ubuntu19.04-5.0.0-38-generic OS when opening kTLS offload connection due to a bug in the Kernel TLS stack.

Workaround: N/A

Keywords: TLS offload, mlx5e

Discovered in Release: 5.0-2.1.8.0

2035950

Description: An internal error might occur in the firmware when performing either of the following in VF LAG mode while at least one VF of either PF is still bound/attached to a VM:

  1. Removing PF from the bond (using ifdown, ip link or any other function)

  2. Attempting to disable SR-IOV

Workaround: N/A

Keywords: VF LAG, binding, firmware, FW, PF, SR-IOV

Discovered in Release: 5.0-1.0.0.0

2044544

Description: When working with OSs with Kernel v4.10, the bonding module does not allow setting MTUs larger than 1500 on a bonding interface.

Workaround: Upgrade your Kernel version to v4.11 or above.
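To check whether a host already meets the minimum kernel version, a small helper such as the following may be convenient. It is a minimal sketch: the function name kernel_at_least is hypothetical, and it relies on the version-sort option (sort -V) of GNU coreutils.

```shell
#!/bin/sh
# Sketch only: kernel_at_least is a hypothetical helper, not part of MLNX_OFED.

# usage: kernel_at_least <minimum-version> [kernel-release]
# Succeeds if the kernel release (default: `uname -r`) is >= the minimum version.
kernel_at_least() {
    min=$1
    cur=${2:-$(uname -r)}
    cur=${cur%%-*}    # drop distribution suffixes such as "-42-generic"
    # With version sort, the minimum sorts first only if cur >= min.
    [ "$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)" = "$min" ]
}
```

For example, kernel_at_least 4.11 && echo "bonding MTU issue does not apply" tests the running kernel.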

Keywords: Bonding, MTU, Kernel

Discovered in Release: 5.0-1.0.0.0

1882932

Description: Libibverbs dependencies are removed during OFED installation, requiring manual installation of libraries that OFED does not reinstall.

Workaround: Manually install missing packages.

Keywords: libibverbs, installation

Discovered in Release: 5.0-1.0.0.0

2058535

Description: ibdev2netdev command returns duplicate devices with different ports in SwitchDev mode.

Workaround: Use the /opt/mellanox/iproute2/sbin/rdma link show command instead.

Keywords: ibdev2netdev

Discovered in Release: 5.0-1.0.0.0

2072568

Description: In RHEL/CentOS 7.2 OSs, adding drop rules when act_gact is not loaded may cause a kernel crash.

Workaround: Preload all needed modules to avoid such a scenario (cls_flower, act_mirred, act_gact, act_tunnel_key and act_vlan).
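One possible way to preload these modules is a simple loop over modprobe, sketched below. The variable and function names are illustrative assumptions, and the MODPROBE override exists only to allow a dry run.

```shell
#!/bin/sh
# Sketch only: names are illustrative; the module list is taken from the workaround above.
ASAP_MODULES="cls_flower act_mirred act_gact act_tunnel_key act_vlan"

# Load every module in the list, stopping at the first failure.
# Set MODPROBE=echo to print the module names instead of loading them.
preload_modules() {
    for m in $ASAP_MODULES; do
        "${MODPROBE:-modprobe}" "$m" || return 1
    done
}
```

Run preload_modules as root before adding any TC rules.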

Keywords: RHEL/CentOS 7.2, Kernel 4.9, call trace, ASAP

Discovered in Release: 5.0-1.0.0.0

2093698

Description: VF LAG configuration is not supported when the NUM_OF_VFS configured in mlxconfig is higher than 64.

Workaround: N/A

Keywords: VF LAG, SwitchDev mode, ASAP

Discovered in Release: 5.0-1.0.0.0

2093746

Description: Devlink health dumps are not supported on kernels lower than v5.3.

Workaround: N/A

Keywords: Devlink, health report, dump

Discovered in Release: 5.0-1.0.0.0

2000590

Description: Sending packets larger than MTU is not supported when working with OVS-DPDK.

Workaround: N/A

Keywords: MTU, OVS-DPDK

Discovered in Release: 5.0-1.0.0.0

2062900

Description: Moving VF from SwitchDev mode to Legacy mode while the representor is being used by OVS-DPDK results in a segmentation fault.

Workaround: To move VF to Legacy mode with no error, make sure to delete the ports from the OVS.

Keywords: SwitchDev, Legacy, representor, OVS-DPDK

Discovered in Release: 5.0-1.0.0.0

2075942

Description: Huge pages configuration is lost each time the server is rebooted.

Workaround: Re-configure the huge pages after each reboot, or configure them as a kernel parameter.
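Both options can be sketched as follows. The helper names, page size, and page count are illustrative assumptions. The boot parameters default_hugepagesz, hugepagesz and hugepages are standard Linux kernel parameters, but how they are added to the boot loader configuration depends on the distribution.

```shell
#!/bin/sh
# Sketch only: helper names and the example values are assumptions.

# Print the kernel command-line fragment that reserves huge pages at boot.
# usage: hugepage_cmdline <page-size> <page-count>
hugepage_cmdline() {
    echo "default_hugepagesz=$1 hugepagesz=$1 hugepages=$2"
}

# Runtime alternative (lost on reboot): reserve default-size huge pages now.
# usage: set_hugepages_now <page-count>
set_hugepages_now() {
    sysctl -w "vm.nr_hugepages=$1"
}
```

For example, hugepage_cmdline 1G 4 prints the fragment to append to the kernel command line, while set_hugepages_now 1024 (as root) has to be repeated after each reboot.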

Keywords: Huge pages, reboot, OVS-DPDK

Discovered in Release: 5.0-1.0.0.0

2067012

Description: MLNX_OFED cannot be installed on Debian 9.11 OS in SwitchDev mode.

Workaround: Install OFED with the flag --add-kernel-support.

Keywords: ASAP, SwitchDev, Debian, Kernel

Discovered in Release: 5.0-1.0.0.0

2036572

Description: When using a thread domain and the lockless rdma-core ibv_post_send path, there is an additional CPU penalty due to required barriers around the device MMIO buffer that were omitted in MLNX_OFED.

Workaround: N/A

Keywords: rdma-core, write-combining, MMIO buffer

Discovered in Release: 5.0-1.0.0.0

-

Description: The argparse module is installed by default in Python versions >=2.7 and >=3.2. If an older Python version is used, the argparse module is not installed by default.

Workaround: Install the argparse module manually.

Keywords: Python, MFT, argparse, installation

Discovered in Release: 4.7-3.2.9.0

1997230

Description: Running mlxfwreset or unloading the mlx5_core module while conntrack flows are offloaded may cause a call trace in the kernel.

Workaround: Stop OVS service before calling mlxfwreset or unloading mlx5_core module.

Keywords: Conntrack, ASAP, OVS, mlxfwreset, unload

Discovered in Release: 4.7-3.2.9.0

1955352

Description: Moving 2 ports to SwitchDev mode in parallel is not supported.

Workaround: N/A

Keywords: ASAP, SwitchDev

Discovered in Release: 4.7-3.2.9.0

1979958

Description: VXLAN IPv6 offload is not supported over CentOS/RHEL v7.2 OSs.

Workaround: N/A

Keywords: Tunnel, VXLAN, ASAP, IPv6

Discovered in Release: 4.7-3.2.9.0

1991710

Description: PRIO_TAG_REQUIRED_EN configuration is not supported and may cause a call trace.

Workaround: N/A

Keywords: ASAP, PRIO_TAG, mstconfig

Discovered in Release: 4.7-3.2.9.0

1967866

Description: Enabling ECMP offload requires the VFs to be unbound and VMs to be shut down.

Workaround: N/A

Keywords: ECMP, Multipath, ASAP2

Discovered in Release: 4.7-3.2.9.0

1921981

Description: On Ubuntu, Debian, and RedHat 8 and above OSs, parsing the mfa2 file using mstarchive might result in a segmentation fault.

Workaround: Use mlxarchive to parse the mfa2 file instead.

Keywords: MFT, mfa2, mstarchive, mlxarchive, Ubuntu, Debian, RedHat, operating system

Discovered in Release: 4.7-1.0.0.1

1840288

Description: MLNX_OFED does not support XDP features on RedHat 7 OS, despite the declared support by RedHat.

Workaround: N/A

Keywords: XDP, RedHat

Discovered in Release: 4.7-1.0.0.1

1821235

Description: When using mlx5dv_dr API for flow creation, for flows which execute the "encapsulation" action or "push vlan" action, metadata C registers will be reset to zero.

Workaround: Use both actions at the end of the flow process.

Keywords: Flow steering

Discovered in Release: 4.7-1.0.0.1

1892663/1800633

Description: mlnx_tune script does not support python3 interpreter.

Workaround: Run mlnx_tune with python2 interpreter only.

Keywords: mlnx_tune, python3, python2

Discovered in Release: 4.7-1.0.0.1

1504785

Description: A lost interrupt issue in pass-through virtual machines may prevent the driver from loading, followed by printing managed pages errors to the dmesg.

Workaround: Restart the driver.

Keywords: VM, virtual machine

Discovered in Release: 4.6-1.0.1.1

1764415

Description: Unbinding PFs on LAG devices results in a "Failed to modify QP to RESET" error message.

Workaround: N/A

Keywords: RoCE LAG, unbind, PF, RDMA

Discovered in Release: 4.6-1.0.1.1

1806565

Description: RoCE default GIDs v1 and v2 are derived from the MAC address of the corresponding netdevice's PCI function, and they resemble the IPv6 address. However, in systems where the IPv6 link local address generated does not depend on the MAC address, RoCEv2 default GID should not be used.

Workaround: Use RoCEv2 default GID.

Keywords: RoCE

Discovered in Release: 4.6-1.0.1.1

-

Description: Aging is not functional on bond device in RHEL 7.6.

Workaround: N/A

Keywords: VF LAG, ASAP2

Discovered in Release: 4.6-1.0.1.1

1747774

Description: In VF LAG mode, outgoing traffic in load-balanced mode is distributed according to the origin ring: half of the rings are coupled with port 1 and half with port 2, and all traffic on a given ring is sent from the same port.

Workaround: N/A

Keywords: VF LAG, ASAP2

Discovered in Release: 4.6-1.0.1.1

1753629

Description: A bonding bug found in Kernels 4.12 and 4.13 may cause a slave to become permanently stuck in BOND_LINK_FAIL state. As a result, the following message may appear in dmesg:

bond: link status down for interface eth1, disabling it in 100 ms

Workaround: N/A

Keywords: Bonding, slave

Discovered in Release: 4.6-1.0.1.1

1712068

Description: Uninstalling MLNX_OFED automatically results in the uninstallation of several libraries that are included in the MLNX_OFED package, such as InfiniBand-related libraries.

Workaround: If these libraries are required, reinstall them using the local package manager (yum/dnf).

Keywords: MLNX_OFED libraries

Discovered in Release: 4.6-1.0.1.1

-

Description: Due to changes in libraries, MFT v4.11.0 and below are not forward compatible with MLNX_OFED v4.6-1.0.0.0 and above.

Therefore, with MLNX_OFED v4.6-1.0.0.0 and above, it is recommended to use MFT v4.12.0 and above.

Workaround: N/A

Keywords: MFT compatible

Discovered in Release: 4.6-1.0.1.1

1730840

Description: On ConnectX-4 HCAs, GID index for RoCE v2 is inconsistent when toggling between enabled and disabled interface modes.

Workaround: N/A

Keywords: RoCE v2, GID

Discovered in Release: 4.6-1.0.1.1

1717428

Description: On kernels 4.10-4.14, MTUs larger than 1500 cannot be set for a GRE interface with any driver (IPv4 or IPv6).

Workaround: Upgrade your kernel to any version higher than v4.14.

Keywords: Fedora 27, gretap, ip_gre, ip_tunnel, ip6_gre, ip6_tunnel

Discovered in Release: 4.6-1.0.1.1

1748343

Description: Driver reload takes several minutes when a large number of VFs exists.

Workaround: N/A

Keywords: VF, SR-IOV

Discovered in Release: 4.6-1.0.1.1

1733974

Description: Running heavy traffic (such as 'ping flood') while bringing other mlx5 interfaces up and down may result in "INFO: rcu_preempt detected stalls on CPUs/tasks:" call traces.

Workaround: N/A

Keywords: mlx5

Discovered in Release: 4.6-1.0.1.1

-

Description: On ConnectX-6 HCAs and above, an attempt to configure advertisement (with any bitmap) will result in advertising all capabilities.

Workaround: N/A

Keywords: 200Gb/s, advertisement, Ethtool

Discovered in Release: 4.6-1.0.1.1

1699289

Description: The HW LRO feature is disabled out of the box (OOB), which results in increased CPU utilization on the receive side. On ConnectX-5 adapter cards and above, this causes a bandwidth drop for a few streams.

Workaround: Make sure to enable HW LRO in the driver:

ethtool -K <intf> lro on

ethtool --set-priv-flags <intf> hw_lro on
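To confirm that the settings took effect, the output of ethtool -k <intf> and ethtool --show-priv-flags <intf> can be inspected. The sketch below extracts the two relevant states; the helper names are hypothetical, and the parsing assumes the usual "name: state" layout of ethtool output.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; parsing assumes "name: state" lines.

# Read `ethtool -k <intf>` output on stdin and print the LRO state (on/off).
lro_feature_state() {
    awk '$1 == "large-receive-offload:" { print $2; exit }'
}

# Read `ethtool --show-priv-flags <intf>` output on stdin and print the
# hw_lro private flag state (on/off).
hw_lro_flag_state() {
    awk -F'[ \t]*:[ \t]*' '$1 == "hw_lro" { print $2; exit }'
}
```

For example, ethtool -k <intf> | lro_feature_state and ethtool --show-priv-flags <intf> | hw_lro_flag_state should both print "on" once HW LRO is enabled.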

Keywords: HW LRO, ConnectX-5 and above

Discovered in Release: 4.5-1.0.1.0

1403313

Description: Attempting to allocate an excessive number of VFs per PF in operating systems with kernel versions below v4.15 might fail due to a known issue in the Kernel.

Workaround: Make sure to update the Kernel version to v4.15 or above.

Keywords: VF, PF, IOMMU, Kernel, OS

Discovered in Release: 4.5-1.0.1.0

-

Description: NEO-Host is not supported on the following OSs:

  • SLES12 SP3

  • SLES12 SP4

  • SLES15

  • Fedora 28

  • RHEL7.1

  • RHEL7.4 ALT (Pegas1.0)

  • RHEL7.5

  • RHEL7.6

  • XenServer 4.9

Workaround: N/A

Keywords: NEO-Host, operating systems

Discovered in Release: 4.5-1.0.1.0

1521877

Description: On SLES 12 SP1 OSs, a kernel tracepoint issue may cause undefined behavior when inserting a kernel module with a wrong parameter.

Workaround: N/A

Keywords: mlx5 driver, SLES 12 SP1

Discovered in Release: 4.5-1.0.1.0

© Copyright 2023, NVIDIA. Last updated on Nov 27, 2023.