NVIDIA MLNX_OFED Documentation Rev 4.9-5.1.0.0 LTS

Known Issues

The following is a list of general limitations and known issues of the various components of this Mellanox OFED for Linux release.

For the list of old known issues, please refer to Mellanox OFED Archived Known Issues file at: http://www.mellanox.com/pdf/prod_software/MLNX_OFED_Archived_Known_Issues.pdf

Internal Ref. Number

Issue

2894838

Description: Running 'ip link show' command over RHEL8.5 using ConnectX-3 with VFs will print "Truncated VFs" to the screen.

Workaround: Use the following OFED IP link command: /opt/mellanox/iproute2/sbin/ip link show

Keywords: IP Link, Virtual Functions, ConnectX-3

Discovered in Release: 4.9-4.1.7.0

2793596

Description: On SLES15 SP3, MFT restart does not work.

Workaround: Install MFT manually from https://www.mellanox.com/products/adapter-software/firmware-tools.

Keywords: MFT

Discovered in Release: 4.9-4.0.8.0

2794326

Description: When upgrading MLNX_OFED from 4.9-4 to 5.4-2 GA using Yum installation, the installation fails due to ibutils.

Workaround: Before the upgrade, remove ibutils manually (and the metapackage with it) using the following command: yum remove ibutils

Keywords: Installation, ibutils

Discovered in Release: 4.9-4.0.8.0

2753944

Description: On rare occasions, when a device is registered (ib_register_device()) while modules (in this case, ib_cm) are being loaded in parallel, a race condition may occur that prevents ib_cm from loading properly.

Workaround: Add modprobe.d rules to force the ib_cm driver to load before the mlx4_ib and mlx5_ib drivers:
install mlx4_ib { /sbin/modprobe ib_cm; /sbin/modprobe --ignore-install mlx4_ib $CMDLINE_OPTS; }
install mlx5_ib { /sbin/modprobe ib_cm; /sbin/modprobe --ignore-install mlx5_ib $CMDLINE_OPTS; }

Keywords: ib_core, Racing Condition

Discovered in Release: 4.9-4.0.8.0
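
For issue 2753944, the install rules only take effect once they are placed in a file under /etc/modprobe.d/. The following is a minimal sketch of one way to do that; the helper name write_ib_cm_rules and the file name in the usage comment are illustrative assumptions and are not part of MLNX_OFED.

```shell
# Write the ib_cm load-order rules to a modprobe.d file.
# $1: destination path (see the usage comment for a hypothetical example).
write_ib_cm_rules() {
    # The quoted 'EOF' keeps $CMDLINE_OPTS literal; modprobe expands it later.
    cat > "$1" <<'EOF'
install mlx4_ib { /sbin/modprobe ib_cm; /sbin/modprobe --ignore-install mlx4_ib $CMDLINE_OPTS; }
install mlx5_ib { /sbin/modprobe ib_cm; /sbin/modprobe --ignore-install mlx5_ib $CMDLINE_OPTS; }
EOF
}

# Usage (as root; the file name is an assumption):
#   write_ib_cm_rules /etc/modprobe.d/ib_cm-load-order.conf
```

Note that each rule uses the double-dash --ignore-install option, which stops modprobe from re-running the same install rule recursively.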

2636998

Description: On Debian or Ubuntu operating systems, when MLNX_OFED is installed with mlnxofedinstall and then upgraded with a package manager (apt), the mlnx-rdma-rxe-dkms package remains installed and fails to rebuild.

Workaround: Before upgrade, remove mlnx-rdma-rxe-dkms: dpkg --purge mlnx-rdma-rxe-dkms

Keywords: Upgrade, Debian, Ubuntu, mlnx-rdma-rxe-dkms

Discovered in Release: 4.9-3.1.5.0

2338121

Description: UCX will not work while running with upstream-libs if librdmacm is not installed.

Workaround: Install librdmacm or disable VMC (-x HCOLL_MCAST=^vmc).

Keywords: RDMA

Discovered in Release: 4.9-2.2.4.0

2440042

Description: Using ODP on specific hardware may cause intermittent failures (issue only reproduced on IBM POWER8 S822LC).

Workaround: If the program fails, disable ODP. Alternatively, to use ODP with ConnectX-4 and above, use MLNX_OFED version 5.2 or above.

Keywords: ODP

Discovered in Release: 4.9-2.2.4.0

2432304

Description: ar_mgr and dump_pr plugin versions are not updated when upgrading the MLNX_OFED version.

Workaround: Prior to upgrading your MLNX_OFED version, make sure to uninstall ar_mgr and dump_pr subnet manager plugins. For example, on Ubuntu systems, run:

dpkg --remove mlnx-ofed-all ar-mgr dump-pr

Keywords: ar_mgr; dump_pr; upgrade; installation

Discovered in Release: 4.9-2.2.4.0

2339456

Description: OFED installation requires the --add-kernel-support flag on some of the Errata kernels of RedHat 7.8.

Workaround: N/A

Keywords: Installation, Errata, RedHat, OS

Discovered in Release: 4.9-2.2.4.0

2328653

Description: Dependency between qemu and libibverbs may cause qemu failures after OFED installation on Ubuntu v20.04 or SLES 15.2 KVM systems.

Workaround: N/A

Keywords: qemu, libibverbs, installation, OS, Ubuntu, SLES, SUSE

Discovered in Release: 4.9-2.2.4.0

2345669

Description: AliOS installation with add-kernel-support may require the installation of additional packages.

Workaround: Install the required packages.

Keywords: Installation

Discovered in Release: 4.9-2.2.4.0

2312063

Description: MKEY_BY_NAME is not supported.

Workaround: N/A

Keywords: MKEY_BY_NAME

Discovered in Release: 4.9-2.2.4.0

2046307

Description: Excessive toggling between modes (Connected and Datagram) and interface up and down may cause a crash.

Workaround: N/A

Keywords: System crash, mode change

Discovered in Release: 4.9-0.1.7.0

1550266

Description: XDP is not supported over ConnectX-3 and ConnectX-3 Pro adapter cards.

Workaround: N/A

Keywords: XDP, ConnectX-3

Discovered in Release: 4.9-0.1.7.0

2117822

Description: On ConnectX-3 and ConnectX-3 Pro adapter cards, no traffic runs between VLANs of any type over VLAN of type ctag (protocol 802.1Q).

Workaround: N/A

Keywords: ConnectX-3, VLAN

Discovered in Release: 4.9-0.1.7.0

2142218

Description: On ConnectX-3 and ConnectX-3 Pro adapter cards, the driver might hang when all of the following conditions are met:

  • OS kernel is older than 4.10

  • Interface is down

  • CONFIG_NET_RX_BUSY_POLL parameter is set

  • netdev_ops.ndo_busy_poll is defined

Workaround: N/A

Keywords: ConnectX-3

Discovered in Release: 4.9-0.1.7.0

2156645

Description: MLNX_LIBS provider packages, such as libmlx5, cannot be installed simultaneously with ibverbs-providers distribution package when working with Ubuntu and Debian OSs.

Workaround: Before installing MLNX_OFED of type MLNX_LIBS, make sure that ibverbs-providers package is not installed.

Keywords: MLNX_LIBS, libmlx5, ibverbs-providers, Debian, Ubuntu

Discovered in Release: 4.9-0.1.7.0

2105447

Description: hns_roce warning messages will appear in the dmesg after reboot on Euler2 SP3 OSs.

Workaround: N/A

Keywords: hns_roce, dmesg, Euler

Discovered in Release: 4.9-0.1.7.0

2110321

Description: Multiple driver restarts may cause IPoIB soft lockup.

Workaround: N/A

Keywords: Driver restart, IPoIB

Discovered in Release: 4.9-0.1.7.0

2112251

Description: On kernels 4.10-4.14, when Geneve tunnel's remote endpoint is defined using IPv6, packets larger than MTU are not fragmented, resulting in no traffic sent.

Workaround: Define the Geneve tunnel's remote endpoint using IPv4.

Keywords: Kernel, Geneve, IPv4, IPv6, MTU, fragmentation

Discovered in Release: 4.9-0.1.7.0
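
Several issues in this list, including 2112251, apply only to a range of kernel versions. The following is a minimal sketch of checking whether a kernel release string falls in the 4.10-4.14 range named in that issue; the function name kernel_affected is a hypothetical helper, and the check looks only at the major and minor numbers.

```shell
# Succeed if the kernel release in $1 (e.g. "4.12.14-lp150") is in 4.10-4.14.
kernel_affected() {
    local release major minor
    release=$1
    major=${release%%.*}            # text before the first dot
    minor=${release#*.}             # text after the first dot
    minor=${minor%%[!0-9]*}         # keep only the leading digits
    [ "$major" -eq 4 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 14 ]
}

# Usage on a live system:
#   kernel_affected "$(uname -r)" && echo "Define the Geneve remote endpoint using IPv4"
```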

2119210

Description: Multiple driver restarts may stress the system and result in an mlx5 command check error message in the log.

Workaround: N/A

Keywords: Driver restart, syndrome, error message

Discovered in Release: 4.9-0.1.7.0

2111349

Description: Ethtool --show-fec/--set-fec are not supported over ConnectX-6 and ConnectX-6 Dx adapter cards.

Workaround: N/A

Keywords: Ethtool, ConnectX-6 Dx

Discovered in Release: 4.9-0.1.7.0

2119984

Description: IPsec crypto offload does not work when ESN is enabled.

Workaround: N/A

Keywords: IPsec, ESN

Discovered in Release: 4.9-0.1.7.0

2102902

Description: A kernel panic may occur over RH8.0-4.18.0-80.el8.x86_64 OS when opening kTLS offload connection due to a bug in kernel TLS stack.

Workaround: N/A

Keywords: TLS offload, mlx5e

Discovered in Release: 4.9-0.1.7.0

2111534

Description: A Kernel panic may occur over Ubuntu19.04-5.0.0-38-generic OS when opening kTLS offload connection due to a bug in the Kernel TLS stack.

Workaround: N/A

Keywords: TLS offload, mlx5e

Discovered in Release: 4.9-0.1.7.0

2117845

Description: Relaxed ordering memory regions are not supported when working with CAPI. Registering a memory region with relaxed ordering while CAPI is enabled will result in a registration failure.

Workaround: N/A

Keywords: Relaxed ordering, memory region, MR, CAPI

Discovered in Release: 4.9-0.1.7.0

2083942

Description: The content of file /sys/class/net/<NETIF>/statistics/multicast may be out of date and may display values lower than the real values.

Workaround: Run ethtool -S <NETIF> to show the actual multicast counters and to update the content of file /sys/class/net/<NETIF>/statistics/multicast.

Keywords: Multicast counters

Discovered in Release: 4.9-0.1.7.0

2035950

Description: An internal error might take place in the firmware when performing any of the following in VF LAG mode, when at least one VF of either PF is still bound/attached to a VM.

  1. Removing PF from the bond (using ifdown, ip link or any other function)

  2. Attempting to disable SR-IOV

Workaround: N/A

Keywords: VF LAG, binding, firmware, FW, PF, SR-IOV

Discovered in Release: 4.9-0.1.7.0

2094176

Description: When running in a large scale in VF-LAG mode, bandwidth may be unstable.

Workaround: N/A

Keywords: VF LAG

Discovered in Release: 4.9-0.1.7.0

2044544

Description: When working with OSs with Kernel v4.10, bonding module does not allow setting MTUs larger than 1500 on a bonding interface.

Workaround: Upgrade your Kernel version to v4.11 or above.

Keywords: Bonding, MTU, Kernel

Discovered in Release: 4.9-0.1.7.0

1882932

Description: Libibverbs dependencies are removed during OFED installation, requiring manual installation of libraries that OFED does not reinstall.

Workaround: Manually install missing packages.

Keywords: libibverbs, installation

Discovered in Release: 4.9-0.1.7.0

2093746

Description: Devlink health dumps are not supported on kernels lower than v5.3.

Workaround: N/A

Keywords: Devlink, health report, dump

Discovered in Release: 4.9-0.1.7.0

2020260

Description: When changing the Trust mode to DSCP, there is an interval between the change taking effect in the hardware and updating the inline mode of the SQ in the driver. If any traffic is transmitted during this interval, the driver will not inline enough headers, resulting in a CQE error in the NIC.

Workaround: Set the interface down, change the trust mode, then bring the interface back up.

ip link set eth0 down

mlnx_qos -i eth0 --trust dscp

ip link set eth0 up

Keywords: DSCP, inline, SQ, CQE

Discovered in Release: 4.9-0.1.7.0
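
For issue 2020260, the order of the three workaround steps matters: the interface must be down before the trust mode changes. The following is a minimal sketch that wraps the sequence in one function; the function name set_trust_dscp and its optional command-prefix argument are illustrative assumptions.

```shell
# Change the trust mode to DSCP while the interface is down.
# $1: interface name; $2: optional command prefix (e.g. "echo" for a dry run).
set_trust_dscp() {
    local ifname run
    ifname=$1
    run=${2:-}
    # Each step runs only if the previous one succeeded.
    $run ip link set "$ifname" down &&
    $run mlnx_qos -i "$ifname" --trust dscp &&
    $run ip link set "$ifname" up
}

# Dry run: print the commands instead of executing them.
set_trust_dscp eth0 echo
```

Because the chain stops at the first failure, a failed mlnx_qos call leaves the interface down, so check the exit status before relying on the link.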

2083427

Description: For kernels with connection tracking support, neigh update events are not supported, requiring users to have static ARPs to work with OVS and VxLAN.

Workaround: N/A

Keywords: VxLAN, VF LAG, neigh, ARP

Discovered in Release: 4.9-0.1.7.0

2043739

Description: Userspace RoCE UD QPs are not supported over distributions such as SLES11 SP4 and RedHat 6.10 for which the netlink 3 libraries (libnl-3 and libnl-route3) are not available.

Workaround: N/A

Keywords: RoCE UD, QP, SLES, RedHat, RHEL, netlink

Discovered in Release: 4.9-0.1.7.0

2067746

Description: When attaching a second slave to a bond, some bond interface GIDs might disappear.

Workaround: Re-create and re-configure the bond device.

Keywords: Bond, GID

Discovered in Release: 4.9-0.1.7.0

-

Description: The argparse module is installed by default in Python versions >=2.7 and >=3.2. If an older Python version is used, the argparse module is not installed by default.

Workaround: Install the argparse module manually.

Keywords: Python, MFT, argparse, installation

Discovered in Release: 4.7-3.2.9.0

1979834

Description: When running MLNX_OFED on Kernel 4.10 with ConnectX-3/ConnectX-3 Pro NICs, deleting VxLAN may result in a crash.

Workaround: Upgrade the Kernel version to v4.14 to avoid the crash.

Keywords: Kernel, OS, ConnectX-3, VxLAN

Discovered in Release: 4.7-3.2.9.0

1973238

Description: ib_core unload may fail on Ubuntu 18.04.2 OS with the following error message:

"Module ib_core is in use"

Workaround: Stop ibacm.socket using the following commands:

systemctl stop ibacm.socket

systemctl disable ibacm.socket

Keywords: ib_core, Ubuntu, ibacm

Discovered in Release: 4.7-3.2.9.0

1970429

Description: With HW offloading in SR-IOV SwitchDev mode, the fragmented ICMP echo request/reply packets (with length larger than MTU) do not function properly. The correct behavior is for the fragments to miss the offloading flow and go to the slow path. However, the current behavior is as follows.

  • Ingress (to the VM): All echo request fragments miss the corresponding offloading flow, but all echo reply fragments hit the corresponding offloading flow

  • Egress (from the VM): The first fragment still hits the corresponding offloading flow, and the subsequent fragments miss the corresponding offloading flow

Workaround: N/A

Keywords: HW offloading, SR-IOV, SwitchDev, ICMP, VM, virtualization

Discovered in Release: 4.7-3.2.9.0

1969580

Description: RHEL 6.10 OS is not supported in SR-IOV mode.

Workaround: N/A

Keywords: RHEL, RedHat, OS, operating system, SR-IOV, virtualization

Discovered in Release: 4.7-3.2.9.0

1919335

Description: On SLES 11 SP4, RedHat 6.9 and 6.10 OSs, on hosts where OpenSM is running, the low-level driver's internal error reset flow will cause a kernel crash when OpenSM is killed (after the reset occurs). This is due to a bug in these kernels where opening the umad device (by OpenSM) does not take a reference count on the underlying device.

Workaround: Run OpenSM on a host with a more recent Kernel.

Keywords: SLES, RedHat, CR-Dump, OpenSM

Discovered in Release: 4.7-3.2.9.0

1893464

Description: ibacm is not tested with MLNX_OFED or its components.

Workaround: N/A

Keywords: ibacm, component

Discovered in Release: 4.7-1.0.0.1

1921981

Description: On Ubuntu, Debian, and RedHat 8 and above OSs, parsing the mfa2 file using mstarchive might result in a segmentation fault.

Workaround: Use mlxarchive to parse the mfa2 file instead.

Keywords: MFT, mfa2, mstarchive, mlxarchive, Ubuntu, Debian, RedHat, operating system

Discovered in Release: 4.7-1.0.0.1

1921799

Description: MLNX_OFED installation over SLES15 SP1 ARM OSs fails unless --add-kernel-support flag is added to the installation command.

Workaround: N/A

Keywords: SLES, installation

Discovered in Release: 4.7-1.0.0.1

1840288

Description: MLNX_OFED does not support XDP features on RedHat 7 OS, despite the declared support by RedHat.

Workaround: N/A

Keywords: XDP, RedHat

Discovered in Release: 4.7-1.0.0.1

1919335

Description: On SLES 11 SP4, RedHat 6.9 and 6.10 OSs, bringing the OpenSM down after CR-Dump results in a panic.

Workaround: N/A

Keywords: SLES, RedHat, CR-Dump, OpenSM

Discovered in Release: 4.7-1.0.0.1

1821235

Description: When using mlx5dv_dr API for flow creation, for flows which execute the "encapsulation" action or "push vlan" action, metadata C registers will be reset to zero.

Workaround: Use both actions at the end of the flow process.

Keywords: Flow steering

Discovered in Release: 4.7-1.0.0.1

1911130

Description: When Offloaded Traffic Sniffer feature is on, the usage of "all default" flow steering rule could cause a deadlock.

Workaround: N/A

Keywords: Offloaded Traffic Sniffer, steering, deadlock

Discovered in Release: 4.7-1.0.0.1

1897199

Description: When using the RDMA statistics feature and attempting to unbind a QP from a counter, not including the counter-id as an argument in the CLI will result in a segmentation fault.

Workaround: N/A

Keywords: RDMA, QP, segfault, unbinding

Discovered in Release: 4.7-1.0.0.1

1869219

Description: On Fedora 27 OSs, reboot/shutdown operations may fail after uninstalling the MLNX_OFED package.

Workaround: N/A

Keywords: Fedora 27, uninstall, reboot, shutdown

Discovered in Release: 4.7-1.0.0.1

1892663

Description: mlnx_tune script does not support python3 interpreter.

Workaround: Run mlnx_tune with python2 interpreter only.

Keywords: mlnx_tune, python3, python2

Discovered in Release: 4.7-1.0.0.1

1341833

Description: On CoreOS, assigning a static IP address to PKeys using ifcfg configuration file option fails after restarting the driver.

Workaround: Manually run “ifdown” and then “ifup”.

Keywords: CoreOS, PKey, restart_driver

Discovered in Release: 4.6-1.0.1.1

1504785

Description: A lost interrupt issue in pass-through virtual machines may prevent the driver from loading, followed by printing managed pages errors to the dmesg.

Workaround: Restart the driver.

Keywords: VM, virtual machine

Discovered in Release: 4.6-1.0.1.1

1630228

Description: Tunnel stateless offloads are wrongly forbidden for E-Switch manager function.

Workaround: Set the stateless offloads cap to be permanently '1'.

Keywords: Stateless offloads cap

Discovered in Release: 4.6-1.0.1.1

1764415

Description: Unbinding PFs on LAG devices results in a "Failed to modify QP to RESET" error message.

Workaround: N/A

Keywords: RoCE LAG, unbind, PF, RDMA

Discovered in Release: 4.6-1.0.1.1

1769208

Description: Contrary to the standard DSCP mode setting procedure in SR-IOV mode, now, in order for this configuration to take effect, the DSCP trust mode has to be set before the VF is created, and not the other way around.

Workaround: Make sure to set the DSCP trust mode before creating the VF.

Keywords: DSCP, trust mode, VF

Discovered in Release: 4.6-1.0.1.1

1779150

Description: Upgrading the MLNX_OFED version over SLES 15 SP0 and SP1 OSs on PPCLE platforms might fail due to an isert-kmp-default issue.

Workaround: Remove the isert-kmp-default package manually.

Keywords: Installation, SLES, PPCLE

Discovered in Release: 4.6-1.0.1.1

1806565

Description: RoCE default GIDs v1 and v2 are derived from the MAC address of the corresponding netdevice's PCI function, and they resemble the IPv6 address. However, in systems where the IPv6 link local address generated does not depend on the MAC address, RoCEv2 default GID should not be used.

Workaround: Use RoCEv1 default GID.

Keywords: RoCE

Discovered in Release: 4.6-1.0.1.1

1834997

Description: When working with VF Lag while the bond device is in active-active mode, traffic on both physical ports may not reach line rate.

Workaround: N/A

Keywords: VF LAG, bonding, bandwidth degradation, fairness

Discovered in Release: 4.6-1.0.1.1

1839907

Description: In mlx4 devices, enabling RX-FCS offload does not disable LRO, and vice-versa.

Workaround: Disable the RX-FCS or LRO separately.

Keywords: Frame Check Sequence (FCS), Large Receive Offload (LRO)

Discovered in Release: 4.6-1.0.1.1

1735161

Description: Innova cards do not support InfiniBand mode.

Workaround: N/A

Keywords: Innova, IB, InfiniBand

Discovered in Release: 4.6-1.0.1.1

1787667

Description: NVMe-oF driver of MLNX OFED v4.6-x.x.x.x does not function on SLES12 SP4 and SLES15 SP1 OSs, as they have a built-in NVME driver in the Linux image. Therefore, Mellanox NVME and NVME-oF drivers cannot be loaded.

For tracking purposes of this bug, see Bugzilla issue #1150850 and Bugzilla issue #1150846.

Workaround: Change the kernel configuration of NVMe-oF driver to be "=m" and recompile the kernel.

Keywords: NVME-oF, NVME, SLES

Discovered in Release: 4.6-1.0.1.1

1759593

Description: OFED installation on XenServer OSs requires using the -u flag.

Workaround: N/A

Keywords: Installation, XenServer, OS, operating system

Discovered in Release: 4.6-1.0.1.1

1753629

Description: A bonding bug found in Kernels 4.12 and 4.13 may cause a slave to become permanently stuck in BOND_LINK_FAIL state. As a result, the following message may appear in dmesg:

bond: link status down for interface eth1, disabling it in 100 ms

Workaround: N/A

Keywords: Bonding, slave

Discovered in Release: 4.6-1.0.1.1

1734102

Description: Ubuntu v16.04.05 and v16.04.05 OSs can only be used with Kernels of version 4.4.0-143 or below.

Workaround: N/A

Keywords: Ubuntu, Kernel, OS

Discovered in Release: 4.6-1.0.1.1

1712068

Description: Uninstalling MLNX_OFED automatically results in the uninstallation of several libraries that are included in the MLNX_OFED package, such as InfiniBand-related libraries.

Workaround: If these libraries are required, reinstall them using the local package manager (yum/dnf).

Keywords: MLNX_OFED libraries

Discovered in Release: 4.6-1.0.1.1

-

Description: Due to changes in libraries, MFT v4.11.0 and below are not forward compatible with MLNX_OFED v4.6-1.0.0.0 and above.

Therefore, with MLNX_OFED v4.6-1.0.0.0 and above, it is recommended to use MFT v4.12.0 and above.

Workaround: N/A

Keywords: MFT compatible

Discovered in Release: 4.6-1.0.1.1

1730840

Description: On ConnectX-4 HCAs, GID index for RoCE v2 is inconsistent when toggling between enabled and disabled interface modes.

Workaround: N/A

Keywords: RoCE v2, GID

Discovered in Release: 4.6-1.0.1.1

1731005

Description: MLNX_OFED v4.6 YUM and Zypper installations fail on RHEL8.0, SLES15.0 and PPCLE OSs.

Workaround: N/A

Keywords: YUM, Zypper, installation, RHEL, RedHat, SLES, PPCLE

Discovered in Release: 4.6-1.0.1.1

1717428

Description: On kernels 4.10-4.14, MTUs larger than 1500 cannot be set for a GRE interface with any driver (IPv4 or IPv6).

Workaround: Upgrade your kernel to any version higher than v4.14.

Keywords: Fedora 27, gretap, ip_gre, ip_tunnel, ip6_gre, ip6_tunnel

Discovered in Release: 4.6-1.0.1.1

1748343

Description: Driver reload takes several minutes when a large number of VFs exists.

Workaround: N/A

Keywords: VF, SR-IOV

Discovered in Release: 4.6-1.0.1.1

1748537

Description: Cannot set max Tx rate for VFs from the ARM.

Workaround: N/A

Keywords: Host control, max Tx rate

Discovered in Release: 4.6-1.0.1.1

1732940

Description: Software counters not working for representor net devices.

Workaround: N/A

Keywords: mlx5, counters, representors

Discovered in Release: 4.6-1.0.1.1

1733974

Description: Running heavy traffic (such as 'ping flood') while bringing other mlx5 interfaces up and down may result in “INFO: rcu_preempt detected stalls on CPUs/tasks:” call traces.

Workaround: N/A

Keywords: mlx5

Discovered in Release: 4.6-1.0.1.1

1731939

Description: Get/Set Forward Error Correction (FEC) configuration is not supported on ConnectX-6 HCAs at a speed of 200Gbps.

Workaround: N/A

Keywords: Forward Error Correction, FEC, 200Gbps

Discovered in Release: 4.6-1.0.1.1

1715789

Description: Mellanox Firmware Tools (MFT) package is missing from Ubuntu v18.04.2 OS.

Workaround: Manually install MFT.

Keywords: MFT, Ubuntu, operating system

Discovered in Release: 4.6-1.0.1.1

1652864

Description: On ConnectX-3 and ConnectX-3 Pro HCAs, CR-Dump poll is not supported using sysfs commands.

Workaround: If supported in your Kernel, use the devlink tool as an alternative to sysfs to achieve CR-Dump support.

Keywords: mlx4, devlink, CR-Dump

Discovered in Release: 4.6-1.0.1.1

1699031

Description: When attempting to destroy IPoIB bonding interface on PPCLE setups, a leak of resources might occur.

Workaround: N/A

Keywords: IPoIB, bonding, PPCLE

Discovered in Release: 4.6-1.0.1.1

-

Description: On ConnectX-6 HCAs and above, an attempt to configure advertisement (any bitmap) will result in advertising all capabilities.

Workaround: N/A

Keywords: 200Gbps, advertisement, Ethtool

Discovered in Release: 4.6-1.0.1.1

1699289

Description: HW LRO feature is disabled OOB, which results in increased CPU utilization on the Receive side. On ConnectX-5 adapter cards and above, this causes a bandwidth drop for a few streams.

Workaround: Make sure to enable HW LRO in the driver:

ethtool -K <intf> lro on

ethtool --set-priv-flag <intf> hw_lro on

Keywords: HW LRO, ConnectX-5 and above

Discovered in Release: 4.5-1.0.1.0

1583487

Description: MPI package is not part of MLNX_OFED package in Fedora 28 OS.

Workaround: Manually install MPI package.

Keywords: MPI package, Fedora

Discovered in Release: 4.5-1.0.1.0

1403313

Description: Attempting to allocate an excessive number of VFs per PF in operating systems with kernel versions below v4.15 might fail due to a known issue in the Kernel.

Workaround: Make sure to update the Kernel version to v4.15 or above.

Keywords: VF, PF, IOMMU, Kernel, OS

Discovered in Release: 4.5-1.0.1.0

-

Description: NEO-Host is not supported on the following OSs:

  • SLES12 SP3

  • SLES12 SP4

  • SLES15

  • Fedora 28

  • RHEL7.1

  • RHEL7.4 ALT (Pegas1.0)

  • RHEL7.5

  • RHEL7.6

  • XenServer 4.9

Workaround: N/A

Keywords: NEO-Host, operating systems

Discovered in Release: 4.5-1.0.1.0

1521877

Description: On SLES 12 SP1 OSs, a kernel tracepoint issue may cause undefined behavior when inserting a kernel module with a wrong parameter.

Workaround: N/A

Keywords: mlx5 driver, SLES 12 SP1

Discovered in Release: 4.5-1.0.1.0

1547200

Description: When running IPoIB connected traffic with multicasts in parallel, SKB crashes.

Workaround: N/A

Keywords: IPoIB, SKB

Discovered in Release: 4.5-1.0.1.0

1504073

Description: When using ConnectX-5 with LRO over PPC systems, the HCA might experience back pressure due to delayed PCI Write operations. In this case, bandwidth might drop from line-rate to ~35Gb/s. Packet loss or pause frames might also be observed.

Workaround: Look for an indication of PCI back pressure (the “outbound_pci_stalled_wr” counter in ethtool advancing). Disabling LRO helps reduce the back pressure and its effects.

Keywords: Flow Control, LRO

Discovered in Release: 4.4-1.0.0.0
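
For issue 1504073, "advancing" means that the counter value grows between two samples. The following is a minimal sketch of that check, assuming the usual "name: value" layout of ethtool -S output; the helper names read_counter and counter_advanced, the interface name, and the sampling interval are illustrative assumptions.

```shell
# Print the value of counter $1 from `ethtool -S`-style text on stdin.
read_counter() {
    awk -v name="$1" -F': *' '
        { sub(/^[ \t]+/, "", $1) }      # strip the leading indentation
        $1 == name { print $2; exit }
    '
}

# Succeed if the counter grew between two samples ($1 = before, $2 = after).
counter_advanced() {
    [ "$2" -gt "$1" ]
}

# Usage on a live system:
#   before=$(ethtool -S eth0 | read_counter outbound_pci_stalled_wr)
#   sleep 5
#   after=$(ethtool -S eth0 | read_counter outbound_pci_stalled_wr)
#   counter_advanced "$before" "$after" && echo "PCI back pressure detected"
```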

1424233

Description: On RHEL v7.3, 7.4 and 7.5 OSs, setting IPv4-IP-forwarding will turn off LRO on existing interfaces. Turning LRO back on manually using ethtool and adding a VLAN interface may cause a warning call trace.

Workaround: Make sure IPv4-IP-forwarding and LRO are not turned on at the same time.

Keywords: IPv4 forwarding, LRO

Discovered in Release: 4.4-1.0.0.0

1418447

Description: When working in IPoIB ULP (non-enhanced) mode, IPv6 may disappear in case ring size is changed dynamically (while the driver is running).

Workaround: There are three workarounds for this issue:

  • Perform static configuration of ring size instead of dynamic configuration

  • If you have already run dynamic configuration, run ifdown and then ifup afterwards

  • On supported kernels, enable keep_addr_on_down IPv6 sysfs parameter before configuring the ring size dynamically

Keywords: IPoIB, ULP mode, ring size

Discovered in Release: 4.4-1.0.0.0

1442507

Description: Retpoline support in GCC causes an increase in CPU utilization, which results in a 15% performance drop in IP forwarding.

Workaround: N/A

Keywords: Retpoline, GCC, CPU, IP forwarding, Spectre attack

Discovered in Release: 4.4-1.0.0.0

1417414

Description: When working with old kernel versions that do not include the unregister_netdevice_notifier function fix (introduced in the “net: In unregister_netdevice_notifier unregister the netdevices” commit), reloading the ib_ipoib module using modprobe will fail with the following error message: “Cannot allocate memory”.

Workaround: Instead of using modprobe, reload the driver by running:

/etc/init.d/openibd restart

Keywords: IPoIB

Discovered in Release: 4.4-1.0.0.0

1400381

Description: On SLES 11 SP3 PPC64 OSs, a memory allocation issue may prevent the interface from loading after reboot, resulting in a call trace in the message log.

Workaround: Restart the driver.

Keywords: SLES11 SP3

Discovered in Release: 4.4-1.0.0.0

1425129

Description: MLNX_OFED cannot be installed on SLES 15 OSs using Zypper repository.

Workaround: Install MLNX_OFED using the standard installation script instead of Zypper repository.

Keywords: Installation, SLES, Zypper

Discovered in Release: 4.4-1.0.0.0

1241056

Description: When working with ConnectX-4/ConnectX-5 HCAs on PPC systems with Hardware LRO and Adaptive Rx support, bandwidth drops from full wire speed (FWS) to ~60Gb/s.

Workaround: Make sure to disable Adaptive Rx when enabling Hardware LRO:

ethtool -C <interface> adaptive-rx off

ethtool -C <interface> rx-usecs 8 rx-frames 128

Keywords: Hardware LRO, Adaptive Rx, PPC

Discovered in Release: 4.3-1.0.1.0

1090612

Description: NVMEoF protocol does not support LBA format with non-zero metadata size. Therefore, NVMe namespace configured to LBA format with metadata size bigger than 0 will cause Enhanced Error Handling (EEH) in PowerPC systems.

Workaround: Configure the NVMe namespace to use LBA format with zero sized metadata.

Keywords: NVMEoF, PowerPC, EEH

Discovered in Release: 4.3-1.0.1.0

1243581

Description: In switchdev mode, the IB device exposed does not support MADs. As a result, tools such as ibstat that work with MADs will not function properly.

Workaround: N/A

Keywords: switchdev, IB representors, mlx5, MADs

Discovered in Release: 4.3-1.0.1.0

1309621

Description: In switchdev mode default configuration, stateless offloads/steering based on inner headers is not supported.

Workaround: To enable stateless offloads/steering based on inner headers, disable encap by running:

devlink dev eswitch set pci/0000:83:00.1 encap disable

Or, in case devlink is not supported by the kernel, run:

echo none > /sys/kernel/debug/mlx5/<BDF>/compat/encap

Note: This is a hardware-related limitation.

Keywords: switchdev, stateless offload, steering

Discovered in Release: 4.3-1.0.1.0

1268718

Description: ConnectX-5 supports up to 62 IB representors. When attempting to move to switchdev mode where more than 62 VFs are initialized, the call will fail with the following error message:

devlink answers: Invalid argument

Workaround: N/A

Keywords: ConnectX-5, IB representors

Discovered in Release: 4.3-1.0.1.0
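
For issue 1268718, the VF count can be checked against the 62-representor limit before attempting the switch to switchdev mode. The following is a minimal sketch; the function name vf_count_fits_switchdev is a hypothetical helper, and the sysfs path in the usage comment assumes the standard SR-IOV sriov_numvfs attribute of the PF network device.

```shell
# Maximum number of IB representors on ConnectX-5, per the description above.
MAX_IB_REPRESENTORS=62

# Succeed if the VF count in $1 fits within the representor limit.
vf_count_fits_switchdev() {
    [ "$1" -le "$MAX_IB_REPRESENTORS" ]
}

# Usage on a live system:
#   vf_count_fits_switchdev "$(cat /sys/class/net/<NETIF>/device/sriov_numvfs)" \
#       || echo "Reduce the number of VFs before moving to switchdev mode"
```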

1275082

Description: When setting a non-default IPv6 link local address or an address that is not based on the device MAC, connection establishments over RoCEv2 might fail.

Workaround: N/A

Keywords: IPV6, RoCE, link local address

Discovered in Release: 4.3-1.0.1.0

1307336

Description: In RoCE LAG mode, when running ibdev2netdev -v, the port state of the second port of the mlx4_0 IB device will read “NA” since this IB device does not have a second port.

Workaround: N/A

Keywords: mlx4, RoCE LAG, ibdev2netdev, bonding

Discovered in Release: 4.3-1.0.1.0

1316654

Description: PKEY interface receives PTP delay requests without a time-stamp.

Workaround: Run ptp4l over the parent interface.

Keywords: PKEY, PTP

Discovered in Release: 4.3-1.0.1.0

1296355

Description: Number of MSI-X that can be allocated for VFs and PFs in total is limited to 2300 on Power9 platforms.

Workaround: N/A

Keywords: MSI-X, VF, PF, PPC, SR-IOV

Discovered in Release: 4.3-1.0.1.0

1294934

Description: Firmware reset might cause Enhanced Error Handling (EEH) on Power7 platforms.

Workaround: N/A

Keywords: EEH, PPC

Discovered in Release: 4.3-1.0.1.0

1259293

Description: On Fedora 20 operating systems, driver load fails with an error message such as: “[185.262460] kmem_cache_sanity_check (fs_ftes_0000:00:06.0): Cache name already exists.”

This is caused by SLUB allocators grouping multiple slab kmem_cache_create calls into one slab cache alias to save memory and increase cache hotness. As a result, the slab name is considered stale.

Workaround: Upgrade the kernel version to kernel-3.19.8-100.fc20.x86_64.

Note that after rebooting to the new kernel, you will need to rebuild MLNX_OFED against the new kernel version.

Keywords: Fedora, driver load

Discovered in Release: 4.3-1.0.1.0

1264359

Description: When running perftest (ib_send_bw, ib_write_bw, etc.) in rdma-cm mode, the resp_cqe_error counter under /sys/class/infiniband/mlx5_0/ports/1/hw_counters/resp_cqe_error might increase. This behavior is expected and it is a result of receive WQEs that were not consumed.

Workaround: N/A

Keywords: perftest, RDMA CM, mlx5

Discovered in Release: 4.3-1.0.1.0

1294575

Description: Traffic may hang while working in IPoIB SR-IOV environment.

Workaround: N/A

Keywords: IPoIB, SR-IOV

Discovered in Release: 4.3-1.0.1.0

1227577

Description: Due to Enhanced IPoIB’s lack of priority-based flow control, PTP accuracy may adversely be affected by heavy TCP traffic.

Workaround: N/A

Keywords: Enhanced IPoIB, PTP

Discovered in Release: 4.3-1.0.1.0

1264956

Description: Configuring SR-IOV after disabling RoCE LAG using sysfs (/sys/bus/pci/drivers/mlx5_core/<bdf>/roce_lag_enable) might result in RoCE LAG being enabled again in case SR-IOV configuration fails.

Workaround: Make sure to disable RoCE LAG once again.

Keywords: RoCE LAG, SR-IOV

Discovered in Release: 4.3-1.0.1.0

1263043

Description: On RHEL7.4, due to an OS issue introduced in kmod package version 20-15.el7_4.6, parsing the depmod configuration files will fail, resulting in either of the following issues:

  • Driver restart failure prompting an error message, such as: “ERROR: Module mlx5_core belong to kernel which is not a part of MLNX_OFED, skipping...”

  • nvmet_rdma kernel module dysfunction, despite installing MLNX_OFED using the "--with-nvmf" option. An error message such as “nvmet_rdma: unknown parameter 'offload_mem_start' ignored” will be seen in the dmesg output.

Workaround: Upgrade the kmod package to a newer version available from the RedHat webpage.

Keywords: driver restart, kmod, kmp, nvmf, nvmet_rdma

Discovered in Release: 4.2-1.2.0.0

1229160

Description: Changing IPoIB Tx/Rx ring size dynamically using ethtool is not permitted.

Workaround: Use the send_queue_size/recv_queue_size module parameters to change the Tx/Rx ring size.

Keywords: IPoIB, queue size

Discovered in Release: 4.2-1.2.0.0

1214477

Description: On RedHat 7.2 operating systems, when Network Manager is enabled, IPoIB interfaces may not get an IPv6 address due to an issue in the Network Manager.

Workaround: Disable Network Manager or upgrade its version.

Keywords: Network Manager, IPoIB, IPv6

Discovered in Release: 4.2-1.2.0.0

-

Description: Packet Size (Actual Packet MTU) limitation for IPsec offload on Innova IPsec adapter cards: The current offload implementation does not support IP fragmentation. The original packet size should be such that it does not exceed the interface's MTU size after the ESP transformation (encryption of the original IP packet which increases its length) and the headers (outer IP header) are added:

  • Inner IP packet size <= I/F MTU - ESP additions (20) - outer_IP (20) - fragmentation issue reserved length (56)

  • Inner IP packet size <= I/F MTU - 96

This mostly affects forwarded traffic into smaller MTU, as well as UDP traffic. TCP does PMTU discovery by default and clamps the MSS accordingly.
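
The size limit above can be worked through as a short calculation. The following is a minimal sketch, not part of the driver or its tools: the function and constant names are hypothetical, and only the three overhead values (20, 20 and 56 bytes) come from the description.

```python
# Overheads in bytes, as listed in the limitation above.
ESP_ADDITIONS = 20
OUTER_IP_HEADER = 20
FRAGMENTATION_RESERVED = 56


def max_inner_packet_size(interface_mtu: int) -> int:
    """Largest inner IP packet size that stays within the interface MTU."""
    overhead = ESP_ADDITIONS + OUTER_IP_HEADER + FRAGMENTATION_RESERVED  # 96 bytes
    return interface_mtu - overhead


def fits_without_fragmentation(inner_packet_size: int, interface_mtu: int) -> bool:
    """True if the inner packet satisfies: size <= I/F MTU - 96."""
    return inner_packet_size <= max_inner_packet_size(interface_mtu)


if __name__ == "__main__":
    # With an interface MTU of 1500, the inner packet may be at most 1500 - 96 bytes.
    print(max_inner_packet_size(1500))
```

For example, with an interface MTU of 1500 bytes the inner IP packet must not exceed 1404 bytes.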

Workaround: N/A

Keywords: Innova IPsec, MTU

Discovered in Release: 4.2-1.0.0.0

-

Description: No LLC/SNAP support on Innova IPsec adapter cards.

Workaround: N/A

Keywords: Innova IPsec, LLC/SNAP

Discovered in Release: 4.2-1.0.0.0

-

Description: No support for FEC on Innova IPsec adapter cards. When using switches, the switch configuration may need to be changed.

Workaround: N/A

Keywords: Innova IPsec, FEC

Discovered in Release: 4.2-1.0.0.0

955929

Description: Heavy traffic may cause SYN flooding when using Innova IPsec adapter cards.

Workaround: N/A

Keywords: Innova IPsec, SYN flooding

Discovered in Release: 4.2-1.0.0.0

-

Description: Priority Based Flow Control is not supported on Innova IPsec adapter cards.

Workaround: N/A

Keywords: Innova IPsec, Priority Based Flow Control

Discovered in Release: 4.2-1.0.0.0

-

Description: Pause configuration is not supported when using Innova IPsec adapter cards. Default pause is global pause (enabled).

Workaround: N/A

Keywords: Innova IPsec, Global pause

Discovered in Release: 4.2-1.0.0.0

1045097

Description: Connecting and disconnecting a cable several times may cause a link up failure when using Innova IPsec adapter cards.

Workaround: N/A

Keywords: Innova IPsec, Cable, link up

Discovered in Release: 4.2-1.0.0.0

-

Description: On Innova IPsec adapter cards, supported MTU is between 512 and 2012 bytes. Setting MTU values outside this range might fail or might cause traffic loss.

Workaround: Set MTU between 512 and 2012 bytes.

Keywords: Innova IPsec, MTU

Discovered in Release: 4.2-1.0.0.0

1177196

Description: If the OpenSM version is 4.8.1 or below, the IB interfaces link remains Down while the "SRIOV_IB_ROUTING_MODE_P1=1" and "SRIOV_IB_ROUTING_MODE_P2=1" flags are enabled in the HCA.

Workaround: N/A

Keywords: OpenSM, SR-IOV, IB link

Discovered in Release: 4.2-1.0.0.0

1118530

Description: On kernel versions 4.10-4.13, when resetting sriov_numvfs to 0 on PowerPC systems, the following dmesg warning will appear:

mlx5_core <BDF>: can't update enabled VF BAR0

Workaround: Reboot the system to reset sriov_numvfs value.

Keywords: SR-IOV, numvfs

Discovered in Release: 4.2-1.0.0.0

1125184

Description: In old kernel versions, such as Ubuntu 14.04 and RedHat 7.1, VXLAN interface does not reply to ARP requests for a MAC address that exists in its own ARP table. This issue was fixed in the following newer kernel versions: Ubuntu 16.04 and RedHat 7.3.

Workaround: N/A

Keywords: ARP, VXLAN

Discovered in Release: 4.2-1.0.0.0

1171764

Description: Connecting multiple ports on the same server to the same subnet (IP/IB) will cause all interfaces connected to that subnet to respond to ARP requests. As a result, wrong ARP replies might be received when trying to resolve IP addresses.

Workaround: Run the following to make sure only the interface with the requested IP address responds to the ARP request:

sysctl -w net.ipv4.conf.all.arp_ignore=1

Keywords: IPoIB, librdmacm, ARP

Discovered in Release: 4.2-1.0.0.0

1134323

Description: When using kernel versions older than version 4.7 with IOMMU enabled, performance degradation and logical issues (such as soft lockup) might occur under high traffic load. This is because IOMMU IOVA allocations are centralized, requiring many synchronization operations and high locking overhead among CPUs.

Workaround: Use kernel v4.7 or above, or a backported kernel that includes the following patches:

  • 2aac630429d9 iommu/vt-d: change intel-iommu to use IOVA frame numbers

  • 9257b4a206fc iommu/iova: introduce per-cpu caching to iova allocation

  • 22e2f9fa63b0 iommu/vt-d: Use per-cpu IOVA caching
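
As a rough aid, the kernel version threshold above can be checked programmatically. The sketch below is an illustration with hypothetical helper names; it compares only the major and minor version numbers and cannot detect whether a backported kernel already carries the listed patches.

```python
import platform
import re


def kernel_version_tuple(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '3.10.0-693.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least(release: str, minimum: tuple = (4, 7)) -> bool:
    """True if the kernel release is at or above the given (major, minor) version."""
    return kernel_version_tuple(release) >= minimum


if __name__ == "__main__":
    running = platform.release()
    if is_at_least(running):
        print(f"{running}: kernel is v4.7 or above")
    else:
        print(f"{running}: older than v4.7, verify that the patches are backported")
```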

Keywords: IOMMU, soft lockup

Discovered in Release: 4.2-1.0.0.0

1135738

Description: On 64k page size setups, DMA memory might run out when trying to increase the ring size/number of channels.

Workaround: Reduce the ring size/number of channels.

Keywords: DMA, 64K page

Discovered in Release: 4.2-1.0.0.0

1159650

Description: When configuring VF VST, VLAN-tagged outgoing packets will be dropped in case of ConnectX-4 HCAs. In case of ConnectX-5 HCAs, VLAN-tagged outgoing packets will have another VLAN tag inserted.

Workaround: N/A

Keywords: VST

Discovered in Release: 4.2-1.0.0.0

1157770

Description: On Passthrough/VM machines with relatively old QEMU and libvirtd, a CMD timeout might occur upon driver load. After the timeout, no other commands will be completed and all driver operations will be stuck.

Workaround: Upgrade the QEMU and libvirtd on the KVM server.

The following versions were tested (on Ubuntu 16.10):

  • libvirt 2.1.0

  • QEMU 2.6.1

Keywords: QEMU

Discovered in Release: 4.2-1.0.0.0

1147703

Description: Using dm-multipath for High Availability on top of NVMEoF block devices must be done with the “directio” path checker.

Workaround: N/A

Keywords: NVMEoF

Discovered in Release: 4.2-1.0.0.0

1152408

Description: RedHat v7.3 PPCLE and v7.4 PPCLE operating systems do not support KVM qemu out of the box. The following error message will appear when attempting to run virt-install to create new VMs:

Cant find qemu-kvm packge to install

Workaround: Obtain the following RPMs from the beta version of 7.4ALT and install them on 7.3/7.4 PPCLE (in the order listed):

  • qemu-img-.el7a.ppc64le.rpm

  • qemu-kvm-common-.el7a.ppc64le.rpm

  • qemu-kvm-.el7a.ppc64le.rpm

Keywords: Virtualization, PPC, Power8, KVM, RedHat, PPC64LE

Discovered in Release: 4.2-1.0.0.0

1012719

Description: A soft lockup in the CQ polling flow might occur when running very high stress on the GSI QP (RDMA-CM applications). This is a transient situation from which the driver will later recover.

Workaround: N/A

Keywords: RDMA-CM, GSI QP, CQ

Discovered in Release: 4.2-1.0.0.0

1062940

Description: When running Network Manager on devices on which Enhanced IPoIB is enabled, CONNECTED_MODE can only be set to NO/AUTO. Setting it to YES will prevent the interface from being configured.

Workaround: N/A

Keywords: Enhanced IPoIB, network manager, connected_mode

Discovered in Release: 4.2-1.0.0.0

1078630

Description: When working in RoCE LAG over kernel v3.10, a kernel crash might occur when unloading the driver while the Network Manager is running.

Workaround: Stop the Network Manager before unloading the driver and start it back once the driver unload is complete.

Keywords: RoCE LAG, network manager

Discovered in Release: 4.2-1.0.0.0

1149557

Description: When setting VGT+, the maximum number of allowed VLAN IDs presented in sysfs is 813 (only the first 813 are shown).

Workaround: N/A

Keywords: VGT+

Discovered in Release: 4.2-1.0.0.0

1122619

Description: On Arm setups, DMA memory resource is limited due to a default CMA limitation.

Workaround: Increase the CMA limit or disable its use, using the kernel command-line parameters:

  • Add the parameter cma=256M to increase the CMA limit to 256MB

  • Add the parameter cma=0 to disable the use of CMA
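
To confirm which CMA setting the running kernel was booted with, the kernel command line can be inspected. The following is a minimal sketch with a hypothetical helper name; it assumes the command line is read from /proc/cmdline and only reports the cma= value, it does not change it.

```python
from typing import Optional


def cma_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last cma= parameter, or None if it is absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("cma="):
            value = token[len("cma="):]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        setting = cma_setting(f.read())
    if setting is None:
        print("no cma= parameter set; the default CMA limit applies")
    else:
        print(f"cma={setting}")
```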

Keywords: IPoIB, CMA

Discovered in Release: 4.2-1.0.0.0

1146837

Description: On SLES11 SP1 operating system, IPoIB interface renaming process may fail due to a broken udev rule, leaving interfaces with names like ib0_rename.

Workaround:

  1. Open the udev conf file "/etc/udev/rules.d/70-persistent-net.rules", and remove lines such as SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", =="" , NAME="eth0".

  2. Reload the driver stack.

Keywords: IPoIB

Discovered in Release: 4.2-1.0.0.0

-

Description: NVMEoF support is available for the following:

  • SLES 12.3 and above

  • RHEL 7.2 and above (Host side only)

  • RHEL 7.4 and above (Host and Target side)

  • OS with distribution/custom kernel >= 4.8.x

Workaround: N/A

Keywords: NVMEoF Host/Target

995665/1165919

Description: In kernels below v4.13, connection between NVMEoF host and target cannot be established in a hyper-threaded system with more than 1 socket.

Workaround: On the host side, connect to NVMEoF subsystem using --nr-io-queues <num_queues> flag.

Note that num_queues must be lower than or equal to num_sockets multiplied by num_cores_per_socket.
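
The bound on num_queues can be illustrated as follows. This is a sketch with hypothetical function names; the lower bound of one queue is an assumption made for the example and is not stated in the note above.

```python
def max_nr_io_queues(num_sockets: int, num_cores_per_socket: int) -> int:
    """Upper bound for --nr-io-queues: sockets multiplied by cores per socket."""
    return num_sockets * num_cores_per_socket


def is_valid_nr_io_queues(num_queues: int, num_sockets: int,
                          num_cores_per_socket: int) -> bool:
    """True if num_queues is within the bound (at least 1 is assumed here)."""
    return 1 <= num_queues <= max_nr_io_queues(num_sockets, num_cores_per_socket)


if __name__ == "__main__":
    # Example: a 2-socket host with 8 physical cores per socket.
    print(max_nr_io_queues(2, 8))
```

For a 2-socket host with 8 cores per socket, any value up to 16 satisfies the condition, regardless of how many hyper-threads the system exposes.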

Keywords: NVMEoF

1039346

Description: Enabling multiple namespaces per subsystem while using NVMEoF target offload is not supported on ConnectX-5 adapter cards.

Workaround: To enable more than one namespace, create a subsystem for each one.

Keywords: NVMEoF Target Offload, namespace

1072347

Description: Ethtool -i <ibx> displays an incorrect driver name for devices with enhanced IPoIB support.

Workaround: N/A

Keywords: Enhanced IPoIB, Ethtool

1071457

Description: PKEY-related limitations in enhanced IPoIB:

  • Since the parent interface ib<x> and the child interface ib<x>.yyyy share the same receive resources, the parent interface’s MTU cannot be less than the child interface’s MTU

  • Interface counters and Ethtool control are not supported on child interfaces

  • Parent interface should be in UP state to enable child interface to receive traffic

Workaround: N/A

Keywords: PKEY, Enhanced IPoIB, MTU, Ethtool, Interface Counters

1059451

Description: When Enhanced IPoIB is enabled, the following module parameters will not be functional:

  • send_queue_size

  • recv_queue_size

  • max_nonsrq_conn_qp

Workaround: N/A

Keywords: Enhanced IPoIB

1030301

Description: Creating virtual functions on a device that is in LAG mode will destroy the LAG configuration. The bonding device over the Ethernet NICs will continue to work as expected.

Workaround: N/A

Keywords: LAG, SR-IOV

1047616

Description: When node GUID of a device is set to zero (0000:0000:0000:0000), RDMA_CM user space application may crash.

Workaround: Set node GUID to a nonzero value.
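
A zero node GUID can be detected before starting an RDMA_CM application. The sketch below uses a hypothetical helper name and assumes the GUID is exposed as colon-separated hexadecimal groups in /sys/class/infiniband/<device>/node_guid; the device name mlx5_0 is only an example.

```python
def is_zero_guid(guid: str) -> bool:
    """True if a GUID such as '0000:0000:0000:0000' has all bits cleared."""
    hex_digits = guid.strip().replace(":", "")
    return int(hex_digits, 16) == 0


if __name__ == "__main__":
    # Assumed sysfs location of the node GUID; adjust the device name as needed.
    path = "/sys/class/infiniband/mlx5_0/node_guid"
    with open(path) as f:
        guid = f.read()
    if is_zero_guid(guid):
        print("node GUID is zero; set a nonzero value before using RDMA_CM")
    else:
        print(f"node GUID: {guid.strip()}")
```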

Keywords: RDMA_CM

1061298

Description: Since enhanced IPoIB does not support connected mode on RedHat operating systems, when using Network Manager and enhanced IPoIB capable devices, CONNECTED_MODE must be set to NO/AUTO.

Setting CONNECTED_MODE to YES will prevent the interface from being configured.

Workaround: N/A

Keywords: Enhanced IPoIB

1068215

Description: When enhanced IPoIB mode is enabled, ring size limit is 8k. When it is disabled, ring size limit is decreased to 4k.

Workaround: N/A

Keywords: Enhanced IPoIB

1051701

Description: New versions of iproute which support new kernel features may misbehave on old kernels that do not support these new features.

Workaround: N/A

Keywords: iproute

1007830

Description: When working on a Xenserver hypervisor with SR-IOV enabled, make sure the following instructions are applied:

  1. Right after enabling SR-IOV, unbind all driver instances of the virtual functions from their PCI slots.

  2. Do not unbind the PF driver instance while there are active VFs.

Workaround: N/A

Keywords: SR-IOV

1008583

Description: A soft lockup in the CQ polling flow might occur when running very high stress on the GSI QP (RDMA-CM applications). This is a transient situation and the driver recovers from it after a while.

Workaround: N/A

Keywords: RDMA-CM

1007356

Description: Creating a PKEY interface using “ip link” is not supported.

Workaround: Use sysfs to create a PKEY interface.
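
A minimal sketch of the sysfs method follows. It assumes the standard IPoIB create_child attribute under /sys/class/net/<parent>/, which the workaround above does not name explicitly; the function names, the parent interface ib0 and the PKEY value 0x8001 are illustrative only. Writing to sysfs requires root privileges.

```python
from pathlib import Path


def pkey_child_request(parent: str, pkey: int, sysfs_root: str = "/sys/class/net"):
    """Return the (sysfs path, value) pair used to request a PKEY child interface."""
    if not 0 < pkey <= 0xFFFF:
        raise ValueError(f"PKEY out of range: {pkey:#x}")
    # Assumption: IPoIB exposes 'create_child' under the parent interface directory.
    path = Path(sysfs_root) / parent / "create_child"
    return path, f"0x{pkey:04x}"


def create_pkey_child(parent: str, pkey: int, sysfs_root: str = "/sys/class/net") -> None:
    """Write the PKEY to the parent's create_child attribute (requires root)."""
    path, value = pkey_child_request(parent, pkey, sysfs_root)
    path.write_text(value)


if __name__ == "__main__":
    path, value = pkey_child_request("ib0", 0x8001)
    print(f"would write {value} to {path}")
```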

Keywords: IPoIB, PKEY

1000197

Description: Displaying multicast groups using sysfs may not show all the entries on Fedora 23 OS.

Workaround: N/A

Keywords: IPoIB

1010148

Description: Upgrading from MLNX_OFED v3.x to v4.x using yum and apt-get repositories fails.

Workaround: Remove MLNX_OFED v3.x using the ofed_uninstall.sh script, and only then install MLNX_OFED v4.x as usual.

Keywords: Installation

1005786

Description: When using ConnectX-5 adapter cards, the following error might be printed to dmesg, indicating temporary lack of DMA pages:

“mlx5_core ... give_pages:289:(pid x): Y pages alloc time exceeded the max permitted duration

mlx5_core ... page_notify_fail:263:(pid x): Page allocation failure notification on func_id(z) sent to fw

mlx5_core ... pages_work_handler:471:(pid x): give fail -12”

Example: This might happen when trying to open more than 64 VFs per port.

Workaround: N/A

Keywords: mlx5_core, DMA

1008066/1009004

Description: Performing some operations on the user end during reboot might cause call trace/panic, due to bugs found in the Linux kernel.

For example: Running get_vf_stats (via iptool) during reboot.

Workaround: N/A

Keywords: mlx5_core, reboot

1009488

Description: Mounting MLNX_OFED to a path that contains special characters, such as parentheses or spaces, is not supported. For example, when mounting MLNX_OFED to “/media/CDROM(vcd)/”, installation will fail and the following error message will be displayed:

# cd /media/CDROM\(vcd\)/

# ./mlnxofedinstall

sh: 1: Syntax error: "(" unexpected

Workaround: N/A

Keywords: Installation

982144

Description: When the offload traffic sniffer is on, the bandwidth could decrease by up to 50%.

Workaround: N/A

Keywords: Offload Traffic Sniffer

981045

Description: On kernels below v4.2, when removing a bonding module with devices different from ARPHRD_ETHER, a call trace may be received.

Workaround: Remove the bond in the following order:

Remove the slaves, delete the bond, and only then remove the bonding module.

Keywords: Bonding

980066/981314

Description: Soft RoCE does not support Extended Reliable Connection (XRC).

Workaround: N/A

Keywords: Soft RoCE, XRC

982534

Description: In ConnectX-3, when using a server with page size of 64K, the UAR BAR will become too small. This may cause one of the following issues:

  1. The mlx4_core driver does not load.

  2. The mlx4_core driver does load, but calls to ibv_open_device may return ENOMEM errors.

Workaround:

  1. Add the following parameter in the firmware's ini file under [HCA] section: log2_uar_bar_megabytes = 7

  2. Re-burn the firmware with the new ini file.

Keywords: PPC

981362

Description: On several OSs, setting the number of TCs via the tc tool is not supported.

Workaround: Set the number of TC via the /sys/class/net/<interface>/qos/tc_num sysfs file.

Keywords: Ethernet, TC

980257

Description: An issue in InfiniBand bond interfaces may cause memory corruption in Ubuntu v14.04 and v14.10 OSs.

The memory corruption happens when attempting to reload the driver while the bond is up with InfiniBand slaves.

Workaround: Delete the bond before restarting the driver.

Keywords: Bonding, IPoIB

980034/981311

Description: Soft RoCE counters located under /sys/class/infiniband/<rxe-inf>/ports/1/counters/ directory are not supported.

Workaround: N/A

Keywords: Soft RoCE

979907

Description: Only the following two experimental verbs are supported for Soft RoCE:

  • ibv_exp_query_device

  • ibv_exp_poll_cq

Workaround: N/A

Keywords: Soft RoCE

979457

Description: When setting IOMMU=ON, a severe performance degradation may occur due to a bug in IOMMU.

Workaround: Make sure the following patches are found in your kernel:

  • iommu/vt-d: Fix PASID table allocation

  • iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions

Note: These patches are already available in Ubuntu 16.04.02 and 17.04 OSs.

Keywords: Performance, IOMMU

977852

Description: rdma_cm running over IB ports does not support UD QPs on ConnectX-3 adapter cards.

Workaround: N/A

Keywords: SR-IOV, RDMA CM

955113/977990

Description: In RoCE LAG over ConnectX-4 adapter cards, the script ibdev2netdev may show a wrong port state for the bonded device. This means that although the IB device/port mlx5_bond_0/1 is up (as seen in ibstat), ibdev2netdev may report that it is down.

Workaround: N/A

Keywords: RoCE, LAG, bonding

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.