Known Issues

Linux Kernel Upstream Release Notes v6.5

For the list of old Known Issues, please see the relevant Release Notes version.

Internal Ref.

Issue

3150126

Description: On ConnectX-4 and ConnectX-4 Lx, when using Hardware QoS Offload revision 2 and running RDMA from the VF, only the RX inbound counters will be increased in the RDMA activity counters, and the bytes and frames counters will be the same.

Note: There is no functional impact on the actual traffic; only the counter values are wrong.

Workaround: N/A

Keywords: Hardware QoS Offload, RDMA activity counters

Detected in version: 3.0.50000

2938362

Description: Due to a bug in the OS, nested VMs do not work on Windows Server 2022.

In such cases, the driver does not load and the user is presented with a Code 12 error (not enough resources).

Workaround: N/A

Keywords: Nested VM, Windows Server 2022

Detected in version: N/A

2970608

Description: In some cases, when running "mlx5cmd -linkspeed" and the link takes more than 5 seconds to come up, the command returns an error although the link is up.
This error can be safely ignored.

Workaround: N/A

Keywords: "mlx5cmd -linkspeed"

Detected in version: 2.90.50010

3040551

Description: The Set/Get/Enable-NetAdapterEncapsulatedPacketTaskOffload PowerShell commands are not supported by default when working in NIC mode on NVIDIA BlueField-2 DPU.
These commands will fail in this mode because the encapsulation registry keys (*EncapsulatedPacketTaskOffload, *EncapsulatedPacketTaskOffloadNvgre, *EncapsulatedPacketTaskOffloadVxlan) are missing.
However, as the encapsulation is still enabled by default, the user can configure encapsulation without these commands.

Workaround: Manually add these keys. Note that the keys must be removed or disabled when switching to Smart NIC mode. For instructions on how to add the keys, please see Configuring the Driver Registry Keys.

Keywords: NVIDIA BlueField-2, NIC Mode, NVGRE, VXLAN

Detected in version: 2.90.50010

2891364

Description: When working with ConnectX-4 or ConnectX-4 Lx dual-port adapter cards, the value of the "EnableVmQoSOffloadRev2" registry key must be the same on both ports; otherwise, one port will fail to load.

Workaround: Set the same value for the "EnableVmQoSOffloadRev2" registry key on both ports.

Keywords: EnableVmQoSOffloadRev2, VMQoS

Detected in version: 2.80.50000
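The requirement above amounts to a simple consistency check across ports. The following is a minimal sketch, assuming the per-port values of "EnableVmQoSOffloadRev2" have already been read into a mapping; the function name and the port labels are hypothetical, and reading the registry itself is out of scope here.

```python
from typing import Mapping


def key_matches_on_all_ports(port_values: Mapping[str, int]) -> bool:
    """Return True if every port carries the same EnableVmQoSOffloadRev2 value.

    port_values maps a (hypothetical) port label to the value configured
    for that port. An empty or single-port mapping trivially matches.
    """
    return len(set(port_values.values())) <= 1


if __name__ == "__main__":
    # Mismatched values: per the issue above, one port would fail to load.
    print(key_matches_on_all_ports({"Port 1": 1, "Port 2": 0}))
    # Same value on both ports: the documented workaround.
    print(key_matches_on_all_ports({"Port 1": 1, "Port 2": 1}))
```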

2868062

Description: Notification on service side disconnection is not supported.

Workaround: N/A

Keywords: DOCA

Detected in version: 2.80.50000

2854943

Description: When using Hardware QoS Offload Rev 2 in VMQ mode with the VM traffic mapped to TC != 0, the rate limit will be enforced only on NVIDIA ConnectX-4 Lx. For all other devices, the rate limit will be enforced only for TC = 0.
Note: When in SR-IOV mode, it works for all devices as expected.

Workaround: N/A

Keywords: Hardware QoS Offload, VMQ

Detected in version: 2.80.50000
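The enforcement rules in this issue can be read as a small decision function. This is only an illustrative sketch of the text above, not driver logic: the function name, the mode strings, and the device string are assumptions, and it assumes that ConnectX-4 Lx in VMQ mode enforces the limit for every traffic class.

```python
def rate_limit_enforced(device: str, mode: str, traffic_class: int) -> bool:
    """Return True if, per the issue description, the VM rate limit is enforced.

    device: adapter name, e.g. "ConnectX-4 Lx" (hypothetical label)
    mode: "VMQ" or "SR-IOV" (hypothetical labels)
    traffic_class: the TC the VM traffic is mapped to
    """
    if traffic_class < 0:
        raise ValueError("traffic_class must be non-negative")
    if mode == "SR-IOV":
        # The note says SR-IOV mode works as expected on all devices.
        return True
    if mode == "VMQ":
        # TC 0 is enforced everywhere; other TCs only on ConnectX-4 Lx.
        return traffic_class == 0 or device == "ConnectX-4 Lx"
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(rate_limit_enforced("ConnectX-5", "VMQ", 3))
    print(rate_limit_enforced("ConnectX-4 Lx", "VMQ", 3))
```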

2302247

Description: mlx5cmd exposes the system GUID information of an NVIDIA BlueField Virtual Function irrespective of its trusted state.

Workaround: N/A

Keywords: mlx5cmd, VF, NVIDIA BlueField, GUID

Detected in version: 2.80.50000

2491846

Description: As oversubscription of QP parameters (entries and depth) is allowed, it could cause run-time failure when running out of resources.

Workaround: N/A

Keywords: QP creation

Detected in version: 2.70.50000

2380684

Description: Although the IPOIB failover team gets the correct DHCP address when first created, if the team is disabled and then enabled, Windows requests and rejects the DHCP address as BAD_ADDRESS.

Workaround: When the issue is seen, restart the secondary member(s) of the team.

Keywords: IPOIB teaming, DHCP

Detected in version: 2.70.50000

2603423

Description: When in ETH mode, setting the MTU (JumboPacket) lower than 1514 results in the Received Packets Error counters not being increased when receiving packets with a larger frame size that is less than or equal to 1518 bytes (for example, a ping with a data size of 1476).

Workaround: N/A

Keywords: MTU, traffic, counters

Detected in version: 2.70.50000
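The ping example in this issue follows from header arithmetic: a 1476-byte ICMP payload plus an 8-byte ICMP header, a 20-byte IPv4 header, and a 14-byte Ethernet header gives a 1518-byte frame. The sketch below reproduces that sum. The function name is hypothetical, and it assumes IPv4 without options, no VLAN tag, and a frame size that excludes the FCS.

```python
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType (no VLAN tag, no FCS)
IPV4_HEADER = 20      # IPv4 header without options
ICMP_HEADER = 8       # ICMP echo request header


def ping_frame_size(data_size: int) -> int:
    """Return the Ethernet frame size (FCS excluded) for an IPv4 ping payload."""
    if data_size < 0:
        raise ValueError("data_size must be non-negative")
    return ETHERNET_HEADER + IPV4_HEADER + ICMP_HEADER + data_size


if __name__ == "__main__":
    print(ping_frame_size(1476))  # 1518, the upper bound quoted in the issue
    print(ping_frame_size(1472))  # 1514, the JumboPacket threshold quoted in the issue
```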

2403963

Description: The DHCP redirect feature is not supported over FreeBSD VMs. When activated, DHCP packets will be dropped and the VM will lose connectivity because it has no IP address.

Workaround: N/A

Keywords: DHCP Redirect

Detected in version: 2.60.50000

2397036

Description: On BlueField-2 setup, the maximum number of VFs enabled is less than the actual value supported by the firmware.

When in SmartNIC mode, the number of VFs will decrease the SmartNIC enablements. When in separate mode, the number of supported VFs will be half of the firmware value as the VFs are split between the host and the Arm.

Workaround: N/A

Keywords: BlueField, VFs

Detected in version: 2.60.50000

2374101

Description: After upgrade, *PtpHardwareTimestamp remains enabled. When *PtpHardwareTimestamp is enabled, the UDP performance feature (URO) will be automatically disabled.

This is an OS limitation. If you do not use the HW timestamp feature, it is recommended to disable it by setting *PtpHardwareTimestamp to 0.

Workaround: Disable HW timestamping by setting *PtpHardwareTimestamp to 0.

Keywords: *PtpHardwareTimestamp, UDP performance feature, URO

Detected in version: 2.60.50000

2306807

Description: When the Decouple VmSwitch protocol is enabled, the VM's friendly name is not displayed when running the "Get-NetAdapterSriovVf" and "mlnx5hpccmd -DriverVersion" commands.

Workaround: N/A

Keywords: HPC, SR-IOV

Detected in version: 2.60.50000

2205722

Description: WinOF-2 driver does not support IB MTU lower than 614.

Workaround: N/A

Keywords: IB MTU

Detected in version: 2.60.50000

2180714

Description: If the user configures TCP to priority 0 with no VlanID, the packets will be sent without a VLAN header, since the miniport cannot distinguish between priority 0 with VlanId 0 and no VLAN tag.

Workaround: N/A

Keywords: TCP QOS

Detected in version: 2.50.50000
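One way to see why the two cases in this issue look alike is the layout of the 802.1Q Tag Control Information (TCI) field: 3 bits of priority (PCP), 1 DEI bit, and 12 bits of VLAN ID. With priority 0 and VLAN ID 0, every bit is zero. The sketch below only illustrates that layout; the function name is hypothetical, and how the miniport represents the request internally is an assumption, not something stated in these notes.

```python
def encode_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Pack an 802.1Q TCI value: PCP (3 bits) | DEI (1 bit) | VID (12 bits)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("vlan_id must be in 0..4095")
    return (priority << 13) | (dei << 12) | vlan_id


if __name__ == "__main__":
    # Priority 0 with VLAN ID 0 packs to all-zero bits, so a zeroed field
    # carries no hint that a tag was requested at all.
    print(hex(encode_tci(0, 0)))
    # Any non-zero priority leaves a visible trace even with VLAN ID 0.
    print(hex(encode_tci(3, 0)))
```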

2216232

Description: As ConnectX-5 adapter cards do not create counters for RX PACKET MARKED PCIe BUFFERS, its value will be 0.

Workaround: N/A

Keywords: ECN Marking

Detected in version: 2.50.50000

2243909

Description: The driver sends a wrong CNP priority counter while running RDMA.

Workaround: Change the CNP priority using mlxconfig.

Keywords: RDMA, CNP

Detected in version: 2.50.50000

2118837

Description: Performance degradation might be experienced during UDP traffic when using container networking and the UDP message size is larger than the MTU size.

Workaround: N/A

Keywords: Nested Virtualization, container networking

Detected in version: 2.50.50000

2137585

Description: While working in IPoIB mode with *JumboPacket set in the range [256, 614], the driver issues a warning event log message (Event ID: 25). This is a false alarm and can be ignored.

Workaround: N/A

Keywords: JumboPacket

Detected in version: 2.50.50000

2148077

Description: Explicitly disabling the *NetworkDirect key when using the HyperV mode disables NDSPI as well as the NDK.

Workaround: Enable NetworkDirect (ND).

Keywords: ND, HyperV

Detected in version: 2.50.50000

2117964

Description: A delay in connection establishment might be experienced when the ND application is started immediately after restarting the adapter card. This scenario occurs because the ND application requires the ARP table to find the destination MAC and generate the ARP request.

Workaround: Use static ARP. Ping the system before starting the ND application.

Keywords: ND, RDMA

Detected in version: 2.40.51000

2117636

Description: On a native setup, when setting JumboPacket to be less than 1514, the Large Receive Offload (LRO) feature might be disabled, and all its counters will not be valid.

Workaround: N/A

Keywords: LRO, RSC

Detected in version: 2.40.51000

2083686

Description: As PCIe Write Relaxed Ordering is enabled by default, some older Intel processors might observe up to 5% packet loss at high packet rates with small packets. (https://lore.kernel.org/patchwork/patch/820922/)

Workaround: Disable the Relaxed Ordering Write option by setting the RelaxedOrderingWrite registry key to 0 and restart the adapter.

Keywords: PCIe Write Relaxed Ordering

Detected in version: 2.40.50000

1763379

Description: On Windows Server 19H1, running "netstat -axn" when RDMA is enabled and a vNIC is present results in RDMA being disabled on the port with the VMSwitch.

Workaround: N/A

Keywords: VMSwitch, RDMA, Windows Server 2019

Detected in version: 2.40.50000

1908862

Description: When running RoCE traffic with a different RoceFrameSize configuration, and the fabric (jumbo packet size) is large enough, the MTU will be taken from the initiator even when it supports a larger size than the server.

Workaround: N/A

Keywords: RoCE, MTU

Detected in version: 2.40.50000

1846356

Description: The driver ignores the value set by the "*NumVfs" key. The maximal number of VFs is the maximal number of VFs supported by the hardware.

Workaround: N/A

Keywords: SR-IOV NUMVFs

Detected in version: 2.30.50000

1598716

Description: Issues with the OS's "SR-IOV PF/VF Backchannel Communication" mechanism in Windows Server 2019 Hyper-V affect VF-Counters functionality as well.

Workaround: N/A

Keywords: Mellanox WinOF-2 VF Port Traffic, VF-Counters

Detected in version: 2.30.50000

1702662

Description: On Windows Server 2019, the physical media type of the IPoIB NIC will be 802.3 and not InfiniBand.

Workaround: Use the mlx5cmd tool ("mlx5cmd -stat"), which is part of the driver package, to display the link_layer type.

Keywords: Windows Server 2019, IPoIB NdisPhysicalMedium

Detected in version: 2.20

1718201

Description: Heavy traffic causes the Sniffer's limit file to be the same as the buffer size (100M by default).

Workaround: N/A

Keywords: Sniffer, heavy traffic

Detected in version: 2.20

1576283

Description: When working with SR-IOV in Windows Server 2019, the status of the vNIC working in SR-IOV mode will be displayed as "Degraded (SR-IOV not operational)" although the SR-IOV VF is fully operational. The message can be safely ignored.

Workaround: N/A

Keywords: SR-IOV IB, Windows Server 2019

Detected in version: 2.10

1580985

Description: iSCSI boot over IPoIB is currently not supported.

Workaround: N/A

Keywords: iSCSI Boot, IPoIB

Detected in version: 2.10

1536971

Description: The RscIPv4 and RscIPv6 keys’ values are set to 0 for the host in Windows Server 2019. As the values for those keys are already written by the Inbox Driver in Windows Server 2019, they will not be changed when upgrading.

Workaround: N/A

Keywords: RscIPv4, RscIPv6, Windows Server 2019

Detected in version: 2.10

1419597

Description: On servers with a large number of VMs (typically more than 40), after restarting the NIC on the host, the VMs' IPv6 global address is not retrieved back from the DHCP.

Workaround: Restart the NIC inside the VM.

Keywords: VMQ, SR-IOV

Detected in version: 2.10

1419597

Description: On servers with a large number of VMs (typically more than 40), after a NIC restart on the host, the VMs' IPv6 global address cannot be retrieved from DHCP.

Workaround: Restart the Microsoft NIC inside the VM.

Keywords: VM, IPv6 address, DHCP

Detected in version: 2.0

1336097

Description: Due to an OID timeout, the miniport reset is executed.

Workaround: Increase the timeout value in such a way that 2 * CheckForHangTOInSeconds > Max OID time.

For further information, refer to section General Registry Keys in the User Manual.

Keywords: Resiliency

Detected in version: 1.90
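The workaround condition, 2 * CheckForHangTOInSeconds > Max OID time, can be turned into a quick calculation of the smallest whole-second value that satisfies it. This is a minimal sketch: the function name is hypothetical, the longest OID time is something you would have to measure or estimate yourself, and valid value ranges for the key should be taken from the User Manual.

```python
import math


def min_check_for_hang_timeout(max_oid_seconds: float) -> int:
    """Return the smallest integer T (seconds) such that 2 * T > max_oid_seconds."""
    if max_oid_seconds < 0:
        raise ValueError("max_oid_seconds must be non-negative")
    # floor(x / 2) + 1 is the first integer strictly greater than x / 2.
    return math.floor(max_oid_seconds / 2) + 1


if __name__ == "__main__":
    # With a longest OID of 10 seconds, T = 5 gives 2 * 5 = 10, which is not
    # strictly greater than 10, so the smallest valid value is 6.
    print(min_check_for_hang_timeout(10))
```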

1310086

Description: Multicast packets are passed to the VM via the Hyper-V (even in SR-IOV VMs). As such, the Hyper-V can decide to drop the packets based on its specific policy.

Note: This issue is only related to FreeBSD OSes.

Workaround: N/A

Keywords: Hyper-V OS

Detected in version: 1.90

1154447

Description: Adding diagnostic counters to performance monitor might cause counters to get cleared every several seconds.

Workaround: Change the time period between samples to more than 1 second.

Keywords: Diagnostic Counters

Detected in version: 1.90

1074589

Description: When PXE boot is using Flexboot, the IPoIB interface is not receiving the reserved address from the DHCP using GUID reservation.

Workaround: To obtain the reserved address, use a 6-byte MAC address instead of the 8-byte client ID.

Keywords: PXE boot, IPoIB, Flexboot, DHCP

Detected in version: 1.80

917747

Description: VF driver initialization fails in case of bad MSIX mapping when running Windows Server 2012 R2 Hypervisor with Windows Server 2016 VM with more than a single core CPU. As a result, performance degradation might occur.

Workaround: Run either with one CPU core, or run with different Operating Systems.

Keywords: SR-IOV

Detected in version: 1.80

1170780

Description: The driver must be restarted in order to switch from RSS to NonRSS mode. Therefore, if a PowerShell command is used on a specific VM to enable/disable VMMQ without restarting the driver, the RSS counters will keep increasing in Perfmon.

Workaround: Restart the driver to switch to NonRSS mode.

Keywords: RSS, NonRSS, VMMQ

Detected in version: 1.80

1149961

Description: In RoCE, the maximum MTU of WinOF-2 (4k) is greater than the maximum MTU of WinOF (2k). As a result, when working with MTU greater than 2k, WinOF and WinOF-2 cannot operate together.

Workaround: N/A

Keywords: RoCE, MTU

Detected in version: 1.80

1145421

Description: In IPoIB SR-IOV setup, in the Hyper-V Manager, the address appears as "SR-IOV enabled" instead of "SR-IOV active". This does not influence any activity or functionality.

Workaround: N/A

Keywords: IPoIB SR-IOV setup, Hyper-V

Detected in version: 1.80

1145421

Description: In the "Network Connections" panel of Virtual Function (VF) in IPoIB SR-IOV setup, the Microsoft adapter may appear in addition to the NVIDIA® adapter. This does not influence any activity or functionality.

Workaround: N/A

Keywords: Network Connections, VF, IPoIB SR-IOV

Detected in version: 1.80

The below table summarizes the SR-IOV working limitations, and the driver’s expected behavior in unsupported configurations.

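WinOF-2 Version: Earlier versions
NVIDIA® ConnectX®-4 Firmware Ver.: Up to 12.16.1020
Adapter Mode, InfiniBand (SR-IOV On and SR-IOV Off): Driver will fail to load and show "Yellow Bang" in the device manager.
Adapter Mode, Ethernet (SR-IOV On/Off): No limitations

WinOF-2 Version: 1.50 and 1.60
NVIDIA® ConnectX®-4 Firmware Ver.: Between 1x.16.1020 and 1x.19.2002 (IPoIB supported)
Adapter Mode, InfiniBand (SR-IOV On): "Yellow Bang" unsupported mode - disable SR-IOV via mlxconfig
Adapter Mode, InfiniBand (SR-IOV Off): OK
Adapter Mode, Ethernet (SR-IOV On/Off): No limitations

WinOF-2 Version: 1.70 and onwards
NVIDIA® ConnectX®-4 Firmware Ver.: 1x.19.2002 and onwards (IPoIB supported)
Adapter Mode, InfiniBand (SR-IOV On): OK
Adapter Mode, InfiniBand (SR-IOV Off): OK
Adapter Mode, Ethernet (SR-IOV On/Off): No limitations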

For further information on how to enable/disable SR-IOV, please refer to section Single Root I/O Virtualization (SR-IOV).

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.