NVIDIA MLNX_EN Documentation v24.10-3.2.5.0 LTS

Bug Fixes in This Version

Below are the bugs fixed in this version. For a list of fixes in previous versions, see Bug Fixes History.

Internal Reference Number

Description

4328624

Description: Fixed an issue where ibstat would crash when encountering a non-RoCE/IB device, preventing it from displaying information for other valid RoCE/IB devices.

Keywords: ibstat

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4424584

Description: Fixed an issue that prevented representors from being reloaded after a LAG creation failure.

Keywords: LAG

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4096711

Description: Fixed performance degradation on older kernel versions using RX cache, particularly on slower ARM CPUs with larger RX buffers. The issue was caused by the driver attempting to allocate new RX pages too quickly, leading to head-of-line blocking in the RX cache.

The fix improves RX cache usage by triggering page allocation for a bulk of at least 2 WQEs, allowing the application more time to process packets and return buffers to the RX cache, thereby reducing blocking and enhancing performance.

Keywords: Performance, kernel, Rx cache, page allocation

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0
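The bulk-allocation idea in fix 4096711 can be illustrated with a toy model. This is only a sketch, not the driver's code: the `RxRing` class, its fields, and the refill policy are hypothetical, and only the "at least 2 WQEs" threshold comes from the description above.

```python
MIN_REFILL_BULK = 2  # from the description: allocate for a bulk of at least 2 WQEs


class RxRing:
    """Toy model of an RX ring that refills in bulks (hypothetical, illustrative only)."""

    def __init__(self, size, min_bulk=MIN_REFILL_BULK):
        self.size = size          # total WQEs in the ring
        self.min_bulk = min_bulk  # smallest refill worth triggering
        self.posted = size        # WQEs currently posted to hardware
        self.refill_calls = 0     # how many times page allocation was triggered

    def consume(self, n=1):
        """Model the hardware consuming up to n posted WQEs."""
        n = min(n, self.posted)
        self.posted -= n
        return n

    def maybe_refill(self):
        """Trigger allocation only when at least min_bulk WQEs are missing."""
        missing = self.size - self.posted
        if missing < self.min_bulk:
            return 0  # wait; gives buffers more time to return to the cache
        self.posted += missing
        self.refill_calls += 1
        return missing
```

In this model, a single consumed WQE does not trigger a refill; the second one does, so allocation requests are spaced further apart.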

4424531

Description: Increased the poll batch size as the number of QPs scales up. This prevents bandwidth degradation with a high number of QPs, where polling only 16 CQEs per iteration may not be sufficient to process all completions in time.

Keywords: ib_write_bw performance

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0
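One way to picture the scaling in fix 4424531 is a batch size that grows with the QP count. The sketch below is an assumption throughout: the step size, the cap, and the function name are hypothetical, and the actual scaling rule used by the tool is not stated in this document. Only the base value of 16 CQEs comes from the description.

```python
BASE_POLL_BATCH = 16   # CQEs per poll iteration, as named in the description
MAX_POLL_BATCH = 1024  # hypothetical upper bound
QPS_PER_STEP = 64      # hypothetical: grow the batch once per 64 QPs


def poll_batch_size(num_qps):
    """Return a poll batch size that scales with the number of QPs (illustrative)."""
    if num_qps < 1:
        raise ValueError("num_qps must be at least 1")
    steps = (num_qps + QPS_PER_STEP - 1) // QPS_PER_STEP  # ceiling division
    return min(MAX_POLL_BATCH, BASE_POLL_BATCH * steps)
```

With these assumed constants, up to 64 QPs keep the base batch of 16, and larger QP counts poll proportionally more CQEs per iteration until the cap is reached.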

4412407

Description: Fixed an issue where buffer initialization became a performance bottleneck during the allocation of large buffers, typically when using a high number of QPs with large message sizes. The root cause was the inefficient use of rand(). This has been resolved by replacing it with a faster pseudo-random algorithm.

Keywords: Buffer initialization, performance

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0
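The description of fix 4412407 does not name the replacement algorithm. As an illustration of the general approach, the sketch below fills a buffer with xorshift32, a well-known lightweight generator chosen here purely as an assumption; the function names and the seed are hypothetical.

```python
def xorshift32(state):
    """Advance a 32-bit xorshift state (Marsaglia's 13/17/5 variant)."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF


def fill_buffer(size, seed=0x12345678):
    """Fill a buffer of `size` bytes with pseudo-random data, 4 bytes per step."""
    state = seed & 0xFFFFFFFF
    if state == 0:
        raise ValueError("xorshift state must be non-zero")
    buf = bytearray(size)
    for off in range(0, size, 4):
        state = xorshift32(state)
        chunk = state.to_bytes(4, "little")
        buf[off:off + 4] = chunk[:size - off]  # truncate the final partial word
    return buf
```

Each step produces four bytes from a few shifts and XORs, which is the kind of saving that matters when initializing large buffers for many QPs.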

4466253

Description: Fixed an issue where a kernel crash could occur if a device event arrived during the event subscription process.

Keywords: DevX, event_fd

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4407137

Description: Fixed a crash caused by handling multiple CMA net events occurring in quick succession on the same CMA ID.

Keywords: CMA

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4405725

Description: Fixed a potential deadlock that could occur during the handling of peer memory registration failures.

Keywords: Deadlock, peer memory registration

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4340108

Description: Fixed a sysfs issue that occurred when accessing hardware counters from within a namespace.

Keywords: sysfs

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0

4235683

Description: Fixed corruption of the SA MAD Congestion Control FIFO queue that occurred when all elements were canceled and a dequeue operation was then attempted.

Keywords: SA legacy congestion control mechanism

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0
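The edge case behind fix 4235683 (every queued element canceled, then a dequeue) can be shown with a small model. This is a hypothetical sketch, not the kernel implementation: the `CancelableFifo` class and its lazy-cancel strategy are assumptions made for illustration.

```python
from collections import deque


class CancelableFifo:
    """Toy FIFO whose entries can be canceled before they are dequeued."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        """Append an item and return a handle that can be passed to cancel()."""
        entry = {"item": item, "canceled": False}
        self._items.append(entry)
        return entry

    def cancel(self, entry):
        """Mark an entry as canceled; it is skipped on dequeue."""
        entry["canceled"] = True

    def dequeue(self):
        """Return the oldest live item, or None if nothing live remains."""
        while self._items:
            entry = self._items.popleft()
            if not entry["canceled"]:
                return entry["item"]
        return None  # all entries were canceled, or the queue was empty
```

The point of the model is that dequeue must leave the queue in a consistent, reusable state even when every entry it inspects has been canceled.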

4030072

Description: Fixed an issue where sFlow was not supported in MLNX_OFED v24.07-x.x.x.x.

Keywords: sFlow

Discovered in Release: 24.07-0.6.1.0

Fixed in Release: 24.10-3.2.5.0

4409283

Description: Increased the size of the slow FDB table to prevent the following errors when switching to SwitchDev mode:

mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 24362): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x4065f0), err(-22)

mlx5_core 0000:03:00.0: E-Switch: Failed to create peer miss flow group err(-22)

Keywords: Slow FDB table

Discovered in Release: 24.10-2.1.8.0

Fixed in Release: 24.10-3.2.5.0
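To check whether a system running an earlier release hit the error from fix 4409283, the kernel log can be searched for the first message quoted above. The sketch below is one possible way to do that; the regular expression is derived only from that sample line and the function name is hypothetical, so other driver versions may format the message differently.

```python
import re

# Pattern derived from the single sample line in the description (an assumption).
CMD_ERR_RE = re.compile(
    r"mlx5_core (?P<pci>[0-9a-fA-F:.]+): mlx5_cmd_out_err:\d+:\(pid \d+\): "
    r"(?P<cmd>\w+)\((?P<opcode>0x[0-9a-fA-F]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-fA-F]+)\) failed, "
    r"status (?P<status>.+?)\((?P<status_code>0x[0-9a-fA-F]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-fA-F]+)\), err\((?P<err>-?\d+)\)"
)


def parse_cmd_error(line):
    """Return the fields of an mlx5 command error line, or None if it does not match."""
    match = CMD_ERR_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["err"] = int(fields["err"])  # e.g. -22
    return fields
```

Filtering the parsed results for `cmd == "CREATE_FLOW_GROUP"` together with the syndrome value narrows the log down to this specific failure.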

© Copyright 2025, NVIDIA. Last updated on Jul 2, 2025.