NVIDIA MLNX_OFED Documentation v23.10-5.1.4.0 LTS

Bug Fixes in This Version

Below are the bugs fixed in this version. For a list of fixes in previous versions, see Bug Fixes History.

Internal Reference Number
Description
4357048
Description: Fixed a security vulnerability detailed in CVE-2025-23263. For more information, refer to the security bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5654.
Keywords: VGT+
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4410029
Description: Fixed an issue where installing mlnx-ofa_kernel drivers on SLES 15 SP5 with kernel version 5.14.21-150500.55.68-default (and newer) failed because weak-modules fell back to the original inbox modules. The failure was caused by a mismatch: the original build kernel (5.14.21-150500.53-default) did not include the mana_ib driver, so no dummy module was provided, while the newer kernel did include it. As a result, the presence of the inbox mana_ib driver triggered weak-modules sanity check errors.
Keywords: mlnx-ofa_kernel, SLES 15 SP5
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
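
As a rough aid for the SLES 15 SP5 entry above, the following sketch checks whether a kernel release string falls in the range the note describes (5.14.21-150500.55.68-default and newer). The function names and the comparison rule (same 5.14.21-150500 base, numeric comparison of the remaining fields) are assumptions made for illustration; they are not part of MLNX_OFED or weak-modules.

```python
import re

# Threshold taken from the note above; the comparison rule is an assumption.
FIRST_AFFECTED = "5.14.21-150500.55.68-default"


def kernel_fields(release):
    """Split a release string such as '5.14.21-150500.55.68-default'
    into a tuple of its integer fields."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def is_affected(release):
    """Return True if `release` shares the 5.14.21-150500 base and its
    remaining numeric fields are at or after 55.68."""
    fields = kernel_fields(release)
    first = kernel_fields(FIRST_AFFECTED)
    return fields[:4] == first[:4] and fields[4:] >= first[4:]
```

The string to test would typically be the output of `uname -r` on the target host.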
4471811
Description: Resolved an NVMe driver compilation issue on Linux kernel version 6.6.87.
Keywords: NVMe driver
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4253229
Description: Fixed a race condition between the firmware syndrome report and driver initialization during boot.
Keywords: Race condition
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4442965

Description: Fixed performance degradation on older kernel versions using RX cache, particularly on slower ARM CPUs with larger RX buffers. The issue was caused by the driver attempting to allocate new RX pages too quickly, leading to head-of-line blocking in the RX cache.

The fix improves RX cache usage by triggering page allocation for a bulk of at least 2 WQEs, allowing the application more time to process packets and return buffers to the RX cache, thereby reducing blocking and enhancing performance.

Keywords: Performance, kernel, Rx cache, page allocation
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
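
The bulk-refill idea in the RX cache entry above can be pictured with a toy model. This is only a sketch of the general technique (deferring refill until a minimum number of WQEs is pending), not the driver's code; the class name, method name, and counter are hypothetical, and only the minimum bulk of 2 comes from the note.

```python
class RxRefill:
    """Toy model: defer RX page allocation until a minimum bulk of
    consumed WQEs has accumulated (hypothetical, not driver code)."""

    def __init__(self, min_bulk=2):
        self.min_bulk = min_bulk  # from the note: at least 2 WQEs
        self.pending = 0          # consumed WQEs not yet refilled

    def on_wqe_consumed(self):
        """Record one consumed WQE and return how many WQEs to refill now.

        Returns 0 while fewer than `min_bulk` WQEs are pending, which
        leaves more time for buffers to come back to the cache.
        """
        self.pending += 1
        if self.pending < self.min_bulk:
            return 0
        count, self.pending = self.pending, 0
        return count
```

In this model, the first consumed WQE triggers no allocation, and the second triggers a refill of both at once.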
4243800
Description: Resolved an issue with improper page deallocation handling on some kernels.
Keywords: Page deallocation
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4466255
Description: Fixed an issue where a kernel crash could occur if a device event arrived during the event subscription process.
Keywords: DevX, event_fd
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4441119
Description: Fixed a crash caused by handling multiple CMA net events occurring in quick succession on the same CMA ID.
Keywords: CMA
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4405723
Description: Fixed a potential deadlock that could occur during the handling of peer memory registration failures.
Keywords: Deadlock, peer memory registration
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4340109
Description: Fixed a sysfs issue that occurred when accessing hardware counters from within a namespace.
Keywords: sysfs
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4248125
Description: Fixed the UMR QP recovery flow to ensure proper functionality and prevent tasks from getting stuck in the kernel. Additionally, resolved a race condition in the ODP MR area that could lead to a CQE error in the UMR QP.
Keywords: UMR QP recovery flow
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
4235682
Description: Resolved corruption of the SA MAD Congestion Control FIFO queue that occurred when all elements were canceled and a dequeue operation was then attempted.
Keywords: SA legacy congestion control mechanism
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
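
The edge case in the FIFO entry above (every queued element canceled, then a dequeue) can be illustrated with a generic queue. This is a minimal sketch of the pattern, not the kernel's SA MAD implementation; the class and its ticket-based cancel interface are hypothetical.

```python
from collections import OrderedDict


class CancelableFifo:
    """FIFO whose entries can be canceled before they are dequeued
    (illustrative only, not the kernel data structure)."""

    def __init__(self):
        self._items = OrderedDict()  # ticket -> item, in insertion order
        self._next_ticket = 0

    def enqueue(self, item):
        """Append an item and return a ticket that can cancel it later."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._items[ticket] = item
        return ticket

    def cancel(self, ticket):
        """Remove a queued item; return False if it is no longer queued."""
        if ticket in self._items:
            del self._items[ticket]
            return True
        return False

    def dequeue(self):
        """Return the oldest item, or None when the queue is empty,
        including the case where every element was canceled."""
        if not self._items:
            return None
        _, item = self._items.popitem(last=False)
        return item
```

The explicit empty check in `dequeue` is the point of the sketch: after all entries are canceled, a dequeue leaves the queue in a state where later enqueues still work.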
4409282

Description: Increased the size of the slow FDB table to prevent the following error when switching to SwitchDev mode:

mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 24362): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x4065f0), err(-22)

mlx5_core 0000:03:00.0: E-Switch: Failed to create peer miss flow group err(-22)

Keywords: Slow FDB table
Discovered in Release: 23.10-4.0.9.1
Fixed in Release: 23.10-5.1.4.0
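
To check whether a saved kernel log contains the CREATE_FLOW_GROUP failure quoted in the slow FDB entry above, a small parser along these lines may help. It is a sketch that assumes the log lines follow exactly the format shown in that entry; the function name and the returned tuple layout are hypothetical.

```python
import re

# Pattern modeled on the mlx5_cmd_out_err line quoted in the entry above.
CMD_ERR = re.compile(
    r"mlx5_core (?P<pci>\S+): mlx5_cmd_out_err:\d+:\(pid \d+\): "
    r"(?P<cmd>\w+)\((?P<opcode>0x[0-9a-f]+)\) .*?"
    r"syndrome \((?P<syndrome>0x[0-9a-f]+)\), err\((?P<err>-?\d+)\)"
)


def find_flow_group_failures(log_text):
    """Return (pci_address, syndrome, err) for each failed
    CREATE_FLOW_GROUP command found in `log_text`."""
    hits = []
    for line in log_text.splitlines():
        match = CMD_ERR.search(line)
        if match and match.group("cmd") == "CREATE_FLOW_GROUP":
            hits.append(
                (match.group("pci"), match.group("syndrome"), int(match.group("err")))
            )
    return hits
```

The input could be, for example, the captured output of `dmesg` from the host that failed to switch modes.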

© Copyright 2025, NVIDIA. Last updated on Jul 17, 2025.