DOCA Documentation v3.1.0 Core Update

Bug Fixes in This Version

Ref #

Issue

4426511

Description: Orchestrated reset mode (MLXConfig) will be released as a Beta feature. There's a known race condition between server reboot and the reset flow running in parallel, which can cause the reset to go out of sync.

Keyword: Orchestrated reset mode

Detected in version: 3.0.0

2657392

Description: OFED installation caused CIFS to break in RHEL 8.4 and above. A dummy module was added so that CIFS will be disabled after OFED installation in RHEL 8.4 and above.

Keyword: Installation; CIFS

Detected in version: 3.0.0

4374396

Description: Ingress mirroring rules configured on OVS-DOCA are not offloaded to hardware when using remote GRE tunnels.

Keyword: Mirroring

Detected in version: 3.0.0

4464648

Description: When the server crashes, the client’s Comch Producer may call the send error callback twice for the same task, potentially leading to buffer reference count errors.

Keyword: DOCA Comch; duplicate callback; buffer refcount error

Detected in version: 3.0.0

4454054

Description: DMS pod requires "Linux is up" from bfup.service twice to proceed, forcing repeated bfup runs.

Keyword: Installation

Detected in version: 3.0.0

4255270

Description: Packets are encapsulated with standard VXLAN headers even when VXLAN-GBP is configured, resulting in the Group Policy ID (GBP) not being applied.

Keyword: VXLAN; packet encapsulation

Detected in version: 2.10.0

4263035

Description: In L3 EVPN scenarios with 16k overlay and 4k underlay routes, OVS may get stuck or abnormally terminate.

Keyword: HBN

Detected in version: 2.10.0

3851200

Description: Once PPS is enabled, there is no way to disable it via FireFly commands.

Keyword: PTP

Detected in version: 2.7.0

Ref #

Issue

4404290

Description: Fixed a crash triggered by handling multiple CMA net events in rapid succession on the same CMA ID.

Keyword: CMA

Detected in version: 3.0.0

4500815

Description: Fixed an issue that caused packet loss when enabling or disabling promiscuous mode on a network interface.

Keyword: Promiscuous mode

Detected in version: 3.0.0

4514994

Description: Fixed performance degradation on older kernel versions using RX cache, particularly on slower ARM CPUs with larger RX buffers. The issue was caused by the driver attempting to allocate new RX pages too quickly, leading to head-of-line blocking in the RX cache. The fix improves RX cache usage by triggering page allocation for a bulk of at least 2 WQEs, allowing the application more time to process packets and return buffers to the RX cache, thereby reducing blocking and enhancing performance.

Keyword: Performance; kernel; Rx cache; page allocation

Detected in version: 3.0.0

4504899

Description: Fixed behavior to align with /sys/class/net/<interface-name>/device/sriov_numvfs: silently ignore attempts to set the same number of VFs, and prevent changing the number of VFs until existing VFs are removed.

Keyword: VFs

Detected in version: 3.0.0
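
The write semantics described in 4504899 can be illustrated with a small toy model. This is only a sketch of the behavior as stated above, not driver code: the class name, the `total_vfs` limit, and the use of `EBUSY` for the rejected case are assumptions made for illustration.

```python
import errno


class SriovNumVfsModel:
    """Toy model of sriov_numvfs-style write semantics (hypothetical, not driver code)."""

    def __init__(self, total_vfs: int) -> None:
        self.total_vfs = total_vfs  # assumed upper bound on the VF count
        self.num_vfs = 0            # currently enabled VFs

    def write(self, requested: int) -> None:
        if requested < 0 or requested > self.total_vfs:
            raise ValueError("requested VF count out of range")
        if requested == self.num_vfs:
            # Same value as the current one: silently ignored.
            return
        if self.num_vfs != 0 and requested != 0:
            # VFs already exist: they must be removed (write 0) first.
            raise OSError(errno.EBUSY, "remove existing VFs before changing the count")
        self.num_vfs = requested
```

In this model, writing 4 twice succeeds both times, writing 2 while 4 VFs exist is rejected, and writing 0 followed by 2 succeeds.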

3680538

Description: When using strongSwan or OVS-IPsec as explained in the NVIDIA BlueField DPU BSP, the IPsec Rx datapath is not offloaded to hardware and is instead processed in software on the Arm cores. As a result, bandwidth is substantially reduced.

Keyword: IPsec

Detected in version: 3.0.0

4448262

Description: Fixed an issue where a kernel crash could occur if a device event arrives during the event subscription process.

Keyword: devx; event_fd

Detected in version: 3.0.0

4449477

Description: On BlueField-3 devices running linux-bluefield kernel versions 5.15.0-1050 or 5.15.0-1060, a kernel crash may occur due to a NULL pointer dereference in the cls_api network scheduler.

Keyword: Kernel crash

Detected in version: 3.0.0

Ref #

Details

4432078

Description: The PLDM file name uses a different YY/MM tag than the corresponding BFB file name.

Keywords: PLDM; BFB; filename

Detected in version: 3.1.0

4438514

Description: When probing device instance 1, the gpio-mlxbf3 driver may log the following harmless message: mlxbf3_gpio MLNXBF33:01: error -ENXIO: IRQ index 0 not found.

Keywords: Logging

Detected in version: 3.1.0

4546487

Description: In Device Manager, users can still select the “Separated” internal CPU mode even though this mode has been deprecated and is no longer supported.

Keywords: Device Manager; Separated mode; SmartNIC; Embedded mode

Detected in version: 3.1.0

4548563

Description: On BlueField-3 with Ubuntu 24.04, the OP-TEE driver cannot be loaded. Running modprobe optee fails with Operation not supported.

Keywords: OP-TEE; modprobe

Detected in version: 3.1.0

4548705

Description: Oracle Linux 9.4 BFB for BlueField-3 does not include the doca-sosreport package, preventing generation of SOS diagnostic reports.

Keywords: BFB; doca-sosreport; missing package

Detected in version: 3.1.0

4582160

Description: Running dmidecode -t 16 on BlueField-3 may show an incorrect "Number Of Devices" in the physical memory array section. Systems with two memory devices may report only one.

Keywords: SMBIOS; dmidecode; memory devices

Detected in version: 3.1.0

4612418

Description: Configuring hugepages through /etc/default/grub (e.g., default_hugepagesz=1G hugepagesz=1G hugepages=6) does not take effect after reboot.

Keywords: Hugepages; GRUB; boot parameters

Detected in version: 3.1.0
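
One way to confirm whether the parameters from 4612418 reached the running kernel is to compare them against the kernel command line. The sketch below assumes the command line is available as a string (on Linux it can be read from /proc/cmdline); the function names are hypothetical and chosen for illustration.

```python
HUGEPAGE_KEYS = ("default_hugepagesz", "hugepagesz", "hugepages")


def hugepage_params(cmdline: str) -> dict[str, str]:
    """Extract hugepage-related key=value tokens from a kernel command line.

    If a key appears more than once, the last occurrence wins in this sketch.
    """
    found: dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in HUGEPAGE_KEYS:
            found[key] = value
    return found


def missing_params(expected: dict[str, str], cmdline: str) -> list[str]:
    """Return the expected keys that are absent or have a different value."""
    actual = hugepage_params(cmdline)
    return [key for key, value in expected.items() if actual.get(key) != value]


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read()
```

Calling `missing_params({"default_hugepagesz": "1G", "hugepagesz": "1G", "hugepages": "6"}, read_running_cmdline())` after a reboot returns the keys that did not take effect; an empty list means all three are present with the expected values.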

4643161

Description: On some operating systems, the partition UUID may change after the first boot following installation.

Keywords: Maintenance mode; boot failure

Detected in version: 3.1.0

4652620

Description: Real-time CPU frequency monitoring may increase CPU usage by 10–30% on older Linux kernels.

Keywords: CPPC utilization

Detected in version: 3.1.0

4519591

Description: In NIC mode, after Arm reboot, the BMC firmware and CEC versions are intermittently missing in the host query output.

Keywords: Host; firmware

Detected in version: 3.1.0

4550064

Description: When running sos report, several plugins such as kernel, process, and processor may time out during report generation. The timeouts are harmless; reports still generate successfully.

Keywords: Kernel boot failure

Detected in version: 3.1.0

4626552

Description: A firmware update failure may be incorrectly logged as an informational [INFO] message rather than an [ERROR].

Keywords: Firmware update; bfb-install; logging

Detected in version: 3.1.0

4643378

Description: In long-duration AC cycling tests, ipmitool commands may fail due to intermittent I²C device access issues on BlueField-3.

Keywords: I2C; IPMI

Detected in version: 3.1.0

4656521

Description: On BlueField-3 running Debian 13, restarting the openibd driver service on the Arm side may fail if user applications (e.g., reactor_0) are still holding InfiniBand device handles. The service reports Cannot unload the InfiniBand driver stack.

Keywords: Debian; driver restart

Detected in version: 3.1.0

Ref #

Issue

4064373

Description: The DPU BMC LLDP represents the eth0 interface. If a VLAN interface is created on top of eth0, LLDP does not function as expected: the transmitted data does not accurately describe the attributes of eth0 or of the newly created VLAN.

Reported in version: 25.07

4365835

Description: The message displayed when the BIOS secure boot setting is changed through the DPU BMC is incorrect: the event log shows ScureCurrentBoot instead of SecureBootCurrentBoot.

Reported in version: 25.07

4549368

Description: Power sensors normally reported by the ipmitool sensor command are not available when the DPU is in kernel lockdown mode.

Reported in version: 25.07

4551450

Description: After BMC factory reset, the ipmitool network interface may fail to connect to the DPU BMC via the RMCP interface.

Reported in version: 25.07

Internal Ref.

Issue

4501157 / 4257750

Description: Fixed a critical issue with a live firmware patch.

Keywords: Live firmware patch

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4377816

Description: Fixed an issue where firmware did not de-assert the PERST of the DSP on pcore1. The fix updates the check to correctly interpret the default GPIO mapping value as 0xFFF (NO_GPIO_FUNCTION) instead of 0xFF (INVALID_READ).

Keywords: mlxconfig

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4286902

Description: Fixed a race condition in DPA process termination during the exception flow, where a failed process could be missed and not reported to the user.

Keywords: DPA

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4420567

Description: Removed an unnecessary and partially incorrect firmware check that blocked valid action list permutations allowed by the PRM. Validation of these permutations remains the responsibility of the software.

Keywords: Header actions

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4498670

Description: Fixed a race condition where destroying two emulation objects with the same VHCA ID could result in one destroy command failing with syndrome 0xF3F880.

Keywords: VirtIO

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4475307

Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher.

Keywords: PCC DCQCN, congestion control

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4443601

Description: Fixed a firmware issue where PXE failed to boot when both LAG ports were up.

Keywords: PXE, LAG

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4480427

Description: Fixed incorrect calculation of start address and mode for the CQE buffer in DPA CQ, which could cause CQEs to be written to the wrong address when the buffer is not 4K-aligned and spans a second page boundary.

Keywords: CQ, CQE Buffer, DPA

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006
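
The buffer layout that triggers 4480427 can be illustrated with a short page-span calculation. This is a sketch only: it assumes a 4 KiB page size and reads "spans a second page boundary" as a buffer that crosses two page boundaries. The function names are hypothetical and do not reflect firmware internals.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def is_page_aligned(start: int, page_size: int = PAGE_SIZE) -> bool:
    """True when the buffer start address sits on a page boundary."""
    return start % page_size == 0


def pages_spanned(start: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages touched by the byte range [start, start + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return last_page - first_page + 1


def boundaries_crossed(start: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page boundaries that fall strictly inside the buffer."""
    return pages_spanned(start, length, page_size) - 1
```

For example, an 8 KiB buffer starting at the aligned address 0x2000 touches two pages and crosses one boundary, while the same buffer starting at 0x1800 touches three pages and crosses two boundaries.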

4388371

Description: Fixed an issue where an uninitialized pport in the SLRG command, when using the SMP interface, caused an assertion failure.

Keywords: SLRG, SMP interface, pport

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4395036

Description: Fixed a race condition between firmware and hardware flows during QP closure.

Keywords: Race condition

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4428580

Description: Fixed a rare issue where triggering mstdump via core_dump in Windows drivers could cause a PCI link down condition.

Keywords: mstdump, windows

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4436922

Description: DC InfiniBand is not functional in this firmware version.

Keywords: DC, DDP traffic

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4366117

Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail.

Keywords: PXE boot filters

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4486431

Description: Fixed an issue where issuing multiple parallel queries of DPA_THREAD objects with the same object ID could fail.

Keywords: DPA

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

4470053

Description: Fixed an issue with vQoS parameter configuration to improve latency handling for large messages.

Keywords: vQoS, latency

Discovered in Version: 32.45.1020

Fixed in Release: 32.46.1006

Internal Ref.

Issue

4366117

Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail.

Keywords: PXE boot filters

Discovered in Version: 24.45.1020

Fixed in Release: 24.46.1006

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025