Known Issues
Reference | Description |
4394475 | Description: The existing congestion control configuration applies globally, rather than on a per-priority basis. |
Workaround: Ensure that the configuration values for all priorities are aligned in either mlxconfig ROCE_CC_PRIO_MASK_P$port or sysfs ecn/roce_rp/enable/$port. | |
Keyword: Congestion control, ROCE_CC_PRIO | |
Reported in version: 3.0.0 | |
4426511 | Description: Orchestrated reset mode (MLXConfig) will be released as a Beta feature. There's a known race condition between server reboot and the reset flow running in parallel, which can cause the reset to go out of sync. |
Workaround: Power cycle the system to recover. | |
Keyword: Orchestrated reset mode | |
Reported in version: 3.0.0 | |
4297489 | Description: Due to incompatibility between DPA and host libraries, a DPA device application must be recompiled after updating DOCA to a newer version. |
Workaround: N/A | |
Keyword: DPA; host library; update | |
Reported in version: 2.10.0 | |
4270602 | Description: UEFI/ATF firmware does not upgrade as part of the Linux Standard Tool process when Secure Boot is disabled. |
Workaround: Remove PK key and initiate UEFI/ATF firmware upgrade again. To remove the PK key, use the UEFI menu to navigate to Device Manager → Secure Boot Configuration → Custom Secure Boot Options → PK Options → Delete Signature. | |
Keyword: UEFI/ATF; PK; Secure Boot; EFI Capsule Authentication | |
Reported in version: 2.10.0 | |
4200690 | Description: The fTPM trusted application is signed for testing purposes only (i.e., not securely) with a development key. |
Workaround: N/A | |
Keyword: fTPM over OP-TEE | |
Reported in version: 2.10.0 | |
3987526 | Description: OVS-DOCA offload of meter with sFlow is not supported and may cause OVS application to crash. |
Workaround: N/A | |
Keyword: OVS-DOCA; meter; sFlow | |
Reported in version: 2.9.0 | |
N/A | Description: Applications using DPA might not work with older firmware versions. |
Workaround: Perform a full upgrade of all DOCA 2.9.0 components, including the firmware (i.e., doca-host and BF-Bundle). | |
Keyword: DPA; backward compatibility | |
Reported in version: 2.9.0 | |
N/A | Description: Applications using FlexIO SDK API may have missing symbols during runtime. |
Workaround: Re-compile FlexIO-based applications with the DOCA 2.9.0 release. | |
Keyword: FlexIO; backward compatibility | |
Reported in version: 2.9.0 | |
4095728 | Description: A corrupted createrepo installation causes the doca-kernel repo to not contain the repo data. |
Workaround: If repo data is missing after installing the doca-kernel repo, run createrepo --help. If no output is generated, createrepo is corrupted and must be removed and reinstalled.
| |
Keyword: Kernel; repo | |
Reported in version: 2.9.0 | |
4049034 | Description: On openEuler 22.03 SP3 and openEuler 20.03 SP1, yum update cannot be run after BFB installation. |
Workaround: To perform yum update with either openEuler 22.03 SP3 or openEuler 20.03 SP1, follow these procedures depending on the use case:
| |
Keyword: openEuler | |
Reported in version: 2.9.0 | |
4046180 |
Description: For PCIe data IDs that require the Node, PCIe index, and Depth parameters in doca_telemetry_diag, the only valid values are 0, 0, 0.
|
Workaround: N/A | |
Keyword: DOCA Telemetry | |
Reported in version: 2.9.0 | |
4129715 | Description: Compiling Rocky 9.2 may fail when using GCC with the "native" arch flag. |
Workaround: Upgrade to toolset 13 (gcc 13). | |
Keyword: Linux; GCC | |
Reported in version: 2.9.0 | |
4035553 |
Description: oper_sample_period does not always reflect the correct sample period. In some cases, it will reflect the admin_sample_period instead.
|
Workaround: N/A | |
Keyword: Core | |
Reported in version: 2.8.0 | |
4023257 | Description: If RDMA samples are compiled with memory sanitizer enabled, "read memory leak" errors are printed when running the samples with the RDMA CM flag and when running the client before the server. |
Workaround: Make sure to start the RDMA Server before RDMA Client. | |
Keyword: DOCA RDMA; samples | |
Reported in version: 2.8.0 | |
4021752 4021748 | Description: In all RDMA samples, if an error occurs in any of the following functions:
An error is printed but the sample resumes and might:
|
Workaround for 1: Either:
Workaround for 2: The mentioned address sanitizer violation shall be ignored in case of an error in a relevant function. | |
Keyword: DOCA RDMA; samples | |
Reported in version: 2.8.0 | |
4022563 | Description: OVS-DOCA connection tracking with E2E enabled is not supported. |
Workaround: N/A | |
Keyword: OVS-DPDK; connection tracking; E2E | |
Reported in version: 2.8.0 | |
3837255 | Description: When running Arm shutdown from the host OS, the message -E- Failed to send Register MRSI is expected.
This message should be ignored.
|
Workaround: Wait 2 more minutes before rebooting the host. Before proceeding with host OS reboot, it is recommended to query the operational state of the BlueField Arm cores from the BlueField BMC to verify that shutdown state has been reached. Run the following command:
Expected output is | |
Keyword: Host OS; reboot; error | |
Reported in version: 2.7.0 | |
3844705 | Description: In openEuler 20.03, Linux kernel version 4.19.90 is affected by an issue that impacts the discard/trim functionality of the BlueField eMMC device, which may degrade BlueField eMMC performance over time. |
Workaround: Upgrade to Linux Kernel version 5.10 or later. | |
Keyword: eMMC discard; trim functionality | |
Reported in version: 2.7.0 | |
3877725 | Description: During BFB installation in NIC mode on BlueField-3, excessive logging fills the RShim log, so the Linux installation progress log does not appear in it.
|
Workaround: Monitor the BlueField-3 Arm UART console to check whether BFB installation in NIC mode has completed.
| |
Keyword: NIC mode; BFB install | |
Reported in version: 2.7.0 | |
3855702 |
Description: Trying to jump from a steering level in the hardware to a lower level using software steering is not supported on rdma-core lower than 48.x.
|
Workaround: N/A | |
Keyword: RDMA; SWS | |
Reported in version: 2.7.0 | |
3855485 |
Description:
When enabling the
|
Workaround: N/A | |
Keyword: NVconfig; RShim; dmsg | |
Reported in version: 2.7.0 | |
3831230 | Description: In openEuler 20.03, Linux kernel version 4.19.90 is affected by an issue that impacts the discard/trim functionality of the BlueField eMMC device, which may degrade BlueField eMMC performance over time. |
Workaround: Upgrade to Linux Kernel version 5.10 or later. | |
Keyword: eMMC discard; trim functionality | |
Reported in version: 2.7.0 | |
3743879 |
Description: mlxfwreset may time out on servers where the RShim driver is running and INTx is not supported. The following error message is printed: BF reset flow encountered a failure due to a reset state error of negotiation timeout.
|
Workaround:
Set If host Linux kernel lockdown is enabled, then manually unbind the RShim driver before
| |
Keyword: Timeout; mlxfwreset; INTx | |
Reported in version: 2.7.0 | |
3678069 |
Description: If using BlueField with NVMe and mmcblk and configured to boot from mmcblk, users must create a bf.cfg file with device=/dev/mmcblk0, then install the *.bfb as normal.
|
Workaround: N/A | |
Keyword: NVMe | |
Reported in version: 2.5.0 | |
3680538 | Description: When using strongSwan or OVS-IPsec as explained in the NVIDIA BlueField DPU BSP, the IPsec Rx data path is not offloaded to hardware and runs in software on the Arm cores. As a result, bandwidth is substantially reduced. |
Workaround: N/A | |
Keyword: IPsec | |
Reported in version: 2.5.0 | |
N/A | Description: Execution unit partitions are not yet implemented and will be added in a future release. |
Workaround: N/A | |
Keyword: EU tool | |
Reported in version: 2.5.0 | |
3666160 |
Description: Installing a BFB using bfb-install when the mlxconfig parameter PF_TOTAL_SF is greater than 1700 triggers an immediate server reboot.
|
Workaround: Change PF_TOTAL_SF to 0, perform a graceful shutdown, power cycle, then install the BFB.
| |
Keyword: SF; PF_TOTAL_SF; BFB installation
| |
Reported in version: 2.2.1 | |
3594836 | Description: When enabling Flex IO SDK tracer at high rates, a slow-down in processing may occur and/or some traces may be lost. |
Workaround: Keep tracing limited to ~1M traces per second to avoid a significant processing slow-down. Use tracer for debug purposes and consider disabling it by default. | |
Keyword: Tracer FlexIO | |
Reported in version: 2.2.1 | |
3592080 | Description: When using UEK8 on the host in DPU mode, creating a VF on the host consumes about 100 MB of memory on BlueField. |
Workaround: N/A | |
Keyword: UEK; VF | |
Reported in version: 2.2.1 | |
3546202 | Description: After rebooting a BlueField-3 DPU running Rocky Linux 8.6 BFB, the kernel log shows the following error:
This message indicates that the Ethernet driver will function normally in all aspects, except that PHY polling is enabled. |
Workaround: N/A | |
Keyword: Linux; PHY; kernel | |
Reported in version: 2.2.0 | |
3566042 | Description: Virtio hotplug is not supported in GPU-HOST mode on the NVIDIA Converged Accelerator. |
Workaround: N/A | |
Keyword: Virtio; Converged Accelerator | |
Reported in version: 2.2.0 | |
3546474 | Description: PXE boot over ConnectX interface might not work due to an invalid MAC address in the UEFI boot entry. |
Workaround: On BlueField, create /etc/bf.cfg file with the relevant PXE boot entries, then run the command bfcfg . | |
Keyword: PXE; boot; MAC | |
Reported in version: 2.2.0 | |
3561723 | Description: Running mlxfwreset sync 1 on NVIDIA Converged Accelerators may be reported as supported although it is not. Executing the reset will fail. |
Workaround: N/A | |
Keyword: mlxfwreset | |
Reported in version: 2.2.0 | |
3306489 | Description: When performing longevity tests (e.g., mlxfwreset, DPU reboot, burning of new BFBs), a host running an Intel CPU may observe errors related to "CPU 0: Machine Check Exception". |
Workaround: Add intel_idle.max_cstate=1 entry to the kernel command line. | |
Keyword: Longevity; mlxfwreset; DPU reboot | |
Reported in version: 2.2.0 | |
3534219 | Description: On BlueField-3 devices, when downgrading the firmware from DOCA 2.2.0 to 32.37.1306 (or lower), the host crashes when executing a partial Arm reset (e.g., Arm reboot; BFB push; mlxfwreset). |
Workaround: Before downgrading the firmware:
| |
Keyword: BlueField-3; downgrade | |
Reported in version: 2.2.0 | |
3462630 | Description: When performing a PXE installation with UEFI Secure Boot enabled, the following error messages may be observed:
|
Workaround: Download a Grub EFI binary from the Ubuntu website. For further information on Ubuntu UEFI Secure Boot PXE Boot, please visit Ubuntu's official website. | |
Keyword: PXE; UEFI Secure Boot | |
Reported in version: 2.0.2 | |
3448841 | Description: While running CentOS 8.2, switchdev Ethernet BlueField runs in "shared" RDMA net namespace mode instead of "exclusive". |
Workaround: Use
| |
Keyword: RDMA; isolation; Net NS | |
Reported in version: 2.0.2 | |
2706803 | Description: When an NVMe controller, SoC management controller, and DMA controller are configured, the maximum number of VFs is limited to 124. |
Workaround: N/A | |
Keyword: VF; limitation | |
Reported in version: 2.0.2 | |
3273435 | Description: Changing the mode of operation between NIC and DPU modes results in different capabilities for the host driver which might cause unexpected behavior. |
Workaround: Reload the host driver or reboot the host. | |
Keyword: Modes of operation; driver | |
Reported in version: 2.0.2 | |
3264749 | Description: In Rocky and CentOS 8.2 inbox-kernel BFBs, RegEx requires the following extra huge page configuration for it to function properly:
If these commands have executed successfully you should see |
Workaround: N/A | |
Keyword: RegEx; hugepages | |
Reported in version: 1.5.1 | |
3240153 | Description: DOCA kernel support only works on a non-default kernel. |
Workaround: N/A | |
Keyword: Kernel | |
Reported in version: 1.5.0 | |
3217627 | Description: The doca_devinfo_rep_list_create API returns success on the host instead of Operation not supported. |
Workaround: N/A | |
Keyword: DOCA core; InfiniBand | |
Reported in version: 1.5.0 |
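For issue 3306489 above, whether the intel_idle.max_cstate=1 entry is already present on a running host can be checked by parsing the kernel command line. The following is a minimal sketch, assuming a Linux host that exposes the command line at /proc/cmdline; the helper name has_kernel_param is hypothetical and is not part of DOCA or any NVIDIA tool.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if the exact token `name=value` appears in a kernel command line.

    Hypothetical helper for illustration only. Tokens are compared whole, so
    `intel_idle.max_cstate=10` does not match a request for `...=1`.
    """
    return f"{name}={value}" in cmdline.split()


if __name__ == "__main__":
    # Assumption: standard Linux procfs location of the running kernel's command line.
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        present = has_kernel_param(
            proc_cmdline.read_text(), "intel_idle.max_cstate", "1"
        )
        print("intel_idle.max_cstate=1 present:", present)
    else:
        print("/proc/cmdline not available on this system")
```

This only reports the state of the running kernel; how the entry is added persistently depends on the bootloader in use and is outside the scope of this sketch.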
Reference | Description |
4404719 | Description: Splitting a DPU into 4 ports conflicts with the shared_rq feature. |
Workaround:
| |
Keyword: PCI information | |
Reported in version: 3.0.0 | |
4273881 | Description: PCI information is missing on RedHat p host. |
Workaround:
Matching an interface name with its PCI address requires running: | |
Keyword: PCI information | |
Reported in version: 3.0.0 | |
4155701 | Description: When offloading xfrm states to hardware, the offloading device is linked to the skb's secpath. If an skb is freed or deferred, an unregister netdevice operation may hang because the netdevice is still being reference-counted. |
Workaround: Remove the netdevice from the xfrm states when the netdevice is unregistered. | |
Keyword: IPSec Crypto Offload | |
Reported in version: 2.10.0 |
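For issue 4273881 above, the command referenced in the workaround is not reproduced in this text. As an alternative illustration only, the sketch below resolves an interface name to its PCI address through the standard Linux sysfs layout, where /sys/class/net/<interface>/device is a symlink to the underlying bus device. The function name pci_address_of is hypothetical, and this is not claimed to be the workaround intended by the release notes.

```python
import os
from typing import Optional


def pci_address_of(iface: str, sysfs_net: str = "/sys/class/net") -> Optional[str]:
    """Return the bus address behind a network interface, or None if it has none.

    Hypothetical helper: follows the `device` symlink under sysfs and returns the
    last path component (e.g. 0000:03:00.0 for a PCI device). Virtual interfaces
    such as `lo` have no `device` link. For non-PCI devices the returned name is
    a bus-specific identifier rather than a PCI address.
    """
    device_link = os.path.join(sysfs_net, iface, "device")
    if not os.path.exists(device_link):
        return None
    return os.path.basename(os.path.realpath(device_link))


if __name__ == "__main__":
    net_dir = "/sys/class/net"
    if os.path.isdir(net_dir):
        for name in sorted(os.listdir(net_dir)):
            print(name, pci_address_of(name))
```

The sysfs root is a parameter so the lookup can be exercised against a test directory without requiring real hardware.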
Internal Ref. | Issue |
4436922 | Description: DC InfiniBand is not functional in this firmware version. |
Workaround: N/A | |
Keywords: DC, DDP traffic | |
Detected in version: 32.45.1020 | |
4422120 | Description: Any BFB upgrade from the October GA (2.9.2) to the new BFB will trigger a 0x00b4 assert. Nonetheless, the update will complete successfully, and the customer can safely ignore the assert. |
Workaround: N/A | |
Keywords: BFB upgrade | |
Detected in version: 32.45.1020 | |
4366117 | Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. |
Workaround: If this capability is not disabled by default on your adapter cards (
| |
Keywords: PXE boot filters | |
Detected in version: 32.45.1020 | |
4216761 | Description: For all host-related counters, the buffers used by the Arm are the same as those used by the host. Buffer usage is tracked collectively, combining both Arm and host consumption. |
Workaround: N/A | |
Keywords: Counters | |
Detected in version: 32.45.1020 | |
4125431 | Description: The MKEY created by software (VirtIo.Net DPA App) has a length of 1 byte and is used to access L2 memory. Since the minimum translation size is 64 bytes, using a 1-byte MKEY results in a translation error and triggers an exception. |
Workaround: N/A | |
Keywords: MKEY | |
Detected in version: 32.45.1020 | |
4303583 | Description: The query_header_modify_pattern command may produce inaccurate results when specific fields are used. |
Workaround: N/A | |
Keywords: query_header_modify_pattern command | |
Detected in version: 32.45.1020 | |
4296168 | Description: Running mlxfwreset fails when the DPU is configured as the root complex for NVMe drives. This issue impacts the configuration use case where the DPU acts as the root complex for NVMe drives, rather than the BF-3 in the host functioning as a PCIe Switch for the NVMe. |
Workaround: To ensure the firmware reset works correctly, explicitly run the fwreset command from the host using the "--method 1" flag (hot reset). | |
Keywords: mlxfwreset | |
Detected in version: 32.45.1020 | |
4193036 | Description: The initial allocation of DPA_THREAD on group affinity allocates memory for all EUs, including stack, core dump, and other resources. |
Workaround: N/A | |
Keywords: DPA | |
Detected in version: 32.44.1036 | |
4007228 | Description: NC-SI pass-through requires the user to allocate a MAC address to the platform BMC. |
Workaround: N/A | |
Keywords: NC-SI pass-through | |
Discovered in Version: 32.41.1000 | |
3787618 | Description: NVIA register is not allowed for external host if any field of EXTERNAL_HOST_PRIV or EXTERNAL_HOST_PRIV_FAST TLVs is not set as the default. |
Workaround: N/A | |
Keywords: Host privilege | |
Discovered in Version: 32.41.1000 | |
3636631 | Description: When configuring BlueField-3 Arm cores as PCIe root-complex, all non-mlx5 devices must always set the BlueField-3’s IOMMU to disabled or passthrough mode. Turning IOMMU “ON” requires special handling of interrupts in the driver or the use of polling. For further assistance, contact NVIDIA support. |
Workaround: N/A | |
Keywords: IOMMU | |
Discovered in Version: 32.39.2048 | |
3614529 | Description: The supported DDR5 link speed in SKU B3220 is 5200 MT/s. |
Workaround: N/A | |
Keywords: DDR5 link speed | |
Discovered in Version: 32.39.2048 | |
3728450 | Description: SW_RESET with a pending image is currently not supported. |
Workaround: N/A | |
Keywords: SW_RESET | |
Discovered in Version: 32.39.2048 | |
3614288 | Description: Occasionally, the device may hang when a hot plug is performed from an unknown direction. |
Workaround: N/A | |
Keywords: Hot-plug operation | |
Discovered in Version: 32.39.2048 | |
- | Description: The I2C clock fall time is lower than the 12ns minimum defined in the I2C-bus specification. For further information, refer to the I²C-bus Specification, Version 7.0, October 2021, https://www.i2c-bus.org/. |
Workaround: N/A | |
Keywords: I2C clock | |
Discovered in Version: 32.39.2048 | |
3439438 | Description: When connecting to a High Speed Traffic Generator at 400G speed, the linkup time may take up to 3 minutes. |
Workaround: N/A | |
Keywords: 400G linkup time | |
Discovered in Version: 32.38.1002 | |
3534128 | Description: External flash access such as flash read using the MFT tools will fail if there is a pending image on the flash. |
Workaround: N/A | |
Keywords: Flash access | |
Discovered in Version: 32.38.1002 | |
3534219 | Description: On BlueField-3 devices, when downgrading the firmware from DOCA 2.2.0 to 32.37.1306 (or lower), the host crashes when executing a partial Arm reset (e.g., Arm reboot; BFB push; mlxfwreset). |
Workaround: Before downgrading the firmware, perform:
| |
Keywords: BlueField-3; downgrade | |
Discovered in Version: 32.38.1002 | |
3547022 | Description: When unloading the network drivers on an external host, sync1 reset may still be reported as 'supported' although it is not. Thus, initiating the reset flow may result in reset failure after a few minutes. |
Workaround: N/A | |
Keywords: Sync1 reset | |
Discovered in Version: 32.38.1002 | |
3439438 | Description: When connecting to a Spirent switch at 400G speed, the linkup time may take up to 3 minutes. |
Workaround: N/A | |
Keywords: Spirent, 400G, linkup time | |
Discovered in Version: 32.38.1002 | |
3178339 | Description: PCIe PML1 is disabled. |
Workaround: N/A | |
Keywords: PCIe PML1 | |
Discovered in Version: 32.38.1002 | |
3525865 | Description: Unexpected system behavior might be observed if the driver is loaded while reset is in progress. |
Workaround: N/A | |
Keywords: Sync 1 reset, firmware reset | |
Discovered in Version: 32.38.1002 | |
3275394 | Description: When performing PCIe link secondary-bus-reset, disable/enable, or mlxfwreset on AMD-based Genoa systems, the device takes longer than expected to link up due to a PCIe receiver termination misconfiguration. |
Workaround: N/A | |
Keywords: PCIe | |
Discovered in Version: 32.37.1306 | |
2878841 | Description: The firmware rollback fails for the signature retransmit flow if the QPN field is configured in the mkey (as it only allows the given QP to use this Mkey) as the firmware rollback flow relies on an internal QP that uses the mkey. |
Workaround: N/A | |
Keywords: Signature retransmit flow | |
Discovered in Version: 32.37.1306 | |
3412847 | Description: Socket-Direct is currently not supported. |
Workaround: N/A | |
Keywords: Socket-Direct | |
Discovered in Version: 32.37.1306 |
Internal Ref. | Issue |
4366117 | Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. |
Workaround: If this capability is not disabled by default on your adapter cards (
| |
Keywords: PXE boot filters | |
Detected in version: 24.45.1016 | |
3754913 | Description: PHYless Reset is currently not supported. |
Workaround: N/A | |
Keywords: PHYless Reset | |
Discovered in Version: 24.40.1000 | |
3525865 | Description: Unexpected system behavior might be observed if the driver is loaded while reset is in progress. |
Workaround: N/A | |
Keywords: Sync 1 reset, firmware reset | |
Discovered in Version: 24.39.2048 | |
- | Description: When This might also cause an error while using timestamps for delay measurements (e.g., delay measurements reported by a PTP daemon) and even negative delay measurements in some cases. |
Workaround: N/A | |
Keywords: PTP path delay | |
Discovered in Version: 24.38.1002 | |
2878841 | Description: The firmware rollback fails for the signature retransmit flow if the QPN field is configured in the mkey (as it only allows the given QP to use this Mkey) as the firmware rollback flow relies on an internal QP that uses the mkey. |
Workaround: N/A | |
Keywords: Signature retransmit flow | |
Discovered in Version: 24.37.1300 | |
3329109 | Description: MFS1S50-H003E cable supports only HDR rate when used as a split cable. |
Workaround: N/A | |
Keywords: HDR, split cable, MFS1S50-H003E | |
Discovered in Version: 24.37.1300 | |
3267506 | Description: CRC is included in the traffic byte counters as a port byte counter. |
Workaround: N/A | |
Keywords: Counters, CRC | |
Discovered in Version: 24.35.2000 | |
3141072 | Description: The "max_shaper_rate" configuration query via QEEC mlxreg returns a value translated to hardware granularity. |
Workaround: N/A | |
Keywords: RX Rate-Limiter, Multi-host | |
Discovered in Version: 24.34.1002 | |
2870970 | Description: GTP encapsulation (flex parser profile 3) is limited to the NIC domain. Encapsulating in the FDB domain results in a length of 0 in the GTP header. |
Workaround: N/A | |
Keywords: GTP encapsulation | |
Discovered in Version: 24.34.1002 | |
2870213 | Description: Servers do not recover after configuring |
Workaround: N/A | |
Keywords: VirtIO-net; power cycle | |
Discovered in Version: 24.33.1048 | |
2855592 | Description: When working with a third-party device (e.g., Paragon) at 25GbE speed, the 25GbE speed must be configured in force mode. |
Workaround: N/A | |
Keywords: Force mode, 3rd party devices, 25GbE | |
Discovered in Version: 24.33.1048 | |
2850003 | Description: Occasionally, when raising a logical link, the link recovery counter is increased by 1. |
Workaround: N/A | |
Keywords: Link recovery counter | |
Discovered in Version: 24.33.1048 | |
2616755 | Description: Forward action for IPoIB is not supported on RX RDMA Flow Table. |
Workaround: N/A | |
Keywords: Steering, IPoIB | |
Discovered in Version: 24.33.1048 |