NVIDIA BlueField DPU BSP v3.8.0
1.0

Known Issues

Ref #

Issue

2852086

Description: On rare occasions, the UEFI variables in UVPS EEPROM are wiped out, which hangs the boot process at the UEFI menu.

Workaround: N/A

Keywords: UEFI; hang

Discovered in version: 3.7.0

2801780

Description: When running virtio-net-controller with a host kernel older than 3.10.0-1160.el7, the host virtio driver may log an error (Unexpected TXQ (13) queue failure: -28) in dmesg during traffic stress tests.

Workaround: N/A

Keywords: Virtio-net; error

Discovered in version: 3.8.0

2859206

Description: First generation BlueField SoC based DPUs are not supported in this release.

Workaround: N/A

Keywords: BlueField; SoC

Discovered in version: 3.8.0

2855485

Description: After BFB installation, a Linux crash may occur with efi_call_rts messages in the call trace, which can be seen on the UART console.

Workaround: Power cycle the setup and re-install the BFB.

Keywords: Linux crash; efi_call_rts

Discovered in version: 3.8.0

2876447

Description: Virtio full emulation is not supported by NVIDIA® BlueField®-2 multi-host cards.

Workaround: N/A

Keywords: Virtio full emulation; multi-host

Discovered in version: 3.8.0

2824859

Description: Hotplug/unplug of virtio-net devices during host shutdown/bootup may result in failure to do plug/unplug.

Workaround: Power cycle the host.

Keywords: Virtio-net; hotplug

Discovered in version: 3.8.0

2870213

Description: Servers do not recover after configuring PCI_SWITCH_EMULATION_NUM_PORT to 32 followed by power cycle.

Workaround: N/A

Keywords: VirtIO-net; power cycle

Discovered in version: 3.8.0

2585607

Description: Pushing the BFB image fails occasionally with a "bad magic number" error message showing up in the console.

Workaround: Retry pushing the BFB.
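
The retry can be scripted. The following is a minimal sketch, not an official procedure: the retry helper, the push_bfb function, the attempt count, the delay, the image name, and the /dev/rshim0/boot device path are all assumptions made for illustration. It retries only when the push command exits with a non-zero status; it does not watch the console for the "bad magic number" message.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical push command: writes the image ($1) to the RShim boot
# device ($2). Both arguments are assumptions, adjust them to your setup.
push_bfb() {
    cat "$1" > "$2"
}

# Example (not executed here):
#   retry 3 10 push_bfb image.bfb /dev/rshim0/boot
```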

Keywords: BFB push; installation

Discovered in version: 3.8.0

Description: Only QP queues are supported for GGA accelerators from this version onward.

Workaround: N/A

Keywords: Firmware; SQ; QP

Discovered in version: 3.8.0

2846108

Description: Setting VHCA_TRUST_LEVEL does not work when there are active SFs or VFs.

Workaround: N/A

Keywords: Firmware; SF; VF

Discovered in version: 3.8.0

2793005

Description: When Arm reboots or crashes after sending a virtio-net unplug request, the hotplugged devices may still be present after Arm recovers. The host, however, will not see those devices.

Workaround: Power cycle the host to remove zombie devices.

Keywords: Virtio-net; hotplug

Discovered in version: 3.7.1

2787308

Description: On rare occasions during Arm reset on BMC-integrated DPUs, the DPU will send a "PCIe Completion" marked as poisoned. Some servers treat that as fatal and may hang.

Workaround: N/A

Keywords: Arm reset; BMC integrated

Discovered in version: 3.7.1

2790928

Description: Virtio-net-controller recovery may not work for a hot-plugged device because the system assigns a BDF (string identifier) of 0 for the hot-plugged device, which is an invalid value.

Workaround: N/A

Keywords: Virtio-net; hotplug; recovery

Discovered in version: 3.7.1

2780819

Description: Eye-opening is not supported on 25GbE integrated-BMC BlueField-2 DPU card.

Workaround: N/A

Keywords: Firmware; eye-opening

Discovered in version: 3.7.1

2750499

Description: Some devlink commands are only supported by mlnx devlink (/opt/mellanox/iproute2/sbin/devlink). The default devlink from the OS may fail on them (e.g., devlink port show -j).
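
Scripts that call devlink can prefer the mlnx build when it is present. The sketch below is illustrative only: pick_devlink is a hypothetical helper, and falling back to the devlink found on PATH is an assumption about what a script would want, not documented behavior.

```shell
#!/usr/bin/env bash
# Path taken from the issue description above.
MLNX_DEVLINK=/opt/mellanox/iproute2/sbin/devlink

# Print the devlink binary to use: the mlnx build when it is installed and
# executable, otherwise whatever "devlink" resolves to on PATH.
pick_devlink() {
    local preferred=${1:-$MLNX_DEVLINK}
    if [ -x "$preferred" ]; then
        echo "$preferred"
    else
        echo devlink
    fi
}

# Example (not executed here):
#   "$(pick_devlink)" port show -j
```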

Workaround: N/A

Keywords: Devlink

Discovered in version: 3.7.1

2730157

Description: Kernel upgrade is not currently supported on BlueField because there are out-of-tree kernel modules (e.g., ConnectX drivers) that will stop working after a kernel upgrade.

Workaround: Kernel can be upgraded if there is a matching DOCA repository that includes all the drivers compiled with the new kernel or as a part of the new BFB package.

Keywords: Kernel; upgrade

Discovered in version: 3.7.0

2706710

Description: Call traces are seen on the host when recreating VFs before the controller side finishes the deletion procedure.

Workaround: N/A

Keywords: Virtio-net controller

Discovered in version: 3.7.0

2685478

Description: 3rd party (netkvm.sys) Virtio-net drivers for Windows do not support SR-IOV.

Workaround: N/A

Keywords: Virtio-net; SR-IOV; WinOF-2

Discovered in version: 3.7.0

2685191

Description: Once Virtio-net is enabled, the mlx5 Windows VF becomes unavailable.

Workaround: N/A

Keywords: Virtio-net; virtual function; WinOF-2

Discovered in version: 3.7.0

2702395

Description: When a device is hot-plugged from the virtio-net controller, the host OS may hang when warm reboot is performed on the host and Arm at the same time.

Workaround: Reboot the host OS first and only then reboot the DPU.

Keywords: Virtio-net controller; hot-plug; reboot

Discovered in version: 3.7.0

2684501

Description: Once the contiguous memory pool, a limited resource, is exhausted, fallback allocation to other methods occurs. This process triggers cma_alloc failures in the dmesg log.

Workaround: N/A

Keywords: Log; cma_alloc; memory

Discovered in version: 3.7.0

2585607

Description: Pushing the BFB image fails occasionally with a "bad magic number" error message showing up in the console.

Workaround: Retry pushing the BFB.

Keywords: BFB push; installation

Discovered in version: 3.6.0.11699

2590016

Description: ibdev2netdev tool is not supported for PCIe PF operating in switchdev mode or on SFs.

Workaround: N/A

Keywords: ibdev2netdev

Discovered in version: 3.6.0.11699

2590016

Description: A "double free" error is seen when using the "curl" utility. This error comes from the libcrypto.so library, which is part of the OpenSSL package. It happens only when OpenSSL is configured to use a dynamic engine (e.g., BlueField PKA engine).

Workaround: Set OPENSSL_CONF=/etc/ssl/openssl.cnf.orig before using the curl utility.

For example:

# OPENSSL_CONF=/etc/ssl/openssl.cnf.orig curl -O https://tpo.pe/pathogen.vim

Warning

OPENSSL_CONF is aimed at using a custom config file for applications. In this case, it is used to point to a config file where dynamic engine (PKA engine) is not enabled.

Keywords: OpenSSL; curl

Discovered in version: 3.6.0.11699

2407897

Description: The host may crash when the number of PCIe devices overflows the PCIe device address. According to the PCIe spec, the device address space is 8 bits in total—device (5 bits) and function (3 bits)—which means that the total number of devices cannot be more than 256.
The maximum number of VFs on the second PF is limited by the total number of additional PCIe devices that precede it. By default, the preceding PCIe devices are 2 PFs + RShim DMA + 127 VFs of the first PF. This means that the maximum valid number of VFs for the second port will be 126.

Workaround: Use the maximum allowed VFs on the 2nd PCIe PF of BlueField instead of the maximum of 127 VFs.
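
The limit of 126 follows from the figures in the description. The sketch below only reproduces that arithmetic; max_second_pf_vfs is a hypothetical helper name, and the calculation assumes the default layout stated above (2 PFs and one RShim DMA device ahead of the second PF's VFs).

```shell
#!/usr/bin/env bash
# Values from the description: 8-bit device/function address,
# 2 PFs and 1 RShim DMA device preceding the second PF's VFs.
max_second_pf_vfs() {
    local first_pf_vfs=$1
    local address_space=$((1 << 8))            # 5 device bits + 3 function bits = 256
    local preceding=$((2 + 1 + first_pf_vfs))  # PFs + RShim DMA + first PF's VFs
    echo $((address_space - preceding))
}

max_second_pf_vfs 127   # prints 126
```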

Keywords: Emulated devices; VirtIO-net; VirtIO-blk; VFs; RShim

Discovered in version: 3.6.0.11699

2580945

Description: External host reboot may also reboot the Arm cores if the DPU was configured using mlxconfig.

Workaround: N/A

Keywords: Non-volatile configuration; Arm; reboot

Discovered in version: 3.6.0.11699

2445289

Description: If secure boot is enabled, MFT cannot be installed on the BlueField DPU independently from BlueField drivers (MLNX_OFED).

Workaround: N/A

Keywords: MFT; secure boot

Discovered in version: 3.5.1.11601

2377021

Description: Executing "sudo poweroff" on the Arm side causes the system to hang.

Workaround: Reboot your BlueField device or power cycle the server.

Keywords: Hang; reboot

Discovered in version: 3.5.0.11563

2350132

Description: Boot process hangs at BIOS (version 1.2.11) stage when power cycling a server (model Dell PowerEdge R7525) after configuring "PCI_SWITCH_EMULATION_NUM_PORT" > 27.

Workaround: N/A

Keywords: Server; hang; power cycle

Discovered in version: 3.5.0.11563

2581408

Description: On a BlueField device operating in Embedded CPU mode, PXE driver will fail to boot if the Arm side is not fully loaded and the OVS bridge is not configured.

Workaround: Run warm reboot on the host side and boot again via the device when Arm is up and the OVS bridge is configured.

Keywords: Embedded CPU; PXE; UEFI; Arm

Discovered in version: 2.5.0.11176

1859322

Description: On some setups, DPU does not power on following server cold boot when UART cable is attached to the same server.

Workaround: As long as the RShim driver is loaded on the server and the RShim interface is visible, the RShim driver will detect this and auto-reset the card into normal state.

Keywords: DPU; Arm; Cold Boot

Discovered in version: 2.4.0.11082

1899921

Description: Driver restart fails when SNAP service is running.

Workaround: Stop the SNAP services nvme_sf and nvme_snap@nvme0, then restart the driver. After the driver loads, restart the services.
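
The sequence can be sketched as a small script. This is an illustration under assumptions, not an official procedure: it assumes the two services are managed as systemd units under the names given above, and that the driver is restarted with the openibd init script (override DRIVER_RESTART_CMD if your setup differs). The run helper and the DRY_RUN switch are hypothetical conveniences for previewing the commands.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: the driver is restarted with the openibd init script.
DRIVER_RESTART_CMD=${DRIVER_RESTART_CMD:-/etc/init.d/openibd restart}

restart_driver_with_snap() {
    # Stop the SNAP services named in the workaround.
    run systemctl stop nvme_sf nvme_snap@nvme0 || return 1
    # Intentionally unquoted so the command string splits into words.
    run $DRIVER_RESTART_CMD || return 1
    # Bring the SNAP services back once the driver has loaded.
    run systemctl start nvme_sf nvme_snap@nvme0
}

# Example (not executed here): preview the commands without running them.
#   DRY_RUN=1 restart_driver_with_snap
```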

Keywords: SNAP

Discovered in version: 2.2.0.11000

1911618

Description: Defining namespaces with certain Micron disks (Micron_9300_MTFDHAL3T8TDP) using consecutive attach-ns commands can cause errors.

Workaround: Add delay between attach-ns commands.
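
One way to add the delay is to loop over the namespace IDs and sleep between calls. The sketch below assumes the nvme-cli attach-ns command; the attach_namespaces and run helpers, the device path, the controller ID, and the delay length are hypothetical, since the required delay is not specified here.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# attach_namespaces DEVICE CONTROLLER_ID DELAY_SECONDS NSID [NSID...]
# Attaches each namespace in turn, sleeping between consecutive commands.
attach_namespaces() {
    local dev=$1 ctrl=$2 delay=$3
    shift 3
    local nsid first=1
    for nsid in "$@"; do
        if [ "$first" -eq 0 ]; then
            run sleep "$delay"
        fi
        first=0
        run nvme attach-ns "$dev" --namespace-id="$nsid" --controllers="$ctrl" || return 1
    done
}

# Example (not executed here): device, controller ID, and delay are assumptions.
#   attach_namespaces /dev/nvme0 0 2 1 2 3
```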

Keywords: Micron; disk; namespace; attach-ns

Discovered in version: 2.2.0.11000

© Copyright 2023, NVIDIA. Last updated on Sep 9, 2023.