NVIDIA BlueField DPU BSP v3.9.2
1.0

Known Issues

Ref #

Issue

3077361

Description: Enrolling new NVIDIA certificates is a mandatory prerequisite for any future software update beyond version v3.9.2.

Workaround: See section "Enrolling New NVIDIA Certificates" for instructions.

Keywords: NVIDIA certificates; signing keys; enrolling new keys

Discovered in version: 3.9.2

3151884

Description: If secure boot is enabled, the following error message is observed while installing Ubuntu on the DPU: ERROR: need to use capsule in secure boot mode. This message is harmless and may be safely ignored.

Workaround: N/A

Keywords: Error message; installation

Discovered in version: 3.9.2

3012182

Description: The command ethtool -I --show-fec is not supported by the DPU with kernel 5.4.

Workaround: N/A

Keywords: Kernel; show-fec

Discovered in version: 3.9.0

3048250

Description: When configuring the DPU to operate in NIC Mode, the following parameters must be set to default (i.e., 0): HIDE_PORT2_PF, NVME_EMULATION_ENABLE, and VIRTIO_NET_EMULATION_ENABLE.
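A minimal sketch of checking these three parameters and resetting them to 0 with mlxconfig. It assumes MFT is installed. MST_DEV is a placeholder for the MST device (the same <mst dev> used elsewhere in these notes). By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: reset the NIC-mode prerequisites to their defaults (0) with mlxconfig.
# MST_DEV is a placeholder; replace it with the actual MST device on your system.
MST_DEV="${MST_DEV:-<mst dev>}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

PARAMS="HIDE_PORT2_PF NVME_EMULATION_ENABLE VIRTIO_NET_EMULATION_ENABLE"

# Show the current values of the three parameters.
run mlxconfig -d "$MST_DEV" query $PARAMS

# Build "PARAM=0" arguments and apply them in a single set command.
SETTINGS=""
for p in $PARAMS; do
    SETTINGS="${SETTINGS:+$SETTINGS }$p=0"
done
run mlxconfig -d "$MST_DEV" set $SETTINGS
```

mlxconfig asks for confirmation before applying a set command. The dry-run default lets you review the exact command line before running it.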

Workaround: N/A

Keywords: Operation mode

Discovered in version: 3.9.0

2855986

Description: After disabling SR-IOV VFs on a virtio device, removing the virtio-net/PCIe driver from the guest OS may render the virtio controller unusable.

Workaround: Restart the virtio-net controller to recover it. To avoid this issue, monitor the controller log and make sure the VF resources are destroyed before unloading the virtio-net/PCIe drivers.
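A minimal sketch of the recovery and monitoring steps on the Arm side. It assumes the controller is managed by systemd as virtio-net-controller.service, which is an assumption, so check the unit name on your DPU. Both functions print their commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: recover / monitor the virtio-net controller.
# Assumption: the controller runs as the systemd unit virtio-net-controller.service.
UNIT="${UNIT:-virtio-net-controller.service}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

recover_controller() {
    # Recovery: restart the controller once it has become unusable.
    run systemctl restart "$UNIT"
}

watch_controller_log() {
    # Prevention: follow the controller log (Ctrl+C to stop) and confirm that the
    # VF resources are destroyed before unloading the virtio-net/PCIe drivers.
    run journalctl -u "$UNIT" -f
}
```

To use it, source the file, set DRY_RUN=0, and call recover_controller or watch_controller_log.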

Keywords: Virtio-net; VF

Discovered in version: 3.9.0

2863456

Description: SA limits by packet count (hard and soft) are supported only on traffic originating from the ECPF. Configuring them on VF traffic removes the SA when the hard limit is hit; however, traffic may still pass as plain text due to the tunnel offload used in such a configuration.

Workaround: N/A

Keywords: ASAP2; IPsec Full Offload

Discovered in version: 3.9.0

2982184

Description: When multiple BlueField resets are issued within 10 seconds of each other, EEPROM error messages are displayed on the console and, as a result, the BlueField may not boot from the eMMC and may halt at the UEFI menu.

Workaround: Power-cycle the BlueField to fix the EEPROM issue. Manual recovery of the boot options and/or SW installation may be needed.

Keywords: Reset; EEPROM

Discovered in version: 3.9.0

2853408

Description: Some pre-OS environments may fail when sensing a hot plug operation during their boot stage.

Workaround: Run "mlxconfig -d <mst dev> set PF_LOG_BAR_SIZE=0".

Keywords: BIOS; hot-plug; Virtio-net

Discovered in version: 3.9.0

2934833

Description: Running I/O traffic while toggling the status of both physical ports in a stressful manner on the receiving-end machine may cause traffic loss.

Workaround: N/A

Keywords: MLNX_OFED; RDMA; port toggle

Discovered in version: 3.8.5

2911425

Description: A ProLiant DL385 Gen10 Plus server with BIOS version 1.3 hangs when a large number of SFs (PF_TOTAL_SF=252) are configured.

Workaround: Update the BIOS to version 2.4, which should correctly detect the PCIe device with the larger BAR size.

Keywords: Scalable functions; BIOS

Discovered in version: 3.8.5

2801780

Description: When running virtio-net-controller with a host kernel older than 3.10.0-1160.el7, the host virtio driver may report the error "Unexpected TXQ (13) queue failure: -28" in dmesg during a traffic stress test.

Workaround: N/A

Keywords: Virtio-net; error

Discovered in version: 3.8.0

2824859

Description: Hot-plugging or unplugging virtio-net devices during host shutdown or bootup may fail.

Workaround: Power cycle the host.

Keywords: Virtio-net; hotplug

Discovered in version: 3.8.0

2870213

Description: Servers do not recover after configuring PCI_SWITCH_EMULATION_NUM_PORT to 32 followed by a power cycle.

Workaround: N/A

Keywords: VirtIO-net; power cycle

Discovered in version: 3.8.0

Description: Only QP queues are supported for GGA accelerators from this version onward.

Workaround: N/A

Keywords: Firmware; SQ; QP

Discovered in version: 3.8.0

2846108

Description: Setting VHCA_TRUST_LEVEL does not work when there are active SFs or VFs.

Workaround: N/A

Keywords: Firmware; SF; VF

Discovered in version: 3.8.0

2793005

Description: When Arm reboots or crashes after sending a virtio-net unplug request, the hotplugged devices may still be present after Arm recovers. The host, however, will not see those devices.

Workaround: Power cycle the host to remove zombie devices.

Keywords: Virtio-net; hotplug

Discovered in version: 3.7.1

2750499

Description: Some devlink commands are only supported by mlnx devlink (/opt/mellanox/iproute2/sbin/devlink). The default OS devlink may fail on such commands (e.g., devlink port show -j).
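A minimal sketch that prefers the mlnx devlink binary at the path given above and falls back to the OS devlink only when that binary is missing. The fallback logic is an illustration and is not part of the BSP.

```shell
#!/bin/sh
# Sketch: prefer the mlnx devlink (path from the note above) over the OS devlink.
MLNX_DEVLINK=/opt/mellanox/iproute2/sbin/devlink

pick_devlink() {
    if [ -x "$MLNX_DEVLINK" ]; then
        echo "$MLNX_DEVLINK"
    else
        # Fallback: the OS devlink may fail on commands such as "port show -j".
        echo devlink
    fi
}

DEVLINK=$(pick_devlink)
echo "using: $DEVLINK"
# Example invocation on the DPU:
# "$DEVLINK" port show -j
```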

Workaround: N/A

Keywords: Devlink

Discovered in version: 3.7.1

2730157

Description: Kernel upgrade is not currently supported on BlueField because it relies on out-of-tree kernel modules (e.g., ConnectX drivers) that stop working after a kernel upgrade.

Workaround: The kernel can be upgraded if a matching DOCA repository provides all the drivers compiled for the new kernel, or if the new kernel is delivered as part of a new BFB package.

Keywords: Kernel; upgrade

Discovered in version: 3.7.0

2706710

Description: Call traces are seen on the host when recreating VFs before the controller side finishes the deletion procedure.

Workaround: N/A

Keywords: Virtio-net controller

Discovered in version: 3.7.0

2685478

Description: Third-party Virtio-net drivers for Windows (netkvm.sys) do not support SR-IOV.

Workaround: N/A

Keywords: Virtio-net; SR-IOV; WinOF-2

Discovered in version: 3.7.0

2685191

Description: Once Virtio-net is enabled, the mlx5 Windows VF becomes unavailable.

Workaround: N/A

Keywords: Virtio-net; virtual function; WinOF-2

Discovered in version: 3.7.0

2702395

Description: When a device is hot-plugged from the virtio-net controller, the host OS may hang when warm reboot is performed on the host and Arm at the same time.

Workaround: Reboot the host OS first, and only then reboot the DPU.

Keywords: Virtio-net controller; hot-plug; reboot

Discovered in version: 3.7.0

2684501

Description: Once the contiguous memory pool, a limited resource, is exhausted, allocation falls back to other methods. This fallback triggers cma_alloc failure messages in the dmesg log.

Workaround: N/A

Keywords: Log; cma_alloc; memory

Discovered in version: 3.7.0

2590016

Description: The ibdev2netdev tool is not supported for PCIe PFs operating in switchdev mode or for SFs.

Workaround: N/A

Keywords: ibdev2netdev

Discovered in version: 3.6.0.11699

2590016

Description: A "double free" error is seen when using the "curl" utility. This error comes from the libcrypto.so library, which is part of the OpenSSL package, and occurs only when OpenSSL is configured to use a dynamic engine (e.g., the BlueField PKA engine).

Workaround: Set OPENSSL_CONF=/etc/ssl/openssl.cnf.orig before using the curl utility.

For example:

# OPENSSL_CONF=/etc/ssl/openssl.cnf.orig curl -O https://tpo.pe/pathogen.vim

Warning

OPENSSL_CONF lets applications use a custom config file. Here, it points to a config file in which the dynamic engine (PKA engine) is not enabled.

Keywords: OpenSSL; curl

Discovered in version: 3.6.0.11699

2407897

Description: The host may crash when the number of PCIe devices overflows the PCIe device address. According to the PCIe specification, the device address is 8 bits in total (device: 5 bits; function: 3 bits), so the total number of devices cannot exceed 256.
The maximum number of VFs on the second PF is limited by the number of PCIe devices that precede it. By default, these are 2 PFs + RShim DMA + 127 VFs of the first PF (130 devices), so the maximum valid number of VFs for the second port is 256 - 130 = 126.
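The address budget above can be checked with shell arithmetic. This sketch restates the default counts from this note as variables. It assumes the 256-address budget is the only constraint; other per-PF limits may also apply.

```shell
#!/bin/sh
# Sketch: PCIe device-address budget, using the default counts from the note above.
DEVICE_BITS=5                                      # device number field
FUNCTION_BITS=3                                    # function number field
TOTAL=$(( 1 << (DEVICE_BITS + FUNCTION_BITS) ))    # 2^8 = 256 addresses

PFS=2          # both PCIe PFs (the second PF itself included)
RSHIM_DMA=1    # RShim DMA device
PF0_VFS=127    # VFs on the first PF (default)

MAX_PF1_VFS=$(( TOTAL - PFS - RSHIM_DMA - PF0_VFS ))
echo "total addresses: $TOTAL"
echo "max VFs on second PF: $MAX_PF1_VFS"
```

With the defaults, this prints 256 total addresses and 126 VFs for the second PF.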

Workaround: Configure no more than the maximum allowed number of VFs (126 by default) on the second PCIe PF of BlueField instead of the usual maximum of 127 VFs.

Keywords: Emulated devices; VirtIO-net; VirtIO-blk; VFs; RShim

Discovered in version: 3.6.0.11699

2445289

Description: If secure boot is enabled, MFT cannot be installed on the BlueField DPU independently from BlueField drivers (MLNX_OFED).

Workaround: N/A

Keywords: MFT; secure boot

Discovered in version: 3.5.1.11601

2377021

Description: Executing "sudo poweroff" on the Arm side causes the system to hang.

Workaround: Reboot your BlueField device or power cycle the server.

Keywords: Hang; reboot

Discovered in version: 3.5.0.11563

2350132

Description: The boot process hangs at the BIOS stage (BIOS version 1.2.11) when power cycling a Dell PowerEdge R7525 server after configuring PCI_SWITCH_EMULATION_NUM_PORT to a value greater than 27.

Workaround: N/A

Keywords: Server; hang; power cycle

Discovered in version: 3.5.0.11563

2581408

Description: On a BlueField device operating in Embedded CPU mode, the PXE driver fails to boot if the Arm side is not fully loaded and the OVS bridge is not configured.

Workaround: Perform a warm reboot on the host side and boot again via the device once Arm is up and the OVS bridge is configured.

Keywords: Embedded CPU; PXE; UEFI; Arm

Discovered in version: 2.5.0.11176

1859322

Description: On some setups, the DPU does not power on after a server cold boot when a UART cable is attached to the same server.

Workaround: As long as the RShim driver is loaded on the server and the RShim interface is visible, the RShim driver detects this condition and automatically resets the card to a normal state.

Keywords: DPU; Arm; Cold Boot

Discovered in version: 2.4.0.11082

1899921

Description: Driver restart fails when SNAP service is running.

Workaround: Stop the SNAP services nvme_sf and nvme_snap@nvme0, then restart the driver. After the driver loads, restart the services.
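A minimal sketch of this workaround sequence. It assumes that nvme_sf and nvme_snap@nvme0 are systemd units and that the driver is restarted with MLNX_OFED's openibd script. Both points are assumptions, so adjust them to your installation. Commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the SNAP workaround. Assumptions: nvme_sf and nvme_snap@nvme0 are
# systemd units, and the driver is restarted via /etc/init.d/openibd.
SNAP_UNITS="nvme_sf nvme_snap@nvme0"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

restart_driver_with_snap() {
    # 1. Stop the SNAP services so the driver restart does not fail.
    run systemctl stop $SNAP_UNITS || return 1
    # 2. Restart the driver (assumed mechanism).
    run /etc/init.d/openibd restart || return 1
    # 3. Once the driver is loaded, start the SNAP services again.
    run systemctl start $SNAP_UNITS
}

restart_driver_with_snap
```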

Keywords: SNAP

Discovered in version: 2.2.0.11000

1911618

Description: Defining namespaces with certain Micron disks (Micron_9300_MTFDHAL3T8TDP) using consecutive attach-ns commands can cause errors.

Workaround: Add a delay between attach-ns commands.
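A minimal sketch of serializing attach-ns commands with a pause between them. attach_ns is a hypothetical stand-in that only echoes; replace its body with the actual attach-ns command used in your SNAP setup. The 1-second default delay is an arbitrary assumption, because no specific value is documented.

```shell
#!/bin/sh
# Sketch: issue attach-ns commands one at a time with a pause between them.
# attach_ns is HYPOTHETICAL; replace its body with your real attach-ns command.
DELAY_SEC="${DELAY_SEC:-1}"   # assumed delay in seconds

attach_ns() {
    echo "attach-ns nsid=$1"
}

attach_all() {
    first=1
    for nsid in "$@"; do
        # Pause before every command except the first.
        [ "$first" = 1 ] || sleep "$DELAY_SEC"
        first=0
        attach_ns "$nsid" || return 1
    done
}

attach_all 1 2 3
```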

Keywords: Micron; disk; namespace; attach-ns

Discovered in version: 2.2.0.11000

© Copyright 2023, NVIDIA. Last updated on Sep 9, 2023.