NVIDIA BlueField DPU BSP v4.0.3

Bug Fixes History

Ref #

Issue Description

3151884

Description: If secure boot is enabled, the following error message is observed while installing Ubuntu on the DPU: ERROR: need to use capsule in secure boot mode. This message is harmless and may be safely ignored.

Keywords: Error message; installation

Discovered in version: 3.9.2

2793005

Description: When Arm reboots or crashes after sending a virtio-net unplug request, the hotplugged devices may still be present after Arm recovers. The host, however, will not see those devices.

Keywords: Virtio-net; hotplug

Discovered in version: 3.7.1

3107227

Description: BlueField with secured BFB fails to boot up if the PART_SCHEME field is set in bf.cfg during installation.

Keywords: Installation; bf.cfg

Fixed in version: 3.9.2

3109270

Description: If the RShim service is running on an external host over the PCIe interface then, in very rare cases, a soft reset of the BlueField can cause a poisoned completion to be returned to the host. The host may treat this as a fatal error and crash.

Keywords: RShim; ATF

Fixed in version: 3.9.2

2790928

Description: Virtio-net-controller recovery may not work for a hot-plugged device because the system assigns a BDF (string identifier) of 0 for the hot-plugged device, which is an invalid value.

Keywords: Virtio-net; hotplug; recovery

Fixed in version: 3.9.0

2780819

Description: Eye-opening is not supported on 25GbE integrated-BMC BlueField-2 DPU.

Keywords: Firmware; eye-opening

Fixed in version: 3.9.0

2876447

Description: Virtio full emulation is not supported by NVIDIA® BlueField®-2 multi-host cards.

Keywords: Virtio full emulation; multi-host

Fixed in version: 3.9.0

2855485

Description: After BFB installation, a Linux crash may occur with efi_call_rts messages in the call trace, which can be seen from the UART console.

Keywords: Linux crash; efi_call_rts

Fixed in version: 3.9.0

2901514

Description: Relaxed ordering is not working properly on virtual functions.

Keywords: MLNX_OFED; relaxed ordering; VF

Fixed in version: 3.9.0

2852086

Description: On rare occasions, the UEFI variables in UVPS EEPROM are wiped out, which hangs the boot process at the UEFI menu.

Keywords: UEFI; hang

Fixed in version: 3.9.0

2934828

Description: PCIe device address to RDMA device name mapping on x86 host may change after the driver restarts in Arm.

Keywords: RDMA; Arm; driver

Fixed in version: 3.9.0

-

Description: RShim driver does not work when the host is in secure boot mode.

Keywords: RShim; Secure Boot

Fixed in version: 3.9.0

2787308

Description: On rare occasions during Arm reset on BMC-integrated DPUs, the DPU sends a "PCIe Completion" marked as poisoned. Some servers treat that as fatal and may hang.

Keywords: Arm reset; BMC integrated

Fixed in version: 3.9.0

2585607

Description: Pushing the BFB image fails occasionally with a "bad magic number" error message showing up in the console.

Keywords: BFB push; installation

Fixed in version: 3.9.0

2802943

Description: SLD detection may not function properly.

Keywords: Firmware

Fixed in version: 3.9.0

2580945

Description: External host reboot may also reboot the Arm cores if the DPU was configured using mlxconfig.

Keywords: Non-volatile configuration; Arm; reboot

Fixed in version: 3.9.0

2899740

Description: BlueField-2 may sometimes go to PXE boot instead of Linux after installation.

Keywords: Installation; PXE

Fixed in version: 3.8.5

2870143

Description: Some DPUs may get stuck at the GRUB menu during boot because the GRUB configuration becomes corrupted if the board is powered down before the configuration is synced to memory.

Keywords: GRUB; memory

Fixed in version: 3.8.5

2873700

Description: The available RShim logging buffer may not have enough space to hold the whole register dump, which may cause buffer wraparound.

Keywords: RShim; logging

Fixed in version: 3.8.5

2801891

Description: IPMI EMU service reports cable link as down when it is actually up.

Keywords: IPMI EMU

Fixed in version: 3.8.0

2779861

Description: Virtio-net controller does not work with devices other than mlx5_0/1.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2801378

Description: No parameter validation is done for feature bits when performing hotplug.

Keywords: Virtio-net; hotplug

Fixed in version: 3.8.0

2802917

Description: When secure boot is enabled, PXE boot may not work.

Keywords: Secure boot; PXE

Fixed in version: 3.8.0

2827413

Description: Updating a BFB could fail due to congestion.

Keywords: Installation; congestion

Fixed in version: 3.8.0

2829876

Description: For virtio-net device, modifying the number of queues does not update the number of MSIX.

Keywords: Virtio-net; queues

Fixed in version: 3.8.0

2597790

Description: A "double free" error is seen when using the "curl" utility. This happens only when OpenSSL is configured to use a dynamic engine (e.g., the BlueField PKA engine).

Keywords: OpenSSL; curl

Fixed in version: 3.8.0

2853295

Description: UEFI secure boot enables the kernel lockdown feature which blocks access by mstmcra.

Keywords: Secure boot

Fixed in version: 3.8.0

2854472

Description: Virtio-net controller may fail to start after power cycle.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2854995

Description: Memory consumed for a representor exceeds what is necessary, making it impossible to scale to 504 SFs.

Keywords: Memory

Fixed in version: 3.8.0

2856652

Description: Modifying VF bits yields an error.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2859066

Description: Arm hangs when the user is thrown to livefish mode by the firmware (e.g., secure boot).

Keywords: Arm; livefish

Fixed in version: 3.8.0

2866082

Description: The current installation flow requires multiple resets after booting the self-install BFB due to the watchdog being armed after capsule update.

Keywords: Reset; installation

Fixed in version: 3.8.0

2866537

Description: Power-off of BlueField shows up as a panic, which is then stored in the RShim log and carried into the BERT table on the next boot. This is misleading to the user.

Keywords: RShim; log; panic

Fixed in version: 3.8.0

2868944

Description: Various errors related to the UVPS store running out of space are observed.

Keywords: UVPS; errors

Fixed in version: 3.8.0

2754798

Description: oob_net0 cannot receive traffic after a network restart.

Keywords: oob_net0

Fixed in version: 3.8.0

2691175

Description: Up to 31 hot-plugged virtio-net devices are supported even if PCI_SWITCH_EMULATION_NUM_PORT=32. Host may hang if it hot plugs 32 devices.

Keywords: Virtio-net; hotplug

Fixed in version: 3.8.0

2597973

Description: When working with CentOS 7.6, if SF network interfaces are statically configured, the following parameters should be set:

NM_CONTROLLED="no"
DEVTIMEOUT=30

For example:

# cat /etc/sysconfig/network-scripts/ifcfg-p0m0
NAME=p0m0
DEVICE=p0m0
NM_CONTROLLED="no"
PEERDNS="yes"
ONBOOT="yes"
BOOTPROTO="static"
IPADDR=12.212.10.29
BROADCAST=12.212.255.255
NETMASK=255.255.0.0
NETWORK=12.212.0.0
TYPE=Ethernet
DEVTIMEOUT=30

Keywords: CentOS; subfunctions; static configuration

Fixed in version: 3.7.0
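
The two required settings from issue 2597973 can be verified with a small script. This is a minimal sketch, not part of the BSP: the function name check_ifcfg is hypothetical, and it assumes only the KEY=value layout of the ifcfg file shown in that entry.

```shell
#!/bin/sh
# check_ifcfg FILE: hypothetical helper. Succeeds only if FILE sets
# NM_CONTROLLED to "no" (quoted or unquoted) and DEVTIMEOUT to 30.
check_ifcfg() {
    file=$1
    grep -Eq '^NM_CONTROLLED="?no"?[[:space:]]*$' "$file" || {
        echo "missing NM_CONTROLLED=\"no\" in $file" >&2
        return 1
    }
    grep -Eq '^DEVTIMEOUT="?30"?[[:space:]]*$' "$file" || {
        echo "missing DEVTIMEOUT=30 in $file" >&2
        return 1
    }
    echo "$file: OK"
}
```

For instance, check_ifcfg /etc/sysconfig/network-scripts/ifcfg-p0m0 would report a missing setting on stderr and return a non-zero status.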

2581534

Description: When shared RQ mode is enabled and offloads are disabled, running multiple UDP connections from multiple interfaces can lead to packet drops.

Keywords: Offload; shared RQ

Fixed in version: 3.7.0

2581621

Description: When OVS-DPDK and LAG are configured, the kernel driver drops the LACP packet when working in shared RQ mode.

Keywords: OVS-DPDK; LAG; LACP; shared RQ

Fixed in version: 3.7.0

2601094

Description: The gpio-mlxbf2 and mlxbf-gige drivers are not supported on 4.14 kernel.

Keywords: Drivers; kernel

Fixed in version: 3.7.0

2584427

Description: Virtio-net-controller does not function properly after changing uplink representor MTU.

Keywords: Virtio-net controller; MTU

Fixed in version: 3.7.0

2438392

Description: VXLAN with IPsec crypto offload does not work.

Keywords: VXLAN; IPsec crypto

Fixed in version: 3.7.0

2406401

Description: Address Translation Services is not supported in BlueField-2 step A1 devices. Enabling ATS can cause server hang.

Keywords: ATS

Fixed in version: 3.7.0

2402531

Description: PHYless reset on BlueField-2 devices may cause the device to disappear.

Keywords: PHY; firmware reset

Fixed in version: 3.7.0

2400381

Description: When working with strongSwan 5.9.0bf, running ip xfrm state show returns only partial information about the offload parameters and does not show "mode full".

Keywords: strongSwan; ip xfrm; IPsec

Fixed in version: 3.7.0

2392604

Description: Server crashes after configuring PCI_SWITCH_EMULATION_NUM_PORT to a value higher than the number of PCIe lanes the server supports.

Keywords: Server; hang

Fixed in version: 3.7.0

2293791

Description: Loading/reloading NVMe after enabling VirtIO fails with a PCI bar memory mapping error.

Keywords: VirtIO; NVMe

Fixed in version: 3.7.0

2245983

Description: When working with OVS in the kernel and using Connection Tracking, up to 500,000 flows may be offloaded.

Keywords: DPU; Connection Tracking

Fixed in version: 3.7.0

1945513

Description: If the Linux OS running on the host connected to the BlueField DPU has a kernel version lower than 4.14, the MLNX_OFED package should be installed on the host.

Keywords: Host OS

Fixed in version: 3.7.0

1900203

Description: During heavy traffic, ARP reply from the other tunnel endpoint may be dropped. If no ARP entry exists when flows are offloaded, they remain stuck on the slow path.

Workaround: Set a static ARP entry at the BlueField Arm to VXLAN tunnel endpoints.

Keywords: ARP; Static; VXLAN; Tunnel; Endpoint

Fixed in version: 3.7.0
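
The workaround for issue 1900203 can be sketched with the iproute2 ip neigh command. The IP address, MAC address, and interface name below are placeholders, not values from these release notes, and static_arp_cmd is a hypothetical helper that only prints the command so it can be reviewed before being run as root on the BlueField Arm.

```shell
#!/bin/sh
# static_arp_cmd IP MAC DEV: hypothetical helper that prints the iproute2
# command for a permanent (static) neighbor entry. It changes nothing itself.
static_arp_cmd() {
    [ $# -eq 3 ] || { echo "usage: static_arp_cmd IP MAC DEV" >&2; return 1; }
    echo "ip neigh replace $1 lladdr $2 dev $3 nud permanent"
}

# Placeholder values for a remote VXLAN tunnel endpoint (assumptions).
static_arp_cmd 192.0.2.10 00:11:22:33:44:55 p0
```

The printed command would be repeated once per tunnel endpoint, substituting the real endpoint address, its MAC, and the interface that faces it.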

2082985

Description: During boot, the system enters systemctl emergency mode due to a corrupt root file system.

Keywords: Boot

Fixed in version: 3.6.0.11699

2278833

Description: Creating a bond via NetworkManager and restarting the driver (openibd restart) results in no pf0hpf and bond creation failure.

Keywords: Bond; LAG; network manager; driver reload

Fixed in version: 3.6.0.11699

2286596

Description: Only up to 62 host virtual functions are currently supported.

Keywords: DPU; SR-IOV

Fixed in version: 3.6.0.11699

2397932

Description: Before changing SR-IOV mode or reloading the mlx5 drivers on IPsec-enabled systems, make sure all IPsec configurations are cleared by issuing the command ip x s f && ip x p f.

Keywords: IPsec; SR-IOV; driver

Fixed in version: 3.6.0.11699
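
The abbreviated command in issue 2397932 expands to the full iproute2 forms shown below. This is a minimal sketch: flush_ipsec is a hypothetical wrapper, and the DRY_RUN variable is an assumption added here so the commands can be previewed without root privileges.

```shell
#!/bin/sh
# flush_ipsec: hypothetical wrapper around the two flush commands.
# "ip x s f" is short for "ip xfrm state flush" and
# "ip x p f" is short for "ip xfrm policy flush".
flush_ipsec() {
    for cmd in "ip xfrm state flush" "ip xfrm policy flush"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"          # preview only (default)
        else
            $cmd || return 1     # run for real; requires root
        fi
    done
}

flush_ipsec    # prints both commands; DRY_RUN=0 flush_ipsec executes them
```

Defaulting to a preview keeps the sketch safe to run on a system with live IPsec state.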

2405039

Description: In Ubuntu, during or after a reboot of the Arm (either manual or as part of a firmware reset), the network devices may not transition to switchdev mode. No device representors are created (pf0hpf, pf1hpf, etc.). Driver loading on the host times out after 120 seconds.

Keywords: Ubuntu; reboot; representors; switchdev

Fixed in version: 3.6.0.11699

2403019

Description: EEPROM storage for UEFI variables may run out of space and cause various issues, such as an inability to push a new BFB (due to timeout) or an exception when trying to enter the UEFI boot menu.

Keywords: BFB install; timeout; EEPROM UEFI Variable; UVPS

Fixed in version: 3.6.0.11699

2458040

Description: When using OpenSSL on BlueField platforms where Crypto support is disabled, the following errors may be encountered:
PKA_ENGINE: PKA instance is invalid
PKA_ENGINE: failed to retrieve valid instance
This happens because the OpenSSL configuration is linked to use PKA hardware, but that hardware is not available since crypto support is disabled on these platforms.

Keywords: PKA; Crypto

Fixed in version: 3.6.0.11699

2456947

Description: All NVMe emulation counters (Ctrl, SQ, Namespace) return "0" when queried.

Keywords: Emulated devices; NVMe

Fixed in version: 3.6.0.11699

2411542

Description: Multi-APP QoS is not supported when LAG is configured.

Keywords: Multi-APP QoS; LAG

Fixed in version: 3.6.0.11699

2394130

Description: When creating a large number of VirtIO VFs, hung task call traces may be seen in dmesg.

Keywords: VirtIO; call traces; hang

Fixed in version: 3.5.1.11601

2398050

Description: Only up to 60 virtio-net emulated virtual functions are supported if LAG is enabled.

Keywords: Virtio-net; LAG

Fixed in version: 3.5.1.11601

2256134

Description: On rare occasions, rebooting the BlueField DPU may result in traffic failure from the x86 host.

Keywords: Host; Arm

Fixed in version: 3.5.1.11601

2400121

Description: When emulated PCIe switch is enabled, and more than 8 PFs are enabled, the BIOS boot process might halt.

Keywords: Emulated PCIe switch

Fixed in version: 3.5.0.11563

2082985

Description: During boot, the system enters systemctl emergency mode due to a corrupt root file system.

Keywords: Boot

Fixed in version: 3.5.0.11563

2249187

Description: With the OCP card connecting to multiple hosts, one of the hosts could have the RShim PF exposed and probed by the RShim driver.

Keywords: RShim; multi-host

Fixed in version: 3.5.0.11563

2363650

Description: When moving to separate mode on the DPU, the OVS bridge remains and no ping is transmitted between the Arm cores and the remote server.

Keywords: SmartNIC; operation modes

Fixed in version: 3.5.0.11563

2394226

Description: Pushing the BFB image v3.5 with a WinOF-2 version older than 2.60 can cause a crash on the host side.

Keywords: Windows; RShim

Fixed in version: 3.5.0.11563

© Copyright 2023, NVIDIA. Last updated on Jun 23, 2023.