NVIDIA BlueField DPU BSP v4.7.0

Bug Fixes History

Ref #

Issue Description

3660460

Description: Ubuntu kernel 5.15.0-88-generic backports a bug from the upstream kernel which results in virtio-net full emulation not functioning.

Keywords: Kernel

Fixed in version: 4.6.0

3695367

Description: For BlueField-2, although an option to configure "large ICM size" appears in the UEFI menu it is not functional as large ICM size is not supported on it.

Keywords: UEFI

Fixed in version: 4.6.0

3571285

Description: Intermittent UEFI/grub exception after many power-cycles:

Call Stack: Synchronous Exception at 0xF4B72E0C
ERR[UEFI]: PC=0xF4B72E0C
ERR[UEFI]: PC=0xF4B72E70
ERR[UEFI]: PC=0xF4B73570
ERR[UEFI]: PC=0xF4B74904
ERR[UEFI]: PC=0xF4F04444
ERR[UEFI]: PC=0xF4F044F8
ERR[UEFI]: PC=0xF4F05160
ERR[UEFI]: PC=0xF4F02030
ERR[UEFI]: PC=0xFDFC3A38 (0xFDFB0000+0x13A38) [ 1] DxeCore.dll
ERR[UEFI]: PC=0xF56E3594 (0xF56D4000+0xF594) [ 2] BdsDxe.dll
ERR[UEFI]: PC=0xF56F1FFC (0xF56D4000+0x1DFFC) [ 2] BdsDxe.dll
ERR[UEFI]: PC=0xF56F40D4 (0xF56D4000+0x200D4) [ 2] BdsDxe.dll
ERR[UEFI]: PC=0xFDFC6E50 (0xFDFB0000+0x16E50) [ 3] DxeCore.dll
ERR[UEFI]: PC=0x880092E0
ERR[UEFI]: PC=0x8800947C
ERR[UEFI]: X0=0x0 X1=0xF4B78FC3 X2=0xE X3=0x0
ERR[UEFI]: X4=0x0 X5=0xFFFFFFFFFFFFFFF8 X6=0x0 X7=0xFFFFFFF5
ERR[UEFI]: X8=0xF4B79480 X9=0x2 X10=0xFFFFFFFFFFFFFFFF X11=0xFFFFDC00

Keyword: Security

Fixed in version: 4.5.0

3599839

Description: On a reboot following BFB install, the error message "Boot Image update completed, Status: Volume Corrupt" is observed. The error is non-functional and may be safely ignored.

Keyword: Software provisioning; EFI capsule update; eMMC boot partitions

Fixed in version: 4.5.0

3556795

Description: The first uplink representor interface may not be renamed to p0 from ethX.

Keyword: Representors

Fixed in version: 4.5.0

3629875

Description: Fixed base address of static ICM.

Keyword: ICM

Fixed in version: 4.5.0

3365363

Description: On BlueField-3, when booting virtio-net emulation device using a GRUB2 bootloader, the bootloader may attempt to close and re-open the virtio-net device. This can result in unexpected behavior and possible system failure to boot.

Keywords: BlueField-3; virtio-net; UEFI

Fixed in version: 4.5.0

3373849

Description: Different OVS-based packages can include their own systemd services which prevents /sbin/mlnx_bf_configure from identifying the right one.

Keywords: OVS; systemd

Fixed in version: 4.5.0

3605332

Description: A dmesg message is printed due to the OVS bridge interface being configured DOWN by default.

Keyword: OVS

Fixed in version: 4.2.1

3479040

Description: For non-LSO data, a max chain of 4 descriptors is posted onto the send queue resulting in a partial packet going out on the wire.

Keyword: Send; LSO

Fixed in version: 4.2.1

3549785

Description: NVMe and mlx5_core drivers fail during BFB installation. As a result, Anolis OS cannot be installed on the SSD and the mlxfwreset command does not work during Anolis BFB installation.

Keyword: Linux; NVMe; BFB installation

Fixed in version: 4.2.1

3393316

Description: When LSO is enabled, if the header and data appear in the same fragment, the following warning is given from tcpdump:

truncated-ip - 9 bytes missing

Keyword: Virtio-net; large send offload

Fixed in version: 4.2.1

3554128

Description: "dmidecode" output does not match "ipmitool fru print" output.

Keywords: IPMI; print

Fixed in version: 4.2.1

3508018

Description: Failure to ssh to Arm via 1GbE OOB interface is experienced after performing warm reboot on the DPU.

Keywords: SSH; reboot

Fixed in version: 4.2.0

3451539

Description: BSP build number (fourth digit in version number) does not appear in UEFI menu.
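
To illustrate which field is meant, here is a minimal sketch that extracts the fourth dot-separated field from a four-part version string of the form used elsewhere in these notes (for example 3.6.0.11699). The function name bsp_build_number is hypothetical and is not part of the BSP.

```shell
#!/bin/sh
# Print the fourth dot-separated field of a version string.
# bsp_build_number is a hypothetical helper name used for illustration.
bsp_build_number() {
    printf '%s\n' "$1" | cut -d. -f4
}

# Example with a version string that appears in these notes.
bsp_build_number 3.6.0.11699   # prints 11699
```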

Keywords: UEFI; software

Fixed in version: 4.2.0

3259805

Description: Following many power cycles on the BlueField DPU, the virtio-net controller may fail to start with the error "failed to register epoll" in the log.

Keywords: Virtio-net; power cycle; epoll

Fixed in version: 4.2.0

3266180

Description: Enabled reset on MMC to enhance recovery on error.

Keywords: MMC; reset

Fixed in version: 4.2.0

3448217

Description: The PKA engine is not working on CentOS 7.6 due to multiple OpenSSL versions (1.0.2k and 1.1.1k) being installed and the library loader not selecting the correct version of the OpenSSL library.

Keywords: PKA; OpenSSL

Fixed in version: 4.2.0

3448228

Description: On virtio-net devices with LSO (large send offload) enabled, bogus packets may be captured on the SF representor when running heavy iperf traffic.

Keywords: Virtio-net; iperf

Fixed in version: 4.2.0

3452583

Description: OpenSSL does not work with the PKA engine on CentOS 7.6 with 4.23, 5.4, and 5.10 kernels because multiple versions of OpenSSL (1.0.2k and 1.1.1k) are installed.

Keywords: OpenSSL; PKA

Fixed in version: 4.2.0

3455873

Description: 699140280000 OPN is not supported.

Keywords: SKU; support

Fixed in version: 4.2.0

3519341

Description: Populate the vGIC maintenance interrupt number in MADT to avoid a harmless error message.

Keywords: Error

Fixed in version: 4.2.0

3522652

Description: The timer frequency is measured using the c0 fmon feature causing new kernels to complain if CNTFRQ_EL0 has a different value on different cores.

Keywords: Timer frequency

Fixed in version: 4.2.0

3531965

Description: Memory info displayed via dmidecode is not correct for memory sizes 32G and above.

Keywords: Memory; dmidecode

Fixed in version: 4.2.0

3362181

Description: A customized BFB with an older kernel does not support bond speed above 200Gb/s.

Keywords: Bond; LAG; speed

Fixed in version: 4.2.0

3177569

Description: DCBX configuration may not take effect.

Keywords: DCBX; QoS; lldpad

Fixed in version: 4.2.0

2824859

Description: Hotplug/unplug of virtio-net devices during host shutdown/bootup may result in failure to do plug/unplug.

Keywords: Virtio-net; hotplug

Fixed in version: 4.2.0

3252083

Description: Assert errors may be observed in the RShim log after reset/reboot. These errors are harmless and may be ignored.

Keywords: RShim; log; error

Fixed in version: 4.0.3

3240060

Description: Hotplug of a modern virtio-net device is not supported when VIRTIO_EMULATION_HOTPLUG_TRANS is TRUE from mlxconfig.

Keywords: Virtio-net; hotplug; legacy

Fixed in version: 4.0.3

3240182

Description: Virtio-net full emulation is not supported in CentOS 8.2 with inbox-kernel 4.18.0-193.el8.aarch64.

Keywords: Virtio-net; CentOS

Fixed in version: 4.0.3

3151884

Description: If secure boot is enabled, the following error message is observed while installing Ubuntu on the DPU: "ERROR: need to use capsule in secure boot mode". This message is harmless and may be safely ignored.

Keywords: Error message; installation

Fixed in version: 3.9.3

2793005

Description: When Arm reboots or crashes after sending a virtio-net unplug request, the hotplugged devices may still be present after Arm recovers. The host, however, will not see those devices.

Keywords: Virtio-net; hotplug

Fixed in version: 3.9.3

3107227

Description: BlueField with secured BFB fails to boot up if the PART_SCHEME field is set in bf.cfg during installation.

Keywords: Installation; bf.cfg

Fixed in version: 3.9.2

3109270

Description: If the RShim service is running on an external host over the PCIe interface then, in very rare cases, a soft reset of the BlueField can cause a poisoned completion to be returned to the host. The host may treat this as a fatal error and crash.

Keywords: RShim; ATF

Fixed in version: 3.9.2

2790928

Description: Virtio-net-controller recovery may not work for a hot-plugged device because the system assigns a BDF (string identifier) of 0 for the hot-plugged device, which is an invalid value.

Keywords: Virtio-net; hotplug; recovery

Fixed in version: 3.9.0

2780819

Description: Eye-opening is not supported on 25GbE integrated-BMC BlueField-2 DPU.

Keywords: Firmware; eye-opening

Fixed in version: 3.9.0

2876447

Description: Virtio full emulation is not supported by NVIDIA® BlueField®-2 multi-host cards.

Keywords: Virtio full emulation; multi-host

Fixed in version: 3.9.0

2855485

Description: After BFB installation, Linux crash may occur with efi_call_rts messages in the call trace which can be seen from the UART console.

Keywords: Linux crash; efi_call_rts

Fixed in version: 3.9.0

2901514

Description: Relaxed ordering is not working properly on virtual functions.

Keywords: MLNX_OFED; relaxed ordering; VF

Fixed in version: 3.9.0

2852086

Description: On rare occasions, the UEFI variables in the UPVS EEPROM are wiped out, which hangs the boot process at the UEFI menu.

Keywords: UEFI; hang

Fixed in version: 3.9.0

2934828

Description: PCIe device address to RDMA device name mapping on x86 host may change after the driver restarts in Arm.

Keywords: RDMA; Arm; driver

Fixed in version: 3.9.0

-

Description: RShim driver does not work when the host is in secure boot mode.

Keywords: RShim; Secure Boot

Fixed in version: 3.9.0

2787308

Description: On rare occasions during Arm reset on BMC-integrated DPUs, the DPU will send "PCIe Completion" marked as poisoned. Some servers treat that as fatal and may hang.

Keywords: Arm reset; BMC integrated

Fixed in version: 3.9.0

2585607

Description: Pushing the BFB image fails occasionally with a "bad magic number" error message showing up in the console.

Keywords: BFB push; installation

Fixed in version: 3.9.0

2802943

Description: SLD detection may not function properly.

Keywords: Firmware

Fixed in version: 3.9.0

2580945

Description: External host reboot may also reboot the Arm cores if the DPU was configured using mlxconfig.

Keywords: Non-volatile configuration; Arm; reboot

Fixed in version: 3.9.0

2899740

Description: BlueField-2 may sometimes go to PXE boot instead of Linux after installation.

Keywords: Installation; PXE

Fixed in version: 3.8.5

2870143

Description: Some DPUs may get stuck at GRUB menu when booting due to the GRUB configuration getting corrupted when board is powered down before the configuration is synced to memory.

Keywords: GRUB; memory

Fixed in version: 3.8.5

2873700

Description: The available RShim logging buffer may not have enough space to hold the whole register dump which may cause buffer wraparound.

Keywords: RShim; logging

Fixed in version: 3.8.5

2801891

Description: IPMI EMU service reports cable link as down when it is actually up.

Keywords: IPMI EMU

Fixed in version: 3.8.0

2779861

Description: Virtio-net controller does not work with devices other than mlx5_0/1.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2801378

Description: No parameter validation is done for feature bits when performing hotplug.

Keywords: Virtio-net; hotplug

Fixed in version: 3.8.0

2802917

Description: When secure boot is enabled, PXE boot may not work.

Keywords: Secure boot; PXE

Fixed in version: 3.8.0

2827413

Description: Updating a BFB could fail due to congestion.

Keywords: Installation; congestion

Fixed in version: 3.8.0

2829876

Description: For virtio-net device, modifying the number of queues does not update the number of MSIX.

Keywords: Virtio-net; queues

Fixed in version: 3.8.0

2597790

Description: A "double free" error is seen when using the "curl" utility. This happens only when OpenSSL is configured to use a dynamic engine (e.g. Bluefield PKA engine).

Keywords: OpenSSL; curl

Fixed in version: 3.8.0

2853295

Description: UEFI secure boot enables the kernel lockdown feature which blocks access by mstmcra.

Keywords: Secure boot

Fixed in version: 3.8.0

2854472

Description: Virtio-net controller may fail to start after power cycle.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2854995

Description: Memory consumed for a representor exceeds what is necessary, making scaling to 504 SFs not possible.

Keywords: Memory

Fixed in version: 3.8.0

2856652

Description: Modifying VF bits yields an error.

Keywords: Virtio-net controller

Fixed in version: 3.8.0

2859066

Description: Arm hangs when user is thrown to livefish by FW (e.g. secure boot).

Keywords: Arm; livefish

Fixed in version: 3.8.0

2866082

Description: The current installation flow requires multiple resets after booting the self-install BFB due to the watchdog being armed after capsule update.

Keywords: Reset; installation

Fixed in version: 3.8.0

2866537

Description: Power-off of BlueField shows up as a panic which is then stored in the RShim log and carried into the BERT table in the next boot which is misleading to the user.

Keywords: RShim; log; panic

Fixed in version: 3.8.0

2868944

Description: Various errors related to the UPVS store running out of space are observed.

Keywords: UPVS; errors

Fixed in version: 3.8.0

2754798

Description: oob_net0 cannot receive traffic after a network restart.

Keywords: oob_net0

Fixed in version: 3.8.0

2691175

Description: Up to 31 hot-plugged virtio-net devices are supported even if PCI_SWITCH_EMULATION_NUM_PORT=32. Host may hang if it hot plugs 32 devices.

Keywords: Virtio-net; hotplug

Fixed in version: 3.8.0

2597973

Description: When working with CentOS 7.6, if SF network interfaces are statically configured, the following parameters should be set:

NM_CONTROLLED="no"

DEVTIMEOUT=30

For example:

# cat /etc/sysconfig/network-scripts/ifcfg-p0m0
NAME=p0m0
DEVICE=p0m0
NM_CONTROLLED="no"
PEERDNS="yes"
ONBOOT="yes"
BOOTPROTO="static"
IPADDR=12.212.10.29
BROADCAST=12.212.255.255
NETMASK=255.255.0.0
NETWORK=12.212.0.0
TYPE=Ethernet
DEVTIMEOUT=30

Keywords: CentOS; subfunctions; static configuration

Fixed in version: 3.7.0

2581534

Description: When shared RQ mode is enabled and offloads are disabled, running multiple UDP connections from multiple interfaces can lead to packet drops.

Keywords: Offload; shared RQ

Fixed in version: 3.7.0

2581621

Description: When OVS-DPDK and LAG are configured, the kernel driver drops the LACP packet when working in shared RQ mode.

Keywords: OVS-DPDK; LAG; LACP; shared RQ

Fixed in version: 3.7.0

2601094

Description: The gpio-mlxbf2 and mlxbf-gige drivers are not supported on 4.14 kernel.

Keywords: Drivers; kernel

Fixed in version: 3.7.0

2584427

Description: Virtio-net-controller does not function properly after changing uplink representor MTU.

Keywords: Virtio-net controller; MTU

Fixed in version: 3.7.0

2438392

Description: VXLAN with IPsec crypto offload does not work.

Keywords: VXLAN; IPsec crypto

Fixed in version: 3.7.0

2406401

Description: Address Translation Services is not supported in BlueField-2 step A1 devices. Enabling ATS can cause server hang.

Keywords: ATS

Fixed in version: 3.7.0

2402531

Description: PHYless reset on BlueField-2 devices may cause the device to disappear.

Keywords: PHY; firmware reset

Fixed in version: 3.7.0

2400381

Description: When working with strongSwan 5.9.0bf, running ip xfrm state show returns partial information as to the offload parameters, not showing "mode full".

Keywords: strongSwan; ip xfrm; IPsec

Fixed in version: 3.7.0

2392604

Description: Server crashes after configuring PCI_SWITCH_EMULATION_NUM_PORT to a value higher than the number of PCIe lanes the server supports.

Keywords: Server; hang

Fixed in version: 3.7.0

2293791

Description: Loading/reloading NVMe after enabling VirtIO fails with a PCI bar memory mapping error.

Keywords: VirtIO; NVMe

Fixed in version: 3.7.0

2245983

Description: When working with OVS in the kernel and using Connection Tracking, up to 500,000 flows may be offloaded.

Keywords: DPU; Connection Tracking

Fixed in version: 3.7.0

1945513

Description: If the Linux OS running on the host connected to the BlueField DPU has a kernel version lower than 4.14, the MLNX_OFED package should be installed on the host.

Keywords: Host OS

Fixed in version: 3.7.0

1900203

Description: During heavy traffic, ARP reply from the other tunnel endpoint may be dropped. If no ARP entry exists when flows are offloaded, they remain stuck on the slow path.

Workaround: Set a static ARP entry at the BlueField Arm to VXLAN tunnel endpoints.
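
A minimal sketch of this workaround, assuming the iproute2 `ip neigh` command is available on the BlueField Arm. The IP address, MAC address, and interface name are hypothetical placeholders to be replaced with the values of the remote VXLAN tunnel endpoint. The helper only prints the command so that it can be reviewed before being run as root.

```shell
#!/bin/sh
# Build (but do not run) an iproute2 command that pins a permanent
# neighbor (ARP) entry for a remote VXLAN tunnel endpoint.
# All argument values used below are hypothetical placeholders.
static_arp_cmd() {
    remote_ip="$1"   # IP address of the remote tunnel endpoint
    remote_mac="$2"  # MAC address of the remote tunnel endpoint
    dev="$3"         # local interface that reaches the endpoint
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' \
        "$remote_ip" "$remote_mac" "$dev"
}

# Example: print the command for a placeholder endpoint.
static_arp_cmd 192.0.2.10 00:11:22:33:44:55 p0
```

Printing the command rather than executing it keeps the sketch safe to try on any machine; `replace` is used so that an existing dynamic entry for the same address is overwritten.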

Keywords: ARP; Static; VXLAN; Tunnel; Endpoint

Fixed in version: 3.7.0

2082985

Description: During boot, the system enters systemctl emergency mode due to a corrupt root file system.

Keywords: Boot

Fixed in version: 3.6.0.11699

2278833

Description: Creating a bond via NetworkManager and restarting the driver (openibd restart) results in no pf0hpf and bond creation failure.

Keywords: Bond; LAG; network manager; driver reload

Fixed in version: 3.6.0.11699

2286596

Description: Only up to 62 host virtual functions are currently supported.

Keywords: DPU; SR-IOV

Fixed in version: 3.6.0.11699

2397932

Description: Before changing SR-IOV mode or reloading the mlx5 drivers on IPsec-enabled systems, make sure all IPsec configurations are cleared by issuing the command ip x s f && ip x p f.
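
The abbreviated command above is shorthand for the iproute2 forms `ip xfrm state flush` and `ip xfrm policy flush`. Below is a minimal sketch of a guard that could be run before an SR-IOV mode change or driver reload. The function name and the RUN dry-run variable are hypothetical conveniences, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Flush all IPsec (xfrm) states and policies before changing SR-IOV
# mode or reloading the mlx5 drivers.
# RUN is a hypothetical dry-run switch: set RUN=echo to print the
# commands instead of executing them (executing requires root).
flush_ipsec() {
    ${RUN:-} ip xfrm state flush && ${RUN:-} ip xfrm policy flush
}

# Dry run: prints the two commands without touching the system.
RUN=echo flush_ipsec
```

Calling `flush_ipsec` without RUN set executes both flush commands, and the policy flush runs only if the state flush succeeds.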

Keywords: IPsec; SR-IOV; driver

Fixed in version: 3.6.0.11699

2405039

Description: In Ubuntu, during or after a reboot of the Arm, whether performed manually or as part of a firmware reset, the network devices may not transition to switchdev mode. No device representors would be created (pf0hpf, pf1hpf, etc.). Driver loading on the host will time out after 120 seconds.

Keywords: Ubuntu; reboot; representors; switchdev

Fixed in version: 3.6.0.11699

2403019

Description: EEPROM storage for UEFI variables may run out of space and cause various issues such as an inability to push new BFB (due to timeout) or exception when trying to enter UEFI boot menu.

Keywords: BFB install; timeout; EEPROM UEFI Variable; UPVS

Fixed in version: 3.6.0.11699

2458040

Description: When using OpenSSL on BlueField platforms where Crypto support is disabled, the following errors may be encountered:

PKA_ENGINE: PKA instance is invalid

PKA_ENGINE: failed to retrieve valid instance

This happens because the OpenSSL configuration is linked to use PKA hardware, but that hardware is not available since crypto support is disabled on these platforms.

Keywords: PKA; Crypto

Fixed in version: 3.6.0.11699

2456947

Description: All NVMe emulation counters (Ctrl, SQ, Namespace) return "0" when queried.

Keywords: Emulated devices; NVMe

Fixed in version: 3.6.0.11699

2411542

Description: Multi-APP QoS is not supported when LAG is configured.

Keywords: Multi-APP QoS; LAG

Fixed in version: 3.6.0.11699

2394130

Description: When creating a large number of VirtIO VFs, hung task call traces may be seen in the dmesg.

Keywords: VirtIO; call traces; hang

Fixed in version: 3.5.1.11601

2398050

Description: Only up to 60 virtio-net emulated virtual functions are supported if LAG is enabled.

Keywords: Virtio-net; LAG

Fixed in version: 3.5.1.11601

2256134

Description: On rare occasions, rebooting the BlueField DPU may result in traffic failure from the x86 host.

Keywords: Host; Arm

Fixed in version: 3.5.1.11601

2400121

Description: When emulated PCIe switch is enabled, and more than 8 PFs are enabled, the BIOS boot process might halt.

Keywords: Emulated PCIe switch

Fixed in version: 3.5.0.11563

2082985

Description: During boot, the system enters systemctl emergency mode due to a corrupt root file system.

Keywords: Boot

Fixed in version: 3.5.0.11563

2249187

Description: With the OCP card connecting to multiple hosts, one of the hosts could have the RShim PF exposed and probed by the RShim driver.

Keywords: RShim; multi-host

Fixed in version: 3.5.0.11563

2363650

Description: When moving to separate mode on the DPU, the OVS bridge remains and no ping is transmitted between the Arm cores and the remote server.

Keywords: SmartNIC; operation modes

Fixed in version: 3.5.0.11563

2394226

Description: Pushing the BFB image v3.5 with a WinOF-2 version older than 2.60 can cause a crash on the host side.

Keywords: Windows; RShim

Fixed in version: 3.5.0.11563

© Copyright 2024, NVIDIA. Last updated on May 9, 2024.