Bug Fixes History

NVIDIA BlueField-3 DPU NIC Firmware Release Notes v32.41.1000

Internal Ref.

Issue

3712016

Description: Fixed an issue that prevented Congestion Control from behaving properly when GRH is used in traffic of an IB cluster.

Keywords: IB Congestion Control, CNP, SL

Discovered in Version: 32.39.2048

Fixed in Release: 32.40.1000

3708035

Description: Fixed an issue with the Selective-Repeat configuration that occasionally caused retransmission to wait for a timeout instead of being triggered by an out-of-sequence NACK.

Keywords: RoCE, SR

Discovered in Version: 32.38.1002

Fixed in Release: 32.40.1000

3695219

Description: Enabled the lowest minimum rate for SW DCQCN, allowing congestion control to handle a larger number of QPs without pauses or drops.

Keywords: Congestion control, PCC, DCQCN

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000
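
The effect of a lower minimum rate can be illustrated with simple arithmetic: if every QP is throttled down to its floor, the combined floor rate of all QPs still has to fit within the link rate. The sketch below is only an illustration; the function name and all rate figures are hypothetical and are not taken from the firmware.

```python
def max_qps_at_floor(link_rate_mbps: int, min_rate_mbps: int) -> int:
    """Largest number of QPs whose combined floor rate still fits the link.

    Both arguments are hypothetical example values, not firmware settings.
    """
    if min_rate_mbps <= 0:
        raise ValueError("min_rate_mbps must be positive")
    # Each QP can be slowed to min_rate_mbps at most, so N QPs need
    # N * min_rate_mbps <= link_rate_mbps to avoid oversubscription.
    return link_rate_mbps // min_rate_mbps


if __name__ == "__main__":
    # Lowering the per-QP floor tenfold raises the QP count tenfold.
    print(max_qps_at_floor(400_000, 100))
    print(max_qps_at_floor(400_000, 10))
```

Under this simplified model, reducing the floor is what lets more QPs share the link before pauses or drops become unavoidable.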

3481864

Description: Fixed an issue that caused the console to get stuck and a kernel call trace to appear when destroying native VFs or unloading the MLNX_OFED driver with the mlxconfig configuration set to 192 native VFs + 416 VBLK VFs + 416 VNET VFs.

Keywords: Call trace, host, NIC mode, DPU mode

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000

3659549

Description: Fixed an issue that resulted in packet loss in a 3rd party NVMF target when using migreq==0 over Ethernet. Such an error is now ignored, and the system stays in the current (MIGRATED) PA state.

Keywords: NVMe-oF Connectivity

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000

3469692

Description: Added support for 16M IPsec sessions.

Keywords: IPsec

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000

3671356

Description: Added new parameters for PLDM temperature thresholds to the B3140H DPU cards:

  • Warning - 97°C

  • Critical - 102°C

  • Hysteresis - 5°C

Keywords: PLDM, temperature

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000
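
A minimal sketch of how the three values above could interact, assuming the common convention that a threshold state clears only once the reading falls to the threshold minus the hysteresis. The state names and the next_state function are hypothetical; the exact PLDM sensor behavior on the B3140H may differ.

```python
WARNING_C = 97      # warning threshold from the entry above
CRITICAL_C = 102    # critical threshold from the entry above
HYSTERESIS_C = 5    # hysteresis from the entry above


def next_state(state: str, temp_c: float) -> str:
    """Return 'normal', 'warning', or 'critical' for a new reading.

    Assumption: a state is entered at its threshold and is left only
    when the reading drops to (threshold - hysteresis) or below.
    """
    if temp_c >= CRITICAL_C:
        return "critical"
    if state == "critical" and temp_c > CRITICAL_C - HYSTERESIS_C:
        return "critical"  # still inside the critical hysteresis band
    if temp_c >= WARNING_C:
        return "warning"
    if state in ("warning", "critical") and temp_c > WARNING_C - HYSTERESIS_C:
        return "warning"   # still inside the warning hysteresis band
    return "normal"


if __name__ == "__main__":
    state = "normal"
    for reading in (95, 98, 94, 92, 103, 99, 96, 91):
        state = next_state(state, reading)
        print(reading, state)
```

The hysteresis band keeps the reported state from flapping when the temperature hovers near a threshold.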

3686150

Description: Fixed an issue in the RTT template that caused letters at the end of the filename to be dropped from its description, because they were not aligned, when querying for the description using the PPCC command.

Keywords: PPCC, DOCA PCC

Discovered in Version: 32.38.3056

Fixed in Release: 32.40.1000

Internal Ref.

Issue

3614288

Description: Fixed an issue on special systems with separate power supply that caused the host to hang and RDMA to fail in virtio-net-controller when performing the following steps:

  1. Hotplug 31 vnet devices

  2. Power off the host

  3. Unplug the 31 vnet devices

  4. Hotplug 31 vnet devices

  5. Power on the host

Keywords: hotplug, RDMA, virtio-net-controller

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3661385

Description: BlueField Arm cores that serve as the PCIe Root-Port of PCIe End-Point devices (e.g., NVMe SSDs) connected to BlueField’s PCIe interfaces may receive an MSI-X (used by a device to indicate an event) prior to the PCIe CQE writes, resulting in a driver interrupt handler trying to retrieve data in an inconsistent state.

Keywords: MSI-X, NVMe

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3629353

Description: Fixed the cr_space in the port configuration to prevent incorrect CQE timestamps.

Keywords: Hardware timestamp

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3627384

Description: Fixed an issue that prevented the PCC flow context database from being cleared when starting a new DOCA PCC application. Clearing the database keeps the state left by the legacy active application from impacting the new application's behavior.

Keywords: PCC flow

Discovered in Version: 32.38.3056

Fixed in Release: 32.39.2048

3630586

Description: Updated the HW ETS (QETCR RL) default to be per host-port instead of per physical-port to prevent bandwidth degradation.

Keywords: Performance

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3636595

Description: Fixed an issue that caused the TX to hang and create a "TX timeout" error in dmesg after unplugging the device forcefully during server warm reboot.

Keywords: hotplug, virtio, NVMe, warm reboot, TX timeout

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3653763

Description: Fixed an issue that prevented the server from booting up (after a power cycle) when there were 31 hotplug devices on a customized server with a BlueField-3 DPU that has an independent power supply.

Keywords: Power cycle, hotplug device, server

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3547022

Description: Fixed an issue that resulted in a reset failure when network drivers were unloaded on an external host and the sync1 reset was still reported as 'supported' although it was not.

Keywords: sync1 reset

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3546787

Description: Extended the number of elastic buffer lock attempts to prevent rare Tx issues during Gen1.

Keywords: PCIe

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3591726

Description: Fixed an issue when in LAG mode that resulted in RoCE traffic having less throughput when Congestion Control (CC) mode is enabled than when CC mode is disabled.

Keywords: Congestion Control, LAG, bond, Bandwidth, RoCE

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3482251

Description: Added support for a hairpin drop counter in the QUERY_VNIC_ENV command.

Keywords: Hairpin

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3571251

Description: Fixed an issue that resulted in migration data corruption when running parallel save_vhca_state/load_vhca_state commands on the same PF.

Keywords: VF live migration

Discovered in Version: 32.38.1002

Fixed in Release: 32.39.2048

3602176

Description: Updated OOB counter behavior.

Keywords: OOB

Discovered in Version: 32.37.1306

Fixed in Release: 32.39.2048

3140048

Description: The DPC mechanism is currently not supported.

Keywords: DPC, PCIe

Discovered in Version: 32.37.1306

Fixed in Release: 32.39.2048

Internal Ref.

Issue

3629562

Description: Fixed a code mismatch in the handling of the link-down cause when remote faults were received.

Keywords: Link down

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

3602526

Description: Fixed an issue that led to packet drops on lossless fabric due to an Rx buffer overflow.

Keywords: PFC

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

3614448

Description: Fixed an issue in LAG mode that resulted in RoCE traffic showing significantly less throughput with CC mode enabled than with CC mode disabled.

Keywords: Bandwidth, LAG, CC

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

3535284

Description: Fixed an issue with sending loopback traffic when the Rate Limiter was enabled, which prevented the user from exceeding the wire speed.

Keywords: Rate Limiter

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

3556822

Description: Modified the flow of CC events to the PCC application so that, when a new application is loaded, the events are received only after the PCC application finishes synchronizing information with the firmware.

Keywords: DOCA PCC, Programmable Congestion Control, high availability

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056
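
The ordering described in this entry can be pictured as a gate placed in front of the application's event handler. The sketch below is purely illustrative: the EventGate class and its methods are hypothetical, and it assumes that events arriving before synchronization completes are held and delivered afterwards, which the entry does not state.

```python
from collections import deque


class EventGate:
    """Holds events until the application reports that sync is complete."""

    def __init__(self, handler):
        self._handler = handler   # callable invoked once per delivered event
        self._pending = deque()   # events that arrived before sync finished
        self._synced = False

    def on_event(self, event):
        # Before sync completes, queue the event instead of delivering it.
        if self._synced:
            self._handler(event)
        else:
            self._pending.append(event)

    def sync_done(self):
        # Open the gate and flush queued events in arrival order.
        self._synced = True
        while self._pending:
            self._handler(self._pending.popleft())


if __name__ == "__main__":
    delivered = []
    gate = EventGate(delivered.append)
    gate.on_event("cc-event-1")   # held: sync not finished yet
    gate.sync_done()              # delivers cc-event-1
    gate.on_event("cc-event-2")   # delivered immediately
    print(delivered)
```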

3605649

Description: Fixed an issue related to the SXP port VL rate limiter that resulted in bandwidth degradation. Additionally, the token in the SXD VL rate limiter is now cleared, so that setting a new rate during traffic no longer drives the token negative and blocks all outgoing bandwidth.

Keywords: Rate Limiter, VL, bandwidth

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056
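
One way to picture the token behavior in this entry is a token bucket that permits a deficit. The sketch below is a simplified software model, not the hardware implementation: the TokenBucket class, its units, and the clear flag are all hypothetical. It shows how a deficit built up at a high rate can block sending for a long time after a switch to a low rate, and how clearing the token on a rate change avoids that.

```python
class TokenBucket:
    """Simplified deficit-style token bucket (hypothetical model)."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum token balance
        self.tokens = float(burst)

    def refill(self, elapsed_s: float) -> None:
        self.tokens = min(self.burst, self.tokens + self.rate * elapsed_s)

    def try_send(self, size: float) -> bool:
        # Sending is allowed while the balance is positive; the balance
        # may then go negative, creating a deficit to be repaid by refills.
        if self.tokens <= 0:
            return False
        self.tokens -= size
        return True

    def set_rate(self, rate: float, burst: float, clear: bool = True) -> None:
        self.rate = rate
        self.burst = burst
        if clear:
            # Drop any leftover balance or deficit from the old rate.
            self.tokens = 0.0


if __name__ == "__main__":
    for clear in (False, True):
        bucket = TokenBucket(rate=1000, burst=100)
        bucket.try_send(1500)                 # leaves a large deficit
        bucket.set_rate(10, 100, clear=clear)
        bucket.refill(1.0)                    # one second at the new rate
        print(clear, bucket.try_send(1500))
```

In this model, the uncleared bucket needs 140 seconds at the new rate to repay the old deficit, while the cleared bucket can send again after the first refill.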

3583456

Description: Fixed an issue that caused the PCC DPA application to suffer from continuous communication failure due to a retry asynchronous error. This issue caused the PCC DPA application to fail to start, or mlxreg set/get operations on the PCC register to fail.

Keywords: DOCA PCC

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

3580406

Description: Fixed an issue related to VF throughput performance across multiple VF FLRs while using carveout pages.

Keywords: Performance

Discovered in Version: 32.38.1002

Fixed in Release: 32.38.3056

Internal Ref.

Issue

3506017

Description: Updated the firmware INI to enable MCTP over SMBus and PCIe.

Keywords: MCTP

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

3331179

Description: Improved token calculation.

Keywords: Token calculation

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

3495889

Description: Fixed a QoS host port rate limit shaper inaccuracy that occurred when the shaper was configured via the QSHR access register.

Keywords: Port rate limit shaper

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

3432080

Description: Fixed a reburst issue.

Keywords: Rate limit

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

3432080

Description: Improved the grated2hw token calculation.

Keywords: Rate limit (vQoS)

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

3457472

Description: Disabling the Relaxed Ordering (RO) capability (relaxed_ordering_read_pci_enabled=0) using the vhca_resource_manager is currently not functional.

Keywords: Relaxed Ordering

Discovered in Version: 32.37.1306

Fixed in Release: 32.38.1002

© Copyright 2024, NVIDIA. Last updated on May 6, 2024.