NVIDIA BlueField-3 DPU NIC Firmware Release Notes v32.43.2402 LTS

Changes and New Feature History

Note

This section lists the changes and new features going back three major releases. For the history of older releases, refer to the relevant firmware versions.

Feature/Change

Description

32.43.1014

Programmable Congestion Control (PCC)

Migrated PCC NP solution from ACE hardware platform to DPA hardware platform. The new capability is applicable to the following 2 modes:

  • PCC_INT_EN=True and PCC_INT_NP_RTT_DATA_MODE=INGRESS_BYTE

  • PCC_INT_EN=True and PCC_INT_NP_RTT_DATA_MODE=NO_DATA

The first mode is used to support ZTRCC RX bytes in RTT response.
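The two modes can be told apart with a small check such as the one below. This is a minimal sketch, assuming the configuration has already been read into a dictionary of strings; the function names and the dictionary form are hypothetical and not part of any NVIDIA tool.

```python
# Parameter names and values are copied from the two modes listed above.
# The dict-of-strings representation is an assumption made for illustration.
SUPPORTED_RTT_DATA_MODES = ("INGRESS_BYTE", "NO_DATA")


def pcc_np_dpa_mode(config):
    """Return the matching RTT data mode, or None if neither mode applies."""
    if config.get("PCC_INT_EN") != "True":
        return None
    mode = config.get("PCC_INT_NP_RTT_DATA_MODE")
    return mode if mode in SUPPORTED_RTT_DATA_MODES else None


def carries_rx_bytes(config):
    """True only for the first mode, which supports ZTRCC RX bytes in the RTT response."""
    return pcc_np_dpa_mode(config) == "INGRESS_BYTE"
```

For example, a configuration with PCC_INT_EN=True and PCC_INT_NP_RTT_DATA_MODE=NO_DATA matches the second mode, so carries_rx_bytes returns False for it.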

HPCC2 Custom Header

Added support for HPCC2 custom header insertion in RTT request packets for DOCA PCC. The capability will be supported when setting ROCE_CC_STEERING_EXT = ENABLED.

High Availability for virtio-net-controller

Added support for a second emulation VirtIO blk and net device on the same vHCA to enable switching to the second emulation device and reduce downtime.

RDMA Telemetry

Added the option to indicate an error CQE event on every selected function per eSwitch manager. This indication is defined as a new WQE including the relevant information about the error (such as: syndrome, function_id, timestamp, QPs num etc.).

The feature is configured using a new general object: RDMA-Telemetry object, and depends on the following new caps: HCA_CAP.rdma_telemetry_notification_types and HCA_CAP.rdma_telemetry.

UID Permissions

Extended the kernel lockdown permission set. The following sub-operations can now be called by tools (permission TOOLS_RESORCES) using the new HCA capability bitmask field tool_partial_cap.

The 5 sub-operations are:

  • QUERY_HCA_CAP with other function

  • QUERY_VUID with direct data

  • QUERY_ROCE_ADDRESS with other vport

  • SET_HCA_CAP with other function

  • POSTPONE_CONNECTED_QP_TIMEOUT with other vport

The newly added caps are:

  • tool_partial_cap.postpone_conn_qp_timeout_other_vport

  • tool_partial_cap.set_hca_cap_other_func

  • tool_partial_cap.query_roce_addr_other_vport

  • tool_partial_cap.query_vuid_direct_data

  • tool_partial_cap.query_hca_cap_other_func
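The relationship between the five sub-operations and the five caps can be sketched as a lookup table. The pairing below is inferred from the names only, the bit positions inside tool_partial_cap are not stated here, and the decoded dictionary form and the function name are hypothetical.

```python
# Sub-operation -> tool_partial_cap field, paired by name (an assumption;
# the release note lists the two sets separately without mapping them).
SUB_OP_TO_CAP = {
    "QUERY_HCA_CAP with other function": "query_hca_cap_other_func",
    "QUERY_VUID with direct data": "query_vuid_direct_data",
    "QUERY_ROCE_ADDRESS with other vport": "query_roce_addr_other_vport",
    "SET_HCA_CAP with other function": "set_hca_cap_other_func",
    "POSTPONE_CONNECTED_QP_TIMEOUT with other vport": "postpone_conn_qp_timeout_other_vport",
}


def tool_may_call(sub_op, tool_partial_cap):
    """Check a sub-operation against a decoded tool_partial_cap.

    tool_partial_cap is a dict of field name -> bool, a hypothetical
    decoded view of the bitmask rather than the raw register value.
    """
    cap = SUB_OP_TO_CAP.get(sub_op)
    return cap is not None and bool(tool_partial_cap.get(cap, False))
```

A sub-operation that is not in the table, or whose cap bit is clear, is reported as not callable.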

Cross E-Switch Scheduling

Added support for QoS scheduling across multiple E-Switches grouped in a LAG. VPort members of a Physical Function can be added to a rate group from another Physical Function and rate limits of the group will apply to those VPort members as well.

Jump from NIC_TX to FDB_TX

Added 'table_type_valid' and 'table_type' fields to the steering action (STC) "Jump To Flow" table parameters to enable the user to jump from NIC_TX to FDB_TX and bypass the ACL table.

Jump to TIR or queue from FDB on Tx

Enabled hop reduction by bypassing the NIC domain in various use cases. This reduces the number of hops (improving PPS) when handling a large number of flows and devices.

To enable this new capability, a new STC action type "JUMP_TO_FDB_RX" was added to allow jumping into the RX side of a table.

Virtual Quality of Service

Added a new scheduling element type ("TC_ARB") capability in the VQoS domain (Virtual Quality of Service), to support TC arbitration between functions (VPORTs).

Hotplug/Unplug on VirtIO Devices when the Host is Powered OFF

Enabled hotplug/hotunplug during device's power off or power cycle to prevent the device from getting stuck.

2-steps-hotplug

Added support for 2-steps-hotplug capability. The device is plugged with "free" status by default, and it will not appear on the bus until it is modified to "hotplug" status.
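The two steps can be pictured as a small state model. This is an illustrative sketch only: the class and method names are hypothetical, and it models just the two statuses named above, not the firmware command flow.

```python
from enum import Enum


class PlugStatus(Enum):
    FREE = "free"        # default status after the device is plugged
    HOTPLUG = "hotplug"  # status in which the device appears on the bus


class EmulatedDevice:
    """Hypothetical model of a device going through 2-steps-hotplug."""

    def __init__(self):
        # Step 1: the device is plugged and starts in "free" status.
        self.status = PlugStatus.FREE

    def modify_to_hotplug(self):
        # Step 2: modifying the status makes the device visible.
        self.status = PlugStatus.HOTPLUG

    @property
    def visible_on_bus(self):
        return self.status is PlugStatus.HOTPLUG
```

A newly created EmulatedDevice reports visible_on_bus as False until modify_to_hotplug is called.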

Bug Fixes

See Bug Fixes in this Firmware Version section.

Feature/Change

Description

32.42.1000

Memory Slow Release

Added a new command interface "Memory slow release" to enable/disable holding memory pages for a defined period of time. Once the timer expires, the firmware will return the pages to the driver.

Server's Resource Size

Increased the server's resource size for 10k data QP (connections from NVME initiator) attached to the XRQ upon 32MB, 64MB, 128MB, 256MB staging buffer.

Hotplug Power Off for Virtio FS

Added support for Hotplug Power Off for Virtio FS (hotplug_power_off).

Kernel Lockdown

Added support for MVTS register via a miscellaneous driver using the access_register PRM command.

Dynamic Queue Modification

Added support for Virtio devices' dynamic queue modification. A Virtio PF manages the available number of queues (doorbells) that can be allocated to its Virtio VFs.

Managed Hot-Plug

Added support for removing and plugging in Memory Device units while the system is active. To insert or remove the device while the system is active, use the Attention Button Control or User OS Commands: press the Attention Button if one exists, or issue the SW command if it does not.

Note: This capability is not enabled by default. To enable managed hot-plug, configure the following setting using mlxconfig and then power-cycle:

  • Setting name: OFF_BOARD_SERIALIZER

    • Command: mlxconfig -d <device> set OFF_BOARD_SERIALIZER=1

    • Description: when set, the BlueField-3 enables the serializer that is connected to the SMC bridge board and enables the bitstream.

ResourceDump QP_INFO

Added QP_INFO segment to resource dump access_register command.

Maximum Number of EQs

Added a new hca_cap call max_num_eqs_24b to report the number of EQs for VFs, PFs of ECPFs, and SFs.

Note: It is only writable for SFs.

MSIX

Firmware allocates the MSIX/VQ resources according to the function number, thus, every VF function will get the same number of MSIX/VQ.

For example, with a total of 8K MSIX of locked ICMC resources, each VF gets 8K MSIX / (384 vblk VFs + 128 vnet VFs) = 16 MSIX under symmetric distribution.

As of firmware v32.42.100x, the X_EMULATION_NUM_VF_MSIX parameters were added to NVCONFIG to set the Emulation VF device MSIX number, such as VIRTIO_VBLK_EMULATION_NUM_VF_MSIX (=8 MSIX for this use case) and VIRTIO_NET_EMULATION_NUM_VF_MSIX (=32 MSIX for this use case).
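The arithmetic in this example can be checked with a few lines. The figures are taken from the text above; the variable names are illustrative, and comparing the configured total against the 8K budget is an added observation rather than a documented firmware rule.

```python
TOTAL_MSIX = 8 * 1024   # 8K MSIX of locked ICMC resources
VBLK_VFS = 384          # vblk VFs in the example
VNET_VFS = 128          # vnet VFs in the example

# Symmetric distribution: every VF gets the same share.
per_vf_symmetric = TOTAL_MSIX // (VBLK_VFS + VNET_VFS)

# Per-type configuration from the example:
# VIRTIO_VBLK_EMULATION_NUM_VF_MSIX=8, VIRTIO_NET_EMULATION_NUM_VF_MSIX=32.
configured_total = VBLK_VFS * 8 + VNET_VFS * 32

if __name__ == "__main__":
    print("symmetric share per VF:", per_vf_symmetric)       # 16
    print("total with per-type values:", configured_total)   # 7168
    print("fits in budget:", configured_total <= TOTAL_MSIX)  # True
```

With these values the per-type configuration requests 7168 MSIX in total, which is below the 8192 available in the example.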

MSIX Allocation

The user can now find the exact number of MSIX allocated by the firmware using the newly added actual_msix_number call.

Dynamic MSIX Allocation

Each VF can allocate from the MSIX of all the PF's VFs, which are treated as a free pool of the PF. This change increased the maximum VNET/VBLK VF MSIX number from 64 to 256. To see the new value, query cmd_hca_cap.max_dynamic_vf_msix_table_size.

Each VF now gets its number of MSIX by asymmetric distribution, according to the new VF MSIX configuration (X_EMULATION_NUM_VF_MSIX).

If there are not enough MSIX to allocate, the actual number of MSIX is derived from the total free number and not from the NVCONFIG value. The actual_msix_number value is the value shown by lspci. To get the actual_msix_number of the PCI device, query the "Current" column of mlxconfig, which matches the value shown by lspci.
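The shortage case can be illustrated with a toy allocator. This is a sketch under stated assumptions, not the firmware algorithm: it assumes VFs are served in order and that each one receives the smaller of its configured value and whatever is still free in the pool. The function name is hypothetical.

```python
def allocate_msix(configured_per_vf, total_free):
    """Return the MSIX count each VF actually receives.

    configured_per_vf: requested MSIX per VF (the NVCONFIG-style values).
    total_free: MSIX currently free in the PF pool.
    Assumption: in-order service, each VF capped by what remains free.
    """
    actual = []
    for wanted in configured_per_vf:
        granted = min(wanted, total_free)
        actual.append(granted)
        total_free -= granted
    return actual


if __name__ == "__main__":
    # Three VFs configured for 32 MSIX each, but only 80 are free.
    print(allocate_msix([32, 32, 32], 80))  # [32, 32, 16]
```

In this toy run the last VF ends up with 16 rather than the configured 32, which is the kind of difference the actual_msix_number value is meant to expose.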

MMO: Cache-Invalidate WQE

Enabled Cache-Invalidate WQE (OPCODE="MMO") with OPC_MOD="DPU_CACHE_INVALIDATE" by default for DPU GVMI. Additionally, added related capabilities that indicate whether this capability is supported and the maximum supported data size to be invalidated (2MB by default).

Steering SF Traffic to a Specific PF MSI-X

MSI-X on an SF can now be received through the PF's MSI-X vector.

Bug Fixes

See Bug Fixes in this Firmware Version section.

Feature/Change

Description

32.41.1000

SuperNIC Mode

SuperNIC mode is now the default mode for the following SKUs:

  • 900-9D3B4-00CC-EA0

  • 900-9D3B4-00SC-EA0

  • 900-9D3B4-00CV-EA0

  • 900-9D3B4-00SV-EA0

  • 900-9D3B4-00EN-EA0

  • 900-9D3B4-00PN-EA0

  • 900-9D3D4-00EN-HA0

  • 900-9D3D4-00NN-HA0

virtio-net Emulation Device

Added support for VIRTIO_NET_F_HASH_REPORT(57) bit for the virtio-net emulation device.

Added support for VIRTIO_NET_F_SPEED_DUPLEX(63) bit for the virtio-net emulation device.

virtio Full Emulation

Added support for scaling virtio full emulation up to 2k devices.

ODP Event

Added support for the following prefetch fields on ODP event: pre_demand_fault_pages, post_demand_fault_pages.

TRNG FIPS Compliance

Implemented Deterministic Random Bit Generator (DRBG) algorithm on top of firmware TRNG (the source for raw data input) in accordance with NIST SP800-90A.

PSP

Added support for PSP in Hardware Steering.

NVConfig

Added a new NVConfig option to copy AR bit from the BTH header to the DHCP header.

Generic Emulation

Generic Emulation enables the programmers to define their own custom PCI devices to be exposed to the host using the new hot-plug/unplug function flow. The API enables the programmer to control the device BARs layout, software defined BAR registers and hardware offloading mechanisms (MSI-X, DBs).

Steering

Added the option to provide a field's offset and length in the Steering add_action option.

Steering Match

Added support for steering match on packet l4_type through FTG/FTE.

RSHIM PF

RSHIM PF functionalities are now dynamically locked/unlocked during runtime by Platform BMC via the NC-SI commands.

BAR Pages

Added support for 64KB pages.

Note: Configuring BAR_PAGE_ALIGNMENT to ALIGN_64KB(2) while one of the following is configured will cause the device to ignore the BAR_PAGE_ALIGNMENT configuration:

  • PF_NUM_PF_MSIX>256 on any of the Physical Functions

  • VIRTIO_EMULATION_HOTPLUG_TRANS / VIRTIO_NET_EMULATION_PF_PCI_LAYOUT / VIRTIO_NET_EMULATION_VF_PCI_LAYOUT / VIRTIO_BLK_EMULATION_PF_PCI_LAYOUT / VIRTIO_BLK_EMULATION_PF_PCI_LAYOUT = VIRTIO_TRANSITIONAL(1)
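The two conditions in this note can be expressed as a pre-check. The sketch below assumes the configuration has been decoded into integers using the numeric values shown in parentheses above; the function name and the input format are hypothetical, and only the distinct parameter names from the list are included.

```python
ALIGN_64KB = 2
VIRTIO_TRANSITIONAL = 1

# Distinct parameter names from the note above.
TRANSITIONAL_PARAMS = (
    "VIRTIO_EMULATION_HOTPLUG_TRANS",
    "VIRTIO_NET_EMULATION_PF_PCI_LAYOUT",
    "VIRTIO_NET_EMULATION_VF_PCI_LAYOUT",
    "VIRTIO_BLK_EMULATION_PF_PCI_LAYOUT",
)


def bar_alignment_ignored(config, pf_num_pf_msix):
    """Return True if a 64KB BAR_PAGE_ALIGNMENT setting would be ignored.

    config: dict of parameter name -> int (hypothetical decoded form).
    pf_num_pf_msix: PF_NUM_PF_MSIX value of each Physical Function.
    """
    if config.get("BAR_PAGE_ALIGNMENT") != ALIGN_64KB:
        return False  # the note only concerns the 64KB setting
    if any(value > 256 for value in pf_num_pf_msix):
        return True
    return any(config.get(name) == VIRTIO_TRANSITIONAL
               for name in TRANSITIONAL_PARAMS)
```

For instance, a 64KB alignment combined with PF_NUM_PF_MSIX values of 64 and 512 is reported as ignored, because one Physical Function exceeds 256.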

ATF/UEFI Version Query

Added the ability to query ATF/UEFI version via the MISOC register.

Programmable Congestion Control

Added support for PCC NP for RTT according to the IFA2.0 standards.

Flex Parser Merge Mechanism

Extended Flex Parser merge mechanism to support hardware capabilities.

Flex Parser

Enabled the option to disable the native parser when the parse graph node is configured with the same conditions.

Flex Parser

Added support for father/son headers parsing.

LRO

Added support for tunnel_offload in LRO.

Bug Fixes

See Bug Fixes in this Firmware Version section.

Feature/Change

Description

32.40.1000

Socket Direct Single netdev Mapped to Two PCIe Devices

Enabled Single Netdev mapping to two PCIe devices (Socket Direct).

Now multiple devices (PFs) of the same port can be combined under a single netdev instance. Traffic is passed through different devices belonging to different NUMA sockets, thus saving cross-NUMA traffic and allowing apps running on the same netdev from different NUMAs to still feel a sense of proximity to the device and achieve improved performance.

The netdev is destroyed once any of the PFs is removed. A proper configuration would utilize the correct close NUMA when working on a certain app/CPU.

Currently, this capability is limited to PFs only, and up to two devices (sockets). To enable the feature, configure the same non-zero Socket Direct group for both PFs through the mlxconfig SD_GROUP parameter.
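The constraints above (exactly two PFs, the same non-zero group) can be captured in a helper that only builds the command text. This is a sketch: the helper name and the device strings are hypothetical placeholders, the command form follows the generic "mlxconfig -d <device> set NAME=VALUE" pattern, and nothing is executed.

```python
def sd_group_commands(pf_devices, group):
    """Build one mlxconfig command string per PF for a Socket Direct group.

    pf_devices: the two PF device identifiers (placeholders here).
    group: the shared SD_GROUP value, which must be non-zero.
    """
    if len(pf_devices) != 2:
        raise ValueError("Socket Direct single netdev expects exactly two PFs")
    if group == 0:
        raise ValueError("SD_GROUP must be non-zero to enable the feature")
    return [f"mlxconfig -d {dev} set SD_GROUP={group}" for dev in pf_devices]


if __name__ == "__main__":
    # "<pf0-device>" and "<pf1-device>" stand in for real device names.
    for line in sd_group_commands(["<pf0-device>", "<pf1-device>"], 1):
        print(line)
```

Both generated commands carry the same group value, which is what ties the two PFs to one netdev.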

ACL

Added support for egress ACL to the uplink by adding a new bit to the Set Flow Table Entry: allow_fdb_uplink_hairpin.

Port Rate Limiting

Added a new access register (PBWS) to set the port maximum bandwidth to a value between 95% and 100%.

mlxconfig

Added a new NVConfig parameter to force Congestion Control algorithm to be SW-DCQCN.

Bug Fixes

See Bug Fixes in this Firmware Version section.

© Copyright 2025, NVIDIA. Last updated on Mar 12, 2025.