PCI Express (PCIe) Uses PCIe Gen 4.0 (16GT/s) through an x16 edge connector. Gen 1.1, 2.0, and 3.0 compatible.
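
As a rough illustration of what the Gen 4.0 x16 link provides, the following sketch computes the per-direction bit rate after line encoding. It assumes 128b/130b encoding (used by PCIe Gen 3.0 and later) and ignores packet and protocol overhead, so real throughput is lower. The function name is hypothetical.

```python
# Approximate PCIe link bandwidth per direction, after line encoding only.
# Assumption: 128b/130b encoding (PCIe Gen 3.0 and later); TLP/DLLP overhead
# is not modeled, so achievable throughput is lower than this figure.

def pcie_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Return the post-encoding bit rate in Gb/s for one direction."""
    return gt_per_s * lanes * 128 / 130

gen4_x16 = pcie_bandwidth_gbps(16.0, 16)
print(f"Gen 4.0 x16: {gen4_x16:.1f} Gb/s (~{gen4_x16 / 8:.1f} GB/s)")
```

Under these assumptions the link carries roughly 252 Gb/s per direction, which leaves headroom above a 200Gb/s network port.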

InfiniBand Architecture Specification v1.3 compliant BlueField-2 DPU delivers low latency, high bandwidth, and computing efficiency for high-performance computing (HPC), artificial intelligence (AI), and hyperscale cloud data center applications.

BlueField-2 DPU is InfiniBand Architecture Specification v1.3 compliant.

InfiniBand Network Protocols and Rates:

Protocol      Standard            Rate (Gb/s)          Rate (Gb/s)           Comments
                                  4x Port (4 Lanes)    2x Ports (2 Lanes)
HDR/HDR100    IBTA Vol2 1.4       212.5                106.25                PAM4 256b/257b encoding and RS-FEC
EDR           IBTA Vol2 1.3.1     103.125              51.5625               NRZ 64b/66b encoding
FDR           IBTA Vol2 1.2       56.25                N/A                   NRZ 64b/66b encoding
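
The rates listed above are signaling rates, not payload rates. The sketch below shows how the nominal data rates follow from them once encoding overhead is removed. The 4x signaling rates and line codes come from the table; the RS(544,514) code rate for HDR is an assumption (the table only says "RS-FEC"), and all names are illustrative.

```python
from fractions import Fraction

# Signaling rates (Gb/s) for a 4x port, as listed in the table above.
SIGNALING_4X = {
    "HDR": Fraction("212.5"),
    "EDR": Fraction("103.125"),
    "FDR": Fraction("56.25"),
}

# Line-code efficiency from the Comments column.
ENCODING = {
    "HDR": Fraction(256, 257),
    "EDR": Fraction(64, 66),
    "FDR": Fraction(64, 66),
}

# Assumption: HDR's RS-FEC is RS(544,514); the table does not state the code.
FEC = {"HDR": Fraction(514, 544)}

def effective_rate(protocol: str) -> Fraction:
    """Signaling rate with line-code and (assumed) FEC overhead removed."""
    return SIGNALING_4X[protocol] * ENCODING[protocol] * FEC.get(protocol, Fraction(1))

for name in SIGNALING_4X:
    print(f"{name}: {float(effective_rate(name)):.2f} Gb/s")
```

With these assumptions, HDR and EDR work out to 200 Gb/s and 100 Gb/s, and FDR to about 54.55 Gb/s.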

Up to 200 Gigabit Ethernet BlueField-2 DPU complies with the following IEEE 802.3 standards:

200GbE / 100GbE / 50GbE / 40GbE / 25GbE / 10GbE

Protocol                          MAC Rate
IEEE 802.3ck                      200/100 Gigabit Ethernet (Include ETC enhancement)
IEEE 802.3cd                      200/100 Gigabit Ethernet (Include ETC enhancement)
IEEE 802.3bs
IEEE 802.3cm
IEEE 802.3cn
IEEE 802.3cu
IEEE 802.3bj                      100 Gigabit Ethernet
IEEE 802.3bm
IEEE 802.3by                      50/25 Gigabit Ethernet
Ethernet Consortium25
IEEE 802.3ba                      40 Gigabit Ethernet
IEEE 802.3ae                      10 Gigabit Ethernet
IEEE 802.3cb                      2.5/5 Gigabit Ethernet (For 2.5: support only 2.5 x1000BASE-X)
IEEE 802.3ap                      Based on auto-negotiation and KR startup
IEEE 802.3ad                      Link Aggregation
IEEE 802.1AX
IEEE 802.1Q                       VLAN tags and priority
IEEE 802.1P
IEEE 802.1Qau (QCN)               Congestion Notification
IEEE 802.1Qaz (ETS)
IEEE 802.1Qbb (PFC)
IEEE 802.1Qbg
IEEE 1588v2
IEEE 802.1AE
Jumbo frame support (9.6KB)
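
To make the "VLAN tags and priority" entry concrete, here is a minimal sketch of how the 4-byte IEEE 802.1Q tag packs the priority (PCP), drop-eligible (DEI), and VLAN ID (VID) fields. It is a software illustration only and says nothing about how the DPU processes tags in hardware; the function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag

def build_vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Pack PCP (3 bits), DEI (1 bit), and VID (12 bits) into a 4-byte tag."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0x0FFF):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_vlan_tag(tag: bytes) -> dict:
    """Unpack a 4-byte 802.1Q tag into its PCP, DEI, and VID fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0x0FFF}

tag = build_vlan_tag(pcp=5, vid=100)
print(tag.hex(), parse_vlan_tag(tag))
```

The PCP value is the per-frame priority that priority-based mechanisms such as PFC act on.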

On-board Memory Quad SPI NOR FLASH - includes 256Mbit for Firmware image.

UVPS EEPROM - includes 1Mbit.

FRU EEPROM - Stores the parameters and personality of the card. The EEPROM capacity is 128Kbit. FRU I2C address is (0x50) and is accessible through the PCIe SMBus.

eMMC - x8 NAND flash (memory size might vary on different DPUs) for Arm boot, OS, and disk space.

DDR4 SDRAM - 16GB/32GB @3200MT/s single-channel DDR4 SDRAM memory. Soldered down on board. 64bit + 8bit ECC.
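
For orientation, the theoretical peak bandwidth of this memory configuration can be estimated from the transfer rate and data width. The sketch below counts only the 64 data bits (the 8 ECC bits carry no payload) and ignores refresh and protocol overhead; the function name is hypothetical.

```python
# Theoretical peak DRAM bandwidth: transfers per second x bytes per transfer.
# Only the 64 data bits are counted; the 8 ECC bits do not carry payload.

def ddr_peak_bandwidth_gbs(mt_per_s: float, data_bits: int, channels: int = 1) -> float:
    """Return peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = data_bits / 8
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

peak = ddr_peak_bandwidth_gbs(3200, 64)
print(f"Single-channel DDR4-3200, 64-bit: {peak:.1f} GB/s peak")
```

This gives a ceiling of 25.6 GB/s for the single channel; sustained bandwidth in practice is lower.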

BlueField-2 DPU The NVIDIA BlueField-2 DPU integrates eight 64-bit Armv8 A72 cores interconnected by a coherent mesh network, one DRAM controller, an RDMA intelligent network adapter supporting up to 200Gb/s, an embedded PCIe switch with endpoint and root complex functionality, and up to 16 lanes of PCIe Gen 3.0/4.0.

Overlay Networks To better scale their networks, data center operators often create overlay networks that carry traffic from individual virtual machines over logical tunnels in encapsulated formats such as NVGRE and VXLAN. While this solves network scalability issues, it hides the TCP packet from the hardware offloading engines, placing higher loads on the host CPU. NVIDIA DPU effectively addresses this by providing advanced NVGRE and VXLAN hardware offloading engines that encapsulate and de-capsulate the overlay protocol.
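
To show what encapsulation and de-capsulation involve, the following sketch builds and strips the 8-byte VXLAN header defined in RFC 7348. It is a software illustration only: the outer Ethernet, IP, and UDP headers are omitted, and nothing here reflects how the DPU's offload engines are implemented. Function names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)
VXLAN_HEADER_LEN = 8

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header carrying a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + 24 reserved bits. Word 1: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame

def vxlan_decapsulate(payload: bytes) -> tuple:
    """Return (vni, inner_frame) after validating and removing the header."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, payload[VXLAN_HEADER_LEN:]

packet = vxlan_encapsulate(b"inner-ethernet-frame", vni=5001)
print(vxlan_decapsulate(packet))
```

Because the inner TCP packet sits behind this header and the outer headers, offload engines that only understand plain TCP/IP cannot see it, which is the problem the paragraph above describes.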

RDMA and RDMA over Converged Ethernet (RoCE) NVIDIA DPU, utilizing IBTA RDMA (Remote Direct Memory Access) and RoCE (RDMA over Converged Ethernet) technology, delivers low latency and high performance over InfiniBand and Ethernet networks. Leveraging data center bridging (DCB) capabilities as well as advanced congestion control hardware mechanisms, RoCE provides efficient low-latency RDMA services over Layer 2 and Layer 3 networks.

NVIDIA PeerDirect™ PeerDirect communication provides high-efficiency RDMA access by eliminating unnecessary internal data copies between components on the PCIe bus (for example, from GPU to CPU), significantly reducing application run time. NVIDIA DPU's advanced acceleration technology enables higher cluster efficiency and scalability to tens of thousands of nodes.

Quality of Service (QoS) Support for port-based Quality of Service enabling various application requirements for latency and SLA.

Storage Acceleration A consolidated compute and storage network achieves significant cost-performance advantages over multi-fabric networks. Standard block and file access protocols can leverage RDMA for high-performance storage access: NVMe over Fabrics offloads for the target machine.

BlueField-2 DPU may operate as a co-processor offloading specific storage tasks from the host, isolating part of the storage media from the host, or enabling abstraction of software-defined storage logic using the NVIDIA BlueField-2 Arm cores. On the storage initiator side, NVIDIA BlueField-2 DPU can prove an efficient solution for hyper-converged systems to enable the host CPU to focus on computing while all the storage interface is handled through the Arm cores.

NVMe-oF Non-volatile Memory Express (NVMe) over Fabrics is a protocol for communicating block storage IO requests over RDMA to transfer data between a host computer and a target solid-state storage device or system over a network. NVIDIA BlueField-2 DPU may operate as a co-processor offloading specific storage tasks from the host using its powerful NVMe over Fabrics Offload accelerator.

SR-IOV NVIDIA DPU SR-IOV technology provides dedicated adapter resources and guaranteed isolation and protection for virtual machines (VM) within the server.

High-Performance Accelerations Tag Matching and Rendezvous Offloads

Adaptive Routing on Reliable Transport

Burst Buffer Offloads for Background Checkpointing

GPU Direct GPUDirect RDMA is a technology that provides a direct P2P (Peer-to-Peer) data path between GPU memory and the NVIDIA HCA devices. This significantly decreases GPU-GPU communication latency and completely offloads the CPU, removing it from all GPU-GPU communications across the network. NVIDIA DPU uses high-speed DMA transfers to copy data between P2P devices, resulting in more efficient system applications.

Isolation BlueField-2 DPU functions as a “computer-in-front-of-a-computer,” unlocking unlimited opportunities for custom security applications on its Arm processors, fully isolated from the host’s CPU. In the event of a compromised host, BlueField-2 may detect/block malicious activities in real-time and at wire speed to prevent the attack from spreading further.

Cryptography Accelerations From IPsec and TLS data-in-motion inline encryption to AES-XTS block-level data-at-rest encryption and public key acceleration, BlueField-2 DPU hardware-based accelerations offload the crypto operations and free up the CPU, reducing latency and enabling scalable crypto solutions. BlueField-2 “host-unaware” solutions may transmit and receive data, while BlueField-2 acts as a bump-in-the-wire for crypto.

Securing Workloads BlueField-2 DPU accelerates connection tracking with its ASAP² technology to enable stateful filtering on a per-connection basis. Moreover, BlueField-2 includes a Titan IC regular expression (RXP) acceleration engine supported by IDS/IPS tools to detect host introspection and Application Recognition (AR) in real-time.
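
Stateful, per-connection filtering can be pictured with a toy connection tracker: outbound flows are recorded, and inbound packets are admitted only if they are replies to a recorded flow. This is a conceptual software sketch under that simplified policy, not a description of the DPU's hardware implementation; all class and method names are hypothetical.

```python
from typing import NamedTuple, Set

class FiveTuple(NamedTuple):
    """Flow key: protocol plus source and destination address/port."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reply(self) -> "FiveTuple":
        """Return the key of the reverse direction of this flow."""
        return FiveTuple(self.proto, self.dst_ip, self.dst_port, self.src_ip, self.src_port)

class ConnTracker:
    """Toy stateful filter: inbound traffic must match a tracked outbound flow."""

    def __init__(self) -> None:
        self._tracked: Set[FiveTuple] = set()

    def record_outbound(self, flow: FiveTuple) -> None:
        """Remember a connection initiated from the protected side."""
        self._tracked.add(flow)

    def allow_inbound(self, flow: FiveTuple) -> bool:
        """Admit an inbound packet only if it replies to a tracked flow."""
        return flow.reply() in self._tracked

tracker = ConnTracker()
tracker.record_outbound(FiveTuple("tcp", "10.0.0.5", 40000, "192.0.2.10", 443))
reply_ok = tracker.allow_inbound(FiveTuple("tcp", "192.0.2.10", 443, "10.0.0.5", 40000))
unsolicited = tracker.allow_inbound(FiveTuple("tcp", "198.51.100.7", 443, "10.0.0.5", 40000))
print(reply_ok, unsolicited)
```

A real tracker also follows protocol state (for example TCP handshakes and teardown) and expires idle entries, which this sketch omits.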

Security Accelerators A consolidated compute and network solution based on DPU achieves significant advantages over a centralized security server solution. Standard encryption protocols and security applications can leverage NVIDIA BlueField-2 compute capabilities and network offloads for security application solutions such as Layer4 Stateful Firewall.

Virtualized Cloud By leveraging BlueField-2 DPU virtualization offloads, data center administrators can benefit from better server utilization, allowing more virtual machines and more tenants on the same hardware while reducing the TCO and power consumption.

Out-of-Band Management The NVIDIA BlueField-2 DPU incorporates a 1GbE RJ45 out-of-band port that allows the network operator to establish trust boundaries in accessing the management function to apply it to network resources. It can also be used to ensure management connectivity (including the ability to determine the status of any network component) independent of the status of other in-band network components.

BMC Some DPUs incorporate local NIC BMC (Baseboard Management Controller) hardware on the board. The BMC SoC (system on a chip) can utilize either shared or dedicated NICs for remote access. The BMC node enables remote power cycling, board environment monitoring, BlueField-2 chip temperature monitoring, board power and consumption monitoring, and individual interface resets. The BMC also supports the ability to push a boot stream to BlueField-2.

Having a trusted onboard BMC that is fully isolated from the host server ensures the highest security for the DPU boards.