Created on Dec 30, 2025

This Reference Deployment Guide (RDG) provides comprehensive instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Zero-Trust mode. The guide focuses on setting up the accelerated Host-Based Networking (HBN) and DOCA Argus services on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.

The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments with Host-Based Networking enabled using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.

This document is an extension of the RDG for DPF Zero Trust (DPF-ZT) (referred to as the Baseline RDG). It details the additional steps and modifications required to deploy the HBN and Argus services in the Baseline RDG environment.

Note This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.

Although other approaches may exist for implementing similar solutions, this document provides a detailed guide for this specific method.

Term: Definition

BFB: BlueField Bootstream
BGP: Border Gateway Protocol
DOCA: Data Center Infrastructure-on-a-Chip Architecture
DPF: DOCA Platform Framework
DPU: Data Processing Unit
HBN: Host-Based Networking
IPAM: IP Address Management
K8S: Kubernetes
KVM: Kernel-based Virtual Machine
MAAS: Metal as a Service
MTU: Maximum Transmission Unit
NGC: NVIDIA GPU Cloud
NFS: Network File System
OOB: Out-of-Band
PF: Physical Function
RDG: Reference Deployment Guide
RDMA: Remote Direct Memory Access
RoCE: RDMA over Converged Ethernet
SFC: Service Function Chaining
SR-IOV: Single Root Input/Output Virtualization
VLAN: Virtual LAN (Local Area Network)
VNI: VXLAN Network Identifier
VRF: Virtual Routing and Forwarding
ZT: Zero Trust

The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.

NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.

One such service is Host-Based Networking (HBN) - a DOCA-enabled solution that allows network architects to design networks based on Layer 3 (L3) protocols. HBN enables routing on the server side by using BlueField as a BGP router. It encapsulates key networking functions in a containerized service pod, deployed directly on the BlueField’s Arm cores.

Another service is DOCA Argus, which provides workload threat detection: a novel approach to container threat detection in AI workloads and microservices that uses a BlueField DPU to perform live machine introspection at the hardware level. This approach analyzes specific snippets of volatile memory to provide real-time visibility into container activity and behavior at the network, host, and application levels.

The state of container node images is continuously monitored in real time, checking for deviations from their secure, compliant versions and configurations to detect and stop runtime attacks. These insights also include the ability to identify attacks targeting network-facing applications and services.

The Argus service provides events and data on any object in the OS (host or VM) without any configuration and without any active involvement from the user or the host.

Examples of what the Argus service provides:

Any new process, with its PID, name, attributes, and status.

Reverse shells with process and network connection details such as source & destination IP and number of transferred bytes.

SHA256 hash of running executable and loaded libraries.

However, deploying and managing DPUs and their associated DOCA services, especially at scale, presents operational challenges. Without a robust provisioning and orchestration system, tasks such as lifecycle management, service deployment, and network configuration for service function chaining (SFC) can quickly become complex and error prone. This is where the DOCA Platform Framework (DPF) comes into play.

DPF automates the full DPU lifecycle, streamlines the deployment of DOCA services, and simplifies advanced network configurations. With DPF, services such as HBN can be deployed seamlessly, allowing for efficient offloading and intelligent routing of traffic through the DPU data plane.

By leveraging DPF, users can scale and automate DPU management across Bare Metal, Virtual, and Kubernetes customer environments - optimizing performance while simplifying operations.

DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:

The DPU is managed through its Baseboard Management Controller (BMC)

All management traffic occurs over the DPU's out-of-band (OOB) network

The host is considered an untrusted entity towards the data center network. The DPU acts as a barrier between the host and the network.

The host sees the DPU as a standard NIC, with no access to the internal DPU management plane (Zero Trust Mode)

This Reference Deployment Guide (RDG) provides a step-by-step example for installing DPF in Zero-Trust mode and HBN. It also includes practical demonstrations of performance optimization, validated using standard RDMA and TCP workloads.

As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide includes the full end-to-end deployment process, including:

Infrastructure provisioning

DPF deployment

DPU provisioning (Redfish)

Service configuration and deployment

Service chaining.

This document extends the capabilities of the DPF-managed Kubernetes cluster described in the RDG for DPF Zero Trust (DPF-ZT) (referred to as the Baseline RDG) by deploying the NVIDIA DOCA HBN and Argus services within the existing DPF deployment to achieve a comprehensive, accelerated infrastructure.

NVIDIA BlueField® Data Processing Unit (DPU)

The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.



NVIDIA DOCA Software Framework

NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.



NVIDIA ConnectX SmartNICs

10/25/40/50/100/200 and 400G Ethernet Network Adapters

The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations. NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.



NVIDIA LinkX Cables

The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, and artificial intelligence data center applications.



NVIDIA Spectrum Ethernet Switches

Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds. Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects. NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.



NVIDIA Cumulus Linux

NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.



Kubernetes

Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.



Kubespray

Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks. It provides:

A highly available cluster

Composable attributes

Support for most popular Linux distributions



The logical design includes the following components:

1 x Hypervisor node (KVM-based) with ConnectX-7, hosting:
  1 x Firewall VM
  1 x Jump Node VM
  1 x MaaS VM
  3 x K8s Master VMs running all K8s management components

4 x Worker nodes (PCI Gen5), each with 1 x BlueField-3 NIC

Single High-Speed (HS) switch

1 Gb Host Management network

As part of this RDG, we will:

Create two fully isolated logical networks per bare-metal workload server using a single physical function (PF0).

Connect each network through the HBN service to a dedicated VLAN/VNI, mapped to separate VRFs (RED or BLUE).

Route all workload traffic through the HBN service, with routing and isolation enforced inside the DPU.

Assign PF0 as the sole network interface for each bare-metal workload server, with no networking configuration on the host.

Demonstrate accelerated RDMA and TCP traffic between workload servers running on different bare-metal hosts within the same network (for example, RED ↔ RED).

Validate strict network isolation by confirming that traffic between workloads in different networks (RED vs BLUE) is not permitted.
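The isolation goals above can be turned into a small validation plan. The following is a minimal sketch, not part of the DPF tooling: it takes the VRF, VLAN, and VNI values and the DPU serial numbers from the HBN service configuration used later in this guide, and the names `Tenant`, `DPU_TENANT`, and `isolation_plan` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tenant:
    """One isolated logical network as configured in HBN."""
    vrf: str
    vlan: int
    l2vni: int
    l3vni: int


# Values copied from the per-DPU HBN configuration in this guide.
RED = Tenant(vrf="RED", vlan=11, l2vni=10010, l3vni=100001)
BLUE = Tenant(vrf="BLUE", vlan=21, l2vni=10020, l3vni=100002)

# DPU serial number -> tenant, following the RED/BLUE split in this guide.
DPU_TENANT = {
    "mt2402xz0f7x": RED,
    "mt2402xz0f80": RED,
    "mt2402xz0f9n": BLUE,
    "mt2402xz0f8g": BLUE,
}


def isolation_plan(dpu_tenant):
    """Return (dpu_a, dpu_b, should_reach) for every pair of DPUs.

    Traffic is expected to pass only when both DPUs serve the same VRF.
    """
    plan = []
    for a, b in combinations(sorted(dpu_tenant), 2):
        plan.append((a, b, dpu_tenant[a].vrf == dpu_tenant[b].vrf))
    return plan


for a, b, should_reach in isolation_plan(DPU_TENANT):
    verdict = "expect traffic" if should_reach else "expect isolation"
    print(f"{a} <-> {b}: {verdict}")
```

Each pair marked "expect traffic" is a candidate for the RDMA and TCP tests, and each pair marked "expect isolation" is a candidate for the negative test.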

The pfSense firewall in this solution serves a dual purpose:

Firewall—provides an isolated environment for the DPF system, ensuring secure operations

Router—enables Internet access for the management network

Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.

The following diagram illustrates the firewall design used in this solution:

Warning Make sure to use the exact same versions for the software stack as described above.

These are the definitions and parameters used for deploying the demonstrated fabric:

Switches Ports Usage

Hostname      Rack ID   Ports
mgmt-switch   1         swp1-5
hs-switch     1         swp1-9

Hosts

Rack1, Hypervisor Node (hypervisor)
  Switch ports: mgmt-switch: swp1; hs-switch: swp1
  IPs and NICs: lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface enp1s0): -
  Default gateway: Trusted LAN GW

Rack1, Firewall (Virtual) (fw)
  Switch ports: -
  IPs and NICs: WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 10.0.123.254/22
  Default gateway: Trusted LAN GW

Rack1, Jump Node (Virtual) (jump)
  Switch ports: -
  IPs and NICs: enp1s0: 10.0.110.253/24
  Default gateway: 10.0.110.254

Rack1, MaaS (Virtual) (maas)
  Switch ports: -
  IPs and NICs: enp1s0: 10.0.110.252/24
  Default gateway: 10.0.110.254

Rack1, Master Node (Virtual) (master1)
  Switch ports: -
  IPs and NICs: enp1s0: 10.0.110.1/24
  Default gateway: 10.0.110.254

Rack1, Worker Node (worker1)
  Switch ports: mgmt-switch: swp2 (DPU BMC/OOB); hs-switch: swp2 - swp3
  IPs and NICs: dpubmc: 10.0.110.201/24; dpuoob: 10.0.110.211/24; ens1f0np0/ens1f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254

Rack1, Worker Node (worker2)
  Switch ports: mgmt-switch: swp3 (DPU BMC/OOB); hs-switch: swp4 - swp5
  IPs and NICs: dpubmc: 10.0.110.202/24; dpuoob: 10.0.110.212/24; ens1f0np0/ens1f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254

Rack1, Worker Node (worker3)
  Switch ports: mgmt-switch: swp4 (DPU BMC/OOB); hs-switch: swp6 - swp7
  IPs and NICs: dpubmc: 10.0.110.203/24; dpuoob: 10.0.110.213/24; ens1f0np0/ens1f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254

Rack1, Worker Node (worker4)
  Switch ports: mgmt-switch: swp5 (DPU BMC/OOB); hs-switch: swp8 - swp9
  IPs and NICs: dpubmc: 10.0.110.204/24; dpuoob: 10.0.110.214/24; ens1f0np0/ens1f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254
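Before touching any device, the addressing plan in the hosts table can be sanity-checked offline. The sketch below is an illustrative example that uses only the Python standard library `ipaddress` module and the addresses listed above. The helper name `addresses_outside` and the dictionary layout are assumptions made for this example.

```python
import ipaddress

# Networks and gateways taken from the hosts table above.
MGMT_NET = ipaddress.ip_network("10.0.110.0/24")
MGMT_GATEWAY = ipaddress.ip_address("10.0.110.254")
FIREWALL_OPT1 = ipaddress.ip_interface("10.0.123.254/22")  # pfSense OPT1 (hs-br)
WORKER_HS_NET = ipaddress.ip_network("10.0.120.0/22")      # worker high-speed network

# Management addresses: infrastructure VMs plus DPU BMC/OOB ports.
MGMT_ADDRESSES = {
    "jump": "10.0.110.253",
    "maas": "10.0.110.252",
    "master1": "10.0.110.1",
}
for i in range(1, 5):
    MGMT_ADDRESSES[f"worker{i}-dpubmc"] = f"10.0.110.{200 + i}"
    MGMT_ADDRESSES[f"worker{i}-dpuoob"] = f"10.0.110.{210 + i}"


def addresses_outside(network, addresses):
    """Return the names whose IP address does not belong to `network`."""
    return sorted(
        name
        for name, ip in addresses.items()
        if ipaddress.ip_address(ip) not in network
    )


print("OPT1 network:", FIREWALL_OPT1.network)
print("Outside management network:", addresses_outside(MGMT_NET, MGMT_ADDRESSES))
```

The check confirms two properties of the table: the firewall OPT1 interface sits in the same /22 as the worker high-speed interfaces, and every management address can reach the 10.0.110.254 gateway without routing.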

As a best practice, make sure to use the latest released Cumulus Linux NOS version.

For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.

The SN3700 switch ( hs-switch ), is configured as follows:

SN3700 Switch Console

    nv set evpn state enable
    nv set interface eth0 ip address dhcp
    nv set interface eth0 ip vrf mgmt
    nv set interface eth0 type eth
    nv set interface lo ip address 11.0.0.101/32
    nv set interface lo type loopback
    nv set interface swp1-9 link state up
    nv set interface swp1-9 type swp
    nv set interface swp1 ip address 10.0.123.253/22
    nv set router bgp autonomous-system 65001
    nv set router bgp state enabled
    nv set router bgp graceful-restart mode full
    nv set router bgp router-id 11.0.0.101
    nv set vrf default router bgp address-family ipv4-unicast state enabled
    nv set vrf default router bgp address-family ipv4-unicast redistribute connected state enabled
    nv set vrf default router bgp address-family ipv4-unicast redistribute static state enabled
    nv set vrf default router bgp address-family ipv6-unicast state enabled
    nv set vrf default router bgp address-family ipv6-unicast redistribute connected state enabled
    nv set vrf default router bgp address-family l2vpn-evpn state enabled
    nv set vrf default router bgp state enabled
    nv set vrf default router bgp neighbor swp2 peer-group hbn
    nv set vrf default router bgp neighbor swp2 type unnumbered
    nv set vrf default router bgp neighbor swp3 peer-group hbn
    nv set vrf default router bgp neighbor swp3 type unnumbered
    nv set vrf default router bgp neighbor swp4 peer-group hbn
    nv set vrf default router bgp neighbor swp4 type unnumbered
    nv set vrf default router bgp neighbor swp5 peer-group hbn
    nv set vrf default router bgp neighbor swp5 type unnumbered
    nv set vrf default router bgp neighbor swp6 peer-group hbn
    nv set vrf default router bgp neighbor swp6 type unnumbered
    nv set vrf default router bgp neighbor swp7 peer-group hbn
    nv set vrf default router bgp neighbor swp7 type unnumbered
    nv set vrf default router bgp neighbor swp8 peer-group hbn
    nv set vrf default router bgp neighbor swp8 type unnumbered
    nv set vrf default router bgp neighbor swp9 peer-group hbn
    nv set vrf default router bgp neighbor swp9 type unnumbered
    nv set vrf default router bgp path-selection multipath aspath-ignore enabled
    nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn state enabled
    nv set vrf default router bgp peer-group hbn remote-as external
    nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
    nv set vrf default router static 0.0.0.0/0 via 10.0.123.254 type ipv4-address
    nv config apply -y
    nv config save
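The BGP neighbor section of the switch configuration repeats the same two commands for every DPU-facing port (swp2 through swp9). As a convenience when the number of worker nodes changes, those lines can be generated instead of typed. This is a small sketch, assuming the same port naming, peer group, and VRF as in the listing above. The function name `bgp_neighbor_commands` is hypothetical, and the output is plain text to paste into the switch console, not an API call.

```python
def bgp_neighbor_commands(first_port, last_port, peer_group="hbn", vrf="default"):
    """Build the per-port unnumbered BGP neighbor commands for swpN ports."""
    commands = []
    for port in range(first_port, last_port + 1):
        base = f"nv set vrf {vrf} router bgp neighbor swp{port}"
        commands.append(f"{base} peer-group {peer_group}")
        commands.append(f"{base} type unnumbered")
    return commands


# Ports swp2-swp9 connect the two uplinks of each of the four DPUs.
for line in bgp_neighbor_commands(2, 9):
    print(line)
```

The generated lines are intended to match the neighbor commands in the listing one for one. The remaining commands (interfaces, address families, static route) are unique and are left as written.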

The SN2201 switch ( mgmt-switch ) is configured as follows:

SN2201 Switch Console

    nv set interface swp1-5 link state up
    nv set interface swp1-5 type swp
    nv set interface swp1-5 bridge domain br_default
    nv set bridge domain br_default untagged 1
    nv config apply -y
    nv config save

Warning Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance. All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name. Make sure that you have DPU BMC and OOB MAC addresses.

No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Host Configuration").

No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Hypervisor Installation and Configuration").

No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Prepare Infrastructure Servers") regarding Firewall VM, Jump VM, MaaS VM.

(Optional) Firewall VM – Bare Metal Server Outside Connection

To provide outside connectivity for the bare-metal hosts over the high-speed network, open the Firefox web browser and go to the pfSense web UI (http://10.0.110.254).

System: Routing → Gateways → Add → "Interface": OPT1, "Address Family": IPv4, "Name": switch, "Gateway": 10.0.123.253 → Click "Save" → Under "Default Gateway" - "Default gateway IPv4" choose WAN_DHCP → Click "Save"

Note The IP addresses from the Trusted LAN network under "Gateway" and "Monitor IP" are blurred.



No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Provision Master VMs Using MaaS").

The procedures for the initial Kubernetes cluster deployment using Kubespray for the master nodes, and the subsequent verification, remain unchanged from the Baseline RDG (Section "K8s Cluster Deployment and Configuration", Subsections "Kubespray Deployment and Configuration", "Deploying Cluster Using Kubespray Ansible Playbook", and "K8s Deployment Verification").

The DPF installation process (Operator, System components) largely follows the Baseline RDG.

Start by installing the remaining software prerequisites.

Jump Node Console

    ## Connect to master1 to copy the helm client utility that was installed during the Kubespray deployment
    depuser@jump:~$ ssh master1
    depuser@master1:~$ cp /usr/local/bin/helm /tmp/

    ## In another tab
    depuser@jump:~$ scp master1:/tmp/helm /tmp/
    depuser@jump:~$ sudo chown root:root /tmp/helm
    depuser@jump:~$ sudo mv /tmp/helm /usr/local/bin/

    ## Verify that the envsubst utility is installed
    depuser@jump:~$ which envsubst
    /usr/bin/envsubst

Proceed to clone the doca-platform Git repository:

Jump Node Console

    $ git clone https://github.com/NVIDIA/doca-platform.git

Change directory to doca-platform and check out tag v25.10.0:

Jump Node Console

    $ cd doca-platform/
    $ git checkout v25.10.0

Change to the directory that contains readme.md, from where all the commands will be run:

Jump Node Console

    $ cd docs/public/user-guides/zero-trust/use-cases/hbn

Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file:

Warning Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPUCLUSTER_INTERFACE, BMC_ROOT_PASSWORD, and the DPU serial numbers. To get a DPU's serial number, you can use the following command. Sample:

    $ curl -k -u root:'BMC root password' https://10.0.110.201/redfish/v1/Systems/Bluefield | jq -r '.SerialNumber | ascii_downcase'
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  4970  100  4970    0     0   4211      0  0:00:01  0:00:01 --:--:--  4211
    mt2402xz0f7x
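When serial numbers have to be collected for several DPUs, the jq filter in the sample can be mirrored in a script. The sketch below is illustrative only and works on a Redfish response that has already been fetched and saved as text. The only field it relies on is `SerialNumber`, as used by the jq filter above. The `dpu-node-<serial>-<serial>` naming is inferred from the host name patterns that appear in the HBN manifests of this guide and should be treated as an assumption, as should the helper names.

```python
import json


def dpu_serial(redfish_system_json):
    """Extract the DPU serial number in lowercase, like `jq ... | ascii_downcase`."""
    document = json.loads(redfish_system_json)
    return document["SerialNumber"].lower()


def dpu_node_name(serial):
    """Node name pattern as it appears in this guide's manifests (assumption)."""
    return f"dpu-node-{serial}-{serial}"


# Hypothetical, heavily trimmed response body for illustration.
sample_response = '{"Id": "Bluefield", "SerialNumber": "MT2402XZ0F7X"}'

serial = dpu_serial(sample_response)
print(serial)
print(dpu_node_name(serial))
```

The lowercase form is the one used for the DPU serial variables in envvars.env and for the host name patterns in the HBN configuration.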

manifests/00-env-vars/envvars.env

    export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10
    export DPUCLUSTER_VIP=10.0.110.200
    export DPUCLUSTER_INTERFACE=ens160
    export NFS_SERVER_IP=10.0.110.253
    export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca
    export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn
    export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca
    export TAG=v25.10.0
    export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb"
    export IP_RANGE_START=10.0.110.201
    export IP_RANGE_END=10.0.110.204
    export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>
    export DPU1_SERIAL=mt2402xz0f7x
    export DPU2_SERIAL=mt2402xz0f80
    export DPU3_SERIAL=mt2402xz0f9n
    export DPU4_SERIAL=mt2402xz0f8g
    export ARGUS_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_argus:1.0.0-doca3.1.0

Export environment variables for the installation:

Jump Node Console

    $ source manifests/00-env-vars/envvars.env
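Because every later manifest is rendered from these variables with envsubst, a mistake in envvars.env (a variable exported twice, or a placeholder left in place) surfaces only much later as a failed provisioning step. The following is a minimal lint sketch for such a file. It is not part of DPF: the function names and the sample text are hypothetical, and the sample is deliberately broken to show both checks.

```python
import re
from collections import Counter

# Matches lines of the form: export NAME=VALUE
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(text):
    """Return (name, value) pairs for every `export NAME=VALUE` line."""
    pairs = []
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2).strip().strip('"')))
    return pairs


def lint_exports(pairs):
    """Report variables exported more than once and unfilled placeholders."""
    problems = []
    counts = Counter(name for name, _ in pairs)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"{name} is exported {count} times; the last value wins")
    for name, value in pairs:
        if value == "" or value.startswith("<"):
            problems.append(f"{name} still has a placeholder or empty value")
    return problems


# Deliberately broken sample: a duplicated name and an unfilled placeholder.
sample = """\
export DPU1_SERIAL=mt2402xz0f7x
export DPU2_SERIAL=mt2402xz0f80
export DPU2_SERIAL=mt2402xz0f9n
export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>
"""

for problem in lint_exports(parse_exports(sample)):
    print(problem)
```

To use it on a real setup, read the contents of manifests/00-env-vars/envvars.env and pass the text to `parse_exports` before sourcing the file.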

No change from the Baseline RDG (Section "DPF Installation", Subsection "DPF System Installation").

This section focuses on provisioning NVIDIA® BlueField®-3 DPUs using DPF and installing the HBN and Argus DPU Services on those DPUs. The DOCA HBN Service ensures that all workload traffic is routed through HBN before leaving the DPU, providing secure and policy-enforced network processing. The DOCA Argus Service performs live machine introspection directly on the BlueField DPU, enabling real-time detection of attacks, anomalies, and malicious behavior in AI workloads and microservices—without impacting host performance.

Before deploying the objects under doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn directory, a few adjustments are required.

Export environment variables for the installation:

Jump Node Console

    $ source manifests/00-env-vars/envvars.env

Use the following YAML to define a BFB resource that downloads the BlueField Bootstream (BFB) to a shared volume:

manifests/03.1-dpudeployment-installation-pf/bfb.yaml

    ---
    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: BFB
    metadata:
      name: bf-bundle-$TAG
      namespace: dpf-operator-system
    spec:
      url: $BFB_URL

Review the DPUFlavor using the following YAML:

Note The settings below configure a DPU in Zero Trust mode, which means DPU management will be blocked from the bare-metal host. To deploy in DPU mode, comment out the line containing dpuMode:

    # dpuMode: zero-trust

manifests/03.1-dpudeployment-installation-pf/hbn-dpuflavor.yaml

    ---
    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: DPUFlavor
    metadata:
      name: hbn-$TAG
      namespace: dpf-operator-system
    spec:
      dpuMode: zero-trust
      bfcfgParameters:
      - UPDATE_ATF_UEFI=yes
      - UPDATE_DPU_OS=yes
      - WITH_NIC_FW_UPDATE=yes
      configFiles:
      - operation: override
        path: /etc/mellanox/mlnx-bf.conf
        permissions: "0644"
        raw: |
          ALLOW_SHARED_RQ="no"
          IPSEC_FULL_OFFLOAD="no"
          ENABLE_ESWITCH_MULTIPORT="yes"
      - operation: override
        path: /etc/mellanox/mlnx-ovs.conf
        permissions: "0644"
        raw: |
          CREATE_OVS_BRIDGES="no"
          OVS_DOCA="yes"
      - operation: override
        path: /etc/mellanox/mlnx-sf.conf
        permissions: "0644"
        raw: ""
      grub:
        kernelParameters:
        - console=hvc0
        - console=ttyAMA0
        - earlycon=pl011,0x13010000
        - fixrttc
        - net.ifnames=0
        - biosdevname=0
        - iommu.passthrough=1
        - cgroup_no_v1=net_prio,net_cls
        - hugepagesz=2048kB
        - hugepages=3072
      nvconfig:
      - device: '*'
        parameters:
        - PF_BAR2_ENABLE=0
        - PER_PF_NUM_SF=1
        - PF_TOTAL_SF=20
        - PF_SF_BAR_SIZE=10
        - NUM_PF_MSIX_VALID=0
        - PF_NUM_PF_MSIX_VALID=1
        - PF_NUM_PF_MSIX=228
        - INTERNAL_CPU_MODEL=1
        - INTERNAL_CPU_OFFLOAD_ENGINE=0
        - SRIOV_EN=1
        - NUM_OF_VFS=46
        - LAG_RESOURCE_ALLOCATION=1
        - LINK_TYPE_P1=ETH
        - LINK_TYPE_P2=ETH
        - EXP_ROM_UEFI_x86_ENABLE=1
      ovs:
        rawConfigScript: |
          _ovs-vsctl() {
            ovs-vsctl --no-wait --timeout 15 "$@"
          }

          _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
          _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
          _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
          _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
          _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
          _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
          _ovs-vsctl --if-exists del-br ovsbr1
          _ovs-vsctl --if-exists del-br ovsbr2
          _ovs-vsctl --may-exist add-br br-sfc
          _ovs-vsctl set bridge br-sfc datapath_type=netdev
          _ovs-vsctl set bridge br-sfc fail_mode=secure
          _ovs-vsctl --may-exist add-port br-sfc p0
          _ovs-vsctl set Interface p0 type=dpdk
          _ovs-vsctl set Interface p0 mtu_request=9216
          _ovs-vsctl set Port p0 external_ids:dpf-type=physical
          _ovs-vsctl --may-exist add-port br-sfc p1
          _ovs-vsctl set Interface p1 type=dpdk
          _ovs-vsctl set Interface p1 mtu_request=9216
          _ovs-vsctl set Port p1 external_ids:dpf-type=physical
          _ovs-vsctl --may-exist add-br br-hbn
          _ovs-vsctl set bridge br-hbn datapath_type=netdev
          _ovs-vsctl set bridge br-hbn fail_mode=secure

Change the dpudeployment.yaml file to reference the DPUFlavor suited for performance.

manifests/03.1-dpudeployment-installation-pf/dpudeployment.yaml
    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUDeployment
    metadata:
      name: hbn
      namespace: dpf-operator-system
    spec:
      dpus:
        bfb: bf-bundle-$TAG
        flavor: hbn-$TAG
        nodeEffect:
          noEffect: true
        dpuSets:
        - nameSuffix: "dpuset1"
          nodeSelector:
            matchLabels:
              feature.node.kubernetes.io/dpu-enabled: "true"
      services:
        doca-hbn:
          serviceTemplate: doca-hbn
          serviceConfiguration: doca-hbn
        argus:
          serviceConfiguration: argus
          serviceTemplate: argus
      serviceChains:
        switches:
        - ports:
          - serviceInterface:
              matchLabels:
                interface: p0
          - service:
              name: doca-hbn
              interface: p0_if
        - ports:
          - serviceInterface:
              matchLabels:
                interface: p1
          - service:
              name: doca-hbn
              interface: p1_if
        - ports:
          - serviceInterface:
              matchLabels:
                interface: pf0hpf
          - service:
              name: doca-hbn
              interface: pf0hpf_if

Change the rest of the configuration files. As explained in the introduction, these files create service chains that connect the host physical function (PF0) to the outer fabric through HBN, providing an EVPN VXLAN overlay, VNI-based isolation, and ECMP redundancy across both DPU uplinks (p0 and p1). These are the configuration files:

HBN DPUServiceConfiguration and DPUServiceTemplate to deploy HBN workloads to the DPUs.

manifests/03.1-dpudeployment-installation-pf/hbn-dpuserviceconfig.yaml
    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceConfiguration
    metadata:
      name: doca-hbn
      namespace: dpf-operator-system
    spec:
      deploymentServiceName: "doca-hbn"
      serviceConfiguration:
        serviceDaemonSet:
          annotations:
            k8s.v1.cni.cncf.io/networks: |-
              [
              {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
              {"name": "iprequest", "interface": "ip_pf0hpf_red", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}},
              {"name": "iprequest", "interface": "ip_pf0hpf_blue", "cni-args": {"poolNames": ["pool2"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
              ]
        helmChart:
          values:
            configuration:
              perDPUValuesYAML: |
                - hostnamePattern: "*"
                  values:
                    bgp_peer_group: hbn
                # ---- DPU1, DPU2 => RED only ----
                - hostnamePattern: "dpu-node-mt2402xz0f7x-mt2402xz0f7x*"
                  values:
                    role: RED
                    vrf: RED
                    vlan: 11
                    l2vni: 10010
                    l3vni: 100001
                    bgp_autonomous_system: 65101
                - hostnamePattern: "dpu-node-mt2402xz0f80-mt2402xz0f80*"
                  values:
                    role: RED
                    vrf: RED
                    vlan: 11
                    l2vni: 10010
                    l3vni: 100001
                    bgp_autonomous_system: 65201
                # ---- DPU3, DPU4 => BLUE only ----
                - hostnamePattern: "dpu-node-mt2402xz0f9n-mt2402xz0f9n*"
                  values:
                    role: BLUE
                    vrf: BLUE
                    vlan: 21
                    l2vni: 10020
                    l3vni: 100002
                    bgp_autonomous_system: 65301
                - hostnamePattern: "dpu-node-mt2402xz0f8g-mt2402xz0f8g*"
                  values:
                    role: BLUE
                    vrf: BLUE
                    vlan: 21
                    l2vni: 10020
                    l3vni: 100002
                    bgp_autonomous_system: 65401
              startupYAMLJ2: |
                - header:
                    model: bluefield
                    nvue-api-version: nvue_v1
                    rev-id: 1.0
                    version: HBN 2.4.0
                - set:
                    bridge:
                      domain:
                        br_default:
                          vlan:
                            {{ config.vlan }}:
                              vni:
                                {{ config.l2vni }}: {}
                    evpn:
                      enable: on
                      route-advertise: {}
                    interface:
                      lo:
                        ip:
                          address:
                            {{ ipaddresses.ip_lo.ip }}/32: {}
                        type: loopback
                      p0_if,p1_if,pf0hpf_if:
                        type: swp
                        link:
                          mtu: 9000
                      pf0hpf_if:
                        bridge:
                          domain:
                            br_default:
                              access: {{ config.vlan }}
                      vlan{{ config.vlan }}:
                        type: svi
                        vlan: {{ config.vlan }}
                        ip:
                          address:
                            {% if config.role == "RED" %}
                            {{ ipaddresses.ip_pf0hpf_red.cidr }}: {}
                            {% else %}
                            {{ ipaddresses.ip_pf0hpf_blue.cidr }}: {}
                            {% endif %}
                          vrf: {{ config.vrf }}
                    nve:
                      vxlan:
                        arp-nd-suppress: on
                        enable: on
                        source:
                          address: {{ ipaddresses.ip_lo.ip }}
                    router:
                      bgp:
                        enable: on
                        graceful-restart:
                          mode: full
                    vrf:
                      default:
                        router:
                          bgp:
                            address-family:
                              ipv4-unicast:
                                enable: on
                                redistribute:
                                  connected:
                                    enable: on
                              l2vpn-evpn:
                                enable: on
                            autonomous-system: {{ config.bgp_autonomous_system }}
                            enable: on
                            neighbor:
                              p0_if:
                                peer-group: {{ config.bgp_peer_group }}
                                type: unnumbered
                              p1_if:
                                peer-group: {{ config.bgp_peer_group }}
                                type: unnumbered
                            path-selection:
                              multipath:
                                aspath-ignore: on
                            peer-group:
                              {{ config.bgp_peer_group }}:
                                address-family:
                                  ipv4-unicast:
                                    enable: on
                                  l2vpn-evpn:
                                    enable: on
                                remote-as: external
                            router-id: {{ ipaddresses.ip_lo.ip }}
                      {{ config.vrf }}:
                        evpn:
                          enable: on
                          vni:
                            {{ config.l3vni }}: {}
                        loopback:
                          ip:
                            address:
                              {{ ipaddresses.ip_lo.ip }}/32: {}
                        router:
                          bgp:
                            address-family:
                              ipv4-unicast:
                                enable: on
                                redistribute:
                                  connected:
                                    enable: on
                                route-export:
                                  to-evpn:
                                    enable: on
                            autonomous-system: {{ config.bgp_autonomous_system }}
                            enable: on
                            router-id: {{ ipaddresses.ip_lo.ip }}
      interfaces:
      - name: p0_if
        network: mybrhbn
      - name: p1_if
        network: mybrhbn
      - name: pf0hpf_if
        network: mybrhbn

manifests/03.1-dpudeployment-installation-pf/hbn-dpuservicetemplate.yaml
    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceTemplate
    metadata:
      name: doca-hbn
      namespace: dpf-operator-system
    spec:
      deploymentServiceName: "doca-hbn"
      helmChart:
        source:
          repoURL: $HELM_REGISTRY_REPO_URL
          version: 1.0.5
          chart: doca-hbn
        values:
          image:
            repository: $HBN_NGC_IMAGE_URL
            tag: 3.2.1-doca3.2.1
          resources:
            memory: 6Gi
            nvidia.com/bf_sf: 4
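The perDPUValuesYAML block in hbn-dpuserviceconfig.yaml assigns values to each DPU by matching its host name against a list of patterns, with a catch-all "*" entry for shared settings. The sketch below illustrates one plausible way such a list resolves to a single set of values per DPU. It assumes shell-style wildcard matching and that later matching entries override earlier ones. Both are assumptions made for illustration, not a statement of how the HBN chart is implemented, and only two of the four DPU entries are reproduced.

```python
from fnmatch import fnmatchcase

# Subset of the perDPUValuesYAML entries from this guide.
PER_DPU_VALUES = [
    {"hostnamePattern": "*", "values": {"bgp_peer_group": "hbn"}},
    {
        "hostnamePattern": "dpu-node-mt2402xz0f7x-mt2402xz0f7x*",
        "values": {
            "role": "RED", "vrf": "RED", "vlan": 11,
            "l2vni": 10010, "l3vni": 100001,
            "bgp_autonomous_system": 65101,
        },
    },
    {
        "hostnamePattern": "dpu-node-mt2402xz0f9n-mt2402xz0f9n*",
        "values": {
            "role": "BLUE", "vrf": "BLUE", "vlan": 21,
            "l2vni": 10020, "l3vni": 100002,
            "bgp_autonomous_system": 65301,
        },
    },
]


def resolve_values(hostname, entries):
    """Merge the values of every entry whose pattern matches `hostname`.

    Entries are applied in list order, so a later match overrides an
    earlier one (assumed semantics).
    """
    merged = {}
    for entry in entries:
        if fnmatchcase(hostname, entry["hostnamePattern"]):
            merged.update(entry["values"])
    return merged


config = resolve_values("dpu-node-mt2402xz0f7x-mt2402xz0f7x", PER_DPU_VALUES)
print(config)
```

The resulting dictionary corresponds to the `config.*` names that the startupYAMLJ2 template references, such as `config.vrf`, `config.vlan`, and `config.bgp_peer_group`.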

Physical Interfaces for physical ports on the DPU.

manifests/03.1-dpudeployment-installation-pf/physical-ifaces.yaml

    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceInterface
    metadata:
      name: p0
      namespace: dpf-operator-system
    spec:
      template:
        spec:
          template:
            metadata:
              labels:
                uplink: "p0"
            spec:
              interfaceType: physical
              physical:
                interfaceName: p0
    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceInterface
    metadata:
      name: p1
      namespace: dpf-operator-system
    spec:
      template:
        spec:
          template:
            metadata:
              labels:
                uplink: "p1"
            spec:
              interfaceType: physical
              physical:
                interfaceName: p1
    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceInterface
    metadata:
      name: pf0hpf
      namespace: dpf-operator-system
    spec:
      template:
        spec:
          template:
            metadata:
              labels:
                interface: "pf0hpf"
            spec:
              interfaceType: pf
              pf:
                pfID: 0
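The service chains in dpudeployment.yaml select these DPUServiceInterface objects through matchLabels. In Kubernetes, a matchLabels selector matches an object only when every key and value in the selector is present in the object's labels, so it is worth confirming that the label keys in physical-ifaces.yaml agree with the matchLabels keys used in the DPUDeployment. The sketch below is a minimal illustration of that rule. The helper name `selector_matches` is hypothetical and the dictionaries are sample inputs.

```python
def selector_matches(match_labels, labels):
    """Kubernetes matchLabels semantics: every selector pair must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selector and labels that use the same key: the interface is selected.
print(selector_matches({"interface": "pf0hpf"}, {"interface": "pf0hpf"}))

# Selector on "interface" against an object labelled with "uplink":
# the value is the same, but the key differs, so nothing is selected.
print(selector_matches({"interface": "p0"}, {"uplink": "p0"}))
```

A check of this kind can be run over the rendered manifests before applying them, since a selector that matches nothing does not produce an error at apply time.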

DPU Service IPAM objects to set up IP Address Management on the DPUCluster. manifests/03.1-dpudeployment-installation-pf/hbn-ipam.yaml Collapse Source Copy Copied! --- apiVersion: svc.dpu.nvidia.com/v1alpha1 kind: DPUServiceIPAM metadata: name: pool1 namespace: dpf-operator-system spec: ipv4Network: network: "10.0.121.0/24" gatewayIndex: 2 prefixSize: 29 # These preallocations are not necessary. We specify them so that the validation commands are straightforward. allocations: dpu-node-mt2402xz0f7x-mt2402xz0f7x: 10.0 . 121.0 / 29 dpu-node-mt2402xz0f80-mt2402xz0f80: 10.0 . 121.8 / 29 --- apiVersion: svc.dpu.nvidia.com/v1alpha1 kind: DPUServiceIPAM metadata: name: pool2 namespace: dpf-operator-system spec: ipv4Network: network: "10.0.122.0/24" gatewayIndex: 2 prefixSize: 29 allocations: dpu-node-mt2402xz0f9n-mt2402xz0f9n: 10.0 . 122.0 / 29 dpu-node-mt2402xz0f8g-mt2402xz0f8g: 10.0 . 122.8 / 29 manifests/03.1-dpudeployment-installation-pf/hbn-loopback-ipam.yaml Collapse Source Copy Copied! --- apiVersion: svc.dpu.nvidia.com/v1alpha1 kind: DPUServiceIPAM metadata: name: loopback namespace: dpf-operator-system spec: ipv4Network: network: "11.0.0.0/24" prefixSize: 32 Note It is necessary to set several environment variables before running this command. $ source manifests/00-env-vars/envvars.env Create the Argus-DPUServiceConfiguration.yaml file for the Argus service: manifests/03.1-dpudeployment-installation-pf/Argus-DPUServiceConfiguration.yaml Collapse Source Copy Copied! --- apiVersion: svc.dpu.nvidia.com/v1alpha1 kind: DPUServiceConfiguration metadata: name: argus namespace: dpf-operator-system spec: deploymentServiceName: argus serviceConfiguration: helmChart: values: config: isLocalPath: false containerImage: $ARGUS_NGC_IMAGE_URL Create the Argus-DPUServiceTemplate.yaml file for the Argus service: manifests/03.1-dpudeployment-installation-pf/Argus-DPUServiceTemplate.yaml Collapse Source Copy Copied! 
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: argus
  namespace: dpf-operator-system
spec:
  deploymentServiceName: argus
  helmChart:
    source:
      chart: doca-argus
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.0

Apply all of the YAML files mentioned above using the following command:

Jump Node Console

$ cat manifests/03.1-dpudeployment-installation-pf/*.yaml | envsubst | kubectl apply -f -

Jump Node Console

$ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices --all
dpuservice.svc.dpu.nvidia.com/argus-d4f6z condition met
dpuservice.svc.dpu.nvidia.com/cni-installer condition met
dpuservice.svc.dpu.nvidia.com/doca-hbn-j9tx2 condition met
dpuservice.svc.dpu.nvidia.com/flannel condition met
dpuservice.svc.dpu.nvidia.com/multus condition met
dpuservice.svc.dpu.nvidia.com/nvidia-k8s-ipam condition met
dpuservice.svc.dpu.nvidia.com/ovs-cni condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-controller condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-rbac-and-crds condition met
dpuservice.svc.dpu.nvidia.com/sfc-controller condition met
dpuservice.svc.dpu.nvidia.com/sriov-device-plugin condition met

$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met
dpuserviceipam.svc.dpu.nvidia.com/pool2 condition met

$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p0-if-fsmwc condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p1-if-7lrlp condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf0hpf-if-ts78b condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf1hpf-if-mtr6t condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0
condition met dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met dpuserviceinterface.svc.dpu.nvidia.com/pf0hpf condition met dpuserviceinterface.svc.dpu.nvidia.com/pf1hpf condition met $ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all dpuservicechain.svc.dpu.nvidia.com/hbn-c9bsz condition met To follow the progress of DPU provisioning, run the following command to check its current phase: Jump Node Console Collapse Source Copy Copied! $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'" Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase' Dpu Node Name: dpu-node-mt2402xz0f7x Type: InternalIP Type: Hostname Last Transition Time: 2026-01-04T14:32:13Z Type: Ready Last Transition Time: 2026-01-04T14:06:41Z Type: BFBPrepared Last Transition Time: 2026-01-04T14:06:36Z Type: BFBReady Last Transition Time: 2026-01-04T14:11:19Z Type: BFBTransferred Last Transition Time: 2026-01-04T14:32:12Z Type: DPUClusterReady Last Transition Time: 2026-01-04T14:06:40Z Type: FWConfigured Last Transition Time: 2026-01-04T14:06:36Z Type: Initialized Last Transition Time: 2026-01-04T14:06:39Z Type: InterfaceInitialized Last Transition Time: 2026-01-04T14:06:37Z Type: NodeEffectReady Last Transition Time: 2026-01-04T14:32:12Z Type: NodeEffectRemoved Last Transition Time: 2026-01-04T14:18:07Z Reason: OemLastState Type: OSInstalled Last Transition Time: 2026-01-04T14:32:12Z Type: Rebooted Phase: Rebooting Dpu Node Name: dpu-node-mt2402xz0f80 Type: InternalIP Type: Hostname Last Transition Time: 2026-01-04T14:32:13Z Type: Ready Last Transition Time: 2026-01-04T14:06:40Z Type: BFBPrepared Last Transition Time: 2026-01-04T14:06:36Z Type: BFBReady Last Transition Time: 2026-01-04T14:11:17Z Type: BFBTransferred Last Transition Time: 2026-01-04T14:32:12Z Type: DPUClusterReady Last Transition Time: 2026-01-04T14:06:38Z Type: FWConfigured Last 
Transition Time: 2026-01-04T14:06:36Z Type: Initialized Last Transition Time: 2026-01-04T14:06:37Z Type: InterfaceInitialized Last Transition Time: 2026-01-04T14:06:36Z Type: NodeEffectReady Last Transition Time: 2026-01-04T14:32:13Z Type: NodeEffectRemoved Last Transition Time: 2026-01-04T14:18:18Z Reason: OemLastState Type: OSInstalled Last Transition Time: 2026-01-04T14:32:12Z Type: Rebooted Phase: Rebooting ...

Wait for the Rebooted stage and then manually power cycle the bare-metal host. After the DPU is up, run the following command for the DPU workers:

Jump Node Console

$ kubectl annotate dpunodes -n dpf-operator-system --all provisioning.dpu.nvidia.com/dpunode-external-reboot-required-

At this point, the DPU workers should be added to the cluster. As they are being added to the cluster, the DPUs are provisioned.

Jump Node Console

$ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'
Dpu Node Name: dpu-node-mt2402xz0f7x Type: InternalIP Type: Hostname Last Transition Time: 2026-01-04T14:32:13Z Type: Ready Last Transition Time: 2026-01-04T14:06:41Z Type: BFBPrepared Last Transition Time: 2026-01-04T14:06:36Z Type: BFBReady Last Transition Time: 2026-01-04T14:11:19Z Type: BFBTransferred Last Transition Time: 2026-01-04T14:32:12Z Type: DPUClusterReady Last Transition Time: 2026-01-04T14:06:40Z Type: FWConfigured Last Transition Time: 2026-01-04T14:06:36Z Type: Initialized Last Transition Time: 2026-01-04T14:06:39Z Type: InterfaceInitialized Last Transition Time: 2026-01-04T14:06:37Z Type: NodeEffectReady Last Transition Time: 2026-01-04T14:32:12Z Type: NodeEffectRemoved Last Transition Time: 2026-01-04T14:18:07Z Reason: OemLastState Type: OSInstalled Last Transition Time: 2026-01-04T14:32:12Z Type: Rebooted Phase: Ready
Dpu Node Name: dpu-node-mt2402xz0f80 Type: InternalIP Type:
Hostname Last Transition Time: 2026-01-04T14:32:13Z Type: Ready Last Transition Time: 2026-01-04T14:06:40Z Type: BFBPrepared Last Transition Time: 2026-01-04T14:06:36Z Type: BFBReady Last Transition Time: 2026-01-04T14:11:17Z Type: BFBTransferred Last Transition Time: 2026-01-04T14:32:12Z Type: DPUClusterReady Last Transition Time: 2026-01-04T14:06:38Z Type: FWConfigured Last Transition Time: 2026-01-04T14:06:36Z Type: Initialized Last Transition Time: 2026-01-04T14:06:37Z Type: InterfaceInitialized Last Transition Time: 2026-01-04T14:06:36Z Type: NodeEffectReady Last Transition Time: 2026-01-04T14:32:13Z Type: NodeEffectRemoved Last Transition Time: 2026-01-04T14:18:18Z Reason: OemLastState Type: OSInstalled Last Transition Time: 2026-01-04T14:32:12Z Type: Rebooted Phase: Ready ... Finally, validate that all the different DPU-related objects are now in the Ready state: Jump Node Console Collapse Source Copy Copied! $ kubectl get secrets -n dpu-cplane-tenant1 dpu-cplane-tenant1-admin-kubeconfig -o json | jq -r '.data["admin.conf"]' | base64 --decode > /home/depuser/dpu-cluster.config $ echo "alias ki='KUBECONFIG=/home/depuser/dpu-cluster.config kubectl'" >> ~/.bashrc $ echo 'alias dpfctl="kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "' >> ~/.bashrc $ dpfctl describe dpudeployments NAME NAMESPACE STATUS REASON SINCE MESSAGE DPFOperatorConfig/dpfoperatorconfig dpf-operator-system Ready: True Success 9m48s └─DPUDeployments └─DPUDeployment/hbn dpf-operator-system Ready: True Success 8m42s ├─DPUServiceChains │ └─DPUServiceChain/hbn-nwcrh dpf-operator-system Ready: True Success 9m35s ├─DPUServiceInterfaces │ └─3 DPUServiceInterfaces... 
dpf-operator-system Ready: True Success 9m40s See doca-hbn-p0-if-n5cnz, doca-hbn-p1-if-m8cz2, doca-hbn-pf0hpf-if-c4slv ├─DPUSets │ └─DPUSet/hbn-dpuset1 dpf-operator-system Ready: True Success 9m40s │ ├─BFB/bf-bundle-v25.10.0 dpf-operator-system Ready: True Ready 44m File: bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb, DOCA: 3.2.1 │ ├─DPUNodes │ │ └─4 DPUNodes... dpf-operator-system Ready: True Ready 9m41s See dpu-node-mt2402xz0f7x, dpu-node-mt2402xz0f80, dpu-node-mt2402xz0f8g, dpu-node-mt2402xz0f9n │ └─DPUs │ └─4 DPUs... dpf-operator-system Ready: True DPUReady 9m40s See dpu-node-mt2402xz0f7x-mt2402xz0f7x, dpu-node-mt2402xz0f80-mt2402xz0f80, │ dpu-node-mt2402xz0f8g-mt2402xz0f8g, dpu-node-mt2402xz0f9n-mt2402xz0f9n └─Services ├─DPUServiceTemplates │ ├─DPUServiceTemplate/argus dpf-operator-system Ready: True Success 44m │ └─DPUServiceTemplate/doca-hbn dpf-operator-system Ready: True Success 35m └─DPUServices └─2 DPUServices... dpf-operator-system Ready: True Success 9m16s See argus-njfpf, doca-hbn-76gsm $ ki get node -A NAME STATUS ROLES AGE VERSION dpu-node-mt2402xz0f7x-mt2402xz0f7x Ready <none> 11m v1.34.3 dpu-node-mt2402xz0f80-mt2402xz0f80 Ready <none> 11m v1.34.3 dpu-node-mt2402xz0f8g-mt2402xz0f8g Ready <none> 11m v1.34.3 dpu-node-mt2402xz0f9n-mt2402xz0f9n Ready <none> 12m v1.34.3 $ kubectl get dpu -A NAMESPACE NAME READY PHASE AGE dpf-operator-system dpu-node-mt2402xz0f7x-mt2402xz0f7x True Ready 36m dpf-operator-system dpu-node-mt2402xz0f80-mt2402xz0f80 True Ready 36m dpf-operator-system dpu-node-mt2402xz0f8g-mt2402xz0f8g True Ready 36m dpf-operator-system dpu-node-mt2402xz0f9n-mt2402xz0f9n True Ready 36m $ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f7x-mt2402xz0f7x condition met dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f80-mt2402xz0f80 condition met dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f8g-mt2402xz0f8g condition met 
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f9n-mt2402xz0f9n condition met $ ki get pods -A -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dpf-operator-system dpu-cplane-tenant1-argus-njfpf-doca-argus-7qp7p 1/1 Running 0 11m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system dpu-cplane-tenant1-argus-njfpf-doca-argus-9hw7x 1/1 Running 0 11m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-argus-njfpf-doca-argus-pzk5b 1/1 Running 0 11m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-argus-njfpf-doca-argus-rsjlv 1/1 Running 0 11m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-cni-installer-8gdw9 1/1 Running 0 12m 10.244.0.4 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-cni-installer-kcjmg 1/1 Running 0 12m 10.244.1.3 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-cni-installer-xllw8 1/1 Running 0 12m 10.244.3.2 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system dpu-cplane-tenant1-cni-installer-z7ncl 1/1 Running 0 12m 10.244.2.2 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-doca-hbn-76gsm-ds-c6xfm 2/2 Running 0 11m 10.244.1.5 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-doca-hbn-76gsm-ds-dgp2f 2/2 Running 0 11m 10.244.3.4 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system dpu-cplane-tenant1-doca-hbn-76gsm-ds-hckq9 2/2 Running 0 11m 10.244.0.6 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-doca-hbn-76gsm-ds-qdhl2 2/2 Running 0 11m 10.244.2.4 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-controller-5c77854fcc-l97cq 1/1 Running 0 148m 
10.244.1.2 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-7pfz5 1/1 Running 0 12m 10.244.1.4 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-kmfcz 1/1 Running 0 12m 10.244.0.3 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-psdr7 1/1 Running 0 12m 10.244.3.3 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-sj86l 1/1 Running 0 12m 10.244.2.3 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-2sjmh 1/1 Running 0 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-c9px6 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-w7sgb 1/1 Running 0 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-wmxsg 1/1 Running 0 12m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-b85jv 1/1 Running 1 (11m ago) 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-bjvtb 1/1 Running 2 (11m ago) 12m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-f4scv 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-rkrsb 1/1 Running 1 (11m ago) 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system kube-flannel-ds-4h4l9 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system kube-flannel-ds-bndpt 1/1 
Running 0 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system kube-flannel-ds-kxlm7 1/1 Running 0 13m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system kube-flannel-ds-mm2jl 1/1 Running 0 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system kube-multus-ds-7jzp6 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system kube-multus-ds-tg67p 1/1 Running 0 12m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system kube-multus-ds-xpglw 1/1 Running 0 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system kube-multus-ds-zc6h8 1/1 Running 0 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> dpf-operator-system kube-sriov-device-plugin-82gfh 1/1 Running 0 12m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpf-operator-system kube-sriov-device-plugin-g26lk 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpf-operator-system kube-sriov-device-plugin-nnt8c 1/1 Running 0 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpf-operator-system kube-sriov-device-plugin-vzlnw 1/1 Running 0 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> kube-system coredns-66bc5c9577-nzbtq 1/1 Running 0 148m 10.244.0.2 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> kube-system coredns-66bc5c9577-s2qnl 1/1 Running 0 148m 10.244.0.5 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> kube-system kube-proxy-54sqk 1/1 Running 0 12m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> kube-system kube-proxy-76t4q 1/1 Running 0 13m 10.0.110.213 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> kube-system kube-proxy-trz5g 1/1 Running 0 12m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> kube-system kube-proxy-znfw6 1/1 Running 0 12m 10.0.110.214 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> 
Congratulations! The DPF system with the HBN and Argus services has been successfully installed.
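The readiness validation can also be scripted. The following is a minimal sketch, assuming the plain-text column layout of `kubectl get dpu -A` shown in this guide (NAMESPACE, NAME, READY, PHASE, AGE) with no empty columns. The function name `not_ready_dpus` is hypothetical and not part of DPF, and the second row of the sample input is illustrative rather than captured output.

```python
def not_ready_dpus(table_text):
    """Return the names of DPUs whose READY is not True or whose PHASE is not Ready.

    Assumes whitespace-separated columns with a header row, as printed by
    `kubectl get dpu -A` in this guide.
    """
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    col = {name: pos for pos, name in enumerate(header)}
    failing = []
    for line in lines[1:]:
        fields = line.split()
        if fields[col["READY"]] != "True" or fields[col["PHASE"]] != "Ready":
            failing.append(fields[col["NAME"]])
    return failing


# Illustrative input: the first row mirrors the guide, the second row is made up.
sample = """\
NAMESPACE             NAME                                 READY   PHASE       AGE
dpf-operator-system   dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Ready       36m
dpf-operator-system   dpu-node-mt2402xz0f80-mt2402xz0f80   False   Rebooting   36m
"""

print(not_ready_dpus(sample))
```

An empty list means every DPU in the table reports Ready. In practice the text would come from running the command on the jump node.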

The following step-by-step procedure verifies, from the host server, that the NVIDIA BlueField DPU is operating in Zero-Trust Mode. It includes the installation of the Mellanox Firmware Tools (MFT).

Note Ubuntu 24.04 was installed on the servers.

Navigate to the NVIDIA Downloads Site: Open your web browser and go to the official NVIDIA Mellanox software downloads page.

Select the latest version for your OS.

Transfer and extract the MFT tools on the Worker 1 bare-metal host.

First Pod Console

root@worker1:~# tar -xvzf /tmp/mft-4.33.0-169-x86_64-deb.tgz

Navigate into the extracted directory.

First Pod Console

root@worker1:~# cd mft-4.33.0-169-x86_64-deb/

Run the following commands.

First Pod Console

root@worker1:~# apt-get install gcc make dkms
root@worker1:~# ./install.sh

Start the MST (Mellanox Software Tools) service and identify the DPU device name.

First Pod Console

root@worker1:~# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
root@worker1:~# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt41692_pciconf0 - PCI configuration cycles access.
                            domain:bus:dev.fn=0000:2b:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                            Chip revision is: 01

Perform the Zero-Trust check.

First Pod Console

root@worker1:~# mlxprivhost -d 2b:00.0 q
Host configurations
-------------------
level                  : RESTRICTED

Port functions status:
-----------------------
disable_rshim          : TRUE
disable_tracer         : TRUE
disable_port_owner     : TRUE
disable_counter_rd     : TRUE
# Expected Zero-Trust output.

This is the most definitive confirmation. level : RESTRICTED means the host is in Zero-Trust Mode, and the TRUE flags confirm that the individual security restrictions are active.

Check firmware access with mlxfwmanager:

First Pod Console

root@worker1:~# mlxfwmanager -d 2b:00.0 --query
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField3
  Part Number:      --
  Description:
  PSID:
  PCI Device Name:  2b:00.0
  Base MAC:         N/A
  Versions:         Current        Available
     FW             --

  Status:           Failed to open device

"Failed to open device" indicates that the host is blocked from accessing the DPU for firmware operations, a key aspect of Zero-Trust.

Check device configuration with mlxconfig:

First Pod Console

root@worker1:~# mlxconfig -d 2b:00.0 q

Device #1:
----------

Device type:    BlueField3
Name:           900-9D3B6-00CV-A_Ax
Description:    NVIDIA BlueField-3 B3220 P-Series FHHL DPU; 200GbE (default mode) / NDR200 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled
Device:         2b:00.0

Configurations:                  Next Boot
...
        ALLOW_RD_COUNTERS        True(1)    # No RO, but restricted by mlxprivhost
...
        PORT_OWNER               True(1)    # No RO, but restricted by mlxprivhost
...
        TRACER_ENABLE            True(1)    # No RO, but restricted by mlxprivhost

Most configuration parameters are prefixed with RO (Read-Only). Parameters related to direct host control, such as PORT_OWNER, ALLOW_RD_COUNTERS, and TRACER_ENABLE, may still show True(1), which reflects the DPU's internal capability, but the host cannot exercise them because of the mlxprivhost restrictions. The widespread RO status shows that the host cannot modify these configurations, reinforcing the DPU's autonomous and secure state. The few parameters without RO are still overridden by the mlxprivhost security policy.

Check low-level hardware access with ethtool:

First Pod Console

root@worker1:~# ethtool -d ens1f0np0
Cannot get register dump: Operation not supported

This confirms that the DPU is preventing deep, low-level hardware access from the host, in line with the isolation goals of Zero-Trust.

Conclusion

If the outputs of mlxprivhost, mlxfwmanager, mlxconfig (showing RO flags), and ethtool (showing "Operation not supported") match the results shown above, the NVIDIA BlueField DPU is operating in Zero-Trust Mode.

This means the host has significantly restricted privileges and cannot perform sensitive operations on the DPU, ensuring its security and isolation.
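The mlxprivhost check lends itself to automation. The sketch below assumes the `key : value` text layout of the `mlxprivhost ... q` output shown in this guide. The function name `is_zero_trust` is hypothetical; it only interprets captured text and does not call any MFT tool.

```python
def is_zero_trust(mlxprivhost_output):
    """Return True if the level is RESTRICTED and every disable_* flag is TRUE.

    Assumes the "key : value" layout of `mlxprivhost -d <device> q` output.
    """
    level = None
    flags = {}
    for line in mlxprivhost_output.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "level":
            level = value
        elif key.startswith("disable_"):
            flags[key] = value == "TRUE"
    # Require at least one flag so that empty or truncated output does not pass.
    return level == "RESTRICTED" and bool(flags) and all(flags.values())


# Sample text copied from the output shown in this guide.
sample = """\
Host configurations
-------------------
level                  : RESTRICTED

Port functions status:
-----------------------
disable_rshim          : TRUE
disable_tracer         : TRUE
disable_port_owner     : TRUE
disable_counter_rd     : TRUE
"""

print(is_zero_trust(sample))
```

The check deliberately fails on empty input, so a command that prints nothing is not mistaken for a restricted host.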

Verify the deployment and confirm that the DPU system achieves link-speed performance and low latency by running various tests:

Iperf TCP: bandwidth measurements
RDMA: bandwidth and latency measurements
Network isolation

Each test is described in detail. At the end of each test, the achieved performance is displayed.

Note Make sure that the servers are tuned for maximum performance (not covered in this document).

Now that the test deployment is running, perform bandwidth and latency performance tests between two bare-metal workload servers.

Note Ubuntu 24.04 was installed on the servers.

Before running the tests, check the Gateway address and BGP configuration on each HBN pod: Jump Node Console Collapse Source Copy Copied! $ ki -n dpf-operator-system get pod -o wide | grep doca-hbn dpu-cplane-tenant1-doca-hbn-tt72p-ds-2b4xb 2/2 Running 0 47s 10.244.0.26 dpu-node-mt2402xz0f9n-mt2402xz0f9n <none> <none> dpu-cplane-tenant1-doca-hbn-tt72p-ds-4c68s 2/2 Running 0 47s 10.244.2.22 dpu-node-mt2402xz0f8g-mt2402xz0f8g <none> <none> dpu-cplane-tenant1-doca-hbn-tt72p-ds-7chzl 2/2 Running 0 47s 10.244.1.24 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none> dpu-cplane-tenant1-doca-hbn-tt72p-ds-l7ct6 2/2 Running 0 47s 10.244.3.23 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none> $ ki exec -it -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-tt72p-ds-7chzl -- bash Defaulted container "doca-hbn" out of: doca-hbn, hbn-sidecar, hbn-init (init) root@dpu-cplane-tenant1-doca-hbn-tt72p-ds-7chzl:/tmp# ip a s ... 9: vlan11@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue master RED state UP group default qlen 1000 link/ether 2e:5d:83:20:c7:0d brd ff:ff:ff:ff:ff:ff inet 10.0.121.2/29 scope global vlan11 valid_lft forever preferred_lft forever inet6 fe80::2c5d:83ff:fe20:c70d/64 scope link valid_lft forever preferred_lft forever ... 
# vtysh # show bgp summary IPv4 Unicast Summary (VRF default): BGP router identifier 11.0.0.0, local AS number 65101 vrf-id 0 BGP table version 12 RIB entries 12, using 2688 bytes of memory Peers 2, using 40 KiB of memory Peer groups 1, using 64 bytes of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc clx-swx-056(p0_if) 4 65001 802 798 0 0 0 00:38:36 6 7 N/A clx-swx-056(p1_if) 4 65001 1028 1024 0 0 0 00:49:56 6 7 N/A Total number of neighbors 2 L2VPN EVPN Summary (VRF default): BGP router identifier 11.0.0.0, local AS number 65101 vrf-id 0 BGP table version 0 RIB entries 15, using 3360 bytes of memory Peers 2, using 40 KiB of memory Peer groups 1, using 64 bytes of memory Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc clx-swx-056(p0_if) 4 65001 802 798 0 0 0 00:38:36 15 20 N/A clx-swx-056(p1_if) 4 65001 1028 1024 0 0 0 00:49:56 15 20 N/A Total number of neighbors 2 # show ip bgp BGP table version is 7, local router ID is 11.0.0.3, vrf id 0 Default local pref 100, local AS 65301 Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath, + multipath nhg, i internal, r RIB-failure, S Stale, R Removed Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self Origin codes: i - IGP, e - EGP, ? - incomplete RPKI validation codes: V valid, I invalid, N Not found Network Next Hop Metric LocPrf Weight Path *> 0.0.0.0/0 p0_if 0 0 65001 ? *> 10.0.120.0/22 p0_if 0 0 65001 ? *> 11.0.0.0/32 p0_if 0 65001 65101 ? *> 11.0.0.1/32 p0_if 0 65001 65201 ? *> 11.0.0.2/32 p0_if 0 65001 65401 ? *> 11.0.0.3/32 0.0.0.0(dpu-cplane-tenant1-doca-hbn-tt72p-ds-2b4xb) 0 32768 ? *> 11.0.0.101/32 p0_if 0 0 65001 ? # exit $ exit Connect to a first Workload Server console, install iperf, perftest, check DPU Hight Speed Interfaces, set route to ethernet and identify the relevant RDMA device: First Pod Console Collapse Source Copy Copied! 
root@worker1:~# apt install iperf3
root@worker1:~# apt install perftest
root@worker1:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:69:e6 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0np0
...
root@worker1:~# ip route add 10.0.123.0/22 via 10.0.121.2
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker1:~# rdma link | grep ens1f0np0
link mlx5_0/1 state DOWN physical_state DISABLED netdev ens1f0np0

Configure the ens1f0np0 interface on Ubuntu 24.04 using iproute2.

Configuration Overview

Interface    IP Address       Default Gateway
ens1f0np0    10.0.121.1/29    10.0.121.2/29

First Pod Console

# Bring up physical interfaces
root@worker1:~# ip link set dev ens1f0np0 up
# Assign IP addresses
root@worker1:~# ip addr add 10.0.121.1/29 dev ens1f0np0
# Set default route
root@worker1:~# ip route add default via 10.0.121.2 dev ens1f0np0

Using another console window, reconnect to the jump node and connect to the second workload server. From within the server, install iperf3 and perftest, check the DPU high-speed interfaces, add the route, and identify the relevant RDMA device:

Second Pod Console

root@worker2:~# apt install iperf3
root@worker2:~# apt install perftest
root@worker2:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:6a:58 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0np0
...
root@worker2:~# ip route add 10.0.123.0/22 via 10.0.121.10
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker2:~# rdma link | grep ens1f0np0
link mlx5_0/1 state DOWN physical_state DISABLED netdev ens1f0np0

Configure the ens1f0np0 interface on Ubuntu 24.04 using iproute2.

Configuration Overview

Interface    IP Address       Default Gateway
ens1f0np0    10.0.121.9/29    10.0.121.10/29

Second Pod Console

# Bring up physical interfaces
root@worker2:~# ip link set dev ens1f0np0 up
# Assign IP addresses
root@worker2:~# ip addr add 10.0.121.9/29 dev ens1f0np0
# Set default route
root@worker2:~# ip route add default via 10.0.121.10 dev ens1f0np0

Repeat steps 1-5 on nodes 3 and 4.
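For nodes 3 and 4, the host address and gateway follow the same pattern as on workers 1 and 2. The sketch below derives them from the pool2 allocations in hbn-ipam.yaml. It assumes the gateway sits at index 2 of each /29 (the gatewayIndex value in the manifest) and that the host takes index 1, as it does for workers 1 and 2 in this guide. The helper `host_plan` is hypothetical, and the interface name ens1f0np0 is assumed to be the same on every server.

```python
import ipaddress


def host_plan(allocation, gateway_index=2, host_index=1):
    """Return (host address with prefix, gateway address) for one /29 allocation."""
    subnet = ipaddress.ip_network(allocation)
    host = f"{subnet[host_index]}/{subnet.prefixlen}"
    gateway = str(subnet[gateway_index])
    return host, gateway


# Allocations copied from the pool2 DPUServiceIPAM object.
POOL2_ALLOCATIONS = {
    "dpu-node-mt2402xz0f9n-mt2402xz0f9n": "10.0.122.0/29",
    "dpu-node-mt2402xz0f8g-mt2402xz0f8g": "10.0.122.8/29",
}

for node, allocation in POOL2_ALLOCATIONS.items():
    host, gateway = host_plan(allocation)
    print(f"{node}: ip addr add {host} dev ens1f0np0")
    print(f"{node}: ip route add default via {gateway} dev ens1f0np0")
```

Applied to the pool1 allocation 10.0.121.0/29, the same function yields 10.0.121.1/29 and 10.0.121.2, which matches the worker1 configuration above.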

Move back to the first server console.

Start the iperf3 server side:

First BM Server Console

root@worker1:~# iperf3 -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------

Move to the second server console. Start the iperf client side:

Second BM Server Console

root@worker2:~# iperf3 -c 10.0.121.1 -P 16
------------------------------------------------------------
Client connecting to 10.0.121.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  9] local 10.0.121.9 port 48620 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/827)
[ 10] local 10.0.121.9 port 48610 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/881)
[  1] local 10.0.121.9 port 48712 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/608)
[ 14] local 10.0.121.9 port 48728 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/722)
[ 11] local 10.0.121.9 port 48710 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/870)
[  4] local 10.0.121.9 port 48622 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/945)
[  7] local 10.0.121.9 port 48690 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/906)
[ 15] local 10.0.121.9 port 48736 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/689)
[  2] local 10.0.121.9 port 48616 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/796)
[  3] local 10.0.121.9 port 48618 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/940)
[ 12] local 10.0.121.9 port 48706 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/892)
[ 16] local 10.0.121.9 port 48696 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/810)
[  8] local 10.0.121.9 port 48626 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/801)
[  6] local 10.0.121.9 port 48692 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/891)
[  5] local 10.0.121.9 port 48624 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/931)
[ 13] local 10.0.121.9 port 48686 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/903)
[ ID] Interval            Transfer     Bandwidth
[  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 13] 0.0000-10.0057 sec  14.2 GBytes  12.2 Gbits/sec
[  7] 0.0000-10.0056 sec  13.4 GBytes  11.5 Gbits/sec
[ 12] 0.0000-10.0057 sec  15.2 GBytes  13.1 Gbits/sec
[  4] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 11] 0.0000-10.0058 sec  15.8 GBytes  13.6 Gbits/sec
[  8] 0.0000-10.0057 sec  13.9 GBytes  11.9 Gbits/sec
[  9] 0.0000-10.0058 sec  13.8 GBytes  11.9 Gbits/sec
[ 15] 0.0000-10.0057 sec  14.3 GBytes  12.3 Gbits/sec
[ 16] 0.0000-10.0058 sec  14.6 GBytes  12.5 Gbits/sec
[  1] 0.0000-10.0057 sec  14.6 GBytes  12.6 Gbits/sec
[  6] 0.0000-10.0058 sec  13.1 GBytes  11.3 Gbits/sec
[ 14] 0.0000-10.0059 sec  13.6 GBytes  11.6 Gbits/sec
[ 10] 0.0000-10.0055 sec  13.5 GBytes  11.6 Gbits/sec
[  2] 0.0000-10.0057 sec  14.0 GBytes  12.0 Gbits/sec
[  5] 0.0000-10.0058 sec  14.6 GBytes  12.6 Gbits/sec
[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec
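The [SUM] line can be cross-checked against the per-stream results. The following sketch assumes the per-stream line layout shown in the output above (stream ID in brackets, interval, transfer in GBytes, bandwidth in Gbits/sec). The function name `total_gbits` is hypothetical, and the sample contains only the sixteen per-stream bandwidth lines plus the [SUM] line from this guide.

```python
import re

# Matches per-stream result lines; the [SUM] line is skipped because its ID is not numeric.
STREAM_LINE = re.compile(
    r"^\[\s*(\d+)\s*\]\s+[\d.]+\s*-\s*[\d.]+\s+sec\s+[\d.]+\s+GBytes\s+([\d.]+)\s+Gbits/sec",
    re.MULTILINE,
)


def total_gbits(iperf_output):
    """Sum the per-stream bandwidth values (in Gbits/sec) found in the text."""
    return sum(float(rate) for _stream_id, rate in STREAM_LINE.findall(iperf_output))


rates = [12.1, 12.2, 11.5, 13.1, 12.1, 13.6, 11.9, 11.9,
         12.3, 12.5, 12.6, 11.3, 11.6, 11.6, 12.0, 12.6]
sample = "\n".join(
    f"[{i:3d}] 0.0000-10.0058 sec  14.0 GBytes  {rate} Gbits/sec"
    for i, rate in enumerate(rates, start=1)
)
sample += "\n[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec\n"

print(f"{total_gbits(sample):.1f} Gbits/sec across {len(rates)} streams")
```

The per-stream values add up to roughly 194.9 Gbits/sec, consistent with the 195 Gbits/sec reported on the [SUM] line.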

Return to the first server console.

Start the ib_read_lat server side:

First BM Server Console

root@worker1:~# ib_read_lat -F -n 20000 -d mlx5_0

************************************
* Waiting for client to connect... *
************************************

Move to the second server console. Start the ib_read_lat client side:

Second BM Server Console

root@worker2:~# ib_read_lat -F -n 20000 -d mlx5_0 10.0.121.1
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0048 PSN 0x77ae88 OUT 0x10 RKey 0x186ded VAddr 0x005fe0b3e3a000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:09
 remote address: LID 0000 QPN 0x0048 PSN 0x51948d OUT 0x10 RKey 0x186ded VAddr 0x00577584a67000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:01
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       20000          3.98           65.30        4.08               7.89           7.17            31.51                  36.33
---------------------------------------------------------------------------------------

Return to the first server console.

Start the ib_write_bw server side:

First BM Server Console

root@worker1:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_0

************************************
* Waiting for client to connect... *
************************************

Move to the second server console. Start the ib_write_bw client side:

Second BM Server Console

root@worker2:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_0 10.0.121.1 --report_gbit
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 64           Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
…
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 1048576    439217           0.00               230.83             0.027517
---------------------------------------------------------------------------------------
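As a sanity check on the result above, the message rate should equal the average bandwidth divided by the message size. The sketch below assumes the bandwidth column is expressed in Gbit/s, as requested by the --report_gbit option, even though the captured header reads MB/sec. Under that assumption the reported 230.83 and 0.027517 Mpps agree. The helper name `message_rate_mpps` is hypothetical.

```python
def message_rate_mpps(bw_gbit_per_sec, message_bytes):
    """Convert an average bandwidth in Gbit/s into millions of messages per second."""
    bits_per_message = message_bytes * 8
    messages_per_sec = bw_gbit_per_sec * 1e9 / bits_per_message
    return messages_per_sec / 1e6


# Values taken from the ib_write_bw result line in this guide.
rate = message_rate_mpps(230.83, 1048576)
print(f"{rate:.6f} Mpps")
```

If the column were really in MB/sec, the same message size would give a rate several orders of magnitude lower, so the Gbit/s reading is the one that is consistent with the MsgRate column.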

Finally, verify that servers attached to different networks through their PF0 interfaces cannot communicate with each other.

Connect to the first workload server and ping PF0 on the second node from its PF0 interface.

Run the ping command from PF0 to PF0:

First BM Server Console

    root@worker1:~# ping -c 3 10.0.121.9
    PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
    64 bytes from 10.0.121.9: icmp_seq=1 ttl=62 time=0.885 ms
    64 bytes from 10.0.121.9: icmp_seq=2 ttl=62 time=0.273 ms
    64 bytes from 10.0.121.9: icmp_seq=3 ttl=62 time=0.214 ms

Try to ping PF0 on nodes 3 and 4.

Run the ping commands from PF0 to PF0:

First BM Server Console

    root@worker1:~# ping -c 3 10.0.122.1
    PING 10.0.122.1 (10.0.122.1) 56(84) bytes of data.
    From 10.0.121.2 icmp_seq=1 Destination Host Unreachable
    From 10.0.121.2 icmp_seq=2 Destination Host Unreachable
    From 10.0.121.2 icmp_seq=3 Destination Host Unreachable

    --- 10.0.122.1 ping statistics ---
    3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2037ms

    root@worker1:~# ping -c 3 10.0.122.9
    PING 10.0.122.1 (10.0.122.1) 56(84) bytes of data.
    From 10.0.121.2 icmp_seq=1 Destination Host Unreachable
    From 10.0.121.2 icmp_seq=2 Destination Host Unreachable
    From 10.0.121.2 icmp_seq=3 Destination Host Unreachable

    --- 10.0.122.1 ping statistics ---
    3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2031ms

The pings to nodes 3 and 4 are expected to fail because of the network isolation implemented in HBN using different VLANs, VNIs, and VRFs.
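The reachability checks above can be collected into one repeatable test. The following is a minimal sketch, not part of the DPF tooling: ping_ok, check_isolation, and EXPECTED_FROM_WORKER1 are hypothetical names, and the expectation table simply restates the results shown in this guide (the second node answers, nodes 3 and 4 do not).

```python
import subprocess


def ping_ok(address: str, count: int = 3) -> bool:
    """Return True if `ping -c <count> <address>` exits with status 0."""
    result = subprocess.run(
        ["ping", "-c", str(count), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_isolation(expectations: dict[str, bool], probe=ping_ok) -> list[str]:
    """Compare observed reachability with the expected value for each address.

    Returns a list of mismatch descriptions; an empty list means every
    address behaved as expected.
    """
    mismatches = []
    for address, should_reach in expectations.items():
        reached = probe(address)
        if reached != should_reach:
            expected = "reachable" if should_reach else "unreachable"
            observed = "reachable" if reached else "unreachable"
            mismatches.append(f"{address}: expected {expected}, observed {observed}")
    return mismatches


# Expectations as seen from worker1 in this guide (assumed addresses from the text).
EXPECTED_FROM_WORKER1 = {
    "10.0.121.9": True,   # PF0 on the second node
    "10.0.122.1": False,  # PF0 on node 3
    "10.0.122.9": False,  # PF0 on node 4
}
```

Calling check_isolation(EXPECTED_FROM_WORKER1) on worker1 runs the three pings. The probe parameter exists so that the comparison logic can be exercised without network access, for example by passing a function that returns canned results.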

Here's a step-by-step procedure to check the DOCA Argus service on your NVIDIA BlueField DPU.

Note Ubuntu 24.04 was installed on the servers.

Open the first worker server console.

First BM Server Console

    $ ssh worker1

Add the IOMMU configuration to the /etc/default/grub file:

First BM Server Console

    root@worker1:~# vim /etc/default/grub

    ## Add iommu.passthrough=1 intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT parameter
    GRUB_CMDLINE_LINUX_DEFAULT="iommu.passthrough=1 intel_iommu=on"

Reboot the server. On Ubuntu, an edit to /etc/default/grub takes effect only after the GRUB configuration is regenerated (for example, with update-grub), so do that before rebooting.

First BM Server Console

    root@worker1:~# reboot

As a test, run the sleep 100 command in the background.

First BM Server Console

    root@worker1:~# sleep 100 &

Connect to the first DPU OOB interface over SSH and change the password of the ubuntu user (the default password is ubuntu).

First BM Server Console

    root@worker1:~# ssh ubuntu@10.0.110.211

Run the following command to view the Argus log events for the sleep 100 process that runs on the worker host.

First BM Server Console

    ubuntu@dpu-node-mt2402xz0f7x-mt2402xz0f7x:~$ tr -d '\000-\037' < /var/log/doca_argus_activity_report/doca_argus_log_MT2402XZ0F7XMLNXS0D0F0.log | jq 'select(.activity_data.process_details.process_name == "sleep") | .activity_data' -C | less -R

    {
      "name": "process_created",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "process_container_id": ""
      }
    }
    {
      "name": "thread_created",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "thread_details": {
        "thread_id": "2067",
        "thread_self_exec_id": "10",
        "thread_exit_state": "0"
      }
    }
    {
      "name": "new_file_mapped",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "103842736050176",
        "virtual_memory_area_end_address": "103842736066560",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071318016",
        "is_main_process_executable": "1",
        "file_path": "/usr/bin/sleep",
        "file_name": "sleep"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14287898",
        "elf_file_name": "sleep",
        "elf_file_path": "/usr/bin/sleep",
        "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "elf_file_size_bytes": "35336",
        "elf_file_process_executable_state": "1",
        "elf_file_type": "ET_DYN + INTERP segment - Executable file"
      }
    }
    {
      "name": "foreign_binary_executed",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "103842736050176",
        "virtual_memory_area_end_address": "103842736066560",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071318016",
        "is_main_process_executable": "1",
        "file_path": "/usr/bin/sleep",
        "file_name": "sleep"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14287898",
        "elf_file_name": "sleep",
        "elf_file_path": "/usr/bin/sleep",
        "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "elf_file_size_bytes": "35336",
        "elf_file_process_executable_state": "1",
        "elf_file_type": "ET_DYN + INTERP segment - Executable file"
      }
    }
    {
      "name": "new_file_mapped",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "132709628227584",
        "virtual_memory_area_end_address": "132709628403712",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071323648",
        "is_main_process_executable": "0",
        "file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
        "file_name": "ld-linux-x86-64.so.2"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14321201",
        "elf_file_name": "ld-linux-x86-64.so.2",
        "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
        "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
        "elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
        "elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
        "elf_file_size_bytes": "236616",
        "elf_file_process_executable_state": "0",
        "elf_file_type": "ET_DYN - Shared object"
      }
    }
    {
      "name": "foreign_library_loaded",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "132709628227584",
        "virtual_memory_area_end_address": "132709628403712",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071323648",
        "is_main_process_executable": "0",
        "file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
        "file_name": "ld-linux-x86-64.so.2"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14321201",
        "elf_file_name": "ld-linux-x86-64.so.2",
        "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
        "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
        "elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
        "elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
        "elf_file_size_bytes": "236616",
        "elf_file_process_executable_state": "0",
        "elf_file_type": "ET_DYN - Shared object"
      }
    }
    {
      "name": "new_file_mapped",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "132709624217600",
        "virtual_memory_area_end_address": "132709625823232",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071319808",
        "is_main_process_executable": "0",
        "file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
        "file_name": "libc.so.6"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14321204",
        "elf_file_name": "libc.so.6",
        "elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
        "elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
        "elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
        "elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
        "elf_file_size_bytes": "2125328",
        "elf_file_process_executable_state": "0",
        "elf_file_type": "ET_DYN + INTERP segment - Executable file"
      }
    }
    {
      "name": "foreign_library_loaded",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "process_memory_details": {
        "process_id": "2067",
        "virtual_memory_area_start_address": "132709624217600",
        "virtual_memory_area_end_address": "132709625823232",
        "memory_permissions": "r-x",
        "virtual_memory_area_file_structure": "18393486039071319808",
        "is_main_process_executable": "0",
        "file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
        "file_name": "libc.so.6"
      },
      "process_attestation_details": {
        "elf_file_inode_number": "14321204",
        "elf_file_name": "libc.so.6",
        "elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
        "elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
        "elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
        "elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
        "elf_file_size_bytes": "2125328",
        "elf_file_process_executable_state": "0",
        "elf_file_type": "ET_DYN + INTERP segment - Executable file"
      }
    }
    {
      "name": "process_terminated",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      }
    }
    {
      "name": "thread_terminated",
      "process_details": {
        "process_id": "2067",
        "process_name": "sleep",
        "process_self_exec_id": "10",
        "process_parent_process_id": "2055",
        "process_cpu_clock_cycles": "1139964",
        "process_real_group_id": "0",
        "process_real_user_id": "0",
        "process_command_line_arguments": "sleep 100",
        "process_creation_time_nanoseconds": "977145605",
        "process_state": "RUNNING",
        "process_pid_namespace": "4026531836",
        "process_mount_points_namespace": "4026531841",
        "process_network_namespace": "4026531840",
        "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
        "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
        "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
        "process_file_size_bytes": "35336",
        "process_folder_path": "/usr/bin/",
        "container_id": "",
        "process_container_id": ""
      },
      "thread_details": {
        "thread_id": "2067",
        "thread_self_exec_id": "10",
        "thread_exit_state": "0"
      }
    }
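For a quick summary instead of the full event dump, the same filtering can be done in a short script. This is a sketch under stated assumptions: it mirrors the tr and jq pipeline above by dropping ASCII control characters and then decoding the remaining text as a sequence of concatenated JSON objects, each with an activity_data field as implied by the jq filter. The function names and the sample records are hypothetical and heavily abbreviated; they are not real Argus output.

```python
import json
from collections import Counter


def iter_argus_events(raw: bytes):
    """Yield JSON objects from a log whose records may be split by control characters."""
    # Drop bytes 0x00-0x1F, as `tr -d '\000-\037'` does in the command above.
    text = bytes(b for b in raw if b >= 0x20).decode("utf-8", errors="replace")
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        record, pos = decoder.raw_decode(text, pos)
        yield record


def events_for_process(raw: bytes, process_name: str) -> Counter:
    """Count activity names reported for one process name."""
    counts = Counter()
    for record in iter_argus_events(raw):
        if not isinstance(record, dict):
            continue
        activity = record.get("activity_data", {})
        if activity.get("process_details", {}).get("process_name") == process_name:
            counts[activity.get("name")] += 1
    return counts


# Synthetic, abbreviated records for illustration only.
sample = (
    b'{"activity_data": {"name": "process_created", "process_details": {"process_name": "sleep"}}}\n\x00'
    b'{"activity_data": {"name": "process_terminated", "process_details": {"process_name": "sleep"}}}\n'
    b'{"activity_data": {"name": "process_created", "process_details": {"process_name": "bash"}}}'
)
print(dict(events_for_process(sample, "sleep")))
```

On the DPU, the raw argument would be the bytes read from the log file under /var/log/doca_argus_activity_report/. For the sleep 100 test above, the summary should list the same event names that appear in the jq output.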

Done.

