RDG for DPF Zero Trust (DPF-ZT) with HBN and Argus DPU Services

Created on Sep 09, 2025

Scope

This Reference Deployment Guide (RDG) provides comprehensive instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Zero-Trust mode. The guide focuses on setting up accelerated Host-Based Networking (HBN) and DOCA Argus services on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.

The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments with Host-Based Networking enabled using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.

Note

  • This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.

  • Although other approaches may exist for implementing similar solutions, this document provides a detailed guide for this specific method.

Abbreviations and Acronyms

| Term | Definition |
|------|------------|
| BFB | BlueField Bootstream |
| BGP | Border Gateway Protocol |
| DOCA | Data Center Infrastructure-on-a-Chip Architecture |
| DPF | DOCA Platform Framework |
| DPU | Data Processing Unit |
| HBN | Host-Based Networking |
| IPAM | IP Address Management |
| K8S | Kubernetes |
| KVM | Kernel-based Virtual Machine |
| MAAS | Metal as a Service |
| MTU | Maximum Transmission Unit |
| NFS | Network File System |
| NGC | NVIDIA GPU Cloud |
| OOB | Out-of-Band |
| PF | Physical Function |
| RDG | Reference Deployment Guide |
| RDMA | Remote Direct Memory Access |
| RoCE | RDMA over Converged Ethernet |
| SFC | Service Function Chaining |
| SR-IOV | Single Root Input/Output Virtualization |
| VLAN | Virtual LAN (Local Area Network) |
| VNI | VXLAN Network Identifier |
| VRF | Virtual Routing and Forwarding |
| ZT | Zero Trust |

Introduction

The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.

NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.

One such service is Host-Based Networking (HBN) - a DOCA-enabled solution that allows network architects to design networks based on Layer 3 (L3) protocols. HBN enables routing on the server side by using BlueField as a BGP router. It encapsulates key networking functions in a containerized service pod, deployed directly on the BlueField’s Arm cores.

Another such service is DOCA Argus, which provides workload threat detection: a novel approach to detecting container threats in AI workloads and microservices that uses a BlueField DPU to perform live machine introspection at the hardware level. This approach analyzes specific snippets of volatile memory to provide real-time visibility into container activity and behavior at the network, host, and application levels.

The state of container node images is continuously monitored in real time and checked for deviations from their secure, compliant versions and configurations in order to detect and stop runtime attacks. These insights also include the ability to identify attacks that target network-facing applications and services.

The Argus service provides events and data on any object in the OS (host or VM) without requiring any configuration or any active participation from the user or the host.

Examples of what the Argus service reports:

  • Any new process, with its PID, name, attributes, and status.
  • Reverse shells with process and network connection details such as source & destination IP and number of transferred bytes.
  • SHA256 hash of running executable and loaded libraries.

However, deploying and managing DPUs and their associated DOCA services, especially at scale, presents operational challenges. Without a robust provisioning and orchestration system, tasks such as lifecycle management, service deployment, and network configuration for service function chaining (SFC) can quickly become complex and error-prone. This is where the DOCA Platform Framework (DPF) comes into play.

DPF automates the full DPU lifecycle, streamlines the deployment of DOCA services, and simplifies advanced network configurations. With DPF, services such as HBN can be deployed seamlessly, allowing for efficient offloading and intelligent routing of traffic through the DPU data plane.

By leveraging DPF, users can scale and automate DPU management across Bare Metal, Virtual, and Kubernetes customer environments - optimizing performance while simplifying operations.

DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:

  • The DPU is managed through its Baseboard Management Controller (BMC)
  • All management traffic occurs over the DPU's out-of-band (OOB) network
  • The host is treated as an untrusted entity with respect to the data center network. The DPU acts as a barrier between the host and the network.
  • The host sees the DPU as a standard NIC, with no access to the internal DPU management plane (Zero Trust Mode)

This Reference Deployment Guide (RDG) provides a step-by-step example for installing DPF in Zero-Trust mode with the HBN and DOCA Argus services. It also includes practical demonstrations of performance optimization, validated using standard RDMA and TCP workloads.

As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide includes the full end-to-end deployment process, including:

  • Infrastructure provisioning
  • DPF deployment
  • DPU provisioning (Redfish)
  • Service configuration and deployment
  • Service chaining

References

    Solution Architecture

    Key Components and Technologies

    • NVIDIA BlueField® Data Processing Unit (DPU)

      The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.

    • NVIDIA DOCA Software Framework

      NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.

    • NVIDIA ConnectX SmartNICs

      10/25/40/50/100/200 and 400G Ethernet Network Adapters

      The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.

      NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

    • NVIDIA LinkX Cables

      The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, and artificial intelligence data center applications.

    • NVIDIA Spectrum Ethernet Switches

      Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.

      Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.

      NVIDIA combines the benefits of NVIDIA Spectrum switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.

    • NVIDIA Cumulus Linux

      NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.

    • Kubernetes

      Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

    • Kubespray

      Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:

      • A highly available cluster
      • Composable attributes
      • Support for most popular Linux distributions

    Solution Design

    Solution Logical Design

    The logical design includes the following components:

    • 1 x Hypervisor node (KVM-based) with ConnectX-7:

      • 1 x Firewall VM
      • 1 x Jump Node VM
      • 1 x MaaS VM
      • 3 x K8s Master VMs running all K8s management components
    • 2 x Worker nodes (PCI Gen5), each with 1 x BlueField-3 NIC
    • Single High-Speed (HS) switch
    • 1 Gb Host Management network

    [Figure: Solution logical design]

    HBN service Logical Design

    As part of this RDG, we will:

    • Create two isolated networks on each bare-metal workload server using physical functions PF0 and PF1

      • Each network connects through the HBN service to a separate VLAN/VNI, on separate VRFs (RED and BLUE)
    • Route traffic through the HBN service
    • Assign PFs to each bare-metal workload server as its network interfaces
    • Demonstrate accelerated RDMA and TCP traffic between two bare-metal workload servers within the same network (e.g., the RED network)
    • Validate network isolation between bare-metal workload servers connected to different networks (RED vs. BLUE)

    [Figure: HBN service logical design]

    Firewall Design

    The pfSense firewall in this solution serves a dual purpose:

    • Firewall—provides an isolated environment for the DPF system, ensuring secure operations
    • Router—enables Internet access for the management network

    Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.
    The following diagram illustrates the firewall design used in this solution:

    [Figure: Firewall design]

    Software Stack Components

    [Figure: Software stack components and versions]

    Warning

    Make sure to use the exact same versions for the software stack as described above.

    Bill of Materials

    [Figure: Bill of materials]

    Deployment and Configuration

    Node and Switch Definitions

    These are the definitions and parameters used for deploying the demonstrated fabric:

    Switch Port Usage

    | Hostname | Rack ID | Ports |
    |----------|---------|-------|
    | mgmt-switch | 1 | swp1-3 |
    | hs-switch | 1 | swp1-5 |

    Hosts

    | Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway |
    |------|-------------|-------------|-------------|-------------|-----------------|
    | Rack1 | Hypervisor Node | hypervisor | mgmt-switch: swp1; hs-switch: swp1 | lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface enp1s0): - | Trusted LAN GW |
    | Rack1 | Firewall (Virtual) | fw | - | WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 10.0.123.254/22 | Trusted LAN GW |
    | Rack1 | Jump Node (Virtual) | jump | - | enp1s0: 10.0.110.253/24 | 10.0.110.254 |
    | Rack1 | MaaS (Virtual) | maas | - | enp1s0: 10.0.110.252/24 | 10.0.110.254 |
    | Rack1 | Master Node (Virtual) | master1 | - | enp1s0: 10.0.110.1/24 | 10.0.110.254 |
    | Rack1 | Master Node (Virtual) | master2 | - | enp1s0: 10.0.110.2/24 | 10.0.110.254 |
    | Rack1 | Master Node (Virtual) | master3 | - | enp1s0: 10.0.110.3/24 | 10.0.110.254 |
    | Rack1 | Worker Node | worker1 | mgmt-switch: swp2 (DPU OOB); hs-switch: swp2-swp3 | dpubmc: 10.0.110.21/24; ens1f0np0/ens1f1np1: 10.0.120.0/22 | 10.0.110.254 |
    | Rack1 | Worker Node | worker2 | mgmt-switch: swp3 (DPU OOB); hs-switch: swp4-swp5 | dpubmc: 10.0.110.22/24; ens1f0np0/ens1f1np1: 10.0.120.0/22 | 10.0.110.254 |

    Wiring

    Hypervisor Node

    [Figure: Hypervisor node wiring]

    Bare Metal Worker Node

    [Figure: Bare-metal worker node wiring]

    Fabric Configuration

    Updating Cumulus Linux

    As a best practice, make sure to use the latest released Cumulus Linux NOS version.

    For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.

    Configuring the Cumulus Linux Switch

    The SN3700 switch (hs-switch) is configured as follows:

    SN3700 Switch Console

nv set evpn enable on
nv set interface eth0 ip address dhcp
nv set interface eth0 ip vrf mgmt
nv set interface eth0 type eth
nv set interface lo ip address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1-5 link state up
nv set interface swp1-5 type swp
nv set interface swp1 ip address 10.0.123.253/22
nv set nve vxlan enable on
nv set router bgp autonomous-system 65001
nv set router bgp enable on
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set vrf default router bgp address-family ipv4-unicast enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute connected enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute static enable on
nv set vrf default router bgp address-family ipv6-unicast enable on
nv set vrf default router bgp address-family ipv6-unicast redistribute connected enable on
nv set vrf default router bgp address-family l2vpn-evpn enable on
nv set vrf default router bgp enable on
nv set vrf default router bgp neighbor swp2 peer-group hbn
nv set vrf default router bgp neighbor swp2 type unnumbered
nv set vrf default router bgp neighbor swp3 peer-group hbn
nv set vrf default router bgp neighbor swp3 type unnumbered
nv set vrf default router bgp neighbor swp4 peer-group hbn
nv set vrf default router bgp neighbor swp4 type unnumbered
nv set vrf default router bgp neighbor swp5 peer-group hbn
nv set vrf default router bgp neighbor swp5 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore on
nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn enable on
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
nv set vrf default router static 0.0.0.0/0 via 10.0.123.254 type ipv4-address
nv config apply -y
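
The per-port BGP neighbor lines in the hs-switch configuration above follow a fixed pattern: two commands for each DPU-facing port. As a minimal sketch, assuming the same `hbn` peer group and contiguous port numbers, the helper below (the name `hbn_neighbor_cmds` is hypothetical) prints those commands for a port range so they can be reviewed before being pasted into the switch console. It only generates text and does not touch the switch.

```shell
#!/usr/bin/env bash
# Print the "neighbor" commands for ports swp<first>..swp<last>.
# Assumption: every port in the range faces a DPU uplink and joins the
# "hbn" peer group, as in the configuration shown above.
hbn_neighbor_cmds() {
  local first=$1 last=$2 port
  port=$first
  while [ "$port" -le "$last" ]; do
    echo "nv set vrf default router bgp neighbor swp${port} peer-group hbn"
    echo "nv set vrf default router bgp neighbor swp${port} type unnumbered"
    port=$((port + 1))
  done
}

# Two worker nodes with two uplinks each use swp2-swp5 in this guide.
hbn_neighbor_cmds 2 5
```

Adding another worker node would, under the same assumptions, only change the upper bound of the range and the `swp1-5` interface ranges earlier in the listing.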

    The SN2201 switch (mgmt-switch) is configured as follows:

    SN2201 Switch Console

nv set interface swp1-3 link state up
nv set interface swp1-3 type swp
nv set interface swp1-3 bridge domain br_default
nv set bridge domain br_default untagged 1
nv config apply
nv config save -y

    Installation and Configuration

    Warning

    Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.

    All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.

    Make sure that you have DPU BMC and OOB MAC addresses.

    Use this Reference Deployment Guide (RDG) for:

    • Host Configuration
    • K8s Cluster Deployment and Configuration
    • DPF Installation

    Firewall VM – Bare Metal Outside Connection (Optional)

    To give the bare-metal hosts outside connectivity over the high-speed network, open the Firefox web browser and go to the pfSense web UI (http://10.0.110.254).

    • System:

      • Routing → Gateways → Add → “Interface”: OPT1, “Address Family”: IPv4, “Name”: switch, “Gateway”: 10.0.123.253 → Click "Save"→ Under "Default Gateway" - "Default gateway IPv4" choose WAN_DHCP → Click "Save"

        [Figure: pfSense gateway configuration]

        Note

        Note that the IP addresses from the Trusted LAN network under "Gateway" and "Monitor IP" are blurred.

        [Figure: pfSense default gateway selection]
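
The gateway added on OPT1 (10.0.123.253, the address of swp1 on hs-switch) is only reachable if it sits in the same subnet as the OPT1 interface (10.0.123.254/22), which is also the 10.0.120.0/22 range listed for the worker high-speed interfaces. The sketch below checks that membership with plain shell arithmetic. The function names `ip_to_int` and `in_cidr` are illustrative and not part of pfSense or DPF.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( ($a << 24) | ($b << 16) | ($c << 8) | $d ))
}

# Succeed if address $1 lies inside the network $2 (CIDR notation).
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Addresses taken from the node definitions and switch configuration above.
for addr in 10.0.123.253 10.0.123.254; do
  if in_cidr "$addr" 10.0.120.0/22; then
    echo "$addr is inside 10.0.120.0/22"
  else
    echo "$addr is outside 10.0.120.0/22"
  fi
done
```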

    DPU Service Installation

    Change the DPUDeployment, DPUServiceConfiguration, DPUServiceTemplate, and other necessary objects.

    Before deploying the objects under the doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn directory, a few adjustments are required.

    Note

    Several environment variables must be set before running the commands in this section:

    $ source manifests/00-env-vars/envvars.env

    1. Change to the directory that contains readme.md, from which all the commands will be run:

      Jump Node Console

      $ cd doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn

    2. Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file:

      Warning

      Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPUCLUSTER_INTERFACE and BMC_ROOT_PASSWORD.

      manifests/00-env-vars/envvars.env

## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10
 
## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not
## allocated by DHCP.
export DPUCLUSTER_VIP=10.0.110.200
 
## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=eno1
 
## IP address to the NFS server used as storage for the BFB.
export NFS_SERVER_IP=10.0.110.253
 
## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca
 
## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca
 
## IP_RANGE_START and IP_RANGE_END
## These define the IP range for DPU discovery via Redfish/BMC interfaces
## Example: If your DPUs have BMC IPs in range 10.0.110.201-240
## export IP_RANGE_START=10.0.110.201
## export IP_RANGE_END=10.0.110.240
 
## Start of DPUDiscovery IpRange
export IP_RANGE_START=10.0.110.201
 
## End of DPUDiscovery IpRange
export IP_RANGE_END=10.0.110.202
 
# The password used for DPU BMC root login, must be the same for all DPUs
# For more information on how to set the BMC root password refer to BlueField DPU Administrator Quick Start Guide. 
export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>
 
## Serial number of DPUs. If you have more than 2 DPUs, you will need to parameterize the system accordingly and expose
## additional variables.
## All serial numbers must be in lowercase.
 
## Serial number of DPU1
export DPU1_SERIAL=mt2402xz0f7x
 
## Serial number of DPU2
export DPU2_SERIAL=mt2402xz0f80
 
## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.7.0
 
## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb"
 
## The repository URL for the Argus container image.
## Usually this is the NVIDIA NGC registry.
export ARGUS_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_argus:1.0.0-doca3.1.0

    3. Export environment variables for the installation:

      Jump Node Console

      $ source manifests/00-env-vars/envvars.env
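
A quick sanity check after sourcing the file can catch an empty or mistyped variable before any manifest is applied. The sketch below is a hypothetical helper (`check_env` is not part of DPF) that uses the variable names from the envvars.env listing above. It reports any that are missing from the environment and flags DPU serial numbers that are not lowercase, as the comments in the file require.

```shell
#!/usr/bin/env bash
# Variable names copied from the envvars.env listing in this guide.
required_vars="TARGETCLUSTER_API_SERVER_HOST DPUCLUSTER_VIP DPUCLUSTER_INTERFACE
NFS_SERVER_IP REGISTRY HELM_REGISTRY_REPO_URL IP_RANGE_START IP_RANGE_END
BMC_ROOT_PASSWORD DPU1_SERIAL DPU2_SERIAL TAG BFB_URL ARGUS_NGC_IMAGE_URL"

check_env() {
  local rc=0 name value lower
  for name in $required_vars; do
    # printenv only sees exported variables, which is what child
    # processes such as envsubst will see as well.
    value=$(printenv "$name" || true)
    if [ -z "$value" ]; then
      echo "missing: $name"
      rc=1
    fi
  done
  for name in DPU1_SERIAL DPU2_SERIAL; do
    value=$(printenv "$name" || true)
    lower=$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')
    if [ "$value" != "$lower" ]; then
      echo "not lowercase: $name"
      rc=1
    fi
  done
  return "$rc"
}

check_env || echo "envvars.env needs attention before continuing"
```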

    4. Use the following YAML to define a BFB resource that downloads the BlueField Bootstream (BFB) to a shared volume:

      manifests/03.1-dpudeployment-installation-pf/bfb.yaml

---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BFB_URL

    5. Run the command to create the BFB:

      Jump Node Console

      $ cat manifests/03.1-dpudeployment-installation-pf/bfb.yaml | envsubst | kubectl apply -f -
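
Because `envsubst` replaces a reference to a variable that is not in the environment with an empty string and reports no error, a missing export can produce a manifest that applies cleanly but is wrong. The sketch below, with the hypothetical helper name `unset_manifest_vars`, lists the `$NAME` and `${NAME}` references in a file that have no value in the current environment. It may be worth running against every manifest in this guide. For example, the HBN service template shown later references `$HBN_NGC_IMAGE_URL`, which does not appear among the exports in the envvars.env listing above.

```shell
#!/usr/bin/env bash
# Print each variable referenced in the file ($NAME or ${NAME}) that is
# empty or absent in the environment envsubst would inherit.
unset_manifest_vars() {
  grep -oE '\$\{?[A-Za-z_][A-Za-z0-9_]*\}?' "$1" \
    | tr -d '${}' | sort -u \
    | while read -r name; do
        [ -n "$(printenv "$name" || true)" ] || echo "$name"
      done
}

# Example invocation (path taken from this guide):
#   unset_manifest_vars manifests/03.1-dpudeployment-installation-pf/bfb.yaml
```

An empty result means every reference in the file will be substituted with a non-empty value.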

    6. Change the DPUFlavor using the following YAML:

      Note

      The settings below configure a DPU in Zero Trust mode, which means DPU management will be blocked from the bare-metal host.

      To deploy in DPU mode, comment out the line containing dpuMode:

      # dpuMode: zero-trust

      manifests/04-dpudeployment-installation/hbn-dpuflavor.yaml

---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: dpf-provisioning-hbn
  namespace: dpf-operator-system
spec:
  dpuMode: zero-trust
  bfcfgParameters:
  - UPDATE_ATF_UEFI=yes
  - UPDATE_DPU_OS=yes
  - WITH_NIC_FW_UPDATE=yes
  configFiles:
  - operation: override
    path: /etc/mellanox/mlnx-bf.conf
    permissions: "0644"
    raw: |
      ALLOW_SHARED_RQ="no"
      IPSEC_FULL_OFFLOAD="no"
      ENABLE_ESWITCH_MULTIPORT="yes"
  - operation: override
    path: /etc/mellanox/mlnx-ovs.conf
    permissions: "0644"
    raw: |
      CREATE_OVS_BRIDGES="no"
      OVS_DOCA="yes"
  - operation: override
    path: /etc/mellanox/mlnx-sf.conf
    permissions: "0644"
    raw: ""
  grub:
    kernelParameters:
    - console=hvc0
    - console=ttyAMA0
    - earlycon=pl011,0x13010000
    - fixrttc
    - net.ifnames=0
    - biosdevname=0
    - iommu.passthrough=1
    - cgroup_no_v1=net_prio,net_cls
    - hugepagesz=2048kB
    - hugepages=8072
  nvconfig:
  - device: '*'
    parameters:
    - PF_BAR2_ENABLE=0
    - PER_PF_NUM_SF=1
    - PF_TOTAL_SF=20
    - PF_SF_BAR_SIZE=10
    - NUM_PF_MSIX_VALID=0
    - PF_NUM_PF_MSIX_VALID=1
    - PF_NUM_PF_MSIX=228
    - INTERNAL_CPU_MODEL=1
    - INTERNAL_CPU_OFFLOAD_ENGINE=0
    - SRIOV_EN=1
    - NUM_OF_VFS=46
    - LAG_RESOURCE_ALLOCATION=1
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }
 
      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-port br-sfc p1
      _ovs-vsctl set Interface p1 type=dpdk
      _ovs-vsctl set Interface p1 mtu_request=9216
      _ovs-vsctl set Port p1 external_ids:dpf-type=physical
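
The `hugepagesz=2048kB` and `hugepages=8072` kernel parameters in the flavor above reserve a fixed pool of DPU memory at boot. As a small worked calculation, the hypothetical helper `hugepage_pool_mib` below converts a page count and page size into MiB, which helps when changing either value or when comparing the reservation against the memory that services request.

```shell
#!/usr/bin/env bash
# Size of a hugepage pool in MiB, given the page count and page size in kB.
hugepage_pool_mib() {
  local pages=$1 page_kb=$2
  echo $(( pages * page_kb / 1024 ))
}

# Values from the DPUFlavor grub kernelParameters above.
hugepage_pool_mib 8072 2048   # prints 16144 (about 15.8 GiB)
```

On a running Linux system, the `HugePages_Total` and `Hugepagesize` fields of `/proc/meminfo` report the values actually in effect.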

    7. Change the dpudeployment.yaml file to reference the DPUFlavor suited for performance:

      manifests/04-dpudeployment-installation/dpudeployment.yaml

---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: hbn-only
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle
    flavor: dpf-provisioning-hbn
    nodeEffect:
      noEffect: true
    dpuSets:
    - nameSuffix: "dpuset1"
      nodeSelector:
        matchLabels:
          feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    doca-hbn:
      serviceTemplate: doca-hbn
      serviceConfiguration: doca-hbn
    argus:
      serviceConfiguration: argus
      serviceTemplate: argus 
  serviceChains:
    switches:
      - ports:
        - serviceInterface:
            matchLabels:
              uplink: p0
        - service:
            name: doca-hbn
            interface: p0_if
      - ports:
        - serviceInterface:
            matchLabels:
              uplink: p1
        - service:
            name: doca-hbn
            interface: p1_if
      - ports:
        - serviceInterface:
            matchLabels:
              interface: pf0hpf
        - service:
            interface: pf0hpf_if
            name: doca-hbn
      - ports:
        - serviceInterface:
            matchLabels:
              interface: pf1hpf
        - service:
            interface: pf1hpf_if
            name: doca-hbn
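
The four `switches` entries in the `serviceChains` section above share one shape: each pairs a DPU-side interface, selected by label, with the HBN interface of the same name plus an `_if` suffix. The sketch below regenerates that section from a list of `label-key:label-value` pairs, which can make the pattern easier to review or extend. The helper name `render_switches` and the list format are assumptions made for illustration. The output is a text fragment only and is not applied to the cluster.

```shell
#!/usr/bin/env bash
# label-key:label-value pairs taken from dpudeployment.yaml above.
chain_ports="uplink:p0 uplink:p1 interface:pf0hpf interface:pf1hpf"

# Print one "ports" entry per pair, wiring it to the doca-hbn service.
render_switches() {
  local entry key value
  for entry in $chain_ports; do
    key=${entry%%:*}
    value=${entry#*:}
    printf '%s\n' \
      "- ports:" \
      "  - serviceInterface:" \
      "      matchLabels:" \
      "        ${key}: ${value}" \
      "  - service:" \
      "      name: doca-hbn" \
      "      interface: ${value}_if"
  done
}

render_switches
```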

    8. Change the rest of the configuration files.

      As explained in the introduction, these files create service chains that connect the two physical functions, PF0 and PF1, to the outer fabric through HBN, providing an EVPN VXLAN overlay, VNI-based isolation, and ECMP redundancy across both DPU uplinks (p0 and p1).

      These are the configuration files:

      • HBN DPUServiceConfiguration and DPUServiceTemplate to deploy HBN workloads to the DPUs.

        manifests/04-dpudeployment-installation/hbn-dpuservicetemplate.yaml

---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.3
      chart: doca-hbn
    values:
      image:
        repository: $HBN_NGC_IMAGE_URL
        tag: 3.1.0-doca3.1.0
      resources:
        memory: 6Gi
        nvidia.com/bf_sf: 4

        manifests/04-dpudeployment-installation/hbn-dpuserviceconfig.yaml

---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  serviceConfiguration:
    serviceDaemonSet:
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
          {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
          {"name": "iprequest", "interface": "ip_pf0hpf", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}},
          {"name": "iprequest", "interface": "ip_pf1hpf", "cni-args": {"poolNames": ["pool2"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
          ]
    helmChart:
      values:
        configuration:
          perDPUValuesYAML: |
            - hostnamePattern: "*"
              values:
                bgp_peer_group: hbn
                vrf1: RED
                vrf2: BLUE
                l2vni1: 10010
                l2vni2: 10020
                l3vni1: 100001
                l3vni2: 100002
            - hostnamePattern: "dpu-node-${DPU1_SERIAL}*"
              values:
                vlan1: 11
                vlan2: 21
                bgp_autonomous_system: 65101
            - hostnamePattern: "dpu-node-${DPU2_SERIAL}*"
              values:
                vlan1: 12
                vlan2: 22
                bgp_autonomous_system: 65201
          startupYAMLJ2: |
            - header:
                model: bluefield
                nvue-api-version: nvue_v1
                rev-id: 1.0
                version: HBN 2.4.0
            - set:
                bridge:
                  domain:
                    br_default:
                      vlan:
                        {{ config.vlan1 }}:
                          vni:
                            {{ config.l2vni1 }}: {}
                        {{ config.vlan2 }}:
                          vni:
                            {{ config.l2vni2 }}: {}
                evpn:
                  enable: on
                  route-advertise: {}
                interface:
                  lo:
                    ip:
                      address:
                        {{ ipaddresses.ip_lo.ip }}/32: {}
                    type: loopback
                  p0_if,p1_if,pf0hpf_if,pf1hpf_if:
                    type: swp
                    link:
                      mtu: 9000
                  pf0hpf_if:
                    bridge:
                      domain:
                        br_default:
                          access: {{ config.vlan1 }}
                  pf1hpf_if:
                    bridge:
                      domain:
                        br_default:
                          access: {{ config.vlan2 }}
                  vlan{{ config.vlan1 }}:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf0hpf.cidr }}: {}
                      vrf: {{ config.vrf1 }}
                    vlan: {{ config.vlan1 }}
                  vlan{{ config.vlan1 }},{{ config.vlan2 }}:
                    type: svi
                  vlan{{ config.vlan2 }}:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf1hpf.cidr }}: {}
                      vrf: {{ config.vrf2 }}
                    vlan: {{ config.vlan2 }}
                nve:
                  vxlan:
                    arp-nd-suppress: on
                    enable: on
                    source:
                      address: {{ ipaddresses.ip_lo.ip }}
                router:
                  bgp:
                    enable: on
                    graceful-restart:
                      mode: full
                vrf:
                  default:
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          l2vpn-evpn:
                            enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        neighbor:
                          p0_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                          p1_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                        path-selection:
                          multipath:
                            aspath-ignore: on
                        peer-group:
                          {{ config.bgp_peer_group }}:
                            address-family:
                              ipv4-unicast:
                                enable: on
                              l2vpn-evpn:
                                enable: on
                            remote-as: external
                        router-id: {{ ipaddresses.ip_lo.ip }}
                  {{ config.vrf1 }}:
                    evpn:
                      enable: on
                      vni:
                        {{ config.l3vni1 }}: {}
                    loopback:
                      ip:
                        address:
                          {{ ipaddresses.ip_lo.ip }}/32: {}
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                            route-export:
                              to-evpn:
                                enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        router-id: {{ ipaddresses.ip_lo.ip }}
                  {{ config.vrf2 }}:
                    evpn:
                      enable: on
                      vni:
                        {{ config.l3vni2 }}: {}
                    loopback:
                      ip:
                        address:
                          {{ ipaddresses.ip_lo.ip }}/32: {}
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                            route-export:
                              to-evpn:
                                enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        router-id: {{ ipaddresses.ip_lo.ip }}
 
  interfaces:
  - name: p0_if
    network: mybrhbn
  - name: p1_if
    network: mybrhbn
  - name: pf0hpf_if
    network: mybrhbn
  - name: pf1hpf_if
    network: mybrhbn

      • Physical Interfaces for physical ports on the DPU.

        manifests/04-dpudeployment-installation/physical-ifaces.yaml

        ---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p1"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p1
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: pf0hpf
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "pf0hpf"
        spec:
          interfaceType: pf
          pf:
            pfID: 0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: pf1hpf
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "pf1hpf"
        spec:
          interfaceType: pf
          pf:
            pfID: 1

      • DPU Service IPAM objects to set up IP Address Management on the DPUCluster.

        manifests/04-dpudeployment-installation/hbn-ipam.yaml

        ---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool1
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.121.0/24"
    gatewayIndex: 2
    prefixSize: 29
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool2
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.122.0/24"
    gatewayIndex: 2
    prefixSize: 29

        manifests/04-dpudeployment-installation/hbn-loopback-ipam.yaml

        ---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: loopback
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "11.0.0.0/24"
    prefixSize: 32
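
The pool1 and pool2 objects above split each /24 network into blocks of prefixSize 29 and place the gateway at gatewayIndex 2 inside each block. The following is a minimal sketch of that arithmetic in plain shell. It is not the IPAM controller's implementation: the helper names (ip_to_int, int_to_ip, block_gateway) are hypothetical, and it assumes blocks are numbered sequentially from the start of the network, which says nothing about which block a given DPU actually receives.

```shell
# Hypothetical helpers for illustration only; not part of DPF or its IPAM service.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Usage: block_gateway <network> <prefixSize> <gatewayIndex> <block number, 0-based>
block_gateway() {
  base=$(ip_to_int "$1")
  size=$(( 1 << (32 - $2) ))          # addresses per block (8 for a /29)
  start=$(( base + $4 * size ))       # first address of the requested block
  echo "$(int_to_ip "$start")/$2 gateway $(int_to_ip $(( start + $3 )))"
}

block_gateway 10.0.121.0 29 2 0   # prints: 10.0.121.0/29 gateway 10.0.121.2
block_gateway 10.0.121.0 29 2 1   # prints: 10.0.121.8/29 gateway 10.0.121.10
```

Under these assumptions, the first two blocks of pool1 yield gateways 10.0.121.2 and 10.0.121.10, and the same offsets apply to pool2 in 10.0.122.0/24.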

        Note

        It is necessary to set several environment variables before running this command.

        $ source export_vars.env

    9. Create the DPUServiceConfiguration.yaml file for the Argus service:

      manifests/03.1-dpudeployment-installation-pf/DPUServiceConfiguration.yaml

      ---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: argus
  namespace: dpf-operator-system
spec:
  deploymentServiceName: argus
  serviceConfiguration:
    helmChart:
      values:
        config:
          isLocalPath: false
        containerImage: $ARGUS_NGC_IMAGE_URL

    10. Create the DPUServiceTemplate.yaml file for the Argus service:

      manifests/03.1-dpudeployment-installation-pf/DPUServiceTemplate.yaml

      ---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: argus
  namespace: dpf-operator-system
spec:
  deploymentServiceName: argus
  helmChart:
    source:
      chart: doca-argus
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.0

    11. Apply all of the YAML files mentioned above using the following command:

      Jump Node Console

      $ cat manifests/03.1-dpudeployment-installation-pf/hbn-dpuflavor.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/dpudeployment.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/hbn-dpuserviceconfig.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/hbn-dpuservicetemplate.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/physical-ifaces.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/hbn-ipam.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/hbn-loopback-ipam.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/DPUServiceConfiguration.yaml | envsubst | kubectl apply -f -
$ cat manifests/03.1-dpudeployment-installation-pf/DPUServiceTemplate.yaml | envsubst | kubectl apply -f -
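
Because envsubst replaces references to unset variables with empty strings, a manifest can be applied with a blank image or repository URL without any error from envsubst itself. The sketch below is a hypothetical pre-flight helper (missing_vars is not part of DPF) that lists variables referenced as $NAME or ${NAME} in the given files but unset or empty in the current shell. It inspects shell variables, so it assumes export_vars.env also exports them for envsubst to see.

```shell
# Hypothetical pre-flight helper: print each variable that is referenced in the
# given files as $NAME or ${NAME} but is unset or empty in the current shell.
missing_vars() {
  grep -ohE '\$\{?[A-Za-z_][A-Za-z0-9_]*' "$@" | tr -d '${' | sort -u |
  while read -r name; do
    eval "value=\${$name:-}"        # look up the variable by name
    [ -n "$value" ] || echo "$name"
  done
}

# Example (run from the directory that contains the manifests folder):
#   source export_vars.env
#   missing_vars manifests/03.1-dpudeployment-installation-pf/*.yaml
```

An empty result means every referenced variable has a value. Any name printed should be set before running the apply commands.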

    12. Verify the DPUService installation by ensuring that:

      • DPUServices are created and reconciled
      • DPUServiceIPAMs are reconciled
      • DPUServiceInterfaces are reconciled
      • DPUServiceChains are reconciled

        Note

        These verification commands may need to be run multiple times to ensure the conditions are met.

        Jump Node Console

        $ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices --all
dpuservice.svc.dpu.nvidia.com/doca-hbn-wb5pg condition met
dpuservice.svc.dpu.nvidia.com/flannel condition met
dpuservice.svc.dpu.nvidia.com/multus condition met
dpuservice.svc.dpu.nvidia.com/nvidia-k8s-ipam condition met
dpuservice.svc.dpu.nvidia.com/ovs-cni condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-controller condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-rbac-and-crds condition met
dpuservice.svc.dpu.nvidia.com/sfc-controller condition met
dpuservice.svc.dpu.nvidia.com/sriov-device-plugin condition met
 
$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met
dpuserviceipam.svc.dpu.nvidia.com/pool2 condition met
 
$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p0-if-vjqn5 condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p1-if-nl8rj condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf0hpf-if-kbfj4 condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf1hpf-if-79zsq condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met
dpuserviceinterface.svc.dpu.nvidia.com/pf0hpf condition met
dpuserviceinterface.svc.dpu.nvidia.com/pf1hpf condition met
 
$ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
dpuservicechain.svc.dpu.nvidia.com/hbn-only-8xrrx condition met
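
Since the verification commands may need several attempts, a small retry wrapper can save re-typing them. The sketch below is a generic shell helper, not a DPF tool; the function name retry and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <max attempts> <delay in seconds> <command> [args...]
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1   # give up after the last attempt
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example:
#   retry 10 30 kubectl wait --for=condition=ApplicationsReconciled \
#     --namespace dpf-operator-system dpuservices --all
```

The wrapper returns success as soon as the wrapped command does, and a non-zero status if every attempt fails.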

    13. To follow the progress of DPU provisioning, run the following command to check its current phase:

      Jump Node Console

      $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'           
Dpu Node Name:                                       dpu-node-mt2402xz0f7x
    Last Transition Time:  2025-09-09T08:01:09Z
    Type:                  Initialized
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-09T08:01:14Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-09T08:01:18Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-09T08:01:19Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-09T08:14:59Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-09T08:18:05Z
    Type:                  Rebooted
  Phase:  Rebooting
  Dpu Node Name:                                       dpu-node-mt2402xz0f80
    Last Transition Time:  2025-09-09T08:01:09Z
    Type:                  Initialized
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-09T08:01:15Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-09T08:01:18Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-09T08:01:19Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-09T08:14:53Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-09T08:17:54Z
    Type:                  Rebooted
  Phase:  Rebooting                                           

    14. Wait for the Rebooted stage, and then manually power cycle the bare-metal host.

      After the DPU is up, run the following command for each DPU worker:

      Jump Node Console

      $ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f7x provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
dpunode.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f7x annotated
 
$ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f80 provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
dpunode.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f80 annotated
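
With more than a couple of DPU workers, the same annotation removal can be looped over the node names. This is a minimal sketch: the function name clear_reboot_annotation and the KUBECTL override variable are hypothetical conveniences, and the node names are passed explicitly rather than discovered.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: remove the external-reboot-required annotation from
# every DPUNode name given as an argument.
clear_reboot_annotation() {
  for node in "$@"; do
    $KUBECTL annotate dpunodes -n dpf-operator-system "$node" \
      provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
  done
}

# Example:
#   clear_reboot_annotation dpu-node-mt2402xz0f7x dpu-node-mt2402xz0f80
```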

      At this point, the DPU workers should be added to the cluster. As they are added to the cluster, the DPUs are provisioned.

      Jump Node Console

      $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'                                                                     setup5-jump: Wed May 21 10:45:44 2025
  Dpu Node Name:                                       dpu-node-mt2402xz0f7x
    Type:       InternalIP
    Type:       Hostname
    Last Transition Time:  2025-09-09T08:01:09Z
    Type:                  Initialized
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-09T08:01:14Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-09T08:01:18Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-09T08:01:19Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-09T08:14:59Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-09T08:30:18Z
    Type:                  Rebooted
    Last Transition Time:  2025-09-09T08:30:18Z
    Type:                  DPUClusterReady
    Last Transition Time:  2025-09-09T08:30:19Z
    Type:                  Ready
  Phase:  Ready
  Dpu Node Name:                                       dpu-node-mt2402xz0f80
    Type:       InternalIP
    Type:       Hostname
    Last Transition Time:  2025-09-09T08:01:09Z
    Type:                  Initialized
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-09T08:01:10Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-09T08:01:15Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-09T08:01:18Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-09T08:01:19Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-09T08:14:53Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-09T08:30:26Z
    Type:                  Rebooted
    Last Transition Time:  2025-09-09T08:30:26Z
    Type:                  DPUClusterReady
    Last Transition Time:  2025-09-09T08:30:26Z
    Type:                  Ready
  Phase:  Ready

    15. Finally, validate that all the different DPU-related objects are now in the Ready state:

      Jump Node Console

      $ kubectl get secrets -n dpu-cplane-tenant1 dpu-cplane-tenant1-admin-kubeconfig -o json | jq -r '.data["admin.conf"]' | base64 --decode > /home/depuser/dpu-cluster.config
 
$ echo "alias ki='KUBECONFIG=/home/depuser/dpu-cluster.config kubectl'" >> ~/.bashrc
$ echo 'alias dpfctl="kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "' >> ~/.bashrc
 
$ dpfctl describe dpudeployments
NAME                                    NAMESPACE            STATUS       REASON    SINCE  MESSAGE
DPFOperatorConfig/dpfoperatorconfig     dpf-operator-system  Ready: True  Success   53m
└─DPUDeployments
  └─DPUDeployment/hbn-only              dpf-operator-system  Ready: True  Success   51s
    ├─DPUServiceChains
    │ └─DPUServiceChain/hbn-only-r5sl9  dpf-operator-system  Ready: True  Success   79m
    ├─DPUServiceInterfaces
    │ └─4 DPUServiceInterfaces...       dpf-operator-system  Ready: True  Success   6m56s  See doca-hbn-p0-if-68chl, doca-hbn-p1-if-dsddt, doca-hbn-pf0hpf-if-sfghw, doca-hbn-pf1hpf-if-rw68k
    ├─DPUSets
    │ └─DPUSet/hbn-only-dpuset1         dpf-operator-system
    │   ├─BFB/bf-bundle                 dpf-operator-system  Ready: True  Ready     85m    File: bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb, DOCA: 3.1.0
    │   └─DPUs
    │     └─2 DPUs...                   dpf-operator-system  Ready: True  DPUReady  50m    See dpu-node-mt2402xz0f7x-mt2402xz0f7x, dpu-node-mt2402xz0f80-mt2402xz0f80
    └─Services
      ├─DPUServiceTemplates
      │ └─DPUServiceTemplate/doca-hbn   dpf-operator-system  Ready: True  Success   79m
      └─DPUServices
        └─DPUService/doca-hbn-stptv     dpf-operator-system  Ready: True  Success   55s
 
$ ki get node -A
NAME                                 STATUS   ROLES    AGE     VERSION
dpu-node-mt2402xz0f7x-mt2402xz0f7x   Ready    <none>   6m19s   v1.33.3
dpu-node-mt2402xz0f80-mt2402xz0f80   Ready    <none>   6m16s   v1.33.3
 
$ kubectl get dpu -A
NAMESPACE             NAME                                 READY   PHASE   AGE
dpf-operator-system   dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Ready   30m
dpf-operator-system   dpu-node-mt2402xz0f80-mt2402xz0f80   True    Ready   30m
 
$ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f7x-mt2402xz0f7x condition met
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f80-mt2402xz0f80 condition met
 
$ ki get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dpf-operator-system dpu-cplane-tenant1-doca-hbn-stptv-ds-5jmmb 2/2 Running 0 7m12s 10.244.6.10 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system dpu-cplane-tenant1-doca-hbn-stptv-ds-v8s22 2/2 Running 6 (3m23s ago) 7m12s 10.244.8.10 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-controller-6cb8f65fc5-7px4n 1/1 Running 0 19h 10.244.6.3 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-2qj4d 1/1 Running 0 55m 10.244.6.5 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-jn8qf 1/1 Running 0 55m 10.244.8.3 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-6lg9q 1/1 Running 1 (86s ago) 55m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system dpu-cplane-tenant1-ovs-cni-arm64-nzgw4 1/1 Running 1 (17m ago) 55m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-m8zgn 1/1 Running 0 55m 10.244.8.4 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system dpu-cplane-tenant1-sfc-controller-node-ds-slbb9 1/1 Running 0 55m 10.244.6.2 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system kube-flannel-ds-dvkhh 1/1 Running 0 55m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system kube-flannel-ds-w4486 1/1 Running 0 55m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system kube-multus-ds-b299s 1/1 Running 0 55m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
dpf-operator-system kube-multus-ds-gvt87 1/1 Running 0 55m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system kube-sriov-device-plugin-96vz5 1/1 Running 0 55m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
dpf-operator-system kube-sriov-device-plugin-jh6tm 1/1 Running 0 55m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
kube-system coredns-796d84c46b-9k8pm 1/1 Running 0 19h 10.244.6.4 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
kube-system coredns-796d84c46b-tg4bh 1/1 Running 0 19h 10.244.8.2 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
kube-system kube-proxy-87mbx 1/1 Running 0 55m 10.0.110.211 dpu-node-mt2402xz0f7x-mt2402xz0f7x <none> <none>
kube-system kube-proxy-cdm5l 1/1 Running 0 55m 10.0.110.212 dpu-node-mt2402xz0f80-mt2402xz0f80 <none> <none>
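
For larger clusters, scanning the DPU list by eye becomes tedious. The sketch below filters the output of kubectl get dpu -A for DPUs that are not yet in the Ready phase. The function name not_ready_dpus is hypothetical, and the filter assumes the column layout shown above (NAMESPACE, NAME, READY, PHASE, AGE).

```shell
# Hypothetical helper: read `kubectl get dpu -A` output on stdin and print the
# NAME of every DPU whose PHASE column (4th field) is not "Ready".
not_ready_dpus() {
  awk 'NR > 1 && $4 != "Ready" { print $2 }'   # NR > 1 skips the header row
}

# Example:
#   kubectl get dpu -A | not_ready_dpus
```

No output means all listed DPUs report the Ready phase.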

    Congratulations, the DPF system with HBN service has been successfully installed!

    Zero-Trust Mode Checking

    The following procedure verifies Zero-Trust Mode on the NVIDIA BlueField DPU from the host server, including installation of the Mellanox Firmware Tools (MFT).

    Note

    Ubuntu 24.04 was installed on the servers.

    1. Navigate to the NVIDIA Downloads Site: Open your web browser and go to the official NVIDIA Mellanox software downloads page.

    2. Select the latest version for your OS.

    3. Transfer and Extract MFT Tools on the Worker 1 BareMetal Host.

      First Pod Console

      root@worker1:~# tar -xvzf /tmp/mft-4.33.0-169-x86_64-deb.tgz

    4. Navigate into the Extracted Directory.

      First Pod Console

      root@worker1:~# cd mft-4.33.0-169-x86_64-deb/

    5. Install the build prerequisites and run the MFT installer.

      First Pod Console

      root@worker1:~# apt-get install gcc make dkms
root@worker1:~# ./install.sh

    6. Start MST (Mellanox Software Tools) Service and Identify DPU Device Name.

      First Pod Console

      root@worker1:~# mst start
 
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
 
root@worker1:~# mst status
 
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
 
MST devices:
------------
/dev/mst/mt41692_pciconf0        - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:2b:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 01

    7. Query the host privilege level to check Zero-Trust Mode.

      First Pod Console

      root@worker1:~# mlxprivhost -d 2b:00.0 q
Host configurations
-------------------
level                         : RESTRICTED
 
Port functions status:
-----------------------
disable_rshim                 : TRUE
disable_tracer                : TRUE
disable_port_owner            : TRUE
disable_counter_rd            : TRUE
 
#Expected Zero-Trust Output.

      This is the most definitive confirmation. level : RESTRICTED means the host is in Zero-Trust Mode, and the TRUE flags confirm individual security restrictions are active.
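
The check described above can be scripted by parsing the query output. The following is a minimal sketch rather than an NVIDIA-provided tool: the function name is_zero_trust is hypothetical, and it assumes the output keeps the "key : value" layout shown above.

```shell
# Hypothetical helper: read `mlxprivhost -d <device> q` output on stdin and
# succeed only if level is RESTRICTED and every disable_* flag is TRUE.
is_zero_trust() {
  awk -F: '
    { key = $1; val = $2; gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val) }
    key == "level"                     { level = val }
    key ~ /^disable_/ && val != "TRUE" { bad++ }
    END { exit ((level == "RESTRICTED" && bad + 0 == 0) ? 0 : 1) }
  '
}

# Example:
#   mlxprivhost -d 2b:00.0 q | is_zero_trust && echo "Zero-Trust restrictions active"
```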

    8. Check Firmware Access with mlxfwmanager:

      First Pod Console

      root@worker1:~# mlxfwmanager -d 2b:00.0 --query
Querying Mellanox devices firmware ...
 
Device #1:
----------
 
  Device Type:      BlueField3
  Part Number:      --
  Description:
  PSID:
  PCI Device Name:  2b:00.0
  Base MAC:         N/A
  Versions:         Current        Available
     FW             --
 
  Status:           Failed to open device

      "Failed to open device" indicates the host is blocked from accessing the DPU for firmware operations, a key aspect of Zero-Trust.

    9. Check Device Configuration with mlxconfig:

      First Pod Console

      root@worker1:~# mlxconfig -d 2b:00.0 q
 
Device #1:
----------
 
Device type:        BlueField3
Name:               900-9D3B6-00CV-A_Ax
Description:        NVIDIA BlueField-3 B3220 P-Series FHHL DPU; 200GbE (default mode) / NDR200 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled
Device:             2b:00.0
 
Configurations:                                          Next Boot
...
        ALLOW_RD_COUNTERS                           True(1)   # No RO, but restricted by mlxprivhost
...
        PORT_OWNER                                  True(1)   # No RO, but restricted by mlxprivhost
...        
        TRACER_ENABLE                               True(1)   # No RO, but restricted by mlxprivhost

      Most configuration parameters are prefixed with RO (Read-Only), which shows that the host cannot modify them and reinforces the DPU's autonomous and secure state. The few parameters without RO that relate to direct host control, such as PORT_OWNER, ALLOW_RD_COUNTERS, and TRACER_ENABLE, may still show True(1) for the DPU's internal capability, but the host cannot enforce them because the mlxprivhost security policy overrides them.

    10. Check Low-Level Hardware Access with ethtool:

      First Pod Console

      root@worker1:~# ethtool -d ens1f0np0
Cannot get register dump: Operation not supported

      This confirms the DPU is preventing deep, low-level hardware access from the host, aligning with Zero-Trust's isolation goals.

    Conclusion

    If the outputs of mlxprivhost, mlxfwmanager, mlxconfig (showing RO flags), and ethtool (showing "Operation not supported") match those shown above, your NVIDIA BlueField DPU is operating in Zero-Trust Mode.

    This means the host has significantly restricted privileges and cannot perform sensitive operations on the DPU, ensuring its security and isolation.

    Infrastructure Bandwidth & Latency Validation

    Verify the deployment and confirm that the DPU system achieves link-speed performance and low latency by running various tests:

    1. Iperf TCP—for bandwidth measurements
    2. RDMA—for bandwidth and latency measurements
    3. Network isolation

    Each test is described in detail. At the end of each test, the achieved performance is displayed.

    Note

    Make sure that the servers are tuned for maximum performance (not covered in this document).

    Performance and Isolation Tests

    Now that the test deployment is running, perform bandwidth and latency performance tests between two bare-metal workload servers.

    Note

    Ubuntu 24.04 was installed on the servers.

    1. Before running the tests, check the Gateway address on each HBN pod:

      Jump Node Console

      $ ki get pods -A -o wide
 NAMESPACE             NAME                                                             READY   STATUS    RESTARTS      AGE   IP             NODE                                 NOMINATED NODE   READINESS GATES
dpf-operator-system   dpu-cplane-tenant1-doca-hbn-stptv-ds-5jmmb                       2/2     Running   0             26m   10.244.6.10    dpu-node-mt2402xz0f7x-mt2402xz0f7x   <none>           <none>
...
dpf-operator-system   dpu-cplane-tenant1-doca-hbn-stptv-ds-v8s22                       2/2     Running   0             26m   10.244.8.10    dpu-node-mt2402xz0f80-mt2402xz0f80   <none>           <none>
...
 
$ ki exec -it -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-stptv-ds-5jmmb -- bash
Defaulted container "doca-hbn" out of: doca-hbn, hbn-sidecar, hbn-init (init)
 
root@dpu-cplane-tenant1-doca-hbn-stptv-ds-5jmmb:/tmp# ip a s
...
9: vlan21@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue master BLUE state UP group default qlen 1000
    link/ether 56:66:51:0f:ca:43 brd ff:ff:ff:ff:ff:ff
    inet 10.0.122.2/29 scope global vlan21
       valid_lft forever preferred_lft forever
    inet6 fe80::5466:51ff:fe0f:ca43/64 scope link
       valid_lft forever preferred_lft forever
...
12: vlan11@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue master RED state UP group default qlen 1000
    link/ether 56:66:51:0f:ca:43 brd ff:ff:ff:ff:ff:ff
    inet 10.0.121.2/29 scope global vlan11
       valid_lft forever preferred_lft forever
    inet6 fe80::5466:51ff:fe0f:ca43/64 scope link
       valid_lft forever preferred_lft forever
...
 
$ exit
 
$  ki exec -it -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-stptv-ds-v8s22  -- bash
Defaulted container "doca-hbn" out of: doca-hbn, hbn-sidecar, hbn-init (init)
 
root@dpu-cplane-tenant1-doca-hbn-qldl6-ds-lvjrx:/tmp# ip a s
...
9: vlan22@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue master BLUE state UP group default qlen 1000
    link/ether 3e:7c:73:43:ac:43 brd ff:ff:ff:ff:ff:ff
    inet 10.0.122.10/29 scope global vlan22
       valid_lft forever preferred_lft forever
    inet6 fe80::3c7c:73ff:fe43:ac43/64 scope link
       valid_lft forever preferred_lft forever
...
 
12: vlan12@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue master RED state UP group default qlen 1000
    link/ether 3e:7c:73:43:ac:43 brd ff:ff:ff:ff:ff:ff
    inet 10.0.121.10/29 scope global vlan12
       valid_lft forever preferred_lft forever
    inet6 fe80::3c7c:73ff:fe43:ac43/64 scope link
       valid_lft forever preferred_lft forever
...
 
$ exit
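
To pull only the gateway addresses out of the ip a s output, the interface names and inet lines can be paired with a short filter. This is a sketch under the assumption that the gateway interfaces are the ones named vlan*, as in the output above; the function name vlan_gateways is hypothetical.

```shell
# Hypothetical helper: read `ip a s` output on stdin and print
# "<interface> <IPv4 address/prefix>" for every interface named vlan*.
vlan_gateways() {
  awk '
    /^[0-9]+: / { ifname = $2; sub(/@.*/, "", ifname); sub(/:$/, "", ifname) }
    $1 == "inet" && ifname ~ /^vlan/ { print ifname, $2 }
  '
}

# Example (from the jump node):
#   ki exec -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-stptv-ds-5jmmb -- ip a s | vlan_gateways
```

The addresses printed for each HBN pod are the ones used as default gateways in the VRF configuration on the matching workload server.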

    2. Connect to the first Workload Server console, install iperf3 and perftest, check the DPU high-speed interfaces, add a route via the HBN gateway, and identify the relevant RDMA devices:

      First Pod Console

      root@worker1:~# apt install iperf3
root@worker1:~# apt install perftest
root@worker1:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:69:e6 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0np0
7: ens1f1np1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:69:e7 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f1np1
...
 
root@worker1:~# ip route add 10.0.123.0/22 via 10.0.121.2
 
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
 
root@worker1:~#  rdma link | grep ens1f0np0
link mlx5_0/1 state DOWN physical_state DISABLED netdev ens1f0np0
 
root@worker1:~#  rdma link | grep ens1f1np1
link mlx5_1/1 state DOWN physical_state DISABLED netdev ens1f1np1

    3. Configure VRFs for the two interfaces on Ubuntu 24.04 using iproute2: assign ens1f0np0 to VRF red and ens1f1np1 to VRF blue.

      Configuration Overview

      Interface   IP Address      Default Gateway   VRF    Routing Table
      ens1f0np0   10.0.121.1/29   10.0.121.2/29     red    1001
      ens1f1np1   10.0.122.1/29   10.0.122.2/29     blue   1002

      First Pod Console

      # Load VRF module
root@worker1:~# modprobe vrf
root@worker1:~# echo vrf | tee -a /etc/modules
 
# Create VRF devices
root@worker1:~# ip link add vrf-red type vrf table 1001
root@worker1:~# ip link add vrf-blue type vrf table 1002
 
# Bring up VRF devices
root@worker1:~# ip link set dev vrf-red up
root@worker1:~# ip link set dev vrf-blue up
 
# Assign interfaces to VRFs
root@worker1:~# ip link set dev ens1f0np0 master vrf-red
root@worker1:~# ip link set dev ens1f1np1 master vrf-blue
 
# Bring up physical interfaces
root@worker1:~# ip link set dev ens1f0np0 up
root@worker1:~# ip link set dev ens1f1np1 up
 
# Assign IP addresses
root@worker1:~# ip addr add 10.0.121.1/29 dev ens1f0np0
root@worker1:~# ip addr add 10.0.122.1/29 dev ens1f1np1
 
# Set default routes per VRF
root@worker1:~# ip route add table 1001 default via 10.0.121.2 dev ens1f0np0
root@worker1:~# ip route add table 1002 default via 10.0.122.2 dev ens1f1np1
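
The same command sequence is repeated for each VRF and on each workload server with different addresses. The sketch below is a hypothetical dry-run generator (vrf_cmds is not an existing tool) that prints the iproute2 commands for one VRF from its parameters so they can be reviewed before being run as root. It groups the commands per VRF rather than per action, and it leaves out the one-time modprobe vrf step.

```shell
# Hypothetical dry-run helper: print (do not run) the iproute2 commands for one VRF.
# Usage: vrf_cmds <vrf device> <table> <interface> <address/prefix> <gateway>
vrf_cmds() {
  cat <<EOF
ip link add $1 type vrf table $2
ip link set dev $1 up
ip link set dev $3 master $1
ip link set dev $3 up
ip addr add $4 dev $3
ip route add table $2 default via $5 dev $3
EOF
}

# Values for the first workload server, taken from the table above.
vrf_cmds vrf-red  1001 ens1f0np0 10.0.121.1/29 10.0.121.2
vrf_cmds vrf-blue 1002 ens1f1np1 10.0.122.1/29 10.0.122.2
```

After review, the printed lines can be piped to a root shell.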

    4. Using another console window, reconnect to the jump node and connect to the second workload server.

      On the server, install iperf3 and perftest, check the DPU high-speed interfaces, add the required route, and identify the relevant RDMA devices:

      Second BM Server Console

      root@worker2:~# apt install iperf3
root@worker2:~# apt install perftest
root@worker2:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:6a:58 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0np0
7: ens1f1np1: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
    link/ether 58:a2:e1:73:6a:59 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f1np1
...
 
root@worker2:~# ip route add 10.0.123.0/22 via 10.0.121.10
 
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
 
 
root@worker2:~# rdma link | grep ens1f0np0
link mlx5_0/1 state DOWN physical_state DISABLED netdev ens1f0np0
 
root@worker2:~# rdma link | grep ens1f1np1
link mlx5_1/1 state DOWN physical_state DISABLED netdev ens1f1np1

    5. Configure VRFs for the two interfaces on Ubuntu 24.04 using iproute2: assign ens1f0np0 to VRF red and ens1f1np1 to VRF blue.

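      Configuration Overview

      | Interface | IP Address    | Default Gateway | VRF  | Routing Table |
      |-----------|---------------|-----------------|------|---------------|
      | ens1f0np0 | 10.0.121.9/29 | 10.0.121.10/29  | red  | 1001          |
      | ens1f1np1 | 10.0.122.9/29 | 10.0.122.10/29  | blue | 1002          |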

    Second BM Server Console

    # Load VRF module
root@worker2:~# modprobe vrf
root@worker2:~# echo vrf | tee -a /etc/modules
 
# Create VRF devices
root@worker2:~# ip link add vrf-red type vrf table 1001
root@worker2:~# ip link add vrf-blue type vrf table 1002
 
# Bring up VRF devices
root@worker2:~# ip link set dev vrf-red up
root@worker2:~# ip link set dev vrf-blue up
 
# Assign interfaces to VRFs
root@worker2:~# ip link set dev ens1f0np0 master vrf-red
root@worker2:~# ip link set dev ens1f1np1 master vrf-blue
 
# Bring up physical interfaces
root@worker2:~# ip link set dev ens1f0np0 up
root@worker2:~# ip link set dev ens1f1np1 up
 
# Assign IP addresses
root@worker2:~# ip addr add 10.0.121.9/29 dev ens1f0np0
root@worker2:~# ip addr add 10.0.122.9/29 dev ens1f1np1
 
# Set default routes per VRF
root@worker2:~# ip route add table 1001 default via 10.0.121.10 dev ens1f0np0
root@worker2:~# ip route add table 1002 default via 10.0.122.10 dev ens1f1np1
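
The addressing in the two configuration tables places worker1 and worker2 in different /29 subnets even within the same VRF (for example 10.0.121.1/29 and 10.0.121.9/29), which is why each server needs a per-VRF default route to its own gateway. The following is a minimal sketch of that subnet check in POSIX shell arithmetic. The helper names ip_to_int and same_subnet are illustrative and are not part of any tool used in this guide.

```shell
# ip_to_int <a.b.c.d>: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet <ip-a> <ip-b> <prefix-length>: succeed if both addresses
# fall inside the same network for the given prefix length.
same_subnet() {
  mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Compare worker1's red-VRF address with its gateway and with worker2.
for peer in 10.0.121.2 10.0.121.9; do
  if same_subnet 10.0.121.1 "$peer" 29; then
    echo "$peer is on-link for 10.0.121.1/29"
  else
    echo "$peer is reached through the gateway"
  fi
done
```

Under these assumptions the gateway 10.0.121.2 is on-link for worker1, while worker2's 10.0.121.9 is not, so traffic between the servers is routed rather than switched.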

    iPerf TCP Bandwidth Test

    Move back to the first server console.

    1. Start the iperf3 server side:

      First BM Server Console

      root@worker1:~# ip vrf exec vrf-red iperf3 -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------

    2. Move to the second server console.

      Start the iperf3 client side:

      Second BM Server Console

      root@worker2:~#  ip vrf exec vrf-red iperf3 -c 10.0.121.1 -P 16
------------------------------------------------------------
Client connecting to 10.0.121.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  9] local 10.0.121.9 port 48620 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/827)
[ 10] local 10.0.121.9 port 48610 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/881)
[  1] local 10.0.121.9 port 48712 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/608)
[ 14] local 10.0.121.9 port 48728 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/722)
[ 11] local 10.0.121.9 port 48710 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/870)
[  4] local 10.0.121.9 port 48622 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/945)
[  7] local 10.0.121.9 port 48690 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/906)
[ 15] local 10.0.121.9 port 48736 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/689)
[  2] local 10.0.121.9 port 48616 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/796)
[  3] local 10.0.121.9 port 48618 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/940)
[ 12] local 10.0.121.9 port 48706 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/892)
[ 16] local 10.0.121.9 port 48696 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/810)
[  8] local 10.0.121.9 port 48626 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/801)
[  6] local 10.0.121.9 port 48692 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/891)
[  5] local 10.0.121.9 port 48624 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/931)
[ 13] local 10.0.121.9 port 48686 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/903)
[ ID] Interval       Transfer     Bandwidth
[  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 13] 0.0000-10.0057 sec  14.2 GBytes  12.2 Gbits/sec
[  7] 0.0000-10.0056 sec  13.4 GBytes  11.5 Gbits/sec
[ 12] 0.0000-10.0057 sec  15.2 GBytes  13.1 Gbits/sec
[  4] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 11] 0.0000-10.0058 sec  15.8 GBytes  13.6 Gbits/sec
[  8] 0.0000-10.0057 sec  13.9 GBytes  11.9 Gbits/sec
[  9] 0.0000-10.0058 sec  13.8 GBytes  11.9 Gbits/sec
[ 15] 0.0000-10.0057 sec  14.3 GBytes  12.3 Gbits/sec
[ 16] 0.0000-10.0058 sec  14.6 GBytes  12.5 Gbits/sec
[  1] 0.0000-10.0057 sec  14.6 GBytes  12.6 Gbits/sec
[  6] 0.0000-10.0058 sec  13.1 GBytes  11.3 Gbits/sec
[ 14] 0.0000-10.0059 sec  13.6 GBytes  11.6 Gbits/sec
[ 10] 0.0000-10.0055 sec  13.5 GBytes  11.6 Gbits/sec
[  2] 0.0000-10.0057 sec  14.0 GBytes  12.0 Gbits/sec
[  5] 0.0000-10.0058 sec  14.6 GBytes  12.6 Gbits/sec
[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec
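
To cross-check the [SUM] line against the individual streams, the per-stream bandwidth values can be added up from a saved copy of the client output. This is a minimal sketch: sum_streams is a hypothetical helper name, and the parsing assumes the report format shown above, with per-stream lines ending in "Gbits/sec".

```shell
# sum_streams: read the client report on stdin, add up the bandwidth of every
# per-stream line (the [SUM] line is skipped), and print the stream count and total.
sum_streams() {
  awk '
    /^\[SUM\]/ { next }
    $NF == "Gbits/sec" { total += $(NF - 1); n++ }
    END { printf "%d streams, %.1f Gbits/sec\n", n, total }'
}

# Example with three of the stream lines from the report above.
printf '%s\n' \
  '[  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec' \
  '[ 13] 0.0000-10.0057 sec  14.2 GBytes  12.2 Gbits/sec' \
  '[  7] 0.0000-10.0056 sec  13.4 GBytes  11.5 Gbits/sec' \
  '[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec' | sum_streams
```

Applied to all 16 stream lines of the run above, the per-stream values add up to 194.9 Gbits/sec, which agrees with the reported 195 Gbits/sec after rounding.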

    RoCE Latency Test

    Return to the first server console.

    1. Start the ib_read_lat server side:

      First BM Server Console

      root@worker1:~# ip vrf exec vrf-red ib_read_lat -F -n 20000 -d mlx5_0
 
************************************
* Waiting for client to connect... *
************************************

    2. Move to the second server console.

      Start the ib_read_lat client side:

    Second BM Server Console

    root@worker2:~# ip vrf exec vrf-red ib_read_lat -F -n 20000 -d mlx5_0 10.0.121.1
 
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF          Device         : mlx5_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0048 PSN 0x77ae88 OUT 0x10 RKey 0x186ded VAddr 0x005fe0b3e3a000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:09
 remote address: LID 0000 QPN 0x0048 PSN 0x51948d OUT 0x10 RKey 0x186ded VAddr 0x00577584a67000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:01
---------------------------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]    t_avg[usec]    t_stdev[usec]   99% percentile[usec]   99.9% percentile[usec]
 2       20000          3.98           65.30        4.08               7.89             7.17            31.51                   36.33
---------------------------------------------------------------------------------------
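
The -d mlx5_0 argument selects the RDMA device behind ens1f0np0, as reported by the rdma link output collected earlier. A small helper can perform that lookup instead of reading the output by eye. This is a sketch only: rdma_dev_for is a hypothetical name, and the parsing assumes the "link <device>/<port> ... netdev <interface>" line format shown in this guide.

```shell
# rdma_dev_for <netdev>: read `rdma link` output on stdin and print the RDMA
# device name (for example mlx5_0) associated with the given network interface.
rdma_dev_for() {
  awk -v dev="$1" '
    $1 == "link" {
      for (i = 2; i < NF; i++)
        if ($i == "netdev" && $(i + 1) == dev) {
          split($2, parts, "/")   # "mlx5_0/1" -> device "mlx5_0", port "1"
          print parts[1]
        }
    }'
}

# Usage on a workload server:
#   rdma link | rdma_dev_for ens1f0np0
```

The result can then be passed to the perftest tools through their -d option.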

    RoCE Bandwidth Test

    Return to the first server console.

    1. Start the ib_write_bw server side:

      First BM Server Console

      root@worker1:~# ip vrf exec vrf-red ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_0
 
************************************
* Waiting for client to connect... *
************************************

    2. Move to the second server console.

      Start the ib_write_bw client side:

      Second BM Server Console

      root@worker2:~# ip vrf exec vrf-red ib_write_bw -s 1048576 -F  -D 30 -q 64 -d mlx5_0 10.0.121.1 --report_gbit
 ---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 64           Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: ON
ibv_wr* API     : ON
TX depth        : 128
CQ Moderation   : 1
Mtu             : 1024[B]
Link type       : Ethernet
GID index       : 3
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
…
---------------------------------------------------------------------------------------
#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
1048576    448865           0.00               235.89             0.028120
---------------------------------------------------------------------------------------
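
The MsgRate column appears to follow directly from the average bandwidth and the message size: bits per second divided by bits per message. The sketch below reproduces that arithmetic for the values reported above. msg_rate_mpps is a hypothetical helper name, and the formula is an assumption based on the reported numbers, not a statement of how perftest computes the column internally.

```shell
# msg_rate_mpps <average_bw_gbit_per_sec> <message_size_bytes>
# Print the message rate in millions of messages per second.
msg_rate_mpps() {
  awk -v bw="$1" -v size="$2" \
    'BEGIN { printf "%.6f\n", bw * 1e9 / (size * 8) / 1e6 }'
}

# 235.89 Gb/sec average with 1 MiB (1048576-byte) messages.
msg_rate_mpps 235.89 1048576   # prints 0.028120
```

With 1 MiB messages the message rate is low by design, so this test exercises link bandwidth rather than per-message overhead.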

    Network Isolation Test

    Finally, verify that servers on different networks (the PF0 network and the PF1 network) cannot communicate with each other.

    Connect to the first workload server and run the following checks against the second server:

    1. Ping from PF0 to PF0 and from PF1 to PF1 on the second server. These pings stay within one network and should succeed:

      First BM Server Console

      root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.121.9
PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
64 bytes from 10.0.121.9: icmp_seq=1 ttl=62 time=0.885 ms
64 bytes from 10.0.121.9: icmp_seq=2 ttl=62 time=0.273 ms
64 bytes from 10.0.121.9: icmp_seq=3 ttl=62 time=0.214 ms
 
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.122.9
PING 10.0.122.9 (10.0.122.9) 56(84) bytes of data.
64 bytes from 10.0.122.9: icmp_seq=1 ttl=62 time=0.911 ms
64 bytes from 10.0.122.9: icmp_seq=2 ttl=62 time=0.278 ms
64 bytes from 10.0.122.9: icmp_seq=3 ttl=62 time=0.257 ms

    2. Run the ping commands from PF0 to PF1 and from PF1 to PF0:

      First BM Server Console

      root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.122.1
PING 10.0.122.1 (10.0.122.1) 56(84) bytes of data.
From 10.0.121.2 icmp_seq=1 Destination Host Unreachable
From 10.0.121.2 icmp_seq=2 Destination Host Unreachable
From 10.0.121.2 icmp_seq=3 Destination Host Unreachable
 
--- 10.0.122.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2037ms
 
root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.122.9
PING 10.0.122.9 (10.0.122.9) 56(84) bytes of data.
^C
--- 10.0.122.9 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2044ms
 
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.121.1
PING 10.0.121.1 (10.0.121.1) 56(84) bytes of data.
From 10.0.122.2 icmp_seq=1 Destination Host Unreachable
From 10.0.122.2 icmp_seq=2 Destination Host Unreachable
From 10.0.122.2 icmp_seq=3 Destination Host Unreachable
 
--- 10.0.121.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2033ms
 
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.121.9
PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
From 10.0.122.2 icmp_seq=1 Destination Host Unreachable
From 10.0.122.2 icmp_seq=2 Destination Host Unreachable
From 10.0.122.2 icmp_seq=3 Destination Host Unreachable
 
--- 10.0.121.9 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2027ms

    These ping operations should fail because HBN isolates the two networks using different VLANs, VNIs, and VRFs.
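
The same checks can be scripted so that each path is compared against its expected outcome. The following is a minimal sketch, assuming the VRF names and addresses used in this guide. check_path and run_isolation_checks are hypothetical helper names, and run_isolation_checks must be run as root on the first workload server.

```shell
# check_path <reachable|isolated> <command...>
# Run the command quietly and compare its result with the expected outcome.
check_path() {
  expected=$1
  shift
  if "$@" >/dev/null 2>&1; then actual=reachable; else actual=isolated; fi
  if [ "$actual" = "$expected" ]; then
    echo "PASS ($actual): $*"
  else
    echo "FAIL (expected $expected, got $actual): $*"
    return 1
  fi
}

# Same-network paths must answer; cross-network paths must not.
run_isolation_checks() {
  rc=0
  check_path reachable ip vrf exec vrf-red  ping -c 3 -W 1 10.0.121.9 || rc=1
  check_path reachable ip vrf exec vrf-blue ping -c 3 -W 1 10.0.122.9 || rc=1
  check_path isolated  ip vrf exec vrf-red  ping -c 3 -W 1 10.0.122.9 || rc=1
  check_path isolated  ip vrf exec vrf-blue ping -c 3 -W 1 10.0.121.9 || rc=1
  return $rc
}
```

Calling run_isolation_checks prints one PASS or FAIL line per path and returns a non-zero status if any path behaves differently from what the isolation policy requires.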

    Argus Service Verification

    Here's a step-by-step procedure to check the DOCA Argus service on your NVIDIA BlueField DPU.

    Note

    Ubuntu 24.04 was installed on the servers.

    1. Open the first worker server console.

      First BM Server Console

      $ ssh worker1

    2. Add the IOMMU configuration to the /etc/default/grub file:

      First BM Server Console

      root@worker1:~# vim /etc/default/grub
 
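## Add iommu.passthrough=1 intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT parameter
 
GRUB_CMDLINE_LINUX_DEFAULT="iommu.passthrough=1 intel_iommu=on"
 
## Regenerate the GRUB configuration so the change takes effect at the next boot
root@worker1:~# update-grub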

    3. Reboot the server.

      First BM Server Console

      root@worker1:~# reboot

    4. After the server reboots, reconnect to it and start a test process for Argus to report. This example runs sleep 100 in the background:

      First BM Server Console

      root@worker1:~# sleep 100&

    5. Connect to the first DPU OOB interface over SSH and change the ubuntu user's password (the default password is ubuntu).

      First BM Server Console

      root@worker1:~# ssh ubuntu@10.0.110.211

    6. Run the following command to view the Argus log events for the sleep 100 process running on the worker host.

      First DPU Console

      ubuntu@dpu-node-mt2402xz0f7x-mt2402xz0f7x:~$ jq 'select(.activity_data.process_details.process_name == "sleep") | .activity_data' /var/log/doca_argus_activity_report/doca_argus_log_MT2402XZ0F7XMLNXS0D0F0.log -C | less -R
 
 
{
  "name": "process_created",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "process_container_id": ""
  }
}
 
{
  "name": "thread_created",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "thread_details": {
    "thread_id": "2067",
    "thread_self_exec_id": "10",
    "thread_exit_state": "0"
  }
}
 
{
  "name": "new_file_mapped",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "103842736050176",
    "virtual_memory_area_end_address": "103842736066560",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071318016",
    "is_main_process_executable": "1",
    "file_path": "/usr/bin/sleep",
    "file_name": "sleep"
  },
 
  "process_attestation_details": {
    "elf_file_inode_number": "14287898",
    "elf_file_name": "sleep",
    "elf_file_path": "/usr/bin/sleep",
    "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "elf_file_size_bytes": "35336",
    "elf_file_process_executable_state": "1",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "foreign_binary_executed",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "103842736050176",
    "virtual_memory_area_end_address": "103842736066560",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071318016",
    "is_main_process_executable": "1",
    "file_path": "/usr/bin/sleep",
    "file_name": "sleep"
  },
 
 "process_attestation_details": {
    "elf_file_inode_number": "14287898",
    "elf_file_name": "sleep",
    "elf_file_path": "/usr/bin/sleep",
    "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "elf_file_size_bytes": "35336",
    "elf_file_process_executable_state": "1",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "new_file_mapped",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "132709628227584",
    "virtual_memory_area_end_address": "132709628403712",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071323648",
    "is_main_process_executable": "0",
    "file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "file_name": "ld-linux-x86-64.so.2"
 
  },
  "process_attestation_details": {
    "elf_file_inode_number": "14321201",
    "elf_file_name": "ld-linux-x86-64.so.2",
    "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
    "elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
    "elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
    "elf_file_size_bytes": "236616",
    "elf_file_process_executable_state": "0",
    "elf_file_type": "ET_DYN - Shared object"
  }
}
{
  "name": "foreign_library_loaded",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "132709628227584",
    "virtual_memory_area_end_address": "132709628403712",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071323648",
    "is_main_process_executable": "0",
    "file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "file_name": "ld-linux-x86-64.so.2"
  },
 
  "process_attestation_details": {
    "elf_file_inode_number": "14321201",
    "elf_file_name": "ld-linux-x86-64.so.2",
    "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
    "elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
    "elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
    "elf_file_size_bytes": "236616",
    "elf_file_process_executable_state": "0",
    "elf_file_type": "ET_DYN - Shared object"
  }
}
{
  "name": "new_file_mapped",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "132709624217600",
    "virtual_memory_area_end_address": "132709625823232",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071319808",
    "is_main_process_executable": "0",
    "file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
    "file_name": "libc.so.6"
  },
 
  "process_attestation_details": {
    "elf_file_inode_number": "14321204",
    "elf_file_name": "libc.so.6",
    "elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
    "elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
    "elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
    "elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
    "elf_file_size_bytes": "2125328",
    "elf_file_process_executable_state": "0",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "foreign_library_loaded",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "132709624217600",
    "virtual_memory_area_end_address": "132709625823232",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071319808",
    "is_main_process_executable": "0",
    "file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
    "file_name": "libc.so.6"
  },
 
  "process_attestation_details": {
    "elf_file_inode_number": "14321204",
    "elf_file_name": "libc.so.6",
    "elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
    "elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
    "elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
    "elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
    "elf_file_size_bytes": "2125328",
    "elf_file_process_executable_state": "0",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "process_terminated",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  }
}
 
{
  "name": "thread_terminated",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "thread_details": {
    "thread_id": "2067",
    "thread_self_exec_id": "10",
    "thread_exit_state": "0"
  }
}
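
Because the full event records are long, a per-event count can be easier to scan than the raw output. The sketch below reuses the field paths from the jq filter above (.activity_data.process_details.process_name and .activity_data.name). argus_event_counts is a hypothetical helper name, and the log path in the usage comment is the one from this guide's example DPU.

```shell
# argus_event_counts <process_name>: read an Argus activity log on stdin and
# print each event name logged for that process, followed by its count.
argus_event_counts() {
  jq -r --arg proc "$1" '
    select(.activity_data.process_details.process_name == $proc)
    | .activity_data.name' |
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the DPU:
#   argus_event_counts sleep < /var/log/doca_argus_activity_report/doca_argus_log_MT2402XZ0F7XMLNXS0D0F0.log
```

Each output line has the form "<event name> <count>", so a missing process_created or process_terminated event for the test process is immediately visible.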

    Done.

    Authors


    Boris Kovalev

    Boris Kovalev has worked for the past several years as a Solutions Architect, focusing on NVIDIA Networking/Mellanox technology, and is responsible for complex machine learning, Big Data and advanced VMware-based cloud research and design. Boris previously spent more than 20 years as a senior consultant and solutions architect at multiple companies, most recently at VMware. He has written multiple reference designs covering VMware, machine learning, Kubernetes, and container solutions which are available at the NVIDIA Documents website.

    NVIDIA, the NVIDIA logo, and BlueField are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

    © 2025 NVIDIA Corporation. All rights reserved.

    Notice

    This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.
    Last updated on Sep 16, 2025.