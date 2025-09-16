On This Page
- Scope
- Abbreviations and Acronyms
- Introduction
- References
- Solution Architecture
- Deployment and Configuration
- Node and Switch Definitions
- Wiring
- Fabric Configuration
- Installation and Configuration
- DPU Service Installation
- Zero-Trust Mode Checking
- Infrastructure Bandwidth & Latency Validation
- Argus Service Verification
- Authors
RDG for DPF Zero Trust (DPF-ZT) with VPC OVN and Argus DPU services
Created on Sep 15, 2025
Scope
This Reference Deployment Guide (RDG) provides comprehensive instructions for deploying the NVIDIA DOCA Platform Framework (DPF) with the DOCA VPC (Virtual Private Cloud) OVN (Open Virtual Network) and DOCA Argus services on high-performance, bare-metal infrastructure in Zero-Trust mode. It focuses on the setup and use of DPU-based services on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.
The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.
This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.
Although other approaches may exist for implementing similar solutions, this document provides a detailed guide for this specific method.
Abbreviations and Acronyms
| Term | Definition | Term | Definition |
|---|---|---|---|
| BFB | BlueField Bootstream | NFS | Network File System |
| DOCA | Data Center Infrastructure-on-a-Chip Architecture | OOB | Out-of-Band |
| DPF | DOCA Platform Framework | OVN | Open Virtual Network |
| DPU | Data Processing Unit | PF | Physical Function |
| K8S | Kubernetes | RDG | Reference Deployment Guide |
| KVM | Kernel-based Virtual Machine | RDMA | Remote Direct Memory Access |
| MAAS | Metal as a Service | RoCE | RDMA over Converged Ethernet |
| MTU | Maximum Transmission Unit | VPC | Virtual Private Cloud |
| NGC | NVIDIA GPU Cloud | ZT | Zero Trust |
Introduction
The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.
NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.
One such service is the DOCA VPC OVN Service, which provides accelerated VPC networking functionality for DPF. Built on top of OVN, this service enables network isolation, virtualization, and advanced SDN capabilities directly on NVIDIA DPUs.
Key Features:
- Multi-tenant Network Isolation: Create isolated VPCs for different tenants with guaranteed network separation.
- Virtual Network Management: Support the creation of virtual networks with DHCP and custom IP addressing.
- External Connectivity: Configurable external routing with NAT/masquerading capabilities.
- Hardware Acceleration: Leverages DPU hardware acceleration for high-performance networking.
- Flexible Topology: Support for complex network topologies with inter-network routing controls.
- Kubernetes Integration: Native Kubernetes resources for declarative VPC management.
Another service is the DOCA Argus Service, which provides workload threat detection: a novel approach to container threat detection in AI workloads and microservices that uses a BlueField DPU to perform live machine introspection at the hardware level. This approach analyzes specific snippets of volatile memory to provide real-time visibility into container activity and behavior at the network, host, and application levels.
The state of container node images is continuously monitored in real time for deviations from their secure, compliant versions and configurations, so that runtime attacks can be detected and stopped. These insights also include the ability to identify attacks that target network-facing applications and services.
The Argus service provides events and data on any object in the OS (host or VM) without requiring any configuration and without any active participation from the user or the host.
Examples of what the Argus service provides:
- Any new process, with its PID, name, attributes, and status.
- Reverse shells, with process and network connection details such as source and destination IPs and the number of transferred bytes.
- SHA256 hashes of running executables and loaded libraries.
However, deploying and managing DPUs, especially at scale, presents operational challenges. Without a robust provisioning and orchestration system, tasks such as lifecycle management, service deployment, and network configuration for service function chaining (SFC) can quickly become complex and error prone. This is where the DOCA Platform Framework (DPF) comes into play.
DPF automates the full DPU lifecycle and simplifies advanced network configurations. With DPF, services can be deployed seamlessly, allowing for efficient offloading and intelligent routing of traffic through the DPU data plane.
By leveraging DPF, users can scale and automate DPU management across Bare Metal, Virtual, and Kubernetes customer environments - optimizing performance while simplifying operations.
DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:
- The DPU is managed through its Baseboard Management Controller (BMC)
- All management traffic occurs over the DPU's out-of-band (OOB) network
- The host is considered an untrusted entity with respect to the data center network. The DPU acts as a barrier between the host and the network.
- The host sees the DPU as a standard NIC, with no access to the internal DPU management plane (Zero Trust Mode)
This Reference Deployment Guide (RDG) provides a step-by-step example for installing DPF in Zero-Trust mode. It also includes practical demonstrations of performance optimization, validated using standard RDMA and TCP workloads.
As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide includes the full end-to-end deployment process, including:
- Infrastructure provisioning
- DPF deployment
- DPU provisioning (redfish)
- Service configuration and deployment
- Service chaining.
References
- NVIDIA BlueField DPU
- NVIDIA DOCA
- NVIDIA DPF Release Notes
- NVIDIA DPF GitHub Repository
- NVIDIA DPF System Overview
- NVIDIA Ethernet Switching
- NVIDIA Cumulus Linux
- What is K8s?
- Kubespray
Solution Architecture
Key Components and Technologies
NVIDIA BlueField® Data Processing Unit (DPU)
The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.
NVIDIA DOCA Software Framework
NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.
10/25/40/50/100/200 and 400G Ethernet Network Adapters
The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.
NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.
The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, artificial intelligence, and data center applications.
NVIDIA Spectrum Ethernet Switches
Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.
NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.
Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:
- A highly available cluster
- Composable attributes
- Support for most popular Linux distributions
Solution Design
Solution Logical Design
The logical design includes the following components:
- 1 x Hypervisor node (KVM-based) with ConnectX-7, hosting:
  - 1 x Firewall VM
  - 1 x Jump Node VM
  - 1 x MaaS VM
  - 3 x K8s Master VMs running all K8s management components
- 4 x Worker nodes (PCIe Gen5), each with 1 x BlueField-3 NIC
- Single High-Speed (HS) switch
- 1 Gb Host Management network
VPC service Logical Design
As part of this RDG, we will:
- Deploy VPC OVN over a simple bridged network, using a single high-speed uplink on each worker node
- Create two isolated VPCs for the pairs of bare-metal workload servers (Worker1/2 and Worker3/4) using virtual functions (VFs)
- Connect each network through the VPC OVN service to a separate VPC: RED and BLUE
- Route traffic through the VPC OVN service
- Assign a VF to each bare-metal workload server as its network interface
- Demonstrate accelerated RDMA and TCP traffic between two workload servers that run on different bare-metal servers within the same VPC network (e.g., the RED network)
- Validate network isolation between bare-metal workload servers connected to different VPC networks (RED vs. BLUE)
Firewall Design
The pfSense firewall in this solution serves a dual purpose:
- Firewall: provides an isolated environment for the DPF system, ensuring secure operations
- Router: enables Internet access for the management network
Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.
The following diagram illustrates the firewall design used in this solution:
Software Stack Components
Make sure to use the exact same versions for the software stack as described above.
Bill of Materials
Deployment and Configuration
Node and Switch Definitions
These are the definitions and parameters used for deploying the demonstrated fabric:
Switches Ports Usage
| Hostname | Rack ID | Ports |
|---|---|---|
| mgmt-switch | 1 | swp1-5 |
| hs-switch | 1 | swp1-5 |
Hosts
| Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway |
|---|---|---|---|---|---|
| Rack1 | Hypervisor Node | | mgmt-switch:, hs-switch: | lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface enp1s0): - | Trusted LAN GW |
| Rack1 | Firewall (Virtual) | | - | WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 10.0.123.254/22 | Trusted LAN GW |
| Rack1 | Jump Node (Virtual) | | - | enp1s0: 10.0.110.253/24 | 10.0.110.254 |
| Rack1 | MaaS (Virtual) | | - | enp1s0: 10.0.110.252/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.1/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.2/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.3/24 | 10.0.110.254 |
| Rack1 | Worker Node | | mgmt-switch:, hs-switch: | dpubmc: 10.0.110.21/24; ens1f0v2: DHCP | 10.0.110.254; 10.0.123.254 |
| Rack1 | Worker Node | | mgmt-switch:, hs-switch: | dpubmc: 10.0.110.22/24; ens1f0v2: DHCP | 10.0.110.254; 10.0.123.254 |
| Rack1 | Worker Node | | mgmt-switch:, hs-switch: | dpubmc: 10.0.110.23/24; ens1f0v2: DHCP | 10.0.110.254; 10.0.123.254 |
| Rack1 | Worker Node | | mgmt-switch:, hs-switch: | dpubmc: 10.0.110.24/24; ens1f0v2: DHCP | 10.0.110.254; 10.0.123.254 |
Wiring
Hypervisor Node
Bare Metal Worker Node
Fabric Configuration
Updating Cumulus Linux
As a best practice, make sure to use the latest released Cumulus Linux NOS version.
For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.
Configuring the Cumulus Linux Switch
The SN3700 switch (hs-switch) is configured as follows:
SN3700 Switch Console
nv set bridge domain br_hs untagged 1
nv set interface swp1-5 bridge domain br_hs
nv set interface swp1-5 link state up
nv set interface swp1-5 type swp
nv config apply -y
nv config save -y
The SN2201 switch (mgmt-switch) is configured as follows:
SN2201 Switch Console
nv set interface swp1-5 link state up
nv set interface swp1-5 type swp
nv set interface swp1-5 bridge domain br_default
nv set bridge domain br_default untagged 1
nv config apply
nv config save -y
Installation and Configuration
Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.
All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.
Make sure that you have the DPU BMC and OOB MAC addresses.
Use this Reference Deployment Guide (RDG) for:
- Host Configuration
- K8s Cluster Deployment and Configuration
- DPF Installation
DPU Service Installation
Before deploying the objects under the doca-platform/docs/public/user-guides/vpc_only/ directory, a few adjustments are required.
Change to the directory containing readme.md, from which all the commands will be run:
Jump Node Console
$ cd doca-platform/docs/public/user-guides/zero-trust/use-cases/vpc/
Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file.
Warning
Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPUCLUSTER_INTERFACE and BMC_ROOT_PASSWORD.
manifests/00-env-vars/envvars.env
## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10

## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not
## allocated by DHCP.
export DPUCLUSTER_VIP=10.0.110.200

## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=eno1

## IP address to the NFS server used as storage for the BFB.
export NFS_SERVER_IP=10.0.110.253

## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca

## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca

## IP_RANGE_START and IP_RANGE_END
## These define the IP range for DPU discovery via Redfish/BMC interfaces
## Example: If your DPUs have BMC IPs in range 10.0.110.201-240
## export IP_RANGE_START=10.0.110.201
## export IP_RANGE_END=10.0.110.204

## Start of DPUDiscovery IpRange
export IP_RANGE_START=10.0.110.201
## End of DPUDiscovery IpRange
export IP_RANGE_END=10.0.110.204

# The password used for DPU BMC root login, must be the same for all DPUs
# For more information on how to set the BMC root password refer to BlueField DPU Administrator Quick Start Guide.
export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>

## Serial number of DPUs. If you have more than 2 DPUs, you will need to parameterize the system accordingly and expose
## additional variables.
## All serial numbers must be in lowercase.

## Serial number of DPU1
export DPU1_SERIAL=mt2402xz0f7x
## Serial number of DPU2
export DPU2_SERIAL=mt2402xz0f80
## Serial number of DPU3
export DPU3_SERIAL=mt2402xz0f8g
## Serial number of DPU4
export DPU4_SERIAL=mt2402xz0f9n

## IP Address through which ovn-central service (exposed as NodePort)
## is accessible. This can be a VIP or one of the control-plane node IP
## in the host k8s cluster.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_OVN_CENTRAL_IP=${TARGETCLUSTER_API_SERVER_HOST}

## IP address range for VTEPs used by VPC OVN Service on the high speed fabric.
## This is a CIDR in the form e.g. 20.20.0.0/16
export VTEP_CIDR=20.20.0.0/16

## The Gateway address of the VTEP subnet
## This is an IP in the form e.g. 20.20.0.1
export VTEP_GATEWAY=20.20.0.1

## IP address range for external network used by VPC OVN Service on the high speed fabric.
## This is a CIDR in the form e.g. 30.30.0.0/16
export EXTERNAL_CIDR=30.30.0.0/16

## The Gateway address of the external subnet
## This is an IP in the form e.g. 30.30.0.1
export EXTERNAL_GATEWAY=30.30.0.1

## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.7.0

## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb"

## The repository URL for the Argus container image.
## Usually this is the NVIDIA NGC registry.
export ARGUS_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_argus:1.0.0-doca3.1.0
Export environment variables for the installation:
Jump Node Console
$ source manifests/00-env-vars/envvars.env
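Because a mistyped or unexported variable only surfaces later as a broken manifest, it can help to sanity-check the environment right after sourcing the file. The sketch below is a minimal, hypothetical pre-flight check. The function names `ip_to_int` and `validate_dpf_env` are illustrative and not part of DPF. It assumes IPv4 addresses and the four DPUs (DPU1_SERIAL to DPU4_SERIAL) used in this guide, and it checks only rules stated in the comments of envvars.env: the variables are set, the serial numbers are lowercase, the API server host has no scheme or port, and the discovery range covers the DPUs.

```shell
# Hypothetical pre-flight check for manifests/00-env-vars/envvars.env.
# Assumes IPv4 addresses and four DPUs (DPU1_SERIAL..DPU4_SERIAL).

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

validate_dpf_env() {
  local rc=0 var i serial start end
  # envsubst only sees exported variables, so query the environment itself
  for var in TARGETCLUSTER_API_SERVER_HOST DPUCLUSTER_VIP DPUCLUSTER_INTERFACE \
             NFS_SERVER_IP HELM_REGISTRY_REPO_URL IP_RANGE_START IP_RANGE_END \
             BMC_ROOT_PASSWORD TAG BFB_URL; do
    if [ -z "$(printenv "$var")" ]; then
      echo "missing or not exported: $var" >&2
      rc=1
    fi
  done

  # Every DPU serial number must be set and lowercase
  for i in 1 2 3 4; do
    serial=$(printenv "DPU${i}_SERIAL" || true)
    case "$serial" in
      "") echo "missing: DPU${i}_SERIAL" >&2; rc=1 ;;
      *[[:upper:]]*) echo "DPU${i}_SERIAL must be lowercase: $serial" >&2; rc=1 ;;
    esac
  done

  # The API server host must be a bare address: no scheme, no port (IPv4 assumed)
  case "$(printenv TARGETCLUSTER_API_SERVER_HOST || true)" in
    *:*) echo "TARGETCLUSTER_API_SERVER_HOST must not include a scheme or port" >&2; rc=1 ;;
  esac

  # The DPU discovery range should contain at least one address per DPU
  if [ -n "$(printenv IP_RANGE_START)" ] && [ -n "$(printenv IP_RANGE_END)" ]; then
    start=$(ip_to_int "$IP_RANGE_START")
    end=$(ip_to_int "$IP_RANGE_END")
    if [ $((end - start + 1)) -lt 4 ]; then
      echo "IP_RANGE_START..IP_RANGE_END holds fewer than 4 addresses" >&2
      rc=1
    fi
  fi
  return "$rc"
}
```

To use it, run `validate_dpf_env` after the `source` command. It prints each problem to stderr and returns non-zero if any check fails. If you have more or fewer DPUs, adjust the serial loop and the minimum range size.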
Use the following YAML to define a BFB resource that downloads the BlueField Bootstream (BFB) to a shared volume:
manifests/03-bfb-and-flavor/bfb.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BFB_URL
Run the following command to create the BFB:
Jump Node Console
$ cat manifests/03-bfb-and-flavor/bfb.yaml | envsubst | kubectl apply -f -
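envsubst replaces a reference to an undefined environment variable with an empty string. As a result, running the pipeline above in a shell where envvars.env was not sourced yields a manifest such as `url: ` without any error. The sketch below is a hypothetical guard, `check_manifest_vars` (not a DPF or gettext tool). It lists every `$VAR` or `${VAR}` reference in the given manifest files and reports the ones that are missing from the environment. Shell-internal references such as `"$@"` in the DPUFlavor script do not match the variable-name pattern, so they are ignored.

```shell
# Hypothetical guard: report $VAR / ${VAR} references in manifests that are
# unset or empty in the environment (which envsubst would blank out silently).
check_manifest_vars() {
  local rc=0 var
  # -h: no filenames, -o: only the match, -E: extended regex
  for var in $(grep -hoE '\$[{]?[A-Za-z_][A-Za-z0-9_]*' "$@" | tr -d '${' | sort -u); do
    # printenv only sees exported variables, which is exactly what envsubst uses
    if [ -z "$(printenv "$var")" ]; then
      echo "unset or empty in environment: $var" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage example: `check_manifest_vars manifests/03-bfb-and-flavor/bfb.yaml && cat manifests/03-bfb-and-flavor/bfb.yaml | envsubst | kubectl apply -f -`. The same check applies to the DPUFlavor and the 04-vpc-ovn-dpudeployment manifests before they are applied.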
Adjust the DPUFlavor as shown in the following YAML.
Note
The settings below configure the DPU in Zero Trust mode, which means DPU management is blocked from the bare-metal host.
To deploy in DPU mode instead, comment out the line containing dpuMode:
# dpuMode: zero-trust
manifests/03-bfb-and-flavor/dpuflavor.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: vpc-flavor
  namespace: dpf-operator-system
spec:
  dpuMode: zero-trust
  bfcfgParameters:
    - UPDATE_ATF_UEFI=yes
    - UPDATE_DPU_OS=yes
    - WITH_NIC_FW_UPDATE=yes
  configFiles:
    - operation: override
      path: /etc/mellanox/mlnx-bf.conf
      permissions: "0644"
      raw: |
        ALLOW_SHARED_RQ="no"
        IPSEC_FULL_OFFLOAD="no"
        ENABLE_ESWITCH_MULTIPORT="yes"
    - operation: override
      path: /etc/mellanox/mlnx-ovs.conf
      permissions: "0644"
      raw: |
        CREATE_OVS_BRIDGES="no"
        OVS_DOCA="yes"
    - operation: override
      path: /etc/mellanox/mlnx-sf.conf
      permissions: "0644"
      raw: ""
  grub:
    kernelParameters:
      - console=hvc0
      - console=ttyAMA0
      - earlycon=pl011,0x13010000
      - fixrttc
      - net.ifnames=0
      - biosdevname=0
      - iommu.passthrough=1
      - cgroup_no_v1=net_prio,net_cls
      - hugepagesz=2048kB
      - hugepages=3072
  nvconfig:
    - device: '*'
      parameters:
        - PF_BAR2_ENABLE=0
        - PER_PF_NUM_SF=1
        - PF_TOTAL_SF=20
        - PF_SF_BAR_SIZE=10
        - NUM_PF_MSIX_VALID=0
        - PF_NUM_PF_MSIX_VALID=1
        - PF_NUM_PF_MSIX=228
        - INTERNAL_CPU_MODEL=1
        - INTERNAL_CPU_OFFLOAD_ENGINE=0
        - SRIOV_EN=1
        - NUM_OF_VFS=46
        - LAG_RESOURCE_ALLOCATION=1
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }

      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
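As a quick sizing check, the grub parameters hugepagesz=2048kB and hugepages=3072 in this DPUFlavor reserve 3072 pages of 2 MiB each on the DPU Arm OS at boot. Presumably this memory backs the OVS-DOCA (DPDK-based, datapath_type=netdev) datapath configured in rawConfigScript, but that is an interpretation the manifest does not state:

$$
3072 \times 2\,\text{MiB} = 6144\,\text{MiB} = 6\,\text{GiB}
$$

Memory reserved as hugepages is not available for ordinary allocations by other processes on the DPU.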
Apply the DPUFlavor using the following command:
Jump Node Console
$ cat manifests/03-bfb-and-flavor/dpuflavor.yaml | envsubst | kubectl apply -f -
Modify the dpudeployment.yaml file to reference the DPUFlavor suited for performance (vpc-flavor):
manifests/04-vpc-ovn-dpudeployment/dpudeployment.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: vpc-ovn
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle
    flavor: vpc-flavor
    nodeEffect:
      noEffect: true
    dpuSets:
      - nameSuffix: "dpuset1"
        nodeSelector:
          matchLabels:
            feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    ovn-central:
      serviceTemplate: ovn-central
      serviceConfiguration: ovn-central
    ovn-controller:
      serviceTemplate: ovn-controller
      serviceConfiguration: ovn-controller
    vpc-ovn-controller:
      serviceTemplate: vpc-ovn-controller
      serviceConfiguration: vpc-ovn-controller
    vpc-ovn-node:
      serviceTemplate: vpc-ovn-node
      serviceConfiguration: vpc-ovn-node
    argus:
      serviceConfiguration: argus
      serviceTemplate: argus
  serviceChains:
    switches:
      - ports:
          - serviceInterface:
              matchLabels:
                ovn.vpc.dpu.nvidia.com/interface: p0
          - serviceInterface:
              matchLabels:
                ovn.vpc.dpu.nvidia.com/interface: ovn-vtep-patch
          - serviceInterface:
              matchLabels:
                ovn.vpc.dpu.nvidia.com/interface: ovn-ext-patch
The VPC OVN service consists of the following components:
- ovn-central: Deployed in the target cluster (runs northd, sb_db, nb_db)
- ovn-controller: Deployed in the DPU cluster
- vpc-ovn-controller: VPC controller in the target cluster
- vpc-ovn-node: VPC node agent in the DPU cluster
manifests/04-vpc-ovn-dpudeployment/dpuserviceconfig-ovn-central.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: ovn-central
  namespace: dpf-operator-system
spec:
  deploymentServiceName: ovn-central
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    deployInCluster: true
    helmChart:
      values:
        exposedPorts:
          ports:
            ovnnb: true
            ovnsb: true
        management:
          ovnCentral:
            enabled: true
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: "node-role.kubernetes.io/master"
                          operator: Exists
                    - matchExpressions:
                        - key: "node-role.kubernetes.io/control-plane"
                          operator: Exists
            tolerations:
              - key: node-role.kubernetes.io/master
                operator: Exists
                effect: NoSchedule
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
                effect: NoSchedule
manifests/04-vpc-ovn-dpudeployment/dpuserviceconfig-ovn-controller.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: ovn-controller
  namespace: dpf-operator-system
spec:
  deploymentServiceName: ovn-controller
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    helmChart:
      values:
        dpu:
          ovnController:
            enabled: true
manifests/04-vpc-ovn-dpudeployment/dpuserviceconfig-vpc-ovn-controller.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: vpc-ovn-controller
  namespace: dpf-operator-system
spec:
  deploymentServiceName: vpc-ovn-controller
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    deployInCluster: true
    helmChart:
      values:
        host:
          vpcOVNController:
            enabled: true
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: "node-role.kubernetes.io/master"
                          operator: Exists
                    - matchExpressions:
                        - key: "node-role.kubernetes.io/control-plane"
                          operator: Exists
manifests/04-vpc-ovn-dpudeployment/dpuserviceconfig-vpc-ovn-node.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: vpc-ovn-node
  namespace: dpf-operator-system
spec:
  deploymentServiceName: vpc-ovn-node
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    helmChart:
      values:
        dpu:
          vpcOVNNode:
            enabled: true
            initContainers:
              vpcOVNDpuProvisioner:
                env:
                  ovnSbEndpoint: "tcp:$TARGETCLUSTER_OVN_CENTRAL_IP:30642"
            ipRequests:
              - name: "vtep"
                poolName: "vpc-ippool-vtep"
                allocateIPWithIndex: 1
              - name: "gateway"
                poolName: "vpc-ippool-gateway"
                allocateIPWithIndex: 1
manifests/04-vpc-ovn-dpudeployment/dpuservicetemplate-ovn-central.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: ovn-central
  namespace: dpf-operator-system
spec:
  deploymentServiceName: ovn-central
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: $TAG
      chart: ovn-chart
manifests/04-vpc-ovn-dpudeployment/dpuservicetemplate-ovn-controller.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: ovn-controller
  namespace: dpf-operator-system
spec:
  deploymentServiceName: ovn-controller
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: $TAG
      chart: ovn-chart
manifests/04-vpc-ovn-dpudeployment/dpuservicetemplate-vpc-ovn-controller.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: vpc-ovn-controller
  namespace: dpf-operator-system
spec:
  deploymentServiceName: vpc-ovn-controller
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: $TAG
      chart: dpf-vpc-ovn
manifests/04-vpc-ovn-dpudeployment/dpuservicetemplate-vpc-ovn-node.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: vpc-ovn-node
  namespace: dpf-operator-system
spec:
  deploymentServiceName: vpc-ovn-node
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: $TAG
      chart: dpf-vpc-ovn
manifests/04-vpc-ovn-dpudeployment/dpuserviceipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: vpc-ippool-vtep
  namespace: dpf-operator-system
spec:
  metadata:
    labels:
      ovn.vpc.dpu.nvidia.com/pool: vpc-ippool-vtep
  ipv4Subnet:
    subnet: $VTEP_CIDR
    gateway: $VTEP_GATEWAY
    perNodeIPCount: 4
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: vpc-ippool-gateway
  namespace: dpf-operator-system
spec:
  metadata:
    labels:
      ovn.vpc.dpu.nvidia.com/pool: vpc-ippool-gateway
  ipv4Subnet:
    subnet: $EXTERNAL_CIDR
    gateway: $EXTERNAL_GATEWAY
    perNodeIPCount: 4
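The field name perNodeIPCount: 4 suggests that each DPU node receives its own block of four addresses from VTEP_CIDR and from EXTERNAL_CIDR. This is an assumption based on the field name and on DPF running nvidia-k8s-ipam, which allocates per-node blocks. Under that assumption, a quick upper bound on how many nodes a pool can serve is the pool size divided by the block size. The helper `ipam_block_capacity` below is hypothetical. Reserved addresses such as the network address and the gateway make the real number somewhat lower.

```shell
# Hypothetical sizing helper: upper bound on the number of per-node blocks of
# <per_node> addresses that fit in an IPv4 CIDR (reserved addresses ignored).
ipam_block_capacity() {
  local cidr="$1" per_node="$2" prefix
  prefix=${cidr#*/}              # e.g. "16" from "20.20.0.0/16"
  echo $(( (1 << (32 - prefix)) / per_node ))
}
```

For the /16 pools used here, `ipam_block_capacity 20.20.0.0/16 4` gives 2^16 / 4 = 16384 blocks, far more than the four DPUs in this guide. Much smaller pools would therefore also work.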
manifests/04-vpc-ovn-dpudeployment/dpuserviceinterface.yaml
---
apiVersion: "svc.dpu.nvidia.com/v1alpha1"
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            ovn.vpc.dpu.nvidia.com/interface: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: "svc.dpu.nvidia.com/v1alpha1"
kind: DPUServiceInterface
metadata:
  name: ovn-vtep-patch
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            ovn.vpc.dpu.nvidia.com/interface: "ovn-vtep-patch"
        spec:
          interfaceType: ovn
          ovn:
            externalBridge: br-ovn-vtep
---
apiVersion: "svc.dpu.nvidia.com/v1alpha1"
kind: DPUServiceInterface
metadata:
  name: ovn-ext-patch
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            ovn.vpc.dpu.nvidia.com/interface: "ovn-ext-patch"
        spec:
          interfaceType: ovn
          ovn:
            externalBridge: br-ovn-ext
Create the DPUServiceConfiguration.yaml file for the Argus service:
manifests/04-vpc-ovn-dpudeployment/DPUServiceConfiguration.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: argus
  namespace: dpf-operator-system
spec:
  deploymentServiceName: argus
  serviceConfiguration:
    helmChart:
      values:
        config:
          isLocalPath: false
          containerImage: $ARGUS_NGC_IMAGE_URL
Create the DPUServiceTemplate.yaml file for the Argus service:
manifests/04-vpc-ovn-dpudeployment/DPUServiceTemplate.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: argus
  namespace: dpf-operator-system
spec:
  deploymentServiceName: argus
  helmChart:
    source:
      chart: doca-argus
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.0
Apply all of the YAML files mentioned above using the following command:
Jump Node Console
$ cat manifests/04-vpc-ovn-dpudeployment/* | envsubst | kubectl apply -f -
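The command above relies on `cat` joining the files into a single YAML stream. If a file lacks a trailing newline, for example one of the hand-created Argus files, its last line is glued to the next file's leading `---` separator and the stream no longer parses as intended. The hypothetical helper below, `concat_manifests`, is a drop-in replacement for that `cat`. It uses `awk 1`, which prints every input line followed by a newline, including a final line that had none.

```shell
# Hypothetical helper: concatenate manifest files like `cat`, but make sure
# each file ends with a newline so the next file's "---" starts a new line.
concat_manifests() {
  # awk prints every record followed by ORS ("\n"), even an unterminated last line
  awk 1 "$@"
}
```

Usage example: `concat_manifests manifests/04-vpc-ovn-dpudeployment/* | envsubst | kubectl apply -f -`.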
Verify the DPUService installation by ensuring that the following conditions are met.
Note
These verification commands may need to be run multiple times before all conditions are met.
Jump Node Console
$ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices --all
dpuservice.svc.dpu.nvidia.com/flannel condition met
dpuservice.svc.dpu.nvidia.com/multus condition met
dpuservice.svc.dpu.nvidia.com/nvidia-k8s-ipam condition met
dpuservice.svc.dpu.nvidia.com/ovs-cni condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-controller condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-rbac-and-crds condition met
dpuservice.svc.dpu.nvidia.com/sfc-controller condition met
dpuservice.svc.dpu.nvidia.com/sriov-device-plugin condition met
$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/vpc-ippool-gateway condition met
dpuserviceipam.svc.dpu.nvidia.com/vpc-ippool-vtep condition met
$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/ovn-ext-patch condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
$ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
dpuservicechain.svc.dpu.nvidia.com/vpc-ovn-bd9v2 condition met
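Because these checks may need several runs, a small retry wrapper can replace re-typing them. `kubectl wait` gives up after its `--timeout` (30s by default), and it can also fail immediately while the objects it selects do not exist yet. The `retry` function below is a hypothetical helper, not part of DPF or kubectl. It re-runs any command up to a fixed number of attempts with a fixed delay between them.

```shell
# Hypothetical helper: re-run a command until it succeeds or the attempt
# budget is exhausted. Usage: retry <attempts> <delay-seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage example, with attempt counts and timeouts chosen arbitrarily: `retry 20 30 kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices --all --timeout=60s`.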
To follow the progress of DPU provisioning, run the following command to check its current phase:
Jump Node Console
$ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'    setup5-jump: Sun Sep 14 11:35:13 2025

  Dpu Node Name:  dpu-node-mt2402xz0f7x
    Type:  InternalIP
    Type:  Hostname
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  Initialized
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-14T08:04:11Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-14T08:04:16Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-14T08:14:12Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-14T08:32:54Z
    Type:                  Rebooted
  Dpu Node Name:  dpu-node-mt2402xz0f80
    Type:  InternalIP
    Type:  Hostname
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  Initialized
    Last Transition Time:  2025-09-14T08:04:11Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-14T08:04:12Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-14T08:04:19Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-14T08:04:22Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-14T08:14:37Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-14T08:33:01Z
    Type:                  Rebooted
...
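With several DPUs, this output is long. The hypothetical helper below, `summarize_dpu_progress`, condenses the same `kubectl describe dpu ... | grep ...` output into one line per DPU: the node name, the last `Type:` line listed (the most recent condition in the sample output), and the phase. It relies only on the field labels visible in the sample ("Dpu Node Name:", "Type:", "Phase:"). A "-" means the field has not appeared yet. Before any condition is set, the last `Type:` is an address type such as Hostname.

```shell
# Hypothetical helper: one line per DPU -> "<node name> <last Type listed> <phase>".
# Feed it the output of:
#   kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'
summarize_dpu_progress() {
  awk '
    /Dpu Node Name:/ {
      if (node != "") print node, last, phase
      node = $NF; last = "-"; phase = "-"
      next
    }
    /^[ \t]*Type:/  { last = $NF }
    /^[ \t]*Phase:/ { phase = $NF }
    END { if (node != "") print node, last, phase }
  '
}
```

Usage example: `watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase' | summarize_dpu_progress"` only works if the function is available to the shell that `watch` spawns. Running the pipeline directly in the current shell is simpler.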
Wait for the Rebooted stage, then manually power cycle the bare-metal host.
After the DPU is up, run the following command for each DPU worker:
Jump Node Console
$ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f7x provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
$ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f80 provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
$ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f8g provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
$ kubectl annotate dpunodes -n dpf-operator-system dpu-node-mt2402xz0f9n provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
At this point, the DPU workers should be added to the cluster. As they are added to the cluster, the DPUs are provisioned.
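With many workers, the per-node commands above can be replaced by a loop over all DPUNode objects. This is a hedged sketch: the function name is hypothetical, and because it touches every DPUNode in the namespace it should only be run after all bare-metal hosts have been power-cycled and their DPUs are up.

```shell
#!/bin/sh
# Hypothetical helper: remove the external-reboot-required annotation from every
# DPUNode in the namespace. The trailing "-" on the key tells kubectl to delete it.
# Run it only after ALL hosts have been power-cycled and their DPUs are up.
remove_reboot_annotation() {
  ns="${1:-dpf-operator-system}"
  # "-o name" prints one "<type>/<name>" reference per line.
  for node in $(kubectl get dpunodes -n "$ns" -o name); do
    kubectl annotate -n "$ns" "$node" \
      provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
  done
}

# Usage on the jump node: remove_reboot_annotation dpf-operator-system
```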
Jump Node Console
$ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'    setup5-jump: Sun Sep 14 11:35:13 2025

  Dpu Node Name:  dpu-node-mt2402xz0f7x
    Type:  InternalIP
    Type:  Hostname
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  Initialized
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-14T08:04:11Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-14T08:04:16Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-14T08:14:12Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-14T08:32:54Z
    Type:                  Rebooted
    Last Transition Time:  2025-09-14T08:32:54Z
    Type:                  DPUClusterReady
    Last Transition Time:  2025-09-14T08:32:54Z
    Type:                  Ready
  Phase:                   Ready
  Dpu Node Name:  dpu-node-mt2402xz0f80
    Type:  InternalIP
    Type:  Hostname
    Last Transition Time:  2025-09-14T08:04:10Z
    Type:                  Initialized
    Last Transition Time:  2025-09-14T08:04:11Z
    Type:                  BFBReady
    Last Transition Time:  2025-09-14T08:04:12Z
    Type:                  NodeEffectReady
    Last Transition Time:  2025-09-14T08:04:19Z
    Type:                  InterfaceInitialized
    Last Transition Time:  2025-09-14T08:04:20Z
    Type:                  FWConfigured
    Last Transition Time:  2025-09-14T08:04:22Z
    Type:                  BFBPrepared
    Last Transition Time:  2025-09-14T08:14:37Z
    Type:                  OSInstalled
    Last Transition Time:  2025-09-14T08:33:01Z
    Type:                  Rebooted
    Last Transition Time:  2025-09-14T08:33:01Z
    Type:                  DPUClusterReady
    Last Transition Time:  2025-09-14T08:33:01Z
    Type:                  Ready
  Phase:                   Ready
...
Finally, validate that all the different DPU-related objects are now in the Ready state. The commands below append the dpfctl and ki aliases to ~/.bashrc; run source ~/.bashrc (or open a new shell) before using them:
Jump Node Console
$ echo 'alias dpfctl="kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "' >> ~/.bashrc
$ dpfctl describe dpudeployments
NAME                                              NAMESPACE            STATUS       REASON    SINCE  MESSAGE
DPFOperatorConfig/dpfoperatorconfig               dpf-operator-system  Ready: True  Success   9m7s
└─DPUDeployments
  └─DPUDeployment/vpc-ovn                         dpf-operator-system  Ready: True  Success   0s
    ├─DPUServiceChains
    │ └─DPUServiceChain/vpc-ovn-bd9v2             dpf-operator-system  Ready: True  Success   29m
    ├─DPUSets
    │ └─DPUSet/vpc-ovn-dpuset1                    dpf-operator-system
    │   ├─BFB/bf-bundle                           dpf-operator-system  Ready: True  Ready     58m    File: bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb, DOCA: 3.1.0
    │   └─DPUs
    │     └─4 DPUs...                             dpf-operator-system  Ready: True  DPUReady  59s    See dpu-node-mt2402xz0f7x-mt2402xz0f7x, dpu-node-mt2402xz0f80-mt2402xz0f80,
    │                                                                                                dpu-node-mt2402xz0f8g-mt2402xz0f8g, dpu-node-mt2402xz0f9n-mt2402xz0f9n
    └─Services
      ├─DPUServiceTemplates
      │ ├─DPUServiceTemplate/argus                dpf-operator-system  Ready: True  Success   29m
      │ ├─DPUServiceTemplate/ovn-central          dpf-operator-system  Ready: True  Success   46m
      │ ├─DPUServiceTemplate/ovn-controller       dpf-operator-system  Ready: True  Success   46m
      │ ├─DPUServiceTemplate/vpc-ovn-controller   dpf-operator-system  Ready: True  Success   46m
      │ └─DPUServiceTemplate/vpc-ovn-node         dpf-operator-system  Ready: True  Success   46m
      └─DPUServices
        └─5 DPUServices...                        dpf-operator-system  Ready: True  Success   29m    See argus-92gck, ovn-central-2qkdj, ovn-controller-x69wv, vpc-ovn-controller-lxnbc, vpc-ovn-node-l8dcz
$ echo "alias ki='KUBECONFIG=/home/depuser/dpu-cluster.config kubectl'" >> ~/.bashrc
$ kubectl get secrets -n dpu-cplane-tenant1 dpu-cplane-tenant1-admin-kubeconfig -o json | jq -r '.data["admin.conf"]' | base64 --decode > /home/depuser/dpu-cluster.config
$ ki get node -A
NAME                                 STATUS   ROLES    AGE   VERSION
dpu-node-mt2402xz0f7x-mt2402xz0f7x   Ready    <none>   13m   v1.33.3
dpu-node-mt2402xz0f80-mt2402xz0f80   Ready    <none>   13m   v1.33.3
dpu-node-mt2402xz0f8g-mt2402xz0f8g   Ready    <none>   13m   v1.33.3
dpu-node-mt2402xz0f9n-mt2402xz0f9n   Ready    <none>   13m   v1.33.3
$ kubectl get dpu -A
NAMESPACE             NAME                                 READY   PHASE   AGE
dpf-operator-system   dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Ready   33m
dpf-operator-system   dpu-node-mt2402xz0f80-mt2402xz0f80   True    Ready   33m
dpf-operator-system   dpu-node-mt2402xz0f8g-mt2402xz0f8g   True    Ready   33m
dpf-operator-system   dpu-node-mt2402xz0f9n-mt2402xz0f9n   True    Ready   33m
$ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f7x-mt2402xz0f7x condition met
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f80-mt2402xz0f80 condition met
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f8g-mt2402xz0f8g condition met
dpu.provisioning.dpu.nvidia.com/dpu-node-mt2402xz0f9n-mt2402xz0f9n condition met
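The dpfctl and ki aliases are convenient interactively, but bash does not expand aliases in non-interactive scripts unless the expand_aliases option is set. If you want to reuse them in automation, shell functions are a drop-in alternative. A minimal sketch that reuses the paths and names from this guide:

```shell
#!/bin/sh
# Function equivalents of the dpfctl and ki aliases above. Unlike aliases,
# functions also work inside non-interactive scripts.
dpfctl() {
  kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl "$@"
}

# Run kubectl against the DPU cluster with the extracted admin kubeconfig.
ki() {
  KUBECONFIG=/home/depuser/dpu-cluster.config kubectl "$@"
}

# Usage: dpfctl describe dpudeployments; ki get node -A
```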
Deploy IsolationClass
In this step, you will deploy the IsolationClass resource, which will be used by subsequent user-created DPUVPC and DPUVirtualNetwork resources.
Verify the contents of the manifests/05-vpc-resources/ovn-isolation-class.yaml file:
manifests/05-vpc-resources/ovn-isolation-class.yaml
---
apiVersion: vpc.dpu.nvidia.com/v1alpha1
kind: IsolationClass
metadata:
  name: ovn.vpc.dpu.nvidia.com
spec:
  provisioner: ovn.vpc.dpu.nvidia.com
  parameters:
    ovn-nb-endpoint: "tcp:$TARGETCLUSTER_OVN_CENTRAL_IP:30641"
    ovn-nb-reconnect-time: "5"
Deploy the IsolationClass:
Jump Node Console
$ cat manifests/05-vpc-resources/* | envsubst | kubectl apply -f -
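envsubst replaces an unset variable with an empty string, so a missing TARGETCLUSTER_OVN_CENTRAL_IP silently renders as "tcp::30641". A small guard before the apply step avoids that. This is a hedged sketch; the helper name require_env is hypothetical. It uses printenv because, like envsubst, printenv only sees exported variables.

```shell
#!/bin/sh
# Hypothetical guard: fail if any of the named variables is not exported
# (or is empty). envsubst only sees the environment, so a plain shell
# variable that was never exported would also be rendered as empty.
require_env() {
  for var in "$@"; do
    if [ -z "$(printenv "$var")" ]; then
      echo "error: $var is not exported (or is empty)" >&2
      return 1
    fi
  done
}

# Usage on the jump node:
# require_env TARGETCLUSTER_OVN_CENTRAL_IP &&
#   cat manifests/05-vpc-resources/* | envsubst | kubectl apply -f -
```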
Deploy test topology
In this deployment, we create a dual-VPC environment (blue and red).
Add the blue and red tenant labels to the relevant DPU nodes. Set the values according to your environment.
Jump Node Console
$ ki label node dpu-node-mt2402xz0f7x-mt2402xz0f7x dpu-node-mt2402xz0f80-mt2402xz0f80 vpc.dpu.nvidia.com/tenant=red
node/dpu-node-mt2402xz0f7x-mt2402xz0f7x labeled
node/dpu-node-mt2402xz0f80-mt2402xz0f80 labeled
$ ki label node dpu-node-mt2402xz0f8g-mt2402xz0f8g dpu-node-mt2402xz0f9n-mt2402xz0f9n vpc.dpu.nvidia.com/tenant=blue
node/dpu-node-mt2402xz0f8g-mt2402xz0f8g labeled
node/dpu-node-mt2402xz0f9n-mt2402xz0f9n labeled
Create the manifests/06-optional-test-traffic/vpc-topology-dual-vpc.yaml file with the following configuration:
manifests/06-optional-test-traffic/vpc-topology-dual-vpc.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: blue
---
apiVersion: v1
kind: Namespace
metadata:
  name: red
---
apiVersion: vpc.dpu.nvidia.com/v1alpha1
kind: DPUVPC
metadata:
  name: blue-vpc
  namespace: blue
spec:
  tenant: blue
  isolationClassName: ovn.vpc.dpu.nvidia.com
  interNetworkAccess: true
  nodeSelector:
    matchLabels:
      vpc.dpu.nvidia.com/tenant: blue
---
apiVersion: vpc.dpu.nvidia.com/v1alpha1
kind: DPUVirtualNetwork
metadata:
  name: blue-net
  namespace: blue
spec:
  vpcName: blue-vpc
  type: Bridged
  externallyRouted: true
  masquerade: true
  bridgedNetwork:
    ipam:
      ipv4:
        dhcp: true
        subnet: 192.178.0.0/16
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: blue-vf2
  namespace: blue
spec:
  template:
    spec:
      nodeSelector:
        matchLabels:
          vpc.dpu.nvidia.com/tenant: blue
      template:
        spec:
          interfaceType: vf
          vf:
            pfID: 0
            vfID: 2
            virtualNetwork: blue-net
            parentInterfaceRef: ""
---
apiVersion: vpc.dpu.nvidia.com/v1alpha1
kind: DPUVPC
metadata:
  name: red-vpc
  namespace: red
spec:
  tenant: red
  isolationClassName: ovn.vpc.dpu.nvidia.com
  interNetworkAccess: true
  nodeSelector:
    matchLabels:
      vpc.dpu.nvidia.com/tenant: red
---
apiVersion: vpc.dpu.nvidia.com/v1alpha1
kind: DPUVirtualNetwork
metadata:
  name: red-net
  namespace: red
spec:
  vpcName: red-vpc
  type: Bridged
  externallyRouted: true
  masquerade: true
  bridgedNetwork:
    ipam:
      ipv4:
        dhcp: true
        subnet: 192.178.0.0/16
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: red-vf2
  namespace: red
spec:
  template:
    spec:
      nodeSelector:
        matchLabels:
          vpc.dpu.nvidia.com/tenant: red
      template:
        spec:
          interfaceType: vf
          vf:
            pfID: 0
            vfID: 2
            virtualNetwork: red-net
            parentInterfaceRef: ""
Apply the YAML file above using the following command:
Jump Node Console
$ kubectl apply -f manifests/06-optional-test-traffic/vpc-topology-dual-vpc.yaml
Verify:
Jump Node Console
$ ki get serviceinterface -A
NAMESPACE             NAME                 IFTYPE     IFNAME   NODE                                 READY   REASON    AGE
blue                  blue-vf27r958        vf                  dpu-node-mt2402xz0f8g-mt2402xz0f8g   True    Success   7s
blue                  blue-vf2w9x6n        vf                  dpu-node-mt2402xz0f9n-mt2402xz0f9n   True    Success   7s
dpf-operator-system   ovn-ext-patchdfhtf   ovn                 dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Success   19m
dpf-operator-system   ovn-ext-patchmpg54   ovn                 dpu-node-mt2402xz0f8g-mt2402xz0f8g   True    Success   19m
dpf-operator-system   ovn-ext-patchpnmxl   ovn                 dpu-node-mt2402xz0f9n-mt2402xz0f9n   True    Success   19m
dpf-operator-system   ovn-ext-patchz9q9l   ovn                 dpu-node-mt2402xz0f80-mt2402xz0f80   True    Success   19m
dpf-operator-system   p04g2pt              physical            dpu-node-mt2402xz0f9n-mt2402xz0f9n   True    Success   19m
dpf-operator-system   p09nzbv              physical            dpu-node-mt2402xz0f80-mt2402xz0f80   True    Success   19m
dpf-operator-system   p0f8rqq              physical            dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Success   19m
dpf-operator-system   p0wdsfs              physical            dpu-node-mt2402xz0f8g-mt2402xz0f8g   True    Success   19m
red                   red-vf2lqdxj         vf                  dpu-node-mt2402xz0f80-mt2402xz0f80   True    Success   5s
red                   red-vf2xs7z5         vf                  dpu-node-mt2402xz0f7x-mt2402xz0f7x   True    Success   6s
$ kubectl get dpuvpcs.vpc.dpu.nvidia.com -A
NAMESPACE   NAME       READY   PHASE     AGE
blue        blue-vpc   True    Success   40s
red         red-vpc    True    Success   39s
$ ki get serviceinterface -o yaml -n red
...
  status:
    conditions:
    - lastTransitionTime: "2025-09-14T08:42:22Z"
      message: ""
      observedGeneration: 1
      reason: Success
      status: "True"
      type: Ready
    - lastTransitionTime: "2025-09-14T08:42:22Z"
      message: ""
      observedGeneration: 1
      reason: Success
      status: "True"
      type: ServiceInterfaceReconciled
    observedGeneration: 1
...
$ ki get serviceinterface -o yaml -n blue
...
  status:
    conditions:
    - lastTransitionTime: "2025-09-14T08:42:22Z"
      message: ""
      observedGeneration: 1
      reason: Success
      status: "True"
      type: Ready
    - lastTransitionTime: "2025-09-14T08:42:22Z"
      message: ""
      observedGeneration: 1
      reason: Success
      status: "True"
      type: ServiceInterfaceReconciled
    observedGeneration: 1
...
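Instead of reading the YAML status for each tenant, you can block until the tenant ServiceInterfaces report the Ready condition shown above. A minimal sketch, assuming the DPU cluster kubeconfig path used in this guide; the function name and the DPU_KUBECONFIG variable are hypothetical, and the 120-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: wait until every ServiceInterface in the given tenant
# namespaces on the DPU cluster reports Ready, or fail after the timeout.
DPU_KUBECONFIG="${DPU_KUBECONFIG:-/home/depuser/dpu-cluster.config}"

wait_tenant_interfaces() {
  for ns in "$@"; do
    KUBECONFIG="$DPU_KUBECONFIG" kubectl wait --for=condition=Ready \
      serviceinterface -n "$ns" --all --timeout=120s || return 1
  done
}

# Usage on the jump node: wait_tenant_interfaces red blue
```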
Zero-Trust Mode Checking
Ubuntu 24.04 was installed on the servers.
Here's a step-by-step procedure to check the Zero-Trust Mode on your NVIDIA BlueField DPU from the host server, including the installation of the Mellanox Firmware Tools (MFT).
Navigate to the NVIDIA Downloads Site: Open your web browser and go to the official NVIDIA Mellanox software downloads page.
Select the latest MFT version for your OS (this guide uses mft-4.33.0-169-x86_64-deb).
Transfer the MFT package to the Worker 1 bare-metal host and extract it:
First BM Server Console
root@worker1:~# tar -xvzf /tmp/mft-4.33.0-169-x86_64-deb.tgz
Navigate into the Extracted Directory.
First BM Server Console
root@worker1:~# cd mft-4.33.0-169-x86_64-deb/
Install the build prerequisites and run the MFT installer:
First BM Server Console
root@worker1:~# apt-get install gcc make dkms
root@worker1:~# ./install.sh
Start MST (Mellanox Software Tools) Service and Identify DPU Device Name.
First BM Server Console
root@worker1:~# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
root@worker1:~# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mt41692_pciconf0         - PCI configuration cycles access.
                                    domain:bus:dev.fn=0000:2b:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                    Chip revision is: 01
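The following commands address the DPU by its short bus:dev.fn form (2b:00.0 here). If your device sits at a different address, you can extract it from the mst status output instead of copying it by hand. A hedged sketch: the helper name is hypothetical, and it assumes the "domain:bus:dev.fn=" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `mst status` output on stdin and print the short
# bus:dev.fn address (for example 2b:00.0) of each MST device, dropping the
# 4-digit PCI domain prefix.
mst_pci_addresses() {
  sed -n 's/.*domain:bus:dev\.fn=[0-9a-fA-F]\{4\}:\([0-9a-fA-F:.]*\).*/\1/p'
}

# Usage: DEV=$(mst status | mst_pci_addresses | head -n 1)
#        mlxprivhost -d "$DEV" q
```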
Perform Zero-Trust Checking.
First BM Server Console
root@worker1:~# mlxprivhost -d 2b:00.0 q
Host configurations
-------------------
level                         : RESTRICTED

Port functions status:
-----------------------
disable_rshim                 : TRUE
disable_tracer                : TRUE
disable_port_owner            : TRUE
disable_counter_rd            : TRUE

#Expected Zero-Trust Output.
This is the most definitive confirmation: level : RESTRICTED means the host is in Zero-Trust Mode, and the TRUE flags confirm that the individual security restrictions are active.
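The interpretation above can be turned into a pass/fail check. This is a minimal sketch that assumes the "key : value" layout shown in the mlxprivhost output; the function name is_zero_trust is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read `mlxprivhost -d <dev> q` output on stdin and succeed
# only when the level is RESTRICTED and every disable_* flag reads TRUE.
is_zero_trust() {
  awk -F':' '
    { key = $1; val = $2; gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val) }
    key == "level"     { level = val }
    key ~ /^disable_/  { n++; if (val != "TRUE") bad++ }
    # Exit 0 (success) only if all conditions hold.
    END { exit !(level == "RESTRICTED" && n > 0 && bad == 0) }
  '
}

# Usage: mlxprivhost -d 2b:00.0 q | is_zero_trust && echo "Zero-Trust Mode"
```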
Check firmware access with mlxfwmanager:
First BM Server Console
root@worker1:~# mlxfwmanager -d 2b:00.0 --query
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      BlueField3
  Part Number:      --
  Description:
  PSID:
  PCI Device Name:  2b:00.0
  Base MAC:         N/A
  Versions:         Current        Available
     FW             --

  Status:           Failed to open device

# Expected Zero-Trust Output
"Failed to open device" indicates the host is blocked from accessing the DPU for firmware operations, a key aspect of Zero-Trust.
Check the device configuration with mlxconfig:
First BM Server Console
root@worker1:~# mlxconfig -d 2b:00.0 q

Device #1:
----------

Device type:        BlueField3
Name:               900-9D3B6-00CV-A_Ax
Description:        NVIDIA BlueField-3 B3220 P-Series FHHL DPU; 200GbE (default mode) / NDR200 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled
Device:             2b:00.0

Configurations:                                 Next Boot
...
        ALLOW_RD_COUNTERS                       True(1)     # No RO, but restricted by mlxprivhost
...
        PORT_OWNER                              True(1)     # No RO, but restricted by mlxprivhost
...
        TRACER_ENABLE                           True(1)     # No RO, but restricted by mlxprivhost
Most configuration parameters are prefixed with RO (Read-Only). Parameters related to direct host control, such as PORT_OWNER, ALLOW_RD_COUNTERS, and TRACER_ENABLE, may be shown as True(1) for the DPU's internal capability, but the host cannot enforce them because of the mlxprivhost restrictions. The widespread RO status shows that the host cannot modify these configurations, reinforcing the DPU's autonomous and secure state, and the few parameters without RO are still overridden by the mlxprivhost security policy.
Check low-level hardware access with ethtool:
First BM Server Console
root@worker1:~# ethtool -d ens1f0np0 Cannot get register dump: Operation not supported
This confirms the DPU is preventing deep, low-level hardware access from the host, aligning with Zero-Trust's isolation goals.
Conclusion
If the outputs of mlxprivhost (level : RESTRICTED), mlxfwmanager ("Failed to open device"), mlxconfig (showing RO flags), and ethtool ("Operation not supported") match the expected results above, your NVIDIA BlueField DPU is operating in Zero-Trust Mode.
This means the host has significantly restricted privileges and cannot perform sensitive operations on the DPU, ensuring its security and isolation.
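Three of these host-side checks can be re-run together, for example after a host reboot. This is a hedged sketch, not an official tool: the function name is hypothetical, the default device and interface names are the ones used in this guide, and it only greps for the expected strings shown above (the mlxconfig RO review is left as a manual step).

```shell
#!/bin/sh
# Hypothetical wrapper: re-run the mlxprivhost, mlxfwmanager and ethtool checks
# and report whether each shows the expected Zero-Trust restriction.
zero_trust_report() {
  dev="${1:-2b:00.0}"        # PCI address used in this guide
  iface="${2:-ens1f0np0}"    # PF netdev used in this guide
  rc=0

  if mlxprivhost -d "$dev" q 2>&1 | grep -Eq '^level[[:space:]]*:[[:space:]]*RESTRICTED'; then
    echo "mlxprivhost: level RESTRICTED"
  else
    echo "mlxprivhost: level is NOT RESTRICTED"; rc=1
  fi

  if mlxfwmanager -d "$dev" --query 2>&1 | grep -q 'Failed to open device'; then
    echo "mlxfwmanager: firmware access blocked"
  else
    echo "mlxfwmanager: firmware access NOT blocked"; rc=1
  fi

  # ethtool prints its error on stderr, hence 2>&1.
  if ethtool -d "$iface" 2>&1 | grep -q 'Operation not supported'; then
    echo "ethtool: register dump blocked"
  else
    echo "ethtool: register dump NOT blocked"; rc=1
  fi

  return "$rc"
}

# Usage on the worker host: zero_trust_report 2b:00.0 ens1f0np0
```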
Infrastructure Bandwidth & Latency Validation
Verify the deployment and confirm that the DPU system achieves link-speed performance and low latency by running various tests:
- Iperf TCP—for bandwidth measurements
- RDMA—for bandwidth and latency measurements
- Network isolation
Each test is described in detail. At the end of each test, the achieved performance is displayed.
Make sure that the servers are tuned for maximum performance (not covered in this document).
Performance and Isolation Tests
Now that the test deployment is running, perform bandwidth and latency performance tests between two bare-metal workload servers.
Ubuntu 24.04 was installed on the servers.
Connect to the first workload server console, install iperf3 and perftest, check the DPU high-speed interfaces, set the number of VFs, install the DHCP client, obtain and check the VF2 IP address, and identify the relevant RDMA device:
First BM Server Console
root@worker1:~# apt install iperf3
root@worker1:~# apt install perftest
root@worker1:~# lspci | grep nox
2b:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
2b:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
root@worker1:~# echo 8 > /sys/bus/pci/devices/0000\:2b:00.0/sriov_numvfs
root@worker1:~# apt install isc-dhcp-client
root@worker1:~# dhclient -1 -v ens1f0v2
root@worker1:~# ip a s
...
10: ens1f0v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 72:fa:ff:bc:3a:43 brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0v2
    inet 192.178.0.2/16 brd 192.178.255.255 scope global dynamic ens1f0v2
       valid_lft 3595sec preferred_lft 3595sec
    inet6 fe80::70fa:ffff:febc:3a43/64 scope link
       valid_lft forever preferred_lft forever
...
depuser@worker1:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker1:~# rdma link | grep ens1f0v2
link mlx5_4/1 state ACTIVE physical_state LINK_UP netdev ens1f0v2
Using another console window, reconnect to the jump node and connect to the second workload server.
On the second server, repeat the same steps: install iperf3 and perftest, check the DPU high-speed interfaces, set the number of VFs, obtain the VF2 IP address over DHCP, and identify the relevant RDMA device:
Second BM Server Console
root@worker2:~# apt install iperf3
root@worker2:~# apt install perftest
root@worker2:~# lspci | grep nox
2b:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
2b:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
root@worker2:~# echo 8 > /sys/bus/pci/devices/0000\:2b:00.0/sriov_numvfs
root@worker2:~# apt install isc-dhcp-client
root@worker2:~# dhclient -1 -v ens1f0v2
root@worker2:~# ip a s
...
10: ens1f0v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 66:8a:59:ea:40:fa brd ff:ff:ff:ff:ff:ff
    altname enp43s0f0v2
    inet 192.178.0.3/16 brd 192.178.255.255 scope global dynamic ens1f0v2
       valid_lft 3596sec preferred_lft 3596sec
    inet6 fe80::648a:59ff:feea:40fa/64 scope link
       valid_lft forever preferred_lft forever
...
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker2:~# rdma link | grep ens1f0v2
link mlx5_4/1 state ACTIVE physical_state LINK_UP netdev ens1f0v2
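Both servers write 8 directly to sriov_numvfs. The kernel rejects changing this value from one non-zero count to another (the write fails with "Device or resource busy"), so re-running the step with a different count requires writing 0 first. A minimal sketch of an idempotent version; the function name is hypothetical, and SYSFS_ROOT is a hypothetical override that exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Hypothetical helper: set the VF count on a PF idempotently. Changing
# sriov_numvfs between two non-zero values is refused by the kernel, so the
# existing VFs are released (set to 0) first.
set_numvfs() {
  dev="$1"     # full PCI address, for example 0000:2b:00.0
  want="$2"    # desired number of VFs
  f="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/sriov_numvfs"

  cur=$(cat "$f") || return 1
  if [ "$cur" = "$want" ]; then
    return 0                     # already configured, nothing to do
  fi
  if [ "$cur" != "0" ]; then
    echo 0 > "$f" || return 1    # release the existing VFs first
  fi
  echo "$want" > "$f"
}

# Usage on each worker (as root): set_numvfs 0000:2b:00.0 8
```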
iPerf TCP Bandwidth Test
Move back to the first server console.
Start the iperf3 server side:
First BM Server Console
root@worker1:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Move to the second server console.
Start the iperf3 client side (use the VF2 IP address of the server-side host in your environment):
Second BM Server Console
root@worker2:~# iperf3 -c 192.178.0.3 -P 16
Connecting to host 192.178.0.3, port 5201
[  5] local 192.178.0.2 port 46348 connected to 192.178.0.3 port 5201
[  7] local 192.178.0.2 port 46360 connected to 192.178.0.3 port 5201
[  9] local 192.178.0.2 port 46368 connected to 192.178.0.3 port 5201
[ 11] local 192.178.0.2 port 46372 connected to 192.178.0.3 port 5201
[ 13] local 192.178.0.2 port 46376 connected to 192.178.0.3 port 5201
[ 15] local 192.178.0.2 port 46378 connected to 192.178.0.3 port 5201
[ 17] local 192.178.0.2 port 46382 connected to 192.178.0.3 port 5201
[ 19] local 192.178.0.2 port 46384 connected to 192.178.0.3 port 5201
[ 21] local 192.178.0.2 port 46396 connected to 192.178.0.3 port 5201
[ 23] local 192.178.0.2 port 46402 connected to 192.178.0.3 port 5201
[ 25] local 192.178.0.2 port 46410 connected to 192.178.0.3 port 5201
[ 27] local 192.178.0.2 port 46424 connected to 192.178.0.3 port 5201
[ 29] local 192.178.0.2 port 46438 connected to 192.178.0.3 port 5201
[ 31] local 192.178.0.2 port 46454 connected to 192.178.0.3 port 5201
[ 33] local 192.178.0.2 port 46466 connected to 192.178.0.3 port 5201
[ 35] local 192.178.0.2 port 46472 connected to 192.178.0.3 port 5201
[ ID] Interval              Transfer      Bandwidth
[  3] 0.0000-10.0058 sec    14.1 GBytes   12.1 Gbits/sec
[ 13] 0.0000-10.0057 sec    14.2 GBytes   12.2 Gbits/sec
[  7] 0.0000-10.0056 sec    13.4 GBytes   11.5 Gbits/sec
[ 12] 0.0000-10.0057 sec    15.2 GBytes   13.1 Gbits/sec
[  4] 0.0000-10.0058 sec    14.1 GBytes   12.1 Gbits/sec
[ 11] 0.0000-10.0058 sec    15.8 GBytes   13.6 Gbits/sec
[  8] 0.0000-10.0057 sec    13.9 GBytes   11.9 Gbits/sec
[  9] 0.0000-10.0058 sec    13.8 GBytes   11.9 Gbits/sec
[ 15] 0.0000-10.0057 sec    14.3 GBytes   12.3 Gbits/sec
[ 16] 0.0000-10.0058 sec    14.6 GBytes   12.5 Gbits/sec
[  1] 0.0000-10.0057 sec    14.6 GBytes   12.6 Gbits/sec
[  6] 0.0000-10.0058 sec    13.1 GBytes   11.3 Gbits/sec
[ 14] 0.0000-10.0059 sec    13.6 GBytes   11.6 Gbits/sec
[ 10] 0.0000-10.0055 sec    13.5 GBytes   11.6 Gbits/sec
[  2] 0.0000-10.0057 sec    14.0 GBytes   12.0 Gbits/sec
[  5] 0.0000-10.0058 sec    14.6 GBytes   12.6 Gbits/sec
[SUM] 0.0000-10.0010 sec     227 GBytes    195 Gbits/sec
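As a sanity check, the per-stream results should add up to the [SUM] line: the 16 values above total 194.9 Gbits/sec, consistent with the reported 195 Gbits/sec. A minimal sketch of that check, assuming it is fed only the final per-stream summary lines in the format shown above (not the per-second interval lines, which would be counted multiple times); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add up the per-stream Gbits/sec values from the final
# summary lines of a parallel iperf run, ignoring the [SUM] line.
sum_stream_gbps() {
  awk '/Gbits\/sec/ && !/\[SUM\]/ {
         for (i = 2; i <= NF; i++) if ($i == "Gbits/sec") total += $(i - 1)
       }
       END { printf "%.1f\n", total }'
}

# Usage: sum_stream_gbps < summary.txt   (summary.txt holds the final per-stream lines)
```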
RoCE Latency Test
Return to the first server console.
Start the ib_read_lat server side:
First BM Server Console
root@worker1:~# ib_read_lat -F -n 20000 -d mlx5_4

************************************
* Waiting for client to connect... *
************************************
Move to the second server console.
Start the ib_read_lat client side (use the VF2 IP address of the server-side host in your environment):
Second BM Server Console
root@worker2:~# ib_read_lat -F -n 20000 -d mlx5_4 192.178.0.3
---------------------------------------------------------------------------------------
                    RDMA_Read Latency Test
 Dual-port       : OFF
 Device          : mlx5_4
 Number of qps   : 1
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0108 PSN 0xa5a4e OUT 0x10 RKey 0x031005 VAddr 0x005a7a24ef7000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:178:00:02
 remote address: LID 0000 QPN 0x0108 PSN 0x6caf0 OUT 0x10 RKey 0x031005 VAddr 0x006264a9e00000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:178:00:03
---------------------------------------------------------------------------------------
 #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]  t_avg[usec]  t_stdev[usec]  99% percentile[usec]  99.9% percentile[usec]
 2       20000        10.51        73.16        13.81            15.35        4.74           29.66                 42.23
---------------------------------------------------------------------------------------
RoCE Bandwidth Test
Return to the first server console.
Start the ib_write_bw server side:
First BM Server Console
root@worker1:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_4

************************************
* Waiting for client to connect... *
************************************
Move to the second server console.
Start the ib_write_bw client side (use the VF2 IP address of the server-side host in your environment):
Second BM Server Console
root@worker2:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_4 192.178.0.3 --report_gbit
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF
 Device          : mlx5_4
 Number of qps   : 64
 Transport type  : IB
 Connection type : RC
 Using SRQ       : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
…
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 1048576    448865         0.00               235.89               0.028120
---------------------------------------------------------------------------------------
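The MsgRate column follows directly from the average bandwidth and the message size: 235.89 Gb/s divided by 1,048,576 bytes × 8 bits is about 28,120 messages per second, that is 0.028120 Mpps, matching the report. A small helper to reproduce this check for other message sizes; the name is hypothetical, and it assumes Gb/s means 10^9 bits per second as printed with --report_gbit.

```shell
#!/bin/sh
# Hypothetical helper: message rate (Mpps) = bandwidth_bits_per_sec / (msg_size_bytes * 8) / 1e6
# $1 = message size in bytes, $2 = average bandwidth in Gb/s.
msg_rate_mpps() {
  awk -v bytes="$1" -v gbps="$2" 'BEGIN { printf "%.6f\n", gbps * 1e9 / (bytes * 8) / 1e6 }'
}

# Usage: msg_rate_mpps 1048576 235.89
```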
Network Isolation Test
Finally, verify that two servers on different networks, using virtual functions in the red VPC and the blue VPC respectively, cannot communicate with each other.
Run the iperf3 test between Worker1 (red VPC) and Worker3 (blue VPC).
Start the iperf3 server side:
First BM Server Console
root@worker1:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Move to the second server console (Worker3).
Start the iperf3 client side:
Second BM Server Console
root@worker3:~# iperf3 -c 192.178.0.3 -P 16
iperf3: error - unable to connect to server - server may have stopped running or use a different port, firewall issue, etc.: Connection refused
This iperf3 connection fails because of the network isolation between the red and blue VPCs implemented by the OVN-based VPC service.
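The isolation check can be expressed as an expected failure, which makes it easier to script. A hedged sketch: the function name is hypothetical, and a failed connection only demonstrates isolation if the same command succeeds from a host in the same VPC as the server (otherwise the server may simply not be running).

```shell
#!/bin/sh
# Hypothetical helper: treat a failed iperf3 connection as the expected result
# of the cross-VPC isolation test (1-second test, output discarded).
expect_isolated() {
  target="$1"
  if iperf3 -c "$target" -t 1 >/dev/null 2>&1; then
    echo "FAIL: $target is reachable across VPCs"
    return 1
  fi
  echo "OK: $target is not reachable"
}

# Usage on Worker3: expect_isolated 192.178.0.3
```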
Argus Service Verification
Here's a step-by-step procedure to check the DOCA Argus service on your NVIDIA BlueField DPU.
Ubuntu 24.04 was installed on the servers.
Open the first worker server console.
First BM Server Console
$ ssh worker1
Add the IOMMU configuration to the /etc/default/grub file:
First BM Server Console
root@worker1:~# vim /etc/default/grub
## Add iommu.passthrough=1 intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT parameter
GRUB_CMDLINE_LINUX_DEFAULT="iommu.passthrough=1 intel_iommu=on"
Regenerate the GRUB configuration and reboot the server:
First BM Server Console
root@worker1:~# update-grub
root@worker1:~# reboot
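After the reboot, you can confirm that the new kernel parameters are active by looking at the running kernel's command line. A minimal sketch; the function name is hypothetical, and the optional file argument exists only so the check can be tried without rebooting.

```shell
#!/bin/sh
# Hypothetical check: confirm the IOMMU parameters from /etc/default/grub are
# present on the running kernel's command line (default: /proc/cmdline).
check_iommu_cmdline() {
  cmdline_file="${1:-/proc/cmdline}"
  for p in intel_iommu=on iommu.passthrough=1; do
    # Split the command line into one parameter per line and match exactly.
    if ! tr ' ' '\n' < "$cmdline_file" | grep -qx "$p"; then
      echo "missing kernel parameter: $p" >&2
      return 1
    fi
  done
  echo "IOMMU kernel parameters present"
}

# Usage on the worker host after the reboot: check_iommu_cmdline
```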
For the test, run the sleep 100 command in the background:
First BM Server Console
root@worker1:~# sleep 100 &
Connect to the first DPU's OOB interface over SSH and change the password of the ubuntu user (the default password is ubuntu):
First BM Server Console
root@worker1:~# ssh ubuntu@10.0.110.211
Run the following command to see the Argus log events for the sleep 100 process on the worker host:
First DPU Console
ubuntu@dpu-node-mt2402xz0f7x-mt2402xz0f7x:~$ jq 'select(.activity_data.process_details.process_name == "sleep") | .activity_data' /var/log/doca_argus_activity_report/doca_argus_log_MT2402XZ0F7XMLNXS0D0F0.log -C | less -R
{
  "name": "process_created",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "process_container_id": ""
  }
}
{
  "name": "thread_created",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "thread_details": {
    "thread_id": "2067",
    "thread_self_exec_id": "10",
    "thread_exit_state": "0"
  }
}
{
  "name": "new_file_mapped",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "103842736050176",
    "virtual_memory_area_end_address": "103842736066560",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071318016",
    "is_main_process_executable": "1",
    "file_path": "/usr/bin/sleep",
    "file_name": "sleep"
  },
  "process_attestation_details": {
    "elf_file_inode_number": "14287898",
    "elf_file_name": "sleep",
    "elf_file_path": "/usr/bin/sleep",
    "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "elf_file_size_bytes": "35336",
    "elf_file_process_executable_state": "1",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "foreign_binary_executed",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "103842736050176",
    "virtual_memory_area_end_address": "103842736066560",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071318016",
    "is_main_process_executable": "1",
    "file_path": "/usr/bin/sleep",
    "file_name": "sleep"
  },
  "process_attestation_details": {
    "elf_file_inode_number": "14287898",
    "elf_file_name": "sleep",
    "elf_file_path": "/usr/bin/sleep",
    "elf_file_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "elf_file_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "elf_file_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "elf_file_size_bytes": "35336",
    "elf_file_process_executable_state": "1",
    "elf_file_type": "ET_DYN + INTERP segment - Executable file"
  }
}
{
  "name": "new_file_mapped",
  "process_details": {
    "process_id": "2067",
    "process_name": "sleep",
    "process_self_exec_id": "10",
    "process_parent_process_id": "2055",
    "process_cpu_clock_cycles": "1139964",
    "process_real_group_id": "0",
    "process_real_user_id": "0",
    "process_command_line_arguments": "sleep 100",
    "process_creation_time_nanoseconds": "977145605",
    "process_state": "RUNNING",
    "process_pid_namespace": "4026531836",
    "process_mount_points_namespace": "4026531841",
    "process_network_namespace": "4026531840",
    "process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
    "process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
    "process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
    "process_file_size_bytes": "35336",
    "process_folder_path": "/usr/bin/",
    "container_id": "",
    "process_container_id": ""
  },
  "process_memory_details": {
    "process_id": "2067",
    "virtual_memory_area_start_address": "132709628227584",
    "virtual_memory_area_end_address": "132709628403712",
    "memory_permissions": "r-x",
    "virtual_memory_area_file_structure": "18393486039071323648",
    "is_main_process_executable": "0",
    "file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "file_name": "ld-linux-x86-64.so.2"
  },
  "process_attestation_details": {
    "elf_file_inode_number": "14321201",
    "elf_file_name": "ld-linux-x86-64.so.2",
    "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
    "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
    "elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
    "elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
    "elf_file_size_bytes": "236616",
    "elf_file_process_executable_state": "0",
    "elf_file_type": "ET_DYN - Shared object"
  }
}
{
"name": "foreign_library_loaded",
"process_details": {
"process_id": "2067",
"process_name": "sleep",
"process_self_exec_id": "10",
"process_parent_process_id": "2055",
"process_cpu_clock_cycles": "1139964",
"process_real_group_id": "0",
"process_real_user_id": "0",
"process_command_line_arguments": "sleep 100",
"process_creation_time_nanoseconds": "977145605",
"process_state": "RUNNING",
"process_pid_namespace": "4026531836",
"process_mount_points_namespace": "4026531841",
"process_network_namespace": "4026531840",
"process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
"process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
"process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
"process_file_size_bytes": "35336",
"process_folder_path": "/usr/bin/",
"container_id": "",
"process_container_id": ""},
"process_memory_details": {
"process_id": "2067",
"virtual_memory_area_start_address": "132709628227584",
"virtual_memory_area_end_address": "132709628403712",
"memory_permissions": "r-x",
"virtual_memory_area_file_structure": "18393486039071323648",
"is_main_process_executable": "0",
"file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
"file_name": "ld-linux-x86-64.so.2"},
"process_attestation_details": {
"elf_file_inode_number": "14321201",
"elf_file_name": "ld-linux-x86-64.so.2",
"elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
"elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",
"elf_file_hash_sha1": "d6878eaa6b21fc4eee9d5e441bbf2df102f850aa",
"elf_file_hash_md5": "9d4fdd5d382e1212c9f793974ee0f44a",
"elf_file_size_bytes": "236616",
"elf_file_process_executable_state": "0",
"elf_file_type": "ET_DYN - Shared object"} } {
"name": "new_file_mapped",
"process_details": {
"process_id": "2067",
"process_name": "sleep",
"process_self_exec_id": "10",
"process_parent_process_id": "2055",
"process_cpu_clock_cycles": "1139964",
"process_real_group_id": "0",
"process_real_user_id": "0",
"process_command_line_arguments": "sleep 100",
"process_creation_time_nanoseconds": "977145605",
"process_state": "RUNNING",
"process_pid_namespace": "4026531836",
"process_mount_points_namespace": "4026531841",
"process_network_namespace": "4026531840",
"process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
"process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
"process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
"process_file_size_bytes": "35336",
"process_folder_path": "/usr/bin/",
"container_id": "",
"process_container_id": ""},
"process_memory_details": {
"process_id": "2067",
"virtual_memory_area_start_address": "132709624217600",
"virtual_memory_area_end_address": "132709625823232",
"memory_permissions": "r-x",
"virtual_memory_area_file_structure": "18393486039071319808",
"is_main_process_executable": "0",
"file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
"file_name": "libc.so.6"},
"process_attestation_details": {
"elf_file_inode_number": "14321204",
"elf_file_name": "libc.so.6",
"elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
"elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
"elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
"elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
"elf_file_size_bytes": "2125328",
"elf_file_process_executable_state": "0",
"elf_file_type": "ET_DYN + INTERP segment - Executable file"} } {
"name": "foreign_library_loaded",
"process_details": {
"process_id": "2067",
"process_name": "sleep",
"process_self_exec_id": "10",
"process_parent_process_id": "2055",
"process_cpu_clock_cycles": "1139964",
"process_real_group_id": "0",
"process_real_user_id": "0",
"process_command_line_arguments": "sleep 100",
"process_creation_time_nanoseconds": "977145605",
"process_state": "RUNNING",
"process_pid_namespace": "4026531836",
"process_mount_points_namespace": "4026531841",
"process_network_namespace": "4026531840",
"process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
"process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
"process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
"process_file_size_bytes": "35336",
"process_folder_path": "/usr/bin/",
"container_id": "",
"process_container_id": ""},
"process_memory_details": {
"process_id": "2067",
"virtual_memory_area_start_address": "132709624217600",
"virtual_memory_area_end_address": "132709625823232",
"memory_permissions": "r-x",
"virtual_memory_area_file_structure": "18393486039071319808",
"is_main_process_executable": "0",
"file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
"file_name": "libc.so.6"},
"process_attestation_details": {
"elf_file_inode_number": "14321204",
"elf_file_name": "libc.so.6",
"elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
"elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77",
"elf_file_hash_sha1": "5b02e178d9ded9b8c37a605e7a233687aa45f72f",
"elf_file_hash_md5": "289071786eab0c1910da49b2b1bfd377",
"elf_file_size_bytes": "2125328",
"elf_file_process_executable_state": "0",
"elf_file_type": "ET_DYN + INTERP segment - Executable file"} } {
"name": "process_terminated",
"process_details": {
"process_id": "2067",
"process_name": "sleep",
"process_self_exec_id": "10",
"process_parent_process_id": "2055",
"process_cpu_clock_cycles": "1139964",
"process_real_group_id": "0",
"process_real_user_id": "0",
"process_command_line_arguments": "sleep 100",
"process_creation_time_nanoseconds": "977145605",
"process_state": "RUNNING",
"process_pid_namespace": "4026531836",
"process_mount_points_namespace": "4026531841",
"process_network_namespace": "4026531840",
"process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
"process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
"process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
"process_file_size_bytes": "35336",
"process_folder_path": "/usr/bin/",
"container_id": "",
"process_container_id": ""} } {
"name": "thread_terminated",
"process_details": {
"process_id": "2067",
"process_name": "sleep",
"process_self_exec_id": "10",
"process_parent_process_id": "2055",
"process_cpu_clock_cycles": "1139964",
"process_real_group_id": "0",
"process_real_user_id": "0",
"process_command_line_arguments": "sleep 100",
"process_creation_time_nanoseconds": "977145605",
"process_state": "RUNNING",
"process_pid_namespace": "4026531836",
"process_mount_points_namespace": "4026531841",
"process_network_namespace": "4026531840",
"process_hash_sha256": "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",
"process_hash_sha1": "bab62b22ddb568b245ebc0132200a5e2ddd8577c",
"process_hash_md5": "ecdb9cd1468ff7151564b334b73161f5",
"process_file_size_bytes": "35336",
"process_folder_path": "/usr/bin/",
"container_id": "",
"process_container_id": ""},
"thread_details": {
"thread_id": "2067",
"thread_self_exec_id": "10",
"thread_exit_state": "0"} }
Done.
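The Argus output above is a sequence of whitespace-separated JSON objects followed by a "Done." trailer. The following Python sketch shows one way such output could be post-processed. It counts events by their "name" field and flags any event whose "elf_file_hash_sha256" is missing from an operator-maintained allowlist. The field names are copied from the output above. The function names, the TRUSTED_SHA256 allowlist, and the abbreviated SAMPLE input are hypothetical, and the sketch assumes that the events arrive as concatenated JSON objects exactly as shown. It is an illustration only, not part of DOCA Argus.

```python
import json
from collections import Counter

# Hypothetical allowlist of SHA-256 digests the operator treats as trusted.
# Both values are copied from the attestation output above.
TRUSTED_SHA256 = {
    "4a193eb6f25eecf27bad523cb8a53ec4d40775eb498f44760b19bfc421cc90aa",  # /usr/bin/sleep
    "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894",  # ld-linux-x86-64.so.2
}

# Hypothetical, abbreviated excerpt in the same shape as the Argus output above.
SAMPLE = """
{"name": "new_file_mapped",
 "process_details": {"process_id": "2067", "process_name": "sleep"},
 "process_attestation_details": {
   "elf_file_path": "/usr/lib/x86_64-linux-gnu/libc.so.6",
   "elf_file_hash_sha256": "de259f5276c4a991f78bf87225d6b40e56edbffe0dcbc0ffca36ec7fe30f3f77"}}
{"name": "foreign_library_loaded",
 "process_details": {"process_id": "2067", "process_name": "sleep"},
 "process_attestation_details": {
   "elf_file_path": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
   "elf_file_hash_sha256": "4f961aefd1ecbc91b6de5980623aa389ca56e8bfb5f2a1d2a0b94b54b0fde894"}}
{"name": "process_terminated",
 "process_details": {"process_id": "2067", "process_name": "sleep"}}
Done.
"""


def iter_events(stream):
    """Yield each top-level JSON object from whitespace-separated JSON text.

    Parsing stops at the first non-whitespace character that does not open
    an object, such as the trailing "Done." line.
    """
    decoder = json.JSONDecoder()
    idx, end = 0, len(stream)
    while True:
        # raw_decode() does not skip leading whitespace, so skip it here.
        while idx < end and stream[idx].isspace():
            idx += 1
        if idx >= end or stream[idx] != "{":
            return
        event, idx = decoder.raw_decode(stream, idx)
        yield event


def summarize(events, trusted_sha256):
    """Count events by name and list (name, path) pairs whose ELF hash is untrusted."""
    counts = Counter(e.get("name", "<unnamed>") for e in events)
    untrusted = []
    for e in events:
        att = e.get("process_attestation_details")
        # Events such as process_terminated carry no attestation block.
        if att and att.get("elf_file_hash_sha256") not in trusted_sha256:
            untrusted.append((e.get("name"), att.get("elf_file_path")))
    return counts, untrusted


if __name__ == "__main__":
    events = list(iter_events(SAMPLE))
    counts, untrusted = summarize(events, TRUSTED_SHA256)
    for name, n in sorted(counts.items()):
        print(f"{name}: {n}")
    for name, path in untrusted:
        print(f"UNTRUSTED {name}: {path}")
```

Run against the sample, the script prints one count line per event name. It also prints a single UNTRUSTED line for the libc.so.6 mapping, because that digest is deliberately left out of the sample allowlist. The code accepts a list rather than a generator for summarize() because the events are iterated twice.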
Authors
Boris Kovalev
Boris Kovalev has worked for the past several years as a Solutions Architect, focusing on NVIDIA Networking/Mellanox technology, and is responsible for complex machine learning, Big Data, and advanced VMware-based cloud research and design. Boris previously spent more than 20 years as a senior consultant and solutions architect at multiple companies, most recently at VMware. He has written multiple reference designs covering VMware, machine learning, Kubernetes, and container solutions, which are available on the NVIDIA Documents website.
NVIDIA, the NVIDIA logo, and BlueField are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
© 2025 NVIDIA Corporation. All rights reserved.
This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.