RDG for DPF Host-Trusted with HBN and SNAP Virtio-FS

Created on January 6, 2026

Scope

This Reference Deployment Guide (RDG) provides detailed instructions for deploying a Kubernetes (K8s) cluster using the DOCA Platform Framework (DPF) in Host-Trusted mode, together with the SNAP DPU Service in Virtio-FS mode. The guide focuses on setting up an accelerated Host-Based Networking (HBN) service on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments, and on the SNAP Virtio-FS DPU service, which exposes a Virtio-FS CSI to the cluster through the DPU, backed by an external storage target (NFS).

This guide is designed for experienced system administrators, system engineers, and solution architects who seek to deploy high-performance Kubernetes clusters with Host-Based Networking enabled on NVIDIA BlueField DPUs and a Virtio-FS CSI backed by an external storage target.

Note
  • This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.

  • While other approaches may exist to implement similar solutions, this document provides a detailed guide for this particular method.

Abbreviations and Acronyms

Term | Definition
BFB | BlueField Bootstream (OS Image)
BGP | Border Gateway Protocol
CNI | Container Network Interface
CSI | Container Storage Interface
DOCA | Data Center Infrastructure-on-a-Chip Architecture
DPF | DOCA Platform Framework
DPU | Data Processing Unit
GENEVE | Generic Network Virtualization Encapsulation
HBN | Host Based Networking
IPAM | IP Address Management
K8S | Kubernetes
MAAS | Metal as a Service
RDG | Reference Deployment Guide
RDMA | Remote Direct Memory Access
SFC | Service Function Chaining
SNAP | Storage-Defined Network Accelerated Processing
SR-IOV | Single Root Input/Output Virtualization
TOR | Top of Rack
VLAN | Virtual LAN (Local Area Network)
VNI | Virtual Network Identifier
VRF | Virtual Router/Forwarder
VRR | Virtual Router Redundancy
VTEP | Virtual Tunnel End Point

Introduction

The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.

NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.

One such service is Host-Based Networking (HBN) - a DOCA-enabled solution that allows network architects to design networks based on Layer 3 (L3) protocols. HBN enables routing on the server side by using BlueField as a BGP router. It encapsulates key networking functions in a containerized service pod, deployed directly on the BlueField’s ARM cores.

Another such service is SNAP, which supports both Block Device and File System modes. This RDG demonstrates its file system mode, Virtio-FS, which presents file system storage to the cluster from an external storage target (NFS).

In this solution, the SNAP Virtio-fs service deployed via NVIDIA DOCA Platform Framework (DPF) is composed of multiple functional components packaged into containers, which DPF orchestrates to run together with HBN. DPF simplifies DPU management by providing orchestration through a Kubernetes API. It handles the provisioning and lifecycle management of DPUs, orchestrates specialized DPU services, and automates tasks such as service function chaining (SFC).

This RDG extends the capabilities of the DPF-managed Kubernetes cluster described in the RDG for DPF Host-Trusted with HBN DPU Service (referred to as the "Baseline RDG") by adding the SNAP DPU Service in Virtio-fs mode. It demonstrates performance optimizations, including Jumbo frame implementation, with results validated through an iperf3 TCP test and a standard FIO workload test.

References

Solution Architecture

Key Components and Technologies

  • NVIDIA BlueField® Data Processing Unit (DPU)

    The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.

  • NVIDIA DOCA Software Framework

    NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.

  • NVIDIA ConnectX SmartNICs

    10/25/40/50/100/200 and 400G Ethernet Network Adapters

    The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offer advanced hardware offloads and accelerations.

    NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

  • NVIDIA LinkX Cables

    The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE in Ethernet and 100, 200 and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage and artificial intelligence, data center applications.

  • NVIDIA Spectrum Ethernet Switches

    Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.

    Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.

    NVIDIA combines the benefits of NVIDIA Spectrum switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.

  • NVIDIA Cumulus Linux

    NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.

  • NVIDIA Network Operator

    The NVIDIA Network Operator simplifies the provisioning and management of NVIDIA networking resources in a Kubernetes cluster. The operator automatically installs the required host networking software - bringing together all the needed components to provide high-speed network connectivity. These components include the NVIDIA networking driver, Kubernetes device plugin, CNI plugins, IP address management (IPAM) plugin and others. The NVIDIA Network Operator works in conjunction with the NVIDIA GPU Operator to deliver high-throughput, low-latency networking for scale-out, GPU computing clusters.

  • Kubernetes

    Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

  • Kubespray

    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes cluster configuration management tasks, and provides:

    • A highly available cluster
    • Composable attributes
    • Support for most popular Linux distributions

  • RDMA

    RDMA is a technology that allows computers in a network to exchange data without involving the processor, cache or operating system of either computer.

    Like locally based DMA, RDMA improves throughput and performance and frees up compute resources.

Solution Design

Solution Logical Design

The logical design includes the following components:

  • 1 x Hypervisor node (KVM-based) with ConnectX-7

    • 1 x Firewall VM
    • 1 x Jump VM
    • 1 x MaaS VM
    • 3 x K8s Master VMs running all K8s management components
    • 1 x Storage Target VM
  • 2 x Worker nodes (PCI Gen5), each with 1 x BlueField-3 NIC
  • Single High-Speed (HS) switch
  • 1 Gb Host Management network

[Figure: Solution logical design]

HBN service Logical Design

The HBN+SNAP-VirtioFS services deployment leverages the Service Function Chaining (SFC) capabilities inherent in the DPF system, as described in the Baseline RDG for the HBN DPU Service (refer to the section "Infrastructure Latency & Bandwidth Validation"). The following SFC logical diagram displays the complete flow for all of the services involved in the implemented solution:

[Figure: SFC logical diagram for the HBN and SNAP Virtio-FS services]

Volume Emulation Logical Diagram

The following logical diagram demonstrates the main components involved in a volume mount procedure to a workload pod.

In Host-Trusted mode, the host runs the SNAP CSI plugin, which performs all actions necessary to make storage resources available to the host. Users can use the Kubernetes storage APIs (StorageClass, PVC, PV, VolumeAttachment) to provision and attach storage to the host. Upon creation of a PersistentVolumeClaim (PVC) object in the host cluster that references a storage class specifying the SNAP CSI plugin as its provisioner, the DPF storage subsystem components bring an NFS volume, via the NFS kernel client, to the required DPU K8s worker node. The DOCA SNAP service then emulates it as a Virtio-FS volume and presents the networked storage as a local file system device to the host, which, when requested by the kubelet, is mounted into the Pod namespace by the SNAP CSI plugin.
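To illustrate the flow described above, the following is a minimal, hypothetical sketch of a StorageClass and PVC pair that reference a SNAP CSI provisioner. The provisioner string, names, and sizes below are placeholders only; the actual objects used in this guide are created by the storage configuration manifests applied later on.

Example StorageClass and PVC (hypothetical sketch)

# Sketch only - the provisioner and object names are placeholders, not the values shipped with DPF.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: snap-virtiofs-example
provisioner: csi.snap.nvidia.com   # placeholder; use the provisioner exposed by the deployed SNAP CSI plugin
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: virtiofs-pvc-example
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: snap-virtiofs-example
  resources:
    requests:
      storage: 10Gi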

Info

For complete information about the different components involved in the emulation process and how they work together, refer to: DPF Storage Development Guide - NVIDIA Docs.

[Figure: Virtio-FS device emulation logical diagram]

Firewall Design

The pfSense firewall in this solution serves two key roles:

  • Firewall – provides an isolated environment for the DPF system, ensuring secure operations
  • Router – enables Internet access for the management network

Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address on the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of both the Kubernetes (K8s) cluster and DPF components.
The following diagram illustrates the firewall design used in this solution:

[Figure: Firewall design]

Software Stack Components

[Figure: Software stack components and versions]

Warning

Make sure to use the exact same versions for the software stack as described above.

Bill of Materials

[Figure: Bill of materials]

Deployment and Configuration

Node and Switch Definitions

The following definitions and parameters are used to deploy the demonstrated fabric:

Switches Ports Usage

Hostname | Rack ID | Ports
hs-switch | 1 | swp1-5
mgmt-switch | 1 | swp1-3

Hosts

Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway
Rack1 | Hypervisor Node | hypervisor | mgmt-switch: swp1; hs-switch: swp5 | mgmt-br (interface eno2): -; lab-br (interface eno1): Trusted LAN IP | Trusted LAN GW
Rack1 | Worker Node | worker1 | mgmt-switch: swp2; hs-switch: swp1-swp2 | ens15f0: 10.0.110.21/24 | 10.0.110.254
Rack1 | Worker Node | worker2 | mgmt-switch: swp3; hs-switch: swp3-swp4 | ens15f0: 10.0.110.22/24 | 10.0.110.254
Rack1 | Firewall (Virtual) | fw | - | LAN (mgmt-br): 10.0.110.254/24; WAN (lab-br): Trusted LAN IP | Trusted LAN GW
Rack1 | Jump Node (Virtual) | jump | - | enp1s0: 10.0.110.253/24 | 10.0.110.254
Rack1 | MaaS (Virtual) | maas | - | enp1s0: 10.0.110.252/24 | 10.0.110.254
Rack1 | Storage Target Node (Virtual) | storage-target | - | enp1s0: 10.0.110.30/24; enp5s0np1: 10.0.124.1/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master1 | - | enp1s0: 10.0.110.1/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master2 | - | enp1s0: 10.0.110.2/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master3 | - | enp1s0: 10.0.110.3/24 | 10.0.110.254

Wiring

Hypervisor Node

[Figure: Hypervisor node wiring]

K8s Worker Node

[Figure: K8s worker node wiring]

Fabric Configuration

Updating Cumulus Linux

As a best practice, make sure to use the latest released Cumulus Linux NOS version.

For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.
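As a quick sanity check before and after the upgrade, you can confirm the running Cumulus Linux release from the switch. This is a hedged sketch assuming NVUE is available; the actual upgrade procedure should follow the user guide:

Switch Console

# Show the currently installed Cumulus Linux release
nv show system

# Package-level upgrade (one of the upgrade methods described in the user guide)
sudo apt-get update
sudo apt-get upgrade -y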

Configuring the Cumulus Linux Switch

Configure the SN3700 switch (hs-switch) as follows:

Info
  • The following commands configure BGP unnumbered on hs-switch

  • Cumulus Linux enables the BGP equal-cost multipathing (ECMP) option by default

SN3700 Switch Console

nv set bridge domain br_default vlan 10 vni 10
nv set evpn state enabled
nv set interface lo ipv4 address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1-5 link state up
nv set interface swp1-5 type swp
nv set interface swp5 bridge domain br_default access 10
nv set nve vxlan state enabled
nv set nve vxlan source address 11.0.0.101
nv set router bgp autonomous-system 65001
nv set router bgp state enabled
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set vrf default router bgp address-family ipv4-unicast state enabled
nv set vrf default router bgp address-family ipv4-unicast redistribute connected state enabled
nv set vrf default router bgp address-family ipv4-unicast redistribute static state enabled
nv set vrf default router bgp address-family ipv6-unicast state enabled
nv set vrf default router bgp address-family ipv6-unicast redistribute connected state enabled
nv set vrf default router bgp address-family l2vpn-evpn state enabled
nv set vrf default router bgp state enabled
nv set vrf default router bgp neighbor swp1-4 peer-group hbn
nv set vrf default router bgp neighbor swp1-4 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore enabled
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router bgp peer-group snap remote-as external
nv set vrf default router bgp peer-group snap address-family l2vpn-evpn state enabled
nv config apply -y
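Once the fabric is cabled and the HBN service is running on the DPUs (later in this guide), the BGP unnumbered sessions can be checked from the switch. A hedged verification sketch using NVUE show commands (output depends on your environment):

SN3700 Switch Console

# List BGP neighbors and their session state in the default VRF
nv show vrf default router bgp neighbor

# Confirm there is no pending (unapplied) configuration
nv config diff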

Configure the SN2201 switch (mgmt-switch) as follows:

SN2201 Switch Console

nv set bridge domain br_default untagged 1
nv set interface swp1-3 link state up
nv set interface swp1-3 type swp
nv set interface swp1-3 bridge domain br_default
nv config apply -y

Host Configuration

Warning

Ensure that SR-IOV is enabled in the BIOS settings on the worker node servers, and that the servers are tuned for maximum performance.

Warning

Make sure all worker nodes have the same PCIe placement for the BlueField-3 NIC and that they show the same interface name.
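The following checks can help confirm these prerequisites on each worker node before provisioning. This is a hedged sketch using standard Linux tools; device and interface names will differ per system:

Worker Node Console

# Verify the BlueField-3 NIC is visible and note its PCIe address (vendor ID 15b3 = NVIDIA/Mellanox)
lspci -d 15b3: -nn

# Verify that the kernel exposes SR-IOV VFs for the BlueField PF (replace ens4f0 with your PF name)
cat /sys/class/net/ens4f0/device/sriov_totalvfs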

Hypervisor Installation and Configuration

There is no change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Prepare Infrastructure Servers") regarding the Firewall VM, Jump VM, and MaaS VM.

Provision Master VMs and Worker Nodes Using MaaS

Proceed with the instructions from the Baseline RDG until you reach the subsection "Deploy Master VMs using Cloud-Init".

Use the following cloud-init script instead of the one in the Baseline RDG to install the necessary software and also configure correct routing to the storage target node:

Master node cloud-init

#cloud-config
system_info:
  default_user:
    name: depuser
    passwd: "$6$jOKPZPHD9XbG72lJ$evCabLvy1GEZ5OR1Rrece3NhWpZ2CnS0E3fu5P1VcZgcRO37e4es9gmriyh14b8Jx8gmGwHAJxs3ZEjB0s0kn/"
    lock_passwd: false
    groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
ssh_pwauth: True
package_upgrade: true
runcmd:
  - apt-get update
  - apt-get -y install nfs-common
  - |
    cat <<'EOF' | tee /etc/netplan/99-static-route.yaml
    network:
      version: 2
      ethernets:
        enp1s0:
          routes:
            - to: 10.0.124.1
              via: 10.0.110.30
    EOF
  - netplan apply

After that, proceed exactly as instructed in the Baseline RDG. In addition to the verification commands mentioned there, run the following command to verify that the static route has been configured correctly:

Master1 Console

root@master1:~# ip r
default via 10.0.110.254 dev enp1s0 proto static
10.0.110.0/24 dev enp1s0 proto kernel scope link src 10.0.110.1
10.0.124.1 via 10.0.110.30 dev enp1s0 proto static

There are no changes from the Baseline RDG to the worker node provisioning.

Storage Target Configuration

Note
  • The Storage target node is a separate, manually configured node in this RDG.

  • It will be a VM running on the hypervisor, with a ConnectX-7 NIC and an NVMe SSD attached to it as PCIe devices using PCI passthrough.

Suggested specifications:

  • vCPU: 8
  • RAM: 32GB
  • Storage:

    • VirtIO disk of 60GB size
    • NVMe SSD of 1.7TB size
  • Network interface:

    • Bridge device, connected to mgmt-br

Procedure:

  1. Perform a regular Ubuntu 24.04 installation on the Storage target VM.
  2. Create the following Netplan configuration to enable internet connectivity, DNS resolution, and set an IP in the storage high-speed subnet:

    Note

    Replace enp1s0 and enp5s0np1 with your interface names.

    Storage Target netplan

    network:
      version: 2
      ethernets:
        enp1s0:
          addresses:
          - "10.0.110.30/24"
          mtu: 9000
          nameservers:
            addresses:
            - 10.0.110.252
            search:
            - dpf.rdg.local.domain
          routes:
          - to: "default"
            via: "10.0.110.254"
        enp5s0np1:
          addresses:
          - "10.0.124.1/24"
          mtu: 9000

  3. Apply the netplan configuration:

    Storage Target Console


    sudo netplan apply

  4. Update and upgrade the system:

    Storage Target Console

    sudo apt update -y
    sudo apt upgrade -y

  5. Create XFS file system on the NVMe disk and mount it on /srv/nfs directory:

    Note

    Replace /dev/nvme0n1 with your device name.

    Storage Target Console

    sudo mkfs.xfs /dev/nvme0n1
    sudo mkdir -m 777 /srv/nfs/
    sudo mount /dev/nvme0n1 /srv/nfs/

  6. Set the mount to be persistent:

    Storage Target Console

    $ sudo blkid /dev/nvme0n1
    /dev/nvme0n1: UUID="b37df0a9-d741-4222-82c9-7a3d66ffc0e1" BLOCK_SIZE="512" TYPE="xfs"

    $ echo "/dev/disk/by-uuid/b37df0a9-d741-4222-82c9-7a3d66ffc0e1 /srv/nfs xfs defaults 0 1" | sudo tee -a /etc/fstab

  7. Install and configure an NFS server with the /srv/nfs directory (a verification sketch follows this procedure):

    Storage Target Console

    sudo apt install -y nfs-server
    echo "/srv/nfs/ 10.0.110.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
    echo "/srv/nfs/ 10.0.124.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports

  8. Restart the NFS server:

    Storage Target Console


    sudo systemctl restart nfs-server

  9. Create the directory share under /srv/nfs with the same permissions as the parent directory:

    Storage Target Console


    sudo mkdir -m 777 /srv/nfs/share
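Optionally, verify the persistent mount and the NFS exports configured in the steps above. This is a hedged sketch using standard utilities; adjust names to your environment:

Storage Target Console

# Confirm the NVMe-backed XFS filesystem is mounted where expected
findmnt /srv/nfs

# List the active NFS exports and their options
sudo exportfs -v

# Query the export list as an NFS client would see it
showmount -e localhost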

K8s Cluster Deployment and Configuration

The procedures for initial Kubernetes cluster deployment using Kubespray for the master nodes, and subsequent verification, remain unchanged from the Baseline RDG (Section "K8s Cluster Deployment and Configuration", Subsections: "Kubespray Deployment and Configuration", "Deploying Cluster Using Kubespray Ansible Playbook", "K8s Deployment Verification").

As in the Baseline RDG, worker nodes are added later, after DPF and the prerequisite components are installed.

DPF Installation

Software Prerequisites and Required Variables

Refer to the Baseline RDG (Section "DPF Installation", Subsection "Software Prerequisites and Required Variables") for software prerequisites (like helm and envsubst).

Proceed to clone the doca-platform Git repository (and make sure to use tag v25.10.0):

Jump Node Console

git clone https://github.com/NVIDIA/doca-platform.git
cd doca-platform
git checkout v25.10.0

Change to the directory containing the hbn-snap readme.md, as all commands will be run from this location:

Jump Node Console


$ cd docs/public/user-guides/host-trusted/use-cases/hbn-snap

Edit the following file to define the required variables for the installation:

Note
  • Replace the values for the variables in the following file with the values that fit your setup.

    Specifically, pay attention to DPU_P0, DPUCLUSTER_INTERFACE, and to DPU_P0_PF_NAME, DPU_P1_PF_NAME, DPU_P0_VF10_NAME, DPU_P1_VF10_NAME.

manifests/00-env-vars/envvars.env

## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
export DPUCLUSTER_VIP=10.0.110.200

## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=enp1s0

## IP address of the NFS server used for storing the BFB image.
## NOTE: This environment variable does NOT control the address of the NFS server used as a remote target by SNAP VirtioFS.
export NFS_SERVER_IP=10.0.110.253

## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca

## The repository URL for the HBN container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn

## The repository URL for the SNAP VFS container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export SNAP_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_vfs

## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca

## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.10.0

## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb"

# contains the name of the network PF 0 on the host side, e.g. enp8s0f0np0
export DPU_P0_PF_NAME=ens4f0
# contains the name of the network PF 1 on the host side, e.g. enp8s0f1np1
export DPU_P1_PF_NAME=ens4f1
# contains the name of the network VF 10 on P0 on the host side, e.g. enp8s0f0v10
export DPU_P0_VF10_NAME=ens4f0v10
# contains the name of the network VF 10 on P1 on the host side, e.g. enp8s0f1v10
export DPU_P1_VF10_NAME=ens4f1v10

Export environment variables for the installation:

Jump Node Console


source manifests/00-env-vars/envvars.env
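Optionally, confirm that the variables are present in the current shell before continuing (a simple sanity check; the grep pattern is only an example):

Jump Node Console

env | grep -E 'DPUCLUSTER|NFS_SERVER_IP|REGISTRY|TAG|BFB_URL|DPU_P'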

DPF Operator Installation

No change from the Baseline RDG (Section "DPF Installation", Subsection "DPF Operator Installation").

DPF System Installation

No change from the Baseline RDG (Section "DPF Installation", Subsection "DPF System Installation").

Install components to enable Accelerated Interfaces

Please perform this step from the Baseline RDG (Section "DPF Installation", Subsection "Install Components to enable Accelerated Interfaces").

Note that sriov_network_operator_policy.yaml is not applied at this time; it will be applied later in this guide.

DPU Deployment Installation

Before deploying the objects under the manifests/04.2-dpudeployment-installation-virtiofs/ directory, a few adjustments are needed to achieve better performance results.

Edit the DPUFlavor YAML to add the NUM_VF_MSIX firmware parameter and increase the hugepages value in the GRUB configuration:

Note

The parameter NUM_VF_MSIX is set to 48 in the provided example, which is suitable for the servers used in this RDG.

Set this value to match the physical number of cores in the NUMA node where the NIC is located.

manifests/04.2-dpudeployment-installation-virtiofs/dpuflavor.yaml

---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: hbn-snap-virtiofs-$TAG
  namespace: dpf-operator-system
spec:
  bfcfgParameters:
  - UPDATE_ATF_UEFI=yes
  - UPDATE_DPU_OS=yes
  - WITH_NIC_FW_UPDATE=yes
  configFiles:
  - operation: override
    path: /etc/mellanox/mlnx-bf.conf
    permissions: "0644"
    raw: |
      ALLOW_SHARED_RQ="no"
      IPSEC_FULL_OFFLOAD="no"
      ENABLE_ESWITCH_MULTIPORT="yes"
      RDMA_SET_NETNS_EXCLUSIVE="no"
  - operation: override
    path: /etc/mellanox/mlnx-ovs.conf
    permissions: "0644"
    raw: |
      CREATE_OVS_BRIDGES="no"
      OVS_DOCA="yes"
  - operation: override
    path: /etc/mellanox/mlnx-sf.conf
    permissions: "0644"
    raw: ""
  grub:
    kernelParameters:
    - console=hvc0
    - console=ttyAMA0
    - earlycon=pl011,0x13010000
    - fixrttc
    - net.ifnames=0
    - biosdevname=0
    - iommu.passthrough=1
    - cgroup_no_v1=net_prio,net_cls
    - hugepagesz=2048kB
    - hugepages=8192
  nvconfig:
  - device: '*'
    parameters:
    - PF_BAR2_ENABLE=0
    - PER_PF_NUM_SF=1
    - PF_TOTAL_SF=20
    - PF_SF_BAR_SIZE=10
    - NUM_PF_MSIX_VALID=0
    - PF_NUM_PF_MSIX_VALID=1
    - PF_NUM_PF_MSIX=228
    - INTERNAL_CPU_MODEL=1
    - INTERNAL_CPU_OFFLOAD_ENGINE=0
    - SRIOV_EN=1
    - NUM_OF_VFS=46
    - LAG_RESOURCE_ALLOCATION=1
    - PCI_SWITCH_EMULATION_ENABLE=1
    - PCI_SWITCH_EMULATION_NUM_PORT=32
    - VIRTIO_FS_EMULATION_ENABLE=1
    - VIRTIO_FS_EMULATION_NUM_PF=0
    - LINK_TYPE_P1=ETH
    - LINK_TYPE_P2=ETH
    - NUM_VF_MSIX=48
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }

      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-port br-sfc p1
      _ovs-vsctl set Interface p1 type=dpdk
      _ovs-vsctl set Interface p1 mtu_request=9216
      _ovs-vsctl set Port p1 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-br br-hbn
      _ovs-vsctl set bridge br-hbn datapath_type=netdev
      _ovs-vsctl set bridge br-hbn fail_mode=secure
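To choose a suitable NUM_VF_MSIX value for your own servers, check which NUMA node the BlueField-3 NIC is attached to and how many physical cores that node has. A hedged sketch run on the worker host (replace ens4f0 with your BlueField PF name):

Worker Node Console

# NUMA node the NIC is attached to
cat /sys/class/net/ens4f0/device/numa_node

# CPU topology; count the physical cores of the NUMA node reported above
lscpu | grep -E 'NUMA node[0-9]+ CPU|Thread\(s\) per core'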

The rest of the configuration files remain the same. Apply them all with the following command:

Jump Node Console


cat manifests/04.2-dpudeployment-installation-virtiofs/*.yaml | envsubst | kubectl apply -f -

This applies all the YAMLs required for the deployment: DPUDeployment, BFB, DPUFlavor, service templates and configurations for the various DPU services (seven separate service modules for SNAP and one for HBN), physical interface definitions, and IPAM definitions.

Proceed as described in the Baseline RDG until the "Infrastructure Latency & Bandwidth Validation" section, including the cluster scale-out (adding the worker nodes).

Note that the first validation command after applying the manifests above should be the following (instead of the first command that appears in the Baseline RDG):

Jump Node Console


kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_hbn-snap

Testing Storage & Network Connectivity

In the next steps, we will configure and test the Virtio-FS storage and the accelerated network connection.

This will create the SriovNetworkNodePolicy and NetworkAttachmentDefinition objects:

Jump Node Console


cat manifests/05-network-configuration/*.yaml | envsubst | kubectl apply -f -
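You can optionally confirm that the objects were created before launching the test pods. The resource kinds below assume the Multus and SR-IOV Network Operator CRDs used in this deployment; a hedged check:

Jump Node Console

# NetworkAttachmentDefinitions created for the accelerated interfaces
kubectl get network-attachment-definitions -A

# SR-IOV node policies picked up by the SR-IOV Network Operator
kubectl get sriovnetworknodepolicies -A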

And this will create the test pods:

Note

To achieve maximum TCP performance, edit the pods in test-hostdev-pods.yaml to use 24 cores instead of 16.

Jump Node Console


kubectl apply -f manifests/06-network-test/test-hostdev-pods.yaml

iPerf TCP Bandwidth Test

Connect to the first pod:

Jump Node Console


$ kubectl exec -it sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms -- bash

Before starting the iperf3 server listeners, and to achieve good results, check in another terminal which cores the pod is currently running on:

Jump Node Console

$ ssh worker1
depuser@worker1:~$ sudo -i

root@worker1:~# crictl ps | grep sriov-hostdev-pf0vf10
a4441f76405cf   0ac86781a84f1   14 minutes ago   Running   nginx   0   24f4c327d918f   sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms   default

root@worker1:~# crictl inspect a4441f76405cf | jq '.status.resources.linux.cpusetCpus'
"28-51"

Back in the first pod, use vim to create the following script, which starts multiple iperf3 servers (one per core) on different ports:

iperf_server.sh

#!/bin/bash

# Cores to bind the iperf3 server processes to
CORES=$1

# Function to expand core ranges (e.g., "10-20,40-50" -> array of individual cores)
expand_core_ranges() {
    local ranges=$1
    local cores=()
    # Split by comma to handle multiple ranges
    IFS=',' read -ra RANGE_ARRAY <<< "$ranges"
    for range in "${RANGE_ARRAY[@]}"; do
        # Check if it's a range (contains '-') or a single core
        if [[ $range == *"-"* ]]; then
            first=$(echo $range | cut -d "-" -f1)
            last=$(echo $range | cut -d "-" -f2)
            for core in $(seq $first $last); do
                cores+=($core)
            done
        else
            cores+=($range)
        fi
    done
    echo "${cores[@]}"
}

# Expand the core ranges into an array
core_array=($(expand_core_ranges "$CORES"))
ports_num=${#core_array[@]}

echo "Starting $ports_num iperf3 server processes on cores: ${core_array[@]}"

# Loop over each core and run iperf3 servers with sequential port assignment
for i in $(seq 1 $ports_num); do
    core=${core_array[$((i-1))]}
    port=$((5201 + i * 2))
    echo "Running iperf3 server $i on core $core, port $port"
    taskset -c $core iperf3 -s -p $port > /dev/null 2>&1 &
done

For best performance, set a 9K MTU on the net1 interface, then start the script with the CPU range identified earlier (leave one core as a buffer):

First Pod Console

root@sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms:/# ip link set net1 mtu 9000
root@sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms:/# chmod +x iperf_server.sh
root@sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms:/# ./iperf_server.sh 28-51
Starting 24 iperf3 server processes on cores: 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
Running iperf3 server 1
Running iperf3 server 2
...
...
Running iperf3 server 23
Running iperf3 server 24

root@sriov-hostdev-pf0vf10-test-worker1-5bccdc4c75-97xms:/# ps -ef | grep iperf3
38 root      0:00 iperf3 -s -p 5203
39 root      0:00 iperf3 -s -p 5205
...
...
60 root      0:27 iperf3 -s -p 5247
61 root      0:40 iperf3 -s -p 5249

Connect to the second pod:

Jump Node Console


$ kubectl exec -it sriov-hostdev-pf0vf10-test-worker2-85b7cb76fd-qmljl -- bash

Follow the previously displayed method to identify the CPU cores that the second pod is running on. In our case it was the same range (28-51).

Use vim to create the following script to start multiple iperf3 clients that will connect to each iperf3 server in the first pod:

Note
  • The script receives 3 parameters: the server IP to connect to, the CPU cores on which to spawn the iperf3 processes, and the duration of the iperf3 test. Make sure to provide all 3 when invoking the script, with the CPU cores given as a range (e.g., 28-51).

iperf_client.sh

#!/bin/bash

# IP address of the server where iperf3 servers are running
SERVER_IP=$1  # Change to your server's IP

# Cores to bind the iperf3 client processes to
CORES=$2

# Duration to run the iperf3 test
DUR=$3

# Variable to accumulate the total bandwidth in Gbit/sec
total_bandwidth_Gbit=0

# Function to expand core ranges (e.g., "10-20,40-50" -> array of individual cores)
expand_core_ranges() {
    local ranges=$1
    local cores=()
    # Split by comma to handle multiple ranges
    IFS=',' read -ra RANGE_ARRAY <<< "$ranges"
    for range in "${RANGE_ARRAY[@]}"; do
        # Check if it's a range (contains '-') or a single core
        if [[ $range == *"-"* ]]; then
            first=$(echo $range | cut -d "-" -f1)
            last=$(echo $range | cut -d "-" -f2)
            for core in $(seq $first $last); do
                cores+=($core)
            done
        else
            cores+=($range)
        fi
    done
    echo "${cores[@]}"
}

# Expand the core ranges into an array
core_array=($(expand_core_ranges "$CORES"))
ports_num=${#core_array[@]}

echo "Starting $ports_num iperf3 client processes on cores: ${core_array[@]}"

# Array to store the PIDs of background tasks
pids=()

# Loop over each core and run iperf3 clients with sequential port assignment
for i in $(seq 1 $ports_num); do
    port=$((5201 + i * 2))
    cpu_core=${core_array[$((i-1))]}  # Assign CPU core from the expanded array
    output_file="iperf3_client_results_$port.log"

    echo "Running iperf3 client $i on core $cpu_core, connecting to port $port"
    # Run the iperf3 client in the background with CPU core binding
    timeout $(( DUR + 5 )) taskset -c $cpu_core iperf3 -Z -c $SERVER_IP -p $port -t $DUR -J > $output_file &
    pid=$!
    pids+=("$pid")
done

# Wait for all background tasks to complete and check their status
for pid in "${pids[@]}"; do
    wait $pid
    if [[ $? -ne 0 ]]; then
        echo "Process with PID $pid failed or timed out."
    fi
done

# Summarize the results from each log file
echo "Summary of iperf3 client results:"
for i in $(seq 1 $ports_num); do
    port=$((5201 + i * 2))
    output_file="iperf3_client_results_$port.log"

    if [[ -f $output_file ]]; then
        echo "Results for port $port:"

        # Parse the results and print a summary
        bandwidth_bps=$(jq '.end.sum_received.bits_per_second' $output_file)

        if [[ -n $bandwidth_bps ]]; then
            # Convert bandwidth from bps to Gbit/sec
            bandwidth_Gbit=$(echo "scale=3; $bandwidth_bps / 1000000000" | bc)
            echo "  Bandwidth: $bandwidth_Gbit Gbit/sec"

            # Accumulate the bandwidth for the total summary
            total_bandwidth_Gbit=$(echo "scale=3; $total_bandwidth_Gbit + $bandwidth_Gbit" | bc)

            # Delete current log file
            rm $output_file
        else
            echo "No bandwidth data found in $output_file"
        fi

    else
        echo "No results found for port $port"
    fi
done

# Print the total bandwidth summary
echo "Total Bandwidth across all streams: $total_bandwidth_Gbit Gbit/sec"

Again, set a 9K MTU on net1 for maximum performance, then run the script and check the performance results:

Second Pod Console

root@sriov-hostdev-pf0vf10-test-worker2-85b7cb76fd-qmljl:/# ip link set net1 mtu 9000
root@sriov-hostdev-pf0vf10-test-worker2-85b7cb76fd-qmljl:/# chmod +x iperf_client.sh
root@sriov-hostdev-pf0vf10-test-worker2-85b7cb76fd-qmljl:/# ./iperf_client.sh 10.0.121.1 28-51 30

Summary of iperf3 client results:
Results for port 5203:
  Bandwidth: 14.207 Gbit/sec
Results for port 5205:
  Bandwidth: 22.445 Gbit/sec
Results for port 5207:
  Bandwidth: 8.868 Gbit/sec
Results for port 5209:
  Bandwidth: 11.115 Gbit/sec
Results for port 5211:
  Bandwidth: 14.104 Gbit/sec
Results for port 5213:
  Bandwidth: 13.387 Gbit/sec
Results for port 5215:
  Bandwidth: 22.743 Gbit/sec
Results for port 5217:
  Bandwidth: 12.132 Gbit/sec
Results for port 5219:
  Bandwidth: 13.927 Gbit/sec
Results for port 5221:
  Bandwidth: 13.470 Gbit/sec
Results for port 5223:
  Bandwidth: 22.720 Gbit/sec
Results for port 5225:
  Bandwidth: 14.771 Gbit/sec
Results for port 5227:
  Bandwidth: 12.752 Gbit/sec
Results for port 5229:
  Bandwidth: 9.174 Gbit/sec
Results for port 5231:
  Bandwidth: 14.265 Gbit/sec
Results for port 5233:
  Bandwidth: 24.338 Gbit/sec
Results for port 5235:
  Bandwidth: 14.087 Gbit/sec
Results for port 5237:
  Bandwidth: 13.353 Gbit/sec
Results for port 5239:
  Bandwidth: 14.555 Gbit/sec
Results for port 5241:
  Bandwidth: 20.808 Gbit/sec
Results for port 5243:
  Bandwidth: 13.056 Gbit/sec
Results for port 5245:
  Bandwidth: 16.648 Gbit/sec
Results for port 5247:
  Bandwidth: 17.545 Gbit/sec
Results for port 5249:
  Bandwidth: 20.905 Gbit/sec
Total Bandwidth across all streams: 375.375 Gbit/sec

Storage Test

The following command defines the DPUStorageVendor for the NFS CSI and the DPUStoragePolicy for the filesystem policy:

Jump Node Console


cat manifests/07.2-storage-configuration-virtiofs/*.yaml | envsubst | kubectl apply -f -

Verify the DPUStorageVendor and DPUStoragePolicy objects are ready:

Jump Node Console

kubectl wait --for=condition=Ready --namespace dpf-operator-system dpustoragevendors --all
kubectl wait --for=condition=Ready --namespace dpf-operator-system dpustoragepolicies --all

Deploy storage test pods that mount a storage volume provided by SNAP VirtioFS:

Jump Node Console


kubectl apply -f manifests/08.2-storage-test-virtiofs
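Before checking the attachment details, you can optionally confirm that the volume was provisioned and that the test pod is running. These are standard Kubernetes checks; object names depend on the applied manifests:

Jump Node Console

# The PVC should become Bound once the SNAP CSI plugin provisions the volume
kubectl get pvc,pv

# The storage test pod should reach Running state
kubectl get pods | grep storage-test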

Check the virtiofs-tag name:

Jump Node Console

$ kubectl get dpuvolumeattachments.storage.dpu.nvidia.com -A -o json | jq '.items[0].status.dpu.virtioFSAttrs.filesystemTag'
"3e76e376579383d2tag"

Connect to the test pod, validate that the virtiofs filesystem is mounted with the tag name from the previous step, and install the fio software:

Jump Node Console

$ kubectl exec -it storage-test-pod-virtiofs-hotplug-pf-0 -- bash

root@storage-test-pod-virtiofs-hotplug-pf-0:/# df -Th
Filesystem           Type      Size  Used Avail Use% Mounted on
overlay              overlay   439G   17G  400G   4% /
tmpfs                tmpfs      64M     0   64M   0% /dev
3e76e376579383d2tag  virtiofs  1.8T   45G  1.8T   3% /mnt/vol1
/dev/nvme0n1p2       ext4      439G   17G  400G   4% /etc/hosts
shm                  tmpfs      64M     0   64M   0% /dev/shm
tmpfs                tmpfs     251G   12K  251G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                tmpfs     126G     0  126G   0% /proc/acpi
tmpfs                tmpfs     126G     0  126G   0% /proc/scsi
tmpfs                tmpfs     126G     0  126G   0% /sys/firmware
tmpfs                tmpfs     126G     0  126G   0% /sys/devices/virtual/powercap

root@storage-test-pod-virtiofs-hotplug-pf-0:/# apt update && apt install -y vim fio

Using vim, create the following file:

job-4k.fio

[global]
ioengine=libaio
direct=1
iodepth=32
rw=read
bs=4k
size=1G
numjobs=8
runtime=60
time_based
group_reporting

[job1]
filename=/mnt/vol1/test.fio

Finally, run the fio test:

Jump Node Console

root@storage-test-pod-virtiofs-hotplug-pf-0:/# fio job-4k.fio
job1: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.10
...
...
Starting 8 processes
job1: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 8 (f=8): [R(8)] [100.0% done] [826.1MB/0KB/0KB /s] [212K/0/0 iops] [eta 00m:00s]
job1: (groupid=0, jobs=8): err= 0: pid=1183: Mon Dec  1 10:31:32 2025
  read : io=47664MB, bw=813351KB/s, iops=203337, runt= 60008msec
    slat (usec): min=0, max=679, avg= 6.90, stdev= 4.13
    clat (usec): min=167, max=135036, avg=1250.42, stdev=4941.25
     lat (usec): min=170, max=135038, avg=1257.36, stdev=4940.79
    clat percentiles (usec):
     |  1.00th=[  258],  5.00th=[  278], 10.00th=[  286], 20.00th=[  298],
     | 30.00th=[  302], 40.00th=[  310], 50.00th=[  314], 60.00th=[  322],
     | 70.00th=[  326], 80.00th=[  338], 90.00th=[  358], 95.00th=[  470],
     | 99.00th=[27520], 99.50th=[32128], 99.90th=[46336], 99.95th=[52992],
     | 99.99th=[68096]
    bw (KB  /s): min=85832, max=121912, per=12.51%, avg=101789.00, stdev=5105.93
    lat (usec) : 250=0.39%, 500=95.22%, 750=0.55%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=1.05%, 50=2.70%
    lat (msec) : 100=0.07%, 250=0.01%
  cpu          : usr=2.78%, sys=24.20%, ctx=8652632, majf=0, minf=340
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=12201896/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=47664MB, aggrb=813351KB/s, minb=813351KB/s, maxb=813351KB/s, mint=60008msec, maxt=60008msec
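Optionally, a similar job file can be used to exercise random-write performance on the same Virtio-FS mount. This is a sketch only and is not part of the validated results above; adjust the block size, size, and runtime to your environment:

job-4k-write.fio (optional sketch)

[global]
ioengine=libaio
direct=1
iodepth=32
rw=randwrite
bs=4k
size=1G
numjobs=8
runtime=60
time_based
group_reporting

[job1]
filename=/mnt/vol1/test-write.fio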

Done!

Authors



Guy Zilberman

Guy Zilberman is a solution architect at NVIDIA's Networking Solutions Labs, bringing extensive experience from several leadership roles in cloud computing. He specializes in designing and implementing solutions for cloud and containerized workloads, leveraging NVIDIA's advanced networking technologies. His work primarily focuses on open-source cloud infrastructure, with expertise in platforms such as Kubernetes (K8s) and OpenStack.



Shachar Dor

Shachar Dor joined the Solutions Lab team after working more than ten years as a software architect at NVIDIA Networking (previously Mellanox Technologies), where he was responsible for the architecture of network management products and solutions. Shachar's focus is on networking technologies, especially around fabric bring-up, configuration, monitoring, and life-cycle management.

Shachar has a strong background in software architecture, design, and programming through his work on multiple projects and technologies also prior to joining the company.

NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

© 2025 NVIDIA Corporation. All rights reserved.

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

Last updated on Jan 6, 2026