RDG for DPF Zero Trust (DPF-ZT) with HBN and SNAP Block Bootable Disk DPU Services
Created on Jan 1, 2026 (v 25.10)
This Reference Deployment Guide (RDG) provides detailed instructions for deploying the NVIDIA DOCA Platform Framework (DPF) in Zero-Trust mode on high-performance, bare-metal infrastructure. The guide focuses on configuring Host-Based Networking (HBN) and Storage-Defined Network Accelerated Processing (SNAP) services on NVIDIA® BlueField®-3 DPUs, with SNAP operating in block device mode. This deployment delivers secure, isolated, and hardware-accelerated environments optimized for Zero-Trust architectures.
This document is an extension of the RDG for DPF Zero Trust (DPF-ZT) (referred to as the Baseline RDG). It details the additional steps and modifications required to deploy and orchestrate the SNAP DPU service together with the HBN DPU service, on top of the services described in the Baseline RDG.
The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments with Host-Based Networking enabled using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.
This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.
While other approaches may exist to implement similar solutions, this document provides a detailed guide for this particular method.
| Term | Definition | Term | Definition |
|---|---|---|---|
| BFB | BlueField Bootstream | MAAS | Metal as a Service |
| BGP | Border Gateway Protocol | OVN | Open Virtual Network |
| CNI | Container Network Interface | PVC | Persistent Volume Claim |
| CRD | Custom Resource Definition | RDG | Reference Deployment Guide |
| CSI | Container Storage Interface | RDMA | Remote Direct Memory Access |
| DHCP | Dynamic Host Configuration Protocol | SF | Scalable Function |
| DOCA | Data Center Infrastructure-on-a-Chip Architecture | SFC | Service Function Chaining |
| DOCA SNAP | NVIDIA® DOCA™ Storage-Defined Network Accelerated Processing | SPDK | Storage Performance Development Kit |
| DPF | DOCA Platform Framework | SR-IOV | Single Root Input/Output Virtualization |
| DPU | Data Processing Unit | TOR | Top of Rack |
| DTS | DOCA Telemetry Service | VF | Virtual Function |
| GENEVE | Generic Network Virtualization Encapsulation | VLAN | Virtual LAN (Local Area Network) |
| HBN | Host-Based Networking | VRR | Virtual Router Redundancy |
| IPAM | IP Address Management | VTEP | Virtual Tunnel End Point |
| K8S | Kubernetes | VXLAN | Virtual Extensible LAN |
The NVIDIA BlueField-3 Data Processing Unit is a powerful infrastructure compute platform designed for high-speed processing of software-defined networking, storage, and cybersecurity. With up to 400 Gb/s of throughput, BlueField-3 combines robust computing, high-speed networking, and extensive programmability to deliver hardware-accelerated, software-defined solutions for demanding workloads.
Deploying and managing DPUs and their associated DOCA services, especially at scale, can be quite challenging. Without a proper provisioning and orchestration system, handling the DPU lifecycle and configuring DOCA services place a heavy operational burden on system administrators. The NVIDIA DOCA Platform Framework (DPF) addresses this challenge by streamlining and automating the lifecycle management of DOCA services.
NVIDIA DOCA unleashes the full power of the BlueField® platform, empowering organizations to rapidly build next-generation applications and services that offload, accelerate, and isolate critical data center workloads. By leveraging DOCA, businesses can achieve unmatched performance, security, and efficiency across modern infrastructure.
A prime example of this innovation is NVIDIA DOCA SNAP, a breakthrough DPU-based storage solution designed to accelerate and optimize storage protocols using BlueField's advanced hardware acceleration. DOCA SNAP delivers a family of services that virtualize local storage at the hardware level, presenting networked storage as local block devices to the host and emulating physical drives over the PCIe bus. In our use case, the block device serves as the boot drive. With DOCA SNAP, organizations gain high-performance, low-latency access to storage by bypassing traditional filesystem overhead and interacting directly with raw block devices. This results in faster data access, reduced CPU utilization, and improved workload efficiency. Integrated into the DOCA Platform Framework (DPF), SNAP is packaged as containerized components deployed seamlessly across the x86 and DPU Kubernetes clusters, delivering a scalable, Zero-Trust architecture for the modern data center.
DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:
The DPU is managed through its Baseboard Management Controller ( BMC )
All management traffic occurs over the DPU's out-of-band ( OOB ) network
The host is considered an untrusted entity with respect to the data center network. The DPU acts as a barrier between the host and the network.
The host sees the DPU as a standard NIC , with no access to the internal DPU management plane (Zero Trust Mode)
This Reference Deployment Guide (RDG) provides a step-by-step example for installing DPF in Zero-Trust mode with HBN and SNAP DPU services. As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide includes the full end-to-end deployment process, including:
Infrastructure provisioning
DPF deployment
DPU provisioning (Redfish)
Service configuration and deployment
Service chaining
In this guide, the Storage Performance Development Kit (SPDK) is used as an example of a storage backend service.
This storage backend service is used only for demonstration purposes and is not intended or supported for production use cases.
Key Components and Technologies
NVIDIA BlueField® Data Processing Unit (DPU)
The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.
NVIDIA DOCA Software Framework
NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.
10/25/40/50/100/200 and 400G Ethernet Network Adapters
The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.
NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.
The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE in Ethernet and 100, 200 and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage and artificial intelligence, data center applications.
NVIDIA Spectrum Ethernet Switches
Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux , SONiC and NVIDIA Onyx®.
NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.
Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS and Kubernetes cluster configuration management tasks, and provides:
A highly available cluster
Composable attributes
Support for most popular Linux distributions
Solution Design
Solution Logical Design
The logical design includes the following components:
1 x Hypervisor node (KVM-based) with ConnectX-7:
1 x Firewall VM
1 x Jump Node VM
1 x MaaS VM
1 x DPU DHCP VM
3 x K8s Master VMs running all K8s management components
2 x Worker nodes (PCIe Gen5), each with 1 x BlueField-3 NIC
Storage Target Node with ConnectX-7 and SPDK target apps
Single High-Speed (HS) switch
1 Gb Host Management network
SFC Logical Diagram
The DOCA Platform Framework simplifies DPU management by providing orchestration through a K8s API. It handles the provisioning and lifecycle management of DPUs, orchestrates specialized DPU services, and automates service function chaining (SFC) tasks. This ensures seamless deployment of NVIDIA DOCA services, allowing traffic to be efficiently offloaded and routed through HBN's data plane. The SFC logical diagram implemented in this guide is shown below.
Disk Emulation Logical Diagram
The following logical diagram illustrates the main components involved in mounting a disk to a tenant workload pod.
Upon receiving a request for a new emulated NVMe drive, the DOCA SNAP components attach a block device (BDEV) over NVMe-oF, using either the RDMA or TCP storage protocol, to the required bare-metal worker node. The DPU then emulates it as a block device on the x86 host via the "BlueField NVMe SNAP Controller".
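For intuition only, the DPU-side attach is conceptually similar to a manual NVMe-oF connection from an initiator to the storage target, after which the namespace is exposed to the host as an emulated drive. The sketch below is illustrative and not a deployment step; SNAP performs the equivalent attach internally on the DPU, and the subsystem NQN shown is a placeholder:

# Illustrative only - SNAP establishes the equivalent NVMe-oF connection on the DPU itself
nvme connect -t rdma -a 10.0.124.1 -s 4420 -n <target-subsystem-nqn>
nvme list   # the remote namespace then appears as a local NVMe block device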
Firewall Design
The pfSense firewall in this solution serves a dual purpose:
Firewall – Provides an isolated environment for the DPF system, ensuring secure operations
Router – Enables internet access and connectivity between the host management network and the high-speed network
Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.
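For example, assuming the firewall forwards WAN TCP port 2222 to the jump node's SSH port (the port number and user name below are illustrative assumptions, not values defined by this guide), administrators on the trusted LAN can reach the jump node as follows:

# Illustrative example of reaching the jump node through the firewall's port-forwarding rule
ssh -p 2222 <jump-node-user>@<firewall-wan-ip>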
The following diagram illustrates the firewall design used in this solution:
Software Stack Components
Make sure to use the exact same versions for the software stack as described above.
Bill of Materials
Node and Switch Definitions
These are the definitions and parameters used for deploying the demonstrated fabric:
Switch Port Usage

| Switch | Quantity | Ports |
|---|---|---|
| mgmt-switch (SN2201) | 1 | swp1-6 |
| hs-switch (SN3700) | 1 | swp1, swp2, swp11-14, swp32 |
Hosts

| Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway |
|---|---|---|---|---|---|
| Rack1 | Hypervisor Node | | mgmt-switch: swp1 / hs-switch: | lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface ens2f0np0): | Trusted LAN GW |
| Rack1 | Storage Target Node | | mgmt-switch: / hs-switch: | enp1s0f0: 10.0.110.25/24; enp144s0f0np0: 10.0.124.1/24 | 10.0.110.254 |
| Rack1 | Worker Node | | mgmt-switch: / hs-switch: | ens15f0: 10.0.110.21/24; ens5f0np0/ens5f1np1: 10.0.120.0/22 | 10.0.110.254 |
| Rack1 | Worker Node | | mgmt-switch: / hs-switch: | ens15f0: 10.0.110.22/24; ens5f0np0/ens5f1np1: 10.0.120.0/22 | 10.0.110.254 |
| Rack1 | Firewall (Virtual) | | - | WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 172.169.50.1/30 | Trusted LAN GW |
| Rack1 | Jump Node (Virtual) | | - | enp1s0: 10.0.110.253/24 | 10.0.110.254 |
| Rack1 | MAAS (Virtual) | | - | enp1s0: 10.0.110.252/24 | 10.0.110.254 |
| Rack1 | DPU DHCP (Virtual) | | - | enp1s0: 10.0.125.4/24 | 10.0.125.1 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.1/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.2/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.3/24 | 10.0.110.254 |
Wiring
Hypervisor Node
Bare Metal Worker Node
Storage Target Node
Fabric Configuration
Updating Cumulus Linux
As a best practice, make sure to use the latest released Cumulus Linux NOS version.
For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.
Configuring the Cumulus Linux Switch
The SN3700 switch (hs-switch) is configured as follows. The commands below configure BGP unnumbered on hs-switch; note that Cumulus Linux enables BGP equal-cost multipathing (ECMP) by default.
SN3700 Switch Console
nv set bridge domain br_default vlan 10 vni 10
nv set evpn state enabled
nv set interface eth0 ipv4 dhcp-client state enabled
nv set interface eth0 type eth
nv set interface eth0 vrf mgmt
nv set interface lo ipv4 address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1 ipv4 address 172.169.50.2/30
nv set interface swp1 link speed auto
nv set interface swp1-32 type swp
nv set interface swp2 ipv4 address 10.0.125.254/24
nv set interface swp32 bridge domain br_default access 10
nv set nve vxlan source address 11.0.0.101
nv set nve vxlan state enabled
nv set qos roce mode lossless
nv set qos roce state enabled
nv set router bgp autonomous-system 65001
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set router bgp state enabled
nv set system hostname hs-switch
nv set vrf default router bgp address-family ipv4-unicast network 10.0.125.0/24
nv set vrf default router bgp address-family ipv4-unicast network 11.0.0.101/32
nv set vrf default router bgp address-family ipv4-unicast state enabled
nv set vrf default router bgp address-family ipv6-unicast redistribute connected state enabled
nv set vrf default router bgp address-family ipv6-unicast state enabled
nv set vrf default router bgp address-family l2vpn-evpn state enabled
nv set vrf default router bgp neighbor swp11 enforce-first-as disabled
nv set vrf default router bgp neighbor swp11 peer-group hbn
nv set vrf default router bgp neighbor swp11 type unnumbered
nv set vrf default router bgp neighbor swp12 enforce-first-as disabled
nv set vrf default router bgp neighbor swp12 peer-group hbn
nv set vrf default router bgp neighbor swp12 type unnumbered
nv set vrf default router bgp neighbor swp13 enforce-first-as disabled
nv set vrf default router bgp neighbor swp13 peer-group hbn
nv set vrf default router bgp neighbor swp13 type unnumbered
nv set vrf default router bgp neighbor swp14 enforce-first-as disabled
nv set vrf default router bgp neighbor swp14 peer-group hbn
nv set vrf default router bgp neighbor swp14 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore enabled
nv set vrf default router bgp peer-group hbn address-family ipv4-unicast default-route-origination state enabled
nv set vrf default router bgp peer-group hbn address-family ipv4-unicast state enabled
nv set vrf default router bgp peer-group hbn address-family ipv6-unicast state enabled
nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn state enabled
nv set vrf default router bgp peer-group hbn enforce-first-as disabled
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router bgp state enabled
nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
nv set vrf default router static 0.0.0.0/0 via 172.169.50.1 type ipv4-address
nv config apply -y
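Optionally, after applying the configuration (and once the HBN services on the DPUs come up later in the deployment), the switch state can be inspected with NVUE show commands. This is a brief sketch, assuming these show commands are available on the installed Cumulus Linux release:

SN3700 Switch Console

nv show interface
nv show vrf default router bgp neighbor
nv show evpn
nv show qos roce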
The SN2201 switch (mgmt-switch) is configured as follows:
SN2201 Switch Console
nv set bridge domain br_default untagged 1
nv set interface swp1-6 link state up
nv set interface swp1-6 type swp
nv set interface swp1-6 bridge domain br_default
nv config apply -y
Installation and Configuration
Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.
All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.
Make sure that you have DPU BMC and OOB MAC addresses.
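To help confirm that all worker nodes expose the BlueField-3 NIC at the same PCIe address and with the same interface name (as required above), the following illustrative commands can be run on each worker node and the outputs compared:

Worker Node Console

lspci -D | grep -i bluefield
ip -br link show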
No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Host Configuration").
Hypervisor Installation and Configuration
No change from the Baseline RDG (Section "Hypervisor Installation and Configuration").
Prepare Infrastructure Servers
No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Prepare Infrastructure Servers") regarding Jump VM, MaaS VM.
The Firewall VM should be configured according to the section "Firewall VM - pfSense Installation and Interface Configuration" in the RDG for DPF with OVN-Kubernetes and HBN Services.
Provisioning "DPU DHCP VM"
Install Rocky Linux 9.0 in the minimal server configuration.
Manually configure the IP address 10.0.125.4/24 with default gateway 10.0.125.1 and your preferred DNS server, as sketched below.
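On Rocky Linux, the static address can be applied with NetworkManager's nmcli. This is a minimal sketch, assuming the connection profile is named enp1s0 and 8.8.8.8 is the preferred DNS server:

Jump Node Console

sudo nmcli connection modify enp1s0 ipv4.method manual ipv4.addresses 10.0.125.4/24 ipv4.gateway 10.0.125.1 ipv4.dns 8.8.8.8
sudo nmcli connection up enp1s0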
Install the following packages:
Jump Node Console
sudo dnf -y update
sudo dnf install -y lldpd dnsmasq
Apply the following configuration to dnsmasq in the /etc/dnsmasq.conf file:
/etc/dnsmasq.conf
#
#Disable the DNS server set:
port=0
# port=53
#
#Setup the server to be your authoritative DHCP server
#
dhcp-authoritative
#
#Set the DHCP server to hand addresses sequentially
#
dhcp-sequential-ip
#
#Enable more detailed logging for DHCP
#
log-dhcp
log-queries
no-resolv
log-facility=/var/log/dnsmasq.log
domain=x86.dpf.rdg.local.domain
local=/x86.dpf.rdg.local.domain/
server=8.8.8.8
#
#Create different dhcp scopes for each of the three simulated subnets here, using tags for ID
#Format is: dhcp-range=<your_tag_here>,<start_of_scope>,<end_of_scope>,<subnet_mask>,<lease_time>
#
dhcp-range=subnet0,10.0.120.2,10.0.120.6,255.255.255.248,8h
dhcp-option=subnet0,42,192.114.62.250
dhcp-option=subnet0,6,10.0.125.4
dhcp-option=subnet0,3,10.0.120.1
dhcp-range=subnet1,10.0.120.10,10.0.120.14,255.255.255.248,8h
dhcp-option=subnet1,42,192.114.62.250
dhcp-option=subnet1,6,10.0.125.4
dhcp-option=subnet1,3,10.0.120.9
Info: The following dnsmasq configuration is customized for our specific deployment use case and should not be used as a default configuration.
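Optionally, validate the configuration syntax before starting the service using dnsmasq's built-in test mode:

Jump Node Console

sudo dnsmasq --test -C /etc/dnsmasq.conf
### Expected output: ###
dnsmasq: syntax check OK.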
Start dnsmasq.service and enable it to start automatically at boot:
Jump Node Console
sudo systemctl start dnsmasq.service
sudo systemctl enable dnsmasq.service
Check service status
Jump Node Console
sudo systemctl status dnsmasq.service

### Command output should look like: ###

dnsmasq.service - DNS caching server.
     Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; preset: disabled)
     Active: active (running) since Wed 2025-12-24 08:49:28 EST; 2 weeks 3 days ago
 Invocation: 10eb617fa5fe4bedb1fc021ddcc7751f
    Process: 1172 ExecStart=/usr/sbin/dnsmasq (code=exited, status=0/SUCCESS)
   Main PID: 1193 (dnsmasq)
      Tasks: 1 (limit: 23017)
     Memory: 2M (peak: 2.5M)
        CPU: 112ms
     CGroup: /system.slice/dnsmasq.service
             └─1193 /usr/sbin/dnsmasq

Dec 24 08:49:28 hbn-dhcp systemd[1]: Starting dnsmasq.service - DNS caching server....
Dec 24 08:49:28 hbn-dhcp systemd[1]: Started dnsmasq.service - DNS caching server..
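Later, once DPU-side clients start requesting addresses, DHCP activity can be observed in the log and lease files. This is a minimal sketch; the lease file path is the distribution default and may differ on your system:

Jump Node Console

sudo tail -n 20 /var/log/dnsmasq.log
sudo cat /var/lib/dnsmasq/dnsmasq.leases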
Provision SPDK Target Apps on Storage Target Node
Log in to the Storage Target Node and switch to the root account:
Jump Node Console
$ ssh target
$ sudo -i
Build SPDK from source (root privileges are required):
Jump Node Console
git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init
apt update && apt install meson python3-pyelftools -y
./scripts/pkgdep.sh --rdma
./configure --with-rdma
make
Run SPDK target:
Jump Node Console
# Get all nvme devices
lshw -c storage -businfo
Bus info          Device   Class     Description
===========================================================
pci@0000:08:00.0           storage   PCIe Data Center SSD
pci@0000:00:11.4           storage   C610/X99 series chipset sSATA Controller [AHCI mode]
pci@0000:00:1f.2           storage   C610/X99 series chipset 6-Port SATA Controller [AHCI mode]
pci@0000:81:00.0  scsi4    storage   MegaRAID SAS-3 3108 [Invader]

# Start target
scripts/setup.sh
build/bin/nvmf_tgt &

# Add bdevs with nvme backend
scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:08:00.0

# Add logical volume store on base bdev
scripts/rpc.py bdev_lvol_create_lvstore Nvme0n1 lvs0

# Display current logical volume list
scripts/rpc.py bdev_lvol_get_lvstores

scripts/rpc_http_proxy.py 10.0.110.25 8000 exampleuser examplepassword &
SPDK target is ready.
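Optionally, confirm that the RPC HTTP proxy started above answers remote JSON-RPC calls. The sketch below uses curl with the example credentials; the exact request and response format follows SPDK's JSON-RPC conventions and may vary between SPDK versions:

Jump Node Console

curl -u exampleuser:examplepassword -X POST http://10.0.110.25:8000 \
     -H "Content-Type: application/json" \
     -d '{"id": 1, "method": "bdev_lvol_get_lvstores"}'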
Provision Master VMs and Worker Nodes Using MaaS
No change from the Baseline RDG (Section "Provision Master VMs Using MaaS").
UEFI BIOS mode is required for the bare-metal worker nodes.
DPU Service Installation
Adjust the DPUDeployment, DPUServiceConfiguration, DPUServiceTemplate, and other necessary objects according to your setup environment.
Before deploying the objects under the doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap directory, a few adjustments are required.
Several environment variables must be set before running the deployment commands; they are defined in manifests/00-env-vars/envvars.env and sourced in a later step.
Change to the directory containing the readme.md, from which all subsequent commands will be run:
Jump Node Console
$ cd doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap
Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file.
Warning: Replace the values of the variables in the following file with values that fit your setup. Specifically, pay attention to DPUCLUSTER_INTERFACE and BMC_ROOT_PASSWORD.
manifests/00-env-vars/envvars.env
## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10

## Port for the Kubernetes API server of the target cluster on which DPF is installed.
export TARGETCLUSTER_API_SERVER_PORT=6443

## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
export DPUCLUSTER_VIP=10.0.110.200

## DPU_P0 is the name of the first port of the DPU. This name must be the same on all worker nodes.
#export DPU_P0=enp204s0f0np0

## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=enp1s0

# IP address to the NFS server used as storage for the BFB.
export NFS_SERVER_IP=10.0.110.253

## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca

## The repository URL for the HBN container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn

## The repository URL for the SNAP VFS container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export SNAP_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_vfs

## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca

## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.10.0

## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb"

## IP_RANGE_START and IP_RANGE_END
## These define the IP range for DPU discovery via Redfish/BMC interfaces
## Example: If your DPUs have BMC IPs in range 192.168.1.100-110
## export IP_RANGE_START=192.168.1.100
## export IP_RANGE_END=192.168.1.110

## IP_RANGE_START and IP_RANGE_END
## Start of DPUDiscovery IpRange
export IP_RANGE_START=10.0.110.75
## End of DPUDiscovery IpRange
export IP_RANGE_END=10.0.110.76

# The password used for DPU BMC root login, must be the same for all DPUs
# For more information on how to set the BMC root password refer to BlueField DPU Administrator Quick Start Guide.
export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>

## Serial number of DPUs. If you have more than 2 DPUs, you will need to parameterize the system accordingly and expose
## additional variables.
## All serial numbers must be in lowercase.
## Serial number of DPU1
export DPU1_SERIAL=mt2334xz09f0
## Serial number of DPU2
export DPU2_SERIAL=mt2334xz09f1
Export environment variables for the installation:
Jump Node Console
$ source manifests/00-env-vars/envvars.env
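As a quick sanity check, confirm that the variables are now set in the current shell (the values shown correspond to the example file above):

Jump Node Console

$ echo $TAG $DPU1_SERIAL $DPU2_SERIAL
v25.10.0 mt2334xz09f0 mt2334xz09f1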
Apply the necessary updates to all YAML files (dpudeployment.yaml, hbn-dpuserviceconfig.yaml, hbn-dpuservicetemplate.yaml, hbn-ipam.yaml) located in the manifests/03.1-dpudeployment-installation-nvme/ directory:
dpudeployment.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: hbn-snap
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle-$TAG
    flavor: hbn-snap-nvme-$TAG
    nodeEffect:
      noEffect: true
    dpuSets:
    - nameSuffix: "dpuset1"
      dpuAnnotations:
        storage.nvidia.com/preferred-dpu: "true"
      nodeSelector:
        matchLabels:
          feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    doca-hbn:
      serviceTemplate: doca-hbn
      serviceConfiguration: doca-hbn
    snap-host-controller:
      serviceTemplate: snap-host-controller
      serviceConfiguration: snap-host-controller
    snap-node-driver:
      serviceTemplate: snap-node-driver
      serviceConfiguration: snap-node-driver
    doca-snap:
      serviceTemplate: doca-snap
      serviceConfiguration: doca-snap
    block-storage-dpu-plugin:
      serviceTemplate: block-storage-dpu-plugin
      serviceConfiguration: block-storage-dpu-plugin
    spdk-csi-controller:
      serviceTemplate: spdk-csi-controller
      serviceConfiguration: spdk-csi-controller
    spdk-csi-controller-dpu:
      serviceTemplate: spdk-csi-controller-dpu
      serviceConfiguration: spdk-csi-controller-dpu
  serviceChains:
    switches:
    - ports:
      - serviceInterface:
          matchLabels:
            interface: p0
      - service:
          name: doca-hbn
          interface: p0_if
    - ports:
      - serviceInterface:
          matchLabels:
            interface: p1
      - service:
          name: doca-hbn
          interface: p1_if
    - ports:
      - serviceInterface:
          matchLabels:
            interface: pf0hpf
      - service:
          name: doca-hbn
          interface: pf0hpf_if
    - ports:
      - service:
          name: doca-snap
          interface: app_sf
          ipam:
            matchLabels:
              svc.dpu.nvidia.com/pool: storage-pool
      - service:
          name: doca-hbn
          interface: snap_if
hbn-dpuserviceconfig.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  serviceConfiguration:
    serviceDaemonSet:
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
            {"name": "iprequest", "interface": "ip_pf0hpf", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
          ]
    helmChart:
      values:
        configuration:
          perDPUValuesYAML: |
            - hostnamePattern: "*"
              values:
                bgp_peer_group: hbn
            - hostnamePattern: "dpu-node-${DPU1_SERIAL}*"
              values:
                bgp_autonomous_system: 65101
            - hostnamePattern: "dpu-node-${DPU2_SERIAL}*"
              values:
                bgp_autonomous_system: 65201
          startupYAMLJ2: |
            - header:
                model: bluefield
                nvue-api-version: nvue_v1
                rev-id: 1.0
                version: HBN 3.0.0
            - set:
                evpn:
                  enable: on
                nve:
                  vxlan:
                    enable: on
                    source:
                      address: {{ ipaddresses.ip_lo.ip }}
                bridge:
                  domain:
                    br_default:
                      vlan:
                        '10':
                          vni:
                            '10': {}
                interface:
                  lo:
                    ip:
                      address:
                        {{ ipaddresses.ip_lo.ip }}/32: {}
                    type: loopback
                  p0_if,p1_if,snap_if:
                    type: swp
                    link:
                      mtu: 9000
                  pf0hpf_if:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf0hpf.cidr }}: {}
                    type: swp
                    link:
                      mtu: 9000
                  snap_if:
                    bridge:
                      domain:
                        br_default:
                          access: 10
                  vlan10:
                    type: svi
                    vlan: 10
                router:
                  bgp:
                    autonomous-system: {{ config.bgp_autonomous_system }}
                    enable: on
                    graceful-restart:
                      mode: full
                    router-id: {{ ipaddresses.ip_lo.ip }}
                service:
                  dhcp-relay:
                    default:
                      server:
                        10.0.125.4: {}
                vrf:
                  default:
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          ipv6-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          l2vpn-evpn:
                            enable: on
                        enable: on
                        neighbor:
                          p0_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                          p1_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                        path-selection:
                          multipath:
                            aspath-ignore: on
                        peer-group:
                          {{ config.bgp_peer_group }}:
                            address-family:
                              ipv4-unicast:
                                enable: on
                              ipv6-unicast:
                                enable: on
                              l2vpn-evpn:
                                enable: on
                            remote-as: external
  interfaces:
    - name: p0_if
      network: mybrhbn
    - name: p1_if
      network: mybrhbn
    - name: pf0hpf_if
      network: mybrhbn
    - name: snap_if
      network: mybrhbn
hbn-dpuservicetemplate.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.5
      chart: doca-hbn
    values:
      image:
        repository: $HBN_NGC_IMAGE_URL
        tag: 3.2.1-doca3.2.1
      resources:
        memory: 6Gi
        nvidia.com/bf_sf: 4
hbn-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool1
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.120.0/22"
    gatewayIndex: 1
    prefixSize: 29

Run the following command to deploy the DPUDeployment:
Jump Node Console
$ cat manifests/03.1-dpudeployment-installation-nvme/*.yaml | envsubst |kubectl apply -f -
Apply the following updates to the manifests/04.1-storage-configuration-nvme/policy-block-dpustoragepolicy.yaml file, then run the command below to deploy the storage configuration:
manifests/04.1-storage-configuration-nvme/policy-block-dpustoragepolicy.yaml
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUStoragePolicy
metadata:
  name: policy-block
  namespace: dpf-operator-system
spec:
  dpuStorageVendors:
    - spdk-csi
  selectionAlgorithm: "NumberVolumes"
  parameters:
    num_queues: "16"

Jump Node Console
$ cat manifests/04.1-storage-configuration-nvme/*.yaml | envsubst |kubectl apply -f -
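Optionally, verify that the storage policy object was created. The resource name used below is assumed from the DPUStoragePolicy kind and may need adjustment for your environment:

Jump Node Console

$ kubectl -n dpf-operator-system get dpustoragepolicy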
Wait for the Rebooted stage, then manually power-cycle the bare-metal hosts:
Jump Node Console
$ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpudeployments

### Command partial output ###
----
DPUs
|-DPU/dpu-node-mt2334xz09f0-mt2334xz09f0  dpf-operator-system
| |-Rebooted  False  WaitingForManualPowerCycleOrReboot  51m
| |-Ready     False  Rebooting                           51m
|-DPU/dpu-node-mt2334xz09f1-mt2334xz09f1  dpf-operator-system
| |-Rebooted  False  WaitingForManualPowerCycleOrReboot  51m
| |-Ready     False  Rebooting                           51m
----
After the DPUs are up, run the following command:
Jump Node Console
$ kubectl -n dpf-operator-system annotate dpunode --all provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
At this point, the DPU workers should be added to the cluster. As they are being added to the cluster, the DPUs are provisioned. Finally, validate that all the DPU-related objects are now in the Ready state:
Jump Node Console
$ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpudeployments
Congratulations, the DPF system has been successfully installed.
Provisioning the SNAP DPU Service Block Device
Before starting SNAP block device provisioning, reboot your bare-metal hosts to apply the latest BlueField firmware settings.
Review the YAML configuration files before deploying the objects under the doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap/manifests/05.1-storage-test-nvme directory.
dpuvolume.yaml
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolume
metadata:
name: test-volume-static-pf-${DPU1_SERIAL}
namespace: dpf-operator-system
spec:
dpuStoragePolicyName: policy-block
resources:
requests:
storage: 60Gi
accessModes:
- ReadWriteOnce
volumeMode: Block
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolume
metadata:
name: test-volume-static-pf-${DPU2_SERIAL}
namespace: dpf-operator-system
spec:
dpuStoragePolicyName: policy-block
resources:
requests:
storage: 60Gi
accessModes:
- ReadWriteOnce
volumeMode: Block
dpuvolumeattachment.yaml
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolumeAttachment
metadata:
name: test-volume-attachment-static-pf-${DPU1_SERIAL}
namespace: dpf-operator-system
spec:
dpuNodeName: dpu-node-${DPU1_SERIAL}
dpuVolumeName: test-volume-static-pf-${DPU1_SERIAL}
functionType: pf
hotplugFunction: false
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolumeAttachment
metadata:
name: test-volume-attachment-static-pf-${DPU2_SERIAL}
namespace: dpf-operator-system
spec:
dpuNodeName: dpu-node-${DPU2_SERIAL}
dpuVolumeName: test-volume-static-pf-${DPU2_SERIAL}
functionType: pf
hotplugFunction: false
Run the following command to deploy the SNAP block devices:
Jump Node Console
$ cat manifests/05.1-storage-test-nvme/*.yaml | envsubst |kubectl apply -f -
dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-static-pf-mt2334xz09f0 created
dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-static-pf-mt2334xz09f1 created
dpuvolume.storage.dpu.nvidia.com/test-volume-static-pf-mt2334xz09f0 created
dpuvolume.storage.dpu.nvidia.com/test-volume-static-pf-mt2334xz09f1 created
Check deployment:
Jump Node Console
$ kubectl get dpuvolume -A
NAMESPACE NAME DPUSTORAGEPOLICYNAME VOLUMEMODE SIZE READY AGE
dpf-operator-system test-volume-static-pf-mt2334xz09f0 policy-block Block 60Gi True 16s
dpf-operator-system test-volume-static-pf-mt2334xz09f1 policy-block Block 60Gi True 16s
$ kubectl get dpuvolumeattachments -A
NAMESPACE NAME DPUVOLUMENAME DPUNODENAME FUNCTIONTYPE HOTPLUGFUNCTION READY AGE
dpf-operator-system test-volume-attachment-static-pf-mt2334xz09f0 test-volume-static-pf-mt2334xz09f0 dpu-node-mt2334xz09f0 pf false True 25s
dpf-operator-system test-volume-attachment-static-pf-mt2334xz09f1 test-volume-static-pf-mt2334xz09f1 dpu-node-mt2334xz09f1 pf false True 25s
SNAP block device deployed successfully.
Bare-Metal Server Customization
Using the SNAP block device as the boot block device requires server BIOS customization steps that depend on the server manufacturer.
On our server, the "NVMe controller and Drive information" screen looks like this:
Install the OS from a virtual ISO (in our case, Rocky Linux).
A completed Rocky Linux OS installation on our server in UEFI BIOS mode should look like this:
From inside the installed OS, it looks like this:
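For reference, the emulated drive can also be inspected from the command line inside the installed OS. This is a minimal sketch; device naming and the reported model string depend on the setup, and nvme-cli may need to be installed first:

Worker Node Console

$ lsblk -d -o NAME,SIZE,TYPE,MODEL
$ nvme list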
Congratulations! The SNAP NVMe drive has been successfully configured and is now ready for use.
Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for complex Kubernetes/OpenShift and Microsoft's leading solutions, research and design. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.
This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.