RDG for DPF Zero Trust (DPF-ZT) with HBN and SNAP Block Bootable Disk DPU Services

Created on Jan 1, 2026 (v 25.10)

This Reference Deployment Guide (RDG) provides detailed instructions for deploying the NVIDIA DOCA Platform Framework (DPF) in Zero-Trust mode on high-performance, bare-metal infrastructure. The guide focuses on configuring Host-Based Networking (HBN) and Storage-Defined Network Accelerated Processing (SNAP) services on NVIDIA® BlueField®-3 DPUs, with SNAP operating in block device mode. This deployment delivers secure, isolated, and hardware-accelerated environments optimized for Zero-Trust architectures.

This document is an extension of the RDG for DPF Zero Trust (DPF-ZT) (referred to as the Baseline RDG). It details the additional steps and modifications required to deploy and orchestrate the SNAP DPU service together with the HBN DPU service, in addition to the services covered in the Baseline RDG.

The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments with Host-Based Networking enabled using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.

Note
  • This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.

  • While other approaches may exist to implement similar solutions, this document provides a detailed guide for this particular method.

The following terms and abbreviations are used throughout this document:

BFB - BlueField Bootstream
BGP - Border Gateway Protocol
CNI - Container Network Interface
CRD - Custom Resource Definition
CSI - Container Storage Interface
DHCP - Dynamic Host Configuration Protocol
DOCA - Data Center Infrastructure-on-a-Chip Architecture
DOCA SNAP - NVIDIA® DOCA™ Storage-Defined Network Accelerated Processing
DPF - DOCA Platform Framework
DPU - Data Processing Unit
DTS - DOCA Telemetry Service
GENEVE - Generic Network Virtualization Encapsulation
HBN - Host-Based Networking
IPAM - IP Address Management
K8S - Kubernetes
MAAS - Metal as a Service
OVN - Open Virtual Network
PVC - Persistent Volume Claim
RDG - Reference Deployment Guide
RDMA - Remote Direct Memory Access
SF - Scalable Function
SFC - Service Function Chaining
SPDK - Storage Performance Development Kit
SR-IOV - Single Root Input/Output Virtualization
TOR - Top of Rack
VF - Virtual Function
VLAN - Virtual LAN (Local Area Network)
VRR - Virtual Router Redundancy
VTEP - Virtual Tunnel End Point
VXLAN - Virtual Extensible LAN

The NVIDIA BlueField-3 Data Processing Unit is a powerful infrastructure compute platform designed for high-speed processing of software-defined networking, storage, and cybersecurity. Delivering up to 400 Gb/s of throughput, BlueField-3 combines robust computing, high-speed networking, and extensive programmability to provide hardware-accelerated, software-defined solutions for demanding workloads.

Deploying and managing DPUs and their associated DOCA services, especially at scale, can be quite challenging. Without a proper provisioning and orchestration system, handling the DPU lifecycle and configuring DOCA services place a heavy operational burden on system administrators. The NVIDIA DOCA Platform Framework (DPF) addresses this challenge by streamlining and automating the lifecycle management of DOCA services.

NVIDIA DOCA unleashes the full power of the BlueField® platform, empowering organizations to rapidly build next-generation applications and services that offload, accelerate, and isolate critical data center workloads. By leveraging DOCA, businesses can achieve unmatched performance, security, and efficiency across modern infrastructure.

A prime example of this innovation is NVIDIA DOCA SNAP, a breakthrough DPU-based storage solution designed to accelerate and optimize storage protocols using BlueField's advanced hardware acceleration. DOCA SNAP delivers a family of services that virtualize local storage at the hardware level, presenting networked storage as local block devices to the host and emulating physical drives over the PCIe bus. In our use case, the emulated block device serves as the boot drive. With DOCA SNAP, organizations gain high-performance, low-latency access to storage by bypassing traditional filesystem overhead and interacting directly with raw block devices. This results in faster data access, reduced CPU utilization, and improved workload efficiency. Integrated into the DOCA Platform Framework (DPF), SNAP is packaged as containerized components deployed seamlessly across the x86 and DPU Kubernetes clusters, delivering a scalable, Zero-Trust architecture for the modern data center.

DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:

  • The DPU is managed through its Baseboard Management Controller (BMC)

  • All management traffic occurs over the DPU's out-of-band (OOB) network

  • The host is considered an untrusted entity from the data center network's perspective; the DPU acts as a barrier between the host and the network

  • The host sees the DPU as a standard NIC, with no access to the internal DPU management plane (Zero-Trust mode)

This Reference Deployment Guide (RDG) provides a step-by-step example of installing DPF in Zero-Trust mode with the HBN and SNAP DPU services. As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide covers the full end-to-end deployment process, including:

  • Infrastructure provisioning

  • DPF deployment

  • DPU provisioning (Redfish)

  • Service configuration and deployment

  • Service chaining

Info

In this guide, the Storage Performance Development Kit (SPDK) is used as an example of a storage backend service.

This storage backend service is used only for demonstration purposes and is not intended or supported for production use cases.

Key Components and Technologies

  • NVIDIA BlueField® Data Processing Unit (DPU)

    The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.

  • NVIDIA DOCA Software Framework

    NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.

  • NVIDIA ConnectX SmartNICs

    10/25/40/50/100/200 and 400G Ethernet Network Adapters

    The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.

    NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

  • NVIDIA LinkX Cables

    The NVIDIA® LinkX® product family of cables and transceivers provides the industry's most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, and artificial intelligence data center applications.

  • NVIDIA Spectrum Ethernet Switches

    Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.

    Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.

    NVIDIA combines the benefits of NVIDIA Spectrum switches, based on industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.

  • NVIDIA Cumulus Linux

    NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.

  • Kubernetes

    Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

  • Kubespray

    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS and Kubernetes cluster configuration management tasks. It provides:

    • A highly available cluster

    • Composable attributes

    • Support for most popular Linux distributions

Solution Design

Solution Logical Design

The logical design includes the following components:

  • 1 x Hypervisor node (KVM-based) with ConnectX-7:

    • 1 x Firewall VM

    • 1 x Jump Node VM

    • 1 x MaaS VM

    • 1 x DPU DHCP VM

    • 3 x K8s Master VMs running all K8s management components

  • 2 x Worker nodes (PCIe Gen5), each with 1 x BlueField-3 NIC

  • 1 x Storage Target node with ConnectX-7, running the SPDK target application

  • Single High-Speed (HS) switch

  • 1 Gb Host Management network

[Figure: Solution logical design]

SFC Logical Diagram

The DOCA Platform Framework simplifies DPU management by providing orchestration through a K8s API. It handles the provisioning and lifecycle management of DPUs, orchestrates specialized DPU services, and automates service function chaining (SFC) tasks. This ensures seamless deployment of NVIDIA DOCA services, allowing traffic to be efficiently offloaded and routed through HBN's data plane. The SFC logical diagram implemented in this guide is shown below.

[Figure: SFC logical diagram]

Disk Emulation Logical Diagram

The following logical diagram shows the main components involved in the procedure of mounting a disk for a tenant workload.

Upon receiving a new request for an emulated NVMe drive, the DOCA SNAP components attach a block device (BDEV) to the required bare-metal worker node over NVMe-oF, using either the RDMA or TCP storage protocol. The DPU then emulates it as a block device on the x86 host via the "BlueField NVMe SNAP Controller".

[Figure: Disk emulation logical diagram]

Firewall Design

The pfSense firewall in this solution serves a dual purpose:

  • Firewall – Provides an isolated environment for the DPF system, ensuring secure operations

  • Router – Enables internet access and connectivity between the host management network and the high-speed network

Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.
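For example, an administrator on the trusted LAN would reach the jump node through the firewall's WAN address. A minimal sketch, assuming a hypothetical forwarded SSH port of 2222 (use the ports and user name defined by your own pfSense port-forward rules):

# SSH to the jump node (10.0.110.253) through the pfSense WAN port-forward
# <firewall-wan-ip>, port 2222 and <user> are placeholders for this example
ssh -p 2222 <user>@<firewall-wan-ip>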

The following diagram illustrates the firewall design used in this solution:

[Figure: Firewall design]

Software Stack Components

[Figure: Software stack components and versions]

Warning

Make sure to use the exact same versions for the software stack as described above.


Bill of Materials

[Figure: Bill of materials]

Node and Switch Definitions

These are the definitions and parameters used for deploying the demonstrated fabric:

Switch Port Usage

Switch        Qty   Ports
mgmt-switch   1     swp1-6
hs-switch     1     swp1, swp2, swp11-14, swp32

Hosts

• Rack1 / Hypervisor Node / hypervisor
  Switch ports: mgmt-switch: swp1; hs-switch: swp1
  IP and NICs:
    lab-br (interface eno1): Trusted LAN IP
    mgmt-br (interface eno2): -
    hs-br (interface ens2f0np0):
  Default gateway: Trusted LAN GW

• Rack1 / Storage Target Node / target
  Switch ports: mgmt-switch: swp4; hs-switch: swp32
  IP and NICs:
    enp1s0f0: 10.0.110.25/24
    enp144s0f0np0: 10.0.124.1/24
  Default gateway: 10.0.110.254

• Rack1 / Worker Node / worker1
  Switch ports: mgmt-switch: swp2; hs-switch: swp11-swp12
  IP and NICs:
    ens15f0: 10.0.110.21/24
    ens5f0np0/ens5f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254

• Rack1 / Worker Node / worker2
  Switch ports: mgmt-switch: swp3; hs-switch: swp13-swp14
  IP and NICs:
    ens15f0: 10.0.110.22/24
    ens5f0np0/ens5f1np1: 10.0.120.0/22
  Default gateway: 10.0.110.254

• Rack1 / Firewall (Virtual) / fw
  Switch ports: -
  IP and NICs:
    WAN (lab-br): Trusted LAN IP
    LAN (mgmt-br): 10.0.110.254/24
    OPT1 (hs-br): 172.169.50.1/30
  Default gateway: Trusted LAN GW

• Rack1 / Jump Node (Virtual) / jump
  Switch ports: -
  IP and NICs: enp1s0: 10.0.110.253/24
  Default gateway: 10.0.110.254

• Rack1 / MAAS (Virtual) / maas
  Switch ports: -
  IP and NICs: enp1s0: 10.0.110.252/24
  Default gateway: 10.0.110.254

• Rack1 / DPU DHCP (Virtual) / dhcp
  Switch ports: -
  IP and NICs: enp1s0: 10.0.125.4/24
  Default gateway: 10.0.125.1

• Rack1 / Master Node (Virtual) / master1
  Switch ports: -
  IP and NICs: enp1s0: 10.0.110.1/24
  Default gateway: 10.0.110.254

• Rack1 / Master Node (Virtual) / master2
  Switch ports: -
  IP and NICs: enp1s0: 10.0.110.2/24
  Default gateway: 10.0.110.254

• Rack1 / Master Node (Virtual) / master3
  Switch ports: -
  IP and NICs: enp1s0: 10.0.110.3/24
  Default gateway: 10.0.110.254

Wiring

Hypervisor Node

[Figure: Hypervisor node wiring]

Bare Metal Worker Node

[Figure: Bare-metal worker node wiring]

Storage Target Node

[Figure: Storage target node wiring]

Fabric Configuration

Updating Cumulus Linux

As a best practice, make sure to use the latest released Cumulus Linux NOS version.

For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.
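Before and after an upgrade, it can be useful to confirm the release the switch is actually running. A minimal sketch using the NVUE CLI on Cumulus Linux 5.x:

# Show system information, including the running Cumulus Linux release
nv show system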

Configuring the Cumulus Linux Switch

The SN3700 switch (hs-switch) is configured as follows:

  • The following commands configure BGP unnumbered on hs-switch.

  • Cumulus Linux enables the BGP equal-cost multipathing (ECMP) option by default.

SN3700 Switch Console

nv set bridge domain br_default vlan 10 vni 10
nv set evpn state enabled
nv set interface eth0 ipv4 dhcp-client state enabled
nv set interface eth0 type eth
nv set interface eth0 vrf mgmt
nv set interface lo ipv4 address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1 ipv4 address 172.169.50.2/30
nv set interface swp1 link speed auto
nv set interface swp1-32 type swp
nv set interface swp2 ipv4 address 10.0.125.254/24
nv set interface swp32 bridge domain br_default access 10
nv set nve vxlan source address 11.0.0.101
nv set nve vxlan state enabled
nv set qos roce mode lossless
nv set qos roce state enabled
nv set router bgp autonomous-system 65001
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set router bgp state enabled
nv set system hostname hs-switch
nv set vrf default router bgp address-family ipv4-unicast network 10.0.125.0/24
nv set vrf default router bgp address-family ipv4-unicast network 11.0.0.101/32
nv set vrf default router bgp address-family ipv4-unicast state enabled
nv set vrf default router bgp address-family ipv6-unicast redistribute connected state enabled
nv set vrf default router bgp address-family ipv6-unicast state enabled
nv set vrf default router bgp address-family l2vpn-evpn state enabled
nv set vrf default router bgp neighbor swp11 enforce-first-as disabled
nv set vrf default router bgp neighbor swp11 peer-group hbn
nv set vrf default router bgp neighbor swp11 type unnumbered
nv set vrf default router bgp neighbor swp12 enforce-first-as disabled
nv set vrf default router bgp neighbor swp12 peer-group hbn
nv set vrf default router bgp neighbor swp12 type unnumbered
nv set vrf default router bgp neighbor swp13 enforce-first-as disabled
nv set vrf default router bgp neighbor swp13 peer-group hbn
nv set vrf default router bgp neighbor swp13 type unnumbered
nv set vrf default router bgp neighbor swp14 enforce-first-as disabled
nv set vrf default router bgp neighbor swp14 peer-group hbn
nv set vrf default router bgp neighbor swp14 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore enabled
nv set vrf default router bgp peer-group hbn address-family ipv4-unicast default-route-origination state enabled
nv set vrf default router bgp peer-group hbn address-family ipv4-unicast state enabled
nv set vrf default router bgp peer-group hbn address-family ipv6-unicast state enabled
nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn state enabled
nv set vrf default router bgp peer-group hbn enforce-first-as disabled
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router bgp state enabled
nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
nv set vrf default router static 0.0.0.0/0 via 172.169.50.1 type ipv4-address

nv config apply -y
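After the configuration is applied, the fabric state can be verified from the switch with NVUE show commands. A minimal verification sketch (the BGP sessions on swp11-swp14 only reach the Established state once HBN is running on the DPUs, later in this guide):

# BGP unnumbered sessions toward the HBN services on the DPUs
nv show vrf default router bgp neighbor

# VXLAN/VTEP and RoCE state
nv show nve vxlan
nv show qos roce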

The SN2201 switch (mgmt-switch) is configured as follows:

SN2201 Switch Console

nv set bridge domain br_default untagged 1
nv set interface swp1-6 link state up
nv set interface swp1-6 type swp
nv set interface swp1-6 bridge domain br_default

nv config apply -y

Installation and Configuration

Warning

Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.

All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.

Make sure that you have DPU BMC and OOB MAC addresses.

No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Host Configuration").

Hypervisor Installation and Configuration

No change from the Baseline RDG (Section "Hypervisor Installation and Configuration").

Prepare Infrastructure Servers

No change from the Baseline RDG (Section "Deployment and Configuration", Subsection "Prepare Infrastructure Servers") regarding the Jump and MaaS VMs.

The Firewall VM should be configured according to the section "Firewall VM - pfSense Installation and Interface Configuration" in the RDG for DPF with OVN-Kubernetes and HBN Services.

Provisioning "DPU DHCP VM"

  1. Install Rocky Linux 9.0 with the minimal server configuration.

  2. Manually configure the IP address 10.0.125.4/24 with default gateway 10.0.125.1 and your preferred DNS server.

  3. Install the following packages:

    Jump Node Console

    sudo dnf -y update
    sudo dnf install -y lldpd dnsmasq

  4. Apply the following configuration to dnsmasq in the /etc/dnsmasq.conf file:

    /etc/dnsmasq.conf

    #
    #Disable the DNS server set:
    port=0
    # port=53
    #
    #Setup the server to be your authoritative DHCP server
    #
    dhcp-authoritative
    #
    #Set the DHCP server to hand addresses sequentially
    #
    dhcp-sequential-ip
    #
    #Enable more detailed logging for DHCP
    #
    log-dhcp
    log-queries
    no-resolv
    log-facility=/var/log/dnsmasq.log
    domain=x86.dpf.rdg.local.domain
    local=/x86.dpf.rdg.local.domain/
    server=8.8.8.8

    #
    #Create different dhcp scopes for each of the three simulated subnets here, using tags for ID
    #Format is: dhcp-range=<your_tag_here>,<start_of_scope>,<end_of_scope>,<subnet_mask>,<lease_time>
    #
    dhcp-range=subnet0,10.0.120.2,10.0.120.6,255.255.255.248,8h
    dhcp-option=subnet0,42,192.114.62.250
    dhcp-option=subnet0,6,10.0.125.4
    dhcp-option=subnet0,3,10.0.120.1

    dhcp-range=subnet1,10.0.120.10,10.0.120.14,255.255.255.248,8h
    dhcp-option=subnet1,42,192.114.62.250
    dhcp-option=subnet1,6,10.0.125.4
    dhcp-option=subnet1,3,10.0.120.9

    Info

    This dnsmasq configuration is customized for our specific deployment use case and should not be used as a default configuration.

  5. Start dnsmasq.service and enable it to start automatically at boot:

    Jump Node Console

    sudo systemctl start dnsmasq.service
    sudo systemctl enable dnsmasq.service

  6. Check the service status:

    Jump Node Console

    sudo systemctl status dnsmasq.service

    ### Command output should look like: ###

    dnsmasq.service - DNS caching server.
         Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; preset: disabled)
         Active: active (running) since Wed 2025-12-24 08:49:28 EST; 2 weeks 3 days ago
     Invocation: 10eb617fa5fe4bedb1fc021ddcc7751f
        Process: 1172 ExecStart=/usr/sbin/dnsmasq (code=exited, status=0/SUCCESS)
       Main PID: 1193 (dnsmasq)
          Tasks: 1 (limit: 23017)
         Memory: 2M (peak: 2.5M)
            CPU: 112ms
         CGroup: /system.slice/dnsmasq.service
                 └─1193 /usr/sbin/dnsmasq

    Dec 24 08:49:28 hbn-dhcp systemd[1]: Starting dnsmasq.service - DNS caching server....
    Dec 24 08:49:28 hbn-dhcp systemd[1]: Started dnsmasq.service - DNS caching server..
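With dnsmasq running, the configuration syntax and the leases handed out to the workers' high-speed interfaces can be checked directly on the DPU DHCP VM. A minimal sketch (the lease-file path shown is the distribution default and may differ on your system):

# Validate the configuration file syntax
sudo dnsmasq --test -C /etc/dnsmasq.conf

# Follow DHCP activity (log-facility configured in /etc/dnsmasq.conf)
sudo tail -f /var/log/dnsmasq.log

# List the currently issued leases
sudo cat /var/lib/dnsmasq/dnsmasq.leases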

Provision SPDK Target Apps on Storage Target Node

  1. Log in to the Storage Target Node as root:

    Jump Node Console

    $ ssh target
    $ sudo -i

  2. Build SPDK from source (root privileges are required):

    Jump Node Console

    git clone https://github.com/spdk/spdk

    cd spdk
    git submodule update --init
    apt update && apt install meson python3-pyelftools -y
    ./scripts/pkgdep.sh --rdma
    ./configure --with-rdma
    make

  3. Run SPDK target:

    Jump Node Console

    # Get all nvme devices

    lshw -c storage -businfo

    Bus info           Device   Class     Description
    ===========================================================
    pci@0000:08:00.0            storage   PCIe Data Center SSD
    pci@0000:00:11.4            storage   C610/X99 series chipset sSATA Controller [AHCI mode]
    pci@0000:00:1f.2            storage   C610/X99 series chipset 6-Port SATA Controller [AHCI mode]
    pci@0000:81:00.0   scsi4    storage   MegaRAID SAS-3 3108 [Invader]

    # Start target
    scripts/setup.sh
    build/bin/nvmf_tgt &

    # Add bdevs with nvme backend
    scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:08:00.0

    # Add logical volume store on base bdev
    scripts/rpc.py bdev_lvol_create_lvstore Nvme0n1 lvs0

    # Display current logical volume list
    scripts/rpc.py bdev_lvol_get_lvstores

    scripts/rpc_http_proxy.py 10.0.110.25 8000 exampleuser examplepassword &

  4. The SPDK target is now ready.
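Before handing the target over to DPF, it can be sanity-checked through the SPDK JSON-RPC interface. A minimal sketch, run from the spdk directory on the storage target (the curl call assumes the HTTP proxy started above is reachable at 10.0.110.25:8000 with the example credentials):

# Block devices registered with the target (the attached NVMe namespace should be listed)
scripts/rpc.py bdev_get_bdevs

# Logical volume stores (lvs0 created above should be listed)
scripts/rpc.py bdev_lvol_get_lvstores

# The same query issued over the JSON-RPC HTTP proxy started above
curl -u exampleuser:examplepassword -d '{"id": 1, "method": "bdev_get_bdevs"}' http://10.0.110.25:8000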

Provision Master VMs and Worker Nodes Using MaaS

No change from the Baseline RDG (Section "Provision Master VMs Using MaaS").

Note

UEFI BIOS mode is required for the bare-metal worker nodes.

DPU Service Installation

Adjust the DPUDeployment, DPUServiceConfiguration, DPUServiceTemplate, and other necessary objects to match your environment.

Before deploying the objects under the doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap directory, a few adjustments are required.

Note

It is necessary to set several environment variables before running the commands in this section:

$ source manifests/00-env-vars/envvars.env

  1. Change to the directory containing the readme.md file, from which all the commands will be run:

    Jump Node Console


    $ cd doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap

  2. Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file:

    Warning

    Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPUCLUSTER_INTERFACE and BMC_ROOT_PASSWORD.

    manifests/00-env-vars/envvars.env

    ## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
    ## This should never include a scheme or a port.
    ## e.g. 10.10.10.10
    export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10

    ## Port for the Kubernetes API server of the target cluster on which DPF is installed.
    export TARGETCLUSTER_API_SERVER_PORT=6443

    ## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
    export DPUCLUSTER_VIP=10.0.110.200

    ## DPU_P0 is the name of the first port of the DPU. This name must be the same on all worker nodes.
    #export DPU_P0=enp204s0f0np0

    ## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
    export DPUCLUSTER_INTERFACE=enp1s0

    # IP address to the NFS server used as storage for the BFB.
    export NFS_SERVER_IP=10.0.110.253

    ## The repository URL for the NVIDIA Helm chart registry.
    ## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
    export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca

    ## The repository URL for the HBN container image.
    ## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
    export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn

    ## The repository URL for the SNAP VFS container image.
    ## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
    export SNAP_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_vfs

    ## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
    ## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
    export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca

    ## The DPF TAG is the version of the DPF components which will be deployed in this guide.
    export TAG=v25.10.0

    ## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
    export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb"

    ## IP_RANGE_START and IP_RANGE_END
    ## These define the IP range for DPU discovery via Redfish/BMC interfaces
    ## Example: If your DPUs have BMC IPs in range 192.168.1.100-110
    ## export IP_RANGE_START=192.168.1.100
    ## export IP_RANGE_END=192.168.1.110

    ## Start of DPUDiscovery IpRange
    export IP_RANGE_START=10.0.110.75

    ## End of DPUDiscovery IpRange
    export IP_RANGE_END=10.0.110.76

    # The password used for DPU BMC root login, must be the same for all DPUs
    # For more information on how to set the BMC root password refer to BlueField DPU Administrator Quick Start Guide.
    export BMC_ROOT_PASSWORD=<set your BMC_ROOT_PASSWORD>

    ## Serial numbers of the DPUs. If you have more than 2 DPUs, you will need to parameterize the system accordingly and expose
    ## additional variables.
    ## All serial numbers must be in lowercase.

    ## Serial number of DPU1
    export DPU1_SERIAL=mt2334xz09f0

    ## Serial number of DPU2
    export DPU2_SERIAL=mt2334xz09f1

  3. Export environment variables for the installation:

    Jump Node Console


    $ source manifests/00-env-vars/envvars.env

  4. Apply the necessary updates to all YAML files (dpudeployment.yaml, hbn-dpuserviceconfig.yaml, hbn-dpuservicetemplate.yaml, hbn-ipam.yaml) located in the manifests/03.1-dpudeployment-installation-nvme/ directory:

    dpudeployment.yaml

    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUDeployment
    metadata:
      name: hbn-snap
      namespace: dpf-operator-system
    spec:
      dpus:
        bfb: bf-bundle-$TAG
        flavor: hbn-snap-nvme-$TAG
        nodeEffect:
          noEffect: true
        dpuSets:
        - nameSuffix: "dpuset1"
          dpuAnnotations:
            storage.nvidia.com/preferred-dpu: "true"
          nodeSelector:
            matchLabels:
              feature.node.kubernetes.io/dpu-enabled: "true"
      services:
        doca-hbn:
          serviceTemplate: doca-hbn
          serviceConfiguration: doca-hbn
        snap-host-controller:
          serviceTemplate: snap-host-controller
          serviceConfiguration: snap-host-controller
        snap-node-driver:
          serviceTemplate: snap-node-driver
          serviceConfiguration: snap-node-driver
        doca-snap:
          serviceTemplate: doca-snap
          serviceConfiguration: doca-snap
        block-storage-dpu-plugin:
          serviceTemplate: block-storage-dpu-plugin
          serviceConfiguration: block-storage-dpu-plugin
        spdk-csi-controller:
          serviceTemplate: spdk-csi-controller
          serviceConfiguration: spdk-csi-controller
        spdk-csi-controller-dpu:
          serviceTemplate: spdk-csi-controller-dpu
          serviceConfiguration: spdk-csi-controller-dpu
      serviceChains:
        switches:
        - ports:
          - serviceInterface:
              matchLabels:
                interface: p0
          - service:
              name: doca-hbn
              interface: p0_if
        - ports:
          - serviceInterface:
              matchLabels:
                interface: p1
          - service:
              name: doca-hbn
              interface: p1_if
        - ports:
          - serviceInterface:
              matchLabels:
                interface: pf0hpf
          - service:
              name: doca-hbn
              interface: pf0hpf_if
        - ports:
          - service:
              name: doca-snap
              interface: app_sf
              ipam:
                matchLabels:
                  svc.dpu.nvidia.com/pool: storage-pool
          - service:
              name: doca-hbn
              interface: snap_if

    hbn-dpuserviceconfig.yaml

    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceConfiguration
    metadata:
      name: doca-hbn
      namespace: dpf-operator-system
    spec:
      deploymentServiceName: "doca-hbn"
      serviceConfiguration:
        serviceDaemonSet:
          annotations:
            k8s.v1.cni.cncf.io/networks: |-
              [
                {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
                {"name": "iprequest", "interface": "ip_pf0hpf", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
              ]
        helmChart:
          values:
            configuration:
              perDPUValuesYAML: |
                - hostnamePattern: "*"
                  values:
                    bgp_peer_group: hbn
                - hostnamePattern: "dpu-node-${DPU1_SERIAL}*"
                  values:
                    bgp_autonomous_system: 65101
                - hostnamePattern: "dpu-node-${DPU2_SERIAL}*"
                  values:
                    bgp_autonomous_system: 65201
              startupYAMLJ2: |
                - header:
                    model: bluefield
                    nvue-api-version: nvue_v1
                    rev-id: 1.0
                    version: HBN 3.0.0
                - set:
                    evpn:
                      enable: on
                    nve:
                      vxlan:
                        enable: on
                        source:
                          address: {{ ipaddresses.ip_lo.ip }}
                    bridge:
                      domain:
                        br_default:
                          vlan:
                            '10':
                              vni:
                                '10': {}
                    interface:
                      lo:
                        ip:
                          address:
                            {{ ipaddresses.ip_lo.ip }}/32: {}
                        type: loopback
                      p0_if,p1_if,snap_if:
                        type: swp
                        link:
                          mtu: 9000
                      pf0hpf_if:
                        ip:
                          address:
                            {{ ipaddresses.ip_pf0hpf.cidr }}: {}
                        type: swp
                        link:
                          mtu: 9000
                      snap_if:
                        bridge:
                          domain:
                            br_default:
                              access: 10
                      vlan10:
                        type: svi
                        vlan: 10
                    router:
                      bgp:
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        graceful-restart:
                          mode: full
                        router-id: {{ ipaddresses.ip_lo.ip }}
                    service:
                      dhcp-relay:
                        default:
                          server:
                            10.0.125.4: {}
                    vrf:
                      default:
                        router:
                          bgp:
                            address-family:
                              ipv4-unicast:
                                enable: on
                                redistribute:
                                  connected:
                                    enable: on
                              ipv6-unicast:
                                enable: on
                                redistribute:
                                  connected:
                                    enable: on
                              l2vpn-evpn:
                                enable: on
                            enable: on
                            neighbor:
                              p0_if:
                                peer-group: {{ config.bgp_peer_group }}
                                type: unnumbered
                              p1_if:
                                peer-group: {{ config.bgp_peer_group }}
                                type: unnumbered
                            path-selection:
                              multipath:
                                aspath-ignore: on
                            peer-group:
                              {{ config.bgp_peer_group }}:
                                address-family:
                                  ipv4-unicast:
                                    enable: on
                                  ipv6-unicast:
                                    enable: on
                                  l2vpn-evpn:
                                    enable: on
                                remote-as: external
      interfaces:
        - name: p0_if
          network: mybrhbn
        - name: p1_if
          network: mybrhbn
        - name: pf0hpf_if
          network: mybrhbn
        - name: snap_if
          network: mybrhbn

    hbn-dpuservicetemplate.yaml

    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceTemplate
    metadata:
      name: doca-hbn
      namespace: dpf-operator-system
    spec:
      deploymentServiceName: "doca-hbn"
      helmChart:
        source:
          repoURL: $HELM_REGISTRY_REPO_URL
          version: 1.0.5
          chart: doca-hbn
        values:
          image:
            repository: $HBN_NGC_IMAGE_URL
            tag: 3.2.1-doca3.2.1
          resources:
            memory: 6Gi
            nvidia.com/bf_sf: 4

    hbn-ipam.yaml

    ---
    apiVersion: svc.dpu.nvidia.com/v1alpha1
    kind: DPUServiceIPAM
    metadata:
      name: pool1
      namespace: dpf-operator-system
    spec:
      ipv4Network:
        network: "10.0.120.0/22"
        gatewayIndex: 1
        prefixSize: 29

  5. Run the following command to deploy the DPUDeployment:

    Jump Node Console


    $ cat manifests/03.1-dpudeployment-installation-nvme/*.yaml | envsubst |kubectl apply -f -

  6. Apply the following updates to the manifests/04.1-storage-configuration-nvme/policy-block-dpustoragepolicy.yaml file, then run the command to deploy the storage configuration:

    manifests/04.1-storage-configuration-nvme/policy-block-dpustoragepolicy.yaml

    ---
    apiVersion: storage.dpu.nvidia.com/v1alpha1
    kind: DPUStoragePolicy
    metadata:
      name: policy-block
      namespace: dpf-operator-system
    spec:
      dpuStorageVendors:
      - spdk-csi
      selectionAlgorithm: "NumberVolumes"
      parameters:
        num_queues: "16"

    Jump Node Console


    $ cat manifests/04.1-storage-configuration-nvme/*.yaml | envsubst |kubectl apply -f -

  7. Wait for the Rebooted stage, then manually power cycle the bare-metal hosts:

    Jump Node Console

    $ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpudeployments

    ### Command partial output ###
    ----
    DPUs
      |-DPU/dpu-node-mt2334xz09f0-mt2334xz09f0     dpf-operator-system
      | |-Rebooted   False   WaitingForManualPowerCycleOrReboot   51m
      | |-Ready      False   Rebooting                            51m
      |-DPU/dpu-node-mt2334xz09f1-mt2334xz09f1     dpf-operator-system
      | |-Rebooted   False   WaitingForManualPowerCycleOrReboot   51m
      | |-Ready      False   Rebooting                            51m
    ----

  8. After the DPUs are up, run the following command:

    Jump Node Console


    $ kubectl -n dpf-operator-system annotate dpunode --all provisioning.dpu.nvidia.com/dpunode-external-reboot-required-

  9. At this point, the DPU workers should be added to the cluster. As they are added, the DPUs are provisioned. Finally, validate that all the DPU-related objects are now in the Ready state:

    Jump Node Console


    $ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpudeployments

    [Figure: dpfctl output with all DPU-related objects in the Ready state]

    Congratulations, the DPF system has been successfully installed.
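Beyond dpfctl, the individual DPF objects can also be inspected directly with kubectl from the jump node. A minimal sketch (resource kinds as exposed by the DPF CRDs; adjust if your DPF version differs):

$ kubectl -n dpf-operator-system get dpudeployments,dpuservices,dpuservicechains
$ kubectl -n dpf-operator-system get dpunodes,dpus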

Provisioning SNAP DPU Service block device

Note

Before starting SNAP block device provisioning, reboot your bare-metal hosts to apply the latest BlueField firmware settings.

Please review the YAML configuration files before deploying the objects under the doca-platform/docs/public/user-guides/zero-trust/use-cases/hbn-snap/manifests/05.1-storage-test-nvme directory.

dpuvolume.yaml

---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolume
metadata:
  name: test-volume-static-pf-${DPU1_SERIAL}
  namespace: dpf-operator-system
spec:
  dpuStoragePolicyName: policy-block
  resources:
    requests:
      storage: 60Gi
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolume
metadata:
  name: test-volume-static-pf-${DPU2_SERIAL}
  namespace: dpf-operator-system
spec:
  dpuStoragePolicyName: policy-block
  resources:
    requests:
      storage: 60Gi
  accessModes:
  - ReadWriteOnce
  volumeMode: Block

dpuvolumeattachment.yaml

---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolumeAttachment
metadata:
  name: test-volume-attachment-static-pf-${DPU1_SERIAL}
  namespace: dpf-operator-system
spec:
  dpuNodeName: dpu-node-${DPU1_SERIAL}
  dpuVolumeName: test-volume-static-pf-${DPU1_SERIAL}
  functionType: pf
  hotplugFunction: false
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolumeAttachment
metadata:
  name: test-volume-attachment-static-pf-${DPU2_SERIAL}
  namespace: dpf-operator-system
spec:
  dpuNodeName: dpu-node-${DPU2_SERIAL}
  dpuVolumeName: test-volume-static-pf-${DPU2_SERIAL}
  functionType: pf
  hotplugFunction: false

Run the following command to deploy the SNAP block devices:

Jump Node Console

$ cat manifests/05.1-storage-test-nvme/*.yaml | envsubst |kubectl apply -f -

dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-static-pf-mt2334xz09f0 created
dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-static-pf-mt2334xz09f1 created
dpuvolume.storage.dpu.nvidia.com/test-volume-static-pf-mt2334xz09f0 created
dpuvolume.storage.dpu.nvidia.com/test-volume-static-pf-mt2334xz09f1 created

Check the deployment:

Jump Node Console

$ kubectl get dpuvolume -A
NAMESPACE             NAME                                 DPUSTORAGEPOLICYNAME   VOLUMEMODE   SIZE   READY   AGE
dpf-operator-system   test-volume-static-pf-mt2334xz09f0   policy-block           Block        60Gi   True    16s
dpf-operator-system   test-volume-static-pf-mt2334xz09f1   policy-block           Block        60Gi   True    16s

$ kubectl get dpuvolumeattachments -A
NAMESPACE             NAME                                            DPUVOLUMENAME                        DPUNODENAME             FUNCTIONTYPE   HOTPLUGFUNCTION   READY   AGE
dpf-operator-system   test-volume-attachment-static-pf-mt2334xz09f0   test-volume-static-pf-mt2334xz09f0   dpu-node-mt2334xz09f0   pf             false             True    25s
dpf-operator-system   test-volume-attachment-static-pf-mt2334xz09f1   test-volume-static-pf-mt2334xz09f1   dpu-node-mt2334xz09f1   pf             false             True    25s

The SNAP block devices have been deployed successfully.

Bare-Metal Server Customization

Using the SNAP block device as a boot device requires server BIOS customization steps; the exact steps depend on the server manufacturer.

On our server, the BIOS "NVMe controller and Drive information" screen looks as follows:

[Figure: BIOS view of the NVMe controller and drive information]

Install the OS from a virtual ISO (in our case, Rocky Linux).

A completed Rocky Linux installation on our server in UEFI BIOS mode looks as follows:

[Figure: Completed Rocky Linux installation in UEFI BIOS mode]

Inside the installed OS, it looks as follows:

[Figure: SNAP NVMe drive as seen from the installed OS]

Congratulations! The SNAP NVMe drive has been successfully configured and is now ready for use.
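From within the installed OS, it is easy to cross-check that the root filesystem really resides on the emulated SNAP NVMe device. A minimal sketch, run on the bare-metal worker (assumes the nvme-cli package is installed):

# The emulated drive appears as a regular NVMe controller and namespace
lsblk
nvme list

# Confirm the root filesystem is mounted from that NVMe namespace
findmnt /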




Vitaliy Razinkov

Over the past few years, Vitaliy Razinkov has been working as a Solutions Architect on the NVIDIA Networking team, responsible for complex Kubernetes/OpenShift and Microsoft's leading solutions, research and design. He previously spent more than 25 years in senior positions at several companies. Vitaliy has written several reference design guides on Microsoft technologies, RoCE/RDMA accelerated machine learning in Kubernetes/OpenShift, and container solutions, all of which are available on the NVIDIA Networking Documentation website.

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document.

© Copyright 2026, NVIDIA. Last updated on Jan 15, 2026