RDG for DPF Zero Trust (DPF-ZT) with HBN DPU Service
Created on May 20, 2025
Scope
This Reference Deployment Guide (RDG) provides comprehensive instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Zero-Trust mode. It focuses on the setup and use of DPU-based services on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.
The guide is intended for experienced system administrators, systems engineers, and solution architects who build highly secure bare-metal environments using NVIDIA BlueField DPUs for acceleration, isolation, and infrastructure offload.
This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.
Although other approaches may exist for implementing similar solutions, this document provides a detailed guide for this specific method.
Introduction
The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.
NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.
One such service is Host-Based Networking (HBN) - a DOCA-enabled solution that allows network architects to design networks based on Layer 3 (L3) protocols. HBN enables routing on the server side by using BlueField as a BGP router. It encapsulates key networking functions in a containerized service pod, deployed directly on the BlueField’s Arm cores.
However, deploying and managing DPUs and their associated DOCA services, especially at scale, presents operational challenges. Without a robust provisioning and orchestration system, tasks such as lifecycle management, service deployment, and network configuration for service function chaining (SFC) can quickly become complex and error prone. This is where the DOCA Platform Framework (DPF) comes into play.
DPF automates the full DPU lifecycle, streamlines the deployment of DOCA services, and simplifies advanced network configurations. With DPF, services such as HBN can be deployed seamlessly, allowing for efficient offloading and intelligent routing of traffic through the DPU data plane.
By leveraging DPF, users can scale and automate DPU management across Bare Metal, Virtual, and Kubernetes customer environments - optimizing performance while simplifying operations.
DPF supports multiple deployment models. This guide focuses on the Zero Trust bare-metal deployment model. In this scenario:
- The DPU is managed through its Baseboard Management Controller (BMC)
- All management traffic occurs over the DPU's out-of-band (OOB) network
- The host is treated as an untrusted entity with respect to the data center network. The DPU acts as a barrier between the host and the network.
- The host sees the DPU as a standard NIC, with no access to the internal DPU management plane (Zero Trust Mode)
This Reference Deployment Guide (RDG) provides a step-by-step example for installing DPF in Zero-Trust mode together with the HBN service. It also includes practical demonstrations of performance optimization, validated using standard RDMA and TCP workloads.
As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment. The guide includes the full end-to-end deployment process, including:
- Infrastructure provisioning
- DPF deployment
- DPU provisioning (Redfish)
- Service configuration and deployment
- Service chaining
Solution Architecture
Key Components and Technologies
NVIDIA BlueField® Data Processing Unit (DPU)
The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.
NVIDIA DOCA Software Framework
NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.
10/25/40/50/100/200 and 400G Ethernet Network Adapters
The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.
NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.
The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, and artificial intelligence data center applications.
NVIDIA Spectrum Ethernet Switches
Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects.
NVIDIA combines the benefits of NVIDIA Spectrum™ switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux , SONiC and NVIDIA Onyx®.
NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.
Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.
Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:
- A highly available cluster
- Composable attributes
- Support for most popular Linux distributions
Solution Design
Solution Logical Design
The logical design includes the following components:
1 x Hypervisor node (KVM-based) with ConnectX-7:
- 1 x Firewall VM
- 1 x Jump Node VM
- 1 x MaaS VM
- 3 x K8s Master VMs running all K8s management components
- 2 x Worker nodes (PCI Gen5), each with 1 x BlueField-3 NIC
- Single High-Speed (HS) switch
- 1 Gb Host Management network
HBN service Logical Design
As part of this RDG, we will:
- Create two isolated networks on each bare-metal workload server using physical functions PF0 and PF1
- Each network connects through the HBN service to a separate VLAN/VNI, on separate VRFs - RED and BLUE
- Route traffic through the HBN service
- Assign PFs to each bare-metal workload server as its network interfaces
- Demonstrate accelerated RDMA and TCP traffic between two workload servers that run on different bare-metal servers within the same network (e.g., RED network)
- Validate network isolation between bare-metal workload servers connected to different networks (RED vs. BLUE)
Firewall Design
The pfSense firewall in this solution serves a dual purpose:
- Firewall—provides an isolated environment for the DPF system, ensuring secure operations
- Router—enables Internet access for the management network
Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.
The following diagram illustrates the firewall design used in this solution:
Software Stack Components
Make sure to use the exact same versions for the software stack as described above.
Bill of Materials
Deployment and Configuration
Node and Switch Definitions
These are the definitions and parameters used for deploying the demonstrated fabric:
Switches Ports Usage
| Hostname | Rack ID | Ports |
|---|---|---|
| mgmt-switch | 1 | swp1-3 |
| hs-switch | 1 | swp1-5 |
Hosts
| Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway |
|---|---|---|---|---|---|
| Rack1 | Hypervisor Node | | mgmt-switch: ; hs-switch: | lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface enp1s0): - | Trusted LAN GW |
| Rack1 | Firewall (Virtual) | | - | WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 172.169.50.1/30 | Trusted LAN GW |
| Rack1 | Jump Node (Virtual) | | - | enp1s0: 10.0.110.253/24 | 10.0.110.254 |
| Rack1 | MaaS (Virtual) | | - | enp1s0: 10.0.110.252/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.1/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.2/24 | 10.0.110.254 |
| Rack1 | Master Node (Virtual) | | - | enp1s0: 10.0.110.3/24 | 10.0.110.254 |
| Rack1 | Worker Node | | mgmt-switch: ; hs-switch: | dpubmc: 10.0.110.21/24; ens1f0np0/ens1f1np1: 10.0.120.0/22 | 10.0.110.254 |
| Rack1 | Worker Node | | mgmt-switch: ; hs-switch: | dpubmc: 10.0.110.22/24; ens1f0np0/ens1f1np1: 10.0.120.0/22 | 10.0.110.254 |
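The addressing plan in the Hosts table can be sanity-checked before any VM is installed. The following is a minimal sketch using Python's standard `ipaddress` module; the dictionary keys (`jump`, `maas`, `master1`, ...) and the function names are illustrative labels chosen for this example, not names required by DPF. It checks that every management address sits in the same subnet as the firewall LAN gateway, that no address is used twice, and that the firewall OPT1 address and the switch `swp1` address share one /30 link.

```python
import ipaddress

# Addresses copied from the Hosts table and the hs-switch configuration.
MGMT_GATEWAY = ipaddress.ip_interface("10.0.110.254/24")  # firewall LAN
MGMT_HOSTS = {
    "jump": "10.0.110.253/24",
    "maas": "10.0.110.252/24",
    "master1": "10.0.110.1/24",
    "master2": "10.0.110.2/24",
    "master3": "10.0.110.3/24",
}


def check_plan(gateway, hosts):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    seen = {gateway.ip: "gateway"}
    for name, cidr in hosts.items():
        iface = ipaddress.ip_interface(cidr)
        if iface.network != gateway.network:
            problems.append(f"{name}: {iface} is outside {gateway.network}")
        if iface.ip in seen:
            problems.append(f"{name}: {iface.ip} duplicates {seen[iface.ip]}")
        seen[iface.ip] = name
    return problems


def same_p2p_link(a, b):
    """True when two interface addresses are distinct hosts on one subnet."""
    a, b = ipaddress.ip_interface(a), ipaddress.ip_interface(b)
    return a.network == b.network and a.ip != b.ip


if __name__ == "__main__":
    print(check_plan(MGMT_GATEWAY, MGMT_HOSTS))
    # Firewall OPT1 and hs-switch swp1 form the /30 transit link.
    print(same_p2p_link("172.169.50.1/30", "172.169.50.2/30"))
```

A check like this is cheap to rerun whenever the plan is adapted to a different management subnet.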
Wiring
Hypervisor Node
Bare Metal Worker Node
Fabric Configuration
Updating Cumulus Linux
As a best practice, make sure to use the latest released Cumulus Linux NOS version.
For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.
Configuring the Cumulus Linux Switch
The SN3700 switch (hs-switch) is configured as follows:
SN3700 Switch Console
nv set evpn enable on
nv set interface eth0 ip address dhcp
nv set interface eth0 ip vrf mgmt
nv set interface eth0 type eth
nv set interface lo ip address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1-5 link state up
nv set interface swp1-5 type swp
nv set interface swp1 ip address 172.169.50.2/30
nv set nve vxlan enable on
nv set router bgp autonomous-system 65001
nv set router bgp enable on
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set service ntp mgmt server 0.cumulusnetworks.pool.ntp.org
nv set service ntp mgmt server 1.cumulusnetworks.pool.ntp.org
nv set service ntp mgmt server 2.cumulusnetworks.pool.ntp.org
nv set service ntp mgmt server 3.cumulusnetworks.pool.ntp.org
nv set system aaa class nvapply action allow
nv set system aaa class nvapply command-path / permission all
nv set system aaa class nvshow action allow
nv set system aaa class nvshow command-path / permission ro
nv set system aaa class sudo action allow
nv set system aaa class sudo command-path / permission all
nv set system aaa role nvue-admin class nvapply
nv set system aaa role nvue-monitor class nvshow
nv set system aaa role system-admin class nvapply
nv set system aaa role system-admin class sudo
nv set system aaa user cumulus full-name cumulus,,,
nv set system aaa user cumulus hashed-password '*'
nv set system aaa user cumulus role system-admin
nv set system api state enabled
nv set system config auto-save state enabled
nv set system control-plane acl acl-default-dos inbound
nv set system control-plane acl acl-default-whitelist inbound
nv set system reboot mode cold
nv set system ssh-server state enabled
nv set system wjh channel forwarding trigger l2
nv set system wjh channel forwarding trigger l3
nv set system wjh channel forwarding trigger tunnel
nv set system wjh enable on
nv set vrf default router bgp address-family ipv4-unicast enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute connected enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute static enable on
nv set vrf default router bgp address-family ipv6-unicast enable on
nv set vrf default router bgp address-family ipv6-unicast redistribute connected enable on
nv set vrf default router bgp address-family l2vpn-evpn enable on
nv set vrf default router bgp enable on
nv set vrf default router bgp neighbor swp2 peer-group hbn
nv set vrf default router bgp neighbor swp2 type unnumbered
nv set vrf default router bgp neighbor swp3 peer-group hbn
nv set vrf default router bgp neighbor swp3 type unnumbered
nv set vrf default router bgp neighbor swp4 peer-group hbn
nv set vrf default router bgp neighbor swp4 type unnumbered
nv set vrf default router bgp neighbor swp5 peer-group hbn
nv set vrf default router bgp neighbor swp5 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore on
nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn enable on
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
nv set vrf default router static 0.0.0.0/0 via 172.169.50.1 type ipv4-address
nv config apply -y
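The per-port BGP unnumbered neighbor lines in the configuration above follow a fixed pattern: two commands per DPU-facing port. When the number of ports changes, they can be generated rather than typed. The following is a small sketch; the function name `hbn_neighbor_commands` is hypothetical, and it only builds the command strings that appear in this guide without contacting the switch.

```python
def hbn_neighbor_commands(ports, peer_group="hbn", vrf="default"):
    """Build the NVUE lines that attach each port to the BGP peer group."""
    commands = []
    for port in ports:
        base = f"nv set vrf {vrf} router bgp neighbor {port}"
        commands.append(f"{base} peer-group {peer_group}")
        commands.append(f"{base} type unnumbered")
    return commands


if __name__ == "__main__":
    # swp2 through swp5 are the DPU-facing ports used in this guide.
    for line in hbn_neighbor_commands([f"swp{n}" for n in range(2, 6)]):
        print(line)
```

The output can be pasted into the switch console ahead of `nv config apply -y`.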
The SN2201 switch (mgmt-switch) is configured as follows:
SN2201 Switch Console
nv set interface swp1-3 link state up
nv set interface swp1-3 type swp
nv set interface swp1-3 bridge domain br_default
nv set bridge domain br_default untagged 1
nv config apply
nv config save -y
Host Configuration
All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.
Hypervisor Installation and Configuration
The hypervisor used in this Reference Deployment Guide (RDG) is based on Ubuntu 24.04 with KVM.
While this document does not detail the KVM installation process, it is important to note that the setup requires the following ISOs to deploy the Firewall, Jump, and MaaS virtual machines (VMs):
- Ubuntu 24.04
- pfSense-CE-2.7.2
To implement the solution, three Linux bridges must be created on the hypervisor:
- lab-br – Connects the Firewall VM to the trusted LAN.
- mgmt-br – Connects the various VMs to the host management network.
- hs-br – Connects the Firewall VM to the high-speed network.
Ensure a DHCP record is configured for the lab-br bridge interface in your trusted LAN to assign it an IP address.
Additionally, an MTU of 9000 must be configured on the management and high-speed bridges ( mgmt-br and hs-br ) as well as their uplink interfaces to ensure optimal performance.
Hypervisor netplan configuration
network:
  ethernets:
    eno1:
      dhcp4: false
    eno2:
      dhcp4: false
      mtu: 9000
    ens2f0np0:
      dhcp4: false
      mtu: 9000
  bridges:
    lab-br:
      interfaces: [eno1]
      dhcp4: true
    mgmt-br:
      interfaces: [eno2]
      dhcp4: false
      mtu: 9000
    hs-br:
      interfaces: [ens2f0np0]
      dhcp4: false
      mtu: 9000
  version: 2
Apply the configuration:
Hypervisor Console
$ sudo netplan apply
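Because the MTU of 9000 has to match on each bridge and on its uplink interface, a mismatch is easy to introduce when adapting the file to different NIC names. The sketch below mirrors the Netplan file above as a plain Python dictionary (so it needs no YAML library) and reports any bridge or uplink that does not carry the expected MTU. The function name `mtu_mismatches` is an illustrative choice.

```python
# Same structure as the hypervisor Netplan file above.
NETPLAN = {
    "network": {
        "ethernets": {
            "eno1": {"dhcp4": False},
            "eno2": {"dhcp4": False, "mtu": 9000},
            "ens2f0np0": {"dhcp4": False, "mtu": 9000},
        },
        "bridges": {
            "lab-br": {"interfaces": ["eno1"], "dhcp4": True},
            "mgmt-br": {"interfaces": ["eno2"], "dhcp4": False, "mtu": 9000},
            "hs-br": {"interfaces": ["ens2f0np0"], "dhcp4": False, "mtu": 9000},
        },
        "version": 2,
    }
}


def mtu_mismatches(config, bridges, mtu=9000):
    """Return the bridges and uplinks whose MTU differs from the expected value."""
    net = config["network"]
    issues = []
    for bridge in bridges:
        bridge_cfg = net["bridges"][bridge]
        if bridge_cfg.get("mtu") != mtu:
            issues.append(bridge)
        for uplink in bridge_cfg.get("interfaces", []):
            if net["ethernets"].get(uplink, {}).get("mtu") != mtu:
                issues.append(uplink)
    return issues


if __name__ == "__main__":
    # Only mgmt-br and hs-br are required to use jumbo frames.
    print(mtu_mismatches(NETPLAN, ["mgmt-br", "hs-br"]))
```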
Prepare Infrastructure Servers
Firewall VM - pfSense Installation and Interface Configuration
Download the pfSense CE (Community Edition) ISO to your hypervisor and proceed with the software installation.
Suggested spec:
- vCPU: 2
- RAM: 2GB
- Storage: 10GB
Network interfaces
- Bridge device connected to lab-br
- Bridge device connected to mgmt-br
- Bridge device connected to hs-br
The Firewall VM must be connected to all three Linux bridges on the hypervisor. Before beginning the installation, ensure that three virtual network interfaces of type "Bridge device" are configured. Each interface should be connected to a different bridge (lab-br, mgmt-br, and hs-br) as illustrated in the diagram below.
After completing the installation, the setup wizard displays a menu with several options, such as "Assign Interfaces" and "Reboot System." During this phase, you must configure the network interfaces for the Firewall VM.
Select Option 2: "Set interface(s) IP address" and configure the interfaces as follows:
- WAN (lab-br) – Trusted LAN IP (Static/DHCP)
- LAN (mgmt-br) – Static IP 10.0.110.254/24
- OPT1 (hs-br) – Static IP 172.169.50.1/30
Once the interface configuration is complete, use a web browser within the host management network to access the Firewall web interface and finalize the configuration.
Next, proceed with installing the Jump VM. This VM serves as a platform for running a browser for accessing the firewall’s web interface (UI) for post-installation configuration.
Jump VM
Suggested specifications:
- vCPU: 4
- RAM: 8GB
- Storage: 25GB
- Network interface: Bridge device, connected to mgmt-br
Procedure:
Install standard Ubuntu 24.04 on each host. Use the following login credentials across all nodes in this deployment:
| Username | Password |
|---|---|
| depuser | user |
Enable internet connectivity and DNS resolution by creating the following Netplan configuration:
Note: Use 10.0.110.254 as a temporary DNS nameserver until the MaaS VM is installed and configured. After completing the MaaS installation, update the Netplan file to replace this address with the MaaS IP: 10.0.110.252.
Jump Node netplan
network:
  ethernets:
    enp1s0:
      dhcp4: false
      addresses: [10.0.110.253/24]
      nameservers:
        search: [dpf.rdg.local.domain]
        addresses: [10.0.110.254]
      routes:
        - to: default
          via: 10.0.110.254
  version: 2
Apply the configuration:
Jump Node Console
depuser@jump:~$ sudo netplan apply
Update and upgrade the system:
Jump Node Console
depuser@jump:~$ sudo apt update -y
depuser@jump:~$ sudo apt upgrade -y
Install and configure the Xfce desktop environment and XRDP (complementary packages for RDP):
Jump Node Console
depuser@jump:~$ sudo apt install -y xfce4 xfce4-goodies
depuser@jump:~$ sudo apt install -y lightdm-gtk-greeter
depuser@jump:~$ sudo apt install -y xrdp
depuser@jump:~$ echo "xfce4-session" | tee .xsession
depuser@jump:~$ sudo systemctl restart xrdp
Install Firefox for accessing the Firewall web interface:
Jump Node Console
$ sudo apt install -y firefox
Install and configure an NFS server with the /mnt/dpf_share directory:
Jump Node Console
$ sudo apt install -y nfs-server
$ sudo mkdir -m 777 /mnt/dpf_share
$ sudo vi /etc/exports
Add the following line to /etc/exports:
Jump Node Console
/mnt/dpf_share 10.0.110.0/24(rw,sync,no_subtree_check)
Restart the NFS server:
Jump Node Console
$ sudo systemctl restart nfs-server
Create the directory bfb under /mnt/dpf_share with the same permissions as the parent directory:
Jump Node Console
$ sudo mkdir -m 777 /mnt/dpf_share/bfb
Generate an SSH key pair for depuser on the jump node. These keys will later be imported to the admin user in MaaS to enable password-less login to the provisioned servers:
Jump Node Console
depuser@jump:~$ ssh-keygen -t rsa
Firewall VM – Web Configuration
From your Jump node, open a Firefox web browser and navigate to the pfSense web UI (http://10.0.110.254). The default login credentials are admin/pfsense. The login page should appear as follows:
The IP addresses from the trusted LAN network under "DNS servers" and "Interfaces - WAN" are blurred.
Configure the following settings:
The following screenshots display only a part of the configuration view. Make sure to not miss any of the steps mentioned below!
Interfaces
- WAN – Mark “Enable interface”, unmark “Block private networks and loopback addresses”, “MTU”: 9000
- LAN – Mark “Enable interface”, “IPv4 configuration type”: Static IPv4 ("IPv4 Address": 10.0.110.254/24, "IPv4 Upstream Gateway": None), “MTU”: 9000
- OPT1 – Mark “Enable interface”, “IPv4 configuration type”: Static IPv4 ("IPv4 Address": 172.169.50.1/30, "IPv4 Upstream Gateway": None), “MTU”: 9000
Firewall:
NAT -> Port Forward -> Add rule -> “Interface”: WAN, “Address Family”: IPv4, “Protocol”: TCP, “Destination”: WAN address, “Destination port range”: (“From port”: SSH, “To port”: SSH), “Redirect target IP”: (“Type”: Address or Alias, “Address”: 10.0.110.253), “Redirect target port”: SSH, “Description”: NAT SSH
NAT -> Port Forward -> Add rule -> “Interface”: WAN, “Address Family”: IPv4, “Protocol”: TCP, “Destination”: WAN address, “Destination port range”: (“From port”: MS RDP, “To port”: MS RDP), “Redirect target IP”: (“Type”: Address or Alias, “Address”: 10.0.110.253), “
Rules -> OPT1 -> Add rule -> “Action”: Pass , “Interface”: OPT1 , “Address Family”: IPv4+IPv6 , “Protocol”: Any , “Source”: Any , “Destination”: Any
MaaS VM
Suggested specifications:
- vCPU: 4
- RAM: 4 GB
- Storage: 100 GB
- Network interface: Bridge device, connected to mgmt-br
Procedure:
- Perform a regular Ubuntu installation on the MaaS VM.
Create the following Netplan configuration to enable internet connectivity and DNS resolution:
Note: Use 10.0.110.254 as a temporary DNS nameserver. After the MaaS installation, replace this with the MaaS IP address (10.0.110.252) in both the Jump and MaaS VM Netplan files.
MaaS netplan
network:
  ethernets:
    enp1s0:
      dhcp4: false
      addresses: [10.0.110.252/24]
      nameservers:
        search: [dpf.rdg.local.domain]
        addresses: [10.0.110.254]
      routes:
        - to: default
          via: 10.0.110.254
  version: 2
Apply the netplan configuration:
MaaS Console
depuser@maas:~$ sudo netplan apply
Update and upgrade the system:
MaaS Console
depuser@maas:~$ sudo apt update -y
depuser@maas:~$ sudo apt upgrade -y
Install PostgreSQL and configure the database for MaaS:
MaaS Console
$ sudo -i
# apt install -y postgresql
# systemctl disable --now systemd-timesyncd
# export MAAS_DBUSER=maasuser
# export MAAS_DBPASS=maaspass
# export MAAS_DBNAME=maas
# sudo -i -u postgres psql -c "CREATE USER \"$MAAS_DBUSER\" WITH ENCRYPTED PASSWORD '$MAAS_DBPASS'"
# sudo -i -u postgres createdb -O "$MAAS_DBUSER" "$MAAS_DBNAME"
Install MaaS:
MaaS Console
# snap install maas
Initialize MaaS:
MaaS Console
# maas init region+rack --maas-url http://10.0.110.252:5240/MAAS --database-uri "postgres://$MAAS_DBUSER:$MAAS_DBPASS@localhost/$MAAS_DBNAME"
Create an admin account:
MaaS Console
# maas createadmin --username admin --password admin --email admin@example.com
Save the admin API key:
MaaS Console
# maas apikey --username admin > admin-apikey
Log in to the MaaS server:
MaaS Console
# maas login admin http://localhost:5240/MAAS "$(cat admin-apikey)"
Configure MaaS (Substitute <Trusted_LAN_NTP_IP> and <Trusted_LAN_DNS_IP> with the IP addresses in your environment):
MaaS Console
# maas admin domain update maas name="dpf.rdg.local.domain"
# maas admin maas set-config name=ntp_servers value="<Trusted_LAN_NTP_IP>"
# maas admin maas set-config name=network_discovery value="disabled"
# maas admin maas set-config name=upstream_dns value="<Trusted_LAN_DNS_IP>"
# maas admin maas set-config name=dnssec_validation value="no"
# maas admin maas set-config name=default_osystem value="ubuntu"
Define and configure IP ranges and subnets:
MaaS Console
# maas admin ipranges create type=dynamic start_ip="10.0.110.51" end_ip="10.0.110.120"
# maas admin ipranges create type=dynamic start_ip="10.0.110.201" end_ip="10.0.110.240"
# maas admin ipranges create type=reserved start_ip="10.0.110.10" end_ip="10.0.110.10" comment="c-plane VIP"
# maas admin ipranges create type=reserved start_ip="10.0.110.200" end_ip="10.0.110.200" comment="kamaji VIP"
# maas admin ipranges create type=reserved start_ip="10.0.110.251" end_ip="10.0.110.254" comment="dpfmgmt"
# maas admin vlan update 0 untagged dhcp_on=True primary_rack=maas mtu=9000
# maas admin dnsresources create fqdn=kube-vip.dpf.rdg.local.domain ip_addresses=10.0.110.10
# maas admin dnsresources create fqdn=jump.dpf.rdg.local.domain ip_addresses=10.0.110.253
# maas admin dnsresources create fqdn=fw.dpf.rdg.local.domain ip_addresses=10.0.110.254
# maas admin fabrics create
Success.
Machine-readable output follows:
{
    "class_type": null,
    "name": "fabric-1",
    "id": 1,
    ...
# maas admin subnets create name="fake-dpf" cidr="20.20.20.0/24" fabric=1
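The dynamic and reserved ranges above must stay inside 10.0.110.0/24 and must not overlap each other. The following is a minimal sketch that checks both conditions with the standard `ipaddress` module. The range labels `dynamic-1` and `dynamic-2` and the function names are illustrative; the addresses are copied from the commands above.

```python
import ipaddress
from itertools import combinations

# Values copied from the `maas admin ipranges create` commands above.
SUBNET = ipaddress.ip_network("10.0.110.0/24")
RANGES = [
    ("dynamic-1", "10.0.110.51", "10.0.110.120"),
    ("dynamic-2", "10.0.110.201", "10.0.110.240"),
    ("c-plane VIP", "10.0.110.10", "10.0.110.10"),
    ("kamaji VIP", "10.0.110.200", "10.0.110.200"),
    ("dpfmgmt", "10.0.110.251", "10.0.110.254"),
]


def range_problems(subnet, ranges):
    """Return human-readable problems; an empty list means the plan is clean."""
    parsed = [
        (label, ipaddress.ip_address(start), ipaddress.ip_address(end))
        for label, start, end in ranges
    ]
    problems = []
    for label, start, end in parsed:
        if start > end:
            problems.append(f"{label}: start is after end")
        if start not in subnet or end not in subnet:
            problems.append(f"{label}: outside {subnet}")
    # Two closed intervals overlap when each one starts before the other ends.
    for (la, sa, ea), (lb, sb, eb) in combinations(parsed, 2):
        if sa <= eb and sb <= ea:
            problems.append(f"{la} overlaps {lb}")
    return problems


def covering_range(address, ranges):
    """Return the label of the range that contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for label, start, end in ranges:
        if ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end):
            return label
    return None


if __name__ == "__main__":
    print(range_problems(SUBNET, RANGES))
    # The static master addresses should fall outside every MaaS range.
    print(covering_range("10.0.110.1", RANGES))
```

`covering_range` is also useful for confirming that an address intended for static assignment, such as a master node IP, is not handed out by DHCP.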
Complete MaaS setup:
- Connect to the Jump node GUI and access the MaaS UI at http://10.0.110.252:5240/MAAS.
- On the first page, verify the "Region Name" and "DNS Forwarder," then continue.
- On the image selection page, select Ubuntu 24.04 LTS (amd64) and sync the image.
- Import the previously generated SSH key (id_rsa.pub) for depuser into the MaaS admin user profile and finalize the setup.
Configure DHCP snippets:
- Navigate to Settings → DHCP Snippets → Add Snippet.
- Fill in the following fields:
  - Name: dpu-bmc-oob-mgmt
  - Toggle on "Enabled"
  - Type: IP Range
  - Applies to: 10.0.110.201-10.0.110.240
- Fill in the content of the DHCP snippet field with the following (replace the MAC addresses with the appropriate values for your DPU workers' BMC and OOB interfaces):
DHCP snippet
# dpuworker1
host dpuworker1-bmc {
  #
  # Node DHCP snippets
  #
  hardware ethernet 58:a2:e1:73:6a:0b;
  fixed-address 10.0.110.201;
}
host dpuworker1-oob{
  #
  # Node DHCP snippets
  #
  hardware ethernet 58:a2:e1:73:6a:0a;
  fixed-address 10.0.110.221;
}
# dpuworker2
host dpuworker2-bmc {
  #
  # Node DHCP snippets
  #
  hardware ethernet 58:a2:e1:73:6a:7d;
  fixed-address 10.0.110.202;
}
host dpuworker2-oob{
  #
  # Node DHCP snippets
  #
  hardware ethernet 58:a2:e1:73:6a:7c;
  fixed-address 10.0.110.222;
}
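With more than a couple of DPU workers, writing the host stanzas by hand becomes error prone. The sketch below generates them from a list of (name, MAC, IP) tuples and rejects malformed input. The function names `dhcp_host_stanza` and `dhcp_snippet` are hypothetical helpers for this example, and the output omits the comment lines shown in the snippet above.

```python
import ipaddress
import re

# Six colon-separated hex octets, lower case.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def dhcp_host_stanza(name, mac, ip):
    """Render one ISC DHCP host block with a fixed address."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"invalid MAC address: {mac}")
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        "}"
    )


def dhcp_snippet(entries):
    """Join the stanzas for all (name, mac, ip) entries."""
    return "\n".join(dhcp_host_stanza(*entry) for entry in entries)


if __name__ == "__main__":
    # Values taken from the snippet above; replace them with your own.
    print(dhcp_snippet([
        ("dpuworker1-bmc", "58:a2:e1:73:6a:0b", "10.0.110.201"),
        ("dpuworker1-oob", "58:a2:e1:73:6a:0a", "10.0.110.221"),
    ]))
```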
Go to Settings → Deploy, set "Default OS release" to Ubuntu 24.04 LTS Noble Numbat, and save.
- Update the DNS nameserver IP address in the Netplan files for both the Jump and MaaS VMs from 10.0.110.254 to 10.0.110.252, then reapply the configuration.
K8s Master VMs
Suggested specifications:
- vCPU: 8
- RAM: 16GB
- Storage: 100GB
- Network interface: Bridge device, connected to mgmt-br
Before provisioning the Kubernetes (K8s) Master VMs with MaaS, create the required virtual disks with empty storage. Use the following one-liner to create three 100 GB QCOW2 virtual disks:
Hypervisor Console
$ for i in $(seq 1 3); do qemu-img create -f qcow2 /var/lib/libvirt/images/master$i.qcow2 100G; done
This command generates the following disks in the /var/lib/libvirt/images/ directory:
master1.qcow2
master2.qcow2
master3.qcow2
Configure VMs in virt-manager:
- Open virt-manager and create three virtual machines:
  - Assign the corresponding virtual disk (master1.qcow2, master2.qcow2, or master3.qcow2) to each VM.
  - Configure each VM with the suggested specifications (vCPU, RAM, storage, and network interface).
- During the VM setup, ensure the NIC is selected under the Boot Options tab. This ensures the VMs can PXE boot for MaaS provisioning.
- Once the configuration is complete, shut down all the VMs.
- After the VMs are created and configured, proceed to provision them via the MaaS interface. MaaS will handle the OS installation and further setup as part of the deployment process.
Provision Master VMs Using MaaS
Install virsh and Set Up SSH Access
SSH to the MaaS VM from the Jump node:
MaaS Console
depuser@jump:~$ ssh maas
depuser@maas:~$ sudo -i
Install the virsh client to communicate with the hypervisor:
MaaS Console
# apt install -y libvirt-clients
Generate an SSH key for the root user and copy it to the hypervisor user in the libvirtd group:
MaaS Console
# ssh-keygen -t rsa
# ssh-copy-id ubuntu@<hypervisor_MGMT_IP>
Verify SSH access and virsh communication with the hypervisor:
MaaS Console
# virsh -c qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system list --all
Expected output:
MaaS Console
 Id   Name      State
------------------------------
 1    fw        running
 2    jump      running
 3    maas      running
 -    master1   shut off
 -    master2   shut off
 -    master3   shut off
Copy the SSH key to the required MaaS directory (for snap-based installations):
MaaS Console
# mkdir -p /var/snap/maas/current/root/.ssh
# cp .ssh/id_rsa* /var/snap/maas/current/root/.ssh/
Get MAC Addresses of the Master VMs
Retrieve the MAC addresses of the Master VMs:
MaaS Console
# for i in $(seq 1 3); do virsh -c qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system dumpxml master$i | grep 'mac address'; done
Example output:
MaaS Console
<mac address='52:54:00:a9:9c:ef'/>
<mac address='52:54:00:19:6b:4d'/>
<mac address='52:54:00:68:39:7f'/>
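Filtering `dumpxml` output with `grep` works for a quick look; parsing the XML is more robust when the MAC addresses are fed into later steps. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `SAMPLE_XML` document is a heavily trimmed, hypothetical stand-in for real `virsh dumpxml` output, and `interface_macs` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for `virsh dumpxml master1` output.
SAMPLE_XML = """
<domain type='kvm'>
  <name>master1</name>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:a9:9c:ef'/>
      <source bridge='mgmt-br'/>
    </interface>
  </devices>
</domain>
"""


def interface_macs(domain_xml):
    """Return the MAC address of every interface in a libvirt domain XML."""
    root = ET.fromstring(domain_xml.strip())
    return [mac.get("address") for mac in root.findall("./devices/interface/mac")]


if __name__ == "__main__":
    print(interface_macs(SAMPLE_XML))
```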
Add Master VMs to MaaS
Add the Master VMs to MaaS:
Info: Once added, MaaS automatically starts commissioning the newly added VMs (discovery and introspection).
MaaS Console
# maas admin machines create hostname=master1 architecture=amd64/generic mac_addresses='52:54:00:a9:9c:ef' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master1 skip_bmc_config=1 testing_scripts=none
Success.
Machine-readable output follows:
{
    "description": "",
    "status_name": "Commissioning",
    ...
    "status": 1,
    ...
    "system_id": "c3seyq",
    ...
    "fqdn": "master1.dpf.rdg.local.domain",
    "power_type": "virsh",
    ...
    "status_message": "Commissioning",
    "resource_uri": "/MAAS/api/2.0/machines/c3seyq/"
}
# maas admin machines create hostname=master2 architecture=amd64/generic mac_addresses='52:54:00:19:6b:4d' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master2 skip_bmc_config=1 testing_scripts=none
# maas admin machines create hostname=master3 architecture=amd64/generic mac_addresses='52:54:00:68:39:7f' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master3 skip_bmc_config=1 testing_scripts=none
- Repeat the command for master2 and master3 with their respective MAC addresses.
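Since the three `maas admin machines create` commands differ only in hostname and MAC address, they can be produced from a small table. The sketch below builds the command lines with the same arguments used in this guide; it does not run them. The helper name `maas_create_command` is hypothetical, and `192.0.2.10` is a documentation placeholder standing in for `<hypervisor_MGMT_IP>`.

```python
import shlex


def maas_create_command(hostname, mac, hypervisor):
    """Build one `maas admin machines create` command line for a virsh VM."""
    args = [
        "maas", "admin", "machines", "create",
        f"hostname={hostname}",
        "architecture=amd64/generic",
        f"mac_addresses={mac}",
        "power_type=virsh",
        f"power_parameters_power_address=qemu+ssh://ubuntu@{hypervisor}/system",
        f"power_parameters_power_id={hostname}",
        "skip_bmc_config=1",
        "testing_scripts=none",
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


if __name__ == "__main__":
    masters = {
        "master1": "52:54:00:a9:9c:ef",
        "master2": "52:54:00:19:6b:4d",
        "master3": "52:54:00:68:39:7f",
    }
    for name, mac in masters.items():
        # 192.0.2.10 is a placeholder; use your hypervisor management IP.
        print(maas_create_command(name, mac, "192.0.2.10"))
```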
Verify commissioning by waiting for the status to change to "Ready" in MaaS.
After commissioning, the next phase is deployment (OS provisioning).
Configure Master VMs Network
To ensure persistence across reboots, assign a static IP address to the management interface of the master nodes.
For each Master VM:
Navigate to Network and click "actions" near the management interface (a small arrowhead pointing down), then select "Edit Physical".
Configure as follows:
- Subnet: 10.0.110.0/24
- IP Mode: Static Assign
- Address: Assign 10.0.110.1 for master1, 10.0.110.2 for master2, and 10.0.110.3 for master3.
- Save the interface settings for each VM.
Deploy Master VMs Using Cloud-Init
Use the following cloud-init script to configure the necessary software and ensure persistency:
Master nodes cloud-init
#cloud-config
system_info:
  default_user:
    name: depuser
    passwd: "$6$jOKPZPHD9XbG72lJ$evCabLvy1GEZ5OR1Rrece3NhWpZ2CnS0E3fu5P1VcZgcRO37e4es9gmriyh14b8Jx8gmGwHAJxs3ZEjB0s0kn/"
    lock_passwd: false
    groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
ssh_pwauth: True
package_upgrade: true
runcmd:
  - apt-get update
  - apt-get -y install nfs-common
Deploy the master VMs:
- Select all three Master VMs → Actions → Deploy.
- Toggle Cloud-init user-data and paste the cloud-init script.
Start the deployment and wait for the status to change to "Ubuntu 24.04 LTS".
Verify Deployment
SSH into the Master VMs from the Jump node:
Jump Node Console
depuser@jump:~$ ssh master1
depuser@master1:~$
Run sudo without a password:
Master1 Console
depuser@master1:~$ sudo -i
root@master1:~#
Verify installed packages:
Master1 Console
root@master1:~# apt list --installed | egrep 'nfs-common'
nfs-common/noble,now 1:2.6.4-3ubuntu5 amd64 [installed]
- Reboot the Master VMs to complete the provisioning.
Master1 Console
root@master1:~# reboot
Repeat the verification commands for master2 and master3.
K8s Cluster Deployment and Configuration
Kubespray Deployment and Configuration
In this solution, the Kubernetes (K8s) cluster is deployed using a modified Kubespray (based on tag v2.26.0) with a non-root depuser account from the Jump Node. The modifications in Kubespray are designed to meet the DPF prerequisites as described in the User Manual and facilitate cluster deployment and scaling.
Our modified Kubespray installs Flannel CNI for the primary Kubernetes network.
- Download the modified Kubespray archive: modified_kubespray_v2.26.0.tar.gz.
Extract the contents and navigate to the extracted directory:
Jump Node Console
$ tar -xzf /home/depuser/modified_kubespray_v2.26.0.tar.gz
$ cd kubespray/
depuser@jump:~/kubespray$
Verify that the network plugin is set to flannel and that kube_proxy_remove is set to false in the inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml file.
inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
[depuser@jump kubespray-2.26.0]$ vim inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Choose network plugin (cilium, calico, kube-ovn, weave or flannel. Use cni for generic cni plugin)
# Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
kube_network_plugin: flannel
....
# Kube-proxy proxyMode configuration.
# Can be ipvs, iptables
kube_proxy_remove: false
kube_proxy_mode: ipvs
.....
Set the K8s API VIP address and DNS record. Replace it with your own IP address and DNS record if different:
Jump Node Console
depuser@jump:~/kubespray$ sed -i '/ #kube_vip_address:/s/.*/kube_vip_address: 10.0.110.10/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
depuser@jump:~/kubespray$ sed -i '/apiserver_loadbalancer_domain_name:/s/.*/apiserver_loadbalancer_domain_name: "kube-vip.dpf.rdg.local.domain"/' roles/kubespray-defaults/defaults/main/main.yml
Install the necessary dependencies and set up the Python virtual environment:
Jump Node Console
depuser@jump:~/kubespray$ sudo apt -y install python3-pip jq python3.12-venv
depuser@jump:~/kubespray$ python3 -m venv .venv
depuser@jump:~/kubespray$ source .venv/bin/activate
(.venv) depuser@jump:~/kubespray$ python3 -m pip install --upgrade pip
(.venv) depuser@jump:~/kubespray$ pip install -U -r requirements.txt
(.venv) depuser@jump:~/kubespray$ pip install ruamel-yaml
Review and edit the inventory/mycluster/hosts.yaml file to define the cluster nodes. The following is the configuration for this deployment:
inventory/mycluster/hosts.yaml
all:
  hosts:
    master1:
      ansible_host: 10.0.110.1
      ip: 10.0.110.1
      access_ip: 10.0.110.1
      node_labels:
        "k8s.ovn.org/zone-name": "master1"
    master2:
      ansible_host: 10.0.110.2
      ip: 10.0.110.2
      access_ip: 10.0.110.2
      node_labels:
        "k8s.ovn.org/zone-name": "master2"
    master3:
      ansible_host: 10.0.110.3
      ip: 10.0.110.3
      access_ip: 10.0.110.3
      node_labels:
        "k8s.ovn.org/zone-name": "master3"
  children:
    kube_control_plane:
      hosts:
        master1:
        master2:
        master3:
    kube_node:
      hosts:
    etcd:
      hosts:
        master1:
        master2:
        master3:
    k8s_cluster:
      children:
        kube_control_plane:
Deploying Cluster Using Kubespray Ansible Playbook
Run the following command from the Jump Node to initiate the deployment process:
Note: Ensure you are in the Python virtual environment (.venv) when running the command.
Jump Node Console
(.venv) depuser@jump:~/kubespray$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
It takes a while for this deployment to complete. Make sure there are no errors.
Tip: Keep the shell from which Kubespray was run open. It will be useful later when scaling out the cluster to add the worker nodes.
K8s Deployment Verification
To simplify managing the K8s cluster from the Jump Host, set up kubectl with bash auto-completion.
Copy kubectl and the kubeconfig file from master1 to the Jump Host:
Jump Node Console
## Connect to master1
depuser@jump:~$ ssh master1
depuser@master1:~$ cp /usr/local/bin/kubectl /tmp/
depuser@master1:~$ sudo cp /root/.kube/config /tmp/kube-config
depuser@master1:~$ sudo chmod 644 /tmp/kube-config
In another terminal tab, copy the files to the Jump Host:
Jump Node Console
depuser@jump:~$ scp master1:/tmp/kubectl /tmp/
depuser@jump:~$ sudo chown root:root /tmp/kubectl
depuser@jump:~$ sudo mv /tmp/kubectl /usr/local/bin/
depuser@jump:~$ mkdir -p ~/.kube
depuser@jump:~$ scp master1:/tmp/kube-config ~/.kube/config
depuser@jump:~$ chmod 600 ~/.kube/config
Enable bash auto-completion for kubectl:
Verify if bash-completion is installed:
Jump Node Console
depuser@jump:~$ type _init_completion
If installed, the output includes:
Jump Node Console
_init_completion is a function
If not installed, install it:
Jump Node Console
depuser@jump:~$ sudo apt install -y bash-completion
Set up the kubectl completion script:
Jump Node Console
depuser@jump:~$ kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null
depuser@jump:~$ bash
Check the status of the nodes in the cluster:
Jump Node Console
depuser@jump:~$ kubectl get nodes
Expected output:
Jump Node Console
NAME      STATUS   ROLES           AGE     VERSION
master1   Ready    control-plane   8m7s    v1.30.4
master2   Ready    control-plane   7m13s   v1.30.4
master3   Ready    control-plane   6m40s   v1.30.4
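The node check above can also be scripted. The following is a minimal sketch, assuming the default `kubectl get nodes` table layout shown above; `not_ready_nodes` is a hypothetical helper name and is not part of Kubespray or DPF.

```shell
# Hypothetical helper: reads `kubectl get nodes` table output on stdin and
# prints the names of nodes whose STATUS column is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore reported as well.
not_ready_nodes() {
    # NR > 1 skips the header row; column 1 is NAME, column 2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on the Jump Node (requires a working kubeconfig):
#   kubectl get nodes | not_ready_nodes
```

An empty result means that every listed node reports Ready.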
Check the pods in all namespaces:
Jump Node Console
depuser@jump:~$ kubectl get pods -A
Expected output:
Jump Node Console
[depuser@setup5-jump ~]$ kubectl get pods -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-776bb9db5d-2st6b          1/1     Running   0          5m58s
kube-system   coredns-776bb9db5d-kbklh          1/1     Running   0          5m53s
kube-system   dns-autoscaler-6ffb84bd6-cp466    1/1     Running   0          5m54s
kube-system   kube-apiserver-master1            1/1     Running   0          8m35s
kube-system   kube-apiserver-master2            1/1     Running   0          7m44s
kube-system   kube-apiserver-master3            1/1     Running   0          7m10s
kube-system   kube-controller-manager-master1   1/1     Running   1          8m35s
kube-system   kube-controller-manager-master2   1/1     Running   1          7m44s
kube-system   kube-controller-manager-master3   1/1     Running   1          7m10s
kube-system   kube-flannel-8r2dd                1/1     Running   0          6m22s
kube-system   kube-flannel-sq88x                1/1     Running   0          6m22s
kube-system   kube-flannel-xf9mn                1/1     Running   0          6m23s
kube-system   kube-proxy-4v7hn                  1/1     Running   0          8m21s
kube-system   kube-proxy-6cdjc                  1/1     Running   0          7m14s
kube-system   kube-proxy-tm2j4                  1/1     Running   0          7m47s
kube-system   kube-scheduler-master1            1/1     Running   1          8m36s
kube-system   kube-scheduler-master2            1/1     Running   1          7m45s
kube-system   kube-scheduler-master3            1/1     Running   1          7m10s
kube-system   kube-vip-master1                  1/1     Running   0          8m35s
kube-system   kube-vip-master2                  1/1     Running   0          7m45s
kube-system   kube-vip-master3                  1/1     Running   0          7m10s
DPF Installation
Software Prerequisites and Required Variables
Start by installing the remaining software prerequisites:
Jump Node Console
## Connect to master1 to copy helm client utility that was installed during kubespray deployment
depuser@jump:~$ ssh master1
depuser@master1:~$ cp /usr/local/bin/helm /tmp/
## In another tab
depuser@jump:~$ scp master1:/tmp/helm /tmp/
depuser@jump:~$ sudo chown root:root /tmp/helm
depuser@jump:~$ sudo mv /tmp/helm /usr/local/bin/
## Verify that envsubst utility is installed
depuser@jump:~$ which envsubst
/usr/bin/envsubst
Proceed to clone the doca-platform Git repository (and make sure to use tag v25.4.0):
Jump Node Console
$ git clone https://github.com/NVIDIA/doca-platform.git
$ cd doca-platform
$ git checkout v25.4.0
Change directory to the location of the hbn-only readme.md from where all the commands are run:
Jump Node Console
$ cd docs/public/user-guides/hbn_only/
Use the following file to define the required variables for the installation:
Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPU_P0 and DPUCLUSTER_INTERFACE.
export_vars.env
## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10
## Port for the Kubernetes API server of the target cluster on which DPF is installed.
export TARGETCLUSTER_API_SERVER_PORT=6443
## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
export DPUCLUSTER_VIP=10.0.110.200
## DPU_P0 is the name of the first port of the DPU. This name must be the same on all worker nodes.
export DPU_P0=ens1f0np0
## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=eno1
# IP address to the NFS server used as storage for the BFB.
export NFS_SERVER_IP=10.0.110.253
## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca
## The repository URL for the HBN container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn
## The DPF REGISTRY is the Helm repository URL for the DPF Operator.
## Usually this is the GHCR registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca
## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v25.4.0
## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BLUEFIELD_BITSTREAM="https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-3.0.0-135_25.04_ubuntu-22.04_prod.bfb"
Export environment variables for the installation:
Jump Node Console
$ source export_vars.env
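Because the later steps substitute these variables into manifests, it can help to confirm that they are all exported before continuing. The following is a minimal sketch; `check_required_vars` is a hypothetical helper name, and the variable list in the usage comment is taken from export_vars.env above.

```shell
# Hypothetical helper: reports which of the named variables are missing from
# the environment or empty. Returns 0 when all are set, 1 otherwise.
check_required_vars() {
    local missing=0 name
    for name in "$@"; do
        # printenv only sees exported variables, which matches the
        # `export NAME=value` style used in export_vars.env.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after `source export_vars.env`:
#   check_required_vars TARGETCLUSTER_API_SERVER_HOST TARGETCLUSTER_API_SERVER_PORT \
#       DPUCLUSTER_VIP DPU_P0 DPUCLUSTER_INTERFACE NFS_SERVER_IP \
#       HELM_REGISTRY_REPO_URL HBN_NGC_IMAGE_URL REGISTRY TAG BLUEFIELD_BITSTREAM
```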
DPF Operator Installation
Cert-manager Installation
Cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes workloads. It obtains certificates from a variety of Issuers, both popular public Issuers as well as private ones. Cert-manager ensures certificates are valid and up-to-date, and it attempts to renew certificates at a configured time before expiration.
In this deployment, Cert-manager is a prerequisite that provides certificates for webhooks used by DPF and its dependencies.
Install Cert-manager using Helm. The following values will be used for the Helm chart installation:
manifests/01-dpf-operator-installation/helm-values/cert-manager.yml
startupapicheck:
  enabled: false
crds:
  enabled: true
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/master
              operator: Exists
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
tolerations:
  - operator: Exists
    effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - operator: Exists
    effect: NoSchedule
    key: node-role.kubernetes.io/master
cainjector:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
          - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
  tolerations:
    - operator: Exists
      effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
    - operator: Exists
      effect: NoSchedule
      key: node-role.kubernetes.io/master
webhook:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
          - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
  tolerations:
    - operator: Exists
      effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
    - operator: Exists
      effect: NoSchedule
      key: node-role.kubernetes.io/master
Run the following commands:
Jump Node Console
$ helm repo add jetstack https://charts.jetstack.io --force-update
$ helm upgrade --install --create-namespace --namespace cert-manager cert-manager jetstack/cert-manager --version v1.16.1 -f ./manifests/01-dpf-operator-installation/helm-values/cert-manager.yml
Release "cert-manager" does not exist. Installing it now.
NAME: cert-manager
LAST DEPLOYED: Tue Apr 8 13:40:48 2025
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.16.1 has been deployed successfully!
...
Verify that all the pods in the Cert-manager namespace are in the Ready state:
Jump Node Console
$ kubectl wait --for=condition=ready --namespace cert-manager pods --all
pod/cert-manager-6ffdf6c5f8-5sx4q condition met
pod/cert-manager-cainjector-66b8577665-rgrlz condition met
pod/cert-manager-webhook-5cb94cb7b6-c7lpz condition met
Install a CSI to Back the DPUCluster etcd
Download the local-path-provisioner Helm chart to your current working directory and create a namespace for it:
Jump Node Console
$ curl https://codeload.github.com/rancher/local-path-provisioner/tar.gz/v0.0.30 | tar -xz --strip=3 local-path-provisioner-0.0.30/deploy/chart/local-path-provisioner/
$ kubectl create ns local-path-provisioner
The following values will be used for the installation:
manifests/01-dpf-operator-installation/helm-values/local-path-provisioner.yml
tolerations:
  - operator: Exists
    effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
  - operator: Exists
    effect: NoSchedule
    key: node-role.kubernetes.io/master
Run the following command:
Jump Node Console
$ helm install -n local-path-provisioner local-path-provisioner ./local-path-provisioner --version 0.0.30 -f ./manifests/01-dpf-operator-installation/helm-values/local-path-provisioner.yml
NAME: local-path-provisioner
LAST DEPLOYED: Tue Apr 8 13:43:06 2025
NAMESPACE: local-path-provisioner
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
...
Ensure that the pod in the local-path-provisioner namespace is in the Ready state:
Jump Node Console
$ kubectl wait --for=condition=ready --namespace local-path-provisioner pods --all
pod/local-path-provisioner-75f649c47c-rsvb8 condition met
Create Storage Required by the DPF Operator
The following YAML file defines the storage (for the BFB images) that is required by the DPF Operator.
manifests/01-dpf-operator-installation/nfs-storage-for-bfb-dpf-ga.yaml
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bfb-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  nfs:
    path: /mnt/dpf_share/bfb
    server: $NFS_SERVER_IP
  persistentVolumeReclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bfb-pvc
  namespace: dpf-operator-system
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeMode: Filesystem
  storageClassName: ""
Run the following commands to first create the namespace for the DPF Operator, then substitute the environment variables using envsubst, and apply the YAML files:
Jump Node Console
$ kubectl create namespace dpf-operator-system
$ cat manifests/01-dpf-operator-installation/*.yaml | envsubst | kubectl apply -f -
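Note that envsubst replaces a reference to an unset variable with an empty string, so a missing variable does not cause an error at this step. The following is a minimal sketch for listing the variables that a manifest references before it is rendered; `template_vars` is a hypothetical helper name, and it only recognizes upper-case names written as `$NAME` or `${NAME}`.

```shell
# Hypothetical helper: reads manifest text on stdin and prints the names of
# the variables it references as $NAME or ${NAME}, one per line, de-duplicated.
template_vars() {
    grep -oE '\$\{?[A-Z_][A-Z0-9_]*\}?' | tr -d '${}' | sort -u
}

# Usage from the hbn_only directory:
#   cat manifests/01-dpf-operator-installation/*.yaml | template_vars
```

Each printed name can then be compared with the variables defined in export_vars.env.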
DPF Operator Deployment
The DPF Operator Helm values are detailed in the following YAML file:
manifests/01-dpf-operator-installation/helm-values/dpf-operator.yml
kamaji-etcd:
  persistentVolumeClaim:
    storageClassName: local-path
node-feature-discovery:
  worker:
    extraEnvs:
      - name: "KUBERNETES_SERVICE_HOST"
        value: "$TARGETCLUSTER_API_SERVER_HOST"
      - name: "KUBERNETES_SERVICE_PORT"
        value: "$TARGETCLUSTER_API_SERVER_PORT"
Run the following commands to substitute the environment variables and install the DPF Operator:
Jump Node Console
$ helm repo add --force-update dpf-repository ${REGISTRY}
$ helm repo update
$ envsubst < ./manifests/01-dpf-operator-installation/helm-values/dpf-operator.yml | helm upgrade --install -n dpf-operator-system dpf-operator dpf-repository/dpf-operator --version=$TAG --values -
Release "dpf-operator" does not exist. Installing it now.
coalesce.go:286: warning: cannot overwrite table with non table for dpf-operator.parca.server.tolerations (map[])
NAME: dpf-operator
LAST DEPLOYED: Tue May 20 23:18:22 2025
NAMESPACE: dpf-operator-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Verify the DPF Operator installation by ensuring the deployment is available and all the pods are ready:
The following verification commands may need to be run multiple times to ensure the conditions are met.
Jump Node Console
$ kubectl rollout status deployment --namespace dpf-operator-system dpf-operator-controller-manager
deployment "dpf-operator-controller-manager" successfully rolled out
$ kubectl wait --for=condition=ready --namespace dpf-operator-system pods --all
pod/dpf-operator-argocd-application-controller-0 condition met
pod/dpf-operator-argocd-redis-5bc74d76fc-v6l7m condition met
pod/dpf-operator-argocd-repo-server-86c9454fc9-zqtqf condition met
pod/dpf-operator-argocd-server-554d9f446-lntpv condition met
pod/dpf-operator-controller-manager-67599cdcb7-5dchf condition met
pod/dpf-operator-kamaji-6dcf4ccdfd-fg64w condition met
pod/dpf-operator-kamaji-etcd-0 condition met
pod/dpf-operator-kamaji-etcd-1 condition met
pod/dpf-operator-kamaji-etcd-2 condition met
pod/dpf-operator-maintenance-operator-666b88bfcd-p72nn condition met
pod/dpf-operator-node-feature-discovery-gc-656b95dc48-gwtsb condition met
pod/dpf-operator-node-feature-discovery-master-76d5695c7c-6kwfz condition met
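Since the verification commands may need to be run several times, they can be wrapped in a small retry loop. The following is a minimal sketch; `retry` is a hypothetical helper name, and the attempt count and delay in the usage comment are arbitrary example values.

```shell
# Hypothetical helper: runs a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 on the first success and 1 if
# every attempt fails.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Sleep only when another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Usage on the Jump Node:
#   retry 30 10 kubectl wait --for=condition=ready --namespace dpf-operator-system pods --all
```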
DPF System Installation
This section involves creating the DPF system components and some basic infrastructure required for a functioning DPF-enabled cluster.
The files define the DPFOperatorConfig to install the DPF System components, and the DPUCluster to serve as the Kubernetes control plane for DPU nodes.
manifests/02-dpf-system-installation/operatorconfig.yaml
---
apiVersion: operator.dpu.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpfoperatorconfig
  namespace: dpf-operator-system
spec:
  kamajiClusterManager:
    disable: false
  provisioningController:
    bfbPVCName: bfb-pvc
    installInterface:
      installViaRedfish:
        # set this to the IP of one of your control plane nodes + 8080
        bfbRegistryAddress: "10.0.110.1:8080"
    dmsTimeout: 900
  staticClusterManager:
    disable: false
  networking:
    controlPlaneMTU: 9216
    highSpeedMTU: 9216
manifests/02-dpf-system-installation/dpucluster.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUCluster
metadata:
  name: dpu-cplane-tenant1
  namespace: dpu-cplane-tenant1
spec:
  type: kamaji
  maxNodes: 10
  version: v1.30.2
  clusterEndpoint:
    # deploy keepalived instances on the nodes that match the given nodeSelector.
    keepalived:
      # interface on which keepalived will listen. Should be the oob interface of the control plane node.
      interface: $DPUCLUSTER_INTERFACE
      # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP.
      vip: $DPUCLUSTER_VIP
      # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host
      virtualRouterID: 126
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
Create a namespace for the Kubernetes control plane of the DPU nodes:
Jump Node Console
$ kubectl create ns dpu-cplane-tenant1
Apply the previous YAML files:
Jump Node Console
$ cat manifests/02-dpf-system-installation/operatorconfig.yaml | envsubst | kubectl apply -f -
$ cat manifests/02-dpf-system-installation/dpucluster.yaml | envsubst | kubectl apply -f -
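The DPUCluster manifest requires virtualRouterID to be in the range [1, 255]. If the value is changed from 126, a quick range check can catch typos before the manifest is applied. The following is a minimal sketch; `valid_vrid` is a hypothetical helper name, and it does not detect conflicts with other keepalived instances on the host.

```shell
# Hypothetical helper: succeeds when the argument is a decimal integer in the
# range [1, 255], as required for the keepalived virtualRouterID.
valid_vrid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Usage:
#   valid_vrid 126 && echo "virtualRouterID is in range"
```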
Verify the DPF system by ensuring that the provisioning and DPUService controller manager deployments are available, all other deployments in the DPF Operator system are available, and that the DPUCluster is ready for nodes to join.
Jump Node Console
$ kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager
deployment "dpf-provisioning-controller-manager" successfully rolled out
deployment "dpuservice-controller-manager" successfully rolled out
$ kubectl rollout status deployment --namespace dpf-operator-system
deployment "dpf-operator-argocd-applicationset-controller" successfully rolled out
deployment "dpf-operator-argocd-redis" successfully rolled out
deployment "dpf-operator-argocd-repo-server" successfully rolled out
deployment "dpf-operator-argocd-server" successfully rolled out
deployment "dpf-operator-controller-manager" successfully rolled out
deployment "dpf-operator-kamaji" successfully rolled out
deployment "dpf-operator-maintenance-operator" successfully rolled out
deployment "dpf-operator-node-feature-discovery-gc" successfully rolled out
deployment "dpf-operator-node-feature-discovery-master" successfully rolled out
deployment "dpf-provisioning-controller-manager" successfully rolled out
deployment "dpuservice-controller-manager" successfully rolled out
deployment "kamaji-cm-controller-manager" successfully rolled out
deployment "static-cm-controller-manager" successfully rolled out
$ kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all
dpucluster.provisioning.dpu.nvidia.com/dpu-cplane-tenant1 condition met
DPU Provisioning
Run the following command from the Jump Node console to verify the BMC version (25.04-2 is the recommended BMC FW version):
Jump Node Console
$ curl -k -u root:'3tango11!OBMC' https://10.0.110.201/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware
{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware",
  "@odata.type": "#SoftwareInventory.v1_4_0.SoftwareInventory",
  "Description": "BMC image",
  "Id": "BMC_Firmware",
  "Manufacturer": "",
  "Name": "Software Inventory",
  "RelatedItem": [],
  "RelatedItem@odata.count": 0,
  "SoftwareId": "0x0018",
  "Status": {
    "Conditions": [],
    "Health": "OK",
    "HealthRollup": "OK",
    "State": "Enabled"
  },
  "Updateable": true,
  "Version": "BF-23.09-6",
  "WriteProtected": false
}
If you have an older BMC version, run the following steps to update the DPU BMC, UEFI, and firmware:
Download the relevant BFB image.
Jump Node Console
$ wget https://content.mellanox.com/BlueField/BFBs/Ubuntu22.04/bf-bundle-3.0.0-135_25.04_ubuntu-22.04_prod.bfb
Create a bf.cfg file.
Jump Node Console
$ vim bf.cfg
BMC_PASSWORD="$(tr -dc 'A-Za-z0-9' </dev/urandom | head -c 4)-$(tr -dc 'A-Za-z0-9' </dev/urandom | head -c 4)_$(tr -dc '0-9' </dev/urandom | head -c 2)$(tr -dc 'a-z' </dev/urandom | head -c 1)$(tr -dc 'A-Z' </dev/urandom | head -c 1)"
BMC_USER="firmware_updater"
BMC_REBOOT="yes"
CEC_REBOOT="yes"
USER_ID=8
pre_bmc_components_update() {
    ipmitool user set name $USER_ID $BMC_USER
    ipmitool user set password $USER_ID $BMC_PASSWORD
    ipmitool user enable $USER_ID
    ipmitool channel setaccess 1 $USER_ID ipmi=on
    ipmitool user priv $USER_ID 0x4 1
}
post_bmc_components_update() {
    ipmitool user set name $USER_ID ""
}
Run the following command to append bf.cfg to the BFB image:
Jump Node Console
$ cat bf-bundle-3.0.0-135_25.04_ubuntu-22.04_prod.bfb bf.cfg > bfb-install.bfb
Connect to the DPU BMC over SSH.
Jump Node Console
$ ssh root@10.0.110.201
root@10.0.110.201's password: <BMC root password. The default credentials are root/0penBmc, and the password must be changed at first login>
Start the rshim service.
Jump Node Console
root@dpu-bmc:~# systemctl enable rshim
root@dpu-bmc:~# systemctl start rshim
root@dpu-bmc:~# systemctl status rshim
* rshim.service - rshim driver for BlueField SoC
     Loaded: loaded (/usr/lib/systemd/system/rshim.service; enabled; preset: disabled)
     Active: active (running) since Wed 2025-04-23 14:21:43 UTC; 24h ago
       Docs: man:rshim(8)
   Main PID: 940 (rshim)
        CPU: 3h 39min 40.138s
     CGroup: /system.slice/rshim.service
             `-940 /usr/sbin/rshim
Apr 23 14:21:42 dpu-bmc (rshim)[908]: rshim.service: Referenced but unset environment variable evaluates to an empty string: OPTIONS
Apr 23 14:21:42 dpu-bmc rshim[940]: Created PID file: /var/run/rshim.pid
Apr 23 14:21:43 dpu-bmc rshim[940]: USB device detected
Apr 23 14:21:47 dpu-bmc rshim[940]: Probing usb-2.1
Apr 23 14:21:47 dpu-bmc rshim[940]: create rshim usb-2.1
Apr 23 14:21:48 dpu-bmc rshim[940]: rshim0 attached
root@dpu-bmc:~# exit
logout
Connection to 10.0.110.201 closed.
[depuser@setup5-jump ~]$
Open an additional console on the Jump node and connect to the DPU OOB to monitor the status of the update process.
Jump Node and DPU OOB Console
$ ssh root@10.0.110.201
root@10.0.110.201's password:
root@dpu-bmc:~# obmc-console-client
dpu-device-1 login:
Return to the Jump node console and run the following command to start the BMC, UEFI, and firmware update process.
Jump Node Console
$ scp bfb-install.bfb root@10.0.110.201:/dev/rshim0/boot
Return to the DPU OOB console. Wait approximately 20-25 minutes for the update process to finish.
Jump Node Console
[13:26:32] No active BMC task
[13:26:32] Updating BMC firmware
[13:26:32] Found BMC firmware image: /mnt/lib/firmware/mellanox/bmc/bf3-bmc-fw.fwpkg
[13:26:32] Provided BMC firmware version: 25.04-2
[13:26:32] - INFO: BMC_FIRMWARE_URL: /redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware
[13:26:32] Running BMC firmware version: 23.09-6
[13:26:32] Proceeding with the BMC firmware update.
[13:26:33] curl -sSk -u <BMC_USER:BMC_PASSWORD> -H Content-Type: application/octet-stream -X POST -T /mnt/lib/firmware/mellanox/bmc/bf3-bmc-fw.fwpkg https://192.168.240.1/redfish/v1/UpdateService
[13:26:41] BMC Firmware update: {
  "@odata.id": "/redfish/v1/TaskService/Tasks/0",
  "@odata.type": "#Task.v1_4_3.Task",
  "Id": "0",
  "TaskState": "Running",
  "TaskStatus": "OK"
}
[13:26:44] Task id: /redfish/v1/TaskService/Tasks/0
[13:39:32] INFO: BMC firmware was updated to: 25.04-2
[13:39:32] BFB-Installer: Installing BMC Image passed, total 64% complete
[13:39:33] Task id: /redfish/v1/TaskService/Tasks/0
[13:39:33] Updating CEC firmware
[13:39:33] Found CEC firmware image: /mnt/lib/firmware/mellanox/cec/bf3-cec-fw.fwpkg
[13:39:33] Provided CEC firmware version: 00.02.0195.0000
[13:39:33] Running CEC firmware version: 00.02.0127.0000
[13:39:33] Proceeding with the CEC firmware update...
[13:39:33] curl -sSk -u <BMC_USER:BMC_PASSWORD> -H Content-Type: application/octet-stream -X POST -T /mnt/lib/firmware/mellanox/cec/bf3-cec-fw.fwpkg https://192.168.240.1/redfish/v1/UpdateService
[13:39:35] CEC Firmware update: {
  "@odata.id": "/redfish/v1/TaskService/Tasks/1",
  "@odata.type": "#Task.v1_4_3.Task",
  "Id": "1",
  "TaskState": "Running",
  "TaskStatus": "OK"
}
[13:39:38] Task id: /redfish/v1/TaskService/Tasks/1
[13:39:59] INFO: CEC firmware was updated to 00.02.0195.0000. Host power cycle is required
[13:39:59] BFB-Installer: Installing Glacier Image passed, total 65% complete
[13:39:59] Rebooting BMC...
Connection to 10.0.110.201 closed by remote host.
Connection to 10.0.110.201 closed.
- Power cycle the server that hosts the updated DPU.
Run the following command from the Jump node console to verify the BMC version:
Jump Node Console
$ curl -k -u root:'3tango11!OBMC' https://10.0.110.201/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware
{
  "@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware",
  "@odata.type": "#SoftwareInventory.v1_4_0.SoftwareInventory",
  "Description": "BMC image",
  "Id": "BMC_Firmware",
  "Manufacturer": "",
  "Name": "Software Inventory",
  "RelatedItem": [],
  "RelatedItem@odata.count": 0,
  "SoftwareId": "0x0018",
  "Status": {
    "Conditions": [],
    "Health": "OK",
    "HealthRollup": "OK",
    "State": "Enabled"
  },
  "Updateable": true,
  "Version": "BF-25.04-2",
  "WriteProtected": false
}
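The BMC version check can be scripted so that it is easy to repeat for each DPU. The following is a minimal sketch, assuming the Redfish response has the `"Version": "BF-<version>"` field shown above and that GNU `sort -V` is available; `redfish_version` and `version_at_least` are hypothetical helper names.

```shell
# Hypothetical helper: extracts the "Version" value from a Redfish
# SoftwareInventory JSON document read on stdin.
redfish_version() {
    sed -n 's/.*"Version": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds when version $1 is at least version $2,
# using the version ordering of GNU sort (-V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the Jump Node (replace <BMC root password> with your own value):
#   v="$(curl -sk -u root:'<BMC root password>' \
#        https://10.0.110.201/redfish/v1/UpdateService/FirmwareInventory/BMC_Firmware | redfish_version)"
#   version_at_least "$v" "BF-25.04-2" || echo "BMC firmware update required: $v"
```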
Repeat steps 4-10 on DPU 2.
To authenticate with Redfish, a password for the BMC root user must be provided. Replace ROOT_BMC_PASSWORD with the root password and run the following command. The password must be the same on all DPUs.
Jump Node Console
$ kubectl create secret generic -n dpf-operator-system bmc-shared-password --from-literal=password='ROOT_BMC_PASSWORD'
Create the following YAML to define a DPUDevice:
manifests/04-dpudeployment-installation/create-dpu-devices.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-1
  namespace: dpf-operator-system
spec:
  bmcIp: 10.0.110.201
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-2
  namespace: dpf-operator-system
spec:
  bmcIp: 10.0.110.202
Run the command to create a DPUDevice:
Jump Node Console
$ kubectl apply -f manifests/04-dpudeployment-installation/create-dpu-devices.yaml
Verify the DPF system by ensuring that the DPUDevices exist:
Jump Node Console
$ kubectl get dpudevices -n dpf-operator-system
NAME AGE
dpu-device-1 7s
dpu-device-2 7s
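For setups with more DPUs, the DPUDevice documents can be generated instead of written by hand. The following is a minimal sketch that prints one document per call with the same fields as the manifest above; `dpudevice_yaml` is a hypothetical helper name.

```shell
# Hypothetical generator: prints one DPUDevice document for a device index
# ($1) and a BMC IP address ($2), using the same fields as the manifest above.
dpudevice_yaml() {
    cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-$1
  namespace: dpf-operator-system
spec:
  bmcIp: $2
EOF
}

# Usage on the Jump Node:
#   { dpudevice_yaml 1 10.0.110.201; dpudevice_yaml 2 10.0.110.202; } | kubectl apply -f -
```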
Create the following YAML to define a DPUNode:
manifests/04-dpudeployment-installation/create-dpu-nodes.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  name: dpuworker1
  namespace: dpf-operator-system
spec:
  nodeRebootMethod:
    external: {} # DPU will be rebooted externally (via BMC/IPMI)
  dpus:
    - name: dpu-device-1 # Name of the previously created DPUDevice
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  name: dpuworker2
  namespace: dpf-operator-system
spec:
  nodeRebootMethod:
    external: {}
  dpus:
    - name: dpu-device-2
Run the command to create a DPUNode:
Jump Node Console
$ kubectl apply -f manifests/04-dpudeployment-installation/create-dpu-nodes.yaml
Verify the DPF system by ensuring that the DPUNodes exist:
Jump Node Console
$ kubectl get dpunodes -n dpf-operator-system
NAME AGE
dpuworker1 8s
dpuworker2 8s
Use the following YAML to create a BFB resource that downloads the BlueField Bitstream to a shared volume:
manifests/04-dpudeployment-installation/bfb.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: BFB
metadata:
  name: bf-bundle
  namespace: dpf-operator-system
spec:
  url: $BLUEFIELD_BITSTREAM
Run the command to create the BFB:
Jump Node Console
$ cat manifests/04-dpudeployment-installation/bfb.yaml | envsubst |kubectl apply -f -
Add labels to the DPUNodes. Set the values according to your environment.
Jump Node Console
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker1 feature.node.kubernetes.io/dpu-0-pf0-name=ens1f0np0
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker1 feature.node.kubernetes.io/dpu-0-number-of-pfs=2
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker1 feature.node.kubernetes.io/dpu-oob-bridge-configured=""
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker1 feature.node.kubernetes.io/dpu-enabled=true
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker1 feature.node.kubernetes.io/dpu-0-pci-address=0000-2b-00
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker2 feature.node.kubernetes.io/dpu-0-pf0-name=ens1f0np0
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker2 feature.node.kubernetes.io/dpu-0-number-of-pfs=2
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker2 feature.node.kubernetes.io/dpu-oob-bridge-configured=""
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker2 feature.node.kubernetes.io/dpu-enabled=true
kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system dpuworker2 feature.node.kubernetes.io/dpu-0-pci-address=0000-2b-00
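Because the same five labels are applied to every DPUNode, the commands can be generated per node. The following is a minimal sketch that only prints the commands so they can be reviewed first; `dpunode_label_cmds` is a hypothetical helper name, and the number of PFs is fixed at 2 as in the commands above.

```shell
# Hypothetical helper: prints (does not run) the kubectl label commands for
# one DPUNode. Arguments: node name, pf0 interface name, PCI address value.
dpunode_label_cmds() {
    local node="$1" pf0="$2" pci="$3" kv
    for kv in \
        "dpu-0-pf0-name=$pf0" \
        "dpu-0-number-of-pfs=2" \
        'dpu-oob-bridge-configured=""' \
        "dpu-enabled=true" \
        "dpu-0-pci-address=$pci"
    do
        echo "kubectl label dpunodes.provisioning.dpu.nvidia.com -n dpf-operator-system $node feature.node.kubernetes.io/$kv"
    done
}

# Usage on the Jump Node: review the output, then pipe it to sh to apply it.
#   dpunode_label_cmds dpuworker1 ens1f0np0 0000-2b-00
#   dpunode_label_cmds dpuworker1 ens1f0np0 0000-2b-00 | sh
```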
DPU Service Installation
Create the DPUDeployment, DPUServiceConfig, DPUServiceTemplate and other necessary objects.
Before deploying the objects under the manifests/04-dpudeployment-installation/ directory, a few adjustments are required.
Change the DPUFlavor using the following YAML:
The settings below configure a DPU in Zero Trust mode, which means DPU management will be blocked from the bare-metal host.
To deploy in DPU mode, comment out the line containing dpuMode:
# dpuMode: zero-trust
manifests/04-dpudeployment-installation/hbn-dpuflavor.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUFlavor
metadata:
  name: dpf-provisioning-hbn
  namespace: dpf-operator-system
spec:
  dpuMode: zero-trust
  bfcfgParameters:
  - UPDATE_ATF_UEFI=yes
  - UPDATE_DPU_OS=yes
  - WITH_NIC_FW_UPDATE=yes
  configFiles:
  - operation: override
    path: /etc/mellanox/mlnx-bf.conf
    permissions: "0644"
    raw: |
      ALLOW_SHARED_RQ="no"
      IPSEC_FULL_OFFLOAD="no"
      ENABLE_ESWITCH_MULTIPORT="yes"
  - operation: override
    path: /etc/mellanox/mlnx-ovs.conf
    permissions: "0644"
    raw: |
      CREATE_OVS_BRIDGES="no"
      OVS_DOCA="yes"
  - operation: override
    path: /etc/mellanox/mlnx-sf.conf
    permissions: "0644"
    raw: ""
  grub:
    kernelParameters:
    - console=hvc0
    - console=ttyAMA0
    - earlycon=pl011,0x13010000
    - fixrttc
    - net.ifnames=0
    - biosdevname=0
    - iommu.passthrough=1
    - cgroup_no_v1=net_prio,net_cls
    - hugepagesz=2048kB
    - hugepages=8072
  nvconfig:
  - device: '*'
    parameters:
    - PF_BAR2_ENABLE=0
    - PER_PF_NUM_SF=1
    - PF_TOTAL_SF=20
    - PF_SF_BAR_SIZE=10
    - NUM_PF_MSIX_VALID=0
    - PF_NUM_PF_MSIX_VALID=1
    - PF_NUM_PF_MSIX=228
    - INTERNAL_CPU_MODEL=1
    - INTERNAL_CPU_OFFLOAD_ENGINE=0
    - SRIOV_EN=1
    - NUM_OF_VFS=46
    - LAG_RESOURCE_ALLOCATION=1
  ovs:
    rawConfigScript: |
      _ovs-vsctl() {
        ovs-vsctl --no-wait --timeout 15 "$@"
      }
      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      _ovs-vsctl --may-exist add-port br-sfc p1
      _ovs-vsctl set Interface p1 type=dpdk
      _ovs-vsctl set Interface p1 mtu_request=9216
      _ovs-vsctl set Port p1 external_ids:dpf-type=physical
Change the dpudeployment.yaml file to reference the DPUFlavor suited for performance:
manifests/04-dpudeployment-installation/dpudeployment.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUDeployment
metadata:
  name: hbn-only
  namespace: dpf-operator-system
spec:
  dpus:
    bfb: bf-bundle
    flavor: dpf-provisioning-hbn
    nodeEffect:
      noEffect: true
    dpuSets:
    - nameSuffix: "dpuset1"
      nodeSelector:
        matchLabels:
          feature.node.kubernetes.io/dpu-enabled: "true"
  services:
    doca-hbn:
      serviceTemplate: doca-hbn
      serviceConfiguration: doca-hbn
  serviceChains:
    switches:
    - ports:
      - serviceInterface:
          matchLabels:
            uplink: p0
      - service:
          name: doca-hbn
          interface: p0_if
    - ports:
      - serviceInterface:
          matchLabels:
            uplink: p1
      - service:
          name: doca-hbn
          interface: p1_if
    - ports:
      - serviceInterface:
          matchLabels:
            interface: hostpf0
      - service:
          interface: pf0hpf_if
          name: doca-hbn
    - ports:
      - serviceInterface:
          matchLabels:
            interface: hostpf1
      - service:
          interface: pf1hpf_if
          name: doca-hbn
Change the rest of the configuration files.
As explained in the introduction, these files create service chains that connect the two physical functions, PF0 and PF1, to the outer fabric through HBN, providing an EVPN VXLAN overlay, VNI-based isolation, and ECMP redundancy across both DPU uplinks (p0 and p1).
These are the configuration files:
- HBN DPUServiceConfig and DPUServiceTemplate to deploy HBN workloads to the DPUs.
manifests/04-dpudeployment-installation/hbn-dpuserviceconfig.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  serviceConfiguration:
    serviceDaemonSet:
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
          {"name": "iprequest", "interface": "ip_lo", "cni-args": {"poolNames": ["loopback"], "poolType": "cidrpool"}},
          {"name": "iprequest", "interface": "ip_pf0hpf", "cni-args": {"poolNames": ["pool1"], "poolType": "cidrpool", "allocateDefaultGateway": true}},
          {"name": "iprequest", "interface": "ip_pf1hpf", "cni-args": {"poolNames": ["pool2"], "poolType": "cidrpool", "allocateDefaultGateway": true}}
          ]
    helmChart:
      values:
        configuration:
          perDPUValuesYAML: |
            - hostnamePattern: "*"
              values:
                bgp_peer_group: hbn
                vrf1: RED
                vrf2: BLUE
                l2vni1: 10010
                l2vni2: 10020
                l3vni1: 100001
                l3vni2: 100002
            - hostnamePattern: "dpu-device-1"
              values:
                vlan1: 11
                vlan2: 21
                bgp_autonomous_system: 65101
            - hostnamePattern: "dpu-device-2"
              values:
                vlan1: 12
                vlan2: 22
                bgp_autonomous_system: 65201
          startupYAMLJ2: |
            - header:
                model: bluefield
                nvue-api-version: nvue_v1
                rev-id: 1.0
                version: HBN 2.4.0
            - set:
                bridge:
                  domain:
                    br_default:
                      vlan:
                        {{ config.vlan1 }}:
                          vni:
                            {{ config.l2vni1 }}: {}
                        {{ config.vlan2 }}:
                          vni:
                            {{ config.l2vni2 }}: {}
                evpn:
                  enable: on
                  route-advertise: {}
                interface:
                  lo:
                    ip:
                      address:
                        {{ ipaddresses.ip_lo.ip }}/32: {}
                    type: loopback
                  p0_if,p1_if,pf0hpf_if,pf1hpf_if:
                    type: swp
                    link:
                      mtu: 9000
                  pf0hpf_if:
                    bridge:
                      domain:
                        br_default:
                          access: {{ config.vlan1 }}
                  pf1hpf_if:
                    bridge:
                      domain:
                        br_default:
                          access: {{ config.vlan2 }}
                  vlan{{ config.vlan1 }}:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf0hpf.cidr }}: {}
                      vrf: {{ config.vrf1 }}
                    vlan: {{ config.vlan1 }}
                  vlan{{ config.vlan1 }},{{ config.vlan2 }}:
                    type: svi
                  vlan{{ config.vlan2 }}:
                    ip:
                      address:
                        {{ ipaddresses.ip_pf1hpf.cidr }}: {}
                      vrf: {{ config.vrf2 }}
                    vlan: {{ config.vlan2 }}
                nve:
                  vxlan:
                    arp-nd-suppress: on
                    enable: on
                    source:
                      address: {{ ipaddresses.ip_lo.ip }}
                router:
                  bgp:
                    enable: on
                    graceful-restart:
                      mode: full
                vrf:
                  default:
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                          l2vpn-evpn:
                            enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        neighbor:
                          p0_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                          p1_if:
                            peer-group: {{ config.bgp_peer_group }}
                            type: unnumbered
                        path-selection:
                          multipath:
                            aspath-ignore: on
                        peer-group:
                          {{ config.bgp_peer_group }}:
                            address-family:
                              ipv4-unicast:
                                enable: on
                              l2vpn-evpn:
                                enable: on
                            remote-as: external
                        router-id: {{ ipaddresses.ip_lo.ip }}
                  {{ config.vrf1 }}:
                    evpn:
                      enable: on
                      vni:
                        {{ config.l3vni1 }}: {}
                    loopback:
                      ip:
                        address:
                          {{ ipaddresses.ip_lo.ip }}/32: {}
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                            route-export:
                              to-evpn:
                                enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        router-id: {{ ipaddresses.ip_lo.ip }}
                  {{ config.vrf2 }}:
                    evpn:
                      enable: on
                      vni:
                        {{ config.l3vni2 }}: {}
                    loopback:
                      ip:
                        address:
                          {{ ipaddresses.ip_lo.ip }}/32: {}
                    router:
                      bgp:
                        address-family:
                          ipv4-unicast:
                            enable: on
                            redistribute:
                              connected:
                                enable: on
                            route-export:
                              to-evpn:
                                enable: on
                        autonomous-system: {{ config.bgp_autonomous_system }}
                        enable: on
                        router-id: {{ ipaddresses.ip_lo.ip }}
  interfaces:
  - name: p0_if
    network: mybrhbn
  - name: p1_if
    network: mybrhbn
  - name: pf0hpf_if
    network: mybrhbn
  - name: pf1hpf_if
    network: mybrhbn
manifests/04-dpudeployment-installation/hbn-dpuservicetemplate.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceTemplate
metadata:
  name: doca-hbn
  namespace: dpf-operator-system
spec:
  deploymentServiceName: "doca-hbn"
  helmChart:
    source:
      repoURL: $HELM_REGISTRY_REPO_URL
      version: 1.0.2
      chart: doca-hbn
    values:
      image:
        repository: $HBN_NGC_IMAGE_URL
        tag: 3.0.0-doca3.0.0
      resources:
        memory: 6Gi
        nvidia.com/bf_sf: 4
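The perDPUValuesYAML list in the DPUServiceConfiguration above combines a catch-all entry ("*") with per-hostname entries. The following Python sketch illustrates one plausible way such values resolve for a given DPU. It is an assumption, not the DPF implementation: it treats hostnamePattern as a shell-style glob and lets later matching entries override earlier ones. The function name resolve_config is hypothetical.

```python
from fnmatch import fnmatch

# Entries copied from perDPUValuesYAML, expressed as Python data.
PER_DPU_VALUES = [
    {"hostnamePattern": "*",
     "values": {"bgp_peer_group": "hbn", "vrf1": "RED", "vrf2": "BLUE",
                "l2vni1": 10010, "l2vni2": 10020,
                "l3vni1": 100001, "l3vni2": 100002}},
    {"hostnamePattern": "dpu-device-1",
     "values": {"vlan1": 11, "vlan2": 21, "bgp_autonomous_system": 65101}},
    {"hostnamePattern": "dpu-device-2",
     "values": {"vlan1": 12, "vlan2": 22, "bgp_autonomous_system": 65201}},
]


def resolve_config(hostname, entries=PER_DPU_VALUES):
    """Merge the values of every entry whose pattern matches the hostname.

    Assumption: patterns are shell-style globs and later entries win.
    """
    merged = {}
    for entry in entries:
        if fnmatch(hostname, entry["hostnamePattern"]):
            merged.update(entry["values"])
    return merged


for host in ("dpu-device-1", "dpu-device-2"):
    cfg = resolve_config(host)
    # The startup template names the SVI "vlan{{ config.vlan1 }}" and places it in vrf1.
    print(host, f"vlan{cfg['vlan1']}", cfg["vrf1"], cfg["bgp_autonomous_system"])
```

Under these assumptions, dpu-device-1 resolves to VLAN 11 in VRF RED and dpu-device-2 to VLAN 12, which is consistent with the vlan11 and vlan12 interfaces that appear in the HBN pods later in this guide.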
- Physical Interfaces for physical ports on the DPU.
manifests/04-dpudeployment-installation/physical-ifaces.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p0"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: p1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            uplink: "p1"
        spec:
          interfaceType: physical
          physical:
            interfaceName: p1
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: hostpf0
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "hostpf0"
        spec:
          interfaceType: pf
          pf:
            pfID: 0
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceInterface
metadata:
  name: hostpf1
  namespace: dpf-operator-system
spec:
  template:
    spec:
      template:
        metadata:
          labels:
            interface: "hostpf1"
        spec:
          interfaceType: pf
          pf:
            pfID: 1
- DPU Service IPAM objects to set up IP Address Management on the DPUCluster.
manifests/04-dpudeployment-installation/hbn-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool1
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.121.0/24"
    gatewayIndex: 2
    prefixSize: 29
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: pool2
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "10.0.122.0/24"
    gatewayIndex: 2
    prefixSize: 29
manifests/04-dpudeployment-installation/hbn-loopback-ipam.yaml
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceIPAM
metadata:
  name: loopback
  namespace: dpf-operator-system
spec:
  ipv4Network:
    network: "11.0.0.0/24"
    prefixSize: 32
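To make the pool1 and pool2 settings concrete, the following Python sketch carves a /24 pool into /29 blocks and derives a gateway address for each block. It is an illustration under two assumptions that this guide does not state explicitly: blocks are handed out to DPUs in ascending order, and gatewayIndex is an offset from the block's network address. The function name per_node_allocations is hypothetical.

```python
import ipaddress


def per_node_allocations(network, prefix_size, gateway_index, count):
    """Return (subnet, gateway) pairs for the first `count` blocks of a pool.

    Assumptions: blocks are allocated in ascending order, and the gateway
    is the address at offset `gateway_index` from the block's network address.
    """
    pool = ipaddress.ip_network(network)
    allocations = []
    for index, subnet in enumerate(pool.subnets(new_prefix=prefix_size)):
        if index >= count:
            break
        gateway = subnet.network_address + gateway_index
        allocations.append((str(subnet), str(gateway)))
    return allocations


# pool1 from hbn-ipam.yaml: network 10.0.121.0/24, prefixSize 29, gatewayIndex 2
for subnet, gateway in per_node_allocations("10.0.121.0/24", 29, 2, 2):
    print(subnet, "gateway", gateway)
```

With these assumptions, the first two blocks of pool1 yield gateways 10.0.121.2 and 10.0.121.10, which matches the addresses seen on the HBN pods later in this guide. Each /29 block leaves the remaining host addresses (for example 10.0.121.1 and 10.0.121.9) available for the bare-metal servers.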
Several environment variables must be set before the manifests are applied. Load them first:
$ source export_vars.env
Apply all of the YAML files mentioned above using the following command:
Jump Node Console
$ cat manifests/04-dpudeployment-installation/hbn-dpuflavor.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/dpudeployment.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/hbn-dpuserviceconfig.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/hbn-dpuservicetemplate.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/physical-ifaces.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/hbn-ipam.yaml | envsubst | kubectl apply -f -
$ cat manifests/04-dpudeployment-installation/hbn-loopback-ipam.yaml | envsubst | kubectl apply -f -
Verify the DPUService installation by ensuring that:
- DPUServices are created and reconciled
- DPUServiceIPAMs are reconciled
- DPUServiceInterfaces are reconciled, and
- DPUServiceChains are reconciled.
These verification commands may need to be run multiple times to ensure the conditions are met.
Jump Node Console
$ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices --all
dpuservice.svc.dpu.nvidia.com/doca-hbn-frbpp condition met
dpuservice.svc.dpu.nvidia.com/flannel condition met
dpuservice.svc.dpu.nvidia.com/multus condition met
dpuservice.svc.dpu.nvidia.com/nvidia-k8s-ipam condition met
dpuservice.svc.dpu.nvidia.com/ovs-cni condition met
dpuservice.svc.dpu.nvidia.com/ovs-helper condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-controller condition met
dpuservice.svc.dpu.nvidia.com/servicechainset-rbac-and-crds condition met
dpuservice.svc.dpu.nvidia.com/sfc-controller condition met
dpuservice.svc.dpu.nvidia.com/sriov-device-plugin condition met
$ kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
dpuserviceipam.svc.dpu.nvidia.com/loopback condition met
dpuserviceipam.svc.dpu.nvidia.com/pool1 condition met
dpuserviceipam.svc.dpu.nvidia.com/pool2 condition met
$ kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p0-if-srmbd condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-p1-if-zgskx condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf0hpf-if-fdpkr condition met
dpuserviceinterface.svc.dpu.nvidia.com/doca-hbn-pf1hpf-if-xgdds condition met
dpuserviceinterface.svc.dpu.nvidia.com/hostpf0 condition met
dpuserviceinterface.svc.dpu.nvidia.com/hostpf1 condition met
dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
dpuserviceinterface.svc.dpu.nvidia.com/p1 condition met
$ kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
dpuservicechain.svc.dpu.nvidia.com/hbn-only-8xrrx condition met
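Because the verification commands above may need several attempts, the repetition can be scripted. The following Python sketch wraps the four kubectl wait calls in a simple retry loop. The helper names (retry, kubectl_wait, verify_all), the attempt count, and the delay are illustrative assumptions; only the condition names, resource names, and namespace come from the commands above.

```python
import subprocess
import time

# (condition, resource) pairs taken from the verification commands above.
CHECKS = [
    ("ApplicationsReconciled", "dpuservices"),
    ("DPUIPAMObjectReconciled", "dpuserviceipam"),
    ("ServiceInterfaceSetReconciled", "dpuserviceinterface"),
    ("ServiceChainSetReconciled", "dpuservicechain"),
]


def retry(check, attempts=10, delay=30.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number, or 0 on failure."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return 0


def kubectl_wait(condition, resource, namespace="dpf-operator-system", timeout="60s"):
    """Run one `kubectl wait` and report whether the condition was met in time."""
    cmd = ["kubectl", "wait", f"--for=condition={condition}",
           "--namespace", namespace, resource, "--all", f"--timeout={timeout}"]
    return subprocess.run(cmd).returncode == 0


def verify_all():
    """Return a dict mapping each resource to True if it reconciled."""
    results = {}
    for condition, resource in CHECKS:
        results[resource] = retry(lambda: kubectl_wait(condition, resource)) > 0
    return results
```

Calling verify_all() from the jump node, with kubectl configured as in the rest of this guide, returns one boolean per resource type. The sleep parameter of retry exists only so the loop can be exercised without real delays.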
To follow the progress of DPU provisioning, run the following command to check its current phase:
Jump Node Console
$ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase' setup5-jump: Wed May 21 10:45:44 2025
Dpu Node Name: dpuworker1
Type: InternalIP
Type: Hostname
Last Transition Time: 2025-05-21T07:23:09Z
Type: Initialized
Last Transition Time: 2025-05-21T07:23:09Z
Type: BFBReady
Last Transition Time: 2025-05-21T07:23:11Z
Type: NodeEffectReady
Last Transition Time: 2025-05-21T07:23:15Z
Type: InterfaceInitialized
Last Transition Time: 2025-05-21T07:23:17Z
Type: FWConfigured
Last Transition Time: 2025-05-21T07:23:18Z
Type: BFBPrepared
Last Transition Time: 2025-05-21T07:27:25Z
Type: OSInstalled
Last Transition Time: 2025-05-21T07:44:54Z
Type: Rebooted
Dpu Node Name: dpuworker2
Type: InternalIP
Type: Hostname
Last Transition Time: 2025-05-21T07:23:08Z
Type: Initialized
Last Transition Time: 2025-05-21T07:23:09Z
Type: BFBReady
Last Transition Time: 2025-05-21T07:23:09Z
Type: NodeEffectReady
Last Transition Time: 2025-05-21T07:23:12Z
Type: InterfaceInitialized
Last Transition Time: 2025-05-21T07:23:14Z
Type: FWConfigured
Last Transition Time: 2025-05-21T07:23:15Z
Type: BFBPrepared
Last Transition Time: 2025-05-21T07:27:23Z
Type: OSInstalled
Last Transition Time: 2025-05-21T07:45:01Z
Type: Rebooted
Wait for the Rebooted stage, and then power cycle the bare-metal host manually.
After the DPU is up, run the following command for each DPU worker:
Jump Node Console
$ kubectl annotate dpunodes -n dpf-operator-system dpuworker1 provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
$ kubectl annotate dpunodes -n dpf-operator-system dpuworker2 provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
At this point, the DPU workers should be added to the cluster. As they are being added to the cluster, the DPUs are provisioned.
Jump Node Console
$ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase' setup5-jump: Wed May 21 10:45:44 2025
Dpu Node Name: dpuworker1
Type: InternalIP
Type: Hostname
Last Transition Time: 2025-05-21T07:23:09Z
Type: Initialized
Last Transition Time: 2025-05-21T07:23:09Z
Type: BFBReady
Last Transition Time: 2025-05-21T07:23:11Z
Type: NodeEffectReady
Last Transition Time: 2025-05-21T07:23:15Z
Type: InterfaceInitialized
Last Transition Time: 2025-05-21T07:23:17Z
Type: FWConfigured
Last Transition Time: 2025-05-21T07:23:18Z
Type: BFBPrepared
Last Transition Time: 2025-05-21T07:27:25Z
Type: OSInstalled
Last Transition Time: 2025-05-21T07:44:54Z
Type: Rebooted
Last Transition Time: 2025-05-21T07:44:54Z
Type: DPUClusterReady
Last Transition Time: 2025-05-21T07:44:55Z
Type: Ready
Phase: Ready
Dpu Node Name: dpuworker2
Type: InternalIP
Type: Hostname
Last Transition Time: 2025-05-21T07:23:08Z
Type: Initialized
Last Transition Time: 2025-05-21T07:23:09Z
Type: BFBReady
Last Transition Time: 2025-05-21T07:23:09Z
Type: NodeEffectReady
Last Transition Time: 2025-05-21T07:23:12Z
Type: InterfaceInitialized
Last Transition Time: 2025-05-21T07:23:14Z
Type: FWConfigured
Last Transition Time: 2025-05-21T07:23:15Z
Type: BFBPrepared
Last Transition Time: 2025-05-21T07:27:23Z
Type: OSInstalled
Last Transition Time: 2025-05-21T07:45:01Z
Type: Rebooted
Last Transition Time: 2025-05-21T07:45:01Z
Type: DPUClusterReady
Last Transition Time: 2025-05-21T07:45:02Z
Type: Ready
Phase: Ready
Finally, validate that all the different DPU-related objects are now in the Ready state:
Jump Node Console
$ kubectl get secrets -n dpu-cplane-tenant1 dpu-cplane-tenant1-admin-kubeconfig -o json | jq -r '.data["admin.conf"]' | base64 --decode > /home/depuser/dpu-cluster.config
$ KUBECONFIG=/home/depuser/dpu-cluster.config k get node -A
NAME STATUS ROLES AGE VERSION
dpu-device-1 Ready <none> 94s v1.30.12
dpu-device-2 Ready <none> 84s v1.30.12
$ kubectl get dpu -A
NAMESPACE NAME READY PHASE AGE
dpf-operator-system dpu-device-1 True Ready 23m
dpf-operator-system dpu-device-2 True Ready 23m
$ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
dpu.provisioning.dpu.nvidia.com/dpu-device-1 condition met
dpu.provisioning.dpu.nvidia.com/dpu-device-2 condition met
Congratulations, the DPF system with HBN service has been successfully installed!
Zero-Trust Mode Checking
Here's a step-by-step procedure to check the Zero-Trust Mode on your NVIDIA BlueField DPU from the host server, including the installation of the Mellanox Firmware Tools (MFT).
- Navigate to the NVIDIA Downloads Site: Open your web browser and go to the official NVIDIA Mellanox software downloads page.
Select the Latest Version for your OS:
Transfer and Extract MFT Tools on the Worker 1 BareMetal Host.
First Pod Console
root@worker1:~# tar -xvzf /tmp/mft-4.32.0-120-x86_64-deb.tgz
Navigate into the Extracted Directory.
First Pod Console
root@worker1:~# cd mft-4.32.0-120-x86_64-deb/
Run the Installation Script.
First Pod Console
root@worker1:~# ./install.sh
Start MST (Mellanox Software Tools) Service and Identify DPU Device Name.
First Pod Console
root@worker1:~# mst start
root@worker1:~# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module is not loaded
PCI Devices:
------------
2b:00.0 # 2b:00.0 - NVIDIA BlueField-3 VPI FHHL Adapter
Perform Zero-Trust Checking.
First Pod Console
root@worker1:~# mlxprivhost -d 2b:00.0 q
Host configurations
-------------------
level                         : RESTRICTED
Port functions status:
-----------------------
disable_rshim                 : TRUE
disable_tracer                : TRUE
disable_port_owner            : TRUE
disable_counter_rd            : TRUE
# Expected Zero-Trust output
This is the most definitive confirmation. level : RESTRICTED means the host is in Zero-Trust Mode, and the TRUE flags confirm that the individual security restrictions are active.
Check Firmware Access with mlxfwmanager:
First Pod Console
root@worker1:~# mlxfwmanager -d 2b:00.0 --query
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type:      BlueField3
  Part Number:      --
  Description:
  PSID:
  PCI Device Name:  2b:00.0
  Base MAC:         N/A
  Versions:         Current        Available
     FW             --
  Status:           Failed to open device    # Expected Zero-Trust output
"Failed to open device" indicates the host is blocked from accessing the DPU for firmware operations, a key aspect of Zero-Trust.
Check Device Configuration with mlxconfig:
First Pod Console
mlxconfig -d 2b:00.0 q
Device #1:
----------
Device type:    BlueField3
Name:           900-9D3B6-00CV-A_Ax
Description:    NVIDIA BlueField-3 B3220 P-Series FHHL DPU; 200GbE (default mode) / NDR200 IB; Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled
Device:         2b:00.0
Configurations:                        Next Boot
RO      MODULE_SPLIT_M0                Array[0..15]
RO      MODULE_SPLIT_M1                Array[0..15]
...
        PORT_OWNER                     True(1)    # No RO, but restricted by mlxprivhost
        ALLOW_RD_COUNTERS              True(1)    # No RO, but restricted by mlxprivhost
        TRACER_ENABLE                  True(1)    # No RO, but restricted by mlxprivhost
Most configuration parameters are prefixed with RO (Read-Only). Parameters related to direct host control, such as PORT_OWNER, ALLOW_RD_COUNTERS, and TRACER_ENABLE, may still be shown as True(1), which reflects the DPU's internal capability, but the host cannot enforce them because of the mlxprivhost restrictions. The widespread RO status shows that the host cannot modify these configurations, reinforcing the DPU's autonomous and secure state. The few parameters without RO are still overridden by the mlxprivhost security policy.
Check Low-Level Hardware Access with ethtool:
First Pod Console
root@worker1:~# ethtool -d ens1f0np0
Cannot get register dump: Operation not supported
This confirms the DPU is preventing deep, low-level hardware access from the host, aligning with Zero-Trust's isolation goals.
Conclusion
If the outputs of mlxprivhost, mlxfwmanager, mlxconfig (showing RO flags), and ethtool (showing "Operation not supported") match those shown above, then your NVIDIA BlueField DPU is indeed operating in Zero-Trust Mode.
This means the host has significantly restricted privileges and cannot perform sensitive operations on the DPU, ensuring its security and isolation.
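The mlxprivhost check can also be evaluated programmatically. The following Python sketch parses query output in the format shown above and reports whether it matches the expected Zero-Trust pattern. The function names and the decision rule (level RESTRICTED plus every disable_* flag set to TRUE) are assumptions derived from the sample output in this guide, not an official NVIDIA criterion.

```python
# Sample text mirroring the `mlxprivhost -d 2b:00.0 q` output shown in this guide.
SAMPLE = """Host configurations
-------------------
level                         : RESTRICTED

Port functions status:
-----------------------
disable_rshim                 : TRUE
disable_tracer                : TRUE
disable_port_owner            : TRUE
disable_counter_rd            : TRUE
"""


def parse_privhost(output):
    """Collect 'key : value' pairs, skipping headings and separator lines."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_zero_trust(fields):
    """Assumed rule: level is RESTRICTED and all disable_* flags are TRUE."""
    flags = [v for k, v in fields.items() if k.startswith("disable_")]
    return (fields.get("level") == "RESTRICTED"
            and bool(flags)
            and all(v == "TRUE" for v in flags))


print(is_zero_trust(parse_privhost(SAMPLE)))
```

In practice, the text passed to parse_privhost would be the captured output of the mlxprivhost query on the bare-metal host.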
Infrastructure Bandwidth & Latency Validation
Verify the deployment and confirm that the DPU system achieves link-speed performance and low latency by running various tests:
- Iperf TCP—for bandwidth measurements
- RDMA—for bandwidth and latency measurements
- Network isolation
Each test is described in detail. At the end of each test, the achieved performance is displayed.
Make sure that the servers are tuned for maximum performance (not covered in this document).
Performance and Isolation Tests
Now that the test deployment is running, perform bandwidth and latency performance tests between two bare-metal workload servers.
Ubuntu 24.04 was installed on the servers.
Before running the tests, check the Gateway address on each HBN pod:
Jump Node Console
$ ki get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dpf-operator-system dpu-cplane-tenant1-doca-hbn-gt6xf-ds-q8d6x 2/2 Running 0 2m28s 10.244.1.33 dpu-device-2 <none> <none>
dpf-operator-system dpu-cplane-tenant1-doca-hbn-gt6xf-ds-vwc6h 2/2 Running 0 2m35s 10.244.0.37 dpu-device-1 <none> <none>
...
$ ki exec -it -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-gt6xf-ds-vwc6h -- bash
Defaulted container "doca-hbn" out of: doca-hbn, hbn-sidecar, hbn-init (init)
root@dpu-cplane-tenant1-doca-hbn-qldl6-ds-dh5bv:/tmp# ip a s
...
9: vlan21@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master BLUE state UP group default qlen 1000
link/ether 22:f2:b0:81:79:f6 brd ff:ff:ff:ff:ff:ff
inet 10.0.122.2/29 scope global vlan21
valid_lft forever preferred_lft forever
inet6 fe80::20f2:b0ff:fe81:79f6/64 scope link
valid_lft forever preferred_lft forever
...
12: vlan11@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master RED state UP group default qlen 1000
link/ether 22:f2:b0:81:79:f6 brd ff:ff:ff:ff:ff:ff
inet 10.0.121.2/29 scope global vlan11
valid_lft forever preferred_lft forever
inet6 fe80::20f2:b0ff:fe81:79f6/64 scope link
valid_lft forever preferred_lft forever
...
$ exit
$ ki exec -it -n dpf-operator-system dpu-cplane-tenant1-doca-hbn-gt6xf-ds-q8d6x -- bash
Defaulted container "doca-hbn" out of: doca-hbn, hbn-sidecar, hbn-init (init)
root@dpu-cplane-tenant1-doca-hbn-qldl6-ds-lvjrx:/tmp# ip a s
...
9: vlan22@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master BLUE state UP group default qlen 1000
link/ether 5e:a4:c0:72:ac:11 brd ff:ff:ff:ff:ff:ff
inet 10.0.122.10/29 scope global vlan22
valid_lft forever preferred_lft forever
inet6 fe80::5ca4:c0ff:fe72:ac11/64 scope link
valid_lft forever preferred_lft forever
...
12: vlan12@br_default: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master RED state UP group default qlen 1000
link/ether 5e:a4:c0:72:ac:11 brd ff:ff:ff:ff:ff:ff
inet 10.0.121.10/29 scope global vlan12
valid_lft forever preferred_lft forever
inet6 fe80::5ca4:c0ff:fe72:ac11/64 scope link
valid_lft forever preferred_lft forever
...
$ exit
Connect to the first workload server console, install iperf and perftest, check the DPU high-speed interfaces, add the static route, and identify the relevant RDMA devices:
First Pod Console
root@worker1:~# apt install iperf
root@worker1:~# apt install perftest
root@worker1:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 58:a2:e1:73:69:e6 brd ff:ff:ff:ff:ff:ff
altname enp43s0f0np0
7: ens1f1np1: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN group default qlen 1000
link/ether 58:a2:e1:73:69:e7 brd ff:ff:ff:ff:ff:ff
altname enp43s0f1np1
...
root@worker1:~# ip route add 172.169.50.0/30 via 10.0.121.2
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker1:~# rdma link | grep ens1f0np0
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev ens1f0np0
root@worker1:~# rdma link | grep ens1f1np1
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens1f1np1
Configure VRFs for the two interfaces on Ubuntu 24.04 using iproute2: assign ens1f0np0 to VRF red and ens1f1np1 to VRF blue.
Configuration Overview
Interface    IP Address       Default Gateway    VRF     Routing Table
ens1f0np0    10.0.121.1/29    10.0.121.2/29      red     1001
ens1f1np1    10.0.122.1/29    10.0.122.2/29      blue    1002
First Pod Console
# Load VRF module
root@worker1:~# modprobe vrf
root@worker1:~# echo vrf | tee -a /etc/modules
# Create VRF devices
root@worker1:~# ip link add vrf-red type vrf table 1001
root@worker1:~# ip link add vrf-blue type vrf table 1002
# Bring up VRF devices
root@worker1:~# ip link set dev vrf-red up
root@worker1:~# ip link set dev vrf-blue up
# Assign interfaces to VRFs
root@worker1:~# ip link set dev ens1f0np0 master vrf-red
root@worker1:~# ip link set dev ens1f1np1 master vrf-blue
# Bring up physical interfaces
root@worker1:~# ip link set dev ens1f0np0 up
root@worker1:~# ip link set dev ens1f1np1 up
# Assign IP addresses
root@worker1:~# ip addr add 10.0.121.1/29 dev ens1f0np0
root@worker1:~# ip addr add 10.0.122.1/29 dev ens1f1np1
# Set default routes per VRF
root@worker1:~# ip route add table 1001 default via 10.0.121.2 dev ens1f0np0
root@worker1:~# ip route add table 1002 default via 10.0.122.2 dev ens1f1np1
Using another console window, reconnect to the jump node and connect to the second workload server.
On that server, install iperf and perftest, check the DPU high-speed interfaces, add the static route, and identify the relevant RDMA devices:
First Pod Console
root@worker2:~# apt install iperf
root@worker2:~# apt install perftest
root@worker2:~# ip a s
...
6: ens1f0np0: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 58:a2:e1:73:6a:58 brd ff:ff:ff:ff:ff:ff
altname enp43s0f0np0
7: ens1f1np1: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 58:a2:e1:73:6a:59 brd ff:ff:ff:ff:ff:ff
altname enp43s0f1np1
...
root@worker2:~# ip route add 172.169.50.0/30 via 10.0.121.10
depuser@worker2:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=5.35 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=117 time=5.15 ms
root@worker2:~# rdma link | grep ens1f0np0
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev ens1f0np0
root@worker2:~# rdma link | grep ens1f1np1
link mlx5_1/1 state ACTIVE physical_state LINK_UP netdev ens1f1np1
Configure VRFs for the two interfaces on Ubuntu 24.04 using iproute2: assign ens1f0np0 to VRF red and ens1f1np1 to VRF blue.
Configuration Overview
Interface    IP Address       Default Gateway    VRF     Routing Table
ens1f0np0    10.0.121.9/29    10.0.121.10/29     red     1001
ens1f1np1    10.0.122.9/29    10.0.122.10/29     blue    1002
First Pod Console
# Load VRF module
root@worker2:~# modprobe vrf
root@worker2:~# echo vrf | tee -a /etc/modules
# Create VRF devices
root@worker2:~# ip link add vrf-red type vrf table 1001
root@worker2:~# ip link add vrf-blue type vrf table 1002
# Bring up VRF devices
root@worker2:~# ip link set dev vrf-red up
root@worker2:~# ip link set dev vrf-blue up
# Assign interfaces to VRFs
root@worker2:~# ip link set dev ens1f0np0 master vrf-red
root@worker2:~# ip link set dev ens1f1np1 master vrf-blue
# Bring up physical interfaces
root@worker2:~# ip link set dev ens1f0np0 up
root@worker2:~# ip link set dev ens1f1np1 up
# Assign IP addresses
root@worker2:~# ip addr add 10.0.121.9/29 dev ens1f0np0
root@worker2:~# ip addr add 10.0.122.9/29 dev ens1f1np1
# Set default routes per VRF
root@worker2:~# ip route add table 1001 default via 10.0.121.10 dev ens1f0np0
root@worker2:~# ip route add table 1002 default via 10.0.122.10 dev ens1f1np1
iPerf TCP Bandwidth Test
Move back to the first server console.
Start the iperf server side:
First BM Server Console
root@worker1:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
Move to the second server console.
Start the iperf client side:
Second BM Server Console
root@worker2:~# iperf -c 10.0.121.1 -P 16
------------------------------------------------------------
Client connecting to 10.0.121.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  9] local 10.0.121.9 port 48620 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/827)
[ 10] local 10.0.121.9 port 48610 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/881)
[  1] local 10.0.121.9 port 48712 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/608)
[ 14] local 10.0.121.9 port 48728 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/722)
[ 11] local 10.0.121.9 port 48710 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/870)
[  4] local 10.0.121.9 port 48622 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/945)
[  7] local 10.0.121.9 port 48690 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/906)
[ 15] local 10.0.121.9 port 48736 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/689)
[  2] local 10.0.121.9 port 48616 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/796)
[  3] local 10.0.121.9 port 48618 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/940)
[ 12] local 10.0.121.9 port 48706 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/892)
[ 16] local 10.0.121.9 port 48696 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/810)
[  8] local 10.0.121.9 port 48626 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/801)
[  6] local 10.0.121.9 port 48692 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/891)
[  5] local 10.0.121.9 port 48624 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/931)
[ 13] local 10.0.121.9 port 48686 connected with 10.0.121.1 port 5001 (icwnd/mss/irtt=14/1448/903)
[ ID] Interval            Transfer     Bandwidth
[  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 13] 0.0000-10.0057 sec  14.2 GBytes  12.2 Gbits/sec
[  7] 0.0000-10.0056 sec  13.4 GBytes  11.5 Gbits/sec
[ 12] 0.0000-10.0057 sec  15.2 GBytes  13.1 Gbits/sec
[  4] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 11] 0.0000-10.0058 sec  15.8 GBytes  13.6 Gbits/sec
[  8] 0.0000-10.0057 sec  13.9 GBytes  11.9 Gbits/sec
[  9] 0.0000-10.0058 sec  13.8 GBytes  11.9 Gbits/sec
[ 15] 0.0000-10.0057 sec  14.3 GBytes  12.3 Gbits/sec
[ 16] 0.0000-10.0058 sec  14.6 GBytes  12.5 Gbits/sec
[  1] 0.0000-10.0057 sec  14.6 GBytes  12.6 Gbits/sec
[  6] 0.0000-10.0058 sec  13.1 GBytes  11.3 Gbits/sec
[ 14] 0.0000-10.0059 sec  13.6 GBytes  11.6 Gbits/sec
[ 10] 0.0000-10.0055 sec  13.5 GBytes  11.6 Gbits/sec
[  2] 0.0000-10.0057 sec  14.0 GBytes  12.0 Gbits/sec
[  5] 0.0000-10.0058 sec  14.6 GBytes  12.6 Gbits/sec
[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec
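As a sanity check on the iperf report, the per-stream rates can be summed and compared with the [SUM] line. The following Python sketch does this for report text in which each result sits on one line in the usual iperf 2 layout ("[ ID] interval sec transfer unit rate Gbits/sec"). The regular expression and function names are assumptions made for this illustration, and the sketch only handles rates reported in Gbits/sec.

```python
import re

# Matches per-stream result lines such as:
# [  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
# The [SUM] and [ ID] lines are skipped because their ID is not numeric.
STREAM_RE = re.compile(
    r"^\[\s*(\d+)\]\s+\S+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec",
    re.MULTILINE,
)


def stream_gbits(report):
    """Return {stream_id: rate_in_gbits_per_sec} for every per-stream line."""
    return {int(m.group(1)): float(m.group(2)) for m in STREAM_RE.finditer(report)}


def total_gbits(report):
    """Sum the per-stream rates found in the report."""
    return sum(stream_gbits(report).values())


example = """[ ID] Interval            Transfer     Bandwidth
[  3] 0.0000-10.0058 sec  14.1 GBytes  12.1 Gbits/sec
[ 13] 0.0000-10.0057 sec  14.2 GBytes  12.2 Gbits/sec
[SUM] 0.0000-10.0010 sec   227 GBytes   195 Gbits/sec
"""
print(stream_gbits(example))
```

Adding the sixteen per-stream rates from the run above by hand gives about 194.9 Gbits/sec, in line with the reported [SUM] of 195 Gbits/sec.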
RoCE Latency Test
Return to the first server console.
Start the ib_read_lat server side:
First BM Server Console
root@worker1:~# ib_read_lat -F -n 20000 -d mlx5_0
************************************
* Waiting for client to connect... *
************************************
Move to the second server console.
Start the
ib_read_lat
client side:
Second BM Server Console
root@worker2:~# ib_read_lat -F -n 20000 -d mlx5_0 10.0.121.1
---------------------------------------------------------------------------------------
RDMA_Read Latency Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: ON
ibv_wr* API     : ON
TX depth        : 1
Mtu             : 4096[B]
Link type       : Ethernet
GID index       : 3
Outstand reads  : 16
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x008a PSN 0xe8a46 OUT 0x10 RKey 0x182f00 VAddr 0x0057f6160ce000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:09
remote address: LID 0000 QPN 0x008a PSN 0x726a6b OUT 0x10 RKey 0x182f00 VAddr 0x005d394be5f000
GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:01
---------------------------------------------------------------------------------------
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
2      20000       3.74        68.79       3.83            8.02        7.82          34.19                41.62
---------------------------------------------------------------------------------------
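The GID lines in the output above can be used to confirm which IP addresses the RDMA connection actually uses. The sketch below assumes that the tool prints the 16 GID bytes in decimal (the values 255 and 121 suggest this) and that the selected GID is an IPv4-mapped IPv6 address, as is typical for RoCE v2 over IPv4. The helper name is hypothetical.

```python
import ipaddress


def gid_to_ipv4(gid_text):
    """Decode a colon-separated, decimal-byte GID into an IPv4 address.

    Returns None if the GID is not an IPv4-mapped IPv6 address.
    Assumption: each field is one byte printed in decimal.
    """
    octets = bytes(int(part) for part in gid_text.split(":"))
    address = ipaddress.IPv6Address(octets)
    return address.ipv4_mapped


local_gid = "00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:09"
remote_gid = "00:00:00:00:00:00:00:00:00:00:255:255:10:00:121:01"
print(gid_to_ipv4(local_gid), "->", gid_to_ipv4(remote_gid))
```

Under these assumptions, the local GID decodes to the client address on the 10.0.121.x network and the remote GID decodes to the server address passed on the command line.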
RoCE Bandwidth Test
Return to the first server console.
Start the ib_write_bw server side:
First BM Server Console
root@worker1:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_0
************************************
* Waiting for client to connect... *
************************************
Move to the second server console.
Start the ib_write_bw client side:
Second BM Server Console
root@worker2:~# ib_write_bw -s 1048576 -F -D 30 -q 64 -d mlx5_0 10.0.121.1 --report_gbit
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port       : OFF          Device         : mlx5_0
Number of qps   : 64           Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: ON
ibv_wr* API     : ON
TX depth        : 128
CQ Moderation   : 1
Mtu             : 1024[B]
Link type       : Ethernet
GID index       : 3
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
…
---------------------------------------------------------------------------------------
#bytes   #iterations  BW peak[Gb/sec]  BW average[Gb/sec]  MsgRate[Mpps]
1048576  448865       0.00             235.89              0.028120
---------------------------------------------------------------------------------------
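The MsgRate column in the result above follows directly from the average bandwidth and the message size. The sketch below shows that arithmetic, assuming that Gb/sec means 10^9 bits per second and that each message carries exactly the number of bytes given with -s. The function name is illustrative only.

```python
def message_rate_mpps(bw_gbit_per_sec, message_bytes):
    """Convert an average bandwidth into a message rate in Mpps.

    Assumption: 1 Gb/sec = 10**9 bits per second.
    """
    bits_per_message = message_bytes * 8
    messages_per_sec = bw_gbit_per_sec * 1e9 / bits_per_message
    return messages_per_sec / 1e6


# 235.89 Gb/sec average with 1 MiB (1048576-byte) messages
print(f"{message_rate_mpps(235.89, 1048576):.6f} Mpps")
```

With 1 MiB messages the message rate is low by design; the test is sized to measure bandwidth rather than small-message throughput.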
Network Isolation Test
Finally, verify that servers on different networks, using virtual functions on PF0 and PF1, cannot communicate with each other.
Connect to the first workload server and run the ping commands from PF0 to PF0 and from PF1 to PF1 on the second server. These pings stay within a single network and should succeed:
First BM Server Console
root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.121.9
PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
64 bytes from 10.0.121.9: icmp_seq=1 ttl=62 time=0.885 ms
64 bytes from 10.0.121.9: icmp_seq=2 ttl=62 time=0.273 ms
64 bytes from 10.0.121.9: icmp_seq=3 ttl=62 time=0.214 ms
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.122.9
PING 10.0.122.9 (10.0.122.9) 56(84) bytes of data.
64 bytes from 10.0.122.9: icmp_seq=1 ttl=62 time=0.911 ms
64 bytes from 10.0.122.9: icmp_seq=2 ttl=62 time=0.278 ms
64 bytes from 10.0.122.9: icmp_seq=3 ttl=62 time=0.257 ms
Run the ping commands from PF0 to PF1 and from PF1 to PF0:
First BM Server Console
root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.122.1
PING 10.0.122.1 (10.0.122.1) 56(84) bytes of data.
From 10.0.121.2 icmp_seq=1 Destination Host Unreachable
From 10.0.121.2 icmp_seq=2 Destination Host Unreachable
From 10.0.121.2 icmp_seq=3 Destination Host Unreachable
--- 10.0.122.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2037ms
root@worker1:~# ip vrf exec vrf-red ping -c 3 10.0.122.9
PING 10.0.122.9 (10.0.122.9) 56(84) bytes of data.
^C
--- 10.0.122.9 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2044ms
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.121.1
PING 10.0.121.1 (10.0.121.1) 56(84) bytes of data.
From 10.0.122.2 icmp_seq=1 Destination Host Unreachable
From 10.0.122.2 icmp_seq=2 Destination Host Unreachable
From 10.0.122.2 icmp_seq=3 Destination Host Unreachable
--- 10.0.121.1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2033ms
root@worker1:~# ip vrf exec vrf-blue ping -c 3 10.0.121.9
PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
From 10.0.122.2 icmp_seq=1 Destination Host Unreachable
From 10.0.122.2 icmp_seq=2 Destination Host Unreachable
From 10.0.122.2 icmp_seq=3 Destination Host Unreachable
--- 10.0.121.9 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2027ms
These cross-network ping operations are expected to fail because HBN isolates the two networks using different VLANs, VNIs, and VRFs.
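The manual isolation checks can also be expressed as a small reachability matrix. The following is a minimal sketch, not part of the deployment itself: the helper names and the EXPECTED table are hypothetical, the VRF names and addresses are taken from the console output in this section, and the parser assumes the summary line format printed by the ping utility shown here.

```python
import re

# Summary line as printed by ping, for example:
# "3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2037ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def ping_command(vrf, target, count=3):
    """Build the same command that is run by hand in this section."""
    return ["ip", "vrf", "exec", vrf, "ping", "-c", str(count), target]


def is_reachable(ping_output):
    """Return True if the ping summary reports at least one reply."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(2)) > 0


# Expected outcome per (VRF, target), based on the tests in this section.
EXPECTED = {
    ("vrf-red", "10.0.121.9"): True,
    ("vrf-blue", "10.0.122.9"): True,
    ("vrf-red", "10.0.122.1"): False,
    ("vrf-red", "10.0.122.9"): False,
    ("vrf-blue", "10.0.121.1"): False,
    ("vrf-blue", "10.0.121.9"): False,
}

for (vrf, target), expected in EXPECTED.items():
    state = "reachable" if expected else "isolated"
    print(" ".join(ping_command(vrf, target)), "-> expect", state)
```

To automate the check on the first server, each command could be run with `subprocess.run(..., capture_output=True, text=True)` and its standard output passed to `is_reachable`, comparing the result against the expected value.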
Authors
Boris Kovalev
Boris Kovalev has worked for the past several years as a Solutions Architect, focusing on NVIDIA Networking/Mellanox technology, and is responsible for complex machine learning, Big Data and advanced VMware-based cloud research and design. Boris previously spent more than 20 years as a senior consultant and solutions architect at multiple companies, most recently at VMware. He has written multiple reference designs covering VMware, machine learning, Kubernetes, and container solutions which are available at the NVIDIA Documents website.
