Component Description
DOCA Platform Framework (DPF) streamlines the provisioning and orchestration of NVIDIA BlueField DPUs in Kubernetes environments.
It implements a dual-cluster architecture:
- Host Cluster: Provisions and manages DPUs and hosts the DPU Cluster control plane components.
- DPU Cluster: Manages the lifecycle of services deployed on DPUs. 
Refer to the installation guide for setup details, dependencies, and prerequisites.
DPF is made up of the following sets of components:
- DPF Operator to install and configure the DPF system. 
- Provisioning components to manage the lifecycle of the DPUs including OS installation, configuration and Kubernetes Node creation. 
- DPUService components to manage the full lifecycle of services running on DPUs. 
- DPUServiceChain components to manage the lifecycle of ServiceFunctionChains on DPUs. 
 
Provisioning components
This component set uses the Kamaji Cluster Manager - but other Cluster managers may be used.
In the host cluster control plane
- Kamaji system components - including the DPF Kamaji Cluster manager
  - Manage the lifecycle of a Kamaji pod-based control plane
  - Manage the Kamaji cluster load balancer
  - Communicate with the host control plane and DPU control plane

- BFB controller
  - Download the BFB from a remote server
  - Communicate with the host control plane and remote BFB server

- DPUSet controller
  - Create DPU objects and manage their lifecycle
  - Select a Kubernetes control plane for DPU Cluster nodes to join
  - Communicate with the host control plane

- DPU controller
  - Flash the BFB to the DPU
  - Communicate with the DOCA Management Service
 
On each node in the host cluster
- Node feature discovery - including DPU Detector
  - Add information about DPUs to the Kubernetes Node representing the host node
  - Communicate with the host control plane and host node filesystem
 
- DOCA Management Service (DMS)
  - Flash a BFB to DPU hardware
  - Communicate with the BlueField DPU and DPU controller
 
- Host network configuration
  - Set up Virtual Functions, bridges and routes for host-to-DPU communication
  - Communicate with the host node through CLI calls
 
DPUService components
In the host cluster control plane
- DPUService controller
  - Manage the lifecycle of DPUServices created by users
  - Manage the lifecycle of ArgoCD Applications linked to DPUServices
  - Communicate with the host control plane

- DPUDeployment controller
  - Manage the lifecycle of a group of DPUServices and DPUSets
  - Communicate with the host control plane

- DPUServiceCredential controller
  - Manage authorization and authentication for communication between the host control plane and DPU control plane
  - Communicate with the host control plane and DPU control plane

- ArgoCD system components
  - Manage the lifecycle of helm charts on the DPU Cluster
  - Communicate with the host control plane and DPU control plane
 
DPUServiceChain components
In the host cluster control plane
- DPUServiceInterface controller
  - Manage the lifecycle of DPUServiceInterfaces created by users
  - Communicate with the host control plane and DPU control plane

- DPUServiceIPAM controller
  - Manage the lifecycle of DPUServiceIPAMs created by users
  - Communicate with the host control plane and DPU control plane

- DPUServiceChain controller
  - Manage the lifecycle of DPUServiceChains created by users
  - Communicate with the host control plane and DPU control plane
 
In the DPU cluster control plane
- NVIDIA IPAM Plugin (NVIPAM)
  - Manage allocation of IPs in the DPU cluster
  - Communicate with the DPU control plane

- ServiceChainSet controller
  - Manage the lifecycle of ServiceChainSets on the DPUCluster
  - Create ServiceChain objects for relevant DPU nodes
  - Communicate with the DPU control plane

- ServiceInterfaceSet controller
  - Manage the lifecycle of ServiceInterfaceSets on the DPUCluster
  - Create ServiceInterface objects for relevant DPU nodes
  - Communicate with the DPU control plane
 
On each node in the DPU cluster
- ServiceInterface controller
  - Create OVS ports on the DPU based on ServiceInterface objects
  - Communicate with the DPU control plane and host OVS

- ServiceChain controller
  - Create OVS flows on the DPU based on ServiceChain objects
  - Communicate with the DPU control plane and DPU host OVS

- ServiceFunctionChain CNI
  - Add OVS network interfaces to pods
  - Communicate with the Container Runtime Interface and host OVS

- NVIDIA IPAM Plugin (NVIPAM)
  - Allocate IPs for pods on the DPU node
  - Communicate with the DPU control plane and host OS

- Multus
  - Allocate network devices for pods on DPU nodes
  - Communicate with the DPU control plane, Container Runtime Interface and host OS

- SR-IOV Device Plugin
  - Manage the lifecycle of Virtual Functions on the DPU node
  - Communicate with the host OS and Kubelet
 
DPF provisioning has four principal user flows.
Create a DPU Cluster
 
This is a prerequisite to provision a DPU with DPF. This flow is based on the Kamaji Cluster Manager - but other Cluster managers may be used.
- The user creates a DPUCluster object. 
- The DPUCluster manager creates an underlying Kamaji TenantControlPlane. 
- The Kamaji controllers create the cluster control plane pods. 
- The DPUCluster manager creates a load balancer for the Kamaji control plane. 
- The DPUCluster manager updates the DPUCluster with a kubeconfig for the Kamaji control plane. 
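The ordering of this flow can be sketched as a small simulation. This is only an illustration: `DPUClusterState`, its fields, and the log strings are hypothetical names chosen for the example, not part of the DPF or Kamaji APIs, and nothing here talks to a real cluster.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DPUClusterState:
    """Hypothetical in-memory stand-in for a DPUCluster and what backs it."""
    name: str
    tenant_control_plane_created: bool = False
    control_plane_pods_running: bool = False
    load_balancer_created: bool = False
    kubeconfig: Optional[str] = None
    log: List[str] = field(default_factory=list)


def reconcile_once(state: DPUClusterState) -> bool:
    """Perform the next missing step; return True once the kubeconfig exists."""
    if not state.tenant_control_plane_created:
        state.tenant_control_plane_created = True
        state.log.append("cluster manager: create Kamaji TenantControlPlane")
    elif not state.control_plane_pods_running:
        state.control_plane_pods_running = True
        state.log.append("Kamaji controllers: create control plane pods")
    elif not state.load_balancer_created:
        state.load_balancer_created = True
        state.log.append("cluster manager: create load balancer")
    elif state.kubeconfig is None:
        # Placeholder string; a real kubeconfig is a structured document.
        state.kubeconfig = f"kubeconfig-for-{state.name}"
        state.log.append("cluster manager: publish kubeconfig on DPUCluster")
    return state.kubeconfig is not None


def create_dpu_cluster(name: str) -> DPUClusterState:
    """The user creates the object; the loop stands in for repeated reconciles."""
    state = DPUClusterState(name=name)
    while not reconcile_once(state):
        pass
    return state
```

Each call to `reconcile_once` advances by exactly one step, which mirrors the list above: the kubeconfig is only published after the control plane and its load balancer exist.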
Provision a DPU
 
- Node Feature Discovery labels Kubernetes nodes with DPU information. 
- The user creates a BFB object. 
- The BFB controller downloads the BFB from a URL. 
- The user creates a DPUFlavor object if not using a default. 
- The user creates a DPUSet object which references both the BFB and the DPUFlavor. 
- The DPUSet controller creates a DPU object based on host node labels. 
- The DPU controller deploys the DMS pod to a target node.
- The DPU controller instructs DMS to install and configure the BFB on a target DPU. 
- The DPU controller instructs DMS to reboot the DPU node. 
- The DPU controller instructs DMS to reboot the Host node based on reboot policy. 
- The DPU controller deploys the Host Network Configuration pod on a target node. 
- The Host Network Configuration daemon ensures VFs are created on the host for the BlueField. 
- The Host Network Configuration daemon creates a bridge for network communication to the target DPU. 
- Kubeadm initializes a Kubernetes Node on the DPU and joins the DPUCluster. 
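The controller-driven part of this flow (from DPU object creation onward) can be read as a linear sequence with one optional step. The sketch below models that, assuming hypothetical phase names derived from the steps above and a simple boolean in place of the reboot policy; the actual phase names and policy values used by DPF are not given in this document.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """Hypothetical phase names, one per controller-driven step above."""
    PENDING = "DPU object created"
    DMS_DEPLOYED = "DMS pod deployed"
    BFB_INSTALLED = "BFB installed and configured"
    DPU_REBOOTED = "DPU rebooted"
    HOST_REBOOTED = "host rebooted"
    HOST_NETWORK_CONFIGURED = "host network configured"
    NODE_JOINED = "DPU node joined the DPUCluster"


ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current: Phase, reboot_host: bool) -> Optional[Phase]:
    """Return the phase after `current`, skipping the host reboot when the
    (assumed boolean) reboot policy does not require it."""
    idx = ORDER.index(current) + 1
    if idx < len(ORDER) and ORDER[idx] is Phase.HOST_REBOOTED and not reboot_host:
        idx += 1
    return ORDER[idx] if idx < len(ORDER) else None


def walk(reboot_host: bool) -> List[Phase]:
    """Collect every phase a DPU passes through, in order."""
    phases = [Phase.PENDING]
    while (nxt := next_phase(phases[-1], reboot_host)) is not None:
        phases.append(nxt)
    return phases
```

Calling `walk(True)` visits all seven phases, while `walk(False)` omits `HOST_REBOOTED` and visits six.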
Update a DPU
 
- User updates the DPUSet, altering its BFB or DPUFlavor.
- DPUSet controller deletes the target DPU object based on update policy rules.
- DPU controller deletes the DPU Kubernetes node, following the Delete a DPU flow.
- DPUSet controller creates a new DPU object for the target DPU.
- The process in Provision a DPU is followed to create a new DPU node with the updated DPU system.
Delete a DPU
 
- User deletes the DPUSet or alters the selector to reduce the number of DPU objects. 
- DPUSet controller deletes target DPU. 
- DPU controller deletes the DMS and Host Network Configuration static pods. 
- DPU controller deletes the DPU Kubernetes node. 
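Both creation and deletion of DPU objects follow from comparing the nodes a DPUSet selects with the DPU objects that already exist. A minimal sketch of that comparison follows, assuming equality-based label matching and one DPU object per selected host node; the function names and label keys are hypothetical, and the real DPUSet selector may support richer expressions.

```python
from typing import Dict, List, Set, Tuple

Labels = Dict[str, str]


def matches(selector: Labels, node_labels: Labels) -> bool:
    """Equality-based match: every selector key/value must be on the node."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def plan_dpus(
    selector: Labels,
    nodes: Dict[str, Labels],
    existing_dpus: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (to_create, to_delete), both keyed by host node name.

    Assumes one DPU object per selected node, which is a simplification.
    """
    desired = {name for name, labels in nodes.items() if matches(selector, labels)}
    to_create = sorted(desired - existing_dpus)
    to_delete = sorted(existing_dpus - desired)
    return to_create, to_delete
```

With nodes `worker-1` and `worker-2` both labeled `dpu=true` but in different zones, narrowing the selector from `{"dpu": "true"}` to `{"dpu": "true", "zone": "a"}` puts the DPU for the node outside zone `a` into `to_delete`, which corresponds to the "alters the selector" case above.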
DPUService orchestration has four principal user flows.
Create a DPUServiceCredentialRequest
 
This is a prerequisite for DPUServices that are deployed in the host control plane but communicate with the DPU control plane and vice versa. Users must modify their DPUService to use the secret created by the DPUServiceCredentialRequest.
- User creates a DPUServiceCredentialRequest.
- DPUServiceCredentialRequest controller creates a service account in the target cluster. 
- DPUServiceCredentialRequest controller creates a secret with either a kubeconfig or token for the service account. 
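For the kubeconfig variant of the secret, the content is a standard Kubernetes `Config` document that authenticates with the service account's bearer token. The sketch below builds such a document as a plain dictionary. The helper name, server URL, and all argument values are hypothetical, and this is not presented as what the DPF controller actually writes into the secret.

```python
import json
from typing import Any, Dict


def kubeconfig_for_token(
    cluster: str,
    server: str,
    ca_data: str,
    service_account: str,
    token: str,
) -> Dict[str, Any]:
    """Build a kubeconfig (Kubernetes `Config` v1 schema) that authenticates
    as a service account using a bearer token."""
    context_name = f"{service_account}@{cluster}"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster,
            # ca_data is the base64-encoded CA bundle of the target API server.
            "cluster": {"server": server, "certificate-authority-data": ca_data},
        }],
        "users": [{"name": service_account, "user": {"token": token}}],
        "contexts": [{
            "name": context_name,
            "context": {"cluster": cluster, "user": service_account},
        }],
        "current-context": context_name,
    }


if __name__ == "__main__":
    # Example values only; none of them refer to a real cluster.
    config = kubeconfig_for_token(
        "dpu-cluster", "https://example.invalid:6443", "Q0E=", "my-service", "abc"
    )
    print(json.dumps(config, indent=2))
```

A workload that receives only the token, rather than a full kubeconfig, would need the API server address and CA bundle from some other source.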
Create DPUService
 
- User creates a DPUService object referencing a helm chart in a repository. 
- DPUService controller creates associated imagePullSecrets and namespaces in the DPUCluster. 
- DPUService controller creates an ArgoCD application with the DPUService helm chart and values. 
- ArgoCD system reconciles the ArgoCD application and installs the helm chart to the DPU cluster. 
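The third step, turning a DPUService into an ArgoCD application, is roughly a field mapping. The sketch below shows one way such a mapping could look. The output follows the Argo CD `Application` schema (`argoproj.io/v1alpha1`, with `helm.valuesObject` as available in recent Argo CD releases); the shape of the input dictionary, the function name, and the `argocd` namespace default are assumptions made for illustration and do not reflect the actual DPUService API or controller.

```python
from typing import Any, Dict


def argocd_application(
    service: Dict[str, Any],
    destination_cluster: str,
    argocd_namespace: str = "argocd",
) -> Dict[str, Any]:
    """Map a simplified, hypothetical DPUService description onto an
    Argo CD Application manifest that installs a helm chart."""
    chart = service["chart"]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": service["name"], "namespace": argocd_namespace},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": chart["repoURL"],
                "chart": chart["name"],
                "targetRevision": chart["version"],
                # User-supplied helm values are passed through unchanged.
                "helm": {"valuesObject": service.get("values", {})},
            },
            "destination": {
                # Name under which the DPU cluster is registered in Argo CD.
                "name": destination_cluster,
                "namespace": service["namespace"],
            },
        },
    }
```

Under this model, an update to the DPUService only changes the generated manifest, and ArgoCD then syncs the difference to the DPU cluster.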
Update DPUService
 
- User updates the DPUService object. 
- DPUService controller updates the ArgoCD application. 
- ArgoCD system components sync the updated application to the DPU cluster. 
Delete DPUService
 
- User deletes the DPUService object. 
- DPUService controller deletes the ArgoCD application. 
- ArgoCD controller deletes the DPUService helm chart in the DPU cluster. 
TODO: DPUDeployment flow
 
- User creates the DPUServiceInterface object on the host cluster. 
- DPUServiceInterface controller creates the ServiceInterfaceSet object and syncs it to the DPU cluster. 
- ServiceInterfaceSet controller creates the ServiceInterface object for individual nodes based on the nodeSelector. 
- User references the DPUServiceInterface in a DPUService to make use of it.
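The fan-out in this flow, from one DPUServiceInterface to a ServiceInterfaceSet and then to one ServiceInterface per matching node, can be sketched as two small functions. The dictionary shapes, the naming scheme for per-node objects, and the equality-based `nodeSelector` matching are assumptions for illustration only.

```python
from typing import Any, Dict, List

Labels = Dict[str, str]


def to_service_interface_set(dpu_service_interface: Dict[str, Any]) -> Dict[str, Any]:
    """Host cluster side: derive the set object that is synced to the DPU cluster."""
    return {
        "kind": "ServiceInterfaceSet",
        "name": dpu_service_interface["name"],
        "nodeSelector": dict(dpu_service_interface.get("nodeSelector", {})),
    }


def to_service_interfaces(
    interface_set: Dict[str, Any],
    dpu_nodes: Dict[str, Labels],
) -> List[Dict[str, str]]:
    """DPU cluster side: one ServiceInterface per node matching the selector."""
    selector = interface_set["nodeSelector"]
    selected = sorted(
        node for node, labels in dpu_nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )
    return [
        {"kind": "ServiceInterface", "name": f"{interface_set['name']}-{node}", "node": node}
        for node in selected
    ]
```

Splitting the work this way keeps the host-side step independent of how many DPU nodes exist; only the DPU-side step needs to look at node labels.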