DOCA SDK Architecture
The DOCA SDK provides libraries for networking and data processing programmability, leveraging the NVIDIA® BlueField® networking platform (DPU or SuperNIC) and NVIDIA® ConnectX® NIC hardware accelerators.
The DOCA software framework is built on DOCA Core, which provides a unified software foundation for DOCA libraries. These libraries can be combined into processing pipelines or workflows that execute various offloaded tasks.
The DOCA SDK enables applications to offload resource-intensive tasks (e.g., encryption, compression) and network-related operations (e.g., packet acquisition, RDMA send) to specialized hardware.
Device Abstraction
The DOCA device subsystem provides an abstraction layer for the hardware processing units within BlueField and ConnectX devices. It allows applications to:
Discover available hardware acceleration units provided by networking platforms
Query capabilities and properties of these acceleration units
Open and configure devices for libraries to allocate and share resources required for hardware acceleration
A system may have multiple available devices. Applications can select a device based on topology (e.g., PCIe address) or capabilities (e.g., encryption support).
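The two selection strategies above can be sketched in plain C. This is a minimal illustration, not the DOCA device API: `demo_devinfo`, the `DEMO_CAP_*` flags, and both lookup functions are hypothetical stand-ins for a discovered device list and its capability queries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability flags; not the real DOCA capability queries. */
enum demo_cap {
    DEMO_CAP_ENCRYPT  = 1u << 0,
    DEMO_CAP_COMPRESS = 1u << 1,
    DEMO_CAP_RDMA     = 1u << 2
};

/* Hypothetical stand-in for one entry of a discovered device list. */
struct demo_devinfo {
    const char *pci_addr; /* PCIe address, e.g. "0000:03:00.0" */
    uint32_t caps;        /* bitmask of enum demo_cap values */
};

/* Select by topology: return the device at pci_addr, or NULL if absent. */
const struct demo_devinfo *
demo_find_by_pci(const struct demo_devinfo *devs, size_t n, const char *pci_addr)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].pci_addr, pci_addr) == 0)
            return &devs[i];
    }
    return NULL;
}

/* Select by capability: return the first device that supports every
 * flag in `required`, or NULL if no device does. */
const struct demo_devinfo *
demo_find_by_caps(const struct demo_devinfo *devs, size_t n, uint32_t required)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((devs[i].caps & required) == required)
            return &devs[i];
    }
    return NULL;
}
```

An application would typically run one of these lookups over the discovered list and then open only the device it selected.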
DOCA Device Types
DOCA Core defines two types of DOCA Devices:
Local device – A physical or virtual device exposed on the local system (BlueField or host). This includes:
Physical function (PF)
Virtual function (VF)
Scalable function (SF)
Representor device – A proxy device on the BlueField side that represents a host-side device. The host-side function (e.g., PF, VF, or SF) corresponds to a 1:1 mapped representor on BlueField.
The following figure provides an example of host local devices with representors on BlueField:

The diagram shows a typical topology when using BlueField in DPU mode, as described in NVIDIA BlueField DPU Modes of Operation.
The diagram shows BlueField (on the right side of the figure) connected to a host (on the left). The host has physical function PF0 with a child virtual function VF0.
The BlueField side has one representor device per host function in a 1-to-1 ratio (e.g., hpf0 is the representor device for the host's PF0 device), as well as a representor for each SF, such that both the SF and its representor reside in BlueField.
For more details on the DOCA Device subsystem, see section "DOCA Device".
Hardware-accelerated processing tasks require data buffers as inputs and outputs. The DOCA SDK employs zero-copy technology to maximize performance by avoiding unnecessary data movement.
To enable zero-copy, applications must pre-register memory before using it for data buffers. The memory management subsystem facilitates:
Memory registration:
Defines an application memory range for storing data buffers
Allows one or more devices to access the registered memory
Sets access permissions (e.g., read-only, read-write)
Data buffer allocation management:
Allocates data buffers within the registered memory range
Supports memory pooling over registered memory
DOCA memory has the following main components:
doca_buf – Represents a data buffer used as input/output for DOCA library operations
doca_mmap – Describes registered memory accessible to devices with defined permissions. Each doca_buf resides within a doca_mmap memory range.
doca_buf_inventory – A pool of doca_buf objects with shared characteristics (see more in sections "DOCA Core Buffers" and "DOCA Core Inventories")
The following diagram shows the various modules within the DOCA memory subsystem:

The diagram shows a doca_buf_inventory containing 2 doca_bufs. Each doca_buf points to a portion of the memory buffer which is part of a doca_mmap. The mmap is populated with one continuous memory range and is registered with 2 DOCA Devices, dev1 and dev2.
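The relationship between the three components can be modeled in a few lines of C. This is a sketch under stated assumptions, not the DOCA memory API: every `demo_*` type and function is hypothetical, and "registration" here only records device IDs rather than programming any hardware. It shows why the design is zero-copy: a buffer is just a bounds-checked view into the registered range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DEVS 4
#define DEMO_MAX_BUFS 8

enum demo_access { DEMO_ACCESS_READ_ONLY, DEMO_ACCESS_READ_WRITE };

/* Hypothetical model of a registered memory range (the role of doca_mmap). */
struct demo_mmap {
    uint8_t *base;
    size_t len;
    enum demo_access access;
    int dev_ids[DEMO_MAX_DEVS]; /* devices allowed to access the range */
    size_t num_devs;
};

/* Hypothetical model of a data buffer (the role of doca_buf): a view, no copy. */
struct demo_buf {
    const struct demo_mmap *mmap;
    uint8_t *addr;
    size_t len;
};

/* Hypothetical pool of buffer descriptors (the role of doca_buf_inventory). */
struct demo_inventory {
    struct demo_buf bufs[DEMO_MAX_BUFS];
    size_t used;
};

/* Describe an application memory range and its access permission. */
void demo_mmap_init(struct demo_mmap *m, void *base, size_t len,
                    enum demo_access access)
{
    assert(base != NULL && len > 0);
    m->base = base;
    m->len = len;
    m->access = access;
    m->num_devs = 0;
}

/* Allow one more device to access the range; false if the table is full. */
bool demo_mmap_add_dev(struct demo_mmap *m, int dev_id)
{
    if (m->num_devs == DEMO_MAX_DEVS)
        return false;
    m->dev_ids[m->num_devs++] = dev_id;
    return true;
}

/* Take a descriptor from the inventory and point it at
 * [offset, offset + len) inside the registered range. Returns NULL if the
 * inventory is exhausted or the request falls outside the range. */
struct demo_buf *demo_buf_get(struct demo_inventory *inv,
                              const struct demo_mmap *m,
                              size_t offset, size_t len)
{
    if (inv->used == DEMO_MAX_BUFS)
        return NULL;
    if (offset > m->len || len > m->len - offset)
        return NULL;
    struct demo_buf *b = &inv->bufs[inv->used++];
    b->mmap = m;
    b->addr = m->base + offset;
    b->len = len;
    return b;
}
```

Mirroring the diagram, an application would initialize one range, add two devices to it, and take two buffers that reference disjoint portions of that range.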
For more details about the DOCA memory management subsystem, see section "DOCA Memory Subsystem".
DOCA libraries abstract low-level hardware operations, allowing applications to focus on high-level processing tasks such as encryption, packet processing, and compression.
Each DOCA library defines dedicated APIs for performing these tasks. These libraries interact with hardware processing units via contexts.
Task and Event Model
DOCA APIs follow an asynchronous execution model, where applications interact with hardware through tasks and events:
Task-based execution:
The application prepares task arguments
The application submits the task, requesting hardware processing
The application receives a completion callback upon task completion
Event-driven execution:
The application registers for an event, instructing hardware to report when the event occurs
The application receives a callback every time the event is triggered
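Both flows can be sketched with ordinary callbacks. The following is an illustrative model only: the `demo_*` names are hypothetical, and `demo_complete_all` stands in for hardware finishing its work (here it simply doubles the task argument). It is not the DOCA task or event API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_DEPTH 8

struct demo_task;
typedef void (*demo_task_cb)(struct demo_task *task, void *user_data);
typedef void (*demo_event_cb)(int value, void *user_data);

/* Hypothetical task: argument, result slot, and completion callback. */
struct demo_task {
    int src;              /* task argument prepared by the application */
    int result;           /* filled in when the task completes */
    demo_task_cb on_done; /* invoked once, on completion */
    void *user_data;
};

/* Hypothetical context: tasks submitted but not yet completed. */
struct demo_ctx {
    struct demo_task *inflight[DEMO_QUEUE_DEPTH];
    size_t num_inflight;
};

/* Hypothetical event registration: one callback, fired on every trigger. */
struct demo_event {
    demo_event_cb cb;
    void *user_data;
};

/* Submit a prepared task. Returns immediately without running it:
 * 0 on success, -1 if the queue is full. */
int demo_task_submit(struct demo_ctx *ctx, struct demo_task *task)
{
    assert(task->on_done != NULL);
    if (ctx->num_inflight == DEMO_QUEUE_DEPTH)
        return -1;
    ctx->inflight[ctx->num_inflight++] = task;
    return 0;
}

/* Stand-in for hardware completion: compute each result, then invoke the
 * task's completion callback. Returns the number of tasks completed. */
size_t demo_complete_all(struct demo_ctx *ctx)
{
    size_t n = ctx->num_inflight;
    for (size_t i = 0; i < n; i++) {
        struct demo_task *t = ctx->inflight[i];
        t->result = t->src * 2;
        t->on_done(t, t->user_data);
    }
    ctx->num_inflight = 0;
    return n;
}

/* Register for an event once; the callback then fires on every trigger. */
void demo_event_register(struct demo_event *ev, demo_event_cb cb, void *user_data)
{
    ev->cb = cb;
    ev->user_data = user_data;
}

/* Stand-in for hardware reporting that the event occurred. */
void demo_event_trigger(const struct demo_event *ev, int value)
{
    if (ev->cb != NULL)
        ev->cb(value, ev->user_data);
}

/* Example callbacks: count invocations through user_data. */
void demo_count_task(struct demo_task *task, void *user_data)
{
    (void)task;
    (*(int *)user_data)++;
}

void demo_count_event(int value, void *user_data)
{
    (void)value;
    (*(int *)user_data)++;
}
```

The key difference the sketch highlights is cardinality: a task produces exactly one completion callback per submission, while a registered event can invoke its callback any number of times.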
Progress Engine – Managing Asynchronous Processing
Since hardware processing is asynchronous, DOCA provides a Progress Engine (PE) to manage task and event completion.
The PE supports:
Polling (or busy waiting) mode – The application repeatedly checks for completed tasks/events
Notification mode – The application registers for OS-based notifications (e.g., Linux event FD) and is notified upon task/event completion
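Notification mode can be illustrated with a plain Linux eventfd and `poll()`. This sketch assumes Linux and uses only standard system calls. How DOCA exposes its own notification handle is not shown here, and the `demo_*` functions are hypothetical. The thread sleeps in `poll()` instead of spinning, which is the trade-off against polling mode: lower CPU usage at the cost of wake-up latency.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking event FD to act as the notification handle.
 * Returns the FD, or -1 on error. */
int demo_notify_handle_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Stand-in for hardware signalling one completion on the handle.
 * Returns 0 on success, -1 on error. */
int demo_signal_completion(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Notification mode: sleep until the handle becomes readable or the
 * timeout expires. Returns the number of completions signalled since the
 * last wait, 0 on timeout, or -1 on error. */
int demo_wait_notified(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc; /* 0: timed out, -1: poll failed */
    if (!(pfd.revents & POLLIN))
        return -1;

    /* Reading an eventfd returns its counter and resets it to zero. */
    uint64_t count = 0;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int)count;
}
```

After a successful wait, the application would still run the PE so that the callbacks for the signalled completions are invoked.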
Once a completion occurs, whether caused by a task or an event, the relevant callback is invoked from within the PE call that progresses completions.
A single PE instance allows waiting on multiple tasks/events from different contexts. As such, it is possible for an application to utilize a single PE per thread.
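The single-PE pattern in polling mode can be modeled as follows. This is a sketch with hypothetical `demo_*` names, not the DOCA PE API: each context's `pending` counter stands in for completions the hardware has ready, and callbacks run from inside the progress call, as described above. The round-robin cursor is a design choice of this sketch so that one busy context cannot starve the others.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CTXS 4

typedef void (*demo_done_cb)(int ctx_id, void *user_data);

/* Hypothetical context: `pending` counts completions that are ready. */
struct demo_ctx {
    int id;
    int pending;
    demo_done_cb on_done;
    void *user_data;
};

/* Hypothetical progress engine: the contexts connected to it. */
struct demo_pe {
    struct demo_ctx *ctxs[DEMO_MAX_CTXS];
    size_t num_ctxs;
    size_t next; /* round-robin cursor */
};

/* Connect a context to the PE. Returns 0 on success, -1 if the PE is full. */
int demo_pe_connect(struct demo_pe *pe, struct demo_ctx *ctx)
{
    assert(ctx->on_done != NULL);
    if (pe->num_ctxs == DEMO_MAX_CTXS)
        return -1;
    pe->ctxs[pe->num_ctxs++] = ctx;
    return 0;
}

/* One progress step: deliver at most one completion, invoking its callback
 * from inside this call. Returns 1 if progress was made, 0 otherwise. */
int demo_pe_progress(struct demo_pe *pe)
{
    for (size_t i = 0; i < pe->num_ctxs; i++) {
        size_t idx = (pe->next + i) % pe->num_ctxs;
        struct demo_ctx *ctx = pe->ctxs[idx];
        if (ctx->pending > 0) {
            ctx->pending--;
            ctx->on_done(ctx->id, ctx->user_data);
            pe->next = (idx + 1) % pe->num_ctxs;
            return 1;
        }
    }
    return 0;
}

/* Polling mode: keep calling progress until nothing is left to deliver.
 * Returns the number of completions delivered. */
int demo_pe_drain(struct demo_pe *pe)
{
    int delivered = 0;
    while (demo_pe_progress(pe))
        delivered++;
    return delivered;
}

/* Example callback: count completions per context in an int array. */
void demo_count_done(int ctx_id, void *user_data)
{
    ((int *)user_data)[ctx_id]++;
}
```

A thread that owns one such PE can service every connected context from a single loop, which is what makes the one-PE-per-thread arrangement practical.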
For more details about the DOCA Progress Engine, see section "DOCA Progress Engine".
The following diagram illustrates how the various DOCA modules combine to form the DOCA cross-library processing runtime.

The diagram shows 3 contexts utilizing the same device. Each context has tasks/events that the application has submitted/registered. All 3 contexts are connected to the same PE, so the application can use that single PE to wait on all completions at once.
For more details about the DOCA execution model, see section "DOCA Execution Model".