DOCA SDK Architecture

The DOCA SDK provides libraries for networking and data processing programmability, leveraging the NVIDIA® BlueField® networking platform (DPU or SuperNIC) and NVIDIA® ConnectX® NIC hardware accelerators.

The DOCA software framework is built on DOCA Core, which provides a unified software foundation for DOCA libraries. These libraries can be combined into processing pipelines or workflows that execute various offloaded tasks.

Device Subsystem

The DOCA SDK enables applications to offload resource-intensive tasks (e.g., encryption, compression) and network-related operations (e.g., packet acquisition, RDMA send) to specialized hardware.

Device Abstraction

The DOCA device subsystem provides an abstraction layer for the hardware processing units within BlueField and ConnectX devices. It allows applications to:

Discover available hardware acceleration units provided by networking platforms
Query capabilities and properties of these acceleration units
Open and configure devices for libraries to allocate and share resources required for hardware acceleration

A system may have multiple available devices. Applications can select a device based on topology (e.g., PCIe address) or capabilities (e.g., encryption support).
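The two selection strategies can be sketched with a toy model. This is not the DOCA API: the `DeviceInfo` class, the helper functions, the PCIe addresses, and the capability names are all hypothetical and exist only to illustrate choosing a device by topology versus by capability.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical stand-in for the properties an application can query."""
    pci_addr: str
    capabilities: frozenset = field(default_factory=frozenset)


def discover_devices():
    """Pretend discovery step: returns a fixed, made-up device list."""
    return [
        DeviceInfo("0000:03:00.0", frozenset({"rdma"})),
        DeviceInfo("0000:03:00.1", frozenset({"rdma", "encryption"})),
    ]


def select_by_pci(devices, pci_addr):
    """Topology-based selection: match on PCIe address."""
    return next((d for d in devices if d.pci_addr == pci_addr), None)


def select_by_capability(devices, capability):
    """Capability-based selection: first device that reports the capability."""
    return next((d for d in devices if capability in d.capabilities), None)


devices = discover_devices()
by_topology = select_by_pci(devices, "0000:03:00.0")
by_capability = select_by_capability(devices, "encryption")
```

Both helpers return `None` when nothing matches, so a caller can fall back to another device or to a software path.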

DOCA Device Types

DOCA Core defines two types of DOCA Devices:

Local device – A physical or virtual device exposed on the local system (BlueField or host). This includes:
- Physical function (PF)
- Virtual function (VF)
- Scalable function (SF)
Representor device – A proxy device on the BlueField side that represents a host-side device. Each host-side function (e.g., PF, VF, or SF) maps 1:1 to a representor on BlueField.

The following figure provides an example of host local devices with representors on BlueField:

[Figure: host local devices and their representors on BlueField]

Note

The diagram shows a typical topology when using BlueField in DPU mode, as described in NVIDIA BlueField DPU Modes of Operation.

The diagram shows BlueField (on the right side of the figure) connected to a host (on the left). The host has physical function PF0 with a child virtual function VF0.

The BlueField side has one representor device per host function (e.g., hpf0 is the representor device for the host's PF0). It also has a representor for each SF; in that case, both the SF and its representor reside on BlueField.

Info

For more details on the DOCA Device subsystem, see section "DOCA Device".

Memory Management Subsystem

Hardware-accelerated processing tasks require data buffers as inputs and outputs. The DOCA SDK employs zero-copy technology to maximize performance by avoiding unnecessary data movement.

To enable zero-copy, applications must pre-register memory before using it for data buffers. The memory management subsystem facilitates:

Memory registration:
- Defines an application memory range for storing data buffers
- Allows one or more devices to access the registered memory
- Sets access permissions (e.g., read-only, read-write)
Data buffer allocation management:
- Allocates data buffers within the registered memory range
- Supports memory pooling over registered memory

DOCA memory has the following main components:

doca_buf – Represents a data buffer used as input/output for DOCA library operations
doca_mmap – Describes registered memory accessible to devices with defined permissions. Each doca_buf resides within a doca_mmap memory range.
doca_buf_inventory – A pool of doca_buf objects with shared characteristics (see more in sections "DOCA Core Buffers" and "DOCA Core Inventories")

The following diagram shows the various modules within the DOCA memory subsystem:

[Figure: modules of the DOCA memory subsystem]

The diagram shows a doca_buf_inventory containing two doca_buf objects. Each doca_buf points to a portion of a memory buffer that is part of a doca_mmap. The doca_mmap is populated with one contiguous memory range and is registered with two DOCA devices, dev1 and dev2.
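The relationship between the three components can be sketched as a toy model. The classes below (`ToyMmap`, `ToyBuf`, `ToyBufInventory`) are hypothetical and do not mirror the real doca_mmap, doca_buf, or doca_buf_inventory interfaces. They only show the idea that a buffer is a window into registered memory, so no data is copied.

```python
from dataclasses import dataclass


class ToyMmap:
    """Hypothetical model of a registered memory range (not the doca_mmap API)."""

    def __init__(self, memory: bytearray, devices, writable: bool):
        self.memory = memoryview(memory)   # one contiguous range, no copy
        self.devices = set(devices)        # devices allowed to access the range
        self.writable = writable           # access permission


@dataclass
class ToyBuf:
    """Hypothetical data buffer: an (offset, length) window into a ToyMmap."""
    mmap: ToyMmap
    offset: int
    length: int

    def view(self) -> memoryview:
        # Slicing a memoryview references the same bytes; nothing is copied.
        return self.mmap.memory[self.offset:self.offset + self.length]


class ToyBufInventory:
    """Hypothetical pool that hands out at most `capacity` buffer descriptors."""

    def __init__(self, capacity: int):
        self.free = capacity

    def get_buf(self, mmap: ToyMmap, offset: int, length: int) -> ToyBuf:
        if self.free == 0:
            raise RuntimeError("inventory exhausted")
        if offset < 0 or length < 0 or offset + length > len(mmap.memory):
            raise ValueError("buffer must lie inside the registered range")
        self.free -= 1
        return ToyBuf(mmap, offset, length)


backing = bytearray(64)
mmap = ToyMmap(backing, devices={"dev1", "dev2"}, writable=True)
inventory = ToyBufInventory(capacity=2)
src = inventory.get_buf(mmap, offset=0, length=16)
dst = inventory.get_buf(mmap, offset=32, length=16)

src.view()[:5] = b"hello"   # the write lands directly in `backing`
```

As in the diagram, the inventory holds two buffers, both of which reference one memory range registered with `dev1` and `dev2`.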

Info

For more details about the DOCA memory management subsystem, see section "DOCA Memory Subsystem".

Execution Model

DOCA libraries abstract low-level hardware operations, allowing applications to focus on high-level processing tasks such as encryption, packet processing, and compression.

Each DOCA library defines dedicated APIs for performing these tasks. These libraries interact with hardware processing units via contexts.

Task and Event Model

DOCA APIs follow an asynchronous execution model, where applications interact with hardware through tasks and events:

Task-based execution:
1. The application prepares task arguments
2. The application submits the task, requesting hardware processing
3. The application receives a completion callback upon task completion
Event-driven execution:
1. The application registers for an event, instructing hardware to report when the event occurs
2. The application receives a callback every time the event is triggered
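The two flows above can be sketched with a toy context. `ToyContext`, its methods, and the `"state_changed"` event name are hypothetical and are not part of any DOCA library. The `hw_*` methods stand in for hardware finishing work at some later time.

```python
class ToyContext:
    """Hypothetical context; a real one would hand work to hardware."""

    def __init__(self):
        self._in_flight = []        # submitted tasks awaiting completion
        self._event_handlers = {}   # event name -> callback

    def submit(self, args, on_complete):
        # Task steps 1-2: the caller prepared `args` and now submits the task.
        self._in_flight.append((args, on_complete))

    def register_event(self, name, on_event):
        # Registration asks for a callback every time `name` occurs.
        self._event_handlers[name] = on_event

    # The two methods below stand in for the hardware side.
    def hw_finish_one(self):
        args, on_complete = self._in_flight.pop(0)
        on_complete(args, result=sum(args))   # task step 3: completion callback

    def hw_raise_event(self, name):
        self._event_handlers[name](name)


completed, events = [], []
ctx = ToyContext()
ctx.submit((1, 2, 3), lambda args, result: completed.append(result))
ctx.register_event("state_changed", events.append)

pending_snapshot = list(completed)   # still empty: submit() returned without a result
ctx.hw_finish_one()
ctx.hw_raise_event("state_changed")
ctx.hw_raise_event("state_changed")
```

A task produces exactly one completion callback, while a registered event produces a callback on every occurrence.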

Progress Engine – Managing Asynchronous Processing

Since hardware processing is asynchronous, DOCA provides a Progress Engine (PE) to manage task and event completion.

The PE supports:

Polling (or busy waiting) mode – The application repeatedly checks for completed tasks/events
Notification mode – The application registers for OS-based notifications (e.g., Linux event FD) and is notified upon task/event completion

Once a completion occurs, whether caused by a task or an event, the relevant callback is invoked from within the PE progress call (doca_pe_progress()).

A single PE instance allows waiting on multiple tasks/events from different contexts. As such, it is possible for an application to utilize a single PE per thread.
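A minimal sketch of polling mode with one PE serving several contexts follows. `ToyProgressEngine` and `ToyContext` are hypothetical names, not the DOCA PE API, and the model assumes each progress call handles at most one completion. Notification mode is not modelled.

```python
from collections import deque


class ToyContext:
    """Hypothetical context whose completions queue up until a PE collects them."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()   # completions the "hardware" has finished

    def hw_complete(self, payload, callback):
        self.ready.append((payload, callback))


class ToyProgressEngine:
    """Hypothetical PE: one object polls every connected context."""

    def __init__(self):
        self.contexts = []

    def connect(self, ctx):
        self.contexts.append(ctx)

    def progress(self):
        """Handle at most one ready completion; return True if one was handled."""
        for ctx in self.contexts:
            if ctx.ready:
                payload, callback = ctx.ready.popleft()
                callback(ctx.name, payload)   # callback runs inside progress()
                return True
        return False


log = []
pe = ToyProgressEngine()
contexts = [ToyContext(n) for n in ("ctx_a", "ctx_b", "ctx_c")]
for ctx in contexts:
    pe.connect(ctx)

contexts[2].hw_complete("done-1", lambda name, p: log.append((name, p)))
contexts[0].hw_complete("done-2", lambda name, p: log.append((name, p)))

# Polling mode: keep calling progress() until nothing is left to report.
while pe.progress():
    pass
```

In this toy the loop stops once the queues are empty. An application polling real hardware would more likely keep calling until the completions it expects have arrived, since hardware may finish work between polls.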

Info

For more details about the DOCA Progress Engine, see section "DOCA Progress Engine".

The following diagram illustrates how the various DOCA modules combine to form the DOCA cross-library processing runtime.

[Figure: DOCA execution model with contexts, tasks/events, and a shared Progress Engine]

The diagram shows three contexts using the same device. Each context has tasks/events that the application has submitted/registered. All three contexts are connected to the same PE, so the application can use that PE to wait on all completions at once.

Info

For more details about the DOCA execution model, see section "DOCA Execution Model".
