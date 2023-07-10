The following sections describe the architecture for the various DOCA core software modules.



All DOCA APIs return the status in the form of doca_error.

    typedef enum doca_error {
        DOCA_SUCCESS,
        DOCA_ERROR_UNKNOWN,
        DOCA_ERROR_NOT_PERMITTED,         /**< Operation not permitted */
        DOCA_ERROR_IN_USE,                /**< Resource already in use */
        DOCA_ERROR_NOT_SUPPORTED,         /**< Operation not supported */
        DOCA_ERROR_AGAIN,                 /**< Resource temporarily unavailable, try again */
        DOCA_ERROR_INVALID_VALUE,         /**< Invalid input */
        DOCA_ERROR_NO_MEMORY,             /**< Memory allocation failure */
        DOCA_ERROR_INITIALIZATION,        /**< Resource initialization failure */
        DOCA_ERROR_TIME_OUT,              /**< Timer expired waiting for resource */
        DOCA_ERROR_SHUTDOWN,              /**< Shut down in process or completed */
        DOCA_ERROR_CONNECTION_RESET,      /**< Connection reset by peer */
        DOCA_ERROR_CONNECTION_ABORTED,    /**< Connection aborted */
        DOCA_ERROR_CONNECTION_INPROGRESS, /**< Connection in progress */
        DOCA_ERROR_NOT_CONNECTED,         /**< Not Connected */
        DOCA_ERROR_NO_LOCK,               /**< Unable to acquire required lock */
        DOCA_ERROR_NOT_FOUND,             /**< Resource Not Found */
        DOCA_ERROR_IO_FAILED,             /**< Input/Output Operation Failed */
        DOCA_ERROR_BAD_STATE,             /**< Bad State */
        DOCA_ERROR_UNSUPPORTED_VERSION,   /**< Unsupported version */
        DOCA_ERROR_OPERATING_SYSTEM,      /**< Operating system call failure */
        DOCA_ERROR_DRIVER,                /**< DOCA Driver call failure */
        DOCA_ERROR_UNEXPECTED,            /**< An unexpected scenario was detected */
    } doca_error_t;

The following types are common across all device types in the DOCA Core API.

    union doca_data {
        void *ptr;
        uint64_t u64;
    };

    enum doca_access_flags {
        DOCA_ACCESS_LOCAL_READ    = 0,
        DOCA_ACCESS_LOCAL_WRITE   = 1,
        DOCA_ACCESS_REMOTE_WRITE  = (1 << 1),
        DOCA_ACCESS_REMOTE_READ   = (1 << 2),
        DOCA_ACCESS_REMOTE_ATOMIC = (1 << 3),
    };

    enum doca_pci_func_type {
        DOCA_PCI_FUNC_PF = 0, /* physical function */
        DOCA_PCI_FUNC_VF,     /* virtual function */
        DOCA_PCI_FUNC_SF,     /* sub function */
    };
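
The access flags are bit values intended to be combined with bitwise OR. The following is a minimal sketch of combining and testing them. The enum is redeclared locally so the example compiles without the DOCA headers, and `access_allows` is a hypothetical helper, not a DOCA function. Because `DOCA_ACCESS_LOCAL_READ` is 0, it contributes no bit to a mask, so a bit test against it is always satisfied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the listing above; redeclared so this sketch
 * compiles without the DOCA headers. */
enum doca_access_flags {
    DOCA_ACCESS_LOCAL_READ    = 0,
    DOCA_ACCESS_LOCAL_WRITE   = 1,
    DOCA_ACCESS_REMOTE_WRITE  = (1 << 1),
    DOCA_ACCESS_REMOTE_READ   = (1 << 2),
    DOCA_ACCESS_REMOTE_ATOMIC = (1 << 3),
};

/* Hypothetical helper: true when every bit in `required` is set in `granted`. */
bool access_allows(uint32_t granted, uint32_t required)
{
    /* Only the four defined bits are meaningful in this sketch. */
    assert((required & ~0xFu) == 0);
    return (granted & required) == required;
}
```

For example, a mask built as `DOCA_ACCESS_LOCAL_WRITE | DOCA_ACCESS_REMOTE_READ` allows remote reads but not remote writes.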

The DOCA device represents an available processing unit backed by a hardware or software implementation. The DOCA device exposes its properties to help an application choose the right device(s). DOCA Core supports two device types:

Local device – this is an actual device exposed in the local system (DPU or host) and can perform DOCA library processing jobs (can be a hardware device or device emulation)

Representor device – this is a representation of a local device. The local device is usually on the host (except for SFs) and the representor is always on the DPU side (a proxy on the DPU for the host-side device).

The following figure provides an example topology:

The diagram shows a DPU (on the right side of the figure) connected to a host (on the left side of the figure). The host topology consists of two physical functions (PF0 and PF1). Furthermore, PF0 has two child virtual functions, VF0 and VF1. PF1 has only one VF associated with it, VF0. Using the DOCA SDK API, the user gets these five devices as local devices on the host.

The DPU side has one representor device for each host function in a 1-to-1 relation (e.g., hpf0 is the representor device for the host's pf0 device and so on), as well as a representor for each SF such that both the SF and its representor reside in the DPU.

If the user queries local devices on the DPU side (not representor devices), they get the two (in this example) DPU PFs, p0 and p1. These two DPU local devices are the parent devices for:

7 representor devices: 5 shown as arrows to/from the host (devices with the prefix hpf*) in the diagram, and 2 for the SF devices, pf0sf0 and pf1sf0

2 local SF devices (not the SF representors), p0s0 and p1s0

In the diagram, the topology is split into 2 parts (see dotted line). Each part is represented by a DPU physical device, p0 or p1, which is responsible for creating all other local devices (host PFs, host VFs, and DPU SFs). As such, the DPU physical device can be referred to as the parent device of the other devices and has access to the representor of every other function (via doca_devinfo_rep_list_create).



Based on the diagram in section Local Device and Representor, the mmap export APIs can be used as follows:

| Device to Select on Host When Using doca_mmap_export() | DPU Matching Representor | Device to Select on DPU When Using doca_mmap_create_from_export() |
|---|---|---|
| pf0 – 0b:00.0 | hpf0 – 0b:00.0 | p0 – 03:00.0 |
| pf0vf0 – 0b:00.2 | hpf0vf0 – 0b:00.2 | p0 – 03:00.0 |
| pf0vf1 – 0b:00.3 | hpf0vf1 – 0b:00.3 | p0 – 03:00.0 |
| pf1 – 0b:00.1 | hpf1 – 0b:00.1 | p1 – 03:00.1 |
| pf1vf0 – 0b:00.4 | hpf1vf0 – 0b:00.4 | p1 – 03:00.1 |

To work with DOCA libraries or DOCA Core objects, the application must open and use a representor device on the DPU. Before it can open the representor device and use it, the application needs tools to allow it to select the appropriate representor device with the necessary capabilities. The DOCA Core API provides a wide range of device capabilities to help the application select the right device pair (device and its DPU representor). The flow is as follows:

1. List all representor devices on the DPU.
2. Select one with the required capabilities.
3. Open this representor and use it.

As mentioned previously, the DOCA Core API can match devices with their representors through a shared unique property (e.g., the BDF address, which is the same for the device and its DPU representor).

1. The application "knows" which device it wants to use (e.g., by its PCIe BDF address). On the host, this can be done using the DOCA Core API or OS services.
2. On the DPU side, the application gets a list of device representors for a specific DPU local device.
3. The application selects a specific doca_devinfo_rep to work with according to one of its properties. This example looks for a specific PCIe address.
4. Once the doca_devinfo_rep that suits the user's needs is found, the application opens doca_dev_rep.
5. After the user opens the right device representor, they can close the doca_devinfo list and continue working with doca_dev_rep. The application eventually has to close doca_dev too.
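
The selection step in this flow can be sketched as a linear search over the representor list. This is an illustration only: `struct rep_info`, `find_rep_by_bdf`, and `example_reps` are hypothetical stand-ins for the properties a doca_devinfo_rep exposes, not DOCA types or calls. The BDF values are the p0 representors from the mmap export table in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the properties a doca_devinfo_rep exposes. */
struct rep_info {
    const char *name;    /* e.g. "hpf0vf1" */
    const char *pci_bdf; /* e.g. "0b:00.3" */
};

/* Returns the first entry whose PCIe BDF matches, or NULL if none does. */
const struct rep_info *find_rep_by_bdf(const struct rep_info *list,
                                       size_t n, const char *bdf)
{
    assert(bdf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].pci_bdf, bdf) == 0)
            return &list[i];
    }
    return NULL;
}

/* Representors of the DPU local device p0 in the example topology. */
const struct rep_info example_reps[] = {
    { "hpf0",    "0b:00.0" },
    { "hpf0vf0", "0b:00.2" },
    { "hpf0vf1", "0b:00.3" },
};
```

In a real application the match would be followed by opening the selected representor and destroying the list, as described in the steps above.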

Note: Regarding device property caching, the functions doca_devinfo_list_create and doca_devinfo_rep_list_create provide a snapshot of the DOCA device properties when they are called. If any device's properties are changed dynamically (e.g., BDF address may change on bus reset), the device properties that those functions return would not reflect this change. One should call them again to get the updated properties of the devices.

The DOCA memory subsystem has two main design goals: optimizing performance and keeping a minimal memory footprint (to facilitate scalability). DOCA memory has two main components:

doca_buf – this is the data buffer descriptor. That is, it is not the actual data buffer, rather it is a descriptor that holds metadata on the "pointed" data buffer.

doca_mmap – this is the data buffers pool (chunks) which are pointed at by doca_buf. The application populates this memory pool with buffers/chunks and maps them to devices that must access the data.

Just as doca_mmap serves as the memory pool for data buffers, there is also an entity called doca_buf_inventory which serves as a pool of doca_bufs with the same characteristics (see more in section doca_buf/doca_buf_inventory). Like all DOCA entities, memory subsystem objects are opaque and can be instantiated by the DOCA SDK only.

One of the critical requirements for doca_buf is to minimize its size so that programs do not run out of memory or hit scalability issues. For that purpose, DOCA supports extensions for doca_buf_inventory, meaning that the application can assign specific extensions to each doca_buf_inventory it creates (extensions can also be combined with bitwise OR). By default, the minimal doca_buf structure is used without any extensions.

An example of an extension is LINKED_LIST (see the doca_buf_extension enumeration in doca_buf.h), which allows the application to chain several doca_bufs and create a linked list of doca_buf (which can be used for scatter/gather scenarios). All doca_bufs originating from the same inventory have the same characteristics (i.e., extensions).
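
To illustrate what chaining descriptors buys in a scatter/gather scenario, here is a toy model of a descriptor with a `next` link. It is a sketch under stated assumptions: `struct buf_desc`, `chain_append`, and `chain_total_len` are hypothetical names that model the idea only and do not reflect the opaque doca_buf layout or the DOCA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a doca_buf with the linked-list extension. */
struct buf_desc {
    const void *addr;      /* start of the data inside an mmap chunk */
    size_t len;            /* bytes covered by this descriptor */
    struct buf_desc *next; /* next element of the chain, or NULL */
};

/* Link `tail` after the last element reachable from `head`. */
void chain_append(struct buf_desc *head, struct buf_desc *tail)
{
    assert(head != NULL);
    while (head->next != NULL)
        head = head->next;
    head->next = tail;
}

/* Total bytes described by the chain, i.e. what one scatter/gather
 * operation over the whole list would cover. */
size_t chain_total_len(const struct buf_desc *head)
{
    size_t total = 0;
    for (; head != NULL; head = head->next)
        total += head->len;
    return total;
}
```

A descriptor without the extension would simply lack the `next` field, which is why the minimal structure is smaller.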

The following diagram shows the various modules within the DOCA memory subsystem.

The diagram shows two doca_buf_inventorys. The left one has no extensions while the right one has the linked list extension enabled, which allows chaining doca_buf1 and doca_buf2. Each doca_buf points to a portion of the memory buffer which is part of a doca_mmap. The mmap is populated with two memory buffers, chunk-1 and chunk-2.



The DOCA memory subsystem mandates the usage of pools as opposed to dynamic allocation: doca_buf_inventory is the pool for doca_buf, and doca_mmap is the pool for data memory

The memory buffers in the mmap can be mapped to one device or more

doca_buf points to a specific memory buffer (or part of it) and holds metadata for that buffer (e.g., lkey)

The internals of mapping and working with the device (e.g., memory registrations) are hidden from the application

The host-mapped memory chunk can be accessed by the DPU

doca_buf is opaque and can only be allocated using the DOCA API. As previously mentioned, it is the descriptor that points to a specific mmap buffer (chunk), or to a portion of it. doca_buf_inventory is a pool of doca_bufs that the application creates. Still, the doca_bufs in such an inventory are placeholders and do not point to the data. When the application desires to assign a doca_buf to a specific data buffer, it calls the doca_buf_inventory_buf_by_addr API.
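
The placeholder idea can be modeled with a fixed pool of descriptors that only become meaningful once pointed at an address range inside a chunk. The sketch below is a toy model: `struct chunk`, `struct buf_desc`, `struct inventory`, and this `inventory_buf_by_addr` are hypothetical and only borrow the name of the real call for readability. The real function's signature and validation rules are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVENTORY_SIZE 4

/* Hypothetical stand-in for one doca_mmap chunk (a mapped memory range). */
struct chunk { uint8_t *base; size_t len; };

/* Hypothetical stand-in for doca_buf: metadata only, never the data. */
struct buf_desc { uint8_t *addr; size_t len; bool in_use; };

/* Hypothetical stand-in for doca_buf_inventory: a pool of placeholders. */
struct inventory { struct buf_desc slots[INVENTORY_SIZE]; };

/* Take a free descriptor and point it at [addr, addr + len).
 * Returns NULL if the range is not inside the chunk or the pool is empty. */
struct buf_desc *inventory_buf_by_addr(struct inventory *inv,
                                       const struct chunk *c,
                                       uint8_t *addr, size_t len)
{
    assert(inv != NULL && c != NULL);
    uintptr_t start = (uintptr_t)addr;
    uintptr_t base = (uintptr_t)c->base;

    /* Written to avoid overflow: check the offset against the room left. */
    if (start < base || len > c->len || start - base > c->len - len)
        return NULL;

    for (size_t i = 0; i < INVENTORY_SIZE; i++) {
        if (!inv->slots[i].in_use) {
            inv->slots[i] = (struct buf_desc){ addr, len, true };
            return &inv->slots[i];
        }
    }
    return NULL; /* every descriptor is already assigned */
}
```

Note that no data memory is allocated by the inventory itself; the descriptor merely records where in the chunk the data lives.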

When enabling specific extensions for an inventory, the application must make sure that the relevant contexts support them. If a context does not support the requested extensions, the application must not pass doca_bufs with these extensions to that context. For example, if the application wishes to use the linked list extension and concatenate several doca_bufs into a scatter-gather list, it is expected to make sure the context supports the linked list extension by calling doca_dma_get_max_list_buf_num_elem (this example checks linked-list support for DMA).

The following is a simplified example of the steps expected for exporting the host mmap to the DPU to be used by DOCA for direct access to the host memory (e.g., for DMA):

1. Create the mmap on the host (see section Local Device and Representor Matching for information on how to choose the doca_dev to add to the mmap). This example populates 2 data buffers, adds a single doca_dev to the mmap, and exports it so the DPU can use it.
2. Import to the DPU (e.g., use the mmap descriptor output parameter as input to doca_mmap_create_from_export).

doca_mmap is more than just a pool for data buffers (or chunks). It hides a lot of details (e.g., RDMA technicalities, device handling) from the application developer while giving the right level of abstraction to the software using it. doca_mmap is the best way to share memory between the host and the DPU so the DPU can have direct access to the host-side memory. The DOCA SDK supports several types of mmap that help with different use cases:

Local mmap – this is the basic type of mmap, which maps local buffers to the local device(s):

1. The application creates and starts doca_mmap.
2. The application adds devices and populates buffers to the mmap, granting the devices access to the buffers. Note: Populating and adding devices can be done in any order, and the mmap maintains the mapping of every buffer to every device.
3. If the application desires the DPU to have access (zero-copy) to mmap buffers, it must call doca_mmap_export() and pass the resulting blob to the DPU, which can import this mmap and have direct access to the host memory.

Mmap from export (imported) – this mmap can be created on the DPU only:

1. The application calls doca_mmap_create_from_export and gives it the blob obtained by calling doca_mmap_export on the host-side mmap.
2. The application can now create doca_bufs that point to this imported mmap and have direct access to the host's memory.



In DOCA, the workload involves transforming source data to destination data. The basic transformation is a DMA operation, which simply moves data from one memory location to another. Other operations include, for example, calculating the SHA value of the source data and writing it to the destination.

The workload can be broken into 3 steps:

1. Read source data (doca_buf, see memory subsystem).
2. Apply an operation on the read data (handled by a dedicated hardware accelerator).
3. Write the result of the operation to the destination (doca_buf, see memory subsystem).

Each such operation is referred to as a job ( doca_job ).

Jobs describe operations that an application would like to submit to DOCA (hardware or DPU). To do so, the application requires a means of communicating with the hardware/DPU. This is where the doca_workq comes into play. The WorkQ is a per-thread object used to queue jobs to offload to DOCA and eventually receive their completion status.

doca_workq introduces three main operations:

Submission of jobs.

Checking progress/status of submitted jobs.

Querying job completion status.

A workload can be split into many different jobs that can be executed on different threads, each thread represented by a different WorkQ. Each job must be associated with a context, where the context defines the type of job to be done.

A context can be obtained from some libraries within the DOCA SDK. For example, to submit DMA jobs, a DMA context can be acquired from doca_dma.h, whereas a SHA context can be obtained using doca_sha.h. Each such context may allow submission of several job types.

A job is considered asynchronous in that once an application submits a job, the DOCA execution engine (hardware or DPU) would start processing it, and the application can continue to do some other processing until the hardware finishes. To keep track of which job has finished, there are two modes of operation: polling mode and event-driven mode.




doca_ctx represents an instance of a specific DOCA library (e.g., DMA, SHA). Before a job is submitted to the WorkQ for execution, it must be associated with a specific context that executes it. The application is expected to associate (i.e., add) the WorkQ with that context. Adding a WorkQ to a context allows submitting a job to the WorkQ using that context. A context represents a set of configurations, including the job type and the device that runs it, such that each job submitted to the WorkQ is associated with a context that has already been added. The following diagram shows the high-level (domain model) relations between various DOCA core entities.

doca_job is associated with a relevant doca_ctx that executes the job (with the help of the relevant doca_dev).

doca_job, after it is initialized, is submitted to doca_workq for execution.

doca_ctxs are added to the doca_workq.

Once a doca_job is queued to doca_workq, it is submitted to the doca_ctx that is associated with that job type in this WorkQ.
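
The routing relation between jobs, contexts, and a WorkQ can be modeled in a few lines. The following is a toy model, assuming one job type per context for simplicity: `enum job_type`, `struct ctx`, `struct workq`, `workq_add_ctx`, and `workq_submit` are all hypothetical and are not the DOCA types or calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTXS 4

/* Hypothetical job types; real ones come from each library's header. */
enum job_type { JOB_DMA_MEMCPY, JOB_SHA };

/* Hypothetical stand-in for doca_ctx: handles one job type in this model. */
struct ctx { enum job_type handles; int submitted; };

/* Hypothetical stand-in for doca_workq: the contexts it was added to. */
struct workq { struct ctx *ctxs[MAX_CTXS]; size_t n_ctxs; };

/* Associate a context with the WorkQ. Returns 0, or -1 if the table is full. */
int workq_add_ctx(struct workq *q, struct ctx *c)
{
    assert(q != NULL && c != NULL);
    if (q->n_ctxs == MAX_CTXS)
        return -1;
    q->ctxs[q->n_ctxs++] = c;
    return 0;
}

/* Route a job to the context registered for its type.
 * Returns 0 on success, -1 if no added context accepts this job type. */
int workq_submit(struct workq *q, enum job_type type)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->n_ctxs; i++) {
        if (q->ctxs[i]->handles == type) {
            q->ctxs[i]->submitted++;
            return 0;
        }
    }
    return -1;
}
```

Submitting a job whose type has no matching context fails in this model, which mirrors the requirement that a context be added to the WorkQ before its jobs are submitted.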

The following diagram describes the initialization sequence of a context:

After the context is started, it can be used to enable the submission of jobs to a WorkQ based on the types of jobs that the context supports. See section TBD for more information.

A context is a thread-safe object. Some contexts can be used across multiple WorkQs while others can only be used with one. Please refer to the documentation of the specific context for details (e.g., doca_dma).

doca_workq is a logical representation of a DOCA thread of execution (non-thread-safe). The WorkQ is used to submit jobs to the relevant context/library (hardware offload most of the time) and to query each job's completion status. To start submitting jobs, however, the WorkQ must be configured to accept that type of job. Each WorkQ can be configured to accept any number of job types depending on how it is initialized.

The following diagram describes the initialization flow of the WorkQ:

After the WorkQ has been created and added to a context, it can start accepting jobs that the context defines. Refer to the context documentation to find details such as whether the context supports adding multiple doca_workqs to the same context and what jobs can be submitted using the context.

Please note that the WorkQ can be added to multiple contexts. Such contexts can be of the same type or of different types. This allows submitting different job types to the same WorkQ and waiting for any of them to finish from the same place/thread.

In polling mode, the application submits a job and then busy-waits to find out when the job has completed. Polling mode is enabled by default. The following diagram demonstrates this sequence:

1. The application submits all jobs (one or more) and tracks the number of completed jobs to know if all jobs are done.
2. The application waits for a job to finish. If doca_workq_progress_retrieve() returns DOCA_ERROR_AGAIN, it means that jobs are still running (i.e., no result). Once a job is done, DOCA_SUCCESS is returned from doca_workq_progress_retrieve(). If another status is returned, that means an error has occurred (see section Job Error Handling).
3. Once a job has finished, the counter for tracking the number of finished jobs is updated.
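
The shape of this loop can be sketched against a fake queue. This is a self-contained illustration, not DOCA code: `struct fake_workq`, `fake_progress_retrieve`, and `run_polling` are hypothetical, and `ST_SUCCESS`/`ST_AGAIN`/`ST_FAILED` stand in for DOCA_SUCCESS, DOCA_ERROR_AGAIN, and any other status. The fake assumes each job needs a fixed number of polls before it completes.

```c
#include <assert.h>

/* Local stand-ins for the outcomes described above; not the DOCA enum. */
enum status { ST_SUCCESS, ST_AGAIN, ST_FAILED };

/* Hypothetical fake WorkQ: each job completes after `latency` polls. */
struct fake_workq { int in_flight; int latency; int polls_left; };

void fake_submit(struct fake_workq *q)
{
    q->in_flight++;
}

enum status fake_progress_retrieve(struct fake_workq *q)
{
    if (q->in_flight == 0)
        return ST_FAILED;        /* nothing was submitted */
    if (q->polls_left > 0) {
        q->polls_left--;
        return ST_AGAIN;         /* still running, no result yet */
    }
    q->in_flight--;
    q->polls_left = q->latency;  /* next job starts its countdown */
    return ST_SUCCESS;
}

/* Submit n_jobs, then busy-wait until all of them have completed.
 * Returns the number of completed jobs, or -1 on an error status. */
int run_polling(struct fake_workq *q, int n_jobs)
{
    int done = 0;

    assert(n_jobs >= 0);
    for (int i = 0; i < n_jobs; i++)
        fake_submit(q);

    while (done < n_jobs) {
        enum status st = fake_progress_retrieve(q);
        if (st == ST_AGAIN)
            continue;            /* keep polling */
        if (st != ST_SUCCESS)
            return -1;           /* see the error handling section */
        done++;                  /* update the finished-jobs counter */
    }
    return done;
}
```

The `continue` branch is where the CPU is spent while nothing completes, which is the cost of this mode.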

Note: In this mode the application is always using the CPU even when it is doing nothing (during busy-wait).

In event-driven mode, the application submits a job and then waits for a signal to be received before querying the status. The following diagram shows this sequence:

1. The application enables event-driven mode of the WorkQ. If this step fails (DOCA_ERROR_NOT_SUPPORTED), it means that one or more of the contexts associated with the WorkQ (via doca_ctx_workq_add) do not support this mode. To find out if a context supports event-driven mode, refer to the context documentation. Alternatively, the API doca_ctx_get_event_driven_supported() can be called during runtime.
2. The application gets an event handle from the doca_workq representing a Linux file descriptor which is used to signal the application that some work has finished.
3. The application then arms the WorkQ. Note: This must be done every time an application is interested in receiving a signal from the WorkQ.
4. The application submits a job to the WorkQ.
5. The application waits (e.g., Linux epoll/select) for a signal to be received on the workq-fd.
6. The application clears the received events, notifying the WorkQ that a signal has been received and allowing it to do some event handling.
7. The application attempts to retrieve a result from the WorkQ. Note: There is no guarantee that the call to doca_workq_progress_retrieve returns a job completion event, but the WorkQ can continue the job.
8. Increment the number of finished jobs if successful or handle error.
9. Arm the WorkQ to receive the next signal.
10. Repeat steps 5-9 until all jobs are finished.
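
The arm, wait, clear, retrieve, re-arm cycle can be sketched with standard POSIX primitives. In the sketch below a pipe plays the role of the WorkQ event handle and `poll()` plays the role of epoll/select. Everything prefixed `fake_` and `run_event_driven` is hypothetical, and the fake's behavior of signaling immediately on arm when completions are already pending is an assumption of this model, not documented DOCA behavior.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical fake WorkQ: a pipe stands in for the event handle. */
struct fake_workq {
    int fds[2];  /* fds[0]: handle the app waits on; fds[1]: signal side */
    bool armed;  /* a signal is raised only while armed */
    int pending; /* completed jobs not yet retrieved */
};

int fake_init(struct fake_workq *q)
{
    assert(q != NULL);
    q->armed = false;
    q->pending = 0;
    return pipe(q->fds);
}

/* Raise one signal if armed and a completion is waiting. */
static int fake_signal_if_ready(struct fake_workq *q)
{
    if (q->armed && q->pending > 0) {
        q->armed = false; /* one signal per arm */
        if (write(q->fds[1], "x", 1) != 1)
            return -1;
    }
    return 0;
}

int fake_arm(struct fake_workq *q) { q->armed = true; return fake_signal_if_ready(q); }

/* The fake "hardware" completes a job the moment it is submitted. */
int fake_submit(struct fake_workq *q) { q->pending++; return fake_signal_if_ready(q); }

/* Drain the signal so the next wait blocks again. */
int fake_clear(struct fake_workq *q)
{
    char c;
    return read(q->fds[0], &c, 1) == 1 ? 0 : -1;
}

/* Returns 1 if a completion was retrieved, 0 if nothing was ready. */
int fake_retrieve(struct fake_workq *q)
{
    if (q->pending == 0)
        return 0;
    q->pending--;
    return 1;
}

/* Returns the number of completed jobs, or -1 on failure or timeout. */
int run_event_driven(int n_jobs)
{
    struct fake_workq q;
    int done = 0;

    if (fake_init(&q) != 0)
        return -1;
    int ok = (fake_arm(&q) == 0);                 /* arm before submitting */
    for (int i = 0; ok && i < n_jobs; i++)
        ok = (fake_submit(&q) == 0);

    while (ok && done < n_jobs) {
        struct pollfd pfd = { .fd = q.fds[0], .events = POLLIN };
        /* Wait up to 1 s for the signal, then clear it. */
        ok = (poll(&pfd, 1, 1000) == 1) && (fake_clear(&q) == 0);
        if (!ok)
            break;
        done += fake_retrieve(&q);                /* may retrieve nothing */
        ok = (fake_arm(&q) == 0);                 /* re-arm for next signal */
    }
    close(q.fds[0]);
    close(q.fds[1]);
    return ok ? done : -1;
}
```

Unlike the polling loop, the thread sleeps inside `poll()` between completions instead of spinning.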

After a job is submitted successfully, subsequent calls to doca_workq_progress_retrieve may fail (i.e., return a status other than DOCA_SUCCESS or DOCA_ERROR_AGAIN). In this case, errors fall into 2 main categories:

DOCA_ERROR_INVALID_VALUE – this means that some error has occurred within the WorkQ that is not related to any submitted job. This can happen due to the application passing invalid arguments or to some objects that have been previously provided (e.g., a doca_ctx that was associated using doca_ctx_workq_add) getting corrupted. In this scenario, the output parameter of type doca_event is not valid and no more information is given about the error.

DOCA_ERROR_IO_FAILED – this means that a specific job has failed. The output variable of type doca_event is valid and can be used to trace the exact job that failed. An additional error code explaining the exact failure reason is given. To find the exact error, refer to the documentation of the context that provides the job type (e.g., if the job is DMA memcpy, then refer to doca_dma.h).
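
The distinction between the two categories comes down to whether the event output can be trusted. The following is a minimal sketch of a dispatcher, using local stand-in enums rather than the DOCA ones. `classify_retrieve` and the `ACT_*` actions are hypothetical, and treating every unlisted status as a WorkQ-level error is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the statuses discussed above; not the DOCA enum. */
enum status { ST_SUCCESS, ST_AGAIN, ST_INVALID_VALUE, ST_IO_FAILED };

/* What the application should do next. */
enum action {
    ACT_JOB_DONE,     /* event valid: a job completed */
    ACT_KEEP_POLLING, /* no result yet */
    ACT_JOB_FAILED,   /* event valid: identifies the failed job */
    ACT_WORKQ_ERROR,  /* event NOT valid: error unrelated to any job */
};

/* Map a retrieve status to an action and report whether the event
 * output parameter may be inspected. */
enum action classify_retrieve(enum status st, bool *event_valid)
{
    assert(event_valid != NULL);
    switch (st) {
    case ST_SUCCESS:
        *event_valid = true;
        return ACT_JOB_DONE;
    case ST_AGAIN:
        *event_valid = false;
        return ACT_KEEP_POLLING;
    case ST_IO_FAILED:
        *event_valid = true;
        return ACT_JOB_FAILED;
    default: /* includes ST_INVALID_VALUE */
        *event_valid = false;
        return ACT_WORKQ_ERROR;
    }
}
```

Only in the `ACT_JOB_FAILED` case would the application go on to read the job-specific error code described by the owning context's documentation.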

The following diagram shows how an application is expected to handle errors from doca_workq_progress_retrieve:

Most DOCA core objects share the same handling model in which: