Term: Definition

DPA context: Software construct for the host process that encapsulates the state associated with a DPA process (on a specific device). A DPA context must be associated with a PF device.

Extended DPA context: A DPA context associated with a VF/SF device, to be used for RDMA utilities. It allows creation of DPA resources (such as RDMA, DPA completion, and DPA Async Ops contexts) on the VF/SF device.

DPA Application: Interface with the DPACC compiler to produce a DPA program (app), which is obtained by the DPA context to begin working on DPA.

Kernel: User function (and its arguments) to be executed on DPA. A kernel may be executed by one or more DPA threads.

DPA EU Affinity: An object used to control which EU a DPA thread uses.

DPA Thread: DOCA DPA provides APIs to create/manage a DPA thread, which runs a given kernel.

DPA Completion Context: An object used to receive/handle a completion notification. The user can associate a DPA thread with a completion context. When the completion context receives a notification, the DPA thread is scheduled.

DPA Thread Notification: A mechanism for one DPA thread to notify another DPA thread.

DPA Async Ops: An object that allows a DPA thread to issue asynchronous operations, such as memcpy or post_wait operations.

DPA RPC: A blocking one-time call from the host application to execute a kernel on DPA. RPC is mainly used for the control path. The RPC's return value is reported back to the host application.

DPA Memory: DOCA DPA provides an API to allocate/manage DPA memory, as well as to handle host/Target BlueField memory that has been exported to DPA.

Sync Event: Data structure in either CPU, Target BlueField, GPU, or DPA-heap. An event contains a counter that can be updated and waited on.

RDMA: Abstraction around a network transport object. Allows executing various RDMA operations.

DPA Hash Table: DOCA DPA provides an API to create a hash table on DPA. This data structure is managed on DPA using the relevant device APIs.

DPA Logger/Tracer: DOCA DPA provides a set of debugging APIs that allow the user to diagnose and troubleshoot any issue on the device, as well as to access real-time information from the running application.

DOCA DPA comprises three main components, which are part of the DOCA SDK installation package.

Note The DOCA DPA SDK does not use any multi-thread synchronization primitives. All DOCA DPA objects are non-thread-safe. Developers should make sure the user program and kernels are written to avoid race conditions.

Host library and header files:

The host library offers an interface for managing the following components:

The user can control which EU a DPA thread uses through a DPA EU affinity object.

A DPA EU affinity object can be configured for one EU ID at a time.

Use the following host-side APIs to manage it:

To create/destroy a DPA EU affinity object:

    doca_error_t doca_dpa_eu_affinity_create(struct doca_dpa *dpa, struct doca_dpa_eu_affinity **affinity)
    doca_error_t doca_dpa_eu_affinity_destroy(struct doca_dpa_eu_affinity *affinity)

To set/clear the EU ID in a DPA EU affinity object:

    doca_error_t doca_dpa_eu_affinity_set(struct doca_dpa_eu_affinity *affinity, unsigned int eu_id)
    doca_error_t doca_dpa_eu_affinity_clear(struct doca_dpa_eu_affinity *affinity)

To get the EU ID of a DPA EU affinity object:

    doca_error_t doca_dpa_eu_affinity_get(struct doca_dpa_eu_affinity *affinity, unsigned int *eu_id)

A DOCA DPA thread is used to run a user function (a "DPA kernel") on DPA.

The user can control on which EU the DPA kernel runs by attaching a DPA EU affinity object to the thread.

The thread can be triggered on DPA using two methods:

DPA Thread Notification: one DPA thread notifies another DPA thread.
DPA Completion Context: a completion arrives at a DPA completion context which is attached to the thread.

To create/destroy a DPA thread:

    doca_error_t doca_dpa_thread_create(struct doca_dpa *dpa, struct doca_dpa_thread **dpa_thread)
    doca_error_t doca_dpa_thread_destroy(struct doca_dpa_thread *dpa_thread)

To set/get the thread user function and its argument:

    doca_error_t doca_dpa_thread_set_func_arg(struct doca_dpa_thread *thread, doca_dpa_func_t *func, uint64_t arg)
    doca_error_t doca_dpa_thread_get_func_arg(struct doca_dpa_thread *dpa_thread, doca_dpa_func_t **func, uint64_t *arg)

To set/get the DPA EU affinity:

    doca_error_t doca_dpa_thread_set_affinity(struct doca_dpa_thread *thread, struct doca_dpa_eu_affinity *eu_affinity)
    doca_error_t doca_dpa_thread_get_affinity(struct doca_dpa_thread *dpa_thread, const struct doca_dpa_eu_affinity **affinity)

Thread Local Storage (TLS)

The user can store an opaque value for a DPA thread from the host-side application using the following API:

    doca_error_t doca_dpa_thread_set_local_storage(struct doca_dpa_thread *dpa_thread, doca_dpa_dev_uintptr_t dev_ptr)
    doca_error_t doca_dpa_thread_get_local_storage(struct doca_dpa_thread *dpa_thread, doca_dpa_dev_uintptr_t *dev_ptr)

dev_ptr is pre-allocated DPA memory. In the kernel, the user can retrieve the stored opaque value using the relevant device API (see the device-side TLS API). The value is stored/retrieved using the Thread Local Storage (TLS) mechanism.

To start/stop a DPA thread:

    doca_error_t doca_dpa_thread_start(struct doca_dpa_thread *thread)
    doca_error_t doca_dpa_thread_stop(struct doca_dpa_thread *dpa_thread)

To run a DPA thread:

    doca_error_t doca_dpa_thread_run(struct doca_dpa_thread *dpa_thread)

This API sets the thread to the run state. It must be called after the DPA thread has been created, set, and started. If the DPA thread is attached to a DPA Completion Context, the completion context must be started first.



Example (host-side pseudo code):

    extern doca_dpa_func_t hello_kernel;

    // create DPA thread and set its kernel
    doca_dpa_thread_create(&dpa_thread);
    doca_dpa_thread_set_func_arg(dpa_thread, &hello_kernel, func_arg);

    // pin the thread to EU 10
    doca_dpa_eu_affinity_create(&eu_affinity);
    doca_dpa_eu_affinity_set(eu_affinity, 10);
    doca_dpa_thread_set_affinity(dpa_thread, eu_affinity);

    // set thread local storage
    doca_dpa_mem_alloc(&tls_dev_ptr);
    doca_dpa_thread_set_local_storage(dpa_thread, tls_dev_ptr);

    doca_dpa_thread_start(dpa_thread);

    // create and start the completion context before running the thread
    doca_dpa_completion_create(&dpa_comp);
    doca_dpa_completion_set_thread(dpa_comp, dpa_thread);
    doca_dpa_completion_start(dpa_comp);

    doca_dpa_thread_run(dpa_thread);
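The ordering rule for doca_dpa_thread_run() (thread created, set, and started, and any attached completion context started first) can be pictured as a small state machine. The following is a minimal sketch in plain C that does not call DOCA at all; the mock_* types and functions are hypothetical names invented for illustration, and the exact error behavior of the real library is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DOCA objects; not part of the DOCA API. */
enum mock_state { MOCK_CREATED, MOCK_STARTED, MOCK_RUNNING };

struct mock_completion {
    bool started;
};

struct mock_thread {
    enum mock_state state;
    bool func_set;                 /* a kernel and argument were set */
    struct mock_completion *comp;  /* optional attached completion context */
};

void mock_thread_init(struct mock_thread *t)
{
    assert(t != NULL);
    t->state = MOCK_CREATED;
    t->func_set = false;
    t->comp = NULL;
}

/* Start is accepted only once, directly after creation. */
bool mock_thread_start(struct mock_thread *t)
{
    if (t->state != MOCK_CREATED)
        return false;
    t->state = MOCK_STARTED;
    return true;
}

/* Run requires: started, kernel set, and a started completion context if one is attached. */
bool mock_thread_run(struct mock_thread *t)
{
    if (t->state != MOCK_STARTED || !t->func_set)
        return false;
    if (t->comp != NULL && !t->comp->started)
        return false;
    t->state = MOCK_RUNNING;
    return true;
}
```

In this model, calling mock_thread_run() before the attached completion context is started is rejected, which is the mistake the ordering rule is meant to prevent.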





The DPA Completion Context ties the user application closely to the DPA native model of event-driven scheduling/computation.

The user associates a DPA thread with a completion context. When the completion context receives a notification, the DPA thread is triggered.

Alternatively, the user can leave the completion context unattached to any DPA thread and poll it manually.

The user has the option to continue receiving new notifications or to ignore them.

DOCA DPA provides a generic completion context that can be shared by Message Queues, RDMA, Ethernet, and DPA Async Ops.

To create/destroy a DPA Completion Context:

    doca_error_t doca_dpa_completion_create(struct doca_dpa *dpa, unsigned int queue_size, struct doca_dpa_completion **dpa_comp)
    doca_error_t doca_dpa_completion_destroy(struct doca_dpa_completion *dpa_comp)

To get the queue size:

    doca_error_t doca_dpa_completion_get_queue_size(struct doca_dpa_completion *dpa_comp, unsigned int *size)

To attach to a DPA thread:

    doca_error_t doca_dpa_completion_set_thread(struct doca_dpa_completion *dpa_comp, struct doca_dpa_thread *thread)
    doca_error_t doca_dpa_completion_get_thread(struct doca_dpa_completion *dpa_comp, struct doca_dpa_thread **thread)

Attaching to a thread is only required if the user wants the thread to be triggered when a completion arrives at the completion context.

To start/stop a DPA Completion Context:

    doca_error_t doca_dpa_completion_start(struct doca_dpa_completion *dpa_comp)
    doca_error_t doca_dpa_completion_stop(struct doca_dpa_completion *dpa_comp)

To get the DPA handle:

    doca_error_t doca_dpa_completion_get_dpa_handle(struct doca_dpa_completion *dpa_comp, doca_dpa_dev_completion_t *handle)

Use the output parameter handle with the completion context device APIs, which can be used in the thread kernel.

Thread Activation is a mechanism for one DPA thread to trigger another DPA thread.

Thread activation is done without receiving a completion on the attached thread. The user of this method is therefore expected to pass the message by other means, such as shared memory.

Thread Activation can be achieved using a DPA Notification Completion object.

To create/destroy a DPA Notification Completion:

    doca_error_t doca_dpa_notification_completion_create(struct doca_dpa *dpa, struct doca_dpa_thread *dpa_thread, struct doca_dpa_notification_completion **notify_comp)
    doca_error_t doca_dpa_notification_completion_destroy(struct doca_dpa_notification_completion *notify_comp)

The DPA Notification Completion is attached to the DPA thread given in the dpa_thread parameter.

To get the attached DPA thread:

    doca_error_t doca_dpa_notification_completion_get_thread(struct doca_dpa_notification_completion *notify_comp, struct doca_dpa_thread **dpa_thread)

To start/stop a DPA Notification Completion:

    doca_error_t doca_dpa_notification_completion_start(struct doca_dpa_notification_completion *notify_comp)
    doca_error_t doca_dpa_notification_completion_stop(struct doca_dpa_notification_completion *notify_comp)

To get the DPA handle:

    doca_error_t doca_dpa_notification_completion_get_dpa_handle(struct doca_dpa_notification_completion *notify_comp, doca_dpa_dev_notification_completion_t *comp_handle)

Use the output parameter comp_handle with the notification device API, which can be used in the thread kernel.

Example (host-side pseudo code):

    extern doca_dpa_func_t hello_kernel;

    // create and start the DPA thread
    doca_dpa_thread_create(&dpa_thread);
    doca_dpa_thread_set_func_arg(dpa_thread, &hello_kernel, func_arg);
    doca_dpa_thread_start(dpa_thread);

    // create and start the notification completion attached to the thread
    doca_dpa_notification_completion_create(dpa, dpa_thread, &notify_comp);
    doca_dpa_notification_completion_start(notify_comp);

    // get its DPA handle to pass to the notifying kernel
    doca_dpa_notification_completion_get_dpa_handle(notify_comp, &notify_comp_handle);

    doca_dpa_thread_run(dpa_thread);





DPA Async Ops allows a DPA thread to issue asynchronous operations, such as memcpy or post_wait.

This feature requires the user to create an "asynchronous ops" context and attach it to a completion context.

The user is expected to adhere to the `queue_size` limit on the device when posting operations.

The completion context can raise an activation if it is attached to a DPA thread.

The user can also choose to progress the completion context by polling it manually.

The user can provide `user_data` for the DPA Async Ops and retrieve this metadata on the device using the relevant device API.

To create/destroy DPA Async Ops:

    doca_error_t doca_dpa_async_ops_create(struct doca_dpa *dpa, unsigned int queue_size, uint64_t user_data, struct doca_dpa_async_ops **async_ops)
    doca_error_t doca_dpa_async_ops_destroy(struct doca_dpa_async_ops *async_ops)

Use the following define for valid user_data values:

    #define DOCA_DPA_COMPLETION_LOG_MAX_USER_DATA (24)
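The name DOCA_DPA_COMPLETION_LOG_MAX_USER_DATA suggests that the limit is a base-2 logarithm, meaning user_data should fit in 24 bits. That reading is an assumption; the text above only gives the define. Under that assumption, a host-side guard applied before calling doca_dpa_async_ops_create() might look like the sketch below, where user_data_is_valid() is a hypothetical helper and the define is repeated locally so the snippet compiles without DOCA headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted from the text above; repeated here so the sketch is standalone. */
#define DOCA_DPA_COMPLETION_LOG_MAX_USER_DATA (24)

/* The shift below is only well defined for widths under 64 bits. */
static_assert(DOCA_DPA_COMPLETION_LOG_MAX_USER_DATA < 64, "shift must fit in uint64_t");

/* Hypothetical helper: assumes valid values are 0 .. 2^24 - 1. */
bool user_data_is_valid(uint64_t user_data)
{
    return user_data < (UINT64_C(1) << DOCA_DPA_COMPLETION_LOG_MAX_USER_DATA);
}
```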

To get the queue size/user_data:

    doca_error_t doca_dpa_async_ops_get_queue_size(struct doca_dpa_async_ops *async_ops, unsigned int *queue_size)
    doca_error_t doca_dpa_async_ops_get_user_data(struct doca_dpa_async_ops *async_ops, uint64_t *user_data)

To attach to a DPA Completion Context:

    doca_error_t doca_dpa_async_ops_attach(struct doca_dpa_async_ops *async_ops, struct doca_dpa_completion *dpa_comp)

To start/stop DPA Async Ops:

    doca_error_t doca_dpa_async_ops_start(struct doca_dpa_async_ops *async_ops)
    doca_error_t doca_dpa_async_ops_stop(struct doca_dpa_async_ops *async_ops)

To get the DPA handle:

    doca_error_t doca_dpa_async_ops_get_dpa_handle(struct doca_dpa_async_ops *async_ops, doca_dpa_dev_async_ops_t *handle)

Use the output parameter handle with the async ops device API, which can be used in the thread kernel.

Example (host-side pseudo code):

    // create and start the DPA thread
    doca_dpa_thread_create(&dpa_thread);
    doca_dpa_thread_set_func_arg(dpa_thread);
    doca_dpa_thread_start(dpa_thread);

    // create the completion context, attach it to the thread, and start it
    doca_dpa_completion_create(&dpa_comp);
    doca_dpa_completion_set_thread(dpa_comp, dpa_thread);
    doca_dpa_completion_start(dpa_comp);

    doca_dpa_thread_run(dpa_thread);

    // create the async ops context and attach it to the completion context
    doca_dpa_async_ops_create(&async_ops);
    doca_dpa_async_ops_attach(async_ops, dpa_comp);
    doca_dpa_async_ops_start(async_ops);
    doca_dpa_async_ops_get_dpa_handle(async_ops, &handle);





A thread group is used to aggregate individual DPA threads into a single group.

To create/destroy a DPA Thread Group:

    doca_error_t doca_dpa_thread_group_create(struct doca_dpa *dpa, unsigned int num_threads, struct doca_dpa_tg **tg)
    doca_error_t doca_dpa_thread_group_destroy(struct doca_dpa_tg *tg)

To get the number of threads:

    doca_error_t doca_dpa_thread_group_get_num_threads(struct doca_dpa_tg *tg, unsigned int *num_threads)

To set a DPA thread at 'rank' in a DPA Thread Group:

    doca_error_t doca_dpa_thread_group_set_thread(struct doca_dpa_tg *tg, struct doca_dpa_thread *thread, unsigned int rank)

The thread rank is the index of the thread within the group (between 0 and num_threads - 1).

To start/stop a DPA Thread Group:

    doca_error_t doca_dpa_thread_group_start(struct doca_dpa_tg *tg)
    doca_error_t doca_dpa_thread_group_stop(struct doca_dpa_tg *tg)

The user can allocate (from the host API) and access (from both the host and device API) several memory locations using the relevant DOCA DPA API.

DOCA DPA supports access from the host/Target BlueField to DPA heap memory and also enables device access to host memory (e.g., a kernel writes to host memory).

The normal memory usage flow is to:

1. Allocate memory (Host/Target BlueField/DPA).
2. Register the memory.
3. Get a DPA handle for the registered memory so it can be accessed by DPA kernels.
4. Access/use the memory from the kernel (see the relevant device-side APIs).

To allocate DPA heap memory:

    doca_dpa_mem_alloc(doca_dpa_t dpa, size_t size, doca_dpa_dev_uintptr_t *dev_ptr)

To free previously allocated DPA memory:

    doca_dpa_mem_free(doca_dpa_dev_uintptr_t dev_ptr)

To copy previously allocated memory from a host pointer to a DPA heap device pointer (the device pointer is the destination and the host pointer is the source):

    doca_dpa_h2d_memcpy(doca_dpa_t dpa, doca_dpa_dev_uintptr_t dst_ptr, void *src_ptr, size_t size)

To copy previously allocated memory from a DOCA Buffer to a DPA heap device pointer:

    doca_error_t doca_dpa_h2d_buf_memcpy(struct doca_dpa *dpa, doca_dpa_dev_uintptr_t dst_ptr, struct doca_buf *buf, size_t size)

To copy previously allocated memory from a DPA heap device pointer to a host pointer:

    doca_dpa_d2h_memcpy(doca_dpa_t dpa, void *dst_ptr, doca_dpa_dev_uintptr_t src_ptr, size_t size)

To copy previously allocated memory from a DPA heap device pointer to a DOCA Buffer:

    doca_error_t doca_dpa_d2h_buf_memcpy(struct doca_dpa *dpa, struct doca_buf *buf, doca_dpa_dev_uintptr_t src_ptr, size_t size)

To set memory:

    doca_dpa_memset(doca_dpa_t dpa, doca_dpa_dev_uintptr_t dev_ptr, int value, size_t size)

To get a DPA handle to use in kernels, the user must use a DOCA Core Memory Inventory Object in the following manner (refer to "DOCA Memory Subsystem"):

When the user wants to use device APIs with DOCA Buffer, use the following pseudo code:

    doca_buf_arr_create(&buf_arr);
    doca_buf_arr_set_target_dpa(buf_arr, doca_dpa);
    doca_buf_arr_start(buf_arr);
    doca_buf_arr_get_dpa_handle(buf_arr, &handle);

Use the output parameter handle in the relevant device APIs in the thread kernel.

When the user wants to use device APIs with DOCA Mmap, use the following pseudo code:

    doca_mmap_create(&mmap);
    doca_mmap_set_dpa_memrange(mmap, doca_dpa, dev_ptr, dev_ptr_len);
    doca_mmap_start(mmap);
    doca_mmap_dev_get_dpa_handle(mmap, doca_dev, &handle);

Use the output parameter handle in the relevant device APIs in the thread kernel.



A base DOCA DPA context is a context created on a PF DOCA device using:

    doca_dpa_create(pf_doca_dev, &base_dpa_ctx);

To enable creating DPA resources (e.g., RDMA, DPA completion, or DPA Async Ops contexts) on a VF/SF DOCA device, an extended DOCA DPA context is required.

DOCA DPA provides the following host API to extend a base DOCA DPA context (created on a PF DOCA device) to an SF/VF DOCA device:

    doca_error_t doca_dpa_device_extend(struct doca_dpa *dpa, struct doca_dev *other_dev, struct doca_dpa **extended_dpa)

The extended DPA context can then be used to create DPA resources (e.g., RDMA, DPA completion, or DPA Async Ops contexts) on the other DOCA device (SF/VF).

Note that:

The extended DPA context is already started.

The extended DPA context can be used later for all DOCA DPA APIs (e.g., creating DPA memory, DPA completion context) or within kernel launch flow.

Note When running from the DPU, DOCA RDMA context (on DPA datapath) must be created on SF DOCA device. Therefore, it must be created using an extended DOCA DPA context (created on the same SF DOCA device).

To obtain a DPA handle for a DOCA DPA context (either base or extended):

    doca_error_t doca_dpa_get_dpa_handle(struct doca_dpa *dpa, doca_dpa_dev_t *handle)

Use the output parameter handle in the relevant device APIs in the thread kernel.

Note When a DOCA RDMA context is created with an extended DOCA DPA context and is attached to a DPA completion context and a DPA thread, the DOCA RDMA context, the DPA completion context, and the DPA thread must all be created on the same extended DOCA DPA context.

Example (host-side pseudo code):

    // create and start the base DPA context on the PF device
    doca_dpa_create(pf_doca_dev, &base_dpa_ctx);
    doca_dpa_start(base_dpa_ctx);

    // extend the base DPA context to the SF device
    doca_dpa_device_extend(base_dpa_ctx, sf_doca_dev, &extended_dpa_ctx);
    doca_dpa_get_dpa_handle(extended_dpa_ctx, &extended_dpa_ctx_handle);

    // create the thread and completion context on the extended DPA context
    doca_dpa_thread_create(extended_dpa_ctx, &dpa_thread);
    doca_dpa_thread_start(dpa_thread);
    doca_dpa_completion_create(extended_dpa_ctx, &dpa_completion);
    doca_dpa_completion_set_thread(dpa_completion, dpa_thread);
    doca_dpa_completion_start(dpa_completion);

    // create the RDMA context on the SF device with its datapath on the extended DPA context
    doca_rdma_create(sf_doca_dev, &rdma);
    doca_ctx_set_datapath_on_dpa(rdma_as_ctx, extended_dpa_ctx);
    doca_rdma_dpa_completion_attach(rdma, dpa_completion);
    doca_ctx_start(rdma_as_ctx);
    doca_rdma_get_dpa_handle(rdma, &rdma_dpa_handle);





DOCA DPA provides an API to create a hash table on DPA. This data structure is managed on DPA using relevant device APIs.

To create a hash table on DPA:

    doca_error_t doca_dpa_hash_table_create(struct doca_dpa *dpa, unsigned int num_entries, struct doca_dpa_hash_table **ht)

To destroy a hash table:

    doca_error_t doca_dpa_hash_table_destroy(struct doca_dpa_hash_table *ht)

To obtain a DPA handle:

    doca_error_t doca_dpa_hash_table_get_dpa_handle(struct doca_dpa_hash_table *ht, doca_dpa_dev_hash_table_t *handle)

Use the output parameter handle in the relevant device APIs in the thread kernel.

DPA RPC is a blocking one-time call from the host application to execute a kernel on DPA.

Info RPC is mainly used for the control path.

The RPC's return value is reported back to the host application.

    doca_error_t doca_dpa_rpc(struct doca_dpa *dpa, doca_dpa_func_t *func, uint64_t *retval, ...)

Example:

Device-side: the DPA device function must be annotated with __dpa_rpc__, for example:

    __dpa_rpc__ uint64_t hello_rpc(int arg)
    {
        ...
    }

Host-side:

    extern doca_dpa_func_t hello_rpc;
    uint64_t retval;
    doca_dpa_rpc(dpa, &hello_rpc, &retval, 10);

DOCA DPA provides an API which enables full control for launching and monitoring kernels.

Since DOCA DPA libraries are not thread-safe, it is up to the programmer to make sure the kernel is written to allow it to run in a multi-threaded environment. For example, to program a kernel that uses RDMAs with 16 concurrent threads, the user should pass an array of 16 RDMAs to the kernel so that each thread can access its RDMA using its rank ( doca_dpa_dev_thread_rank() ) as an index to the array.

    doca_dpa_kernel_launch_update_<add|set>(struct doca_dpa *dpa, struct doca_sync_event *wait_event, uint64_t wait_threshold, struct doca_sync_event *comp_event, uint64_t comp_count, unsigned int num_threads, doca_dpa_func_t *func, ...)

This function asks DOCA DPA to run func on DPA with num_threads threads and to pass it the supplied (variadic) list of arguments.

This function is asynchronous, so its return does not mean that func has started or finished executing.

To add control or flow/ordering to these asynchronous kernels, two optional parameters for launching kernels are available:

wait_event: the kernel does not start its execution until the event is signaled, meaning that DOCA DPA does not run the kernel until the event's counter is greater than wait_threshold. If wait_event is NULL, the kernel starts once DOCA DPA has an available EU to run it on.

Note The valid values for wait_threshold and for the wait_event counter are [0-254]. Values outside this range might cause anomalous behavior.

comp_event: once the last thread running the kernel is done, DOCA DPA updates this event (either sets its counter to comp_count or adds comp_count to its current counter value).
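The wait_event and comp_event semantics can be captured in a few lines of plain C. The sketch below is a host-only model and does not use the DOCA Sync Event API; mock_event, mock_kernel_may_start(), and mock_kernel_done() are hypothetical names. It assumes exactly the rules stated here: a kernel may start when there is no wait event or when the counter is strictly greater than the threshold, and the completion event is either added to or set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_WAIT_VALUE 254u   /* upper bound quoted in the note above */

/* Hypothetical model of a sync event: just a counter. */
struct mock_event {
    uint64_t counter;
};

enum mock_update { MOCK_UPDATE_ADD, MOCK_UPDATE_SET };

/* No wait event: start at once. Otherwise the counter must exceed the threshold. */
bool mock_kernel_may_start(const struct mock_event *wait_event, uint64_t wait_threshold)
{
    assert(wait_threshold <= MOCK_MAX_WAIT_VALUE);
    if (wait_event == NULL)
        return true;
    return wait_event->counter > wait_threshold;
}

/* Applied once, after the last thread of the kernel has finished. */
void mock_kernel_done(struct mock_event *comp_event, uint64_t comp_count, enum mock_update mode)
{
    if (comp_event == NULL)
        return;   /* the completion event is optional */
    if (mode == MOCK_UPDATE_ADD)
        comp_event->counter += comp_count;
    else
        comp_event->counter = comp_count;
}
```

Note that with a strict greater-than comparison, a kernel waiting with a threshold of 0 starts as soon as the counter reaches 1.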

DOCA DPA takes care of packing (on host/Target BlueField) and unpacking (in DPA) the kernel parameters.

func must be prefixed with the __dpa_global__ macro so that DPACC compiles it as a kernel (and adds it to the DPA executable binary) and not as part of the host application binary.

The programmer must also declare func in the host application by adding the line extern doca_dpa_func_t func .

Note The following APIs are only relevant for a kernel used in kernel_launch APIs. These APIs are not relevant in doca_dpa_thread kernel.

To retrieve the running thread's rank for a given kernel on the DPA:

    unsigned int doca_dpa_dev_thread_rank()

If, for example, a kernel is launched to run with 16 threads, each thread running this kernel is assigned a rank ranging from 0 to 15 within this kernel. This is helpful for making sure each thread in the kernel only accesses data relevant to its execution, to avoid data races.

To return the number of threads running the current kernel:

    unsigned int doca_dpa_dev_num_threads()

To yield the thread which runs the kernel:

    void doca_dpa_dev_yield(void)
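One common way to use the rank and thread count is to give each thread a disjoint slice of a shared array. The sketch below shows only the index arithmetic in plain C; chunk_for_rank() and struct chunk are hypothetical, and the rank and num_threads arguments stand in for the values that doca_dpa_dev_thread_rank() and doca_dpa_dev_num_threads() would return inside a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) owned by one thread. */
struct chunk {
    size_t begin;
    size_t end;
};

/* Split `total` items as evenly as possible across `num_threads` ranks. */
struct chunk chunk_for_rank(unsigned int rank, unsigned int num_threads, size_t total)
{
    assert(num_threads > 0 && rank < num_threads);

    size_t base = total / num_threads;    /* items every rank receives */
    size_t extra = total % num_threads;   /* the first `extra` ranks receive one more */

    struct chunk c;
    c.begin = (size_t)rank * base + (rank < extra ? rank : extra);
    c.end = c.begin + base + (rank < extra ? 1 : 0);
    return c;
}
```

Because the ranges never overlap, threads that only touch their own chunk do not race with each other, which matters since the DOCA DPA libraries provide no locking.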

Linear Execution Example

Device-side pseudo code:

    #include "doca_dpa_dev.h"
    #include "doca_dpa_dev_sync_event.h"

    __dpa_global__ void linear_kernel(doca_dpa_dev_sync_event_t wait_ev, doca_dpa_dev_sync_event_t comp_ev)
    {
        /* wait only if a wait event was passed */
        if (wait_ev)
            doca_dpa_dev_sync_event_wait_gt(wait_ev, wait_th = 0);

        /* signal the next kernel in the chain */
        doca_dpa_dev_sync_event_update_add(comp_ev, comp_count = 1);
    }

Host-side pseudo code:

    #include <doca_dev.h>
    #include <doca_error.h>
    #include <doca_sync_event.h>
    #include <doca_dpa.h>

    int main(int argc, char **argv)
    {
        /* open the device and create the DPA context */
        open_doca_dev(&doca_dev);
        doca_dpa_create(doca_dev, dpa_linear_app, &dpa_ctx, 0);

        /* ev_a: published by the CPU, subscribed by the DPA */
        doca_sync_event_create(&ev_a);
        doca_sync_event_add_publisher_location_cpu(ev_a, doca_dev);
        doca_sync_event_add_subscriber_location_dpa(ev_a, dpa_ctx);
        doca_sync_event_start(ev_a);

        /* ev_b, ev_c: published and subscribed by the DPA */
        doca_sync_event_create(&ev_b);
        doca_sync_event_add_publisher_location_dpa(ev_b, dpa_ctx);
        doca_sync_event_add_subscriber_location_dpa(ev_b, dpa_ctx);
        doca_sync_event_start(ev_b);

        doca_sync_event_create(&ev_c);
        doca_sync_event_add_publisher_location_dpa(ev_c, dpa_ctx);
        doca_sync_event_add_subscriber_location_dpa(ev_c, dpa_ctx);
        doca_sync_event_start(ev_c);

        /* comp_ev: published by the DPA, subscribed by the CPU */
        doca_sync_event_create(&comp_ev);
        doca_sync_event_add_publisher_location_dpa(comp_ev, dpa_ctx);
        doca_sync_event_add_subscriber_location_cpu(comp_ev, doca_dev);
        doca_sync_event_start(comp_ev);

        /* get DPA handles for the events used inside kernels */
        doca_sync_event_get_dpa_handle(ev_b, dpa_ctx, &ev_b_handle);
        doca_sync_event_get_dpa_handle(ev_c, dpa_ctx, &ev_c_handle);
        doca_sync_event_get_dpa_handle(comp_ev, dpa_ctx, &comp_ev_handle);

        /* launch the chain: kernel1 -> kernel2 -> kernel3 */
        doca_dpa_kernel_launch_update_add(wait_ev = ev_a, wait_threshold = 1, num_threads = 1, &linear_kernel, kernel_args: NULL, ev_b_handle);
        doca_dpa_kernel_launch_update_add(wait_ev = NULL, num_threads = 1, &linear_kernel, kernel_args: ev_b_handle, ev_c_handle);
        doca_dpa_kernel_launch_update_add(wait_ev = NULL, num_threads = 1, &linear_kernel, kernel_args: ev_c_handle, comp_ev_handle);

        /* signal the first kernel, then wait for the last one */
        doca_sync_event_update_set(ev_a, 1);
        doca_sync_event_wait_gt(comp_ev, 0);

        teardown_resources();
    }





Diamond Execution Example

Device-side pseudo code:

    #include "doca_dpa_dev.h"
    #include "doca_dpa_dev_sync_event.h"

    __dpa_global__ void diamond_kernel(doca_dpa_dev_sync_event_t wait_ev, uint64_t wait_th, doca_dpa_dev_sync_event_t comp_ev1, doca_dpa_dev_sync_event_t comp_ev2)
    {
        /* wait only if a wait event was passed */
        if (wait_ev)
            doca_dpa_dev_sync_event_wait_gt(wait_ev, wait_th);

        /* signal one or two successor kernels */
        doca_dpa_dev_sync_event_update_add(comp_ev1, comp_count = 1);
        if (comp_ev2)
            doca_dpa_dev_sync_event_update_add(comp_ev2, comp_count = 1);
    }

Host-side pseudo code:

    #include <doca_dev.h>
    #include <doca_error.h>
    #include <doca_sync_event.h>
    #include <doca_dpa.h>

    int main(int argc, char **argv)
    {
        /* open the device and create the DPA context */
        open_doca_dev(&doca_dev);
        doca_dpa_create(doca_dev, dpa_diamond_app, &dpa_ctx, 0);

        /* create the events */
        doca_sync_event_create(&ev_a);
        doca_sync_event_create(&ev_b);
        doca_sync_event_create(&ev_c);
        doca_sync_event_create(&ev_d);
        doca_sync_event_create(&ev_e);
        doca_sync_event_create(&comp_ev);

        /* get DPA handles for the events used inside kernels */
        doca_sync_event_get_dpa_handle(&ev_b_handle, &ev_c_handle, &ev_d_handle, &ev_e_handle, &comp_ev_handle);

        constexpr uint64_t wait_threshold_one_parent {1};
        constexpr uint64_t wait_threshold_two_parent {2};

        /* launch the kernels */
        doca_dpa_kernel_launch_update_set(wait_ev = ev_a, wait_threshold = 1, num_threads = 1, &diamond_kernel, kernel_args: NULL, 0, ev_b_handle, ev_c_handle);
        doca_dpa_kernel_launch_update_set(wait_ev = NULL, num_threads = 1, &diamond_kernel, kernel_args: ev_b_handle, wait_threshold_one_parent, ev_e_handle, NULL);
        doca_dpa_kernel_launch_update_set(wait_ev = NULL, num_threads = 1, &diamond_kernel, kernel_args: ev_c_handle, wait_threshold_one_parent, ev_d_handle, NULL);
        doca_dpa_kernel_launch_update_set(wait_ev = NULL, num_threads = 1, &diamond_kernel, kernel_args: ev_d_handle, wait_threshold_one_parent, ev_e_handle, NULL);
        doca_dpa_kernel_launch_update_set(wait_ev = NULL, num_threads = 1, &diamond_kernel, kernel_args: ev_e_handle, wait_threshold_two_parent, comp_ev_handle, NULL);

        /* signal the first kernel, then wait for the last one */
        doca_sync_event_update_set(ev_a, 1);
        doca_sync_event_wait_gt(comp_ev, 0);

        teardown_resources();
    }

The time interval between a kernel launch call from the host and the start of its execution on the DPA is significantly optimized when the host application calls doca_dpa_kernel_launch_update_<add|set>() repeatedly to execute with the same number of DPA threads. So, if the application calls doca_dpa_kernel_launch_update_<add|set>(..., num_threads = x) , the next call with num_threads = x would have a shorter latency (as low as ~5-7 microseconds) for the start of the kernel's execution.

Applications calling for kernel launch with a wait event (i.e., the completion event of a previous kernel) also have significantly lower latency in the time between the host launching the kernel and the start of the execution of the kernel on the DPA. So, if the application calls doca_dpa_kernel_launch_update_<add|set>( ..., completion event = m_ev, ...) and then doca_dpa_kernel_launch_update_<add|set>( wait event = m_ev, ...) , the latter kernel launch call would have shorter latency (as low as ~3 microseconds) for the start of the kernel's execution.

The order in which kernels are launched is important. If an application launches K1 and then K2, K1 must not depend on K2's completion (e.g., wait on its wait event that K2 should update). Not following this guideline leads to unpredictable results (at runtime) for the application and might require restarting the DOCA DPA context (i.e., destroying, reinitializing, and rerunning the workload).
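The launch-order rule can be checked on the host before any kernel is launched. The sketch below is a hypothetical pre-flight check in plain C, not a DOCA facility: each launch is described by the application-chosen id of the event it waits on and the id of the event it completes, and the check rejects any sequence in which a launch waits on an event that only a later launch completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_EVENT (-1)

/* Hypothetical description of one launch; event ids are chosen by the application. */
struct launch {
    int wait_event;   /* NO_EVENT if the kernel does not wait */
    int comp_event;   /* NO_EVENT if the kernel signals nothing */
};

/* Returns false if some launch waits on an event that a later launch completes. */
bool launch_order_is_valid(const struct launch *launches, size_t count)
{
    assert(launches != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (launches[i].wait_event == NO_EVENT)
            continue;
        for (size_t j = i + 1; j < count; j++) {
            if (launches[j].comp_event == launches[i].wait_event)
                return false;   /* launch i depends on the later launch j */
        }
    }
    return true;
}
```

The check is quadratic in the number of launches, which is acceptable for the short chains typical of control-path setup.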

DPA threads are an actual hardware resource and are, therefore, limited in number to 256 (including internal allocations and allocations explicitly requested by the user as part of the kernel launch API). DOCA DPA does not check these limits. It is up to the application to adhere to this number and to track thread allocation across different DPA contexts. Each doca_dpa_dev_rdma_t consumes one thread.
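Since DOCA DPA does not enforce the 256-thread limit, an application can keep its own tally. The following is a minimal sketch of such bookkeeping in plain C; thread_budget and its functions are hypothetical. Internal allocations are not visible to the application, so in practice the tally would need some headroom, and how much is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DPA_MAX_HW_THREADS 256u   /* limit quoted in the text above */

/* Hypothetical application-side tally, shared across all DPA contexts. */
struct thread_budget {
    unsigned int used;
};

/* Reserve `count` threads; fails without side effects if the limit would be exceeded. */
bool thread_budget_reserve(struct thread_budget *b, unsigned int count)
{
    assert(b != NULL);
    if (count > DPA_MAX_HW_THREADS - b->used)
        return false;
    b->used += count;
    return true;
}

/* Return `count` previously reserved threads to the budget. */
void thread_budget_release(struct thread_budget *b, unsigned int count)
{
    assert(b != NULL && count <= b->used);
    b->used -= count;
}
```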

The DPA has an internal watchdog timer to make sure threads do not block indefinitely. Kernel execution time must be finite and must not exceed the time returned by doca_dpa_get_kernel_max_run_time .

The num_threads parameter in the doca_dpa_kernel_launch call cannot exceed the maximum allowed number of threads to run a kernel, as returned by doca_dpa_get_max_threads_per_kernel .

DOCA DPA provides a set of debugging APIs to allow diagnosing and troubleshooting any issues on the device, as well as accessing real-time information from the running application.

Logging in the data path has a significant impact on an application's performance, whereas the tracer provided by the library is high-frequency and is designed to avoid a significant impact on the application's performance.

Therefore, it is recommended to use:

Logging in the control path

Tracing in the data path

The user is able to control the log/trace file path and device log verbosity.

To set/get the trace file path:

    doca_error_t doca_dpa_trace_file_set_path(struct doca_dpa *dpa, const char *file_path)
    doca_error_t doca_dpa_trace_file_get_path(struct doca_dpa *dpa, char *file_path, uint32_t *file_path_len)

To set/get the log file path:

    doca_error_t doca_dpa_log_file_set_path(struct doca_dpa *dpa, const char *file_path)
    doca_error_t doca_dpa_log_file_get_path(struct doca_dpa *dpa, char *file_path, uint32_t *file_path_len)

To set/get the device log verbosity:

    doca_error_t doca_dpa_set_log_level(struct doca_dpa *dpa, doca_dpa_dev_log_level_t log_level)
    doca_error_t doca_dpa_get_log_level(struct doca_dpa *dpa, doca_dpa_dev_log_level_t *log_level)

A DPA context can enter an error state caused by the device flow. The application can check this error state by calling the following host API:

    doca_error_t doca_dpa_peek_at_last_error(const struct doca_dpa *dpa)

If a fatal error core dump and crash occur, data is written to the file path /tmp/doca_dpa_fatal or to the file path set by the API doca_dpa_log_file_set_path() , with the suffixes .PID.core and .PID.crash respectively, where PID is the process ID. The data written to the file would include a memory snapshot at the time of the crash, which would contain information instrumental in pinpointing the cause of a crash (e.g., the program's state, variable values, and the call stack).
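To locate these files from a monitoring script or a test harness, the names can be rebuilt from the description above. The sketch below assumes the suffix is appended directly to the configured path, giving names of the form path.PID.core and path.PID.crash; fatal_file_name() is a hypothetical helper and only formats a string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEFAULT_FATAL_PATH "/tmp/doca_dpa_fatal"   /* default quoted in the text above */

/*
 * Hypothetical helper: writes "<base_path>.<pid>.<kind>" into out.
 * kind is "core" or "crash"; a NULL base_path selects the default path.
 * Returns the snprintf() result (length the full name needs, excluding the terminator).
 */
int fatal_file_name(char *out, size_t out_len, const char *base_path, long pid, const char *kind)
{
    assert(out != NULL && kind != NULL);
    if (base_path == NULL)
        base_path = DEFAULT_FATAL_PATH;
    return snprintf(out, out_len, "%s.%ld.%s", base_path, pid, kind);
}
```

For example, with the default path and a process ID of 1234, the helper produces /tmp/doca_dpa_fatal.1234.core and /tmp/doca_dpa_fatal.1234.crash.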

Info Creating core dump files can be done after the DPA application has crashed.

Info This call does not reset the error state.

Note If an error occurred, DPA context enters a fatal state and must be destroyed by the user.

DOCA DPA device library offers an interface for common utilities such as:

Managing DPA thread

Managing DPA completion context

Managing DPA hash table

Log and trace

Managing DPA sync event

Managing DPA DOCA buf and mmap

DPA device library and header files:

Thread restart APIs

A DPA thread can end its run using one of the following device APIs:

Reschedule API:

    void doca_dpa_dev_thread_reschedule(void)

The DPA thread is still active.
The DPA thread resources are returned to the RTOS.
The DPA thread can be triggered again.

Finish API:

    void doca_dpa_dev_thread_finish(void)

The DPA thread is marked as finished.
The DPA thread resources are returned to the RTOS.
The DPA thread cannot be triggered again.

To get TLS:

    doca_dpa_dev_uintptr_t doca_dpa_dev_thread_get_local_storage(void)

This function returns the DPA thread local storage previously set using the host API doca_dpa_thread_set_local_storage() .

Note DPA thread device APIs cannot be used in the user's written kernel which is used in the DOCA DPA kernel launch API.





Kernels get doca_dpa_dev_completion_t handle and invoke the following API:

To get a completion element:

int doca_dpa_dev_get_completion(doca_dpa_dev_completion_t dpa_comp_handle, doca_dpa_dev_completion_element_t *comp_element)

Use the returned comp_element to retrieve completion info using the APIs below.

To get completion element type:

typedef enum {
    DOCA_DPA_DEV_COMP_SEND = 0x0,                /**< Send completion */
    DOCA_DPA_DEV_COMP_RECV_RDMA_WRITE_IMM = 0x1, /**< Receive RDMA Write with Immediate completion */
    DOCA_DPA_DEV_COMP_RECV_SEND = 0x2,           /**< Receive Send completion */
    DOCA_DPA_DEV_COMP_RECV_SEND_IMM = 0x3,       /**< Receive Send with Immediate completion */
    DOCA_DPA_DEV_COMP_SEND_ERR = 0xD,            /**< Send Error completion */
    DOCA_DPA_DEV_COMP_RECV_ERR = 0xE             /**< Receive Error completion */
} doca_dpa_dev_completion_type_t;

doca_dpa_dev_completion_type_t doca_dpa_dev_get_completion_type(doca_dpa_dev_completion_element_t comp_element)

To get completion element user data:

uint32_t doca_dpa_dev_get_completion_user_data(doca_dpa_dev_completion_element_t comp_element)

This API returns user data, which is one of the following:

The value set previously in the host API doca_dpa_async_ops_create(..., user_data, ...), when the DPA completion context is attached to DPA async ops

The connection_id returned by doca_error_t doca_rdma_connection_get_id(const struct doca_rdma_connection *rdma_connection, uint32_t *connection_id), when the DPA completion context is attached to a DOCA RDMA context

To get completion element immediate data:

uint32_t doca_dpa_dev_get_completion_immediate(doca_dpa_dev_completion_element_t comp_element)

This API returns immediate data for a completion element of one of the following types:

DOCA_DPA_DEV_COMP_RECV_RDMA_WRITE_IMM

DOCA_DPA_DEV_COMP_RECV_SEND_IMM
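The type-dependent rules above can be captured in two small predicates. The sketch below is a host-compilable illustration only: the enum is a local mirror of the completion types listed above (so the code builds without the DOCA device headers), and the helper names comp_has_immediate and comp_is_error are hypothetical.

```c
#include <stdbool.h>

/* Local mirror of the completion types listed above. In a real kernel,
 * use doca_dpa_dev_completion_type_t from the DOCA headers instead. */
typedef enum {
    COMP_SEND = 0x0,
    COMP_RECV_RDMA_WRITE_IMM = 0x1,
    COMP_RECV_SEND = 0x2,
    COMP_RECV_SEND_IMM = 0x3,
    COMP_SEND_ERR = 0xD,
    COMP_RECV_ERR = 0xE
} comp_type_t;

/* True only for the two completion types that carry immediate data. */
static bool comp_has_immediate(comp_type_t type)
{
    return type == COMP_RECV_RDMA_WRITE_IMM || type == COMP_RECV_SEND_IMM;
}

/* True for the two error completion types. */
static bool comp_is_error(comp_type_t type)
{
    return type == COMP_SEND_ERR || type == COMP_RECV_ERR;
}
```

A kernel could use a check like comp_has_immediate() before reading the immediate value, so that the immediate is only read for the completion types documented to provide it.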

To acknowledge that the completions have been read on the DPA completion context:

void doca_dpa_dev_completion_ack(doca_dpa_dev_completion_t dpa_comp_handle, uint64_t num_comp)

This API releases the resources of the acked completion elements in the completion context. This acknowledgment enables receiving num_comp new completions.

To request notification on the DPA completion context:

void doca_dpa_dev_completion_request_notification(doca_dpa_dev_completion_t dpa_comp_handle)

This API enables requesting new notifications on the DPA completion context. If this function is not called, the DPA completion context is not notified of newly arrived completion elements. Therefore, new completions are not populated in the DPA completion context.

Example (device-side pseudo code):

__dpa_global__ void hello_kernel(uint64_t arg)
{
    /* Pseudo code: dpa_comp_handle is assumed to be made available to the kernel (e.g., through arg) */
    doca_dpa_dev_completion_element_t comp_element;
    doca_dpa_dev_completion_type_t comp_type;
    int found;

    DOCA_DPA_DEV_LOG_INFO("Hello from kernel\n");

    found = doca_dpa_dev_get_completion(dpa_comp_handle, &comp_element);
    if (found) {
        comp_type = doca_dpa_dev_get_completion_type(comp_element);
        doca_dpa_dev_completion_ack(dpa_comp_handle, 1);
        doca_dpa_dev_completion_request_notification(dpa_comp_handle);
    }

    doca_dpa_dev_thread_reschedule();
}





Kernels get doca_dpa_dev_notification_completion_t handle and invoke the following API:

void doca_dpa_dev_thread_notify(doca_dpa_dev_notification_completion_t comp_handle)

Calling this API triggers the attached DPA Thread (the one that is specified in dpa_thread parameter in host-side API doca_dpa_notification_completion_create() ).

Kernels get the doca_dpa_dev_async_ops_t handle and invoke the APIs listed below.

Users may control work request submit configuration using the following enum:

/**
 * @brief DPA submit flag type
 */
__dpa_global__ enum doca_dpa_dev_submit_flag {
    DOCA_DPA_DEV_SUBMIT_FLAG_NONE = 0U,
    DOCA_DPA_DEV_SUBMIT_FLAG_FLUSH = (1U << 0),
    /**
     * Use flag to inform related DPA context
     * (such as RDMA or DPA Async ops) to flush related
     * operation and previous operations to HW,
     * otherwise the context may aggregate the operation
     * and not flush it immediately
     */
    DOCA_DPA_DEV_SUBMIT_FLAG_OPTIMIZE_REPORTS = (1U << 1),
    /**
     * Use flag to inform related DPA context that it may
     * defer completion of the operation to a later time. If
     * flag is not provided then a completion will be raised
     * as soon as the operation is finished, and any
     * preceding completions that were deferred will also be
     * raised. Use this flag to optimize the amount of
     * completion notifications it receives from HW when
     * submitting a batch of operations, by receiving a
     * single completion notification on the entire batch.
     */
};

This enum can be used in the flags parameter in the following device APIs:

To post a memcpy operation using doca_buf:

void doca_dpa_dev_post_buf_memcpy(doca_dpa_dev_async_ops_t async_ops_handle, doca_dpa_dev_buf_t dst_buf_handle, doca_dpa_dev_buf_t src_buf_handle, uint32_t flags)

This API copies data between two DOCA buffers. The destination buffer, specified by dst_buf_handle, contains the copied data after the memory copy is complete. This is a non-blocking routine.

To post a memcpy operation using doca_mmap and an explicit address:

void doca_dpa_dev_post_memcpy(doca_dpa_dev_async_ops_t async_ops_handle, doca_dpa_dev_mmap_t dst_mmap_handle, uint64_t dst_addr, doca_dpa_dev_mmap_t src_mmap_handle, uint64_t src_addr, size_t length, uint32_t flags)

This API copies data between two DOCA Mmaps. After the memory copy is complete, the destination region (dst_addr in the DOCA Mmap specified by dst_mmap_handle) contains the data copied from the source region specified by src_mmap_handle, src_addr, and length. This is a non-blocking routine.

Info: Use this API for memcpy instead of the doca_buf memcpy API to gain better performance.

To post a wait greater operation on a DOCA Sync Event:

void doca_dpa_dev_sync_event_post_wait_gt(doca_dpa_dev_async_ops_t async_ops_handle, doca_dpa_dev_sync_event_t wait_se_handle, uint64_t value)

This function posts a wait operation on the DOCA Sync Event using DPA async ops to obtain DPA thread activation. The attached thread is activated when the value of the DOCA Sync Event is greater than the given value. This is a non-blocking routine.

Note: The given value must be in the range [0, 254], and this API can be called for an event whose value is in the range [0, 254]. Invalid values lead to anomalous behavior.
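Because out-of-range values are documented to cause anomalous behavior, a kernel may want to validate the threshold before posting the wait. The following is a minimal sketch of such a check; the macro and function names are hypothetical and not part of the DOCA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the documented [0, 254] range for the wait-greater value. */
#define WAIT_GT_MAX_VALUE 254u

/* Hypothetical guard: returns true if `value` may be passed as the
 * threshold of a wait-greater operation according to the note above. */
static bool wait_gt_value_is_valid(uint64_t value)
{
    return value <= WAIT_GT_MAX_VALUE;
}
```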

To post a wait not equal operation on a DOCA Sync Event:

void doca_dpa_dev_sync_event_post_wait_ne(doca_dpa_dev_async_ops_t async_ops_handle, doca_dpa_dev_sync_event_t wait_se_handle, uint64_t value)

This function posts a wait operation on the DOCA Sync Event using the DPA async ops to obtain DPA thread activation. The attached thread is activated when the value of the DOCA Sync Event is not equal to the given value. This is a non-blocking routine.

Memory APIs supplied by the DOCA DPA SDK are all asynchronous (i.e., non-blocking).

The user can acquire either:

Pre-configured DOCA Buffers (previously configured with doca_buf_arr_set_params)

Non-configured DOCA Buffers, which are then configured using the device setters below

Device-side API operations:

To obtain a single buffer handle from the buf array handle:

doca_dpa_dev_buf_t doca_dpa_dev_buf_array_get_buf(doca_dpa_dev_buf_arr_t buf_arr, const uint64_t buf_idx)

To set/get the address pointed to by the buffer handle:

void doca_dpa_dev_buf_set_addr(doca_dpa_dev_buf_t buf, uintptr_t addr)
uintptr_t doca_dpa_dev_buf_get_addr(doca_dpa_dev_buf_t buf)

To set/get the length of the buffer:

void doca_dpa_dev_buf_set_len(doca_dpa_dev_buf_t buf, size_t len)
uint64_t doca_dpa_dev_buf_get_len(doca_dpa_dev_buf_t buf)

To set/get the DOCA Mmap associated with the buffer:

void doca_dpa_dev_buf_set_mmap(doca_dpa_dev_buf_t buf, doca_dpa_dev_mmap_t mmap)
doca_dpa_dev_mmap_t doca_dpa_dev_buf_get_mmap(doca_dpa_dev_buf_t buf)

To get a pointer to external memory registered on the host using a DOCA Buffer:

doca_dpa_dev_uintptr_t doca_dpa_dev_buf_get_external_ptr(doca_dpa_dev_buf_t buf)

Info: After calling this API, users can read/write the memory indicated by the returned pointer from the DPA kernel.

To get a pointer to external memory registered on the host using an explicit address and a DOCA Mmap:

doca_dpa_dev_uintptr_t doca_dpa_dev_mmap_get_external_ptr(doca_dpa_dev_mmap_t mmap_handle, uint64_t addr)

Info: After calling this API, users can read/write the memory indicated by the returned pointer from the DPA kernel.

Sync events fulfill the following roles:

DOCA DPA execution model is asynchronous and sync events are used to control various threads running in the system (allowing order and dependency)

DOCA DPA supports remote sync events, so the programmer is capable of invoking remote nodes by means of DOCA sync events

Info: For host-side APIs, refer to "DOCA Sync Event".

To get the current event value:

doca_dpa_dev_sync_event_get(doca_dpa_dev_sync_event_t event, uint64_t *value)

To add/set to the current event value:

doca_dpa_dev_sync_event_update_<add|set>(doca_dpa_dev_sync_event_t event, uint64_t value)

To wait until the event is greater than a threshold:

doca_dpa_dev_sync_event_wait_gt(doca_dpa_dev_sync_event_t event, uint64_t value, uint64_t mask)

Use mask to apply a bitwise AND on the DOCA Sync Event value for comparison with the wait threshold.
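To make the role of mask concrete, the sketch below models the comparison on the host. It assumes the wait is satisfied once the masked event value is strictly greater than the threshold, which is how the description above reads; the function masked_gt_satisfied is a hypothetical model, not a DOCA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model (assumption): the wait condition holds when
 * (event_value & mask) is strictly greater than the threshold. */
static bool masked_gt_satisfied(uint64_t event_value, uint64_t mask,
                                uint64_t threshold)
{
    return (event_value & mask) > threshold;
}
```

For example, with an event value of 0x105 and a mask of 0xFF, only the low byte (0x05) takes part in the comparison, so a threshold of 4 is exceeded while a threshold of 5 is not.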

To work with a DPA resource created on an extended DOCA DPA context, DOCA DPA offers the following device API:

void doca_dpa_dev_device_set(doca_dpa_dev_t dpa_handle)

This function must be called before calling any related device API of a DPA resource (e.g., DPA RDMA) created on an extended DOCA DPA context.

Note:

When creating only a DPA base context without an extended DPA context in this application, there is no need to call this API

When creating DPA resources on both base and extended DPA contexts, the proper DPA device (and DPA context) must be set before calling the relevant DPA resource API

Example (device-side pseudo code):

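__dpa_global__ void kernel(uint64_t thread_arg)
{
    /* Pseudo code: thread_args is a hypothetical user-defined struct passed through thread_arg */
    struct thread_args *args = (struct thread_args *)thread_arg;

    doca_dpa_dev_device_set(args->extended_dpa_ctx_handle);
    doca_dpa_dev_rdma_post_send(rdma_dpa_handle);
    ...
}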





This data structure is managed on DPA using the following device APIs:

To add a new entry to the hash table:

void doca_dpa_dev_hash_table_add(doca_dpa_dev_hash_table_t ht_handle, uint32_t key, uint64_t value)

Note: Adding a new key when the hash table is full causes anomalous behavior.

To remove an entry from the hash table:

void doca_dpa_dev_hash_table_remove(doca_dpa_dev_hash_table_t ht_handle, uint32_t key)

To return the value to which the specified key is mapped in the hash table:

int doca_dpa_dev_hash_table_find(doca_dpa_dev_hash_table_t ht_handle, uint32_t key, uint64_t *value)
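Since adding a key to a full table is documented to cause anomalous behavior, one option is for the application to track occupancy itself. The sketch below shows such bookkeeping under the assumption that the application knows the capacity the table was created with; the struct and function names are hypothetical and nothing here is a DOCA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping kept by the application next to a hash table handle. */
struct ht_occupancy {
    uint32_t capacity; /* number of entries the table was created with */
    uint32_t used;     /* entries the application believes are present */
};

/* Call before adding a key that is not yet in the table.
 * Returns false (and reserves nothing) when the table is already full. */
static bool ht_reserve_slot(struct ht_occupancy *occ)
{
    if (occ->used >= occ->capacity)
        return false;
    occ->used++;
    return true;
}

/* Call after removing a key that was present in the table. */
static void ht_release_slot(struct ht_occupancy *occ)
{
    if (occ->used > 0)
        occ->used--;
}
```

This counter is not synchronized, so it only fits cases where a single DPA thread updates the table, or where the caller provides its own synchronization.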

Log to host:

typedef enum doca_dpa_dev_log_level {
    DOCA_DPA_DEV_LOG_LEVEL_DISABLE = 10, /**< Disable log messages */
    DOCA_DPA_DEV_LOG_LEVEL_CRIT = 20,    /**< Critical log level */
    DOCA_DPA_DEV_LOG_LEVEL_ERROR = 30,   /**< Error log level */
    DOCA_DPA_DEV_LOG_LEVEL_WARNING = 40, /**< Warning log level */
    DOCA_DPA_DEV_LOG_LEVEL_INFO = 50,    /**< Info log level */
    DOCA_DPA_DEV_LOG_LEVEL_DEBUG = 60,   /**< Debug log level */
} doca_dpa_dev_log_level_t;

void doca_dpa_dev_log(doca_dpa_dev_log_level_t log_level, const char *format, ...)

Log macros:

DOCA_DPA_DEV_LOG_CRIT(...)
DOCA_DPA_DEV_LOG_ERR(...)
DOCA_DPA_DEV_LOG_WARN(...)
DOCA_DPA_DEV_LOG_INFO(...)
DOCA_DPA_DEV_LOG_DBG(...)

To create a trace message entry with arguments:

void doca_dpa_dev_trace(uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5)

To flush the trace message buffer to host:

void doca_dpa_dev_trace_flush(void)
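The numeric ordering of the log levels above suggests a threshold-style filter. The sketch below models one plausible rule: a message is emitted when its level is at or below the configured verbosity. This rule is an assumption based on common logging conventions, since the text above does not state the exact filtering behavior; the enum is a local mirror of the listed values and log_would_emit is a hypothetical name.

```c
#include <stdbool.h>

/* Local mirror of the device log levels listed above. */
typedef enum {
    LOG_LEVEL_DISABLE = 10,
    LOG_LEVEL_CRIT = 20,
    LOG_LEVEL_ERROR = 30,
    LOG_LEVEL_WARNING = 40,
    LOG_LEVEL_INFO = 50,
    LOG_LEVEL_DEBUG = 60
} log_level_t;

/* Assumed rule (not confirmed by the source): emit a message when its
 * level is at or below the configured verbosity. DISABLE is treated as a
 * verbosity setting only, never as a valid message level. */
static bool log_would_emit(log_level_t configured, log_level_t message)
{
    if (message == LOG_LEVEL_DISABLE)
        return false;
    return message <= configured;
}
```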

DOCA DPA device communication library offers an interface for communication utilities such as RDMA.

DPA Device Communication Library and Header Files

DOCA DPA communication primitives allow sending data from one node to another.

The object used for the communication between nodes is called an RDMA DPA handle. RDMA DPA handles can be used by kernels only.

RDMAs represent a unidirectional communication pipe between two nodes.

RDMA DPA handles are created when setting a DOCA RDMA context to DPA data path. For more information, please refer to DOCA RDMA.

To track the completion of all communications, the user can attach DOCA RDMA context to a DPA Completion Context.

DPA Completion Context can be associated with a DPA Thread. When the completion context receives a completion on a communication operation, DPA Thread is triggered.

Info: The user can choose not to associate it with a DPA Thread and to poll it manually.

Note: When running from BlueField and creating DOCA RDMA using an SF DOCA device, or when creating a DOCA RDMA context on DPA, an extended DPA context (created on the same SF DOCA device) must be used in the doca_ctx_set_datapath_on_dpa() API.

Relevant host-side APIs:

To create a DOCA RDMA context on DPA, the user must use the following API for the DOCA RDMA context:

doca_error_t doca_ctx_set_datapath_on_dpa(struct doca_ctx *ctx, struct doca_dpa *dpa)

To attach a DOCA RDMA context to a DPA completion context:

doca_error_t doca_rdma_dpa_completion_attach(struct doca_rdma *rdma, struct doca_dpa_completion *dpa_comp)

To obtain a DPA RDMA handle and a connection ID:

doca_error_t doca_rdma_get_dpa_handle(struct doca_rdma *rdma, doca_dpa_dev_rdma_t *dpa_rdma)
doca_error_t doca_rdma_connection_get_id(const struct doca_rdma_connection *rdma_connection, uint32_t *connection_id)

Use the output parameters dpa_rdma and connection_id in the relevant device APIs in the thread kernel.

Note: DPA RDMAs are not thread safe and, therefore, must not be used from different kernels/threads concurrently.

DOCA DPA offers two work models for each device RDMA operation:

An API for RDMA operation using DOCA buffer

An API for RDMA operation using DOCA mmap and an explicit memory address

The user may control "work request submit" configuration using the following enum:

/**
 * @brief DPA submit flag type
 */
__dpa_global__ enum doca_dpa_dev_submit_flag {
    DOCA_DPA_DEV_SUBMIT_FLAG_NONE = 0U,
    DOCA_DPA_DEV_SUBMIT_FLAG_FLUSH = (1U << 0),
    /**
     * Use flag to inform related DPA context
     * (such as RDMA or DPA Async ops) to flush related
     * operation and previous operations to HW,
     * otherwise the context may aggregate the operation
     * and not flush it immediately
     */
    DOCA_DPA_DEV_SUBMIT_FLAG_OPTIMIZE_REPORTS = (1U << 1),
    /**
     * Use flag to inform related DPA context that it may
     * defer completion of the operation to a later time. If
     * flag is not provided then a completion will be raised
     * as soon as the operation is finished, and any
     * preceding completions that were deferred will also be
     * raised. Use this flag to optimize the amount of
     * completion notifications it receives from HW when
     * submitting a batch of operations, by receiving a
     * single completion notification on the entire batch.
     */
};

This enum can be used in the flags parameter in the below device APIs.

To aggregate multiple operations and flush them at once to hardware using the DOCA_DPA_DEV_SUBMIT_FLAG_FLUSH flag:

doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_NONE);
doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_NONE);
doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_FLUSH);

To defer completion of an operation to a later time using the DOCA_DPA_DEV_SUBMIT_FLAG_OPTIMIZE_REPORTS flag:

doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_OPTIMIZE_REPORTS);
doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_OPTIMIZE_REPORTS);
doca_dpa_dev_rdma_post_write(..., DOCA_DPA_DEV_SUBMIT_FLAG_FLUSH);
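When a batch is posted from a loop, the two patterns above reduce to choosing the flags by position in the batch. The sketch below shows one way to express that choice. The macros are local mirrors of the flag values shown earlier, and batch_submit_flags is a hypothetical helper, not a DOCA API.

```c
#include <stdint.h>

/* Local mirrors of the submit flag values shown above. */
#define SUBMIT_FLAG_NONE             0u
#define SUBMIT_FLAG_FLUSH            (1u << 0)
#define SUBMIT_FLAG_OPTIMIZE_REPORTS (1u << 1)

/* Flags for operation `i` (0-based) in a batch of `n` operations.
 * Every operation except the last is left for aggregation and, if
 * `defer_completions` is non-zero, also has its completion deferred.
 * The last operation flushes the batch; because it does not carry
 * OPTIMIZE_REPORTS, it is the one expected to raise a completion. */
static uint32_t batch_submit_flags(uint32_t i, uint32_t n, int defer_completions)
{
    if (i + 1 == n)
        return SUBMIT_FLAG_FLUSH;
    return defer_completions ? SUBMIT_FLAG_OPTIMIZE_REPORTS : SUBMIT_FLAG_NONE;
}
```

In a kernel, the returned value would be passed as the flags argument of each post call inside the loop.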

To read from the remote side buffer to a local buffer:

void doca_dpa_dev_rdma_post_read(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t dst_mmap_handle, uint64_t dst_addr, doca_dpa_dev_mmap_t src_mmap_handle, uint64_t src_addr, size_t length, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_read(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t dst_buf_handle, doca_dpa_dev_buf_t src_buf_handle, uint32_t flags)

To write local memory to the remote side buffer:

void doca_dpa_dev_rdma_post_write(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t dst_mmap_handle, uint64_t dst_addr, doca_dpa_dev_mmap_t src_mmap_handle, uint64_t src_addr, size_t length, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_write(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t dst_buf_handle, doca_dpa_dev_buf_t src_buf_handle, uint32_t flags)

To write local memory to the remote side buffer with immediate data, which can be retrieved when receiving a completion on this operation:

void doca_dpa_dev_rdma_post_write_imm(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t dst_mmap_handle, uint64_t dst_addr, doca_dpa_dev_mmap_t src_mmap_handle, uint64_t src_addr, size_t length, uint32_t immediate, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_write_imm(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t dst_buf_handle, doca_dpa_dev_buf_t src_buf_handle, uint32_t immediate, uint32_t flags)
uint32_t doca_dpa_dev_get_completion_immediate(doca_dpa_dev_completion_element_t comp_element)

To send local memory:

void doca_dpa_dev_rdma_post_send(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t mmap_handle, uint64_t addr, size_t length, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_send(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t send_buf_handle, uint32_t flags)

To send local memory with immediate data, which can be retrieved when receiving a completion on this operation:

void doca_dpa_dev_rdma_post_send_imm(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t mmap_handle, uint64_t addr, size_t length, uint32_t immediate, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_send_imm(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t send_buf_handle, uint32_t immediate, uint32_t flags)
uint32_t doca_dpa_dev_get_completion_immediate(doca_dpa_dev_completion_element_t comp_element)

To handle posting an RDMA receive operation, use the following APIs:

To post an RDMA receive operation:

void doca_dpa_dev_rdma_post_receive(doca_dpa_dev_rdma_t rdma, doca_dpa_dev_mmap_t mmap_handle, uint64_t addr, size_t length)
void doca_dpa_dev_rdma_post_buf_receive(doca_dpa_dev_rdma_t rdma, doca_dpa_dev_buf_t receive_buf_handle)

To acknowledge that post receive operations are done (data has been received on the associated data buffers):

void doca_dpa_dev_rdma_receive_ack(doca_dpa_dev_rdma_t rdma, uint32_t num_acked)

This operation frees num_acked entries in the DOCA RDMA context.

To perform an atomic add operation on the remote side buffer:

void doca_dpa_dev_rdma_post_atomic_fetch_add(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_mmap_t dst_mmap_handle, uint64_t dst_addr, uint64_t value, uint32_t flags)
void doca_dpa_dev_rdma_post_buf_atomic_fetch_add(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_buf_t dst_buf_handle, uint64_t value, uint32_t flags)

To signal a remote event:

doca_dpa_dev_rdma_signal_<add|set>(doca_dpa_dev_rdma_t rdma, uint32_t connection_id, doca_dpa_dev_sync_event_remote_t remote_sync_event, uint64_t count)

To support multiple DOCA RDMA connection management in DPA, DOCA offers the following APIs: