This section discusses how to use the DOCA DMA library in real-world situations. Most of it relies on code from the DOCA DMA sample project located under /samples/doca_dma/dma_local_copy.

A DMA operation is referred to as local when the memory is local to your DOCA application (i.e., you can directly access the memory space of both the source and destination buffers).

The following step-by-step guide goes through the stages required to initialize, execute, and clean up a local memory DMA operation.



The DMA API uses the DOCA core library to create the required objects (memory map, inventory, buffers, etc.) for the DMA operations. This section runs through this process in a logical order. If you already have some of these operations in your DOCA application, you may skip or modify them as needed.



The first requirement is to open a DOCA device, normally your BlueField controller. Iterate over all DOCA devices (via doca_devinfo_list_create()), select one by some criterion (e.g., PCIe address), then open it (via doca_dev_open()). More information that may help in choosing a device can be found in the Device Support section. Once the desired device is opened, the list can be destroyed immediately (via doca_devinfo_list_destroy()). This frees the resources of all devices other than the one that was opened.
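
The selection step can be sketched in isolation. The following is a minimal, self-contained C model and not DOCA code: sim_devinfo and sim_select_by_pci() are hypothetical stand-ins for a device list entry and for the matching criterion. A real program would obtain the list via doca_devinfo_list_create() and open the match via doca_dev_open().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a device list; not a DOCA type. */
struct sim_devinfo {
    const char *pci_addr; /* e.g. "03:00.0" */
};

/*
 * Scan the list and return the index of the first device whose PCIe
 * address equals `wanted`, or -1 if no device matches.
 */
int sim_select_by_pci(const struct sim_devinfo *list, size_t count,
                      const char *wanted)
{
    assert(list != NULL && wanted != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(list[i].pci_addr, wanted) == 0)
            return (int)i;
    }
    return -1;
}
```

The caller would open the device at the returned index and only then destroy the list, treating -1 as "no suitable device found".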

DOCA DMA requires several DOCA objects to be created: the memory map (doca_mmap_create()), the buffer inventory (doca_buf_inventory_create()), and the work queue (doca_workq_create()). DOCA DMA also requires the actual DOCA DMA context to be created (doca_dma_create()).

Once a DMA instance is created, it can be used as a context (using doca_ctx APIs). This can be achieved by getting a context representation using doca_dma_as_ctx() .

In this phase of initialization, the core objects are ready to be set up and started.



The memory map is used to define the memory region where data is copied to or from. See the NVIDIA DOCA Core Programming Guide for more details about the memory subsystem.

Consider the case where the source data and destination data reside in two different memory ranges which are not necessarily contiguous. For that purpose, two doca_mmaps must be created: a source mmap and a destination mmap.

The initialization of both mmaps is similar:

1. Set the source or destination memory range using doca_mmap_set_memrange().
2. Add the doca_device that was opened earlier (it must be the same device used for the DMA context initialization later) using doca_mmap_dev_add().
3. Set the permissions of the mmap. In this case, set the minimum viable permissions as follows:
   - The source mmap is only used for reading data (DOCA_ACCESS_LOCAL_READ_ONLY)
   - The destination mmap is used for writing data (DOCA_ACCESS_LOCAL_READ_WRITE)
4. Start the mmap using doca_mmap_start(). Once this is done, the mmap cannot be configured any further.
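
The "configure, then start, then frozen" lifecycle can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not DOCA code: sim_mmap, the sim_mmap_*() functions, and the SIM_* flags are hypothetical stand-ins, and the rule that starting requires a memory range and a device is a modeling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags mirroring the two permissions named in the text. */
enum sim_access { SIM_READ_ONLY, SIM_READ_WRITE };

/* Hypothetical stand-in for a memory map; not the DOCA doca_mmap type. */
struct sim_mmap {
    void *addr;
    size_t len;
    bool has_dev;
    enum sim_access access;
    bool started;
};

/* Every setter refuses to run once the map has been started. */
int sim_mmap_set_memrange(struct sim_mmap *m, void *addr, size_t len)
{
    assert(m != NULL);
    if (m->started || addr == NULL || len == 0)
        return -1;
    m->addr = addr;
    m->len = len;
    return 0;
}

int sim_mmap_dev_add(struct sim_mmap *m)
{
    assert(m != NULL);
    if (m->started)
        return -1;
    m->has_dev = true;
    return 0;
}

int sim_mmap_set_permissions(struct sim_mmap *m, enum sim_access access)
{
    assert(m != NULL);
    if (m->started)
        return -1;
    m->access = access;
    return 0;
}

/* Assumption of this model: starting needs a memory range and a device. */
int sim_mmap_start(struct sim_mmap *m)
{
    assert(m != NULL);
    if (m->started || m->addr == NULL || !m->has_dev)
        return -1;
    m->started = true;
    return 0;
}

/* Only a read-write map may serve as a copy destination. */
bool sim_mmap_can_write(const struct sim_mmap *m)
{
    return m->started && m->access == SIM_READ_WRITE;
}
```

In this model, a source map is configured with SIM_READ_ONLY and a destination map with SIM_READ_WRITE; any setter called after sim_mmap_start() returns -1.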

The inventory is used to allocate two doca_bufs: one for the source and another for the destination. Unlike with mmap, it is enough to allocate a single inventory to hold them both.

The buffer inventory is initialized as follows:

1. Specify that the buffer inventory must accommodate two buffers when creating it using doca_buf_inventory_create().
2. Start the inventory using doca_buf_inventory_start().

The DMA context must be created and prepared to start receiving jobs:

1. Create the DMA context (doca_dma_create()).
2. (Optional) Verify that DMA is supported (doca_dma_job_get_supported()).
3. Get a doca_ctx representation of the DMA context (doca_dma_as_ctx()).
4. Add a device to the context (doca_ctx_dev_add()). This must be the same device that was added to the mmap.
5. Start the context (doca_ctx_start()). After this step, the context can no longer be configured.
6. Add a WorkQ to the context (doca_ctx_workq_add()). This allows submission of DMA jobs to that WorkQ.
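
The ordering constraints in these steps (device before start, WorkQ after start) can be captured in a small self-contained model. This is a sketch, not DOCA code: sim_ctx and the sim_ctx_*() functions are hypothetical stand-ins that only track state and return -1 when a step is attempted out of order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the DMA context; not a DOCA type. */
struct sim_ctx {
    bool has_dev;
    bool started;
    bool has_workq;
};

/* A device can only be added while the context is still configurable. */
int sim_ctx_dev_add(struct sim_ctx *c)
{
    assert(c != NULL);
    if (c->started)
        return -1;
    c->has_dev = true;
    return 0;
}

/* Starting requires a device and freezes the configuration. */
int sim_ctx_start(struct sim_ctx *c)
{
    assert(c != NULL);
    if (c->started || !c->has_dev)
        return -1;
    c->started = true;
    return 0;
}

/* In this model a work queue is attached only after the context started. */
int sim_ctx_workq_add(struct sim_ctx *c)
{
    assert(c != NULL);
    if (!c->started || c->has_workq)
        return -1;
    c->has_workq = true;
    return 0;
}

/* Jobs can be submitted once the context is started and has a work queue. */
bool sim_ctx_can_submit(const struct sim_ctx *c)
{
    assert(c != NULL);
    return c->started && c->has_workq;
}
```

Calling the functions in the documented order makes sim_ctx_can_submit() return true; any other order fails at the first misplaced step.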

Prior to building and submitting a DOCA DMA operation, you must construct two DOCA buffers for the source and destination addresses (the addresses used must exist within one of the memory regions populated in the memory map). The doca_buf_inventory_buf_by_data() call returns a doca_buf with the data pointer and data length set. Alternatively, it is possible to first allocate the buffer using doca_buf_inventory_buf_by_addr() and then select only a segment within the buffer for the DMA operation by using the doca_buf_set_data() API to set the data pointer and length.

These are the buffers supplied to the DMA operation. The source buffer determines the length to copy, while the destination buffer must be long enough to hold the data.
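
The relationship between a buffer, its data segment, and the copy length can be shown with a self-contained model. This is a sketch under stated assumptions, not DOCA code: sim_buf, sim_buf_by_addr(), sim_buf_set_data(), and sim_copy() are hypothetical stand-ins, and writing at the start of the destination buffer is a simplification of this model rather than a statement about the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffer descriptor; not the DOCA doca_buf type. */
struct sim_buf {
    char *addr;      /* start of the underlying buffer */
    size_t len;      /* size of the underlying buffer */
    char *data;      /* start of the data segment */
    size_t data_len; /* length of the data segment */
};

/* Describe a buffer by address only; the data segment starts out empty. */
struct sim_buf sim_buf_by_addr(char *addr, size_t len)
{
    struct sim_buf b = { addr, len, addr, 0 };
    return b;
}

/* Set the data segment; it must lie entirely inside the buffer. */
int sim_buf_set_data(struct sim_buf *b, char *data, size_t data_len)
{
    assert(b != NULL);
    uintptr_t base = (uintptr_t)b->addr;
    uintptr_t p = (uintptr_t)data;
    if (p < base || p - base > b->len)
        return -1;
    if (data_len > b->len - (size_t)(p - base))
        return -1;
    b->data = data;
    b->data_len = data_len;
    return 0;
}

/*
 * Copy the source data segment into the destination buffer. The source
 * data length decides how many bytes move; the destination must have at
 * least that much room, otherwise nothing is copied.
 */
int sim_copy(const struct sim_buf *src, struct sim_buf *dst)
{
    assert(src != NULL && dst != NULL);
    if (src->data_len > dst->len)
        return -1;
    memcpy(dst->addr, src->data, src->data_len);
    dst->data = dst->addr;
    dst->data_len = src->data_len;
    return 0;
}
```

Selecting a three-byte segment of a larger source buffer with sim_buf_set_data() and then calling sim_copy() moves exactly those three bytes; a source segment larger than the destination buffer is rejected.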

At this stage, there are two initialized mmaps and an inventory. It is now possible to use doca_buf_inventory_buf_by_data() to allocate the source and destination buffers for the job. Considerations for using this API:

- The inventory holds the doca_buf descriptors, so no memory is allocated in this stage.
- The mmap holds the information necessary for mapping memory to the device. The caller must provide the matching source/destination mmap.

As a result, doca_buf_inventory_buf_by_data() is considered non-resource intensive and can be done in the data path.
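
Why this allocation is cheap can be seen in a self-contained model of a descriptor pool. This is a sketch, not DOCA code: sim_range, sim_desc, sim_inventory, and the sim_inventory_*() functions are hypothetical stand-ins, and the capacity of two mirrors the two buffers used in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_INVENTORY_CAPACITY 2

/* Hypothetical memory map: only the registered range is modeled. */
struct sim_range {
    char *base;
    size_t len;
};

/* Hypothetical descriptor: points at caller-owned memory, owns none. */
struct sim_desc {
    char *data;
    size_t data_len;
    int in_use;
};

/* Hypothetical inventory: a fixed pool of descriptors created up front. */
struct sim_inventory {
    struct sim_desc slots[SIM_INVENTORY_CAPACITY];
};

void sim_inventory_init(struct sim_inventory *inv)
{
    assert(inv != NULL);
    for (size_t i = 0; i < SIM_INVENTORY_CAPACITY; i++) {
        inv->slots[i].data = NULL;
        inv->slots[i].data_len = 0;
        inv->slots[i].in_use = 0;
    }
}

/*
 * Hand out a descriptor for [data, data + data_len) if that span lies
 * inside the given range and a free slot exists. Nothing is allocated:
 * the call only fills in a descriptor that already exists in the pool.
 */
struct sim_desc *sim_inventory_buf_by_data(struct sim_inventory *inv,
                                           const struct sim_range *range,
                                           char *data, size_t data_len)
{
    assert(inv != NULL && range != NULL);
    uintptr_t base = (uintptr_t)range->base;
    uintptr_t p = (uintptr_t)data;
    if (p < base || p - base > range->len)
        return NULL; /* address outside the mapped range */
    if (data_len > range->len - (size_t)(p - base))
        return NULL; /* span runs past the end of the range */
    for (size_t i = 0; i < SIM_INVENTORY_CAPACITY; i++) {
        if (!inv->slots[i].in_use) {
            inv->slots[i].data = data;
            inv->slots[i].data_len = data_len;
            inv->slots[i].in_use = 1;
            return &inv->slots[i];
        }
    }
    return NULL; /* inventory exhausted */
}
```

The call amounts to a range check plus a slot lookup, which is why doing it per job in the data path is reasonable in this model.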

To begin the DMA operation, you must enqueue a DMA job on the previously created work queue object. This involves creating the DMA job (struct doca_dma_job_memcpy ) that is a composite of specific DMA fields.

Within the DMA job structure, the type field should be set to DOCA_DMA_JOB_MEMCPY with the context field pointing to your DMA context.

The DMA specific elements of the job point to your DOCA buffers for source and destination.

Finally, the doca_workq_submit() API call is used to submit the DMA operation to the hardware. Some errors may be detected immediately after submitting the job while others are only discovered midway through the job. For such cases, please refer to Completion Event Retrieval.

The DMA operation is asynchronous in nature. Therefore, you must enqueue the operation and then, later, poll for completion.

The DMA operation is not atomic: the host side can read memory that the DPU is accessing in parallel. The application must therefore add a synchronization mechanism so that data is not corrupted. For more details, refer to section "DOCA Sync Event" in the NVIDIA DOCA Core Programming Guide.

To detect when the DMA operation has completed, you should periodically poll the work queue (via doca_workq_progress_retrieve() ).

If the call returns a valid event, the doca_event type field should be tested before inspecting the result as other WorkQ operations (i.e., non-DMA operations) present their events differently. Refer to their respective guides for more information.
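
The submit-then-poll pattern, including the check of the event type before the result is inspected, can be sketched with a self-contained model. This is not DOCA code: every sim_* name is a hypothetical stand-in, the queue holds a single job, and completing after three empty polls is an arbitrary choice made to imitate asynchronous behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sim_status { SIM_SUCCESS, SIM_AGAIN, SIM_ERROR };
enum sim_job_type { SIM_JOB_NONE, SIM_JOB_MEMCPY };

/* Hypothetical job: a type plus source, destination, and length. */
struct sim_job {
    enum sim_job_type type;
    const void *src;
    void *dst;
    size_t len;
};

struct sim_event {
    enum sim_job_type type;
    enum sim_status result;
};

/* Hypothetical single-slot work queue. */
struct sim_workq {
    struct sim_job pending;
    int polls_left; /* empty polls remaining before the job completes */
    int busy;
};

void sim_workq_init(struct sim_workq *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/* Enqueue the job; the copy itself happens later, during polling. */
enum sim_status sim_submit(struct sim_workq *q, const struct sim_job *job)
{
    assert(q != NULL && job != NULL);
    if (q->busy || job->type != SIM_JOB_MEMCPY)
        return SIM_ERROR;
    q->pending = *job;
    q->polls_left = 3;
    q->busy = 1;
    return SIM_SUCCESS;
}

/* Return SIM_AGAIN until the job is done, then fill in the event. */
enum sim_status sim_progress_retrieve(struct sim_workq *q,
                                      struct sim_event *ev)
{
    assert(q != NULL && ev != NULL);
    if (!q->busy)
        return SIM_AGAIN;
    if (q->polls_left > 0) {
        q->polls_left--;
        return SIM_AGAIN;
    }
    memcpy(q->pending.dst, q->pending.src, q->pending.len);
    ev->type = q->pending.type;
    ev->result = SIM_SUCCESS;
    q->busy = 0;
    return SIM_SUCCESS;
}

/*
 * Poll at most max_polls times. Returns the number of polls it took to
 * get a successful memcpy event, or -1 on timeout or an unexpected event.
 */
int sim_wait_for_memcpy(struct sim_workq *q, struct sim_event *ev,
                        int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        enum sim_status st = sim_progress_retrieve(q, ev);
        if (st == SIM_AGAIN)
            continue;
        if (st != SIM_SUCCESS || ev->type != SIM_JOB_MEMCPY)
            return -1; /* test the event type before trusting the result */
        return ev->result == SIM_SUCCESS ? polls : -1;
    }
    return -1;
}
```

With the three empty polls chosen above, a submitted job completes on the fourth poll, and the destination is untouched until then. Bounding the loop with max_polls is a design choice of this sketch so that a job that never completes cannot stall the caller indefinitely.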

To clean up the doca_bufs, remove your reference to each of them using the doca_buf_refcount_rm() call. Make this call on both buffers when you are done with them (regardless of whether the operation succeeded). If the source buffer is a linked list, it is enough to remove the reference from the head; that effectively releases the entire list.
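
The "release the head, release the list" behavior can be modeled in a few lines. This is a sketch under stated assumptions, not DOCA code: sim_buf_node and sim_buf_refcount_rm() are hypothetical stand-ins, and releasing an element is modeled simply as setting its count to zero and unlinking it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor that may head a linked list; not a DOCA type. */
struct sim_buf_node {
    int refcount;
    struct sim_buf_node *next;
};

/*
 * Drop one reference from the head. When the head's count reaches zero,
 * every element of the list is released. Returns the number of elements
 * released (0 if the head is still referenced).
 */
int sim_buf_refcount_rm(struct sim_buf_node *head)
{
    assert(head != NULL && head->refcount > 0);
    head->refcount--;
    if (head->refcount > 0)
        return 0;

    int released = 0;
    struct sim_buf_node *n = head;
    while (n != NULL) {
        struct sim_buf_node *next = n->next;
        n->refcount = 0;
        n->next = NULL;
        released++;
        n = next;
    }
    return released;
}
```

For a three-element list with one reference on the head, a single call on the head releases all three elements; a buffer with two references needs two calls before it is released.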

The main cleanup process is to remove the work queue from the context (doca_ctx_workq_rm()), stop the context itself (doca_ctx_stop()), and remove the device from the context (doca_ctx_dev_rm()).