DOCA DMA
This guide provides instructions on building and developing applications that require copying memory using Direct Memory Access (DMA).
DOCA DMA provides an API to copy data between DOCA buffers using hardware acceleration, supporting both local and remote memory regions.
The library provides an API for executing DMA operations on DOCA buffers, where these buffers reside either in local memory (i.e., within the same host) or host memory accessible by the DPU. See DOCA Core for more information about the memory subsystem.
Using DOCA DMA, complex memory copy operations can be easily executed in an optimized, hardware-accelerated manner.
This document is intended for software developers wishing to accelerate their application's memory I/O operations and access memory that is not local to the host.
This library follows the architecture of a DOCA Core Context, so it is recommended to read the DOCA Core documentation first.
DOCA DMA-based applications can run either on the host machine or on the NVIDIA® BlueField® DPU target.
Copying from the host to the DPU, and vice versa, only works with a DPU configured to run in DPU mode, as described in NVIDIA BlueField Modes of Operation.
DOCA DMA is a DOCA Context as defined by DOCA Core. See NVIDIA DOCA Core Context for more information.
DOCA DMA leverages DOCA Core architecture to expose asynchronous tasks/events that are offloaded to hardware.
DMA can be used to copy data as follows:
Copying from local memory to local memory
Using the DPU to copy memory between host and DPU
Using the host to copy memory between host and DPU
Objects
Device and Device Representor
The DMA library needs a DOCA device to operate. The device is used to access memory and perform the actual copy. See DOCA Core Device Discovery.
Within the same BlueField DPU, it does not matter which device is used (PF/VF/SF), as all these devices utilize the same hardware component. If there are multiple DPUs, it is possible to create a DMA instance per DPU, providing each instance with a device from a different DPU.
To access memory that is not local (from the host to the DPU or vice versa), the DPU side of the application must select a device with an appropriate representor. See DOCA Core Device Representor Discovery.
The device must stay valid for as long as the DMA instance is not destroyed.
Memory Buffers
The memory copy task requires two DOCA buffers containing the destination and the source. Depending on the allocation pattern of the buffers, refer to the table in the "Inventory Types" section. To find what kind of memory is supported, refer to the table in section "Buffer Support".
Buffers must not be modified or read during the memory copy operation.
To start using the library, users must go through a configuration phase as described in DOCA Core Context Configuration Phase.
This section describes how to configure and start the context, to allow execution of tasks and retrieval of events.
Configurations
The context can be configured to match the application use case.
To find if a configuration is supported, or what the min/max value for it is, refer to section "Device Support".
Mandatory Configurations
These configurations are mandatory and must be set by the application before attempting to start the context:
At least one task/event type must be configured. See configuration of Tasks and/or Events
A device with appropriate support must be provided upon creation
Device Support
DOCA DMA requires a device to operate. To pick a device, refer to DOCA Core Device Discovery.
As device capabilities may change (see DOCA Core Device Support), it is recommended to select your device using the following method:
doca_dma_cap_task_memcpy_is_supported
Devices may also differ in the following capabilities:
The maximum number of tasks
The maximum buffer size
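The selection logic described above can be sketched with a small stand-alone model. This is not the DOCA API: `model_dev_caps` and `pick_device` are hypothetical names, and the struct fields are assumed to hold values that a real application would obtain from the capability queries named in this guide (for example, doca_dma_cap_task_memcpy_is_supported). The sketch picks the first device that supports the memory copy task and whose limits cover the application's needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one device's DMA capabilities. In a real
 * application these values would come from the capability queries. */
struct model_dev_caps {
    bool memcpy_supported;  /* memory copy task is supported */
    uint32_t max_num_tasks; /* maximum number of tasks */
    uint64_t max_buf_size;  /* maximum buffer size, in bytes */
};

/* Returns the index of the first device that supports the memory copy task
 * and can handle the requested task count and buffer size, or -1 if no
 * device qualifies. */
static int pick_device(const struct model_dev_caps *devs, size_t nb_devs,
                       uint32_t want_tasks, uint64_t want_buf_size)
{
    for (size_t i = 0; i < nb_devs; i++) {
        if (!devs[i].memcpy_supported)
            continue;
        if (devs[i].max_num_tasks < want_tasks)
            continue;
        if (devs[i].max_buf_size < want_buf_size)
            continue;
        return (int)i;
    }
    return -1;
}
```

Checking the limits together with the support flag avoids opening a device that supports the task but cannot hold the buffers the application intends to use.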
Buffer Support
Tasks support buffers with the following features:
Buffer Type | Source Buffer | Destination Buffer
Local mmap buffer | Yes | Yes
mmap from PCIe export buffer | Yes | Yes
mmap from RDMA export buffer | No | No
Linked list buffer | Yes | No
This section describes execution on CPU using DOCA Core Progress Engine.
Tasks
DOCA DMA exposes asynchronous tasks that leverage the DPU hardware according to the DOCA Core architecture. See DOCA Core Task.
Memory Copy Task
The memory copy task copies memory from one location to another, using buffers as described in Buffer Support.
Configuration
Description | API to Set the Configuration | API to Query Support
Enable the task | doca_dma_task_memcpy_set_conf | doca_dma_cap_task_memcpy_is_supported
Number of tasks | doca_dma_task_memcpy_set_conf | doca_dma_cap_get_max_num_tasks
Maximum buffer size | – | doca_dma_cap_task_memcpy_get_max_buf_size
Maximum buffer list size | – | doca_dma_cap_task_memcpy_get_max_buf_list_len
Input
Common input as described in DOCA Core Task.
Name | Description | Notes
Source buffer | Buffer that points to the memory to be copied | Only the data residing in the data segment is copied
Destination buffer | Buffer that points to where memory is copied | The data is copied to the tail segment, extending the data segment
Output
Common output as described in DOCA Core Task.
Task Successful Completion
After the task is completed successfully:
The data is copied from source to destination
The destination buffer data segment is extended to include the copied data
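The effect of a successful copy on the two buffers can be illustrated with a small stand-alone model. The sketch below does not use the DOCA API: `model_buf`, `model_tail_len`, and `model_memcpy` are hypothetical names, and a CPU `memcpy` stands in for the hardware copy. It assumes a buffer is described by a backing region plus a data segment (offset and length) inside it, with the tail being the free space after the data segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffer: a backing region with a data segment
 * inside it. The tail is the free space after the data segment. */
struct model_buf {
    unsigned char *base; /* start of the backing memory region */
    size_t size;         /* total size of the backing region */
    size_t data_off;     /* offset of the data segment from base */
    size_t data_len;     /* length of the data segment */
};

/* Number of free bytes after the data segment. */
static size_t model_tail_len(const struct model_buf *b)
{
    return b->size - (b->data_off + b->data_len);
}

/* Models a successful copy: only the source data segment is copied, it lands
 * in the destination tail, and the destination data segment grows by the
 * number of bytes copied. Returns 0 on success, or -1 if the destination
 * tail is too small (in which case neither descriptor is modified). */
static int model_memcpy(struct model_buf *dst, const struct model_buf *src)
{
    assert(dst != NULL && src != NULL);
    if (src->data_len > model_tail_len(dst))
        return -1;
    memcpy(dst->base + dst->data_off + dst->data_len,
           src->base + src->data_off, src->data_len);
    dst->data_len += src->data_len;
    return 0;
}
```

Because the copy appends to the tail, two consecutive copies into the same destination place the second payload directly after the first, rather than overwriting it.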
Task Failed Completion
If the task fails midway:
The context may enter stopping state, if a fatal error occurs
The source and destination doca_buf objects are not modified
The destination buffer contents may be modified
Limitations
The operation is not atomic
Once the task has been submitted, the source and destination must not be read from or written to until the task completes
Source and destination must not overlap
Other limitations are described in DOCA Core Task
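An application can guard against the overlap limitation before submitting a task. The following is a minimal sketch, assuming both buffers live in the same address space so that their addresses can be compared directly; `ranges_overlap` is a hypothetical helper, not part of the DOCA API, and it cannot detect overlap between a local buffer and remote memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the byte ranges [a, a + a_len) and [b, b + b_len) share at
 * least one byte. Empty ranges never overlap anything. */
static bool ranges_overlap(const void *a, size_t a_len,
                           const void *b, size_t b_len)
{
    uintptr_t a_start = (uintptr_t)a;
    uintptr_t b_start = (uintptr_t)b;

    if (a_len == 0 || b_len == 0)
        return false;
    /* Compare the distance between the starts with the length of the lower
     * range; using a subtraction avoids overflow in start + len. */
    if (a_start <= b_start)
        return b_start - a_start < a_len;
    return a_start - b_start < b_len;
}
```

A caller would typically pass the source data segment and the destination tail region, and skip submission when the helper returns true.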
Events
DOCA DMA exposes asynchronous events to notify on changes that happen unexpectedly, according to DOCA Core architecture.
The only events DMA exposes are the common events described in DOCA Core Event.
The DOCA DMA library follows the Context state machine as described in DOCA Core Context State Machine.
The following section describes how to move states and what is allowed in each state.
Idle
In this state, it is expected that the application either:
Destroys the context
Starts the context
Allowed operations:
Configuring the context according to Configurations
Starting the context
It is possible to reach this state as follows:
Previous State | Transition Action
None | Create the context
Running | Call stop after making sure all tasks have been freed
Stopping | Call progress until all tasks are completed and freed
Starting
This state cannot be reached.
Running
In this state, it is expected that the application:
Allocates and submits tasks
Calls progress to complete tasks and/or receive events
Allowed operations:
Allocating a previously configured task
Submitting a task
Calling stop
It is possible to reach this state as follows:
Previous State | Transition Action
Idle | Call start after configuration
Stopping
In this state, it is expected that the application:
Calls progress to complete all inflight tasks (tasks complete with failure)
Frees any completed tasks
Allowed operations:
Call progress
It is possible to reach this state as follows:
Previous State | Transition Action
Running | Call progress and fatal error occurs
Running | Call stop without freeing all tasks
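The transitions listed in the tables above can be summarized in a small stand-alone model. This sketch does not call the DOCA API: `ctx_model` and the `ctx_*` functions are hypothetical names, and "calling progress until a task completes and then freeing it" is collapsed into a single `ctx_task_freed` step as a simplifying assumption.

```c
#include <stdbool.h>

enum ctx_state { CTX_IDLE, CTX_RUNNING, CTX_STOPPING };

/* Hypothetical model of a context: its state plus a count of live tasks. */
struct ctx_model {
    enum ctx_state state;
    unsigned int live_tasks; /* allocated tasks that have not been freed */
};

/* Idle -> Running: start after configuration. */
static bool ctx_start(struct ctx_model *c)
{
    if (c->state != CTX_IDLE)
        return false;
    c->state = CTX_RUNNING;
    return true;
}

/* Tasks can only be allocated while Running. */
static bool ctx_task_alloc(struct ctx_model *c)
{
    if (c->state != CTX_RUNNING)
        return false;
    c->live_tasks++;
    return true;
}

/* Running -> Idle if all tasks were freed, otherwise Running -> Stopping. */
static bool ctx_stop(struct ctx_model *c)
{
    if (c->state != CTX_RUNNING)
        return false;
    c->state = (c->live_tasks == 0) ? CTX_IDLE : CTX_STOPPING;
    return true;
}

/* Running -> Stopping when progress hits a fatal error. */
static void ctx_fatal_error(struct ctx_model *c)
{
    if (c->state == CTX_RUNNING)
        c->state = CTX_STOPPING;
}

/* One task completed and freed. In Stopping, freeing the last live task
 * moves the context back to Idle. */
static void ctx_task_freed(struct ctx_model *c)
{
    if (c->live_tasks > 0)
        c->live_tasks--;
    if (c->state == CTX_STOPPING && c->live_tasks == 0)
        c->state = CTX_IDLE;
}
```

The model makes one consequence of the tables explicit: stopping with outstanding tasks does not reach Idle directly, so the application has to keep draining and freeing tasks before it can reconfigure or destroy the context.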
DOCA DMA only supports datapath on the CPU. See Execution Phase.
This section describes DOCA DMA samples based on the DOCA DMA library.
The samples illustrate how to use the DOCA DMA API to do the following:
Copy contents of a local buffer to another buffer
Use DPU to copy contents of buffer on the host to a local buffer
Running the Samples
Refer to the following documents:
NVIDIA DOCA Installation Guide for Linux for details on how to install BlueField-related software.
NVIDIA DOCA Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
To build a given sample:
cd /opt/mellanox/doca/samples/doca_dma/dma_local_copy
meson /tmp/build
ninja -C /tmp/build
The binary doca_dma_local_copy is created under /tmp/build/.
Sample (e.g., doca_dma_local_copy) usage:
Usage: doca_<sample_name> [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                    Print a help synopsis
  -v, --version                 Print program version information
  -l, --log-level               Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>             Parse all command flags from an input json file

Program Flags:
  -p, --pci_addr <PCI-ADDRESS>  PCI device address
  -t, --text                    Text to DMA copy

For additional information per sample, use the -h option:
/tmp/build/<sample_name> -h
Samples
DMA Local Copy
This sample illustrates how to locally copy memory with DMA from one buffer to another on the DPU. This sample should be run on the DPU.
The sample logic includes:
Locating DOCA device.
Initializing needed DOCA core structures.
Populating DOCA memory map with two relevant buffers.
Allocating element in DOCA buffer inventory for each buffer.
Initializing DOCA DMA memory copy task object.
Submitting DMA task.
Handling task completion once it is done.
Checking task result.
Destroying all DMA and DOCA core structures.
Reference:
/opt/mellanox/doca/samples/doca_dma/dma_local_copy/dma_local_copy_sample.c
/opt/mellanox/doca/samples/doca_dma/dma_local_copy/dma_local_copy_main.c
/opt/mellanox/doca/samples/doca_dma/dma_local_copy/meson.build
DMA Copy DPU
This sample should run only after DMA Copy Host is run and the required configuration files (descriptor and buffer) have been copied to the DPU.
This sample illustrates how to copy memory (which contains user defined text) with DMA from the x86 host into the DPU. This sample should be run on the DPU.
The sample logic includes:
Locating DOCA device.
Initializing needed DOCA core structures.
Reading configuration files and saving their content into local buffers.
Allocating the local destination buffer in which the host text is to be saved.
Populating DOCA memory map with destination buffer.
Creating the remote memory map with the export descriptor file.
Creating memory map to the remote buffer.
Allocating element in DOCA buffer inventory for each buffer.
Initializing DOCA DMA memory copy task object.
Submitting DMA task.
Handling task completion once it is done.
Checking DMA task result.
If the DMA task ends successfully, printing the text that has been copied to log.
Printing to log that the host-side sample can be closed.
Destroying all DMA and DOCA core structures.
Reference:
/opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/dma_copy_dpu_sample.c
/opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/dma_copy_dpu_main.c
/opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/meson.build
DMA Copy Host
This sample should be run first. It is the user's responsibility to transfer the two configuration files (descriptor and buffer) to the DPU and provide their paths to the DMA Copy DPU sample.
This sample illustrates how to allow memory copy with DMA from the x86 host into the DPU. This sample should be run on the host.
The sample logic includes:
Locating DOCA device.
Initializing needed DOCA core structures.
Populating DOCA memory map with source buffer.
Exporting memory map.
Saving export descriptor and local DMA buffer information into files. These files should be transferred to the DPU before running the DPU sample.
Waiting until DPU DMA sample has finished.
Destroying all DMA and DOCA core structures.
Reference:
/opt/mellanox/doca/samples/doca_dma/dma_copy_host/dma_copy_host_sample.c
/opt/mellanox/doca/samples/doca_dma/dma_copy_host/dma_copy_host_main.c
/opt/mellanox/doca/samples/doca_dma/dma_copy_host/meson.build