DOCA DMA

This guide provides instructions on building and developing applications that require copying memory using Direct Memory Access (DMA).

DOCA DMA provides an API to copy data between DOCA buffers using hardware acceleration, supporting both local and remote memory regions.

The library provides an API for executing DMA operations on DOCA buffers, where these buffers reside either in local memory (i.e., within the same host) or host memory accessible by the DPU. See DOCA Core for more information about the memory subsystem.

Using DOCA DMA, complex memory copy operations can be easily executed in an optimized, hardware-accelerated manner.

This document is intended for software developers wishing to accelerate their application's memory I/O operations and access memory that is not local to the host.

This library follows the architecture of a DOCA Core Context; it is recommended to read the DOCA Core Context documentation before proceeding.

DOCA DMA-based applications can run either on the host machine or on the NVIDIA® BlueField® DPU target.

Copying from host to DPU and vice versa only works when the DPU is configured to run in DPU mode, as described in NVIDIA BlueField DPU Modes of Operation.

DOCA DMA is a DOCA Context as defined by DOCA Core. See NVIDIA DOCA Core Context for more information.

DOCA DMA leverages DOCA Core architecture to expose asynchronous tasks/events that are offloaded to hardware.

DMA can be used to copy data as follows:

  • Copying from local memory to local memory

  • Using the DPU to copy memory between the host and the DPU

  • Using the host to copy memory between the host and the DPU

Objects

Device and Device Representor

The DMA library needs a DOCA device to operate. The device is used to access memory and perform the actual copy. See DOCA Core Device Discovery.

Within the same BlueField DPU, it does not matter which device is used (PF/VF/SF), as all of these devices utilize the same hardware component. If there are multiple DPUs, then it is possible to create a DMA instance per DPU, providing each instance with a device from a different DPU.

To access memory that is not local (from the host to the DPU or vice versa), the DPU side of the application must select a device with an appropriate representor. See DOCA Core Device Representor Discovery.

The device must remain valid until the DMA instance is destroyed.

Memory Buffers

The memory copy task requires two DOCA buffers: one for the source and one for the destination. For the buffer allocation patterns, refer to the table in the "Inventory Types" section. To find what kind of memory is supported, refer to the table in the "Buffer Support" section.

Buffers must not be modified or read during the memory copy operation.

To start using the library, users must go through a configuration phase as described in DOCA Core Context Configuration Phase.

This section describes how to configure and start the context, to allow execution of tasks and retrieval of events.

Configurations

The context can be configured to match the application use case.

To find if a configuration is supported, or what the min/max value for it is, refer to section "Device Support".

Mandatory Configurations

These configurations are mandatory and must be set by the application before attempting to start the context:

  • At least one task/event type must be configured. See configuration of Tasks and/or Events

  • A device with appropriate support must be provided upon creation

Device Support

DOCA DMA requires a device to operate. To pick a device, refer to DOCA Core Device Discovery.

As device capabilities may change (see DOCA Core Device Support), it is recommended to select your device using the following method:

  • doca_dma_cap_task_memcpy_is_supported

Capabilities that may differ between devices include:

  • The maximum number of tasks

  • The maximum buffer size

Buffer Support

Tasks support buffers with the following features:

| Buffer Type | Source Buffer | Destination Buffer |
|---|---|---|
| Local mmap buffer | Yes | Yes |
| mmap from PCIe export buffer | Yes | Yes |
| mmap from RDMA export buffer | No | No |
| Linked list buffer | Yes | No |


This section describes execution on CPU using DOCA Core Progress Engine.

Tasks

DOCA DMA exposes asynchronous tasks that leverage the DPU hardware according to the DOCA Core architecture. See DOCA Core Task.

Memory Copy Task

The memory copy task allows copying memory from one location to another, using buffers as described in the "Buffer Support" section.

| Configuration | API to Set the Configuration | API to Query Support |
|---|---|---|
| Enable the task | doca_dma_task_memcpy_set_conf | doca_dma_cap_task_memcpy_is_supported |
| Number of tasks | doca_dma_task_memcpy_set_conf | doca_dma_cap_get_max_num_tasks |
| Maximum buffer size | – | doca_dma_cap_task_memcpy_get_max_buf_size |
| Maximum buffer list size | – | doca_dma_cap_task_memcpy_get_max_buf_list_len |


Input

Common input as described in DOCA Core Task.

| Name | Description | Notes |
|---|---|---|
| Source buffer | Buffer that points to the memory to be copied | Only the data residing in the data segment is copied |
| Destination buffer | Buffer that points to where the memory is copied | The data is copied to the tail segment, extending the data segment |


Output

Common output as described in DOCA Core Task.

Task Successful Completion

After the task is completed successfully:

  • The data is copied from source to destination

  • The destination buffer data segment is extended to include the copied data

Task Failed Completion

If the task fails midway:

  • The context may enter the stopping state if a fatal error occurs

  • The source and destination doca_buf objects are not modified

  • The destination buffer contents may be modified

Limitations

  • The operation is not atomic

  • Once the task has been submitted, the source and destination buffers must not be read from or written to

  • Source and destination must not overlap

  • Other limitations are described in DOCA Core Task

Events

DOCA DMA exposes asynchronous events to notify on changes that happen unexpectedly, according to DOCA Core architecture.

The only events DMA exposes are the common events described in DOCA Core Event.

The DOCA DMA library follows the Context state machine as described in DOCA Core Context State Machine.

The following section describes how to move between states and what is allowed in each state.

Idle

In this state it is expected that application:

  • Destroys the context

  • Starts the context

Allowed operations:

  • Configuring the context according to Configurations

  • Starting the context

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| None | Create the context |
| Running | Call stop after making sure all tasks have been freed |
| Stopping | Call progress until all tasks are completed and freed |


Starting

This state cannot be reached.

Running

In this state it is expected that application:

  • Allocates and submits tasks

  • Calls progress to complete tasks and/or receive events

Allowed operations:

  • Allocating a previously configured task

  • Submitting a task

  • Calling stop

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| Idle | Call start after configuration |


Stopping

In this state it is expected that application:

  • Calls progress to complete all inflight tasks (tasks complete with failure)

  • Frees any completed tasks

Allowed operations:

  • Call progress

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| Running | Call progress and a fatal error occurs |
| Running | Call stop without freeing all tasks |


DOCA DMA only supports datapath on the CPU. See Execution Phase.

This section describes DOCA DMA samples based on the DOCA DMA library.

The samples illustrate how to use the DOCA DMA API to do the following:

  • Copy contents of a local buffer to another buffer

  • Use DPU to copy contents of buffer on the host to a local buffer

Running the Samples

  1. Refer to the following documents:

  2. To build a given sample:

    cd /opt/mellanox/doca/samples/doca_dma/dma_local_copy
    meson /tmp/build
    ninja -C /tmp/build

    The binary doca_dma_local_copy is created under /tmp/build/.

  3. Sample (e.g., doca_dma_local_copy) usage:

    Usage: doca_<sample_name> [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                    Print a help synopsis
      -v, --version                 Print program version information
      -l, --log-level               Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>             Parse all command flags from an input json file

    Program Flags:
      -p, --pci_addr <PCI-ADDRESS>  PCI device address
      -t, --text                    Text to DMA copy

  4. For additional information per sample, use the -h option:


    /tmp/build/<sample_name> -h

Samples

DMA Local Copy

This sample illustrates how to locally copy memory with DMA from one buffer to another on the DPU. This sample should be run on the DPU.

The sample logic includes:

  1. Locating DOCA device.

  2. Initializing needed DOCA core structures.

  3. Populating DOCA memory map with two relevant buffers.

  4. Allocating element in DOCA buffer inventory for each buffer.

  5. Initializing DOCA DMA memory copy task object.

  6. Submitting DMA task.

  7. Handling task completion once it is done.

  8. Checking task result.

  9. Destroying all DMA and DOCA core structures.

Reference:

  • /opt/mellanox/doca/samples/doca_dma/dma_local_copy/dma_local_copy_sample.c

  • /opt/mellanox/doca/samples/doca_dma/dma_local_copy/dma_local_copy_main.c

  • /opt/mellanox/doca/samples/doca_dma/dma_local_copy/meson.build

DMA Copy DPU

Warning

This sample should run only after DMA Copy Host is run and the required configuration files (descriptor and buffer) have been copied to the DPU.

This sample illustrates how to copy memory (containing user-defined text) with DMA from the x86 host into the DPU. This sample should be run on the DPU.

The sample logic includes:

  1. Locating DOCA device.

  2. Initializing needed DOCA core structures.

  3. Reading configuration files and saving their content into local buffers.

  4. Allocating the local destination buffer in which the host text is to be saved.

  5. Populating DOCA memory map with destination buffer.

  6. Creating the remote memory map with the export descriptor file.

  7. Creating memory map to the remote buffer.

  8. Allocating element in DOCA buffer inventory for each buffer.

  9. Initializing DOCA DMA memory copy task object.

  10. Submitting DMA task.

  11. Handling task completion once it is done.

  12. Checking DMA task result.

  13. If the DMA task ends successfully, printing the text that has been copied to log.

  14. Printing to log that the host-side sample can be closed.

  15. Destroying all DMA and DOCA core structures.

Reference:

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/dma_copy_dpu_sample.c

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/dma_copy_dpu_main.c

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_dpu/meson.build

DMA Copy Host

Warning

This sample should be run first. It is the user's responsibility to transfer the two configuration files (descriptor and buffer) to the DPU and provide their path to the DMA Copy DPU sample.

This sample illustrates how to prepare host memory so that it can be copied with DMA from the x86 host into the DPU. This sample should be run on the host.

The sample logic includes:

  1. Locating DOCA device.

  2. Initializing needed DOCA core structures.

  3. Populating DOCA memory map with source buffer.

  4. Exporting memory map.

  5. Saving export descriptor and local DMA buffer information into files. These files should be transferred to the DPU before running the DPU sample.

  6. Waiting until DPU DMA sample has finished.

  7. Destroying all DMA and DOCA core structures.

Reference:

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_host/dma_copy_host_sample.c

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_host/dma_copy_host_main.c

  • /opt/mellanox/doca/samples/doca_dma/dma_copy_host/meson.build

© Copyright 2023, NVIDIA. Last updated on Feb 9, 2024.