DOCA Documentation v3.2.0

DOCA Storage Applications

This page outlines NVIDIA DOCA storage applications that demonstrate how to develop data storage implementations on the NVIDIA® BlueField® platform.

The DOCA storage applications serve as a reference to demonstrate how a block storage solution can be developed using the DOCA framework on the NVIDIA BlueField platform.

These applications are simplified to make the flow and overall process easy to understand. A real-world storage solution involves significantly more complexity, and solutions must be tailored to specific use cases, which is not attempted in this reference design.

The general architecture involves an initiator host application sending storage requests to a local DPU application. The DPU then interfaces with a remote storage target using RDMA, orchestrating the storage and retrieval of data directly to or from the initiator application's memory.

[Figure: Storage reference architecture]

The storage reference consists of several applications that can be orchestrated in different combinations to support varying storage approaches and implementations. Each application is described at a high level below, followed by detailed use case examples highlighting how these applications can be combined.

doca_storage_initiator_comch

The doca_storage_initiator_comch application serves two purposes. First, it acts as the user/client of the storage reference application that controls the execution flow. Second, it includes benchmarking capabilities to demonstrate the achievable performance of the storage solution. The application supports the following tests:

  • Read throughput performance test

  • Write throughput performance test

  • Read-only data validation test

  • Write-then-read data validation test
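
For reference, these tests correspond to the values accepted by the initiator's --execution-strategy option, as shown in the run examples later on this page:

    Read throughput               ->  read_throughput_test
    Write throughput              ->  write_throughput_test
    Read-only data validation     ->  read_only_data_validity_test
    Write-then-read validation    ->  read_write_data_validity_test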

doca_storage_target_rdma

The doca_storage_target_rdma application provides a simple storage backend by utilizing a block of memory instead of interfacing with physical non-volatile storage. It includes the following features:

  • Configurable block count

  • Configurable block size

  • Import of initial data from a text file

  • Import of initial data from a binary format (.sbc)

doca_storage_comch_to_rdma_zero_copy

The doca_storage_comch_to_rdma_zero_copy application serves as a bridge between the initiator and a single storage target. It facilitates the exchange of control path and data path messages between the initiator and the target. The data itself is never visible to this application; it is exchanged directly between the initiator and the target via RDMA, which is why this approach is referred to as zero copy. It offers the following features:

  • Zero copy transfer of data between the initiator and the target

doca_storage_comch_to_rdma_gga_offload

The doca_storage_comch_to_rdma_gga_offload application serves as a bridge between the initiator and three storage targets. Unlike the zero copy service, it takes an active part in the data transfer and uses DOCA libraries to accelerate certain data operations transparently to both the initiator and the targets. It offers the following features:

  • Transparent data redundancy and error correction

  • Inline compression and decompression of stored data

  • Inline creation of redundancy data

  • Inline recovery of lost data blocks

  • Configuration of how frequently a simulated data loss occurs
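
The frequency of the simulated data loss corresponds to the service's --trigger-recovery-read-every-n option, which appears in the GGA offload run examples later on this page.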

doca_storage_gga_offload_sbc_generator

The doca_storage_gga_offload_sbc_generator application is used to generate Storage Binary Content (.sbc) files. These files are consumed by the doca_storage_target_rdma application when it is used with the doca_storage_comch_to_rdma_gga_offload application.

Using a user-supplied data file, the generator:

  1. Splits the input data into chunks.

  2. Compresses each chunk.

  3. Creates error correction blocks for each compressed block.

The output is stored in three files (named by the user), which can be used to initialize the storage with valid data upon startup:

  • Data1 file (data_1)

  • Data2 file (data_2)

  • Data-parity file (data_p)

Zero Copy Use Case

This use case demonstrates a simple, hardware-accelerated storage solution that avoids unnecessary data copying by using the DOCA Comch, DOCA Core, and DOCA RDMA libraries. Data is transferred directly between the initiator's memory and the storage target's memory via DOCA RDMA. The initiator has no awareness of the target; it only communicates with the service.

Application Components

This use case involves three distinct applications:

  • Initiator (doca_storage_initiator_comch): A client/benchmark application that sends read and write requests to the storage backend.

  • Service (doca_storage_comch_to_rdma_zero_copy): A bridge that abstracts the target's details from the initiator. It relays IO messages between the two.

  • Target (doca_storage_target_rdma): A memory-based storage application that executes RDMA operations to fulfill the initiator's requests.

Operational Flow

The use case operates in two main phases.

Setup Phase

The applications first use DOCA Comch and a TCP socket to register with each other, share configuration, and prepare the data path operations.

[Figure: Zero copy architecture, control path]

Data Path Phase

Once configured, the data path contexts (doca_comch_producer, doca_comch_consumer, and doca_rdma) are used to perform the actual storage requests and send completion responses.

[Figure: Zero copy architecture, data path]

Memory Layout

The memory layout is simple: both the initiator and the target allocate memory regions partitioned into blocks. All RDMA read/write operations transfer data directly between these memory regions.

[Figure: Zero copy use case memory layout]

Info

The number of blocks on the initiator and target does not need to match, but their block size must be identical.
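
As a worked example using the values from the run commands below, the target allocates:

    4096 bytes per block x 1024 blocks = 4,194,304 bytes (4 MiB)

The initiator may partition its memory region into a different number of blocks, provided each block is also 4096 bytes.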


Data Flow

The data flow for a storage operation is as follows:

  1. The Initiator sends a read/write IO request to the Service (via doca_comch_producer).

  2. The Service receives the request (via doca_comch_consumer).

  3. The Service forwards the request to the Target (via doca_rdma).

  4. The Target performs the RDMA read or write operation, transferring data directly between its memory and the Initiator's memory.

  5. The Target sends an IO response (completion) to the Service (via doca_rdma).

  6. The Service receives the IO response.

  7. The Service forwards the response to the Initiator (via doca_comch_producer).

  8. The Initiator receives the final response (via doca_comch_consumer).

Running the Zero Copy Use Case

Info

Assumptions:

  • The service and storage applications can communicate via TCP

  • The service is deployed on a BlueField-3 DPU

  • Full line-rate RDMA data transfer is possible between service and storage

Note

The examples below use CPU 0. Users should ensure that this CPU is suitable, considering NUMA configuration, pinned cores, kernel interrupts, etc.
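
One way to review the NUMA topology and the current interrupt distribution before choosing a core (assuming standard Linux tooling such as lscpu and numactl is installed):

    lscpu
    numactl --hardware
    cat /proc/interrupts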

Run the applications in the following order:

  1. doca_storage_target_rdma

  2. doca_storage_comch_to_rdma_zero_copy

  3. doca_storage_initiator_comch

Replace the following placeholders with actual values:

  • {{INITIATOR_DEV_PCI_ADDR}}

  • {{STORAGE_DEV_PCI_ADDR}}

  • {{STORAGE_IP_ADDR}}

  • {{STORAGE_TCP_PORT}}
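
For example, on a typical setup the substitutions might look like the following (all values are illustrative; use the PCI addresses, IP address, and port that apply to your environment):

    {{INITIATOR_DEV_PCI_ADDR}}  ->  3b:00.0   (device PCI address on the initiator host)
    {{STORAGE_DEV_PCI_ADDR}}    ->  3b:00.0   (device PCI address on the storage host)
    {{STORAGE_IP_ADDR}}         ->  192.168.100.10
    {{STORAGE_TCP_PORT}}        ->  12345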

Run Write Data Validity Test

  1. Run the storage target:

    doca_storage_target_rdma \
        -d {{STORAGE_DEV_PCI_ADDR}} \
        --listen-port {{STORAGE_TCP_PORT}} \
        --block-size 4096 \
        --block-count 1024 \
        --cpu 0

  2. Run the zero copy service:

    doca_storage_comch_to_rdma_zero_copy \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_write_data_validity_test \
        --run-limit-operation-count 10000 \
        --cpu 0

Run Write Throughput Test

  1. Repeat step 1 from the previous section.

  2. Repeat step 2 from the previous section.

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy write_throughput_test \
        --run-limit-operation-count 1000000 \
        --cpu 0

Run Read Data Validity Test

Info

The file provided to the storage target and to the initiator must have the same content, and its size must equal the storage block size multiplied by the block count (block-size × block-count).
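
For the example values used below (--block-size 4096, --block-count 1024), the file must therefore be exactly 4096 x 1024 = 4,194,304 bytes. A minimal sketch for producing a suitable text file from random printable content (any file of the correct size works, as long as both sides use identical copies):

    base64 /dev/urandom | head -c 4194304 > storage_content.txt
    stat -c %s storage_content.txt   # should print 4194304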

  1. Run the storage target:

    doca_storage_target_rdma \
        -d {{STORAGE_DEV_PCI_ADDR}} \
        --listen-port {{STORAGE_TCP_PORT}} \
        --block-size 4096 \
        --block-count 1024 \
        --binary-content storage_content.txt \
        --cpu 0

  2. Run the zero copy service:

    doca_storage_comch_to_rdma_zero_copy \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_only_data_validity_test \
        --storage-plain-content storage_content.txt \
        --run-limit-operation-count 10000 \
        --cpu 0

Run Read Throughput Test

  1. Repeat step 1 from the previous section.

  2. Repeat step 2 from the previous section.

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_throughput_test \
        --run-limit-operation-count 1000000 \
        --cpu 0

GGA Offload Use Case

This use case demonstrates how to leverage hardware acceleration for a storage solution that provides transparent compression, recovery, and data mirroring. It uses a combination of the DOCA Comch, DOCA Core, DOCA RDMA, DOCA Erasure Coding, and DOCA Compress libraries.

Architecture and Components

The GGA offload use case involves three applications running as five distinct instances:

  • 1x Initiator (doca_storage_initiator_comch): A client/benchmark application that sends read and write requests.

  • 1x Service (doca_storage_comch_to_rdma_gga_offload): A bridge that abstracts the storage details and orchestrates the offload operations (compression, erasure coding).

  • 3x Targets (doca_storage_target_rdma): Three instances of a memory-based storage application, one for each data "shard":

    • Target 1 (data1): Stores 50% of the logical data.

    • Target 2 (data2): Stores the other 50% of the logical data.

    • Target 3 (data_p): Stores parity information for redundancy, allowing recovery if data1 or data2 is lost.

Operational Flow

The use case operates in two main phases.

Setup Phase

The applications use DOCA Comch and three TCP sockets (one for each target) to register with each other, share configuration, and prepare the data path contexts.

[Figure: GGA offload architecture, control path]

Data Path Phase

Once configured, the contexts are used to perform storage requests and signal completions using doca_comch_producer, doca_comch_consumer, doca_rdma, doca_dma, doca_ec, and doca_compress.

[Figure: GGA offload architecture, data path]

Memory Layout

The memory layout for this use case is more complex:

  • Initiator: Allocates a memory region partitioned into data blocks.

  • Service: Allocates a larger region where each block is twice the size of the initiator's blocks. This provides space to perform compression, decompression, and parity calculations.

  • Storage Targets: Each of the three targets allocates blocks that are 50% of the initiator's block size. All targets must use the same block size and count.

Data is transferred between the targets and the service via RDMA. Cross-PCIe memory access via DOCA mmap (mmap from Export) is used for data transfers between the service (on the DPU) and the initiator (on the host).
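
As a worked example, assuming an illustrative 4096-byte initiator block (matching the --block-size value used with the SBC generator below):

    Initiator block size:       4096 bytes
    Service block size:         8192 bytes  (2x the initiator block)
    Target block size (x3):     2048 bytes  (50% of the initiator block)

All three targets use the same block size and block count.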

The flow of data through this layout is as follows:

  • During Write Operations: DOCA DMA is used to transfer initiator blocks into the service's larger storage blocks (using the cross-PCIe DOCA mmap). The block's content is then compressed, parity data is calculated, and the resulting data (data_1, data_2, data_p) is transferred to the storage targets via RDMA.

  • During Normal Read Operations: Data is transferred from the targets to the service blocks via RDMA. The data is then decompressed from the service blocks directly into the initiator's blocks, using the cross-PCIe DOCA mmap as the destination.

  • During Recovery Read Operations: The service uses RDMA to fetch the valid data half and the parity data into its temporary block space. It combines them to reconstruct the missing data, and the resulting restored data is then decompressed into the initiator's block as normal.

[Figure: GGA offload use case memory layout]

Normal Read Flow

The standard read flow, when all targets are available, is as follows:

  1. The Initiator sends a read request to the Service (via doca_comch_producer).

  2. The Service receives the request (via doca_comch_consumer).

  3. The Service sends two IO requests (via doca_rdma) to the data_1 and data_2 Targets.

  4. Both Targets perform RDMA transfers, sending their storage blocks to the Service's blocks.

  5. Both Targets send an IO response to the Service (via doca_rdma).

  6. The Service, upon receiving both responses, examines the block metadata to determine the compressed data size.

  7. The Service decompresses the data (via doca_compress using LZ4), using the cross-PCIe DOCA mmap feature to write the decompressed data directly into the Initiator's memory.

  8. The Service forwards an IO response to the Initiator (via doca_comch_producer).

  9. The Initiator receives the final response (via doca_comch_consumer).

Recovery Read Flow

If a data target is unavailable, the service performs a recovery read:

  1. The Initiator sends a read request to the Service (via doca_comch_producer).

  2. The Service receives the request (via doca_comch_consumer).

  3. The Service sends two IO requests (via doca_rdma) to the available data target and the parity target.

    • To recover data_1: Send requests to data_2 and data_p.

    • To recover data_2: Send requests to data_1 and data_p.

  4. Both Targets perform RDMA transfers to the Service's blocks.

  5. Both Targets send an IO response to the Service (via doca_rdma).

  6. The Service uses doca_ec to reconstruct the missing data block from the available data and parity blocks.

  7. The Service examines the recovered block's metadata to determine the compressed data size.

  8. The Service decompresses the recovered data (via doca_compress), using the cross-PCIe DOCA mmap to write the data directly into the Initiator's memory.

  9. The Service forwards an IO response to the Initiator (via doca_comch_producer).

  10. The Initiator receives the final response (via doca_comch_consumer).

Write Flow

Note

The write path is only available if the doca_storage_comch_to_rdma_gga_offload service is compiled with the liblz4 dev libraries.

  1. The Initiator sends a write request to the Service (via doca_comch_producer).

  2. The Service receives the request (via doca_comch_consumer).

  3. The Service fetches the data block from the Initiator using doca_dma via the cross-PCIe DOCA mmap.

  4. The Service compresses the data and packages it into a block with metadata.

    1. If the data is not compressible enough, an error is returned to the Initiator (via doca_comch_producer).

    2. The Initiator receives the error response (via doca_comch_consumer).

    3. The flow ends.

  5. The Service sends three IO requests (via doca_rdma) to the Targets to write data_1, data_2, and data_p.

  6. The Service waits for all three IO responses.

  7. The Service forwards a final IO response to the Initiator (via doca_comch_producer).

  8. The Initiator receives the response (via doca_comch_consumer).

Running the GGA Offload Use Case

Info

It is assumed that:

  • The service and storage applications can communicate via TCP

  • The service (doca_storage_comch_to_rdma_gga_offload) is deployed on a BlueField-3 DPU

  • Full line-rate RDMA data transfer is possible between the service and the storage targets

Note

All examples use CPU 0. Users should verify if this is optimal by considering NUMA affinity, core pinning, and kernel interrupt distribution.

Execution Order

This use case involves running the applications in the following order:

  1. doca_storage_target_rdma (for data_1)

  2. doca_storage_target_rdma (for data_2)

  3. doca_storage_target_rdma (for data_p)

  4. doca_storage_comch_to_rdma_gga_offload (the service)

  5. doca_storage_initiator_comch (the initiator)

Placeholders

Replace the following placeholders in the commands with actual values:

  • {{INITIATOR_DEV_PCI_ADDR}}

  • {{STORAGE_DEV_PCI_ADDR}}

  • {{STORAGE_IP_ADDR_DATA_1}}

  • {{STORAGE_IP_ADDR_DATA_2}}

  • {{STORAGE_IP_ADDR_DATA_P}}

  • {{STORAGE_TCP_PORT_DATA_1}}

  • {{STORAGE_TCP_PORT_DATA_2}}

  • {{STORAGE_TCP_PORT_DATA_P}}

Generate SBC File

Note

doca_storage_gga_offload_sbc_generator is not compiled if the liblz4 dev libraries are not present.

doca_storage_gga_offload_sbc_generator \
    -d 3b:00.0 \
    --original-input-data storage_content.txt \
    --block-size 4096 \
    --data-1 data_1.sbc \
    --data-2 data_2.sbc \
    --data-p data_p.sbc
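
After the generator completes, a quick sanity check is to confirm that the three output files were produced (their sizes depend on the compressibility of the input data):

    ls -lh data_1.sbc data_2.sbc data_p.sbc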


Run Read Data Validity Test

  1. Run three storage targets:

    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 1
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 2

  2. Run the offload service:

    doca_storage_comch_to_rdma_gga_offload \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
        --data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
        --data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
        --trigger-recovery-read-every-n 0 \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_only_data_validity_test \
        --storage-plain-content storage_content.txt \
        --run-limit-operation-count 20000 \
        --cpu 0

Run Read Throughput Test

  1. Repeat step 1 from the "Read Data Validity Test".

  2. Repeat step 2 from the "Read Data Validity Test".

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_throughput_test \
        --storage-plain-content storage_content.txt \
        --run-limit-operation-count 1000000 \
        --cpu 0

Run Write Data Validity Test

  1. Run three storage targets:

    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 1
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 2

  2. Run the offload service:

    doca_storage_comch_to_rdma_gga_offload \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
        --data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
        --data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_write_data_validity_test \
        --storage-plain-content storage_content.txt \
        --run-limit-operation-count 20000 \
        --cpu 0

Run Write Throughput Test

  1. Repeat step 1 from the "Run Write Data Validity Test".

  2. Repeat step 2 from the "Run Write Data Validity Test".

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy write_throughput_test \
        --run-limit-operation-count 50000 \
        --cpu 0

These applications leverage the following DOCA libraries:

  • DOCA Comch

  • DOCA Core

  • DOCA DMA

  • DOCA RDMA

  • DOCA Erasure Coding

  • DOCA Compress

Refer to their respective programming guides for more information.

Refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application. For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The application sources can be found under the storage applications directory: /opt/mellanox/doca/applications/storage/.

Warning

doca_storage_gga_offload_sbc_generator is not compiled when liblz4 dev libraries are not present.

Note

The write path is only available if the doca_storage_comch_to_rdma_gga_offload service is compiled with the liblz4 dev libraries.
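
One way to check whether the LZ4 development files are available before building (assuming pkg-config is installed; the distribution package is typically liblz4-dev on Debian/Ubuntu or lz4-devel on RHEL-based systems):

    pkg-config --modversion liblz4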

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

The storage applications are:

  • doca_storage_comch_to_rdma_gga_offload

  • doca_storage_comch_to_rdma_zero_copy

  • doca_storage_gga_offload_sbc_generator

  • doca_storage_initiator_comch

  • doca_storage_target_rdma

The applications are built under /tmp/build/storage/.
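
A quick way to confirm that the storage binaries were produced:

    ls /tmp/build/storage/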

Compiling Storage Applications Only

To directly build only the storage applications:

cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_storage=true
ninja -C /tmp/build

The storage applications are built under /tmp/build/storage/.

Alternatively, you can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

  1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false

    • Set enable_storage to true

  2. Run the following compilation commands:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build

    The storage applications are created under /tmp/build/storage/.

Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide if you encounter any issues with the compilation of the application.

Refer to the use case sections above for details on how to run the applications.

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025