
DOCA Storage Applications

DOCA storage applications are designed to demonstrate how to develop a data storage implementation on top of the NVIDIA BlueField platform.

The storage reference consists of several applications that can be orchestrated in different combinations to support varying storage approaches and implementations. Each application is described at a high level below, followed by detailed use case examples highlighting how these applications can be combined.

doca_storage_initiator_comch

The doca_storage_initiator_comch application serves two purposes. First, it acts as the user/client of the storage reference, controlling the execution flow. Second, it includes benchmarking capabilities to demonstrate the achievable performance of the storage solution. The application supports the following tests:

  • Read throughput performance test

  • Write throughput performance test

  • Read-only data validation test

  • Write-then-read data validation test
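
These tests correspond to the --execution-strategy values used in the run examples later in this document: read_throughput_test, write_throughput_test, read_only_data_validity_test, and read_write_data_validity_test.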

doca_storage_target_rdma

The doca_storage_target_rdma application provides a simple storage backend by utilizing a block of memory instead of interfacing with physical non-volatile storage. It includes the following features:

  • Configurable block count

  • Configurable block size

  • Import of initial data from a text file

  • Import of initial data from a binary format (.sbc)

doca_storage_comch_to_rdma_zero_copy

The doca_storage_comch_to_rdma_zero_copy application serves as a bridge between the initiator and a single storage target. It facilitates the exchange of control path and data path messages between the initiator and the target. The data exchanged between the initiator and the target is never visible to this application, as it is transferred directly between the initiator and the target via RDMA; this is why the approach is referred to as zero copy. It offers the following features:

  • Facilitates the zero-copy transfer of data between the initiator and the target

doca_storage_comch_to_rdma_gga_offload

The doca_storage_comch_to_rdma_gga_offload application serves as a bridge between the initiator and three storage targets. It takes an active role in the transfer of data and uses DOCA libraries to accelerate some data operations transparently to the initiator and targets. It offers the following features:

  • Transparent data redundancy and error correction

  • Inline decompression of stored data

  • Inline recovery of lost data blocks

  • Configuration of how frequently a simulated data loss occurs

Note

Currently, doca_storage_comch_to_rdma_gga_offload supports only read operations.


doca_storage_gga_offload_sbc_generator

The doca_storage_gga_offload_sbc_generator application generates Storage Binary Content (.sbc) files for use by doca_storage_target_rdma when deployed with the doca_storage_comch_to_rdma_gga_offload application. The .sbc file contains a minimal viable format that meets the requirements of these demonstration applications.

The generator takes a user-provided file and a specified block size, compresses and chunks the data, and generates error correction blocks. It produces the following output files (named by the user):

  • Data1 file (data_1)

  • Data2 file (data_2)

  • Data-parity file (data_p)

These files can be used to initialize the storage with valid data upon startup.
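
An example invocation of the generator is shown under Generate SBC File in the GGA offload use case below.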

Zero Copy Use Case

The Zero Copy use case demonstrates the use of the DOCA Comch, DOCA Core, and DOCA RDMA libraries to implement a simple, hardware-accelerated data storage solution that avoids unnecessary data copying. In this configuration, data is transferred directly between the initiator and the storage target via DOCA RDMA, without the initiator requiring any awareness of the target.

This use case comprises the following three applications:

  • Initiator – doca_storage_initiator_comch: A simple benchmark/client used to configure and interact with the storage backend via read and write operations.

  • Target – doca_storage_target_rdma: A memory-based storage application that performs RDMA operations to fulfill the initiator's requests.

  • Service – doca_storage_comch_to_rdma_zero_copy: A bridge that abstracts the target details from the initiator and relays IO messages.

[Figure: Zero Copy use case high-level design]

The memory layout is straightforward: both the initiator and target allocate memory regions partitioned into blocks. RDMA read and write operations transfer the data between these memory regions.
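
For example, with the --block-size 4096 and --block-count 64 values used in the run commands below, each of these memory regions is 4096 * 64 = 262144 bytes (256 KiB).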

[Figure: Zero Copy use case memory layout]

The data flow for this use case includes the following steps:

  1. The initiator sends a read or write IO request to the service via doca_comch_producer.

  2. The service receives the IO request via doca_comch_consumer.

  3. The service forwards the request to the target via doca_rdma.

  4. The target performs an RDMA read or write between its memory and the initiator’s memory.

  5. The target sends an IO response message to the service via doca_rdma.

  6. The service receives the IO response message.

  7. The service forwards the message to the initiator via doca_comch_producer.

  8. The initiator receives the response via doca_comch_consumer.

Running the Zero Copy Use Case

Info

Assumptions:

  • The service and storage applications can communicate via TCP

  • The service and storage applications are deployed on BlueField-3 DPUs

  • Full line-rate RDMA data transfer is supported between service and storage

Note

The examples below use CPU 0. Users should ensure that this CPU is suitable, considering NUMA configuration, pinned cores, kernel interrupts, etc.
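
For example, the NUMA topology and CPU layout of the system can be inspected with standard Linux tools (assuming numactl is installed; these commands are illustrative and not part of the storage applications):

# Show NUMA nodes and the CPUs belonging to each
numactl --hardware

# Show the overall CPU topology
lscpu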

Run the applications in the following order:

  1. doca_storage_target_rdma

  2. doca_storage_comch_to_rdma_zero_copy

  3. doca_storage_initiator_comch

Replace the following placeholders with actual values:

  • {{INITIATOR_DEV_PCI_ADDR}}

  • {{STORAGE_DEV_PCI_ADDR}}

  • {{STORAGE_IP_ADDR}}

  • {{STORAGE_TCP_PORT}}

Run Write Data Validity Test

  1. Run the storage target:

    doca_storage_target_rdma \
        -d {{STORAGE_DEV_PCI_ADDR}} \
        --listen-port {{STORAGE_TCP_PORT}} \
        --block-size 4096 \
        --block-count 64 \
        --cpu 0

  2. Run the zero copy service:

    doca_storage_comch_to_rdma_zero_copy \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_write_data_validity_test \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

Run Write Throughput Test

  1. Repeat step 1 from the previous section.

  2. Repeat step 2 from the previous section.

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy write_throughput_test \
        --run-limit-operation-count 1000000 \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

Run Read Data Validity Test

Info

The file provided to the storage target and the initiator should have the same content and a size equal to the storage block-size multiplied by the storage block-count.
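
As an illustration only (the file name and tools below are assumptions, not part of the applications), a plain-text file matching the 4096 * 64 = 262144-byte capacity used in these examples could be created with standard Linux tools:

# Generate 262144 bytes (block-size 4096 * block-count 64) of printable text content
head -c 262144 /dev/urandom | base64 | head -c 262144 > storage_content.txt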

  1. Run the storage target:

    doca_storage_target_rdma \
        -d {{STORAGE_DEV_PCI_ADDR}} \
        --listen-port {{STORAGE_TCP_PORT}} \
        --block-size 4096 \
        --block-count 64 \
        --cpu 0

  2. Run the zero copy service:

    doca_storage_comch_to_rdma_zero_copy \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_only_data_validity_test \
        --storage-plain-content storage_content.txt \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

Run Read Throughput Test

  1. Repeat step 1 from the previous section.

  2. Repeat step 2 from the previous section.

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_throughput_test \
        --run-limit-operation-count 1000000 \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

GGA Offload Use Case

The GGA offload use case demonstrates how to use the DOCA Comch, DOCA Core, DOCA RDMA, DOCA Erasure Coding, and DOCA Compress libraries to leverage hardware acceleration for a storage solution that provides transparent compression, recovery, and data mirroring.

The GGA offload use case involves the following applications:

  • Initiator – doca_storage_initiator_comch: A simple benchmark/client used to configure and interact with the storage by issuing read or write operations.

  • Target – doca_storage_target_rdma: A basic storage application that uses memory as a backend instead of a physical disk. It performs RDMA operations to fulfill the initiator’s requests.

  • Service – doca_storage_comch_to_rdma_gga_offload: A service that abstracts the initiator from the underlying storage target details and implementation.

Application Deployment

The deployment includes five instances:

  • One instance of doca_storage_initiator_comch

  • One instance of doca_storage_comch_to_rdma_gga_offload

  • Three instances of doca_storage_target_rdma serving as data1, data2, and parity storage targets

Each storage instance holds either data1, data2, or parity content.

[Figure: GGA offload use case high-level design]

The memory layout for this use case is more complex. The initiator allocates a memory region partitioned into data blocks. The service allocates a larger region divided into half-blocks (data1, data2, and parity). Each target allocates a memory region sized as (block count × block size) to represent a full storage replica. RDMA is used to transfer data between targets and the service, while cross-PCIe memory access via DOCA mmap (DOCA Core: mmap from export) is used to decompress recovered or intact blocks into the initiator's memory. During recovery, the service combines data1 and parity to reconstruct data2 (or vice versa). The resulting data is then decompressed into the initiator's memory.
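
As an illustration of the sizes involved (assuming a half-block is half of the configured block size): with the 4096-byte block size used by the SBC generator command below and a hypothetical block count of 64, each target allocates 64 * 4096 = 262144 bytes, while each data1, data2, or parity half-block staged by the service is 4096 / 2 = 2048 bytes.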

[Figure: GGA offload use case memory layout]

Normal Read Flow

The normal read data flow for the GGA offload use-case consists of the following steps:

  1. The initiator sends a read/write request to the service via doca_comch_producer.

  2. The service receives the request via doca_comch_consumer.

  3. The service sends IO requests to both data1 and data2 storage targets via doca_rdma.

  4. Each storage target performs an RDMA transfer from its storage memory to the service's memory.

  5. Each target sends an IO response to the service via doca_rdma.

  6. Upon receiving both responses, the service examines the binary block headers to determine compressed sizes.

  7. The service decompresses the data via doca_compress using LZ4.

  8. The service forwards an IO response to the initiator via doca_comch_producer.

  9. The initiator receives the response via doca_comch_consumer.

Recovery Read Flow

The recovery read data flow for the GGA offload use-case consists of the following steps:

  1. The initiator sends a read/write request to the service via doca_comch_producer.

  2. The service receives the request via doca_comch_consumer.

  3. The service sends two IO requests via doca_rdma, either to the data_1 and data_p storage targets or to the data_2 and data_p storage targets:

    • To recover data_1, send requests to data_2 and data_p

    • To recover data_2, send requests to data_1 and data_p

  4. Each storage target performs an RDMA write from its storage memory to the service's memory.

  5. Each storage target sends an IO response message to the service via doca_rdma.

  6. The service receives both IO message responses via doca_rdma.

  7. The service uses doca_ec to reconstruct the missing half block from the available data and parity. It then inspects the block header to determine the compressed size.

  8. The service decompresses the data via doca_compress using LZ4.

  9. The service forwards the IO response to the initiator via doca_comch_producer.

  10. The initiator receives the response via doca_comch_consumer.

Running the GGA Offload Use Case

Info

It is assumed that:

  • The service and storage targets communicate via TCP

  • The applications run on BlueField-3 DPUs

  • Full-rate RDMA data transfer is achievable between service and targets

Note

All examples use CPU 0. Users should verify whether this is optimal by considering NUMA affinity, core pinning, and kernel interrupt distribution.

Running this use case involves starting the following applications in order:

  1. doca_storage_target_rdma for data_1

  2. doca_storage_target_rdma for data_2

  3. doca_storage_target_rdma for data_p

  4. doca_storage_comch_to_rdma_gga_offload

  5. doca_storage_initiator_comch

Replace the following placeholders with actual values:

  • {{INITIATOR_DEV_PCI_ADDR}}

  • {{STORAGE_DEV_PCI_ADDR}}

  • {{STORAGE_IP_ADDR_DATA_1}}

  • {{STORAGE_IP_ADDR_DATA_2}}

  • {{STORAGE_IP_ADDR_DATA_P}}

  • {{STORAGE_TCP_PORT_DATA_1}}

  • {{STORAGE_TCP_PORT_DATA_2}}

  • {{STORAGE_TCP_PORT_DATA_P}}

Generate SBC File

Note

doca_storage_gga_offload_sbc_generator is compiled only if liblz4 dev libraries are present.

doca_storage_gga_offload_sbc_generator \
    -d 3b:00.0 \
    --original-input-data storage_content.txt \
    --block-size 4096 \
    --data-1 data_1.sbc \
    --data-2 data_2.sbc \
    --data-p data_p.sbc


Run Read Data Validity Test

  1. Run three storage targets:

    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 0
    doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 0

  2. Run the offload service:

    doca_storage_comch_to_rdma_gga_offload \
        -d 03:00.0 \
        -r {{INITIATOR_DEV_PCI_ADDR}} \
        --data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
        --data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
        --data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
        --trigger-recovery-read-every-n 0 \
        --cpu 0

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_only_data_validity_test \
        --storage-plain-content storage_content.txt \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

Run Read Throughput Test

  1. Repeat step 1 from the previous section.

  2. Repeat step 2 from the previous section.

  3. Run the initiator:

    doca_storage_initiator_comch \
        -d {{INITIATOR_DEV_PCI_ADDR}} \
        --execution-strategy read_throughput_test \
        --run-limit-operation-count 1000000 \
        --task-count 64 \
        --batch-size 4 \
        --cpu 0

These applications leverage the following DOCA libraries:

  • DOCA Comch

  • DOCA Core

  • DOCA RDMA

  • DOCA Erasure Coding

  • DOCA Compress

Refer to their respective programming guides for more information.

Refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application. For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The sources of the applications can be found under the applications' directory: /opt/mellanox/doca/applications/storage/.

Warning

doca_storage_gga_offload_sbc_generator is compiled only if liblz4 dev libraries are present.
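
If the LZ4 development package is missing, it can typically be installed from the distribution repositories. For example, on Debian/Ubuntu-based systems (the package name may differ on other distributions):

# Install the LZ4 development headers and library
sudo apt-get install -y liblz4-dev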

Compiling All Applications

All DOCA applications are defined under a single meson project, so by default the compilation includes all of them.

To build all the applications together, run:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

The storage applications are:

  • doca_storage_comch_to_rdma_gga_offload

  • doca_storage_comch_to_rdma_zero_copy

  • doca_storage_gga_offload_sbc_generator

  • doca_storage_initiator_comch

  • doca_storage_target_rdma

The applications are built under /tmp/build/storage/.

Compiling Storage Applications Only

To directly build only the storage applications:

cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_storage=true
ninja -C /tmp/build

The storage applications are built under /tmp/build/storage/.

Alternatively, you can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

  1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false

    • Set enable_storage to true

  2. Run the following compilation commands:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build

    The storage applications are created under /tmp/build/storage/.

Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide if you encounter any issues with the compilation of the application.

Refer to each use case for details on how to run the applications:

  • Running the Zero Copy Use Case

  • Running the GGA Offload Use Case

© Copyright 2025, NVIDIA. Last updated on May 22, 2025.