DOCA Storage Applications
DOCA storage applications are designed to demonstrate how to develop a data storage implementation on top of the NVIDIA BlueField platform.
The storage reference consists of several applications that can be orchestrated in different combinations to support varying storage approaches and implementations. Each application is described at a high level below, followed by detailed use case examples highlighting how these applications can be combined.
doca_storage_initiator_comch
The doca_storage_initiator_comch application serves two purposes. First, it acts as the user/client of the storage reference and controls the execution flow. Second, it includes benchmarking capabilities to demonstrate the achievable performance of the storage solution. The application supports the following tests:
Read throughput performance test
Write throughput performance test
Read-only data validation test
Write-then-read data validation test
doca_storage_target_rdma
The doca_storage_target_rdma application provides a simple storage backend by utilizing a block of memory instead of interfacing with physical non-volatile storage. It includes the following features:
Configurable block count
Configurable block size
Import of initial data from a text file
Import of initial data from a binary format (.sbc)
doca_storage_comch_to_rdma_zero_copy
The doca_storage_comch_to_rdma_zero_copy application serves as a bridge between the initiator and a single storage target. It facilitates the exchange of control path and data path messages between the initiator and the target. The data itself is never visible to this application; it is exchanged directly between the initiator and the target via RDMA, which is why the approach is referred to as zero copy. It offers the following features:
Facilitates the zero copy transfer of data from initiator to target
doca_storage_comch_to_rdma_gga_offload
The doca_storage_comch_to_rdma_gga_offload application serves as a bridge between the initiator and three storage targets. It takes an active part in the transfer of data and uses DOCA libraries to accelerate some data operations transparently to the initiator and targets. It offers the following features:
Transparent data redundancy and error correction
Inline decompression of stored data
Inline recovery of lost data blocks
Configuration of how frequently a simulated data loss occurs
Currently, doca_storage_comch_to_rdma_gga_offload supports only read operations.
doca_storage_gga_offload_sbc_generator
The doca_storage_gga_offload_sbc_generator application is used to generate Storage Binary Content (.sbc) files for use by doca_storage_target_rdma when it is deployed with the doca_storage_comch_to_rdma_gga_offload application. The .sbc file contains a minimum viable format for the requirements of these demonstration applications.
The generator takes a user-provided file and a specified block size, compresses and chunks the data, and generates error correction blocks. It produces the following output files (named by the user):
Data1 file (data_1)
Data2 file (data_2)
Data-parity file (data_p)
These files can be used to initialize the storage with valid data upon startup.
The Zero Copy use case demonstrates the use of the DOCA Comch, DOCA Core, and DOCA RDMA libraries to implement a simple, hardware-accelerated data storage solution that avoids unnecessary data copying. In this configuration, data is transferred directly between the initiator and the storage target via DOCA RDMA, with no awareness of the target required by the initiator.
This use case comprises the following three applications:
Initiator – doca_storage_initiator_comch: A simple benchmark/client used to configure and interact with the storage backend via read and write operations.
Target – doca_storage_target_rdma: A memory-based storage application that performs RDMA operations to fulfill the initiator's requests.
Service – doca_storage_comch_to_rdma_zero_copy: A bridge that abstracts the target details from the initiator and relays IO messages.

The memory layout is straightforward: both the initiator and target allocate memory regions partitioned into blocks. RDMA read and write operations transfer the data between these memory regions.

The data flow for this use case includes the following steps:
The initiator sends a read or write IO request to the service via doca_comch_producer.
The service receives the IO request via doca_comch_consumer.
The service forwards the request to the target via doca_rdma.
The target performs an RDMA read or write between its memory and the initiator's memory.
The target sends an IO response message to the service via doca_rdma.
The service receives the IO response message.
The service forwards the message to the initiator via doca_comch_producer.
The initiator receives the response via doca_comch_consumer.
Running the Zero Copy Use Case
Assumptions:
The service and storage applications can communicate via TCP
The service and storage applications are deployed on BlueField-3 DPUs
Full line-rate RDMA data transfer is supported between service and storage
The examples below use CPU 0. Users should ensure that this CPU is suitable, considering NUMA configuration, pinned cores, kernel interrupts, etc.
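One way to check the NUMA node of a given device is a standard sysfs query (substitute the real PCI address; the 0000: domain prefix is required by sysfs):
# Prints the NUMA node of the device; -1 means no NUMA affinity is reported
cat /sys/bus/pci/devices/0000:{{STORAGE_DEV_PCI_ADDR}}/numa_node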
Run the applications in the following order:
doca_storage_target_rdma
doca_storage_comch_to_rdma_zero_copy
doca_storage_initiator_comch
Replace the following placeholders with actual values:
{{INITIATOR_DEV_PCI_ADDR}}
{{STORAGE_DEV_PCI_ADDR}}
{{STORAGE_IP_ADDR}}
{{STORAGE_TCP_PORT}}
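For example, the placeholders could be populated as shell variables before running the commands below (the values here are purely illustrative; use the addresses of your own devices):
INITIATOR_DEV_PCI_ADDR=3b:00.0   # device exposed to the initiator
STORAGE_DEV_PCI_ADDR=ca:00.0     # device used by the storage target
STORAGE_IP_ADDR=192.168.1.20     # address on which the target listens
STORAGE_TCP_PORT=12345           # TCP port of the target's control connection
Each {{...}} token in the commands would then be replaced with the corresponding ${...} expansion.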
Run Write Data Validity Test
Run the storage target:
doca_storage_target_rdma \
-d {{STORAGE_DEV_PCI_ADDR}} \
--listen-port {{STORAGE_TCP_PORT}} \
--block-size 4096 \
--block-count 64 \
--cpu 0
Run the zero copy service:
doca_storage_comch_to_rdma_zero_copy \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
--cpu 0
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_write_data_validity_test \
--task-count 64 \
--batch-size 4 \
--cpu 0
Run Write Throughput Test
Repeat step 1 from the previous section.
Repeat step 2 from the previous section.
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy write_throughput_test \
--run-limit-operation-count 1000000 \
--task-count 64 \
--batch-size 4 \
--cpu 0
Run Read Data Validity Test
The file provided to the storage target and the initiator should have the same content, and a size that matches the storage block-size multiplied by the storage block-count.
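As a sketch, a suitably sized plain-text file for the commands below (block size 4096, block count 64) could be generated as follows; storage_content.txt matches the file name used in the initiator command:
# 64 blocks x 4096 bytes = 262144 bytes of printable content
base64 /dev/urandom | head -c $((64 * 4096)) > storage_content.txt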
Run the storage target:
doca_storage_target_rdma \
-d {{STORAGE_DEV_PCI_ADDR}} \
--listen-port {{STORAGE_TCP_PORT}} \
--block-size 4096 \
--block-count 64 \
--cpu 0
Run the zero copy service:
doca_storage_comch_to_rdma_zero_copy \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
--cpu 0
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_only_data_validity_test \
--storage-plain-content storage_content.txt \
--task-count 64 \
--batch-size 4 \
--cpu 0
Run Read Throughput Test
Repeat step 1 from the previous section.
Repeat step 2 from the previous section.
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_throughput_test \
--run-limit-operation-count 1000000 \
--task-count 64 \
--batch-size 4 \
--cpu 0
The GGA offload use case demonstrates how to use the DOCA Comch, DOCA Core, DOCA RDMA, DOCA Erasure Coding and DOCA Compress libraries to leverage hardware acceleration for a storage solution that provides transparent compression, recovery, and data mirroring.
The GGA offload use case involves the following applications:
Initiator – doca_storage_initiator_comch: A simple benchmark/client used to configure and interact with the storage by issuing read or write operations.
Target – doca_storage_target_rdma: A basic storage application that uses memory as a backend instead of a physical disk. It performs RDMA operations to fulfill the initiator's requests.
Service – doca_storage_comch_to_rdma_gga_offload: A service that abstracts the initiator from the underlying storage target details and implementation.
Application Deployment
The deployment includes five instances:
One instance of doca_storage_initiator_comch
One instance of doca_storage_comch_to_rdma_gga_offload
Three instances of doca_storage_target_rdma, serving as the data1, data2, and parity storage targets
Each storage instance holds either data1, data2, or parity content.

The memory layout for this use case is more complex. The initiator allocates a memory region partitioned into data blocks. The service allocates a larger region divided into half-blocks (data1, data2, and parity). Each target allocates a memory region sized as (block count × block size) to represent a full storage replica. RDMA is used to transfer data between targets and the service, while cross-PCIe memory access via DOCA mmap (DOCA Core: mmap from export) is used to decompress recovered or intact blocks into the initiator's memory. During recovery, the service combines data1 and parity to reconstruct data2 (or vice versa). The resulting data is then decompressed into the initiator's memory.

Normal Read Flow
The normal read data flow for the GGA offload use-case consists of the following steps:
The initiator sends a read/write request to the service via doca_comch_producer.
The service receives the request via doca_comch_consumer.
The service sends IO requests to both data1 and data2 storage targets via doca_rdma.
Each storage target performs an RDMA transfer from storage memory to initiator memory.
Each target sends an IO response to the service via doca_rdma.
Upon receiving both responses, the service examines the binary block headers to determine the compressed sizes.
The service decompresses the data via doca_compress using LZ4.
The service forwards an IO response to the initiator via doca_comch_producer.
The initiator receives the response via doca_comch_consumer.
Recovery Read Flow
The recovery read data flow for the GGA offload use-case consists of the following steps:
The initiator sends a read/write request to the service via doca_comch_producer.
The service receives the request via doca_comch_consumer.
The service creates two IO requests, directed to either the data_1 and data_p or the data_2 and data_p storage targets, via doca_rdma:
To recover data_1, requests are sent to data_2 and data_p
To recover data_2, requests are sent to data_1 and data_p
Each storage target performs an RDMA write from storage memory to the initiator memory.
Each storage target sends an IO response message to the service via doca_rdma.
The service receives both IO response messages via doca_rdma.
The service uses doca_ec to reconstruct the missing half block from the available data and parity (a conceptual sketch of this recovery follows these steps). It then inspects the block header to determine the compressed size.
The service decompresses the data via doca_compress using LZ4.
The service forwards the IO response to the initiator via doca_comch_producer.
The initiator receives the response via doca_comch_consumer.
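As a conceptual, single-byte illustration of why one parity half-block suffices to rebuild either data half-block, consider XOR parity (an assumption made purely for illustration; doca_ec implements the actual erasure-coding scheme in hardware):
d1=0x5A; d2=0x3C                 # the two data halves, reduced to one byte each
p=$(( d1 ^ d2 ))                 # parity, as produced by the sbc generator
recovered=$(( d1 ^ p ))          # data_2 lost: XOR the survivors to rebuild it
printf 'parity=0x%02X recovered=0x%02X expected=0x%02X\n' $p $recovered $d2
The same identity, applied byte-wise across a half-block, is what makes the recovery read flow possible without contacting the lost target.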
Running the GGA Offload Use Case
It is assumed that:
The service and storage targets communicate via TCP
The applications run on BlueField-3 DPUs
Full-rate RDMA data transfer is achievable between service and targets
All examples use CPU 0. Users should verify whether this is optimal by considering NUMA affinity, core pinning, and kernel interrupt distribution.
Running this use case involves launching the following applications, in order:
doca_storage_target_rdma for data_1
doca_storage_target_rdma for data_2
doca_storage_target_rdma for data_p
doca_storage_comch_to_rdma_gga_offload
doca_storage_initiator_comch
Replace the following placeholders with actual values:
{{INITIATOR_DEV_PCI_ADDR}}
{{STORAGE_DEV_PCI_ADDR}}
{{STORAGE_IP_ADDR_DATA_1}}
{{STORAGE_IP_ADDR_DATA_2}}
{{STORAGE_IP_ADDR_DATA_P}}
{{STORAGE_TCP_PORT_DATA_1}}
{{STORAGE_TCP_PORT_DATA_2}}
{{STORAGE_TCP_PORT_DATA_P}}
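As with the zero copy use case, these could be populated as shell variables (illustrative values only; each target needs its own reachable address and port):
STORAGE_IP_ADDR_DATA_1=192.168.1.21; STORAGE_TCP_PORT_DATA_1=12341
STORAGE_IP_ADDR_DATA_2=192.168.1.22; STORAGE_TCP_PORT_DATA_2=12342
STORAGE_IP_ADDR_DATA_P=192.168.1.23; STORAGE_TCP_PORT_DATA_P=12343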
Generate SBC File
doca_storage_gga_offload_sbc_generator is compiled only if the liblz4 dev libraries are present.
doca_storage_gga_offload_sbc_generator \
-d 3b:00.0 \
--original-input-data storage_content.txt \
--block-size 4096 \
--data-1 data_1.sbc \
--data-2 data_2.sbc \
--data-p data_p.sbc
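If generation succeeds, the three output files named by the flags above should exist; a quick way to confirm before starting the targets:
ls -l data_1.sbc data_2.sbc data_p.sbc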
Run Read Data Validity Test
Run three storage targets:
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 0
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 0
Run the offload service:
doca_storage_comch_to_rdma_gga_offload \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
--data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
--data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
--trigger-recovery-read-every-n 0 \
--cpu 0
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_only_data_validity_test \
--storage-plain-content storage_content.txt \
--task-count 64 \
--batch-size 4 \
--cpu 0
Run Read Throughput Test
Repeat step 1 from the previous section.
Repeat step 2 from the previous section.
Run the initiator:
doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_throughput_test \
--run-limit-operation-count 1000000 \
--task-count 64 \
--batch-size 4 \
--cpu 0
These applications leverage the following DOCA libraries:
DOCA Comch
DOCA Core
DOCA RDMA
DOCA Erasure Coding
DOCA Compress
Refer to their respective programming guides for more information.
Refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.
The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application. For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.
The sources of the applications can be found under the application's directory: /opt/mellanox/doca/applications/storage/.
doca_storage_gga_offload_sbc_generator is compiled only if the liblz4 dev libraries are present.
Compiling All Applications
All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.
To build all the applications together, run:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The storage applications are:
doca_storage_comch_to_rdma_gga_offload
doca_storage_comch_to_rdma_zero_copy
doca_storage_gga_offload_sbc_generator
doca_storage_initiator_comch
doca_storage_target_rdma
The applications are built under /tmp/build/storage/.
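To confirm the five binaries listed above were produced:
ls /tmp/build/storage/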
Compiling Storage Applications Only
To directly build only the storage applications:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_storage=true
ninja -C /tmp/build
The storage applications are built under /tmp/build/storage/.
Alternatively, you can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:
Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:
Set enable_all_applications to false
Set enable_storage to true
Run the following compilation commands:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The storage applications are created under /tmp/build/storage/.
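A third option, once a build directory exists, is to change the options with standard meson tooling instead of editing any file (a sketch using meson configure):
meson configure /tmp/build -Denable_all_applications=false -Denable_storage=true
ninja -C /tmp/build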
Troubleshooting
Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide if you encounter any issues with the compilation of the application.
Refer to each use case above for details on how to run the applications.