DOCA Storage Applications
This page outlines NVIDIA DOCA storage applications that demonstrate how to develop data storage implementations on the NVIDIA® BlueField® platform.
The DOCA storage applications serve as a reference for how a block storage solution can be developed using the DOCA framework on the NVIDIA BlueField platform.
These applications are simplified so that the flow and overall process are easy to understand. A real-world storage solution involves significantly more complexity and must be tailored to specific use cases, which this reference design does not attempt.
The general architecture involves an initiator host application sending storage requests to a local DPU application. The DPU then interfaces with a remote storage target using RDMA, orchestrating the storage and retrieval of data directly to or from the initiator application's memory.
The storage reference consists of several applications that can be orchestrated in different combinations to support varying storage approaches and implementations. Each application is described at a high level below, followed by detailed use case examples highlighting how these applications can be combined.
doca_storage_initiator_comch
The doca_storage_initiator_comch application serves two purposes. First, it acts as the user/client of the storage reference application that controls the execution flow. Second, it includes benchmarking capabilities to demonstrate the achievable performance of the storage solution. The application supports the following tests:
Read throughput performance test
Write throughput performance test
Read-only data validation test
Write-then-read data validation test
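For reference, these tests correspond to the values accepted by the initiator's --execution-strategy option, as used in the run examples later on this page:

--execution-strategy read_throughput_test           # Read throughput performance test
--execution-strategy write_throughput_test          # Write throughput performance test
--execution-strategy read_only_data_validity_test   # Read-only data validation test
--execution-strategy read_write_data_validity_test  # Write-then-read data validation test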
doca_storage_target_rdma
The doca_storage_target_rdma application provides a simple storage backend by utilizing a block of memory instead of interfacing with physical non-volatile storage. It includes the following features:
Configurable block count
Configurable block size
Import of initial data from a text file
Import of initial data from a binary format (.sbc)
doca_storage_comch_to_rdma_zero_copy
The doca_storage_comch_to_rdma_zero_copy application serves as a bridge between the initiator and a single storage target. It facilitates the exchange of control path and data path messages between the initiator and the target. The data exchanged between the initiator and the target is never visible to this application, as it is transferred directly between the initiator and the target via RDMA; this is why the application is referred to as zero copy. It offers the following features:
Facilitates the zero-copy transfer of data between the initiator and the target
doca_storage_comch_to_rdma_gga_offload
The doca_storage_comch_to_rdma_gga_offload application serves as a bridge between the initiator and three storage targets. Unlike the zero copy service, it takes an active part in the data transfer and uses DOCA libraries to accelerate certain data operations transparently to the initiator and the targets. It offers the following features:
Transparent data redundancy and error correction
Inline compression and decompression of stored data
Inline creation of redundancy data
Inline recovery of lost data blocks
Configuration of how frequently a simulated data loss occurs
doca_storage_gga_offload_sbc_generator
The doca_storage_gga_offload_sbc_generator application is used to generate Storage Binary Content (.sbc) files. These files are consumed by the doca_storage_target_rdma application when it is used with the doca_storage_comch_to_rdma_gga_offload application.
Using a user-supplied data file, the generator:
Splits the input data into chunks.
Compresses each chunk.
Creates error correction blocks for each compressed block.
The output is stored in three files (named by the user), which can be used to initialize the storage with valid data upon startup:
Data1 file (data_1)
Data2 file (data_2)
Data-parity file (data_p)
Zero Copy Use Case
This use case demonstrates a simple, hardware-accelerated storage solution that avoids unnecessary data copying by using the DOCA Comch, DOCA Core, and DOCA RDMA libraries. Data is transferred directly between the initiator's memory and the storage target's memory via DOCA RDMA. The initiator has no awareness of the target; it only communicates with the service.
Application Components
This use case involves three distinct applications:
Initiator (doca_storage_initiator_comch): A client/benchmark application that sends read and write requests to the storage backend.
Service (doca_storage_comch_to_rdma_zero_copy): A bridge that abstracts the target's details from the initiator. It relays IO messages between the two.
Target (doca_storage_target_rdma): A memory-based storage application that executes RDMA operations to fulfill the initiator's requests.
Operational Flow
The use case operates in two main phases.
Setup Phase
The applications first use DOCA Comch and a TCP socket to register with each other, share configuration, and prepare the data path operations.
Data Path Phase
Once configured, the data path contexts (doca_comch_producer, doca_comch_consumer, and doca_rdma) are used to perform the actual storage requests and send completion responses.
Memory Layout
The memory layout is simple: both the initiator and the target allocate memory regions partitioned into blocks. All RDMA read/write operations transfer data directly between these memory regions.
The number of blocks on the initiator and target does not need to match, but their block size must be identical.
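As a hypothetical illustration of this rule, a target started with --block-size 4096 --block-count 1024 (as in the examples below) exposes 4 MiB of storage, while the initiator may carve its own region into a different number of 4096-byte blocks:

# Hypothetical sizing: block counts may differ, block sizes must match
echo $((4096 * 1024))   # target capacity: 4194304 bytes (4 MiB)
echo $((4096 * 256))    # a smaller initiator region is still valid, because its block size is also 4096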
Data Flow
The data flow for a storage operation is as follows:
1. The Initiator sends a read/write IO request to the Service (via doca_comch_producer).
2. The Service receives the request (via doca_comch_consumer).
3. The Service forwards the request to the Target (via doca_rdma).
4. The Target performs the RDMA read or write operation, transferring data directly between its memory and the Initiator's memory.
5. The Target sends an IO response (completion) to the Service (via doca_rdma).
6. The Service receives the IO response.
7. The Service forwards the response to the Initiator (via doca_comch_producer).
8. The Initiator receives the final response (via doca_comch_consumer).
Running the Zero Copy Use Case
Assumptions:
The service and storage applications can communicate via TCP
The service is deployed on a BlueField-3 DPU
Full line-rate RDMA data transfer is possible between service and storage
The examples below use CPU 0. Users should ensure that this CPU is suitable, considering NUMA configuration, pinned cores, kernel interrupts, etc.
Run the applications in the following order:
1. doca_storage_target_rdma
2. doca_storage_comch_to_rdma_zero_copy
3. doca_storage_initiator_comch
Replace the following placeholders with actual values:
{{INITIATOR_DEV_PCI_ADDR}}
{{STORAGE_DEV_PCI_ADDR}}
{{STORAGE_IP_ADDR}}
{{STORAGE_TCP_PORT}}
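For example, a run on a hypothetical setup might substitute the placeholders as follows (all values are illustrative and must be replaced with your own PCI addresses, IP address, and TCP port):

# Hypothetical substitution example:
#   {{STORAGE_DEV_PCI_ADDR}} -> 3b:00.0
#   {{STORAGE_IP_ADDR}}      -> 192.168.1.10
#   {{STORAGE_TCP_PORT}}     -> 12345
doca_storage_target_rdma \
-d 3b:00.0 \
--listen-port 12345 \
--block-size 4096 \
--block-count 1024 \
--cpu 0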
Run Write Data Validity Test
1. Run the storage target:

doca_storage_target_rdma \
-d {{STORAGE_DEV_PCI_ADDR}} \
--listen-port {{STORAGE_TCP_PORT}} \
--block-size 4096 \
--block-count 1024 \
--cpu 0
2. Run the zero copy service:

doca_storage_comch_to_rdma_zero_copy \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
--cpu 0
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_write_data_validity_test \
--run-limit-operation-count 10000 \
--cpu 0
Run Write Throughput Test
1. Repeat step 1 from the previous section.
2. Repeat step 2 from the previous section.
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy write_throughput_test \
--run-limit-operation-count 1000000 \
--cpu 0
Run Read Data Validity Test
The file provided to the storage target and the initiator must have the same content, and its size must equal the storage block-size multiplied by the storage block-count.
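A content file of the required size can be created with standard tools. A minimal sketch, assuming the 4096-byte block size and 1024-block count used in the commands below (4 MiB in total) and the file name storage_content.txt used by the following steps:

# Generate 4 MiB (4096 * 1024 bytes) of printable text content
base64 /dev/urandom | head -c $((4096 * 1024)) > storage_content.txt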
1. Run the storage target:

doca_storage_target_rdma \
-d {{STORAGE_DEV_PCI_ADDR}} \
--listen-port {{STORAGE_TCP_PORT}} \
--block-size 4096 \
--block-count 1024 \
--binary-content storage_content.txt \
--cpu 0
2. Run the zero copy service:

doca_storage_comch_to_rdma_zero_copy \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--storage-server {{STORAGE_IP_ADDR}}:{{STORAGE_TCP_PORT}} \
--cpu 0
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_only_data_validity_test \
--storage-plain-content storage_content.txt \
--run-limit-operation-count 10000 \
--cpu 0
Run Read Throughput Test
1. Repeat step 1 from the previous section.
2. Repeat step 2 from the previous section.
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_throughput_test \
--run-limit-operation-count 1000000 \
--cpu 0
GGA Offload Use Case
This use case demonstrates how to leverage hardware acceleration for a storage solution that provides transparent compression, recovery, and data mirroring. It uses a combination of the DOCA Comch, DOCA Core, DOCA RDMA, DOCA Erasure Coding, and DOCA Compress libraries.
Architecture and Components
The GGA offload use case involves three applications running as five distinct instances:
1x Initiator (doca_storage_initiator_comch): A client/benchmark application that sends read and write requests.
1x Service (doca_storage_comch_to_rdma_gga_offload): A bridge that abstracts the storage details and orchestrates the offload operations (compression, erasure coding).
3x Targets (doca_storage_target_rdma): Three instances of a memory-based storage application, one for each data "shard":
Target 1 (data_1): Stores 50% of the logical data.
Target 2 (data_2): Stores the other 50% of the logical data.
Target 3 (data_p): Stores parity information for redundancy, allowing recovery if data_1 or data_2 is lost.
Operational Flow
The use case operates in two main phases.
Setup Phase
The applications use DOCA Comch and three TCP sockets (one for each target) to register with each other, share configuration, and prepare the data path contexts.
Data Path Phase
Once configured, the contexts are used to perform storage requests and signal completions using doca_comch_producer, doca_comch_consumer, doca_rdma, doca_dma, doca_ec, and doca_compress.
Memory Layout
The memory layout for this use case is more complex:
Initiator: Allocates a memory region partitioned into data blocks.
Service: Allocates a larger region where each block is twice the size of the initiator's blocks. This provides space to perform compression, decompression, and parity calculations.
Storage Targets: Each of the three targets allocates blocks that are 50% of the initiator's block size. All targets must use the same block size and count.
Data is transferred between the targets and the service via RDMA. Cross-PCIe memory access via DOCA mmap (mmap from Export) is used for data transfers between the service (on the DPU) and the initiator (on the host).
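For example, with a hypothetical 4096-byte initiator block, the service works on 8192-byte blocks while each of the three targets stores 2048-byte blocks:

# Hypothetical block sizes derived from a 4096-byte initiator block
echo $((4096 * 2))   # service block: 8192 bytes (2x the initiator block)
echo $((4096 / 2))   # target block:  2048 bytes (50% of the initiator block, identical on all three targets)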
The flow of data through this layout is as follows:
During Write Operations: DOCA DMA is used to transfer initiator blocks into the service's larger storage blocks (using the cross-PCIe DOCA mmap). The block's content is then compressed, parity data is calculated, and the resulting data (data_1, data_2, data_p) is transferred to the storage targets via RDMA.
During Normal Read Operations: Data is transferred from the targets to the service blocks via RDMA. The data is then decompressed from the service blocks directly into the initiator's blocks through the cross-PCIe DOCA mmap.
During Recovery Read Operations: The service uses RDMA to fetch the valid data half and the parity data into its temporary block space, combines them to reconstruct the missing data, and then decompresses the restored data into the initiator's block as normal.
Normal Read Flow
The standard read flow, when all targets are available, is as follows:
1. The Initiator sends a read request to the Service (via doca_comch_producer).
2. The Service receives the request (via doca_comch_consumer).
3. The Service sends two IO requests (via doca_rdma) to the data_1 and data_2 Targets.
4. Both Targets perform RDMA transfers, sending their storage blocks to the Service's blocks.
5. Both Targets send an IO response to the Service (via doca_rdma).
6. The Service, upon receiving both responses, examines the block metadata to determine the compressed data size.
7. The Service decompresses the data (via doca_compress using LZ4), using the cross-PCIe DOCA mmap feature to write the decompressed data directly into the Initiator's memory.
8. The Service forwards an IO response to the Initiator (via doca_comch_producer).
9. The Initiator receives the final response (via doca_comch_consumer).
Recovery Read Flow
If a data target is unavailable, the service performs a recovery read:
1. The Initiator sends a read request to the Service (via doca_comch_producer).
2. The Service receives the request (via doca_comch_consumer).
3. The Service sends two IO requests (via doca_rdma) to the available data target and the parity target:
   To recover data_1: send requests to data_2 and data_p.
   To recover data_2: send requests to data_1 and data_p.
4. Both Targets perform RDMA transfers to the Service's blocks.
5. Both Targets send an IO response to the Service (via doca_rdma).
6. The Service uses doca_ec to reconstruct the missing data block from the available data and parity blocks.
7. The Service examines the recovered block's metadata to determine the compressed data size.
8. The Service decompresses the recovered data (via doca_compress), using the cross-PCIe DOCA mmap to write the data directly into the Initiator's memory.
9. The Service forwards an IO response to the Initiator (via doca_comch_producer).
10. The Initiator receives the final response (via doca_comch_consumer).
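For testing, the recovery path does not require actually stopping a target: the service's --trigger-recovery-read-every-n option (see "Configuration of how frequently a simulated data loss occurs" in the feature list above) simulates a lost data block periodically. A minimal sketch, assuming a value of 4 causes every 4th read to take the recovery path; the exact semantics of the value should be confirmed against the application's help output:

# Assumed behavior: simulate a data loss on every 4th read (the value 4 is illustrative)
doca_storage_comch_to_rdma_gga_offload \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
--data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
--data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
--trigger-recovery-read-every-n 4 \
--cpu 0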
Write Flow
The write path is only available if the doca_storage_comch_to_rdma_gga_offload service is compiled with the liblz4 dev libraries.
1. The Initiator sends a write request to the Service (via doca_comch_producer).
2. The Service receives the request (via doca_comch_consumer).
3. The Service fetches the data block from the Initiator using doca_dma via the cross-PCIe DOCA mmap.
4. The Service compresses the data and packages it into a block with metadata.
5. If the data is not compressible enough, an error is returned to the Initiator (via doca_comch_producer):
   The Initiator receives the error response (via doca_comch_consumer).
   The flow ends.
6. The Service sends three IO requests (via doca_rdma) to the Targets to write data_1, data_2, and data_p.
7. The Service waits for all three IO responses.
8. The Service forwards a final IO response to the Initiator (via doca_comch_producer).
9. The Initiator receives the response (via doca_comch_consumer).
Running the GGA Offload Use Case
It is assumed that:
The service and storage applications can communicate via TCP
The service (doca_storage_comch_to_rdma_gga_offload) is deployed on a BlueField-3 DPU
Full line-rate RDMA data transfer is possible between the service and the storage targets
All examples use CPU 0. Users should verify if this is optimal by considering NUMA affinity, core pinning, and kernel interrupt distribution.
Execution Order
Running this use case involves running the applications in this specific order:
1. doca_storage_target_rdma (for data_1)
2. doca_storage_target_rdma (for data_2)
3. doca_storage_target_rdma (for data_p)
4. doca_storage_comch_to_rdma_gga_offload (the service)
5. doca_storage_initiator_comch (the initiator)
Placeholders
Replace the following placeholders in the commands with actual values:
{{INITIATOR_DEV_PCI_ADDR}}
{{STORAGE_DEV_PCI_ADDR}}
{{STORAGE_IP_ADDR_DATA_1}}
{{STORAGE_IP_ADDR_DATA_2}}
{{STORAGE_IP_ADDR_DATA_P}}
{{STORAGE_TCP_PORT_DATA_1}}
{{STORAGE_TCP_PORT_DATA_2}}
{{STORAGE_TCP_PORT_DATA_P}}
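As in the zero copy use case, these values are environment-specific. A purely illustrative mapping, where the three targets differ only by IP address and TCP port, might look like:

# Hypothetical example values -- replace with your own setup
#   {{INITIATOR_DEV_PCI_ADDR}}  -> 3b:00.0
#   {{STORAGE_DEV_PCI_ADDR}}    -> 3b:00.0
#   {{STORAGE_IP_ADDR_DATA_1}}  -> 192.168.1.11
#   {{STORAGE_IP_ADDR_DATA_2}}  -> 192.168.1.12
#   {{STORAGE_IP_ADDR_DATA_P}}  -> 192.168.1.13
#   {{STORAGE_TCP_PORT_DATA_1}} -> 12341
#   {{STORAGE_TCP_PORT_DATA_2}} -> 12342
#   {{STORAGE_TCP_PORT_DATA_P}} -> 12343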
Generate SBC File
doca_storage_gga_offload_sbc_generator is not compiled if the liblz4 dev libraries are not present.
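The generator requires an existing input data file. If one is not already available, a sample text file can be created first; this sketch assumes a 4 MiB file (a multiple of the 4096-byte block size used below) named storage_content.txt to match the commands in this section:

# Create a hypothetical 4 MiB text input file for the generator
base64 /dev/urandom | head -c $((4096 * 1024)) > storage_content.txt

Then run the generator: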
doca_storage_gga_offload_sbc_generator \
-d 3b:00.0 \
--original-input-data storage_content.txt \
--block-size 4096 \
--data-1 data_1.sbc \
--data-2 data_2.sbc \
--data-p data_p.sbc
Run Read Data Validity Test
1. Run three storage targets:

doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 1
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 2

2. Run the offload service:

doca_storage_comch_to_rdma_gga_offload \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
--data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
--data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
--trigger-recovery-read-every-n 0 \
--cpu 0

3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_only_data_validity_test \
--storage-plain-content storage_content.txt \
--run-limit-operation-count 20000 \
--cpu 0
Run Read Throughput Test
1. Repeat step 1 from the "Read Data Validity Test".
2. Repeat step 2 from the "Read Data Validity Test".
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_throughput_test \
--storage-plain-content storage_content.txt \
--run-limit-operation-count 1000000 \
--cpu 0
Run Write Data Validity Test
1. Run three storage targets:

doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_1}} --binary-content data_1.sbc --cpu 0
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_2}} --binary-content data_2.sbc --cpu 1
doca_storage_target_rdma -d {{STORAGE_DEV_PCI_ADDR}} --listen-port {{STORAGE_TCP_PORT_DATA_P}} --binary-content data_p.sbc --cpu 2

2. Run the offload service:

doca_storage_comch_to_rdma_gga_offload \
-d 03:00.0 \
-r {{INITIATOR_DEV_PCI_ADDR}} \
--data-1-storage {{STORAGE_IP_ADDR_DATA_1}}:{{STORAGE_TCP_PORT_DATA_1}} \
--data-2-storage {{STORAGE_IP_ADDR_DATA_2}}:{{STORAGE_TCP_PORT_DATA_2}} \
--data-p-storage {{STORAGE_IP_ADDR_DATA_P}}:{{STORAGE_TCP_PORT_DATA_P}} \
--cpu 0

3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy read_write_data_validity_test \
--storage-plain-content storage_content.txt \
--run-limit-operation-count 20000 \
--cpu 0
Run Write Throughput Test
1. Repeat step 1 from the "Run Write Data Validity Test".
2. Repeat step 2 from the "Run Write Data Validity Test".
3. Run the initiator:

doca_storage_initiator_comch \
-d {{INITIATOR_DEV_PCI_ADDR}} \
--execution-strategy write_throughput_test \
--run-limit-operation-count 50000 \
--cpu 0
These applications leverage the following DOCA libraries:
DOCA Comch
DOCA Core
DOCA DMA
DOCA RDMA
DOCA Erasure Coding
DOCA Compress
Refer to their respective programming guides for more information.
Refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.
The installation of DOCA's reference applications contains the sources of the applications, alongside the matching compilation instructions. This allows for compiling the applications "as-is" and provides the ability to modify the sources, then compile a new version of the application. For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.
The sources of the application can be found under the application's directory: /opt/mellanox/doca/applications/storage/.
doca_storage_gga_offload_sbc_generator is not compiled when liblz4 dev libraries are not present.
The write path is only available if the doca_storage_comch_to_rdma_gga_offload service is compiled with the liblz4 dev libraries.
Compiling All Applications
All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.
To build all the applications together, run:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The storage applications are:
doca_storage_comch_to_rdma_gga_offload
doca_storage_comch_to_rdma_zero_copy
doca_storage_gga_offload_sbc_generator
doca_storage_initiator_comch
doca_storage_target_rdma
The applications are built under /tmp/build/storage/.
Compiling Storage Applications Only
To directly build only the storage applications:
cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_storage=true
ninja -C /tmp/build
The storage applications are built under /tmp/build/storage/.
Alternatively, you can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:
Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:
Set enable_all_applications to false
Set enable_storage to true
Run the following compilation commands:
cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build
The storage applications are created under /tmp/build/storage/.
Troubleshooting
Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide if you encounter any issues with the compilation of the application.
Refer to each use case above for details on how to run the applications.