DOCA Storage Comch to RDMA GGA Offload Application Guide
The doca_storage_comch_to_rdma_gga_offload application acts as a bridge between the initiator and three storage targets. It actively participates in data transfer and leverages doca_libs to accelerate certain data operations transparently to both the initiator and the targets.
The doca_storage_comch_to_rdma_gga_offload application performs the following key functions:
Orchestrates communication between the initiator and the three storage instances
Periodically triggers data recovery flows to simulate data loss
Performs data recovery using erasure coding
Performs inline data decompression
To accomplish these tasks, the application establishes TCP connections to three storage targets and listens for an incoming connection from a single initiator using doca_comch_server.
The doca_storage_comch_to_rdma_gga_offload application is divided into two main functional areas:
Control-time and shared resources
Per-thread data path resources

The application execution follows two primary phases:
Control phase
Data path phase
Control Phase
This phase begins with establishing connections to the required storage targets, followed by awaiting a client connection. Once all connections are in place, the application waits for specific control commands:
Query storage
Init storage
Start storage
Each control command is processed using the following sequence:
Relay the command to each connected storage target.
Wait for responses from all storage targets.
Perform required post-processing and consistency checks on the responses.
Send a response back to the client.
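The relay, wait, check, and respond sequence above can be sketched as follows. This is a minimal illustration, not the application's code: control_response, target_fn, and relay_and_check are hypothetical names, and each target is modeled as a callable that blocks until its reply arrives, so the sketch contacts the targets one after another rather than in parallel.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a storage target's reply to a control command.
struct control_response {
    bool ok;             // target reported success
    std::uint64_t value; // payload to cross-check, e.g. a reported storage size
};

// Hypothetical callable: relays one command to one target and blocks for its reply.
using target_fn = std::function<control_response()>;

// Relay the command to every target, then require that all replies succeeded
// and agree with each other. Returns the agreed value, or std::nullopt when a
// consistency check fails (the caller would then report an error to the client).
std::optional<std::uint64_t> relay_and_check(const std::vector<target_fn>& targets)
{
    assert(!targets.empty());

    std::vector<control_response> responses;
    for (const auto& send : targets)
        responses.push_back(send());

    for (const auto& response : responses) {
        if (!response.ok || response.value != responses.front().value)
            return std::nullopt;
    }
    return responses.front().value;
}
```

The returned value is what a real implementation would place in the response sent back to the client.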
Issuing the start storage command initiates the data path phase. While the data threads begin execution, the main thread continues to wait for final control commands to complete the application's lifecycle:
Stop storage
Shutdown
Data Path Phase
This phase is executed per thread and involves each thread performing I/O operations requested by the client. For a read I/O operation, one of two flows is used:
Regular read (no recovery)
Recovery read
By default, only the regular read flow is executed. However, the user can enable periodic recovery reads; see the "Command-line Flags" section for details.
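The choice between the two flows can be pictured as a simple request counter. The sketch below assumes the semantics described for the --trigger-recovery-read-every-n flag (0 disables recovery reads, 1 triggers one on every request, 10 on one request out of every ten); the class name recovery_trigger and its layout are hypothetical and are not taken from the application source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides per request whether the recovery read flow is used.
class recovery_trigger {
public:
    explicit recovery_trigger(std::uint32_t every_n) : m_every_n{every_n} {}

    // Returns true when the current request should take the recovery read flow.
    bool next_request_uses_recovery()
    {
        if (m_every_n == 0)
            return false; // recovery reads disabled

        assert(m_count < m_every_n); // invariant: the counter is reset on every trigger
        if (++m_count < m_every_n)
            return false; // regular read

        m_count = 0; // Nth request reached: reset and trigger a recovery read
        return true;
    }

private:
    std::uint32_t m_every_n;
    std::uint32_t m_count = 0;
};
```

A worker would call next_request_uses_recovery() once per incoming read request and pick the flow accordingly.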
Regular Read Data Flow
The regular read flow consists of the stages detailed in the following subsections.
1. Initiator Request
The initiator sends an I/O request to the GGA offload application.
The GGA offload application translates the I/O address from the initiator's memory range to the corresponding GGA offload memory range.
For example, a read at offset 8 KB in initiator memory is remapped to <GGA-offload-base> + 8 KB.
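The remapping amounts to preserving the offset while swapping the base address. The following sketch shows that arithmetic only; memory_map and translate_io_address are hypothetical names, and the base addresses used in any example are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the two memory ranges involved in the translation.
struct memory_map {
    std::uint64_t initiator_base;   // start of the initiator's memory range
    std::uint64_t gga_offload_base; // start of the GGA offload memory range
};

// Keep the offset within the range, replace the base address.
inline std::uint64_t translate_io_address(const memory_map& map, std::uint64_t initiator_addr)
{
    assert(initiator_addr >= map.initiator_base);
    const std::uint64_t offset = initiator_addr - map.initiator_base;
    return map.gga_offload_base + offset;
}
```

With this helper, an initiator address 8 KB past initiator_base maps to gga_offload_base plus 8 KB, matching the example above.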

2. RDMA Transfers
The GGA offload application sends the adjusted read requests to the data 1 and data 2 storage targets.
Each storage target performs an RDMA write into the GGA offload memory region at the designated offsets.

3. Target Responses
Both storage targets send I/O responses upon completing the RDMA transfers.
The GGA offload application waits until both responses are received.

4. Decompression and Final Response
The application performs decompression using the received data blocks. Decompression output is written directly to the initiator memory using the doca_decompress task.
The application sends an I/O response to the initiator, completing the operation.
Recovery Read Data Flow
This flow is similar to the regular read, but with added steps for error correction using erasure coding. The process involves the stages detailed in the following subsections.
1. Initiator Request
The initiator sends an I/O request to the GGA offload application.
Given that recovery is required, the application:
Translates the I/O address normally.
Modifies one of the two storage requests to target the parity partition instead of a standard data partition.
Adds an output offset field to redirect the parity data into a reserved region of GGA offload memory.

2. RDMA Transfers
Two RDMA transfers are issued:
One to the surviving data partition.
One to the parity partition.
The transferred blocks are written into the GGA offload memory with appropriate alignment.

3. Target Responses
Both storage targets reply with I/O responses.
The GGA offload application waits for both to complete before proceeding.

4. Data Recovery
The application performs recovery using the available half block and parity data to reconstruct the missing block.
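To illustrate why one surviving half block plus parity is enough to rebuild the missing half, the sketch below uses plain XOR parity. This is a deliberately simplified stand-in: the application performs recovery through DOCA erasure coding with a Cauchy or Vandermonde matrix, not through this function, and the names block and xor_blocks are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using block = std::vector<std::uint8_t>;

// XOR two equally sized blocks byte by byte.
// Used both to build the parity (data_1 ^ data_2) and to recover a missing
// half (surviving ^ parity), because x ^ y ^ y == x.
block xor_blocks(const block& a, const block& b)
{
    assert(a.size() == b.size());
    block out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    return out;
}
```

For example, with parity = xor_blocks(data_1, data_2), a lost data_1 is rebuilt as xor_blocks(data_2, parity).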
5. Decompression and Final Response
The recovered data is then decompressed. As with the regular read, decompression output is written directly to initiator memory.
An I/O response is sent to the initiator, completing the operation.
This application leverages the following DOCA libraries:
This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage Applications page.
Application Execution
This application can only run within the NVIDIA® BlueField® DPU.
DOCA Storage Comch to RDMA GGA Offload is provided in source form. Therefore, compilation is required before the application can be executed.
Application usage instructions:
Usage: doca_storage_comch_to_rdma_gga_offload [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse all command flags from an input json file

Program Flags:
  -d, --device                      Device identifier
  -r, --representor                 Device host side representor identifier
  --cpu                             CPU core to which the process affinity can be set
  --data-1-storage                  Storage server addresses in <ip_addr>:<port> format
  --data-2-storage                  Storage server addresses in <ip_addr>:<port> format
  --data-p-storage                  Storage server addresses in <ip_addr>:<port> format
  --matrix-type                     Type of matrix to use. One of: cauchy, vandermonde Default: vandermonde
  --command-channel-name            Name of the channel used by the doca_comch_client. Default: "doca_storage_comch"
  --control-timeout                 Time (in seconds) to wait while performing control operations. Default: 5
  --trigger-recovery-read-every-n   Trigger a recovery read flow every N th request. Default: 0 (disabled)
Info: This usage printout can be printed to the command line using the -h (or --help) option:
./doca_storage_comch_to_rdma_gga_offload -h
For additional information, refer to section "Command-line Flags".
CLI example for running the application on the BlueField:
./doca_storage_comch_to_rdma_gga_offload -d 03:00.0 -r 3b:00.0 --data-1-storage 172.17.0.1:12345 --data-2-storage 172.17.0.2:12345 --data-p-storage 172.17.0.3:12345 --cpu 0
Note: Both the DOCA Comch device PCIe address (03:00.0) and the DOCA Comch device representor PCIe address (3b:00.0) should match the addresses of the desired PCIe devices.
Note: Storage target IP address:port tuples should be updated to refer to the running storage target applications.
The application also supports a JSON-based deployment mode in which all command-line arguments are provided through a JSON file:
./doca_storage_comch_to_rdma_gga_offload --json [json_file]
For example:
./doca_storage_comch_to_rdma_gga_offload --json doca_storage_comch_to_rdma_gga_offload_params.json
Note: Before execution, ensure that the JSON file contains valid configuration parameters, particularly the correct PCIe device addresses required for deployment.
Command-line Flags
Flag Type | Short Flag | Long Flag/JSON Key | Description | JSON Content |
General flags | -h | --help | Print a help synopsis | N/A |
General flags | -v | --version | Print program version information | N/A |
General flags | -l | --log-level | Set the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
General flags | N/A | --sdk-log-level | Set the SDK log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE | |
General flags | -j | --json | Parse all command flags from an input JSON file | N/A |
Program flags | -d | --device | DOCA device identifier. Note: This flag is mandatory. | |
Program flags | -r | --representor | DOCA Comch device representor PCIe address. Note: This flag is mandatory. | |
Program flags | N/A | --cpu | Index of CPU to use. One data path thread is spawned per CPU. Index starts at 0. Note: The user can specify this argument multiple times to create more threads. Note: This flag is mandatory. | |
Program flags | N/A | --data-1-storage | IP address and port to use to establish the control TCP connection to the data 1 storage target. Note: This flag is mandatory. | |
Program flags | N/A | --data-2-storage | IP address and port to use to establish the control TCP connection to the data 2 storage target. Note: This flag is mandatory. | |
Program flags | N/A | --data-p-storage | IP address and port to use to establish the control TCP connection to the parity storage target. Note: This flag is mandatory. | |
Program flags | N/A | --matrix-type | Type of matrix to use. One of: cauchy, vandermonde. Default: vandermonde | |
Program flags | N/A | --command-channel-name | Allows customizing the server name used for this application instance if multiple comch servers exist on the same device. Default: "doca_storage_comch" | |
Program flags | N/A | --control-timeout | Time, in seconds, to wait while performing control operations. Default: 5 | |
Program flags | N/A | --trigger-recovery-read-every-n | Trigger a recovery read flow every Nth request. Set to 0 to disable recovery reads entirely. For example, a value of 1 triggers a recovery read for every request, while a value of 10 triggers a recovery read for one out of every ten requests. Default: 0 (disabled) | |
Troubleshooting
Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.
Control Phase
-
gga_offload_app app{parse_cli_args(argc, argv)};
Parse CLI arguments, apply default values, and create the application instance.
-
app.connect_to_storage();
Connect to each of the three storage targets over TCP.
-
app.wait_for_comch_client_connection();
Create a doca_comch_server instance and wait for a doca_comch_client to connect.
-
app.wait_for_and_process_query_storage();
Wait for the initiator to send a query storage control message, then:
Send a query storage message to each storage target
Verify that each storage target reports the same storage size
Send a query storage response back to the initiator
-
app.wait_for_and_process_init_storage();
Wait for the initiator to send an init storage control message, then:
Verify that the requested core count does not exceed the available cores
Create and export storage memory
Send an init storage message to each storage target
Wait for responses from all storage targets
Create data path resources:
Worker threads
IO message memory regions
doca_pe objects
doca_comch_consumer objects
doca_comch_producer objects
doca_rdma connection objects
doca_ec objects
doca_compress objects
Send an init storage response
-
app.wait_for_and_process_start_storage();
Wait for the initiator to send a start storage control message, then:
Send a start storage message to each storage target
Wait for responses from all storage targets
Create task objects
Submit listening tasks (doca_comch_consumer and RDMA receive tasks)
Signal worker threads to begin processing
Send a start storage response
-
app.wait_for_and_process_stop_storage();
Wait for the initiator to send a stop storage control message (test complete), then:
Send a stop storage message to each storage target
Wait for responses from all storage targets
Signal worker threads to stop
Gather and post-process execution statistics
Destroy doca_comch_consumer objects
Destroy doca_comch_producer objects
Send a stop storage response
-
app.wait_for_and_process_shutdown();
Wait for the initiator to send a shutdown control message, then:
Send a shutdown message to each storage target
Wait for responses from all storage targets
Destroy all remaining data path objects
Send a shutdown response
-
app.display_stats();
Display collected statistics and destroy all control path objects.
Data Path Phase
-
while (m_hot_data.run_flag == false) {
    std::this_thread::yield();
    if (m_hot_data.error_flag)
        return;
}
The main data thread enters a spin-wait loop, yielding execution until all threads and resources are initialized. If an error is detected (error_flag is set), the thread exits early.
-
while (m_hot_data.run_flag) {
    doca_pe_progress(m_hot_data.pe) ? ++(m_hot_data.pe_hit_count) : ++(m_hot_data.pe_miss_count);
}
Once started, the thread enters a tight loop, continuously polling the progress engine (doca_pe_progress). Each iteration updates the hit/miss counters based on whether any task completions were triggered. This loop drives the data path by processing task completions as fast as possible.
-
while (m_hot_data.error_flag == false && m_hot_data.in_flight_transaction_count != 0) {
    doca_pe_progress(m_hot_data.pe) ? ++(m_hot_data.pe_hit_count) : ++(m_hot_data.pe_miss_count);
}
This final loop ensures that all in-flight transactions complete before exiting. It continues polling the progress engine as long as there are active transactions and no error has occurred.
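The three loops can be combined into one self-contained sketch that runs without DOCA. It assumes a hypothetical hot_data_sketch structure and replaces doca_pe_progress with a caller-supplied callable that returns true when at least one completion was processed; the use of std::atomic for the flags is also an assumption of this sketch, not a statement about the application's types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical mirror of the per-thread state used by the loops above.
struct hot_data_sketch {
    std::atomic<bool> run_flag{false};
    std::atomic<bool> error_flag{false};
    std::uint32_t in_flight_transaction_count = 0;
    std::uint64_t pe_hit_count = 0;
    std::uint64_t pe_miss_count = 0;
};

// progress stands in for polling the progress engine once.
void worker_loop(hot_data_sketch& hot, const std::function<bool()>& progress)
{
    assert(progress && "a progress callable is required");

    // Phase 1: wait for the start signal, leaving early on error.
    while (!hot.run_flag) {
        std::this_thread::yield();
        if (hot.error_flag)
            return;
    }

    // Phase 2: poll as fast as possible while running, counting hits and misses.
    while (hot.run_flag)
        progress() ? ++hot.pe_hit_count : ++hot.pe_miss_count;

    // Phase 3: drain transactions that are still in flight.
    while (!hot.error_flag && hot.in_flight_transaction_count != 0)
        progress() ? ++hot.pe_hit_count : ++hot.pe_miss_count;
}
```

The ratio of pe_hit_count to pe_miss_count gives a rough indication of how often a poll found work to do.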
doca_comch_consumer_task_post_recv_cb
This is the comch consumer callback function which initiates the read or recovery flow. It marks the first step in processing the initiator's request, determining which data must be fetched and which two storage targets to query.
doca_comch_consumer_task_post_recv_cb reinterprets the task and context user data, and invokes gga_offload_app_worker::hot_data::start_transaction.
start_transaction prepares two IO requests, depending on the required mode:
read: data 1 and data 2
recover_a: parity and data 2 (to recover data 1)
recover_b: data 1 and parity (to recover data 2)
IO request preparation:
storage::io_message_view::set_correlation_id(cid, part_a_io_message);
storage::io_message_view::set_correlation_id(cid, part_b_io_message);
storage::io_message_view::set_correlation_id(cid, response_io_message);

storage::io_message_view::set_type(type, part_a_io_message);
storage::io_message_view::set_type(type, part_b_io_message);
storage::io_message_view::set_type(storage::io_message_type::result, response_io_message);

storage::io_message_view::set_user_data(user_data, part_a_io_message);
storage::io_message_view::set_user_data(user_data, part_b_io_message);
storage::io_message_view::set_user_data(user_data, response_io_message);

storage::io_message_view::set_io_address(local_io_addr, part_a_io_message);
storage::io_message_view::set_io_address(local_io_addr + half_block_size, part_b_io_message);
storage::io_message_view::set_io_address(host_io_addr, response_io_message);

storage::io_message_view::set_io_size(half_block_size, part_a_io_message);
storage::io_message_view::set_io_size(half_block_size, part_b_io_message);
storage::io_message_view::set_io_size(transaction.io_size, response_io_message);

storage::io_message_view::set_remote_offset(remote_offset_a, part_a_io_message);
storage::io_message_view::set_remote_offset(remote_offset_b, part_b_io_message);
The transaction mode (read, recover_a, or recover_b) is saved.
The prepared IO requests are then sent to the corresponding storage targets.
doca_rdma_task_receive_cb
After each storage target completes its respective data transfer, it sends a response. Once the callback receives the second of the two responses, it checks the transaction mode. If the mode is read, it invokes the decompression step: gga_offload_app_worker::hot_data::start_decompress. Otherwise, it invokes the EC recovery step: gga_offload_app_worker::hot_data::start_recover.
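The "act on the second of two responses" logic can be sketched as a small per-transaction counter. The types transaction_sketch and next_step and the function on_storage_response are hypothetical; the sketch only models the decision described above and does not call any DOCA API.

```cpp
#include <cassert>
#include <cstdint>

enum class transaction_mode { read, recover_a, recover_b };
enum class next_step { wait, decompress, recover };

// Hypothetical per-transaction state tracked between the two responses.
struct transaction_sketch {
    transaction_mode mode;
    std::uint8_t responses_received = 0;
};

// Called once per storage target response for the given transaction.
next_step on_storage_response(transaction_sketch& transaction)
{
    assert(transaction.responses_received < 2);

    if (++transaction.responses_received < 2)
        return next_step::wait; // first response: the other half is still pending

    transaction.responses_received = 0; // both halves arrived; reset for re-use
    return transaction.mode == transaction_mode::read ? next_step::decompress
                                                       : next_step::recover;
}
```

In the recovery case, decompression still follows, but only after the recovery step has rebuilt the missing half.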
doca_ec_task_recover_cb
In scenarios requiring data recovery, the decompression step (gga_offload_app_worker::hot_data::start_decompress) is invoked immediately after the recovery completes.
doca_compress_task_decompress_lz4_stream_cb
Completion of the decompression also entails storing the resulting decompressed data into the remote initiator's memory. At this point, the transaction is considered complete and may be safely re-used.
/opt/mellanox/doca/applications/storage/