DOCA Documentation v3.0.0

DOCA Storage Comch to RDMA GGA Offload Application Guide

The doca_storage_comch_to_rdma_gga_offload application acts as a bridge between the initiator and three storage targets. It actively participates in data transfer and leverages DOCA libraries to accelerate certain data operations transparently to both the initiator and the targets.

The doca_storage_comch_to_rdma_gga_offload application performs the following key functions:

  • Orchestrates communication between the initiator and the three storage instances

  • Periodically triggers data recovery flows to simulate data loss

  • Performs data recovery using erasure coding

  • Performs inline data decompression

To accomplish these tasks, the application establishes TCP connections to three storage targets and listens for an incoming connection from a single initiator using doca_comch_server.

The doca_storage_comch_to_rdma_gga_offload application is divided into two main functional areas:

  • Control-time and shared resources

  • Per-thread data path resources

[Figure: GGA offload application objects]

The application execution follows two primary phases:

  • Control phase

  • Data path phase

Control Phase

This phase begins with establishing connections to the required storage targets, followed by awaiting a client connection. Once all connections are in place, the application waits for specific control commands:

  • Query storage

  • Init storage

  • Start storage

Each control command is processed using the following sequence (a code sketch follows the list):

  1. Relay the command to each connected storage target.

  2. Wait for responses from all storage targets.

  3. Perform required post-processing and consistency checks on the responses.

  4. Send a response back to the client.
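
A minimal sketch of this sequence, under stated assumptions: the types and helpers (control_message, storage_connection, client_connection, relay_control_message, await_response, verify_consistent, send_client_response) are hypothetical stand-ins, not names from the application source.

    #include <array>
    #include <cstddef>

    // Sketch only: hypothetical helper and type names.
    void process_control_command(const control_message &cmd,
                                 std::array<storage_connection, 3> &targets,
                                 client_connection &client)
    {
        // 1. Relay the command to each connected storage target.
        for (auto &target : targets)
            relay_control_message(cmd, target);

        // 2. Wait for responses from all storage targets (bounded by
        //    --control-timeout in the real application).
        std::array<control_message, 3> responses;
        for (std::size_t i = 0; i != targets.size(); ++i)
            responses[i] = await_response(targets[i]);

        // 3. Post-process and consistency-check the responses; for example,
        //    query storage verifies all targets report the same storage size.
        verify_consistent(responses);

        // 4. Send a response back to the client.
        send_client_response(cmd, client);
    }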

Issuing the start storage command initiates the data path phase. While the data threads begin execution, the main thread continues to wait for final control commands to complete the application's lifecycle:

  • Stop storage

  • Shutdown

Data Path Phase

This phase is executed per thread and involves each thread performing I/O operations requested by the client. For a read I/O operation, one of two flows is used:

  • Regular read (no recovery)

  • Recovery read

By default, only the regular read flow is executed. However, periodic recovery operations can be enabled by the user; see the "Command-line Flags" section for details.

Regular Read Data Flow

The regular read flow consists of the stages detailed in the following subsections.

1. Initiator Request

  1. The initiator sends an I/O request to the GGA offload application.

  2. The GGA offload application translates the I/O address from the initiator's memory range to the corresponding GGA offload memory range.

    • For example, a read at offset 8 KB in initiator memory is remapped to <GGA-offload-base> + 8 KB.

[Figure: Regular read, step 1 - I/O request]
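
The address translation in step 2 is a plain rebase of the I/O offset from one memory range onto another. A self-contained sketch (the function and parameter names are illustrative, not from the application source):

    #include <cstdint>

    // Rebase an initiator-relative I/O address onto the GGA offload memory
    // range: a read at initiator offset 8 KB lands at <GGA-offload-base> + 8 KB.
    std::uint64_t translate_io_address(std::uint64_t initiator_addr,
                                       std::uint64_t initiator_base,
                                       std::uint64_t offload_base)
    {
        return offload_base + (initiator_addr - initiator_base);
    }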

2. RDMA Transfers

  1. The GGA offload application sends the adjusted read requests to the data 1 and data 2 storage targets.

  2. Each storage target performs an RDMA write into the GGA offload memory region at the designated offsets.

[Figure: Regular read, step 2 - RDMA transfers]

3. Target Responses

  1. Both storage targets send I/O responses upon completing the RDMA transfers.

  2. The GGA offload application waits until both responses are received.

[Figure: Regular read, step 3 - RDMA I/O responses]

4. Decompression and Final Response

  1. The application performs decompression using the received data blocks. Decompression output is written directly to the initiator memory by the doca_compress LZ4 decompress task.

    [Figure: Regular read, step 4 - decompression]

  2. The application sends an I/O response to the initiator, completing the operation.

    [Figure: Regular read, step 5 - I/O response]

Recovery Read Data Flow

This flow is similar to the regular read, but with added steps for error correction using erasure coding. The process involves the stages detailed in the following subsections.

1. Initiator Request

  1. The initiator sends an I/O request to the GGA offload application.

  2. Given that recovery is required, the application:

    1. Translates the I/O address normally.

    2. Modifies one of the two storage requests to target the parity partition instead of a standard data partition.

    3. Adds an output offset field to redirect the parity data into a reserved region of GGA offload memory (see the sketch after the figure).

[Figure: Recovery read, step 1 - I/O request]
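
The rewrite in steps 2-3 can be expressed with the storage::io_message_view setters shown later in this guide. The offset and address variables below are illustrative assumptions for the recover_a case (data 1 lost), not the application's actual variables:

    // Hypothetical sketch: retarget the request that would have fetched the
    // lost data 1 partition so it fetches the parity partition instead...
    storage::io_message_view::set_remote_offset(parity_partition_offset, part_a_io_message);
    // ...and redirect its output into the reserved recovery region of GGA
    // offload memory rather than the regular data placement.
    storage::io_message_view::set_io_address(recovery_region_addr, part_a_io_message);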

2. RDMA Transfers

  1. Two RDMA transfers are issued:

    • One to the surviving data partition.

    • One to the parity partition.

  2. The transferred blocks are written into the GGA offload memory with appropriate alignment.

[Figure: Recovery read, step 2 - RDMA transfers]

3. Target Responses

  1. Both storage targets reply with I/O responses.

  2. The GGA offload application waits for both to complete before proceeding.

[Figure: Recovery read, step 3 - RDMA I/O responses]

4. Data Recovery

  1. The application performs recovery using the available half block and parity data to reconstruct the missing block.

    [Figure: Recovery read, step 4 - EC recovery]
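
The reconstruction itself is carried out by a doca_ec recover task. Conceptually, with two data fragments protected by one parity fragment, recovering the missing fragment resembles the XOR sketch below; this is an illustration only, as the application uses a Cauchy or Vandermonde coding matrix (see --matrix-type):

    #include <cstddef>
    #include <cstdint>

    // Illustrative stand-in for single-parity recovery: the missing fragment
    // is recomputed from the surviving fragment and the parity fragment.
    void recover_fragment(const std::uint8_t *surviving,
                          const std::uint8_t *parity,
                          std::uint8_t *recovered,
                          std::size_t len)
    {
        for (std::size_t i = 0; i != len; ++i)
            recovered[i] = surviving[i] ^ parity[i];
    }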

5. Decompression and Final Response

  1. The recovered data is then decompressed. As with the regular read, decompression output is written directly to initiator memory.

    [Figure: Recovery read, step 5 - decompression]

  2. The application sends an I/O response to the initiator, completing the operation.

    [Figure: Recovery read, step 5 - I/O response]

This application leverages the following DOCA libraries:

  • DOCA Comch

  • DOCA Compress

  • DOCA Erasure Coding

  • DOCA RDMA

This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage Applications page.

Application Execution

Warning

This application can only run on the NVIDIA® BlueField® DPU.

Info

DOCA Storage Comch to RDMA GGA Offload is provided in source form. Therefore, compilation is required before the application can be executed.

  • Application usage instructions:

    Usage: doca_storage_comch_to_rdma_gga_offload [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse all command flags from an input json file

    Program Flags:
      -d, --device                      Device identifier
      -r, --representor                 Device host side representor identifier
      --cpu                             CPU core to which the process affinity can be set
      --data-1-storage                  Storage server addresses in <ip_addr>:<port> format
      --data-2-storage                  Storage server addresses in <ip_addr>:<port> format
      --data-p-storage                  Storage server addresses in <ip_addr>:<port> format
      --matrix-type                     Type of matrix to use. One of: cauchy, vandermonde. Default: vandermonde
      --command-channel-name            Name of the channel used by the doca_comch_client. Default: "doca_storage_comch"
      --control-timeout                 Time (in seconds) to wait while performing control operations. Default: 5
      --trigger-recovery-read-every-n   Trigger a recovery read flow every Nth request. Default: 0 (disabled)

    Info

    This usage printout can be printed to the command line using the -h (or --help) option:


    ./doca_storage_comch_to_rdma_gga_offload -h

    For additional information, refer to section "Command-line Flags".

  • CLI example for running the application on the BlueField:


    ./doca_storage_comch_to_rdma_gga_offload -d 03:00.0 -r 3b:00.0 --data-1-storage 172.17.0.1:12345 --data-2-storage 172.17.0.2:12345 --data-p-storage 172.17.0.3:12345 --cpu 0

    Note

    Both the DOCA Comch device PCIe address (03:00.0) and the DOCA Comch device representor PCIe address (3b:00.0) should match the addresses of the desired PCIe devices.

    Note

    Storage target IP address:port tuples should be updated to refer to the running storage target applications.

  • The application also supports a JSON-based deployment mode in which all command-line arguments are provided through a JSON file:


    ./doca_storage_comch_to_rdma_gga_offload --json [json_file]

    For example:


    ./doca_storage_comch_to_rdma_gga_offload --json doca_storage_comch_to_rdma_gga_offload_params.json

    Note

    Before execution, ensure that the JSON file contains valid configuration parameters, particularly the correct PCIe device addresses required for deployment.
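
    A minimal example of such a parameters file, assembled from the JSON keys listed in the "Command-line Flags" section below (all values are placeholders and must be adjusted to the actual deployment):

    {
      "device": "03:00.0",
      "representor": "3b:00.0",
      "cpu": 0,
      "data-1-storage": "172.17.0.1:12345",
      "data-2-storage": "172.17.0.2:12345",
      "data-p-storage": "172.17.0.3:12345",
      "matrix-type": "vandermonde",
      "command-channel-name": "doca_storage_comch",
      "control-timeout": 5,
      "trigger-recovery-read-every-n": 0
    }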

Command-line Flags

General flags

  • -h, --help

    Print a help synopsis

    JSON content: N/A

  • -v, --version

    Print program version information

    JSON content: N/A

  • -l, --log-level

    Set the log level for the application:

      • DISABLE=10
      • CRITICAL=20
      • ERROR=30
      • WARNING=40
      • INFO=50
      • DEBUG=60
      • TRACE=70 (requires compilation with TRACE log level support)

    JSON content: "log-level": 60

  • --sdk-log-level (no short flag)

    Set the SDK log level for the program:

      • DISABLE=10
      • CRITICAL=20
      • ERROR=30
      • WARNING=40
      • INFO=50
      • DEBUG=60
      • TRACE=70

    JSON content: "sdk-log-level": 40

  • -j, --json

    Parse all command flags from an input JSON file

    JSON content: N/A

Program flags

  • -d, --device

    DOCA device identifier. One of:

      • PCIe address: 3b:00.0
      • InfiniBand name: mlx5_0
      • Network interface name: en3f0pf0sf0

    Note: This flag is mandatory.

    JSON content: "device": "03:00.0"

  • -r, --representor

    DOCA Comch device representor PCIe address

    Note: This flag is mandatory.

    JSON content: "representor": "3b:00.0"

  • --cpu (no short flag)

    Index of the CPU to use. One data path thread is spawned per CPU. Index starts at 0.

    Note: This argument can be specified multiple times to create more threads.

    Note: This flag is mandatory.

    JSON content: "cpu": 6

  • --data-1-storage (no short flag)

    IP address and port used to establish the control TCP connection to the data 1 storage target.

    Note: This flag is mandatory.

    JSON content: "data-1-storage": "172.17.0.1:12345"

  • --data-2-storage (no short flag)

    IP address and port used to establish the control TCP connection to the data 2 storage target.

    Note: This flag is mandatory.

    JSON content: "data-2-storage": "172.17.0.1:12345"

  • --data-p-storage (no short flag)

    IP address and port used to establish the control TCP connection to the parity storage target.

    Note: This flag is mandatory.

    JSON content: "data-p-storage": "172.17.0.1:12345"

  • --matrix-type (no short flag)

    Type of matrix to use. One of:

      • cauchy
      • vandermonde

    JSON content: "matrix-type": "vandermonde"

  • --command-channel-name (no short flag)

    Allows customizing the server name used for this application instance if multiple comch servers exist on the same device.

    JSON content: "command-channel-name": "doca_storage_comch"

  • --control-timeout (no short flag)

    Time, in seconds, to wait while performing control operations.

    JSON content: "control-timeout": 5

  • --trigger-recovery-read-every-n (no short flag)

    Trigger a recovery read flow every Nth request. Set to 0 to disable recovery reads entirely. For example, a value of 1 triggers a recovery read for every request, while a value of 10 triggers a recovery read for one out of every ten requests.

    JSON content: "trigger-recovery-read-every-n": 0


Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

Control Phase

  1. gga_offload_app app{parse_cli_args(argc, argv)};

    Parse CLI arguments, apply default values, and create the application instance.

  2. app.connect_to_storage();

    Connect to each of the three storage targets over TCP.

  3. app.wait_for_comch_client_connection();

    Create a doca_comch_server instance and wait for a doca_comch_client to connect.

  4. app.wait_for_and_process_query_storage();

    Wait for the initiator to send a query storage control message, then:

    • Send a query storage message to each storage target

    • Verify that each storage target reports the same storage size

    • Send a query storage response back to the initiator

  5. app.wait_for_and_process_init_storage();

    Wait for the initiator to send an init storage control message, then:

    • Verify that the requested core count does not exceed the available cores

    • Create and export storage memory

    • Send an init storage message to each storage target

    • Wait for responses from all storage targets

    • Create data path resources:

      • Worker threads

      • IO message memory regions

      • doca_pe objects

      • doca_comch_consumer objects

      • doca_comch_producer objects

      • doca_rdma connection objects

      • doca_ec objects

      • doca_compress objects

    • Send an init storage response

  6. app.wait_for_and_process_start_storage();

    Wait for the initiator to send a start storage control message, then:

    • Send a start storage message to each storage target

    • Wait for responses from all storage targets

    • Create task objects

    • Submit listening tasks (doca_comch_consumer and RDMA receive tasks)

    • Signal worker threads to begin processing

    • Send a start storage response

  7. app.wait_for_and_process_stop_storage();

    Wait for the initiator to send a stop storage control message (test complete), then:

    • Send a stop storage message to each storage target

    • Wait for responses from all storage targets

    • Signal worker threads to stop

    • Gather and post-process execution statistics

    • Destroy doca_comch_consumer objects

    • Destroy doca_comch_producer objects

    • Send a stop storage response

  8. app.wait_for_and_process_shutdown();

    Wait for the initiator to send a shutdown control message, then:

    • Send a shutdown message to each storage target

    • Wait for responses from all storage targets

    • Destroy all remaining data path objects

    • Send a shutdown response

  9. app.display_stats();

    Display collected statistics and destroy all control path objects.
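
Taken together, the control phase amounts to the sequence below. This condensed sketch is an assumption about how the calls compose (error handling omitted), not the application's verbatim main():

    int main(int argc, char **argv)
    {
        gga_offload_app app{parse_cli_args(argc, argv)};

        app.connect_to_storage();                 // TCP to the three targets
        app.wait_for_comch_client_connection();   // doca_comch_server accept

        app.wait_for_and_process_query_storage();
        app.wait_for_and_process_init_storage();
        app.wait_for_and_process_start_storage(); // data path threads start here

        app.wait_for_and_process_stop_storage();
        app.wait_for_and_process_shutdown();

        app.display_stats();
        return 0;
    }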

Data Path Phase

  1. while (m_hot_data.run_flag == false) {
         std::this_thread::yield();
         if (m_hot_data.error_flag)
             return;
     }

    The main data thread enters a spin-wait loop, yielding execution until all threads and resources are initialized. If an error is detected (error_flag is set), the thread exits early.

  2. while (m_hot_data.run_flag) {
         doca_pe_progress(m_hot_data.pe) ? ++(m_hot_data.pe_hit_count)
                                         : ++(m_hot_data.pe_miss_count);
     }

    Once started, the thread enters a tight loop, continuously polling the progress engine (doca_pe_progress). Each iteration updates the hit/miss counters based on whether any task completions were triggered. This loop drives the data path by processing task completions as fast as possible.

  3. while (m_hot_data.error_flag == false && m_hot_data.in_flight_transaction_count != 0) {
         doca_pe_progress(m_hot_data.pe) ? ++(m_hot_data.pe_hit_count)
                                         : ++(m_hot_data.pe_miss_count);
     }

    This final loop ensures that all in-flight transactions complete before exiting. It continues polling the progress engine as long as there are active transactions and no error has occurred.

doca_comch_consumer_task_post_recv_cb

This is the comch consumer callback function which initiates the read or recovery flow. It marks the first step in processing the initiator's request, determining which data must be fetched and which two storage targets to query.

  1. doca_comch_consumer_task_post_recv_cb reinterprets the task and context user data, and invokes gga_offload_app_worker::hot_data::start_transaction.

  2. start_transaction prepares two IO requests, depending on the required mode:

    • read: data 1 and data 2

    • recover_a: parity and data 2 (to recover data 1)

    • recover_b: data 1 and parity (to recover data 2)

    • IO request preparation:

      storage::io_message_view::set_correlation_id(cid, part_a_io_message);
      storage::io_message_view::set_correlation_id(cid, part_b_io_message);
      storage::io_message_view::set_correlation_id(cid, response_io_message);

      storage::io_message_view::set_type(type, part_a_io_message);
      storage::io_message_view::set_type(type, part_b_io_message);
      storage::io_message_view::set_type(storage::io_message_type::result, response_io_message);

      storage::io_message_view::set_user_data(user_data, part_a_io_message);
      storage::io_message_view::set_user_data(user_data, part_b_io_message);
      storage::io_message_view::set_user_data(user_data, response_io_message);

      storage::io_message_view::set_io_address(local_io_addr, part_a_io_message);
      storage::io_message_view::set_io_address(local_io_addr + half_block_size, part_b_io_message);
      storage::io_message_view::set_io_address(host_io_addr, response_io_message);

      storage::io_message_view::set_io_size(half_block_size, part_a_io_message);
      storage::io_message_view::set_io_size(half_block_size, part_b_io_message);
      storage::io_message_view::set_io_size(transaction.io_size, response_io_message);

      storage::io_message_view::set_remote_offset(remote_offset_a, part_a_io_message);
      storage::io_message_view::set_remote_offset(remote_offset_b, part_b_io_message);

  3. The transaction mode (read, recover_a, or recover_b) is saved.

  4. The prepared IO requests are then sent to the corresponding storage targets.
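
One plausible way the per-request mode could be derived from --trigger-recovery-read-every-n is sketched below. The function, its parameters, and the alternation between recover_a and recover_b are assumptions, not the application's actual policy:

    #include <cstdint>

    enum class transaction_mode { read, recover_a, recover_b };

    // Every Nth request becomes a recovery read; N == 0 disables recovery.
    transaction_mode select_mode(std::uint64_t request_index, std::uint32_t every_n)
    {
        if (every_n == 0 || request_index % every_n != 0)
            return transaction_mode::read;
        // Alternate which data partition is treated as lost.
        return ((request_index / every_n) % 2 == 0) ? transaction_mode::recover_a
                                                    : transaction_mode::recover_b;
    }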

doca_rdma_task_receive_cb

After each storage target completes its respective data transfer, it sends a response. Once the callback receives the second of the two responses, it checks the transaction mode. If the mode is read, it invokes the decompression step: gga_offload_app_worker::hot_data::start_decompress. Otherwise, it invokes the EC recovery step: gga_offload_app_worker::hot_data::start_recover.
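
A sketch of that rendezvous logic follows; the transaction fields and the callback shape are illustrative assumptions, while start_decompress and start_recover are the entry points named above:

    // Called once per storage target response belonging to this transaction.
    void on_storage_response(transaction &txn, hot_data &hd)
    {
        if (++txn.responses_received != 2)
            return; // first response: keep waiting for the second

        if (txn.mode == transaction_mode::read)
            hd.start_decompress(txn); // regular read: straight to decompression
        else
            hd.start_recover(txn);    // recovery read: run EC recovery first
    }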

doca_ec_task_recover_cb

In scenarios requiring data recovery, the decompression step (gga_offload_app_worker::hot_data::start_decompress) is invoked immediately after the recovery completes.

doca_compress_task_decompress_lz4_stream_cb

Completion of the decompression also entails storing the resulting decompressed data into the remote initiator's memory. At this point, the transaction is considered complete and may be safely reused.
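
A sketch of this completion step (helper and field names are illustrative assumptions; in_flight_transaction_count is the counter drained by the final data path loop shown above):

    void on_decompress_complete(transaction &txn, hot_data &hd)
    {
        // The response message was prepared up front in start_transaction and
        // is sent back to the initiator, e.g. via the doca_comch_producer.
        send_io_response(txn.response_io_message);

        --hd.in_flight_transaction_count; // lets the drain loop finish cleanly
        txn.reset();                      // transaction may now be safely reused
    }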

References

  • /opt/mellanox/doca/applications/storage/

© Copyright 2025, NVIDIA. Last updated on May 22, 2025.