DOCA Storage Target RDMA Application Guide

Introduction

The doca_storage_target_rdma application provides a simple, volatile memory-backed RDMA storage target. It is designed to interact with a DOCA storage service, offering direct read and write access to a dedicated memory region using RDMA.

System Design

The doca_storage_target_rdma application performs the following core functions:

Exposes a memory-backed storage region for use by the storage service client.
Handles RDMA-based I/O operations (reads and writes) using the DOCA RDMA library.

To perform these tasks, the application acts as a TCP server, waiting for an incoming connection from a storage service initiator. Once connected, it handles control and data path interactions as described on this page.

Architecture

The application is divided into two main functional areas:

Control-Time and Shared Resources – Includes TCP server setup, memory registration, and RDMA connection handling.
Per-Thread Data Path Resources – Includes thread-local RDMA resources and task management structures.

[Figure: Target RDMA objects]

The application executes in two main phases:

Control Phase
Data Path Phase

Control Phase

This phase begins when a connection is established with a storage service over TCP. The application then processes a sequence of control commands:

Query Storage
- Reports the size and layout of the exposed storage region.
Init Storage
- Validates the requested number of worker threads.
- Allocates and registers local memory for RDMA operations.
- Imports remote memory handles provided by the initiator.
- Creates internal worker objects for task execution.
Wait for RDMA Connection Requests
- Waits for one RDMA connection per requested core/thread.
- Ensures each RDMA session is fully established before proceeding.
Start Storage
- Waits for all RDMA connections to be ready.
- Submits initial tasks to prepare the data path phase.
- Launches the data path threads.

Once the Start Storage command is received and all threads are active, the application transitions to the data path phase. The main thread remains active, waiting for final control commands:

Stop Storage
- Terminates active data threads cleanly.
Shutdown
- Performs cleanup and resource deallocation, shutting down the application.
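
The command ordering described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not the application's code: the `control_state` enum and the `next_state` helper are hypothetical names, the command strings are shortened forms of the control message names used on this page, and whether the real application rejects out-of-order commands in exactly this way is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical states for illustration; not taken from the application source.
enum class control_state { connected, queried, initialised, connecting, started, stopped, shut_down };

// Apply one control command. Returns the new state, or std::nullopt when the
// command is not valid in the current state.
inline std::optional<control_state> next_state(control_state current, std::string_view command)
{
    assert(!command.empty()); // a command name is always expected
    switch (current) {
    case control_state::connected:
        if (command == "query_storage") return control_state::queried;
        break;
    case control_state::queried:
        if (command == "init_storage") return control_state::initialised;
        break;
    case control_state::initialised:
    case control_state::connecting:
        // RDMA connection requests may repeat (one or more per worker)
        // before start_storage arrives.
        if (command == "create_rdma_connection") return control_state::connecting;
        if (current == control_state::connecting && command == "start_storage") return control_state::started;
        break;
    case control_state::started:
        if (command == "stop_storage") return control_state::stopped;
        break;
    case control_state::stopped:
        if (command == "shutdown") return control_state::shut_down;
        break;
    case control_state::shut_down:
        break;
    }
    return std::nullopt;
}
```

In this sketch a caller would feed each received command name to `next_state` and treat an empty result as a protocol error.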

Data Path Phase

Each data path thread performs I/O processing independently based on client requests. The typical per-thread flow is:

Receive I/O Request
- Handle incoming requests from the initiator via RDMA.
Perform RDMA Operation
- Depending on the request type, either:
  - Execute an RDMA read from the local memory region.
  - Execute an RDMA write to the local memory region.
Send I/O Response
- Return a response back to the initiator, indicating success or failure of the operation.

DOCA Libraries

This application leverages the following DOCA libraries:

DOCA RDMA

Compiling the Application

This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage page.

Running the Application

Application Execution

Info

DOCA Storage Target RDMA is provided in source form. Therefore, compilation is required before the application can be executed.

Application usage instructions:

Usage: doca_storage_target_rdma [DOCA Flags] [Program Flags]
 
DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse command line flags from an input json file
 
Program Flags:
  -d, --device                      Device identifier
  --cpu                             CPU core to which the process affinity can be set
  --listen-port                     TCP listen port number
  --binary-content                  Path to binary.sbc file containing the initial content to be represented by this storage instance
  --block-count                     Number of available storage blocks. (Ignored when using content binary file) Default: 128
  --block-size                      Block size used by the storage. (Ignored when using content binary file) Default: 4096

Info

To print this usage information, run the application with the -h (or --help) option:

./doca_storage_target_rdma -h

For additional information, refer to section "Command-line Flags".

CLI example for running the application:

./doca_storage_target_rdma -d 3b:00.0 --listen-port 12345 --block-size 4096 --block-count 64 --cpu 0

Note

The DOCA device PCIe address (3b:00.0) should match the address of the desired PCIe device.

Command-line Flags

Flag Type	Short Flag	Long Flag	Description
General flags	`h`	`help`	Print a help synopsis
	`v`	`version`	Print program version information
	`l`	`log-level`	Set the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (TRACE requires compilation with `TRACE` log level support)
	N/A	`sdk-log-level`	Set the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70
	`j`	`json`	Parse command line flags from an input JSON file as well as from the CLI (if provided)
Program flags	`d`	`device`	DOCA device identifier. One of: PCIe address (`3b:00.0`), InfiniBand name (`mlx5_0`), or network interface name (`en3f0pf0sf0`). Note: This flag is mandatory.
	N/A	`cpu`	Index of the CPU to use. One data path thread is spawned per CPU. Indexing starts at 0. Note: This argument can be specified multiple times to create more threads. Note: This flag is mandatory.
	N/A	`listen-port`	Port on which to listen for incoming TCP connections. Note: This flag is mandatory.
	N/A	`binary-content`	Path to a file used to provide initial content to the storage instance
	N/A	`block-count`	Number of storage blocks to provide
	N/A	`block-size`	Size of each storage block

Info

A user should provide one of:

--binary-content : Where the file is a .sbc file
- The .sbc file provides the storage dimensions and the data to populate the blocks

OR (Random / uninitialised bytes with user-defined dimensions)

--block-count
--block-size

OR (Initialised bytes with user-defined dimensions)

--block-count
--block-size
--binary-content : Where the file is plain content to be distributed across the storage, and its size equals block count * block size
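
The size rule for plain content can be expressed as a small check. This is a sketch only: `storage_layout`, `capacity_bytes`, and `check_plain_content` are hypothetical names, the 32-bit field types are an assumption, and the .sbc case is left out because its file format is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical description of the user-defined storage dimensions.
struct storage_layout {
    std::uint32_t block_count;
    std::uint32_t block_size;
};

// Total capacity in bytes, widened to 64 bits before multiplying so that
// large block counts cannot overflow a 32-bit product.
inline std::uint64_t capacity_bytes(storage_layout layout)
{
    return std::uint64_t{layout.block_count} * layout.block_size;
}

// Accept either no content (blocks are left uninitialised) or plain content
// whose size is exactly block count * block size.
inline void check_plain_content(storage_layout layout, std::vector<std::uint8_t> const &content)
{
    assert(layout.block_size != 0); // a zero block size makes no sense here
    if (content.empty())
        return;
    if (content.size() != capacity_bytes(layout))
        throw std::invalid_argument("content size must equal block count * block size");
}
```

With the CLI example above (64 blocks of 4096 bytes), this check would accept only a 262144-byte plain content file.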

Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

Application Code Flow

The flow of the application is broken down into key functions / steps:

target_rdma_app app{parse_target_rdma_app_cli_args(argc, argv)};
 
storage::install_ctrl_c_handler([&app]() {
    app.abort("User requested abort");
});
 
app.wait_for_client_connection();
app.wait_for_and_process_query_storage();
app.wait_for_and_process_init_storage();
app.wait_for_and_process_create_rdma_connections();
app.wait_for_and_process_start_storage();
app.wait_for_and_process_stop_storage();
app.wait_for_and_process_shutdown();
app.display_stats();

Main/Control Thread Flow

target_rdma_app app{parse_target_rdma_app_cli_args(argc, argv)};

Parse CLI arguments and use these to create the application instance. Initial resources are also created at this stage:

Open a doca_dev as specified by the CLI argument: -d or --device.

m_storage_block_count = m_cfg.block_count;
m_storage_block_size = m_cfg.block_size;
 
auto const page_size = storage::get_system_page_size();
m_local_io_region_size = uint64_t{m_storage_block_count} * m_storage_block_size;
m_local_io_region = static_cast<uint8_t *>(storage::aligned_alloc(page_size, m_local_io_region_size));

Allocate storage IO blocks memory.
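
For readers without access to the `storage::aligned_alloc` helper, the following sketch shows one way a page-aligned region of this size could be obtained on a POSIX system. It is an assumption-laden illustration rather than the helper's implementation: `round_up` and `allocate_io_region` are hypothetical names, and the size is rounded up to a page multiple because `std::aligned_alloc` expects the size to be a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unistd.h>

// Round value up to the next multiple of alignment (a power of two).
inline std::size_t round_up(std::size_t value, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Allocate block_count * block_size bytes aligned to the system page size.
// region_size receives the usable size; release the memory with std::free.
inline std::uint8_t *allocate_io_region(std::uint32_t block_count,
                                        std::uint32_t block_size,
                                        std::size_t &region_size)
{
    auto const page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // Widen before multiplying, as in the snippet above.
    region_size = static_cast<std::size_t>(std::uint64_t{block_count} * block_size);
    return static_cast<std::uint8_t *>(std::aligned_alloc(page_size, round_up(region_size, page_size)));
}
```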

if (!m_cfg.content.empty()) {
    std::copy(std::begin(m_cfg.content), std::end(m_cfg.content), m_local_io_region);
}

Copy user defined content into the storage IO blocks memory if it was provided.

```
m_control_channel = storage::control::make_tcp_server_control_channel(m_cfg.listen_port);
```
Create a TCP server control channel using the listen port specified by the CLI argument --listen-port. Control channel objects provide a unified API, so a TCP client, TCP server, doca_comch_client, and doca_comch_server can all be used in the same way.

Info

See storage_common/control_channel.hpp for more information about the control channel abstraction.

storage::install_ctrl_c_handler([&app]() {
    app.abort("User requested abort");
});

Set a signal handler for Ctrl+C keyboard input so the application can shut down gracefully.
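
How `storage::install_ctrl_c_handler` is implemented is not shown on this page. As a rough illustration of the general technique, the sketch below installs a SIGINT handler that only sets a flag, which a control loop could then poll before calling something like `app.abort(...)`. All names in the `sketch` namespace are hypothetical.

```cpp
#include <cassert>
#include <csignal>

namespace sketch {

// Set by the handler; polled by the control thread. sig_atomic_t is the
// type that is safe to write from a signal handler.
volatile std::sig_atomic_t abort_requested = 0;

// The handler does the minimum possible work: it records the request.
extern "C" void on_sigint(int)
{
    abort_requested = 1;
}

// Install the handler for Ctrl+C (SIGINT).
inline void install_ctrl_c_handler()
{
    auto const previous = std::signal(SIGINT, on_sigint);
    assert(previous != SIG_ERR);
    (void)previous;
}

} // namespace sketch
```

Keeping the handler this small avoids calling functions that are not safe in a signal context; the actual abort logic then runs on a normal thread.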

app.wait_for_client_connection();

Wait for a TCP client (the storage service) to connect.

```
app.wait_for_and_process_query_storage();
```
Wait for the storage service to send a query_storage_request control message and then perform the required actions to fulfill the request:
1. Populate a query_storage_response with the storage capacity (block count * block size) and the block size.
2. Send response to the storage service.

```
app.wait_for_and_process_init_storage();
```
Wait for the storage service to send an init_storage_request control message and then perform the required actions to fulfill the request:
1. Use the init_storage_payload data to:
  1. Set the core count (m_core_count) to the number of cores requested by the initiator (the number of --cpu arguments provided to the initiator), or fail if this exceeds the number of --cpu arguments provided to this application.
  2. Set the number of transactions per core (m_num_transactions).
  3. Create a doca_mmap object from the provided export blob.
  4. Create m_core_count worker threads.
  5. Perform the first stages of worker thread initialization. These steps are carried out for each thread, but only one thread performs them at any time. This simplifies the sending and receiving of control messages; the flow could be modified to execute in parallel if desired.
    1. Create a thread bound to the Nth CPU provided to this application via the --cpu CLI arguments.
    2. Initialize the thread context (asynchronously):
      m_workers[ii].execute_control_command(worker_create_objects_control_command{m_dev, m_transaction_count, m_remote_io_mmap, m_local_io_mmap});
  6. Send a response to the storage service:
    1. Send an init_storage_response message upon success or an error_response message if anything failed.

```
app.wait_for_and_process_create_rdma_connections();
```
Wait for the storage service to send a create_rdma_connection_request control message and then perform the required actions to fulfill the request:
1. Create the local side of the RDMA connection by exporting the RDMA context, and store the created connection blob:
```
void const *blob = nullptr;
size_t blob_size = 0;

doca_rdma_export(rdma_ctx, &blob, &blob_size, &rdma_conn);
```
2. Connect to the remote side of the RDMA connection:
```
doca_rdma_connect(rdma_ctx, cmd.import_blob.data(), cmd.import_blob.size(), rdma_conn);
```
  
  Info
  
  This RDMA connection activity is performed twice per worker thread, so that IO control messages travel over one connection while the other carries data transfers. The aim is to reduce latency: on a single connection, doca_rdma_task_send, doca_rdma_task_read, and doca_rdma_task_write tasks execute in submission order. If, for example, 32 IO requests arrive in close succession and the corresponding 32 read / write tasks are submitted, the response for the first completed operation is blocked until the remaining 31 data transfers complete. This can significantly delay the response to the service / initiator and greatly increase the round-trip latency of a storage request. With two connections, the IO response for the first of the N tasks can be sent as soon as it completes, while the remaining N-1 read / write operations continue to execute.
3. Send a response to the storage service:
  1. Send a create_rdma_connection_response message upon success or an error_response message if anything failed.
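
The latency argument for using two connections per worker can be made concrete with a toy timing model. The sketch below is not a measurement and does not use the DOCA API; it assumes that all N transfers are submitted at time zero, that operations on a connection run strictly one after another, and that every transfer and every response send takes a fixed number of time units. `response_times` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Completion time of each IO response, in arbitrary time units.
// separate_connections == false: responses share the queue with the transfers.
// separate_connections == true: responses use their own connection.
inline std::vector<std::uint64_t> response_times(std::uint32_t n,
                                                 std::uint64_t transfer_time,
                                                 std::uint64_t send_time,
                                                 bool separate_connections)
{
    assert(n > 0);
    std::vector<std::uint64_t> done;
    done.reserve(n);
    // On a shared connection every response queues behind all n transfers,
    // because those were submitted first.
    std::uint64_t send_queue_free_at = separate_connections ? 0 : n * transfer_time;
    for (std::uint32_t k = 1; k <= n; ++k) {
        std::uint64_t const transfer_done = k * transfer_time;
        std::uint64_t const start = std::max(transfer_done, send_queue_free_at);
        send_queue_free_at = start + send_time;
        done.push_back(send_queue_free_at);
    }
    return done;
}
```

In this model, with 32 transfers of 10 units each and a 1-unit response, the first response completes at time 321 on a shared connection and at time 11 with a dedicated response connection.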

```
app.wait_for_and_process_start_storage();
```
Wait for the storage service to send a start_storage_request control message and then perform the required actions to fulfill the request:
1. Poll all worker connections until the RDMA connections are fully established:
```
doca_error_t result;
doca_ctx_states ctx_state;

result = doca_ctx_get_state(doca_rdma_as_ctx(rdma_ctx), &ctx_state);
if (result == DOCA_SUCCESS && ctx_state == DOCA_CTX_STATE_RUNNING) {
    // RDMA connection is ready
}
```
2. Signal all worker threads to begin data path operation.
3. Send a response to the storage service:
  1. Send a start_storage_response message upon success or an error_response message if anything failed.
Data path execution now continues until either the user aborts the program or a stop message is received.

```
app.wait_for_and_process_stop_storage();
```
Wait for the storage service to send a stop_storage_request control message and then perform the required actions to fulfill the request:
1. Join all worker threads.
2. Send a response to the storage service:
  1. Send a stop_storage_response message upon success or an error_response message if anything failed.

```
app.wait_for_and_process_shutdown();
```
Wait for the storage service to send a shutdown_request control message and then perform the required actions to fulfill the request:
1. Collect runtime stats.
2. Destroy workers.
3. Send a response to the storage service:
  1. Send a shutdown_response message upon success or an error_response message if anything failed.

app.display_stats();

Display runtime statistics.

Application destructor:
1. Destroy the TCP control channel.
2. Destroy doca_mmap objects.
3. Destroy IO blocks memory.
4. Close doca_dev.
Application exits.

Worker/Data Path Thread Flow

The worker thread procedure executes in two phases: a control / configuration phase, followed by a data path phase in which read and write operations take place.

Worker Init Process

void target_rdma_worker::thread_proc(target_rdma_worker *self, uint16_t core_idx) noexcept

The worker starts by executing a loop of:

Lock mutex
If message pointer is not null:
1. Process the configuration message.
2. Set the operation result.
Unlock the mutex.
Yield.
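
The loop above is a simple mailbox pattern. The sketch below shows the idea using only the C++ standard library; it is not the worker's actual code. `worker_mailbox`, `config_loop`, and this free-function version of `execute_control_command` are hypothetical, and a command is modelled as a callable that returns an integer result.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical mailbox shared between the control thread and one worker.
struct worker_mailbox {
    std::mutex lock;
    std::function<int()> *pending = nullptr; // message pointer; null when idle
    int result = 0;
    bool result_ready = false;
    std::atomic<bool> keep_running{true};
};

// Worker side: lock, process a message if one is present, unlock, yield.
inline void config_loop(worker_mailbox &box)
{
    while (box.keep_running) {
        {
            std::lock_guard<std::mutex> guard{box.lock};
            if (box.pending != nullptr) {
                box.result = (*box.pending)(); // process the configuration message
                box.result_ready = true;       // set the operation result
                box.pending = nullptr;
            }
        }
        std::this_thread::yield();
    }
}

// Control side: post one command and wait for the worker to report a result.
inline int execute_control_command(worker_mailbox &box, std::function<int()> command)
{
    {
        std::lock_guard<std::mutex> guard{box.lock};
        assert(box.pending == nullptr); // only one command in flight at a time
        box.result_ready = false;
        box.pending = &command;
    }
    for (;;) {
        {
            std::lock_guard<std::mutex> guard{box.lock};
            if (box.result_ready)
                return box.result;
        }
        std::this_thread::yield();
    }
}
```

A mutex is acceptable here because this loop runs only during configuration, where latency is not critical.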

The following configuration operations can be performed by the worker thread:

```
void target_rdma_worker::create_worker_objects(worker_create_objects_control_command const &cmd)
```
Create the objects required by the worker to carry out data path operations:
1. Create IO messages memory.
2. Create IO messages doca_mmap (to allow the messages to be accessed by RDMA).
3. Create doca_buf_inventory.
4. Create doca_pe to drive the DOCA contexts.
5. Create two doca_rdma contexts.
6. Configure and start RDMA contexts.

```
void target_rdma_worker::create_rdma_connection(worker_create_rdma_connection_command &cmd)
```
Create an RDMA connection from one of the two RDMA contexts held by the worker (which one to use is specified by the command):
1. Call doca_rdma_export to create the connection object and connection blob.
2. Call doca_rdma_connect to start connecting to the remote side of the connection.
3. Store the local connection blob in the command object for use by the control thread to respond to the service.

void target_rdma_worker::are_contexts_ready(worker_are_contexts_ready_control_command &cmd) const noexcept

Check each RDMA context and report whether both are ready for use.

```
void target_rdma_worker::prepare_tasks(void)
```
Final preparations for data path execution:
1. Cache frequently used static values into hot data (a single cache line).
2. Allocate transaction objects.
3. Allocate doca_buf objects.
4. Allocate doca_task objects.
5. Set task user data.
6. Set task pointers into the transaction context.

void target_rdma_worker::start_data_path()

Break out of the wait for configuration event loop and start the data path loop.

Info

After the configuration phase, the mutex is not used again.

After breaking out of the initial configuration loop, the thread submits receive tasks (RDMA recv tasks) then enters the data path function: run_data_path_ops.

Worker Data Path Process

void target_rdma_worker::run_data_path_ops(target_rdma_worker::hot_data &hot_data)
{
    DOCA_LOG_INFO("Core: %u running", hot_data.core_idx);
 
    while (hot_data.run_flag) {
        doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
    }
 
    while (hot_data.error_flag == false && hot_data.in_flight_transaction_count != 0) {
        doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
    }
}

During the data path phase, the thread simply polls the doca_pe as quickly as possible to check for a task completion from one of the RDMA contexts. The interesting work is done in the callbacks of these tasks. The flow always starts with an RDMA recv task completion: the reception of an IO message from the storage service.
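
The two-loop shape of this function (run until stopped, then drain in-flight work) can be tried out without DOCA. In the sketch below a toy progress engine stands in for doca_pe: its `progress()` returns non-zero when it ran a completion callback, mirroring how the return value is used in the loop above. `toy_progress_engine`, `toy_hot_data`, and `run_toy_data_path` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Stand-in for a progress engine: a queue of ready completion callbacks.
struct toy_progress_engine {
    std::deque<std::function<void()>> completions;

    // Run at most one completion; return 1 if one ran, 0 otherwise.
    std::uint8_t progress()
    {
        if (completions.empty())
            return 0;
        auto callback = std::move(completions.front());
        completions.pop_front();
        assert(callback); // an empty callback would be a programming error
        callback();
        return 1;
    }
};

struct toy_hot_data {
    toy_progress_engine pe;
    bool run_flag = true;
    bool error_flag = false;
    std::uint32_t in_flight_transaction_count = 0;
    std::uint64_t pe_hit_count = 0;
    std::uint64_t pe_miss_count = 0;
};

inline void run_toy_data_path(toy_hot_data &hot_data)
{
    // Phase 1: poll until asked to stop.
    while (hot_data.run_flag) {
        hot_data.pe.progress() ? ++hot_data.pe_hit_count : ++hot_data.pe_miss_count;
    }
    // Phase 2: drain transactions that were still in flight at stop time.
    while (!hot_data.error_flag && hot_data.in_flight_transaction_count != 0) {
        hot_data.pe.progress() ? ++hot_data.pe_hit_count : ++hot_data.pe_miss_count;
    }
}
```

The drain loop matters because a stop request can arrive while transfers are outstanding; leaving immediately would abandon them.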

doca_rdma_task_receive_cb

void target_rdma_worker::doca_rdma_task_receive_cb(doca_rdma_task_receive *task,
                                                   doca_data task_user_data,
                                                   doca_data ctx_user_data) noexcept

This callback is the main driver of the storage target. It carries out the following steps:

auto *const io_message = storage::get_buffer_bytes(doca_rdma_task_receive_get_dst_buf(task));
auto const message_type = storage::io_message_view::get_type(io_message);

Interpret the type of IO request that has been received.

char *const remote_addr = hot_data->remote_memory_start_addr + storage::io_message_view::get_requester_offset(io_message);
 
char *const local_addr = hot_data->local_memory_start_addr + storage::io_message_view::get_storage_offset(io_message);
 
uint32_t const transfer_size = storage::io_message_view::get_io_size(io_message);

Extract IO addresses and transfer size from the IO request.

auto *const transfer_ctx = static_cast<transfer_context *>(task_user_data.ptr);

Get a reference to the transaction object from the task user data.

switch (message_type) {
    case storage::io_message_type::read: {
        doca_buf_inventory_buf_reuse_by_data(transfer_ctx->storage_buf, local_addr, transfer_size);
        doca_buf_inventory_buf_reuse_by_addr(transfer_ctx->host_buf, remote_addr, transfer_size);
 
        doca_task_submit(doca_rdma_task_write_as_task(transfer_ctx->write_task));
    } break;
    case storage::io_message_type::write: {
        doca_buf_inventory_buf_reuse_by_data(transfer_ctx->host_buf, remote_addr, transfer_size);
        doca_buf_inventory_buf_reuse_by_addr(transfer_ctx->storage_buf, local_addr, transfer_size);
 
        doca_task_submit(doca_rdma_task_read_as_task(transfer_ctx->read_task));
    } break;

Submit the appropriate RDMA task.

IO read moves data from the storage IO blocks to the remote IO blocks, so from the storage point of view it is a write.
IO write moves data from the remote IO blocks to the storage IO blocks, so from the storage point of view it is a read.

If any action in this process fails, send an IO error response to the caller; otherwise, wait for the RDMA read / write task to complete.
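
The inversion between the IO request type and the RDMA operation is easy to get backwards, so here is a small stand-alone sketch of the mapping. It does not use the DOCA API: `std::memcpy` stands in for the RDMA transfer, two vectors stand in for the storage and host memory regions, and the bounds checks are an addition for the sketch rather than something this page says the application performs. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class io_type { read, write };
enum class rdma_op { rdma_write, rdma_read };

// An IO read pushes storage data to the requester (RDMA write); an IO write
// pulls requester data into storage (RDMA read).
constexpr rdma_op rdma_op_for(io_type type)
{
    return type == io_type::read ? rdma_op::rdma_write : rdma_op::rdma_read;
}

// Returns false when the requested range does not fit in the given buffer.
inline bool in_range(std::vector<std::uint8_t> const &buf, std::size_t offset, std::size_t size)
{
    return offset <= buf.size() && size <= buf.size() - offset;
}

// Simulate one IO request between the storage region and the host region.
inline bool handle_io(io_type type,
                      std::vector<std::uint8_t> &storage, std::size_t storage_offset,
                      std::vector<std::uint8_t> &host, std::size_t host_offset,
                      std::size_t size)
{
    assert(&storage != &host); // memcpy needs non-overlapping ranges
    if (!in_range(storage, storage_offset, size) || !in_range(host, host_offset, size))
        return false;
    if (rdma_op_for(type) == rdma_op::rdma_write)
        std::memcpy(host.data() + host_offset, storage.data() + storage_offset, size);
    else
        std::memcpy(storage.data() + storage_offset, host.data() + host_offset, size);
    return true;
}
```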

on_transfer_complete

void target_rdma_worker::on_transfer_complete(doca_task *task,
                                              doca_data task_user_data,
                                              doca_data ctx_user_data) noexcept
{
    auto *const hot_data = static_cast<target_rdma_worker::hot_data *>(ctx_user_data.ptr);
    auto *const response_task = static_cast<doca_rdma_task_send *>(task_user_data.ptr);
    auto *const io_message =
        storage::get_buffer_bytes(const_cast<doca_buf *>(doca_rdma_task_send_get_src_buf(response_task)));
 
    ++(hot_data->completed_transaction_count);
 
    storage::io_message_view::set_type(storage::io_message_type::result, io_message);
    storage::io_message_view::set_result(DOCA_SUCCESS, io_message);
 
    auto const ret = doca_task_submit(doca_rdma_task_send_as_task(response_task));
    if (ret != DOCA_SUCCESS) {
        DOCA_LOG_ERR("Failed submit response task: %s", doca_error_get_name(ret));
    }
}

A callback that is re-used for both doca_rdma_task_read and doca_rdma_task_write. The work is the same regardless of which task type completed and more detailed task type info is not required. This callback will simply update the IO message to change its type to a response, and set the result to success. The IO message is then sent back to the storage service.

doca_rdma_task_send_cb

void target_rdma_worker::doca_rdma_task_send_cb(doca_rdma_task_send *task,
                                                doca_data task_user_data,
                                                doca_data ctx_user_data) noexcept
{
    auto *const request_task = static_cast<doca_rdma_task_receive *>(task_user_data.ptr);
 
    doca_buf_reset_data_len(doca_rdma_task_receive_get_dst_buf(request_task));
    auto const ret = doca_task_submit(doca_rdma_task_receive_as_task(request_task));
    if (ret != DOCA_SUCCESS) {
        DOCA_LOG_ERR("Failed re-submit request task: %s", doca_error_get_name(ret));
    }
 
    auto *const hot_data = static_cast<target_rdma_worker::hot_data *>(ctx_user_data.ptr);
    --(hot_data->in_flight_transaction_count);
}

The RDMA task send callback resets the task data and resubmits the recv task allowing the transaction to be reused.

References

/opt/mellanox/doca/applications/storage/
