DOCA Storage Target RDMA Application Guide
The doca_storage_target_rdma application provides a simple, volatile memory-backed RDMA storage target. It is designed to interact with a DOCA storage service, offering direct read and write access to a dedicated memory region using RDMA.
The doca_storage_target_rdma application performs the following core functions:
Exposes a memory-backed storage region for use by the storage service client.
Handles RDMA-based I/O operations (reads and writes) using the DOCA RDMA library.
To perform these tasks, the application acts as a TCP server, waiting for an incoming connection from a storage service initiator. Once connected, it handles control and data path interactions as described on this page.
The application is divided into two main functional areas:
Control-Time and Shared Resources – Includes TCP server setup, memory registration, and RDMA connection handling.
Per-Thread Data Path Resources – Includes thread-local RDMA resources and task management structures.
The application executes in two main phases:
Control Phase
Data Path Phase
Control Phase
This phase begins when a connection is established with a storage service over TCP. The application then processes a sequence of control commands:
Query Storage
Reports the size and layout of the exposed storage region.
Init Storage
Validates the requested number of worker threads.
Allocates and registers local memory for RDMA operations.
Imports remote memory handles provided by the initiator.
Creates internal worker objects for task execution.
Wait for RDMA Connection Requests
Waits for one RDMA connection per requested core/thread.
Ensures each RDMA session is fully established before proceeding.
Start Storage
Waits for all RDMA connections to be ready.
Submits initial tasks to prepare the data path phase.
Launches the data path threads.
Once the Start Storage command is received and all threads are active, the application transitions to the data path phase. The main thread remains active, waiting for final control commands:
Stop Storage
Terminates active data threads cleanly.
Shutdown
Performs cleanup and resource deallocation, shutting down the application.
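The command ordering described above can be pictured as a small state machine. The following is a minimal sketch and not the application's code: the `control_command` and `target_state` enumerations and the `try_advance` function are hypothetical names chosen for illustration, and the real application may accept other orderings or handle out-of-sequence commands differently.

```cpp
#include <cassert>

// Hypothetical names for illustration; not taken from the application source.
enum class control_command {
    query_storage,
    init_storage,
    create_rdma_connection,
    start_storage,
    stop_storage,
    shutdown
};

enum class target_state { connected, queried, initialised, connecting, running, stopped, shut_down };

// Applies cmd to state. Returns true and updates state when the command is
// in sequence; returns false and leaves state untouched otherwise.
inline bool try_advance(target_state &state, control_command cmd)
{
    switch (cmd) {
    case control_command::query_storage:
        if (state == target_state::connected) { state = target_state::queried; return true; }
        break;
    case control_command::init_storage:
        if (state == target_state::queried) { state = target_state::initialised; return true; }
        break;
    case control_command::create_rdma_connection:
        // One request arrives per RDMA connection, so this command may repeat.
        if (state == target_state::initialised || state == target_state::connecting) {
            state = target_state::connecting;
            return true;
        }
        break;
    case control_command::start_storage:
        if (state == target_state::connecting) { state = target_state::running; return true; }
        break;
    case control_command::stop_storage:
        if (state == target_state::running) { state = target_state::stopped; return true; }
        break;
    case control_command::shutdown:
        if (state == target_state::stopped) { state = target_state::shut_down; return true; }
        break;
    }
    return false;
}
```

In this sketch a caller starts in `target_state::connected` once the TCP client has connected and feeds each received command to `try_advance`, answering with an error when it returns false.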
Data Path Phase
Each data path thread performs I/O processing independently based on client requests. The typical per-thread flow is:
Receive I/O Request
Handle incoming requests from the initiator via RDMA.
Perform RDMA Operation
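Depending on the request type, either:
For a read request, execute an RDMA write that transfers data from the local memory region to the initiator.
For a write request, execute an RDMA read that transfers data from the initiator into the local memory region.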
Send I/O Response
Return a response back to the initiator, indicating success or failure of the operation.
This application leverages the following DOCA libraries:
This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage page.
Application Execution
DOCA Storage Target RDMA is provided in source form. Therefore, compilation is required before the application can be executed.
Application usage instructions:
Usage: doca_storage_target_rdma [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                Print a help synopsis
  -v, --version             Print program version information
  -l, --log-level           Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level           Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>         Parse command line flags from an input json file

Program Flags:
  -d, --device              Device identifier
  --cpu                     CPU core to which the process affinity can be set
  --listen-port             TCP listen port number
  --binary-content          Path to binary.sbc file containing the initial content to be represented by this storage instance
  --block-count             Number of available storage blocks. (Ignored when using content binary file) Default: 128
  --block-size              Block size used by the storage. (Ignored when using content binary file) Default: 4096
Info: This usage printout can be printed to the command line using the -h (or --help) option:
./doca_storage_target_rdma -h
For additional information, refer to section "Command-line Flags".
CLI example for running the application:
./doca_storage_target_rdma -d 3b:00.0 --listen-port 12345 --block-size 4096 --block-count 64 --cpu 0
Note: The DOCA device PCIe address (3b:00.0) should match the address of the desired PCIe device.
Command-line Flags
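Flag Type | Short Flag | Long Flag | Description |
General flags | -h | --help | Print a help synopsis |
General flags | -v | --version | Print program version information |
General flags | -l | --log-level | Set the log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | N/A | --sdk-log-level | Set the SDK log level for the program: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE |
General flags | -j | --json | Parse command line flags from an input JSON file as well as from the CLI (if provided) |
Program flags | -d | --device | DOCA device identifier, for example a PCIe address such as 3b:00.0. Note: This flag is mandatory. |
Program flags | N/A | --cpu | Index of CPU to use. One data path thread is spawned per CPU. Index starts at 0. Note: The user can specify this argument multiple times to create more threads. Note: This flag is mandatory. |
Program flags | N/A | --listen-port | Port on which to listen for incoming TCP connections. Note: This flag is mandatory. |
Program flags | N/A | --binary-content | Path to a file to be used to provide initial content to the storage instance. |
Program flags | N/A | --block-count | Number of storage blocks to provide |
Program flags | N/A | --block-size | Size of each storage block |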
A user should provide one of:
--binary-content: Where the file is a .sbc file. The .sbc file provides the storage dimensions and the data used to populate the blocks.
OR (random / uninitialised bytes with user-defined dimensions)
--block-count --block-size
OR (initialised bytes with user-defined dimensions)
--block-count --block-size --binary-content: Where the file is plain content to be distributed across the storage and its size == block count * block size
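The size rule for plain content can be checked with a few lines of code. The sketch below is an illustration only: `storage_dimensions`, `capacity_bytes`, and `validate_plain_content` are hypothetical names, the .sbc format is not modelled at all, and rejecting zero-sized dimensions is an assumption rather than documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical type for illustration; not taken from the application source.
struct storage_dimensions {
    uint32_t block_count;
    uint32_t block_size;
};

// Capacity in bytes. The first operand is widened to 64 bits before the
// multiplication so large block counts cannot overflow a 32-bit intermediate.
inline uint64_t capacity_bytes(storage_dimensions const &dims)
{
    return uint64_t{dims.block_count} * dims.block_size;
}

// Checks plain (non-.sbc) initial content against user-supplied dimensions.
// Empty content means "leave the blocks uninitialised" and is accepted.
inline void validate_plain_content(storage_dimensions const &dims, std::vector<uint8_t> const &content)
{
    if (dims.block_count == 0 || dims.block_size == 0) {
        // Assumption: zero-sized storage is treated as a configuration error.
        throw std::invalid_argument("block count and block size must be non-zero");
    }
    if (!content.empty() && content.size() != capacity_bytes(dims)) {
        throw std::invalid_argument("content size must equal block count * block size");
    }
}
```

With the values from the CLI example above (64 blocks of 4096 bytes), `capacity_bytes` yields 262144 bytes, which is the size a plain content file would need to have.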
Troubleshooting
Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.
The flow of the application is broken down into key functions / steps:
target_rdma_app app{parse_target_rdma_app_cli_args(argc, argv)};
storage::install_ctrl_c_handler([&app]() {
app.abort("User requested abort");
});
app.wait_for_client_connection();
app.wait_for_and_process_query_storage();
app.wait_for_and_process_init_storage();
app.wait_for_and_process_create_rdma_connections();
app.wait_for_and_process_start_storage();
app.wait_for_and_process_stop_storage();
app.wait_for_and_process_shutdown();
app.display_stats();
Main/Control Thread Flow
-
target_rdma_app app{parse_target_rdma_app_cli_args(argc, argv)};
Parse CLI arguments and use these to create the application instance. Initial resources are also created at this stage:
-
Open a doca_dev as specified by the CLI argument: -d or --device.
-
m_storage_block_count = m_cfg.block_count;
m_storage_block_size = m_cfg.block_size;
auto const page_size = storage::get_system_page_size();
m_local_io_region_size = uint64_t{m_storage_block_count} * m_storage_block_size;
m_local_io_region = static_cast<uint8_t *>(storage::aligned_alloc(page_size, m_local_io_region_size));
Allocate storage IO blocks memory.
-
if (!m_cfg.content.empty()) {
    std::copy(std::begin(m_cfg.content), std::end(m_cfg.content), m_local_io_region);
}
Copy user-defined content into the storage IO blocks memory if it was provided.
-
m_control_channel = storage::control::make_tcp_server_control_channel(m_cfg.listen_port);
Create a TCP server control channel using the listen port specified by the CLI argument --listen-port. Control channel objects provide a unified API, so a TCP client, a TCP server, doca_comch_client, and doca_comch_server all expose a consistent interface.
Info: See storage_common/control_channel.hpp for more information about the control channel abstraction.
-
-
storage::install_ctrl_c_handler([&app]() {
    app.abort("User requested abort");
});
Set a signal handler for Ctrl+C keyboard input so the application can shut down gracefully.
-
app.wait_for_client_connection();
Wait for a TCP client (the storage service) to connect.
-
app.wait_for_and_process_query_storage();
Wait for the storage service to send a query_storage_request control message and then perform the required actions to fulfill the request:
Populate a query_storage_response with the storage capacity (block count * block size) and the block size.
Send the response to the storage service.
-
app.wait_for_and_process_init_storage();
Wait for the storage service to send an init_storage_request control message and then perform the required actions to fulfill the request:
Use the init_storage_payload data to:
Set the core count (m_core_count) to the number of cores requested by the initiator (the number of --cpu arguments provided to the initiator), OR fail if this is more than the number of --cpu arguments provided to the service.
Set the number of transactions per core (m_num_transactions).
Create a doca_mmap object from the provided export blob.
Create m_core_count worker threads.
Perform the first stages of worker thread initialization. These steps are carried out for each thread, but only one thread performs them at any time. This simplifies the sending and receiving of control messages; the flow could be modified to execute in parallel if desired.
Create a thread bound to the Nth CPU provided to the service via the --cpu CLI arguments.
-
m_workers[ii].execute_control_command(
    worker_create_objects_control_command{m_dev, m_transaction_count, m_remote_io_mmap, m_local_io_mmap});
Initialize the thread context (asynchronously).
Send a response to the storage service:
Send an init_storage_response message upon success or an error_response message if anything failed.
-
app.wait_for_and_process_create_rdma_connections();
Wait for the storage service to send a create_rdma_connection_request control message and then perform the required actions to fulfill the request:
-
void const *blob = nullptr;
size_t blob_size = 0;
doca_rdma_export(rdma_ctx, &blob, &blob_size, &rdma_conn);
Create the local side of the RDMA connection by exporting the RDMA context, and store the created connection blob.
-
doca_rdma_connect(rdma_ctx, cmd.import_blob.data(), cmd.import_blob.size(), rdma_conn);
Connect to the remote side of the RDMA connection.
Info: This RDMA connection activity is performed twice per worker thread, so that IO control messages travel over one connection while the other is used for data transfer. The aim is to reduce latency: on a single connection, doca_rdma_task_send, doca_rdma_task_read, and doca_rdma_task_write tasks execute in submission order. If, for example, 32 IO requests arrive in close succession and the corresponding 32 read / write tasks are submitted, then once the first operation completes, its response is blocked until the remaining 31 data transfers complete. This can significantly delay the response to the service / initiator and greatly increase the round-trip latency of a storage request. With two connections, when the first of the N tasks completes, its IO response can be sent immediately while the remaining N-1 read / write operations continue to execute.
Send a response to the storage service:
Send a create_rdma_connection_response message upon success or an error_response message if anything failed.
-
-
app.wait_for_and_process_start_storage();
Wait for the storage service to send a start_storage_request control message and then perform the required actions to fulfill the request:
-
doca_error_t result;
doca_ctx_states ctx_state;
result = doca_ctx_get_state(doca_rdma_as_ctx(rdma_ctx), &ctx_state);
if (result == DOCA_SUCCESS && ctx_state == DOCA_CTX_STATE_RUNNING) {
    // RDMA connection is ready
}
Poll all worker connections until the RDMA connections are fully established.
Signal all worker threads to begin data path operation.
Send a response to the storage service:
Send a start_storage_response message upon success or an error_response message if anything failed.
-
Data path execution now takes place until either the user aborts the program or a stop message is received.
-
app.wait_for_and_process_stop_storage();
Wait for the storage service to send a stop_storage_request control message and then perform the required actions to fulfill the request:
Join all worker threads.
Send a response to the storage service:
Send a stop_storage_response message upon success or an error_response message if anything failed.
-
app.wait_for_and_process_shutdown();
Wait for the storage service to send a shutdown_request control message and then perform the required actions to fulfill the request:
Collect runtime stats.
Destroy workers.
Send a response to the storage service:
Send a shutdown_response message upon success or an error_response message if anything failed.
-
app.display_stats();
Display runtime statistics.
Application destructor
Destroy TCP control channel
Destroy doca_mmap objects.
Destroy IO blocks memory.
Close doca_dev.
Application exits.
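The Info note in the create RDMA connections step explains why each worker uses two RDMA connections. The toy model below illustrates the head-of-line effect it describes. It is a sketch under simplifying assumptions: every data transfer takes the same fixed time, a connection executes its tasks strictly one after another, and all requests are submitted at time zero. The class and function names are hypothetical and nothing here calls DOCA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model of a connection that executes tasks strictly in submission order.
class in_order_connection {
public:
    // Submits a task at time `now` that runs for `duration`; returns the time
    // at which the task completes.
    uint64_t submit(uint64_t now, uint64_t duration)
    {
        uint64_t const start = std::max(now, m_busy_until);
        m_busy_until = start + duration;
        return m_busy_until;
    }

private:
    uint64_t m_busy_until = 0;
};

// Completion time of the response to the first of `n` back-to-back requests.
// When `split` is true, responses use a dedicated connection; otherwise they
// share the connection that carries the data transfers.
inline uint64_t first_response_done(uint32_t n, uint64_t transfer_time, uint64_t send_time, bool split)
{
    in_order_connection data;
    in_order_connection messages;
    uint64_t first_transfer_done = 0;
    for (uint32_t i = 0; i < n; ++i) {
        uint64_t const done = data.submit(0, transfer_time);
        if (i == 0) {
            first_transfer_done = done;
        }
    }
    // The response is submitted as soon as the first transfer completes.
    in_order_connection &response_path = split ? messages : data;
    return response_path.submit(first_transfer_done, send_time);
}
```

With 32 requests, a transfer time of 10 units, and a send time of 1 unit, the shared connection delivers the first response at time 321, while the split arrangement delivers it at time 11. The absolute numbers are artefacts of the model; only the ratio illustrates the point.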
Worker/Data Path Thread Flow
The worker thread procedure executes in two phases: a control / configuration phase, followed by a data path phase in which read and write operations take place.
Worker Init Process
void target_rdma_worker::thread_proc(target_rdma_worker *self, uint16_t core_idx) noexcept
The worker starts by executing a loop of:
Lock mutex
If message pointer is not null:
Process the configuration message.
Set the operation result.
Unlock the mutex.
Yield.
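One iteration of the loop above can be sketched as follows. This is an illustration, not the application's code: `control_mailbox` and `poll_mailbox_once` are hypothetical names, the command is modelled as a `std::function` returning an integer result, and clearing the message pointer after processing is an assumption about how the slot is released.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical mailbox shared by the control thread and one worker thread.
struct control_mailbox {
    std::mutex lock;
    std::function<int()> *message = nullptr; // pending command, or nullptr
    int result = 0;                          // outcome of the last command
    bool result_ready = false;               // set once result is valid
};

// Runs one iteration of the configuration loop: lock, check for a message,
// process it, record the result, unlock. Returns true when a command was
// processed. The caller is expected to yield between iterations.
inline bool poll_mailbox_once(control_mailbox &mailbox)
{
    std::lock_guard<std::mutex> guard(mailbox.lock);
    if (mailbox.message == nullptr) {
        return false;
    }
    mailbox.result = (*mailbox.message)();
    mailbox.result_ready = true;
    mailbox.message = nullptr; // assumption: clearing the slot signals completion
    return true;
}
```

A worker thread would call `poll_mailbox_once` in a loop followed by `std::this_thread::yield()`, while the control thread fills `message` under the same mutex and waits for `result_ready`.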
The following configuration operations can be performed by the worker thread:
-
void target_rdma_worker::create_worker_objects(worker_create_objects_control_command const &cmd)
Create the objects required by the worker to carry out data path operations:
Create IO messages memory.
Create IO messages doca_mmap (to allow the messages to be accessed by RDMA).
Create doca_buf_inventory.
Create doca_pe to drive the DOCA contexts.
Create two doca_rdma contexts.
Configure and start RDMA contexts.
-
void target_rdma_worker::create_rdma_connection(worker_create_rdma_connection_command &cmd)
Create an RDMA connection from one of the two RDMA contexts held by the worker (which one to use is specified by the command):
Call doca_rdma_export to create the connection object and the connection blob.
Call doca_rdma_connect to start connecting to the remote side of the connection.
Store the local connection blob in the command object for use by the control thread to respond to the service.
-
void target_rdma_worker::are_contexts_ready(worker_are_contexts_ready_control_command &cmd) const noexcept
Check each RDMA context and report when both are ready for use.
-
void target_rdma_worker::prepare_tasks(void)
Final preparations for data path execution:
Cache frequently used static values into hot data (a single cache line).
Allocate transaction objects.
Allocate doca_buf objects.
Allocate doca_task objects.
Set task user data.
Set task pointers into the transaction context.
-
void target_rdma_worker::start_data_path()
Break out of the wait-for-configuration event loop and start the data path loop.
After the configuration phase, the mutex is not used again.
After breaking out of the initial configuration loop, the thread submits receive tasks (RDMA recv tasks) then enters the data path function: run_data_path_ops.
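The prepare_tasks step mentions caching frequently used values in hot data that fits a single cache line. A minimal sketch of that idea follows. The member names echo fields referenced elsewhere on this page, but the layout, the member types, and the 64-byte cache line size are assumptions; the application's real structure may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, which is common but not universal.
constexpr std::size_t cache_line_size = 64;

// Illustrative layout only; the real hot_data structure is not shown here.
struct alignas(cache_line_size) hot_data_sketch {
    void *pe;                             // progress engine handle
    char *local_memory_start_addr;        // start of the storage IO region
    char *remote_memory_start_addr;       // start of the initiator's region
    uint64_t pe_hit_count;                // polls that completed some work
    uint64_t pe_miss_count;               // polls that found nothing to do
    uint64_t completed_transaction_count;
    uint32_t in_flight_transaction_count;
    uint16_t core_idx;
    bool run_flag;
    bool error_flag;
};

// Fail the build if the structure grows beyond one cache line.
static_assert(sizeof(hot_data_sketch) == cache_line_size, "hot data must fit one cache line");
static_assert(alignof(hot_data_sketch) == cache_line_size, "hot data must be cache-line aligned");
```

Keeping everything the polling loop touches in one aligned line means each iteration reads and updates a single cache line, and the static assertions catch accidental growth at compile time.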
Worker Data Path Process
void target_rdma_worker::run_data_path_ops(target_rdma_worker::hot_data &hot_data)
{
DOCA_LOG_INFO("Core: %u running", hot_data.core_idx);
while (hot_data.run_flag) {
doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}
while (hot_data.error_flag == false && hot_data.in_flight_transaction_count != 0) {
doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}
}
During the data path phase, the thread simply polls the doca_pe as quickly as possible to check for a task completion from one of the RDMA contexts. The interesting work is done in the callbacks of these tasks. The flow always starts with an RDMA recv task completion, which is the reception of an IO message from the storage service.
doca_rdma_task_receive_cb
void target_rdma_worker::doca_rdma_task_receive_cb(doca_rdma_task_receive *task,
doca_data task_user_data,
doca_data ctx_user_data) noexcept
This callback is the main driver of the storage target. It carries out the following steps:
-
auto *const io_message = storage::get_buffer_bytes(doca_rdma_task_receive_get_dst_buf(task));
auto const message_type = storage::io_message_view::get_type(io_message);
Interpret the type of IO request that has been received.
-
char *const remote_addr = hot_data->remote_memory_start_addr + storage::io_message_view::get_requester_offset(io_message);
char *const local_addr = hot_data->local_memory_start_addr + storage::io_message_view::get_storage_offset(io_message);
uint32_t const transfer_size = storage::io_message_view::get_io_size(io_message);
Extract IO addresses and transfer size from the IO request.
-
auto *const transfer_ctx = static_cast<transfer_context *>(task_user_data.ptr);
Get a reference to the transaction object from the task user data.
-
switch (message_type) {
case storage::io_message_type::read: {
    doca_buf_inventory_buf_reuse_by_data(transfer_ctx->storage_buf, local_addr, transfer_size);
    doca_buf_inventory_buf_reuse_by_addr(transfer_ctx->host_buf, remote_addr, transfer_size);
    doca_task_submit(doca_rdma_task_write_as_task(transfer_ctx->write_task));
} break;
case storage::io_message_type::write: {
    doca_buf_inventory_buf_reuse_by_data(transfer_ctx->host_buf, remote_addr, transfer_size);
    doca_buf_inventory_buf_reuse_by_addr(transfer_ctx->storage_buf, local_addr, transfer_size);
    doca_task_submit(doca_rdma_task_read_as_task(transfer_ctx->read_task));
} break;
}
Submit the appropriate RDMA task.
An IO read moves data from the storage IO blocks to the remote IO blocks, so from the storage point of view it is a write.
An IO write moves data from the remote IO blocks to the storage IO blocks, so from the storage point of view it is a read.
If any action in this process fails, send an IO error response to the caller; otherwise wait for the RDMA read / write task to complete.
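The mapping from IO request type to RDMA operation, together with the offset arithmetic, can be sketched without any DOCA calls. Everything below is hypothetical and for illustration: the type names, the use of offsets instead of raw pointers, and in particular the bounds check, which the excerpt above does not show and which is added here as an assumption about what a careful implementation would verify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from the application source.
enum class io_type { read, write };
enum class rdma_op { rdma_write, rdma_read };

struct io_request {
    io_type type;
    uint64_t storage_offset;   // offset into the target's storage region
    uint64_t requester_offset; // offset into the initiator's memory region
    uint32_t io_size;
};

struct transfer_plan {
    rdma_op op;
    uint64_t local_offset;
    uint64_t remote_offset;
    uint32_t size;
};

// An initiator read pushes storage data to the initiator (RDMA write);
// an initiator write pulls initiator data into storage (RDMA read).
inline rdma_op rdma_op_for(io_type type)
{
    return type == io_type::read ? rdma_op::rdma_write : rdma_op::rdma_read;
}

// True when [offset, offset + size) lies inside a region of region_size
// bytes. Written so that offset + size is never computed and cannot overflow.
inline bool range_in_region(uint64_t offset, uint32_t size, uint64_t region_size)
{
    return size <= region_size && offset <= region_size - size;
}

// Fills `out` and returns true when the request fits both regions; returns
// false otherwise, in which case a caller would send an IO error response.
inline bool plan_transfer(io_request const &req,
                          uint64_t local_region_size,
                          uint64_t remote_region_size,
                          transfer_plan &out)
{
    if (!range_in_region(req.storage_offset, req.io_size, local_region_size) ||
        !range_in_region(req.requester_offset, req.io_size, remote_region_size)) {
        return false;
    }
    out = transfer_plan{rdma_op_for(req.type), req.storage_offset, req.requester_offset, req.io_size};
    return true;
}
```

In this sketch the offsets would be added to the region start addresses only after `plan_transfer` succeeds, which keeps the pointer arithmetic inside the registered memory.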
on_transfer_complete
void target_rdma_worker::on_transfer_complete(doca_task *task,
doca_data task_user_data,
doca_data ctx_user_data) noexcept
{
auto *const hot_data = static_cast<target_rdma_worker::hot_data *>(ctx_user_data.ptr);
auto *const response_task = static_cast<doca_rdma_task_send *>(task_user_data.ptr);
auto *const io_message =
storage::get_buffer_bytes(const_cast<doca_buf *>(doca_rdma_task_send_get_src_buf(response_task)));
++(hot_data->completed_transaction_count);
storage::io_message_view::set_type(storage::io_message_type::result, io_message);
storage::io_message_view::set_result(DOCA_SUCCESS, io_message);
auto const ret = doca_task_submit(doca_rdma_task_send_as_task(response_task));
if (ret != DOCA_SUCCESS) {
DOCA_LOG_ERR("Failed submit response task: %s", doca_error_get_name(ret));
}
}
This callback is reused for both doca_rdma_task_read and doca_rdma_task_write. The work is the same regardless of which task type completed, so more detailed task type information is not required. The callback updates the IO message to change its type to a response and sets the result to success. The IO message is then sent back to the storage service.
doca_rdma_task_send_cb
void target_rdma_worker::doca_rdma_task_send_cb(doca_rdma_task_send *task,
doca_data task_user_data,
doca_data ctx_user_data) noexcept
{
auto *const request_task = static_cast<doca_rdma_task_receive *>(task_user_data.ptr);
doca_buf_reset_data_len(doca_rdma_task_receive_get_dst_buf(request_task));
auto const ret = doca_task_submit(doca_rdma_task_receive_as_task(request_task));
if (ret != DOCA_SUCCESS) {
DOCA_LOG_ERR("Failed re-submit request task: %s", doca_error_get_name(ret));
}
auto *const hot_data = static_cast<target_rdma_worker::hot_data *>(ctx_user_data.ptr);
--(hot_data->in_flight_transaction_count);
}
The RDMA task send callback resets the task data and resubmits the recv task allowing the transaction to be reused.
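Taken together, the three callbacks move each transaction around a fixed cycle: receive, transfer, respond, and back to receive. The sketch below models that cycle and the two counters that appear in the code above. The names are hypothetical, and the point at which the in-flight count is incremented is an assumption, since the excerpts only show it being decremented in the send callback.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from the application source.
enum class transaction_stage { awaiting_request, transferring, responding };

struct worker_counters {
    uint32_t in_flight = 0;
    uint64_t completed = 0;
};

struct transaction {
    transaction_stage stage = transaction_stage::awaiting_request;
};

// Receive completion: a request arrived and an RDMA read or write is submitted.
// Assumption: this is where the in-flight count goes up.
inline void on_request_received(transaction &t, worker_counters &c)
{
    t.stage = transaction_stage::transferring;
    ++c.in_flight;
}

// Read / write completion: the response send is submitted.
inline void on_transfer_done(transaction &t, worker_counters &c)
{
    t.stage = transaction_stage::responding;
    ++c.completed;
}

// Send completion: the receive task is re-posted and the transaction is free.
inline void on_response_sent(transaction &t, worker_counters &c)
{
    t.stage = transaction_stage::awaiting_request;
    --c.in_flight;
}
```

Because the transaction returns to its starting stage after the send completes, a fixed pool of transactions allocated during prepare_tasks can serve an unbounded stream of requests, and an in-flight count of zero tells the drain loop in run_data_path_ops that it can exit.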
/opt/mellanox/doca/applications/storage/