
DOCA Storage Comch to RDMA Zero Copy Application Guide

The doca_storage_comch_to_rdma_zero_copy application serves as a bridge between the initiator and a single storage target. Its only role in the data path is to forward I/O requests and I/O responses between the initiator and the storage target.

The doca_storage_comch_to_rdma_zero_copy application performs the following functions:

  • Relaying I/O requests from the initiator to the storage target

  • Relaying I/O responses from the storage target to the initiator

To achieve this, the application connects to the storage target using a TCP connection and then listens for an incoming connection from a single initiator using a doca_comch_server.

The doca_storage_comch_to_rdma_zero_copy application is split into two functional areas:

  • Control time and shared resources

  • Per thread data path resources

[Figure: Zero copy application objects]

The flow of the application executes in two main phases:

  • Control phase

  • Data path phase

Control Phase

The application starts by connecting to the storage target, then waits for a client connection. Once all connections are established, the application waits for the appropriate control commands:

  • Query storage

  • Init storage

  • Start storage

Processing each control command follows a similar pattern, sketched after this list:

  • Relay the command to the storage target

  • Wait for the storage target to respond

  • Do the required post processing and consistency checks on the storage responses

  • Respond to the client
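As a rough illustration only, the pattern could be factored as below. The control_channel and message types here are simplified stand-ins, not the application's actual API (which lives in storage_common/control_channel.hpp):

#include <chrono>
#include <stdexcept>
#include <string>

// Simplified stand-ins for the real control channel and message types.
struct message {
    std::string type;    // e.g. "query_storage_request"
    std::string payload;
};

struct control_channel {
    virtual ~control_channel() = default;
    virtual void send(message const &msg) = 0;
    virtual message wait_for_message(std::chrono::seconds timeout) = 0;
};

// The relay pattern described above: forward, wait, check, respond.
message relay_control_command(control_channel &initiator,
                              control_channel &storage,
                              message const &request,
                              std::chrono::seconds timeout)
{
    storage.send(request);                             // relay the command to the storage target
    auto response = storage.wait_for_message(timeout); // wait for the storage target to respond
    if (response.type == "error_response") {           // consistency check on the storage response
        throw std::runtime_error{"storage target reported: " + response.payload};
    }
    initiator.send(response);                          // respond to the client
    return response;
}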

The start storage control command kicks off the data path phase. Data threads begin executing while the main thread waits for the final control messages that complete the application lifecycle (an enumeration of all the control message types appears after this list):

  • Stop storage

  • Shutdown
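Taken together, the control messages the service handles over its lifetime map onto a small set of request types. As a purely illustrative enumeration (the actual message definitions live in the storage_common control headers and differ in detail):

// Illustrative enumeration of the control requests described above.
enum class control_message_type {
    query_storage_request,
    init_storage_request,
    start_storage_request,
    stop_storage_request,
    shutdown_request,
};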

Data Path Phase

This phase happens per thread, with each thread performing the I/O operations requested by the initiator. Read and write requests are simply forwarded to the storage target; no actual processing is carried out by the data threads.

Read Data Flow

The regular read flow consists of the stages detailed in the following subsections.

1. Initiator Request

  1. The initiator sends an I/O request to the zero copy application.

  2. The zero copy application forwards the request verbatim to the storage target.

[Figure: Read flow, step 1: I/O request]

2. RDMA Transfer

  1. The storage target performs an RDMA write operation.

[Figure: Read flow, step 2: RDMA write]

3. Target Response

  1. The zero copy application receives a response from the storage target.

  2. The zero copy application forwards the response verbatim to the initiator.

[Figure: Read flow, step 3: I/O response]

Write Data Flow

1. Initiator Request

  1. The initiator sends an I/O request to the zero copy application.

  2. The zero copy application forwards the request verbatim to the storage target.

[Figure: Write flow, step 1: I/O request]

2. RDMA Transfer

The storage target performs an RDMA read operation.

[Figure: Write flow, step 2: RDMA read]

3. Target Response

  1. The zero copy application receives a response from the storage target.

  2. The zero copy application forwards the response verbatim to the initiator.

[Figure: Write flow, step 3: I/O response]

This application leverages the following DOCA libraries:

  • DOCA Comch

  • DOCA RDMA

This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage page.

Application Execution

Warning

This application can only run on the NVIDIA® BlueField® DPU.

Info

DOCA Storage Comch to RDMA Zero Copy is provided in source form. Therefore, compilation is required before the application can be executed.

  • Application usage instructions:

    Usage: doca_storage_comch_to_rdma_zero_copy [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse command line flags from an input json file

    Program Flags:
      -d, --device                      Device identifier
      -r, --representor                 Device host side representor identifier
      --cpu                             CPU core to which the process affinity can be set
      --storage-server                  Storage server addresses in <ip_addr>:<port> format
      --command-channel-name            Name of the channel used by the doca_comch_client. Default: "doca_storage_comch"
      --control-timeout                 Time (in seconds) to wait while performing control operations. Default: 5

    Info

    This usage printout can be printed to the command line using the -h (or --help) option:


    ./doca_storage_comch_to_rdma_zero_copy -h

    For additional information, refer to section "Command-line Flags".

  • CLI example for running the application on the BlueField:


    ./doca_storage_comch_to_rdma_zero_copy -d 03:00.0 -r 3b:00.0 --storage-server 172.17.0.1:12345 --cpu 0

    Note

    Both the DOCA Comch device PCIe address (03:00.0) and the DOCA Comch device representor PCIe address (3b:00.0) should match the addresses of the desired PCIe devices.

    Note

    Storage target IP address:port tuples should be updated to refer to the running storage target applications.

Command-line Flags

General flags:

  • -h, --help: Print a help synopsis

  • -v, --version: Print program version information

  • -l, --log-level: Set the log level for the application:

    • DISABLE=10

    • CRITICAL=20

    • ERROR=30

    • WARNING=40

    • INFO=50

    • DEBUG=60

    • TRACE=70 (requires compilation with TRACE log level support)

  • --sdk-log-level: Set the log level for the SDK:

    • DISABLE=10

    • CRITICAL=20

    • ERROR=30

    • WARNING=40

    • INFO=50

    • DEBUG=60

    • TRACE=70

  • -j, --json: Parse command line flags from an input JSON file as well as from the CLI (if provided)

Program flags:

  • -d, --device: DOCA device identifier. One of:

    • PCIe address: 3b:00.0

    • InfiniBand name: mlx5_0

    • Network interface name: en3f0pf0sf0

    This flag is mandatory.

  • -r, --representor: DOCA Comch device representor PCIe address. This flag is mandatory.

  • --cpu: Index of the CPU to use. One data path thread is spawned per CPU. Index starts at 0. This argument can be specified multiple times to create more threads. This flag is mandatory.

  • --storage-server: IP address and port used to establish the control TCP connection to the target. This flag is mandatory.

  • --command-channel-name: Allows customizing the server name used for this application instance if multiple Comch servers exist on the same device.

  • --control-timeout: Time, in seconds, to wait while performing control operations.
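Because --cpu may be repeated, a run that spawns two data path threads (using the same example addresses as in the CLI example above) would look like:

./doca_storage_comch_to_rdma_zero_copy -d 03:00.0 -r 3b:00.0 --storage-server 172.17.0.1:12345 --cpu 0 --cpu 1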


Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

The flow of the application is broken down into key functions / steps:

zero_copy_app app{parse_cli_args(argc, argv)};

storage::install_ctrl_c_handler([&app]() { app.abort("User requested abort"); });

app.connect_to_storage();
app.wait_for_comch_client_connection();
app.wait_for_and_process_query_storage();
app.wait_for_and_process_init_storage();
app.wait_for_and_process_start_storage();
app.wait_for_and_process_stop_storage();
app.wait_for_and_process_shutdown();
app.display_stats();

Main/Control Thread Flow

  1. zero_copy_app app{parse_cli_args(argc, argv)};

    Parse CLI arguments and use these to create the application instance. Initial resources are also created at this stage:

    1. DOCA_LOG_INFO("Open doca_dev: %s", m_cfg.device_id.c_str());
       m_dev = storage::open_device(m_cfg.device_id);

      Open a doca_dev as specified by the CLI argument: -d or --device

    2. DOCA_LOG_INFO("Open doca_dev_rep: %s", m_cfg.representor_id.c_str());
       m_dev_rep = storage::open_representor(m_dev, m_cfg.representor_id);

      Open a doca_dev_rep as specified by the CLI argument: -r or --representor

    3. m_storage_control_channel = storage::control::make_tcp_client_control_channel(m_cfg.storage_server_address);

      Create the TCP client control channel. (Control channel objects provide a unified API so that a TCP client, TCP server, doca_comch_client, and doca_comch_server can all be driven through a consistent interface.)

      Info

      See storage_common/control_channel.hpp for more information about the control channel abstraction.

    4. m_client_control_channel = storage::control::make_comch_server_control_channel(m_dev,
                                                                                      m_dev_rep,
                                                                                      m_cfg.command_channel_name.c_str(),
                                                                                      this,
                                                                                      new_comch_consumer_callback,
                                                                                      expired_comch_consumer_callback);

      Create a Comch server control channel (containing a doca_comch_server instance) using the device, representor, and channel name as specified by the CLI argument --command-channel-name, or the default value if none was specified.

  2. storage::install_ctrl_c_handler([&app]() { app.abort("User requested abort"); });

    Set a signal handler for Ctrl+C keyboard input so the application can shut down gracefully.

  3. app.connect_to_storage();

    Connect to the TCP server hosted by the storage target as defined by the CLI argument: --storage-server

    1. void zero_copy_app::connect_to_storage(void)
       {
           while (!m_storage_control_channel->is_connected()) {
               std::this_thread::sleep_for(std::chrono::milliseconds{100});
               if (m_abort_flag) {
                   throw storage::runtime_error{DOCA_ERROR_CONNECTION_ABORTED, "Aborted while connecting to storage"};
               }
           }
       }

      Poll the storage target control channel until either it connects, or the user aborts the application.

  4. app.wait_for_comch_client_connection();

    Wait for a doca_comch_client to connect.

    1. void zero_copy_app::wait_for_comch_client_connection(void)
       {
           while (!m_client_control_channel->is_connected()) {
               std::this_thread::sleep_for(std::chrono::milliseconds{100});
               if (m_abort_flag) {
                   throw storage::runtime_error{DOCA_ERROR_CONNECTION_ABORTED, "Aborted while connecting to client"};
               }
           }
       }

      Poll the Comch server control channel until a doca_comch_client has connected or the user aborts the application. If any further Comch client attempts to connect to the server, it is automatically rejected by the control channel, which is designed for a 1:1 relationship between clients and servers. A sleep is placed in this loop because it may take the user a few seconds to start the client, so there is no benefit to polling any faster.

  5. app.wait_for_and_process_query_storage();

    Wait for the initiator to send a query_storage_request control message and then perform the required actions to fulfill the request:

    1. Forward the query storage request to the storage target.

    2. Wait for the storage target to respond.

    3. Send a response to the initiator:

      1. Send a query_storage_response message upon success or an error_response message if anything failed

  6. app.wait_for_and_process_init_storage();

    Wait for the initiator to send an init_storage_request control message and then perform the required actions to fulfill the request:

    1. Use the init_storage_payload data to:

      1. Set the core count (m_core_count) to the number of cores requested by the initiator (the number of --cpu arguments provided to the initiator), or fail if this exceeds the number of --cpu arguments provided to the service.

      2. Set the number of transactions per core (m_transaction_count) to double the number of transactions requested by the initiator. Doubling allows for batched task submission and avoids a race condition where the initiator sees a response to a transaction and tries to re-submit it before the associated Comch producer event callback is received by the server; in that case the initiator would continually retry sending the task, degrading performance until the service catches up and re-submits the consumer task. This should be uncommon, but allocating double the transaction count ensures it can never cause contention: even if every single transaction on the initiator hit this issue, a full second set of transactions would be ready on the service side to receive the tasks. For example, if the initiator requests 64 transactions per core, the service allocates 128. A user can experiment with reducing this value to save memory if desired.

      3. Import then re-export the initiator IO blocks mmap; this allows the storage target to read/write directly from/to the initiator memory.

      4. Send an init storage request to the storage target using:

        1. The service transaction count (double the initiator value).

        2. The initiator core count.

        3. The re-exported IO blocks mmap.

      5. Send a response to the initiator:

        1. Send an init_storage_response message upon success or an error_response message if anything failed

      6. Perform the first stages of the worker threads' initialization. These steps are carried out for each thread, but only one thread performs the steps at any time; this simplifies the sending and receiving of control messages. The user could modify this flow to execute in parallel if desired. (A condensed sketch of this per-thread sequence appears after this numbered flow.)

        1. Create a thread bound to the Nth CPU provided to the service via the --cpu CLI arguments.

        2. m_workers[ii].execute_control_command(worker_create_objects_control_command{
               m_dev,
               m_client_control_channel->get_comch_connection(),
               m_transaction_count});

          Initialize the thread context (asynchronously).

        3. connect_rdma(ii, storage::control::rdma_connection_role::io_data, cid);

          Create the io_data RDMA connection (asynchronously). The thread connects to the storage target and creates an RDMA context which remains idle from the service's point of view, but is used by the storage target to perform RDMA read/write operations. See the DOCA Storage Target RDMA Application Guide for an explanation of why there are two RDMA contexts per thread.

        4. connect_rdma(ii, storage::control::rdma_connection_role::io_control, cid);

          Create the io_control RDMA connection (asynchronously). The thread connects to the storage target and creates an RDMA context which is used to exchange IO requests and responses using RDMA send/recv tasks. See the DOCA Storage Target RDMA Application Guide for an explanation of why there are two RDMA contexts per thread.

  7. app.wait_for_and_process_start_storage();

    Wait for the initiator to send a start_storage_request control message and then perform the required actions to fulfill the request:

    1. Forward the start storage request to the storage target.

    2. Wait for the storage target to respond.

    3. Signal all worker threads to begin data path operation.

    4. Send a response to the initiator:

      1. Send a start_storage_response message upon success or an error_response message if anything failed.

  8. Data path execution takes place now until either the user aborts the program or a stop message is received.

  9. app.wait_for_and_process_stop_storage();

    Wait for the initiator to send a stop_storage_request control message and then perform the required actions to fulfill the request:

    1. Forward the stop storage request to the storage target.

    2. Wait for the storage target to respond.

    3. Signal all worker threads to stop data path operation.

    4. Collect runtime statistics.

    5. Send a response to the initiator:

      1. Send a stop_storage_response message upon success or an error_response message if anything failed.

  10. app.wait_for_and_process_shutdown();

    Wait for the initiator to send a shutdown_request control message and then perform the required actions to fulfill the request:

    1. Forward the shutdown request to the storage target.

    2. Wait for the storage target to respond.

    3. Destroy worker thread objects.

    4. Send a response to the initiator:

      1. Send a shutdown_response message upon success or an error_response message if anything failed.

  11. app.display_stats();

    Display runtime statistics.

  12. Application destructor is triggered:

    1. Destroy control channels.

    2. Destroy initiator IO blocks doca_mmap.

    3. Close doca_dev_rep.

    4. Close doca_dev.

  13. Program exits.
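For reference, the per-thread initialization performed while processing init storage (step 6 above) can be condensed into a single loop. This is a simplified sketch assembled from the calls shown in that step, using the names that appear there (m_workers, m_core_count, connect_rdma, cid); it is not a verbatim excerpt from the application source:

// Simplified sketch: sequential per-thread initialization from step 6.
for (uint16_t ii = 0; ii != m_core_count; ++ii) {
    // Create the worker's objects (asynchronously, on the worker's thread).
    m_workers[ii].execute_control_command(worker_create_objects_control_command{
        m_dev,
        m_client_control_channel->get_comch_connection(),
        m_transaction_count});

    // Establish the two RDMA connections used by the storage target:
    // one for RDMA read/write data, one for IO request/response exchange.
    connect_rdma(ii, storage::control::rdma_connection_role::io_data, cid);
    connect_rdma(ii, storage::control::rdma_connection_role::io_control, cid);
}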

Worker/Data Path Thread Flow

The worker thread procedure executes in two phases: a control/configuration phase, followed by a data path phase where the read and write forwarding operations take place.

Worker Init Process

void zero_copy_app_worker::thread_proc(zero_copy_app_worker *self, uint16_t core_idx) noexcept

The worker starts by executing a loop of the following steps (a sketch follows the list):

  1. Lock mutex.

  2. If the message pointer is not null:

    1. Process the configuration message.

    2. Set the operation result.

  3. Unlock the mutex.

  4. Yield.
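As a minimal sketch of this loop, assuming illustrative member names (the actual zero_copy_app_worker members differ):

#include <mutex>
#include <thread>

struct control_command; // opaque configuration message type (illustrative)

// Illustrative stand-in for the worker's control-message state.
struct worker_control_state {
    std::mutex mutex;
    control_command *message = nullptr; // set by the main/control thread
    bool result_ready = false;
    bool start_data_path = false;       // set once start_data_path() is processed
};

// Sketch of the configuration loop: lock, process any pending message,
// unlock, then yield until the data path phase begins.
void configuration_loop(worker_control_state &state)
{
    for (;;) {
        {
            std::lock_guard<std::mutex> lock{state.mutex};
            if (state.message != nullptr) {
                // ... process the configuration message here ...
                state.message = nullptr;
                state.result_ready = true; // set the operation result
            }
            if (state.start_data_path) {
                break;
            }
        }
        std::this_thread::yield();
    }
}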

The following configuration operations can be performed by the worker thread:

  1. void zero_copy_app_worker::create_worker_objects(worker_create_objects_control_command const &cmd)

    Create general worker objects:

    1. Create IO message memory.

    2. Create the IO message mmap (to allow the messages to be accessed by DOCA Comch and DOCA RDMA).

    3. Allocate doca buffer inventory.

    4. Create a doca_pe to drive the DOCA contexts (doca_rdma, doca_comch_consumer, doca_comch_producer).

    5. Create doca_comch_consumer and doca_comch_producer contexts.

    6. Initialize and start contexts.

  2. void zero_copy_app_worker::export_local_rdma_connection_blob(worker_export_local_rdma_connection_command &cmd)

    Export RDMA context connection binary blob.

  3. void zero_copy_app_worker::import_remote_rdma_connection_blob(worker_import_local_rdma_connection_command const &cmd)

    Import remote RDMA context connection binary blob.

  4. void zero_copy_app_worker::are_contexts_ready(worker_are_contexts_ready_control_command &cmd) const noexcept

    Poll all contexts to check they are ready to perform data path operations.

  5. void zero_copy_app_worker::prepare_tasks(worker_prepare_tasks_control_command const &cmd)

    Prepare transaction contexts by:

    1. Allocating doca_buf objects.

    2. Allocating doca_task objects.

    3. Setting task user data.

  6. void zero_copy_app_worker::start_data_path(void)

    Break out of the wait for configuration event loop and start the data path loop.

Info

After the configuration phase, the mutex is not used again.

After breaking out of the initial configuration loop, the thread submits receive tasks (Comch consumer tasks and RDMA recv tasks), then enters the data path function: run_data_path_ops.

Worker Data Path Process

void zero_copy_app_worker::run_data_path_ops(zero_copy_app_worker::hot_data &hot_data)
{
    DOCA_LOG_INFO("Core: %u running", hot_data.core_idx);

    while (hot_data.run_flag) {
        doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
    }

    while (hot_data.error_flag == false && hot_data.in_flight_transaction_count != 0) {
        doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
    }

    DOCA_LOG_INFO("Core: %u complete", hot_data.core_idx);
}

During the data path phase, the thread simply polls the doca_pe as quickly as possible to check for task completions from one of the thread's DOCA contexts (doca_rdma, doca_comch_consumer, doca_comch_producer). The interesting work is done in the callbacks of these tasks. The flow always starts with a consumer task completion, which is the reception of an IO message from the initiator. For the zero copy use case, the callbacks simply forward the IO requests and responses (a sketch follows this list):

  • Comch consumer → RDMA send

  • RDMA recv → Comch producer
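Reduced to their essence, and using hypothetical wrapper names rather than the actual DOCA task APIs, the two forwarding callbacks amount to:

// Hypothetical sketch of the two forwarding callbacks; io_message and the
// worker_context methods are illustrative stand-ins, not DOCA APIs.
struct io_message { /* opaque IO request/response buffer */ };

struct worker_context {
    // Each wraps submission of the corresponding DOCA task (details omitted).
    void rdma_send(io_message const &) { /* submit RDMA send task */ }
    void comch_producer_send(io_message const &) { /* submit Comch producer task */ }
};

// Comch consumer completion: an IO request arrived from the initiator.
void on_comch_consumer_recv(worker_context &ctx, io_message const &request)
{
    ctx.rdma_send(request); // forward the request verbatim to the storage target
}

// RDMA recv completion: an IO response arrived from the storage target.
void on_rdma_recv(worker_context &ctx, io_message const &response)
{
    ctx.comch_producer_send(response); // forward the response verbatim to the initiator
}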

References

  • /opt/mellanox/doca/applications/storage/

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025