DOCA Documentation v3.2.0

DOCA Storage Initiator ComCh Application Guide

The doca_storage_initiator_comch application functions as a client to the DOCA storage service. It benchmarks the performance of data transfers over the ComCh interface, providing detailed measurements of throughput, bandwidth, and latency.

The doca_storage_initiator_comch application performs the following key tasks:

  • Initiates a sequence of I/O requests to exercise the DOCA storage service.

  • Measures and reports storage performance, including:

    • Millions of I/O operations per second (IOPS)

    • Effective data bandwidth

    • I/O operation latency:

      • Minimum latency

      • Maximum latency

      • Mean latency
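
The reported figures can be derived from per-operation timing records kept by each worker. The following is a minimal, illustrative sketch (the io_timing record and report_stats() function are assumptions made for this example, not the application's actual types) of how MIOPS, bandwidth, and the latency statistics are typically computed:

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <limits>
    #include <vector>

    // One record per completed IO operation (illustrative only).
    struct io_timing {
        std::chrono::steady_clock::duration latency; // request submission to response receipt
    };

    // Compute MIOPS, effective bandwidth, and min/max/mean latency from the
    // collected timings, the total wall-clock duration, and the bytes per IO.
    void report_stats(const std::vector<io_timing> &ops,
                      std::chrono::steady_clock::duration elapsed,
                      std::size_t bytes_per_op)
    {
        using fsec = std::chrono::duration<double>;
        using usec = std::chrono::duration<double, std::micro>;

        const double seconds = std::chrono::duration_cast<fsec>(elapsed).count();
        const double miops = (ops.size() / seconds) / 1e6;
        const double gibps = (static_cast<double>(ops.size()) * bytes_per_op) / seconds / (1 << 30);

        double min_us = std::numeric_limits<double>::max();
        double max_us = 0.0;
        double sum_us = 0.0;
        for (const auto &op : ops) {
            const double us = std::chrono::duration_cast<usec>(op.latency).count();
            min_us = std::min(min_us, us);
            max_us = std::max(max_us, us);
            sum_us += us;
        }
        const double mean_us = ops.empty() ? 0.0 : sum_us / ops.size();
        if (ops.empty())
            min_us = 0.0;

        std::printf("MIOPS: %.3f, bandwidth: %.3f GiB/s\n", miops, gibps);
        std::printf("latency (usec) -- min: %.2f, max: %.2f, mean: %.2f\n", min_us, max_us, mean_us);
    }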

The application establishes a connection to the storage service running on an NVIDIA® BlueField® platform using the doca_comch_client interface.

The application is logically divided into two functional areas:

  • Control-time and shared resources

  • Per-thread data path resources

[Figure: initiator_comch application objects]

The execution consists of two main phases:

  • Control Phase

  • Data Path Phase

Control Phase

The control phase is responsible for establishing and managing the storage session lifecycle. It performs the following steps:

  1. Query storage – Retrieve storage service metadata.

  2. Init storage – Initialize the storage window and prepare for data transfers.

  3. Start storage – Begin the data path phase and launch data threads.

Once the data phase is initiated, the main thread waits for test completion. Afterward, it issues final cleanup commands:

  1. Stop storage – Stop data movement.

  2. Shutdown – Tear down the session and release resources.
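
A minimal sketch of the request/response pattern behind these control commands, bounded by the --control-timeout value, is shown below. The send_request() and try_receive_response() helpers are stand-ins invented for this example; the actual application exchanges these messages over its comch control channel abstraction.

    #include <chrono>
    #include <optional>
    #include <stdexcept>
    #include <thread>

    // Control commands issued by the initiator over the session lifecycle.
    enum class control_op { query_storage, init_storage, start_storage, stop_storage, shutdown };

    struct control_response { bool success; };

    // Stand-in transport hooks for illustration only.
    static void send_request(control_op) {}
    static std::optional<control_response> try_receive_response() { return control_response{true}; }

    // Send one control command and poll for its response until the timeout expires.
    control_response execute_control_command(control_op op, std::chrono::seconds timeout)
    {
        send_request(op);
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (std::chrono::steady_clock::now() < deadline) {
            if (auto response = try_receive_response())
                return *response;
            std::this_thread::sleep_for(std::chrono::milliseconds{1});
        }
        throw std::runtime_error("control operation timed out");
    }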

Data Path Phase

The selected --execution-strategy is used to pre-initialise a set of per-task flags. These flags decide which action the task should take next, allowing the task to perform any combination of read, write, memory set, memory clear, and validate. The order of execution of each action is hard-coded for simplicity (a simplified sketch follows the list below). The order of actions is:

  1. Initialise transaction:

    1. Set actions bitmask to match --execution-strategy.

    2. Get a local memory address from the pool (of initiator IO blocks assigned to this thread).

    3. Get a remote address from the pool (of storage IO blocks assigned to this thread).

  2. Set memory to initial value:

    1. Copy the content provided by --storage-plain-content corresponding to the address defined by the storage IO block into the memory defined by the initiator IO block.

  3. Start an IO write:

    1. Send an IO write operation request.

    2. Wait for the IO response.

  4. Clear current memory content:

    1. Clear the Initiator memory defined by initiator IO block.

  5. Start an IO read:

    1. Send an IO read operation request.

    2. Wait for the IO response.

  6. Validate memory content:

    1. Compare the content of the initiator IO block against the expected value in --storage-plain-content.

  7. Finish transaction:

    1. Calculate and record stats.

    2. Return initiator IO block and storage IO block to the pool.

    3. End the test if the --run-limit-operation-count has been reached.

    4. Recycle the transaction (go back to step 1).
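
A simplified sketch of how such an action bitmask and the hard-coded order can be expressed is shown below. The flag names, the strategy-to-bitmask mapping, and the Task interface are assumptions made for this illustration rather than the application's actual definitions:

    #include <cstdint>
    #include <string_view>

    // One bit per optional action; --execution-strategy decides which bits are set.
    enum task_action : uint32_t {
        action_set_memory   = 1u << 0,
        action_write        = 1u << 1,
        action_clear_memory = 1u << 2,
        action_read         = 1u << 3,
        action_validate     = 1u << 4,
    };

    // Illustrative mapping from execution strategy to action bitmask.
    constexpr uint32_t actions_for_strategy(std::string_view strategy)
    {
        if (strategy == "read_throughput_test")
            return action_read;
        if (strategy == "write_throughput_test")
            return action_write;
        if (strategy == "read_only_data_validity_test")
            return action_read | action_validate;
        // read_write_data_validity_test exercises every action.
        return action_set_memory | action_write | action_clear_memory | action_read | action_validate;
    }

    // The order of execution is fixed; the bitmask only decides which steps run.
    template <typename Task>
    void run_transaction(Task &task, uint32_t actions)
    {
        task.init();                                             // 1. initialise transaction
        if (actions & action_set_memory)   task.set_memory();    // 2. set memory to initial value
        if (actions & action_write)        task.do_write();      // 3. IO write + wait for response
        if (actions & action_clear_memory) task.clear_memory();  // 4. clear current memory content
        if (actions & action_read)         task.do_read();       // 5. IO read + wait for response
        if (actions & action_validate)     task.validate();      // 6. validate memory content
        task.finish();                                           // 7. record stats, recycle
    }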

This application leverages the following DOCA libraries:

  • DOCA Comch (doca_comch_client, doca_comch_consumer, doca_comch_producer)

  • DOCA Core (doca_dev, doca_mmap, doca_buf, doca_pe)

This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage page.

Application Execution

Warning

This application can only run from the host.

Info

DOCA Storage Initiator Comch is provided in source form. Therefore, compilation is required before the application can be executed.

  • Application usage instructions:

    Usage: doca_storage_initiator_comch [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse command line flags from an input json file

    Program Flags:
      -d, --device                      Device identifier
      --cpu                             CPU core to which the process affinity can be set
      --storage-plain-content          File containing the plain data that is represented by the storage
      --execution-strategy              Define what to run. One of: read_throughput_test | write_throughput_test | read_write_data_validity_test | read_only_data_validity_test
      --run-limit-operation-count       Run N operations (per thread) then stop. Default: 1000000
      --transaction-count               Number of concurrent IO transactions (per thread) to use. Default: 64
      --command-channel-name            Name of the channel used by the doca_comch_client. Default: "doca_storage_comch"
      --control-timeout                 Time (in seconds) to wait while performing control operations. Default: 5
      --local-io-region-size            Size in bytes of the local region. Value can be 0 meaning automatically size to the same size as the storage target. Default: 0
      --blocks-per-io                   Use multiple blocks per IO request. Default: 1

    Info

    This usage printout can be printed to the command line using the -h (or --help) option:

    ./doca_storage_initiator_comch -h

    For additional information, refer to section "Command-line Flags".

  • CLI example for running the application on the host:

    ./doca_storage_initiator_comch -d 3b:00.0 --execution-strategy read_throughput_test --run-limit-operation-count 1000000 --cpu 0

    Note

    The DOCA Comch device PCIe address (3b:00.0) should match the address of the desired PCIe device.

Command-line Flags

General flags

  • -h, --help

    Print a help synopsis

  • -v, --version

    Print program version information

  • -l, --log-level

    Set the log level for the application:

      • DISABLE=10

      • CRITICAL=20

      • ERROR=30

      • WARNING=40

      • INFO=50

      • DEBUG=60

      • TRACE=70 (requires compilation with TRACE log level support)

  • --sdk-log-level

    Set the log level for the SDK:

      • DISABLE=10

      • CRITICAL=20

      • ERROR=30

      • WARNING=40

      • INFO=50

      • DEBUG=60

      • TRACE=70

  • -j, --json

    Parse command-line flags from an input JSON file as well as from the CLI (if provided)

Program flags

  • -d, --device

    DOCA device identifier. One of:

      • PCIe address: 3b:00.0

      • InfiniBand name: mlx5_0

      • Network interface name: en3f0pf0sf0

    Note: This flag is mandatory.

  • --execution-strategy

    The data path routine to run. One of:

      • read_throughput_test

      • write_throughput_test

      • read_only_data_validity_test

      • read_write_data_validity_test

    Note: This flag is mandatory.

  • --cpu

    Index of the CPU to use. One data path thread is spawned per CPU. Indexing starts at 0.

    Note: This argument can be specified multiple times to create additional threads (e.g., --cpu 0 --cpu 1).

    Note: This flag is mandatory.

  • --storage-plain-content

    Plain content expected to be read from storage during a read_only_data_validity_test

  • --run-limit-operation-count

    Number of IO operations (per thread) to perform when running a throughput test

  • --transaction-count

    Number of parallel tasks per thread to use

  • --command-channel-name

    Allows customizing the comch server name used by this application instance when multiple comch servers exist on the same device

  • --control-timeout

    Time, in seconds, to wait while performing control operations

  • --local-io-region-size

    Size, in bytes, of the local memory region. Allows for asymmetric memory setups where the local application using the storage engine has more or less memory than the storage target

  • --blocks-per-io

    Allows testing the case where multiple sequential data blocks are read or written in a single IO operation


Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

The flow of the application is broken down into key functions / steps:

initiator_comch_app app{parse_cli_args(argc, argv)};

storage::install_ctrl_c_handler([&app]() { app.abort("User requested abort"); });

app.connect_to_storage_service();
app.query_storage();
app.init_storage();
app.prepare_threads();
app.start_storage();
auto const run_success = app.run();
app.join_threads();
app.stop_storage();

if (run_success) {
    app.display_stats();
} else {
    exit_value = EXIT_FAILURE;
    fprintf(stderr, "+================================================+\n");
    fprintf(stderr, "| Test failed!!\n");
    fprintf(stderr, "+================================================+\n");
}

app.shutdown();

Main/Control Thread Flow

  1. initiator_comch_app app{parse_cli_args(argc, argv)};

    Parse CLI arguments and use these to create the application instance. Initial resources are also created at this stage:

    1. DOCA_LOG_INFO("Open doca_dev: %s", m_cfg.device_id.c_str());
       m_dev = storage::open_device(m_cfg.device_id);

      Open a doca_dev as specified by the CLI argument: -d or --device

    2. m_service_control_channel = storage::control::make_comch_client_control_channel(
           m_dev, m_cfg.command_channel_name.c_str(), this,
           new_comch_consumer_callback, expired_comch_consumer_callback);

      Create a comch client control channel (containing a doca_comch_client instance) using the device and the channel name specified by the --command-channel-name CLI argument, or the default value if none was specified. (Control channel objects provide a unified API so that a TCP client, TCP server, doca_comch_client, and doca_comch_server can all be driven through a consistent interface.)

      Info

      See storage_common/control_channel.hpp for more information about the control channel abstraction.

  2. storage::install_ctrl_c_handler([&app]() { app.abort("User requested abort"); });

    Set a signal handler for Ctrl+C keyboard input so the app can shut down gracefully.

  3. app.connect_to_storage_service();

    Connect to the doca_comch_server in the service.

  4. app.query_storage();

    Query the storage details:

    1. Send a query_storage_request to the service.

    2. Wait for the service to respond.

    3. Set the per-IO size (m_effective_block_size) based on the reported storage block size and the --blocks-per-io CLI argument (see the sketch at the end of this step).

    4. If performing a read_write_data_validity_test, create storage for the "expected content" (if the --storage-plain-content CLI argument was not provided).

    5. Verify that the size of the "expected content" matches the size of the reported storage capacity.
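
    The derivation in sub-step 3, combined with the size check in sub-step 5, presumably amounts to something like the following (variable and function names here are illustrative assumptions, not the application's actual code):

      #include <cstdint>
      #include <stdexcept>

      // Illustrative only: per-IO size from the reported block size and
      // --blocks-per-io, plus the expected-content size check described above.
      uint64_t compute_effective_block_size(uint64_t storage_block_size,
                                            uint32_t blocks_per_io,
                                            uint64_t storage_capacity,
                                            uint64_t expected_content_size)
      {
          if (expected_content_size != storage_capacity)
              throw std::runtime_error("expected content size does not match storage capacity");
          return storage_block_size * blocks_per_io;
      }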

  5. app.init_storage();

    Prepare to use the storage:

    1. Allocate IO block memory.

    2. Create IO block doca_mmap.

    3. Export IO block doca_mmap.

    4. Send init_storage_request to the service.

    5. Wait for the service to respond.

    6. Create worker thread objects.

    7. Configure the ops actions mask based on the --execution-strategy CLI argument.

    8. Partition local and storage IO blocks so that each worker gets a unique region of the IO blocks. This avoids race conditions when performing data validation. In a real use case, each thread could be allowed to access any region of the memory, delegating responsibility for the necessary concurrency control to a higher level.

    9. Perform the first stages of the worker threads' initialization. These steps are carried out for each thread, but only one thread performs them at a time; this simplifies the sending and receiving of control messages. The user could modify this flow to execute in parallel if desired.

      1. Create a thread bound to the Nth CPU provided via the --cpu CLI arguments.

      2. Init the worker thread:

        Info

        Init actions are performed by the thread itself, after its core affinity has been set, to improve memory allocation efficiency: the memory allocator is NUMA aware, so allocating from the core that will be using the memory allows the allocator to pick memory that is local/closest to that core (see the sketch at the end of this step).

        1. Create local IO address pool.

        2. Create remote IO address pool.

        3. Allocate IO messages memory.

        4. Allocate N transaction contexts.

        5. Create IO messages doca_mmap.

        6. Create doca_buf_inventory.

        7. Create doca_pe.

        8. Create doca_comch_consumer.

        9. Create doca_comch_producer.

    10. Wait for all threads to be ready:

      1. Consumer and producer contexts are connected.
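
    The NUMA-aware allocation pattern described in the Info note under step 9 amounts to each worker pinning itself to its CPU before allocating. A minimal sketch of that pattern on Linux is shown below; the spawn_workers() helper and the init placeholder are assumptions for this example, not the application's actual thread wrapper:

      #include <pthread.h>
      #include <sched.h>

      #include <cstdint>
      #include <stdexcept>
      #include <thread>
      #include <vector>

      // Pin the calling thread to one CPU so that its subsequent allocations are
      // served from memory local to that core (the allocator is NUMA aware).
      void bind_current_thread_to_cpu(uint32_t cpu_index)
      {
          cpu_set_t cpu_set;
          CPU_ZERO(&cpu_set);
          CPU_SET(cpu_index, &cpu_set);
          if (pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set) != 0)
              throw std::runtime_error("failed to set thread affinity");
      }

      // Spawn one worker per --cpu argument; each worker sets its affinity first
      // and only then performs its init-time allocations.
      std::vector<std::thread> spawn_workers(const std::vector<uint32_t> &cpus)
      {
          std::vector<std::thread> workers;
          for (const uint32_t cpu : cpus) {
              workers.emplace_back([cpu]() {
                  bind_current_thread_to_cpu(cpu);
                  // ... per-thread init: address pools, IO message memory, transactions ...
              });
          }
          return workers;
      }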

  6. app.prepare_threads();

    Prepare the threads for data path operations:

    1. Prepare doca_buf objects.

    2. Prepare doca_task objects.

    3. Initialise IO message content.

  7. app.start_storage();

    Start the storage service:

    1. Send a start_storage_request to the service.

    2. Wait for the service to respond.

  8. app.run();

    Run the initiator data path:

    1. Signal all worker threads to end configuration phase and begin data path phase.

    2. Data path execution takes place now until either the user aborts the program or a stop message is received.

    3. Loop, polling the workers, until all workers have finished performing data path operations (a sketch of this polling loop follows).

    4. Collect the workers' exit codes.

    5. If all workers completed successfully, collect runtime stats.
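
    The polling loop in sub-steps 3-5 can be pictured roughly as follows; the worker structure with its atomic completion flag and exit code is a stand-in for this sketch, not the application's actual worker interface:

      #include <atomic>
      #include <chrono>
      #include <thread>
      #include <vector>

      // Stand-in view of a worker as seen by the main thread.
      struct worker {
          std::atomic<bool> finished{false};
          std::atomic<int> exit_code{0};
      };

      // Poll until every worker reports completion, then return true only if all
      // of them finished without an error.
      bool wait_for_workers(std::vector<worker> &workers)
      {
          for (;;) {
              bool all_done = true;
              for (auto &w : workers)
                  all_done = all_done && w.finished.load(std::memory_order_acquire);
              if (all_done)
                  break;
              std::this_thread::sleep_for(std::chrono::milliseconds{1});
          }

          bool success = true;
          for (auto &w : workers)
              success = success && (w.exit_code.load() == 0);
          return success;
      }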

  9. app.join_threads();

    Join the worker threads.

  10. app.stop_storage();

    Stop the storage service:

    1. Send a stop_storage_request to the service.

    2. Wait for the service to respond.

  11. if (run_success) {
          app.display_stats();
      } else {
          exit_value = EXIT_FAILURE;
          fprintf(stderr, "+================================================+\n");
          fprintf(stderr, "| Test failed!!\n");
          fprintf(stderr, "+================================================+\n");
      }

    If the worker threads all succeeded (encountered no errors while running data path operations), display stats; otherwise, display an error banner.

  12. app.shutdown();

    Destroy resources:

    1. Destroy worker objects.

    2. Send shutdown_request to the service.

    3. Wait for storage consumers to complete de-registration (the comch_client/comch_server is not torn down yet, as the consumers use it to de-register).

  13. Application destructor is triggered:

    1. The IO doca_mmap is destroyed.

    2. The IO blocks memory is released.

    3. The comch client control channel is destroyed (the doca_comch_client is destroyed).

    4. The doca_dev is closed.

  14. Application exits

References

  • /opt/mellanox/doca/applications/storage/

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025