DOCA Storage Initiator ComCh Application Guide

Introduction

The doca_storage_initiator_comch application acts as a client of the storage service and benchmarks the service's performance while doing so.

System Design

The doca_storage_initiator_comch application performs the following key functions:

Initiates a series of IO requests to exercise the storage service
Measures performance
- Millions of IO operations per second
- Effective IO data bandwidth
- IO operation latency
  - Min
  - Max
  - Mean
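
The metrics listed above can be derived from per-operation latencies and the wall-clock duration of a run. The following is a minimal sketch under that assumption, not the application's own code; `io_stats`, `summarize`, and their parameters are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical summary of one benchmark run (names are illustrative only).
struct io_stats {
	double million_iops;    // millions of IO operations per second
	double bandwidth_gib_s; // effective IO data bandwidth in GiB/s
	std::chrono::nanoseconds min_latency;
	std::chrono::nanoseconds max_latency;
	std::chrono::nanoseconds mean_latency;
};

// Derive the metrics from per-operation latencies, the IO block size and the elapsed time.
inline io_stats summarize(std::vector<std::chrono::nanoseconds> const &latencies,
			  std::size_t block_size,
			  std::chrono::duration<double> elapsed)
{
	assert(!latencies.empty() && elapsed.count() > 0.0);
	auto const [min_it, max_it] = std::minmax_element(latencies.begin(), latencies.end());
	auto const total = std::accumulate(latencies.begin(), latencies.end(), std::chrono::nanoseconds{0});
	double const ops_per_sec = static_cast<double>(latencies.size()) / elapsed.count();
	return io_stats{
		ops_per_sec / 1e6,
		ops_per_sec * static_cast<double>(block_size) / (1024.0 * 1024.0 * 1024.0),
		*min_it,
		*max_it,
		total / static_cast<std::int64_t>(latencies.size()),
	};
}
```

Calling `summarize` with the latencies gathered by one thread, the IO block size, and the measured run time yields all of the values in the list above in a single pass.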

To accomplish these tasks, the application establishes a connection to the service running on the NVIDIA® BlueField® platform using doca_comch_client.

Application architecture

The doca_storage_initiator_comch application is divided into two main functional areas:

Control-time and shared resources
Per-thread data path resources

[Figure: initiator comch objects]

The application execution follows two primary phases:

Control phase
Data path phase

Control Phase

This phase begins by establishing connections to the storage service. Once connected, the application issues the control commands that prepare the use case and begin data transfers:

Query storage
Init storage
Start storage

Issuing the start storage command initiates the data path phase. While the data threads execute, the main thread waits for the test to complete before finally closing down the session by issuing the final control commands:

Stop storage
Shutdown
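
The command order described in this section can be written down as a small lookup. This is an illustrative sketch only; `control_command`, `control_sequence`, and `is_next_command` are hypothetical names and do not come from the application source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Control commands in the order this section describes them (hypothetical enum).
enum class control_command { query_storage, init_storage, start_storage, stop_storage, shutdown };

constexpr std::array<control_command, 5> control_sequence{
	control_command::query_storage,
	control_command::init_storage,
	control_command::start_storage,
	control_command::stop_storage,
	control_command::shutdown,
};

// Returns true if `next` is the command that directly follows `current`.
constexpr bool is_next_command(control_command current, control_command next)
{
	for (std::size_t ii = 0; ii + 1 < control_sequence.size(); ++ii) {
		if (control_sequence[ii] == current)
			return control_sequence[ii + 1] == next;
	}
	return false; // shutdown is the final command, nothing follows it
}
```

The data path phase sits between `start_storage` and `stop_storage` in this sequence.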

Data Path Phase

There are three data path routines that can be executed depending on mode:

Throughput test (read or write)
Read only data validity test
Write then read data validity test

Throughput test

In this mode, the initial set of tasks is submitted, and then a tight loop polling the progress engine runs until the target operation count has been reached. As each task is submitted, it is assigned a memory location in the storage window. This location is then advanced, ready for the next task, or reset to the start if the end of storage is reached. In this way the memory is processed sequentially in a round-robin fashion.
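
The round-robin address assignment described above can be sketched as follows. This is an assumption-laden illustration rather than the application's implementation; `io_cursor` and its members are hypothetical names, and offsets are expressed in bytes from the start of the storage window.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cursor over the storage window.
struct io_cursor {
	std::size_t region_size;     // total size of the storage window in bytes
	std::size_t block_size;      // size of one IO operation in bytes
	std::size_t next_offset = 0; // offset that the next task will use

	// Return the offset for the next task, then advance, wrapping at the end of the window.
	std::size_t take()
	{
		std::size_t const offset = next_offset;
		next_offset += block_size;
		if (next_offset + block_size > region_size)
			next_offset = 0; // no room for another full block: start over
		return offset;
	}
};
```

With a 16 byte window and a 4 byte block, successive calls to `take()` produce offsets 0, 4, 8, 12 and then 0 again.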

Read only data validity test

In this mode, the initiator holds a copy of the expected storage memory. Tasks are submitted in the same way as in a throughput test, except that the test completes once each segment of storage has been read. As each task completes, the application checks that the memory read from storage matches the data held in the relevant section of the expected data file.

Write then read data validity test

This mode starts by writing a pattern across the entire storage memory until every part of it has been set. The steps of a read only data validity test are then carried out, but the data read back is compared against the generated pattern that was written to storage rather than against the content of a file loaded from disk.
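
A minimal sketch of the pattern generation and comparison steps follows. The byte pattern mirrors the `static_cast<uint8_t>(ii)` pattern used in the code flow later in this guide; `make_pattern` and `first_mismatch` are hypothetical helper names, and plain vectors stand in for the storage memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Generate the byte pattern: each byte holds its offset modulo 256.
inline std::vector<std::uint8_t> make_pattern(std::size_t size)
{
	std::vector<std::uint8_t> pattern(size);
	for (std::size_t ii = 0; ii != size; ++ii)
		pattern[ii] = static_cast<std::uint8_t>(ii);
	return pattern;
}

// Return the offset of the first differing byte, or std::nullopt if the buffers are identical.
inline std::optional<std::size_t> first_mismatch(std::vector<std::uint8_t> const &actual,
						 std::vector<std::uint8_t> const &expected)
{
	auto const [a_it, e_it] = std::mismatch(actual.begin(), actual.end(), expected.begin(), expected.end());
	if (a_it == actual.end() && e_it == expected.end())
		return std::nullopt;
	return static_cast<std::size_t>(a_it - actual.begin());
}
```

Because the pattern repeats every 256 bytes, a block that lands at the wrong offset is only detected when the displacement is not a multiple of 256.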

DOCA Libraries

This application leverages the following DOCA libraries:

DOCA Comch

Compiling the Application

This application is compiled as part of the set of storage applications. For compilation instructions, refer to the DOCA Storage Applications page.

Running the Application

Application Execution

Warning

This application can only run from the host.

Info

DOCA Storage Initiator Comch is provided in source form. Therefore, compilation is required before the application can be executed.

Application usage instructions:

Usage: doca_storage_initiator_comch [DOCA Flags] [Program Flags]
 
DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse all command flags from an input json file
 
Program Flags:
  -d, --device                      Device identifier
  --cpu                             CPU core to which the process affinity can be set
  --storage-plain-content           File containing the plain data that is represented by the storage
  --execution-strategy              Define what to run. One of: read_throughput_test | write_throughput_test | read_write_data_validity_test | read_only_data_validity_test
  --run-limit-operation-count       Run N operations (per thread) then stop. Default: 1000000
  --task-count                      Number of concurrent tasks (per thread) to use. Default: 64
  --command-channel-name            Name of the channel used by the doca_comch_client. Default: "doca_storage_comch"
  --control-timeout                 Time (in seconds) to wait while performing control operations. Default: 5
  --batch-size                      Batch size: Default: 4

Info

This usage information can be printed to the command line using the -h (or --help) option:

./doca_storage_initiator_comch -h

For additional information, refer to section "Command-line Flags".

CLI example for running the application on the host:

./doca_storage_initiator_comch -d 3b:00.0 --execution-strategy read_throughput_test --run-limit-operation-count 1000000 --cpu 0

Note

The DOCA Comch device PCIe address (3b:00.0) must match the address of the desired PCIe device.

The application also supports a JSON-based deployment mode in which all command-line arguments are provided through a JSON file:

```
./doca_storage_initiator_comch --json [json_file]
```
For example:

```
./doca_storage_initiator_comch --json doca_storage_initiator_comch_params.json
```

Note

Before execution, ensure that the JSON file contains valid configuration parameters, particularly the correct PCIe device addresses required for deployment.

Command-line Flags

Flag Type	Short Flag	Long Flag/JSON Key	Description	JSON Content
General flags	`h`	`help`	Print a help synopsis	N/A
	`v`	`version`	Print program version information	N/A
	`l`	`log-level`	Set the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (requires compilation with `TRACE` log level support)	`"log-level": 60`
	N/A	`sdk-log-level`	Set the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70	`"sdk-log-level": 40`
	`j`	`json`	Parse all command flags from an input JSON file	N/A
Program flags	`d`	`device`	DOCA device identifier. One of: PCIe address (`3b:00.0`), InfiniBand name (`mlx5_0`), or network interface name (`en3f0pf0sf0`). This flag is mandatory.	`"device": "3b:00.0"`
	N/A	`execution-strategy`	The data path routine to run. One of: `read_throughput_test`, `write_throughput_test`, `read_only_data_validity_test`, `read_write_data_validity_test`. This flag is mandatory.	`"execution-strategy": "read_throughput_test"`
	N/A	`cpu`	Index of the CPU to use. One data path thread is spawned per CPU. Indexes start at 0. This argument can be specified multiple times to create more threads. This flag is mandatory.	`"cpu": 6`
	N/A	`storage-plain-content`	File containing the plain content that is expected to be read from storage during a `read_only_data_validity_test`	`"storage-plain-content": "expected_data.txt"`
	N/A	`run-limit-operation-count`	Number of IO operations to perform when running a throughput test	`"run-limit-operation-count": 1000000`
	N/A	`task-count`	Number of parallel tasks to use per thread	`"task-count": 64`
	N/A	`command-channel-name`	Customizes the server name used by this application instance, for cases where multiple comch servers exist on the same device	`"command-channel-name": "doca_storage_comch"`
	N/A	`control-timeout`	Time, in seconds, to wait while performing control operations	`"control-timeout": 5`
	N/A	`batch-size`	Number of tasks to submit before triggering the hardware to start processing them	`"batch-size": 4`
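
To illustrate what the batch size setting implies, the sketch below counts how often processing would be triggered for a given number of submitted tasks. It is a simplified model based only on the flag description, not on the application's code; `batched_submitter` and its members are hypothetical names, and a batch size of at least 1 is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical submitter that triggers processing once per full batch.
struct batched_submitter {
	std::uint32_t batch_size;        // assumed to be >= 1
	std::uint32_t pending = 0;       // tasks queued since the last trigger
	std::uint32_t trigger_count = 0; // number of times processing was triggered

	// Queue one task and trigger processing when the batch is full.
	void submit()
	{
		if (++pending == batch_size)
			flush();
	}

	// Trigger processing of any queued tasks, for example a final partial batch.
	void flush()
	{
		if (pending != 0) {
			++trigger_count;
			pending = 0;
		}
	}
};
```

With a batch size of 4, ten submissions cause two triggers and leave two tasks pending until `flush()` is called. Larger batches reduce the number of triggers at the cost of delaying the first tasks in each batch.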

Application Code Flow

Control Phase

initiator_comch_app app{parse_cli_args(argc, argv)};

Parse CLI arguments, apply default values, and create the application instance.

app.connect_to_storage_service();

Connect to the storage service via doca_comch_client.

app.query_storage();

Query the storage service so that its capacity and block size are known.

```
app.init_storage();
```

Prepare the storage service:
1. Allocate memory resources.
2. Send init storage control message.
3. Create per thread resources including comch_consumer and comch_producer.

app.prepare_threads();

Create thread objects and allocate per thread tasks

```
app.start_storage();
```

Send start storage control message, wait for response then start data path threads

```
app.run();
```

Run the data path:
1. Start threads.
2. Wait for threads to complete.
3. Collect and post-process statistics.

app.join_threads();

Join work threads.

app.stop_storage();

Send stop storage control message

if (run_success) {
	app.display_stats();
} else {
	exit_value = EXIT_FAILURE;
	fprintf(stderr, "+================================================+\n");
	fprintf(stderr, "| Test failed!!\n");
	fprintf(stderr, "+================================================+\n");
}

Display the execution result, or a failure banner if something went wrong during the data path

```
app.shutdown();
```

Send the shutdown control command to trigger the storage service and storage targets to shut down and release resources

Read/Write Throughput: data path

while (hot_data.run_flag == false) {
	std::this_thread::yield();
	if (hot_data.error_flag)
		return;
}

Wait for run flag to be set
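
The start-up handshake shown above can be reproduced in isolation as follows. This is a sketch that assumes the flags are atomic booleans shared between the main thread and a worker; the actual types of `run_flag` and `error_flag` are not shown in this guide, and `worker_flags` and `wait_for_start` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flags shared between the main thread and one worker thread.
struct worker_flags {
	std::atomic<bool> run_flag{false};
	std::atomic<bool> error_flag{false};
};

// Returns true once run_flag is set, or false if error_flag is raised first.
inline bool wait_for_start(worker_flags const &flags)
{
	while (!flags.run_flag.load(std::memory_order_acquire)) {
		if (flags.error_flag.load(std::memory_order_acquire))
			return false;
		std::this_thread::yield(); // give up the time slice while waiting
	}
	return true;
}
```

Yielding while waiting keeps the worker from monopolizing its core before the test starts, while the error check lets it exit early if setup fails elsewhere.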

auto const initial_task_count = std::min(hot_data.transactions_size, hot_data.remaining_tx_ops);
for (uint32_t ii = 0; ii != initial_task_count; ++ii)
	hot_data.start_transaction(hot_data.transactions[ii], std::chrono::steady_clock::now());

Submit initial tasks

while (hot_data.run_flag) {
	doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}

Run until the test is complete

Read only data validity: data path

while (hot_data.run_flag == false) {
	std::this_thread::yield();
	if (hot_data.error_flag)
		return;
}

Wait for run flag to be set

read_and_validate_storage_memory(hot_data, hot_data.storage_plain_content);

Invoke the read and verify routine

hot_data.remaining_tx_ops = hot_data.remaining_rx_ops = io_region_size / hot_data.io_block_size;

Set the expected op count to be 1 op per block

auto const initial_task_count = std::min(hot_data.transactions_size, hot_data.remaining_tx_ops);
for (uint32_t ii = 0; ii != initial_task_count; ++ii) {
	char *io_request;
	doca_buf_get_data(doca_comch_producer_task_send_get_buf(hot_data.transactions[ii].request), (void**)&io_request);
	storage::io_message_view::set_type(storage::io_message_type::read, io_request);
 
	hot_data.start_transaction(hot_data.transactions[ii], std::chrono::steady_clock::now());
}

Force all io_requests into read mode then start executing them

while (hot_data.remaining_rx_ops != 0) {
	doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}

Run until the test is complete
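
The interaction between the initial submission and the polling loop can be modeled without any DOCA objects. The sketch below assumes that each completed task is immediately resubmitted while operations remain to be sent; that resubmission step is not shown in this guide, so it is an assumption, and `simulate_data_path` and `sim_result` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

// Counters gathered by the simulation (hypothetical type).
struct sim_result {
	std::uint32_t submitted;
	std::uint32_t completed;
	std::uint32_t max_in_flight;
};

// Model the data path: a fixed pool of tasks is recycled until all operations complete.
inline sim_result simulate_data_path(std::uint32_t task_count, std::uint32_t total_ops)
{
	assert(task_count != 0 || total_ops == 0);
	std::uint32_t remaining_tx_ops = total_ops;
	std::uint32_t remaining_rx_ops = total_ops;
	std::deque<std::uint32_t> in_flight; // indexes of tasks awaiting completion
	sim_result result{0, 0, 0};

	auto submit = [&](std::uint32_t task) {
		--remaining_tx_ops;
		++result.submitted;
		in_flight.push_back(task);
		result.max_in_flight = std::max(result.max_in_flight, static_cast<std::uint32_t>(in_flight.size()));
	};

	// Submit the initial tasks, limited by the pool size and the amount of work.
	auto const initial_task_count = std::min(task_count, remaining_tx_ops);
	for (std::uint32_t ii = 0; ii != initial_task_count; ++ii)
		submit(ii);

	// Each iteration stands in for one successful progress engine poll.
	while (remaining_rx_ops != 0) {
		std::uint32_t const task = in_flight.front();
		in_flight.pop_front();
		--remaining_rx_ops;
		++result.completed;
		if (remaining_tx_ops != 0)
			submit(task); // reuse the completed task for the next operation
	}
	return result;
}
```

In this model the number of in-flight operations never exceeds the task count, which is why the task count bounds the concurrency of each thread.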

for (size_t offset = 0; offset != io_region_size; ++offset) {
	if (hot_data.io_region_begin[offset] != expected_memory_content[offset]) {
		DOCA_LOG_ERR("Data mismatch @ position %zu: %02x != %02x", offset, hot_data.io_region_begin[offset], expected_memory_content[offset]);
		hot_data.error_flag = true;
		break;
	}
}

Validate memory content

Read write data validity: data path

while (hot_data.run_flag == false) {
	std::this_thread::yield();
	if (hot_data.error_flag)
		return;
}

Wait for run flag to be set

size_t const io_region_size = hot_data.io_region_end - hot_data.io_region_begin;
std::vector<uint8_t> write_data;
write_data.resize(io_region_size);
for (size_t ii = 0; ii != io_region_size; ++ii) {
	write_data[ii] = static_cast<uint8_t>(ii);
}

Prepare new memory content

write_storage_memory(hot_data, write_data.data());

Invoke write memory routine

hot_data.remaining_tx_ops = hot_data.remaining_rx_ops = io_region_size / hot_data.io_block_size;

Set the expected op count to be 1 op per block

hot_data.io_addr = hot_data.io_region_begin;
std::copy(expected_memory_content, expected_memory_content + io_region_size, hot_data.io_region_begin);

Place data to write to storage service into local memory

auto const initial_task_count = std::min(hot_data.transactions_size, hot_data.remaining_tx_ops);
for (uint32_t ii = 0; ii != initial_task_count; ++ii) {
	char *io_request;
	doca_buf_get_data(doca_comch_producer_task_send_get_buf(hot_data.transactions[ii].request), (void**)&io_request);
	storage::io_message_view::set_type(storage::io_message_type::write, io_request);
 
	hot_data.start_transaction(hot_data.transactions[ii], std::chrono::steady_clock::now());
}

Force all io_requests into write mode then start executing them

while (hot_data.remaining_rx_ops != 0) {
	doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}

Execute write operations

read_and_validate_storage_memory(hot_data, write_data.data());

Invoke the read and verify routine

hot_data.remaining_tx_ops = hot_data.remaining_rx_ops = io_region_size / hot_data.io_block_size;

Set the expected op count to be 1 op per block

auto const initial_task_count = std::min(hot_data.transactions_size, hot_data.remaining_tx_ops);
for (uint32_t ii = 0; ii != initial_task_count; ++ii) {
	char *io_request;
	doca_buf_get_data(doca_comch_producer_task_send_get_buf(hot_data.transactions[ii].request), (void**)&io_request);
	storage::io_message_view::set_type(storage::io_message_type::read, io_request);
 
	hot_data.start_transaction(hot_data.transactions[ii], std::chrono::steady_clock::now());
}

Force all io_requests into read mode then start executing them

while (hot_data.remaining_rx_ops != 0) {
	doca_pe_progress(hot_data.pe) ? ++(hot_data.pe_hit_count) : ++(hot_data.pe_miss_count);
}

Run until the test is complete

for (size_t offset = 0; offset != io_region_size; ++offset) {
	if (hot_data.io_region_begin[offset] != expected_memory_content[offset]) {
		DOCA_LOG_ERR("Data mismatch @ position %zu: %02x != %02x", offset, hot_data.io_region_begin[offset], expected_memory_content[offset]);
		hot_data.error_flag = true;
		break;
	}
}

Validate memory content

References

/opt/mellanox/doca/applications/storage/
