
NVIDIA DOCA Storage Zero Copy Comch to RDMA Application Guide

DOCA Storage Zero Copy Comch to RDMA (comch_to_rdma) is a communications bridge between the doca_storage_zero_copy_initiator_comch (initiator_comch) and doca_storage_zero_copy_target_rdma (target_rdma) applications. It keeps initiator_comch insulated from the details of target_rdma.

  1. Comch_to_rdma connects to target_rdma via TCP.

  2. Comch_to_rdma creates a comch server and waits for the initiator_comch to connect.

  3. Comch_to_rdma waits for control messages from the initiator_comch and reacts to them appropriately.

    Info

    Two RDMA connections are made per thread to avoid the large RDMA data transfers interfering with or introducing latency to the smaller IO messages.
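The guide does not prescribe how the two connections are grouped; a per-thread context along the following lines conveys the idea (all names here are illustrative, not taken from the application sources): one RDMA object carries the small IO request/response messages while the other is reserved for the large data transfers.

    #include <doca_pe.h>
    #include <doca_rdma.h>

    // Illustrative per-thread grouping only; the real application defines its own types.
    struct per_thread_rdma_links {
        doca_rdma *io_message_rdma; // small IO request/response messages to/from target_rdma
        doca_rdma *data_rdma;       // reserved for the large zero-copy data transfers
        doca_pe *pe;                // progress engine polled by this thread's data path loop
    };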

Figure: comch_to_rdma system design

DOCA Storage Zero Copy Comch to RDMA executes in three stages:

  1. Preparation.

  2. Data path.

  3. Teardown.

Preparation Stage

During this stage, the application performs the following:

  1. Connects to target_rdma via TCP.

  2. Creates a DOCA Comch server and waits for a client connection.

  3. Waits for a "configure data path" control message from initiator_comch (including buffer count, buffer size, doca_mmap export details).

    1. Create a doca_mmap using the exported details from initiator_comch then re-export it to provide access to target_rdma.

    2. Send a configure data path control message to target_rdma.

    3. Wait for a configure data path control message response with a success status from target_rdma.

    4. Send a configure data path control message response to initiator_comch.

  4. Waits for a "start data path connections" control message from initiator_comch.

    1. Create comch data path objects.

    2. Create N RDMA connections, exchanging connection details with target_rdma.

    3. Relay the start data path connections control message to target_rdma.

    4. Wait for a start data path connections control message response with a success status from target_rdma.

    5. Send a start data path connections control message response to initiator_comch.

  5. Waits for a "start storage" control message from initiator_comch.

    1. Verify that all RDMA and Comch connections are ready to use.

    2. Send a start storage control message to target_rdma.

    3. Wait for a start storage control message response with a success status from target_rdma.

    4. Start data path threads.

    5. Send a start storage control message response to initiator_comch.

Figure: comch_to_rdma preparation stage
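
Each step above follows the same relay pattern: receive a control message from initiator_comch, forward it to target_rdma, wait for the target's success response, and only then answer initiator_comch. A minimal sketch of that pattern follows; wait_for_client_message, send_to_target, wait_for_target_response, send_to_client, and control_message_type are hypothetical names standing in for the application's comch and TCP control channels, not application APIs.

    #include <stdexcept>

    #include <doca_error.h>

    // Illustrative relay of one control message type; helper names are hypothetical.
    void relay_control_message(control_message_type expected_type)
    {
        // 1. Receive the control message from initiator_comch via the comch server.
        auto const request = wait_for_client_message(expected_type);

        // 2. Forward it (possibly rewritten, e.g. with re-exported mmap details) to target_rdma over TCP.
        send_to_target(request);

        // 3. Block until target_rdma reports success for this step.
        auto const response = wait_for_target_response(expected_type);
        if (response.status != DOCA_SUCCESS)
            throw std::runtime_error{"target_rdma rejected the control message"};

        // 4. Only then acknowledge the step back to initiator_comch.
        send_to_client(response);
    }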


Data Path Stage

This stage starts the data path threads. Each thread begins by submitting comch and RDMA receive tasks, then executes a tight loop polling the progress engine (PE) as quickly as possible until a "data path stop" IO message is received. The work of the data path threads is reactive, so it is performed in task completion callbacks. As each IO message is received from initiator_comch, it is forwarded to the storage application. Similarly, as each IO message response is received from target_rdma, it is relayed back to initiator_comch.
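
The application's actual handlers are shown under "Performance Data Path Thread Flow" below; the sketch here only illustrates the shape of the forwarding direction (initiator_comch to target_rdma). forward_to_target and the task wiring are assumptions for illustration, not application APIs.

    // Illustrative comch consumer completion callback for an IO message arriving from
    // initiator_comch; forward_to_target and the thread_hot_data members are assumptions.
    static void on_io_message_from_initiator(doca_comch_consumer_task_post_recv *task,
                                              doca_data task_user_data,
                                              doca_data ctx_user_data)
    {
        auto *const hot_data = static_cast<thread_hot_data *>(ctx_user_data.ptr);

        // Relay the received IO message toward target_rdma (e.g. by submitting a paired,
        // pre-allocated RDMA send task that was wired up during the preparation stage).
        hot_data->forward_to_target(task);

        // Keep the pipeline primed by submitting the task stored in the task's user data,
        // mirroring the pattern used by the application's own handlers.
        static_cast<void>(doca_task_submit(static_cast<doca_task *>(task_user_data.ptr)));
    }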

Teardown Stage

In this stage, the application performs the following:

  1. Wait for a destroy objects control message from initiator_comch.

  2. Send a destroy objects control message to target_rdma.

  3. Wait for a destroy objects control message response from target_rdma.

  4. Destroy data path objects.

  5. Send a destroy objects control message response to initiator_comch.

  6. Destroy control path objects.
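
Ordering is the important part: data path objects are destroyed only after target_rdma has acknowledged the destroy request, and initiator_comch receives its response only once that destruction has completed. A condensed sketch, reusing the hypothetical control-channel helpers from the preparation sketch above (make_success_response, destroy_data_path_objects, and destroy_control_path_objects are likewise illustrative):

    // Illustrative teardown ordering; helper and type names are hypothetical.
    void teardown()
    {
        auto const request = wait_for_client_message(control_message_type::destroy_objects);
        send_to_target(request);
        static_cast<void>(wait_for_target_response(control_message_type::destroy_objects));

        destroy_data_path_objects();    // threads, producers/consumers, RDMA contexts, IO message mmaps
        send_to_client(make_success_response(control_message_type::destroy_objects));

        destroy_control_path_objects(); // comch server, TCP socket, control path progress engine
    }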

This application leverages the following DOCA libraries:

  • DOCA Argp

  • DOCA Comch

  • DOCA Core

  • DOCA RDMA

This application is compiled as part of the set of storage zero copy applications. For compilation instructions, refer to NVIDIA DOCA Storage Zero Copy.

Application Execution

Note

This application can only be run on the DPU.

DOCA Storage Zero Copy Comch to RDMA is provided in source form. Therefore, compilation is required before the application can be executed.

  • Application usage instructions:

    Usage: doca_storage_zero_copy_comch_to_rdma [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse all command flags from an input json file

    Program Flags:
      -d, --device                      Device identifier
      -r, --representor                 Device host side representor identifier
      --cpu                             CPU core to which the process affinity can be set
      --storage-server                  One or more storage server addresses in <ip_addr>:<port> format
      --command-channel-name            Name of the channel used by the doca_comch_server. Default: storage_zero_copy_comch

    Info

    This usage printout can be printed to the command line using the -h (or --help) options:

    ./doca_storage_zero_copy_comch_to_rdma -h

    For additional information, refer to section "Command Line Flags".

  • CLI example for running the application on the BlueField:

    ./doca_storage_zero_copy_comch_to_rdma -d 03:00.0 -r 3b:00.0 --storage-server 172.17.0.1:12345 --cpu 12

    Note

    Both the DOCA Comch device PCIe address (03:00.0) and the DOCA Comch device representor PCIe address (3b:00.0) should match the addresses of the desired PCIe devices.

  • The application also supports a JSON-based deployment mode in which all command-line arguments are provided through a JSON file:

    ./doca_storage_zero_copy_comch_to_rdma --json [json_file]

    For example:

    ./doca_storage_zero_copy_comch_to_rdma --json doca_storage_zero_copy_comch_to_rdma_params.json

    Note

    Before execution, ensure that the used JSON file contains the correct configuration parameters, and especially the PCIe addresses necessary for the deployment.
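
    As an illustration only (the shipped doca_storage_zero_copy_comch_to_rdma_params.json shows the exact layout expected by the parser), the program flags map onto JSON entries such as the following; the addresses and CPU index are placeholders taken from the CLI example above:

    {
        "device": "03:00.0",
        "representor": "3b:00.0",
        "cpu": 12,
        "storage-server": "172.17.0.1:12345",
        "command-channel-name": "storage_zero_copy_comch"
    }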

Command Line Flags

General flags

  • -h, --help
    Print a help synopsis
    JSON content: N/A

  • -v, --version
    Print program version information
    JSON content: N/A

  • -l, --log-level
    Set the log level for the application:
      • DISABLE=10
      • CRITICAL=20
      • ERROR=30
      • WARNING=40
      • INFO=50
      • DEBUG=60
      • TRACE=70 (requires compilation with TRACE log level support)
    JSON content: "log-level": 60

  • --sdk-log-level
    Set the log level for the SDK:
      • DISABLE=10
      • CRITICAL=20
      • ERROR=30
      • WARNING=40
      • INFO=50
      • DEBUG=60
      • TRACE=70
    JSON content: "sdk-log-level": 40

  • -j, --json
    Parse all command flags from an input JSON file
    JSON content: N/A

Program flags

  • -d, --device
    DOCA device identifier. One of:
      • PCIe address: 3b:00.0
      • InfiniBand name: mlx5_0
      • Network interface name: en3f0pf0sf0
    Note: This flag is mandatory.
    JSON content: "device": "03:00.0"

  • -r, --representor
    DOCA Comch device representor PCIe address
    Note: This flag is mandatory.
    JSON content: "representor": "3b:00.0"

  • --cpu
    Index of the CPU to use. One data path thread is spawned per CPU. Index starts at 0. This argument can be specified multiple times to create more threads.
    Note: This flag is mandatory.
    JSON content: "cpu": 6

  • --storage-server
    IP address and port used to establish the control TCP connection to the target
    Note: This flag is mandatory.
    JSON content: "storage-server": "172.17.0.1:12345"

  • --command-channel-name
    Allows customizing the server name used for this application instance if multiple comch servers exist on the same device
    JSON content: "command-channel-name": "storage_zero_copy_comch"


Troubleshooting

Refer to the NVIDIA DOCA Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

Control Thread Flow

  1. Parse application arguments:

    auto const cfg = parse_cli_args(argc, argv);

    1. Prepare the parser (doca_argp_init).

    2. Register parameters (doca_argp_param_create).

    3. Parse the arguments (doca_argp_start).

    4. Destroy the parser (doca_argp_destroy).
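
    A condensed sketch of these four steps, following the usual doca_argp flow seen in DOCA samples; only the --storage-server parameter is shown, app_config and its callback are illustrative stand-ins for the application's real configuration handling, and error handling is omitted for brevity:

    #include <string>

    #include <doca_argp.h>
    #include <doca_error.h>

    struct app_config {                 // stand-in for the application's configuration type
        std::string storage_server;
    };

    // Hypothetical callback storing the --storage-server value into the configuration.
    static doca_error_t storage_server_callback(void *param, void *config)
    {
        static_cast<app_config *>(config)->storage_server = static_cast<char const *>(param);
        return DOCA_SUCCESS;
    }

    static void parse_args(int argc, char **argv, app_config &cfg)
    {
        doca_argp_param *storage_server_param;

        doca_argp_init("doca_storage_zero_copy_comch_to_rdma", &cfg); // 1. prepare the parser

        doca_argp_param_create(&storage_server_param);                // 2. register parameters
        doca_argp_param_set_long_name(storage_server_param, "storage-server");
        doca_argp_param_set_description(storage_server_param, "Storage server address in <ip_addr>:<port> format");
        doca_argp_param_set_callback(storage_server_param, storage_server_callback);
        doca_argp_param_set_type(storage_server_param, DOCA_ARGP_TYPE_STRING);
        doca_argp_param_set_mandatory(storage_server_param);
        doca_argp_register_param(storage_server_param);
        // ...the remaining flags (--device, --representor, --cpu, ...) are registered the same way

        doca_argp_start(argc, argv);                                  // 3. parse the arguments
        doca_argp_destroy();                                          // 4. destroy the parser
    }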

  2. Display the configuration:

    print_config(cfg);

  3. Create application instance:

    g_app.reset(storage::zero_copy::make_dpu_application(cfg));

  4. Run the application:

    g_app->run()

    1. Find and open the specified device:

      m_dev = storage::common::open_device(m_cfg.device_id);

    2. Find and open the selected representor:

      m_dev_rep = storage::common::open_representor(m_dev, m_cfg.representor_id);

    3. Create control path progress engine:

      doca_pe_create(&m_ctrl_pe);

    4. Connect to target_rdma:

      connect_storage_server();

      1. Create a TCP socket.

      2. Connect the TCP socket.
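
      These two sub-steps are plain POSIX socket calls; the simplified, self-contained sketch below illustrates them (the real connect_storage_server() also honors the abort flag and the application's error handling conventions):

      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <unistd.h>

      #include <cstdint>
      #include <stdexcept>
      #include <string>

      // Create a TCP socket and connect it to the address given by --storage-server.
      static int connect_to_target(std::string const &ip_addr, uint16_t port)
      {
          int const fd = ::socket(AF_INET, SOCK_STREAM, 0); // 1. create a TCP socket
          if (fd < 0)
              throw std::runtime_error{"Failed to create TCP socket"};

          sockaddr_in addr{};
          addr.sin_family = AF_INET;
          addr.sin_port = htons(port);
          if (::inet_pton(AF_INET, ip_addr.c_str(), &addr.sin_addr) != 1) {
              ::close(fd);
              throw std::runtime_error{"Invalid storage server address"};
          }

          if (::connect(fd, reinterpret_cast<sockaddr const *>(&addr), sizeof(addr)) != 0) { // 2. connect it
              ::close(fd);
              throw std::runtime_error{"Failed to connect to target_rdma"};
          }

          return fd;
      }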

    5. Create comch server and wait for comch client to connect:

      create_comch_server();

      while (m_client_connection == nullptr) {
          static_cast<void>(doca_pe_progress(m_ctrl_pe));

          if (m_abort_flag)
              return;
      }

    6. Wait for configure storage control message.

    7. Configure storage:

      configure_storage();

      1. Create mmap using the exported details provided by initiator_comch.

      2. Export the mmap to allow RDMA access.

    8. Send "configure storage" control message to target_rdma with re-exported mmap details.

    9. Wait for configure storage control message response from target_rdma.

    10. Send configure storage control message response to initiator_comch.

    11. Wait for "start data path" control message.

    12. Prepare data path:

      for (uint32_t ii = 0; ii != m_cfg.cpu_set.size(); ++ii) {
          prepare_storage_context(ii, msg.correlation_id);
      }

      1. Create per thread data context:

        1. Create IO messages.

        2. Create progress engine.

        3. Create mmap for IO message buffers.

        4. Create comch producer.

        5. Create comch consumer.

        6. Create RDMA contexts.

        7. Create RDMA connections:

          1. Export RDMA connection details (doca_rdma_export).

          2. Send "create RDMA connection" control message.

          3. Wait for create RDMA connection control message.

          4. Start the connection using the remote RDMA connection details (doca_rdma_connect).

      2. Send data path control message to target_rdma.

      3. Wait for data path control message response from target_rdma.

      4. Send data path control message response to initiator_comch.
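
      The RDMA connection sub-steps above reduce to an export/exchange/connect handshake per connection, carried over the TCP control channel. A high-level sketch follows; export_connection_details, exchange_with_target, and connect_with_remote_details are hypothetical wrappers around doca_rdma_export, the control message exchange, and doca_rdma_connect respectively:

      #include <cstdint>
      #include <vector>

      // Illustrative establishment of one RDMA connection; helper names are hypothetical.
      void establish_one_rdma_connection(doca_rdma *rdma)
      {
          // Export this side's connection details (wraps doca_rdma_export) into an opaque blob.
          std::vector<uint8_t> const local_details = export_connection_details(rdma);

          // Send them to target_rdma in a "create RDMA connection" control message and wait
          // for the peer's connection details carried in the corresponding reply.
          std::vector<uint8_t> const remote_details = exchange_with_target(local_details);

          // Start the connection with the peer's details (wraps doca_rdma_connect).
          connect_with_remote_details(rdma, remote_details);
      }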

    13. Wait for start storage control message.

    14. Verify all connections are ready (comch and RDMA):

      wait_for_connections_to_establish();

    15. Send start storage control message to target_rdma.

    16. Create threads:

      if (op_type == io_message_type::read) {
          m_thread_contexts[ii].thread =
              std::thread{&thread_hot_data::non_validated_test, std::addressof(m_thread_contexts[ii].hot_context)};
      } else if (op_type == io_message_type::write) {
          if (m_cfg.validate_writes) {
              m_thread_contexts[ii].thread =
                  std::thread{&thread_hot_data::validated_test, std::addressof(m_thread_contexts[ii].hot_context)};
          } else {
              m_thread_contexts[ii].thread =
                  std::thread{&thread_hot_data::non_validated_test, std::addressof(m_thread_contexts[ii].hot_context)};
          }
      }

    17. Wait for "start storage" control message response from target_rdma.

    18. Start data path threads.

    19. Send start storage control message response to initiator_comch.

    20. Run all threads until completion.

    21. Wait for "destroy objects" control message.

    22. Send destroy objects control message to target_rdma.

    23. Wait for destroy objects control message response from target_rdma.

    24. Destroy data path objects.

    25. Send destroy objects control message response to initiator_comch.

  5. Display stats:

    printf("+================================================+\n");
    printf("| Stats\n");
    printf("+================================================+\n");
    for (uint32_t ii = 0; ii != stats.size(); ++ii) {
        printf("| Thread[%u]\n", ii);
        auto const pe_hit_rate_pct =
            (static_cast<double>(stats[ii].pe_hit_count) /
             (static_cast<double>(stats[ii].pe_hit_count) + static_cast<double>(stats[ii].pe_miss_count))) *
            100.;
        printf("| PE hit rate: %2.03lf%% (%lu:%lu)\n", pe_hit_rate_pct, stats[ii].pe_hit_count, stats[ii].pe_miss_count);

        printf("+------------------------------------------------+\n");
    }
    printf("+================================================+\n");

  6. Destroy control path objects.

Performance Data Path Thread Flow

The data path involves polling the PE as quickly as possible to receive IO messages from either initiator_comch or target_rdma.

  1. Run until initiator_comch sends a stop IO message:

    while (hot_data->running_flag) {
        doca_pe_progress(pe) ? ++(hot_data->pe_hit_count) : ++(hot_data->pe_miss_count);
    }

  2. Handle IO message from initiator_comch:

    auto *const hot_data = static_cast<thread_hot_data *>(ctx_user_data.ptr);
    ...
    doca_task_submit(static_cast<doca_task *>(task_user_data.ptr));

  3. Handle IO message from target_rdma:

    auto *const hot_data = static_cast<thread_hot_data *>(ctx_user_data.ptr);
    doca_error_t ret;

    auto *const io_message = storage::common::get_buffer_bytes(doca_rdma_task_receive_get_dst_buf(task));

    if (io_message_view::get_type(io_message) != io_message_type::stop) {
        io_message_view::set_type(io_message_type::result, io_message);
        io_message_view::set_result(DOCA_SUCCESS, io_message);
    } else {
        hot_data->app_impl->stop_all_threads();
    }

    do {
        ret = doca_task_submit(static_cast<doca_task *>(task_user_data.ptr));
    } while (ret == DOCA_ERROR_AGAIN);

References

  • /opt/mellanox/doca/applications/storage/

© Copyright 2024, NVIDIA. Last updated on Nov 19, 2024.