DOCA Documentation v2.9.0

NVIDIA DOCA Storage Zero Copy Target RDMA Application Guide

DOCA Storage Zero Copy Target RDMA (target_rdma) acts as a mock storage service. It prepares an area of memory equal in size to the block created by doca_storage_zero_copy_initiator_comch (initiator_comch), then waits for IO messages from doca_storage_zero_copy_comch_to_rdma (comch_to_rdma) and performs the RDMA read or write operations needed to fulfill the initiator's read or write requests (i.e., an RDMA write for a read IO message and an RDMA read for a write IO message).
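This guide does not spell out how the size of the initiator's block is derived. Assuming it is simply the buffer count multiplied by the buffer size carried in the configure data path control message, the size of the local region could be computed as in this sketch. local_region_size is a hypothetical helper name, not part of the application's source; the one detail worth getting right is widening to 64 bits before multiplying.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for the local mock-storage region,
// assuming the initiator's block is buffer_count contiguous buffers of
// buffer_size bytes each. Both operands are widened to 64 bits before the
// multiplication so that the product cannot wrap in 32-bit arithmetic.
constexpr uint64_t local_region_size(uint32_t buffer_count, uint32_t buffer_size)
{
    return static_cast<uint64_t>(buffer_count) * static_cast<uint64_t>(buffer_size);
}

// 64 buffers of 4 KiB need a 256 KiB region.
static_assert(local_region_size(64, 4096) == 256 * 1024, "64 x 4 KiB == 256 KiB");
```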

DOCA Storage Zero Copy Target RDMA uses a TCP socket for out-of-band control messages, then uses two DOCA RDMA connections:

  • One for the data path to receive and reply to IO messages; and

  • Another to perform the RDMA read and write operations which actually move data to or from the memory created by initiator_comch
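The direction of each transfer follows from the IO type: the target writes into the initiator's memory to serve a read and reads from it to serve a write. A minimal sketch of that mapping, using hypothetical enum and function names (io_message_type, rdma_op, rdma_op_for) that are not taken from the application's source:

```cpp
#include <cassert>

// Hypothetical IO message kinds; the application's actual IO message format
// is not described in this guide.
enum class io_message_type { read, write, stop };

// The RDMA operation target_rdma issues against the initiator's memory.
enum class rdma_op { write_to_initiator, read_from_initiator };

rdma_op rdma_op_for(io_message_type type)
{
    // A stop message ends the data path and carries no data to move.
    assert(type == io_message_type::read || type == io_message_type::write);
    // Read IO: the initiator wants data, so the target pushes its copy into
    // the initiator's memory with an RDMA write.
    // Write IO: the initiator supplies data, so the target pulls it from the
    // initiator's memory with an RDMA read.
    return type == io_message_type::read ? rdma_op::write_to_initiator : rdma_op::read_from_initiator;
}
```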

Figure: Target RDMA system design

DOCA Storage Zero Copy Target RDMA executes in three stages:

  1. Preparation.

  2. Data path.

  3. Teardown.

Preparation Stage

During this stage the application performs the following:

  1. Creates a TCP server socket.

  2. Waits for comch_to_rdma to connect.

  3. Waits for a configure data path control message (buffer count, buffer size, doca_mmap export details) from comch_to_rdma.

    1. Imports the received doca_mmap.

    2. Creates a local memory region.

    3. Creates a local doca_mmap.

    4. Creates a doca_buf_inventory.

    5. Sends a configure data path control message response to comch_to_rdma.

  4. Waits for N "create RDMA connection" control messages from comch_to_rdma.

    1. Creates the RDMA context.

    2. Exports the connection details.

    3. Starts connecting using the provided remote connection details.

    4. Sends a create RDMA connection control message response to comch_to_rdma.

  5. Waits for a "start data path connections" control message from comch_to_rdma.

    1. Verifies that all RDMA connections are ready to use.

    2. Sends a start data path connections control message response to comch_to_rdma.

  6. Waits for a start storage control message from comch_to_rdma.

    1. Starts data path threads.

    2. Sends a start storage control message response to comch_to_rdma.
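The preparation stage is effectively a fixed sequence of control messages. The sketch below models it as a small state machine that accepts each message only in its expected position, and also covers the destroy objects message of the teardown stage. control_msg and control_sequence are hypothetical names, the sketch assumes N (the number of "create RDMA connection" messages) is known before the first message arrives, and the real application may react to out-of-order messages differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical control message identifiers, named after the prose above.
enum class control_msg {
    configure_data_path,
    create_rdma_connection,
    start_data_path,
    start_storage,
    destroy_objects,
};

class control_sequence {
public:
    // expected_connections stands in for "N" (assumed known in advance).
    explicit control_sequence(uint32_t expected_connections) : m_expected{expected_connections}
    {
        assert(expected_connections > 0);
    }

    // Returns true (and advances) if msg is the next expected message.
    bool accept(control_msg msg)
    {
        switch (m_stage) {
        case stage::configure:
            return advance_if(msg == control_msg::configure_data_path, stage::connections);
        case stage::connections:
            if (msg != control_msg::create_rdma_connection)
                return false;
            if (++m_created == m_expected)
                m_stage = stage::start_data_path;
            return true;
        case stage::start_data_path:
            return advance_if(msg == control_msg::start_data_path, stage::start_storage);
        case stage::start_storage:
            return advance_if(msg == control_msg::start_storage, stage::running);
        case stage::running:
            return advance_if(msg == control_msg::destroy_objects, stage::done);
        case stage::done:
            break;
        }
        return false;
    }

    bool done() const noexcept { return m_stage == stage::done; }

private:
    enum class stage { configure, connections, start_data_path, start_storage, running, done };

    bool advance_if(bool expected, stage next)
    {
        if (expected)
            m_stage = next;
        return expected;
    }

    uint32_t m_expected;
    uint32_t m_created = 0;
    stage m_stage = stage::configure;
};
```

With N = 2, the sequence configure, create, create, start data path, start storage, destroy objects is accepted in order, while a "start data path connections" message that arrives after only one "create RDMA connection" message is rejected.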

Figure: Target RDMA preparation stage


Data Path Stage

In this stage, the data path threads start. Each thread begins by submitting RDMA receive tasks, then executes a tight loop that polls the progress engine (PE) as quickly as possible until a "data path stop" IO message is received.

The process of handling an IO message involves the following steps:

  1. Decode the IO message to determine the local and remote memory locations to use.

  2. Submit an RDMA read or RDMA write operation.

  3. Upon completion of the RDMA read/write, send a response IO message to BlueField.

  4. Resubmit the RDMA receive task.
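Assuming the local region mirrors the initiator's block byte for byte (the guide only states that the two are equal in size), decoding an IO message reduces to translating a remote address into the same offset in the local region, with a bounds check. The sketch below shows that step; all type and function names (io_request, region_pair, transfer_plan, plan_transfer) are hypothetical, as is the IO message layout.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical decoded IO message; the real layout is not described here.
struct io_request {
    bool is_read;            // initiator wants data back
    uint64_t remote_address; // address inside the initiator's exported block
    uint32_t length;         // number of bytes to move
};

// Assumed layout: the local region mirrors the initiator's block.
struct region_pair {
    uint64_t remote_base; // start of the imported (remote) doca_mmap range
    char *local_base;     // start of the local memory region
    uint64_t size;        // size of both regions
};

struct transfer_plan {
    char *local_address;
    uint64_t remote_address;
    uint32_t length;
    bool use_rdma_write; // read IO -> RDMA write, write IO -> RDMA read
};

// Maps the remote address to the same offset in the local region and rejects
// requests that do not fit entirely inside the block.
std::optional<transfer_plan> plan_transfer(region_pair const &regions, io_request const &io)
{
    assert(regions.local_base != nullptr);
    if (io.remote_address < regions.remote_base)
        return std::nullopt;
    uint64_t const offset = io.remote_address - regions.remote_base;
    if (offset > regions.size || io.length > regions.size - offset)
        return std::nullopt;
    return transfer_plan{regions.local_base + offset, io.remote_address, io.length, io.is_read};
}
```

Checking `io.length > regions.size - offset` rather than `offset + io.length > regions.size` avoids an overflow in the addition for addresses near the top of the range.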

Teardown Stage

In this stage the application performs the following:

  1. Waits for a destroy objects control message from comch_to_rdma.

  2. Destroys data path objects.

  3. Sends a destroy objects control message response to comch_to_rdma.

  4. Destroys control path objects.
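One way to keep steps 2 to 4 in order is to group data path and control path objects separately and release them explicitly, so the control path (which carries the response over TCP) outlives the data path. This sketch uses hypothetical stand-in objects (tracked_object) that log their destruction instead of real DOCA handles; the grouping into data_path_objects and control_path_objects, and the assumption that an RDMA context should be released before its PE, are illustrative rather than taken from the application's source.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOCA object handle that records when it is
// destroyed, so the teardown order can be checked.
class tracked_object {
public:
    tracked_object(std::vector<std::string> &log, std::string name) : m_log{&log}, m_name{std::move(name)} {}
    tracked_object(tracked_object const &) = delete;
    tracked_object &operator=(tracked_object const &) = delete;
    ~tracked_object() { m_log->push_back("destroy " + m_name); }

private:
    std::vector<std::string> *m_log;
    std::string m_name;
};

// Members are destroyed in reverse declaration order, so the RDMA context
// is released before the progress engine it is attached to.
struct data_path_objects {
    tracked_object progress_engine;
    tracked_object rdma_context;
};

struct control_path_objects {
    tracked_object device;
    tracked_object tcp_socket;
};

// Teardown steps 2-4. send_response is a placeholder for replying to
// comch_to_rdma over the TCP control connection.
template <typename SendResponse>
void teardown(std::unique_ptr<data_path_objects> &data_path,
              std::unique_ptr<control_path_objects> &control_path,
              SendResponse &&send_response)
{
    assert(control_path != nullptr); // the reply needs the TCP socket
    data_path.reset();
    send_response();
    control_path.reset();
}
```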

This application leverages the following DOCA libraries:

This application is compiled as part of the set of storage zero copy applications. For compilation instructions, refer to NVIDIA DOCA Storage Zero Copy.

Application Execution

DOCA Storage Zero Copy Target RDMA is provided in source form. Therefore, compilation is required before the application can be executed.

  • Application usage instructions:


    Usage: doca_storage_zero_copy_target_rdma [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse all command flags from an input json file

    Program Flags:
      -d, --device                      Device identifier
      -r, --representor                 Device host side representor identifier
      --listen-port                     TCP Port on which to listen for incoming connections
      --cpu                             CPU core to which the process affinity can be set

    Info

    This usage printout can be printed to the command line using the -h (or --help) options:


    ./doca_storage_zero_copy_target_rdma -h

    For additional information, refer to section "Command Line Flags".

  • CLI example for running the application on the BlueField:


    ./doca_storage_zero_copy_target_rdma -d 03:00.0 --listen-port 12345 --cpu 12

    Info

    The DOCA device PCIe address, 03:00.0, should match the address of the desired PCIe device.

  • The application also supports a JSON-based deployment mode, in which all command-line arguments are provided through a JSON file:


    ./doca_storage_zero_copy_target_rdma --json [json_file]

    For example:


    ./doca_storage_zero_copy_target_rdma --json doca_storage_zero_copy_comch_to_rdma_params.json

    Note

    Before execution, ensure that the JSON file contains the correct configuration parameters, especially the PCIe addresses necessary for the deployment.

Command Line Flags

Each flag is listed with its short and long form. The long flag name (without the leading dashes) is also the key used in the JSON file.

General flags

  • -h, --help: Print a help synopsis. JSON content: N/A.

  • -v, --version: Print program version information. JSON content: N/A.

  • -l, --log-level: Set the log level for the application: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70 (requires compilation with TRACE log level support). JSON content: "log-level": 60

  • --sdk-log-level (no short flag): Set the SDK log level for the program: DISABLE=10, CRITICAL=20, ERROR=30, WARNING=40, INFO=50, DEBUG=60, TRACE=70. JSON content: "sdk-log-level": 40

  • -j, --json: Parse all command flags from an input JSON file. JSON content: N/A.

Program flags

  • -d, --device (mandatory): DOCA device identifier. One of:

      • PCIe address – 3b:00.0

      • InfiniBand name – mlx5_0

      • Network interface name – en3f0pf0sf0

    JSON content: "device": "03:00.0"

  • --listen-port (mandatory, no short flag): TCP port on which to listen for incoming connections. JSON content: "listen-port": 12345

  • --cpu (mandatory, no short flag): Index of a CPU to use, starting at 0. One data path thread is spawned per CPU; specify this flag multiple times to create more threads. JSON content: "cpu": 6


Troubleshooting

Refer to the NVIDIA DOCA Troubleshooting Guide for any issue encountered with the installation or execution of the DOCA applications.

Control Thread Flow

  1. Parse application arguments:


    auto const cfg = parse_cli_args(argc, argv);

    1. Prepare the parser (doca_argp_init).

    2. Register parameters (doca_argp_param_create).

    3. Parse the arguments (doca_argp_start).

    4. Destroy the parser (doca_argp_destroy).

  2. Display the configuration:


    print_config(cfg);

  3. Create application instance:


    g_app.reset(storage::zero_copy::make_storage_application(cfg));

  4. Run the application:


    g_app->run();

    1. Find and open the specified device:


      m_dev = storage::common::open_device(m_cfg.device_id);

    2. Start the TCP server and wait for comch_to_rdma to connect:


      start_listening();
      wait_for_tcp_client();

    3. Wait for a "configure storage" control message from comch_to_rdma.

    4. Configure storage:


      configure_storage(configuration);

      1. Create thread contexts:

        1. Create transaction contexts.

        2. Create IO messages.

        3. Create PE.

        4. Create mmap for IO message buffers.

    5. Send a "configure storage" control message response to comch_to_rdma.

    6. Wait for N "create RDMA connection" control messages from comch_to_rdma:

      1. Create RDMA context.

      2. Export connection details.

      3. Start connection using received remote connection details.

      4. Send a "create RDMA connection" control message response (containing RDMA connection details from target_rdma RDMA context) to comch_to_rdma.

    7. Wait for "start data path" control message from comch_to_rdma:

      1. Verify that all RDMA connections are ready:


        establish_rdma_connections();

    8. Send a "start data path" control message response to comch_to_rdma.

    9. Wait for a "start storage" control message from comch_to_rdma:

      1. Create data path threads.

      2. Start data path threads.

    10. Send a "start storage" control message response to comch_to_rdma.

    11. Run all threads until completion.

    12. Wait for "destroy objects" control message.

    13. Destroy data path objects.

    14. Send a "destroy objects" control message response to comch_to_rdma.

  5. Display stats:


    printf("+================================================+\n");
    printf("| Stats\n");
    printf("+================================================+\n");
    for (uint32_t ii = 0; ii != stats.size(); ++ii) {
        printf("| Thread[%u]\n", ii);
        auto const pe_hit_rate_pct =
            (static_cast<double>(stats[ii].pe_hit_count) /
             (static_cast<double>(stats[ii].pe_hit_count) + static_cast<double>(stats[ii].pe_miss_count))) *
            100.;
        printf("| PE hit rate: %2.03lf%% (%lu:%lu)\n", pe_hit_rate_pct, stats[ii].pe_hit_count, stats[ii].pe_miss_count);
        printf("+------------------------------------------------+\n");
    }
    printf("+================================================+\n");

  6. Destroy control path objects.
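The stats code above divides by hit + miss, which is zero for a thread that never entered its polling loop. Below is a hedged variant that guards that case and prints the counters with the portable <cinttypes> format macros instead of %lu. It assumes the counters are uint64_t (the guide does not show their type); pe_hit_rate_pct and print_thread_stats are hypothetical names.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Share of doca_pe_progress() calls that made progress, in percent. Returns 0
// for a thread that never polled, where hit / (hit + miss) would be 0 / 0.
constexpr double pe_hit_rate_pct(uint64_t hit_count, uint64_t miss_count)
{
    return (hit_count + miss_count) == 0
               ? 0.0
               : static_cast<double>(hit_count) / static_cast<double>(hit_count + miss_count) * 100.0;
}

// Prints one thread's row. PRIu64 expands to the right conversion whether
// uint64_t is unsigned long or unsigned long long on the target platform.
void print_thread_stats(uint32_t thread_index, uint64_t hit_count, uint64_t miss_count)
{
    std::printf("| Thread[%" PRIu32 "]\n", thread_index);
    std::printf("| PE hit rate: %2.03lf%% (%" PRIu64 ":%" PRIu64 ")\n",
                pe_hit_rate_pct(hit_count, miss_count), hit_count, miss_count);
    std::printf("+------------------------------------------------+\n");
}
```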

Performance Data Path Thread Flow

The data path involves polling the PE as quickly as possible to receive IO messages from BlueField.

  1. Run until BlueField sends a stop IO message:


    while (hot_data->running_flag) {
        doca_pe_progress(pe) ? ++(hot_data->pe_hit_count) : ++(hot_data->pe_miss_count);
    }

  2. Handle BlueField IO message:

    1. Calculate memory addresses to use for local and remote memory.

    2. Set the addresses and sizes of the source and destination buffers used by the RDMA task.

    3. Start RDMA read/write task.

    4. Upon completion of the RDMA task, respond to BlueField.

    5. Resubmit the RDMA receive task.
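The polling loop can be exercised without hardware by substituting a mock for doca_pe_progress(). This sketch keeps the loop's shape; hot_data here is a hypothetical struct holding only the three fields the loop touches. It assumes running_flag is cleared only from a completion callback (for example, when the stop IO message arrives), which DOCA runs inside doca_pe_progress() on the polling thread, so a plain bool is sufficient.

```cpp
#include <cstdint>

// Hypothetical per-thread hot data with the fields used by the loop above.
struct hot_data {
    bool running_flag = true;
    uint64_t pe_hit_count = 0;
    uint64_t pe_miss_count = 0;
};

// progress stands in for doca_pe_progress(pe): it returns nonzero when at
// least one task or event completed during the call. Any completion
// callbacks (including the one that clears running_flag) run inside it.
template <typename Progress>
void run_data_path(hot_data &hot, Progress &&progress)
{
    while (hot.running_flag) {
        if (progress())
            ++hot.pe_hit_count;
        else
            ++hot.pe_miss_count;
    }
}
```

An if/else is used instead of the ternary expression-statement purely for readability; the counting behavior is the same.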

  • /opt/mellanox/doca/applications/storage/

© Copyright 2024, NVIDIA. Last updated on Nov 19, 2024.