DOCA STA - NVIDIA Docs

Introduction

A storage target is an application that provides access to data storage, such as hard drives, SSDs, network-attached storage (NAS), or cloud storage services.

It serves storage access requests from local or remote initiators.

To allow the storage target to receive remote requests (over the network), the NVMe (Non-Volatile Memory Express) over Fabrics (NVMe-oF) protocol was introduced.

NVMe-oF is a protocol specification designed to extend the capabilities of NVMe storage across network fabrics (for example, RoCEv1/IB RDMA and NVMe/TCP).

NVMe-oF extends the concept of storage targets by enabling NVMe commands to be executed over a network, rather than being limited to direct-attached storage.

A storage target application that supports NVMe-oF is heavyweight and consumes a substantial number of CPU cores.

To improve latency, free CPU resources, and save power, an offload solution is required. Starting with BlueField-3 (BF3), the target application can be offloaded to the DPA.

To abstract the communication between the target application (in this case, SPDK) and the underlying offload accelerator, and to ease integration, a new DOCA library, DOCA STA, is introduced. The library exposes a public API that the target application (a DOCA application) uses for control-plane and data-plane handling.

While a software-based storage target accesses the NVMe drives directly over PCIe, the DPA storage target offload accelerator accesses the NVMe drives over a PCIe peer-to-peer (P2P) topology. Such access requires a special kernel configuration.

Prerequisites

To operate the DPA storage target offload, a dedicated patched kernel with P2P support is required:

If the target application runs on a host machine, install a patched kernel that supports P2P (TBD: add patch link)
If the target application runs on a BlueField machine (attached directly to the NVMe drives), install a BFB that supports P2P (TBD: link to BFB)

In addition, the library makes use of existing DOCA libraries, so it is recommended to read their documentation first.

Changes From Previous Release

N/A

Environment

DOCA STA-based applications can run either on the host machine or on the NVIDIA® BlueField®-3 networking platform (DPU or SuperNIC).

When running on a BlueField platform, the NVMe drives must be directly PCIe-attached to the platform. BlueField-3 or later is required, as it is the first platform that contains the DPA accelerator.

Architecture

The DOCA STA library is composed of two parts:

Host side - exposes the API used by the target application. Using this API, the application can:
- Initialize and configure the accelerator parameters
- Register callbacks for data processing - some NVMe-oF command capsules cannot be processed by the offload engine and are forwarded to the library (non-offloaded capsules).
  The application can process these capsules using a registered callback
- Register callbacks for asynchronous event processing - the offload engine can notify the library about events that occurred, such as an NVMe drive timeout.
  The application can process these events using a registered callback
- Register tasks for various datapath and control-path operations:
  - Datapath tasks - after the application processes non-offloaded capsules, it can register tasks with the library for RDMA transmission (R/W/Send)
  - Control-path tasks - some control-path operations can take time and may complete asynchronously, for example, disconnecting a QP or destroying the back end.
    The application can register tasks for these operations
Device side - accelerator low-level code that implements the target offload logic (e.g., DPA handler code).

DOCA STA has two types of DOCA contexts:

DOCA STA context - a DOCA context that is responsible for system initialization and configuration. It performs all target system-wide operations.
There should be only a single instance of the DOCA STA context in the system (VM)
DOCA STA IO context - a DOCA context that is responsible for RDMA IO operations. One DOCA STA IO context can exist per system thread (pthread)

DOCA STA makes use of other DOCA libraries. The key libraries are:

DOCA RDMA - used to connect and disconnect QPs on the accelerator datapath (e.g., DPA)
DOCA COMCH - used as a communication channel to and from the offload accelerator engine, for example, to post RDMA tasks and receive completions, or to post QP disconnect tasks and get completions

The following diagram illustrates the high-level connectivity of DOCA STA:

[Figure: high-level connectivity of DOCA STA]

Objects

Device and Device Representor

The STA library uses a control device (PF) and one or more network devices (SF). The control device is used to set up the DPA, e.g., to load the image (DPA-related application).

The network device is used to manage incoming and outgoing NVMe-oF traffic.

An application that uses the doca-sta library must select a control device and network devices with appropriate representors.

For more details, see DOCA Core Device Discovery and DOCA Core Device Representor Discovery.

Configuration Phase

To start using the library, users must go through a configuration phase as described in the DOCA Core Context Configuration Phase.

This section describes how to configure and start the doca-sta context.

Configurations

The doca-sta context should be configured to match the application use case.

Such configuration includes:

create the doca_sta object (main context)
add network device(s)
connect the doca-sta context to the DOCA progress engine (PE)
start the doca-sta context
create the doca-sta-io object(s) (io context/io threads)
connect each doca-sta-io context to a DOCA progress engine (PE)
start the doca-sta-io context(s)

Note: the io context may be created only after the start of the main context.

Only the following start configuration order is allowed:

create the main context
start the main context
create io context(s)
start io context(s)

Only the following stop configuration order is allowed:

stop io context(s)
stop the main context
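The ordering rules above can be illustrated with a small Python model. This is only a sketch of the constraints, not the DOCA STA C API: the class and method names (`MainContext`, `IoContext`, `LifecycleError`) are hypothetical and exist only for this example.

```python
class LifecycleError(RuntimeError):
    """Raised when a context is created, started, or stopped out of order."""


class MainContext:
    """Hypothetical stand-in for the main (doca-sta) context."""

    def __init__(self):
        self.running = False
        self.io_contexts = []

    def start(self):
        self.running = True

    def stop(self):
        # The main context may stop only after every io context has stopped.
        if any(io.running for io in self.io_contexts):
            raise LifecycleError("stop all io contexts before the main context")
        self.running = False


class IoContext:
    """Hypothetical stand-in for an io (doca-sta-io) context."""

    def __init__(self, main):
        # An io context may be created only after the main context has started.
        if not main.running:
            raise LifecycleError("start the main context before creating io contexts")
        self.running = False
        main.io_contexts.append(self)

    def start(self):
        self.running = True

    def stop(self):
        self.running = False
```

In this model, creating an `IoContext` before `MainContext.start()`, or stopping the main context while an io context is still running, raises `LifecycleError`.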

Create main context

The library extensively uses the DPA, so the control device (PF) that supports STA functionality should be used to create the doca-sta object.

Use the doca_sta_cap_is_supported API to check whether the PF device supports the STA functionality.

Add network device

The library will utilize the network device(s) (SF) to manage incoming and outgoing NVMe-oF traffic.

Before starting the doca-sta object, the network devices (resources) must be added to it.

Use the doca_sta_add_dev API to add a network device (SF) for this purpose.

Progress engine

The PE is used to progress tasks and events. A progress engine is associated with one or more DOCA contexts. A DOCA context can be associated with only one progress engine.

Note: it is the responsibility of the application that uses the library to create a dedicated PE and connect the doca-sta context to this PE.
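The association rule (one PE can serve several contexts, but a context belongs to exactly one PE) can be sketched as a small Python model. The names `ProgressEngine`, `Context`, and `connect` are hypothetical and are not DOCA Core identifiers.

```python
class Context:
    """Hypothetical context that can be attached to one progress engine."""

    def __init__(self, name):
        self.name = name
        self.pe = None  # the single progress engine this context is attached to


class ProgressEngine:
    """Hypothetical progress engine that serves one or more contexts."""

    def __init__(self):
        self.contexts = []

    def connect(self, ctx):
        # A context can be associated with only one progress engine.
        if ctx.pe is not None:
            raise ValueError(f"{ctx.name} is already connected to a progress engine")
        ctx.pe = self
        self.contexts.append(ctx)
```

A design that follows the text would create one `ProgressEngine` for the main context and a separate one for each io context.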

Start main context

Once the doca-sta context is created and configured, it should be started.

Create io context

The io context represents the io thread. The io thread is used to initiate RDMA connectivity flows as well as to handle 'non-offload' commands.

The following are examples of such flows:

connect QP
disconnect QP
handle non-offload command

A non-offload command is a command that is not handled by the doca-sta library itself.

Such commands are delivered to the application layer for further processing.

The io context should be created in the same way as the main context:

It is the responsibility of the application that uses the library to create a dedicated PE for each io context and connect the doca-sta-io context to this PE.

Start io context

Once the doca-sta-io context is created and configured, it should be started.

Execution Phase

The library is used to offload NVMe-oF traffic to the DPA.

The library provides APIs for configuring an NVMe-oF target application in compliance with the NVMe specification.

The application must complete the following steps before it can receive traffic from the initiator side.

add subsystems
add namespaces into subsystems
bind NVMe PCI disks to the namespaces
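The relationship between subsystems, namespaces, and disks in the steps above can be sketched as a plain data model. This is an illustration in Python, not the API declared in doca_sta_subsystem.h or doca_sta_be.h; every name here (`Target`, `Subsystem`, `Namespace`, `ready_for_traffic`) and the example NQN and PCI address are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Namespace:
    nsid: int
    disk_pci_addr: Optional[str] = None  # PCI address of the bound NVMe disk


@dataclass
class Subsystem:
    nqn: str
    namespaces: Dict[int, Namespace] = field(default_factory=dict)

    def add_namespace(self, nsid: int) -> Namespace:
        if nsid in self.namespaces:
            raise ValueError(f"namespace {nsid} already exists in {self.nqn}")
        self.namespaces[nsid] = Namespace(nsid)
        return self.namespaces[nsid]


@dataclass
class Target:
    subsystems: Dict[str, Subsystem] = field(default_factory=dict)

    def add_subsystem(self, nqn: str) -> Subsystem:
        if nqn in self.subsystems:
            raise ValueError(f"subsystem {nqn} already exists")
        self.subsystems[nqn] = Subsystem(nqn)
        return self.subsystems[nqn]

    def ready_for_traffic(self) -> bool:
        # Ready once at least one namespace exists and every namespace has a disk.
        namespaces = [ns for s in self.subsystems.values()
                      for ns in s.namespaces.values()]
        return bool(namespaces) and all(ns.disk_pci_addr for ns in namespaces)
```

For example, a target with one subsystem and one namespace becomes ready in this model only after `disk_pci_addr` is set on that namespace.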

Note: The application utilizing the library is responsible for implementing the RDMA connection management service.

Please refer to the content of the doca_sta_subsystem.h and doca_sta_be.h header files for more information.

RDMA connect

Upon receiving the RDMA connect request, the application should call an appropriate API:

doca_sta_io_qp_alloc + doca_sta_io_qp_accept
or
doca_sta_io_qp_connect

Upon successful completion, the sta QP is created. The sta QP handle represents the sta QP.

Note: all subsequent communication between the application layer and the QP API, in either direction, must take place in the context of the same io thread (io context) from which the QP was created.
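One way an application could guard this thread-affinity rule is to record the creating thread and check it on every access. The following Python sketch shows the idea only; `ThreadBoundQp` and `check_owner` are hypothetical names, and the library itself may or may not perform such a check.

```python
import threading


class ThreadBoundQp:
    """Hypothetical QP wrapper that remembers which thread created it."""

    def __init__(self):
        self._owner = threading.get_ident()

    def check_owner(self):
        # Reject any use from a thread other than the creating io thread.
        if threading.get_ident() != self._owner:
            raise RuntimeError("QP used outside the io thread that created it")
```

Calling `check_owner()` at the top of each QP operation turns a cross-thread misuse into an immediate error instead of a hard-to-trace race.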

Fabric connect

Immediately after a connection is established, the initiator sends the "fabric connect" (FC) command capsule.

The FC should be handled by the application (non-offload flow).

The application layer is responsible for:

getting the data/payload for the FC command (if it was not received as inline data)
verifying the parameters inside the FC command (host NQN, subsystem NQN, etc.)
completing the FC processing (sending a response to the initiator)
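The parameter verification step could look roughly like the following Python sketch. It is a simplified illustration, not the library's logic: the function name, the return convention, and the allow-list structure are assumptions, and a real target validates more fields than the two NQNs. The 223-byte limit is the maximum NQN length defined by the NVMe base specification.

```python
NQN_MAX_LEN = 223  # maximum NQN length in bytes (NVMe base specification)


def verify_fabric_connect(host_nqn, subsys_nqn, subsystems):
    """Return None if the connect parameters are acceptable, else a reason.

    `subsystems` is a hypothetical map of subsystem NQN to the set of allowed
    host NQNs; an empty set means any host may connect.
    """
    for nqn in (host_nqn, subsys_nqn):
        if not nqn.startswith("nqn.") or len(nqn.encode("utf-8")) > NQN_MAX_LEN:
            return "invalid NQN"
    allowed_hosts = subsystems.get(subsys_nqn)
    if allowed_hosts is None:
        return "unknown subsystem"
    if allowed_hosts and host_nqn not in allowed_hosts:
        return "host not allowed"
    return None
```

The application would then send a success or error response to the initiator depending on the result.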

Note: all actions initiated by the application layer are asynchronous. They are part of the non-offload flow/communication between the DPA and the host (DPU).

Please refer to the content of the doca_sta_io_non_offload.h header file for more information.

Tasks

DOCA STA exposes asynchronous tasks that should be used for asynchronous flows such as:

disconnect QP
remove (detach) namespace
destroy back-end queue

The asynchronous flow is initiated by allocating a task through the appropriate API.

The application is responsible for submitting the task to initiate the desired action.

The library is responsible for executing the asynchronous flow and monitoring its completion.

Once the flow completes, the library notifies the application by invoking the callback function.
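The allocate, submit, progress, and callback sequence can be modeled in a few lines of Python. This is a sketch of the pattern only: `Task`, `Engine`, `submit`, and `progress` are hypothetical names, not the DOCA Core task API.

```python
class Task:
    """Hypothetical asynchronous task with a completion callback."""

    def __init__(self, action, on_done):
        self.action = action    # callable that performs the asynchronous work
        self.on_done = on_done  # completion callback: on_done(task, ok)
        self.state = "allocated"


class Engine:
    """Hypothetical stand-in for the library side that runs submitted tasks."""

    def __init__(self):
        self.inflight = []

    def submit(self, task):
        if task.state != "allocated":
            raise RuntimeError("task was already submitted")
        task.state = "submitted"
        self.inflight.append(task)

    def progress(self):
        """Complete one in-flight task, if any, and invoke its callback."""
        if not self.inflight:
            return False
        task = self.inflight.pop(0)
        try:
            task.action()
            ok = True
        except Exception:
            ok = False
        task.state = "completed"
        task.on_done(task, ok)
        return True
```

The application allocates a `Task`, submits it, and keeps calling `progress()` until its callback reports success or failure.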

Task Input

Common input as described in DOCA Core Task.

Task Output

Common output as described in DOCA Core Task.

Task Limitations

The operation is not atomic
Other limitations are described in the DOCA Core Task

Non-offload flow task

The app layer is responsible for processing the NVMe-oF capsules for the non-offload flow.

First, the NVMe-oF capsule is delivered from the DPA to the host and then delegated to the app layer.

It might be required to perform the RDMA operations during the processing of the capsule.

Note: the QP is owned by DOCA STA library and must be accessed by the library API only.

The following non-offload tasks can be used by the application layer to initiate RDMA operations while processing non-offload capsules:

RDMA-READ
RDMA-WRITE with RDMA-SEND
RDMA-SEND

Please refer to the content of the doca_sta_io_non_offload.h file for more information.

non-offload RDMA-READ

The task is responsible for initiating RDMA-READ operation.

Upon completion, an appropriate callback function will be invoked to notify the application layer about completion.

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_read_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_io_task_non_offload_rdma_read_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-READ operation was completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-READ operation failed

non-offload RDMA-WRITE + RDMA-SEND

The task is responsible for initiating the RDMA-WRITE operation in conjunction with the RDMA-SEND.

Upon completion, an appropriate callback function will be invoked to notify the application layer about completion.

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_write_send_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_io_task_non_offload_rdma_write_send_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-WRITE & RDMA-SEND operation was completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-WRITE & RDMA-SEND operation failed

non-offload RDMA-SEND

The task is responsible for initiating the RDMA-SEND operation.

Upon completion, an appropriate callback function will be invoked to notify the application layer about completion.

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_write_send_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_io_task_non_offload_rdma_send_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-SEND operation was completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-SEND operation failed

QP disconnect task

The QP disconnect task is responsible for initiating the asynchronous flow of QP disconnect:

move the QP to the error state
send a notification message about the QP disconnect from the host (DPU) to the appropriate DPA
the DPA verifies that no inbound or outbound I/O is still in flight
the DPA sends a response to the host/DPU
notify the application about task completion
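The disconnect flow above, in which completion is reported only after in-flight I/O has drained, can be sketched as follows. This Python model is illustrative: the `Qp` class, its state names, and its methods are hypothetical, and the real handshake runs between the host (DPU) and the DPA.

```python
class Qp:
    """Hypothetical QP that completes a disconnect only when I/O has drained."""

    def __init__(self):
        self.state = "connected"
        self.inflight = 0
        self.on_disconnected = None

    def start_io(self):
        if self.state != "connected":
            raise RuntimeError("QP is not connected")
        self.inflight += 1

    def complete_io(self):
        self.inflight -= 1
        self._maybe_finish()

    def request_disconnect(self, on_disconnected):
        # Move the QP to the error state; new I/O is rejected from now on.
        self.state = "error"
        self.on_disconnected = on_disconnected
        self._maybe_finish()

    def _maybe_finish(self):
        # Report completion only once no I/O remains in flight.
        if self.state == "error" and self.inflight == 0:
            self.state = "disconnected"
            self.on_disconnected(self)
```

With one I/O outstanding, `request_disconnect()` leaves the QP in the error state, and the callback fires only after the matching `complete_io()`.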

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_io_task_disconnect_set_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_io_task_disconnect_alloc_init`

Task Completion Success

After the task is completed successfully:

The QP is disconnected and can be destroyed

Task Completion Failure

If the task fails midway:

The QP is not disconnected

Remove namespace task

The remove (detach) namespace task is responsible for initiating the asynchronous namespace detach flow:

send a notification message from a host (DPU) to the appropriate DPA about detach namespace
DPA should mark the namespace as not valid anymore
DPA will send a response to the host/DPU
notify the application about task completion

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_subsystem_task_rm_ns_set_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_subsystem_task_rm_ns_alloc_init`

Task Completion Success

After the task is completed successfully:

The namespace is removed
Any I/O sent to the removed namespace will be completed with an error

Task Completion Failure

If the task fails midway:

The namespace is not removed

Destroy back-end queue task

The destroy 'be queue' task is responsible for initiating the asynchronous flow of the destroy back-end queue:

send a notification message from a host (DPU) to the appropriate DPA about destroying be queue
DPA should mark the 'be queue' as destroyed
Any outstanding commands should be completed with an error
DPA will send a response to the host/DPU
notify the application about task completion

Task Configuration

Description	API to set the configuration
Enable the task	`doca_sta_be_task_destroy_queue_set_conf`

Task Allocation

Description	API to allocate the task
Allocate the task	`doca_sta_be_destroy_queue_task_alloc_init`

Task Completion Success

After the task is completed successfully:

The be queue is removed

Task Completion Failure

If the task fails midway:

The be queue is not removed

State Machine

The DOCA STA library follows the context state machine as described in DOCA Core Context State Machine.

The following section describes how to move states and what is allowed in each state.

Idle

In this state, it is expected that the application:

Destroys the context
Starts the context

Allowed operations:

Configuring the context according to the Configurations section
Starting the context

It is possible to reach this state as follows:

Previous State	Transition Action
None	Create the context
Running	Call stop after making sure all tasks have been freed
Stopping	Call progress until all tasks are completed and freed

Starting

This state cannot be reached.

Running

In this state, it is expected that the application:

Allocates and submits tasks
Calls progress to complete tasks and/or receive events

Allowed operations:

Allocating a previously configured task
Submitting a task
Calling stop

It is possible to reach this state as follows:

Previous State	Transition Action
Idle	Call start after the configuration

Stopping

In this state, it is expected that the application:

Calls progress to complete all inflight tasks (tasks completed with failure)
Frees any completed tasks

Allowed operations:

Call progress

It is possible to reach this state as follows:

Previous State	Transition Action
Running	Call progress and a fatal error occurs
Running	Call stop without freeing all tasks
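The transitions listed in the tables above can be collected into a single lookup table. The following Python sketch encodes them; the event names (`start`, `stop_all_tasks_freed`, and so on) are hypothetical labels for the transition actions, not DOCA identifiers.

```python
# (current state, event) -> next state, taken from the transition tables above.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "stop_all_tasks_freed"): "idle",
    ("running", "stop_tasks_inflight"): "stopping",
    ("running", "fatal_error"): "stopping",
    ("stopping", "all_tasks_freed"): "idle",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None
```

Note that no entry leads to a "starting" state, which matches the statement that this state cannot be reached.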

DOCA-STA sample application

Offload sample

This sample application demonstrates the basic usage of the doca-sta API.

cd /opt/mellanox/doca/applications
meson setup /tmp/build
ninja -C /tmp/build

/tmp/build/sta_offload/doca_sta_offload --doca_lib_lvl 50 --doca_app_lvl 40 --sf_dev mlx5_0

For a reference application utilizing the DOCA STA library, please get in touch with NVIDIA Enterprise Support.
