DOCA STA - NVIDIA Docs

Introduction

The DOCA STA library simplifies the integration and offloading of storage target applications, such as SPDK, onto NVIDIA® BlueField®-3 and newer platforms. A storage target is any application that manages data storage access, whether from hard drives, SSDs, network-attached storage (NAS), or cloud services, and serves requests from local or remote initiators.

To enable network-based access, the NVMe-oF protocol extends the capabilities of NVMe storage across network fabrics like RoCEv1/IB RDMA and NVMe/TCP. By allowing NVMe commands to execute over the network, NVMe-oF overcomes the limitations of direct-attached storage.

Applications supporting NVMe-oF are resource-intensive and consume substantial CPU resources. An offload solution reduces CPU usage, improves latency, and conserves power. With BlueField-3, storage target applications can leverage the data path accelerator (DPA) to offload their operations.

DOCA STA provides a public API to manage both control-plane and data-plane operations, abstracting the complexity of communication between the offloaded target application and the hardware accelerator. This abstraction allows developers to streamline the integration process and focus on application logic.

Unlike traditional software-based storage targets that access NVMe drives over PCIe, DPA offload accelerators utilize PCIe peer-to-peer (P2P) topology to connect to NVMe drives. Special kernel configurations are required to enable this topology.

Prerequisites

To enable DPA storage target offload, a dedicated patched P2P kernel is required:

For target applications running on the host machine, install a patched kernel that supports P2P.

The patches must be applied manually, and a custom kernel must be built from the patched sources.

For target applications running on a BlueField machine that is directly attached to NVMe drives, ensure a P2P-supported BFB is installed. This requires the latest BFB with Ubuntu 22.04 (and DOCA v2.10.0).

In addition, the library builds on existing DOCA libraries, so it is recommended to first read the documentation for the libraries referenced on this page (DOCA Core, DOCA RDMA, and DOCA Comch).

Changes From Previous Release

N/A

Environment

DOCA STA-based applications can run either on the host machine or on the NVIDIA® BlueField®-3 networking platform (DPU or SuperNIC).

When running on the BlueField platform, NVMe drives must be directly PCIe-attached, and BlueField-3 or newer is required.

Architecture

The DOCA STA library consists of two main components:

Host side – The host-side component exposes an API for use by the target application. Using this API, the application can:
- Initialize and configure accelerator parameters.
- Register callbacks for data processing – Certain NVMe-oF command capsules that cannot be processed by the offload engine are forwarded to the library as "non-offloaded capsules." The application can process these capsules using registered callbacks.
- Register callbacks for asynchronous events – The offload engine can generate notifications for events such as NVMe drive timeouts. The application can handle these events using registered callbacks.
- Register tasks for various operations:
  - Datapath tasks – After processing non-offloaded capsules, the application can register tasks for RDMA operations, such as read, write, or send.
  - Control-path tasks – Some control operations, such as queue pair (QP) disconnection or backend destruction, may take time to complete asynchronously. The application can register tasks for these operations.
Device side – The device-side component consists of low-level accelerator code that implements the target offload logic, such as DPA handler code.

DOCA STA Contexts

DOCA STA operates with two types of DOCA contexts:

DOCA STA context
- Responsible for system-wide initialization and configuration.
- Manages all target system-wide operations.
- Only one instance of the DOCA STA context should exist in the system (or virtual machine).
DOCA STA IO context
- Responsible for RDMA IO operations.
- Can exist for each system thread (e.g., per pthread).

Key DOCA Libraries

DOCA STA relies on several other DOCA libraries for its functionality. The key libraries are:

DOCA RDMA – Used for connecting or disconnecting queue pairs (QPs) on the accelerator datapath (e.g., DPA).
DOCA Comch – Enables communication to and from the offload accelerator engine. For example, it facilitates posting RDMA tasks and receiving completions or handling QP disconnect tasks and their completions.

High-Level Connectivity

The following diagram illustrates the high-level connectivity of DOCA STA:

[Figure: DOCA STA high-level connectivity diagram]

Objects

Device and Device Representor

The DOCA STA library utilizes two types of devices: a control device (PF) and one or more network devices (SF).

Control device (PF) – This device is used to configure the DPA, such as loading the DPA-related application image.
Network device (SF) – This device handles incoming and outgoing NVMe-oF traffic.

Applications using the DOCA STA library must select both a control device and network devices with the appropriate representors. For more information, refer to the sections on DOCA Core Device Discovery and DOCA Core Device Representor Discovery.

Configuration Phase

To begin using the DOCA STA library, users must complete a configuration phase as outlined in the DOCA Core Context Configuration Phase.

This section details the steps required to configure and initialize the DOCA STA context.

Configurations

The DOCA STA context must be configured to align with the application's use case.

The configuration process involves the following steps:

Create the DOCA STA object (main context).
Add network device(s).
Connect the DOCA STA context to the DOCA progress engine (PE).
Start the DOCA STA context.
Create the DOCA STA IO object(s) (IO context/IO threads).
Connect the DOCA STA IO context(s) to the DOCA PE.
Start the DOCA STA IO context(s).

Note

The IO context(s) can only be created after the main context has been started.

The following order must be followed during the start phase:

Create the main context.
Start the main context.
Create IO context(s).
Start IO context(s).

The following order must be followed during the stop phase:

Stop IO context(s).
Stop the main context.
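
The start and stop ordering above can be sketched as a small self-checking model. This is only an illustration of the ordering rules: the `model_*` types and functions are hypothetical stand-ins invented for this sketch and are not part of the DOCA STA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT DOCA STA structures or calls. */
enum model_state { MODEL_IDLE, MODEL_RUNNING };

struct model_main_ctx {
    enum model_state state;
    int running_io_ctxs; /* IO contexts that are currently started */
};

struct model_io_ctx {
    struct model_main_ctx *parent;
    enum model_state state;
};

void model_main_create(struct model_main_ctx *m)
{
    assert(m != NULL);
    m->state = MODEL_IDLE;
    m->running_io_ctxs = 0;
}

bool model_main_start(struct model_main_ctx *m)
{
    if (m->state != MODEL_IDLE)
        return false;
    m->state = MODEL_RUNNING;
    return true;
}

/* An IO context can only be created after the main context has started. */
bool model_io_create(struct model_io_ctx *io, struct model_main_ctx *m)
{
    if (m->state != MODEL_RUNNING)
        return false;
    io->parent = m;
    io->state = MODEL_IDLE;
    return true;
}

bool model_io_start(struct model_io_ctx *io)
{
    if (io->state != MODEL_IDLE || io->parent->state != MODEL_RUNNING)
        return false;
    io->state = MODEL_RUNNING;
    io->parent->running_io_ctxs++;
    return true;
}

bool model_io_stop(struct model_io_ctx *io)
{
    if (io->state != MODEL_RUNNING)
        return false;
    io->state = MODEL_IDLE;
    io->parent->running_io_ctxs--;
    return true;
}

/* The main context stops only once every IO context has been stopped. */
bool model_main_stop(struct model_main_ctx *m)
{
    if (m->state != MODEL_RUNNING || m->running_io_ctxs != 0)
        return false;
    m->state = MODEL_IDLE;
    return true;
}
```

In this model, calling `model_io_create` before `model_main_start`, or `model_main_stop` while an IO context is still running, returns `false`. A real application would surface the equivalent condition through the library's own error codes.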

Create Main Context

The library relies heavily on the DPA, so the control device (PF) with STA functionality must be used to create the doca-sta object.

Use the doca_sta_cap_is_supported API to verify if the PF device supports STA functionality before proceeding.

Add Network Device

The library utilizes network device(s) (SF) to manage incoming and outgoing NVMe-oF traffic.

Before starting the doca-sta object, network devices (resources) must be added to it. Use the doca_sta_add_dev API to add network devices (SF) as needed.

Progress Engine

The Progress Engine (PE) is responsible for progressing tasks and events. Each PE is associated with one or more DOCA contexts, but a DOCA context can only be associated with a single PE.

Note

It is the responsibility of the application using the library to create a dedicated PE and connect the doca-sta context to it.
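
The association rule between PEs and contexts (a PE may serve several contexts, a context belongs to exactly one PE) can be illustrated with the following sketch. The `model_pe` and `model_ctx` types are hypothetical and exist only to show the relationship; they do not mirror the DOCA Core structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PE_MAX_CTXS 8

struct model_pe;

/* Hypothetical context: remembers the single PE it is connected to. */
struct model_ctx {
    struct model_pe *pe; /* NULL while not connected */
};

/* Hypothetical progress engine: may serve several contexts. */
struct model_pe {
    struct model_ctx *ctxs[MODEL_PE_MAX_CTXS];
    size_t num_ctxs;
};

/* Connects ctx to pe; fails if ctx already belongs to a PE or pe is full. */
bool model_pe_connect(struct model_pe *pe, struct model_ctx *ctx)
{
    assert(pe != NULL && ctx != NULL);
    if (ctx->pe != NULL)
        return false;
    if (pe->num_ctxs == MODEL_PE_MAX_CTXS)
        return false;
    pe->ctxs[pe->num_ctxs++] = ctx;
    ctx->pe = pe;
    return true;
}
```

Keeping the back-pointer in the context makes the "single PE per context" rule a one-line check at connect time.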

Start Main Context

Once the doca-sta context is created and configured, it must be started to enable its functionality.

Create IO Context

The IO context represents the IO thread, which is responsible for initiating RDMA connectivity flows and handling "non-offload" commands.

Examples of such flows include:

Connecting QPs
Disconnecting QPs
Handling non-offload commands

Non-offload commands refer to commands that the doca-sta library does not handle internally. These commands must be forwarded to the application layer for processing.

The IO context should be created in a similar manner to the main context.

Note

The application using the library is responsible for creating a dedicated PE for each IO context and connecting the doca-sta-io context to this PE.

Start IO Context

Once the doca-sta-io context is created and configured, it must be started to enable IO-related operations.

Execution Phase

The DOCA STA library enables offloading of NVMe-oF traffic to the DPA, providing APIs to configure an NVMe-oF target application in compliance with the NVMe specification.

Before the application can begin receiving traffic from the initiator, the following steps must be completed:

Add subsystems.
Add namespaces to the subsystems.
Bind NVMe PCI disks to the namespaces.

Note

The application is responsible for implementing the RDMA connection management service.

For additional details, refer to the doca_sta_subsystem.h and doca_sta_be.h header files.
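
As a rough illustration of the three preparation steps (add a subsystem, add namespaces, bind disks), the sketch below models the resulting object hierarchy and a readiness check. All names are hypothetical; the actual calls and types are those declared in doca_sta_subsystem.h and doca_sta_be.h. The 223-byte NQN limit and the reserved NSID values come from the NVMe specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_NS  4
#define MODEL_NQN_MAX 223 /* maximum NQN length in bytes (NVMe specification) */

struct model_namespace {
    uint32_t nsid;        /* 0 and 0xFFFFFFFF do not identify a namespace */
    const char *pci_addr; /* bound NVMe PCI disk; NULL until bound */
};

struct model_subsystem {
    char nqn[MODEL_NQN_MAX + 1];
    struct model_namespace ns[MODEL_MAX_NS];
    size_t num_ns;
};

bool model_subsystem_init(struct model_subsystem *s, const char *nqn)
{
    assert(s != NULL && nqn != NULL);
    size_t len = strlen(nqn);
    if (len == 0 || len > MODEL_NQN_MAX)
        return false;
    memset(s, 0, sizeof(*s));
    memcpy(s->nqn, nqn, len + 1);
    return true;
}

struct model_namespace *model_add_ns(struct model_subsystem *s, uint32_t nsid)
{
    if (nsid == 0 || nsid == UINT32_MAX || s->num_ns == MODEL_MAX_NS)
        return NULL;
    for (size_t i = 0; i < s->num_ns; i++)
        if (s->ns[i].nsid == nsid)
            return NULL; /* NSIDs are unique within a subsystem */
    struct model_namespace *ns = &s->ns[s->num_ns++];
    ns->nsid = nsid;
    ns->pci_addr = NULL;
    return ns;
}

bool model_bind_disk(struct model_namespace *ns, const char *pci_addr)
{
    if (ns == NULL || pci_addr == NULL || ns->pci_addr != NULL)
        return false;
    ns->pci_addr = pci_addr;
    return true;
}

/* Ready for traffic: at least one namespace, and every namespace is bound. */
bool model_subsystem_ready(const struct model_subsystem *s)
{
    if (s->num_ns == 0)
        return false;
    for (size_t i = 0; i < s->num_ns; i++)
        if (s->ns[i].pci_addr == NULL)
            return false;
    return true;
}
```

The readiness rule used here (every namespace must be bound) is a policy chosen for the sketch, not a documented library requirement.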

RDMA Connect

Upon receiving an RDMA connection request, the application should use one of the following APIs:

doca_sta_io_qp_alloc + doca_sta_io_qp_accept
doca_sta_io_qp_connect

If the operation completes successfully, an STA QP (Queue Pair) is created, represented by an STA QP handle.

Note

All communication between the application layer and QP API (and vice versa) must occur within the same IO thread (IO context) from which the QP was created.

Fabric Connect

After a connection is established, the initiator will immediately send a "Fabric Connect" (FC) command capsule.

The application is responsible for handling the FC command as part of the non-offload flow. This includes:

Retrieving the data/payload for the FC command (if not received as inline data).
Verifying the parameters inside the FC command (e.g., host NQN, subsystem NQN).
Completing FC processing by sending a response to the initiator.

Note

All actions initiated by the application layer are asynchronous and are part of the non-offload flow. Communication between the DPA and host (DPU) is handled in this context.

For further details, refer to the doca_sta_io_non_offload.h header file.
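
The parameter verification step can be sketched as follows. The structure follows the 1024-byte Connect command data layout from the NVMe over Fabrics specification (host ID, controller ID, subsystem NQN at byte 256, host NQN at byte 512). The `model_*` names, the status codes, and the allow-list policy are assumptions made for this sketch; how the payload reaches the application is defined by the non-offload API in doca_sta_io_non_offload.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NQN_FIELD_SIZE 256
#define NQN_MAX_LEN    223

/* Connect command data layout (1024 bytes) per NVMe over Fabrics. */
struct model_connect_data {
    uint8_t  hostid[16];
    uint16_t cntlid;
    uint8_t  reserved0[238];
    char     subnqn[NQN_FIELD_SIZE];
    char     hostnqn[NQN_FIELD_SIZE];
    uint8_t  reserved1[256];
};
_Static_assert(sizeof(struct model_connect_data) == 1024, "unexpected padding");

/* Hypothetical result codes for this sketch. */
enum model_fc_status {
    MODEL_FC_OK,
    MODEL_FC_BAD_FORMAT,
    MODEL_FC_UNKNOWN_SUBSYSTEM,
    MODEL_FC_HOST_NOT_ALLOWED
};

/* An NQN field must be NUL-terminated, non-empty, and at most 223 bytes. */
static bool nqn_field_ok(const char *field)
{
    const char *nul = memchr(field, '\0', NQN_FIELD_SIZE);
    if (nul == NULL)
        return false;
    size_t len = (size_t)(nul - field);
    return len > 0 && len <= NQN_MAX_LEN;
}

enum model_fc_status model_verify_connect(const struct model_connect_data *d,
                                          const char *subsystem_nqn,
                                          const char *const *allowed_hosts,
                                          size_t num_allowed)
{
    assert(d != NULL && subsystem_nqn != NULL);
    if (!nqn_field_ok(d->subnqn) || !nqn_field_ok(d->hostnqn))
        return MODEL_FC_BAD_FORMAT;
    if (strcmp(d->subnqn, subsystem_nqn) != 0)
        return MODEL_FC_UNKNOWN_SUBSYSTEM;
    if (num_allowed == 0)
        return MODEL_FC_OK; /* sketch policy: empty allow-list accepts any host */
    for (size_t i = 0; i < num_allowed; i++)
        if (strcmp(d->hostnqn, allowed_hosts[i]) == 0)
            return MODEL_FC_OK;
    return MODEL_FC_HOST_NOT_ALLOWED;
}
```

Checking for the terminating NUL before calling `strcmp` keeps a malformed capsule from causing a read past the 256-byte field.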

Tasks

The DOCA STA library provides support for asynchronous tasks to handle various asynchronous flows, such as:

Disconnecting a QP
Removing (detaching) a namespace
Destroying a back-end queue

The asynchronous flow is initiated by the application through the allocation of a task using the appropriate API. Once the task is allocated, the application is responsible for submitting it to trigger the desired action.

After submission, the library manages the execution and monitors the completion of the asynchronous flow. When the operation is completed, the library notifies the application by invoking a callback function.
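
The allocate, submit, progress, and callback sequence described above can be sketched with a toy progress engine. Every name here (`model_task`, `model_pe`, and so on) is hypothetical and only mimics the general shape of the flow; the real task types and entry points are defined by DOCA Core and the DOCA STA headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_INFLIGHT 16

struct model_task;
typedef void (*model_task_cb)(struct model_task *task, void *user_data);

enum model_task_state {
    MODEL_TASK_ALLOCATED,
    MODEL_TASK_SUBMITTED,
    MODEL_TASK_COMPLETED
};

struct model_task {
    enum model_task_state state;
    model_task_cb on_complete;
    void *user_data;
};

/* Toy progress engine holding submitted tasks in FIFO order. */
struct model_pe {
    struct model_task *inflight[MODEL_MAX_INFLIGHT];
    size_t num_inflight;
};

void model_task_alloc_init(struct model_task *t, model_task_cb cb, void *user_data)
{
    assert(t != NULL && cb != NULL);
    t->state = MODEL_TASK_ALLOCATED;
    t->on_complete = cb;
    t->user_data = user_data;
}

/* Submission triggers the action; a task can be submitted only once. */
bool model_task_submit(struct model_pe *pe, struct model_task *t)
{
    if (t->state != MODEL_TASK_ALLOCATED || pe->num_inflight == MODEL_MAX_INFLIGHT)
        return false;
    t->state = MODEL_TASK_SUBMITTED;
    pe->inflight[pe->num_inflight++] = t;
    return true;
}

/* Completes the oldest in-flight task and invokes its callback.
 * Returns true if a callback ran, false if nothing was in flight. */
bool model_pe_progress(struct model_pe *pe)
{
    if (pe->num_inflight == 0)
        return false;
    struct model_task *t = pe->inflight[0];
    for (size_t i = 1; i < pe->num_inflight; i++)
        pe->inflight[i - 1] = pe->inflight[i];
    pe->num_inflight--;
    t->state = MODEL_TASK_COMPLETED;
    t->on_complete(t, t->user_data);
    return true;
}

/* Example completion callback: counts completions through user_data. */
void model_count_completion(struct model_task *task, void *user_data)
{
    (void)task;
    (*(int *)user_data)++;
}
```

The point of the model is that nothing completes until the application calls progress: submission only queues the work, and the callback runs from inside the progress call.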

Task Input

Common input as described in DOCA Core Task.

Task Output

Common output as described in DOCA Core Task.

Task Limitations

The operation is not atomic
Other limitations are described in DOCA Core Task

Non-offload Flow Task

The application layer is responsible for processing NVMe-oF capsules within the non-offload flow.

Initially, the NVMe-oF capsule is delivered from the DPA to the host and then delegated to the application layer for processing. During this process, the application may need to perform RDMA operations to handle the capsule effectively.

Note

The QP is managed by the DOCA STA library and must only be accessed through the library's API.

The following non-offload tasks can be used by the application layer to initiate RDMA operations for processing non-offload capsules:

RDMA-READ
RDMA-WRITE with RDMA-SEND
RDMA-SEND

For additional details, refer to the doca_sta_io_non_offload.h file.
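
One plausible way to choose among the three tasks is by the data direction of the capsule being processed. The sketch below assumes the usual NVMe-oF RDMA pattern: the target pulls host data with RDMA-READ when it was not sent inline, pushes data to the host with RDMA-WRITE, and always finishes with an RDMA-SEND carrying the response. The types and the planning function are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a non-offloaded capsule. */
enum model_data_dir {
    MODEL_DATA_NONE,           /* command carries no data */
    MODEL_DATA_HOST_TO_TARGET, /* e.g. Fabric Connect payload, write data */
    MODEL_DATA_TARGET_TO_HOST  /* e.g. data returned to the initiator */
};

struct model_capsule {
    enum model_data_dir dir;
    bool inline_data; /* host-to-target data already arrived in the capsule */
};

/* The three non-offload task kinds named in this section. */
enum model_step {
    MODEL_STEP_RDMA_READ,
    MODEL_STEP_RDMA_WRITE_SEND,
    MODEL_STEP_RDMA_SEND
};

/* Fills out[] with the task sequence for a capsule and returns its length
 * (1 or 2). out must have room for two entries. */
size_t model_plan_capsule(const struct model_capsule *c, enum model_step out[2])
{
    assert(c != NULL && out != NULL);
    size_t n = 0;

    /* Fetch host data first when it was not delivered inline. */
    if (c->dir == MODEL_DATA_HOST_TO_TARGET && !c->inline_data)
        out[n++] = MODEL_STEP_RDMA_READ;

    /* Return data plus response, or the response alone. */
    if (c->dir == MODEL_DATA_TARGET_TO_HOST)
        out[n++] = MODEL_STEP_RDMA_WRITE_SEND;
    else
        out[n++] = MODEL_STEP_RDMA_SEND;

    return n;
}
```

For example, a Fabric Connect capsule whose payload was not inline maps to RDMA-READ followed by RDMA-SEND in this model.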

Non-offload RDMA-READ

The RDMA-READ task initiates an RDMA-READ operation. Upon completion of the operation, the library will invoke the appropriate callback function to notify the application layer of its completion.

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_read_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_io_task_non_offload_rdma_read_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-READ operation is completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-READ operation fails

Non-offload RDMA-WRITE + RDMA-SEND

This task is responsible for initiating an RDMA-WRITE operation followed by an RDMA-SEND operation.

Upon successful completion of the operation, the library will invoke the corresponding callback function to notify the application layer about its completion.

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_write_send_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_io_task_non_offload_rdma_write_send_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-WRITE and RDMA-SEND operations are completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-WRITE or RDMA-SEND operation fails

Non-offload RDMA-SEND

This task is responsible for initiating an RDMA-SEND operation.

Upon completion, the library will invoke the appropriate callback function to notify the application layer of its completion.

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_io_task_non_offload_set_rdma_write_send_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_io_task_non_offload_rdma_send_alloc_init`

Task Completion Success

After the task is completed successfully:

The RDMA-SEND operation is completed successfully

Task Completion Failure

If the task fails midway:

The RDMA-SEND operation fails

QP Disconnect Task

The QP disconnect task is responsible for initiating the asynchronous flow required to disconnect a QP. The process includes the following steps:

Move the QP to the error state.
Send a notification message from the host (DPU) to the corresponding DPA, indicating the QP disconnect request.
The DPA verifies that there are no pending inbound or outbound IO operations.
Once verification is complete, the DPA sends a response back to the host (DPU).
The library notifies the application about the completion of the disconnect task through a callback function.
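
The disconnect steps above can be modelled as a small state machine in which completion is reported only after the pending IO count reaches zero. The `model_qp` type, its states, and the helper functions are hypothetical and compress the host-to-DPA messaging into plain function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_qp_state {
    MODEL_QP_CONNECTED,
    MODEL_QP_ERROR,       /* moved to error, DPA notified and draining IO */
    MODEL_QP_DISCONNECTED
};

/* Hypothetical QP model; pending_io stands for IO the DPA still tracks. */
struct model_qp {
    enum model_qp_state state;
    unsigned pending_io;
    void (*on_disconnected)(struct model_qp *qp);
    int callbacks_fired;
};

/* Steps 1 and 2: move the QP to error and notify the DPA. */
bool model_qp_disconnect_submit(struct model_qp *qp)
{
    assert(qp != NULL);
    if (qp->state != MODEL_QP_CONNECTED)
        return false;
    qp->state = MODEL_QP_ERROR;
    return true;
}

/* One inbound or outbound IO finished on the DPA side. */
void model_qp_io_done(struct model_qp *qp)
{
    if (qp->pending_io > 0)
        qp->pending_io--;
}

/* Steps 3 to 5: once no IO is pending, the DPA response arrives and the
 * completion callback runs. Returns true when the callback was invoked. */
bool model_qp_disconnect_progress(struct model_qp *qp)
{
    if (qp->state != MODEL_QP_ERROR || qp->pending_io != 0)
        return false;
    qp->state = MODEL_QP_DISCONNECTED;
    if (qp->on_disconnected != NULL)
        qp->on_disconnected(qp);
    return true;
}

/* Example callback: records that the disconnect completion was delivered. */
void model_note_disconnected(struct model_qp *qp)
{
    qp->callbacks_fired++;
}
```

In this model the callback fires exactly once, and only after every pending IO has drained.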

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_io_task_disconnect_set_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_io_task_disconnect_alloc_init`

Task Completion Success

After the task is completed successfully:

The QP is disconnected and might be destroyed

Task Completion Failure

If the task fails midway:

The QP was not disconnected

Remove Namespace Task

The remove (detach) namespace task initiates the asynchronous process for detaching a namespace. The flow includes the following steps:

Send a notification message from the host (DPU) to the corresponding DPA about the detach namespace request.
The DPA marks the namespace as no longer valid.
The DPA sends a response back to the host (DPU) confirming the detachment.
The library notifies the application about the task's completion through a callback function.

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_subsystem_task_rm_ns_set_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_subsystem_task_rm_ns_alloc_init`

Task Completion Success

After the task is completed successfully:

The namespace is removed
Any I/O sent to the removed namespace is completed with an error

Task Completion Failure

If the task fails midway:

The namespace is not removed

Destroy Back-end Queue Task

The destroy back-end (BE) queue task initiates the asynchronous process for destroying a back-end queue. The flow includes the following steps:

Send a notification message from the host (DPU) to the appropriate DPA to request the destruction of the back-end queue.
The DPA marks the back-end queue as destroyed.
Any outstanding commands in the queue are completed with an error status.
The DPA sends a response back to the host (DPU) confirming the destruction.
The library notifies the application about the task's completion through a callback function.

Task Configuration

Description	API to Set the Configuration
Enable the task	`doca_sta_be_task_destroy_queue_set_conf`

Task Allocation

Description	API to Allocate the Task
Allocate the task	`doca_sta_be_destroy_queue_task_alloc_init`

Task Completion Success

After the task is completed successfully:

The BE queue is removed

Task Completion Failure

If the task fails midway:

The BE queue is not removed

State Machine

The DOCA STA library follows the context state machine as described in DOCA Core Context State Machine.

The following section describes how to move states and what is allowed in each state.

Idle

In this state, it is expected that the application:

Destroys the context
Starts the context

Allowed operations:

Configuring the context according to the Configurations section
Starting the context

It is possible to reach this state as follows:

Previous State	Transition Action
None	Create the context
Running	Call stop after making sure all tasks have been freed
Stopping	Call progress until all tasks are completed and freed

Starting

This state cannot be reached.

Running

In this state, it is expected that the application:

Allocates and submits tasks
Calls progress to complete tasks and/or receive events

Allowed operations:

Allocating a previously configured task
Submitting a task
Calling stop

It is possible to reach this state as follows:

Previous State	Transition Action
Idle	Call start after the configuration

Stopping

In this state, it is expected that the application:

Calls progress to complete all in-flight tasks (these tasks complete with failure)
Frees any completed tasks

Allowed operations:

Call progress

It is possible to reach this state as follows:

Previous State	Transition Action
Running	Call progress and a fatal error occurs
Running	Call stop without freeing all tasks
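
The transitions in the tables above can be summarized in a compact model. It is a sketch of the rules as stated on this page, using hypothetical `model_ctx_*` names rather than DOCA Core calls. `live_tasks` counts tasks that have been allocated and not yet freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_ctx_state {
    MODEL_CTX_IDLE,
    MODEL_CTX_STARTING, /* listed for completeness; never entered here */
    MODEL_CTX_RUNNING,
    MODEL_CTX_STOPPING
};

struct model_ctx {
    enum model_ctx_state state;
    unsigned live_tasks; /* allocated and not yet freed */
};

/* Idle -> Running: start after configuration. */
bool model_ctx_start(struct model_ctx *c)
{
    assert(c != NULL);
    if (c->state != MODEL_CTX_IDLE)
        return false;
    c->state = MODEL_CTX_RUNNING;
    return true;
}

/* Tasks can be allocated only while the context is running. */
bool model_task_alloc(struct model_ctx *c)
{
    if (c->state != MODEL_CTX_RUNNING)
        return false;
    c->live_tasks++;
    return true;
}

void model_task_free(struct model_ctx *c)
{
    if (c->live_tasks > 0)
        c->live_tasks--;
}

/* Running -> Idle when all tasks are freed, otherwise Running -> Stopping. */
bool model_ctx_stop(struct model_ctx *c)
{
    if (c->state != MODEL_CTX_RUNNING)
        return false;
    c->state = (c->live_tasks == 0) ? MODEL_CTX_IDLE : MODEL_CTX_STOPPING;
    return true;
}

/* Running -> Stopping when progress hits a fatal error. */
void model_ctx_fatal_error(struct model_ctx *c)
{
    if (c->state == MODEL_CTX_RUNNING)
        c->state = MODEL_CTX_STOPPING;
}

/* Stopping -> Idle once every task has been completed and freed. */
void model_ctx_progress(struct model_ctx *c)
{
    if (c->state == MODEL_CTX_STOPPING && c->live_tasks == 0)
        c->state = MODEL_CTX_IDLE;
}
```

The model makes the practical consequence visible: calling stop with unfreed tasks does not return the context to Idle until the application has progressed and freed them.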

DOCA STA Sample Application

Offload Sample

This sample application demonstrates the basic usage of the DOCA STA API:

cd /opt/mellanox/doca/applications
meson setup /tmp/build
ninja -C /tmp/build

/tmp/build/sta_offload/doca_sta_offload --doca_lib_lvl 50 --doca_app_lvl 40 --sf_dev mlx5_0

For a reference application utilizing the DOCA STA library, please get in touch with NVIDIA Enterprise Support.
