DOCA Documentation v3.3.0

DOCA PCC

This guide provides an overview and configuration instructions for the DOCA Programmable Congestion Control (PCC) API.

Note

The quality status of DOCA libraries is listed here.

The DOCA PCC library offers a high-level programming interface that enables users to implement customized congestion control (CC) algorithms. By leveraging the NVIDIA® BlueField®-3 platform hardware acceleration, it facilitates efficient network congestion management while abstracting away hardware complexities.

With the DOCA PCC API, users can:

  • Configure probe packets for sending and receiving

  • Retrieve CC events or packets and access their fields

  • Set flow rate limits to regulate network traffic

  • Maintain per-flow contexts for individualized management

  • Initiate and configure CC algorithms tailored to application needs

  • Process incoming request packets and generate appropriate response packets

This streamlined API allows developers to focus on designing and implementing congestion control logic without worrying about low-level hardware operations.

DOCA PCC-based applications can run on either the host machine or an NVIDIA BlueField-3 (or later) platform.

Info

Currently, DOCA PCC is supported only for the Ethernet link type.

Enabling DOCA PCC

To enable DOCA PCC reaction point (RP):

  1. On the host/VM, run:

    mlxconfig -d <mlx_device> -y s USER_PROGRAMMABLE_CC=1

  2. Perform a graceful shutdown and then power cycle the host.

To enable DOCA PCC notification point (NP):

  1. On the host/VM, run:

    mlxconfig -d <mlx_device> -y s PCC_INT_EN=0

  2. Perform a graceful shutdown and then power cycle the host.

Configuration Notes

  • Setting PCC_INT_EN=1 blocks the creation of a DOCA PCC NP context and enables the legacy NP solution. In this mode, a DOCA PCC RP context supports only the Congestion Control Message After Drop (CCMAD) probe packet format.

  • For IFA2.0 support, enable both DOCA PCC RP and DOCA PCC NP on all cluster nodes.

  • The DOCA PCC NP process requires root access.

  • For BlueField-3 devices in DPU mode, executing the DOCA PCC NP process on an x86 host is not supported.

  • When running from an x86 host in NIC mode, privileged permissions are required. Check the privilege level using mlxprivhost -d <mlx_device> q.

  • To enable the injection of response timestamp information into probe response payloads, you must set FLEX_PARSER_PROFILE_ENABLE=10 in the device's non-volatile (NV) configuration.

    mlxconfig -d <mlx_device> -y s FLEX_PARSER_PROFILE_ENABLE=10

DPACC Tool

The DPACC tool compiles and links user algorithms and device code with the DOCA PCC device library to create loadable applications. DPACC is included in the DOCA SDK installation package. For more information, refer to the DOCA DPACC Compiler documentation.

Changes in 3.3.0

Added

  • Support for injecting response timestamp in DOCA PCC NP.

The library requires firmware version 32.38.1000 or higher.

DOCA PCC comprises three main components which are part of the DOCA SDK installation package.

Host Library

The host library offers a unified interface for managing the DOCA PCC context configuration.

As part of the control path, the host library integrates passively within the application, orchestrating congestion control activities without directly handling data transmission.

Host/device library and header files:

[Figure: DOCA PCC host/device library and header files]

Device Libraries

The DOCA PCC context assumes one of two roles:

  • Reaction point (RP): Actively monitors network conditions and dynamically adjusts data transmission rates to alleviate congestion promptly. The RP context is global per NIC.

    Device library and header files:

    [Figure: DOCA PCC RP device library and header files]

  • Notification point (NP): Passively receives congestion notifications from external sources and processes them to facilitate informed decisions within the application. The NP context is global per e-switch owner.

    Device library and header files:

    [Figure: DOCA PCC NP device library and header files]

Both RP and NP device libraries share common headers:

[Figure: common RP/NP device header files]

Currently, the device library and the user algorithm run on the BlueField data-path accelerator (DPA) subsystem.

For more info on DPA, refer to DPA Subsystem.

Development Flow

DOCA enables developers to program the congestion control algorithm into the system using the DOCA PCC library.

The following are the required steps to start programming:

  1. Implement CC algorithms and probe packet handling using the API provided by the device header files.

  2. Implement the user callbacks defined by the library for the data path:

    • For RP: doca_pcc_dev_user_init(), doca_pcc_dev_user_set_algo_params(), doca_pcc_dev_user_algo().

    • For NP: doca_pcc_dev_np_user_packet_handler().

  3. Use DPACC to build a DPA application (i.e., a host library that contains an embedded device executable). The inputs to DPACC are the files containing the implementation of the previous steps.

  4. Build the host executable using a host compiler. The inputs to the host compiler are the DPA application generated in the previous step and the user application's host source files.

  5. In the host executable, create and start a DOCA PCC context which is set with the DPA application containing the device code.

[Figure: DOCA PCC development flow]

For a more detailed example, refer to the NVIDIA DOCA PCC Application Guide.

Note

The PCC program must be loaded before the QP is created. If the QP is created first, it cannot access the PCC program context. This ordering is strict: reversing it causes PCC features to fail or results in undefined library behavior.


System Design

DOCA PCC flow for implementing an RP program:

[Figure: DOCA PCC RP program flow]

DOCA PCC flow for implementing an NP program:

[Figure: DOCA PCC NP program flow]

For the library API reference, refer to PCC API documentation in the API References.

The following sections provide additional details about the library API.

Host API

The host library API consists of calls to set the PCC context attributes and observe availability of the process.

Selecting and Opening DOCA Device

To perform PCC operations, a device must be selected. To select a device, users can iterate over all DOCA devices using doca_devinfo_create_list() and check whether each device supports the desired PCC role: use doca_devinfo_get_is_pcc_supported() for RP or doca_pcc_np_cap_is_supported() for NP.

Setting Up and Starting DOCA PCC Context

After selecting a DOCA device, a PCC context can be created.

As described in the Architecture section, the DOCA PCC library provides APIs to leverage Reaction Points (RP) and Notification Points (NP) to implement programmable congestion control strategies.

Call doca_pcc_create() to create a DOCA PCC RP context, and doca_pcc_np_create() to create a DOCA PCC NP context.

Afterwards, the following attributes must be set for the PCC context:

  • Context app – the name of the DPA application compiled using DPACC, consisting of the device algorithm and code. This is set using the call doca_pcc_set_app().

  • Context threads – the affinity of the DPA threads used to handle CC events. This is set using doca_pcc_set_thread_affinity(). The number of threads must lie between the minimum and maximum number of threads allowed to run the PCC process (see doca_pcc_get_min_num_threads() and doca_pcc_get_max_num_threads()). The availability and usage of threads for PCC depend on the complexity of the CC algorithm, the link rate, and other potential DPA users.

    Note

    Users can manage DPA threads in the system using EU pre-configuration with the dpaeumgmt tool. For more information, refer to Single Point of Resource Distribution.
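As a rough illustration of the thread-count constraint, the helper below moves a requested number of DPA threads into the allowed range. It assumes the minimum and maximum have already been queried (e.g., through doca_pcc_get_min_num_threads() and doca_pcc_get_max_num_threads()) and are passed in as plain integers. The helper, its struct, and its clamping policy are hypothetical and are not part of the DOCA API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a DOCA API): decide how many DPA threads to
 * request, given limits already queried from the PCC context. */
typedef struct {
    uint32_t num_threads; /* count to use when building the affinity set */
    bool adjusted;        /* true if the request was moved into range */
} pcc_thread_plan;

static pcc_thread_plan plan_pcc_threads(uint32_t requested,
                                        uint32_t min_threads,
                                        uint32_t max_threads)
{
    pcc_thread_plan plan = { requested, false };

    if (requested < min_threads) {
        plan.num_threads = min_threads; /* too few threads for the PCC process */
        plan.adjusted = true;
    } else if (requested > max_threads) {
        plan.num_threads = max_threads; /* more threads than PCC allows */
        plan.adjusted = true;
    }
    return plan;
}
```

Clamping silently is only one policy. An application might instead treat an out-of-range request as a configuration error and refuse to start the context.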

After setting up the context attributes, the context can be started using doca_pcc_start(). Starting the context initiates the CC algorithm supplied by the user.

Configuring Probe Packets

The DOCA PCC library provides APIs to configure the probe packet settings to tailor congestion control behaviors according to specific network conditions.

The probe packet serves to probe the network for congestion and gather essential feedback for congestion control algorithms.

The DOCA PCC Library supports the following probe packet types:

  • CCMAD – Provides information about the network's round-trip time so the algorithm can detect and adapt to congestion proactively

  • IFA1 – In-band Flow Analyzer 1 packets provide in-band congestion feedback for proactive congestion control

  • IFA2 – In-band Flow Analyzer 2 packets offer an alternative method for in-band congestion feedback, optimized for specific network environments

Configuring Dedicated Fields for Different Probe Types

The DOCA PCC library provides APIs to configure specific fields in different supported probe packet types.

  • IFA1 – the probe marker can be configured

  • IFA2 – the GNS and hop limit can be configured

Configuring Remote NP Handler

To enable Reaction Point contexts to interact with remote Notification Point contexts, the DOCA PCC library provides an API to set the expected remote handler type.

When the DOCA PCC RP process expects CCMAD probe packet responses from a DOCA PCC NP process, it should indicate this by calling doca_pcc_rp_set_ccmad_remote_sw_handler(). If this is not set, the DOCA PCC RP process assumes that no remote DOCA PCC NP process is active and that responses are handled by the remote node's hardware. For probe types other than CCMAD, probe packet responses are always expected to come from a remote DOCA PCC NP process.

RTC Timestamps for PCC

To use the real-time clock (RTC) as the time source for both the DOCA PCC RP (reaction point) and the DOCA PCC NP (notification point), the clocks on both endpoints must be synchronized (e.g., using PTP). Synchronization details are outside the scope of this documentation.

By default, RTT probe packet timestamps are taken from the device's free-running clock. The steps below describe how to configure and use RTC-based timestamps.

Configuring RTC in DOCA PCC RP

  1. Enable RTC on the device.

    mlxconfig -d <mlx_device> -y s REAL_TIME_CLOCK_ENABLE=1

  2. Configure RTT timestamp format to use RTC (value 2).

    mlxconfig -d <mlx_device> -y s ROCE_CC_RTT_TIMESTAMP_FORMAT=2

    Note

    This NV config setting requires MFT version 4.34 or newer.

  3. In the device code, retrieve the RTT request timestamp (T1) using doca_pcc_dev_get_rtt_req_recv_timestamp(). This function is compatible with both RTC and free-running clock sources.

Configuring RTC in DOCA PCC NP

  1. Enable RTC on the device.

    mlxconfig -d <mlx_device> -y s REAL_TIME_CLOCK_ENABLE=1

  2. In the host-side code, configure the NP to use the RTC timestamp source using the following APIs:

    • doca_pcc_np_cap_is_ts_source_supported(): Verifies if the device supports the specific timestamp source (defined in doca_pcc_np_ts_source_t).

    • doca_pcc_np_set_ts_source(): Sets the NP to use the specified timestamp source.

      Note

      Configuring the timestamp source in the DOCA PCC NP requires flexio-sdk version 25.10.xxxx or newer.

  3. In the device code, retrieve the RTT request receive timestamp (T2) using doca_pcc_np_dev_get_t2_ns().

    Note

    This API returns a 30-bit value for nanoseconds in little-endian format. It is compatible with both RTC and free-running clock sources.
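Because T2 is only 30 bits wide, it wraps every 2^30 ns (about 1.07 s). Code that subtracts such values therefore needs wraparound-safe arithmetic. The following is a minimal sketch that assumes both operands are 30-bit nanosecond values from the same clock and that the real interval is shorter than one wrap period. The macro and helper names are illustrative only.

```c
#include <stdint.h>

#define TS30_BITS 30u
#define TS30_MASK ((UINT32_C(1) << TS30_BITS) - 1u) /* 0x3FFFFFFF */

/* Illustrative helper: nanoseconds elapsed from `earlier` to `later`,
 * both 30-bit timestamps from the same clock. Unsigned subtraction
 * followed by masking gives the right answer across one wrap, provided
 * the true interval is below 2^30 ns. */
static inline uint32_t ts30_elapsed_ns(uint32_t later, uint32_t earlier)
{
    return (later - earlier) & TS30_MASK;
}
```

For example, an earlier reading just below the wrap point (TS30_MASK - 99) and a later reading of 100 yield an elapsed time of 200 ns, not a huge or negative value.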

DOCA PCC Notification Point: Response Timestamp

Response timestamp allows the DOCA PCC Notification Point (NP) to inject timestamp information directly into probe response payloads. This facilitates precise timing measurements in congestion control scenarios by capturing a timestamp at the moment the probe response is transmitted from the port.

When enabled, the timestamp is automatically inserted into the least-significant bits of the last DWORD of the Congestion Control (CC) probe response payload. This allows the Reaction Point (RP) to calculate accurate round-trip times (RTT) and make informed congestion control decisions.
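To show how an RP could use such a timestamp, the sketch below applies the common NTP-style formula RTT = (T4 - T1) - (T3 - T2), which removes the time the request spent at the NP. Treating T3 as the injected response timestamp, taking 64-bit nanosecond inputs that are already unwrapped and rescaled from the truncated fields, and the function name are all assumptions. This is not necessarily how any DOCA PCC algorithm computes RTT.

```c
#include <stdint.h>

/* NTP-style round-trip time excluding the NP's turnaround time.
 * t1: request sent (RP clock)      t4: response received (RP clock)
 * t2: request received (NP clock)  t3: response transmitted (NP clock)
 * All values are nanoseconds, already unwrapped to 64 bits. Each
 * difference uses a single clock, so the RP and NP clocks do not need
 * a common epoch for this calculation. */
static inline int64_t rtt_excluding_np_ns(uint64_t t1, uint64_t t2,
                                          uint64_t t3, uint64_t t4)
{
    int64_t rp_elapsed = (int64_t)(t4 - t1);    /* full round trip seen by RP */
    int64_t np_turnaround = (int64_t)(t3 - t2); /* time spent at the NP */
    return rp_elapsed - np_turnaround;
}
```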

Configuring Response Timestamp

To enable the injection of response timestamps, you must set the Flex Parser profile to 10 in the device's non-volatile (NV) configuration.

mlxconfig -d <mlx_device> -y s FLEX_PARSER_PROFILE_ENABLE=10


Response Timestamp API Reference

  • doca_pcc_np_cap_is_resp_ts_supported() – Queries whether the device supports injecting response timestamps into probe response payloads for Notification Point operations.

  • doca_pcc_np_set_resp_ts_size() – Configures the number of bits (ts_size) used for the timestamp within the last DWORD of the payload.

    • 0: Response timestamp injection is disabled (Default).

    • 1-32: The timestamp is injected into the least-significant ts_size bits of the last DWORD.

      Note

      Larger sizes allow higher precision or a longer range before the value wraps, but overwrite more bits of the payload. Use doca_pcc_np_set_resp_ts_resolution() to adjust the time scale if the value range is insufficient.

  • doca_pcc_np_set_resp_ts_resolution() – Sets the granularity of the timestamp by configuring a right-bit shift applied to the raw timestamp before injection. This allows trading precision for an extended time range.

    • 0: No shift applied (maximum precision) (Default).

    • N: The raw timestamp is right-shifted by N bits (effectively dividing by 2^N).

  • doca_pcc_np_set_ts_source() – Selects the hardware clock source. This setting applies globally to NP operations, affecting both the response timestamp injection and the doca_pcc_np_dev_get_t2_ns() device API.
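To make the ts_size and resolution settings concrete, the sketch below models the injection described above. The raw timestamp is right-shifted by the resolution, truncated to ts_size bits, and written into the least-significant bits of the last DWORD. This is a host-side model for reasoning about range and precision, not DOCA code. The function names are hypothetical, and the sketch assumes that the DWORD bits above ts_size are left unchanged.

```c
#include <stdint.h>

/* Mask with the low `bits` bits set; valid for bits in 0..32. */
static inline uint32_t low_mask(unsigned bits)
{
    return bits >= 32u ? UINT32_MAX : ((UINT32_C(1) << bits) - 1u);
}

/* Hypothetical model of NP-side injection: scale the raw timestamp by
 * 2^-resolution, keep ts_size bits, and place them in the low bits of
 * the payload's last DWORD. ts_size == 0 means injection is disabled.
 * Assumes resolution < 64. */
static uint32_t model_inject_resp_ts(uint32_t last_dword, uint64_t raw_ts,
                                     unsigned ts_size, unsigned resolution)
{
    if (ts_size == 0u)
        return last_dword; /* default: payload left untouched */

    uint32_t mask = low_mask(ts_size);
    uint32_t field = (uint32_t)(raw_ts >> resolution) & mask;
    return (last_dword & ~mask) | field; /* assumption: upper bits preserved */
}

/* Wrap period of the injected field in raw clock ticks:
 * 2^(ts_size + resolution). Assumes ts_size + resolution < 64. */
static inline uint64_t model_wrap_period_ticks(unsigned ts_size, unsigned resolution)
{
    return UINT64_C(1) << (ts_size + resolution);
}
```

For example, with ts_size = 16 and resolution = 4, the field advances once every 16 raw ticks and wraps after 2^20 ticks. The RP therefore needs to see the response well within that window to interpret the value unambiguously.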

Note

The DOCA_PCC_NP_TS_SOURCE_DEFAULT enum value is deprecated in this release and will be removed in an upcoming version. Applications must explicitly select a valid timestamp source (e.g., FREE_RUNNING or REAL_TIME).

typedef enum {
    DOCA_PCC_NP_TS_SOURCE_FREE_RUNNING = 0,                   /**< Free running timestamp. */
    DOCA_PCC_NP_TS_SOURCE_DEFAULT DOCA_DEPRECATED_ENUM = 0x1, /**< @deprecated Deprecated in this release; will be removed in an upcoming release. */
    DOCA_PCC_NP_TS_SOURCE_REAL_TIME = 0x2,                    /**< Real time timestamp. */
} doca_pcc_np_ts_source_t;

Debuggability

The DOCA PCC library provides a comprehensive set of debugging APIs. These tools allow you to diagnose issues, pinpoint bottlenecks, and access real-time information from your running device-side application.

PCC Tracer

The library’s tracer is optimized for high-frequency use. It allows you to observe device algorithm behavior and diagnose issues with negligible impact on application performance.

Note

PCC tracing is enabled by default when doca_pcc_start() is called. By default, trace output is routed to stdout.

Tracer API reference:

  • State control

    • doca_pcc_deactivate_tracer() – Disables runtime trace printing.

    • doca_pcc_activate_tracer() – Re-enables tracing dynamically without restarting the PCC context.

  • Destination config

    • doca_pcc_trace_buf_set() – Routes trace output to a user-supplied buffer. Must be set before doca_pcc_start().

    • doca_pcc_trace_file_set() – Routes trace output to a specified file. Must be set before doca_pcc_start().

    • doca_pcc_trace_buf_get() – Queries the currently configured destination buffer.

    • doca_pcc_trace_file_get() – Queries the currently configured destination file. Returns DOCA_ERROR_BAD_CONFIG if no file is set.

  • Formatting and handling

    • doca_pcc_set_trace_message() – Sets the specific trace message string used for device printing. Must be set before doca_pcc_start().

    • doca_pcc_register_trace_handler() – Registers a custom callback function for programmatic handling. The callback receives a user context pointer and an array of arriving device trace reports.

PCC Logger

The logger handles explicit messages generated by the device API doca_pcc_dev_printf().

Note

Unlike the tracer, device-side prints incur measurable performance overhead. Use them sparingly for short-term debugging or targeted diagnostics. For ongoing observability, use the PCC Tracer.

Logger API reference:

  • doca_pcc_set_print_buffer_size()

    Configures the size of the buffer used to accumulate device-side print data (doca_pcc_dev_printf()) before sending it to the host.

    Warning

    Incurs measurable performance overhead; use sparingly.

    Note

    Must be set before doca_pcc_start().

Device Coredump File

The coredump utility captures crucial device crash data when an unrecoverable error occurs on the device side of the application. The resulting file includes a memory snapshot at the exact time of the crash, detailing the program's state, variable values, and the call stack.

Coredump API reference:

  • doca_pcc_set_dev_coredump_file() – Configures the host-side file path to capture device crash data (memory snapshot, call stack, variables) when an unrecoverable error occurs on the device side.

Device Mailbox

The DOCA PCC library provides a set of APIs for sending and receiving messages through a mailbox. This service enables communication between the host and the device:

  • doca_pcc_set_mailbox() – Sets the mailbox attributes for the process.

  • doca_pcc_mailbox_get_request_buffer() and doca_pcc_mailbox_get_response_buffer() – Retrieve the buffers used for the communication. The user writes the request to send to the device into the request buffer and reads the device's reply from the response buffer.

  • doca_pcc_mailbox_send() – Sends the mailbox request to the device. This is a blocking call that invokes the user-implemented device callback doca_pcc_dev_user_mailbox_handle().
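The request and response buffers carry whatever layout the application defines. The sketch below assumes a hypothetical fixed-size message format shared by host and device code and simulates the exchange in plain C. The host fills a request, and a stand-in for the device-side handler writes a response. The structs, command codes, parameter table, and handler function are all illustrative. The real device callback is doca_pcc_dev_user_mailbox_handle(), and its exact signature should be taken from the device header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout shared by host and device code. */
enum mbox_cmd { MBOX_CMD_GET_PARAM = 1, MBOX_CMD_SET_PARAM = 2 };

struct mbox_request {
    uint32_t cmd;      /* one of enum mbox_cmd */
    uint32_t param_id; /* index into the algorithm's parameter table */
    uint32_t value;    /* new value for MBOX_CMD_SET_PARAM */
};

struct mbox_response {
    int32_t status;    /* 0 on success, -1 on error */
    uint32_t value;    /* parameter value after the command */
};

#define NUM_PARAMS 4u
static uint32_t algo_params[NUM_PARAMS]; /* stand-in for device-side state */

/* Stand-in for the logic a device-side mailbox handler might run.
 * Both buffers are treated as raw bytes, as mailbox buffers would be. */
static void handle_mailbox_request(const void *req_buf, void *resp_buf)
{
    struct mbox_request req;
    struct mbox_response resp = { -1, 0 };

    memcpy(&req, req_buf, sizeof(req));
    if (req.param_id < NUM_PARAMS) {
        if (req.cmd == MBOX_CMD_SET_PARAM) {
            algo_params[req.param_id] = req.value;
            resp.status = 0;
        } else if (req.cmd == MBOX_CMD_GET_PARAM) {
            resp.status = 0;
        }
        resp.value = algo_params[req.param_id];
    }
    memcpy(resp_buf, &resp, sizeof(resp));
}
```

Fixed-width integer fields keep the layout identical across the host and device compilers. Because doca_pcc_mailbox_send() blocks, the host can read the response buffer once the call returns.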

High Availability

The DOCA PCC library supports High Availability (HA) to ensure continuous operation and recovery if the running PCC process malfunctions. You can achieve this by running multiple PCC processes in parallel.

Process Lifecycle and Failover

  1. Call doca_pcc_start() in each of several PCC processes to register them in parallel with the NIC firmware.

  2. The firmware designates the first process to register as the ACTIVE process (running on the DPA and handling Congestion Control events). All subsequent processes are automatically placed in STANDBY mode.

  3. Continuously observe the status of your processes using doca_pcc_get_process_state(). If a state change occurs, the doca_pcc_wait() function will return.

  4. If the currently ACTIVE process encounters an error or stops processing events, the firmware automatically promotes one of the STANDBY processes to become the new ACTIVE process.

  5. The defunct (failed) process must explicitly call doca_pcc_destroy() to safely free its allocated resources.

    Note

    Configuration state is not replicated across processes. When a failover occurs, the replacement process does not automatically receive new algorithm configurations, and any user-applied PPCC commands are lost and must be re-applied manually.

Process States (doca_pcc_process_state_t)

The following table details the possible states of a PCC process at any given time:

  • DOCA_PCC_PS_ACTIVE (0) – The process is actively handling CC events. Action required: none (only one process is active at a time).

  • DOCA_PCC_PS_STANDBY (1) – The process is waiting in standby mode. Action required: none (another process is currently ACTIVE).

  • DOCA_PCC_PS_DEACTIVATED (2) – The process was deactivated by the NIC firmware. Action required: must call doca_pcc_destroy().

  • DOCA_PCC_PS_ERROR (3) – The process has encountered an error. Action required: must call doca_pcc_destroy().
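A process that waits with doca_pcc_wait() and then queries doca_pcc_get_process_state() has to decide what each state means for it. The sketch below factors that decision into a small helper. So that it compiles on its own, it uses a local mirror of the documented state values. Real code would include the DOCA PCC host header and use doca_pcc_process_state_t directly. The helper and mirror enum names are hypothetical.

```c
#include <stdbool.h>

/* Local mirror of the documented process-state values, for illustration
 * only; do not redefine these in code that includes the DOCA headers. */
typedef enum {
    PS_ACTIVE = 0,
    PS_STANDBY = 1,
    PS_DEACTIVATED = 2,
    PS_ERROR = 3,
} pcc_state_mirror;

/* Returns true when the process must release its resources with
 * doca_pcc_destroy(), following the state descriptions above. */
static bool pcc_state_requires_destroy(pcc_state_mirror state)
{
    switch (state) {
    case PS_DEACTIVATED:
    case PS_ERROR:
        return true;  /* defunct process: clean up */
    case PS_ACTIVE:
    case PS_STANDBY:
    default:
        return false; /* keep running or keep waiting */
    }
}
```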

PCC Resources

The PCC Resources API provides a mechanism to parse and query pre-defined PCC resource configurations. This enables applications to discover and utilize the Execution Units (EUs) allocated to them in the Single Point of Resource Distribution (SPRD) file.

The SPRD file is a YAML-formatted configuration file that defines PCC application resources. It specifies:

  • Application names (keys).

  • The number of allocated EUs per application.

  • The specific EU IDs assigned to each application.

Info

For file syntax and examples, refer to the Single Point of Resource Distribution documentation.

Key functions:

  • doca_pcc_resources_create() – Creates a PCC resources object by parsing an SPRD buffer and extracting the configuration for a specific application.

  • doca_pcc_resources_get_num_eus() – Retrieves the number of EUs allocated to the application in the PCC resources object.

  • doca_pcc_resources_get_eus() – Retrieves the array of EU IDs allocated to the application.

  • doca_pcc_resources_destroy() – Releases all resources associated with a PCC resources object.

Device API

The device library API provides the necessary calls to set up and manage your congestion control (CC) algorithms so they can handle CC events arriving directly on the hardware.

Counter Sampling

These APIs allow you to sample NIC byte counters to monitor the amount of data transmitted and received through the NIC.

Note

It is highly recommended to configure the counters inside the doca_pcc_dev_user_port_info_changed() callback, as this indicates the correct port state to sample from.

  • doca_pcc_dev_nic_counters_config() – Prepares the list of counters you want to read.

  • doca_pcc_dev_nic_counters_sample() – Samples and retrieves the new counter values.
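Once two samples are available, estimating throughput is simple arithmetic. The sketch below assumes 32-bit byte counters that wrap and a sampling interval in nanoseconds measured by the caller. Both are assumptions, so adjust the types to match the values doca_pcc_dev_nic_counters_sample() actually returns on your platform. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes transferred between two samples of an (assumed) 32-bit wrapping
 * counter. Unsigned subtraction handles a single wrap correctly. */
static inline uint32_t counter_delta32(uint32_t now, uint32_t prev)
{
    return now - prev;
}

/* Average rate in Mb/s over `interval_ns` nanoseconds:
 * bits / ns = Gb/s, so multiplying by 1000 gives Mb/s.
 * Returns 0 for a zero-length interval. */
static uint64_t rate_mbps(uint32_t now, uint32_t prev, uint64_t interval_ns)
{
    if (interval_ns == 0)
        return 0;
    uint64_t bits = (uint64_t)counter_delta32(now, prev) * 8u;
    return bits * 1000u / interval_ns;
}
```

For example, 1,250,000 bytes over 1 ms (1,000,000 ns) works out to 10,000 Mb/s, i.e., 10 Gb/s. The division is there for clarity; a latency-sensitive device algorithm may prefer fixed-point scaling instead.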


Algorithm Access

The Reaction Point (RP) device library supports running multiple PCC algorithms, useful for fast A/B testing and comparative runs. You can utilize the default library algorithm alongside your own custom algorithms.

  • Core

    • doca_pcc_dev_default_internal_algo() – Loads the default CC algorithm (can be used fully or partially).

    • doca_pcc_dev_init_algo_slot() – Assigns a specific algorithm to a designated algorithm slot.

  • Initialization

    • doca_pcc_dev_algo_init_param() – Initializes the algorithm's parameters.

    • doca_pcc_dev_algo_init_counter() – Initializes the algorithm's counters.

    • doca_pcc_dev_algo_init_metadata() – Initializes the algorithm's metadata base.


Algorithm Selection

Algorithms are enabled or disabled on specific "algo slots" using either doca_pcc_dev_init_algo_slot() or the mlxreg command (cmd_type 1 and 2).

The algorithm that is ultimately selected for traffic depends on the negotiation between the two connection endpoints:

  • Successful negotiation (with ECE): Occurs if both endpoints support Enhanced Connection Establishment (ECE). Each queue pair (QP) specifies its CC algo slot via ECE. If multiple algorithms are enabled, the one with the lowest shared slot index is selected.

  • No negotiation (without ECE): If ECE is not supported or not enabled, no negotiation occurs and the system uses the default algorithm slot.

    • For example, when testing with ib_write_bw, algorithm negotiation is executed only if you pass the --rdma_cm parameter on both the client and the server (which enables ECE).
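The selection rule above can be modeled with bitmasks. In the sketch below, each endpoint's enabled algo slots are a hypothetical bitmask in which bit i set means slot i is enabled. With ECE, the lowest slot index enabled on both sides wins. Without ECE, the default slot is used. The bitmask representation, the helper name, and the fallback when the two sides share no slot are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of algo-slot selection (not a DOCA API).
 * local_slots / remote_slots: bit i set => algo slot i is enabled.
 * ece: true if ECE negotiation takes place between the endpoints.
 * default_slot: slot used when no negotiation happens. */
static unsigned select_algo_slot(uint32_t local_slots, uint32_t remote_slots,
                                 bool ece, unsigned default_slot)
{
    if (!ece)
        return default_slot; /* no negotiation without ECE */

    uint32_t shared = local_slots & remote_slots;
    if (shared == 0)
        return default_slot; /* assumption: fall back if nothing is shared */

    unsigned slot = 0;
    while ((shared & 1u) == 0) { /* lowest shared slot index wins */
        shared >>= 1;
        slot++;
    }
    return slot;
}
```

With a bitmask, the "lowest shared index" rule reduces to one AND followed by a scan for the lowest set bit.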

Events

The RP device library provides optimized helper functions to access CC events. These supply the runtime data needed to analyze and inspect hardware events and build out your CC algorithm logic.

Utilities

A set of optimized utility macros (such as fixed-point math operations and memory space fences) is included to streamline the programming of your CC algorithm on the device.
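To show why fixed-point helpers are useful on the DPA, the sketch below implements a generic Q16.16 multiply and uses it for a DCQCN-style multiplicative rate decrease, new_rate = rate * (1 - alpha/2). This is a stand-alone illustration. The Q16.16 format, the helper names, and the update rule are assumptions, not the DOCA PCC utility macros, whose names and formats are defined in the device headers.

```c
#include <stdint.h>

/* Q16.16 fixed point: value = raw / 65536. */
typedef uint32_t q16_16;
#define Q16_ONE ((q16_16)1u << 16)

/* Multiply two Q16.16 values, truncating toward zero.
 * The 64-bit intermediate avoids overflow of the 32x32-bit product. */
static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((uint64_t)a * b) >> 16);
}

/* DCQCN-style decrease: new_rate = rate * (1 - alpha / 2).
 * alpha is a Q16.16 value in [0, 1]; rate is a plain integer (e.g., Mb/s). */
static inline uint32_t rate_decrease(uint32_t rate, q16_16 alpha)
{
    q16_16 factor = Q16_ONE - (alpha >> 1); /* 1 - alpha/2 in Q16.16 */
    return (uint32_t)(((uint64_t)rate * factor) >> 16);
}
```

For example, with alpha = 1.0 (Q16_ONE), the rate is halved. With alpha = 0, it is unchanged.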

User Callbacks

The library relies on specific user-implemented callbacks to initiate the CC algorithm and handle packet I/O.

Note

These callbacks must be implemented in your code and compiled by DPACC to be properly provided to the DOCA PCC context in your DPA application.

Reaction point (RP) callbacks:

  • doca_pcc_dev_user_init() – Called on PCC process load. Use this to initialize the data for all user algorithms.

  • doca_pcc_dev_user_algo() – The primary entry point to your custom user algorithm handling code.

  • doca_pcc_dev_user_set_algo_params() – Called whenever an algorithm parameter change is triggered externally.

Notification point (NP) callbacks:

  • doca_pcc_dev_np_user_packet_handler() – Called immediately upon the arrival of probe packets.


Debuggability

PCC Tracer

The device-side tracer is the preferred method for ongoing observability, designed to minimize performance impact compared to standard prints.

Note

The message format must be pre-configured on the host using doca_pcc_set_trace_message().

  • doca_pcc_dev_trace_5() – Emits formatted trace records containing up to five arguments.

  • doca_pcc_dev_trace_flush() – Forces a partially filled trace buffer to the host.

    Warning

    Avoid frequent use; typically reserved for the end of a run.


PCC Logger

Intended strictly for short-term debugging convenience.

  • doca_pcc_dev_printf() – Prints device-side messages directly to the host stdout.

    Warning

    Frequent use degrades performance, and messages may be dropped due to limited host buffering.

© Copyright 2026, NVIDIA. Last updated on Mar 2, 2026