DOCA Documentation v3.2.0

DOCA Telemetry Adp Retx

This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField® and NVIDIA® ConnectX® families of networking platforms.

The doca_telemetry_adp_retx library provides statistics on Adaptive Retransmission Algorithm timeouts that have been configured on a given DOCA device, corresponding to an NVIDIA® BlueField® or NVIDIA® ConnectX® network card.

The library includes mechanisms for configuring a histogram of Adaptive Retransmission timeouts and reading it back. Each histogram read provides a series of bins, where each bin corresponds to a specific time range. The value of a bin is the count of retransmissions whose timeout fell within that time range.

The histogram can return information about events on all QPs of functions associated with the DOCA device, or it can be configured to track the QPs of a single VHCA ID.

Note

DOCA Telemetry Adp Retx is supported at an alpha level.

To use DOCA Telemetry Adp Retx, the following prerequisites must be met:

  • fwctl driver installed and loaded (see instructions in NVIDIA MLNX_OFED Documentation v24.07-0.6.1.0)

    Note

    To verify whether the fwctl driver is successfully loaded:


    $ ls /sys/class/fwctl/

    Expected output:


    fwctl0  fwctl1

    If the directory /sys/class/fwctl does not exist or is empty, follow these steps:

    1. Search for the fwctl package:


      $ apt search fwctl

      The output may indicate either fwctl-dkms or fwctl-modules.

    2. Install the appropriate package:


      $ sudo apt install fwctl-dkms

      Or:


      $ sudo apt install fwctl-modules

    3. Load the mlx5_fwctl module:


      $ sudo modprobe mlx5_fwctl

    4. Confirm the module is loaded:


      $ lsmod | grep fwctl

      Expected output:

      mlx5_fwctl 20480 0
      fwctl 24576 1 mlx5_fwctl
      mlx5_core 2211840 2 mlx5_fwctl,mlx5_ib
      mlx_compat 20480 17 rdma_cm,ib_ipoib,mlxdevm,nvme,mlxfw,mlx5_fwctl,iw_cm,nvme_core,nvme_fabrics,ib_umad,fwctl,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

  • Firmware version greater than 28.47.1000 for ConnectX-7, 40.47.1000 for ConnectX-8, or 32.47.1000 for BlueField-3
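
The fwctl check from the first prerequisite can also be done from application code before any device is opened. The following is a minimal sketch, assuming a POSIX system; the function name count_fwctl_devices is hypothetical and is not part of DOCA or the fwctl driver.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries in class_dir whose names start with "fwctl".
 * Hypothetical helper for illustration only.
 * Returns -1 if the directory cannot be opened (for example, it does not exist).
 */
int count_fwctl_devices(const char *class_dir)
{
    DIR *dir = opendir(class_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Matches names such as fwctl0 and fwctl1; skips "." and "..". */
        if (strncmp(entry->d_name, "fwctl", 5) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_fwctl_devices("/sys/class/fwctl") and getting a result of 0 or -1 corresponds to the "does not exist or is empty" case described above, in which the package installation steps apply.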

DOCA Telemetry-based applications can run on either the host machine (ConnectX-7 or BlueField-3 and newer) or on the DPU (BlueField-3 and newer).

The doca_telemetry_adp_retx library provides statistics on Adaptive Retransmission configured devices, including the number of retransmissions and their timeout ranges in a histogram format.

To interact with a device (typically corresponding to a specific NIC port), you must create a doca_telemetry_adp_retx context using doca_telemetry_adp_retx_create().

Device Support

A DOCA device is required for the library to operate. For guidance on selecting a device, refer to the "DOCA Core Device Discovery" documentation.

Device support for doca_telemetry_adp_retx and its features can be checked with the following capability calls:

  • doca_telemetry_adp_retx_cap_is_supported()

  • doca_telemetry_adp_retx_cap_histogram_is_supported()

The maximum number of bins and the supported time units can be queried using:

  • doca_telemetry_adp_retx_cap_get_hist_max_bins()

  • doca_telemetry_adp_retx_cap_get_hist_time_units()

Histogram Configuration

The histogram divides retransmission events into bins, each representing a time range. If a retransmission timeout falls within a bin's range, that bin's counter is incremented. The number of bins and their time ranges are configurable.

The bin widths and timespans are determined by five main configuration options:

| API Configuration | Description |
|---|---|
| doca_telemetry_adp_retx_set_hist_num_bins() | Number of bins to use in the histogram |
| doca_telemetry_adp_retx_set_hist_bin0_width() | Width (in time units) of the first bin |
| doca_telemetry_adp_retx_set_hist_bin1_width() | Width (in time units) of the second bin; also used as the base for calculating subsequent bins |
| doca_telemetry_adp_retx_set_hist_time_unit() | The time unit for bin0 and bin1 widths (e.g., nsec, usec, msec) |
| doca_telemetry_adp_retx_set_hist_bin_width_node() | The calculation mode for bins after bin1: either fixed (same width as bin1) or double (exponentially doubling) |

Example:

  • Fixed Mode: 4 bins, bin0_width=50, bin1_width=100, time_unit=msec, width_mode=fixed.

    • Bin 0: 0-50 msec

    • Bin 1: 50-150 msec (base + 100)

    • Bin 2: 150-250 msec (base + 100)

    • Bin 3: 250-350 msec (base + 100)


  • Double Mode: 5 bins, bin0_width=50, bin1_width=100, time_unit=msec, width_mode=double.

    • Bin 0: 0-50 msec

    • Bin 1: 50-150 msec (base + 100)

    • Bin 2: 150-350 msec (base + 200)

    • Bin 3: 350-750 msec (base + 400)

    • Bin 4: 750-1550 msec (base + 800)

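
The bin ranges in the two examples above follow from a simple rule, which the sketch below reproduces in plain C. It makes no DOCA calls: the enum, struct, and function names are hypothetical and exist only for illustration. Which edge of a bin owns a boundary value is not stated in this guide, so the sketch only tracks the edges themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not DOCA identifiers. */
enum bin_width_mode { BIN_WIDTH_FIXED, BIN_WIDTH_DOUBLE };

struct bin_range {
    uint64_t start; /* lower edge of the bin, in the configured time unit */
    uint64_t end;   /* upper edge of the bin, in the configured time unit */
};

/*
 * Fill out[0..num_bins-1] with the range covered by each bin.
 * Bin 0 spans 0 to bin0_width. Bin 1 has width bin1_width. Each later bin
 * keeps that width (fixed mode) or doubles the previous width (double mode).
 * Returns 0 on success, -1 on invalid arguments.
 */
int compute_bin_ranges(size_t num_bins, uint64_t bin0_width, uint64_t bin1_width,
                       enum bin_width_mode mode, struct bin_range *out)
{
    if (num_bins == 0 || out == NULL)
        return -1;

    out[0].start = 0;
    out[0].end = bin0_width;

    uint64_t width = bin1_width;
    for (size_t i = 1; i < num_bins; i++) {
        out[i].start = out[i - 1].end; /* bins are contiguous */
        out[i].end = out[i].start + width;
        if (mode == BIN_WIDTH_DOUBLE)
            width *= 2;
    }
    return 0;
}
```

With num_bins=5, bin0_width=50, bin1_width=100 in double mode, the last entry covers 750 to 1550, matching Bin 4 in the example above.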

Further options control how the histogram is populated:

| API Configuration | Description |
|---|---|
| doca_telemetry_adp_retx_set_hist_vhca_id() | Populates the histogram with retransmissions from a single VHCA ID only |
| doca_telemetry_adp_retx_set_hist_clear_on_read() | Clears (resets to 0) the histogram bin counters after each read |
| doca_telemetry_adp_retx_set_hist_count_enable() | Enables the counters. This must be set for the histogram to start gathering statistics. |


After configuration, the histogram is loaded and begins running on the device when doca_telemetry_adp_retx_start() is called. The bin counters can then be read from the device.

Note

doca_telemetry_adp_retx contexts do not have sole ownership or a locking mechanism on the device histogram. It is possible for another process to update the histogram's configuration while your context is in the execution phase, which can lead to misinterpretation of the bin counters.

The user is responsible for ensuring sole ownership of the histogram and verifying data integrity. The doca_telemetry_adp_retx_detect_hist_conf_change() function is provided to help detect such external changes.

The following functions are used during the execution phase:

| API Datapath Functions | Description |
|---|---|
| doca_telemetry_adp_retx_read_hist_bins() | Reads the configured N histogram bin counters as an array of N 64-bit values |
| doca_telemetry_adp_retx_detect_hist_conf_change() | Indicates if the device's active histogram configuration matches the one defined in the context |
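
Once the bin counters are available as an array of N 64-bit values, an application typically post-processes them. The sketch below is plain C with hypothetical helper names and makes no DOCA calls. The hist_delta helper assumes that, with clear-on-read disabled, the counters accumulate between reads; that behavior is an assumption here, not something this guide states explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum all bin counters of one histogram read. */
uint64_t hist_total(const uint64_t *bins, size_t num_bins)
{
    uint64_t total = 0;
    for (size_t i = 0; i < num_bins; i++)
        total += bins[i];
    return total;
}

/*
 * Hypothetical helper: per-bin difference between two successive reads.
 * Assumes clear-on-read is disabled and counters only grow.
 * Returns -1 if any counter went backwards, which under that assumption
 * suggests the histogram was reset or reconfigured between the reads.
 */
int hist_delta(const uint64_t *prev, const uint64_t *curr, size_t num_bins,
               uint64_t *delta)
{
    for (size_t i = 0; i < num_bins; i++) {
        if (curr[i] < prev[i])
            return -1;
        delta[i] = curr[i] - prev[i];
    }
    return 0;
}
```

A -1 result from hist_delta is only a hint. Checking the device configuration with doca_telemetry_adp_retx_detect_hist_conf_change() remains the mechanism this library provides for detecting external changes.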

This section outlines the states of the doca_telemetry_adp_retx context.

Idle

The context has been created and is Idle.

In this state, it is expected for the application to:

  • Destroy the context.

  • Start the context for processing.

Allowed operations:

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| None | Create the context |
| Running | Call stop |


Running

In this state it is expected for the application to:

  • Stop the context.

Allowed operations:

  • Reading data from the device according to section "Execution".

It is possible to reach this state as follows:

| Previous State | Transition Action |
|---|---|
| Idle | Successfully start the context |
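
The two states and their transitions can be modeled as a small table-free state machine. The sketch below is illustrative only: the enum and function names are hypothetical, are not DOCA types, and merely encode the expectations listed for the Idle and Running states.

```c
#include <stdbool.h>

/* Hypothetical model of the context life cycle; not DOCA types. */
enum ctx_state { CTX_IDLE, CTX_RUNNING };
enum ctx_action { ACT_START, ACT_STOP, ACT_READ, ACT_DESTROY };

/* Return true if the action is expected in the given state. */
bool ctx_action_allowed(enum ctx_state state, enum ctx_action action)
{
    switch (state) {
    case CTX_IDLE:
        /* Idle: the context may be started or destroyed. */
        return action == ACT_START || action == ACT_DESTROY;
    case CTX_RUNNING:
        /* Running: the context may be stopped, and data may be read. */
        return action == ACT_STOP || action == ACT_READ;
    }
    return false;
}

/* Return the state after a successful action; other actions keep the state. */
enum ctx_state ctx_next_state(enum ctx_state state, enum ctx_action action)
{
    if (state == CTX_IDLE && action == ACT_START)
        return CTX_RUNNING;
    if (state == CTX_RUNNING && action == ACT_STOP)
        return CTX_IDLE;
    return state;
}
```

An application wrapper could consult ctx_action_allowed() before each library call to catch misuse early, such as attempting to read bins while the context is still Idle.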


DOCA Telemetry Adp Retx supports only CPU-based datapaths.

The doca_telemetry_adp_retx sample demonstrates how to configure the histogram from command-line arguments, run for a set period, and then print the values of the configured bin counters. This sample is also available on GitHub.

Running the Sample

  • Before you begin, refer to the following documents:

  • To build a given sample:

    # Update path if you downloaded from GitHub
    cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_adp_retx
    meson /tmp/build
    ninja -C /tmp/build

    The binary doca_telemetry_adp_retx is created under /tmp/build/.

  • Sample usage:

    Usage: doca_telemetry_adp_retx [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                Print a help synopsis
      -v, --version             Print program version information
      -l, --log-level           Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level           Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>         Parse all command flags from an input json file

    Program Flags:
      -p, --pci-addr            DOCA device PCI device address
      -u, --time-unit           Time unit to use - 'nsec', 'usec', 'usec_100', or 'msec'
      -w, --width-mode          Bin width mode to use - 'fixed', or 'double'
      -n, --number-bins         The number of bins to configure the histogram for
      -vid, --vhca-id           VHCA ID to get histogram events from
      -b0, --bin-0-width        Width of bin 0 to configure histogram
      -b1, --bin-1-width        Width of bin 1 to configure histogram
      -t, --wait-time           Time in seconds to wait before reading histogram bins

The sample does the following:

  1. Locates and opens a DOCA device.

  2. Creates a doca_telemetry_adp_retx instance.

  3. Queries the device for histogram support, max bins, and time unit capabilities.

  4. Configures the histogram with the values provided via command line (number of bins, bin widths, time unit, width mode, VHCA ID, clear on read, and counter enable).

  5. Waits for the specified time, then reads and displays the value of each bin.

  6. Destroys the doca_telemetry_adp_retx context.

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025