DOCA Documentation v3.0.0

DOCA Telemetry DPA

This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField® and NVIDIA® ConnectX® networking platforms using the DOCA Telemetry DPA API.

Note

DOCA Telemetry DPA is supported at alpha level.

The DOCA Telemetry DPA library provides access to detailed telemetry data and performance statistics for the Data Path Accelerator (DPA) on supported NVIDIA networking platforms. With its API, developers can monitor and analyze DPA processes, threads, and profiling data to optimize application performance.

To use DOCA Telemetry DPA, the following prerequisites must be met:

  • fwctl driver installed and loaded (see instructions in NVIDIA MLNX_OFED Documentation v24.07-0.6.1.0)

    Note

    To verify whether the fwctl driver is successfully loaded:

    $ ls /sys/class/fwctl/

    Expected output:

    fwctl0  fwctl1

    If the directory /sys/class/fwctl does not exist or is empty, follow these steps:

    1. Search for the fwctl package:

      $ apt search fwctl

      The output may indicate either fwctl-dkms or fwctl-modules.

    2. Install the appropriate package:

      $ sudo apt install fwctl-dkms

      Or:

      $ sudo apt install fwctl-modules

    3. Load the mlx5_fwctl module:

      $ sudo modprobe mlx5_fwctl

    4. Confirm the module is loaded:

      $ lsmod | grep fwctl

      Expected output:

      mlx5_fwctl             20480  0
      fwctl                  24576  1 mlx5_fwctl
      mlx5_core            2211840  2 mlx5_fwctl,mlx5_ib
      mlx_compat             20480  17 rdma_cm,ib_ipoib,mlxdevm,nvme,mlxfw,mlx5_fwctl,iw_cm,nvme_core,nvme_fabrics,ib_umad,fwctl,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

  • Firmware version 28.43.1000 for ConnectX-7 or 32.43.1000 for BlueField-3

DOCA Telemetry DPA-based applications can run on:

  • Host machines – ConnectX-7 or BlueField-3 and newer

  • DPU targets – BlueField-3 and newer

DOCA Telemetry DPA provides comprehensive profiling data, including:

  • Process and thread information – Monitor all active DPA processes and threads

  • Cumulative performance counters – Track cumulative performance metrics to evaluate application behavior

  • Event tracer data – Capture detailed event-based traces for in-depth analysis

To interact with a device, users must create a DOCA Telemetry DPA context using doca_telemetry_dpa_create(). Each context is independent of DOCA DPA contexts, meaning changes in DPA configurations are not automatically reflected in the telemetry context. A device typically corresponds to a specific port on a NIC.

Note

Performance counters are not owned by the doca_telemetry_dpa context. Other active telemetry contexts could manage the same counters. Ensure only a single context is used to profile the DPA to avoid conflicts.

Device Support

DOCA Telemetry DPA requires a device to operate. For information on picking a device, refer to "DOCA Core Device Discovery".

Because device capabilities may change (see "DOCA Core Device Support"), it is recommended to verify that your device supports DOCA Telemetry DPA using the doca_telemetry_dpa_cap_is_supported() method.

Output Structure Format

The user application is responsible for allocating the output structures. To that end, DOCA Telemetry DPA provides helper methods that return the structure size in bytes (see section Execution Phase for more details).

The doca_telemetry_dpa context supports the following layout structures for the profile data:

doca_telemetry_dpa_process_info

  • dpa_process_id – Global DPA process ID

  • num_of_threads – Number of threads in the process

  • process_name – The name of the process

doca_telemetry_dpa_thread_info

  • dpa_process_id – Global DPA process ID

  • dpa_thread_id – Global DPA thread ID

  • thread_name – The name of the thread

doca_telemetry_dpa_cumul_info

  • dpa_process_id – Global DPA process ID

  • dpa_thread_id – Global DPA thread ID

  • time – Total time, in DPA timer ticks, that the thread has been active

  • cycles – Total execution unit cycles the thread used

  • instructions – Total number of instructions the thread executed

  • num_executions – Total number of thread executions

doca_telemetry_dpa_event_sample

  • timestamp – Timestamp in µsec

  • cycles – Snapshot of the total execution unit (EU) cycle count

  • instructions – Snapshot of the total number of instructions executed by this DPA EU

  • dpa_thread_id – Global DPA thread ID

  • eu_id – Execution unit ID

  • sample_id_in_eu – Running sample ID per EU. The same sample ID is assigned to both the schedule-in and schedule-out samples.

  • type – Type of event sample:

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_EMPTY_SAMPLE

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_SCHEDULE_IN

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_SCHEDULE_OUT

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_BUFFER_FULL

Note

The user can retrieve the DPA timer tick frequency, given in kHz, using doca_telemetry_dpa_get_dpa_timer_freq(). With this frequency, timer ticks can be converted to elapsed time using the formula clock_time = ticks / dpa_timer_frequency. Because the frequency is expressed in kHz (thousands of ticks per second), the result is in milliseconds.
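As a worked illustration of the formula, the helpers below convert a tick count (for example, the time field of doca_telemetry_dpa_cumul_info) into milliseconds and microseconds. This is a minimal sketch: the helper names are hypothetical, and the uint64_t widths are assumptions rather than the types declared in the DOCA headers.

```c
#include <stdint.h>

/*
 * Convert DPA timer ticks to milliseconds.
 * freq_khz is assumed to be the value reported by
 * doca_telemetry_dpa_get_dpa_timer_freq(), i.e. thousands of ticks per
 * second, so ticks / freq_khz yields milliseconds.
 */
double dpa_ticks_to_ms(uint64_t ticks, uint64_t freq_khz)
{
	if (freq_khz == 0)
		return 0.0; /* guard against an unset frequency */
	return (double)ticks / (double)freq_khz;
}

/* Same conversion expressed in microseconds (1 ms = 1000 us). */
double dpa_ticks_to_us(uint64_t ticks, uint64_t freq_khz)
{
	return dpa_ticks_to_ms(ticks, freq_khz) * 1000.0;
}
```

For instance, with a hypothetical frequency of 1,800,000 kHz (1.8 GHz), 3,600,000 ticks correspond to 2 ms.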


Setting the Maximum Number of Event Tracer Samples

The user must set the maximum number of event tracer samples to retrieve. This value can be set using doca_telemetry_dpa_set_max_perf_event_samples() and retrieved using doca_telemetry_dpa_get_max_perf_event_samples().

Addressing All Processes and Threads

  • To retrieve a unique ID to address all processes running on the DPA:

    uint32_t all_processes_id;
    doca_telemetry_dpa_get_all_process_id(&all_processes_id);

  • To retrieve a unique ID to address all threads running on the DPA:

    uint32_t all_threads_id;
    doca_telemetry_dpa_get_all_threads_id(&all_threads_id);

Retrieving Running DPA Process Information

Information about a specific process or all processes running on the DPA can be retrieved using these steps:

  1. Get the memory size for process list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_process_list_size(context, process_id, &size);

  2. Retrieve the process list:

    doca_telemetry_dpa_read_processes_list(context, process_id, &process_num, &process_list);

Note

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_processes_list().
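Once the list has been read, a common next step is to look up the ID of a process by name so it can be passed as process_id to the thread and counter calls. The sketch below assumes a hypothetical mirror struct (process_info_mirror) with the three documented fields; the real field types and name length come from the DOCA headers, and find_process_id_by_name() is a hypothetical helper, not part of the API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of doca_telemetry_dpa_process_info; field widths and
 * the name length are assumptions. */
struct process_info_mirror {
	uint32_t dpa_process_id;
	uint32_t num_of_threads;
	char process_name[64];
};

/*
 * Scan the first process_num entries of list. On a name match, store the
 * process ID in *process_id and return 0; return -1 if nothing matches.
 */
int find_process_id_by_name(const struct process_info_mirror *list, uint32_t process_num,
			    const char *name, uint32_t *process_id)
{
	for (uint32_t i = 0; i < process_num; i++) {
		if (strncmp(list[i].process_name, name, sizeof(list[i].process_name)) == 0) {
			*process_id = list[i].dpa_process_id;
			return 0;
		}
	}
	return -1;
}
```

In real code, list would be the buffer filled by doca_telemetry_dpa_read_processes_list() and process_num the count it reports.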


Retrieving Running DPA Thread Information

Information about a specific thread or all threads running on the DPA can be retrieved using these steps:

  1. Get the memory size for thread list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_thread_list_size(context, process_id, thread_id, &size);

  2. Retrieve the thread list:

    doca_telemetry_dpa_read_thread_list(context, process_id, thread_id, &threads_num, &thread_list);

Note

If the process ID is set to address all processes, the thread ID must also be set to address all threads.
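The addressing rule can be checked before issuing a query. The helper below is a hypothetical sketch, not a DOCA function; it assumes all_processes_id and all_threads_id hold the values returned by doca_telemetry_dpa_get_all_process_id() and doca_telemetry_dpa_get_all_threads_id().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return false when process_id addresses all processes but thread_id does
 * not address all threads. For a specific process ID, this check accepts
 * any thread ID.
 */
bool telemetry_ids_are_valid(uint32_t process_id, uint32_t thread_id,
			     uint32_t all_processes_id, uint32_t all_threads_id)
{
	if (process_id == all_processes_id)
		return thread_id == all_threads_id;
	return true;
}
```

The sentinel values used to exercise such a check must come from the API at runtime; any constants used in place of them are placeholders.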

Note

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_thread_list().
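Because each thread entry carries dpa_process_id, the thread list can be joined with the process list to label threads for display. The sketch below uses hypothetical mirror structs with the documented fields (widths and name lengths assumed, names assumed NUL-terminated) and a hypothetical helper, format_thread_label(); it is illustrative, not part of the DOCA API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirrors of doca_telemetry_dpa_process_info and
 * doca_telemetry_dpa_thread_info (field types and name lengths assumed). */
struct process_info_mirror {
	uint32_t dpa_process_id;
	uint32_t num_of_threads;
	char process_name[64];
};

struct thread_info_mirror {
	uint32_t dpa_process_id;
	uint32_t dpa_thread_id;
	char thread_name[64];
};

/* Write "<process_name>/<thread_name>" into label, finding the owning
 * process by dpa_process_id; uses "?" if the process is not listed. */
void format_thread_label(const struct thread_info_mirror *thread,
			 const struct process_info_mirror *procs, uint32_t process_num,
			 char *label, size_t label_len)
{
	const char *process_name = "?";

	for (uint32_t i = 0; i < process_num; i++) {
		if (procs[i].dpa_process_id == thread->dpa_process_id) {
			process_name = procs[i].process_name;
			break;
		}
	}
	snprintf(label, label_len, "%s/%s", process_name, thread->thread_name);
}
```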


Retrieving Cumulative Performance Profile Information Samples

Cumulative performance samples can be retrieved for specific processes and threads or for all processes and threads (see "Addressing All Processes and Threads") running on the DPA using these steps:

  1. Start the cumulative counter(s):

    doca_telemetry_dpa_counter_start(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);

  2. Stop the cumulative counter(s):

    doca_telemetry_dpa_counter_stop(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);

  3. Get the memory size for cumulative samples list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_cumul_samples_size(context, process_id, thread_id, &size);

  4. Retrieve the cumulative sample list:

    doca_telemetry_dpa_read_cumul_info_list(context, process_id, thread_id, &cumul_samples_num, &cumul_info_list);

Info

The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().

Note

If the process ID is set to address all processes, the thread ID must also be set to address all threads.

Note

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_cumul_info_list().

Note

Reset counters using doca_telemetry_dpa_counter_restart() to clear previous data.
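The cumulative fields lend themselves to simple derived metrics. The sketch below computes instructions per cycle (IPC) and average cycles per execution from one sample; these metrics are an illustration chosen here, not values the API reports, and the mirror struct and helper names are hypothetical with assumed field widths.

```c
#include <stdint.h>

/* Hypothetical mirror of doca_telemetry_dpa_cumul_info (field widths assumed). */
struct cumul_info_mirror {
	uint32_t dpa_process_id;
	uint32_t dpa_thread_id;
	uint64_t time;           /* DPA timer ticks the thread has been active */
	uint64_t cycles;         /* EU cycles used */
	uint64_t instructions;   /* instructions executed */
	uint64_t num_executions; /* number of thread executions */
};

/* Instructions per cycle; 0 when the thread consumed no cycles. */
double cumul_ipc(const struct cumul_info_mirror *s)
{
	return s->cycles ? (double)s->instructions / (double)s->cycles : 0.0;
}

/* Average EU cycles per thread execution; 0 when the thread never ran. */
double cumul_cycles_per_execution(const struct cumul_info_mirror *s)
{
	return s->num_executions ? (double)s->cycles / (double)s->num_executions : 0.0;
}
```

Both helpers guard against zero denominators because a thread that was never scheduled during the measurement window reports zero cycles and executions.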


Retrieving Event Tracer Profile Information Samples

Event tracer samples can be retrieved for specific processes and threads or for all processes and threads running on the DPA using these steps:

  1. Start the event tracer counter(s):

    doca_telemetry_dpa_counter_start(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);

  2. Stop the event tracer counter(s):

    doca_telemetry_dpa_counter_stop(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);

  3. Get the memory size for event samples list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_perf_event_samples_size(context, process_id, thread_id, &size);

  4. Retrieve the event tracer sample list:

    doca_telemetry_dpa_read_perf_event_list(context, process_id, thread_id, &perf_event_samples_num, &event_info_list);

Info

The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().

Note

Ensure the maximum number of samples is set using doca_telemetry_dpa_set_max_perf_event_samples() before starting the counters.

Note

If the process ID is set to address all processes, the thread ID must also be set to address all threads.

Note

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_perf_event_list().

Note

Reset counters using doca_telemetry_dpa_counter_restart() before reuse.
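Because a schedule-in sample and its schedule-out sample share the same sample_id_in_eu on a given EU, the two can be paired to measure how long a thread ran. The sketch below sums the EU cycles a thread spent scheduled in; it assumes the per-EU cycle counter is monotonic, and the mirror enum, struct, and thread_scheduled_cycles() helper are hypothetical stand-ins for the documented fields and sample types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the documented event sample types. */
enum event_type_mirror { EV_EMPTY, EV_SCHEDULE_IN, EV_SCHEDULE_OUT, EV_BUFFER_FULL };

/* Hypothetical mirror of doca_telemetry_dpa_event_sample (widths assumed). */
struct event_sample_mirror {
	uint64_t timestamp;    /* usec */
	uint64_t cycles;       /* EU cycle counter snapshot */
	uint64_t instructions; /* EU instruction counter snapshot */
	uint32_t dpa_thread_id;
	uint32_t eu_id;
	uint32_t sample_id_in_eu;
	enum event_type_mirror type;
};

/*
 * Sum the EU cycles thread_id spent scheduled in. Each SCHEDULE_IN sample is
 * matched with the SCHEDULE_OUT sample that has the same eu_id and
 * sample_id_in_eu (IDs are per EU, so both must match). Unmatched samples,
 * and pairs whose counter went backwards, are skipped.
 */
uint64_t thread_scheduled_cycles(const struct event_sample_mirror *s, size_t n, uint32_t thread_id)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (s[i].type != EV_SCHEDULE_IN || s[i].dpa_thread_id != thread_id)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (s[j].type == EV_SCHEDULE_OUT && s[j].eu_id == s[i].eu_id &&
			    s[j].sample_id_in_eu == s[i].sample_id_in_eu) {
				if (s[j].cycles >= s[i].cycles)
					total += s[j].cycles - s[i].cycles;
				break;
			}
		}
	}
	return total;
}
```

The nested scan keeps the sketch short; for large sample buffers, indexing schedule-out samples by (eu_id, sample_id_in_eu) first would avoid the quadratic cost.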


The following section describes the different states the doca_telemetry_dpa context goes through, how to move between states and what is allowed in each state.

Idle

The context has been created and is idle.

In this state, the application is expected to:

  • Destroy the context

  • Start the context for processing

Allowed operations:

It is possible to reach this state as follows:

  • From no previous state (None) – create the context

  • From "Running" – call stop


Running

In this state, the application is expected to:

  1. Stop the context.

  2. Retrieve the process information list.

  3. Retrieve the thread information list.

  4. Start, stop, or reset counters for profiling.

  5. Retrieve profile samples for cumulative performance counters.

  6. Retrieve profile samples for the event tracer.

Allowed operations:

  • Calling stop, which moves the context to the "Idle" state

It is possible to reach this state as follows:

  • From "Idle" – successfully start the context

Note

There are currently no state restrictions on the majority of API functions.
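The lifecycle described above can be modeled as a small state machine. The sketch below is a hypothetical model for illustration only: the enums and tdpa_apply() are not part of the DOCA API, the model does not capture a failed start, and it does not gate the other API calls, since most functions currently have no state restrictions.

```c
#include <stdbool.h>

/* Hypothetical model of the two context states described above. */
enum tdpa_state { TDPA_STATE_IDLE, TDPA_STATE_RUNNING };
enum tdpa_action { TDPA_ACTION_START, TDPA_ACTION_STOP, TDPA_ACTION_DESTROY };

/*
 * Apply an action to *state. Returns true if the action is one the text
 * describes for the current state (updating *state for start/stop), and
 * false otherwise. Destroy is accepted only from Idle and leaves *state
 * unchanged; freeing the context is up to the caller.
 */
bool tdpa_apply(enum tdpa_state *state, enum tdpa_action action)
{
	switch (*state) {
	case TDPA_STATE_IDLE:
		if (action == TDPA_ACTION_START) {
			*state = TDPA_STATE_RUNNING;
			return true;
		}
		return action == TDPA_ACTION_DESTROY;
	case TDPA_STATE_RUNNING:
		if (action == TDPA_ACTION_STOP) {
			*state = TDPA_STATE_IDLE;
			return true;
		}
		return false;
	}
	return false;
}
```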


DOCA Telemetry DPA supports only CPU-based datapaths.

Running the Sample

  1. Refer to the following documents:

    1. DOCA Installation Guide for Linux for details on how to install BlueField-related software.

    2. NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.

  2. To build a given sample, run the following commands. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:

    cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_dpa
    meson /tmp/build
    ninja -C /tmp/build

    Info

    The binary doca_telemetry_dpa is created under /tmp/build/.

Sample usage:

Usage: doca_telemetry_dpa [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse all command flags from an input json file

Program Flags:
  -p, --pci-addr                    DOCA device PCI device address
  -rt, --sample-run-time            Total sample run time, in miliseconds
  -ct, --counter-type               Counter type, cumulus (0) or event (1)
  -es, --event-samples              Set the maximum number of perf event samples to retrieve
  -pi, --process-id                 Specific process id to address
  -ti, --thread-id                  Specific thread id to address

The sample includes:

  1. Locating and opening a DOCA device.

  2. Creating a doca_telemetry_dpa instance.

  3. Retrieving information on all processes or one specific process.

  4. Retrieving information on all threads or one specific thread.

  5. Starting counters for the selected profiling capability.

  6. Retrieving the profile samples for the selected profiling capability.

  7. Displaying the retrieved profile information.

  8. Destroying the doca_telemetry_dpa context.

© Copyright 2025, NVIDIA. Last updated on May 5, 2025.