DOCA Telemetry DPA
This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField® and NVIDIA® ConnectX® networking platforms using the DOCA Telemetry DPA API.
DOCA Telemetry DPA is supported at alpha level.
The DOCA Telemetry DPA library provides access to detailed telemetry data and performance statistics for the Data Path Accelerator (DPA) on supported NVIDIA networking platforms. With its API, developers can monitor and analyze DPA processes, threads, and profiling data for efficient application performance optimization.
To use DOCA Telemetry DPA, the following prerequisites must be met:
fwctl driver installed and loaded (see instructions in NVIDIA MLNX_OFED Documentation v24.07-0.6.1.0)
Note: To verify whether the fwctl driver is successfully loaded:
$ ls /sys/class/fwctl/
Expected output:
fwctl0 fwctl1
If the directory /sys/class/fwctl does not exist or is empty, follow these steps:
Search for the fwctl package:
$ apt search fwctl
The output may indicate either fwctl-dkms or fwctl-modules.
Install the appropriate package:
$ sudo apt install fwctl-dkms
Or:
$ sudo apt install fwctl-modules
Load the mlx5_fwctl module:
$ sudo modprobe mlx5_fwctl
Confirm the module is loaded:
$ lsmod | grep fwctl
Expected output:
mlx5_fwctl 20480 0
fwctl 24576 1 mlx5_fwctl
mlx5_core 2211840 2 mlx5_fwctl,mlx5_ib
mlx_compat 20480 17 rdma_cm,ib_ipoib,mlxdevm,nvme,mlxfw,mlx5_fwctl,iw_cm,nvme_core,nvme_fabrics,ib_umad,fwctl,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
Firmware version 28.43.1000 for ConnectX-7 or 32.43.1000 for BlueField-3
DOCA Telemetry DPA-based applications can run on:
Host machines – ConnectX-7 or BlueField-3 and newer
DPU targets – BlueField-3 and newer
DOCA Telemetry DPA provides comprehensive profiling data, including:
Process and thread information – Monitor all active DPA processes and threads
Cumulative performance counters – Track cumulative performance metrics to evaluate application behavior
Event tracer data – Capture detailed event-based traces for in-depth analysis
To interact with a device, users must create a DOCA Telemetry DPA context using doca_telemetry_dpa_create(). Each context is independent of DOCA DPA contexts, meaning changes in DPA configurations are not automatically reflected in the telemetry context.
A device typically corresponds to a specific port on a NIC.
Performance counters are not owned by the doca_telemetry_dpa context, so other active telemetry contexts could manage the same counters. To avoid conflicts, ensure that only a single context is used to profile the DPA.
Device Support
DOCA Telemetry DPA requires a device to operate. For picking a device, refer to "DOCA Core Device Discovery".
As device capabilities may change (see DOCA Core Device Support), it is recommended to check your device using the doca_telemetry_dpa_cap_is_supported() method.
Output Structure Format
The user application is responsible for allocating the output structures. To that end, DOCA Telemetry DPA provides helper methods that return the structure size in bytes (see section Execution Phase for more details).
The doca_telemetry_dpa context supports the following layout structures for the profile data:
doca_telemetry_dpa_process_info | |
| Global DPA process ID |
| Number of threads in the process |
| The name of the process |
doca_telemetry_dpa_thread_info | |
| Global DPA process ID |
| Global DPA thread ID |
| The name of the thread |
doca_telemetry_dpa_cumul_info | |
| Global DPA process ID |
| Global DPA thread ID |
| Total time in ticks the thread has been active |
| Total execution unit cycles the thread used |
| Total number of instructions the thread executed |
| Total number of thread executions |
doca_telemetry_dpa_event_sample | |
| Timestamp in µsec |
| Stamp of total execution unit (EU) cycles |
| Stamp of total number of instructions of this DPA EU |
| Global DPA thread ID |
| Execution unit ID |
| Running sample ID per EU. A single |
| Type of event sample: |
The user can retrieve the DPA timer tick frequency, given in kHz, using doca_telemetry_dpa_get_dpa_timer_freq(). With this frequency, timer ticks can be converted to elapsed time using the formula: clock_time = ticks/dpa_timer_frequency. Because the frequency is given in kHz, the result is in milliseconds.
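The tick conversion can be sketched in C as follows. This is a minimal illustration, not part of the DOCA API: the helper names are hypothetical, and the 64-bit parameter type for the frequency is an assumption. It relies only on the fact that a frequency in kHz is a count of ticks per millisecond.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a DOCA function): convert DPA timer ticks to
 * milliseconds. freq_khz stands for the value reported by
 * doca_telemetry_dpa_get_dpa_timer_freq(); kHz means ticks per millisecond. */
double dpa_ticks_to_msec(uint64_t ticks, uint64_t freq_khz)
{
    assert(freq_khz != 0); /* a zero frequency would make the division meaningless */
    return (double)ticks / (double)freq_khz;
}

/* Hypothetical helper: same conversion, expressed in microseconds. */
double dpa_ticks_to_usec(uint64_t ticks, uint64_t freq_khz)
{
    return dpa_ticks_to_msec(ticks, freq_khz) * 1000.0;
}
```

For example, with a timer frequency of 1000 kHz, 2000 ticks correspond to 2 milliseconds.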
Establishing the Amount of Event Tracer Samples
The user must set the maximum number of event tracer samples to retrieve. This value can be set using doca_telemetry_dpa_set_max_perf_event_samples() and retrieved using doca_telemetry_dpa_get_max_perf_event_samples().
Addressing All Processes and Threads
To retrieve a unique ID to address all processes running on the DPA:
uint32_t all_processes_id;
doca_telemetry_dpa_get_all_process_id(&all_processes_id);
To retrieve a unique ID to address all threads running on the DPA:
uint32_t all_threads_id;
doca_telemetry_dpa_get_all_threads_id(&all_threads_id);
Retrieving Running DPA Process Information
Information about a specific process or all processes running on the DPA can be retrieved using these steps:
Get the memory size for process list allocation:
uint32_t size;
doca_telemetry_dpa_get_process_list_size(context, process_id, &size);
Retrieve the process list:
doca_telemetry_dpa_read_processes_list(context, process_id, &process_num, &process_list);
Ensure memory is allocated according to the retrieved size before calling doca_telemetry_dpa_read_processes_list().
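The query-size, allocate, then read sequence described above can be sketched as follows. The sketch is deliberately independent of the DOCA headers: the struct proc_info_sketch type and the stub_get_list_size() and stub_read_list() functions are hypothetical stand-ins invented for illustration, not DOCA definitions or signatures. Only the ordering of the steps and the ownership of the buffer are the point.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a process record; NOT the DOCA structure. */
struct proc_info_sketch {
    uint32_t process_id;
    uint32_t num_threads;
    char name[32];
};

/* Hypothetical stub playing the role of the "get list size" call:
 * reports how many bytes the caller must allocate. */
int stub_get_list_size(uint32_t *size_bytes)
{
    *size_bytes = 2 * (uint32_t)sizeof(struct proc_info_sketch);
    return 0;
}

/* Hypothetical stub playing the role of the "read list" call:
 * fills a caller-owned buffer and reports the number of entries. */
int stub_read_list(struct proc_info_sketch *list, uint32_t *count)
{
    static const struct proc_info_sketch demo[2] = {
        {1, 4, "proc_a"},
        {2, 1, "proc_b"},
    };
    memcpy(list, demo, sizeof(demo));
    *count = 2;
    return 0;
}

/* Query the size, allocate, then read. Returns a buffer the caller must
 * free(), or NULL on any failure. */
struct proc_info_sketch *read_all_sketch(uint32_t *count)
{
    uint32_t size_bytes = 0;
    struct proc_info_sketch *list;

    assert(count != NULL);
    if (stub_get_list_size(&size_bytes) != 0 || size_bytes == 0)
        return NULL;
    list = malloc(size_bytes);
    if (list == NULL)
        return NULL;
    if (stub_read_list(list, count) != 0) {
        free(list); /* do not leak the buffer when the read fails */
        return NULL;
    }
    return list;
}
```

In a real application the two stubs would be replaced by the corresponding doca_telemetry_dpa calls, with their return codes checked in the same places.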
Retrieving Running DPA Thread Information
Information about a specific thread or all threads running on the DPA can be retrieved using these steps:
Get the memory size for thread list allocation:
uint32_t size;
doca_telemetry_dpa_get_thread_list_size(context, process_id, thread_id, &size);
Retrieve the thread list:
doca_telemetry_dpa_read_thread_list(context, process_id, thread_id, &threads_num, &thread_list);
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is allocated according to the retrieved size before calling doca_telemetry_dpa_read_thread_list().
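The rule that an "all processes" selector must be paired with an "all threads" selector can be checked before any call is issued. The following is a small sketch with a hypothetical helper name; the two "all" IDs are passed in as parameters because their actual values must be obtained from the library at run time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a DOCA function): returns true when the
 * process/thread pair is an acceptable selector. all_processes_id and
 * all_threads_id are the values obtained from the library at run time. */
bool dpa_selector_is_valid(uint32_t process_id, uint32_t thread_id,
                           uint32_t all_processes_id, uint32_t all_threads_id)
{
    /* Addressing all processes requires addressing all threads as well. */
    if (process_id == all_processes_id)
        return thread_id == all_threads_id;
    /* A specific process may be combined with one thread or all threads. */
    return true;
}
```

Rejecting an inconsistent pair up front keeps the error close to the code that built the selector.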
Retrieving Cumulative Performance Profile Information Samples
Cumulative performance samples can be retrieved for specific processes and threads, or for all processes and threads running on the DPA (see "Addressing All Processes and Threads"), using these steps:
Start the cumulative counters:
doca_telemetry_dpa_counter_start(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);
Stop the cumulative counters:
doca_telemetry_dpa_counter_stop(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);
Get the memory size for cumulative samples list allocation:
uint32_t size;
doca_telemetry_dpa_get_cumul_samples_size(context, process_id, thread_id, &size);
Retrieve the cumulative sample list:
doca_telemetry_dpa_read_cumul_info_list(context, process_id, thread_id, &cumul_samples_num, &cumul_info_list);
The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is allocated according to the retrieved size before calling doca_telemetry_dpa_read_cumul_info_list().
Reset counters using doca_telemetry_dpa_counter_restart() to clear previous data.
Retrieving Event Tracer Profile Information Samples
Event tracer samples can be retrieved for specific processes and threads or for all processes and threads running on the DPA using these steps:
Start the event tracer counters:
doca_telemetry_dpa_counter_start(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);
Stop the event tracer counters:
doca_telemetry_dpa_counter_stop(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);
Get the memory size for event samples list allocation:
uint32_t size;
doca_telemetry_dpa_get_perf_event_samples_size(context, process_id, thread_id, &size);
Retrieve the event tracer sample list:
doca_telemetry_dpa_read_perf_event_list(context, process_id, thread_id, &perf_event_samples_num, &event_info_list);
The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().
Ensure the maximum number of samples is set using doca_telemetry_dpa_set_max_perf_event_samples() before starting the counters.
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is allocated according to the retrieved size before calling doca_telemetry_dpa_read_perf_event_list().
Reset counters using doca_telemetry_dpa_counter_restart() before reuse.
The following section describes the different states the doca_telemetry_dpa context goes through, how to move between states, and what is allowed in each state.
Idle
The context has been created and is idle.
In this state, it is expected for the application to:
Destroy the context
Start the context for processing
Allowed operations:
Configuring the context according to section "Configuration Phase"
It is possible to reach this state as follows:
Previous State | Transition Action |
None | Create the context |
Running | Call stop |
Running
In this state it is expected for the application to:
Stop the context
Retrieve the process information list
Retrieve the thread information list
Start/stop/reset counters for profiling capabilities
Retrieve profile samples for cumulative performance counters
Retrieve profile samples for the event tracer
Allowed operations:
Calling stop, moving the application to "Idle" state
It is possible to reach this state as follows:
Previous State | Transition Action |
Idle | Successfully start the context |
There are currently no state restrictions on the majority of API functions.
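The two-state lifecycle described above can be modeled as a small transition function. This is a toy model for reasoning about the transitions, not DOCA code: the enum and function names are hypothetical, and treating every transition that is not listed in the tables as invalid is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the context states described in this section. */
enum ctx_state { CTX_IDLE, CTX_RUNNING };
enum ctx_action { CTX_ACT_START, CTX_ACT_STOP };

/* Applies an action to the modeled state. Returns 0 and updates *state when
 * the transition appears in the tables above, -1 otherwise (assumption). */
int ctx_apply(enum ctx_state *state, enum ctx_action action)
{
    assert(state != NULL);
    if (*state == CTX_IDLE && action == CTX_ACT_START) {
        *state = CTX_RUNNING; /* Idle -> Running on a successful start */
        return 0;
    }
    if (*state == CTX_RUNNING && action == CTX_ACT_STOP) {
        *state = CTX_IDLE; /* Running -> Idle on stop */
        return 0;
    }
    return -1; /* transition not listed; state is left unchanged */
}
```

A model like this can be useful in application-level tests to confirm that retrieval calls are only issued while the context is expected to be running.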
DOCA Telemetry DPA supports only CPU-based datapaths.
Running the Sample
Refer to the following documents:
DOCA Installation Guide for Linux for details on how to install BlueField-related software.
NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
To build a given sample, run the following command. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:
cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_dpa
meson /tmp/build
ninja -C /tmp/build
Info: The binary doca_telemetry_dpa is created under /tmp/build/.
Sample usage:
Usage: doca_telemetry_dpa [DOCA Flags] [Program Flags]
DOCA Flags:
-h, --help Print a help synopsis
-v, --version Print program version information
-l, --log-level Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
--sdk-log-level Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
-j, --json <path> Parse all command flags from an input json file
Program Flags:
-p, --pci-addr DOCA device PCI device address
-rt, --sample-run-time Total sample run time, in miliseconds
-ct, --counter-type Counter type, cumulus (0) or event (1)
-es, --event-samples Set the maximum number of perf event samples to retrieve
-pi, --process-id Specific process id to address
-ti, --thread-id Specific thread id to address
The sample includes:
Locating and opening a DOCA device.
Creating a doca_telemetry_dpa instance.
Retrieving all processes or one specific process.
Retrieving all threads or one specific thread.
Starting counters for the selected profile capability.
Retrieving the profile samples for the selected profile capability.
Displaying the retrieved profile information.
Destroying the doca_telemetry_dpa context.