DOCA Documentation v3.1.0

DOCA Telemetry PCI

This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField and NVIDIA® ConnectX® families of networking platforms.

The doca_telemetry_pci library provides access to PCIe status and performance information from NVIDIA® BlueField or ConnectX® networking platforms.

Note

DOCA Telemetry PCI is supported at alpha level.

To use DOCA Telemetry PCI, the following prerequisites must be met:

  • fwctl driver installed and loaded (see instructions in NVIDIA MLNX_OFED Documentation v24.07-0.6.1.0)

    Note

    To verify whether the fwctl driver is successfully loaded:

    $ ls /sys/class/fwctl/

    Expected output:

    fwctl0  fwctl1

    If the directory /sys/class/fwctl does not exist or is empty, follow these steps:

    1. Search for the fwctl package:

      $ apt search fwctl

      The output may indicate either fwctl-dkms or fwctl-modules.

    2. Install the appropriate package:

      $ sudo apt install fwctl-dkms

      Or:

      $ sudo apt install fwctl-modules

    3. Load the mlx5_fwctl module:

      $ sudo modprobe mlx5_fwctl

    4. Confirm the module is loaded:

      $ lsmod | grep fwctl

      Expected output:

      mlx5_fwctl             20480  0
      fwctl                  24576  1 mlx5_fwctl
      mlx5_core            2211840  2 mlx5_fwctl,mlx5_ib
      mlx_compat             20480  17 rdma_cm,ib_ipoib,mlxdevm,nvme,mlxfw,mlx5_fwctl,iw_cm,nvme_core,nvme_fabrics,ib_umad,fwctl,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

  • Firmware version 28.43.1000 for ConnectX-7 or 32.43.1000 for BlueField-3
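
Both prerequisites can also be checked from a program before it tries to open a telemetry context. The following is a minimal sketch, assuming a POSIX system and assuming the firmware versions listed above are minimums. The helper names count_fwctl_devices() and fw_version_at_least() are illustrative and not part of DOCA, and how the firmware version string is obtained is left to the caller.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count entries whose name starts with "fwctl" in a sysfs class directory.
 * Returns -1 if the directory cannot be opened, which usually means the
 * fwctl driver is not loaded. */
static int count_fwctl_devices(const char *class_dir)
{
    assert(class_dir != NULL);
    DIR *dir = opendir(class_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "fwctl", 5) == 0)
            count++;
    }
    closedir(dir);
    return count;
}

/* Compare a "major.minor.patch" firmware string against a minimum.
 * Returns 1 if version >= minimum, 0 if it is lower, and -1 if either
 * string does not have three numeric components. */
static int fw_version_at_least(const char *version, const char *minimum)
{
    unsigned int v[3], m[3];
    if (sscanf(version, "%u.%u.%u", &v[0], &v[1], &v[2]) != 3 ||
        sscanf(minimum, "%u.%u.%u", &m[0], &m[1], &m[2]) != 3)
        return -1;
    for (int i = 0; i < 3; i++) {
        if (v[i] != m[i])
            return v[i] > m[i] ? 1 : 0;
    }
    return 1;
}
```

A caller would pass "/sys/class/fwctl" to count_fwctl_devices() and treat a result of -1 or 0 as "driver missing". The version comparison is only meaningful against the minimum for the same product family, since ConnectX-7 and BlueField-3 use different major numbers.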

DOCA Telemetry-based applications can run on either the host machine (ConnectX-7 or BlueField-3 and newer) or the DPU target (BlueField-3 and newer).

DOCA Telemetry PCI provides insights into PCI devices, including:

  • Management information: PCIe link and speed details, power usage, function count, error detection flags, and more.

  • PCI performance counters: Data transfer rates, error rates, stall counters, L0 recovery count, and other performance metrics.

  • PCI latency histogram: Helps understand the duration of PCI operations.

To interact with a device, typically corresponding to a specific NIC port, create a DOCA Telemetry PCI context using doca_telemetry_pci_create().

Device Support

DOCA Telemetry PCI requires a device to operate. To select a device, refer to "DOCA Core Device Discovery".

Because device capabilities may change, it is recommended to check that your device supports the PCI telemetry options you need before opening it, so you can be confident the required operations are available. The capability checks available for DOCA Telemetry PCI are outlined below:

Functionality                       Method

PCI Management Information          doca_telemetry_pci_cap_management_info_is_supported()
PCI Performance Counters Group 1    doca_telemetry_pci_cap_perf_counters_1_is_supported()
PCI Performance Counters Group 2    doca_telemetry_pci_cap_perf_counters_2_is_supported()
PCI Latency Histogram               doca_telemetry_pci_cap_latency_histogram_is_supported()
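
One way to run all four checks up front is a small table of check functions that is walked once at startup. The sketch below is illustrative only: stub_error_t, struct stub_devinfo, and the stub_* functions are stand-ins so the code builds without DOCA installed. In real code the table would hold the doca_telemetry_pci_cap_*_is_supported() functions listed above, assuming they all take a single device-info argument and return doca_error_t, as in the examples later in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins so the sketch builds without DOCA installed. Real code would use
 * doca_error_t, DOCA_SUCCESS, and struct doca_devinfo instead. */
typedef enum { STUB_SUCCESS = 0, STUB_NOT_SUPPORTED = 1 } stub_error_t;
struct stub_devinfo { unsigned int caps; /* bit i set => check i succeeds */ };

typedef stub_error_t (*cap_check_fn)(const struct stub_devinfo *);

static stub_error_t check_bit(const struct stub_devinfo *d, unsigned int bit)
{
    return (d->caps & (1u << bit)) ? STUB_SUCCESS : STUB_NOT_SUPPORTED;
}

/* Hypothetical checks, one per row of the capability table. */
static stub_error_t stub_management_info(const struct stub_devinfo *d) { return check_bit(d, 0); }
static stub_error_t stub_perf_counters_1(const struct stub_devinfo *d) { return check_bit(d, 1); }
static stub_error_t stub_perf_counters_2(const struct stub_devinfo *d) { return check_bit(d, 2); }
static stub_error_t stub_latency_histogram(const struct stub_devinfo *d) { return check_bit(d, 3); }

struct cap_entry {
    const char *name;
    cap_check_fn check;
};

static const struct cap_entry pci_caps[] = {
    { "PCI Management Information",       stub_management_info },
    { "PCI Performance Counters Group 1", stub_perf_counters_1 },
    { "PCI Performance Counters Group 2", stub_perf_counters_2 },
    { "PCI Latency Histogram",            stub_latency_histogram },
};
#define PCI_CAP_COUNT (sizeof(pci_caps) / sizeof(pci_caps[0]))

/* Run every check, print one line per capability, and return how many
 * capabilities are not supported. */
static size_t probe_caps(const struct cap_entry *caps, size_t n,
                         const struct stub_devinfo *dev)
{
    assert(caps != NULL && dev != NULL);
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        int ok = (caps[i].check(dev) == STUB_SUCCESS);
        printf("%-34s %s\n", caps[i].name, ok ? "supported" : "not supported");
        if (!ok)
            missing++;
    }
    return missing;
}
```

Keeping the checks in a table makes it easy to require only the subset an application actually uses and to log the rest for diagnostics.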

Within the structures returned during the execution phase, some fields are populated only when a further sub-capability is also supported:

Functionality, with Field(s) and the Method that reports support for them:

PCI Management Information

  • pwr_status, pci_power: doca_telemetry_pci_cap_management_info_power_reporting_is_supported()

  • link_peer_max_speed: doca_telemetry_pci_cap_management_info_link_peer_max_speed_is_supported()

PCI Performance Counters Group 1

  • tx_overflow_buffer_pkt, tx_overflow_buffer_marked_pkt: doca_telemetry_pci_cap_perf_counters_1_tx_overflow_is_supported()

  • outbound_stalled_reads, outbound_stalled_writes, outbound_stalled_reads_events, outbound_stalled_writes_events: doca_telemetry_pci_cap_perf_counters_1_outbound_stalled_is_supported()

  • fec_correctable_error_counter, fec_uncorrectable_error_counter: doca_telemetry_pci_cap_perf_counters_1_fec_error_is_supported()

  • fber_magnitude, fber_coef: doca_telemetry_pci_cap_perf_counters_1_fber_is_supported()
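
Because unsupported fields are not populated, an application should avoid reading them. One possible pattern is to record the sub-capability results once and consult them before touching each field. The sketch below is only an illustration: enum subcap_flags and struct mgmt_info_stub are hypothetical, the field types are assumed, and only the field names come from the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flags recording which sub-capability checks succeeded. */
enum subcap_flags {
    SUBCAP_POWER_REPORTING     = 1u << 0, /* pwr_status, pci_power */
    SUBCAP_LINK_PEER_MAX_SPEED = 1u << 1  /* link_peer_max_speed */
};

/* Stand-in for the management-info fields named in the table above.
 * The real structure is defined by DOCA; these field types are assumed. */
struct mgmt_info_stub {
    uint32_t pwr_status;
    uint32_t pci_power;
    uint32_t link_peer_max_speed;
};

/* Print only the fields whose sub-capability is supported.
 * Returns the number of fields printed. */
static int report_mgmt_info(const struct mgmt_info_stub *info, unsigned int subcaps)
{
    assert(info != NULL);
    int reported = 0;
    if (subcaps & SUBCAP_POWER_REPORTING) {
        printf("pwr_status=%u pci_power=%u\n",
               (unsigned int)info->pwr_status, (unsigned int)info->pci_power);
        reported += 2;
    }
    if (subcaps & SUBCAP_LINK_PEER_MAX_SPEED) {
        printf("link_peer_max_speed=%u\n", (unsigned int)info->link_peer_max_speed);
        reported += 1;
    }
    return reported;
}
```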


Retrieving PCI Management Information

With a running doca_telemetry_pci context on a device that supports PCI management information, call doca_telemetry_pci_read_management_info() as often as needed. Each call returns the most recent data available.

The following is a more complete example:

doca_error_t result;

// Check for the ability to read management info
result = doca_telemetry_pci_cap_management_info_is_supported(devinfo);
if (result != DOCA_SUCCESS) {
    // Capability is not supported or an error occurred, stop
}

// Check any sub-capabilities if you require those fields

// Create PCI telemetry
struct doca_telemetry_pci *pci_telem;
result = doca_telemetry_pci_create(dev, &pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to create telemetry instance
}

// Start PCI telemetry
result = doca_telemetry_pci_start(pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to start telemetry instance
}

// Read management info
struct doca_telemetry_pci_dpn dpn = {0, 0, 0};
struct doca_telemetry_pci_management_info management_info = {0};

result = doca_telemetry_pci_read_management_info(pci_telem, dpn, &management_info);
if (result != DOCA_SUCCESS) {
    // Handle failure to read data
}

// Use the data

// Cleanup
doca_telemetry_pci_stop(pci_telem);
doca_telemetry_pci_destroy(pci_telem);


Retrieving Performance Counters Group 1

With a running doca_telemetry_pci context on a device that supports performance counters group 1, call doca_telemetry_pci_read_perf_counters_1() as often as needed. Each call returns the most recent data available.

The following is a more complete example:

doca_error_t result;

// Check for the ability to read perf counters group 1
result = doca_telemetry_pci_cap_perf_counters_1_is_supported(devinfo);
if (result != DOCA_SUCCESS) {
    // Capability is not supported or an error occurred, stop
}

// Check any sub-capabilities if you require those fields

// Create PCI telemetry
struct doca_telemetry_pci *pci_telem;
result = doca_telemetry_pci_create(dev, &pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to create telemetry instance
}

// Start PCI telemetry
result = doca_telemetry_pci_start(pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to start telemetry instance
}

// Read perf counters group 1
struct doca_telemetry_pci_dpn dpn = {0, 0, 0};
struct doca_telemetry_pci_perf_counters_1 counters = {0};

result = doca_telemetry_pci_read_perf_counters_1(pci_telem, dpn, &counters);
if (result != DOCA_SUCCESS) {
    // Handle failure to read data
}

// Use the data

// Cleanup
doca_telemetry_pci_stop(pci_telem);
doca_telemetry_pci_destroy(pci_telem);
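
Since each read returns the latest counter values, a rate can be derived from the difference between two consecutive reads. The helpers below are a minimal sketch under the assumption that a counter is a free-running, monotonically increasing 64-bit unsigned value. This guide does not state that, so confirm it for each field before relying on it. The names counter_delta() and counter_rate_per_sec() are illustrative and not part of DOCA.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two samples of a free-running 64-bit counter.
 * Unsigned subtraction remains correct across a single wraparound. */
static uint64_t counter_delta(uint64_t prev, uint64_t curr)
{
    return curr - prev;
}

/* Events per second over a sampling interval given in nanoseconds.
 * Returns 0.0 for an empty interval instead of dividing by zero. */
static double counter_rate_per_sec(uint64_t prev, uint64_t curr, uint64_t interval_ns)
{
    if (interval_ns == 0)
        return 0.0;
    return (double)counter_delta(prev, curr) * 1e9 / (double)interval_ns;
}
```

In a polling loop, the caller would keep the previous sample and a timestamp from a monotonic clock, then pass both samples and the elapsed time to counter_rate_per_sec().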


Retrieving Performance Counters Group 2

With a running doca_telemetry_pci context on a device that supports performance counters group 2, call doca_telemetry_pci_read_perf_counters_2() as often as needed. Each call returns the most recent data available.

The following is a more complete example:

doca_error_t result;

// Check for the ability to read perf counters group 2
result = doca_telemetry_pci_cap_perf_counters_2_is_supported(devinfo);
if (result != DOCA_SUCCESS) {
    // Capability is not supported or an error occurred, stop
}

// Create PCI telemetry
struct doca_telemetry_pci *pci_telem;
result = doca_telemetry_pci_create(dev, &pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to create telemetry instance
}

// Start PCI telemetry
result = doca_telemetry_pci_start(pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to start telemetry instance
}

// Read perf counters group 2
struct doca_telemetry_pci_dpn dpn = {0, 0, 0};
struct doca_telemetry_pci_perf_counters_2 counters = {0};

result = doca_telemetry_pci_read_perf_counters_2(pci_telem, dpn, &counters);
if (result != DOCA_SUCCESS) {
    // Handle failure to read data
}

// Use the data

// Cleanup
doca_telemetry_pci_stop(pci_telem);
doca_telemetry_pci_destroy(pci_telem);


Retrieving Latency Histogram

With a running doca_telemetry_pci context on a device that supports the latency histogram, first call doca_telemetry_pci_get_latency_histogram_dimensions() to learn the dimensions of the histogram, then allocate an array large enough to hold the bucket values. After that, call doca_telemetry_pci_read_latency_histogram() as often as needed. Each call returns the most recent data available.

The following is a more complete example:

// Requires <stdint.h> for the fixed-width types and <stdlib.h> for malloc()/free()
doca_error_t result;

// Check for the ability to read the latency histogram
result = doca_telemetry_pci_cap_latency_histogram_is_supported(devinfo);
if (result != DOCA_SUCCESS) {
    // Capability is not supported or an error occurred, stop
}

// Create PCI telemetry
struct doca_telemetry_pci *pci_telem;
result = doca_telemetry_pci_create(dev, &pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to create telemetry instance
}

// Start PCI telemetry
result = doca_telemetry_pci_start(pci_telem);
if (result != DOCA_SUCCESS) {
    // Handle failure to start telemetry instance
}

// Learn the histogram's dimensions
struct doca_telemetry_pci_dpn dpn = {0, 0, 0};
uint32_t bucket_count;
uint32_t bucket_width_ns;
result = doca_telemetry_pci_get_latency_histogram_dimensions(pci_telem, dpn, &bucket_count, &bucket_width_ns);
if (result != DOCA_SUCCESS) {
    // Handle failure to get histogram dimensions
}

// Allocate memory to hold histogram data
uint64_t *buckets_arr = malloc(bucket_count * sizeof(uint64_t));
if (buckets_arr == NULL) {
    // Handle failure to allocate memory
}

// Fetch histogram data
result = doca_telemetry_pci_read_latency_histogram(pci_telem, dpn, buckets_arr);
if (result != DOCA_SUCCESS) {
    // Handle failure to read data
}

// Use the data

// Cleanup
free(buckets_arr);
doca_telemetry_pci_stop(pci_telem);
doca_telemetry_pci_destroy(pci_telem);
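
Once the bucket array is filled, a common next step is to summarize it, for example as a latency percentile. The sketch below assumes that bucket i counts operations whose latency falls in the range [i * bucket_width_ns, (i + 1) * bucket_width_ns). This guide does not specify the bucket layout, so verify that assumption against the API reference. The function names are illustrative and not part of DOCA.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of all bucket counts. */
static uint64_t histogram_total(const uint64_t *buckets, uint32_t bucket_count)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < bucket_count; i++)
        total += buckets[i];
    return total;
}

/* Upper edge, in nanoseconds, of the bucket that contains the requested
 * percentile (0 < pct <= 100). Returns 0 for an empty histogram.
 * ASSUMPTION: bucket i covers [i * width, (i + 1) * width) nanoseconds.
 * Overflow of total * pct is not handled; acceptable for a sketch. */
static uint64_t histogram_percentile_ns(const uint64_t *buckets, uint32_t bucket_count,
                                        uint32_t bucket_width_ns, unsigned int pct)
{
    assert(pct > 0 && pct <= 100);
    uint64_t total = histogram_total(buckets, bucket_count);
    if (total == 0)
        return 0;

    /* Smallest sample rank that reaches pct percent of the total (ceiling). */
    uint64_t target = (total * pct + 99) / 100;
    uint64_t seen = 0;
    for (uint32_t i = 0; i < bucket_count; i++) {
        seen += buckets[i];
        if (seen >= target)
            return ((uint64_t)i + 1) * bucket_width_ns;
    }
    return (uint64_t)bucket_count * bucket_width_ns;
}
```

The result is an upper bound with a resolution of one bucket width, which is usually enough to spot a shift in tail latency between two reads.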


DOCA Telemetry PCI supports only CPU-based datapaths.

© Copyright 2025, NVIDIA. Last updated on Sep 4, 2025.