DOCA Telemetry Diagnostics
This guide provides instructions on building and developing applications that collect telemetry information provided by the NVIDIA® BlueField® and NVIDIA® ConnectX® families of networking platforms.
The doca_telemetry_diag library provides programmable access to an on-device mechanism that samples diagnostic data (such as statistics and counters). It allows configuring parameters such as the required data IDs and the sampling period, and retrieving the generated information in several formats.
Diagnostic data is stored in hardware as a cyclic buffer of samples. Each sample represents all the requested diagnostic data IDs and their corresponding sampling timestamps. The sampling period and the number of samples in the buffer can be configured.
The DOCA Telemetry Diagnostics library supports the following operational methods:
Single sampling – the samples are stored and once the samples buffer is filled, sampling is terminated
Repetitive sampling – when the sample buffer is filled, new samples override old samples
On demand – the device does not collect samples. Upon query of the diagnostic data, the device fetches a single sample of the data.
Samples are retrieved by calling the doca_telemetry_diag_query_counters function. Multiple samples can be retrieved in a single call. The application defines the maximum number of samples it wishes to retrieve and supplies a buffer large enough to contain them (the sample size can be obtained through a dedicated API). The library retrieves only new samples, without duplicates, and returns fewer samples than requested if no more new samples are available.
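The retrieval behavior described above can be illustrated with a small model. The following Python sketch is not the DOCA API: the SampleBuffer class and its methods are hypothetical names used only to show how a cyclic buffer with a read cursor returns each sample at most once, and how single and repetitive sampling differ once the buffer is full.

```python
from collections import deque


class SampleBuffer:
    """Toy model of a cyclic sample buffer (hypothetical, not the DOCA API)."""

    def __init__(self, capacity, repetitive=True):
        self.capacity = capacity
        self.repetitive = repetitive
        self.samples = deque()   # (sequence number, payload) pairs
        self.next_seq = 0        # sequence number given to the next written sample
        self.read_seq = 0        # lowest sequence number not yet returned

    def write(self, payload):
        """Store one sample; return False if single sampling has terminated."""
        if len(self.samples) == self.capacity:
            if not self.repetitive:
                return False         # single sampling: stop once the buffer is full
            self.samples.popleft()   # repetitive sampling: overwrite the oldest
        self.samples.append((self.next_seq, payload))
        self.next_seq += 1
        return True

    def query(self, max_samples):
        """Return up to max_samples payloads that were not returned before."""
        fresh = [(seq, p) for seq, p in self.samples if seq >= self.read_seq]
        batch = fresh[:max_samples]
        if batch:
            self.read_seq = batch[-1][0] + 1
        return [p for _, p in batch]
```

In this model, a reader that queries less often than the buffer wraps in repetitive mode simply misses the overwritten samples, which is one reason to size the buffer and the read interval together.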
Synchronized Start
Diagnostics data is sampled by the device every given sampling period. When sampling this way, each data entry in a sample may be recorded at a slightly different time.
Synchronized start mode enables diagnostics counters to begin all data measurements at the same time (i.e., during the same clock cycle). This way, the sample period is guaranteed to be identical for all samples.
In synchronized start mode, counters are stopped during the collection time of each sample.
Not all data IDs can be sampled in synchronized start mode. If setting a data ID fails with the error code DOCA_ERROR_BAD_CONFIG, the given data ID does not support synchronized start mode.
Synchronized start diagnostic counters can be cleared at the beginning of each sampling period.
The following diagrams illustrate how synchronized start affects the sampling timeline:
Output Formats
doca_telemetry_diag supports the following layout modes of the sampled data:
Mode 0 – data_id is present in the output; data size is 64 bits; timestamp information per data
Mode 1 – no data_id in the output; data size is 64 bits; timestamp information per sample (start and end)
Mode 2 – no data_id in the output; data size is 32 bits; timestamp information per sample (start and end)
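The differences between the three modes can be summarized in a small lookup table. The Python sketch below only restates the properties listed above. OutputMode, OUTPUT_MODES and value_bytes_per_sample are hypothetical names, and the byte count covers the data values alone, because the sizes of the timestamp and data_id fields are not stated here. The real sample size should be taken from the library's dedicated API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    has_data_id: bool      # whether data_id appears in the output
    data_bits: int         # width of each data value
    timestamp_scope: str   # "data" (per data entry) or "sample" (start and end)


# Properties taken from the mode list above; names are illustrative only.
OUTPUT_MODES = {
    0: OutputMode(has_data_id=True, data_bits=64, timestamp_scope="data"),
    1: OutputMode(has_data_id=False, data_bits=64, timestamp_scope="sample"),
    2: OutputMode(has_data_id=False, data_bits=32, timestamp_scope="sample"),
}


def value_bytes_per_sample(mode, num_data_ids):
    """Bytes used by the data values only (timestamps and IDs excluded)."""
    return num_data_ids * OUTPUT_MODES[mode].data_bits // 8
```

For ten data IDs, the values occupy 80 bytes in mode 1 and 40 bytes in mode 2; the trade-off is that 32-bit values can represent a smaller range.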
The sample layout of these modes is illustrated in the following diagrams:
Device and Ownership
doca_telemetry_diag requires a ConnectX/BlueField DOCA device to sample from. The device can be accessed using any of its physical functions (PFs). If multiple devices exist in a setup, a doca_telemetry_diag context should be created for each device.
doca_telemetry_diag is designed to operate as a singleton per device. Upon creation, the doca_telemetry_diag context assumes control of the associated hardware resources to prevent conflicts and ensure accurate data sampling. In rare instances, ownership may need to be overridden (e.g., if a process crashed before releasing ownership). In that case, the force_ownership parameter may be used when creating the context from a second process.
Once ownership is enforced for one PF, it cannot be claimed by a different PF. It is recommended to always use PF0 to prevent potential conflicts.
State Machine
The doca_telemetry_diag context goes through the following states as it is being set up:
Idle – context is created. Ownership is taken. Capabilities can be queried. All configuration setters should be called except for configuring data IDs.
Configured – after calling apply_configuration. Internal initialization is called based on the applied configuration. Data IDs should be configured.
Ready – after setting the data IDs. Context is ready to start sampling.
Running – samples are generated and can be retrieved.
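The state progression above can be modeled as a small transition table. This Python sketch is a model of the documented states only, not the DOCA API: apply_configuration is the step named above, while "set_data_ids" and "start" are hypothetical labels for the steps that lead to Ready and Running.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    CONFIGURED = "configured"
    READY = "ready"
    RUNNING = "running"


# (current state, action) -> next state; action labels are illustrative.
TRANSITIONS = {
    (State.IDLE, "apply_configuration"): State.CONFIGURED,
    (State.CONFIGURED, "set_data_ids"): State.READY,
    (State.READY, "start"): State.RUNNING,
}


class DiagContextModel:
    """Tracks which setup step is legal next (a model, not a DOCA context)."""

    def __init__(self):
        self.state = State.IDLE

    def fire(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise RuntimeError(f"{action!r} is not valid in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

Calling fire("start") on a fresh model raises an error, mirroring the requirement that the configuration is applied and the data IDs are set before sampling begins.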
Data IDs
The on-device mechanism provides the following diagnostic data classes:
Counter – monotonically increasing and counting different events in the device.
If doca_telemetry_diag_set_data_clear is set, the counters are cleared at the beginning of each sampling period (valid only if synchronized start mode is used and operational mode is set to single or repetitive sampling).
Statistic – other collected diagnostic data about the performance of the device. Statistic diagnostic data is cleared on each sample.
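Because counters are monotonically increasing, the activity within one sampling period is the difference between consecutive readings, unless the counters are cleared at the start of each period. The Python sketch below illustrates this. The function name is hypothetical and counter wraparound is ignored for simplicity.

```python
def per_period_increments(readings, cleared_each_period=False):
    """Convert successive counter readings into per-period increments.

    readings: counter values from consecutive samples, oldest first.
    cleared_each_period: True if the counter is cleared at the beginning of
    each sampling period, in which case every reading already covers exactly
    one period and is returned unchanged.
    """
    if cleared_each_period:
        return list(readings)
    # Monotonic counter: each increment is the delta to the previous reading.
    return [later - earlier for earlier, later in zip(readings, readings[1:])]
```

With readings of 100, 150 and 225 from a non-cleared counter, the increments are 50 and 75; the first reading only serves as the baseline.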
Each diagnostic data is represented by a unique identifier, the data ID. Appendix "List of Supported Data IDs" lists the currently supported data IDs.
After applying the configuration, the list of data IDs to be sampled can be applied by calling doca_telemetry_diag_apply_counters_list_by_id. Not all combinations of data IDs can be configured. If any of the data IDs fails to be configured, the operation fails and returns the index of the failed data ID and the reason for the failure. The operation can be retried after omitting the faulty data ID.
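The retry-after-omission approach can be sketched as a loop. The Python code below is a model, not the DOCA API: try_apply is a hypothetical callback standing in for the real apply call, assumed to return None on success or a (failed_index, reason) pair on failure, and fake_apply is a stub used only to exercise the loop.

```python
def apply_with_retry(data_ids, try_apply):
    """Apply a data ID list, dropping each rejected ID and retrying.

    Returns (accepted_ids, rejected), where rejected is a list of
    (data_id, reason) pairs in the order they were rejected.
    """
    remaining = list(data_ids)
    rejected = []
    while remaining:
        failure = try_apply(remaining)
        if failure is None:
            break                      # the whole list was accepted
        index, reason = failure
        rejected.append((remaining.pop(index), reason))
    return remaining, rejected


def fake_apply(ids, unsupported=frozenset({0x2, 0x4})):
    """Stub: rejects the first ID found in a made-up 'unsupported' set."""
    for index, data_id in enumerate(ids):
        if data_id in unsupported:
            return index, "unsupported"
    return None
```

Recording the rejected IDs together with their reasons lets the application report which requested data was not sampled instead of silently dropping it.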
This section describes a telemetry diagnostics sample based on the doca_telemetry_diag library. The sample illustrates the utilization of DOCA telemetry diagnostics APIs to initialize and configure the doca_telemetry_diag context, as well as querying and parsing diagnostic counters.
Sample usage:
Usage: doca_telemetry_diag [DOCA Flags] [Program Flags]
DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse all command flags from an input json file
Program Flags:
  -p, --pci-addr                    DOCA device PCI device address
  -o, --output                      Output CSV file - default: "/tmp/out.csv"
  -rt, --sample-run-time            Total sample run time, in seconds
  -sp, --sample-period              Sample period, in nanoseconds
  -ns, --log-num-samples            Log max number of samples
  -sr, --max-samples-per-read       Max num samples per read
  -sm, --sync-mode                  Enable sync mode
List of Supported Data IDs
The following table lists the data IDs currently supported by DOCA:
Name | Description | Data Class | Data ID
port_rx_bytes | The number of received bytes on the physical port | Counter | 0x10200001000000XX
port_priority_rx_bytes | The number of received bytes on the physical port and priority | Counter | 0x1020000200000YXX
port_rx_packets | The number of received packets on the physical port | Counter | 0x10200003000000XX
port_priority_rx_packets | The number of received packets on the physical port and priority | Counter | 0x1020000400000YXX
port_rx_discard_buf_packets | The number of received packets dropped due to lack of buffers on a physical port | Counter | 0x10200005000000XX
port_priority_rx_pauses_packets | The number of link-layer pause packets received on a physical port and priority | Counter | 0x1020000600000YXX
host_rx_transport_out_of_buffer_packets | The number of dropped packets due to a lack of WQE for the associated QPs/RQs (excluding hairpin QPs/RQs) | Counter | 0x10800002000000XX
host_rx_transport_out_of_buffer_hairpin_packets | The number of dropped packets due to a lack of WQE for the associated hairpin QPs/RQs | Counter | 0x10800003000000XX
port_rx_transport_ecn_packets | The number of RoCEv2 packets received by the notification point that were marked as having experienced congestion (i.e., ECN bits 11 on the ingress RoCE traffic), per port | Counter | 0x10800004000000XX
port_rx_transport_cnp_handled_packets | The number of received CNP packets handled by the reaction point, per port | Counter | 0x10800005000000XX
port_tx_transport_cnp_sent_packets | The number of CNP packets sent by the notification point, per port | Counter | 0x11000001000000XX
tx_transport_done_due_to_cc_deschedule_events | The number of QPs descheduled due to congestion control rate limitation | Counter | 0x1100000200000000
port_tx_bytes | The number of transmitted bytes on the physical port (excluding loopback traffic) | Counter | 0x11400001000000XX
port_priority_tx_bytes | The number of transmitted bytes on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000200000YXX
port_tx_packets | The number of transmitted packets on the physical port (excluding loopback traffic) | Counter | 0x11400003000000XX
port_priority_tx_packets | The number of transmitted packets on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000400000YXX
port_priority_tx_pauses_packets | The number of link-layer pause packets transmitted on a physical port and priority | Counter | 0x1140000500000YXX (XX - Port ID)
pcie_link_inbound_bytes | The number of bytes received from the PCIe toward the device, per PCIe link | Counter | 0x1160000100ZZYYXX
pcie_link_outbound_bytes | The number of bytes transmitted from the device toward the PCIe, per PCIe link | Counter | 0x1160000200ZZYYXX
pcie_link_inbound_data_bytes | The number of data bytes received from the PCIe (excluding headers) toward the device, per PCIe link | Counter | 0x1160000300ZZYYXX
pcie_link_outbound_data_bytes | The number of data bytes transmitted from the device toward the PCIe (excluding headers), per PCIe link | Counter | 0x1160000400ZZYYXX
pcie_link_write_stalled_time_no_posted_data_credits_ns | The time period (in nanoseconds) in which the device had outbound posted write requests but stalled due to insufficient data credits, per PCIe link | Counter | 0x1160000500ZZYYXX
pcie_link_write_stalled_time_no_posted_header_credits_ns | The time period (in nanoseconds) in which the device had outbound posted write requests but stalled due to insufficient header credits, per PCIe link | Counter | 0x1160000600ZZYYXX
pcie_link_read_stalled_time_no_non_posted_data_credits_ns | The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to insufficient data credits, per PCIe link | Counter | 0x1160000700ZZYYXX
pcie_link_read_stalled_time_no_non_posted_header_credits_ns | The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to insufficient header credits, per PCIe link | Counter | 0x1160000800ZZYYXX
pcie_link_read_stalled_time_no_completion_buffers_ns | The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to no NIC completion buffers, per PCIe link | Counter | 0x1160000900ZZYYXX
pcie_link_tclass_read_stalled_time_ordering_ns | The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to PCIe ordering semantics, per PCIe link and PCIe tclass | Counter | 0x1160000aZZZZYYXX
pcie_link_latency_total_read_ns | The total latency (in nanoseconds) for all PCIe reads from the device, per PCIe link. Dividing this counter by pcie_link_latency_total_read_packets yields the average PCIe read latency. | Counter | 0x1160000b00ZZYYXX
pcie_link_latency_total_read_packets | The total number of packets used for the pcie_link_latency_total_read_ns calculation | Counter | 0x1160000c00ZZYYXX
pcie_link_latency_max_read_ns | The maximum latency (in nanoseconds) for a single PCIe read from the device, per PCIe link | Statistic | 0x1160000d00ZZYYXX
pcie_link_latency_min_read_ns | The minimum latency (in nanoseconds) for a single PCIe read from the device, per PCIe link | Statistic | 0x1160000e00ZZYYXX
global_completion_engine_rx_cqes | Number of responder (RX) CQEs | Counter | 0x10c0000100000000
function_completion_engine_rx_cqes | Number of RX CQEs per function | Counter | 0x10c000020000XXXX
global_completion_engine_tx_cqes | Number of requestor (TX) CQEs | Counter | 0x10c0000400000000
function_completion_engine_tx_cqes | Number of TX CQEs per function | Counter | 0x10c000050000XXXX
global_icmc_request | Number of accesses to ICMC | Counter | 0x1180000100000000
global_icmc_hit | Number of ICMC hits | Counter | 0x1180000200000000
global_icmc_miss | Number of ICMC misses | Counter | 0x1180000300000000
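The placeholder digits in the table can be filled in with simple bit operations, and the two latency counters combine into an average. The Python sketch below assumes that each placeholder letter stands for one hexadecimal digit, that XX is the port ID (as noted in the table), and that Y is the priority, which is inferred from the counter names rather than stated. All function and constant names are hypothetical.

```python
# Table templates with the placeholder digits set to zero.
PORT_RX_BYTES_BASE = 0x1020000100000000            # 0x10200001000000XX
PORT_PRIORITY_RX_BYTES_BASE = 0x1020000200000000   # 0x1020000200000YXX


def port_data_id(base, port):
    """Fill the XX digits (lowest byte) with the port ID."""
    if not 0 <= port <= 0xFF:
        raise ValueError("port must fit in two hex digits")
    return base | port


def port_priority_data_id(base, port, priority):
    """Fill XX with the port ID and Y (assumed: priority) with one hex digit."""
    if not 0 <= port <= 0xFF:
        raise ValueError("port must fit in two hex digits")
    if not 0 <= priority <= 0xF:
        raise ValueError("priority must fit in one hex digit")
    return base | (priority << 8) | port


def average_read_latency_ns(total_read_ns, total_read_packets):
    """pcie_link_latency_total_read_ns / pcie_link_latency_total_read_packets."""
    if total_read_packets == 0:
        return None   # no reads were measured, so no average exists
    return total_read_ns / total_read_packets
```

For example, port_data_id(PORT_RX_BYTES_BASE, 1) gives 0x1020000100000001. When the counters are not cleared between samples, the average for a single period should be computed from the increments of both counters over that period rather than from their raw values.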
Currently, the doca_telemetry_diag library is supported at alpha level and is intended to allow developers to start testing applications that use it.
The following table lists the currently known limitations:
# | Item | Limitation
1 | Output format | Only DOCA_TELEMETRY_DIAG_OUTPUT_FORMAT_1 is supported.
2 | Sample mode | Only DOCA_TELEMETRY_DIAG_SAMPLE_MODE_REPETITIVE is supported.