Date: 2025-10-09
On This Page
- 1. Introduction
- 2. Prerequisites
- 3. Environment
- 4. Architecture
- 5. Configuration Phase
- 6. Execution Phase
- 7. State Machine
- 8. Alternative Datapath Options
- 9. DOCA Telemetry Diagnostics Sample
- 10. Appendix - List of Supported Data IDs
DOCA Telemetry
This guide provides instructions on building and developing applications which require collecting telemetry information provided by NVIDIA® BlueField and NVIDIA® ConnectX® families of networking platforms.
The doca_telemetry_diag library provides programmable access to an on-device mechanism that samples diagnostic data (such as statistics and counters). It allows configuring parameters such as the required data IDs and the sampling period, and retrieving the generated information in several formats.
To use DOCA Telemetry Diagnostics, ensure that the following prerequisites are met:
fwctl, an OFED subsystem, must be installed. Refer to NVIDIA MLNX_OFED Documentation v24.07-0.6.1.0 | Changes and New Features for more information.
The system must have a minimum firmware version of XX.43.1000 (XX is the hardware type: 28 for ConnectX-7, 32 for BlueField-3).
DOCA telemetry-based applications can run either on the host machine (ConnectX-7 or BlueField-3 and above) or on the DPU target (NVIDIA® BlueField®-3 and above).
DOCA telemetry can only be run with the DPU configured in DPU mode, as described in NVIDIA BlueField Modes of Operation.
Diagnostic data is stored in hardware as a cyclic buffer of samples. Each sample represents the values of all the requested diagnostic data IDs and their corresponding sampling timestamps. The sampling period and the number of samples in the buffer can be configured.
The DOCA Telemetry Diagnostics library supports the following operational methods:
Single sampling – the samples are stored and once the samples buffer is filled, sampling is terminated
Repetitive sampling – when the sample buffer is filled, new samples override old samples
On demand – the device does not collect samples. Upon each query of the diagnostic data, the device fetches a single sample of the data.
Samples are retrieved by calling the doca_telemetry_diag_query_counters function. Multiple samples can be retrieved in a single call. The application defines the maximum number of samples it wishes to retrieve and supplies a buffer large enough to contain these samples (the sample size can be obtained using a dedicated API). The library retrieves only new samples, without duplicates, and returns fewer samples than requested if no more new samples are available.
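The difference between single and repetitive sampling can be illustrated with a toy model. The sketch below is plain Python and does not call the DOCA API; the class name ToySampleBuffer and the integer "samples" are illustrative assumptions only.

```python
from collections import deque


class ToySampleBuffer:
    """Toy model of the device sample buffer (not the DOCA API)."""

    def __init__(self, capacity, repetitive):
        self.capacity = capacity
        self.repetitive = repetitive
        # In repetitive mode, deque(maxlen=...) drops the oldest entry when full.
        self.samples = deque(maxlen=capacity if repetitive else None)

    def push(self, sample):
        """Store a sample; return False if sampling has terminated."""
        if not self.repetitive and len(self.samples) >= self.capacity:
            return False  # single mode: buffer is full, sampling stops
        self.samples.append(sample)
        return True


single = ToySampleBuffer(capacity=4, repetitive=False)
repetitive = ToySampleBuffer(capacity=4, repetitive=True)
for i in range(6):
    single.push(i)
    repetitive.push(i)

print(list(single.samples))      # [0, 1, 2, 3]
print(list(repetitive.samples))  # [2, 3, 4, 5]
```

In the single case the last two samples are never stored; in the repetitive case the two oldest samples are overwritten, which is why the retrieval rate matters in that mode.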
4.1 Sampling Period
The sampling period can be configured using doca_telemetry_diag_set_sample_period. In some cases, depending on the number and type of data IDs configured, the actual sample period may be longer than the requested one. The actual sample period can be queried using doca_telemetry_diag_get_sample_period after configuring the data IDs.
4.2 Considerations for Repetitive Sampling Mode
When configuring the DOCA Telemetry Diagnostics library to repetitive sampling, it is important to ensure that the buffer is adequately sized to handle the data flow between hardware sampling and software retrieval.
To ensure smooth data processing and prevent data loss, the buffer should be large enough to accommodate at least twice the average number of samples collected during the retrieval period.
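Determine the sampling rates:
Hardware sampling rate – the frequency at which the hardware collects data (e.g., every 100 µsec)
Software retrieval rate – the average time interval between successive data retrievals by the software (e.g., every 500 msec)
Calculate AverageSamplesPerRetrieval using the following equation:
$$\text{AverageSamplesPerRetrieval} = \frac{\text{software retrieval interval}}{\text{hardware sampling interval}}$$
For example:
$$\frac{500\ \text{msec}}{100\ \mu\text{sec}} = 5000\ \text{samples}$$
To ensure reliability, configure the buffer to hold at least twice the average number of samples per retrieval:
$$\text{number of samples in the buffer} \geq 2 \times \text{AverageSamplesPerRetrieval}$$
For example:
$$2 \times 5000 = 10000\ \text{samples}$$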
Moreover, the number of samples in the buffer should be increased if the time between retrievals may occasionally spike. For example, if the time between retrieval calls can reach up to 6 times the average, then the number of samples should be multiplied by 6+1=7.
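The sizing rule above can be sketched as a small calculation. This is a minimal sketch, not part of the DOCA API: the function name and parameters are hypothetical, and it assumes the 6+1 rule means multiplying the average number of samples per retrieval by (worst-case factor + 1), which reduces to the factor of 2 when retrieval gaps never exceed the average.

```python
def required_buffer_samples(sample_period_ns, retrieval_interval_ns, worst_case_factor=1):
    """Return the minimum number of samples the buffer should hold.

    worst_case_factor is how many times longer than the average interval
    a retrieval gap may get (1 means gaps never exceed the average).
    """
    if sample_period_ns <= 0 or retrieval_interval_ns <= 0 or worst_case_factor < 1:
        raise ValueError("periods must be positive and worst_case_factor >= 1")
    # Average number of samples produced between two retrievals, rounded up.
    average_samples = -(-retrieval_interval_ns // sample_period_ns)
    return (worst_case_factor + 1) * average_samples


# Values from the text: 100 usec sampling period, 500 msec retrieval interval.
print(required_buffer_samples(100_000, 500_000_000))     # 10000
print(required_buffer_samples(100_000, 500_000_000, 6))  # 35000
```

Integer nanoseconds are used instead of floating-point seconds so that the division is exact.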
4.3 Synchronized Start
Diagnostics data is sampled by the device every given sampling period. When sampling this way, each data entry in a sample may be recorded at a slightly different time.
Synchronized start mode enables diagnostics counters to begin all data measurements at the same time (i.e., during the same clock cycle). This way, the sample period is guaranteed to be identical for all samples. Synchronized start diagnostic counters can be configured to be cleared at the beginning of each sampling period.
Not all data IDs can be sampled in synchronized start mode. See section "Data IDs" for additional details.
The following diagrams illustrate how synchronized start affects the sampling timeline:
In synchronized start mode, counters are stopped during the collection time of each sample (illustrated in red in the diagram). If the application is required to normalize the counter to time, the actual sample period should be taken into account.
For example, if the global_icmc_hit (GIH) counter is sampled and the sample period is 100 µsec, then the global_icmc_hit per second should be calculated as follows:
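A minimal sketch of normalizing a counter to time, under stated assumptions: the counter value is divided by the duration of the window in which it was actually counting, rather than by the nominal sample period. How that duration is obtained is not specified here; the sketch assumes it is available in nanoseconds (for example, from the per-sample start and end timestamps, which is an assumption). The function name and the numeric values are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def counter_per_second(counter_value, measured_window_ns):
    """Scale a counter collected over measured_window_ns to a per-second rate."""
    if measured_window_ns <= 0:
        raise ValueError("measured_window_ns must be positive")
    return counter_value * NANOSECONDS_PER_SECOND / measured_window_ns


# Hypothetical values: 5000 hits counted over a full 100 usec window.
print(counter_per_second(5000, 100_000))  # 50000000.0

# If the counter was stopped for part of the period, the effective window
# is shorter and the normalized rate is correspondingly higher.
assert counter_per_second(5000, 95_000) > counter_per_second(5000, 100_000)
```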
4.4 Output Formats
doca_telemetry_diag supports the following layout modes of the sampled data:
Mode 0 – data_id is present in the output; data size is 64 bits; timestamp information per data entry
Mode 1 – no data_id in the output; data size is 64 bits; timestamp information per sample (start and end)
Mode 2 – no data_id in the output; data size is 32 bits; timestamp information per sample (start and end)
The order of the data IDs in the output is the same as the order in which the data IDs were applied using doca_telemetry_diag_apply_counters_list_by_id.
The sample layout of these modes is illustrated in the following diagrams:
4.5 Device and Ownership
doca_telemetry_diag requires a ConnectX/BlueField DOCA device to sample from. The device can be accessed using any of its physical functions (PFs). If multiple devices exist in a setup, a doca_telemetry_diag context should be created for each device.
doca_telemetry_diag is designed to operate as a singleton per device. Upon creation, the doca_telemetry_diag context assumes control of the associated hardware resources to prevent conflicts and ensure accurate data sampling. In rare instances, ownership may need to be overridden (e.g., if a process crashed before releasing ownership). In such cases, the force_ownership parameter may be used when creating the context from a second process.
Once ownership is enforced for one PF, it cannot be claimed by a different PF. It is recommended to always use PF0 to prevent potential conflicts.
4.6 Data IDs
The on-device mechanism provides the following diagnostic data classes:
Counter – monotonically increasing and counting different events in the device. If doca_telemetry_diag_set_data_clear is set, the counters are cleared at the beginning of each sampling period (valid only if synchronized start mode is used and the operational mode is set to single or repetitive sampling).
Statistic – other collected diagnostic data about the performance of the device. Statistic diagnostic data is cleared on each sample.
Each diagnostic data item is represented by a unique identifier, the data ID. Appendix "List of Supported Data IDs" lists the currently supported data IDs.
After applying the configuration, the list of data IDs to be sampled should be applied by calling doca_telemetry_diag_apply_counters_list_by_id.
Not all combinations of data IDs can be configured. If any of the data_ids fails to be configured, the operation fails, returning the index of the failed data ID and the reason for the failure. The operation can be retried after omitting the faulty data ID.
Not all data IDs support synchronized start mode. If synchronized start mode is configured and doca_telemetry_diag_apply_counters_list_by_id fails with error DOCA_ERROR_BAD_CONFIG, this indicates that the failed data ID does not support synchronized start mode.
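The retry-after-omitting pattern described above can be sketched as a loop. This is an illustration only: apply_fn is a hypothetical stand-in for the real apply call, assumed to return None on success or a (failed_index, reason) pair on failure; fake_apply and the 0xBAD data ID are invented for the demonstration and are not DOCA values.

```python
def apply_with_retry(data_ids, apply_fn):
    """Apply data_ids, dropping each rejected ID and retrying.

    Returns (applied_ids, rejected), where rejected is a list of
    (data_id, reason) pairs in the order they were dropped.
    """
    remaining = list(data_ids)
    rejected = []
    while remaining:
        failure = apply_fn(remaining)
        if failure is None:
            break  # the whole list was accepted
        failed_index, reason = failure
        rejected.append((remaining.pop(failed_index), reason))
    return remaining, rejected


def fake_apply(ids, unsupported=frozenset({0xBAD})):
    """Hypothetical stand-in: reject the first ID found in `unsupported`."""
    for index, data_id in enumerate(ids):
        if data_id in unsupported:
            return index, "unsupported"
    return None


applied, rejected = apply_with_retry(
    [0x1180000100000000, 0xBAD, 0x1180000200000000], fake_apply
)
print([hex(i) for i in applied])
print(rejected)
```

Because the output order follows the order in which the data IDs were applied, the sketch keeps the surviving IDs in their original relative order.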
The following section describes the states that the doca_telemetry_diag context goes through, how to move between states, and what is allowed in each state.
7.1 Idle
The context is idle and has ownership over the Diagnostics Data Registers interface.
In this state, the application is expected to do one of the following:
Destroy the context (this releases the ownership).
Apply a configuration.
Allowed operations:
Configuring the context according to Configurations
It is possible to reach this state as follows:
| Previous State | Transition Action |
| --- | --- |
| None | Create the context |
| Configured | Call stop |
| Ready | Call stop |
| Running | Call stop |
7.2 Configured
In this state, the application is expected to:
Apply the list of data IDs configuration.
Allowed operations:
Checking if a data ID is supported, using doca_telemetry_diag_check_data_id.
Calling stop.
It is possible to reach this state as follows:
| Previous State | Transition Action |
| --- | --- |
| Idle | Successfully apply the configuration |
7.3 Ready
All the necessary configuration has been applied and the context is ready to start sampling.
In this state, the application is expected to:
Start the context.
Allowed operations:
Calling stop.
It is possible to reach this state as follows:
| Previous State | Transition Action |
| --- | --- |
| Configured | Successfully apply the list of counters |
7.4 Running
In this state, samples are generated and can be retrieved.
In this state, the application is expected to:
Query the counters.
Allowed operations:
For 'Single' sample mode, restarting the context if needed.
Calling stop.
It is possible to reach this state as follows:
| Previous State | Transition Action |
| --- | --- |
| Ready | Successfully start the context |
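The transitions listed for the four states can be summarized as a lookup table. The sketch below is a model of the state machine as described in this section, not DOCA code; the state and action names are illustrative labels, and treating a restart in 'Single' sample mode as remaining in the Running state is an assumption.

```python
# (current state, action) -> next state, as described in sections 7.1 to 7.4.
TRANSITIONS = {
    ("idle", "apply_config"): "configured",
    ("configured", "apply_counters_list"): "ready",
    ("ready", "start"): "running",
    ("running", "restart"): "running",  # assumption: 'Single' sample mode only
    ("configured", "stop"): "idle",
    ("ready", "stop"): "idle",
    ("running", "stop"): "idle",
}


def next_state(state, action):
    """Return the state reached by taking `action` in `state`."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}") from None


state = "idle"  # state after creating the context
for action in ("apply_config", "apply_counters_list", "start", "stop"):
    state = next_state(state, action)
    print(action, "->", state)
```

Calling next_state("idle", "start") raises ValueError, reflecting that the context must be configured and given a list of data IDs before it can be started.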
DOCA Telemetry only supports datapath on the CPU.
This section describes a telemetry diagnostics sample based on the doca_telemetry_diag library. The sample illustrates how to use the DOCA Telemetry Diagnostics APIs to initialize and configure the doca_telemetry_diag context, and to query and parse diagnostic counters.
Sample usage:
Usage: doca_telemetry_diag [DOCA Flags] [Program Flags]
DOCA Flags:
-h, --help Print a help synopsis
-v, --version Print program version information
-l, --log-level Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
--sdk-log-level Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
-j, --json <path> Parse all command flags from an input json file
Program Flags:
-p, --pci-addr DOCA device PCI device address
-di, --data-ids Path to data ids JSON file
-o, --output Output CSV file - default: "/tmp/out.csv"
-rt, --sample-run-time Total sample run time, in seconds
-sp, --sample-period Sample period, in nanoseconds
-ns, --log-num-samples Log max number of samples
-sr, --max-samples-per-read Max num samples per read
-sm, --sample-mode sample mode (0 - single, 1 - repetitive, 2 - on demand)
-of, --output-format output format
-f, --force-ownership Force ownership when creating context
-e, --example-json-path Generate an example json file with the default data_ids to the given path and exit immediately. This file can be used as input later on. All other flags are ignored
The sample logic includes:
Locating a DOCA device.
Initializing and configuring the doca_telemetry_diag instance.
Applying a list of data IDs to sample (either from a source JSON file or the default data IDs).
Starting the doca_telemetry_diag instance.
Allocating a buffer according to the sample size and the number of desired samples.
Querying the actual sample time, after start.
Retrieving samples and writing the retrieved data to a *.csv file (either once or periodically).
Stopping the data IDs sampling.
Releasing all resources and destroying the context.
The sample can use data IDs given by the user using a JSON file. An example of the JSON file format can be created by using the "-e" flag on the sample, to export an example JSON file containing the default data IDs to a given path.
The following table lists the data IDs currently supported by DOCA:
| Description | Data Class | Data ID |
| --- | --- | --- |
| The number of received bytes on the physical port 1 | Counter | 0x10200001000000XX |
| The number of received bytes on the physical port and priority 1 | Counter | 0x1020000200000YXX |
| The number of received packets on the physical port 1 | Counter | 0x10200003000000XX |
| The number of received packets on the physical port and priority 1 | Counter | 0x1020000400000YXX |
| The number of received packets dropped due to lack of buffers on a physical port | Counter | 0x10200005000000XX |
| The number of link-layer pause packets received on a physical port and priority | Counter | 0x1020000600000YXX |
| The number of packets discarded due to no available data or descriptor buffers in the RX buffer, per host | Counter | 0x10400001000000XX |
| The number of packets that pass from the RX Transport to the Scatter engine, per host | Counter | 0x10800001000000XX |
| The number of dropped packets due to a lack of WQE for the associated QPs/RQs (excluding hairpin QPs/RQs) | Counter | 0x10800002000000XX |
| The number of dropped packets due to a lack of WQE for the associated hairpin QPs/RQs | Counter | 0x10800003000000XX |
| The number of RoCEv2 packets received by the notification point which were marked for experiencing the congestion (i.e., ECN bits | Counter | 0x10800004000000XX |
| The number of CNP received packets handled by the Reaction Point, per port | Counter | 0x10800005000000XX |
| The number of CNP packets sent by the Notification Point, per port | Counter | 0x11000001000000XX |
| The number of QP descheduled due to congestion control rate limitation | Counter | 0x1100000200000000 |
| The number of transmitted bytes on the physical port (excluding loopback traffic) | Counter | 0x11400001000000XX |
| The number of transmitted bytes on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000200000YXX |
| The number of transmitted packets on the physical port (excluding loopback traffic) | Counter | 0x11400003000000XX |
| The number of transmitted packets on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000400000YXX |
| The number of link-layer pause packets transmitted on a physical port and priority | Counter | 0x1140000500000YXX |

XX - Port ID

| Description | Data Class | Data ID |
| --- | --- | --- |
| The number of bytes received from the PCIe toward the device, per PCIe link | Counter | 0x1160000100ZZYYXX |
| The number of bytes transmitted from the device toward the PCIe, per PCIe link | Counter | 0x1160000200ZZYYXX |
| The number of data bytes received from the PCIe (excluding headers) toward the device, per PCIe link | Counter | 0x1160000300ZZYYXX |
| The number of data bytes transmitted from the device toward the PCI (excluding headers), per PCIe link | Counter | 0x1160000400ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound posted write requests but stalled due to insufficient data credits per PCIe link | Counter | 0x1160000500ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound posted write requests but stalled due to insufficient header credits per PCIe link | Counter | 0x1160000600ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to insufficient data credits per PCIe link | Counter | 0x1160000700ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to insufficient header credits per PCIe link | Counter | 0x1160000800ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to no NIC completion buffers per PCIe link | Counter | 0x1160000900ZZYYXX |
| The time period (in nanoseconds) in which the device had outbound non-posted read requests but stalled due to PCIe ordering semantics per PCIe link and PCIe tclass | Counter | 0x1160000aZZZZYYXX |
| The total latency (in nanoseconds) for all PCIe read from the device per PCIe link. Info: Dividing this counter by | Counter | 0x1160000b00ZZYYXX |
| The total number of packets used for the | Counter | 0x1160000c00ZZYYXX |
| The maximum latency (in nanoseconds) for a single PCIe read from the device per PCIe link | Statistic | 0x1160000d00ZZYYXX |
| The minimum latency (in nanoseconds) for a single PCIe read from the device per PCIe link | Statistic | 0x1160000e00ZZYYXX |
| Number of responder (RX) CQEs | Counter | 0x10c0000100000000 |
| Number of RX CQEs per function | Counter | 0x10c000020000XXXX |
| Number of requestor (TX) CQEs | Counter | 0x10c0000400000000 |
| Number of TX CQEs per function | Counter | 0x10c000050000XXXX |
| Number of accesses to ICMC | Counter | 0x1180000100000000 |
| Number of ICMC hits | Counter | 0x1180000200000000 |
| Number of ICMC misses | Counter | 0x1180000300000000 |
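The XX and Y placeholders in the per-port data ID templates stand for fields that the application fills in. The following is a minimal sketch of composing a concrete data ID, assuming (as the hex templates suggest, but the table does not state explicitly) that XX occupies the lowest byte and holds the port ID, and that Y is the next 4-bit nibble and holds the priority. The helper name port_data_id is hypothetical and not part of the DOCA API, and the valid range of port and priority values is not specified here.

```python
def port_data_id(template_base, port_id, priority=None):
    """Fill the XX (port) and optional Y (priority) fields of a data ID.

    template_base is the template with its placeholder digits set to zero,
    e.g. 0x1020000100000000 for 0x10200001000000XX.
    """
    if not 0 <= port_id <= 0xFF:
        raise ValueError("port_id must fit in one byte (XX)")
    data_id = template_base | port_id
    if priority is not None:
        if not 0 <= priority <= 0xF:
            raise ValueError("priority must fit in one hex digit (Y)")
        data_id |= priority << 8  # Y sits directly above the XX byte
    return data_id


# Received bytes on the physical port, template 0x10200001000000XX, port 1.
print(hex(port_data_id(0x1020000100000000, port_id=1)))
# -> 0x1020000100000001

# Received bytes per port and priority, template 0x1020000200000YXX.
print(hex(port_data_id(0x1020000200000000, port_id=1, priority=3)))
# -> 0x1020000200000301
```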