Diagnostic data is stored in hardware as a cyclic buffer of samples. Each sample represents the values of all the requested diagnostic data IDs and their corresponding sampling timestamps. The sampling period and the number of samples in the buffer can be configured.

The DOCA Telemetry Diagnostics library supports the following operational methods:

Single sampling – samples are stored until the sample buffer is full, at which point sampling terminates.

Repetitive sampling – when the sample buffer is full, new samples overwrite the oldest samples.

On demand – the device does not collect samples. Upon each query of the diagnostic data, the device fetches a single sample of the data.

Samples are retrieved by calling the doca_telemetry_diag_query_counters function. Multiple samples can be retrieved in a single call. The application defines the maximum number of samples it wishes to retrieve and supplies a buffer large enough to contain these samples (sample size can be obtained using a dedicated API). The library only retrieves new samples without duplications and returns fewer samples than requested if there are no more new samples.
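The buffer behavior described above can be illustrated with a small model. This is only a sketch of the semantics stated in the text (single sampling stops when the buffer is full, repetitive sampling overwrites the oldest samples, and a query returns only new samples). The class SampleRing and its methods are hypothetical and are not part of the DOCA API; on-demand mode is not modeled.

```python
class SampleRing:
    """Toy model of the device-side cyclic sample buffer (hypothetical, not DOCA)."""

    def __init__(self, capacity, repetitive):
        self.capacity = capacity
        self.repetitive = repetitive
        self.slots = [None] * capacity
        self.written = 0  # total samples stored by the "hardware"
        self.read = 0     # total samples already returned to the application

    def hw_sample(self, value):
        """Store one sample. Returns False once single sampling has terminated."""
        if not self.repetitive and self.written >= self.capacity:
            return False
        self.slots[self.written % self.capacity] = value
        self.written += 1
        return True

    def query(self, max_samples):
        """Return up to max_samples new samples, oldest first, without duplicates."""
        # In repetitive mode, samples overwritten before being read are lost.
        oldest = max(self.read, self.written - self.capacity)
        count = min(max_samples, self.written - oldest)
        out = [self.slots[(oldest + i) % self.capacity] for i in range(count)]
        self.read = oldest + count
        return out


# Repetitive sampling: 6 samples into a 4-slot buffer overwrite samples 0 and 1.
ring = SampleRing(capacity=4, repetitive=True)
for v in range(6):
    ring.hw_sample(v)
first = ring.query(3)   # [2, 3, 4]
second = ring.query(3)  # [5], fewer than requested because nothing else is new
```

In this model, the second query returns a single sample even though three were requested, which mirrors the statement that fewer samples are returned when no more new samples exist.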

The sampling period can be configured using doca_telemetry_diag_set_sample_period. In some cases, depending on the number and type of data IDs configured, the actual sample period may be longer than the configured one. The actual sample period can be queried using doca_telemetry_diag_get_sample_period after configuring the data IDs.

When configuring the DOCA Telemetry Diagnostics library for repetitive sampling, it is important to ensure that the buffer is adequately sized to handle the data flow between hardware sampling and software retrieval.

To ensure smooth data processing and prevent data loss, the buffer should be large enough to accommodate at least twice the average number of samples collected during the retrieval period.

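Determine the sampling rates:

Hardware sampling rate – the frequency at which the hardware collects data (e.g., every 100 µsec)

Software retrieval rate – the average time interval between successive data retrievals by the software (e.g., every 500 msec)

Calculate AverageSamplesPerRetrieval using the following equation:

$$\text{AverageSamplesPerRetrieval} = \frac{\text{software retrieval interval}}{\text{hardware sampling period}}$$

For example:

$$\text{AverageSamplesPerRetrieval} = \frac{500\ \text{msec}}{100\ \mu\text{sec}} = 5000$$

To ensure reliability, configure the buffer to hold at least twice the average number of samples per retrieval:

$$\text{BufferSize} \geq 2 \times \text{AverageSamplesPerRetrieval}$$

For example:

$$\text{BufferSize} \geq 2 \times 5000 = 10000\ \text{samples}$$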

Moreover, the number of samples in the buffer should be increased if the retrieval process may occasionally be delayed. For example, if the time between retrieval calls can reach up to 6 times the average, then the number of samples should be multiplied by 6+1=7.
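The sizing guidance above can be sketched as a short calculation. The function name min_buffer_samples and the parameter worst_case_ratio are hypothetical. The sketch also assumes that the 6+1=7 multiplier replaces the default factor of 2 (that is, 1+1) applied to the average number of samples per retrieval; this is one reading of the text, not a statement of the library's behavior.

```python
import math


def min_buffer_samples(hw_period_usec, sw_interval_usec, worst_case_ratio=1):
    """Estimate the minimum number of samples the buffer should hold.

    hw_period_usec   -- hardware sampling period, in microseconds
    sw_interval_usec -- average interval between retrievals, in microseconds
    worst_case_ratio -- longest expected retrieval interval divided by the
                        average (1 means retrieval never spikes)
    """
    if hw_period_usec <= 0 or sw_interval_usec <= 0 or worst_case_ratio < 1:
        raise ValueError("periods must be positive and worst_case_ratio >= 1")
    average_samples_per_retrieval = sw_interval_usec / hw_period_usec
    # Assumption: multiplier is (ratio + 1), i.e. 2 with no spikes, 7 for ratio 6.
    return math.ceil(average_samples_per_retrieval * (worst_case_ratio + 1))


# Values from the text: sampling every 100 usec, retrieving every 500 msec.
baseline = min_buffer_samples(100, 500_000)
with_spikes = min_buffer_samples(100, 500_000, worst_case_ratio=6)
```

With the example values from the text, the baseline is 10000 samples, and allowing for retrieval intervals of up to 6 times the average raises it to 35000 samples.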

Diagnostics data is sampled by the device every given sampling period. When sampling this way, each data entry in a sample may be recorded at a slightly different time.

Synchronized start mode enables diagnostics counters to begin all data measurements at the same time (i.e., during the same clock cycle). This way, the sample period is guaranteed to be identical for all samples. Synchronized start diagnostic counters can be configured to be cleared at the beginning of each sampling period.

Note Not all data IDs can be sampled in synchronized start mode. See section "Data IDs" for additional details.

The following diagrams illustrate how synchronized start affects the sampling timeline:

Note In synchronized start mode, counters are stopped during the collection time of each sample (illustrated in red in the diagram). If the application needs to normalize a counter to time, the actual sample period should be taken into account. For example, if the global_icmc_hit (GIH) counter is sampled and the sample period is 100 µsec, then global_icmc_hit per second should be calculated as follows:





doca_telemetry_diag supports the following layout modes of the sampled data:

Mode 0 – data_id is present in the output; data size is 64 bits; timestamp information per data entry

Mode 1 – no data_id in the output; data size is 64 bits; timestamp information per sample (start and end)

Mode 2 – no data_id in the output; data size is 32 bits; timestamp information per sample (start and end)

Note The order of the data IDs in the output is the same as the order in which the data IDs were applied using doca_telemetry_diag_apply_counters_list_by_id.
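The properties of the three layout modes, and the way values are matched to data IDs when data_id is absent from the output, can be summarized in a short sketch. The names LayoutMode, LAYOUT_MODES, modes_for, and label_sample are hypothetical, and the data ID values in the usage lines are placeholders. The sketch records only the properties listed above and does not describe the byte layout of a sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutMode:
    has_data_id: bool
    data_bits: int
    timestamp_scope: str


# Properties taken from the mode list above.
LAYOUT_MODES = {
    0: LayoutMode(True, 64, "per data entry"),
    1: LayoutMode(False, 64, "per sample (start and end)"),
    2: LayoutMode(False, 32, "per sample (start and end)"),
}


def modes_for(need_data_id, min_data_bits):
    """List the mode numbers that satisfy the given requirements."""
    return [
        mode
        for mode, props in sorted(LAYOUT_MODES.items())
        if (props.has_data_id or not need_data_id)
        and props.data_bits >= min_data_bits
    ]


def label_sample(applied_ids, values):
    """Pair values with data IDs by position, for modes without data_id.

    Relies on the output order matching the order in which the IDs were applied.
    """
    if len(applied_ids) != len(values):
        raise ValueError("one value is expected per applied data ID")
    return dict(zip(applied_ids, values))


# Placeholder data IDs; real IDs come from the supported data ID list.
labeled = label_sample([0x10, 0x20], [5, 7])
wide_modes = modes_for(need_data_id=False, min_data_bits=64)
```

Modes 1 and 2 omit the data_id, so an application using them needs to keep the applied list of data IDs in order to interpret each sample.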

The sample layout of these modes is illustrated in the following diagrams:

doca_telemetry_diag requires a ConnectX/BlueField DOCA device to sample from. The device can be accessed using any of its physical functions (PFs). If multiple devices exist in a setup, a doca_telemetry_diag context should be created for each device.

doca_telemetry_diag is designed to operate as a singleton per device. Upon creation, the doca_telemetry_diag context assumes control of the associated hardware resources to prevent conflicts and ensure accurate data sampling. In rare instances, ownership may need to be overridden (e.g., if a process crashed before releasing ownership). In such cases, the force_ownership parameter may be used when creating the context from a second process.

Note Once ownership is enforced for one PF, it cannot be claimed by a different PF. It is recommended to always use PF0 to prevent potential conflicts.





The on-device mechanism provides the following diagnostic data classes:

Counter – monotonically increasing and counting different events in the device. If doca_telemetry_diag_set_data_clear is set, the counters are cleared at the beginning of each sampling period (valid only if synchronized start mode is used and operational mode is set to single or repetitive sampling).

Statistic – other collected diagnostic data about the performance of the device. Statistic diagnostic data is cleared on each sample.

Each diagnostic data is represented by a unique identifier, the data ID. Appendix "List of Supported Data IDs" lists the currently supported data IDs.

After applying the configuration, the list of data IDs to be sampled should be applied by calling doca_telemetry_diag_apply_counters_list_by_id.

Note Not all combinations of data IDs can be configured. If any of the data IDs fails to be configured, the operation fails and returns the index of the failed data ID and the reason for the failure. The operation can be retried after omitting the faulty data ID.

Note Not all data IDs support synchronized start mode. If synchronized start mode is configured and doca_telemetry_diag_apply_counters_list_by_id fails with error DOCA_ERROR_BAD_CONFIG, this indicates that the failed data ID does not support synchronized start mode.
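The retry approach described in the notes above (omit the faulty data ID and apply the list again) can be sketched as a loop. Everything here is hypothetical: apply_with_retry is an illustrative helper, try_apply stands in for whatever wrapper the application builds around doca_telemetry_diag_apply_counters_list_by_id, and fake_apply only simulates failures so the sketch can run without a device.

```python
def apply_with_retry(data_ids, try_apply):
    """Apply a data ID list, dropping each ID that is reported as failed.

    try_apply(ids) is assumed to return None on success, or a tuple
    (failed_index, reason) describing the first data ID that failed.
    Returns (applied_ids, rejected), where rejected holds (data_id, reason).
    """
    remaining = list(data_ids)
    rejected = []
    while remaining:
        failure = try_apply(remaining)
        if failure is None:
            break
        index, reason = failure
        rejected.append((remaining.pop(index), reason))
    return remaining, rejected


# Simulated device behavior: these placeholder IDs cannot be configured.
UNSUPPORTED = frozenset({0x2, 0x4})


def fake_apply(ids):
    for index, data_id in enumerate(ids):
        if data_id in UNSUPPORTED:
            return index, "bad config"
    return None


applied, rejected = apply_with_retry([0x1, 0x2, 0x3, 0x4], fake_apply)
```

Keeping the list of rejected IDs together with the reported reason lets the application log which diagnostic data could not be sampled, for example because it does not support synchronized start mode.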



