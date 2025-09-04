DTS supports on-board data collection from the sysfs, ethtool, and tc providers. The Fluent and Prometheus aggregator providers can collect data from other applications.

Other providers, such as amber and ppcc_eth, are available only under certain conditions (e.g., specific container mounts, or on the host only). These providers are described, together with their dependencies, in their corresponding sections.

The sysfs provider has several components: ib_port, hw_port, mr_cache, eth, hwmon, and bf_ptm. By default, all components except bf_ptm are enabled when the provider is enabled:

    #disable-provider=sysfs

The components can be disabled separately. For instance, to disable eth:

    enable-provider=sysfs
    disable-provider=sysfs.eth
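The enable/disable rules above can be sketched as a small resolver. This is only an illustration, not how DTS parses its configuration: the function name `enabled_sysfs_components` is hypothetical, and the sketch assumes that the provider counts as enabled unless an uncommented `disable-provider=sysfs` line is present and that `#` starts a comment.

```python
# Hypothetical helper that mirrors the rules described in the text.
# It is NOT part of DTS; names and parsing behavior are assumptions.
SYSFS_COMPONENTS = ("ib_port", "hw_port", "mr_cache", "eth", "hwmon", "bf_ptm")
DEFAULT_OFF = {"bf_ptm"}  # bf_ptm is disabled by default per the text


def enabled_sysfs_components(config_text):
    """Return the sysfs components left enabled by the given config lines."""
    provider_on = True  # assumption: enabled unless explicitly disabled
    disabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out options
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "enable-provider" and value == "sysfs":
            provider_on = True
        elif key == "disable-provider" and value == "sysfs":
            provider_on = False
        elif key == "disable-provider" and value.startswith("sysfs."):
            disabled.add(value.split(".", 1)[1])
    if not provider_on:
        return []
    return [c for c in SYSFS_COMPONENTS
            if c not in DEFAULT_OFF and c not in disabled]


if __name__ == "__main__":
    print(enabled_sysfs_components("enable-provider=sysfs\ndisable-provider=sysfs.eth"))
```

With the two lines from the example above, the sketch reports every default component except eth.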

Note: ib_port and ib_hw are state counters, which are collected per port. These counters are collected only for ports whose state is active.
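As an illustration of the "active ports only" rule, the sketch below checks a port state string. It assumes the usual Linux sysfs convention, where /sys/class/infiniband/<hca>/ports/<n>/state reads something like `4: ACTIVE`. Whether DTS performs this exact check is not stated in the text, and `port_is_active` is a hypothetical name.

```python
def port_is_active(state_text):
    """Return True if a sysfs port state string names the ACTIVE state.

    Assumes the "<number>: <NAME>" layout, e.g. "4: ACTIVE".
    """
    _, _, name = state_text.strip().partition(":")
    return name.strip().upper() == "ACTIVE"


if __name__ == "__main__":
    print(port_is_active("4: ACTIVE\n"))  # active port
    print(port_is_active("1: DOWN\n"))    # inactive port
```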

ib_port counters:

    {hca_name}:{port_num}:ib_port_state
    {hca_name}:{port_num}:VL15_dropped
    {hca_name}:{port_num}:excessive_buffer_overrun_errors
    {hca_name}:{port_num}:link_downed
    {hca_name}:{port_num}:link_error_recovery
    {hca_name}:{port_num}:local_link_integrity_errors
    {hca_name}:{port_num}:multicast_rcv_packets
    {hca_name}:{port_num}:multicast_xmit_packets
    {hca_name}:{port_num}:port_rcv_constraint_errors
    {hca_name}:{port_num}:port_rcv_data
    {hca_name}:{port_num}:port_rcv_errors
    {hca_name}:{port_num}:port_rcv_packets
    {hca_name}:{port_num}:port_rcv_remote_physical_errors
    {hca_name}:{port_num}:port_rcv_switch_relay_errors
    {hca_name}:{port_num}:port_xmit_constraint_errors
    {hca_name}:{port_num}:port_xmit_data
    {hca_name}:{port_num}:port_xmit_discards
    {hca_name}:{port_num}:port_xmit_packets
    {hca_name}:{port_num}:port_xmit_wait
    {hca_name}:{port_num}:symbol_error
    {hca_name}:{port_num}:unicast_rcv_packets
    {hca_name}:{port_num}:unicast_xmit_packets

ib_hw counters:

    {hca_name}:{port_num}:hw_state
    {hca_name}:{port_num}:hw_duplicate_request
    {hca_name}:{port_num}:hw_implied_nak_seq_err
    {hca_name}:{port_num}:hw_lifespan
    {hca_name}:{port_num}:hw_local_ack_timeout_err
    {hca_name}:{port_num}:hw_out_of_buffer
    {hca_name}:{port_num}:hw_out_of_sequence
    {hca_name}:{port_num}:hw_packet_seq_err
    {hca_name}:{port_num}:hw_req_cqe_error
    {hca_name}:{port_num}:hw_req_cqe_flush_error
    {hca_name}:{port_num}:hw_req_remote_access_errors
    {hca_name}:{port_num}:hw_req_remote_invalid_request
    {hca_name}:{port_num}:hw_resp_cqe_error
    {hca_name}:{port_num}:hw_resp_cqe_flush_error
    {hca_name}:{port_num}:hw_resp_local_length_error
    {hca_name}:{port_num}:hw_resp_remote_access_errors
    {hca_name}:{port_num}:hw_rnr_nak_retry_err
    {hca_name}:{port_num}:hw_rx_atomic_requests
    {hca_name}:{port_num}:hw_rx_dct_connect
    {hca_name}:{port_num}:hw_rx_icrc_encapsulated
    {hca_name}:{port_num}:hw_rx_read_requests
    {hca_name}:{port_num}:hw_rx_write_requests

ib_mr_cache counters:

    {hca_name}:mr_cache:size_{n}:cur
    {hca_name}:mr_cache:size_{n}:limit
    {hca_name}:mr_cache:size_{n}:miss
    {hca_name}:mr_cache:size_{n}:size

Note: n ranges from 0 to 24.
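To make the naming pattern concrete, the following sketch expands the four mr_cache templates for every n from 0 to 24. The function name `mr_cache_counter_names` and the example HCA name `mlx5_0` are illustrative assumptions; the actual set of counters exposed depends on the device.

```python
def mr_cache_counter_names(hca_name, max_n=24):
    """Expand the mr_cache counter templates for n = 0..max_n (inclusive)."""
    fields = ("cur", "limit", "miss", "size")
    return [
        f"{hca_name}:mr_cache:size_{n}:{field}"
        for n in range(max_n + 1)
        for field in fields
    ]


if __name__ == "__main__":
    names = mr_cache_counter_names("mlx5_0")
    print(len(names))   # 25 sizes x 4 fields
    print(names[0])
    print(names[-1])
```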

eth counters:

    {hca_name}:{device_name}:eth_collisions
    {hca_name}:{device_name}:eth_multicast
    {hca_name}:{device_name}:eth_rx_bytes
    {hca_name}:{device_name}:eth_rx_compressed
    {hca_name}:{device_name}:eth_rx_crc_errors
    {hca_name}:{device_name}:eth_rx_dropped
    {hca_name}:{device_name}:eth_rx_errors
    {hca_name}:{device_name}:eth_rx_fifo_errors
    {hca_name}:{device_name}:eth_rx_frame_errors
    {hca_name}:{device_name}:eth_rx_length_errors
    {hca_name}:{device_name}:eth_rx_missed_errors
    {hca_name}:{device_name}:eth_rx_nohandler
    {hca_name}:{device_name}:eth_rx_over_errors
    {hca_name}:{device_name}:eth_rx_packets
    {hca_name}:{device_name}:eth_tx_aborted_errors
    {hca_name}:{device_name}:eth_tx_bytes
    {hca_name}:{device_name}:eth_tx_carrier_errors
    {hca_name}:{device_name}:eth_tx_compressed
    {hca_name}:{device_name}:eth_tx_dropped
    {hca_name}:{device_name}:eth_tx_errors
    {hca_name}:{device_name}:eth_tx_fifo_errors
    {hca_name}:{device_name}:eth_tx_heartbeat_errors
    {hca_name}:{device_name}:eth_tx_packets
    {hca_name}:{device_name}:eth_tx_window_errors

BlueField-2 hwmon counters:

    {hwmon_name}:{l3cache}:CYCLES
    {hwmon_name}:{l3cache}:HITS_BANK0
    {hwmon_name}:{l3cache}:HITS_BANK1
    {hwmon_name}:{l3cache}:MISSES_BANK0
    {hwmon_name}:{l3cache}:MISSES_BANK1
    {hwmon_name}:{pcie}:IN_C_BYTE_CNT
    {hwmon_name}:{pcie}:IN_C_PKT_CNT
    {hwmon_name}:{pcie}:IN_NP_BYTE_CNT
    {hwmon_name}:{pcie}:IN_NP_PKT_CNT
    {hwmon_name}:{pcie}:IN_P_BYTE_CNT
    {hwmon_name}:{pcie}:IN_P_PKT_CNT
    {hwmon_name}:{pcie}:OUT_C_BYTE_CNT
    {hwmon_name}:{pcie}:OUT_C_PKT_CNT
    {hwmon_name}:{pcie}:OUT_NP_BYTE_CNT
    {hwmon_name}:{pcie}:OUT_NP_PKT_CNT
    {hwmon_name}:{pcie}:OUT_P_PKT_CNT
    {hwmon_name}:{tile}:MEMORY_READS
    {hwmon_name}:{tile}:MEMORY_WRITES
    {hwmon_name}:{tile}:MSS_NO_CREDIT
    {hwmon_name}:{tile}:VICTIM_WRITE
    {hwmon_name}:{tilenet}:CDN_DIAG_C_OUT_OF_CRED
    {hwmon_name}:{tilenet}:CDN_REQ
    {hwmon_name}:{tilenet}:DDN_REQ
    {hwmon_name}:{tilenet}:NDN_REQ
    {hwmon_name}:{trio}:TDMA_DATA_BEAT
    {hwmon_name}:{trio}:TDMA_PBUF_MAC_AF
    {hwmon_name}:{trio}:TDMA_RT_AF
    {hwmon_name}:{trio}:TPIO_DATA_BEAT
    {hwmon_name}:{triogen}:TX_DAT_AF

BlueField-3 hwmon counters:

    {hwmon_name}:{llt}:GDC_BANK0_RD_REQ
    {hwmon_name}:{llt}:GDC_BANK1_RD_REQ
    {hwmon_name}:{llt}:GDC_BANK0_WR_REQ
    {hwmon_name}:{llt}:GDC_BANK1_WR_REQ
    {hwmon_name}:{llt_miss}:GDC_MISS_MACHINE_RD_REQ
    {hwmon_name}:{llt_miss}:GDC_MISS_MACHINE_WR_REQ
    {hwmon_name}:{mss}:SKYLIB_DDN_TX_FLITS
    {hwmon_name}:{mss}:SKYLIB_DDN_RX_FLITS

BlueField-3 bf_ptm counters:

    bf:ptm:active_power_profile
    bf:ptm:atx_power_available
    bf:ptm:core_temp
    bf:ptm:ddr_temp
    bf:ptm:error_state
    bf:ptm:power_envelope
    bf:ptm:power_throttling_event_count
    bf:ptm:power_throttling_state
    bf:ptm:thermal_throttling_event_count
    bf:ptm:thermal_throttling_state
    bf:ptm:throttling_state
    bf:ptm:total_power
    bf:ptm:vr0_power
    bf:ptm:vr1_power

The following counters are located under /sys/class/infiniband/mlx5_0/ports/1/counters.

| Counter | Description | InfiniBand Spec Name | Group |
|---|---|---|---|
| port_rcv_data | The total number of data octets, divided by 4 (i.e., counted in 32-bit double words), received on all VLs from the port. | PortRcvData | Informative |
| port_rcv_packets | Total number of packets received (this may include packets containing errors). This is a 64-bit counter. | PortRcvPkts | Informative |
| port_multicast_rcv_packets | Total number of multicast packets, including multicast packets containing errors. | PortMultiCastRcvPkts | Informative |
| port_unicast_rcv_packets | Total number of unicast packets, including unicast packets containing errors. | PortUnicastRcvPkts | Informative |
| port_xmit_data | The total number of data octets, divided by 4 (i.e., counted in 32-bit double words), transmitted on all VLs from the port. | PortXmitData | Informative |
| port_xmit_packets / port_xmit_packets_64 | Total number of packets transmitted on all VLs from this port. This may include packets with errors. This is a 64-bit counter. | PortXmitPkts | Informative |
| port_rcv_switch_relay_errors | Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay. | PortRcvSwitchRelayErrors | Error |
| port_rcv_errors | Total number of packets containing an error that were received on the port. | PortRcvErrors | Informative |
| port_rcv_constraint_errors | Total number of packets received on the switch physical port that are discarded. | PortRcvConstraintErrors | Error |
| local_link_integrity_errors | The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors. | LocalLinkIntegrityErrors | Error |
| port_xmit_wait | The number of ticks during which the port had data to transmit but no data was sent during the entire tick (either because of insufficient credits or because of lack of arbitration). | PortXmitWait | Informative |
| port_multicast_xmit_packets | Total number of multicast packets transmitted on all VLs from the port. This may include multicast packets with errors. | PortMultiCastXmitPkts | Informative |
| port_unicast_xmit_packets | Total number of unicast packets transmitted on all VLs from the port. This may include unicast packets with errors. | PortUnicastXmitPkts | Informative |
| port_xmit_discards | Total number of outbound packets discarded by the port because the port is down or congested. | PortXmitDiscards | Error |
| port_xmit_constraint_errors | Total number of packets not transmitted from the switch physical port. | PortXmitConstraintErrors | Error |
| port_rcv_remote_physical_errors | Total number of packets marked with the EBP delimiter received on the port. | PortRcvRemotePhysicalErrors | Error |
| symbol_error | Total number of minor link errors detected on one or more physical lanes. | SymbolErrorCounter | Error |
| VL15_dropped | Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) of the port. | VL15Dropped | Error |
| link_error_recovery | Total number of times the Port Training state machine has successfully completed the link error recovery process. | LinkErrorRecoveryCounter | Error |
| link_downed | Total number of times the Port Training state machine has failed the link error recovery process and downed the link. | LinkDownedCounter | Error |
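Because port_rcv_data and port_xmit_data count octets divided by 4, a reading has to be multiplied by 4 to obtain bytes. The sketch below shows that arithmetic and a simple rate estimate between two samples. The function names are illustrative, and counter wraparound or reset between samples is not handled.

```python
def data_counter_to_bytes(counter_value):
    """Convert a port_rcv_data / port_xmit_data reading (32-bit words) to bytes."""
    return counter_value * 4


def throughput_bits_per_second(prev_value, curr_value, interval_s):
    """Estimate throughput from two readings taken interval_s seconds apart.

    Assumes the counter did not wrap or reset between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    words = curr_value - prev_value
    return words * 4 * 8 / interval_s  # words -> bytes -> bits, per second


if __name__ == "__main__":
    print(data_counter_to_bytes(250))                    # 1000 bytes
    print(throughput_bits_per_second(1000, 2000, 2.0))   # 16000.0 bit/s
```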

The hardware counters, found under /sys/class/infiniband/mlx5_0/ports/1/hw_counters/, are counted per function and exposed on the function. Some counters are not counted per function; these are noted as such in their descriptions.

| Counter | Description | Group |
|---|---|---|
| duplicate_request | Number of duplicate request packets received. A duplicate request is a request that had been previously executed. | Error |
| implied_nak_seq_err | Number of times the requester detected an ACK with a PSN larger than the expected PSN for an RDMA read or response. | Error |
| lifespan | The maximum period, in ms, which defines the aging of the counter reads. Two consecutive reads within this period might return the same values. | Informative |
| local_ack_timeout_err | The number of times a QP's ack timer expired for RC, XRC, and DCT QPs at the sender side. The QP retry limit was not exceeded, so this is still a recoverable error. | Error |
| np_cnp_sent | The number of CNP packets sent by the Notification Point when it noticed congestion experienced in the RoCEv2 IP header (ECN bits). | Informative |
| np_ecn_marked_roce_packets | The number of RoCEv2 packets received by the Notification Point which were marked as experiencing congestion (ECN bits were '11' on the ingress RoCE traffic). | Informative |
| out_of_buffer | The number of drops that occurred due to lack of WQEs for the associated QPs. | Error |
| out_of_sequence | The number of out-of-sequence packets received. | Error |
| packet_seq_err | The number of received NAK sequence error packets. The QP retry limit was not exceeded. | Error |
| req_cqe_error | The number of times the requester detected CQEs completed with errors. | Error |
| req_cqe_flush_error | The number of times the requester detected CQEs completed with flushed errors. | Error |
| req_remote_access_errors | The number of times the requester detected remote access errors. | Error |
| req_remote_invalid_request | The number of times the requester detected remote invalid request errors. | Error |
| resp_cqe_error | The number of times the responder detected CQEs completed with errors. | Error |
| resp_cqe_flush_error | The number of times the responder detected CQEs completed with flushed errors. | Error |
| resp_local_length_error | The number of times the responder detected local length errors. | Error |
| resp_remote_access_errors | The number of times the responder detected remote access errors. | Error |
| rnr_nak_retry_err | The number of received RNR NAK packets. The QP retry limit was not exceeded. | Error |
| rp_cnp_handled | The number of CNP packets handled by the Reaction Point HCA to throttle the transmission rate. | Informative |
| rp_cnp_ignored | The number of CNP packets received and ignored by the Reaction Point HCA. This counter should not increase if RoCE Congestion Control is enabled in the network. If this counter increases, verify that ECN is enabled on the adapter. See HowTo Configure DCQCN (RoCE CC) values for ConnectX-4 (Linux). | Error |
| rx_atomic_requests | The number of received ATOMIC requests for the associated QPs. | Informative |
| rx_dct_connect | The number of received connection requests for the associated DCTs. | Informative |
| rx_read_requests | The number of received READ requests for the associated QPs. | Informative |
| rx_write_requests | The number of received WRITE requests for the associated QPs. | Informative |
| rx_icrc_encapsulated | The number of RoCE packets with ICRC errors. | Error |
| roce_adp_retrans | Counts the number of adaptive retransmissions for RoCE traffic. | Informative |
| roce_adp_retrans_to | Counts the number of times RoCE traffic reached timeout due to adaptive retransmission. | Informative |
| roce_slow_restart | Counts the number of times RoCE slow restart was used. | Informative |
| roce_slow_restart_cnps | Counts the number of times RoCE slow restart generated CNP packets. | Informative |
| roce_slow_restart_trans | Counts the number of times RoCE slow restart changed state to slow restart. | Informative |

The following parameters are located in /sys/class/net/<interface>/debug.

Parameter: lro_timeout
Description: Sets the LRO timer period value, in usecs, which is used as the LRO session expiration time. For example:

    Actual timeout: 32
    Supported timeout: 8 16 32 1024

Default: 32

Parameter: link_down_reason
Description: Allows the user to query the reason that is preventing the link from going up. For example:

    $ cat /sys/class/net/ethXX/debug/link_down_reason
    monitor_opcode: 0x0
    status_message: The port is Active.

Refer to the adapter PRM for all possible options (PDDR register).
Default: N/A
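The lro_timeout example output can be parsed with a few lines of code. The sketch below assumes the text has exactly the `Actual timeout:` and `Supported timeout:` labels shown in the example. The function name `parse_lro_timeout` is hypothetical, and real output may differ between driver versions.

```python
import re


def parse_lro_timeout(text):
    """Parse lro_timeout output into (actual, [supported values]).

    Assumes the labels 'Actual timeout:' and 'Supported timeout:' as in
    the documented example; raises ValueError for any other layout.
    """
    actual = re.search(r"Actual timeout:\s*(\d+)", text)
    supported = re.search(r"Supported timeout:\s*((?:\d+\s*)+)", text)
    if actual is None or supported is None:
        raise ValueError("unexpected lro_timeout format")
    return int(actual.group(1)), [int(v) for v in supported.group(1).split()]


if __name__ == "__main__":
    sample = "Actual timeout: 32\nSupported timeout: 8 16 32 1024\n"
    print(parse_lro_timeout(sample))
```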

The bf_ptm component collects BlueField-3 power and thermal counters using remote collection. It is disabled by default and can be enabled as follows:

1. Load the mlxbf-ptm kernel module:

    modprobe -v mlxbf-ptm

2. Enable the component using remote collection:

    enable-provider=grpc.sysfs.bf_ptm

Note: The DPE server should be active before changing the dts_config.ini file. See section "Remote Collection" for details.

Ethtool counters are a generated list of counters corresponding to the Ethtool utility. Counters are generated on a per-device basis.

There are several counter groups, depending on where the counter is counted:

Ring – software ring counters

Software port – an aggregation of software ring counters

vPort counters – traffic counters and drops due to steering or no buffers. May indicate BlueField issues. These counters include Ethernet traffic counters (including raw Ethernet) and RDMA/RoCE traffic counters.

Physical port counters – the physical port connecting BlueField to the network. May indicate device issues or link or network issues. This measuring point holds information on standardized counters like IEEE 802.3, RFC2863, RFC 2819, RFC 3635 and additional counters like flow control, FEC, and more. Physical port counters are not exposed to virtual machines.

Priority port counters – a set of the physical port counters, per priority per port

Each group of counters may have different counter types:

Traffic informative counters – counters which count traffic. These counters can be used for load estimation or for general debugging.

Traffic acceleration counters – counters which count traffic accelerated by NVIDIA drivers or by hardware. These counters are an additional layer on top of the informative counter set, so the same traffic is counted in both the informative and the acceleration counters. Acceleration counters are marked with [A].

Error counters – increment of these counters might indicate a problem

The following acceleration mechanisms have dedicated counters:

TCP segmentation offload (TSO) – increasing outbound throughput and reducing CPU utilization by allowing the kernel to buffer multiple packets in a single large buffer. The BlueField splits the buffer into packets and transmits them.

Large receive offload (LRO) – increasing inbound throughput and reducing CPU utilization by aggregation of multiple incoming packets of a single stream to a single buffer

CHECKSUM – calculation of the TCP checksum (by the BlueField). The following checksum offloads are available (refer to skbuff.h for a detailed explanation):

    CHECKSUM_UNNECESSARY
    CHECKSUM_NONE – no checksum acceleration was used
    CHECKSUM_COMPLETE – device provided checksum on the entire packet
    CHECKSUM_PARTIAL – device provided checksum

CQE compress – compression of completion queue events (CQE), used to save PCIe bandwidth and thereby achieve better performance.

The following counters are available per ring or software port.

These counters report the amount of traffic accelerated by the BlueField. Accelerated traffic is tallied by the acceleration counters in addition to the standard counters (i.e., accelerated traffic is counted twice).

The counter names in the table below refer to both ring and port counters. The notation for ring counters includes the [i] index without the brackets. The notation for port counters does not include the [i]. For example, the counter name rx[i]_packets is printed as rx0_packets for ring 0 and as rx_packets for the software port.
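The naming rule above can be expressed as a small string substitution. The sketch below is only an illustration of the notation; `expand_counter_name` is a hypothetical helper, not an ethtool or DTS function.

```python
def expand_counter_name(template, ring=None):
    """Turn a documented name such as 'rx[i]_packets' into a concrete one.

    ring=None yields the software port name; an integer yields the ring name.
    """
    if "[i]" not in template:
        raise ValueError("template has no [i] placeholder")
    return template.replace("[i]", "" if ring is None else str(ring))


if __name__ == "__main__":
    print(expand_counter_name("rx[i]_packets", 0))  # rx0_packets
    print(expand_counter_name("rx[i]_packets"))     # rx_packets
```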

| Counter | Description | Type |
|---|---|---|
| rx[i]_packets | The number of packets received on ring i. | Informative |
| rx[i]_bytes | The number of bytes received on ring i. | Informative |
| tx[i]_packets | The number of packets transmitted on ring i. | Informative |
| tx[i]_bytes | The number of bytes transmitted on ring i. | Informative |
| tx[i]_tso_packets | The number of TSO packets transmitted on ring i [A]. | Acceleration |
| tx[i]_tso_bytes | The number of TSO bytes transmitted on ring i [A]. | Acceleration |
| tx[i]_tso_inner_packets | The number of TSO packets indicated as carrying internal encapsulation that were transmitted on ring i [A]. | Acceleration |
| tx[i]_tso_inner_bytes | The number of TSO bytes indicated as carrying internal encapsulation that were transmitted on ring i [A]. | Acceleration |
| rx[i]_lro_packets | The number of LRO packets received on ring i [A]. | Acceleration |
| rx[i]_lro_bytes | The number of LRO bytes received on ring i [A]. | Acceleration |
| rx[i]_csum_unnecessary | Packets received with CHECKSUM_UNNECESSARY on ring i [A]. | Acceleration |
| rx[i]_csum_none | Packets received with CHECKSUM_NONE on ring i [A]. | Acceleration |
| rx[i]_csum_complete | Packets received with CHECKSUM_COMPLETE on ring i [A]. | Acceleration |
| rx[i]_csum_unnecessary_inner | Packets received with inner encapsulation with CHECKSUM_UNNECESSARY on ring i [A]. | Acceleration |
| tx[i]_csum_partial | Packets transmitted with CHECKSUM_PARTIAL on ring i [A]. | Acceleration |
| tx[i]_csum_partial_inner | Packets transmitted with inner encapsulation with CHECKSUM_PARTIAL on ring i [A]. | Acceleration |
| tx[i]_csum_none | Packets transmitted with no hardware checksum acceleration on ring i. | Informative |
| tx[i]_stopped / tx_queue_stopped | Events where the SQ was full on ring i. If this counter increases, check the amount of buffers allocated for transmission. | Error |
| tx[i]_wake / tx_queue_wake | Events where the SQ was full and has become not full on ring i. | Error |
| tx[i]_dropped / tx_queue_dropped | Packets transmitted that were dropped due to DMA mapping failure on ring i. If this counter increases, check the amount of buffers allocated for transmission. | Error |
| rx[i]_wqe_err | The number of wrong opcodes received on ring i. | Error |
| tx[i]_nop | The number of NOP WQEs (empty WQEs) inserted into the SQ (related to ring i) because the end of the cyclic buffer was reached. When nearing the end of the cyclic buffer, the driver may add these empty WQEs to avoid handling a state where a WQE starts at the end of the queue and ends at its beginning. This is a normal condition. | Informative |
| rx[i]_mpwqe_frag | The number of WQEs that failed to allocate a compound page, so that fragmented MPWQEs (multipacket WQEs) were used on ring i. If this counter increases, it may suggest that there is not enough memory for large pages and that the driver allocated fragmented pages. This is not an abnormal condition. | Informative |
| rx[i]_mpwqe_filler_cqes | The number of filler CQE events that were issued on ring i. Note: The counter name before kernel 4.19 was rx[i]_mpwqe_filler. | Informative |
| rx[i]_cqe_compress_blks | The number of receive blocks with CQE compression on ring i [A]. | Acceleration |
| rx[i]_cqe_compress_pkts | The number of receive packets with CQE compression on ring i [A]. | Acceleration |
| rx[i]_cache_reuse | The number of events of successful reuse of a page from the driver's internal page cache. | Acceleration |
| rx[i]_cache_full | The number of events where the internal page cache was full and the driver could not put a page back into the cache for recycling (the page is freed). | Acceleration |
| rx[i]_cache_empty | The number of events where the cache was empty and had no page to give. The driver allocates a new page. | Acceleration |
| rx[i]_cache_busy | The number of events where the cache head was busy and could not be recycled. The driver allocated a new page. | Acceleration |
| rx[i]_xmit_more | The number of packets sent with the xmit_more indication set on the skbuff (no doorbell). | Acceleration |
| tx[i]_cqes | The number of completions received on the CQ of the TX ring. | Informative |
| ch[i]_poll | The number of invocations of NAPI poll of the channel. | Informative |
| ch[i]_arm | The number of times the NAPI poll function completed and armed the completion queues on the channel. Note: Supported from kernel 4.19. | Informative |
| ch[i]_aff_change | The number of times the NAPI poll function explicitly stopped execution on a CPU due to a change in affinity, on the channel. | Informative |
| rx[i]_congst_umr | The number of times an outstanding UMR request was delayed due to congestion, on the ring. Note: Supported from kernel 4.19. | Error |
| ch[i]_events | The number of hard interrupt events on the completion queues of the channel. | Informative |
| rx[i]_mpwqe_filler_strides | The number of strides consumed by filler CQEs on the ring. | Informative |
| rx[i]_xdp_tx_xmit | The number of packets forwarded back to the port due to the XDP program XDP_TX action (bouncing). These packets are not counted by other software counters. They are counted by the physical port and vPort counters. | Informative |
| rx[i]_xdp_tx_full | The number of packets that should have been forwarded back to the port due to the XDP_TX action but were dropped due to a full Tx queue. These packets are not counted by other software counters. They are counted by the physical port and vPort counters. You may open more Rx queues and spread Rx traffic over all queues, and/or increase the Rx ring size. | Error |
| rx[i]_xdp_tx_err | The number of times an XDP_TX error, such as frame too long or frame too short, occurred on the XDP_TX ring of the RX ring. | Error |
| rx[i]_xdp_tx_cqes / rx_xdp_tx_cqe | The number of completions received on the CQ of the XDP-TX ring. | Informative |
| rx[i]_xdp_drop | The number of packets dropped due to the XDP program XDP_DROP action. These packets are not counted by other software counters. They are counted by the physical port and vPort counters. | Informative |
| rx[i]_xdp_redirect | The number of times an XDP redirect action has been triggered on the ring. | Acceleration |
| tx[i]_xdp_xmit | The number of packets redirected to the interface (due to XDP redirect). These packets are not counted by other software counters. They are counted by the physical port and vPort counters. | Informative |
| tx[i]_xdp_full | The number of packets redirected to the interface (due to XDP redirect) but dropped due to the Tx queue being full. These packets are not counted by other software counters. Users may enlarge the Tx queues. | Informative |
| tx[i]_xdp_err | The number of packets redirected to the interface (due to XDP redirect) but dropped due to an error (e.g., frame too long or frame too short). | Error |
| tx[i]_xdp_cqes | The number of completions received for packets redirected to the interface (due to XDP redirect) on the CQ. | Informative |
| rx[i]_cache_waive | The number of cache evacuations. This can occur when a page is moved to another NUMA node, or when a page was pfmemalloc-ed and should be freed as soon as possible. | Acceleration |

Counters on the eswitch port that is connected to the vNIC.

| Counter | Description | Type |
|---|---|---|
| rx_vport_unicast_packets | Unicast packets received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_unicast_bytes | Unicast bytes received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_unicast_packets | Unicast packets transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_unicast_bytes | Unicast bytes transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_multicast_packets | Multicast packets received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_multicast_bytes | Multicast bytes received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_multicast_packets | Multicast packets transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_multicast_bytes | Multicast bytes transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_broadcast_packets | Broadcast packets received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_broadcast_bytes | Broadcast bytes received, steered to a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_broadcast_packets | Broadcast packets transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| tx_vport_broadcast_bytes | Broadcast bytes transmitted, steered from a port, including raw Ethernet QP/DPDK traffic, excluding RDMA traffic. | Informative |
| rx_vport_rdma_unicast_packets | RDMA unicast packets received, steered to a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| rx_vport_rdma_unicast_bytes | RDMA unicast bytes received, steered to a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| tx_vport_rdma_unicast_packets | RDMA unicast packets transmitted, steered from a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| tx_vport_rdma_unicast_bytes | RDMA unicast bytes transmitted, steered from a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| rx_vport_rdma_multicast_packets | RDMA multicast packets received, steered to a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| rx_vport_rdma_multicast_bytes | RDMA multicast bytes received, steered to a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| tx_vport_rdma_multicast_packets | RDMA multicast packets transmitted, steered from a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| tx_vport_rdma_multicast_bytes | RDMA multicast bytes transmitted, steered from a port (counts RoCE/UD/RC traffic) [A]. | Acceleration |
| rx_steer_missed_packets | Number of packets received by the NIC but discarded because they did not match any flow in the NIC flow table. Note: Supported from kernel 4.16. | Error |
| rx_packets | Representor only: packets received that were handled by the hypervisor. Note: Supported from kernel 4.18. | Informative |
| rx_bytes | Representor only: bytes received that were handled by the hypervisor. Note: Supported from kernel 4.18. | Informative |
| tx_packets | Representor only: packets transmitted that were handled by the hypervisor. Note: Supported from kernel 4.18. | Informative |
| tx_bytes | Representor only: bytes transmitted that were handled by the hypervisor. Note: Supported from kernel 4.18. | Informative |

The physical port counters are the counters on the external port connecting the adapter to the network. This measuring point holds information on standardized counters like IEEE 802.3, RFC 2863, RFC 2819, and RFC 3635, and additional counters like flow control, FEC, and more.

Counter Description Type rx_packets_phy The number of packets received on the physical port. This counter doesn’t include packets that were discarded due to FCS, frame size and similar errors. Informative tx_packets_phy The number of packets transmitted on the physical port. Informative rx_bytes_phy The number of bytes received on the physical port, including Ethernet header and FCS. Informative tx_bytes_phy The number of bytes transmitted on the physical port. Informative rx_multicast_phy The number of multicast packets received on the physical port. Informative tx_multicast_phy The number of multicast packets transmitted on the physical port. Informative rx_broadcast_phy The number of broadcast packets received on the physical port. Informative tx_broadcast_phy The number of broadcast packets transmitted on the physical port. Informative rx_crc_errors_phy The number of dropped received packets due to frame check sequence (FCS) error on the physical port. If this counter is increased in high rate, check the link quality using rx_symbol_error_phy and rx_corrected_bits_phy counters below. Error rx_in_range_len_errors_phy The number of received packets dropped due to length/type errors on a physical port. Error rx_out_of_range_len_phy The number of received packets dropped due to length greater than allowed on a physical port. If this counter is increasing, it implies that the peer connected to the adapter has a larger MTU configured. Using same MTU configuration shall resolve this issue. Error rx_oversize_pkts_phy The number of dropped received packets due to length which exceed MTU size on a physical port. If this counter is increasing, it implies that the peer connected to the adapter has a larger MTU configured. Using same MTU configuration shall resolve this issue. Error rx_symbol_err_phy The number of received packets dropped due to physical coding errors (symbol errors) on a physical port. 
Error rx_mac_control_phy The number of MAC control packets received on the physical port. Informative tx_mac_control_phy The number of MAC control packets transmitted on the physical port. Informative rx_pause_ctrl_phy The number of link layer pause packets received on a physical port. If this counter is increasing, it implies that the network is congested and cannot absorb the traffic coming from to the adapter. Informative tx_pause_ctrl_phy The number of link layer pause packets transmitted on a physical port. If this counter is increasing, it implies that the NIC is congested and cannot absorb the traffic coming from the network. Informative rx_unsupported_op_phy The number of MAC control packets received with unsupported opcode on a physical port. Error rx_discards_phy The number of received packets dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network. Error tx_discards_phy The number of packets which were discarded on transmission, even no errors were detected. the drop might occur due to link in down state, head of line drop, pause from the network, etc. Error tx_errors_phy The number of transmitted packets dropped due to a length which exceed MTU size on a physical port. Error rx_undersize_pkts_phy The number of received packets dropped due to length which is shorter than 64 bytes on a physical port. If this counter is increasing, it implies that the peer connected to the adapter has a non-standard MTU configured or malformed packet had arrived. Error rx_fragments_phy The number of received packets dropped due to a length which is shorter than 64 bytes and has FCS error on a physical port. If this counter is increasing, it implies that the peer connected to the adapter has a non-standard MTU configured. 
Error rx_jabbers_phy The number of received packets d due to a length which is longer than 64 bytes and had FCS error on a physical port. Error rx_64_bytes_phy The number of packets received on the physical port with size of 64 bytes. Informative rx_65_to_127_bytes_phy The number of packets received on the physical port with size of 65 to 127 bytes. Informative rx_128_to_255_bytes_phy The number of packets received on the physical port with size of 128 to 255 bytes. Informative rx_256_to_511_bytes_phy The number of packets received on the physical port with size of 256 to 512 bytes. Informative rx_512_to_1023_bytes_phy The number of packets received on the physical port with size of 512 to 1023 bytes. Informative rx_1024_to_1518_bytes_phy The number of packets received on the physical port with size of 1024 to 1518 bytes. Informative rx_1519_to_2047_bytes_phy The number of packets received on the physical port with size of 1519 to 2047 bytes. Informative rx_2048_to_4095_bytes_phy The number of packets received on the physical port with size of 2048 to 4095 bytes. Informative rx_4096_to_8191_bytes_phy The number of packets received on the physical port with size of 4096 to 8191 bytes. Informative rx_8192_to_10239_bytes_phy The number of packets received on the physical port with size of 8192 to 10239 bytes. Informative link_down_events_phy The number of times where the link operative state changed to down. In case this counter is increasing it may imply on port flapping. You may need to replace the cable/transceiver. Error rx_out_of_buffer Number of times receive queue had no software buffers allocated for the adapter's incoming traffic. Error module_bus_stuck The number of times that module's I2C bus (data or clock) short-wire was detected. You may need to replace the cable/transceiver. Info Supported from kernel 4.10. Error module_high_temp The number of times that the module temperature was too high. 
If this issue persists, you may need to check the ambient temperature or replace the cable/transceiver module. Info Supported from kernel 4.10. Error module_bad_shorted The number of times that the module cables were shorted. You may need to replace the cable/transceiver module. Info Supported from kernel 4.10. Error module_unplug The number of times that the module was ejected. Info Supported from kernel 4.10. Informative rx_buffer_passed_thres_phy The number of events where the port receive buffer was over 85% full. Info Supported from kernel 4.14. Informative tx_pause_storm_warning_events The number of times the device was sending pauses for a long period of time. Info Supported from kernel 4.15. Informative tx_pause_storm_error_events The number of times the device was sending pauses for a long period of time, reaching the timeout and disabling transmission of pause frames. During the period in which pause frames were disabled, drops could have occurred. Info Supported from kernel 4.15. Error rx[i]_buff_alloc_err / rx_buff_alloc_err Failed to allocate a buffer for a received packet (or SKB) on the port (or per ring). Error rx_bits_phy This counter provides information on the total amount of traffic that could have been received and can be used as a guideline to measure the ratio of errored traffic in rx_pcs_symbol_err_phy and rx_corrected_bits_phy. Informative rx_pcs_symbol_err_phy This counter counts the number of symbol errors that were not corrected by the FEC correction algorithm, or that occurred while the FEC algorithm was not active on this interface. If this counter is increasing, it implies that the link between the NIC and the network is suffering from high BER, and that traffic is lost. You may need to replace the cable/transceiver. The error rate is the number of rx_pcs_symbol_err_phy divided by the number of rx_bits_phy over a specific time frame. Error rx_corrected_bits_phy The number of corrected bits on this port according to active FEC (RS/FC). 
If this counter is increasing, it implies that the link between the NIC and the network is suffering from high BER. The corrected bit rate is the number of rx_corrected_bits_phy divided by the number of rx_bits_phy over a specific time frame. Error phy_raw_errors_lane[l] This counter counts the number of physical raw errors per lane index [l]. The counter counts errors before FEC corrections. If this counter is increasing, it implies that the link between the NIC and the network is suffering from high BER, and that traffic might be lost. You may need to replace the cable/transceiver. Check this counter together with rx_corrected_bits_phy. Info Supported from kernel 4.20. Error
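The error-rate ratios described for rx_pcs_symbol_err_phy and rx_corrected_bits_phy can be sketched as follows. This is a minimal illustration, assuming two snapshots of the counters are already available as dictionaries; the snapshot values and the helper name rate_over_interval are hypothetical and not part of DTS.

```python
def rate_over_interval(before, after, numerator, denominator="rx_bits_phy"):
    """Return delta(numerator) / delta(denominator) between two counter snapshots."""
    delta_num = after[numerator] - before[numerator]
    delta_den = after[denominator] - before[denominator]
    if delta_den <= 0:
        # No traffic in the interval (or the counters were reset): ratio undefined.
        return None
    return delta_num / delta_den


# Hypothetical snapshots taken at the start and end of a time frame.
t0 = {"rx_bits_phy": 1_000_000, "rx_pcs_symbol_err_phy": 2, "rx_corrected_bits_phy": 10}
t1 = {"rx_bits_phy": 5_000_000, "rx_pcs_symbol_err_phy": 6, "rx_corrected_bits_phy": 50}

symbol_error_rate = rate_over_interval(t0, t1, "rx_pcs_symbol_err_phy")
corrected_bit_rate = rate_over_interval(t0, t1, "rx_corrected_bits_phy")
print(symbol_error_rate, corrected_bit_rate)
```

Working on deltas rather than absolute values keeps the ratio tied to the chosen time frame, since the counters are cumulative.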

The following counters are physical port counters that are counted per L2 priority (0-7).

Info p in the counter name represents the priority.

| Counter | Description | Type |
| --- | --- | --- |
| rx_prio[p]_bytes | The number of bytes received with priority p on the physical port. | Informative |
| rx_prio[p]_packets | The number of packets received with priority p on the physical port. | Informative |
| tx_prio[p]_bytes | The number of bytes transmitted on priority p on the physical port. | Informative |
| tx_prio[p]_packets | The number of packets transmitted on priority p on the physical port. | Informative |
| rx_prio[p]_pause | The number of pause packets received with priority p on a physical port. If this counter is increasing, it implies that the network is congested and cannot absorb the traffic coming from the adapter. Note: This counter is available only if PFC was enabled on priority p. Refer to HowTo Configure PFC on ConnectX-4. | Informative |
| rx_prio[p]_pause_duration | The duration of pause received (in microseconds) on priority p on the physical port. The counter represents the time the port did not send any traffic on this priority. If this counter is increasing, it implies that the network is congested and cannot absorb the traffic coming from the adapter. Note: This counter is available only if PFC was enabled on priority p. Refer to HowTo Configure PFC on ConnectX-4. | Informative |
| rx_prio[p]_pause_transition | The number of times a transition from Xoff to Xon on priority p on the physical port has occurred. Note: This counter is available only if PFC was enabled on priority p. Refer to HowTo Configure PFC on ConnectX-4. | Informative |
| tx_prio[p]_pause | The number of pause packets transmitted on priority p on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network. Note: This counter is available only if PFC was enabled on priority p. Refer to HowTo Configure PFC on ConnectX-4. | Informative |
| tx_prio[p]_pause_duration | The duration of pause transmitted (in microseconds) on priority p on the physical port. Note: This counter is available only if PFC was enabled on priority p. Refer to HowTo Configure PFC on ConnectX-4. | Informative |
| rx_prio[p]_buf_discard | The number of packets discarded by the device due to lack of per-host receive buffers. Info Supported from kernel 5.3. | Informative |
| rx_prio[p]_cong_discard | The number of packets discarded by the device due to per-host congestion. Info Supported from kernel 5.3. | Informative |
| rx_prio[p]_marked | The number of packets ECN-marked by the device due to per-host congestion. Info Supported from kernel 5.3. | Informative |
| rx_prio[p]_discard | The number of packets discarded by the device due to lack of receive buffers. Info Supported from kernel 5.6. | Informative |
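As a small illustration of the [p] placeholder, the sketch below expands a counter template over priorities 0-7 and picks the matching values out of a dictionary of collected counters. The helper name per_priority and the sample values are hypothetical; the expanded names follow the pattern seen in ethtool output (for example, rx_prio0_bytes).

```python
def per_priority(stats, template, priorities=range(8)):
    """Map each priority to the value of its expanded counter, skipping absent ones."""
    values = {}
    for p in priorities:
        name = template.replace("[p]", str(p))
        if name in stats:
            values[p] = stats[name]
    return values


# Hypothetical counter values keyed by their expanded names.
stats = {"rx_prio0_bytes": 5, "rx_prio3_bytes": 7, "tx_prio0_bytes": 1}
rx_bytes = per_priority(stats, "rx_prio[p]_bytes")
print(rx_bytes, sum(rx_bytes.values()))
```

Absent counters are skipped rather than reported as zero, because the pause counters exist only when PFC is enabled on that priority.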

| Counter | Description | Type |
| --- | --- | --- |
| rx_pci_signal_integrity | Counts physical layer PCIe signal integrity errors: the number of transitions to recovery due to framing errors and CRC (DLP and TLP). If this counter is rising, try moving the adapter card to a different slot to rule out a bad PCIe slot. Validate that you are running with the latest firmware available and the latest server BIOS version. | Error |
| tx_pci_signal_integrity | Counts physical layer PCIe signal integrity errors: the number of transitions to recovery initiated by the other side (moving to recovery due to getting TS/EIEOS). If this counter is rising, try moving the adapter card to a different slot to rule out a bad PCIe slot. Validate that you are running with the latest firmware available and the latest server BIOS version. | Error |
| outbound_pci_buffer_overflow | The number of packets dropped due to PCI buffer overflow. If this counter is rising at a high rate, it might indicate that the receive traffic rate for a host is larger than what the PCIe bus can carry, causing congestion. Info Supported from kernel 4.14. | Informative |
| outbound_pci_stalled_rd | The percentage (in the range 0...100) of time within the last second that the NIC had outbound non-posted read requests but could not perform the operation due to insufficient posted credits. Info Supported from kernel 4.14. | Informative |
| outbound_pci_stalled_wr | The percentage (in the range 0...100) of time within the last second that the NIC had outbound posted write requests but could not perform the operation due to insufficient posted credits. Info Supported from kernel 4.14. | Informative |
| outbound_pci_stalled_rd_events | The number of seconds where outbound_pci_stalled_rd was above 30%. Info Supported from kernel 4.14. | Informative |
| outbound_pci_stalled_wr_events | The number of seconds where outbound_pci_stalled_wr was above 30%. Info Supported from kernel 4.14. | Informative |
| dev_out_of_buffer | The number of times the device-owned queue did not have enough buffers allocated. | Error |

# ethtool -S eth5
NIC statistics: rx_packets: 10 rx_bytes: 3420 tx_packets: 18 tx_bytes: 1296 tx_tso_packets: 0 tx_tso_bytes: 0 tx_tso_inner_packets: 0 tx_tso_inner_bytes: 0 tx_added_vlan_packets: 0 tx_nop: 0 rx_lro_packets: 0 rx_lro_bytes: 0 rx_ecn_mark: 0 rx_removed_vlan_packets: 0 rx_csum_unnecessary: 0 rx_csum_none: 0 rx_csum_complete: 10 rx_csum_unnecessary_inner: 0 rx_xdp_drop: 0 rx_xdp_redirect: 0 rx_xdp_tx_xmit: 0 rx_xdp_tx_full: 0 rx_xdp_tx_err: 0 rx_xdp_tx_cqe: 0 tx_csum_none: 18 tx_csum_partial: 0 tx_csum_partial_inner: 0 tx_queue_stopped: 0 tx_queue_dropped: 0 tx_xmit_more: 0 tx_recover: 0 tx_cqes: 18 tx_queue_wake: 0 tx_udp_seg_rem: 0 tx_cqe_err: 0 tx_xdp_xmit: 0 tx_xdp_full: 0 tx_xdp_err: 0 tx_xdp_cqes: 0 rx_wqe_err: 0 rx_mpwqe_filler_cqes: 0 rx_mpwqe_filler_strides: 0 rx_buff_alloc_err: 0 rx_cqe_compress_blks: 0 rx_cqe_compress_pkts: 0 rx_page_reuse: 0 rx_cache_reuse: 0 rx_cache_full: 0 rx_cache_empty: 2688 rx_cache_busy: 0 rx_cache_waive: 0 rx_congst_umr: 0 rx_arfs_err: 0 ch_events: 75 ch_poll: 75 ch_arm: 75 ch_aff_change: 0 ch_eq_rearm: 0 rx_out_of_buffer: 0 rx_if_down_packets: 15 rx_steer_missed_packets: 0 rx_vport_unicast_packets: 0 rx_vport_unicast_bytes: 0 tx_vport_unicast_packets: 0 tx_vport_unicast_bytes: 0 rx_vport_multicast_packets: 2 rx_vport_multicast_bytes: 172 tx_vport_multicast_packets: 12 tx_vport_multicast_bytes: 936 rx_vport_broadcast_packets: 37 rx_vport_broadcast_bytes: 9270 tx_vport_broadcast_packets: 6 tx_vport_broadcast_bytes: 360 rx_vport_rdma_unicast_packets: 0 rx_vport_rdma_unicast_bytes: 0 tx_vport_rdma_unicast_packets: 0 tx_vport_rdma_unicast_bytes: 0 rx_vport_rdma_multicast_packets: 0 rx_vport_rdma_multicast_bytes: 0 tx_vport_rdma_multicast_packets: 0 tx_vport_rdma_multicast_bytes: 0 tx_packets_phy: 0 rx_packets_phy: 0 rx_crc_errors_phy: 0 tx_bytes_phy: 0 rx_bytes_phy: 0 tx_multicast_phy: 0 tx_broadcast_phy: 0 rx_multicast_phy: 0 rx_broadcast_phy: 0 rx_in_range_len_errors_phy: 0 
rx_out_of_range_len_phy: 0 rx_oversize_pkts_phy: 0 rx_symbol_err_phy: 0 tx_mac_control_phy: 0 rx_mac_control_phy: 0 rx_unsupported_op_phy: 0 rx_pause_ctrl_phy: 0 tx_pause_ctrl_phy: 0 rx_discards_phy: 0 tx_discards_phy: 0 tx_errors_phy: 0 rx_undersize_pkts_phy: 0 rx_fragments_phy: 0 rx_jabbers_phy: 0 rx_64_bytes_phy: 0 rx_65_to_127_bytes_phy: 0 rx_128_to_255_bytes_phy: 0 rx_256_to_511_bytes_phy: 0 rx_512_to_1023_bytes_phy: 0 rx_1024_to_1518_bytes_phy: 0 rx_1519_to_2047_bytes_phy: 0 rx_2048_to_4095_bytes_phy: 0 rx_4096_to_8191_bytes_phy: 0 rx_8192_to_10239_bytes_phy: 0 link_down_events_phy: 0 rx_prio0_bytes: 0 rx_prio0_packets: 0 tx_prio0_bytes: 0 tx_prio0_packets: 0 rx_prio1_bytes: 0 rx_prio1_packets: 0 tx_prio1_bytes: 0 tx_prio1_packets: 0 rx_prio2_bytes: 0 rx_prio2_packets: 0 tx_prio2_bytes: 0 tx_prio2_packets: 0 rx_prio3_bytes: 0 rx_prio3_packets: 0 tx_prio3_bytes: 0 tx_prio3_packets: 0 rx_prio4_bytes: 0 rx_prio4_packets: 0 tx_prio4_bytes: 0 tx_prio4_packets: 0 rx_prio5_bytes: 0 rx_prio5_packets: 0 tx_prio5_bytes: 0 tx_prio5_packets: 0 rx_prio6_bytes: 0 rx_prio6_packets: 0 tx_prio6_bytes: 0 tx_prio6_packets: 0 rx_prio7_bytes: 0 rx_prio7_packets: 0 tx_prio7_bytes: 0 tx_prio7_packets: 0 module_unplug: 0 module_bus_stuck: 0 module_high_temp: 0 module_bad_shorted: 0 ch0_events: 9 ch0_poll: 9 ch0_arm: 9 ch0_aff_change: 0 ch0_eq_rearm: 0 ch1_events: 23 ch1_poll: 23 ch1_arm: 23 ch1_aff_change: 0 ch1_eq_rearm: 0 ch2_events: 8 ch2_poll: 8 ch2_arm: 8 ch2_aff_change: 0 ch2_eq_rearm: 0 ch3_events: 19 ch3_poll: 19 ch3_arm: 19 ch3_aff_change: 0 ch3_eq_rearm: 0 ch4_events: 8 ch4_poll: 8 ch4_arm: 8 ch4_aff_change: 0 ch4_eq_rearm: 0 ch5_events: 8 ch5_poll: 8 ch5_arm: 8 ch5_aff_change: 0 ch5_eq_rearm: 0 rx0_packets: 0 rx0_bytes: 0 rx0_csum_complete: 0 rx0_csum_unnecessary: 0 rx0_csum_unnecessary_inner: 0 rx0_csum_none: 0 rx0_xdp_drop: 0 rx0_xdp_redirect: 0 rx0_lro_packets: 0 rx0_lro_bytes: 0 rx0_ecn_mark: 0 rx0_removed_vlan_packets: 0 rx0_wqe_err: 0 rx0_mpwqe_filler_cqes: 0 
rx0_mpwqe_filler_strides: 0 rx0_buff_alloc_err: 0 rx0_cqe_compress_blks: 0 rx0_cqe_compress_pkts: 0 rx0_page_reuse: 0 rx0_cache_reuse: 0 rx0_cache_full: 0 rx0_cache_empty: 448 rx0_cache_busy: 0 rx0_cache_waive: 0 rx0_congst_umr: 0 rx0_arfs_err: 0 rx0_xdp_tx_xmit: 0 rx0_xdp_tx_full: 0 rx0_xdp_tx_err: 0 rx0_xdp_tx_cqes: 0 rx1_packets: 10 rx1_bytes: 3420 rx1_csum_complete: 10 rx1_csum_unnecessary: 0 rx1_csum_unnecessary_inner: 0 rx1_csum_none: 0 rx1_xdp_drop: 0 rx1_xdp_redirect: 0 rx1_lro_packets: 0 rx1_lro_bytes: 0 rx1_ecn_mark: 0 rx1_removed_vlan_packets: 0 rx1_wqe_err: 0 rx1_mpwqe_filler_cqes: 0 rx1_mpwqe_filler_strides: 0 rx1_buff_alloc_err: 0 rx1_cqe_compress_blks: 0 rx1_cqe_compress_pkts: 0 rx1_page_reuse: 0 rx1_cache_reuse: 0 rx1_cache_full: 0 rx1_cache_empty: 448 rx1_cache_busy: 0 rx1_cache_waive: 0 rx1_congst_umr: 0 rx1_arfs_err: 0 rx1_xdp_tx_xmit: 0 rx1_xdp_tx_full: 0 rx1_xdp_tx_err: 0 rx1_xdp_tx_cqes: 0 rx2_packets: 0 rx2_bytes: 0 rx2_csum_complete: 0 rx2_csum_unnecessary: 0 rx2_csum_unnecessary_inner: 0 rx2_csum_none: 0 rx2_xdp_drop: 0 rx2_xdp_redirect: 0 rx2_lro_packets: 0 rx2_lro_bytes: 0 rx2_ecn_mark: 0 rx2_removed_vlan_packets: 0 rx2_wqe_err: 0 rx2_mpwqe_filler_cqes: 0 rx2_mpwqe_filler_strides: 0 rx2_buff_alloc_err: 0 rx2_cqe_compress_blks: 0 rx2_cqe_compress_pkts: 0 rx2_page_reuse: 0 rx2_cache_reuse: 0 rx2_cache_full: 0 rx2_cache_empty: 448 rx2_cache_busy: 0 rx2_cache_waive: 0 rx2_congst_umr: 0 rx2_arfs_err: 0 rx2_xdp_tx_xmit: 0 rx2_xdp_tx_full: 0 rx2_xdp_tx_err: 0 rx2_xdp_tx_cqes: 0 ... 
tx0_packets: 1 tx0_bytes: 60 tx0_tso_packets: 0 tx0_tso_bytes: 0 tx0_tso_inner_packets: 0 tx0_tso_inner_bytes: 0 tx0_csum_partial: 0 tx0_csum_partial_inner: 0 tx0_added_vlan_packets: 0 tx0_nop: 0 tx0_csum_none: 1 tx0_stopped: 0 tx0_dropped: 0 tx0_xmit_more: 0 tx0_recover: 0 tx0_cqes: 1 tx0_wake: 0 tx0_cqe_err: 0 tx1_packets: 5 tx1_bytes: 300 tx1_tso_packets: 0 tx1_tso_bytes: 0 tx1_tso_inner_packets: 0 tx1_tso_inner_bytes: 0 tx1_csum_partial: 0 tx1_csum_partial_inner: 0 tx1_added_vlan_packets: 0 tx1_nop: 0 tx1_csum_none: 5 tx1_stopped: 0 tx1_dropped: 0 tx1_xmit_more: 0 tx1_recover: 0 tx1_cqes: 5 tx1_wake: 0 tx1_cqe_err: 0 tx2_packets: 0 tx2_bytes: 0 tx2_tso_packets: 0 tx2_tso_bytes: 0 tx2_tso_inner_packets: 0 tx2_tso_inner_bytes: 0 tx2_csum_partial: 0 tx2_csum_partial_inner: 0 tx2_added_vlan_packets: 0 tx2_nop: 0 tx2_csum_none: 0 tx2_stopped: 0 tx2_dropped: 0 tx2_xmit_more: 0 tx2_recover: 0 tx2_cqes: 0 tx2_wake: 0 tx2_cqe_err: 0 ...
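Output like the ethtool -S listing above can be turned into a dictionary for quick inspection. The following is a minimal sketch, assuming the usual one-counter-per-line layout of ethtool -S on a terminal; the function name parse_ethtool_stats and the shortened sample text are illustrative only.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dictionary."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue  # no "name: value" pair on this line
        value = value.strip()
        if value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


# Shortened sample in the layout printed by `ethtool -S <interface>`.
sample = """NIC statistics:
     rx_packets: 10
     rx_bytes: 3420
     tx_packets: 18
     rx_cache_empty: 2688
     rx_out_of_buffer: 0
"""

stats = parse_ethtool_stats(sample)
nonzero = {name: value for name, value in stats.items() if value != 0}
print(nonzero)
```

Filtering out zero-valued counters is a convenient first pass when looking for the Error-type counters described in the tables.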

The following TC objects are supported and reported regarding the ingress filters:

Filters: flower

Actions: mirred, tunnel_key



The info is provided as one of the following events:

Basic filter event

Flower/IPv4 filter event

Flower/IPv6 filter event

Basic action event

Mirred action event

Tunnel_key/IPv4 action event

Tunnel_key/IPv6 action event

General notes:

Actions always belong to a filter, so action events share the filter event's ID via the event_id data member

Basic filter event only contains the textual kind (so users can see which real-life objects are not yet supported)

Basic action event only contains textual kind and some basic common statistics if available
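Because action events carry the ID of the filter they belong to, a consumer can reassemble each filter with its actions by joining on event_id. The sketch below assumes the events have already been decoded into dictionaries; apart from event_id, every field name and value shown is hypothetical.

```python
from collections import defaultdict


def group_actions_by_filter(filter_events, action_events):
    """Attach to each filter event the action events sharing its event_id."""
    actions = defaultdict(list)
    for action in action_events:
        actions[action["event_id"]].append(action)
    return [
        {"filter": flt, "actions": actions.get(flt["event_id"], [])}
        for flt in filter_events
    ]


# Hypothetical decoded events; only `event_id` is taken from the documentation.
filters = [{"event_id": 1, "kind": "flower"}, {"event_id": 2, "kind": "basic"}]
actions = [
    {"event_id": 1, "kind": "mirred"},
    {"event_id": 1, "kind": "tunnel_key"},
]

grouped = group_actions_by_filter(filters, actions)
for entry in grouped:
    print(entry["filter"]["kind"], [a["kind"] for a in entry["actions"]])
```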

The amber provider collects data for both InfiniBand and Ethernet MST devices in amBER format.

Info MST device names can be found under /dev/mst/ .

Note /dev/mst should be accessible within the DTS container.

The following configuration options are available:

    amber_devices=DEV1,DEV2,DEV3    # Default: all, or set a comma-separated list of devices under /dev/mst
    amber_update_interval_sec=30    # Sample rate for collecting amber counters





Programmable congestion control counters are based on an algorithm defined by an end-user, although default algorithms are also available.

Counters are collected per MST device and algorithm parameters.

Info MST device names can be found under /dev/mst/ .

Note /dev/mst should be accessible within the DTS container.

The counter list depends on the installed MFT version.

Note /usr/lib64/mft or /usr/lib/mft should be mounted to the DTS container to get the counter list according to the installed MFT version. If not mounted, the internal DTS version of the counters is used.

A comma-separated list of device names is required to enable this provider:

    ppcc_eth_devices=mt41692_pciconf0,mt41692_pciconf0.1

The following algorithm parameters are available:

    ppcc_algo_slot=1
    ppcc_algo_param_index=0
    local_port=1
    pnat=0
    lp_msb=0

Info For more details, consult the official PPCC documentation.

Note Some of the algo_slots are not implemented:

If there are no counters to collect, the device is ignored

If there are no devices to collect, the provider is disabled





fluent_aggr listens on a port for Fluent Bit Forward protocol input connections. Received data can be streamed via a Fluent Bit exporter.

The default port is 42442. This can be changed by updating the following option:

    fluent-aggr-port=42442





prometheus_aggr polls data from a list of Prometheus endpoints.

Each endpoint is listed in the following format:

    prometheus_aggr_endpoint.{N}={host_name},{host_port_url},{poll_interval_msec}

Where N starts from 0.

Aggregated data can be exported via a Prometheus Aggr Exporter endpoint.
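The endpoint numbering can be generated rather than typed by hand. The sketch below formats a list of endpoints into prometheus_aggr_endpoint.{N} lines with N starting from 0. The host names, URLs, and intervals are placeholders, and the exact content expected in each field should be checked against the DTS configuration in use.

```python
def endpoint_lines(endpoints):
    """Format (host_name, host_port_url, poll_interval_msec) tuples as config lines."""
    return [
        f"prometheus_aggr_endpoint.{n}={host},{url},{interval}"
        for n, (host, url, interval) in enumerate(endpoints)
    ]


# Placeholder endpoints (assumed values, not taken from the documentation).
endpoints = [
    ("host-a", "http://192.0.2.10:9100/metrics", 1000),
    ("host-b", "http://192.0.2.11:9100/metrics", 5000),
]

lines = endpoint_lines(endpoints)
print("\n".join(lines))
```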

ifconfig collects network interface data. To enable, set:

    enable-provider=ifconfig

If the Prometheus endpoint is enabled, add the following configuration to cache every collected network interface and arrange the index according to their names:

    prometheus-fset-indexes=name

Metrics are collected for each network interface as follows:

    name
    rx_packets
    tx_packets
    rx_bytes
    tx_bytes
    rx_errors
    tx_errors
    rx_dropped
    tx_dropped
    multicast
    collisions
    rx_length_errors
    rx_over_errors
    rx_crc_errors
    rx_frame_errors
    rx_fifo_errors
    rx_missed_errors
    tx_aborted_errors
    tx_carrier_errors
    tx_fifo_errors
    tx_heartbeat_errors
    tx_window_errors
    rx_compressed
    tx_compressed
    rx_nohandler





hcaperf collects HCA performance data. Since it requires access to an RDMA device, it must use remote collection on the DPU. On the host, the user runs the container in privileged mode with the RDMA device mounted.

The counter list is device dependent.

To enable hcaperf in remote collection mode, set:

    enable-provider=grpc.hcaperf
    # specify HCAs to sample
    grpc.hcaperf.mlx5_0=sample
    grpc.hcaperf.mlx5_1=sample

Note DPE server should be active before changing the dts_config.ini file. See section "Remote Collection" for details.





To enable hcaperf in regular mode, set:

    enable-provider=hcaperf
    # specify HCAs to sample
    hcaperf.mlx5_0=sample
    hcaperf.mlx5_1=sample

The nvidia-smi provider collects GPU and GPU process information provided by the NVIDIA system management interface.

This provider is supported only on x86_64 hosts with installed GPUs. All GPU cards supported by nvidia-smi are supported by this provider.

The counter list is GPU dependent. Additionally, per-process information is collected for the first nvidia_smi_max_processes processes (20 by default).

Counters can be either collected as string data "as is" in nvidia-smi or converted to numbers when nvsmi_with_numeric_fields is set.

To enable nvidia-smi provider and change parameters, set:

    enable-provider=nvidia-smi
    # Optional parameters:
    #nvidia_smi_max_processes=20
    #nvsmi_with_numeric_fields=1





The dcgm provider collects GPU information provided by the NVIDIA data center GPU manager (DCGM) API.

This provider is supported only on x86_64 hosts with installed GPUs, and requires running the nv-hostengine service (refer to DCGM documentation for details).

DCGM counters are split into several groups by context:

GPU – basic GPU information (always)

COMMON – common fields that can be collected from all devices

PROF – profiling fields

ECC – ECC errors

NVLINK / NVSWITCH / VGPU – fields depending on the device type

To enable DCGM provider and counter groups, set:

    enable-provider=dcgm
    dcgm_events_enable_common_fields=1
    #dcgm_events_enable_prof_fields=0
    #dcgm_events_enable_ecc_fields=0
    #dcgm_events_enable_nvlink_fields=0
    #dcgm_events_enable_nvswitch_fields=0
    #dcgm_events_enable_vgpu_fields=0





The bfperf provider collects calculated performance counters of BlueField Arm cores. It requires the executable bfperf_pmc , which is integrated in the DOCA BFB bundle of BlueField-3, as well as an active DPE.

To enable BlueField performance provider, set:

    enable-provider=bfperf

Note When running, the bfperf provider is expected to recurrently reset the counters of the sysfs.hwmon component. Consider disabling it if bfperf is enabled.





Ngauge is comprised of two providers which gather diagnostic data counters from network interface cards (NICs). These providers support the same counters (as defined in a YAML file), but they differ in usage and collection frequency:

Low frequency provider is defined in dts_config.ini and is controlled by DTS collection loop

High frequency provider is defined in dts_high_freq_config.ini and operates in a distinct flow for a limited duration

The fwctl and mlx5_fwctl drivers (supported on NVIDIA networking devices from BlueField-3 and ConnectX-7 onward) are required for firmware interaction, and are part of the MLNX_OFED driver. To load them, run:

    modprobe -a fwctl mlx5_fwctl

Both providers get the counter set from a YAML file.

To enable the Ngauge low frequency provider, set:

    enable-provider=ngauge_low_freq

Verify that the YAML file name matches the connected NIC's type:

    ngauge-yml-file=/config/ngauge_configs/all-single-port.yml

To configure the Ngauge timestamp collection type, set the following:

    ngauge-timestamp-collection-type=<method>

Where <method> can be one of the following:

no_counters – Do not collect timestamp counters. Default.

start_and_end – Collect sample start and end timestamps

per_counter – Collect every counter collection timestamp

To configure the clock that the firmware should use when collecting timestamps, set the following:

    ngauge-timestamp-source=<clock>

Where <clock> can be one of the following:

RTC - Real-time clock. Default.

FRC - Free-running clock

This provider is designed to support higher sampling frequencies with sub-millisecond resolution. Due to the large scale of the collected data, this provider is intended to run ad hoc, for a limited time period, unlike the usual DTS providers, which are configured with the DTS configuration file /opt/mellanox/doca/services/telemetry/config/dts_config.ini.

Whereas the standard DTS flow is an endless collect-export loop, High Frequency Telemetry (HFT) is an additional external flow designed for the Ngauge high-frequency provider. It is driven by the HFT configuration file, located at /opt/mellanox/doca/services/telemetry/config/dts_high_freq_config.ini. This file defines the HFT session timing parameters, provider settings, and export settings. This means that an HFT session can export to different endpoints and/or protocols than those used by DTS in the standard collection loop. The standard DTS configuration file references the HFT configuration file, enabling DTS to monitor the file's status. The HFT configuration file is also the trigger for the HFT session: when the HFT configuration file is modified, the current HFT session is removed, and a new HFT session is configured (if defined). Removing the HFT configuration file stops pending sessions.

This table provides the details of the required HFT parameters. Refer to section "HFT Configuration File Example" for more helpful tips.

| Option | Description |
| --- | --- |
| start-time | HFT session start time. If not used, the session starts immediately. Given either as a UTC epoch timestamp (in microseconds) or in the syntax HH:MM:SS / HH:MM. |
| end-time | HFT session end time. Ignored if start-time is missing. If not used, end-time is calculated using num-iterations. Given either as a UTC epoch timestamp (in microseconds) or in the syntax HH:MM:SS / HH:MM. |
| num-iterations | Number of iterations. If not used, start-time and end-time are required, and the number of iterations is calculated. |
| sample-time-us | Time interval between iterations (in microseconds). |
| provider | Provider to use. Should be ngauge_high_freq. |
| file-write | Whether to write collected telemetry to files. If enabled, could potentially write several MB of data every second. |
| data-root | Root folder for file writing. Ignored if file-write=false. |
| provider.ngauge-num-samples | Number of samples to collect in one iteration. Affects the buffer used by the firmware for diagnostic data. |
| provider.ngauge-sample-period | Sample period between samples (in nanoseconds). This option specifies the sample interval per iteration, as the provider collects N samples during each iteration. |
| provider.ngauge-yml-file | The Ngauge counters YAML file to use. |
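The relationship between the timing options can be checked with simple arithmetic. The sketch below assumes that the number of iterations equals the session duration divided by sample-time-us, and that one iteration samples for num-samples times the sample period; the documentation does not spell out how DTS performs the calculation, so treat this as an estimate. All function names are illustrative.

```python
def hms_to_seconds(value):
    """Convert 'HH:MM:SS' or 'HH:MM' to seconds since midnight."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 2:
        parts.append(0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def iterations_between(start, end, sample_time_us):
    """Estimate the iteration count for a session from start to end."""
    duration_us = (hms_to_seconds(end) - hms_to_seconds(start)) * 1_000_000
    return duration_us // sample_time_us


def sampling_window_us(num_samples, sample_period_ns):
    """Time spent sampling inside one iteration, in microseconds."""
    return num_samples * sample_period_ns / 1000


# A one-minute session with an iteration every 100000 microseconds.
print(iterations_between("18:00:00", "18:01:00", 100_000))
# 1000 samples spaced 1000 ns apart cover 1000 us of each iteration.
print(sampling_window_us(1000, 1000))
```

Comparing the sampling window with sample-time-us shows what fraction of each iteration is actually covered by samples.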

Both low and high frequency providers can run concurrently. The low frequency provider samples at the DTS standard frequency (defined in dts_config.ini ), and the high frequency provider samples counters based on the HFT configuration file ( dts_high_freq_config.ini ).

To allow both providers to run concurrently, verify that the counters, the timestamp collection type, and the timestamp collection source are identical. Otherwise, when the high frequency provider starts sampling, the low frequency provider hangs until the end of the HFT session.

    ## DTS configuration file for ad-hoc high frequency collection
    ## When modified, the file is parsed and applied.
    ## Note that the folders path is the container path, not the host path.
    ## Each section defines a collection. A file may have several sections, each one defines a high frequency collection.
    ## Section names must be unique and will be used as collection name by clx.
    [hft-collection-session]

    ### Time between samples in microseconds
    sample-time-us=100000

    ### Start time of high frequency collection. Can be in the format HH:MM:SS or HH:MM or as epoch timestamp in microseconds
    ### Note - in container, the time is in UTC
    start-time=18:00:00

    ### End time of high frequency collection. Can be in the format HH:MM:SS or HH:MM or as epoch timestamp in microseconds
    ### Note - in container, the time is in UTC
    end-time=18:01:00

    ### Alternatively, you can set the number of iterations. This and start_time field will determine the end time
    #num-iterations=300

    ### Data provider to use
    provider=ngauge_high_freq

    ### Write data to file system. Could potentially fill up the disk
    file-write=false

    ### Root directory to store the data
    # Ignored if file-write is set to false
    data-root=/data

    ### Enable busy wait between iterations, for a more accurate sample time (default is false)
    #busy-wait-sampling=true

    ### Set prometheus endpoint to enable http endpoint
    #prometheus-endpoint=http:

    ### Set fluentbit config dir to enable fluentbit export
    #fluentbit-config-dir=/config/fluent_bit_configs

    ### Set open telemetry receiver to enable open telemetry export
    #open-telemetry-receiver=http:

    ### Set remote write receiver to enable remote write export
    #remote-write-receiver=http:

    ### Provider specific parameters. Format is 'provider.$KEY=$VALUE'.
    ### The options below are specific to the ngauge high frequency provider

    # Number of samples to collect on each iteration
    provider.ngauge-num-samples=1000

    # The time period (in nanoseconds) between samples
    provider.ngauge-sample-period-nsec=1000

    # The YAML file with the configuration for the ngauge provider
    provider.ngauge-yml-file=/config/ngauge_configs/all-dual-port.yml

    # Ngauge timestamp collection type. Options are ['no_counters', 'start_and_end', 'per_counter']. Default: 'no_counters'
    #provider.ngauge-timestamp-collection-type=start_and_end

    # Ngauge timestamp source. Options are ['RTC', 'FRC']. Default: 'RTC'
    #provider.ngauge-timestamp-source=FRC

For Ngauge compatibility, the counter set is defined in a YAML file.

There are 4 existing YAML files within a DTS container (one per permutation of BlueField-3 and ConnectX-7 with dual or single ports). The path to the YAMLs folder is /opt/mellanox/doca/services/telemetry/config/ngauge_configs which is mounted to /config/ngauge_configs .

By default, YAML files include a counter set that is not device-specific. This implies that the same counter set is utilized across all devices by default.

It is possible to assign a specific device within a YAML file; however, this requires maintaining a separate copy of the YAML file for each device. To manage multiple devices, use the ngauge-yml-dir option to specify a directory for YAML files, where each .yml / .yaml file is utilized. This folder should be available to the container under /opt/mellanox/doca/services/telemetry/config .

The following list describes the expected entries in the YAML file:

counters – sequence of counters to collect

    id – counter data ID

    desc – counter description (optional)

    unit – name of unit to collect from (optional)

    name – name of counter to use (optional). If not specified, the name is generated from the counter description or, when no description is given, from the data ID.

device – name of the mlx device to collect (optional). If not used, the provider requires a single file containing a list of counters, which it then applies to all available devices on the host.

The following is the default all-dual-port.yml provided in DTS:

    counters:
      - id: 0x1020000100000000
        desc: RX bytes port 0
        unit: RX port
      - id: 0x1020000100000001
        desc: RX bytes port 1
        unit: RX port
      - id: 0x1020000300000000
        desc: RX packets port 0
        unit: RX port
      - id: 0x1020000300000001
        desc: RX packets port 1
        unit: RX port
      - id: 0x1140000100000000
        desc: TX bytes port 0
        unit: TX port
      - id: 0x1140000100000001
        desc: TX bytes port 1
        unit: TX port
      - id: 0x1140000300000000
        desc: TX packets port 0
        unit: TX port
      - id: 0x1140000300000001
        desc: TX packets port 1
        unit: TX port
      - id: 0x1100000100000000
        desc: CNP sent packets port 0
        unit: TX Transport
      - id: 0x1100000100000001
        desc: CNP sent packets port 1
        unit: TX Transport
      - id: 0x1080000400000000
        desc: CNP handled packets port 0
        unit: RX Transport
      - id: 0x1080000400000001
        desc: CNP handled packets port 1
        unit: RX Transport
      - id: 0x1080000500000000
        desc: ECN RoCE packets port 0
        unit: RX Transport
      - id: 0x1080000500000001
        desc: ECN RoCE packets port 1
        unit: RX Transport
      - id: 0x1160000b00000000
        desc: PCIe link latency total read ns
        unit: PCIe
        cutoff_min: 1
        cutoff_max: 2e6
      - id: 0x1160000c00000000
        desc: PCIe link latency total read packets
        unit: PCIe
        cutoff_min: 1
        cutoff_max: 3000
      - id: 0x1160000d00000000
        desc: PCIe link latency max read ns
        unit: PCIe
        cutoff_min: 1
        cutoff_max: 3000
      - id: 0x1160000e00000000
        desc: PCIe link latency min read ns
        unit: PCIe
        cutoff_min: 1
        cutoff_max: 3000

Info The NVIDIA Adapters Programmer's Reference Manual (PRM) "Diagnostic Data" section defines the rules for data IDs.

The following counters are available from the DTS default YAML files (and correspond to the YAML file example):

    cnp_handled_packets_port_0
    cnp_handled_packets_port_1
    cnp_sent_packets_port_0
    cnp_sent_packets_port_1
    ecn_roce_packets_port_0
    ecn_roce_packets_port_1
    pcie_link_latency_max_read_ns
    pcie_link_latency_min_read_ns
    pcie_link_latency_total_read_ns
    pcie_link_latency_total_read_packets
    rx_bytes_port_0
    rx_bytes_port_1
    rx_packets_port_0
    rx_packets_port_1
    tx_bytes_port_0
    tx_bytes_port_1
    tx_packets_port_0
    tx_packets_port_1
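Comparing the desc entries of the default YAML file with the counter names listed here suggests that a name is derived by lowercasing the description and joining its words with underscores. The sketch below reproduces that mapping; it is inferred from the examples only and is not a statement of how DTS implements name generation.

```python
def counter_name_from_desc(desc):
    """Lowercase a counter description and join its words with underscores."""
    return "_".join(desc.lower().split())


# Descriptions copied from the default all-dual-port.yml example.
descriptions = [
    "RX bytes port 0",
    "CNP sent packets port 1",
    "ECN RoCE packets port 0",
    "PCIe link latency total read ns",
]

for desc in descriptions:
    print(desc, "->", counter_name_from_desc(desc))
```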

DTS can send the collected data to the following outputs:

Data writer (saves binary data to disk)

Fluent Bit (push-model streaming)

Prometheus endpoint (keeps the most recent data to be pulled)

The data writer is disabled by default to save space on BlueField. Steps for activating data write during debug can be found under section Enabling Data Write.

The schema folder contains JSON-formatted metadata files which allow reading the binary files containing the actual data. The binary files are written according to the naming convention shown in the following example (the tree utility can be installed with apt install tree):

    tree /opt/mellanox/doca/services/telemetry/data/
    /opt/mellanox/doca/services/telemetry/data/
    ├── {year}
    │   └── {mmdd}
    │       └── {hash}
    │           ├── {source_id}
    │           │   └── {source_tag}{timestamp}.bin
    │           └── {another_source_id}
    │               └── {another_source_tag}{timestamp}.bin
    └── schema
        └── schema_{MD5_digest}.json

New binary files appear when the service starts or when the binary file age/size limit is reached. If no schema or no data folders are present, refer to the Troubleshooting section.

Note source_id is usually set to the machine hostname. source_tag is a line describing the collected counters, and it is often set as the provider's name or name of user-counters.

Reading the binary data can be done from within the DTS container using the following command:

    crictl exec -it <Container ID> /opt/mellanox/collectx/bin/clx_read -s /data/schema /data/path/to/datafile.bin

Note The path to the data file must be an absolute path.

Example output:

    {
        "timestamp": 1634815738799728,
        "event_number": 0,
        "iter_num": 0,
        "string_number": 0,
        "example_string": "example_str_1"
    }
    {
        "timestamp": 1634815738799768,
        "event_number": 1,
        "iter_num": 0,
        "string_number": 1,
        "example_string": "example_str_2"
    }
    …
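The example output is a sequence of JSON objects written one after another rather than a single JSON document, so json.loads cannot parse it in one call. The following Python sketch splits such a stream into records with json.JSONDecoder.raw_decode. It assumes the clx_read output contains only JSON objects separated by whitespace, as in the example. iter_json_objects is a hypothetical helper.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip the whitespace between two consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


sample = """
{ "timestamp": 1634815738799728, "event_number": 0, "example_string": "example_str_1" }
{ "timestamp": 1634815738799768, "event_number": 1, "example_string": "example_str_2" }
"""
for record in iter_json_objects(sample):
    print(record["timestamp"], record["example_string"])
```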





The Prometheus endpoint keeps the most recent data to be pulled by the Prometheus server and is enabled by default.

To check that data is available, run the following command on BlueField:

    curl -s http://0.0.0.0:9100/metrics

The command dumps every counter in the following format:

    counter_name {list of meta fields} counter_value timestamp
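A line in this format can be split into its four parts with a small parser. The following Python sketch assumes the meta fields use the standard Prometheus label syntax key="value" separated by commas. The sample line, including its label names and values, is made up for illustration, and parse_metric_line is a hypothetical helper.

```python
import re

# name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^(?P<name>[^\s{]+)\s*(?:\{(?P<labels>.*)\})?\s+'
    r'(?P<value>\S+)(?:\s+(?P<ts>\d+))?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metric_line(line):
    """Parse one metrics line; return None for comments or blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    ts = match.group("ts")
    return {
        "name": match.group("name"),
        "labels": dict(LABEL_RE.findall(match.group("labels") or "")),
        "value": float(match.group("value")),
        "timestamp": int(ts) if ts else None,
    }


# Hypothetical sample line; real label names depend on the DTS configuration.
print(parse_metric_line(
    'rx_bytes_port_0 {device_name="p0",source="host1"} 1234 1634815738799'
))
```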

The endpoint also supports JSON and CSV formats:

    curl -s http://0.0.0.0:9100/json/metrics
    curl -s http://0.0.0.0:9100/csv/metrics

Note The default port for Prometheus can be changed in dts_config.ini .





Prometheus is configured as a part of dts_config.ini .

By default, the Prometheus HTTP endpoint is set to port 9100. Comment out the following line to disable Prometheus export.

    prometheus=http://0.0.0.0:9100

Prometheus can use the data field as an index to keep several data records with different index values. Index fields are added to Prometheus labels.

    # Comma-separated counter set description for Prometheus indexing:
    #prometheus-indexes=idx1,idx2
    # Comma-separated fieldset description for prometheus indexing
    #prometheus-fset-indexes=idx1,idx2

The default fset index is device_name . It allows Prometheus to keep ethtool data for both the p0 and p1 devices.

    prometheus-fset-indexes=device_name

If the fset index is not set, the data from p1 overwrites the data from p0 .
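The overwrite behavior can be pictured as a key-value store in which the index fields are part of the key. The following Python sketch is a conceptual model only, not the DTS implementation. The record layout and the store_latest helper are hypothetical.

```python
def store_latest(records, index_fields=()):
    """Keep the latest value per counter name plus the given index fields."""
    latest = {}
    for record in records:
        key = (record["name"],) + tuple(
            record["fields"].get(field) for field in index_fields
        )
        latest[key] = record["value"]  # a later record replaces an equal key
    return latest


records = [
    {"name": "rx_bytes", "fields": {"device_name": "p0"}, "value": 100},
    {"name": "rx_bytes", "fields": {"device_name": "p1"}, "value": 200},
]

# Without an index both records share one key, so p1 replaces p0.
print(store_latest(records))
# With device_name as the index, each device keeps its own entry.
print(store_latest(records, index_fields=("device_name",)))
```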

For quick filtering by name, the Prometheus exporter accepts a comma-separated list of counter names to ignore:

    #prometheus-ignore-names=counter_name1,counter_name_2

For quick filtering by tag, the Prometheus exporter accepts a comma-separated list of data source tags to ignore.

Add the tags of all streaming data to this list, since the Prometheus exporter cannot be used for streaming. By default, FI_metrics are disabled.

    prometheus-ignore-tags=FI_metrics





The Prometheus aggregator exporter is an endpoint that keeps the latest aggregated data using prometheus_aggr .

This exporter labels data according to its source.

To enable this exporter, set the following two parameters in dts_config.ini :

    prometheus-aggr-exporter-host=0.0.0.0
    prometheus-aggr-exporter-port=33333





Fluent Bit allows streaming to multiple destinations. Destinations are configured in .exp files that are documented in-place and can be found under:

    /opt/mellanox/doca/services/telemetry/config/fluent_bit_configs

Fluent Bit allows exporting data via the "Forward" protocol, which connects to the Fluent Bit/FluentD instance on the customer side.

Export can be enabled manually:

Uncomment the line with fluent_bit_configs=… in dts_config.ini .

Set enable=1 in required .exp files for the desired plugins.

Additional configurations can be set according to instructions in the .exp file if needed.

Restart the DTS.

Set up receiving instance of Fluent Bit/FluentD if needed.

See the data on the receiving side.

Export file destinations are set by configuring .exp files or creating new ones. It is recommended to start by going over documented example files. Documented examples exist for the following supported plugins:

forward

file

stdout

kafka

es (Elasticsearch)

influx

Note All .exp files are disabled by default unless they are configured by the initContainer entry point through the .yaml file.

Note To forward the data to several destinations, create several forward_{num}.exp files. Each of these files must have its own destination host and port.

Each export destination has the following fields:

name – configuration name

plugin_name – Fluent Bit plugin name

enable – 1 or 0 values to enable/disable this destination

host – the host for Fluent Bit plugin

port – port for Fluent Bit plugin

msgpack_data_layout – the msgpacked data format. Default is flb_std . The other option is custom. See section Msgpack Data Layout for details.

plugin_key=val – key-value pairs of Fluent Bit plugin parameter (optional)

counterset / fieldset – file paths (optional). See details in section Cset/Fset Filtering.

source_tag=source_tag1,source_tag2 – comma-separated list of data page source tags to export. All other tags are filtered out during export. Event tags are event provider names. Counters cannot be selected individually here: the counters keyword enables or disables all of them at once.

Note Use # to comment a configuration line.
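To illustrate how several forward_{num}.exp files could be produced for multiple destinations, the following Python sketch renders the fields listed above as key=value lines. The key=value syntax is inferred from the enable=1 and source_tag=… forms used in this section. The hosts, the port, and the configuration names are placeholders, render_exp is a hypothetical helper, and the documented example .exp files remain the authoritative reference for the file syntax.

```python
def render_exp(name, plugin_name, host, port, enable=1,
               msgpack_data_layout="flb_std"):
    """Return the text of one export destination as key=value lines."""
    lines = [
        "# Generated example; all values are placeholders",
        f"name={name}",
        f"plugin_name={plugin_name}",
        f"enable={enable}",
        f"host={host}",
        f"port={port}",
        f"msgpack_data_layout={msgpack_data_layout}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder destinations (documentation-range IPs, assumed port).
destinations = [("192.0.2.10", 24224), ("192.0.2.11", 24224)]

files = {
    f"forward_{num}.exp": render_exp(f"forward-{num}", "forward", host, port)
    for num, (host, port) in enumerate(destinations, start=1)
}

for filename, text in files.items():
    print(f"--- {filename} ---")
    print(text)
```

Each generated file carries its own host and port, matching the requirement for multiple forward destinations.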





Data layout can be configured using .exp files by setting msgpack_data_layout=layout . There are two available layouts: Standard and Custom.

The standard flb_std data layout is an array of 2 fields:

timestamp double value

a plain dictionary (key-value pairs)

The standard layout is appropriate for all Fluent Bit plugins. For example:

    [timestamp_val, {"timestamp"=>ts_val, "type"=>"counters/events", "source"=>"source_val", "key_1"=>val_1, "key_2"=>val_2,...}]

The custom data layout is a dictionary of meta-fields and counter fields. Values are placed into a separate plain dictionary. The custom data format can be dumped with the stdout_raw output plugin of an installed Fluent Bit or forwarded with the forward output plugin.

Counters example:

    {"timestamp"=>timestamp_val, "type"=>"counters", "source"=>"source_val", "values"=> {"key_1"=>val_1, "key_2"=>val_2,...}}

Events example:

    {"timestamp"=>timestamp_val, "type"=>"events", "type_name"=>"type_name_val", "source"=>"source_val", "values"=>{"key_1"=>val_1, "key_2"=>val_2,...}}
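The difference between the two layouts is easier to see side by side. The following Python sketch builds both shapes from the same record using plain lists and dictionaries. It models only the structure described in this section, not the msgpack encoding itself. to_flb_std and to_custom are hypothetical helpers, and the sample values are made up.

```python
def to_flb_std(timestamp, record_type, source, values):
    """Standard layout: [timestamp, flat dictionary with meta and values]."""
    flat = {"timestamp": timestamp, "type": record_type, "source": source}
    flat.update(values)  # counters sit next to the meta-fields
    return [float(timestamp), flat]


def to_custom(timestamp, record_type, source, values, type_name=None):
    """Custom layout: meta-fields on top, counters nested under 'values'."""
    record = {"timestamp": timestamp, "type": record_type, "source": source}
    if record_type == "events":
        record["type_name"] = type_name
    record["values"] = dict(values)
    return record


values = {"key_1": 1, "key_2": 2}
print(to_flb_std(1634815738799728, "counters", "source_val", values))
print(to_custom(1634815738799728, "counters", "source_val", values))
print(to_custom(1634815738799728, "events", "source_val", values,
                type_name="type_name_val"))
```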





Each export file can optionally use one cset and one fset file to filter UFM telemetry counters and events data.

cset contains tokens per line to filter data with "type"="counters" .

fset contains several blocks, each starting with a header line [event_type_name] followed by the tokens for that header. An fset file is used to filter data with "type"="events" .

Note Event type names can be given as a prefix to apply the same tokens to all matching types. For example, to filter all ethtool events, use [ethtool_event_*] .

If several tokens must be matched simultaneously, use <tok1>+<tok2>+<tok3> . Exclusive tokens are available as well. For example, the line <tok1>+<tok2>-<tok3>-<tok4> filters names that match both tok1 and tok2 and do not match tok3 or tok4.

The following are the details of writing cset files:

    # Put tokens on separate lines
    # Tokens are the actual name 'fragments' to be matched
    # port$ # match names ending with token "port"
    # ^port # match names starting with token "port"
    # ^port$ # include name that is exact token "port"
    # port+xmit # match names that contain both tokens "port" and "xmit"
    # port-support # match names that contain the token "port" and do not match the "-" token "support"
    #
    # Tip: To disable counter export put a single token line that fits nothing
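The token rules above can be expressed as a small matcher. The following Python sketch implements the ^ and $ anchors and the + and - combinators as described. It assumes that tokens themselves never contain + or - and that matching is case-sensitive. It is an illustration of the rules, not the DTS implementation, and fragment_matches and line_matches are hypothetical helpers.

```python
import re


def fragment_matches(fragment, name):
    """Match one token: ^tok (prefix), tok$ (suffix), ^tok$ (exact), tok."""
    starts = fragment.startswith("^")
    ends = fragment.endswith("$")
    core = fragment.lstrip("^").rstrip("$")
    if starts and ends:
        return name == core
    if starts:
        return name.startswith(core)
    if ends:
        return name.endswith(core)
    return core in name


def line_matches(line, name):
    """Match one token line such as tok1+tok2-tok3 against a name."""
    parts = re.split(r"([+-])", line.strip())
    if not fragment_matches(parts[0], name):
        return False
    # parts alternates: token, operator, token, operator, token, ...
    for operator, fragment in zip(parts[1::2], parts[2::2]):
        hit = fragment_matches(fragment, name)
        if operator == "+" and not hit:
            return False
        if operator == "-" and hit:
            return False
    return True


for name in ("port_xmit_data", "port_rcv_data", "port_support"):
    print(name, line_matches("port+xmit", name), line_matches("port-support", name))
```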

The following are the details of writing fset files:

    # Put your events here
    # Usage:
    #
    # [type_name_1]
    # tokens
    # [type_name_2]
    # tokens
    # [type_name_3]
    # tokens
    # ...
    # Tokens are the actual name 'fragments' to be matched
    # port$ # match names ending with token "port"
    # ^port # match names starting with token "port"
    # ^port$ # include name that is exact token "port"
    # port+xmit # match names that contain both tokens "port" and "xmit"
    # port-support # match names that contain the token "port" and do not match the "-" token "support"
    # The next example will export all the "tc" events, and all events with type prefix "ethtool_" are filtered with token "packet":
    # [tc]
    #
    # [ethtool_*]
    # packet
    # To know which event type names are available check export and find field "type_name"=>"ethtool_event_p0"
    # ...
    # Corner cases:
    # 1. Empty fset file will export all events.
    # 2. Tokens written above/without [event_type] will be ignored.
    # 3. If cannot open fset file, warning will be printed, all event types will be exported.
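The block structure of an fset file can be read with a few lines of code. The following Python sketch parses headers and tokens and resolves a wildcard header such as [ethtool_*] for a given event type name. It assumes that lines starting with # are comments and that the first matching header wins. What DTS does with event types that match no header is not stated here, so the sketch returns None in that case. parse_fset and tokens_for are hypothetical helpers.

```python
import fnmatch


def parse_fset(text):
    """Return {header: [tokens]} for each [event_type_name] block."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
        # Tokens written before any header are ignored (corner case 2).
    return blocks


def tokens_for(blocks, type_name):
    """Return the tokens of the first header matching type_name, else None."""
    for pattern, tokens in blocks.items():
        if fnmatch.fnmatchcase(type_name, pattern):
            return tokens
    return None


example = "stray_token\n[tc]\n\n[ethtool_*]\npacket\n"
blocks = parse_fset(example)
print(blocks)
print(tokens_for(blocks, "ethtool_event_p0"))
```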

The NetFlow exporter must be used when data is collected as NetFlow packets from the telemetry client applications. In this case, the DOCA Telemetry Exporter NetFlow API sends NetFlow data packets to DTS via IPC. DTS uses the NetFlow exporter to send the data to the NetFlow collector (a third-party service).

To enable the NetFlow exporter, set netflow-collector-ip and netflow-collector-port in dts_config.ini . netflow-collector-ip can be set to either an IP or an address.