Port Counters

ibdiagnet collects and processes standard InfiniBand port counters and vendor-specific port counters. The following counters are collected by ibdiagnet:

  • PortCounters (collected by default)

  • PortCountersExtended (collected by default)

  • PortRcvErrorDetails (collected by default)

  • PortXmitDiscardDetails (collected by default)

  • LLRCounters (collected by default from supporting devices; ConnectX-3/SwitchX only)

  • PerSL/VL counters (for supporting devices, when the corresponding option is specified)

  • PortExtendedSpeedCounters (for supporting devices, when the corresponding option is specified)

  • Mellanox Diagnostic Counters (for supporting devices, when the corresponding option is specified)
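The split between default and optional counter groups can be modeled as a small lookup table. The following is a minimal Python sketch, assuming the option-to-group mapping given in the options table below (--per_slvl_cntrs, --extended_speeds, --sc). The names DEFAULT_GROUPS, OPTIONAL_GROUPS and counter_groups are hypothetical and not part of ibdiagnet, and the sketch ignores whether a device actually supports a group.

```python
# Hypothetical helper, not part of ibdiagnet: models which counter groups
# are requested for a given list of command-line tokens.

# Groups collected by default (LLRCounters only on supporting devices).
DEFAULT_GROUPS = [
    "PortCounters",
    "PortCountersExtended",
    "PortRcvErrorDetails",
    "PortXmitDiscardDetails",
    "LLRCounters",
]

# Optional groups, keyed by the option that enables each one.
OPTIONAL_GROUPS = {
    "--per_slvl_cntrs": "PerSL/VL counters",
    "--extended_speeds": "PortExtendedSpeedCounters",
    "--sc": "Mellanox Diagnostic Counters",
}


def counter_groups(tokens):
    """Return the counter groups requested by `tokens` (device support ignored)."""
    present = set(tokens)
    groups = list(DEFAULT_GROUPS)
    for flag, group in OPTIONAL_GROUPS.items():
        if flag in present:
            groups.append(group)
    return groups


print(counter_groups(["--extended_speeds", "all", "--sc"]))
```

Option arguments such as "all" are simply ignored by the lookup, so the token list can be taken directly from a command line.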

The following options are applicable when port counters are collected and processed by ibdiagnet:

Parameter

Description

Example

--per_slvl_cntrs

Provides a report of all per-SL/VL port counters (if supported by the devices).
These counters are reported in the ibdiagnet2.db_csv file.

-

--sc

Provides a report of NVIDIA Diagnostic counters in the ibdiagnet2.mlnx_cntrs and ibdiagnet2.db_csv files.

-

--scr

Resets all the NVIDIA Diagnostic counters (should be used with the --sc option).

ibdiagnet --scr --sc

--extended_speeds <dev-type>

Collects and tests the port extended speeds counters.
Supported dev-type values:

  • sw (switch only)

  • all (all devices)

These counters are reported in the PM_INFO section of the ibdiagnet2.db_csv file.

-

--pm_per_lane

Lists all counters per lane (if supported by the devices). Should be used in combination with --extended_speeds.

ibdiagnet --extended_speeds all --pm_per_lane
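Two options in this table depend on a companion option: --scr should be used with --sc, and --pm_per_lane with --extended_speeds. A wrapper script could check this before running ibdiagnet. The following is a minimal Python sketch of such a pre-flight check. The names REQUIRES and missing_companions are hypothetical, the sketch does not model what ibdiagnet itself does when a companion option is missing, and it does not account for the aggregate --pm_get_all option.

```python
# Hypothetical pre-flight check, not part of ibdiagnet: reports option
# combinations that the option table says should be used together.

# option -> option it should be combined with
REQUIRES = {
    "--scr": "--sc",
    "--pm_per_lane": "--extended_speeds",
}


def missing_companions(argv):
    """Return (option, required_option) pairs whose companion is absent."""
    present = set(argv)
    return [
        (opt, needed)
        for opt, needed in REQUIRES.items()
        if opt in present and needed not in present
    ]


cmd = ["ibdiagnet", "--pm_per_lane"]
for opt, needed in missing_companions(cmd):
    print(f"warning: {opt} should be used with {needed}")
```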

--pm_get_all

Gets all PM counters. Activates the following flags:
--per_slvl_cntrs
--sc
--extended_speeds all
--pm_per_lane

ibdiagnet --pm_get_all

-P | --counter <<PM>=<value>>

If any of the provided PM counters is greater than its provided value, it is printed to the ibdiagnet2.log file.

ibdiagnet -P vl15_dropped=1,port_xmit_discard=1

or

ibdiagnet -P vl15_dropped=1 -P port_xmit_discard=1

The supported PM counter names are:

  • symbol_error_counter

  • port_rcv_remote_physical_errors

  • port_rcv_errors

  • port_xmit_discard

  • port_rcv_switch_relay_errors

  • vl15_dropped

  • link_error_recovery_counter

  • link_down_counter

  • port_xmit_constraint_errors

  • port_rcv_constraint_errors

  • local_link_integrity_errors

  • excessive_buffer_errors

  • port_xmit_data

  • port_rcv_data

  • port_xmit_pkts

  • port_rcv_pkts

  • port_xmit_wait

  • port_xmit_data_extended

  • port_rcv_data_extended

  • port_xmit_pkts_extended

  • port_rcv_pkts_extended

  • port_unicast_xmit_pkts

  • port_unicast_rcv_pkts

  • port_multicast_xmit_pkts

  • port_multicast_rcv_pkts

  • sync_header_err_cnt

  • unknown_block_cnt

  • error_detection_counter_lane0

  • error_detection_counter_lane1

  • .....

  • error_detection_counter_lane11

  • fec_correctable_block_counter_lane0

  • fec_correctable_block_counter_lane1

  • .....

  • fec_correctable_block_counter_lane11

  • fec_uncorrectable_block_counter_lane0

  • fec_uncorrectable_block_counter_lane1

  • .....

  • fec_uncorrectable_block_counter_lane11

  • port_rcv_cells

  • port_rcv_cell_for_retry

  • port_rcv_retry

  • port_xmit_cells

  • port_xmit_retry_cells

  • port_xmit_retry

  • port_symbol_error

  • port_error_detection_counter_lane0

  • .....

  • port_error_detection_counter_lane3

  • max_retransmission_rate

  • retransmission_per_sec

  • fec_corrected_symbol_counter_lane0

  • fec_corrected_symbol_counter_lane1

  • .....

  • fec_corrected_symbol_counter_lane11

  • port_fec_correctable_block_counter

  • port_fec_uncorrectable_block_counter

  • port_fec_corrected_symbol_counter

  • all
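The -P check compares each named counter against its threshold and reports only the counters whose value is strictly greater than the threshold. Both example forms (one comma-separated argument, or a repeated -P flag) express the same set of thresholds. The following is a minimal Python sketch of that logic, assuming integer thresholds. The function names and the sample counter values are hypothetical, and the special name "all" is not modeled.

```python
# Hypothetical model of the -P/--counter threshold check; names and values
# are illustrative and not part of ibdiagnet.


def parse_thresholds(values):
    """Parse -P arguments such as "vl15_dropped=1,port_xmit_discard=1".

    `values` holds one string per -P occurrence, so the comma-separated
    form and the repeated-flag form produce the same dictionary.
    """
    thresholds = {}
    for value in values:
        for item in value.split(","):
            item = item.strip()
            if not item:
                continue
            name, _, limit = item.partition("=")
            thresholds[name.strip()] = int(limit)
    return thresholds


def exceeded(counters, thresholds):
    """Return the counters whose value is strictly greater than their threshold."""
    return {
        name: counters[name]
        for name, limit in thresholds.items()
        if name in counters and counters[name] > limit
    }


sample = {"vl15_dropped": 0, "port_xmit_discard": 7, "symbol_error_counter": 3}
one_arg = parse_thresholds(["vl15_dropped=1,port_xmit_discard=1"])
two_args = parse_thresholds(["vl15_dropped=1", "port_xmit_discard=1"])
assert one_arg == two_args
print(exceeded(sample, one_arg))  # {'port_xmit_discard': 7}
```

symbol_error_counter is not reported even though it is nonzero, because no threshold was given for it.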

Parameter

Description

--pc

Resets all IB specification-compliant port counters in the fabric (PortCounters and PortCountersExtended), as well as the RN, AR and HBF counters.

Note: It is recommended to use this option together with --reset_phy_info. Both options reset overlapping counters, so using only one of them can produce confusing results in the next collection of counters or registers.

--pm_clear_all

Clears all PM counters. Activates the following flags:
--scr
--pc
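The two aggregate options, --pm_get_all and --pm_clear_all, are shorthands for sets of individual flags listed in the option tables. The following is a minimal Python sketch that expands them into an equivalent command line. It models --pm_clear_all as --scr plus --pc (the port-counter reset option described above). The names AGGREGATES and expand are hypothetical, and ibdiagnet may expand these options differently internally.

```python
# Hypothetical helper, not part of ibdiagnet: expands the aggregate PM
# options into the individual flags they activate, per the option tables.

AGGREGATES = {
    "--pm_get_all": ["--per_slvl_cntrs", "--sc", "--extended_speeds", "all", "--pm_per_lane"],
    "--pm_clear_all": ["--scr", "--pc"],
}


def expand(argv):
    """Replace each aggregate option in argv with its component flags."""
    out = []
    for arg in argv:
        out.extend(AGGREGATES.get(arg, [arg]))
    return out


print(" ".join(expand(["ibdiagnet", "--pm_get_all"])))
# ibdiagnet --per_slvl_cntrs --sc --extended_speeds all --pm_per_lane
```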

Parameter

Description

Example

--pm_pause_time <seconds>

Specifies the delay, in seconds, between counter samples (default: 1 second). If set to 0, only a single sample is taken.
The delta between the first and second counter samples is written to the PM_DELTA section of the ibdiagnet2.db_csv file.

ibdiagnet --pm_pause_time 60
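Each value in the PM_DELTA section is the second sample minus the first, taken --pm_pause_time seconds apart. The following is a minimal Python sketch of that subtraction, plus an illustrative conversion to an average per-second rate. The function names and sample values are hypothetical. The sketch does not model counter saturation or how ibdiagnet formats the PM_DELTA section.

```python
# Hypothetical model of the PM_DELTA computation: subtract the first sample
# from the second for each counter. Not ibdiagnet code.


def pm_delta(first, second):
    """Per-counter difference between two samples taken pause_time apart."""
    return {name: second[name] - first[name] for name in first if name in second}


def rates(delta, pause_time):
    """Average change per second; undefined when only one sample is taken."""
    if pause_time <= 0:
        raise ValueError("pause_time must be positive; 0 means a single sample")
    return {name: value / pause_time for name, value in delta.items()}


t0 = {"port_xmit_data": 1_000, "symbol_error_counter": 2}
t60 = {"port_xmit_data": 61_000, "symbol_error_counter": 2}
d = pm_delta(t0, t60)
print(d)             # {'port_xmit_data': 60000, 'symbol_error_counter': 0}
print(rates(d, 60))  # {'port_xmit_data': 1000.0, 'symbol_error_counter': 0.0}
```

A pause time of 0 yields no delta at all, which is why the rate helper rejects it instead of dividing by zero.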

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.