ibdiagnet collects and processes standard InfiniBand port counters and vendor-specific port counters. It collects the following counters:

  • PortCounters (collected by default)
  • PortCountersExtended (collected by default)
  • PortRcvErrorDetails (collected by default)
  • PortXmitDiscardDetails (collected by default)
  • LLRCounters (collected by default from supporting devices, ConnectX3/SwitchX only)
  • PerSL/VL counters (for supporting devices when the corresponding option is specified)
  • PortExtendedSpeedCounters (for supporting devices when the corresponding option is specified)
  • Mellanox Diagnostic Counters (for supporting devices when the corresponding option is specified)

Port Counter Types

The following options are applicable when port counters are collected and processed by ibdiagnet:

Parameter | Description | Example

--per_slvl_cntrs

Provides a report of all per-SL/VL port counters (if supported by the devices). These counters are reported in the ibdiagnet2.db_csv file.

-

--sc

Provides a report of NVIDIA Diagnostic counters in the ibdiagnet2.mlnx_cntrs and ibdiagnet2.db_csv files.

-

--scr

Resets all NVIDIA Diagnostic counters (should be used with the --sc option).

ibdiagnet --scr --sc

--extended_speeds <dev-type>

Collects and tests port extended speeds counters.
Supported dev-type:

  • sw (switch only)
  • all (all devices)

These counters are reported in ibdiagnet2.db_csv file (PM_INFO section).

-

--pm_per_lane

Lists all counters per lane (if supported by the devices). Should be used in combination with --extended_speeds.

ibdiagnet --extended_speeds all --pm_per_lane

--pm_get_all

Gets all PM counters. Activates the following flags:

  • --per_slvl_cntrs
  • --sc
  • --extended_speeds all
  • --pm_per_lane

ibdiagnet --pm_get_all

-P | --counter <<PM>=<value>>

Prints every specified counter whose value is greater than its provided threshold.

If 'all' is used, all counters get the same threshold (0 by default).

ibdiagnet -P vl15_dropped=1, port_xmit_discard=1

or

ibdiagnet -P vl15_dropped=1 -P port_xmit_discard=1

or

ibdiagnet -P all

Supported PM Counter names are:

  • symbol_error_counter
  • port_rcv_remote_physical_errors
  • port_rcv_errors
  • port_xmit_discard
  • port_rcv_switch_relay_errors
  • vl15_dropped
  • link_error_recovery_counter
  • link_down_counter
  • port_xmit_constraint_errors
  • port_rcv_constraint_errors
  • local_link_integrity_errors
  • excessive_buffer_errors
  • port_xmit_data
  • port_rcv_data
  • port_xmit_pkts
  • port_rcv_pkts
  • port_xmit_wait
  • port_xmit_data_extended
  • port_rcv_data_extended
  • port_xmit_pkts_extended
  • port_rcv_pkts_extended
  • port_unicast_xmit_pkts
  • port_unicast_rcv_pkts
  • port_multicast_xmit_pkts
  • port_multicast_rcv_pkts
  • sync_header_err_cnt
  • unknown_block_cnt
  • error_detection_counter_lane0
  • error_detection_counter_lane1
  • .....
  • error_detection_counter_lane11
  • fec_correctable_block_counter_lane0
  • fec_correctable_block_counter_lane1
  • .....
  • fec_correctable_block_counter_lane11
  • fec_uncorrectable_block_counter_lane0
  • fec_uncorrectable_block_counter_lane1
  • .....
  • fec_uncorrectable_block_counter_lane11
  • port_rcv_cells
  • port_rcv_cell_for_retry
  • port_rcv_retry
  • port_xmit_cells
  • port_xmit_retry_cells
  • port_xmit_retry
  • port_symbol_error
  • port_error_detection_counter_lane0
  • .....
  • port_error_detection_counter_lane3
  • max_retransmission_rate
  • retransmission_per_sec
  • fec_corrected_symbol_counter_lane0
  • fec_corrected_symbol_counter_lane1
  • .....
  • fec_corrected_symbol_counter_lane11
  • port_fec_correctable_block_counter
  • port_fec_uncorrectable_block_counter
  • port_fec_corrected_symbol_counter
  • all
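
The threshold rule behind -P can be sketched in a few lines of Python. This only illustrates the semantics described above and is not ibdiagnet's implementation: the helper names parse_thresholds and counters_over_threshold are hypothetical, the sample counter values are made up, and letting an explicitly named counter override the 'all' threshold is an assumption.

```python
def parse_thresholds(specs):
    """Turn strings such as 'vl15_dropped=1' or 'all' into a dict.

    Hypothetical helper mirroring the <PM>=<value> form shown above.
    A name given without '=<value>' gets threshold 0, the default
    stated for 'all'.
    """
    thresholds = {}
    for spec in specs:
        # Accept comma-separated lists as well as repeated options.
        for item in spec.split(","):
            item = item.strip()
            if not item:
                continue
            name, sep, value = item.partition("=")
            thresholds[name.strip()] = int(value) if sep else 0
    return thresholds


def counters_over_threshold(counters, thresholds):
    """Return {name: value} for counters strictly greater than their threshold."""
    default = thresholds.get("all")  # None when 'all' was not requested
    flagged = {}
    for name, value in counters.items():
        # Assumption: a per-counter threshold takes precedence over 'all'.
        limit = thresholds.get(name, default)
        if limit is not None and value > limit:
            flagged[name] = value
    return flagged


# Made-up sample values for one port, for illustration only.
sample = {"vl15_dropped": 3, "port_xmit_discard": 1, "symbol_error_counter": 0}

limits = parse_thresholds(["vl15_dropped=1, port_xmit_discard=1"])
print(counters_over_threshold(sample, limits))
```

With these sample values only vl15_dropped is reported, because port_xmit_discard equals its threshold rather than exceeding it.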

Port Counters Reset

Parameter | Description

--pc

Resets all IB-spec-compliant port counters in the fabric (PortCounters and PortCountersExtended), as well as the RN, AR, and HBF counters.

Note: It is recommended to use this option together with --reset_phy_info. The counters handled by the two options are interrelated, so using only one of them can produce confusing results on the next collection of counters or registers.

--pm_clear_all

Clears all PM counters. Activates the following flags:

  • --scr
  • --pc
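
For scripted use, the reset options above can be assembled into a command line as in the following minimal sketch. The function name reset_command and its parameters are hypothetical; the flags themselves are the ones listed in this section, and the sketch only builds the argument list without running anything.

```python
import shlex


def reset_command(also_reset_phy=True, clear_all_pm=False):
    """Build an ibdiagnet argument list for resetting counters.

    Hypothetical helper: picks --pm_clear_all or --pc, and appends
    --reset_phy_info as recommended in the note above.
    """
    cmd = ["ibdiagnet", "--pm_clear_all" if clear_all_pm else "--pc"]
    if also_reset_phy:
        cmd.append("--reset_phy_info")
    return cmd


print(shlex.join(reset_command()))
```

The resulting list can be passed to subprocess.run on a host where ibdiagnet is installed.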

Port Counters Delta Validation

Parameter | Description | Example

--pm_pause_time <seconds>

Specifies the delay, in seconds, between counter samples. If set to 0, only a single sample is taken. Default: 1 second. The delta between the first and second counter samples is written to the PM_DELTA section of the db_csv file.

ibdiagnet --pm_pause_time 60
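
After such a run, the PM_DELTA section can be pulled out of the db_csv file for further processing. The sketch below assumes that each section is delimited by START_<NAME> and END_<NAME> lines, with a comma-separated header row directly after the start marker; this layout is not described on this page, so check it against an actual ibdiagnet2.db_csv file first. The function name read_section, the column names, and the sample text (including the GUID) are hypothetical.

```python
import csv
import io


def read_section(db_csv_text, name):
    """Return the rows of one db_csv section as a list of dicts.

    Assumed layout (not confirmed by this page): a 'START_<name>' line,
    a CSV header row, data rows, then an 'END_<name>' line.
    """
    lines = []
    inside = False
    for line in db_csv_text.splitlines():
        stripped = line.strip()
        if stripped == f"START_{name}":
            inside = True
        elif stripped == f"END_{name}":
            break
        elif inside and stripped:
            lines.append(stripped)
    # An absent section yields an empty list.
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


# Hypothetical excerpt; real files carry many more columns.
sample_text = """\
START_PM_DELTA
NodeGUID,PortNumber,port_xmit_discard
0x0002c90300a1b2c3,1,4
0x0002c90300a1b2c3,2,0
END_PM_DELTA
"""

rows = read_section(sample_text, "PM_DELTA")
# Keep only ports whose counter grew between the two samples.
growing = [r for r in rows if int(r["port_xmit_discard"]) > 0]
print(growing)
```

Filtering on a non-zero delta, rather than on the absolute counter value, highlights ports where errors are still accumulating during the pause interval.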