ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual v2.8.0

Basic Commands

Command

Description

--aguid

Run Alias GUID stage.

--am_key <am_key>

Specifies a constant SHARP am_key for the fabric.

--am_key_file <path_to_am_key_file>

Specifies the path to the SHARP am_key_file: guid2am_key.

--back_compat_db <version.sub_version>

Shows the ports section in "ibdiagnet2.db_csv" according to the given version (default version: 2.0; 0 means the latest version).

--ber_test

Runs a BER test for each port: calculates the BER for each port and checks that no BER value exceeds the BER threshold (default threshold="10^-12").

This option applies to SwitchX/ConnectX-4/ConnectX-3 devices only. For later devices, use --get_phy_info for BER validation.

--ber_thresh <value>

Specifies the threshold value for the BER test. The value must be given as the reciprocal of the BER. Example: for a BER of 10^-12, the value must be 1000000000000 or 0xe8d4a51000 (10^12).

If the given threshold is 0, all BER values for all ports are reported. This option applies to SwitchX/ConnectX-4/ConnectX-3 devices only.
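
The reciprocal conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the helper name ber_thresh_value is hypothetical and is not part of ibdiagnet.

```python
from fractions import Fraction


def ber_thresh_value(ber: Fraction) -> int:
    """Return the reciprocal of a BER as an integer for --ber_thresh.

    Hypothetical helper: it only performs the arithmetic described
    in the manual text.
    """
    if ber <= 0:
        raise ValueError("BER must be positive")
    reciprocal = 1 / ber
    if reciprocal.denominator != 1:
        raise ValueError("the reciprocal of this BER is not a whole number")
    return reciprocal.numerator


# A BER of 10^-12 corresponds to a threshold value of 10^12.
value = ber_thresh_value(Fraction(1, 10**12))
print(value, hex(value))
```

Exact fractions are used instead of floats so that the reciprocal of 10^-12 comes out as exactly 10^12.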

-c | --create_config_file <config-file>

Creates a template configuration file.

--clear_congestion_counters

Displays the Congestion Counters and clears them. This option also activates the congestion_control option.

--config_file <config-file>

Specifies the configuration file.

--congestion_control

Displays Congestion Control info.

--congestion_counters

Displays the Congestion Counters. This option also activates the congestion_control option.

--dbg_levels

Verbosity levels to be applied to the debug log file.

Possible values are:

  • 0x01 - Error

  • 0x02 - Info

  • 0x04 - MAD

  • 0x08 - Discover

  • 0x10 - Debug

  • 0x20 - Funcs

  • 0x80 - Sys

  • 0xff - ALL
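
The listed values look like bit flags, since 0xff is given as ALL. Assuming they can be combined with a bitwise OR (the manual does not state this explicitly), a combined level could be computed as in the following Python sketch. The helper name dbg_levels_mask is hypothetical.

```python
# Level values copied from the list above.
DBG_LEVELS = {
    "Error": 0x01,
    "Info": 0x02,
    "MAD": 0x04,
    "Discover": 0x08,
    "Debug": 0x10,
    "Funcs": 0x20,
    "Sys": 0x80,
}


def dbg_levels_mask(*names: str) -> int:
    """OR together the named levels (assumption: the levels are bit flags)."""
    mask = 0
    for name in names:
        mask |= DBG_LEVELS[name]
    return mask


# Error + Info + MAD
print(hex(dbg_levels_mask("Error", "Info", "MAD")))
```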

--dbg_modules

Comma-separated names of the modules to be added to the debug log file.

Possible values are:

  • IBIS, IBDIAG, IBDM, IBDIAGNET, ALL

--dfp

Provides a report of the fabric Dragonfly+ analysis.

--dfp_opt <max_cas=num>

Comma-separated Dragonfly+ options (if the --dfp option is selected):

  • max_cas: maximum number of CAs on a switch for it to be counted as a Dragonfly+ spine. This parameter is mutually exclusive with --smdb.

--enable_output <files types list | csv section name>

Enables output for files and csv sections.

  • A CSV section name must have the prefix 'csv:'.

    • For examples of csv sections, see the '.db_csv' file.

  • Example file types (by file extension):

    • lst|sm|pm|nodes_info|fdbs|mcfdbs|debug|pkey|aguid|slvl|vl2vl|plft|ar|far|rn|rnc|mlnx_cntrs|net_dump|vports|vports_pkey|sharp|
      sharp_an_info|sharp_pm|cables|port_attr|net_dump_ext|db_csv

  • Specific reserved types:

    • <default|csv:default> : Disabled by default for types that were not set.

    • <all|csv:all> : Disabled for all types, ignoring any specified value for file or csv section.

--discovery_only

Dumps only the db_csv output file, containing the discovery results.

--disable_output <files types list|csv section name>

Disables output for files and csv sections.

  • A CSV section name must have the prefix 'csv:'.

    • For examples of csv sections, see the '.db_csv' file.

  • Example file types (by file extension):

    • lst|sm|pm|nodes_info|fdbs|mcfdbs|debug|pkey|aguid|slvl|vl2vl|plft|ar|far|rn|rnc|mlnx_cntrs|net_dump|vports|vports_pkey|sharp|
      sharp_an_info|sharp_pm|cables|port_attr|net_dump_ext|db_csv

  • Specific reserved types:

    • <default|csv:default> : Enabled by default for types that were not set.

    • <all|csv:all> : Enabled for all types, ignoring any specified value for file or csv section.

--enable_spst

Deprecated. Skips switch ports that are down while discovering the fabric, using the Switch Port State Table of the switch (enabled by default).

--enable_switch_dup_guid

Enables detection of duplicated switch GUIDs while discovering the fabric.

--exclude_scope <file.guid>

Specifies a file with a list of Node-GUIDs and their ports to be excluded from the scope.

The ibdiagnet2.ibnetdiscover file will not be generated.

--extended_speeds <dev-type>

Collects and tests port extended speeds counters.

dev-type:

  • sw | all | none

-f | --load_from_file <path to ibdiagnet2.db_csv file>

Loads ibdiagnet2.db_csv from an external file. Use this option to skip the discovery stage.

--fec_mode

Dumps the FEC mode section in the CSV file.

--ft

Provides a report of the fabric Fat Tree analysis.

-g | --guid <GUID in hex>

Specifies the local port GUID value of the port used to connect to the IB fabric. If the given GUID is 0, ibdiagnet displays a list of possible port GUIDs and waits for user input.

--gmp_window <num>

Maximum number of GMP MADs on the wire. (default=128).

-H | --deep_help

Deprecated - same as -h|--help.

-h | --help

Prints help information (including plugin help, if any exists).

-i | --device <dev-name>

Specifies the name of the device of the port used to connect to the IB fabric (in case of multiple devices on the local system).

--llr_active_cell <0|64|128>

Specifies the LLR active cell size for the BER test, when LLR is active in the fabric. (0 - not specified). This option applies to SwitchX/ConnectX-4/ConnectX-3 devices only.

--ls <0|2.5|5|10|14|25|50|100|FDR10>

Specifies the expected link speed. (0 - disable expected link speed)

--lw <0|1x|2x|4x|8x|12x>

Specifies the expected link width. (0 - disable expected link width)

-m | --map <map-file>

Specifies a mapping file that maps node GUIDs to names (format: 0x[0-9a-fA-F]+ "name"). The mapping file can also be specified by the environment variable "IBUTILS_NODE_NAME_MAP_FILE_PATH".
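
To illustrate the line format given above, the following Python sketch parses mapping-file text into a dictionary. The function name, the sample GUIDs and the sample names are hypothetical. Skipping lines that do not match the pattern is an assumption of this sketch, not documented ibdiagnet behavior.

```python
import re

# One entry per line: a hex GUID followed by a double-quoted name.
LINE_RE = re.compile(r'^\s*(0x[0-9a-fA-F]+)\s+"([^"]*)"\s*$')


def parse_node_name_map(text: str) -> dict:
    """Map node GUID (as int) to node name for every matching line."""
    mapping = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # assumption: non-matching lines are ignored
            mapping[int(match.group(1), 16)] = match.group(2)
    return mapping


# Hypothetical sample content.
sample = (
    '0x0002c90300a1b2c3 "leaf-switch-01"\n'
    '0x0002c90300a1b2c4 "spine-switch-01"\n'
)
for guid, name in parse_node_name_map(sample).items():
    print(hex(guid), name)
```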

--m_key <m_key>

Specifies a constant m_key for the fabric.

--m_key_files <path to m_key_files directory>

Specifies the path to the directory with the key files (guid2lid, guid2mkey, neighbors, guid2cckey, guid2vskey).

--mads_retries <mads-retries>

Specifies the number of retries for every MAD that times out. (default=2).

--mads_timeout <mads-timeout>

Specifies the timeout (in milliseconds) for sent and received MADs. (default=500).

--max_hops <max-hops>

Specifies the maximum hops for the discovery process. (default=64).

-o | --output_path <directory>

Specifies the directory where the output files will be placed.

--out_ibnl_dir <directory>

Specifies the directory for the topology file's custom system definitions (ibnl).

-P | --counter <<PM>=<value>>

Prints each of the provided PM counters whose value is greater than the value provided for it.

-p | --port <port-num>

Specifies the local device's port number used to connect to the IB fabric.
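
As an illustration of how the connection and output options fit together, the following Python sketch assembles an argument list from --device, --port and --output_path. It only builds the list and does not run the tool. The function name, the device name "mlx5_0" and the output directory are hypothetical. The executable name "ibdiagnet" and the space between option and value are assumptions.

```python
def build_ibdiagnet_args(device=None, port=None, output_path=None):
    """Return an argv-style list; options left as None are omitted."""
    args = ["ibdiagnet"]  # assumption: executable name on PATH
    if device is not None:
        args += ["--device", device]
    if port is not None:
        args += ["--port", str(port)]
    if output_path is not None:
        args += ["--output_path", output_path]
    return args


# Hypothetical device name and output directory.
print(build_ibdiagnet_args(device="mlx5_0", port=1, output_path="/tmp/ibdiag_out"))
```

A list of arguments, rather than a single string, can be passed to subprocess.run without shell quoting concerns.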

--path <files types list>=<path>

Sets custom path for specific files.

  • Specific reserved types:

    • <default> : Sets the default path for types that were not set.

    • <all> : Sets the path for all types, ignoring any specified value for file or csv section.

--pc

Resets all fabric IB spec compliant port counters (PortCounters and PortCountersExtended).

--per_slvl_cntrs

Provides a report of all per SL/VL port counters.

--pm_pause_time <seconds>

Specifies the number of seconds to wait between the first and second counter samples. If 0 is given, no second counter sample is taken. (default=1).

--pm_per_lane

Lists all counters per lane (when available).

--qos

Displays the QoS Config SL information.

-r | --routing

Provides a report of the fabric qualities.

--r_opt

Comma-separated routing options (if the -r option is selected):

  • vs: Collect and check vendor specific routing settings like AR and PLFT. (enabled by default)

  • far: Dump full ar tables data to file. (enabled by default)

  • skip_vs: Skip collecting and checking vendor specific routing settings like AR and PLFT.

  • skip_far: Skip dumping full AR tables data to file.

  • rn: (Deprecated! - enabled by default) Dump routing notification data to file.

  • drnc: (Deprecated! - enabled by default) Dump routing notification port counters to file.

  • crnc: Clear routing notification port counters.

  • sl=<sl_num>: SL number to be used for ar connectivity and credit loop check.

  • check_sl: Check all SL2VL tables. SL should not be mapped to VL15.

  • mcast: Multicast credit loop check. It is recommended to use this option with sa_dump.

  • dump_only: Dump routing configuration files and skip routing checks.

  • dump_only_skip_routing_tables: Dump routing data and skip retrieving the routing tables (LFTs).

  • static_ca2ca: Also run the static CA to CA routing check, even if AR is enabled.
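
Because the routing options are passed as one comma-separated value, it can help to build that value programmatically. The Python sketch below joins option names from the list above and rejects unknown names. The helper name r_opt_value is hypothetical and the validation is only a convenience of this sketch.

```python
# Option names copied from the list above (sl=<sl_num> is handled separately).
R_OPT_FLAGS = {
    "vs", "far", "skip_vs", "skip_far", "rn", "drnc", "crnc",
    "check_sl", "mcast", "dump_only",
    "dump_only_skip_routing_tables", "static_ca2ca",
}


def r_opt_value(*flags, sl=None):
    """Return a comma-separated value for --r_opt."""
    unknown = [flag for flag in flags if flag not in R_OPT_FLAGS]
    if unknown:
        raise ValueError(f"unknown routing options: {unknown}")
    parts = list(flags)
    if sl is not None:
        parts.append(f"sl={sl}")
    return ",".join(parts)


print(r_opt_value("skip_far", "mcast", sl=1))
```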

--rail_validation

Checks whether the topology is rail optimized (default - disabled).

--rail_validation_opt <regex='regular expression'>

Comma-separated Rail Optimized Validation options (if the --rail_validation option is selected):

  • regex: regular expression used to filter HCA nodes from reports. It is applied to the HCA node descriptions.

--read_capability <file name>

Specifies a capability masks configuration file, which provides the capability mask configuration for the fabric. ibdiagnet uses this mapping when sending Vendor Specific MADs.

--routers

Discovers routers' tables.

--sa_dump <file>

Specifies the path of the opensm-sa.dump file, which contains the multicast groups definition generated by the SM. Used for the mcast credit loop check (if the -r option is selected and r_opt=mcast).

--sc

Provides a report of Mellanox counters.

--scope <file.guid>

Specifies a file with a list of Node-GUIDs and their ports to be left in the scope.

The ibdiagnet2.ibnetdiscover file will not be generated.

--scr

Resets all the Mellanox counters (if the --sc option is selected).

--screen_num_errs <num>

Specifies the threshold for printing errors to screen. (default=5).

--sharp

Collects the SHARP configuration, checks it, and dumps it to file.

--sharp_control_version < 0|1|2 >

Checks and dumps only SHARP nodes with the specified version (default 0 - all nodes).

--sharp_opt <[csc][dsc][dscp]>

Comma-separated SHARP options (if the --sharp option is selected):

  • csc: Clear SHARP counters.

  • dsc: Dump SHARP performance counters to the db_csv file. This option is for debug.

  • dscp: Dump SHARP HBA performance counters per port to the db_csv file. This option is for debug.

--skip <stage>

Skips the execution of the given stage.

Applicable skip stages:

  • dup_guids | dup_node_desc | lids | sm | nodes_info | pkey | aguid | vs_cap_smp | vs_cap_gmp | links | pm | speed_width_check | temp_sensing | virt | all.

--skip_plugin <library name>

Skips loading the library with the given name.

Applicable skip plugins:

  • libibdiagnet_cable_diag_plugin-2.1.1

  • libibdiagnet_phy_diag_plugin-2.1.1

--sl <sl>

Specifies the SL to be used for QP1 MADs. (default=0).

--smdb <path to SMDB file>

Loads Routing Engine and Ranks from the User Subnet Manager SMDB file. Used for Adaptive Routing validation (if -r option selected) and Dragonfly+ validation (if --dfp option selected).

--smp_window <num>

Maximum number of SMP MADs on the wire. (default=8).

-t | --topo_file <file>

Specifies the topology file name.

-V | --version

Prints the version of the tool.

--vlr <file>

Specifies the path of the opensm-path-records.dump file, which contains the src-dst to SL mapping generated by the SM plugin. ibdiagnet uses this mapping for sending MADs and for the credit loop check (if the -r option is selected).

-w | --write_topo_file <file name>

Writes out a topology file for the discovered topology.

--write_capability <file name>

Writes out an example file for capability masks configuration, and also the default capability masks for some devices.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.