ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual v2.9.0

Useful Options

Parameter

Description

--screen_num_errs <num>

Specifies the maximum number of error/warning messages logged to the screen (default=5). If the number of errors/warnings exceeds <num>, the additional error/warning messages are logged to the ibdiagnet2.log file.

Example:

ibdiagnet

Output (default):

Nodes Information
-I- Devid: 4099(0x1003), PSID: MT_1090120019, Latest FW Version:2.42.5000
-I- Devid: 4103(0x1007), PSID: MT_1090111019, Latest FW Version:2.42.5000
-I- Devid: 4115(0x1013), PSID: MT_2190110032, Latest FW Version:12.100.5600
-I- Devid: 4119(0x1017), PSID: MT_0000000008, Latest FW Version:16.18.160
-I- Devid: 51000(0xc738), PSID: MT_1270110020, Latest FW Version:9.3.1700
-I- Devid: 52000(0xcb20), PSID: MT_1880110032, Latest FW Version:11.1100.26
-E- FW Check finished with errors
-W- r-ufm118/U1 - Node with Devid:4115(0x1013),PSID:MT_2190110032 has FW version 12.27.6008 while the latest FW version for the same Devid/PSID on this fabric is 12.100.5600
-W- r-ufm112/U2 - Node with Devid:4115(0x1013),PSID:MT_2190110032 has FW version 12.26.4012 while the latest FW version for the same Devid/PSID on this fabric is 12.100.5600
-W- r-ufm218/U1 - Node with Devid:4115(0x1013),PSID:MT_2190110032 has FW version 12.26.4000 while the latest FW version for the same Devid/PSID on this fabric is 12.100.5600
-W- r-ufm216/U2 - Node with Devid:4115(0x1013),PSID:MT_2190110032 has FW version 12.26.4000 while the latest FW version for the same Devid/PSID on this fabric is 12.100.5600
-E- r-ufm101/U2 - The firmware of this device returned invalid general info data

Example:

ibdiagnet --screen_num_errs 3

Output (--screen_num_errs 3):

Nodes Information
-I- Devid: 4099(0x1003), PSID: MT_1090120019, Latest FW Version:2.42.5000
-I- Devid: 4103(0x1007), PSID: MT_1090111019, Latest FW Version:2.42.5000
-I- Devid: 4115(0x1013), PSID: MT_2190110032, Latest FW Version:12.100.5600
-I- Devid: 4119(0x1017), PSID: MT_0000000008, Latest FW Version:16.18.160
-I- Devid: 51000(0xc738), PSID: MT_1270110020, Latest FW Version:9.3.1700
-I- Devid: 52000(0xcb20), PSID: MT_1880110032, Latest FW Version:11.1100.26
-E- FW Check finished with errors
-I- Errors/Warnings list will be reported in log file
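
When the error/warning list is redirected to the log file, a quick tally by severity can help decide whether to open the log at all. The following is a minimal sketch, not part of ibdiagnet: it only assumes that messages are tagged with the -I-, -W- and -E- markers seen in the sample output above, and the function name count_severities is hypothetical.

```python
import re
from collections import Counter

# Severity markers as they appear in the sample output: -I-, -W-, -E-.
# The lookarounds require whitespace (or text start) around the marker,
# so hyphens inside node names such as "r-ufm118/U1" are not counted.
MARKER = re.compile(r"(?<!\S)-([IWE])-(?=\s)")


def count_severities(text):
    """Count info (I), warning (W) and error (E) markers in ibdiagnet-style text."""
    return Counter(MARKER.findall(text))


if __name__ == "__main__":
    sample = (
        "Nodes Information\n"
        "-I- Devid: 4099(0x1003), PSID: MT_1090120019, Latest FW Version:2.42.5000\n"
        "-E- FW Check finished with errors\n"
        "-W- r-ufm118/U1 - Node with Devid:4115(0x1013),PSID:MT_2190110032 has FW version 12.27.6008\n"
    )
    print(count_severities(sample))
```

The same function can be applied to the text of the ibdiagnet2.log file, assuming the log uses the same markers as the screen output.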

The following ibdiagnet option can be used to provide meaningful names for unmanaged switches in the ibdiagnet log and dump files. The same file can be used with opensm and with infiniband-diags utilities such as ibnetdiscover.

Parameter

Description

-m|--map <map-file>

Specifies the mapping file that maps the node GUID of each unmanaged switch to a name. The file content must have the following format: 0x[0-9a-fA-F]+ "name", e.g. 0x123456 "Switch 1".
The file path can be specified via the environment variable "IBUTILS_NODE_NAME_MAP_FILE_PATH".
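
A map file can be checked before it is handed to ibdiagnet. The sketch below validates each line against the 0x[0-9a-fA-F]+ "name" pattern given above. It is an illustration only: the function name parse_node_name_map is hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```python
import re

# One mapping per line: a 0x-prefixed hex node GUID, then a double-quoted name.
MAP_LINE = re.compile(r'^(0x[0-9a-fA-F]+)\s+"([^"]*)"\s*$')


def parse_node_name_map(text):
    """Return {guid_as_int: name}; raise ValueError on a malformed line."""
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        match = MAP_LINE.match(line)
        if match is None:
            raise ValueError(f'line {lineno}: expected 0x<hex> "name", got {raw!r}')
        mapping[int(match.group(1), 16)] = match.group(2)
    return mapping


if __name__ == "__main__":
    print(parse_node_name_map('0x123456 "Switch 1"\n'))
```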

The following ibdiagnet options restrict counter fetching and diagnostics to a subset of the nodes/switches in the fabric.

Parameter

Description

--scope <file>

A file listing the Node GUIDs and ports that belong to the scope.

--exclude_scope <file>

A file listing the Node GUIDs and ports to which counter fetching and diagnostics should not be applied.

Warning

The ibdiagnet2.ibnetdiscover file will not be generated if either of these options is provided.

File format:

Scope file format includes the version and the list of nodes to include in the scope, according to the following syntax:

  • version:<format version number> - Scope file format version; must be the first line of the file. The supported version is 1.0.

  • Comment lines start with #.

  • Node lines in one of the following formats:

    • <Node GUID> - Includes the node with the specified Node GUID, with all its ports.

    • <Node GUID>@port1/port2/... - Includes only the specified ports of the specified node.
      Note: When using the exclude scope option, only the specified ports of the node are excluded.

    • ALL_SWITCHES - Includes all switches with all ports in the scope.

    • ALL_CAS - Includes all HCAs in the scope.
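
The syntax rules above can be sketched as a small parser, which is useful for checking a scope file before a run. This is an illustration under stated assumptions, not ibdiagnet's own parser: parse_scope is a hypothetical name, Node GUIDs are assumed to be 0x-prefixed hex, and blank lines are assumed to be ignorable.

```python
import re

# Assumption: Node GUIDs are 0x-prefixed hex, optionally followed by @p1/p2/...
NODE_LINE = re.compile(r"^(0x[0-9a-fA-F]+)(?:@(\d+(?:/\d+)*))?$")
KEYWORDS = {"ALL_SWITCHES", "ALL_CAS"}


def parse_scope(text):
    """Parse scope-file text into (version, entries).

    Each entry is (node, ports): node is a GUID string or a keyword,
    ports is a list of ints, or None when all ports are in scope.
    """
    # Assumption: blank lines carry no meaning and can be dropped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("version:"):
        raise ValueError("the first line must be 'version:<format version number>'")
    version = lines[0].split(":", 1)[1].strip()

    entries = []
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # comment line
        if line in KEYWORDS:
            entries.append((line, None))
            continue
        match = NODE_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized scope line: {line!r}")
        guid, ports = match.groups()
        entries.append((guid, [int(p) for p in ports.split("/")] if ports else None))
    return version, entries
```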

Examples:

  • Defining a scope for nodes with Node GUIDs 0x10001, 0x10002, 0x10003 with all their ports:

    version: 1.0
    0x10001
    0x10002
    0x10003

  • Defining a scope for ports 1,2,17 of node with Node GUID 0x10002:

    version: 1.0
    0x10002@1/2/17

  • Defining a scope for all switches (with all their ports):

    version: 1.0
    ALL_SWITCHES

  • Defining a scope for all CAs:

    version: 1.0
    ALL_CAS

  • Defining a scope that combines all of the following:

    • the node with Node GUID 0x10001

    • ports 1, 2, and 17 of the node with Node GUID 0x10002

    • all CAs

    version: 1.0
    0x10001
    0x10002@1/2/17
    ALL_CAS
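
For larger fabrics, the scope file is easier to generate than to write by hand. The following sketch builds the file text from a list of (node, ports) pairs. The helper name build_scope is hypothetical, and the layout (version line first, then one node entry per line) is an assumption drawn from the file format description above.

```python
def build_scope(nodes, version="1.0"):
    """Return scope-file text for an iterable of (node, ports) pairs.

    node  - a Node GUID string such as "0x10002", or ALL_SWITCHES / ALL_CAS
    ports - an iterable of port numbers, or None/empty for all ports
    """
    lines = [f"version: {version}"]  # the version must be the first line
    for node, ports in nodes:
        if ports:
            # <Node GUID>@port1/port2/... limits the entry to the listed ports
            lines.append(f"{node}@" + "/".join(str(p) for p in ports))
        else:
            lines.append(str(node))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_scope([("0x10001", None), ("0x10002", [1, 2, 17]), ("ALL_CAS", None)])
    print(text, end="")
```

The resulting text can be written to a file and passed to --scope <file> or --exclude_scope <file>.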

Warning

The scope feature does not apply to the routing validation stages.

Some data collection and diagnostic stages can be skipped to speed up ibdiagnet reporting. For instance, when only routing validation is required, there is no need to fetch and check port counters.

Parameter

Description

--skip <stage>

Skips the execution of the specified diagnostic stages.

The following stages can be skipped:

  • dup_guids - Duplicated GUIDs check

  • dup_node_desc - Duplicated node description check

  • lids - Valid LID assignment check

  • sm - Subnet Manager checks

  • nodes_info - Fetching vendor specific data from nodes

  • pkey - Partitions fetch and validation

  • vs_cap_smp - Collecting Vendor specific data with SMP MADs

  • vs_cap_gmp - Collecting Vendor specific data with GMP MADs

  • links - Fetching links data

  • pm - Fetching and checking port counters

  • speed_width_check - Link speed and link width checks

  • temp_sensing - Fetching temperature sense

  • virt - Virtualization stage

  • all - Skip all above stages

If the Virtualization stage is skipped, the ibdiagnet2.ibnetdiscover file will not contain virtual ports information.

Example:

ibdiagnet -r --r_opt=vs,sl=2 --skip pm,pkey,links,temp_sensing,speed_width_check,nodes_info,sm,dup_guids,dup_node_desc,vs_cap_gmp,lids
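
When ibdiagnet is launched from a script, the stage list can be validated before the command runs. The sketch below checks the requested names against the stages listed above and joins them into one comma-separated argument. The helper name skip_args is hypothetical, and the comma-separated form with no spaces is an assumption based on the example command.

```python
# Stage names taken from the list of skippable stages in this section.
STAGES = (
    "dup_guids", "dup_node_desc", "lids", "sm", "nodes_info", "pkey",
    "vs_cap_smp", "vs_cap_gmp", "links", "pm", "speed_width_check",
    "temp_sensing", "virt", "all",
)


def skip_args(stages):
    """Return the argv fragment ["--skip", "a,b,c"] for the given stage names."""
    stages = list(stages)
    if not stages:
        raise ValueError("at least one stage name is required")
    unknown = [s for s in stages if s not in STAGES]
    if unknown:
        raise ValueError(f"unknown stage(s): {', '.join(unknown)}")
    # Assumption: the stages form a single comma-separated argument.
    return ["--skip", ",".join(stages)]


if __name__ == "__main__":
    argv = ["ibdiagnet", "-r", "--r_opt=vs,sl=2", *skip_args(["pm", "pkey", "links"])]
    print(" ".join(argv))
```

Passing the arguments as a list (for example to subprocess.run) avoids shell word splitting, so a stray space cannot break the stage list apart.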

Parameter

Description

--vlr <file>

Provides the opensm-path-records.dump file, which maps each source-destination pair to an SL. This file is generated by the dump_pr Subnet Manager plugin. ibdiagnet uses this mapping to send MADs on the correct SL.

Parameter

Description

--back_compat_db <ver>

Indicates the old version of the PORTS section in the CSV file, for backward compatibility.
If the given version is lower than 2.0 (and is not 0), the following fields are not dumped to the CSV file:

  • CapMsk2, FECActv, RetransActv
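
The version rule above can be stated as a short predicate, which may help when writing a CSV consumer that has to handle both layouts. This is a sketch only: the function name dropped_ports_fields is hypothetical, and treating <ver> as a dotted numeric string such as "1.0" is an assumption.

```python
# Fields omitted from the PORTS section for old database versions.
LEGACY_DROPPED = ("CapMsk2", "FECActv", "RetransActv")


def dropped_ports_fields(ver):
    """Return the PORTS fields that are not dumped for --back_compat_db <ver>."""
    # Assumption: <ver> is a dotted numeric string, e.g. "1.0" or "2".
    parts = tuple(int(p) for p in ver.split("."))
    major_minor = (parts + (0, 0))[:2]
    # Fields are dropped only when the version is below 2.0 and is not 0.
    if major_minor != (0, 0) and major_minor < (2, 0):
        return LEGACY_DROPPED
    return ()


if __name__ == "__main__":
    for ver in ("0", "1.0", "2.0"):
        print(ver, dropped_ports_fields(ver))
```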

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.