ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual v2.8.0

Routing Validation

Use the following options to enable unicast (Static and Adaptive) and Multicast routing validation in the InfiniBand fabric, detection of potential credit loops, and validation of the Adaptive Routing (AR) configuration. Some additional routing diagnostics also require further routing validation options to be specified.

Parameter

Description

-r | --routing

ibdiagnet performs unicast (Static and Adaptive) and Multicast routing validation. Specifically, it:

  • Reports the number of CA pairs at each hop distance

  • Reports the number of actual paths going through each switch out port, considering all CA-to-CA paths

  • Reports the number of actual destination LIDs going through each switch out port, considering all CA-to-CA paths

  • Scans the multicast routing tables for loops and connectivity

  • Applies the credit-loop detection algorithm

  • Validates the adaptive routing configuration by checking the AR LFTs against up-down min-hop tables (if the --smdb option is used)

Switch routing data is dumped to the following files:

VL2VL configuration: /var/tmp/ibdiagnet2/ibdiagnet2.vl2vl

PLFT dump: /var/tmp/ibdiagnet2/ibdiagnet2.plft

AR/SHIELD tables dump: /var/tmp/ibdiagnet2/ibdiagnet2.far

Unicast tables dump: /var/tmp/ibdiagnet2/ibdiagnet2.fdbs

Multicast tables dump: /var/tmp/ibdiagnet2/ibdiagnet2.mcfdbs

SL2VL table dump: /var/tmp/ibdiagnet2/ibdiagnet2.slvl
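This manual does not describe how ibdiagnet's credit-loop detection algorithm works internally. The classic way to frame the problem is as cycle detection in a channel dependency graph: every directed link is a vertex, and a packet that holds one link while requesting the next link on its route adds an edge between them. A cycle means buffers can wait on each other in a circle. The sketch below illustrates only that general idea. It assumes routes are given as lists of hypothetical node names and that all traffic shares a single VL (as in the "1 SLs, 1 VLs used" line of the sample output), so it is not a description of ibdiagnet's implementation.

```python
from collections import defaultdict


def channel_dependencies(paths):
    """Build a channel dependency graph from routed paths.

    Each path is a list of node names, e.g. ["ca1", "sw1", "sw2", "ca2"].
    A channel is a directed link (u, v). A packet that holds channel
    (u, v) while requesting the next channel (v, w) on its path creates
    the dependency edge (u, v) -> (v, w).
    """
    deps = defaultdict(set)
    for path in paths:
        channels = list(zip(path, path[1:]))
        for held, requested in zip(channels, channels[1:]):
            deps[held].add(requested)
    return deps


def has_credit_loop(paths):
    """Return True if the channel dependency graph contains a cycle (single VL assumed)."""
    deps = channel_dependencies(paths)
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every channel starts UNVISITED

    def visit(channel):
        state[channel] = IN_PROGRESS
        for nxt in deps.get(channel, ()):
            if state[nxt] == IN_PROGRESS:
                return True  # back edge: circular buffer dependency
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[channel] = DONE
        return False

    return any(state[ch] == UNVISITED and visit(ch) for ch in list(deps))


# Four switches in a ring, each path taking two hops clockwise: a loop.
ring = [["s1", "s2", "s3"], ["s2", "s3", "s4"],
        ["s3", "s4", "s1"], ["s4", "s1", "s2"]]
print(has_credit_loop(ring))  # True
# Two CAs routed through a pair of switches in both directions: no loop.
line = [["ca1", "sw1", "sw2", "ca2"], ["ca2", "sw2", "sw1", "ca1"]]
print(has_credit_loop(line))  # False
```

With several VLs, each (link, VL) pair would become its own vertex, which is why the SL-to-VL mapping matters for this check.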

Example:

ibdiagnet -r

Output:

###################################
-I- Fabric Qualities Report:
###################################
-I- Verifying all CA to CA paths ...
---------------------- CA to CA : LFT ROUTE HOP HISTOGRAM -----------------
The number of CA pairs that are in each number of hops distance.
This data is based on the result of the routing algorithm.

HOPS NUM-CA-CA-PAIRS
2    24
3    30
4    78
5    22
6    56
---------------------------------------------------------------------------

---------- LFT CA to CA : SWITCH OUT PORT - NUM PATHS HISTOGRAM -----------
Number of actual paths going through each switch out port considering all the CA to CA paths.
Ports driving CAs are ignored (as they must have = Nca - 1).
If the fabric is routed correctly the histogram should be narrow for all ports on same level of the tree.

NUM-PATHS NUM-SWITCH-PORTS
0    21
1    4
2    8
3    6
4    1
5    6
6    9
7    6
8    12
9    2
10   3
11   6
12   7
14   1
---------------------------------------------------------------------------

---------- LFT CA to CA : SWITCH OUT PORT - NUM DLIDS HISTOGRAM -----------
Number of actual Destination LIDs going through each switch out port considering all the CA to CA paths.
Ports driving CAs are ignored (as they must have = Nca - 1).
If the fabric is routed correctly the histogram should be narrow for all ports on same level of the tree.
A detailed report is provided in /tmp/ibdmchk.sw_out_port_num_dlids.

NUM-DLIDS NUM-SWITCH-PORTS
0    21
1    37
2    34
---------------------------------------------------------------------------

-I- Scanned:210 CA to CA paths
---------------------------------------------------------------------------

-I- Scanning all multicast groups for loops and connectivity...
-I- Multicast Group:0xC000 has:7 switches and:9 FullMember ports
-I- Multicast Group:0xC001 has:7 switches and:9 FullMember ports
-I- Multicast Group:0xC002 has:7 switches and:9 FullMember ports
-I- Multicast Group:0xC003 has:7 switches and:8 FullMember ports
-I- Multicast Group:0xC004 has:6 switches and:3 FullMember ports
---------------------------------------------------------------------------

###################################
-I- Credit Loops Report:
###################################
-I- Analyzing Fabric for Credit Loops 1 SLs, 1 VLs used.
-I- Traced 186 unicast paths
-I- no credit loops found
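The histograms in the sample output can be cross-checked with a few lines of Python. The following is a minimal sketch; parse_histogram is a hypothetical helper that assumes the histogram rows are available as text lines of the form "VALUE COUNT". In this sample the hop histogram sums to 210, the same figure as "-I- Scanned:210 CA to CA paths", and 210 = 15 × 14. That suggests (an inference, not something the output states) that the histogram counts ordered source/destination pairs among 15 CAs. Both per-port histograms also cover the same 92 switch ports.

```python
def parse_histogram(rows):
    """Parse 'VALUE COUNT' rows of an ibdiagnet histogram into a dict.

    Header rows such as 'HOPS NUM-CA-CA-PAIRS' are skipped because
    their fields are not numeric.
    """
    hist = {}
    for row in rows:
        fields = row.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            hist[int(fields[0])] = int(fields[1])
    return hist


# Rows copied from the sample output above.
hops = parse_histogram(["HOPS NUM-CA-CA-PAIRS",
                        "2 24", "3 30", "4 78", "5 22", "6 56"])
paths = parse_histogram(["0 21", "1 4", "2 8", "3 6", "4 1", "5 6", "6 9",
                         "7 6", "8 12", "9 2", "10 3", "11 6", "12 7", "14 1"])
dlids = parse_histogram(["0 21", "1 37", "2 34"])

total_pairs = sum(hops.values())
mean_hops = sum(h * n for h, n in hops.items()) / total_pairs
# Assuming ordered (src, dst) pairs: Nca * (Nca - 1) == total_pairs.
n_ca = next((n for n in range(2, total_pairs + 2) if n * (n - 1) == total_pairs),
            None)

print(total_pairs, n_ca, round(mean_hops, 2))    # 210 15 4.27
print(sum(paths.values()), sum(dlids.values()))  # 92 92
```

A narrow NUM-PATHS histogram is what the report describes as correct routing; a long tail (here, a few ports carry 12 or 14 paths while 21 carry none) is where a closer look at the per-port report may be worthwhile.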

The following options can be used when the "-r" option is invoked.

Parameter

Description

--r_opt

List of comma-separated options:

  • vs: (Enabled by default) Enables adaptive routing validation.

  • far: (Enabled by default) Dumps AR/SHIELD table data to the ibdiagnet2.far file.

  • skip_vs: Skips collecting and checking vendor-specific routing settings such as AR/SHIELD.

  • skip_far: Skips dumping the full AR/SHIELD tables to the file.

  • rn: (Enabled by default) Required for dumping the SHIELD remote notification configuration to the ibdiagnet2.rn file.

  • crnc: Required for clearing the SHIELD counters.

  • drnc: (Enabled by default) Required for dumping the SHIELD counters to the ibdiagnet2.rnc file.

  • mcast: The credit-loop detection algorithm also takes multicast routing into account. It is recommended to use this option together with the --sa_dump option.

  • sl=<sl_num>: SL to use for the adaptive routing connectivity and credit-loop checks (default: 0).

  • check_sl: Checks all SL2VL tables.

  • dump_only: Dumps the routing configuration to the files and skips the routing checks.

  • dump_only_skip_routing_tables: Dumps the routing data but skips retrieving the routing tables (LFTs).

  • static_ca2ca: Also runs the static CA-to-CA routing check, even if AR is enabled.

--sa_dump <file>

Uses the Subnet Manager SMDB file for routing checks. If specified, Adaptive Routing validation is performed during the routing validation stage (if the -r option is selected).

--smdb <file>

Loads the routing engine and ranks from the SMDB file. Used for AR validation during the routing stage (if the -r option is selected).

--vlr <file>

Provides the opensm-path-records.dump file, which contains the mapping of source-destination pairs to SLs. This file is generated by the dump_pr Subnet Manager plugin, and ibdiagnet uses the mapping for the credit-loop check. This option is mainly applicable to 3D-Torus topologies.
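The AR validation described above compares AR LFTs against up-down min-hop tables, but the manual does not spell out the exact check. The sketch below illustrates only the min-hop half of the idea under simplifying assumptions. It assumes a connected fabric given as a hypothetical switch adjacency dict, treats destinations as switches, represents each switch's AR choices as a plain set of next-hop switches, and ignores up-down direction constraints and ranks entirely. A next hop counts as min-hop if it is exactly one hop closer to the destination.

```python
from collections import deque


def min_hops_to(adjacency, dest):
    """Breadth-first search from dest; returns {node: hop distance}."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def non_min_hop_choices(adjacency, ar_choices, dest):
    """List (switch, next_hop) pairs that do not move one hop closer to dest.

    adjacency:  {switch: [neighbor switches]} for a connected, undirected fabric.
    ar_choices: {switch: {next hops its AR group allows towards dest}} (hypothetical).
    """
    dist = min_hops_to(adjacency, dest)
    return [(sw, nh)
            for sw, next_hops in sorted(ar_choices.items())
            for nh in sorted(next_hops)
            if dist[nh] != dist[sw] - 1]


# Hypothetical two-level fabric: leaves L1, L2 and spines S1, S2.
fabric = {"L1": ["S1", "S2"], "L2": ["S1", "S2"],
          "S1": ["L1", "L2"], "S2": ["L1", "L2"]}
good = {"L1": {"S1", "S2"}, "S1": {"L2"}}
bad = {"L1": {"S1", "S2"}, "S1": {"L1", "L2"}}
print(non_min_hop_choices(fabric, good, "L2"))  # []
print(non_min_hop_choices(fabric, bad, "L2"))   # [('S1', 'L1')]
```

In the bad case, spine S1 is allowed to bounce traffic for L2 back down to L1, which is one hop farther away rather than closer.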

Example:

ibdiagnet -r --r_opt=vs,sl=2 --skip pm,pkey,links,temp_sensing,speed_width_check,nodes_info,sm,dup_guids,dup_node_desc,vs_cap_gmp,lids
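When the command is generated from a script, the --r_opt value can be assembled and sanity-checked first. The following is a minimal sketch. build_r_opt and routing_argv are hypothetical helpers, and the only validation performed is against the --r_opt sub-option names listed in this section and the 0-15 range of a 4-bit InfiniBand service level. It uses only the -r, --r_opt, and --skip flags that appear in the example above.

```python
import shlex

# --r_opt sub-options listed in this section (sl=<sl_num> is handled separately).
KNOWN_R_OPTS = {
    "vs", "far", "skip_vs", "skip_far", "rn", "crnc", "drnc", "mcast",
    "check_sl", "dump_only", "dump_only_skip_routing_tables", "static_ca2ca",
}


def build_r_opt(options, sl=None):
    """Return the comma-separated value for --r_opt, e.g. 'vs,sl=2'."""
    unknown = [opt for opt in options if opt not in KNOWN_R_OPTS]
    if unknown:
        raise ValueError(f"unknown --r_opt option(s): {unknown}")
    parts = list(options)
    if sl is not None:
        # InfiniBand service levels are 4-bit values (0-15).
        if not 0 <= sl <= 15:
            raise ValueError(f"sl must be in 0-15, got {sl}")
        parts.append(f"sl={sl}")
    return ",".join(parts)


def routing_argv(options, sl=None, skip=()):
    """Build an argv list for a routing-validation run of ibdiagnet."""
    argv = ["ibdiagnet", "-r", f"--r_opt={build_r_opt(options, sl)}"]
    if skip:
        argv += ["--skip", ",".join(skip)]
    return argv


argv = routing_argv(["vs"], sl=2, skip=["pm", "pkey", "links"])
print(shlex.join(argv))
# ibdiagnet -r --r_opt=vs,sl=2 --skip pm,pkey,links
```

Keeping the command as an argv list rather than a single string means it can be passed directly to subprocess.run without shell quoting concerns.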

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.