ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual v2.21

Validation of SM configuration for HCAs

ibdiagnet checks that all HCAs have the same SM configuration for the following features:

  • OOOSLMask (ar_sl_mask) (by default)

  • adaptive_timeout_sl_mask (by default)

  • virt_enabled (by default, unless --skip virt is used)

  • sl2vl (-r)

  • VL Arbitration (-r)

  • CC (--congestion_control):

    • CongestionHCAGeneralSettings

    • CongestionHCARPParameters

    • CongestionHCANPParameters

    • CongestionHCAAlgoConfig

    • CongestionHCAConfigParams
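The idea behind this check can be pictured as grouping each field's values across all HCAs and flagging any field with more than one distinct value. The following is only an illustrative sketch, not ibdiagnet's implementation: the function name `find_inconsistent_fields`, the per-HCA dictionary layout, and the GUID keys are assumptions made for the example.

```python
from collections import defaultdict


def find_inconsistent_fields(hca_configs):
    """Return {field: sorted distinct values} for fields that differ between HCAs."""
    values_by_field = defaultdict(set)
    for config in hca_configs.values():
        for field, value in config.items():
            values_by_field[field].add(value)
    # A field is consistent only if every HCA reports the same value for it.
    return {
        field: sorted(values)
        for field, values in values_by_field.items()
        if len(values) > 1
    }


# Hypothetical per-HCA settings, keyed by made-up node GUIDs.
hca_configs = {
    "0x0001": {"OOOSLMask": 0, "AdaptiveTimeoutSLMask": 2, "virt_enabled": 1},
    "0x0002": {"OOOSLMask": 1, "AdaptiveTimeoutSLMask": 2, "virt_enabled": 1},
    "0x0003": {"OOOSLMask": 0, "AdaptiveTimeoutSLMask": 3, "virt_enabled": 1},
}

for field, values in find_inconsistent_fields(hca_configs).items():
    print(f"Field '{field}' has {len(values)} different values: {values}")
```

In this made-up data, `virt_enabled` is identical on every HCA and is therefore not reported.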

Info

This validation starts automatically, based on the data that ibdiagnet receives.

This validation can be skipped with '--skip hca_cfg_check'

Command line to get all parameters:

  ibdiagnet -r --congestion_control

Each field gets a separate warning, and a new line is also added to the 'Fabric Summary'.

ibdiagnet2.log:

-W- Post Reports SM Configuration Validations finished with warnings
-W- Field 'OOOSLMask' has 2 different values across the fabric [0,1]
-W- Field 'AdaptiveTimeoutSLMask' has 2 different values across the fabric [2,3]
-W- Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]
-W- Field 'SL2VL_1' has 2 different values across the fabric [2,3]
-W- Field 'SL2VL_2' has 4 different values across the fabric [2,3,4,5]
-I- All other warnings can be found in ibdiagnet2.db_csv
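For post-processing, the per-field warnings in the log excerpt above can be extracted with a regular expression. This is a minimal sketch that assumes the warning wording is exactly as shown in the excerpt; the helper name `parse_field_warnings` is hypothetical, and the values are kept as strings because their format for other fields is not specified here.

```python
import re

# Pattern taken from the warning wording shown in the log excerpt.
WARNING_RE = re.compile(
    r"-W- Field '(?P<field>[^']+)' has (?P<count>\d+) different values "
    r"across the fabric \[(?P<values>[^\]]*)\]"
)


def parse_field_warnings(log_text):
    """Map each reported field name to the list of values seen in the fabric."""
    return {
        match.group("field"): match.group("values").split(",")
        for match in WARNING_RE.finditer(log_text)
    }


# Sample text copied from the excerpt above.
sample_log = """\
-W- Post Reports SM Configuration Validations finished with warnings
-W- Field 'OOOSLMask' has 2 different values across the fabric [0,1]
-W- Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]
-I- All other warnings can be found in ibdiagnet2.db_csv
"""

for field, values in parse_field_warnings(sample_log).items():
    print(field, values)
```

Because the pattern is matched against the whole text rather than line by line, it also works if the log content has been joined into a single line.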

ibdiagnet2.db_csv:

Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'OOOSLMask' has 2 different values across the fabric [0,1]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'AdaptiveTimeoutSLMask' has 2 different values across the fabric [2,3]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_1' has 2 different values across the fabric [2,3]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_2' has 4 different values across the fabric [2,3,4,5]"
...
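Rows like the ones above can be filtered by event name with Python's standard `csv` module, which handles the commas inside the quoted Summary column. This sketch assumes the rows, together with their header line, have already been isolated into a string; how they are located within the full ibdiagnet2.db_csv file is not covered here. The helper name `mismatch_summaries` and the `SOME_OTHER_EVENT` row are hypothetical and exist only to demonstrate the filter.

```python
import csv
import io

EVENT = "DIFFERENT_VALUE_BY_SM_CONFIGURATION"


def mismatch_summaries(csv_text):
    """Return the Summary column of every SM-configuration mismatch row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Summary"] for row in reader if row["EventName"] == EVENT]


# The first two data rows are copied from the excerpt above.
# The last row uses a made-up event name to show that other events are skipped.
sample_rows = """\
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'OOOSLMask' has 2 different values across the fabric [0,1]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]"
CLUSTER,0x00,0x00,0x00,SOME_OTHER_EVENT,"hypothetical row, not from ibdiagnet"
"""

for summary in mismatch_summaries(sample_rows):
    print(summary)
```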

Fabric Summary:

Post Reports SM Configuration Validations: 82 fields have different value across the fabric.


© Copyright 2025, NVIDIA. Last updated on Feb 10, 2025.