ibdiagnet InfiniBand Fabric Diagnostic Tool User Manual v2.19

Validation of SM configuration for HCAs

ibdiagnet checks that all HCAs have the same SM configuration for the following features:

  • OOOSLMask (ar_sl_mask) (by default)

  • adaptive_timeout_sl_mask (by default)

  • virt_enabled (by default, avoid --skip virt)

  • sl2vl (-r)

  • VL Arbitration (-r)

  • CC: (--congestion_control)

    • CongestionHCAGeneralSettings

    • CongestionHCARPParameters

    • CongestionHCANPParameters

    • CongestionHCAAlgoConfig

    • CongestionHCAConfigParams

Info

This validation starts automatically, based on the data that ibdiagnet has collected.

This validation can be skipped with '--skip hca_cfg_check'

Command line to get all parameters:

  ibdiagnet -r --congestion_control
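When the check is driven from a script, the command line can be assembled from the flags named in this section. The following is a minimal sketch, not part of ibdiagnet: the function name `build_ibdiagnet_command` and its keyword arguments are hypothetical, and only the flags `-r`, `--congestion_control` and `--skip hca_cfg_check` come from this page.

```python
import shlex


def build_ibdiagnet_command(routing=True, congestion_control=True,
                            skip_hca_cfg_check=False):
    """Assemble an ibdiagnet argv list (hypothetical helper).

    Only flags mentioned in this section are used.
    """
    argv = ["ibdiagnet"]
    if routing:
        # Needed for the sl2vl and VL Arbitration items in the list above.
        argv.append("-r")
    if congestion_control:
        # Needed for the CC items in the list above.
        argv.append("--congestion_control")
    if skip_hca_cfg_check:
        # Disables the SM configuration validation entirely.
        argv += ["--skip", "hca_cfg_check"]
    return argv


# Print the command as a shell-quoted string.
print(shlex.join(build_ibdiagnet_command()))
# prints: ibdiagnet -r --congestion_control
```

The list form can be passed directly to `subprocess.run` on a host where ibdiagnet is installed.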

Every field has a separate warning, and there is also a new line in the 'Fabric Summary'.

ibdiagnet2.log:

-W- Post Reports SM Configuration Validations finished with warnings
-W- Field 'OOOSLMask' has 2 different values across the fabric [0,1]
-W- Field 'AdaptiveTimeoutSLMask' has 2 different values across the fabric [2,3]
-W- Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]
-W- Field 'SL2VL_1' has 2 different values across the fabric [2,3]
-W- Field 'SL2VL_2' has 4 different values across the fabric [2,3,4,5]
-I- All other warnings can be found in ibdiagnet2.db_csv
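The per-field warnings follow a regular pattern, so they can be collected with a regular expression. This is a minimal sketch, assuming the warning text keeps exactly the wording shown in the excerpt above; `parse_field_warnings` is a hypothetical helper, not an ibdiagnet API, and values are kept as strings because their format for other fields is not documented here.

```python
import re

# Matches lines such as:
#   -W- Field 'OOOSLMask' has 2 different values across the fabric [0,1]
FIELD_WARNING = re.compile(
    r"-W- Field '(?P<field>[^']+)' has (?P<count>\d+) different values "
    r"across the fabric \[(?P<values>[^\]]*)\]"
)


def parse_field_warnings(log_text):
    """Return {field_name: [value, ...]} for each per-field warning."""
    result = {}
    for match in FIELD_WARNING.finditer(log_text):
        values = [v.strip() for v in match.group("values").split(",") if v.strip()]
        result[match.group("field")] = values
    return result


sample = """\
-W- Post Reports SM Configuration Validations finished with warnings
-W- Field 'OOOSLMask' has 2 different values across the fabric [0,1]
-W- Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]
-I- All other warnings can be found in ibdiagnet2.db_csv
"""

for field, values in parse_field_warnings(sample).items():
    print(field, values)
```

Because the pattern is applied with `finditer`, it works whether the warnings appear one per line or run together on a single line.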

ibdiagnet2.db_csv:

Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'OOOSLMask' has 2 different values across the fabric [0,1]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'AdaptiveTimeoutSLMask' has 2 different values across the fabric [2,3]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_1' has 2 different values across the fabric [2,3]"
CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"Field 'SL2VL_2' has 4 different values across the fabric [2,3,4,5]"
...
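The rows above are ordinary CSV with a quoted Summary column, so the standard `csv` module can filter them by event name. The sketch below assumes the table, including its header row, has already been extracted from ibdiagnet2.db_csv into a string; how that file delimits its tables is not covered here. `mismatched_summaries` is a hypothetical helper name.

```python
import csv
import io

EVENT = "DIFFERENT_VALUE_BY_SM_CONFIGURATION"


def mismatched_summaries(csv_text):
    """Return the Summary text of every SM configuration mismatch row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Summary"] for row in reader if row.get("EventName") == EVENT]


sample_csv = (
    "Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary\n"
    "CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"
    "\"Field 'OOOSLMask' has 2 different values across the fabric [0,1]\"\n"
    "CLUSTER,0x00,0x00,0x00,DIFFERENT_VALUE_BY_SM_CONFIGURATION,"
    "\"Field 'SL2VL_0' has 3 different values across the fabric [2,3,4]\"\n"
)

for summary in mismatched_summaries(sample_csv):
    print(summary)
```

The quoting matters: the bracketed value lists contain commas, which `csv.DictReader` keeps inside the Summary field rather than splitting into extra columns.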

Fabric Summary:

Post Reports SM Configuration Validations: 82 fields have different value across the fabric.

© Copyright 2024, NVIDIA. Last updated on Nov 13, 2024.