
The Bit Error Rate (BER) is the number of bit errors divided by the total number of bits transferred during a studied time interval. BER is a unitless performance measure, often expressed as a percentage.
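As a minimal illustration of this definition, the sketch below computes a BER from two counts. The function name and the sample numbers are hypothetical and are not taken from ibdiagnet output.

```python
def bit_error_rate(bit_errors: int, total_bits: int) -> float:
    """Return bit errors divided by total bits transferred in the interval."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return bit_errors / total_bits


# Hypothetical sample: 3 errored bits out of 10**12 transferred bits.
ber = bit_error_rate(3, 10**12)
print(f"BER = {ber:.1e}")
```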

| Parameter | Description | Notes |
| --- | --- | --- |
| --get_phy_info | Collects BER information for fabric ports and validates the BER against specific thresholds. Errors are reported to the ibdiagnet2.log and ibdiagnet2.db_csv files. | Applicable to all EDR/HDR and future InfiniBand devices. |
| --ber_test | Deprecated. Provides a BER test for each port: calculates the BER for each port and checks that no BER value exceeds the BER threshold (default threshold="10^-12"). | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
| --ber_thresh <value> | Deprecated. Specifies the threshold value for the BER test. The reciprocal of the BER should be provided. For example, a threshold of 10^-12 should be given as 1000000000000 or 0xe8d4a51000 (10^12). If the given threshold is 0, all BER values for all ports are reported. | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
| --llr_active_cell <64\|128> | Deprecated. Specifies the Link Level Retransmission (LLR) active cell size for the BER test, when LLR is active in the fabric. | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
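The reciprocal convention for --ber_thresh can be checked with a few lines of Python. This is only a sketch: the helper name `ber_thresh_argument` is hypothetical, and it assumes the threshold is given as an exact fraction whose reciprocal is a whole number.

```python
from fractions import Fraction


def ber_thresh_argument(threshold: Fraction) -> int:
    """Return the integer to pass as <value>: the reciprocal of the threshold."""
    if threshold == 0:
        return 0  # 0 is passed unchanged: all BER values are reported
    if threshold < 0:
        raise ValueError("threshold must not be negative")
    reciprocal = 1 / threshold
    if reciprocal.denominator != 1:
        raise ValueError("reciprocal of the threshold is not an integer")
    return reciprocal.numerator


value = ber_thresh_argument(Fraction(1, 10**12))
print(value, hex(value))  # 1000000000000 0xe8d4a51000
```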

Example: 

ibdiagnet --get_phy_info

Fabric Health Validation Example

For NDR/HDR/EDR links, symbol errors (NDR/HDR) or effective errors (EDR) are the actual errors seen by the application level after error correction.

The following methodology is recommended as a first step if fabric performance is degraded.

  1. Make sure that significant traffic is running in the fabric.
  2. Reset the counters: ibdiagnet --pc  --reset_phy_info  -i  <mlx_dev>
  3. Wait for some time (5-10 minutes).
  4. Collect the BER information: ibdiagnet --get_phy_info  -i  <mlx_dev>
  5. Review ibdiagnet2.log.
  6. Contact Support if the Symbol/Effective BER Check finished with errors.
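Steps 2 to 4 above can be scripted. The following is a minimal sketch rather than a supported tool: it assumes ibdiagnet is on the PATH, uses only the flags shown in the steps, and the function names and the default wait time are illustrative choices. The exit status of ibdiagnet is not interpreted here, so the log file still has to be reviewed afterwards.

```python
import subprocess
import time
from typing import List


def build_commands(device: str) -> List[List[str]]:
    """Return the reset and collect command lines from steps 2 and 4."""
    reset = ["ibdiagnet", "--pc", "--reset_phy_info", "-i", device]
    collect = ["ibdiagnet", "--get_phy_info", "-i", device]
    return [reset, collect]


def run_ber_check(device: str, wait_seconds: int = 300) -> None:
    """Reset the counters, wait while traffic runs, then collect BER data."""
    reset, collect = build_commands(device)
    subprocess.run(reset)     # step 2
    time.sleep(wait_seconds)  # step 3: 5-10 minutes is recommended
    subprocess.run(collect)   # step 4
```

A call such as `run_ber_check("mlx5_0", wait_seconds=600)` would run the sequence for a device named mlx5_0, which is a placeholder for <mlx_dev>.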

For a detailed description of the command-line parameters, see the previous chapter, “Bit Error Rate”.

BER check log file fragment:

-E- Symbol BER Check finished with errors 
-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-14/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-3/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-7/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- SW-1-0/U1/P4 - BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- SW-1-0/U1/P5 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
 
--------------------------------------------- 
Fabric Summary 
 
Total Nodes             : 24 
IB Switches             : 8 
IB Channel Adapters     : 16 
IB Aggregation Nodes    : 0 
IB Routers              : 0 
 
Total number of links   : 32 
Links at 4x10           : 32 
 
High BER reported by 6 ports
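To review a large log, the "BER exceeds threshold" lines can be extracted with a regular expression. The sketch below assumes the line layout shown in the fragment above; the function name and the field names in the result are hypothetical.

```python
import re
from typing import Dict, List

# Pattern modelled on the "-E- ... BER exceeds threshold" lines shown above.
BER_LINE = re.compile(
    r"^-E- (?P<port>\S+) - BER exceeds threshold - "
    r"BER type: (?P<ber_type>[^,]+), FEC mode: (?P<fec_mode>[^,]+), "
    r"BER value = (?P<value>\S+) / threshold = (?P<threshold>\S+)"
)


def parse_ber_errors(log_text: str) -> List[Dict[str, object]]:
    """Return one record per port whose BER exceeded the threshold."""
    findings = []
    for line in log_text.splitlines():
        match = BER_LINE.match(line.strip())
        if match:
            record = match.groupdict()
            record["value"] = float(record["value"])
            record["threshold"] = float(record["threshold"])
            findings.append(record)
    return findings


sample = """-E- Symbol BER Check finished with errors
-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- SW-1-0/U1/P4 - BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12
"""
for finding in parse_ber_errors(sample):
    print(finding["port"], finding["fec_mode"], finding["value"])
```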

BER check error section in db_csv file: 

START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000015,0x0002c90000000016,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000025,0x0002c90000000026,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000035,0x0002c90000000036,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,4,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,5,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
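The section above can be read with Python's standard csv module. This sketch assumes the START_/END_ markers and the header row appear exactly as in the fragment; the function name `read_section` is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_section(lines: Iterable[str], name: str) -> List[Dict[str, str]]:
    """Return the rows between START_<name> and END_<name> as dictionaries."""
    start, end = f"START_{name}", f"END_{name}"
    inside = False
    body = []
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            inside = True
        elif stripped == end:
            break
        elif inside:
            body.append(line)
    # The first collected line is the header row; quoted commas are handled.
    return list(csv.DictReader(body))


sample = '''START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,4,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
'''
rows = read_section(sample.splitlines(), "ERRORS_SYMBOL_BER_CHECK")
for row in rows:
    print(row["NodeGUID"], row["PortNumber"], row["EventName"])
```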