Bit Error Rate (BER)
The Bit Error Rate (BER) is the number of bit errors divided by the total number of bits transferred during a studied time interval. BER is a unitless performance measure, often expressed as a percentage.
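As a minimal illustration of this definition, the sketch below divides an error count by the number of bits transferred over the same interval. The function name and the sample figures are illustrative assumptions, not values produced by ibdiagnet.

```python
def bit_error_rate(bit_errors: int, bits_transferred: int) -> float:
    """Return bit errors divided by total bits transferred in the interval."""
    if bits_transferred <= 0:
        raise ValueError("bits_transferred must be positive")
    if bit_errors < 0:
        raise ValueError("bit_errors must not be negative")
    return bit_errors / bits_transferred


# Hypothetical sample: 3 bit errors observed while 10^12 bits were transferred.
print(f"{bit_error_rate(3, 10**12):.1e}")
```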
| Parameter | Description | Notes |
| --- | --- | --- |
| --get_phy_info | Collects BER information for fabric ports and validates it against specific thresholds. Errors are reported to the ibdiagnet2.log and ibdiagnet2.db_csv files. | Applicable to all EDR/HDR and future InfiniBand devices. |
| --ber_test | Deprecated. Runs a BER test for each port: calculates the BER of each port and checks that no value exceeds the BER threshold (default threshold: 10^-12). | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
| --ber_thresh <value> | Deprecated. Specifies the threshold value for the BER test, given as the reciprocal of the BER. For example, for a threshold of 10^-12, the value should be 1000000000000 or 0xe8d4a51000 (10^12). If the given threshold is 0, the BER values of all ports are reported. | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
| --llr_active_cell <64|128> | Deprecated. Specifies the Link Level Retransmission (LLR) active cell size for the BER test when LLR is active in the fabric. | This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices. |
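The reciprocal form expected by --ber_thresh can be derived mechanically. The following is a minimal sketch, assuming the threshold is available as a string such as "1e-12"; the helper name is hypothetical and is not part of ibdiagnet. A threshold of 0, which reports all BER values, is passed to the option as-is and is not handled here.

```python
from decimal import Decimal


def ber_thresh_argument(ber_threshold: str) -> int:
    """Convert a BER threshold such as "1e-12" to its reciprocal integer."""
    ber = Decimal(ber_threshold)
    if ber <= 0:
        raise ValueError("BER threshold must be positive")
    # Decimal arithmetic avoids binary floating-point rounding of 1e-12.
    return int((Decimal(1) / ber).to_integral_value())


value = ber_thresh_argument("1e-12")
print(value)       # decimal form of the reciprocal
print(hex(value))  # hexadecimal form of the reciprocal
```

Either printed form corresponds to the 10^-12 example in the table above.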
Example:
            
    ibdiagnet --get_phy_info

For NDR/HDR/EDR links, symbol errors (NDR/HDR) or effective errors (EDR) are the actual errors seen by the application level after error correction.
The following methodology is recommended as a first step when fabric performance is degraded:
- Make sure significant traffic is running in the fabric.
- Run ibdiagnet --pc --reset_phy_info -i <mlx_dev>
- Wait 5-10 minutes.
- Run ibdiagnet --get_phy_info -i <mlx_dev>
- Review ibdiagnet2.log.
- Contact Support if the Symbol/Effective BER check finished with errors.
For a detailed description of the command-line parameters, see the previous chapter, “Bit Error Rate”.
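The reset, wait, and collect steps above can be scripted. The sketch below is one possible wrapper, assuming ibdiagnet is on the PATH; the function names, the default 10-minute wait, and the device name in the usage comment are illustrative assumptions. Only the flags shown in the steps above are used, and no assumption is made about ibdiagnet exit codes.

```python
import subprocess
import time


def build_commands(device: str) -> list[list[str]]:
    """Return the reset and collect command lines for the given device."""
    reset = ["ibdiagnet", "--pc", "--reset_phy_info", "-i", device]
    collect = ["ibdiagnet", "--get_phy_info", "-i", device]
    return [reset, collect]


def run_ber_check(device, wait_seconds=600, run=subprocess.run, sleep=time.sleep):
    """Reset the counters, wait while traffic runs, then collect BER information.

    `run` and `sleep` are injectable so the sequence can be dry-run.
    The result of the collect command is returned for the caller to inspect.
    """
    reset, collect = build_commands(device)
    run(reset)
    sleep(wait_seconds)
    return run(collect)


# Usage ("mlx5_0" is a hypothetical device name):
# run_ber_check("mlx5_0")
```

After the run completes, ibdiagnet2.log still has to be reviewed as described in the steps.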
BER check log file fragment:
            
-E- Symbol BER Check finished with errors
-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-14/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-3/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- H-7/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- SW-1-0/U1/P4 - BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 
-E- SW-1-0/U1/P5 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 
 
--------------------------------------------- 
Fabric Summary 
 
Total Nodes             : 24 
IB Switches             : 8 
IB Channel Adapters     : 16 
IB Aggregation Nodes    : 0 
IB Routers              : 0 
 
Total number of links   : 32 
Links at 4x10           : 32 
 
High BER reported by 6 ports
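When many ports are reported, the error lines can be extracted programmatically. The sketch below assumes the log lines follow exactly the layout of the fragment above; the regular expression and the names BerError and parse_ber_errors are illustrative, not an ibdiagnet interface.

```python
import re
from typing import NamedTuple


class BerError(NamedTuple):
    port: str
    ber_type: str
    fec_mode: str
    value: float
    threshold: float


# Matches lines such as:
# -E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: ...
_BER_LINE = re.compile(
    r"^-E- (?P<port>\S+) - BER exceeds threshold - "
    r"BER type: (?P<ber_type>[^,]+), FEC mode: (?P<fec_mode>[^,]+), "
    r"BER value = (?P<value>\S+) / threshold = (?P<threshold>\S+)"
)


def parse_ber_errors(log_text: str) -> list[BerError]:
    """Return one BerError per 'BER exceeds threshold' line in the log text."""
    errors = []
    for line in log_text.splitlines():
        match = _BER_LINE.match(line.strip())
        if match:
            errors.append(
                BerError(
                    match["port"],
                    match["ber_type"],
                    match["fec_mode"],
                    float(match["value"]),
                    float(match["threshold"]),
                )
            )
    return errors


sample = (
    "-E- Symbol BER Check finished with errors\n"
    "-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, "
    "FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12\n"
)
for error in parse_ber_errors(sample):
    print(error.port, error.fec_mode, error.value, error.threshold)
```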
    
BER check error section in db_csv file:
            
START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000015,0x0002c90000000016,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000025,0x0002c90000000026,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000035,0x0002c90000000036,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,4,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,5,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
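The section above can be read with a standard CSV parser once its START_ and END_ markers are located. The sketch below assumes each section is delimited by START_<name> and END_<name> lines, with a header row directly after the start marker, as in the fragment above; read_csv_section is a hypothetical helper name.

```python
import csv
import io


def read_csv_section(text: str, name: str) -> list[dict]:
    """Return the rows between START_<name> and END_<name> as dictionaries."""
    start, end = f"START_{name}", f"END_{name}"
    lines, inside = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == start:
            inside = True
        elif stripped == end:
            break
        elif inside:
            lines.append(stripped)
    # The first collected line is the header row; quoted Summary fields
    # containing commas are handled by the csv module.
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


sample = """START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
"""
for row in read_csv_section(sample, "ERRORS_SYMBOL_BER_CHECK"):
    print(row["NodeGUID"], row["PortNumber"], row["EventName"])
```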