Bit Error Rate (BER)

The Bit Error Rate (BER) is the number of bit errors divided by the total number of bits transferred during a studied time interval. BER is a unitless performance measure. It is sometimes expressed as a percentage, but for high-speed links it is usually written in scientific notation, such as 10^-12.
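As a worked illustration of the definition, the Python sketch below computes BER from an error count and a bit count, and derives the bit count from a link rate and an observation window. The function names and the 100 Gb/s, 10-minute, 60-error figures are hypothetical examples, not values produced by ibdiagnet.

```python
def bit_error_rate(bit_errors: int, total_bits: int) -> float:
    """BER = bit errors / total transferred bits (a unitless ratio)."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return bit_errors / total_bits


def bits_transferred(rate_bps: float, seconds: float) -> int:
    """Bits carried by a link running at rate_bps for the given interval."""
    return int(rate_bps * seconds)


# Hypothetical example: a 100 Gb/s link observed for 10 minutes with 60 bit errors.
bits = bits_transferred(100e9, 10 * 60)
ber = bit_error_rate(60, bits)
print(f"{bits} bits, BER = {ber:.1e}")  # prints: 60000000000000 bits, BER = 1.0e-12
```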

Parameter: --get_phy_info
Description: Collects BER information for fabric ports and validates the BER values against specific thresholds. Errors are reported in the ibdiagnet2.log and ibdiagnet2.db_csv files.
Notes: Applicable to all EDR/HDR and later InfiniBand devices.

Parameter: --ber_test
Description: Deprecated. Runs a BER test on each port: calculates the BER of each port and checks that no BER value exceeds the BER threshold (default threshold: 10^-12).
Notes: This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices.

Parameter: --ber_thresh <value>
Description: Deprecated. Specifies the threshold for the BER test, given as the reciprocal of the BER. For example, to set a threshold of 10^-12, pass 1000000000000 or 0xe8d4a51000 (that is, 10^12). If the given threshold is 0, the BER values of all ports are reported.
Notes: This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices.

Parameter: --llr_active_cell <64|128>
Description: Deprecated. Specifies the Link Level Retransmission (LLR) active cell size for the BER test when LLR is active in the fabric.
Notes: This option is available only when using SwitchX/ConnectX-4 and ConnectX-3 devices.
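Because --ber_thresh takes the reciprocal of the BER rather than the BER itself, the Python sketch below shows the conversion in decimal and hexadecimal form. It also includes a hypothetical model of the documented reporting rule (a port is reported when its BER exceeds 1/value, and a value of 0 reports every port). The helper names and per-port values are illustrative assumptions; this is not how ibdiagnet implements the test.

```python
def ber_thresh_arg(ber_threshold: float) -> tuple[int, str]:
    """Convert a BER threshold (e.g. 1e-12) to the reciprocal form --ber_thresh expects.

    ber_threshold must be > 0; to report all ports, pass 0 to --ber_thresh directly.
    """
    reciprocal = round(1 / ber_threshold)
    return reciprocal, hex(reciprocal)


def ports_to_report(port_ber: dict[str, float], thresh_arg: int) -> list[str]:
    """Hypothetical model of the documented rule: report ports whose BER exceeds
    1/thresh_arg; a thresh_arg of 0 reports every port."""
    if thresh_arg == 0:
        return sorted(port_ber)
    limit = 1 / thresh_arg
    return sorted(port for port, ber in port_ber.items() if ber > limit)


print(ber_thresh_arg(1e-12))  # (1000000000000, '0xe8d4a51000')

bers = {"port1": 3e-13, "port2": 4e-11}  # hypothetical per-port BER values
print(ports_to_report(bers, 10**12))  # ['port2']
print(ports_to_report(bers, 0))       # ['port1', 'port2']
```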

Example:

ibdiagnet --get_phy_info

For NDR/HDR/EDR links, symbol errors (NDR/HDR) or effective errors (EDR) are the errors that remain after error correction, that is, the errors actually seen at the application level.

The following methodology is recommended as a first step when fabric performance is degraded:

  1. Make sure significant traffic is running in the fabric.

  2. Reset the counters: ibdiagnet --pc --reset_phy_info -i <mlx_dev> (where <mlx_dev> is the name of the local device used to connect to the fabric, for example mlx5_0).

  3. Wait 5-10 minutes.

  4. Collect PHY information and run the BER checks: ibdiagnet --get_phy_info -i <mlx_dev>

  5. Review ibdiagnet2.log.

  6. Contact Support if the Symbol BER Check or the Effective BER Check finished with errors.
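The wait in step 3 gives the freshly reset counters time to accumulate errors. As a back-of-the-envelope illustration (not ibdiagnet's own calculation; the 4 x 25 Gb/s link rate is an assumed example), the Python sketch below computes the smallest non-zero BER that a single error represents in a 5- or 10-minute window, and how many errors a link running at a BER of 10^-12 would produce on average in the same window.

```python
def min_observable_ber(rate_bps: float, window_s: float) -> float:
    """BER represented by a single error over the window (1 / bits carried)."""
    return 1 / (rate_bps * window_s)


def expected_errors(ber: float, rate_bps: float, window_s: float) -> float:
    """Average number of errors a link at the given BER produces in the window."""
    return ber * rate_bps * window_s


link_rate = 4 * 25e9  # hypothetical 4x link at 25 Gb/s per lane
for minutes in (5, 10):
    window = minutes * 60
    print(
        f"{minutes} min: 1 error = BER {min_observable_ber(link_rate, window):.1e}, "
        f"{expected_errors(1e-12, link_rate, window):.0f} errors expected at 1e-12"
    )
```

At this assumed rate, even the 5-minute window resolves BER values well below 10^-12; slower links need proportionally longer windows for the same resolution.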

For a detailed description of the command-line parameters, see the previous chapter, “Bit Error Rate”.

BER check log file fragment:

-E- Symbol BER Check finished with errors
-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- H-14/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- H-3/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- H-7/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- SW-1-0/U1/P4 - BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- SW-1-0/U1/P5 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12

---------------------------------------------
Fabric Summary

Total Nodes : 24
IB Switches : 8
IB Channel Adapters : 16
IB Aggregation Nodes : 0
IB Routers : 0

Total number of links : 32
Links at 4x10 : 32

High BER reported by 6 ports
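When many ports are reported, it can help to extract the failing ports programmatically. The Python sketch below assumes the line layout shown in the log fragment above (one "-E- <port> - BER exceeds threshold - ..." line per port); that layout is inferred from this single example and may differ between ibdiagnet versions. The pattern and function names are hypothetical.

```python
import re

# Assumed layout of a per-port "BER exceeds threshold" error line in ibdiagnet2.log.
BER_LINE = re.compile(
    r"-E- (?P<port>\S+) - BER exceeds threshold - "
    r"BER type: (?P<ber_type>[^,]+), FEC mode: (?P<fec>[^,]+), "
    r"BER value = (?P<value>\S+) / threshold = (?P<threshold>\S+)"
)


def parse_ber_errors(log_text: str) -> list[dict]:
    """Extract one record per port that exceeded its BER threshold."""
    records = []
    for match in BER_LINE.finditer(log_text):
        record = match.groupdict()
        # Convert the numeric fields from scientific-notation strings to floats.
        record["value"] = float(record["value"])
        record["threshold"] = float(record["threshold"])
        records.append(record)
    return records


sample = """\
-E- Symbol BER Check finished with errors
-E- H-10/U1/P1 - BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12
-E- SW-1-0/U1/P4 - BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12
"""

for rec in parse_ber_errors(sample):
    print(rec["port"], rec["fec"], rec["value"], rec["threshold"])
```

The summary line ("Symbol BER Check finished with errors") does not match the pattern, so only the per-port lines are returned.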

BER check error section in the ibdiagnet2.db_csv file:

START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000015,0x0002c90000000016,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000025,0x0002c90000000026,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_544_514_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000035,0x0002c90000000036,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: MLNX_RS_271_257_PLR, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,4,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,5,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-LL-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
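The db_csv section is easier to process than the log. A minimal Python sketch, assuming the section layout shown in the example above (a START_/END_ marker pair around a header row and comma-separated rows, with the Summary field quoted because it contains commas), loads one section with the standard csv module. The function name is hypothetical.

```python
import csv
import io


def read_db_csv_section(text: str, section: str) -> list[dict[str, str]]:
    """Return the rows between START_<section> and END_<section> as dicts keyed by the header.

    Raises ValueError if either marker line is missing.
    """
    lines = text.splitlines()
    start = lines.index(f"START_{section}") + 1
    end = lines.index(f"END_{section}", start)
    # DictReader takes field names from the header row and honors quoted commas.
    return list(csv.DictReader(io.StringIO("\n".join(lines[start:end]))))


sample = """\
START_ERRORS_SYMBOL_BER_CHECK
Scope,NodeGUID,PortGUID,PortNumber,EventName,Summary
PORT,0x0002c90000000005,0x0002c90000000006,1,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: STD-RS, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
PORT,0x0002c90000000049,0x0002c90000000049,4,BER_EXCEEDS_THRESHOLD,"BER exceeds threshold - BER type: Symbol BER, FEC mode: RS_FEC_544_514, BER value = 1.500000e+01 / threshold = 5.000000e-12 "
END_ERRORS_SYMBOL_BER_CHECK
"""

for row in read_db_csv_section(sample, "ERRORS_SYMBOL_BER_CHECK"):
    print(row["NodeGUID"], row["PortNumber"], row["EventName"])
```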

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.