Date: 2025-07-09
UFM Telemetry
Unified Fabric Manager Telemetry collects over 120 unique counters (BER, Temperature, Histograms, Retransmissions, and many more) for each port in the InfiniBand fabric, enabling the user to predict which cables are marginal and should be replaced during the bring-up process to avoid malfunctions in the future.
The tool collects data samples from all ports across the cluster and saves the data in a CSV file.
To collect InfiniBand Link Quality metrics, perform the following:
curl http://{machine_ip}:9002/csv/xcset/low_freq_debug >> my_telemetry_file.csv
Example:
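As a rough Python equivalent of the curl command above, the following sketch fetches one CSV snapshot and appends it to a local file. The function names (`telemetry_url`, `append_snapshot`), the `fetch` parameter, and the 30-second timeout are illustrative assumptions, not part of UFM Telemetry; only the port (9002) and the endpoint path come from the command shown above.

```python
import urllib.request


def telemetry_url(machine_ip: str, port: int = 9002,
                  cset: str = "low_freq_debug") -> str:
    """Build the CSV endpoint URL used by the curl command above."""
    return f"http://{machine_ip}:{port}/csv/xcset/{cset}"


def append_snapshot(machine_ip: str, out_path: str, fetch=None,
                    timeout: float = 30.0) -> int:
    """Fetch one snapshot and append it to out_path (like `>>` in the shell).

    `fetch` is a hypothetical hook that takes a URL and returns bytes; it
    exists only so the function can be exercised without a live server.
    Returns the number of bytes appended.
    """
    if fetch is None:
        def fetch(url: str) -> bytes:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    data = fetch(telemetry_url(machine_ip))
    # Append in binary mode so the payload is written exactly as received.
    with open(out_path, "ab") as fh:
        fh.write(data)
    return len(data)
```

Calling `append_snapshot("{machine_ip}", "my_telemetry_file.csv")` with a real address in place of `{machine_ip}` would mirror the curl invocation; calling it repeatedly accumulates samples in the same file.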
The following table lists the link monitoring key indicators and provides their descriptions and evaluation criteria.
| Category | Parameter | Description | Evaluation Criteria |
|---|---|---|---|
| Link State | Phy_state | Physical link state | Verify link up (enumeration value = 5) |
| Link Quality | NDR Link Quality | Link quality criteria depend on the error correction scheme type. | Cable types: DAC (direct attach copper), ACC (active copper cable), AOC (active optical cable). Note: minimum port up time for BER measurement is 125 minutes. |
| Link Quality | XDR Link Quality | Link quality criteria depend on the error correction scheme type. | NOT OFFICIAL THRESHOLDS |
| PHY Errors | Link_Down counter | Total number of link-down events that occurred as a result of involuntary link shutdown. | If delta from last sample > 0: |
| Cable Information | Module_Temperature | Temperature of the transceiver (optical transceivers only) | There is an alarm and threshold for each transceiver. Usually Warning [70 °C, 0 °C] and Alarm [80 °C, -10 °C]. |
| Cable Information | rx_power_lane_x and tx_power_lane_x | Rx power and Tx power per transceiver lane (optical transceivers only) | There is an alarm and threshold for each transceiver. |
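The checks in the table above can be sketched as a small Python function that compares two consecutive samples for one port. This is a minimal illustration under several assumptions: the dictionary keys (`phy_state`, `link_down_counter`, `module_temperature`) are hypothetical stand-ins for the actual CSV column names, the bracketed temperature pairs are read as [high, low] limits, and the comparisons are inclusive. Real thresholds are defined per transceiver, so the defaults below are only the "usual" values quoted in the table. The table does not show what should follow a positive link-down delta, so the sketch only flags it.

```python
from typing import List, Optional, Tuple

PHY_STATE_LINK_UP = 5  # enumeration value for "link up" per the table

# Usual values from the table, interpreted here as (high, low) in Celsius.
DEFAULT_WARNING: Tuple[float, float] = (70.0, 0.0)
DEFAULT_ALARM: Tuple[float, float] = (80.0, -10.0)


def evaluate_port(current: dict, previous: Optional[dict] = None,
                  warning: Tuple[float, float] = DEFAULT_WARNING,
                  alarm: Tuple[float, float] = DEFAULT_ALARM) -> List[str]:
    """Return a list of findings for one port; an empty list means no issue.

    `current` and `previous` are rows for the same port from two
    consecutive samples. Key names are assumptions, not real column names.
    """
    findings: List[str] = []

    # Link state: the physical state must report link up.
    if int(current["phy_state"]) != PHY_STATE_LINK_UP:
        findings.append("link_not_up")

    # PHY errors: flag any increase in the link-down counter.
    if previous is not None:
        delta = (int(current["link_down_counter"])
                 - int(previous["link_down_counter"]))
        if delta > 0:
            findings.append(f"link_down_delta={delta}")

    # Cable information: temperature applies to optical transceivers only,
    # so a missing or empty value is skipped rather than treated as an error.
    temp = current.get("module_temperature")
    if temp not in (None, ""):
        t = float(temp)
        if t >= alarm[0] or t <= alarm[1]:
            findings.append("temperature_alarm")
        elif t >= warning[0] or t <= warning[1]:
            findings.append("temperature_warning")

    return findings
```

Passing per-transceiver limits through the `warning` and `alarm` arguments keeps the defaults from masking a module whose own thresholds differ from the usual ones.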