PDR Deterministic Plugin
Overview
The PDR Deterministic plugin is a Docker container that is managed by the UFM and is designed to manage port isolation instead of the UFM automatic isolation. In order to perform port isolation, the PDR plugin utilizes an algorithm that depends on telemetry data provided by UFM Telemetry and monitors packet drop rate (PDR), BER counter values, and cable temperature. Additionally, the plugin can operate in a "dry run" mode, which enables writing to the log without initiating port isolation.
Install UFM with the latest software version.
Run:
/etc/init.d/ufmd start
To get the PDR plugin image, contact the NVIDIA Support team. Then load the plugin using the following command:
ufmapl [ mgmt-sa ] (config) # docker load ufm-plugin-pdr-determinitic.tar
When working with UFM in HA mode, load the plugin on the standby node.
Run the following command. Add -p pdr-determinitic to enable the plugin:
/opt/ufm/scripts/manage_ufm_plugins.sh add -p pdr-determinitic
Ensure that the plugin is up and running. Run: /opt/ufm/scripts/manage_ufm_plugins.sh show
The following table lists the default configuration when running the plugin. These configurations can be changed via the pdr_deterministic.conf file.
Parameter | Default Value | Description |
T_ISOLATE | 300 | Interval for requesting telemetry counters, in seconds |
MAX_NUM_ISOLATE | 10 | Maximum number of ports to be isolated: max(10, 0.5% * fabric_size) |
TMAX | 70 | The maximum nominal operating temperature for fabric devices and cables (the minimum of the two) |
D_TMAX | 10 | The maximum allowed temperature change within the T_ISOLATE interval, in Celsius |
MAX_PDR | 1e-12 | The maximum allowed packet drop rate |
CONFIGURED_BER_CHECK | True | Indicates whether to check the BER counter thresholds |
DRY_RUN | False | If set to True, isolation decisions are only logged and do not take effect |
DEISOLATE_CONSIDER_TIME | 5 | Consideration time for port de-isolation, in minutes |
AUTOMATIC_DEISOLATE | True | Automatically performs de-isolation, even if a port is not set as "treated" |
DO_DEISOLATION | True | If set to False, the plugin does not perform de-isolation |
BER thresholds will be taken from the Field_BER_Thresholds.csv file.
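The MAX_NUM_ISOLATE rule, max(10, 0.5% * fabric_size), can be illustrated with a short calculation. This is a minimal sketch and not the plugin's code: the function name is hypothetical, and rounding the 0.5% share down to a whole number of ports is an assumption.

```python
def max_ports_to_isolate(fabric_size: int, floor: int = 10) -> int:
    """Return max(floor, 0.5% of fabric_size).

    Assumption: the 0.5% share is rounded down to whole ports.
    """
    half_percent = (fabric_size * 5) // 1000  # 0.5% in exact integer arithmetic
    return max(floor, half_percent)


if __name__ == "__main__":
    for size in (1000, 4000, 10000):
        print(size, max_ports_to_isolate(size))
```

Under these assumptions, the floor of 10 applies to any fabric with fewer than 2200 ports, and the 0.5% share takes over beyond that.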
The plugin’s purpose is to isolate malfunctioning ports using the isolation API provided by the UFM. A port is set as "isolated" when any of its counter values exceeds the predetermined threshold for cable temperature, effective BER, symbol BER, raw BER, or packet drop rate. A port can be de-isolated once its values have been back to normal for 5 minutes (configurable).
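The threshold checks described above can be sketched as follows. This is an illustrative sketch only, not the plugin's algorithm: the class, field, and function names are hypothetical, the default values are taken from the configuration table, and treating the temperature change as an absolute difference between two consecutive samples is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

TMAX = 70.0       # default from the configuration table
D_TMAX = 10.0     # default from the configuration table
MAX_PDR = 1e-12   # default from the configuration table


@dataclass
class PortSample:
    """Hypothetical per-port snapshot built from telemetry data."""
    temperature_c: float
    prev_temperature_c: float
    packet_drop_rate: float
    ber_exceeded: bool  # result of comparing BER values with their thresholds


def isolation_reasons(sample: PortSample) -> List[str]:
    """Return every threshold the port exceeds; an empty list means healthy."""
    reasons = []
    if sample.temperature_c > TMAX:
        reasons.append("temperature")
    # Assumption: the change is the absolute difference between two samples.
    if abs(sample.temperature_c - sample.prev_temperature_c) > D_TMAX:
        reasons.append("temperature change")
    if sample.packet_drop_rate > MAX_PDR:
        reasons.append("packet drop rate")
    if sample.ber_exceeded:
        reasons.append("BER")
    return reasons


def can_deisolate(healthy_since_s: Optional[float], now_s: float,
                  consider_minutes: float = 5) -> bool:
    """True once the port has been back to normal for the consideration time."""
    if healthy_since_s is None:  # the port is still unhealthy
        return False
    return now_s - healthy_since_s >= consider_minutes * 60
```

For example, a port sampled at 75 degrees with no other violations yields a single "temperature" reason, and a port that returned to normal 300 seconds ago qualifies for de-isolation with the default consideration time.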
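To calculate the BER counters, the plugin determines the maximum window it must wait before a BER value can be calculated, using the following formula:
Minimum Bits = 1 / BER Target
Minimum Time in Seconds = Minimum Bits / Rate (in bits per second)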
Example:
Rate | Bits per Second | BER Target | Minimum Bits | Minimum Time in Seconds | Minimum Time in Minutes |
HDR | 2.00E+11 | 1.00E-12 | 1.00E+12 | 5 | 0.083333 |
HDR | 2.00E+11 | 1.00E-13 | 1.00E+13 | 50 | 0.833333 |
HDR | 2.00E+11 | 1.00E-14 | 1.00E+14 | 500 | 8.333333 |
HDR | 2.00E+11 | 1.00E-16 | 1.00E+16 | 50000 | 833.3333 |
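The example values above are consistent with a minimum bit count of 1 / BER target and a minimum time of that bit count divided by the link rate. The sketch below recomputes the rows under that reading; the function names are hypothetical, and the relationship is inferred from the example values rather than taken from the plugin's code.

```python
HDR_RATE_BPS = 2.00e11  # HDR rate in bits per second, as in the example


def min_bits(ber_target: float) -> float:
    """Bits that must be observed before a BER of ber_target is measurable."""
    return 1.0 / ber_target


def min_window_seconds(rate_bps: float, ber_target: float) -> float:
    """Time needed to transfer min_bits(ber_target) at rate_bps."""
    return min_bits(ber_target) / rate_bps


if __name__ == "__main__":
    for target in (1e-12, 1e-13, 1e-14, 1e-16):
        seconds = min_window_seconds(HDR_RATE_BPS, target)
        print(f"BER target {target:.0e}: {seconds:g} s ({seconds / 60:g} min)")
```

The smaller the BER target, the longer the window, which is why the strictest configured threshold determines how long the plugin has to wait.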
BER counters are calculated with the following formula:
The following telemetry counters are used:
Symbol: phy_symbol_errors_high/low
Effective: phy_effective_errors_high/low
Raw: sum(phy_raw_errors_lane<i>_high/low)
Data is kept in memory and is saved for the largest window period.
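As a rough illustration of how the counters listed above could feed a BER value, the sketch below combines each high/low pair and divides the error count by the number of bits transferred. Everything here is an assumption rather than the plugin's documented behavior: that the _high and _low counters are the upper and lower 32-bit halves of one 64-bit value, that BER follows the standard definition (errors divided by bits transferred), and all function names.

```python
from typing import Iterable, Tuple


def combine_high_low(high: int, low: int) -> int:
    """Assumption: high/low are the upper and lower 32 bits of one counter."""
    return (high << 32) | low


def raw_errors(lanes: Iterable[Tuple[int, int]]) -> int:
    """Sum the per-lane raw error counters, given (high, low) pairs."""
    return sum(combine_high_low(high, low) for high, low in lanes)


def bit_error_rate(error_delta: int, rate_bps: float, elapsed_s: float) -> float:
    """Standard BER definition: errors observed / bits transferred.

    error_delta is the counter increase over the elapsed_s window.
    """
    bits = rate_bps * elapsed_s
    if bits <= 0:
        raise ValueError("rate and elapsed time must be positive")
    return error_delta / bits
```

With these assumptions, 1000 new errors over a 5 second window at 2.00E+11 bits per second give a BER of 1e-9.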
The plugin can simulate port isolation without actually executing it, which makes it possible to analyze the algorithm's performance and decision-making process and to make future adjustments. This behavior is controlled by a "dry_run" flag: when it is set, the plugin only records its port "isolation" decisions in the plugin's log and does not invoke the port isolation API.
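The dry-run behavior amounts to a guard in front of the isolation call. The following is a minimal sketch under stated assumptions: the function name, the logger name, and the isolate callback (which stands in for the UFM isolation API) are all hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("pdr_plugin_sketch")  # hypothetical logger name


def handle_isolation(port: str, reason: str, dry_run: bool,
                     isolate: Callable[[str], None]) -> bool:
    """Log the decision and isolate the port unless dry_run is set.

    isolate is a stand-in for the call to the UFM isolation API.
    Returns True only if the port was actually isolated.
    """
    logger.warning("Isolation decision: port=%s reason=%s dry_run=%s",
                   port, reason, dry_run)
    if dry_run:
        return False  # decision is logged only
    isolate(port)
    return True
```

Passing the isolation call in as a parameter keeps the decision logic testable: the same code path runs in both modes, and only the final side effect is skipped in dry-run mode.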