Date: 2026-02-22

The action execution block processes inputs from the alert generation module, the UFM and the internal port state table in order to determine which links should undergo isolation or de-isolation.

It then executes the actions via a UFM API, and finally reports both the actions that were taken and those that were not, via messages to the UFM events table.

The action execution flow comprises six main modules (see the image below, right side), which can be grouped into three main steps:

1. Update the internal port state DB, based on data collected from the alert generation module and from the UFM.
2. Based on the ports' state as captured by the port state DB, run two decision processes to determine which links should be isolated and which should be de-isolated.
3. Apply the necessary actions through the UFM.

Once a link is selected for action, it enters a mitigation cycle as outlined below.

Isolation and de-isolation of links is done via updating the UFM Unhealthy Ports file. This file is submitted by the UFM to the SM, triggering the update of the routing tables throughout the fabric.

Notably, even while the link is isolated and no application traffic flows through it, the physical layer remains active, so the state of the link can still be monitored using the same alert engines described earlier.

This enables the decision process to determine whether the link has successfully recovered following the mitigation procedure and, if so, to reinstate it into the network.

The diagram below depicts the link state machine managed by the IBLR plugin.

Application traffic flows through the link only if it is in the Healthy state. In all other states, the link is isolated and traffic is redirected to other parallel links.

The black arrows in the diagram indicate state transitions triggered by the plugin, while the red dotted arrows indicate state transitions triggered outside the plugin (e.g. by the user).

While the main flow of isolating a problematic link, waiting for it to recover and reinstating it is fully operated by the plugin, there are two additional flows that require user intervention:

- If a link that was isolated by the plugin has not recovered within a pre-determined period, it is moved into the unrecoverable state. This signals that the network operator should inspect the link, attempt to recover it and, if successful, de-isolate the link to indicate that it was serviced.
- If a link was isolated outside the scope of the plugin, the plugin will not de-isolate it automatically, as it has insufficient knowledge of the isolation reason. In that case, de-isolation should also be handled by the user or external system that isolated the link.
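The state machine described above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: only the Healthy and unrecoverable states are named in the text, so `ISOLATED_BY_PLUGIN` and `ISOLATED_EXTERNALLY` are hypothetical names, and the transition sets are inferred from the prose rather than read from the diagram.

```python
from enum import Enum, auto


class LinkState(Enum):
    HEALTHY = auto()              # named in the text; the only state that carries traffic
    ISOLATED_BY_PLUGIN = auto()   # hypothetical name for a plugin-initiated isolation
    UNRECOVERABLE = auto()        # named in the text
    ISOLATED_EXTERNALLY = auto()  # hypothetical name for isolation done outside the plugin


# Transitions the plugin performs on its own (assumed to be the black arrows).
PLUGIN_TRANSITIONS = {
    (LinkState.HEALTHY, LinkState.ISOLATED_BY_PLUGIN),
    (LinkState.ISOLATED_BY_PLUGIN, LinkState.HEALTHY),
    (LinkState.ISOLATED_BY_PLUGIN, LinkState.UNRECOVERABLE),
}

# Transitions that need the user or an external system (assumed red dotted arrows).
USER_TRANSITIONS = {
    (LinkState.UNRECOVERABLE, LinkState.HEALTHY),
    (LinkState.HEALTHY, LinkState.ISOLATED_EXTERNALLY),
    (LinkState.ISOLATED_EXTERNALLY, LinkState.HEALTHY),
}


def carries_traffic(state: LinkState) -> bool:
    """Application traffic flows only through links in the Healthy state."""
    return state is LinkState.HEALTHY


def can_transition(src: LinkState, dst: LinkState, by_plugin: bool) -> bool:
    """Check whether the given actor is allowed to move a link from src to dst."""
    allowed = PLUGIN_TRANSITIONS if by_plugin else USER_TRANSITIONS
    return (src, dst) in allowed
```

In this sketch the plugin can never move a link out of `UNRECOVERABLE` or `ISOLATED_EXTERNALLY`, which mirrors the two user-intervention flows listed above.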

The action execution decision process includes a set of configurable checkpoints that let the user determine under what conditions actions will be taken.

These configurable checkpoints can be broadly divided into the following groups:

- Isolation and de-isolation rate limit constraints.
- Topology-dependent constraints that prevent an isolation if it would leave insufficient redundancy in the fabric.
- Time limits after which an isolated link qualifies for de-isolation or is considered unrecoverable.

The below tables describe the main configuration parameters used by the decision process checkpoints.

| Name | Type | Description |
|---|---|---|
| max_per_hour/day/week/month | int | The maximum number of distinct links allowed to be isolated within a specific time window. |
| min_links_per_switch_pair | int | The minimum number of healthy links between two switches; when reached, no further isolations will be allowed. |
| min_active_ports_per_switch | int | The minimum number of active ports per switch; when reached, no further isolations will be allowed. |
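As an illustration of how the isolation checkpoints in the table above might be combined, here is a minimal sketch. The function name, its arguments and the returned reason strings are hypothetical; only the three parameter names come from the table, and only the hourly rate limit is modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IsolationLimits:
    max_per_hour: int
    min_links_per_switch_pair: int
    min_active_ports_per_switch: int


def may_isolate(
    limits: IsolationLimits,
    isolated_last_hour: int,
    healthy_links_in_pair: int,
    active_ports_per_endpoint: List[int],
) -> Tuple[bool, str]:
    """Return (allowed, reason) for isolating one more link.

    All counts describe the fabric *before* the candidate link is isolated.
    """
    # Rate limit: stop once the hourly budget of isolations is used up.
    if isolated_last_hour >= limits.max_per_hour:
        return False, "rate limit reached"
    # Topology: the minimum has been reached, so no further isolation is allowed.
    if healthy_links_in_pair <= limits.min_links_per_switch_pair:
        return False, "too few healthy links between the switch pair"
    # Topology: the same rule, applied to each switch at the ends of the link.
    if any(n <= limits.min_active_ports_per_switch for n in active_ports_per_endpoint):
        return False, "too few active ports on a switch"
    return True, "ok"
```

Returning the failing checkpoint alongside the decision makes it straightforward to report why an alert did not result in an action.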

| Name | Type | Description |
|---|---|---|
| max_per_hour | int | The maximum number of distinct links allowed to be de-isolated within an hour. |
| min_health_time_before_deisolation_min | int | The minimum period in minutes that a link has to be clean of alerts in order to be qualified for de-isolation. |
| max_recovery_time_threshold_hours | int | The maximum period in hours that a link is granted for recovery and de-isolation, before declaring it as unrecoverable. |
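The two time limits in the table above can be read as a three-way decision for each isolated link. The sketch below is one possible interpretation, not the plugin's code: the function name and return values are hypothetical, the order of the two checks is an assumption, and the de-isolation rate limit (max_per_hour) is left out.

```python
from datetime import datetime, timedelta
from typing import Optional


def classify_isolated_link(
    now: datetime,
    isolated_at: datetime,
    last_alert_at: Optional[datetime],
    min_health_time_before_deisolation_min: int,
    max_recovery_time_threshold_hours: int,
) -> str:
    """Return "deisolate", "unrecoverable" or "wait" for a plugin-isolated link."""
    # The link has been clean since the later of its isolation and its last alert.
    quiet_since = isolated_at if last_alert_at is None else max(isolated_at, last_alert_at)
    clean_for = now - quiet_since

    # Assumption: a link that has been clean long enough is de-isolated
    # even if it took longer than the recovery threshold to get there.
    if clean_for >= timedelta(minutes=min_health_time_before_deisolation_min):
        return "deisolate"
    if now - isolated_at > timedelta(hours=max_recovery_time_threshold_hours):
        return "unrecoverable"
    return "wait"
```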

Customers may be looking to extend or replace the plugin's mitigation procedure with their private business logic.

For that purpose, the action execution module reports both the actions it has taken and the alerts that did not result in any action because the decision process failed at one of the checkpoints.

Reporting is done by sending event-specific messages to the UFM events table.

These messages are sent following every action taken by the plugin:

| Event | Message Format | Comments |
|---|---|---|
| Isolation | Completed isolation of the link between ports <src_guid>_<src_num> <-> <dest_guid>_<dest_num>, isolation reasons: <alert_reasons> | |
| De-isolation | Completed deisolation of the link between ports <src_guid>_<src_num> <-> <dest_guid>_<dest_num> after no alerts were triggered for <isolation_reasons> | Isolation reasons are the alerts that triggered the initial isolation |
| Declaring unrecoverable port | Port <src_guid>_<src_num> could not be deisolated for more than <threshold value> hours due to repeating <alert_reason> alerts, hence it is marked as unrecoverable | Threshold is configurable, the plugin will not deisolate unrecoverable ports |
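For customers who want to drive their own logic from these messages, the formats above can be matched with regular expressions. The following is a minimal sketch: the function name, the event labels and the field names are hypothetical, and it assumes that GUIDs contain no whitespace and that the threshold is printed as a whole number. How the messages are retrieved from the UFM events table is outside its scope.

```python
import re
from typing import Dict, Optional

PORT = r"\S+_\d+"  # <guid>_<port number>, assuming GUIDs contain no whitespace

PATTERNS = {
    "isolation": re.compile(
        rf"Completed isolation of the link between ports (?P<src>{PORT}) <-> "
        rf"(?P<dst>{PORT}), isolation reasons: (?P<reasons>.+)"
    ),
    "deisolation": re.compile(
        rf"Completed deisolation of the link between ports (?P<src>{PORT}) <-> "
        rf"(?P<dst>{PORT}) after no alerts were triggered for (?P<reasons>.+)"
    ),
    "unrecoverable": re.compile(
        rf"Port (?P<src>{PORT}) could not be deisolated for more than "
        rf"(?P<hours>\d+) hours due to repeating (?P<reasons>.+) alerts, "
        rf"hence it is marked as unrecoverable"
    ),
}


def parse_event(message: str) -> Optional[Dict[str, str]]:
    """Return the event type and its fields, or None if no format matches."""
    for event, pattern in PATTERNS.items():
        match = pattern.fullmatch(message.strip())
        if match:
            return {"event": event, **match.groupdict()}
    return None
```

For example, with placeholder GUIDs and a placeholder alert name, `parse_event("Completed isolation of the link between ports 0xaaaa_1 <-> 0xbbbb_2, isolation reasons: alert_a")` yields the event type `isolation` together with the two port identifiers and the reason string.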

These messages are sent following alerts that were triggered but did not result in any action (for example, because the link is configured to run in shadow mode).

In this case, as no attempt to recover a problematic link is done by the plugin, the same alert may be repeatedly raised and result in repetitive messages sent to the UFM events table indicating the same issue.

To avoid such a scenario, we have included an option to suppress repetitive messages, defined as messages that repeat information already provided in a previous message (e.g., a similar event reported for the same port due to the same alert). By setting the configuration parameter suppresion_interval_hours (default value is 24), the user determines for how many hours a repetitive alert message is suppressed. Setting a value of 0 disables the suppression mechanism.
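The suppression behaviour described above can be sketched as a small in-memory filter. This is an illustration under assumptions: the class and method names are hypothetical, a message is treated as repetitive when its port, event and alert all match an earlier message, and the interval is measured from the last message that was actually sent.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


class MessageSuppressor:
    """Drops messages that repeat an already reported (port, event, alert)."""

    def __init__(self, interval_hours: int = 24):
        # interval_hours plays the role of the suppression interval parameter;
        # 0 disables suppression, as described in the text.
        self.interval = timedelta(hours=interval_hours)
        self._last_sent: Dict[Tuple[str, str, str], datetime] = {}

    def should_send(self, port: str, event: str, alert: str, now: datetime) -> bool:
        if self.interval == timedelta(0):
            return True
        key = (port, event, alert)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # same information was reported recently
        self._last_sent[key] = now
        return True
```

Keying on the full (port, event, alert) triple means a new alert type on the same port is still reported immediately.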