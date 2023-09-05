Network Alerts: Alerts for the entire cluster. The algorithm checks for unusual changes in several important metrics and notifies the user.

Tenant/Application Alerts: Triggered by PKey monitoring in the cluster. It checks the most congested PKeys for a better understanding of applications' health.

Link Failure Prediction: Predicts link failures 1 to 24 hours in advance using machine learning algorithms, and reports a probability indicator together with the counters that most influenced the triggering of the alert.

Link Anomaly: Detects anomalous behavior in the cluster with a probability indicator. It also identifies the most significant influencers of the anomaly.

The purpose of this tab is to detect abnormal behavior at the level of the entire cluster.

An ETL process runs hourly and calculates aggregated network statistics, while another process compares the current statistics with statistics aggregated over the previous month. If a difference of more than 20% is detected (the default threshold, which can be changed), the system triggers an alert with the relevant information. A recommended action can also be viewed by clicking the relevant icon for each alert.
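The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the metric names, and the use of a simple relative difference are hypothetical and are not taken from UFM Cyber-AI itself; only the 20% default comes from the text.

```python
def relative_change(current: float, baseline: float) -> float:
    """Return the absolute relative difference between current and baseline."""
    if baseline == 0:
        # Assumption: any non-zero value against a zero baseline counts as unbounded change.
        return float("inf") if current != 0 else 0.0
    return abs(current - baseline) / abs(baseline)


def find_network_alerts(current_stats, baseline_stats, threshold=0.20):
    """Return (metric, change) pairs whose change exceeds the threshold.

    current_stats: latest hourly aggregates (hypothetical metric names).
    baseline_stats: aggregates over the previous month.
    threshold: 0.20 mirrors the documented 20% default.
    """
    alerts = []
    for metric, current in current_stats.items():
        baseline = baseline_stats.get(metric)
        if baseline is None:
            continue  # no history for this metric, nothing to compare
        change = relative_change(current, baseline)
        if change > threshold:
            alerts.append((metric, change))
    return alerts


current = {"total_bandwidth_gbps": 130.0, "symbol_errors": 10.0}
baseline = {"total_bandwidth_gbps": 100.0, "symbol_errors": 9.5}
alerts = find_network_alerts(current, baseline)
```

With these sample values, only the bandwidth metric differs from its monthly baseline by more than 20%, so it is the only one reported.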

The web UI provides a list of alerts.

Clicking any alert provides an additional layer of analysis that shows the anomalous parameter over three different time ranges.





The ETL process of UFM Cyber-AI combines a partitioning key (PKey) topology with network telemetry to monitor PKey performance.

Based on normalized congestion measurements (the default threshold is greater than 70%), the system detects the most congested PKeys. This is done by counting the number of times the alert is received.

In addition, a resource allocation pie chart shows the nodes allocated to PKeys versus free nodes.
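As a rough sketch of this ranking, the following Python counts how often each PKey exceeds the congestion threshold and lists the most frequent offenders. The sample format, the PKey values, and the function name are assumptions made for illustration; only the 70% default is taken from the text.

```python
from collections import Counter


def most_congested_pkeys(samples, threshold=70.0, top_n=3):
    """Rank PKeys by how many samples exceed the congestion threshold.

    samples: iterable of (pkey, normalized_congestion_percent) pairs
             (a hypothetical input format).
    threshold: 70.0 mirrors the documented default.
    """
    counts = Counter()
    for pkey, congestion_pct in samples:
        if congestion_pct > threshold:
            counts[pkey] += 1
    return counts.most_common(top_n)


samples = [
    ("0x8001", 85.0),
    ("0x8002", 40.0),
    ("0x8001", 72.5),
    ("0x8003", 91.0),
    ("0x8002", 70.0),  # exactly at the threshold, so not counted
]
ranking = most_congested_pkeys(samples)
```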

Detailed event information is provided to the user regarding PKey alerts, where the user can see PKey details and descriptions of the alert.

Clicking any PKey alert shows six graphs representing network statistics in general and for the selected PKey.

This way, the user can see the impact of a specific PKey on the entire network and can check whether PKey activity is normal in terms of both performance and duration of usage (that is, whether the activity is happening within a reasonable time).





UFM Cyber-AI trains machine learning algorithms to predict future failures. It collects monitoring information (that is, training data for the machine learning algorithms) over a period (for example, 1 to 24 hours) preceding previous failures that are known in retrospect, and the algorithms learn the connection between different parameters over time.

Using the machine learning algorithm, the processor derives the potential failure pattern, for example by alerting on the future failure times of components. The processor repeatedly updates the alerted future failure times based on newly collected failures.
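One way to picture how such training data could be assembled is to label each telemetry sample according to whether a known failure followed it within the 1 to 24 hour window. The sketch below is an assumption about the labeling step only; the function name and the binary labeling scheme are hypothetical, and it does not represent the actual UFM Cyber-AI training pipeline.

```python
from datetime import datetime, timedelta


def label_samples(sample_times, failure_times,
                  min_lead=timedelta(hours=1), max_lead=timedelta(hours=24)):
    """Label a sample 1 if a known failure occurred 1 to 24 hours after it, else 0."""
    labels = []
    for t in sample_times:
        precedes_failure = any(min_lead <= f - t <= max_lead for f in failure_times)
        labels.append(1 if precedes_failure else 0)
    return labels


failure = datetime(2023, 9, 5, 12, 0)
# Samples taken 30h, 24h, 6h, 1h, and 0.5h before the failure.
sample_times = [failure - timedelta(hours=h) for h in (30, 24, 6, 1, 0.5)]
labels = label_samples(sample_times, [failure])
```

Here the samples taken 24, 6, and 1 hours before the failure fall inside the window and are labeled as positive examples; the other two are not.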

The dashboard provides a list of the ports with the most Link Failure Prediction alerts raised, and shows the relation between the Alerted and the Total number of devices in the cluster.

Users may see the detailed events through an event list where alert details like Node Name, Port, Hours to Fail, and alert Description are available.

Clicking any alert in the list shows three graphs representing counters that influenced the triggering of the alert the most. Several time ranges are available.

The default view provides two lines for each graph: one for current data and another for historical data, which is calculated from average values over the prior week.

Users can switch between Weekly Average (default) and Day of Week Average.

Day of Week Average is calculated from the statistics for the same hours and day of the week over the past month. For example, the average for 8AM–9AM on Mondays during the past month.
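The difference between the two baselines can be illustrated with a short Python sketch. The function names, the (timestamp, value) input format, and the 30-day reading of "past month" are assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean


def weekly_average(history, target):
    """Average of all values recorded in the 7 days before target."""
    start = target - timedelta(days=7)
    values = [v for ts, v in history if start <= ts < target]
    return mean(values) if values else None


def day_of_week_average(history, target, window_days=30):
    """Average of values from the same weekday and hour as target.

    window_days=30 is an assumed interpretation of "the past month".
    """
    start = target - timedelta(days=window_days)
    values = [
        v for ts, v in history
        if start <= ts < target
        and ts.weekday() == target.weekday()
        and ts.hour == target.hour
    ]
    return mean(values) if values else None


target = datetime(2023, 9, 4, 8, 0)  # a Monday, 8AM
history = [
    (datetime(2023, 8, 7, 8, 30), 10.0),   # Monday, 8AM-9AM
    (datetime(2023, 8, 14, 8, 15), 20.0),  # Monday, 8AM-9AM
    (datetime(2023, 8, 21, 8, 45), 30.0),  # Monday, 8AM-9AM
    (datetime(2023, 8, 28, 8, 0), 40.0),   # Monday, 8AM-9AM
    (datetime(2023, 8, 29, 8, 0), 100.0),  # Tuesday, ignored by Day of Week
]
dow_avg = day_of_week_average(history, target)
week_avg = weekly_average(history, target)
```

In this sample data the Day of Week average uses only the four Monday values, while the weekly average uses every value from the prior seven days, including the Tuesday spike.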





Port anomaly detection is based on composite metrics defined to detect anomalies reliably. These metrics change dynamically, for example according to a baseline that the system determines and subsequently updates.

In addition, an anomaly score is defined that provides a statistical estimation, such as the number of standard deviations or the number of Mean Absolute Errors (MAEs) from a baseline value of the feature (that is, the metric value). A degree of severity is assigned according to the number of standard deviations or MAEs.
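A minimal sketch of such a score, assuming the baseline is the mean of recent values and the MAE is the mean absolute deviation of those values from that mean. The severity cut-offs (2 and 3) and all names below are hypothetical choices for illustration, not values from UFM Cyber-AI.

```python
from statistics import mean, pstdev


def score_in_std(value, baseline_values):
    """Distance of value from the baseline mean, in standard deviations."""
    mu = mean(baseline_values)
    sigma = pstdev(baseline_values)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def score_in_mae(value, baseline_values):
    """Distance of value from the baseline mean, in mean absolute errors."""
    mu = mean(baseline_values)
    mae = mean([abs(v - mu) for v in baseline_values])
    if mae == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / mae


def severity(score, warning_at=2.0, critical_at=3.0):
    """Map a score to a severity label; the cut-offs are assumed, not documented."""
    if score >= critical_at:
        return "critical"
    if score >= warning_at:
        return "warning"
    return "normal"


baseline_values = [10, 12, 8, 10, 10]
std_score = score_in_std(14, baseline_values)
mae_score = score_in_mae(14, baseline_values)
level = severity(std_score)
```

The two scores generally differ for the same observation, so the severity cut-offs would need to be chosen per scoring method.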

The dashboard provides a list of top ports reporting link anomalies including the number of times an anomaly is detected and statistics regarding Alerted and the Total number of devices in the cluster.

Users can also see detailed events in the events list where the alert details such as Node Name, Probability, and Alert Description are available.

Clicking on the Recommended Action icon opens a window with recommended actions that may be taken.

Clicking any alert in the list shows three graphs representing counters that influenced the triggering of the alert the most. Several ranges of time are available.

The default view provides two lines for each graph: one for current data and another for historical data, which is calculated from average values over the prior week.

Users can switch between Weekly Average (default) and Day of Week Average.

Day of Week Average is calculated from the statistics for the same hours and day of the week over the past month. For example, the average for 8AM–9AM on Mondays during the past month.





Logical server data collection and analytic jobs are disabled by default. To enable them, set the related flag to ‘true’ in the cyberai.cfg file:

    [CyberAi]
    log_level = INFO
    log_path = /var/log/cyberai/cyberai.log
    license_check_interval = 24
    health_check_interval = 120
    use_gpu = false
    enable_logical_servers = false
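For illustration, the state of the flag could be checked with Python's standard configparser module. This sketch assumes the file follows plain INI syntax, which the excerpt above suggests but does not guarantee; the helper name is hypothetical, and the sample text is inlined so the example does not depend on a real installation.

```python
import configparser

# Sample content mirroring the excerpt above (inlined for the example).
SAMPLE_CFG = """
[CyberAi]
log_level = INFO
log_path = /var/log/cyberai/cyberai.log
license_check_interval = 24
health_check_interval = 120
use_gpu = false
enable_logical_servers = false
"""


def logical_servers_enabled(cfg_text: str) -> bool:
    """Return True if enable_logical_servers is set to a true value."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback=False matches the documented default (disabled).
    return parser.getboolean("CyberAi", "enable_logical_servers", fallback=False)


before = logical_servers_enabled(SAMPLE_CFG)
after = logical_servers_enabled(
    SAMPLE_CFG.replace("enable_logical_servers = false",
                       "enable_logical_servers = true")
)
```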

The ETL process of UFM Cyber-AI combines the logical server topology with network telemetry, allowing logical server performance to be monitored.

Based on utilization measurements (the default threshold is greater than 70%), the system detects the most utilized logical servers. This is done by counting the number of times the alert is received.

In addition, a resource allocation pie chart shows the nodes allocated to logical servers compared to free nodes.

Detailed event information is provided to the user regarding logical server alerts, where the user can see logical server details and a description of the alert.

Clicking any logical server alert shows six graphs representing network statistics in general and for the selected logical server.

This way, the user can see the impact of a specific logical server on the entire network and can check whether logical server activity is normal in terms of both performance and duration of usage (that is, whether the activity is happening within a reasonable time).





A recommended action is available for all alert types. The user can click the Recommended Action icon to see the recommended actions for the alert.



