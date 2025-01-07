Telemetry
UFM Telemetry allows the collection and monitoring of InfiniBand fabric port statistics, such as network bandwidth, congestion, errors, latency, and more.
UFM provides a range of telemetry capabilities:
Real-time monitoring views
Monitoring of multiple attributes
Intelligent Counters for error and congestion counters
InfiniBand port-based error counters
InfiniBand congestion measurement based on the XmitWait counter
InfiniBand port-based bandwidth data
The telemetry session panels support the following actions:
Rearrangement via a straightforward drag-and-drop function
Resizing by hovering over the panel's border
There are two methods for managing telemetry instances:
Legacy Mode (via UFM): In this mode, telemetry instances are invoked during UFM startup and fully managed by UFM.
UTM Mode (via UFM Telemetry Manager): In this mode, telemetry instances are managed by the UFM Telemetry Manager (UTM) plugin.
By default, the system operates in legacy mode. To switch to UTM mode:
Start UFM.
Deploy and enable the UFM Telemetry Manager (UTM) plugin.
Edit the /opt/ufm/files/conf/gv.cfg configuration file and, under the [Telemetry] section, set the following flags to false:
primary_telemetry_legacy_mode = false
secondary_telemetry_legacy_mode = false
Save and close the gv.cfg file.
Restart UFM (or simply restart the telemetry service).
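The flag edit in the steps above can also be scripted. The following is a minimal sketch, not an official UFM tool: it assumes gv.cfg uses plain INI-style `[Section]` headers and `key = value` lines, and the helper name `set_legacy_mode` is hypothetical. It rewrites only the two flags named above, only inside the [Telemetry] section, and leaves every other line as it was.

```python
import re

# The two flags named in the steps above.
FLAGS = ("primary_telemetry_legacy_mode", "secondary_telemetry_legacy_mode")


def set_legacy_mode(cfg_text: str, enabled: bool) -> str:
    """Return cfg_text with both legacy-mode flags in [Telemetry] set to `enabled`.

    Assumption: the file is INI-like. Lines outside [Telemetry], comments,
    and unrelated keys are passed through unchanged.
    """
    value = "true" if enabled else "false"
    out = []
    section = None
    for line in cfg_text.splitlines():
        header = re.match(r"\s*\[(.+?)\]\s*$", line)
        if header:
            section = header.group(1)
        elif section == "Telemetry" and "=" in line:
            key = line.split("=", 1)[0].strip()
            if key in FLAGS:
                line = f"{key} = {value}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Demonstration on an in-memory sample; the real file lives at
    # /opt/ufm/files/conf/gv.cfg and should be backed up before editing.
    sample = (
        "[Telemetry]\n"
        "primary_telemetry_legacy_mode = true\n"
        "secondary_telemetry_legacy_mode = true\n"
    )
    print(set_legacy_mode(sample, enabled=False))
```

Working on the text rather than on a parsed configuration object keeps comments and key order intact, which matters for a file that is also edited by hand. UFM (or the telemetry service) still needs to be restarted afterwards, as described in the last step.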
| Telemetry Instance | Description | REST API |
|---|---|---|
| High-Frequency (Primary) Telemetry Instance | A default telemetry session that collects a predefined set of ~30 counters covering bandwidth, congestion, and error metrics, which UFM analyzes and reports. These counters are used for: | For Default and Real-time Telemetry: Monitoring REST API. For Historical Telemetry: History Telemetry Sessions REST API → History Telemetry Sessions |
| Low-Frequency (Secondary) Telemetry Instance | Operates automatically upon UFM startup, offering an extended scope of 120 counters. For a list of the Secondary Telemetry Fields, refer to Low-Frequency (Secondary) Telemetry Fields. | N/A |
For direct telemetry endpoint access, which exposes the list of supported counters:
For the High-Frequency (Primary) Telemetry Instance, run the following command:
# curl -s 127.0.0.1:9001/csv/cset/converted_enterprise
For the Low-Frequency (Secondary) Telemetry Instance, run the following command:
# curl -s 127.0.0.1:9002/csv/xcset/low_freq_debug
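The same two endpoints can be read from a script. The sketch below uses only the Python standard library and rests on one assumption that is not confirmed here: that each endpoint returns CSV text whose first non-empty row is a header of counter names. The helper names `fetch_csv` and `counter_names` are hypothetical.

```python
import csv
import io
import urllib.request

# Endpoints taken from the curl commands above.
PRIMARY_URL = "http://127.0.0.1:9001/csv/cset/converted_enterprise"
SECONDARY_URL = "http://127.0.0.1:9002/csv/xcset/low_freq_debug"


def fetch_csv(url: str, timeout: float = 10.0) -> str:
    """Download the raw CSV text from a telemetry endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def counter_names(csv_text: str) -> list[str]:
    """Return the column names from the first non-empty CSV row.

    Assumption: that row is a header listing the supported counters.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            return [name.strip() for name in row]
    return []


if __name__ == "__main__":
    # Requires a running telemetry instance on the local host.
    for label, url in (("primary", PRIMARY_URL), ("secondary", SECONDARY_URL)):
        names = counter_names(fetch_csv(url))
        print(f"{label}: {len(names)} columns")
```

Keeping the parsing in `counter_names` separate from the network call makes it possible to check the parsing against a saved copy of the curl output without a live UFM server.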
Historical Telemetry Collection in UFM
UFM periodically collects fabric port statistics and saves them in its SQLite database. Before starting UFM Enterprise, consider the following disk space utilization for various fabric sizes and retention periods.
The measurements in the table below were taken with the sampling interval set to 30 seconds.
Note that the default sampling interval is 300 seconds, so the disk utilization calculation should be adjusted accordingly.
| Number of Nodes | Ports per Node | Storage per Hour | Storage per 15 Days | Storage per 30 Days |
|---|---|---|---|---|
| 16 | 8 | 1.6 MB | 576 MB (0.563 GB) | 1152 MB (1.125 GB) |
| 100 | 8 | 11 MB | 3960 MB (3.867 GB) | 7920 MB (7.734 GB) |
| 500 | 8 | 50 MB | 18000 MB (17.58 GB) | 36000 MB (35.16 GB) |
| 1000 | 8 | 100 MB | 36000 MB (35.16 GB) | 72000 MB (70.31 GB) |
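To adjust the measured figures to a different sampling interval, one rough approach is to scale them by the number of samples taken. The sketch below assumes that storage grows linearly with the sample count, which is an assumption made for illustration rather than a documented guarantee; the function name `estimate_storage_mb` is hypothetical. The per-hour inputs come from the table above, which was measured at a 30-second interval.

```python
# Sampling interval (seconds) used for the measurements in the table above.
MEASURED_INTERVAL_S = 30


def estimate_storage_mb(mb_per_hour_at_30s: float, days: float,
                        interval_s: float = 300) -> float:
    """Estimate history storage in MB for a retention period.

    Assumption: storage scales linearly with the number of samples, so a
    300-second interval stores one tenth of the data of a 30-second interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    hours = 24 * days
    return mb_per_hour_at_30s * hours * MEASURED_INTERVAL_S / interval_s


if __name__ == "__main__":
    # 1000 nodes (100 MB/hour at 30 s), 30 days of history.
    print(f"{estimate_storage_mb(100, 30, interval_s=30):.0f} MB")  # 72000 MB
    print(f"{estimate_storage_mb(100, 30):.0f} MB")                 # 7200 MB
```

With `interval_s=30` the function reproduces the table values, which makes it easy to sanity-check before applying it to other intervals. Treat the result as a planning estimate and leave headroom for database overhead.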