Prometheus Endpoint Support
UFM Telemetry can expose an http or https endpoint to allow simple and effective integration with monitoring systems that work in poll mode and support Prometheus, CSV, or JSON data formats. The endpoint provides only the last data sample. The user cannot obtain statistics for time points in the past.
An http endpoint provides data in Prometheus format by default. It also supports JSON and CSV formats. The user can request the desired format using a URL prefix, as shown in the table below.
Data Format | URL Prefix |
Prometheus | - |
JSON | /json |
CSV | /csv |
An http endpoint can provide all sampled data using the default /metrics URL. The filtering functionality described in the Cset/Fset Filtering section is also supported. To use it, place a <name>.cset or <name>.fset file in the appropriate folder. The folder must be specified in the configuration file. See section "Configuring Data Polling Endpoint" for more details.
Extended counter set filtering, described below, offers an alternative approach to filtering that supports selecting both counters and fields.
To filter the data through a particular .cset, .fset, or .xcset file, include the filter file name in the URL. For example, if there are two filter files named name1.cset and name2.cset, the URLs /name1 (or /cset/name1) and /name2 (or /cset/name2) return the output filtered according to the respective file.
The URL prefixes /cset, /fset and /xcset can also be used to specify which filter file is meant.
URL | File Extension | Folder Parameter in Configuration File | Note |
/cset | *.cset | plugin_env_PROMETHEUS_CSET_DIR | If the cset folder is not explicitly specified in the configuration file, the cset directory is set to the same path as the fset directory. |
/fset | *.fset | plugin_env_PROMETHEUS_FSET_DIR | If the fset folder is not explicitly specified in the configuration file, the fset directory is set to the same path as the cset directory. |
/xcset | *.xcset | plugin_env_PROMETHEUS_XCSET_DIR | If the xcset folder is not explicitly specified in the configuration file, the xcset directory is set to the same path as the fset directory. |
If a URL prefix is not specified, then the filter file will be searched under both cset and fset folders. If they both have files with the same names, then both filters will be applied.
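The directory fallback rules from the table above can be sketched as follows. This is only an illustration of the documented behavior, not the endpoint's actual implementation. The function name resolve_filter_dirs and the dictionary-based configuration are assumptions, as is the chained fallback (an unset xcset directory following an fset directory that itself fell back to the cset directory).

```python
def resolve_filter_dirs(config):
    """Apply the documented fallbacks between the cset, fset, and xcset folders.

    `config` is a plain dict of configuration keys to values (an assumed
    representation of the configuration file, used here for illustration).
    """
    cset = config.get("plugin_env_PROMETHEUS_CSET_DIR")
    fset = config.get("plugin_env_PROMETHEUS_FSET_DIR")
    xcset = config.get("plugin_env_PROMETHEUS_XCSET_DIR")

    # cset defaults to fset, and fset defaults to cset.
    cset = cset or fset
    fset = fset or cset
    # xcset defaults to fset (assumed to use the already resolved fset value).
    xcset = xcset or fset

    return {"cset": cset, "fset": fset, "xcset": xcset}
```

If none of the three keys is set, every entry in the result is None, and the caller has to decide how to handle that case.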
Extended Counter Set Filtering
The http server provides an optional Extended counter set (xcset) selection mechanism in addition to the counter set (cset) and field set (fset) filtering. To define an extended counter set, a file or group of files with the .xcset extension must be placed in its designated directory or adjacent to existing field or counter sets.
Each line of the file may contain:
Selection of a counter with an optional alias in the format “counter[=alias]”
Selection of a type’s field with an optional alias in the format “type.field[=alias]”
Reference to another file to be included “file.xcset”
Referenced extended counter set files are searched for in the same directory as the xcset file that references them.
Aliases are not mandatory, but if provided, they are used to name the selected counter or field in the output. Empty lines and comments that begin with the "#" sign are disregarded.
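To make the line format concrete, the following is a minimal sketch of a reader for .xcset files, written for illustration only and not taken from the product. The function name parse_xcset and the tuple layout are assumptions. The sketch also assumes that a name containing a dot is a type field, that a line ending in .xcset is a file reference, and it adds a guard against include cycles that the text does not mention.

```python
from pathlib import Path


def parse_xcset(path, _seen=None):
    """Return a list of (kind, name, alias) tuples read from an .xcset file.

    kind is "counter" or "field"; alias is None when no alias is given.
    """
    path = Path(path)
    seen = _seen if _seen is not None else set()
    resolved = path.resolve()
    if resolved in seen:  # skip files already read (assumed cycle guard)
        return []
    seen.add(resolved)

    entries = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are disregarded
        if line.endswith(".xcset"):
            # Referenced files are looked up next to the referencing file.
            entries.extend(parse_xcset(path.parent / line, seen))
            continue
        name, _, alias = line.partition("=")
        name = name.strip()
        alias = alias.strip() or None
        kind = "field" if "." in name else "counter"
        entries.append((kind, name, alias))
    return entries
```

The result keeps the order of the lines, with the entries of a referenced file inserted at the position of the reference.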
URL prefixes can be used to manipulate data output. It is important to use the prefixes in the correct order as they have assigned priorities. The table below shows URL prefixes priority assignments with examples:
Priority | Prefix | Link Examples | Description |
1 | /labels | /labels/metrics, /metrics | Used to show labels from metadata files |
2 | /json, /csv | /json/metrics, /csv/metrics, /labels/json/metrics, /labels/csv/metrics | Used to specify output format |
3 | /cset, /fset, /xcset | /cset/filter1, /fset/filter2, /labels/cset/filter1, /labels/fset/filter2, /json/cset/filter1, /json/fset/filter2, /csv/cset/filter1, /csv/fset/filter2, /csv/xcset/ib, /labels/json/cset/filter1, /labels/json/fset/filter2, /labels/csv/cset/filter1, /labels/csv/fset/filter2 |
Used to specify which type of filter file should be applied |
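Because the prefixes must appear in priority order, a client may find it convenient to assemble the path programmatically. The helper below is a hypothetical client-side sketch (build_path and its parameters are not part of UFM Telemetry) that always emits /labels first, then the format prefix, then the filter type prefix, and finally the target name.

```python
# Format prefixes; Prometheus is the default and has no prefix.
FORMAT_PREFIXES = {"prometheus": "", "json": "/json", "csv": "/csv"}
FILTER_TYPES = {"cset", "fset", "xcset"}


def build_path(target="metrics", fmt="prometheus", labels=False, filter_type=None):
    """Build an endpoint path with prefixes in priority order.

    target is "metrics" or a filter file name without its extension.
    """
    if fmt not in FORMAT_PREFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    if filter_type is not None and filter_type not in FILTER_TYPES:
        raise ValueError(f"unknown filter type: {filter_type!r}")

    parts = []
    if labels:
        parts.append("/labels")            # priority 1
    parts.append(FORMAT_PREFIXES[fmt])     # priority 2
    if filter_type is not None:
        parts.append("/" + filter_type)    # priority 3
    parts.append("/" + target)
    return "".join(parts)
```

For example, build_path("filter1", fmt="json", labels=True, filter_type="cset") yields /labels/json/cset/filter1, which matches one of the link examples in the table.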
To configure the Prometheus endpoint, the keys listed below need to be set in the launch_ibdiagnet_config.ini file.
plugin_env_PROMETHEUS_ENDPOINT http://0.0.0.0:9100
plugin_env_PROMETHEUS_PROXY_ENDPOINT_PORT 9200
plugin_env_PROMETHEUS_INDEXES port_num
plugin_env_PROMETHEUS_FSET_INDEXES port,lid,guid,[CableInfo]^port_guid,^Port$
plugin_env_PROMETHEUS_CSET_DIR /config/prometheus_configs/cset
…
There are several options related to configuring the HTTP polling endpoint. The key plugin_env_PROMETHEUS_ENDPOINT is used to configure the IP interface for endpoint binding. The “0.0.0.0” part in the setting above means that any of the host's valid IP addresses can be used. Note that the user can also specify the host's IP address explicitly.
The plugin_env_PROMETHEUS_ENDPOINT key also configures the data transport. For regular HTTP, set the prefix to http. To send over a TLS connection, set the prefix to https, set the above mandatory parameters (keys), and select the existing security keys as follows.
A DH (key exchange protocol) file can also be specified if needed as follows:
plugin_env_CLX_SSL_DH_FILE=/certs/dh.pem
To use custom labels for Prometheus statistics, a metadata file is used. For details about labels and label file format, see sections "Prometheus Labels" and "Prometheus Label Generation".
There are several options that allow configuring metadata. The file containing the labels used in Prometheus generation is set as follows:
plugin_env_CLX_METADATA_FILE=/config/labels.txt
The user can create the metadata file upon system setup or have a script generate it automatically, using the following parameter:
plugin_env_CLX_METADATA_COMMAND=/opt/mellanox/collectx/telem/bin/gen_metadata --fabric compute --file /var/log/ibdiagnet2.ibnetdiscover --output /config/labels.txt
In the above example, the script generates metadata from /var/log/ibdiagnet2.ibnetdiscover. If the user wishes to create the label file manually, the above option should be commented out to prevent periodic overwriting of the content of the metadata file.
By default, the Prometheus endpoint provides statistics with the collection timestamps. The user can decide whether counter values will be passed with or without timestamps by setting the plugin_env_PROMETHEUS_SHOW_TIMESTAMPS parameter to T (true) or F (false), respectively. For example, to send counter values without timestamps, set the parameter as follows:
plugin_env_PROMETHEUS_SHOW_TIMESTAMPS=F
To use data filter folders with counter sets, field sets, and extended counter sets, configure the directories where the files are stored as follows:
plugin_env_PROMETHEUS_CSET_DIR=/telemetry.config/prometheus_configs/cset
plugin_env_PROMETHEUS_FSET_DIR=/telemetry.config/prometheus_configs/fset
plugin_env_PROMETHEUS_XCSET_DIR=/telemetry.config/prometheus_configs/xcset
Any parameters not explicitly documented should not be changed and should be considered read-only.
For use cases such as UFM Enterprise or UFM Cyber AI where the network topology is known, a human-readable name can be presented based on the GUID.
# TYPE PortXmitDataExtended counter
# TYPE PortXmitPktsExtended counter
PortXmitDataExtended{source="0x0002c90300f172a0", node_guid="2c90300f172a0", port_guid="2c90300f172a2", port_num="2"} 85554128244 1628683905941
PortXmitPktsExtended{source="0x0002c90300f172a0", node_guid="2c90300f172a0", port_guid="2c90300f172a2", port_num="2"} 1188251785 1628683905941
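A monitoring client that consumes this output needs to split each sample line into its metric name, labels, value, and timestamp. The following sketch shows one way to do that with regular expressions. It is illustrative only: parse_sample is a hypothetical helper, and it handles only sample lines that carry a label block, as in the output above. Mature Prometheus client libraries provide complete parsers and are the better choice outside of quick checks.

```python
import re

# name{labels} value [timestamp]; whitespace before "{" is tolerated.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\s*'
    r'\{(?P<labels>.*)\}\s+'
    r'(?P<value>\S+)'
    r'(?:\s+(?P<ts>\d+))?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels, value, timestamp_ms).

    timestamp_ms is None when the endpoint is configured to omit timestamps.
    """
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labeled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match["labels"]))
    timestamp = int(match["ts"]) if match["ts"] else None
    return match["name"], labels, float(match["value"]), timestamp
```

Lines that begin with "#", such as the TYPE lines, are not samples and should be skipped before calling the helper.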
For integration with third-party applications, labels which are more human-readable may be generated using a labels metadata file, as described below.
To generate custom labels, a file containing key-value pairs is used. When a key is matched, the corresponding key-value pairs are added to the generated Prometheus labels.
The following is an example of the format of a labels metadata file:
ec0d9a0300b41a50_36|port_id|ec0d9a0300b41a50_36|device_name|SwitchIB Mellanox Technologies|device_type|switch|fabric|compute|hostname||node_desc||level|leaf|peer_level|server
ec0d9a0300b41a50_37|port_id|ec0d9a0300b41a50_37|device_name|SwitchIB Mellanox Technologies|device_type|switch|fabric|compute|hostname||node_desc||level|leaf|peer_level|
ec0d9a0300b41a58_1|port_id|ec0d9a0300b41a58_1|device_name||device_type|switch|fabric|compute|hostname|aggregation|node_desc|aggregation node|level||peer_level|leaf
98039b0300640b92_1|port_id|98039b0300640b92_1|device_name||device_type|host|fabric|compute|hostname|agx-1|node_desc|agx-1 mlx5_0|level|server|peer_level|leaf
98039b0300640c22_1|port_id|98039b0300640c22_1|device_name||device_type|host|fabric|compute|hostname|agx-2|node_desc|agx-2 mlx5_0|level|server|peer_level|leaf
0002c90300f172a0_2|port_id|0002c90300f172a0_2|device_name||device_type|host|fabric|compute|hostname|agx-3|node_desc|agx-3 mlx4_0|level|server|peer_level|leaf
98039b0300640b9a_1|port_id|98039b0300640b9a_1|device_name||device_type|host|fabric|compute|hostname|agx-3|node_desc|agx-3 mlx5_0|level|server|peer_level|leaf
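Judging from the example above, each line appears to consist of a match key followed by alternating label names and values, all separated by "|". The sketch below reads one such line under that assumption. The layout is inferred from the example rather than from a formal specification, and parse_label_line is a hypothetical helper name.

```python
def parse_label_line(line):
    """Split a labels metadata line into (match_key, labels_dict).

    Assumes the layout key|name1|value1|name2|value2|... where values
    may be empty strings.
    """
    fields = line.rstrip("\n").split("|")
    key, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"label name without a value in line: {line!r}")
    # Pair every even-positioned field (name) with the field after it (value).
    return key, dict(zip(rest[0::2], rest[1::2]))
```

Empty values, such as a missing hostname or a trailing empty peer_level, are kept as empty strings rather than dropped.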
The following is an example of the generated Prometheus output:
# TYPE infiniband_port_xmit_data_bytes counter
# TYPE infiniband_port_rcv_data_bytes counter
# TYPE infiniband_link_error_recovery_events counter
# TYPE infiniband_link_downed_events counter
# TYPE infiniband_cbw gauge
infiniband_port_xmit_data_bytes {port_id="0002c90300f172a0_2", ADDITIONAL_LABELS} 82218360540 1628602711924
infiniband_port_rcv_data_bytes {port_id="0002c90300f172a0_2", ADDITIONAL_LABELS} 82218429458 1628602711924
infiniband_link_error_recovery_events {port_id="0002c90300f172a0_2", ADDITIONAL_LABELS} 0 1628602711924
infiniband_link_downed_events {port_id="0002c90300f172a0_2", ADDITIONAL_LABELS} 0 1628602711924
infiniband_cbw {port_id="0002c90300f172a0_2", ADDITIONAL_LABELS} 0 1628602711924
where ADDITIONAL_LABELS include:
hostname="agx-3"
node_desc="agx-3 mlx4_0"
device_name=""
device_type="host"
fabric="compute"
level="server"
peer_level="leaf"
To enable this functionality, the following additional keys need to be configured:
plugin_env_CLX_EXPORT_API_IBNETDISCOVER_RUN_ONCE 1 # Without this, the gen_metadata.py script cannot generate the human readable names, nor the level and peer_level.
plugin_env_CLX_METADATA_FILE /path/to/labels/file
plugin_env_CLX_METADATA_COMMAND "python3 /opt/mellanox/collectx/telem/bin/gen_metadata.py --fabric compute --file /var/log/ibdiagnet2.ibnetdiscover -o /path/to/labels/file"
To test, the curl command can be used as follows:
[root@jazz11 /]# curl --silent IP_ADDR_OF_HOST:9100/metrics |egrep "xmit|rcv" | tail
port_xmit_discard{device_name="",device_type="host",fabric="compute",hostname="jazz32",level="server",node_desc="jazz32 mlx5_2",peer_level="leaf",port_id="ec0d9a0300c04a54_1"} 0 1629194120043
port_rcv_switch_relay_errors{device_name="",device_type="host",fabric="compute",hostname="jazz32",level="server",node_desc="jazz32 mlx5_2",peer_level="leaf",port_id="ec0d9a0300c04a54_1"} 0 1629194120043
port_rcv_constraint_errors{device_name="",device_type="host",fabric="compute",hostname="jazz32",level="server",node_desc="jazz32 mlx5_2",peer_level="leaf",port_id="ec0d9a0300c04a54_1"} 0 1629194120043
port_xmit_constraint_errors{device_name="",device_type="host",fabric="compute",hostname="jazz32",level="server",node_desc="jazz32 mlx5_2",peer_level="leaf",port_id="ec0d9a0300c04a54_1"} 0 1629194120043
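The same check can be scripted. The sketch below uses only the Python standard library to fetch the endpoint and keep the matching sample lines. The helper names are hypothetical, and port 9100 is taken from the configuration example earlier in this section. Unlike the egrep pipeline, the filter skips comment lines that start with "#".

```python
import re
import urllib.request


def metrics_url(host, port=9100, path="/metrics", scheme="http"):
    """Compose the endpoint URL; path may carry prefixes such as /json."""
    return f"{scheme}://{host}:{port}{path}"


def fetch_metrics(url, timeout=5.0):
    """Return the endpoint response body as text (latest sample only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def filter_samples(text, pattern=r"xmit|rcv"):
    """Keep non-comment lines that match the regular expression."""
    regex = re.compile(pattern)
    return [
        line
        for line in text.splitlines()
        if line and not line.startswith("#") and regex.search(line)
    ]


# Example use (replace IP_ADDR_OF_HOST with the address of the host):
# text = fetch_metrics(metrics_url("IP_ADDR_OF_HOST"))
# for line in filter_samples(text)[-10:]:
#     print(line)
```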