NVIDIA UFM Telemetry Documentation v1.13

Fluent Bit Export

NVIDIA® UFM® Telemetry can stream data to multiple destinations using Fluent Bit. The streaming implementation works with any Fluent Bit export plugin. The "Forward" plugin is particularly useful because it sends data to a customer-maintained Fluent Bit or FluentD instance, which the customer can then configure according to their requirements.

To export collected data from the UFM Telemetry docker image:

  1. Load, configure, and run the docker image. See the details in the "Software Management" chapter.

  2. Open a bash shell in the ufm-telemetry docker container.


    [root@r-ufm ~]# sudo docker exec -it ufm-telemetry bash

  3. Create or edit the export files (*.exp) in the export directory /config/fluent_bit_configs/ and set enable=1 for each plugin you want to run. See the "Export Files" section for details.

  4. Enable Fluent Bit export by setting plugin_env_FLUENT_BIT_EXPORT_ENABLE=1 in /config/launch_ibdiagnet_config.ini.

    [root@r-ufm ~]# vi /telemetry.config/launch_ibdiagnet_config.ini
    …
    [fluentbit_export]

    plugin_env_FLUENT_BIT_EXPORT_ENABLE=1
    plugin_env_FLUENT_BIT_CONFIG_DIR=/telemetry.config/fluent_bit_configs
    plugin_env_LD_LIBRARY_PATH=/opt/mellanox/collectx/lib
    ...

    Alternatively, you may do this using the configuration script configure_ufm_telemetry_target.py by running:


    [root@r-ufm ~]# /config/configure_ufm_telemetry_target.py enable-streaming

    This changes the value of the plugin_env_FLUENT_BIT_EXPORT_ENABLE parameter in the launch_ibdiagnet_config.ini file. See section "Controlling Fluent Bit Streaming" for more details.

  5. Run destination programs that will receive data. See more details in the "Data Forwarding" section.

  6. See the data on the receiving side.

Ibdiagnet collects and exports data periodically, at the interval set by the sample_rate parameter in the launch_ibdiagnet_config.ini file.

Export destinations are set by configuring existing .exp files or creating new ones. All export files are placed in the export configuration folder /config/fluent_bit_configs. The easiest way to start is to use the documented example .exp files for the following plugins:

  • forward

  • stdout

  • stdout_raw (this plugin is present only in the Fluent Bit version installed in the UFM Telemetry docker image)

All plugins are disabled by default. To enable a plugin, set enable=1.

Export File Configuration Details

Each export destination has the following fields:

  • name – configuration name

  • plugin_name – Fluent Bit plugin name

  • enable – 1 or 0 values to enable/disable this destination

  • host – the host for Fluent Bit plugin

  • port – port for Fluent Bit plugin

  • msgpack_data_layout – the msgpacked data format. Default is flb_std. The other option is custom. See section "Msgpack Data Layout" for details.

  • plugin_key=val – key-value pairs of Fluent Bit plugin parameter (optional)

  • counterset/fieldset – file paths (optional). See the details in section "Cset/Fset Filtering".

Use "#" to comment out a line.
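As a rough illustration of the field list above, the following Python sketch reads the text of an export file into a dictionary. It is not the parser UFM Telemetry uses. The function name parse_exp, the sample values, and the choice to treat everything after "#" as a comment are assumptions made for this example.

```python
def parse_exp(text):
    """Parse key=value lines from .exp-style text into a dict (illustrative only)."""
    fields = {}
    for raw_line in text.splitlines():
        # Assumption: "#" starts a comment anywhere on the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank lines, comments, and anything without "="
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


# Sample text modeled on the fields listed above (values are placeholders).
sample = """\
name=ufm-enterprise
enable=1
plugin_name=forward
host=10.209.36.248  # remote collector
port=24432
# msgpack_data_layout=custom
"""

config = parse_exp(sample)
print(config["plugin_name"], config["enable"])
```

A destination would count as active here when config["enable"] is "1"; values are kept as strings because the real file format does not declare types.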

Msgpack Data Layout

Data layout can be configured using .exp files by setting "msgpack_data_layout=layout".

Two layouts are available:

  1. "flb_std" data layout is an array of 2 fields: timestamp double value and a plain dictionary (key-value pairs). The standard layout is appropriate for all Fluent Bit plugins. For example:

    [timestamp_val, {"timestamp"=>ts_val, "type"=>"counters/events", "source"=>"source_val", "key_1"=>val_1, "key_2"=>val_2,...}]

  2. "custom" data layout is a dictionary of meta-fields and counter fields. Values are placed into a separate plain dictionary. Data in the custom format can be dumped with the "stdout_raw" output plugin of the Fluent Bit version installed in the UFM Telemetry docker image, or forwarded with the "forward" output plugin.

    Counters example:


    {"timestamp"=>timestamp_val, "type"=>"counters", "source"=>"source_val", "values"=> {"key_1"=>val_1, "key_2"=>val_2,...}}

    Events example:

    {"timestamp"=>timestamp_val, "type"=>"events", "type_name"=>"type_name_val", "source"=>"source_val", "values"=>{"key_1"=>val_1, "key_2"=>val_2,...}}
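To make the difference between the two layouts concrete, here is a small Python sketch that regroups a record shaped like the flb_std example into the shape of the custom example. It works on plain Python lists and dictionaries rather than real msgpack data. The function name to_custom, the sample values, and the set of keys treated as meta-fields (taken from the examples above) are assumptions.

```python
# Keys treated as meta-fields; inferred from the examples above (assumption).
META_KEYS = ("timestamp", "type", "type_name", "source")


def to_custom(flb_std_record):
    """Regroup an flb_std-style record into the custom layout (illustrative only)."""
    _outer_timestamp, flat = flb_std_record
    custom = {key: flat[key] for key in META_KEYS if key in flat}
    # Everything that is not a meta-field moves under "values".
    custom["values"] = {k: v for k, v in flat.items() if k not in META_KEYS}
    return custom


record = [
    1693990000.0,
    {
        "timestamp": 1693990000.0,
        "type": "counters",
        "source": "source_val",
        "key_1": 1,
        "key_2": 2,
    },
]

print(to_custom(record))
```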

Cset/Fset Filtering

Each export file can optionally use one cset and one fset file to filter UFM Telemetry counters and events data.

  • Cset file contains tokens per line to filter data with "type"="counters".

  • Fset contains several blocks started with the header line [event_type_name] and tokens under that header. Fset file is used to filter data with "type"="events".

    • An event type name can be given as a prefix followed by "*" to apply the same tokens to all types that fit it. For example, to filter all ethtool events use [ethtool_event_*].

To require several tokens to match at the same time, use "tok1+tok2+tok3". Excluding tokens are available too: the line "tok1+tok2-tok3-tok4" selects names that match both tok1 and tok2 and match neither tok3 nor tok4.
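The token rules can be sketched in Python as follows. This is an interpretation of the rules described here and in the comments of the shipped cset and fset files, not the matcher UFM Telemetry actually uses. The function names are hypothetical, plain tokens are assumed to match as substrings, and how several token lines in one file combine is not covered.

```python
import re


def token_matches(token, name):
    """Match one token against a name; "^" and "$" anchor the start and end."""
    anchored_start = token.startswith("^")
    anchored_end = token.endswith("$")
    core = token[1:] if anchored_start else token
    core = core[:-1] if anchored_end else core
    if anchored_start and anchored_end:
        return name == core
    if anchored_start:
        return name.startswith(core)
    if anchored_end:
        return name.endswith(core)
    return core in name  # assumption: unanchored tokens match as substrings


def line_matches(line, name):
    """Evaluate one token line such as "tok1+tok2-tok3" against a name."""
    parts = re.findall(r"([+-]?)([^+-]+)", line.strip())
    include = [tok for sign, tok in parts if sign != "-"]
    exclude = [tok for sign, tok in parts if sign == "-"]
    return (all(token_matches(t, name) for t in include)
            and not any(token_matches(t, name) for t in exclude))


names = ["port_xmit_discard", "port_rcv_errors", "node_guid"]
print([n for n in names if line_matches("port+xmit", n)])
print([n for n in names if line_matches("-port", n)])
```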

Both events and counters can be extended with aliased fields and new constant fields.

  • “meta_field_alias:exact_name=alias_name” adds a new field/counter named “alias_name” whose value is copied from the existing field/counter “exact_name”.

  • “meta_field_add:new_name=constant_value” adds a new field/counter named “new_name” with the value “constant_value”.

New fields must have unique names; otherwise, they are ignored.
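The effect of aliases and constant fields on a single record can be sketched in Python like this. The function apply_meta_fields and the sample record are hypothetical, and the record is simplified to a flat dictionary. The sketch only mirrors the rules stated above: an alias copies the value of an existing field, a constant adds a fixed value, and a new field whose name is already taken is ignored.

```python
def apply_meta_fields(record, aliases, constants):
    """Return a copy of record extended with alias and constant fields.

    aliases:   {exact_name: alias_name}
    constants: {new_name: constant_value}
    """
    result = dict(record)
    for exact_name, alias_name in aliases.items():
        # Aliases match exact names only; skip names that already exist.
        if exact_name in record and alias_name not in result:
            result[alias_name] = record[exact_name]
    for new_name, value in constants.items():
        if new_name not in result:  # non-unique names are ignored
            result[new_name] = value
    return result


record = {"port_num": 1, "lid": 5}
extended = apply_meta_fields(
    record,
    aliases={"port_num": "port"},
    constants={"site": "lab", "lid": "ignored"},
)
print(extended)
```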

For more details, see the documentation in the files ufm_enterprise.cset and ufm_enterprise.fset under /config/fluent_bit_configs.

The following is the content of /config/fluent_bit_configs/ufm_enterprise.cset:

# put tokens on separate lines

# Tokens are the actual name 'fragments' to be matched
# port$          # match names ending with token "port"
# ^port          # match names starting with token "port"
# ^port$         # include name that is exact token "port"
# port+xmit      # match names that contain both tokens "port" and "xmit"
# port-support   # match names that contain the token "port" and do not match the "-" token "support"
# -port          # exclude all names that contain the token "port"
#
# Tip: To disable counter export put a single token line that fits nothing

# Meta fields are user-defined additional fields of 2 types: aliases and new constant fields.
# - Aliases:
# add data of field "exact_name" to meta fields of record with new "alias_name".
# One field can have only one alias.
# Aliases match only exact names and will appear in data record even if field is disabled by fset.
# Example:
# meta_field_alias:exact_name=alias_name
# - Constants:
# add new field "new_field_name" with constant data string "constant_value" to the meta fields.
# Names should be unique.
# Example:
# meta_field_add:new_field_name=constant_value

# List of available counters:
#
#node_guid
#port_guid
#port_num
#lid
#link_down_counter
#link_error_recovery_counter
#symbol_error_counter
#port_rcv_remote_physical_errors
#port_rcv_errors
#port_xmit_discard
#port_rcv_switch_relay_errors
#excessive_buffer_errors
…

The following is the content of /config/fluent_bit_configs/ufm_enterprise.fset:

# Put your events here

# Usage:
#
# [type_name_1]
# tokens
# [type_name_2]
# tokens
# [type_name_3]
# tokens
# ...

# Tokens are the actual name 'fragments' to be matched
# port$          # match names ending with token "port"
# ^port          # match names starting with token "port"
# ^port$         # include name that is exact token "port"
# port+xmit      # match names that contain both tokens "port" and "xmit"
# port-support   # match names that contain the token "port" and do not match the "-" token "support"
# -port          # exclude all names that contain the token "port"

# Meta fields are user-defined additional fields of 2 types: aliases and new constant fields.
# - Aliases:
# add data of field "exact_name" to meta fields of record with new "alias_name".
# One field can have only one alias.
# Aliases match only exact names and will appear in data record even if field is disabled by fset.
# Example:
# meta_field_alias:exact_name=alias_name
# - Constants:
# add new field "new_field_name" with constant data string "constant_value" to the meta fields.
# Names should be unique.
# Example:
# meta_field_add:new_field_name=constant_value

# The next example will export the whole "switch_fan" events and events "CableInfo" filtered with token "port" :
# [switch_fan]
#
# [CableInfo]
# port

# To know which event type names are available use one of these options:
# 1. Check export and find field "type_name"=>"switch_temperature"
# OR
# 2. Open log file "/tmp/ibd/ibdiagnet2_port_counters.log" and find event types are printed to log:
# ...
# [info] type [CableInfo] is type of interest
# [info] type [switch_temperature] is type of interest
# [info] type [switch_fan] is type of interest
# [info] type [switch_general] is type of interest
# ...

# Corner cases:
# 1. Empty fset file will export all events.
# 2. Tokens written above/without [event_type] will be ignored.
# 3. If cannot open fset file, warning will be printed, all event types will be exported.


Data Forwarding

Quick Start Guide for FluentD

  1. Connect to a remote Linux machine via SSH and ensure Docker is installed and running on it.


    [root@r-ufm ~]# sudo service docker start

  2. Pull FluentD image:


    [root@r-ufm ~]# sudo docker pull fluentd

  3. Create a configuration file for the fluentd container.

    [root@r-ufm ~]# export fluentd_dir=/tmp/fluentd
    [root@r-ufm ~]# mkdir -p $fluentd_dir
    [root@r-ufm ~]# vim $fluentd_dir/config.conf
    # fill it with the following configuration

    <source>
      @type forward
      bind 0.0.0.0
      port 24432
    </source>

    <match ufm_telemetry>
      @type stdout
    </match>

  4. Start the fluentd collector container.

    [root@r-ufm ~]# sudo docker run -it --rm --network host -v $fluentd_dir:/fluentd/etc fluentd -c /fluentd/etc/config.conf -v

For more details, refer to "FluentD" on Docker Hub.

To forward UFM Telemetry data to a FluentD collector:

  1. Follow the instructions under "Quick Start Guide for FluentD" to prepare a remote host with a running FluentD.

  2. Follow the instructions under "Exporting Data Using Fluent Bit Export" to prepare UFM Telemetry with Fluent Bit export capability and ensure it matches the following configurations:

    • Fluent Bit is enabled (plugin_env_FLUENT_BIT_EXPORT_ENABLE=1) in the launch_ibdiagnet_config.ini file:

      [root@r-ufm ~]# grep -a2 fluent /config/launch_ibdiagnet_config.ini

      [fluentbit_export]
      plugin_env_FLUENT_BIT_EXPORT_ENABLE=1
      plugin_env_FLUENT_BIT_CONFIG_DIR=/telemetry.config/fluent_bit_configs
      plugin_env_LD_LIBRARY_PATH=/opt/mellanox/collectx/lib

    • A forward.exp file is prepared to send data to the remote host where fluentd is running:

      [root@r-ufm ~]# cat /config/fluent_bit_configs/forward.exp

      name=ufm-enterprise
      enable=1
      plugin_name=forward
      host=10.209.36.248 # Remote host IP where fluentd is running
      port=24432

      plugin_tag_match_pair=ufm_telemetry

  3. Verify that data is streamed from the CollectX Telemetry plugin and is received on the FluentD collector.
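Before checking for streamed records, it can help to confirm that the collector port is reachable from the machine running UFM Telemetry. The Python sketch below is one hedged way to do that with the standard socket module. The helper name can_connect is hypothetical, and a successful connection only shows that something is listening on the port, not that telemetry data is arriving.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the host and port from the forward.exp file above:
#   can_connect("10.209.36.248", 24432)
```

If the check fails, the usual suspects are a stopped fluentd container, a firewall between the hosts, or a port that differs between forward.exp and the fluentd configuration.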

A script to facilitate the configuration of UFM Telemetry is located under the path /config/configure_ufm_telemetry_target.py.

The script sets and shows the sample rate, enables and disables streaming, manages the target destinations that receive counters and cable info data (add, remove, modify, enable, disable, and show), and imports filter files that filter the streamed data.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py -h
usage: configure_ufm_telemetry_target.py <command> [<args>]

positional arguments:
  {add-target,show-target,remove-target,enable-target,enable-streaming,disable-target,disable-streaming,modify-target,import-filter-file,disable-filter-file,set-sample-rate,show-sample-rate}
                           Commands
    add-target             Add a telemetry target
    show-target            Show telemetry target(s)
    remove-target          Remove a telemetry target
    enable-target          Enable a telemetry target
    enable-streaming       Enable telemetry streaming
    disable-target         Disable a telemetry target
    disable-streaming      Disable telemetry streaming
    modify-target          Modify a telemetry target
    import-filter-file     Import a telemetry target filter file
    disable-filter-file    Disable telemetry target filter file
    set-sample-rate        Set telemetry sample rate
    show-sample-rate       Show telemetry sample rate

optional arguments:
  -h, --help               show this help message and exit
  -V, --version            Print version information

Controlling Fluent Bit Streaming

Fluent Bit data streaming is disabled by default. You may enable it by using the script argument enable-streaming (disable-streaming to disable). This changes the value of the plugin_env_FLUENT_BIT_EXPORT_ENABLE parameter in the launch_ibdiagnet_config.ini file.

[root@r-ufm ~]# grep plugin_env_FLUENT_BIT_EXPORT_ENABLE /config/launch_ibdiagnet_config.ini
plugin_env_FLUENT_BIT_EXPORT_ENABLE=0
[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py enable-streaming
[root@r-ufm ~]# grep plugin_env_FLUENT_BIT_EXPORT_ENABLE /config/launch_ibdiagnet_config.ini
plugin_env_FLUENT_BIT_EXPORT_ENABLE=1
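The change that enable-streaming and disable-streaming make to the configuration file can be pictured with the following Python sketch, which rewrites the value of plugin_env_FLUENT_BIT_EXPORT_ENABLE in a block of INI text. It is an illustration of the effect only, not the code of configure_ufm_telemetry_target.py, and the function name set_export_enable is hypothetical.

```python
import re

KEY = "plugin_env_FLUENT_BIT_EXPORT_ENABLE"


def set_export_enable(ini_text, enabled):
    """Return ini_text with the export-enable flag set to 1 or 0."""
    pattern = re.compile(rf"^([ \t]*{KEY}[ \t]*=[ \t]*).*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{int(enabled)}", ini_text)
    if count == 0:
        raise ValueError(f"{KEY} not found")
    return new_text


sample = """\
[fluentbit_export]
plugin_env_FLUENT_BIT_EXPORT_ENABLE=0
plugin_env_FLUENT_BIT_CONFIG_DIR=/telemetry.config/fluent_bit_configs
"""

print(set_export_enable(sample, True))
```

In practice the provided script is the safer way to make this change; the sketch is only meant to show which line is affected.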


Controlling Target Destinations

You can add, remove, modify, enable, disable, and show multiple target destinations that receive counters and cable info data.

Note

Use the flag -h to see the details of any operation.

Adding Destination Target

The add-target command adds and enables a destination target.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py add-target -h
usage: configure_ufm_telemetry_target.py <command> [<args>] add-target [-h] -n <[A-Za-z0-9_-] Name size: 32> -H <IPv4> -p <1-65535> -m {extended,standard}

optional arguments:
  -h, --help            show this help message and exit
  -n <[A-Za-z0-9_-] Name size: 32>, --target-name <[A-Za-z0-9_-] Name size: 32>
                        Target name
  -H <IPv4>, --target-host <IPv4>
                        IPv4 address
  -p <1-65535>, --target-port <1-65535>
                        Port number
  -m {extended,standard}, --target-message-type {extended,standard}

For example:


[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py add-target --target-name ufm-telemetry --target-host 10.212.145.6 --target-port 24453 -m standard
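When targets are added from automation rather than by hand, the argument constraints shown in the help text can be checked before the script is called. The Python sketch below builds the add-target argument list after validating each value. The helper build_add_target_argv is hypothetical, and reading "Name size: 32" as a maximum of 32 characters is an assumption.

```python
import ipaddress
import re

SCRIPT = "/config/configure_ufm_telemetry_target.py"


def build_add_target_argv(name, host, port, message_type):
    """Validate the arguments and return the add-target command as a list."""
    # Assumption: "Name size: 32" means at most 32 characters.
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,32}", name):
        raise ValueError("name must be 1-32 characters from [A-Za-z0-9_-]")
    ipaddress.IPv4Address(host)  # raises ValueError for a malformed address
    if not 1 <= int(port) <= 65535:
        raise ValueError("port must be in the range 1-65535")
    if message_type not in ("extended", "standard"):
        raise ValueError("message type must be 'extended' or 'standard'")
    return [
        SCRIPT, "add-target",
        "--target-name", name,
        "--target-host", host,
        "--target-port", str(port),
        "--target-message-type", message_type,
    ]


argv = build_add_target_argv("ufm-telemetry", "10.212.145.6", 24453, "standard")
print(" ".join(argv))
```

Inside the container, the resulting list could be passed to subprocess.run(argv, check=True); the sketch stops short of running anything.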


Displaying Destination Target Details

The show-target command displays the details of a destination target.

For example:

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py show-target --target-name ufm-telemetry
Enabled: Yes
Name: ufm-telemetry
Enabled: Yes
Host: 10.212.145.6
Port: 24453
Message Type: Standard


Disabling Destination Target

The disable-target command disables a destination target.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py disable-target -h
usage: configure_ufm_telemetry_target.py <command> [<args>] disable-target [-h] -n TARGET_NAME

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME

For example:

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py disable-target --target-name ufm-telemetry
[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py show-target --target-name ufm-telemetry
Enabled: Yes
Name: ufm-telemetry
Enabled: No
Host: 10.212.145.6
Port: 24453
Message Type: Standard


Enabling Destination Target

The enable-target command enables a destination target.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py enable-target -h
usage: configure_ufm_telemetry_target.py <command> [<args>] enable-target [-h] -n TARGET_NAME

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME

For example:

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py enable-target --target-name ufm-telemetry
[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py show-target --target-name ufm-telemetry
Enabled: Yes
Name: ufm-telemetry
Enabled: Yes
Host: 10.212.145.6
Port: 24453
Message Type: Standard


Modifying Destination Target

The modify-target command modifies a destination target.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py modify-target -h
usage: configure_ufm_telemetry_target.py <command> [<args>] modify-target [-h] -n TARGET_NAME [-H <IPv4>] [-p <1-65535>] [-m {extended,standard}]

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME
  -H <IPv4>, --target-host <IPv4>
                        IPv4 address
  -p <1-65535>, --target-port <1-65535>
                        Port number
  -m {extended,standard}, --target-message-type {extended,standard}

For example:

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py modify-target --target-name ufm-telemetry --target-host 10.212.145.7 --target-port 24455 -m standard
[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py show-target --target-name ufm-telemetry
Enabled: Yes
Name: ufm-telemetry
Enabled: Yes
Host: 10.212.145.7
Port: 24455
Message Type: Standard


Removing Destination Target

The remove-target command removes a destination target.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py remove-target -h
usage: configure_ufm_telemetry_target.py <command> [<args>] remove-target [-h] -n TARGET_NAME

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME

For example:

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py remove-target --target-name ufm-telemetry
[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py show-target --target-name ufm-telemetry
Enabled: Yes
Target ufm-telemetry is missing. Please add it first.

Data Filtration

The configure_ufm_telemetry_target.py script lets users import filter files that filter the streamed data, and disable filters that were imported earlier.

Enabling Data Filtration

To enable filtration of the streamed counters and cable info data, users must create a file containing the appropriate RegEx patterns, one pattern per line. Each pattern selects the parameters whose data should be streamed.

[root@r-ufm ~]# cat ~/counters_filter
lm_counter
Errors

Then they must import the filter file to a destination, specifying the type of data (counters or cable info) using the parameter import-filter-file.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py import-filter-file -h
usage: configure_ufm_telemetry_target.py <command> [<args>] import-filter-file [-h] -n TARGET_NAME -t {counters,fields} -f FILE_PATH

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME
  -t {counters,fields}, --target-filter-type {counters,fields}
  -f FILE_PATH, --file-path FILE_PATH

For example, to enable filtering streamed data and create filters:


[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py import-filter-file --target-name ufm-telemetry --target-filter-type counters --file-path ~/counters_filter

On the target destination side, users receive only the counters whose names contain one of the patterns (lm_counter, Errors).
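The effect of such a filter file can be approximated with a few lines of Python. This sketch assumes that a counter is kept when any pattern is found anywhere in its name and that matching is case-sensitive; neither detail is confirmed here. The function names and the counter names in the sample list are hypothetical.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-empty line."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def filter_counters(names, patterns):
    """Keep the names in which at least one pattern is found."""
    return [name for name in names if any(p.search(name) for p in patterns)]


# Same two patterns as the counters_filter file above.
patterns = load_patterns("lm_counter\nErrors\n")

# Hypothetical counter names, for illustration only.
names = ["lm_counter_a", "PortRcvErrors", "port_xmit_data"]

print(filter_counters(names, patterns))
```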

Disabling Data Filtration

The disable-filter-file command disables an imported filter file.

[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py disable-filter-file -h
usage: configure_ufm_telemetry_target.py <command> [<args>] disable-filter-file [-h] -n TARGET_NAME -t {counters,fields}

optional arguments:
  -h, --help            show this help message and exit
  -n TARGET_NAME, --target-name TARGET_NAME
  -t {counters,fields}, --target-filter-type {counters,fields}

For example:


[root@r-ufm ~]# /config/configure_ufm_telemetry_target.py disable-filter-file --target-name ufm-telemetry --target-filter-type counters

On the target destination side, users will receive all the counters without filtering.

© Copyright 2023, NVIDIA. Last updated on Sep 6, 2023.