The UTM plugin is designed to operate either as a UFM plugin or in standalone mode.

In UFM plugin mode, UTM is deployed through UFM. Telemetry instances (TIs) can be deployed either by UFM or manually using the deployment scripts.

In standalone mode, the deployment scripts deploy the whole setup: UTM, the TIs listed in the host list, and the switch telemetry image.

As a first step, pull the UTM image. If UTM will run in UFM plugin mode, upload the image to the UFM machine.

```shell
docker pull mellanox/ufm-plugin-utm
```
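If the UFM machine cannot pull from Docker Hub itself, the image can be pulled on another host and streamed to it. The following is a minimal sketch, assuming SSH access to the UFM machine and Docker on both hosts; the function name copy_image_to_host and the host name are hypothetical. It relies on the standard `docker save` (writes an image as a tar stream to stdout) and `docker load` (reads one from stdin).

```shell
#!/bin/sh
# copy_image_to_host HOST [IMAGE]  (hypothetical helper)
# Streams a locally pulled image to HOST over SSH, without an intermediate
# tar file: "docker save" writes the image to stdout and "docker load" on the
# remote side reads it from stdin.
copy_image_to_host() {
  host=$1
  image=${2:-mellanox/ufm-plugin-utm:latest}
  docker save "$image" | ssh "$host" docker load
}

# Usage (ufm-host is a placeholder for the UFM machine's address):
#   docker pull mellanox/ufm-plugin-utm
#   copy_image_to_host ufm-host
```

Streaming avoids a temporary file on either host. The trade-off is that an interrupted transfer must be restarted from the beginning.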

The UTM plugin can be added either via the command-line interface (CLI) or via the web UI.

To add the plugin, run:

```shell
/opt/ufm/scripts/manage_ufm_plugins.sh add -p utm
```

To remove the plugin, run:

```shell
/opt/ufm/scripts/manage_ufm_plugins.sh remove -p utm
```

To add the plugin via the web UI:

1. In the UFM web UI, click Settings in the left panel.
2. Go to the "Plugin Management" tab.
3. Right-click the UTM plugin row and select "Add".
4. Select "Telemetry Status" in the left panel to open the UTM UI page.

UTM can operate with the following TIs:

- The default UFM TIs. Depending on the UFM configuration, these run either in legacy mode or within UTM.
- TIs started manually using the UTM deployment scripts. See section "Manual Deployment".

To stop the UTM plugin, go to "Plugin Management", right-click the UTM plugin row, and select "Disable".

Note: If non-default UFM credentials are used, UTM may fail to access the UFM REST API. To restore the connection between UTM and UFM, set ufm_user= and ufm_pass= in the ufm section of the utm_config.ini file.
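Editing the ufm section can be scripted. The following is a minimal sketch, assuming utm_config.ini is a plain INI file with a `[ufm]` section header and `key=value` lines; the helper name set_ini_key is hypothetical and not part of UTM. It replaces an existing key in the section, adds it at the end of the section if absent, and appends the section if it is missing.

```shell
#!/bin/sh
# set_ini_key FILE SECTION KEY VALUE  (hypothetical helper)
# Sets KEY=VALUE inside [SECTION] of an INI-style file: an existing KEY line in
# that section is replaced, otherwise the key is added at the end of the
# section, and a missing section is appended. Assumes KEY has no regex
# metacharacters. Values go through the environment so that awk does not
# reinterpret backslashes in passwords.
set_ini_key() {
  file=$1
  tmp=$(mktemp) || return 1
  SEC="[$2]" KEY=$3 VAL=$4 awk '
    function flush() { if (in_sec && !done) { print key "=" val; done = 1 } }
    BEGIN { sec = ENVIRON["SEC"]; key = ENVIRON["KEY"]; val = ENVIRON["VAL"] }
    /^[ \t]*\[.*\][ \t]*$/ {                  # a section header
      flush()                                  # leaving the target section
      hdr = $0; gsub(/[ \t]/, "", hdr)
      in_sec = (hdr == sec)
      if (in_sec) seen = 1
    }
    in_sec && $0 ~ ("^[ \t]*" key "[ \t]*=") {
      if (!done) { print key "=" val; done = 1 }   # replace the first match
      next                                         # drop any duplicates
    }
    { print }
    END {
      flush()
      if (!seen) { print ""; print sec; print key "=" val }
    }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage in UFM plugin mode (user name and password are placeholders):
#   set_ini_key /opt/ufm/files/conf/plugins/utm/utm_config.ini ufm ufm_user my_user
#   set_ini_key /opt/ufm/files/conf/plugins/utm/utm_config.ini ufm ufm_pass my_password
```

Because UTM restarts its main process when its configuration file changes, no manual restart should be needed after the edit.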

By default, UFM Telemetry runs two TIs: a high-frequency (Primary) instance and a low-frequency (Secondary) instance.

To enable meaningful monitoring of these default TIs:

1. Set plugin_env_CLX_EXPORT_API_SHOW_STATISTICS=1 in both config files:

```text
/opt/ufm/files/conf/telemetry_defaults/launch_ibdiagnet_config.ini
/opt/ufm/files/conf/secondary_telemetry_defaults/launch_ibdiagnet_config.ini
```

2. Restart the telemetry instances with the new config. If UFM Enterprise runs as a Docker container, execute this command inside the container:

```shell
/etc/init.d/ufmd ufm_telemetry_restart
```

3. Give the TIs some time to update their performance metrics. How long this takes depends on the update interval of the default TIs.
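Before restarting, it can help to confirm that both files actually carry the flag. The following is a minimal sketch; the helper name check_show_statistics is hypothetical, and it only assumes the flag appears as a `key=value` line (spaces around `=` tolerated).

```shell
#!/bin/sh
# check_show_statistics FILE...  (hypothetical helper)
# Confirms that every given launch_ibdiagnet_config.ini sets
# plugin_env_CLX_EXPORT_API_SHOW_STATISTICS=1. Prints OK/MISSING per file
# and returns 1 if any file lacks the flag (or does not exist).
check_show_statistics() {
  rc=0
  for f in "$@"; do
    if grep -Eqs '^[[:space:]]*plugin_env_CLX_EXPORT_API_SHOW_STATISTICS[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$f"; then
      echo "OK      $f"
    else
      echo "MISSING $f"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: restart the TIs only once both files carry the flag.
#   check_show_statistics \
#     /opt/ufm/files/conf/telemetry_defaults/launch_ibdiagnet_config.ini \
#     /opt/ufm/files/conf/secondary_telemetry_defaults/launch_ibdiagnet_config.ini \
#     && /etc/init.d/ufmd ufm_telemetry_restart
```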

If legacy mode is disabled in the UFM configuration, UTM runs the Primary and Secondary TIs automatically.

The UTM deployment scripts deploy additional TIs in UFM plugin mode, or the whole setup (UTM and TIs) in standalone mode.

To get the deployment scripts and examples, mount a local folder, UTM_DEPLOYMENT_SCRIPTS (/tmp/utm_deployment_scripts in this example), into the UTM container and run get_deployment_scripts.sh:

```shell
$ export UTM_DEPLOYMENT_SCRIPTS=/tmp/utm_deployment_scripts
$ docker run -v "$UTM_DEPLOYMENT_SCRIPTS:/deployment_scripts" \
    --rm --name utm-deployment-scripts -ti \
    mellanox/ufm-plugin-utm:latest /bin/sh /get_deployment_scripts.sh
```

The scripts folder contains:

examples - Run/stop scripts for both standalone and UFM plugin modes. Each one shows how the actual deployment scripts are used.

hostlist.txt - Specifies the hosts, ports, and HCAs of the TIs to deploy.

scripts - The actual deployment scripts. The entry-point script, deploy_managed_telemetry.sh, runs the other two scripts depending on its input arguments.

```text
$ cd $UTM_DEPLOYMENT_SCRIPTS
$ tree
.
├── examples
│   ├── run_standalone.sh
│   ├── run_with_plugin.sh
│   ├── stop_standalone.sh
│   └── stop_with_plugin.sh
├── hostlist.txt
├── README.md
└── scripts
    ├── deploy_bringup.sh
    ├── deploy_managed_telemetry.sh
    └── deploy_ufm_telemetry.sh
```

Note: Run all example and deployment scripts from the UTM_DEPLOYMENT_SCRIPTS folder.
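A quick pre-flight check can catch running a script from the wrong directory. The following is a minimal sketch; the helper name check_deployment_dir is hypothetical, and the expected file list is taken from the tree listing of the scripts folder.

```shell
#!/bin/sh
# check_deployment_dir  (hypothetical helper)
# Verifies that the current directory has the layout of the UTM deployment
# scripts folder before any example or deployment script is run.
check_deployment_dir() {
  missing=0
  for p in hostlist.txt \
           scripts/deploy_managed_telemetry.sh \
           scripts/deploy_ufm_telemetry.sh \
           scripts/deploy_bringup.sh \
           examples/run_standalone.sh examples/run_with_plugin.sh \
           examples/stop_standalone.sh examples/stop_with_plugin.sh; do
    if [ ! -f "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   cd "$UTM_DEPLOYMENT_SCRIPTS" && check_deployment_dir && sudo ./examples/run_with_plugin.sh
```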





Please note the following:

Configure hostlist.txt before running any script.

The host and port are used for communication; the HCA is used for telemetry collection.

UTM only supports a single fabric for managed TIs, even if different HCAs on the same machine are connected to different fabrics.

Both local and remote hosts are supported for TI deployments.

```text
$ cat hostlist.txt
# List lines in the following format:
# host:port:hca
#
# where:
# - host is IP or hostname. Use localhost or 127.0.0.1 for local deployment
# - port to run telemetry on.
# - hca is the target host device from which telemetry collects. Run `ssh $host ibstat`
# to find the active device on the target host.
localhost:8123:mlx5_0
localhost:8124:mlx5_0
```
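Since hostlist.txt must be set before any script runs, a format check can catch typos early. The following is a minimal sketch; the helper name validate_hostlist is hypothetical. It checks only the documented host:port:hca shape, a numeric port in 1-65535, and that no host:port pair is listed twice (two TIs cannot listen on the same port of one host). It assumes hostnames or IPv4 addresses, because an IPv6 address contains colons.

```shell
#!/bin/sh
# validate_hostlist FILE  (hypothetical helper)
# Checks each entry of hostlist.txt against the documented host:port:hca
# format. Comment and blank lines are skipped. Assumes hostnames or IPv4
# addresses; an IPv6 address contains ':' and would be reported as invalid.
# Prints every bad line and returns 1 if any is found.
validate_hostlist() {
  awk '
    /^[ \t\r]*#/ || /^[ \t\r]*$/ { next }       # comment or blank line
    {
      line = $0; sub(/[ \t\r]+$/, "", line)     # ignore trailing blanks/CR
      n = split(line, f, ":")
      ok = (n == 3 && f[1] != "" && f[3] != "" &&
            f[2] ~ /^[0-9]+$/ && f[2] + 0 >= 1 && f[2] + 0 <= 65535)
      k = f[1] ":" f[2]
      if (!ok) {
        printf "line %d: expected host:port:hca, got: %s\n", NR, $0; bad = 1
      } else if (k in used) {
        printf "line %d: port %s is already used on %s\n", NR, f[2], f[1]; bad = 1
      }
      if (ok) used[k] = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage:
#   validate_hostlist hostlist.txt && sudo ./examples/run_with_plugin.sh
```

The check does not verify that the HCA exists or that all HCAs sit on one fabric; UTM supports only a single fabric for managed TIs, so that still has to be confirmed on the hosts.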

For a setup more customized than the example scripts allow, run ./scripts/deploy_managed_telemetry.sh manually. This main deployment script can deploy multiple TIs and, optionally, UTM as well.

Run deploy_managed_telemetry.sh --help to get help:

```text
$ ./deploy_managed_telemetry.sh --help
./deploy_managed_telemetry.sh options:
mandatory:
    --hostlist-file=        Path to a file that lists hostname:port:hca lines
mandatory run options (use only one at the same time):
    -r, --run               Deploy and run managed telemetry setup
    -s, --stop              Stop all processes and cleanup
mandatory telemetry deployment options (use only one at the same time):
    -t=, --ufmt-image=      UFM telemetry docker image or tgz/tar.gz-image
  or:
    --bringup-package=      Bringup tar.gz package
optional:
    -m=, --utm-image=       UTM docker image or tgz/tar.gz-image. Runs UTM only if it is set.
                            Configures UTM according hostlist file
    --utm-as-plugin=        if UTM runs as a plugin, set this flag
    -d=, --data-root=       Root directory for run data | Default: '/tmp/managed_telemetry/'
    --switch-telem-image=   Switch telemetry image (tar.gz-file or docker image).
                            UTM will be able to deploy it to managed switches if set
    --common-data-dir=      Common data folder for TIs
    -h, --help              Print this message
```
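To make the mutually exclusive option groups from the help output concrete, the following sketch assembles a command line for the docker-image path (-t/--ufmt-image) with an optional UTM image. The helper name deploy_managed and the RUN variable are hypothetical; the bring-up package, plugin, and switch-telemetry options are deliberately left out. By default it only prints the command so it can be reviewed before execution.

```shell
#!/bin/sh
# deploy_managed run|stop HOSTLIST UFMT_IMAGE [UTM_IMAGE]  (hypothetical helper)
# Builds a deploy_managed_telemetry.sh command from the options in --help.
# Prints the command by default; set RUN=1 to execute it from
# $UTM_DEPLOYMENT_SCRIPTS.
deploy_managed() {
  case "$1" in
    run)  action=--run ;;
    stop) action=--stop ;;
    *)    echo "usage: deploy_managed run|stop HOSTLIST UFMT_IMAGE [UTM_IMAGE]" >&2
          return 2 ;;
  esac
  if [ -z "$2" ] || [ -z "$3" ]; then
    echo "deploy_managed: HOSTLIST and UFMT_IMAGE are mandatory" >&2
    return 2
  fi
  hostlist=$2 ufmt_image=$3 utm_image=${4:-}
  # Rebuild the positional parameters as the command to run.
  set -- ./scripts/deploy_managed_telemetry.sh \
    "--hostlist-file=$hostlist" "$action" "--ufmt-image=$ufmt_image"
  if [ -n "$utm_image" ]; then
    set -- "$@" "--utm-image=$utm_image"   # UTM is deployed only if set
  fi
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Usage (<ufm-telemetry-image> is a placeholder):
#   deploy_managed run ./hostlist.txt <ufm-telemetry-image>
#   RUN=1 deploy_managed run ./hostlist.txt <ufm-telemetry-image> mellanox/ufm-plugin-utm:latest
```

Building the command as positional parameters keeps each argument intact even if a path contains spaces.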





To prepare the TI setup in UFM plugin mode using the example scripts:

1. Change directory:

```shell
cd "$UTM_DEPLOYMENT_SCRIPTS"
```

2. Open and configure hostlist.txt.
3. Deploy and run the TIs listed in hostlist.txt and register them for monitoring by UTM:

```shell
sudo ./examples/run_with_plugin.sh
```

4. To stop and clean up the TI setup and unregister the TIs from UTM monitoring:

```shell
sudo ./examples/stop_with_plugin.sh
```

Note: This script does not stop the UTM plugin.

In standalone mode, UTM periodically tracks fabric changes by itself and does not require UFM Enterprise.

To deploy using the example scripts:

1. Change directory:

```shell
cd "$UTM_DEPLOYMENT_SCRIPTS"
```

2. Open and configure hostlist.txt.
3. Deploy and run the TIs listed in hostlist.txt, and run UTM:

```shell
sudo ./examples/run_standalone.sh
```

4. To stop and clean up the TI setup and UTM, run:

```shell
sudo ./examples/stop_standalone.sh
```

This section describes how to deploy UTM and managed TIs manually, covering corner cases where the convenience scripts may not work.

UTM can be started with two docker run commands.

Set the utm_config, utm_data, utm_log, and utm_image variables, then:

1. Initialize the UTM config:

```shell
docker run -v "$utm_config:/config" \
    -v "$utm_data:/data" \
    --rm --name utm-init \
    --device=/dev/infiniband/ \
    "$utm_image" /init.sh
```

2. Run UTM:

```shell
docker run -d --net=host \
    --security-opt seccomp=unconfined --cap-add=SYS_ADMIN \
    --device=/dev/infiniband/ \
    -v "$utm_config:/config" \
    -v "$utm_data:/data" \
    -v "$utm_log:/log" \
    --rm --name utm "$utm_image"
```
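The variables used by the two docker run commands can be set as follows. This is a sketch under assumptions: the directory layout under utm_root is hypothetical and only mirrors the standalone default config folder (/tmp/managed_telemetry/utm/config); any host directories can be used.

```shell
#!/bin/sh
# Example values for the variables used by the UTM docker run commands.
# The paths are assumptions; override utm_root or utm_image as needed.
utm_image=${utm_image:-mellanox/ufm-plugin-utm:latest}
utm_root=${utm_root:-/tmp/managed_telemetry/utm}
utm_config=$utm_root/config
utm_data=$utm_root/data
utm_log=$utm_root/log

# Create the bind-mount sources up front; with "docker run -v", a missing
# host directory is otherwise created by the Docker daemon (typically as root).
mkdir -p "$utm_config" "$utm_data" "$utm_log"
```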

A TI can run either as a UFM Telemetry Docker container or from a UFM Telemetry bring-up package.

To run the Docker container in managed mode, enable the following flags in launch_ibdiagnet_config.ini:

```ini
plugin_env_CLX_EXPORT_API_SHOW_STATISTICS=1
plugin_env_UFM_TELEMETRY_MANAGED_MODE=1
```

To run UFM Telemetry with Distributed Telemetry (DT), enable the DT receiver and specify the HCA it should use:

```ini
plugin_env_CLX_EXPORT_API_RUN_DT_RECEIVER=1
plugin_env_CLX_EXPORT_API_DT_RECEIVER_HCA=$HCA
```
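To choose a value for $HCA (or for the hca field of hostlist.txt), the hostlist comments suggest running ibstat on the target host. The following is a minimal sketch of a filter over that output; the helper name active_hcas is hypothetical, and it assumes ibstat's usual layout of one `CA 'name'` line per adapter followed by indented per-port `State:` lines. If several adapters are active, pick the one on the fabric UTM manages rather than simply the first.

```shell
#!/bin/sh
# active_hcas  (hypothetical helper)
# Reads ibstat output on stdin and prints each CA that has at least one
# port in "State: Active".
active_hcas() {
  awk -v q="'" '
    index($0, "CA " q) == 1 {          # a new adapter block starts
      ca = $2; gsub(q, "", ca); printed = 0
    }
    /^[ \t]+State:[ \t]*Active/ && ca != "" && !printed {
      print ca; printed = 1
    }
  '
}

# Usage:
#   HCA=$(ibstat | active_hcas | head -n 1)     # local host
#   ssh "$host" ibstat | active_hcas            # remote TI host
```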

To run the bring-up package in managed mode, create an enable_managed.ini file with the same flags and pass it through the custom_config option of collection_start:

```shell
collection_start custom_config=./enable_managed.ini
```
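The enable_managed.ini file can be generated rather than written by hand. The following is a minimal sketch; the helper name write_enable_managed is hypothetical. It assumes the file takes the same plain `key=value` lines shown for launch_ibdiagnet_config.ini. Adding the DT receiver flags to the bring-up file is an assumption as well, so they are written only when an HCA is passed.

```shell
#!/bin/sh
# write_enable_managed FILE [HCA]  (hypothetical helper)
# Writes the managed-mode flags to FILE. If HCA is given, the Distributed
# Telemetry receiver flags are added too (an assumption for bring-up).
write_enable_managed() {
  out=$1 hca=${2:-}
  {
    echo "plugin_env_CLX_EXPORT_API_SHOW_STATISTICS=1"
    echo "plugin_env_UFM_TELEMETRY_MANAGED_MODE=1"
    if [ -n "$hca" ]; then
      echo "plugin_env_CLX_EXPORT_API_RUN_DT_RECEIVER=1"
      echo "plugin_env_CLX_EXPORT_API_DT_RECEIVER_HCA=$hca"
    fi
  } > "$out"
}

# Usage:
#   write_enable_managed ./enable_managed.ini
#   collection_start custom_config=./enable_managed.ini
```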

The UTM configuration file, utm_config.ini, is located in the configuration folder, referred to as UTM_CONFIG in the rest of this document.

In UFM plugin mode, UTM_CONFIG is /opt/ufm/files/conf/plugins/utm/.

In standalone mode, UTM_CONFIG defaults to /tmp/managed_telemetry/utm/config and can be changed with the --data-root argument of the deployment script.
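Scripts that edit utm_config.ini need to resolve UTM_CONFIG for the current mode. The following is a minimal sketch; the helper name utm_config_dir is hypothetical. For standalone mode it assumes the config folder is always `<data-root>/utm/config`, which matches the documented default (/tmp/managed_telemetry/ plus utm/config) but is not confirmed for custom --data-root values.

```shell
#!/bin/sh
# utm_config_dir plugin|standalone [DATA_ROOT]  (hypothetical helper)
# Prints the UTM_CONFIG folder for the given mode. The standalone layout
# (<data-root>/utm/config) is inferred from the documented default.
utm_config_dir() {
  case "$1" in
    plugin)     echo /opt/ufm/files/conf/plugins/utm ;;
    standalone) root=${2:-/tmp/managed_telemetry}
                echo "${root%/}/utm/config" ;;
    *)          echo "usage: utm_config_dir plugin|standalone [DATA_ROOT]" >&2
                return 2 ;;
  esac
}

# Usage:
#   UTM_CONFIG=$(utm_config_dir plugin)
#   ls "$UTM_CONFIG/utm_config.ini"
```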

When the configuration file changes, UTM restarts its main process to apply the new configuration.

Users may want to adjust the timeout and update-rate settings for their specific setups. The remaining settings are tailored to let UTM function as a UFM plugin and should not be modified.

To enable Distributed Telemetry, set dt_enable=1 in the corresponding section.

Note: Distributed Telemetry requires the Switch Telemetry Docker image, tagged as switch-telemetry:{version} and placed under $UTM_CONFIG/telem_files/ as switch-telemetry_{version}.tar.gz. UTM scans for this file when it starts. The example deployment scripts handle this for both UFM plugin and standalone modes.
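When staging the image manually instead of through the example scripts, the required file name can be produced with `docker save` and gzip. The following is a minimal sketch; the helper names switch_telem_archive and stage_switch_telemetry are hypothetical, and it assumes the switch-telemetry:{version} image is already present in the local Docker image store. Because UTM scans for the file when it starts, stage it before starting UTM.

```shell
#!/bin/sh
# switch_telem_archive UTM_CONFIG VERSION  (hypothetical helper)
# Prints the archive path UTM expects for switch-telemetry:VERSION.
switch_telem_archive() {
  echo "${1%/}/telem_files/switch-telemetry_$2.tar.gz"
}

# stage_switch_telemetry UTM_CONFIG VERSION  (hypothetical helper)
# Saves the local switch-telemetry:VERSION image as a gzipped tarball there.
stage_switch_telemetry() {
  archive=$(switch_telem_archive "$1" "$2")
  # Fail early if the tagged image is not present locally.
  docker image inspect "switch-telemetry:$2" > /dev/null || return 1
  mkdir -p "$(dirname "$archive")" || return 1
  # docker save streams the image as a tar archive; gzip writes it compressed.
  docker save "switch-telemetry:$2" | gzip > "$archive"
}

# Usage ({version} stays a placeholder):
#   stage_switch_telemetry /opt/ufm/files/conf/plugins/utm <version>
```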

For more details, refer to NVIDIA UFM Telemetry Documentation → Distributed Telemetry - Switch Telemetry Agent.