Date: 2023-09-05

Deploying UFM Telemetry can be done in the following three modes:

NVIDIA® UFM® Telemetry can be obtained as a tarball for installation on a Linux machine with all prerequisites installed.

To deploy the UFM Telemetry:

1. Ensure the following prerequisites are installed:
   - Python3
   - Python3-venv
   - Supervisor
2. Copy the tarball package to the target location.
3. Extract the package.

       tar -xf collectx-<version>.tar.gz

4. Initialize and configure.

       ./bin/initialize_telemetry.sh --telemetry-dir /tmp/ufm_telemetry \
           --config "hca=mlx5_0;sample_rate=300;data_dir=/tmp/clx_data;plugin_env_CLX_FILE_WRITE_ENABLED=1"

   Warning: This collects port counter data every 5 minutes, uses HCA mlx5_0, and writes data to /tmp/clx_data.
5. Start data collection.

       supervisord --config /tmp/ufm_telemetry/conf/supervisord.conf
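The --config value in the initialization step is a semicolon-separated list of key=value pairs. The following is a minimal sketch of how such a string could be assembled. It assumes that sample_rate is expressed in seconds, which is inferred from the example (300 for 5 minutes). The function name build_telemetry_config is hypothetical and is not part of the UFM Telemetry package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UFM Telemetry): builds the
# semicolon-separated --config string from an HCA name, a sampling
# interval in minutes, and a data directory.
build_telemetry_config() {
    hca="$1"
    interval_min="$2"
    data_dir="$3"
    # Assumption: sample_rate is in seconds (300 corresponds to 5 minutes).
    sample_rate=$((interval_min * 60))
    printf 'hca=%s;sample_rate=%s;data_dir=%s;plugin_env_CLX_FILE_WRITE_ENABLED=1\n' \
        "$hca" "$sample_rate" "$data_dir"
}

# Print the string for the documented example: mlx5_0, 5 minutes, /tmp/clx_data.
build_telemetry_config mlx5_0 5 /tmp/clx_data
```

The printed string can then be passed, quoted, as the argument of --config.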

1. Ensure the following prerequisites are installed:
   - Python3
   - Python3-venv
   - Supervisor
2. Copy the tarball package to the target location.
3. Extract the package.

       tar -xf collectx-<version>.tar.gz

4. Initialize and configure.

       ./bin/initialize_telemetry.sh --telemetry-dir /tmp/ufm_telemetry \
           --config "hca=mlx5_0;sample_rate=300;data_dir=/tmp/clx_data;plugin_env_CLX_FILE_WRITE_ENABLED=1" \
           --gen_systemd_service

   Warning: This collects port counter data every 5 minutes, uses HCA mlx5_0, and writes data to /tmp/clx_data.
5. Download the UFM-HA package on both servers from this link.
6. Extract the HA package to /tmp/ and, from there, run the installation command on both servers as follows:

   Warning: In the command below, the partition name ("disk") is assumed to be /dev/sda4.

       ./install -l /opt/ufm-telemetry/ -d /dev/sda4 -p telemetry

7. Run the UFM-HA configuration command ONLY on the master server, as follows:

       configure_ha_nodes.sh \
           --cluster-password 12345678 \
           --master-ip 192.168.10.1 \
           --standby-ip 192.168.10.2 \
           --virtual-ip 192.168.10.5

   Warning: The cluster-password must be at least 8 characters long.

   Warning: Replace the values in the above command with your servers' information.
8. Start the UFM Telemetry HA cluster. Run:

       ufm_ha_cluster start
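Because the configuration command requires a cluster password of at least 8 characters and three IP addresses, it can help to check the values before running it. The following is a minimal sketch of such a pre-flight check. The functions valid_cluster_password and valid_ipv4 are hypothetical helpers, not part of the UFM-HA package, and the IPv4 check is only a syntactic one.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; not part of the UFM-HA package.

# Succeeds if the password meets the documented minimum of 8 characters.
valid_cluster_password() {
    [ "${#1}" -ge 8 ]
}

# Succeeds if the argument is a dotted-quad IPv4 address with no spaces
# and every octet in the range 0-255.
valid_ipv4() {
    case "$1" in
        *[!0-9.]*|''|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into positional parameters
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example with the placeholder values from the procedure above.
if valid_cluster_password 12345678 && valid_ipv4 192.168.10.1; then
    echo "arguments look valid"
fi
```

An address written with stray spaces, such as "192.168 . 10.1", is rejected by this check.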

To check the status of your UFM Telemetry HA cluster, run:

    ufm_ha_cluster status

To perform failover, run:

    ufm_ha_cluster failover

To perform takeover, run:

    ufm_ha_cluster takeover

NVIDIA UFM Telemetry is packaged as a Docker image that should be loaded and deployed on a Linux machine with Docker installed. This section describes how to deploy the UFM Telemetry Docker image on a Linux machine.

To deploy UFM Telemetry, perform the following steps:

1. Make sure that Docker is installed on the Linux machine.

       [root@r-ufm ~]# docker --version

2. Start the Docker service.

       [root@r-ufm ~]# sudo service docker start

3. Pull the image.

       [root@r-ufm ~]# export image=mellanox/ufm-telemetry:<version>
       [root@r-ufm ~]# sudo docker pull $image

4. Create the default .ini files, place them in the local directory mapped to /config in the container, and initialize the container configuration.

       [root@r-ufm ~]# sudo docker run -v /opt/ufm-telemetry/conf:/config --rm -d $image /get_collectx_configs.sh "sample_rate=300;hca=mlx5_0;cable_info_schedule=1/00:00,3/00:00,5/00:00"

   Warning: This collects port counter data every 5 minutes and uses HCA mlx5_0. It also collects cable info on the 1st, 3rd, and 5th day of the week at midnight, where:
   - sample_rate: Frequency of collecting port counters
   - hca: Card to use
   - cable_info_schedule: Time of collecting cable info data (optional)
5. Create a container of UFM Telemetry.

       [root@r-ufm ~]# sudo docker run --net=host --uts=host --ipc=host \
           --ulimit stack=67108864 --ulimit memlock=-1 \
           --security-opt seccomp=unconfined --cap-add=SYS_ADMIN \
           --device=/dev/infiniband/ \
           -v "/opt/ufm-telemetry/conf:/config" \
           -v "/tmp/data:/data" \
           -v "/opt/ufm/files/licenses:/opt/ufm/files/licenses/" \
           --rm --name ufm-telemetry -d $image

6. Verify that UFM Telemetry is running.
   a. Make sure the UFM Telemetry container is up.

          [root@r-ufm ~]# docker ps

   b. If the container name exists, access the shell of the container.

          [root@r-ufm ~]# docker exec -it ufm-telemetry bash

   c. Review your configurations under /config/launch_ibdiagnet_config.ini. View the UFM Telemetry configuration files.

          [root@r-ufm ~]# ls -l /config/
          -rw-r--r-- 1 3478 101  396 Apr 15 21:04 clx_config.ini
          -rw-r--r-- 1 3478 101 2987 Apr 15 21:04 collectx.ini
          -rw-r--r-- 1 3478 101 4257 Apr 15 21:04 launch_ibdiagnet_config.ini
          -rw-r--r-- 1 3478 101 1912 Apr 16 12:03 supervisord.conf

   d. To watch and review the execution of the various components, check the log files under /var/log. Each component has a dedicated log file. Running "ls -l" displays all files under the folder. The following output shows only the relevant log files (other files have been omitted).

          [root@r-ufm ~]# ls -l /var/log
          -rw-r--r-- 1 root root 128393 Apr 3 10:49 launch_cableinfo.log
          -rw-r--r-- 1 root root    467 Apr 3 09:35 launch_compression.log
          -rw-r--r-- 1 root root 194566 Apr 3 10:49 launch_ibdiagnet.log
          -rw-r--r-- 1 root root    798 Apr 3 09:35 launch_retention.log
          -rw-r--r-- 1 root root   1729 Apr 3 09:56 supervisord.log

   e. To exit the UFM Telemetry Docker context, run "exit" to return to the Linux machine context.
7. To access the UFM Telemetry CLI, run the following command on the Linux machine:

       [root@r-ufm ~]# docker exec -it ufm-telemetry clxcli

For settings and configuration instructions, see Settings and Configuration.
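The cable_info_schedule value used when creating the .ini files is a comma-separated list of day/time entries, for example 1/00:00 for the 1st day of the week at midnight. The following is a minimal sketch that expands such a value into readable lines, which can be useful for reviewing a schedule before passing it to the container. The function describe_cable_schedule is a hypothetical helper and the day/HH:MM reading of each entry is inferred from the example above.

```shell
#!/bin/sh
# Hypothetical helper: expands a cable_info_schedule value such as
# "1/00:00,3/00:00,5/00:00" into one "day <n> at <HH:MM>" line per entry.
describe_cable_schedule() {
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the schedule on commas
    IFS=$old_ifs
    for entry in "$@"; do
        day=${entry%%/*}       # text before the slash: day of the week
        at_time=${entry#*/}    # text after the slash: time of day
        printf 'day %s at %s\n' "$day" "$at_time"
    done
}

describe_cable_schedule "1/00:00,3/00:00,5/00:00"
```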

Requirements:

An important requirement for the HA solution is a dedicated partition for DRBD to work with, for example /dev/sda4.

Install pcs and drbd-utils on both servers (using “yum” or “apt-get install”, based on your OS).

Warning: On RH/CentOS, run “yum install pcs drbd84-utils kmod-drbd84”.
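The package names differ between the two package managers mentioned above. The following is a minimal sketch that prints the matching install command for whichever package manager is found on the server. The function ha_prereq_install_cmd is a hypothetical helper, and the sketch only prints the command rather than running it, since installation usually requires root privileges.

```shell
#!/bin/sh
# Hypothetical helper: prints the install command for the HA prerequisites
# given the package manager name ("yum" or "apt-get").
ha_prereq_install_cmd() {
    case "$1" in
        yum)     echo "yum install pcs drbd84-utils kmod-drbd84" ;;
        apt-get) echo "apt-get install pcs drbd-utils" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Pick whichever package manager is present on this server.
if command -v yum >/dev/null 2>&1; then
    ha_prereq_install_cmd yum
elif command -v apt-get >/dev/null 2>&1; then
    ha_prereq_install_cmd apt-get
fi
```

To confirm that the dedicated partition exists as a block device, a check such as `[ -b /dev/sda4 ]` can be run on each server, with /dev/sda4 replaced by the actual partition.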

Procedure:

1. Load (pull) the latest UFM Telemetry Docker image on both servers.

       docker pull mellanox/ufm-telemetry:latest

2. Run the Telemetry configuration command on both servers.

       docker run --rm -i --name=config-telemetry \
           -v /opt/ufm-telemetry/conf:/config \
           -v /etc/systemd/system:/etc/systemd/system \
           -v /var/run/docker.sock:/var/run/docker.sock \
           mellanox/ufm-telemetry:latest \
           /get_collectx_configs.sh \
           --gen_service \
           --config=ufm_telemetry

3. Refresh systemd on both servers:

       systemctl daemon-reload

4. Create the /opt/ufm-telemetry/licenses/ directory on the master server and copy the UFM Telemetry license file there.
5. Download the UFM-HA package on both servers from this link.
6. Extract the HA package to /tmp/ and, from there, run the installation command on both servers as follows:

   Warning: In the command below, the partition name ("disk") is assumed to be /dev/sda4.

       ./install -l /opt/ufm-telemetry/ -d /dev/sda4 -p telemetry

7. Run the UFM-HA configuration command ONLY on the master server, as follows:

       configure_ha_nodes.sh \
           --cluster-password 12345678 \
           --master-ip 192.168.10.1 \
           --standby-ip 192.168.10.2 \
           --virtual-ip 192.168.10.5

   Warning: The cluster-password must be at least 8 characters long.

   Warning: Replace the values in the above command with your servers' information.
8. Start the UFM Telemetry HA cluster. Run:

       ufm_ha_cluster start
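The license step in the procedure above (creating the licenses directory and copying the license file into it) could be scripted roughly as follows. This is a sketch only: stage_license is a hypothetical helper, and the license file name in the example is a placeholder because the procedure does not name the file.

```shell
#!/bin/sh
# Hypothetical helper: creates the licenses directory if needed and copies
# a license file into it. Both paths are supplied by the caller.
stage_license() {
    license_file="$1"
    licenses_dir="$2"
    if [ ! -f "$license_file" ]; then
        echo "license file not found: $license_file" >&2
        return 1
    fi
    mkdir -p "$licenses_dir" && cp "$license_file" "$licenses_dir/"
}

# Example, to be run on the master server only (placeholder file name):
# stage_license ./ufm-telemetry-license.lic /opt/ufm-telemetry/licenses
```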

