NVIDIA UFM Enterprise Quick Start Guide v6.10.0

Installing UFM Server Software

The default UFM® installation directory is /opt/ufm.

UFM Server installation options are standalone, High Availability (HA), and Docker container. Each is described below.

The following processes might be interrupted during the installation process:

  • httpd (apache2 in Ubuntu)

  • dhcpd

Warning

To install UFM over static IPv4 configuration (instead of DHCP) please refer to "Appendix – Configuring UFM Over Static IPv4 Address" before installation.

After installation:

  1. Activate the software license

  2. Perform initial configuration

Warning

Before you run UFM, ensure that all ports used by the UFM server for internal and external communication are open and available. For the list of ports, see Used Ports in the UFM User Manual.

Verify that a supported version of Linux is installed on your machine. For details, see UFM System Requirements.

The following table lists the packages that must be installed on your machine (according to system OS) before you install the UFM server software.

RedHat 7              RedHat 8              Ubuntu 18.04                       Ubuntu 20.04
psmisc                psmisc                sqlite3                            sqlite3
bc                    bc                    apache2                            apache2
httpd                 httpd                 ssl-cert                           ssl-cert
mariadb               mariadb               libcurl4                           libcurl4
mariadb-server        mariadb-server        snmpd                              snmpd
mariadb-devel         php                   cron                               cron
php                   net-snmp              logrotate                          logrotate
MySQL-python          net-snmp-libs         chrpath                            chrpath
net-snmp              net-snmp-utils        bc                                 bc
net-snmp-libs         mod_ssl               acl                                acl
net-snmp-utils        iptables              supervisor                         supervisor
mod_ssl               libnsl                python3-virtualenv                 python3.9
iptables              telnet                python3-venv                       python3.9-venv
pexpect               libxml2               gawk                               gawk
telnet                libxslt               sshpass                            sshpass
pyOpenSSL             unixODBC              lftp                               lftp
libxml2               infiniband-diags      libibumad3 (>= 52mlnx1-1.53105)    libibumad3 (>= 52mlnx1-1.53105)
libxslt               sudo                  libibverbs1 (>= 52mlnx1-1.53105)   libibverbs1 (>= 52mlnx1-1.53105)
unixODBC              gnutls                librdmacm1 (>= 28.0)               librdmacm1 (>= 28.0)
infiniband-diags      qperf                 rdma-core (>= 28.0)                rdma-core (>= 28.0)
cairo                 sqlite
sudo                  mod_session
gnutls                apr-util-openssl
qperf                 net-tools
sqlite                jansson
mod_session           libmemcached
apr-util-openssl      python3-virtualenv
net-tools             supervisor
supervisor            sshpass
sshpass               lftp
lftp                  librdmacm (>= 28.0)
librdmacm (>= 28.0)   rdma-core (>= 28.0)
rdma-core (>= 28.0)   mft
mft                   python3-pip
pip3
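
The package prerequisites above can be checked before running the installer. The following is a minimal sketch, not part of the UFM tooling: it assumes dpkg-query is available on Ubuntu and rpm on RedHat, checks package names only (version constraints such as ">= 28.0" are not verified), and the PKG_QUERY variable is a hypothetical hook for substituting a different query command.

```shell
# Sketch: report which of the given packages are not installed.
# PKG_QUERY is a hypothetical override (a command that takes a package
# name and returns 0 if it is installed); it is not a UFM setting.

pkg_installed() {
    # $1 = package name
    if [ -n "${PKG_QUERY:-}" ]; then
        $PKG_QUERY "$1" >/dev/null 2>&1
    elif command -v dpkg-query >/dev/null 2>&1; then
        # Debian/Ubuntu: require the fully installed state
        dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
    else
        # RedHat/CentOS: rpm -q returns non-zero for packages that are not installed
        rpm -q "$1" >/dev/null 2>&1
    fi
}

missing_packages() {
    # Prints one missing package per line; prints nothing if all are present.
    local pkg
    for pkg in "$@"; do
        pkg_installed "$pkg" || printf '%s\n' "$pkg"
    done
}

# Example call (package names taken from the list for your OS):
#   missing_packages psmisc bc httpd mariadb mariadb-server
```

An empty result means every listed name was found. Entries that are commands rather than package names on your distribution may need to be checked separately.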

In addition, ensure the following before you begin installation:

  • The computer hostname does not resolve to 127.0.0.1, and localhost does resolve to 127.0.0.1.

  • The hostname must NOT appear on the loopback address line. An example of the loopback address is: 127.0.0.1 localhost.localdomain localhost.

  • Disable the firewall service (/etc/init.d/iptables stop), or ensure that the required ports are open (see the prerequisite script).

  • SELinux is disabled.

  • If more than one fabric is managed by different UFM instances, set up different management network spaces for each fabric (not the same LAN).

  • Uninstall any previously installed Subnet Manager from the UFM server machine.

  • MLNX_OFED 1.5.x (NVIDIA or community MLNX_OFED) is installed before you install UFM, and the ib0 and/or ib1 interface is up and running.

  • The default MLNX_OFED installation includes opensm. Remove the MLNX_OFED opensm before installing UFM by running rpm -e (RedHat) or apt purge (Ubuntu), for example:

    RedHat:

    rpm -e opensm-3.3.9.MLNX_20111006_e52d5fc-0.1

    Ubuntu:

    apt purge opensm

    By default, ib0 and eth0 are configured as the primary access points for UFM management. If you use different management and/or InfiniBand interfaces (including bond interfaces) as the primary access points, modify the configuration file by running the script /opt/ufm/scripts/change_fabric_config.sh, as described in the section Configuring General Settings in gv.cfg.

    Change the UFM Agent interface to the Ethernet and/or IPoIB interfaces used for communication with UFM Agent:

    ufma_interfaces = ib0,eth0

  • An IP address is defined for the local Ethernet interface and for the InfiniBand interface (/etc/sysconfig/network-scripts/ifcfg-eth0 and ifcfg-ib0 and/or ifcfg-ib1).

  • Reliable and high-capacity out-of-band IP connectivity between the UFM Primary and Secondary servers (1 Gb Ethernet is recommended). This connectivity is used for DRBD synchronization.

  • Format two identical servers with dedicated disk partitions for UFM replication. Since the UFM configuration file is replicated to the standby server, both master and standby servers must have the same interfaces.

  • Allocate exactly the same size partition on both servers (master and slave) for the replicated data. See UFM Server Requirements for the recommended partition size.
    Partitions should not be mounted and must be zeroed (the file system should not be installed on the partitions). For disk partitioning, see the Linux user manual (man fdisk).

  • We recommend establishing passwordless SSH (via the /root/.ssh/authorized_keys file) between the two servers before the installation.

  • In fabrics consisting of multiple tiers of switches, it is recommended that the management ports (ib0) of the primary and secondary UFM server be connected to different fabric switches on the same tier (the outermost edge in CLOS 5 designs).

    This is because by default, UFM manages the IB fabric via ib0, port 1 of the HCA. Failure or disconnect of ib0, the IB management port, causes a failure condition in UFM resulting in HA failover.
    When the management ports (ib0) of the primary and secondary UFM server are connected to the same switch, a failure of this switch will result in a disconnect of both UFMs from the fabric, and therefore UFM will not be able to manage the fabric.
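
The hostname prerequisite in the list above (the hostname must not appear on the loopback address line) can be verified with a small check. This is a sketch under the assumption that name resolution comes from a standard hosts file; the function name hostname_on_loopback is illustrative and not part of UFM.

```shell
# Sketch: succeed (return 0) if the hostname appears on a loopback line
# (127.x.x.x or ::1) of a hosts file, which the prerequisites forbid.
hostname_on_loopback() {
    # $1 = hostname to look for, $2 = hosts file (defaults to /etc/hosts)
    awk -v host="$1" '
        $1 ~ /^#/ { next }                    # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                if ($i == host) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Example call:
#   if hostname_on_loopback "$(hostname)"; then
#       echo "hostname is mapped to the loopback address; fix /etc/hosts" >&2
#   fi
```

The check looks only at the hosts file. If the server resolves names through DNS or another source, verify those separately.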

Warning

Subnet Manager runs over the native InfiniBand layer; therefore, bonding the IPoIB interfaces does not provide high availability. For additional information, refer to the section UFM Failover to Another Port.

The UFM installation includes the InfiniBand Performance Management module (IBPM). This module reports performance information back to UFM and to upper-layer applications. When available, this process is offloaded to the non-management port (default ib1) of the UFM server. Failure or disconnect of the non-management port (ib1) on the primary UFM server does not cause UFM to fail over. By default, the UFM Health Monitoring process is configured to try to restart the IBPM. For more information, see UFM Health Configuration in the UFM User Manual.

To install the UFM server software as a standalone for InfiniBand:

  1. Create a temporary directory (for example /tmp/ufm).

  2. Open the UFM software zip file that you downloaded. The zip file contains the following installation files:

    • RedHat 7/CentOS 7/OEL 7: ufm-X.X-XXX.el7.x86_64.tgz

    • RedHat 8/CentOS 8: ufm-X.X-XXX.el8.x86_64.tgz

    • Ubuntu 18.04: ufm-X.X-XXX.Ubuntu18.x86_64.tgz

    • Ubuntu 20.04: ufm-X.X-XXX.Ubuntu20.x86_64.tgz

  3. Extract the installation file for your system's OS to the temporary directory that you created.

  4. From within the temporary directory, run the following command as root:

    ./install.sh

    Warning

    Running with the option "-o ib" is no longer required.

    For a "quiet" (automatic) installation, add the -q flag, which answers yes to each question the installer asks.

The UFM software is installed. You can now remove the temporary directory.
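
The standalone steps above can be scripted roughly as follows. This is a sketch under stated assumptions: the function name install_ufm_standalone and the DRY_RUN variable are hypothetical helpers, the script locates install.sh with find because the exact layout inside the .tgz is not specified here, and the real installation must be run as root.

```shell
# Sketch of steps 1-4: create a temporary directory, extract the
# installation file, and run the installer in quiet mode (-q).
install_ufm_standalone() {
    # $1 = installation .tgz for your OS, $2 = temporary directory
    local tgz="$1" workdir="${2:-/tmp/ufm}" installer
    mkdir -p "$workdir" || return 1
    tar -xzf "$tgz" -C "$workdir" || return 1
    # Assumption: install.sh sits somewhere under the extracted tree.
    installer="$(find "$workdir" -name install.sh -type f | head -n 1)"
    if [ -z "$installer" ]; then
        echo "install.sh not found under $workdir" >&2
        return 1
    fi
    # With DRY_RUN set (hypothetical switch), print the command instead of running it.
    ( cd "$(dirname "$installer")" && ${DRY_RUN:+echo} ./install.sh -q )
}

# Example call (file name is a placeholder for the file you downloaded):
#   install_ufm_standalone /path/to/ufm-X.X-XXX.el8.x86_64.tgz /tmp/ufm
```

Drop the -q flag in the last line of the function if you prefer to answer the installer's questions interactively.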

UFM can be installed in HA mode using an additional package for HA called UFM-HA.

Warning

UFM HA package requires a dedicated partition with the same name for DRBD on both servers. This guide uses /dev/sda5 as an example.

Warning

In the UFM Enterprise appliance, the UFM HA package and related components (i.e. pacemaker and DRBD) are already deployed. Therefore, follow the instructions below starting from step 6 (Configure HA from the main server).

  1. On both servers, install UFM Enterprise in standalone (SA) mode.

    Warning

    Do not start UFM service.

  2. Install the latest pcs and drbd-utils packages on both servers.

    For Ubuntu:

    apt install pcs drbd-utils

    For CentOS/Red Hat:

    yum install pcs drbd84-utils kmod-drbd84

  3. Download the latest UFM-HA package from this link.

  4. Extract the downloaded UFM-HA package on both servers under /tmp/.

  5. Go to the extracted directory /tmp/ufm_ha_XXX and run the installation script:

    ./install.sh -f /opt/ufm/files/ -p /dev/sda5

    Option    Description
    -f        UFM Enterprise files directory
    -p        Partition name for DRBD

  6. Configure HA from the main server using the following command:

    configure_ha_nodes.sh --cluster-password 123456 --main-hostname ufm-host01 --main-ip 192.168.10.1 --main-sync-interface enp2s0f0 --standby-hostname ufm-host02 --standby-ip 192.168.10.2 --standby-sync-interface enp2s0f0 --virtual-ip 192.168.10.5

    Warning

    configure_ha_nodes.sh requires an SSH connection to the standby server. If passwordless SSH is not configured, you are prompted to enter the password while the configuration runs.

    Option                      Description
    --cluster-password          UFM HA cluster password for authentication by pacemaker
    --main-hostname             Master (main) server hostname
    --main-ip                   Master (main) server IP address
    --main-sync-interface       Port name (interface) on master (main) server that will be used in DRBD sync
    --standby-hostname          Standby server hostname
    --standby-ip                Standby server IP address
    --standby-sync-interface    Port name (interface) on standby server that will be used in DRBD sync
    --virtual-ip                UFM HA cluster Virtual IP

  7. After configuration, wait for the DRBD sync to finish. The time this takes depends on the size of your partition.

  8. To start UFM HA cluster:

    ufm_ha_cluster start

  9. To check UFM HA cluster status:

    ufm_ha_cluster status
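
Step 7 above asks you to wait until the DRBD sync has finished before starting the cluster. One way to check this from a script is sketched below. It assumes DRBD 8.4 (the drbd84 packages), which reports per-resource disk states in /proc/drbd as a "ds:<local>/<peer>" field. Newer DRBD versions report status differently, so treat this as illustrative only. The function name drbd_in_sync is hypothetical.

```shell
# Sketch: succeed (return 0) only if every DRBD resource listed in the
# status file reports ds:UpToDate/UpToDate, i.e. the sync has finished.
drbd_in_sync() {
    # $1 = status file (defaults to /proc/drbd, as provided by DRBD 8.4)
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ds:/) {
                    n++                                   # count resources seen
                    if ($i != "ds:UpToDate/UpToDate") bad++
                }
            }
        }
        END { exit ((n > 0 && bad == 0) ? 0 : 1) }
    ' "${1:-/proc/drbd}"
}

# Example: poll every 30 seconds until the sync completes.
#   until drbd_in_sync; do sleep 30; done
```

The function fails when no resource is found at all, so an unconfigured DRBD is not mistaken for a completed sync.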

To stop UFM HA cluster:

ufm_ha_cluster stop

To uninstall UFM HA, first, stop the cluster and then run the ufm_ha uninstallation script as follows:

/opt/ufm/ufm_ha/uninstall_ha.sh

UFM can be deployed as a Docker container. For further information, refer to the UFM Enterprise Docker Installation User Manual.

  1. Before starting the UFM software, copy your license file(s) downloaded from Mellanox’s Licensing and Downloading Site (volt-ufm-<serial-number>.lic) to the master server under the /opt/ufm/files/licenses directory. We recommend that you back up the license file(s).
    In High Availability mode, the license files are replicated to the standby machine automatically. Your software is now activated.

  2. Run the UFM software as described in the following sections.
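
The license step above (copy the .lic files and keep a backup) can be sketched as a small helper. The function name install_ufm_licenses and the default backup location are assumptions for illustration; only the /opt/ufm/files/licenses directory comes from this guide.

```shell
# Sketch: back up the downloaded license files, then copy them into the
# UFM license directory. Fails if no .lic file is found.
install_ufm_licenses() {
    # $1 = directory holding the downloaded .lic files
    # $2 = UFM license directory (default taken from this guide)
    # $3 = backup directory (hypothetical default; choose your own)
    local src="$1"
    local dest="${2:-/opt/ufm/files/licenses}"
    local backup="${3:-$HOME/ufm-license-backup}"
    local f found=0
    if [ ! -d "$dest" ]; then
        echo "license directory $dest does not exist; is UFM installed?" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    for f in "$src"/*.lic; do
        [ -e "$f" ] || continue               # the glob matched nothing
        cp -p "$f" "$backup/" || return 1     # keep a backup copy first
        cp -p "$f" "$dest/" || return 1
        found=1
    done
    [ "$found" -eq 1 ]
}

# Example call on the master server:
#   install_ufm_licenses /root/downloads
```

The backup copy is written first, so a failure while copying into the UFM directory still leaves the backup in place.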

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.