NVIDIA UFM Enterprise Quick Start Guide v6.11.1

Installing UFM Server Software

The default UFM® installation directory is /opt/ufm.

UFM Server installation options are:

  • Standalone (SA)

  • High availability (HA)

  • Docker container

The following processes might be interrupted during the installation process:

  • httpd (apache2 in Ubuntu)

  • dhcpd

Warning

To install UFM over static IPv4 configuration (instead of DHCP) please refer to "Appendix – Configuring UFM Over Static IPv4 Address" before installation.

After installation:

  1. Activate the software license

  2. Perform initial configuration

Warning

Before you run UFM, ensure that all ports used by the UFM server for internal and external communication are open and available. For the list of ports, see Used Ports in the UFM User Manual.

Verify that a supported version of Linux is installed on your machine. For details, see UFM System Requirements.
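
To confirm which distribution and release a machine is running, the os-release file can be read directly. The following is a minimal sketch, assuming the standard /etc/os-release file with its ID and VERSION_ID fields is present. The function name detect_os is illustrative and is not part of UFM.

```shell
#!/usr/bin/env bash
# Print "<id> <version>" (for example "ubuntu 22.04") from an os-release file.
# The path defaults to /etc/os-release; pass another path to inspect a copy.
detect_os() {
    local file="${1:-/etc/os-release}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$file"
        echo "${ID:-unknown} ${VERSION_ID:-unknown}"
    )
}

# Usage: detect_os
```

The printed pair can then be compared against the supported versions in UFM System Requirements.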

The following table lists the packages that must be installed on your machine (according to system OS) before you install the UFM server software.

RedHat 7          RedHat 8            Ubuntu 18.04     Ubuntu 20.04     Ubuntu 22.04
----------------  ------------------  ---------------  ---------------  ---------------
acl               acl                 acl              acl              acl
apr-util-openssl  apr-util-openssl    apache2          apache2          apache2
bc                bc                  bc               bc               bc
cairo             conda >= 4.12.0     chrpath          chrpath          chrpath
conda >= 4.12.0   gnutls              conda >= 4.12.0  conda >= 4.12.0  conda >= 4.12.0
gnutls            httpd               cron             cron             cron
httpd             infiniband-diags    gawk             gawk             gawk
infiniband-diags  iptables            lftp             lftp             lftp
iptables          jansson             libcurl4         libcurl4         libcurl4
lftp              lftp                logrotate        logrotate        logrotate
libxml2           libmemcached        python3          python3          python3
libxslt           libnsl              python3-pip      python3-pip      python3-pip
mariadb           libxml2             python3-venv     python3-venv     python3.10-venv
mariadb-devel     libxslt             rsync            rsync            rsync
mariadb-server    mariadb             snmpd            snmpd            snmpd
mod_session       mariadb-server      sqlite3          sqlite3          sqlite3
mod_ssl           mod_session         sshpass          sshpass          sshpass
MySQL-python      mod_ssl             ssl-cert         ssl-cert         ssl-cert
net-snmp          net-snmp            sudo             sudo             sudo
net-snmp-libs     net-snmp-libs       supervisor       supervisor       supervisor
net-snmp-utils    net-snmp-utils      zip              zip              zip
net-tools         net-tools
pexpect           php
php               psmisc
pip3              python3.6
psmisc            python3-pip
pyOpenSSL         python3-virtualenv
python3           qperf
qperf             rsync
rsync             sqlite
sqlite            sshpass
sshpass           sudo
sudo              supervisor
supervisor        telnet
telnet            unixODBC
unixODBC          zip

In addition, ensure the following before you begin installation:

  • The computer hostname is not mapped to 127.0.0.1, and localhost is mapped to 127.0.0.1.

  • The hostname must NOT appear on the loopback address line. An example of a valid loopback address line is: 127.0.0.1 localhost.localdomain localhost

  • Disable the firewall service (/etc/init.d/iptables stop), or ensure that the required ports are open (see the prerequisite script).

  • SELinux is disabled.

  • If more than one fabric is managed by different UFM instances, set up different management network spaces for each fabric (not the same LAN).

  • Uninstall any previously installed Subnet Manager from the UFM server machine.

  • MLNX_OFED version 5.x is installed before you install UFM.

  • As of UFM v6.12.0, it is NOT mandatory to configure the IPoIB fabric interface with an IP address.
    If an IP address is configured, it must be configured permanently and must come up automatically upon server reboot (the IPoIB fabric interface should be active even if the network is down).

    Warning

    You can set a persistent IP address using Netplan (mainly on Ubuntu systems) or by modifying the interface network script (on RedHat systems).

  • The default MLNX_OFED installation includes opensm. Remove the MLNX_OFED opensm before installing UFM, as in the following examples:

    RedHat:

    rpm -e opensm-3.3.9.MLNX_20111006_e52d5fc-0.1

    Ubuntu:

    apt purge opensm

    By default, ib0 and eth0 are configured as primary access points for the UFM management. If different management and/or InfiniBand interfaces (including bond interfaces) are used as the primary access points, you should modify the configuration file by running the script /opt/ufm/scripts/change_fabric_config.sh as described in the section Configuring General Settings in gv.cfg.

    Change the UFM Agent interface to the Ethernet and/or IPoIB interfaces used for communication with UFM Agent:

    ufma_interfaces = ib0,eth0

  • Reliable and high-capacity out-of-band IP connectivity between the UFM Primary and Secondary servers (1 Gb Ethernet is recommended). This connectivity is used for DRBD synchronization.

  • Format two identical servers with dedicated disk partitions for UFM replication. Since the UFM configuration file is replicated to the standby server, both master and standby servers must have the same interfaces.

  • Allocate exactly the same size partition on both servers (master and slave) for the replicated data. See UFM Server Requirements for the recommended partition size.
    Partitions should not be mounted and must be zeroed (the file system should not be installed on the partitions). For disk partitioning, see the Linux user manual (man fdisk).

  • We recommend establishing a passwordless SSH (via /root/.ssh/authorized_keys file) between the two servers before the installation.

  • In fabrics consisting of multiple tiers of switches, it is recommended that the management ports (ib0) of the primary and secondary UFM server be connected to different fabric switches on the same tier (the outermost edge in CLOS 5 designs).

    This is because by default, UFM manages the IB fabric via ib0, port 1 of the HCA. Failure or disconnect of ib0, the IB management port, causes a failure condition in UFM resulting in HA failover.
    When the management ports (ib0) of the primary and secondary UFM server are connected to the same switch, a failure of this switch will result in a disconnect of both UFMs from the fabric, and therefore UFM will not be able to manage the fabric.
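
The SSH and partition requirements above can be prepared and checked from the command line. The following is a sketch, not an NVIDIA-provided procedure: the standby hostname ufm-host02 and the partition /dev/sda5 are example values, the ed25519 key type is an arbitrary choice, and the helper names sizes_match and check_ha_partition are hypothetical. It relies on blockdev --getsize64, which reports the size of a block device in bytes.

```shell
#!/usr/bin/env bash
# Return 0 when both arguments are non-empty decimal byte counts and are equal.
sizes_match() {
    local a="$1" b="$2"
    case "$a" in ''|*[!0-9]*) return 2 ;; esac
    case "$b" in ''|*[!0-9]*) return 2 ;; esac
    [ "$a" -eq "$b" ]
}

# Example flow, run as root on the main server (arguments are placeholders).
check_ha_partition() {
    local standby="${1:-ufm-host02}" part="${2:-/dev/sda5}"
    local key=/root/.ssh/id_ed25519 local_size remote_size
    # Create a key pair if none exists, then authorize it on the standby server.
    [ -f "$key" ] || ssh-keygen -t ed25519 -N '' -f "$key"
    ssh-copy-id -i "$key.pub" "root@$standby"
    # Compare the size of the replication partition on both servers.
    local_size=$(blockdev --getsize64 "$part")
    remote_size=$(ssh "root@$standby" blockdev --getsize64 "$part")
    if sizes_match "$local_size" "$remote_size"; then
        echo "partition sizes match: $local_size bytes"
    else
        echo "partition sizes differ: '$local_size' vs '$remote_size'" >&2
        return 1
    fi
}
```

Running the key steps again from the standby server toward the main server sets up trust in the opposite direction, if that is needed.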

Warning

The Subnet Manager runs over the native InfiniBand layer; therefore, bonding the IPoIB interfaces will not provide high availability. For additional information, please refer to section UFM Failover to Another Port.

The UFM installation includes the InfiniBand Performance Management module (IBPM). This module is responsible for reporting performance information back to UFM and upper layer applications. When available, this process is offloaded to the non-management port (default ib1) of the UFM server. Failure or disconnect of the non-management port (ib1) on the primary UFM server will not cause UFM to fail over. By default, the UFM Health Monitoring process is configured to try to restart the IBPM. For more information, see UFM Health Configuration in the UFM User Manual.
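
Several of the prerequisites in this section lend themselves to a scripted check before installation. The sketch below is illustrative only and is not part of UFM: report_missing and hostname_on_loopback are hypothetical helper names, and the query command passed to report_missing is an assumption (rpm -q on RedHat, dpkg -s on Ubuntu). Some entries in the package table, such as conda or pip3, may not correspond to a distribution package of the same name and would need a separate check.

```shell
#!/usr/bin/env bash
# Print every package for which the query command fails.
# $1: query command, e.g. "rpm -q" or "dpkg -s"; remaining arguments: package names.
# Note: dpkg -s can also succeed for a removed package whose config files remain.
report_missing() {
    local query="$1" pkg
    shift
    for pkg in "$@"; do
        # $query is left unquoted so that "rpm -q" splits into command and option.
        $query "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Return 0 if the hostname appears on a 127.0.0.1 line of the hosts file,
# which is the situation the prerequisites forbid.
hostname_on_loopback() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v h="$name" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break    # stop at a trailing comment
                if ($i == h) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Usage:
#   report_missing "rpm -q" acl bc lftp
#   hostname_on_loopback "$(hostname)" && echo "hostname is on the loopback line" >&2
```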

To install the UFM server software as a standalone for InfiniBand:

  1. Create a temporary directory (for example /tmp/ufm).

  2. Open the UFM software zip file that you downloaded. The zip file contains the following installation files:

    • RedHat 7/CentOS 7/OEL 7: ufm-X.X-XXX.el7.x86_64.tgz

    • RedHat 8/CentOS 8: ufm-X.X-XXX.el8.x86_64.tgz

    • Ubuntu 18.04: ufm-X.X-XXX.Ubuntu18.x86_64.tgz

    • Ubuntu 20.04: ufm-X.X-XXX.Ubuntu20.x86_64.tgz

  3. Extract the installation file for your system's OS to the temporary directory that you created.

  4. From within the temporary directory, run the following command as root:

    ./install.sh

    Warning

    Running with the option "-o ib" is no longer required.

    For a "quiet" (automatic) installation, add the -q flag, which automatically answers yes to each question the installer asks.

The UFM software is installed. You can now remove the temporary directory.
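
The steps above can be sketched as a short script. This is an illustration under stated assumptions, not an official installer wrapper: the helper names ufm_tarball_pattern and install_ufm_standalone are hypothetical, the ID values (rhel, centos, ol, ubuntu) are those found in /etc/os-release, the file name patterns mirror the list in step 2, and the path of the downloaded zip file is supplied by the caller. It also assumes that the archives unpack directly into the temporary directory, as the steps describe; if install.sh ends up in a subdirectory, change into that directory first.

```shell
#!/usr/bin/env bash
# Map an os-release ID and version to the tarball name pattern listed in step 2.
ufm_tarball_pattern() {
    local id="$1" major="${2%%.*}"
    case "$id:$major" in
        rhel:7|centos:7|ol:7) echo 'ufm-*.el7.x86_64.tgz' ;;
        rhel:8|centos:8)      echo 'ufm-*.el8.x86_64.tgz' ;;
        ubuntu:18)            echo 'ufm-*.Ubuntu18.x86_64.tgz' ;;
        ubuntu:20)            echo 'ufm-*.Ubuntu20.x86_64.tgz' ;;
        *) echo "no tarball listed for $id $2" >&2; return 1 ;;
    esac
}

# Usage (as root): install_ufm_standalone /path/to/downloaded.zip rhel 8.6
install_ufm_standalone() {
    local zip="$1" pattern workdir=/tmp/ufm
    pattern=$(ufm_tarball_pattern "$2" "$3") || return 1
    mkdir -p "$workdir" || return 1              # step 1
    unzip -o "$zip" -d "$workdir" || return 1    # step 2
    cd "$workdir" || return 1
    # Step 3: $pattern is unquoted on purpose so the shell expands the wildcard.
    tar -xzf $pattern || return 1
    # Step 4: run the installer; append -q for a quiet installation.
    ./install.sh
}
```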

UFM can be installed in HA mode using an additional package for HA called UFM-HA.

Warning

UFM HA package requires a dedicated partition with the same name for DRBD on both servers. This guide uses /dev/sda5 as an example.

Warning

In UFM Enterprise appliance, the UFM HA package and related components (i.e. pacemaker and DRBD) are already deployed. Therefore, follow the below instructions from step 6 (Configure HA from the main server).

  1. On both servers, install UFM Enterprise in standalone (SA) mode.

    Warning

    Do not start the UFM service.

  2. Install the latest pcs, pacemaker, and DRBD utilities packages on both servers.

    For Ubuntu:

    apt install pcs pacemaker drbd-utils

    For CentOS/Red Hat:

    yum install pcs pacemaker drbd84-utils kmod-drbd84

    OR

    yum install pcs pacemaker drbd90-utils kmod-drbd90

  3. Download the latest UFM-HA package from this link.

  4. Extract the downloaded UFM-HA package on both servers under /tmp/.

  5. Go to the extracted directory (/tmp/ufm_ha_XXX) and run the installation script. For example:

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

    Option  Description
    ------  -----------
    -l      DRBD files location. Must always be /opt/ufm/files/
    -d      Disk name for DRBD. For example, /dev/sda5
    -p      Product name. Must be "enterprise" for UFM Enterprise

  6. Configure HA from the main server using the following command:

    configure_ha_nodes.sh --cluster-password 123456 --main-hostname ufm-host01 --main-ip 192.168.10.1 --main-sync-interface enp2s0f0 --standby-hostname ufm-host02 --standby-ip 192.168.10.2 --standby-sync-interface enp2s0f0 --virtual-ip 192.168.10.5

    configure_ha_nodes.sh requires an SSH connection to the standby server. If passwordless SSH is not configured, you are prompted for the password while the configuration runs.

    Option                    Description
    ------------------------  -----------
    --cluster-password        UFM HA cluster password for authentication by pacemaker
    --main-hostname           Master (main) server hostname
    --main-ip                 Master (main) server IP address
    --main-sync-interface     Port name (interface) on master (main) server that will be used in DRBD sync
    --standby-hostname        Standby server hostname
    --standby-ip              Standby server IP address
    --standby-sync-interface  Port name (interface) on standby server that will be used in DRBD sync
    --virtual-ip              UFM HA cluster virtual IP
    --no-vip                  Configure HA without virtual IP

  7. After configuration, wait for the DRBD sync to finish. The time required depends on the size of your partition.

  8. To start UFM HA cluster:

    ufm_ha_cluster start

  9. To check UFM HA cluster status:

    ufm_ha_cluster status
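
Step 7 asks you to wait for the DRBD sync before starting the cluster. One way to automate the wait is to poll the DRBD status until nothing is reported as inconsistent or still syncing. The sketch below is an illustration under assumptions rather than part of the UFM-HA package: drbd_synced and wait_for_drbd are hypothetical helpers, the status source depends on the DRBD version in use (drbdadm status, with /proc/drbd as a fallback for the 8.4 module), and the state names matched (UpToDate, Inconsistent, SyncSource, SyncTarget) are standard DRBD disk and connection states.

```shell
#!/usr/bin/env bash
# Read DRBD status text on stdin. Return 0 only if a disk is reported UpToDate
# and nothing is reported as Inconsistent or still syncing.
drbd_synced() {
    local status
    status=$(cat)
    case "$status" in
        *Inconsistent*|*SyncSource*|*SyncTarget*) return 1 ;;
        *UpToDate*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll every 30 seconds until the sync completes (loops until interrupted otherwise).
wait_for_drbd() {
    local status
    while true; do
        # Prefer drbdadm; fall back to /proc/drbd if the command fails.
        status=$(drbdadm status 2>/dev/null) || status=$(cat /proc/drbd 2>/dev/null)
        if printf '%s\n' "$status" | drbd_synced; then
            break
        fi
        sleep 30
    done
    echo "DRBD sync complete"
}

# Usage (as root on the main server): wait_for_drbd && ufm_ha_cluster start
```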

To stop UFM HA cluster:

ufm_ha_cluster stop

To uninstall UFM HA, first, stop the cluster and then run the ufm_ha uninstallation script as follows:

/opt/ufm/ufm_ha/uninstall_ha.sh

To replace the standby server with a new one, run:

./replace_ha_nodes.sh --cluster-password 123456 --main-hostname ufm-host01 --main-ip 192.168.10.1 --main-sync-interface enp2s0f0 --standby-hostname ufm-host02 --standby-ip 192.168.10.2 --standby-sync-interface enp2s0f0

Warning

The replace_ha_nodes.sh command requires SSH trust established between HA nodes.

Option                    Description
------------------------  -----------
--cluster-password        UFM HA cluster password for authentication by pacemaker
--main-hostname           Master (main) server hostname
--main-ip                 Master (main) server IP address
--main-sync-interface     Port name (interface) on master (main) server that will be used in DRBD sync
--standby-hostname        New standby server hostname
--standby-ip              New standby server IP address
--standby-sync-interface  Port name (interface) on standby server that will be used in DRBD sync
--help                    Show the options

UFM can be deployed as a Docker container. For further information, please refer to the UFM Enterprise Docker Installation.

© Copyright 2023, NVIDIA. Last updated on Sep 8, 2023.