NVIDIA UFM Enterprise User Manual v6.12.1

Docker Installation

  • MLNX_OFED must be installed on the server that will run UFM Docker

  • For UFM to work, you must have an InfiniBand port configured with an IP address and in "up" state.

    Warning

    For InfiniBand support, please refer to the NVIDIA Inbox Drivers or MLNX_OFED guides.

  • Make sure to stop the following services before running UFM Docker container, as it utilizes the same default ports that they do: Pacemaker, httpd, OpenSM, and Carbon.

  • If a firewall is running on the host, add allow rules for the ports used by UFM (listed below):

    Warning

    If the default ports used by UFM are changed in UFM configuration files, make sure to open the modified ports on the host firewall.

    • 80 (TCP) and 443 (TCP) are used by WS clients (Apache Web Server)

    • 8000 (UDP) is used by the UFM server to listen for REST API requests (redirected by Apache web server)

    • 6306 (UDP) is used for multicast request communication with the latest UFM Agents

    • 8005 (UDP) is used as a UFM monitoring listening port

    • 8888 (TCP) is used by DRBD to communicate between the UFM Primary and Standby servers

    • 2022 (TCP) is used for SSH

  • Supported versions for upgrade are UFM v6.7.0 and above.

  • For upgrades, the UFM files directory from the previous container version must be mounted on the host.
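
As a minimal sketch, the default port list above can be turned into firewall rules. This example assumes the host uses firewalld (the manual does not name a firewall), and the helper name ufm_firewall_commands is hypothetical. It only prints the commands so they can be reviewed before being run as root.

```shell
# Default UFM ports as listed above, in port/protocol form.
UFM_PORTS="80/tcp 443/tcp 8000/udp 6306/udp 8005/udp 8888/tcp 2022/tcp"

# Print (do not execute) the firewalld commands that would open each port.
# Assumption: firewalld is the host firewall; adapt for iptables or ufw.
ufm_firewall_commands() {
    for p in $UFM_PORTS; do
        echo "firewall-cmd --permanent --add-port=$p"
    done
    echo "firewall-cmd --reload"
}

ufm_firewall_commands
```

If the default ports were changed in the UFM configuration files, edit UFM_PORTS to match before using the output.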

To load the UFM Docker image, pull the latest image from Docker Hub:

docker pull mellanox/ufm-enterprise:latest

Warning

You can see the full usage screen for ufm-installation by running the container with the -h or --help flag:

docker run --rm mellanox/ufm-enterprise-installer:latest -h

Installation Command Usage

docker run -it --name=ufm_installer --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /etc/systemd/system/:/etc/systemd_files/ \
  -v [UFM_FILES_DIRECTORY]:/installation/ufm_files/ \
  -v [LICENSE_DIRECTORY]:/installation/ufm_licenses/ \
  mellanox/ufm-enterprise:latest \
  --install [OPTIONS]

Modify the variables in the installation command as follows:

  • [UFM_FILES_DIRECTORY]: A directory on the host to mount UFM configuration files.

    Warning

    UFM_FILES_DIRECTORY must have read/write permissions for other users because UFM needs to write data during runtime.

    Warning

    Example: If you want UFM files on the host to be under /opt/ufm/files/ you must set this volume to be: -v /opt/ufm/files/:/installation/ufm_files/

  • [LICENSE_DIRECTORY]: Location of the UFM license file or files.

    Warning

    Example: If your license file or files are located under /downloads/ufm_license_files/ then you must set this volume to be -v /downloads/ufm_license_files/:/installation/ufm_licenses/

  • [OPTIONS]: UFM installation options. For more details see the table below.
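
The permission requirement on UFM_FILES_DIRECTORY can be sketched as follows. The helper name prepare_ufm_files_dir is hypothetical, and granting the execute bit together with read/write is an assumption (a directory needs it to be entered). The demo works in a scratch directory so nothing under /opt is touched.

```shell
# Create a host directory for UFM files and grant "other" users access.
# Assumption: o+rwx is used because a directory also needs the execute
# bit to be traversed; adjust to your site's security policy.
prepare_ufm_files_dir() {
    dir="$1"
    mkdir -p "$dir" && chmod o+rwx "$dir"
}

# Demo against a scratch location instead of the real /opt/ufm/files/.
demo_root="$(mktemp -d)"
prepare_ufm_files_dir "$demo_root/opt/ufm/files"
ls -ld "$demo_root/opt/ufm/files"
```

On a real host, the same function would be called with the directory that is later passed to the -v option.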

Command Options

Flag                       Description                  Default Value
-f | --fabric-interface    IB fabric interface name     ib0
-g | --mgmt-interface      Management interface name    eth0
-h | --help                Show help                    N/A
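
To illustrate how the placeholders and options fit together, the sketch below assembles the installation command from its parts. The helper build_ufm_install_cmd is hypothetical and only prints the command; it does not invoke Docker. The interface defaults are taken from the table above.

```shell
# Hypothetical helper: prints the install command; it does not run Docker.
# Arguments: <ufm_files_dir> <license_dir> [fabric_if] [mgmt_if]
build_ufm_install_cmd() {
    files_dir="$1"
    license_dir="$2"
    fabric_if="${3:-ib0}"    # default from the options table
    mgmt_if="${4:-eth0}"     # default from the options table
    cat <<EOF
docker run -it --name=ufm_installer --rm \\
  -v /var/run/docker.sock:/var/run/docker.sock \\
  -v /etc/systemd/system/:/etc/systemd_files/ \\
  -v ${files_dir}:/installation/ufm_files/ \\
  -v ${license_dir}:/installation/ufm_licenses/ \\
  mellanox/ufm-enterprise:latest \\
  --install --fabric-interface ${fabric_if} --mgmt-interface ${mgmt_if}
EOF
}

build_ufm_install_cmd /opt/ufm/files/ /downloads/ufm_license_files/ ib1
```

Printing the command first makes it easy to confirm that both volume paths point at the intended host directories.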

UFM Enterprise installer supports several deployment modes:

Stand Alone (SA) Installation

  1. Create a directory on the host, with read/write permissions, to mount and sync UFM Enterprise files. For example: /opt/ufm/files/.

  2. Copy only your UFM license file(s) to a temporary directory that will be used in the installation command. For example: /tmp/license_file/

  3. Run the UFM installation command according to the following example which will also configure UFM fabric interface to be ib1:

    docker run -it --name=ufm_installer --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /etc/systemd/system/:/etc/systemd_files/ \
      -v /opt/ufm/files/:/installation/ufm_files/ \
      -v /tmp/license_file/:/installation/ufm_licenses/ \
      mellanox/ufm-enterprise:latest \
      --install \
      --fabric-interface ib1

    Warning

    The values below can be updated in the command to your needs:

    • /opt/ufm/files/

    • /tmp/license_file/

    • For example, if you want UFM files to be mounted in another location on your server, create that directory and replace the path in the command.

  4. Reload the system manager configuration:

    systemctl daemon-reload

  5. Start the UFM Enterprise service:

    systemctl start ufm-enterprise

High Availability

Pre-deployment Requirements

  • Install pacemaker, pcs, and drbd-utils on both servers

  • A partition for DRBD on each server (with the same name on both servers) such as /dev/sdd1. Recommended partition size is 10-20 GB, otherwise DRBD sync will take a long time to complete.

  • The CLI command hostname -i must return the IP address of the management interface used for Pacemaker sync. If it does not, update the /etc/hosts file with the machine IP.

  • Create the directory /opt/ufm/files/ on each server with read/write permissions. UFM uses this directory to mount UFM files, and DRBD keeps it in sync.
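
The hostname -i requirement can be checked with a small sketch like the one below. The function name ip_in_list and the sample addresses are illustrative assumptions; on a real server the list would come from the output of hostname -i.

```shell
# Succeeds if the wanted IP (first argument) appears among the rest.
ip_in_list() {
    wanted="$1"
    shift
    for addr in "$@"; do
        [ "$addr" = "$wanted" ] && return 0
    done
    return 1
}

# Illustrative values only. On a real server, pass the live output:
#   ip_in_list "$MGMT_IP" $(hostname -i)
if ip_in_list 192.168.10.1 127.0.1.1 192.168.10.1; then
    echo "management IP resolves correctly"
else
    echo "update /etc/hosts with the machine IP"
fi
```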

Installing UFM Containers

On the main server, install UFM Enterprise container with the command below:

docker run -it --name=ufm_installer --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /etc/systemd/system/:/etc/systemd_files/ \
  -v /opt/ufm/files/:/installation/ufm_files/ \
  -v /tmp/license_file/:/installation/ufm_licenses/ \
  mellanox/ufm-enterprise:latest \
  --install

On the standby (secondary) server, install the UFM Enterprise container with the command below:

docker run -it --name=ufm_installer --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /etc/systemd/system/:/etc/systemd_files/ \
  -v /opt/ufm/files/:/installation/ufm_files/ \
  mellanox/ufm-enterprise:latest \
  --install


Downloading UFM HA Package

Download the UFM-HA package on both servers using the following command:

wget https://www.mellanox.com/downloads/UFM/ufm_ha_5.0.1-2.tgz


Installing UFM HA Package

  1. [On Both Servers] Extract the downloaded UFM-HA package under /tmp/

  2. [On Both Servers] Go to the extracted directory /tmp/ufm_ha_XXX and run the installation script:

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

    Option    Description
    -l        Location for DRBD. Always use /opt/ufm/files/
    -d        Partition (disk) name for DRBD
    -p        Product name. For UFM Enterprise, this must always be "enterprise"

Configuring UFM HA

There are two methods to configure the HA cluster:

Configure HA with SSH Trust

  1. On the master server only, configure the HA nodes. To do so, from /tmp, run the configure_ha_nodes.sh command as shown in the example below:

    configure_ha_nodes.sh --cluster-password 12345678 --master-ip 192.168.10.1 --standby-ip 192.168.10.2 --virtual-ip 192.168.10.5 

    Warning

    The script configure_ha_nodes.sh is located under /usr/local/bin/. Therefore, by default, you do not need to use the full path to run it.

    Warning

    The --cluster-password must be at least 8 characters long.

    Warning

    When using back-to-back ports with local IP addresses for HA sync interfaces, ensure that you add your IP addresses and hostnames to the /etc/hosts file. This is needed to allow the HA configuration to resolve hostnames correctly based on the IP addresses you are using.

    Warning

    configure_ha_nodes.sh requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server while the configuration runs.

    Option                      Description
    --cluster-password          UFM HA cluster password, used by Pacemaker for authentication
    --master-ip                 Master (main) server IP address
    --standby-ip                Standby server IP address
    --virtual-ip OR --no-vip    UFM HA cluster virtual IP, or configure HA without a virtual IP

  2. Depending on the size of your partition, wait for the configuration process to complete and DRBD sync to finish.
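
Because --cluster-password must be at least 8 characters long, a quick length check before running configure_ha_nodes.sh can save a failed run. This is a minimal sketch; the function name valid_cluster_password is hypothetical and the script itself may apply further checks.

```shell
# Returns success only if the password has at least 8 characters.
valid_cluster_password() {
    [ "${#1}" -ge 8 ]
}

# 12345678 is the sample password from the example above.
for pw in 12345678 1234567; do
    if valid_cluster_password "$pw"; then
        echo "$pw: ok"
    else
        echo "$pw: too short (minimum 8 characters)"
    fi
done
```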

Configure HA without SSH Trust

If you cannot establish an SSH trust between your HA servers, you can use ufm_ha_cluster directly to configure HA. You can see all the options for configuring HA in the Help menu:

ufm_ha_cluster config -h

Usage:

ufm_ha_cluster config [<options>] 

Option                             Description
-r, --role <node role>             Node role (master or standby)
-e, --peer-ha-ip <ip address>      Peer node sync IP address (mandatory)
-l, --local-ha-ip <ip address>     Local node sync IP address (mandatory)
-i, --virtual-ip <virtual-ip>      Cluster virtual IP (should be used for master only)
-p, --hacluster-pwd <pwd>          HA cluster user password
-h, --help                         Show this message
-N, --no-vip                       Configure HA without virtual IP

To configure HA, follow the below instructions:

Warning

Please change the variables in the commands below based on your setup.

  1. [On Both Servers] Run the following command to set the cluster password:

    ufm_ha_cluster set-password -p <cluster_password>

  2. [On Standby Server] Run the following command to configure Standby Server:

    ufm_ha_cluster config -r standby -e <peer ip address> -l <local ip address> -p <cluster_password>

  3. [On Master Server] Run the following command to configure Master Server:

    ufm_ha_cluster config -r master -e <peer ip address> -l <local ip address> -p <cluster_password> -i <virtual ip address>
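
Note that the peer and local addresses swap between the two nodes. The sketch below prints the config command for each role to make that explicit. The helper ha_config_cmd is hypothetical, the addresses are illustrative, and falling back to -N when no virtual IP is supplied for the master is an assumption based on the options table.

```shell
# Hypothetical helper: prints the ufm_ha_cluster command for one node.
# Arguments: <role> <peer_ip> <local_ip> <cluster_password> [virtual_ip]
ha_config_cmd() {
    role="$1"
    peer_ip="$2"
    local_ip="$3"
    cluster_pw="$4"
    vip="${5:-}"
    cmd="ufm_ha_cluster config -r $role -e $peer_ip -l $local_ip -p $cluster_pw"
    if [ "$role" = "master" ]; then
        if [ -n "$vip" ]; then
            cmd="$cmd -i $vip"
        else
            cmd="$cmd -N"    # assumption: no virtual IP given, so use --no-vip
        fi
    fi
    echo "$cmd"
}

# Standby first, then master, matching the order of the steps above.
ha_config_cmd standby 192.168.10.1 192.168.10.2 '<cluster_password>'
ha_config_cmd master  192.168.10.2 192.168.10.1 '<cluster_password>' 192.168.10.5
```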

Starting HA Cluster

  • To start UFM HA cluster:

    ufm_ha_cluster start

  • To check UFM HA cluster status:

    ufm_ha_cluster status 

  • To stop UFM HA cluster:

    ufm_ha_cluster stop 

  • To uninstall UFM HA, first stop the cluster and then run the uninstallation command as follows:

    /opt/ufm/ufm_ha/uninstall_ha.sh

Warning

Upgrade the UFM container based on the existing UFM configuration files that are mounted on the server. It is important to use that same directory as a volume for the UFM installation command.
In the examples below, /opt/ufm/files/ is used.

Upgrading UFM Container in SA Mode

  1. Stop the UFM Enterprise service. Run:

    systemctl stop ufm-enterprise

  2. Remove the old docker image. Run:

    docker rmi mellanox/ufm-enterprise:latest

  3. Load the new UFM docker image. Run:

    docker pull mellanox/ufm-enterprise:latest

  4. Run the docker upgrade command:

    docker run -it --name=ufm_installer --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /etc/systemd/system/:/etc/systemd_files/ \
      -v /opt/ufm/files/:/opt/ufm/shared_config_files/ \
      mellanox/ufm-enterprise:latest --upgrade

  5. Reload system manager configuration:

    systemctl daemon-reload

  6. Start UFM Enterprise service:

    systemctl start ufm-enterprise
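
The six SA upgrade steps above could be collected into one script, sketched here in dry-run form. The run wrapper, the DRY_RUN and UFM_FILES_DIR variables, and the function name upgrade_ufm_sa are all assumptions for illustration. By default the script only prints each step.

```shell
# Dry-run by default: prints each step instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
# Must be the same host directory that was used at installation time.
UFM_FILES_DIR="${UFM_FILES_DIR:-/opt/ufm/files/}"
IMAGE="mellanox/ufm-enterprise:latest"

# Echo the command in dry-run mode, otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

upgrade_ufm_sa() {
    run systemctl stop ufm-enterprise
    run docker rmi "$IMAGE"
    run docker pull "$IMAGE"
    run docker run -it --name=ufm_installer --rm \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /etc/systemd/system/:/etc/systemd_files/ \
        -v "$UFM_FILES_DIR":/opt/ufm/shared_config_files/ \
        "$IMAGE" --upgrade
    run systemctl daemon-reload
    run systemctl start ufm-enterprise
}

upgrade_ufm_sa
```

Setting DRY_RUN=0 would execute the steps for real, which requires root privileges and a host where Docker and the ufm-enterprise service are already installed.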

Upgrading UFM Container in HA Mode

  1. Stop HA Cluster on the master node. Run:

    ufm_ha_cluster stop

  2. Remove the old docker image from both servers. Run:

    docker rmi mellanox/ufm-enterprise:latest

  3. Load the new docker image on both servers. Run:

    docker pull mellanox/ufm-enterprise:latest

  4. Upgrade UFM on the master node. Run:

    docker run -it --name=ufm_installer --rm \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /etc/systemd/system/:/etc/systemd_files/ \
      -v /opt/ufm/files/:/opt/ufm/shared_config_files/ \
      mellanox/ufm-enterprise:latest --upgrade

  5. Download and extract the latest UFM HA package. Run:

    wget https://www.mellanox.com/downloads/UFM/ufm_ha_5.0.1-2.tgz

  6. Install the extracted UFM HA package:

    Warning

    In the below command, please modify the partition name based on the already configured DRBD partition.

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

  7. Start UFM HA cluster. Run:

    ufm_ha_cluster start

To open the UFM web UI, open the following URL in your browser: http://[SERVER_IP]/ufm/ and enter the default credentials.

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.