NVIDIA UFM Enterprise Quick Start Guide v6.14.1
Upgrading UFM on Bare Metal Server

You can upgrade the UFM standalone server software for InfiniBand from the previous UFM version.

To upgrade the UFM server software:

  1. Create a temporary directory (for example /tmp/ufm).

  2. Open the UFM software zip file that you downloaded. It contains one installation file for each supported OS:

    • RedHat 7/CentOS 7/OEL 7: ufm-X.X-XXX.el7.x86_64.tgz

    • RedHat 8/CentOS 8/OEL 8: ufm-X.X-XXX.el8.x86_64.tgz

    • Ubuntu 18.04: ufm-X.X-XXX.ubuntu18.x86_64.tgz

    • Ubuntu 20.04: ufm-X.X-XXX.ubuntu20.x86_64.tgz

    • Ubuntu 22.04: ufm-X.X-XXX.ubuntu22.x86_64.tgz

  3. Extract the installation file for your system's OS to the temporary directory that you created.

  4. Stop the UFM server. Run:

    systemctl stop ufm-enterprise

  5. From within the temporary directory, run the following command as root:

    ./upgrade.sh

    Warning

    A configuration backup ZIP file is created in the directory from which the script is run (e.g. /tmp/ufm). The backup file name is ufm_X.X.X_bkp.zip, where X.X.X is the previous version.

    1. When upgrading from the previous version, the existing UFM data and configuration are preserved.

    2. If the upgrade.sh script stops before completion (e.g. because of a missing prerequisite), fix the issue (e.g. install the missing prerequisite) and rerun ./upgrade.sh to resume the upgrade.

  6. Restart the UFM server. Run:

    systemctl start ufm-enterprise.service

    Warning

    The legacy command /etc/init.d/ufmd start is still available for backward compatibility.

  7. After the upgrade, remove the temporary directory.
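
Choosing the right installation file in steps 2 and 3 can be sketched as a small shell helper. This is a minimal sketch, not an NVIDIA-provided script: the function names ufm_pkg_suffix and ufm_pkg_pattern are hypothetical, the mapping only mirrors the file list in step 2, and it assumes that the ID values rhel, centos and ol in /etc/os-release correspond to the RedHat, CentOS and OEL entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of UFM): map an os-release ID and VERSION_ID
# to the package suffix used in the file names listed in step 2.
ufm_pkg_suffix() {
    os_id=$1
    major=${2%%.*}              # "8.6" -> "8", "22.04" -> "22"
    case "$os_id" in
        rhel|centos|ol)
            case "$major" in
                7|8) echo "el$major" ;;
                *)   return 1 ;;
            esac ;;
        ubuntu)
            case "$major" in
                18|20|22) echo "ubuntu$major" ;;
                *)        return 1 ;;
            esac ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: print a glob that matches the installation file
# for the given OS, following the ufm-X.X-XXX.<suffix>.x86_64.tgz naming.
ufm_pkg_pattern() {
    suffix=$(ufm_pkg_suffix "$1" "$2") || return 1
    printf 'ufm-*.%s.x86_64.tgz\n' "$suffix"
}
```

On a target host, one way to use it is to source /etc/os-release first and then call ufm_pkg_pattern "$ID" "$VERSION_ID"; a non-zero exit status means the OS is not in the list from step 2.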

Warning

As of UFM version 6.14.0, UFM upgrade on HA supports in-service upgrade: UFM can keep running during the upgrade steps, so there is no need to stop UFM before the upgrade (although stopping it first is also supported).

You can upgrade the UFM server HA software for InfiniBand from the previous release. The upgrade is performed on both servers.

To upgrade the UFM server software in an HA setup:

  1. On the standby server, extract the new UFM Enterprise package to the /tmp folder:

    tar -xzf ufm-X.X.X-XXXXX.tgz -C /tmp

  2. On the standby server, change to the installation folder that contains the upgrade script:

    standby# cd /tmp/ufm-X.X.X-X.<OS_NAME>.x86_64.mofed5/

  3. Run the UFM upgrade script on the standby server:

    ./upgrade.sh

  4. When the upgrade script completes, the UFM code is upgraded but the UFM data remains unchanged. The UFM data is upgraded automatically the next time UFM starts. To trigger this, execute a failover from the master node (or perform a takeover from the standby node):

    master# ufm_ha_cluster failover

    Warning

    UFM logs the data upgrade to the syslog of the server. If an issue occurs, the backup of the UFM data that is saved in the /opt/ufm/BACKUP directory before the upgrade can be restored. For more information, refer to Appendix – Restoring UFM Data.

  5. Once UFM is operational on the upgraded node (formerly the standby node), repeat steps 1 to 3 on the non-upgraded node (previously the master node).

  6. On both servers, download the latest UFM-HA package:

    wget https://www.mellanox.com/downloads/UFM/ufm_ha_5.1.1-6.tgz

  7. On both servers, extract HA package under /tmp/ and enter the new directory.

  8. On both servers, run the installation command, using the same dedicated DRBD partition for UFM HA. In the following example, /dev/sda5 is used:

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

  9. Configure HA. There are two methods:

    Configure HA with SSH Trust

      1. On the master server only, configure the HA nodes. To do so, from /tmp, run the configure_ha_nodes.sh command as shown in the example below:

        configure_ha_nodes.sh \
          --cluster-password 12345678 \
          --local-primary-ip 10.10.50.1 \
          --peer-primary-ip 10.10.50.2 \
          --local-secondary-ip 192.168.10.1 \
          --peer-secondary-ip 192.168.10.2 \
          --no-vip

        Warning

        The script configure_ha_nodes.sh is located under /usr/local/bin/, so by default you do not need to use the full path to run it.

        Warning

        The --cluster-password must be at least 8 characters long.

        Warning

        To set up a virtual IP for UFM and access UFM through this IP regardless of which server is running UFM, use the --virtual-ip option with an IP address as its argument in place of --no-vip. You can then reach UFM by navigating to https://<Virtual-IP>/ufm in your web browser.

        Warning

        When using back-to-back ports with local IP addresses for HA sync interfaces, ensure that you add your IP addresses and hostnames to the /etc/hosts file. This is needed to allow the HA configuration to resolve hostnames correctly based on the IP addresses you are using.

        Warning

        configure_ha_nodes.sh requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server while the configuration runs.

      2. Wait for the configuration process to complete and for the DRBD sync to finish. The time this takes depends on the size of your partition.
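
Two of the warnings above (the 8-character minimum for --cluster-password and the /etc/hosts entries for back-to-back sync interfaces) can be checked before running configure_ha_nodes.sh. The following is a minimal sketch under stated assumptions: check_cluster_password and ip_in_hosts are hypothetical helper names, not part of the UFM HA package, and the hosts check only looks for the IP address in the first field of a line.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the cluster password
# is at least 8 characters long.
check_cluster_password() {
    [ "${#1}" -ge 8 ]
}

# Hypothetical pre-flight check: succeed only if the IP address appears as
# the first field of a line in the hosts file (default: /etc/hosts).
# Commented-out entries do not match because their first field starts with "#".
ip_in_hosts() {
    ip=$1
    hosts_file=${2:-/etc/hosts}
    awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file"
}
```

For the example addresses used above, this could be run as check_cluster_password 12345678 && ip_in_hosts 192.168.10.1 && ip_in_hosts 192.168.10.2, continuing to the configuration command only if all three checks succeed.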

    Configure HA without SSH Trust

    If you cannot establish SSH trust between your HA servers, you can use ufm_ha_cluster directly to configure HA. To do so, follow the instructions below:

    Warning

    Please change the variables in the commands below based on your setup.

      1. [On Standby Server] Run the following command to configure Standby Server:

        ufm_ha_cluster config -r standby \
          --local-primary-ip 10.10.50.1 \
          --peer-primary-ip 10.10.50.2 \
          --local-secondary-ip 192.168.10.1 \
          --peer-secondary-ip 192.168.10.2 \
          --hacluster-pwd 123456789 \
          --no-vip

      2. [On Master Server] Run the following command to configure Master Server:

        ufm_ha_cluster config -r master \
          --local-primary-ip 10.10.50.1 \
          --peer-primary-ip 10.10.50.2 \
          --local-secondary-ip 192.168.10.1 \
          --peer-secondary-ip 192.168.10.2 \
          --hacluster-pwd 123456789 \
          --no-vip

    After configuration, wait for the DRBD sync to finish. The time this takes depends on the size of your partition.
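
Waiting for the DRBD sync can be scripted rather than checked by hand. The sketch below is an illustration, not a UFM tool: drbd_in_sync and wait_for_drbd_sync are hypothetical names, and the check relies on the general DRBD convention that a disk is reported as Inconsistent while it is still syncing and as UpToDate once it is done. Which status command applies depends on the DRBD version installed by the UFM HA package, which this guide does not state.

```shell
#!/bin/sh
# Hypothetical helper: read DRBD status text on stdin and succeed only when
# a disk is reported UpToDate and no disk is still Inconsistent.
drbd_in_sync() {
    status=$(cat)
    case "$status" in
        *Inconsistent*) return 1 ;;   # a sync is still in progress
        *UpToDate*)     return 0 ;;   # all reported disks are up to date
        *)              return 1 ;;   # no usable status text
    esac
}

# Hypothetical helper: run the status command given as arguments every
# DRBD_POLL_SECONDS seconds (default 30) until drbd_in_sync succeeds.
wait_for_drbd_sync() {
    interval=${DRBD_POLL_SECONDS:-30}
    until "$@" | drbd_in_sync; do
        sleep "$interval"
    done
}
```

Assuming a DRBD 8.x kernel module, the call would be wait_for_drbd_sync cat /proc/drbd; with DRBD 9 tooling, wait_for_drbd_sync drbdadm status is the likely equivalent. The status command is passed in as arguments so that the helper does not hard-code either choice.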

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.