Upgrading UFM on Bare Metal Server

NVIDIA UFM Enterprise User Manual v6.17.2

You can upgrade the UFM standalone server software for InfiniBand from the previous UFM version.

To upgrade the UFM server software:

  1. Create a temporary directory (for example /tmp/ufm).

  2. Open the UFM software zip file that you downloaded. It contains one installation file for each supported operating system:

    • RedHat 7/CentOS 7/OEL 7: ufm-X.X-XXX.el7.x86_64.tgz

    • RedHat 8/CentOS 8/OEL 8: ufm-X.X-XXX.el8.x86_64.tgz

    • Ubuntu 18.04: ufm-X.X-XXX.ubuntu18.x86_64.tgz

    • Ubuntu 20.04: ufm-X.X-XXX.ubuntu20.x86_64.tgz

    • Ubuntu 22.04: ufm-X.X-XXX.ubuntu22.x86_64.tgz

  3. Extract the installation file for your system’s OS to the temporary directory that you created.

  4. Stop the UFM server. Run:

    systemctl stop ufm-enterprise

  5. From within the temporary directory, run the following command as root:

    ./upgrade.sh

    Note

    A configuration backup ZIP file will be created in the running directory (e.g. /tmp/ufm). The backup file name is ufm_X.X.X_bkp.zip (X.X.X is the previous version).

    1. When you upgrade from the previous version, the existing UFM data and configuration are preserved.

    2. If the upgrade.sh script stops before completion (e.g. because of a missing prerequisite), fix the issue (e.g. install the missing prerequisite) and rerun ./upgrade.sh to resume the upgrade.

  6. Restart the UFM server. Run:

    systemctl start ufm-enterprise.service

    Note

    /etc/init.d/ufmd start - Available for backward compatibility.

  7. After the upgrade, remove the temporary directory.
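
As an illustration of steps 1 to 3, the sketch below picks the installation file suffix from the operating system's /etc/os-release fields. The function name ufm_pkg_suffix is hypothetical and is not part of UFM. The sketch assumes that ID is "rhel", "centos", or "ol" on the RedHat-family systems listed in step 2. The file name placeholders (X.X-XXX) are kept exactly as they appear in the procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UFM): map the ID and VERSION_ID fields of
# /etc/os-release to the suffix used in the installation file names of step 2.
# Assumption: ID is "rhel", "centos", or "ol" on the RedHat-family systems.
ufm_pkg_suffix() {
    local id="$1" version="$2"
    local major="${version%%.*}"   # "22.04" -> "22", "8.6" -> "8"
    case "$id" in
        rhel|centos|ol)
            case "$major" in
                7|8) echo "el${major}" ;;
                *)   return 1 ;;
            esac ;;
        ubuntu)
            case "$version" in
                18.04|20.04|22.04) echo "ubuntu${major}" ;;
                *)                 return 1 ;;
            esac ;;
        *)  return 1 ;;
    esac
}

# Example use (commented out so that sourcing this file has no side effects):
#   . /etc/os-release
#   suffix="$(ufm_pkg_suffix "$ID" "$VERSION_ID")" || echo "unsupported OS" >&2
#   mkdir -p /tmp/ufm
#   tar -xzf ufm-X.X-XXX."$suffix".x86_64.tgz -C /tmp/ufm
```

The function returns a non-zero status for any system that is not in the list of step 2, so a calling script can stop before it touches the running UFM server.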

Note

As of UFM version 6.14.0, UFM upgrade on HA supports in-service upgrade, meaning UFM can continue running during the steps of the upgrade, and there is no need to stop UFM before the upgrade (although this is also supported).

You can upgrade the UFM server HA software for InfiniBand from the previous release. The upgrade is performed on both servers.

To upgrade the UFM server software:

  1. On the standby server, extract the new UFM Enterprise package to the /tmp folder:

    tar -xzf ufm-X.X.X-XXXXX.tgz -C /tmp

  2. On the standby server, change to the extracted installation folder, which contains the upgrade script:

    standby# cd /tmp/ufm-X.X.X-X.<OS_NAME>.x86_64.mofed5/

  3. Run the UFM upgrade script on the standby server:

    ./upgrade.sh

  4. When the upgrade script completes, the UFM code has been upgraded but the UFM data is still unchanged. The UFM data is upgraded automatically the next time UFM starts. To trigger this, run a failover from the Master node (or a takeover from the Standby node):

    master# ufm_ha_cluster failover

    Note

    UFM logs the data upgrade to the syslog of the server. Before the upgrade, a backup of the UFM data is saved in the /opt/ufm/BACKUP directory, and it can be restored if an issue occurs. For more information, refer to Appendix – Restoring UFM Data.

  5. Once UFM is operational on the upgraded node (formerly the standby node), repeat steps 1 to 3 on the non-upgraded node (previously the master node).

  6. On both servers, download the latest UFM-HA package:

    wget https://www.mellanox.com/downloads/UFM/ufm_ha_5.5.0-9.tgz

  7. On both servers, extract the HA package under /tmp/ and enter the new directory.

  8. Stop the UFM HA cluster. Run the following command on the Master server:

    ufm_ha_cluster stop

  9. On both the Master and Standby servers (starting with the Standby), run the upgrade command from within the extracted HA package located in /tmp:

    ./install.sh --upgrade

  10. Start the UFM HA cluster. Run the following command on the Master server:

    ufm_ha_cluster start
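
Because the Master and Standby roles change during the HA procedure, it can help to write out which node runs which command. The dry-run sketch below only prints the order of steps 1 to 10 and executes nothing. The function name ha_upgrade_plan and the node labels are hypothetical. It also rests on an assumption that the procedure does not state explicitly: after the failover in step 4 the roles are swapped, so "Master" in steps 8 to 10 is taken to mean the node that started as standby. Steps 6 and 7 (downloading and extracting the HA package on both servers) are left out for brevity.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints "<node>: <command>" lines in the order of the HA
# procedure. Nothing is executed. $1 and $2 are hypothetical labels for the
# nodes that START as standby and master.
# Assumption: after the failover in step 4 the roles swap, so "Master" in
# steps 8-10 means the node that started as standby.
ha_upgrade_plan() {
    local first="$1" second="$2"      # first = initial standby, second = initial master
    local pkg='ufm-X.X.X-XXXXX.tgz'   # placeholder name, as in the procedure

    # Steps 1-3: upgrade the UFM code on the initial standby.
    printf '%s: tar -xzf %s -C /tmp\n' "$first" "$pkg"
    printf '%s: ./upgrade.sh\n' "$first"
    # Step 4: fail over so that UFM starts on the upgraded node and upgrades its data.
    printf '%s: ufm_ha_cluster failover\n' "$second"
    # Step 5: repeat steps 1-3 on the node that has not been upgraded yet.
    printf '%s: tar -xzf %s -C /tmp\n' "$second" "$pkg"
    printf '%s: ./upgrade.sh\n' "$second"
    # Step 8: stop the cluster from the current master.
    printf '%s: ufm_ha_cluster stop\n' "$first"
    # Step 9: upgrade the HA package, current standby first.
    printf '%s: ./install.sh --upgrade\n' "$second"
    printf '%s: ./install.sh --upgrade\n' "$first"
    # Step 10: start the cluster from the current master.
    printf '%s: ufm_ha_cluster start\n' "$first"
}
```

Calling ha_upgrade_plan nodeA nodeB, with nodeA as the initial standby, prints nine lines that can be reviewed before any command is run on a real cluster.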

© Copyright 2024, NVIDIA. Last updated on Jun 27, 2024.