UFM Cyber-AI OS Upgrade

NVIDIA UFM Cyber-AI Documentation v2.5.0

This section provides a step-by-step guide for upgrading the UFM Cyber-AI operating system.

Each UFM Cyber-AI Appliance software release includes an additional tar file with an -omu.tar suffix (OMU stands for OS Manufacture and Upgrade). This tar file can be used to re-manufacture the server and to upgrade the operating system and software on the server.

  1. Copy the OMU tar file to a temporary directory on the server.
    CyberAI - ufm-cyberai-appliance-<version>-<revision>-omu.tar

  2. Extract the contents of the tar file to /tmp:

    tar xf ./ufm-cyberai-appliance-<version>-<revision>-omu.tar -C /tmp/

  3. Change to the extracted directory:

    cd /tmp/ufm-cyberai-appliance-<version>-<revision>-omu

  4. An upgrade script and an ISO file are included in the extracted directory:

    # ls -1 ./
    ufm-os-upgrade.sh
    ufm-cyberai-appliance-<version>-<revision>.iso

    The upgrade script's help output lists the available flags:

    # ufm-os-upgrade.sh --help
    ufm-os-upgrade.sh will upgrade and install OS packages.

    IMPORTANT!!! a reboot is mandatory after the finalization of this script, kernel and kernel models will not work properly until the server is rebooted.

    Additional SW installations will be automatically invoked after reboot, a message will pop on all open terminals with the installation status:
    "UFM-OS-FIRSTBOOT-FAILURE" - if installation is failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if installation succeeded.

    additional info will be available in "/var/log/ufm_os_upgrade_<UFM-OS-VERSION>.log" log file.

    syntax: ufm-os-upgrade.sh [options]

    options
      --appliance-sw-upgrade   upgrade ufm_appliance SW as well, default is to upgrade OS only, P.S. only applicable for StandAlone installations.
      -d,--debug               debug info will be visible on the screen.
      -r,--reboot              Automatically reboot the server when upgrade is finished. P.S. if secure boot is enabled and a new certificate is enrolled the server will not automatically reboot even if this flag is set.
      -y,--yes                 wont prompt for user acknowledgements.
      -h,--help                print this help message.

    Important

    A system reboot is mandatory once the upgrade procedure is completed. The -r flag can be used to automatically reboot the server at the end of the upgrade. Note that some kernel modules may not work properly until the server is rebooted.
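
The extraction steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the appliance software: the function names omu_extract_dir and omu_check_contents are hypothetical, and the sketch assumes the tar file unpacks into a directory named after the tar file without its .tar suffix, as the commands in steps 2 and 3 suggest.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with UFM Cyber-AI).

# Print the directory the OMU tar file is expected to extract into.
# Assumption: the directory is named after the tar file minus ".tar".
omu_extract_dir() {
    omu_tar=$1
    omu_dest=${2:-/tmp}
    printf '%s/%s\n' "${omu_dest%/}" "$(basename "$omu_tar" .tar)"
}

# Succeed only if the directory holds the upgrade script and an ISO file.
omu_check_contents() {
    omu_dir=$1
    if [ ! -f "$omu_dir/ufm-os-upgrade.sh" ]; then
        echo "missing ufm-os-upgrade.sh in $omu_dir" >&2
        return 1
    fi
    for omu_iso in "$omu_dir"/ufm-cyberai-appliance-*.iso; do
        if [ -f "$omu_iso" ]; then
            return 0
        fi
    done
    echo "missing ISO file in $omu_dir" >&2
    return 1
}
```

A script could call omu_check_contents on the directory printed by omu_extract_dir before starting the upgrade, so that an incomplete extraction is caught early.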

Upgrading in Standalone Mode

  1. Stop the UFM and Cyber-AI services:

    systemctl stop ufm-enterprise.service
    systemctl stop ufm-cyberai.service

  2. Run the upgrade script:

    Warning

    System reboot is mandatory once the upgrade procedure is completed. The -r flag can be used to automatically reboot the server.

    To bypass user prompts, use the -y flag. This flag alone does not trigger an automatic reboot; to reboot automatically, combine it with the -r flag. The --appliance-sw-upgrade flag additionally upgrades the UFM Enterprise Appliance SW and the Cyber-AI SW; this upgrade is not enabled by default. In the following example, the server reboots automatically after the upgrade process is completed.

    ./ufm-os-upgrade.sh -y -r

    The following example adds the --appliance-sw-upgrade flag, so the UFM Enterprise Appliance SW is upgraded as well.

    ./ufm-os-upgrade.sh -y -r --appliance-sw-upgrade

  3. After the reboot completes, a systemd service (ufm-os-firstboot.service) runs the remainder of the upgrade procedure. When it finishes, a status message is displayed on all open terminals:
    "UFM-OS-FIRSTBOOT-FAILURE" - if the installation failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if the installation succeeded.
    Example:

    image2023-1-15_15-55-45.png

    To check the status manually, run systemctl status ufm-os-firstboot.service. If the service has already completed, the command reports that there is no such service. In that case, check the log file /var/log/ufm-os-firstboot.log instead.

    systemctl status ufm-os-firstboot.service

    Example:

    image2023-1-15_15-57-16.png
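
The manual status check in step 3 can be sketched as a helper that falls back to the log file when the service no longer exists. This is a minimal sketch under stated assumptions: the function name firstboot_status is hypothetical, and the sketch assumes that systemctl status exits with code 4 when the unit is not found, as the exit-status table in the systemctl man page describes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UFM Cyber-AI).
# Show the first-boot service status; if systemd reports that the unit
# does not exist, print the end of the first-boot log instead.
firstboot_status() {
    fb_log=${1:-/var/log/ufm-os-firstboot.log}
    # Capture the exit code without tripping "set -e" in calling scripts.
    systemctl status ufm-os-firstboot.service 2>/dev/null && fb_rc=0 || fb_rc=$?
    # Assumption: exit code 4 means "no such unit" (see man systemctl).
    if [ "$fb_rc" -eq 4 ]; then
        echo "ufm-os-firstboot.service not found; showing $fb_log" >&2
        tail -n 20 "$fb_log"
    fi
}
```

Called without arguments, the helper reads /var/log/ufm-os-firstboot.log; a different log path can be passed as the first argument.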

Upgrading in High-Availability Mode

In High-Availability (HA) mode, upgrade the standby node first and then the master node. The upgrade of each node is similar to the standalone (SA) instructions.

If the standby node is unavailable, the upgrade can be run on the master node only. However, some additional steps are required after the appliance is upgraded.

  1. [On the Standby Node]: Copy the OMU tar file to a temporary directory and extract it. Refer to Extracting the Software.

  2. [On the Standby Node]: Run the upgrade script.

    Warning

    System reboot is mandatory once the upgrade procedure is completed. The -r flag can be used to automatically reboot the server.

    The --appliance-sw-upgrade flag cannot be used to upgrade the UFM Enterprise Appliance SW in HA mode. If it is provided, the upgrade is not performed.

    The -y flag can be supplied to skip user prompts. This flag does not reboot the server on its own; for an automatic reboot, combine it with the -r flag.

    In the following example the server auto reboots once the upgrade procedure is completed:

    cd /tmp/ufm-cyberai-appliance-<version>-<revision>-omu
    ./ufm-os-upgrade.sh -y -r

  3. If the -r flag was not used and you answered "No" when the script asked whether to reboot, reboot the server manually:

    reboot now

  4. After the reboot completes, a systemd service (ufm-os-firstboot.service) runs the remainder of the upgrade procedure. When it finishes, a status message is displayed on all open terminals:
    "UFM-OS-FIRSTBOOT-FAILURE" - if the installation failed.
    "UFM-OS-FIRSTBOOT-SUCCESS" - if the installation succeeded.
    Example:

    image2023-1-15_15-55-45.png


    To verify the status manually, execute "systemctl status ufm-os-firstboot.service". If the service has already completed, an error message will be displayed indicating that the service does not exist. In such a scenario, refer to the log file located at /var/log/ufm-os-firstboot.log for checking the status.

    systemctl status ufm-os-firstboot.service

    Example:

    image2023-1-15_15-57-16.png

  5. After the standby node has finished the upgrade, check the HA cluster status:

    ufm_ha_cluster status

    image2023-3-16_21-11-14.png

    Every node within the cluster is expected to be operational while the present node remains in a stand-by mode (designated as Secondary in DRBD_ROLE).

  6. [On the Master Node]: Initiate a fail-over of UFM to the stand-by node, which will result in the upgraded node taking over as the master and the current node transitioning to a stand-by state.

    ufm_ha_cluster failover

    Wait until all the resources of UFM are up and functioning correctly on the upgraded node.

  7. Repeat the same procedure on the node that has not yet been upgraded, which is now functioning as the standby.
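
The flag rules for the two modes can be captured in a small guard that builds the upgrade command line. This is a minimal sketch, not part of the appliance software: the function name upgrade_cmd and its mode argument (standalone or ha) are hypothetical. It only prints the command it would run, and it rejects --appliance-sw-upgrade in HA mode, in line with the restriction stated in step 2.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with UFM Cyber-AI).
# Usage: upgrade_cmd <standalone|ha> [ufm-os-upgrade.sh flags...]
# Prints the upgrade command line; refuses --appliance-sw-upgrade in HA mode.
upgrade_cmd() {
    up_mode=$1
    shift
    case $up_mode in
        standalone|ha) ;;
        *) echo "unknown mode: $up_mode" >&2; return 2 ;;
    esac
    for up_flag in "$@"; do
        if [ "$up_mode" = ha ] && [ "$up_flag" = --appliance-sw-upgrade ]; then
            echo "--appliance-sw-upgrade is not supported in HA mode" >&2
            return 1
        fi
    done
    # Prepend the script name and print all words separated by spaces.
    set -- ./ufm-os-upgrade.sh "$@"
    echo "$*"
}
```

Printing the command instead of executing it keeps the sketch safe to try on any machine; an operator would review the output before running it from the extracted OMU directory.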

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.