High Availability

NVIDIA UFM Enterprise Appliance Software User Manual v1.4.1

UFM HA provides host-level high availability for UFM Enterprise appliances. The solution uses Pacemaker to monitor services and DRBD to synchronize file-system state.

UFM HA should be configured on two appliances, master and standby, via the configure_ha_nodes.sh tool. For detailed information on configuring UFM HA from the master (also known as main) appliance, refer to Installing UFM Server Software for High Availability. Because the UFM HA package and its related components (Pacemaker and DRBD) are already deployed, start at step 6 (Configure HA from the main server) and follow the instructions from there onward.

  • To manage the HA cluster, use the ufm_ha_cluster tool.
    ufm_ha_cluster Usage

    # ufm_ha_cluster --help
    ===================================================================
    UFM-HA version: 5.1.0-8
    -------------------------------------------------------------------
    Usage: ufm_ha_cluster [-h|--help] <command> [<options>]

    This script manages UFM HA cluster.

    Options:

    OPTIONS:
        -h|--help           Show this message

    COMMANDS:
        version             HA cluster version
        config              Configure HA cluster
        cleanup             Remove HA configurations
        status              Check HA cluster status
        failover            Master node failover
        takeover            Standby node takeover
        start               Start HA services
        stop                Stop HA services
        detach              Detach the standby from cluster
        attach              Attach a new standby to cluster
        enable-maintain     Enable maintenance to cluster
        disable-maintain    Disable maintenance to cluster
        reset               Reset DRBD connectivity from split-brain

  • For further information on each command, run:

    ufm_ha_cluster <command> --help

  • To check UFM HA cluster status, run:

    ufm_ha_cluster status

  • To start the UFM HA cluster, run:

    ufm_ha_cluster start

  • To stop the UFM HA cluster, run:

    ufm_ha_cluster stop

  • To make the master appliance become the standby appliance, execute the failover command on the master appliance. Run:

    ufm_ha_cluster failover

  • To make the standby appliance become the master appliance, execute the takeover command on the standby appliance. Run:

    ufm_ha_cluster takeover
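
The commands above can be wrapped in a small shell helper for use in operational scripts. The following is a minimal sketch, not part of UFM: the function names (is_ufm_ha_command, role_switch_command, run_ufm_ha) and the DRY_RUN variable are hypothetical and introduced here only for illustration. The helper checks a subcommand against the list printed by ufm_ha_cluster --help, and maps the role an appliance currently holds to the command that switches roles (failover on the master, takeover on the standby). It assumes the caller already knows which role the local appliance holds; it does not parse the output of ufm_ha_cluster status.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions; they are NOT shipped with UFM.

# Subcommands copied from the `ufm_ha_cluster --help` output (UFM-HA 5.1.0-8).
UFM_HA_COMMANDS="version config cleanup status failover takeover start stop detach attach enable-maintain disable-maintain reset"

# Return 0 if $1 is one of the known subcommands, 1 otherwise.
is_ufm_ha_command() {
    local candidate="$1" known
    for known in $UFM_HA_COMMANDS; do
        [ "$candidate" = "$known" ] && return 0
    done
    return 1
}

# Print the command that switches roles for the given current role:
# failover is run on the master, takeover is run on the standby.
role_switch_command() {
    case "$1" in
        master)  echo "failover" ;;
        standby) echo "takeover" ;;
        *)       return 1 ;;
    esac
}

# Validate the subcommand, then run it. With DRY_RUN=1 (an assumption of this
# sketch, not a tool option) the command line is only printed, not executed.
run_ufm_ha() {
    if ! is_ufm_ha_command "$1"; then
        echo "unknown ufm_ha_cluster command: $1" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ufm_ha_cluster $*"
    else
        ufm_ha_cluster "$@"
    fi
}
```

For example, after sourcing the functions on the standby appliance, setting DRY_RUN=1 and calling run_ufm_ha "$(role_switch_command standby)" prints the takeover command line without executing it, which allows the command to be reviewed before it is run for real.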

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.