NVIDIA UFM Enterprise Appliance Software User Manual v1.6.2

High Availability

UFM HA supports High-Availability on the host level for UFM Enterprise appliances. The solution uses Pacemaker to monitor services and DRBD to sync file-system states.

The diagram below describes the connectivity scheme of the UFM High-Availability cluster.

[Figure: connectivity scheme of the UFM High-Availability cluster]

UFM HA should be configured on two appliances, master and standby.

Important

High-availability should be configured on the standby node first. When that is complete, configure it on the master node.

Command Usage:

# ufm_ha_cluster config --help
Usage: ufm_ha_cluster config [<options>]

The config command configures ha add-on for ufm server.

Options:

Option                                    Description

-r | --role <node role>                   Node role (master or standby) - Mandatory
-e | --peer-primary-ip <ip address>       Peer node primary IP address - Mandatory
-l | --local-primary-ip <ip address>      Local node primary IP address - Mandatory
-E | --peer-secondary-ip <ip address>     Peer node secondary IP address - Mandatory
-L | --local-secondary-ip <ip address>    Local node secondary IP address - Mandatory
-i | --virtual-ip <virtual-ip>            Cluster virtual IP
-N | --no-vip                             Do not create virtual IP resource - Mutually exclusive with the virtual-IP option
                                          One of the two options (-i or -N) is mandatory
-p | --hacluster-pwd <pwd>                hacluster user password - Mandatory
-f | --ha-config-file <file path>         HA configuration file - The default is ufm-ha.conf
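
The option rules above (a role of master or standby, four mandatory IP addresses, and exactly one of --virtual-ip or --no-vip) can be checked before the command is run. The following is a minimal sketch, not part of the UFM tooling: build_ha_config_cmd is a hypothetical helper name, and the keyword "none" used to select --no-vip is an assumption made for this example. It only prints the command for review and leaves the password as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UFM): validates its inputs and prints,
# but does not run, a ufm_ha_cluster config command built from the documented options.
# Arguments: role local_primary peer_primary local_secondary peer_secondary vip
# Passing the literal word "none" as vip selects --no-vip (an assumption of this sketch).
build_ha_config_cmd() {
    if [ "$#" -ne 6 ]; then
        echo "error: expected 6 arguments" >&2
        return 2
    fi

    # The role must be one of the two documented values.
    case "$1" in
        master|standby) ;;
        *) echo "error: role must be master or standby" >&2; return 1 ;;
    esac

    # All addresses (and the vip/none selector) are mandatory.
    for value in "$2" "$3" "$4" "$5" "$6"; do
        if [ -z "$value" ]; then
            echo "error: empty argument" >&2
            return 1
        fi
    done

    # Exactly one of --virtual-ip / --no-vip ends up in the command.
    if [ "$6" = "none" ]; then
        vip_opt="--no-vip"
    else
        vip_opt="--virtual-ip $6"
    fi

    printf 'ufm_ha_cluster config -r %s --local-primary-ip %s --peer-primary-ip %s --local-secondary-ip %s --peer-secondary-ip %s %s --hacluster-pwd <password>\n' \
        "$1" "$2" "$3" "$4" "$5" "$vip_opt"
}
```

For example, build_ha_config_cmd standby 10.0.0.2 10.0.0.1 192.168.1.2 192.168.1.1 none prints a standby command that ends with --no-vip; the addresses shown here are illustrative only.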

Configure HA with VIP (Virtual IP)

  1. [On Standby Server] Run the following command to configure Standby Server:

    ufm_ha_cluster config -r standby \
      --local-primary-ip <local back-to-back IP> \
      --peer-primary-ip <peer back-to-back IP> \
      --local-secondary-ip <local management IP> \
      --peer-secondary-ip <peer management IP> \
      --virtual-ip <virtual management IP used for accessing the master node> \
      --hacluster-pwd <password>

  2. [On Master Server] Run the following command to configure Master Server:

    ufm_ha_cluster config -r master \
      --local-primary-ip <local back-to-back IP> \
      --peer-primary-ip <peer back-to-back IP> \
      --local-secondary-ip <local management IP> \
      --peer-secondary-ip <peer management IP> \
      --virtual-ip <virtual management IP used for accessing the master node> \
      --hacluster-pwd <password>

Alternatively, you can run the CLI command ufm ha configure.

Important

After configuration, you must wait for the DRBD sync to finish before starting the UFM cluster. To check the DRBD sync status, run:

ufm_ha_cluster status
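
Waiting for the sync can be scripted by polling the status command. The sketch below rests on an assumption the manual does not confirm: that the output of ufm_ha_cluster status reports DRBD disk and replication states using DRBD's own state names (UpToDate, Inconsistent, SyncSource, SyncTarget). The function names drbd_sync_done and wait_for_drbd_sync are hypothetical; check the actual status output on your appliance and adjust the patterns before relying on it.

```shell
#!/bin/sh
# Assumption: the status text uses DRBD's state names. Verify this against the
# real output of "ufm_ha_cluster status" before use.

# Reads status text on stdin. Succeeds only if the text mentions UpToDate and
# shows no sign of an unfinished sync (Inconsistent, SyncSource, SyncTarget).
drbd_sync_done() {
    status_text=$(cat)
    case "$status_text" in
        *Inconsistent*|*SyncSource*|*SyncTarget*) return 1 ;;
    esac
    case "$status_text" in
        *UpToDate*) return 0 ;;
    esac
    return 1
}

# Polls "ufm_ha_cluster status" until the sync looks complete.
# $1: seconds between polls (default 30); $2: maximum number of polls (default 120).
wait_for_drbd_sync() {
    interval=${1:-30}
    max_tries=${2:-120}
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if ufm_ha_cluster status 2>/dev/null | drbd_sync_done; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    echo "DRBD sync did not complete within the allotted polls" >&2
    return 1
}
```

A possible use is to run wait_for_drbd_sync 30 120 and start the cluster with ufm_ha_cluster start only if it returns successfully.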


Configure HA without VIP (on a Dual Subnet)

Warning

Replace the placeholder values in the commands below with the values for your setup.

  1. [On Standby Server] Run the following command to configure Standby Server:

    ufm_ha_cluster config -r standby \
      --local-primary-ip <local back-to-back IP> \
      --peer-primary-ip <peer back-to-back IP> \
      --local-secondary-ip <local management IP> \
      --peer-secondary-ip <peer management IP> \
      --hacluster-pwd <password> \
      --no-vip

  2. [On Master Server] Run the following command to configure Master Server:

    ufm_ha_cluster config -r master \
      --local-primary-ip <local back-to-back IP> \
      --peer-primary-ip <peer back-to-back IP> \
      --local-secondary-ip <local management IP> \
      --peer-secondary-ip <peer management IP> \
      --hacluster-pwd <password> \
      --no-vip

Alternatively, you can run the CLI command ufm ha configure dual-subnet.

Important

After configuration, you must wait for the DRBD sync to finish before starting the UFM cluster. To check the DRBD sync status, run:

ufm_ha_cluster status


  • To manage the HA cluster, use the ufm_ha_cluster tool.

    ufm_ha_cluster Usage

    # ufm_ha_cluster --help
    ===================================================================
    UFM-HA version: 5.3.0-17
    -------------------------------------------------------------------

    Usage: ufm_ha_cluster [-h|--help] <command> [<options>]
    This script manages UFM HA cluster.

    Options:

    OPTIONS:
      -h|--help          Show this message

    COMMANDS:
      version            HA cluster version
      config             Configure HA cluster
      cleanup            Remove HA configurations
      status             Check HA cluster status
      failover           Master node failover
      takeover           Standby node takeover
      start              Start HA services
      stop               Stop HA services
      detach             Detach the standby from cluster
      attach             Attach a new standby to cluster
      enable-maintain    Enable maintenance to cluster
      disable-maintain   Disable maintenance to cluster
      reset              Reset DRBD connectivity from split-brain
      is-master          check if the current node is a master
      is-running         check if ufm services are running
      is-ha              Check if running in HA mode

  • For further information on each command, run:

    ufm_ha_cluster <command> --help

  • To check UFM HA cluster status, run:

    ufm_ha_cluster status

  • To start the UFM HA cluster, run:

    ufm_ha_cluster start

  • To stop the UFM HA cluster, run:

    ufm_ha_cluster stop

  • To make the master appliance become the standby appliance, execute the failover command on the master appliance. Run:

    ufm_ha_cluster failover

  • To make the standby appliance become the master appliance, execute the takeover command on the standby appliance. Run:

    ufm_ha_cluster takeover
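
Because failover is meant to be run on the master and takeover on the standby, a small guard can check the node's role first. This is a sketch under an assumption the manual does not state: that ufm_ha_cluster is-master exits with status 0 on the master node and with a non-zero status otherwise. The wrapper name switch_role is hypothetical; confirm how is-master reports its result on your appliance before using anything like this.

```shell
#!/bin/sh
# Assumption (not confirmed by the manual): "ufm_ha_cluster is-master" exits
# with status 0 on the master node and non-zero on the standby node.

# Hypothetical wrapper: runs failover only on the master and takeover only on
# the standby, and refuses any other action.
switch_role() {
    action=$1
    case "$action" in
        failover)
            if ! ufm_ha_cluster is-master >/dev/null 2>&1; then
                echo "refusing failover: this node does not report as master" >&2
                return 1
            fi
            ;;
        takeover)
            if ufm_ha_cluster is-master >/dev/null 2>&1; then
                echo "refusing takeover: this node already reports as master" >&2
                return 1
            fi
            ;;
        *)
            echo "usage: switch_role failover|takeover" >&2
            return 2
            ;;
    esac
    # The role check passed, so hand over to the documented command.
    ufm_ha_cluster "$action"
}
```

With this in place, switch_role failover on the master or switch_role takeover on the standby runs the corresponding documented command, while the same call on the wrong node returns an error without touching the cluster.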

Warning

For additional information on configuring UFM HA, please refer to Installing UFM Server Software for High Availability. Since the UFM HA package and related components (i.e., Pacemaker and DRBD) are already deployed, follow the instructions from step 6 (Configure HA from the main server) onward.

© Copyright 2023, NVIDIA. Last updated on Mar 7, 2024.