NVIDIA UFM High-Availability User Guide v5.3.1

Installation and Configuration

The UFM HA package can be downloaded by running the following command:

wget https://download.nvidia.com/ufm/ufm_ha_5.3.1-2.tgz

Install the UFM HA package on both machines (master and standby), together with the required UFM products. The installation order does not matter. To install the UFM-HA package:

  • Untar the ufm-ha package:

    tar xvzf ufm_ha_<version>.tgz

  • Change to the extracted directory and run the installation script. For example:

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

    For NFS support, run the installation script without the -d option. For example:

    ./install.sh -l /opt/ufm/files/ -p enterprise

    Option   Description
    -l       Sync files location. Must always be /opt/ufm/files/
    -d       Disk name for DRBD, for example /dev/sda5 (only when using DRBD)
    -p       Product name. Must be "enterprise" for UFM Enterprise
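The two installation variants above differ only in the -d option. The following is a minimal sketch of that choice, assuming a hypothetical helper named build_install_cmd that is not part of the UFM HA package; it prints the command line instead of running it, so it can be reviewed first.

```shell
# Sketch only: assembles the install.sh command line described above.
# build_install_cmd is a hypothetical helper, not part of the UFM HA package.

SYNC_DIR="/opt/ufm/files/"   # the guide states that -l must always be this path

build_install_cmd() {
    local mode="$1"   # "drbd" or "nfs"
    local disk="$2"   # DRBD partition, for example /dev/sda5 (unused for nfs)
    case "$mode" in
        drbd)
            # DRBD needs a dedicated partition, passed with -d
            [ -n "$disk" ] || { echo "drbd mode needs a disk name" >&2; return 1; }
            echo "./install.sh -l $SYNC_DIR -d $disk -p enterprise"
            ;;
        nfs)
            # NFS mode omits -d entirely
            echo "./install.sh -l $SYNC_DIR -p enterprise"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `build_install_cmd drbd /dev/sda5` prints the DRBD form of the command, and `build_install_cmd nfs` prints the NFS form.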

UFM HA scripts are installed under /usr/local/bin.

There are two methods to configure the HA cluster:

Configure HA with SSH Trust

  1. On the master server only, configure the HA nodes. To do so, from /tmp, run the configure_ha_nodes.sh command as shown in the example below:

    configure_ha_nodes.sh --cluster-password 12345678 \
        --master-primary-ip 10.10.10.1 \
        --standby-primary-ip 10.10.10.2 \
        --master-secondary-ip 192.168.10.1 \
        --standby-secondary-ip 192.168.10.2 \
        --virtual-ip 10.10.10.5

    Warning

    The script configure_ha_nodes.sh is located under /usr/local/bin/. Therefore, by default, you do not need to use the full path to run it.

    Warning

    The --cluster-password must be at least 8 characters long.

    Warning

    For PCS version 0.9.X with an HA sync interface that uses back-to-back ports and local IP addresses, add the relevant IP addresses and hostnames to the /etc/hosts file. This allows the HA configuration to resolve hostnames to the specific IP addresses in use.

    Warning

    configure_ha_nodes.sh requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server while the configuration runs.

    Option                     Description
    --cluster-password         UFM HA cluster password for authentication by the pacemaker.
    --master-ip                Master (main) server IP address
    --standby-ip               Standby server IP address
    --virtual-ip OR --no-vip   UFM HA cluster virtual IP, or configure HA without a virtual IP

  2. Wait for the configuration process to complete and for the DRBD sync to finish. The time required depends on the size of your partition.
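Two of the warnings above lend themselves to a quick check before running configure_ha_nodes.sh: the 8-character minimum for the cluster password and the /etc/hosts entries. The following is a minimal sketch, assuming hypothetical helper names (check_cluster_password, hosts_entries) and placeholder hostnames (ufm-master, ufm-standby) that do not come from this guide. Which addresses belong in /etc/hosts depends on your setup.

```shell
# Sketch only: both helpers are hypothetical and not part of the UFM HA package.

# Succeeds when the password meets the 8-character minimum stated above.
check_cluster_password() {
    local pw="$1"
    [ "${#pw}" -ge 8 ]
}

# Prints one /etc/hosts line per node as "<ip> <hostname>".
# Hostnames passed in are placeholders chosen by the caller.
hosts_entries() {
    local master_ip="$1" master_name="$2" standby_ip="$3" standby_name="$4"
    printf '%s %s\n' "$master_ip" "$master_name"
    printf '%s %s\n' "$standby_ip" "$standby_name"
}
```

For example, `hosts_entries 192.168.10.1 ufm-master 192.168.10.2 ufm-standby` prints two lines that can be reviewed and then appended to /etc/hosts on both servers.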

Configure HA without SSH Trust

If you cannot establish an SSH trust between your HA servers, you can use ufm_ha_cluster directly to configure HA. You can see all the options for configuring HA in the Help menu:

ufm_ha_cluster config -h

Usage:

ufm_ha_cluster config [<options>] 

Option                                  Description
-r, --role <node role>                  Node role (master or standby).
-e, --peer-primary-ip <ip address>      Peer node primary IP address (mandatory).
-l, --local-primary-ip <ip address>     Local node primary IP address (mandatory).
-E, --peer-secondary-ip <ip address>    Peer node secondary IP address (mandatory).
-L, --local-secondary-ip <ip address>   Local node secondary IP address (mandatory).
-i, --virtual-ip <virtual-ip>           Cluster virtual IP (should be used for master only).
-p, --hacluster-pwd <pwd>               HA cluster user password.
-h, --help                              Show this message.
-N, --no-vip                            Configure HA without virtual IP.

To configure HA, follow the instructions below:

Warning

Replace the placeholders in the commands below with the values for your setup.

  1. [On Standby Server] Run the following command to configure Standby Server:

    ufm_ha_cluster config -r standby -e <peer primary ip address> -l <local primary ip address> -E <peer secondary ip address> -L <local secondary ip address> -p <cluster_password>

  2. [On Master Server] Run the following command to configure Master Server:

    ufm_ha_cluster config -r master -e <peer primary ip address> -l <local primary ip address> -E <peer secondary ip address> -L <local secondary ip address> -p <cluster_password> -i <virtual ip address>
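Because "peer" and "local" swap meaning between the two servers, it is easy to mix up the addresses. The following is a minimal sketch of how the two commands above line up, reusing the example addresses from the configure_ha_nodes.sh example in this guide. The helper print_config_cmd is hypothetical; it only prints the command for review and leaves the password as a placeholder.

```shell
# Sketch only: print_config_cmd is a hypothetical helper that prints
# (does not run) the ufm_ha_cluster config command for each role.

# Example addresses taken from the configure_ha_nodes.sh example in this guide.
MASTER_PRIMARY=10.10.10.1
STANDBY_PRIMARY=10.10.10.2
MASTER_SECONDARY=192.168.10.1
STANDBY_SECONDARY=192.168.10.2
VIRTUAL_IP=10.10.10.5

print_config_cmd() {
    local role="$1"
    case "$role" in
        standby)
            # On the standby, the peer is the master and local is the standby.
            echo "ufm_ha_cluster config -r standby -e $MASTER_PRIMARY -l $STANDBY_PRIMARY -E $MASTER_SECONDARY -L $STANDBY_SECONDARY -p <cluster_password>"
            ;;
        master)
            # On the master, the roles are reversed and the virtual IP is added.
            echo "ufm_ha_cluster config -r master -e $STANDBY_PRIMARY -l $MASTER_PRIMARY -E $STANDBY_SECONDARY -L $MASTER_SECONDARY -p <cluster_password> -i $VIRTUAL_IP"
            ;;
        *)
            echo "role must be master or standby" >&2
            return 1
            ;;
    esac
}
```

Run `print_config_cmd standby` first and `print_config_cmd master` second to match the order of the steps above.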

Certain versions of DRBD do not support synchronizing the file system across more than two nodes within a cluster. In such cases, NFS is used instead. To activate this functionality, users must define the following parameters:

  • Mode: NFS

  • NFS Server

  • Shared Folder

Ensure that the NFS server supports NFSv4. Refer to the Configuration File section below for details on configuring the file.

The UFM-HA cluster can comprise more than two nodes. Among these nodes, one serves as the master, while the others operate in standby mode.

To configure multiple nodes, users must populate the configuration file '/etc/ha_nodes.cfg' on all nodes (ensuring that the file is identical across all nodes).

This file contains details about each participating node, including:

  • Role: Master/Standby

  • Primary IP address

  • Secondary IP address
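Since the file must be identical on all nodes, comparing checksums is one simple way to confirm that. The following is a minimal sketch using sha256sum from GNU coreutils; the helper names cfg_checksum and same_cfg are hypothetical and not part of the UFM HA package.

```shell
# Sketch only: cfg_checksum and same_cfg are hypothetical helpers.

# Prints the SHA-256 digest of the given file.
cfg_checksum() {
    sha256sum "$1" | awk '{print $1}'
}

# Succeeds only when both files are readable and have identical content.
same_cfg() {
    [ -r "$1" ] && [ -r "$2" ] &&
        [ "$(cfg_checksum "$1")" = "$(cfg_checksum "$2")" ]
}
```

In practice, run `cfg_checksum /etc/ha_nodes.cfg` on every node and compare the printed digests, or copy the files to one machine and use `same_cfg` on each pair.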

Using File Configuration

The '/etc/ha_nodes.cfg' file contains all the necessary information for HA configuration and can serve as a replacement for command-line configuration. For security reasons, the password is the only setting not saved in the file.

After filling in the configuration file, run the following command to apply it:

ufm_ha_cluster config -p <password>


Configuration File

The sample configuration file includes up to three sections for nodes, but users can add additional sections as needed.

[General]
# Number of nodes in the cluster, one is master and others are standby
# Set this number according to the number of configured nodes
nodes_number = 2
# Connection mode
# in case dual_link is true, each node must have primary and secondary IPs
dual_link = true

[Node.1]
# valid role options: master/standby
role = master
# Mandatory
primary_ip =
# Mandatory if dual_link = true
secondary_ip =

[Node.2]
role = standby
primary_ip =
secondary_ip =

[Node.3]
role = standby
primary_ip =
secondary_ip =

# Add other Node.x sections if needed.

[Virtual]
# If virtual IP should not be added, set `virtual_ip = no-vip`
virtual_ip =
# when using BGP virtual IP, you must use the loopback interface, set `interface = lo`
# in other cases we let the pcs to decide on the relevant network interface.
interface =

[FileSync]
# valid options are: drbd/nfs
mode = nfs

[NFS]
# fill in case the FileSync.mode is nfs
nfs_server =
shared_folder =
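As an illustration, the sample file could be filled in for a two-node DRBD cluster as follows. This is a sketch under assumptions: the addresses are the example values used elsewhere in this guide, the helper write_ha_nodes_cfg is hypothetical, and leaving `interface` and the [NFS] values empty follows the comments in the sample file rather than a tested deployment.

```shell
# Sketch only: write_ha_nodes_cfg is a hypothetical helper that writes a
# two-node DRBD configuration to the given path (default: /etc/ha_nodes.cfg).
write_ha_nodes_cfg() {
    local target="${1:-/etc/ha_nodes.cfg}"
    cat > "$target" <<'EOF'
[General]
nodes_number = 2
dual_link = true

[Node.1]
role = master
primary_ip = 10.10.10.1
secondary_ip = 192.168.10.1

[Node.2]
role = standby
primary_ip = 10.10.10.2
secondary_ip = 192.168.10.2

[Virtual]
virtual_ip = 10.10.10.5
interface =

[FileSync]
mode = drbd

[NFS]
nfs_server =
shared_folder =
EOF
}
```

Writing to a scratch path first, for example `write_ha_nodes_cfg /tmp/ha_nodes.cfg`, allows the result to be reviewed before it is copied to /etc/ha_nodes.cfg on every node.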


Showing UFM HA Version

Run the following command to show the UFM HA version:

ufm_ha_cluster version


Starting UFM HA Cluster

Warning

Before starting the UFM cluster, ensure that the DRBD sync is completed.

To start the UFM HA cluster:

ufm_ha_cluster start
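The guide does not say how to confirm that the DRBD sync is complete. One possible check, assuming a DRBD 8.x kernel module that reports per-resource state in /proc/drbd (DRBD 9 reports state through `drbdadm status` instead), is sketched below. The helper drbd_synced is hypothetical and not part of the UFM HA package.

```shell
# Sketch only: drbd_synced is a hypothetical helper. It assumes the
# DRBD 8.x /proc/drbd format, where each resource line carries a field
# such as "ds:UpToDate/UpToDate" (local disk state / peer disk state).
drbd_synced() {
    local status_file="${1:-/proc/drbd}"
    # At least one resource line must be present...
    grep -q 'ds:' "$status_file" 2>/dev/null || return 1
    # ...and no resource may report anything other than UpToDate/UpToDate.
    ! grep 'ds:' "$status_file" | grep -qv 'ds:UpToDate/UpToDate'
}
```

A typical use would be `drbd_synced && ufm_ha_cluster start`, so that the cluster is started only when both sides report an up-to-date disk.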


Checking UFM Cluster Status

To check the UFM HA cluster status:

ufm_ha_cluster status 


Stopping UFM HA Cluster

To stop the UFM HA cluster:

ufm_ha_cluster stop 


Takeover Services

Run the takeover command on the standby machine to make it the master.

ufm_ha_cluster takeover


Master Failover

Run the failover command on the master machine to make it the standby.

ufm_ha_cluster failover


Replacing the Standby Node

  • Install the HA package on the new standby node.

  • Disconnect the old standby node and run the following command on the master node:

    ufm_ha_cluster detach

  • Configure the new standby node. For details, refer to the configuration instructions above.

  • Connect the new standby to the cluster by running the following command on the master node:

    ufm_ha_cluster attach -l <local primary ip address> -e <peer primary ip address> -E <peer secondary ip address> -p <cluster_password>
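Because the attach command runs on the master, "local" refers to the master and "peer" refers to the new standby. The following is a minimal sketch of that mapping, assuming a hypothetical helper print_attach_cmd that only prints the command for review; the addresses passed to it are supplied by the caller, and the password stays a placeholder.

```shell
# Sketch only: print_attach_cmd is a hypothetical helper that prints
# (does not run) the attach command issued on the master node.
print_attach_cmd() {
    local master_primary="$1"      # local node: the master
    local standby_primary="$2"     # peer node: the new standby
    local standby_secondary="$3"   # peer node secondary address
    if [ "$#" -ne 3 ]; then
        echo "usage: print_attach_cmd <master primary> <standby primary> <standby secondary>" >&2
        return 1
    fi
    echo "ufm_ha_cluster attach -l $master_primary -e $standby_primary -E $standby_secondary -p <cluster_password>"
}
```

For example, with the addresses used earlier in this guide, `print_attach_cmd 10.10.10.1 10.10.10.2 192.168.10.2` prints the command to run on the master.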

Uninstalling UFM HA

To uninstall UFM HA, first stop the cluster and then run the uninstallation command as follows:

/opt/ufm/ufm_ha/uninstall_ha.sh


© Copyright 2023, NVIDIA. Last updated on Dec 12, 2023.