NVIDIA UFM High-Availability User Guide v6.2.1

Installation and Configuration

The UFM HA package can be downloaded by running the following command:


wget http://www.mellanox.com/downloads/UFM/ufm_ha_6.1.1-2.tgz

Install the UFM HA package on both machines (master and standby), along with the required UFM products. The installation order does not matter.

To install the UFM-HA package:

  • Untar the ufm-ha package:


    tar xvzf ufm-ha-<version>.tgz

  • Go to the directory you extracted and run the installation script. For example:


    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

    For NFS support, run the installation script without the -d option. For example:


    ./install.sh -l /opt/ufm/files/ -p enterprise

    Option    Description
    -l        Sync files location. Must always be /opt/ufm/files/
    -d        Disk name for DRBD, for example /dev/sda5. Not needed when using NFS.
    -p        Product name. Use "enterprise" for UFM Enterprise.
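As a rough illustration of how the two invocations differ, the sketch below assembles the install command for either sync method and prints it rather than running it. The helper name build_install_cmd and its mode argument are hypothetical and not part of the UFM HA package; only the install.sh options are taken from this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of UFM HA): prints the install.sh command
# for the chosen sync method instead of executing it.
#   $1 = "drbd" or "nfs"
#   $2 = DRBD disk name (needed for drbd only), e.g. /dev/sda5
build_install_cmd() {
    mode="$1"
    disk="${2:-}"
    case "$mode" in
        drbd)
            if [ -z "$disk" ]; then
                echo "error: drbd mode needs a disk name" >&2
                return 1
            fi
            echo "./install.sh -l /opt/ufm/files/ -d $disk -p enterprise"
            ;;
        nfs)
            # The -d option is not needed with NFS.
            echo "./install.sh -l /opt/ufm/files/ -p enterprise"
            ;;
        *)
            echo "error: unknown mode '$mode'" >&2
            return 1
            ;;
    esac
}

build_install_cmd drbd /dev/sda5
build_install_cmd nfs
```

Printing the command first makes it easy to review the options before running the script on both machines.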

Info

If a previous version of ufm_ha is already installed and you want to upgrade to the newer version, run the following command:


./install.sh -u

Info

UFM HA scripts are installed under /usr/bin.

There are two methods to configure the HA cluster, depending on how the configuration procedure is orchestrated:

Manual (per-node) configuration

The user runs the configuration steps separately on each node, starting with the standby node.

This method does not require SSH trust between the nodes.

Orchestrated (cluster-wide) configuration

The ufm_ha_cluster procedure runs once on the master node and orchestrates the configuration remotely across all nodes.

This method requires passwordless SSH trust between the nodes.

Configure HA Manual (Per-Node) Mode

Use this method when you prefer to control the configuration process on each node individually, or when SSH trust cannot be established between the HA servers.

In this mode, the user is responsible for running the configuration commands separately on each node using ufm_ha_cluster config.

You can view all available configuration options in the Help menu:


ufm_ha_cluster config -h

Usage:


ufm_ha_cluster config [<options>] 

To configure the HA cluster in per-node mode, you must configure all standby nodes first and the master node last.

This order is required to ensure the cluster is correctly created on the master node; if the master is configured first, synchronization with standby nodes may fail.

Example for a 2-node cluster:

On the first node (Standby):


ufm_ha_cluster config -r standby -e <peer primary IP> -l <local primary IP> -E <peer secondary IP> -L <local secondary IP> -p <cluster_password> [...]

On the second node (Master):


ufm_ha_cluster config -r master -e <peer primary IP> -l <local primary IP> -E <peer secondary IP> -L <local secondary IP> -p <cluster_password> -i <virtual-ip> [...]
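Because the local and peer addresses swap between the two nodes, it can help to see both commands filled in side by side. The sketch below only prints the commands. The helper name print_config_cmd is hypothetical, and the addresses come from the reserved documentation ranges (192.0.2.0/24 and 198.51.100.0/24), not from a real setup. The password placeholder is left unfilled on purpose.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the per-node config command.
#   $1 role, $2 local primary IP, $3 peer primary IP,
#   $4 local secondary IP, $5 peer secondary IP, $6 virtual IP (master only)
print_config_cmd() {
    cmd="ufm_ha_cluster config -r $1 -e $3 -l $2 -E $5 -L $4 -p <cluster_password>"
    if [ "$1" = "master" ] && [ -n "${6:-}" ]; then
        cmd="$cmd -i $6"
    fi
    echo "$cmd"
}

# Example addresses (assumed for illustration only):
#   master:  primary 192.0.2.10, secondary 198.51.100.10
#   standby: primary 192.0.2.11, secondary 198.51.100.11
# Standby first, master last.
print_config_cmd standby 192.0.2.11 192.0.2.10 198.51.100.11 198.51.100.10
print_config_cmd master  192.0.2.10 192.0.2.11 198.51.100.10 198.51.100.11 192.0.2.100
```

Note that each node passes its own addresses to -l and -L and the other node's addresses to -e and -E.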

All available configuration options for the ufm_ha_cluster config command are listed in the table below:

Option                                   Description
-r, --role <node role>                   Node role (master or standby)
-e, --peer-primary-ip <ip address>       Peer node primary IP address (mandatory)
-l, --local-primary-ip <ip address>      Local node primary IP address (mandatory)
-E, --peer-secondary-ip <ip address>     Peer node secondary IP address (mandatory)
-L, --local-secondary-ip <ip address>    Local node secondary IP address (mandatory)
-i, --virtual-ip <virtual-ip>            Cluster virtual IP (auto-detects IPv4/IPv6)
--virtual-ip4 <virtual-ip>               IPv4 virtual IP (deprecated; use --virtual-ip)
--virtual-ip6 <virtual-ip>               IPv6 virtual IP (deprecated; use --virtual-ip)
--vip-interface <interface>              Network interface for VIP (e.g., lo for BGP)
-N, --no-vip                             Configure HA without virtual IP
-M, --ignore-mgmt-failure                Ignore management interface status if VIP is configured. Will not fail over if the master node's secondary IP is down.
--file-sync-mode <mode>                  File sync mode: drbd or external-storage
--drbd-data-mode <mode>                  DRBD data mode: ordered or journal
-D, --drbd-dual-primary                  Enable DRBD dual-primary mode for UFM active-active (default: single-primary)
--enable-multinode                       Add UFM Infra services (ufm-redis-mgr, ufm-infra). Use this option for UFM Infra setup.
--enable-single-link                     Enable single network interface mode
--ha-params-file                         Path to ha_nodes.cfg configuration file (default: /etc/ufm_ha/ha_nodes.cfg)
-p, --hacluster-pwd <pwd>                HA cluster user password. Must be at least 8 characters long.
--configure-all-nodes                    Configure all cluster nodes (via SSH) before configuring master. Run only from master node. Requires SSH trust.
-h, --help                               Show this message
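Since the cluster password must be at least 8 characters long, a small pre-check can catch a short password before the config command is run. This is a minimal sketch; the function name valid_hacluster_pwd is hypothetical and the tool itself may apply further rules not covered here.

```shell
#!/bin/sh
# Hypothetical pre-check: succeeds when the password has at least 8 characters.
valid_hacluster_pwd() {
    [ "${#1}" -ge 8 ]
}

# Example use before calling ufm_ha_cluster config:
#   valid_hacluster_pwd "$pwd" || { echo "password too short" >&2; exit 1; }
```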

Note

Modify the configuration command options to match your specific setup (network, storage, etc.).

Note

For the HA sync interface to work with PCS version 0.9.x when using back-to-back ports with local IP addresses, add the relevant IP addresses and hostnames to the /etc/hosts file. This allows the HA configuration to resolve hostnames to the specific IP addresses in use.

Note

When configuring UFM HA on Oracle Linux, make sure SELinux is disabled. You can check the SELinux status with sestatus.

If it is enabled, follow the steps below to disable it:

  • Run vi /etc/selinux/config

  • Set SELINUX=disabled

  • Reboot the machine

  • Verify that SELinux is disabled by running sestatus.
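The edit step above can also be scripted. The sketch below is one possible non-interactive equivalent, assuming GNU sed (for the -i option); the function name set_selinux_disabled is hypothetical. It takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: sets SELINUX=disabled in the given config file.
# Replaces an existing SELINUX= line, or appends one if none exists.
# The SELINUXTYPE= line is left untouched.
set_selinux_disabled() {
    cfg="$1"
    if grep -q '^SELINUX=' "$cfg"; then
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
    else
        printf 'SELINUX=disabled\n' >> "$cfg"
    fi
}

# Example use (as root), followed by a reboot and a check with sestatus:
#   set_selinux_disabled /etc/selinux/config
```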


Configure HA Orchestrated (Cluster-Wide) Mode

Use this method when SSH trust is established between the cluster nodes.

In this mode, a single ufm_ha_cluster config command is executed on the master node, and the tool orchestrates the configuration process remotely across all cluster nodes.

To configure all nodes, add the --configure-all-nodes option to the ufm_ha_cluster config command, as shown in the example below.


ufm_ha_cluster config -r master --configure-all-nodes <...>

Note

The --configure-all-nodes option requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server while the configuration runs.
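To find out in advance whether passwordless SSH trust is in place, a quick probe from the master node can help. The sketch below uses the standard OpenSSH BatchMode and ConnectTimeout options; the function name has_ssh_trust and the host name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Succeeds only if key-based login to the host works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
#   $1 = standby host (or user@host)
has_ssh_trust() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example use on the master node (host name is a placeholder):
#   if has_ssh_trust standby-node; then
#       echo "SSH trust is configured"
#   else
#       echo "no SSH trust; expect a password prompt" >&2
#   fi
```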


DRBD is used to sync the file system between the two nodes. It is the default sync method unless stated otherwise in the configuration file (see the "Using File Configuration" section below). The DRBD disk that was assigned during the installation phase is mounted as a file system directory. The default mount option is "data=ordered". It can be overridden in the "DRBD" section of the configuration file by setting the data option to "journal", which offers the highest level of data integrity but can impact write performance.

The NFS synchronization mechanism can be used instead of DRBD. Multi-node support is available only with the NFS synchronization mechanism, as described in the following section. To activate this functionality, define the following parameters:

  • Mode: NFS

  • NFS Server

  • Shared Folder

Ensure that the NFS server supports NFSv4. It is recommended that the NFS server is not one of the UFM-HA nodes.

Refer to the Using File Configuration section below for details on configuring HA with NFS.

UFM-HA active-active mode manages additional services required for UFM active-active deployments (UFM-Infra) and provides additional storage options for this setup.

Multinode Option

The --enable-multinode option enables UFM infra services (ufm-redis-mgr, ufm-infra).


ufm_ha_cluster config ... --enable-multinode

Or set in /etc/ufm_ha/ha_nodes.cfg:

[General]
enable_multinode = true

Storage Options

Option                    Use Case
External Storage (NFS)    External NFS server available
DRBD Dual-Primary         No external storage server required (Ubuntu 24.04+)


External Storage (NFS)

CLI:


ufm_ha_cluster config --role <standby|master> \
    -l <local-ip> -e <peer-ip> \
    --file-sync-mode external-storage \
    --enable-multinode

Or set in /etc/ufm_ha/ha_nodes.cfg:

[FileSync]
mode = external-storage

Note: NFS mount must be configured separately on both nodes before running HA config.
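Since the NFS mount has to exist on both nodes before the HA configuration runs, a pre-check along the following lines may be useful. It is a sketch that relies on findmnt from util-linux; the function name is_nfs4_mount is hypothetical, and the mount point passed to it is an assumption, since this guide does not state where the share must be mounted.

```shell
#!/bin/sh
# Succeeds if the given path sits on a mount whose file system type is nfs4.
#   $1 = path to check (which path to use is an assumption; adjust to your setup)
is_nfs4_mount() {
    fstype=$(findmnt -n -o FSTYPE -T "$1") || return 1
    [ "$fstype" = "nfs4" ]
}

# Example use on each node before running ufm_ha_cluster config:
#   is_nfs4_mount /opt/ufm/files || echo "NFSv4 share is not mounted" >&2
```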

DRBD Dual-Primary

Prerequisites: Ubuntu 24.04+, ocfs2-tools, port 7777 open.

CLI:


ufm_ha_cluster config --role <standby|master> \
    -l <local-ip> -e <peer-ip> \
    --drbd-dual-primary \
    --enable-multinode

The UFM-HA cluster can comprise more than two nodes. One of these nodes serves as the master, while the others operate in standby mode.

To configure multiple nodes, users must populate the configuration file '/etc/ufm_ha/ha_nodes.cfg' on all nodes (ensuring that the file is identical across all nodes).

This file contains details about each participating node, including:

  • Role: Master/Standby

  • Primary IP address

  • Secondary IP address
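The per-node details listed above map to one [Node.x] section per node in ha_nodes.cfg. As an illustration, the sketch below prints three such sections. The helper name print_node_section is hypothetical, and the addresses come from the reserved documentation ranges rather than a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints one [Node.x] section in ha_nodes.cfg format.
#   $1 index, $2 role (master|standby), $3 primary IP, $4 secondary IP
print_node_section() {
    printf '[Node.%s]\nrole = %s\nprimary_ip = %s\nsecondary_ip = %s\n\n' \
        "$1" "$2" "$3" "$4"
}

# Example: one master and two standby nodes (addresses are placeholders).
print_node_section 1 master  192.0.2.10 198.51.100.10
print_node_section 2 standby 192.0.2.11 198.51.100.11
print_node_section 3 standby 192.0.2.12 198.51.100.12
```

Because the file must be identical on all nodes, comparing a checksum of /etc/ufm_ha/ha_nodes.cfg on each node (for example with sha256sum) is a simple way to confirm that.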

Using File Configuration

The '/etc/ufm_ha/ha_nodes.cfg' file contains all the necessary information for HA configuration and can serve as a replacement for command-line configuration.

For security reasons, the password is the only setting that is not saved in the file.

After filling in the configuration file, run the following command to configure the node:


ufm_ha_cluster config -p <password>

Info

Configure the standby nodes first and the master node last.


Configuration File

The sample configuration file includes up to three sections for nodes, but users can add additional sections as needed.


[General]
# Connection mode
# in case dual_link is true, each node must have primary and secondary IPs
dual_link = true

# enable ufm-infra add-on; default is false
enable_multinode = false

# automatic failure cleanup interval (in hours)
# will perform cleanup of failures to enable automatic failover
automatic_failure_cleanup_interval = 24

[Node.1]
role = master
primary_ip =
secondary_ip =

[Node.2]
role = standby
primary_ip =
secondary_ip =

# Add other Node.x sections if needed.

[Virtual]
# If virtual IP should not be added, set `no_vip = true`
no_vip =

virtual_ip =

ignore_mgmt_failure = false
# when using BGP virtual IP, you must use the loopback interface, set `interface = lo`
# in other cases we let the pcs to decide on the relevant network interface.
interface =

[FileSync]
# valid options are: drbd/external-storage
# in case of external-storage the user MUST mount the files system PRIOR to ha configuration
mode = drbd

[DRBD]
# fill in case the FileSync.mode is drbd
# drbd data mode. options are: ordered/journal (default is ordered)
# data=journal offers the highest level of data integrity,
# but it can impact write performance.
# primary_mode = single/dual (default is single)
data = ordered
primary_mode = single
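To check what a node is actually configured with, a single value can be read back from the file. The sketch below is a small awk-based reader for this section/key layout. The function name cfg_get is hypothetical and this is not how UFM HA itself parses the file.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of a key inside a given section.
#   $1 = config file, $2 = section name (without brackets), $3 = key
# Comment lines (starting with #) are skipped; nothing is printed if the
# key is not found in that section.
cfg_get() {
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section && /^[ \t]*#/ { next }
        in_section {
            split($0, kv, "=")
            k = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

# Example use:
#   cfg_get /etc/ufm_ha/ha_nodes.cfg General automatic_failure_cleanup_interval
#   cfg_get /etc/ufm_ha/ha_nodes.cfg FileSync mode
```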


Showing UFM HA Version

Run the following command to show the UFM HA version:


ufm_ha_cluster version


Starting UFM HA Cluster

Note

Before starting the UFM cluster, ensure that the DRBD sync has completed.

To start the UFM HA cluster:


ufm_ha_cluster start
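One way to confirm that the DRBD sync has completed before running the start command is to inspect the DRBD status text. The sketch below assumes DRBD 9-style output from drbdadm status, where a resync in progress shows an Inconsistent disk state or a replication:SyncSource/SyncTarget field; older DRBD versions report status differently (for example through /proc/drbd). The function name drbd_sync_done is hypothetical.

```shell
#!/bin/sh
# Reads DRBD status text on stdin and succeeds only if it mentions an
# UpToDate disk and shows no sign of an ongoing or incomplete resync.
# Assumes DRBD 9-style "drbdadm status" output.
drbd_sync_done() {
    status=$(cat)
    # Empty or unexpected output is treated as "not done".
    printf '%s\n' "$status" | grep -q 'UpToDate' || return 1
    if printf '%s\n' "$status" | grep -Eq 'Inconsistent|replication:Sync'; then
        return 1
    fi
    return 0
}

# Example use:
#   drbdadm status | drbd_sync_done && ufm_ha_cluster start
```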


Checking UFM Cluster Status

To check the UFM HA cluster status:


ufm_ha_cluster status 


Stopping UFM HA Cluster

To stop the UFM HA cluster:


ufm_ha_cluster stop 


Takeover Services

Run the takeover command on the standby machine to make it the master.


ufm_ha_cluster takeover


Master Failover

Run the failover command on the master machine to make it the standby.


ufm_ha_cluster failover


Automatic Cleanup of Failed Actions

When an action fails on one of the HA nodes (for example, a DRBD failure, a service failure, or any other HA resource failure), the failed node is no longer a candidate for automatic failover until the failed actions are cleaned up. To clean up failed actions manually, run the following command:


pcs resource cleanup

UFM-HA performs an automatic cleanup of failed actions every 24 hours. This interval is configurable in the General section of the ha_nodes.cfg configuration file. See the "Configuration File" section above.

Replacing the Standby Node

  • Install the HA package for the new node (standby).

  • Disconnect the standby node (the old standby) and run the following command on the master node:


    ufm_ha_cluster detach

  • Configure the new standby node; refer to the configuration instructions above.

  • Connect the new standby to the cluster by running the following command on the master node:


    ufm_ha_cluster attach -l <local primary ip address> -e <peer primary ip address> -E <peer secondary ip address> -p <cluster_password>

Uninstalling UFM HA

To uninstall UFM HA, first stop the cluster and then run the uninstallation command as follows:


/opt/ufm/ufm_ha/uninstall_ha.sh


© Copyright 2026, NVIDIA. Last updated on Feb 20, 2026