Installing UFM on Bare Metal Server - High Availability Mode

NVIDIA UFM Enterprise User Manual v6.17.1

Before installing UFM server software in High-Availability mode, ensure that the Additional Prerequisites for UFM High Availability Installation are met.

The UFM High-Availability configuration requires dual-link connectivity over two separate interfaces between the two UFM HA nodes: a primary link reserved exclusively for DRBD operations and a secondary link used as a backup. Before configuring HA, verify that the servers can communicate bidirectionally over both interfaces, for example with a 'ping' command or a suitable alternative. If only one link is available between the two UFM HA nodes, manually configure UFM with a single link; refer to Configure HA without SSH Trust (Single Link Configuration).
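
The connectivity check described above could be scripted along the following lines. This is a minimal sketch, not part of the UFM tooling: the function names and the PING_CMD override are illustrative assumptions, and the addresses in the example are the sample addresses used later in this guide. Run it on each node, pointing at the other node's addresses, so that both directions are covered.

```shell
# Sketch: verify that a peer answers on both HA links before configuring HA.
# PING_CMD is a hypothetical override so the logic can be exercised without a network.
PING_CMD="${PING_CMD:-ping}"

# check_link PEER_IP -> returns 0 if the peer answers (3 echo requests, 2 s timeout).
check_link() {
    "$PING_CMD" -c 3 -W 2 "$1" >/dev/null 2>&1
}

# check_ha_links PEER_IP [PEER_IP...]
# Prints one line per address and returns non-zero if any address did not answer.
check_ha_links() {
    rc=0
    for ip in "$@"; do
        if check_link "$ip"; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            rc=1
        fi
    done
    return "$rc"
}

# Example (on the master, checking the standby's primary and secondary addresses):
#   check_ha_links 10.10.10.2 192.168.10.2
```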

Note

The UFM HA package requires a dedicated DRBD partition with the same name on both servers. This guide uses /dev/sda5 as an example.
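
A quick way to confirm that the partition exists under the expected name is to test for a block device on each server. The helper below is a small sketch with a hypothetical function name; it checks only that the path is a block device, not its size or contents.

```shell
# is_block_device PATH -> returns 0 only if PATH exists and is a block device.
is_block_device() {
    [ -b "$1" ]
}

# Example, run on each server (/dev/sda5 is the example partition from this guide):
#   is_block_device /dev/sda5 && echo "DRBD partition present" || echo "DRBD partition missing"
```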

  1. On both servers, install UFM Enterprise in Stand Alone (SA) mode.

    Note

    Do not start UFM service.

  2. Install the latest pcs, pacemaker, and DRBD packages on both servers.

    For Ubuntu:

    apt install pcs pacemaker drbd-utils

    For CentOS/Red Hat:

    yum install pcs pacemaker drbd84-utils kmod-drbd84

    OR

    yum install pcs pacemaker drbd90-utils kmod-drbd90
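
When the same procedure is run on mixed hosts, the choice between the commands above can be made from the ID field of /etc/os-release. The following is a sketch under stated assumptions: the function name is hypothetical, the default DRBD series of 90 is an arbitrary choice between the two options listed above, and only the distributions named in this guide are handled. The function prints the command instead of running it.

```shell
# ha_install_cmd [OS_RELEASE_FILE] [DRBD_SERIES]
# Prints the install command from this guide that matches the distribution ID.
# DRBD_SERIES (84 or 90) only matters on CentOS/Red Hat; 90 is an arbitrary default.
ha_install_cmd() {
    os_release="${1:-/etc/os-release}"
    drbd_series="${2:-90}"
    # Read the ID field without sourcing the file into the current shell.
    distro_id=$(sed -n 's/^ID=//p' "$os_release" | tr -d '"')
    case "$distro_id" in
        ubuntu)
            echo "apt install pcs pacemaker drbd-utils" ;;
        centos|rhel)
            echo "yum install pcs pacemaker drbd${drbd_series}-utils kmod-drbd${drbd_series}" ;;
        *)
            echo "distribution not covered by this guide: $distro_id" >&2
            return 1 ;;
    esac
}

# Example:
#   ha_install_cmd                      # uses /etc/os-release and drbd90
#   ha_install_cmd /etc/os-release 84   # selects the drbd84 packages
```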

  3. Download the latest UFM-HA package using this command:

    wget https://www.mellanox.com/downloads/UFM/ufm_ha_5.5.0-9.tgz

    For the SHA256 checksum:

    wget https://download.nvidia.com/ufm/ufm_ha/ufm_ha_5.5.0-9.sha256

    Note

    For more information on the UFM-HA package and all installation and configuration options, refer to the UFM High-Availability User Guide.

  4. Extract the downloaded UFM-HA package on both servers under /tmp/.
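
Steps 3 and 4 can be combined so that the package is extracted only if its checksum matches. This is a sketch, not an official procedure: the function names are hypothetical, and it assumes that the first whitespace-separated field of the downloaded .sha256 file is the hex digest, which should be confirmed by inspecting the file.

```shell
# verify_sha256 FILE SUMFILE
# Assumption: the first whitespace-separated field of SUMFILE is the hex digest.
verify_sha256() {
    expected=$(awk 'NR==1 {print tolower($1)}' "$2")
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# verify_and_extract FILE SUMFILE DEST_DIR
# Extracts the gzip-compressed tarball into DEST_DIR only after the digest matches.
verify_and_extract() {
    if verify_sha256 "$1" "$2"; then
        tar -xzf "$1" -C "$3"
    else
        echo "checksum mismatch for $1" >&2
        return 1
    fi
}

# Example, run on both servers from the download directory:
#   verify_and_extract ufm_ha_5.5.0-9.tgz ufm_ha_5.5.0-9.sha256 /tmp/
```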

  5. Go to the extracted directory /tmp/ufm_ha_XXX and run the installation script. For example, if your DRBD partition is /dev/sda5, run:

    ./install.sh -l /opt/ufm/files/ -d /dev/sda5 -p enterprise

  6. Configure the HA cluster. There are three methods:

Configure HA with SSH Trust (Dual Link Configuration)

    1. On the master server only, configure the HA nodes. To do so, from /tmp, run the configure_ha_nodes.sh command as shown in the example below:

      configure_ha_nodes.sh \
        --cluster-password 12345678 \
        --master-primary-ip 10.10.10.1 \
        --standby-primary-ip 10.10.10.2 \
        --master-secondary-ip 192.168.10.1 \
        --standby-secondary-ip 192.168.10.2 \
        --no-vip

      Note

      The script configure_ha_nodes.sh is located under /usr/local/bin/; therefore, by default, you do not need to use the full path to run it.

      Note

      The --cluster-password must be at least 8 characters long.

      Note

      To set up a Virtual IP for UFM, so that UFM is reachable at the same address regardless of which server is running it, use the --virtual-ip option with an IP address as its argument instead of --no-vip. You can then access UFM by navigating to https://<Virtual-IP>/ufm in your web browser.

      Note

      When using back-to-back ports with local IP addresses for HA sync interfaces, ensure that you add your IP addresses and hostnames to the /etc/hosts file. This is needed to allow the HA configuration to resolve hostnames correctly based on the IP addresses you are using.
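
The /etc/hosts update can be made repeatable with a small helper that appends a mapping only when it is missing. This is a sketch: the function name is hypothetical, and the hostnames ufm-master and ufm-standby in the example are placeholders for your own server names.

```shell
# add_hosts_entry HOSTS_FILE IP HOSTNAME
# Appends "IP<TAB>HOSTNAME" unless the file already maps IP to HOSTNAME.
add_hosts_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    if awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }' "$hosts_file"; then
        return 0    # mapping already present, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example, run as root on both servers (hostnames are placeholders):
#   add_hosts_entry /etc/hosts 10.10.10.1 ufm-master
#   add_hosts_entry /etc/hosts 10.10.10.2 ufm-standby
```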

      Note

      configure_ha_nodes.sh requires an SSH connection to the standby server. If SSH trust is not configured, you are prompted to enter the SSH password of the standby server while the configuration runs.

    2. Depending on the size of your partition, wait for the configuration process to complete and DRBD sync to finish.

Configure HA without SSH Trust (Dual Link Configuration)

If you cannot establish an SSH trust between your HA servers, you can use ufm_ha_cluster directly to configure HA. To configure HA, follow the instructions below:

Note

Please change the variables in the commands below based on your setup.
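
Because each server is configured separately, the local and peer addresses have to be given from that server's own point of view. The sketch below prints (but does not run) a matching pair of dual-link commands from one set of addresses. The function names are hypothetical, only flags that appear in this guide are used, and it assumes that "local" is the address of the server where the command runs and "peer" is the other server's address. The example reuses the master and standby addresses from the SSH trust section.

```shell
# ha_config_cmd ROLE LOCAL_PRIMARY PEER_PRIMARY LOCAL_SECONDARY PEER_SECONDARY PASSWORD
# Prints a dual-link ufm_ha_cluster command built only from flags shown in this guide.
ha_config_cmd() {
    printf 'ufm_ha_cluster config -r %s --local-primary-ip %s --peer-primary-ip %s --local-secondary-ip %s --peer-secondary-ip %s --hacluster-pwd %s --no-vip\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# ha_config_pair MASTER_PRIMARY STANDBY_PRIMARY MASTER_SECONDARY STANDBY_SECONDARY PASSWORD
# Prints the standby command first, then the master command, swapping local and peer.
ha_config_pair() {
    ha_config_cmd standby "$2" "$1" "$4" "$3" "$5"
    ha_config_cmd master  "$1" "$2" "$3" "$4" "$5"
}

# Example:
#   ha_config_pair 10.10.10.1 10.10.10.2 192.168.10.1 192.168.10.2 123456789
```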

    1. [On Standby Server] Run the following command to configure Standby Server:

      ufm_ha_cluster config -r standby \
        --local-primary-ip 10.10.50.1 \
        --peer-primary-ip 10.10.50.2 \
        --local-secondary-ip 192.168.10.1 \
        --peer-secondary-ip 192.168.10.2 \
        --hacluster-pwd 123456789 \
        --no-vip

    2. [On Master Server] Run the following command to configure Master Server:

      ufm_ha_cluster config -r master \
        --local-primary-ip 10.10.50.2 \
        --peer-primary-ip 10.10.50.1 \
        --local-secondary-ip 192.168.10.2 \
        --peer-secondary-ip 192.168.10.1 \
        --hacluster-pwd 123456789 \
        --no-vip

      After configuration, wait for the DRBD sync to finish; how long this takes depends on the size of your partition. To check the DRBD sync status, run:

      ufm_ha_cluster status
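
Instead of re-running the status command by hand, a polling loop can wait until the output shows that the sync is complete. The helper below is a generic sketch with a hypothetical name. This guide does not specify what the status output looks like, so the pattern to match is left as a variable (SYNC_DONE_PATTERN, also hypothetical) that you set after inspecting the real output of ufm_ha_cluster status on your system.

```shell
# wait_for_pattern PATTERN INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# Re-runs COMMAND until its output matches PATTERN (grep -E) or the tries run out.
wait_for_pattern() {
    pattern="$1"; interval="$2"; tries="$3"
    shift 3
    while [ "$tries" -gt 0 ]; do
        if "$@" 2>/dev/null | grep -Eq "$pattern"; then
            return 0
        fi
        tries=$((tries - 1))
        # Sleep only if another attempt will follow.
        [ "$tries" -gt 0 ] && sleep "$interval"
    done
    return 1
}

# Example: poll every 30 seconds, up to 120 times.
# SYNC_DONE_PATTERN is a placeholder; choose it from the actual status output.
#   wait_for_pattern "$SYNC_DONE_PATTERN" 30 120 ufm_ha_cluster status
```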

Configure HA without SSH Trust (Single Link Configuration)

Warning

This configuration is not recommended: if the network link fails, it might cause an HA cluster split brain.

If you cannot establish an SSH trust between your HA servers, you can use ufm_ha_cluster directly to configure HA. To configure HA, follow the instructions below:

Note

Please change the variables in the commands below based on your setup.

    1. [On Standby Server] Run the following command to configure Standby Server:

      ufm_ha_cluster config \
        -r standby \
        -e 10.212.145.5 \
        -l 10.212.145.6 \
        --enable-single-link

    2. [On Master Server] Run the following command to configure Master Server:

      ufm_ha_cluster config -r master \
        -e 10.212.145.6 \
        -l 10.212.145.5 \
        -i 10.212.145.50 \
        --enable-single-link

      After configuration, wait for the DRBD sync to finish; how long this takes depends on the size of your partition. To check the DRBD sync status, run:

      ufm_ha_cluster status

Starting HA Cluster

  • To start UFM HA cluster:

    ufm_ha_cluster start

  • To check UFM HA cluster status:

    ufm_ha_cluster status 

Stopping HA Cluster

  • To stop UFM HA cluster:

    ufm_ha_cluster stop

Note

For complete details on high availability, refer to the NVIDIA UFM High-Availability User Guide.

© Copyright 2024, NVIDIA. Last updated on Jun 7, 2024.