NVIDIA UFM High-Availability User Guide v6.2.1
Date: 2026-02-22
Changes and New Features

Feature / Component

Description

HA Active-Active Mode

Provides high-availability infrastructure for UFM-Infra in Active-Active mode. DRBD Dual-Primary Mode is available for UFM Enterprise XDR and XDR-DC Appliances (3.5).

UFM-Infra (multinode) Addon

The UFM HA addons framework enables extending the set of services that HA manages, including start, stop, and monitoring operations. It provides a predefined list of service bundles (addons) that can be selected during HA configuration.

The ufm-infra (multinode) addon includes the following services:

  • ufm-redis-mgr – Redis manager for UFM distributed state; part of the ufm-cluster group and runs on the master node.

  • ufm-infra – UFM infrastructure service; runs on both nodes as a cloned resource.

DRBD Dual-Primary Mode

Provides a shared storage layer for the UFM Active-Active architecture, allowing both nodes to write simultaneously while keeping data synchronized in real time to ensure a consistent cluster state. Uses the OCFS2 cluster filesystem with its O2CB communication layer. Note: dual-primary mode supports only two nodes and is not scalable beyond that.

HA Monitor

Adds an HA cluster monitoring and maintenance daemon (a systemd service). The daemon runs on the master node and is managed by Pacemaker. It performs:

  • Periodic Pacemaker resource cleanup

  • Activation of attached standby nodes after DRBD synchronization in dual-primary mode
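The periodic cleanup task can be pictured with a minimal sketch. The guide does not describe how ha-monitor implements it, so everything here is an assumption for illustration: the use of Pacemaker's `crm_resource --cleanup` command, the function name `run_periodic_cleanup`, and the interval handling are hypothetical and are not taken from the UFM HA package.

```python
import subprocess
import time

# Assumption: Pacemaker's crm_resource CLI is used to clear resource failure history.
# The real ha-monitor daemon may do this differently.
CLEANUP_COMMAND = ["crm_resource", "--cleanup"]


def run_periodic_cleanup(interval_s, iterations, run=subprocess.run, sleep=time.sleep):
    """Run the cleanup command `iterations` times, pausing `interval_s` between runs.

    Returns the number of runs that exited with a non-zero status.
    `run` and `sleep` are injectable so the loop can be exercised without Pacemaker.
    """
    failures = 0
    for i in range(iterations):
        result = run(CLEANUP_COMMAND, capture_output=True)
        if result.returncode != 0:
            failures += 1
        # Do not sleep after the final run.
        if i < iterations - 1:
            sleep(interval_s)
    return failures
```

A daemon would loop indefinitely rather than for a fixed number of iterations; the bounded loop keeps the sketch easy to try out.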

Configuring all Nodes

Adds a new ufm_ha_cluster command option that configures all nodes from a single node. Requires SSH trust between the nodes. Usage: ufm_ha_cluster --configure-all-node ...
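Because this option depends on SSH trust, it can help to confirm that key-based login works before running it. The following is a minimal sketch, not part of the UFM HA tooling: the helper names and the host name `ufm-node2` are hypothetical, and only the standard OpenSSH options `BatchMode` and `ConnectTimeout` are used.

```python
import subprocess


def ssh_trust_command(host, timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if key auth is unavailable
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        host,
        "true",                               # no-op remote command
    ]


def has_ssh_trust(host, run=subprocess.run):
    """Return True if the no-op remote command succeeds without a password prompt."""
    result = run(ssh_trust_command(host), capture_output=True)
    return result.returncode == 0
```

For example, `has_ssh_trust("ufm-node2")` would be run on one node against its peer, and then repeated in the other direction.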

Upgrading HA Cluster

When upgrading from version ≤6.1.1-x:

  • The HA state is synchronized from the cluster on both nodes

  • On the last upgraded node: ha-monitor is added to the cluster and ufm-ha-watcher is restarted
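The version condition above can be expressed as a small check. This sketch assumes version strings have the form `major.minor.patch-build` (inferred from "6.1.1-x") and that any build of 6.1.1 or earlier counts as "≤6.1.1-x". The function names are hypothetical and not part of the UFM HA package.

```python
import re

# Last release for which the upgrade steps above apply, per the text.
LAST_LEGACY_VERSION = (6, 1, 1)


def parse_ha_version(version):
    """Parse 'major.minor.patch' with an optional '-build' suffix into a tuple of ints."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:-\S+)?$", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade_steps(installed_version):
    """Return True if the installed version is 6.1.1 (any build) or earlier."""
    return parse_ha_version(installed_version) <= LAST_LEGACY_VERSION
```

The build suffix is ignored in the comparison, so `6.1.1-4` and `6.1.1-9` are treated the same way.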

HA State

HA configuration state is stored in /var/lib/ufm_ha/ha_state on each node. This file is used to load the HA configuration for cluster operations (start, stop, detach, etc.).
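When troubleshooting, a quick first step is to confirm that the state file is present on each node. The sketch below only inspects file metadata, since the guide does not document the file's internal format. The function name `describe_ha_state` is hypothetical; the path is the one given above.

```python
from pathlib import Path

# Path taken from the guide; the file's contents are not interpreted here.
HA_STATE_PATH = Path("/var/lib/ufm_ha/ha_state")


def describe_ha_state(path=HA_STATE_PATH):
    """Report whether the HA state file exists, with its size and modification time."""
    path = Path(path)
    if not path.is_file():
        return {"path": str(path), "present": False}
    info = path.stat()
    return {
        "path": str(path),
        "present": True,
        "size_bytes": info.st_size,
        "modified": info.st_mtime,  # seconds since the epoch
    }
```

Running `describe_ha_state()` on both nodes and comparing the results gives a rough indication of whether each node has a state file at all; it does not verify that the contents match.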
