NVIDIA UFM Enterprise User Manual v6.23.1

System Backup and Restore

Full system backup creates a complete snapshot of your UFM system, including Docker images and configurations. You can use it for disaster recovery or to downgrade to the backed-up version.

The backup includes:

  • UFM enterprise Docker image

  • All plugin Docker images

  • UFM configuration files

  • Plugin configurations

  • Install arguments

  • UFM version and plugin versions

Preserved During Restore (Not Replaced):

  • PKEY configurations (/opt/ufm/files/conf/opensm/partitions.conf)

  • Unhealthy ports configuration (/opt/ufm/files/conf/opensm/opensm-health-policy.conf)

  • Existing backups (/opt/ufm/files/backup, /opt/ufm/backup)

Note

Only a single full system backup is kept at any time; creating a new backup overwrites the existing one.
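Because a new backup overwrites the previous one, you may want to keep a copy of the existing backup before creating another. The following is a minimal sketch, not part of the UFM tooling: the function name and the archive destination are hypothetical, and it assumes that a file-level tar archive of the backup directory is a sufficient copy.

```shell
# Sketch: archive the current full backup before a new one overwrites it.
# ASSUMPTION: a plain tar archive of the backup directory is a sufficient copy.
archive_existing_backup() {
    local src="$1"        # e.g. /opt/ufm/backup/downgrade/1
    local dest_dir="$2"   # hypothetical archive location
    local stamp
    # Nothing to archive if the directory is missing or empty.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src")" ]; then
        echo "no existing backup in $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    tar -C "$src" -czf "$dest_dir/ufm-full-backup-$stamp.tar.gz" . || return 1
    # Print the archive path so callers can record it.
    echo "$dest_dir/ufm-full-backup-$stamp.tar.gz"
}
```

For example, `archive_existing_backup /opt/ufm/backup/downgrade/1 /mnt/external/archive` could be run before `ufm_versions_mgr backup`. The backup contains Docker images, so the destination needs enough free space.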

Standalone (SA):

  • Full backup stored at /opt/ufm/backup/downgrade/1/

  • Accessible to the local system only

High Availability (HA):

  • Full backup stored at /opt/ufm/backup/downgrade/1/ on the master node's local storage

  • NOT stored on DRBD shared storage (unlike configuration snapshots)

  • Important: If HA failover occurs, the new master will NOT have access to backups created on the old master

  • User Action Required: After failover, if you need to restore a backup created on the old master, you must manually copy /opt/ufm/backup/downgrade/ from the old master to the new master node
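The manual copy after failover can be sketched as follows. This is one possible approach, run on the new master: it assumes the old master is reachable over SSH as root and that rsync is installed on both nodes. The function name, the `DRY_RUN` variable, and the host name are hypothetical.

```shell
# Run on the NEW master after failover.
# ASSUMPTIONS: root SSH access to the old master; rsync present on both nodes.
pull_backup_from_old_master() {
    local old_master="$1"                  # host name or IP of the old master
    local dir="/opt/ufm/backup/downgrade/"
    if [ -z "$old_master" ]; then
        echo "usage: pull_backup_from_old_master OLD_MASTER_HOST" >&2
        return 2
    fi
    # DRY_RUN=1 only prints the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rsync -a root@${old_master}:${dir} ${dir}"
        return 0
    fi
    mkdir -p "$dir" && rsync -a "root@${old_master}:${dir}" "$dir"
}
```

Running `DRY_RUN=1 pull_backup_from_old_master ufm-node-a` prints the rsync command so it can be reviewed first. The trailing slash on both paths makes rsync copy the directory contents rather than nesting a second `downgrade` directory.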

Note

During backup, the tool validates the schema of some configuration files. If validation fails, the backup stops and reports an error.

Basic Backup

Create a full system backup. Run:

ufm_versions_mgr backup

Backup with Options

Create a backup with a label. Run:

ufm_versions_mgr backup --label "Before production deployment"

Create backup in custom location. Run:

ufm_versions_mgr backup --backup-dir /mnt/external/ufm-backup

Create backup with more workers for faster operation. Run:

ufm_versions_mgr backup --max-workers 10

Command Options

Option              Description
--label TEXT        Optional description for backup
--backup-dir PATH   Custom backup location (default: /opt/ufm/backup/downgrade/)
--max-workers N     Number of parallel workers (4-10, default: 8)
--list              List existing backup
--dry-run           Preview operation
--verbose           Enable detailed output
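When backups are scripted, it can help to check the worker count against the documented 4-10 range before calling the tool. The wrapper below is a hypothetical sketch; the function names are not part of `ufm_versions_mgr`, and only the documented `--max-workers` option is used.

```shell
# Returns 0 if the argument is an integer in the documented range 4-10.
valid_max_workers() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
    esac
    [ "$1" -ge 4 ] && [ "$1" -le 10 ]
}

# Hypothetical wrapper: validate the worker count, then start the backup.
backup_with_workers() {
    local workers="${1:-8}"        # 8 is the documented default
    if ! valid_max_workers "$workers"; then
        echo "max-workers must be between 4 and 10, got: $workers" >&2
        return 2
    fi
    ufm_versions_mgr backup --max-workers "$workers"
}
```

Calling `backup_with_workers 10` runs the same command as the `--max-workers 10` example, while `backup_with_workers 12` is rejected before the tool is invoked.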

List existing full system backup. Run:

ufm_versions_mgr backup --list

Preview Backup (Dry-Run)

Preview backup operation. Run:

ufm_versions_mgr backup --dry-run

Note

The UFM service continues to run during the backup, with no downtime.
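The dry-run preview and the actual backup can be chained so that the backup starts only if the preview succeeds. This sketch assumes that `ufm_versions_mgr` returns a non-zero exit status when the preview fails, which this manual does not state explicitly. The function name is hypothetical.

```shell
# Hypothetical wrapper: preview first, then create a labeled backup.
# ASSUMPTION: a failed dry run is reported through a non-zero exit status.
backup_after_preview() {
    local label="$1"
    if ! ufm_versions_mgr backup --dry-run; then
        echo "preview failed; backup not started" >&2
        return 1
    fi
    ufm_versions_mgr backup --label "$label"
}
```

For example: `backup_after_preview "Before production deployment"`.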

Standalone (SA) Restore

  1. Preview restore. Run:

    ufm_versions_mgr restore --dry-run

  2. Restore full system. Run:

    ufm_versions_mgr restore

Restore Process:

  1. Validate backup integrity

  2. Save current PKEY and health policy configurations

  3. Stop UFM service

  4. Uninstall current UFM

  5. Load Docker images from backup

  6. Install UFM with saved install arguments

  7. Restore configurations

  8. Restore preserved files (PKEY, health policies)

  9. Start UFM service

Note

The UFM service is stopped for the duration of the restore operation.
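Since the restore stops the UFM service, a scripted restore may benefit from a few checks first. The sketch below confirms that a backup directory exists, runs the dry-run preview, and asks for confirmation. The function names and the `UFM_FULL_BACKUP_DIR` variable are hypothetical, and the sketch assumes a failed preview returns a non-zero exit status.

```shell
# Hypothetical variable: where to look for the backup before restoring.
# The default is the documented standalone location.
UFM_FULL_BACKUP_DIR="${UFM_FULL_BACKUP_DIR:-/opt/ufm/backup/downgrade/1}"

# True if the directory exists and is not empty.
has_full_backup() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

restore_with_checks() {
    local answer
    if ! has_full_backup "$UFM_FULL_BACKUP_DIR"; then
        echo "no full backup found in $UFM_FULL_BACKUP_DIR" >&2
        return 1
    fi
    # ASSUMPTION: a failed preview is reported through the exit status.
    ufm_versions_mgr restore --dry-run || return 1
    printf 'UFM will be stopped during the restore. Continue? [y/N] '
    read -r answer || answer=""
    if [ "$answer" != "y" ]; then
        echo "aborted"
        return 1
    fi
    ufm_versions_mgr restore
}
```

Only an explicit `y` proceeds; any other answer aborts before the service is touched.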


High Availability (HA) Restore

Restore full system in HA environment. Run on master node:

ufm_versions_mgr restore

Restore Process:

  1. Validate backup integrity

  2. Save current PKEY and health policy configurations

  3. Stop HA cluster

  4. Get standby node IP from HA configuration

  5. Uninstall UFM on master

  6. Load Docker images on master

  7. SSH to standby: Uninstall UFM and load images

  8. Install UFM on master with saved install arguments

  9. Install UFM on standby with saved install arguments

  10. Restore configurations

  11. Restore preserved files

  12. Start HA cluster

Note

  • The restore must be executed from the master node.

  • SSH trust with the standby node is set up automatically.

  • The entire HA cluster is offline for the duration of the restore.


Restore from Custom Location

Restore from custom backup directory. Run:

ufm_versions_mgr restore --backup-dir /mnt/external/ufm-backup

During full system restore, the following files from the current system are preserved:

Item                            File / Directories                                     Reason
PKey Configurations             /opt/ufm/files/conf/opensm/partitions.conf             Represents current fabric partitioning. Changing PKEYs can disrupt running workloads.
Unhealthy Ports Configuration   /opt/ufm/files/conf/opensm/opensm-health-policy.conf   Current health policies should survive version changes.
Existing Backups                /opt/ufm/files/backup, /opt/ufm/backup                 Maintain ability to restore again.
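To confirm that the preserved files were left unchanged, you can record their checksums before the restore and verify them afterwards. This is an optional sketch, not part of the restore tool: the function name and the checksum file location are hypothetical, and it relies on the `sha256sum` utility from GNU coreutils.

```shell
# Print "<sha256>  <path>" for each listed file that exists.
snapshot_checksums() {
    local f
    for f in "$@"; do
        [ -f "$f" ] && sha256sum "$f"
    done
    return 0
}

# Usage (hypothetical checksum file location):
#   snapshot_checksums \
#       /opt/ufm/files/conf/opensm/partitions.conf \
#       /opt/ufm/files/conf/opensm/opensm-health-policy.conf \
#       > /root/ufm-preserved.sha256
#   ufm_versions_mgr restore
#   sha256sum -c /root/ufm-preserved.sha256
```

`sha256sum -c` exits with a non-zero status if any file differs from its recorded checksum, so the final step can be used directly in a script.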

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025