NVIDIA UFM Enterprise User Manual v6.23.1

System Backup and Restore

A full system backup creates a complete snapshot of your UFM system, including Docker images and configurations. It enables full disaster recovery and lets you downgrade to the backed-up version.

The backup includes:

UFM Enterprise Docker image
All plugin Docker images
UFM configuration files
Plugin configurations
Install arguments
UFM version and plugin versions

Preserved During Restore (Not Replaced):

PKEY configurations (/opt/ufm/files/conf/opensm/partitions.conf)
Unhealthy ports configuration (/opt/ufm/files/conf/opensm/opensm-health-policy.conf)
Existing backups (/opt/ufm/files/backup, /opt/ufm/backup)

Note

Only a single full system backup is kept at any time; creating a new backup overwrites the existing one.

Storage Location Considerations

Standalone (SA):

Full backup stored at /opt/ufm/backup/downgrade/1/
Accessible to the local system only

High Availability (HA):

Full backup stored at /opt/ufm/backup/downgrade/1/ on the master node's local storage
NOT stored on DRBD shared storage (unlike configuration snapshots)
Important: If HA failover occurs, the new master will NOT have access to backups created on the old master
User Action Required: After failover, if you need to restore a backup created on the old master, you must manually copy /opt/ufm/backup/downgrade/ from the old master to the new master node
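
The manual copy described above could be scripted roughly as follows. This is a minimal sketch, not a UFM-provided procedure: it assumes `rsync` and root SSH access from the new master to the old master, and the hostname `old-master` is a placeholder. The helper name `copy_backup_from` and the `DRY_RUN` variable are hypothetical.

```shell
# Sketch: pull the full system backup from the old master after an HA failover.
# Assumptions (not from the UFM manual): rsync is installed on both nodes and
# the old master is still reachable over SSH as root.

BACKUP_DIR="/opt/ufm/backup/downgrade/"

# copy_backup_from HOST
# With DRY_RUN=1 the rsync command is only printed so it can be reviewed.
copy_backup_from() {
    src_host="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rsync -a root@${src_host}:${BACKUP_DIR} ${BACKUP_DIR}"
    else
        # -a preserves permissions and timestamps; the trailing slashes copy
        # the directory contents rather than nesting a second "downgrade" dir.
        rsync -a "root@${src_host}:${BACKUP_DIR}" "${BACKUP_DIR}"
    fi
}

# Run on the NEW master, for example:
#   DRY_RUN=1; copy_backup_from old-master   # print the command
#   DRY_RUN=0; copy_backup_from old-master   # perform the copy
```

Printing the command first is a deliberate choice here: the copy overwrites anything already under the backup directory on the new master.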

Create Full System Backup

Note

During backup, the tool validates the schema of some of the configuration files. If validation fails, the backup operation stops and reports an error.

Basic Backup

Create a full system backup. Run:

            ufm_versions_mgr backup

Backup with Options

Create a backup with a label. Run:

            ufm_versions_mgr backup --label "Before production deployment"

Create a backup in a custom location. Run:

            ufm_versions_mgr backup --backup-dir /mnt/external/ufm-backup

Create a backup with more workers for faster operation. Run:

            ufm_versions_mgr backup --max-workers 10

Command Options

Option	Description
`--label TEXT`	Optional description for backup
`--backup-dir PATH`	Custom backup location (default: `/opt/ufm/backup/downgrade/`)
`--max-workers N`	Number of parallel workers (4-10, default: 8)
`--list`	List the existing backup
`--dry-run`	Preview operation
`--verbose`	Enable detailed output
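
As an illustration of how the options in the table might be used together, the sketch below checks the `--max-workers` value against the documented 4-10 range before invoking the tool. It assumes the options can be combined in a single invocation, which this page does not state explicitly. The helper names `valid_workers` and `run_backup` and the default label are hypothetical.

```shell
# Sketch: validate --max-workers locally before starting a backup.
# Assumption: --label, --max-workers and --verbose can be passed together.

# Returns 0 only if the argument is an integer in the documented 4-10 range.
valid_workers() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -ge 4 ] && [ "$1" -le 10 ]
}

# run_backup [WORKERS] [LABEL]
run_backup() {
    workers="${1:-8}"              # 8 is the documented default
    label="${2:-manual backup}"    # hypothetical default label
    if ! valid_workers "$workers"; then
        echo "max-workers must be between 4 and 10, got: $workers" >&2
        return 2
    fi
    ufm_versions_mgr backup --label "$label" --max-workers "$workers" --verbose
}

# Example:
#   run_backup 10 "Before production deployment"
```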

Backup List

List the existing full system backup. Run:

            ufm_versions_mgr backup --list

Preview Backup (Dry-Run)

Preview the backup operation. Run:

            ufm_versions_mgr backup --dry-run

Note

The UFM service continues to run during the backup, with no downtime.
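
The preview, backup, and list commands above can be chained into one routine. The following is a sketch under the assumption that `ufm_versions_mgr` exits with a non-zero status when a step fails (for example, on a schema validation error), which this page does not confirm. The function name `backup_with_preview` is hypothetical.

```shell
# Sketch: preview, create, then list the full system backup.
# Assumption: ufm_versions_mgr returns a non-zero exit status on failure.

# backup_with_preview LABEL
backup_with_preview() {
    label="$1"

    # Preview first; stop if the preview itself reports a problem.
    ufm_versions_mgr backup --dry-run || {
        echo "dry-run failed, backup not started" >&2
        return 1
    }

    # Create the backup. This overwrites the existing full system backup.
    ufm_versions_mgr backup --label "$label" || {
        echo "backup failed" >&2
        return 1
    }

    # Show the stored backup so the new label can be confirmed.
    ufm_versions_mgr backup --list
}

# Example:
#   backup_with_preview "Before production deployment"
```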

Restore Full System

Standalone (SA) Restore

Preview the restore. Run:

            ufm_versions_mgr restore --dry-run

Restore the full system. Run:

            ufm_versions_mgr restore

Restore Process:

Validate backup integrity
Save current PKEY and health policy configurations
Stop UFM service
Uninstall current UFM
Load Docker images from backup
Install UFM with saved install arguments
Restore configurations
Restore preserved files (PKEY, health policies)
Start UFM service

Note

The UFM service is stopped for the duration of the restore operation.
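
Because the restore stops UFM, it can be useful to wrap the two commands above so that the preview always runs first and the real restore requires explicit confirmation. This is a minimal sketch, assuming `ufm_versions_mgr` returns a non-zero exit status when the preview fails. The function name `guarded_restore` and its `yes` argument are hypothetical.

```shell
# Sketch: always preview, restore only on explicit confirmation.
# Assumption: a failed dry-run is reported through a non-zero exit status.

# guarded_restore [yes]
guarded_restore() {
    confirm="$1"

    # The preview changes nothing on the system.
    ufm_versions_mgr restore --dry-run || return 1

    if [ "$confirm" != "yes" ]; then
        echo "Preview only. Pass 'yes' to restore (UFM will be stopped)." >&2
        return 0
    fi

    # From here on the UFM service is down until the restore finishes.
    ufm_versions_mgr restore
}

# Example:
#   guarded_restore        # preview only
#   guarded_restore yes    # preview, then restore
```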

High Availability (HA) Restore

Restore the full system in an HA environment. Run on the master node:

            ufm_versions_mgr restore

Restore Process:

Validate backup integrity
Save current PKEY and health policy configurations
Stop HA cluster
Get standby node IP from HA configuration
Uninstall UFM on master
Load Docker images on master
SSH to standby: Uninstall UFM and load images
Install UFM on master with saved install arguments
Install UFM on standby with saved install arguments
Restore configurations
Restore preserved files
Start HA cluster

Note

The restore must be executed from the master node.
SSH trust with the standby node is set up automatically.
The entire HA cluster is offline for the duration of the restore.
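
Since the full backup lives on the local storage of the node that created it, a quick check that the backup is actually present on the current master can avoid taking the cluster offline for nothing. The sketch below only inspects the default path documented on this page. The helper names `backup_present` and `ha_restore_checked` are hypothetical, and the check says nothing about backup integrity (the tool validates that itself).

```shell
# Sketch: confirm a backup exists on this node before an HA restore.
# The path is the default location documented for the full backup.

BACKUP_PATH="${BACKUP_PATH:-/opt/ufm/backup/downgrade/1}"

# Returns 0 if the directory exists and contains at least one entry.
backup_present() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Run on the master node.
ha_restore_checked() {
    if ! backup_present "$BACKUP_PATH"; then
        echo "No backup found in $BACKUP_PATH on this node." >&2
        echo "After a failover, copy it from the old master first." >&2
        return 1
    fi
    ufm_versions_mgr restore
}
```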

Restore from Custom Location

Restore from a custom backup directory. Run:

            ufm_versions_mgr restore --backup-dir /mnt/external/ufm-backup

Preserved Files

During full system restore, the following files from the current system are preserved:

Item	File / Directory	Reason
PKEY Configurations	`/opt/ufm/files/conf/opensm/partitions.conf`	Represents current fabric partitioning. Changing PKEYs can disrupt running workloads.
Unhealthy Ports Configuration	`/opt/ufm/files/conf/opensm/opensm-health-policy.conf`	Current health policies should survive version changes.
Existing Backups	`/opt/ufm/files/backup`, `/opt/ufm/backup`	Maintain ability to restore again.
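
One way to confirm that the preserved configuration files came through a restore unchanged is to compare checksums taken before and after. This is a sketch, assuming GNU `sha256sum` is available and that preserved files are put back byte-for-byte, which this page implies but does not state. The helper names and the manifest path in the example are hypothetical.

```shell
# Sketch: checksum the preserved files before a restore, verify afterwards.
# Assumption: GNU coreutils sha256sum is installed.

PRESERVED_FILES="/opt/ufm/files/conf/opensm/partitions.conf
/opt/ufm/files/conf/opensm/opensm-health-policy.conf"

# snapshot_checksums MANIFEST FILE...
# Writes one SHA-256 line per file into MANIFEST.
snapshot_checksums() {
    manifest="$1"
    shift
    sha256sum "$@" > "$manifest"
}

# verify_checksums MANIFEST
# Exits non-zero if any listed file is missing or differs from the manifest.
verify_checksums() {
    sha256sum -c "$1"
}

# Example (paths contain no spaces, so word splitting is safe here):
#   snapshot_checksums /root/preserved.sha256 $PRESERVED_FILES
#   ufm_versions_mgr restore
#   verify_checksums /root/preserved.sha256
```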
