NVIDIA UFM Enterprise User Manual v6.23.1

Upgrade

The upgrade feature lets you upgrade UFM to a new version with the ufm_versions_mgr upgrade command. It automates the upgrade and protects it with a backup in both Standalone (SA) and High Availability (HA) environments.

  • Automatic backup before upgrade (can be skipped)

  • Version validation

  • SA: In-place upgrade

General

  • Valid UFM .gz image file

  • Not supported on UFM Appliance

HA Environment

  • Must run from the standby node (not master)

  • Same ufm_versions_mgr version on both nodes

  • SSH access from standby to master
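
The prerequisites above can be checked by hand before starting an upgrade. The following is a minimal sketch and is not part of ufm_versions_mgr: check_ufm_image and check_master_ssh are hypothetical helper names, and the master address is whatever host name or IP applies in your cluster.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not provided by ufm_versions_mgr.

# Confirm the image file exists, is readable, and is an intact gzip archive.
check_ufm_image() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "missing: $image" >&2
        return 1
    fi
    if [ ! -r "$image" ]; then
        echo "not readable: $image" >&2
        return 2
    fi
    # gzip -t tests archive integrity without extracting anything.
    if ! gzip -t "$image" 2>/dev/null; then
        echo "not a valid gzip archive: $image" >&2
        return 3
    fi
    return 0
}

# Confirm non-interactive SSH access to the master (HA only).
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_master_ssh() {
    master="$1"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$master" true
}
```

For example, `check_ufm_image /path/to/ufm-6.24.0.gz && echo "image OK"` prints "image OK" only when all three checks pass. This only verifies gzip integrity; it does not prove the archive contains a valid UFM image.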

SA Upgrade Process

The SA upgrade performs an in-place upgrade with the following steps:

  1. Validate .gz file exists and is readable

  2. Create full backup (unless --skip-backup)

  3. Tag current UFM image as backup

  4. Load new UFM image

  5. Run container upgrade command

  6. Copy service file (if rootless environment)

  7. Start UFM service

  8. Cleanup backup image tag

Basic SA Upgrade

Upgrade UFM on a standalone system. Run:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz

SA Upgrade with Options

Skip automatic backup (not recommended). Run:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz --skip-backup

Preview upgrade without executing. Run:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz --dry-run

Upgrade with detailed output. Run:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz --verbose
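
The options above can be chained so that a preview always precedes the real upgrade. This is a minimal sketch under one assumption that the manual does not confirm: that --dry-run exits with a non-zero status when it detects a problem. safe_upgrade and the UFM_MGR variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# UFM_MGR defaults to the documented command; override it for testing.
UFM_MGR="${UFM_MGR:-ufm_versions_mgr}"

# Hypothetical wrapper: preview first, upgrade only if the preview succeeds.
# Assumes --dry-run returns non-zero on failure (not confirmed by the manual).
safe_upgrade() {
    image="$1"
    "$UFM_MGR" upgrade "$image" --dry-run || {
        echo "dry run failed; upgrade not attempted" >&2
        return 1
    }
    # The automatic backup is left enabled on purpose.
    "$UFM_MGR" upgrade "$image" --verbose
}
```

Usage: `safe_upgrade /path/to/ufm-6.24.0.gz`.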

Command Options

Option           Description
UFM_GZ_FILE      Path to new UFM .gz image file (required)
--skip-backup    Skip automatic backup before upgrade (not recommended)
--dry-run        Preview operation without executing
--verbose        Enable detailed output

Note
  • A full backup is automatically created before the upgrade to enable rollback if needed.

  • Upgrading the SA results in a short UFM downtime while the service restarts.


HA Upgrade Process

The HA upgrade follows a staged approach to ensure zero downtime:

  1. Detect the environment and confirm the upgrade is running on the standby node.

  2. Discover the master node IP from the cluster.

  3. Validate or establish SSH trust with the master node.

  4. Verify that the tool is installed on the master and that the version matches.

  5. Create a backup on the master via SSH (unless --skip-backup is specified).

  6. Read backup metadata from the master.

  7. Prepare the standby node:

    • Tag the current image as a backup

    • Load the new UFM image

    • Clean up the backup tag

After step 7: Perform a manual failover, then upgrade the new standby node by loading the UFM Docker image and running the install command that the tool printed.

Downtime: None during the upgrade itself. The only interruption occurs when the user triggers the failover.

Basic HA Upgrade

Step 1: Upgrade the standby node. Run on standby:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz

Step 2: Perform a manual failover. Run on the master:

ufm_ha_cluster failover

Step 3: Upgrade the new standby (the old master). Run on the new standby:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz
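
The three steps above can be written out as an ordered plan before anything is executed. The sketch below only prints the commands and runs nothing. ha_upgrade_plan and the host names in the usage line are hypothetical, and it assumes the image file sits at the same path on both nodes.

```shell
#!/bin/sh
# Hypothetical helper: print the HA upgrade sequence for review.
# Arguments: current standby host, current master host, image path.
ha_upgrade_plan() {
    standby="$1"
    master="$2"
    image="$3"
    # Step 1: upgrade the current standby.
    printf 'ssh %s ufm_versions_mgr upgrade %s\n' "$standby" "$image"
    # Step 2: manual failover, triggered on the current master.
    printf 'ssh %s ufm_ha_cluster failover\n' "$master"
    # Step 3: the old master is now the standby; upgrade it.
    printf 'ssh %s ufm_versions_mgr upgrade %s\n' "$master" "$image"
}
```

Usage: `ha_upgrade_plan ufm-node-b ufm-node-a /path/to/ufm-6.24.0.gz`. Reviewing the printed plan first makes it harder to run the upgrade on the master by mistake.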

HA Upgrade with Options

Skip backup on master (not recommended). Run on standby:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz --skip-backup

Preview upgrade. Run on standby:

ufm_versions_mgr upgrade /path/to/ufm-6.24.0.gz --dry-run

Command Options

Option           Description
UFM_GZ_FILE      Path to new UFM .gz image file (required)
--skip-backup    Skip backup on master node (not recommended)
--dry-run        Preview operation without executing
--verbose        Enable detailed output

Note
  • The upgrade must be executed from the STANDBY node, not the master.

  • The tool version must be identical on both nodes.

  • The master node remains unaffected throughout the upgrade process.


SA Upgrade Steps (Detailed)

  1. Validate .gz file - Check file exists, readable, valid format

  2. Create full backup - Full system backup for rollback (unless --skip-backup)

  3. Tag current image - Tag ufm:latest as ufm:version-backup

  4. Load new image - Load new UFM from the .gz file

  5. Run upgrade - Execute docker/podman run with --upgrade flag

  6. Copy service file - If rootless, copy service file to systemd

  7. Start UFM - Start ufm-enterprise service

  8. Cleanup - Remove backup image tag

Rollback: If step 4 or 5 fails, the tool rolls back automatically by re-tagging the backup image as latest.
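
The tag, load, and rollback pattern in steps 3 and 4 can be sketched as follows. This illustrates the pattern only and is not the tool's actual implementation. The image names are taken literally from the step list above, where "version" is presumably a placeholder. load_with_rollback and the three variables are hypothetical names, and the sketch covers a failure in step 4 only.

```shell
#!/bin/sh
# Assumed defaults; adjust to the runtime and image names in your environment.
CONTAINER_RUNTIME="${CONTAINER_RUNTIME:-docker}"   # or podman
UFM_IMAGE="${UFM_IMAGE:-ufm:latest}"
UFM_BACKUP_TAG="${UFM_BACKUP_TAG:-ufm:version-backup}"

# Hypothetical helper: tag the current image, load the new one,
# and restore the old tag if the load fails.
load_with_rollback() {
    image_file="$1"
    # Step 3: keep a reference to the current image under a backup tag.
    "$CONTAINER_RUNTIME" tag "$UFM_IMAGE" "$UFM_BACKUP_TAG" || return 1
    # Step 4: load the new image from the .gz archive.
    if ! "$CONTAINER_RUNTIME" load -i "$image_file"; then
        echo "load failed; restoring previous image tag" >&2
        # Rollback: point the main tag at the backup image again.
        "$CONTAINER_RUNTIME" tag "$UFM_BACKUP_TAG" "$UFM_IMAGE"
        return 2
    fi
    return 0
}
```

Tagging adds a second name for the same image rather than copying it, so the backup tag costs no extra disk space and restoring it is immediate.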

HA Upgrade Steps (Detailed)

  1. Detect environment - Validate running on STANDBY node (not master)

  2. Discover master IP - Get master node IP from HA cluster

  3. Validate SSH trust - Establish/verify SSH trust with master

  4. Validate tool on master - Check ufm_versions_mgr installed and same version

  5. Backup on master - SSH to master, trigger full backup (unless --skip-backup)

  6. Read master metadata - SSH to master, read backup info for validation

  7. Prepare standby:

    • Tag current image as backup

    • Load new UFM image from .gz

    • Remove backup tag

After upgrade: The user performs a manual failover, then repeats the upgrade on the new standby.

© Copyright 2025, NVIDIA. Last updated on Nov 20, 2025