NVIDIA UFM Enterprise User Manual v6.22.1

Changes and New Features

Note

Notes:

  • For an archive of changes and features from previous releases, please refer to Changes and New Features History.

  • The items listed in the table below apply to all UFM license types.

  • For bare metal installation of UFM, MLNX_OFED 5.X (or newer) must be installed before installing UFM. Make sure to use the UFM installation package that is compatible with your setup, as detailed in Bare Metal Deployment Requirements.

Feature

Description

TACACS+ Authentication

Added support for TACACS+ authentication that allows users to access the UFM REST API. For more information, refer to TACACS+ Authentication.

Replacement of a Faulty Standby Node

Added support for automating the replacement of a failed standby node in a UFM High Availability (HA) cluster, ensuring minimal downtime and minimal manual intervention. For more information, refer to NVIDIA UFM High-Availability User Guide.

UFM Upgrade

Added the ability to preserve UFM Health user settings that were set in previous versions during the UFM upgrade procedure. For more information, refer to Upgrading UFM Software.

Added support for upgrading UFM Enterprise when running in rootless mode. For more information, refer to Upgrading UFM Podman Rootless Container.

UFM Infra

Added support for running UFM Infra with Docker Root.

Updated the UFM Infra Using Rootless with Podman installation instructions. For more information, refer to Installing UFM Infra Using Rootless with Podman.

Fabric Health

Added support for using UFM fabric as a unified tool. For more information, refer to Fabric Health Tab. For the REST API, refer to Reports REST API.

OpenSM Root GUID Configuration

Added the ability to configure OpenSM Root GUID via the UFM Web UI. For more information, refer to Subnet Manager Tab. For the REST API, refer to SM Configuration REST API.

UFM Monitoring Metrics

Added support for collecting and analyzing UFM API usage, providing detailed and aggregated metrics over a specified time range.

Network Fast Recovery

Introduced a new parameter field (network_fast_recovery_action) that allows users to configure their preferred automatic response when the switch detects a network fast recovery condition. For more information, refer to Enabling Network Fast Recovery.

Managed and Unmanaged Switch Fan/PSU Speed

Enhanced the Modules APIs by exposing the fan_speed for each module and extracting the RPM speed for each module (fans and PSUs) for all switch types. A new event is triggered when the fan speed rises above or falls below the threshold. For more information, refer to Inventory Window. For the REST API, refer to Modules REST API.

UFM Configuration Files Validation

Added the ability to detect incompatible or incorrect configuration files, such as gv.cfg and opensm.conf files imported by users, to ensure alignment with the running UFM software version. For more information, refer to Validating UFM Configuration Files.

Second Source Cable Transceiver Burning

Added support for faster burning and activation of second source cable transceivers by replacing the previous sequential process with the LinkX tool, enabling parallel and more efficient programming.

This feature requires specific firmware and operating system versions on the switch side. Please ensure the following versions are used:

  • MLNX-OS Switch Firmware: 31.2014.4088

  • NVOS: nvos-25.02.4014

NVOS CLI Changes

Aligned UFM with the changes made in the NVOS switches' CLIs, which unify the user experience between the NVOS and Cumulus operating systems. UFM supports both the old and the new NVOS CLIs.

REST APIs

TACACS+ Support

Added TACACS+ support for the UFM REST API. For more information, refer to TACACS+ Authentication.

OpenSM Static Topology Configuration REST API

Added support for updating the state of a single link using the static topology configuration file.

SM Configuration REST API

Added the ability to configure OpenSM root GUID.

Reports REST API

Added support for using UFM fabric as a unified tool.

Metrics Collector REST API

Added support for collecting metrics on UFM API behavior within a specified time range. The feature aggregates API call statistics and delivers both high-level overviews and detailed views of API performance and usage patterns.

Note

In gv.cfg, the xdr_enabled flag is used to properly process XDR fabrics in UFM. It is set to "True" by default and does not impact UFM behavior on legacy fabrics. Starting with UFM Enterprise v6.23.0, this flag will be removed.
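Because the flag is scheduled for removal, it can be useful to check whether a local gv.cfg still sets it explicitly. The following is a minimal sketch, assuming gv.cfg is an INI-style file that Python's configparser can read. The section name "Server" in the sample is hypothetical and used only for illustration, so the helper searches every section rather than relying on a specific one.

```python
import configparser


def find_xdr_enabled(cfg_text):
    """Return (section, value) pairs for every xdr_enabled key in the text."""
    # strict=False tolerates duplicate keys; interpolation=None keeps any
    # literal '%' characters in values from raising errors.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(cfg_text)
    hits = []
    for section in parser.sections():
        if parser.has_option(section, "xdr_enabled"):
            hits.append((section, parser.get(section, "xdr_enabled")))
    return hits


# Hypothetical excerpt: the "Server" section name is an assumption.
SAMPLE = """
[Server]
xdr_enabled = True
"""

if __name__ == "__main__":
    for section, value in find_xdr_enabled(SAMPLE):
        print(f"[{section}] xdr_enabled = {value}")
```

To check a real file, read its contents and pass the text to find_xdr_enabled. An empty result means the flag is not set explicitly.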

Tool

Version

Changes and New Features

SHARP

3.12.0

Improved Handling of MAD Errors

Enhanced SHARP_am's response to MAD errors, where instead of marking a switch as entirely unusable, it now deprioritizes the switch while keeping it eligible for job selection when alternatives are limited. Cleanup still occurs when possible, reducing disruption and improving resilience.

IBUtils2 Utility

2.23.0

PHY Plugin

Updated MGIR register.

Updated PDDR registers.

Updated PEMI registers.

Validation

Updated CA-CA routing.

Updated credit loop validation.

General

Updated SMP cable to avoid sending its information for NDR/XDR switches.

Added support for Q3400 and Q3200 port label explanation.

Added support for fetching XmitDiscardDetails counters for port 0 of switches.

OpenSM

5.24.0

General

Added support for sending SwitchPortStateTable MADs during light sweep to identify topology changes.

Removed the requirement for specifying FNM ports in topology file on planarized subnets.

Logging

Updated log messages related to APort validation.

Added port numbers to port recovery trap log messages.

Parameter Changes

| Parameter Name | Status | Type | Description |
|---|---|---|---|
| dfp_down_up_turns_mode | Update | Numeric | Changed default to 1 (disable down-up turns) |
| additional_mepi_force_devices | Update | String | Support "all" keyword |
| additional_gi_supporting_devices | Update | String | Support "all" keyword |
| light_sweep_spst | New | Boolean | Enable/disable sending SPST MAD during light sweeps (default TRUE) |
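To see which of the changed parameters an existing OpenSM configuration overrides, the file can be scanned for them. The sketch below assumes the usual opensm.conf layout of one whitespace-separated "name value" pair per line with "#" comments. The helper name and the sample excerpt are hypothetical and only illustrate the idea.

```python
# Parameters listed under "Parameter Changes" for OpenSM 5.24.0.
WATCHED = (
    "dfp_down_up_turns_mode",
    "additional_mepi_force_devices",
    "additional_gi_supporting_devices",
    "light_sweep_spst",
)


def read_params(conf_text, names=WATCHED):
    """Return {name: value} for watched parameters set in the text."""
    found = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] in names:
            found[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return found


# Hypothetical excerpt used only for illustration.
SAMPLE = """
# excerpt of an OpenSM configuration
dfp_down_up_turns_mode 1
light_sweep_spst TRUE
additional_gi_supporting_devices all
"""

if __name__ == "__main__":
    for name, value in read_params(SAMPLE).items():
        print(f"{name} = {value}")
```

Parameters that are absent from the result are not set in the file, so OpenSM would use its built-in default for them.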

Plugin

Version

Changes and New Features

REST-RDMA Plugin

1.0.0-39

N/A

NDT Plugin

1.1.1-24

Added support for XDR (plane ports).

UFM Telemetry Fluentd Streaming (TFS) Plugin

1.1.1-1

N/A

UFM Events Fluent Streaming (EFS) Plugin

1.0.0-6

N/A

UFM Bright Cluster Integration Plugin

1.0.0-3

N/A

IB Link Resiliency Plugin

1.1.4-1

New Features:

  • Added XDR support for deterministic failure detection

  • Added per tier link down thresholds

  • Added interface-name to event log

  • Updated the supported UFM REST APIs

Bug Fix:

  • Enhanced data retention policy handling

ClusterMinder Plugin

1.1.12

N/A

Sysinfo Plugin 

1.1.1

N/A

SNMP Plugin

1.0.0-3

N/A

Packet Level Monitoring Collector (PMC) Plugin

1.19.33

Lev

GNMI-Telemetry Plugin

1.3.7-3

New Features:

Bug Fixes:

  • Resolved "TelemetrySize" in inventory field issue when fetching a single telemetry endpoint (ref #4505247)

  • Resolved excessive memory consumption of gNMI (ref #4535756)

UFM Telemetry Manager (UTM) Plugin

1.21.3

Added the ability to run in SELinux and rootless environments.

UFM Consumer Plugin

1.0.0-16

N/A

Fast-API Plugin

1.0.5-2

N/A

UFM Light Plugin

1.1.0-2

N/A

Key Performance Indexes (KPI) Plugin

1.0.9-0

New Feature:

  • Added new telemetry KPIs

UFM Events Grafana Dashboard Plugin

1.0.2-0

Limitations:

  • Not supported on UFM Gen 2.0

  • FluentD fails with RHEL9 OS.

Log Streamer Plugin

1.0.1-2

N/A

GNMI NVOS Events Plugin

1.0.1-1

N/A

Unmanaged Switch Dump (USD) Plugin

1.0.1-0

N/A

NVLink Plugin

1.2.1-3

New Features:

Added support for NVLink Telemetry (NMX-T):

  • Viewing telemetry controller properties, including health status

  • Viewing telemetry configuration

  • Health status change events

The following distributions are no longer supported in UFM:

  • RH7.0-RH7.7 / CentOS7.0-CentOS7.7

  • SLES12 / SLES 15

  • EulerOS2.2 / EulerOS2.3

  • Ubuntu18.04

Deprecated Features:

  • Mellanox Care (MCare) Integration

  • UFM on VM (UFM with remote fabric collector)

  • Logical server auditing

  • The UFM high availability script - /etc/init.d/ufmha - is no longer supported

  • The UFM Multi-site portal feature is no longer supported. The Multi-Subnet feature can be used instead

  • As of UFM Enterprise v6.19.0, the Autonomous Link Maintenance (ALM) and PDR Deterministic plugins are no longer supported.

  • The GRPC-Streamer plugin is deprecated.

  • As of UFM Enterprise v6.18.0, UFM Agent discovery will be disabled by default, and managed switches will be discovered in-band

  • As of UFM Enterprise v6.18.0, the ibdiagpath diagnostic utility is deprecated

  • As of UFM Enterprise version 6.14.0, UFM Monitoring Mode is deprecated and is no longer supported

  • As of UFM Enterprise v6.12.0, the Logical Elements tab is removed

  • As of UFM Enterprise v6.23.0, the xdr_enabled flag will be removed. It is used to process the XDR fabric properly in UFM. This flag is set to "True" by default and does not affect UFM behavior on legacy fabrics.

  • Removed the following fabric validation tests: CheckPortCounters & CheckEffectiveBER

Note

In order to continue working with /etc/init.d/ufmha options, use the same options using the /etc/init.d/ufmd script.

For example:

Instead of using /etc/init.d/ufmha model_restart, use /etc/init.d/ufmd model_restart (on the primary UFM server).

Instead of using /etc/init.d/ufmha sharp_restart, use /etc/init.d/ufmd sharp_restart (on the primary UFM server).

The same applies to any other option that was supported by the /etc/init.d/ufmha script.
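For sites that keep these commands in runbooks or automation, the substitution described above can be applied mechanically. The following is a minimal sketch. The function name migrate_command is hypothetical, and the code only rewrites the script path and leaves the option unchanged, as the note describes.

```python
import shlex

OLD_SCRIPT = "/etc/init.d/ufmha"
NEW_SCRIPT = "/etc/init.d/ufmd"


def migrate_command(command):
    """Rewrite a command line so that it calls ufmd instead of ufmha."""
    tokens = shlex.split(command)
    # Replace only an exact match of the old script path; options and
    # any other tokens pass through untouched.
    tokens = [NEW_SCRIPT if tok == OLD_SCRIPT else tok for tok in tokens]
    return shlex.join(tokens)


if __name__ == "__main__":
    for old in (
        "/etc/init.d/ufmha model_restart",
        "/etc/init.d/ufmha sharp_restart",
    ):
        print(f"{old}  ->  {migrate_command(old)}")
```

The rewritten commands still need to be run on the primary UFM server.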

© Copyright 2025, NVIDIA. Last updated on Sep 3, 2025.