NVIDIA UFM Enterprise User Manual v6.20.0

Changes and New Features History

Note

The items listed in the table below apply to all UFM license types.

Feature

Description

Rev 6.19.6

ConnectX-8 Support

Added support for ConnectX-8

Rev 6.19.0

Telemetry Enhancements

Added the ability to zoom into XDR aggregated ports and view telemetry data for each related plane port. For more information, refer to XDR Per-Plane Zoom-In.

Added integration with the UTM plugin to avoid intermittent port zero counter values. Refer to Telemetry.

Added support for automatic handling of telemetry discovery in case of topology changes.

Managing Unhealthy Ports in XDR IB Clusters

Added the ability to set XDR aggregated ports as healthy or unhealthy. For more information, refer to Unhealthy Ports Window.

Switch Management via Web UI

Added a configurable option for accessing managed switch CLI and Web-UI via UFM Web-UI. For more information, refer to Devices Window.

Switch In-Service Upgrade Events

Added support for two new events - isolating and de-isolating actions of switch in-service upgrade. Refer to Threshold-Crossing Events Reference.

Global API for UFM Plugins Management

Added API for managing UFM plugins via UFM Multi-subnet consumer. Refer to Multi-Subnet UFM.

Module Temperature Events Update

Updated the naming and thresholds of the Module Temperature threshold reached events. Refer to Appendix - Supported Port Counters and Events.

Persistency for Certificate Authorities (CAs) Certificates

Added support for CAs certificate persistency, ensuring the same CA certificates are used in case of UFM HA failover/takeover. Refer to Setting Up SSL and CA Certificates in UFM.

SM Configuration Validation

Added support for automatic validation of SM configuration on HCAs. The validation can be run on demand via Fabric Validation. Refer to Events & Alarms → SM Configuration Events.

Supported Operating Systems

Added support for UFM HA on the Ubuntu 24.04 and Debian 10 operating systems. Refer to Installation Notes.

Added support for UFM on CentOS Stream 10.

Podman Support

Added Podman support for Oracle. Refer to Podman Installation.

Plugin Health Test Enhancement

Updated the health test of the REST over RDMA plugin to test if the plugin is operating properly. For more information, refer to UFM Server Health Monitoring.

Software Upgrade - API Request Update

Extended the password length limitation from 20 to 64 characters for the following UFM actions: software upgrade, firmware upgrade, OFED upgrade, and profile update.
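The new limit can be checked on the client side before a request is submitted. The following is a minimal sketch, not part of UFM: the function name, the action identifiers, and the assumption that the limit is counted in characters are all hypothetical; only the 64-character limit and the list of affected actions come from the entry above.

```python
# Hypothetical client-side pre-check; not a UFM API.
# The limit of 64 (extended from 20) is taken from the release note above.
MAX_PASSWORD_LENGTH = 64

# Illustrative identifiers for the four actions named in the release note.
ACTIONS_WITH_EXTENDED_LIMIT = {
    "software_upgrade",
    "firmware_upgrade",
    "ofed_upgrade",
    "profile_update",
}


def password_fits_limit(action: str, password: str) -> bool:
    """Return True if the password is within the documented length limit.

    Assumes the limit is counted in characters (an assumption, not
    something the release note states).
    """
    if action not in ACTIONS_WITH_EXTENDED_LIMIT:
        raise ValueError(f"unknown action: {action}")
    return len(password) <= MAX_PASSWORD_LENGTH
```

A caller could run this check before building the request body and report an over-long password locally instead of waiting for the server to reject it.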

OpenSM static topology configuration REST API

Added support for managing OpenSM static topology configuration using REST API. Refer to the UFM REST API Documentation.

Rev 6.18.0

UFM Reports Enhancements

Added an option for excluding unhealthy ports from UFM reports based on ibdiagnet (for example, the Fabric Health report and Fabric Validation tests). For more information, refer to Excluding Unhealthy Ports from Fabric Health Report.

Telemetry Enhancements

Added support for Egress Queue depth indications (as part of UFM secondary telemetry instance). For more information, refer to Exposing Performance Histogram Counters.

Added support for Extended Port VL Xmit Time Congestion counters (as part of UFM secondary telemetry instance).

UFM Configuration Adjustments

Added the option for auto-setting of UFM configuration based on fabric size (large scale, small scale). For more information, refer to Adjusting UFM Configuration Files Based on Fabric Size.

UFM Container Timezone

The UFM Container has been updated to operate in the host machine's time zone instead of UTC.

UFM Events

Added the ability to update thresholds, severities, and durations (TTL) for selected UFM Events.

Added a new UFM event for indicating asymmetric Adaptive Routing (based on SM trap). For more information, refer to Appendix - Supported Port Counters and Events.

Topology Changes Reports Enhancements

Enhanced the topology change indication from the master topology and enabled a quick drill-down to the associated topology change report. For more information, refer to Topology Compare Tab and Events & Alarms.

Multi-Subnet UFM

Added support for running UFM Fabric validation Tests from UFM Multi-Subnet Consumer. For more information, refer to Multi-Subnet UFM.

UFM Docker Container Deployment

Added support for deploying UFM as a docker on Oracle Linux 8. For more information, refer to Installation Notes.

UFM-HA

HA Deployment: Added support for deploying UFM HA on Ubuntu 24.04.

HA Configuration: Added configurable failover criteria (management interface loss-of-link).

UFM System Dump Analyser

Introduced an internal debugging tool for more efficient analysis of UFM system dumps.

Plugins

REST-RDMA Plugin

Added support for client certificate authentication when communicating between the client and the REST over RDMA plugin server.

UFM Light Plugin

Added support for UFM Light Plugin to create a reduced UFM model and deliver a high-performance REST API.

Key Performance Indexes (KPI) Plugin

Added support for the KPI plugin which periodically collects telemetry metrics and topology data from one or multiple UFM Telemetry and UFM clusters to calculate high-level Key Performance Indicators (KPIs).

DTS Plugin

Added support for the ClusterMind plugin which collects telemetry data from multiple data sources and aggregates, streams and visualizes the backend.

Packet Level Monitoring Collector (PMC) Plugin

Added the option to collect PHY link-down event indications through fast-recovery notification channels.

UFM Plugins Management

Added support for UFM plugin management using the manage_ufm_plugins.sh script.

Plugins Bundle

Added support for a single deployment of plugins to extend functionalities of the UFM ecosystem.

REST APIs

UFM-Forge Integration

Added support for setting SM resource limitation. For more information, refer to the Physical-Virtual GUID Mapping REST API.

SHARP Jobs Performance Analysis

Added a new REST API that exposes SHARP Job statistics data. For more information, refer to NVIDIA SHARP REST API.

UFM Logging

Added caller (IP Address) and duration logging info for all REST API calls.

UFM Version API Enhancement

Added a REST API to retrieve the versions of major UFM components and enabled plugins.

Rev 6.17.0

XDR Support

Added XDR support readiness (based on XDR setup simulations only).

Added support for planarized networks in the UFM network topology.

Switch NVOS Support

Added support for NVOS switches in UFM.

Device Access

Added the ability to record two sets of switch login credentials on UFM. Refer to Device Access.

UFM Authentication Server

The authentication server is enabled by default. Refer to Configurations of the UFM Authentication Server.

InfiniBand Cluster Procedures Automation

Upon UFM startup, the following procedures are initiated:

Secondary Telemetry

  • Added the "rq_general_error" field to support retrieving the number of packet drops due to MPR mismatch.

  • Added support for reporting cable length information for NDR optic cables.

  • Added support for retrieving cable information for downed ports.

  • Added switch/module power usage data in UFM telemetry.

For more information, refer to Low-Frequency (Secondary) Telemetry Fields.

Events Simulation

Added the ability to simulate any event from the Events policy tab. Refer to Events Policy Simulation.

Unhealthy Port Enhancement

Added the ability to display valid unhealthy port information (eliminating non-zero port values) when added manually. Refer to Unhealthy Ports Window.

UFM High Availability

Added support in UFM HA for configuring IPv4 and IPv6 concurrently for the virtual IP address. Refer to UFM High-Availability User Guide.

REST APIs

Unhealthy Ports REST API: Added the ability to return device state (healthy/unhealthy). Refer to NVIDIA UFM Enterprise REST API Guide → Unhealthy Ports REST API.

Added switch/module power usage data in UFM telemetry. Refer to NVIDIA UFM Enterprise REST API Guide → Systems REST API.

Plugins

Enhanced Plugins management infrastructure.

GNMI-Telemetry Plugin: Added support for streaming gNMI events and restricted authentication via client SAN pinning/filtering on the gNMI plugin server-side.

UFM Telemetry Manager (UTM) Plugin: Added a new plugin to monitor and manage running UFM Telemetry instances.

UFM Consumer Plugin: Added a new plugin that serves as a Multi-Subnet consumer within UFM, offering all the functionalities available for Multi-Subnet UFM.

PDR Deterministic Plugin: Updated high BER analysis with the up-to-date high BER algorithm.

Autonomous Link Maintenance (ALM) Plugin: Added the following capabilities:

  • Model configuration

  • Configure to reset both ports after isolation

  • Reflect model performance to the user

UFM Telemetry Fluentd Streaming (TFS) Plugin: Added an option to the TFS to stream the data using the CLX C streamer instead of the Python streamer.

© Copyright 2025, NVIDIA. Last updated on Feb 11, 2025.