NVIDIA UFM Enterprise User Manual v6.19.5

Changes and New Features History

Note

The items listed in the table below apply to all UFM license types.

Feature

Description

Rev 6.18.0

UFM Reports Enhancements

Added an option for excluding unhealthy ports from UFM reports that are based on ibdiagnet (for example, the Fabric Health report and Fabric Validation tests). For more information, refer to Excluding Unhealthy Ports from Fabric Health Report.
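
As a rough illustration of what excluding unhealthy ports from a report involves, the sketch below filters a list of port records before they are summarized. The record layout (guid, port, healthy) and the function name exclude_unhealthy are assumptions made for this example only; they are not the ibdiagnet or UFM data model.

```python
from typing import Iterable


def exclude_unhealthy(ports: Iterable[dict]) -> list[dict]:
    """Return only the port records that are not marked unhealthy.

    A record without a "healthy" key is treated as healthy (an assumption
    of this sketch, not documented UFM behavior).
    """
    return [p for p in ports if p.get("healthy", True)]


# Hypothetical port records used only to exercise the filter.
ports = [
    {"guid": "0x0002c90300a1b2c0", "port": 1, "healthy": True},
    {"guid": "0x0002c90300a1b2c0", "port": 2, "healthy": False},
    {"guid": "0x0002c90300a1b2d0", "port": 1},
]

report_ports = exclude_unhealthy(ports)
```

Here report_ports keeps the first and third records, so any report built from it no longer counts the port that was flagged unhealthy.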

Telemetry Enhancements

Added support for Egress Queue depth indications (as part of UFM secondary telemetry instance). For more information, refer to Exposing Performance Histogram Counters.

Added support for Extended Port VL Xmit Time Congestion counters (as part of UFM secondary telemetry instance).

UFM Configuration Adjustments

Added an option to set the UFM configuration automatically based on fabric size (large scale or small scale). For more information, refer to Adjusting UFM Configuration Files Based on Fabric Size.

UFM Container Timezone

The UFM Container has been updated to operate in the host machine's time zone instead of UTC.

UFM Events

Added the ability to update thresholds, severities, and durations (TTL) for selected UFM Events.

Added a new UFM event for indicating asymmetric Adaptive Routing (based on SM trap). For more information, refer to Appendix - Supported Port Counters and Events.
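
The three adjustable properties named above (threshold, severity, and TTL) can be pictured as fields of a per-event policy record. The following is a minimal sketch under stated assumptions: the EventPolicy class, the update_policy helper, the event name, and the list of severity labels are all hypothetical and do not reflect the UFM event policy format or REST API.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Severity labels assumed for this sketch.
ALLOWED_SEVERITIES = ("Info", "Warning", "Minor", "Critical")


@dataclass(frozen=True)
class EventPolicy:
    name: str
    threshold: float
    severity: str
    ttl_seconds: int


def update_policy(
    policy: EventPolicy,
    *,
    threshold: Optional[float] = None,
    severity: Optional[str] = None,
    ttl_seconds: Optional[int] = None,
) -> EventPolicy:
    """Return a copy of the policy with only the supplied fields changed."""
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if ttl_seconds is not None and ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    requested = {
        "threshold": threshold,
        "severity": severity,
        "ttl_seconds": ttl_seconds,
    }
    changes = {key: value for key, value in requested.items() if value is not None}
    return replace(policy, **changes)


policy = EventPolicy(
    name="example_event", threshold=5.0, severity="Warning", ttl_seconds=300
)
updated = update_policy(policy, severity="Critical", ttl_seconds=600)
```

Returning a new record instead of mutating the old one keeps the previous settings available for comparison or rollback; that is a design choice of the sketch, not a statement about how UFM stores its policies.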

Topology Changes Reports Enhancements

Enhanced the topology change indication from the master topology and enabled a quick drill-down to the associated topology change report. For more information, refer to Topology Compare Tab and Events & Alarms.

Multi-Subnet UFM

Added support for running UFM Fabric Validation tests from the UFM Multi-Subnet Consumer. For more information, refer to Multi-Subnet UFM.

UFM Docker Container Deployment

Added support for deploying UFM as a Docker container on Oracle Linux 8. For more information, refer to Installation Notes.

UFM-HA

HA Deployment: Added support for deploying UFM HA on Ubuntu 24.04.

HA Configuration: Added configurable failover criteria (management interface loss-of-link).

UFM System Dump Analyser

Introduced an internal debugging tool for more efficient analysis of UFM system dumps.

Plugins

REST-RDMA Plugin

Added support for client certificate authentication when communicating between the client and the REST over RDMA plugin server.

UFM Light Plugin

Added support for UFM Light Plugin to create a reduced UFM model and deliver a high-performance REST API.

Key Performance Indexes (KPI) Plugin

Added support for the KPI plugin which periodically collects telemetry metrics and topology data from one or multiple UFM Telemetry and UFM clusters to calculate high-level Key Performance Indicators (KPIs).

ClusterMinder Plugin

Added support for the ClusterMinder plugin, which collects telemetry data from multiple data sources and aggregates, streams, and visualizes it.

Packet Level Monitoring Collector (PMC) Plugin

Added the option to collect PHY link-down event indications through fast-recovery notification channels.

UFM Plugins Management

Added support for UFM plugin management using the manage_ufm_plugins.sh script.

Plugins Bundle

Added support for a single deployment of plugins to extend functionalities of the UFM ecosystem.

REST APIs

UFM-Forge Integration

Added support for setting SM resource limitation. For more information, refer to the Physical-Virtual GUID Mapping REST API.

SHARP Jobs Performance Analysis

Added a new REST API that exposes SHARP job statistics data. For more information, refer to NVIDIA SHARP REST API.

UFM Logging

Added caller (IP Address) and duration logging info for all REST API calls.
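
To make the two logged fields concrete, the sketch below wraps a request handler so that every call records the caller address and the elapsed time. It is a generic illustration and not UFM's implementation: the logger name rest_access, the decorator log_call, the handler get_version, and the log line format are all assumptions.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("rest_access")


def log_call(handler):
    """Wrap a handler so each call logs the caller IP and its duration."""

    @wraps(handler)
    def wrapper(caller_ip: str, *args, **kwargs):
        start = time.perf_counter()
        try:
            return handler(caller_ip, *args, **kwargs)
        finally:
            # Runs on success and on error, so failed calls are logged too.
            duration_ms = (time.perf_counter() - start) * 1000.0
            logger.info(
                "caller=%s handler=%s duration_ms=%.2f",
                caller_ip,
                handler.__name__,
                duration_ms,
            )

    return wrapper


@log_call
def get_version(caller_ip: str) -> dict:
    # Hypothetical handler standing in for any REST endpoint.
    return {"status": "ok"}
```

Calling get_version("10.0.0.5") returns the handler's result unchanged and emits one log record that begins with caller=10.0.0.5 handler=get_version.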

UFM Version API Enhancement

Added a REST API to retrieve the versions of major UFM components and enabled plugins.

Rev 6.17.0

XDR Support

Added XDR support readiness (based on XDR setup simulations only).

Added support for planarized networks in the UFM network topology.

Switch NVOS Support

Added support for NVOS switches in UFM.

Device Access

Added the ability to record two sets of switch login credentials on UFM. Refer to Device Access.

UFM Authentication Server

The authentication server is enabled by default. Refer to Configurations of the UFM Authentication Server.

InfiniBand Cluster Procedures Automation

Upon UFM startup, the following procedures are initiated:

Secondary Telemetry

  • Added the "rq_general_error" field to support retrieving the number of packet drops due to MPR mismatch.

  • Added support for reporting cable length information for NDR optic cables.

  • Added support for retrieving cable information for downed ports.

  • Added switch/module power usage data in UFM telemetry.

For more information, refer to Low-Frequency (Secondary) Telemetry Fields.

Events Simulation

Added the ability to simulate any event from the Events policy tab. Refer to Events Policy Simulation.

Unhealthy Port Enhancement

Added the ability to display valid unhealthy port information (eliminating non-zero port values) when added manually. Refer to Unhealthy Ports Window.

UFM High Availability

Added support in UFM HA for configuring IPv4 and IPv6 concurrently for the virtual IP address. Refer to UFM High-Availability User Guide.

REST APIs

Unhealthy Ports REST API: Added the ability to return device state (healthy/unhealthy). Refer to NVIDIA UFM Enterprise REST API Guide → Unhealthy Ports REST API.

Systems REST API: Added switch/module power usage data in UFM telemetry. Refer to NVIDIA UFM Enterprise REST API Guide → Systems REST API.

Plugins

Enhanced Plugins management infrastructure.

GNMI-Telemetry Plugin: Added support for streaming gNMI events and restricted authentication via client SAN pinning/filtering on the gNMI plugin server-side.

UFM Telemetry Manager (UTM) Plugin: Added a new plugin to monitor and manage running UFM Telemetry instances.

UFM Consumer Plugin: Added a new plugin that serves as a Multi-Subnet consumer within UFM, offering all the functionalities available for Multi-Subnet UFM.

PDR Deterministic Plugin: Updated high BER analysis with the up-to-date high BER algorithm.

Autonomous Link Maintenance (ALM) Plugin: Added the following capabilities:

  • Model configuration

  • Configure to reset both ports after isolation

  • Reflect model performance to the user

UFM Telemetry Fluentd Streaming (TFS) Plugin: Added an option to the TFS to stream the data using the CLX C streamer instead of the Python streamer.

Rev 6.16.0

Syslog Streaming

Added the option to set the UFM syslog streaming facility. For more information, refer to Configuring Syslog.

Switch Cables REST API

Added the option to query specific switch cables (using Ports API).

Switch Power Information

Added support for switch and module power usage data in UFM telemetry and REST API. For more information, refer to Devices Window and Inventory Window.

UFM Data Streaming

Added the ability to change the UFM Data streaming log facility. For more information, refer to Configuring Syslog and Configuring UFM Logging.

Kerberos Authentication

Added support for Kerberos authentication, a strong network authentication protocol for client-server applications. For more information, refer to Kerberos Authentication and Enabling Kerberos Authentication.

SM Settings

Changed the default maximum number of VLs to 2 (VL0 – VL1). For more information, refer to Appendix – UFM Subnet Manager Default Properties.

Cable Management

Added support for showing transceiver information for downed links. For more information, refer to Cables Window and Network Map.

Secondary Telemetry

Added the secondary_slvl_support flag and information on the default counters. For more information, refer to Secondary Telemetry.

Congestion Control

Added support for SM congestion control settings. For more information, refer to Appendix - OpenSM Configuration Files for Congestion Control.

UFM HA

Enhanced reliability and added support for setting UFM HA on LVM (Logical Volume Manager). For more information, refer to UFM High-Availability Documentation.

Plugins

Packet Mirroring Collector (PMC) Plugin: Added support for an event on the PF indicating that a QP closed with an error on any other GVMI/VF. For more information, refer to Packet Level Monitoring Collector (PMC) Plugin.

PDR Deterministic Plugin: Updated instructions. For more information, refer to PDR Deterministic Plugin.

GNMI-Telemetry Plugin: Added gNMI telemetry streaming support, including secured-mode streaming. For more information, refer to GNMI-Telemetry Plugin.

NDT Plugin (Subnet Merger): Added the option to validate the extended fabric using the cable validation tool. For more information, refer to the NDT Plugin.

© Copyright 2024, NVIDIA. Last updated on Jan 7, 2025.