NVIDIA UFM Enterprise User Manual v6.19.0

Changes and New Features

This section lists the new and changed features in this software version.

Note

For an archive of changes and features from previous releases, please refer to Changes and New Features History.

Note

XDR-related features were delivered at the Alpha level (XDR readiness only) and are scheduled to reach the Beta level by the November 2024 release.

Feature

Description

Telemetry Enhancements

Added the ability to zoom into XDR aggregated ports and view telemetry data for each related plane port. For more information, refer to XDR Per-Plane Zoom-In.

Added integration with the UTM plugin to avoid intermittent zero values in port counters. Refer to Telemetry.

Added support for automatic handling of telemetry discovery in case of topology changes.

Managing Unhealthy Ports in XDR IB Clusters

Added the ability to set XDR aggregated ports as healthy or unhealthy. For more information, refer to Unhealthy Ports Window.

Switch Management via Web UI

Added a configurable option for accessing managed switch CLI and Web-UI via UFM Web-UI. For more information, refer to Devices Window.

Switch In-Service Upgrade Events

Added support for two new events that report the isolating and de-isolating actions of a switch in-service upgrade. Refer to Threshold-Crossing Events Reference.

Global API for UFM Plugins Management

Added API for managing UFM plugins via UFM Multi-subnet consumer. Refer to Multi-Subnet UFM.

Module Temperature Events Update

Updated the naming and thresholds of the Module Temperature threshold reached events. Refer to Appendix - Supported Port Counters and Events.

Persistency for Certificate Authorities (CAs) Certificates

Added support for CA certificate persistency, ensuring that the same CA certificates are used in case of UFM HA failover/takeover. Refer to Setting Up SSL and CA Certificates in UFM.

SM Configuration Validation

Added support for automatic validation of SM configuration on HCAs. The validation can also be run on demand via Fabric Validation. Refer to Events & Alarms → SM Configuration Events.

Supported Operating Systems

Added support for UFM HA on the Ubuntu 24.04 and Debian 10 operating systems. Refer to Installation Notes.

Added support for UFM on CentOS Stream 10.

Podman Support

Added Podman support for Oracle. Refer to Podman Installation.

Plugin Health Test Enhancement

Updated the health test of the REST over RDMA plugin to test if the plugin is operating properly. For more information, refer to UFM Server Health Monitoring.

Software Upgrade - API Request Update

Extended the password length limitation from 20 to 64 characters for the following UFM actions: software upgrade, firmware upgrade, OFED upgrade, and profile update.

OpenSM static topology configuration REST API

Added support for managing OpenSM static topology configuration using REST API. Refer to the UFM REST API Documentation.

Plugins Changes and New Features

Plugin

Version

Changes and New Features

REST-RDMA Plugin

1.0.0-33

N/A

NDT Plugin

1.1.1-17

N/A

UFM Telemetry Fluentd Streaming (TFS) Plugin

1.0.15-2

As of v1.0.15-2, the plugin pushes telemetry data to FluentD. A new flag has been introduced to enable or suppress this feature, with the default value set to true.

UFM Events Fluent Streaming (EFS) Plugin

1.0.0-6

N/A

UFM Bright Cluster Integration Plugin

1.0.0-3

N/A

UFM Cyber-AI Plugin

2.10.0-8

N/A

IB Link Resiliency Plugin

1.0.0-3

Introduced the new IB Link Resiliency plugin, which merges the ALM and PDR plugins.

ClusterMinder Plugin 

1.1.7

Introduced the following changes:

Switch:

  1. Support NVOS switches

  2. Add amBER data for Cumulus switches

HOST DTS ad-hoc mode:

  1. Collect data for "GeneralInfo" and "FirmwareConfigInfo"

Data sources:

  1. Add option to add label while adding new data sources

  2. Add option to update existing data sources

  3. Add option to remove multiple data sources

Global changes:

  1. Histogram export to Excel file

  2. Define Cluster Name via UI

  3. Component view for group differences

  4. Suspected error tab for all the services except DTS

Sysinfo Plugin 

1.1.1

N/A

SNMP Plugin

1.0.0-3

N/A

Packet Level Monitoring Collector (PMC) Plugin

1.19.10

Bug Fixes:

  • High and critical severity vulnerability in waitress

  • Plugin's failure to start web server when PMC process fails to run

PDR Deterministic Plugin

1.0.5-2

As of UFM Enterprise v6.19.0, the PDR plugin is no longer supported.

Autonomous Link Maintenance (ALM) Plugin

2.9.1-2

As of UFM Enterprise v6.19.0, the ALM plugin is no longer supported.

GNMI-Telemetry Plugin

1.2.12-5

New Features:

  • Notification Trigger: When the flag include_all_data=true is set, a notification is generated only if at least one of the subscribed counters changes

  • Partition Format: The partition now supports only the format nvidia/ib/1/3/guid[guid=*]/port[port_number=*]/amber/*

  • Strict Mode Control: A new flag, strict_collected_counters, manages strict mode for multi-path requests and defaults to false. If set to true, an error is returned indicating that the path is illegal, and no subscription stream is started. If set to false, a message is logged and the existing counters are sent

Bug Fixes:

  • Fixed issue with onchange subscription: The headers for the first message when subscribing to an onchange should be a string, not an array of strings

  • Fixed issue with incorrect plugin version returned by capabilities
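The two outcomes described for the GNMI-Telemetry strict_collected_counters flag can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function resolve_paths, the exception IllegalPathError, and the assumption that a path counts as "illegal" when it is absent from the set of collected counters are all hypothetical and used for illustration only.

```python
import logging
from typing import Iterable, List, Set

logger = logging.getLogger("gnmi_strict_mode_sketch")


class IllegalPathError(Exception):
    """Hypothetical error for a requested path that cannot be served."""


def resolve_paths(requested: Iterable[str],
                  collected: Set[str],
                  strict_collected_counters: bool = False) -> List[str]:
    """Decide which paths of a multi-path request would be streamed.

    Assumption: a path is treated as illegal when it is not in `collected`.
    """
    requested = list(requested)
    missing = [path for path in requested if path not in collected]

    if missing and strict_collected_counters:
        # Strict mode: report the illegal path(s) and start no stream.
        raise IllegalPathError(f"illegal path(s): {missing}")

    # Non-strict mode: log a message and continue with what exists.
    for path in missing:
        logger.warning("path %s is not collected and is skipped", path)
    return [path for path in requested if path in collected]
```

With the default (false), a request for one available and one unavailable path yields only the available path; with the flag set to true, the same request raises the error and nothing is returned.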

UFM Telemetry Manager (UTM) Plugin

1.19.4

Bug Fixes:

  • Issue with primary and secondary session IDs not being recognized by the plugin

  • Issue with switch port status not being updated

UFM Consumer Plugin

1.0.0-16

N/A

Fast-API Plugin

1.0.0-2

N/A

UFM Light Plugin

1.1.0-2

Improved Topology API

Aligned new timestamp format with Telemetry.

Key Performance Indexes (KPI) Plugin

1.0.7-1

New Features:

  1. Introduced security updates

  2. Integrated the link flapping logic to the KPI plugin as a new KPI

Note

The items listed in the table below apply to all UFM license types.

Note

For bare metal installation of UFM, it is required to install MLNX_OFED 5.X (or newer) before the UFM installation.

Please make sure to use the UFM installation package that is compatible with your setup, as detailed in Bare Metal Deployment Requirements.

The following distributions are no longer supported in UFM:

  • RH7.0-RH7.7 / CentOS7.0-CentOS7.7

  • SLES12 / SLES15

  • EulerOS2.2 / EulerOS2.3

  • Ubuntu18.04

Deprecated Features:

  • Mellanox Care (MCare) Integration

  • UFM on VM (UFM with remote fabric collector)

  • Logical server auditing

  • The UFM high availability script - /etc/init.d/ufmha - is no longer supported

  • The UFM Multi-site portal feature is no longer supported. The Multi-Subnet feature can be used instead

  • As of UFM Enterprise v6.19.0, the Autonomous Link Maintenance (ALM) and PDR Deterministic plugins are no longer supported.

  • The GRPC-Streamer plugin is deprecated.

  • As of UFM Enterprise v6.18.0, UFM Agent discovery will be disabled by default, and managed switches will be discovered in-band

  • As of UFM Enterprise v6.18.0, the ibdiagpath diagnostic utility is deprecated

  • As of UFM Enterprise v6.14.0, UFM Monitoring Mode is deprecated and no longer supported

  • As of UFM Enterprise v6.12.0, the Logical Elements tab is removed

  • Removed the following fabric validation tests: CheckPortCounters & CheckEffectiveBER

Note

To continue using the options of the /etc/init.d/ufmha script, pass the same options to the /etc/init.d/ufmd script.

For example:

Instead of using /etc/init.d/ufmha model_restart, please use /etc/init.d/ufmd model_restart (on the primary UFM server)

Instead of using /etc/init.d/ufmha sharp_restart, please use /etc/init.d/ufmd sharp_restart (on the primary UFM server)

The same applies to any other option that the /etc/init.d/ufmha script supported.
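For sites that still have automation calling the old script, the substitution described in this note can be sketched in Python. The helper name translate_ufmha_command is hypothetical and not part of UFM; it only rewrites the command string, keeps every option unchanged, and does not execute anything. As stated above, the resulting command is meant to be run on the primary UFM server.

```python
import shlex

OLD_SCRIPT = "/etc/init.d/ufmha"  # no longer supported
NEW_SCRIPT = "/etc/init.d/ufmd"   # accepts the same options


def translate_ufmha_command(command: str) -> str:
    """Return the ufmd equivalent of a ufmha command line.

    Commands that do not invoke the old script are returned unchanged.
    """
    parts = shlex.split(command)
    if not parts or parts[0] != OLD_SCRIPT:
        return command
    # Swap only the script path; all options are passed through as-is.
    return shlex.join([NEW_SCRIPT] + parts[1:])


if __name__ == "__main__":
    print(translate_ufmha_command("/etc/init.d/ufmha model_restart"))
    print(translate_ufmha_command("/etc/init.d/ufmha sharp_restart"))
```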

© Copyright 2024, NVIDIA. Last updated on Nov 7, 2024.