NVIDIA UFM-SDN Appliance User Manual v4.11.0

Changes and New Features History

Feature

Description

v4.10.0 (UFM Enterprise v6.11.0)

UFM Discovery and Device Management

  • In-band autodiscovery of switches' IP addresses using ibdiagnet

  • Discovering the device's PSID and FW version using ibdiagnet by default instead of using an SM vendor plugin

CPU Affinity

Enables the user to control the CPU affinity of UFM's major processes

gRPC API

Added support for streaming UFM REST API data over gRPC as part of a new UFM plugin. Refer to GRPC-Streamer Plugin.

Telemetry

  • Added support for flexible counters infrastructure (ability to change counter sets that are sampled by the UFM)

  • Updated the set of available counters for Telemetry. The General counters Raw BER, Effective BER, and Device Temperature were removed from the default view and are now available through the secondary telemetry instance. Refer to Secondary Telemetry.

EFS UFM Plugin

Added support for streaming UFM events data to a FluentD destination as part of a new UFM plugin. Refer to UFM Telemetry Fluent Streaming (TFS) Plugin.

General UI Enhancements

• Displayed columns of all tables are persistent per user, with the option to restore defaults. Refer to Displayed Columns
• Improved look and feel in Network Map. Refer to Network Map
• Added Reveal Uptime to the general tab in the devices information tabs. Refer to Device General Tab

REST APIs

Added support for PKey filtering for default session data. Refer to Get Default Monitoring Session Data by PKey Filtering.

Added support for filtering session data by groups. Refer to Monitoring Sessions REST API.

Added support for resetting all unhealthy ports at once. Refer to Mark All Unhealthy Ports as Healthy at Once.

Added support for presenting system uptime in UFM REST API. Refer to Systems REST API.

Deployment Installation

UFM installation is now based on Conda-4.12 (or newer) for deploying the Python v3.9 environment and third-party packages

MLNX_OFED Package

Upgraded the MLNX_OFED version to v5.7-1.0.2.0

Diagnostics Utilities

Added CLI commands for new diagnostic utilities

NVIDIA SHARP Software

Updated NVIDIA SHARP software version to v3.1.1.

UFM Logical Elements

UFM Logical Elements (Environments, Logical Servers, Networks) views are deprecated and will no longer be available starting from UFM v4.11.0 (January 2023 release)

v4.9.0 (UFM Enterprise v6.10.0)

UFM Package

Integrated with UFM Enterprise v6.10.0

System health enhancements

Added support for the periodic fabric health report, with the ports' results reflected in UFM's dashboard

UFM Plugins Management

Add support for plugin management via UFM web UI

UFM Extended Status

Added REST API for exposing UFM readiness

Failover to Other Ports

Add support for SM and UFM Telemetry failover to other ports on the local machine

UFM Appliance Upgrade

Added a set of REST APIs for supporting the UFM Appliance upgrade

Configuration Audit

Add support for tracking changes made in major UFM configuration files (UFM, SM, SHARP, Telemetry)

UFM Plugins

Add support for new SDK plugins

Telemetry

Add support for statistics processing based on UFM telemetry csv format

UFM High Availability Installation

UFM high availability installation has changed and is now based on an independent high availability package, which should be deployed in addition to the UFM Enterprise standalone package.
For further details about the new UFM high availability installation, refer to Installing UFM Server Software for High Availability.

v4.8.0 (UFM Enterprise v6.9.0)

NDR Support

Full end-to-end (E2E) NDR support, including the ConnectX-7 HCA family (discovery and monitoring)

Cable FW Burn

Add support for multiple switches with multiple FW images burning

Events

Add support for monitoring and alerting on cable transceiver temperatures over threshold

Improve SM traps handling (offloading SM traps handling to a separate process)

Add option for setting events persistency (keeping max last X events) for showing upon UFM startup

Add option for consolidating similar events on the UFM Web UI Events Log View

SHARP

Add support for failover to secondary bond port in case of IB interface failure

Add option to override SHARP smx_sock_interface based on UFM fabric_interface (gv.cfg)

Add option to set SHARP AM ib_port_guid based on UFM fabric_interface (gv.cfg)

SM

Add support for tracking SM configuration changes (configuration history)

Add support for pkey assignment validation (for user defined pkey assignment only)

Client Certificate Authentication

Add option to push bootstrap certificate to the UFM via REST API

MFT Integration Enhancement

Add support for MFT based operation (FW burning, cable info) while m_key/vs_key are configured on SM

UFM Health

Add option for users to add customized health tests based on scripts (Python/Bash)

Web UI Enhancements

Add support for user defined modular UFM dashboard views (based on available list of pre-defined panels)

Add support for UFM dashboard timeline (for viewing historical dashboard views)

Enhance the dashboard inventory view for showing elements (HCAs, Switches, Cables, Gateways, Routers) by version

Add support for user defined modular UFM telemetry persistent dashboard (Telemetry View)

Add option for viewing Web client data based on local client time or UFM server time

Add option to select UFM look and feel between dark mode and light mode (default is light mode)

Add support for hierarchical view when presenting the network map elements.

Add option for selecting the displayed columns for all data tables.

Add option for exporting all table data into CSV (not only the current displayed page data)

Improved view of the ports table (port name, speed and width)

Add option to show disabled/down ports

Add support for Web UI usage statistics collection

Add option for sending test email

UFM Plugins

Add support for running UFM plugins within UFM docker container

Add support for AHX monitoring plugin

Add support for TFS (Telemetry Fluentd streaming) plugin

v4.7.0 (UFM Enterprise v6.8.0)

IButils2 Package

Upgraded the ibdiagnet version integrated with UFM Appliance to v2.8.2 with temperature alarm/warning reporting

UFM Telemetry

Changed the Telemetry infrastructure from UFM Telemetry docker container to UFM Telemetry bare metal

Performance improvements for supporting telemetry on large scale fabrics (up to 216,000 ports fabric)

Live sessions enhancements – adding support for multiple telemetry sessions based on one UFM Telemetry instance

Add support for collecting historical telemetry (all fabric ports counters) by default

Unhealthy Port

Add option (configurable) for automatically isolating ports that were detected with high BER

Add option to present unhealthy port table by the connection type (switch-switch or switch-host)

Add option to mark selected device as unhealthy

Add context menu options for selected unhealthy ports

UFM Plugins – REST over RDMA

Add support for REST API over RDMA plugin (allowing execution of UFM REST API requests over the InfiniBand fabric)

UFM Plugins – NDT

Add support for NDT (CSV formatted topology) comparison with UFM fabric detected topology

Fabric Validation Tests

Add context menu options for selected results of fabric validation tests based on UFM model objects (Devices and Ports).

Add support for Socket-direct mode reporting (Inventory)

Add support for SHARP Aggregation Manager health tests

Add support for Tree Topology Analysis in UFM

Events Policy

Add new category for Events Policy – Security

Add new UFM events indicating Pkey assignment of GUIDs and removal of GUIDs from Pkey

Add new UFM events which are triggered when duplicated node or port GUIDs are detected in the fabric

Add new event for indicating switch down reported by SM

UFM SDK

Add option to get topology via UFM REST API and stream it out to an external destination

Virtualization

Add option to assign selected virtual ports to a specified PKEY (via UFM Web UI)

Cable Information

Showing Link grade in Cable info

Network Map

Add support for network map topology persistency on server side

UFM Web UI

Add option to copy and paste table content (GUIDs and LIDs) via UFM Web UI

UFM Authentication

Add support for token based authentication

SM Configuration

Setting AR (Adaptive Routing) Up Down as the default routing configuration in UFM/SM (for new UFM installations)

UFM REST API

Add support for the CloudX API in UFM for OpenStack integration, allowing auto-provisioning of the InfiniBand fabric

NDR Support

Add support for discovering and monitoring NVIDIA NDR switches

UFM Plugins

Added support for deploying and running UFM plugins.

MLNX_OFED Package

Upgraded the MLNX_OFED version integrated with UFM Appliance to v5.5-1.0.3

v4.6.0 (UFM Enterprise v6.7.0)

Auto-isolation of high BER ports

Added support for automatically isolating ports with high BER (monitoring is based on Symbol BER).

Periodic ibdiagnet

Added ability to execute ibdiagnet periodically and collect the generated logs

UFM Telemetry-based monitoring

Changed UFM's monitoring mechanism to be based on UFM Telemetry instead of IBPM (for both default and live telemetry sessions)

IB router & IB gateway monitoring

Added support for monitoring of InfiniBand router and gateway ports

SHARP aggregation manager events

Added support for showing SHARP aggregation manager events in UFM

Periodic topology check

Added support for periodic run of topology comparisons and reporting of topology changes against preset topology

Visual topology difference

Added option to view visual-representation of topology changes in the network topology map (as compared to a "master" or user-defined topology)

System dump for externally managed switches

Added support for collecting system dump for externally managed switches

Syslog settings via web UI

Added support for configuring UFM syslog settings via UFM web UI

Upgrade for group of switches

Added support for software/firmware upgrade for a group of switches

NDR switches readiness

Added support for discovery and management of NDR switches

Transition to file-based storage

Transitioned from MySQL to SQLite DB for persistent model objects

Counters over threshold

Added support for showing telemetry counters over a predefined threshold when using historical statistic collection

HDR cables burning

Added support for burning HDR cable transceivers for selected switches

Dragonfly+ topology analysis

Added fabric validation test to validate an existing Dragonfly+ topology

Web UI enhancements

  • Context switch for events & alarms

  • Zoom-in and filtering options for network map

  • Updated live session members

Uploading ibdiagnet results

Added option to upload periodic ibdiagnet results to any remote destination over SCP or SFTP

Telemetry API enhancements

Added option to retrieve short counter format or specified counters only for monitoring session data REST API

High BER ports list

Added support for displaying all ports with high BER (from the Ports view) as well as the ability to mark them as unhealthy

OpenSM GUID list

Added support for new OpenSM traps (UFM Events) which indicate activity of an unexpected OpenSM in the fabric

REST API

Links API has been updated with two additional fields: source_port_name, destination_port_name.

v4.5.1 (UFM Enterprise v6.6.2)

SHARP Topology API

Added the ability to query the SHARP topology API regardless of the SHARP reservation mode

v4.5.0 (UFM Enterprise v6.6.1)

MLNX_OFED Package

Upgraded the MLNX_OFED version integrated with UFM Appliance to v5.2-1.0.4

Licensing

Added support for UFM subscription license

Sysdump

Added ability to perform sysdump on internally managed switches

Added ability to perform sysdump on hosts

Event streaming

Added ability to stream UFM events via FluentBit plugin

Virtualization

Added support for port virtualization including virtualization events

Telemetry

Added support for new telemetry capabilities and showing historical data reports

Multiple rail optimization

Added support for multiple rail optimization validation test

MCARE

Added support for MCARE integration with UFM over REST API

Log history

Added support for showing history of UFM, OpenSM, and Events logs

Multi-HCA grouping

Added support for grouping Windows Multi-HCA

Congestion map

Added support for traffic and congestion map for a user-defined port group

IB Gateway

Added support for IB Gateway discovery

IB Router

Added support for IB Router discovery

Topology comparison

Enhanced topology diff reports

Look and feel

Updated look and feel to NVIDIA theme

Static SM LID upon failover (Static SM-LID)

Preserving SM LID upon UFM restart/failover/takeover

Replay SHARP reservations

Added support for replay of persistent SHARP allocation upon SHARP startup

Large-scale virtualization support

Added virtualization support for up to 1M virtual ports

BlueField DPUs support

Added support for management of BlueField DPU devices in the fabric

Topology map enhancements

Added support for selection and running of actions on multiple elements in network map

v4.4.0 (UFM Enterprise v6.5.2)

New licensing mechanism

Added support for the new UFM subscription license (keeping backward compatibility with old license file)

UFM Events Forwarder

UFM events are forwarded to a Fluentd container

System dump for switches and hosts

Added support for running and uploading system dump from internally managed switches and hosts via UFM web UI

Pagination

Added support for paginating web UI tables for better responsiveness

PKey versioning

Added support for PKey versioning to indicate PKey related changes

Integration with MCare

Add support for UFM-Mellanox Care integration over UFM REST APIs

Large scale support

Improved the handling of IB Performance Monitoring (IBPM) statistic data and generation of events in UFM for large scale fabrics

Offloaded handling of topology changes of large scale fabrics to a new process in UFM

Added CLI command to set the HCA VL15 port receive buffer size

Added CLI command to set the maximum number of SMPs sent in HCA

UFM Safe Startup

Set all UFM ports to full membership upon UFM startup so that all UFM IB applications (e.g. OpenSM, IBPM, ibdiagnet) have full access to the IB fabric

IBPM Resiliency

If UFM's fabric interface is configured as a bond, UFM restarts the IBPM on the secondary interface (the new active interface) if the active interface fails

MLNX_OFED Package

Upgraded the MLNX_OFED version integrated with UFM Appliance to v5.1-2.3.7

Large scale support improvements

Added support for running UFM in large scale setup (up to 40K nodes)

Multi-port SM

Added an option to run UFM-SM on multiple pre-configured ports

Python3 support

Unified UFM code to run using Python 3

Cable lifecycle events

Added support for new cables lifecycle events (e.g. cable added, removed, changed location and duplicated)

Updating port speed via UFM

Added REST API to control the rate limit of physical and virtual ports

Enhanced SM configuration via UFM

Added REST API for updating SM congestion control and adaptive routing parameters

IB Gateway support

Added support for discovery and monitoring of IB Gateway

Ports display

All disabled ports are now also presented for each device in the right ports tab

Externally managed switch reset option

Added support for resetting externally managed switches

MetroX-2 system support

Added support for MetroX-2 systems TQ8100-HS2F, TQ8200-HS2F

UFM-SHARP resources allocation integration

Added REST API to allocate and deallocate SHARP resources

UFM Multisite Portal

Single pane of glass to manage multiple UFMs in one console

Mlxlink support

Added option to display enhanced cable information for a selected port using mlxlink

MLNX_OFED Package

Upgraded the MLNX_OFED version integrated with UFM Appliance to v5.0-2.1.8.0

UFM Appliance Gen2.5 support

Added support for UFM Appliance Gen2.5 hardware

Docker containers support

Added support for Docker containers

UFM Telemetry container integration

Integrated UFM Telemetry container in UFM Appliance

v4.2.0 (UFM Enterprise v6.4)

AHX Monitoring

Added support for monitoring AHX devices

Security Enhancements

Admin and monitor user passwords must be configured during first boot. Default passwords are no longer created automatically.

Warning

To comply with California SB-327 regulations, the user must now enter a password manually as part of the initial wizard, and is not allowed to skip this step. Nevertheless, the user will be allowed to manually write in the default username and password (admin/admin or monitor/monitor).

Mellanox SHARP Support

Added support for Mellanox SHARP and HBA for Mellanox Quantum™ switches (SHARPv2)

Backend

  • Added support for a security alert and related actions on SA_Key violation

  • Added support for a security alert and related actions on SA DoS

  • Added support for configuring the rate limit per VF on EDR ports

UFM REST API

Added new UFM REST APIs. For the full list, please refer to the UFM REST API document.

New Web UI Functionalities

Expanded the functionality of the new web UI for InfiniBand management and operations to include:

  • Fabric Validation: Added support for running fabric validation tests

  • Network Configuration: Added support for setting QoS per network (PKey)

  • Monitoring Mode: Added support for Monitoring mode view

Logical Model in UFM web UI

The logical model enables managing the IB fabric based on business-oriented requirements. It treats the physical fabric topology as an abstraction.

Support for new predefined groups

Added three new predefined groups to UFM: nodes, director switches, and 1U switches.

Network Map Link Analysis

Enables traffic and error counters analysis based on the discovered topology links. This allows defining thresholds per counter, coloring all topology links according to those thresholds, and viewing specified counter information per link.
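The threshold-based link coloring described above can be pictured with a minimal sketch. Everything in it is hypothetical and for illustration only: the counter name, the threshold values, the color names, and the Link structure are assumptions and are not taken from UFM.

```python
from dataclasses import dataclass

# Hypothetical per-counter thresholds as (warning, critical); not UFM defaults.
THRESHOLDS = {"symbol_errors": (10, 100)}


@dataclass
class Link:
    """Illustrative stand-in for a discovered topology link."""
    source: str
    destination: str
    counters: dict


def link_color(link, counter, thresholds=THRESHOLDS):
    """Pick a color for a link by comparing one counter to its thresholds."""
    warning, critical = thresholds[counter]
    value = link.counters.get(counter, 0)
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


links = [
    Link("sw1:1", "sw2:1", {"symbol_errors": 0}),
    Link("sw1:2", "host1:1", {"symbol_errors": 25}),
    Link("sw2:3", "host2:1", {"symbol_errors": 250}),
]

# Map each link to the color it would get for the chosen counter.
colors = {
    f"{link.source}->{link.destination}": link_color(link, "symbol_errors")
    for link in links
}
```

Keeping the thresholds in a dictionary keyed by counter name mirrors the idea of defining thresholds per counter: a different counter would only need its own entry.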

IPv6 UFM Agent support

Added support for UFM Agent over IPv6

More web UI enhancements

Updated UI components and flows to improve user experience.

  • Dashboard enhancements:

    • Added the option to click on the recent activity events and be redirected to the Events table

    • Added the option to hover and show the timestamp of the recent activity events

    • Added the option to change the monitored object (Nodes/Ports) under the top-nodes-by-bandwidth panel

    • Remembers the selected view of dashboard panels (Bars/Table) so that the last selected view is shown the next time the user opens the panel

    • Moved the Mellanox top bar to the left panel (in order to make space for the dashboard panels)

  • Managed elements:

    • Devices: Added an additional action under the device actions menu to set the node description

    • Ports: Added the option to show the peer as name/GUID/IP

Last updated on Sep 5, 2023.