NVIDIA UFM Enterprise User Manual v6.24.1

Known Issues in This Release

Ref #

Issue

4670139

Description: In a Clustered Telemetry deployment, if the local InfiniBand port used by one of the telemetry instances goes down, the cluster does not immediately detect the failure or rebalance telemetry collection across the fabric. The condition is temporary but may persist for several minutes, until the affected telemetry instance is restarted or reset.

Keywords: Clustered Telemetry

Workaround: N/A

Discovered in release: v6.24.1

4889257

Description: UFM reports false "node is Down" events for HCAs on multi-NIC hosts.

Keywords: "Node is Down", Multi-NIC Host

Workaround: N/A

Discovered in release: v6.24.1

© Copyright 2026, NVIDIA. Last updated on Feb 20, 2026