NVIDIA UFM Enterprise User Manual v6.22.2

ClusterMinder Plugin

Note

This plugin is supported on UFM Enterprise Appliance only.

The ClusterMinder plugin collects telemetry data from multiple data sources, then aggregates, streams, and visualizes it. The plugin can group aggregated data from multiple machines, which enables detection of operational anomalies and misconfigurations. It also provides cluster-wide histograms of hardware telemetry that detail compute node configuration and inventory, PCIe bus data, hardware information (serial number and firmware version), and health alerts for all relevant devices.

The plugin can be deployed as a container and supports multiple data sources.

Data source supported types and devices

Type                                              Devices

Redfish                                           AMI DGX H100
                                                  Dell PowerEdge XE9680
                                                  Dell PowerEdge R760
                                                  Dell PowerEdge R750
                                                  Dell PowerEdge R760xa
                                                  Supermicro SYS-421GE-TNRT3
                                                  Supermicro SYS-421GE-TNHR2-LC-TW008
                                                  Supermicro SYS-821GE-TNHR
                                                  AMI ESC N8-E11 ASUSTeK COMPUTER INC

Redfish on DPU                                    NVIDIA

Switch                                            MLNXOS, Cumulus, NVOS

Unmanaged Switch                                  UFM Telemetry

DTS (DOCA Telemetry Service) on DPU (BlueField)   DTS: version > 1.12
                                                  DPU OS: Ubuntu 20/22

DTS (DOCA Telemetry Service) on Host

NMX                                               NMX-C or NMX Aggregator

DDN                                               DDN storage

CDU (Coolant Distribution Unit)                   XDU1350, XDU450, XDU R100
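The "DTS: version > 1.12" requirement in the table above is easy to misjudge when versions are compared as strings or decimals (1.9 looks larger than 1.12 as a decimal, but is an older release). The following is a minimal sketch of a component-wise comparison. The function names are hypothetical and are not part of the plugin, and the sketch assumes that the version string starts with dot-separated numbers and that any suffix can be ignored.

```python
import re

# Assumption: the table's "version > 1.12" is a strict lower bound.
MIN_DTS_VERSION_EXCLUSIVE = (1, 12)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse the leading dot-separated numeric part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def dts_version_supported(version: str) -> bool:
    """Return True if the version is strictly newer than 1.12."""
    # Tuples compare component by component, so (1, 9) < (1, 12).
    return parse_version(version) > MIN_DTS_VERSION_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ("1.9", "1.12", "1.13"):
        print(candidate, dts_version_supported(candidate))
```

Note that in this sketch a patch release such as 1.12.1 counts as newer than 1.12. Whether the plugin treats it that way is not stated in this manual.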


The plugin can be deployed using the following methods:

  1. On the UFM Appliance

  2. On the UFM Software

To deploy the plugin, follow these steps:

  • Download the default plugin bundle, which includes the plugin, from NVIDIA's Licensing Portal.

  • Load the downloaded image onto the UFM server, either through the UFM GUI (Settings -> Plugins Management tab) or from the command line as follows:

    • Log in to the UFM server terminal.

    • Run:

      docker load < <path_to_image>

    • After the plugin image loads successfully, the plugin appears in the Plugins Management table in the UFM GUI. To start the plugin, right-click the plugin in the table.

      [Image: plugin entry in the Plugins Management table]

After the plugin is deployed successfully, a new item for the ClusterMinder plugin is shown in the UFM side menu.

ClusterMinder lets users add new data sources, update existing ones, and remove those that are no longer needed, so that the collected data stays current and relevant.

Example of Adding Data Source

Hosts are added through the "Data Sources" section. To add a data source, select the appropriate tab, click the add data source button, and fill in the endpoint information. Then test the endpoint. If the endpoint status is "up", the add button becomes clickable and you can add the data source.

Note: Hosts can be added in hostlist format, for example: agx[01-10].

For example:

[Image: adding hosts in hostlist format]
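To make the hostlist format mentioned above concrete, the following minimal sketch expands an expression such as agx[01-10] into individual host names. It assumes the common bracketed-range convention (a single bracket group with ranges and comma-separated items, zero padding preserved). The exact grammar that the plugin accepts is not documented here, and the function name is hypothetical.

```python
import re

# One optional bracket group, e.g. "agx[01-10]" or "node[1-3,7]".
_BRACKET = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$"
)


def expand_hostlist(expr: str) -> list[str]:
    """Expand a bracketed hostlist expression into individual host names."""
    match = _BRACKET.match(expr)
    if match is None:
        # No bracket group: treat the expression as a single host name.
        return [expr]
    hosts = []
    for part in match["body"].split(","):
        low, _, high = part.partition("-")
        high = high or low      # a single item such as "07"
        width = len(low)        # preserve zero padding of the lower bound
        for number in range(int(low), int(high) + 1):
            hosts.append(f"{match['prefix']}{number:0{width}d}{match['suffix']}")
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("agx[01-10]"))
```

With this sketch, agx[01-10] expands to the ten names agx01 through agx10.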

Example of Adding a Redfish Host

After pressing the add data source button, you are presented with a form with the following fields: "BMC IP", "Protocol", "Username", and "Password". After entering the information, press the test button to test the connection to the host. A window then reports whether the connection was successful. If it was, click the add button to add the data source.

For example:

[Image: form for adding a Redfish host]
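Before filling in the form, it can be useful to check from a terminal that the BMC answers Redfish requests with the same four values. The sketch below queries the standard Redfish service root (/redfish/v1/) with HTTP basic authentication. It is not the plugin's own connection test, which is not documented here; the function names are hypothetical, and a BMC with a self-signed certificate fails the default TLS verification and is reported as unreachable.

```python
import base64
import json
import urllib.error
import urllib.request


def service_root_url(bmc_ip: str, protocol: str = "https") -> str:
    """Build the URL of the standard Redfish service root."""
    if protocol not in ("http", "https"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return f"{protocol}://{bmc_ip}/redfish/v1/"


def check_redfish(bmc_ip: str, protocol: str, username: str,
                  password: str, timeout: float = 5.0) -> bool:
    """Return True if the service root responds with a Redfish document."""
    request = urllib.request.Request(service_root_url(bmc_ip, protocol))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except (OSError, ValueError):
        # Covers connection errors, HTTP errors, timeouts, and invalid JSON.
        return False
    # The Redfish ServiceRoot resource carries a "RedfishVersion" property.
    return isinstance(body, dict) and "RedfishVersion" in body
```

A call such as check_redfish("192.0.2.10", "https", "admin", "secret") uses placeholder values; substitute the BMC address and credentials of the host being added.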

Note: When adding multiple Redfish hosts, make sure they are all of the same model and vendor. Mixing models or vendors can cause issues with data collection and presentation.

Example of Adding a Switch Host

After pressing the add data source button, you are presented with a form with the following fields: "Switch IP", "Username", and "Password". After entering the information, press the test button to test the connection to the host. A window then reports whether the connection was successful. If it was, click the add button to add the data source.

[Image: form for adding a switch host]

Note: When adding multiple switch hosts, make sure they all run the same OS type. Mixing OS types can cause issues with data collection and presentation.

Example of Adding a User Generated Host

Pressing the custom data source button in the row of data types lets you create a custom user-generated data source. You are presented with a form with the following fields: "Type" and "Name". After entering the information, click the add button to add the data source.

[Image: form for adding a user-generated data source]

Note: The "Type" dropdown offers "Redfish", "Switch", "Host DTS", and "CM Http". These are the currently supported data types.

Example of Removing Data Source

Hosts are removed through the "Data Sources" section. Right-click any available host and select the remove option.

[Image: remove option in the host context menu]

You can also use the remove data sources button to open the Remove Data Sources form, which has an IP field. After choosing the appropriate IP, click the remove button in the form to remove those endpoints.

[Image: Remove Data Sources form]

Example of Updating Data Source

Hosts are updated through the "Data Sources" section. Right-click any available host and select the update option.

[Image: update option in the host context menu]

You can also use the update data sources button to open the Update Data Sources form, which has an IP field. After filling in the appropriate fields, press the test button to test the connection to the host. A window then reports whether the connection was successful. If it was, click the submit button to update the endpoint.

[Image: Update Data Sources form]

Data Tab

The Data tab shows all collected data in one place, organized in a tree or table view. A group view tab additionally shows the differences between host groups, which helps users identify and understand misconfigurations in their data.

Redfish Data Example

[Image: Redfish data in the Data tab]

Switch Data Example

[Image: switch data in the Data tab]

Group Differences Tab

The “Group differences” tab helps users identify and understand misconfigurations within their data. By comparing different data groups, users can easily spot discrepancies and take corrective actions. At the start of the report, there is a summary table that lists the most problematic hosts in descending order based on the number of appearances. This table provides a quick overview of the hosts that require the most attention, allowing users to prioritize their troubleshooting efforts effectively. Additionally, each table in the report (besides the summary table) includes the API that provided the data, a column for the number of hosts per group, and the fields where their values differed, ensuring transparency and traceability of the information presented.

Note: A toggle switch enables the component view.

Redfish Group Differences Example

[Image: Redfish group differences report]

Switch Group Differences Example

[Image: switch group differences report]
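The grouping behind the Group Differences report described above can be illustrated with a short sketch. It groups hosts by the values of the fields that differ and counts how often each host falls outside the largest group. The data layout, the function names, and the "largest group is the reference" rule are assumptions made for illustration; the plugin's actual algorithm is not documented here.

```python
from collections import Counter, defaultdict


def differing_fields(records):
    """Return the fields whose values are not identical across all hosts."""
    fields = sorted({field for record in records.values() for field in record})
    return [
        field for field in fields
        if len({record.get(field) for record in records.values()}) > 1
    ]


def group_hosts(records):
    """Group hosts by their values in the differing fields, largest group first."""
    fields = differing_fields(records)
    groups = defaultdict(list)
    for host, record in sorted(records.items()):
        groups[tuple(record.get(field) for field in fields)].append(host)
    # Ties between equally large groups keep their first-seen order.
    ordered = sorted(groups.items(), key=lambda item: -len(item[1]))
    return fields, ordered


def most_problematic(per_api_records):
    """Count, per host, how many API tables place it outside the largest group."""
    appearances = Counter()
    for records in per_api_records.values():
        _, groups = group_hosts(records)
        for _, hosts in groups[1:]:  # skip the largest (reference) group
            appearances.update(hosts)
    return appearances.most_common()  # descending by number of appearances


if __name__ == "__main__":
    # Hypothetical API names and field values, for illustration only.
    data = {
        "bios": {
            "a": {"BiosVersion": "1.2"}, "b": {"BiosVersion": "1.2"},
            "c": {"BiosVersion": "1.1"}, "d": {"BiosVersion": "1.2"},
        },
        "firmware": {
            "a": {"Fw": "x"}, "b": {"Fw": "x"}, "c": {"Fw": "y"}, "d": {"Fw": "z"},
        },
    }
    print(most_problematic(data))
```

Each entry returned by group_hosts corresponds to one row of a per-API table (the differing values plus the hosts in that group, whose length gives the hosts-per-group column), and most_problematic corresponds to the summary table.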

Suspected Errors Tab

The “Suspected Errors” tab provides a comprehensive report on APIs that have returned values flagged as potential issues. This report is crucial for maintaining the reliability and integrity of your data analysis. At the start of the report, a summary table lists the number of hosts with errors compared to the total number of hosts, giving users a quick snapshot of the overall health of their data environment. This summary helps prioritize troubleshooting efforts and allocate resources effectively. Each entry in the report details the host ID, the specific fields where values were problematic, and the problematic values themselves. These values are highlighted in red or orange to indicate the severity of the issue, with red denoting more critical problems and orange indicating less severe ones. This color-coding allows users to quickly assess the urgency of each issue and take appropriate corrective actions.

Note: A toggle switch enables the component view.

Redfish Suspected Errors Example

[Image: Redfish suspected errors report]

Switch Suspected Errors Example

[Image: switch suspected errors report]
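The structure of the Suspected Errors report described above (a summary of hosts with errors against total hosts, followed by per-host entries with a severity color) can be sketched as follows. The rule that maps values to colors is an assumption: the sketch uses the standard Redfish health values "Warning" and "Critical", while the rules the plugin actually applies are not documented here. All names are hypothetical.

```python
# Assumed mapping from Redfish health values to the report's colors.
SEVERITY_COLOR = {"Critical": "red", "Warning": "orange"}


def suspected_errors(health_by_host):
    """Return (host, field, value, color) for every flagged value.

    health_by_host maps a host ID to a {field: health value} dictionary.
    """
    findings = []
    for host, fields in sorted(health_by_host.items()):
        for field, value in sorted(fields.items()):
            color = SEVERITY_COLOR.get(value)
            if color is not None:
                findings.append((host, field, value, color))
    return findings


def summarize(health_by_host, findings):
    """Return (hosts with errors, total hosts) for the summary table."""
    hosts_with_errors = {host for host, _, _, _ in findings}
    return len(hosts_with_errors), len(health_by_host)


if __name__ == "__main__":
    # Hypothetical host IDs and field names, for illustration only.
    sample = {
        "node1": {"Fan1": "OK", "PSU1": "Critical"},
        "node2": {"Fan1": "Warning", "PSU1": "OK"},
        "node3": {"Fan1": "OK", "PSU1": "OK"},
    }
    found = suspected_errors(sample)
    print(summarize(sample, found))
    for entry in found:
        print(entry)
```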

ClusterMinder’s histogram feature visualizes the collected data. Histograms are premade according to the type of data source and its make and model. They cover fan speeds (RPM), voltages, wattage, amperage, temperatures (Celsius), and more. Clicking a histogram filters specific hosts or values for a more targeted analysis. The dropdown menu further filters the histogram bars to home in on specific data subsets and attributes.

Redfish Histogram Example

[Image: Redfish histograms]

Switch Histogram Example

[Image: switch histograms]
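The relationship between histogram bars and host filtering described above can be illustrated with a minimal sketch. It bins one numeric reading per host into fixed-width bins and keeps the host names for each bin, so that selecting a bar yields the hosts behind it. The bin width, the data layout, and the function name are assumptions for illustration and do not describe how the plugin builds its histograms.

```python
from collections import defaultdict


def histogram(values_by_host, bin_width):
    """Map each bin's lower bound to the hosts whose value falls in that bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = defaultdict(list)
    for host, value in sorted(values_by_host.items()):
        lower = (value // bin_width) * bin_width  # lower bound of the bin
        bins[lower].append(host)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    # Hypothetical temperature readings in Celsius.
    temperatures = {"n1": 41, "n2": 43, "n3": 58}
    bars = histogram(temperatures, bin_width=5)
    for lower, hosts in bars.items():
        print(f"{lower}-{lower + 5}: {len(hosts)} host(s) {hosts}")
    # "Clicking" the 55-60 bar corresponds to looking up its hosts.
    print(bars[55])
```

In this sketch the bar height is the length of each host list, and filtering by a bar is a dictionary lookup on its lower bound.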

ClusterMinder’s telemetry page allows users to create custom graphs based on their data. Additionally, there are premade graphs available depending on the telemetry data the data source provides. This feature provides flexibility in how data is displayed and analyzed, enabling users to tailor their graphs to meet specific needs.

Note: A toggle switch enables the tree view.

Redfish Telemetry Example

[Image: Redfish telemetry graphs]

Switch Telemetry Example

[Image: switch telemetry graphs]

© Copyright 2025, NVIDIA. Last updated on Sep 10, 2025.