Devices Window

NVIDIA UFM-SDN Appliance User Manual v4.16.1

The Devices window shows data pertaining to the physical devices in a tabular format.


Devices Window Data

Data Type



Health of the device reflecting the highest alarm severity. Please refer to the Health States table.


Name of the device


If the UFM Agent is running on a device, an icon appears next to the device name.



System GUID of the device


Type of the device: switch, node, IB router, or gateway


IP address of the device


The vendor of the device

Firmware Version

The firmware version installed on the device

Health States






Information/notification displayed during normal operating state or a normal system event.



Critical means that the operation of the system or a system component fails.



Minor reflects a problem in the fabric with no failure.



Warning reflects a low priority problem in the fabric with no failure. A warning is asserted when an event exceeds a predefined threshold.
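The "highest alarm severity" rule used for the Health column can be sketched in a few lines. This is only an illustration: the ranking Info < Warning < Minor < Critical is an assumption inferred from the descriptions above, and the names `Severity` and `device_health` are hypothetical, not part of UFM.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ranking, lowest to highest. The manual names these states
    # but does not spell out their relative order.
    INFO = 0
    WARNING = 1
    MINOR = 2
    CRITICAL = 3


def device_health(alarm_severities):
    """Return the highest severity among a device's alarms (INFO if none)."""
    return max(alarm_severities, default=Severity.INFO)


print(device_health([Severity.WARNING, Severity.MINOR]).name)
```

The same reduction would apply to any aggregated value described as "the highest severity level of all members".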

A right-click on the device name displays a list of actions that can be performed on it.


Devices Actions



Firmware Upgrade

Perform a firmware upgrade on the selected device

Firmware Reset

Reboot the device. This action is only applicable to unmanaged hosts (servers).

Set Node Description

Configure a description for this node

Collect System Dump

Collect the system dump log for a specific device

Add to Group

Add the selected device to a devices group

Remove from Group

Remove the selected device from a devices group

Suppress Notifications

Suppress all event notifications for the device

Add to Monitor Session

Configure and activate host monitoring

Show in Network Map

Move to the Zoom In tab in the network map and add the selected device to the filter list


System dump collection for hosts managed by UFM is available only for hosts that have a valid IPv4 address and have MLNX_OFED installed.

From the Devices table, it is possible to mark devices as healthy or unhealthy using the context menu (right-click).

There are two options for marking a device as unhealthy:

  • Isolate

  • No Discover



Server: conf/opensm/opensm-health-policy.conf content:


0xe41d2d030003e3b0 34 UNHEALTHY isolate 0xe41d2d030003e3b0 19 UNHEALTHY isolate 0xe41d2d030003e3b0 3 UNHEALTHY isolate 0xe41d2d030003e3b0 26 UNHEALTHY isolate 0xe41d2d030003e3b0 0 UNHEALTHY isolate 0xe41d2d030003e3b0 27 UNHEALTHY isolate 0xe41d2d030003e3b0 7 UNHEALTHY isolate 0xe41d2d030003e3b0 10 UNHEALTHY isolate 0xe41d2d030003e3b0 11 UNHEALTHY isolate 0xe41d2d030003e3b0 22 UNHEALTHY isolate 0xe41d2d030003e3b0 18 UNHEALTHY isolate 0xe41d2d030003e3b0 29 UNHEALTHY isolate 0xe41d2d030003e3b0 8 UNHEALTHY isolate 0xe41d2d030003e3b0 5 UNHEALTHY isolate 0xe41d2d030003e3b0 17 UNHEALTHY isolate 0xe41d2d030003e3b0 23 UNHEALTHY isolate 0xe41d2d030003e3b0 15 UNHEALTHY isolate 0xe41d2d030003e3b0 24 UNHEALTHY isolate 0xe41d2d030003e3b0 2 UNHEALTHY isolate 0xe41d2d030003e3b0 16 UNHEALTHY isolate 0xe41d2d030003e3b0 13 UNHEALTHY isolate 0xe41d2d030003e3b0 14 UNHEALTHY isolate 0xe41d2d030003e3b0 32 UNHEALTHY isolate 0xe41d2d030003e3b0 33 UNHEALTHY isolate 0xe41d2d030003e3b0 35 UNHEALTHY isolate 0xe41d2d030003e3b0 20 UNHEALTHY isolate 0xe41d2d030003e3b0 21 UNHEALTHY isolate 0xe41d2d030003e3b0 28 UNHEALTHY isolate 0xe41d2d030003e3b0 1 UNHEALTHY isolate 0xe41d2d030003e3b0 9 UNHEALTHY isolate 0xe41d2d030003e3b0 4 UNHEALTHY isolate 0xe41d2d030003e3b0 31 UNHEALTHY isolate 0xe41d2d030003e3b0 30 UNHEALTHY isolate 0xe41d2d030003e3b0 36 UNHEALTHY isolate 0xe41d2d030003e3b0 12 UNHEALTHY isolate 0xe41d2d030003e3b0 25 UNHEALTHY isolate 0xe41d2d030003e3b0 6 UNHEALTHY isolate

/opt/ufm/files/log/opensm-unhealthy-ports.dump content:



Server /opt/ufm/files/conf/opensm/opensm-health-policy.conf content:


0xe41d2d030003e3b0 15 HEALTHY 0xe41d2d030003e3b0 25 HEALTHY 0xe41d2d030003e3b0 35 HEALTHY 0xe41d2d030003e3b0 0 HEALTHY 0xe41d2d030003e3b0 11 HEALTHY 0xe41d2d030003e3b0 21 HEALTHY 0xe41d2d030003e3b0 28 HEALTHY 0xe41d2d030003e3b0 7 HEALTHY 0xe41d2d030003e3b0 17 HEALTHY 0xe41d2d030003e3b0 14 HEALTHY 0xe41d2d030003e3b0 24 HEALTHY 0xe41d2d030003e3b0 34 HEALTHY 0xe41d2d030003e3b0 3 HEALTHY 0xe41d2d030003e3b0 10 HEALTHY 0xe41d2d030003e3b0 20 HEALTHY 0xe41d2d030003e3b0 31 HEALTHY 0xe41d2d030003e3b0 6 HEALTHY 0xe41d2d030003e3b0 16 HEALTHY 0xe41d2d030003e3b0 27 HEALTHY 0xe41d2d030003e3b0 2 HEALTHY 0xe41d2d030003e3b0 13 HEALTHY 0xe41d2d030003e3b0 23 HEALTHY 0xe41d2d030003e3b0 33 HEALTHY 0xe41d2d030003e3b0 30 HEALTHY 0xe41d2d030003e3b0 9 HEALTHY 0xe41d2d030003e3b0 19 HEALTHY 0xe41d2d030003e3b0 26 HEALTHY 0xe41d2d030003e3b0 36 HEALTHY 0xe41d2d030003e3b0 5 HEALTHY 0xe41d2d030003e3b0 12 HEALTHY 0xe41d2d030003e3b0 22 HEALTHY 0xe41d2d030003e3b0 32 HEALTHY 0xe41d2d030003e3b0 1 HEALTHY 0xe41d2d030003e3b0 8 HEALTHY 0xe41d2d030003e3b0 18 HEALTHY 0xe41d2d030003e3b0 29 HEALTHY 0xe41d2d030003e3b0 4 HEALTHY

/opt/ufm/files/log/opensm-unhealthy-ports.dump content:


# NodeGUID, PortNum, NodeDesc, PeerNodeGUID, PeerPortNum, PeerNodeDesc, {BadCond1, BadCond2, ...}, timestamp
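The health-policy entries shown above can be read programmatically, for example to list which ports are currently marked unhealthy. The sketch below assumes each entry has the layout `<GUID> <port> <state> [<action>]`, which is inferred from the examples rather than from a format specification. Treating `#` as a comment marker is also an assumption, and `PolicyEntry` and `parse_health_policy` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class PolicyEntry(NamedTuple):
    guid: str
    port: int
    state: str             # "HEALTHY" or "UNHEALTHY" in the examples
    action: Optional[str]  # e.g. "isolate"; absent for HEALTHY entries


def parse_health_policy(text: str) -> List[PolicyEntry]:
    """Parse whitespace-separated '<GUID> <port> <state> [<action>]' entries."""
    tokens = []
    for line in text.splitlines():
        # Assumption: anything after '#' on a line is a comment.
        tokens.extend(line.split("#", 1)[0].split())

    entries = []
    i = 0
    while i < len(tokens):
        if i + 2 >= len(tokens):
            raise ValueError(f"incomplete entry starting at token {i}")
        guid, port, state = tokens[i], int(tokens[i + 1]), tokens[i + 2]
        i += 3
        action = None
        # The optional action is any token that is not the next GUID.
        if i < len(tokens) and not tokens[i].lower().startswith("0x"):
            action = tokens[i]
            i += 1
        entries.append(PolicyEntry(guid, port, state, action))
    return entries


sample = """
0xe41d2d030003e3b0 34 UNHEALTHY isolate
0xe41d2d030003e3b0 19 UNHEALTHY isolate
0xe41d2d030003e3b0 15 HEALTHY
"""
entries = parse_health_policy(sample)
unhealthy_ports = sorted(e.port for e in entries if e.state == "UNHEALTHY")
print(unhealthy_ports)
```

Because the parser works on tokens rather than lines, it accepts entries whether they appear one per line or run together on a single line.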

Software/Firmware Upgrade via FTP

Software and firmware upgrade over FTP is enabled by the UFM Agent. UFM invokes the Software/Firmware Upgrade procedure locally on switches or on hosts. The procedure copies the new software/firmware file from the defined storage location and performs the operation on the device. UFM sends the set of attributes required for performing the software/firmware upgrade to the agent.

The attributes are:

  • File Transfer Protocol – default FTP

    • The Software/Firmware upgrade on InfiniScale III ASIC-based switches supports FTP protocol for transmitting files to the local machine.

    • The Software/Firmware upgrade on InfiniScale IV-based switches and hosts supports TFTP and protocols for transmitting files to the local machine.

  • IP address of file-storage server

  • Path to the software/firmware image location
    The software/firmware image files should be placed according to the required structure under the defined image storage location. Please refer to section Directory Structure for Software or Firmware Upgrade Over FTP.

  • File-storage server access credentials (User/Password)

In-Band Firmware Upgrade

You can perform in-band firmware upgrades for externally managed switches and HCAs. This upgrade procedure does not require the UFM Agent or IP connectivity, but it does require current PSID recognition. Please refer to section PSID and Firmware Version In-Band Discovery. This feature requires that the Mellanox Firmware Toolkit (MFT), which is included in the UFM package, is installed on the UFM server. UFM uses flint from the MFT for in-band firmware burning.

Before upgrading, you must create the firmware repository on the UFM server under the directory /opt/ufm/files/userdata/fw/. Create one subdirectory for each PSID and place exactly one firmware image in it. For example:


/opt/ufm/files/userdata/fw/
    MT_0D80110009
        fw-ConnectX2-rel-2_9_1000-MHQH29B-XTR_A1.bin
    MT_0F90110002
        fw-IS4-rel-7_4_2040-MIS5023Q_A1-A5.bin
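Before starting an in-band upgrade, it can help to confirm that the repository follows the rule of one firmware image per PSID subdirectory. The following is a minimal sketch of such a check; `check_fw_repository` is a hypothetical helper, not a UFM tool, and the demonstration runs against a temporary directory rather than the real repository path.

```python
from pathlib import Path
import tempfile


def check_fw_repository(root):
    """Map each PSID subdirectory to a list of problems (empty list = OK)."""
    problems = {}
    for psid_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = [f.name for f in psid_dir.iterdir() if f.is_file()]
        issues = []
        if len(images) == 0:
            issues.append("no firmware image")
        elif len(images) > 1:
            issues.append(f"expected one image, found {len(images)}")
        problems[psid_dir.name] = issues
    return problems


# Demonstration against a throwaway directory instead of
# /opt/ufm/files/userdata/fw/.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "MT_0D80110009"
    good.mkdir()
    (good / "fw-ConnectX2-rel-2_9_1000-MHQH29B-XTR_A1.bin").touch()
    (Path(tmp) / "MT_0F90110002").mkdir()  # left empty on purpose
    report = check_fw_repository(tmp)

print(report)
```

On a UFM server, the function would be pointed at the repository directory named above.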

Directory Structure for Software or Firmware Upgrade Over FTP

Before performing a software or firmware upgrade, you must create the following directory structure for the upgrade image. The path to the <ftp user home>/<path>/ directory should be specified in the upgrade dialog box.


<ftp user home>/<path>/
    InfiniScale3 - For anafa based switches Software/Firmware upgrade images
        voltaire_fw_images.tar – firmware image file
        ibswmpr-<s/w version>.tar – software image file
    InfiniScale4 - For InfiniScale IV based switches Software/Firmware upgrade images
        firmware_2036_4036.tar – Firmware image file
        upgrade_2036_4036.tgz – Software image file
    OFED /* For host SW upgrade*/
        OFED-<OS label>.tar.bz2
    <PSID>* – For host FW upgrade
        fw_update.img

The <PSID> value is extracted from the mstflint command:


mstflint -d <device> q

The device is extracted from the lspci command. For example:


# lspci
06:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
# mstflint -d 06:00.0 q | grep PSID
PSID: VLT0040010001
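When the PSID is needed in a script (for example, to name the `<PSID>` directory), the `grep PSID` step above can be replaced by a small parser. This sketch only handles text that has already been captured from `mstflint -d <device> q`. The function name `extract_psid` is hypothetical, and only the `PSID:` line in the sample is taken from the manual.

```python
import re
from typing import Optional


def extract_psid(query_output: str) -> Optional[str]:
    """Return the PSID from `mstflint -d <device> q` output, or None."""
    match = re.search(r"^\s*PSID:\s*(\S+)", query_output, flags=re.MULTILINE)
    return match.group(1) if match else None


# Only the PSID line comes from the manual; the first line is a
# placeholder standing in for the rest of the query output.
sample_output = "<other query fields>\nPSID: VLT0040010001\n"
print(extract_psid(sample_output))
```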

PSID and Firmware Version In-Band Discovery

The device PSID and device firmware version are required for in-band firmware upgrade and for the correct functioning of Subnet Manager plugins, such as Congestion Control Manager and Lossy Configuration Management. For most devices, UFM discovers this information and displays it in the Device Properties pane. The PSID and the firmware version are discovered by the Vendor-specific MAD.

By default, the event_plugin_options value in the gv.cfg file is set to (null). This means that the plugin is disabled and opensm does not send MADs to discover the devices' PSID and FW version. In that case, the PSID and FW version values are taken from the ibdiagnet output (section NODES_INFO).

The following is the default value:


event_plugin_options = (null)

To enable vendor-specific discovery by opensm, change the value of event_plugin_options in the gv.cfg configuration file to --vendinfo -m 1, as shown below:


event_plugin_options = --vendinfo -m 1

If the value is set to --vendinfo -m 1, the data is supplied by opensm, and the ibdiagnet output is ignored.


In some firmware versions, the information above is currently not available.
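The configuration change described in this section can also be scripted. The sketch below assumes that gv.cfg is a standard INI-style file that Python's `configparser` can read; the manual does not state this, nor which section holds the key. To avoid guessing, the hypothetical helper `set_option_everywhere` updates the key in whichever section already defines it. It works on text only and does not touch any file.

```python
import configparser
import io


def set_option_everywhere(cfg_text, key, value):
    """Set `key` to `value` in whichever section(s) already define it."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(cfg_text)
    touched = []
    for section in parser.sections():
        if parser.has_option(section, key):
            parser.set(section, key, value)
            touched.append(section)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue(), touched


# "ExampleSection" is a hypothetical section name used only for this demo.
original = "[ExampleSection]\nevent_plugin_options = (null)\n"
updated, sections = set_option_everywhere(
    original, "event_plugin_options", "--vendinfo -m 1")
print(updated)
```

Note that `configparser` drops comments when it writes a file back, so for a real configuration file a manual edit or a line-based replacement may be the safer choice.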

Switch Management IP Address Discovery

From NVIDIA switch FW version 27.2010.3942 and up, NVIDIA switches support switch management IP address discovery using MADs. This information can be retrieved as part of an ibdiagnet run (ibdiagnet output) and assigned to the discovered switches in UFM.

You can choose which IP protocol version, IPv4 or IPv6, is used for the address assigned to the switch.

The discovered_switch_ip_protocol key, located in the gv.cfg file in section [FabricAnalysys], is set to 4 by default. This means that an IPv4 address is assigned to the switch as its management IP address. If this value is set to 6, an IPv6 address is assigned to the switch as its management IP address.

After changing the discovered_switch_ip_protocol value in gv.cfg, the UFM Main Model needs to be restarted for the update to take effect. The discovered IP addresses for switches are not persistent in UFM: on every UFM Main Model restart, the management IP address values are reassigned from the ibdiagnet output.
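The effect of the protocol key can be illustrated with a short sketch: given the addresses discovered for a switch, the value 4 or 6 decides which one is used. This is not UFM's internal logic. `pick_management_ip` is a hypothetical function, and the addresses are documentation-range samples.

```python
import ipaddress


def pick_management_ip(candidates, protocol_version=4):
    """Return the first candidate whose IP version matches, else None."""
    if protocol_version not in (4, 6):
        raise ValueError("protocol_version must be 4 or 6")
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if addr.version == protocol_version:
            return str(addr)
    return None


# Documentation-range addresses used purely as sample data.
found = ["192.0.2.10", "2001:db8::10"]
print(pick_management_ip(found, 4))
print(pick_management_ip(found, 6))
```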

Upgrading Server Software

The ability to update the server software is applicable only for hosts (servers) with the UFM Agent.

To upgrade the software:

  1. Select a device.

  2. From the right-click menu, select Software Update.

  3. Enter the parameters listed in the following table.




    Update is performed via FTP protocol


    Enter the host IP


    Enter the parent directory of the FTP directory structure for the Upgrade image.

    The path must be relative: it should not start with a slash (/) or end with a trailing slash.


    Enter the host username


    Enter the host password

  4. Click Submit to save your changes.
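The path rule from the parameters above (a relative path with no leading or trailing slash) is easy to get wrong, so a pre-check can be useful. The sketch below is illustrative only: `validate_upgrade_path` is a hypothetical helper and `images/ufm` is a made-up example path.

```python
def validate_upgrade_path(path: str) -> list:
    """Return a list of problems with an FTP-relative upgrade path."""
    problems = []
    if not path:
        problems.append("path is empty")
        return problems
    if path.startswith("/"):
        problems.append("must not start with a slash (absolute path)")
    if path.endswith("/"):
        problems.append("must not end with a slash")
    return problems


print(validate_upgrade_path("images/ufm"))
print(validate_upgrade_path("/images/ufm/"))
```

An empty list means the path satisfies both constraints stated in the table.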

Upgrading Firmware

You can upgrade firmware over FTP for hosts and switches that are running the UFM Agent, or you can perform an in-band upgrade for externally managed switches and HCAs.

Before you begin the upgrade, ensure that the new firmware version is in the correct location. For more information, please refer to section In-Band Firmware Upgrade.

To upgrade the firmware:

  1. Select a host or switch.

  2. From the right-click menu, select Firmware Upgrade.

  3. Select protocol In Band.

  4. For upgrade over FTP, enter the parameters listed in the following table.




    Enter device IP


    Enter the parent directory of the FTP directory structure for the Upgrade image.

    The path must be relative: it should not start with a slash (/) or end with a trailing slash.


    Enter the host username


    Enter the host password

  5. Click Submit to save your changes.


    The firmware upgrade takes effect only after the host or externally managed switch is restarted.

Upgrade Cables Transceivers Firmware Version

This feature supports burning multiple cable transceiver types on multiple devices using the linkx tool, which is part of flint. The burn needs to be done from both ends of the cable (switch and HCA/switch).

To upgrade the cable transceivers FW version:

  1. Navigate to the Managed Elements page.

  2. Select the target switches and click the Upgrade Cable Transceivers option.


  3. A modal is shown listing the active firmware versions for the cables of the selected switches. Beside each version number, a badge shows the number of matched switches.



  4. After you click Submit, the GUI starts sending the selected binaries to the relevant switches sequentially, and a modal with a progress bar is shown (this modal can be minimized).


  5. After the whole action completes successfully, the following message appears at the bottom of the modal: "The upgrade cable transceivers completed successfully, do you want to activate it?" Clicking the Yes button runs a new action on all the burned devices to activate the newly uploaded binary image.

  6. Alternatively, to activate burned cable transceivers, go to the Groups page and right-click the predefined group named Devices Pending FW Transceivers Reset, or right-click the upgraded device on the Managed Elements page and select the Activate Cable Transceivers action.


Selecting a device from the Devices table reveals the Device Information table on the right side of the screen. This table provides information on the device’s ports, cables, groups, events, alarms, and device access.


General Tab

Provides general information on the selected device.


Ports Tab

This tab provides a list of the ports connected to this device in a tabular format.


Ports Data

Data Type


Port Number

The number of the port on the device.


The node name/GUID/IP that the port belongs to.

Note that you can choose the node label (name/GUID/IP) using the drop-down menu available above the Ports data table.


Health of the port reflecting the highest alarm severity. Please refer to the Health States table.


Indicates whether the port is connected (active or inactive).


The local identifier (LID) of the port.


Maximum Transmission Unit of the port.



Lists the highest value of active, enabled and supported speeds in icons indicating their status:

  • Dark green – active speed

  • Light green – enabled speed

  • Grey – supported yet disabled speed



Lists the highest value of active, enabled and supported widths in icons indicating their status:

  • Dark green – active width

  • Light green – enabled width

  • Grey – supported yet disabled width


The GUID of the device the port is connected to.

Peer Port

The name of the port that is connected to this port.

Cables Tab

This tab provides a list of the cables connected to this device in a tabular format.


Cables Data

Data Type


Basic Information


Health of the cable reflecting the highest alarm severity. Please refer to the Health States table.

Serial Number

Serial number of the cable.


Identifier of the cable.

Source Port Information

Source GUID

GUID of the source port the cable is connected to.

Source Port

The number of the source port the cable is connected to.

Destination Port Information

Destination GUID

GUID of the destination port the cable is connected to.

Destination Port

The number of the destination port the cable is connected to.

Advanced Information


Revision of the cable.

Link Width

The maximum link width of the cable.

Part Number

Part number of the cable.


The transmitting medium of the cable: copper/optical/etc.


The cable length in meters.

Groups Tab

This tab provides a list of the groups to which the selected device belongs.


Groups Data

Data Type



Aggregated severity level of the group (the highest severity level of all group members).


Name of the group.


Description of the group.


Type of the group: General/Rack.

Alarms Tab

This tab provides a list of all UFM alarms related to the selected device.


Alarms Data

Data Type


Alarm ID

Alarm identifier.


Source object (device/port) on which the alarm was triggered.


The severity of the alarm.


Description of the alarm.


The time when the alarm was triggered.


Reason for the alarm.


Number of instances that the alarm occurred on the related source object.

Events Tab

This tab provides a list of the UFM events that are related to the selected device.


Events Data

Data Type



Event severity – Info, Warning, Error, Critical or Minor.

Event Name

The name of the event.


The source object (device/port) on which the event was triggered.


The time when the event was triggered.


The category of the event indicated by icons. Hovering over the icon will display the category name.


Description of the event. Full description can be displayed by hovering over the text.

Inventory Tab

This tab provides a list of the device’s modules with information in a tabular format.


This tab is available for switches only.


Inventory Data

Data Type



Health of the module reflecting the highest alarm severity. Please refer to the Health States table.


The module status.

Serial Number

Serial number of the module.


Name of the device.


Description of the module.


Type of the module: spine/line/etc.

Software Version

Firmware version installed on the module.

Part Number

Part number of the module.

Hardware Version

Hardware version of the module.


Power supply of the PSU.

HCAs Tab

This tab provides a list of the device’s HCAs with information in a tabular format.


This tab is available for hosts only.


Data Type



Health of the HCA reflecting the highest alarm severity. Please refer to the Health States table.


HCA Index




HCA Type


HCA ports GUIDs



FW Version

HCA firmware version

Device Access Tab

This tab lets you manage the access credentials of the selected device for remote access. To set access credentials for a device, the device IP must be set, either by installing the UFM Agent on the device or by manually setting the IP under IP Address Settings (both IPv4 and IPv6 addresses are supported).



After manually setting the IP address of NVIDIA® Mellanox® InfiniScale IV® and SwitchX® based switches, UFM will first validate the new IP before setting it.

To edit your device access credentials:

  1. Select the preferred protocol tab:

    • SSH – allows you to define the SSH parameters to open an SSH session on your device (available for nodes and switches)

    • IPMI – allows you to set the IPMI parameters to open an IPMI session on your device for remote power control (available for nodes only)

    • HTTP – allows you to define the HTTP parameters to open an HTTP session on your device (available for switches only)

  2. Click Update to save your changes.


Device Access Credentials Parameters




Fill in or edit the computer user name.


Enter the device password.


Enter the device password a second time to confirm.

Manual IP

Enter the device IP address (could be IPv4/IPv6).


Enter the port number.


Enter the connection timeout (in seconds) for the device specific protocol (SSH/HTTP/IPMI).
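The parameters above lend themselves to a simple consistency check before they are submitted. The following is a sketch under stated assumptions: the `DeviceAccess` class and its field names are hypothetical, and the port range and positive-timeout rules are general networking conventions rather than limits documented for UFM.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class DeviceAccess:
    user: str
    password: str
    confirmation: str
    manual_ip: str
    port: int
    timeout_seconds: int

    def problems(self):
        """Return a list of human-readable problems (empty list = OK)."""
        issues = []
        if self.password != self.confirmation:
            issues.append("password and confirmation differ")
        try:
            ipaddress.ip_address(self.manual_ip)  # accepts IPv4 and IPv6
        except ValueError:
            issues.append("manual IP is not a valid IPv4/IPv6 address")
        if not 1 <= self.port <= 65535:
            issues.append("port is outside 1-65535")
        if self.timeout_seconds <= 0:
            issues.append("timeout must be positive")
        return issues


# Sample values only; the address is from a documentation range.
ssh = DeviceAccess("admin", "secret", "secret", "2001:db8::5", 22, 30)
print(ssh.problems())
```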

Virtual Networking Tab

This tab displays a map containing the HCAs for the selected device, and the ports and virtual ports it is connected to.


© Copyright 2024, NVIDIA. Last updated on May 29, 2024.