The Devices window shows data pertaining to the physical devices in a tabular format.

Devices Window Data

Data Type | Description
Health | Health of the device, reflecting the highest alarm severity. Please refer to the Health States table.
Name | Name of the device. If the UFM Agent is running on a device, an icon appears next to the device name.
GUID | System GUID of the device
Type | Type of the device: switch, node, IB router, or gateway
IP | IP address of the device
Vendor | The vendor of the device
Firmware Version | The firmware version installed on the device

Health States

Name | Description
Normal | Information/notification displayed during normal operating state or a normal system event.
Critical | Critical means that the operation of the system or a system component fails.
Minor | Minor reflects a problem in the fabric with no failure.
Warning | Warning reflects a low priority problem in the fabric with no failure. A warning is asserted when an event exceeds a predefined threshold.
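
The Health column reports the highest alarm severity on a device. A minimal sketch of that aggregation is shown below. The ranking (Normal, then Warning, then Minor, then Critical) is an assumption inferred from the descriptions above, and the function name is hypothetical rather than part of UFM.

```python
# Assumed ranking for illustration only: Normal < Warning < Minor < Critical.
SEVERITY_RANK = {"Normal": 0, "Warning": 1, "Minor": 2, "Critical": 3}


def aggregate_health(alarm_severities):
    """Return the highest severity among the given alarms.

    An object with no alarms is treated as "Normal" (an assumption).
    An unknown severity name raises KeyError.
    """
    return max(alarm_severities, key=SEVERITY_RANK.__getitem__, default="Normal")


print(aggregate_health(["Warning", "Critical", "Minor"]))
print(aggregate_health([]))
```

The same rule would apply to any object whose health is described as "the highest alarm severity", such as ports, cables, and groups.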

A right-click on the device name displays a list of actions that can be performed on it.

Devices Actions

Action | Description
Firmware Upgrade | Perform a firmware upgrade on the selected device
Firmware Reset | Reboot the device. This action is only applicable to unmanaged hosts (servers).
Set Node Description | Configure a description for this node
Collect System Dump | Collect the system dump log for a specific device
Add to Group | Add the selected device to a devices group
Remove from Group | Remove the selected device from a devices group
Suppress Notifications | Suppress all event notifications for the device
Add to Monitor Session | Configure and activate host monitoring
Show in Network Map | Move to the Zoom In tab in the network map and add the selected device to the filter list

System dump collection for hosts managed by UFM is available only for hosts that have a valid IPv4 address and have MLNX_OFED installed.

Mark Device as Unhealthy

From the Devices table, it is possible to mark devices as healthy or unhealthy using the context menu (right-click).

There are two options for marking a device as unhealthy:

  • Isolate
  • No Discover

Server /opt/ufm/files/conf/opensm/opensm-health-policy.conf content:

0xe41d2d030003e3b0 34 UNHEALTHY isolate
0xe41d2d030003e3b0 19 UNHEALTHY isolate
0xe41d2d030003e3b0 3 UNHEALTHY isolate
0xe41d2d030003e3b0 26 UNHEALTHY isolate
0xe41d2d030003e3b0 0 UNHEALTHY isolate
0xe41d2d030003e3b0 27 UNHEALTHY isolate
0xe41d2d030003e3b0 7 UNHEALTHY isolate
0xe41d2d030003e3b0 10 UNHEALTHY isolate
0xe41d2d030003e3b0 11 UNHEALTHY isolate
0xe41d2d030003e3b0 22 UNHEALTHY isolate
0xe41d2d030003e3b0 18 UNHEALTHY isolate
0xe41d2d030003e3b0 29 UNHEALTHY isolate
0xe41d2d030003e3b0 8 UNHEALTHY isolate
0xe41d2d030003e3b0 5 UNHEALTHY isolate
0xe41d2d030003e3b0 17 UNHEALTHY isolate
0xe41d2d030003e3b0 23 UNHEALTHY isolate
0xe41d2d030003e3b0 15 UNHEALTHY isolate
0xe41d2d030003e3b0 24 UNHEALTHY isolate
0xe41d2d030003e3b0 2 UNHEALTHY isolate
0xe41d2d030003e3b0 16 UNHEALTHY isolate
0xe41d2d030003e3b0 13 UNHEALTHY isolate
0xe41d2d030003e3b0 14 UNHEALTHY isolate
0xe41d2d030003e3b0 32 UNHEALTHY isolate
0xe41d2d030003e3b0 33 UNHEALTHY isolate
0xe41d2d030003e3b0 35 UNHEALTHY isolate
0xe41d2d030003e3b0 20 UNHEALTHY isolate
0xe41d2d030003e3b0 21 UNHEALTHY isolate
0xe41d2d030003e3b0 28 UNHEALTHY isolate
0xe41d2d030003e3b0 1 UNHEALTHY isolate
0xe41d2d030003e3b0 9 UNHEALTHY isolate
0xe41d2d030003e3b0 4 UNHEALTHY isolate
0xe41d2d030003e3b0 31 UNHEALTHY isolate
0xe41d2d030003e3b0 30 UNHEALTHY isolate
0xe41d2d030003e3b0 36 UNHEALTHY isolate
0xe41d2d030003e3b0 12 UNHEALTHY isolate
0xe41d2d030003e3b0 25 UNHEALTHY isolate
0xe41d2d030003e3b0 6 UNHEALTHY isolate

/opt/ufm/files/log/opensm-unhealthy-ports.dump content:

Mark Device as Healthy

Server /opt/ufm/files/conf/opensm/opensm-health-policy.conf content:

0xe41d2d030003e3b0 15 HEALTHY
0xe41d2d030003e3b0 25 HEALTHY
0xe41d2d030003e3b0 35 HEALTHY
0xe41d2d030003e3b0 0 HEALTHY
0xe41d2d030003e3b0 11 HEALTHY
0xe41d2d030003e3b0 21 HEALTHY
0xe41d2d030003e3b0 28 HEALTHY
0xe41d2d030003e3b0 7 HEALTHY
0xe41d2d030003e3b0 17 HEALTHY
0xe41d2d030003e3b0 14 HEALTHY
0xe41d2d030003e3b0 24 HEALTHY
0xe41d2d030003e3b0 34 HEALTHY
0xe41d2d030003e3b0 3 HEALTHY
0xe41d2d030003e3b0 10 HEALTHY
0xe41d2d030003e3b0 20 HEALTHY
0xe41d2d030003e3b0 31 HEALTHY
0xe41d2d030003e3b0 6 HEALTHY
0xe41d2d030003e3b0 16 HEALTHY
0xe41d2d030003e3b0 27 HEALTHY
0xe41d2d030003e3b0 2 HEALTHY
0xe41d2d030003e3b0 13 HEALTHY
0xe41d2d030003e3b0 23 HEALTHY
0xe41d2d030003e3b0 33 HEALTHY
0xe41d2d030003e3b0 30 HEALTHY
0xe41d2d030003e3b0 9 HEALTHY
0xe41d2d030003e3b0 19 HEALTHY
0xe41d2d030003e3b0 26 HEALTHY
0xe41d2d030003e3b0 36 HEALTHY
0xe41d2d030003e3b0 5 HEALTHY
0xe41d2d030003e3b0 12 HEALTHY
0xe41d2d030003e3b0 22 HEALTHY
0xe41d2d030003e3b0 32 HEALTHY
0xe41d2d030003e3b0 1 HEALTHY
0xe41d2d030003e3b0 8 HEALTHY
0xe41d2d030003e3b0 18 HEALTHY
0xe41d2d030003e3b0 29 HEALTHY
0xe41d2d030003e3b0 4 HEALTHY
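
Each line in the policy listings above appears to hold a node GUID, a port number, a state (HEALTHY or UNHEALTHY) and, for unhealthy ports, an action such as isolate. The sketch below parses lines of that shape. The field names, the skipping of blank and `#` lines, and the function name are assumptions made for illustration and are not taken from the opensm documentation.

```python
from typing import NamedTuple, Optional


class PolicyEntry(NamedTuple):
    node_guid: int         # first column, parsed from hexadecimal
    port: int              # second column
    state: str             # "HEALTHY" or "UNHEALTHY"
    action: Optional[str]  # e.g. "isolate"; absent on HEALTHY lines


def parse_policy_line(line: str) -> Optional[PolicyEntry]:
    """Parse one policy line; return None for blank lines and '#' lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected policy line: {line!r}")
    action = fields[3] if len(fields) > 3 else None
    return PolicyEntry(int(fields[0], 16), int(fields[1]), fields[2], action)


sample = """\
0xe41d2d030003e3b0 34 UNHEALTHY isolate
0xe41d2d030003e3b0 19 UNHEALTHY isolate
0xe41d2d030003e3b0 15 HEALTHY
"""
entries = [e for e in map(parse_policy_line, sample.splitlines()) if e]
isolated_ports = sorted(e.port for e in entries if e.action == "isolate")
print(isolated_ports)
```

A parser like this can help confirm that every port of a device was written to the policy file after marking the device from the context menu.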

/opt/ufm/files/log/opensm-unhealthy-ports.dump content:

# NodeGUID, PortNum, NodeDesc, PeerNodeGUID, PeerPortNum, PeerNodeDesc, {BadCond1, BadCond2, ...}, timestamp


Upgrading Software and Firmware for Hosts and Externally Managed Switches

Software/Firmware Upgrade via FTP

Software and firmware upgrade over FTP is enabled by the UFM Agent. UFM invokes the Software/Firmware Upgrade procedure locally on switches or on hosts. The procedure copies the new software/firmware file from the defined storage location and performs the operation on the device. UFM sends the set of attributes required for performing the software/firmware upgrade to the agent.

The attributes are:

  • File Transfer Protocol – default FTP
    • The Software/Firmware upgrade on InfiniScale III ASIC-based switches supports FTP protocol for transmitting files to the local machine.
    • The Software/Firmware upgrade on InfiniScale IV-based switches and hosts supports TFTP and protocols for transmitting files to the local machine.
  • IP address of file-storage server
  • Path to the software/firmware image location 

    The software/firmware image files should be placed according to the required structure under the defined image storage location. Please refer to section Devices Window.
  • File-storage server access credentials (User/Password)

In-Band Firmware Upgrade

You can perform in-band firmware upgrades for externally managed switches and HCAs. This upgrade procedure does not require the UFM Agent or IP connectivity, but it does require current PSID recognition. Please refer to section PSID and Firmware Version In-Band Discovery. This feature requires that the Mellanox Firmware Toolkit (MFT), which is included in the UFM package, is installed on the UFM server. UFM uses flint from the MFT for in-band firmware burning.

Before upgrading, you must create the firmware repository on the UFM server under the directory /opt/ufm/files/userdata/fw/. Create a subdirectory for each PSID and place one firmware image under it. For example: 

/opt/ufm/files/userdata/fw/
	 MT_0D80110009
			fw-ConnectX2-rel-2_9_1000-MHQH29B-XTR_A1.bin
	 MT_0F90110002
			fw-IS4-rel-7_4_2040-MIS5023Q_A1-A5.bin
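
The layout above can be checked before starting an upgrade. The following sketch walks a repository root and reports PSID subdirectories that do not contain exactly one file. Reading "one firmware image" as "exactly one file" is an assumption, and the function name is hypothetical. The demo runs against a temporary directory rather than /opt/ufm/files/userdata/fw/.

```python
import tempfile
from pathlib import Path


def find_repository_problems(root):
    """Return {psid_dir_name: [file names]} for every PSID subdirectory
    under root that does not hold exactly one file."""
    problems = {}
    for psid_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(f.name for f in psid_dir.iterdir() if f.is_file())
        if len(images) != 1:
            problems[psid_dir.name] = images
    return problems


# Demo on a throwaway directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "MT_0D80110009")
    good.mkdir()
    (good / "fw-ConnectX2-rel-2_9_1000-MHQH29B-XTR_A1.bin").touch()
    Path(tmp, "MT_0F90110002").mkdir()  # left empty on purpose
    demo_problems = find_repository_problems(tmp)

print(demo_problems)
```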

Directory Structure for Software or Firmware Upgrade Over FTP

Before performing a software or firmware upgrade, you must create the following directory structure for the upgrade image. The path to the <ftp user home>/<path>/ directory should be specified in the upgrade dialog box.

<ftp user home>/<path>/
	InfiniScale3 – For Anafa-based switches software/firmware upgrade images
			voltaire_fw_images.tar – firmware image file
			ibswmpr-<s/w version>.tar – software image file
	InfiniScale4 – For InfiniScale IV-based switches software/firmware upgrade images
			firmware_2036_4036.tar – firmware image file
			upgrade_2036_4036.tgz – software image file
	OFED – For host SW upgrade
			OFED-<OS label>.tar.bz2
	<PSID>* – For host FW upgrade
			fw_update.img

The <PSID> value is extracted from the mstflint command:

mstflint -d <device> q

The device is extracted from the lspci command. For example:

# lspci
06:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
# mstflint -d 06:00.0 q | grep PSID
PSID: VLT0040010001
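
The lookup above can be scripted. The sketch below extracts the PSID from query output of the form shown in the example. The function names are hypothetical, and `query_psid` assumes that mstflint is installed and that the device address has already been taken from lspci.

```python
import re
import subprocess
from typing import Optional


def extract_psid(query_output: str) -> Optional[str]:
    """Return the value of the first 'PSID:' line in the query output, if any."""
    match = re.search(r"^[ \t]*PSID:[ \t]*(\S+)", query_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def query_psid(device: str) -> Optional[str]:
    """Run 'mstflint -d <device> q' and return the PSID it reports."""
    result = subprocess.run(
        ["mstflint", "-d", device, "q"],
        capture_output=True, text=True, check=True,
    )
    return extract_psid(result.stdout)


print(extract_psid("PSID: VLT0040010001"))
```

The returned string is the name to use for the `<PSID>` directory in the structure described above.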

PSID and Firmware Version In-Band Discovery

The device PSID and device firmware version are required for in-band firmware upgrade and for the correct functioning of Subnet Manager plugins, such as Congestion Control Manager and Lossy Configuration Management. For most devices, UFM discovers this information and displays it in the Device Properties pane. The PSID and the firmware version are discovered by the Vendor-specific MAD. 

By default, the event_plugin_options value in the gv.cfg file is set to (null). This means that the plugin is disabled and opensm does not send MADs to discover the devices' PSID and FW version. In that case, the PSID and FW version values are taken from the ibdiagnet output (section NODES_INFO).

The following is the default value:

event_plugin_options = (null)

To enable vendor-specific discovery by opensm, change the value of event_plugin_options in the gv.cfg configuration file to --vendinfo -m 1, as shown below:

event_plugin_options = --vendinfo -m 1

If the value is set to --vendinfo -m 1, the data is supplied by opensm, and the ibdiagnet output is ignored. 


In some firmware versions, the information above is currently not available.
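
The rule described in this section can be summarized in a short sketch: given the text of gv.cfg, report which source is expected to supply the PSID and firmware version. The function name is hypothetical, and treating a missing key as the default (null) is an assumption.

```python
import re


def psid_source(gv_cfg_text: str) -> str:
    """Return "opensm" when --vendinfo is set in event_plugin_options,
    otherwise "ibdiagnet"."""
    match = re.search(
        r"^[ \t]*event_plugin_options[ \t]*=[ \t]*(.*)$",
        gv_cfg_text,
        flags=re.MULTILINE,
    )
    # Assumption: a missing key behaves like the default value "(null)".
    value = match.group(1).strip() if match else "(null)"
    return "opensm" if "--vendinfo" in value.split() else "ibdiagnet"


print(psid_source("event_plugin_options = (null)"))
print(psid_source("event_plugin_options = --vendinfo -m 1"))
```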

Switch Management IP Address Discovery

Starting with NVIDIA switch FW version 27.2010.3942, NVIDIA switches support switch management IP address discovery using MADs. This information can be retrieved as part of an ibdiagnet run (ibdiagnet output) and assigned to discovered switches in UFM.

You can choose which IP protocol version, IPv4 or IPv6, is used for the address assigned to the switch.

The discovered_switch_ip_protocol key, located in the gv.cfg file in section [FabricAnalysys], is set to 4 by default. This means that the IPv4 address is assigned to the switch as its management IP address. If the value is set to 6, the IPv6 address is assigned to the switch as its management IP address. 

After changing the discovered_switch_ip_protocol value in gv.cfg, the UFM Main Model must be restarted for the update to take effect. The discovered IP addresses for switches are not persistent in UFM: on every UFM Main Model restart, the management IP address values are reassigned from the ibdiagnet output.
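
As an illustration of the selection rule, the sketch below reads discovered_switch_ip_protocol from gv.cfg text and picks the matching address from a list of discovered candidates. The function name and the candidate list are hypothetical. The section name is passed as a parameter because it is spelled here as it appears in this document and should be checked against the actual file.

```python
import configparser
import ipaddress


def pick_management_ip(candidates, gv_cfg_text, section="FabricAnalysys"):
    """Return the first candidate whose IP version matches the configured
    protocol, or None when no candidate matches."""
    # Interpolation is disabled so that '%' characters in other values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gv_cfg_text)
    # The document states that the default is 4 (IPv4).
    version = parser.getint(section, "discovered_switch_ip_protocol", fallback=4)
    for text in candidates:
        if ipaddress.ip_address(text).version == version:
            return text
    return None


cfg = "[FabricAnalysys]\ndiscovered_switch_ip_protocol = 6\n"
print(pick_management_ip(["10.0.0.5", "fe80::1"], cfg))
```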

Upgrading Server Software

The ability to update the server software is applicable only for hosts (servers) with the UFM Agent.

To upgrade the software:

  1. Select a device.
  2. From the right-click menu, select Software Update.
  3. Enter the parameters listed in the following table.

    Parameter | Description
    Protocol | Update is performed via the FTP protocol
    IP | Enter the host IP address
    Path | Enter the parent directory of the FTP directory structure for the upgrade image. The path must be relative: it must not begin with a slash (/) or end with a trailing slash.
    User | Enter the host username
    Password | Enter the host password
  4. Click Submit to save your changes.
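
The constraint on the Path parameter can be expressed as a simple check. This is a sketch only: the function name is hypothetical, and rejecting an empty path is an added assumption.

```python
def is_valid_upgrade_path(path: str) -> bool:
    """True when path is non-empty and has no leading or trailing slash."""
    return bool(path) and not path.startswith("/") and not path.endswith("/")


for candidate in ("images/ufm", "/images/ufm", "images/ufm/"):
    print(candidate, is_valid_upgrade_path(candidate))
```

Only the first candidate passes, because the other two start or end with a slash.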

Upgrading Firmware

You can upgrade firmware over FTP for hosts and switches that are running the UFM Agent, or you can perform an in-band upgrade for externally managed switches and HCAs.

Before you begin the upgrade, ensure that the new firmware version is in the correct location. For more information, please refer to section In-Band Firmware Upgrade.

To upgrade the firmware:

  1. Select a host or switch.
  2. From the right-click menu, select Firmware Upgrade.
  3. Select protocol In Band.
  4. For upgrade over FTP, enter the parameters listed in the following table.

    Parameter | Description
    IP | Enter the device IP address
    Path | Enter the parent directory of the FTP directory structure for the upgrade image. The path must be relative: it must not begin with a slash (/) or end with a trailing slash.
    Username | Enter the host username
    Password | Enter the host password
  5. Click Submit to save your changes. 

    The firmware upgrade takes effect only after the host or externally managed switch is restarted.

Upgrade Cables Transceivers Firmware Version

This feature supports burning firmware for multiple cable transceiver types on multiple devices using the linkx tool, which is part of flint. Burning must be done from both ends of the cable (switch and HCA/switch).

To upgrade the cable transceivers FW version:

  1. Navigate to the Managed Elements page.
  2. Select the target switches and click the Upgrade Cable Transceivers option.
  3. A modal is shown listing the active firmware versions for the cables of the selected switches. Beside each version number, a badge shows the number of matched switches.

  4. After you click Submit, the GUI starts sending the selected binaries with the relevant switches sequentially, and a modal with a progress bar is shown (this modal can be minimized).

  5. After the whole action completes successfully, the following message appears at the bottom of the modal: "The upgrade cable transceivers completed successfully, do you want to activate it?" Clicking the Yes button runs a new action on all the burned devices to activate the newly uploaded binary image.
  6. Alternatively, to activate burned cable transceivers, go to the Groups page and right-click the predefined group named Devices Pending FW Transceivers Reset, or right-click the upgraded device on the Managed Elements page and select the Activate Cable Transceivers action.

Device Information Tabs

Selecting a device from the Devices table reveals the Device Information table on the right side of the screen. This table provides information on the device’s ports, cables, groups, events, alarms, inventory, and device access.

General Tab

Provides general information on the selected device.

Ports Tab

This tab provides a list of the ports connected to this device in a tabular format.

Ports Data

Data Type | Description
Port Number | The number of the port on the device.
Node | The node name/GUID/IP that the port belongs to. You can choose the node label (name/GUID/IP) using the drop-down menu available above the Ports data table.
Health | Health of the port, reflecting the highest alarm severity. Please refer to the Health States table.
State | Indicates whether the port is connected (active or inactive).
LID | The local identifier (LID) of the port.
MTU | Maximum Transmission Unit of the port.
Speed | Lists the highest value of active, enabled and supported speeds, with icons indicating their status: dark green for active speed, light green for enabled speed, grey for supported yet disabled speed.
Width | Lists the highest value of active, enabled and supported widths, with icons indicating their status: dark green for active width, light green for enabled width, grey for supported yet disabled width.
Peer | The GUID of the device the port is connected to.
Peer Port | The name of the port that is connected to this port.

Cables Tab 

This tab provides a list of the cables connected to this device in a tabular format.

Cables Data

Data Type | Description

Basic Information
Health | Health of the cable, reflecting the highest alarm severity. Please refer to the Health States table.
Serial Number | Serial number of the cable.
Identifier | Identifier of the cable.

Source Port Information
Source GUID | GUID of the source port the cable is connected to.
Source Port | The number of the source port the cable is connected to.

Destination Port Information
Destination GUID | GUID of the destination port the cable is connected to.
Destination Port | The number of the destination port the cable is connected to.

Advanced Information
Revision | Revision of the cable.
Link Width | The maximum link width of the cable.
Part Number | Part number of the cable.
Technology | The transmitting medium of the cable: copper/optical/etc.
Length | The cable length in meters.

Groups Tab

This tab provides a list of the groups to which the selected device belongs.

Groups Data

Data Type | Description
Severity | Aggregated severity level of the group (the highest severity level of all group members).
Name | Name of the group.
Description | Description of the group.
Type | Type of the group: General/Rack.

Alarms Tab

This tab provides a list of all UFM alarms related to the selected device.

Alarms Data

Data Type | Description
Alarm ID | Alarm identifier.
Source | Source object (device/port) on which the alarm was triggered.
Severity | The severity of the alarm.
Description | Description of the alarm.
Date/Time | The time when the alarm was triggered.
Reason | Reason for the alarm.
Count | Number of times the alarm occurred on the related source object.

Events Tab

This tab provides a list of the UFM events that are related to the selected device.

Events Data

Data Type | Description
Severity | Event severity: Info, Warning, Error, Critical or Minor.
Event Name | The name of the event.
Source | The source object (device/port) on which the event was triggered.
Date/Time | The time when the event was triggered.
Category | The category of the event, indicated by icons. Hovering over the icon displays the category name.
Description | Description of the event. The full description can be displayed by hovering over the text.

Inventory Tab

This tab provides a list of the device’s modules with information in a tabular format.

This tab is available for switches only. 


Inventory Data

Data Type | Description
Health | Health of the module, reflecting the highest alarm severity. Please refer to the Health States table.
Status | The module status.
Serial Number | Serial number of the module.
Name | Name of the module.
Description | Description of the module.
Type | Type of the module: spine/line/etc.
Firmware Version | Firmware version installed on the module.
Hardware Version | Hardware version of the module.
Temperature | Temperature of the module.

HCAs Tab

This tab provides a list of the device’s HCAs with information in a tabular format.

This tab is available for hosts only.


Data Type | Description
Health | Health of the HCA, reflecting the highest alarm severity. Please refer to the Health States table.
Name | HCA Index
GUID | HCA GUID
Type | HCA Type
Port GUID | HCA ports GUIDs
PSID | HCA PSID
FW Version | HCA firmware version

Device Access Tab

This tab allows you to manage the access credentials of the selected device for remote access. To set access credentials for a device, a device IP address must be set, either by installing the UFM Agent on the device or by manually setting the IP address under IP Address Settings (both IPv4 and IPv6 are supported). 

When you manually set the IP address of NVIDIA® Mellanox® InfiniScale IV® and SwitchX® based switches, UFM first validates the new IP address before applying it.

To edit your device access credentials:

  1. Select the preferred protocol tab:
    • SSH – allows you to define the SSH parameters to open an SSH session on your device (available for nodes and switches)
    • IPMI – allows you to set the IPMI parameters to open an IPMI session on your device for remote power control (available for nodes only)
    • HTTP – allows you to define the HTTP parameters to open an HTTP session on your device (available for switches only)
  2. Click Update to save your changes.

Device Access Credentials Parameters

Field | Description
User | Fill in or edit the computer user name.
Password | Enter the device password.
Confirmation | Enter the device password a second time to confirm.
Manual IP | Enter the device IP address (could be IPv4/IPv6).
Port | Enter the port number.
Timeout | Enter the connection timeout (in seconds) for the device specific protocol (SSH/HTTP/IPMI).

Virtual Networking Tab

This tab displays a map containing the HCAs for the selected device, and the ports and virtual ports it is connected to.