Events & Alarms

NVIDIA UFM Enterprise User Manual v6.17.2
Note

All information presented in tabular format in the UFM web UI can be exported to a CSV file.

UFM uses events and alarms to help you identify problems, including port and device connectivity problems. Problems can be detected both before running applications and during standard operation.

Events trigger alarms when they exceed a predefined threshold. The exception is "normal" events (that is, Info events), which do not trigger alarms. Events and alarms can be configured in the Events Policy tab of the Settings window. For more information, refer to Events Policy Tab.
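The threshold rule can be pictured with a short sketch. This is an illustrative model only, not how UFM implements it: the class name `AlarmTracker`, the severity strings, and the simple per-event counter are all assumptions, and the real thresholds are whatever is configured in the Events Policy tab.

```python
from collections import Counter

# Assumed label for "normal" events; these never raise alarms in this model.
INFO = "Info"


class AlarmTracker:
    """Toy model: an event raises an alarm once its count exceeds a threshold."""

    def __init__(self, thresholds):
        # thresholds: event name -> occurrences tolerated (assumed semantics)
        self.thresholds = thresholds
        self.counts = Counter()

    def record(self, event_name, severity):
        """Record one event and return True if it should trigger an alarm."""
        if severity == INFO:
            return False  # Info events are exempt from alarms
        self.counts[event_name] += 1
        threshold = self.thresholds.get(event_name)
        if threshold is None:
            return False  # no policy defined for this event in the sketch
        return self.counts[event_name] > threshold
```

With a hypothetical threshold of 2 for "Link is Down", the first two occurrences are only recorded and the third one triggers an alarm.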

Users can enable the events persistency mechanism in the gv.cfg file. With persistency enabled, events remain visible after UFM is restarted or when running in HA mode.

Note

Alternatively, you can run the following commands:

  • ufm events persistency enable

  • ufm events max-restored

Persistency is disabled by default. It is controlled by the following parameters in the configuration file:

  • max_restored_events = 50 # determines the number of events to restore

  • events_persistency_enabled = true # must be set to true for the feature to work
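As a rough illustration, the two parameters above could be set programmatically with Python's standard `configparser`. This is a sketch under assumptions: the section name `Events` and the helper name `enable_events_persistency` are hypothetical, so check your own gv.cfg for the section that actually holds these keys.

```python
import configparser
import io


def enable_events_persistency(cfg_text, max_restored=50, section="Events"):
    """Return cfg_text with the two persistency parameters set.

    `section` is an assumed section name, not confirmed by the manual.
    """
    # Disable interpolation so '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option-name case exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "events_persistency_enabled", "true")
    parser.set(section, "max_restored_events", str(max_restored))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Note that `configparser` discards comments when it writes a file back, so for a production gv.cfg it is safer to keep a backup or edit the two lines by hand.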

The Device Status Events tab displays topology change events related to devices in a table. It supports the following event types:

  • Node is Up/Down

  • Switch is Up/Down

  • Director Switch is Up/Down

[Figure: Device Status Events tab]

Filters are provided to allow filtering events by a desired time interval, subject to a limit on the interval length.
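The filtering behavior can be sketched as follows. This is a minimal model, not UFM code: the event dictionaries, the `timestamp` key, the function name `filter_events`, and the 7-day `MAX_INTERVAL` are assumptions chosen for illustration, since the actual length limit is not stated here.

```python
from datetime import datetime, timedelta

# Assumed maximum interval length; the real limit in UFM is not given here.
MAX_INTERVAL = timedelta(days=7)


def filter_events(events, start, end, max_interval=MAX_INTERVAL):
    """Return the events whose 'timestamp' lies within [start, end]."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    if end - start > max_interval:
        raise ValueError("requested interval exceeds the allowed length")
    return [e for e in events if start <= e["timestamp"] <= end]
```

Rejecting an over-long interval up front, rather than silently truncating it, keeps the result predictable for the caller.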

[Figure: Device Status Events tab, time filter]

The Link Status Events tab displays topology change events related to links in a table. It supports the following event type:

  • Link is Up/Down

[Figure: Link Status Events tab]

Filters are provided to allow filtering events by a desired time range.

[Figures: Link Status Events tab]

Note

The related switch context menu is displayed only if the event type is ‘Switch is Up/Down’. Other event types show the default context menu, which is ‘Copy Cell’.

Cable Transceiver Temperatures

UFM raises alarms to notify the user when an active cable overheats or overcools. UFM uses ibdiagnet to obtain the cable temperature analysis and reports exceptions in the Alarms view.

Related events:

  • 919 for high cable temperature

  • 920 for low cable temperature
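The mapping between a temperature reading and the two event IDs above can be sketched as below. Only the IDs 919 and 920 come from this page. The function name, the parameter names, and the idea of passing explicit low and high limits are assumptions for illustration; in UFM the analysis itself comes from ibdiagnet.

```python
HIGH_CABLE_TEMPERATURE_EVENT = 919
LOW_CABLE_TEMPERATURE_EVENT = 920


def cable_temperature_event(temp_c, low_c, high_c):
    """Return the related event ID, or None if the reading is within range.

    low_c and high_c are hypothetical limits supplied by the caller.
    """
    if temp_c > high_c:
        return HIGH_CABLE_TEMPERATURE_EVENT
    if temp_c < low_c:
        return LOW_CABLE_TEMPERATURE_EVENT
    return None
```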

GUI Views

Alarms

[Screenshot: Alarms view]


Event Policy

[Screenshot: Event Policy view]

© Copyright 2024, NVIDIA. Last updated on Jun 27, 2024.