Events and Alarms

NVIDIA UFM Enterprise User Manual v6.17.1

UFM provides comprehensive diagnostics for your InfiniBand fabric, covering the following categories:

  1. Fabric configurations

  2. Fabric topology

  3. Hardware issues

  4. Communication errors

  5. Maintenance

  6. Security

  7. Switch module status

  8. NVIDIA SHARP notifications

Events are notifications generated by UFM that indicate issues in the InfiniBand fabric within the categories listed above. Alarms are urgent notifications derived from events: many events can be configured to also raise an alarm, according to customer preferences.
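The relationship between events and alarms can be pictured as a policy lookup: every event is recorded, and only those whose type is configured to alarm (and that are severe enough) are promoted. The following is a minimal sketch under stated assumptions, not UFM's implementation. The `Severity` scale, the `PolicyEntry` fields (`raise_alarm`, `min_severity`), and the `derive_alarms` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale, ordered so that a higher value is more urgent.
    INFO = 1
    WARNING = 2
    MINOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Event:
    event_type: str     # hypothetical event identifier, e.g. "link_down"
    source: str         # device or port that raised the event
    severity: Severity


@dataclass(frozen=True)
class PolicyEntry:
    # Hypothetical per-event-type policy: should this event also raise an alarm?
    raise_alarm: bool
    min_severity: Severity = Severity.WARNING


def derive_alarms(events, policy):
    """Return the events that the policy promotes to alarms, in input order."""
    alarms = []
    for event in events:
        entry = policy.get(event.event_type)
        # No policy entry, or alarming disabled: the event is kept as an event only.
        if entry is None or not entry.raise_alarm:
            continue
        # Promote only events at or above the configured severity.
        if event.severity >= entry.min_severity:
            alarms.append(event)
    return alarms


if __name__ == "__main__":
    policy = {
        "link_down": PolicyEntry(raise_alarm=True),
        "topology_change": PolicyEntry(raise_alarm=False),
    }
    events = [
        Event("link_down", "switch-01/port-7", Severity.CRITICAL),
        Event("topology_change", "fabric", Severity.INFO),
        Event("link_down", "switch-02/port-1", Severity.INFO),
    ]
    for alarm in derive_alarms(events, policy):
        print(alarm.source, alarm.severity.name)
```

In this sketch only the first `link_down` event becomes an alarm: the topology change has alarming disabled, and the second link-down event falls below the assumed `WARNING` minimum.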

These checks run both before applications are launched and during normal operation. They help network administrators troubleshoot and alert them to potential network issues before those issues escalate.

Events can originate from various sources:

  • SM (Subnet Manager) traps

  • SHARP AM (Aggregation Manager) traps

  • UFM internal analysis, encompassing:

    • Internal detection of topology changes

    • Internal fabric analysis (based on IBDiagnet)

    • Internal monitoring of managed switches

    • Maintenance activities (device action tracking, licensing, cable integrity)

  • Threshold-crossing events determined by telemetry counter readings
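Threshold-crossing events compare telemetry counter readings against a limit. A common way to do this with cumulative counters is to turn two successive samples into a per-second rate and flag the port when that rate exceeds a threshold. The sketch below illustrates that idea only; the rate-over-interval rule, the counter-reset handling, the counter name `symbol_errors`, and the helpers `counter_rate` and `check_threshold` are assumptions, not UFM's documented algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CounterSample:
    port: str
    counter: str        # hypothetical counter name, e.g. "symbol_errors"
    value: int          # cumulative counter value at sample time
    timestamp: float    # sample time in seconds


def counter_rate(prev: CounterSample, curr: CounterSample) -> float:
    """Per-second rate of change between two samples of the same counter."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.value - prev.value
    if delta < 0:
        # Assumption: the counter was cleared and restarted from zero.
        delta = curr.value
    return delta / interval


def check_threshold(prev: CounterSample, curr: CounterSample,
                    threshold_per_sec: float) -> Optional[str]:
    """Return an event message if the rate exceeds the threshold, else None."""
    rate = counter_rate(prev, curr)
    if rate > threshold_per_sec:
        return (f"{curr.port}: {curr.counter} rate {rate:.2f}/s "
                f"exceeds {threshold_per_sec}/s")
    return None


if __name__ == "__main__":
    before = CounterSample("switch-01/port-3", "symbol_errors", 100, 0.0)
    after = CounterSample("switch-01/port-3", "symbol_errors", 160, 30.0)
    print(check_threshold(before, after, threshold_per_sec=1.0))
```

Using a rate rather than the raw counter value keeps the check independent of how long the counter has been accumulating; a real deployment would take its counters and limits from the threshold-crossing configuration.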

Events and alarms can be accessed through the WebUI and the REST API as follows.

Events

  • Viewing events
      WebUI: UFM events can be viewed in the Events and Alarms WebUI view. Refer to Events & Alarms.
      REST API: Events REST API

  • Viewing device-specific events
      WebUI: Refer to Events & Alarms.
      REST API: N/A

  • Configuring events
      WebUI: Managed in the Events Policy tab of the Settings window.
      REST API: Events Policy REST API

Alarms

  • Viewing alarms
      WebUI: UFM alarms can be viewed in the Events and Alarms WebUI view. Refer to Events & Alarms.
      REST API: Alarms REST API

  • Configuring alarms
      WebUI: Managed in the Events Policy tab of the Settings window.
      REST API: N/A
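For scripted access, a client typically sends an authenticated HTTPS GET to the events or alarms endpoint and filters the JSON response. The sketch below uses only the Python standard library and does not send anything; it builds the request, which could then be passed to `urllib.request.urlopen`. The endpoint paths, the use of HTTP Basic authentication, and the response shape (a JSON array of objects with a `"severity"` field) are assumptions to be checked against the Events REST API and Alarms REST API references; `build_get_request` and `filter_by_severity` are hypothetical helpers.

```python
import base64
import json
import urllib.request

# Assumption: illustrative paths; confirm them in the Events/Alarms REST API references.
EVENTS_PATH = "/ufmRest/app/events"
ALARMS_PATH = "/ufmRest/app/alarms"


def build_get_request(host: str, path: str, user: str,
                      password: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS GET request using Basic authentication."""
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{host}{path}",
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def filter_by_severity(payload: str, severity: str) -> list:
    """Keep records whose (assumed) "severity" field matches exactly."""
    records = json.loads(payload)
    return [record for record in records if record.get("severity") == severity]
```

Keeping request construction separate from sending makes the helper easy to test offline; in practice the response body from `urlopen(...).read().decode()` would be passed to `filter_by_severity`.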

For a list of all events supported by UFM, refer to Threshold-Crossing Events Reference.

© Copyright 2024, NVIDIA. Last updated on Jun 18, 2024.