InfiniBand Cluster Bring-up Procedure

UFM Events and Alarms

UFM events and alarms allow you to identify problems, including port and device connectivity issues.

Problems can be detected both prior to running applications and during standard operation.

Events trigger alarms when they exceed a predefined threshold, except for "normal" events (i.e., Info events).
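
The threshold rule can be illustrated with a minimal sketch. This is not UFM code and does not use any UFM API: the function `alarms_for`, the event names, the severity labels other than Info, and the threshold values are all hypothetical, chosen only to show how counting events against a per-event threshold might work while skipping Info events.

```python
from collections import Counter

# Assumption: "Info" is the severity label for "normal" events.
INFO = "Info"


def alarms_for(events, thresholds):
    """Return the names of events whose count exceeds their threshold.

    events: iterable of (event_name, severity) tuples (hypothetical shape).
    thresholds: dict mapping event_name to the highest tolerated count.
    Events with no configured threshold use 0, an assumption made here
    so that a single occurrence is enough to raise an alarm.
    """
    # Info events never contribute to an alarm, so drop them before counting.
    counts = Counter(name for name, severity in events if severity != INFO)
    return {name for name, n in counts.items() if n > thresholds.get(name, 0)}


# Hypothetical event stream and thresholds.
events = [
    ("link_down", "Critical"),
    ("link_down", "Critical"),
    ("link_up", "Info"),
    ("symbol_error", "Warning"),
]
thresholds = {"link_down": 1, "symbol_error": 5}

print(alarms_for(events, thresholds))
```

In this sketch only `link_down` raises an alarm: it occurs twice against a threshold of 1, `symbol_error` stays below its threshold, and `link_up` is ignored because it is an Info event.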

For more information, see the UFM User Manual.

UFM alerts can detect many scenarios, such as a bad link, low bandwidth, duplicate GUIDs, or a non-responsive switch.

For the list of scenarios and an explanation of how to detect and resolve each issue, refer to the list of scenarios.


© Copyright 2024, NVIDIA. Last updated on May 28, 2024.