Managing High-Speed Fabrics

The high-speed InfiniBand fabrics are managed with NVIDIA Unified Fabric Manager (UFM), a platform for managing scale-out computing environments. UFM enables data center operators to monitor and operate the entire fabric, boost application performance, and maximize fabric resource utilization.

Whereas other tools are device-oriented and rely on manual processes, UFM's automated, application-centric approach bridges the gap between servers, applications, and fabric elements. This lets administrators manage and optimize clusters of any size, from the smallest to the largest and most performance-demanding.

The dashboard for UFM is shown in Figure 14.

Figure 14. UFM Dashboard


Verifying that UFM is Running

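Use the service ufmha status command to verify that UFM is running. In a high-availability (HA) configuration, the output reports the state of both hosts in the pair. On the active host (Heartbeat Master), Mysql and UFM Server should be Running and DRBD State should be Primary; on the standby host (Heartbeat Slave), they are Stopped and Secondary, as in this example: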

    ufm001# service ufmha status
    ufmha status
    ========================================
    Local Host
    Server               ufm001
    Kernel               3.10.0-1127.19.1.el7.x86_64
    IP Address           10.166.130.31
    HA Interface         bond0
    DRBD Partition       /dev/sda6
    Heartbeat            Master
    Mysql                Running
    UFM Server           Running
    DRBD State           Primary
    DRBD Device State    UpToDate
    ========================================
    Remote Host
    Server               ufm002
    Kernel               3.10.0-1127.19.1.el7.x86_64
    IP Address           10.166.130.32
    HA Interface         bond0
    DRBD Partition       /dev/sda6
    Heartbeat            Slave
    Mysql                Stopped
    UFM Server           Stopped
    DRBD State           Secondary
    DRBD Device State    UpToDate
    ========================================
    Virtual IP           10.166.130.58/24
    Broadcast IP         10.166.130.255
    ========================================
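To run this check from a script or monitoring job instead of reading the output by eye, the status text can be parsed. The following is a minimal sketch, assuming the output format shown above; field labels such as Heartbeat, Server, and UFM Server may differ between UFM versions. check_ufm_ha is a hypothetical helper name, not part of UFM. It succeeds only when exactly one host in the pair is the heartbeat Master with the UFM server running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UFM): reads `service ufmha status`
# output on stdin. Assumes the field layout of the sample output above.
# Exit status 0 if exactly one host is Master with UFM Server Running, else 1.
check_ufm_ha() {
  awk '
    /^Local Host/  { sec = "local";  next }   # start of local host block
    /^Remote Host/ { sec = "remote"; next }   # start of remote host block
    /^Virtual IP/  { sec = "" }               # shared VIP block: ignore
    sec != "" && $1 == "Server"                 { srv[sec] = $2 }
    sec != "" && $1 == "Heartbeat"              { hb[sec]  = $2 }
    sec != "" && $1 == "UFM" && $2 == "Server"  { ufm[sec] = $3 }
    END {
      active = 0
      for (s in hb)
        if (hb[s] == "Master" && ufm[s] == "Running") { active++; who = s }
      if (active == 1) {
        printf "OK: UFM is running on %s (%s host, Master)\n", srv[who], who
        exit 0
      }
      printf "WARN: expected exactly one Master running UFM, found %d\n", active
      exit 1
    }
  '
}

# Usage on either UFM host:
#   service ufmha status | check_ufm_ha
```

Because the check looks at both the Local Host and Remote Host blocks, it gives the same verdict whether it is run on the active or the standby node. A non-zero exit status makes it straightforward to wire into a cron job or monitoring agent.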

Refer to https://support.mellanox.com/s/productdetails/a2v50000000XcP4AAK/ufm for the full documentation.