Managing High-Speed Fabrics
The high-speed InfiniBand fabrics are managed with NVIDIA Unified Fabric Manager (UFM). UFM is a powerful platform for managing scale-out computing environments. UFM enables data center operators to efficiently monitor and operate the entire fabric, boost application performance, and maximize fabric resource utilization.
While other tools are device-oriented and involve manual processes, UFM's automated, application-centric approach bridges the gap between servers, applications, and fabric elements. This lets administrators manage and optimize clusters of any size, from the smallest to the largest and most performance-demanding.
The dashboard for UFM is shown in Figure 14.
Figure 14. UFM Dashboard
Verifying that UFM is Running
Use the service ufmha status command to verify that UFM is running:
ufm001# service ufmha status
ufmha status
========================================
Local Host
Server ufm001
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.31
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Master
Mysql Running
UFM Server Running
DRBD State Primary
DRBD Device State UpToDate
========================================
Remote Host
Server ufm002
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.32
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Slave
Mysql Stopped
UFM Server Stopped
DRBD State Secondary
DRBD Device State UpToDate
========================================
Virtual IP 10.166.130.58/24
Broadcast IP 10.166.130.255
========================================
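The status output above can also be checked by a script rather than by eye. The following is a minimal sketch, not part of UFM: the function names parse_ufmha_status and check_ha are hypothetical, the parser assumes the output keeps the field names and separator lines shown in the sample, and the health criteria (exactly one node running UFM Server, that node holding the DRBD Primary role, both DRBD devices UpToDate) are an assumption drawn from the sample rather than a documented rule.

```python
# Sample text copied from the status output shown in this section.
SAMPLE = """\
ufmha status
========================================
Local Host
Server ufm001
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.31
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Master
Mysql Running
UFM Server Running
DRBD State Primary
DRBD Device State UpToDate
========================================
Remote Host
Server ufm002
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.32
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Slave
Mysql Stopped
UFM Server Stopped
DRBD State Secondary
DRBD Device State UpToDate
========================================
Virtual IP 10.166.130.58/24
Broadcast IP 10.166.130.255
========================================
"""

HOST_HEADERS = ("Local Host", "Remote Host")

# Field names as they appear in the sample, longest first so that
# "DRBD Device State" is tried before "DRBD State".
FIELDS = sorted(
    ["Server", "Kernel", "IP Address", "HA Interface", "DRBD Partition",
     "Heartbeat", "Mysql", "UFM Server", "DRBD State", "DRBD Device State",
     "Virtual IP", "Broadcast IP"],
    key=len, reverse=True,
)


def parse_ufmha_status(text):
    """Return {section: {field: value}} for the status text (hypothetical helper)."""
    sections = {}
    current = "Cluster"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A separator ends the current host section.
            current = "Cluster"
            continue
        if line in HOST_HEADERS:
            current = line
            continue
        for field in FIELDS:
            rest = line[len(field):]
            if line.startswith(field) and rest[:1] in (" ", "\t", ":"):
                sections.setdefault(current, {})[field] = rest.lstrip(" \t:").rstrip()
                break
    return sections


def check_ha(sections):
    """Return a list of problems under the assumed health criteria (empty if none)."""
    problems = []
    hosts = [sections.get(name, {}) for name in HOST_HEADERS]
    running = [h for h in hosts if h.get("UFM Server") == "Running"]
    if len(running) != 1:
        problems.append(
            f"expected exactly one node running UFM Server, found {len(running)}")
    for h in running:
        if h.get("DRBD State") != "Primary":
            problems.append(f"{h.get('Server')}: UFM is running but DRBD is not Primary")
    for h in hosts:
        if h.get("DRBD Device State") != "UpToDate":
            problems.append(f"{h.get('Server')}: DRBD device is not UpToDate")
    return problems


if __name__ == "__main__":
    issues = check_ha(parse_ufmha_status(SAMPLE))
    print("no problems found" if not issues else "\n".join(issues))
```

On a UFM node, the same functions could be fed live output, for example from subprocess.run(["service", "ufmha", "status"], capture_output=True, text=True).stdout, instead of the embedded sample.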
Refer to https://support.mellanox.com/s/productdetails/a2v50000000XcP4AAK/ufm for the full documentation.