Managing High-Speed Fabrics
The high-speed InfiniBand fabrics are managed with NVIDIA Unified Fabric Manager (UFM). UFM is a powerful platform for managing scale-out computing environments. UFM enables data center operators to efficiently monitor and operate the entire fabric, boost application performance, and maximize fabric resource utilization.
While other tools are device-oriented and involve manual processes, UFM's automated, application-centric approach bridges the gap between servers, applications, and fabric elements. This enables administrators to manage and optimize clusters ranging from the smallest to the largest and most performance-demanding.
The dashboard for UFM is shown in Figure 14.
Figure 14. UFM Dashboard
Verifying that UFM is Running
Use the service ufmha status command to verify that UFM is running:
ufm001# service ufmha status
ufmha status
========================================
Local Host
Server ufm001
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.31
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Master
Mysql Running
UFM Server Running
DRBD State Primary
DRBD Device State UpToDate
========================================
Remote Host
Server ufm002
Kernel 3.10.0-1127.19.1.el7.x86_64
IP Address 10.166.130.32
HA Interface bond0
DRBD Partition /dev/sda6
Heartbeat Slave
Mysql Stopped
UFM Server Stopped
DRBD State Secondary
DRBD Device State UpToDate
========================================
Virtual IP 10.166.130.58/24
Broadcast IP 10.166.130.255
========================================
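For scripted health checks, the status listing can be parsed rather than read by eye. The following is a minimal sketch, not part of UFM: the function name ufm_active_server is hypothetical, and it assumes the output keeps the whitespace-separated "Server <name>" and "UFM Server <state>" lines shown in the listing. It prints the name of each host that reports UFM Server as Running and returns a non-zero status if no host does.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with UFM): reads `service ufmha status`
# output on stdin. Assumes the field layout shown in the listing above.
ufm_active_server() {
    awk '
        # Remember the host name of the section currently being read.
        $1 == "Server" { server = $2 }
        # Match "UFM Server <state>" and report hosts where it is Running.
        $1 == "UFM" && $2 == "Server" && $3 == "Running" {
            print server
            found = 1
        }
        # Exit 0 if at least one host runs UFM, 1 otherwise.
        END { exit(found ? 0 : 1) }
    '
}

# Example use on a UFM node:
#   service ufmha status | ufm_active_server || echo "UFM is not running"
```

Because the function only reads standard input, it can also be run against a saved copy of the status output when troubleshooting offline.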
Refer to https://support.mellanox.com/s/productdetails/a2v50000000XcP4AAK/ufm for the full documentation.