High-Speed Fabric Management#

InfiniBand Fabrics#

The high-speed InfiniBand fabrics are managed with NVIDIA Unified Fabric Manager (UFM). UFM is a powerful platform for managing scale-out computing environments. UFM enables data center operators to efficiently monitor and operate the entire fabric, boost application performance, and maximize fabric resource utilization.

While other tools are device-oriented and involve manual processes, UFM's automated, application-centric approach bridges the gap between servers, applications, and fabric elements. This enables administrators to manage and optimize clusters of any size, from the smallest to the largest and most performance-demanding.

UFM Dashboard#

The following figure shows the dashboard window, which summarizes the fabric’s status, including events, alarms, errors, traffic, and statistics.

UFM Dashboard

UFM InfiniBand Fabric Ports View#

The Ports view provides a list of all ports in the InfiniBand fabric and their speed, width, and status.

UFM InfiniBand Fabric Ports View

Verifying that UFM is Running#

To verify that UFM is running, run the service ufmha status command on a UFM node:

ufm001# service ufmha status
ufmha status
========================================
Local Host
Server              ufm001
Kernel              3.10.0-1127.19.1.el7.x86_64
IP Address          10.166.130.31
HA Interface        bond0
DRBD Partition      /dev/sda6
Heartbeat           Master
Mysql               Running
UFM Server          Running
DRBD State          Primary
DRBD Device State   UpToDate
========================================
Remote Host
Server              ufm002
Kernel              3.10.0-1127.19.1.el7.x86_64
IP Address          10.166.130.32
HA Interface        bond0
DRBD Partition      /dev/sda6
Heartbeat           Slave
Mysql               Stopped
UFM Server          Stopped
DRBD State          Secondary
DRBD Device State   UpToDate
========================================
Virtual IP          10.166.130.58/24
Broadcast IP        10.166.130.255
========================================
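
The status output above can also be checked by a script. The following is a minimal sketch, not an official UFM tool: it assumes the output keeps the layout shown above ("Local Host" and "Remote Host" section headers followed by field/value lines) and that a healthy pair reports UFM Server as Running on the node whose Heartbeat is Master. The function name check_ufmha is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `service ufmha status` output on stdin and report whether the
# node whose Heartbeat is "Master" also shows "UFM Server  Running".
# Assumption: the output layout matches the sample shown in this section.
# check_ufmha is a hypothetical helper name, not part of UFM.
check_ufmha() {
    awk '
        # Track which host section the following fields belong to.
        /^Local Host/  { node = "local";  next }
        /^Remote Host/ { node = "remote"; next }
        /^Virtual IP/  { node = "" }

        # The value is the last whitespace-separated field on the line.
        node != "" && /^Heartbeat/  { hb[node]  = $NF }
        node != "" && /^UFM Server/ { ufm[node] = $NF }

        END {
            for (n in hb) {
                if (hb[n] == "Master" && ufm[n] == "Running") {
                    print "OK: UFM Server is Running on the " n " (Master) node"
                    ok = 1
                }
            }
            if (!ok) {
                print "WARN: no Master node reports UFM Server Running"
                exit 1
            }
        }
    '
}
```

With the function defined in the current shell, it could be used as service ufmha status | check_ufmha. The function returns a nonzero exit status when no Master node reports a running UFM Server, so it can be used as a simple health probe.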

For more information about UFM, see: http://nvidia.com/en-us/networking/infiniband/ufm/.