High-Speed Fabric Management#

InfiniBand Fabrics#

The high-speed InfiniBand fabrics are managed with NVIDIA Unified Fabric Manager (UFM). UFM is a powerful platform for managing scale-out computing environments. UFM enables data center operators to efficiently monitor and operate the entire fabric, boost application performance, and maximize fabric resource utilization.

While other tools are device-oriented and involve manual processes, UFM's automated, application-centric approach bridges the gap between servers, applications, and fabric elements. This lets administrators manage and optimize clusters of any size, from the smallest to the largest and most performance-demanding.

UFM Dashboard#

The dashboard window summarizes the fabric’s status, including events, alarms, errors, traffic, and statistics.

UFM Dashboard

UFM InfiniBand fabric ports view#

The Ports view lists every port in the InfiniBand fabric along with its speed, width, and status.

UFM InfiniBand Fabric Ports View

Verifying that UFM is running#

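Run the service ufmha status command on a UFM node to verify that UFM is running. In the following example, the local host ufm001 is the active node (Heartbeat Master, UFM Server Running, DRBD State Primary) and the remote host ufm002 is the standby: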

ufm001# service ufmha status
ufmha status
========================================
Local Host
Server              ufm001
Kernel              3.10.0-1127.19.1.el7.x86_64
IP Address          10.166.130.31
HA Interface        bond0
DRBD Partition      /dev/sda6
Heartbeat           Master
Mysql               Running
UFM Server          Running
DRBD State          Primary
DRBD Device State   UpToDate
========================================
Remote Host
Server              ufm002
Kernel              3.10.0-1127.19.1.el7.x86_64
IP Address          10.166.130.32
HA Interface        bond0
DRBD Partition      /dev/sda6
Heartbeat           Slave
Mysql               Stopped
UFM Server          Stopped
DRBD State          Secondary
DRBD Device State   UpToDate
========================================
Virtual IP          10.166.130.58/24
Broadcast IP        10.166.130.255
========================================
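
The status check above can also be scripted. The following is a minimal Python sketch, not part of UFM, that parses text in the layout shown above and reports any field on the local host that differs from what the sample shows for an active node. The function names, the "Cluster" section label, and the expected values in EXPECTED_ACTIVE are assumptions drawn from the sample output; they are not an official UFM health definition, and the output layout may differ between UFM versions.

```python
import re

# Values the sample output shows for the active node. Treating these as the
# "healthy" state is an assumption, not an official UFM definition.
EXPECTED_ACTIVE = {
    "Heartbeat": "Master",
    "Mysql": "Running",
    "UFM Server": "Running",
    "DRBD State": "Primary",
    "DRBD Device State": "UpToDate",
}

# Shortened copy of the output shown above, used for illustration only.
SAMPLE = """\
ufmha status
========================================
Local Host
Server              ufm001
Heartbeat           Master
Mysql               Running
UFM Server          Running
DRBD State          Primary
DRBD Device State   UpToDate
========================================
Remote Host
Server              ufm002
Heartbeat           Slave
========================================
Virtual IP          10.166.130.58/24
========================================
"""


def parse_ufmha_status(text):
    """Return {section: {field: value}} for ufmha-status-style text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # A separator closes the current block. Fields that follow
            # without a host header (for example Virtual IP) are stored
            # under the hypothetical label "Cluster".
            current = "Cluster"
            continue
        if line in ("Local Host", "Remote Host"):
            current = line
            continue
        # Field names and values are separated by two or more spaces.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        if current is not None and len(parts) == 2:
            sections.setdefault(current, {})[parts[0]] = parts[1]
    return sections


def active_node_problems(sections):
    """List local-host fields that differ from EXPECTED_ACTIVE."""
    local = sections.get("Local Host", {})
    return [
        f"{field}: expected {want!r}, got {local.get(field)!r}"
        for field, want in EXPECTED_ACTIVE.items()
        if local.get(field) != want
    ]
```

On a UFM node, the input text could be captured with subprocess.run(["service", "ufmha", "status"], capture_output=True, text=True).stdout and passed to parse_ufmha_status. An empty list from active_node_problems means the local host matches the active-node state shown in the sample. On the standby node the same check is expected to report differences, because Heartbeat is Slave and UFM Server is Stopped there.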

Refer to http://nvidia.com/en-us/networking/infiniband/ufm/ for UFM documentation.