UFM Failover to Another Port

NVIDIA UFM-SDN Appliance User Manual v4.16.0

When the UFM®-SDN Appliance is connected to the fabric by two or more InfiniBand ports, you can configure the UFM Subnet Manager (SM) to fail over to one of the other ports. When a failure is detected on the InfiniBand port or link, failover occurs without stopping the UFM®-SDN Appliance or other related UFM services, such as mysql, http, and DRBD. In a standalone setup, this failover keeps the SM running despite the port failure. In a High Availability (HA) setup, it avoids a full UFM failover, which reduces downtime and recovery time.

Network Configuration for Failover to IB Port

[Figure: network configuration for failover to IB port]

Note

UFM SM failover is not relevant for Monitoring mode, because in this mode, UFM must be connected to the fabric over ib0 only.

When a failure is detected on an InfiniBand port or link, the UFM®-SDN Appliance initiates the give-up operation that is defined in the Health configuration file for OpenSM failure. By default:

  • UFM®-SDN Appliance discovers the other ports in the specified bond and fails over to the first interface that is up (SM failover).

  • If no interface is up:

    • In an HA setup, UFM initiates UFM failover.

    • In a standalone setup, UFM®-SDN Appliance does nothing.
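
The default give-up behavior described above can be sketched as a small decision function. This is only an illustrative model of the documented logic, not UFM code: the names `Port`, `Action`, and `choose_give_up_action` are hypothetical, and the sketch assumes that "first interface that is up" means the first one in the bond's listed order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class Action(Enum):
    """Possible outcomes of the default give-up operation (hypothetical names)."""
    SM_FAILOVER = "sm_failover"    # move the SM to another port in the bond
    UFM_FAILOVER = "ufm_failover"  # HA setup: fail over UFM itself
    NONE = "none"                  # standalone setup: nothing is done


@dataclass(frozen=True)
class Port:
    name: str
    is_up: bool


def choose_give_up_action(
    bond_ports: Sequence[Port],
    failed_port: str,
    ha_enabled: bool,
) -> Tuple[Action, Optional[str]]:
    """Return the action to take and, for SM failover, the target port name."""
    # Prefer the first other interface in the bond that is up.
    for port in bond_ports:
        if port.name != failed_port and port.is_up:
            return Action.SM_FAILOVER, port.name
    # No interface is up: the outcome depends on the deployment type.
    if ha_enabled:
        return Action.UFM_FAILOVER, None
    return Action.NONE, None
```

For example, with ports `ib0` (failed), `ib1` (down), and `ib2` (up), the function selects SM failover to `ib2`. With every port down, it returns a UFM failover in an HA setup and no action in a standalone setup.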

If the failed link becomes active again, the UFM®-SDN Appliance does not switch back to it automatically. It selects this link for the SM only after the SM is restarted.

© Copyright 2024, NVIDIA. Last updated on May 24, 2024.