Subnet Manager High Availability

Warning

All nodes in an SM HA subnet must be of the same CPU type (e.g. x86), and must run the same MLNX-OS version.

High availability (HA) refers to a system or component that is continuously operational for a desirably extended period of time.

Figure: SM HA subnet (SM_HA_Subnet.png)

NVIDIA Subnet Manager (SM) HA reduces subnet downtime and disruption by keeping subnet management running even when one of the SMs fails. The SM database is synchronized with all the nodes participating in the InfiniBand subnet when a configuration change is made. The synchronization is done out-of-band over the Ethernet management network.

NVIDIA SM HA allows the system administrator to enter and modify the InfiniBand SM configuration of all subnet managers from a single location. It creates an InfiniBand subnet ID and associates all the NVIDIA management appliances attached to the same InfiniBand subnet with that ID. All subnet managers can be controlled, started, or stopped from this single location.

All the nodes that participate in NVIDIA SM HA are joined to the InfiniBand subnet ID, and once they have joined, the synchronized SMs are launched. One of the nodes is elected as master and the others act as standby nodes (or are down). NVIDIA SM HA uses a virtual IP address (VIP) that is always directed to the SM HA master to monitor the SM state and to verify that all configurations are executed.

Warning

When transitioning from standalone into a group or vice versa, a few seconds are required for the node state to stabilize. During that time, group feature commands (e.g. SM HA commands) should not be executed. To run group features, wait for the CLI prompt to turn into [standalone:master], [<group>:master] or [<group>:standby] instead of [standalone:*unknown*] or [<group>:*unknown*].

An InfiniBand subnet is formed by a network of InfiniBand nodes interconnected via InfiniBand switches. The subnet includes all systems that can run an SM and are part of the SM HA domain. A switch that can potentially run an SM must be a member of an InfiniBand subnet ID to be associated with the NVIDIA SM HA domain. An IB subnet is identified by its ID, which the system uses to join or leave the subnet.

Every system that is not associated with an existing IB subnet (it has never been part of an IB subnet or has left one), or that does not have an MLNX-OS license installed, is associated by default with a subnet called “Standalone”.

In order to create, join or leave an InfiniBand subnet, one may use the following commands:

  • Create – “ib ha <IB_subnet_ID> ip <ip_addr> <netmask>”

  • Join – “ib ha <IB_subnet_ID>”

  • Leave – “no ib ha”
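When preparing a rollout, the three commands above can be generated by a script. The following is a minimal Python sketch that only builds the command strings using the syntax shown above. The helper names (create_cmd, join_cmd, leave_cmd) are hypothetical and not part of MLNX-OS, and the code does not connect to a switch.

```python
import ipaddress


def create_cmd(subnet_id: str, vip: str, netmask: str) -> str:
    """Build the 'create' command after checking the VIP and netmask locally."""
    if not subnet_id or any(ch.isspace() for ch in subnet_id):
        raise ValueError("subnet ID must be a single non-empty word")
    ipaddress.IPv4Address(vip)  # raises ValueError on a malformed address
    # Accepts 255.255.255.0, 24 or /24; raises ValueError on an invalid mask.
    ipaddress.IPv4Network("0.0.0.0/" + netmask.lstrip("/"))
    return f"ib ha {subnet_id} ip {vip} {netmask}"


def join_cmd(subnet_id: str) -> str:
    """Build the 'join' command for an existing subnet ID."""
    if not subnet_id or any(ch.isspace() for ch in subnet_id):
        raise ValueError("subnet ID must be a single non-empty word")
    return f"ib ha {subnet_id}"


def leave_cmd() -> str:
    """Build the 'leave' command."""
    return "no ib ha"


if __name__ == "__main__":
    print(create_cmd("subnet2", "192.168.10.110", "255.255.255.0"))
    print(join_cmd("subnet2"))
    print(leave_cmd())
```

Validating the address and netmask before sending the command is a design choice of this sketch; the switch performs its own validation.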

Warning

When leaving an SM HA cluster, SM configuration is not saved on the node leaving the cluster. After leaving, the configuration is reset to its default values.

For further information see section “Creating and Adding Systems to an InfiniBand Subnet ID”.

MLNX-OS centralized management infrastructure enables the user to configure or modify an existing configuration and to monitor the subnet running status. The MLNX-OS centralized management IP (VIP) is defined when a new subnet is created by running the command “ib ha <IB_subnet_ID> ip <ip_addr> <netmask>”. The created VIP serves as an alias for the current subnet master and thus assumes the same role as the master.

The VIP always points to one of the systems that are part of the SM HA domain. It remains active even if one or more of the members are down. For example:

switch (config) # ib ha subnet2 ip 192.168.10.110 255.255.255.0

A node is an InfiniBand switch system. Every node member of an IB subnet ID has one of the following roles:

  • Master – the node that manages SM configurations and provides services to the Virtual IP (VIP) addresses

  • Standby – the node that replaces the Master node and takes over its responsibilities once the Master node is down

  • Offline – the node has run an SM in the past and is currently offline, or it was created manually with the “ib smnode <node name> create” command. If the node has been removed from the environment, remove it from the list with the “no ib smnode <node name>” command.

To see the mode of the current node, look at the CLI prompt for the following format:

<host name> [<subnet ID>: <mode>] (config) #

For example:

Warning

switch [ibstandalone: master] (config) #
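Because the prompt encodes both the subnet ID and the node mode, an automation script can read it to decide whether group commands are safe to run. The following is a minimal Python sketch under that assumption. The regular expression and the function names (parse_prompt, group_commands_allowed) are illustrative only and are not part of MLNX-OS.

```python
import re

# Matches prompts such as "switch [standalone: master] (config) #" or
# "switch [subnet2:*unknown*] >". A leading "*" on the host name is ignored.
PROMPT_RE = re.compile(
    r"^\*?(?P<host>\S+)\s+\[(?P<subnet>[^:\]]+):\s*(?P<mode>[^\]]+?)\s*\]"
)


def parse_prompt(prompt: str):
    """Return (host, subnet_id, mode), or None if the prompt has no [..] part."""
    match = PROMPT_RE.match(prompt.strip())
    if match is None:
        return None
    mode = match.group("mode").strip("* ")  # "*unknown*" becomes "unknown"
    return match.group("host"), match.group("subnet").strip(), mode


def group_commands_allowed(prompt: str) -> bool:
    """True once the node state has settled to master or standby."""
    parsed = parse_prompt(prompt)
    return parsed is not None and parsed[2] in ("master", "standby")


if __name__ == "__main__":
    for p in ("switch [standalone: master] (config) #",
              "switch [subnet2:*unknown*] (config) #"):
        print(p, "->", parse_prompt(p), group_commands_allowed(p))
```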

To see a list of the existing nodes and details about their running state, run the command “show ib smnodes [brief]”.

The VIP is used to configure or modify the existing configuration and to monitor the subnet running status. To configure the VIP, run the command “ib ha <IB_subnet_ID> ip <ip_addr> <netmask>”:

switch [standalone: master] (config) # ib ha subnet2 ip 192.168.10.110 255.255.255.0
switch [subnet2: master] (config) #

To create and add systems to a subnet:

  1. Log into the system from which you intend to create the subnet.

  2. Enter config mode. Run:

    switch [standalone: master] >
    switch [standalone: master] > enable
    switch [standalone: master] # configure terminal

  3. Create a new subnet using the command “ib ha <IB_subnet_ID> ip <ip_addr> <netmask>”. Run:

    switch [standalone: master] (config) # ib ha subnet2 ip 192.168.10.110 255.255.255.0
    switch [subnet2: master] (config) #

    Warning

    You must run the “ib ha <IB_subnet_ID> ip <ip_addr> <netmask>” command only once per subnet ID.

  4. Log into the system that you are going to join to the newly created subnet.

  5. Join the system to the subnet, using the “ib ha <IB_subnet_ID>” command. Run:

    switch [standalone: master] (config) # ib ha subnet2
    switch [subnet2: standby] (config) #
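The procedure above can be summarized as a per-switch command plan: the create command (with the VIP) is issued once, on the first system, and every other system only joins. The following minimal Python sketch shows that ordering. The function name build_plan and the switch names in the example are hypothetical, and the sketch only produces the command lists; sending them to the switches is left out.

```python
def build_plan(subnet_id, vip, netmask, switches):
    """Return a list of (switch, commands) pairs for creating and joining a subnet.

    The first switch creates the subnet ID together with the VIP; the create
    command must run only once per subnet ID, so all other switches just join.
    """
    if not switches:
        raise ValueError("at least one switch is required")
    enter_config = ["enable", "configure terminal"]
    first, *others = switches
    plan = [(first, enter_config + [f"ib ha {subnet_id} ip {vip} {netmask}"])]
    for switch in others:
        plan.append((switch, enter_config + [f"ib ha {subnet_id}"]))
    return plan


if __name__ == "__main__":
    # "switch-a" and "switch-b" are placeholder host names.
    for switch, commands in build_plan(
        "subnet2", "192.168.10.110", "255.255.255.0", ["switch-a", "switch-b"]
    ):
        print(switch)
        for command in commands:
            print("   ", command)
```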

If the SM configuration becomes corrupted, or the subnet manager cannot raise any logical links, it is suggested that you restore the default SM configuration.

To restore subnet manager configuration:

  1. Enter config mode. Run:

    *switch [subnet2: master] > enable
    *switch [subnet2: master] # configure terminal
    *switch [subnet2: master] (config) #

  2. Restore the default SM configuration using the command “ib sm reset-config”. Run:

    *switch [subnet2: master] (config) # ib sm reset-config

    Warning

    The asterisk in the example above (*switch) indicates the local system from where the command is running.

To get information on the running state of a specific node, run one of the following commands with the relevant parameter:

  • show ib smnode <name> sm-running

  • show ib smnode <name> sm-state

  • show ib smnode <name> sm-priority

  • show ib smnode <name> active

  • show ib smnode <name> ha-state

  • show ib smnode <name> ha-role
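A monitoring script might issue all of these queries for a node and check the HA answers against the values documented in the “show ib smnode” command reference. The following is a minimal Python sketch under that assumption. The names (query_cmd, all_queries, is_documented_value) are hypothetical, and the sketch also includes the “ip” parameter from the command syntax.

```python
# Parameters accepted by "show ib smnode <hostname> ...".
FIELDS = ("active", "ha-role", "ha-state", "ip",
          "sm-priority", "sm-running", "sm-state")

# Return values documented for the two HA parameters.
HA_ROLES = {"offline", "unknown", "master", "standby", "disabled"}
HA_STATES = {"offline", "init", "searching", "joining", "online", "creating",
             "waiting", "leaving", "join-sync", "failed", "removed", "regroup"}


def query_cmd(node: str, field: str) -> str:
    """Build one 'show ib smnode' query for the given node and parameter."""
    if field not in FIELDS:
        raise ValueError(f"unsupported parameter: {field}")
    return f"show ib smnode {node} {field}"


def all_queries(node: str) -> list:
    """Build every supported query for the given node."""
    return [query_cmd(node, field) for field in FIELDS]


def is_documented_value(field: str, value: str) -> bool:
    """Check an answer against the documented set; other parameters always pass."""
    allowed = {"ha-role": HA_ROLES, "ha-state": HA_STATES}.get(field)
    return allowed is None or value.strip().lower() in allowed


if __name__ == "__main__":
    for command in all_queries("my-hostname"):
        print(command)
```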

Subnet Manager Configuration

To configure the subnet manager, log into the centralized management IP (VIP). Once the SM configuration is created, the SM database is duplicated to the other nodes.

Warning

The SM must be configured from MLNX-OS centralized management IP (VIP). All the configurations that are not created or modified in the master node (using the VIP) are overridden by the master configuration.

The user can configure different SM parameters such as where to run the SM(s) or the SM priority by running the commands according to the desired action.

NVIDIA High Availability and OpenSM Handover/Failover

Warning

NVIDIA products are fully compliant and interoperable with OpenSM.

Once an SM fails, the SM which takes over the subnet needs to reproduce the internal state of the failed master. Most of the information required is obtained by scanning the subnet and extracting the information from the devices. However, some information which is not stored directly in the network devices cannot be reproduced this way. InfiniBand management architecture limits such information to data exchanged between clients (either user-level programs or kernel modules) and the Subnet Administration (SA) service (attached to the SM). The SA keeps this set of client registrations in an internal data structure called SA-DB. The SA-DB information includes the multicast groups, the multicast group members, subscriptions for event forwarding and service records.

The new SM can rebuild the SA-DB either by requesting the clients to re-register with the SA or by obtaining a copy of the previous master SM's internal SA-DB through an SA-DB dump file. Client re-registration ensures database correctness, while SA-DB dump file replication shortens setup time. Client re-registration is still required because the replicated SA-DB may not be up to date with the registrations held by the master SM.

Furthermore, since the SM does not maintain SA-DB information for unknown nodes, it is quite possible that some of the SA-DB information relating to nodes momentarily disconnected from the master SM is purged. These nodes must therefore re-register with the new SM when they reconnect (they receive a client re-register request from the SM). Relying only on client re-registration is also suboptimal, because recreating the entire SA-DB and the network state takes time.

NVIDIA SM HA replicates the SA-DB dump file from the current master SM to all the standby SMs running on NVIDIA switches. The SA-DB dump file replication provides further optimization to the standby SM that becomes master.

The standby SM loads the SA-DB file that the old master used. Using the existing SA-DB reduces the processing needed for client re-registration, which shortens the time required to finish setting up the network.
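The interplay between the replicated dump and client re-registration can be illustrated with a toy model. The Python sketch below is not OpenSM's SA-DB format or algorithm: it assumes the database is just a mapping from multicast group to member nodes, serialized as JSON, and it counts how many updates re-registration causes when the new master starts from a slightly stale dump versus from an empty database. All names and sample data are invented for illustration.

```python
import json


def dump_sa_db(sa_db: dict) -> str:
    """Serialize the toy SA-DB (group -> sorted member list) to a dump string."""
    return json.dumps(sa_db, sort_keys=True)


def load_sa_db(dump: str) -> dict:
    """Load a toy SA-DB from a dump string."""
    return json.loads(dump)


def apply_reregistration(sa_db: dict, node: str, groups: list) -> int:
    """Make `node` a member of exactly `groups`; return the number of updates."""
    changes = 0
    for group in groups:
        members = sa_db.setdefault(group, [])
        if node not in members:
            members.append(node)
            members.sort()
            changes += 1
    for group, members in sa_db.items():
        if group not in groups and node in members:
            members.remove(node)
            changes += 1
    return changes


# State replicated from the old master before node "n3" joined "mgid-b".
stale_dump = dump_sa_db({"mgid-a": ["n1", "n2"], "mgid-b": ["n2"]})

# What the clients report when asked to re-register after failover.
client_view = {"n1": ["mgid-a"], "n2": ["mgid-a", "mgid-b"], "n3": ["mgid-b"]}

from_dump = load_sa_db(stale_dump)
updates_with_dump = sum(
    apply_reregistration(from_dump, node, groups)
    for node, groups in client_view.items()
)

from_scratch = {}
updates_without_dump = sum(
    apply_reregistration(from_scratch, node, groups)
    for node, groups in client_view.items()
)

if __name__ == "__main__":
    print("updates with dump:", updates_with_dump)
    print("updates without dump:", updates_without_dump)
    print("same final state:", from_dump == from_scratch)
```

In this model both paths converge to the same final state, which reflects why re-registration is still needed for correctness, while the dump reduces the amount of work re-registration has to do.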

Warning

SM HA does not replace the InfiniBand specification requirement for client re-registration.

Warning

When running an SM HA cluster with more than 2 active OpenSM instances, IB multicast applications need to support client re-register or they may not work correctly after OpenSM failover.

ib ha

ib ha <IB_subnet_ID> [ip <IP address> <subnet mask> [force]]
no ib ha

Creates an InfiniBand subnet ID <IB_subnet_ID> with the specified IP. When run without the ip parameter, joins the node to the existing <IB_subnet_ID>.
The no form of the command removes this node from an InfiniBand subnet ID.

Syntax Description

IB subnet ID

Simple group name for shared IB config

ip <IP address>

Assigns management IP address

netmask

Netmask (e.g. 255.255.255.0 or /24)

force

Joins if exists or creates if not

Default

N/A

Configuration Mode

config

History

3.1.0000

Example

switch (config) # ib ha my-subnet

Related Commands

show ib ha

Notes

A new subnet may be joined only after leaving the current one

ib smnode

ib smnode <hostname> [create | disable | enable | sm-priority <priority>]
no ib smnode <hostname> [create | disable | enable | sm-priority]

Manages HA SM.
The no form of the command removes HA SM node configuration.

Syntax Description

hostname

Specifies <hostname> SM configuration to modify.

create

Creates SM configuration for selected node.

disable

Makes SM inactive on selected node.

enable

Makes SM active on selected node.

sm-priority <priority>

Sets SM selected node priority (0=low, 15=high).

Default

N/A

Configuration Mode

config

History

3.1.0000

Example

switch (config) # ib smnode switch-1133ce create

Related Commands

show ib smnode
show ib smnodes

Notes

show ib smnode

show ib smnode <hostname> {active | ha-role | ha-state | ip | sm-priority | sm-running | sm-state}

Displays SM High availability information.

Syntax Description

hostname

Specifies <hostname> SM configuration to display

active

Displays whether <hostname> is currently active

ha-role

Displays the High Availability role of <hostname>. Possible return values are: offline, unknown, master, standby, or disabled

ha-state

Possible return values are: offline, init, searching, joining, online, creating, waiting, leaving, join-sync, failed, removed, or regroup

ip

Displays the local management IP address associated with the active node, <hostname>. If <hostname> is not active, the command displays “offline”

sm-priority

Displays the SM priority for SM running on <hostname>

sm-running

Displays if <hostname> has an SM running. The command will display “active” (that is, SM is running) only if <hostname> is currently active, has a license, is enabled as a potential SM, is active as SM, and if there is a maximum of 2 SMs in the fabric.

sm-state

Displays if SM is enabled to run on <hostname>

Default

N/A

Configuration Mode

config

History

3.1.0000

3.8.1000

Updated Syntax Description

Example

switch (config) # show ib smnode my-hostname sm-state
enabled

Related Commands

show ib smnodes

Notes

show ib smnodes

show ib smnodes [brief]

Displays SM High availability information.

Syntax Description

brief

Displays information on all HA nodes

Default

N/A

Configuration Mode

config

History

3.1.0000

3.8.1000

Updated example

3.9.3100

Updated output to reflect the OpenSM master also when the command is triggered from non-SM master

Example

switch (config) # show ib smnodes

HA state of switch infiniband-default:
IB Subnet HA name: Mantaray142
HA IP address : 10.7.145.141/24
Active HA nodes : 2

HA node local information:
Name : Mantaray142 (active) <--- (local node)
SM-HA state: standby
SM Running : stopped
SM Enabled : enabled - master
SM Priority: 0
IP : 10.7.144.142

HA node local information:
Name : Mantaray141 (active)
SM-HA state: master
SM Running : stopped
SM Enabled : disabled
SM Priority: 0
IP : 10.7.144.141
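Output in this form is easy to turn into structured data for monitoring. The following is a minimal Python sketch that parses the node blocks of the example above. It assumes the output keeps the “key : value” layout shown here, which may differ between MLNX-OS versions, and the function name parse_smnodes is hypothetical.

```python
SAMPLE = """\
HA state of switch infiniband-default:
IB Subnet HA name: Mantaray142
HA IP address : 10.7.145.141/24
Active HA nodes : 2

HA node local information:
Name : Mantaray142 (active) <--- (local node)
SM-HA state: standby
SM Running : stopped
SM Enabled : enabled - master
SM Priority: 0
IP : 10.7.144.142

HA node local information:
Name : Mantaray141 (active)
SM-HA state: master
SM Running : stopped
SM Enabled : disabled
SM Priority: 0
IP : 10.7.144.141
"""


def parse_smnodes(output: str) -> list:
    """Parse the per-node blocks into a list of dictionaries."""
    nodes = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("HA node local information"):
            current = {}  # a new node block starts here
            nodes.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    for node in nodes:
        raw_name = node.get("Name", "")
        node["local"] = "(local node)" in raw_name
        node["active"] = "(active)" in raw_name
        node["Name"] = raw_name.split()[0] if raw_name else ""
    return nodes


if __name__ == "__main__":
    for node in parse_smnodes(SAMPLE):
        print(node["Name"], node["SM-HA state"], node["IP"], node["local"])
```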

Related Commands

Notes

show ib ha

show ib ha [brief]

Displays information about all the systems that are active or might be able to run SM.

Syntax Description

brief

Displays brief HA information

Default

N/A

Configuration Mode

config

History

3.1.0000

3.9.1000

Updated example

Example

switch (config) # show ib ha
Global HA state:
IB Subnet HA name: Barracuda-s
HA IP address : 10.7.48.100/24
Active HA nodes : 2
HA node local information:
   Name                     : barracuda-216 (active)  <--- (local node)
   SM-HA state              : standby
   IP                       : 10.7.48.50
   Virtual switch membership: infiniband-default

HA node local information:
   Name                     : barracuda-217 (not active)
   IP                       : offline
   Virtual switch membership: infiniband-default

HA node local information:
   Name                     : scorpionib2-19 (active)
   SM-HA state              : master
   IP                       : 10.7.51.169
   Virtual switch membership: infiniband-default

switch (config) # show ib ha brief
Global HA state:
IB Subnet HA name: Barracuda-s
HA IP address : 10.7.48.100/24
Active HA nodes : 3
-----------------------------------------------------------------------------------------
ID                 Local node   SM-HA state   IP             Virtual switch membership
-----------------------------------------------------------------------------------------
barracuda-216      *            standby       10.7.48.50     infiniband-default
barracuda-217                   standby       10.7.48.51     infiniband-default
scorpionib2-19                  master        10.7.51.169    infiniband-default

Related Commands

Notes

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.