Initial Configuration
After installing the UFM server software, before running UFM, perform the following:
Optional: Quality of Service.
Optional: Activate and Enable Lossy Configuration Manager (Advanced License Only).
Optional: Activate and Enable Congestion Control Manager (Advanced License Only)
Configure general settings in the conf/gv.cfg file.
When running UFM in HA mode, the gv.cfg file is replicated to the standby server.
Configuring Fabric Interface
Set fabric_interface to one of the IPoIB interfaces that connect the UFM/SM to the InfiniBand fabric:
fabric_interface = ib0
By default, fabric_interface is set to ib0.
fabric_interface must be up and running before UFM starts; otherwise, UFM cannot run.
fabric_interface must be configured with a valid IPv4 address before UFM starts; otherwise, UFM cannot run.
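The two startup requirements above can be checked before starting UFM. The following is a minimal Python sketch, assuming a Linux host: it reads the link state from /sys/class/net/<iface>/operstate and queries the IPv4 address with the SIOCGIFADDR ioctl. The function names are hypothetical; this is not a UFM utility.

```python
import fcntl
import socket
import struct
from pathlib import Path
from typing import Optional

SIOCGIFADDR = 0x8915  # Linux ioctl request code: get an interface's IPv4 address


def interface_is_up(iface: str, sys_net: str = "/sys/class/net") -> bool:
    """True if the kernel reports the interface's operstate as 'up'."""
    try:
        return (Path(sys_net) / iface / "operstate").read_text().strip() == "up"
    except OSError:
        return False  # interface does not exist


def interface_ipv4(iface: str) -> Optional[str]:
    """The interface's IPv4 address, or None if it has none (or does not exist)."""
    ifreq = struct.pack("256s", iface[:15].encode())
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            result = fcntl.ioctl(s.fileno(), SIOCGIFADDR, ifreq)
        except OSError:
            return None
    # struct ifreq: 16-byte name, then sockaddr_in; the IPv4 address starts at byte 20
    return socket.inet_ntoa(result[20:24])


def fabric_interface_ready(iface: str = "ib0") -> bool:
    """Both startup conditions for fabric_interface must hold."""
    return interface_is_up(iface) and interface_ipv4(iface) is not None


if __name__ == "__main__":
    print("ib0 ready:", fabric_interface_ready("ib0"))
```

A False result for the configured fabric_interface indicates a condition under which UFM would not be able to run.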
Running UFM in Monitoring Mode
monitoring_mode = yes
For more information, see Running the UFM Software in Monitoring Mode.
Enabling Predefined Groups
enable_predefined_groups = true
By default, pre-defined groups are enabled. In very large-scale fabrics, pre-defined groups can be disabled in order to allow faster startup of UFM.
Enabling Multi-NIC Host Grouping
multinic_host_enabled = true
On a first installation of UFM 6.4.1 and above, multi-NIC host grouping is enabled by default. When upgrading from an older version, however, the feature is disabled.
It is recommended to set the value of this parameter before running UFM for the first time.
Running UFM SM Only (UFM HA with additional SMs)
management_mode = sm_only
Running UFM Over IPv6 Network Protocol
The default multicast address is an IPv4 address. To run over IPv6, change it in the [UFMAgent] section of gv.cfg as follows:
[UFMAgent]
...
# if ufmagent works in ipv6 please set this multicast address to FF05:0:0:0:0:0:0:15F
mcast_addr = FF05:0:0:0:0:0:0:15F
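A small validation sketch for the [UFMAgent] change above, using Python's standard ipaddress module. The helper check_mcast_addr is hypothetical; it only confirms that mcast_addr is a multicast address of the IP family the network uses.

```python
import ipaddress

# IPv6 multicast address for [UFMAgent] mcast_addr, as given in this guide
IPV6_MCAST_ADDR = "FF05:0:0:0:0:0:0:15F"


def check_mcast_addr(mcast_addr: str, network_is_ipv6: bool) -> None:
    """Raise ValueError unless mcast_addr is multicast and of the expected IP family."""
    addr = ipaddress.ip_address(mcast_addr)  # also raises ValueError if malformed
    if not addr.is_multicast:
        raise ValueError(f"{mcast_addr} is not a multicast address")
    expected = 6 if network_is_ipv6 else 4
    if addr.version != expected:
        raise ValueError(f"{mcast_addr} is IPv{addr.version}; expected IPv{expected}")


check_mcast_addr(IPV6_MCAST_ADDR, network_is_ipv6=True)  # passes silently
print(ipaddress.ip_address(IPV6_MCAST_ADDR))  # prints ff05::15f
```

The FF05 prefix denotes site-local multicast scope; ff05::15f is the compressed form of the same address.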
Adding SM Plugin (e.g. lossymgr) to event_plugin_name Option
# Event plugin name(s)
event_plugin_name osmufmpi lossymgr
Add the plug-in options file to the event_plugin_options option:
# Options string that would be passed to the plugin(s)
event_plugin_options --lossy_mgr -f <lossy-mgr-options-file-name>
These plug-in parameters are copied to the opensm.conf file in Management mode only.
Enabling SHARP Aggregation Manager
SHARP Aggregation Manager is disabled by default. To enable it, set:
[Sharp]
sharp_enabled = true
Upon startup of UFM or SHARP Aggregation Manager, UFM resends all existing tenant allocations to SHARP AM.
Multi-port SM
SM can use up to eight port interfaces for fabric configuration. These interfaces are specified in /opt/ufm/conf/gv.cfg: users can list multiple IPoIB interfaces or bond interfaces, and UFM translates them to GUIDs and adds them to the SM configuration file (/opt/ufm/conf/opensm/opensm.conf). If more than eight interfaces are specified, the extra interfaces are ignored.
[Server]
# True/false flag to configure OpenSM with multiple GUIDs
enable_multi_port_sm = false
# When enabling multi_port_sm, specify here the additional fabric interfaces for opensm.conf
# Example: ib1,ib2,ib5 (OpenSM supports the first 8 GUIDs, where the first GUID is extracted
# from the fabric_interface field and the remaining GUIDs are taken from the
# additional_fabric_interfaces field.)
additional_fabric_interfaces =
UFM treats a bond as a group of IPoIB interfaces. For example, if bond0 consists of the interfaces ib4 and ib8, expect to see the GUIDs of ib4 and ib8 in opensm.conf.
Duplicate interface names are ignored (e.g. ib1,ib1,ib1,ib2,ib1 = ib1,ib2).
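The interface-selection rules above (fabric_interface first, then additional_fabric_interfaces, bonds expanded to their member IPoIB interfaces, duplicates dropped, at most eight entries) can be sketched in Python as follows. The bond-membership mapping is supplied by the caller here; on Linux it could be read from /sys/class/net/<bond>/bonding/slaves. Translation to GUIDs is omitted, and resolve_sm_interfaces is a hypothetical name, not UFM's implementation.

```python
from typing import Dict, List

MAX_SM_PORTS = 8  # OpenSM uses at most eight port GUIDs


def resolve_sm_interfaces(fabric_interface: str,
                          additional_fabric_interfaces: str,
                          bonds: Dict[str, List[str]]) -> List[str]:
    """Ordered IPoIB interfaces whose GUIDs would go into opensm.conf.

    bonds maps a bond name to its member IPoIB interfaces (caller-supplied).
    """
    names = [fabric_interface]  # the first GUID comes from fabric_interface
    names += [n.strip() for n in additional_fabric_interfaces.split(",") if n.strip()]
    result: List[str] = []
    for name in names:
        for member in bonds.get(name, [name]):  # a bond expands to its members
            if member not in result:  # repeated names are ignored
                result.append(member)
    return result[:MAX_SM_PORTS]  # anything past the eighth entry is ignored


print(resolve_sm_interfaces("ib0", "ib1,ib1,ib1,ib2,ib1", {}))  # ['ib0', 'ib1', 'ib2']
print(resolve_sm_interfaces("bond0", "", {"bond0": ["ib4", "ib8"]}))  # ['ib4', 'ib8']
```

This sketch deduplicates after bond expansion, so an interface listed both directly and as a bond member also appears only once.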
Configuring UDP Buffer
This section is relevant only in cases where telemetry_provider=ibpm. (By default, telemetry_provider=telemetry).
To work with large-scale fabrics, set the set_udp_buffer flag in the [IBPM] section to "yes" so that UFM sets the UDP buffer size (default is "no").
[IBPM]
# By default, UFM does not set the UDP buffer size. For large-scale fabrics
# it is recommended to increase the buffer size to 4 MB (4194304 bytes).
set_udp_buffer = yes
# UDP buffer size
udp_buffer_size = 4194304
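The recommended value is 4 MB expressed in bytes (4 * 1024 * 1024 = 4194304). The sketch below, assuming a Linux host, compares that target with the kernel's net.core.rmem_max limit, which caps the receive buffer a process can request with SO_RCVBUF. How UFM itself applies udp_buffer_size is not described here, so treat these helpers as hypothetical diagnostics.

```python
from pathlib import Path
from typing import Optional

RECOMMENDED_UDP_BUFFER = 4 * 1024 * 1024  # 4 MB in bytes = 4194304


def current_rmem_max(path: str = "/proc/sys/net/core/rmem_max") -> Optional[int]:
    """Kernel cap (bytes) on socket receive buffers, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def udp_buffer_too_small(current: Optional[int],
                         wanted: int = RECOMMENDED_UDP_BUFFER) -> bool:
    """True when the kernel limit is known and below the wanted size."""
    return current is not None and current < wanted


if __name__ == "__main__":
    limit = current_rmem_max()
    print("rmem_max:", limit, "below 4 MB:", udp_buffer_too_small(limit))
```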
Virtualization
The [Virtualization] section enables support for virtual ports in UFM.
[Virtualization]
# By enabling this flag, UFM will discover all the virtual ports assigned for all hypervisors in the fabric
enable = false
# Interval for checking whether any virtual ports were changed in the fabric
interval = 60
Static SM LID
Users may configure a specific value for the SM LID so that the UFM SM uses it upon UFM startup.
[SubnetManager]
# 1- Zero value (Default): Disable static SM LID functionality and allow the SM to run with any LID.
# Example: sm_lid=0
# 2- Non-zero value: Enable static SM LID functionality so SM will use this LID upon UFM startup.
sm_lid=0
To configure an external SM (UFM server running in sm_only mode), users must manually configure the opensm.conf file (/opt/ufm/conf/opensm/opensm.conf) and align the value of master_sm_lid to the value used for sm_lid in gv.cfg on the main UFM server.
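A hedged sketch of the two rules above: a non-zero sm_lid must be a valid unicast LID (the InfiniBand unicast range is 0x0001 to 0xBFFF), and on an sm_only server master_sm_lid in opensm.conf must match sm_lid on the main UFM. It assumes opensm.conf uses one "name value" pair per line; the helper names are hypothetical.

```python
from typing import Optional

MAX_UNICAST_LID = 0xBFFF  # InfiniBand unicast LID range is 0x0001-0xBFFF


def static_sm_lid_enabled(sm_lid: int) -> bool:
    """0 disables static SM LID; any other value must be a unicast LID."""
    if sm_lid == 0:
        return False
    if not 1 <= sm_lid <= MAX_UNICAST_LID:
        raise ValueError(f"sm_lid={sm_lid} is outside the unicast LID range")
    return True


def opensm_value(opensm_conf: str, key: str) -> Optional[str]:
    """Value of a 'key value' line in opensm.conf text; comment lines start with '#'."""
    for line in opensm_conf.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == key:
            return parts[1]
    return None


def external_sm_aligned(main_sm_lid: int, external_opensm_conf: str) -> bool:
    """True if master_sm_lid on the sm_only server matches sm_lid on the main UFM."""
    value = opensm_value(external_opensm_conf, "master_sm_lid")
    return value is not None and int(value, 0) == main_sm_lid  # accepts 5 or 0x5
```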
Maximum Live Telemetry Sessions
In the gv.cfg configuration file, it is possible to set a limit on the number of live telemetry sessions running in parallel using the field max_live_sessions.
[Telemetry]
# max parallel user live sessions
max_live_sessions=3
# UFM’s provider of telemetry (counters). Possible values: telemetry, ibpm
telemetry_provider=telemetry
Configuring Log Rotation
This section sets up the log file rotation policy. By default, log rotation is run once a day by the cron scheduler.
[logrotate]
#max_files specifies the number of times to rotate a file before it is deleted (this definition will be applied to
#SM and SHARP Aggregation Manager logs, running in the scope of UFM).
#A count of 0 (zero) means no copies are retained. A count of 15 means fifteen copies are retained (default is 15)
max_files = 15
#With max_size, the log file is rotated when the specified size is reached (this definition will be applied to
#SM and SHARP Aggregation Manager logs, running in the scope of UFM). Size may be specified in bytes (default),
#kilobytes (for example: 100k), or megabytes (for example: 10M). If not specified, logs are rotated once a day.
max_size = 3
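The max_size comment above allows plain bytes or k/M suffixes. A minimal conversion sketch, assuming 1024-based multipliers as in logrotate's size directive (how UFM passes max_size through is an assumption, and max_size_in_bytes is a hypothetical helper):

```python
# Multipliers for the suffixes named in the comment; 1024-based (assumption).
SUFFIXES = {"k": 1024, "M": 1024 * 1024}


def max_size_in_bytes(value: str) -> int:
    """Convert a max_size value such as '3', '100k', or '10M' to bytes."""
    value = value.strip()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)  # no suffix: the value is already in bytes


for v in ("3", "100k", "10M"):
    print(v, "->", max_size_in_bytes(v))
```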
Configuration Examples in gv.cfg
The following show examples of configuration settings in the gv.cfg file:
Polling interval for Fabric Dashboard information
ui_polling_interval = 30
[Optional] UFM Server local IP address resolution (by default, the UFM resolves the address by gethostip). UFM Web UI should have access to this address.
ws_address = <specific IP address>
HTTP/HTTPS Port Configuration
# WebServices Protocol (http/https) and Port
ws_port = 8088
ws_protocol = http
Connection (port and protocol) between the UFM server and the Apache server
ws_protocol = <http or https>
ws_port = <port number>
For more information, see Launching a UFM Web UI Session.
SNMP get-community string for switches (fabric wide or per switch)
# default snmp access point for all devices
[SNMP]
port = 161
gcommunity = public
Enhanced Event Management (Alarmed Devices Group)
[Server]
auto_remove_from_alerted = yes
Log verbosity
[Logging]
# optional logging levels: CRITICAL, ERROR, WARNING, INFO, DEBUG
level = INFO
For more information, see "UFM Logs".
Settings for saving port counters to a CSV file
[CSV]
write_interval = 60
ext_ports_only = no
For more information, see "Saving the Port Counters to a CSV File".
Max number of CSV files (UFM Advanced)
[CSV]
max_files = 1
For more information, see "Saving Periodic Snapshots of the Fabric (Advanced License Only)".
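The examples above can be read with Python's configparser. This is a sketch under the assumption that gv.cfg parses as a standard INI file; only the sections the examples name are included, and read_settings is a hypothetical helper. configparser's getboolean accepts yes/no, which matches the values shown.

```python
import configparser

# Fragment assembled from the examples above (only sections the examples name).
GV_CFG_FRAGMENT = """
[Server]
auto_remove_from_alerted = yes

[SNMP]
# default snmp access point for all devices
port = 161
gcommunity = public

[Logging]
level = INFO

[CSV]
write_interval = 60
ext_ports_only = no
max_files = 1
"""


def read_settings(text: str) -> dict:
    # interpolation=None so '%' characters in values are taken literally
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    return {
        "auto_remove_from_alerted": cfg.getboolean("Server", "auto_remove_from_alerted"),
        "snmp_port": cfg.getint("SNMP", "port"),
        "snmp_gcommunity": cfg.get("SNMP", "gcommunity"),
        "log_level": cfg.get("Logging", "level"),
        "csv_write_interval": cfg.getint("CSV", "write_interval"),
        "csv_ext_ports_only": cfg.getboolean("CSV", "ext_ports_only"),
        "csv_max_files": cfg.getint("CSV", "max_files"),
    }


print(read_settings(GV_CFG_FRAGMENT))
```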
Warning: The access credentials that are defined in the following sections of the conf/gv.cfg file are used only for initialization:
SSH_Server
SSH_Switch
TELNET
IPMI
SNMP
MLNX_OS
To modify these access credentials, use the UFM Web UI. For more information, see "Device Access".
The UFM communication protocol with MLNX-OS switches is configurable. The available protocols are:
http
https (default protocol for secure communication)
For configuring the UFM communication protocol after fresh installation and prior to the first run, set the MLNX-OS protocol as shown below.
Example:
[MLNX_OS]
protocol = https
port = 443
Once UFM is started, all UFM communication with MLNX-OS switches will take place via the configured protocol.
To change the UFM communication protocol while UFM is running, perform the following:
Set the desired protocol of MLNX-OS in the conf/gv.cfg file (as shown in the example above).
Restart UFM.
Update the MLNX-OS global access credentials configuration with the relevant protocol port. Refer to "Device Access" for help.
For the http protocol, the default port is 80.
For the https protocol, the default port is 443.
Update the MLNX-OS access credentials with the relevant port in all managed switches that have a valid IP address.
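A sketch that reads the [MLNX_OS] settings and flags a common mistake after switching protocols: a port left at the other protocol's default. It assumes gv.cfg parses as an INI file; the fallback to https mirrors the default protocol stated above, and the helper names are hypothetical.

```python
import configparser
from typing import Tuple

DEFAULT_MLNX_OS_PORTS = {"http": 80, "https": 443}


def mlnx_os_endpoint(gv_cfg_text: str) -> Tuple[str, int]:
    """(protocol, port) from [MLNX_OS]; missing values fall back to https/defaults."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(gv_cfg_text)
    protocol = cfg.get("MLNX_OS", "protocol", fallback="https").strip().lower()
    if protocol not in DEFAULT_MLNX_OS_PORTS:
        raise ValueError(f"unsupported MLNX-OS protocol: {protocol}")
    port = cfg.getint("MLNX_OS", "port", fallback=DEFAULT_MLNX_OS_PORTS[protocol])
    return protocol, port


def port_looks_mismatched(protocol: str, port: int) -> bool:
    """Flag a port that is the other protocol's default, e.g. http on 443."""
    return port in DEFAULT_MLNX_OS_PORTS.values() and port != DEFAULT_MLNX_OS_PORTS[protocol]


print(mlnx_os_endpoint("[MLNX_OS]\nprotocol = https\nport = 443\n"))
```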
SM Trap Handler Configuration
The SM Trap Handler is the SOAP server that handles traps coming from OpenSM.
There are two configuration values related to this service:
osm_traps_debounce_interval – defines the period during which the service holds incoming traps
osm_traps_throttle_val – once osm_traps_debounce_interval elapses, the service transfers up to osm_traps_throttle_val traps to the Model Main
By default, the SM Trap Handler handles up to 1000 SM traps every 10 seconds.
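The debounce/throttle behavior can be modelled as below. This is a conceptual Python sketch, not UFM's SM Trap Handler: traps are held until osm_traps_debounce_interval seconds have passed since the hold window opened, then at most osm_traps_throttle_val of them are forwarded. The defaults (10 seconds, 1000 traps) follow the stated default behavior; keeping the excess for the next interval is an assumption, since the text does not say whether extra traps are queued or dropped.

```python
from collections import deque
from typing import Any, List, Optional


class TrapThrottle:
    """Hold traps for debounce_interval seconds, then forward at most throttle_val."""

    def __init__(self, debounce_interval: float = 10.0, throttle_val: int = 1000):
        self.debounce_interval = debounce_interval
        self.throttle_val = throttle_val
        self.pending: deque = deque()
        self.window_start: Optional[float] = None

    def receive(self, trap: Any, now: float) -> None:
        if self.window_start is None:  # the first trap opens a hold window
            self.window_start = now
        self.pending.append(trap)

    def poll(self, now: float) -> List[Any]:
        """Traps to forward at time `now` (empty while still holding)."""
        if self.window_start is None or now - self.window_start < self.debounce_interval:
            return []
        count = min(self.throttle_val, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        # Assumption: leftover traps wait for the next interval instead of being dropped.
        self.window_start = now if self.pending else None
        return batch
```

With 1500 traps arriving at once, this model forwards 1000 after 10 seconds and the remaining 500 after another 10 seconds.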
InfiniBand Quality of Service (QoS) is disabled by default in the UFM SM configuration file.
To enable it, set the qos parameter to TRUE in the /opt/ufm/files/conf/opensm/opensm.conf file.
Review the QoS parameter settings carefully before enabling the qos flag. In particular, the SL2VL and VL arbitration mappings must be correctly defined.
For information on Enhanced QoS, see Appendix – SM Activity Report.
You can configure UFM to fail over the UFM subnet manager (SM) to another InfiniBand port on the UFM server connected to the fabric. When failure is detected on an InfiniBand port or link, failover occurs without stopping the UFM Server or other related UFM services, such as mysql, http, DRBD, and so on.
When the UFM Server is connected to the fabric by two or more InfiniBand ports, you can configure UFM Subnet Manager failover to one of the other ports. This failover process prevents failure in a standalone setup and preempts failover in a High Availability setup, thereby saving downtime and recovery time.
Network Configuration for Failover to IB Port
UFM SM failover is not relevant for Monitoring mode, because in this mode, UFM must be connected to the fabric over ib0 only.
To enable UFM failover to another port:
Configure bonding between the InfiniBand interfaces to be used for SM failover. In an HA setup, the UFM active server and the UFM standby server can be connected differently; but the bond name must be the same on both servers.
Set the value of fabric_interface to the bond name using the /opt/ufm/scripts/change_fabric_config.sh command, as described in Configuring General Settings in gv.cfg. If ufma_interface is configured for IPoIB, set it to the bond name as well. These changes take effect only after a UFM restart. For example, if bond0 is configured on the ib0 and ib1 interfaces, set the fabric_interface parameter in gv.cfg to bond0.
If IPoIB is used for UFM Agent, add the bond to the ufma_interfaces list as well.
When failure is detected on an InfiniBand port or link, UFM initiates the give-up operation that is defined in the Health configuration file for OpenSM failure. By default:
UFM discovers the other ports in the specified bond and fails over to the first interface that is up (SM failover)
If no interface is up:
In an HA setup, UFM initiates UFM failover
In a standalone setup, UFM does nothing
If the failed link becomes active again, UFM selects this link for the SM only after an SM restart.
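The default give-up logic above reduces to a short decision function. A hypothetical Python sketch, where the caller supplies the bond members in bond order and each member's link state; it models the documented outcomes only, not how UFM detects them.

```python
from typing import Dict, List, Optional, Tuple


def choose_failover(bond_members: List[str], link_up: Dict[str, bool],
                    ha_setup: bool) -> Tuple[str, Optional[str]]:
    """Outcome after an SM port/link failure under the default behavior.

    ("sm_failover", iface): first bond member that is up takes over the SM
    ("ufm_failover", None): no member is up, HA setup
    ("none", None):         no member is up, standalone setup
    """
    for iface in bond_members:  # bond order decides which interface is "first"
        if link_up.get(iface, False):
            return "sm_failover", iface
    if ha_setup:
        return "ufm_failover", None
    return "none", None


print(choose_failover(["ib0", "ib1"], {"ib0": False, "ib1": True}, ha_setup=True))
```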
You can run UFM in HA mode with additional external UFM Subnet Managers. This mode:
Provides additional Subnet Managers for failover.
Enables UFM upgrade without fabric downtime.
While the main UFM Server is running, it synchronizes the configuration files on all the external UFM-SMs. If the main UFM Server fails (or stops for maintenance operations), an External SM takes mastership and manages the fabric until the main UFM Server resumes operation.
The External UFM-SM is responsible for identifying a situation where it does not receive configuration updates while the main UFM-SM is still active. In this case, one of the following occurs:
The priority of the SM is reduced to 0 (default)
or
The SM is stopped if configured: stop_disconnected_sm = yes (see configuration section).
Configuration files should be modified only on the main UFM Server and only while the main UFM Server is operational.
UFM HA with Additional External UFMs Installation Prerequisites
Before you install the UFM HA with Additional External UFMs, ensure that the following requirements are met:
Provide a list of the remote UFM-SM IP addresses (one IP per line) in /opt/ufm/files/conf/external_sm.conf.
Define SSH trust between the UFM hosts and the hosts running remote UFM-SMs: password-less SSH must work between the UFM HA servers and every External UFM-SM host (2 x N) listed in /opt/ufm/files/conf/external_sm.conf.
Installing UFM with External UFM-SMs
The main UFM Server can be installed in Standalone or High Availability mode. An External UFM-SM requires installation of the entire UFM package in Standalone mode. For installing an External UFM-SM, see Installing the UFM Server Software as Standalone. All External UFM-SMs must have the same version as the main UFM-SM.
Configuring UFM HA on Main UFM
The following configuration settings are changed when configuring UFM HA on the Main UFM:
Set management mode
In /opt/ufm/files/conf/gv.cfg (on the primary UFM), set management_mode to a mode that allows other SMs. UFM will continue to print a warning if another SM runs in the fabric. Change management_mode only while UFM is stopped, because this setting affects the start/stop behavior.
management_mode = allow_other_sm
List of External UFM-SMs
In the /opt/ufm/files/conf/external_sm.conf file (on the primary UFM), add the IP addresses of all the External UFM-SMs, one per line, as in the example below. The IP addresses of the UFM HA hosts must not appear in this file.
192.168.10.11
192.168.10.12
192.168.10.13
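A validation sketch for external_sm.conf, assuming one IP address per line and that blank lines are tolerated: each entry must be a well-formed address, and none may belong to the UFM HA hosts. parse_external_sm_conf is a hypothetical helper, not part of UFM.

```python
import ipaddress
from typing import Iterable, List


def parse_external_sm_conf(text: str, ha_host_ips: Iterable[str]) -> List[str]:
    """External UFM-SM addresses listed one per line in external_sm.conf."""
    forbidden = {ipaddress.ip_address(ip) for ip in ha_host_ips}
    result = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are harmless
        addr = ipaddress.ip_address(entry)  # ValueError for entries such as "192.168"
        if addr in forbidden:
            raise ValueError(f"line {lineno}: {entry} belongs to a UFM HA host")
        result.append(str(addr))
    return result


conf = "192.168.10.11\n192.168.10.12\n192.168.10.13\n"
print(parse_external_sm_conf(conf, ha_host_ips={"192.168.10.1", "192.168.10.2"}))
```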
Parameters to be overwritten when opensm.conf is copied to the External SM
In the /opt/ufm/files/conf/opensm.conf.sync_mask file (on the primary UFM), the parameters below are overwritten when the opensm.conf file is copied to the External SM.
log_flags 0x03
sm_priority 14
sminfo_polling_timeout 30000
polling_retry_number 6
Warning: Modifying the values of GUID and sm_priority is forbidden.
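To illustrate what "overwritten" means, the sketch below rewrites the masked parameters in opensm.conf text that uses "name value" lines, leaving every other line (including GUIDs) untouched. It is a hypothetical helper, not UFM's sync code; keys absent from the file are not added in this sketch.

```python
from typing import Dict

# Values from opensm.conf.sync_mask as listed above
SYNC_MASK: Dict[str, str] = {
    "log_flags": "0x03",
    "sm_priority": "14",
    "sminfo_polling_timeout": "30000",
    "polling_retry_number": "6",
}


def apply_sync_mask(opensm_conf: str, mask: Dict[str, str] = SYNC_MASK) -> str:
    """Overwrite masked 'name value' lines; keep comments and other parameters as-is."""
    out = []
    for line in opensm_conf.splitlines():
        parts = line.split()
        if parts and parts[0] in mask:
            out.append(f"{parts[0]} {mask[parts[0]]}")
        else:
            out.append(line)  # e.g. guid lines are never touched
    return "\n".join(out) + "\n"


print(apply_sync_mask("sm_priority 15\nlog_flags 0xff\n"), end="")
```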
Configuration of External SM behavior
In the /opt/ufm/files/conf/sm_sync.conf file (on the primary UFM), set the stop_disconnected_sm parameter as shown below to handle the disconnected state (a state in which the remote UFM-SM does not receive configuration updates while the main UFM-SM is still active).
stop_disconnected_sm = no
If set to "no" (default): the external SM is not stopped even when it is not synchronized, but the SM priority is reduced.
If set to "yes": the SM process is stopped and is resumed only after new configuration files are received.
Time interval to check and synchronize the configuration
conf_update_time = 60
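A sketch of how the two sm_sync.conf settings map to behavior, assuming the file uses flat "key = value" lines as in the examples. Only stop_disconnected_sm has a documented default ("no"); the helper names are hypothetical.

```python
from typing import Dict

DEFAULTS = {"stop_disconnected_sm": "no"}  # documented default


def parse_sm_sync(text: str) -> Dict[str, str]:
    """Read flat 'key = value' lines (assumed layout), skipping comments and blanks."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def disconnected_action(settings: Dict[str, str]) -> str:
    """What the External UFM-SM does when it stops receiving configuration updates."""
    if settings["stop_disconnected_sm"].lower() == "yes":
        return "stop SM until new configuration files arrive"
    return "keep SM running with priority reduced to 0"


print(disconnected_action(parse_sm_sync("stop_disconnected_sm = no\nconf_update_time = 60\n")))
```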
Configuring UFM HA on External UFM SMs
To configure UFM HA on the External UFM-SMs, set the running mode to SM only on each additional UFM Server in the /opt/ufm/files/conf/gv.cfg file (on the external UFM-SM):
management_mode = sm_only
Running the UFM Software with External UFM-SM
Run the main UFM Server according to the operating mode (standalone or HA).
Once all the External UFM-SMs are synchronized, start each External UFM-SM by invoking /etc/init.d/ufmd start
The main UFM-SM must run with priority 15, and it must be the only SM with priority 15. If another SM with priority 15 is found during startup, UFM will not start.
If another SM with a lower priority is found during startup, a warning message with the current master SM details is printed, and the main UFM-SM starts and takes mastership.
The External UFM-SM runs with priority 14, or is moved to priority 0 when in the disconnected state.
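The startup priority rules above can be summarized as a decision function. A hypothetical Python sketch, where the caller supplies the priorities of the other SMs discovered in the fabric; it does not query the fabric itself.

```python
from typing import Iterable

MAIN_SM_PRIORITY = 15          # main UFM-SM
EXTERNAL_SM_PRIORITY = 14      # External UFM-SM, normal operation
DISCONNECTED_SM_PRIORITY = 0   # External UFM-SM in the disconnected state


def startup_check(other_sm_priorities: Iterable[int]) -> str:
    """How the main UFM-SM proceeds given the other SMs found at startup."""
    priorities = list(other_sm_priorities)
    if MAIN_SM_PRIORITY in priorities:
        return "abort"  # another priority-15 SM exists: UFM does not start
    if priorities:
        return "warn_and_start"  # print master SM details, then take mastership
    return "start"


print(startup_check([EXTERNAL_SM_PRIORITY, DISCONNECTED_SM_PRIORITY]))
```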
Stopping ufmd & ufmha
The safe_stop command forces synchronization of all External UFM-SM configurations, changes the local SM priority to 12, and waits for a remote UFM-SM to take over before stopping ufmd. If an error is detected during safe_stop, an error message describing the error is displayed and the stop procedure is canceled.
It is recommended to use safe_stop instead of stop to prevent unexpected loss in the fabric.
/etc/init.d/ufmd safe_stop - in the Standalone mode
/etc/init.d/ufmha safe_stop - in HA mode