UFM Communication Requirements

This chapter describes how the UFM Server communicates with clients, InfiniBand fabric components, and its high-availability peer.

The UFM Server communicates with clients over IP. The UFM Server can reside on a separate IP network, which can also be behind a firewall.

UFM Server Communication with Clients

[Figure: UFM Server communication with clients]

UFM Server Communication with UFM Web UI Client

Communication between the UFM Server and the UFM web UI client is based on HTTP or HTTPS. The only requirement is that TCP port 80 (HTTP) or TCP port 443 (HTTPS) must not be blocked.
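A quick way to confirm that a firewall is not blocking the web UI ports is to attempt a TCP connection from the client machine. The following is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is illustrative and is not part of UFM.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_port_open("ufm.example.com", 443)` (a hypothetical host name) returns `False` if the port is filtered or the server is not listening. A `False` result does not distinguish between those two causes.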

UFM Server Communication with SNMP Trap Managers

The UFM Server can send SNMP traps to configured SNMP Trap Manager(s). By default, the traps are sent to the standard UDP port 162. However, the user can configure the destination port. If the specified port is blocked, UFM Server traps will not reach their destination.
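To check that trap datagrams actually reach the SNMP Trap Manager host, a plain UDP listener on the configured port is often enough. The sketch below only confirms that a datagram arrives; it does not decode SNMP, and the function names are illustrative. Binding to port 162 normally requires elevated privileges on Unix-like systems, and the port must not already be in use by a running trap receiver.

```python
import socket
from typing import Optional, Tuple


def open_trap_listener(port: int = 162, bind_addr: str = "0.0.0.0",
                       timeout: float = 30.0) -> socket.socket:
    """Bind a UDP socket on the trap port (162 is the default named above)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((bind_addr, port))
    return sock


def receive_one(sock: socket.socket) -> Optional[Tuple[bytes, Tuple[str, int]]]:
    """Wait for one datagram; return (payload, sender) or None on timeout."""
    try:
        payload, sender = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return payload, sender
```

If `receive_one` returns `None` while the UFM Server is generating traps, the destination port is probably blocked or misconfigured somewhere along the path.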

Summary of UFM Server Communication with Clients

| Affected Service | Network | Address / Service / Port | Direction |
|---|---|---|---|
| Web UI Client | Out-of-band management* | HTTP / 80; HTTPS / 443 | Bi-directional |
| SNMP Trap Notification | Out-of-band management* | UDP / 162 (configurable) | UFM Server to SNMP Manager |

*If the client machine is connected to the IB fabric, IPoIB can also be used.

UFM Server Communication with InfiniBand Switches

[Figure: UFM Server communication with InfiniBand switches]

UFM Server InfiniBand Communication with Switch

The UFM Server must be connected directly to the InfiniBand fabric (via an InfiniBand switch). The UFM Server sends standard InfiniBand Management Datagrams (MADs) to the switch and receives InfiniBand traps from it.

UFM Server Communication with Switch Management Software (Optional)

The UFM Server auto-negotiates with the switch management software on Mellanox Grid Director switches. The communication is bound to the switch Ethernet management port.

The UFM Server sends a multicast notification to multicast address 224.0.23.172, port 6306 (configurable). The switch management software replies to the UFM Server (via port 6306) with a unicast message that contains the switch GUID and IP address. After auto-negotiation, the UFM Server and the switch management software use XML-based messaging.
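When discovery fails, it helps to verify that the multicast notification crosses the management network at all. The sketch below joins the group named above and waits for datagrams. It assumes the notification is carried over UDP (the usual transport for IP multicast) and does not attempt to parse the payload, whose format is not described here. The function and constant names are illustrative.

```python
import socket
import struct

UFM_MCAST_GROUP = "224.0.23.172"  # group address from the text above
UFM_MCAST_PORT = 6306             # default port from the text above (configurable)


def membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface_ip))


def open_multicast_listener(group: str = UFM_MCAST_GROUP,
                            port: int = UFM_MCAST_PORT,
                            interface_ip: str = "0.0.0.0",
                            timeout: float = 30.0) -> socket.socket:
    """Join the multicast group and return a socket ready for recvfrom()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    sock.bind(("", port))
    # Ask the kernel to deliver traffic for this group on the chosen interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface_ip))
    return sock
```

Calling `recvfrom(65535)` on the returned socket either yields a datagram and its sender address or raises `socket.timeout`, which suggests that multicast traffic is being filtered between the UFM Server and the listener.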

The following Device Management tasks are dependent on successful communication as described above:

  • Switch IP discovery

  • FRU Discovery (PSU, FAN, status, temperature)

  • Software and firmware upgrades

The UFM Server manages InfiniBand switch devices over SNMP (default port 161, configurable) and/or SSH (default port 22, configurable).

UFM Server Communication with Externally Managed Switches (Optional)

The UFM Server uses the ibdiagnet tool to discover chassis information (PSU, fan, status, temperature) for externally managed switches.

By monitoring this chassis information, UFM can trigger selected events when a module fails or when a sensor value exceeds its threshold.
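The event rule described above (module failure, or a sensor value above its threshold) can be illustrated with a short sketch. This is not UFM's internal implementation and does not model ibdiagnet output; the data classes, field names, and message format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SensorReading:
    switch: str       # switch identifier (hypothetical format)
    sensor: str       # for example "temperature"
    value: float
    threshold: float


@dataclass(frozen=True)
class ModuleStatus:
    switch: str
    module: str       # for example "PSU1" or "FAN2"
    ok: bool


def chassis_events(readings: Iterable[SensorReading],
                   modules: Iterable[ModuleStatus]) -> List[str]:
    """Return one event string per failed module or over-threshold sensor."""
    events = []
    for m in modules:
        if not m.ok:
            events.append(f"{m.switch}: module {m.module} failed")
    for r in readings:
        # Strictly above the threshold, matching the wording of the text.
        if r.value > r.threshold:
            events.append(
                f"{r.switch}: {r.sensor} {r.value} above threshold {r.threshold}")
    return events
```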

Summary of UFM Server Communication with InfiniBand Switches

| Affected Service | Network | Address / Service / Port | Direction |
|---|---|---|---|
| InfiniBand Management / Monitoring | InfiniBand | Management Datagrams | Bi-directional |
| Switch IP Address Discovery (auto-negotiation with switch management software) | Out-of-band management | Multicast 224.0.23.172, TCP / 6306 (configurable) | Multicast: UFM Server to switch; TCP: Bi-directional |
| Switch Chassis Management / Monitoring | Out-of-band management | TCP / UDP / 6306 (configurable); SNMP / 161 (configurable); SSH / 22 (configurable) | Bi-directional |

UFM Server Communication with InfiniBand Hosts

[Figure: UFM Server communication with InfiniBand hosts]

UFM Server InfiniBand Communication with HCAs

The UFM Server must be connected directly to the InfiniBand fabric. The UFM Server sends standard InfiniBand Management Datagrams (MADs) to the Host Channel Adapters (HCAs) and receives InfiniBand traps.

UFM Server Communication with Host Management (Optional)

The UFM Server auto-negotiates with the UFM Agent on a host. The UFM Agent can be bound to the management Ethernet port or to an IPoIB interface (configurable). The UFM Server sends a multicast notification to multicast address 224.0.23.172, port 6306 (configurable). The UFM Agent replies to the UFM Server (port 6306) with a unicast message that contains the host GUID and IP address. After auto-negotiation, the UFM Server and the UFM Agent use XML-based messaging.

The following Device Management tasks are dependent on successful communication as described above:

  • Host IP discovery

  • Host resource discovery and monitoring: CPU, memory, disk

  • Software and firmware upgrades

Warning

UFM 3.6 supports in-band HCA firmware upgrade. This requires enabling firmware version and PSID discovery over vendor-specific MADs. For more information, see the UFM User Manual.

The UFM Server connects to the hosts over SSH (default port 22, configurable) using root credentials that are stored in the UFM Server database.

Summary of UFM Server Communication with InfiniBand Hosts

| Affected Service | Network | Address / Service / Port | Direction |
|---|---|---|---|
| InfiniBand Management / Monitoring | InfiniBand | Management Datagrams | Bi-directional |
| Host IP Address Discovery (auto-negotiation with UFM Host Agent) | Out-of-band management or IPoIB | Multicast 224.0.23.172, TCP / 6306 (configurable) | Multicast: UFM Server to UFM Agent; TCP: Bi-directional |
| Host OS Management / Monitoring | Out-of-band management or IPoIB | TCP / UDP / 6306 (configurable); SSH / 22 (configurable) | Bi-directional |

UFM Server HA Active-Standby Communication

[Figure: UFM Server HA active-standby communication]


Communication between the active and standby UFM Servers relies on two services: heartbeat and DRBD.

  • heartbeat is used for auto-negotiation and keep-alive messaging between the active and standby servers. It uses UDP port 694.

  • DRBD is used for low-level data (disk) synchronization between the active and standby servers. It uses TCP port 8888.

| Affected Service | Network | Address / Service / Port | Direction |
|---|---|---|---|
| UFM HA heartbeat | Out-of-band management* | UDP / 694 | Bi-directional |
| UFM HA DRBD | Out-of-band management* | TCP / 8888 | Bi-directional |

*An IPoIB network can be used for HA, but this is not recommended, since any InfiniBand failure might cause split brain and lack of synchronization between the active and standby servers.
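For a firewall review, the default ports listed in this chapter can be collected into a single inventory. The sketch below is one possible representation, not an official UFM artifact. The class and function names are hypothetical, the `configurable` flag is set only where this chapter says a port is configurable, and SNMP management on port 161 is assumed to use UDP, its standard transport.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PortRule:
    service: str
    protocol: str       # "tcp" or "udp"
    port: int           # default value from this chapter
    configurable: bool  # True only where the chapter states it


DEFAULT_RULES = [
    PortRule("Web UI (HTTP)", "tcp", 80, False),
    PortRule("Web UI (HTTPS)", "tcp", 443, False),
    PortRule("SNMP trap notification", "udp", 162, True),
    PortRule("Switch / host management messaging", "tcp", 6306, True),
    PortRule("Switch / host management messaging", "udp", 6306, True),
    PortRule("Switch management over SNMP", "udp", 161, True),  # UDP assumed
    PortRule("Switch and host management over SSH", "tcp", 22, True),
    PortRule("HA heartbeat", "udp", 694, False),
    PortRule("HA DRBD", "tcp", 8888, False),
]


def checklist(rules: Iterable[PortRule]) -> List[str]:
    """Render each rule as one line suitable for a review checklist."""
    lines = []
    for r in rules:
        note = " (default, configurable)" if r.configurable else ""
        lines.append(f"{r.protocol.upper()} {r.port}: {r.service}{note}")
    return lines
```

Printing `"\n".join(checklist(DEFAULT_RULES))` gives a list that can be compared against the firewall configuration. Any port changed from its default in the UFM configuration must be updated in the inventory as well.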

© Copyright 2023, NVIDIA. Last updated on Sep 5, 2023.