NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) v2.6.1

Setting up NVIDIA SHARP Environment

The NVIDIA SHARP binary distribution is available as part of the HPC-X, MLNX_OFED, and UFM packages (UFM includes only the Aggregation Manager).

Prior to installing and using NVIDIA SHARP, make sure the following requirements are met.

  • Run the Aggregation Manager and the NVIDIA SHARP daemons as the root user, so that they operate as trusted entities.

  • Make sure the onboard Subnet Manager is disabled on managed switches. The Aggregation Manager is a central entity that runs on a dedicated server together with the master Subnet Manager; this dedicated server cannot also serve as a compute node or host an NVIDIA SHARP daemon.

  • Configure TCP/IP before running NVIDIA SHARP, because the NVIDIA SHARP daemons and the Aggregation Manager communicate over TCP/IP.

  • Run NVIDIA Switch-IB 2 and NVIDIA Quantum switches with the supported firmware versions listed in the Prerequisites section of the Release Notes. Use the ibdiagnet utility to check the firmware version installed on the switches.

  • Enable the IPoIB interface on compute servers so that SHARP can use UD multicast for result distribution.

  • For the SHARP Aggregation Manager to work out of the box, make sure the subnet is configured with OpenSM using one of the following routing engines:

    • Tree-based topologies: updn, ar_updn, ftree, ar_ftree

    • DragonFly+ topology: dfp

    • Hypercube topologies: dor routing engine with dor_hyper_cube_mode enabled

  • Make sure the SHARP daemon (sharpd) is installed on all compute nodes participating in the job.
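The routing-engine requirement above can be checked before deployment. The following Python sketch assumes the OpenSM options are available as "key value" lines with "#" comments (an assumption about the file layout) and compares routing_engine against the engines listed for each topology. The function names parse_opensm_conf and check_routing, and the topology keys "tree", "dragonfly+" and "hypercube", are hypothetical; the TRUE spelling for an enabled dor_hyper_cube_mode is also assumed.

```python
"""Sketch: compare OpenSM options with the routing engines that the SHARP
Aggregation Manager supports out of the box (assumed 'key value' file layout)."""

# Routing engines per topology, as listed in the requirements above.
SUPPORTED_ENGINES = {
    "tree": {"updn", "ar_updn", "ftree", "ar_ftree"},
    "dragonfly+": {"dfp"},
    "hypercube": {"dor"},
}


def parse_opensm_conf(text):
    """Parse 'key value' lines into a dict, ignoring blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def check_routing(topology, conf_text):
    """Return a list of problems; an empty list means the options look compatible."""
    options = parse_opensm_conf(conf_text)
    allowed = SUPPORTED_ENGINES[topology]
    problems = []
    # routing_engine may list several engines separated by commas; this sketch
    # conservatively requires every listed engine to be a supported one.
    engines = [e.strip() for e in options.get("routing_engine", "").split(",") if e.strip()]
    if not engines:
        problems.append("routing_engine is not set")
    unsupported = [e for e in engines if e not in allowed]
    if unsupported:
        problems.append(
            f"unsupported for {topology}: {unsupported}; expected one of {sorted(allowed)}"
        )
    # Hypercube topologies also need dor_hyper_cube_mode enabled (TRUE assumed).
    if topology == "hypercube" and options.get("dor_hyper_cube_mode", "").upper() != "TRUE":
        problems.append("dor_hyper_cube_mode must be enabled for hypercube topologies")
    return problems
```

For example, check_routing("tree", "routing_engine updn") returns an empty list, while checking the same text for "dragonfly+" reports updn as unsupported.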

When using the HPC-X package, refer to the HPC-X User Manual for installation and configuration procedures.
The examples in this deployment guide use the environment variables HPCX_SHARP_DIR and OMPI_HOME, and assume that HPC-X is installed in a shared folder accessible from all compute nodes.
The HPC-X packages are available from the NVIDIA HPC-X download page.

When using the MLNX_OFED distribution, set the HPCX_SHARP_DIR environment variable to the SHARP installation directory (default location: /opt/mellanox/sharp), and set the OMPI_HOME environment variable to the MPI installation directory.
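For launch scripts written in Python, the two variables can be prepared as in the sketch below. It assumes that an HPCX_SHARP_DIR that is already set should be kept and that the default /opt/mellanox/sharp applies otherwise; the MPI installation path is supplied by the caller. The helper names sharp_env and missing_dirs are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default SHARP installation directory for MLNX_OFED, as stated above.
DEFAULT_SHARP_DIR = "/opt/mellanox/sharp"


def sharp_env(mpi_home: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of base (default: os.environ) with the SHARP variables set.

    An HPCX_SHARP_DIR that is already set is kept; otherwise the MLNX_OFED
    default location is used. OMPI_HOME always points at mpi_home.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HPCX_SHARP_DIR", DEFAULT_SHARP_DIR)
    env["OMPI_HOME"] = mpi_home
    return env


def missing_dirs(env: Mapping[str, str]) -> list:
    """List the variables that are unset or point at a directory missing on this host."""
    return [
        name
        for name in ("HPCX_SHARP_DIR", "OMPI_HOME")
        # An empty value is treated as missing (Path("") would resolve to ".").
        if not env.get(name) or not Path(env[name]).is_dir()
    ]
```

The returned dictionary can be passed as the env argument of subprocess.run when launching a job, after confirming that missing_dirs reports nothing.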

The MLNX_OFED packages are available from the NVIDIA MLNX_OFED download page.

When using the Aggregation Manager from UFM, NVIDIA SHARP support must be enabled in UFM. For further information, refer to the UFM User Manual.

Warning

The UFM package includes only the SHARP Aggregation Manager. Other NVIDIA SHARP components are not available through UFM and should be installed from either the HPC-X or the MLNX_OFED package.

Device              Capabilities and limitations
------------------  ---------------------------------------------------------------------
NVIDIA Switch-IB 2  Supports SHARP low latency operations only
NVIDIA Quantum      • Supports both SHARP low latency and streaming aggregation operations
                    • Supports up to 126 aggregation trees in the subnet (63 low latency
                      trees and 63 streaming aggregation trees)
                    Note: The number of streaming aggregation trees is limited per switch.
ConnectX-5          Supports SHARP low latency operations only
ConnectX-6          Supports both SHARP low latency and streaming aggregation operations
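The device table above can be encoded as data for planning scripts. This Python sketch makes two assumptions that the table does not state outright: that streaming aggregation requires support on both the switch and the HCA, and that the 126-tree figure means two separate caps of 63 per tree type. The per-switch limit on streaming trees is not modeled. The names DEVICE_CAPS, streaming_possible and trees_fit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCaps:
    low_latency: bool
    streaming: bool


# Capabilities as listed in the device table above.
DEVICE_CAPS = {
    "Switch-IB 2": DeviceCaps(low_latency=True, streaming=False),
    "Quantum": DeviceCaps(low_latency=True, streaming=True),
    "ConnectX-5": DeviceCaps(low_latency=True, streaming=False),
    "ConnectX-6": DeviceCaps(low_latency=True, streaming=True),
}

# Subnet-wide tree limits on NVIDIA Quantum: 63 low latency + 63 streaming = 126.
MAX_LOW_LATENCY_TREES = 63
MAX_STREAMING_TREES = 63


def streaming_possible(switch: str, hca: str) -> bool:
    """Assumption: streaming aggregation needs support on both the switch and the HCA."""
    return DEVICE_CAPS[switch].streaming and DEVICE_CAPS[hca].streaming


def trees_fit(low_latency_trees: int, streaming_trees: int) -> bool:
    """Check requested tree counts against the subnet-wide caps (per-switch limit not modeled)."""
    return (
        0 <= low_latency_trees <= MAX_LOW_LATENCY_TREES
        and 0 <= streaming_trees <= MAX_STREAMING_TREES
    )
```

For example, streaming_possible("Quantum", "ConnectX-5") is False because ConnectX-5 supports low latency operations only.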

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.