Setting up NVIDIA SHARP Environment

NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) Rev 3.6.0

The NVIDIA SHARP binary distribution is available as part of the HPC-X, MLNX_OFED, and UFM packages. Of the SHARP binaries, UFM includes Aggregation Manager (AM) only.

Prior to installing and using NVIDIA SHARP, make sure the following requirements are met.

  • Run Aggregation Manager as the "root" user, which is treated as a trusted entity.

  • Make sure the onboard Subnet Manager is disabled on the managed switches. (Aggregation Manager is a central entity running on a dedicated server with a master Subnet Manager. This dedicated server cannot serve as a compute node.)

  • Configure TCP/IP before running NVIDIA SHARP, because NVIDIA SHARP and Aggregation Manager communicate over TCP/IP.

  • Run NVIDIA Switch-IB 2, NVIDIA Quantum, or NVIDIA Quantum-2 switches with the supported firmware versions specified in the Prerequisites section of the Release Notes. (Use the ibdiagnet utility to check the firmware version installed on the switches.)

  • Enable the IPoIB interface on the compute servers so that SHARP can use UD multicast for result distribution.

  • Make sure the subnet is configured with an SM that uses one of the following routing engines, which SHARP Aggregation Manager supports out of the box:

    • Tree based topologies: updn, ar_updn, ftree, ar_ftree

    • DragonFly+ topology: dfp

    • Hypercube topologies: dor routing engine with dor_hyper_cube_mode enabled
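The routing engine requirement above can be expressed as a small lookup. The following is a minimal sketch, not part of NVIDIA SHARP: the topology labels ("tree", "dragonfly+", "hypercube") and the function name are hypothetical names chosen for illustration, and the engine names are taken from the list above.

```python
# Routing engines listed in the requirements above, grouped by topology family.
# The dictionary keys are hypothetical labels for this sketch only; they are
# not Subnet Manager or SHARP configuration values.
SUPPORTED_ROUTING_ENGINES = {
    "tree": {"updn", "ar_updn", "ftree", "ar_ftree"},
    "dragonfly+": {"dfp"},
    "hypercube": {"dor"},
}


def is_supported_routing(topology: str, engine: str,
                         dor_hyper_cube_mode: bool = False) -> bool:
    """Return True if the topology/engine pair appears in the list above."""
    family = topology.lower()
    engines = SUPPORTED_ROUTING_ENGINES.get(family)
    if engines is None or engine not in engines:
        return False
    # Hypercube topologies additionally require dor_hyper_cube_mode enabled.
    if family == "hypercube":
        return dor_hyper_cube_mode
    return True
```

For example, is_supported_routing("tree", "ar_updn") is accepted, while is_supported_routing("hypercube", "dor") is rejected unless dor_hyper_cube_mode is passed as True.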

When using the HPC-X package, refer to the HPC-X User Manual for installation and configuration procedures.
The examples in this deployment guide use the environment variables HPCX_SHARP_DIR and OMPI_HOME, and assume that HPC-X is installed in a shared folder accessible from all compute nodes.
To download the HPC-X packages, go here.

When using the MLNX_OFED distribution, set the HPCX_SHARP_DIR environment variable to point to the SHARP installation directory (default location: /opt/mellanox/sharp), and set the OMPI_HOME environment variable to the MPI installation directory.
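As an illustration of the two variables, the sketch below resolves them from an environment mapping. It is a hedged example, not an NVIDIA tool: the function name resolve_sharp_env is hypothetical, and falling back to /opt/mellanox/sharp when HPCX_SHARP_DIR is unset is an assumption made for this sketch, based on the default location stated above.

```python
import os
from typing import Dict, Mapping, Optional

# Default SHARP location for MLNX_OFED installations, as stated above.
DEFAULT_SHARP_DIR = "/opt/mellanox/sharp"


def resolve_sharp_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the HPCX_SHARP_DIR and OMPI_HOME values to use.

    Assumption of this sketch: an unset or empty HPCX_SHARP_DIR falls back to
    the MLNX_OFED default location. OMPI_HOME has no default and must be set.
    """
    env = os.environ if environ is None else environ
    sharp_dir = env.get("HPCX_SHARP_DIR") or DEFAULT_SHARP_DIR
    ompi_home = env.get("OMPI_HOME")
    if not ompi_home:
        raise RuntimeError(
            "OMPI_HOME is not set; point it to the MPI installation directory"
        )
    return {"HPCX_SHARP_DIR": sharp_dir, "OMPI_HOME": ompi_home}
```

Passing a plain dictionary instead of the process environment makes it easy to check the resolution logic without modifying the shell.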

To download MLNX_OFED packages, go here.

When using Aggregation Manager from UFM, NVIDIA SHARP support has to be enabled in UFM. For further information, refer to the UFM User Manual.

Warning

UFM package includes only SHARP Aggregation Manager. Other NVIDIA SHARP components are not available through UFM and should be installed from either HPC-X or MLNX_OFED packages.

Device capabilities and limitations

NVIDIA Quantum

  • Supports both SHARP low latency and streaming aggregation operations

  • Supports up to 126 aggregation trees in the subnet (63 low latency trees, and 63 streaming aggregation trees)

  • Note: The number of SHARP streaming aggregation operations is limited to one active tree per switch.

NVIDIA Quantum-2

  • Supports both SHARP low latency and streaming aggregation operations

  • Supports up to 1023 aggregation trees in the subnet (511 low latency trees, and 511 streaming aggregation trees)

  • Note: A single Quantum-2 switch can run multiple SHARP streaming aggregation operations in parallel. The limit is one active tree per port.

ConnectX-5

  • Supports SHARP low latency operation only

ConnectX-6 and above

  • Supports both SHARP low latency and streaming aggregation operations
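The per-device capabilities listed above can be captured in a small lookup structure, for example to decide whether streaming aggregation can be used on a given device. This is a sketch under stated assumptions: the class, dictionary, and function names are hypothetical and not part of any NVIDIA API, and the values are copied from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SharpCapability:
    """Capabilities of one device type, as listed above (hypothetical type)."""
    low_latency: bool
    streaming: bool
    low_latency_trees: Optional[int] = None   # per-subnet limit, if stated
    streaming_trees: Optional[int] = None     # per-subnet limit, if stated
    streaming_limit: Optional[str] = None     # active-tree limit, if stated


# Values copied from the capabilities list; keys are the device names used there.
CAPABILITIES = {
    "NVIDIA Quantum": SharpCapability(True, True, 63, 63, "one active tree per switch"),
    "NVIDIA Quantum-2": SharpCapability(True, True, 511, 511, "one active tree per port"),
    "ConnectX-5": SharpCapability(True, False),
    "ConnectX-6 and above": SharpCapability(True, True),
}


def supports_streaming(device: str) -> bool:
    """Return True if the named device supports streaming aggregation."""
    return CAPABILITIES[device].streaming
```

With this table, supports_streaming("ConnectX-5") is False, while the same call for "NVIDIA Quantum-2" is True.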

© Copyright 2023, NVIDIA. Last updated on Feb 8, 2024.