NVIDIA UFM Enterprise User Manual v6.23.1

Appendix – NVIDIA SHARP Integration

NVIDIA SHARP is a technology that improves the performance of MPI operations by offloading collective operations from the CPU to the switch network, eliminating the need to send data multiple times between endpoints. This approach decreases the amount of data traversing the network as aggregation nodes are reached, and dramatically reduces MPI operation time.
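The data-reduction effect described above can be illustrated with a toy model. This is a minimal sketch and not the SHARP protocol: it assumes a balanced aggregation tree in which every switch reduces the vectors from its children into one vector, and compares that with a hypothetical baseline in which every host's vector is forwarded unreduced toward the root. The function name `vectors_per_level` and the `radix` parameter are illustrative only.

```python
import math
from typing import List, Tuple


def vectors_per_level(num_hosts: int, radix: int) -> List[Tuple[int, int]]:
    """Return (with_aggregation, without_aggregation) vector counts per tree level.

    Level 0 is the hop from hosts to their leaf switches; the last level
    is the hop into the root of the (assumed balanced) aggregation tree.
    """
    if num_hosts < 1 or radix < 2:
        raise ValueError("num_hosts must be >= 1 and radix must be >= 2")
    levels = []
    senders = num_hosts  # nodes sending upward at the current level
    while senders > 1:
        # With aggregation, each sender forwards one partially reduced vector.
        # Without it, every host's original vector still crosses this level.
        levels.append((senders, num_hosts))
        senders = math.ceil(senders / radix)  # parents that receive and reduce
    return levels


if __name__ == "__main__":
    for level, (aggregated, forwarded) in enumerate(vectors_per_level(64, 4)):
        print(f"level {level}: {aggregated} vectors aggregated, {forwarded} unreduced")
```

In this model the aggregated count shrinks by the tree radix at each level, while the unreduced count stays at the number of hosts all the way to the root.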

NVIDIA SHARP software is based on:

  • Hardware capabilities in Quantum-1 and later generations

  • Hierarchical communication algorithms (HCOL) library into which NVIDIA SHARP capabilities are integrated [1]

  • NVIDIA SHARP Aggregation Manager, running on UFM

[1] These components should be installed from the HPC-X or MLNX_OFED packages on compute nodes. Installation details can be found in the SHARP Deployment Guide.

Aggregation Manager (AM) is a system management component used for system level configuration and management of the switch-based reduction capabilities. It is used to set up the NVIDIA SHARP trees, and to manage the use of these entities.

AM is responsible for:

  • NVIDIA SHARP resource discovery

  • Creating topology aware NVIDIA SHARP trees

  • Configuring NVIDIA SHARP switch capabilities

  • Managing NVIDIA SHARP resources

  • Assigning NVIDIA SHARP resources upon request

  • Freeing NVIDIA SHARP resources upon job termination

AM is configured by a topology file created by the Subnet Manager (SM), subnet.lst, which includes information about switches and HCAs.

NVIDIA SHARP AM Configuration

By default, when NVIDIA SHARP AM is run by UFM, no further configuration is needed. To modify the configuration of NVIDIA SHARP AM, edit the NVIDIA SHARP AM configuration file: /opt/ufm/files/conf/sharp/sharp_am.cfg.


To run NVIDIA SHARP AM within UFM, do the following:

  1. Enable NVIDIA SHARP AM, either by running the command "ib sharp enable" or by setting the sharp_enabled parameter to true in the conf/gv.cfg UFM configuration file (it is true by default):

    [Sharp]
    sharp_enabled = true

  2. (Optional) Enable NVIDIA SHARP allocation by setting the sharp_allocation_enabled parameter to true in the conf/gv.cfg UFM configuration file (it is false by default):

    [Sharp]
    sharp_allocation_enabled = true

    Note

    For further information about SHARP allocation methods, refer to the NVIDIA SHARP Documentation.
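To check how the two parameters above are currently set, the configuration text can be parsed with Python's standard `configparser` module. This is a minimal sketch that assumes gv.cfg is plain INI that `configparser` can read; the function name `read_sharp_flags` is illustrative. It only reads the values, because rewriting the file through `configparser` would drop any comments it contains. The fallbacks mirror the defaults stated in the steps above.

```python
import configparser
from typing import Dict


def read_sharp_flags(cfg_text: str) -> Dict[str, bool]:
    """Return the two [Sharp] flags from gv.cfg text, using the documented defaults."""
    # interpolation=None keeps literal '%' characters in other options from
    # raising errors; strict=False tolerates duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    return {
        # Documented default: true
        "sharp_enabled": parser.getboolean("Sharp", "sharp_enabled", fallback=True),
        # Documented default: false
        "sharp_allocation_enabled": parser.getboolean(
            "Sharp", "sharp_allocation_enabled", fallback=False
        ),
    }


if __name__ == "__main__":
    sample = "[Sharp]\nsharp_enabled = true\nsharp_allocation_enabled = true\n"
    print(read_sharp_flags(sample))
    # On a UFM host, pass the contents of conf/gv.cfg instead of the sample text.
```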

If NVIDIA SHARP AM is enabled, running UFM will run NVIDIA SHARP AM, and stopping UFM will stop NVIDIA SHARP AM.

To start UFM with NVIDIA SHARP AM (enabled):


/etc/init.d/ufmd start

The same command applies to HA, using /etc/init.d/ufmha.

When UFM or the SHARP Aggregation Manager starts, UFM resends all existing persistent allocations to SHARP AM.


To stop UFM with NVIDIA SHARP AM (enabled):


/etc/init.d/ufmd stop


To stop only NVIDIA SHARP AM while leaving UFM running:


/etc/init.d/ufmd sharp_stop


To start only NVIDIA SHARP AM while UFM is already running:


/etc/init.d/ufmd sharp_start

When UFM or the SHARP Aggregation Manager starts, UFM resends all existing persistent allocations to SHARP AM.

To restart only NVIDIA SHARP AM while UFM is running:


/etc/init.d/ufmd sharp_restart

When UFM or the SHARP Aggregation Manager starts, UFM resends all existing persistent allocations to SHARP AM.

To display NVIDIA SHARP AM status while UFM is running:


/etc/init.d/ufmd sharp_status
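The service commands listed above can be wrapped in a small helper for scripting. This is a minimal sketch: the names `build_command` and `run_action` are illustrative, and the helper accepts /etc/init.d/ufmha only for the start action because that is the only case this section states explicitly. How the init script reports results (exit codes, output format) is not described here, so `run_action` simply returns whatever exit code the script produces.

```python
import subprocess
from typing import List

UFMD_SCRIPT = "/etc/init.d/ufmd"
UFMHA_SCRIPT = "/etc/init.d/ufmha"

# Actions taken from the commands listed in this section.
UFMD_ACTIONS = (
    "start",
    "stop",
    "sharp_start",
    "sharp_stop",
    "sharp_restart",
    "sharp_status",
)


def build_command(action: str, ha: bool = False) -> List[str]:
    """Return the argument list for a UFM or SHARP AM service action."""
    if action not in UFMD_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if ha and action != "start":
        # Assumption: only 'start' is documented with the ufmha script here.
        raise ValueError("this sketch uses ufmha only for 'start'")
    return [UFMHA_SCRIPT if ha else UFMD_SCRIPT, action]


def run_action(action: str, ha: bool = False) -> int:
    """Run the action on a UFM host and return the script's exit code."""
    return subprocess.run(build_command(action, ha), check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe anywhere.
    print(" ".join(build_command("sharp_status")))
```

Separating command construction from execution keeps the mapping easy to inspect on a machine that has no UFM installation.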

UFMHealth monitors NVIDIA SHARP AM to verify that it is always running. When UFMHealth detects that NVIDIA SHARP AM is down, it tries to restart it and triggers an event to notify UFM that NVIDIA SHARP AM is down.

In case of a UFM HA failover or takeover, NVIDIA SHARP AM will be started on the new master node using the same configuration that was used prior to the failover/takeover.

The NVIDIA SHARP AM log file (sharp_am.log) is located at /opt/ufm/files/log.

NVIDIA SHARP AM log files are rotated by the UFM logrotate mechanism.
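For a quick look at recent NVIDIA SHARP AM activity, the end of the log file can be read with a few lines of Python. This is a minimal sketch: the helper name `tail_lines` is illustrative, the path combines the directory and file name given above, and nothing is assumed about the format of the log entries or the names of rotated files.

```python
from collections import deque
from typing import List

# Directory and file name as given in this section.
SHARP_AM_LOG = "/opt/ufm/files/log/sharp_am.log"


def tail_lines(path: str, count: int = 20) -> List[str]:
    """Return the last `count` lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    try:
        for line in tail_lines(SHARP_AM_LOG):
            print(line)
    except OSError as exc:
        print(f"cannot read {SHARP_AM_LOG}: {exc}")
```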

The NVIDIA SHARP AM version can be found at /opt/ufm/sharp/share/doc/SHARP_VERSION.

© Copyright 2025, NVIDIA. Last updated on Nov 10, 2025