The NVIDIA SHARP software consists of two daemons: the NVIDIA SHARP daemon (sharpd), which should be executed on every compute node, and the Aggregation Manager daemon (sharp_am), which should be executed on a dedicated server along with the Subnet Manager.
This section describes how to install the Aggregation Manager and the SHARP daemon as services on the management and compute nodes in the fabric, using the daemon setup script provided with NVIDIA SHARP.
Installing the Aggregation Manager as a service is required when it is used from the HPC-X or MLNX_OFED packages.
Installing the NVIDIA SHARP daemon as a service is required when it is used from the HPC-X and MLNX_OFED packages on systems that do not support SystemD socket-based activation.
SHARP Daemons Installation Script
To install or remove NVIDIA SHARP daemons on servers, use the sharp_daemons_setup.sh script provided with the NVIDIA SHARP package. For example:
$HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh
Usage: sharp_daemons_setup.sh (-s | -r) [-p SHARP location dir] -d <sharpd | sharp_am> [-m]
    -s - Setup SHARP daemon
    -r - Remove SHARP daemon
    -p - Path to alternative SHARP location dir
    -d - Daemon name (sharpd or sharp_am)
    -b - Enable socket based activation of the service
Socket-based activation is only valid on systems with SystemD support.
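As a rough illustration of the note above, the following sketch decides whether to pass -b to the setup script. It assumes that the presence of the /run/systemd/system directory indicates a host booted with systemd (the same test that sd_booted(3) performs); this does not by itself guarantee that socket-based activation works on that host. The names has_systemd and sharpd_setup_args are hypothetical helpers and are not part of NVIDIA SHARP.

```shell
# Succeeds when the given directory exists. By default it tests
# /run/systemd/system, which exists only on hosts booted with systemd.
# The optional argument allows checking a different path (useful for testing).
has_systemd() {
    [ -d "${1:-/run/systemd/system}" ]
}

# Prints the setup-script arguments for sharpd, adding -b
# (socket-based activation) only when systemd is detected.
sharpd_setup_args() {
    if has_systemd "$@"; then
        printf '%s\n' "-s -d sharpd -b"
    else
        printf '%s\n' "-s -d sharpd"
    fi
}

# Example (as root; word splitting of the output is intentional):
#   $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh $(sharpd_setup_args)
```

Keeping the detection in its own function makes it easy to substitute a stricter check if a site needs one.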
Registering sharp_am as a Service on the Subnet Manager Node
Run the following as root:
# $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharp_am
Daemon's log location is: /var/log/sharp_am.log
Set the "run level".
Start sharp_am as root.
# service sharp_am start
Registering sharpd as a Service on Compute Nodes
The following registration procedure requires the pdsh package to be installed. If the package is absent, use another parallel execution tool and treat the commands below as an example.
Run the following as root.
# pdsh -w <hostlist> $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharpd
Daemon's log location: /var/log/sharpd.log
Set the "run level".
Start sharpd daemons as root.
# pdsh -w <hostlist> service sharpd start
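Where pdsh is not available, the same commands can be fanned out with plain ssh. The sketch below is one possible stand-in, not a replacement recommended by NVIDIA: run_on_hosts and the RUNNER variable are hypothetical names, the host list is a simple comma-separated list (pdsh range syntax such as node[1-4] is not handled), and passwordless root ssh to every host is assumed.

```shell
# Hypothetical stand-in for "pdsh -w <hostlist> <command>".
# $1 is a comma-separated host list; the remaining arguments are the remote command.
# RUNNER defaults to ssh; setting RUNNER=echo gives a dry run that only prints.
run_on_hosts() {
    hosts=$1
    shift
    for host in $(printf '%s' "$hosts" | tr ',' ' '); do
        # Start each remote command in the background so hosts run in parallel.
        # stdin is redirected so background ssh processes do not consume the terminal.
        ${RUNNER:-ssh} "$host" "$@" </dev/null &
    done
    # Block until every background job has finished.
    wait
}

# Example (as root):
#   run_on_hosts node1,node2 "$HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh" -s -d sharpd
#   run_on_hosts node1,node2 service sharpd start
```

Note that this sketch does not collect per-host exit statuses, and output from different hosts may interleave; pdsh or a similar tool handles both more gracefully.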
Registering sharpd as Socket-Based Activation Service on Compute Nodes
Socket-based activation installs sharpd as a daemon that is automatically activated when an application tries to communicate with sharpd.
sharpd from MLNX_OFED is automatically installed with socket-based activation on systems with SystemD.
The following registration procedure requires the pdsh package to be installed. If the package is absent, use another parallel execution tool and treat the command below as an example.
Run the following as root:
# pdsh -w <hostlist> $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -s -d sharpd -b
Daemon's log location: /var/log/sharpd.log
Removing Daemons
To remove sharp_am, run the following on the AM host:
# $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -r -d sharp_am
To remove sharpd, run:
# pdsh -w <hostlist> $HPCX_SHARP_DIR/sbin/sharp_daemons_setup.sh -r -d sharpd
Upgrading Daemons
Upgrading the daemons requires removing them and then registering them again, as instructed in this section.
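For a single daemon on the local host, the removal and re-registration sequence could be scripted roughly as follows. This is a sketch under stated assumptions: reregister_daemon is a hypothetical helper name, HPCX_SHARP_DIR is assumed to already point at the SHARP installation whose setup script should be used, and the service is not started afterwards.

```shell
# Hypothetical helper: remove a SHARP daemon registration and register it again.
# $1 is the daemon name (sharpd or sharp_am).
reregister_daemon() {
    daemon=$1
    # Abort with a message if HPCX_SHARP_DIR is unset or empty.
    setup="${HPCX_SHARP_DIR:?HPCX_SHARP_DIR is not set}/sbin/sharp_daemons_setup.sh"
    # Remove the existing registration; register again only if removal succeeded.
    "$setup" -r -d "$daemon" && "$setup" -s -d "$daemon"
}

# Example (as root, on the AM host):
#   reregister_daemon sharp_am
#   service sharp_am start
```

For sharpd on compute nodes, the same two setup-script invocations would be run on every node through pdsh, as in the registration and removal commands above.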