Aggregation Trees Diagnostics

Run the ibdiagnet utility with the SHARP diagnostics option.

$ ibdiagnet --sharp --fabric_summary

Check the fabric summary table in the ibdiagnet output for the number of identified aggregation nodes. For example:

Fabric Summary

Total Nodes            : 24
IB Switches            : 4
IB Channel Adapters    : 16
IB Aggregation Nodes   : 4
IB Routers             : 0

Total number of links  : 24
Links at 4x50          : 24

Master SM: Port=1 LID=1 GUID=0x248a070300a28c4d devid=4119 Priority:0 Node_Type=CA Node_Description=pnemo HCA-2
Standby SM : No Standby SM
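
The same check can be scripted. The following is a minimal sketch, not part of the SHARP distribution: a hypothetical helper, sharp_an_count, reads ibdiagnet output on standard input and extracts the value of the "IB Aggregation Nodes" line. It assumes the fabric summary layout shown above.

```shell
# Hypothetical helper (not shipped with SHARP): print the aggregation node
# count found in ibdiagnet output on stdin.
# Exit status: 0 if the count is positive, 1 if it is zero,
# 2 if the "IB Aggregation Nodes" line is missing.
sharp_an_count() {
    awk -F: '
        /IB Aggregation Nodes/ {
            gsub(/[[:space:]]/, "", $2)   # strip padding around the number
            count = $2
            found = 1
        }
        END {
            if (!found) exit 2
            print count
            exit ((count + 0 > 0) ? 0 : 1)
        }'
}
```

A possible invocation is `ibdiagnet --sharp --fabric_summary | sharp_an_count`, which prints only the count and returns a non-zero status when no aggregation nodes were identified.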

Check the summary table in the ibdiagnet output for errors in the SHARP diagnostics stage. For example:

Summary
-I- Stage                    Warnings   Errors     Comment  
-I- Discovery                0          0         
-I- Lids Check               0          0         
-I- Links Check              0          0         
-I- Subnet Manager           0          0         
-I- Port Counters            0          0         
-I- Nodes Information        0          0         
-I- Speed / Width checks     0          0        
-I- Alias GUIDs              0          0         
-I- Virtualization           0          0         
-I- Partition Keys           0          0         
-I- Temperature Sensing      0          0         
-I- SHARP                    0          0  
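
For automated runs, the SHARP row can be pulled out of the summary. This is a sketch under the assumption that each row has the form "-I- <stage> <warnings> <errors>" as in the table above; sharp_stage_status is a hypothetical helper name, not an ibdiagnet feature.

```shell
# Hypothetical helper: report the SHARP row of the ibdiagnet stage summary
# read from stdin. Assumes rows look like "-I- SHARP <warnings> <errors>".
# Exit status: 0 if errors == 0, 1 if errors > 0, 2 if the row is missing.
sharp_stage_status() {
    awk '
        $1 == "-I-" && $2 == "SHARP" {
            warnings = $3
            errors = $4
            found = 1
        }
        END {
            if (!found) { print "SHARP stage not found"; exit 2 }
            printf "warnings=%s errors=%s\n", warnings, errors
            exit ((errors + 0 > 0) ? 1 : 0)
        }'
}
```

Piping the ibdiagnet output into sharp_stage_status prints one line, for example `warnings=0 errors=0` for the table above.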

Check the SHARP diagnostics output file (/var/tmp/ibdiagnet2/ibdiagnet2.sharp) to confirm that SHARP aggregation trees are configured in the subnet.

For example, count the aggregation trees constructed by the Aggregation Manager using grep:

$ grep -c TreeID /var/tmp/ibdiagnet2/ibdiagnet2.sharp
126
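
When this count feeds a script, it helps to distinguish "no trees" from "file not found". A minimal sketch follows; sharp_tree_count is a hypothetical helper, and it counts lines containing the string TreeID exactly as the grep command above does.

```shell
# Hypothetical helper: count lines containing "TreeID" in the SHARP
# diagnostics file (default path taken from the text above).
# Exit status: 0 if at least one tree line exists, 1 if none,
# 2 if the file cannot be read.
sharp_tree_count() {
    file=${1:-/var/tmp/ibdiagnet2/ibdiagnet2.sharp}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # grep -c prints 0 and returns 1 when nothing matches; keep the number.
    count=$(grep -c TreeID "$file" || true)
    echo "$count"
    [ "$count" -gt 0 ]
}
```

Calling sharp_tree_count with no argument reads the default file; a different path can be passed as the first argument.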


NVIDIA SHARP Hello

The NVIDIA SHARP distribution provides the sharp_hello test utility for testing SHARP's end-to-end functionality on a compute node. It creates a single SHARP job and sends a barrier request to the SHARP Aggregation node.

Help

$ sharp_hello -h
usage:  sharp_hello <-d | --ib_dev> <device> [OPTIONS]
OPTIONS:
        [-d | --ib_dev]      - HCA to use
        [-v | --verbose]     - libsharp coll verbosity level(default:2)
                                  Levels: (0-fatal 1-err 2-warn 3-info 4-debug 5-trace)
        [-V | --version]     - print program version
        [-h | --help]        - show this usage


Example #1 

$ sharp_hello -d mlx5_0:1 -v 3
[thor001:0:15042 - context.c:581] INFO job (ID: 12159720107860141553) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[thor001:0:15042 - context.c:751] INFO tree_info: type:LLT tree idx:0 treeID:0x0 caps:0x6 quota: ( osts:167 user_data_per_ost:1024 max_groups:167 max_qps:1 max_group_channels:1)
[thor001:0:15042 - comm.c:393] INFO [group#:0] group id:a tree idx:0 tree_type:LLT rail_idx:0 group size:1 quota: (osts:2 user_data_per_ost:1024) mgid: (subnet prefix:0xff12a01bfe800000 interface id:0x3f020000000a) mlid:c007
Test Passed.

Example #2 

$ SHARP_COLL_ENABLE_SAT=1 sharp_hello -d mlx5_0:1 -v 3

[swx-dgx01:0:59023 - context.c:581] INFO job (ID: 15134963379905498623) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[swx-dgx01:0:59023 - context.c:751] INFO tree_info: type:LLT tree idx:0 treeID:0x0 caps:0x6 quota: ( osts:167 user_data_per_ost:1024 max_groups:167 max_qps:1 max_group_channels:1)
[swx-dgx01:0:59023 - context.c:755] INFO tree_info: type:SAT tree idx:1 treeID:0x3f caps:0x16
[swx-dgx01:0:59023 - comm.c:393] INFO [group#:0] group id:3c tree idx:0 tree_type:LLT rail_idx:0 group size:1 quota: (osts:2 user_data_per_ost:1024) mgid: (subnet prefix:0xff12a01bfe800000 interface id:0xd6060000003c) mlid:c004
[swx-dgx01:0:59023 - comm.c:393] INFO [group#:1] group id:3c tree idx:1 tree_type:SAT rail_idx:0 group size:1 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
Test Passed
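
The two examples differ in the tree types reported: LLT only in the first, LLT and SAT in the second. The sketch below condenses such a log into one line. It is illustrative only: sharp_hello_summary is a hypothetical helper, and it assumes the "INFO tree_info: type:<TYPE>" and "Test Passed" line formats shown above at verbosity level 3.

```shell
# Hypothetical helper: summarize sharp_hello output read from stdin.
# Collects the tree types from "tree_info" lines and checks for the
# final "Test Passed" line. Exit status: 0 if the test passed, 1 otherwise.
sharp_hello_summary() {
    awk '
        /INFO tree_info:/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^type:/) {
                    type = $i
                    sub(/^type:/, "", type)
                    sep = (types == "") ? "" : ","
                    types = types sep type
                }
            }
        }
        /^Test Passed/ { passed = 1 }
        END {
            printf "trees=%s passed=%s\n", types, (passed ? "yes" : "no")
            exit (passed ? 0 : 1)
        }'
}
```

For instance, `SHARP_COLL_ENABLE_SAT=1 sharp_hello -d mlx5_0:1 -v 3 | sharp_hello_summary` would print `trees=LLT,SAT passed=yes` for the output of Example #2.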

NVIDIA SHARP Benchmark

The NVIDIA SHARP distribution provides source code for a benchmark that tests native SHARP low-level performance for allreduce and barrier operations.

Source code:

$ module load hpcx
$ ls $HPCX_SHARP_DIR/share/sharp/examples/mpi/coll/

Build and run instructions:

$ module load hpcx
$ cat $HPCX_SHARP_DIR/opt/Mellanox/sharp/share/sharp/examples/mpi/coll/README

NVIDIA SHARP Benchmark Script

The NVIDIA SHARP distribution provides a test script that runs the OSU allreduce and barrier benchmarks with and without NVIDIA SHARP. The following packages must be installed to run it:

  • ssh
  • pdsh
  • environment-modules.x86_64
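
A quick way to see which prerequisites are missing on a host is to probe for their commands. This is a sketch only: missing_cmds is a hypothetical helper, and probing for a `module` command as evidence of the environment-modules package is an assumption that holds only in shells where the modules environment has been initialized.

```shell
# Hypothetical helper: print the names of the given commands that are not
# available in the current shell. Exit status: 0 if all are present,
# 1 if at least one is missing.
missing_cmds() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    missing=${missing# }          # drop the leading separator
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
    return 0
}
```

For the packages listed above, a possible call is `missing_cmds ssh pdsh module`.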

After loading the HPC-X module, you can find this script at $HPCX_SHARP_DIR/sbin/sharp_benchmark.sh. Launch it from a host running the SM and the Aggregation Manager. The script takes the list of compute nodes from the SLURM allocation or from the “hostlist” environment variable. “hostlist” is a comma-separated list, and using it requires the hca environment variables to be supplied as well.

Help 

This script includes OSU benchmarks for MPI_Allreduce and MPI_Barrier blocking collective operations.
Both benchmarks run with and without using SHARP technology.

Usage: sharp_benchmark.sh [-t] [-d] [-h] [-f]
        -t - tests list (e.g. sharp:barrier)
        -d - dry run
        -h - display this help and exit
        -f - supress error in prerequsites checking

Configuration:
 Runtime:
  sharp_ppn - number of processes per compute node (default 1)
  sharp_ib_dev - Infiniband device used for communication. Format <device_name>:<port_number>.
                 For example: sharp_ib_dev="mlx5_0:1"
                 This is a mandatory parameter. If it's absent, sharp_benchmark.sh tries to use the first active device on local machine
  sharp_groups_num - number of groups per communicator. (default is the number of devices in sharp_ib_dev)
  sharp_num_trees - number of trees to request. (default num tress based on the #rails and #channels)
  sharp_job_members_type - type of sharp job members list. (default is SHARP_MEMBER_LIST_PROCESSES_DATA)
  sharp_hostlist - hostnames of compute nodes used in the benchmark. The list may include normal host names,
                   a range of hosts in hostlist format. Under SLURM allocation, SLURM_NODELIST is used as a default
  sharp_test_iters - number of test iterations (default 10000)
  sharp_test_skip_iters - number of test iterations (default 1000)
  sharp_test_max_data - max data size used for testing (default and maximum 4096)
 Environment:
  SHARP_INI_FILE - takes configuration from given file instead of /labhome/danielk/.sharp_benchmark.ini
  SHARP_TMP_DIR - store temporary files here instead of /tmp
  HCOLL_INSTALL - use specified hcoll install instead from hpcx

Examples:
  sharp_ib_dev="mlx5_0:1" sharp_benchmark.sh  # run using "mlx5_0:1" IB port. Rest parameters are loaded from /labhome/danielk/.sharp_benchmark.ini or default
  SHARP_INI_FILE=~/benchmark.ini  sharp_benchmark.sh # Override default configuration file
  SHARP_INI_FILE=~/benchmark.ini  sharp_hostlist=ajna0[2-3]  sharp_ib_dev="mlx5_0:1" sharp_benchmark.sh # Use specific host list
  sharp_ppn=1 sharp_hostlist=ajna0[1-8] sharp_ib_dev="mlx5_0:1" sharp_benchmark.sh  -d # Print commands without actual run

Dependencies:
  This script uses "python-hostlist" package. Visit https://www.nsc.liu.se/~kent/python-hostlist/ for details
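
Before launching the script, the two inputs described in the help text can be sanity-checked. The sketch below is not part of sharp_benchmark.sh. Both function names are hypothetical; the device-name character set and the single-device restriction in valid_ib_dev are assumptions; and resolve_hostlist only mirrors the documented behavior that SLURM_NODELIST serves as the default when sharp_hostlist is not set.

```shell
# Hypothetical helper: check that a value has the documented
# <device_name>:<port_number> form, e.g. "mlx5_0:1".
# Assumption: device names contain only letters, digits and underscores,
# and exactly one device is given.
valid_ib_dev() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_]+:[0-9]+$'
}

# Hypothetical helper: print the host list the benchmark would be given.
# sharp_hostlist wins if set; otherwise fall back to SLURM_NODELIST.
resolve_hostlist() {
    if [ -n "${sharp_hostlist:-}" ]; then
        echo "$sharp_hostlist"
    elif [ -n "${SLURM_NODELIST:-}" ]; then
        echo "$SLURM_NODELIST"
    else
        echo "no host list: set sharp_hostlist or run under a SLURM allocation" >&2
        return 1
    fi
}
```

A wrapper could call `valid_ib_dev "$sharp_ib_dev"` and `resolve_hostlist` first, then run `sharp_benchmark.sh -d` to print the commands without executing them.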