Using NVIDIA SHARP with Open MPI
The NVIDIA SHARP library is integrated into the HCOLL collective library to offload collective operations in MPI applications.
Set the following basic flags in the environment to enable the NVIDIA SHARP protocol in the HCOLL middleware. For the remaining flags, refer to the NVIDIA SHARP Release Notes.
The following HCOLL flags can be used when running NVIDIA SHARP collectives with the mpirun utility.
| Flag | Description |
| HCOLL_ENABLE_SHARP | Default: 0. Possible values: |
| SHARP_COLL_LOG_LEVEL | NVIDIA SHARP coll logging level. Messages with a level higher than or equal to the selected level are printed. Default: 2. Possible values: |
| HCOLL_SHARP_NP | Threshold on the number of nodes (node leaders) in the communicator required to create an NVIDIA SHARP group and use NVIDIA SHARP collectives. Default: 4 |
| HCOLL_SHARP_UPROGRESS_NUM_POLLS | Number of unsuccessful polling loops in libsharp coll for a blocking collective wait before calling user progress (HCOLL, OMPI). Default: 999 |
| HCOLL_ALLREDUCE_SHARP_MAX (or) HCOLL_BCOL_P2P_ALLREDUCE_SHARP_MAX | Maximum allreduce size run through NVIDIA SHARP. Messages larger than the value specified by this parameter fall back to non-SHARP-based algorithms (multicast based or non-multicast based). By default, the threshold is calculated from the group resources: Threshold = #OSTS * Payload_per_ost. Default: Dynamic |
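As a minimal sketch of how the flags in the table might be passed to a job, the script below assembles an mpirun command line with `-x` exports and prints it instead of running it, so it can be inspected first. The helper names `sharp_mpirun_cmd` and `sharp_allreduce_threshold`, the `./my_mpi_app` binary, and all numeric values are hypothetical; only the flag names and the formula Threshold = #OSTS * Payload_per_ost come from the table. The fallback value 3 for HCOLL_ENABLE_SHARP mirrors the Allreduce example on this page, not the flag's documented default of 0.

```shell
#!/bin/sh
# Sketch only: print (do not execute) an mpirun command that exports the
# SHARP-related flags from the table. Values already set in the environment
# win; otherwise illustrative fallbacks are used.
sharp_mpirun_cmd() {
    np="$1"; shift    # number of MPI processes; the rest is the program and its arguments
    printf 'mpirun -np %s -x HCOLL_ENABLE_SHARP=%s -x SHARP_COLL_LOG_LEVEL=%s -x HCOLL_SHARP_NP=%s %s\n' \
        "$np" \
        "${HCOLL_ENABLE_SHARP:-3}" \
        "${SHARP_COLL_LOG_LEVEL:-2}" \
        "${HCOLL_SHARP_NP:-4}" \
        "$*"
}

# Worked instance of the table's formula: Threshold = #OSTS * Payload_per_ost.
# Both inputs are placeholders here, not values reported by any library.
sharp_allreduce_threshold() {
    echo $(( $1 * $2 ))
}

# Prints the assembled command line for a hypothetical application.
sharp_mpirun_cmd 128 ./my_mpi_app

# Hypothetical group with 16 OSTs and 256 bytes of payload per OST.
sharp_allreduce_threshold 16 256   # prints 4096
```

Printing the command keeps the sketch safe to run on a machine without Open MPI installed; once the line looks right, it can be copied into a job script or run with `eval`.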
Example of Allreduce with Default Settings and SHARP Enabled
            
$ mpirun -np 128 -map-by ppr:1:node -x UCX_TLS=dc,shm,self -x HCOLL_ENABLE_SHARP=3 -x SHARP_COLL_ENABLE_SAT=1 $HPCX_OSU_DIR/osu_allreduce
# OSU MPI Allreduce Latency Test v5.6.2
# Size       Avg Latency(us)
4                       7.44
8                       8.43
16                      7.81
32                      8.55
64                      9.06
128                     8.44
256                     9.41
512                     8.50
1024                    9.03
2048                   10.43
4096                   42.61
8192                   37.93
16384                  15.48
32768                  16.26
65536                  17.62
131072                 23.09
262144                 33.90
524288                 58.98
1048576               101.53
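When comparing runs, for example with different HCOLL_ENABLE_SHARP values, it can help to reduce the OSU table to the rows where latency changes sharply. The following is a minimal sketch, assuming the benchmark output is piped in or saved to a file; the function name `osu_latency_jumps` and the factor of 2 are hypothetical choices. It only flags rows worth a closer look and does not explain them.

```shell
#!/bin/sh
# Sketch only: read OSU latency output on stdin and print every row whose
# average latency is more than `factor` times that of the previous row.
osu_latency_jumps() {
    factor="${1:-2}"
    awk -v f="$factor" '
        # Keep only "<size> <latency>" data rows; skip headers, prompts, blanks.
        $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
        seen && $2 > f * prev { print $1, $2 }
        { prev = $2; seen = 1 }
    '
}

# Usage with four rows taken from the listing above.
osu_latency_jumps 2 <<'EOF'
# Size       Avg Latency(us)
1024                    9.03
2048                   10.43
4096                   42.61
8192                   37.93
EOF
# prints: 4096 42.61
```

A saved log can be checked the same way with `osu_latency_jumps 2 < run.log`, where `run.log` is a placeholder file name.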