Using NVIDIA SHARP with UCC

The NVIDIA SHARP library is integrated into the Unified Collective Communication (UCC) library to offload collective operations in MPI applications.

Set the following flags in the environment to enable the NVIDIA SHARP protocol in the UCC middleware.

UCC Library Flags

The following UCC flags can be used when running NVIDIA SHARP collectives with the mpirun utility.

Flag	Description
OMPI_UCC_CL_BASIC_TLS	The TLs currently supported/compiled in UCC are: ucp,cuda,nccl,sharp. Note: "sharp" is disabled by default. Possible values: "ucp,sharp" (adds the "sharp" TL along with "ucp"); "ucp,cuda,sharp" (adds the "sharp" TL along with "ucp" and "cuda").
UCC_TL_SHARP_MIN_TEAM_SIZE	Minimum UCC team size for which sharp can be used. Default: 2
UCC_TL_SHARP_DEVICES	Comma-separated list of HCAs to be used with the SHARP TL.
UCC_TL_SHARP_UPROGRESS_NUM_POLLS	Number of unsuccessful polling loops in a libsharp blocking collective wait before calling user progress (UCC, OMPI). Default: 999
UCC_TL_SHARP_ENABLE_LAZY_GROUP_ALLOC	Enables lazy group resource allocation. Default: N
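
As a minimal sketch of how the flags in the table might be applied, the script below exports three of them and builds the matching `-x NAME` arguments, which mpirun uses to forward a variable's current value to the ranks. The HCA name `mlx5_0` and the binary `./my_mpi_app` are placeholders, not values taken from this document, and the script only prints the command rather than launching it.

```shell
#!/bin/sh
# Sketch: put UCC/SHARP flags from the table into the environment and
# build the "-x NAME" arguments that make mpirun forward them.
set -eu

# Add the "sharp" TL along with "ucp" ("sharp" is disabled by default).
export OMPI_UCC_CL_BASIC_TLS=ucp,sharp
# Use sharp only for teams of at least this size (2 is the documented default).
export UCC_TL_SHARP_MIN_TEAM_SIZE=2
# ASSUMPTION: mlx5_0 is a placeholder HCA name; use a device present on your nodes.
export UCC_TL_SHARP_DEVICES=mlx5_0

# One "-x NAME" pair per variable; mpirun then exports the current value.
forward_args=""
for name in OMPI_UCC_CL_BASIC_TLS UCC_TL_SHARP_MIN_TEAM_SIZE UCC_TL_SHARP_DEVICES; do
  forward_args="$forward_args -x $name"
done

# Print the command instead of running it.
# ./my_mpi_app is a hypothetical application binary.
echo "mpirun -np 8$forward_args ./my_mpi_app"
```

Passing `-x NAME=value` directly on the mpirun command line, as in the example in the next section, is equivalent; exporting first is convenient when the same settings are shared by several runs.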

Example of Allreduce with Default Settings and SHARP Enabled

$ mpirun --bind-to core --map-by node -np 8 -x LD_LIBRARY_PATH -x SHARP_COLL_LOG_LEVEL=3 -x SHARP_COLL_ENABLE_SAT=1 -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp $HPCX_OSU_DIR/osu_allreduce
[elsa01:0:1852929 - context.c:687][2024-10-28 22:24:47] INFO job (ID: 139896884886908) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[elsa01:0:1852929 - context.c:889][2024-10-28 22:24:47] INFO sharp_job_id:3    resv_key: tree_type:LLT tree_idx:0  treeID:2 caps:0x66 quota:(osts:23 user_data_per_ost:1024 max_groups:23 max_qps:1 max_group_channels:1)
[elsa01:0:1852929 - context.c:896][2024-10-28 22:24:47] INFO sharp_job_id:3    tree_type:SAT tree_idx:1  treeID:514 caps:0x76
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:0] job_id:3 group id:0 tree idx:0 tree_type:LLT rail_idx:0 group size:6 quota: (osts:8 user_data_per_ost:1024) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:1] job_id:3 group id:0 tree idx:1 tree_type:SAT rail_idx:0 group size:6 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
 
# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_FLOAT.
# Size       Avg Latency(us)
4                       4.95
8                       4.80
16                      5.08
32                      5.10
64                      5.62
128                     5.74
256                     7.27
512                     7.85
1024                    8.28
2048                    8.59
4096                   14.64
8192                   11.11
16384                  10.31
32768                  11.42
65536                  13.84
131072                 20.07
262144                 37.93
524288                 33.89
1048576                64.64
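
To judge what the "sharp" TL contributes, it can help to run the same benchmark once without it. The sketch below builds a baseline command that leaves `OMPI_UCC_CL_BASIC_TLS` unset (so "sharp" stays disabled, per the flag table) and a SHARP-enabled command. It assumes `HPCX_OSU_DIR` is set as in the example above and substitutes a placeholder path when it is not. Whether the baseline run goes through UCC at all depends on the local Open MPI configuration, which this document does not cover.

```shell
#!/bin/sh
# Sketch: build a baseline and a SHARP-enabled command for the same benchmark.
set -eu

# ASSUMPTION: HPCX_OSU_DIR points at the OSU benchmarks; a placeholder
# path is used here when the variable is unset.
osu_dir="${HPCX_OSU_DIR:-/path/to/osu}"
common="mpirun --bind-to core --map-by node -np 8 -x LD_LIBRARY_PATH"

# Baseline: OMPI_UCC_CL_BASIC_TLS is not passed, so "sharp" stays disabled.
baseline_cmd="$common $osu_dir/osu_allreduce"
# SHARP run: add the "sharp" TL along with "ucp".
sharp_cmd="$common -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp $osu_dir/osu_allreduce"

# Print both commands; run each one and compare the latency columns.
echo "baseline: $baseline_cmd"
echo "sharp:    $sharp_cmd"
```

Comparing the two "Avg Latency(us)" columns size by size gives a rough picture of the offload benefit on a given fabric; single runs can be noisy, so repeating each command a few times is advisable.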