Using NVIDIA SHARP with UCC
The NVIDIA SHARP library is integrated into the Unified Collective Communication (UCC) library to offload collective operations in MPI applications.
The following flags should be set in the environment to enable the NVIDIA SHARP protocol in the UCC middleware.
These UCC flags can be passed to the mpirun utility with -x when running NVIDIA SHARP collectives, as in the examples below.
| Flag | Description |
| --- | --- |
| OMPI_UCC_CL_BASIC_TLS | Currently, the UCC supported/compiled TLs list is: ucp, cuda, nccl, sharp. Note: "sharp" is disabled by default. Possible values: ucp,sharp (adds the "sharp" TL along with "ucp"); ucp,cuda,sharp (adds the "sharp" TL along with "ucp" and "cuda"). |
| UCC_TL_SHARP_MIN_TEAM_SIZE | Minimal UCC team size for which SHARP can be used. Default: 2. |
| UCC_TL_SHARP_DEVICES | Comma-separated list of HCAs to be used with the SHARP TL. |
| UCC_TL_SHARP_UPROGRESS_NUM_POLLS | Number of unsuccessful polling loops in libsharp coll for a blocking collective wait before calling user progress (UCC, OMPI). Default: 999. |
| UCC_TL_SHARP_ENABLE_LAZY_GROUP_ALLOC | Enables lazy group resource allocation. Default: N. |
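For instance, the table's flags can be combined on the mpirun command line to enable the SHARP TL only for communicators of at least four ranks and to restrict SHARP to a specific HCA. This is a sketch: the device name mlx5_0 and the rank count are placeholders and should be adjusted to the actual fabric and job.

$mpirun --bind-to core --map-by node -np 8 \
    -x LD_LIBRARY_PATH \
    -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp \
    -x UCC_TL_SHARP_MIN_TEAM_SIZE=4 \
    -x UCC_TL_SHARP_DEVICES=mlx5_0 \
    $HPCX_OSU_DIR/osu_allreduce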
Example of Allreduce with Default Settings with SHARP Enabled
$mpirun --bind-to core --map-by node -np 8 -x LD_LIBRARY_PATH -x SHARP_COLL_LOG_LEVEL=3 -x SHARP_COLL_ENABLE_SAT=1 -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp $HPCX_OSU_DIR/osu_allreduce
[elsa01:0:1852929 - context.c:687][2024-10-28 22:24:47] INFO job (ID: 139896884886908) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[elsa01:0:1852929 - context.c:889][2024-10-28 22:24:47] INFO sharp_job_id:3 resv_key: tree_type:LLT tree_idx:0 treeID:2 caps:0x66 quota:(osts:23 user_data_per_ost:1024 max_groups:23 max_qps:1 max_group_channels:1)
[elsa01:0:1852929 - context.c:896][2024-10-28 22:24:47] INFO sharp_job_id:3 tree_type:SAT tree_idx:1 treeID:514 caps:0x76
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:0] job_id:3 group id:0 tree idx:0 tree_type:LLT rail_idx:0 group size:6 quota: (osts:8 user_data_per_ost:1024) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:1] job_id:3 group id:0 tree idx:1 tree_type:SAT rail_idx:0 group size:6 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_FLOAT.
# Size Avg Latency(us)
4 4.95
8 4.80
16 5.08
32 5.10
64 5.62
128 5.74
256 7.27
512 7.85
1024 8.28
2048 8.59
4096 14.64
8192 11.11
16384 10.31
32768 11.42
65536 13.84
131072 20.07
262144 37.93
524288 33.89
1048576 64.64