Using NVIDIA SHARP with UCC
The NVIDIA SHARP library is integrated into the Unified Collective Communication (UCC) library to offload collective operations in MPI applications.
The following UCC flags enable and tune the NVIDIA SHARP protocol in the UCC middleware. Set them in the environment when running NVIDIA SHARP collectives with the mpirun utility.
Flag | Description |
OMPI_UCC_CL_BASIC_TLS | Currently, the UCC supported/compiled TLs list is: ucp,cuda,nccl,sharp. Note: By default, "sharp" is disabled. Possible values: ucp,sharp (adds the "sharp" TL along with "ucp"); ucp,cuda,sharp (adds the "sharp" TL along with "ucp" and "cuda"). |
UCC_TL_SHARP_MIN_TEAM_SIZE | Minimal UCC team size for which sharp can be used. Default: 2 |
UCC_TL_SHARP_DEVICES | List of comma-separated HCAs to be used with SHARP TL. |
UCC_TL_SHARP_UPROGRESS_NUM_POLLS | Number of unsuccessful polling loops in libsharp coll for blocking collective wait before calling user progress (UCC, OMPI). Default: 999 |
UCC_TL_SHARP_ENABLE_LAZY_GROUP_ALLOC | Enables lazy group resource allocation. Default: N |
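The flags in the table can be passed to mpirun with -x NAME=VALUE. The following is a minimal sketch, not an official recipe: it collects the flags in a bash array and prints the resulting command line instead of running it. The helper name build_x_args is hypothetical, the values are the documented defaults, and UCC_TL_SHARP_DEVICES is left out because its value depends on the HCAs present on the site.

```shell
#!/usr/bin/env bash
# Sketch: turn the SHARP-related UCC flags from the table into
# "-x NAME=VALUE" arguments for mpirun. Nothing is launched here;
# the command line is only printed for inspection.

sharp_flags=(
  "OMPI_UCC_CL_BASIC_TLS=ucp,sharp"         # add the "sharp" TL alongside "ucp"
  "UCC_TL_SHARP_MIN_TEAM_SIZE=2"            # documented default
  "UCC_TL_SHARP_UPROGRESS_NUM_POLLS=999"    # documented default
  "UCC_TL_SHARP_ENABLE_LAZY_GROUP_ALLOC=N"  # documented default
)

# Hypothetical helper: emit one "-x flag " pair per argument.
build_x_args() {
  local flag
  for flag in "$@"; do
    printf -- '-x %s ' "$flag"
  done
}

x_args="$(build_x_args "${sharp_flags[@]}")"

# \$HPCX_OSU_DIR is escaped so that it is printed literally.
echo "mpirun -np 8 ${x_args}\$HPCX_OSU_DIR/osu_allreduce"
```

Keeping the flags in one array makes it easy to toggle the "sharp" TL or change a single value without editing a long command line by hand.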
Example of Allreduce with Default Settings and SHARP Enabled
$ mpirun --bind-to core --map-by node -np 8 \
    -x LD_LIBRARY_PATH -x SHARP_COLL_LOG_LEVEL=3 \
    -x SHARP_COLL_ENABLE_SAT=1 \
    -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp $HPCX_OSU_DIR/osu_allreduce
[elsa01:0:1852929 - context.c:687][2024-10-28 22:24:47] INFO job (ID: 139896884886908) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[elsa01:0:1852929 - context.c:889][2024-10-28 22:24:47] INFO sharp_job_id:3 resv_key: tree_type:LLT tree_idx:0 treeID:2 caps:0x66 quota:(osts:23 user_data_per_ost:1024 max_groups:23 max_qps:1 max_group_channels:1)
[elsa01:0:1852929 - context.c:896][2024-10-28 22:24:47] INFO sharp_job_id:3 tree_type:SAT tree_idx:1 treeID:514 caps:0x76
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:0] job_id:3 group id:0 tree idx:0 tree_type:LLT rail_idx:0 group size:6 quota: (osts:8 user_data_per_ost:1024) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:1] job_id:3 group id:0 tree idx:1 tree_type:SAT rail_idx:0 group size:6 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_FLOAT.
# Size       Avg Latency(us)
4                       4.95
8                       4.80
16                      5.08
32                      5.10
64                      5.62
128                     5.74
256                     7.27
512                     7.85
1024                    8.28
2048                    8.59
4096                   14.64
8192                   11.11
16384                  10.31
32768                  11.42
65536                  13.84
131072                 20.07
262144                 37.93
524288                 33.89
1048576                64.64
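One rough way to check that SHARP groups were actually created is to look for the group-creation INFO lines in the captured job output. The sketch below assumes the output was saved to a file; here a temporary file is filled with the two comm.c lines from the run above so the snippet is self-contained. It relies only on the log format shown in this example, which may differ between SHARP versions and log levels.

```shell
#!/usr/bin/env bash
# Sketch: list the SHARP tree types reported in group-creation log lines.
# The sample lines are copied from the example output above; in practice,
# point log_file at the captured output of your own run.

log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:0] job_id:3 group id:0 tree idx:0 tree_type:LLT rail_idx:0 group size:6 quota: (osts:8 user_data_per_ost:1024) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:1] job_id:3 group id:0 tree idx:1 tree_type:SAT rail_idx:0 group size:6 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
EOF

# Keep only group-creation lines, pull out the tree_type field,
# and collapse duplicates into a single space-separated list.
tree_types="$(grep -F 'INFO [group#:' "$log_file" \
  | grep -o 'tree_type:[A-Z]*' \
  | sort -u \
  | tr '\n' ' ')"

echo "SHARP groups created on: ${tree_types}"
rm -f "$log_file"
```

In this run both an LLT and a SAT group appear, which matches the SHARP_COLL_ENABLE_SAT=1 setting on the command line. An empty list would suggest that no SHARP group was created, or that the log level was too low to print these lines.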