Using NVIDIA SHARP with UCC
Date: 2026-03-01
The NVIDIA SHARP library is integrated into the Unified Collective Communication (UCC) library to offload collective operations in MPI applications.
To enable the NVIDIA SHARP protocol in the UCC middleware, set the following flags in the environment.
The following UCC flags can be used when running NVIDIA SHARP collectives with the mpirun utility.
Flag
    Description

OMPI_UCC_CL_BASIC_TLS
    Currently, the UCC supported/compiled TLs list is: ucp,cuda,nccl,sharp
    Note: By default, "sharp" is disabled.
    Possible values:
    ucp,sharp: Adding "sharp" TL along with "ucp"
    ucp,cuda,sharp: Adding "sharp" TL along with "ucp" and "cuda"

UCC_TL_SHARP_MIN_TEAM_SIZE
    Minimal UCC team size for which sharp can be used.
    Default: 2

UCC_TL_SHARP_DEVICES
    List of comma-separated HCAs to be used with SHARP TL.

UCC_TL_SHARP_UPROGRESS_NUM_POLLS
    Number of unsuccessful polling loops in libsharp coll for blocking collective wait before calling user progress (UCC, OMPI).
    Default: 999

UCC_TL_SHARP_ENABLE_LAZY_GROUP_ALLOC
    Enables lazy group resource allocation.
    Default: N
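The flags above can be combined on the mpirun command line with `-x`. The following is a minimal sketch that only builds and prints such a command; the process count, the team-size value, and the `./my_mpi_app` binary name are placeholders chosen for illustration, not recommendations from this guide.

```shell
# Sketch: assemble an mpirun command that enables the "sharp" TL in UCC.
# Values are illustrative; adjust them for the target cluster.
tls="ucp,sharp"        # add "sharp" alongside "ucp" (sharp is off by default)
min_team_size=2        # documented default of UCC_TL_SHARP_MIN_TEAM_SIZE
app="./my_mpi_app"     # hypothetical application binary

cmd="mpirun -np 8 -x LD_LIBRARY_PATH"
cmd="$cmd -x OMPI_UCC_CL_BASIC_TLS=$tls"
cmd="$cmd -x UCC_TL_SHARP_MIN_TEAM_SIZE=$min_team_size"
cmd="$cmd $app"

# Print the command for review instead of launching it.
echo "$cmd"
```

Printing the command first makes it easy to confirm which flags are exported to the ranks before starting a real job.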
Example of Allreduce with Default Settings and SHARP Enabled
$ mpirun --bind-to core --map-by node -np 8 -x LD_LIBRARY_PATH -x SHARP_COLL_LOG_LEVEL=3 -x SHARP_COLL_ENABLE_SAT=1 -x OMPI_UCC_CL_BASIC_TLS=ucp,sharp $HPCX_OSU_DIR/osu_allreduce
[elsa01:0:1852929 - context.c:687][2024-10-28 22:24:47] INFO job (ID: 139896884886908) resource request quota: ( osts:0 user_data_per_ost:0 max_groups:0 max_qps:1 max_group_channels:1, num_trees:1)
[elsa01:0:1852929 - context.c:889][2024-10-28 22:24:47] INFO sharp_job_id:3 resv_key: tree_type:LLT tree_idx:0 treeID:2 caps:0x66 quota:(osts:23 user_data_per_ost:1024 max_groups:23 max_qps:1 max_group_channels:1)
[elsa01:0:1852929 - context.c:896][2024-10-28 22:24:47] INFO sharp_job_id:3 tree_type:SAT tree_idx:1 treeID:514 caps:0x76
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:0] job_id:3 group id:0 tree idx:0 tree_type:LLT rail_idx:0 group size:6 quota: (osts:8 user_data_per_ost:1024) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
[elsa01:0:1852929 - comm.c:413][2024-10-28 22:24:47] INFO [group#:1] job_id:3 group id:0 tree idx:1 tree_type:SAT rail_idx:0 group size:6 quota: (osts:64 user_data_per_ost:0) mgid: (subnet prefix:0x0 interface id:0x0) mlid:0
# OSU MPI Allreduce Latency Test v7.4
# Datatype: MPI_FLOAT.
# Size       Avg Latency(us)
4                       4.95
8                       4.80
16                      5.08
32                      5.10
64                      5.62
128                     5.74
256                     7.27
512                     7.85
1024                    8.28
2048                    8.59
4096                   14.64
8192                   11.11
16384                  10.31
32768                  11.42
65536                  13.84
131072                 20.07
262144                 37.93
524288                 33.89
1048576                64.64
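To check a run like the one above, it can help to confirm from the log which tree types were reported and to pull the latency figures into a machine-readable form. The sketch below does both with standard tools. The sample text is copied from the output in this section, and the variable names are illustrative. In practice the same filters would be fed from captured mpirun output, assuming it keeps the layout shown here.

```shell
# Sketch: post-process captured osu_allreduce output from a SHARP-enabled run.
# The sample lines are taken from the run shown in this section.
run_output='[elsa01:0:1852929 - context.c:889][2024-10-28 22:24:47] INFO sharp_job_id:3 resv_key: tree_type:LLT tree_idx:0 treeID:2 caps:0x66 quota:(osts:23 user_data_per_ost:1024 max_groups:23 max_qps:1 max_group_channels:1)
[elsa01:0:1852929 - context.c:896][2024-10-28 22:24:47] INFO sharp_job_id:3 tree_type:SAT tree_idx:1 treeID:514 caps:0x76
# OSU MPI Allreduce Latency Test v7.4
# Size       Avg Latency(us)
4                       4.95
1024                    8.28
1048576                64.64'

# 1) List the distinct SHARP tree types reported in the log lines.
tree_types=$(printf '%s\n' "$run_output" |
  sed -n 's/.*tree_type:\([A-Z]*\).*/\1/p' | sort -u)
echo "$tree_types"

# 2) Keep only benchmark rows (two numeric fields) and print "size:latency".
latencies=$(printf '%s\n' "$run_output" |
  awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ { print $1 ":" $2 }')
echo "$latencies"
```

Seeing both LLT and SAT in the first list matches the `SHARP_COLL_ENABLE_SAT=1` setting used in the example command.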