CPU/GPU Bcast
Date: 2025-07-03
This feature implements the multicast (MCAST) Bcast algorithm in UCC's TL_MLX5 transport layer. The algorithm is disabled by default. To enable it, set the following environment variables (shown here as mpirun -x options):
-x UCC_TL_MLX5_MCAST_NET_DEVICE=<HCA> (the HCA and port used for multicast, e.g., mlx5_0:1)
-x UCC_TL_MLX5_MCAST_ENABLE=1 (enables the MCAST algorithms in TL_MLX5)
-x UCC_TL_MLX5_MIN_TEAM_SIZE=N (where 2 <= N <= the number of processes in the job)
-x UCC_TL_MLX5_TUNE=inf (assigns maximum priority to all TL_MLX5 algorithms)
Additionally, set the following variables for Open MPI (a leading ^ excludes the listed TLs):
-x OMPI_UCC_CL_BASIC_TLS=^sharp,nccl
-x OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=^sharp,nccl,shm,cuda
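As a sketch of how the settings above combine into one launch command, the Python helper below assembles an mpirun argument list. The function name build_mpirun_cmd, its parameters, and the application path are hypothetical. The helper only mirrors the variables listed above and checks the MIN_TEAM_SIZE range. It does not verify that the HCA exists or that multicast is available on the fabric.

```python
import shlex


def build_mpirun_cmd(np_procs, net_device, min_team_size, app, tune="inf"):
    """Assemble an mpirun command line that enables TL_MLX5 MCAST Bcast.

    Hypothetical helper for illustration; not part of UCC or Open MPI.
    """
    # UCC_TL_MLX5_MIN_TEAM_SIZE must satisfy 2 <= N <= number of processes.
    if not 2 <= min_team_size <= np_procs:
        raise ValueError(
            f"UCC_TL_MLX5_MIN_TEAM_SIZE must be in [2, {np_procs}], "
            f"got {min_team_size}"
        )
    env = {
        "UCC_TL_MLX5_MCAST_NET_DEVICE": net_device,
        "UCC_TL_MLX5_MCAST_ENABLE": "1",
        "UCC_TL_MLX5_MIN_TEAM_SIZE": str(min_team_size),
        "UCC_TL_MLX5_TUNE": tune,
        "OMPI_UCC_CL_BASIC_TLS": "^sharp,nccl",
        "OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS": "^sharp,nccl,shm,cuda",
    }
    cmd = ["mpirun", "-np", str(np_procs)]
    for name, value in env.items():
        # "-x NAME=VALUE" exports the variable to every launched process.
        cmd += ["-x", f"{name}={value}"]
    return cmd + list(app)


if __name__ == "__main__":
    # Example: an 8-process job on mlx5_0 port 1 running a hypothetical ./my_app.
    print(shlex.join(build_mpirun_cmd(8, "mlx5_0:1", 8, ["./my_app"])))
```

shlex.join quotes values containing shell-sensitive characters (such as ^ and #), so the printed command can be pasted into a shell as-is.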
Alternatively, users can tune algorithm selection per memory type through the UCC_TL_MLX5_TUNE variable. For example:
-x UCC_TL_MLX5_TUNE=bcast:host:inf#cuda,cuda_managed:0 (assigns maximum priority to the Bcast algorithms for host memory and disables TL_MLX5 for cuda and cuda_managed memory)
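The following sketch makes the structure of the tuning string concrete. It splits a TUNE value into sections on #, then splits each section into :-separated fields whose last field is the score. This is a simplified, hypothetical parser that covers only the collective-type, memory-type, and score fields used in this document. The real UCC tuning grammar may accept further fields that this sketch does not handle, and the memory types it recognizes are limited to the three named here.

```python
# Assumption: only the memory types that appear in this document.
KNOWN_MEM_TYPES = {"host", "cuda", "cuda_managed"}


def parse_tune(tune):
    """Split a TUNE string into (collectives, memory types, score) tuples.

    Simplified, hypothetical parser; "all" marks a field that was omitted.
    """
    sections = []
    for token in tune.split("#"):
        fields = token.split(":")
        # The last field is the score: "inf" means maximum priority.
        score_str = fields[-1]
        score = float("inf") if score_str == "inf" else int(score_str)
        colls, mems = [], []
        for field in fields[:-1]:
            items = field.split(",")
            # A field made only of known memory types selects memory types;
            # anything else is treated as a list of collective types.
            if all(item in KNOWN_MEM_TYPES for item in items):
                mems += items
            else:
                colls += items
        sections.append((colls or ["all"], mems or ["all"], score))
    return sections


if __name__ == "__main__":
    for colls, mems, score in parse_tune("bcast:host:inf#cuda,cuda_managed:0"):
        print(f"collectives={colls} memory={mems} score={score}")
```

For the example value above, the first section gives Bcast on host memory an infinite score, and the second gives every collective on cuda and cuda_managed memory a score of 0, which matches the described behavior of disabling TL_MLX5 for those memory types.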