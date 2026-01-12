This feature implements the MCAST Bcast algorithm in UCC. The algorithm is disabled by default; to activate it, users must set the following environment variables:

-x UCC_TL_MLX5_MCAST_NET_DEVICE=<HCA> (e.g., mlx5_0:1)

-x UCC_TL_MLX5_MCAST_ENABLE=1 (Enables MCAST algorithms in TL_MLX5)

-x UCC_TL_MLX5_MIN_TEAM_SIZE=N (where 2 <= N <= the number of processes in the job)

-x UCC_TL_MLX5_TUNE=inf (Sets the maximum priority for all MLX5 algorithms)

Additionally, users should adjust the following Open MPI variables:

-x OMPI_UCC_CL_BASIC_TLS=^sharp,nccl

-x OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=^sharp,nccl,shm,cuda
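
As a minimal sketch, the flags above can be assembled into a single Open MPI launch line. The helper name `build_mcast_cmd`, the `-np` process count, and the application path `./my_bcast_app` are hypothetical placeholders, not part of this feature. The function only prints the command so it can be reviewed before running; the leading `^` in the TLS lists is assumed to exclude the named transports.

```shell
# build_mcast_cmd HCA NP MIN_TEAM_SIZE APP
# Prints an mpirun command line carrying the MCAST Bcast settings listed above.
# All four arguments are caller-supplied; nothing is launched here.
build_mcast_cmd() {
  hca="$1"; np="$2"; min_team="$3"; app="$4"

  # MIN_TEAM_SIZE must satisfy 2 <= N <= number of processes in the job.
  if [ "$min_team" -lt 2 ] || [ "$min_team" -gt "$np" ]; then
    echo "MIN_TEAM_SIZE must be between 2 and $np" >&2
    return 1
  fi

  # Emit each token followed by a space, then the application and a newline.
  printf '%s ' mpirun -np "$np" \
    -x "UCC_TL_MLX5_MCAST_NET_DEVICE=$hca" \
    -x "UCC_TL_MLX5_MCAST_ENABLE=1" \
    -x "UCC_TL_MLX5_MIN_TEAM_SIZE=$min_team" \
    -x "UCC_TL_MLX5_TUNE=inf" \
    -x "OMPI_UCC_CL_BASIC_TLS=^sharp,nccl" \
    -x "OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=^sharp,nccl,shm,cuda"
  printf '%s\n' "$app"
}

# Example: 4 processes on HCA mlx5_0 port 1, minimum team size 2.
build_mcast_cmd "mlx5_0:1" 4 2 ./my_bcast_app
```

The printed line can be copied into a job script once the HCA name and process count match the actual cluster.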

Alternatively, users can customize the algorithm tuning for specific memory types by configuring the UCC_TL_MLX5_TUNE variable: