CPU/GPU Bcast

NVIDIA HPC-X Software Toolkit Rev 2.19.0

This feature implements the MCAST (multicast) Bcast algorithm in UCC. The algorithm is disabled by default. To enable it, set the following environment variables (shown here as mpirun -x arguments):

  • -x UCC_TL_MLX5_MCAST_NET_DEVICE=<HCA> (e.g., mlx5_0:1)

  • -x UCC_TL_MLX5_MCAST_ENABLE=1 (Enables MCAST algorithms in TL_MLX5)

  • -x UCC_TL_MLX5_MIN_TEAM_SIZE=N (N must be at least 2 and no greater than the number of processes in the job)

  • -x UCC_TL_MLX5_TUNE=inf (Sets the maximum priority for all MLX5 algorithms)

Additionally, users should adjust the following Open MPI variables:

  • -x OMPI_UCC_CL_BASIC_TLS=^sharp,nccl

  • -x OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=^sharp,nccl,shm,cuda
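
The settings above can be combined into a single launch command. The following is a minimal sketch, not an official launch recipe: the process count, the minimum team size, and the application name ./my_app are hypothetical placeholders, and the HCA value is the example from the list above. It assumes Open MPI's mpirun with UCC already selected as the collective component, and it only prints the command (a dry run) rather than launching a job.

```shell
# Placeholder values; adjust for the actual cluster and job.
HCA="mlx5_0:1"     # HCA:port, taken from the example above
NP=4               # number of processes in the job (hypothetical)
MIN_TEAM_SIZE=2    # must satisfy 2 <= MIN_TEAM_SIZE <= NP
APP="./my_app"     # hypothetical MPI application

build_mcast_cmd() {
  # Enforce the documented range for UCC_TL_MLX5_MIN_TEAM_SIZE.
  if [ "$MIN_TEAM_SIZE" -lt 2 ] || [ "$MIN_TEAM_SIZE" -gt "$NP" ]; then
    echo "MIN_TEAM_SIZE must be between 2 and $NP" >&2
    return 1
  fi
  # Print each argument followed by a space, then the application name.
  printf '%s ' mpirun -np "$NP" \
    -x "UCC_TL_MLX5_MCAST_NET_DEVICE=$HCA" \
    -x "UCC_TL_MLX5_MCAST_ENABLE=1" \
    -x "UCC_TL_MLX5_MIN_TEAM_SIZE=$MIN_TEAM_SIZE" \
    -x "UCC_TL_MLX5_TUNE=inf" \
    -x 'OMPI_UCC_CL_BASIC_TLS=^sharp,nccl' \
    -x 'OMPI_UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=^sharp,nccl,shm,cuda'
  printf '%s\n' "$APP"
}

# Dry run: show the assembled command line.
build_mcast_cmd
```

Printing the command first makes it easy to review the variable values before submitting the job.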

Alternatively, users can customize the algorithm tuning for specific memory types by configuring the UCC_TL_MLX5_TUNE variable:

  • -x UCC_TL_MLX5_TUNE=bcast:host:inf#cuda,cuda_managed:0 (Sets the maximum priority for Bcast algorithms on host memory and disables MLX5 for cuda and cuda_managed memory)
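
To make the structure of this value easier to see, the sketch below assembles it from its two parts. The variable names HOST_RULE and CUDA_RULE are hypothetical and used only for illustration; the rule strings themselves are copied from the example above. The script prints the resulting mpirun argument and does not launch anything.

```shell
# Two rules, joined with '#', copied from the example above:
#   bcast:host:inf       maximum priority for Bcast on host memory
#   cuda,cuda_managed:0  priority 0 (disabled) for cuda and cuda_managed memory
HOST_RULE="bcast:host:inf"
CUDA_RULE="cuda,cuda_managed:0"
TUNE="${HOST_RULE}#${CUDA_RULE}"

# Dry run: print the argument, quoted for safe pasting into a shell.
printf '%s\n' "-x 'UCC_TL_MLX5_TUNE=${TUNE}'"
```

This argument would replace -x UCC_TL_MLX5_TUNE=inf in the launch command.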

© Copyright 2024, NVIDIA. Last updated on May 6, 2024.