EC/CUDA One-shot Kernel with Cooperative Launch
This feature improves the performance of UCC GPU collectives by using the CUDA cooperative launch feature, which allows the CUDA operations of a GPU collective to run in a single ("one-shot") CUDA kernel.
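For background, the following is a generic sketch of what cooperative launch provides. It is not UCC's kernel. The names two_phase and run_two_phase are hypothetical. A kernel started with cudaLaunchCooperativeKernel may call grid.sync(), a barrier across every block in the grid, so two dependent phases can run in one kernel instead of two separate launches. This assumes a device that reports cudaDevAttrCooperativeLaunch.

```cuda
// Hypothetical illustration of CUDA cooperative launch (not UCC's implementation).
#include <cassert>
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Phase 1 writes tmp[]; grid.sync() waits for all blocks; phase 2 reads
// elements that other blocks wrote. Without a grid-wide barrier this
// would need two separate kernel launches.
__global__ void two_phase(const float *in, float *tmp, float *out, int n) {
    cg::grid_group grid = cg::this_grid();
    for (unsigned long long i = grid.thread_rank(); i < (unsigned long long)n; i += grid.size())
        tmp[i] = 2.0f * in[i];
    grid.sync();  // grid-wide barrier, valid only under cooperative launch
    for (unsigned long long i = grid.thread_rank(); i < (unsigned long long)n; i += grid.size())
        out[i] = tmp[i] + tmp[n - 1 - i];
}

// Hypothetical host helper: runs two_phase on h_in and fills h_out.
// Returns false if the device lacks cooperative launch or a CUDA call fails.
bool run_two_phase(const std::vector<float> &h_in, std::vector<float> &h_out) {
    int dev = 0, coop = 0, sms = 0, per_sm = 0;
    const int threads = 256;
    int n = (int)h_in.size();
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) return false;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);

    // Every block must be resident at once, so cap the grid by occupancy.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&per_sm, two_phase, threads, 0);
    int blocks = per_sm * sms;
    int needed = (n + threads - 1) / threads;
    if (needed < blocks) blocks = needed;
    if (blocks < 1) blocks = 1;

    size_t bytes = (size_t)n * sizeof(float);
    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_tmp, bytes);
    cudaMalloc(&d_out, bytes);
    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // Kernel arguments are passed as an array of pointers to each argument.
    void *args[] = {&d_in, &d_tmp, &d_out, &n};
    if (err == cudaSuccess)
        err = cudaLaunchCooperativeKernel((void *)two_phase, dim3(blocks), dim3(threads), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    h_out.resize(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Because a cooperative grid cannot exceed what fits on the GPU at once, the grid size is capped by occupancy. The grid-stride loops let that smaller grid still cover all n elements.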
This feature can be activated by setting the UCC environment variable UCC_EC_CUDA_USE_COOPERATIVE_LAUNCH to 1:
UCC_EC_CUDA_USE_COOPERATIVE_LAUNCH=1