NVIDIA HPC-X Software Toolkit Rev 2.20.0

Unified Collective Communication (UCC)

Unified Collective Communication (UCC) was co-designed with industry partners for PyTorch-based deep learning recommender model training on multi-rail GPU platforms. It has also been designed and implemented specifically for high-performance PGAS applications and runtimes. UCC serves as a drop-in replacement for HCOLL and will gradually become the default collective library once it implements the full range of HCOLL's hierarchical algorithms.

For further information on what UCC is and how to use it, please see https://github.com/openucx/ucc

For the UCC PyTorch integration layer, Torch_UCC, please see https://github.com/facebookresearch/torch_ucc

Note

UCC is supported in both MPI and OSHMEM. However, it is not enabled by default.

  • To enable it in MPI, set -mca coll_ucc_enable to 1.

  • To enable it in OSHMEM, set -mca scoll_ucc_enable to 1.
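
As an illustration of the MPI case above, the following is a minimal sketch of a launch line with UCC collectives enabled. The application name (./my_mpi_app) and the rank count are hypothetical placeholders, and the snippet only prints the command rather than running it. The environment-variable form relies on the standard Open MPI convention of prefixing an MCA parameter name with OMPI_MCA_.

```shell
# Hypothetical application binary and rank count; adjust for your system.
APP=./my_mpi_app
NP=4

# Build the launch line with the UCC collective component enabled
# (it is disabled by default).
MPI_CMD="mpirun -np ${NP} -mca coll_ucc_enable 1 ${APP}"

# Print the command as a dry run; execute it yourself once it looks right.
echo "${MPI_CMD}"

# Alternative: set the same MCA parameter through the environment,
# so that a plain "mpirun -np 4 ./my_mpi_app" picks it up.
export OMPI_MCA_coll_ucc_enable=1
```

Setting the parameter on the command line keeps the choice visible per job, while the environment variable is convenient in batch scripts that launch several runs.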

© Copyright 2024, NVIDIA. Last updated on Aug 14, 2024.