.. _group-calls:

***********
Group Calls
***********

Group functions (ncclGroupStart/ncclGroupEnd) can be used to merge multiple calls into one. This is needed for three purposes: managing multiple GPUs from one thread (to avoid deadlocks), aggregating communication operations to improve performance, or merging multiple send/receive point-to-point operations (see the :ref:`point-to-point` section). All three usages can be combined, with one exception: calls to :c:func:`ncclCommInitRank` cannot be merged with others.

Management Of Multiple GPUs From One Thread
-------------------------------------------

When a single thread manages multiple devices, group semantics must be used. This is because every NCCL call may have to block, waiting for other threads/ranks to arrive, before effectively posting the NCCL operation on the given stream. Hence, a simple loop over multiple devices like the one shown below could block on the first call, waiting for the other ones:

.. code:: C

  for (int i=0; i