.. _group-calls:

***********
Group Calls
***********

Group functions (ncclGroupStart/ncclGroupEnd) can be used to merge multiple calls into one. This is needed for two purposes: managing multiple GPUs from one thread (to avoid deadlocks) and aggregating communication operations to improve performance.

Management Of Multiple GPUs From One Thread
-------------------------------------------

When a single thread is managing multiple devices, group semantics must be used. This is because every NCCL call may have to block, waiting for other threads/ranks to arrive, before actually posting the NCCL operation on the given stream. Hence, a simple loop over multiple devices, like the one shown below, could block on the first call while waiting for the other ones:

.. code:: C

  for (int i=0; i