.. _point-to-point:

****************************
Point-to-point communication
****************************

(Since NCCL 2.7) Point-to-point communication can be used to express any communication pattern between ranks.
Any point-to-point communication needs two NCCL calls: a call to :c:func:`ncclSend` on one rank and a corresponding :c:func:`ncclRecv` on the other rank, with the same count and data type.

Multiple calls to :c:func:`ncclSend` and :c:func:`ncclRecv` targeting different peers can be fused together with :c:func:`ncclGroupStart` and :c:func:`ncclGroupEnd` to form more complex communication patterns such as one-to-all (scatter), all-to-one (gather), all-to-all, or communication with neighbors in an N-dimensional space.

Point-to-point calls within a group block until that group of calls completes, but calls within a group can be seen as progressing independently and hence should never block each other. It is therefore important to merge calls that need to progress concurrently into the same group to avoid deadlocks. The only exception is point-to-point calls within a group targeting the *same* peer, which are executed in order.

Below are a few examples of classic point-to-point communication patterns used by parallel applications. NCCL semantics allow for all variants with different sizes, datatypes, and buffers, per rank.

Sendrecv
--------

In MPI terms, a sendrecv operation is when two ranks exchange data, both sending and receiving at the same time. This can be done by merging both the :c:func:`ncclSend` and :c:func:`ncclRecv` calls into one group:

.. code:: C

  ncclGroupStart();
  ncclSend(sendbuff, sendcount, sendtype, peer, comm, stream);
  ncclRecv(recvbuff, recvcount, recvtype, peer, comm, stream);
  ncclGroupEnd();

One-to-all (scatter)
--------------------

A one-to-all operation from a ``root`` rank can be expressed by merging all send and receive operations in a group:

.. code:: C

  ncclGroupStart();
  if (rank == root) {
    for (int r=0; r