Communicator Creation and Management Functions
The following public NCCL API functions create and manage the communicators used by collective communication operations.
ncclGetVersion
ncclResult_t ncclGetVersion(int* version)
The ncclGetVersion function stores the version number of the currently linked NCCL library in version. The version number is encoded as a single integer that combines the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH levels.
When the linked library matches the nccl.h used at compile time, the returned value is the same as the NCCL_VERSION_CODE defined in that header.
NCCL version numbers can be compared using the supplied macro NCCL_VERSION(MAJOR,MINOR,PATCH).
ncclGetUniqueId
ncclResult_t ncclGetUniqueId(ncclUniqueId* uniqueId)
Generates an Id to be used by ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator, and the Id should be distributed to all ranks in the communicator before they call ncclCommInitRank. uniqueId should point to an ncclUniqueId object allocated by the user.
ncclCommInitRank
ncclResult_t ncclCommInitRank(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank)
Creates a new communicator (multi-thread/multi-process version). rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a CUDA device, which must be set (for example with cudaSetDevice) before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with the other ranks, so each rank must be initialized from a different thread or process, or, when one thread initializes several ranks, the calls must be wrapped in ncclGroupStart/ncclGroupEnd.
ncclCommInitAll
ncclResult_t ncclCommInitAll(ncclComm_t* comms, int ndev, const int* devlist)
Creates a clique of communicators (single-process version).
This is a convenience function for creating a communicator clique within a single process.
It returns an array of ndev newly initialized communicators in comms.
comms should be pre-allocated with size at least ndev*sizeof(ncclComm_t).
devlist defines the CUDA devices associated with each rank. If devlist is NULL,
the first ndev CUDA devices are used, in order.
ncclCommDestroy
ncclResult_t ncclCommDestroy(ncclComm_t comm)
Frees the resources allocated to the communicator object comm. Waits for any uncompleted operations to finish before destroying the communicator.
ncclCommAbort
ncclResult_t ncclCommAbort(ncclComm_t comm)
Frees the resources allocated to the communicator object comm. Aborts any uncompleted operations before destroying the communicator.
ncclCommGetAsyncError
ncclResult_t ncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError)
Queries whether the communicator has encountered any asynchronous errors; the result is returned in asyncError. If there
has been an error on the communicator, the user should destroy the communicator with ncclCommAbort().
If an error occurs on the communicator, nothing can be assumed about the completion or correctness
of operations enqueued on that communicator.
ncclCommCount
ncclResult_t ncclCommCount(const ncclComm_t comm, int* count)
Returns in count the number of ranks in the NCCL communicator comm.
ncclCommCuDevice
ncclResult_t ncclCommCuDevice(const ncclComm_t comm, int* device)
Returns in device the CUDA device associated with the NCCL communicator comm.
ncclCommUserRank
ncclResult_t ncclCommUserRank(const ncclComm_t comm, int* rank)
Returns in rank the rank of the NCCL communicator comm, that is, the rank it was assigned when the communicator was created.