Communicator Creation and Management Functions

The following functions are public APIs exposed by NCCL to create and manage the communicators used by collective communication operations.

ncclGetVersion

ncclResult_t ncclGetVersion(int* version)

The ncclGetVersion function returns the version number of the currently linked NCCL library. The version number is returned in version and encoded as a single integer that combines the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH levels. The value returned is the same as the NCCL_VERSION_CODE defined in nccl.h. NCCL version numbers can be compared using the supplied macro NCCL_VERSION(MAJOR, MINOR, PATCH).
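The integer encoding can be illustrated with a small, self-contained sketch. It does not include nccl.h. Instead it defines a local stand-in macro, EXAMPLE_NCCL_VERSION, that mirrors what NCCL_VERSION is assumed to compute: major*1000 + minor*100 + patch up to version 2.8, and major*10000 + minor*100 + patch from 2.9 on. That encoding, the helper names and the struct are assumptions made for illustration; check the macro in your own nccl.h before relying on them.

```c
#include <stdio.h>

/* Local stand-in for the NCCL_VERSION macro from nccl.h.
 * ASSUMPTION: this mirrors the encoding used by the real macro. */
#define EXAMPLE_NCCL_VERSION(X, Y, Z)                         \
    (((X) <= 2 && (Y) <= 8) ? (X) * 1000 + (Y) * 100 + (Z)    \
                            : (X) * 10000 + (Y) * 100 + (Z))

/* Hypothetical container for the three version levels. */
typedef struct {
    int major_level;
    int minor_level;
    int patch_level;
} example_version_t;

/* Splits an encoded version (as written by ncclGetVersion) into its levels. */
static example_version_t example_decode_version(int code) {
    example_version_t v;
    if (code < 10000) {
        /* Older encoding: major*1000 + minor*100 + patch */
        v.major_level = code / 1000;
        v.minor_level = (code % 1000) / 100;
    } else {
        /* Newer encoding: major*10000 + minor*100 + patch */
        v.major_level = code / 10000;
        v.minor_level = (code % 10000) / 100;
    }
    v.patch_level = code % 100;
    return v;
}

/* Returns nonzero if the encoded version is at least major.minor.patch. */
static int example_at_least(int code, int major, int minor, int patch) {
    return code >= EXAMPLE_NCCL_VERSION(major, minor, patch);
}
```

In a program linked against NCCL, the value passed as code would be the integer written by ncclGetVersion, and the comparison would normally use the NCCL_VERSION macro from nccl.h directly.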

ncclGetUniqueId

ncclResult_t ncclGetUniqueId(ncclUniqueId* uniqueId)

Generates an ID to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator, and the resulting ID should be distributed to all ranks in the communicator before they call ncclCommInitRank. uniqueId should point to an ncclUniqueId object allocated by the user.

ncclCommInitRank

ncclResult_t ncclCommInitRank(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank)

Creates a new communicator (multi thread/process version). rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a CUDA device, which has to be set before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with the other ranks, so the calls for different ranks must either come from different threads or processes, or be enclosed in ncclGroupStart/ncclGroupEnd when a single thread initializes several ranks.

ncclCommInitAll

ncclResult_t ncclCommInitAll(ncclComm_t* comms, int ndev, const int* devlist)

Creates a clique of communicators (single process version). This is a convenience function for the case where one process drives all devices. Returns an array of ndev newly initialized communicators in comms. comms should be pre-allocated by the caller with a size of at least ndev*sizeof(ncclComm_t). devlist defines the CUDA device associated with each rank. If devlist is NULL, the first ndev CUDA devices are used, in order.

ncclCommDestroy

ncclResult_t ncclCommDestroy(ncclComm_t comm)

Frees the resources allocated to the communicator object comm. Waits for any outstanding operations to complete before destroying the communicator.

ncclCommAbort

ncclResult_t ncclCommAbort(ncclComm_t comm)

Frees the resources allocated to the communicator object comm. Aborts any outstanding operations before destroying the communicator.

ncclCommGetAsyncError

ncclResult_t ncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError)

Queries whether the communicator has encountered any asynchronous errors. The result is returned in asyncError. If an error has occurred on the communicator, the user should destroy it with ncclCommAbort(), and nothing can be assumed about the completion or correctness of operations enqueued on that communicator.

ncclCommCount

ncclResult_t ncclCommCount(const ncclComm_t comm, int* count)

Returns in count the number of ranks in the NCCL communicator comm.

ncclCommCuDevice

ncclResult_t ncclCommCuDevice(const ncclComm_t comm, int* device)

Returns in device the CUDA device associated with the NCCL communicator comm.

ncclCommUserRank

ncclResult_t ncclCommUserRank(const ncclComm_t comm, int* rank)

Returns in rank the rank of the caller within the NCCL communicator comm.