**********************************************
Communicator Creation and Management Functions
**********************************************

The following public NCCL functions create and manage the communicators on which collective communication operations run.

ncclGetLastError
----------------

.. c:function:: const char* ncclGetLastError(ncclComm_t comm)

Returns a human-readable string of the last error that occurred in NCCL.
Note: The error is not cleared by calling this function.
The *comm* argument is currently unused and can be set to NULL.
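
A minimal sketch of how this string might be surfaced when a call fails. The helper name ``report_nccl_failure`` is hypothetical, and ``ncclGetErrorString`` is assumed to be available from *nccl.h* alongside the function documented here.

.. code:: C

   #include <stdio.h>
   #include <nccl.h>

   /* Hypothetical helper (not part of NCCL): print diagnostics for a failed
    * call. Returns 0 when res is ncclSuccess and -1 otherwise. */
   static int report_nccl_failure(ncclResult_t res, const char* what) {
     if (res == ncclSuccess) return 0;
     /* Short description of the result code itself. */
     fprintf(stderr, "%s failed: %s\n", what, ncclGetErrorString(res));
     /* The comm argument is currently unused, so NULL is passed. */
     fprintf(stderr, "last NCCL error: %s\n", ncclGetLastError(NULL));
     return -1;
   }

Because the error is not cleared by the call, the same text may be printed again by a later invocation even if no new error occurred.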


ncclGetVersion
--------------

.. c:function:: ncclResult_t ncclGetVersion(int* version)

The ncclGetVersion function returns the version number of the currently linked NCCL library.
The NCCL version number is returned in *version* and encoded as an integer which includes the
:c:macro:`NCCL_MAJOR`, :c:macro:`NCCL_MINOR` and :c:macro:`NCCL_PATCH` levels.
The version number returned will be the same as the :c:macro:`NCCL_VERSION_CODE` defined in *nccl.h*.
NCCL version numbers can be compared using the supplied macro :c:macro:`NCCL_VERSION(MAJOR,MINOR,PATCH)`.
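
The integer encoding can be illustrated with a small decoder. This is a sketch under an assumption that is not stated above: that releases up to 2.8 encode the version as major*1000 + minor*100 + patch, and later releases as major*10000 + minor*100 + patch. The helper ``decode_nccl_version`` is hypothetical and deliberately does not depend on *nccl.h*.

.. code:: C

   /* Hypothetical helper (not part of NCCL): split an encoded version
    * integer into its major, minor and patch levels. */
   static void decode_nccl_version(int code, int* major, int* minor, int* patch) {
     /* Assumed encoding: codes below 10000 use a major multiplier of 1000
      * (releases up to 2.8); larger codes use 10000. */
     int major_scale = (code < 10000) ? 1000 : 10000;
     *major = code / major_scale;
     *minor = (code % major_scale) / 100;
     *patch = code % 100;
   }

In a program linked against NCCL, *code* would be the value written by ncclGetVersion. Under the stated assumption, 21805 decodes to 2.18.5. For a plain ordering check, comparing the integer directly against the macro is usually simpler than decoding.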


ncclGetUniqueId
---------------

.. c:function:: ncclResult_t ncclGetUniqueId(ncclUniqueId* uniqueId)

Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be
called once when creating a communicator, and the resulting Id should be distributed to all ranks in the
communicator before they call ncclCommInitRank. *uniqueId* should point to an ncclUniqueId object allocated by the user.

ncclCommInitRank
----------------

.. c:function:: ncclResult_t ncclCommInitRank(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank)

Creates a new communicator (multi-thread/multi-process version).
*rank* must be between 0 and *nranks*-1 and unique within a communicator clique.
Each rank is associated with a CUDA device, which has to be set before calling
ncclCommInitRank.
ncclCommInitRank implicitly synchronizes with the other ranks, so it must either be
called from different threads/processes or be wrapped in ncclGroupStart/ncclGroupEnd.
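
The grouped form can be sketched as follows for a single thread that drives several devices. The helper name ``init_comms_single_thread`` and the rank-to-device mapping (rank *i* on device *i*) are assumptions made for illustration, and the error handling is deliberately simple.

.. code:: C

   #include <cuda_runtime.h>
   #include <nccl.h>

   /* Hypothetical helper: create one communicator per device from a single
    * thread. comms must have room for ndev entries. */
   static ncclResult_t init_comms_single_thread(ncclComm_t* comms, int ndev) {
     int available = 0;
     if (comms == NULL || ndev <= 0) return ncclInvalidArgument;
     if (cudaGetDeviceCount(&available) != cudaSuccess) return ncclUnhandledCudaError;
     if (ndev > available) return ncclInvalidArgument;

     /* One Id is generated once and shared by every rank. */
     ncclUniqueId id;
     ncclResult_t res = ncclGetUniqueId(&id);
     if (res != ncclSuccess) return res;

     /* Grouping lets one thread issue all the init calls without
      * blocking on the implicit synchronization between ranks. */
     res = ncclGroupStart();
     if (res != ncclSuccess) return res;
     ncclResult_t first_error = ncclSuccess;
     for (int rank = 0; rank < ndev; rank++) {
       /* The device for this rank has to be current before the init call. */
       if (cudaSetDevice(rank) != cudaSuccess) {
         first_error = ncclUnhandledCudaError;
         break;
       }
       res = ncclCommInitRank(&comms[rank], ndev, id, rank);
       if (res != ncclSuccess) {
         first_error = res;
         break;
       }
     }
     /* The group is closed even after a failure so it is not left open. */
     res = ncclGroupEnd();
     return (first_error != ncclSuccess) ? first_error : res;
   }

In a multi-process setup the Id would instead be generated by one process and sent to the others through an out-of-band channel chosen by the application, and each process would call ncclCommInitRank directly.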

ncclCommInitAll
---------------

.. c:function:: ncclResult_t ncclCommInitAll(ncclComm_t* comms, int ndev, const int* devlist)

Creates a clique of communicators within a single process, in a blocking way.
This is a convenience function for the single-process case.
Returns an array of *ndev* newly initialized communicators in *comms*.
*comms* should be pre-allocated with size at least ndev*sizeof(:c:type:`ncclComm_t`).
*devlist* defines the CUDA devices associated with each rank. If *devlist* is NULL,
the first *ndev* CUDA devices are used, in order.
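
A minimal sketch of the allocation and call, assuming the default device order is acceptable. The helper name ``create_clique`` is hypothetical.

.. code:: C

   #include <stdlib.h>
   #include <nccl.h>

   /* Hypothetical helper: allocate and initialize ndev communicators.
    * Returns NULL on failure; on success the caller owns the array and
    * is responsible for destroying each communicator and freeing it. */
   static ncclComm_t* create_clique(int ndev) {
     if (ndev <= 0) return NULL;
     ncclComm_t* comms = (ncclComm_t*)malloc((size_t)ndev * sizeof(ncclComm_t));
     if (comms == NULL) return NULL;
     /* A NULL devlist selects the first ndev CUDA devices, in order. */
     if (ncclCommInitAll(comms, ndev, NULL) != ncclSuccess) {
       free(comms);
       return NULL;
     }
     return comms;
   }

To use a specific set of devices, an array such as ``int devlist[2] = {2, 3};`` could be passed instead of NULL, in which case rank 0 would run on device 2 and rank 1 on device 3.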

ncclCommInitRankConfig
----------------------

.. c:function:: ncclResult_t ncclCommInitRankConfig(ncclComm_t* comm, int nranks, ncclUniqueId commId, int rank, ncclConfig_t* config)

This function works the same way as *ncclCommInitRank* but accepts a configuration argument that sets extra attributes for
the communicator. If *config* is NULL, the communicator has the default behavior, as if ncclCommInitRank
were called.

See the :ref:`init-rank-config` section for details on configuration options.
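
As an illustration, the sketch below requests a nonblocking communicator and waits for initialization to finish. It assumes that ``ncclConfig_t`` is initialized with ``NCCL_CONFIG_INITIALIZER`` and exposes a ``blocking`` field, as described in the configuration section referenced above. The helper name ``init_nonblocking`` is hypothetical.

.. code:: C

   #include <nccl.h>

   /* Hypothetical helper: create a nonblocking communicator and poll until
    * initialization has finished. The CUDA device for this rank must already
    * be set by the caller. */
   static ncclResult_t init_nonblocking(ncclComm_t* comm, int nranks,
                                        ncclUniqueId id, int rank) {
     ncclConfig_t config = NCCL_CONFIG_INITIALIZER;
     config.blocking = 0;  /* assumed field: 0 requests nonblocking behavior */

     ncclResult_t res = ncclCommInitRankConfig(comm, nranks, id, rank, &config);
     if (res != ncclSuccess && res != ncclInProgress) return res;

     /* Busy-poll for simplicity; a real program might do other work here. */
     ncclResult_t state = ncclInProgress;
     while (state == ncclInProgress) {
       res = ncclCommGetAsyncError(*comm, &state);
       if (res != ncclSuccess) return res;
     }
     return state;
   }

Passing NULL instead of ``&config`` would give the same behavior as ncclCommInitRank.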

ncclCommFinalize
----------------

.. c:function:: ncclResult_t ncclCommFinalize(ncclComm_t comm)

Finalizes a communicator object *comm*. When the communicator is marked as nonblocking, *ncclCommFinalize* is a
nonblocking function. A successful return sets the communicator state to *ncclInProgress*, indicating that
the communicator is being finalized: all uncompleted operations and the network-related resources are
being flushed and freed.
Once all NCCL operations are complete, the communicator transitions to the *ncclSuccess* state. Users
can query that state with *ncclCommGetAsyncError*.

ncclCommDestroy
---------------

.. c:function:: ncclResult_t ncclCommDestroy(ncclComm_t comm)

Destroys a communicator object *comm*.
If *ncclCommFinalize* was previously called on the communicator, *ncclCommDestroy* only frees the local resources
allocated to *comm*; otherwise, *ncclCommDestroy* calls ncclCommFinalize internally.
If users call *ncclCommFinalize* themselves, they should make sure that the state of the communicator has become *ncclSuccess* before
calling *ncclCommDestroy*.
In all cases, the communicator should no longer be accessed after ncclCommDestroy returns. It is recommended that
users call *ncclCommFinalize* and then *ncclCommDestroy*.
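
The recommended sequence can be sketched as follows. The helper name ``shutdown_comm`` is hypothetical, and the busy-polling loop is a simplification.

.. code:: C

   #include <nccl.h>

   /* Hypothetical helper: finalize, wait for the ncclSuccess state, then
    * destroy. Returns the first result that is not a success. */
   static ncclResult_t shutdown_comm(ncclComm_t comm) {
     ncclResult_t res = ncclCommFinalize(comm);
     if (res != ncclSuccess && res != ncclInProgress) return res;

     /* Wait until finalization is no longer in progress. */
     ncclResult_t state = ncclInProgress;
     do {
       res = ncclCommGetAsyncError(comm, &state);
       if (res != ncclSuccess) return res;
     } while (state == ncclInProgress);

     /* On error the communicator is left alone here; the caller may
      * prefer ncclCommAbort in that case. */
     if (state != ncclSuccess) return state;
     return ncclCommDestroy(comm);
   }

After the helper returns ncclSuccess, the handle must not be used again.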

ncclCommAbort
-------------

.. c:function:: ncclResult_t ncclCommAbort(ncclComm_t comm)

Frees the resources allocated to a communicator object *comm*. Any uncompleted
operations are aborted before the communicator is destroyed.

ncclCommGetAsyncError
---------------------

.. c:function:: ncclResult_t ncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError)

Queries the progress and potential errors of asynchronous NCCL operations.
Operations which do not require a stream argument (e.g. ncclCommFinalize) can be considered complete as soon
as the function returns *ncclSuccess*; operations with a stream argument (e.g. ncclAllReduce) return
*ncclSuccess* as soon as the operation is posted on the stream but may also report errors through
ncclCommGetAsyncError() until they are completed. If the return code of any NCCL function is *ncclInProgress*,
the operation is still being enqueued in the background, and users must query the states
of the communicators until all of them become *ncclSuccess* before calling the next NCCL function. Until the
states change to *ncclSuccess*, users are not allowed to issue CUDA kernels to the streams being used by NCCL.
If there has been an error on the communicator, users should destroy the communicator with :c:func:`ncclCommAbort`.
If an error occurs on the communicator, nothing can be assumed about the completion or correctness of operations
enqueued on that communicator.
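
A single polling step that follows these rules might look like the sketch below. The helper name ``poll_comm`` and its return convention are hypothetical, and ``ncclGetErrorString`` is assumed to be available from *nccl.h*.

.. code:: C

   #include <stdio.h>
   #include <nccl.h>

   /* Hypothetical helper: query the communicator once.
    * Returns  1 if the reported state is ncclSuccess,
    *          0 if an operation is still in progress,
    *         -1 if an error was found; comm is then aborted and must not
    *            be used again. */
   static int poll_comm(ncclComm_t comm) {
     ncclResult_t state = ncclSuccess;
     ncclResult_t res = ncclCommGetAsyncError(comm, &state);
     if (res != ncclSuccess) state = res;  /* treat a failed query as an error */

     if (state == ncclSuccess) return 1;
     if (state == ncclInProgress) return 0;

     fprintf(stderr, "NCCL async error: %s\n", ncclGetErrorString(state));
     ncclCommAbort(comm);
     return -1;
   }

A caller would typically repeat the call while it returns 0 and refrain from issuing CUDA kernels to the streams used by NCCL during that time.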

ncclCommCount
-------------

.. c:function:: ncclResult_t ncclCommCount(const ncclComm_t comm, int* count)

Returns in *count* the number of ranks in the NCCL communicator *comm*.

ncclCommCuDevice
----------------

.. c:function:: ncclResult_t ncclCommCuDevice(const ncclComm_t comm, int* device)

Returns in *device* the CUDA device associated with the NCCL communicator *comm*. 

ncclCommUserRank
----------------

.. c:function:: ncclResult_t ncclCommUserRank(const ncclComm_t comm, int* rank)

Returns in *rank* the rank of the caller in the NCCL communicator *comm*.
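
The three query functions can be combined to describe a communicator, as in this minimal sketch. The helper name ``describe_comm`` is hypothetical.

.. code:: C

   #include <stdio.h>
   #include <nccl.h>

   /* Hypothetical helper: print the rank, size and CUDA device of comm.
    * Returns the first result that is not ncclSuccess, if any. */
   static ncclResult_t describe_comm(ncclComm_t comm) {
     int count = 0, device = -1, rank = -1;
     ncclResult_t res;

     if ((res = ncclCommCount(comm, &count)) != ncclSuccess) return res;
     if ((res = ncclCommCuDevice(comm, &device)) != ncclSuccess) return res;
     if ((res = ncclCommUserRank(comm, &rank)) != ncclSuccess) return res;

     printf("rank %d of %d on CUDA device %d\n", rank, count, device);
     return ncclSuccess;
   }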