*****
Types
*****

The following types are used by the NCCL library.

ncclComm_t
----------

.. c:type:: ncclComm_t

   NCCL communicator. Points to an opaque structure inside NCCL.

ncclResult_t
------------

.. c:type:: ncclResult_t

   Return values for all NCCL functions. Possible values are:

   .. c:macro:: ncclSuccess

      (``0``)
      Function succeeded.

   .. c:macro:: ncclUnhandledCudaError

      (``1``)
      A call to a CUDA function failed.

   .. c:macro:: ncclSystemError

      (``2``)
      A call to the system failed.

   .. c:macro:: ncclInternalError

      (``3``)
      An internal check failed. This is due either to a bug in NCCL or to memory corruption.

   .. c:macro:: ncclInvalidArgument

      (``4``)
      One argument has an invalid value.

   .. c:macro:: ncclInvalidUsage

      (``5``)
      The call to NCCL is incorrect. This usually reflects a programming error.

   .. c:macro:: ncclRemoteError

      (``6``)
      A call failed, possibly due to a network error or a remote process exiting prematurely.

   .. c:macro:: ncclInProgress

      (``7``)
      An NCCL operation on the communicator is being enqueued and progressed in the background.

   Whenever a function returns an error (not ncclSuccess), NCCL should print a more detailed message when the environment variable :ref:`NCCL_DEBUG` is set to "WARN".

ncclDataType_t
--------------

.. c:type:: ncclDataType_t

   NCCL defines the following integral and floating-point data types.

   .. c:macro:: ncclInt8

      Signed 8-bit integer

   .. c:macro:: ncclChar

      Signed 8-bit integer

   .. c:macro:: ncclUint8

      Unsigned 8-bit integer

   .. c:macro:: ncclInt32

      Signed 32-bit integer

   .. c:macro:: ncclInt

      Signed 32-bit integer

   .. c:macro:: ncclUint32

      Unsigned 32-bit integer

   .. c:macro:: ncclInt64

      Signed 64-bit integer

   .. c:macro:: ncclUint64

      Unsigned 64-bit integer

   .. c:macro:: ncclFloat16

      16-bit floating-point number (half precision)

   .. c:macro:: ncclHalf

      16-bit floating-point number (half precision)

   .. c:macro:: ncclFloat32

      32-bit floating-point number (single precision)

   .. c:macro:: ncclFloat

      32-bit floating-point number (single precision)

   .. c:macro:: ncclFloat64

      64-bit floating-point number (double precision)

   .. c:macro:: ncclDouble

      64-bit floating-point number (double precision)

   .. c:macro:: ncclBfloat16

      16-bit floating-point number (truncated precision in bfloat16 format, CUDA 11 or later)

ncclRedOp_t
-----------

.. c:type:: ncclRedOp_t

   Defines the reduction operation.

   .. c:macro:: ncclSum

      Perform a sum (+) operation

   .. c:macro:: ncclProd

      Perform a product (*) operation

   .. c:macro:: ncclMin

      Perform a min operation

   .. c:macro:: ncclMax

      Perform a max operation

   .. c:macro:: ncclAvg

      Perform an average operation, i.e. a sum across all ranks, divided by the number of ranks.

ncclScalarResidence_t
---------------------

.. c:type:: ncclScalarResidence_t

   Indicates where (in which memory space) scalar arguments reside and when they can be dereferenced.

   .. c:macro:: ncclScalarHostImmediate

      The scalar resides in host memory and should be dereferenced in the most immediate way.

   .. c:macro:: ncclScalarDevice

      The scalar resides in device-visible memory and should be dereferenced once needed.

ncclConfig_t
------------

.. c:type:: ncclConfig_t

   A structure-based configuration that users can set to initialize a communicator. A newly created configuration must be initialized with NCCL_CONFIG_INITIALIZER.

   .. c:macro:: NCCL_CONFIG_INITIALIZER

      A configuration macro initializer that must be assigned to every newly created configuration.

   .. c:macro:: blocking

      This attribute can be set to the integer 0 or 1 to select nonblocking or blocking communicator behavior, respectively. Blocking is the default.
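The ``ncclRedOp_t`` operators can be illustrated without a GPU by reducing per-rank buffers serially on the host. The following is a minimal sketch, not NCCL code: ``demo_op_t`` and ``host_reduce`` are hypothetical names, and the enum only mirrors the operator names above. It shows in particular that ``ncclAvg`` is a sum across all ranks divided by the number of ranks.

```c
/* Hypothetical host-side stand-in for the ncclRedOp_t operators.
 * This enum is NOT the one from nccl.h; it only mirrors the names. */
typedef enum { OP_SUM, OP_PROD, OP_MIN, OP_MAX, OP_AVG } demo_op_t;

/* Reduce element i across nranks input buffers into out[i], the way an
 * AllReduce with the given operator would, but computed serially on the host. */
static void host_reduce(const float *const *inputs, int nranks, int count,
                        demo_op_t op, float *out) {
  for (int i = 0; i < count; ++i) {
    float acc = inputs[0][i];
    for (int r = 1; r < nranks; ++r) {
      float v = inputs[r][i];
      switch (op) {
        case OP_SUM:
        case OP_AVG: acc += v; break;          /* avg starts as a sum */
        case OP_PROD: acc *= v; break;
        case OP_MIN: acc = v < acc ? v : acc; break;
        case OP_MAX: acc = v > acc ? v : acc; break;
      }
    }
    /* ncclAvg semantics: sum across all ranks, then divide by the rank count. */
    if (op == OP_AVG) acc /= (float)nranks;
    out[i] = acc;
  }
}
```

With three hypothetical ranks holding ``{1, 4}``, ``{2, 5}`` and ``{3, 9}``, ``host_reduce(inputs, 3, 2, OP_AVG, out)`` yields ``{2, 6}``, while ``OP_MAX`` yields ``{3, 9}``.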
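NCCL collectives take a count of elements of a given ``ncclDataType_t`` rather than a byte count, so the element size of each data type listed above determines the buffer size in bytes. The sketch below is a hypothetical helper, ``demo_elem_size``, that keys on the type *name* so it compiles without ``nccl.h``; real code would switch on the ``ncclDataType_t`` enum instead. The 16-bit formats have no standard C type, so ``uint16_t`` is assumed here purely as a 2-byte storage stand-in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte size of one element for each ncclDataType_t
 * name documented above. Returns 0 for an unknown name. */
static size_t demo_elem_size(const char *name) {
  static const struct { const char *name; size_t size; } table[] = {
    {"ncclInt8", sizeof(int8_t)},     {"ncclChar", sizeof(int8_t)},
    {"ncclUint8", sizeof(uint8_t)},   {"ncclInt32", sizeof(int32_t)},
    {"ncclInt", sizeof(int32_t)},     {"ncclUint32", sizeof(uint32_t)},
    {"ncclInt64", sizeof(int64_t)},   {"ncclUint64", sizeof(uint64_t)},
    /* Half and bfloat16 have no standard C type; both occupy 16 bits. */
    {"ncclFloat16", sizeof(uint16_t)}, {"ncclHalf", sizeof(uint16_t)},
    {"ncclBfloat16", sizeof(uint16_t)},
    {"ncclFloat32", sizeof(float)},   {"ncclFloat", sizeof(float)},
    {"ncclFloat64", sizeof(double)},  {"ncclDouble", sizeof(double)},
  };
  for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
    if (strcmp(table[i].name, name) == 0) return table[i].size;
  return 0;
}
```

For example, a buffer of 1024 ``ncclHalf`` elements occupies ``1024 * demo_elem_size("ncclHalf")``, i.e. 2048 bytes.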
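With a nonblocking communicator (``blocking`` set to 0 in ``ncclConfig_t``), ``ncclInProgress`` is not a failure: it means the operation is still progressing in the background, and the caller should query the status again. The sketch below shows one way to classify the ``ncclResult_t`` codes documented above and poll until the state leaves "in progress". It is self-contained by assumption: ``demo_result_t`` copies the documented numeric values instead of including ``nccl.h``, and ``poll_fn``/``fake_poll`` are hypothetical stand-ins for a real status query such as ``ncclCommGetAsyncError``.

```c
/* Local mirror of the ncclResult_t values documented above, so the sketch
 * compiles without nccl.h. Real code should include nccl.h instead. */
typedef enum {
  demoSuccess = 0, demoUnhandledCudaError = 1, demoSystemError = 2,
  demoInternalError = 3, demoInvalidArgument = 4, demoInvalidUsage = 5,
  demoRemoteError = 6, demoInProgress = 7
} demo_result_t;

typedef enum { RESULT_OK, RESULT_PENDING, RESULT_FAILED } demo_kind_t;

/* ncclSuccess is done, ncclInProgress means "ask again", anything else failed. */
static demo_kind_t classify(demo_result_t r) {
  if (r == demoSuccess) return RESULT_OK;
  if (r == demoInProgress) return RESULT_PENDING;
  return RESULT_FAILED;
}

/* Poll a (hypothetical) status source until it stops reporting
 * "in progress", then return the final code. Real code may want to do
 * other work or back off between polls instead of spinning. */
static demo_result_t wait_until_done(demo_result_t (*poll_fn)(void *), void *ctx) {
  demo_result_t r;
  do {
    r = poll_fn(ctx);
  } while (classify(r) == RESULT_PENDING);
  return r;
}

/* Hypothetical fake status source: reports "in progress" until the
 * countdown in *ctx reaches zero, then reports success. */
static demo_result_t fake_poll(void *ctx) {
  int *remaining = (int *)ctx;
  if (*remaining > 0) { --*remaining; return demoInProgress; }
  return demoSuccess;
}
```

Starting ``fake_poll`` with a countdown of 3, ``wait_until_done`` polls four times and returns ``demoSuccess``; any other code (for example ``demoRemoteError``) is returned to the caller as a failure.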