Types¶

The following types are used by the NCCL library.

ncclComm_t¶

ncclComm_t¶: NCCL communicator. Points to an opaque structure inside NCCL.

ncclResult_t¶

ncclResult_t¶

Return values for all NCCL functions. Possible values are:

ncclSuccess¶: (0) Function succeeded.

ncclUnhandledCudaError¶: (1) A call to a CUDA function failed.

ncclSystemError¶: (2) A call to the system failed.

ncclInternalError¶: (3) An internal check failed. This is either a bug in NCCL or due to memory corruption.

ncclInvalidArgument¶: (4) One argument has an invalid value.

ncclInvalidUsage¶: (5) The call to NCCL is incorrect. This usually reflects a programming error.

Whenever a function returns an error (not ncclSuccess), NCCL should print a more detailed message when the environment variable NCCL_DEBUG is set to “WARN”.
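As an illustration of how these return values might be checked, here is a minimal sketch. The enum is a stand-in that mirrors the numeric values listed above so the snippet compiles without the NCCL headers; `describe_result` and `check_result` are hypothetical helper names, not part of NCCL. A real program would include `<nccl.h>` instead, where `ncclGetErrorString` provides the library's own message text.

```c
#include <stdio.h>

/* Stand-in for <nccl.h> so this sketch compiles on its own. The values mirror
 * the list above. In a real program, include <nccl.h> and remove this enum. */
typedef enum {
  ncclSuccess = 0,
  ncclUnhandledCudaError = 1,
  ncclSystemError = 2,
  ncclInternalError = 3,
  ncclInvalidArgument = 4,
  ncclInvalidUsage = 5
} ncclResult_t;

/* Hypothetical helper: map a result code to a short description. */
static const char *describe_result(ncclResult_t r) {
  switch (r) {
    case ncclSuccess:            return "success";
    case ncclUnhandledCudaError: return "a call to a CUDA function failed";
    case ncclSystemError:        return "a call to the system failed";
    case ncclInternalError:      return "an internal check failed";
    case ncclInvalidArgument:    return "an argument has an invalid value";
    case ncclInvalidUsage:       return "the call to NCCL is incorrect";
  }
  return "unknown result";
}

/* Hypothetical helper: report any non-success result on stderr and pass the
 * code back so the caller can decide how to recover. */
static ncclResult_t check_result(ncclResult_t r, const char *what) {
  if (r != ncclSuccess) {
    fprintf(stderr, "%s failed: %s (%d)\n", what, describe_result(r), (int)r);
  }
  return r;
}
```

A caller would typically wrap each NCCL call, for example `if (check_result(result, "all-reduce") != ncclSuccess) return 1;`, and set NCCL_DEBUG to WARN when more detail is needed.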

ncclDataType_t¶

ncclDataType_t¶

NCCL defines the following integer and floating-point data types.

ncclInt8¶: Signed 8-bit integer

ncclChar¶: Signed 8-bit integer

ncclUint8¶: Unsigned 8-bit integer

ncclInt32¶: Signed 32-bit integer

ncclInt¶: Signed 32-bit integer

ncclUint32¶: Unsigned 32-bit integer

ncclInt64¶: Signed 64-bit integer

ncclUint64¶: Unsigned 64-bit integer

ncclFloat16¶: 16-bit floating-point number (half precision)

ncclHalf¶: 16-bit floating-point number (half precision)

ncclFloat32¶: 32-bit floating-point number (single precision)

ncclFloat¶: 32-bit floating-point number (single precision)

ncclFloat64¶: 64-bit floating-point number (double precision)

ncclDouble¶: 64-bit floating-point number (double precision)

ncclBfloat16¶: 16-bit floating-point number (truncated precision in bfloat16 format, CUDA 11 or later)
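Because several of these names are aliases for the same width, a small lookup can make the element sizes explicit when computing buffer sizes. The sketch below uses a stand-in enum so it compiles without the NCCL headers; the numeric values of the enumerators are an assumption made for illustration, and `datatype_size` and `buffer_bytes` are hypothetical helpers, not NCCL functions. The sizes follow directly from the bit widths listed above.

```c
#include <stddef.h>

/* Stand-in for <nccl.h>. Only the aliasing (e.g. ncclChar == ncclInt8) matters
 * here; the numeric values are illustrative. In a real program, include
 * <nccl.h> and remove this enum. */
typedef enum {
  ncclInt8,    ncclChar   = ncclInt8,
  ncclUint8,
  ncclInt32,   ncclInt    = ncclInt32,
  ncclUint32,
  ncclInt64,
  ncclUint64,
  ncclFloat16, ncclHalf   = ncclFloat16,
  ncclFloat32, ncclFloat  = ncclFloat32,
  ncclFloat64, ncclDouble = ncclFloat64,
  ncclBfloat16
} ncclDataType_t;

/* Hypothetical helper: size in bytes of one element of the given type.
 * Returns 0 for a value outside the enum. */
static size_t datatype_size(ncclDataType_t t) {
  switch (t) {
    case ncclInt8:    case ncclUint8:                    return 1;
    case ncclFloat16: case ncclBfloat16:                 return 2;
    case ncclInt32:   case ncclUint32: case ncclFloat32: return 4;
    case ncclInt64:   case ncclUint64: case ncclFloat64: return 8;
  }
  return 0;
}

/* Hypothetical helper: bytes needed for a buffer of `count` elements. */
static size_t buffer_bytes(size_t count, ncclDataType_t t) {
  return count * datatype_size(t);
}
```

For instance, `buffer_bytes(1024, ncclFloat)` gives the allocation size for 1024 single-precision elements.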

ncclRedOp_t¶

ncclRedOp_t¶

Defines the reduction operation.

ncclSum¶: Perform a sum (+) operation

ncclProd¶: Perform a product (*) operation

ncclMin¶: Perform a min operation

ncclMax¶: Perform a max operation

ncclAvg¶: Perform an average operation, i.e. a sum across all ranks, divided by the number of ranks.
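To make the semantics of each operation concrete, the following host-side sketch combines one value per rank the way the descriptions above state. It is a reference model only and does not reflect how NCCL performs reductions internally. The enum is a stand-in whose numeric values are an assumption, and `reduce_reference` is a hypothetical name.

```c
#include <stddef.h>

/* Stand-in for <nccl.h>; enumerator values are illustrative. In a real
 * program, include <nccl.h> and remove this enum. */
typedef enum { ncclSum, ncclProd, ncclMax, ncclMin, ncclAvg } ncclRedOp_t;

/* Hypothetical reference model: combine one value contributed by each rank.
 * Requires nranks >= 1. */
static double reduce_reference(ncclRedOp_t op, const double *per_rank,
                               size_t nranks) {
  double acc = per_rank[0];
  for (size_t i = 1; i < nranks; ++i) {
    double v = per_rank[i];
    switch (op) {
      case ncclSum:
      case ncclAvg:  acc += v; break;            /* average starts as a sum */
      case ncclProd: acc *= v; break;
      case ncclMin:  if (v < acc) acc = v; break;
      case ncclMax:  if (v > acc) acc = v; break;
    }
  }
  if (op == ncclAvg) {
    acc /= (double)nranks;  /* sum across all ranks, divided by rank count */
  }
  return acc;
}
```

With four ranks contributing 1, 2, 3 and 4, this model yields 10 for ncclSum, 24 for ncclProd, 1 for ncclMin, 4 for ncclMax and 2.5 for ncclAvg.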

ncclScalarResidence_t¶

ncclScalarResidence_t¶

Indicates where (memory space) scalar arguments reside and when they can be dereferenced.

ncclScalarHostImmediate¶: The scalar resides in host memory and should be dereferenced in the most immediate way.

ncclScalarDevice¶: The scalar resides in device-visible memory and should be dereferenced once needed.
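The difference between the two values is mainly one of timing: an immediate scalar is read when it is passed in, while a device scalar is read later, when the operation needs it. The sketch below models only that timing on the host, with an ordinary pointer standing in for device-visible memory. It is not how NCCL is implemented; the enum is a stand-in with assumed numeric values, and `scalar_arg_t`, `capture_scalar` and `resolve_scalar` are hypothetical names.

```c
#include <stddef.h>

/* Stand-in for <nccl.h>; enumerator values are illustrative. In a real
 * program, include <nccl.h> and remove this enum. */
typedef enum { ncclScalarDevice, ncclScalarHostImmediate } ncclScalarResidence_t;

/* Hypothetical record of a scalar argument handed to a deferred operation. */
typedef struct {
  ncclScalarResidence_t residence;
  float captured;         /* used when residence == ncclScalarHostImmediate */
  const float *deferred;  /* used when residence == ncclScalarDevice */
} scalar_arg_t;

/* Hypothetical helper: accept a scalar argument. An immediate scalar is
 * dereferenced right away; otherwise only the pointer is kept. */
static scalar_arg_t capture_scalar(const float *scalar,
                                   ncclScalarResidence_t residence) {
  scalar_arg_t arg = { residence, 0.0f, NULL };
  if (residence == ncclScalarHostImmediate) {
    arg.captured = *scalar;  /* read now; caller may reuse the storage */
  } else {
    arg.deferred = scalar;   /* read later; storage must stay valid */
  }
  return arg;
}

/* Hypothetical helper: obtain the value at the point the operation needs it. */
static float resolve_scalar(const scalar_arg_t *arg) {
  return arg->residence == ncclScalarHostImmediate ? arg->captured
                                                   : *arg->deferred;
}
```

In this model, changing the scalar after `capture_scalar` has no effect on an immediate argument but is visible to a deferred one, which is why the memory behind a device scalar has to remain valid until it has been read.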