Types

The following types are used by the NCCL library.

ncclComm_t

ncclComm_t

NCCL communicator. Points to an opaque structure inside NCCL.

ncclResult_t

ncclResult_t

Return values for all NCCL functions. Possible values are:

ncclSuccess

(0) Function succeeded.

ncclUnhandledCudaError

(1) A call to a CUDA function failed.

ncclSystemError

(2) A call to the system failed.

ncclInternalError

(3) An internal check failed. This is either a bug in NCCL or due to memory corruption.

ncclInvalidArgument

(4) One argument has an invalid value.

ncclInvalidUsage

(5) The call to NCCL is incorrect. This usually reflects a programming error.

Whenever a function returns an error (any value other than ncclSuccess), NCCL should print a more detailed message if the environment variable NCCL_DEBUG is set to “WARN”.
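The result codes above lend themselves to a small reporting helper. The following is a minimal sketch, not NCCL code: the enum demo_result_t and the functions demo_result_name and demo_report are hypothetical stand-ins, introduced so that the example compiles without the NCCL headers. Only the numeric values and the names returned as strings come from the list above.

```c
#include <stdio.h>

/* Hypothetical stand-in for ncclResult_t so the sketch builds without nccl.h.
   The numeric values are the ones listed in this section. */
typedef enum {
    demoSuccess            = 0,
    demoUnhandledCudaError = 1,
    demoSystemError        = 2,
    demoInternalError      = 3,
    demoInvalidArgument    = 4,
    demoInvalidUsage       = 5
} demo_result_t;

/* Map a result code to the name used in this section. */
static const char *demo_result_name(demo_result_t r) {
    switch (r) {
    case demoSuccess:            return "ncclSuccess";
    case demoUnhandledCudaError: return "ncclUnhandledCudaError";
    case demoSystemError:        return "ncclSystemError";
    case demoInternalError:      return "ncclInternalError";
    case demoInvalidArgument:    return "ncclInvalidArgument";
    case demoInvalidUsage:       return "ncclInvalidUsage";
    }
    return "unknown result code";
}

/* Return 0 on success; otherwise print the failing call and code to stderr
   and return 1 so the caller can stop or clean up. */
static int demo_report(demo_result_t r, const char *what) {
    if (r == demoSuccess) {
        return 0;
    }
    fprintf(stderr, "%s failed: %s (%d)\n", what, demo_result_name(r), (int)r);
    return 1;
}
```

In a real program the same pattern would wrap each NCCL call and use ncclResult_t directly. NCCL also provides ncclGetErrorString for a human-readable description of a result code.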

ncclDataType_t

ncclDataType_t

NCCL defines the following integral and floating-point data types.

ncclInt8

Signed 8-bit integer

ncclChar

Signed 8-bit integer

ncclUint8

Unsigned 8-bit integer

ncclInt32

Signed 32-bit integer

ncclInt

Signed 32-bit integer

ncclUint32

Unsigned 32-bit integer

ncclInt64

Signed 64-bit integer

ncclUint64

Unsigned 64-bit integer

ncclFloat16

16-bit floating-point number (half precision)

ncclHalf

16-bit floating-point number (half precision)

ncclFloat32

32-bit floating-point number (single precision)

ncclFloat

32-bit floating-point number (single precision)

ncclFloat64

64-bit floating-point number (double precision)

ncclDouble

64-bit floating-point number (double precision)

ncclBfloat16

16-bit floating-point number (truncated precision in bfloat16 format, CUDA 11 or later)
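Because NCCL communication calls take a count in elements rather than bytes, it is often useful to know the size of one element when allocating buffers. The sketch below derives the element size from the bit widths listed above. The enum demo_dtype_t and the functions demo_dtype_size and demo_buffer_bytes are hypothetical stand-ins, not part of NCCL, so that the example compiles without nccl.h. The enumerator values are arbitrary and do not claim to match those of ncclDataType_t.

```c
#include <stddef.h>

/* Hypothetical stand-in for ncclDataType_t, one entry per distinct type
   listed above (ncclChar, ncclInt, ncclHalf, ncclFloat and ncclDouble
   share the descriptions of their sized counterparts). */
typedef enum {
    DEMO_INT8, DEMO_UINT8,
    DEMO_INT32, DEMO_UINT32,
    DEMO_INT64, DEMO_UINT64,
    DEMO_FLOAT16, DEMO_FLOAT32, DEMO_FLOAT64,
    DEMO_BFLOAT16
} demo_dtype_t;

/* Size of one element in bytes, taken from the bit widths in this section. */
static size_t demo_dtype_size(demo_dtype_t t) {
    switch (t) {
    case DEMO_INT8:    case DEMO_UINT8:                        return 1;
    case DEMO_FLOAT16: case DEMO_BFLOAT16:                     return 2;
    case DEMO_INT32:   case DEMO_UINT32:  case DEMO_FLOAT32:   return 4;
    case DEMO_INT64:   case DEMO_UINT64:  case DEMO_FLOAT64:   return 8;
    }
    return 0; /* unknown type */
}

/* Bytes needed for a buffer holding `count` elements of type `t`. */
static size_t demo_buffer_bytes(size_t count, demo_dtype_t t) {
    return count * demo_dtype_size(t);
}
```

For example, a buffer of 1024 half-precision elements needs demo_buffer_bytes(1024, DEMO_FLOAT16) bytes, that is, 2048.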

ncclRedOp_t

ncclRedOp_t

Defines the reduction operation.

ncclSum

Perform a sum (+) operation

ncclProd

Perform a product (*) operation

ncclMin

Perform a min operation

ncclMax

Perform a max operation

ncclAvg

Perform an average operation, i.e. a sum across all ranks, divided by the number of ranks.
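The meaning of each operation can be illustrated with a host-side reference reduction over one value per rank. This is only a sketch of the arithmetic described above, assuming double-precision inputs. The names demo_redop_t and demo_reduce are hypothetical and this is not how NCCL implements reductions.

```c
#include <stddef.h>

/* Hypothetical stand-in for ncclRedOp_t, used for illustration only. */
typedef enum { DEMO_SUM, DEMO_PROD, DEMO_MIN, DEMO_MAX, DEMO_AVG } demo_redop_t;

/* Combine one contribution per rank into a single value.
   per_rank must hold nranks values and nranks must be at least 1. */
static double demo_reduce(const double *per_rank, size_t nranks, demo_redop_t op) {
    double acc = per_rank[0];
    for (size_t i = 1; i < nranks; ++i) {
        double v = per_rank[i];
        switch (op) {
        case DEMO_SUM:
        case DEMO_AVG:  acc += v;               break; /* average starts as a sum */
        case DEMO_PROD: acc *= v;               break;
        case DEMO_MIN:  if (v < acc) acc = v;   break;
        case DEMO_MAX:  if (v > acc) acc = v;   break;
        }
    }
    if (op == DEMO_AVG) {
        acc /= (double)nranks; /* sum across all ranks divided by rank count */
    }
    return acc;
}
```

With four ranks contributing 1, 2, 3 and 6, this yields 12 for the sum, 36 for the product, 1 for the minimum, 6 for the maximum and 3 for the average.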

ncclScalarResidence_t

ncclScalarResidence_t

Indicates the memory space in which scalar arguments reside and when they can be dereferenced.

ncclScalarHostImmediate

The scalar resides in host memory and should be dereferenced in the most immediate way.

ncclScalarDevice

The scalar resides in device-visible memory and should be dereferenced once needed.
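The practical difference between the two values is the moment at which the scalar is read. The host-only simulation below illustrates that distinction under one assumption: "most immediate" is taken to mean that the value is copied when the operation is created, while "once needed" is taken to mean that the pointer is kept and read when the operation is applied. All names (demo_residence_t, demo_scaled_op_t, demo_op_create, demo_op_apply) are hypothetical, and no device memory is involved. In NCCL itself, a residence value accompanies scalar arguments such as the one passed to ncclRedOpCreatePreMulSum.

```c
/* Hypothetical stand-in for ncclScalarResidence_t. */
typedef enum { DEMO_SCALAR_DEVICE, DEMO_SCALAR_HOST_IMMEDIATE } demo_residence_t;

/* A toy "multiply by scalar" operation that records how its scalar is read. */
typedef struct {
    demo_residence_t residence;
    float captured;        /* host-immediate: value copied at creation time */
    const float *deferred; /* device: pointer dereferenced when applied */
} demo_scaled_op_t;

/* Create the operation. With DEMO_SCALAR_HOST_IMMEDIATE the scalar is read
   right away, so the caller may change or free it afterwards. With
   DEMO_SCALAR_DEVICE only the pointer is stored (assumed semantics). */
static demo_scaled_op_t demo_op_create(const float *scalar, demo_residence_t residence) {
    demo_scaled_op_t op;
    op.residence = residence;
    op.captured = 0.0f;
    op.deferred = 0;
    if (residence == DEMO_SCALAR_HOST_IMMEDIATE) {
        op.captured = *scalar;
    } else {
        op.deferred = scalar;
    }
    return op;
}

/* Apply the operation to x, reading the scalar according to its residence. */
static float demo_op_apply(const demo_scaled_op_t *op, float x) {
    float s = (op->residence == DEMO_SCALAR_HOST_IMMEDIATE) ? op->captured
                                                            : *op->deferred;
    return s * x;
}
```

If the scalar is changed between creation and use, the host-immediate operation still uses the original value, whereas the deferred one sees the new value. For the deferred case the memory must therefore remain valid until the operation has run.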