User Defined Reduction Operators

The following functions are public APIs exposed by NCCL to create and destroy custom reduction operators for use in reduction collectives.

ncclRedOpCreatePreMulSum

ncclResult_t ncclRedOpCreatePreMulSum(ncclRedOp_t* op, void* scalar, ncclDataType_t datatype, ncclScalarResidence_t residence, ncclComm_t comm)

Creates a new reduction operator that pre-multiplies input values by a given scalar locally before reducing them with peer values via summation. The operator may be used only with collectives launched against comm and datatype. The residence argument indicates whether the memory pointed to by scalar is dereferenced immediately by the host before this function returns (ncclScalarHostImmediate), or by the device during execution of the reduction collective (ncclScalarDevice). Upon return, the newly created operator’s handle is stored in op.
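The arithmetic described above can be illustrated with a small host-only model. This sketch does not call NCCL. The function name premul_sum_model, the flat rank-major input layout, and the per-rank scalars array are all assumptions made for illustration, the last one reflecting that each rank multiplies by the scalar it supplied locally.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only model of the pre-multiply-then-sum arithmetic; NOT an NCCL call.
 * Hypothetical layout: inputs[r * count + i] is element i contributed by rank r,
 * and scalars[r] is the scalar that rank r would have supplied. */
void premul_sum_model(const float *inputs, size_t nranks, size_t count,
                      const float *scalars, float *out)
{
    assert(inputs != NULL && scalars != NULL && out != NULL && nranks > 0);
    for (size_t i = 0; i < count; ++i) {
        float acc = 0.0f;
        for (size_t r = 0; r < nranks; ++r) {
            /* Each rank scales its own value first, then the results are summed. */
            acc += scalars[r] * inputs[r * count + i];
        }
        out[i] = acc;
    }
}
```

In this model, setting every entry of scalars to 1/nranks makes the result the element-wise average of the inputs, a common reason to pre-multiply before summing.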

ncclRedOpDestroy

ncclResult_t ncclRedOpDestroy(ncclRedOp_t op, ncclComm_t comm)

Destroys the reduction operator op. The operator must have been created by ncclRedOpCreatePreMulSum with the matching communicator comm. An operator may be destroyed as soon as the last NCCL function which is given that operator returns.