reduce#

nvmath.bindings.cutensor.reduce(
intptr_t handle,
intptr_t plan,
intptr_t alpha,
intptr_t a,
intptr_t beta,
intptr_t c,
intptr_t d,
intptr_t workspace,
uint64_t workspace_size,
intptr_t stream,
)[source]#

Performs the tensor reduction that is encoded by plan (see cutensorCreateReduction).

Parameters:
  • handle (intptr_t) – Opaque handle holding cuTENSOR’s library context.

  • plan (intptr_t) – Opaque handle holding the reduction execution plan (created by cutensorCreateReduction followed by cutensorCreatePlan).

  • alpha (intptr_t) – Scaling factor for a. Its data type is determined by ‘descCompute’ (see cutensorOperationDescriptorGetAttribute(desc, CUTENSOR_OPERATION_SCALAR_TYPE)). Pointer to host memory.

  • a (intptr_t) – Pointer to GPU-accessible memory holding the data of a. The data accessed via this pointer must not overlap with the elements written to d.

  • beta (intptr_t) – Scaling factor for c. Its data type is determined by ‘descCompute’ (see cutensorOperationDescriptorGetAttribute(desc, CUTENSOR_OPERATION_SCALAR_TYPE)). Pointer to host memory.

  • c (intptr_t) – Pointer to GPU-accessible memory holding the data of c.

  • d (intptr_t) – Pointer to GPU-accessible memory holding the data of d (the output).

  • workspace (intptr_t) – Scratchpad (device) memory of at least workspace_size bytes; the workspace must be aligned to 256 bytes (i.e., the default alignment of cudaMalloc).

  • workspace_size (uint64_t) – Size of the workspace in bytes. Please use estimate_workspace_size() to query the required workspace size.

  • stream (intptr_t) – The CUDA stream in which all the computation is performed.

See also

cutensorReduce
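For reference, the operation a reduction plan encodes computes D = alpha * reduce_op(A over the contracted modes) + beta * C. The sketch below mirrors this semantics in NumPy (the modes, shapes, and scalars are illustrative assumptions, not part of the binding's API); the actual cuTENSOR call operates on device pointers and a precompiled plan instead.

```python
import numpy as np

# Semantics of the reduction: D = alpha * reduce(A) + beta * C.
# Illustrative example: sum A[m, k, n] over mode k, producing D[m, n].
alpha, beta = 1.0, 0.5
A = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # modes (m, k, n)
C = np.ones((2, 4), dtype=np.float32)                 # modes (m, n)

D = alpha * A.sum(axis=1) + beta * C                  # reduce over mode k
print(D.shape)  # (2, 4)
```

Note that, as the `a` parameter above requires, the input and output buffers of the real call must not overlap; the NumPy version sidesteps this by allocating a fresh array for D.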