reduce#

nvmath.bindings.cutensor.reduce(
intptr_t handle,
intptr_t plan,
intptr_t alpha,
intptr_t a,
intptr_t beta,
intptr_t c,
intptr_t d,
intptr_t workspace,
uint64_t workspace_size,
intptr_t stream,
)[source]#

Performs the tensor reduction that is encoded by plan (see cutensorCreateReduction).

Parameters:
  • handle (intptr_t) – Opaque handle holding cuTENSOR’s library context.

  • plan (intptr_t) – Opaque handle holding the reduction execution plan (created by cutensorCreateReduction followed by cutensorCreatePlan).

  • alpha (intptr_t) – Scaling factor for a. Its data type is determined by ‘descCompute’ (see cutensorOperationDescriptorGetAttribute(desc, CUTENSOR_OPERATION_SCALAR_TYPE)). Pointer to host memory.

  • a (intptr_t) – Pointer to GPU-accessible memory holding the data of a. The data accessed via this pointer must not overlap with the elements written to d.

  • beta (intptr_t) – Scaling factor for c. Its data type is determined by ‘descCompute’ (see cutensorOperationDescriptorGetAttribute(desc, CUTENSOR_OPERATION_SCALAR_TYPE)). Pointer to host memory.

  • c (intptr_t) – Pointer to GPU-accessible memory holding the data of c.

  • d (intptr_t) – Pointer to GPU-accessible memory holding the data of d (the output).

  • workspace (intptr_t) – Scratchpad (device) memory of at least workspace_size bytes; the workspace must be aligned to 256 bytes (i.e., the default alignment of cudaMalloc).

  • workspace_size (uint64_t) – Size of the workspace in bytes. Please use estimate_workspace_size() to query the required workspace size.

  • stream (intptr_t) – The CUDA stream in which all the computation is performed.

See also

cutensorReduce
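For reference, the operation a reduction plan encodes computes D = alpha * reduce_op(A over the contracted modes) + beta * C. The sketch below mirrors this semantics in NumPy (the modes, shapes, and scalars are illustrative assumptions, not part of the binding's API); the actual cuTENSOR call operates on device pointers and a precompiled plan instead.

```python
import numpy as np

# Semantics of the reduction: D = alpha * reduce(A) + beta * C.
# Illustrative example: sum A[m, k, n] over mode k, producing D[m, n].
alpha, beta = 1.0, 0.5
A = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # modes (m, k, n)
C = np.ones((2, 4), dtype=np.float32)                 # modes (m, n)

D = alpha * A.sum(axis=1) + beta * C                  # reduce over mode k
print(D.shape)  # (2, 4)
```

Note that, as the `a` parameter above requires, the input and output buffers of the real call must not overlap; the NumPy version sidesteps this by allocating a fresh array for D.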