NVPL TENSOR Functions

Helper Functions

The helper functions initialize nvplTENSOR, create tensor descriptors, check error codes, and retrieve library versions.


nvpltensorCreate()

nvpltensorStatus_t nvpltensorCreate(nvpltensorHandle_t *handle)

Initializes the nvplTENSOR library and allocates the memory for the library context.

The user is responsible for calling nvpltensorDestroy to free the resources associated with the handle.

Remark

blocking, not reentrant, and thread-safe

Parameters:

handle[out] Pointer to nvpltensorHandle_t

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise
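
A minimal lifecycle sketch (the header name nvpl_tensor.h is an assumption; adjust the include to match your installation):

    #include <stdio.h>
    #include <nvpl_tensor.h>   /* assumed header name */

    int main(void)
    {
        nvpltensorHandle_t handle;
        nvpltensorStatus_t status = nvpltensorCreate(&handle);
        if (status != NVPLTENSOR_STATUS_SUCCESS) {
            fprintf(stderr, "nvpltensorCreate failed: %s\n",
                    nvpltensorGetErrorString(status));
            return 1;
        }

        /* ... use the handle with other nvplTENSOR calls ... */

        nvpltensorDestroy(handle);   /* release the library context */
        return 0;
    }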


nvpltensorDestroy()

nvpltensorStatus_t nvpltensorDestroy(nvpltensorHandle_t handle)

Frees all resources related to the provided library handle.

Remark

blocking, not reentrant, and thread-safe

Parameters:

handle[inout] The nvpltensorHandle_t object holding the library context that will be destroyed.

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise


nvpltensorCreateTensorDescriptor()

nvpltensorStatus_t nvpltensorCreateTensorDescriptor(const nvpltensorHandle_t handle, nvpltensorTensorDescriptor_t *desc, const uint32_t numModes, const int64_t extent[], const int64_t stride[], nvpltensorDataType_t dataType, uint32_t alignmentRequirement)

Creates a tensor descriptor.

This allocates a small amount of host memory.

The user is responsible for calling nvpltensorDestroyTensorDescriptor() to free the associated resources once the tensor descriptor is no longer used.

Remark

non-blocking, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] Pointer to the address where the allocated tensor descriptor object will be stored.

  • numModes[in] Number of modes.

  • extent[in] Extent of each mode (must be larger than zero).

  • stride[in] stride[i] denotes the displacement (stride) between two consecutive elements of the i-th mode. If stride is NULL, a packed generalized column-major memory layout is assumed (i.e., the strides increase monotonically from left to right). Each stride must be larger than zero; a stride of zero can instead be achieved by omitting the mode entirely: for instance, instead of writing C[a,b] = A[b,a] with strideA(a) = 0, write C[a,b] = A[b]; nvplTENSOR will then automatically infer that the a-mode of A should be broadcast.

  • dataType[in] Data type of the stored entries.

  • alignmentRequirement[in] Alignment (in bytes) of the base pointer that will be used in conjunction with this tensor descriptor.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the requested descriptor is not supported (e.g., due to non-supported data type).

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).

Pre:

extent and stride arrays must each contain at least sizeof(int64_t) * numModes bytes
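
For example, a packed three-mode float tensor of extents 4 x 8 x 16 could be described as in the sketch below; the handle is assumed to exist, and the 128-byte alignment is an assumption about how the corresponding data buffer was allocated:

    int64_t extent[] = {4, 8, 16};            /* extents of modes 'a', 'b', 'c' */
    nvpltensorTensorDescriptor_t descA;
    /* stride == NULL requests the packed generalized column-major layout */
    nvpltensorStatus_t status = nvpltensorCreateTensorDescriptor(
            handle, &descA, /*numModes=*/3, extent, /*stride=*/NULL,
            NVPLTENSOR_R_32F, /*alignmentRequirement=*/128);

    /* ... use descA ... */

    nvpltensorDestroyTensorDescriptor(descA);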


nvpltensorDestroyTensorDescriptor()

nvpltensorStatus_t nvpltensorDestroyTensorDescriptor(nvpltensorTensorDescriptor_t desc)

Frees all resources related to the provided tensor descriptor.

Remark

blocking, not reentrant, and thread-safe

Parameters:

desc[inout] The nvpltensorTensorDescriptor_t object that will be deallocated.

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise


nvpltensorGetErrorString()

const char *nvpltensorGetErrorString(const nvpltensorStatus_t error)

Returns the description string for an error code.

Remark

non-blocking, not reentrant, and thread-safe

Parameters:

error[in] Error code to convert to string.

Returns:

The null-terminated error string.


nvpltensorGetVersion()

size_t nvpltensorGetVersion()

Returns the version number of the nvplTENSOR library.


Element-wise Operations

The following functions perform element-wise operations between tensors.


nvpltensorCreateElementwiseTrinary()

nvpltensorStatus_t nvpltensorCreateElementwiseTrinary(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t *desc, const nvpltensorTensorDescriptor_t descA, const int32_t modeA[], nvpltensorOperator_t opA, const nvpltensorTensorDescriptor_t descB, const int32_t modeB[], nvpltensorOperator_t opB, const nvpltensorTensorDescriptor_t descC, const int32_t modeC[], nvpltensorOperator_t opC, const nvpltensorTensorDescriptor_t descD, const int32_t modeD[], nvpltensorOperator_t opAB, nvpltensorOperator_t opABC, const nvpltensorComputeDescriptor_t descCompute)

This function creates an operation descriptor that encodes an elementwise trinary operation.

Said trinary operation has the following general form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{ABC}(\Phi_{AB}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \beta op_B(B_{\Pi^B(i_0,i_1,...,i_n)})), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

Where

  • A,B,C,D are multi-mode tensors (of arbitrary data types).

  • \(\Pi^A, \Pi^B, \Pi^C \) are permutation operators that permute the modes of A, B, and C respectively.

  • \(op_{A},op_{B},op_{C}\) are unary element-wise operators (e.g., IDENTITY, CONJUGATE).

  • \(\Phi_{ABC}, \Phi_{AB}\) are binary element-wise operators (e.g., ADD, MUL, MAX, MIN).

Notice that broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor.

Moreover, modes may appear in any order, giving users greater flexibility. The only restrictions are:

  • modes that appear in A or B must also appear in the output tensor; a mode that only appears in the input would be contracted and such an operation would be covered by either nvpltensorContract or nvpltensorReduce.

  • each mode may appear in each tensor at most once.

Input tensors may be read even if the value of the corresponding scalar is zero.

Examples:

  • \( D_{a,b,c,d} = A_{b,d,a,c}\)

  • \( D_{a,b,c,d} = 2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a}\)

  • \( D_{a,b,c,d} = 2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a} + C_{a,b,c,d}\)

  • \( D_{a,b,c,d} = min((2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a}), C_{a,b,c,d})\)

Call nvpltensorElementwiseTrinaryExecute to perform the actual operation.

Please use nvpltensorDestroyOperationDescriptor to deallocate the descriptor once it is no longer used.

Supported data-type combinations are:

typeA             typeB             typeC             descCompute
NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_COMPUTE_DESC_64F
NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_COMPUTE_DESC_64F

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] This opaque struct gets allocated and filled with the information that encodes the requested elementwise operation.

  • descA[in] A descriptor that holds the information about the data type, modes, and strides of A.

  • modeA[in] Array of size descA->numModes that holds the names of the modes of A (e.g., if \(A_{a,b,c}\) then modeA = {‘a’,’b’,’c’}). The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opA[in] Unary operator that will be applied to each element of A before it is further processed. The original data of this tensor remains unchanged.

  • descB[in] A descriptor that holds information about the data type, modes, and strides of B.

  • modeB[in] Array of size descB->numModes that holds the names of the modes of B. The modeB[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opB[in] Unary operator that will be applied to each element of B before it is further processed. The original data of this tensor remains unchanged.

  • descC[in] A descriptor that holds information about the data type, modes, and strides of C.

  • modeC[in] Array of size descC->numModes that holds the names of the modes of C. The modeC[i] corresponds to extent[i] and stride[i] of the nvpltensorCreateTensorDescriptor.

  • opC[in] Unary operator that will be applied to each element of C before it is further processed. The original data of this tensor remains unchanged.

  • descD[in] A descriptor that holds the information about the data type, modes, and strides of D. Note that descD and descC currently must be identical.

  • modeD[in] Array of size descD->numModes that holds the names of the modes of D. The modeD[i] corresponds to extent[i] and stride[i] of the nvpltensorCreateTensorDescriptor.

  • opAB[in] Element-wise binary operator (see \(\Phi_{AB}\) above).

  • opABC[in] Element-wise binary operator (see \(\Phi_{ABC}\) above).

  • descCompute[in] Determines the precision in which this operation is performed.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


nvpltensorElementwiseTrinaryExecute()

nvpltensorStatus_t nvpltensorElementwiseTrinaryExecute(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, const void *alpha, const void *A, const void *beta, const void *B, const void *gamma, const void *C, void *D)

Performs an element-wise tensor operation for three input tensors (see nvpltensorCreateElementwiseTrinary).

This function performs an element-wise tensor operation of the form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{ABC}(\Phi_{AB}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \beta op_B(B_{\Pi^B(i_0,i_1,...,i_n)})), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

See nvpltensorCreateElementwiseTrinary() for details.

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Opaque handle holding all information about the desired elementwise operation (created by nvpltensorCreateElementwiseTrinary followed by nvpltensorCreatePlan).

  • alpha[in] Pointer to the memory storing scaling factor for A (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE) to query the expected data type). If alpha is zero, A is not read and the corresponding unary operator is not applied.

  • A[in] Pointer to the memory storing multi-mode tensor (described by descA as part of nvpltensorCreateElementwiseTrinary).

  • beta[in] Pointer to the memory storing scaling factor for B (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE) to query the expected data type). If beta is zero, B is not read and the corresponding unary operator is not applied.

  • B[in] Pointer to the memory storing multi-mode tensor (described by descB as part of nvpltensorCreateElementwiseTrinary).

  • gamma[in] Pointer to the memory storing scaling factor for C (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE) to query the expected data type). If gamma is zero, C is not read and the corresponding unary operator is not applied.

  • C[in] Pointer to the memory storing multi-mode tensor (described by descC as part of nvpltensorCreateElementwiseTrinary).

  • D[out] Pointer to the memory storing multi-mode tensor (described by descD as part of nvpltensorCreateElementwiseTrinary). C and D may be identical, if and only if descC == descD.

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.
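
Putting the pieces together, the third example above (D_{a,b,c,d} = 2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a} + C_{a,b,c,d}) could be set up and executed roughly as in the sketch below. The handle, the tensor descriptors, and the data pointers are assumed to exist; NVPLTENSOR_OP_IDENTITY and NVPLTENSOR_JIT_MODE_NONE are assumed names for the identity operator and the no-JIT mode; and the scalars are assumed to be float for NVPLTENSOR_COMPUTE_DESC_32F (query NVPLTENSOR_OPERATION_SCALAR_TYPE to be sure):

    int32_t modeA[] = {'b', 'd', 'a', 'c'};
    int32_t modeB[] = {'c', 'b', 'd', 'a'};
    int32_t modeC[] = {'a', 'b', 'c', 'd'};

    nvpltensorOperationDescriptor_t op;
    nvpltensorCreateElementwiseTrinary(handle, &op,
            descA, modeA, NVPLTENSOR_OP_IDENTITY,
            descB, modeB, NVPLTENSOR_OP_IDENTITY,
            descC, modeC, NVPLTENSOR_OP_IDENTITY,
            descC, modeC,                    /* descD/modeD must match descC/modeC */
            NVPLTENSOR_OP_ADD,               /* Phi_AB  */
            NVPLTENSOR_OP_ADD,               /* Phi_ABC */
            NVPLTENSOR_COMPUTE_DESC_32F);

    nvpltensorPlanPreference_t pref;
    nvpltensorCreatePlanPreference(handle, &pref, NVPLTENSOR_ALGO_DEFAULT,
                                   NVPLTENSOR_JIT_MODE_NONE);

    nvpltensorPlan_t plan;
    nvpltensorCreatePlan(handle, &plan, op, pref, /*workspaceSizeLimit=*/0);

    float alpha = 2.2f, beta = 1.3f, gamma = 1.0f;
    nvpltensorElementwiseTrinaryExecute(handle, plan,
            &alpha, A, &beta, B, &gamma, C, D);

    nvpltensorDestroyPlan(plan);
    nvpltensorDestroyPlanPreference(pref);
    nvpltensorDestroyOperationDescriptor(op);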


nvpltensorCreateElementwiseBinary()

nvpltensorStatus_t nvpltensorCreateElementwiseBinary(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t *desc, const nvpltensorTensorDescriptor_t descA, const int32_t modeA[], nvpltensorOperator_t opA, const nvpltensorTensorDescriptor_t descC, const int32_t modeC[], nvpltensorOperator_t opC, const nvpltensorTensorDescriptor_t descD, const int32_t modeD[], nvpltensorOperator_t opAC, const nvpltensorComputeDescriptor_t descCompute)

This function creates an operation descriptor for an elementwise binary operation.

The binary operation has the following general form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{AC}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

Call nvpltensorElementwiseBinaryExecute to perform the actual operation.

Supported data-type combinations are:

typeA             typeC             descCompute
NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_COMPUTE_DESC_64F
NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_COMPUTE_DESC_64F

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] This opaque struct gets allocated and filled with the information that encodes the requested elementwise operation.

  • descA[in] The descriptor that holds the information about the data type, modes, and strides of A.

  • modeA[in] Array of size descA->numModes that holds the names of the modes of A (e.g., if A_{a,b,c} => modeA = {‘a’,’b’,’c’}). The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opA[in] Unary operator that will be applied to each element of A before it is further processed. The original data of this tensor remains unchanged.

  • descC[in] The descriptor that holds information about the data type, modes, and strides of C.

  • modeC[in] Array of size descC->numModes that holds the names of the modes of C. The modeC[i] corresponds to extent[i] and stride[i] of the nvpltensorCreateTensorDescriptor.

  • opC[in] Unary operator that will be applied to each element of C before it is further processed. The original data of this tensor remains unchanged.

  • descD[in] The descriptor that holds information about the data type, modes, and strides of D. Note that descD and descC currently must be identical.

  • modeD[in] Array of size descD->numModes that holds the names of the modes of D. The modeD[i] corresponds to extent[i] and stride[i] of the nvpltensorCreateTensorDescriptor.

  • opAC[in] Element-wise binary operator (see \(\Phi_{AC}\) above).

  • descCompute[in] Determines the precision in which this operation is performed.

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.


nvpltensorElementwiseBinaryExecute()

nvpltensorStatus_t nvpltensorElementwiseBinaryExecute(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, const void *alpha, const void *A, const void *gamma, const void *C, void *D)

Performs an element-wise tensor operation for two input tensors (see nvpltensorCreateElementwiseBinary).

This function performs an element-wise tensor operation of the form:

\[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{AC}(\alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \gamma op_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]

See nvpltensorCreateElementwiseBinary() for details.

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Opaque handle holding all information about the desired elementwise operation (created by nvpltensorCreateElementwiseBinary followed by nvpltensorCreatePlan).

  • alpha[in] Pointer to the memory storing scaling factor for A (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE) to query the expected data type). If alpha is zero, A is not read and the corresponding unary operator is not applied.

  • A[in] Pointer to the memory storing multi-mode tensor (described by descA as part of nvpltensorCreateElementwiseBinary).

  • gamma[in] Pointer to the memory storing scaling factor for C (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE) to query the expected data type). If gamma is zero, C is not read and the corresponding unary operator is not applied.

  • C[in] Pointer to the memory storing multi-mode tensor (described by descC as part of nvpltensorCreateElementwiseBinary).

  • D[out] Pointer to the memory storing multi-mode tensor (described by descD as part of nvpltensorCreateElementwiseBinary). C and D may be identical, if and only if descC == descD.

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.


nvpltensorCreatePermutation()

nvpltensorStatus_t nvpltensorCreatePermutation(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t *desc, const nvpltensorTensorDescriptor_t descA, const int32_t modeA[], nvpltensorOperator_t opA, const nvpltensorTensorDescriptor_t descB, const int32_t modeB[], const nvpltensorComputeDescriptor_t descCompute)

This function creates an operation descriptor for a tensor permutation.

The tensor permutation has the following general form:

\[ B_{\Pi^B(i_0,i_1,...,i_n)} = \alpha op_A(A_{\Pi^A(i_0,i_1,...,i_n)}) \]

Consequently, this function performs an out-of-place tensor permutation and is a specialization of nvpltensorCreateElementwiseBinary.

Where

  • A and B are multi-mode tensors (of arbitrary data types),

  • \(\Pi^A, \Pi^B\) are permutation operators that permute the modes of A and B, respectively, and

  • \(op_A\) is a unary element-wise operator (e.g., IDENTITY, SQR, CONJUGATE) specified via the opA parameter.

Broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor.

Modes may appear in any order. The only restrictions are:

  • modes that appear in A must also appear in the output tensor.

  • each mode may appear in each tensor at most once.

Supported data-type combinations are:

typeA             typeB             descCompute
NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_COMPUTE_DESC_64F
NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_COMPUTE_DESC_64F

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] This opaque struct gets allocated and filled with the information that encodes the requested permutation.

  • descA[in] The descriptor that holds information about the data type, modes, and strides of A.

  • modeA[in] Array of size descA->numModes that holds the names of the modes of A (e.g., if A_{a,b,c} => modeA = {‘a’,’b’,’c’})

  • opA[in] Unary operator that will be applied to each element of A before it is further processed. The original data of this tensor remains unchanged.

  • descB[in] The descriptor that holds information about the data type, modes, and strides of B.

  • modeB[in] Array of size descB->numModes that holds the names of the modes of B.

  • descCompute[in] Determines the precision in which this operation is performed.

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.


nvpltensorPermute()

nvpltensorStatus_t nvpltensorPermute(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, const void *alpha, const void *A, void *B)

Performs the tensor permutation that is encoded by plan (see nvpltensorCreatePermutation).

This function performs an elementwise tensor operation of the form:

\[ B_{\Pi^B(i_0,i_1,...,i_n)} = \alpha \Psi(A_{\Pi^A(i_0,i_1,...,i_n)}) \]

Consequently, this function performs an out-of-place tensor permutation.

Where

  • A and B are multi-mode tensors (of arbitrary data types),

  • \(\Pi^A, \Pi^B\) are permutation operators that permute the modes of A, B respectively,

  • \(\Psi\) is a unary element-wise operator (e.g., IDENTITY, SQR, CONJUGATE), and

  • \(\Psi\) corresponds to the opA operator passed to nvpltensorCreatePermutation.

Remark

calls asynchronous functions, not reentrant, and thread-safe

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Opaque handle holding all information about the desired tensor permutation (created by nvpltensorCreatePermutation followed by nvpltensorCreatePlan).

  • alpha[in] Pointer to the memory storing scaling factor for A (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE)). If alpha is zero, A is not read and the corresponding unary operator is not applied.

  • A[in] Pointer to the memory storing multi-mode tensor (described by descA as part of nvpltensorCreatePermutation).

  • B[inout] Pointer to the memory storing multi-mode tensor (described by descB as part of nvpltensorCreatePermutation).

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.
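
A sketch of a simple two-mode transpose, B_{b,a} = alpha * A_{a,b}, under the same naming assumptions as the elementwise example above:

    int32_t modeA[] = {'a', 'b'};
    int32_t modeB[] = {'b', 'a'};

    nvpltensorOperationDescriptor_t op;
    nvpltensorCreatePermutation(handle, &op,
            descA, modeA, NVPLTENSOR_OP_IDENTITY,
            descB, modeB, NVPLTENSOR_COMPUTE_DESC_32F);

    nvpltensorPlanPreference_t pref;
    nvpltensorCreatePlanPreference(handle, &pref, NVPLTENSOR_ALGO_DEFAULT,
                                   NVPLTENSOR_JIT_MODE_NONE);
    nvpltensorPlan_t plan;
    nvpltensorCreatePlan(handle, &plan, op, pref, /*workspaceSizeLimit=*/0);

    float alpha = 1.0f;
    nvpltensorPermute(handle, plan, &alpha, A, B);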


Contraction Operations

The following functions perform contractions between tensors.


nvpltensorCreateContraction()

nvpltensorStatus_t nvpltensorCreateContraction(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t *desc, const nvpltensorTensorDescriptor_t descA, const int32_t modeA[], nvpltensorOperator_t opA, const nvpltensorTensorDescriptor_t descB, const int32_t modeB[], nvpltensorOperator_t opB, const nvpltensorTensorDescriptor_t descC, const int32_t modeC[], nvpltensorOperator_t opC, const nvpltensorTensorDescriptor_t descD, const int32_t modeD[], const nvpltensorComputeDescriptor_t descCompute)

This function allocates a nvpltensorOperationDescriptor_t object that encodes a tensor contraction of the form \( D = \alpha \mathcal{A} \mathcal{B} + \beta \mathcal{C} \).

Allocates data for desc to be used to perform a tensor contraction of the form

\[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha op_\mathcal{A}(\mathcal{A}_{{modes}_\mathcal{A}}) op_\mathcal{B}(B_{{modes}_\mathcal{B}}) + \beta op_\mathcal{C}(\mathcal{C}_{{modes}_\mathcal{C}}). \]

See nvpltensorCreatePlan (or nvpltensorCreatePlanAutotuned) to create the plan (i.e., to select the kernel) followed by a call to nvpltensorContract to perform the actual contraction.

The user is responsible for calling nvpltensorDestroyOperationDescriptor to free the resources associated with the descriptor.

Supported data-type combinations are:

typeA             typeB             typeC             descCompute                  typeScalar
NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_COMPUTE_DESC_32F  NVPLTENSOR_R_32F
NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_COMPUTE_DESC_64F  NVPLTENSOR_R_64F
NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_COMPUTE_DESC_32F  NVPLTENSOR_C_32F
NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_COMPUTE_DESC_64F  NVPLTENSOR_C_64F

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] This opaque struct gets allocated and filled with the information that encodes the tensor contraction operation.

  • descA[in] The descriptor that holds the information about the data type, modes and strides of A.

  • modeA[in] Array with ‘nmodeA’ entries that represent the modes of A. The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opA[in] Unary operator that will be applied to each element of A before it is further processed. The original data of this tensor remains unchanged.

  • descB[in] The descriptor that holds information about the data type, modes, and strides of B.

  • modeB[in] Array with ‘nmodeB’ entries that represent the modes of B. The modeB[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opB[in] Unary operator that will be applied to each element of B before it is further processed. The original data of this tensor remains unchanged.

  • descC[in] The descriptor that holds information about the data type, modes, and strides of C.

  • modeC[in] Array with ‘nmodeC’ entries that represent the modes of C. The modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opC[in] Unary operator that will be applied to each element of C before it is further processed. The original data of this tensor remains unchanged.

  • descD[in] The descriptor that holds information about the data type, modes, and strides of D (must be identical to descC for now).

  • modeD[in] Array with ‘nmodeD’ entries that represent the modes of D (must be identical to modeC for now). The modeD[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • descCompute[in] Determines the data type used for the intermediate computation T = A * B.

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported

  • NVPLTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully without error

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.


nvpltensorContract()

nvpltensorStatus_t nvpltensorContract(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, const void *alpha, const void *A, const void *B, const void *beta, const void *C, void *D, void *workspace, uint64_t workspaceSize)

This routine computes the tensor contraction \( D = \alpha \mathcal{A} \mathcal{B} + \beta \mathcal{C} \).

\[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha \mathcal{A}_{{modes}_\mathcal{A}} \mathcal{B}_{{modes}_\mathcal{B}} + \beta \mathcal{C}_{{modes}_\mathcal{C}} \]

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Opaque handle holding the contraction execution plan (created by nvpltensorCreateContraction followed by nvpltensorCreatePlan).

  • alpha[in] Pointer to the memory storing scaling for A*B. Its data type is determined by ‘descCompute’ (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE)).

  • A[in] Pointer to the memory storing multi-mode tensor (described by descA as part of nvpltensorCreateContraction).

  • B[in] Pointer to the memory storing multi-mode tensor (described by descB as part of nvpltensorCreateContraction).

  • beta[in] Pointer to the memory storing the scaling factor for C. Its data type is determined by ‘descCompute’ (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE)).

  • C[in] Pointer to the memory storing multi-mode tensor (described by descC as part of nvpltensorCreateContraction).

  • D[out] Pointer to the memory storing multi-mode tensor (described by descD as part of nvpltensorCreateContraction). C and D may be identical, if and only if descC == descD.

  • workspace[out] Optional parameter that may be NULL. This pointer provides additional workspace to the library for additional optimizations; the workspace must be aligned to 256 bytes.

  • workspaceSize[in] Size of the workspace array in bytes; please refer to nvpltensorEstimateWorkspaceSize to query the required workspace. While nvpltensorContract does not strictly require a workspace for the contraction, it is still recommended to provide a small workspace (e.g., 128 MB).

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if operation is not supported.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.
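
A sketch of the complete contraction workflow for D_{m,n} = alpha * A_{m,k} * B_{k,n} + beta * C_{m,n}, including workspace estimation. NVPLTENSOR_OP_IDENTITY, NVPLTENSOR_JIT_MODE_NONE, and NVPLTENSOR_WORKSPACE_DEFAULT are assumed enumerator names; aligned_alloc requires C11 and a size that is a multiple of the alignment:

    #include <stdlib.h>   /* aligned_alloc, free */

    int32_t modeA[] = {'m', 'k'};
    int32_t modeB[] = {'k', 'n'};
    int32_t modeC[] = {'m', 'n'};

    nvpltensorOperationDescriptor_t op;
    nvpltensorCreateContraction(handle, &op,
            descA, modeA, NVPLTENSOR_OP_IDENTITY,
            descB, modeB, NVPLTENSOR_OP_IDENTITY,
            descC, modeC, NVPLTENSOR_OP_IDENTITY,
            descC, modeC,                    /* descD/modeD must match descC/modeC */
            NVPLTENSOR_COMPUTE_DESC_32F);

    nvpltensorPlanPreference_t pref;
    nvpltensorCreatePlanPreference(handle, &pref, NVPLTENSOR_ALGO_DEFAULT,
                                   NVPLTENSOR_JIT_MODE_NONE);

    uint64_t workspaceSize = 0;
    nvpltensorEstimateWorkspaceSize(handle, op, pref,
                                    NVPLTENSOR_WORKSPACE_DEFAULT, &workspaceSize);

    nvpltensorPlan_t plan;
    nvpltensorCreatePlan(handle, &plan, op, pref, workspaceSize);

    /* 256-byte aligned scratchpad; round the size up to a multiple of 256 */
    uint64_t paddedSize = (workspaceSize + 255) / 256 * 256;
    void *workspace = paddedSize ? aligned_alloc(256, paddedSize) : NULL;

    float alpha = 1.0f, beta = 0.0f;
    nvpltensorContract(handle, plan, &alpha, A, B, &beta, C, D,
                       workspace, workspaceSize);

    free(workspace);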


Reduction Operations

The following functions perform tensor reductions.


nvpltensorCreateReduction()

nvpltensorStatus_t nvpltensorCreateReduction(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t *desc, const nvpltensorTensorDescriptor_t descA, const int32_t modeA[], nvpltensorOperator_t opA, const nvpltensorTensorDescriptor_t descC, const int32_t modeC[], nvpltensorOperator_t opC, const nvpltensorTensorDescriptor_t descD, const int32_t modeD[], nvpltensorOperator_t opReduce, const nvpltensorComputeDescriptor_t descCompute)

Creates a nvpltensorOperationDescriptor_t object that encodes a tensor reduction of the form \( D = \alpha \, opReduce(op_A(A)) + \beta \, op_C(C) \).

For example, this function enables users to reduce an entire tensor to a scalar: C[] = alpha * A[i,j,k].

This function is also able to perform partial reductions; for instance: C[i,j] = alpha * A[k,j,i]; in this case, only elements along the k-mode are reduced.

The binary opReduce operator provides extra control over the kind of reduction to be performed. For instance, setting opReduce to NVPLTENSOR_OP_ADD reduces the elements of A via summation, while NVPLTENSOR_OP_MAX finds the largest element of A.

Supported data-type combinations are:

typeA             typeB             typeC             typeCompute
NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_R_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_R_64F  NVPLTENSOR_COMPUTE_DESC_64F
NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_C_32F  NVPLTENSOR_COMPUTE_DESC_32F
NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_C_64F  NVPLTENSOR_COMPUTE_DESC_64F

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[out] This opaque struct gets allocated and filled with the information that encodes the requested tensor reduction operation.

  • descA[in] The descriptor that holds the information about the data type, modes and strides of A.

  • modeA[in] Array with ‘nmodeA’ entries that represent the modes of A. modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor. Modes that only appear in modeA but not in modeC are reduced (contracted).

  • opA[in] Unary operator that will be applied to each element of A before it is further processed. The original data of this tensor remains unchanged.

  • descC[in] The descriptor that holds the information about the data type, modes and strides of C.

  • modeC[in] Array with ‘nmodeC’ entries that represent the modes of C. modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to nvpltensorCreateTensorDescriptor.

  • opC[in] Unary operator that will be applied to each element of C before it is further processed. The original data of this tensor remains unchanged.

  • descD[in] Must be identical to descC for now.

  • modeD[in] Must be identical to modeC for now.

  • opReduce[in] Binary operator used to reduce the elements of A.

  • descCompute[in] All arithmetic is performed using this compute descriptor (i.e., it affects the accuracy and performance).

Return values:
  • NVPLTENSOR_STATUS_NOT_SUPPORTED – if operation is not supported.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).

  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.


nvpltensorReduce()

nvpltensorStatus_t nvpltensorReduce(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, const void *alpha, const void *A, const void *beta, const void *C, void *D, void *workspace, uint64_t workspaceSize)

Performs the tensor reduction that is encoded by plan (see nvpltensorCreateReduction).

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Opaque handle holding all information about the desired tensor reduction (created by nvpltensorCreateReduction followed by nvpltensorCreatePlan).

  • alpha[in] Pointer to the memory storing scaling for A. Its data type is determined by ‘descCompute’ (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE)).

  • A[in] Pointer to the memory storing multi-mode tensor (described by descA as part of nvpltensorCreateReduction).

  • beta[in] Pointer to the memory storing scaling for C. Its data type is determined by ‘descCompute’ (see nvpltensorOperationDescriptorGetAttribute(desc, NVPLTENSOR_OPERATION_SCALAR_TYPE)).

  • C[in] Pointer to the memory storing multi-mode tensor (described by descC as part of nvpltensorCreateReduction).

  • D[out] Pointer to the memory storing multi-mode tensor (described by descD as part of nvpltensorCreateReduction).

  • workspace[out] Scratchpad memory of size (at least) workspaceSize bytes; the workspace must be aligned to 256 bytes.

  • workspaceSize[in] Please use nvpltensorEstimateWorkspaceSize() to query the required workspace.

Return values:

NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.
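
A sketch of the partial reduction C_{i,j} = alpha * sum_k A_{k,j,i} + beta * C_{i,j}; plan creation and workspace handling follow the same pattern as for a contraction, and NVPLTENSOR_OP_IDENTITY is an assumed enumerator name:

    int32_t modeA[] = {'k', 'j', 'i'};
    int32_t modeC[] = {'i', 'j'};    /* 'k' appears only in A and is therefore reduced */

    nvpltensorOperationDescriptor_t op;
    nvpltensorCreateReduction(handle, &op,
            descA, modeA, NVPLTENSOR_OP_IDENTITY,
            descC, modeC, NVPLTENSOR_OP_IDENTITY,
            descC, modeC,                    /* descD/modeD must match descC/modeC */
            NVPLTENSOR_OP_ADD,               /* opReduce: summation */
            NVPLTENSOR_COMPUTE_DESC_32F);

    /* ... create plan preference, estimate workspace, and create plan as shown above ... */

    float alpha = 1.0f, beta = 0.0f;
    nvpltensorReduce(handle, plan, &alpha, A, &beta, C, D,
                     workspace, workspaceSize);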


Generic Operation Functions

The following functions are generic and work with all the different operations.


nvpltensorDestroyOperationDescriptor()

nvpltensorStatus_t nvpltensorDestroyOperationDescriptor(nvpltensorOperationDescriptor_t desc)

Frees all resources related to the provided descriptor.

Remark

blocking, not reentrant, and thread-safe

Parameters:

desc[inout] The nvpltensorOperationDescriptor_t object that will be deallocated.

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise


nvpltensorOperationDescriptorGetAttribute()

nvpltensorStatus_t nvpltensorOperationDescriptorGetAttribute(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t desc, nvpltensorOperationDescriptorAttribute_t attr, void *buf, size_t sizeInBytes)

This function retrieves an attribute of the provided nvpltensorOperationDescriptor_t object (see nvpltensorOperationDescriptorAttribute_t).

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[in] The nvpltensorOperationDescriptor_t object whose attribute is queried.

  • attr[in] Specifies the attribute that will be retrieved.

  • buf[out] This buffer (of size sizeInBytes) will hold the requested attribute of the provided nvpltensorOperationDescriptor_t object.

  • sizeInBytes[in] Size of buf (in bytes); see nvpltensorOperationDescriptorAttribute_t for the exact size.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).
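
For example, the scalar data type expected by the execution functions can be queried as in the sketch below; it assumes the attribute is returned as an nvpltensorDataType_t value:

    nvpltensorDataType_t scalarType;
    nvpltensorOperationDescriptorGetAttribute(handle, op,
            NVPLTENSOR_OPERATION_SCALAR_TYPE,
            &scalarType, sizeof(scalarType));

    if (scalarType == NVPLTENSOR_R_32F) {
        /* pass alpha/beta/gamma as float */
    } else if (scalarType == NVPLTENSOR_R_64F) {
        /* pass alpha/beta/gamma as double */
    }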


nvpltensorOperationDescriptorSetAttribute()

nvpltensorStatus_t nvpltensorOperationDescriptorSetAttribute(const nvpltensorHandle_t handle, nvpltensorOperationDescriptor_t desc, nvpltensorOperationDescriptorAttribute_t attr, const void *buf, size_t sizeInBytes)

Set attribute of a nvpltensorOperationDescriptor_t object.

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[inout] Operation descriptor that will be modified.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes).

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


nvpltensorCreatePlanPreference()

nvpltensorStatus_t nvpltensorCreatePlanPreference(const nvpltensorHandle_t handle, nvpltensorPlanPreference_t *pref, nvpltensorAlgo_t algo, nvpltensorJitMode_t jitMode)

Allocates the nvpltensorPlanPreference_t, enabling users to limit the applicable kernels for a given plan/operation.

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • pref[out] Pointer to the structure holding the nvpltensorPlanPreference_t allocated by this function. See nvpltensorPlanPreference_t.

  • algo[in] Allows users to select a specific algorithm. NVPLTENSOR_ALGO_DEFAULT lets the heuristic choose the algorithm. Any value >= 0 selects a specific GEMM-like algorithm and deactivates the heuristic. If a specified algorithm is not supported NVPLTENSOR_STATUS_NOT_SUPPORTED is returned. See nvpltensorAlgo_t for additional choices.

  • jitMode[in] Determines if nvplTENSOR is allowed to use JIT-compiled kernels (leading to a longer plan-creation phase); see nvpltensorJitMode_t.


nvpltensorDestroyPlanPreference()

nvpltensorStatus_t nvpltensorDestroyPlanPreference(nvpltensorPlanPreference_t pref)

Frees all resources related to the provided preference.

Remark

blocking, not reentrant, and thread-safe

Parameters:

pref[inout] The nvpltensorPlanPreference_t object that will be deallocated.

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise


nvpltensorPlanPreferenceSetAttribute()

nvpltensorStatus_t nvpltensorPlanPreferenceSetAttribute(const nvpltensorHandle_t handle, nvpltensorPlanPreference_t pref, nvpltensorPlanPreferenceAttribute_t attr, const void *buf, size_t sizeInBytes)

Set attribute of a nvpltensorPlanPreference_t object.

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • pref[inout] This opaque struct restricts the search space of viable candidates.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes); see nvpltensorPlanPreferenceAttribute_t for the exact size.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


nvpltensorEstimateWorkspaceSize()

nvpltensorStatus_t nvpltensorEstimateWorkspaceSize(const nvpltensorHandle_t handle, const nvpltensorOperationDescriptor_t desc, const nvpltensorPlanPreference_t planPref, const nvpltensorWorksizePreference_t workspacePref, uint64_t *workspaceSizeEstimate)

Determines the required workspace size for the given operation encoded by desc.

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • desc[in] This opaque struct encodes the operation.

  • planPref[in] This opaque struct restricts the space of viable candidates.

  • workspacePref[in] This parameter influences the size of the workspace; see nvpltensorWorksizePreference_t for details.

  • workspaceSizeEstimate[out] The workspace size (in bytes) that is required for the given operation.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


nvpltensorCreatePlan()

nvpltensorStatus_t nvpltensorCreatePlan(const nvpltensorHandle_t handle, nvpltensorPlan_t *plan, const nvpltensorOperationDescriptor_t desc, const nvpltensorPlanPreference_t pref, uint64_t workspaceSizeLimit)

This function allocates a nvpltensorPlan_t object, selects an appropriate kernel for a given operation (encoded by desc) and prepares a plan that encodes the execution.

This function applies nvplTENSOR’s heuristic to select a candidate/kernel for a given operation (created by either nvpltensorCreateContraction, nvpltensorCreateReduction, nvpltensorCreatePermutation, nvpltensorCreateElementwiseBinary, or nvpltensorCreateElementwiseTrinary). The created plan can then be passed to either nvpltensorContract, nvpltensorReduce, nvpltensorPermute, nvpltensorElementwiseBinaryExecute, or nvpltensorElementwiseTrinaryExecute to perform the actual operation.

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[out] Pointer to the data structure that will hold the allocated plan.

  • desc[in] This opaque struct encodes the operation.

  • pref[in] This opaque struct restricts the space of viable candidates.

  • workspaceSizeLimit[in] Denotes the maximal workspace size (in bytes) that the operation is allowed to use.

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – If a viable candidate has been found.

  • NVPLTENSOR_STATUS_NOT_SUPPORTED – If no viable candidate could be found.

  • NVPLTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized.

  • NVPLTENSOR_STATUS_INSUFFICIENT_WORKSPACE – if the provided workspace was insufficient.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


nvpltensorDestroyPlan()

nvpltensorStatus_t nvpltensorDestroyPlan(nvpltensorPlan_t plan)

Frees all resources related to the provided plan.

Remark

blocking, not reentrant, and thread-safe

Parameters:

plan[inout] The nvpltensorPlan_t object that will be deallocated.

Return values:

NVPLTENSOR_STATUS_SUCCESS – on success and an error code otherwise


nvpltensorPlanGetAttribute()

nvpltensorStatus_t nvpltensorPlanGetAttribute(const nvpltensorHandle_t handle, const nvpltensorPlan_t plan, nvpltensorPlanAttribute_t attr, void *buf, size_t sizeInBytes)

Retrieves information about an already-created plan (see nvpltensorPlanAttribute_t).

Parameters:
  • handle[in] Opaque handle holding nvplTENSOR’s library context.

  • plan[in] Denotes an already-created plan (e.g., via nvpltensorCreatePlan or nvpltensorCreatePlanAutotuned).

  • attr[in] Requested attribute.

  • buf[out] On successful exit: Holds the information of the requested attribute.

  • sizeInBytes[in] Size of buf (in bytes).

Return values:
  • NVPLTENSOR_STATUS_SUCCESS – The operation completed successfully.

  • NVPLTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error).


Logger Functions

nvpltensorLoggerSetCallback()

nvpltensorStatus_t nvpltensorLoggerSetCallback(nvpltensorLoggerCallback_t callback)

This function sets the logging callback routine.

Parameters:

callback[in] Pointer to a callback function. See nvpltensorLoggerCallback_t.


nvpltensorLoggerSetFile()

nvpltensorStatus_t nvpltensorLoggerSetFile(FILE *file)

This function sets the logging output file.

Parameters:

file[in] An open file with write permission.


nvpltensorLoggerOpenFile()

nvpltensorStatus_t nvpltensorLoggerOpenFile(const char *logFile)

This function opens a logging output file at the given path.

Parameters:

logFile[in] Path to the logging output file.


nvpltensorLoggerSetLevel()

nvpltensorStatus_t nvpltensorLoggerSetLevel(int32_t level)

This function sets the value of the logging level.

Parameters:

level[in]

Log level, should be one of the following:

  • 0. Off

  • 1. Errors

  • 2. Performance Trace

  • 3. Performance Hints

  • 4. Heuristics Trace

  • 5. API Trace
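
For example, to send log output up to the Performance Hints level to a file (a sketch; the file name is arbitrary):

    nvpltensorLoggerOpenFile("nvpltensor.log");   /* hypothetical path */
    nvpltensorLoggerSetLevel(3);                  /* 3 = Performance Hints */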


nvpltensorLoggerSetMask()

nvpltensorStatus_t nvpltensorLoggerSetMask(int32_t mask)

This function sets the value of the log mask.

Parameters:

mask[in]

Log mask, the bitwise OR of the following:

  • 0. Off

  • 1. Errors

  • 2. Performance Trace

  • 4. Performance Hints

  • 8. Heuristics Trace

  • 16. API Trace


nvpltensorLoggerForceDisable()

nvpltensorStatus_t nvpltensorLoggerForceDisable()

This function disables logging for the entire run.