cuTENSOR Functions
Helper Functions
The helper functions initialize cuTENSOR, create tensor descriptors, check error codes, and retrieve library and CUDA runtime versions.
cutensorInit()
cutensorStatus_t cutensorInit(cutensorHandle_t *handle)
- Remark
- blocking, not reentrant, and thread-safe
- Brief
- Initializes the cuTENSOR library 
- Details
- The device associated with a particular cuTENSOR handle is assumed to remain unchanged after the cutensorInit() call. In order for the cuTENSOR library to use a different device, the application must set the new device to be used by calling cudaSetDevice() and then create another cuTENSOR handle, which will be associated with the new device, by calling cutensorInit(). 
- Returns
- CUTENSOR_STATUS_SUCCESS on success and an error code otherwise 
 - Parameters:
- handle – [out] Pointer to cutensorHandle_t 
 
cutensorInitTensorDescriptor()
cutensorStatus_t cutensorInitTensorDescriptor(const cutensorHandle_t *handle, cutensorTensorDescriptor_t *desc, const uint32_t numModes, const int64_t extent[], const int64_t stride[], cudaDataType_t dataType, cutensorOperator_t unaryOp)
- Remark
- non-blocking, not reentrant, and thread-safe
- Brief
- Initializes a tensor descriptor 
- Precondition
- extent and stride arrays must each contain at least sizeof(int64_t) * numModes bytes 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- desc – [out] Pointer to the address where the allocated tensor descriptor object is stored. 
- numModes – [in] Number of modes. 
- extent – [in] Extent of each mode (must be larger than zero). 
- stride – [in] stride[i] denotes the displacement (stride) between two consecutive elements in the i-th mode. If stride is NULL, a packed generalized column-major memory layout is assumed (i.e., the strides increase monotonically from left to right). Each stride must be larger than zero. A stride of zero can instead be achieved by omitting the mode entirely: for instance, instead of writing C[a,b] = A[b,a] with strideA(a) = 0, you can write C[a,b] = A[b] directly; cuTENSOR will then automatically infer that the a-mode in A should be broadcast. 
- dataType – [in] Data type of the stored entries. 
- unaryOp – [in] Unary operator that will be applied to each element of the corresponding tensor in a lazy fashion (i.e., the algorithm uses this tensor as its operand only once). The original data of this tensor remains unchanged. 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_NOT_SUPPORTED – if the requested descriptor is not supported (e.g., due to non-supported data type). 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
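
As an illustration of the packed generalized column-major layout that is assumed when stride is NULL, the following host-side sketch computes such strides from the extents. The helper name packedColumnMajorStrides is hypothetical and not part of cuTENSOR; the sketch assumes that strides are counted in elements and that the first mode is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a cuTENSOR function): strides, in elements, of a
// packed generalized column-major layout. The first mode is contiguous and
// each later stride is the product of all earlier extents.
std::vector<std::int64_t> packedColumnMajorStrides(
        const std::vector<std::int64_t>& extent) {
    std::vector<std::int64_t> stride(extent.size());
    std::int64_t running = 1;
    for (std::size_t i = 0; i < extent.size(); ++i) {
        assert(extent[i] > 0);  // extents must be larger than zero
        stride[i] = running;
        running *= extent[i];
    }
    return stride;
}
```

For extents {4, 3, 2} this yields the strides {1, 4, 12}, which increase monotonically from left to right.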
 
 
cutensorGetAlignmentRequirement()
cutensorStatus_t cutensorGetAlignmentRequirement(const cutensorHandle_t *handle, const void *ptr, const cutensorTensorDescriptor_t *desc, uint32_t *alignmentRequirement)
- Brief
- Computes the minimal alignment requirement for a given pointer and descriptor 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- ptr – [in] Raw pointer to the data of the respective tensor. 
- desc – [in] Tensor descriptor for ptr. 
- alignmentRequirement – [out] Largest alignment requirement that ptr can fulfill (in bytes). 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
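
To make the notion of "largest alignment that a pointer can fulfill" concrete, the sketch below computes the largest power-of-two alignment of a raw address on the host. It is only a model: the function name largestAlignment and the cap maxAlignment are assumptions, and how cuTENSOR itself takes the tensor descriptor into account is not described here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not the cuTENSOR implementation): largest power-of-two
// alignment, in bytes and capped at maxAlignment, that the address satisfies.
std::uint32_t largestAlignment(std::uintptr_t address, std::uint32_t maxAlignment) {
    // maxAlignment must be a non-zero power of two.
    assert(maxAlignment != 0 && (maxAlignment & (maxAlignment - 1)) == 0);
    std::uint32_t alignment = 1;
    while (alignment < maxAlignment && address % (2u * alignment) == 0) {
        alignment *= 2;
    }
    return alignment;
}
```

With a cap of 256 bytes, an address such as 0x1000 yields 256, while 0x1004 yields 4.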
 
 
cutensorGetErrorString()
const char *cutensorGetErrorString(const cutensorStatus_t error)
- Remark
- non-blocking, not reentrant, and thread-safe
- Brief
- Returns the description string for an error code 
- Returns
- the error string 
 - Parameters:
- error – [in] Error code to convert to string. 
 
cutensorGetVersion()
size_t cutensorGetVersion()
- Brief
- Returns the version number of the cuTENSOR library 
 
cutensorGetCudartVersion()
size_t cutensorGetCudartVersion()
- Brief
- Returns version number of the CUDA runtime that cuTENSOR was compiled against 
- Details
- Can be compared against the CUDA runtime version from cudaRuntimeGetVersion(). 
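
One possible way to compare the two version numbers is sketched below. It assumes the usual CUDA runtime encoding of 1000 * major + 10 * minor (for example, 11020 for CUDA 11.2); the type and function names are hypothetical, and requiring equal major versions is only an example policy, not a documented cuTENSOR requirement.

```cpp
#include <cassert>

// Hypothetical helper type: a decoded CUDA runtime version.
struct CudaRuntimeVersion {
    int majorVersion;
    int minorVersion;
};

// Decodes a value encoded as 1000 * major + 10 * minor (assumed encoding).
CudaRuntimeVersion decodeCudaRuntimeVersion(int encoded) {
    assert(encoded >= 0);
    CudaRuntimeVersion version;
    version.majorVersion = encoded / 1000;
    version.minorVersion = (encoded % 1000) / 10;
    return version;
}

// Example policy: the version cuTENSOR was compiled against and the version
// loaded at run time share the same major number.
bool sameMajorVersion(int compiledAgainst, int loadedAtRuntime) {
    return decodeCudaRuntimeVersion(compiledAgainst).majorVersion ==
           decodeCudaRuntimeVersion(loadedAtRuntime).majorVersion;
}
```

In an application, compiledAgainst would come from cutensorGetCudartVersion() and loadedAtRuntime from cudaRuntimeGetVersion().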
 
Element-wise Operations
The following functions perform element-wise operations between tensors.
cutensorElementwiseTrinary()
cutensorStatus_t cutensorElementwiseTrinary(const cutensorHandle_t *handle, const void *alpha, const void *A, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], const void *beta, const void *B, const cutensorTensorDescriptor_t *descB, const int32_t modeB[], const void *gamma, const void *C, const cutensorTensorDescriptor_t *descC, const int32_t modeC[], void *D, const cutensorTensorDescriptor_t *descD, const int32_t modeD[], cutensorOperator_t opAB, cutensorOperator_t opABC, cudaDataType_t typeScalar, const cudaStream_t stream)
- Brief
- Element-wise tensor operation with three inputs 
- Details
- This function performs an element-wise tensor operation of the form: \[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{ABC}(\Phi_{AB}(\alpha \Psi_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \beta \Psi_B(B_{\Pi^B(i_0,i_1,...,i_n)})), \gamma \Psi_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]
- Where
- A, B, C, and D are multi-mode tensors (of arbitrary data types). 
- \(\Pi^A, \Pi^B, \Pi^C \) are permutation operators that permute the modes of A, B, and C respectively. 
- \(\Psi_{A},\Psi_{B},\Psi_{C}\) are unary element-wise operators (e.g., IDENTITY, CONJUGATE). 
- \(\Phi_{ABC}, \Phi_{AB}\) are binary element-wise operators (e.g., ADD, MUL, MAX, MIN). 
- Notice that the broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor. 
- Moreover, modes may appear in any order, giving users greater flexibility. The only restrictions are: 
- modes that appear in A or B must also appear in the output tensor; a mode that only appears in the input would be contracted and such an operation would be covered by either cutensorContraction or cutensorReduction. 
- each mode may appear in each tensor at most once. 
- Input tensors may be read even if the value of the corresponding scalar is zero. 
- Examples: 
- \( D_{a,b,c,d} = A_{b,d,a,c}\) 
- \( D_{a,b,c,d} = 2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a}\) 
- \( D_{a,b,c,d} = 2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a} + C_{a,b,c,d}\) 
- \( D_{a,b,c,d} = min((2.2 * A_{b,d,a,c} + 1.3 * B_{c,b,d,a}), C_{a,b,c,d})\) 
- Supported data-type combinations are: 

| typeA | typeB | typeC | typeScalar |
|---|---|---|---|
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | CUDA_R_16F |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
| CUDA_C_32F | CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_16F | CUDA_R_32F |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_32F | CUDA_R_64F |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_32F | CUDA_C_64F |

- Remark
- calls asynchronous functions, not reentrant, and thread-safe
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- alpha – [in] Scaling factor for A (see equation above) of the type typeScalar. Pointer to the host memory. If alpha is zero, A is not read and the corresponding unary operator is not applied. 
- A – [in] Multi-mode tensor of type typeA with nmodeA modes. Pointer to the GPU-accessible memory. 
- descA – [in] A descriptor that holds the information about the data type, modes, and strides of A. 
- modeA – [in] Array (in host memory) of size descA->numModes that holds the names of the modes of A (e.g., if A_{a,b,c} => modeA = {'a','b','c'}). The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- beta – [in] Scaling factor for B (see equation above) of the type typeScalar. Pointer to the host memory. If beta is zero, B is not read and the corresponding unary operator is not applied. 
- B – [in] Multi-mode tensor of type typeB with nmodeB modes. Pointer to the GPU-accessible memory. 
- descB – [in] The B descriptor that holds information about the data type, modes, and strides of B. 
- modeB – [in] Array (in host memory) of size descB->numModes that holds the names of the modes of B. The modeB[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- gamma – [in] Scaling factor for C (see equation above) of type typeScalar. Pointer to the host memory. If gamma is zero, C is not read and the corresponding unary operator is not applied. 
- C – [in] Multi-mode tensor of type typeC with nmodeC modes. Pointer to the GPU-accessible memory. 
- descC – [in] The C descriptor that holds information about the data type, modes, and strides of C. 
- modeC – [in] Array (in host memory) of size descC->numModes that holds the names of the modes of C. The modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- D – [out] Multi-mode output tensor of type typeC with nmodeC modes that are ordered according to modeD. Pointer to the GPU-accessible memory. Notice that D may alias any input tensor if they share the same memory layout (i.e., same tensor descriptor). 
- descD – [in] The D descriptor that holds information about the data type, modes, and strides of D. Notice that descD and descC are currently required to be identical. 
- modeD – [in] Array (in host memory) of size descD->numModes that holds the names of the modes of D. The modeD[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- opAB – [in] Element-wise binary operator (see \(\Phi_{AB}\) above). 
- opABC – [in] Element-wise binary operator (see \(\Phi_{ABC}\) above). 
- typeScalar – [in] Denotes the data type for the scalars alpha, beta, and gamma. Moreover, typeScalar determines the data type that is used throughout the computation. 
- stream – [in] The CUDA stream. 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
- CUTENSOR_STATUS_ARCH_MISMATCH – if the device is either not ready, or the target architecture is not supported. 
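
The semantics of the trinary operation, including broadcasting by omitting a mode, can be modelled on the host for a small case. The sketch below is a reference model only, not cuTENSOR code: it evaluates D[a,b] = opABC(opAB(alpha * A[b,a], beta * B[a]), gamma * C[a,b]) for packed column-major float tensors with identity unary operators. The names trinaryReference and BinaryOp, as well as the fixed choice of modes, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using BinaryOp = std::function<float(float, float)>;

// Host reference model (hypothetical, not a cuTENSOR function) for
//   D[a,b] = opABC(opAB(alpha * A[b,a], beta * B[a]), gamma * C[a,b]).
// na and nb are the extents of modes a and b. All tensors are packed
// column-major, i.e., the first listed mode is contiguous.
std::vector<float> trinaryReference(std::size_t na, std::size_t nb,
                                    float alpha, const std::vector<float>& A,  // modes {b,a}
                                    float beta, const std::vector<float>& B,   // modes {a}
                                    float gamma, const std::vector<float>& C,  // modes {a,b}
                                    const BinaryOp& opAB, const BinaryOp& opABC) {
    assert(A.size() == na * nb && B.size() == na && C.size() == na * nb);
    std::vector<float> D(na * nb);  // modes {a,b}, same layout as C
    for (std::size_t b = 0; b < nb; ++b) {
        for (std::size_t a = 0; a < na; ++a) {
            const float valA = alpha * A[b + a * nb];  // A stores mode b first
            const float valB = beta * B[a];            // B omits mode b: broadcast along b
            const float valC = gamma * C[a + b * na];
            D[a + b * na] = opABC(opAB(valA, valB), valC);
        }
    }
    return D;
}
```

Passing an addition for opAB and a minimum for opABC reproduces the structure of the last example in the list above, restricted to two modes.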
 
 
cutensorElementwiseBinary()
cutensorStatus_t cutensorElementwiseBinary(const cutensorHandle_t *handle, const void *alpha, const void *A, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], const void *gamma, const void *C, const cutensorTensorDescriptor_t *descC, const int32_t modeC[], void *D, const cutensorTensorDescriptor_t *descD, const int32_t modeD[], cutensorOperator_t opAC, cudaDataType_t typeScalar, cudaStream_t stream)
- Brief
- Element-wise tensor operation for two input tensors 
- Details
- This function performs an element-wise tensor operation of the form: \[ D_{\Pi^C(i_0,i_1,...,i_n)} = \Phi_{AC}(\alpha \Psi_A(A_{\Pi^A(i_0,i_1,...,i_n)}), \gamma \Psi_C(C_{\Pi^C(i_0,i_1,...,i_n)})) \]
- See cutensorElementwiseTrinary() for details. 
- Remark
- calls asynchronous functions, not reentrant, and thread-safe
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- alpha – [in] Scaling factor for A (see equation above) of the type typeScalar. Pointer to the host memory. If alpha is zero, A is not read and the corresponding unary operator is not applied. 
- A – [in] Multi-mode tensor of type typeA with nmodeA modes. Pointer to the GPU-accessible memory. 
- descA – [in] A descriptor that holds the information about the data type, modes, and strides of A. 
- modeA – [in] Array (in host memory) of size descA->numModes that holds the names of the modes of A (e.g., if A_{a,b,c} => modeA = {'a','b','c'}). The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- gamma – [in] Scaling factor for C (see equation above) of type typeScalar. Pointer to the host memory. If gamma is zero, C is not read and the corresponding unary operator is not applied. 
- C – [in] Multi-mode tensor of type typeC with nmodeC modes. Pointer to the GPU-accessible memory. 
- descC – [in] The C descriptor that holds information about the data type, modes, and strides of C. 
- modeC – [in] Array (in host memory) of size descC->numModes that holds the names of the modes of C. The modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- D – [out] Multi-mode output tensor of type typeC with nmodeC modes that are ordered according to modeD. Pointer to the GPU-accessible memory. Notice that D may alias any input tensor if they share the same memory layout (i.e., same tensor descriptor). 
- descD – [in] The D descriptor that holds information about the data type, modes, and strides of D. Notice that descD and descC are currently required to be identical. 
- modeD – [in] Array (in host memory) of size descD->numModes that holds the names of the modes of D. The modeD[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- opAC – [in] Element-wise binary operator (see \(\Phi_{AC}\) above). 
- typeScalar – [in] Scalar type for the intermediate computation. 
- stream – [in] The CUDA stream. 
 
- Return values:
- CUTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported 
- CUTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value 
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully without error 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
 
 
cutensorPermutation()
cutensorStatus_t cutensorPermutation(const cutensorHandle_t *handle, const void *alpha, const void *A, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], void *B, const cutensorTensorDescriptor_t *descB, const int32_t modeB[], const cudaDataType_t typeScalar, const cudaStream_t stream)
- Brief
- Tensor permutation 
- Details
- This function performs an element-wise tensor operation of the form: \[ B_{\Pi^B(i_0,i_1,...,i_n)} = \alpha \Psi(A_{\Pi^A(i_0,i_1,...,i_n)}) \]
- Where
- A and B are multi-mode tensors (of arbitrary data types), 
- \(\Pi^A, \Pi^B\) are permutation operators that permute the modes of A, B respectively, 
- \(\Psi\) is a unary element-wise operator (e.g., IDENTITY, SQR, CONJUGATE), and 
- \(\Psi\) is specified in the tensor descriptor descA. 
- Consequently, this function performs an out-of-place tensor permutation and is a specialization of cutensorElementwise. 
- Broadcasting (of a mode) can be achieved by simply omitting that mode from the respective tensor. 
- Modes may appear in any order. The only restrictions are: 
- modes that appear in A must also appear in the output tensor. 
- each mode may appear in each tensor at most once. 
- Supported data-type combinations are: 

| typeA | typeB | typeScalar |
|---|---|---|
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16F | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_32F | CUDA_R_16F | CUDA_R_32F |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_32F |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F |
| CUDA_R_32F | CUDA_R_64F | CUDA_R_64F |
| CUDA_R_64F | CUDA_R_32F | CUDA_R_64F |
| CUDA_C_32F | CUDA_C_32F | CUDA_C_32F |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_64F |
| CUDA_C_32F | CUDA_C_64F | CUDA_C_64F |
| CUDA_C_64F | CUDA_C_32F | CUDA_C_64F |

- Remark
- calls asynchronous functions, not reentrant, and thread-safe
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- alpha – [in] Scaling factor for A (see equation above) of the type typeScalar. Pointer to the host memory. If alpha is zero, A is not read and the corresponding unary operator is not applied. 
- A – [in] Multi-mode tensor of type typeA with nmodeA modes. Pointer to the GPU-accessible memory. 
- descA – [in] A descriptor that holds information about the data type, modes, and strides of A. 
- modeA – [in] Array of size descA->numModes that holds the names of the modes of A (e.g., if A_{a,b,c} => modeA = {'a','b','c'}). 
- B – [inout] Multi-mode tensor of type typeB with nmodeB modes. Pointer to the GPU-accessible memory. 
- descB – [in] A descriptor that holds information about the data type, modes, and strides of B. 
- modeB – [in] Array of size descB->numModes that holds the names of the modes of B. 
- typeScalar – [in] Data type of alpha. 
- stream – [in] The CUDA stream. 
 
- Return values:
- CUTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported 
- CUTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value 
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully without error 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
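
The role of the mode arrays in a permutation can be illustrated with a host-side reference model. The sketch below is not cuTENSOR code: the function permuteReference is hypothetical, it assumes float data, an identity unary operator, packed column-major layouts for both tensors, and that modeB is a reordering of modeA (no broadcasting).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host reference model (hypothetical) for B_{modeB} = alpha * A_{modeA}.
// extentA[i] is the extent of modeA[i]; both tensors are packed column-major.
std::vector<float> permuteReference(float alpha, const std::vector<float>& A,
                                    const std::vector<std::int64_t>& extentA,
                                    const std::vector<std::int32_t>& modeA,
                                    const std::vector<std::int32_t>& modeB) {
    const std::size_t n = modeA.size();
    assert(extentA.size() == n && modeB.size() == n);
    // strideInB[i]: stride in B of the mode stored at position i of A.
    std::vector<std::int64_t> strideInB(n, 0);
    std::int64_t running = 1;
    for (std::size_t j = 0; j < n; ++j) {
        std::size_t i = 0;
        while (i < n && modeA[i] != modeB[j]) ++i;
        assert(i < n);  // every mode of B must also appear in A
        strideInB[i] = running;
        running *= extentA[i];
    }
    assert(static_cast<std::size_t>(running) == A.size());
    std::vector<float> B(A.size());
    std::vector<std::int64_t> index(n, 0);  // multi-index into A
    for (std::size_t offsetA = 0; offsetA < A.size(); ++offsetA) {
        std::int64_t offsetB = 0;
        for (std::size_t i = 0; i < n; ++i) offsetB += index[i] * strideInB[i];
        B[static_cast<std::size_t>(offsetB)] = alpha * A[offsetA];
        // Advance the multi-index in column-major order (first mode fastest).
        for (std::size_t i = 0; i < n; ++i) {
            if (++index[i] < extentA[i]) break;
            index[i] = 0;
        }
    }
    return B;
}
```

For a 2 x 3 tensor A with modeA = {'a','b'} and modeB = {'b','a'}, the model performs a scaled out-of-place transposition.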
 
 
Contraction Operations
The following functions perform contractions between tensors.
cutensorInitContractionDescriptor()
cutensorStatus_t cutensorInitContractionDescriptor(const cutensorHandle_t *handle, cutensorContractionDescriptor_t *desc, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], const uint32_t alignmentRequirementA, const cutensorTensorDescriptor_t *descB, const int32_t modeB[], const uint32_t alignmentRequirementB, const cutensorTensorDescriptor_t *descC, const int32_t modeC[], const uint32_t alignmentRequirementC, const cutensorTensorDescriptor_t *descD, const int32_t modeD[], const uint32_t alignmentRequirementD, cutensorComputeType_t typeCompute)
- Brief
- Describes the tensor contraction problem of the form: \[ \mathcal{D} = \alpha \mathcal{A} \mathcal{B} + \beta \mathcal{C} \]
- Details
- \[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha \mathcal{A}_{{modes}_\mathcal{A}} \mathcal{B}_{{modes}_\mathcal{B}} + \beta \mathcal{C}_{{modes}_\mathcal{C}} \]
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- desc – [out] This opaque struct gets filled with the information that encodes the tensor contraction problem. 
- descA – [in] A descriptor that holds the information about the data type, modes and strides of A. 
- modeA – [in] Array with ‘nmodeA’ entries that represent the modes of A. The modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- alignmentRequirementA – [in] Alignment that cuTENSOR may require for A’s pointer (in bytes); you can use the helper function cutensorGetAlignmentRequirement to determine the best value for a given pointer. 
- descB – [in] The B descriptor that holds information about the data type, modes, and strides of B. 
- modeB – [in] Array with ‘nmodeB’ entries that represent the modes of B. The modeB[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- alignmentRequirementB – [in] Alignment that cuTENSOR may require for B’s pointer (in bytes); you can use the helper function cutensorGetAlignmentRequirement to determine the best value for a given pointer. 
- descC – [in] The C descriptor that holds information about the data type, modes, and strides of C. 
- modeC – [in] Array with ‘nmodeC’ entries that represent the modes of C. The modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- alignmentRequirementC – [in] Alignment that cuTENSOR may require for C’s pointer (in bytes); you can use the helper function cutensorGetAlignmentRequirement to determine the best value for a given pointer. 
- descD – [in] The D descriptor that holds information about the data type, modes, and strides of D (must be identical to descC for now). 
- modeD – [in] Array with ‘nmodeD’ entries that represent the modes of D (must be identical to modeC for now). The modeD[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- alignmentRequirementD – [in] Alignment that cuTENSOR may require for D’s pointer (in bytes); you can use the helper function cutensorGetAlignmentRequirement to determine the best value for a given pointer. 
- typeCompute – [in] Data type for the intermediate computation T = A * B. 
 
- Return values:
- CUTENSOR_STATUS_NOT_SUPPORTED – if the combination of data types or operations is not supported 
- CUTENSOR_STATUS_INVALID_VALUE – if tensor dimensions or modes have an illegal value 
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully without error 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
 
 
cutensorContractionDescriptorSetAttribute()
cutensorStatus_t cutensorContractionDescriptorSetAttribute(const cutensorHandle_t *handle, cutensorContractionDescriptor_t *desc, cutensorContractionDescriptorAttributes_t attr, const void *buf, size_t sizeInBytes)
- Brief
- Sets an attribute of a contraction descriptor (cutensorContractionDescriptor_t) 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- desc – [inout] Contraction descriptor that will be modified. 
- attr – [in] Specifies the attribute that will be set. 
- buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set. 
- sizeInBytes – [in] Size of buf (in bytes). 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
 
 
cutensorInitContractionFind()
cutensorStatus_t cutensorInitContractionFind(const cutensorHandle_t *handle, cutensorContractionFind_t *find, const cutensorAlgo_t algo)
- Brief
- Limits the search space of viable candidates (a.k.a. algorithms) 
- Details
- This function gives the user finer control over the candidates that the subsequent call to cutensorInitContractionPlan is allowed to evaluate. 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- find – [out] This opaque struct restricts the search space of viable candidates. 
- algo – [in] Allows users to select a specific algorithm. CUTENSOR_ALGO_DEFAULT lets the heuristic choose the algorithm. Any value >= 0 selects a specific GEMM-like algorithm and deactivates the heuristic. If a specified algorithm is not supported CUTENSOR_STATUS_NOT_SUPPORTED is returned. See cutensorAlgo_t for additional choices. 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
- CUTENSOR_STATUS_NOT_SUPPORTED – if the specified algorithm is not supported. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
 
 
cutensorContractionFindSetAttribute()
cutensorStatus_t cutensorContractionFindSetAttribute(const cutensorHandle_t *handle, cutensorContractionFind_t *find, cutensorContractionFindAttributes_t attr, const void *buf, size_t sizeInBytes)
- Brief
- Sets an attribute of a cutensorContractionFind_t 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- find – [inout] This opaque struct restricts the search space of viable candidates. 
- attr – [in] Specifies the attribute that will be set. 
- buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set. 
- sizeInBytes – [in] Size of buf (in bytes). 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
 
 
cutensorContractionGetWorkspaceSize()
cutensorStatus_t cutensorContractionGetWorkspaceSize(const cutensorHandle_t *handle, const cutensorContractionDescriptor_t *desc, const cutensorContractionFind_t *find, const cutensorWorksizePreference_t pref, uint64_t *workspaceSize)
- Brief
- Determines the required workspaceSize for a given tensor contraction (see cutensorContraction) 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- desc – [in] This opaque struct encodes the tensor contraction problem. 
- find – [in] This opaque struct restricts the search space of viable candidates. 
- pref – [in] This parameter influences the size of the workspace; see cutensorWorksizePreference_t for details. 
- workspaceSize – [out] The workspace size (in bytes) that is required for the given tensor contraction. 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
 
 
cutensorContractionGetWorkspace()
Deprecated. Use cutensorContractionGetWorkspaceSize instead.
cutensorInitContractionPlan()
cutensorStatus_t cutensorInitContractionPlan(const cutensorHandle_t *handle, cutensorContractionPlan_t *plan, const cutensorContractionDescriptor_t *desc, const cutensorContractionFind_t *find, const uint64_t workspaceSize)
- Brief
- Initializes the contraction plan for a given tensor contraction problem 
- Details
- This function applies cuTENSOR’s heuristic to select a candidate for a given tensor contraction problem (encoded by desc). The resulting plan can be reused multiple times as long as the tensor contraction problem remains the same. The plan is created for the active CUDA device. 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- plan – [out] Opaque handle holding the contraction execution plan (i.e., the candidate that will be executed as well as all its runtime parameters for the given tensor contraction problem). 
- desc – [in] This opaque struct encodes the given tensor contraction problem. 
- find – [in] This opaque struct is used to restrict the search space of viable candidates. 
- workspaceSize – [in] Available workspace size (in bytes). 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – If a viable candidate has been found. 
- CUTENSOR_STATUS_NOT_SUPPORTED – If no viable candidate could be found. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE – if the provided workspace was insufficient. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
 
 
cutensorContraction()
cutensorStatus_t cutensorContraction(const cutensorHandle_t *handle, const cutensorContractionPlan_t *plan, const void *alpha, const void *A, const void *B, const void *beta, const void *C, void *D, void *workspace, uint64_t workspaceSize, cudaStream_t stream)
- Brief
- This routine computes the tensor contraction \[ \mathcal{D} = \alpha \mathcal{A} \mathcal{B} + \beta \mathcal{C} \]
- Details
- \[ \mathcal{D}_{{modes}_\mathcal{D}} \gets \alpha \mathcal{A}_{{modes}_\mathcal{A}} \mathcal{B}_{{modes}_\mathcal{B}} + \beta \mathcal{C}_{{modes}_\mathcal{C}} \]
- The currently active CUDA device must match the CUDA device that was active at the time at which the plan was created. 
- Supported data-type combinations are: 

| typeA | typeB | typeC | typeCompute | Tensor Core |
|---|---|---|---|---|
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | CUTENSOR_COMPUTE_32F | Volta+ |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | CUTENSOR_COMPUTE_32F | Ampere+ |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUTENSOR_COMPUTE_32F | No |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUTENSOR_COMPUTE_TF32 | Ampere+ |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUTENSOR_COMPUTE_16BF | Ampere+ |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | CUTENSOR_COMPUTE_16F | Volta+ |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F | CUTENSOR_COMPUTE_64F | Ampere+ |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F | CUTENSOR_COMPUTE_32F | No |
| CUDA_C_32F | CUDA_C_32F | CUDA_C_32F | CUTENSOR_COMPUTE_32F | No |
| CUDA_C_32F | CUDA_C_32F | CUDA_C_32F | CUTENSOR_COMPUTE_TF32 | Ampere+ |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_64F | CUTENSOR_COMPUTE_64F | Ampere+ |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_64F | CUTENSOR_COMPUTE_32F | No |
| CUDA_R_64F | CUDA_C_64F | CUDA_C_64F | CUTENSOR_COMPUTE_64F | No |
| CUDA_C_64F | CUDA_R_64F | CUDA_C_64F | CUTENSOR_COMPUTE_64F | No |

- Example
- See https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuTENSOR/contraction.cu for a concrete example. 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- plan – [in] Opaque handle holding the contraction execution plan. 
- alpha – [in] Scaling for A*B. Its data type is determined by ‘typeCompute’. Pointer to the host memory. 
- A – [in] Pointer to the data corresponding to A. Pointer to the GPU-accessible memory. 
- B – [in] Pointer to the data corresponding to B. Pointer to the GPU-accessible memory. 
- beta – [in] Scaling for C. Its data type is determined by ‘typeCompute’. Pointer to the host memory. 
- C – [in] Pointer to the data corresponding to C. Pointer to the GPU-accessible memory. 
- D – [out] Pointer to the data corresponding to D. Pointer to the GPU-accessible memory. 
- workspace – [out] Optional parameter that may be NULL. This pointer provides additional workspace, in device memory, to the library for additional optimizations; the workspace must be aligned to 256 bytes. 
- workspaceSize – [in] Size of the workspace array in bytes; please refer to cutensorContractionGetWorkspaceSize() to query the required workspace. While cutensorContraction() does not strictly require a workspace, it is still recommended to provide a small workspace (e.g., 128 MB). 
- stream – [in] The CUDA stream in which all the computation is performed. 
 
- Return values:
- CUTENSOR_STATUS_NOT_SUPPORTED – if operation is not supported. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_ARCH_MISMATCH – if the plan was created for a different device than the currently active device. 
- CUTENSOR_STATUS_INSUFFICIENT_DRIVER – if the driver is insufficient. 
- CUTENSOR_STATUS_CUDA_ERROR – if some unknown CUDA error has occurred (e.g., out of memory). 
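
For validating results, the contraction formula can be evaluated on the host for a small matrix-like case. The sketch below is a reference model and not cuTENSOR code: the function contractReference is hypothetical and assumes float data, packed column-major layouts, and the mode assignment A_{m,k}, B_{k,n}, C_{m,n}, D_{m,n}, so that k is the contracted mode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference model (hypothetical) for
//   D[m,n] = alpha * sum_k A[m,k] * B[k,n] + beta * C[m,n]
// on packed column-major tensors. extM, extN, extK are the mode extents.
std::vector<float> contractReference(std::size_t extM, std::size_t extN, std::size_t extK,
                                     float alpha, const std::vector<float>& A,  // modes {m,k}
                                     const std::vector<float>& B,               // modes {k,n}
                                     float beta, const std::vector<float>& C) { // modes {m,n}
    assert(A.size() == extM * extK && B.size() == extK * extN && C.size() == extM * extN);
    std::vector<float> D(extM * extN);  // modes {m,n}, same layout as C
    for (std::size_t n = 0; n < extN; ++n) {
        for (std::size_t m = 0; m < extM; ++m) {
            float acc = 0.0f;  // sum over the contracted mode k
            for (std::size_t k = 0; k < extK; ++k) {
                acc += A[m + k * extM] * B[k + n * extK];
            }
            D[m + n * extM] = alpha * acc + beta * C[m + n * extM];
        }
    }
    return D;
}
```

Such a model is only practical for small extents; it is meant as a correctness check against the output D of cutensorContraction(), not as a performance baseline.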
 
 
cutensorContractionMaxAlgos()
cutensorStatus_t cutensorContractionMaxAlgos(int32_t *maxNumAlgos)
- Brief
- This routine returns the maximum number of algorithms available to compute tensor contractions 
- [NOTE] Not all algorithms might be applicable to your specific problem. cutensorContraction() will return CUTENSOR_STATUS_NOT_SUPPORTED if an algorithm is not applicable.
 - Parameters:
- maxNumAlgos – [out] This value will hold the maximum number of algorithms available for cutensorContraction(). You can use the returned integer for auto-tuning purposes (i.e., iterate over all algorithms up to the returned value). 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
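
The auto-tuning loop mentioned for maxNumAlgos can be sketched independently of the library. In the sketch below, pickFastestAlgo and the callback timeAlgo are hypothetical: timeAlgo is assumed to build a plan for the given algorithm, run and time the contraction, and return false when the algorithm is not applicable (for example, when CUTENSOR_STATUS_NOT_SUPPORTED is reported).

```cpp
#include <cassert>
#include <functional>

// Hypothetical auto-tuning helper (not a cuTENSOR function). timeAlgo(algo, &ms)
// is a user-supplied callback that returns false if the algorithm is not
// applicable and otherwise stores the measured time in milliseconds in *ms.
// Returns the fastest applicable algorithm, or -1 if none is applicable.
int pickFastestAlgo(int maxNumAlgos,
                    const std::function<bool(int, double*)>& timeAlgo) {
    assert(maxNumAlgos >= 0);
    int best = -1;
    double bestMs = 0.0;
    for (int algo = 0; algo < maxNumAlgos; ++algo) {
        double ms = 0.0;
        if (!timeAlgo(algo, &ms)) continue;  // skip algorithms that are not applicable
        if (best < 0 || ms < bestMs) {
            best = algo;
            bestMs = ms;
        }
    }
    return best;
}
```

If the function returns -1, a caller could fall back to CUTENSOR_ALGO_DEFAULT and let the heuristic choose.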
 
 
Reduction Operations
The following functions perform tensor reductions.
cutensorReduction()
cutensorStatus_t cutensorReduction(const cutensorHandle_t *handle, const void *alpha, const void *A, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], const void *beta, const void *C, const cutensorTensorDescriptor_t *descC, const int32_t modeC[], void *D, const cutensorTensorDescriptor_t *descD, const int32_t modeD[], cutensorOperator_t opReduce, cutensorComputeType_t typeCompute, void *workspace, uint64_t workspaceSize, cudaStream_t stream)
- Brief
- Implements a tensor reduction of the form \[ D = \alpha \cdot \mathrm{opReduce}(\mathrm{opA}(A)) + \beta \cdot \mathrm{opC}(C) \]
- Details
- For example, this function enables users to reduce an entire tensor to a scalar: C[] = alpha * A[i,j,k]; 
- This function is also able to perform partial reductions; for instance: C[i,j] = alpha * A[k,j,i]; in this case only elements along the k-mode are contracted. 
- The binary opReduce operator provides extra control over what kind of reduction ought to be performed. For instance, opReduce == CUTENSOR_OP_ADD reduces the elements of A via a summation while CUTENSOR_OP_MAX would find the largest element in A. 
- Supported data-type combinations are: 

| typeA | typeB | typeC | typeCompute |
|---|---|---|---|
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | |
| CUDA_R_16F | CUDA_R_16F | CUDA_R_16F | |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | |
| CUDA_R_16BF | CUDA_R_16BF | CUDA_R_16BF | |
| CUDA_R_32F | CUDA_R_32F | CUDA_R_32F | |
| CUDA_R_64F | CUDA_R_64F | CUDA_R_64F | |
| CUDA_C_32F | CUDA_C_32F | CUDA_C_32F | |
| CUDA_C_64F | CUDA_C_64F | CUDA_C_64F | |

 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- alpha – [in] Scaling for A; its data type is determined by ‘typeCompute’. Pointer to the host memory. 
- A – [in] Pointer to the data corresponding to A in device memory. Pointer to the GPU-accessible memory. 
- descA – [in] A descriptor that holds the information about the data type, modes and strides of A. 
- modeA – [in] Array with ‘nmodeA’ entries that represent the modes of A. modeA[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. Modes that only appear in modeA but not in modeC are reduced (contracted). 
- beta – [in] Scaling for C; its data type is determined by ‘typeCompute’. Pointer to the host memory. 
- C – [in] Pointer to the data corresponding to C in device memory. Pointer to the GPU-accessible memory. 
- descC – [in] A descriptor that holds the information about the data type, modes and strides of C. 
- modeC – [in] Array with ‘nmodeC’ entries that represent the modes of C. modeC[i] corresponds to extent[i] and stride[i] w.r.t. the arguments provided to cutensorInitTensorDescriptor. 
- D – [out] Pointer to the data corresponding to D in device memory. Pointer to the GPU-accessible memory. 
- descD – [in] Must be identical to descC for now. 
- modeD – [in] Must be identical to modeC for now. 
- opReduce – [in] binary operator used to reduce elements of A. 
- typeCompute – [in] All arithmetic is performed using this data type (i.e., it affects the accuracy and performance). 
- workspace – [out] Scratchpad (device) memory; the workspace must be aligned to 128 bytes. 
- workspaceSize – [in] Please use cutensorReductionGetWorkspaceSize() to query the required workspace. While lower values, including zero, are valid, they may lead to grossly suboptimal performance. 
- stream – [in] The CUDA stream in which all the computation is performed. 
 
- Return values:
- CUTENSOR_STATUS_NOT_SUPPORTED – if operation is not supported. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
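As a reading aid for the reduction formula, the partial reduction D[i,j] = alpha * opReduce over k of A[k,j,i] + beta * C[i,j] can be written out on the CPU. This is a plain-C reference sketch, not cuTENSOR's implementation. It assumes float data, identity opA and opC, and packed generalized column-major layouts (the leftmost mode has stride 1); `reduce_op_t` and `reduce_k_reference` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two reduction operators named in the text. */
typedef enum { REDUCE_ADD, REDUCE_MAX } reduce_op_t;

/* Computes D[i,j] = alpha * reduce_k(A[k,j,i]) + beta * C[i,j].
 * A has extents (K, J, I) with strides (1, K, K*J);
 * C and D have extents (I, J) with strides (1, I). */
static void reduce_k_reference(float alpha, const float *A,
                               float beta, const float *C, float *D,
                               size_t K, size_t J, size_t I, reduce_op_t op)
{
    assert(K > 0 && J > 0 && I > 0); /* extents must be larger than zero */
    for (size_t j = 0; j < J; ++j) {
        for (size_t i = 0; i < I; ++i) {
            const float *fiber = A + j * K + i * K * J; /* points at A[0,j,i] */
            float acc = fiber[0];
            for (size_t k = 1; k < K; ++k) {
                float a = fiber[k];
                if (op == REDUCE_ADD)
                    acc += a;          /* summation along the k-mode */
                else if (a > acc)
                    acc = a;           /* maximum along the k-mode */
            }
            D[i + j * I] = alpha * acc + beta * C[i + j * I];
        }
    }
}
```

Only the k-mode, which appears in A but not in C, is reduced; the i- and j-modes are carried through to the output, with the mode order of C and D differing from that of A.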
 
 
cutensorReductionGetWorkspaceSize()¶
- 
cutensorStatus_t cutensorReductionGetWorkspaceSize(const cutensorHandle_t *handle, const void *A, const cutensorTensorDescriptor_t *descA, const int32_t modeA[], const void *C, const cutensorTensorDescriptor_t *descC, const int32_t modeC[], const void *D, const cutensorTensorDescriptor_t *descD, const int32_t modeD[], cutensorOperator_t opReduce, cutensorComputeType_t typeCompute, uint64_t *workspaceSize)¶
- Brief
- Determines the required workspaceSize for a given tensor reduction (see cutensorReduction) 
 - Parameters:
- handle – [in] Opaque handle holding cuTENSOR’s library context. 
- A – [in] same as in cutensorReduction 
- descA – [in] same as in cutensorReduction 
- modeA – [in] same as in cutensorReduction 
- C – [in] same as in cutensorReduction 
- descC – [in] same as in cutensorReduction 
- modeC – [in] same as in cutensorReduction 
- D – [in] same as in cutensorReduction 
- descD – [in] same as in cutensorReduction 
- modeD – [in] same as in cutensorReduction 
- opReduce – [in] same as in cutensorReduction 
- typeCompute – [in] same as in cutensorReduction 
- workspaceSize – [out] The workspace size (in bytes) that is required for the given tensor reduction. 
 
- Return values:
- CUTENSOR_STATUS_SUCCESS – The operation completed successfully. 
- CUTENSOR_STATUS_NOT_INITIALIZED – if the handle is not initialized. 
- CUTENSOR_STATUS_INVALID_VALUE – if some input data is invalid (this typically indicates a user error). 
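Because cutensorReduction() expects its workspace to be aligned to 128 bytes, a caller that carves the workspace out of a larger buffer may need to round offsets up before using the queried size. A minimal plain-C sketch of that arithmetic follows; `align_up_128` and `is_aligned_128` are hypothetical helper names, not part of cuTENSOR.

```c
#include <assert.h>
#include <stdint.h>

#define WORKSPACE_ALIGNMENT 128u /* alignment the workspace must satisfy */

/* Rounds a byte offset up to the next multiple of 128. */
static uint64_t align_up_128(uint64_t offset)
{
    assert(offset <= UINT64_MAX - (WORKSPACE_ALIGNMENT - 1)); /* no overflow */
    return (offset + (WORKSPACE_ALIGNMENT - 1)) &
           ~(uint64_t)(WORKSPACE_ALIGNMENT - 1);
}

/* Returns nonzero if the pointer value is a multiple of 128. */
static int is_aligned_128(const void *ptr)
{
    return ((uintptr_t)ptr % WORKSPACE_ALIGNMENT) == 0;
}
```

A buffer obtained directly from cudaMalloc() should already satisfy this alignment; the helpers matter mainly when the workspace is a sub-range of a pooled allocation.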
 
 
cutensorReductionGetWorkspace()¶
Deprecated. Use cutensorReductionGetWorkspaceSize instead.
Logger Functions¶
cutensorLoggerSetCallback()¶
- 
cutensorStatus_t cutensorLoggerSetCallback(cutensorLoggerCallback_t callback)¶
- Brief
- This function sets the logging callback routine. 
 - Parameters:
- callback – [in] Pointer to a callback function. Check cutensorLoggerCallback_t. 
 
cutensorLoggerSetFile()¶
- 
cutensorStatus_t cutensorLoggerSetFile(FILE *file)¶
- Brief
- This function sets the logging output file. 
 - Parameters:
- file – [in] An open file with write permission. 
 
cutensorLoggerOpenFile()¶
- 
cutensorStatus_t cutensorLoggerOpenFile(const char *logFile)¶
- Brief
- This function opens a logging output file in the given path. 
 - Parameters:
- logFile – [in] Path to the logging output file. 
 
cutensorLoggerSetLevel()¶
- 
cutensorStatus_t cutensorLoggerSetLevel(int32_t level)¶
- Brief
- This function sets the value of the logging level. 
 - Parameters:
- level – [in] Log level, should be one of the following: 
- 0. Off 
- 1. Errors 
- 2. Performance Trace 
- 3. Performance Hints 
- 4. Heuristics Trace 
- 5. API Trace 
 
 
cutensorLoggerSetMask()¶
- 
cutensorStatus_t cutensorLoggerSetMask(int32_t mask)¶
- Brief
- This function sets the value of the log mask. 
 - Parameters:
- mask – [in] Log mask, the bitwise OR of the following: 
- 0. Off 
- Errors 
- Performance Trace 
- Performance Hints 
- Heuristics Trace 
- API Trace 
 
 
cutensorLoggerForceDisable()¶
- 
cutensorStatus_t cutensorLoggerForceDisable()¶
- Brief
- This function disables logging for the entire run.