cuTENSOR Data Types

cutensorComputeType_t

enum cutensorComputeType_t

Brief

Encodes cuTENSOR’s compute type (see “User Guide - Accuracy Guarantees” for details).

Values:

enumerator CUTENSOR_COMPUTE_16F = (1U << 0U)

floating-point: 5-bit exponent and 10-bit mantissa (aka half)

enumerator CUTENSOR_COMPUTE_16BF = (1U << 10U)

floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)

enumerator CUTENSOR_COMPUTE_TF32 = (1U << 12U)

floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)

enumerator CUTENSOR_COMPUTE_32F = (1U << 2U)

floating-point: 8-bit exponent and 23-bit mantissa (aka float)

enumerator CUTENSOR_COMPUTE_64F = (1U << 4U)

floating-point: 11-bit exponent and 52-bit mantissa (aka double)

enumerator CUTENSOR_COMPUTE_8U = (1U << 6U)

8-bit unsigned integer

enumerator CUTENSOR_COMPUTE_8I = (1U << 8U)

8-bit signed integer

enumerator CUTENSOR_COMPUTE_32U = (1U << 7U)

32-bit unsigned integer

enumerator CUTENSOR_COMPUTE_32I = (1U << 9U)

32-bit signed integer

enumerator CUTENSOR_R_MIN_16F = (1U << 0U)

DEPRECATED (real as a half), please use CUTENSOR_COMPUTE_16F instead.

enumerator CUTENSOR_C_MIN_16F = (1U << 1U)

DEPRECATED (complex as a half), please use CUTENSOR_COMPUTE_16F instead.

enumerator CUTENSOR_R_MIN_32F = (1U << 2U)

DEPRECATED (real as a float), please use CUTENSOR_COMPUTE_32F instead.

enumerator CUTENSOR_C_MIN_32F = (1U << 3U)

DEPRECATED (complex as a float), please use CUTENSOR_COMPUTE_32F instead.

enumerator CUTENSOR_R_MIN_64F = (1U << 4U)

DEPRECATED (real as a double), please use CUTENSOR_COMPUTE_64F instead.

enumerator CUTENSOR_C_MIN_64F = (1U << 5U)

DEPRECATED (complex as a double), please use CUTENSOR_COMPUTE_64F instead.

enumerator CUTENSOR_R_MIN_8U = (1U << 6U)

DEPRECATED (real as a uint8), please use CUTENSOR_COMPUTE_8U instead.

enumerator CUTENSOR_R_MIN_32U = (1U << 7U)

DEPRECATED (real as a uint32), please use CUTENSOR_COMPUTE_32U instead.

enumerator CUTENSOR_R_MIN_8I = (1U << 8U)

DEPRECATED (real as an int8), please use CUTENSOR_COMPUTE_8I instead.

enumerator CUTENSOR_R_MIN_32I = (1U << 9U)

DEPRECATED (real as an int32), please use CUTENSOR_COMPUTE_32I instead.

enumerator CUTENSOR_R_MIN_16BF = (1U << 10U)

DEPRECATED (real as a bfloat16), please use CUTENSOR_COMPUTE_16BF instead.

enumerator CUTENSOR_R_MIN_TF32 = (1U << 11U)

DEPRECATED (real as a tensorfloat32), please use CUTENSOR_COMPUTE_TF32 instead.

enumerator CUTENSOR_C_MIN_TF32 = (1U << 12U)

DEPRECATED (complex as a tensorfloat32), please use CUTENSOR_COMPUTE_TF32 instead.
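
Every value in the listing above is a single bit, and every deprecated CUTENSOR_R_MIN_* or CUTENSOR_C_MIN_* name has a documented replacement. The following is a minimal sketch of that mapping in C. The enum is a local mirror of the listed values so that the snippet builds without cutensor.h, and is_single_bit and modern_compute_type are hypothetical helpers that are not part of cuTENSOR.

```c
#include <assert.h>

/* Local mirror of the values listed above, so the sketch builds without
 * <cutensor.h>. With the real header, drop this enum and use
 * cutensorComputeType_t instead. */
enum {
    CUTENSOR_COMPUTE_16F  = (1U << 0U),
    CUTENSOR_COMPUTE_16BF = (1U << 10U),
    CUTENSOR_COMPUTE_TF32 = (1U << 12U),
    CUTENSOR_COMPUTE_32F  = (1U << 2U),
    CUTENSOR_COMPUTE_64F  = (1U << 4U),
    CUTENSOR_COMPUTE_8U   = (1U << 6U),
    CUTENSOR_COMPUTE_8I   = (1U << 8U),
    CUTENSOR_COMPUTE_32U  = (1U << 7U),
    CUTENSOR_COMPUTE_32I  = (1U << 9U),
    /* Deprecated names whose bit differs from that of their replacement. */
    CUTENSOR_C_MIN_16F  = (1U << 1U),
    CUTENSOR_C_MIN_32F  = (1U << 3U),
    CUTENSOR_C_MIN_64F  = (1U << 5U),
    CUTENSOR_R_MIN_TF32 = (1U << 11U)
};

/* Returns 1 if exactly one bit of `value` is set. */
int is_single_bit(unsigned value)
{
    return value != 0U && (value & (value - 1U)) == 0U;
}

/* Hypothetical helper: maps a listed value, deprecated or not, to the
 * CUTENSOR_COMPUTE_* value that the listing recommends. Returns 0 for a
 * bit that does not appear in the listing. */
unsigned modern_compute_type(unsigned value)
{
    assert(is_single_bit(value));
    switch (value) {
    case CUTENSOR_COMPUTE_16F:   /* same bit as CUTENSOR_R_MIN_16F */
    case CUTENSOR_C_MIN_16F:
        return CUTENSOR_COMPUTE_16F;
    case CUTENSOR_COMPUTE_32F:   /* same bit as CUTENSOR_R_MIN_32F */
    case CUTENSOR_C_MIN_32F:
        return CUTENSOR_COMPUTE_32F;
    case CUTENSOR_COMPUTE_64F:   /* same bit as CUTENSOR_R_MIN_64F */
    case CUTENSOR_C_MIN_64F:
        return CUTENSOR_COMPUTE_64F;
    case CUTENSOR_COMPUTE_8U:    /* same bit as CUTENSOR_R_MIN_8U */
        return CUTENSOR_COMPUTE_8U;
    case CUTENSOR_COMPUTE_32U:   /* same bit as CUTENSOR_R_MIN_32U */
        return CUTENSOR_COMPUTE_32U;
    case CUTENSOR_COMPUTE_8I:    /* same bit as CUTENSOR_R_MIN_8I */
        return CUTENSOR_COMPUTE_8I;
    case CUTENSOR_COMPUTE_32I:   /* same bit as CUTENSOR_R_MIN_32I */
        return CUTENSOR_COMPUTE_32I;
    case CUTENSOR_COMPUTE_16BF:  /* same bit as CUTENSOR_R_MIN_16BF */
        return CUTENSOR_COMPUTE_16BF;
    case CUTENSOR_R_MIN_TF32:
    case CUTENSOR_COMPUTE_TF32:  /* same bit as CUTENSOR_C_MIN_TF32 */
        return CUTENSOR_COMPUTE_TF32;
    default:
        return 0U;
    }
}
```

Most deprecated real-valued names share their bit with the replacement, so old code keeps the same numeric value. The complex-valued names for half, float, and double, as well as CUTENSOR_R_MIN_TF32, use a different bit than their replacement.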


cutensorHandle_t

struct cutensorHandle_t

Brief

Opaque structure holding cuTENSOR’s library context.


cutensorTensorDescriptor_t

struct cutensorTensorDescriptor_t

Brief

Opaque structure representing a tensor descriptor.


cutensorContractionDescriptor_t

struct cutensorContractionDescriptor_t

Brief

Opaque structure representing a tensor contraction descriptor.


cutensorContractionDescriptorAttributes_t

enum cutensorContractionDescriptorAttributes_t

This enum lists all attributes of a cutensorContractionDescriptor_t that can be modified.

Values:

enumerator CUTENSOR_CONTRACTION_DESCRIPTOR_TAG

uint32_t: Enables users to distinguish two otherwise identical tensor contractions with respect to the software-managed plan cache (default value: 0).


cutensorContractionFind_t

struct cutensorContractionFind_t

Brief

Opaque structure representing a candidate.


cutensorContractionFindAttributes_t

enum cutensorContractionFindAttributes_t

This enum lists all attributes of a cutensorContractionFind_t that can be modified.

Values:

enumerator CUTENSOR_CONTRACTION_FIND_AUTOTUNE_MODE

cutensorAutotuneMode_t: Determines if the corresponding algorithm/kernel for this plan should be cached.

enumerator CUTENSOR_CONTRACTION_FIND_CACHE_MODE

cutensorCacheMode_t: Gives fine control over what is considered a cache hit.

enumerator CUTENSOR_CONTRACTION_FIND_INCREMENTAL_COUNT

uint32_t: Only applicable if CUTENSOR_CONTRACTION_FIND_AUTOTUNE_MODE is set to CUTENSOR_AUTOTUNE_INCREMENTAL.


cutensorContractionPlan_t

struct cutensorContractionPlan_t

Brief

Opaque structure representing a plan.


cutensorAutotuneMode_t

enum cutensorAutotuneMode_t

This enum controls how autotuning interacts with cuTENSOR’s plan cache.

Values:

enumerator CUTENSOR_AUTOTUNE_NONE

Indicates no autotuning (default); in this case the cache helps to reduce the plan-creation overhead. On a cache hit, the cached plan is reused; otherwise the plan cache is ignored.

enumerator CUTENSOR_AUTOTUNE_INCREMENTAL

Indicates incremental autotuning: each invocation of the corresponding cutensorInitContractionPlan() creates a plan based on a different algorithm/kernel, and the maximum number of kernels that will be tested is defined by the CUTENSOR_CONTRACTION_FIND_INCREMENTAL_COUNT attribute (see cutensorContractionFindAttributes_t). WARNING: If this autotuning mode is selected, bit-wise identical results cannot be guaranteed, since different algorithms could be executed.
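
The incremental mode can be pictured with a small host-side toy model. The sketch below is an illustration only and does not reflect cuTENSOR's implementation. All toy_* names are hypothetical, the kernel is represented by a plain index, and the timings are supplied by the caller. It also assumes that, once the incremental count is reached, the fastest kernel seen so far is the one that keeps being returned; the text above does not state this.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one plan-cache entry under incremental autotuning.
 * Illustration only; this is not cuTENSOR's data structure. */
typedef struct {
    uint32_t incremental_count; /* role of CUTENSOR_CONTRACTION_FIND_INCREMENTAL_COUNT */
    uint32_t tried;             /* number of distinct kernels handed out so far */
    uint32_t best_kernel;       /* fastest kernel measured so far */
    double   best_time_ms;      /* its measured time; negative means "none yet" */
} toy_autotune_entry;

void toy_autotune_init(toy_autotune_entry *e, uint32_t incremental_count)
{
    assert(e != 0 && incremental_count > 0U);
    e->incremental_count = incremental_count;
    e->tried = 0U;
    e->best_kernel = 0U;
    e->best_time_ms = -1.0;
}

/* Models one plan creation: hand out a new kernel index until the count
 * is reached, then keep returning the fastest kernel seen (assumption). */
uint32_t toy_autotune_next_kernel(toy_autotune_entry *e)
{
    if (e->tried < e->incremental_count) {
        return e->tried++;
    }
    return e->best_kernel;
}

/* Models the feedback after executing the plan built for `kernel`. */
void toy_autotune_report(toy_autotune_entry *e, uint32_t kernel, double time_ms)
{
    assert(kernel < e->incremental_count && time_ms >= 0.0);
    if (e->best_time_ms < 0.0 || time_ms < e->best_time_ms) {
        e->best_kernel = kernel;
        e->best_time_ms = time_ms;
    }
}
```

In this model, the first incremental_count plan creations each return a different kernel index, which mirrors why results are not guaranteed to be bit-wise identical across invocations.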


cutensorCacheMode_t

enum cutensorCacheMode_t

This enum defines what is considered a cache hit.

Values:

enumerator CUTENSOR_CACHE_MODE_NONE

Plan will not be cached.

enumerator CUTENSOR_CACHE_MODE_PEDANTIC

All parameters of the corresponding descriptor must be identical to the cached plan (default).


cutensorAlgo_t

enum cutensorAlgo_t

Brief

Allows users to specify the algorithm to be used for performing the tensor contraction.

Details

This enum gives users finer control over which algorithm should be executed by cutensorContraction(); values >= 0 correspond to certain sub-algorithms of GETT.

Values:

enumerator CUTENSOR_ALGO_GETT = -4

Choose the GETT algorithm.

enumerator CUTENSOR_ALGO_TGETT = -3

Transpose (A or B) + GETT.

enumerator CUTENSOR_ALGO_TTGT = -2

Transpose-Transpose-GEMM-Transpose (requires additional memory)

enumerator CUTENSOR_ALGO_DEFAULT = -1

Lets the internal heuristic choose.
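
The split between the named (negative) enumerators and the non-negative GETT sub-algorithm values can be sketched as follows. The enum is a local mirror of the listing so that the snippet builds without cutensor.h, and is_named_algo, is_gett_sub_algorithm, and algo_name are hypothetical helpers, not cuTENSOR functions.

```c
#include <assert.h>

/* Local mirror of the values listed above; with the real header, use
 * cutensorAlgo_t from <cutensor.h> instead. */
enum {
    CUTENSOR_ALGO_GETT    = -4,
    CUTENSOR_ALGO_TGETT   = -3,
    CUTENSOR_ALGO_TTGT    = -2,
    CUTENSOR_ALGO_DEFAULT = -1
};

/* Returns 1 if `algo` is one of the four named enumerators. */
int is_named_algo(int algo)
{
    return algo >= CUTENSOR_ALGO_GETT && algo <= CUTENSOR_ALGO_DEFAULT;
}

/* Returns 1 if `algo` selects a specific GETT sub-algorithm (values >= 0). */
int is_gett_sub_algorithm(int algo)
{
    return algo >= 0;
}

/* Returns a short label for logging; `algo` must be a valid value. */
const char *algo_name(int algo)
{
    assert(is_named_algo(algo) || is_gett_sub_algorithm(algo));
    switch (algo) {
    case CUTENSOR_ALGO_GETT:    return "GETT";
    case CUTENSOR_ALGO_TGETT:   return "TGETT";
    case CUTENSOR_ALGO_TTGT:    return "TTGT";
    case CUTENSOR_ALGO_DEFAULT: return "DEFAULT";
    default:                    return "GETT sub-algorithm";
    }
}
```

How many non-negative values are valid for a given contraction is not stated in this section, so the sketch only checks the sign.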


cutensorWorksizePreference_t

enum cutensorWorksizePreference_t

Brief

This enum gives users finer control over the suggested workspace.

Details

This enum gives users finer control over the amount of workspace that is suggested by cutensorContractionGetWorkspace.

Values:

enumerator CUTENSOR_WORKSPACE_MIN = 1

At least one algorithm will be available.

enumerator CUTENSOR_WORKSPACE_RECOMMENDED = 2

The most suitable algorithm will be available.

enumerator CUTENSOR_WORKSPACE_MAX = 3

All algorithms will be available.
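
One possible way to use the three preferences is to pick the most generous one whose suggested size still fits a memory budget. The sketch below shows only that selection step on the host. It assumes the caller has already queried the suggested size for each preference and stored it in an array; pick_workspace_preference is a hypothetical helper, and the enum is a local mirror of the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the values listed above; with the real header, use
 * cutensorWorksizePreference_t from <cutensor.h> instead. */
enum {
    CUTENSOR_WORKSPACE_MIN         = 1,
    CUTENSOR_WORKSPACE_RECOMMENDED = 2,
    CUTENSOR_WORKSPACE_MAX         = 3
};

/* suggested[p] holds the size in bytes that was suggested for
 * preference p (index 0 is unused). Returns the most generous
 * preference that fits into budget_bytes, or 0 if none fits. */
int pick_workspace_preference(const size_t suggested[4], size_t budget_bytes)
{
    assert(suggested != NULL);
    for (int p = CUTENSOR_WORKSPACE_MAX; p >= CUTENSOR_WORKSPACE_MIN; --p) {
        if (suggested[p] <= budget_bytes) {
            return p;
        }
    }
    return 0;
}
```

Starting from CUTENSOR_WORKSPACE_MAX is a design choice of this sketch; an application that wants to keep memory free for other work might start from CUTENSOR_WORKSPACE_RECOMMENDED instead.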


cutensorOperator_t

enum cutensorOperator_t

Brief

This enum captures all unary and binary element-wise operations supported by the cuTENSOR library.

Values:

enumerator CUTENSOR_OP_IDENTITY = 1

Identity operator (i.e., elements are not changed)

enumerator CUTENSOR_OP_SQRT = 2

Square root.

enumerator CUTENSOR_OP_RELU = 8

Rectified linear unit.

enumerator CUTENSOR_OP_CONJ = 9

Complex conjugate.

enumerator CUTENSOR_OP_RCP = 10

Reciprocal.

enumerator CUTENSOR_OP_SIGMOID = 11

y=1/(1+exp(-x))

enumerator CUTENSOR_OP_TANH = 12

y=tanh(x)

enumerator CUTENSOR_OP_EXP = 22

Exponentiation.

enumerator CUTENSOR_OP_LOG = 23

Log (base e).

enumerator CUTENSOR_OP_ABS = 24

Absolute value.

enumerator CUTENSOR_OP_NEG = 25

Negation.

enumerator CUTENSOR_OP_SIN = 26

Sine.

enumerator CUTENSOR_OP_COS = 27

Cosine.

enumerator CUTENSOR_OP_TAN = 28

Tangent.

enumerator CUTENSOR_OP_SINH = 29

Hyperbolic sine.

enumerator CUTENSOR_OP_COSH = 30

Hyperbolic cosine.

enumerator CUTENSOR_OP_ASIN = 31

Inverse sine.

enumerator CUTENSOR_OP_ACOS = 32

Inverse cosine.

enumerator CUTENSOR_OP_ATAN = 33

Inverse tangent.

enumerator CUTENSOR_OP_ASINH = 34

Inverse hyperbolic sine.

enumerator CUTENSOR_OP_ACOSH = 35

Inverse hyperbolic cosine.

enumerator CUTENSOR_OP_ATANH = 36

Inverse hyperbolic tangent.

enumerator CUTENSOR_OP_CEIL = 37

Ceiling.

enumerator CUTENSOR_OP_FLOOR = 38

Floor.

enumerator CUTENSOR_OP_ADD = 3

Addition of two elements.

enumerator CUTENSOR_OP_MUL = 5

Multiplication of two elements.

enumerator CUTENSOR_OP_MAX = 6

Maximum of two elements.

enumerator CUTENSOR_OP_MIN = 7

Minimum of two elements.

enumerator CUTENSOR_OP_UNKNOWN = 126

reserved for internal use only
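
The listing mixes unary operators with the four binary ones (ADD, MUL, MAX, MIN). A host-side reference for part of it might look like the sketch below. It mirrors only the enumerators it uses, covers only operators that need no math library, and does not model how the library treats NaN or complex inputs. The function names are hypothetical.

```c
#include <assert.h>

/* Local mirror of a subset of the values listed above; with the real
 * header, use cutensorOperator_t from <cutensor.h> instead. */
enum {
    CUTENSOR_OP_IDENTITY = 1,
    CUTENSOR_OP_ADD      = 3,
    CUTENSOR_OP_MUL      = 5,
    CUTENSOR_OP_MAX      = 6,
    CUTENSOR_OP_MIN      = 7,
    CUTENSOR_OP_RELU     = 8,
    CUTENSOR_OP_RCP      = 10,
    CUTENSOR_OP_ABS      = 24,
    CUTENSOR_OP_NEG      = 25
};

/* Returns 1 for the operators documented as acting on two elements. */
int is_binary_operator(int op)
{
    return op == CUTENSOR_OP_ADD || op == CUTENSOR_OP_MUL ||
           op == CUTENSOR_OP_MAX || op == CUTENSOR_OP_MIN;
}

/* Applies a binary operator to two real values. */
float apply_binary(int op, float a, float b)
{
    assert(is_binary_operator(op));
    switch (op) {
    case CUTENSOR_OP_ADD: return a + b;
    case CUTENSOR_OP_MUL: return a * b;
    case CUTENSOR_OP_MAX: return a > b ? a : b;
    default:              return a < b ? a : b; /* CUTENSOR_OP_MIN */
    }
}

/* Applies one of a few unary operators to a real value. */
float apply_unary_subset(int op, float x)
{
    switch (op) {
    case CUTENSOR_OP_IDENTITY: return x;
    case CUTENSOR_OP_RELU:     return x > 0.0f ? x : 0.0f;
    case CUTENSOR_OP_RCP:      return 1.0f / x;
    case CUTENSOR_OP_ABS:      return x < 0.0f ? -x : x;
    case CUTENSOR_OP_NEG:      return -x;
    default:
        assert(0 && "operator not covered by this sketch");
        return x;
    }
}
```

Such a reference can be handy for checking GPU results of element-wise operations on small inputs.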


cutensorStatus_t

enum cutensorStatus_t

Brief

cuTENSOR status return type.

Details

The type is used for function status returns. All cuTENSOR library functions return their status, which can have the following values.

Values:

enumerator CUTENSOR_STATUS_SUCCESS = 0

The operation completed successfully.

enumerator CUTENSOR_STATUS_NOT_INITIALIZED = 1

The cuTENSOR library was not initialized.

enumerator CUTENSOR_STATUS_ALLOC_FAILED = 3

Resource allocation failed inside the cuTENSOR library.

enumerator CUTENSOR_STATUS_INVALID_VALUE = 7

An unsupported value or parameter was passed to the function (indicates a user error).

enumerator CUTENSOR_STATUS_ARCH_MISMATCH = 8

Indicates that the device is either not ready, or the target architecture is not supported.

enumerator CUTENSOR_STATUS_MAPPING_ERROR = 11

An access to GPU memory space failed, which is usually caused by a failure to bind a texture.

enumerator CUTENSOR_STATUS_EXECUTION_FAILED = 13

The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.

enumerator CUTENSOR_STATUS_INTERNAL_ERROR = 14

An internal cuTENSOR error has occurred.

enumerator CUTENSOR_STATUS_NOT_SUPPORTED = 15

The requested operation is not supported.

enumerator CUTENSOR_STATUS_LICENSE_ERROR = 16

The functionality requested requires some license and an error was detected when trying to check the current licensing.

enumerator CUTENSOR_STATUS_CUBLAS_ERROR = 17

A call to CUBLAS did not succeed.

enumerator CUTENSOR_STATUS_CUDA_ERROR = 18

Some unknown CUDA error has occurred.

enumerator CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE = 19

The provided workspace was insufficient.

enumerator CUTENSOR_STATUS_INSUFFICIENT_DRIVER = 20

Indicates that the driver version is insufficient.

enumerator CUTENSOR_STATUS_IO_ERROR = 21

Indicates an error related to file I/O.
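
Since every library function returns one of these values, callers typically wrap each call in a status check. The sketch below shows one way to do that on the host. The enum is a local mirror of the listing so that the snippet builds without cutensor.h, and status_name and check_status are hypothetical helpers; code that links against the library can use cutensorGetErrorString() to obtain the message instead.

```c
#include <assert.h>
#include <stdio.h>

/* Local mirror of the values listed above; with the real header, use
 * cutensorStatus_t from <cutensor.h> instead. */
enum {
    CUTENSOR_STATUS_SUCCESS                = 0,
    CUTENSOR_STATUS_NOT_INITIALIZED        = 1,
    CUTENSOR_STATUS_ALLOC_FAILED           = 3,
    CUTENSOR_STATUS_INVALID_VALUE          = 7,
    CUTENSOR_STATUS_ARCH_MISMATCH          = 8,
    CUTENSOR_STATUS_MAPPING_ERROR          = 11,
    CUTENSOR_STATUS_EXECUTION_FAILED       = 13,
    CUTENSOR_STATUS_INTERNAL_ERROR         = 14,
    CUTENSOR_STATUS_NOT_SUPPORTED          = 15,
    CUTENSOR_STATUS_LICENSE_ERROR          = 16,
    CUTENSOR_STATUS_CUBLAS_ERROR           = 17,
    CUTENSOR_STATUS_CUDA_ERROR             = 18,
    CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE = 19,
    CUTENSOR_STATUS_INSUFFICIENT_DRIVER    = 20,
    CUTENSOR_STATUS_IO_ERROR               = 21
};

#define STATUS_CASE(name) case name: return #name

/* The values are not contiguous, so a switch is used rather than an
 * array indexed by status. */
const char *status_name(int status)
{
    switch (status) {
    STATUS_CASE(CUTENSOR_STATUS_SUCCESS);
    STATUS_CASE(CUTENSOR_STATUS_NOT_INITIALIZED);
    STATUS_CASE(CUTENSOR_STATUS_ALLOC_FAILED);
    STATUS_CASE(CUTENSOR_STATUS_INVALID_VALUE);
    STATUS_CASE(CUTENSOR_STATUS_ARCH_MISMATCH);
    STATUS_CASE(CUTENSOR_STATUS_MAPPING_ERROR);
    STATUS_CASE(CUTENSOR_STATUS_EXECUTION_FAILED);
    STATUS_CASE(CUTENSOR_STATUS_INTERNAL_ERROR);
    STATUS_CASE(CUTENSOR_STATUS_NOT_SUPPORTED);
    STATUS_CASE(CUTENSOR_STATUS_LICENSE_ERROR);
    STATUS_CASE(CUTENSOR_STATUS_CUBLAS_ERROR);
    STATUS_CASE(CUTENSOR_STATUS_CUDA_ERROR);
    STATUS_CASE(CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE);
    STATUS_CASE(CUTENSOR_STATUS_INSUFFICIENT_DRIVER);
    STATUS_CASE(CUTENSOR_STATUS_IO_ERROR);
    default: return "unknown status";
    }
}

/* Returns 1 on success; otherwise logs the failure and returns 0. */
int check_status(int status, const char *what)
{
    assert(what != NULL);
    if (status == CUTENSOR_STATUS_SUCCESS) {
        return 1;
    }
    fprintf(stderr, "%s failed: %s (%d)\n", what, status_name(status), status);
    return 0;
}
```

A caller could, for example, react to CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE by retrying with a larger workspace, while treating most other values as fatal.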