cuTENSOR Data Types#

cutensorDataType_t#

Warning

cutensorDataType_t has been deprecated in favor of cudaDataType_t. Code that uses cutensorDataType_t remains functional, but we recommend switching to cudaDataType_t.


Note

We recommend using the latest CUDA version to get the most out of emulation. If the selected emulated data type is not available for the installed CUDA version or the target device, cuTENSOR automatically falls back to native FP32 or FP64 compute.

cutensorComputeDescriptor_t#

typedef struct cutensorComputeDescriptor *cutensorComputeDescriptor_t#

Opaque structure representing a compute descriptor.

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16F

floating-point: 5-bit exponent and 10-bit mantissa (aka half)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16BF

floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_TF32

floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_3XTF32

floating-point: More precise than TF32, but less precise than float

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_32F

floating-point: 8-bit exponent and 23-bit mantissa (aka float)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_64F

floating-point: 11-bit exponent and 52-bit mantissa (aka double)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_9X16BF

floating-point composed of 3 bf16 values for a total of 23 mantissa bits.

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_8XINT8

floating-point based on multiple int8_t values. The number of retained mantissa bits is computed at runtime to ensure the same or better accuracy than the native floating point representation.
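
The constants above are passed wherever an operation takes a compute descriptor argument (e.g., when creating a contraction or permutation). As a minimal sketch, a hypothetical helper (not part of cuTENSOR) could map a tensor's data type to a matching compute descriptor; the mapping below is a common convention, not a rule imposed by the library:

```c
#include <cutensor.h>

/* Hypothetical helper: choose a compute descriptor that matches the
 * precision of the tensor data type. */
static cutensorComputeDescriptor_t chooseCompute(cudaDataType_t dataType)
{
    switch (dataType) {
        case CUDA_R_16F:  return CUTENSOR_COMPUTE_DESC_16F;   /* half */
        case CUDA_R_16BF: return CUTENSOR_COMPUTE_DESC_16BF;  /* bfloat16 */
        case CUDA_R_64F:
        case CUDA_C_64F:  return CUTENSOR_COMPUTE_DESC_64F;   /* double */
        default:          return CUTENSOR_COMPUTE_DESC_32F;   /* float and complex float */
    }
}
```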


cutensorHandle_t#

typedef struct cutensorHandle *cutensorHandle_t#

Opaque structure holding cuTENSOR’s library context.
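
A handle is typically created once, reused across all subsequent calls, and destroyed at program end. A minimal sketch, assuming the cuTENSOR 2.x entry points cutensorCreate, cutensorDestroy, and cutensorGetErrorString:

```c
#include <cutensor.h>
#include <stdio.h>

int main(void)
{
    cutensorHandle_t handle;
    cutensorStatus_t status = cutensorCreate(&handle);  /* initialize the library context */
    if (status != CUTENSOR_STATUS_SUCCESS) {
        printf("cutensorCreate failed: %s\n", cutensorGetErrorString(status));
        return 1;
    }

    /* ... create descriptors, plans, and execute operations ... */

    cutensorDestroy(handle);  /* release the library context */
    return 0;
}
```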


cutensorTensorDescriptor_t#

typedef struct cutensorTensorDescriptor *cutensorTensorDescriptor_t#

Opaque structure representing a tensor descriptor.
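
Tensor descriptors are created from mode extents, strides, and a data type. A sketch, assuming the cuTENSOR 2.x cutensorCreateTensorDescriptor signature; the column-major layout and 128-byte alignment below are illustrative choices:

```c
#include <cutensor.h>

/* Sketch: describe a 3-mode float tensor A with extents 64 x 32 x 128 stored
 * contiguously in column-major order (stride 1 in the first mode). An alignment
 * of 128 bytes matches buffers obtained from cudaMalloc. */
cutensorStatus_t makeDescA(cutensorHandle_t handle, cutensorTensorDescriptor_t *descA)
{
    const int64_t extent[] = {64, 32, 128};
    const int64_t stride[] = {1, 64, 64 * 32};
    return cutensorCreateTensorDescriptor(handle, descA,
                                          3 /*numModes*/, extent, stride,
                                          CUDA_R_32F, 128 /*alignment in bytes*/);
}
```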


cutensorBlockSparseTensorDescriptor_t#

typedef struct cutensorBlockSparseTensorDescriptor *cutensorBlockSparseTensorDescriptor_t#

Opaque structure representing a block-sparse tensor descriptor.


Beta

This type is in beta and may change in future releases.


cutensorOperationDescriptor_t#

typedef struct cutensorOperationDescriptor *cutensorOperationDescriptor_t#

Opaque structure representing any type of problem descriptor (e.g., contraction, reduction, elementwise).


cutensorOperationDescriptorAttribute_t#

enum cutensorOperationDescriptorAttribute_t#

This enum lists all attributes of a cutensorOperationDescriptor_t that can be modified (see cutensorOperationDescriptorSetAttribute and cutensorOperationDescriptorGetAttribute).

Values:

enumerator CUTENSOR_OPERATION_DESCRIPTOR_TAG#

int32_t: Enables users to distinguish two otherwise identical problems w.r.t. the software-managed plan cache (default value: 0)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE#

cudaDataType_t: data type of the scaling factors

enumerator CUTENSOR_OPERATION_DESCRIPTOR_FLOPS#

float: number of floating-point operations necessary to perform this operation (assuming all scalars are nonzero, unless otherwise specified)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_MOVED_BYTES#

float: minimal number of bytes transferred from/to global memory (assuming all scalars are nonzero, unless otherwise specified)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_LEFT#

uint32_t[] (of size descOut->numModes): Each entry i holds the number of values to pad to the left of the i-th dimension

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_RIGHT#

uint32_t[] (of size descOut->numModes): Each entry i holds the number of values to pad to the right of the i-th dimension

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_VALUE#

host-side pointer to element of the same type as the output tensor: Constant padding value
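
These attributes are read and written via cutensorOperationDescriptorGetAttribute and cutensorOperationDescriptorSetAttribute. A sketch, assuming the cuTENSOR 2.x (handle, desc, attr, buf, sizeInBytes) parameter order and an already-created operation descriptor:

```c
#include <cutensor.h>

/* Sketch: tag an operation descriptor and query its FLOP estimate.
 * 'handle' and 'desc' are assumed to have been created beforehand. */
void inspectOperation(cutensorHandle_t handle, cutensorOperationDescriptor_t desc)
{
    int32_t tag = 42;  /* arbitrary user tag to distinguish otherwise identical problems */
    cutensorOperationDescriptorSetAttribute(handle, desc,
        CUTENSOR_OPERATION_DESCRIPTOR_TAG, &tag, sizeof(tag));

    float flops = 0.0f;
    cutensorOperationDescriptorGetAttribute(handle, desc,
        CUTENSOR_OPERATION_DESCRIPTOR_FLOPS, &flops, sizeof(flops));
}
```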


cutensorPlanPreference_t#

typedef struct cutensorPlanPreference *cutensorPlanPreference_t#

Opaque structure that narrows down the space of applicable algorithms/variants/kernels.


cutensorPlanPreferenceAttribute_t#

enum cutensorPlanPreferenceAttribute_t#

This enum lists all attributes of a cutensorPlanPreference_t object that can be modified.

Values:

enumerator CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE#

cutensorAutotuneMode_t: Determines if recurrent executions of the plan (e.g., via cutensorContract, cutensorPermute) should autotune (i.e., try different kernels); see section “Plan Cache” for details.

enumerator CUTENSOR_PLAN_PREFERENCE_CACHE_MODE#

cutensorCacheMode_t: Determines if the corresponding algorithm/kernel for this plan should be cached, and gives fine control over what is considered a cache hit.

enumerator CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT#

int32_t: Maximum number of kernels to try; only applicable if CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE is set to CUTENSOR_AUTOTUNE_MODE_INCREMENTAL

enumerator CUTENSOR_PLAN_PREFERENCE_ALGO#

cutensorAlgo_t: Fixes a certain cutensorAlgo_t

enumerator CUTENSOR_PLAN_PREFERENCE_KERNEL_RANK#

int32_t: Fixes a kernel (a sub-variant of an algorithm); e.g., kernel_rank==1 with algo == CUTENSOR_ALGO_TGETT would select the second-best GETT kernel/variant according to cuTENSOR’s performance model, and kernel_rank==2 would select the third-best

enumerator CUTENSOR_PLAN_PREFERENCE_JIT#

cutensorJitMode_t: determines if just-in-time compilation is enabled or disabled (default: CUTENSOR_JIT_MODE_NONE)

enumerator CUTENSOR_PLAN_PREFERENCE_GPU_ARCH#

int32_t: Plan for a specific GPU architecture and not one associated with the context. Value should encode SM version via 10 * SM.major + SM.minor. Currently only SM versions 80, 90 and 100 are supported.
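
A plan preference is created first and then narrowed via its attributes. A sketch, assuming the cuTENSOR 2.x cutensorCreatePlanPreference and cutensorPlanPreferenceSetAttribute signatures (the choice of kernel_rank==1 is illustrative):

```c
#include <cutensor.h>

/* Sketch: let the performance model pick the algorithm, enable JIT compilation,
 * and request the second-best kernel of that algorithm (kernel_rank == 1). */
cutensorStatus_t makePreference(cutensorHandle_t handle, cutensorPlanPreference_t *pref)
{
    cutensorStatus_t status = cutensorCreatePlanPreference(handle, pref,
        CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_DEFAULT);
    if (status != CUTENSOR_STATUS_SUCCESS) return status;

    const int32_t kernelRank = 1;
    return cutensorPlanPreferenceSetAttribute(handle, *pref,
        CUTENSOR_PLAN_PREFERENCE_KERNEL_RANK, &kernelRank, sizeof(kernelRank));
}
```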


cutensorPlan_t#

typedef struct cutensorPlan *cutensorPlan_t#

Opaque structure representing a plan (e.g., contraction, reduction, elementwise).


cutensorPlanAttribute_t#

enum cutensorPlanAttribute_t#

This enum lists all attributes of a cutensorPlan_t object that can be retrieved via cutensorPlanGetAttribute.

Values:

enumerator CUTENSOR_PLAN_REQUIRED_WORKSPACE#

uint64_t: exact required workspace in bytes that is needed to execute the plan
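
This attribute is typically queried right after plan creation to size the workspace allocation. A sketch, assuming cutensorPlanGetAttribute takes (handle, plan, attr, buf, sizeInBytes) and that 'handle' and 'plan' already exist:

```c
#include <cutensor.h>

/* Sketch: query the exact workspace (in bytes) a plan needs before allocating it. */
uint64_t requiredWorkspace(cutensorHandle_t handle, cutensorPlan_t plan)
{
    uint64_t workspaceSize = 0;
    cutensorPlanGetAttribute(handle, plan,
        CUTENSOR_PLAN_REQUIRED_WORKSPACE, &workspaceSize, sizeof(workspaceSize));
    return workspaceSize;
}
```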


cutensorAutotuneMode_t#

enum cutensorAutotuneMode_t#

This enum determines the mode w.r.t. cuTENSOR’s auto-tuning capability.

Values:

enumerator CUTENSOR_AUTOTUNE_MODE_NONE#

Indicates no autotuning (default); in this case the cache helps to reduce the plan-creation overhead. In the case of a cache hit, the cached plan is reused; otherwise the plan cache is bypassed.

enumerator CUTENSOR_AUTOTUNE_MODE_INCREMENTAL#

Indicates incremental autotuning (i.e., each invocation of the corresponding cutensorCreatePlan() creates a plan based on a different algorithm/kernel; the maximum number of kernels that will be tried is defined by the CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT attribute of cutensorPlanPreferenceAttribute_t). WARNING: If this autotuning mode is selected, bit-wise identical results cannot be guaranteed (since different algorithms could be executed).
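
Incremental autotuning is enabled through the plan preference. A sketch, assuming the cutensorPlanPreferenceSetAttribute signature from the previous example and an already-created preference object; the count of 4 is an illustrative choice:

```c
#include <cutensor.h>

/* Sketch: enable incremental autotuning so that up to four candidate kernels
 * are tried across successive plan creations for the same problem. */
void enableIncrementalAutotuning(cutensorHandle_t handle, cutensorPlanPreference_t pref)
{
    const cutensorAutotuneMode_t mode = CUTENSOR_AUTOTUNE_MODE_INCREMENTAL;
    cutensorPlanPreferenceSetAttribute(handle, pref,
        CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE, &mode, sizeof(mode));

    const int32_t count = 4;  /* maximum number of kernels to try */
    cutensorPlanPreferenceSetAttribute(handle, pref,
        CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT, &count, sizeof(count));
}
```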


cutensorJitMode_t#

enum cutensorJitMode_t#

This enum determines the mode w.r.t. cuTENSOR’s just-in-time compilation capability.

Values:

enumerator CUTENSOR_JIT_MODE_NONE#

Indicates that no kernel will be just-in-time compiled.

enumerator CUTENSOR_JIT_MODE_DEFAULT#

Indicates that the corresponding plan will try to compile a dedicated kernel for the given operation. Only supported for GPUs with compute capability >= 8.0 (Ampere or newer).


cutensorCacheMode_t#

enum cutensorCacheMode_t#

This enum defines what is considered a cache hit.

Values:

enumerator CUTENSOR_CACHE_MODE_NONE#

Plan will not be cached.

enumerator CUTENSOR_CACHE_MODE_PEDANTIC#

All parameters of the corresponding descriptor must be identical to the cached plan (default).


cutensorAlgo_t#

enum cutensorAlgo_t#

Allows users to specify the algorithm to be used for performing the desired tensor operation.

Values:

enumerator CUTENSOR_ALGO_DEFAULT_PATIENT#

More time-consuming than CUTENSOR_ALGO_DEFAULT, but typically provides a more accurate kernel selection.

enumerator CUTENSOR_ALGO_GETT#

Choose the GETT algorithm (only applicable to contractions)

enumerator CUTENSOR_ALGO_TGETT#

Transpose (A or B) + GETT (only applicable to contractions)

enumerator CUTENSOR_ALGO_TTGT#

Transpose-Transpose-GEMM-Transpose (requires additional memory) (only applicable to contractions)

enumerator CUTENSOR_ALGO_DEFAULT#

A performance model chooses the appropriate algorithm and kernel.


cutensorWorksizePreference_t#

enum cutensorWorksizePreference_t#

This enum gives users finer control over the amount of workspace that is suggested by cutensorEstimateWorkspaceSize.

Values:

enumerator CUTENSOR_WORKSPACE_MIN#

Least memory requirement; at least one algorithm will be available.

enumerator CUTENSOR_WORKSPACE_DEFAULT#

Aims to attain high performance while also reducing the workspace requirement.

enumerator CUTENSOR_WORKSPACE_MAX#

Highest memory requirement; all algorithms will be available (choose this option if memory footprint is not a concern)
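
The preference is passed to cutensorEstimateWorkspaceSize to trade memory footprint against kernel availability. A sketch, assuming the cuTENSOR 2.x (handle, desc, planPref, workspacePref, &size) parameter order and already-created descriptor and preference objects:

```c
#include <cutensor.h>

/* Sketch: estimate a workspace size for a previously created operation
 * descriptor and plan preference, using the default memory/performance trade-off. */
uint64_t estimateWorkspace(cutensorHandle_t handle,
                           cutensorOperationDescriptor_t desc,
                           cutensorPlanPreference_t pref)
{
    uint64_t workspaceSizeEstimate = 0;
    cutensorEstimateWorkspaceSize(handle, desc, pref,
        CUTENSOR_WORKSPACE_DEFAULT, &workspaceSizeEstimate);
    return workspaceSizeEstimate;
}
```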


cutensorOperator_t#

enum cutensorOperator_t#

This enum captures all unary and binary element-wise operations supported by the cuTENSOR library.

Values:

enumerator CUTENSOR_OP_IDENTITY#

Identity operator (i.e., elements are not changed)

enumerator CUTENSOR_OP_SQRT#

Square root.

enumerator CUTENSOR_OP_RELU#

Rectified linear unit.

enumerator CUTENSOR_OP_CONJ#

Complex conjugate.

enumerator CUTENSOR_OP_RCP#

Reciprocal.

enumerator CUTENSOR_OP_SIGMOID#

y=1/(1+exp(-x))

enumerator CUTENSOR_OP_TANH#

y=tanh(x)

enumerator CUTENSOR_OP_EXP#

Exponentiation.

enumerator CUTENSOR_OP_LOG#

Log (base e).

enumerator CUTENSOR_OP_ABS#

Absolute value.

enumerator CUTENSOR_OP_NEG#

Negation.

enumerator CUTENSOR_OP_SIN#

Sine.

enumerator CUTENSOR_OP_COS#

Cosine.

enumerator CUTENSOR_OP_TAN#

Tangent.

enumerator CUTENSOR_OP_SINH#

Hyperbolic sine.

enumerator CUTENSOR_OP_COSH#

Hyperbolic cosine.

enumerator CUTENSOR_OP_ASIN#

Inverse sine.

enumerator CUTENSOR_OP_ACOS#

Inverse cosine.

enumerator CUTENSOR_OP_ATAN#

Inverse tangent.

enumerator CUTENSOR_OP_ASINH#

Inverse hyperbolic sine.

enumerator CUTENSOR_OP_ACOSH#

Inverse hyperbolic cosine.

enumerator CUTENSOR_OP_ATANH#

Inverse hyperbolic tangent.

enumerator CUTENSOR_OP_CEIL#

Ceiling.

enumerator CUTENSOR_OP_FLOOR#

Floor.

enumerator CUTENSOR_OP_MISH#

Mish y=x*tanh(softplus(x)).

enumerator CUTENSOR_OP_SWISH#

Swish y=x*sigmoid(x).

enumerator CUTENSOR_OP_SOFT_PLUS#

Softplus y=log(exp(x)+1).

enumerator CUTENSOR_OP_SOFT_SIGN#

Softsign y=x/(abs(x)+1).

enumerator CUTENSOR_OP_ADD#

Addition of two elements.

enumerator CUTENSOR_OP_MUL#

Multiplication of two elements.

enumerator CUTENSOR_OP_MAX#

Maximum of two elements.

enumerator CUTENSOR_OP_MIN#

Minimum of two elements.

enumerator CUTENSOR_OP_UNKNOWN#

reserved for internal use only
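
Unary operators such as CUTENSOR_OP_RELU are applied element-wise to an input tensor as part of an operation. A sketch, assuming the cuTENSOR 2.x cutensorCreatePermutation signature and pre-existing tensor descriptors; here the output is the transposed, rectified input:

```c
#include <cutensor.h>

/* Sketch: create an operation descriptor for B[j,i] = relu(A[i,j]).
 * 'handle', 'descA', and 'descB' are assumed to have been created beforehand. */
cutensorStatus_t makeReluTranspose(cutensorHandle_t handle,
                                   cutensorTensorDescriptor_t descA,
                                   cutensorTensorDescriptor_t descB,
                                   cutensorOperationDescriptor_t *desc)
{
    const int32_t modeA[] = {'i', 'j'};
    const int32_t modeB[] = {'j', 'i'};
    return cutensorCreatePermutation(handle, desc,
        descA, modeA, CUTENSOR_OP_RELU,   /* unary operator applied to A */
        descB, modeB,
        CUTENSOR_COMPUTE_DESC_32F);
}
```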


cutensorStatus_t#

enum cutensorStatus_t#

This enum is used for function status returns. All cuTENSOR library functions return their status, which can have the following values.

Values:

enumerator CUTENSOR_STATUS_SUCCESS#

The operation completed successfully.

enumerator CUTENSOR_STATUS_NOT_INITIALIZED#

The opaque data structure was not initialized.

enumerator CUTENSOR_STATUS_ALLOC_FAILED#

Resource allocation failed inside the cuTENSOR library.

enumerator CUTENSOR_STATUS_INVALID_VALUE#

An unsupported value or parameter was passed to the function (indicates a user error).

enumerator CUTENSOR_STATUS_ARCH_MISMATCH#

Indicates that the device is either not ready, or the target architecture is not supported.

enumerator CUTENSOR_STATUS_MAPPING_ERROR#

An access to GPU memory space failed, which is usually caused by a failure to bind a texture.

enumerator CUTENSOR_STATUS_EXECUTION_FAILED#

The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.

enumerator CUTENSOR_STATUS_INTERNAL_ERROR#

An internal cuTENSOR error has occurred.

enumerator CUTENSOR_STATUS_NOT_SUPPORTED#

The requested operation is not supported.

enumerator CUTENSOR_STATUS_LICENSE_ERROR#

The requested functionality requires a license, and an error was detected when trying to check the current licensing.

enumerator CUTENSOR_STATUS_CUBLAS_ERROR#

A call to cuBLAS did not succeed.

enumerator CUTENSOR_STATUS_CUDA_ERROR#

Some unknown CUDA error has occurred.

enumerator CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE#

The provided workspace was insufficient.

enumerator CUTENSOR_STATUS_INSUFFICIENT_DRIVER#

Indicates that the driver version is insufficient.

enumerator CUTENSOR_STATUS_IO_ERROR#

Indicates an error related to file I/O.
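
A common pattern is to wrap every cuTENSOR call in an error-checking macro that converts the status to a readable message via cutensorGetErrorString. A minimal sketch:

```c
#include <cutensor.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: abort with a readable message whenever a cuTENSOR call fails. */
#define HANDLE_CUTENSOR(x)                                                \
    do {                                                                  \
        cutensorStatus_t err_ = (x);                                      \
        if (err_ != CUTENSOR_STATUS_SUCCESS) {                            \
            printf("cuTENSOR error %s at %s:%d\n",                        \
                   cutensorGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)
```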


cudaDataType_t#

enum cudaDataType_t#

cudaDataType_t is an enumeration of the types supported by CUDA libraries. cuTENSOR supports real FP16, BF16, FP32 and FP64 as well as complex FP32 and FP64 input types.

Values:

enumerator CUDA_R_16F#

16-bit real half precision floating-point type

enumerator CUDA_R_16BF#

16-bit real BF16 floating-point type

enumerator CUDA_R_32F#

32-bit real single precision floating-point type

enumerator CUDA_C_32F#

32-bit complex single precision floating-point type (represented as pair of real and imaginary part)

enumerator CUDA_R_64F#

64-bit real double precision floating-point type

enumerator CUDA_C_64F#

64-bit complex double precision floating-point type (represented as pair of real and imaginary part)


cutensorLoggerCallback_t#

typedef void (*cutensorLoggerCallback_t)(int32_t logLevel, const char *functionName, const char *message)#

A function pointer type for logging.
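
A callback with this signature receives the log level, the name of the API function emitting the message, and the message text. A sketch of a user-defined callback, assuming cuTENSOR's logging API provides cutensorLoggerSetCallback for registration:

```c
#include <cutensor.h>
#include <stdio.h>

/* Sketch: forward cuTENSOR log messages to stderr. */
static void myLogger(int32_t logLevel, const char *functionName, const char *message)
{
    fprintf(stderr, "[cuTENSOR][level %d] %s: %s\n", logLevel, functionName, message);
}

/* Registration (e.g., early in main): cutensorLoggerSetCallback(myLogger); */
```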