cuTENSOR Data Types#

cutensorDataType_t#

Warning

cutensorDataType_t has been deprecated in favor of cudaDataType_t. Code that uses cutensorDataType_t remains functional, but we recommend switching to cudaDataType_t.


Note

We recommend using the latest CUDA version to get the most out of emulation. If the selected emulated data type is not available for the installed CUDA version or the target device, cuTENSOR automatically falls back to native FP32 or FP64 compute.

cutensorComputeDescriptor_t#

typedef struct cutensorComputeDescriptor *cutensorComputeDescriptor_t#

Opaque structure representing a compute descriptor.

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16F

floating-point: 5-bit exponent and 10-bit mantissa (aka half)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16BF

floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_TF32

floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_3XTF32

floating-point: More precise than TF32, but less precise than float

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_32F

floating-point: 8-bit exponent and 23-bit mantissa (aka float)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_64F

floating-point: 11-bit exponent and 52-bit mantissa (aka double)

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_9X16BF

floating-point composed of 3 bf16 values for a total of 23 mantissa bits.

CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_8XINT8

floating-point based on multiple int8_t values. The number of retained mantissa bits is computed at runtime to ensure the same or better accuracy than the native floating point representation.
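
The constants above are passed wherever an operation takes a compute descriptor argument (e.g., when creating a contraction or permutation). As a minimal sketch, a hypothetical helper (not part of cuTENSOR) could map a tensor's data type to a matching compute descriptor; the mapping below is a common convention, not a rule imposed by the library:

```c
#include <cutensor.h>

/* Hypothetical helper: choose a compute descriptor that matches the
 * precision of the tensor data type. */
static cutensorComputeDescriptor_t chooseCompute(cudaDataType_t dataType)
{
    switch (dataType) {
        case CUDA_R_16F:  return CUTENSOR_COMPUTE_DESC_16F;   /* half */
        case CUDA_R_16BF: return CUTENSOR_COMPUTE_DESC_16BF;  /* bfloat16 */
        case CUDA_R_64F:
        case CUDA_C_64F:  return CUTENSOR_COMPUTE_DESC_64F;   /* double */
        default:          return CUTENSOR_COMPUTE_DESC_32F;   /* float and complex float */
    }
}
```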


cutensorHandle_t#

typedef struct cutensorHandle *cutensorHandle_t#

Opaque structure holding cuTENSOR’s library context.
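
A handle is typically created once, reused across all subsequent calls, and destroyed at program end. A minimal sketch, assuming the cuTENSOR 2.x entry points cutensorCreate, cutensorDestroy, and cutensorGetErrorString:

```c
#include <cutensor.h>
#include <stdio.h>

int main(void)
{
    cutensorHandle_t handle;
    cutensorStatus_t status = cutensorCreate(&handle);  /* initialize the library context */
    if (status != CUTENSOR_STATUS_SUCCESS) {
        printf("cutensorCreate failed: %s\n", cutensorGetErrorString(status));
        return 1;
    }

    /* ... create descriptors, plans, and execute operations ... */

    cutensorDestroy(handle);  /* release the library context */
    return 0;
}
```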


cutensorTensorDescriptor_t#

typedef struct cutensorTensorDescriptor *cutensorTensorDescriptor_t#

Opaque structure representing a tensor descriptor.
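
Tensor descriptors are created from mode extents, strides, and a data type. A sketch, assuming the cuTENSOR 2.x cutensorCreateTensorDescriptor signature; the column-major layout and 128-byte alignment below are illustrative choices:

```c
#include <cutensor.h>

/* Sketch: describe a 3-mode float tensor A with extents 64 x 32 x 128 stored
 * contiguously in column-major order (stride 1 in the first mode). An alignment
 * of 128 bytes matches buffers obtained from cudaMalloc. */
cutensorStatus_t makeDescA(cutensorHandle_t handle, cutensorTensorDescriptor_t *descA)
{
    const int64_t extent[] = {64, 32, 128};
    const int64_t stride[] = {1, 64, 64 * 32};
    return cutensorCreateTensorDescriptor(handle, descA,
                                          3 /*numModes*/, extent, stride,
                                          CUDA_R_32F, 128 /*alignment in bytes*/);
}
```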


cutensorBlockSparseTensorDescriptor_t#

typedef struct cutensorBlockSparseTensorDescriptor *cutensorBlockSparseTensorDescriptor_t#

Opaque structure representing a block-sparse tensor descriptor.


Beta

This type is in beta and may change in future releases.


cutensorOperationDescriptor_t#

typedef struct cutensorOperationDescriptor *cutensorOperationDescriptor_t#

Opaque structure representing any type of problem descriptor (e.g., contraction, reduction, elementwise).


cutensorOperationDescriptorAttribute_t#

enum cutensorOperationDescriptorAttribute_t#

This enum lists all attributes of a cutensorOperationDescriptor_t that can be modified (see cutensorOperationDescriptorSetAttribute and cutensorOperationDescriptorGetAttribute).

Values:

enumerator CUTENSOR_OPERATION_DESCRIPTOR_TAG#

int32_t: Enables users to distinguish two otherwise identical problems w.r.t. the software-managed plan cache (default value: 0)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE#

cudaDataType_t: data type of the scaling factors

enumerator CUTENSOR_OPERATION_DESCRIPTOR_FLOPS#

float: number of floating-point operations necessary to perform this operation (assuming all scalars are nonzero, unless otherwise specified)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_MOVED_BYTES#

float: minimal number of bytes transferred from/to global memory (assuming all scalars are nonzero, unless otherwise specified)

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_LEFT#

uint32_t[] (of size descOut->numModes): Each entry i holds the number of values to pad to the left of the i-th dimension

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_RIGHT#

uint32_t[] (of size descOut->numModes): Each entry i holds the number of values to pad to the right of the i-th dimension

enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_VALUE#

host-side pointer to element of the same type as the output tensor: Constant padding value
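
These attributes are read and written via cutensorOperationDescriptorGetAttribute and cutensorOperationDescriptorSetAttribute. A sketch, assuming the cuTENSOR 2.x (handle, desc, attr, buf, sizeInBytes) parameter order and an already-created operation descriptor:

```c
#include <cutensor.h>

/* Sketch: tag an operation descriptor and query its FLOP estimate.
 * 'handle' and 'desc' are assumed to have been created beforehand. */
void inspectOperation(cutensorHandle_t handle, cutensorOperationDescriptor_t desc)
{
    int32_t tag = 42;  /* arbitrary user tag to distinguish otherwise identical problems */
    cutensorOperationDescriptorSetAttribute(handle, desc,
        CUTENSOR_OPERATION_DESCRIPTOR_TAG, &tag, sizeof(tag));

    float flops = 0.0f;
    cutensorOperationDescriptorGetAttribute(handle, desc,
        CUTENSOR_OPERATION_DESCRIPTOR_FLOPS, &flops, sizeof(flops));
}
```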


cutensorPlanPreference_t#

typedef struct cutensorPlanPreference *cutensorPlanPreference_t#

Opaque structure that narrows down the space of applicable algorithms/variants/kernels.


cutensorPlanPreferenceAttribute_t#

enum cutensorPlanPreferenceAttribute_t#

This enum lists all attributes of a cutensorPlanPreference_t object that can be modified.

Values:

enumerator CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE#

cutensorAutotuneMode_t: Determines if recurrent executions of the plan (e.g., via cutensorContract, cutensorPermute) should autotune (i.e., try different kernels); see section “Plan Cache” for details.

enumerator CUTENSOR_PLAN_PREFERENCE_CACHE_MODE#

cutensorCacheMode_t: Determines if the corresponding algorithm/kernel for this plan should be cached, and gives fine control over what is considered a cache hit.

enumerator CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT#

int32_t: Maximum number of kernels to try; only applicable if CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE is set to CUTENSOR_AUTOTUNE_MODE_INCREMENTAL

enumerator CUTENSOR_PLAN_PREFERENCE_ALGO#

cutensorAlgo_t: Fixes a certain cutensorAlgo_t

enumerator CUTENSOR_PLAN_PREFERENCE_KERNEL_RANK#

int32_t: Fixes a kernel (a sub-variant of an algorithm); e.g., kernel_rank==1 with algo == CUTENSOR_ALGO_TGETT would select the second-best GETT kernel/variant according to cuTENSOR’s performance model, and kernel_rank==2 would select the third-best

enumerator CUTENSOR_PLAN_PREFERENCE_JIT#

cutensorJitMode_t: determines if just-in-time compilation is enabled or disabled (default: CUTENSOR_JIT_MODE_NONE)

enumerator CUTENSOR_PLAN_PREFERENCE_GPU_ARCH#

int32_t: Plan for a specific GPU architecture and not one associated with the context. Value should encode SM version via 10 * SM.major + SM.minor. Currently only SM versions 80, 90 and 100 are supported.
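
A plan preference is created first and then narrowed via its attributes. A sketch, assuming the cuTENSOR 2.x cutensorCreatePlanPreference and cutensorPlanPreferenceSetAttribute signatures (the choice of kernel_rank==1 is illustrative):

```c
#include <cutensor.h>

/* Sketch: let the performance model pick the algorithm, enable JIT compilation,
 * and request the second-best kernel of that algorithm (kernel_rank == 1). */
cutensorStatus_t makePreference(cutensorHandle_t handle, cutensorPlanPreference_t *pref)
{
    cutensorStatus_t status = cutensorCreatePlanPreference(handle, pref,
        CUTENSOR_ALGO_DEFAULT, CUTENSOR_JIT_MODE_DEFAULT);
    if (status != CUTENSOR_STATUS_SUCCESS) return status;

    const int32_t kernelRank = 1;
    return cutensorPlanPreferenceSetAttribute(handle, *pref,
        CUTENSOR_PLAN_PREFERENCE_KERNEL_RANK, &kernelRank, sizeof(kernelRank));
}
```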


cutensorPlan_t#

typedef struct cutensorPlan *cutensorPlan_t#

Opaque structure representing a plan (e.g., contraction, reduction, elementwise).


cutensorPlanAttribute_t#

enum cutensorPlanAttribute_t#

This enum lists all attributes of a cutensorPlan_t object that can be retrieved via cutensorPlanGetAttribute.

Values:

enumerator CUTENSOR_PLAN_REQUIRED_WORKSPACE#

uint64_t: exact required workspace in bytes that is needed to execute the plan
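
This attribute is typically queried right after plan creation to size the workspace allocation. A sketch, assuming cutensorPlanGetAttribute takes (handle, plan, attr, buf, sizeInBytes) and that 'handle' and 'plan' already exist:

```c
#include <cutensor.h>

/* Sketch: query the exact workspace (in bytes) a plan needs before allocating it. */
uint64_t requiredWorkspace(cutensorHandle_t handle, cutensorPlan_t plan)
{
    uint64_t workspaceSize = 0;
    cutensorPlanGetAttribute(handle, plan,
        CUTENSOR_PLAN_REQUIRED_WORKSPACE, &workspaceSize, sizeof(workspaceSize));
    return workspaceSize;
}
```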


cutensorAutotuneMode_t#

enum cutensorAutotuneMode_t#

This enum determines the mode w.r.t. cuTENSOR’s auto-tuning capability.

Values:

enumerator CUTENSOR_AUTOTUNE_MODE_NONE#

Indicates no autotuning (default); in this case the cache helps to reduce the plan-creation overhead. In the case of a cache hit, the cached plan is reused; otherwise the plan cache is bypassed.

enumerator CUTENSOR_AUTOTUNE_MODE_INCREMENTAL#

Indicates incremental autotuning (i.e., each invocation of the corresponding cutensorCreatePlan() creates a plan based on a different algorithm/kernel; the maximum number of kernels that will be tried is defined by the CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT attribute of cutensorPlanPreferenceAttribute_t). WARNING: If this autotuning mode is selected, bit-wise identical results cannot be guaranteed (since different algorithms could be executed).
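
Incremental autotuning is enabled through the plan preference. A sketch, assuming the cutensorPlanPreferenceSetAttribute signature from the previous example and an already-created preference object; the count of 4 is an illustrative choice:

```c
#include <cutensor.h>

/* Sketch: enable incremental autotuning so that up to four candidate kernels
 * are tried across successive plan creations for the same problem. */
void enableIncrementalAutotuning(cutensorHandle_t handle, cutensorPlanPreference_t pref)
{
    const cutensorAutotuneMode_t mode = CUTENSOR_AUTOTUNE_MODE_INCREMENTAL;
    cutensorPlanPreferenceSetAttribute(handle, pref,
        CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE, &mode, sizeof(mode));

    const int32_t count = 4;  /* maximum number of kernels to try */
    cutensorPlanPreferenceSetAttribute(handle, pref,
        CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT, &count, sizeof(count));
}
```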


cutensorJitMode_t#

enum cutensorJitMode_t#

This enum determines the mode w.r.t. cuTENSOR’s just-in-time compilation capability.

Values:

enumerator CUTENSOR_JIT_MODE_NONE#

Indicates that no kernel will be just-in-time compiled.

enumerator CUTENSOR_JIT_MODE_DEFAULT#

Indicates that the corresponding plan will try to compile a dedicated kernel for the given operation. Only supported for GPUs with compute capability >= 8.0 (Ampere or newer).


cutensorCacheMode_t#

enum cutensorCacheMode_t#

This enum defines what is considered a cache hit.

Values:

enumerator CUTENSOR_CACHE_MODE_NONE#

Plan will not be cached.

enumerator CUTENSOR_CACHE_MODE_PEDANTIC#

All parameters of the corresponding descriptor must be identical to the cached plan (default).


cutensorAlgo_t#

enum cutensorAlgo_t#

Allows users to specify the algorithm to be used for performing the desired tensor operation.

Values:

enumerator CUTENSOR_ALGO_DEFAULT_PATIENT#

More time-consuming than CUTENSOR_ALGO_DEFAULT, but typically provides a more accurate kernel selection.

enumerator CUTENSOR_ALGO_GETT#

Choose the GETT algorithm (only applicable to contractions)

enumerator CUTENSOR_ALGO_TGETT#

Transpose (A or B) + GETT (only applicable to contractions)

enumerator CUTENSOR_ALGO_TTGT#

Transpose-Transpose-GEMM-Transpose (requires additional memory) (only applicable to contractions)

enumerator CUTENSOR_ALGO_DEFAULT#

A performance model chooses the appropriate algorithm and kernel.


cutensorWorksizePreference_t#

enum cutensorWorksizePreference_t#

This enum gives users finer control over the amount of workspace that is suggested by cutensorEstimateWorkspaceSize.

Values:

enumerator CUTENSOR_WORKSPACE_MIN#

Least memory requirement; at least one algorithm will be available.

enumerator CUTENSOR_WORKSPACE_DEFAULT#

Aims to attain high performance while also reducing the workspace requirement.

enumerator CUTENSOR_WORKSPACE_MAX#

Highest memory requirement; all algorithms will be available (choose this option if memory footprint is not a concern)
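
The preference is passed to cutensorEstimateWorkspaceSize to trade memory footprint against kernel availability. A sketch, assuming the cuTENSOR 2.x (handle, desc, planPref, workspacePref, &size) parameter order and already-created descriptor and preference objects:

```c
#include <cutensor.h>

/* Sketch: estimate a workspace size for a previously created operation
 * descriptor and plan preference, using the default memory/performance trade-off. */
uint64_t estimateWorkspace(cutensorHandle_t handle,
                           cutensorOperationDescriptor_t desc,
                           cutensorPlanPreference_t pref)
{
    uint64_t workspaceSizeEstimate = 0;
    cutensorEstimateWorkspaceSize(handle, desc, pref,
        CUTENSOR_WORKSPACE_DEFAULT, &workspaceSizeEstimate);
    return workspaceSizeEstimate;
}
```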


cutensorOperator_t#

enum cutensorOperator_t#

This enum captures all unary and binary element-wise operations supported by the cuTENSOR library.

Values:

enumerator CUTENSOR_OP_IDENTITY#

Identity operator (i.e., elements are not changed)

enumerator CUTENSOR_OP_SQRT#

Square root.

enumerator CUTENSOR_OP_RELU#

Rectified linear unit.

enumerator CUTENSOR_OP_CONJ#

Complex conjugate.

enumerator CUTENSOR_OP_RCP#

Reciprocal.

enumerator CUTENSOR_OP_SIGMOID#

y=1/(1+exp(-x))

enumerator CUTENSOR_OP_TANH#

y=tanh(x)

enumerator CUTENSOR_OP_EXP#

Exponentiation.

enumerator CUTENSOR_OP_LOG#

Log (base e).

enumerator CUTENSOR_OP_ABS#

Absolute value.

enumerator CUTENSOR_OP_NEG#

Negation.

enumerator CUTENSOR_OP_SIN#

Sine.

enumerator CUTENSOR_OP_COS#

Cosine.

enumerator CUTENSOR_OP_TAN#

Tangent.

enumerator CUTENSOR_OP_SINH#

Hyperbolic sine.

enumerator CUTENSOR_OP_COSH#

Hyperbolic cosine.

enumerator CUTENSOR_OP_ASIN#

Inverse sine.

enumerator CUTENSOR_OP_ACOS#

Inverse cosine.

enumerator CUTENSOR_OP_ATAN#

Inverse tangent.

enumerator CUTENSOR_OP_ASINH#

Inverse hyperbolic sine.

enumerator CUTENSOR_OP_ACOSH#

Inverse hyperbolic cosine.

enumerator CUTENSOR_OP_ATANH#

Inverse hyperbolic tangent.

enumerator CUTENSOR_OP_CEIL#

Ceiling.

enumerator CUTENSOR_OP_FLOOR#

Floor.

enumerator CUTENSOR_OP_MISH#

Mish y=x*tanh(softplus(x)).

enumerator CUTENSOR_OP_SWISH#

Swish y=x*sigmoid(x).

enumerator CUTENSOR_OP_SOFT_PLUS#

Softplus y=log(exp(x)+1).

enumerator CUTENSOR_OP_SOFT_SIGN#

Softsign y=x/(abs(x)+1).

enumerator CUTENSOR_OP_ADD#

Addition of two elements.

enumerator CUTENSOR_OP_MUL#

Multiplication of two elements.

enumerator CUTENSOR_OP_MAX#

Maximum of two elements.

enumerator CUTENSOR_OP_MIN#

Minimum of two elements.

enumerator CUTENSOR_OP_UNKNOWN#

reserved for internal use only
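
Unary operators such as CUTENSOR_OP_RELU are applied element-wise to an input tensor as part of an operation. A sketch, assuming the cuTENSOR 2.x cutensorCreatePermutation signature and pre-existing tensor descriptors; here the output is the transposed, rectified input:

```c
#include <cutensor.h>

/* Sketch: create an operation descriptor for B[j,i] = relu(A[i,j]).
 * 'handle', 'descA', and 'descB' are assumed to have been created beforehand. */
cutensorStatus_t makeReluTranspose(cutensorHandle_t handle,
                                   cutensorTensorDescriptor_t descA,
                                   cutensorTensorDescriptor_t descB,
                                   cutensorOperationDescriptor_t *desc)
{
    const int32_t modeA[] = {'i', 'j'};
    const int32_t modeB[] = {'j', 'i'};
    return cutensorCreatePermutation(handle, desc,
        descA, modeA, CUTENSOR_OP_RELU,   /* unary operator applied to A */
        descB, modeB,
        CUTENSOR_COMPUTE_DESC_32F);
}
```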


cutensorStatus_t#

enum cutensorStatus_t#

This enum is used for function status returns. All cuTENSOR library functions return their status, which can have the following values.

Values:

enumerator CUTENSOR_STATUS_SUCCESS#

The operation completed successfully.

enumerator CUTENSOR_STATUS_NOT_INITIALIZED#

The opaque data structure was not initialized.

enumerator CUTENSOR_STATUS_ALLOC_FAILED#

Resource allocation failed inside the cuTENSOR library.

enumerator CUTENSOR_STATUS_INVALID_VALUE#

An unsupported value or parameter was passed to the function (indicates a user error).

enumerator CUTENSOR_STATUS_ARCH_MISMATCH#

Indicates that the device is either not ready, or the target architecture is not supported.

enumerator CUTENSOR_STATUS_MAPPING_ERROR#

An access to GPU memory space failed, which is usually caused by a failure to bind a texture.

enumerator CUTENSOR_STATUS_EXECUTION_FAILED#

The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.

enumerator CUTENSOR_STATUS_INTERNAL_ERROR#

An internal cuTENSOR error has occurred.

enumerator CUTENSOR_STATUS_NOT_SUPPORTED#

The requested operation is not supported.

enumerator CUTENSOR_STATUS_LICENSE_ERROR#

The requested functionality requires a license, and an error was detected when trying to check the current licensing.

enumerator CUTENSOR_STATUS_CUBLAS_ERROR#

A call to cuBLAS did not succeed.

enumerator CUTENSOR_STATUS_CUDA_ERROR#

Some unknown CUDA error has occurred.

enumerator CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE#

The provided workspace was insufficient.

enumerator CUTENSOR_STATUS_INSUFFICIENT_DRIVER#

Indicates that the driver version is insufficient.

enumerator CUTENSOR_STATUS_IO_ERROR#

Indicates an error related to file I/O.
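
A common pattern is to wrap every cuTENSOR call in an error-checking macro that converts the status to a readable message via cutensorGetErrorString. A minimal sketch:

```c
#include <cutensor.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: abort with a readable message whenever a cuTENSOR call fails. */
#define HANDLE_CUTENSOR(x)                                                \
    do {                                                                  \
        cutensorStatus_t err_ = (x);                                      \
        if (err_ != CUTENSOR_STATUS_SUCCESS) {                            \
            printf("cuTENSOR error %s at %s:%d\n",                        \
                   cutensorGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)
```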


cudaDataType_t#

enum cudaDataType_t#

cudaDataType_t is an enumeration of the types supported by CUDA libraries. cuTENSOR supports real FP16, BF16, FP32 and FP64 as well as complex FP32 and FP64 input types.

Values:

enumerator CUDA_R_16F#

16-bit real half precision floating-point type

enumerator CUDA_R_16BF#

16-bit real BF16 floating-point type

enumerator CUDA_R_32F#

32-bit real single precision floating-point type

enumerator CUDA_C_32F#

32-bit complex single precision floating-point type (represented as pair of real and imaginary part)

enumerator CUDA_R_64F#

64-bit real double precision floating-point type

enumerator CUDA_C_64F#

64-bit complex double precision floating-point type (represented as pair of real and imaginary part)


cutensorLoggerCallback_t#

typedef void (*cutensorLoggerCallback_t)(int32_t logLevel, const char *functionName, const char *message)#

A function pointer type for logging.
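
A callback with this signature receives the log level, the name of the API function emitting the message, and the message text. A sketch of a user-defined callback, assuming cuTENSOR's logging API provides cutensorLoggerSetCallback for registration:

```c
#include <cutensor.h>
#include <stdio.h>

/* Sketch: forward cuTENSOR log messages to stderr. */
static void myLogger(int32_t logLevel, const char *functionName, const char *message)
{
    fprintf(stderr, "[cuTENSOR][level %d] %s: %s\n", logLevel, functionName, message);
}

/* Registration (e.g., early in main): cutensorLoggerSetCallback(myLogger); */
```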