cuTENSOR Data Types¶
cutensorDataType_t
¶
-
enum cutensorDataType_t¶
This enum specifies the data precision. It is used when the data reference does not carry the type itself (e.g., void *).
Values:
-
enumerator CUTENSOR_R_16F¶
real as a half
-
enumerator CUTENSOR_C_16F¶
complex as a pair of half numbers
-
enumerator CUTENSOR_R_16BF¶
real as an nv_bfloat16
-
enumerator CUTENSOR_C_16BF¶
complex as a pair of nv_bfloat16 numbers
-
enumerator CUTENSOR_R_32F¶
real as a float
-
enumerator CUTENSOR_C_32F¶
complex as a pair of float numbers
-
enumerator CUTENSOR_R_64F¶
real as a double
-
enumerator CUTENSOR_C_64F¶
complex as a pair of double numbers
-
enumerator CUTENSOR_R_4I¶
real as a signed 4-bit int
-
enumerator CUTENSOR_C_4I¶
complex as a pair of signed 4-bit int numbers
-
enumerator CUTENSOR_R_4U¶
real as an unsigned 4-bit int
-
enumerator CUTENSOR_C_4U¶
complex as a pair of unsigned 4-bit int numbers
-
enumerator CUTENSOR_R_8I¶
real as a signed 8-bit int
-
enumerator CUTENSOR_C_8I¶
complex as a pair of signed 8-bit int numbers
-
enumerator CUTENSOR_R_8U¶
real as an unsigned 8-bit int
-
enumerator CUTENSOR_C_8U¶
complex as a pair of unsigned 8-bit int numbers
-
enumerator CUTENSOR_R_16I¶
real as a signed 16-bit int
-
enumerator CUTENSOR_C_16I¶
complex as a pair of signed 16-bit int numbers
-
enumerator CUTENSOR_R_16U¶
real as an unsigned 16-bit int
-
enumerator CUTENSOR_C_16U¶
complex as a pair of unsigned 16-bit int numbers
-
enumerator CUTENSOR_R_32I¶
real as a signed 32-bit int
-
enumerator CUTENSOR_C_32I¶
complex as a pair of signed 32-bit int numbers
-
enumerator CUTENSOR_R_32U¶
real as an unsigned 32-bit int
-
enumerator CUTENSOR_C_32U¶
complex as a pair of unsigned 32-bit int numbers
-
enumerator CUTENSOR_R_64I¶
real as a signed 64-bit int
-
enumerator CUTENSOR_C_64I¶
complex as a pair of signed 64-bit int numbers
-
enumerator CUTENSOR_R_64U¶
real as an unsigned 64-bit int
-
enumerator CUTENSOR_C_64U¶
complex as a pair of unsigned 64-bit int numbers
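The enumerator names above follow a regular pattern: `R` or `C` for real or complex, the bit width of one component, and a suffix for the number format (`F`, `BF`, `I`, `U`). As a minimal sketch of that convention, the helper below derives the storage size of one element from an enumerator name. The function `example_element_bits` is hypothetical and not part of cuTENSOR; it works on the name as a string only so that it compiles without `cutensor.h`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of cuTENSOR: returns the storage size in
 * bits of one element for an enumerator name such as "CUTENSOR_C_32F",
 * or -1 if the name does not match CUTENSOR_{R|C}_<bits>{F|BF|I|U}. */
int example_element_bits(const char *name)
{
    static const char prefix[] = "CUTENSOR_";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *p;
    char *end;
    long bits;
    int components;

    assert(name != NULL);
    if (strncmp(name, prefix, prefix_len) != 0)
        return -1;
    p = name + prefix_len;

    /* 'R' is a single real number, 'C' is a (real, imaginary) pair. */
    if (p[0] == 'R')
        components = 1;
    else if (p[0] == 'C')
        components = 2;
    else
        return -1;
    if (p[1] != '_' || p[2] < '0' || p[2] > '9')
        return -1;

    /* Bit width of one component. */
    bits = strtol(p + 2, &end, 10);
    if (bits <= 0 || bits > 64)
        return -1;

    /* The suffix names the number format; it does not change the size. */
    if (strcmp(end, "F") != 0 && strcmp(end, "BF") != 0 &&
        strcmp(end, "I") != 0 && strcmp(end, "U") != 0)
        return -1;

    return (int)bits * components;
}
```

For example, `example_element_bits("CUTENSOR_C_32F")` yields 64, since a complex value is stored as a pair of 32-bit floats, and a name outside the pattern yields -1.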
cutensorComputeDescriptor_t
¶
-
typedef struct cutensorComputeDescriptor *cutensorComputeDescriptor_t¶
Opaque structure representing a compute descriptor.
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16F
floating-point: 5-bit exponent and 10-bit mantissa (aka half)
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_16BF
floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_TF32
floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_3XTF32
floating-point: More precise than TF32, but less precise than float
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_32F
floating-point: 8-bit exponent and 23-bit mantissa (aka float)
- CUTENSOR_EXTERN const cutensorComputeDescriptor_t CUTENSOR_COMPUTE_DESC_64F
floating-point: 11-bit exponent and 52-bit mantissa (aka double)
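The mantissa widths listed above determine the relative precision of each compute type. Assuming the usual normalized representation with an implicit leading one, the gap between 1.0 and the next representable value is 2^-m for m stored mantissa bits. The sketch below computes that gap; `example_epsilon` is a hypothetical helper and not a cuTENSOR function.

```c
#include <assert.h>

/* Hypothetical helper: spacing between 1.0 and the next representable
 * value for a format with `mantissa_bits` explicitly stored mantissa bits
 * (assumes an implicit leading one). Halving is exact in binary
 * floating point, so the result is exactly 2^-mantissa_bits. */
double example_epsilon(int mantissa_bits)
{
    double eps = 1.0;
    int i;

    assert(mantissa_bits >= 0 && mantissa_bits <= 52);
    for (i = 0; i < mantissa_bits; ++i)
        eps *= 0.5;
    return eps;
}
```

With the widths from the list, this gives 2^-7 for bfloat, 2^-10 for both half and TF32, 2^-23 for float, and 2^-52 for double. TF32 therefore has the range of float (8-bit exponent) but only the precision of half.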
cutensorHandle_t
¶
-
typedef struct cutensorHandle *cutensorHandle_t¶
Opaque structure holding cuTENSOR’s library context.
cutensorTensorDescriptor_t
¶
-
typedef struct cutensorTensorDescriptor *cutensorTensorDescriptor_t¶
Opaque structure representing a tensor descriptor.
cutensorOperationDescriptor_t
¶
-
typedef struct cutensorOperationDescriptor *cutensorOperationDescriptor_t¶
Opaque structure representing any type of problem descriptor (e.g., contraction, reduction, elementwise).
cutensorOperationDescriptorAttribute_t
¶
-
enum cutensorOperationDescriptorAttribute_t¶
This enum lists all attributes of a cutensorOperationDescriptor_t that can be modified (see cutensorOperationDescriptorSetAttribute and cutensorOperationDescriptorGetAttribute).
Values:
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_TAG¶
int32_t: enables users to distinguish two otherwise identical problems with respect to the software-managed plan cache (default value: 0).
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_SCALAR_TYPE¶
cutensorDataType_t: data type of the scaling factors
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_FLOPS¶
float: number of floating-point operations necessary to perform this operation (assuming all scalars are nonzero, unless otherwise specified)
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_MOVED_BYTES¶
float: minimal number of bytes transferred from/to global memory (assuming all scalars are nonzero, unless otherwise specified)
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_LEFT¶
uint32_t[] (of size descOut->numModes): Entry i holds the number of padding values added to the left of the i-th dimension
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_RIGHT¶
uint32_t[] (of size descOut->numModes): Entry i holds the number of padding values added to the right of the i-th dimension
-
enumerator CUTENSOR_OPERATION_DESCRIPTOR_PADDING_VALUE¶
host-side pointer to element of the same type as the output tensor: Constant padding value
cutensorPlanPreference_t
¶
-
typedef struct cutensorPlanPreference *cutensorPlanPreference_t¶
Opaque structure that narrows down the space of applicable algorithms/variants/kernels.
cutensorPlanPreferenceAttribute_t
¶
-
enum cutensorPlanPreferenceAttribute_t¶
This enum lists all attributes of a cutensorPlanPreference_t object that can be modified.
Values:
-
enumerator CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE¶
cutensorAutotuneMode_t: Determines if recurrent executions of the plan (e.g., via cutensorContract, cutensorPermute) should autotune (i.e., try different kernels); see section “Plan Cache” for details.
-
enumerator CUTENSOR_PLAN_PREFERENCE_CACHE_MODE¶
cutensorCacheMode_t: Determines whether the corresponding algorithm/kernel for this plan should be cached, and gives fine-grained control over what is considered a cache hit.
-
enumerator CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT¶
int32_t: Only applicable if CUTENSOR_PLAN_PREFERENCE_AUTOTUNE_MODE is set to CUTENSOR_AUTOTUNE_MODE_INCREMENTAL
-
enumerator CUTENSOR_PLAN_PREFERENCE_ALGO¶
cutensorAlgo_t: Fixes a certain cutensorAlgo_t
-
enumerator CUTENSOR_PLAN_PREFERENCE_KERNEL_RANK¶
int32_t: Fixes a kernel (a sub variant of an algo; e.g., kernel_rank==1 while algo == CUTENSOR_ALGO_TGETT would select the second-best GETT kernel/variant according to cuTENSOR’s performance model; kernel_rank==2 would select the third-best)
-
enumerator CUTENSOR_PLAN_PREFERENCE_JIT¶
cutensorJitMode_t: determines if just-in-time compilation is enabled or disabled (default: CUTENSOR_JIT_MODE_NONE)
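The kernel rank described above is zero-based: rank 0 is the best candidate according to the performance model, rank 1 the second best, and so on. The sketch below illustrates that ordering on an array of predicted run times. It is only a model of the idea; the function, its inputs, and the tie-breaking rule are hypothetical and say nothing about how cuTENSOR ranks kernels internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of rank-based selection: returns the index of
 * the candidate at position `rank` (0 = fastest) when candidates are
 * ordered by ascending predicted time, breaking ties by lower index.
 * Returns -1 if `rank` is out of range. */
int example_select_by_rank(const double *predicted_time, size_t n, int rank)
{
    size_t i, j;

    assert(predicted_time != NULL || n == 0);
    if (rank < 0)
        return -1;
    for (i = 0; i < n; ++i) {
        /* Count the candidates that are ordered before candidate i. */
        size_t better = 0;
        for (j = 0; j < n; ++j) {
            if (predicted_time[j] < predicted_time[i] ||
                (predicted_time[j] == predicted_time[i] && j < i))
                ++better;
        }
        if (better == (size_t)rank)
            return (int)i;
    }
    return -1;
}
```

With predicted times {3.0, 1.0, 2.0}, rank 0 selects index 1, rank 1 selects index 2, and rank 2 selects index 0.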
cutensorPlan_t
¶
-
typedef struct cutensorPlan *cutensorPlan_t¶
Opaque structure representing a plan (e.g., contraction, reduction, elementwise).
cutensorPlanAttribute_t
¶
-
enum cutensorPlanAttribute_t¶
This enum lists all attributes of a cutensorPlan_t object that can be retrieved via cutensorPlanGetAttribute.
Values:
-
enumerator CUTENSOR_PLAN_REQUIRED_WORKSPACE¶
uint64_t: exact required workspace in bytes that is needed to execute the plan
cutensorAutotuneMode_t
¶
-
enum cutensorAutotuneMode_t¶
This enum determines the mode w.r.t. cuTENSOR’s auto-tuning capability.
Values:
-
enumerator CUTENSOR_AUTOTUNE_MODE_NONE¶
Indicates no autotuning (default); in this case the cache helps to reduce the plan-creation overhead. On a cache hit the cached plan is reused; otherwise the plan cache is ignored.
-
enumerator CUTENSOR_AUTOTUNE_MODE_INCREMENTAL¶
Indicates an incremental autotuning (i.e., each invocation of corresponding cutensorCreatePlan() will create a plan based on a different algorithm/kernel; the maximum number of kernels that will be tested is defined by the CUTENSOR_PLAN_PREFERENCE_INCREMENTAL_COUNT of cutensorPlanPreferenceAttribute_t). WARNING: If this autotuning mode is selected, then we cannot guarantee bit-wise identical results (since different algorithms could be executed).
cutensorJitMode_t
¶
-
enum cutensorJitMode_t¶
This enum determines the mode w.r.t. cuTENSOR’s just-in-time compilation capability.
Values:
-
enumerator CUTENSOR_JIT_MODE_NONE¶
Indicates that no kernel will be just-in-time compiled.
-
enumerator CUTENSOR_JIT_MODE_DEFAULT¶
Indicates that the corresponding plan will try to compile a dedicated kernel for the given operation. Only supported for GPUs with compute capability >= 8.0 (Ampere or newer).
cutensorCacheMode_t
¶
-
enum cutensorCacheMode_t¶
This enum defines what is considered a cache hit.
Values:
-
enumerator CUTENSOR_CACHE_MODE_NONE¶
Plan will not be cached.
-
enumerator CUTENSOR_CACHE_MODE_PEDANTIC¶
All parameters of the corresponding descriptor must be identical to the cached plan (default).
cutensorAlgo_t
¶
-
enum cutensorAlgo_t¶
Allows users to specify the algorithm to be used for performing the desired tensor operation.
Values:
-
enumerator CUTENSOR_ALGO_DEFAULT_PATIENT¶
More time-consuming than CUTENSOR_ALGO_DEFAULT, but typically provides a more accurate kernel selection.
-
enumerator CUTENSOR_ALGO_GETT¶
Choose the GETT algorithm (only applicable to contractions)
-
enumerator CUTENSOR_ALGO_TGETT¶
Transpose (A or B) + GETT (only applicable to contractions)
-
enumerator CUTENSOR_ALGO_TTGT¶
Transpose-Transpose-GEMM-Transpose (requires additional memory) (only applicable to contractions)
-
enumerator CUTENSOR_ALGO_DEFAULT¶
A performance model chooses the appropriate algorithm and kernel.
cutensorWorksizePreference_t
¶
-
enum cutensorWorksizePreference_t¶
This enum gives users finer control over the amount of workspace that is suggested by cutensorEstimateWorkspaceSize.
Values:
-
enumerator CUTENSOR_WORKSPACE_MIN¶
Least memory requirement; at least one algorithm will be available.
-
enumerator CUTENSOR_WORKSPACE_DEFAULT¶
Aims to attain high performance while also reducing the workspace requirement.
-
enumerator CUTENSOR_WORKSPACE_MAX¶
Highest memory requirement; all algorithms will be available (choose this option if memory footprint is not a concern)
cutensorOperator_t
¶
-
enum cutensorOperator_t¶
This enum captures all unary and binary element-wise operations supported by the cuTENSOR library.
Values:
-
enumerator CUTENSOR_OP_IDENTITY¶
Identity operator (i.e., elements are not changed)
-
enumerator CUTENSOR_OP_SQRT¶
Square root.
-
enumerator CUTENSOR_OP_RELU¶
Rectified linear unit.
-
enumerator CUTENSOR_OP_CONJ¶
Complex conjugate.
-
enumerator CUTENSOR_OP_RCP¶
Reciprocal.
-
enumerator CUTENSOR_OP_SIGMOID¶
y=1/(1+exp(-x))
-
enumerator CUTENSOR_OP_TANH¶
y=tanh(x)
-
enumerator CUTENSOR_OP_EXP¶
Exponentiation.
-
enumerator CUTENSOR_OP_LOG¶
Log (base e).
-
enumerator CUTENSOR_OP_ABS¶
Absolute value.
-
enumerator CUTENSOR_OP_NEG¶
Negation.
-
enumerator CUTENSOR_OP_SIN¶
Sine.
-
enumerator CUTENSOR_OP_COS¶
Cosine.
-
enumerator CUTENSOR_OP_TAN¶
Tangent.
-
enumerator CUTENSOR_OP_SINH¶
Hyperbolic sine.
-
enumerator CUTENSOR_OP_COSH¶
Hyperbolic cosine.
-
enumerator CUTENSOR_OP_ASIN¶
Inverse sine.
-
enumerator CUTENSOR_OP_ACOS¶
Inverse cosine.
-
enumerator CUTENSOR_OP_ATAN¶
Inverse tangent.
-
enumerator CUTENSOR_OP_ASINH¶
Inverse hyperbolic sine.
-
enumerator CUTENSOR_OP_ACOSH¶
Inverse hyperbolic cosine.
-
enumerator CUTENSOR_OP_ATANH¶
Inverse hyperbolic tangent.
-
enumerator CUTENSOR_OP_CEIL¶
Ceiling.
-
enumerator CUTENSOR_OP_FLOOR¶
Floor.
-
enumerator CUTENSOR_OP_MISH¶
Mish y=x*tanh(softplus(x)).
-
enumerator CUTENSOR_OP_SWISH¶
Swish y=x*sigmoid(x).
-
enumerator CUTENSOR_OP_SOFT_PLUS¶
Softplus y=log(exp(x)+1).
-
enumerator CUTENSOR_OP_SOFT_SIGN¶
Softsign y=x/(abs(x)+1).
-
enumerator CUTENSOR_OP_ADD¶
Addition of two elements.
-
enumerator CUTENSOR_OP_MUL¶
Multiplication of two elements.
-
enumerator CUTENSOR_OP_MAX¶
Maximum of two elements.
-
enumerator CUTENSOR_OP_MIN¶
Minimum of two elements.
-
enumerator CUTENSOR_OP_UNKNOWN¶
reserved for internal use only
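As a host-side reference for a few of the operators above, the sketch below applies a unary or binary operator to plain `double` values. It covers only operators that need no math library (identity, ReLU taken as max(x, 0), reciprocal, absolute value, negation, softsign, and the four binary operators). The `EX_OP_*` enumerators and both functions are hypothetical stand-ins so the code compiles without `cutensor.h`; they are not the library's definitions or numeric values.

```c
#include <assert.h>

/* Hypothetical stand-ins for a subset of cutensorOperator_t. */
typedef enum {
    EX_OP_IDENTITY, EX_OP_RELU, EX_OP_RCP,
    EX_OP_ABS, EX_OP_NEG, EX_OP_SOFT_SIGN
} example_unary_op_t;

typedef enum { EX_OP_ADD, EX_OP_MUL, EX_OP_MAX, EX_OP_MIN } example_binary_op_t;

/* Applies a unary operator to one element. */
double example_apply_unary(example_unary_op_t op, double x)
{
    const double ax = x < 0.0 ? -x : x;  /* abs(x) without libm */

    switch (op) {
    case EX_OP_IDENTITY:  return x;
    case EX_OP_RELU:      return x > 0.0 ? x : 0.0;
    case EX_OP_RCP:       return 1.0 / x;
    case EX_OP_ABS:       return ax;
    case EX_OP_NEG:       return -x;
    case EX_OP_SOFT_SIGN: return x / (ax + 1.0);  /* y=x/(abs(x)+1) */
    }
    assert(!"unknown unary operator");
    return x;
}

/* Combines two elements with a binary operator. */
double example_apply_binary(example_binary_op_t op, double a, double b)
{
    switch (op) {
    case EX_OP_ADD: return a + b;
    case EX_OP_MUL: return a * b;
    case EX_OP_MAX: return a > b ? a : b;
    case EX_OP_MIN: return a < b ? a : b;
    }
    assert(!"unknown binary operator");
    return a;
}
```

A reference like this can be handy for checking small results on the host, for instance softsign(3) = 3/4 and softsign(-1) = -1/2.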
cutensorStatus_t
¶
-
enum cutensorStatus_t¶
The type used for function status returns. All cuTENSOR library functions return their status, which can have the following values.
Values:
-
enumerator CUTENSOR_STATUS_SUCCESS¶
The operation completed successfully.
-
enumerator CUTENSOR_STATUS_NOT_INITIALIZED¶
The opaque data structure was not initialized.
-
enumerator CUTENSOR_STATUS_ALLOC_FAILED¶
Resource allocation failed inside the cuTENSOR library.
-
enumerator CUTENSOR_STATUS_INVALID_VALUE¶
An unsupported value or parameter was passed to the function (indicates a user error).
-
enumerator CUTENSOR_STATUS_ARCH_MISMATCH¶
Indicates that the device is either not ready, or the target architecture is not supported.
-
enumerator CUTENSOR_STATUS_MAPPING_ERROR¶
An access to GPU memory space failed, which is usually caused by a failure to bind a texture.
-
enumerator CUTENSOR_STATUS_EXECUTION_FAILED¶
The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.
-
enumerator CUTENSOR_STATUS_INTERNAL_ERROR¶
An internal cuTENSOR error has occurred.
-
enumerator CUTENSOR_STATUS_NOT_SUPPORTED¶
The requested operation is not supported.
-
enumerator CUTENSOR_STATUS_LICENSE_ERROR¶
The functionality requested requires some license and an error was detected when trying to check the current licensing.
-
enumerator CUTENSOR_STATUS_CUBLAS_ERROR¶
A call to cuBLAS did not succeed.
-
enumerator CUTENSOR_STATUS_CUDA_ERROR¶
Some unknown CUDA error has occurred.
-
enumerator CUTENSOR_STATUS_INSUFFICIENT_WORKSPACE¶
The provided workspace was insufficient.
-
enumerator CUTENSOR_STATUS_INSUFFICIENT_DRIVER¶
Indicates that the driver version is insufficient.
-
enumerator CUTENSOR_STATUS_IO_ERROR¶
Indicates an error related to file I/O.
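Since every library function returns a status, callers typically wrap each call in a check. The sketch below shows that pattern with a hypothetical stand-in type `example_status_t` (three values only, not the library's numeric values) so that it compiles without `cutensor.h`. Real code would use `cutensorStatus_t` and compare against `CUTENSOR_STATUS_SUCCESS`; the library's own error-string function, if available in your version, would replace `example_status_name`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cutensorStatus_t (subset, illustrative values). */
typedef enum {
    EX_STATUS_SUCCESS = 0,
    EX_STATUS_INVALID_VALUE,
    EX_STATUS_INSUFFICIENT_WORKSPACE
} example_status_t;

/* Maps a status to a printable name. */
static const char *example_status_name(example_status_t status)
{
    switch (status) {
    case EX_STATUS_SUCCESS:                return "SUCCESS";
    case EX_STATUS_INVALID_VALUE:          return "INVALID_VALUE";
    case EX_STATUS_INSUFFICIENT_WORKSPACE: return "INSUFFICIENT_WORKSPACE";
    }
    return "UNKNOWN";
}

/* Returns 0 on success; otherwise reports the failing call with its
 * source location on stderr and returns 1. */
int example_check(example_status_t status, const char *call,
                  const char *file, int line)
{
    assert(call != NULL && file != NULL);
    if (status == EX_STATUS_SUCCESS)
        return 0;
    fprintf(stderr, "%s:%d: %s failed with %s\n",
            file, line, call, example_status_name(status));
    return 1;
}

/* Evaluates `call` once and records its text and source location. */
#define EXAMPLE_CHECK(call) example_check((call), #call, __FILE__, __LINE__)
```

A caller might write `if (EXAMPLE_CHECK(some_call(...))) return 1;` to stop at the first failure while keeping the failing expression in the log.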
cudaDataType_t
¶
-
enum cudaDataType_t¶
cudaDataType_t is an enumeration of the types supported by CUDA libraries. cuTENSOR supports real FP16, BF16, FP32 and FP64 as well as complex FP32 and FP64 input types.
Values:
-
enumerator CUDA_R_16F¶
16-bit real half precision floating-point type
-
enumerator CUDA_R_16BF¶
16-bit real BF16 floating-point type
-
enumerator CUDA_R_32F¶
32-bit real single precision floating-point type
-
enumerator CUDA_C_32F¶
32-bit complex single precision floating-point type (represented as pair of real and imaginary part)
-
enumerator CUDA_R_64F¶
64-bit real double precision floating-point type
-
enumerator CUDA_C_64F¶
64-bit complex double precision floating-point type (represented as pair of real and imaginary part)
cutensorLoggerCallback_t
¶
-
typedef void (*cutensorLoggerCallback_t)(int32_t logLevel, const char *functionName, const char *message)¶
A function pointer type for logging.
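A callback matching this signature could look like the sketch below. The typedef is repeated from above only so the example compiles without `cutensor.h`; `example_format_log` and `example_logger` are hypothetical names, and the message layout is an arbitrary choice. Registering the callback with the library (presumably through its logger-callback setter) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Repeated from the reference above; normally provided by cutensor.h. */
typedef void (*cutensorLoggerCallback_t)(int32_t logLevel,
                                         const char *functionName,
                                         const char *message);

/* Hypothetical helper: formats one log record into `buf` and returns the
 * number of characters the full record needs (as snprintf does). */
int example_format_log(char *buf, size_t n, int32_t logLevel,
                       const char *functionName, const char *message)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "[cuTENSOR][%d] %s: %s", (int)logLevel,
                    functionName != NULL ? functionName : "?",
                    message != NULL ? message : "");
}

/* Hypothetical callback: writes each record as one line to stderr. */
void example_logger(int32_t logLevel, const char *functionName,
                    const char *message)
{
    char line[512];

    example_format_log(line, sizeof line, logLevel, functionName, message);
    fprintf(stderr, "%s\n", line);
}
```

Because `example_logger` has exactly the parameter list of `cutensorLoggerCallback_t`, it can be assigned to a variable of that type without a cast.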