cuTensorNet Data Types¶
cutensornetHandle_t
¶
-
typedef void *cutensornetHandle_t¶
-
Opaque structure holding cuTensorNet’s library context.
This handle holds the cuTensorNet library context (device properties, system information, etc.). The handle must be initialized and destroyed with cutensornetCreate() and cutensornetDestroy() functions, respectively.
cutensornetLoggerCallback_t
¶
-
typedef void (*cutensornetLoggerCallback_t)(int32_t logLevel, const char *functionName, const char *message)¶
-
A callback function pointer type for logging APIs. Use cutensornetLoggerSetCallback() to set the callback function.
- Parameters
-
logLevel – [in] the log level
functionName – [in] the name of the API that logged this message
message – [in] the log message
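As an illustration of the signature above, here is a minimal sketch of a callback that formats its three arguments into a buffer. The typedef is mirrored locally so the snippet compiles on its own; `g_lastLogLine`, `myLoggerCallback`, and `g_callback` are hypothetical names introduced for this example, and no meaning is assumed for individual `logLevel` values.

```c
#include <stdint.h>
#include <stdio.h>

/* Local mirror of the typedef documented above, so the sketch compiles on its
 * own; real code would take it from the cuTensorNet header instead. */
typedef void (*cutensornetLoggerCallback_t)(int32_t logLevel,
                                            const char *functionName,
                                            const char *message);

/* Hypothetical sink: holds the most recently formatted log line. */
char g_lastLogLine[256];

/* Callback matching the documented signature. It only formats its inputs;
 * the meaning of each logLevel value is not assumed here. */
void myLoggerCallback(int32_t logLevel, const char *functionName,
                      const char *message)
{
    snprintf(g_lastLogLine, sizeof g_lastLogLine, "[level %d] %s: %s",
             (int)logLevel, functionName, message);
}

/* The function name converts implicitly to the callback pointer type. */
cutensornetLoggerCallback_t g_callback = myLoggerCallback;
```

In an application linked against cuTensorNet, a function like `myLoggerCallback` would be the value handed to cutensornetLoggerSetCallback().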
cutensornetStatus_t
¶
-
enum cutensornetStatus_t¶
-
cuTensorNet status type returns
The type is used for function status returns. All cuTensorNet library functions return their status, which can have the following values.
Values:
-
enumerator CUTENSORNET_STATUS_SUCCESS = 0¶
-
The operation completed successfully.
-
enumerator CUTENSORNET_STATUS_NOT_INITIALIZED = 1¶
-
The cuTensorNet library was not initialized.
-
enumerator CUTENSORNET_STATUS_ALLOC_FAILED = 3¶
-
Resource allocation failed inside the cuTensorNet library.
-
enumerator CUTENSORNET_STATUS_INVALID_VALUE = 7¶
-
An unsupported value or parameter was passed to the function (indicates a user error).
-
enumerator CUTENSORNET_STATUS_ARCH_MISMATCH = 8¶
-
The device is either not ready, or the target architecture is not supported.
-
enumerator CUTENSORNET_STATUS_MAPPING_ERROR = 11¶
-
An access to GPU memory space failed, which is usually caused by a failure to bind a texture.
-
enumerator CUTENSORNET_STATUS_EXECUTION_FAILED = 13¶
-
The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.
-
enumerator CUTENSORNET_STATUS_INTERNAL_ERROR = 14¶
-
An internal cuTensorNet error has occurred.
-
enumerator CUTENSORNET_STATUS_NOT_SUPPORTED = 15¶
-
The requested operation is not supported.
-
enumerator CUTENSORNET_STATUS_LICENSE_ERROR = 16¶
-
The functionality requested requires some license and an error was detected when trying to check the current licensing.
-
enumerator CUTENSORNET_STATUS_CUBLAS_ERROR = 17¶
-
A call to CUBLAS did not succeed.
-
enumerator CUTENSORNET_STATUS_CUDA_ERROR = 18¶
-
Some unknown CUDA error has occurred.
-
enumerator CUTENSORNET_STATUS_INSUFFICIENT_WORKSPACE = 19¶
-
The provided workspace was insufficient.
-
enumerator CUTENSORNET_STATUS_INSUFFICIENT_DRIVER = 20¶
-
The driver version is insufficient.
-
enumerator CUTENSORNET_STATUS_IO_ERROR = 21¶
-
An error occurred related to file I/O.
-
enumerator CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH = 22¶
-
The dynamically linked cuTENSOR library is incompatible.
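Because every library function returns one of these values, callers commonly translate the numeric code into a readable name for diagnostics. The following is a minimal sketch of such a helper; the enum is mirrored locally from the values listed above so the snippet is self-contained, and `statusName` is a hypothetical helper, not part of the library.

```c
/* Local mirror of the status values listed above; real code would include the
 * cuTensorNet header instead of redefining the enum. */
typedef enum {
    CUTENSORNET_STATUS_SUCCESS = 0,
    CUTENSORNET_STATUS_NOT_INITIALIZED = 1,
    CUTENSORNET_STATUS_ALLOC_FAILED = 3,
    CUTENSORNET_STATUS_INVALID_VALUE = 7,
    CUTENSORNET_STATUS_ARCH_MISMATCH = 8,
    CUTENSORNET_STATUS_MAPPING_ERROR = 11,
    CUTENSORNET_STATUS_EXECUTION_FAILED = 13,
    CUTENSORNET_STATUS_INTERNAL_ERROR = 14,
    CUTENSORNET_STATUS_NOT_SUPPORTED = 15,
    CUTENSORNET_STATUS_LICENSE_ERROR = 16,
    CUTENSORNET_STATUS_CUBLAS_ERROR = 17,
    CUTENSORNET_STATUS_CUDA_ERROR = 18,
    CUTENSORNET_STATUS_INSUFFICIENT_WORKSPACE = 19,
    CUTENSORNET_STATUS_INSUFFICIENT_DRIVER = 20,
    CUTENSORNET_STATUS_IO_ERROR = 21,
    CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH = 22
} cutensornetStatus_t;

/* Expands to a case label that returns the enumerator's own spelling. */
#define STATUS_CASE(x) case x: return #x

/* Hypothetical helper: maps a status code to its enumerator name, or
 * "UNKNOWN" for values that are not listed (for example 2). */
const char *statusName(cutensornetStatus_t status)
{
    switch (status) {
    STATUS_CASE(CUTENSORNET_STATUS_SUCCESS);
    STATUS_CASE(CUTENSORNET_STATUS_NOT_INITIALIZED);
    STATUS_CASE(CUTENSORNET_STATUS_ALLOC_FAILED);
    STATUS_CASE(CUTENSORNET_STATUS_INVALID_VALUE);
    STATUS_CASE(CUTENSORNET_STATUS_ARCH_MISMATCH);
    STATUS_CASE(CUTENSORNET_STATUS_MAPPING_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_EXECUTION_FAILED);
    STATUS_CASE(CUTENSORNET_STATUS_INTERNAL_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_NOT_SUPPORTED);
    STATUS_CASE(CUTENSORNET_STATUS_LICENSE_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_CUBLAS_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_CUDA_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_INSUFFICIENT_WORKSPACE);
    STATUS_CASE(CUTENSORNET_STATUS_INSUFFICIENT_DRIVER);
    STATUS_CASE(CUTENSORNET_STATUS_IO_ERROR);
    STATUS_CASE(CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH);
    default: return "UNKNOWN";
    }
}
```

Note that the numeric values are not contiguous, so a lookup by array index would need gaps; a switch avoids that.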
cutensornetComputeType_t
¶
-
enum cutensornetComputeType_t¶
-
Encodes cuTensorNet’s compute type (see “User Guide - Accuracy Guarantees” for details).
Values:
-
enumerator CUTENSORNET_COMPUTE_16F = (1U << 0U)¶
-
floating-point: 5-bit exponent and 10-bit mantissa (aka half)
-
enumerator CUTENSORNET_COMPUTE_16BF = (1U << 10U)¶
-
floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)
-
enumerator CUTENSORNET_COMPUTE_TF32 = (1U << 12U)¶
-
floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)
-
enumerator CUTENSORNET_COMPUTE_32F = (1U << 2U)¶
-
floating-point: 8-bit exponent and 23-bit mantissa (aka float)
-
enumerator CUTENSORNET_COMPUTE_64F = (1U << 4U)¶
-
floating-point: 11-bit exponent and 52-bit mantissa (aka double)
-
enumerator CUTENSORNET_COMPUTE_8U = (1U << 6U)¶
-
8-bit unsigned integer
-
enumerator CUTENSORNET_COMPUTE_8I = (1U << 8U)¶
-
8-bit signed integer
-
enumerator CUTENSORNET_COMPUTE_32U = (1U << 7U)¶
-
32-bit unsigned integer
-
enumerator CUTENSORNET_COMPUTE_32I = (1U << 9U)¶
-
32-bit signed integer
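Each compute type is encoded as a distinct single bit, and the floating-point entries differ in how many exponent and mantissa bits they carry. The sketch below mirrors the enum locally and returns the mantissa width stated in the descriptions above; `computeTypeMantissaBits` is a hypothetical helper written for illustration only.

```c
/* Local mirror of the values listed above; real code would include the
 * cuTensorNet header instead. */
typedef enum {
    CUTENSORNET_COMPUTE_16F  = (1U << 0U),
    CUTENSORNET_COMPUTE_16BF = (1U << 10U),
    CUTENSORNET_COMPUTE_TF32 = (1U << 12U),
    CUTENSORNET_COMPUTE_32F  = (1U << 2U),
    CUTENSORNET_COMPUTE_64F  = (1U << 4U),
    CUTENSORNET_COMPUTE_8U   = (1U << 6U),
    CUTENSORNET_COMPUTE_8I   = (1U << 8U),
    CUTENSORNET_COMPUTE_32U  = (1U << 7U),
    CUTENSORNET_COMPUTE_32I  = (1U << 9U)
} cutensornetComputeType_t;

/* Hypothetical helper: mantissa width in bits for the floating-point compute
 * types, as given in the descriptions above; -1 for the integer types. */
int computeTypeMantissaBits(cutensornetComputeType_t type)
{
    switch (type) {
    case CUTENSORNET_COMPUTE_16F:  return 10;  /* half */
    case CUTENSORNET_COMPUTE_16BF: return 7;   /* bfloat */
    case CUTENSORNET_COMPUTE_TF32: return 10;  /* tensor-float-32 */
    case CUTENSORNET_COMPUTE_32F:  return 23;  /* float */
    case CUTENSORNET_COMPUTE_64F:  return 52;  /* double */
    default:                       return -1;  /* integer compute types */
    }
}
```

The helper makes one point visible: TF32 shares its mantissa width with half and its exponent width with float, so it trades precision for range relative to half.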
cutensornetContractionOptimizerConfigAttributes_t
¶
-
enum cutensornetContractionOptimizerConfigAttributes_t¶
-
This enum lists all attributes of a cutensornetContractionOptimizerConfig_t that can be modified.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_PARTITIONS¶
-
int32_t: The network is recursively split over num_partitions until the size of each partition is less than or equal to the cutoff. The allowed range for num_partitions is [2, 30]. When the hyper-optimizer is disabled the default value is 8.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_CUTOFF_SIZE¶
-
int32_t: The network is recursively split over num_partitions until the size of each partition is less than or equal to this cutoff. The allowed range for cutoff_size is [4, 50]. When the hyper-optimizer is disabled the default value is 8.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM¶
-
cutensornetGraphAlgo_t: the graph algorithm to be used in graph partitioning. Choices are: CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM_KWAY (default) or CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM_RB.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_IMBALANCE_FACTOR¶
-
int32_t: Specifies the maximum allowed size imbalance among the partitions. Allowed range [30, 2000]. When the hyper-optimizer is disabled the default value is 200.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_ITERATIONS¶
-
int32_t: Specifies the number of iterations for the refinement algorithms at each stage of the uncoarsening process of the graph partitioner. Allowed range [1, 500]. When the hyper-optimizer is disabled the default value is 60.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_CUTS¶
-
int32_t: Specifies the number of different partitionings that the graph partitioner will compute. The final partitioning is the one that achieves the best edge-cut or communication volume. Allowed range [1, 40]. When the hyper-optimizer is disabled the default value is 10.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_RECONFIG_NUM_ITERATIONS¶
-
int32_t: Specifies the number of subtrees to be chosen for reconfiguration. A value of 0 disables reconfiguration. The default value is 500. The amount of time spent in reconfiguration, which usually dominates the pathfinder run time, is proportional to this.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_RECONFIG_NUM_LEAVES¶
-
int32_t: Specifies the maximum number of leaves in the subtree chosen for optimization in each reconfiguration iteration. The default value is 8. The amount of time spent in reconfiguration, which usually dominates the pathfinder run time, is proportional to this.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_DISABLE_SLICING¶
-
int32_t: If set to 1, disables slicing regardless of memory available.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL¶
-
cutensornetMemoryModel_t: Memory model used to determine workspace size. CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL_HEURISTIC uses a simple memory model that does not require external calls. CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL_CUTENSOR (default) uses cuTENSOR to more precisely evaluate the amount of memory cuTENSOR will need for the contraction.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_FACTOR¶
-
int32_t: Percentage of the slicing memory_model allowed during the first slice-finding iteration. Allowed range [1, 100]. The defaults are 60 when using cuTENSOR and 100 when using the heuristic.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MIN_SLICES¶
-
int32_t: Minimum number of slices to produce at the first round of slicing. Default is 1.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_SLICE_FACTOR¶
-
int32_t: Factor by which to increase the total number of slices at each slicing round. Default is 2.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES¶
-
int32_t: Number of hyper-optimizer random samples. Default 0 (disabled).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SIMPLIFICATION_DISABLE_DR¶
-
int32_t: If set to 1, disable deferred rank simplification.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SEED¶
-
int32_t: Random seed to be used internally in order to reproduce the same path.
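Several of the attributes above document an allowed range, so a value can be sanity-checked before it is passed to the library. The following is a minimal sketch of such a check. The table holds only the ranges stated above, keyed by the attribute suffix; `AttributeRange`, `kDocumentedRanges`, and `checkDocumentedRange` are hypothetical names, and whether the library itself rejects out-of-range values is not assumed here.

```c
#include <stddef.h>
#include <string.h>

/* One documented [lo, hi] range, keyed by the attribute name with the common
 * CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_ prefix dropped for brevity. */
typedef struct {
    const char *attribute;
    int lo;
    int hi;
} AttributeRange;

/* Ranges copied from the attribute descriptions above. Attributes without a
 * documented range (for example SEED) are deliberately absent. */
static const AttributeRange kDocumentedRanges[] = {
    {"GRAPH_NUM_PARTITIONS",   2,   30},
    {"GRAPH_CUTOFF_SIZE",      4,   50},
    {"GRAPH_IMBALANCE_FACTOR", 30,  2000},
    {"GRAPH_NUM_ITERATIONS",   1,   500},
    {"GRAPH_NUM_CUTS",         1,   40},
    {"SLICER_MEMORY_FACTOR",   1,   100},
};

/* Hypothetical helper. Returns 1 if value lies in the documented range,
 * 0 if it lies outside, and -1 if no range is documented for the name. */
int checkDocumentedRange(const char *attribute, int value)
{
    size_t count = sizeof kDocumentedRanges / sizeof kDocumentedRanges[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(kDocumentedRanges[i].attribute, attribute) == 0) {
            return value >= kDocumentedRanges[i].lo &&
                   value <= kDocumentedRanges[i].hi;
        }
    }
    return -1;
}
```

For instance, the documented default of 8 passes the check for both GRAPH_NUM_PARTITIONS and GRAPH_CUTOFF_SIZE, while a cutoff of 3 does not.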
cutensornetContractionOptimizerInfoAttributes_t
¶
-
enum cutensornetContractionOptimizerInfoAttributes_t¶
-
This enum lists all attributes of a cutensornetContractionOptimizerInfo_t that are accessible.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES¶
-
int64_t: Total number of slices.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICED_MODES¶
-
int32_t: Total number of sliced modes.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICED_MODE¶
-
int32_t* slices: slices[i], with i less than the number of sliced modes (see CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICED_MODES), corresponds to the name of the i-th sliced mode (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICED_EXTENT¶
-
int64_t* slices: slices[i], with i less than the number of sliced modes (see CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICED_MODES), corresponds to the extent of the i-th sliced mode (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PATH¶
-
cutensornetContractionPath_t: Pointer to the contraction path (see cutensornetContractionPath_t).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PHASE1_FLOP_COUNT¶
-
double: Flop count for the given network after phase 1 of pathfinding (i.e., before slicing and reconfig).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_FLOP_COUNT¶
-
double: Flop count for the given network after slicing.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_LARGEST_TENSOR¶
-
double: Size of the largest intermediate tensor in bytes.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICING_OVERHEAD¶
-
double: Overhead due to slicing.
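The slice-related attributes can be cross-checked against each other. The sketch below assumes that every sliced mode is sliced over its full extent, in which case the total number of slices would equal the product of the sliced extents; that relationship is an assumption of this example rather than something stated above, and `slicesFromExtents` is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: product of the sliced-mode extents. Under the stated
 * assumption (each sliced mode is sliced over its full extent) this would
 * match the NUM_SLICES attribute. An empty list yields 1 (no slicing). */
int64_t slicesFromExtents(const int64_t *slicedExtents, int32_t numSlicedModes)
{
    int64_t numSlices = 1;
    for (int32_t i = 0; i < numSlicedModes; ++i) {
        numSlices *= slicedExtents[i];
    }
    return numSlices;
}
```

In an application, `slicedExtents` would be filled from CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICED_EXTENT and the result compared with CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES.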
cutensornetContractionAutotunePreferenceAttributes_t
¶
-
enum cutensornetContractionAutotunePreferenceAttributes_t¶
-
This enum lists all attributes of a cutensornetContractionAutotunePreference_t that are accessible.
Values:
-
enumerator CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS¶
-
int32_t: Maximal number of auto-tune iterations for each pairwise contraction (default: 3).
-
enumerator CUTENSORNET_CONTRACTION_AUTOTUNE_TIME_LIMIT¶
-
float: Time limit for the auto-tuning process (default: -1, denoting unlimited).
-
enumerator CUTENSORNET_CONTRACTION_AUTOTUNE_JIT¶
-
int32_t: If set to 1, cutensornetContractionAutotune() generates dedicated pairwise contraction kernels for each contraction (default: 0).
cutensornetGraphAlgo_t
¶
-
enum cutensornetGraphAlgo_t¶
-
This enum lists graph algorithms that can be set.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM_RB¶
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM_KWAY¶
cutensornetMemoryModel_t
¶
-
enum cutensornetMemoryModel_t¶
-
This enum lists memory models used to determine workspace size.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL_HEURISTIC¶
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL_CUTENSOR¶
cutensornetNetworkDescriptor_t
¶
-
typedef void *cutensornetNetworkDescriptor_t¶
-
Opaque structure holding cuTensorNet’s network descriptor.
cutensornetContractionPlan_t
¶
-
typedef void *cutensornetContractionPlan_t¶
-
Opaque structure holding cuTensorNet’s contraction plan.
cutensornetNodePair_t
¶
-
typedef struct cutensornetNodePair cutensornetNodePair_t¶
-
A pair of int32_t values (typically referring to tensor IDs inside of the network).
cutensornetContractionPath_t
¶
-
typedef struct cutensornetContractionPath cutensornetContractionPath_t¶
-
Holds information about the path (see https://numpy.org/doc/stable/reference/generated/numpy.einsum_path.html)
The provided path is interchangeable with numpy’s einsum path.
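In numpy's einsum_path convention, each step names two positions in the current operand list; those two operands are removed and the contraction result is appended to the end of the list. The sketch below checks a path against that convention. `NodePair` and `ContractionPath` are local stand-ins for the cuTensorNet structs, and their field names are assumptions made for this example; `pathIsValid` is a hypothetical helper.

```c
#include <stdint.h>

/* Stand-ins for cutensornetNodePair_t and cutensornetContractionPath_t.
 * The field names are assumptions made for this sketch. */
typedef struct {
    int32_t first;
    int32_t second;
} NodePair;

typedef struct {
    int32_t numContractions;
    const NodePair *data;
} ContractionPath;

/* Returns 1 if `path` fully contracts `numInputs` tensors under the
 * einsum_path convention, 0 otherwise. */
int pathIsValid(const ContractionPath *path, int32_t numInputs)
{
    int32_t remaining = numInputs; /* operands currently in the list */

    /* Pairwise contraction of N tensors takes exactly N - 1 steps. */
    if (numInputs < 1 || path->numContractions != numInputs - 1) {
        return 0;
    }
    for (int32_t i = 0; i < path->numContractions; ++i) {
        NodePair p = path->data[i];
        if (p.first < 0 || p.second < 0) return 0;
        if (p.first >= remaining || p.second >= remaining) return 0;
        if (p.first == p.second) return 0;
        remaining -= 1; /* two operands removed, one result appended */
    }
    return remaining == 1;
}
```

For three input tensors, the path (0, 1), (0, 1) is valid: the first step leaves two operands, so index 1 in the second step refers to the intermediate result. The path (0, 2), (0, 2) is not, because index 2 no longer exists at the second step.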
cutensornetContractionOptimizerConfig_t
¶
-
typedef void *cutensornetContractionOptimizerConfig_t¶
-
Opaque structure holding cuTensorNet’s path-finder config.
cutensornetContractionOptimizerInfo_t
¶
-
typedef void *cutensornetContractionOptimizerInfo_t¶
-
Opaque structure holding information about the optimized path and the slices (see cutensornetContractionOptimizerInfoAttributes_t).
cutensornetContractionAutotunePreference_t
¶
-
typedef void *cutensornetContractionAutotunePreference_t¶
-
Opaque structure holding information about the auto-tuning phase.
cudaDataType_t
¶
The type is an enumerant to specify the data precision.
It is used when the data reference does not carry the type itself (e.g., void *).
-
enum cudaDataType_t¶
-
Values:
-
enumerator CUDA_R_16F = 2¶
-
real as a half
-
enumerator CUDA_C_16F = 6¶
-
complex as a pair of half numbers
-
enumerator CUDA_R_16BF = 14¶
-
real as a nv_bfloat16
-
enumerator CUDA_C_16BF = 15¶
-
complex as a pair of nv_bfloat16 numbers
-
enumerator CUDA_R_32F = 0¶
-
real as a float
-
enumerator CUDA_C_32F = 4¶
-
complex as a pair of float numbers
-
enumerator CUDA_R_64F = 1¶
-
real as a double
-
enumerator CUDA_C_64F = 5¶
-
complex as a pair of double numbers
-
enumerator CUDA_R_4I = 16¶
-
real as a signed 4-bit int
-
enumerator CUDA_C_4I = 17¶
-
complex as a pair of signed 4-bit int numbers
-
enumerator CUDA_R_4U = 18¶
-
real as an unsigned 4-bit int
-
enumerator CUDA_C_4U = 19¶
-
complex as a pair of unsigned 4-bit int numbers
-
enumerator CUDA_R_8I = 3¶
-
real as a signed 8-bit int
-
enumerator CUDA_C_8I = 7¶
-
complex as a pair of signed 8-bit int numbers
-
enumerator CUDA_R_8U = 8¶
-
real as an unsigned 8-bit int
-
enumerator CUDA_C_8U = 9¶
-
complex as a pair of unsigned 8-bit int numbers
-
enumerator CUDA_R_16I = 20¶
-
real as a signed 16-bit int
-
enumerator CUDA_C_16I = 21¶
-
complex as a pair of signed 16-bit int numbers
-
enumerator CUDA_R_16U = 22¶
-
real as an unsigned 16-bit int
-
enumerator CUDA_C_16U = 23¶
-
complex as a pair of unsigned 16-bit int numbers
-
enumerator CUDA_R_32I = 10¶
-
real as a signed 32-bit int
-
enumerator CUDA_C_32I = 11¶
-
complex as a pair of signed 32-bit int numbers
-
enumerator CUDA_R_32U = 12¶
-
real as an unsigned 32-bit int
-
enumerator CUDA_C_32U = 13¶
-
complex as a pair of unsigned 32-bit int numbers
-
enumerator CUDA_R_64I = 24¶
-
real as a signed 64-bit int
-
enumerator CUDA_C_64I = 25¶
-
complex as a pair of signed 64-bit int numbers
-
enumerator CUDA_R_64U = 26¶
-
real as an unsigned 64-bit int
-
enumerator CUDA_C_64U = 27¶
-
complex as a pair of unsigned 64-bit int numbers
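Since a void * buffer carries no type information, code that handles such buffers typically derives the element width from the cudaDataType_t tag. The sketch below mirrors the enum locally from the values listed above and computes the number of bits per element, counting a complex element as a pair of components as described. `dataTypeIsComplex`, `dataTypeComponentBits`, and `dataTypeElementBits` are hypothetical helpers; bits are used rather than bytes so that the 4-bit types need no packing assumption.

```c
/* Local mirror of the enumerator values listed above; real code would use the
 * definition shipped with the CUDA toolkit headers. */
typedef enum {
    CUDA_R_16F = 2,   CUDA_C_16F = 6,
    CUDA_R_16BF = 14, CUDA_C_16BF = 15,
    CUDA_R_32F = 0,   CUDA_C_32F = 4,
    CUDA_R_64F = 1,   CUDA_C_64F = 5,
    CUDA_R_4I = 16,   CUDA_C_4I = 17,
    CUDA_R_4U = 18,   CUDA_C_4U = 19,
    CUDA_R_8I = 3,    CUDA_C_8I = 7,
    CUDA_R_8U = 8,    CUDA_C_8U = 9,
    CUDA_R_16I = 20,  CUDA_C_16I = 21,
    CUDA_R_16U = 22,  CUDA_C_16U = 23,
    CUDA_R_32I = 10,  CUDA_C_32I = 11,
    CUDA_R_32U = 12,  CUDA_C_32U = 13,
    CUDA_R_64I = 24,  CUDA_C_64I = 25,
    CUDA_R_64U = 26,  CUDA_C_64U = 27
} cudaDataType_t;

/* Returns 1 for the CUDA_C_* (complex) enumerators, 0 for CUDA_R_* (real). */
int dataTypeIsComplex(cudaDataType_t t)
{
    switch (t) {
    case CUDA_C_16F: case CUDA_C_16BF: case CUDA_C_32F: case CUDA_C_64F:
    case CUDA_C_4I:  case CUDA_C_4U:   case CUDA_C_8I:  case CUDA_C_8U:
    case CUDA_C_16I: case CUDA_C_16U:  case CUDA_C_32I: case CUDA_C_32U:
    case CUDA_C_64I: case CUDA_C_64U:
        return 1;
    default:
        return 0;
    }
}

/* Bits in one real component, taken from the descriptions above;
 * 0 for values that are not listed. */
int dataTypeComponentBits(cudaDataType_t t)
{
    switch (t) {
    case CUDA_R_4I: case CUDA_C_4I: case CUDA_R_4U: case CUDA_C_4U:
        return 4;
    case CUDA_R_8I: case CUDA_C_8I: case CUDA_R_8U: case CUDA_C_8U:
        return 8;
    case CUDA_R_16F: case CUDA_C_16F: case CUDA_R_16BF: case CUDA_C_16BF:
    case CUDA_R_16I: case CUDA_C_16I: case CUDA_R_16U:  case CUDA_C_16U:
        return 16;
    case CUDA_R_32F: case CUDA_C_32F: case CUDA_R_32I: case CUDA_C_32I:
    case CUDA_R_32U: case CUDA_C_32U:
        return 32;
    case CUDA_R_64F: case CUDA_C_64F: case CUDA_R_64I: case CUDA_C_64I:
    case CUDA_R_64U: case CUDA_C_64U:
        return 64;
    default:
        return 0;
    }
}

/* Bits per element: a complex element is a pair of components. */
int dataTypeElementBits(cudaDataType_t t)
{
    return dataTypeComponentBits(t) * (dataTypeIsComplex(t) ? 2 : 1);
}
```

A buffer of n elements tagged CUDA_C_32F would therefore occupy n * 64 bits under this accounting.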