cuTensorNet Data Types¶
cutensornetHandle_t
¶
-
typedef void *cutensornetHandle_t¶
Opaque structure holding cuTensorNet’s library context.
This handle holds the cuTensorNet library context (device properties, system information, etc.). The handle must be initialized and destroyed with cutensornetCreate() and cutensornetDestroy() functions, respectively.
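For illustration, a minimal lifecycle sketch (the create/destroy pattern named above; error handling shortened for brevity):
#include <cutensornet.h>
int main() {
    cutensornetHandle_t handle;
    // Initialize the library context; must succeed before any other cuTensorNet call.
    if (cutensornetCreate(&handle) != CUTENSORNET_STATUS_SUCCESS) return 1;
    /* ... use the handle with other cuTensorNet APIs ... */
    // Release the library context once all objects created with it have been destroyed.
    cutensornetDestroy(handle);
    return 0;
}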
cutensornetLoggerCallback_t
¶
-
typedef void (*cutensornetLoggerCallback_t)(int32_t logLevel, const char *functionName, const char *message)¶
A callback function pointer type for logging APIs. Use cutensornetLoggerSetCallback() to set the callback function.
- Parameters
logLevel – [in] the log level
functionName – [in] the name of the API that logged this message
message – [in] the log message
cutensornetLoggerCallbackData_t
¶
-
typedef void (*cutensornetLoggerCallbackData_t)(int32_t logLevel, const char *functionName, const char *message, void *userData)¶
A callback function pointer type for logging APIs. Use cutensornetLoggerSetCallbackData() to set the callback function and user data.
- Parameters
logLevel – [in] the log level
functionName – [in] the name of the API that logged this message
message – [in] the log message
userData – [in] user’s data to be used by the callback
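As an illustration, a hedged sketch of a callback matching this signature; the callback name and the use of stderr as user data are arbitrary choices, and registration goes through cutensornetLoggerSetCallbackData() as described above:
#include <cstdio>
#include <cutensornet.h>
// Callback matching cutensornetLoggerCallbackData_t: prints to the stream passed as userData.
static void myLoggerCallback(int32_t logLevel, const char* functionName,
                             const char* message, void* userData) {
    std::fprintf(static_cast<FILE*>(userData), "[cuTensorNet:%d] %s: %s\n",
                 logLevel, functionName, message);
}
// Typically registered right after cutensornetCreate():
//   cutensornetLoggerSetCallbackData(myLoggerCallback, stderr);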
cutensornetStatus_t
¶
-
enum cutensornetStatus_t¶
cuTensorNet status type returns
The type is used for function status returns. All cuTensorNet library functions return their status, which can have the following values.
Values:
-
enumerator CUTENSORNET_STATUS_SUCCESS = 0¶
The operation completed successfully.
-
enumerator CUTENSORNET_STATUS_NOT_INITIALIZED = 1¶
The cuTensorNet library was not initialized.
-
enumerator CUTENSORNET_STATUS_ALLOC_FAILED = 3¶
Resource allocation failed inside the cuTensorNet library.
-
enumerator CUTENSORNET_STATUS_INVALID_VALUE = 7¶
An unsupported value or parameter was passed to the function (indicates a user error).
-
enumerator CUTENSORNET_STATUS_ARCH_MISMATCH = 8¶
The device is either not ready, or the target architecture is not supported.
-
enumerator CUTENSORNET_STATUS_MAPPING_ERROR = 11¶
An access to GPU memory space failed, which is usually caused by a failure to bind a texture.
-
enumerator CUTENSORNET_STATUS_EXECUTION_FAILED = 13¶
The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.
-
enumerator CUTENSORNET_STATUS_INTERNAL_ERROR = 14¶
An internal cuTensorNet error has occurred.
-
enumerator CUTENSORNET_STATUS_NOT_SUPPORTED = 15¶
The requested operation is not supported.
-
enumerator CUTENSORNET_STATUS_LICENSE_ERROR = 16¶
The functionality requested requires some license and an error was detected when trying to check the current licensing.
-
enumerator CUTENSORNET_STATUS_CUBLAS_ERROR = 17¶
A call to CUBLAS did not succeed.
-
enumerator CUTENSORNET_STATUS_CUDA_ERROR = 18¶
Some unknown CUDA error has occurred.
-
enumerator CUTENSORNET_STATUS_INSUFFICIENT_WORKSPACE = 19¶
The provided workspace was insufficient.
-
enumerator CUTENSORNET_STATUS_INSUFFICIENT_DRIVER = 20¶
The driver version is insufficient.
-
enumerator CUTENSORNET_STATUS_IO_ERROR = 21¶
An error occurred related to file I/O.
-
enumerator CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH = 22¶
The dynamically linked cuTENSOR library is incompatible.
-
enumerator CUTENSORNET_STATUS_NO_DEVICE_ALLOCATOR = 23¶
Drawing device memory from a mempool is requested, but the mempool is not set.
-
enumerator CUTENSORNET_STATUS_ALL_HYPER_SAMPLES_FAILED = 24¶
All hyper samples failed due to one or more errors; please enable logging (export CUTENSORNET_LOG_LEVEL with a value greater than 1) for details.
-
enumerator CUTENSORNET_STATUS_CUSOLVER_ERROR = 25¶
A call to cuSOLVER did not succeed.
-
enumerator CUTENSORNET_STATUS_DEVICE_ALLOCATOR_ERROR = 26¶
Operation with the device memory pool failed.
-
enumerator CUTENSORNET_STATUS_DISTRIBUTED_FAILURE = 27¶
Distributed communication service failed.
-
enumerator CUTENSORNET_STATUS_INTERRUPTED = 28¶
The operation was interrupted by the user and cannot recover or complete.
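A common error-checking idiom built around these status returns (a sketch; it assumes cutensornetGetErrorString() for converting a status code into a readable string). Later sketches in this section reuse this CHECK_CUTENSORNET macro.
#include <cstdio>
#include <cstdlib>
#include <cutensornet.h>
#define CHECK_CUTENSORNET(call)                                                   \
    do {                                                                          \
        const cutensornetStatus_t status_ = (call);                               \
        if (status_ != CUTENSORNET_STATUS_SUCCESS) {                              \
            std::fprintf(stderr, "cuTensorNet error %s at %s:%d\n",               \
                         cutensornetGetErrorString(status_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                              \
        }                                                                         \
    } while (0)
// Usage: CHECK_CUTENSORNET(cutensornetCreate(&handle));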
cutensornetComputeType_t
¶
-
enum cutensornetComputeType_t¶
Encodes cuTensorNet’s compute type (see “User Guide - Accuracy Guarantees” for details).
Values:
-
enumerator CUTENSORNET_COMPUTE_16F = (1U << 0U)¶
floating-point: 5-bit exponent and 10-bit mantissa (aka half)
-
enumerator CUTENSORNET_COMPUTE_16BF = (1U << 10U)¶
floating-point: 8-bit exponent and 7-bit mantissa (aka bfloat)
-
enumerator CUTENSORNET_COMPUTE_TF32 = (1U << 12U)¶
floating-point: 8-bit exponent and 10-bit mantissa (aka tensor-float-32)
-
enumerator CUTENSORNET_COMPUTE_3XTF32 = (1U << 13U)¶
floating-point: More precise than TF32, but less precise than float
-
enumerator CUTENSORNET_COMPUTE_32F = (1U << 2U)¶
floating-point: 8-bit exponent and 23-bit mantissa (aka float)
-
enumerator CUTENSORNET_COMPUTE_64F = (1U << 4U)¶
floating-point: 11-bit exponent and 52-bit mantissa (aka double)
-
enumerator CUTENSORNET_COMPUTE_8U = (1U << 6U)¶
8-bit unsigned integer
-
enumerator CUTENSORNET_COMPUTE_8I = (1U << 8U)¶
8-bit signed integer
-
enumerator CUTENSORNET_COMPUTE_32U = (1U << 7U)¶
32-bit unsigned integer
-
enumerator CUTENSORNET_COMPUTE_32I = (1U << 9U)¶
32-bit signed integer
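For illustration, a typical pairing of a tensor data type with a compute type (a sketch; both values are subsequently passed together to the network/tensor creation calls):
cudaDataType_t dataType = CUDA_C_64F;                            // double-complex tensor elements
cutensornetComputeType_t computeType = CUTENSORNET_COMPUTE_64F;  // FP64 arithmetic precision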
cutensornetContractionOptimizerConfigAttributes_t
¶
-
enum cutensornetContractionOptimizerConfigAttributes_t¶
This enum lists all attributes of a cutensornetContractionOptimizerConfig_t that can be modified.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_PARTITIONS = 0¶
int32_t: The network is recursively split over num_partitions until the size of each partition is less than or equal to the cutoff. The allowed range for num_partitions is [2, 30]. When the hyper-optimizer is disabled, the default value is 8.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_CUTOFF_SIZE = 1¶
int32_t: The network is recursively split over num_partitions until the size of each partition is less than or equal to this cutoff. The allowed range for cutoff_size is [4, 50]. When the hyper-optimizer is disabled, the default value is 8.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_ALGORITHM = 2¶
cutensornetGraphAlgo_t: the graph algorithm to be used in graph partitioning. Choices include CUTENSORNET_GRAPH_ALGO_KWAY (default) or CUTENSORNET_GRAPH_ALGO_RB.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_IMBALANCE_FACTOR = 3¶
int32_t: Specifies the maximum allowed size imbalance among the partitions. Allowed range [30, 2000]. When the hyper-optimizer is disabled the default value is 200.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_ITERATIONS = 4¶
int32_t: Specifies the number of iterations for the refinement algorithms at each stage of the uncoarsening process of the graph partitioner. Allowed range [1, 500]. When the hyper-optimizer is disabled the default value is 60.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_GRAPH_NUM_CUTS = 5¶
int32_t: Specifies the number of different partitioning that the graph partitioner will compute. The final partitioning is the one that achieves the best edge-cut or communication volume. Allowed range [1, 40]. When the hyper-optimizer is disabled the default value is 10.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_RECONFIG_NUM_ITERATIONS = 10¶
int32_t: Specifies the number of subtrees to be chosen for reconfiguration. A value of 0 disables reconfiguration. The default value is 500. The amount of time spent in reconfiguration, which usually dominates the pathfinder run time, is proportional to this.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_RECONFIG_NUM_LEAVES = 11¶
int32_t: Specifies the maximum number of leaves in the subtree chosen for optimization in each reconfiguration iteration. The default value is 8. The amount of time spent in reconfiguration, which usually dominates the pathfinder run time, is proportional to this.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_DISABLE_SLICING = 20¶
int32_t: If set to 1, disables slicing regardless of memory available.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_MODEL = 21¶
cutensornetMemoryModel_t: Memory model used to determine workspace size. CUTENSORNET_MEMORY_MODEL_HEURISTIC uses a simple memory model that does not require external calls. CUTENSORNET_MEMORY_MODEL_CUTENSOR (default) uses cuTENSOR to more precisely evaluate the amount of memory cuTENSOR will need for the contraction.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MEMORY_FACTOR = 22¶
int32_t: The memory limit for the first slice-finding iteration as a percentage of the workspace size. Allowed range [1, 100]. The default is 80 when using CUTENSORNET_MEMORY_MODEL_CUTENSOR for the memory model and 100 when using CUTENSORNET_MEMORY_MODEL_HEURISTIC.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_MIN_SLICES = 23¶
int32_t: Minimum number of slices to produce at the first round of slicing. Default is 1.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SLICER_SLICE_FACTOR = 24¶
int32_t: Factor by which to increase the total number of slices at each slicing round. Default is 32; must be at least 2.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES = 30¶
int32_t: Number of hyper-optimizer random samples. Default 0 (disabled).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_THREADS = 31¶
int32_t: Number of parallel hyper-optimizer threads. Default is number-of-cores / 2. When user-provided, it will be limited by the number of cores.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SIMPLIFICATION_DISABLE_DR = 40¶
int32_t: If set to 1, disable deferred rank simplification.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SEED = 60¶
int32_t: Random seed used internally in order to reproduce the same path.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_COST_FUNCTION_OBJECTIVE = 61¶
cutensornetOptimizerCost_t: the objective function to use for finding the optimal contraction path. CUTENSORNET_OPTIMIZER_COST_FLOPS (default) finds a path that minimizes the FLOP count. CUTENSORNET_OPTIMIZER_COST_TIME (experimental) finds a path that minimizes the estimated time; the estimated time is computed based on arithmetic intensity. CUTENSORNET_OPTIMIZER_COST_TIME_TUNED (experimental) finds a path that minimizes the estimated time; the estimated time is computed based on performance heuristics of pairwise contraction measured for each architecture.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_CACHE_REUSE_NRUNS = 62¶
int32_t: Number of runs that utilize cache reuse.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SMART_OPTION = 63¶
cutensornetSmartOption_t: enable or disable smart options.
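A hedged sketch of setting a couple of these attributes; it assumes a library handle named handle created earlier, a config object from cutensornetCreateContractionOptimizerConfig(), the generic setter cutensornetContractionOptimizerConfigSetAttribute(), and the CHECK_CUTENSORNET macro sketched earlier:
cutensornetContractionOptimizerConfig_t config;
CHECK_CUTENSORNET(cutensornetCreateContractionOptimizerConfig(handle, &config));
int32_t numHyperSamples = 32;  // enable the hyper-optimizer with 32 random samples
CHECK_CUTENSORNET(cutensornetContractionOptimizerConfigSetAttribute(
    handle, config, CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES,
    &numHyperSamples, sizeof(numHyperSamples)));
int32_t seed = 42;             // fix the seed so the same path is reproduced
CHECK_CUTENSORNET(cutensornetContractionOptimizerConfigSetAttribute(
    handle, config, CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SEED,
    &seed, sizeof(seed)));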
cutensornetContractionOptimizerInfoAttributes_t
¶
-
enum cutensornetContractionOptimizerInfoAttributes_t¶
This enum lists all attributes of a cutensornetContractionOptimizerInfo_t that are accessible.
Values:
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PATH = 0¶
cutensornetContractionPath_t: Pointer to the contraction path.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES = 10¶
int64_t: Total number of slices.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICED_MODES = 11¶
int32_t: Total number of sliced modes. (get-only)
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICED_MODE = 12¶
DEPRECATED int32_t* slicedModes: slicedModes[i] with i < numSlicedModes refers to the mode label of the i-th sliced mode (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()). (get-only)
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICED_EXTENT = 13¶
DEPRECATED int64_t* slicedExtents: slicedExtents[i] with i < numSlicedModes refers to the sliced extent of the i-th sliced mode (see extentsIn w.r.t. cutensornetCreateNetworkDescriptor()). (get-only)
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICING_CONFIG = 14¶
cutensornetSlicingConfig_t*: Pointer to the slice configuration settings (number of slices, sliced modes, and sliced extents) used with the given path.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_SLICING_OVERHEAD = 15¶
double: Overhead due to slicing.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_PHASE1_FLOP_COUNT = 20¶
double: FLOP count for the given network after phase 1 of pathfinding (i.e., before slicing and reconfig).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_FLOP_COUNT = 21¶
double: FLOP count for the given network after slicing.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_EFFECTIVE_FLOPS_EST = 22¶
double: Experimental. Returns the total flop-equivalent for one pass for all slices based on the cost function. When the cost function is flops, conventional flops are returned. When a time-based cost function is chosen, effectiveFlopsEstimation = RuntimeEstimation * ops_peak.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_RUNTIME_EST = 23¶
double: Experimental. Returns the runtime estimation in [s] based on the time cost function objective for one pass for all slices.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_LARGEST_TENSOR = 24¶
double: The number of elements in the largest intermediate tensor.
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_INTERMEDIATE_MODES = 30¶
int32_t* intermediateModes: The modes in \([\text{intermediateModes}[\sum_{n=0}^{i-1}\text{numIntermediateModes}[n]], \text{intermediateModes}[\sum_{n=0}^{i}\text{numIntermediateModes}[n]])\) are the modes for the intermediate tensor i (so the total bytes to store intermediateModes is \(\text{sizeof}(\text{int32_t})*\left(\sum_n \text{numIntermediateModes}[n]\right)\)).
-
enumerator CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_INTERMEDIATE_MODES = 31¶
int32_t* numIntermediateModes: numIntermediateModes[i] with i < numInputs - 1 is the number of modes for the intermediate tensor i (see numInputs w.r.t. cutensornetCreateNetworkDescriptor()).
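A sketch of querying two of these attributes after pathfinding; it assumes a handle and an optimizerInfo object produced by the path optimizer, plus the generic getter cutensornetContractionOptimizerInfoGetAttribute() and the CHECK_CUTENSORNET macro from earlier:
double flopCount = 0.0;
CHECK_CUTENSORNET(cutensornetContractionOptimizerInfoGetAttribute(
    handle, optimizerInfo, CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_FLOP_COUNT,
    &flopCount, sizeof(flopCount)));
int64_t numSlices = 0;
CHECK_CUTENSORNET(cutensornetContractionOptimizerInfoGetAttribute(
    handle, optimizerInfo, CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES,
    &numSlices, sizeof(numSlices)));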
cutensornetContractionAutotunePreferenceAttributes_t
¶
-
enum cutensornetContractionAutotunePreferenceAttributes_t¶
This enum lists all attributes of a cutensornetContractionAutotunePreference_t that are accessible.
Values:
-
enumerator CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS¶
int32_t: Maximal number of auto-tune iterations for each pairwise contraction (default: 3).
-
enumerator CUTENSORNET_CONTRACTION_AUTOTUNE_INTERMEDIATE_MODES¶
int32_t: 0=OFF, 1=ON, 2=AUTO (default). If set to 1, cutensornetContractionAutotune() auto-tunes the intermediate mode order by executing one slice of the network a few times in order to determine how to achieve the best performance with cuTENSOR. If set to 2, heuristically chooses whether to auto-tune the intermediate mode order based upon network characteristics.
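A sketch of adjusting the auto-tune iteration count; it assumes a handle, cutensornetCreateContractionAutotunePreference(), the generic setter cutensornetContractionAutotunePreferenceSetAttribute(), and the CHECK_CUTENSORNET macro from earlier:
cutensornetContractionAutotunePreference_t autotunePref;
CHECK_CUTENSORNET(cutensornetCreateContractionAutotunePreference(handle, &autotunePref));
int32_t maxAutotuneIters = 5;  // default is 3
CHECK_CUTENSORNET(cutensornetContractionAutotunePreferenceSetAttribute(
    handle, autotunePref, CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS,
    &maxAutotuneIters, sizeof(maxAutotuneIters)));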
cutensornetGraphAlgo_t
¶
-
enum cutensornetGraphAlgo_t¶
This enum lists graph algorithms that can be set.
Values:
-
enumerator CUTENSORNET_GRAPH_ALGO_RB¶
-
enumerator CUTENSORNET_GRAPH_ALGO_KWAY¶
cutensornetMemoryModel_t
¶
-
enum cutensornetMemoryModel_t¶
This enum lists memory models used to determine workspace size.
Values:
-
enumerator CUTENSORNET_MEMORY_MODEL_HEURISTIC¶
-
enumerator CUTENSORNET_MEMORY_MODEL_CUTENSOR¶
cutensornetOptimizerCost_t
¶
-
enum cutensornetOptimizerCost_t¶
This enum lists various cost functions to optimize with.
Values:
-
enumerator CUTENSORNET_OPTIMIZER_COST_FLOPS¶
Conventional flops (default)
-
enumerator CUTENSORNET_OPTIMIZER_COST_TIME¶
Time estimation based on arithmetic intensity (experimental). It is only available for Volta and later architectures.
-
enumerator CUTENSORNET_OPTIMIZER_COST_TIME_TUNED¶
Time estimation based on performance heuristics of pairwise contraction measured for each architecture (experimental). It is only available for Volta and later architectures.
cutensornetSmartOption_t
¶
-
enum cutensornetSmartOption_t¶
This enum lists various smart optimization options.
Values:
-
enumerator CUTENSORNET_SMART_OPTION_DISABLED = 0¶
No smart options are enabled.
-
enumerator CUTENSORNET_SMART_OPTION_ENABLED = 1¶
Automatic configuration (SMART) options of the contraction optimizer are enabled (default behavior). These include, but are not limited to, limiting the pathfinder's elapsed time, avoiding meaningless configurations, and adjusting the configuration on the fly.
cutensornetNetworkDescriptor_t
¶
-
typedef void *cutensornetNetworkDescriptor_t¶
Opaque structure holding cuTensorNet’s network descriptor.
cutensornetNetworkAttributes_t
¶
-
enum cutensornetNetworkAttributes_t¶
This enum lists all attributes of a cutensornetNetworkDescriptor_t that are accessible.
Values:
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_NUM_CONSTANT = 0¶
int32_t: The number of input tensors that are constant (get-only).
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_CONSTANT = 1¶
cutensornetTensorIDList_t: Structure holding number of, and indices of input tensors that are constant. Setting this attribute will override previous setting of CUTENSORNET_NETWORK_INPUT_TENSORS_CONSTANT.
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_NUM_CONJUGATED = 10¶
int32_t: The number of input tensors that are conjugated (get-only).
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_CONJUGATED = 11¶
cutensornetTensorIDList_t: Structure holding number of, and indices of input tensors that are conjugated. Setting the number of conjugated tensors to -1 will select all tensors. Setting this attribute will override previous setting of CUTENSORNET_NETWORK_INPUT_TENSORS_CONJUGATED.
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_NUM_REQUIRE_GRAD = 20¶
int32_t: The number of input tensors that require gradient computation (get-only).
-
enumerator CUTENSORNET_NETWORK_INPUT_TENSORS_REQUIRE_GRAD = 21¶
cutensornetTensorIDList_t: Structure holding number of, and indices of input tensors that require gradient computation. Setting the number of tensors requiring gradient computation to -1 will select all tensors. Setting this attribute will override previous setting of CUTENSORNET_NETWORK_INPUT_TENSORS_REQUIRE_GRAD.
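A sketch of reading a get-only attribute; it assumes a network descriptor named networkDesc and a generic getter cutensornetNetworkGetAttribute() analogous to the other attribute getters in the library:
int32_t numConstantTensors = 0;
CHECK_CUTENSORNET(cutensornetNetworkGetAttribute(
    handle, networkDesc, CUTENSORNET_NETWORK_INPUT_TENSORS_NUM_CONSTANT,
    &numConstantTensors, sizeof(numConstantTensors)));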
cutensornetContractionPlan_t
¶
-
typedef void *cutensornetContractionPlan_t¶
Opaque structure holding cuTensorNet’s contraction plan.
cutensornetNodePair_t
¶
-
struct cutensornetNodePair_t¶
A pair of int32_t values (typically referring to tensor IDs inside of the network).
cutensornetContractionPath_t
¶
-
struct cutensornetContractionPath_t¶
Holds information about the contraction path.
The provided path is interchangeable with the path returned by numpy.einsum_path.
Public Members
-
int32_t numContractions¶
total number of tensor contractions.
-
cutensornetNodePair_t *data¶
array of size numContractions. The tensors corresponding to data[i].first and data[i].second will be contracted.
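For illustration, a hand-built path for a three-tensor network: contract tensors 0 and 1 first, then contract the remaining tensor with the intermediate, following the numpy.einsum_path convention of appending intermediates to the end of the (shrinking) tensor list.
cutensornetNodePair_t pairs[2];
pairs[0].first = 0; pairs[0].second = 1;  // step 0: contract input tensors 0 and 1
pairs[1].first = 0; pairs[1].second = 1;  // step 1: contract old tensor 2 with the new intermediate
cutensornetContractionPath_t path;
path.numContractions = 2;
path.data = pairs;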
cutensornetContractionOptimizerConfig_t
¶
-
typedef void *cutensornetContractionOptimizerConfig_t¶
Opaque structure holding cuTensorNet’s pathfinder config.
cutensornetContractionOptimizerInfo_t
¶
-
typedef void *cutensornetContractionOptimizerInfo_t¶
Opaque structure holding information about the optimized path and the slices (see cutensornetContractionOptimizerInfoAttributes_t).
cutensornetContractionAutotunePreference_t
¶
-
typedef void *cutensornetContractionAutotunePreference_t¶
Opaque structure holding information about the auto-tuning phase.
cutensornetSliceGroup_t
¶
-
typedef void *cutensornetSliceGroup_t¶
Opaque structure capturing a group of slices.
cutensornetDeviceMemHandler_t
¶
-
struct cutensornetDeviceMemHandler_t¶
The device memory handler structure holds information about the user-provided, stream-ordered device memory pool (mempool).
Public Members
-
void *ctx¶
A pointer to the user-owned mempool/context object.
-
int (*device_alloc)(void *ctx, void **ptr, size_t size, cudaStream_t stream)¶
A function pointer to the user-provided routine for allocating device memory of size on stream.
The allocated memory should be made accessible to the current device (or more precisely, to the current CUDA context bound to the library handle).
This interface supports any stream-ordered memory allocator ctx. Upon success, the allocated memory can be immediately used on the given stream by any operations enqueued/ordered on the same stream after this call.
It is the caller’s responsibility to ensure a proper stream order is established.
The allocated memory should be at least 256-byte aligned.
- Parameters
ctx – [in] A pointer to the user-owned mempool object.
ptr – [out] On success, a pointer to the allocated buffer.
size – [in] The amount of memory in bytes to be allocated.
stream – [in] The CUDA stream on which the memory is allocated (and the stream order is established).
- Returns
Error status of the invocation. Return 0 on success and any nonzero integer otherwise. This function must not throw if it is a C++ function.
-
int (*device_free)(void *ctx, void *ptr, size_t size, cudaStream_t stream)¶
A function pointer to the user-provided routine for de-allocating device memory of size on stream.
This interface supports any stream-ordered memory allocator. Upon success, any subsequent accesses (of the memory pointed to by the pointer ptr) ordered after this call are undefined behaviors.
It is the caller’s responsibility to ensure a proper stream order is established.
If the arguments ctx and size are not the same as those passed to device_alloc to allocate the memory pointed to by ptr, the behavior is undefined.
The argument stream need not be identical to the one used for allocating ptr, as long as the stream order is correctly established. The behavior is undefined if this assumption is not held.
- Parameters
ctx – [in] A pointer to the user-owned mempool object.
ptr – [in] The pointer to the allocated buffer.
size – [in] The size of the allocated memory.
stream – [in] The CUDA stream on which the memory is de-allocated (and the stream ordering is established).
- Returns
Error status of the invocation. Return 0 on success and any nonzero integer otherwise. This function must not throw if it is a C++ function.
-
char name[CUTENSORNET_ALLOCATOR_NAME_LEN]¶
The name of the provided mempool.
CUTENSORNET_ALLOCATOR_NAME_LEN
¶
-
CUTENSORNET_ALLOCATOR_NAME_LEN¶
The maximal length of the name for a user-provided mempool.
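A hedged sketch of a handler backed by CUDA's stream-ordered allocator (cudaMallocAsync/cudaFreeAsync); the function and pool names are arbitrary, and registration via cutensornetSetDeviceMemHandler() is assumed:
#include <cstring>
#include <cuda_runtime.h>
#include <cutensornet.h>
static int poolAlloc(void* /*ctx*/, void** ptr, size_t size, cudaStream_t stream) {
    // cudaMallocAsync returns suitably aligned memory ordered on the given stream.
    return cudaMallocAsync(ptr, size, stream) == cudaSuccess ? 0 : 1;
}
static int poolFree(void* /*ctx*/, void* ptr, size_t /*size*/, cudaStream_t stream) {
    return cudaFreeAsync(ptr, stream) == cudaSuccess ? 0 : 1;
}
void registerMemHandler(cutensornetHandle_t handle) {
    cutensornetDeviceMemHandler_t handler{};
    handler.ctx = nullptr;              // no extra pool state needed for the default pool
    handler.device_alloc = poolAlloc;
    handler.device_free  = poolFree;
    std::strncpy(handler.name, "cudaMallocAsync", CUTENSORNET_ALLOCATOR_NAME_LEN);
    cutensornetSetDeviceMemHandler(handle, &handler);
}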
cutensornetWorkspaceDescriptor_t
¶
-
typedef void *cutensornetWorkspaceDescriptor_t¶
Opaque structure that holds information about the user-provided workspace.
cutensornetWorksizePref_t
¶
-
enum cutensornetWorksizePref_t¶
Workspace preference enumeration.
Values:
-
enumerator CUTENSORNET_WORKSIZE_PREF_MIN = 0¶
At least one algorithm will be available for each contraction.
-
enumerator CUTENSORNET_WORKSIZE_PREF_RECOMMENDED = 1¶
The most suitable algorithm will be available for each contraction.
-
enumerator CUTENSORNET_WORKSIZE_PREF_MAX = 2¶
All algorithms will be available for each contraction.
cutensornetMemspace_t
¶
-
enum cutensornetMemspace_t¶
Memory space enumeration for workspace allocation.
Values:
-
enumerator CUTENSORNET_MEMSPACE_DEVICE = 0¶
Device memory space. Workspace memory buffers allocated on this memory space must be device accessible. Memory buffers are device accessible if allocated natively on device (cudaMalloc), or on managed memory (cudaMallocManaged), or registered on host (cudaMallocHost, cudaHostAlloc, or cudaHostRegister), or on system memory with full CUDA Unified Memory support.
-
enumerator CUTENSORNET_MEMSPACE_HOST = 1¶
Host memory space. Workspace memory buffers allocated on this memory space must be CPU accessible. Memory buffers are CPU accessible if allocated natively on host (e.g., malloc), or on managed memory (cudaMallocManaged), or registered on host (cudaMallocHost or cudaHostAlloc).
cutensornetWorkspaceKind_t
¶
-
enum cutensornetWorkspaceKind_t¶
Type enumeration for workspace allocation.
Values:
-
enumerator CUTENSORNET_WORKSPACE_SCRATCH = 0¶
Scratch workspace memory.
-
enumerator CUTENSORNET_WORKSPACE_CACHE = 1¶
Cache workspace memory; it must remain valid, with its contents unmodified, until the operations referencing it have completed all of their iterations.
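A sketch of the typical workspace flow combining the three enums above; it assumes a handle and a workspace descriptor named workDesc whose required sizes have already been computed, plus cutensornetWorkspaceGetMemorySize() and cutensornetWorkspaceSetMemory():
int64_t scratchSize = 0;
CHECK_CUTENSORNET(cutensornetWorkspaceGetMemorySize(
    handle, workDesc,
    CUTENSORNET_WORKSIZE_PREF_RECOMMENDED,   // most suitable algorithm per contraction
    CUTENSORNET_MEMSPACE_DEVICE,
    CUTENSORNET_WORKSPACE_SCRATCH,
    &scratchSize));
void* scratch = nullptr;
cudaMalloc(&scratch, scratchSize);           // device-accessible buffer, as required above
CHECK_CUTENSORNET(cutensornetWorkspaceSetMemory(
    handle, workDesc,
    CUTENSORNET_MEMSPACE_DEVICE,
    CUTENSORNET_WORKSPACE_SCRATCH,
    scratch, scratchSize));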
cutensornetTensorQualifiers_t
¶
-
struct cutensornetTensorQualifiers_t¶
Holds qualifiers/flags about the input tensors.
Public Members
-
int32_t isConjugate¶
if set to 1, indicates the tensor should be complex-conjugated (applies only to complex data types).
-
int32_t isConstant¶
if set to 1, indicates the tensor’s data will not change across different network contractions.
-
int32_t requiresGradient¶
if set to 1, indicates the tensor requires gradient computation.
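For illustration, per-tensor qualifiers for a three-tensor network (the array is later passed to the network-descriptor creation call):
cutensornetTensorQualifiers_t qualifiers[3] = {};  // zero-initialize all flags
qualifiers[0].isConstant       = 1;  // tensor 0: data reused across contractions
qualifiers[1].isConjugate      = 1;  // tensor 1: complex-conjugated on the fly
qualifiers[2].requiresGradient = 1;  // tensor 2: gradient will be computed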
cutensornetSliceInfoPair_t
¶
-
struct cutensornetSliceInfoPair_t¶
A pair of int32_t and int64_t values holding the sliced mode and the intended extent of that mode.
cutensornetSlicingConfig_t
¶
-
struct cutensornetSlicingConfig_t¶
Holds information about slicing.
Public Members
-
uint32_t numSlicedModes¶
total number of sliced modes.
-
cutensornetSliceInfoPair_t *data¶
array of size numSlicedModes.
cutensornetTensorDescriptor_t
¶
-
typedef void *cutensornetTensorDescriptor_t¶
Opaque structure holding cuTensorNet’s tensor descriptor.
cutensornetTensorSVDConfig_t
¶
-
typedef void *cutensornetTensorSVDConfig_t¶
Opaque structure holding cuTensorNet’s tensor SVD configuration.
cutensornetTensorSVDConfigAttributes_t
¶
-
enum cutensornetTensorSVDConfigAttributes_t¶
This enum lists all attributes of a cutensornetTensorSVDConfig_t that can be modified.
Note
When multiple truncation cutoffs (CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF, CUTENSORNET_TENSOR_SVD_CONFIG_REL_CUTOFF, CUTENSORNET_TENSOR_SVD_CONFIG_DISCARDED_WEIGHT_CUTOFF) or a maximal extent in the input cutensornetTensorDescriptor_t are specified, the runtime reduced extent will be determined as the lowest among all of them.
Values:
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF¶
double: The absolute cutoff value for truncation; the default is 0.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_REL_CUTOFF¶
double: The cutoff value for truncation (relative to the largest singular value); the default is 0.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_S_NORMALIZATION¶
cutensornetTensorSVDNormalization_t: How to normalize the singular values (after potential truncation). Default is no normalization.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_S_PARTITION¶
cutensornetTensorSVDPartition_t: How to partition the singular values.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_ALGO¶
cutensornetTensorSVDAlgo_t: The SVD algorithm; the default is gesvd.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_ALGO_PARAMS¶
Optional, the parameters specific to the SVD algorithm cutensornetTensorSVDAlgo_t. Currently supports cutensornetGesvdjParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDJ and cutensornetGesvdrParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDR.
-
enumerator CUTENSORNET_TENSOR_SVD_CONFIG_DISCARDED_WEIGHT_CUTOFF¶
double: The maximal cumulative discarded weight (square sum of discarded singular values divided by the square sum of all singular values); the default is 0. This option is not allowed when CUTENSORNET_TENSOR_SVD_ALGO_GESVDR is used.
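A sketch of configuring a truncating SVD; it assumes a handle, cutensornetCreateTensorSVDConfig(), the generic setter cutensornetTensorSVDConfigSetAttribute(), and the CHECK_CUTENSORNET macro from earlier:
cutensornetTensorSVDConfig_t svdConfig;
CHECK_CUTENSORNET(cutensornetCreateTensorSVDConfig(handle, &svdConfig));
double relCutoff = 1e-8;  // drop singular values below 1e-8 times the largest one
CHECK_CUTENSORNET(cutensornetTensorSVDConfigSetAttribute(
    handle, svdConfig, CUTENSORNET_TENSOR_SVD_CONFIG_REL_CUTOFF,
    &relCutoff, sizeof(relCutoff)));
cutensornetTensorSVDAlgo_t algo = CUTENSORNET_TENSOR_SVD_ALGO_GESVDJ;
CHECK_CUTENSORNET(cutensornetTensorSVDConfigSetAttribute(
    handle, svdConfig, CUTENSORNET_TENSOR_SVD_CONFIG_ALGO,
    &algo, sizeof(algo)));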
cutensornetTensorSVDPartition_t
¶
-
enum cutensornetTensorSVDPartition_t¶
This enum lists various partition schemes for singular values.
Values:
-
enumerator CUTENSORNET_TENSOR_SVD_PARTITION_NONE¶
Return U, S, V as defined (default).
-
enumerator CUTENSORNET_TENSOR_SVD_PARTITION_US¶
Absorb S onto U, i.e., US, nullptr, V.
-
enumerator CUTENSORNET_TENSOR_SVD_PARTITION_SV¶
Absorb S onto V, i.e., U, nullptr, SV.
-
enumerator CUTENSORNET_TENSOR_SVD_PARTITION_UV_EQUAL¶
Absorb S onto U and V equally, i.e., US^{1/2}, nullptr, S^{1/2}V.
cutensornetTensorSVDNormalization_t
¶
-
enum cutensornetTensorSVDNormalization_t¶
This enum lists various normalization methods for singular values.
Values:
-
enumerator CUTENSORNET_TENSOR_SVD_NORMALIZATION_NONE¶
No normalization.
-
enumerator CUTENSORNET_TENSOR_SVD_NORMALIZATION_L1¶
Normalize the truncated singular values such that the L1 norm becomes 1.
-
enumerator CUTENSORNET_TENSOR_SVD_NORMALIZATION_L2¶
Normalize the truncated singular values such that the L2 norm becomes 1.
-
enumerator CUTENSORNET_TENSOR_SVD_NORMALIZATION_LINF¶
Normalize the truncated singular values such that the L-infinity norm becomes 1.
cutensornetTensorSVDAlgo_t
¶
-
enum cutensornetTensorSVDAlgo_t¶
This enum lists various algorithms for SVD.
Values:
-
enumerator CUTENSORNET_TENSOR_SVD_ALGO_GESVD¶
cusolverDnGesvd (default).
-
enumerator CUTENSORNET_TENSOR_SVD_ALGO_GESVDJ¶
cusolverDnGesvdj.
-
enumerator CUTENSORNET_TENSOR_SVD_ALGO_GESVDP¶
cusolverDnXgesvdp.
-
enumerator CUTENSORNET_TENSOR_SVD_ALGO_GESVDR¶
cusolverDnXgesvdr.
cutensornetGesvdjParams_t
¶
-
struct cutensornetGesvdjParams_t¶
This struct holds parameters for the gesvdj setting.
Public Members
-
double tol¶
The tolerance to control the accuracy of numerical singular values; the default (setting tol to 0) adopts the default tolerance (machine precision) from cuSolver.
-
int32_t maxSweeps¶
The maximum number of sweeps for gesvdj; the default (setting maxSweeps to 0) adopts the default gesvdj max-sweep setting from cuSolver.
cutensornetGesvdrParams_t
¶
-
struct cutensornetGesvdrParams_t¶
This struct holds parameters for the gesvdr setting.
Public Members
-
int64_t oversampling¶
The size of oversampling; the default (setting oversampling to 0) is the lower of 4 times the truncated extent k and the difference between the full rank and k.
-
int64_t niters¶
Number of iterations of the power method for gesvdr; the default (setting niters to 0) is 10.
cutensornetGesvdjStatus_t
¶
-
struct cutensornetGesvdjStatus_t¶
This struct holds information for the gesvdj execution.
cutensornetGesvdpStatus_t
¶
-
struct cutensornetGesvdpStatus_t¶
This struct holds information for the gesvdp execution.
Public Members
-
double errSigma¶
The magnitude of the perturbation in gesvdp, showing the accuracy of SVD.
cutensornetTensorSVDInfo_t
¶
-
typedef void *cutensornetTensorSVDInfo_t¶
Opaque structure holding cuTensorNet’s tensor SVD information.
cutensornetTensorSVDInfoAttributes_t
¶
-
enum cutensornetTensorSVDInfoAttributes_t¶
This enum lists all attributes of a cutensornetTensorSVDInfo_t.
Values:
-
enumerator CUTENSORNET_TENSOR_SVD_INFO_FULL_EXTENT¶
int64_t: The expected extent of the shared mode if no truncation takes place.
-
enumerator CUTENSORNET_TENSOR_SVD_INFO_REDUCED_EXTENT¶
int64_t: The true extent of the shared mode found at runtime.
-
enumerator CUTENSORNET_TENSOR_SVD_INFO_DISCARDED_WEIGHT¶
double: The discarded weight of a singular value truncation. This information is not computed when fixed-extent truncation is enabled with the SVD algorithm set to CUTENSORNET_TENSOR_SVD_ALGO_GESVDR.
-
enumerator CUTENSORNET_TENSOR_SVD_INFO_ALGO¶
cutensornetTensorSVDAlgo_t: The SVD algorithm used for computation.
-
enumerator CUTENSORNET_TENSOR_SVD_INFO_ALGO_STATUS¶
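A sketch of inspecting truncation results; it assumes a handle, an svdInfo object attached to the SVD call, the generic getter cutensornetTensorSVDInfoGetAttribute(), and the CHECK_CUTENSORNET macro from earlier:
int64_t reducedExtent = 0;
CHECK_CUTENSORNET(cutensornetTensorSVDInfoGetAttribute(
    handle, svdInfo, CUTENSORNET_TENSOR_SVD_INFO_REDUCED_EXTENT,
    &reducedExtent, sizeof(reducedExtent)));
double discardedWeight = 0.0;
CHECK_CUTENSORNET(cutensornetTensorSVDInfoGetAttribute(
    handle, svdInfo, CUTENSORNET_TENSOR_SVD_INFO_DISCARDED_WEIGHT,
    &discardedWeight, sizeof(discardedWeight)));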
cutensornetGateSplitAlgo_t
¶
-
enum cutensornetGateSplitAlgo_t¶
This enum lists algorithms for applying a gate tensor to two connected tensors.
Values:
-
enumerator CUTENSORNET_GATE_SPLIT_ALGO_DIRECT¶
The direct algorithm with contraction and SVD for the gate split process.
-
enumerator CUTENSORNET_GATE_SPLIT_ALGO_REDUCED¶
The reduced algorithm with additional QR for the gate split process.
cutensornetState_t
¶
-
typedef void *cutensornetState_t¶
Opaque structure holding the tensor network state.
cutensornetStatePurity_t
¶
-
enum cutensornetStatePurity_t¶
This enum captures tensor network state purity.
Values:
-
enumerator CUTENSORNET_STATE_PURITY_PURE¶
Pure tensor network state (belongs to the primary tensor space)
cutensornetBoundaryCondition_t
¶
-
enum cutensornetBoundaryCondition_t¶
This enum lists supported boundary conditions for supported tensor network factorizations.
Values:
-
enumerator CUTENSORNET_BOUNDARY_CONDITION_OPEN¶
Open boundary condition.
cutensornetStateAttributes_t
¶
-
enum cutensornetStateAttributes_t¶
This enum lists all attributes associated with computation of a cutensornetState_t.
Values:
-
enumerator CUTENSORNET_STATE_MPS_CANONICAL_CENTER = 0¶
DEPRECATED int32_t: The site where the canonical center of the target MPS should be placed. If less than 0 (default -1), no canonical center will be enforced.
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_ABS_CUTOFF = 1¶
DEPRECATED double: The absolute cutoff value for SVD truncation (default is 0).
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_REL_CUTOFF = 2¶
DEPRECATED double: The cutoff value for SVD truncation relative to the largest singular value (default is 0).
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_S_NORMALIZATION = 3¶
DEPRECATED cutensornetTensorSVDNormalization_t: How to normalize singular values after potential truncation. Default is no normalization.
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_ALGO = 4¶
DEPRECATED cutensornetTensorSVDAlgo_t: The SVD algorithm (default is gesvd).
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_ALGO_PARAMS = 5¶
DEPRECATED Optional, the parameters specific to the SVD algorithm cutensornetTensorSVDAlgo_t, currently supporting cutensornetGesvdjParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDJ and cutensornetGesvdrParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDR.
-
enumerator CUTENSORNET_STATE_MPS_SVD_CONFIG_DISCARDED_WEIGHT_CUTOFF = 6¶
DEPRECATED double: The maximal cumulative discarded weight (square sum of discarded singular values divided by the square sum of all singular values); defaults to 0. This option is not allowed when CUTENSORNET_TENSOR_SVD_ALGO_GESVDR is chosen.
-
enumerator CUTENSORNET_STATE_NUM_HYPER_SAMPLES = 7¶
DEPRECATED int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_CANONICAL_CENTER = 16¶
int32_t: The site where the canonical center of the target MPS should be placed. If less than 0 (default -1), no canonical center will be enforced.
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_ABS_CUTOFF = 17¶
double: The absolute cutoff value for SVD truncation (default is 0).
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_REL_CUTOFF = 18¶
double: The cutoff value for SVD truncation relative to the largest singular value (default is 0).
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_S_NORMALIZATION = 19¶
cutensornetTensorSVDNormalization_t: How to normalize singular values after potential truncation. Default is no normalization.
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_ALGO = 20¶
cutensornetTensorSVDAlgo_t: The SVD algorithm (default is gesvd).
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_ALGO_PARAMS = 21¶
Optional, the parameters specific to the SVD algorithm cutensornetTensorSVDAlgo_t, currently supporting cutensornetGesvdjParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDJ and cutensornetGesvdrParams_t for CUTENSORNET_TENSOR_SVD_ALGO_GESVDR.
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_SVD_DISCARDED_WEIGHT_CUTOFF = 22¶
double: The maximal cumulative discarded weight (square sum of discarded singular values divided by the square sum of all singular values); defaults to 0. This option is not allowed when CUTENSORNET_TENSOR_SVD_ALGO_GESVDR is chosen.
-
enumerator CUTENSORNET_STATE_CONFIG_MPS_MPO_APPLICATION = 23¶
Optional, the computational setting for all contraction and decomposition operations in MPS-MPO computation (swap included). Default is set to CUTENSORNET_STATE_MPO_APPLICATION_INEXACT.
-
enumerator CUTENSORNET_STATE_CONFIG_NUM_HYPER_SAMPLES = 30¶
int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_STATE_INFO_FLOPS = 64¶
double: Total Flop count estimate associated with explicit computation of the tensor network state.
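A sketch of setting the non-deprecated CONFIG attributes on a state object; it assumes a handle, a cutensornetState_t named state, the generic setter cutensornetStateConfigure(), and the CHECK_CUTENSORNET macro from earlier:
double mpsAbsCutoff = 1e-10;  // absolute SVD truncation cutoff for the MPS factorization
CHECK_CUTENSORNET(cutensornetStateConfigure(
    handle, state, CUTENSORNET_STATE_CONFIG_MPS_SVD_ABS_CUTOFF,
    &mpsAbsCutoff, sizeof(mpsAbsCutoff)));
int32_t numHyperSamples = 8;  // pathfinder hyper-samples for state computations
CHECK_CUTENSORNET(cutensornetStateConfigure(
    handle, state, CUTENSORNET_STATE_CONFIG_NUM_HYPER_SAMPLES,
    &numHyperSamples, sizeof(numHyperSamples)));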
cutensornetStateMPOApplication_t
¶
-
enum cutensornetStateMPOApplication_t¶
This enum lists all options for contraction and decomposition operations in MPS-MPO computation.
Values:
-
enumerator CUTENSORNET_STATE_MPO_APPLICATION_INEXACT¶
All swap and decomposition operations in MPS-MPO multiplication will follow the same constraints set by the underlying SVD configurations and target extents set by cutensornetStateFinalizeMPS().
-
enumerator CUTENSORNET_STATE_MPO_APPLICATION_EXACT¶
All swap and decomposition operations in MPS-MPO multiplication will be performed in an exact manner, with all constraints from the underlying SVD configuration and target extents specification dismissed. Note that, as of the current version, this option shall only be used when exact MPS computation is required.
cutensornetNetworkOperator_t
¶
-
typedef void *cutensornetNetworkOperator_t¶
Opaque structure holding the tensor network operator object.
cutensornetStateAccessor_t
¶
-
typedef void *cutensornetStateAccessor_t¶
Opaque structure holding the tensor network state amplitudes (a slice of the full output state tensor).
cutensornetAccessorAttributes_t
¶
-
enum cutensornetAccessorAttributes_t¶
This enum lists attributes associated with computation of tensor network state amplitudes.
Values:
-
enumerator CUTENSORNET_ACCESSOR_OPT_NUM_HYPER_SAMPLES = 0¶
DEPRECATED int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_ACCESSOR_CONFIG_NUM_HYPER_SAMPLES = 1¶
int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_ACCESSOR_INFO_FLOPS = 64¶
double: Total Flop count estimate associated with computing the specified set of tensor network state amplitudes.
cutensornetStateExpectation_t
¶
-
typedef void *cutensornetStateExpectation_t¶
Opaque structure holding the tensor network state expectation value.
cutensornetExpectationAttributes_t
¶
-
enum cutensornetExpectationAttributes_t¶
This enum lists attributes associated with computation of a tensor network state expectation value.
Values:
-
enumerator CUTENSORNET_EXPECTATION_OPT_NUM_HYPER_SAMPLES = 0¶
DEPRECATED int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_EXPECTATION_CONFIG_NUM_HYPER_SAMPLES = 1¶
int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_EXPECTATION_INFO_FLOPS = 64¶
double: Total Flop count estimate associated with computing the tensor network state expectation value.
cutensornetStateMarginal_t
¶
-
typedef void *cutensornetStateMarginal_t¶
Opaque structure holding the tensor network state marginal (aka reduced density matrix).
cutensornetMarginalAttributes_t
¶
-
enum cutensornetMarginalAttributes_t¶
This enum lists attributes associated with computation of a tensor network state marginal tensor.
Values:
-
enumerator CUTENSORNET_MARGINAL_OPT_NUM_HYPER_SAMPLES = 0¶
DEPRECATED int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_MARGINAL_CONFIG_NUM_HYPER_SAMPLES = 1¶
int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_MARGINAL_INFO_FLOPS = 64¶
double: Total Flop count estimate associated with computing the tensor network state marginal tensor.
cutensornetStateSampler_t
¶
-
typedef void *cutensornetStateSampler_t¶
Opaque structure holding the tensor network state sampler.
cutensornetSamplerAttributes_t
¶
-
enum cutensornetSamplerAttributes_t¶
This enum lists attributes associated with tensor network state sampling.
Values:
-
enumerator CUTENSORNET_SAMPLER_OPT_NUM_HYPER_SAMPLES = 0¶
DEPRECATED int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_SAMPLER_CONFIG_NUM_HYPER_SAMPLES = 1¶
int32_t: Number of hyper-samples used by the tensor network contraction path finder.
-
enumerator CUTENSORNET_SAMPLER_CONFIG_DETERMINISTIC = 2¶
int32_t: A positive random seed will ensure deterministic sampling results across multiple application runs.
-
enumerator CUTENSORNET_SAMPLER_INFO_FLOPS = 64¶
double: Total Flop count estimate associated with generating a single sample from the tensor network state.
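A sketch of making sampling deterministic; it assumes a handle, a cutensornetStateSampler_t named sampler, the generic setter cutensornetSamplerConfigure(), and the CHECK_CUTENSORNET macro from earlier:
int32_t samplerSeed = 2024;  // a positive seed makes results reproducible across runs
CHECK_CUTENSORNET(cutensornetSamplerConfigure(
    handle, sampler, CUTENSORNET_SAMPLER_CONFIG_DETERMINISTIC,
    &samplerSeed, sizeof(samplerSeed)));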
cudaDataType_t
¶
The type is an enumerant to specify the data precision. It is used when the data reference does not carry the type itself (e.g., void *).
-
enum cudaDataType_t¶
Values:
-
enumerator CUDA_R_16F = 2¶
real as a half
-
enumerator CUDA_C_16F = 6¶
complex as a pair of half numbers
-
enumerator CUDA_R_16BF = 14¶
real as a nv_bfloat16
-
enumerator CUDA_C_16BF = 15¶
complex as a pair of nv_bfloat16 numbers
-
enumerator CUDA_R_32F = 0¶
real as a float
-
enumerator CUDA_C_32F = 4¶
complex as a pair of float numbers
-
enumerator CUDA_R_64F = 1¶
real as a double
-
enumerator CUDA_C_64F = 5¶
complex as a pair of double numbers
-
enumerator CUDA_R_4I = 16¶
real as a signed 4-bit int
-
enumerator CUDA_C_4I = 17¶
complex as a pair of signed 4-bit int numbers
-
enumerator CUDA_R_4U = 18¶
real as an unsigned 4-bit int
-
enumerator CUDA_C_4U = 19¶
complex as a pair of unsigned 4-bit int numbers
-
enumerator CUDA_R_8I = 3¶
real as a signed 8-bit int
-
enumerator CUDA_C_8I = 7¶
complex as a pair of signed 8-bit int numbers
-
enumerator CUDA_R_8U = 8¶
real as an unsigned 8-bit int
-
enumerator CUDA_C_8U = 9¶
complex as a pair of unsigned 8-bit int numbers
-
enumerator CUDA_R_16I = 20¶
real as a signed 16-bit int
-
enumerator CUDA_C_16I = 21¶
complex as a pair of signed 16-bit int numbers
-
enumerator CUDA_R_16U = 22¶
real as an unsigned 16-bit int
-
enumerator CUDA_C_16U = 23¶
complex as a pair of unsigned 16-bit int numbers
-
enumerator CUDA_R_32I = 10¶
real as a signed 32-bit int
-
enumerator CUDA_C_32I = 11¶
complex as a pair of signed 32-bit int numbers
-
enumerator CUDA_R_32U = 12¶
real as an unsigned 32-bit int
-
enumerator CUDA_C_32U = 13¶
complex as a pair of unsigned 32-bit int numbers
-
enumerator CUDA_R_64I = 24¶
real as a signed 64-bit int
-
enumerator CUDA_C_64I = 25¶
complex as a pair of signed 64-bit int numbers
-
enumerator CUDA_R_64U = 26¶
real as an unsigned 64-bit int
-
enumerator CUDA_C_64U = 27¶
complex as a pair of unsigned 64-bit int numbers