cuTensorNet Functions

Handle Management API
cutensornetCreate

cutensornetStatus_t cutensornetCreate(cutensornetHandle_t *handle)

Initializes the cuTensorNet library.
The device associated with a particular cuTensorNet handle is assumed to remain unchanged after the cutensornetCreate() call. In order for the cuTensorNet library to use a different device, the application must set the new device by calling cudaSetDevice() and then create another cuTensorNet handle, associated with the new device, by calling cutensornetCreate().

Remark
    blocking, non-reentrant, and thread-safe

Parameters
    handle – [out] Pointer to cutensornetHandle_t

Returns
    CUTENSORNET_STATUS_SUCCESS on success and an error code otherwise
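The create/destroy pair brackets all other library calls. A minimal sketch of the handle lifecycle (assuming the cuQuantum headers and a CUDA-capable device are available; error handling via cutensornetGetErrorString() is described in the Error Management API section below):

```c
#include <stdio.h>
#include <cutensornet.h>

int main(void)
{
    cutensornetHandle_t handle;
    cutensornetStatus_t status = cutensornetCreate(&handle);
    if (status != CUTENSORNET_STATUS_SUCCESS) {
        fprintf(stderr, "cutensornetCreate failed: %s\n",
                cutensornetGetErrorString(status));
        return 1;
    }
    /* ... use the library with this handle ... */
    cutensornetDestroy(handle);  /* last call made with this handle */
    return 0;
}
```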
cutensornetDestroy

cutensornetStatus_t cutensornetDestroy(cutensornetHandle_t handle)

Destroys the cuTensorNet library handle.
This function releases the resources used by the cuTensorNet library handle and must be the last call made with a particular handle. Calling any cuTensorNet function that uses a cutensornetHandle_t after cutensornetDestroy() will return an error.

Parameters
    handle – [inout] Opaque handle holding cuTensorNet’s library context.
Network Descriptor API
cutensornetCreateNetworkDescriptor

cutensornetStatus_t cutensornetCreateNetworkDescriptor(const cutensornetHandle_t handle, int32_t numInputs, const int32_t numModesIn[], const int64_t *const extentsIn[], const int64_t *const stridesIn[], const int32_t *const modesIn[], const uint32_t alignmentRequirementsIn[], int32_t numModesOut, const int64_t extentsOut[], const int64_t stridesOut[], const int32_t modesOut[], uint32_t alignmentRequirementsOut, cudaDataType_t dataType, cutensornetComputeType_t computeType, cutensornetNetworkDescriptor_t *descNet)

Initializes a cutensornetNetworkDescriptor_t, describing the connectivity between the tensors.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyNetworkDescriptor() is called once descNet is no longer required.

Supported data-type combinations are:

    Data type     Compute type                 Tensor Core
    CUDA_R_16F    CUTENSORNET_COMPUTE_32F      Volta+
    CUDA_R_16BF   CUTENSORNET_COMPUTE_32F      Ampere+
    CUDA_R_32F    CUTENSORNET_COMPUTE_32F      No
    CUDA_R_32F    CUTENSORNET_COMPUTE_TF32     Ampere+
    CUDA_R_32F    CUTENSORNET_COMPUTE_16BF     Ampere+
    CUDA_R_32F    CUTENSORNET_COMPUTE_16F      Volta+
    CUDA_R_64F    CUTENSORNET_COMPUTE_64F      Ampere+
    CUDA_R_64F    CUTENSORNET_COMPUTE_32F      No
    CUDA_C_32F    CUTENSORNET_COMPUTE_32F      No
    CUDA_C_32F    CUTENSORNET_COMPUTE_TF32     Ampere+
    CUDA_C_64F    CUTENSORNET_COMPUTE_64F      Ampere+
    CUDA_C_64F    CUTENSORNET_COMPUTE_32F      No

Note
    If stridesIn (stridesOut) is set to 0 (NULL), the input tensors (output tensor) are assumed to be in the Fortran (column-major) layout.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    numInputs – [in] Number of input tensors.
    numModesIn – [in] Array of size numInputs; numModesIn[i] denotes the number of modes available in the i-th tensor.
    extentsIn – [in] Array of size numInputs; extentsIn[i] has numModesIn[i] many entries, with extentsIn[i][j] (j < numModesIn[i]) corresponding to the extent of the j-th mode of tensor i.
    stridesIn – [in] Array of size numInputs; stridesIn[i] has numModesIn[i] many entries, with stridesIn[i][j] (j < numModesIn[i]) corresponding to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of tensor i.
    modesIn – [in] Array of size numInputs; modesIn[i] has numModesIn[i] many entries, each corresponding to a mode. Each mode that does not appear in the output tensor is implicitly contracted.
    alignmentRequirementsIn – [in] Array of size numInputs; alignmentRequirementsIn[i] denotes the (minimal) alignment (in bytes) for the data pointer that corresponds to the i-th tensor (see rawDataIn[i] of cutensornetContraction()). It is recommended that each pointer is aligned to a 256-byte boundary.
    numModesOut – [in] Number of modes of the output tensor. On entry, if this value is 0 and the output modes are not provided, the network will infer the output modes.
    extentsOut – [in] Array of size numModesOut; extentsOut[j] (j < numModesOut) corresponds to the extent of the j-th mode of the output tensor.
    stridesOut – [in] Array of size numModesOut; stridesOut[j] (j < numModesOut) corresponds to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the output tensor.
    modesOut – [in] Array of size numModesOut; modesOut[j] denotes the j-th mode of the output tensor.
    alignmentRequirementsOut – [in] Denotes the (minimal) alignment (in bytes) for the data pointer that corresponds to the output tensor (see rawDataOut of cutensornetContraction()). It is recommended that the pointer is aligned to a 256-byte boundary.
    dataType – [in] Denotes the data type of all input and output tensors.
    computeType – [in] Denotes the compute type used throughout the computation.
    descNet – [out] Pointer to a cutensornetNetworkDescriptor_t.
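As an illustration, the following sketch builds a descriptor for the two-tensor network A[i,m,k] * B[k,n] = C[i,m,n]. The mode labels, extents, and alignments are arbitrary choices for this example; NULL strides select the default column-major layout, and error checking is omitted for brevity:

```c
#include <stddef.h>
#include <cutensornet.h>

/* Describe the network A[i,m,k] * B[k,n] -> C[i,m,n]. */
cutensornetNetworkDescriptor_t create_example_network(cutensornetHandle_t handle)
{
    /* Mode labels are arbitrary int32_t identifiers shared across tensors. */
    const int32_t modesA[] = {'i', 'm', 'k'};
    const int32_t modesB[] = {'k', 'n'};
    const int32_t modesC[] = {'i', 'm', 'n'};
    const int64_t extentsA[] = {8, 8, 16};   /* extents of i, m, k */
    const int64_t extentsB[] = {16, 8};      /* extents of k, n */
    const int64_t extentsC[] = {8, 8, 8};    /* extents of i, m, n */

    const int32_t numInputs = 2;
    const int32_t numModesIn[] = {3, 2};
    const int64_t *const extentsIn[] = {extentsA, extentsB};
    const int32_t *const modesIn[]   = {modesA, modesB};
    const uint32_t alignIn[] = {256, 256};   /* device buffers 256-byte aligned */

    cutensornetNetworkDescriptor_t descNet;
    cutensornetCreateNetworkDescriptor(handle,
        numInputs, numModesIn, extentsIn, /*stridesIn=*/NULL, modesIn, alignIn,
        /*numModesOut=*/3, extentsC, /*stridesOut=*/NULL, modesC,
        /*alignmentRequirementsOut=*/256,
        CUDA_R_32F, CUTENSORNET_COMPUTE_32F, &descNet);
    return descNet;  /* release later with cutensornetDestroyNetworkDescriptor() */
}
```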
cutensornetDestroyNetworkDescriptor

cutensornetStatus_t cutensornetDestroyNetworkDescriptor(cutensornetNetworkDescriptor_t desc)

Frees all the memory associated with desc.

Parameters
    desc – [inout] Opaque handle to a tensor network descriptor.
Contraction Optimizer API
cutensornetCreateContractionOptimizerConfig

cutensornetStatus_t cutensornetCreateContractionOptimizerConfig(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t *optimizerConfig)

Sets up the required hyper-optimization parameters for the contraction order solver (see cutensornetContractionOptimize()).
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerConfig() is called once optimizerConfig is no longer required.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    optimizerConfig – [out] This data structure holds all information about the user-requested hyper-optimization parameters.
cutensornetDestroyContractionOptimizerConfig

cutensornetStatus_t cutensornetDestroyContractionOptimizerConfig(cutensornetContractionOptimizerConfig_t optimizerConfig)

Frees all the memory associated with optimizerConfig.

Parameters
    optimizerConfig – [inout] Opaque structure.
cutensornetContractionOptimizerConfigGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigGetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of optimizerConfig.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    optimizerConfig – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that is requested.
    buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerConfig.
    sizeInBytes – [in] Size of buf (in bytes).
cutensornetContractionOptimizerConfigSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of optimizerConfig.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    optimizerConfig – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that will be set.
    buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
    sizeInBytes – [in] Size of buf (in bytes).
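For example, the hyper-optimizer sample count can be raised before calling cutensornetContractionOptimize(). The attribute enumerator below is an assumed member of cutensornetContractionOptimizerConfigAttributes_t; any attribute listed there follows the same pointer-plus-size pattern:

```c
#include <cutensornet.h>

/* Ask the hyper-optimizer to draw more path samples. The enumerator name
 * is assumed here; consult cutensornetContractionOptimizerConfigAttributes_t
 * for the attributes actually available. */
void configure_optimizer(cutensornetHandle_t handle,
                         cutensornetContractionOptimizerConfig_t config)
{
    int32_t numHyperSamples = 8;
    cutensornetContractionOptimizerConfigSetAttribute(
        handle, config,
        CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES,
        &numHyperSamples, sizeof(numHyperSamples));
}
```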
cutensornetCreateContractionOptimizerInfo

cutensornetStatus_t cutensornetCreateContractionOptimizerInfo(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetContractionOptimizerInfo_t *optimizerInfo)

Allocates resources for optimizerInfo.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerInfo() is called once optimizerInfo is no longer required.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.
    optimizerInfo – [out] Pointer to cutensornetContractionOptimizerInfo_t.
cutensornetDestroyContractionOptimizerInfo

cutensornetStatus_t cutensornetDestroyContractionOptimizerInfo(cutensornetContractionOptimizerInfo_t optimizerInfo)

Frees all the memory associated with optimizerInfo.

Parameters
    optimizerInfo – [inout] Opaque structure.
cutensornetContractionOptimize

cutensornetStatus_t cutensornetContractionOptimize(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerConfig_t optimizerConfig, uint64_t workspaceSizeConstraint, cutensornetContractionOptimizerInfo_t optimizerInfo)

Computes an “optimized” contraction order as well as slicing info (for more information see the Overview section) for a given tensor network, such that the total time to solution is minimized while adhering to the user-provided memory constraint.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    descNet – [in] Describes the topology of the tensor network (i.e., all tensors, their connectivity and modes).
    optimizerConfig – [in] Holds all hyper-optimization parameters that govern the search for an “optimal” contraction order.
    workspaceSizeConstraint – [in] Maximal device memory that will be provided by the user (i.e., cuTensorNet has to find a viable path/slicing solution within this user-defined constraint).
    optimizerInfo – [inout] On return, this object holds all necessary information about the optimized path and the related slicing information (see cutensornetContractionOptimizerInfoAttributes_t):
        the total number of slices;
        the total number of sliced modes;
        information about the sliced modes (i.e., the IDs of the sliced modes (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()) as well as their extents; see the Overview section for additional documentation);
        the optimized path;
        the flop count;
        the size (in bytes) of the largest intermediate tensor.
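A typical call site, sketched with an assumed 4 GiB workspace constraint (error checking omitted for brevity):

```c
#include <cutensornet.h>

/* Find a contraction path and slicing that fit within a 4 GiB workspace. */
void optimize_network(cutensornetHandle_t handle,
                      cutensornetNetworkDescriptor_t descNet,
                      cutensornetContractionOptimizerConfig_t config,
                      cutensornetContractionOptimizerInfo_t info)
{
    const uint64_t workspaceLimit = 4ull * 1024 * 1024 * 1024; /* bytes */
    cutensornetContractionOptimize(handle, descNet, config,
                                   workspaceLimit, info);
    /* info now holds the path, slicing, and flop-count attributes. */
}
```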
cutensornetContractionOptimizerInfoGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, void *buf, size_t sizeInBytes)

Accesses attributes of optimizerInfo.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    optimizerInfo – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that is requested.
    buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerInfo.
    sizeInBytes – [in] Size of buf (in bytes).
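For instance, the number of slices found by the optimizer can be read back as follows; the enumerator name is assumed from cutensornetContractionOptimizerInfoAttributes_t:

```c
#include <cutensornet.h>

/* Query how many slices the optimizer produced. The enumerator name is
 * assumed; see cutensornetContractionOptimizerInfoAttributes_t. */
int64_t query_num_slices(cutensornetHandle_t handle,
                         cutensornetContractionOptimizerInfo_t info)
{
    int64_t numSlices = 0;
    cutensornetContractionOptimizerInfoGetAttribute(
        handle, info,
        CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES,
        &numSlices, sizeof(numSlices));
    return numSlices;
}
```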
cutensornetContractionOptimizerInfoSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of optimizerInfo.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    optimizerInfo – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that will be set.
    buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
    sizeInBytes – [in] Size of buf (in bytes).
Contraction Plan API
cutensornetCreateContractionPlan

cutensornetStatus_t cutensornetCreateContractionPlan(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, const uint64_t workspaceSize, cutensornetContractionPlan_t *plan)

Initializes a cutensornetContractionPlan_t.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionPlan() is called once plan is no longer required.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
    optimizerInfo – [in] Opaque structure.
    workspaceSize – [in] Size of the provided workspace (in bytes).
    plan – [out] cuTensorNet’s contraction plan holds all the information required to perform the tensor contractions; to be precise, it initializes a cutensorContractionPlan_t for each tensor contraction that is required to contract the entire tensor network.
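A sketch of plan creation; workspaceSize would typically come from cutensornetContractionGetWorkspaceSize() (error checking omitted):

```c
#include <cutensornet.h>

/* Build the contraction plan for an already-optimized network. */
cutensornetContractionPlan_t create_plan(cutensornetHandle_t handle,
                                         cutensornetNetworkDescriptor_t descNet,
                                         cutensornetContractionOptimizerInfo_t info,
                                         uint64_t workspaceSize)
{
    cutensornetContractionPlan_t plan;
    cutensornetCreateContractionPlan(handle, descNet, info,
                                     workspaceSize, &plan);
    /* ... contract, then release with cutensornetDestroyContractionPlan(plan) */
    return plan;
}
```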
cutensornetDestroyContractionPlan

cutensornetStatus_t cutensornetDestroyContractionPlan(cutensornetContractionPlan_t plan)

Frees all resources owned by plan.

Parameters
    plan – [inout] Opaque structure.
cutensornetContractionAutotune

cutensornetStatus_t cutensornetContractionAutotune(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, void *workspace, uint64_t workspaceSize, const cutensornetContractionAutotunePreference_t pref, cudaStream_t stream)

Auto-tunes the contraction plan to find the best cutensorContractionPlan_t for each pair-wise contraction.
This function is blocking due to the nature of the auto-tuning process.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    plan – [inout] The plan must already be created (see cutensornetCreateContractionPlan()); the individual contraction plans will be fine-tuned.
    rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
    rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
    workspace – [out] Points to scratchpad memory (in device memory).
    workspaceSize – [in] Size of the provided workspace (in bytes).
    pref – [in] Controls the auto-tuning process and gives the user control over how much time is spent in this routine.
    stream – [inout] The CUDA stream on which all the computation is performed.
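A minimal sketch combining preference creation and auto-tuning; default preferences are used, and the device buffers are assumed to have been prepared by the caller:

```c
#include <cutensornet.h>
#include <cuda_runtime.h>

/* Fine-tune the pair-wise contraction kernels before the main contraction
 * loop. rawDataIn/rawDataOut/workspace are device pointers prepared by the
 * caller; error checking omitted. */
void autotune_plan(cutensornetHandle_t handle,
                   cutensornetContractionPlan_t plan,
                   const void *const rawDataIn[], void *rawDataOut,
                   void *workspace, uint64_t workspaceSize,
                   cudaStream_t stream)
{
    cutensornetContractionAutotunePreference_t pref;
    cutensornetCreateContractionAutotunePreference(handle, &pref);
    cutensornetContractionAutotune(handle, plan, rawDataIn, rawDataOut,
                                   workspace, workspaceSize, pref, stream);
    cutensornetDestroyContractionAutotunePreference(pref);
}
```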
cutensornetCreateContractionAutotunePreference

cutensornetStatus_t cutensornetCreateContractionAutotunePreference(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t *autotunePreference)

Sets up the required auto-tune parameters for the contraction plan.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionAutotunePreference() is called once autotunePreference is no longer required.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    autotunePreference – [out] This data structure holds all information about the user-requested auto-tune parameters.
cutensornetContractionAutotunePreferenceGetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceGetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of autotunePreference.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    autotunePreference – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that is requested.
    buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within autotunePreference.
    sizeInBytes – [in] Size of buf (in bytes).
cutensornetContractionAutotunePreferenceSetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceSetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of autotunePreference.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    autotunePreference – [in] Opaque structure that is accessed.
    attr – [in] Specifies the attribute that will be set.
    buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
    sizeInBytes – [in] Size of buf (in bytes).
cutensornetDestroyContractionAutotunePreference

cutensornetStatus_t cutensornetDestroyContractionAutotunePreference(cutensornetContractionAutotunePreference_t autotunePreference)

Frees all the memory associated with autotunePreference.

Parameters
    autotunePreference – [inout] Opaque structure.
Network Contraction API
cutensornetContractionGetWorkspaceSize

cutensornetStatus_t cutensornetContractionGetWorkspaceSize(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, uint64_t *workspaceSize)

Computes the minimum workspace memory size needed to contract the input tensor network using the provided contraction path.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
    optimizerInfo – [in] Opaque structure.
    workspaceSize – [out] The workspace memory size (in bytes) needed to successfully contract this tensor network.
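A common pattern is to query the size and allocate the device workspace immediately afterwards (error checking omitted):

```c
#include <cutensornet.h>
#include <cuda_runtime.h>

/* Query the minimum workspace size and allocate it on the device. */
void *allocate_workspace(cutensornetHandle_t handle,
                         cutensornetNetworkDescriptor_t descNet,
                         cutensornetContractionOptimizerInfo_t info,
                         uint64_t *workspaceSize)
{
    void *workspace = NULL;
    cutensornetContractionGetWorkspaceSize(handle, descNet, info,
                                           workspaceSize);
    cudaMalloc(&workspace, *workspaceSize);  /* free later with cudaFree() */
    return workspace;
}
```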
cutensornetContraction

cutensornetStatus_t cutensornetContraction(const cutensornetHandle_t handle, const cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, void *workspace, uint64_t workspaceSize, int64_t sliceId, cudaStream_t stream)

This function performs the actual contraction of the tensor network.

Parameters
    handle – [in] Opaque handle holding cuTensorNet’s library context.
    plan – [in] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()).
    rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
    rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
    workspace – [out] Points to scratchpad memory (in device memory).
    workspaceSize – [in] Size of the provided workspace (in bytes).
    sliceId – [in] The ID of the slice that is currently contracted (this value ranges between 0 and optimizerInfo.numSlices); use 0 if no slices are used.
    stream – [inout] The CUDA stream on which all the computation is performed.
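When the optimizer sliced the network, the contraction is invoked once per slice on the same stream. A sketch, assuming numSlices was queried from the optimizer info and that the per-slice results accumulate into rawDataOut:

```c
#include <cutensornet.h>
#include <cuda_runtime.h>

/* Contract every slice of the network. numSlices is queried from the
 * optimizer info; error checking omitted. */
void contract_all_slices(cutensornetHandle_t handle,
                         cutensornetContractionPlan_t plan,
                         const void *const rawDataIn[], void *rawDataOut,
                         void *workspace, uint64_t workspaceSize,
                         int64_t numSlices, cudaStream_t stream)
{
    for (int64_t sliceId = 0; sliceId < numSlices; ++sliceId) {
        cutensornetContraction(handle, plan, rawDataIn, rawDataOut,
                               workspace, workspaceSize, sliceId, stream);
    }
    cudaStreamSynchronize(stream);  /* output is valid once the stream drains */
}
```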
Error Management API
cutensornetGetErrorString

const char *cutensornetGetErrorString(cutensornetStatus_t error)

Returns the description string for an error code.

Remark
    non-blocking, non-reentrant, and thread-safe

Parameters
    error – [in] Error code to convert to a string.

Returns
    the error string
Logger API
cutensornetLoggerSetCallback

cutensornetStatus_t cutensornetLoggerSetCallback(cutensornetLoggerCallback_t callback)

This function sets the logging callback routine.

Parameters
    callback – [in] Pointer to a callback function; see cutensornetLoggerCallback_t.
cutensornetLoggerSetFile

cutensornetStatus_t cutensornetLoggerSetFile(FILE *file)

This function sets the logging output file.

Parameters
    file – [in] An open file with write permission.
cutensornetLoggerOpenFile

cutensornetStatus_t cutensornetLoggerOpenFile(const char *logFile)

This function opens a logging output file at the given path.

Parameters
    logFile – [in] Path to the logging output file.
cutensornetLoggerSetLevel

cutensornetStatus_t cutensornetLoggerSetLevel(int32_t level)

This function sets the value of the logging level.

Parameters
    level – [in] Log level, should be one of the following:

        Level   Summary             Long Description
        0       Off                 logging is disabled (default)
        1       Errors              only errors will be logged
        2       Performance Trace   API calls that launch CUDA kernels will log their parameters and important information
        3       Performance Hints   hints that can potentially improve the application’s performance
        4       Heuristics Trace    provides general information about the library execution, may contain details about heuristic status
        5       API Trace           API calls will log their parameters and important information
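For example, to send performance hints to a log file (the path and level below are illustrative choices):

```c
#include <cutensornet.h>

/* Route log output to a file and enable performance hints. The path is a
 * placeholder for this example. */
void enable_logging(void)
{
    cutensornetLoggerOpenFile("cutensornet_log.txt");
    cutensornetLoggerSetLevel(3);  /* 3 = Performance Hints */
}
```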
cutensornetLoggerSetMask

cutensornetStatus_t cutensornetLoggerSetMask(int32_t mask)

This function sets the value of the log mask.

Parameters
    mask – [in] Value of the logging mask. Masks are defined as a combination (bitwise OR) of the following masks:

        Mask    Description
        0       Off
        1       Errors
        2       Performance Trace
        4       Performance Hints
        8       Heuristics Trace
        16      API Trace

    See cutensornetLoggerSetLevel() for details.
cutensornetLoggerForceDisable

cutensornetStatus_t cutensornetLoggerForceDisable()

This function disables logging for the entire run.