cuTensorNet Functions

Handle Management API

cutensornetCreate

cutensornetStatus_t cutensornetCreate(cutensornetHandle_t *handle)
Initializes the cuTensorNet library.
The device associated with a particular cuTensorNet handle is assumed to remain unchanged after the cutensornetCreate() call. In order for the cuTensorNet library to use a different device, the application must set the new device to be used by calling cudaSetDevice() and then create another cuTensorNet handle, which will be associated with the new device, by calling cutensornetCreate().
Remark
blocking, non-reentrant, and thread-safe
- Parameters
handle – [out] Pointer to cutensornetHandle_t
- Returns
CUTENSORNET_STATUS_SUCCESS on success and an error code otherwise
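A minimal lifecycle sketch (the device ID 0 and the error-handling style are illustrative assumptions):

cudaSetDevice(0);  // the handle will be associated with the device that is current here

cutensornetHandle_t handle;
cutensornetStatus_t status = cutensornetCreate(&handle);
if (status != CUTENSORNET_STATUS_SUCCESS) {
    fprintf(stderr, "cutensornetCreate failed: %s\n", cutensornetGetErrorString(status));
}

/* ... use the handle with subsequent cuTensorNet calls ... */

cutensornetDestroy(handle);  // must be the last call using this handle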

cutensornetDestroy

cutensornetStatus_t cutensornetDestroy(cutensornetHandle_t handle)
Destroys the cuTensorNet library handle.
This function releases the resources used by the cuTensorNet library handle and must be the last cuTensorNet call made with a particular handle. Calling any cuTensorNet function that uses a cutensornetHandle_t after cutensornetDestroy() will return an error.
- Parameters
handle – [inout] Opaque handle holding cuTensorNet’s library context.

Network Descriptor API

cutensornetCreateNetworkDescriptor

cutensornetStatus_t cutensornetCreateNetworkDescriptor(const cutensornetHandle_t handle, int32_t numInputs, const int32_t numModesIn[], const int64_t *const extentsIn[], const int64_t *const stridesIn[], const int32_t *const modesIn[], const uint32_t alignmentRequirementsIn[], int32_t numModesOut, const int64_t extentsOut[], const int64_t stridesOut[], const int32_t modesOut[], uint32_t alignmentRequirementsOut, cudaDataType_t dataType, cutensornetComputeType_t computeType, cutensornetNetworkDescriptor_t *descNet)
Initializes a cutensornetNetworkDescriptor_t, describing the connectivity (i.e., network topology) between the tensors.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyNetworkDescriptor() is called once descNet is no longer required.
Supported data-type combinations are:
Data type    Compute type                 Tensor Core
CUDA_R_16F   CUTENSORNET_COMPUTE_32F      Volta+
CUDA_R_16BF  CUTENSORNET_COMPUTE_32F      Ampere+
CUDA_R_32F   CUTENSORNET_COMPUTE_32F      No
CUDA_R_32F   CUTENSORNET_COMPUTE_TF32     Ampere+
CUDA_R_32F   CUTENSORNET_COMPUTE_16BF     Ampere+
CUDA_R_32F   CUTENSORNET_COMPUTE_16F      Volta+
CUDA_R_64F   CUTENSORNET_COMPUTE_64F      Ampere+
CUDA_R_64F   CUTENSORNET_COMPUTE_32F      No
CUDA_C_32F   CUTENSORNET_COMPUTE_32F      No
CUDA_C_32F   CUTENSORNET_COMPUTE_TF32     Ampere+
CUDA_C_64F   CUTENSORNET_COMPUTE_64F      Ampere+
CUDA_C_64F   CUTENSORNET_COMPUTE_32F      No
Note
If stridesIn (stridesOut) is set to 0 (NULL), the input tensors (output tensor) are assumed to be in the Fortran (column-major) layout.
Note
numModesOut can be set to -1 for cuTensorNet to infer the output modes based on the input modes, or to 0 to perform a full reduction.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
numInputs – [in] Number of input tensors.
numModesIn – [in] Array of size numInputs; numModesIn[i] denotes the number of modes available in the i-th tensor.
extentsIn – [in] Array of size numInputs; extentsIn[i] has numModesIn[i] many entries, with extentsIn[i][j] (j < numModesIn[i]) corresponding to the extent of the j-th mode of tensor i.
stridesIn – [in] Array of size numInputs; stridesIn[i] has numModesIn[i] many entries, with stridesIn[i][j] (j < numModesIn[i]) corresponding to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of tensor i.
modesIn – [in] Array of size numInputs; modesIn[i] has numModesIn[i] many entries, each corresponding to a mode. Each mode that does not appear in the output tensor is implicitly contracted.
alignmentRequirementsIn – [in] Array of size numInputs; alignmentRequirementsIn[i] denotes the (minimal) alignment (in bytes) for the data pointer that corresponds to the i-th tensor (see rawDataIn[i] of cutensornetContraction()). It is recommended that each pointer is aligned to a 256-byte boundary.
numModesOut – [in] Number of modes of the output tensor. On entry, if this value is -1 and the output modes are not provided, the network will infer the output modes. If this value is 0, the network is fully reduced.
extentsOut – [in] Array of size numModesOut; extentsOut[j] (j < numModesOut) corresponds to the extent of the j-th mode of the output tensor.
stridesOut – [in] Array of size numModesOut; stridesOut[j] (j < numModesOut) corresponds to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the output tensor.
modesOut – [in] Array of size numModesOut; modesOut[j] denotes the j-th mode of the output tensor.
alignmentRequirementsOut – [in] Denotes the (minimal) alignment (in bytes) for the data pointer that corresponds to the output tensor (see rawDataOut of cutensornetContraction()). It is recommended that the pointer is aligned to a 256-byte boundary.
dataType – [in] Denotes the data type for all input and output tensors.
computeType – [in] Denotes the compute type used throughout the computation.
descNet – [out] Pointer to a cutensornetNetworkDescriptor_t.
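For illustration, a minimal sketch that describes the matrix-multiplication-like network A[i,k] B[k,j] -> C[i,j]; the extents (64, 32, 128) are arbitrary assumptions, and handle is an existing library handle:

// mode labels are arbitrary int32_t values; the shared label 'k' connects A and B
int32_t modesA[] = {'i', 'k'};
int32_t modesB[] = {'k', 'j'};
int32_t modesC[] = {'i', 'j'};
int64_t extentsA[] = {64, 32};
int64_t extentsB[] = {32, 128};
int64_t extentsC[] = {64, 128};

int32_t numInputs = 2;
int32_t numModesIn[] = {2, 2};
const int64_t* extentsIn[] = {extentsA, extentsB};
const int32_t* modesIn[] = {modesA, modesB};
uint32_t alignmentsIn[] = {256, 256};  // assuming 256-byte-aligned data pointers

cutensornetNetworkDescriptor_t descNet;
cutensornetCreateNetworkDescriptor(handle,
    numInputs, numModesIn, extentsIn,
    NULL,             // stridesIn: column-major layout
    modesIn, alignmentsIn,
    2, extentsC,
    NULL,             // stridesOut: column-major layout
    modesC,
    256,              // alignmentRequirementsOut
    CUDA_R_32F, CUTENSORNET_COMPUTE_32F, &descNet);

/* ... optimize, plan, and contract ... */

cutensornetDestroyNetworkDescriptor(descNet);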

cutensornetDestroyNetworkDescriptor

cutensornetStatus_t cutensornetDestroyNetworkDescriptor(cutensornetNetworkDescriptor_t desc)
Frees all the memory associated with the network descriptor.
- Parameters
desc – [inout] Opaque handle to a tensor network descriptor.

cutensornetGetOutputTensorDetails

cutensornetStatus_t cutensornetGetOutputTensorDetails(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, int32_t *numModesOut, size_t *dataSizeOut, int32_t *modeLabelsOut, int64_t *extentsOut, int64_t *stridesOut)
Gets the number of output modes, data size, modes, extents, and strides of the output tensor.
If all information regarding the output tensor is needed by the user, this function should be called twice: first to retrieve numModesOut for allocating memory, and then to retrieve modesOut, extentsOut, and stridesOut.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Pointer to a cutensornetNetworkDescriptor_t.
numModesOut – [out] On return, holds the number of modes of the output tensor. Cannot be null.
dataSizeOut – [out] If not null, on return holds the size (in bytes) of the memory needed for the output tensor.
modeLabelsOut – [out] If not null, on return holds the modes of the output tensor.
extentsOut – [out] If not null, on return holds the extents of the output tensor.
stridesOut – [out] If not null, on return holds the strides of the output tensor.
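A sketch of the two-call pattern described above (assuming an existing handle and descNet, and using std::vector for the allocations):

int32_t numModesOut = 0;
size_t dataSizeOut = 0;
// first call: query the number of output modes and the required data size
cutensornetGetOutputTensorDetails(handle, descNet,
                                  &numModesOut, &dataSizeOut, NULL, NULL, NULL);

// second call: retrieve modes, extents, and strides into properly sized arrays
std::vector<int32_t> modesOut(numModesOut);
std::vector<int64_t> extentsOut(numModesOut), stridesOut(numModesOut);
cutensornetGetOutputTensorDetails(handle, descNet, &numModesOut, NULL,
                                  modesOut.data(), extentsOut.data(), stridesOut.data());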

Contraction Optimizer API

cutensornetCreateContractionOptimizerConfig

cutensornetStatus_t cutensornetCreateContractionOptimizerConfig(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t *optimizerConfig)
Sets up the required hyper-optimization parameters for the contraction order solver (see cutensornetContractionOptimize()).
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerConfig() is called once optimizerConfig is no longer required.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [out] This data structure holds all information about the user-requested hyper-optimization parameters.

cutensornetDestroyContractionOptimizerConfig

cutensornetStatus_t cutensornetDestroyContractionOptimizerConfig(cutensornetContractionOptimizerConfig_t optimizerConfig)
Frees all the memory associated with optimizerConfig.
- Parameters
optimizerConfig – [inout] Opaque structure.

cutensornetContractionOptimizerConfigGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigGetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, void *buf, size_t sizeInBytes)
Gets attributes of optimizerConfig.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerConfig.
sizeInBytes – [in] Size of buf (in bytes).

cutensornetContractionOptimizerConfigSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, const void *buf, size_t sizeInBytes)
Sets attributes of optimizerConfig.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that will be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).
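As an illustration, a sketch that adjusts one hyper-optimizer knob and reads it back; it assumes the CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES attribute (see cutensornetContractionOptimizerConfigAttributes_t) takes an int32_t value:

cutensornetContractionOptimizerConfig_t optimizerConfig;
cutensornetCreateContractionOptimizerConfig(handle, &optimizerConfig);

int32_t numHyperSamples = 8;  // let the hyper-optimizer try 8 candidate configurations
cutensornetContractionOptimizerConfigSetAttribute(handle, optimizerConfig,
    CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES,
    &numHyperSamples, sizeof(numHyperSamples));

int32_t check = 0;  // read the attribute back to verify
cutensornetContractionOptimizerConfigGetAttribute(handle, optimizerConfig,
    CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES,
    &check, sizeof(check));

cutensornetDestroyContractionOptimizerConfig(optimizerConfig);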

cutensornetCreateContractionOptimizerInfo

cutensornetStatus_t cutensornetCreateContractionOptimizerInfo(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetContractionOptimizerInfo_t *optimizerInfo)
Allocates resources for optimizerInfo.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerInfo() is called once optimizerInfo is no longer required.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.
optimizerInfo – [out] Pointer to cutensornetContractionOptimizerInfo_t.

cutensornetDestroyContractionOptimizerInfo

cutensornetStatus_t cutensornetDestroyContractionOptimizerInfo(cutensornetContractionOptimizerInfo_t optimizerInfo)
Frees all the memory associated with optimizerInfo.
- Parameters
optimizerInfo – [inout] Opaque structure.

cutensornetContractionOptimize

cutensornetStatus_t cutensornetContractionOptimize(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerConfig_t optimizerConfig, uint64_t workspaceSizeConstraint, cutensornetContractionOptimizerInfo_t optimizerInfo)
Computes an “optimized” contraction order as well as slicing info (for more information see Overview section) for a given tensor network such that the total time to solution is minimized while adhering to the user-provided memory constraint.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the topology of the tensor network (i.e., all tensors, their connectivity and modes).
optimizerConfig – [in] Holds all hyper-optimization parameters that govern the search for an “optimal” contraction order.
workspaceSizeConstraint – [in] Maximal device memory that will be provided by the user (i.e., cuTensorNet has to find a viable path/slicing solution within this user-defined constraint).
optimizerInfo – [inout] On return, this object holds all necessary information about the optimized path and the related slicing information. optimizerInfo will hold information including (see cutensornetContractionOptimizerInfoAttributes_t):
- Total number of slices.
- Total number of sliced modes.
- Information about the sliced modes (i.e., the IDs of the sliced modes (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()) as well as their extents (see Overview section for additional documentation)).
- Optimized path.
- FLOP count.
- Total number of elements in the largest intermediate tensor.
- The mode labels for all intermediate tensors.
- The estimated runtime and “effective” flops.
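A sketch of the typical call sequence (assuming handle, descNet, and optimizerConfig already exist; the 8 GiB workspace limit is an arbitrary choice, and the CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES attribute is assumed to hold an int64_t):

cutensornetContractionOptimizerInfo_t optimizerInfo;
cutensornetCreateContractionOptimizerInfo(handle, descNet, &optimizerInfo);

uint64_t workspaceSizeConstraint = 8ULL << 30;  // stay within 8 GiB of device memory
cutensornetContractionOptimize(handle, descNet, optimizerConfig,
                               workspaceSizeConstraint, optimizerInfo);

// e.g., query how many slices the optimizer decided to cut the network into
int64_t numSlices = 0;
cutensornetContractionOptimizerInfoGetAttribute(handle, optimizerInfo,
    CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES,
    &numSlices, sizeof(numSlices));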

cutensornetContractionOptimizerInfoGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, void *buf, size_t sizeInBytes)
Gets attributes of optimizerInfo.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerInfo.
sizeInBytes – [in] Size of buf (in bytes).

cutensornetContractionOptimizerInfoSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, const void *buf, size_t sizeInBytes)
Sets attributes of optimizerInfo.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that will be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).

cutensornetContractionOptimizerInfoGetPackedSize

cutensornetStatus_t cutensornetContractionOptimizerInfoGetPackedSize(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, size_t *sizeInBytes)
Gets the packed size of the optimizerInfo object.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure of type cutensornetContractionOptimizerInfo_t.
sizeInBytes – [out] The packed size (in bytes).

cutensornetContractionOptimizerInfoPackData

cutensornetStatus_t cutensornetContractionOptimizerInfoPackData(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, void *buffer, size_t sizeInBytes)
Packs the optimizerInfo object into the provided buffer.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure of type cutensornetContractionOptimizerInfo_t.
buffer – [out] On return, this buffer holds the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).

cutensornetCreateContractionOptimizerInfoFromPackedData

cutensornetStatus_t cutensornetCreateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t *optimizerInfo)
Creates an optimizerInfo object from the provided buffer.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.
buffer – [in] A buffer with the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).
optimizerInfo – [out] Pointer to cutensornetContractionOptimizerInfo_t.

cutensornetUpdateContractionOptimizerInfoFromPackedData

cutensornetStatus_t cutensornetUpdateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t optimizerInfo)
Updates the provided optimizerInfo object from the provided buffer.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
buffer – [in] A buffer with the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).
optimizerInfo – [inout] Opaque object of type cutensornetContractionOptimizerInfo_t that will be updated.
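A sketch of the pack/unpack round trip (useful, e.g., for computing a contraction path once and distributing it to other processes; the transport of the buffer is up to the user):

// serialize optimizerInfo into a flat buffer
size_t packedSize = 0;
cutensornetContractionOptimizerInfoGetPackedSize(handle, optimizerInfo, &packedSize);
std::vector<char> packed(packedSize);
cutensornetContractionOptimizerInfoPackData(handle, optimizerInfo,
                                            packed.data(), packedSize);

/* ... e.g., send `packed` to another process that holds the same network ... */

// reconstruct an optimizerInfo object for the same network descriptor
cutensornetContractionOptimizerInfo_t optimizerInfoCopy;
cutensornetCreateContractionOptimizerInfoFromPackedData(handle, descNet,
    packed.data(), packedSize, &optimizerInfoCopy);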

Contraction Plan API

cutensornetCreateContractionPlan

cutensornetStatus_t cutensornetCreateContractionPlan(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetContractionPlan_t *plan)
Initializes a cutensornetContractionPlan_t.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionPlan() is called once plan is no longer required.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
optimizerInfo – [in] Opaque structure.
workDesc – [in] Opaque structure describing the workspace. At the creation of the contraction plan, only the workspace size is needed; the pointer to the workspace memory may be left null. If a device memory handler is set, workDesc can be set either to null (in which case the “recommended” workspace size is inferred, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid cutensornetWorkspaceDescriptor_t with the desired workspace size set and a null workspace pointer; see the Memory Management API section.
plan – [out] cuTensorNet’s contraction plan holds all the information required to perform the tensor contractions; to be precise, it initializes a cutensorContractionPlan_t for each tensor contraction that is required to contract the entire tensor network.

cutensornetDestroyContractionPlan

cutensornetStatus_t cutensornetDestroyContractionPlan(cutensornetContractionPlan_t plan)
Frees all resources owned by plan.
- Parameters
plan – [inout] Opaque structure.

cutensornetContractionAutotune

cutensornetStatus_t cutensornetContractionAutotune(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, const cutensornetWorkspaceDescriptor_t workDesc, const cutensornetContractionAutotunePreference_t pref, cudaStream_t stream)
Auto-tunes the contraction plan to find the best cutensorContractionPlan_t for each pair-wise contraction.
Note
This function is blocking due to the nature of the auto-tuning process.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [inout] The plan must already be created (see cutensornetCreateContractionPlan()); the individual contraction plans will be fine-tuned.
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetSize() & cutensornetWorkspaceSet(). If a device memory handler is set, workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.
pref – [in] Controls the auto-tuning process and gives the user control over how much time is spent in this routine.
stream – [in] The CUDA stream on which the computation is performed.

cutensornetCreateContractionAutotunePreference

cutensornetStatus_t cutensornetCreateContractionAutotunePreference(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t *autotunePreference)
Sets up the required auto-tune parameters for the contraction plan.
Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionAutotunePreference() is called once autotunePreference is no longer required.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [out] This data structure holds all information about the user-requested auto-tune parameters.

cutensornetContractionAutotunePreferenceGetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceGetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, void *buf, size_t sizeInBytes)
Gets attributes of autotunePreference.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within autotunePreference.
sizeInBytes – [in] Size of buf (in bytes).

cutensornetContractionAutotunePreferenceSetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceSetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, const void *buf, size_t sizeInBytes)
Sets attributes of autotunePreference.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that will be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).
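For example, a sketch that bounds the auto-tuning effort; it assumes the CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS attribute (see cutensornetContractionAutotunePreferenceAttributes_t) takes an int32_t value:

cutensornetContractionAutotunePreference_t pref;
cutensornetCreateContractionAutotunePreference(handle, &pref);

int32_t maxIterations = 5;  // run each pair-wise contraction at most 5 times while tuning
cutensornetContractionAutotunePreferenceSetAttribute(handle, pref,
    CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS,
    &maxIterations, sizeof(maxIterations));

/* ... pass pref to cutensornetContractionAutotune() ... */

cutensornetDestroyContractionAutotunePreference(pref);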

cutensornetDestroyContractionAutotunePreference

cutensornetStatus_t cutensornetDestroyContractionAutotunePreference(cutensornetContractionAutotunePreference_t autotunePreference)
Frees all the memory associated with autotunePreference.
- Parameters
autotunePreference – [inout] Opaque structure.

Workspace Management API

cutensornetCreateWorkspaceDescriptor

cutensornetStatus_t cutensornetCreateWorkspaceDescriptor(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t *workDesc)
Creates a workspace descriptor that holds information about the user-provided memory buffer.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [out] Pointer to the opaque workspace descriptor.

cutensornetWorkspaceComputeSizes

cutensornetStatus_t cutensornetWorkspaceComputeSizes(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetWorkspaceDescriptor_t workDesc)
Computes the workspace size needed to contract the input tensor network using the provided contraction path.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
optimizerInfo – [in] Opaque structure.
workDesc – [out] The workspace descriptor in which the information is collected.

cutensornetWorkspaceGetSize

cutensornetStatus_t cutensornetWorkspaceGetSize(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetWorksizePref_t workPref, cutensornetMemspace_t memSpace, uint64_t *workspaceSize)
Retrieves the needed workspace size for the given workspace preference and memory space.
The needed sizes must be pre-calculated by calling cutensornetWorkspaceComputeSizes().
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
workPref – [in] Preference of workspace for planning.
memSpace – [in] The memory space where the workspace is allocated.
workspaceSize – [out] Needed workspace size.

cutensornetWorkspaceSet

cutensornetStatus_t cutensornetWorkspaceSet(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void *const workspacePtr, uint64_t workspaceSize)
Sets the memory address and size of the workspace provided by the user.
A workspace is valid in the following cases:
- workspacePtr is valid and workspaceSize > 0
- workspacePtr is null and workspaceSize > 0 (used during cutensornetCreateContractionPlan() to provide the available workspace)
- workspacePtr is null and workspaceSize = 0 (workspace memory will be drawn from the user’s mempool)
A workspace will be validated against the minimum required at usage (cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), cutensornetContraction()).
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [inout] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workspacePtr – [in] Workspace memory pointer, may be null.
workspaceSize – [in] Workspace size, must be >= 0.
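A sketch of the explicit workspace flow (assuming descNet and optimizerInfo from the previous steps, and assuming the CUTENSORNET_WORKSIZE_PREF_MIN member of cutensornetWorksizePref_t; the Memory Management API section below shows how a device memory handler makes the manual cudaMalloc unnecessary):

cutensornetWorkspaceDescriptor_t workDesc;
cutensornetCreateWorkspaceDescriptor(handle, &workDesc);

// pre-calculate the needed sizes for this network and contraction path
cutensornetWorkspaceComputeSizes(handle, descNet, optimizerInfo, workDesc);

uint64_t workspaceSize = 0;
cutensornetWorkspaceGetSize(handle, workDesc, CUTENSORNET_WORKSIZE_PREF_MIN,
                            CUTENSORNET_MEMSPACE_DEVICE, &workspaceSize);

// allocate device memory and attach it to the descriptor
void* workspacePtr = nullptr;
cudaMalloc(&workspacePtr, workspaceSize);
cutensornetWorkspaceSet(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
                        workspacePtr, workspaceSize);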

cutensornetWorkspaceGet

cutensornetStatus_t cutensornetWorkspaceGet(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void **workspacePtr, uint64_t *workspaceSize)
Retrieves the memory address and size of the workspace hosted in the workspace descriptor.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workspacePtr – [out] Workspace memory pointer.
workspaceSize – [out] Workspace size.

cutensornetDestroyWorkspaceDescriptor

cutensornetStatus_t cutensornetDestroyWorkspaceDescriptor(cutensornetWorkspaceDescriptor_t desc)
Frees the workspace descriptor.
Note that this API does not free the memory provided by cutensornetWorkspaceSet().
- Parameters
desc – [inout] Opaque structure.

Network Contraction API

cutensornetContraction

cutensornetStatus_t cutensornetContraction(const cutensornetHandle_t handle, const cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, const cutensornetWorkspaceDescriptor_t workDesc, int64_t sliceId, cudaStream_t stream)
DEPRECATED: Performs the actual contraction of the tensor network.
Note
If multiple slices are created, the order of contracting over slices using cutensornetContraction() should be ascending starting from slice 0. If parallelizing over slices manually (in any fashion: streams, devices, processes, etc.), please make sure the output tensors (that are subject to a global reduction) are zero-initialized.
Note
This function is asynchronous w.r.t. the calling CPU thread. The user should guarantee that the memory buffer provided in workDesc is valid until a synchronization with the stream or the device is executed.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [in] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()).
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetSize() & cutensornetWorkspaceSet(). If a device memory handler is set, then workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.
sliceId – [in] The ID of the slice that is currently contracted (this value ranges between 0 and optimizerInfo.numSlices); use 0 if no slices are used.
if no slices are used.stream – [in] The CUDA stream on which the computation is performed.

cutensornetContractSlices

cutensornetStatus_t cutensornetContractSlices(const cutensornetHandle_t handle, const cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, int32_t accumulateOutput, const cutensornetWorkspaceDescriptor_t workDesc, const cutensornetSliceGroup_t sliceGroup, cudaStream_t stream)
Performs the actual contraction of the tensor network.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [in] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()).
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
accumulateOutput – [in] If 0, write the contraction result into rawDataOut; otherwise accumulate the result into rawDataOut.
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetSize() & cutensornetWorkspaceSet(). If a device memory handler is set, then workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.
sliceGroup – [in] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()). If set to null, all slices will be contracted.
stream – [in] The CUDA stream on which the computation is performed.
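A sketch of the common case, contracting all slices in one call (assuming plan, rawDataIn, rawDataOut, workDesc, and stream from the previous steps):

cutensornetContractSlices(handle, plan, rawDataIn, rawDataOut,
                          0,     // accumulateOutput: overwrite rawDataOut
                          workDesc,
                          NULL,  // sliceGroup: contract all slices
                          stream);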

Slice Group API

cutensornetCreateSliceGroupFromIDRange

cutensornetStatus_t cutensornetCreateSliceGroupFromIDRange(const cutensornetHandle_t handle, int64_t sliceIdStart, int64_t sliceIdStop, int64_t sliceIdStep, cutensornetSliceGroup_t *sliceGroup)
Creates a cutensornetSliceGroup_t object from a range, which produces a sequence of slice IDs from the specified start (inclusive) to the specified stop (exclusive) values with the specified step. The sequence can be increasing or decreasing depending on the start and stop values.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
sliceIdStart – [in] The start slice ID.
sliceIdStop – [in] The final slice ID is the largest (smallest) integer that excludes this value and all those above (below) for an increasing (decreasing) sequence.
sliceIdStep – [in] The step size between two successive slice IDs. A negative step size should be specified for a decreasing sequence.
sliceGroup – [out] Opaque object specifying the slice IDs.

cutensornetCreateSliceGroupFromIDs

cutensornetStatus_t cutensornetCreateSliceGroupFromIDs(const cutensornetHandle_t handle, const int64_t *beginIDSequence, const int64_t *endIDSequence, cutensornetSliceGroup_t *sliceGroup)
Creates a cutensornetSliceGroup_t object from a sequence of slice IDs. Duplicates in the input slice ID sequence will be removed.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
beginIDSequence – [in] A pointer to the beginning of the slice ID sequence.
endIDSequence – [in] A pointer to the end of the slice ID sequence.
sliceGroup – [out] Opaque object specifying the slice IDs.
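For illustration, a sketch that splits the slice IDs between two workers, with numSlices as obtained from the optimizer info; each worker would pass its group to cutensornetContractSlices():

// worker 0 takes the even slice IDs, worker 1 the odd ones
cutensornetSliceGroup_t evenSlices, oddSlices;
cutensornetCreateSliceGroupFromIDRange(handle, 0, numSlices, 2, &evenSlices);
cutensornetCreateSliceGroupFromIDRange(handle, 1, numSlices, 2, &oddSlices);

/* ... contract each group, e.g., on different devices, then reduce the results ... */

cutensornetDestroySliceGroup(evenSlices);
cutensornetDestroySliceGroup(oddSlices);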

cutensornetDestroySliceGroup

cutensornetStatus_t cutensornetDestroySliceGroup(cutensornetSliceGroup_t sliceGroup)
Releases the resources associated with a cutensornetSliceGroup_t object and sets its value to null.
- Parameters
sliceGroup – [inout] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()).

Memory Management API

A stream-ordered memory allocator (or mempool for short) allocates/deallocates memory asynchronously from/to a mempool in a stream-ordered fashion, meaning memory operations and computations enqueued on the streams have a well-defined inter- and intra-stream dependency. Several well-implemented stream-ordered mempools are available, such as cudaMemPool_t, which has been built in at the CUDA driver level since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool), and the RAPIDS Memory Manager (RMM). For a detailed introduction, see the NVIDIA Developer Blog.
The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuTensorNet can take care of most of the memory management for users. Below is an illustration of what can be done:
MyMemPool pool;  // kept alive for the entire process in real apps

int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
    // assuming this is the memory allocation routine provided by my mempool
    return reinterpret_cast<MyMemPool*>(ctx)->alloc(ptr, size, stream);
}

int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
    // assuming this is the memory deallocation routine provided by my mempool
    return reinterpret_cast<MyMemPool*>(ctx)->dealloc(ptr, size, stream);
}

// create a mem handler and fill in the required members for the library to use
cutensornetDeviceMemHandler_t handler;
handler.ctx = reinterpret_cast<void*>(&pool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;
// copy the pool name, staying within the CUTENSORNET_ALLOCATOR_NAME_LEN-byte field
strncpy(handler.name, "my pool", CUTENSORNET_ALLOCATOR_NAME_LEN);

// bind the handler to the library handle
cutensornetSetDeviceMemHandler(handle, &handler);

/* ... perform the network creation & optimization as usual ... */

// create a workspace descriptor
// (this step is optional and workDesc can be set to NULL if one just wants
// to use the "recommended" workspace size)
cutensornetWorkspaceDescriptor_t workDesc;
cutensornetCreateWorkspaceDescriptor(handle, &workDesc);

// User doesn't compute the required sizes
// User doesn't query the workspace size (but one can if desired)
// User doesn't allocate memory!
// User sets workspacePtr=NULL for the corresponding memory space (device, in this case)
// to indicate that the library should draw memory from the user's pool: with the
// workspace size set to 0 as shown below, the "recommended" size is used; if a nonzero
// size is set, that size is used instead of the recommended one.
// (this step is also optional if workDesc has been set to NULL)
cutensornetWorkspaceSet(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE, NULL, 0);

// create a contraction plan
cutensornetContractionPlan_t plan;
cutensornetCreateContractionPlan(handle, descNet, optimizerInfo, workDesc, &plan);

// autotune the plan with the workspace
cutensornetContractionAutotune(handle, plan, rawDataIn, rawDataOut, workDesc, pref, stream);

// perform the actual contraction with the workspace
for (int sliceId = 0; sliceId < num_slices; sliceId++) {
    cutensornetContraction(handle, plan, rawDataIn, rawDataOut, workDesc, sliceId, stream);
}

// clean up
cutensornetDestroyContractionPlan(plan);
cutensornetDestroyWorkspaceDescriptor(workDesc);  // optional if workDesc has been set to NULL

// User doesn't deallocate memory!
As shown above, several calls to the workspace-related APIs can be skipped. Moreover, allowing the library to share your memory pool not only alleviates potential memory conflicts but also enables possible optimizations.
Note
In the current release, only a device mempool can be bound.

cutensornetSetDeviceMemHandler

cutensornetStatus_t cutensornetSetDeviceMemHandler(cutensornetHandle_t handle, const cutensornetDeviceMemHandler_t *devMemHandler)
Sets the current device memory handler.
Once set, when cuTensorNet needs device memory in various API calls it will allocate from the user-provided memory pool and deallocate at completion. See cutensornetDeviceMemHandler_t and APIs that require cutensornetWorkspaceDescriptor_t for further detail.
The internal stream order is established using the user-provided stream passed to cutensornetContractionAutotune() and cutensornetContraction().
Warning
The behavior is undefined in the following scenarios:
- the library handle is bound to a memory handler and subsequently to another handler
- the library handle outlives the attached memory pool
- the memory pool is not stream-ordered
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
devMemHandler – [in] The device memory handler that encapsulates the user’s mempool. The struct content is copied internally.

cutensornetGetDeviceMemHandler

cutensornetStatus_t cutensornetGetDeviceMemHandler(cutensornetHandle_t handle, cutensornetDeviceMemHandler_t *devMemHandler)
Gets the current device memory handler.
- Parameters
handle – [in] Opaque handle holding cuTensorNet’s library context.
devMemHandler – [out] If previously set, the struct pointed to by devMemHandler is filled in; otherwise, CUTENSORNET_STATUS_NO_DEVICE_ALLOCATOR is returned.

Error Management API

cutensornetGetErrorString

const char *cutensornetGetErrorString(cutensornetStatus_t error)
Returns the description string for an error code.
Remark
non-blocking, non-reentrant, and thread-safe
- Parameters
error – [in] Error code to convert to string.
- Returns
the error string

Logger API

cutensornetLoggerSetCallback

cutensornetStatus_t cutensornetLoggerSetCallback(cutensornetLoggerCallback_t callback)
This function sets the logging callback routine.
- Parameters
callback – [in] Pointer to a callback function. Check cutensornetLoggerCallback_t.
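A sketch of a callback that forwards log records to stderr; it assumes cutensornetLoggerCallback_t has the (logLevel, functionName, message) signature:

void myLoggerCallback(int32_t logLevel, const char* functionName, const char* message) {
    fprintf(stderr, "[cutensornet][%d] %s: %s\n", logLevel, functionName, message);
}

/* ... during initialization ... */
cutensornetLoggerSetCallback(myLoggerCallback);
cutensornetLoggerSetLevel(5);  // API trace; see cutensornetLoggerSetLevel()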

cutensornetLoggerSetCallbackData

cutensornetStatus_t cutensornetLoggerSetCallbackData(cutensornetLoggerCallbackData_t callback, void *userData)
This function sets the logging callback routine, along with user data.
- Parameters
callback – [in] Pointer to a callback function. Check cutensornetLoggerCallbackData_t.
userData – [in] Pointer to user-provided data to be used by the callback.

cutensornetLoggerSetFile

cutensornetStatus_t cutensornetLoggerSetFile(FILE *file)
This function sets the logging output file.
- Parameters
file – [in] An open file with write permission.

cutensornetLoggerOpenFile

cutensornetStatus_t cutensornetLoggerOpenFile(const char *logFile)
This function opens a logging output file in the given path.
- Parameters
logFile – [in] Path to the logging output file.

cutensornetLoggerSetLevel

cutensornetStatus_t cutensornetLoggerSetLevel(int32_t level)
This function sets the value of the logging level.
- Parameters
level – [in] Log level; should be one of the following:

Level  Summary            Long Description
“0”    Off                logging is disabled (default)
“1”    Errors             only errors will be logged
“2”    Performance Trace  API calls that launch CUDA kernels will log their parameters and important information
“3”    Performance Hints  hints that can potentially improve the application’s performance
“4”    Heuristics Trace   provides general information about the library execution, may contain details about heuristic status
“5”    API Trace          API calls will log their parameters and important information

cutensornetLoggerSetMask

cutensornetStatus_t cutensornetLoggerSetMask(int32_t mask)
This function sets the value of the log mask.
Refer to cutensornetLoggerSetLevel() for details.
- Parameters
mask – [in] Value of the logging mask. Masks are defined as a combination (bitwise OR) of the following masks:

Mask  Description
“0”   Off
“1”   Errors
“2”   Performance Trace
“4”   Performance Hints
“8”   Heuristics Trace
“16”  API Trace

cutensornetLoggerForceDisable

cutensornetStatus_t cutensornetLoggerForceDisable()
This function disables logging for the entire run.