cuTensorNet Functions

Handle Management API

`cutensornetCreate`

cutensornetStatus_t cutensornetCreate(cutensornetHandle_t *handle)

Initializes the cuTensorNet library.

The device associated with a particular cuTensorNet handle is assumed to remain unchanged after the cutensornetCreate() call. In order for the cuTensorNet library to use a different device, the application must set the new device to be used by calling cudaSetDevice() and then create another cuTensorNet handle, which will be associated with the new device, by calling cutensornetCreate().

Remark

Blocking, non-reentrant, and thread-safe.

Parameters: handle – [out] Pointer to cutensornetHandle_t
Returns: CUTENSORNET_STATUS_SUCCESS on success and an error code otherwise

`cutensornetDestroy`

cutensornetStatus_t cutensornetDestroy(cutensornetHandle_t handle)

Destroys the cuTensorNet library handle.

This function releases the resources used by the cuTensorNet library handle and must be the last cuTensorNet call made with that handle. Calling any cuTensorNet function with the handle after cutensornetDestroy() will return an error.

Parameters: handle – [inout] Opaque handle holding cuTensorNet’s library context.

Network Descriptor API

`cutensornetCreateNetworkDescriptor`

cutensornetStatus_t cutensornetCreateNetworkDescriptor(const cutensornetHandle_t handle, int32_t numInputs, const int32_t numModesIn[], const int64_t *const extentsIn[], const int64_t *const stridesIn[], const int32_t *const modesIn[], const cutensornetTensorQualifiers_t qualifiersIn[], int32_t numModesOut, const int64_t extentsOut[], const int64_t stridesOut[], const int32_t modesOut[], cudaDataType_t dataType, cutensornetComputeType_t computeType, cutensornetNetworkDescriptor_t *descNet)

Initializes a cutensornetNetworkDescriptor_t, describing the connectivity (i.e., network topology) between the tensors.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyNetworkDescriptor() is called once descNet is no longer required.

Supported data-type combinations are:

Data type	Compute type	Tensor Core
CUDA_R_16F	CUTENSORNET_COMPUTE_32F	Volta+
CUDA_R_16BF	CUTENSORNET_COMPUTE_32F	Ampere+
CUDA_R_32F	CUTENSORNET_COMPUTE_32F	No
CUDA_R_32F	CUTENSORNET_COMPUTE_TF32	Ampere+
CUDA_R_32F	CUTENSORNET_COMPUTE_3XTF32	Ampere+
CUDA_R_32F	CUTENSORNET_COMPUTE_16BF	Ampere+
CUDA_R_32F	CUTENSORNET_COMPUTE_16F	Volta+
CUDA_R_64F	CUTENSORNET_COMPUTE_64F	Ampere+
CUDA_R_64F	CUTENSORNET_COMPUTE_32F	No
CUDA_C_32F	CUTENSORNET_COMPUTE_32F	No
CUDA_C_32F	CUTENSORNET_COMPUTE_TF32	Ampere+
CUDA_C_32F	CUTENSORNET_COMPUTE_3XTF32	Ampere+
CUDA_C_64F	CUTENSORNET_COMPUTE_64F	Ampere+
CUDA_C_64F	CUTENSORNET_COMPUTE_32F	No
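The table above can also be encoded as a pre-flight check before calling cutensornetCreateNetworkDescriptor(). The following is only an illustrative sketch: it compares the enum names as strings so that it compiles without the cuTensorNet and CUDA headers, and the helper tensor_core_support() is hypothetical, not part of cuTensorNet.

```c
#include <stddef.h>
#include <string.h>

/* One row of the supported data-type / compute-type table. */
struct combo {
    const char *dataType;
    const char *computeType;
    const char *tensorCore; /* "Volta+", "Ampere+", or "No" */
};

static const struct combo kSupported[] = {
    {"CUDA_R_16F",  "CUTENSORNET_COMPUTE_32F",    "Volta+"},
    {"CUDA_R_16BF", "CUTENSORNET_COMPUTE_32F",    "Ampere+"},
    {"CUDA_R_32F",  "CUTENSORNET_COMPUTE_32F",    "No"},
    {"CUDA_R_32F",  "CUTENSORNET_COMPUTE_TF32",   "Ampere+"},
    {"CUDA_R_32F",  "CUTENSORNET_COMPUTE_3XTF32", "Ampere+"},
    {"CUDA_R_32F",  "CUTENSORNET_COMPUTE_16BF",   "Ampere+"},
    {"CUDA_R_32F",  "CUTENSORNET_COMPUTE_16F",    "Volta+"},
    {"CUDA_R_64F",  "CUTENSORNET_COMPUTE_64F",    "Ampere+"},
    {"CUDA_R_64F",  "CUTENSORNET_COMPUTE_32F",    "No"},
    {"CUDA_C_32F",  "CUTENSORNET_COMPUTE_32F",    "No"},
    {"CUDA_C_32F",  "CUTENSORNET_COMPUTE_TF32",   "Ampere+"},
    {"CUDA_C_32F",  "CUTENSORNET_COMPUTE_3XTF32", "Ampere+"},
    {"CUDA_C_64F",  "CUTENSORNET_COMPUTE_64F",    "Ampere+"},
    {"CUDA_C_64F",  "CUTENSORNET_COMPUTE_32F",    "No"},
};

/* Returns the Tensor Core column for a listed (dataType, computeType) pair,
   or NULL if the pair does not appear in the table. */
const char *tensor_core_support(const char *dataType, const char *computeType) {
    for (size_t i = 0; i < sizeof kSupported / sizeof kSupported[0]; ++i) {
        if (strcmp(kSupported[i].dataType, dataType) == 0 &&
            strcmp(kSupported[i].computeType, computeType) == 0) {
            return kSupported[i].tensorCore;
        }
    }
    return NULL;
}
```

A NULL result means the pair is not listed, so descriptor creation with that combination should not be expected to succeed.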

Note

If stridesIn (stridesOut) is NULL, the input tensors (output tensor) are assumed to be in the Fortran (column-major) layout.

numModesOut can be set to -1 for cuTensorNet to infer the output modes based on the input modes, or to 0 to perform a full reduction.

If qualifiersIn is NULL, cuTensorNet uses the default values of cutensornetTensorQualifiers_t.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
numInputs – [in] Number of input tensors.
numModesIn – [in] Array of size numInputs; numModesIn[i] denotes the number of modes available in the i-th tensor.
extentsIn – [in] Array of size numInputs; extentsIn[i] has numModesIn[i] entries, where extentsIn[i][j] (j < numModesIn[i]) is the extent of the j-th mode of tensor i.
stridesIn – [in] Array of size numInputs; stridesIn[i] has numModesIn[i] entries, where stridesIn[i][j] (j < numModesIn[i]) is the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of tensor i.
modesIn – [in] Array of size numInputs; modesIn[i] has numModesIn[i] entries, each of which is a mode label of tensor i. Each mode that does not appear in the output tensor is implicitly contracted.
qualifiersIn – [in] Array of size numInputs; qualifiersIn[i] denotes the qualifiers of the i-th input tensor. Refer to cutensornetTensorQualifiers_t.
numModesOut – [in] Number of modes of the output tensor. If this value is -1 and the output modes are not provided, the network infers the output modes. If this value is 0, the network is fully reduced (the output is a scalar).
extentsOut – [in] Array of size numModesOut; extentsOut[j] (j < numModesOut) is the extent of the j-th mode of the output tensor.
stridesOut – [in] Array of size numModesOut; stridesOut[j] (j < numModesOut) is the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the output tensor.
modesOut – [in] Array of size numModesOut; modesOut[j] denotes the j-th mode of the output tensor.
dataType – [in] Denotes the data type for all input and output tensors.
computeType – [in] Denotes the compute type used throughout the computation.
descNet – [out] Pointer to a cutensornetNetworkDescriptor_t.
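To make the implicit-contraction rule concrete: for a matrix product A(a,b)·B(b,c) with modesOut = {a,c}, mode b appears among the inputs but not in the output, so it is summed over. The sketch below lists such modes for arrays laid out like numModesIn, modesIn, and modesOut. The helper contracted_modes() is hypothetical, and the sketch assumes the output modes are given explicitly (numModesOut >= 0) rather than inferred via -1.

```c
#include <stdint.h>

/* Returns 1 if label occurs among the first n entries of modes, else 0. */
static int has_mode(const int32_t modes[], int32_t n, int32_t label) {
    for (int32_t k = 0; k < n; ++k)
        if (modes[k] == label) return 1;
    return 0;
}

/* Collects each distinct input mode label that is absent from modesOut,
   i.e. every implicitly contracted mode. out[] needs room for the total
   number of input modes; returns the number of labels written. */
int32_t contracted_modes(int32_t numInputs, const int32_t numModesIn[],
                         const int32_t *const modesIn[],
                         int32_t numModesOut, const int32_t modesOut[],
                         int32_t out[]) {
    int32_t count = 0;
    for (int32_t i = 0; i < numInputs; ++i) {
        for (int32_t j = 0; j < numModesIn[i]; ++j) {
            int32_t m = modesIn[i][j];
            if (!has_mode(modesOut, numModesOut, m) && !has_mode(out, count, m))
                out[count++] = m;
        }
    }
    return count;
}
```

With numModesOut = 0 (full reduction), every input mode is reported as contracted.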

`cutensornetDestroyNetworkDescriptor`

cutensornetStatus_t cutensornetDestroyNetworkDescriptor(cutensornetNetworkDescriptor_t desc)

Frees all the memory associated with the network descriptor.

Parameters: desc – [inout] Opaque handle to a tensor network descriptor.

`cutensornetNetworkGetAttribute`

cutensornetStatus_t cutensornetNetworkGetAttribute(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t networkDesc, cutensornetNetworkAttributes_t attr, void *buf, size_t sizeInBytes)

Gets an attribute of networkDesc.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
networkDesc – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within networkDesc.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetNetworkSetAttribute`

cutensornetStatus_t cutensornetNetworkSetAttribute(const cutensornetHandle_t handle, cutensornetNetworkDescriptor_t networkDesc, cutensornetNetworkAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets an attribute of networkDesc.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
networkDesc – [inout] Opaque structure that is accessed.
attr – [in] Specifies the attribute to be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).
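The getters and setters in this API take an untyped buffer plus its size: the caller passes the address of a variable of the attribute's type and sizeof that variable. The sketch below illustrates the convention with a hypothetical demo_desc_t and hypothetical demo_get_attribute()/demo_set_attribute() functions, none of which belong to cuTensorNet; rejecting a mismatched size is an assumption of the sketch, not documented library behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opaque object with a single int32_t attribute. */
typedef struct { int32_t value; } demo_desc_t;

/* Hypothetical setter using the (buf, sizeInBytes) convention: copies the
   value from buf, rejecting a size that does not match the attribute type.
   Returns 0 on success, -1 on error. */
int demo_set_attribute(demo_desc_t *desc, const void *buf, size_t sizeInBytes) {
    if (desc == NULL || buf == NULL || sizeInBytes != sizeof desc->value) return -1;
    memcpy(&desc->value, buf, sizeInBytes);
    return 0;
}

/* Hypothetical getter: copies the attribute value into buf. */
int demo_get_attribute(const demo_desc_t *desc, void *buf, size_t sizeInBytes) {
    if (desc == NULL || buf == NULL || sizeInBytes != sizeof desc->value) return -1;
    memcpy(buf, &desc->value, sizeInBytes);
    return 0;
}
```

With the real API the call has the same shape, e.g. cutensornetNetworkSetAttribute(handle, networkDesc, attr, &value, sizeof(value)), where value must have the type that the chosen attr expects.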

`cutensornetGetOutputTensorDetails`

cutensornetStatus_t cutensornetGetOutputTensorDetails(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, int32_t *numModes, size_t *dataSize, int32_t *modeLabels, int64_t *extents, int64_t *strides)

DEPRECATED: Gets the number of output modes, data size, modes, extents, and strides of the output tensor.

If all information regarding the output tensor is needed, this function should be called twice: first to retrieve numModes for allocating memory, and then to retrieve modeLabels, extents, and strides.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Opaque handle to a tensor network descriptor (cutensornetNetworkDescriptor_t).
numModes – [out] On return, holds the number of modes of the output tensor. Cannot be null.
dataSize – [out] Optional (can be null). If not null, on return holds the size (in bytes) of the memory needed for the output tensor.
modeLabels – [out] Optional (can be null). If not null, on return holds the modes of the output tensor.
extents – [out] Optional (can be null). If not null, on return holds the extents of the output tensor.
strides – [out] Optional (can be null). If not null, on return holds the strides of the output tensor.

`cutensornetGetOutputTensorDescriptor`

cutensornetStatus_t cutensornetGetOutputTensorDescriptor(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetTensorDescriptor_t *outputTensorDesc)

Creates a cutensornetTensorDescriptor_t representing the output tensor of the network.

This function creates a descriptor and stores it in outputTensorDesc. The user is responsible for calling cutensornetDestroyTensorDescriptor() to destroy the descriptor.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Opaque handle to a tensor network descriptor (cutensornetNetworkDescriptor_t).
outputTensorDesc – [out] Pointer to an opaque cutensornetTensorDescriptor_t; cannot be null. On return, it holds a new tensor descriptor with the metadata of the output tensor of descNet.

Tensor Descriptor API

`cutensornetCreateTensorDescriptor`

cutensornetStatus_t cutensornetCreateTensorDescriptor(const cutensornetHandle_t handle, int32_t numModes, const int64_t extents[], const int64_t strides[], const int32_t modes[], cudaDataType_t dataType, cutensornetTensorDescriptor_t *descTensor)

Initializes a cutensornetTensorDescriptor_t describing a tensor's modes, extents, strides, and data type.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorDescriptor() is called once descTensor is no longer required.

Note

If strides is NULL, the tensor is assumed to be in the Fortran (column-major) layout.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
numModes – [in] The number of modes of the tensor.
extents – [in] Array of size numModes; extents[j] is the extent of the j-th mode of the tensor.
strides – [in] Array of size numModes; strides[j] is the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the tensor.
modes – [in] Array of size numModes; modes[j] denotes the j-th mode of the tensor.
dataType – [in] Denotes the data type for the tensor.
descTensor – [out] Pointer to a cutensornetTensorDescriptor_t.
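When strides is NULL, the tensor is treated as column-major: the first mode is contiguous (stride 1) and each later stride is the product of the preceding extents. The sketch below computes those strides and the element offset sum_j idx[j] * strides[j] that follows from the definition of strides above. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Fills strides[] for a dense column-major (Fortran) layout, the layout
   assumed when strides is NULL. */
void column_major_strides(int32_t numModes, const int64_t extents[], int64_t strides[]) {
    int64_t s = 1;
    for (int32_t j = 0; j < numModes; ++j) {
        strides[j] = s;  /* elements to skip when idx[j] increases by one */
        s *= extents[j];
    }
}

/* Linear element offset of the multi-index idx[] under the given strides. */
int64_t element_offset(int32_t numModes, const int64_t idx[], const int64_t strides[]) {
    int64_t off = 0;
    for (int32_t j = 0; j < numModes; ++j) off += idx[j] * strides[j];
    return off;
}
```

For extents {2, 3, 4}, this yields strides {1, 2, 6}, and the element at index (1, 2, 3) lies at offset 1 + 4 + 18 = 23.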

`cutensornetGetTensorDetails`

cutensornetStatus_t cutensornetGetTensorDetails(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t tensorDesc, int32_t *numModes, size_t *dataSize, int32_t *modeLabels, int64_t *extents, int64_t *strides)

Gets the number of modes, data size, mode labels, extents, and strides of a tensor.

If all information regarding the tensor is needed by the user, this function should be called twice (the first time to retrieve numModes for allocating memory, and the second to retrieve modeLabels, extents, and strides).
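For a dense column-major tensor, the reported data size should correspond to the product of the extents times the element size. With general non-negative strides (for example, a padded leading dimension), the memory actually spanned reaches one element past the largest offset. The sketch below computes that span; the helper tensor_span_bytes() is hypothetical, and whether dataSize is computed exactly this way for non-dense strides is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by a tensor with non-negative strides:
   (1 + sum_j (extent_j - 1) * stride_j) * elementSize.
   For a dense column-major tensor this equals prod(extents) * elementSize. */
size_t tensor_span_bytes(int32_t numModes, const int64_t extents[],
                         const int64_t strides[], size_t elementSize) {
    int64_t maxOffset = 0;
    for (int32_t j = 0; j < numModes; ++j) {
        if (extents[j] == 0) return 0;  /* empty tensor holds no data */
        maxOffset += (extents[j] - 1) * strides[j];
    }
    return (size_t)(maxOffset + 1) * elementSize;
}
```

A tensor with zero modes (a scalar) spans exactly one element.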

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
tensorDesc – [in] Opaque handle to a tensor descriptor.
numModes – [out] On return, holds the number of modes of the tensor. Cannot be null.
dataSize – [out] Optional (can be null). If not null, on return holds the size (in bytes) of the memory needed for the tensor.
modeLabels – [out] Optional (can be null). If not null, on return holds the modes of the tensor.
extents – [out] Optional (can be null). If not null, on return holds the extents of the tensor.
strides – [out] Optional (can be null). If not null, on return holds the strides of the tensor.

`cutensornetDestroyTensorDescriptor`

cutensornetStatus_t cutensornetDestroyTensorDescriptor(cutensornetTensorDescriptor_t descTensor)

Frees all the memory associated with the tensor descriptor.

Parameters: descTensor – [inout] Opaque handle to a tensor descriptor.

Contraction Optimizer API

`cutensornetCreateContractionOptimizerConfig`

cutensornetStatus_t cutensornetCreateContractionOptimizerConfig(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t *optimizerConfig)

Sets up the required hyper-optimization parameters for the contraction order solver (see cutensornetContractionOptimize()).

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerConfig() is called once optimizerConfig is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [out] This data structure holds all information about the user-requested hyper-optimization parameters.

`cutensornetDestroyContractionOptimizerConfig`

cutensornetStatus_t cutensornetDestroyContractionOptimizerConfig(cutensornetContractionOptimizerConfig_t optimizerConfig)

Frees all the memory associated with optimizerConfig.

Parameters: optimizerConfig – [inout] Opaque structure.

`cutensornetContractionOptimizerConfigGetAttribute`

cutensornetStatus_t cutensornetContractionOptimizerConfigGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, void *buf, size_t sizeInBytes)

Gets an attribute of optimizerConfig.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerConfig.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetContractionOptimizerConfigSetAttribute`

cutensornetStatus_t cutensornetContractionOptimizerConfigSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets an attribute of optimizerConfig.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerConfig – [inout] Opaque structure that is accessed.
attr – [in] Specifies the attribute to be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetCreateContractionOptimizerInfo`

cutensornetStatus_t cutensornetCreateContractionOptimizerInfo(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetContractionOptimizerInfo_t *optimizerInfo)

Allocates resources for optimizerInfo.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerInfo() is called once optimizerInfo is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.
optimizerInfo – [out] Pointer to cutensornetContractionOptimizerInfo_t.

`cutensornetDestroyContractionOptimizerInfo`

cutensornetStatus_t cutensornetDestroyContractionOptimizerInfo(cutensornetContractionOptimizerInfo_t optimizerInfo)

Frees all the memory associated with optimizerInfo.

Parameters: optimizerInfo – [inout] Opaque structure.

`cutensornetContractionOptimize`

cutensornetStatus_t cutensornetContractionOptimize(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerConfig_t optimizerConfig, uint64_t workspaceSizeConstraint, cutensornetContractionOptimizerInfo_t optimizerInfo)

Computes an “optimized” contraction order as well as slicing information (see the Overview section) for a given tensor network, aiming to minimize the total time to solution while adhering to the user-provided memory constraint.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the topology of the tensor network (i.e., all tensors, their connectivity and modes).
optimizerConfig – [in] Holds all hyper-optimization parameters that govern the search for an “optimal” contraction order.
workspaceSizeConstraint – [in] Maximal device memory (in bytes) that will be provided by the user (i.e., cuTensorNet has to find a viable path/slicing solution within this user-defined constraint).
optimizerInfo – [inout] On return, this object will hold all necessary information about the optimized path and the related slicing information. optimizerInfo will hold information including (see cutensornetContractionOptimizerInfoAttributes_t):
- Total number of slices.
- Total number of sliced modes.
- Information about the sliced modes, i.e., the IDs of the sliced modes (see modesIn in cutensornetCreateNetworkDescriptor()) as well as their extents (see the Overview section for additional documentation).
- Optimized path.
- FLOP count.
- Total number of elements in the largest intermediate tensor.
- The mode labels for all intermediate tensors.
- The estimated runtime and “effective” flops.
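For intuition about the FLOP count: a common estimate for a single pairwise contraction is two floating-point operations (one multiply, one add) per point of the iteration space spanned by the union of both operands' modes. For an m×k by k×n matrix product, this gives 2mkn. The sketch below implements only that estimate with a hypothetical helper pairwise_flops(); how cuTensorNet itself counts FLOPs (for example, for complex data types) is not specified here and may differ.

```c
#include <stdint.h>

/* Rough FLOP estimate for one pairwise contraction of real-valued tensors:
   2 * product of the extents of the union of modes of A and B.
   Assumes the mode labels within each operand are distinct. */
double pairwise_flops(int32_t nA, const int32_t modesA[], const int64_t extentsA[],
                      int32_t nB, const int32_t modesB[], const int64_t extentsB[]) {
    double flops = 2.0;
    for (int32_t i = 0; i < nA; ++i) flops *= (double)extentsA[i];
    for (int32_t j = 0; j < nB; ++j) {
        int shared = 0;
        for (int32_t i = 0; i < nA; ++i)
            if (modesA[i] == modesB[j]) { shared = 1; break; }
        if (!shared) flops *= (double)extentsB[j];  /* count shared modes once */
    }
    return flops;
}
```

The FLOP count of a whole path would then be the sum of this estimate over its pairwise contractions, multiplied by the number of slices if the network is sliced.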

`cutensornetContractionOptimizerInfoGetAttribute`

cutensornetStatus_t cutensornetContractionOptimizerInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, void *buf, size_t sizeInBytes)

Gets an attribute of optimizerInfo.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerInfo.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetContractionOptimizerInfoSetAttribute`

cutensornetStatus_t cutensornetContractionOptimizerInfoSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets an attribute of optimizerInfo.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [inout] Opaque structure that is accessed.
attr – [in] Specifies the attribute to be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetContractionOptimizerInfoGetPackedSize`

cutensornetStatus_t cutensornetContractionOptimizerInfoGetPackedSize(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, size_t *sizeInBytes)

Gets the packed size of the optimizerInfo object.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure of type cutensornetContractionOptimizerInfo_t.
sizeInBytes – [out] The packed size (in bytes).

`cutensornetContractionOptimizerInfoPackData`

cutensornetStatus_t cutensornetContractionOptimizerInfoPackData(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, void *buffer, size_t sizeInBytes)

Packs the optimizerInfo object into the provided buffer.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
optimizerInfo – [in] Opaque structure of type cutensornetContractionOptimizerInfo_t.
buffer – [out] On return, this buffer holds the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).

`cutensornetCreateContractionOptimizerInfoFromPackedData`

cutensornetStatus_t cutensornetCreateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t *optimizerInfo)

Creates an optimizerInfo object from the provided buffer.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.
buffer – [in] A buffer with the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).
optimizerInfo – [out] Pointer to cutensornetContractionOptimizerInfo_t.

`cutensornetUpdateContractionOptimizerInfoFromPackedData`

cutensornetStatus_t cutensornetUpdateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t optimizerInfo)

Updates the provided optimizerInfo object from the provided buffer.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
buffer – [in] A buffer with the contents of optimizerInfo in packed form.
sizeInBytes – [in] The size of the buffer (in bytes).
optimizerInfo – [inout] Opaque object of type cutensornetContractionOptimizerInfo_t that will be updated.
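Packing makes it possible to reuse an optimized path without re-running cutensornetContractionOptimize(), for example by storing it on disk. The overall flow is to query the size with cutensornetContractionOptimizerInfoGetPackedSize(), allocate a buffer of that size, fill it with cutensornetContractionOptimizerInfoPackData(), and later rebuild the object with cutensornetCreateContractionOptimizerInfoFromPackedData() (or refresh an existing one with cutensornetUpdateContractionOptimizerInfoFromPackedData()). The sketch below covers only the file I/O step, using hypothetical helpers write_blob() and read_blob(). It stores the size as a native-endian uint64_t. Whether packed data stays valid across cuTensorNet versions is not stated here and should be treated as an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes a size-prefixed blob (e.g. packed optimizerInfo data) to an open
   binary stream. Returns 0 on success, -1 on a write error. */
int write_blob(FILE *f, const void *buf, uint64_t size) {
    if (fwrite(&size, sizeof size, 1, f) != 1) return -1;
    if (size > 0 && fwrite(buf, 1, (size_t)size, f) != (size_t)size) return -1;
    return 0;
}

/* Reads a blob written by write_blob(). Returns a malloc'ed buffer that the
   caller must free, storing its size in *size; returns NULL on failure. */
void *read_blob(FILE *f, uint64_t *size) {
    if (fread(size, sizeof *size, 1, f) != 1) return NULL;
    void *buf = malloc(*size > 0 ? (size_t)*size : 1);
    if (buf == NULL) return NULL;
    if (*size > 0 && fread(buf, 1, (size_t)*size, f) != (size_t)*size) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The buffer returned by read_blob(), together with its size, is what would be passed as buffer and sizeInBytes when recreating the optimizerInfo object.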

Contraction Plan API

`cutensornetCreateContractionPlan`

cutensornetStatus_t cutensornetCreateContractionPlan(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetContractionPlan_t *plan)

Initializes a cutensornetContractionPlan_t.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionPlan() is called once plan is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
optimizerInfo – [in] Opaque structure.
workDesc – [in] Opaque structure describing the workspace. At plan creation, only the workspace size is needed; the pointer to the workspace memory may be left null. If a device memory handler is set, workDesc can be either null (in which case the “recommended” workspace size is inferred; see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or a valid cutensornetWorkspaceDescriptor_t with the desired workspace size set and a null workspace pointer (see the Memory Management API section).
plan – [out] cuTensorNet’s contraction plan holds all the information required to perform the tensor contractions; to be precise, it initializes a cutensorContractionPlan_t for each tensor contraction that is required to contract the entire tensor network.

`cutensornetDestroyContractionPlan`

cutensornetStatus_t cutensornetDestroyContractionPlan(cutensornetContractionPlan_t plan)

Frees all resources owned by plan.

Parameters: plan – [inout] Opaque structure.

`cutensornetContractionAutotune`

cutensornetStatus_t cutensornetContractionAutotune(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, cutensornetWorkspaceDescriptor_t workDesc, const cutensornetContractionAutotunePreference_t pref, cudaStream_t stream)

Auto-tunes the contraction plan to find the best cutensorContractionPlan_t for each pair-wise contraction.

Note

This function is blocking due to the nature of the auto-tuning process.

Input and output data pointers are recommended to be 256-byte aligned for best performance.
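Allocations returned by cudaMalloc() are already aligned to at least 256 bytes, so this recommendation mainly matters when several tensors are carved out of one larger allocation. The sketch below computes 256-byte-aligned offsets for such sub-buffers. The helper names and the DATA_ALIGNMENT macro are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_ALIGNMENT 256u /* recommended alignment for input/output data */

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Returns 1 if ptr is a multiple of align (a power of two), else 0. */
int is_aligned(const void *ptr, size_t align) {
    return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}

/* Assigns each of count buffers an offset inside one allocation so that
   every offset is a multiple of align; returns the total bytes required. */
size_t carve_offsets(const size_t sizes[], size_t count, size_t align,
                     size_t offsets[]) {
    size_t end = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = round_up(end, align);
        end = offsets[i] + sizes[i];
    }
    return end;
}
```

Allocating the returned total with cudaMalloc() and adding each offset to the base pointer then yields 256-byte-aligned tensor pointers.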

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [inout] The plan must already be created (see cutensornetCreateContractionPlan()); the individual contraction plans will be fine-tuned.
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). If a device memory handler is set, the workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.
pref – [in] Controls the auto-tuning process and gives the user control over how much time is spent in this routine.
stream – [in] The CUDA stream on which the computation is performed.

`cutensornetCreateContractionAutotunePreference`

cutensornetStatus_t cutensornetCreateContractionAutotunePreference(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t *autotunePreference)

Sets up the required auto-tune parameters for the contraction plan.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionAutotunePreference() is called once autotunePreference is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [out] This data structure holds all information about the user-requested auto-tune parameters.

`cutensornetContractionAutotunePreferenceGetAttribute`

cutensornetStatus_t cutensornetContractionAutotunePreferenceGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, void *buf, size_t sizeInBytes)

Gets an attribute of autotunePreference.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within autotunePreference.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetContractionAutotunePreferenceSetAttribute`

cutensornetStatus_t cutensornetContractionAutotunePreferenceSetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets an attribute of autotunePreference.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
autotunePreference – [inout] Opaque structure that is accessed.
attr – [in] Specifies the attribute to be set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetDestroyContractionAutotunePreference`

cutensornetStatus_t cutensornetDestroyContractionAutotunePreference(cutensornetContractionAutotunePreference_t autotunePreference)

Frees all the memory associated with autotunePreference.

Parameters: autotunePreference – [inout] Opaque structure.

Workspace Management API

`cutensornetCreateWorkspaceDescriptor`

cutensornetStatus_t cutensornetCreateWorkspaceDescriptor(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t *workDesc)

Creates a workspace descriptor that holds information about the user-provided memory buffer.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [out] Pointer to the opaque workspace descriptor.

`cutensornetWorkspaceComputeSizes`

cutensornetStatus_t cutensornetWorkspaceComputeSizes(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetWorkspaceDescriptor_t workDesc)

DEPRECATED: Computes the workspace size needed to contract the input tensor network using the provided contraction path.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
optimizerInfo – [in] Opaque structure.
workDesc – [out] The workspace descriptor in which the information is collected.

`cutensornetWorkspaceComputeContractionSizes`

cutensornetStatus_t cutensornetWorkspaceComputeContractionSizes(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to contract the input tensor network using the provided contraction path.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descNet – [in] Describes the tensor network (i.e., its tensors and their connectivity).
optimizerInfo – [in] Opaque structure.
workDesc – [out] The workspace descriptor in which the information is collected.

`cutensornetWorkspaceComputeQRSizes`

cutensornetStatus_t cutensornetWorkspaceComputeQRSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const cutensornetTensorDescriptor_t descTensorQ, const cutensornetTensorDescriptor_t descTensorR, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the tensor QR operation.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorIn – [in] Describes the modes, extents and other metadata information for the input tensor.
descTensorQ – [in] Describes the modes, extents and other metadata information for the output tensor Q.
descTensorR – [in] Describes the modes, extents and other metadata information for the output tensor R.
workDesc – [out] The workspace descriptor in which the information is collected.
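When preparing descTensorQ and descTensorR, the extent of the mode shared by Q and R has to be chosen. Assuming the usual reduced (economic) QR, the input is viewed as an m×n matrix, where m is the product of the extents of the modes kept by Q and n is the product of the extents of the modes kept by R; the shared extent is then min(m, n). The helper below is a hypothetical sketch of that arithmetic, not a cuTensorNet function.

```c
#include <stdint.h>

/* Product of extents (1 when there are no modes). */
static int64_t extent_product(int32_t n, const int64_t extents[]) {
    int64_t p = 1;
    for (int32_t i = 0; i < n; ++i) p *= extents[i];
    return p;
}

/* Shared-mode extent of a reduced QR: min(m, n). extentsQ / extentsR list
   the extents of the modes going to Q / R, excluding the shared mode. */
int64_t qr_shared_extent(int32_t nQ, const int64_t extentsQ[],
                         int32_t nR, const int64_t extentsR[]) {
    int64_t m = extent_product(nQ, extentsQ);
    int64_t n = extent_product(nR, extentsR);
    return m < n ? m : n;
}
```

For example, splitting a tensor with extents (2, 3, 4) so that Q keeps the first two modes gives m = 6, n = 4, and a shared extent of 4.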

`cutensornetWorkspaceComputeSVDSizes`

cutensornetStatus_t cutensornetWorkspaceComputeSVDSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const cutensornetTensorDescriptor_t descTensorU, const cutensornetTensorDescriptor_t descTensorV, const cutensornetTensorSVDConfig_t svdConfig, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the tensor SVD operation.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorIn – [in] Describes the modes, extents and other metadata information for the input tensor.
descTensorU – [in] Describes the modes, extents and other metadata information for the output tensor U.
descTensorV – [in] Describes the modes, extents and other metadata information for the output tensor V.
svdConfig – [in] This data structure holds the user-requested SVD parameters.
workDesc – [out] The workspace descriptor in which the information is collected.
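An SVD of an m×n matricization has at most min(m, n) singular values, and the SVD parameters in svdConfig can request that small ones be truncated. The sketch below assumes a common convention: an absolute cutoff, a cutoff relative to the largest singular value, and an optional cap on the number kept. It counts how many singular values survive. The helper svd_kept_count() and the exact comparison (values strictly below the threshold are dropped) are assumptions for illustration, not a description of cuTensorNet's implementation.

```c
#include <stddef.h>

/* s[0..n) are singular values in descending order. Values strictly below
   max(absCutoff, relCutoff * s[0]) are dropped; the count is then capped at
   maxExtent unless maxExtent is 0. Returns the number of values kept. */
size_t svd_kept_count(const double s[], size_t n, double absCutoff,
                      double relCutoff, size_t maxExtent) {
    if (n == 0) return 0;
    double threshold = absCutoff;
    if (relCutoff * s[0] > threshold) threshold = relCutoff * s[0];
    size_t k = 0;
    while (k < n && s[k] >= threshold) ++k;  /* sorted: stop at first drop */
    if (maxExtent > 0 && k > maxExtent) k = maxExtent;
    return k;
}
```

The kept count is the extent the shared mode of U and V would have after truncation.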

`cutensornetWorkspaceComputeGateSplitSizes`

cutensornetStatus_t cutensornetWorkspaceComputeGateSplitSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorInA, const cutensornetTensorDescriptor_t descTensorInB, const cutensornetTensorDescriptor_t descTensorInG, const cutensornetTensorDescriptor_t descTensorU, const cutensornetTensorDescriptor_t descTensorV, const cutensornetGateSplitAlgo_t gateAlgo, const cutensornetTensorSVDConfig_t svdConfig, cutensornetComputeType_t computeType, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the gating operation.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorInA – [in] Describes the modes, extents, and other metadata information of the input tensor A.
descTensorInB – [in] Describes the modes, extents, and other metadata information of the input tensor B.
descTensorInG – [in] Describes the modes, extents, and other metadata information of the input gate tensor.
descTensorU – [in] Describes the modes, extents, and other metadata information of the output U tensor. The extents of uncontracted modes are expected to be consistent with descTensorInA and descTensorInG.
descTensorV – [in] Describes the modes, extents, and other metadata information of the output V tensor. The extents of uncontracted modes are expected to be consistent with descTensorInB and descTensorInG.
gateAlgo – [in] The algorithm to use for splitting the gate tensor onto tensors A and B.
svdConfig – [in] Opaque structure holding the user-requested SVD parameters.
computeType – [in] Denotes the compute type used throughout the computation.
workDesc – [out] Opaque structure describing the workspace.

`cutensornetWorkspaceGetSize`¶

cutensornetStatus_t cutensornetWorkspaceGetSize(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetWorksizePref_t workPref, cutensornetMemspace_t memSpace, uint64_t *workspaceSize)¶

DEPRECATED: Retrieves the needed workspace size for the given workspace preference and memory space.

The needed sizes for different tasks must be pre-calculated by calling the corresponding API, e.g., cutensornetWorkspaceComputeContractionSizes(), cutensornetWorkspaceComputeQRSizes(), cutensornetWorkspaceComputeSVDSizes() and cutensornetWorkspaceComputeGateSplitSizes().

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
workPref – [in] Preference of workspace for planning.
memSpace – [in] The memory space where the workspace is allocated.
workspaceSize – [out] Needed workspace size.

`cutensornetWorkspaceGetMemorySize`¶

cutensornetStatus_t cutensornetWorkspaceGetMemorySize(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetWorksizePref_t workPref, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, int64_t *memorySize)¶

Retrieves the needed workspace size for the given workspace preference, memory space, and workspace kind.

The needed sizes for different tasks must be pre-calculated by calling the corresponding API, e.g., cutensornetWorkspaceComputeContractionSizes(), cutensornetWorkspaceComputeQRSizes(), cutensornetWorkspaceComputeSVDSizes() and cutensornetWorkspaceComputeGateSplitSizes().

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
workPref – [in] Preference of workspace for planning.
memSpace – [in] The memory space where the workspace is allocated.
workKind – [in] The kind of workspace.
memorySize – [out] Needed workspace size.

`cutensornetWorkspaceSet`¶

cutensornetStatus_t cutensornetWorkspaceSet(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void *const workspacePtr, uint64_t workspaceSize)¶

DEPRECATED: Sets the memory address and workspace size of the workspace provided by the user.

A workspace is valid in the following cases:

workspacePtr is valid and workspaceSize > 0
workspacePtr is null and workspaceSize > 0 (used during cutensornetCreateContractionPlan() to provide the available workspace).
workspacePtr is null and workspaceSize = 0 (workspace memory will be drawn from the user’s mempool)

A workspace is validated against the minimum required size when it is used (cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), cutensornetContraction()).

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [inout] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workspacePtr – [in] Workspace memory pointer, may be null.
workspaceSize – [in] Workspace size, must be >= 0.

`cutensornetWorkspaceSetMemory`¶

cutensornetStatus_t cutensornetWorkspaceSetMemory(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, void *const memoryPtr, int64_t memorySize)¶

Sets the memory address and workspace size of the workspace provided by the user.

A workspace is valid in the following cases:

memoryPtr is valid and memorySize > 0
memoryPtr is null and memorySize > 0: indicates that memory of size memorySize should be drawn from the mempool or, for cutensornetCreateContractionPlan(), indicates the available workspace size.
memoryPtr is null and memorySize = 0: indicates the workspace of the specified kind is disabled (currently applies to CACHE kind only).
memoryPtr is null and memorySize < 0: indicates workspace memory should be drawn from the user’s mempool with the CUTENSORNET_WORKSIZE_PREF_RECOMMENDED size (see cutensornetWorksizePref_t).

The memorySize of the SCRATCH kind is validated against the minimum required size when it is used (cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), cutensornetContraction(), cutensornetContractSlices()). The CACHE kind accepts any memory size; the larger, the better.
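The four validity cases above can be summarized as a small decision function. The following is a minimal sketch for illustration only: WorkKind, WorkspaceMode and classify_workspace() are hypothetical names, not part of the cuTensorNet API. Treating a null pointer with size 0 as invalid for the SCRATCH kind, and a non-null pointer with a non-positive size as invalid, are assumptions drawn from the list above, which names only the CACHE kind for the disabled case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two workspace kinds; not the real cuTensorNet enum. */
typedef enum { KIND_SCRATCH, KIND_CACHE } WorkKind;

/* Hypothetical interpretation of a (memoryPtr, memorySize) pair. */
typedef enum {
    WS_USER_BUFFER,      /* valid pointer, positive size: user-owned buffer */
    WS_POOL_SIZED,       /* null pointer, positive size: draw memorySize bytes from the mempool */
    WS_DISABLED,         /* null pointer, zero size: kind disabled (CACHE only) */
    WS_POOL_RECOMMENDED, /* null pointer, negative size: draw the recommended size from the mempool */
    WS_INVALID           /* any other combination (assumption) */
} WorkspaceMode;

static WorkspaceMode classify_workspace(WorkKind kind, const void *memoryPtr, int64_t memorySize)
{
    if (memoryPtr != NULL)
        return memorySize > 0 ? WS_USER_BUFFER : WS_INVALID;
    if (memorySize > 0)
        return WS_POOL_SIZED;
    if (memorySize < 0)
        return WS_POOL_RECOMMENDED;
    /* memorySize == 0: documented as disabling the workspace for the CACHE kind only */
    return kind == KIND_CACHE ? WS_DISABLED : WS_INVALID;
}
```

For example, passing a null pointer with size -1 for either kind selects the recommended-size mempool path.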

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [inout] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workKind – [in] The kind of workspace.
memoryPtr – [in] Workspace memory pointer, may be null.
memorySize – [in] Workspace size.

`cutensornetWorkspaceGet`¶

cutensornetStatus_t cutensornetWorkspaceGet(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void **workspacePtr, uint64_t *workspaceSize)¶

DEPRECATED: Retrieves the memory address and workspace size of the workspace hosted in the workspace descriptor.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workspacePtr – [out] Workspace memory pointer.
workspaceSize – [out] Workspace size.

`cutensornetWorkspaceGetMemory`¶

cutensornetStatus_t cutensornetWorkspaceGetMemory(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, void **memoryPtr, int64_t *memorySize)¶

Retrieves the memory address and workspace size of the workspace hosted in the workspace descriptor.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [in] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.
workKind – [in] The kind of workspace.
memoryPtr – [out] Workspace memory pointer.
memorySize – [out] Workspace size.

`cutensornetWorkspacePurgeCache`¶

cutensornetStatus_t cutensornetWorkspacePurgeCache(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace)¶

Purges the cached data in the specified memory space.

Purges/invalidates the cached data in the CUTENSORNET_WORKSPACE_CACHE workspace kind on the memSpace memory space, but does not free the memory nor return it to the memory pool.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
workDesc – [inout] Opaque structure describing the workspace.
memSpace – [in] The memory space where the workspace is allocated.

`cutensornetDestroyWorkspaceDescriptor`¶

cutensornetStatus_t cutensornetDestroyWorkspaceDescriptor(cutensornetWorkspaceDescriptor_t desc)¶

Frees the workspace descriptor.

Note that this API does not free the memory provided by cutensornetWorkspaceSetMemory().

Parameters: desc – [inout] Opaque structure.

Network Contraction API¶

`cutensornetContraction`¶

cutensornetStatus_t cutensornetContraction(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, cutensornetWorkspaceDescriptor_t workDesc, int64_t sliceId, cudaStream_t stream)¶

DEPRECATED: Performs the actual contraction of the tensor network.

Note

If multiple slices are created, the order of contracting over slices using cutensornetContraction() should be ascending starting from slice 0. If parallelizing over slices manually (in any fashion: streams, devices, processes, etc.), please make sure the output tensors (that are subject to a global reduction) are zero-initialized.

Input and output data pointers are recommended to be 256-byte aligned for best performance.

This function is asynchronous w.r.t. the calling CPU thread. The user should guarantee that the memory buffer provided in workDesc is valid until a synchronization with the stream or the device is executed.
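The requirement to zero-initialize outputs when parallelizing over slices manually comes from how slicing works: each slice contributes a partial sum over the sliced mode, and these partial sums are added together. The following CPU-only sketch shows this for out[i] = sum_k A[i][k] * x[k], with mode k sliced into chunks. It does not call cuTensorNet. contract_slice() and contract_two_workers() are hypothetical helpers, and the two "workers" stand in for streams, devices or processes.

```c
#include <assert.h>
#include <stddef.h>

/* Contracts one slice of out[i] = sum_k A[i][k] * x[k], where slice s covers
   k in [s*chunk, (s+1)*chunk). The partial result is ACCUMULATED into out,
   so out must start at zero. */
static void contract_slice(const double *A, const double *x, double *out,
                           size_t rows, size_t cols, size_t chunk, size_t slice)
{
    size_t kBegin = slice * chunk;
    size_t kEnd = kBegin + chunk < cols ? kBegin + chunk : cols;
    for (size_t i = 0; i < rows; ++i)
        for (size_t k = kBegin; k < kEnd; ++k)
            out[i] += A[i * cols + k] * x[k];
}

/* Two hypothetical workers each own a zero-initialized output buffer and contract
   alternating slices; a final global reduction sums their partial outputs. */
static void contract_two_workers(const double *A, const double *x, double *result,
                                 double *partial0, double *partial1,
                                 size_t rows, size_t cols, size_t chunk)
{
    assert(chunk > 0);
    size_t numSlices = (cols + chunk - 1) / chunk;
    for (size_t i = 0; i < rows; ++i)
        partial0[i] = partial1[i] = 0.0;  /* zero-initialization is essential */
    for (size_t s = 0; s < numSlices; ++s)
        contract_slice(A, x, s % 2 == 0 ? partial0 : partial1, rows, cols, chunk, s);
    for (size_t i = 0; i < rows; ++i)
        result[i] = partial0[i] + partial1[i];  /* global reduction */
}
```

If either partial buffer started with leftover data, that data would be carried into the reduced result. This is the same failure mode the note above warns about.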

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [inout] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()). Some internal meta-data may be updated upon contraction.
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). If a device memory handler is set, then workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.
sliceId – [in] The ID of the slice that is currently contracted (this value ranges from 0 to optimizerInfo.numSlices - 1); use 0 if no slices are used.
stream – [in] The CUDA stream on which the computation is performed.

`cutensornetContractSlices`¶

cutensornetStatus_t cutensornetContractSlices(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, int32_t accumulateOutput, cutensornetWorkspaceDescriptor_t workDesc, const cutensornetSliceGroup_t sliceGroup, cudaStream_t stream)¶

Performs the actual contraction of the tensor network.

Note

Input and output data pointers are recommended to be at least 256-byte aligned for best performance.
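Allocations returned by cudaMalloc() are at least 256-byte aligned. Alignment can be lost, however, when several tensors are packed into one large buffer at arbitrary offsets. The following sketch shows one way to keep sub-buffers aligned by rounding every offset up to a multiple of 256 bytes. The helper names are hypothetical, and the sketch assumes the base pointer of the backing allocation is itself 256-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TENSOR_ALIGNMENT 256u  /* recommended alignment from the note above */

/* Returns nonzero if p is aligned to TENSOR_ALIGNMENT bytes. */
static int is_aligned_256(const void *p)
{
    return ((uintptr_t)p % TENSOR_ALIGNMENT) == 0;
}

/* Rounds a byte offset up to the next multiple of TENSOR_ALIGNMENT. */
static size_t align_up_256(size_t offset)
{
    return (offset + TENSOR_ALIGNMENT - 1) / TENSOR_ALIGNMENT * TENSOR_ALIGNMENT;
}

/* Computes aligned offsets for n consecutive tensors with the given byte sizes and
   returns the total number of bytes the (aligned) backing allocation must hold. */
static size_t layout_tensors(const size_t *sizes, size_t *offsets, size_t n)
{
    assert(n == 0 || (sizes != NULL && offsets != NULL));
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = align_up_256(cursor);
        cursor = offsets[i] + sizes[i];
    }
    return cursor;
}
```

For tensors of 100, 300 and 8 bytes, the offsets are 0, 256 and 768, and the backing buffer needs 776 bytes.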

Warning

In the current release, this function will synchronize the stream in case distributed execution is activated (via cutensornetDistributedResetConfiguration()).

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [inout] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()). Some internal meta-data may be updated upon contraction.
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()): rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
rawDataOut – [out] Points to the raw data of the output tensor (in device memory).
accumulateOutput – [in] If 0, write the contraction result into rawDataOut; otherwise accumulate the result into rawDataOut.
workDesc – [in] Opaque structure describing the workspace. The provided CUTENSORNET_WORKSPACE_SCRATCH workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). The provided CUTENSORNET_WORKSPACE_CACHE workspace can be of any size, the larger the better, up to the size that can be queried with cutensornetWorkspaceGetMemorySize(). If a device memory handler is set, then workDesc can be set to null, or the memory pointer in workDesc for either workspace kind can be set to null, and the workspace size can be set either to a negative value (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. For a workspace of kind CUTENSORNET_WORKSPACE_SCRATCH, a memory buffer of the specified size will be drawn from the user’s mempool and released back once done. For a workspace of kind CUTENSORNET_WORKSPACE_CACHE, a memory buffer of the specified size will be drawn from the user’s mempool and released back when workDesc is destroyed (if workDesc != NULL) or otherwise when the plan is destroyed, or when an alternative workDesc with a different memory address/size is provided in a subsequent cutensornetContractSlices() call.
sliceGroup – [in] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()). If set to null, all slices will be contracted.
stream – [in] The CUDA stream on which the computation is performed.

Gradient Computation API¶

`cutensornetComputeGradientsBackward`¶

cutensornetStatus_t cutensornetComputeGradientsBackward(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], const void *outputGradient, void *const gradients[], int32_t accumulateOutput, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)¶

Computes the gradients of the network w.r.t. the input tensors whose gradients are required. The network must have been contracted beforehand, with the results loaded in the CACHE workspace of workDesc. Operates only on networks with a single slice and no singleton modes.

Note

This function is experimental and is subject to change in future releases.

Note

This function should be preceded by a call to cutensornetContractSlices(); both calls to cutensornetContractSlices() and cutensornetComputeGradientsBackward() should use either the same workDesc instance (in order to share the CACHE memory), or both pass null as workDesc to use the same mempool allocation for the CACHE. workDesc and plan should not be altered in between these calls.

Calling cutensornetWorkspacePurgeCache() is necessary for computing gradients of different data sets: the combined cutensornetContractSlices() and cutensornetComputeGradientsBackward() calls generate cached data that is only valid for the corresponding data set and should be purged when the input tensors’ data change.

Pointers to the input data, output data, and workspace buffers are recommended to be at least 256-byte aligned for best performance.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
plan – [inout] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()). Some internal meta-data may be updated upon contraction.
rawDataIn – [in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()): rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).
outputGradient – [in] Gradient of the output tensor (in device memory). Must have the same memory layout (strides) as the output tensor of the tensor network.
gradients – [inout] Array of N pointers: gradients[i] points to the gradient data associated with the i-th input tensor in device memory. Setting gradients[i] to null skips computing the gradient of the i-th input tensor. Each generated gradient has the same memory layout (strides) as its corresponding input tensor.
accumulateOutput – [in] If 0, write the gradient results into gradients; otherwise accumulate the results into gradients.
workDesc – [in] Opaque structure describing the workspace. The provided CUTENSORNET_WORKSPACE_SCRATCH workspace must be valid (the workspace size must be the same as or larger than the minimum needed). See cutensornetWorkspaceComputeContractionSizes(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). The provided CUTENSORNET_WORKSPACE_CACHE workspace must be valid and must contain the cached intermediate tensors from the corresponding cutensornetContractSlices() call. If a device memory handler is set, and workDesc is set to null (or the memory pointer in workDesc for either workspace kind is set to null) for both calls to cutensornetContractSlices() and cutensornetComputeGradientsBackward(), memory will be drawn from the memory pool. See cutensornetContractSlices() for details.
stream – [in] The CUDA stream on which the computation is performed.

Slice Group API¶

`cutensornetCreateSliceGroupFromIDRange`¶

cutensornetStatus_t cutensornetCreateSliceGroupFromIDRange(const cutensornetHandle_t handle, int64_t sliceIdStart, int64_t sliceIdStop, int64_t sliceIdStep, cutensornetSliceGroup_t *sliceGroup)¶

Creates a cutensornetSliceGroup_t object from a range, which produces a sequence of slice IDs from the specified start (inclusive) to the specified stop (exclusive) values with the specified step. The sequence can be increasing or decreasing depending on the start and stop values.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
sliceIdStart – [in] The start slice ID.
sliceIdStop – [in] The stop slice ID (exclusive). For an increasing (decreasing) sequence, the final slice ID is the largest (smallest) ID in the sequence that is strictly below (above) this value.
sliceIdStep – [in] The step size between two successive slice IDs. A negative step size should be specified for a decreasing sequence.
sliceGroup – [out] Opaque object specifying the slice IDs.
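The range semantics match a Python-style range(start, stop, step). The following sketch expands a (start, stop, step) triple into the slice IDs it denotes. This can help when checking which slices a given slice group covers, for example when splitting work across processes. expand_slice_range() is a hypothetical helper and does not call cuTensorNet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the slice IDs described by (start, stop, step) into ids
   (capacity maxIds) and returns how many IDs the range contains. start is inclusive,
   stop is exclusive, and a negative step yields a decreasing sequence. */
static size_t expand_slice_range(int64_t start, int64_t stop, int64_t step,
                                 int64_t *ids, size_t maxIds)
{
    assert(step != 0);
    size_t count = 0;
    for (int64_t id = start; step > 0 ? id < stop : id > stop; id += step) {
        if (count < maxIds)
            ids[count] = id;
        ++count;
    }
    return count;
}
```

For example, (0, 5, 2) yields 0, 2, 4, and (5, 0, -2) yields 5, 3, 1. Neither sequence includes the stop value.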

`cutensornetCreateSliceGroupFromIDs`¶

cutensornetStatus_t cutensornetCreateSliceGroupFromIDs(const cutensornetHandle_t handle, const int64_t *beginIDSequence, const int64_t *endIDSequence, cutensornetSliceGroup_t *sliceGroup)¶

Creates a cutensornetSliceGroup_t object from a sequence of slice IDs. Duplicates in the input slice ID sequence will be removed.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
beginIDSequence – [in] A pointer to the beginning of the slice ID sequence.
endIDSequence – [in] A pointer to the end of the slice ID sequence.
sliceGroup – [out] Opaque object specifying the slice IDs.
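The following sketch shows the duplicate removal described above on a host array. It uses the [begin, end) pointer-pair convention, where end points one past the last ID. That convention is an assumption based on the parameter names. The sketch sorts before removing duplicates. The order in which cuTensorNet stores the resulting IDs is not specified here, and unique_slice_ids() is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for int64_t values. */
static int cmp_int64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the slice IDs in [begin, end) and removes duplicates in place;
   returns the new end pointer (one past the last unique ID). */
static int64_t *unique_slice_ids(int64_t *begin, int64_t *end)
{
    assert(begin <= end);
    size_t n = (size_t)(end - begin);
    if (n == 0)
        return begin;
    qsort(begin, n, sizeof *begin, cmp_int64);
    size_t out = 1;
    for (size_t i = 1; i < n; ++i)
        if (begin[i] != begin[out - 1])
            begin[out++] = begin[i];
    return begin + out;
}
```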

`cutensornetDestroySliceGroup`¶

cutensornetStatus_t cutensornetDestroySliceGroup(cutensornetSliceGroup_t sliceGroup)¶

Releases the resources associated with a cutensornetSliceGroup_t object and sets its value to null.

Parameters: sliceGroup – [inout] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()).

Approximate Tensor Network Execution API¶

`cutensornetTensorQR`¶

cutensornetStatus_t cutensornetTensorQR(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const void *const rawDataIn, const cutensornetTensorDescriptor_t descTensorQ, void *q, const cutensornetTensorDescriptor_t descTensorR, void *r, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)¶

Performs QR decomposition of a tensor.

The partition of all input modes in descTensorIn is specified in descTensorQ and descTensorR. descTensorQ and descTensorR are expected to share exactly one mode, and the extent of that mode shall not exceed the minimum of m (row dimension) and n (column dimension) of the equivalent matrix QR problem.
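For the equivalent matrix, m is the product of the extents of the input modes assigned to Q, and n is the product of the extents of the input modes assigned to R. The shared mode therefore has an upper bound of min(m, n). The following sketch computes this bound on the host, assuming the caller has already split the input extents into the Q-side and R-side groups. max_shared_extent() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Product of the extents of a group of modes: the row or column dimension
   of the equivalent matrix. */
static int64_t product_of_extents(const int64_t *extents, size_t numModes)
{
    int64_t p = 1;
    for (size_t i = 0; i < numModes; ++i) {
        assert(extents[i] > 0);
        p *= extents[i];
    }
    return p;
}

/* Largest allowed extent of the mode shared by Q and R: min(m, n). */
static int64_t max_shared_extent(const int64_t *qExtents, size_t numQModes,
                                 const int64_t *rExtents, size_t numRModes)
{
    int64_t m = product_of_extents(qExtents, numQModes);
    int64_t n = product_of_extents(rExtents, numRModes);
    return m < n ? m : n;
}
```

Take an input tensor with modes of extents 2, 3, 4 and 5, where the first two modes are assigned to Q and the last two to R. Then m = 6 and n = 20, so the shared mode's extent may be at most 6. The same bound applies to the shared mode of U and V in tensor SVD.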

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorIn – [in] Describes the modes, extents, and other metadata information of a tensor.
rawDataIn – [in] Pointer to the raw data of the input tensor (in device memory).
descTensorQ – [in] Describes the modes, extents, and other metadata information of the output tensor Q.
q – [out] Pointer to the output tensor data Q (in device memory).
descTensorR – [in] Describes the modes, extents, and other metadata information of the output tensor R.
r – [out] Pointer to the output tensor data R (in device memory).
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than the minimum needed). See cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory().
stream – [in] The CUDA stream on which the computation is performed.

`cutensornetTensorSVD`¶

cutensornetStatus_t cutensornetTensorSVD(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const void *const rawDataIn, cutensornetTensorDescriptor_t descTensorU, void *u, void *s, cutensornetTensorDescriptor_t descTensorV, void *v, const cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDInfo_t svdInfo, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)¶

Performs SVD decomposition of a tensor.

The partition of all input modes in descTensorIn is specified in descTensorU and descTensorV. descTensorU and descTensorV are expected to share exactly one mode. The extent of the shared mode shall not exceed the minimum of m (row dimension) and n (column dimension) for the equivalent combined matrix SVD. The following variants of tensor SVD are supported:

1. Exact SVD: This can be specified by setting the extent of the shared mode in descTensorU and descTensorV to be the minimum of m and n, and setting svdConfig to NULL.
2. SVD with fixed extent truncation: This can be specified by setting the extent of the shared mode in descTensorU and descTensorV to be lower than the minimum of m and n.
3. SVD with value-based truncation: This can be specified by setting CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF or CUTENSORNET_TENSOR_SVD_CONFIG_REL_CUTOFF attribute of svdConfig.
4. SVD with a combination of fixed extent and value-based truncation as described above.

Note

In the case of exact SVD or SVD with fixed extent truncation, descTensorU and descTensorV will remain constant after the execution. The data in u and v will respect the extent and stride in these tensor descriptors.

When value-based truncation is requested in svdConfig, cutensornetTensorSVD searches for the minimal extent that satisfies both the value-based truncation and the fixed extent requirement. If the resulting extent is found to be the same as the one specified in U/V tensor descriptors, the extent and stride from the tensor descriptors will be respected. If the resulting extent is found to be lower than the one specified in U/V tensor descriptors, the data in u and v will adopt a new Fortran-layout matching the reduced extent found. The extent and stride in descTensorU and descTensorV will also be overwritten to reflect this change. The user can query the reduced extent with cutensornetTensorSVDInfoGetAttribute() or cutensornetGetTensorDetails() (which also returns the new strides).

As the reduced size for value-based truncation is not known until runtime, the user should always allocate u and v based on the full data size specified by the initial descTensorU and descTensorV.
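The following host-side sketch illustrates the extent search and the resulting Fortran layout described in this note. It makes two assumptions that are not stated above. First, singular values are given in descending order. Second, a value is discarded when it is strictly below the larger of the absolute cutoff and relCutoff times the largest singular value. The final extent never exceeds the fixed extent from the descriptors. truncated_extent() and fortran_strides() are hypothetical helpers. The extent and strides that cuTensorNet actually chooses should be read back with cutensornetTensorSVDInfoGetAttribute() or cutensornetGetTensorDetails().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent search. s: singular values in descending order (at least
   fixedExtent of them); fixedExtent: shared-mode extent from the U/V descriptors.
   A cutoff of 0 disables that criterion. */
static int64_t truncated_extent(const double *s, int64_t fixedExtent,
                                double absCutoff, double relCutoff)
{
    assert(fixedExtent > 0);
    double threshold = absCutoff;
    if (relCutoff * s[0] > threshold)
        threshold = relCutoff * s[0];
    int64_t kept = 0;
    while (kept < fixedExtent && s[kept] >= threshold)
        ++kept;
    return kept;  /* never exceeds fixedExtent */
}

/* Fortran (column-major) strides: the first mode is contiguous and each
   following stride is the previous stride times the previous extent. */
static void fortran_strides(const int64_t *extents, int64_t *strides, size_t numModes)
{
    int64_t stride = 1;
    for (size_t i = 0; i < numModes; ++i) {
        strides[i] = stride;
        stride *= extents[i];
    }
}
```

Consider singular values {4.0, 2.0, 0.5, 0.1} with a relative cutoff of 0.1. The threshold is 0.4, so 3 values are kept. If U then has extents (2, 3, 3), its reduced Fortran strides are (1, 2, 6). The buffer u is still sized for the original, untruncated descriptor.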

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorIn – [in] Describes the modes, extents, and other metadata information of a tensor.
rawDataIn – [in] Pointer to the raw data of the input tensor (in device memory).
descTensorU – [inout] Describes the modes, extents, and other metadata information of the output tensor U. The extents for uncontracted modes are expected to be consistent with descTensorIn.
u – [out] Pointer to the output tensor data U (in device memory).
s – [out] Pointer to the output tensor data S (in device memory). Can be NULL when the CUTENSORNET_TENSOR_SVD_CONFIG_S_PARTITION attribute of svdConfig is not set to default (CUTENSORNET_TENSOR_SVD_PARTITION_NONE).
descTensorV – [inout] Describes the modes, extents, and other metadata information of the output tensor V.
v – [out] Pointer to the output tensor data V (in device memory).
svdConfig – [in] This data structure holds the user-requested SVD parameters. Can be NULL if users do not need to perform value-based truncation or singular value partitioning.
svdInfo – [out] Opaque structure holding all information about the truncation at runtime. Can be NULL if runtime information on singular value truncation is not needed.
workDesc – [in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than the minimum needed). See cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory().
stream – [in] The CUDA stream on which the computation is performed.

`cutensornetGateSplit`¶

cutensornetStatus_t cutensornetGateSplit(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorInA, const void *rawDataInA, const cutensornetTensorDescriptor_t descTensorInB, const void *rawDataInB, const cutensornetTensorDescriptor_t descTensorInG, const void *rawDataInG, cutensornetTensorDescriptor_t descTensorU, void *u, void *s, cutensornetTensorDescriptor_t descTensorV, void *v, const cutensornetGateSplitAlgo_t gateAlgo, const cutensornetTensorSVDConfig_t svdConfig, cutensornetComputeType_t computeType, cutensornetTensorSVDInfo_t svdInfo, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)¶

Performs gate split operation.

descTensorInA, descTensorInB, and descTensorInG are expected to form a fully connected graph where the uncontracted modes are partitioned onto descTensorU and descTensorV via tensor SVD. descTensorU and descTensorV are expected to share exactly one mode. The extent of that mode shall not exceed the minimum of m (row dimension) and n (column dimension) of the smallest equivalent matrix SVD problem.

Note

The options for truncation and the treatment of extent and stride follow the same logic as tensor SVD, see cutensornetTensorSVD().

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
descTensorInA – [in] Describes the modes, extents, and other metadata information of the input tensor A.
rawDataInA – [in] Pointer to the raw data of the input tensor A (in device memory).
descTensorInB – [in] Describes the modes, extents, and other metadata information of the input tensor B.
rawDataInB – [in] Pointer to the raw data of the input tensor B (in device memory).
descTensorInG – [in] Describes the modes, extents, and other metadata information of the input gate tensor.
rawDataInG – [in] Pointer to the raw data of the input gate tensor G (in device memory).
descTensorU – [in] Describes the modes, extents, and other metadata information of the output U tensor. The extents of uncontracted modes are expected to be consistent with descTensorInA and descTensorInG.
u – [out] Pointer to the output tensor data U (in device memory).
s – [out] Pointer to the output tensor data S (in device memory). Can be NULL when the CUTENSORNET_TENSOR_SVD_CONFIG_S_PARTITION attribute of svdConfig is not set to default (CUTENSORNET_TENSOR_SVD_PARTITION_NONE).
descTensorV – [in] Describes the modes, extents, and other metadata information of the output V tensor. The extents of uncontracted modes are expected to be consistent with descTensorInB and descTensorInG.
v – [out] Pointer to the output tensor data V (in device memory).
gateAlgo – [in] The algorithm to use for splitting the gate tensor into tensors A and B.
svdConfig – [in] Opaque structure holding the user-requested SVD parameters.
computeType – [in] Denotes the compute type used throughout the computation.
svdInfo – [out] Opaque structure holding all information about the truncation at runtime.
workDesc – [in] Opaque structure describing the workspace.
stream – [in] The CUDA stream on which the computation is performed.

Tensor SVD Config API¶

`cutensornetCreateTensorSVDConfig`¶

cutensornetStatus_t cutensornetCreateTensorSVDConfig(const cutensornetHandle_t handle, cutensornetTensorSVDConfig_t *svdConfig)¶

Sets up the options for singular value decomposition and truncation.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorSVDConfig() is called once svdConfig is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
svdConfig – [out] This data structure holds the user-requested SVD parameters.

`cutensornetDestroyTensorSVDConfig`¶

cutensornetStatus_t cutensornetDestroyTensorSVDConfig(cutensornetTensorSVDConfig_t svdConfig)¶

Frees all the memory associated with the tensor SVD configuration.

Parameters: svdConfig – [inout] Opaque handle to a tensor SVD configuration.

`cutensornetTensorSVDConfigGetAttribute`¶

cutensornetStatus_t cutensornetTensorSVDConfigGetAttribute(const cutensornetHandle_t handle, const cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDConfigAttributes_t attr, void *buf, size_t sizeInBytes)¶

Gets attributes of svdConfig.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
svdConfig – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within svdConfig.
sizeInBytes – [in] Size of buf (in bytes).

`cutensornetTensorSVDConfigSetAttribute`¶

cutensornetStatus_t cutensornetTensorSVDConfigSetAttribute(const cutensornetHandle_t handle, cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDConfigAttributes_t attr, const void *buf, size_t sizeInBytes)¶

Sets attributes of svdConfig.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
svdConfig – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is set.
buf – [in] This buffer (of size sizeInBytes) determines the value to which attr will be set.
sizeInBytes – [in] Size of buf (in bytes).

Tensor SVD Info API¶

`cutensornetCreateTensorSVDInfo`¶

cutensornetStatus_t cutensornetCreateTensorSVDInfo(const cutensornetHandle_t handle, cutensornetTensorSVDInfo_t *svdInfo)¶

Sets up the information for singular value decomposition.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorSVDInfo() is called once svdInfo is no longer required.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
svdInfo – [out] This data structure holds all information about the truncation at runtime.

`cutensornetDestroyTensorSVDInfo`¶

cutensornetStatus_t cutensornetDestroyTensorSVDInfo(cutensornetTensorSVDInfo_t svdInfo)¶

Frees all the memory associated with the TensorSVDInfo object.

Parameters: svdInfo – [inout] Opaque handle to a TensorSVDInfo object.

`cutensornetTensorSVDInfoGetAttribute`¶

cutensornetStatus_t cutensornetTensorSVDInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetTensorSVDInfo_t svdInfo, cutensornetTensorSVDInfoAttributes_t attr, void *buf, size_t sizeInBytes)¶

Gets attributes of svdInfo.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
svdInfo – [in] Opaque structure that is accessed.
attr – [in] Specifies the attribute that is requested.
buf – [out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within svdInfo.
sizeInBytes – [in] Size of buf (in bytes).

Distributed Parallelization API¶

`cutensornetDistributedResetConfiguration`¶

cutensornetStatus_t cutensornetDistributedResetConfiguration(cutensornetHandle_t handle, const void *commPtr, size_t commSize)¶

Resets the distributed MPI parallelization configuration.

This function accepts a user-provided MPI communicator in a type-erased form and stores a copy of it inside the cuTensorNet library handle. The provided MPI communicator must be explicitly created by calling MPI_Comm_dup (please see the MPI specification). The subsequent calls to the contraction path finder, contraction plan autotuning, and contraction execution will be parallelized across all MPI processes in the provided MPI communicator. The provided MPI communicator is owned by the user and should stay alive until the next reset call with a different MPI communicator. If NULL is provided as the pointer to the MPI communicator, no parallelization will be applied to the above-mentioned procedures, so those procedures will execute redundantly across all MPI processes. As an example, please refer to the tensornet_example_mpi_auto.cu sample.

To enable distributed parallelism, cuTensorNet requires users to set an environment variable $CUTENSORNET_COMM_LIB containing the path to a shared library wrapping the communication primitives. For MPI users, we ship a wrapper source file cutensornet_distributed_interface_mpi.c that can be compiled against the target MPI library using the build script provided in the same folder inside the tar archive distribution. cuTensorNet will use the included function pointers to perform inter-process communication using the chosen MPI library.

Warning

This is a collective call that must be executed by all MPI processes. Note that one can still provide different (non-NULL) MPI communicators to different subgroups of MPI processes (to create concurrent cuTensorNet distributed subgroups).

The provided MPI communicator must not be used by more than one cuTensorNet library handle. This is automatically ensured by using MPI_Comm_dup.

The current library implementation assumes one GPU instance per MPI rank, since a cuTensorNet library handle is associated with a single GPU instance. With multiple GPUs per node, each MPI process running on a node may still see all GPU devices of that node if CUDA_VISIBLE_DEVICES was not set to give each process exclusive access to one GPU. In that case, the cuTensorNet library runtime will assign GPU #(processRank % numVisibleDevices), where processRank is the rank of the current process in its MPI communicator and numVisibleDevices is the number of GPU devices visible to the current MPI process. The assigned GPU must coincide with the one associated with the cuTensorNet library handle; otherwise, an error will result. To ensure consistency, the user must call cudaSetDevice in each MPI process to select the correct GPU device prior to creating a cuTensorNet library handle.

It is the user’s responsibility to ensure that each MPI process in each provided MPI communicator executes exactly the same sequence of cuTensorNet API calls; otherwise, the behavior is undefined.
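The device-selection rule above reduces to simple integer arithmetic. The following C sketch spells it out; `expected_device` is a hypothetical helper, not a cuTensorNet function. In an MPI program, processRank would presumably come from MPI_Comm_rank on the communicator passed to cutensornetDistributedResetConfiguration, numVisibleDevices from cudaGetDeviceCount, and the result would be passed to cudaSetDevice before calling cutensornetCreate().

```c
#include <assert.h>

/* Hypothetical helper: the GPU that the cuTensorNet runtime is documented to
   assign to an MPI process that sees all devices of its node,
   namely GPU #(processRank % numVisibleDevices). */
static int expected_device(int processRank, int numVisibleDevices)
{
    assert(processRank >= 0);
    assert(numVisibleDevices > 0);
    return processRank % numVisibleDevices;
}
```

For instance, with 4 visible GPUs per process, ranks 0 and 4 map to device 0 and rank 5 maps to device 1. Whether this spreads processes evenly over the GPUs of a node depends on how the MPI launcher places ranks, so setting CUDA_VISIBLE_DEVICES per process remains the more explicit option.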

Parameters

handle – [in] cuTensorNet library handle.
commPtr – [in] A pointer to the provided MPI communicator created by MPI_Comm_dup.
commSize – [in] The size of the provided MPI communicator: sizeof(MPI_Comm).

`cutensornetDistributedGetNumRanks`¶

cutensornetStatus_t cutensornetDistributedGetNumRanks(const cutensornetHandle_t handle, int32_t *numRanks)¶

Queries the number of MPI ranks in the current distributed MPI configuration.

Warning

The number of ranks corresponds to the MPI communicator used by the current MPI process. If different subgroups of MPI processes use different MPI communicators, the reported number refers to the specific MPI communicator of each subgroup.

Parameters

handle – [in] cuTensorNet library handle.
numRanks – [out] Number of MPI ranks in the current distributed MPI configuration.

`cutensornetDistributedGetProcRank`¶

cutensornetStatus_t cutensornetDistributedGetProcRank(const cutensornetHandle_t handle, int32_t *procRank)¶

Queries the rank of the current MPI process in the current distributed MPI configuration.

Warning

The MPI process rank corresponds to the MPI communicator used by that MPI process. If different subgroups of MPI processes use different MPI communicators, the reported rank refers to the specific MPI communicator of each subgroup.

Parameters

handle – [in] cuTensorNet library handle.
procRank – [out] Rank of the current MPI process in the current distributed MPI configuration.

`cutensornetDistributedSynchronize`¶

cutensornetStatus_t cutensornetDistributedSynchronize(const cutensornetHandle_t handle)¶

Globally synchronizes all MPI processes in the current distributed MPI configuration, ensuring that all preceding cutensornet API calls have completed across all MPI processes.

Warning

This is a collective call that must be executed by all MPI processes.

Prior to performing the global synchronization, the user is still required to synchronize GPU operations locally (via CUDA stream synchronization).

Parameters: handle – [in] cuTensorNet library handle.

High-Level Tensor Network API¶

The high-level tensor network API functions centered around cutensornetState_t allow users to define complex tensor network states by gradually applying tensor operators (e.g., quantum gates) to the initial (vacuum) state residing in the user-defined direct-product space, that is, a tensor space constructed as a direct product of multiple vector spaces of given dimensions. In particular, this way of defining a tensor network state is very convenient for quantum circuit simulators since the final output state of a given quantum circuit is constructed by gradually applying quantum gates to the initial (vacuum) state of all qubits (or qudits). Once the action of all tensor operators has been specified, the underlying tensor network is completely defined, together with the final state of the tensor network that resides in the same direct-product space as the initial (vacuum) state. Users also have the following options to request certain approximations or actions applied to the final state before computing its various properties:

No additional approximations or actions. No additional API calls are required and users may directly proceed to the next steps for property computation. All properties will be computed by direct contraction of the full tensor network(s) corresponding to the property of interest.

Explicitly precompute the full state tensor. This can be achieved via a sequence of calls to cutensornetStateConfigure(), cutensornetStatePrepare(), and cutensornetStateCompute() to configure, prepare, and, finally, compute the full state tensor. The calculation of all subsequent properties may leverage the computed full state tensor (although this is not guaranteed).

Provide a non-vacuum initial quantum state in the Matrix Product State (MPS) form. This can be achieved by calling the cutensornetStateInitializeMPS() API to provide the initial state before subsequent calls to the configure, prepare, and compute API functions. Note that this API does not request that the final state be computed in the MPS form (see below); it can therefore be used in conjunction with either contraction-based computation or an MPS-based representation.

Factorize the final state in the MPS form. The cutensornetStateFinalizeMPS() API function can be used to specify the MPS factorization structure for the defined tensor network state. Once the desired MPS factorization has been specified, subsequent calls to the configure, prepare, and compute API functions will configure the MPS computation, prepare the MPS computation, and, finally, compute the MPS factorization. All subsequent properties will be computed by using the MPS-factorized form of the original tensor network state.

After choosing one of the approaches above, users can take advantage of the following APIs to compute various properties associated with the tensor network state:

cutensornetStateAccessor_t and corresponding APIs can be used to compute the full state tensor, any of its Cartesian slices, or individual amplitudes.

cutensornetStateExpectation_t and corresponding APIs can be used to compute an expectation value of a given tensor network operator with respect to the given tensor network state. A tensor network operator (cutensornetNetworkOperator_t) is defined as a sum of products of tensor operators where tensors constituting each product (component) act on disjoint degrees of freedom (e.g., disjoint subsets of qudits).

cutensornetStateMarginal_t and corresponding APIs can be used to compute the marginal probability distribution tensors (reduced density matrices) over the specified modes of the tensor network state. In particular, including all tensor network state modes would result in the computation of the full density matrix of the tensor network state.

cutensornetStateSampler_t and corresponding APIs can be used to sample from the probability distribution associated with the specified tensor network state modes.
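To make the marginal and sampling notions concrete, the following C sketch applies the textbook definitions to an explicit 2-qubit state vector: the reduced density matrix of qubit 0 is obtained by tracing out qubit 1, and a basis state is drawn from the probabilities |psi_i|^2 by inverse transform sampling. This only illustrates what these properties mean; it is not a description of how cuTensorNet computes them, and the function names and the storage convention (basis index 2*q0 + q1) are assumptions of the sketch.

```c
#include <assert.h>
#include <complex.h>

/* Illustrative convention: a 2-qubit pure state stored as psi[2*q0 + q1],
   so qubit 0 is the most significant bit of the basis index. */

/* Reduced density matrix of qubit 0 (qubit 1 traced out):
   rho[a][b] = sum_k psi[a][k] * conj(psi[b][k]), stored row-major in rho[2*a + b]. */
static void reduced_density_matrix_q0(const double complex psi[4], double complex rho[4])
{
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            double complex acc = 0.0;
            for (int k = 0; k < 2; ++k)
                acc += psi[2 * a + k] * conj(psi[2 * b + k]);
            rho[2 * a + b] = acc;
        }
}

/* Draws one basis index from the distribution p_i = |psi_i|^2 by inverse
   transform sampling, given a uniform random number u in [0, 1). */
static int sample_basis_state(const double complex *psi, int n, double u)
{
    assert(n > 0 && u >= 0.0 && u < 1.0);
    double cumulative = 0.0;
    for (int i = 0; i < n; ++i) {
        cumulative += creal(psi[i] * conj(psi[i]));
        if (u < cumulative)
            return i;
    }
    /* Round-off guard: fall back to the last state with nonzero probability. */
    for (int i = n - 1; i > 0; --i)
        if (creal(psi[i] * conj(psi[i])) > 0.0)
            return i;
    return 0;
}
```

For the Bell state (|00> + |11>)/sqrt(2), the reduced density matrix of qubit 0 is diag(0.5, 0.5), and the sampler returns basis index 0 or 3 (|00> or |11>) depending on u.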


Figure 1. cuTensorNet high-level API workflow logic. Arrows indicate allowed sequences of API calls.¶

For each such property of the tensor network state that one may want to compute, the corresponding subset of the high-level API includes API functions for defining (creating) the feature-providing object (Create), like those named above, configuring it by setting specific attributes (Configure), preparing it for computation (Prepare), and, finally, computing it (Compute), and, of course, destroying it at the end (Destroy). See Figure 1 for the API workflow logic, where arrows indicate allowed sequences of API calls.

Note

When the properties are to be computed for an MPS-factorized form of the tensor network state, users are responsible for explicitly allocating memory for the MPS state tensor(s) and maintaining it for the lifetime of the cutensornetState_t object. For more detailed descriptions of MPS-based property computations, please refer to these sections on amplitudes, expectation value, marginal distribution and sampling.

`cutensornetCreateState`¶

cutensornetStatus_t cutensornetCreateState(const cutensornetHandle_t handle, cutensornetStatePurity_t purity, int32_t numStateModes, const int64_t *stateModeExtents, cudaDataType_t dataType, cutensornetState_t *tensorNetworkState)¶

Creates an empty tensor network state of a given shape defined by the number of primary tensor modes and their extents.

A tensor network state is a tensor representing the result of a full contraction of some (yet unspecified) tensor network. That is, a tensor network state is simply a tensor living in a given primary tensor space constructed as a direct product of a given number of vector spaces which are specified by their dimensions (each vector space represents a state mode). A tensor network state (state tensor) can be either pure or mixed. A pure state tensor resides in the defining primary direct-product space and is represented by a tensor from that space. A mixed tensor network state (state tensor) resides in the direct-product space formed by the defining primary direct-product space tensored with its dual (conjugate) tensor space. A mixed state tensor has twice as many modes, namely, the modes from the defining primary direct-product space, followed by the same number of modes from its dual (conjugate) space. Subsequently, the initial (empty) vacuum tensor state can be evolved into the final target tensor state by applying user-defined tensor operators (e.g., quantum gates) via cutensornetStateApplyTensorOperator(). By default, the final target tensor state is formally represented by a single output tensor, the result of the full tensor network contraction (it does not have to be explicitly computed). However, the user may choose to impose a certain tensor factorization on the final tensor state via the cutensornetStateFinalizeXXX() call, where the currently supported tensor factorization (XXX) is MPS (Matrix Product State). In this case, the final tensor state, which now has to be computed explicitly, will be represented by a tuple of output tensors according to the chosen factorization scheme. The information on the output tensor(s) can be queried by calling cutensornetGetOutputStateDetails().

Note

To give a concrete example, a pure state tensor of any quantum circuit with 4 qubits has the shape [2,2,2,2] (a quantum circuit is a specific kind of tensor network). A mixed state tensor in this case will have the shape [2,2,2,2, 2,2,2,2], corresponding to the density matrix of the 4-qubit register, although there are still only 4 defining modes associated with the primary direct-product space of 4 qubits (the direct product of 4 vector spaces of dimension 2). That is, a mixed state tensor contains two sets of modes, one from the primary direct-product space and one from its dual space, but it is still defined by the modes of the primary direct-product space, specifically, by a tuple of dimensions of the constituting vector spaces (2-dimensional vector spaces in the case of qubits). For clarity, we will refer to the modes of the primary direct-product tensor space as State Modes. Subsequent actions of quantum gates on the tensor network state via calls to cutensornetStateApplyTensorOperator() can then be conveniently specified via the subsets of state modes acted on by each quantum gate.
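The relationship between the defining state modes and the full state-tensor shape in the example above can be written out as a short C sketch. The `state_purity` enum and both functions are hypothetical illustrations, not cuTensorNet types; note also that, as the warning that follows states, the current release supports only pure states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the pure/mixed choice (not the cuTensorNet enum). */
typedef enum { STATE_PURE, STATE_MIXED } state_purity;

/* Writes the full shape of the state tensor into shapeOut and returns its rank.
   A pure state has one mode per state mode; a mixed state repeats the state modes
   once more for the dual (conjugate) space. shapeOut must hold 2*numStateModes entries. */
static int state_tensor_shape(state_purity purity, int32_t numStateModes,
                              const int64_t *stateModeExtents, int64_t *shapeOut)
{
    assert(numStateModes > 0);
    for (int32_t i = 0; i < numStateModes; ++i)
        shapeOut[i] = stateModeExtents[i];
    if (purity == STATE_PURE)
        return numStateModes;
    for (int32_t i = 0; i < numStateModes; ++i)
        shapeOut[numStateModes + i] = stateModeExtents[i];
    return 2 * numStateModes;
}

/* Number of elements of a tensor with the given shape. */
static int64_t tensor_volume(int rank, const int64_t *shape)
{
    int64_t volume = 1;
    for (int i = 0; i < rank; ++i)
        volume *= shape[i];
    return volume;
}
```

For 4 qubits this yields the shape [2,2,2,2] with 16 elements for a pure state and [2,2,2,2, 2,2,2,2] with 256 elements for a mixed state, while numStateModes stays 4 in both cases.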

Warning

The current cuTensorNet library release only supports pure tensor network states and provides the MPS factorization as a preview feature.

Parameters

handle – [in] cuTensorNet library handle.
purity – [in] Desired purity of the tensor network state (pure or mixed).
numStateModes – [in] Number of the defining state modes, irrespective of state purity. Note that both pure and mixed tensor network states are defined solely by the modes of the primary direct-product space.
stateModeExtents – [in] Pointer to the extents of the defining state modes (dimensions of the vector spaces constituting the primary direct-product space).
dataType – [in] Data type of the state tensor.
tensorNetworkState – [out] Tensor network state (empty at this point, aka vacuum).

`cutensornetDestroyState`¶

cutensornetStatus_t cutensornetDestroyState(cutensornetState_t tensorNetworkState)¶

Frees all resources owned by the tensor network state.

Note

After the tensor network state is destroyed, all pointers to the tensor operator data used for specifying the final target state may be invalidated.

Parameters: tensorNetworkState – [in] Tensor network state.

`cutensornetStateApplyTensor`¶

cutensornetStatus_t cutensornetStateApplyTensor(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numStateModes, const int32_t *stateModes, void *tensorData, const int64_t *tensorModeStrides, const int32_t immutable, const int32_t adjoint, const int32_t unitary, int64_t *tensorId)¶

DEPRECATED: Applies a tensor operator to the tensor network state.

A tensor operator acts on a specified subset of the tensor state modes, where the number of state modes acted upon defines its rank. A tensor operator is represented by a tensor with twice as many modes as the number of state modes it acts on, where the first half of the tensor operator modes is contracted with the state modes of the input state tensor while the second half of the tensor operator modes forms the output state tensor modes. Since the default tensor storage strides follow the generalized columnwise layout, the action of a rank-2 tensor operator G on a rank-2 state tensor Q0 can be expressed symbolically (with summation over the repeated indices j0 and j1) as Q1(i1,i0) = Q0(j1,j0) * G(j1,j0,i1,i0), which is simply the reversed form of the standard notation Q1(i0,i1) = G(i0,i1,j0,j1) * Q0(j0,j1), given that a graphical representation of tensor circuits traditionally applies tensor operators (gates) from left to right. In this way, we conveniently ensure the standard row-following initialization of the tensor operator (gate) when using the C-language array initialization syntax. In the above example, tensor operator (gate) G has four modes and acts on two state modes.

Note

For the purpose of quantum circuit definition, our current convention conveniently allows initialization of a 2-qubit CNOT gate (tensor operator) with a C array with elements precisely following the canonical textbook (row-following) definition of the CNOT gate:

\[\begin{split} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \end{split}\]

Note

In case the tensor operator elements change their value while still residing at the same storage location, one must still call cutensornetStateUpdateTensorOperator to register such a change with the same pointer (storage location).

Warning

The pointer to the tensor operator elements is owned by the user and it must stay valid for the whole lifetime of the tensor network state, unless explicitly replaced by another pointer via cutensornetStateUpdateTensorOperator.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
numStateModes – [in] Number of state modes the tensor operator acts on.
stateModes – [in] Pointer to the state modes the tensor operator acts on.
tensorData – [in] Elements of the tensor operator (must be of the same data type as the elements of the state tensor).
tensorModeStrides – [in] Strides of the tensor operator data layout (note that the tensor operator has twice as many modes as the number of state modes it acts on). Passing NULL will assume the default generalized columnwise layout.
immutable – [in] Whether or not the tensor operator data may change during the lifetime of the tensor network state. Any data change must be registered via a call to cutensornetStateUpdateTensorOperator.
adjoint – [in] Whether or not the tensor operator is applied as an adjoint (ket and bra modes reversed, with all tensor elements complex conjugated).
unitary – [in] Whether or not the tensor operator is unitary with respect to the first and second halves of its modes.
tensorId – [out] Unique integer id (for later identification of the tensor operator).

`cutensornetStateApplyTensorOperator`¶

cutensornetStatus_t cutensornetStateApplyTensorOperator(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numStateModes, const int32_t *stateModes, void *tensorData, const int64_t *tensorModeStrides, const int32_t immutable, const int32_t adjoint, const int32_t unitary, int64_t *tensorId)¶

Applies a tensor operator to the tensor network state.

A tensor operator acts on a specified subset of the tensor state modes, where the number of state modes acted upon defines its rank. A tensor operator is represented by a tensor with twice as many modes as the number of state modes it acts on, where the first half of the tensor operator modes is contracted with the state modes of the input state tensor while the second half of the tensor operator modes forms the output state tensor modes. Since the default tensor storage strides follow the generalized columnwise layout, the action of a rank-2 tensor operator G on a rank-2 state tensor Q0 can be expressed symbolically (with summation over the repeated indices j0 and j1) as Q1(i1,i0) = Q0(j1,j0) * G(j1,j0,i1,i0), which is simply the reversed form of the standard notation Q1(i0,i1) = G(i0,i1,j0,j1) * Q0(j0,j1), given that a graphical representation of tensor circuits traditionally applies tensor operators (gates) from left to right. In this way, we conveniently ensure the standard row-following initialization of the tensor operator (gate) when using the C-language array initialization syntax. In the above example, tensor operator (gate) G has four modes and acts on two state modes.

Note

For the purpose of quantum circuit definition, our current convention conveniently allows initialization of a 2-qubit CNOT gate (tensor operator) with a C array with elements precisely following the canonical textbook (row-following) definition of the CNOT gate:

\[\begin{split} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \end{split}\]
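Why a row-by-row C array yields the textbook CNOT can be checked with index arithmetic: with generalized columnwise strides (1, 2, 4, 8), the element G(j1,j0,i1,i0) sits at offset j1 + 2*j0 + 4*i1 + 8*i0, which is exactly the row-major offset of matrix row 2*i0 + i1 and column 2*j0 + j1. The plain C sketch below (no cuTensorNet calls; the function names are hypothetical) evaluates Q1(i1,i0) = Q0(j1,j0) * G(j1,j0,i1,i0) literally, with state mode 0 taken as the most significant bit of the basis index and hence as the control qubit.

```c
#include <assert.h>

/* CNOT written row by row, exactly as the textbook matrix above. */
static const double cnot[16] = {
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 0, 1,
    0, 0, 1, 0,
};

/* Offset of element G(m0,m1,m2,m3) in the default generalized columnwise layout
   for four modes of extent 2: the first mode varies fastest (strides 1, 2, 4, 8). */
static int columnwise_offset(int m0, int m1, int m2, int m3)
{
    assert(m0 >= 0 && m0 < 2 && m1 >= 0 && m1 < 2 && m2 >= 0 && m2 < 2 && m3 >= 0 && m3 < 2);
    return m0 + 2 * m1 + 4 * m2 + 8 * m3;
}

/* Q1(i1,i0) = sum over j1,j0 of Q0(j1,j0) * G(j1,j0,i1,i0).
   The state tensors use the same columnwise convention: Q(a,b) is stored at a + 2*b. */
static void apply_two_qubit_gate(const double *g, const double *q0, double *q1)
{
    for (int i0 = 0; i0 < 2; ++i0)
        for (int i1 = 0; i1 < 2; ++i1) {
            double acc = 0.0;
            for (int j0 = 0; j0 < 2; ++j0)
                for (int j1 = 0; j1 < 2; ++j1)
                    acc += q0[j1 + 2 * j0] * g[columnwise_offset(j1, j0, i1, i0)];
            q1[i1 + 2 * i0] = acc;
        }
}
```

Applying the gate to the basis state with qubit 0 set (Q0 stored as {0, 0, 1, 0}) produces the state with both qubits set (Q1 = {0, 0, 0, 1}), i.e. |10> becomes |11>, as expected for a CNOT controlled by qubit 0.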

Note

In case the tensor operator elements change their value while still residing at the same storage location, one must still call cutensornetStateUpdateTensorOperator to register such a change with the same pointer (storage location).

Warning

The pointer to the tensor operator elements is owned by the user and it must stay valid for the whole lifetime of the tensor network state, unless explicitly replaced by another pointer via cutensornetStateUpdateTensorOperator.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
numStateModes – [in] Number of state modes the tensor operator acts on.
stateModes – [in] Pointer to the state modes the tensor operator acts on.
tensorData – [in] Elements of the tensor operator (must be of the same data type as the elements of the state tensor).
tensorModeStrides – [in] Strides of the tensor operator data layout (note that the tensor operator has twice as many modes as the number of state modes it acts on). Passing NULL will assume the default generalized columnwise layout.
immutable – [in] Whether or not the tensor operator data may change during the lifetime of the tensor network state. Any data change must be registered via a call to cutensornetStateUpdateTensorOperator.
adjoint – [in] Whether or not the tensor operator is applied as an adjoint (ket and bra modes reversed, with all tensor elements complex conjugated).
unitary – [in] Whether or not the tensor operator is unitary with respect to the first and second halves of its modes.
tensorId – [out] Unique integer id (for later identification of the tensor operator).
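Passing NULL for tensorModeStrides selects the default generalized columnwise layout, in which the first mode is contiguous and each subsequent stride is the product of all preceding extents. The C sketch below computes those strides; `default_columnwise_strides` is a hypothetical helper, and passing an explicit array with these values should describe the same layout as NULL. Because both halves of a tensor operator's modes refer to the same state modes, an operator acting on a qutrit and a qubit would have extents (3, 2, 3, 2).

```c
#include <assert.h>
#include <stdint.h>

/* Default generalized columnwise strides: the first mode has stride 1 and each
   following stride is the previous stride times the previous extent. For a tensor
   operator acting on n state modes, numModes is 2*n. */
static void default_columnwise_strides(int32_t numModes, const int64_t *extents, int64_t *strides)
{
    assert(numModes > 0);
    strides[0] = 1;
    for (int32_t k = 1; k < numModes; ++k)
        strides[k] = strides[k - 1] * extents[k - 1];
}
```

For a 2-qubit gate (extents 2, 2, 2, 2) this gives strides 1, 2, 4, 8; for the qutrit-qubit case it gives 1, 3, 6, 18.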

`cutensornetStateApplyControlledTensorOperator`¶

cutensornetStatus_t cutensornetStateApplyControlledTensorOperator(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numControlModes, const int32_t *stateControlModes, const int64_t *stateControlValues, int32_t numTargetModes, const int32_t *stateTargetModes, void *tensorData, const int64_t *tensorModeStrides, const int32_t immutable, const int32_t adjoint, const int32_t unitary, int64_t *tensorId)¶

Applies a controlled tensor operator to the tensor network state.

This API function performs the same operation as cutensornetStateApplyTensorOperator except that the tensor operator is specified via the control-target representation typical for multi-qubit quantum gates. Namely, only the target tensor of the full controlled tensor operator needs to be provided here (the number of modes in the provided target tensor is twice the number of the target state modes it acts on). The full tensor operator representation will be automatically generated from the target tensor and the list of control state modes/values.
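The control-target expansion described above can be illustrated for the simplest case of one control qudit and one target qudit. The C sketch below builds the full operator matrix from a real-valued target matrix: the target acts when the control qudit is in the basis state controlValue, and the identity acts otherwise. It is a mathematical illustration under assumed conventions (row-major matrices, control as the most significant index, a negative controlValue standing in for the documented NULL default); `expand_controlled` is hypothetical and says nothing about how cuTensorNet generates the operator internally.

```c
#include <assert.h>

/* Builds the full (dc*dt) x (dc*dt) matrix of a singly-controlled operator from its
   dt x dt target matrix (both row-major), with the control qudit as the most
   significant index. The target acts only when the control is in basis state
   controlValue; otherwise the identity acts. A negative controlValue mimics the
   documented NULL default: the last basis component (1 for a qubit). */
static void expand_controlled(int dc, int controlValue, int dt,
                              const double *target, double *full)
{
    assert(dc > 1 && dt > 0 && controlValue < dc);
    if (controlValue < 0)
        controlValue = dc - 1;
    const int n = dc * dt;
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            const int rc = r / dt, rt = r % dt; /* control and target parts of the row */
            const int cc = c / dt, ct = c % dt; /* control and target parts of the column */
            double v = 0.0;
            if (rc == cc) /* the control qudit itself is never changed */
                v = (rc == controlValue) ? target[rt * dt + ct] : (rt == ct ? 1.0 : 0.0);
            full[r * n + c] = v;
        }
}
```

With a qubit control (dc = 2), the default control value 1, and the Pauli-X target matrix, the result is exactly the CNOT matrix shown earlier in this section, which is why providing only the 2x2 target tensor suffices.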

Warning

Currently, only immutable controlled tensor operators are supported. This restriction may be lifted in the future.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
numControlModes – [in] Number of control state modes used by the tensor operator.
stateControlModes – [in] Controlling state modes used by the tensor operator.
stateControlValues – [in] Control values for the controlling state modes. A control value is the sequential integer id of the qudit basis component that activates the action of the target tensor operator. If NULL, all control values are assumed to be set to the max id (the last qudit basis component), which is 1 for qubits.
numTargetModes – [in] Number of target state modes acted on by the tensor operator.
stateTargetModes – [in] Target state modes acted on by the tensor operator.
tensorData – [in] Elements of the target tensor of the controlled tensor operator (must be of the same data type as the elements of the state tensor).
tensorModeStrides – [in] Strides of the tensor operator data layout (note that the tensor operator has twice as many modes as the number of target state modes it acts on). Passing NULL will assume the default generalized columnwise layout.
immutable – [in] Whether or not the tensor operator data may change during the lifetime of the tensor network state. Any data change must be registered via a call to cutensornetStateUpdateTensorOperator.
adjoint – [in] Whether or not the tensor operator is applied as an adjoint (ket and bra modes reversed, with all tensor elements complex conjugated).
unitary – [in] Whether or not the controlled tensor operator is unitary with respect to the first and second halves of its modes.
tensorId – [out] Unique integer id (for later identification of the tensor operator).

`cutensornetStateApplyNetworkOperator`¶

cutensornetStatus_t cutensornetStateApplyNetworkOperator(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, const cutensornetNetworkOperator_t tensorNetworkOperator, const int32_t immutable, const int32_t adjoint, const int32_t unitary, int64_t *operatorId)¶

Applies a tensor network operator to a tensor network state.

Note

Currently, applied tensor network operators are restricted to those containing only one component (either a tensor product or an MPO).

Note

The returned unique integer id (operatorId) defines the beginning of a contiguous range of unique integer ids associated with the tensors constituting the sole component of the tensor network operator: [operatorId..(operatorId + N - 1)], where N is the number of tensors constituting the sole component of the tensor network operator. The tensor ids from this contiguous range can then be used for registering updates on the corresponding tensors via cutensornetStateUpdateTensorOperator.

Warning

In the current release, only immutable tensor network operators are supported. This restriction may be lifted in the future.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
tensorNetworkOperator – [in] Tensor network operator containing only a single component.
immutable – [in] Whether or not the tensor network operator data may change during the lifetime of the tensor network state.
adjoint – [in] Whether or not the tensor network operator is applied as an adjoint.
unitary – [in] Whether or not the tensor network operator is unitary with respect to the first and second halves of its modes.
operatorId – [out] Unique integer id (for later identification of the tensor network operator).

`cutensornetStateUpdateTensor`¶

cutensornetStatus_t cutensornetStateUpdateTensor(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int64_t tensorId, void *tensorData, int32_t unitary)¶

Registers an external update of the elements of the specified tensor operator that was previously applied to the tensor network state.

Note

The provided pointer to the tensor elements location may or may not coincide with the originally used pointer. However, the originally provided strides of the tensor operator data layout are assumed applicable to the updated tensor operator data location, that is, one cannot change the storage strides during the tensor operator data update.

Warning

The pointer to the tensor operator elements is owned by the user and it must stay valid for the whole lifetime of the tensor network state, unless explicitly replaced by another pointer via cutensornetStateUpdateTensorOperator.

In the current release, controlled tensor operators cannot be updated.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
tensorId – [in] Tensor id assigned during the cutensornetStateApplyTensorOperator call.
tensorData – [in] Pointer to the updated elements of the tensor operator (tensor operator elements must be of the same type as the state tensor).
unitary – [in] Whether or not the tensor operator is unitary with respect to the first and second halves of its modes. This parameter is not applicable to the tensors that are part of a matrix product operator (MPO).

`cutensornetStateUpdateTensorOperator`¶

cutensornetStatus_t cutensornetStateUpdateTensorOperator(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int64_t tensorId, void *tensorData, int32_t unitary)¶

Registers an external update of the elements of the specified tensor operator that was previously applied to the tensor network state.

Note

The provided pointer to the tensor elements location may or may not coincide with the originally used pointer. However, the originally provided strides of the tensor operator data layout are assumed applicable to the updated tensor operator data location, that is, one cannot change the storage strides during the tensor operator data update.

Warning

The pointer to the tensor operator elements is owned by the user and it must stay valid for the whole lifetime of the tensor network state, unless explicitly replaced by another pointer via cutensornetStateUpdateTensorOperator.

In the current release, controlled tensor operators cannot be updated.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
tensorId – [in] Tensor id assigned during the cutensornetStateApplyTensorOperator call.
tensorData – [in] Pointer to the updated elements of the tensor operator (tensor operator elements must be of the same type as the state tensor).
unitary – [in] Whether or not the tensor operator is unitary with respect to the first and second halves of its modes. This parameter is not applicable to the tensors that are part of a matrix product operator (MPO).

`cutensornetStateConfigure`¶

cutensornetStatus_t cutensornetStateConfigure(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, cutensornetStateAttributes_t attribute, const void *attributeValue, size_t attributeSize)¶

Configures computation of the full tensor network state, either in the exact or a factorized form.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
attribute – [in] Configuration attribute.
attributeValue – [in] Pointer to the configuration attribute value (type-erased).
attributeSize – [in] The size of the configuration attribute value.

`cutensornetStatePrepare`¶

cutensornetStatus_t cutensornetStatePrepare(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, size_t maxWorkspaceSizeDevice, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t cudaStream)¶

Prepares computation of the full tensor network state, either in the exact or a factorized form.

Warning

The cudaStream argument is unused in the current release (can be set to 0x0).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
maxWorkspaceSizeDevice – [in] Upper limit on the amount of available GPU scratch memory (bytes).
workDesc – [out] Workspace descriptor (the required scratch/cache memory sizes will be set).
cudaStream – [in] CUDA stream.

`cutensornetStateGetInfo`¶

cutensornetStatus_t cutensornetStateGetInfo(const cutensornetHandle_t handle, const cutensornetState_t tensorNetworkState, cutensornetStateAttributes_t attribute, void *attributeValue, size_t attributeSize)¶

Retrieves an attribute related to computation of the full tensor network state, either in the exact or a factorized form.

Note

The Flop count INFO attribute may not always be available, in which case the returned value will be zero.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
attribute – [in] Information attribute.
attributeValue – [out] Pointer to the information attribute value (type-erased).
attributeSize – [in] The size of the information attribute value.

`cutensornetStateCompute`¶

cutensornetStatus_t cutensornetStateCompute(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, cutensornetWorkspaceDescriptor_t workDesc, int64_t *extentsOut[], int64_t *stridesOut[], void *stateTensorsOut[], cudaStream_t cudaStream)¶

Performs the actual computation of the full tensor network state, either in the exact or a factorized form.

Note

The lengths of extentsOut, stridesOut, and stateTensorsOut should correspond to the number of tensors in the final target state representation. For instance, if the final target state is factorized as an MPS with open boundary conditions, stateTensorsOut is expected to be an array of numStateModes pointers, and the buffer sizes for all stateTensorsOut[i] are expected to be consistent with the target extents specified in the cutensornetStateFinalizeMPS() call prior to the state computation. If no factorization is requested for the tensor network state, the shape and strides of a single full output state tensor, which is computed in this API call, will be returned.
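Sizing the stateTensorsOut buffers amounts to multiplying the extents of each site tensor by the element size. The C sketch below does this for an open-boundary MPS under stated assumptions: each bond extent is taken as the largest useful value, namely the dimension of the smaller side of the cut, capped by a hypothetical maxExtent that stands in for whatever target extents were passed to cutensornetStateFinalizeMPS(). The byte count does not depend on the order of the modes within each tensor; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Assumed extent of the bond between sites i and i+1: the dimension of the smaller
   side of the cut, further capped by maxExtent. Products are capped early to
   avoid overflow. */
static int64_t bond_extent(int32_t n, const int64_t *phys, int32_t i, int64_t maxExtent)
{
    int64_t left = 1, right = 1;
    for (int32_t k = 0; k <= i; ++k)
        left = min64(left * phys[k], maxExtent);
    for (int32_t k = i + 1; k < n; ++k)
        right = min64(right * phys[k], maxExtent);
    return min64(left, right);
}

/* Bytes for site tensor i of an open-boundary MPS: the physical extent times the
   left bond (absent for the first site) times the right bond (absent for the last
   site), times the element size (e.g., 16 for complex double). */
static int64_t mps_tensor_bytes(int32_t n, const int64_t *phys, int32_t i,
                                int64_t maxExtent, int64_t elemSize)
{
    assert(n >= 2 && i >= 0 && i < n && maxExtent > 0);
    int64_t volume = phys[i];
    if (i > 0)
        volume *= bond_extent(n, phys, i - 1, maxExtent);
    if (i < n - 1)
        volume *= bond_extent(n, phys, i, maxExtent);
    return volume * elemSize;
}
```

For 4 qubits, maxExtent = 8, and complex double elements (16 bytes), the bonds have extents 2, 4, 2 and the four buffers need 64, 256, 256, and 64 bytes; capping at maxExtent = 2 shrinks each of the two middle buffers to 128 bytes.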

Warning

The provided workspace descriptor workDesc must have the Device Scratch buffer set explicitly since user-provided memory pools are not supported in the current release. Additionally, the attached workspace buffer must be 256-byte aligned in the current release.
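When the scratch buffer is carved out of a larger device allocation, its offset and size can be rounded up to the 256-byte requirement with simple arithmetic. The C sketch below shows one way to do so; the helper names and the WORKSPACE_ALIGNMENT macro are hypothetical. Pointers returned directly by cudaMalloc are already aligned to at least 256 bytes, so the check matters mainly for sub-allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORKSPACE_ALIGNMENT 256u /* alignment required by the current release */

/* Rounds a byte count or offset up to the next multiple of WORKSPACE_ALIGNMENT. */
static size_t align_up(size_t bytes)
{
    assert(bytes <= SIZE_MAX - (WORKSPACE_ALIGNMENT - 1)); /* no overflow */
    return (bytes + WORKSPACE_ALIGNMENT - 1) / WORKSPACE_ALIGNMENT * WORKSPACE_ALIGNMENT;
}

/* Returns nonzero if the address is a multiple of WORKSPACE_ALIGNMENT. */
static int is_workspace_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % WORKSPACE_ALIGNMENT) == 0;
}
```

For example, placing the scratch buffer after a 1000-byte region of the same allocation would use the offset align_up(1000) = 1024 rather than 1000.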

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
workDesc – [in] Workspace descriptor (the required scratch/cache memory buffers must be set by the user).
extentsOut – [out] If not NULL, will hold the extents of all tensors defining the output state representation. Optionally, it can be NULL if this data is not needed.
stridesOut – [out] If not NULL, will hold the strides for all tensors defining the output state representation. Optionally, it can be NULL if this data is not needed.
stateTensorsOut – [inout] An array of pointers to GPU storage for all tensors defining the output state representation.
cudaStream – [in] CUDA stream.

`cutensornetGetOutputStateDetails`

cutensornetStatus_t cutensornetGetOutputStateDetails(const cutensornetHandle_t handle, const cutensornetState_t tensorNetworkState, int32_t *numTensorsOut, int32_t numModesOut[], int64_t *extentsOut[], int64_t *stridesOut[])

Queries the number of tensors, number of modes, extents, and strides for each of the final output state tensors.

Note

If all information regarding the output tensors is needed by the user, this function should be called three times: the first time to retrieve numTensorsOut for allocating numModesOut, the second time to retrieve numModesOut for allocating extentsOut and stridesOut, and the last time to retrieve extentsOut and stridesOut.
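The three-call sequence can be sketched as follows. Because the real function needs a live handle and state, the sketch calls a hypothetical stand-in, `mock_get_output_state_details`, that imitates only the NULL-pointer convention of the parameters (it always reports a fixed 3-qubit MPS); `OutputDetails`, `fetch_output_details`, and `free_output_details` are likewise illustrative names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in: mimics only the NULL-pointer convention and reports a
 * 3-qubit open-boundary MPS with bond extent 2 and column-major strides. */
static void mock_get_output_state_details(int32_t *numTensorsOut, int32_t numModesOut[],
                                          int64_t *extentsOut[], int64_t *stridesOut[])
{
    static const int32_t modes[3] = {2, 3, 2};
    static const int64_t ext[3][3] = {{2, 2, 0}, {2, 2, 2}, {2, 2, 0}};
    *numTensorsOut = 3;
    for (int32_t t = 0; t < 3; ++t) {
        if (numModesOut) numModesOut[t] = modes[t];
        int64_t stride = 1;
        for (int32_t m = 0; m < modes[t]; ++m) {
            if (extentsOut) extentsOut[t][m] = ext[t][m];
            if (stridesOut) stridesOut[t][m] = stride;
            stride *= ext[t][m];
        }
    }
}

typedef struct {
    int32_t numTensors;
    int32_t *numModes;
    int64_t **extents;
    int64_t **strides;
} OutputDetails;

static void free_output_details(OutputDetails *d)
{
    for (int32_t t = 0; t < d->numTensors; ++t) {
        if (d->extents) free(d->extents[t]);
        if (d->strides) free(d->strides[t]);
    }
    free(d->numModes);
    free(d->extents);
    free(d->strides);
    d->numModes = NULL;
    d->extents = d->strides = NULL;
}

/* Returns 0 on success, -1 on failure; call free_output_details() in either case. */
static int fetch_output_details(OutputDetails *d)
{
    /* Call 1: only the number of output tensors. */
    mock_get_output_state_details(&d->numTensors, NULL, NULL, NULL);
    if (d->numTensors <= 0) return -1;
    d->numModes = malloc(sizeof(int32_t) * (size_t)d->numTensors);
    d->extents = calloc((size_t)d->numTensors, sizeof(int64_t *));
    d->strides = calloc((size_t)d->numTensors, sizeof(int64_t *));
    if (!d->numModes || !d->extents || !d->strides) return -1;

    /* Call 2: the number of modes of each output tensor. */
    mock_get_output_state_details(&d->numTensors, d->numModes, NULL, NULL);
    for (int32_t t = 0; t < d->numTensors; ++t) {
        d->extents[t] = malloc(sizeof(int64_t) * (size_t)d->numModes[t]);
        d->strides[t] = malloc(sizeof(int64_t) * (size_t)d->numModes[t]);
        if (!d->extents[t] || !d->strides[t]) return -1;
    }

    /* Call 3: extents and strides of each output tensor. */
    mock_get_output_state_details(&d->numTensors, d->numModes, d->extents, d->strides);
    return 0;
}
```

In real code, the mock would be replaced by cutensornetGetOutputStateDetails(handle, tensorNetworkState, ...), with the returned status checked after each call.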

Warning

To retrieve numTensorsOut and numModesOut, it is not necessary to first compute the final target state via the cutensornetStateCompute() call. However, to obtain extentsOut and stridesOut, cutensornetStateCompute() may need to be called first to compute the output state factorization in case the output state is forced to be factorized (e.g., MPS-factorized).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
numTensorsOut – [out] On return, will hold the number of output state tensors (argument cannot be NULL).
numModesOut – [out] If not NULL, will hold the number of modes for each output state tensor. Optionally, can be NULL.
extentsOut – [out] If not NULL, will hold mode extents for each output state tensor. Optionally, can be NULL.
stridesOut – [out] If not NULL, will hold strides for each output state tensor. Optionally, can be NULL.

`cutensornetStateInitializeMPS`

cutensornetStatus_t cutensornetStateInitializeMPS(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, cutensornetBoundaryCondition_t boundaryCondition, const int64_t *const extentsIn[], const int64_t *const stridesIn[], void *stateTensorsIn[])

Imposes a user-defined MPS (Matrix Product State) factorization on the initial tensor network state with the given shape and data.

Note

This API function may be called at any time during the lifetime of the tensor network state to modify its initial state. If not called, the initial state will stay in the default vacuum state.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
boundaryCondition – [in] The boundary condition of the chosen MPS representation.
extentsIn – [in] Array of size nStateModes specifying the extents of all tensors defining the initial MPS representation. extentsIn[i] is expected to be consistent with the mode order (shared mode between the (i-1)th and i-th MPS tensor, state mode of the i-th MPS tensor, shared mode between the i-th and (i+1)th MPS tensor). For the open boundary condition, the modes of the first tensor are reduced to (state mode, shared mode with the second site), while the modes of the last tensor become (shared mode with the second-to-last site, state mode).
stridesIn – [in] Array of size nStateModes specifying the strides of all tensors in the chosen MPS representation. Similar to extentsIn, stridesIn is also expected to be consistent with the mode order of each MPS tensor. If NULL, the default generalized column-major strides will be assumed.
stateTensorsIn – [in] Array of size nStateModes specifying the data for all tensors defining the chosen MPS representation. If NULL, the initial MPS-factorized state will represent the vacuum state.
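When stridesIn is NULL, the generalized column-major layout is assumed, meaning the first mode varies fastest. The sketch below shows what those default strides are and how a multi-index maps to a storage offset; the helper names are hypothetical and no library calls are made:

```c
#include <assert.h>
#include <stdint.h>

/* Default generalized column-major strides: the first mode is contiguous and each
 * following stride is the previous stride times the previous extent. */
static void column_major_strides(int32_t numModes, const int64_t extents[], int64_t strides[])
{
    int64_t stride = 1;
    for (int32_t m = 0; m < numModes; ++m) {
        assert(extents[m] > 0);
        strides[m] = stride;
        stride *= extents[m];
    }
}

/* Storage offset (in elements) of the multi-index `index` under `strides`. */
static int64_t element_offset(int32_t numModes, const int64_t index[], const int64_t strides[])
{
    int64_t offset = 0;
    for (int32_t m = 0; m < numModes; ++m)
        offset += index[m] * strides[m];
    return offset;
}
```

For example, a middle-site tensor with extents {4, 2, 3} (left shared mode, qubit state mode, right shared mode) gets strides {1, 4, 8}.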

`cutensornetStateFinalizeMPS`

cutensornetStatus_t cutensornetStateFinalizeMPS(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, cutensornetBoundaryCondition_t boundaryCondition, const int64_t *const extentsOut[], const int64_t *const stridesOut[])

Imposes a user-defined MPS (Matrix Product State) factorization on the final tensor network state with the given shape.

Calling this API function only specifies the desired target tensor network state representation (MPS representation); no actual computation is performed. Tensors constituting the original tensor network state may still be updated with new data after this call. The actual MPS factorization of the tensor network state will be computed by the subsequent cutensornetStatePrepare() and cutensornetStateCompute() calls.

Note

The current MPS factorization feature is provided as a preview, with more optimizations and enhanced functionality coming up in future releases. In the current release, the primary goal of this feature is to facilitate implementation of the MPS compression of tensor network states via a convenient high-level interface, targeting a broader community of users interested in adding MPS algorithms to their simulators.

Note

extentsOut can be used to specify the extent truncation for the shared bond between adjacent MPS tensors.

Warning

The current cuTensorNet library release supports MPS factorization of tensor network states with two-dimensional state modes only (qubits only, using the quantum computing language).

In the current release, the MPS factorization does not benefit from distributed execution.

Warning

If value-based SVD truncation is specified in CUTENSORNET_STATE_MPS_SVD_CONFIG, extentsOut and stridesOut may not be respected during execution (e.g., in cutensornetStateCompute()). In such cases, users can query runtime values of extentsOut and stridesOut in cutensornetStateCompute() by providing valid pointers.

In the current version, if tensorNetworkState has different extents on different modes, the exact MPS factorization cannot be computed if there are operators acting on two non-adjacent modes.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [inout] Tensor network state.
boundaryCondition – [in] The boundary condition of the target MPS representation.
extentsOut – [in] Array of size nStateModes specifying the maximal extents of all tensors defining the target MPS representation. extentsOut[i] is expected to be consistent with the mode order (shared mode between the (i-1)th and i-th MPS tensor, state mode of the i-th MPS tensor, shared mode between the i-th and (i+1)th MPS tensor). For the open boundary condition, the modes of the first tensor are reduced to (state mode, shared mode with the second site), while the modes of the last tensor become (shared mode with the second-to-last site, state mode).
stridesOut – [in] Array of size nStateModes specifying the strides of all tensors defining the target MPS representation. Similar to extentsOut, stridesOut is also expected to be consistent with the mode order of each MPS tensor. If NULL, the default generalized column-major strides will be assumed.
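A sketch of building the extentsOut arrays for an open-boundary qubit MPS in the mode order described above. The bond extents use the standard bound that an exact MPS needs a bond dimension of at most min(2^(i+1), 2^(n-i-1)) across the cut after site i, and `maxExtent` is an assumed user-chosen cap that plays the role of the bond truncation mentioned in the Note. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* 2^k, saturated at `cap` so that large k cannot overflow. */
static int64_t pow2_capped(int32_t k, int64_t cap)
{
    assert(cap >= 1);
    int64_t v = 1;
    for (int32_t i = 0; i < k && v < cap; ++i) v *= 2;
    return v < cap ? v : cap;
}

/* Maximal extent of the shared mode between sites i and i+1 of an n-qubit MPS. */
static int64_t bond_extent(int32_t n, int32_t i, int64_t maxExtent)
{
    int64_t left = pow2_capped(i + 1, maxExtent);
    int64_t right = pow2_capped(n - i - 1, maxExtent);
    return left < right ? left : right;
}

/* Fill extents[s] for every site s in the documented order:
 * first (state, right), middle (left, state, right), last (left, state). */
static void open_mps_extents(int32_t n, int64_t maxExtent, int64_t extents[][3])
{
    assert(n >= 2);
    for (int32_t s = 0; s < n; ++s) {
        if (s == 0) {
            extents[s][0] = 2;
            extents[s][1] = bond_extent(n, 0, maxExtent);
        } else if (s == n - 1) {
            extents[s][0] = bond_extent(n, n - 2, maxExtent);
            extents[s][1] = 2;
        } else {
            extents[s][0] = bond_extent(n, s - 1, maxExtent);
            extents[s][1] = 2;
            extents[s][2] = bond_extent(n, s, maxExtent);
        }
    }
}
```

To pass the result to cutensornetStateFinalizeMPS(), each element of a `const int64_t *extentsOut[n]` array can point at the corresponding row of the 2-D array.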

`cutensornetCreateNetworkOperator`

cutensornetStatus_t cutensornetCreateNetworkOperator(const cutensornetHandle_t handle, int32_t numStateModes, const int64_t stateModeExtents[], cudaDataType_t dataType, cutensornetNetworkOperator_t *tensorNetworkOperator)

Creates an uninitialized tensor network operator of a given shape defined by the number of state modes and their extents.

A tensor network operator is an operator that maps tensor network states from the primary direct-product space back to the same tensor space. The shape of the tensor network operator is defined by the number of state modes and their extents, which should match the definition of the tensor network states the operator will act on. Note that, formally, the declared tensor network operator has twice as many modes as the number of defining state modes: the first half corresponds to the primary direct-product space it acts on, while the second half corresponds to the same primary direct-product space where the resulting tensor network state lives.

Note

This API defines an abstract uninitialized tensor network operator. Users may later initialize it using some concrete structure by appending components to it.

Parameters

handle – [in] cuTensorNet library handle.
numStateModes – [in] The number of state modes the operator acts on.
stateModeExtents – [in] An array of size numStateModes specifying the extent of each state mode acted on.
dataType – [in] Data type of the operator.
tensorNetworkOperator – [out] Tensor network operator (empty at this point).

`cutensornetNetworkOperatorAppendProduct`

cutensornetStatus_t cutensornetNetworkOperatorAppendProduct(const cutensornetHandle_t handle, cutensornetNetworkOperator_t tensorNetworkOperator, cuDoubleComplex coefficient, int32_t numTensors, const int32_t numStateModes[], const int32_t *stateModes[], const int64_t *tensorModeStrides[], const void *tensorData[], int64_t *componentId)

Appends a tensor product operator component to the tensor network operator.

A tensor product operator component is defined as a tensor product of one or more tensor operators acting on disjoint subsets of state modes. Note that each tensor operator (tensor factor) in the specified tensor product has twice as many modes as the number of state modes it acts on; specifically, the first half of the tensor operator modes will be contracted with the state modes. A typical example is a tensor product of Pauli matrices in which each Pauli matrix acts on a specific mode of the tensor network state. This API function is used to define a tensor network operator as a sum of tensor operator products with complex coefficients.
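For instance, a Z⊗Z component acting on qubits q0 and q1 consists of two tensor factors, each acting on one state mode and therefore having two modes of extent 2. The sketch below only assembles the host-side argument arrays for such a component; the struct `ZZComponentArgs` and the helper functions are hypothetical, and in a real call the tensorData pointers must refer to GPU copies of the Pauli tensors (the host array here is a placeholder). Pauli Z is diagonal, so its layout does not depend on which of its two modes is the contracted one:

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>

/* Pauli Z stored as a 2x2 tensor in column-major order. */
static const double complex PAULI_Z[4] = {1.0, 0.0, 0.0, -1.0};

/* A factor acting on k qubit state modes has 2k modes of extent 2, i.e. 4^k elements. */
static int64_t factor_num_elements(int32_t k)
{
    assert(k >= 0 && k < 31);
    return (int64_t)1 << (2 * k);
}

/* Hypothetical holder for the host-side arguments of one Z(q0) x Z(q1) component. */
typedef struct {
    int32_t numTensors;
    int32_t numStateModes[2];    /* state modes acted on by each factor */
    int32_t modes0[1], modes1[1];
    const int32_t *stateModes[2];
    const void *tensorData[2];   /* must be GPU pointers in a real call */
} ZZComponentArgs;

static void make_zz_component(ZZComponentArgs *a, int32_t q0, int32_t q1,
                              const void *zData0, const void *zData1)
{
    assert(q0 != q1); /* factors must act on disjoint subsets of state modes */
    a->numTensors = 2;
    a->numStateModes[0] = 1;
    a->numStateModes[1] = 1;
    a->modes0[0] = q0;
    a->modes1[0] = q1;
    a->stateModes[0] = a->modes0;
    a->stateModes[1] = a->modes1;
    a->tensorData[0] = zData0;
    a->tensorData[1] = zData1;
}
```

These arrays correspond to the numTensors, numStateModes, stateModes, and tensorData parameters, with tensorModeStrides set to NULL for the default strides. Because `stateModes` points into the struct itself, the struct should be filled in place rather than copied.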

Note

All user-provided tensors used to define a tensor network operator must stay alive during the entire lifetime of the tensor network operator.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkOperator – [inout] Tensor network operator.
coefficient – [in] Complex coefficient associated with the appended operator component.
numTensors – [in] Number of tensor factors in the tensor product.
numStateModes – [in] Number of state modes each appended tensor factor acts on.
stateModes – [in] Modes each appended tensor factor acts on (length = numStateModes).
tensorModeStrides – [in] Tensor mode strides for each tensor factor (length = numStateModes * 2). If NULL, the default generalized column-major strides will be used.
tensorData – [in] Tensor data stored in GPU memory for each tensor factor.
componentId – [out] Unique sequential integer identifier of the appended tensor network operator component.

`cutensornetNetworkOperatorAppendMPO`

cutensornetStatus_t cutensornetNetworkOperatorAppendMPO(const cutensornetHandle_t handle, cutensornetNetworkOperator_t tensorNetworkOperator, cuDoubleComplex coefficient, int32_t numStateModes, const int32_t stateModes[], const int64_t *tensorModeExtents[], const int64_t *tensorModeStrides[], const void *tensorData[], cutensornetBoundaryCondition_t boundaryCondition, int64_t *componentId)

Appends a Matrix Product Operator (MPO) component to the tensor network operator.

The modes of the MPO tensors follow the standard cuTensorNet convention (each internal MPO tensor has four modes):

Mode 0: (i-1)th - (i)th connection;
Mode 1: (i)th site open mode acting on the ket state mode;
Mode 2: (i)th - (i+1)th connection;
Mode 3: (i)th site open mode acting on the bra state mode.

When the open boundary condition is requested, the first MPO tensor will have mode 0 removed while the last MPO tensor will have mode 2 removed, both having only three modes (in order).
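A sketch of the per-site extent arrays implied by this mode convention for an open-boundary MPO on n sites with local dimension d; `bonds[i]` (an assumed input) is the extent of the connection between sites i and i+1, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Fill numModes[s] and extents[s] for each MPO site (open boundary):
 * internal (left, ket, right, bra); first (ket, right, bra); last (left, ket, bra). */
static void open_mpo_extents(int32_t n, int64_t d, const int64_t bonds[],
                             int32_t numModes[], int64_t extents[][4])
{
    assert(n >= 2 && d >= 1);
    for (int32_t s = 0; s < n; ++s) {
        if (s == 0) {                    /* mode 0 removed */
            numModes[s] = 3;
            extents[s][0] = d;
            extents[s][1] = bonds[0];
            extents[s][2] = d;
        } else if (s == n - 1) {         /* mode 2 removed */
            numModes[s] = 3;
            extents[s][0] = bonds[n - 2];
            extents[s][1] = d;
            extents[s][2] = d;
        } else {
            numModes[s] = 4;
            extents[s][0] = bonds[s - 1];
            extents[s][1] = d;
            extents[s][2] = bonds[s];
            extents[s][3] = d;
        }
    }
}
```

The tensorModeExtents argument is an array of pointers, so each of its elements can point at the corresponding row of `extents`.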

Note

All user-provided MPO tensors used to define a tensor network operator must stay alive during the entire lifetime of the tensor network operator.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkOperator – [inout] Tensor network operator.
coefficient – [in] Complex coefficient associated with the appended operator component.
numStateModes – [in] Number of state modes the MPO acts on (number of tensors in the MPO).
stateModes – [in] State modes the MPO acts on.
tensorModeExtents – [in] Tensor mode extents for each MPO tensor.
tensorModeStrides – [in] Storage strides for each MPO tensor or NULL (default generalized column-major strides).
tensorData – [in] Tensor data stored in GPU memory for each MPO tensor factor.
boundaryCondition – [in] MPO boundary condition.
componentId – [out] Unique sequential integer identifier of the appended tensor network operator component.

`cutensornetDestroyNetworkOperator`

cutensornetStatus_t cutensornetDestroyNetworkOperator(cutensornetNetworkOperator_t tensorNetworkOperator)

Frees all resources owned by the tensor network operator.

Parameters: tensorNetworkOperator – [inout] Tensor network operator.

`cutensornetCreateAccessor`

cutensornetStatus_t cutensornetCreateAccessor(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numProjectedModes, const int32_t *projectedModes, const int64_t *amplitudesTensorStrides, cutensornetStateAccessor_t *tensorNetworkAccessor)

Creates a tensor network state amplitudes accessor.

The state amplitudes accessor allows the user to extract single state amplitudes (elements of the state tensor), slices of state amplitudes (slices of the state tensor), as well as the full state tensor. A specific slice is chosen by specifying the projected modes of the tensor network state, that is, a subset of the tensor network state modes that will be projected to specific basis vectors during the computation. The remaining tensor network state modes (open modes), in their respective relative order, define the shape of the resulting state amplitudes tensor requested by the user.
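The following sketch shows how the projected modes determine the shape of the requested slice: it lists the open modes in their relative order and multiplies their extents. Projecting every mode yields a single amplitude, and projecting none yields the full state tensor. The helper is hypothetical and makes no library calls:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the open (non-projected) modes of an n-mode state in their relative order
 * and return the number of amplitudes in the slice. `projected` may be NULL when
 * numProjected is 0; modeExtents[m] is the extent of state mode m. */
static int64_t open_modes_and_slice_size(int32_t n, const int64_t modeExtents[],
                                         int32_t numProjected, const int32_t projected[],
                                         int32_t openModes[], int32_t *numOpen)
{
    int64_t size = 1;
    *numOpen = 0;
    for (int32_t m = 0; m < n; ++m) {
        bool isProjected = false;
        for (int32_t p = 0; p < numProjected; ++p) {
            if (projected[p] == m) { isProjected = true; break; }
        }
        if (!isProjected) {
            openModes[(*numOpen)++] = m;
            size *= modeExtents[m];
        }
    }
    return size;
}
```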

Note

The provided tensor network state must stay alive during the lifetime of the state amplitudes accessor. Additionally, applying a tensor operator to the tensor network state after it was used to create the state amplitudes accessor will invalidate the state amplitudes accessor. On the other hand, simply updating tensor operator data via cutensornetStateUpdateTensorOperator() is allowed.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Defined tensor network state.
numProjectedModes – [in] Number of projected state modes (tensor network state modes projected to specific basis vectors).
projectedModes – [in] Projected state modes (may be NULL when none or all modes are projected).
amplitudesTensorStrides – [in] Mode strides for the resulting amplitudes tensor. If NULL, the default generalized column-major strides will be assumed.
tensorNetworkAccessor – [out] Tensor network state amplitudes accessor.

`cutensornetDestroyAccessor`

cutensornetStatus_t cutensornetDestroyAccessor(cutensornetStateAccessor_t tensorNetworkAccessor)

Destroys the tensor network state amplitudes accessor.

Parameters: tensorNetworkAccessor – [inout] Tensor network state amplitudes accessor.

`cutensornetAccessorConfigure`

cutensornetStatus_t cutensornetAccessorConfigure(const cutensornetHandle_t handle, cutensornetStateAccessor_t tensorNetworkAccessor, cutensornetAccessorAttributes_t attribute, const void *attributeValue, size_t attributeSize)

Configures computation of the requested tensor network state amplitudes tensor.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkAccessor – [inout] Tensor network state amplitudes accessor.
attribute – [in] Configuration attribute.
attributeValue – [in] Pointer to the configuration attribute value (type-erased).
attributeSize – [in] The size of the configuration attribute value.

`cutensornetAccessorPrepare`

cutensornetStatus_t cutensornetAccessorPrepare(const cutensornetHandle_t handle, cutensornetStateAccessor_t tensorNetworkAccessor, size_t maxWorkspaceSizeDevice, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t cudaStream)

Prepares computation of the requested tensor network state amplitudes tensor.

Warning

The cudaStream argument is unused in the current release (can be set to 0x0).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkAccessor – [in] Tensor network state amplitudes accessor.
maxWorkspaceSizeDevice – [in] Upper limit on the amount of available GPU scratch memory (bytes).
workDesc – [out] Workspace descriptor (the required scratch/cache memory sizes will be set).
cudaStream – [in] CUDA stream.

`cutensornetAccessorGetInfo`

cutensornetStatus_t cutensornetAccessorGetInfo(const cutensornetHandle_t handle, const cutensornetStateAccessor_t tensorNetworkAccessor, cutensornetAccessorAttributes_t attribute, void *attributeValue, size_t attributeSize)

Retrieves an attribute related to computation of the requested tensor network state amplitudes tensor.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkAccessor – [in] Tensor network state amplitudes accessor.
attribute – [in] Information attribute.
attributeValue – [out] Pointer to the information attribute value (type-erased).
attributeSize – [in] The size of the information attribute value.

`cutensornetAccessorCompute`

cutensornetStatus_t cutensornetAccessorCompute(const cutensornetHandle_t handle, cutensornetStateAccessor_t tensorNetworkAccessor, const int64_t *projectedModeValues, cutensornetWorkspaceDescriptor_t workDesc, void *amplitudesTensor, void *stateNorm, cudaStream_t cudaStream)

Computes the amplitudes of the tensor network state.

Note

The computed amplitudes are not normalized automatically in cases when the tensor circuit state is not guaranteed to have a unity norm. In such cases, the squared state norm is returned as a separate parameter.

Warning

The provided workspace descriptor workDesc must have the Device Scratch buffer set explicitly since user-provided memory pools are not supported in the current release. Additionally, the attached workspace buffer must be 256-byte aligned in the current release.

In the current release, the execution of this API function will synchronize the provided CUDA stream. This restriction may be lifted in the future.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkAccessor – [in] Tensor network state amplitudes accessor.
projectedModeValues – [in] The values of the projected state modes or NULL pointer if there are no projected modes.
workDesc – [in] Workspace descriptor (the required scratch/cache memory buffers must be set by the user).
amplitudesTensor – [inout] Storage for the computed tensor network state amplitudes tensor.
stateNorm – [out] The squared 2-norm of the underlying tensor circuit state (Host pointer). The returned scalar will have the same numerical data type as the tensor circuit state. Providing a NULL pointer will ignore norm calculation.
cudaStream – [in] CUDA stream.

`cutensornetCreateExpectation`

cutensornetStatus_t cutensornetCreateExpectation(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, cutensornetNetworkOperator_t tensorNetworkOperator, cutensornetStateExpectation_t *tensorNetworkExpectation)

Creates a representation of the tensor network state expectation value.

The tensor network state expectation value is the expectation value of the given tensor network operator with respect to the given tensor network state. Note that the computed expectation value is unnormalized, with the norm of the tensor network state returned separately (optionally).

Note

The provided tensor network state must stay alive during the lifetime of the tensor network state expectation value. Additionally, applying a tensor operator to the tensor network state after it was used to create the tensor network state expectation value will invalidate the tensor network state expectation value. On the other hand, simply updating tensor operator data via cutensornetStateUpdateTensorOperator() is allowed.

The provided tensor network operator must stay alive during the lifetime of the tensor network state expectation value. Additionally, appending new components to the tensor network operator after it was used to create the tensor network state expectation value will invalidate the tensor network state expectation value. On the other hand, simply updating the tensor data inside the tensor network operator is allowed.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Defined tensor network state.
tensorNetworkOperator – [in] Defined tensor network operator.
tensorNetworkExpectation – [out] Tensor network expectation value representation.

`cutensornetDestroyExpectation`

cutensornetStatus_t cutensornetDestroyExpectation(cutensornetStateExpectation_t tensorNetworkExpectation)

Destroys the tensor network state expectation value representation.

Parameters: tensorNetworkExpectation – [inout] Tensor network state expectation value representation.

`cutensornetExpectationConfigure`

cutensornetStatus_t cutensornetExpectationConfigure(const cutensornetHandle_t handle, cutensornetStateExpectation_t tensorNetworkExpectation, cutensornetExpectationAttributes_t attribute, const void *attributeValue, size_t attributeSize)

Configures computation of the requested tensor network state expectation value.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkExpectation – [inout] Tensor network state expectation value representation.
attribute – [in] Configuration attribute.
attributeValue – [in] Pointer to the configuration attribute value (type-erased).
attributeSize – [in] The size of the configuration attribute value.

`cutensornetExpectationPrepare`

cutensornetStatus_t cutensornetExpectationPrepare(const cutensornetHandle_t handle, cutensornetStateExpectation_t tensorNetworkExpectation, size_t maxWorkspaceSizeDevice, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t cudaStream)

Prepares computation of the requested tensor network state expectation value.

Warning

The cudaStream argument is unused in the current release (can be set to 0x0).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkExpectation – [in] Tensor network state expectation value representation.
maxWorkspaceSizeDevice – [in] Upper limit on the amount of available GPU scratch memory (bytes).
workDesc – [out] Workspace descriptor (the required scratch/cache memory sizes will be set).
cudaStream – [in] CUDA stream.

`cutensornetExpectationGetInfo`

cutensornetStatus_t cutensornetExpectationGetInfo(const cutensornetHandle_t handle, const cutensornetStateExpectation_t tensorNetworkExpectation, cutensornetExpectationAttributes_t attribute, void *attributeValue, size_t attributeSize)

Retrieves an attribute related to computation of the requested tensor network state expectation value.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkExpectation – [in] Tensor network state expectation value representation.
attribute – [in] Information attribute.
attributeValue – [out] Pointer to the information attribute value (type-erased).
attributeSize – [in] The size of the information attribute value.

`cutensornetExpectationCompute`

cutensornetStatus_t cutensornetExpectationCompute(const cutensornetHandle_t handle, cutensornetStateExpectation_t tensorNetworkExpectation, cutensornetWorkspaceDescriptor_t workDesc, void *expectationValue, void *stateNorm, cudaStream_t cudaStream)

Computes an (unnormalized) expectation value of a given tensor network operator over a given tensor network state.

Note

The computed expectation value is not normalized automatically in cases when the tensor network state is not guaranteed to have a unity norm. In such cases, the squared state norm is returned as a separate parameter. The true tensor network state expectation value can then be obtained by dividing the returned unnormalized expectation value by the returned squared state norm.

Warning

The provided workspace descriptor workDesc must have the Device Scratch buffer set explicitly since user-provided memory pools are not supported in the current release. Additionally, the attached workspace buffer must be 256-byte aligned in the current release.

In the current release, the execution of this API function will synchronize the provided CUDA stream. This restriction may be lifted in the future.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkExpectation – [in] Tensor network state expectation value representation.
workDesc – [in] Workspace descriptor (the required scratch/cache memory buffers must be set by the user).
expectationValue – [out] Computed unnormalized tensor network state expectation value (Host pointer). The returned scalar will have the same numerical data type as the tensor circuit state.
stateNorm – [out] The squared 2-norm of the underlying tensor circuit state (Host pointer). The returned scalar will have the same numerical data type as the tensor circuit state. Providing a NULL pointer will ignore norm calculation.
cudaStream – [in] CUDA stream.

`cutensornetCreateMarginal`

cutensornetStatus_t cutensornetCreateMarginal(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numMarginalModes, const int32_t *marginalModes, int32_t numProjectedModes, const int32_t *projectedModes, const int64_t *marginalTensorStrides, cutensornetStateMarginal_t *tensorNetworkMarginal)

Creates a representation of the specified marginal tensor for a given tensor network state.

The tensor network state marginal tensor is formed by a direct product of the tensor network state with its dual (conjugated) state, followed by a trace over all state modes except the explicitly specified so-called open state modes. The order of the specified open state modes will be respected when computing the tensor network state marginal tensor. Additionally, prior to tracing, some of the state modes can optionally be projected to specific individual basis states of those modes, thus forming the so-called projected modes, which will not be involved in tracing. Note that the resulting marginal tensor has twice as many modes as the number of specified open modes, the first half coming from the primary direct-product space and the second half symmetrically coming from the dual (conjugate) space.
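For k qubit open modes stored with the default generalized column-major strides, the 2k marginal tensor modes (k primary, then k dual) can be viewed as a 2^k x 2^k matrix whose element (i, j) sits at offset i + 2^k * j, where bit m of i holds the value of the m-th specified open mode. The hypothetical helpers below read elements, extract the diagonal (the weights of each open-mode configuration), and compute the trace, which, by the definition above, equals the squared norm of the state when no modes are projected:

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>

/* Element (i, j) of a k-qubit marginal tensor viewed as a 2^k x 2^k matrix. */
static double complex marginal_element(const double complex *rho, int32_t k,
                                       int64_t i, int64_t j)
{
    int64_t dim = (int64_t)1 << k;
    assert(i >= 0 && i < dim && j >= 0 && j < dim);
    return rho[i + dim * j];
}

/* Trace of the marginal; if `probs` is not NULL it receives the real diagonal entries. */
static double complex marginal_trace(const double complex *rho, int32_t k, double probs[])
{
    int64_t dim = (int64_t)1 << k;
    double complex trace = 0.0;
    for (int64_t i = 0; i < dim; ++i) {
        double complex diag = marginal_element(rho, k, i, i);
        if (probs) probs[i] = creal(diag);
        trace += diag;
    }
    return trace;
}
```

For a normalized two-qubit Bell state, for example, the marginal over both qubits has 0.5 at offsets 0, 3, 12, and 15, a trace of 1, and diagonal weights {0.5, 0, 0, 0.5}.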

Note

In the quantum domain, the marginal tensor is known as the reduced density matrix. For example, in quantum circuit simulations, the reduced density matrix is specified by the state modes which are kept intact and the remaining state modes which are traced over. Additionally, prior to tracing, one can project certain qudit modes to specific individual basis states of those modes, resulting in a projected reduced density matrix.

The provided tensor network state must stay alive during the lifetime of the tensor network state marginal. Additionally, applying a tensor operator to the tensor network state after it was used to create the tensor network state marginal will invalidate the tensor network state marginal. On the other hand, simply updating tensor operator data via cutensornetStateUpdateTensorOperator() is allowed.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
numMarginalModes – [in] Number of open state modes defining the marginal tensor.
marginalModes – [in] Pointer to the open state modes defining the marginal tensor.
numProjectedModes – [in] Number of projected state modes.
projectedModes – [in] Pointer to the projected state modes.
marginalTensorStrides – [in] Storage strides for the marginal tensor (number of tensor modes is twice the number of the defining open modes). If NULL, the default generalized column-major strides will be assumed.
tensorNetworkMarginal – [out] Tensor network state marginal.

`cutensornetDestroyMarginal`

cutensornetStatus_t cutensornetDestroyMarginal(cutensornetStateMarginal_t tensorNetworkMarginal)

Destroys the tensor network state marginal.

Parameters: tensorNetworkMarginal – [in] Tensor network state marginal representation.

`cutensornetMarginalConfigure`

cutensornetStatus_t cutensornetMarginalConfigure(const cutensornetHandle_t handle, cutensornetStateMarginal_t tensorNetworkMarginal, cutensornetMarginalAttributes_t attribute, const void *attributeValue, size_t attributeSize)

Configures computation of the requested tensor network state marginal tensor.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkMarginal – [inout] Tensor network state marginal representation.
attribute – [in] Configuration attribute.
attributeValue – [in] Pointer to the configuration attribute value (type-erased).
attributeSize – [in] The size of the configuration attribute value.

`cutensornetMarginalPrepare`

cutensornetStatus_t cutensornetMarginalPrepare(const cutensornetHandle_t handle, cutensornetStateMarginal_t tensorNetworkMarginal, size_t maxWorkspaceSizeDevice, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t cudaStream)

Prepares computation of the requested tensor network state marginal tensor.

Warning

The cudaStream argument is unused in the current release (can be set to 0x0).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkMarginal – [in] Tensor network state marginal representation.
maxWorkspaceSizeDevice – [in] Upper limit on the amount of available GPU scratch memory (bytes).
workDesc – [out] Workspace descriptor (the required scratch/cache memory sizes will be set).
cudaStream – [in] CUDA stream.

`cutensornetMarginalGetInfo`

cutensornetStatus_t cutensornetMarginalGetInfo(const cutensornetHandle_t handle, const cutensornetStateMarginal_t tensorNetworkMarginal, cutensornetMarginalAttributes_t attribute, void *attributeValue, size_t attributeSize)

Retrieves an attribute related to computation of the requested tensor network state marginal tensor.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkMarginal – [in] Tensor network state marginal representation.
attribute – [in] Information attribute.
attributeValue – [out] Pointer to the information attribute value (type-erased).
attributeSize – [in] The size of the information attribute value.

`cutensornetMarginalCompute`

cutensornetStatus_t cutensornetMarginalCompute(const cutensornetHandle_t handle, cutensornetStateMarginal_t tensorNetworkMarginal, const int64_t *projectedModeValues, cutensornetWorkspaceDescriptor_t workDesc, void *marginalTensor, cudaStream_t cudaStream)

Computes the requested tensor network state marginal tensor.

Warning

The provided workspace descriptor workDesc must have the Device Scratch buffer set explicitly since user-provided memory pools are not supported in the current release. Additionally, the attached workspace buffer must be 256-byte aligned in the current release.

In the current release, the execution of this API function will synchronize the provided CUDA stream. This restriction may be lifted in the future.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkMarginal – [in] Tensor network state marginal representation.
projectedModeValues – [in] Pointer to the values of the projected modes. Each integer value corresponds to a basis state of the given (projected) state mode.
workDesc – [in] Workspace descriptor (the required scratch/cache memory buffers must be set by the user).
marginalTensor – [out] Pointer to the GPU storage of the marginal tensor which will be computed in this call.
cudaStream – [in] CUDA stream.

`cutensornetCreateSampler`

cutensornetStatus_t cutensornetCreateSampler(const cutensornetHandle_t handle, cutensornetState_t tensorNetworkState, int32_t numModesToSample, const int32_t *modesToSample, cutensornetStateSampler_t *tensorNetworkSampler)

Creates a tensor network state sampler.

A tensor network state sampler produces samples from the state tensor with the probability equal to the squared absolute value of the corresponding element of the state tensor. One can also choose any subset of tensor network state modes to sample only from the subspace spanned by them. The order of specified state modes will be respected when producing the output samples.
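As a CPU reference for the sampling semantics only (not the library's algorithm), the sketch below draws samples of selected qubit modes from a small dense state vector: it picks a basis state with probability proportional to its squared amplitude and then reads out the requested modes in the requested order, which is equivalent to sampling the marginal distribution of those modes. The helpers, the xorshift64 generator, and the convention that mode m is bit m of the basis index are assumptions for illustration:

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>

/* xorshift64 pseudo-random generator (illustration only; seed must be nonzero). */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

static double squared_magnitude(double complex a)
{
    return creal(a) * creal(a) + cimag(a) * cimag(a);
}

/* Draw one sample of the qubit modes listed in `modes` from an n-qubit state vector
 * `psi` (not necessarily normalized). Basis index b holds mode m in bit m. Bit r of
 * the result is the sampled value of modes[r], so the requested order is respected. */
static uint64_t sample_modes(const double complex *psi, int32_t n, int32_t numModes,
                             const int32_t modes[], uint64_t *rng)
{
    assert(n >= 0 && n < 63 && numModes <= 64);
    uint64_t dim = (uint64_t)1 << n;
    double total = 0.0;
    for (uint64_t b = 0; b < dim; ++b)
        total += squared_magnitude(psi[b]);
    assert(total > 0.0);

    /* Uniform draw in [0, total) from the top 53 bits of the generator. */
    double u = (double)(xorshift64(rng) >> 11) * 0x1.0p-53 * total;
    uint64_t chosen = 0;
    double acc = 0.0;
    for (uint64_t b = 0; b < dim; ++b) {
        double p = squared_magnitude(psi[b]);
        if (p == 0.0) continue;
        chosen = b;          /* last nonzero entry doubles as a round-off fallback */
        acc += p;
        if (u < acc) break;
    }

    uint64_t sample = 0;
    for (int32_t r = 0; r < numModes; ++r) {
        assert(modes[r] >= 0 && modes[r] < n);
        sample |= ((chosen >> modes[r]) & 1u) << r;
    }
    return sample;
}
```

For a 3-qubit state (|000> + |101>)/sqrt(2), sampling modes {2, 0} only ever yields the values 0 or 3, and sampling mode 1 always yields 0.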

Note

For the purpose of quantum circuit simulations, the tensor network state sampler can generate bit-strings (or qudit-strings) from the output state of the defined quantum circuit (i.e., the tensor network defined by gate applications).

The provided tensor network state must stay alive during the lifetime of the tensor network state sampler. Additionally, applying a tensor operator to the tensor network state after it was used to create the tensor network state sampler will invalidate the tensor network state sampler. On the other hand, simply updating tensor operator data via cutensornetStateUpdateTensorOperator() is allowed.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkState – [in] Tensor network state.
numModesToSample – [in] Number of the tensor network state modes to sample from.
modesToSample – [in] Pointer to the state modes to sample from (can be NULL when all modes are requested).
tensorNetworkSampler – [out] Tensor network sampler.

`cutensornetDestroySampler`¶

cutensornetStatus_t cutensornetDestroySampler(cutensornetStateSampler_t tensorNetworkSampler)¶

Destroys the tensor network state sampler.

Parameters: tensorNetworkSampler – [in] Tensor network state sampler.

`cutensornetSamplerConfigure`¶

cutensornetStatus_t cutensornetSamplerConfigure(const cutensornetHandle_t handle, cutensornetStateSampler_t tensorNetworkSampler, cutensornetSamplerAttributes_t attribute, const void *attributeValue, size_t attributeSize)¶

Configures the tensor network state sampler.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkSampler – [inout] Tensor network state sampler.
attribute – [in] Configuration attribute.
attributeValue – [in] Pointer to the configuration attribute value (type-erased).
attributeSize – [in] The size of the configuration attribute value.
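Because attributeValue is type-erased, the caller must keep the pointer and attributeSize consistent. The same convention applies to cutensornetSamplerGetInfo(). The following minimal sketch shows a hypothetical wrapper (setAttribute, not part of cuTensorNet) that derives the size from the value's own type. The commented usage assumes an int32_t-valued attribute named CUTENSORNET_SAMPLER_CONFIG_NUM_HYPER_SAMPLES. Check that name and its value type against the cutensornetSamplerAttributes_t enum in your headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of cuTensorNet): forwards a typed value to any
// setter following the (attribute, const void* value, size_t size) convention.
// Because the size is derived from the value's own type, the pointer and the
// size handed to the library cannot get out of sync.
template <typename Setter, typename Attr, typename T>
auto setAttribute(Setter&& setter, Attr attribute, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "attribute values are passed as raw bytes");
    return setter(attribute, static_cast<const void*>(&value), sizeof(T));
}

// Possible use with the real API (requires cutensornet.h; the attribute name
// and its int32_t type are assumptions to verify):
//   std::int32_t numHyperSamples = 8;
//   setAttribute(
//       [&](cutensornetSamplerAttributes_t a, const void* v, std::size_t n) {
//           return cutensornetSamplerConfigure(handle, sampler, a, v, n);
//       },
//       CUTENSORNET_SAMPLER_CONFIG_NUM_HYPER_SAMPLES, numHyperSamples);
```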

`cutensornetSamplerPrepare`¶

cutensornetStatus_t cutensornetSamplerPrepare(const cutensornetHandle_t handle, cutensornetStateSampler_t tensorNetworkSampler, size_t maxWorkspaceSizeDevice, cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t cudaStream)¶

Prepares the tensor network state sampler.

Warning

The cudaStream argument is unused in the current release (can be set to 0x0).

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkSampler – [in] Tensor network state sampler.
maxWorkspaceSizeDevice – [in] Upper limit on the amount of available GPU scratch memory (bytes).
workDesc – [out] Workspace descriptor (the required scratch/cache memory sizes will be set).
cudaStream – [in] CUDA stream.

`cutensornetSamplerGetInfo`¶

cutensornetStatus_t cutensornetSamplerGetInfo(const cutensornetHandle_t handle, const cutensornetStateSampler_t tensorNetworkSampler, cutensornetSamplerAttributes_t attribute, void *attributeValue, size_t attributeSize)¶

Retrieves an attribute related to tensor network state sampling.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkSampler – [in] Tensor network state sampler.
attribute – [in] Information attribute.
attributeValue – [out] Pointer to the information attribute value (type-erased).
attributeSize – [in] The size of the information attribute value.

`cutensornetSamplerSample`¶

cutensornetStatus_t cutensornetSamplerSample(const cutensornetHandle_t handle, cutensornetStateSampler_t tensorNetworkSampler, int64_t numShots, cutensornetWorkspaceDescriptor_t workDesc, int64_t *samples, cudaStream_t cudaStream)¶

Performs sampling of the tensor network state, that is, generates the requested number of samples.

Note

The pseudo-random number generator used internally is initialized with a random default seed, so repeated executions generally produce different sets of samples. A future release may provide the ability to reset the seed to a user-defined value, so that rerunning the application generates exactly the same set of samples (useful, for example, for debugging).

Warning

The provided workspace descriptor workDesc must have the Device Scratch buffer set explicitly since user-provided memory pools are not supported in the current release. Additionally, the attached workspace buffer must be 256-byte aligned in the current release.

In the current release, the execution of this API function will synchronize the provided CUDA stream. This restriction may be lifted in a future release.

Parameters

handle – [in] cuTensorNet library handle.
tensorNetworkSampler – [in] Tensor network state sampler.
numShots – [in] Number of samples to generate.
workDesc – [in] Workspace descriptor (the required scratch/cache memory buffers must be set by the user).
samples – [out] Host memory pointer where the generated state tensor samples will be stored. The samples are stored as samples[SampleId][ModeId] in C notation, and the originally specified order of the tensor network state modes to sample from is respected.
cudaStream – [in] CUDA stream.
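The samples buffer therefore needs room for numShots * numModesToSample int64_t values in host memory. For example, a std::vector<int64_t> of that length can be allocated and samples.data() passed to the call. The following sketch shows one way to post-process the buffer using the row-major layout stated above. The helper name histogram is hypothetical, and it assumes every sampled value is a single digit (0 to 9), which covers qubits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical post-processing helper: `samples` is the host buffer filled by
// cutensornetSamplerSample(), laid out as samples[shot][mode] (row-major), so the
// k-th requested mode of shot s is samples[s * numModesToSample + k].
// Each shot becomes a string such as "0110" (values assumed to be 0-9) and is counted.
std::map<std::string, int64_t> histogram(const std::vector<int64_t>& samples,
                                         int64_t numShots,
                                         int32_t numModesToSample) {
    assert(static_cast<int64_t>(samples.size()) == numShots * numModesToSample);
    std::map<std::string, int64_t> counts;
    for (int64_t s = 0; s < numShots; ++s) {
        std::string key;
        for (int32_t k = 0; k < numModesToSample; ++k) {
            const int64_t value =
                samples[static_cast<std::size_t>(s * numModesToSample + k)];
            assert(value >= 0 && value <= 9);
            key.push_back(static_cast<char>('0' + value));
        }
        ++counts[key];
    }
    return counts;
}
```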

Memory Management API¶

A stream-ordered memory allocator (or mempool for short) allocates/deallocates memory asynchronously from/to a mempool in a stream-ordered fashion, meaning memory operations and computations enqueued on the streams have well-defined inter- and intra-stream dependencies. Several stream-ordered mempools are available, such as cudaMemPool_t, which has been built into the CUDA driver since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool), and the RAPIDS Memory Manager (RMM). For a detailed introduction, see the NVIDIA Developer Blog.

The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuTensorNet can take care of most of the memory management for users. Below is an illustration of what can be done:

#include <cstring>        // std::strncpy
#include <cutensornet.h>  // cuTensorNet API and types

MyMemPool pool = MyMemPool();  // kept alive for the entire process in real apps

int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
  // assuming this is the memory allocation routine provided by my mempool
  return static_cast<MyMemPool*>(ctx)->alloc(ptr, size, stream);
}

int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
  // assuming this is the memory deallocation routine provided by my mempool
  return static_cast<MyMemPool*>(ctx)->dealloc(ptr, size, stream);
}

// create a mem handler and fill in the required members for the library to use
cutensornetDeviceMemHandler_t handler;
handler.ctx = static_cast<void*>(&pool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;
// copy the name without reading past the end of the source string, and always null-terminate
std::strncpy(handler.name, "my pool", CUTENSORNET_ALLOCATOR_NAME_LEN - 1);
handler.name[CUTENSORNET_ALLOCATOR_NAME_LEN - 1] = '\0';

// bind the handler to the library handle
cutensornetSetDeviceMemHandler(handle, &handler);

/* ... perform the network creation & optimization as usual ... */

// create a workspace descriptor
cutensornetWorkspaceDescriptor_t workDesc;
// (this step is optional and workDesc can be set to NULL if one just wants
// to use the "recommended" workspace size)
cutensornetCreateWorkspaceDescriptor(handle, &workDesc);

// User doesn’t compute the required sizes

// User doesn’t query the workspace size (but one can if desired)

// User doesn’t allocate memory!

// User sets workspacePtr=NULL for the corresponding memory space (device, in this case) to indicate the library should
// draw memory (of the "recommended" size, if the workspace size is set to 0 as shown below) from the user's pool;
// if a nonzero size is set, we would use the given size instead of the recommended one.
// (this step is also optional if workDesc has been set to NULL)
cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE, CUTENSORNET_WORKSPACE_SCRATCH, NULL, 0);

// create a contraction plan
cutensornetContractionPlan_t plan;
cutensornetCreateContractionPlan(handle, descNet, optimizerInfo, workDesc, &plan);

// autotune the plan with the workspace
cutensornetContractionAutotune(handle, plan, rawDataIn, rawDataOut, workDesc, pref, stream);

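// query how many slices the optimizer produced
int64_t num_slices = 0;
cutensornetContractionOptimizerInfoGetAttribute(
    handle, optimizerInfo, CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES,
    &num_slices, sizeof(num_slices));

// perform actual contraction with the workspace, one slice at a time
for (int64_t sliceId = 0; sliceId < num_slices; sliceId++) {
    cutensornetContraction(
        handle, plan, rawDataIn, rawDataOut, workDesc, sliceId, stream);
}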

// clean up
cutensornetDestroyContractionPlan(plan);
cutensornetDestroyWorkspaceDescriptor(workDesc);  // optional if workDesc has been set to NULL
// User doesn’t deallocate memory!

As shown above, several calls to the workspace-related APIs can be skipped. Moreover, allowing the library to share your memory pool not only can alleviate potential memory conflicts but can also enable further optimizations.

Note

In the current release, only a device mempool can be bound.

`cutensornetSetDeviceMemHandler`¶

cutensornetStatus_t cutensornetSetDeviceMemHandler(cutensornetHandle_t handle, const cutensornetDeviceMemHandler_t *devMemHandler)¶

Sets the current device memory handler.

Once set, whenever cuTensorNet needs device memory in various API calls, it will allocate from the user-provided memory pool and deallocate at completion. See cutensornetDeviceMemHandler_t and the APIs that require cutensornetWorkspaceDescriptor_t for further details.

The internal stream order is established using the user-provided stream passed to cutensornetContractionAutotune() and cutensornetContraction().

Warning

The behavior is undefined in the following scenarios:

the library handle is bound to a memory handler and subsequently to another handler
the library handle outlives the attached memory pool
the memory pool is not stream-ordered

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
devMemHandler – [in] the device memory handler that encapsulates the user’s mempool. The struct content is copied internally.
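The following is a host-side mock of the two callbacks a handler carries. It can help unit-test a pool's bookkeeping before wiring it to the GPU. It assumes the callback shape used in the earlier example, (ctx, ptr, size, stream), and that callbacks report success with 0, report failure with a nonzero value, and never throw. Confirm these against the cutensornetDeviceMemHandler_t documentation. To keep the sketch buildable without the CUDA toolkit, the stream parameter is replaced by a void* stand-in. Because of that stand-in, and because the mock allocates host memory, these functions cannot be assigned to a real handler as written. CountingPool, countingAlloc and countingFree are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Stand-in for cudaStream_t so the sketch builds without the CUDA toolkit.
using StreamStandIn = void*;

// Hypothetical pool object that would be stored in handler.ctx.
struct CountingPool {
    std::unordered_map<void*, std::size_t> live;  // pointer -> requested size
    std::size_t bytesInUse = 0;
};

// Same shape as device_alloc: returns 0 on success, nonzero otherwise, never throws.
// A real device pool would allocate stream-ordered device memory here
// (for example with cudaMallocAsync on the stream) instead of host memory.
int countingAlloc(void* ctx, void** ptr, std::size_t size, StreamStandIn /*stream*/) {
    auto* pool = static_cast<CountingPool*>(ctx);
    // std::aligned_alloc needs a size that is a nonzero multiple of the alignment.
    const std::size_t rounded = size == 0 ? 256 : (size + 255) / 256 * 256;
    *ptr = std::aligned_alloc(256, rounded);
    if (*ptr == nullptr) return 1;
    try {
        pool->live.emplace(*ptr, size);
    } catch (...) {  // do not let exceptions escape into the library
        std::free(*ptr);
        *ptr = nullptr;
        return 1;
    }
    pool->bytesInUse += size;
    return 0;
}

// Same shape as device_free; `size` is assumed to match the allocation request.
int countingFree(void* ctx, void* ptr, std::size_t size, StreamStandIn /*stream*/) {
    auto* pool = static_cast<CountingPool*>(ctx);
    const auto it = pool->live.find(ptr);
    if (it == pool->live.end() || it->second != size) return 1;  // unknown pointer or size mismatch
    pool->live.erase(it);
    pool->bytesInUse -= size;
    std::free(ptr);
    return 0;
}
```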

`cutensornetGetDeviceMemHandler`¶

cutensornetStatus_t cutensornetGetDeviceMemHandler(const cutensornetHandle_t handle, cutensornetDeviceMemHandler_t *devMemHandler)¶

Gets the current device memory handler.

Parameters

handle – [in] Opaque handle holding cuTensorNet’s library context.
devMemHandler – [out] If a handler was previously set, the struct pointed to by devMemHandler is filled in; otherwise, CUTENSORNET_STATUS_NO_DEVICE_ALLOCATOR is returned.

Error Management API¶

`cutensornetGetErrorString`¶

const char *cutensornetGetErrorString(cutensornetStatus_t error)¶

Returns the description string for an error code.

Remark

non-blocking, non-reentrant, and thread-safe

Parameters: error – [in] Error code to convert to string.
Returns: the error string

Logger API¶

`cutensornetLoggerSetCallback`¶

cutensornetStatus_t cutensornetLoggerSetCallback(cutensornetLoggerCallback_t callback)¶

This function sets the logging callback routine.

Parameters: callback – [in] Pointer to a callback function. Check cutensornetLoggerCallback_t.

`cutensornetLoggerSetCallbackData`¶

cutensornetStatus_t cutensornetLoggerSetCallbackData(cutensornetLoggerCallbackData_t callback, void *userData)¶

This function sets the logging callback routine, along with user data.

Parameters

callback – [in] Pointer to a callback function. Check cutensornetLoggerCallbackData_t.
userData – [in] Pointer to user-provided data to be used by the callback.
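The userData pointer lets a callback write log output into caller-owned state without using globals. This minimal sketch assumes that cutensornetLoggerCallbackData_t has the parameter list (int32_t logLevel, const char* functionName, const char* message, void* userData). Check the exact typedef in cutensornet.h. LogRecord and collectLog are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One captured log line.
struct LogRecord {
    int32_t level;
    std::string function;
    std::string message;
};

// Callback with the assumed cutensornetLoggerCallbackData_t parameter list.
// `userData` points at the caller-owned container that receives the records.
void collectLog(int32_t logLevel, const char* functionName,
                const char* message, void* userData) {
    auto* sink = static_cast<std::vector<LogRecord>*>(userData);
    try {
        sink->push_back({logLevel, functionName ? functionName : "",
                         message ? message : ""});
    } catch (...) {
        // never let exceptions propagate back into the library
    }
}

// Registration (requires cutensornet.h; shown as a comment):
//   static std::vector<LogRecord> records;  // must outlive all logging
//   cutensornetLoggerSetCallbackData(collectLog, &records);
//   cutensornetLoggerSetLevel(1);            // errors only
```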

`cutensornetLoggerSetFile`¶

cutensornetStatus_t cutensornetLoggerSetFile(FILE *file)¶

This function sets the logging output file.

Parameters: file – [in] An open file with write permission.

`cutensornetLoggerOpenFile`¶

cutensornetStatus_t cutensornetLoggerOpenFile(const char *logFile)¶

This function opens a logging output file at the given path.

Parameters: logFile – [in] Path to the logging output file.

`cutensornetLoggerSetLevel`¶

cutensornetStatus_t cutensornetLoggerSetLevel(int32_t level)¶

This function sets the value of the logging level.

Parameters

level – [in] Log level, should be one of the following:

Level	Summary	Long Description
0	Off	logging is disabled (default)
1	Errors	only errors will be logged
2	Performance Trace	API calls that launch CUDA kernels will log their parameters and important information
3	Performance Hints	hints that can potentially improve the application's performance
4	Heuristics Trace	provides general information about the library execution, may contain details about heuristic status
5	API Trace	API calls will log their parameters and important information

`cutensornetLoggerSetMask`¶

cutensornetStatus_t cutensornetLoggerSetMask(int32_t mask)¶

This function sets the value of the log mask.

Parameters

mask – [in] Value of the logging mask. Masks are defined as a combination (bitwise OR) of the following masks:

Mask	Description
0	Off
1	Errors
2	Performance Trace
4	Performance Hints
8	Heuristics Trace
16	API Trace

Refer to cutensornetLoggerSetLevel() for details.
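Since each category occupies its own bit, a mask is built with bitwise OR and decoded with bitwise AND. The following sketch uses the values from the mask table. The constant names and the enabledCategories helper are hypothetical, because cutensornetLoggerSetMask() simply takes an int32_t.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mask bits from the table; the names are hypothetical.
constexpr int32_t kLogMaskErrors          = 1;
constexpr int32_t kLogMaskPerfTrace       = 2;
constexpr int32_t kLogMaskPerfHints       = 4;
constexpr int32_t kLogMaskHeuristicsTrace = 8;
constexpr int32_t kLogMaskApiTrace        = 16;

// Return the names of the categories enabled by `mask`.
std::vector<std::string> enabledCategories(int32_t mask) {
    static const struct { int32_t bit; const char* name; } kTable[] = {
        {kLogMaskErrors, "Errors"},
        {kLogMaskPerfTrace, "Performance Trace"},
        {kLogMaskPerfHints, "Performance Hints"},
        {kLogMaskHeuristicsTrace, "Heuristics Trace"},
        {kLogMaskApiTrace, "API Trace"},
    };
    std::vector<std::string> names;
    for (const auto& entry : kTable)
        if (mask & entry.bit) names.emplace_back(entry.name);
    return names;
}

// Errors plus performance hints: 1 | 4 == 5, which would be passed as
// cutensornetLoggerSetMask(kLogMaskErrors | kLogMaskPerfHints).
```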

`cutensornetLoggerForceDisable`¶

cutensornetStatus_t cutensornetLoggerForceDisable()¶

This function disables logging for the entire run.

Versioning API¶

`cutensornetGetVersion`¶

size_t cutensornetGetVersion()¶

Returns the version number of the cuTensorNet library.

`cutensornetGetCudartVersion`¶

size_t cutensornetGetCudartVersion()¶

Returns the version number of the CUDA runtime that cuTensorNet was compiled against.

Can be compared against the CUDA runtime version from cudaRuntimeGetVersion().
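The following is a minimal sketch for decoding and comparing these numbers. It relies on two encodings. The CUDA runtime documents cudaRuntimeGetVersion() as returning 1000 * major + 10 * minor (for example, 12020 for CUDA 12.2). For cutensornetGetVersion(), the sketch assumes the CUTENSORNET_VERSION convention of major * 10000 + minor * 100 + patch, which you should verify against your cutensornet.h. The "same major version" check is only an illustrative policy, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct Version { int major, minor, patch; };

// Assumed encoding of cutensornetGetVersion(): major * 10000 + minor * 100 + patch.
Version decodeTensorNetVersion(std::size_t v) {
    return {static_cast<int>(v / 10000), static_cast<int>(v / 100 % 100),
            static_cast<int>(v % 100)};
}

// CUDA runtime encoding (cudaRuntimeGetVersion): 1000 * major + 10 * minor.
// cutensornetGetCudartVersion() is compared against values in this format.
Version decodeCudaVersion(std::size_t v) {
    return {static_cast<int>(v / 1000), static_cast<int>(v % 1000 / 10), 0};
}

// Illustrative policy only: flag a mismatch when the CUDA runtime in use has a
// different major version than the one cuTensorNet was compiled against.
bool sameCudaMajor(std::size_t compiledAgainst, std::size_t runtime) {
    return decodeCudaVersion(compiledAgainst).major ==
           decodeCudaVersion(runtime).major;
}
```

In an application, compiledAgainst would come from cutensornetGetCudartVersion() and runtime from cudaRuntimeGetVersion().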
