cuTensorNet Functions

Handle Management API

cutensornetCreate

cutensornetStatus_t cutensornetCreate(cutensornetHandle_t *handle)

Initializes the cuTensorNet library.

The device associated with a particular cuTensorNet handle is assumed to remain unchanged after the cutensornetCreate() call. In order for the cuTensorNet library to use a different device, the application must set the new device to be used by calling cudaSetDevice() and then create another cuTensorNet handle, which will be associated with the new device, by calling cutensornetCreate().

Remark

blocking, non-reentrant, and thread-safe

Parameters

handle[out] Pointer to cutensornetHandle_t

Returns

CUTENSORNET_STATUS_SUCCESS on success and an error code otherwise


cutensornetDestroy

cutensornetStatus_t cutensornetDestroy(cutensornetHandle_t handle)

Destroys the cuTensorNet library handle.

This function releases resources used by the cuTensorNet library handle. This function is the last call with a particular handle to the cuTensorNet library. Calling any cuTensorNet function which uses cutensornetHandle_t after cutensornetDestroy() will return an error.

Parameters

handle[inout] Opaque handle holding cuTensorNet’s library context.
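
As a minimal sketch of the handle lifecycle (single-GPU application assumed; error handling abbreviated):

cudaSetDevice(0);                     // select the device before creating the handle
cutensornetHandle_t handle;
cutensornetStatus_t status = cutensornetCreate(&handle);
if (status != CUTENSORNET_STATUS_SUCCESS) { /* report and abort */ }
// ... use the library ...
cutensornetDestroy(handle);           // last call with this handle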


Network Descriptor API

cutensornetCreateNetworkDescriptor

cutensornetStatus_t cutensornetCreateNetworkDescriptor(const cutensornetHandle_t handle, int32_t numInputs, const int32_t numModesIn[], const int64_t *const extentsIn[], const int64_t *const stridesIn[], const int32_t *const modesIn[], const cutensornetTensorQualifiers_t qualifiersIn[], int32_t numModesOut, const int64_t extentsOut[], const int64_t stridesOut[], const int32_t modesOut[], cudaDataType_t dataType, cutensornetComputeType_t computeType, cutensornetNetworkDescriptor_t *descNet)

Initializes a cutensornetNetworkDescriptor_t, describing the connectivity (i.e., network topology) between the tensors.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyNetworkDescriptor() is called once descNet is no longer required.

Supported data-type combinations are:

Data type      Compute type                    Tensor Core
CUDA_R_16F     CUTENSORNET_COMPUTE_32F         Volta+
CUDA_R_16BF    CUTENSORNET_COMPUTE_32F         Ampere+
CUDA_R_32F     CUTENSORNET_COMPUTE_32F         No
CUDA_R_32F     CUTENSORNET_COMPUTE_TF32        Ampere+
CUDA_R_32F     CUTENSORNET_COMPUTE_16BF        Ampere+
CUDA_R_32F     CUTENSORNET_COMPUTE_16F         Volta+
CUDA_R_64F     CUTENSORNET_COMPUTE_64F         Ampere+
CUDA_R_64F     CUTENSORNET_COMPUTE_32F         No
CUDA_C_32F     CUTENSORNET_COMPUTE_32F         No
CUDA_C_32F     CUTENSORNET_COMPUTE_TF32        Ampere+
CUDA_C_64F     CUTENSORNET_COMPUTE_64F         Ampere+
CUDA_C_64F     CUTENSORNET_COMPUTE_32F         No

Note

If stridesIn (stridesOut) is set to 0 (NULL), it means the input tensors (output tensor) are in the Fortran (column-major) layout.

Note

numModesOut can be set to -1 for cuTensorNet to infer the output modes based on the input modes, or to 0 to perform a full reduction.

Note

If qualifiersIn is set to 0 (NULL), cuTensorNet will use the defaults in cutensornetTensorQualifiers_t.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • numInputs[in] Number of input tensors.

  • numModesIn[in] Array of size numInputs; numModesIn[i] denotes the number of modes available in the i-th tensor.

  • extentsIn[in] Array of size numInputs; extentsIn[i] has numModesIn[i] many entries with extentsIn[i][j] (j < numModesIn[i]) corresponding to the extent of the j-th mode of tensor i.

  • stridesIn[in] Array of size numInputs; stridesIn[i] has numModesIn[i] many entries with stridesIn[i][j] (j < numModesIn[i]) corresponding to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of tensor i.

  • modesIn[in] Array of size numInputs; modesIn[i] has numModesIn[i] many entries; each entry corresponds to a mode. Each mode that does not appear in the output tensor is implicitly contracted.

  • qualifiersIn[in] Array of size numInputs; qualifiersIn[i] denotes the qualifiers of the i-th input tensor. Refer to cutensornetTensorQualifiers_t.

  • numModesOut[in] Number of modes of the output tensor. On entry, if this value is -1 and the output modes are not provided, the network will infer the output modes. If this value is 0, a full reduction is performed and the output is a scalar.

  • extentsOut[in] Array of size numModesOut; extentsOut[j] (j < numModesOut) corresponds to the extent of the j-th mode of the output tensor.

  • stridesOut[in] Array of size numModesOut; stridesOut[j] (j < numModesOut) corresponds to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the output tensor.

  • modesOut[in] Array of size numModesOut; modesOut[j] denotes the j-th mode of the output tensor.

  • dataType[in] Denotes the data type for all input and output tensors.

  • computeType[in] Denotes the compute type used throughout the computation.

  • descNet[out] Pointer to a cutensornetNetworkDescriptor_t.
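
For illustration, a minimal sketch that describes the contraction A[i,j] B[j,k] -> C[i,k]; the mode labels, extents, and the pre-created handle are placeholder assumptions, and error checking is omitted:

const int32_t numInputs = 2;
int32_t modesA[] = {'i','j'}, modesB[] = {'j','k'}, modesC[] = {'i','k'};
int64_t extentsA[] = {64, 32}, extentsB[] = {32, 128}, extentsC[] = {64, 128};
int32_t numModesIn[] = {2, 2};
const int64_t *extentsIn[] = {extentsA, extentsB};
const int32_t *modesIn[] = {modesA, modesB};
cutensornetNetworkDescriptor_t descNet;
// NULL strides select the column-major layout; NULL qualifiers select the defaults.
cutensornetCreateNetworkDescriptor(handle, numInputs, numModesIn,
    extentsIn, /*stridesIn=*/NULL, modesIn, /*qualifiersIn=*/NULL,
    /*numModesOut=*/2, extentsC, /*stridesOut=*/NULL, modesC,
    CUDA_R_32F, CUTENSORNET_COMPUTE_32F, &descNet);
// ... use the network descriptor ...
cutensornetDestroyNetworkDescriptor(descNet);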


cutensornetDestroyNetworkDescriptor

cutensornetStatus_t cutensornetDestroyNetworkDescriptor(cutensornetNetworkDescriptor_t desc)

Frees all the memory associated with the network descriptor.

Parameters

desc[inout] Opaque handle to a tensor network descriptor.


cutensornetGetOutputTensorDetails

cutensornetStatus_t cutensornetGetOutputTensorDetails(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, int32_t *numModes, size_t *dataSize, int32_t *modeLabels, int64_t *extents, int64_t *strides)

DEPRECATED: Gets the number of output modes, data size, modes, extents, and strides of the output tensor.

If all information regarding the output tensor is needed by the user, this function should be called twice (the first time to retrieve numModes for allocating memory, and the second to retrieve modeLabels, extents, and strides).

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Pointer to a cutensornetNetworkDescriptor_t.

  • numModes[out] on return, holds the number of modes of the output tensor. Cannot be null.

  • dataSize[out] if not null on return, holds the size (in bytes) of the memory needed for the output tensor. Optionally, can be null.

  • modeLabels[out] if not null on return, holds the modes of the output tensor. Optionally, can be null.

  • extents[out] if not null on return, holds the extents of the output tensor. Optionally, can be null.

  • strides[out] if not null on return, holds the strides of the output tensor. Optionally, can be null.


cutensornetGetOutputTensorDescriptor

cutensornetStatus_t cutensornetGetOutputTensorDescriptor(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetTensorDescriptor_t *outputTensorDesc)

Creates a cutensornetTensorDescriptor_t representing the output tensor of the network.

This function will create a descriptor pointed to by outputTensorDesc. The user is responsible for calling cutensornetDestroyTensorDescriptor to destroy the descriptor.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Pointer to a cutensornetNetworkDescriptor_t.

  • outputTensorDesc[out] Pointer to the created cutensornetTensorDescriptor_t representing the output tensor of the network.


Tensor Descriptor API

cutensornetCreateTensorDescriptor

cutensornetStatus_t cutensornetCreateTensorDescriptor(const cutensornetHandle_t handle, int32_t numModes, const int64_t extents[], const int64_t strides[], const int32_t modes[], cudaDataType_t dataType, cutensornetTensorDescriptor_t *descTensor)

Initializes a cutensornetTensorDescriptor_t, describing the information of a tensor.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorDescriptor() is called once descTensor is no longer required.

Note

If strides is set to NULL, it means the tensor is in the Fortran (column-major) layout.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • numModes[in] The number of modes of the tensor.

  • extents[in] Array of size numModes; extents[j] corresponds to the extent of the j-th mode of the tensor.

  • strides[in] Array of size numModes; strides[j] corresponds to the linearized offset in physical memory between two logically-neighboring elements w.r.t. the j-th mode of the tensor.

  • modes[in] Array of size numModes; modes[j] denotes the j-th mode of the tensor.

  • dataType[in] Denotes the data type for the tensor.

  • descTensor[out] Pointer to a cutensornetTensorDescriptor_t.
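
A minimal sketch with placeholder extents (the handle is assumed to exist):

int32_t modes[] = {'i','j','k'};
int64_t extents[] = {2, 3, 4};
cutensornetTensorDescriptor_t descTensor;
cutensornetCreateTensorDescriptor(handle, /*numModes=*/3, extents,
    /*strides=*/NULL /* column-major */, modes, CUDA_R_32F, &descTensor);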


cutensornetGetTensorDetails

cutensornetStatus_t cutensornetGetTensorDetails(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t tensorDesc, int32_t *numModes, size_t *dataSize, int32_t *modeLabels, int64_t *extents, int64_t *strides)

Gets the number of modes, data size, mode labels, extents, and strides of a tensor.

If all information regarding the tensor is needed by the user, this function should be called twice (the first time to retrieve numModes for allocating memory, and the second to retrieve modeLabels, extents, and strides).

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • tensorDesc[in] Opaque handle to a tensor descriptor.

  • numModes[out] On return, holds the number of modes of the tensor. Cannot be null.

  • dataSize[out] If not null on return, holds the size (in bytes) of the memory needed for the tensor. Optionally, can be null.

  • modeLabels[out] If not null on return, holds the modes of the tensor. Optionally, can be null.

  • extents[out] If not null on return, holds the extents of the tensor. Optionally, can be null.

  • strides[out] If not null on return, holds the strides of the tensor. Optionally, can be null.
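
A sketch of the two-call pattern described above (host allocation via malloc is illustrative):

int32_t numModes = 0;
cutensornetGetTensorDetails(handle, tensorDesc, &numModes,
                            NULL, NULL, NULL, NULL);        // first call: count only
int32_t *modeLabels = (int32_t*)malloc(numModes * sizeof(int32_t));
int64_t *extents    = (int64_t*)malloc(numModes * sizeof(int64_t));
int64_t *strides    = (int64_t*)malloc(numModes * sizeof(int64_t));
size_t dataSize = 0;
cutensornetGetTensorDetails(handle, tensorDesc, &numModes, &dataSize,
                            modeLabels, extents, strides);  // second call: details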


cutensornetDestroyTensorDescriptor

cutensornetStatus_t cutensornetDestroyTensorDescriptor(cutensornetTensorDescriptor_t descTensor)

Frees all the memory associated with the tensor descriptor.

Parameters

descTensor[inout] Opaque handle to a tensor descriptor.


Contraction Optimizer API

cutensornetCreateContractionOptimizerConfig

cutensornetStatus_t cutensornetCreateContractionOptimizerConfig(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t *optimizerConfig)

Sets up the required hyper-optimization parameters for the contraction order solver (see cutensornetContractionOptimize()).

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerConfig() is called once optimizerConfig is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerConfig[out] This data structure holds all information about the user-requested hyper-optimization parameters.


cutensornetDestroyContractionOptimizerConfig

cutensornetStatus_t cutensornetDestroyContractionOptimizerConfig(cutensornetContractionOptimizerConfig_t optimizerConfig)

Frees all the memory associated with optimizerConfig.

Parameters

optimizerConfig[inout] Opaque structure.


cutensornetContractionOptimizerConfigGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of optimizerConfig.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerConfig[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that is requested.

  • buf[out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerConfig.

  • sizeInBytes[in] Size of buf (in bytes).


cutensornetContractionOptimizerConfigSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerConfigSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerConfig_t optimizerConfig, cutensornetContractionOptimizerConfigAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of optimizerConfig.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerConfig[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes).
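
For example, a sketch that raises the number of hyper-optimizer samples (the enumerator belongs to cutensornetContractionOptimizerConfigAttributes_t):

int32_t numHyperSamples = 8;
cutensornetContractionOptimizerConfigSetAttribute(handle, optimizerConfig,
    CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_HYPER_NUM_SAMPLES,
    &numHyperSamples, sizeof(numHyperSamples));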


cutensornetCreateContractionOptimizerInfo

cutensornetStatus_t cutensornetCreateContractionOptimizerInfo(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, cutensornetContractionOptimizerInfo_t *optimizerInfo)

Allocates resources for optimizerInfo.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionOptimizerInfo() is called once optimizerInfo is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.

  • optimizerInfo[out] Pointer to cutensornetContractionOptimizerInfo_t.


cutensornetDestroyContractionOptimizerInfo

cutensornetStatus_t cutensornetDestroyContractionOptimizerInfo(cutensornetContractionOptimizerInfo_t optimizerInfo)

Frees all the memory associated with optimizerInfo.

Parameters

optimizerInfo[inout] Opaque structure.


cutensornetContractionOptimize

cutensornetStatus_t cutensornetContractionOptimize(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerConfig_t optimizerConfig, uint64_t workspaceSizeConstraint, cutensornetContractionOptimizerInfo_t optimizerInfo)

Computes an “optimized” contraction order as well as slicing info (for more information see Overview section) for a given tensor network such that the total time to solution is minimized while adhering to the user-provided memory constraint.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the topology of the tensor network (i.e., all tensors, their connectivity and modes).

  • optimizerConfig[in] Holds all hyper-optimization parameters that govern the search for an “optimal” contraction order.

  • workspaceSizeConstraint[in] Maximal device memory that will be provided by the user (i.e., cuTensorNet has to find a viable path/slicing solution within this user-defined constraint).

  • optimizerInfo[inout] On return, this object will hold all necessary information about the optimized path and the related slicing information. optimizerInfo will hold information including (see cutensornetContractionOptimizerInfoAttributes_t):

    • Total number of slices.

    • Total number of sliced modes.

    • Information about the sliced modes (i.e., the IDs of the sliced modes (see modesIn w.r.t. cutensornetCreateNetworkDescriptor()) as well as their extents (see Overview section for additional documentation)).

    • Optimized path.

    • FLOP count.

    • Total number of elements in the largest intermediate tensor.

    • The mode labels for all intermediate tensors.

    • The estimated runtime and “effective” flops.
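
A minimal usage sketch, assuming descNet, optimizerConfig, and optimizerInfo were created as described above:

uint64_t workspaceLimit = 8ULL << 30;  // e.g., allow up to 8 GiB of device memory
cutensornetContractionOptimize(handle, descNet, optimizerConfig,
                               workspaceLimit, optimizerInfo);
int64_t numSlices = 0;                 // query one of the attributes listed above
cutensornetContractionOptimizerInfoGetAttribute(handle, optimizerInfo,
    CUTENSORNET_CONTRACTION_OPTIMIZER_INFO_NUM_SLICES,
    &numSlices, sizeof(numSlices));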


cutensornetContractionOptimizerInfoGetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of optimizerInfo.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerInfo[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that is requested.

  • buf[out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within optimizerInfo.

  • sizeInBytes[in] Size of buf (in bytes).


cutensornetContractionOptimizerInfoSetAttribute

cutensornetStatus_t cutensornetContractionOptimizerInfoSetAttribute(const cutensornetHandle_t handle, cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetContractionOptimizerInfoAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of optimizerInfo.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerInfo[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes).


cutensornetContractionOptimizerInfoGetPackedSize

cutensornetStatus_t cutensornetContractionOptimizerInfoGetPackedSize(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, size_t *sizeInBytes)

Gets the packed size of the optimizerInfo object.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerInfo[in] Opaque structure of type cutensornetContractionOptimizerInfo_t.

  • sizeInBytes[out] The packed size (in bytes).


cutensornetContractionOptimizerInfoPackData

cutensornetStatus_t cutensornetContractionOptimizerInfoPackData(const cutensornetHandle_t handle, const cutensornetContractionOptimizerInfo_t optimizerInfo, void *buffer, size_t sizeInBytes)

Packs the optimizerInfo object into the provided buffer.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • optimizerInfo[in] Opaque structure of type cutensornetContractionOptimizerInfo_t.

  • buffer[out] On return, this buffer holds the contents of optimizerInfo in packed form.

  • sizeInBytes[in] The size of the buffer (in bytes).


cutensornetCreateContractionOptimizerInfoFromPackedData

cutensornetStatus_t cutensornetCreateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t *optimizerInfo)

Creates an optimizerInfo object from the provided buffer.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the tensor network (i.e., its tensors and their connectivity) for which optimizerInfo is created.

  • buffer[in] A buffer with the contents of optimizerInfo in packed form.

  • sizeInBytes[in] The size of the buffer (in bytes).

  • optimizerInfo[out] Pointer to cutensornetContractionOptimizerInfo_t.


cutensornetUpdateContractionOptimizerInfoFromPackedData

cutensornetStatus_t cutensornetUpdateContractionOptimizerInfoFromPackedData(const cutensornetHandle_t handle, const void *buffer, size_t sizeInBytes, cutensornetContractionOptimizerInfo_t optimizerInfo)

Updates the provided optimizerInfo object from the provided buffer.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • buffer[in] A buffer with the contents of optimizerInfo in packed form.

  • sizeInBytes[in] The size of the buffer (in bytes).

  • optimizerInfo[inout] Opaque object of type cutensornetContractionOptimizerInfo_t that will be updated.
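
As a sketch, the packing APIs combine into a serialize/deserialize round trip (host allocation via malloc is illustrative):

size_t packedSize = 0;
cutensornetContractionOptimizerInfoGetPackedSize(handle, optimizerInfo, &packedSize);
void *packedBuf = malloc(packedSize);
cutensornetContractionOptimizerInfoPackData(handle, optimizerInfo,
                                            packedBuf, packedSize);
// ... store the buffer, or send it to another process ...
cutensornetContractionOptimizerInfo_t restoredInfo;
cutensornetCreateContractionOptimizerInfoFromPackedData(handle, descNet,
    packedBuf, packedSize, &restoredInfo);
free(packedBuf);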


Contraction Plan API

cutensornetCreateContractionPlan

cutensornetStatus_t cutensornetCreateContractionPlan(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetContractionPlan_t *plan)

Initializes a cutensornetContractionPlan_t.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionPlan() is called once plan is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the tensor network (i.e., its tensors and their connectivity).

  • optimizerInfo[in] Opaque structure.

  • workDesc[in] Opaque structure describing the workspace. At the creation of the contraction plan, only the workspace size is needed; the pointer to the workspace memory may be left null. If a device memory handler is set, workDesc can be set either to null (in which case the “recommended” workspace size is inferred, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid cutensornetWorkspaceDescriptor_t with the desired workspace size set and a null workspace pointer, see Memory Management API section.

  • plan[out] cuTensorNet’s contraction plan holds all the information required to perform the tensor contractions; to be precise, it initializes a cutensorContractionPlan_t for each tensor contraction that is required to contract the entire tensor network.


cutensornetDestroyContractionPlan

cutensornetStatus_t cutensornetDestroyContractionPlan(cutensornetContractionPlan_t plan)

Frees all resources owned by plan.

Parameters

plan[inout] Opaque structure.


cutensornetContractionAutotune

cutensornetStatus_t cutensornetContractionAutotune(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, const cutensornetWorkspaceDescriptor_t workDesc, const cutensornetContractionAutotunePreference_t pref, cudaStream_t stream)

Auto-tunes the contraction plan to find the best cutensorContractionPlan_t for each pair-wise contraction.

Note

This function is blocking due to the nature of the auto-tuning process.

Note

Input and output data pointers are recommended to be 256-byte aligned for best performance.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • plan[inout] The plan must already be created (see cutensornetCreateContractionPlan()); the individual contraction plans will be fine-tuned.

  • rawDataIn[in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).

  • rawDataOut[out] Points to the raw data of the output tensor (in device memory).

  • workDesc[in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). If a device memory handler is set, the workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.

  • pref[in] Controls the auto-tuning process and gives the user control over how much time is spent in this routine.

  • stream[in] The CUDA stream on which the computation is performed.


cutensornetCreateContractionAutotunePreference

cutensornetStatus_t cutensornetCreateContractionAutotunePreference(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t *autotunePreference)

Sets up the required auto-tune parameters for the contraction plan.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyContractionAutotunePreference() is called once autotunePreference is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • autotunePreference[out] This data structure holds all information about the user-requested auto-tune parameters.


cutensornetContractionAutotunePreferenceGetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceGetAttribute(const cutensornetHandle_t handle, const cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of autotunePreference.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • autotunePreference[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that is requested.

  • buf[out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within autotunePreference.

  • sizeInBytes[in] Size of buf (in bytes).


cutensornetContractionAutotunePreferenceSetAttribute

cutensornetStatus_t cutensornetContractionAutotunePreferenceSetAttribute(const cutensornetHandle_t handle, cutensornetContractionAutotunePreference_t autotunePreference, cutensornetContractionAutotunePreferenceAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of autotunePreference.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • autotunePreference[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes).
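
A sketch combining the autotune-preference APIs (plan, data pointers, workDesc, and stream are assumed to exist; the enumerator belongs to cutensornetContractionAutotunePreferenceAttributes_t):

cutensornetContractionAutotunePreference_t autotunePref;
cutensornetCreateContractionAutotunePreference(handle, &autotunePref);
int32_t maxIterations = 5;  // bound the tuning effort
cutensornetContractionAutotunePreferenceSetAttribute(handle, autotunePref,
    CUTENSORNET_CONTRACTION_AUTOTUNE_MAX_ITERATIONS,
    &maxIterations, sizeof(maxIterations));
cutensornetContractionAutotune(handle, plan, rawDataIn, rawDataOut,
                               workDesc, autotunePref, stream);
cutensornetDestroyContractionAutotunePreference(autotunePref);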


cutensornetDestroyContractionAutotunePreference

cutensornetStatus_t cutensornetDestroyContractionAutotunePreference(cutensornetContractionAutotunePreference_t autotunePreference)

Frees all the memory associated with autotunePreference.

Parameters

autotunePreference[inout] Opaque structure.


Workspace Management API

cutensornetCreateWorkspaceDescriptor

cutensornetStatus_t cutensornetCreateWorkspaceDescriptor(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t *workDesc)

Creates a workspace descriptor that holds information about the user-provided memory buffer.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[out] Pointer to the opaque workspace descriptor.


cutensornetWorkspaceComputeSizes

cutensornetStatus_t cutensornetWorkspaceComputeSizes(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetWorkspaceDescriptor_t workDesc)

DEPRECATED: Computes the workspace size needed to contract the input tensor network using the provided contraction path.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the tensor network (i.e., its tensors and their connectivity).

  • optimizerInfo[in] Opaque structure.

  • workDesc[out] The workspace descriptor in which the information is collected.


cutensornetWorkspaceComputeContractionSizes

cutensornetStatus_t cutensornetWorkspaceComputeContractionSizes(const cutensornetHandle_t handle, const cutensornetNetworkDescriptor_t descNet, const cutensornetContractionOptimizerInfo_t optimizerInfo, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to contract the input tensor network using the provided contraction path.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descNet[in] Describes the tensor network (i.e., its tensors and their connectivity).

  • optimizerInfo[in] Opaque structure.

  • workDesc[out] The workspace descriptor in which the information is collected.


cutensornetWorkspaceComputeQRSizes

cutensornetStatus_t cutensornetWorkspaceComputeQRSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const cutensornetTensorDescriptor_t descTensorQ, const cutensornetTensorDescriptor_t descTensorR, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the tensor QR operation.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorIn[in] Describes the modes, extents and other metadata information for a tensor.

  • descTensorQ[in] Describes the modes, extents and other metadata information for the output tensor Q.

  • descTensorR[in] Describes the modes, extents and other metadata information for the output tensor R.

  • workDesc[out] The workspace descriptor in which the information is collected.


cutensornetWorkspaceComputeSVDSizes

cutensornetStatus_t cutensornetWorkspaceComputeSVDSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const cutensornetTensorDescriptor_t descTensorU, const cutensornetTensorDescriptor_t descTensorV, const cutensornetTensorSVDConfig_t svdConfig, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the tensor SVD operation.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorIn[in] Describes the modes, extents and other metadata information for a tensor.

  • descTensorU[in] Describes the modes, extents and other metadata information for the output tensor U.

  • descTensorV[in] Describes the modes, extents and other metadata information for the output tensor V.

  • svdConfig[in] This data structure holds the user-requested SVD parameters.

  • workDesc[out] The workspace descriptor in which the information is collected.


cutensornetWorkspaceComputeGateSplitSizes

cutensornetStatus_t cutensornetWorkspaceComputeGateSplitSizes(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorInA, const cutensornetTensorDescriptor_t descTensorInB, const cutensornetTensorDescriptor_t descTensorInG, const cutensornetTensorDescriptor_t descTensorU, const cutensornetTensorDescriptor_t descTensorV, const cutensornetGateSplitAlgo_t gateAlgo, const cutensornetTensorSVDConfig_t svdConfig, cutensornetComputeType_t computeType, cutensornetWorkspaceDescriptor_t workDesc)

Computes the workspace size needed to perform the gating operation.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorInA[in] Describes the modes, extents, and other metadata information of the input tensor A.

  • descTensorInB[in] Describes the modes, extents, and other metadata information of the input tensor B.

  • descTensorInG[in] Describes the modes, extents, and other metadata information of the input gate tensor.

  • descTensorU[in] Describes the modes, extents, and other metadata information of the output U tensor. The extents of uncontracted modes are expected to be consistent with descTensorInA and descTensorInG.

  • descTensorV[in] Describes the modes, extents, and other metadata information of the output V tensor. The extents of uncontracted modes are expected to be consistent with descTensorInB and descTensorInG.

  • gateAlgo[in] The algorithm to use for splitting the gate tensor onto tensors A and B.

  • svdConfig[in] Opaque structure holding the user-requested SVD parameters.

  • computeType[in] Denotes the compute type used throughout the computation.

  • workDesc[out] Opaque structure describing the workspace.


cutensornetWorkspaceGetSize

cutensornetStatus_t cutensornetWorkspaceGetSize(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetWorksizePref_t workPref, cutensornetMemspace_t memSpace, uint64_t *workspaceSize)

DEPRECATED: Retrieves the needed workspace size for the given workspace preference and memory space.

The needed sizes for different tasks must be pre-calculated by calling the corresponding API, e.g., cutensornetWorkspaceComputeContractionSizes(), cutensornetWorkspaceComputeQRSizes(), cutensornetWorkspaceComputeSVDSizes(), and cutensornetWorkspaceComputeGateSplitSizes().

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[in] Opaque structure describing the workspace.

  • workPref[in] Preference of workspace for planning.

  • memSpace[in] The memory space where the workspace is allocated.

  • workspaceSize[out] Needed workspace size.


cutensornetWorkspaceGetMemorySize

cutensornetStatus_t cutensornetWorkspaceGetMemorySize(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetWorksizePref_t workPref, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, int64_t *memorySize)

Retrieves the needed workspace size for the given workspace preference, memory space, and workspace kind.

The needed sizes for different tasks must be pre-calculated by calling the corresponding API, e.g., cutensornetWorkspaceComputeContractionSizes(), cutensornetWorkspaceComputeQRSizes(), cutensornetWorkspaceComputeSVDSizes(), and cutensornetWorkspaceComputeGateSplitSizes().

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[in] Opaque structure describing the workspace.

  • workPref[in] Preference of workspace for planning.

  • memSpace[in] The memory space where the workspace is allocated.

  • workKind[in] The kind of workspace.

  • memorySize[out] Needed workspace size.


cutensornetWorkspaceSet

cutensornetStatus_t cutensornetWorkspaceSet(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void *const workspacePtr, uint64_t workspaceSize)

DEPRECATED: Sets the memory address and size of the workspace provided by the user.

A workspace is valid in the following cases:

  • workspacePtr is valid and workspaceSize > 0

  • workspacePtr is null and workspaceSize > 0 (used during cutensornetCreateContractionPlan() to provide the available workspace).

  • workspacePtr is null and workspaceSize = 0 (workspace memory will be drawn from the user’s mempool)

A workspace will be validated against the minimal required at usage (cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), cutensornetContraction())

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[inout] Opaque structure describing the workspace.

  • memSpace[in] The memory space where the workspace is allocated.

  • workspacePtr[in] Workspace memory pointer, may be null.

  • workspaceSize[in] Workspace size, must be >= 0.


cutensornetWorkspaceSetMemory

cutensornetStatus_t cutensornetWorkspaceSetMemory(const cutensornetHandle_t handle, cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, void *const memoryPtr, int64_t memorySize)

Sets the memory address and size of the workspace provided by the user.

A workspace is valid in the following cases:

  • memoryPtr is valid and memorySize > 0

  • memoryPtr is null and memorySize > 0: used to indicate memory with the indicated memorySize should be drawn from the mempool, or for cutensornetCreateContractionPlan() to indicate the available workspace size.

  • memoryPtr is null and memorySize = 0: indicates the workspace of the specified kind is disabled (currently applies to CACHE kind only).

  • memoryPtr is null and memorySize < 0: indicates workspace memory should be drawn from the user’s mempool with the CUTENSORNET_WORKSIZE_PREF_RECOMMENDED size (see cutensornetWorksizePref_t).

The memorySize will be validated against the minimal required at usage (cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), cutensornetContraction(), cutensornetContractSlices())

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[inout] Opaque structure describing the workspace.

  • memSpace[in] The memory space where the workspace is allocated.

  • workKind[in] The kind of workspace.

  • memoryPtr[in] Workspace memory pointer, may be null.

  • memorySize[in] Workspace size.
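
A sketch of the query-then-set pattern for the device scratch workspace (the required sizes must have been computed first, e.g., via cutensornetWorkspaceComputeContractionSizes()):

int64_t requiredSize = 0;
cutensornetWorkspaceGetMemorySize(handle, workDesc,
    CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
    CUTENSORNET_WORKSPACE_SCRATCH, &requiredSize);
void *scratchPtr = NULL;
cudaMalloc(&scratchPtr, requiredSize);
cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE,
    CUTENSORNET_WORKSPACE_SCRATCH, scratchPtr, requiredSize);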


cutensornetWorkspaceGet

cutensornetStatus_t cutensornetWorkspaceGet(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, void **workspacePtr, uint64_t *workspaceSize)

DEPRECATED: Retrieves the memory address and size of the workspace hosted in the workspace descriptor.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[in] Opaque structure describing the workspace.

  • memSpace[in] The memory space where the workspace is allocated.

  • workspacePtr[out] Workspace memory pointer.

  • workspaceSize[out] Workspace size.


cutensornetWorkspaceGetMemory

cutensornetStatus_t cutensornetWorkspaceGetMemory(const cutensornetHandle_t handle, const cutensornetWorkspaceDescriptor_t workDesc, cutensornetMemspace_t memSpace, cutensornetWorkspaceKind_t workKind, void **memoryPtr, int64_t *memorySize)

Retrieves the memory address and size of the workspace hosted in the workspace descriptor.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • workDesc[in] Opaque structure describing the workspace.

  • memSpace[in] The memory space where the workspace is allocated.

  • workKind[in] The kind of workspace.

  • memoryPtr[out] Workspace memory pointer.

  • memorySize[out] Workspace size.


cutensornetDestroyWorkspaceDescriptor

cutensornetStatus_t cutensornetDestroyWorkspaceDescriptor(cutensornetWorkspaceDescriptor_t desc)

Frees the workspace descriptor.

Note that this API does not free the memory provided by cutensornetWorkspaceSetMemory().

Parameters

desc[inout] Opaque structure.


Network Contraction API

cutensornetContraction

cutensornetStatus_t cutensornetContraction(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, const cutensornetWorkspaceDescriptor_t workDesc, int64_t sliceId, cudaStream_t stream)

DEPRECATED: Performs the actual contraction of the tensor network.

Note

If multiple slices are created, the order of contracting over slices using cutensornetContraction() should be ascending starting from slice 0. If parallelizing over slices manually (in any fashion: streams, devices, processes, etc.), please make sure the output tensors (that are subject to a global reduction) are zero-initialized.

Note

Input and output data pointers are recommended to be 256-byte aligned for best performance.

Note

This function is asynchronous w.r.t. the calling CPU thread. The user should guarantee that the memory buffer provided in workDesc is valid until a synchronization with the stream or the device is executed.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • plan[inout] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()). Some internal meta-data may be updated upon contraction.

  • rawDataIn[in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).

  • rawDataOut[out] Points to the raw data of the output tensor (in device memory).

  • workDesc[in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory()). If a device memory handler is set, then workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.

  • sliceId[in] The ID of the slice that is currently contracted (this value ranges between 0 and optimizerInfo.numSlices); use 0 if no slices are used.

  • stream[in] The CUDA stream on which the computation is performed.


cutensornetContractSlices

cutensornetStatus_t cutensornetContractSlices(const cutensornetHandle_t handle, cutensornetContractionPlan_t plan, const void *const rawDataIn[], void *rawDataOut, int32_t accumulateOutput, const cutensornetWorkspaceDescriptor_t workDesc, const cutensornetSliceGroup_t sliceGroup, cudaStream_t stream)

Performs the actual contraction of the tensor network.

Note

Input and output data pointers are recommended to be 256-byte aligned for best performance.

Warning

In the current release, this function will synchronize the stream in case distributed execution is activated (via cutensornetDistributedResetConfiguration()).

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • plan[inout] Encodes the execution of a tensor network contraction (see cutensornetCreateContractionPlan() and cutensornetContractionAutotune()). Some internal meta-data may be updated upon contraction.

  • rawDataIn[in] Array of N pointers (N being the number of input tensors specified in cutensornetCreateNetworkDescriptor()); rawDataIn[i] points to the data associated with the i-th input tensor (in device memory).

  • rawDataOut[out] Points to the raw data of the output tensor (in device memory).

  • accumulateOutput[in] If 0, the contraction result is written into rawDataOut; otherwise, the result is accumulated into rawDataOut.

  • workDesc[in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than both the minimum needed and the value provided at plan creation). See cutensornetCreateContractionPlan(), cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory(). If a device memory handler is set, workDesc can be set to null, or the workspace pointer in workDesc can be set to null, and the workspace size can be set either to 0 (in which case the “recommended” size is used, see CUTENSORNET_WORKSIZE_PREF_RECOMMENDED) or to a valid size. A workspace of the specified size will be drawn from the user’s mempool and released back once done.

  • sliceGroup[in] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()). If set to null, all slices will be contracted.

  • stream[in] The CUDA stream on which the computation is performed.


Slice Group API

cutensornetCreateSliceGroupFromIDRange

cutensornetStatus_t cutensornetCreateSliceGroupFromIDRange(const cutensornetHandle_t handle, int64_t sliceIdStart, int64_t sliceIdStop, int64_t sliceIdStep, cutensornetSliceGroup_t *sliceGroup)

Creates a cutensornetSliceGroup_t object from a range, which produces a sequence of slice IDs from the specified start (inclusive) to the specified stop (exclusive) values with the specified step. The sequence can be increasing or decreasing depending on the start and stop values.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • sliceIdStart[in] The start slice ID.

  • sliceIdStop[in] The final slice ID is the largest (smallest) integer that excludes this value and all those above (below) for an increasing (decreasing) sequence.

  • sliceIdStep[in] The step size between two successive slice IDs. A negative step size should be specified for a decreasing sequence.

  • sliceGroup[out] Opaque object specifying the slice IDs.
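
A sketch that contracts all slices through one slice group (plan, data pointers, workDesc, stream, and numSlices are assumed to be set up as described above):

cutensornetSliceGroup_t sliceGroup;
cutensornetCreateSliceGroupFromIDRange(handle, 0, numSlices, 1, &sliceGroup);
cutensornetContractSlices(handle, plan, rawDataIn, rawDataOut,
    /*accumulateOutput=*/0, workDesc, sliceGroup, stream);
cutensornetDestroySliceGroup(sliceGroup);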


cutensornetCreateSliceGroupFromIDs

cutensornetStatus_t cutensornetCreateSliceGroupFromIDs(const cutensornetHandle_t handle, const int64_t *beginIDSequence, const int64_t *endIDSequence, cutensornetSliceGroup_t *sliceGroup)

Creates a cutensornetSliceGroup_t object from a sequence of slice IDs. Duplicates in the input slice ID sequence will be removed.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • beginIDSequence[in] A pointer to the beginning of the slice ID sequence.

  • endIDSequence[in] A pointer to the end of the slice ID sequence.

  • sliceGroup[out] Opaque object specifying the slice IDs.


cutensornetDestroySliceGroup

cutensornetStatus_t cutensornetDestroySliceGroup(cutensornetSliceGroup_t sliceGroup)

Releases the resources associated with a cutensornetSliceGroup_t object and sets its value to null.

Parameters

sliceGroup[inout] Opaque object specifying the slices to be contracted (see cutensornetCreateSliceGroupFromIDRange() and cutensornetCreateSliceGroupFromIDs()).


Approximate Tensor Network Execution API

cutensornetTensorQR

cutensornetStatus_t cutensornetTensorQR(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const void *const rawDataIn, const cutensornetTensorDescriptor_t descTensorQ, void *q, const cutensornetTensorDescriptor_t descTensorR, void *r, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)

Performs QR decomposition of a tensor.

The partition of all input modes in descTensorIn is specified in descTensorQ and descTensorR. descTensorQ and descTensorR are expected to share exactly one mode and the extent of that mode shall not exceed the minimum of m (row dimension) and n (column dimension) of the equivalent combined matrix QR.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorIn[in] Describes the modes, extents, and other metadata information of a tensor.

  • rawDataIn[in] Pointer to the raw data of the input tensor (in device memory).

  • descTensorQ[in] Describes the modes, extents, and other metadata information of the output tensor Q.

  • q[out] Pointer to the output tensor data Q (in device memory).

  • descTensorR[in] Describes the modes, extents, and other metadata information of the output tensor R.

  • r[out] Pointer to the output tensor data R (in device memory).

  • workDesc[in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than the minimum needed). See cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory().

  • stream[in] The CUDA stream on which the computation is performed.
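
A compact sketch of the call sequence (the tensor descriptors, device buffers, workDesc, and stream are assumed to be prepared as described above):

cutensornetWorkspaceComputeQRSizes(handle, descTensorIn, descTensorQ,
                                   descTensorR, workDesc);
// ... query and attach workspace memory (see the Workspace Management API) ...
cutensornetTensorQR(handle, descTensorIn, rawDataIn,
                    descTensorQ, q, descTensorR, r, workDesc, stream);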


cutensornetTensorSVD

cutensornetStatus_t cutensornetTensorSVD(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorIn, const void *const rawDataIn, cutensornetTensorDescriptor_t descTensorU, void *u, void *s, cutensornetTensorDescriptor_t descTensorV, void *v, const cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDInfo_t svdInfo, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)

Performs SVD decomposition of a tensor.

The partition of all input modes in descTensorIn is specified in descTensorU and descTensorV. descTensorU and descTensorV are expected to share exactly one mode. The extent of the shared mode shall not exceed the minimum of m (row dimension) and n (column dimension) for the equivalent combined matrix SVD. The following variants of tensor SVD are supported:

  1. Exact SVD: This can be specified by setting the extent of the shared mode in descTensorU and descTensorV to be the minimum of m and n, and setting svdConfig to NULL.

  2. SVD with fixed extent truncation: This can be specified by setting the extent of the shared mode in descTensorU and descTensorV to be lower than the minimum of m and n.

  3. SVD with value-based truncation: This can be specified by setting the CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF or CUTENSORNET_TENSOR_SVD_CONFIG_REL_CUTOFF attribute of svdConfig.

  4. SVD with a combination of fixed extent and value-based truncation as described above.

Note

In the case of exact SVD or SVD with fixed extent truncation, descTensorU and descTensorV will remain constant after the execution. The data in u and v will respect the extent and stride in these tensor descriptors.

Note

When value-based truncation is requested in svdConfig, cutensornetTensorSVD searches for the minimal extent that satisfies both the value-based truncation and the fixed extent requirement. If the resulting extent is the same as the one specified in the U/V tensor descriptors, the extent and stride from the tensor descriptors will be respected. If the resulting extent is lower than the one specified in the U/V tensor descriptors, the data in u and v will adopt a new Fortran layout matching the reduced extent found. The extent and stride in descTensorU and descTensorV will also be overwritten to reflect this change. The user can query the reduced extent with cutensornetTensorSVDInfoGetAttribute() or cutensornetGetTensorDetails() (which also returns the new strides).

Note

As the reduced size for value-based truncation is not known until runtime, the user should always allocate u and v based on the full data size specified by the initial descTensorU and descTensorV.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorIn[in] Describes the modes, extents, and other metadata information of a tensor.

  • rawDataIn[in] Pointer to the raw data of the input tensor (in device memory).

  • descTensorU[inout] Describes the modes, extents, and other metadata information of the output tensor U. The extents for uncontracted modes are expected to be consistent with descTensorIn.

  • u[out] Pointer to the output tensor data U (in device memory).

  • s[out] Pointer to the output tensor data S (in device memory). Can be NULL when the CUTENSORNET_TENSOR_SVD_CONFIG_S_PARTITION attribute of svdConfig is not set to default (CUTENSORNET_TENSOR_SVD_PARTITION_NONE).

  • descTensorV[inout] Describes the modes, extents, and other metadata information of the output tensor V.

  • v[out] Pointer to the output tensor data V (in device memory).

  • svdConfig[in] This data structure holds the user-requested SVD parameters. Can be NULL if users do not need to perform value-based truncation or singular value partitioning.

  • svdInfo[out] Opaque structure holding all information about the truncation at runtime. Can be NULL if runtime information on singular value truncation is not needed.

  • workDesc[in] Opaque structure describing the workspace. The provided workspace must be valid (the workspace size must be the same as or larger than the minimum needed). See cutensornetWorkspaceGetMemorySize() & cutensornetWorkspaceSetMemory().

  • stream[in] The CUDA stream on which the computation is performed.
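
A compact sketch of the call sequence, analogous to the QR case (all descriptors, buffers, and configuration objects are assumed to be prepared as described above):

cutensornetWorkspaceComputeSVDSizes(handle, descTensorIn, descTensorU,
                                    descTensorV, svdConfig, workDesc);
// ... query and attach workspace memory (see the Workspace Management API) ...
cutensornetTensorSVD(handle, descTensorIn, rawDataIn, descTensorU, u, s,
                     descTensorV, v, svdConfig, svdInfo, workDesc, stream);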


cutensornetGateSplit

cutensornetStatus_t cutensornetGateSplit(const cutensornetHandle_t handle, const cutensornetTensorDescriptor_t descTensorInA, const void *rawDataInA, const cutensornetTensorDescriptor_t descTensorInB, const void *rawDataInB, const cutensornetTensorDescriptor_t descTensorInG, const void *rawDataInG, cutensornetTensorDescriptor_t descTensorU, void *u, void *s, cutensornetTensorDescriptor_t descTensorV, void *v, const cutensornetGateSplitAlgo_t gateAlgo, const cutensornetTensorSVDConfig_t svdConfig, cutensornetComputeType_t computeType, cutensornetTensorSVDInfo_t svdInfo, const cutensornetWorkspaceDescriptor_t workDesc, cudaStream_t stream)

Performs gate split operation.

descTensorInA, descTensorInB, and descTensorInG are expected to form a fully connected graph where the uncontracted modes are partitioned onto descTensorU and descTensorV via tensor SVD. descTensorU and descTensorV are expected to share exactly one mode. The extent of that mode shall not exceed the minimum of m (row dimension) and n (column dimension) of the smallest equivalent matrix SVD problem.

Note

The options for truncation and the treatment of extent and stride follows the same logic as tensor SVD, see cutensornetTensorSVD().

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • descTensorInA[in] Describes the modes, extents, and other metadata information of the input tensor A.

  • rawDataInA[in] Pointer to the raw data of the input tensor A (in device memory).

  • descTensorInB[in] Describes the modes, extents, and other metadata information of the input tensor B.

  • rawDataInB[in] Pointer to the raw data of the input tensor B (in device memory).

  • descTensorInG[in] Describes the modes, extents, and other metadata information of the input gate tensor.

  • rawDataInG[in] Pointer to the raw data of the input gate tensor G (in device memory).

  • descTensorU[in] Describes the modes, extents, and other metadata information of the output U tensor. The extents of uncontracted modes are expected to be consistent with descTensorInA and descTensorInG.

  • u[out] Pointer to the output tensor data U (in device memory).

  • s[out] Pointer to the output tensor data S (in device memory). Can be NULL when the CUTENSORNET_TENSOR_SVD_CONFIG_S_PARTITION attribute of svdConfig is not set to default (CUTENSORNET_TENSOR_SVD_PARTITION_NONE).

  • descTensorV[in] Describes the modes, extents, and other metadata information of the output V tensor. The extents of uncontracted modes are expected to be consistent with descTensorInB and descTensorInG.

  • v[out] Pointer to the output tensor data V (in device memory).

  • gateAlgo[in] The algorithm to use for splitting the gate tensor onto tensors A and B.

  • svdConfig[in] Opaque structure holding the user-requested SVD parameters.

  • computeType[in] Denotes the compute type used throughout the computation.

  • svdInfo[out] Opaque structure holding all information about the truncation at runtime.

  • workDesc[in] Opaque structure describing the workspace.

  • stream[in] The CUDA stream on which the computation is performed.


Tensor SVD Config API

cutensornetCreateTensorSVDConfig

cutensornetStatus_t cutensornetCreateTensorSVDConfig(const cutensornetHandle_t handle, cutensornetTensorSVDConfig_t *svdConfig)

Sets up the options for singular value decomposition and truncation.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorSVDConfig() is called once svdConfig is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • svdConfig[out] This data structure holds the user-requested SVD parameters.


cutensornetDestroyTensorSVDConfig

cutensornetStatus_t cutensornetDestroyTensorSVDConfig(cutensornetTensorSVDConfig_t svdConfig)

Frees all the memory associated with the tensor SVD configuration.

Parameters

svdConfig[inout] Opaque handle to a tensor SVD configuration.


cutensornetTensorSVDConfigGetAttribute

cutensornetStatus_t cutensornetTensorSVDConfigGetAttribute(const cutensornetHandle_t handle, const cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDConfigAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of svdConfig.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • svdConfig[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that is requested.

  • buf[out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within svdConfig.

  • sizeInBytes[in] Size of buf (in bytes).


cutensornetTensorSVDConfigSetAttribute

cutensornetStatus_t cutensornetTensorSVDConfigSetAttribute(const cutensornetHandle_t handle, cutensornetTensorSVDConfig_t svdConfig, cutensornetTensorSVDConfigAttributes_t attr, const void *buf, size_t sizeInBytes)

Sets attributes of svdConfig.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • svdConfig[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that will be set.

  • buf[in] This buffer (of size sizeInBytes) determines the value to which attr will be set.

  • sizeInBytes[in] Size of buf (in bytes).
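
For example, a sketch enabling value-based truncation (see variant 3 under cutensornetTensorSVD()):

double absCutoff = 1e-8;  // discard singular values below this magnitude
cutensornetTensorSVDConfigSetAttribute(handle, svdConfig,
    CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF, &absCutoff, sizeof(absCutoff));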


Tensor SVD Info API

cutensornetCreateTensorSVDInfo

cutensornetStatus_t cutensornetCreateTensorSVDInfo(const cutensornetHandle_t handle, cutensornetTensorSVDInfo_t *svdInfo)

Sets up the information for singular value decomposition.

Note that this function allocates data on the heap; hence, it is critical that cutensornetDestroyTensorSVDInfo() is called once svdInfo is no longer required.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • svdInfo[out] This data structure holds all information about the truncation at runtime.


cutensornetDestroyTensorSVDInfo

cutensornetStatus_t cutensornetDestroyTensorSVDInfo(cutensornetTensorSVDInfo_t svdInfo)

Frees all the memory associated with the TensorSVDInfo object.

Parameters

svdInfo[inout] Opaque handle to a TensorSVDInfo object.


cutensornetTensorSVDInfoGetAttribute

cutensornetStatus_t cutensornetTensorSVDInfoGetAttribute(const cutensornetHandle_t handle, const cutensornetTensorSVDInfo_t svdInfo, cutensornetTensorSVDInfoAttributes_t attr, void *buf, size_t sizeInBytes)

Gets attributes of svdInfo.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • svdInfo[in] Opaque structure that is accessed.

  • attr[in] Specifies the attribute that is requested.

  • buf[out] On return, this buffer (of size sizeInBytes) holds the value that corresponds to attr within svdInfo.

  • sizeInBytes[in] Size of buf (in bytes).
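
For example, a sketch querying the post-truncation shared-mode extent found at runtime (the enumerator belongs to cutensornetTensorSVDInfoAttributes_t):

int64_t reducedExtent = 0;
cutensornetTensorSVDInfoGetAttribute(handle, svdInfo,
    CUTENSORNET_TENSOR_SVD_INFO_REDUCED_EXTENT,
    &reducedExtent, sizeof(reducedExtent));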


Distributed Parallelization API

cutensornetDistributedResetConfiguration

cutensornetStatus_t cutensornetDistributedResetConfiguration(cutensornetHandle_t handle, const void *commPtr, size_t commSize)

Resets the distributed MPI parallelization configuration.

This function accepts a user-provided MPI communicator in a type-erased form and stores a copy of it inside the cuTensorNet library handle. The provided MPI communicator must be explicitly created by calling MPI_Comm_dup (please see the MPI specification). The subsequent calls to the contraction path finder, contraction plan autotuning, and contraction execution will be parallelized across all MPI processes in the provided MPI communicator. The provided MPI communicator is owned by the user; it must stay alive until the next reset call with a different MPI communicator. If NULL is provided as the pointer to the MPI communicator, no parallelization will be applied to the above-mentioned procedures, which will then execute redundantly across all MPI processes. As an example, please refer to the tensornet_example_mpi_auto.cu sample.

To enable distributed parallelism, cuTensorNet requires users to set an environment variable $CUTENSORNET_COMM_LIB containing the path to a shared library wrapping the communication primitives. For MPI users, we ship a wrapper file cutensornet_distributed_interface_mpi.c that can be compiled against the MPI library. cuTensorNet will use the included function pointers to perform inter-process communication.

Warning

This is a collective call that must be executed by all MPI processes. Note that one can still provide different (non-NULL) MPI communicators to different subgroups of MPI processes (to create concurrent cuTensorNet distributed subgroups).

Warning

The provided MPI communicator must not be used by more than one cuTensorNet library handle. This is automatically ensured by using MPI_Comm_dup.

Warning

The current library implementation assumes one GPU instance per MPI rank since the cutensornet library handle is associated with a single GPU instance. In case of multiple GPUs per node, each MPI process running on the same node may still see all GPU devices if CUDA_VISIBLE_DEVICES was not set to provide an exclusive access to each GPU. In such a case, the cutensornet library runtime will assign GPU #(processRank % numVisibleDevices), where processRank is the rank of the current process in its MPI communicator, and numVisibleDevices is the number of GPU devices visible to the current MPI process. The assigned GPU must coincide with the one associated with the cutensornet library handle, otherwise resulting in an error. To ensure consistency, the user must call cudaSetDevice in each MPI process to select the correct GPU device prior to creating a cutensornet library handle.

Warning

It is the user’s responsibility to ensure that each MPI process in each provided MPI communicator executes exactly the same sequence of cutensornet API calls; otherwise, the behavior is undefined.

Parameters
  • handle[in] cuTensorNet library handle.

  • commPtr[in] A pointer to the provided MPI communicator created by MPI_Comm_dup.

  • commSize[in] The size of the provided MPI communicator: sizeof(MPI_Comm).


cutensornetDistributedGetNumRanks

cutensornetStatus_t cutensornetDistributedGetNumRanks(const cutensornetHandle_t handle, int32_t *numRanks)

Queries the number of MPI ranks in the current distributed MPI configuration.

Warning

The number of ranks corresponds to the MPI communicator used by the current MPI process. If different subgroups of MPI processes use different MPI communicators, the reported number will refer to their specific MPI communicators.

Parameters
  • handle[in] cuTensorNet library handle.

  • numRanks[out] Number of MPI ranks in the current distributed MPI configuration.


cutensornetDistributedGetProcRank

cutensornetStatus_t cutensornetDistributedGetProcRank(const cutensornetHandle_t handle, int32_t *procRank)

Queries the rank of the current MPI process in the current distributed MPI configuration.

Warning

The MPI process rank corresponds to the MPI communicator used by that MPI process. If different subgroups of MPI processes use different MPI communicators, the reported rank will refer to their specific MPI communicators.

Parameters
  • handle[in] cuTensorNet library handle.

  • procRank[out] Rank of the current MPI process in the current distributed MPI configuration.
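
A short sketch combining this call with cutensornetDistributedGetNumRanks(), e.g., to restrict printing to the root process:

int32_t numRanks = 0;
int32_t procRank = -1;
cutensornetDistributedGetNumRanks(handle, &numRanks);
cutensornetDistributedGetProcRank(handle, &procRank);
if (procRank == 0) {
    printf("Distributed run over %d MPI ranks\n", numRanks);
}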


cutensornetDistributedSynchronize

cutensornetStatus_t cutensornetDistributedSynchronize(const cutensornetHandle_t handle)

Globally synchronizes all MPI processes in the current distributed MPI configuration, ensuring that all preceding cutensornet API calls have completed across all MPI processes.

Warning

This is a collective call that must be executed by all MPI processes.

Warning

Prior to performing the global synchronization, the user is still required to synchronize GPU operations locally (via CUDA stream synchronization).

Parameters

handle[in] cuTensorNet library handle.
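
A typical usage sketch, assuming stream is the CUDA stream on which the preceding contraction work was enqueued:

// First complete local GPU work on the stream used by this process ...
cudaStreamSynchronize(stream);
// ... then synchronize across all MPI processes in the communicator.
cutensornetDistributedSynchronize(handle);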


Memory Management API

A stream-ordered memory allocator (or mempool for short) allocates/deallocates memory asynchronously from/to a mempool in a stream-ordered fashion, meaning memory operations and computations enqueued on the streams have a well-defined inter- and intra-stream dependency. Several well-implemented stream-ordered mempools are available, such as cudaMemPool_t, which has been built in at the CUDA driver level since CUDA 11.2 (so that all CUDA applications in the same process can easily share the same pool), and the RAPIDS Memory Manager (RMM). For a detailed introduction, see the NVIDIA Developer Blog.

The new device memory handler APIs allow users to bind a stream-ordered mempool to the library handle, such that cuTensorNet can take care of most of the memory management for users. Below is an illustration of what can be done:

MyMemPool pool = MyMemPool();  // kept alive for the entire process in real apps

int my_alloc(void* ctx, void** ptr, size_t size, cudaStream_t stream) {
  // assuming this is the memory allocation routine provided by my mempool
  return reinterpret_cast<MyMemPool*>(ctx)->alloc(ptr, size, stream);
}

int my_dealloc(void* ctx, void* ptr, size_t size, cudaStream_t stream) {
  // assuming this is the memory deallocation routine provided by my mempool
  return reinterpret_cast<MyMemPool*>(ctx)->dealloc(ptr, size, stream);
}

// create a mem handler and fill in the required members for the library to use
cutensornetDeviceMemHandler_t handler;
handler.ctx = reinterpret_cast<void*>(&pool);
handler.device_alloc = my_alloc;
handler.device_free = my_dealloc;
strncpy(handler.name, "my pool", CUTENSORNET_ALLOCATOR_NAME_LEN);  // zero-pads; avoids reading past the short source string

// bind the handler to the library handle
cutensornetSetDeviceMemHandler(handle, &handler);

/* ... perform the network creation & optimization as usual ... */

// create a workspace descriptor
cutensornetWorkspaceDescriptor_t workDesc;
// (this step is optional and workDesc can be set to NULL if one just wants
// to use the "recommended" workspace size)
cutensornetCreateWorkspaceDescriptor(handle, &workDesc);

// User doesn’t compute the required sizes

// User doesn’t query the workspace size (but one can if desired)

// User doesn’t allocate memory!

// User sets workspacePtr=NULL for the corresponding memory space (device, in this case) to indicate the library should
// draw memory (of the "recommended" size, if the workspace size is set to 0 as shown below) from the user's pool;
// if a nonzero size is set, we would use the given size instead of the recommended one.
// (this step is also optional if workDesc has been set to NULL)
cutensornetWorkspaceSetMemory(handle, workDesc, CUTENSORNET_MEMSPACE_DEVICE, CUTENSORNET_WORKSPACE_SCRATCH, NULL, 0);

// create a contraction plan
cutensornetContractionPlan_t plan;
cutensornetCreateContractionPlan(handle, descNet, optimizerInfo, workDesc, &plan);

// autotune the plan with the workspace
cutensornetContractionAutotune(handle, plan, rawDataIn, rawDataOut, workDesc, pref, stream);

// perform actual contraction with the workspace
for (int sliceId=0; sliceId<num_slices; sliceId++) {
    cutensornetContraction(
        handle, plan, rawDataIn, rawDataOut, workDesc, sliceId, stream);
}

// clean up
cutensornetDestroyContractionPlan(plan);
cutensornetDestroyWorkspaceDescriptor(workDesc);  // optional if workDesc has been set to NULL
// User doesn’t deallocate memory!

As shown above, several calls to the workspace-related APIs can be skipped. Moreover, allowing the library to share your memory pool not only alleviates potential memory conflicts but also enables possible optimizations.

Note

In the current release, only a device mempool can be bound.

cutensornetSetDeviceMemHandler

cutensornetStatus_t cutensornetSetDeviceMemHandler(cutensornetHandle_t handle, const cutensornetDeviceMemHandler_t *devMemHandler)

Sets the current device memory handler.

Once set, when cuTensorNet needs device memory in various API calls it will allocate from the user-provided memory pool and deallocate at completion. See cutensornetDeviceMemHandler_t and APIs that require cutensornetWorkspaceDescriptor_t for further detail.

The internal stream order is established using the user-provided stream passed to cutensornetContractionAutotune() and cutensornetContraction().
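
For instance, instead of a custom pool, the handler can be backed by the built-in stream-ordered CUDA runtime allocator (CUDA 11.2+); the sketch below is one possible implementation under that assumption, not the only one:

// Callbacks backed by cudaMallocAsync/cudaFreeAsync; no pool context needed.
int default_pool_alloc(void *ctx, void **ptr, size_t size, cudaStream_t stream) {
  return static_cast<int>(cudaMallocAsync(ptr, size, stream));
}
int default_pool_free(void *ctx, void *ptr, size_t size, cudaStream_t stream) {
  return static_cast<int>(cudaFreeAsync(ptr, stream));
}

cutensornetDeviceMemHandler_t handler{};
handler.ctx = nullptr;  // the built-in allocator needs no pool object
handler.device_alloc = default_pool_alloc;
handler.device_free = default_pool_free;
strncpy(handler.name, "cudaMallocAsync", CUTENSORNET_ALLOCATOR_NAME_LEN);
cutensornetSetDeviceMemHandler(handle, &handler);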

Warning

The behavior is undefined in the following scenarios:

  • the library handle is bound to a memory handler and subsequently to another handler

  • the library handle outlives the attached memory pool

  • the memory pool is not stream-ordered

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • devMemHandler[in] The device memory handler that encapsulates the user’s mempool. The struct content is copied internally.


cutensornetGetDeviceMemHandler

cutensornetStatus_t cutensornetGetDeviceMemHandler(const cutensornetHandle_t handle, cutensornetDeviceMemHandler_t *devMemHandler)

Gets the current device memory handler.

Parameters
  • handle[in] Opaque handle holding cuTensorNet’s library context.

  • devMemHandler[out] If previously set, the struct pointed to by devMemHandler is filled in; otherwise, CUTENSORNET_STATUS_NO_DEVICE_ALLOCATOR is returned.
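
A brief sketch of checking whether a mempool has already been bound:

cutensornetDeviceMemHandler_t current{};
cutensornetStatus_t status = cutensornetGetDeviceMemHandler(handle, &current);
if (status == CUTENSORNET_STATUS_NO_DEVICE_ALLOCATOR) {
  // no device mempool has been bound to this handle yet
}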

Error Management API

cutensornetGetErrorString

const char *cutensornetGetErrorString(cutensornetStatus_t error)

Returns the description string for an error code.

Remark

non-blocking, no reentrant, and thread-safe

Parameters

error[in] Error code to convert to string.

Returns

the error string
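
A common error-checking pattern built on this function (a sketch, not part of the library; the HANDLE_CUTN_ERROR name is hypothetical):

#define HANDLE_CUTN_ERROR(call)                                        \
  do {                                                                 \
    const cutensornetStatus_t status_ = (call);                        \
    if (status_ != CUTENSORNET_STATUS_SUCCESS) {                       \
      fprintf(stderr, "cuTensorNet error at %s:%d: %s\n",              \
              __FILE__, __LINE__, cutensornetGetErrorString(status_)); \
      exit(EXIT_FAILURE);                                              \
    }                                                                  \
  } while (0)

HANDLE_CUTN_ERROR(cutensornetCreate(&handle));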

Logger API

cutensornetLoggerSetCallback

cutensornetStatus_t cutensornetLoggerSetCallback(cutensornetLoggerCallback_t callback)

This function sets the logging callback routine.

Parameters

callback[in] Pointer to a callback function. Check cutensornetLoggerCallback_t.


cutensornetLoggerSetCallbackData

cutensornetStatus_t cutensornetLoggerSetCallbackData(cutensornetLoggerCallbackData_t callback, void *userData)

This function sets the logging callback routine, along with user data.

Parameters
  • callback[in] Pointer to a callback function. Check cutensornetLoggerCallbackData_t.

  • userData[in] Pointer to user-provided data to be used by the callback.
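
A sketch of a user callback, assuming the (logLevel, functionName, message, userData) signature declared by cutensornetLoggerCallbackData_t (please verify against the header of your release):

void my_logger(int32_t logLevel, const char *functionName,
               const char *message, void *userData) {
  FILE *f = static_cast<FILE*>(userData);  // user data passed through the void pointer
  fprintf(f, "[level %d] %s: %s\n", logLevel, functionName, message);
}

// Route all log records to stderr:
cutensornetLoggerSetCallbackData(my_logger, stderr);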


cutensornetLoggerSetFile

cutensornetStatus_t cutensornetLoggerSetFile(FILE *file)

This function sets the logging output file.

Parameters

file[in] An open file with write permission.


cutensornetLoggerOpenFile

cutensornetStatus_t cutensornetLoggerOpenFile(const char *logFile)

This function opens a logging output file in the given path.

Parameters

logFile[in] Path to the logging output file.


cutensornetLoggerSetLevel

cutensornetStatus_t cutensornetLoggerSetLevel(int32_t level)

This function sets the value of the logging level.

Parameters

level[in] Log level, should be one of the following:

Level  Summary            Long Description
0      Off                logging is disabled (default)
1      Errors             only errors will be logged
2      Performance Trace  API calls that launch CUDA kernels will log their parameters and important information
3      Performance Hints  hints that can potentially improve the application’s performance
4      Heuristics Trace   provides general information about the library execution, may contain details about heuristic status
5      API Trace          API calls will log their parameters and important information
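
For example:

cutensornetLoggerSetLevel(1);  // log errors only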


cutensornetLoggerSetMask

cutensornetStatus_t cutensornetLoggerSetMask(int32_t mask)

This function sets the value of the log mask.

Refer to cutensornetLoggerSetLevel() for details.

Parameters

mask[in] Value of the logging mask. Masks are defined as a combination (bitwise OR) of the following masks:

Mask  Description
0     Off
1     Errors
2     Performance Trace
4     Performance Hints
8     Heuristics Trace
16    API Trace
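
For example, to enable selected categories simultaneously:

cutensornetLoggerSetMask(1 | 4 | 16);  // errors + performance hints + API trace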


cutensornetLoggerForceDisable

cutensornetStatus_t cutensornetLoggerForceDisable()

This function disables logging for the entire run.

Versioning API

cutensornetGetVersion

size_t cutensornetGetVersion()

Returns the version number of the cuTensorNet library.


cutensornetGetCudartVersion

size_t cutensornetGetCudartVersion()

Returns the version number of the CUDA runtime that cuTensorNet was compiled against.

Can be compared against the CUDA runtime version from cudaRuntimeGetVersion().
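
For example:

size_t cutnVersion = cutensornetGetVersion();
size_t cutnCudartVersion = cutensornetGetCudartVersion();
int cudartVersion = 0;
cudaRuntimeGetVersion(&cudartVersion);  // version of the CUDA runtime in use
printf("cuTensorNet %zu (compiled against CUDA runtime %zu, running on %d)\n",
       cutnVersion, cutnCudartVersion, cudartVersion);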