Release Notes

cuTensorNet v1.0.1

Bugs fixed:
- Fix a workspace pointer alignment issue.
- Fix a potential path optimizer issue: the optimizer now performs more slicing instead of returning CUTENSORNET_STATUS_NOT_SUPPORTED.
Performance enhancements:
- Improved support for generalized einsum expressions, which yields better contraction paths.
Other changes:
- The Overview and Getting Started pages are significantly improved!
- Clarify in the documentation and sample that the contraction over slices needs to be done in ascending order, and that when parallelizing over the slices the output tensor should be zero-initialized.
- Clarify in the documentation that the returned FLOP count assumes real-valued inputs.
- Several issues in the C++ sample (samples/cutensornet/tensornet_example.cu) are fixed.
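
The note above about the FLOP count assuming real-valued inputs can be illustrated with a small conversion helper. This is a minimal sketch, not part of the library: the function name `scale_flop_count` is hypothetical, and the factor of 4 relies on the common convention that a complex multiply-add costs 8 real floating-point operations versus 2 for a real multiply-add. Verify the convention against your own performance model before using it.

```python
# Hypothetical helper, not a cuTensorNet API.
# Assumed convention: a real multiply-add is 2 FLOPs; a complex
# multiply-add is 8 real FLOPs (4 multiplies + 4 additions).
REAL_FLOPS_PER_FMA = 2
COMPLEX_FLOPS_PER_FMA = 8


def scale_flop_count(real_flops, complex_inputs):
    """Convert a FLOP count reported for real-valued inputs.

    real_flops: the count as reported under the real-valued assumption.
    complex_inputs: True if the tensors actually hold complex data.
    """
    if not complex_inputs:
        return real_flops
    return real_flops * (COMPLEX_FLOPS_PER_FMA // REAL_FLOPS_PER_FMA)


reported = 1_000_000
estimated_complex = scale_flop_count(reported, complex_inputs=True)
estimated_real = scale_flop_count(reported, complex_inputs=False)
```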

Functionality/performance improvements:
- Greatly reduced the workspace memory size required.
- Reduced the execution time of the pathfinder with multithreading and internal optimization.
- Support for hyperedges in tensor networks.
- Support for tensor networks described by generalized Einstein summation expressions.
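
As an illustration of what a hyperedge means in a contraction, the sketch below contracts three small tensors that all share the same summed index `i`, which corresponds to the einsum-style expression `ij,ik,il->jkl`. It is plain Python for exposition only and does not call cuTensorNet; the function name `contract_hyperedge` is hypothetical.

```python
# Conceptual sketch only: plain Python, no cuTensorNet calls.
# The index i is shared by three tensors (a hyperedge) and summed over:
#   out[j][k][l] = sum_i a[i][j] * b[i][k] * c[i][l]

def contract_hyperedge(a, b, c):
    n_i = len(a)
    n_j, n_k, n_l = len(a[0]), len(b[0]), len(c[0])
    out = [[[0] * n_l for _ in range(n_k)] for _ in range(n_j)]
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                for l in range(n_l):
                    out[j][k][l] += a[i][j] * b[i][k] * c[i][l]
    return out


a = [[1, 2], [3, 4]]   # shape (i=2, j=2)
b = [[1], [2]]         # shape (i=2, k=1)
c = [[5, 6], [7, 8]]   # shape (i=2, l=2)
result = contract_hyperedge(a, b, c)
```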
Add new APIs and functionalities for:
- Managing workspace (see Workspace Management API for details).
- Binding a user-provided, stream-ordered memory pool to the library (see Memory Management API for details).
- Querying the output tensor details (see cutensornetGetOutputTensorDetails()).
- Setting the number of threads for the hyperoptimizer (see Hyper-optimizer for details).
- Setting a logger callback with user-provided data (see cutensornetLoggerSetCallbackData()).
API changes:
- Replaced cutensornetContractionGetWorkspaceSize with cutensornetWorkspaceComputeSizes().
- cutensornetCreateContractionPlan(), cutensornetContractionAutotune(), and cutensornetContraction() receive a workspace descriptor instead of workspace pointer and size params.
- Renamed cutensornetGraphAlgo_t and cutensornetMemoryModel_t enumerations’ options.

Compatibility notes:

Limitation notes:

If multiple slices are created, contract over them with cutensornetContraction() in ascending order, starting from slice 0. When parallelizing over slices manually (in any fashion: streams, devices, processes, etc.), make sure the output tensors (which are subject to a global reduction) are zero-initialized.
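
The slicing rule above can be pictured with a small stand-alone sketch. It uses plain Python lists rather than cuTensorNet calls, and the helper names (`contract_slice`, `contract_serial`, `contract_parallel`) are hypothetical. The contraction is a matrix product whose summed index `k` plays the role of the sliced mode: the serial variant visits slices in ascending order from slice 0, while the manually parallelized variant gives each worker its own zero-initialized output and then performs the global reduction.

```python
# Conceptual sketch only: plain Python, no cuTensorNet calls.
# C[i][j] = sum_k A[i][k] * B[k][j], with k treated as the sliced mode.

def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def contract_slice(A, B, k, C):
    """Accumulate the contribution of slice k into C (in place)."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            C[i][j] += A[i][k] * B[k][j]


def contract_serial(A, B):
    C = zeros(len(A), len(B[0]))
    for k in range(len(B)):          # ascending order, starting at slice 0
        contract_slice(A, B, k, C)
    return C


def contract_parallel(A, B, num_workers):
    """Emulate manual parallelization: one partial output per worker."""
    rows, cols = len(A), len(B[0])
    # Each worker's output must start from zero before it accumulates.
    partials = [zeros(rows, cols) for _ in range(num_workers)]
    for k in range(len(B)):
        contract_slice(A, B, k, partials[k % num_workers])
    # Global reduction over the per-worker outputs.
    C = zeros(rows, cols)
    for part in partials:
        for i in range(rows):
            for j in range(cols):
                C[i][j] += part[i][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
serial = contract_serial(A, B)
parallel = contract_parallel(A, B, num_workers=2)
```

If a worker's partial output started from stale values instead of zeros, the reduction would add that garbage to the final result, which is why the zero-initialization matters.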

Initial public release
Add support for Linux ppc64le
Add new APIs and functionalities for:
- Fine-tuning the slicing algorithm
- Reconfiguring a tensor network
- Simplifying a tensor network
- Optimizing pathfinder parameters using the hyperoptimizer
- Retrieving the optimizer config parameters
API changes:
- cutensornetContractionGetWorkspace is renamed to cutensornetContractionGetWorkspaceSize
- cutensornetContractionAutotune()’s function signature has changed

Compatibility notes:

Compatibility notes:

Limitation notes: