Release Notes

cuQuantum Python v23.06.0

Add new APIs and functionalities:
- For low-level APIs, please refer to the release notes of cuStateVec v1.4.0 and cuTensorNet v2.2.0.
  - Complex-valued gradients, as returned by the experimental API cuquantum.cutensornet.compute_gradients_backward(), differ by a complex conjugate from PyTorch’s convention.
- New attribute cuquantum.cutensornet.tensor.SVDMethod.algorithm that allows users to choose between various SVD algorithms including "gesvd", "gesvdj", "gesvdr" and "gesvdp". For "gesvdj" and "gesvdr", users may also provide customized settings (e.g., the tolerance for the "gesvdj" algorithm).
- New attribute cuquantum.cutensornet.tensor.SVDInfo.algorithm that describes the SVD algorithm used in the SVD computation. For "gesvdj" and "gesvdp", users may also access execution information (e.g., the residual for the "gesvdj" algorithm).
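The conjugation note above for cuquantum.cutensornet.compute_gradients_backward() can be illustrated without any GPU library. The following is a minimal sketch, assuming the gradients have already been copied into a plain Python sequence of complex numbers. The helper name conjugate_gradients is hypothetical and not part of cuQuantum. Because conjugation is its own inverse, the same helper converts in either direction.

```python
def conjugate_gradients(grads):
    """Return a new list with every complex gradient entry conjugated."""
    return [complex(g).conjugate() for g in grads]


# Stand-in values; real gradients would come from the library.
grads = [1 + 2j, -0.5j, 3.0]
converted = conjugate_gradients(grads)

# Applying the conversion twice recovers the original values.
round_trip = conjugate_gradients(converted)
print(round_trip == [complex(g) for g in grads])  # True
```

For array inputs, the array library's own conjugation routine (for example torch.conj) would typically serve the same purpose.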
Bugs fixed:
- Fix a bug for the auto blocking option for cuquantum.cutensornet.tensor.decompose() and cuquantum.cutensornet.experimental.contract_decompose().
- Fix a bug for cuquantum.CircuitToEinsum to account for the potential global phase when parsing qiskit.QuantumCircuit.
- Fix a bug for cuquantum.CircuitToEinsum to parse custom gates given qiskit.QuantumCircuit.
Other changes:
- Improved the Jupyter notebook for MPS demo by including the density-matrix based MPS-MPO contraction algorithm.
- Avoid using any whitespace unicode characters as TN symbols in cuquantum.CircuitToEinsum.
- The path finding algorithm cuquantum.PathFinderOptions now takes advantage of the new smart option to limit the pathfinder elapsed time (see CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SMART_OPTION for details). This change also applies to public APIs including cuquantum.contract(), cuquantum.contract_path() and cuquantum.OptimizerOptions.
- When running the hyper-optimizer to compute the optimal contraction path with cuquantum.contract_path() and related APIs, which can be long-running depending on the problem size, it is now possible to abort via Ctrl-C.
- Conda packages compatible with CUDA 12 are available on conda-forge. Users can specify the target CUDA version using the new cuda-version metapackage if needed. For example, conda install -c conda-forge cuquantum-python cuda-version=12.0 or conda install -c conda-forge cuquantum-python cuda-version=11.8. This support has been backported to cuQuantum Python 23.03.
- Improve pip dependency management: When installing cuquantum-python or cuquantum-python-cu12 via pip, we will attempt to infer the compatible CuPy wheel and install it (with caveats noted below in the 23.03 release). The exception is cuquantum-python-cu11, for which we require users to explicitly pick from cupy-cuda110, cupy-cuda111, or cupy-cuda11x and install it, following CuPy’s installation guide.
  - If installing the meta-package cuquantum-python with pip 23.1+, passing --no-cache-dir to pip is required.
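For the Ctrl-C note above, a caller may want to recover gracefully rather than let the interruption end the program. The sketch below assumes the interruption surfaces in Python as the standard KeyboardInterrupt exception. The helper run_interruptible and the stand-in fake_search are hypothetical and only simulate a long-running path search.

```python
def run_interruptible(search, fallback=None):
    """Run a long search; return `fallback` if the user presses Ctrl-C."""
    try:
        return search()
    except KeyboardInterrupt:
        print("path search interrupted; using fallback")
        return fallback


def fake_search():
    # Stand-in that simulates the user pressing Ctrl-C mid-search.
    raise KeyboardInterrupt


result = run_interruptible(fake_search, fallback="default-path")
print(result)  # default-path
```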

Compatibility notes:

cuQuantum Python now requires Python 3.9+
cuQuantum Python now requires NumPy v1.21+
cuQuantum Python now requires CuPy v10+

Known issues:

Under single precision, when the input tensor/matrix has a low rank, "gesvdr"-based tensor SVD may suffer from reduced accuracy.
When the "gesvdp" algorithm is used for tensor SVD, the user is responsible for checking cuquantum.cutensornet.tensor.SVDInfo.gesvdp_err_sigma to monitor convergence.

cuQuantum Python v23.03.0

Add new APIs and functionalities:
- For low-level APIs, please refer to the release notes of cuStateVec v1.3.0 and cuTensorNet v2.1.0.
- A new module cuquantum.cutensornet.tensor that supports tensor decomposition routines via cuquantum.cutensornet.tensor.decompose() and cuquantum.cutensornet.tensor.DecompositionOptions. The new module is also directly accessible via the cuquantum namespace. The following decomposition methods are supported:
  - QR decomposition via cuquantum.cutensornet.tensor.QRMethod.
  - Exact and approximate singular value decomposition (SVD) via cuquantum.cutensornet.tensor.SVDMethod. For approximate SVD, run-time information on truncation is stored in and accessible via cuquantum.cutensornet.tensor.SVDInfo.
- A new module cuquantum.cutensornet.experimental with experimental APIs, including cuquantum.cutensornet.experimental.contract_decompose(), cuquantum.cutensornet.experimental.ContractDecomposeAlgorithm and cuquantum.cutensornet.experimental.ContractDecomposeInfo. These experimental APIs can be used to perform compound contraction and decomposition operations. Note that all new experimental APIs may be subject to change in a future release. Kindly share your feedback with us on NVIDIA/cuQuantum GitHub Discussions!
- A new attribute cuquantum.CircuitToEinsum.gates is added to allow users to access gate operands from cuquantum.CircuitToEinsum.
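To make the truncation step of approximate SVD mentioned above more concrete, here is a minimal sketch in plain Python. It is not cuQuantum code: the function truncate_singular_values, its parameters max_extent, abs_cutoff and rel_cutoff, and the keys of the returned dictionary are names chosen for illustration. The discarded weight is computed under an assumed definition (squared discarded singular values divided by the squared total).

```python
def truncate_singular_values(s, max_extent=None, abs_cutoff=0.0, rel_cutoff=0.0):
    """Truncate singular values and report what was discarded.

    Returns (kept, info), where info records the extent before and after
    truncation and the fraction of squared weight that was discarded.
    """
    s = sorted(s, reverse=True)  # largest singular value first
    if not s:
        return [], {"full_extent": 0, "reduced_extent": 0, "discarded_weight": 0.0}

    # A value survives only if it reaches both the absolute cutoff and
    # the cutoff relative to the largest singular value.
    threshold = max(abs_cutoff, rel_cutoff * s[0])
    kept = [x for x in s if x >= threshold]
    if max_extent is not None:
        kept = kept[:max_extent]

    total = sum(x * x for x in s)
    discarded = sum(x * x for x in s[len(kept):])
    info = {
        "full_extent": len(s),
        "reduced_extent": len(kept),
        "discarded_weight": discarded / total if total > 0 else 0.0,
    }
    return kept, info


kept, info = truncate_singular_values([4.0, 2.0, 1.0, 0.5], max_extent=3, rel_cutoff=0.3)
print(kept)                    # [4.0, 2.0]
print(info["reduced_extent"])  # 2
```

In this example the relative cutoff (0.3 times the largest value, 1.2) removes the two smallest singular values before the extent limit of 3 comes into play.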
API changes:
- The fixed kwarg support in cuquantum.CircuitToEinsum.state_vector() is removed. The same functionality can be achieved via the same fixed kwarg in cuquantum.CircuitToEinsum.batched_amplitudes().
Bugs fixed:
- The output mode labels were not lexicographically ordered when the Einstein summation expression was provided in implicit form (this is a regression from cuQuantum Python v0.1.0.1).
- Fix the parallel contraction failure when using MPICH (NVIDIA/cuQuantum#31).
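The implicit-form ordering fix above concerns the NumPy einsum convention: when no "->" is given, the output consists of the labels that appear exactly once, sorted alphabetically. The sketch below reproduces that rule in plain Python. The helper implicit_output_labels is hypothetical, is not part of cuQuantum, and does not handle ellipsis.

```python
from collections import Counter


def implicit_output_labels(subscripts):
    """Infer the output labels of an implicit-form einsum expression.

    Labels that appear exactly once across all operands are kept and
    returned in sorted (lexicographic) order.
    """
    if "->" in subscripts:
        raise ValueError("expression is already in explicit form")
    counts = Counter(ch for ch in subscripts if ch.isalpha())
    return "".join(sorted(label for label, n in counts.items() if n == 1))


print(implicit_output_labels("ij,jk"))  # ik
print(implicit_output_labels("kj,ji"))  # ik
```

Both expressions produce "ik": the order in which labels appear in the operands does not affect the output order in implicit form.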
Other changes:
- cuQuantum Python now supports Python 3.11.
- cuQuantum Python now supports CUDA 12.
- A set of new wheels with suffix -cu12 are released on PyPI.org for CUDA 12 users.
  - Example: pip install cuquantum-python-cu12 cupy-cuda12x for setting up a wheel-based environment compatible with CUDA 12
  - The existing cuquantum and cuquantum-python wheels (without the -cuXX suffix) are turned into automated installers that will attempt to detect the current CUDA environment and install the appropriate wheels. Please note that this automated detection may encounter conditions under which detection is unsuccessful, especially in a CPU-only environment (such as CI/CD). If detection fails we assume that the target environment is CUDA 11 and proceed. This assumption may be changed in a future release, and in such cases we recommend that users explicitly (manually) install the correct wheels.
- For conda packages, currently CUDA 12 support is pending the NVIDIA-led community effort (conda-forge/staged-recipes#21382). Once conda-forge supports CUDA 12 we will make compatible conda packages available.
- CUDA Lazy Loading is supported. This can significantly reduce memory footprint by deferring the loading of needed GPU kernels to the first call sites. This feature requires CUDA 11.8 (or above) and cuTENSOR 1.7.0 (or above). Please refer to the CUDA documentation for other requirements and details. Currently this feature requires users to opt in by setting the environment variable CUDA_MODULE_LOADING=LAZY. In a future CUDA version, lazy loading may become the default.
  - If you’re a wheel user, update your environment with pip install "cutensor-cuXX>=1.7" (XX = 11 or 12).
  - If you’re a conda user, update your environment with conda install -c conda-forge "cudatoolkit>=11.8" "cutensor>=1.7" (for CUDA 11).
- Our support policy is clarified, see Compatibility policy.
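As a sketch of the lazy-loading opt-in described above, the environment variable can also be set from within Python. This assumes the variable is read when CUDA is initialized, so it is set here before any GPU library is imported in the process. The commented import is only a placeholder.

```python
import os

# Set the opt-in before importing any GPU library, so that it is
# already in place when CUDA is initialized in this process.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"

# GPU libraries would be imported after this point, for example:
# import cuquantum

print(os.environ["CUDA_MODULE_LOADING"])  # LAZY
```

Setting the variable in the shell before launching Python achieves the same effect and avoids any dependence on import order.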

Compatibility notes:

cuQuantum Python requires Python 3.8+
- In the next release, Python 3.8 will be dropped to follow NEP-29. (This refers to the pre-built wheels on PyPI.org and the Conda packages on conda-forge. If you have any needs for pre-built support, please reach out to us on GitHub. Alternatively, you may build from source, although we might not guarantee indefinite support for source compatibility.)
cuQuantum Python requires NumPy v1.19+
- In the next release, NumPy 1.19 & 1.20 will be dropped to follow NEP-29.
cuQuantum Python requires CuPy v9.5+
- In the next release, CuPy v9 will be dropped to be consistent with NEP-29.

cuQuantum Python v22.11.0.1

This is a hot-fix release addressing a few issues in cuQuantum Python.

Bugs fixed:
- Fix performance degradation in cuquantum.contract() that could impact certain usage patterns.
- Fix the .save_statevector() usage in the Jupyter notebook qiskit_basic.ipynb.
- Remove invalid code.

cuQuantum Python v22.11.0

We are on NVIDIA/cuQuantum GitHub Discussions! For any questions regarding (or to share any exciting work built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.
- Bug reports should still go to our GitHub issue tracker.
Add new APIs and functionalities:
- For cuTensorNet low-level APIs, please refer to the release notes of cuTensorNet v2.0.0; no new cuStateVec API is added.
- A new API, cuquantum.CircuitToEinsum.batched_amplitudes(), to compute the amplitudes for a batch of qubits. This is equivalent to the fixed kwarg support in cuquantum.CircuitToEinsum.state_vector(), which is deprecated and will be removed in a future release.
- A new API, cuquantum.CircuitToEinsum.expectation(), to support expectation value computation for Pauli strings.
- A new helper API, cuquantum.cutensornet.get_mpi_comm_pointer(), to get the pointer to and size of the MPI communicator for the new low-level API cuquantum.cutensornet.distributed_reset_configuration(), which enables distributed parallelism. This capability requires mpi4py.
- The cuquantum module now has a command line interface to return the include and library paths and the linker flags for the cuTENSOR and cuQuantum libraries: python -m cuquantum. See Command Line Support for details.
- Support for controlling non-blocking behavior via the new option cuquantum.NetworkOptions.blocking.
API changes:
- For cuTensorNet low-level APIs, please refer to the release notes of cuTensorNet v2.0.0; cuStateVec low-level APIs remain unchanged.
  - Users can set the tensor qualifiers by using the dedicated NumPy dtype cuquantum.cutensornet.tensor_qualifiers_dtype. See the Python sample (python/samples/cutensornet/coarse/example21.py) for details. For example, complex conjugation can be done on-the-fly, reducing memory pressure.
  - To get/set the contraction path or slicing configurations, users should use contraction_optimizer_info_get_attribute_dtype() to get a NumPy custom dtype representing the path or slicing configuration object, in a manner consistent with the method used for all other attributes. Refer to the docstring for details. The (experimental) ContractionPath object is removed.
Functionality/performance improvements:
- Improved performance when contracting two tensors using cuquantum.contract() or related APIs.
- Improved performance for reusing a Network object with reset_operands().
- The lightcone construction in CircuitToEinsum is improved to further reduce the number of tensors in the network.
- The build system now supports PEP-517 and standard pip command-line flags. The environment variable CUQUANTUM_IGNORE_SOLVER is no longer used. See Getting Started for more information.
Bugs fixed:
- Fix a potential multi-device bug in the internal device context switch.
- Fix a bug for using invalid mode labels in cuquantum.CircuitToEinsum when input circuit size gets large.
Other changes:
- Provide one more distributed (MPI+NCCL) Python sample (example4_mpi_nccl.py) to show how to use cuTensorNet and create parallelism.
- The test infrastructure will show tests that are not runnable as “deselected” instead of “skipped”.
- A new pip wheel is released on PyPI: pip install cuquantum-python-cu11. Users can still install cuQuantum Python via pip install cuquantum-python, as before. cuquantum-python now becomes a meta-wheel pointing to cuquantum-python-cu11. This may change in a future release when a new CUDA version becomes available. Using wheels with the -cuXX suffix is encouraged.

Compatibility notes:

cuQuantum Python now requires cuStateVec 1.1.0 or above.
cuQuantum Python now requires cuTensorNet 2.0.0 or above.
cuQuantum Python now requires cuTENSOR 1.6.1 or above.

cuQuantum Python v22.07.1

Bugs fixed:
- The 22.07.0 cuquantum wheel had a wrong file layout. (If you are using the cuquantum 22.07.0.1 or 22.07.0.2 hot-fix wheel, it will work fine.)

cuQuantum Python v22.07.0

Add new APIs and functionalities:
- For low-level APIs, please refer to the release notes of cuStateVec v1.1.0 and cuTensorNet v1.1.0.
- New high-level API cuquantum.CircuitToEinsum that supports conversion of qiskit.QuantumCircuit and cirq.Circuit to tensor network contraction:
  - Support state coefficient
  - Support bitstring amplitude
  - Support reduced density matrix
  - Backend support on NumPy, CuPy and PyTorch
- Add a keyword-only argument slices to the cuquantum.Network.contract() method to support contracting an arbitrary subset of the slices.
- Add a new attribute intermediate_modes to the cuquantum.OptimizerInfo object for retrieving the mode labels of all intermediate tensors.
- Add a new attribute num_slices to the cuquantum.OptimizerInfo object for querying the total number of slices.
Functionality/performance improvements:
- Improve the einsum expression parser.
Bugs fixed:
- An exception was mistakenly raised in cuquantum.einsum() when optimize was set to False.
- Missing f-specifier in the string representation of cuquantum.OptimizerInfo.
Other changes:
- Drop the dependency on typing_extensions.
- Provide distributed (MPI-based) Python samples that show how easy it is to use cuTensorNet and create parallelism. mpi4py is required for running these samples.
- Update the low-level, non-distributed sample tensornet_example.py by improving memory usage and switching to the new contraction API contract_slices().
- Provide Jupyter notebooks to show how to convert a quantum circuit to a tensor network contraction.
- Add a Python sample to illustrate the usage of the new multi-device bit-swapping multi_device_swap_index_bits() API.
- Restructure the samples folder to separate cuStateVec and cuTensorNet samples.

Compatibility notes:

cuQuantum Python now requires cuQuantum v22.07.
cuQuantum Python now requires Python 3.8+.
cuQuantum Python now requires NumPy v1.19+.
cuQuantum Python supports Cirq v0.6.0+.
cuQuantum Python supports Qiskit v0.24.0+.

cuQuantum Python v22.05.0

Bugs fixed:
- Make typing_extensions a required dependency (NVIDIA/cuQuantum#3)
- Fix issues in the test suite
Other changes:
- The Python sample (python/samples/tensornet_example.py) is updated to include a correctness check

cuQuantum Python v22.03.0

Stable release:
- Starting with this release, cuQuantum Python switches to the CalVer versioning scheme, following the cuQuantum SDK
- pip wheels are released on PyPI: pip install cuquantum-python
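As a small illustration of the CalVer scheme mentioned above, version strings can be compared by splitting them into integer components. This sketch assumes the leading components encode a two-digit year and month, as the release numbers in these notes suggest. The helper parse_calver is hypothetical.

```python
def parse_calver(version):
    """Split a version such as '23.06.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


releases = ["22.11.0.1", "23.03.0", "22.07.1", "23.06.0", "22.11.0"]
ordered = sorted(releases, key=parse_calver)
print(ordered)
# ['22.07.1', '22.11.0', '22.11.0.1', '23.03.0', '23.06.0']
```

Comparing integer tuples rather than raw strings keeps hot-fix releases such as 22.11.0.1 ordered directly after the release they patch.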
Functionality/performance improvements:
- High-level tensor network APIs are now fully NumPy compliant:
  - Support generalized einsum expressions
  - Support ellipsis
  - Support broadcasting
Add new APIs and functionalities:
- For low-level APIs, please refer to the release notes of cuStateVec v1.0.0 and cuTensorNet v1.0.0.
- The high-level APIs support an EMM-like memory plugin interface (see Memory management).
API changes:
- For low-level APIs, please refer to the release notes of cuStateVec v1.0.0 and cuTensorNet v1.0.0.
- No API breaking changes for the high-level APIs.

Compatibility notes:

cuQuantum Python requires cuQuantum v22.03
cuQuantum Python requires Python 3.7+
- In the next release, Python 3.7 will be dropped to follow NEP-29.
cuQuantum Python requires NumPy v1.17+
- In the next release, NumPy 1.17 & 1.18 will be dropped to follow NEP-29.
cuQuantum Python requires CuPy v9.5+
cuQuantum Python supports PyTorch v1.10+

Known issues:

If you install cuQuantum Python from PyPI (pip install cuquantum-python), make sure you also install typing_extensions (via pip or conda). This only affects the wheel installation and will be fixed in the next release. (NVIDIA/cuQuantum#3)

cuQuantum Python v0.1.0.1

Patch release:
- Add a __version__ string

cuQuantum Python v0.1.0.0

Initial release (beta 2)

Compatibility notes:

cuQuantum Python requires cuQuantum v0.1.0
cuQuantum Python requires NumPy v1.17+
cuQuantum Python requires CuPy v9.5+
cuQuantum Python supports PyTorch v1.10+

Limitation notes:

In certain environments, importing cuquantum could fail (with a segmentation fault) if PyTorch is installed. This is currently under investigation; a temporary workaround is to import torch before importing cuquantum.
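In its simplest form the workaround is two import statements, torch first and cuquantum second. For code that must also run where one of the packages is absent, the ordering can be sketched as below. The helper import_in_order is hypothetical and only uses the standard importlib module.

```python
import importlib
import importlib.util


def import_in_order(names):
    """Import the installed modules among `names`, in the given order.

    Modules that are not installed are skipped. Returns the imported names.
    """
    imported = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            continue  # not installed; nothing to order
        importlib.import_module(name)
        imported.append(name)
    return imported


if __name__ == "__main__":
    # torch first, then cuquantum, as the workaround suggests.
    print(import_in_order(["torch", "cuquantum"]))
```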