Release Notes

cuStateVec v1.6.0

Compatibility notes:

  • Support for RHEL 7 will be dropped in the next cuQuantum release. Please plan accordingly.

cuStateVec v1.5.0

cuStateVec v1.4.1

  • Resolve issues:

    • Fix an issue where custatevecApplyMatrix() was not executed asynchronously when applying 6-qubit gate matrices to a state vector of the double-precision complex data type.

    • Fix an issue where custatevecMeasureBatched() could fail on NVIDIA H100 with an “illegal instruction” error.

cuStateVec v1.4.0

Compatibility notes:

  • cuStateVec supports Ubuntu 20.04+.

cuStateVec v1.3.0

  • Add new API:

  • Improve performance/functionality:

    • Improve the performance of 5-qubit gate application with single precision and 6-qubit gate application with double precision in custatevecApplyMatrix() on the Hopper architecture.

    • CUDA Lazy Loading is supported. This can significantly reduce memory footprint by deferring the loading of needed GPU kernels to the first call sites. This feature requires CUDA 11.8 (or above). Please refer to the CUDA documentation for other requirements and details. Currently this feature requires users to opt in by setting the environment variable CUDA_MODULE_LOADING=LAZY. In a future CUDA version, lazy loading may become the default.

  • Resolve issues:

  • Other changes:

    • Introduce support for CUDA 12.

    • A set of new wheels with the suffix -cu12 is released on PyPI.org for CUDA 12 users.

      • Example: pip install custatevec-cu12 for installing cuStateVec compatible with CUDA 12

      • The existing cuquantum wheel (without the -cuXX suffix) is turned into an automated installer that attempts to detect the current CUDA environment and install the appropriate wheels. Please note that this automated detection can fail, especially in a CPU-only environment (such as CI/CD). If detection fails, the installer assumes that the target environment is CUDA 11 and proceeds. This assumption may change in a future release; in such cases we recommend that users explicitly (manually) install the correct wheels.
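As an illustration of installing a suffixed wheel explicitly rather than relying on automated detection, the following minimal sketch maps a CUDA major version to a wheel name. The helper custatevec_wheel_name and the constant SUPPORTED_CUDA_MAJORS are hypothetical names introduced here; they are not part of cuQuantum or of the installer. The only facts taken from these notes are the -cu11/-cu12 suffixes and the CUDA 11 fallback.

```python
from typing import Optional

# CUDA major versions for which suffixed wheels are mentioned in these notes.
SUPPORTED_CUDA_MAJORS = (11, 12)


def custatevec_wheel_name(cuda_major: Optional[int]) -> str:
    """Return the suffixed cuStateVec wheel name for a CUDA major version.

    Passing None models a failed detection. The fallback to CUDA 11 mirrors
    the behavior described in the notes and may change in a future release.
    """
    if cuda_major is None:
        cuda_major = 11  # assumption taken from the release notes
    if cuda_major not in SUPPORTED_CUDA_MAJORS:
        raise ValueError(f"unsupported CUDA major version: {cuda_major}")
    return f"custatevec-cu{cuda_major}"


# Build the explicit install command for a CUDA 12 environment.
print("pip install", custatevec_wheel_name(12))
```

In a CPU-only CI job, passing the known target version explicitly avoids depending on the fallback.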

Compatibility notes:

  • cuStateVec requires CUDA 11.x or 12.x.

  • cuStateVec supports Ubuntu 18.04+.

    • In the next release, Ubuntu 18.04 will be dropped. The minimum supported Ubuntu version will be 20.04.
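The CUDA Lazy Loading opt-in mentioned for this release can be sketched from Python as follows. The helper enable_cuda_lazy_loading is a hypothetical name used only for illustration. The sketch assumes the variable has to be set before any CUDA-backed library initializes CUDA in the process; please check the CUDA documentation for the exact requirements.

```python
import os


def enable_cuda_lazy_loading() -> str:
    """Opt in to CUDA Lazy Loading and return the value now in effect.

    setdefault keeps any value that the user has already exported, so an
    explicit CUDA_MODULE_LOADING=EAGER is not overridden.
    """
    return os.environ.setdefault("CUDA_MODULE_LOADING", "LAZY")


mode = enable_cuda_lazy_loading()
# Import CUDA-backed libraries only after this point (assumption: the
# variable is read when CUDA is initialized).
print("CUDA_MODULE_LOADING =", mode)
```

Exporting the variable in the shell before starting the process has the same effect and does not depend on import order.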

cuStateVec v1.2.0

  • We are on NVIDIA/cuQuantum GitHub Discussions! For any questions regarding (or exciting work built upon) cuQuantum, please feel free to reach out to us on GitHub Discussions.

  • This release introduces support for the Hopper GPU family.

  • Improve performance/functionality:

  • Resolve issues:

  • Other changes:

    • A conda package is released on conda-forge: conda install -c conda-forge custatevec. Users can still obtain both cuStateVec and cuTensorNet with conda install -c conda-forge cuquantum, as before.

    • A pip wheel is released on PyPI: pip install custatevec-cu11. Users can still obtain both cuStateVec and cuTensorNet with pip install cuquantum, as before.

      • Currently, the cuquantum meta-wheel points to the cuquantum-cu11 meta-wheel (which then points to custatevec-cu11 and cutensornet-cu11 wheels). This may change in a future release when a new CUDA version becomes available. Using wheels with the -cuXX suffix is encouraged.

cuStateVec v1.1.0

Compatibility notes:

  • cuStateVec requires CUDA 11.x

Limitation notes:

  • custatevecMultiDeviceSwapIndexBits() could cause a segmentation fault if a device does not have peer-to-peer (P2P) access to another device. If a segmentation fault occurs during the API call, please check that direct access between every pair of devices has been enabled with cudaDeviceEnablePeerAccess().

  • custatevecMultiDeviceSwapIndexBits() could return CUSTATEVEC_STATUS_INVALID_VALUE if a handle created on the current device is not provided. Please refer to custatevecMultiDeviceSwapIndexBits() for the details.

  • CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if an invalid device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check that the pointer is correct and that the size is correctly specified.
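The pairwise P2P check suggested in the first limitation note can be sketched as follows. This is an illustration only: pairs_without_p2p is a hypothetical helper, and the predicate can_access_peer is injected so that the sketch runs without a GPU. The stand-in topology below is made up for the example.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def pairs_without_p2p(
    num_devices: int,
    can_access_peer: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """List ordered (device, peer) pairs for which direct access is unavailable."""
    return [
        (dev, peer)
        for dev, peer in permutations(range(num_devices), 2)
        if not can_access_peer(dev, peer)
    ]


# Stand-in topology for illustration: devices 0 and 1 are linked,
# device 2 has no peer access to either of them.
linked = {(0, 1), (1, 0)}
missing = pairs_without_p2p(3, lambda dev, peer: (dev, peer) in linked)
print(missing)
```

On a real system, the predicate could wrap the CUDA runtime query cudaDeviceCanAccessPeer. An empty result would indicate that every pair can be enabled with cudaDeviceEnablePeerAccess before the call.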

cuStateVec v1.0.0

API breaking changes:

Compatibility notes:

  • cuStateVec requires CUDA 11.x

Limitation notes:

  • CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if an invalid device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check that the pointer is correct and that the size is correctly specified.
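When checking the size mentioned in the note above, one quantity worth verifying is the extent of the state vector buffer. The sketch below assumes that "size" refers to this buffer, which the notes do not state explicitly. It relies only on the fact that an n-qubit state vector holds 2^n complex amplitudes. The names state_vector_nbytes and BYTES_PER_AMPLITUDE are hypothetical.

```python
# Bytes per amplitude for single- and double-precision complex values.
BYTES_PER_AMPLITUDE = {"complex64": 8, "complex128": 16}


def state_vector_nbytes(n_qubits: int, dtype: str = "complex128") -> int:
    """Return the number of bytes needed to store an n-qubit state vector."""
    if n_qubits < 0:
        raise ValueError("n_qubits must be non-negative")
    # An n-qubit state vector has 2**n amplitudes.
    return (1 << n_qubits) * BYTES_PER_AMPLITUDE[dtype]


# A 30-qubit single-precision state vector occupies 8 GiB.
print(state_vector_nbytes(30, "complex64"))
```

Comparing this value with the size of the device allocation is a quick way to rule out an undersized buffer.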

cuStateVec v0.1.1

  • Support for the NVIDIA cuQuantum Appliance:

    • Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs

    • Optimized state vector element swap algorithm for multiple GPUs

    • Note: the multi-GPU features & optimizations are currently available only in the cuQuantum Appliance

cuStateVec v0.1.0

  • Add support for Linux ppc64le

  • Add new APIs:

    • Gate application for generalized permutation matrices

    • Expectation values of Pauli strings

    • Accessor to get/set state vector elements

Compatibility notes:

  • cuStateVec requires CUDA 11.4 or above

  • cuStateVec requires NVIDIA HPC SDK 21.11 or above

Limitation notes:

  • CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if an invalid device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check that the pointer is correct and that the size is correctly specified.

cuStateVec v0.0.1

  • Initial release

  • Support Linux x86_64, Linux Arm64

  • Support Volta and Ampere architectures (compute capability 7.0+)

Compatibility notes:

  • cuStateVec requires CUDA 11.4 or above

  • cuStateVec requires NVIDIA HPC SDK 21.7 or above
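The CUDA version requirement above can be checked programmatically. The following minimal sketch assumes the integer encoding used by the CUDA runtime, 1000 * major + 10 * minor (for example 11040 for CUDA 11.4). The helper names decode_cuda_version and meets_minimum are hypothetical, and obtaining the encoded value (for example from cudaRuntimeGetVersion) is left to the caller.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split an encoded CUDA version such as 11040 into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def meets_minimum(encoded: int, minimum: Tuple[int, int] = (11, 4)) -> bool:
    """Return True if the encoded version is at least the minimum (major, minor)."""
    return decode_cuda_version(encoded) >= minimum


# 11040 encodes CUDA 11.4, 11020 encodes CUDA 11.2.
print(decode_cuda_version(11040), meets_minimum(11040), meets_minimum(11020))
```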

Limitation notes:

  • This release is optimized for NVIDIA A100 and V100 GPUs.

  • CUSTATEVEC_STATUS_INTERNAL_ERROR might be returned if an invalid device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, please check that the pointer is correct and that the size is correctly specified.

  • Performance optimization is planned for future releases.