Release Notes

cuStateVec v1.0.0

Improve performance/functionality:
- Gate application APIs are reoptimized:
  - custatevecApplyMatrix() has lower API execution latency. Performance with small state vectors may improve when applying 1- to 4-qubit matrices in single precision and 1- to 5-qubit matrices in double precision.
  - custatevecApplyGeneralizedPermutationMatrix() has lower API execution latency. Performance with small state vectors may improve for diagonal matrices.
Resolve issues:
- Multi-threading issues in custatevecApplyMatrix(), custatevecApplyGeneralizedPermutationMatrix(), and custatevecComputeExpectationsOnPauliBasis() are fixed. All the cuStateVec APIs in this version are thread safe as long as each host thread has its own cuStateVec handle.
Add new APIs:
- Binding a user-provided, stream-ordered memory pool to the library (see the introduction for Workspace and Memory Management API for details).
- Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs (see custatevecBatchMeasureWithOffset(), custatevecSamplerGetSquaredNorm(), custatevecSamplerApplySubSVOffset())
- Optimized state vector element swap algorithm on a single GPU (see custatevecSwapIndexBits())
- Testing whether a given matrix is Hermitian or unitary (see custatevecTestMatrixType())
- Setting a logger callback with user-provided data (see custatevecLoggerSetCallbackData())
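
As a reminder of the two properties named in the custatevecTestMatrixType() item above, the following pure-Python sketch checks them directly from their definitions: a matrix A is Hermitian if A equals its conjugate transpose, and unitary if its conjugate transpose times A is the identity. It does not call cuStateVec and makes no claim about how the library performs the test. The helper names and the tolerance `tol` are assumptions made for illustration only.

```python
# Illustrative only: plain-Python definitions of "Hermitian" and "unitary".
# This is NOT the cuStateVec implementation; all names here are hypothetical.

def conjugate_transpose(m):
    """Return the conjugate transpose of a square matrix given as nested lists."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def is_hermitian(m, tol=1e-12):
    """True if m equals its conjugate transpose within tol."""
    return max_abs_diff(m, conjugate_transpose(m)) <= tol

def is_unitary(m, tol=1e-12):
    """True if (conjugate transpose of m) * m equals the identity within tol."""
    n = len(m)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return max_abs_diff(matmul(conjugate_transpose(m), m), identity) <= tol

# Pauli Y is both Hermitian and unitary; the S gate is unitary only.
pauli_y = [[0, -1j], [1j, 0]]
s_gate = [[1, 0], [0, 1j]]
```

For example, `is_hermitian(pauli_y)` and `is_unitary(pauli_y)` both hold, while `is_hermitian(s_gate)` does not.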

API breaking changes:

The sampler and accessor descriptors are now completely opaque, just like the library handle custatevecHandle_t. Each descriptor has a corresponding destructor API, and both descriptors are now passed by value in various routines. The C and Python APIs are now unified.

Some APIs are renamed as follows:

| Previous version (< 1.0.0) | New version (1.0.0) |
| --- | --- |
| `custatevecApplyMatrix_bufferSize` | `custatevecApplyMatrixGetWorkspaceSize()` |
| `custatevecApplyExp` | `custatevecApplyPauliRotation()` |
| `custatevecApplyGeneralizedPermutationMatrix_bufferSize` | `custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize()` |
| `custatevecExpectation_bufferSize` | `custatevecComputeExpectationGetWorkspaceSize()` |
| `custatevecExpectation` | `custatevecComputeExpectation()` |
| `custatevecExpectationsOnPauliBasis` | `custatevecComputeExpectationsOnPauliBasis()` |
| `custatevecSampler_create` | `custatevecSamplerCreate()` |
| `custatevecSampler_preprocess` | `custatevecSamplerPreprocess()` |
| `custatevecSampler_sample` | `custatevecSamplerSample()` |
| `custatevecAccessor_create` | `custatevecAccessorCreate()` |
| `custatevecAccessor_createReadOnly` | `custatevecAccessorCreateView()` |
| `custatevecAccessor_setExtraWorkspace` | `custatevecAccessorSetExtraWorkspace()` |
| `custatevecAccessor_set` | `custatevecAccessorSet()` |
| `custatevecAccessor_get` | `custatevecAccessorGet()` |
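
When porting existing code, the renames listed above can be applied mechanically. The following is a minimal sketch of such a helper, not an official migration tool: the dictionary is copied from the table, while the function name `migrate_source` and the whole-identifier matching strategy are assumptions made for illustration.

```python
import re

# Old name -> new name, copied from the rename table above.
RENAMES = {
    "custatevecApplyMatrix_bufferSize": "custatevecApplyMatrixGetWorkspaceSize",
    "custatevecApplyExp": "custatevecApplyPauliRotation",
    "custatevecApplyGeneralizedPermutationMatrix_bufferSize":
        "custatevecApplyGeneralizedPermutationMatrixGetWorkspaceSize",
    "custatevecExpectation_bufferSize": "custatevecComputeExpectationGetWorkspaceSize",
    "custatevecExpectation": "custatevecComputeExpectation",
    "custatevecExpectationsOnPauliBasis": "custatevecComputeExpectationsOnPauliBasis",
    "custatevecSampler_create": "custatevecSamplerCreate",
    "custatevecSampler_preprocess": "custatevecSamplerPreprocess",
    "custatevecSampler_sample": "custatevecSamplerSample",
    "custatevecAccessor_create": "custatevecAccessorCreate",
    "custatevecAccessor_createReadOnly": "custatevecAccessorCreateView",
    "custatevecAccessor_setExtraWorkspace": "custatevecAccessorSetExtraWorkspace",
    "custatevecAccessor_set": "custatevecAccessorSet",
    "custatevecAccessor_get": "custatevecAccessorGet",
}

# One pattern covering every old name. The \b anchors match whole identifiers
# only, so a short name such as custatevecExpectation is never rewritten
# inside a longer one such as custatevecExpectation_bufferSize.
_OLD_NAMES = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in
                        sorted(RENAMES, key=len, reverse=True)) + r")\b"
)

def migrate_source(text):
    """Return text with each pre-1.0.0 identifier replaced by its 1.0.0 name."""
    return _OLD_NAMES.sub(lambda match: RENAMES[match.group(0)], text)
```

Renaming alone is not a complete migration: because the sampler and accessor descriptors are now opaque and passed by value, call sites that use them still need to be reviewed by hand.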

Compatibility notes:

Limitation notes:

CUSTATEVEC_STATUS_INTERNAL_ERROR may be returned if an incorrect device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, check that the pointer is correct and that the size is specified correctly.

Support for the NVIDIA cuQuantum Appliance:
- Extensions for the batch-measure and sampler APIs to accept state vector partitions across multiple GPUs
- Optimized state vector element swap algorithm for multiple GPUs
- Note: the multi-GPU features and optimizations are currently available only in the cuQuantum Appliance

Add support for Linux ppc64le
Add new APIs:
- Gate application for generalized permutation matrices
- Expectation values of Pauli strings
- Accessor to get/set state vector elements

Compatibility notes:

Limitation notes:

CUSTATEVEC_STATUS_INTERNAL_ERROR may be returned if an incorrect device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, check that the pointer is correct and that the size is specified correctly.

Compatibility notes:

Limitation notes:

This release is optimized for NVIDIA A100 and V100 GPUs.
CUSTATEVEC_STATUS_INTERNAL_ERROR may be returned if an incorrect device pointer is passed to a function. If a function returns CUSTATEVEC_STATUS_INTERNAL_ERROR, check that the pointer is correct and that the size is specified correctly.
Performance optimization is planned for future releases.