cuStateVec Ex: State Vector#

The cuStateVec Ex API defines custatevecExStateVectorDescriptor_t to represent a state vector object that provides comprehensive control over GPU memory and computational resources. Unlike the direct device pointer approach of cuStateVec, state vectors are represented as descriptor objects that encapsulate all necessary metadata and resource information.

This section describes the key operations on state vector objects in the cuStateVec Ex API.

State Vector Instantiation#

There are three models of state vector distribution. This section reviews state vector instantiation for each of them. For detailed information about state vector distribution models, see State Vector Distribution.

Single process#

The first step in instantiating a state vector is to create a state vector configuration by using one of the following two APIs:

  • custatevecExConfigureStateVectorSingleDevice()

  • custatevecExConfigureStateVectorMultiDevice()

The created configuration object is passed to custatevecExStateVectorCreateSingleProcess() to instantiate the state vector. The configuration dictionary object needs to be deleted after the state vector creation.

// Single-device example
custatevecExDictionaryDescriptor_t svConfig;
custatevecExConfigureStateVectorSingleDevice(&svConfig, dataType, numWires, numWires, 0, 0);

// Multi-device P2P example (an alternative to the single-device configuration above; use one or the other)
// numWires = log2(numDevices) + numDeviceWires
int deviceIds[] = {0, 1, 2, 3};  // Specify device IDs
int numDevices = 4;
int numDeviceWires = 28;  // Local wires per device
int numWires = 30;  // log2(4) + 28 = 2 + 28 = 30 total wires
custatevecExConfigureStateVectorMultiDevice(&svConfig, dataType, numWires, numDeviceWires,
                                            deviceIds, numDevices,
                                            CUSTATEVEC_DEVICE_NETWORK_TYPE_SWITCH, 0);

custatevecExStateVectorDescriptor_t stateVector;
custatevecExStateVectorCreateSingleProcess(&stateVector, svConfig, nullptr, 0, nullptr);

// Clean up configuration after use
custatevecExDictionaryDestroy(svConfig);

Note

For multi-device state vectors, ensure that all specified GPUs are of the same generation and that GPUDirect P2P communication is available among all devices.
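The wire-count relation used in the multi-device example can be checked on the host before any configuration is created. The following is a minimal sketch, not part of the cuStateVec Ex API: the helper names log2PowerOfTwo and totalWires are hypothetical, and the sketch assumes the number of devices must be a power of two, as the log2 term in the example comment suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not a cuStateVec Ex API): log2 of a positive power of two.
inline int32_t log2PowerOfTwo(int32_t n) {
    if (n <= 0 || (n & (n - 1)) != 0)
        throw std::invalid_argument("value must be a positive power of two");
    int32_t bits = 0;
    while ((n >> bits) != 1) ++bits;
    return bits;
}

// Total wires of a multi-device state vector:
// numWires = log2(numDevices) + numDeviceWires (assumed relation from the example above).
inline int32_t totalWires(int32_t numDevices, int32_t numDeviceWires) {
    assert(numDeviceWires >= 0);
    return log2PowerOfTwo(numDevices) + numDeviceWires;
}
```

With the values from the example, totalWires(4, 28) evaluates to 30, and a device count such as 3 is rejected with an exception.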

Multi process#

A multi-process state vector requires a communicator object that wraps an inter-process communication (IPC) library. The first step is to initialize the IPC library by calling custatevecExCommunicatorInitialize(). After initialization, create a communicator instance using custatevecExCommunicatorCreate().

Note

For detailed information about communicators, state vector descriptor, and operations, see Communicator.

The next step to instantiate a state vector instance is to create a state vector configuration by using custatevecExConfigureStateVectorMultiProcess(). This configuration defines the hierarchical network structure and memory sharing methods.

The created configuration object is passed to custatevecExStateVectorCreateMultiProcess() along with the communicator to instantiate the distributed state vector. The configuration dictionary object needs to be deleted after the state vector creation.

// Initialize MPI communicator
custatevecExCommunicatorStatus_t commStatus;
custatevecExCommunicatorInitialize(CUSTATEVEC_COMMUNICATOR_TYPE_OPENMPI, nullptr,
                                   &argc, &argv, &commStatus);

custatevecExCommunicatorDescriptor_t exCommunicator;
custatevecExCommunicatorCreate(&exCommunicator);

// Configure multi-process state vector
custatevecExDictionaryDescriptor_t svConfig;
custatevecExConfigureStateVectorMultiProcess(&svConfig, dataType, numWires, numDeviceWires,
                                             -1, CUSTATEVEC_EX_MEMORY_SHARING_METHOD_AUTODETECT,
                                             globalIndexBitClasses, numGlobalIndexBitsPerLayer,
                                             numLayers, transferWorkspace, nullptr, 0);

// Create multi-process state vector
custatevecExStateVectorDescriptor_t stateVector;
custatevecExStateVectorCreateMultiProcess(&stateVector, svConfig, stream, exCommunicator, nullptr);

// Clean up configuration after use
custatevecExDictionaryDestroy(svConfig);

Freeing state vector#

To properly release resources, destroy the state vector using custatevecExStateVectorDestroy(). For multi-process configurations, also destroy the communicator and finalize the IPC library.

// Destroy state vector
custatevecExStateVectorDestroy(stateVector);

// For multi-process: clean up communicator
if (isMultiProcess) {
    custatevecExCommunicatorDestroy(exCommunicator);
    custatevecExCommunicatorFinalize(&commStatus);
}

Synchronization#

The cuStateVec Ex API executes operations asynchronously on the CUDA streams associated with a state vector object. The state vector object may also hold pending operations. To synchronize all operations on a state vector, use custatevecExStateVectorSynchronize():

// Synchronize before reading results on host
custatevecExStateVectorSynchronize(stateVector);

This function synchronizes all CUDA calls and pending CPU operations associated with the state vector. For multi-process state vectors, custatevecExStateVectorSynchronize() also performs a barrier to synchronize all processes.

For fine-grained synchronization control, use the interoperability APIs (custatevecExStateVectorGetResourcesFromDeviceSubSV() or custatevecExStateVectorGetResourcesFromDeviceSubSVView()) to access individual CUDA streams for each device sub-state vector.

Wire ordering management#

The state vector object manages its wire ordering (tensor mode ordering), which determines how qubits are mapped to index bits in the state vector array.

Getting wire ordering#

To retrieve the current wire ordering, use custatevecExStateVectorGetProperty() with the CUSTATEVEC_EX_SV_PROP_WIRE_ORDERING property:

int32_t numWires;
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_NUM_WIRES,
                                   &numWires, sizeof(int32_t));
std::vector<int32_t> wireOrdering(numWires);
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_WIRE_ORDERING,
                                   wireOrdering.data(), sizeof(int32_t) * numWires);

The wire ordering is an array where wireOrdering[i] specifies the wire mapped to index bit i. The first wire in the wire ordering is the least significant bit (LSB) of the index bits.
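To make the mapping concrete, the sketch below computes the array index of a computational basis state from a wire ordering. It is a host-side illustration only: elementIndex is a hypothetical helper, not a cuStateVec Ex function, and it assumes exactly the convention stated above (wireOrdering[i] is the wire at index bit i, with index bit 0 as the LSB).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side helper (not a cuStateVec Ex API).
// wireOrdering[i] is the wire mapped to index bit i (index bit 0 is the LSB).
// wireValues[w] is the 0/1 value of wire w in the basis state of interest.
inline uint64_t elementIndex(const std::vector<int32_t>& wireOrdering,
                             const std::vector<int>& wireValues) {
    assert(wireOrdering.size() == wireValues.size());
    uint64_t index = 0;
    for (std::size_t bit = 0; bit < wireOrdering.size(); ++bit) {
        const int32_t wire = wireOrdering[bit];
        // Set index bit `bit` when the wire stored there has value 1.
        if (wireValues[wire] != 0) index |= (uint64_t{1} << bit);
    }
    return index;
}
```

For a 3-wire state in which only wire 0 is set, the default ordering {0, 1, 2} gives index 1, while the ordering {2, 0, 1} gives index 2, because wire 0 then occupies index bit 1.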

Permuting wire ordering#

To permute the wires (reorder the tensor modes), use custatevecExStateVectorPermuteIndexBits(). This API physically rearranges the state vector elements to match the new wire ordering:

// Define new permutation: {1, 2, 0} means wire 0 → index bit 1, wire 1 → index bit 2, wire 2 → index bit 0
int32_t permutation[] = {1, 2, 0};
custatevecExStateVectorPermuteIndexBits(stateVector, permutation, numWires,
                                        CUSTATEVEC_EX_PERMUTATION_SCATTER);

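The API supports two permutation types, selected by the last argument. The examples in this section use the scatter type, CUSTATEVEC_EX_PERMUTATION_SCATTER.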

../../_images/indexBitPermutation.png

Figure. Index bit permutation example using scatter permutation {1, 2, 0} on a 3-wire system. The left side shows the wire ordering transformation, while the right side illustrates how state vector elements are rearranged. The first wire in the wire ordering is the LSB of the index bits.

custatevecExStateVectorPermuteIndexBits() arranges the data layout for specific computational patterns. During distributed operations, the layout is managed automatically.
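The effect of a scatter permutation can be modelled on the host with a few lines of C++. This is a sketch under an assumption, not the library's implementation: it takes the semantics from the comment in the example above (the wire at index bit i moves to index bit permutation[i]), and the function names scatterWireOrdering and scatterElements are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scatter on the ordering: the wire at index bit i moves to index bit permutation[i].
inline std::vector<int32_t> scatterWireOrdering(const std::vector<int32_t>& ordering,
                                                const std::vector<int32_t>& permutation) {
    assert(ordering.size() == permutation.size());
    std::vector<int32_t> result(ordering.size());
    for (std::size_t i = 0; i < ordering.size(); ++i)
        result[permutation[i]] = ordering[i];
    return result;
}

// Matching data movement: bit i of the old index becomes bit permutation[i] of the new index.
template <typename T>
std::vector<T> scatterElements(const std::vector<T>& data,
                               const std::vector<int32_t>& permutation) {
    assert(data.size() == (std::size_t{1} << permutation.size()));
    std::vector<T> result(data.size());
    for (std::size_t oldIdx = 0; oldIdx < data.size(); ++oldIdx) {
        std::size_t newIdx = 0;
        for (std::size_t i = 0; i < permutation.size(); ++i)
            if ((oldIdx >> i) & 1) newIdx |= std::size_t{1} << permutation[i];
        result[newIdx] = data[oldIdx];
    }
    return result;
}
```

Under this model, applying {1, 2, 0} to the default ordering {0, 1, 2} yields the ordering {2, 0, 1}, and elements labelled 0 through 7 end up in the order {0, 4, 1, 5, 2, 6, 3, 7}.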

Reverting wire ordering#

In the cuStateVec Ex API, the wire ordering can be modified during operations, and the data layout is rearranged accordingly. To revert the wire ordering to the default sequential order (identity permutation), retrieve the current wire ordering and use it as a scatter permutation:

// Get current wire ordering
int32_t numWires;
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_NUM_WIRES,
                                   &numWires, sizeof(int32_t));
std::vector<int32_t> wireOrdering(numWires);
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_WIRE_ORDERING,
                                   wireOrdering.data(), sizeof(int32_t) * numWires);

// Revert to default wire ordering using scatter permutation
custatevecExStateVectorPermuteIndexBits(stateVector, wireOrdering.data(), numWires,
                                        CUSTATEVEC_EX_PERMUTATION_SCATTER);

After this operation, the wire ordering becomes the identity permutation {0, 1, 2, …, numWires-1}, and the state vector elements are rearranged to match the default layout.
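Why the current wire ordering works as its own reverting permutation can be seen in a small host-side model. The sketch below assumes the scatter semantics described earlier (the wire at index bit i moves to index bit permutation[i]); revertByScatter and isIdentity are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Models a scatter permutation whose permutation array is the current wire ordering.
// Wire w sits at index bit i (wireOrdering[i] == w) and is sent to index bit
// permutation[i] == w, so every wire lands on the index bit with its own number.
inline std::vector<int32_t> revertByScatter(const std::vector<int32_t>& wireOrdering) {
    std::vector<int32_t> result(wireOrdering.size(), -1);
    for (std::size_t i = 0; i < wireOrdering.size(); ++i) {
        const int32_t destination = wireOrdering[i];  // permutation[i]
        assert(destination >= 0 &&
               static_cast<std::size_t>(destination) < result.size());
        result[destination] = wireOrdering[i];
    }
    return result;
}

// True when ordering equals {0, 1, ..., n-1}.
inline bool isIdentity(const std::vector<int32_t>& ordering) {
    std::vector<int32_t> identity(ordering.size());
    std::iota(identity.begin(), identity.end(), 0);
    return ordering == identity;
}
```

In this model, any valid ordering, for example {2, 0, 1} or {3, 1, 0, 2}, is mapped to the identity ordering.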

Getting and setting state#

The cuStateVec Ex API provides functions to transfer state vector elements between host and device memory.

Getting state from device#

To retrieve the state vector data from the device to host memory, use custatevecExStateVectorGetState():

// Allocate host memory for state vector
size_t svSize = 1ULL << numWires;  // 2^numWires
std::vector<cuDoubleComplex> hostStateVector(svSize);

// Copy state vector from device to host
custatevecExStateVectorGetState(stateVector, hostStateVector.data(), CUDA_C_64F,
                                0, svSize, 1);

The memory layout follows the wire ordering of the state vector. To copy the elements in the default layout, revert the wire ordering to the default order before calling this function.

The parameters specify the data type, the range of elements to copy, and the maximum number of concurrent copies. For distributed state vectors, custatevecExStateVectorGetState() retrieves only the local sub-state vector data for the current device/process.
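Host buffer sizes follow directly from the wire counts. The sketch below uses hypothetical helper names that are not part of cuStateVec Ex. It assumes 16 bytes per element for CUDA_C_64F (the size of cuDoubleComplex) and that a sub-state vector with numDeviceWires local wires holds 2^numDeviceWires elements, as the wire-count relation in the multi-device example suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of elements in a state vector (or sub-state vector) with the given wire count.
inline uint64_t numElements(int32_t wires) {
    assert(wires >= 0 && wires < 64);
    return uint64_t{1} << wires;
}

// Host buffer size in bytes; bytesPerElement is 16 for double-precision complex values.
inline uint64_t hostBufferBytes(int32_t wires, std::size_t bytesPerElement) {
    return numElements(wires) * static_cast<uint64_t>(bytesPerElement);
}
```

For instance, a full 30-wire state vector in double-precision complex needs hostBufferBytes(30, 16) = 17179869184 bytes (16 GiB) on the host, while a 28-wire sub-state vector needs 4294967296 bytes (4 GiB).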

Setting state on device#

To initialize or update the state vector on the device with data from host memory, use custatevecExStateVectorSetState():

// Prepare state vector data on host
std::vector<cuDoubleComplex> hostStateVector(svSize);
// ... initialize hostStateVector with desired quantum state ...

// Copy state vector from host to device
custatevecExStateVectorSetState(stateVector, hostStateVector.data(), CUDA_C_64F,
                                0, svSize, 1);

The parameters specify the data type, the range of elements to set, and the maximum number of concurrent copies.

For distributed state vectors, custatevecExStateVectorSetState() sets only the local sub-state vector data for the current device/process. Each process must provide its corresponding partition of the full state vector.

The memory layout is not modified while elements are copied from host to device. If the current wire ordering does not match the memory layout of the elements being copied to the state vector, use custatevecExStateVectorReassignWireOrdering() to update the wire ordering without touching the memory layout:

// External data uses wire ordering {2, 0, 1}
int32_t externalWireOrdering[] = {2, 0, 1};

// Reassign wire ordering to match the imported data layout
custatevecExStateVectorReassignWireOrdering(stateVector, externalWireOrdering, numWires);

This approach should be used when the imported data is already in a specific layout. After reassigning, you can use custatevecExStateVectorPermuteIndexBits() to physically rearrange the data to the desired wire ordering if needed.
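The difference between reassigning and permuting can be illustrated with a small host-side model. This is a sketch, not the library's implementation: HostModel, reassignWireOrdering, and permuteToDefault are hypothetical names, and the data movement assumes the scatter semantics described earlier (the bit at index bit i moves to index bit wireOrdering[i] when reverting to the default ordering).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side model of a state vector descriptor (illustration only).
struct HostModel {
    std::vector<int32_t> wireOrdering;  // wireOrdering[i] = wire at index bit i
    std::vector<double> elements;       // 2^numWires values in the current layout
};

// Models reassigning the wire ordering: only the labels change, elements stay in place.
inline void reassignWireOrdering(HostModel& sv, std::vector<int32_t> newOrdering) {
    assert(newOrdering.size() == sv.wireOrdering.size());
    sv.wireOrdering = std::move(newOrdering);
}

// Models a physical permutation back to the default ordering {0, 1, ..., n-1}:
// the bit at index bit i moves to index bit wireOrdering[i], and elements move with it.
inline void permuteToDefault(HostModel& sv) {
    const std::size_t n = sv.wireOrdering.size();
    std::vector<double> moved(sv.elements.size());
    for (std::size_t oldIdx = 0; oldIdx < sv.elements.size(); ++oldIdx) {
        std::size_t newIdx = 0;
        for (std::size_t i = 0; i < n; ++i)
            if ((oldIdx >> i) & 1) newIdx |= std::size_t{1} << sv.wireOrdering[i];
        moved[newIdx] = sv.elements[oldIdx];
    }
    sv.elements = std::move(moved);
    for (std::size_t i = 0; i < n; ++i) sv.wireOrdering[i] = static_cast<int32_t>(i);
}
```

In this model, reassigning a 3-wire vector holding the values 0 through 7 to the ordering {2, 0, 1} leaves the values in place, and the subsequent permutation to the default ordering moves them into the order {0, 2, 4, 6, 1, 3, 5, 7}.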

Interoperability#

The cuStateVec Ex API provides interoperability functions to access the internal resources of state vectors, enabling integration with external libraries and custom CUDA kernels.

  • External library integration: Use cuBLAS for dot products, cuSPARSE for sparse operations, or other GPU libraries

  • Adding user-defined custom features: Implement specialized operations not available in the cuStateVec Ex API

Note

When accessing sub-state vectors directly, ensure proper synchronization using the provided CUDA stream. Operations on the state vector using cuStateVec Ex APIs and custom operations must be properly ordered.

Accessing device sub-state vector resources#

To access the internal GPU resources of a state vector for read-write operations, use custatevecExStateVectorGetResourcesFromDeviceSubSV():

// Get number of device sub-state vectors and their indices
int32_t numSubSVs;
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_NUM_DEVICE_SUBSVS,
                                   &numSubSVs, sizeof(int32_t));

std::vector<int32_t> subSVIndices(numSubSVs);
custatevecExStateVectorGetProperty(stateVector,
                                   CUSTATEVEC_EX_SV_PROP_DEVICE_SUBSV_INDICES,
                                   subSVIndices.data(), sizeof(int32_t) * numSubSVs);

// Get resources for a specific device sub-state vector
int32_t deviceId;
void* devicePtr;
cudaStream_t stream;
custatevecHandle_t handle;
int32_t subSVIndex = subSVIndices[0];  // Access first sub-state vector

custatevecExStateVectorGetResourcesFromDeviceSubSV(stateVector, subSVIndex,
                                                    &deviceId, &devicePtr, &stream, &handle);

// Use devicePtr with external libraries (e.g., cuBLAS, cuSPARSE)
// or custom CUDA kernels

For single-device state vectors, there is only one sub-state vector (index 0). For distributed state vectors, each device/process manages one or more sub-state vectors, and the indices can be retrieved using custatevecExStateVectorGetProperty() with CUSTATEVEC_EX_SV_PROP_DEVICE_SUBSV_INDICES.

Read-only access to sub-state vectors#

For read-only operations that do not modify the state vector, use custatevecExStateVectorGetResourcesFromDeviceSubSVView(), which returns a const pointer:

// Get read-only resources for a specific device sub-state vector
int32_t deviceId;
const void* devicePtr;
cudaStream_t stream;
custatevecHandle_t handle;

custatevecExStateVectorGetResourcesFromDeviceSubSVView(stateVector, subSVIndex,
                                                        &deviceId, &devicePtr, &stream, &handle);

// Use devicePtr for read-only operations (e.g., computing norms, inner products)

Apart from the const-qualified pointer, this function is identical to custatevecExStateVectorGetResourcesFromDeviceSubSV().

Example: cuBLAS interoperability#

See the interoperability_dot.cpp sample for a complete example of using cuBLAS with distributed state vectors.