cuStateVec Ex: State Vector Migration#
The cuStateVec Ex API supports using host (CPU) memory as extended storage for state vectors, enabling simulations with more qubits than device memory alone can accommodate.
Overview#
When the number of wires exceeds what can fit in device memory, the state vector object allocates the excess portion on host memory. Operations such as gate application, measurement, and sampling work transparently on state vectors that span both device and host memory. The cuStateVec Ex API internally manages data migration between host and device.
In this release, host memory extension is supported for single-device state vector configurations.
Concepts#
Staged and unstaged sub state vectors#
Using host memory is one form of state vector distribution. Placing portions of a state vector in host memory is a heterogeneous distribution, in contrast to multi-device and multi-process configurations, where all sub state vectors are placed homogeneously across multiple devices.
For a single-device state vector with host memory, one sub state vector is placed on the device, and other sub state vectors are stored in host memory. During operations, sub state vectors are migrated between host and device, and operations are applied to all sub state vectors by staging each sub state vector onto device. The sub state vector on device is staged, and the sub state vectors on host are unstaged.
Placing sub state vectors in host memory increases the total state vector size. The added wires are represented by migration wires (migration index bits), which correspond to the index of the sub state vector that migrates between host and device.
Adding migration wires expands the state vector: its size grows by a factor of \(2^{numMigrationWires}\). A single-device state vector always keeps exactly one sub state vector on device, so \(2^{numMigrationWires} - 1\) sub state vectors are placed on host.
For example, adding two migration wires creates four sub state vectors: one on device and three on host. In the current release, the maximum number of migration wires is 3, so the state vector can be at most 8x the size of the device sub state vector, and host memory must have capacity for seven sub state vectors.
State vector with host memory (2 migration wires). SubSV0 is staged on device, and SubSV1-3 are unstaged on host.#
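The sizing arithmetic above can be sketched in plain C++. This is a minimal illustration, not part of the cuStateVec Ex API: `HostExtensionLayout` and `computeLayout` are hypothetical names, and the element size is passed in by the caller (16 bytes for `CUDA_C_64F`, 8 bytes for `CUDA_C_32F`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not a cuStateVec Ex type): sizes implied by a
// single-device configuration that extends into host memory.
struct HostExtensionLayout {
    std::int32_t numMigrationWires;  // numWires - numDeviceWires
    std::uint64_t numSubSVs;         // 2^numMigrationWires
    std::uint64_t numHostSubSVs;     // numSubSVs - 1 (one sub state vector stays on device)
    std::uint64_t bytesPerSubSV;     // 2^numDeviceWires * bytesPerElement
    std::uint64_t hostBytes;         // numHostSubSVs * bytesPerSubSV
};

// Hypothetical helper: derive the layout from the two wire counts.
inline HostExtensionLayout computeLayout(std::int32_t numWires,
                                         std::int32_t numDeviceWires,
                                         std::uint64_t bytesPerElement) {
    assert(numDeviceWires >= 0 && numDeviceWires < 50);  // keep the byte counts within 64 bits
    assert(numWires >= numDeviceWires);
    const std::int32_t numMigrationWires = numWires - numDeviceWires;
    assert(numMigrationWires <= 3);  // limit stated for the current release

    HostExtensionLayout layout{};
    layout.numMigrationWires = numMigrationWires;
    layout.numSubSVs = std::uint64_t{1} << numMigrationWires;
    layout.numHostSubSVs = layout.numSubSVs - 1;
    layout.bytesPerSubSV = (std::uint64_t{1} << numDeviceWires) * bytesPerElement;
    layout.hostBytes = layout.numHostSubSVs * layout.bytesPerSubSV;
    return layout;
}
```

For instance, `computeLayout(34, 33, 16)` describes one migration wire, two sub state vectors, and a single host sub state vector of \(2^{37}\) bytes (128 GiB).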
A staged sub state vector resides on device memory. It is directly accessible for GPU computation. Device sub state vectors are not sliced.
An unstaged sub state vector resides on host memory. Unstaged sub state vector slices are migrated to device as needed during operations.
Migration wires are the wires (index bits) that select which sub state vector migrates between host and device. The number of migration wires determines the number of sub state vectors placed on host memory.
To enable host memory, specify numWires greater than numDeviceWires when configuring a single-device state vector:
custatevecExDictionaryDescriptor_t svConfig;
int32_t numWires = 34;        // total qubits
int32_t numDeviceWires = 33;  // qubits fitting in device memory
// numWires - numDeviceWires = 1 migration wire
custatevecExConfigureStateVectorSingleDevice(
    &svConfig, CUDA_C_64F, numWires, numDeviceWires, 0, 0);

custatevecExStateVectorDescriptor_t stateVector;
custatevecExStateVectorCreateSingleProcess(
    &stateVector, svConfig, nullptr, 0, nullptr);
custatevecExDictionaryDestroy(svConfig);
The number of migration wires is defined by the following relationship:
numMigrationWires = numWires - numDeviceWires
During operations, sub state vectors are staged onto device one by one, and operations are applied to the staged sub state vector on device. After operations are applied, the next sub state vector is staged by swapping with the current one. A single operation completes when all sub state vectors are staged and operations are applied.
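The stage-apply-swap loop described above can be mimicked with a host-only toy model. This is a sketch under stated assumptions, not how the library is implemented: `ToyStateVector` and `applyToAll` are hypothetical names, slot 0 of `slots` stands in for device memory, and the remaining slots stand in for host memory.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Amplitude = std::complex<double>;
using SubSV = std::vector<Amplitude>;

// Hypothetical toy model: slots[0] plays the role of device memory,
// slots[1..] play the role of host memory.
struct ToyStateVector {
    std::vector<SubSV> slots;         // storage for every sub state vector
    std::vector<std::size_t> slotOf;  // slotOf[i] = slot currently holding sub state vector i

    ToyStateVector(std::size_t numSubSVs, std::size_t subSVSize)
        : slots(numSubSVs, SubSV(subSVSize, Amplitude(1.0, 0.0))), slotOf(numSubSVs) {
        assert(numSubSVs > 0);
        for (std::size_t i = 0; i < numSubSVs; ++i) slotOf[i] = i;
    }

    // Stage sub state vector i by swapping it with the one currently in slot 0.
    // If it is already staged, it is kept in place.
    SubSV& stage(std::size_t i) {
        assert(i < slotOf.size());
        const std::size_t src = slotOf[i];
        if (src != 0) {
            // The sub state vector that occupied slot 0 moves to the vacated slot.
            for (std::size_t j = 0; j < slotOf.size(); ++j) {
                if (slotOf[j] == 0) { slotOf[j] = src; break; }
            }
            std::swap(slots[0], slots[src]);
            slotOf[i] = 0;
        }
        return slots[0];
    }
};

// One "operation" completes once every sub state vector has been staged
// and the element-wise function op has been applied to it.
template <typename Op>
void applyToAll(ToyStateVector& sv, Op op) {
    for (std::size_t i = 0; i < sv.slotOf.size(); ++i) {
        SubSV& staged = sv.stage(i);
        for (Amplitude& a : staged) a = op(a);
    }
}
```

In this model the last sub state vector visited remains staged after the loop, which mirrors the "swap with the current one" step in the text; whether the library restores the original placement afterwards is not specified here.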
Migration of sub state vectors between host and device#
Unstaged sub state vector slices#
An unstaged sub state vector is divided into sub state vector slices. Slicing allows the staging operation to swap portions of the device sub state vector with unstaged sub state vector slices in order to stage a sub state vector onto device.
The number of sub state vector slices per sub state vector is \(2^{numMigrationWires}\). The staged sub state vector on device can also be viewed as consisting of sub state vector slices, though it is stored as a single contiguous memory block.
For a single-device state vector, the migration bits define the sub state vector index. Each slice has slice index bits that define the slice ordinal in a sub state vector. Each sub state vector slice is identified by a combination of the migration bits and the slice index bits.
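As an illustration of how migration bits and slice index bits can jointly identify a slice, the following sketch splits a global amplitude index into three fields. The bit layout is an assumption made for this example only: migration bits are taken as the most significant bits, slice index bits as the next `numMigrationWires` bits, and the rest as the offset inside a slice. `SliceAddress`, `decompose`, and `compose` are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address of one amplitude under the assumed bit layout.
struct SliceAddress {
    std::uint64_t subSVIndex;  // value of the migration bits
    std::uint64_t sliceIndex;  // value of the slice index bits
    std::uint64_t offset;      // element offset inside the slice
};

// Split a global index into (sub state vector, slice, offset).
// ASSUMPTION: [ migration bits | slice index bits | offset bits ], MSB first.
inline SliceAddress decompose(std::uint64_t globalIndex, int numWires, int numDeviceWires) {
    const int numMigrationWires = numWires - numDeviceWires;
    assert(numMigrationWires >= 0 && numMigrationWires <= numDeviceWires);
    assert(numWires < 64);
    assert(globalIndex < (std::uint64_t{1} << numWires));
    // 2^numMigrationWires slices per sub state vector leaves this many offset bits.
    const int offsetBits = numDeviceWires - numMigrationWires;

    SliceAddress a{};
    a.subSVIndex = globalIndex >> numDeviceWires;
    a.sliceIndex = (globalIndex >> offsetBits) & ((std::uint64_t{1} << numMigrationWires) - 1);
    a.offset = globalIndex & ((std::uint64_t{1} << offsetBits) - 1);
    return a;
}

// Inverse of decompose under the same assumed layout.
inline std::uint64_t compose(const SliceAddress& a, int numWires, int numDeviceWires) {
    const int offsetBits = numDeviceWires - (numWires - numDeviceWires);
    return (a.subSVIndex << numDeviceWires) | (a.sliceIndex << offsetBits) | a.offset;
}
```

With 5 wires and 4 device wires (one migration wire), index 26 (binary 11010) maps to sub state vector 1, slice 1, offset 2 under this layout.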
When SubSV1 is staged onto device, slices are swapped between host and device as shown in (a) of the figure below. After the swap, SubSV1 is staged on device and SubSV0 resides on host.
Because each slice has slice index bits, the wires at the migration bit and the slice index bit can be swapped during staging, as shown in (b). This sequence of operations moves the migration wire into the staged sub state vector on device.
For detailed descriptions of migration algorithms, see State Vector Migration API.
Sub state vector slices and staging operations (1 migration wire). (a) Staging SubSV1 by swapping slices. (b) Swapping migration wire and slice index bit.#
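The two staging patterns for one migration wire can be traced with a small bookkeeping model. This is one reading of the description above, not the library's algorithm: `SliceLabel`, `ToyPlacement`, and the functions below are hypothetical, and each slice is represented only by a label recording its original migration bit and slice index bit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Label of a slice: where it sat in the original layout.
struct SliceLabel {
    int migrationBit;  // original sub state vector index (0 or 1)
    int sliceBit;      // original slice index inside that sub state vector (0 or 1)
};

// Hypothetical placement for 1 migration wire: two slice-sized regions on
// device and two unstaged slices on host.
struct ToyPlacement {
    std::array<SliceLabel, 2> device;
    std::array<SliceLabel, 2> host;
};

// SubSV0 on device, SubSV1 on host.
inline ToyPlacement initialPlacement() {
    ToyPlacement p{};
    p.device[0] = SliceLabel{0, 0};
    p.device[1] = SliceLabel{0, 1};
    p.host[0] = SliceLabel{1, 0};
    p.host[1] = SliceLabel{1, 1};
    return p;
}

// (a) Stage the other sub state vector: swap every device slice with the
// host slice that has the same slice index.
inline void stageOtherSubSV(ToyPlacement& p) {
    for (std::size_t s = 0; s < p.device.size(); ++s) std::swap(p.device[s], p.host[s]);
}

// (b) Swap the migration wire with the slice index bit: only the two slices
// whose bits differ change places, (0,1) on device with (1,0) on host.
inline void swapMigrationWireWithSliceBit(ToyPlacement& p) {
    std::swap(p.device[1], p.host[0]);
}

// True if both device slices belong to the given original sub state vector.
inline bool deviceHoldsSubSV(const ToyPlacement& p, int subSVIndex) {
    assert(subSVIndex == 0 || subSVIndex == 1);
    return p.device[0].migrationBit == subSVIndex && p.device[1].migrationBit == subSVIndex;
}
```

After (b), the device holds the slices labeled (0,0) and (1,0): the original migration bit now distinguishes the two halves of the device buffer, while the original slice index bit decides between device and host.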
Applying custom operations to state vectors with host memory#
Manual staging of a sub state vector#
To explicitly migrate an unstaged sub state vector to device memory, use custatevecExStateVectorStageSubSV(). If the specified sub state vector is already on device, this API keeps it in place.
This API also works with state vectors that do not use host memory.
The following example queries all sub state vector indices, stages each one onto device, and applies a custom operation using the device resources:
// Query the number of sub state vectors
int32_t numSubSVs;
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_NUM_SUBSVS,
    &numSubSVs, sizeof(int32_t));

// Query the list of sub state vector indices
std::vector<int32_t> subSVIndices(numSubSVs);
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_SUBSV_INDICES,
    subSVIndices.data(), sizeof(int32_t) * numSubSVs);

// Stage each sub state vector and apply custom operations
for (auto subSVIndex : subSVIndices) {
    custatevecExStateVectorStageSubSV(stateVector, subSVIndex);

    // Access the resources of the staged device sub state vector
    int32_t deviceId;
    void* d_subSV;
    cudaStream_t stream;
    custatevecHandle_t handle;
    custatevecExStateVectorGetResourcesFromDeviceSubSV(
        stateVector, subSVIndex,
        &deviceId, &d_subSV, &stream, &handle);

    // Apply custom operations using d_subSV, stream, handle
    // e.g., call cuBLAS, cuStateVec, or custom CUDA kernels
}
Accessing host memory resources#
APIs are provided to retrieve the internal resources of unstaged sub state vector slices.
The first step is to call custatevecExStateVectorExposeResources(). This API flushes all pending operations and arranges the sub state vector slice layout to be accessible by custatevecExStateVectorGetResourcesFromUnstagedSubSVSlice().
The following example exposes internal resources, queries sub state vector indices on host and device, and retrieves their resource pointers.
// Expose resources
custatevecExStateVectorExposeResources(
    stateVector, CUSTATEVEC_EX_EXPOSE_RESOURCES_ACCESSIBLE);

// Access device sub state vectors
int32_t numDeviceSubSVs;
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_NUM_DEVICE_SUBSVS,
    &numDeviceSubSVs, sizeof(int32_t));
std::vector<int32_t> deviceSubSVIndices(numDeviceSubSVs);
// This call can fail if custatevecExStateVectorExposeResources() is not called.
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_DEVICE_SUBSV_INDICES,
    deviceSubSVIndices.data(), sizeof(int32_t) * numDeviceSubSVs);

for (auto subSVIndex : deviceSubSVIndices) {
    int32_t deviceId;
    void* d_subSV;
    cudaStream_t stream;
    custatevecHandle_t handle;
    custatevecExStateVectorGetResourcesFromDeviceSubSV(
        stateVector, subSVIndex,
        &deviceId, &d_subSV, &stream, &handle);
    // Apply custom operations
}
// Access unstaged sub state vector slices
int32_t numUnstagedSubSVs;
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_NUM_UNSTAGED_SUBSVS,
    &numUnstagedSubSVs, sizeof(int32_t));
std::vector<int32_t> unstagedIndices(numUnstagedSubSVs);
// This call can fail if custatevecExStateVectorExposeResources() is not called.
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_UNSTAGED_SUBSV_INDICES,
    unstagedIndices.data(), sizeof(int32_t) * numUnstagedSubSVs);

// Query the number of slices per sub state vector
int32_t numSlices;
custatevecExStateVectorGetProperty(
    stateVector, CUSTATEVEC_EX_SV_PROP_NUM_SUBSV_SLICES,
    &numSlices, sizeof(int32_t));

// Visit every slice of every unstaged sub state vector
for (auto subSVIndex : unstagedIndices) {
    for (int32_t sliceIdx = 0; sliceIdx < numSlices; ++sliceIdx) {
        void* slicePtr;
        custatevecExMemoryPlacement_t placement;
        int32_t deviceId;
        cudaStream_t stream;
        custatevecExStateVectorGetResourcesFromUnstagedSubSVSlice(
            stateVector, subSVIndex, sliceIdx,
            &slicePtr, &placement, &deviceId, &stream);
        // Apply custom operations
        // Retrieved host memory pointer is accessible from device.
    }
}
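As one possible custom operation for the "Apply custom operations" placeholders above, the following host-side sketch accumulates the squared norm over a set of slice pointers. It makes several assumptions that are not confirmed by this page: the state vector uses `CUDA_C_64F` (read here as `std::complex<double>`), every slice holds the same number of elements, and all pending work on the associated streams has completed before the host reads the data. `partialNormSquared` and `normSquaredOverSlices` are hypothetical helpers.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Sum of |amplitude|^2 over one contiguous block of amplitudes.
inline double partialNormSquared(const std::complex<double>* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += std::norm(data[i]);  // std::norm returns the squared magnitude
    }
    return sum;
}

// Accumulate the squared norm over a list of host-readable slice pointers,
// e.g. pointers collected in the loop over unstaged sub state vector slices.
// ASSUMPTION: every slice contains elementsPerSlice amplitudes.
inline double normSquaredOverSlices(const std::vector<const std::complex<double>*>& slices,
                                    std::size_t elementsPerSlice) {
    double total = 0.0;
    for (const std::complex<double>* slice : slices) {
        total += partialNormSquared(slice, elementsPerSlice);
    }
    return total;
}
```

In real use, the contribution of the device sub state vector would have to be added separately (for example with a device-side reduction), since this sketch only covers memory the host can read directly.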
Properties#
The following properties are relevant to host memory state vectors. All are retrieved via custatevecExStateVectorGetProperty().
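| Property | Description |
|---|---|
|  | Number of migration wires |
| CUSTATEVEC_EX_SV_PROP_NUM_SUBSVS | Total number of sub state vectors (device + host) |
| CUSTATEVEC_EX_SV_PROP_SUBSV_INDICES | Array of all sub state vector indices |
| CUSTATEVEC_EX_SV_PROP_NUM_SUBSV_SLICES | Number of slices per sub state vector |
| CUSTATEVEC_EX_SV_PROP_NUM_UNSTAGED_SUBSVS | Number of unstaged (host) sub state vectors |
| CUSTATEVEC_EX_SV_PROP_UNSTAGED_SUBSV_INDICES | Array of unstaged sub state vector indices |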