Examples

Examples are provided for both the cuStateVec and cuStateVec Ex APIs.

The cuStateVec examples show how to use each cuStateVec API, while the cuStateVec Ex examples demonstrate real use cases of the cuStateVec Ex API. Samples for both can be found in the NVIDIA/cuQuantum repository.

Compilation

Assuming cuQuantum has been extracted in CUQUANTUM_ROOT, we update the library path accordingly:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${LD_LIBRARY_PATH}

The sample code discussed below (statevec_example.cu) can be compiled with the following command:

nvcc statevec_example.cu -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -lcustatevec -o statevec_example

Note

Depending on the source of the cuQuantum package, you may need to replace lib above by lib64.

Alternatively, run make to build all samples:

make CUQUANTUM_ROOT=<path_to_cuQuantum_installation>

cuStateVec code example

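The following code example shows the common steps for using cuStateVec. Here we apply a Toffoli gate, which inverts the third bit (index bit 2, the target) when the first two bits (index bits 0 and 1, the controls) are both 1. The gate itself is passed as a 2x2 Pauli-X matrix; the controls are supplied separately.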

[Figure: Toffoli gate circuit (toffoli.png)]

#include <cuda_runtime_api.h> // cudaMalloc, cudaMemcpy, etc.
#include <cuComplex.h>        // cuDoubleComplex
#include <custatevec.h>       // custatevecApplyMatrix
#include <stdio.h>            // printf
#include <stdlib.h>           // EXIT_FAILURE

int main(void) {

   const int nIndexBits = 3;
   const int nSvSize    = (1 << nIndexBits);
   const int nTargets   = 1;
   const int nControls  = 2;
   const int adjoint    = 0;

   int targets[]  = {2};
   int controls[] = {0, 1};

   cuDoubleComplex h_sv[]        = {{ 0.0, 0.0}, { 0.0, 0.1}, { 0.1, 0.1},
                                    { 0.1, 0.2}, { 0.2, 0.2}, { 0.3, 0.3},
                                    { 0.3, 0.4}, { 0.4, 0.5}};
   cuDoubleComplex h_sv_result[] = {{ 0.0, 0.0}, { 0.0, 0.1}, { 0.1, 0.1},
                                    { 0.4, 0.5}, { 0.2, 0.2}, { 0.3, 0.3},
                                    { 0.3, 0.4}, { 0.1, 0.2}};
   cuDoubleComplex matrix[] = {{0.0, 0.0}, {1.0, 0.0},
                               {1.0, 0.0}, {0.0, 0.0}};


   cuDoubleComplex *d_sv;
   cudaMalloc((void**)&d_sv, nSvSize * sizeof(cuDoubleComplex));

   cudaMemcpy(d_sv, h_sv, nSvSize * sizeof(cuDoubleComplex),
              cudaMemcpyHostToDevice);

   //--------------------------------------------------------------------------

   // custatevec handle initialization
   custatevecHandle_t handle;

   custatevecCreate(&handle);

   void* extraWorkspace = nullptr;
   size_t extraWorkspaceSizeInBytes = 0;

   // check the size of external workspace
   custatevecApplyMatrixGetWorkspaceSize(
       handle, CUDA_C_64F, nIndexBits, matrix, CUDA_C_64F,
       CUSTATEVEC_MATRIX_LAYOUT_ROW, adjoint, nTargets, nControls,
       CUSTATEVEC_COMPUTE_64F, &extraWorkspaceSizeInBytes);

   // allocate external workspace if necessary
   if (extraWorkspaceSizeInBytes > 0)
       cudaMalloc(&extraWorkspace, extraWorkspaceSizeInBytes);

   // apply gate
   custatevecApplyMatrix(
       handle, d_sv, CUDA_C_64F, nIndexBits, matrix, CUDA_C_64F,
       CUSTATEVEC_MATRIX_LAYOUT_ROW, adjoint, targets, nTargets, controls,
       nullptr, nControls, CUSTATEVEC_COMPUTE_64F,
       extraWorkspace, extraWorkspaceSizeInBytes);

   // destroy handle
   custatevecDestroy(handle);

   //--------------------------------------------------------------------------

   cudaMemcpy(h_sv, d_sv, nSvSize * sizeof(cuDoubleComplex),
              cudaMemcpyDeviceToHost);

   bool correct = true;
   for (int i = 0; i < nSvSize; i++) {
       if ((h_sv[i].x != h_sv_result[i].x) ||
           (h_sv[i].y != h_sv_result[i].y)) {
           correct = false;
           break;
       }
   }

   if (correct)
       printf("example PASSED\n");
   else
       printf("example FAILED: wrong result\n");

   cudaFree(d_sv);
   if (extraWorkspaceSizeInBytes)
       cudaFree(extraWorkspace);

   // report failure through the exit code as well as the printed message
   return correct ? EXIT_SUCCESS : EXIT_FAILURE;
}
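
The expected result in h_sv_result can be cross-checked without a GPU. The following is a minimal host-only sketch, not part of cuStateVec: the helper names applyControlledXOnHost and exampleInputState are hypothetical and introduced here only for illustration. It assumes the convention used in the example above, where index bit k corresponds to bit k of the state vector index.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Amplitude = std::complex<double>;

// Hypothetical host-side helper (not a cuStateVec API): applies a
// multi-controlled X gate in place by swapping amplitude pairs.
void applyControlledXOnHost(std::vector<Amplitude>& sv, int nIndexBits,
                            int target, const std::vector<int>& controls) {
    assert(sv.size() == (std::size_t{1} << nIndexBits));

    std::size_t controlMask = 0;
    for (int c : controls) controlMask |= std::size_t{1} << c;
    const std::size_t targetMask = std::size_t{1} << target;

    for (std::size_t i = 0; i < sv.size(); ++i) {
        const bool controlsSet = (i & controlMask) == controlMask;
        const bool targetClear = (i & targetMask) == 0;
        // Visit each pair once, from the index whose target bit is 0.
        if (controlsSet && targetClear) std::swap(sv[i], sv[i | targetMask]);
    }
}

// Returns the 8-element input state h_sv used in the example above.
std::vector<Amplitude> exampleInputState() {
    return {{0.0, 0.0}, {0.0, 0.1}, {0.1, 0.1}, {0.1, 0.2},
            {0.2, 0.2}, {0.3, 0.3}, {0.3, 0.4}, {0.4, 0.5}};
}
```

Calling applyControlledXOnHost(sv, 3, 2, {0, 1}) on the input state swaps only elements 3 (binary 011) and 7 (binary 111), which matches the difference between h_sv and h_sv_result.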

cuStateVec Ex Samples

Several sample programs are provided to demonstrate key features and usage patterns of cuStateVec Ex APIs.

quantum_state_initialization.cpp

Demonstrates how to initialize a state vector with external user data. The workflow sets the quantum state from user data, reassigns the wire ordering to match the imported data layout, and then permutes the wire ordering back to the default, which resets the memory layout of the state vector.

index_bit_permutation.cpp

Illustrates wire management and index bit permutation capabilities. Creates random permutations and applies them to the state vector, then verifies that state vector elements are correctly rearranged to match the updated wire ordering.
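
To make the idea of an index bit permutation concrete, here is a host-only sketch that does not use the cuStateVec Ex API. The function names are hypothetical, and the convention is an assumption made for this illustration: perm[k] gives the new position of index bit k, and each amplitude moves to the index obtained by relocating its bits accordingly. The convention used by the library may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;

// Moves bit k of `index` to bit position perm[k] (assumed convention).
std::size_t permuteIndexBits(std::size_t index, const std::vector<int>& perm) {
    std::size_t permuted = 0;
    for (std::size_t k = 0; k < perm.size(); ++k) {
        if (index & (std::size_t{1} << k)) permuted |= std::size_t{1} << perm[k];
    }
    return permuted;
}

// Returns a copy of `sv` whose elements are rearranged so that the
// amplitude at index i lands at permuteIndexBits(i, perm).
std::vector<Amplitude> permuteStateVector(const std::vector<Amplitude>& sv,
                                          const std::vector<int>& perm) {
    assert(sv.size() == (std::size_t{1} << perm.size()));
    std::vector<Amplitude> out(sv.size());
    for (std::size_t i = 0; i < sv.size(); ++i) {
        out[permuteIndexBits(i, perm)] = sv[i];
    }
    return out;
}
```

For a two-bit state vector, the permutation {1, 0} exchanges the elements at indices 1 and 2 and leaves indices 0 and 3 in place, which is the kind of rearrangement the sample verifies.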

pauli_functions.cpp

Shows usage of Pauli rotation and Pauli expectation value computation. Applies rotation operations with custatevecExApplyPauliRotation and computes expectation values using custatevecExComputeExpectationOnPauliBasis for multiple Pauli strings.
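
As a reference for what a Pauli expectation value means numerically, the following host-only sketch computes the expectation value of a Pauli string for a small state vector. It covers only the expectation value, not the rotation, and it does not call custatevecExComputeExpectationOnPauliBasis; the Pauli enum and the function pauliExpectationOnHost are hypothetical names, and the assumption is that paulis[k] acts on index bit k.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;

enum class Pauli { I, X, Y, Z };

// Hypothetical host-side reference: returns <psi| P |psi> for the Pauli
// string P, where paulis[k] acts on index bit k (assumed convention).
double pauliExpectationOnHost(const std::vector<Amplitude>& sv,
                              const std::vector<Pauli>& paulis) {
    assert(sv.size() == (std::size_t{1} << paulis.size()));

    // X and Y flip a bit; Z and Y contribute a sign; each Y adds a factor i.
    std::size_t xMask = 0, zMask = 0;
    int nY = 0;
    for (std::size_t k = 0; k < paulis.size(); ++k) {
        const std::size_t bit = std::size_t{1} << k;
        switch (paulis[k]) {
            case Pauli::I: break;
            case Pauli::X: xMask |= bit; break;
            case Pauli::Y: xMask |= bit; zMask |= bit; ++nY; break;
            case Pauli::Z: zMask |= bit; break;
        }
    }
    const Amplitude iPowers[4] = {{1.0, 0.0}, {0.0, 1.0}, {-1.0, 0.0}, {0.0, -1.0}};
    const Amplitude globalPhase = iPowers[nY % 4];

    Amplitude sum(0.0, 0.0);
    for (std::size_t i = 0; i < sv.size(); ++i) {
        // Parity of the bits of i selected by zMask gives the sign.
        int parity = 0;
        for (std::size_t m = i & zMask; m != 0; m &= m - 1) parity ^= 1;
        const double sign = parity ? -1.0 : 1.0;
        // P|i> = globalPhase * sign * |i ^ xMask>
        sum += std::conj(sv[i ^ xMask]) * (globalPhase * sign) * sv[i];
    }
    return sum.real();  // Pauli strings are Hermitian, so the result is real.
}
```

For the Bell state (|00> + |11>)/sqrt(2), this reference gives +1 for ZZ and XX and -1 for YY, which is a convenient sanity check when comparing against GPU results.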

estimate_pi.cpp

Implements a quantum phase estimation algorithm to estimate the value of π. This complete quantum algorithm example demonstrates advanced gate operations and measurement techniques using cuStateVec Ex APIs, following the approach described in the Qiskit textbook.
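
The arithmetic behind this estimate can be sketched classically. The code below is not taken from estimate_pi.cpp and uses no cuStateVec Ex calls; the function names are hypothetical. It assumes the textbook setup in which the unitary is a phase gate with angle θ = 1 radian, so the eigenphase is φ = 1/(2π). An ideal phase estimation with N = 2^n counting states most likely returns the integer m closest to Nφ, which gives π ≈ N/(2m).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Probability that an ideal QPE circuit with nCountingQubits counting qubits
// returns outcome m when the unitary is a phase gate with angle 1 radian.
double qpeOutcomeProbability(int nCountingQubits, std::size_t m) {
    assert(nCountingQubits >= 2 && nCountingQubits <= 12);
    const std::size_t dim = std::size_t{1} << nCountingQubits;
    const double twoPi = 2.0 * std::acos(-1.0);  // enters only via the inverse QFT
    const double theta = 1.0;                    // phase gate angle in radians
    const double angle =
        theta - twoPi * static_cast<double>(m) / static_cast<double>(dim);

    // Amplitude of |m> after the inverse QFT: (1/N) * sum_k exp(i*k*angle).
    std::complex<double> amplitude(0.0, 0.0);
    for (std::size_t k = 0; k < dim; ++k) {
        amplitude += std::polar(1.0, static_cast<double>(k) * angle);
    }
    return std::norm(amplitude / static_cast<double>(dim));
}

// Outcome with the highest probability among all 2^n outcomes.
std::size_t mostLikelyQpeOutcome(int nCountingQubits) {
    const std::size_t dim = std::size_t{1} << nCountingQubits;
    std::size_t best = 0;
    double bestProbability = -1.0;
    for (std::size_t m = 0; m < dim; ++m) {
        const double p = qpeOutcomeProbability(nCountingQubits, m);
        if (p > bestProbability) { bestProbability = p; best = m; }
    }
    return best;
}

// Converts the most likely outcome m into an estimate of pi: N / (2 m).
double estimatePiFromQpe(int nCountingQubits) {
    const std::size_t dim = std::size_t{1} << nCountingQubits;
    const std::size_t m = mostLikelyQpeOutcome(nCountingQubits);
    assert(m > 0);
    return static_cast<double>(dim) / (2.0 * static_cast<double>(m));
}
```

The precision improves as counting qubits are added, since the grid of representable phases m/N becomes finer; this brute-force sketch costs O(N^2) and is meant only for small n.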

noise_channel.cpp

Demonstrates the SVUpdater workflow for noise simulation. Shows how to queue operators once and apply them repeatedly with different random numbers to study noise effects on GHZ entangled states across different system sizes.

interoperability_dot.cpp (Multi-device and multi-process capable)

Demonstrates cuStateVec Ex interoperability with cuBLAS for dot product computation. Supports single-device, multi-device, and multi-process configurations. Shows how to extract GPU memory pointers and CUDA streams from state vectors to compute dot products using external libraries while leveraging cuStateVec Ex’s distributed state vector capabilities.

quantum_volume.cpp (Multi-device and multi-process capable)

Implements quantum volume circuits for performance benchmarking. Supports single-device, multi-device, and multi-process configurations. Demonstrates scalability analysis across multiple qubit counts and reports performance metrics for distributed quantum simulations.

For detailed information about distributed state vectors, including scaling dimensions, configuration workflows, interoperability, and best practices, see cuStateVec Ex: State Vector.

Useful tips

  • For debugging, the environment variable CUSTATEVEC_LOG_LEVEL=n can be set. The level n = 0, 1, …, 5 corresponds to the logger level as described and used in custatevecLoggerSetLevel(). The environment variable CUSTATEVEC_LOG_FILE=<filepath> can be used to direct the log output to a custom file at <filepath> instead of stdout.