Examples

In this section, we show an example on how to define a quantum operator, quantum states, and then compute the action of the quantum operator on a quantum state. For clarity, the quantum operator is defined inside a separate header transverse_ising_full_fused_noisy.h where it is wrapped in a helper C++ class UserDefinedLiouvillian. We also provide a utility header helpers.h containing convenient GPU array instantiation functions.

Compiling Code

Assuming cuQuantum has been extracted in CUQUANTUM_ROOT and cuTENSOR in CUTENSOR_ROOT, we update the library path as follows:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11).

The serial sample code discussed below (operator_action_example.cpp) can be compiled via the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example

For static linking against the cuDensityMat library, use the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example

In order to build parallel (MPI) versions of the example operator_action_mpi_example.cpp, one will need to have an CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then set the environment variable $CUDENSITYMAT_COMM_LIB to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so. The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by calling the provided build script there. In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example

Warning

When running operator_action_mpi_example.cpp without CUDA-aware MPI, the program will crash.

Note

Depending on the source of the cuQuantum package, you may need to replace lib above by lib64, depending which folder name is used inside your cuQuantum package.

Code Example (Serial Execution on a Single GPU)

The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).

First let’s introduce a helper class to construct a quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and and additional noise term.

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"   // helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* Time-dependent transverse-field Ising Hamiltonian operator
 19   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 20    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i
 21      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of fused {Z_i * Z_j} terms
 22      + d * sum_{i} {Y_i * {..} * Y_i}     // dissipation terms {Y_i * {..} * Y_i} will be fused into the YY_ii super-operator
 23   where {..} is the placeholder for the density matrix to show that the operators act from a different side
 24*/
 25
 26
 27// User-defined C++ callback function defining a time-dependent coefficient inside the Hamiltonian:
 28// f(t) = cos(omega * t) + i * sin(omega * t)
 29extern "C"
 30int32_t tdCoefComplex64(double time,             // time point
 31                        int32_t numParams,       // number of external user-defined Liouvillian parameters (= 1 here)
 32                        const double params[],   // params[0] is omega (user-defined Liouvillian parameter)
 33                        cudaDataType_t dataType, // data type (CUDA_C_64F here)
 34                        void * scalarStorage)    // CPU storage for the returned function value
 35{
 36  const auto omega = params[0];
 37  auto * tdCoef = static_cast<std::complex<double>*>(scalarStorage); // casting to complex<double> because it returns CUDA_C_64F data type
 38  *tdCoef = {std::cos(omega * time), std::sin(omega * time)};
 39  return 0; // error code (0: Success)
 40}
 41
 42
 43/** Convenience class to encapsulate the Liouvillian operator:
 44 *   - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 45 *   - get() method returns a reference to the constructed Liouvillian operator
 46 *   - Destructor releases all resources used by the Liouvillian operator
 47 */
 48class UserDefinedLiouvillian final
 49{
 50private:
 51  // Data members
 52  cudensitymatHandle_t handle;              // library context handle
 53  const std::vector<int64_t> spaceShape;    // Hilbert space shape
 54  void * spinXelems {nullptr};              // elements of the X spin operator in GPU RAM
 55  void * spinYYelems {nullptr};             // elements of the YY two-spin operator in GPU RAM
 56  void * spinZZelems {nullptr};             // elements of the ZZ two-spin operator in GPU RAM
 57  cudensitymatElementaryOperator_t spinX;   // X spin operator
 58  cudensitymatElementaryOperator_t spinYY;  // YY two-spin operator
 59  cudensitymatElementaryOperator_t spinZZ;  // ZZ two-spin operator
 60  cudensitymatOperatorTerm_t oneBodyTerm;   // operator term: H1 = sum_{i} {h_i * X_i}
 61  cudensitymatOperatorTerm_t twoBodyTerm;   // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
 62  cudensitymatOperatorTerm_t noiseTerm;     // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix
 63  cudensitymatOperator_t liouvillian;       // (-i * (H1 + f(t) * H2) * rho) + (i * rho * (H1 + f(t) * H2)) + D1
 64
 65public:
 66
 67  // Constructor constructs a user-defined Liouvillian operator
 68  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,              // library context handle
 69                         const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
 70    handle(contextHandle), spaceShape(hilbertSpaceShape)
 71  {
 72    // Define the necessary elementary tensors in GPU memory (F-order storage!)
 73    spinXelems = createArrayGPU<std::complex<double>>(
 74                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 75                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 76
 77    spinYYelems = createArrayGPU<std::complex<double>>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
 78                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
 79                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
 80                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
 81                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
 82
 83    spinZZelems = createArrayGPU<std::complex<double>>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
 84                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
 85                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
 86                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
 87                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
 88
 89    // Construct the necessary Elementary Tensor Operators
 90    //   X_i operator
 91    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
 92                        1,                                   // one-body operator
 93                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
 94                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
 95                        0,                                   // 0 for dense tensors
 96                        nullptr,                             // nullptr for dense tensors
 97                        CUDA_C_64F,                          // data type
 98                        spinXelems,                          // tensor elements in GPU memory
 99                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
100                        &spinX));                            // the created elementary tensor operator
101    //  ZZ_ij = Z_i * Z_j fused operator
102    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
103                        2,                                   // two-body operator
104                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
105                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
106                        0,                                   // 0 for dense tensors
107                        nullptr,                             // nullptr for dense tensors
108                        CUDA_C_64F,                          // data type
109                        spinZZelems,                         // tensor elements in GPU memory
110                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
111                        &spinZZ));                           // the created elementary tensor operator
112    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
113    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
114                        2,                                   // two-body operator
115                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
116                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
117                        0,                                   // 0 for dense tensors
118                        nullptr,                             // nullptr for dense tensors
119                        CUDA_C_64F,                          // data type
120                        spinYYelems,                         // tensor elements in GPU memory
121                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
122                        &spinYY));                           // the created elementary tensor operator
123
124    // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
125    //  Create an empty operator term
126    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
127                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
128                        spaceShape.data(),                   // Hilbert space shape
129                        &oneBodyTerm));                      // the created empty operator term
130    //  Define the operator term
131    for (int32_t i = 0; i < spaceShape.size(); ++i) {
132      const double h_i = 1.0 / static_cast<double>(i+1);  // just some value (time-independent h_i coefficient)
133      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
134                          oneBodyTerm,
135                          1,                                                             // number of elementary tensor operators in the product
136                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
137                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
138                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
139                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
140                          {nullptr, nullptr}));                                          // no time-dependent coefficient associated with the operator product
141    }
142    //  Create an empty operator term
143    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
144                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
145                        spaceShape.data(),                   // Hilbert space shape
146                        &twoBodyTerm));                      // the created empty operator term
147    //  Define the operator term
148    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
149      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
150        const double g_ij = -1.0 / static_cast<double>(i + j + 1);  // just some value (time-independent g_ij coefficient)
151        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
152                            twoBodyTerm,
153                            1,                                                              // number of elementary tensor operators in the product
154                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
155                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
156                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
157                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
158                            {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
159      }
160    }
161    //  Create an empty operator term
162    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
163                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
164                        spaceShape.data(),                   // Hilbert space shape
165                        &noiseTerm));                        // the created empty operator term
166    //  Define the operator term
167    for (int32_t i = 0; i < spaceShape.size(); ++i) {
168      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
169                          noiseTerm,
170                          1,                                                              // number of elementary tensor operators in the product
171                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
172                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
173                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
174                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
175                          {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
176    }
177
178    // Construct the full Liouvillian operator as a sum of the operator terms
179    //  Create an empty operator (super-operator)
180    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
181                        spaceShape.size(),               // Hilbert space rank (number of dimensions)
182                        spaceShape.data(),               // Hilbert space shape
183                        &liouvillian));                  // the created empty operator (super-operator)
184    //  Append an operator term to the operator (super-operator)
185    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
186                        liouvillian,
187                        oneBodyTerm,                     // appended operator term
188                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
189                        make_cuDoubleComplex(0.0, -1.0), // -i constant
190                        {nullptr, nullptr}));            // no time-dependent coefficient associated with the operator term as a whole
191    //  Append an operator term to the operator (super-operator)
192    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193                        liouvillian,
194                        twoBodyTerm,                     // appended operator term
195                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196                        make_cuDoubleComplex(0.0, -1.0), // -i constant
197                        {tdCoefComplex64, nullptr}));    // function callback defining the time-dependent coefficient associated with this operator term as a whole
198    //  Append an operator term to the operator (super-operator)
199    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200                        liouvillian,
201                        oneBodyTerm,                    // appended operator term
202                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203                        make_cuDoubleComplex(0.0, 1.0), // i constant
204                        {nullptr, nullptr}));           // no time-dependent coefficient associated with the operator term as a whole
205    //  Append an operator term to the operator (super-operator)
206    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207                        liouvillian,
208                        twoBodyTerm,                    // appended operator term
209                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210                        make_cuDoubleComplex(0.0, 1.0), // i constant
211                        {tdCoefComplex64, nullptr}));   // function callback defining the time-dependent coefficient associated with this operator term as a whole
212    //  Append an operator term to the operator (super-operator)
213    const double d = 0.42; // just some value (time-independent coefficient)
214    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
215                        liouvillian,
216                        noiseTerm,                    // appended operator term
217                        0,                            // operator term action duality as a whole (no duality reversing in this case)
218                        make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
219                        {nullptr, nullptr}));         // no time-dependent coefficient associated with the operator term as a whole
220  }
221
222  // Destructor destructs the user-defined Liouvillian operator
223  ~UserDefinedLiouvillian()
224  {
225    // Destroy the Liouvillian operator
226    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
227
228    // Destroy operator terms
229    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
230    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
231    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
232
233    // Destroy elementary tensor operators
234    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
235    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
236    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
237
238    // Destroy elementary tensors
239    destroyArrayGPU(spinYYelems);
240    destroyArrayGPU(spinZZelems);
241    destroyArrayGPU(spinXelems);
242  }
243
244  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
245  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
246  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
247  UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
248  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
249
250  // Get access to the constructed Liouvillian
251  cudensitymatOperator_t & get()
252  {
253    return liouvillian;
254  }
255
256};

Now we can use this quantum many-body operator in our main code.

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15#include <cmath>
 16#include <complex>
 17#include <vector>
 18#include <chrono>
 19#include <iostream>
 20#include <cassert>
 21
 22
 23// Number of times to perform operator action on a quantum state
 24constexpr int NUM_REPEATS = 2;
 25
 26// Logging verbosity
 27bool verbose = true;
 28
 29
 30// Example workflow
 31void exampleWorkflow(cudensitymatHandle_t handle)
 32{
 33  // Define the composite Hilbert space shape and
 34  // quantum state batch size (number of individual quantum states)
 35  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 36  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 37
 38  if (verbose) {
 39    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 40    for (const auto & dimsn: spaceShape)
 41      std::cout << dimsn << ",";
 42    std::cout << ")" << std::endl;
 43    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 44  }
 45
 46  // Construct a user-defined Liouvillian operator using a convenience C++ class
 47  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 48  if (verbose)
 49    std::cout << "Constructed the Liouvillian operator\n";
 50
 51  // Declare the input quantum state
 52  cudensitymatState_t inputState;
 53  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 54                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 55                      spaceShape.size(),
 56                      spaceShape.data(),
 57                      batchSize,
 58                      CUDA_C_64F,  // data type must match that of the operators created above
 59                      &inputState));
 60
 61  // Query the size of the quantum state storage
 62  std::size_t storageSize {0}; // only one storage component (tensor) is needed
 63  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 64                      inputState,
 65                      1,               // only one storage component
 66                      &storageSize));  // storage size in bytes
 67  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 68  if (verbose)
 69    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 70
 71  // Prepare some initial value for the input quantum state
 72  std::vector<std::complex<double>> inputStateValue(stateVolume);
 73  for (std::size_t i = 0; i < stateVolume; ++i) {
 74    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 75  }
 76
 77  // Allocate initialized GPU storage for the input quantum state with prepared values
 78  auto * inputStateElems = createArrayGPU(inputStateValue);
 79
 80  // Attach initialized GPU storage to the input quantum state
 81  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 82                      inputState,
 83                      1,                                                 // only one storage component (tensor)
 84                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 85                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 86  if (verbose)
 87    std::cout << "Constructed input quantum state\n";
 88
 89  // Declare the output quantum state of the same shape
 90  cudensitymatState_t outputState;
 91  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 92                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 93                      spaceShape.size(),
 94                      spaceShape.data(),
 95                      batchSize,
 96                      CUDA_C_64F,  // data type must match that of the operators created above
 97                      &outputState));
 98
 99  // Allocate initialized GPU storage for the output quantum state
100  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
101
102  // Attach initialized GPU storage to the output quantum state
103  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
104                      outputState,
105                      1,                                                 // only one storage component (no tensor factorization)
106                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
107                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
108  if (verbose)
109    std::cout << "Constructed output quantum state\n";
110
111  // Declare a workspace descriptor
112  cudensitymatWorkspaceDescriptor_t workspaceDescr;
113  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
114
115  // Query free GPU memory
116  std::size_t freeMem = 0, totalMem = 0;
117  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
118  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
119  if (verbose)
120    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
121
122  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
123  const auto startTime = std::chrono::high_resolution_clock::now();
124  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
125                      liouvillian.get(),
126                      inputState,
127                      outputState,
128                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
129                      freeMem,                   // max available GPU free memory for the workspace
130                      workspaceDescr,            // workspace descriptor
131                      0x0));                     // default CUDA stream
132  const auto finishTime = std::chrono::high_resolution_clock::now();
133  const std::chrono::duration<double> timeSec = finishTime - startTime;
134  if (verbose)
135    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
136
137  // Query the required workspace buffer size (bytes)
138  std::size_t requiredBufferSize {0};
139  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
140                      workspaceDescr,
141                      CUDENSITYMAT_MEMSPACE_DEVICE,
142                      CUDENSITYMAT_WORKSPACE_SCRATCH,
143                      &requiredBufferSize));
144  if (verbose)
145    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
146
147  // Allocate GPU storage for the workspace buffer
148  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
149  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
150  if (verbose)
151    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
152
153  // Attach the workspace buffer to the workspace descriptor
154  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
155                      workspaceDescr,
156                      CUDENSITYMAT_MEMSPACE_DEVICE,
157                      CUDENSITYMAT_WORKSPACE_SCRATCH,
158                      workspaceBuffer,
159                      requiredBufferSize));
160  if (verbose)
161    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163  // Zero out the output quantum state
164  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
165                      outputState,
166                      0x0));
167  if (verbose)
168    std::cout << "Initialized the output state to zero\n";
169
170  // Apply the Liouvillian operator to the input quatum state
171  // and accumulate its action into the output quantum state (note += semantics)
172  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
173    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
174    const auto startTime = std::chrono::high_resolution_clock::now();
175    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
176                        liouvillian.get(),
177                        0.01,                                  // time point
178                        1,                                     // number of external user-defined Hamiltonian parameters
179                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
180                        inputState,                            // input quantum state
181                        outputState,                           // output quantum state
182                        workspaceDescr,                        // workspace descriptor
183                        0x0));                                 // default CUDA stream
184    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
185    const auto finishTime = std::chrono::high_resolution_clock::now();
186    const std::chrono::duration<double> timeSec = finishTime - startTime;
187    if (verbose)
188      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
189  }
190
191  // Compute the squared norm of the output quantum state
192  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
193  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
194                      outputState,
195                      norm2,
196                      0x0));
197  if (verbose)
198    std::cout << "Computed the output state norm\n";
199  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
200  destroyArrayGPU(norm2);
201
202  // Destroy workspace descriptor
203  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
204
205  // Destroy workspace buffer storage
206  destroyArrayGPU(workspaceBuffer);
207
208  // Destroy quantum states
209  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
210  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
211
212  // Destroy quantum state storage
213  destroyArrayGPU(outputStateElems);
214  destroyArrayGPU(inputStateElems);
215
216  if (verbose)
217    std::cout << "Destroyed resources\n" << std::flush;
218}
219
220
221int main(int argc, char ** argv)
222{
223  // Assign a GPU to the process
224  HANDLE_CUDA_ERROR(cudaSetDevice(0));
225  if (verbose)
226    std::cout << "Set active device\n";
227
228  // Create a library handle
229  cudensitymatHandle_t handle;
230  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
231  if (verbose)
232    std::cout << "Created a library handle\n";
233
234  // Run the example
235  exampleWorkflow(handle);
236
237  // Destroy the library handle
238  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
239  if (verbose)
240    std::cout << "Destroyed the library handle\n";
241
242  // Done
243  return 0;
244}

Code Example (Parallel Execution on Multiple GPUs)

It is straightforward to adapt main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.

The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).

Here is the updated main code for multi-GPU runs.

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// MPI library (optional)
 16#ifdef MPI_ENABLED
 17#include <mpi.h>
 18#endif
 19
 20#include <cmath>
 21#include <complex>
 22#include <vector>
 23#include <chrono>
 24#include <iostream>
 25#include <cassert>
 26
 27
 28// Number of times to perform operator action on a quantum state
 29constexpr int NUM_REPEATS = 2;
 30
 31// Logging verbosity
 32bool verbose = true;
 33
 34
 35// Example workflow
 36void exampleWorkflow(cudensitymatHandle_t handle)
 37{
 38  // Define the composite Hilbert space shape and
 39  // quantum state batch size (number of individual quantum states)
 40  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 41  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 42
 43  if (verbose) {
 44    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 45    for (const auto & dimsn: spaceShape)
 46      std::cout << dimsn << ",";
 47    std::cout << ")" << std::endl;
 48    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 49  }
 50
 51  // Construct a user-defined Liouvillian operator using a convenience C++ class
 52  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 53  if (verbose)
 54    std::cout << "Constructed the Liouvillian operator\n";
 55
 56  // Declare the input quantum state
 57  cudensitymatState_t inputState;
 58  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 59                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 60                      spaceShape.size(),
 61                      spaceShape.data(),
 62                      batchSize,
 63                      CUDA_C_64F,  // data type must match that of the operators created above
 64                      &inputState));
 65
 66  // Query the size of the quantum state storage
 67  std::size_t storageSize {0}; // only one storage component (tensor) is needed
 68  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 69                      inputState,
 70                      1,               // only one storage component
 71                      &storageSize));  // storage size in bytes
 72  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 73  if (verbose)
 74    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 75
 76  // Prepare some initial value for the input quantum state
 77  std::vector<std::complex<double>> inputStateValue(stateVolume);
 78  for (std::size_t i = 0; i < stateVolume; ++i) {
 79    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 80  }
 81
 82  // Allocate initialized GPU storage for the input quantum state with prepared values
 83  auto * inputStateElems = createArrayGPU(inputStateValue);
 84
 85  // Attach initialized GPU storage to the input quantum state
 86  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 87                      inputState,
 88                      1,                                                 // only one storage component (tensor)
 89                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 90                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 91  if (verbose)
 92    std::cout << "Constructed input quantum state\n";
 93
 94  // Declare the output quantum state of the same shape
 95  cudensitymatState_t outputState;
 96  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 97                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 98                      spaceShape.size(),
 99                      spaceShape.data(),
100                      batchSize,
101                      CUDA_C_64F,  // data type must match that of the operators created above
102                      &outputState));
103
104  // Allocate initialized GPU storage for the output quantum state
105  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107  // Attach initialized GPU storage to the output quantum state
108  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109                      outputState,
110                      1,                                                 // only one storage component (no tensor factorization)
111                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
112                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
113  if (verbose)
114    std::cout << "Constructed output quantum state\n";
115
116  // Declare a workspace descriptor
117  cudensitymatWorkspaceDescriptor_t workspaceDescr;
118  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120  // Query free GPU memory
121  std::size_t freeMem = 0, totalMem = 0;
122  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124  if (verbose)
125    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128  const auto startTime = std::chrono::high_resolution_clock::now();
129  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130                      liouvillian.get(),
131                      inputState,
132                      outputState,
133                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
134                      freeMem,                   // max available GPU free memory for the workspace
135                      workspaceDescr,            // workspace descriptor
136                      0x0));                     // default CUDA stream
137  const auto finishTime = std::chrono::high_resolution_clock::now();
138  const std::chrono::duration<double> timeSec = finishTime - startTime;
139  if (verbose)
140    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142  // Query the required workspace buffer size (bytes)
143  std::size_t requiredBufferSize {0};
144  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145                      workspaceDescr,
146                      CUDENSITYMAT_MEMSPACE_DEVICE,
147                      CUDENSITYMAT_WORKSPACE_SCRATCH,
148                      &requiredBufferSize));
149  if (verbose)
150    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152  // Allocate GPU storage for the workspace buffer
153  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155  if (verbose)
156    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158  // Attach the workspace buffer to the workspace descriptor
159  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160                      workspaceDescr,
161                      CUDENSITYMAT_MEMSPACE_DEVICE,
162                      CUDENSITYMAT_WORKSPACE_SCRATCH,
163                      workspaceBuffer,
164                      requiredBufferSize));
165  if (verbose)
166    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168  // Zero out the output quantum state
169  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170                      outputState,
171                      0x0));
172  if (verbose)
173    std::cout << "Initialized the output state to zero\n";
174
175  // Apply the Liouvillian operator to the input quatum state
176  // and accumulate its action into the output quantum state (note += semantics)
177  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179    const auto startTime = std::chrono::high_resolution_clock::now();
180    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181                        liouvillian.get(),
182                        0.01,                                  // time point
183                        1,                                     // number of external user-defined Hamiltonian parameters
184                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
185                        inputState,                            // input quantum state
186                        outputState,                           // output quantum state
187                        workspaceDescr,                        // workspace descriptor
188                        0x0));                                 // default CUDA stream
189    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
190    const auto finishTime = std::chrono::high_resolution_clock::now();
191    const std::chrono::duration<double> timeSec = finishTime - startTime;
192    if (verbose)
193      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
194  }
195
196  // Compute the squared norm of the output quantum state
197  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
198  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
199                      outputState,
200                      norm2,
201                      0x0));
202  if (verbose)
203    std::cout << "Computed the output state norm\n";
204  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
205  destroyArrayGPU(norm2);
206
207  // Destroy workspace descriptor
208  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
209
210  // Destroy workspace buffer storage
211  destroyArrayGPU(workspaceBuffer);
212
213  // Destroy quantum states
214  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
215  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
216
217  // Destroy quantum state storage
218  destroyArrayGPU(outputStateElems);
219  destroyArrayGPU(inputStateElems);
220
221  if (verbose)
222    std::cout << "Destroyed resources\n" << std::flush;
223}
224
225
226int main(int argc, char ** argv)
227{
228  // Initialize MPI library (if needed)
229#ifdef MPI_ENABLED
230  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
231  int procRank {-1};
232  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
233  int numProcs {0};
234  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
235  if (procRank != 0) verbose = false;
236  if (verbose)
237    std::cout << "Initialized MPI library\n";
238#else
239  const int procRank {0};
240  const int numProcs {1};
241#endif
242
243  // Assign a GPU to the process
244  int numDevices {0};
245  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
246  const int deviceId = procRank % numDevices;
247  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
248  if (verbose)
249    std::cout << "Set active device\n";
250
251  // Create a library handle
252  cudensitymatHandle_t handle;
253  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
254  if (verbose)
255    std::cout << "Created a library handle\n";
256
257  // Reset distributed configuration (once)
258#ifdef MPI_ENABLED
259  MPI_Comm comm;
260  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
261  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
262                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
263                      &comm, sizeof(comm)));
264#endif
265
266  // Run the example
267  exampleWorkflow(handle);
268
269  // Synchronize MPI processes
270#ifdef MPI_ENABLED
271  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
272#endif
273
274  // Destroy the library handle
275  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276  if (verbose)
277    std::cout << "Destroyed the library handle\n";
278
279  // Finalize the MPI library
280#ifdef MPI_ENABLED
281  HANDLE_MPI_ERROR(MPI_Finalize());
282  if (verbose)
283    std::cout << "Finalized MPI library\n";
284#endif
285
286  // Done
287  return 0;
288}

Useful Tips

  • For debugging, one can set the environment variable CUDENSITYMAT_LOG_LEVEL=n. The level n = 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout.

Level

Summary

Long Description

0

Off

Logging is disabled (default)

1

Errors

Only errors will be logged

2

Performance Trace

API calls that launch CUDA kernels will log their parameters and important information

3

Performance Hints

Hints that can potentially improve the application’s performance

4

Heuristics Trace

Provides general information about the library execution, may contain details about heuristic status

5

API Trace

API calls will log their parameter and important information