Examples

In this section, we show basic examples on how to define a quantum operator, quantum state, and then compute the action of the quantum operator on a quantum state, and, optionally, backward-differentiate the operator action (compute gradients) with respect to user-provided real parameters parameterizing the operator. For clarity, the quantum operator for each example is defined inside a separate C++ header, specifically transverse_ising_full_fused_noisy.h and transverse_ising_full_fused_noisy_grad.h, where it is wrapped in a helper C++ class UserDefinedLiouvillian. We also provide a utility header helpers.h containing convenient GPU array instantiation, initialization, and printing functions.

Compiling code

Assuming cuQuantum has been extracted in CUQUANTUM_ROOT and cuTENSOR is in CUTENSOR_ROOT, we update the library path as follows:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11).

A serial sample code discussed below (operator_action_example.cpp) can be compiled via the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensornet -lcutensor -lcublas -o operator_action_example

For static linking against the cuDensityMat library, use the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a ${CUQUANTUM_ROOT}/lib/libcutensornet_static.a -L${CUTENSOR_ROOT}/lib/12 -lcutensor -lcublas -o operator_action_example

In order to build a parallel (MPI) version of the example operator_action_mpi_example.cpp, one will need to have a CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then set the environment variable $CUDENSITYMAT_COMM_LIB to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so. The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by calling the provided build script there. In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -L${MPI_PATH}/lib -lcudensitymat -lcutensornet -lcutensor -lcublas -lmpi -o operator_action_mpi_example

Warning

When running operator_action_mpi_example.cpp with a non-CUDA-aware MPI library, the program will crash.

Note

Depending on the installation of the cuQuantum SDK package, you may need to replace lib above by lib64, depending which folder name is used inside your cuQuantum SDK package.

Code example (serial execution on a single GPU)

The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).

First let’s introduce a helper class to construct a specific quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and an additional noise term. Here we choose to make the f(t) coefficient depend on time and a single user-provided real parameter Omega. We use a CPU-side user-defined scalar callback function to define the dependence of the f(t) coefficient on time and the user-provided real parameter Omega. Note that inside the callback function definition, we explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers), which applies to the scalar coefficient f(t) set by the callback function in-place.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i operators with static h_i coefficients 
 22      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}     // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27/** Define the numerical type and data type for the GPU computations (same) */
 28using NumericalType = std::complex<double>;  // do not change
 29constexpr cudaDataType_t dataType = CUDA_C_64F;        // do not change
 30
 31
 32/** Example of a user-provided scalar CPU callback C function
 33 *  defining a time-dependent coefficient inside the Hamiltonian:
 34 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 35 */
 36extern "C"
 37int32_t fCoefComplex64(
 38  double time,             //in: time point
 39  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 40  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 41  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 42  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 43  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
 44  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 45{
 46  if (dataType == CUDA_C_64F) {
 47    auto * tdCoef = static_cast<cuDoubleComplex*>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 48    for (int64_t i = 0; i < batchSize; ++i) {
 49      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 50      tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
 51    }
 52  } else {
 53    return 1; // error code (1: Error)
 54  }
 55  return 0; // error code (0: Success)
 56}
 57
 58
 59/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 60 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 61 *  - Method `get()` returns a reference to the constructed Liouvillian operator
 62 *  - Destructor releases all resources used by the Liouvillian operator
 63 */
 64class UserDefinedLiouvillian final
 65{
 66private:
 67  // Data members
 68  cudensitymatHandle_t handle;             // library context handle
 69  int64_t stateBatchSize;                  // quantum state batch size
 70  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
 71  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
 72  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
 73  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
 74  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
 75  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
 76  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
 77  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i * X_i} (one-body term)
 78  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
 79  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix (two-body mixed term)
 80  cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)
 81
 82public:
 83
 84  // Constructor constructs a user-defined Liouvillian operator
 85  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
 86                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
 87                         int64_t batchSize):                             // batch size for the quantum state
 88    handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
 89  {
 90    // Define the necessary operator tensors in GPU memory (F-order storage!)
 91    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
 92                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 93                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 94
 95    spinYYelems = createInitializeArrayGPU<NumericalType>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
 96                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
 97                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
 98                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
 99                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
100
101    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
102                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
103                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
104                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
105                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
106
107    // Construct the necessary Elementary Tensor Operators
108    //  X_i operator
109    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
110                        1,                                   // one-body operator
111                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
112                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
113                        0,                                   // 0 for dense tensors
114                        nullptr,                             // nullptr for dense tensors
115                        dataType,                            // data type
116                        spinXelems,                          // tensor elements in GPU memory
117                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
118                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
119                        &spinX));                            // the created elementary tensor operator
120    //  ZZ_ij = Z_i * Z_j fused operator
121    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
122                        2,                                   // two-body operator
123                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
124                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
125                        0,                                   // 0 for dense tensors
126                        nullptr,                             // nullptr for dense tensors
127                        dataType,                            // data type
128                        spinZZelems,                         // tensor elements in GPU memory
129                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
130                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
131                        &spinZZ));                           // the created elementary tensor operator
132    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
133    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
134                        2,                                   // two-body operator
135                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
136                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
137                        0,                                   // 0 for dense tensors
138                        nullptr,                             // nullptr for dense tensors
139                        dataType,                            // data type
140                        spinYYelems,                         // tensor elements in GPU memory
141                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
142                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
143                        &spinYY));                           // the created elementary tensor operator
144
145    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
146    //  Create an empty operator term
147    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
148                        spaceShape.size(),                   // Hilbert space rank (number of modes)
149                        spaceShape.data(),                   // Hilbert space shape (mode extents)
150                        &oneBodyTerm));                      // the created empty operator term
151    //  Define the operator term: H1 = sum_{i} {h_i * X_i}
152    for (int32_t i = 0; i < spaceShape.size(); ++i) {
153      const double h_i = 1.0 / static_cast<double>(i+1); // assign some value to the time-independent h_i coefficient
154      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
155                          oneBodyTerm,
156                          1,                                                             // number of elementary tensor operators in the product
157                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
158                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
159                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
160                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
161                          cudensitymatScalarCallbackNone,                                // no time-dependent coefficient associated with this operator product
162                          cudensitymatScalarGradientCallbackNone));                      // no coefficient gradient associated with this operator product
163    }
164    //  Create an empty operator term
165    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
166                        spaceShape.size(),                   // Hilbert space rank (number of modes)
167                        spaceShape.data(),                   // Hilbert space shape (mode extents)
168                        &twoBodyTerm));                      // the created empty operator term
169    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
170    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
171      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
172        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
173        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
174                            twoBodyTerm,
175                            1,                                                              // number of elementary tensor operators in the product
176                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
177                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
178                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
179                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
180                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
181                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
182      }
183    }
184    //  Create an empty operator term
185    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
186                        spaceShape.size(),                   // Hilbert space rank (number of modes)
187                        spaceShape.data(),                   // Hilbert space shape (mode extents)
188                        &noiseTerm));                        // the created empty operator term
189    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
190    for (int32_t i = 0; i < spaceShape.size(); ++i) {
191      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
192                          noiseTerm,
193                          1,                                                              // number of elementary tensor operators in the product
194                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
195                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
196                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
197                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
198                          cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
199                          cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
200    }
201
202    // Construct the full Liouvillian operator as a sum of the operator terms
203    //  Create an empty operator (super-operator)
204    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
205                        spaceShape.size(),                // Hilbert space rank (number of modes)
206                        spaceShape.data(),                // Hilbert space shape (modes extents)
207                        &liouvillian));                   // the created empty operator (super-operator)
208    //  Append an operator term to the operator (super-operator)
209    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
210                        liouvillian,
211                        oneBodyTerm,                      // appended operator term
212                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
213                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
214                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
215                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
216    //  Append an operator term to the operator (super-operator)
217    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
218                        liouvillian,
219                        twoBodyTerm,                     // appended operator term
220                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
221                        make_cuDoubleComplex(0.0, -1.0), // -i constant
222                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
223                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator term as a whole
224    //  Append an operator term to the operator (super-operator)
225    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
226                        liouvillian,
227                        oneBodyTerm,                      // appended operator term
228                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
229                        make_cuDoubleComplex(0.0, 1.0),   // i constant
230                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
231                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
232    //  Append an operator term to the operator (super-operator)
233    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
234                        liouvillian,
235                        twoBodyTerm,                     // appended operator term
236                        1,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
237                        make_cuDoubleComplex(0.0, 1.0),  // i constant
238                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
239                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator term as a whole
240    //  Append an operator term to the operator (super-operator)
241    const double d = 0.42; // assign some value to the time-independent coefficient
242    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
243                        liouvillian,
244                        noiseTerm,                        // appended operator term
245                        0,                                // operator term action duality as a whole (no duality reversing in this case)
246                        make_cuDoubleComplex(d, 0.0),     // constant coefficient associated with the operator term as a whole
247                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
248                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
249  }
250
251  // Destructor destructs the user-defined Liouvillian operator
252  ~UserDefinedLiouvillian()
253  {
254    // Destroy the Liouvillian operator
255    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
256
257    // Destroy operator terms
258    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
259    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
260    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
261
262    // Destroy elementary tensor operators
263    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
264    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
265    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
266
267    // Destroy operator tensors
268    destroyArrayGPU(spinYYelems);
269    destroyArrayGPU(spinZZelems);
270    destroyArrayGPU(spinXelems);
271  }
272
273  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
274  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
275  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
276  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
277  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
278
279  /** Returns the number of externally provided Hamiltonian parameters. */
280  int32_t getNumParameters() const
281  {
282    return 1; // one parameter Omega
283  }
284
285  /** Get access to the constructed Liouvillian operator. */
286  cudensitymatOperator_t & get()
287  {
288    return liouvillian;
289  }
290
291};

Now we can use this parameterized quantum many-body operator in our main code to compute the action of the operator on a mixed quantum state.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14#include <cmath>
 15#include <complex>
 16#include <vector>
 17#include <chrono>
 18#include <iostream>
 19#include <cassert>
 20
 21
 22// Number of times to perform operator action on a quantum state
 23constexpr int NUM_REPEATS = 2;
 24
 25// Logging verbosity
 26bool verbose = true;
 27
 28
 29// Example workflow
 30void exampleWorkflow(cudensitymatHandle_t handle)
 31{
 32  // Define the composite Hilbert space shape and
 33  // quantum state batch size (number of individual quantum states in a batched simulation)
 34  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 35  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 36
 37  if (verbose) {
 38    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 39    for (const auto & dimsn: spaceShape)
 40      std::cout << dimsn << ",";
 41    std::cout << ")" << std::endl;
 42    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 43  }
 44
 45  // Construct a user-defined Liouvillian operator using a convenience C++ class
 46  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 47  if (verbose)
 48    std::cout << "Constructed the Liouvillian operator\n";
 49
 50  // Set and place external user-provided Hamiltonian parameters in GPU memory
 51  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 52  std::vector<double> cpuHamParams(numParams * batchSize);
 53  for (int64_t j = 0; j < batchSize; ++j) {
 54    for (int32_t i = 0; i < numParams; ++i) {
 55      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 56    }
 57  }
 58  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 59
 60  // Declare the input quantum state
 61  cudensitymatState_t inputState;
 62  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 63                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 64                      spaceShape.size(),
 65                      spaceShape.data(),
 66                      batchSize,
 67                      dataType,
 68                      &inputState));
 69
 70  // Query the size of the quantum state storage
 71  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 72  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 73                      inputState,
 74                      1,               // only one storage component (tensor)
 75                      &storageSize));  // storage size in bytes
 76  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 77  if (verbose)
 78    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 79
 80  // Prepare some initial value for the input quantum state batch
 81  std::vector<NumericalType> inputStateValue(stateVolume);
 82  if constexpr (std::is_same_v<NumericalType, float>) {
 83    for (std::size_t i = 0; i < stateVolume; ++i) {
 84      inputStateValue[i] = 1.0f / float(i+1); // just some value
 85    }
 86  } else if constexpr (std::is_same_v<NumericalType, double>) {
 87    for (std::size_t i = 0; i < stateVolume; ++i) {
 88      inputStateValue[i] = 1.0 / double(i+1); // just some value
 89    }
 90  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
 91    for (std::size_t i = 0; i < stateVolume; ++i) {
 92      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
 93    }
 94  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
 95    for (std::size_t i = 0; i < stateVolume; ++i) {
 96      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
 97    }
 98  } else {
 99    std::cerr << "Error: Unsupported data type!\n";
100    std::exit(1);
101  }
102  // Allocate initialized GPU storage for the input quantum state with prepared values
103  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
104  if (verbose)
105    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
106
107  // Attach initialized GPU storage to the input quantum state
108  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109                      inputState,
110                      1,                                                 // only one storage component (tensor)
111                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
112                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
113  if (verbose)
114    std::cout << "Constructed input quantum state\n";
115
116  // Declare the output quantum state of the same shape
117  cudensitymatState_t outputState;
118  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
119                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
120                      spaceShape.size(),
121                      spaceShape.data(),
122                      batchSize,
123                      dataType,
124                      &outputState));
125
126  // Allocate initialized GPU storage for the output quantum state
127  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
128  if (verbose)
129    std::cout << "Allocated output quantum state storage\n";
130
131  // Attach GPU storage to the output quantum state
132  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
133                      outputState,
134                      1,                                                 // only one storage component (tensor)
135                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
136                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
137  if (verbose)
138    std::cout << "Constructed output quantum state\n";
139
140  // Declare a workspace descriptor
141  cudensitymatWorkspaceDescriptor_t workspaceDescr;
142  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
143
144  // Query free GPU memory
145  std::size_t freeMem = 0, totalMem = 0;
146  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
147  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
148  if (verbose)
149    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
150
151  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
152  const auto startTime = std::chrono::high_resolution_clock::now();
153  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
154                      liouvillian.get(),
155                      inputState,
156                      outputState,
157                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
158                      freeMem,                   // max available GPU free memory for the workspace
159                      workspaceDescr,            // workspace descriptor
160                      0x0));                     // default CUDA stream
161  const auto finishTime = std::chrono::high_resolution_clock::now();
162  const std::chrono::duration<double> timeSec = finishTime - startTime;
163  if (verbose)
164    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
165
166  // Query the required workspace buffer size (bytes)
167  std::size_t requiredBufferSize {0};
168  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
169                      workspaceDescr,
170                      CUDENSITYMAT_MEMSPACE_DEVICE,
171                      CUDENSITYMAT_WORKSPACE_SCRATCH,
172                      &requiredBufferSize));
173  if (verbose)
174    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
175
176  // Allocate GPU storage for the workspace buffer
177  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
178  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
179  if (verbose)
180    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
181
182  // Attach the workspace buffer to the workspace descriptor
183  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
184                      workspaceDescr,
185                      CUDENSITYMAT_MEMSPACE_DEVICE,
186                      CUDENSITYMAT_WORKSPACE_SCRATCH,
187                      workspaceBuffer,
188                      requiredBufferSize));
189  if (verbose)
190    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
191
192  // Apply the Liouvillian operator to the input quatum state
193  // and accumulate its action into the output quantum state (note accumulative += semantics)
194  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
195    // Zero out the output quantum state
196    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
197                        outputState,
198                        0x0));
199    if (verbose)
200      std::cout << "Initialized the output quantum state to zero\n";
201    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
202    const auto startTime = std::chrono::high_resolution_clock::now();
203    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
204                        liouvillian.get(),
205                        0.3,                                   // time point (some value)
206                        batchSize,                             // user-defined batch size
207                        numParams,                             // number of external user-defined Hamiltonian parameters
208                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
209                        inputState,                            // input quantum state
210                        outputState,                           // output quantum state
211                        workspaceDescr,                        // workspace descriptor
212                        0x0));                                 // default CUDA stream
213    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
214    const auto finishTime = std::chrono::high_resolution_clock::now();
215    const std::chrono::duration<double> timeSec = finishTime - startTime;
216    if (verbose)
217      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
218  }
219
220  // Compute the squared norm of the output quantum state
221  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
222  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
223                      outputState,
224                      norm2,
225                      0x0));
226  if (verbose) {
227    std::cout << "Computed the output quantum state norm:\n";
228    printArrayGPU<double>(norm2, batchSize);
229  }
230
231  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
232
233  // Destroy the norm2 array
234  destroyArrayGPU(norm2);
235
236  // Destroy workspace descriptor
237  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
238
239  // Destroy workspace buffer storage
240  destroyArrayGPU(workspaceBuffer);
241
242  // Destroy quantum states
243  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
244  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
245
246  // Destroy quantum state storage
247  destroyArrayGPU(outputStateElems);
248  destroyArrayGPU(inputStateElems);
249
250  // Destroy external Hamiltonian parameters
251  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
252
253  if (verbose)
254    std::cout << "Destroyed resources\n" << std::flush;
255}
256
257
258int main(int argc, char ** argv)
259{
260  // Assign a GPU to the process
261  HANDLE_CUDA_ERROR(cudaSetDevice(0));
262  if (verbose)
263    std::cout << "Set active device\n";
264
265  // Create a library handle
266  cudensitymatHandle_t handle;
267  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
268  if (verbose)
269    std::cout << "Created a library handle\n";
270
271  // Run the example
272  exampleWorkflow(handle);
273
274  // Destroy the library handle
275  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276  if (verbose)
277    std::cout << "Destroyed the library handle\n";
278
279  HANDLE_CUDA_ERROR(cudaDeviceReset());
280
281  // Done
282  return 0;
283}

Code example (parallel execution on multiple GPUs)

It is straightforward to adapt the main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.

The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).

Here is the updated main code for multi-GPU runs.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// MPI library (optional)
 16#ifdef MPI_ENABLED
 17#include <mpi.h>
 18#endif
 19
 20#include <cmath>
 21#include <complex>
 22#include <vector>
 23#include <chrono>
 24#include <iostream>
 25#include <cassert>
 26
 27
 28// Number of times to perform operator action on a quantum state
 29constexpr int NUM_REPEATS = 2;
 30
 31// Logging verbosity
 32bool verbose = true;
 33
 34
 35// Example workflow
 36void exampleWorkflow(cudensitymatHandle_t handle)
 37{
 38  // Define the composite Hilbert space shape and
 39  // quantum state batch size (number of individual quantum states in a batched simulation)
 40  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 41  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 42
 43  if (verbose) {
 44    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 45    for (const auto & dimsn: spaceShape)
 46      std::cout << dimsn << ",";
 47    std::cout << ")" << std::endl;
 48    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 49  }
 50
 51  // Construct a user-defined Liouvillian operator using a convenience C++ class
 52  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 53  if (verbose)
 54    std::cout << "Constructed the Liouvillian operator\n";
 55
 56  // Set and place external user-provided Hamiltonian parameters in GPU memory
 57  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 58  std::vector<double> cpuHamParams(numParams * batchSize);
 59  for (int64_t j = 0; j < batchSize; ++j) {
 60    for (int32_t i = 0; i < numParams; ++i) {
 61      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 62    }
 63  }
 64  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 65
 66  // Declare the input quantum state
 67  cudensitymatState_t inputState;
 68  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 69                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 70                      spaceShape.size(),
 71                      spaceShape.data(),
 72                      batchSize,
 73                      dataType,
 74                      &inputState));
 75
 76  // Query the size of the quantum state storage
 77  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 78  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 79                      inputState,
 80                      1,               // only one storage component (tensor)
 81                      &storageSize));  // storage size in bytes
 82  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 83  if (verbose)
 84    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 85
 86  // Prepare some initial value for the input quantum state batch
 87  std::vector<NumericalType> inputStateValue(stateVolume);
 88  if constexpr (std::is_same_v<NumericalType, float>) {
 89    for (std::size_t i = 0; i < stateVolume; ++i) {
 90      inputStateValue[i] = 1.0f / float(i+1); // just some value
 91    }
 92  } else if constexpr (std::is_same_v<NumericalType, double>) {
 93    for (std::size_t i = 0; i < stateVolume; ++i) {
 94      inputStateValue[i] = 1.0 / double(i+1); // just some value
 95    }
 96  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
 97    for (std::size_t i = 0; i < stateVolume; ++i) {
 98      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
 99    }
100  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
101    for (std::size_t i = 0; i < stateVolume; ++i) {
102      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
103    }
104  } else {
105    std::cerr << "Error: Unsupported data type!\n";
106    std::exit(1);
107  }
108  // Allocate initialized GPU storage for the input quantum state with prepared values
109  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
110  if (verbose)
111    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
112
113  // Attach initialized GPU storage to the input quantum state
114  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
115                      inputState,
116                      1,                                                 // only one storage component (tensor)
117                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
118                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
119  if (verbose)
120    std::cout << "Constructed input quantum state\n";
121
122  // Declare the output quantum state of the same shape
123  cudensitymatState_t outputState;
124  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
125                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
126                      spaceShape.size(),
127                      spaceShape.data(),
128                      batchSize,
129                      dataType,
130                      &outputState));
131
132  // Allocate initialized GPU storage for the output quantum state
133  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
134  if (verbose)
135    std::cout << "Allocated output quantum state storage\n";
136
137  // Attach GPU storage to the output quantum state
138  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
139                      outputState,
140                      1,                                                 // only one storage component (tensor)
141                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
142                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
143  if (verbose)
144    std::cout << "Constructed output quantum state\n";
145
146  // Declare a workspace descriptor
147  cudensitymatWorkspaceDescriptor_t workspaceDescr;
148  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
149
150  // Query free GPU memory
151  std::size_t freeMem = 0, totalMem = 0;
152  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
153  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
154  if (verbose)
155    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
156
157  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
158  const auto startTime = std::chrono::high_resolution_clock::now();
159  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
160                      liouvillian.get(),
161                      inputState,
162                      outputState,
163                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
164                      freeMem,                   // max available GPU free memory for the workspace
165                      workspaceDescr,            // workspace descriptor
166                      0x0));                     // default CUDA stream
167  const auto finishTime = std::chrono::high_resolution_clock::now();
168  const std::chrono::duration<double> timeSec = finishTime - startTime;
169  if (verbose)
170    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
171
172  // Query the required workspace buffer size (bytes)
173  std::size_t requiredBufferSize {0};
174  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
175                      workspaceDescr,
176                      CUDENSITYMAT_MEMSPACE_DEVICE,
177                      CUDENSITYMAT_WORKSPACE_SCRATCH,
178                      &requiredBufferSize));
179  if (verbose)
180    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
181
182  // Allocate GPU storage for the workspace buffer
183  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
184  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
185  if (verbose)
186    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
187
188  // Attach the workspace buffer to the workspace descriptor
189  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
190                      workspaceDescr,
191                      CUDENSITYMAT_MEMSPACE_DEVICE,
192                      CUDENSITYMAT_WORKSPACE_SCRATCH,
193                      workspaceBuffer,
194                      requiredBufferSize));
195  if (verbose)
196    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
197
198  // Apply the Liouvillian operator to the input quatum state
199  // and accumulate its action into the output quantum state (note accumulative += semantics)
200  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
201    // Zero out the output quantum state
202    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
203                        outputState,
204                        0x0));
205    if (verbose)
206      std::cout << "Initialized the output quantum state to zero\n";
207    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
208    const auto startTime = std::chrono::high_resolution_clock::now();
209    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
210                        liouvillian.get(),
211                        0.3,                                   // time point (some value)
212                        batchSize,                             // user-defined batch size
213                        numParams,                             // number of external user-defined Hamiltonian parameters
214                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
215                        inputState,                            // input quantum state
216                        outputState,                           // output quantum state
217                        workspaceDescr,                        // workspace descriptor
218                        0x0));                                 // default CUDA stream
219    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
220    const auto finishTime = std::chrono::high_resolution_clock::now();
221    const std::chrono::duration<double> timeSec = finishTime - startTime;
222    if (verbose)
223      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
224  }
225
226  // Compute the squared norm of the output quantum state
227  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
228  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
229                      outputState,
230                      norm2,
231                      0x0));
232  if (verbose) {
233    std::cout << "Computed the output quantum state norm:\n";
234    printArrayGPU<double>(norm2, batchSize);
235  }
236
237  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
238
239  // Destroy the norm2 array
240  destroyArrayGPU(norm2);
241
242  // Destroy workspace descriptor
243  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
244
245  // Destroy workspace buffer storage
246  destroyArrayGPU(workspaceBuffer);
247
248  // Destroy quantum states
249  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
250  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
251
252  // Destroy quantum state storage
253  destroyArrayGPU(outputStateElems);
254  destroyArrayGPU(inputStateElems);
255
256  // Destroy external Hamiltonian parameters
257  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
258
259  if (verbose)
260    std::cout << "Destroyed resources\n" << std::flush;
261}
262
263
264int main(int argc, char ** argv)
265{
266  // Initialize MPI library (if needed)
267#ifdef MPI_ENABLED
268  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
269  int procRank {-1};
270  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
271  int numProcs {0};
272  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
273  if (procRank != 0) verbose = false;
274  if (verbose)
275    std::cout << "Initialized MPI library\n";
276#else
277  const int procRank {0};
278  const int numProcs {1};
279#endif
280
281  // Assign a GPU to the process
282  int numDevices {0};
283  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
284  const int deviceId = procRank % numDevices;
285  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
286  if (verbose)
287    std::cout << "Set active device\n";
288
289  // Create a library handle
290  cudensitymatHandle_t handle;
291  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
292  if (verbose)
293    std::cout << "Created a library handle\n";
294
295  // Reset distributed configuration (once)
296#ifdef MPI_ENABLED
297  MPI_Comm comm;
298  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
299  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
300                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
301                      &comm, sizeof(comm)));
302#endif
303
304  // Run the example
305  exampleWorkflow(handle);
306
307  // Synchronize MPI processes
308#ifdef MPI_ENABLED
309  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
310#endif
311
312  // Destroy the library handle
313  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
314  if (verbose)
315    std::cout << "Destroyed the library handle\n";
316
317  HANDLE_CUDA_ERROR(cudaDeviceReset());
318
319  // Finalize the MPI library
320#ifdef MPI_ENABLED
321  HANDLE_MPI_ERROR(MPI_Finalize());
322  if (verbose)
323    std::cout << "Finalized MPI library\n";
324#endif
325
326  // Done
327  return 0;
328}

Code example (serial execution with backward differentiation)

The following code example illustrates how to use the cuDensityMat library to not only compute the action of a quantum many-body operator on a quantum state, but also backward-differentiate it (compute gradients) with respect to user-provided real parameters parameterizing the operator (one real parameter Omega in this example). The full sample code can be found in the NVIDIA/cuQuantum repository (main serial gradient code and operator definition for gradient as well as the utility code).

First let’s construct a specific quantum many-body operator which, in this case, is a slightly modified version of the quantum many-body operator used in main serial code. Here we make both the h(t) and f(t) scalar coefficients depend on time and a single user-provided real parameter Omega via different (made-up) functional forms. In order to backward-differentiate the operator action with respect to this single user-provided real parameter Omega, we need to manually define a gradient callback function for each regular callback function we have (for h(t) and f(t) in this example). In our example, we define two CPU-side scalar gradient callback functions which compute the vector-jacobian product (VJP) of the scalar adjoint of h(t) and f(t) with respect to the user-provided real parameter Omega, respectively. A gradient callback function is expected to accumulate the VJP result into the paramsGrad output array, the final value of which will be the gradient(s) of the user-defined cost function with respect to the user-provided real parameters parameterizing the operator. As before, all regular and gradient callback functions used in our example explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers). Updating the data type of the operator and quantum states would require updating the callback functions accordingly (CUDA_C_32F).

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H = sum_{i} {h_i(t) * X_i}             // transverse field sum of X_i operators with time-dependent h_i(t) coefficients 
 22      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}     // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27/** Define the numerical type and data type for the GPU computations (same) */
 28using NumericalType = std::complex<double>;  // do not change
 29constexpr cudaDataType_t dataType = CUDA_C_64F;        // do not change
 30
 31
 32/** Example of a user-provided scalar CPU callback C function
 33 *  defining a time-dependent coefficient h_i(t) inside the Hamiltonian:
 34 *  h_i(t) = exp(-Omega * t)
 35 */
 36extern "C"
 37int32_t hCoefComplex64(
 38  double time,             //in: time point
 39  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 40  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 41  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 42  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 43  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
 44  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 45{
 46  if (dataType == CUDA_C_64F) {
 47    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 48    for (int64_t i = 0; i < batchSize; ++i) {
 49      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 50      tdCoef[i] = make_cuDoubleComplex(std::exp((-omega) * time), 0.0); // value of the i-th instance of the coefficients batch
 51    }
 52  } else {
 53    return 1; // error code (1: Error)
 54  }
 55  return 0; // error code (0: Success)
 56}
 57
 58
 59/** User-provided gradient callback function for the user-provided
 60 *  scalar callback function with respect to its single parameter Omega.
 61 *  It accumulates a partial derivative 2*Re(dCost/dOmega) = 2*Re(dCost/dCoef * dCoef/dOmega),
 62 *  where:
 63 *  - Cost is some user-defined real scalar cost function,
 64 *  - dCost/dCoef is the adjoint of the cost function with respect to the coefficient (or their batch) associated with the callback function,
 65 *  - dCoef/dOmega is the gradient of the coefficient (or their batch) with respect to the parameter Omega:
 66 *    dCoef/dOmega = -time * exp(-Omega * time)
 67 */
 68extern "C"
 69int32_t hCoefGradComplex64(
 70  double time,             //in: time point
 71  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 72  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 73  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 74  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 75  void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
 76  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
 77  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 78{
 79  if (dataType == CUDA_C_64F) {
 80    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 81    for (int64_t i = 0; i < batchSize; ++i) {
 82      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 83      paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
 84        2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(std::exp((-omega) * time) * (-time), 0.0)));
 85    }
 86  } else {
 87    return 1; // error code (1: Error)
 88  }
 89  return 0; // error code (0: Success)
 90}
 91
 92
 93/** Example of a user-provided scalar CPU callback C function
 94 *  defining a time-dependent coefficient f(t) inside the Hamiltonian:
 95 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 96 */
 97extern "C"
 98int32_t fCoefComplex64(
 99  double time,             //in: time point
100  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
101  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
102  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
103  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
104  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
105  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
106{
107  if (dataType == CUDA_C_64F) {
108    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
109    for (int64_t i = 0; i < batchSize; ++i) {
110      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
111      tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
112    }
113  } else {
114    return 1; // error code (1: Error)
115  }
116  return 0; // error code (0: Success)
117}
118
119
120/** User-provided gradient callback function for the user-provided
121 *  scalar callback function with respect to its single parameter Omega.
122 *  It accumulates a partial derivative 2*Re(dCost/dOmega) = 2*Re(dCost/dCoef * dCoef/dOmega),
123 *  where:
124 *  - Cost is some user-defined real scalar cost function,
125 *  - dCost/dCoef is the adjoint of the cost function with respect to the coefficient associated with the callback function,
126 *  - dCoef/dOmega is the gradient of the coefficient with respect to the parameter Omega:
127 *    dCoef/dOmega = -i * time * exp(i * Omega * time)
128 *                 = -i * time * (cos(Omega * time) + i * sin(Omega * time)) =
129 *                 = -time * sin(Omega * time) + i * time * cos(Omega * time)
130 */
131extern "C"
132int32_t fCoefGradComplex64(
133  double time,             //in: time point
134  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
135  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
136  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
137  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
138  void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
139  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
140  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
141{
142  if (dataType == CUDA_C_64F) {
143    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
144    for (int64_t i = 0; i < batchSize; ++i) {
145      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
146      paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
147        2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(-std::sin(omega * time) * time, std::cos(omega * time) * time)));
148    }
149  } else {
150    return 1; // error code (1: Error)
151  }
152  return 0; // error code (0: Success)
153}
154
155
156/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
157 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
158 *  - Method `get()` returns a reference to the constructed Liouvillian operator
159 *  - Destructor releases all resources used by the Liouvillian operator
160 */
161class UserDefinedLiouvillian final
162{
163private:
164  // Data members
165  cudensitymatHandle_t handle;             // library context handle
166  int64_t stateBatchSize;                  // quantum state batch size
167  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
168  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
169  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
170  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
171  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
172  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
173  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
174  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i(t) * X_i} (one-body term)
175  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
176  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix (two-body mixed term)
177  cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)
178
179public:
180
181  // Constructor constructs a user-defined Liouvillian operator
182  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
183                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
184                         int64_t batchSize):                             // batch size for the quantum state
185    handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
186  {
187    // Define the necessary operator tensors in GPU memory (F-order storage!)
188    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
189                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
190                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
191
192    spinYYelems = createInitializeArrayGPU<NumericalType>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
193                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
194                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
195                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
196                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
197
198    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
199                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
200                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
201                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
202                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
203
204    // Construct the necessary Elementary Tensor Operators
205    //  X_i operator
206    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
207                        1,                                   // one-body operator
208                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
209                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
210                        0,                                   // 0 for dense tensors
211                        nullptr,                             // nullptr for dense tensors
212                        dataType,                            // data type
213                        spinXelems,                          // tensor elements in GPU memory
214                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
215                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
216                        &spinX));                            // the created elementary tensor operator
217    //  ZZ_ij = Z_i * Z_j fused operator
218    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
219                        2,                                   // two-body operator
220                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
221                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
222                        0,                                   // 0 for dense tensors
223                        nullptr,                             // nullptr for dense tensors
224                        dataType,                            // data type
225                        spinZZelems,                         // tensor elements in GPU memory
226                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
227                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
228                        &spinZZ));                           // the created elementary tensor operator
229    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
230    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
231                        2,                                   // two-body operator
232                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
233                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
234                        0,                                   // 0 for dense tensors
235                        nullptr,                             // nullptr for dense tensors
236                        dataType,                            // data type
237                        spinYYelems,                         // tensor elements in GPU memory
238                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
239                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
240                        &spinYY));                           // the created elementary tensor operator
241
242    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
243    //  Create an empty operator term
244    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
245                        spaceShape.size(),                   // Hilbert space rank (number of modes)
246                        spaceShape.data(),                   // Hilbert space shape (mode extents)
247                        &oneBodyTerm));                      // the created empty operator term
248    //  Define the operator term: H1 = sum_{i} {h_i(t) * X_i}
249    for (int32_t i = 0; i < spaceShape.size(); ++i) {
250      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
251                          oneBodyTerm,
252                          1,                                                             // number of elementary tensor operators in the product
253                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
254                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
255                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
256                          make_cuDoubleComplex(1.0, 0.0),                                // static coefficient part: Always 64-bit-precision complex number
257                          {hCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr},   // CPU scalar callback function defining the time-dependent coefficient associated with this operator product
258                          {hCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
259                           CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
260    }
261    //  Create an empty operator term
262    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
263                        spaceShape.size(),                   // Hilbert space rank (number of modes)
264                        spaceShape.data(),                   // Hilbert space shape (mode extents)
265                        &twoBodyTerm));                      // the created empty operator term
266    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
267    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
268      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
269        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
270        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
271                            twoBodyTerm,
272                            1,                                                              // number of elementary tensor operators in the product
273                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
274                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
275                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
276                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij static coefficient: Always 64-bit-precision complex number
277                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
278                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
279      }
280    }
281    //  Create an empty operator term
282    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
283                        spaceShape.size(),                   // Hilbert space rank (number of modes)
284                        spaceShape.data(),                   // Hilbert space shape (mode extents)
285                        &noiseTerm));                        // the created empty operator term
286    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
287    for (int32_t i = 0; i < spaceShape.size(); ++i) {
288      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
289                          noiseTerm,
290                          1,                                                              // number of elementary tensor operators in the product
291                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
292                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
293                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
294                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
295                          cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
296                          cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
297    }
298
299    // Construct the full Liouvillian operator as a sum of the operator terms
300    //  Create an empty operator (super-operator)
301    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
302                        spaceShape.size(),                // Hilbert space rank (number of modes)
303                        spaceShape.data(),                // Hilbert space shape (modes extents)
304                        &liouvillian));                   // the created empty operator (super-operator)
305    //  Append an operator term to the operator (super-operator)
306    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
307                        liouvillian,
308                        oneBodyTerm,                      // appended operator term
309                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
310                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
311                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
312                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
313    //  Append an operator term to the operator (super-operator)
314    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
315                        liouvillian,
316                        twoBodyTerm,                      // appended operator term
317                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
318                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
319                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
320                        {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
321                         CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
322    //  Append an operator term to the operator (super-operator)
323    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
324                        liouvillian,
325                        oneBodyTerm,                      // appended operator term
326                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
327                        make_cuDoubleComplex(0.0, 1.0),   // +i constant
328                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
329                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
330    //  Append an operator term to the operator (super-operator)
331    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
332                        liouvillian,
333                        twoBodyTerm,                      // appended operator term
334                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
335                        make_cuDoubleComplex(0.0, 1.0),   // +i constant
336                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
337                        {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
338                         CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
339    //  Append an operator term to the operator (super-operator)
340    const double d = 1.0; // assign some value to the time-independent coefficient
341    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
342                        liouvillian,
343                        noiseTerm,                        // appended operator term
344                        0,                                // operator term action duality as a whole (no duality reversing in this case)
345                        make_cuDoubleComplex(d, 0.0),     // static coefficient associated with the operator term as a whole
346                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
347                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
348  }
349
350  // Destructor destructs the user-defined Liouvillian operator
351  ~UserDefinedLiouvillian()
352  {
353    // Destroy the Liouvillian operator
354    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
355
356    // Destroy operator terms
357    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
358    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
359    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
360
361    // Destroy elementary tensor operators
362    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
363    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
364    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
365
366    // Destroy operator tensors
367    destroyArrayGPU(spinYYelems);
368    destroyArrayGPU(spinZZelems);
369    destroyArrayGPU(spinXelems);
370  }
371
372  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
373  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
374  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
375  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
376  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
377
378  /** Returns the number of externally provided Hamiltonian parameters. */
379  int32_t getNumParameters() const
380  {
381    return 1; // one parameter Omega
382  }
383
384  /** Get access to the constructed Liouvillian operator. */
385  cudensitymatOperator_t & get()
386  {
387    return liouvillian;
388  }
389
390};

Now we can use the defined quantum many-body operator in our main code to compute its action on a mixed quantum state and then backward-differentiate it (compute gradients) with respect to the user-provided real parameter Omega. For the sake of simplicity, we pass a made-up adjoint of the output quantum state to the cudensitymatOperatorComputeActionBackwardDiff() call, which is just the output quantum state itself (in real scenarios, the adjoint of the output quantum state will depend on the user-chosen cost function and will be provided by the user). Upon completion of the cudensitymatOperatorComputeActionBackwardDiff() call, the paramsGrad output argument will contain the gradient of the user-defined cost function with respect to the user-provided real parameter Omega. Additionally, the backward-differentiation API call will also return the adjoint of the input quantum state for cases where the input quantum state implicitly depends on the user-provided real parameters (for example, cases where the input quantum state comes from a previous operator action step, which is typical for time-integration of quantum dynamics master equations). Note that both output arguments, namely paramsGrad and stateInAdj, are accumulative, i.e., they will be accumulated into (it is user’s responsibility to zero them out before the first call).

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy_grad.h" // user-defined Liouvillian operator example
 13
 14#include <cmath>
 15#include <complex>
 16#include <vector>
 17#include <chrono>
 18#include <iostream>
 19#include <cassert>
 20
 21
 22// Number of times to perform operator action on a quantum state
 23constexpr int NUM_REPEATS = 2;
 24
 25// Logging verbosity
 26bool verbose = true;
 27
 28
 29// Example workflow
 30void exampleWorkflow(cudensitymatHandle_t handle)
 31{
 32  // Define the composite Hilbert space shape and
 33  // quantum state batch size (number of individual quantum states in a batched simulation)
 34  const std::vector<int64_t> spaceShape({2,2,2,2}); // dimensions of quantum degrees of freedom
 35  const int64_t batchSize = 1;                      // number of quantum states per batch
 36
 37  if (verbose) {
 38    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 39    for (const auto & dimsn: spaceShape)
 40      std::cout << dimsn << ",";
 41    std::cout << ")" << std::endl;
 42    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 43  }
 44
 45  // Construct a user-defined Liouvillian operator using a convenience C++ class
 46  // Note that the constructed Liouvillian operator has some batched coefficients
 47  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 48  if (verbose)
 49    std::cout << "Constructed the Liouvillian operator\n";
 50
 51  // Set and place external user-provided Hamiltonian parameters in GPU memory
 52  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 53  if (verbose)
 54    std::cout << "Number of external user-provided Hamiltonian parameters = " << numParams << std::endl;
 55  std::vector<double> cpuHamParams(numParams * batchSize);
 56  for (int64_t j = 0; j < batchSize; ++j) {
 57    for (int32_t i = 0; i < numParams; ++i) {
 58      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 59    }
 60  }
 61  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 62  if (verbose)
 63    std::cout << "Created an array of external user-provided Hamiltonian parameters in GPU memory\n";
 64
 65  // Create an array of gradients for the user-provided Hamiltonian parameters in GPU memory
 66  std::vector<double> cpuHamParamsGrad(numParams * batchSize, 0.0);
 67  auto * hamiltonianParamsGrad = static_cast<double *>(createInitializeArrayGPU(cpuHamParamsGrad));
 68  if (verbose)
 69    std::cout << "Created an array of gradients for the external user-provided Hamiltonian parameters in GPU memory\n";
 70
 71  // Declare the input quantum state
 72  cudensitymatState_t inputState;
 73  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 74                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 75                      spaceShape.size(),
 76                      spaceShape.data(),
 77                      batchSize,
 78                      dataType,
 79                      &inputState));
 80
 81  // Query the size of the quantum state storage
 82  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 83  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 84                      inputState,
 85                      1,               // only one storage component (tensor)
 86                      &storageSize));  // storage size in bytes
 87  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 88  if (verbose)
 89    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 90
 91  // Prepare some initial value for the input quantum state batch
 92  std::vector<NumericalType> inputStateValue(stateVolume);
 93  if constexpr (std::is_same_v<NumericalType, float>) {
 94    for (std::size_t i = 0; i < stateVolume; ++i) {
 95      inputStateValue[i] = 1.0f / float(i+1); // just some value
 96    }
 97  } else if constexpr (std::is_same_v<NumericalType, double>) {
 98    for (std::size_t i = 0; i < stateVolume; ++i) {
 99      inputStateValue[i] = 1.0 / double(i+1); // just some value
100    }
101  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
102    for (std::size_t i = 0; i < stateVolume; ++i) {
103      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
104    }
105  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
106    for (std::size_t i = 0; i < stateVolume; ++i) {
107      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
108    }
109  } else {
110    std::cerr << "Error: Unsupported data type!\n";
111    std::exit(1);
112  }
113  // Allocate initialized GPU storage for the input quantum state with prepared values
114  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
115  if (verbose)
116    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
117
118  // Attach initialized GPU storage to the input quantum state
119  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
120                      inputState,
121                      1,                                                 // only one storage component (tensor)
122                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
123                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
124  if (verbose)
125    std::cout << "Constructed input quantum state\n";
126
127  // Declare the output quantum state of the same shape
128  cudensitymatState_t outputState;
129  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
130                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
131                      spaceShape.size(),
132                      spaceShape.data(),
133                      batchSize,
134                      dataType,
135                      &outputState));
136
137  // Allocate GPU storage for the output quantum state
138  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
139  if (verbose)
140    std::cout << "Allocated output quantum state storage\n";
141
142  // Attach GPU storage to the output quantum state
143  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
144                      outputState,
145                      1,                                                 // only one storage component (tensor)
146                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
147                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
148  if (verbose)
149    std::cout << "Constructed output quantum state\n";
150
151  // Declare the adjoint input quantum state of the same shape
152  cudensitymatState_t inputStateAdj;
153  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
154                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
155                      spaceShape.size(),
156                      spaceShape.data(),
157                      batchSize,
158                      dataType,  // data type must match that of the operators created above
159                      &inputStateAdj));
160
161  // Allocate GPU storage for the adjoint input quantum state
162  auto * inputStateAdjElems = createArrayGPU<NumericalType>(stateVolume);
163  if (verbose)
164    std::cout << "Allocated adjoint input quantum state storage\n";
165
166  // Attach GPU storage to the adjoint input quantum state
167  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
168                      inputStateAdj,
169                      1,                                                 // only one storage component (tensor)
170                      std::vector<void*>({inputStateAdjElems}).data(),   // pointer to the GPU storage for the quantum state
171                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
172  if (verbose)
173    std::cout << "Constructed adjoint input quantum state\n";
174
175  // Declare a workspace descriptor
176  cudensitymatWorkspaceDescriptor_t workspaceDescr;
177  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
178
179  // Query free GPU memory
180  std::size_t freeMem = 0, totalMem = 0;
181  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
182  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.45); // take 45% of the free memory for the workspace buffer
183  if (verbose)
184    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
185
186  // Allocate GPU storage for the workspace buffer
187  const std::size_t bufferVolume = freeMem / sizeof(NumericalType);
188  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
189  if (verbose)
190    std::cout << "Allocated workspace buffer of size (bytes) = " << freeMem << std::endl;
191
192  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
193  auto startTime = std::chrono::high_resolution_clock::now();
194  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
195                      liouvillian.get(),
196                      inputState,
197                      outputState,
198                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
199                      freeMem,                   // max available GPU free memory for the workspace
200                      workspaceDescr,            // workspace descriptor
201                      0x0));                     // default CUDA stream
202  auto finishTime = std::chrono::high_resolution_clock::now();
203  std::chrono::duration<double> timeSec = finishTime - startTime;
204  if (verbose)
205    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
206
207  // Query the required workspace buffer size (bytes)
208  std::size_t requiredBufferSize {0};
209  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
210                      workspaceDescr,
211                      CUDENSITYMAT_MEMSPACE_DEVICE,
212                      CUDENSITYMAT_WORKSPACE_SCRATCH,
213                      &requiredBufferSize));
214  if (verbose)
215    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
216
217  if (requiredBufferSize > freeMem) {
218    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
219    std::exit(1);
220  }
221
222  // Attach the workspace buffer to the workspace descriptor
223  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
224                      workspaceDescr,
225                      CUDENSITYMAT_MEMSPACE_DEVICE,
226                      CUDENSITYMAT_WORKSPACE_SCRATCH,
227                      workspaceBuffer,
228                      requiredBufferSize));
229  if (verbose)
230    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
231
232  // Apply the Liouvillian operator to the input quatum state
233  // and accumulate its action into the output quantum state (note the accumulative += semantics)
234  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
235    // Zero out the output quantum state
236    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
237                        outputState,
238                        0x0));
239    if (verbose)
240      std::cout << "Initialized the output quantum state to zero\n";
241    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
242    startTime = std::chrono::high_resolution_clock::now();
243    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
244                        liouvillian.get(),
245                        0.3,                // time point (some value)
246                        batchSize,          // user-defined batch size
247                        numParams,          // number of external user-defined Hamiltonian parameters
248                        hamiltonianParams,  // external Hamiltonian parameters in GPU memory
249                        inputState,         // input quantum state
250                        outputState,        // output quantum state
251                        workspaceDescr,     // workspace descriptor
252                        0x0));              // default CUDA stream
253    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
254    finishTime = std::chrono::high_resolution_clock::now();
255    timeSec = finishTime - startTime;
256    if (verbose)
257      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
258  }
259
260  // Compute the squared norm of the output quantum state
261  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
262  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
263                      outputState,
264                      norm2,
265                      0x0));
266  if (verbose) {
267    std::cout << "Computed the output quantum state norm:\n";
268    printArrayGPU<double>(norm2, batchSize);
269  }
270
271  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
272
273  // Prepare the Liouvillian operator action backward differentiation (needs to be done only once)
274  startTime = std::chrono::high_resolution_clock::now();
275  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareActionBackwardDiff(handle,
276                      liouvillian.get(),
277                      inputState,
278                      outputState,               // adjoint output quantum state is always congruent to the output quantum state
279                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
280                      freeMem,                   // max available GPU free memory for the workspace buffer
281                      workspaceDescr,            // workspace descriptor
282                      0x0));                     // default CUDA stream
283  finishTime = std::chrono::high_resolution_clock::now();
284  timeSec = finishTime - startTime;
285  if (verbose)
286    std::cout << "Operator action backward differentiation preparation time (sec) = " << timeSec.count() << std::endl;
287
288  // Query the required workspace buffer size (bytes)
289  requiredBufferSize = 0;
290  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
291                      workspaceDescr,
292                      CUDENSITYMAT_MEMSPACE_DEVICE,
293                      CUDENSITYMAT_WORKSPACE_SCRATCH,
294                      &requiredBufferSize));
295  if (verbose)
296    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
297
298  if (requiredBufferSize > freeMem) {
299    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
300    std::exit(1);
301  }
302
303  // Attach the workspace buffer to the workspace descriptor
304  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
305                      workspaceDescr,
306                      CUDENSITYMAT_MEMSPACE_DEVICE,
307                      CUDENSITYMAT_WORKSPACE_SCRATCH,
308                      workspaceBuffer,
309                      requiredBufferSize));
310  if (verbose)
311    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
312
313  // Liouvillian operator action backward differentiation:
314  // The adjoint output quantum state, which is always congruent to the output quantum state,
315  // depends on the user-defined cost function, so here we simply pass the previously computed output quantum state.
316  // In real-life applications, the user will pass their adjoint output quantum state, computed for their cost function.
317  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
318    // Zero out the adjoint input quantum state and gradients
319    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
320                        inputStateAdj,
321                        0x0));
322    initializeArrayGPU(std::vector<double>(numParams * batchSize, 0.0), hamiltonianParamsGrad);
323    if (verbose)
324      std::cout << "Initialized the adjoint input quantum state and gradients to zero\n";
325    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
326    startTime = std::chrono::high_resolution_clock::now();
327    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeActionBackwardDiff(handle,
328                        liouvillian.get(),
329                        0.3,                    // time point (some value)
330                        batchSize,              // user-defined batch size
331                        numParams,              // number of external user-defined Hamiltonian parameters
332                        hamiltonianParams,      // external Hamiltonian parameters in GPU memory
333                        inputState,             // input quantum state
334                        outputState,            // adjoint output quantum state (here we just pass the previously computed output quantum state for simplicity)
335                        inputStateAdj,          // adjoint input quantum state
336                        hamiltonianParamsGrad,  // partial gradients with respect to the user-defined real parameters
337                        workspaceDescr,         // workspace descriptor
338                        0x0));                  // default CUDA stream
339    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
340    finishTime = std::chrono::high_resolution_clock::now();
341    timeSec = finishTime - startTime;
342    if (verbose)
343      std::cout << "Operator action backward differentiation computation time (sec) = " << timeSec.count() << std::endl;
344  }
345
346  // Compute the squared norm of the adjoint input quantum state
347  initializeArrayGPU(std::vector<double>(batchSize, 0.0), norm2);
348  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
349                      inputStateAdj,
350                      norm2,
351                      0x0));
352  if (verbose) {
353    std::cout << "Computed the adjoint input quantum state norm:\n";
354    printArrayGPU<double>(norm2, batchSize);
355    std::cout << "Hamiltonian parameters gradients:\n";
356    printArrayGPU<double>(hamiltonianParamsGrad, numParams * batchSize);
357  }
358
359  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
360
361  // Destroy the norm2 array
362  destroyArrayGPU(norm2);
363
364  // Destroy workspace descriptor
365  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
366
367  // Destroy workspace buffer storage
368  destroyArrayGPU(workspaceBuffer);
369
370  // Destroy quantum states
371  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputStateAdj));
372  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
373  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
374
375  // Destroy quantum state storage
376  destroyArrayGPU(inputStateAdjElems);
377  destroyArrayGPU(outputStateElems);
378  destroyArrayGPU(inputStateElems);
379
380  // Destroy external Hamiltonian parameters
381  destroyArrayGPU(static_cast<void *>(hamiltonianParamsGrad));
382  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
383
384  if (verbose)
385    std::cout << "Destroyed resources\n" << std::flush;
386}
387
388
389int main(int argc, char ** argv)
390{
391  // Assign a GPU to the process
392  HANDLE_CUDA_ERROR(cudaSetDevice(0));
393  if (verbose)
394    std::cout << "Set active device\n";
395
396  // Create a library handle
397  cudensitymatHandle_t handle;
398  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
399  if (verbose)
400    std::cout << "Created a library handle\n";
401
402  // Run the example
403  exampleWorkflow(handle);
404
405  // Destroy the library handle
406  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
407  if (verbose)
408    std::cout << "Destroyed the library handle\n";
409
410  HANDLE_CUDA_ERROR(cudaDeviceReset());
411
412  // Done
413  return 0;
414}

Useful tips

  • For debugging, one can set the environment variable CUDENSITYMAT_LOG_LEVEL=n. The level n = 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout.

Level

Summary

Long Description

0

Off

Logging is disabled (default)

1

Errors

Only errors will be logged

2

Performance Trace

API calls that launch CUDA kernels will log their parameters and important information

3

Performance Hints

Hints that can potentially improve the application’s performance

4

Heuristics Trace

Provides general information about the library execution, may contain details about heuristic status

5

API Trace

API calls will log their parameter and important information