Examples#

In this section, we show basic examples on how to define a quantum operator, quantum state, and then compute the action of the quantum operator on a quantum state, and, optionally, backward-differentiate the operator action (compute gradients) with respect to user-provided real parameters parameterizing the operator. We also show an example on how to compute the extreme eigenspectrum of a given operator. For clarity, the quantum operator for each example is defined inside a separate C++ header, specifically transverse_ising_full_fused.h, transverse_ising_full_fused_noisy.h and transverse_ising_full_fused_noisy_grad.h, where it is wrapped in a helper C++ class UserDefinedLiouvillian. We also provide a utility header helpers.h containing convenient GPU array creation/destruction, initialization, copying, and printing helper functions.

Building code#

Assuming cuQuantum has been extracted in CUQUANTUM_ROOT and cuTENSOR is in CUTENSOR_ROOT, we update the library path as follows:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib:${LD_LIBRARY_PATH}

A serial sample code discussed below (operator_action_example.cpp) can be built via the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib -lcudensitymat -lcutensornet -lcutensor -lcusolver -lcublas -lcurand -o operator_action_example

For static linking against the cuDensityMat library, use the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a ${CUQUANTUM_ROOT}/lib/libcutensornet_static.a -L${CUTENSOR_ROOT}/lib -lcutensor -lcusolver -lcublas -lcurand -o operator_action_example

In order to build a parallel (MPI) version of the example operator_action_mpi_example.cpp, one will need to have a CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then set the environment variable $CUDENSITYMAT_COMM_LIB to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so. The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by calling the build script provided there. In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib -L${MPI_PATH}/lib -lcudensitymat -lcutensornet -lcutensor -lcusolver -lcublas -lcurand -lmpi -o operator_action_mpi_example

Warning

When running operator_action_mpi_example.cpp with a non-CUDA-aware MPI library, the program will crash.

Note

Depending on the installation of the cuQuantum SDK package, you may need to replace lib above by lib64, depending which folder name is used inside your cuQuantum SDK package.

Code example (serial execution on a single GPU)#

The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).

First let’s introduce a helper class to construct a specific quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and an additional noise term. Here we choose to make the f(t) coefficient depend on time and a single user-provided real parameter Omega. We use a CPU-side user-defined scalar callback function to define the dependence of the f(t) coefficient on time and the user-provided real parameter Omega. Note that inside the callback function definition, we explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers), which applies to the scalar coefficient f(t) set by the callback function in-place.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i operators with static h_i coefficients 
 22      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}     // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27/** Define the numerical type and data type for the GPU computations (same) */
 28using NumericalType = std::complex<double>;      // do not change
 29constexpr cudaDataType_t dataType = CUDA_C_64F;  // do not change
 30
 31
 32/** Example of a user-provided scalar CPU callback C function
 33 *  defining a time-dependent coefficient inside the Hamiltonian:
 34 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 35 */
 36extern "C"
 37int32_t fCoefComplex64(
 38  double time,             //in: time point
 39  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 40  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 41  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 42  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 43  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
 44  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 45{
 46  if (dataType == CUDA_C_64F) {
 47    auto * tdCoef = static_cast<cuDoubleComplex*>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 48    for (int64_t i = 0; i < batchSize; ++i) {
 49      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 50      tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
 51    }
 52  } else {
 53    return 1; // error code (1: Error)
 54  }
 55  return 0; // error code (0: Success)
 56}
 57
 58
 59/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 60 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 61 *  - Method `get()` returns a reference to the constructed Liouvillian operator
 62 *  - Destructor releases all resources used by the Liouvillian operator
 63 */
 64class UserDefinedLiouvillian final
 65{
 66private:
 67  // Data members
 68  cudensitymatHandle_t handle;             // library context handle
 69  int64_t stateBatchSize;                  // quantum state batch size
 70  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
 71  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
 72  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
 73  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
 74  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
 75  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
 76  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
 77  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i * X_i} (one-body term)
 78  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
 79  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix (two-body mixed term)
 80  cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)
 81
 82public:
 83
 84  // Constructor constructs a user-defined Liouvillian operator
 85  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
 86                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
 87                         int64_t batchSize):                             // batch size for the quantum state
 88    handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
 89  {
 90    // Define the necessary operator tensors in GPU memory (F-order storage!)
 91    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
 92                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 93                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 94
 95    spinYYelems = createInitializeArrayGPU<NumericalType>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
 96                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
 97                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
 98                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
 99                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
100
101    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
102                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
103                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
104                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
105                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
106
107    // Construct the necessary Elementary Tensor Operators
108    //  X_i operator
109    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
110                        1,                                   // one-body operator
111                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
112                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
113                        0,                                   // 0 for dense tensors
114                        nullptr,                             // nullptr for dense tensors
115                        dataType,                            // data type
116                        spinXelems,                          // tensor elements in GPU memory
117                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
118                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
119                        &spinX));                            // the created elementary tensor operator
120    //  ZZ_ij = Z_i * Z_j fused operator
121    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
122                        2,                                   // two-body operator
123                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
124                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
125                        0,                                   // 0 for dense tensors
126                        nullptr,                             // nullptr for dense tensors
127                        dataType,                            // data type
128                        spinZZelems,                         // tensor elements in GPU memory
129                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
130                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
131                        &spinZZ));                           // the created elementary tensor operator
132    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
133    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
134                        2,                                   // two-body operator
135                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
136                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
137                        0,                                   // 0 for dense tensors
138                        nullptr,                             // nullptr for dense tensors
139                        dataType,                            // data type
140                        spinYYelems,                         // tensor elements in GPU memory
141                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
142                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
143                        &spinYY));                           // the created elementary tensor operator
144
145    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
146    //  Create an empty operator term
147    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
148                        spaceShape.size(),                   // Hilbert space rank (number of modes)
149                        spaceShape.data(),                   // Hilbert space shape (mode extents)
150                        &oneBodyTerm));                      // the created empty operator term
151    //  Define the operator term: H1 = sum_{i} {h_i * X_i}
152    for (int32_t i = 0; i < spaceShape.size(); ++i) {
153      const double h_i = 1.0 / static_cast<double>(i+1); // assign some value to the time-independent h_i coefficient
154      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
155                          oneBodyTerm,
156                          1,                                                             // number of elementary tensor operators in the product
157                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
158                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
159                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
160                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
161                          cudensitymatScalarCallbackNone,                                // no time-dependent coefficient associated with this operator product
162                          cudensitymatScalarGradientCallbackNone));                      // no coefficient gradient associated with this operator product
163    }
164    //  Create an empty operator term
165    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
166                        spaceShape.size(),                   // Hilbert space rank (number of modes)
167                        spaceShape.data(),                   // Hilbert space shape (mode extents)
168                        &twoBodyTerm));                      // the created empty operator term
169    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
170    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
171      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
172        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
173        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
174                            twoBodyTerm,
175                            1,                                                              // number of elementary tensor operators in the product
176                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
177                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
178                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
179                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
180                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
181                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
182      }
183    }
184    //  Create an empty operator term
185    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
186                        spaceShape.size(),                   // Hilbert space rank (number of modes)
187                        spaceShape.data(),                   // Hilbert space shape (mode extents)
188                        &noiseTerm));                        // the created empty operator term
189    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
190    for (int32_t i = 0; i < spaceShape.size(); ++i) {
191      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
192                          noiseTerm,
193                          1,                                                              // number of elementary tensor operators in the product
194                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
195                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
196                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
197                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
198                          cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
199                          cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
200    }
201
202    // Construct the full Liouvillian operator as a sum of the operator terms
203    //  Create an empty operator (super-operator)
204    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
205                        spaceShape.size(),                // Hilbert space rank (number of modes)
206                        spaceShape.data(),                // Hilbert space shape (modes extents)
207                        &liouvillian));                   // the created empty operator (super-operator)
208    //  Append an operator term to the operator (super-operator)
209    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
210                        liouvillian,
211                        oneBodyTerm,                      // appended operator term
212                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
213                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
214                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
215                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
216    //  Append an operator term to the operator (super-operator)
217    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
218                        liouvillian,
219                        twoBodyTerm,                     // appended operator term
220                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
221                        make_cuDoubleComplex(0.0, -1.0), // -i constant
222                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
223                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator term as a whole
224    //  Append an operator term to the operator (super-operator)
225    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
226                        liouvillian,
227                        oneBodyTerm,                      // appended operator term
228                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
229                        make_cuDoubleComplex(0.0, 1.0),   // i constant
230                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
231                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
232    //  Append an operator term to the operator (super-operator)
233    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
234                        liouvillian,
235                        twoBodyTerm,                     // appended operator term
236                        1,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
237                        make_cuDoubleComplex(0.0, 1.0),  // i constant
238                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
239                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator term as a whole
240    //  Append an operator term to the operator (super-operator)
241    const double d = 0.42; // assign some value to the time-independent coefficient
242    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
243                        liouvillian,
244                        noiseTerm,                        // appended operator term
245                        0,                                // operator term action duality as a whole (no duality reversing in this case)
246                        make_cuDoubleComplex(d, 0.0),     // constant coefficient associated with the operator term as a whole
247                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
248                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
249  }
250
251  // Destructor destructs the user-defined Liouvillian operator
252  ~UserDefinedLiouvillian()
253  {
254    // Destroy the Liouvillian operator
255    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
256
257    // Destroy operator terms
258    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
259    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
260    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
261
262    // Destroy elementary tensor operators
263    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
264    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
265    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
266
267    // Destroy operator tensors
268    destroyArrayGPU(spinYYelems);
269    destroyArrayGPU(spinZZelems);
270    destroyArrayGPU(spinXelems);
271  }
272
273  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
274  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
275  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
276  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
277  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
278
279  /** Returns the number of externally provided Hamiltonian parameters. */
280  int32_t getNumParameters() const
281  {
282    return 1; // one parameter Omega
283  }
284
285  /** Get access to the constructed Liouvillian operator. */
286  cudensitymatOperator_t & get()
287  {
288    return liouvillian;
289  }
290
291};

Now we can use this parameterized quantum many-body operator in our main code to compute the action of the operator on a mixed quantum state (density matrix).

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14#include <cmath>
 15#include <complex>
 16#include <vector>
 17#include <chrono>
 18#include <iostream>
 19#include <cassert>
 20
 21
 22// Number of times to perform operator action on a quantum state
 23constexpr int NUM_REPEATS = 2;
 24
 25// Logging verbosity
 26bool verbose = true;
 27
 28
 29// Example workflow
 30void exampleWorkflow(cudensitymatHandle_t handle)
 31{
 32  // Define the composite Hilbert space shape and
 33  // quantum state batch size (number of individual quantum states in a batched simulation)
 34  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 35  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 36
 37  if (verbose) {
 38    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 39    for (const auto & dimsn: spaceShape)
 40      std::cout << dimsn << ",";
 41    std::cout << ")" << std::endl;
 42    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 43  }
 44
 45  // Construct a user-defined Liouvillian operator using a convenience C++ class
 46  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 47  if (verbose)
 48    std::cout << "Constructed the Liouvillian operator\n";
 49
 50  // Set and place external user-provided Hamiltonian parameters in GPU memory
 51  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 52  std::vector<double> cpuHamParams(numParams * batchSize);
 53  for (int64_t j = 0; j < batchSize; ++j) {
 54    for (int32_t i = 0; i < numParams; ++i) {
 55      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 56    }
 57  }
 58  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 59
 60  // Declare the input quantum state
 61  cudensitymatState_t inputState;
 62  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 63                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 64                      spaceShape.size(),
 65                      spaceShape.data(),
 66                      batchSize,
 67                      dataType,
 68                      &inputState));
 69
 70  // Query the size of the quantum state storage
 71  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 72  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 73                      inputState,
 74                      1,               // only one storage component (tensor)
 75                      &storageSize));  // storage size in bytes
 76  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 77  if (verbose)
 78    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 79
 80  // Prepare some initial value for the input quantum state batch
 81  std::vector<NumericalType> inputStateValue(stateVolume);
 82  if constexpr (std::is_same_v<NumericalType, float>) {
 83    for (std::size_t i = 0; i < stateVolume; ++i) {
 84      inputStateValue[i] = 1.0f / float(i+1); // just some value
 85    }
 86  } else if constexpr (std::is_same_v<NumericalType, double>) {
 87    for (std::size_t i = 0; i < stateVolume; ++i) {
 88      inputStateValue[i] = 1.0 / double(i+1); // just some value
 89    }
 90  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
 91    for (std::size_t i = 0; i < stateVolume; ++i) {
 92      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
 93    }
 94  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
 95    for (std::size_t i = 0; i < stateVolume; ++i) {
 96      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
 97    }
 98  } else {
 99    std::cerr << "Error: Unsupported data type!\n";
100    std::exit(1);
101  }
102  // Allocate initialized GPU storage for the input quantum state with prepared values
103  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
104  if (verbose)
105    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
106
107  // Attach initialized GPU storage to the input quantum state
108  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109                      inputState,
110                      1,                                                 // only one storage component (tensor)
111                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
112                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
113  if (verbose)
114    std::cout << "Constructed input quantum state\n";
115
116  // Declare the output quantum state of the same shape
117  cudensitymatState_t outputState;
118  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
119                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
120                      spaceShape.size(),
121                      spaceShape.data(),
122                      batchSize,
123                      dataType,
124                      &outputState));
125
126  // Allocate initialized GPU storage for the output quantum state
127  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
128  if (verbose)
129    std::cout << "Allocated output quantum state storage\n";
130
131  // Attach GPU storage to the output quantum state
132  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
133                      outputState,
134                      1,                                                 // only one storage component (tensor)
135                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
136                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
137  if (verbose)
138    std::cout << "Constructed output quantum state\n";
139
140  // Declare a workspace descriptor
141  cudensitymatWorkspaceDescriptor_t workspaceDescr;
142  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
143
144  // Query free GPU memory
145  std::size_t freeMem = 0, totalMem = 0;
146  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
147  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
148  if (verbose)
149    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
150
151  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
152  const auto startTime = std::chrono::high_resolution_clock::now();
153  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
154                      liouvillian.get(),
155                      inputState,
156                      outputState,
157                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
158                      freeMem,                   // max available GPU free memory for the workspace
159                      workspaceDescr,            // workspace descriptor
160                      0x0));                     // default CUDA stream
161  const auto finishTime = std::chrono::high_resolution_clock::now();
162  const std::chrono::duration<double> timeSec = finishTime - startTime;
163  if (verbose)
164    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
165
166  // Query the required workspace buffer size (bytes)
167  std::size_t requiredBufferSize {0};
168  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
169                      workspaceDescr,
170                      CUDENSITYMAT_MEMSPACE_DEVICE,
171                      CUDENSITYMAT_WORKSPACE_SCRATCH,
172                      &requiredBufferSize));
173  if (verbose)
174    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
175
176  // Allocate GPU storage for the workspace buffer
177  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
178  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
179  if (verbose)
180    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
181
182  // Attach the workspace buffer to the workspace descriptor
183  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
184                      workspaceDescr,
185                      CUDENSITYMAT_MEMSPACE_DEVICE,
186                      CUDENSITYMAT_WORKSPACE_SCRATCH,
187                      workspaceBuffer,
188                      requiredBufferSize));
189  if (verbose)
190    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
191
192  // Apply the Liouvillian operator to the input quatum state
193  // and accumulate its action into the output quantum state (note accumulative += semantics)
194  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
195    // Zero out the output quantum state
196    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
197                        outputState,
198                        0x0));
199    if (verbose)
200      std::cout << "Initialized the output quantum state to zero\n";
201    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
202    const auto startTime = std::chrono::high_resolution_clock::now();
203    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
204                        liouvillian.get(),
205                        0.3,                                   // time point (some value)
206                        batchSize,                             // user-defined batch size
207                        numParams,                             // number of external user-defined Hamiltonian parameters
208                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
209                        inputState,                            // input quantum state
210                        outputState,                           // output quantum state
211                        workspaceDescr,                        // workspace descriptor
212                        0x0));                                 // default CUDA stream
213    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
214    const auto finishTime = std::chrono::high_resolution_clock::now();
215    const std::chrono::duration<double> timeSec = finishTime - startTime;
216    if (verbose)
217      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
218  }
219
220  // Compute the squared norm of the output quantum state
221  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
222  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
223                      outputState,
224                      norm2,
225                      0x0));
226  if (verbose) {
227    std::cout << "Computed the output quantum state norm:\n";
228    printArrayGPU<double>(norm2, batchSize);
229  }
230
231  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
232
233  // Destroy the norm2 array
234  destroyArrayGPU(norm2);
235
236  // Destroy workspace descriptor
237  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
238
239  // Destroy workspace buffer storage
240  destroyArrayGPU(workspaceBuffer);
241
242  // Destroy quantum states
243  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
244  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
245
246  // Destroy quantum state storage
247  destroyArrayGPU(outputStateElems);
248  destroyArrayGPU(inputStateElems);
249
250  // Destroy external Hamiltonian parameters
251  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
252
253  if (verbose)
254    std::cout << "Destroyed resources\n" << std::flush;
255}
256
257
258int main(int argc, char ** argv)
259{
260  // Assign a GPU to the process
261  HANDLE_CUDA_ERROR(cudaSetDevice(0));
262  if (verbose)
263    std::cout << "Set active device\n";
264
265  // Create a library handle
266  cudensitymatHandle_t handle;
267  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
268  if (verbose)
269    std::cout << "Created a library handle\n";
270
271  // Run the example
272  exampleWorkflow(handle);
273
274  // Destroy the library handle
275  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276  if (verbose)
277    std::cout << "Destroyed the library handle\n";
278
279  HANDLE_CUDA_ERROR(cudaDeviceReset());
280
281  // Done
282  return 0;
283}

Code example (parallel execution on multiple GPUs)#

It is straightforward to adapt the main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). Two distributed communication backends are supported: MPI and NCCL (experimental).

MPI backend#

We will illustrate parallel execution with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.

The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).

Here is the updated main code for multi-GPU runs using MPI.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// MPI library (optional)
 16#ifdef MPI_ENABLED
 17#include <mpi.h>
 18#endif
 19
 20#include <cmath>
 21#include <complex>
 22#include <vector>
 23#include <chrono>
 24#include <iostream>
 25#include <cassert>
 26
 27
 28// Number of times to perform operator action on a quantum state
 29constexpr int NUM_REPEATS = 2;
 30
 31// Logging verbosity
 32bool verbose = true;
 33
 34
 35// Example workflow
 36void exampleWorkflow(cudensitymatHandle_t handle)
 37{
 38  // Define the composite Hilbert space shape and
 39  // quantum state batch size (number of individual quantum states in a batched simulation)
 40  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 41  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 42
 43  if (verbose) {
 44    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 45    for (const auto & dimsn: spaceShape)
 46      std::cout << dimsn << ",";
 47    std::cout << ")" << std::endl;
 48    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 49  }
 50
 51  // Construct a user-defined Liouvillian operator using a convenience C++ class
 52  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 53  if (verbose)
 54    std::cout << "Constructed the Liouvillian operator\n";
 55
 56  // Set and place external user-provided Hamiltonian parameters in GPU memory
 57  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 58  std::vector<double> cpuHamParams(numParams * batchSize);
 59  for (int64_t j = 0; j < batchSize; ++j) {
 60    for (int32_t i = 0; i < numParams; ++i) {
 61      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 62    }
 63  }
 64  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 65
 66  // Declare the input quantum state
 67  cudensitymatState_t inputState;
 68  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 69                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 70                      spaceShape.size(),
 71                      spaceShape.data(),
 72                      batchSize,
 73                      dataType,
 74                      &inputState));
 75
 76  // Query the size of the quantum state storage
 77  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 78  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 79                      inputState,
 80                      1,               // only one storage component (tensor)
 81                      &storageSize));  // storage size in bytes
 82  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 83  if (verbose)
 84    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 85
 86  // Prepare some initial value for the input quantum state batch
 87  std::vector<NumericalType> inputStateValue(stateVolume);
 88  if constexpr (std::is_same_v<NumericalType, float>) {
 89    for (std::size_t i = 0; i < stateVolume; ++i) {
 90      inputStateValue[i] = 1.0f / float(i+1); // just some value
 91    }
 92  } else if constexpr (std::is_same_v<NumericalType, double>) {
 93    for (std::size_t i = 0; i < stateVolume; ++i) {
 94      inputStateValue[i] = 1.0 / double(i+1); // just some value
 95    }
 96  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
 97    for (std::size_t i = 0; i < stateVolume; ++i) {
 98      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
 99    }
100  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
101    for (std::size_t i = 0; i < stateVolume; ++i) {
102      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
103    }
104  } else {
105    std::cerr << "Error: Unsupported data type!\n";
106    std::exit(1);
107  }
108  // Allocate initialized GPU storage for the input quantum state with prepared values
109  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
110  if (verbose)
111    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
112
113  // Attach initialized GPU storage to the input quantum state
114  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
115                      inputState,
116                      1,                                                 // only one storage component (tensor)
117                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
118                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
119  if (verbose)
120    std::cout << "Constructed input quantum state\n";
121
122  // Declare the output quantum state of the same shape
123  cudensitymatState_t outputState;
124  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
125                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
126                      spaceShape.size(),
127                      spaceShape.data(),
128                      batchSize,
129                      dataType,
130                      &outputState));
131
132  // Allocate initialized GPU storage for the output quantum state
133  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
134  if (verbose)
135    std::cout << "Allocated output quantum state storage\n";
136
137  // Attach GPU storage to the output quantum state
138  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
139                      outputState,
140                      1,                                                 // only one storage component (tensor)
141                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
142                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
143  if (verbose)
144    std::cout << "Constructed output quantum state\n";
145
146  // Declare a workspace descriptor
147  cudensitymatWorkspaceDescriptor_t workspaceDescr;
148  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
149
150  // Query free GPU memory
151  std::size_t freeMem = 0, totalMem = 0;
152  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
153  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
154  if (verbose)
155    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
156
157  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
158  const auto startTime = std::chrono::high_resolution_clock::now();
159  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
160                      liouvillian.get(),
161                      inputState,
162                      outputState,
163                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
164                      freeMem,                   // max available GPU free memory for the workspace
165                      workspaceDescr,            // workspace descriptor
166                      0x0));                     // default CUDA stream
167  const auto finishTime = std::chrono::high_resolution_clock::now();
168  const std::chrono::duration<double> timeSec = finishTime - startTime;
169  if (verbose)
170    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
171
172  // Query the required workspace buffer size (bytes)
173  std::size_t requiredBufferSize {0};
174  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
175                      workspaceDescr,
176                      CUDENSITYMAT_MEMSPACE_DEVICE,
177                      CUDENSITYMAT_WORKSPACE_SCRATCH,
178                      &requiredBufferSize));
179  if (verbose)
180    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
181
182  // Allocate GPU storage for the workspace buffer
183  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
184  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
185  if (verbose)
186    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
187
188  // Attach the workspace buffer to the workspace descriptor
189  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
190                      workspaceDescr,
191                      CUDENSITYMAT_MEMSPACE_DEVICE,
192                      CUDENSITYMAT_WORKSPACE_SCRATCH,
193                      workspaceBuffer,
194                      requiredBufferSize));
195  if (verbose)
196    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
197
198  // Apply the Liouvillian operator to the input quatum state
199  // and accumulate its action into the output quantum state (note accumulative += semantics)
200  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
201    // Zero out the output quantum state
202    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
203                        outputState,
204                        0x0));
205    if (verbose)
206      std::cout << "Initialized the output quantum state to zero\n";
207    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
208    const auto startTime = std::chrono::high_resolution_clock::now();
209    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
210                        liouvillian.get(),
211                        0.3,                                   // time point (some value)
212                        batchSize,                             // user-defined batch size
213                        numParams,                             // number of external user-defined Hamiltonian parameters
214                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
215                        inputState,                            // input quantum state
216                        outputState,                           // output quantum state
217                        workspaceDescr,                        // workspace descriptor
218                        0x0));                                 // default CUDA stream
219    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
220    const auto finishTime = std::chrono::high_resolution_clock::now();
221    const std::chrono::duration<double> timeSec = finishTime - startTime;
222    if (verbose)
223      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
224  }
225
226  // Compute the squared norm of the output quantum state
227  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
228  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
229                      outputState,
230                      norm2,
231                      0x0));
232  if (verbose) {
233    std::cout << "Computed the output quantum state norm:\n";
234    printArrayGPU<double>(norm2, batchSize);
235  }
236
237  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
238
239  // Destroy the norm2 array
240  destroyArrayGPU(norm2);
241
242  // Destroy workspace descriptor
243  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
244
245  // Destroy workspace buffer storage
246  destroyArrayGPU(workspaceBuffer);
247
248  // Destroy quantum states
249  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
250  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
251
252  // Destroy quantum state storage
253  destroyArrayGPU(outputStateElems);
254  destroyArrayGPU(inputStateElems);
255
256  // Destroy external Hamiltonian parameters
257  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
258
259  if (verbose)
260    std::cout << "Destroyed resources\n" << std::flush;
261}
262
263
264int main(int argc, char ** argv)
265{
266  // Initialize MPI library (if needed)
267#ifdef MPI_ENABLED
268  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
269  int procRank {-1};
270  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
271  int numProcs {0};
272  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
273  if (procRank != 0) verbose = false;
274  if (verbose)
275    std::cout << "Initialized MPI library\n";
276#else
277  const int procRank {0};
278  const int numProcs {1};
279#endif
280
281  // Assign a GPU to the process
282  int numDevices {0};
283  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
284  const int deviceId = procRank % numDevices;
285  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
286  if (verbose)
287    std::cout << "Set active device\n";
288
289  // Create a library handle
290  cudensitymatHandle_t handle;
291  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
292  if (verbose)
293    std::cout << "Created a library handle\n";
294
295  // Reset distributed configuration (once)
296#ifdef MPI_ENABLED
297  MPI_Comm comm;
298  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
299  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
300                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
301                      &comm, sizeof(comm)));
302#endif
303
304  // Run the example
305  exampleWorkflow(handle);
306
307  // Synchronize MPI processes
308#ifdef MPI_ENABLED
309  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
310#endif
311
312  // Destroy the library handle
313  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
314  if (verbose)
315    std::cout << "Destroyed the library handle\n";
316
317  HANDLE_CUDA_ERROR(cudaDeviceReset());
318
319  // Finalize the MPI library
320#ifdef MPI_ENABLED
321  HANDLE_MPI_ERROR(MPI_Finalize());
322  if (verbose)
323    std::cout << "Finalized MPI library\n";
324#endif
325
326  // Done
327  return 0;
328}

NCCL backend (experimental)#

NCCL (NVIDIA Collective Communications Library) can provide better performance for GPU-to-GPU communication, especially within a single node with NVLink connectivity. Below we show an example using NCCL as the communication layer. Note that the NCCL backend is currently experimental. MPI is used for process spawning and bootstrapping (e.g., broadcasting the ncclUniqueId), but all GPU-to-GPU communication uses NCCL.

The full sample code can be found in the NVIDIA/cuQuantum repository (main NCCL code and operator definition as well as the utility code).

Here is the main code for multi-GPU runs using NCCL.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// NCCL library (required for this example)
 16#ifdef NCCL_ENABLED
 17#include <nccl.h>
 18#endif
 19
 20// MPI library (used for bootstrapping NCCL communicator)
 21#ifdef MPI_ENABLED
 22#include <mpi.h>
 23#endif
 24
 25#include <cmath>
 26#include <complex>
 27#include <vector>
 28#include <chrono>
 29#include <iostream>
 30#include <cassert>
 31
 32
 33// Number of times to perform operator action on a quantum state
 34constexpr int NUM_REPEATS = 2;
 35
 36// Logging verbosity
 37bool verbose = true;
 38
 39
 40#ifdef NCCL_ENABLED
 41// Error handling macro for NCCL
 42#define HANDLE_NCCL_ERROR(x)                                 \
 43{                                                            \
 44  const ncclResult_t err = x;                                \
 45  if (err != ncclSuccess)                                    \
 46  {                                                          \
 47    printf("NCCL Error: %s in line %d\n",                    \
 48           ncclGetErrorString(err), __LINE__);               \
 49    fflush(stdout);                                          \
 50    std::abort();                                            \
 51  }                                                          \
 52};
 53#endif
 54
 55
 56// Example workflow
 57void exampleWorkflow(cudensitymatHandle_t handle)
 58{
 59  // Define the composite Hilbert space shape and
 60  // quantum state batch size (number of individual quantum states in a batched simulation)
 61  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 62  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 63
 64  if (verbose) {
 65    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 66    for (const auto & dimsn: spaceShape)
 67      std::cout << dimsn << ",";
 68    std::cout << ")" << std::endl;
 69    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 70  }
 71
 72  // Construct a user-defined Liouvillian operator using a convenience C++ class
 73  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 74  if (verbose)
 75    std::cout << "Constructed the Liouvillian operator\n";
 76
 77  // Set and place external user-provided Hamiltonian parameters in GPU memory
 78  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 79  std::vector<double> cpuHamParams(numParams * batchSize);
 80  for (int64_t j = 0; j < batchSize; ++j) {
 81    for (int32_t i = 0; i < numParams; ++i) {
 82      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 83    }
 84  }
 85  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 86
 87  // Declare the input quantum state
 88  cudensitymatState_t inputState;
 89  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 90                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 91                      spaceShape.size(),
 92                      spaceShape.data(),
 93                      batchSize,
 94                      dataType,
 95                      &inputState));
 96
 97  // Query the size of the quantum state storage
 98  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 99  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
100                      inputState,
101                      1,               // only one storage component (tensor)
102                      &storageSize));  // storage size in bytes
103  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
104  if (verbose)
105    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
106
107  // Prepare some initial value for the input quantum state batch
108  std::vector<NumericalType> inputStateValue(stateVolume);
109  if constexpr (std::is_same_v<NumericalType, float>) {
110    for (std::size_t i = 0; i < stateVolume; ++i) {
111      inputStateValue[i] = 1.0f / float(i+1); // just some value
112    }
113  } else if constexpr (std::is_same_v<NumericalType, double>) {
114    for (std::size_t i = 0; i < stateVolume; ++i) {
115      inputStateValue[i] = 1.0 / double(i+1); // just some value
116    }
117  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
118    for (std::size_t i = 0; i < stateVolume; ++i) {
119      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
120    }
121  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
122    for (std::size_t i = 0; i < stateVolume; ++i) {
123      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
124    }
125  } else {
126    std::cerr << "Error: Unsupported data type!\n";
127    std::exit(1);
128  }
129  // Allocate initialized GPU storage for the input quantum state with prepared values
130  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
131  if (verbose)
132    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
133
134  // Attach initialized GPU storage to the input quantum state
135  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
136                      inputState,
137                      1,                                                 // only one storage component (tensor)
138                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
139                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
140  if (verbose)
141    std::cout << "Constructed input quantum state\n";
142
143  // Declare the output quantum state of the same shape
144  cudensitymatState_t outputState;
145  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
146                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
147                      spaceShape.size(),
148                      spaceShape.data(),
149                      batchSize,
150                      dataType,
151                      &outputState));
152
153  // Allocate initialized GPU storage for the output quantum state
154  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
155  if (verbose)
156    std::cout << "Allocated output quantum state storage\n";
157
158  // Attach GPU storage to the output quantum state
159  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
160                      outputState,
161                      1,                                                 // only one storage component (tensor)
162                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
163                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
164  if (verbose)
165    std::cout << "Constructed output quantum state\n";
166
167  // Declare a workspace descriptor
168  cudensitymatWorkspaceDescriptor_t workspaceDescr;
169  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
170
171  // Query free GPU memory
172  std::size_t freeMem = 0, totalMem = 0;
173  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
174  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
175  if (verbose)
176    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
177
178  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
179  const auto startTime = std::chrono::high_resolution_clock::now();
180  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
181                      liouvillian.get(),
182                      inputState,
183                      outputState,
184                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
185                      freeMem,                   // max available GPU free memory for the workspace
186                      workspaceDescr,            // workspace descriptor
187                      0x0));                     // default CUDA stream
188  const auto finishTime = std::chrono::high_resolution_clock::now();
189  const std::chrono::duration<double> timeSec = finishTime - startTime;
190  if (verbose)
191    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
192
193  // Query the required workspace buffer size (bytes)
194  std::size_t requiredBufferSize {0};
195  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
196                      workspaceDescr,
197                      CUDENSITYMAT_MEMSPACE_DEVICE,
198                      CUDENSITYMAT_WORKSPACE_SCRATCH,
199                      &requiredBufferSize));
200  if (verbose)
201    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
202
203  // Allocate GPU storage for the workspace buffer
204  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
205  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
206  if (verbose)
207    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
208
209  // Attach the workspace buffer to the workspace descriptor
210  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
211                      workspaceDescr,
212                      CUDENSITYMAT_MEMSPACE_DEVICE,
213                      CUDENSITYMAT_WORKSPACE_SCRATCH,
214                      workspaceBuffer,
215                      requiredBufferSize));
216  if (verbose)
217    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
218
219  // Apply the Liouvillian operator to the input quantum state
220  // and accumulate its action into the output quantum state (note accumulative += semantics)
221  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
222    // Zero out the output quantum state
223    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
224                        outputState,
225                        0x0));
226    if (verbose)
227      std::cout << "Initialized the output quantum state to zero\n";
228    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
229    const auto startTime = std::chrono::high_resolution_clock::now();
230    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
231                        liouvillian.get(),
232                        0.3,                                   // time point (some value)
233                        batchSize,                             // user-defined batch size
234                        numParams,                             // number of external user-defined Hamiltonian parameters
235                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
236                        inputState,                            // input quantum state
237                        outputState,                           // output quantum state
238                        workspaceDescr,                        // workspace descriptor
239                        0x0));                                 // default CUDA stream
240    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
241    const auto finishTime = std::chrono::high_resolution_clock::now();
242    const std::chrono::duration<double> timeSec = finishTime - startTime;
243    if (verbose)
244      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
245  }
246
247  // Compute the squared norm of the output quantum state
248  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
249  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
250                      outputState,
251                      norm2,
252                      0x0));
253  if (verbose) {
254    std::cout << "Computed the output quantum state norm:\n";
255    printArrayGPU<double>(norm2, batchSize);
256  }
257
258  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
259
260  // Destroy the norm2 array
261  destroyArrayGPU(norm2);
262
263  // Destroy workspace descriptor
264  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
265
266  // Destroy workspace buffer storage
267  destroyArrayGPU(workspaceBuffer);
268
269  // Destroy quantum states
270  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
271  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
272
273  // Destroy quantum state storage
274  destroyArrayGPU(outputStateElems);
275  destroyArrayGPU(inputStateElems);
276
277  // Destroy external Hamiltonian parameters
278  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
279
280  if (verbose)
281    std::cout << "Destroyed resources\n" << std::flush;
282}
283
284
285int main(int argc, char ** argv)
286{
287#if defined(NCCL_ENABLED) && defined(MPI_ENABLED)
288  // Initialize MPI library (used to bootstrap NCCL)
289  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
290  int procRank {-1};
291  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
292  int numProcs {0};
293  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
294  if (procRank != 0) verbose = false;
295  if (verbose)
296    std::cout << "Initialized MPI library (for NCCL bootstrap)\n";
297
298  // Assign a GPU to the process
299  int numDevices {0};
300  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
301  const int deviceId = procRank % numDevices;
302  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
303  if (verbose)
304    std::cout << "Set active device to GPU " << deviceId << "\n";
305
306  // Initialize NCCL communicator
307  // Step 1: Generate unique ID on rank 0 and broadcast to all ranks
308  ncclUniqueId ncclId;
309  if (procRank == 0) {
310    HANDLE_NCCL_ERROR(ncclGetUniqueId(&ncclId));
311  }
312  HANDLE_MPI_ERROR(MPI_Bcast(&ncclId, sizeof(ncclId), MPI_BYTE, 0, MPI_COMM_WORLD));
313
314  // Step 2: Initialize NCCL communicator with the shared unique ID
315  ncclComm_t ncclComm;
316  HANDLE_NCCL_ERROR(ncclCommInitRank(&ncclComm, numProcs, ncclId, procRank));
317  if (verbose)
318    std::cout << "Initialized NCCL communicator with " << numProcs << " ranks\n";
319
320  // Create a library handle
321  cudensitymatHandle_t handle;
322  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
323  if (verbose)
324    std::cout << "Created a library handle\n";
325
326  // Reset distributed configuration with NCCL communicator
327  // The barrier buffer is now managed internally by cuDensityMat
328  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
329                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_NCCL,
330                      &ncclComm, sizeof(ncclComm)));
331  if (verbose)
332    std::cout << "Configured distributed execution with NCCL\n";
333
334  // Run the example
335  exampleWorkflow(handle);
336
337  // Synchronize processes (device synchronization followed by an MPI barrier)
338  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
339  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
340
341  // Destroy the library handle
342  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
343  if (verbose)
344    std::cout << "Destroyed the library handle\n";
345
346  // Finalize NCCL communicator
347  HANDLE_NCCL_ERROR(ncclCommFinalize(ncclComm));
348  HANDLE_NCCL_ERROR(ncclCommDestroy(ncclComm));
349  if (verbose)
350    std::cout << "Finalized NCCL communicator\n";
351
352  HANDLE_CUDA_ERROR(cudaDeviceReset());
353
354  // Finalize the MPI library
355  HANDLE_MPI_ERROR(MPI_Finalize());
356  if (verbose)
357    std::cout << "Finalized MPI library\n";
358
359#else
360  // Fallback for when NCCL or MPI is not enabled
361  (void)argc;
362  (void)argv;
363  std::cerr << "This example requires both NCCL_ENABLED and MPI_ENABLED to be defined.\n";
364  std::cerr << "NCCL uses MPI for bootstrapping (sharing ncclUniqueId across processes).\n";
365  std::cerr << "Build with: -DENABLE_NCCL=TRUE -DENABLE_MPI=TRUE\n";
366  return 1;
367#endif
368
369  // Done
370  return 0;
371}

Code example (serial execution with backward differentiation)#

The following code example illustrates how to use the cuDensityMat library to not only compute the action of a quantum many-body operator on a quantum state, but also backward-differentiate it (compute gradients) with respect to user-provided real parameters parameterizing the operator (one real parameter Omega in this example). The full sample code can be found in the NVIDIA/cuQuantum repository (main serial gradient code and operator definition for gradient as well as the utility code).

First let’s construct a specific quantum many-body operator which, in this case, is a slightly modified version of the quantum many-body operator used in the main serial code. Here we make both the h(t) and f(t) scalar coefficients depend on time and a single user-provided real parameter Omega via different (made-up) functional forms. To backward-differentiate the operator action with respect to this single user-provided real parameter Omega, we need to manually define a gradient callback function for each regular callback function we have (for h(t) and f(t) in this example). In our example, we define two CPU-side scalar gradient callback functions which compute the vector-Jacobian product (VJP) of the adjoint of h(t) and f(t), respectively, with respect to the user-provided real parameter Omega. A gradient callback function is expected to accumulate the VJP result into the paramsGrad output array, whose final value will be the gradient(s) of the user-defined cost function with respect to the user-provided real parameters parameterizing the operator. As before, all regular and gradient callback functions used in our example explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers).
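
As a quick sanity check on the analytic derivatives implemented in the gradient callbacks, one can compare them against a central finite difference. The standalone C++ snippet below is not part of the sample; it merely verifies that dCoef/dOmega = i * time * exp(i * Omega * time) for the f(t) coefficient defined in the header that follows:

#include <cmath>
#include <complex>
#include <iostream>

int main() {
  const double t = 0.3, omega = 1.0, eps = 1e-6;
  // f = exp(i * Omega * t), as computed by fCoefComplex64 below
  auto f = [t](double w) { return std::exp(std::complex<double>(0.0, w * t)); };
  // Analytic derivative: dCoef/dOmega = i * t * exp(i * Omega * t)
  const std::complex<double> dfAnalytic = std::complex<double>(0.0, t) * f(omega);
  // Central finite difference for comparison
  const std::complex<double> dfNumeric = (f(omega + eps) - f(omega - eps)) / (2.0 * eps);
  std::cout << "analytic = " << dfAnalytic << "  numeric = " << dfNumeric << "\n";
  return 0;
}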

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H = sum_{i} {h_i(t) * X_i}             // transverse field sum of X_i operators with time-dependent h_i(t) coefficients 
 22      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}     // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27/** Define the numerical type and data type for the GPU computations (same) */
 28using NumericalType = std::complex<double>;      // do not change
 29constexpr cudaDataType_t dataType = CUDA_C_64F;  // do not change
 30
 31
 32/** Example of a user-provided scalar CPU callback C function
 33 *  defining a time-dependent coefficient h_i(t) inside the Hamiltonian:
 34 *  h_i(t) = exp(-Omega * t)
 35 */
 36extern "C"
 37int32_t hCoefComplex64(
 38  double time,             //in: time point
 39  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 40  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 41  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 42  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 43  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
 44  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 45{
 46  if (dataType == CUDA_C_64F) {
 47    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 48    for (int64_t i = 0; i < batchSize; ++i) {
 49      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 50      tdCoef[i] = make_cuDoubleComplex(std::exp((-omega) * time), 0.0); // value of the i-th instance of the coefficients batch
 51    }
 52  } else {
 53    return 1; // error code (1: Error)
 54  }
 55  return 0; // error code (0: Success)
 56}
 57
 58
 59/** User-provided gradient callback function for the user-provided
 60 *  scalar callback function with respect to its single parameter Omega.
 61 *  It accumulates the partial derivative dCost/dOmega = 2*Re(dCost/dCoef * dCoef/dOmega),
 62 *  where:
 63 *  - Cost is some user-defined real scalar cost function,
 64 *  - dCost/dCoef is the adjoint of the cost function with respect to the coefficient (or their batch) associated with the callback function,
 65 *  - dCoef/dOmega is the gradient of the coefficient (or their batch) with respect to the parameter Omega:
 66 *    dCoef/dOmega = -time * exp(-Omega * time)
 67 */
 68extern "C"
 69int32_t hCoefGradComplex64(
 70  double time,             //in: time point
 71  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 72  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 73  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 74  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 75  void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
 76  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
 77  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 78{
 79  if (dataType == CUDA_C_64F) {
 80    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 81    for (int64_t i = 0; i < batchSize; ++i) {
 82      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 83      paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
 84        2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(std::exp((-omega) * time) * (-time), 0.0)));
 85    }
 86  } else {
 87    return 1; // error code (1: Error)
 88  }
 89  return 0; // error code (0: Success)
 90}
 91
 92
 93/** Example of a user-provided scalar CPU callback C function
 94 *  defining a time-dependent coefficient f(t) inside the Hamiltonian:
 95 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 96 */
 97extern "C"
 98int32_t fCoefComplex64(
 99  double time,             //in: time point
100  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
101  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
102  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
103  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
104  void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
105  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
106{
107  if (dataType == CUDA_C_64F) {
108    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
109    for (int64_t i = 0; i < batchSize; ++i) {
110      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
111      tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
112    }
113  } else {
114    return 1; // error code (1: Error)
115  }
116  return 0; // error code (0: Success)
117}
118
119
120/** User-provided gradient callback function for the user-provided
121 *  scalar callback function with respect to its single parameter Omega.
122 *  It accumulates the partial derivative dCost/dOmega = 2*Re(dCost/dCoef * dCoef/dOmega),
123 *  where:
124 *  - Cost is some user-defined real scalar cost function,
125 *  - dCost/dCoef is the adjoint of the cost function with respect to the coefficient associated with the callback function,
126 *  - dCoef/dOmega is the gradient of the coefficient with respect to the parameter Omega:
127 *    dCoef/dOmega = i * time * exp(i * Omega * time)
128 *                 = i * time * (cos(Omega * time) + i * sin(Omega * time))
129 *                 = -time * sin(Omega * time) + i * time * cos(Omega * time)
130 */
131extern "C"
132int32_t fCoefGradComplex64(
133  double time,             //in: time point
134  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
135  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
136  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
137  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
138  void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
139  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
140  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
141{
142  if (dataType == CUDA_C_64F) {
143    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
144    for (int64_t i = 0; i < batchSize; ++i) {
145      const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
146      paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
147        2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(-std::sin(omega * time) * time, std::cos(omega * time) * time)));
148    }
149  } else {
150    return 1; // error code (1: Error)
151  }
152  return 0; // error code (0: Success)
153}
154
155
156/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
157 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
158 *  - Method `get()` returns a reference to the constructed Liouvillian operator
159 *  - Destructor releases all resources used by the Liouvillian operator
160 */
161class UserDefinedLiouvillian final
162{
163private:
164  // Data members
165  cudensitymatHandle_t handle;             // library context handle
166  int64_t stateBatchSize;                  // quantum state batch size
167  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
168  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
169  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
170  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
171  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
172  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
173  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
174  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i(t) * X_i} (one-body term)
175  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
176  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix (two-body mixed term)
177  cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)
178
179public:
180
181  // Constructor constructs a user-defined Liouvillian operator
182  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
183                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
184                         int64_t batchSize):                             // batch size for the quantum state
185    handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
186  {
187    // Define the necessary operator tensors in GPU memory (F-order storage!)
188    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
189                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
190                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
191
192    spinYYelems = createInitializeArrayGPU<NumericalType>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
193                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
194                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
195                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
196                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
197
198    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
199                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
200                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
201                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
202                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
203
204    // Construct the necessary Elementary Tensor Operators
205    //  X_i operator
206    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
207                        1,                                   // one-body operator
208                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
209                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
210                        0,                                   // 0 for dense tensors
211                        nullptr,                             // nullptr for dense tensors
212                        dataType,                            // data type
213                        spinXelems,                          // tensor elements in GPU memory
214                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
215                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
216                        &spinX));                            // the created elementary tensor operator
217    //  ZZ_ij = Z_i * Z_j fused operator
218    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
219                        2,                                   // two-body operator
220                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
221                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
222                        0,                                   // 0 for dense tensors
223                        nullptr,                             // nullptr for dense tensors
224                        dataType,                            // data type
225                        spinZZelems,                         // tensor elements in GPU memory
226                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
227                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
228                        &spinZZ));                           // the created elementary tensor operator
229    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
230    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
231                        2,                                   // two-body operator
232                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
233                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
234                        0,                                   // 0 for dense tensors
235                        nullptr,                             // nullptr for dense tensors
236                        dataType,                            // data type
237                        spinYYelems,                         // tensor elements in GPU memory
238                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
239                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
240                        &spinYY));                           // the created elementary tensor operator
241
242    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
243    //  Create an empty operator term
244    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
245                        spaceShape.size(),                   // Hilbert space rank (number of modes)
246                        spaceShape.data(),                   // Hilbert space shape (mode extents)
247                        &oneBodyTerm));                      // the created empty operator term
248    //  Define the operator term: H1 = sum_{i} {h_i(t) * X_i}
249    for (int32_t i = 0; i < spaceShape.size(); ++i) {
250      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
251                          oneBodyTerm,
252                          1,                                                             // number of elementary tensor operators in the product
253                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
254                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
255                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
256                          make_cuDoubleComplex(1.0, 0.0),                                // static coefficient part: Always 64-bit-precision complex number
257                          {hCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr},   // CPU scalar callback function defining the time-dependent coefficient associated with this operator product
258                          {hCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
259                           CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
260    }
261    //  Create an empty operator term
262    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
263                        spaceShape.size(),                   // Hilbert space rank (number of modes)
264                        spaceShape.data(),                   // Hilbert space shape (mode extents)
265                        &twoBodyTerm));                      // the created empty operator term
266    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
267    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
268      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
269        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
270        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
271                            twoBodyTerm,
272                            1,                                                              // number of elementary tensor operators in the product
273                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
274                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
275                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
276                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij static coefficient: Always 64-bit-precision complex number
277                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
278                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
279      }
280    }
281    //  Create an empty operator term
282    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
283                        spaceShape.size(),                   // Hilbert space rank (number of modes)
284                        spaceShape.data(),                   // Hilbert space shape (mode extents)
285                        &noiseTerm));                        // the created empty operator term
286    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
287    for (int32_t i = 0; i < spaceShape.size(); ++i) {
288      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
289                          noiseTerm,
290                          1,                                                              // number of elementary tensor operators in the product
291                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
292                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
293                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
294                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
295                          cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
296                          cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
297    }
298
299    // Construct the full Liouvillian operator as a sum of the created operator terms
300    //  Create an empty operator (super-operator)
301    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
302                        spaceShape.size(),                // Hilbert space rank (number of modes)
303                        spaceShape.data(),                // Hilbert space shape (modes extents)
304                        &liouvillian));                   // the created empty operator (super-operator)
305    //  Append an operator term to the operator (super-operator)
306    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
307                        liouvillian,
308                        oneBodyTerm,                      // appended operator term
309                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
310                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
311                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
312                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
313    //  Append an operator term to the operator (super-operator)
314    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
315                        liouvillian,
316                        twoBodyTerm,                      // appended operator term
317                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
318                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
319                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
320                        {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
321                         CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
322    //  Append an operator term to the operator (super-operator)
323    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
324                        liouvillian,
325                        oneBodyTerm,                      // appended operator term
326                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
327                        make_cuDoubleComplex(0.0, +1.0),  // +i constant
328                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
329                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
330    //  Append an operator term to the operator (super-operator)
331    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
332                        liouvillian,
333                        twoBodyTerm,                      // appended operator term
334                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
335                        make_cuDoubleComplex(0.0, 1.0),   // +i constant
336                        {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
337                        {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
338                         CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
339    //  Append an operator term to the operator (super-operator)
340    const double d = 1.0; // assign some value to the time-independent coefficient
341    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
342                        liouvillian,
343                        noiseTerm,                        // appended operator term
344                        0,                                // operator term action duality as a whole (no duality reversing in this case)
345                        make_cuDoubleComplex(d, 0.0),     // static coefficient associated with the operator term as a whole
346                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
347                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
348  }
349
350  // Destructor destructs the user-defined Liouvillian operator
351  ~UserDefinedLiouvillian()
352  {
353    // Destroy the Liouvillian operator
354    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
355
356    // Destroy operator terms
357    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
358    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
359    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
360
361    // Destroy elementary tensor operators
362    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
363    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
364    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
365
366    // Destroy operator tensors
367    destroyArrayGPU(spinYYelems);
368    destroyArrayGPU(spinZZelems);
369    destroyArrayGPU(spinXelems);
370  }
371
372  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
373  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
374  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
375  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
376  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
377
378  /** Returns the number of externally provided Hamiltonian parameters. */
379  int32_t getNumParameters() const
380  {
381    return 1; // one parameter Omega
382  }
383
384  /** Get access to the constructed Liouvillian operator. */
385  cudensitymatOperator_t & get()
386  {
387    return liouvillian;
388  }
389
390};

Now we can use the defined quantum many-body operator in our main code to compute its action on a mixed quantum state and then backward-differentiate it (compute gradients) with respect to the user-provided real parameter Omega. For simplicity, we pass a made-up adjoint of the output quantum state to the cudensitymatOperatorComputeActionBackwardDiff() call, namely the output quantum state itself; in real scenarios, the adjoint of the output quantum state depends on the user-chosen cost function and is provided by the user. Upon completion of the cudensitymatOperatorComputeActionBackwardDiff() call, the paramsGrad output argument will contain the gradient of the user-defined cost function with respect to the user-provided real parameter Omega. The backward-differentiation API call also returns the adjoint of the input quantum state, for cases where the input quantum state implicitly depends on the user-provided real parameters (for example, when the input quantum state comes from a previous operator action step, as is typical for time integration of quantum dynamics master equations). Note that both output arguments, namely paramsGrad and stateInAdj, are accumulative: results are accumulated into them, and it is the user's responsibility to zero them out before the first call!
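
A minimal sketch of the required zero-initialization, using the variable names from the full example below (the cudaMemset call is one possible way to zero the parameter-gradient array; the example itself allocates it already zero-initialized via createInitializeArrayGPU()):

// Zero out the accumulative backward-differentiation outputs before the first call
HANDLE_CUDA_ERROR(cudaMemset(hamiltonianParamsGrad, 0,
                             sizeof(double) * numParams * batchSize)); // paramsGrad := 0
HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
                    inputStateAdj,  // adjoint of the input quantum state (stateInAdj)
                    0x0));          // default CUDA stream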

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy_grad.h" // user-defined Liouvillian operator example
 13
 14#include <cmath>
 15#include <complex>
 16#include <vector>
 17#include <chrono>
 18#include <iostream>
 19#include <cassert>
 20
 21
 22// Number of times to perform operator action on a quantum state
 23constexpr int NUM_REPEATS = 2;
 24
 25// Logging verbosity
 26bool verbose = true;
 27
 28
 29// Example workflow
 30void exampleWorkflow(cudensitymatHandle_t handle)
 31{
 32  // Define the composite Hilbert space shape and
 33  // quantum state batch size (number of individual quantum states in a batched simulation)
 34  const std::vector<int64_t> spaceShape({2,2,2,2}); // dimensions of quantum degrees of freedom
 35  const int64_t batchSize = 1;                      // number of quantum states per batch
 36
 37  if (verbose) {
 38    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 39    for (const auto & dimsn: spaceShape)
 40      std::cout << dimsn << ",";
 41    std::cout << ")" << std::endl;
 42    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 43  }
 44
 45  // Construct a user-defined Liouvillian operator using a convenience C++ class
 46  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 47  if (verbose)
 48    std::cout << "Constructed the Liouvillian operator\n";
 49
 50  // Set and place external user-provided Hamiltonian parameters in GPU memory
 51  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 52  if (verbose)
 53    std::cout << "Number of external user-provided Hamiltonian parameters = " << numParams << std::endl;
 54  std::vector<double> cpuHamParams(numParams * batchSize);
 55  for (int64_t j = 0; j < batchSize; ++j) {
 56    for (int32_t i = 0; i < numParams; ++i) {
 57      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 58    }
 59  }
 60  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 61  if (verbose)
 62    std::cout << "Created an array of external user-provided Hamiltonian parameters in GPU memory\n";
 63
 64  // Create an array of gradients for the user-provided Hamiltonian parameters in GPU memory
 65  std::vector<double> cpuHamParamsGrad(numParams * batchSize, 0.0);
 66  auto * hamiltonianParamsGrad = static_cast<double *>(createInitializeArrayGPU(cpuHamParamsGrad));
 67  if (verbose)
 68    std::cout << "Created an array of gradients for the external user-provided Hamiltonian parameters in GPU memory\n";
 69
 70  // Declare the input quantum state
 71  cudensitymatState_t inputState;
 72  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 73                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 74                      spaceShape.size(),
 75                      spaceShape.data(),
 76                      batchSize,
 77                      dataType,
 78                      &inputState));
 79
 80  // Query the size of the quantum state storage
 81  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 82  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 83                      inputState,
 84                      1,               // only one storage component (tensor)
 85                      &storageSize));  // storage size in bytes
 86  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 87  if (verbose)
 88    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 89
 90  // Prepare some initial value for the input quantum state batch
 91  std::vector<NumericalType> inputStateValue(stateVolume);
 92  if constexpr (std::is_same_v<NumericalType, float>) {
 93    for (std::size_t i = 0; i < stateVolume; ++i) {
 94      inputStateValue[i] = 1.0f / float(i+1); // just some value
 95    }
 96  } else if constexpr (std::is_same_v<NumericalType, double>) {
 97    for (std::size_t i = 0; i < stateVolume; ++i) {
 98      inputStateValue[i] = 1.0 / double(i+1); // just some value
 99    }
100  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
101    for (std::size_t i = 0; i < stateVolume; ++i) {
102      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
103    }
104  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
105    for (std::size_t i = 0; i < stateVolume; ++i) {
106      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
107    }
108  } else {
109    std::cerr << "Error: Unsupported data type!\n";
110    std::exit(1);
111  }
112  // Allocate initialized GPU storage for the input quantum state with prepared values
113  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
114  if (verbose)
115    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
116
117  // Attach initialized GPU storage to the input quantum state
118  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
119                      inputState,
120                      1,                                                 // only one storage component (tensor)
121                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
122                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
123  if (verbose)
124    std::cout << "Constructed input quantum state\n";
125
126  // Declare the output quantum state of the same shape
127  cudensitymatState_t outputState;
128  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
129                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
130                      spaceShape.size(),
131                      spaceShape.data(),
132                      batchSize,
133                      dataType,
134                      &outputState));
135
136  // Allocate GPU storage for the output quantum state
137  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
138  if (verbose)
139    std::cout << "Allocated output quantum state storage\n";
140
141  // Attach GPU storage to the output quantum state
142  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
143                      outputState,
144                      1,                                                 // only one storage component (tensor)
145                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
146                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
147  if (verbose)
148    std::cout << "Constructed output quantum state\n";
149
150  // Declare the adjoint input quantum state of the same shape
151  cudensitymatState_t inputStateAdj;
152  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
153                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
154                      spaceShape.size(),
155                      spaceShape.data(),
156                      batchSize,
157                      dataType,  // data type must match
158                      &inputStateAdj));
159
160  // Allocate GPU storage for the adjoint input quantum state
161  auto * inputStateAdjElems = createArrayGPU<NumericalType>(stateVolume);
162  if (verbose)
163    std::cout << "Allocated adjoint input quantum state storage\n";
164
165  // Attach GPU storage to the adjoint input quantum state
166  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
167                      inputStateAdj,
168                      1,                                                 // only one storage component (tensor)
169                      std::vector<void*>({inputStateAdjElems}).data(),   // pointer to the GPU storage for the quantum state
170                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
171  if (verbose)
172    std::cout << "Constructed adjoint input quantum state\n";
173
174  // Declare a workspace descriptor
175  cudensitymatWorkspaceDescriptor_t workspaceDescr;
176  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
177
178  // Query free GPU memory
179  std::size_t freeMem = 0, totalMem = 0;
180  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
181  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.45); // take 45% of the free memory for the workspace buffer
182  if (verbose)
183    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
184
185  // Allocate GPU storage for the workspace buffer
186  const std::size_t bufferVolume = freeMem / sizeof(NumericalType);
187  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
188  if (verbose)
189    std::cout << "Allocated workspace buffer of size (bytes) = " << freeMem << std::endl;
190
191  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
192  auto startTime = std::chrono::high_resolution_clock::now();
193  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
194                      liouvillian.get(),
195                      inputState,
196                      outputState,
197                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
198                      freeMem,                   // max available GPU free memory for the workspace
199                      workspaceDescr,            // workspace descriptor
200                      0x0));                     // default CUDA stream
201  auto finishTime = std::chrono::high_resolution_clock::now();
202  std::chrono::duration<double> timeSec = finishTime - startTime;
203  if (verbose)
204    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
205
206  // Query the required workspace buffer size (bytes)
207  std::size_t requiredBufferSize {0};
208  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
209                      workspaceDescr,
210                      CUDENSITYMAT_MEMSPACE_DEVICE,
211                      CUDENSITYMAT_WORKSPACE_SCRATCH,
212                      &requiredBufferSize));
213  if (verbose)
214    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
215
216  if (requiredBufferSize > freeMem) {
217    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
218    std::exit(1);
219  }
220
221  // Attach the workspace buffer to the workspace descriptor
222  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
223                      workspaceDescr,
224                      CUDENSITYMAT_MEMSPACE_DEVICE,
225                      CUDENSITYMAT_WORKSPACE_SCRATCH,
226                      workspaceBuffer,
227                      requiredBufferSize));
228  if (verbose)
229    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
230
231  // Apply the Liouvillian operator to the input quantum state
232  // and accumulate its action into the output quantum state (note the accumulative += semantics)
233  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
234    // Zero out the output quantum state
235    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
236                        outputState,
237                        0x0));
238    if (verbose)
239      std::cout << "Initialized the output quantum state to zero\n";
240    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
241    startTime = std::chrono::high_resolution_clock::now();
242    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
243                        liouvillian.get(),
244                        0.3,                // time point (some value)
245                        batchSize,          // user-defined batch size
246                        numParams,          // number of external user-defined Hamiltonian parameters
247                        hamiltonianParams,  // external Hamiltonian parameters in GPU memory
248                        inputState,         // input quantum state
249                        outputState,        // output quantum state
250                        workspaceDescr,     // workspace descriptor
251                        0x0));              // default CUDA stream
252    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
253    finishTime = std::chrono::high_resolution_clock::now();
254    timeSec = finishTime - startTime;
255    if (verbose)
256      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
257  }
258
259  // Compute the squared norm of the output quantum state
260  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
261  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
262                      outputState,
263                      norm2,
264                      0x0));
265  if (verbose) {
266    std::cout << "Computed the output quantum state norm:\n";
267    printArrayGPU<double>(norm2, batchSize);
268  }
269
270  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
271
272  // Prepare the Liouvillian operator action backward differentiation (needs to be done only once)
273  startTime = std::chrono::high_resolution_clock::now();
274  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareActionBackwardDiff(handle,
275                      liouvillian.get(),
276                      inputState,
277                      outputState,               // adjoint output quantum state is always congruent to the output quantum state
278                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
279                      freeMem,                   // max available GPU free memory for the workspace buffer
280                      workspaceDescr,            // workspace descriptor
281                      0x0));                     // default CUDA stream
282  finishTime = std::chrono::high_resolution_clock::now();
283  timeSec = finishTime - startTime;
284  if (verbose)
285    std::cout << "Operator action backward differentiation preparation time (sec) = " << timeSec.count() << std::endl;
286
287  // Query the required workspace buffer size (bytes)
288  requiredBufferSize = 0;
289  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
290                      workspaceDescr,
291                      CUDENSITYMAT_MEMSPACE_DEVICE,
292                      CUDENSITYMAT_WORKSPACE_SCRATCH,
293                      &requiredBufferSize));
294  if (verbose)
295    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
296
297  if (requiredBufferSize > freeMem) {
298    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
299    std::exit(1);
300  }
301
302  // Attach the workspace buffer to the workspace descriptor
303  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
304                      workspaceDescr,
305                      CUDENSITYMAT_MEMSPACE_DEVICE,
306                      CUDENSITYMAT_WORKSPACE_SCRATCH,
307                      workspaceBuffer,
308                      requiredBufferSize));
309  if (verbose)
310    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
311
312  // Liouvillian operator action backward differentiation:
313  // The adjoint output quantum state, which is always congruent to the output quantum state,
314  // depends on the user-defined cost function, so here we simply pass the previously computed output quantum state.
315  // In real-life applications, the user will pass their adjoint output quantum state, computed for their cost function.
316  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
317    // Zero out the adjoint input quantum state and gradients
318    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
319                        inputStateAdj,
320                        0x0));
321    initializeArrayGPU(std::vector<double>(numParams * batchSize, 0.0), hamiltonianParamsGrad);
322    if (verbose)
323      std::cout << "Initialized the adjoint input quantum state and gradients to zero\n";
324    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
325    startTime = std::chrono::high_resolution_clock::now();
326    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeActionBackwardDiff(handle,
327                        liouvillian.get(),
328                        0.3,                    // time point (some value)
329                        batchSize,              // user-defined batch size
330                        numParams,              // number of external user-defined Hamiltonian parameters
331                        hamiltonianParams,      // external Hamiltonian parameters in GPU memory
332                        inputState,             // input quantum state
333                        outputState,            // adjoint output quantum state (here we just pass the previously computed output quantum state for simplicity)
334                        inputStateAdj,          // adjoint input quantum state
335                        hamiltonianParamsGrad,  // partial gradients with respect to the user-defined real parameters
336                        workspaceDescr,         // workspace descriptor
337                        0x0));                  // default CUDA stream
338    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
339    finishTime = std::chrono::high_resolution_clock::now();
340    timeSec = finishTime - startTime;
341    if (verbose)
342      std::cout << "Operator action backward differentiation computation time (sec) = " << timeSec.count() << std::endl;
343  }
344
345  // Compute the squared norm of the adjoint input quantum state
346  initializeArrayGPU(std::vector<double>(batchSize, 0.0), norm2);
347  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
348                      inputStateAdj,
349                      norm2,
350                      0x0));
351  if (verbose) {
352    std::cout << "Computed the adjoint input quantum state norm:\n";
353    printArrayGPU<double>(norm2, batchSize);
354    std::cout << "Hamiltonian parameters gradients:\n";
355    printArrayGPU<double>(hamiltonianParamsGrad, numParams * batchSize);
356  }
357
358  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
359
360  // Destroy the norm2 array
361  destroyArrayGPU(norm2);
362
363  // Destroy workspace descriptor
364  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
365
366  // Destroy workspace buffer storage
367  destroyArrayGPU(workspaceBuffer);
368
369  // Destroy quantum states
370  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputStateAdj));
371  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
372  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
373
374  // Destroy quantum state storage
375  destroyArrayGPU(inputStateAdjElems);
376  destroyArrayGPU(outputStateElems);
377  destroyArrayGPU(inputStateElems);
378
379  // Destroy external Hamiltonian parameters
380  destroyArrayGPU(static_cast<void *>(hamiltonianParamsGrad));
381  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
382
383  if (verbose)
384    std::cout << "Destroyed resources\n" << std::flush;
385}
386
387
388int main(int argc, char ** argv)
389{
390  // Assign a GPU to the process
391  HANDLE_CUDA_ERROR(cudaSetDevice(0));
392  if (verbose)
393    std::cout << "Set active device\n";
394
395  // Create a library handle
396  cudensitymatHandle_t handle;
397  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
398  if (verbose)
399    std::cout << "Created a library handle\n";
400
401  // Run the example
402  exampleWorkflow(handle);
403
404  // Destroy the library handle
405  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
406  if (verbose)
407    std::cout << "Destroyed the library handle\n";
408
409  HANDLE_CUDA_ERROR(cudaDeviceReset());
410
411  // Done
412  return 0;
413}

Code example (serial batched execution with backward differentiation)#

The following code example extends backward differentiation to batched operators and quantum states. Here the Hamiltonian operator contains batched coefficients, so that each instance of the batched quantum state is acted on by a different instance of the batched operator (with different coefficient values). The full sample code can be found in the NVIDIA/cuQuantum repository (the main serial batched-gradient code and the batched-gradient operator definition, as well as the utility code).
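
Schematically, for batch size batchSize the operator action computed below is

  outputState[k] += L[k](t, Omega[k]) * inputState[k],   k = 0, ..., batchSize-1,

where L[k] denotes the k-th instance of the batched Liouvillian operator (note the accumulative += semantics, also used in the non-batched example).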

The Hamiltonian definition makes both the h(t) and f(t) scalar coefficients batched, which requires user-supplied vector storage for their static and dynamic (total) values. The corresponding scalar and scalar-gradient callback functions likewise operate on an entire batch rather than on a single instance. Furthermore, the params and paramsGrad arrays become genuinely two-dimensional, with the first dimension running over the user-provided real parameters and the second dimension running over the batch instances.
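
To make this layout concrete, below is a minimal self-contained sketch of the F-ordered [numParams][batchSize] addressing that the batched callbacks rely on; the getParam helper and the parameter values are illustrative only and are not part of the sample code.

#include <cstdint>
#include <vector>

// Illustrative helper (not part of the sample): read parameter i of batch
// instance k from an F-ordered [numParams][batchSize] array, in which the
// parameter index runs fastest in memory.
inline double getParam(const double * params, int32_t i, int64_t k, int32_t numParams)
{
  return params[k * numParams + i]; // element params[i][k] in F-order
}

int main()
{
  const int32_t numParams = 1; // a single parameter, Omega
  const int64_t batchSize = 3;
  std::vector<double> params(numParams * batchSize);
  for (int64_t k = 0; k < batchSize; ++k)
    params[k * numParams + 0] = 1.0 / double(k + 1); // Omega[k]: one value per batch instance
  const double omega2 = getParam(params.data(), 0, 2, numParams); // Omega of batch instance 2
  return (omega2 > 0.0) ? 0 : 1; // exit code 0 on the expected value
}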

  1/* Copyright (c) 2026-2026, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Batched time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H[k] = sum_{i} {h_i(t)[k] * X_i}          // transverse-field sum of X_i operators with batched time-dependent h_i(t)[k] coefficients 
 22      + f(t)[k] * sum_{i < j} {g_ij * ZZ_ij}  // batched modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}        // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27/** Define the numerical type and the corresponding GPU data type for the computations (these must match) */
 28using NumericalType = std::complex<double>;      // do not change
 29constexpr cudaDataType_t dataType = CUDA_C_64F;  // do not change
 30
 31
 32/** User-provided batched scalar CPU callback C function
 33 *  defining a batched time-dependent coefficient h_i(t) for all instances
 34 *  of the batch inside the Hamiltonian:
 35 *  h_i(t)[k] = exp(-Omega[k] * t) for k = 0, ..., batchSize-1
 36 */
 37extern "C"
 38int32_t hCoefBatchComplex64(
 39  double time,             //in: time point
 40  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 41  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 42  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 43  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 44  void * scalarStorage,    //inout: CPU-accessible storage for the returned batched coefficient values of shape [0:batchSize-1]
 45  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 46{
 47  if (dataType == CUDA_C_64F) {
 48    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 49    for (int64_t k = 0; k < batchSize; ++k) { // for each instance of the batch
 50      const auto omega = params[k * numParams + 0]; // params[0][k]: 0-th parameter for k-th instance of the batch
 51      tdCoef[k] = make_cuDoubleComplex(std::exp((-omega) * time), 0.0); // value of the k-th instance of the batched coefficients
 52    }
 53  } else {
 54    return 1; // error code (1: Error)
 55  }
 56  return 0; // error code (0: Success)
 57}
 58
 59
 60/** User-provided batched scalar gradient callback function (CPU-side) for the user-provided
 61 *  batched scalar callback function hCoefBatchComplex64, defining the gradients with respect
 62 *  to its single (batched) parameter Omega. It accumulates a partial derivative:
 63 *    2*Re(dCost/dOmega[k]) = 2*Re(dCost/dCoef[k] * dCoef[k]/dOmega[k]),
 64 *  where:
 65 *  - Cost is some user-defined real scalar cost function,
 66 *  - dCost/dCoef[k] is the adjoint of the cost function with respect to the k-th instance of the batched coefficient associated with the callback function,
 67 *  - dCoef[k]/dOmega[k] is the gradient of the k-th instance of the batched coefficient with respect to the parameter Omega[k]:
 68 *    dCoef[k]/dOmega[k] = -t * exp(-Omega[k] * t) for k = 0, ..., batchSize-1
 69 */
 70extern "C"
 71int32_t hCoefBatchGradComplex64(
 72  double time,             //in: time point
 73  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 74  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
 75  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
 76  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
 77  void * scalarGrad,       //in: CPU-accessible storage for the batched adjoint of the batched coefficient of shape [0:batchSize-1]
 78  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradients of the parameter(s) for all instances of the batch
 79  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 80{
 81  if (dataType == CUDA_C_64F) {
 82    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
 83    for (int64_t k = 0; k < batchSize; ++k) { // for each instance of the batch
 84      const auto omega = params[k * numParams + 0]; // params[0][k]: 0-th parameter for k-th instance of the batch
 85      paramsGrad[k * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the k-th instance of the batch, not overwrite it!
 86        2.0 * cuCreal(cuCmul(tdCoefAdjoint[k], make_cuDoubleComplex(std::exp((-omega) * time) * (-time), 0.0)));
 87    }
 88  } else {
 89    return 1; // error code (1: Error)
 90  }
 91  return 0; // error code (0: Success)
 92}
 93
 94
 95/** User-provided batched scalar CPU callback C function
 96 *  defining a batched time-dependent coefficient f(t) for all instances
 97 *  of the batch inside the Hamiltonian:
 98 *  f(t)[k] = exp(i * Omega[k] * t)
 99 *          = cos(Omega[k] * t) + i * sin(Omega[k] * t) for k = 0, ..., batchSize-1
100 */
101extern "C"
102int32_t fCoefBatchComplex64(
103  double time,             //in: time point
104  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
105  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
106  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
107  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
108  void * scalarStorage,    //inout: CPU-accessible storage for the returned batched coefficient values of shape [0:batchSize-1]
109  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
110{
111  if (dataType == CUDA_C_64F) {
112    auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
113    for (int64_t k = 0; k < batchSize; ++k) { // for each instance of the batch
114      const auto omega = params[k * numParams + 0]; // params[0][k]: 0-th parameter for k-th instance of the batch
115      tdCoef[k] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the k-th instance of the batched coefficients
116    }
117  } else {
118    return 1; // error code (1: Error)
119  }
120  return 0; // error code (0: Success)
121}
122
123
124/** User-provided batched scalar gradient callback function (CPU-side) for the user-provided
125 *  batched scalar callback function fCoefBatchComplex64, defining the gradients with respect
126 *  to its single (batched) parameter Omega. It accumulates a partial derivative:
127 *    2*Re(dCost/dOmega[k]) = 2*Re(dCost/dCoef[k] * dCoef[k]/dOmega[k]),
128 *  where:
129 *  - Cost is some user-defined real scalar cost function,
130 *  - dCost/dCoef[k] is the adjoint of the cost function with respect to the k-th instance of the batched coefficient associated with the callback function,
131 *  - dCoef[k]/dOmega[k] is the gradient of the k-th instance of the batched coefficient with respect to the parameter Omega[k]:
132 *    dCoef[k]/dOmega[k] = i * t * exp(i * Omega[k] * t)
133 *                       = i * t * cos(Omega[k] * t) - t * sin(Omega[k] * t) for k = 0, ..., batchSize-1
134 */
135extern "C"
136int32_t fCoefBatchGradComplex64(
137  double time,             //in: time point
138  int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
139  int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
140  const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
141  cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
142  void * scalarGrad,       //in: CPU-accessible storage for the batched adjoint of the batched coefficient of shape [0:batchSize-1]
143  double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradients of the parameter(s) for all instances of the batch
144  cudaStream_t stream)     //in: CUDA stream (default is 0x0)
145{
146  if (dataType == CUDA_C_64F) {
147    const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
148    for (int64_t k = 0; k < batchSize; ++k) {
149      const auto omega = params[k * numParams + 0]; // params[0][k]: 0-th parameter for k-th instance of the batch
150      paramsGrad[k * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the k-th instance of the batch, not overwrite it!
151        2.0 * cuCreal(cuCmul(tdCoefAdjoint[k], make_cuDoubleComplex(-std::sin(omega * time) * time, std::cos(omega * time) * time)));
152    }
153  } else {
154    return 1; // error code (1: Error)
155  }
156  return 0; // error code (0: Success)
157}
158
159
160/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
161 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
162 *  - Method `get()` returns a reference to the constructed Liouvillian operator
163 *  - Destructor releases all resources used by the Liouvillian operator
164 */
165class UserDefinedLiouvillian final
166{
167private:
168  // Data members
169  cudensitymatHandle_t handle;             // library context handle
170  int64_t operBatchSize;                   // batch size for the super-operator
171  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
172  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
173  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
174  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
175  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
176  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
177  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
178  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i(t) * X_i} (one-body term)
179  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
180  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix (two-body mixed term)
181  // Batched coefficients
182  cuDoubleComplex * hCoefsStatic {nullptr}; // static part of the h(t) batched coefficients in the one-body term (of length operBatchSize)
183  cuDoubleComplex * hCoefsTotal {nullptr};  // total h(t) batched coefficients in the one-body term (of length operBatchSize)
184  cuDoubleComplex * fCoefsStaticMinus {nullptr}; // static part of the f(t) batched coefficients in the two-body term (of length operBatchSize)
185  cuDoubleComplex * fCoefsTotalMinus {nullptr};  // total f(t) batched coefficients in the two-body term (of length operBatchSize)
186  cuDoubleComplex * fCoefsStaticPlus {nullptr};  // static part of the f(t) batched coefficients in the dual two-body term (of length operBatchSize)
187  cuDoubleComplex * fCoefsTotalPlus {nullptr};   // total f(t) batched coefficients in the dual two-body term (of length operBatchSize)
188  // Final Liouvillian operator
189  cudensitymatOperator_t liouvillian; // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)
190
191public:
192
193  // Constructor constructs a user-defined Liouvillian operator
194  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
195                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
196                         int64_t batchSize):                             // batch size for the super-operator
197    handle(contextHandle), operBatchSize(batchSize), spaceShape(hilbertSpaceShape)
198  {
199    // Define the necessary operator tensors in GPU memory (F-order storage!)
200    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
201                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
202                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
203
204    spinYYelems = createInitializeArrayGPU<NumericalType>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
205                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
206                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
207                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
208                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
209
210    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
211                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
212                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
213                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
214                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
215
216    // Construct the necessary Elementary Tensor Operators
217    //  X_i operator
218    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
219                        1,                                   // one-body operator
220                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
221                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
222                        0,                                   // 0 for dense tensors
223                        nullptr,                             // nullptr for dense tensors
224                        dataType,                            // data type
225                        spinXelems,                          // tensor elements in GPU memory
226                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
227                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
228                        &spinX));                            // the created elementary tensor operator
229    //  ZZ_ij = Z_i * Z_j fused operator
230    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
231                        2,                                   // two-body operator
232                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
233                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
234                        0,                                   // 0 for dense tensors
235                        nullptr,                             // nullptr for dense tensors
236                        dataType,                            // data type
237                        spinZZelems,                         // tensor elements in GPU memory
238                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
239                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
240                        &spinZZ));                           // the created elementary tensor operator
241    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
242    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
243                        2,                                   // two-body operator
244                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
245                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
246                        0,                                   // 0 for dense tensors
247                        nullptr,                             // nullptr for dense tensors
248                        dataType,                            // data type
249                        spinYYelems,                         // tensor elements in GPU memory
250                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
251                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
252                        &spinYY));                           // the created elementary tensor operator
253
254    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
255    //  Create an empty operator term
256    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
257                        spaceShape.size(),                   // Hilbert space rank (number of modes)
258                        spaceShape.data(),                   // Hilbert space shape (mode extents)
259                        &oneBodyTerm));                      // the created empty operator term
260    //  Define the batched operator term: H1[k] = sum_{i} {h_i(t)[k] * X_i}
261    hCoefsStatic = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
262      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(1.0, 0.0)))); // 1.0 constant for all coefficient instances in the batch
263    hCoefsTotal = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
264      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(0.0, 0.0)))); // storage for the total coefficients for all instances of the batch
265    for (int32_t i = 0; i < spaceShape.size(); ++i) {
266      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProductBatch(handle,
267                          oneBodyTerm,
268                          1,                                                             // number of elementary tensor operators in the product
269                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
270                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
271                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
272                          operBatchSize,                                                 // batch size
273                          hCoefsStatic,                                                  // static part of the h(t) batched coefficients in the one-body term
274                          hCoefsTotal,                                                   // total h(t) batched coefficients in the one-body term
275                          {hCoefBatchComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU batched scalar callback function defining the time-dependent coefficient associated with this operator product
276                          {hCoefBatchGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU batched scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
277    }
278    //  Create an empty operator term
279    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
280                        spaceShape.size(),                   // Hilbert space rank (number of modes)
281                        spaceShape.data(),                   // Hilbert space shape (mode extents)
282                        &twoBodyTerm));                      // the created empty operator term
283    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
284    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
285      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
286        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
287        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
288                            twoBodyTerm,
289                            1,                                                              // number of elementary tensor operators in the product
290                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
291                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
292                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
293                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij static coefficient: Always 64-bit-precision complex number
294                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
295                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
296      }
297    }
298    //  Create an empty operator term
299    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
300                        spaceShape.size(),                   // Hilbert space rank (number of modes)
301                        spaceShape.data(),                   // Hilbert space shape (mode extents)
302                        &noiseTerm));                        // the created empty operator term
303    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
304    for (int32_t i = 0; i < spaceShape.size(); ++i) {
305      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
306                          noiseTerm,
307                          1,                                                              // number of elementary tensor operators in the product
308                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
309                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
310                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
311                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
312                          cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
313                          cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
314    }
315
316    // Construct the full Liouvillian operator as a sum of the created operator terms
317    //  Create an empty operator (super-operator)
318    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
319                        spaceShape.size(),                // Hilbert space rank (number of modes)
320                        spaceShape.data(),                // Hilbert space shape (mode extents)
321                        &liouvillian));                   // the created empty operator (super-operator)
322    //  Append an operator term to the operator (super-operator)
323    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
324                        liouvillian,
325                        oneBodyTerm,                      // appended operator term
326                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
327                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
328                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
329                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
330    //  Append an operator term to the operator (super-operator)
331    fCoefsStaticMinus = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
332      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(0.0, -1.0)))); // -i constant for all coefficient instances in the batch
333    fCoefsTotalMinus = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
334      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(0.0, 0.0)))); // storage for the total coefficients for all instances of the batch
335    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTermBatch(handle,
336                        liouvillian,
337                        twoBodyTerm,                      // appended operator term
338                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
339                        operBatchSize,                    // number of instances of the operator term in the batch (they differ by the coefficient value)
340                        fCoefsStaticMinus,                // static part of the f(t) batched coefficients in the two-body term
341                        fCoefsTotalMinus,                 // total f(t) batched coefficients in the two-body term
342                        {fCoefBatchComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU batched scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
343                        {fCoefBatchGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU batched scalar gradient callback function defining the gradient of the coefficient with respect to parameter Omega
344    //  Append an operator term to the operator (super-operator)
345    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
346                        liouvillian,
347                        oneBodyTerm,                      // appended operator term
348                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
349                        make_cuDoubleComplex(0.0, +1.0),  // +i constant
350                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
351                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
352    //  Append an operator term to the operator (super-operator)
353    fCoefsStaticPlus = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
354      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(0.0, +1.0)))); // +i constant for all coefficient instances in the batch
355    fCoefsTotalPlus = static_cast<cuDoubleComplex *>(createInitializeArrayGPU(
356      std::vector<cuDoubleComplex>(operBatchSize, make_cuDoubleComplex(0.0, 0.0)))); // storage for the total coefficients for all instances of the batch
357    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTermBatch(handle,
358                        liouvillian,
359                        twoBodyTerm,                      // appended operator term
360                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
361                        operBatchSize,                    // number of instances of the operator term in the batch (they differ by the coefficient value)
362                        fCoefsStaticPlus,                 // static part of the f(t) batched coefficients in the dual two-body term
363                        fCoefsTotalPlus,                  // total f(t) batched coefficients in the dual two-body term
364                        {fCoefBatchComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU batched scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
365                        {fCoefBatchGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU batched scalar gradient callback function defining the gradient of the coefficient with respect to parameter Omega
366    //  Append an operator term to the operator (super-operator)
367    const double d = 1.0; // assign some value to the time-independent coefficient
368    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
369                        liouvillian,
370                        noiseTerm,                        // appended operator term
371                        0,                                // operator term action duality as a whole (no duality reversing in this case)
372                        make_cuDoubleComplex(d, 0.0),     // static coefficient associated with the operator term as a whole
373                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
374                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
375  }
376
377  // Destructor releases all resources used by the Liouvillian operator
378  ~UserDefinedLiouvillian()
379  {
380    // Destroy the Liouvillian operator
381    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
382
383    // Destroy operator terms
384    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
385    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
386    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
387
388    // Destroy elementary tensor operators
389    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
390    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
391    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
392
393    // Destroy the batched coefficients
394    destroyArrayGPU(fCoefsTotalPlus);
395    destroyArrayGPU(fCoefsStaticPlus);
396    destroyArrayGPU(fCoefsTotalMinus);
397    destroyArrayGPU(fCoefsStaticMinus);
398    destroyArrayGPU(hCoefsTotal);
399    destroyArrayGPU(hCoefsStatic);
400
401    // Destroy operator tensors
402    destroyArrayGPU(spinYYelems);
403    destroyArrayGPU(spinZZelems);
404    destroyArrayGPU(spinXelems);
405  }
406
407  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
408  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
409  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
410  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
411  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
412
413  /** Returns the number of externally provided Hamiltonian parameters. */
414  int32_t getNumParameters() const
415  {
416    return 1; // one parameter Omega
417  }
418
419  /** Get access to the constructed Liouvillian operator. */
420  cudensitymatOperator_t & get()
421  {
422    return liouvillian;
423  }
424
425};

Once the batched Liouvillian operator has been defined, the rest of the code logic is largely identical to the non-batched case.
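
The example prints the accumulated parameter gradients with printArrayGPU; in practice one typically also copies them back to the host, e.g. for an optimizer step. A minimal sketch of such a copy, assuming the variable names from the listing below and the HANDLE_CUDA_ERROR macro from helpers.h:

// Illustrative only: copy the accumulated parameter gradients from GPU to CPU
// after cudensitymatOperatorComputeActionBackwardDiff has completed.
std::vector<double> cpuGrads(numParams * batchSize);
HANDLE_CUDA_ERROR(cudaMemcpy(cpuGrads.data(), hamiltonianParamsGrad,
                             sizeof(double) * numParams * batchSize,
                             cudaMemcpyDeviceToHost));
// cpuGrads[k * numParams + i]: accumulated 2*Re(dCost/dParam_i[k]) for batch instance k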

  1/* Copyright (c) 2026-2026, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Batched time-dependent transverse-field Ising Hamiltonian operator
 11// with ordered and fused ZZ terms, plus fused unitary dissipation terms
 12#include "transverse_ising_full_fused_noisy_batch_grad.h" // user-defined batched Liouvillian operator example
 13
 14#include <cmath>
 15#include <complex>
 16#include <vector>
 17#include <chrono>
 18#include <iostream>
 19#include <cassert>
 20
 21
 22// Number of times to perform operator action on a quantum state
 23constexpr int NUM_REPEATS = 2;
 24
 25// Logging verbosity
 26bool verbose = true;
 27
 28
 29// Example workflow
 30void exampleWorkflow(cudensitymatHandle_t handle)
 31{
 32  // Define the composite Hilbert space shape and
 33  // quantum state batch size (number of individual quantum states in a batched simulation)
 34  const std::vector<int64_t> spaceShape({2,2,2,2}); // dimensions of quantum degrees of freedom
 35  const int64_t batchSize = 3;                      // number of quantum states per batch
 36
 37  if (verbose) {
 38    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 39    for (const auto & dimsn: spaceShape)
 40      std::cout << dimsn << ",";
 41    std::cout << ")" << std::endl;
 42    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 43  }
 44
 45  // Construct a user-defined batched Liouvillian operator using a convenience C++ class
 46  // Note that the constructed Liouvillian operator has some batched coefficients
 47  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 48  if (verbose)
 49    std::cout << "Constructed the Liouvillian operator\n";
 50
 51  // Set and place external user-provided Hamiltonian parameters in GPU memory
 52  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
 53  if (verbose)
 54    std::cout << "Number of external user-provided Hamiltonian parameters = " << numParams << std::endl;
 55  std::vector<double> cpuHamParams(numParams * batchSize);
 56  for (int64_t j = 0; j < batchSize; ++j) {
 57    for (int32_t i = 0; i < numParams; ++i) {
 58      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
 59    }
 60  }
 61  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
 62  if (verbose)
 63    std::cout << "Created an array of external user-provided Hamiltonian parameters in GPU memory\n";
 64
 65  // Create an array of gradients for the user-provided Hamiltonian parameters in GPU memory
 66  std::vector<double> cpuHamParamsGrad(numParams * batchSize, 0.0);
 67  auto * hamiltonianParamsGrad = static_cast<double *>(createInitializeArrayGPU(cpuHamParamsGrad));
 68  if (verbose)
 69    std::cout << "Created an array of gradients for the external user-provided Hamiltonian parameters in GPU memory\n";
 70
 71  // Declare the input quantum state
 72  cudensitymatState_t inputState;
 73  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 74                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 75                      spaceShape.size(),
 76                      spaceShape.data(),
 77                      batchSize,
 78                      dataType,
 79                      &inputState));
 80
 81  // Query the size of the quantum state storage
 82  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 83  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 84                      inputState,
 85                      1,               // only one storage component (tensor)
 86                      &storageSize));  // storage size in bytes
 87  const std::size_t stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 88  if (verbose)
 89    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 90
 91  // Prepare some initial value for the input quantum state batch
 92  std::vector<NumericalType> inputStateValue(stateVolume);
 93  if constexpr (std::is_same_v<NumericalType, float>) {
 94    for (std::size_t i = 0; i < stateVolume; ++i) {
 95      inputStateValue[i] = 1.0f / float(i+1); // just some value
 96    }
 97  } else if constexpr (std::is_same_v<NumericalType, double>) {
 98    for (std::size_t i = 0; i < stateVolume; ++i) {
 99      inputStateValue[i] = 1.0 / double(i+1); // just some value
100    }
101  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
102    for (std::size_t i = 0; i < stateVolume; ++i) {
103      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
104    }
105  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
106    for (std::size_t i = 0; i < stateVolume; ++i) {
107      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
108    }
109  } else {
110    std::cerr << "Error: Unsupported data type!\n";
111    std::exit(1);
112  }
113  // Allocate initialized GPU storage for the input quantum state with prepared values
114  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
115  if (verbose)
116    std::cout << "Allocated input quantum state storage and initialized it to some value\n";
117
118  // Attach initialized GPU storage to the input quantum state
119  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
120                      inputState,
121                      1,                                                 // only one storage component (tensor)
122                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
123                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
124  if (verbose)
125    std::cout << "Constructed input quantum state\n";
126
127  // Declare the output quantum state of the same shape
128  cudensitymatState_t outputState;
129  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
130                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
131                      spaceShape.size(),
132                      spaceShape.data(),
133                      batchSize,
134                      dataType,
135                      &outputState));
136
137  // Allocate GPU storage for the output quantum state
138  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
139  if (verbose)
140    std::cout << "Allocated output quantum state storage\n";
141
142  // Attach GPU storage to the output quantum state
143  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
144                      outputState,
145                      1,                                                 // only one storage component (tensor)
146                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
147                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
148  if (verbose)
149    std::cout << "Constructed output quantum state\n";
150
151  // Declare the adjoint input quantum state of the same shape
152  cudensitymatState_t inputStateAdj;
153  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
154                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
155                      spaceShape.size(),
156                      spaceShape.data(),
157                      batchSize,
158                      dataType,  // data type must match
159                      &inputStateAdj));
160
161  // Allocate GPU storage for the adjoint input quantum state
162  auto * inputStateAdjElems = createArrayGPU<NumericalType>(stateVolume);
163  if (verbose)
164    std::cout << "Allocated adjoint input quantum state storage\n";
165
166  // Attach GPU storage to the adjoint input quantum state
167  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
168                      inputStateAdj,
169                      1,                                                 // only one storage component (tensor)
170                      std::vector<void*>({inputStateAdjElems}).data(),   // pointer to the GPU storage for the quantum state
171                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
172  if (verbose)
173    std::cout << "Constructed adjoint input quantum state\n";
174
175  // Declare a workspace descriptor
176  cudensitymatWorkspaceDescriptor_t workspaceDescr;
177  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
178
179  // Query free GPU memory
180  std::size_t freeMem = 0, totalMem = 0;
181  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
182  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.45); // take 45% of the free memory for the workspace buffer
183  if (verbose)
184    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
185
186  // Allocate GPU storage for the workspace buffer
187  const std::size_t bufferVolume = freeMem / sizeof(NumericalType);
188  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
189  if (verbose)
190    std::cout << "Allocated workspace buffer of size (bytes) = " << freeMem << std::endl;
191
192  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
193  auto startTime = std::chrono::high_resolution_clock::now();
194  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
195                      liouvillian.get(),
196                      inputState,
197                      outputState,
198                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
199                      freeMem,                   // max available GPU free memory for the workspace
200                      workspaceDescr,            // workspace descriptor
201                      0x0));                     // default CUDA stream
202  auto finishTime = std::chrono::high_resolution_clock::now();
203  std::chrono::duration<double> timeSec = finishTime - startTime;
204  if (verbose)
205    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;
206
207  // Query the required workspace buffer size (bytes)
208  std::size_t requiredBufferSize {0};
209  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
210                      workspaceDescr,
211                      CUDENSITYMAT_MEMSPACE_DEVICE,
212                      CUDENSITYMAT_WORKSPACE_SCRATCH,
213                      &requiredBufferSize));
214  if (verbose)
215    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
216
217  if (requiredBufferSize > freeMem) {
218    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
219    std::exit(1);
220  }
221
222  // Attach the workspace buffer to the workspace descriptor
223  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
224                      workspaceDescr,
225                      CUDENSITYMAT_MEMSPACE_DEVICE,
226                      CUDENSITYMAT_WORKSPACE_SCRATCH,
227                      workspaceBuffer,
228                      requiredBufferSize));
229  if (verbose)
230    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
231
232  // Apply the Liouvillian operator to the input quantum state
233  // and accumulate its action into the output quantum state (note the accumulative += semantics)
234  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
235    // Zero out the output quantum state
236    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
237                        outputState,
238                        0x0));
239    if (verbose)
240      std::cout << "Initialized the output quantum state to zero\n";
241    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
242    startTime = std::chrono::high_resolution_clock::now();
243    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
244                        liouvillian.get(),
245                        0.3,                // time point (some value)
246                        batchSize,          // user-defined batch size
247                        numParams,          // number of external user-defined Hamiltonian parameters
248                        hamiltonianParams,  // external Hamiltonian parameters in GPU memory
249                        inputState,         // input quantum state
250                        outputState,        // output quantum state
251                        workspaceDescr,     // workspace descriptor
252                        0x0));              // default CUDA stream
253    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
254    finishTime = std::chrono::high_resolution_clock::now();
255    timeSec = finishTime - startTime;
256    if (verbose)
257      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
258  }
259
260  // Compute the squared norm of the output quantum state
261  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
262  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
263                      outputState,
264                      norm2,
265                      0x0));
266  if (verbose) {
267    std::cout << "Computed the output quantum state norm:\n";
268    printArrayGPU<double>(norm2, batchSize);
269  }
270
271  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
272
273  // Prepare the Liouvillian operator action backward differentiation (needs to be done only once)
274  startTime = std::chrono::high_resolution_clock::now();
275  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareActionBackwardDiff(handle,
276                      liouvillian.get(),
277                      inputState,
278                      outputState,               // adjoint output quantum state is always congruent to the output quantum state
279                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
280                      freeMem,                   // max available GPU free memory for the workspace buffer
281                      workspaceDescr,            // workspace descriptor
282                      0x0));                     // default CUDA stream
283  finishTime = std::chrono::high_resolution_clock::now();
284  timeSec = finishTime - startTime;
285  if (verbose)
286    std::cout << "Operator action backward differentiation preparation time (sec) = " << timeSec.count() << std::endl;
287
288  // Query the required workspace buffer size (bytes)
289  requiredBufferSize = 0;
290  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
291                      workspaceDescr,
292                      CUDENSITYMAT_MEMSPACE_DEVICE,
293                      CUDENSITYMAT_WORKSPACE_SCRATCH,
294                      &requiredBufferSize));
295  if (verbose)
296    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
297
298  if (requiredBufferSize > freeMem) {
299    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
300    std::exit(1);
301  }
302
303  // Attach the workspace buffer to the workspace descriptor
304  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
305                      workspaceDescr,
306                      CUDENSITYMAT_MEMSPACE_DEVICE,
307                      CUDENSITYMAT_WORKSPACE_SCRATCH,
308                      workspaceBuffer,
309                      requiredBufferSize));
310  if (verbose)
311    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
312
313  // Liouvillian operator action backward differentiation:
314  // The adjoint output quantum state, which is always congruent to the output quantum state,
315  // depends on the user-defined cost function, so here we simply pass the previously computed output quantum state.
316  // In real-life applications, the user will pass their adjoint output quantum state, computed for their cost function.
317  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
318    // Zero out the adjoint input quantum state and gradients
319    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
320                        inputStateAdj,
321                        0x0));
322    initializeArrayGPU(std::vector<double>(numParams * batchSize, 0.0), hamiltonianParamsGrad);
323    if (verbose)
324      std::cout << "Initialized the adjoint input quantum state and gradients to zero\n";
325    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
326    startTime = std::chrono::high_resolution_clock::now();
327    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeActionBackwardDiff(handle,
328                        liouvillian.get(),
329                        0.3,                    // time point (some value)
330                        batchSize,              // user-defined batch size
331                        numParams,              // number of external user-defined Hamiltonian parameters
332                        hamiltonianParams,      // external Hamiltonian parameters in GPU memory
333                        inputState,             // input quantum state
334                        outputState,            // adjoint output quantum state (here we just pass the previously computed output quantum state for simplicity)
335                        inputStateAdj,          // adjoint input quantum state
336                        hamiltonianParamsGrad,  // partial gradients with respect to the user-defined real parameters
337                        workspaceDescr,         // workspace descriptor
338                        0x0));                  // default CUDA stream
339    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
340    finishTime = std::chrono::high_resolution_clock::now();
341    timeSec = finishTime - startTime;
342    if (verbose)
343      std::cout << "Operator action backward differentiation computation time (sec) = " << timeSec.count() << std::endl;
344  }
345
346  // Compute the squared norm of the adjoint input quantum state
347  initializeArrayGPU(std::vector<double>(batchSize, 0.0), norm2);
348  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
349                      inputStateAdj,
350                      norm2,
351                      0x0));
352  if (verbose) {
353    std::cout << "Computed the adjoint input quantum state norm:\n";
354    printArrayGPU<double>(norm2, batchSize);
355    std::cout << "Hamiltonian parameters gradients:\n";
356    printArrayGPU<double>(hamiltonianParamsGrad, numParams * batchSize);
357  }
358
359  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
360
361  // Destroy the norm2 array
362  destroyArrayGPU(norm2);
363
364  // Destroy workspace descriptor
365  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
366
367  // Destroy workspace buffer storage
368  destroyArrayGPU(workspaceBuffer);
369
370  // Destroy quantum states
371  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputStateAdj));
372  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
373  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
374
375  // Destroy quantum state storage
376  destroyArrayGPU(inputStateAdjElems);
377  destroyArrayGPU(outputStateElems);
378  destroyArrayGPU(inputStateElems);
379
380  // Destroy external Hamiltonian parameters
381  destroyArrayGPU(static_cast<void *>(hamiltonianParamsGrad));
382  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
383
384  if (verbose)
385    std::cout << "Destroyed resources\n" << std::flush;
386}
387
388
389int main(int argc, char ** argv)
390{
391  // Assign a GPU to the process
392  HANDLE_CUDA_ERROR(cudaSetDevice(0));
393  if (verbose)
394    std::cout << "Set active device\n";
395
396  // Create a library handle
397  cudensitymatHandle_t handle;
398  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
399  if (verbose)
400    std::cout << "Created a library handle\n";
401
402  // Run the example
403  exampleWorkflow(handle);
404
405  // Destroy the library handle
406  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
407  if (verbose)
408    std::cout << "Destroyed the library handle\n";
409
410  HANDLE_CUDA_ERROR(cudaDeviceReset());
411
412  // Done
413  return 0;
414}

Code example (serial execution of operator eigenspectrum computation)#

The following code example illustrates how to use the cuDensityMat library for computing the extreme eigenspectrum of a given operator. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial eigenspectrum code and operator definition as well as the utility code).

First, similarly to the previous examples, we define a transverse-field Ising Hamiltonian with fused ZZ terms, whose eigenspectrum we want to compute. Specifically, in this case, we want to compute a number of the smallest real eigenvalues together with their corresponding eigenvectors (pure quantum states). For simplicity, the operator is time-independent (static).
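Concretely, reading off the coefficient values assigned in the header below (with 0-based site indices i, j matching the loops in the code), the operator is

$$
H \;=\; \sum_{i} h_i\, X_i \;+\; \sum_{i<j} g_{ij}\, Z_i Z_j,
\qquad h_i = \frac{1}{i+1},
\qquad g_{ij} = -\frac{1}{i+j+1}.
$$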

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Transverse-field Ising Hamiltonian operator with ordered and fused ZZ terms:
 20    H = sum_{i} {h_i * X_i}         // transverse field sum of X_i operators with static h_i coefficients 
 21      + sum_{i < j} {g_ij * ZZ_ij}  // sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
 22*/
 23
 24/** Define the numerical type and data type for the GPU computations (same) */
 25using NumericalType = std::complex<double>;      // do not change
 26constexpr cudaDataType_t dataType = CUDA_C_64F;  // do not change
 27
 28
 29/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 30 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 31 *  - Method `get()` returns a reference to the constructed Liouvillian operator
 32 *  - Destructor releases all resources used by the Liouvillian operator
 33 */
 34class UserDefinedLiouvillian final
 35{
 36private:
 37  // Data members
 38  cudensitymatHandle_t handle;             // library context handle
 39  int64_t stateBatchSize;                  // quantum state batch size
 40  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
 41  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
 42  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
 43  cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
 44  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
 45  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i * X_i} (one-body term)
 46  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = sum_{i < j} {g_ij * ZZ_ij} (two-body term)
 47  cudensitymatOperator_t liouvillian;      // full operator: H = H1 + H2
 48
 49public:
 50
 51  // Constructor constructs a user-defined Liouvillian operator
 52  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
 53                         const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
 54                         int64_t batchSize):                             // batch size for the quantum state
 55    handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
 56  {
 57    // Define the necessary operator tensors in GPU memory (F-order storage!)
 58    spinXelems = createInitializeArrayGPU<NumericalType>(  // X[i0; j0]
 59                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 60                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 61
 62    spinZZelems = createInitializeArrayGPU<NumericalType>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
 63                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
 64                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
 65                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
 66                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
 67
 68    // Construct the necessary Elementary Tensor Operators
 69    //  X_i operator
 70    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
 71                        1,                                   // one-body operator
 72                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
 73                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
 74                        0,                                   // 0 for dense tensors
 75                        nullptr,                             // nullptr for dense tensors
 76                        dataType,                            // data type
 77                        spinXelems,                          // tensor elements in GPU memory
 78                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
 79                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
 80                        &spinX));                            // the created elementary tensor operator
 81    //  ZZ_ij = Z_i * Z_j fused operator
 82    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
 83                        2,                                   // two-body operator
 84                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
 85                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
 86                        0,                                   // 0 for dense tensors
 87                        nullptr,                             // nullptr for dense tensors
 88                        dataType,                            // data type
 89                        spinZZelems,                         // tensor elements in GPU memory
 90                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
 91                        cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
 92                        &spinZZ));                           // the created elementary tensor operator
 93
 94    // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
 95    //  Create an empty operator term
 96    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
 97                        spaceShape.size(),                   // Hilbert space rank (number of modes)
 98                        spaceShape.data(),                   // Hilbert space shape (mode extents)
 99                        &oneBodyTerm));                      // the created empty operator term
100    //  Define the operator term: H1 = sum_{i} {h_i * X_i}
101    for (int32_t i = 0; i < spaceShape.size(); ++i) {
102      const double h_i = 1.0 / static_cast<double>(i+1); // assign some value to the time-independent h_i coefficient
103      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
104                          oneBodyTerm,
105                          1,                                                             // number of elementary tensor operators in the product
106                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
107                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
108                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
109                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
110                          cudensitymatScalarCallbackNone,                                // no time-dependent coefficient associated with this operator product
111                          cudensitymatScalarGradientCallbackNone));                      // no coefficient gradient associated with this operator product
112    }
113    //  Create an empty operator term
114    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
115                        spaceShape.size(),                   // Hilbert space rank (number of modes)
116                        spaceShape.data(),                   // Hilbert space shape (mode extents)
117                        &twoBodyTerm));                      // the created empty operator term
118    //  Define the operator term: H2 = sum_{i < j} {g_ij * ZZ_ij}
119    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
120      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
121        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
122        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
123                            twoBodyTerm,
124                            1,                                                              // number of elementary tensor operators in the product
125                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
126                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
127                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
128                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
129                            cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
130                            cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
131      }
132    }
133
134    // Construct the full Liouvillian operator as a sum of the operator terms
135    //  Create an empty operator
136    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
137                        spaceShape.size(),                // Hilbert space rank (number of modes)
138                        spaceShape.data(),                // Hilbert space shape (mode extents)
139                        &liouvillian));                   // the created empty operator
140    //  Append an operator term to the operator
141    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
142                        liouvillian,
143                        oneBodyTerm,                      // appended operator term
144                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
145                        make_cuDoubleComplex(1.0, 0.0),   // constant coefficient associated with the operator term as a whole
146                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
147                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
148    //  Append the second operator term to the operator
149    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
150                        liouvillian,
151                        twoBodyTerm,                      // appended operator term
152                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
153                        make_cuDoubleComplex(1.0, 0.0),   // constant coefficient associated with the operator term as a whole
154                        cudensitymatScalarCallbackNone,   // no time-dependent coefficient associated with the operator term as a whole
155                        cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator term as a whole
156  }
157
158  // Destructor destructs the user-defined Liouvillian operator
159  ~UserDefinedLiouvillian()
160  {
161    // Destroy the Liouvillian operator
162    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
163
164    // Destroy operator terms
165    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
166    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
167
168    // Destroy elementary tensor operators
169    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
170    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
171
172    // Destroy operator tensors
173    destroyArrayGPU(spinZZelems);
174    destroyArrayGPU(spinXelems);
175  }
176
177  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
178  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
179  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
180  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
181  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
182
183  /** Returns the number of externally provided Hamiltonian parameters. */
184  int32_t getNumParameters() const
185  {
186    return 0; // no free parameters
187  }
188
189  /** Get access to the constructed Liouvillian operator. */
190  cudensitymatOperator_t & get()
191  {
192    return liouvillian;
193  }
194
195};

Once the operator has been defined, we follow the standard steps to create the quantum states that will store its eigenstates, then prepare the operator eigenspectrum computation, and, finally, compute the eigenspectrum. Note that the quantum states passed to the eigenspectrum compute call to receive the computed eigenstates also serve as the initial guesses for the first Krylov subspace block; if the block size is smaller than the number of requested eigenstates, only the leading quantum states are used as guesses.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering and spin-operator fusion
 11#include "transverse_ising_full_fused.h"  // user-defined Liouvillian operator example
 12
 13#include <cmath>
 14#include <complex>
 15#include <vector>
 16#include <chrono>
 17#include <iostream>
 18#include <cassert>
 19
 20
 21// Logging verbosity
 22bool verbose = true;
 23
 24
 25// Example workflow
 26void exampleWorkflow(cudensitymatHandle_t handle)
 27{
 28  // Define the composite Hilbert space shape and
 29  // quantum state batch size (number of individual quantum states in a batched simulation)
 30  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 31  const int64_t batchSize = 1;        // number of quantum states per batch (currently only 1 state per batch)
 32  const int32_t numEigenStates = 4;   // number of eigenstates to compute
 33
 34  if (verbose) {
 35    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 36    for (const auto & dimsn: spaceShape)
 37      std::cout << dimsn << ",";
 38    std::cout << ")" << std::endl;
 39    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 40  }
 41
 42  // Construct a user-defined Liouvillian operator using a convenience C++ class
 43  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
 44  if (verbose)
 45    std::cout << "Constructed the Liouvillian operator\n";
 46
 47  // Create quantum states to store the eigenstates
 48  std::size_t stateVolume {0};
 49  std::vector<cudensitymatState_t> eigenStates(numEigenStates);
 50  std::vector<void *> eigenStatesElems(numEigenStates);
 51  for (int32_t id = 0; id < numEigenStates; ++id) {
 52
 53    // Declare the quantum state
 54    HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 55                        CUDENSITYMAT_STATE_PURITY_PURE,  // pure (state vector)
 56                        spaceShape.size(),
 57                        spaceShape.data(),
 58                        batchSize,
 59                        dataType,
 60                        &eigenStates[id]));
 61
 62    // Query the size of the quantum state storage
 63    std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 64    HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 65                        eigenStates[id],
 66                        1,               // only one storage component (tensor)
 67                        &storageSize));  // storage size in bytes
 68    stateVolume = storageSize / sizeof(NumericalType);  // quantum state tensor volume (number of elements)
 69    if (verbose)
 70      std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 71
 72    // Prepare some initial value for the quantum state
 73    std::vector<NumericalType> stateValue(stateVolume);
 74    if constexpr (std::is_same_v<NumericalType, double>) {
 75      for (std::size_t i = 0; i < stateVolume; ++i) {
 76        stateValue[i] = 1.0 / double(id*5 + i+1); // just some value
 77      }
 78    } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
 79      for (std::size_t i = 0; i < stateVolume; ++i) {
 80        stateValue[i] = NumericalType{1.0 / double(id*5 + i+1), -1.0 / double(id*3 + i+2)}; // just some value
 81      }
 82    } else {
 83      std::cerr << "Error: Unsupported data type!\n";
 84      std::exit(1);
 85    }
 86    // Allocate initialized GPU storage for the quantum state with prepared values
 87    eigenStatesElems[id] = createInitializeArrayGPU(stateValue);
 88    if (verbose)
 89      std::cout << "Allocated quantum state storage and initialized it to some value\n";
 90
 91    // Attach initialized GPU storage to the quantum state
 92    HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 93                        eigenStates[id],
 94                        1,                                                 // only one storage component (tensor)
 95                        std::vector<void*>({eigenStatesElems[id]}).data(), // pointer to the GPU storage for the quantum state
 96                        std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 97    if (verbose)
 98      std::cout << "Constructed quantum state\n";
 99  }
100
101  // Allocate storage for the eigenvalues and convergence tolerances
102  void * eigenvalues = createArrayGPU<NumericalType>(numEigenStates * batchSize);
103  std::vector<double> tolerances(numEigenStates * batchSize, 1e-6);
104
105  // Declare a workspace descriptor
106  cudensitymatWorkspaceDescriptor_t workspaceDescr;
107  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
108
109  // Query free GPU memory
110  std::size_t freeMem = 0, totalMem = 0;
111  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
112  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
113  if (verbose)
114    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
115
116  // Create the operator eigenspectrum computation object
117  cudensitymatOperatorSpectrum_t spectrum;
118  HANDLE_CUDM_ERROR(cudensitymatCreateOperatorSpectrum(handle,
119                      liouvillian.get(),                             // operator whose eigenspectrum is requested
120                      1,                                             // Hermitian operator
121                      CUDENSITYMAT_OPERATOR_SPECTRUM_SMALLEST_REAL,  // request the smallest real eigenvalues
122                      &spectrum));                                   // the created operator eigenspectrum computation object
123
124  // Prepare the operator eigenspectrum computation (needs to be done only once)
125  auto startTime = std::chrono::high_resolution_clock::now();
126  HANDLE_CUDM_ERROR(cudensitymatOperatorSpectrumPrepare(handle,
127                      spectrum,                  // operator eigenspectrum computation object
128                      numEigenStates,            // number of eigenstates to compute
129                      eigenStates[0],            // representative quantum state (defines the state shape and data type)
130                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
131                      freeMem,                   // max available GPU free memory for the workspace
132                      workspaceDescr,            // workspace descriptor
133                      0x0));                     // default CUDA stream
134  auto finishTime = std::chrono::high_resolution_clock::now();
135  std::chrono::duration<double> timeSec = finishTime - startTime;
136  if (verbose)
137    std::cout << "Operator eigenspectrum preparation time (sec) = " << timeSec.count() << std::endl;
138
139  // Query the required workspace buffer size (bytes)
140  std::size_t requiredBufferSize {0};
141  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
142                      workspaceDescr,
143                      CUDENSITYMAT_MEMSPACE_DEVICE,
144                      CUDENSITYMAT_WORKSPACE_SCRATCH,
145                      &requiredBufferSize));
146  if (verbose)
147    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
148
149  // Allocate GPU storage for the workspace buffer
150  const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
151  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
152  if (verbose)
153    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
154
155  // Attach the workspace buffer to the workspace descriptor
156  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
157                      workspaceDescr,
158                      CUDENSITYMAT_MEMSPACE_DEVICE,
159                      CUDENSITYMAT_WORKSPACE_SCRATCH,
160                      workspaceBuffer,
161                      requiredBufferSize));
162  if (verbose)
163    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
164
165  // Compute the operator eigenspectrum
166  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
167  startTime = std::chrono::high_resolution_clock::now();
168  HANDLE_CUDM_ERROR(cudensitymatOperatorSpectrumCompute(handle,
169                      spectrum,            // operator eigenspectrum computation object
170                      0.0,                 // time point (the operator is time-independent)
171                      batchSize,           // user-defined batch size
172                      0,                   // number of external user-defined Hamiltonian parameters (none here)
173                      nullptr,             // external Hamiltonian parameters (none here)
174                      numEigenStates,      // number of requested eigenstates
175                      eigenStates.data(),  // quantum states: initial guesses on input, computed eigenstates on output
176                      eigenvalues,         // computed eigenvalues in GPU memory
177                      tolerances.data(),   // convergence tolerances on input, achieved residual norms on output
178                      workspaceDescr,      // workspace descriptor
179                      0x0));               // default CUDA stream
180  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
181  finishTime = std::chrono::high_resolution_clock::now();
182  timeSec = finishTime - startTime;
183  if (verbose)
184    std::cout << "Operator eigenspectrum computation time (sec) = " << timeSec.count() << std::endl;
185
186  // Print the eigenvalues
187  if (verbose) {
188    std::cout << "Eigenvalues:\n";
189    printArrayGPU<NumericalType>(eigenvalues, numEigenStates);
190  }
191
192  // Print the residual norms
193  if (verbose) {
194    std::cout << "Residual norms:\n";
195    printArrayCPU<double>(tolerances.data(), numEigenStates);
196  }
197
198  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
199
200  // Destroy workspace descriptor
201  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
202
203  // Destroy workspace buffer storage
204  destroyArrayGPU(workspaceBuffer);
205
206  // Destroy operator eigenspectrum computation object
207  HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorSpectrum(spectrum));
208
209  // Destroy eigenvalues storage
210  destroyArrayGPU(eigenvalues);
211
212  // Destroy quantum states
213  for (int32_t id = 0; id < numEigenStates; ++id)
214    HANDLE_CUDM_ERROR(cudensitymatDestroyState(eigenStates[id]));
215
216  // Destroy quantum state storage
217  for (int32_t id = 0; id < numEigenStates; ++id)
218    destroyArrayGPU(eigenStatesElems[id]);
219
220  if (verbose)
221    std::cout << "Destroyed resources\n" << std::flush;
222}
223
224
225int main(int argc, char ** argv)
226{
227  // Assign a GPU to the process
228  HANDLE_CUDA_ERROR(cudaSetDevice(0));
229  if (verbose)
230    std::cout << "Set active device\n";
231
232  // Create a library handle
233  cudensitymatHandle_t handle;
234  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
235  if (verbose)
236    std::cout << "Created a library handle\n";
237
238  // Run the example
239  exampleWorkflow(handle);
240
241  // Destroy the library handle
242  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
243  if (verbose)
244    std::cout << "Destroyed the library handle\n";
245
246  HANDLE_CUDA_ERROR(cudaDeviceReset());
247
248  // Done
249  return 0;
250}

Useful tips#

  • For debugging, one can set the environment variable CUDENSITYMAT_LOG_LEVEL=n, where the level n = 0, 1, …, 5 corresponds to the logger verbosity described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout; a minimal usage example is shown after the table.

| Level | Summary           | Long Description                                                                                      |
|-------|-------------------|-------------------------------------------------------------------------------------------------------|
| 0     | Off               | Logging is disabled (default)                                                                           |
| 1     | Errors            | Only errors will be logged                                                                              |
| 2     | Performance Trace | API calls that launch CUDA kernels will log their parameters and important information                  |
| 3     | Performance Hints | Hints that can potentially improve the application’s performance                                        |
| 4     | Heuristics Trace  | Provides general information about the library execution, may contain details about heuristic status    |
| 5     | API Trace         | API calls will log their parameters and important information                                           |
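
For example, to capture a full API trace of the serial sample in a log file (a minimal illustration; the log file path is arbitrary):

export CUDENSITYMAT_LOG_LEVEL=5
export CUDENSITYMAT_LOG_FILE=/tmp/cudensitymat_api_trace.log
./operator_action_example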