Examples

In this section, we show an example on how to define a quantum operator, quantum states, and then compute the action of the quantum operator on a quantum state. For clarity, the quantum operator is defined inside a separate header transverse_ising_full_fused_noisy.h where it is wrapped in a helper C++ class UserDefinedLiouvillian. We also provide a utility header helpers.h containing convenient GPU array instantiation functions.

Compiling code

Assuming cuQuantum has been extracted in CUQUANTUM_ROOT and cuTENSOR in CUTENSOR_ROOT, we update the library path as follows:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11).

The serial sample code discussed below (operator_action_example.cpp) can be compiled via the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example

For static linking against the cuDensityMat library, use the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example

In order to build parallel (MPI) versions of the example operator_action_mpi_example.cpp, one will need to have an CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then set the environment variable $CUDENSITYMAT_COMM_LIB to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so. The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by calling the provided build script there. In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example

Warning

When running operator_action_mpi_example.cpp without CUDA-aware MPI, the program will crash.

Note

Depending on the source of the cuQuantum package, you may need to replace lib above by lib64, depending which folder name is used inside your cuQuantum package.

Code example (serial execution on a single GPU)

The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).

First let’s introduce a helper class to construct a quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and and additional noise term.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"      // GPU helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* DESCRIPTION:
 19   Time-dependent transverse-field Ising Hamiltonian operator
 20   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 21    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i operators with h_i coefficients 
 22      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of the fused ordered {Z_i * Z_j} terms with g_ij coefficients
 23      + d * sum_{i} {Y_i * {..} * Y_i}     // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
 24   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
 25*/
 26
 27
 28/** Example of a user-provided scalar CPU callback C function
 29 *  defining a time-dependent coefficient inside the Hamiltonian:
 30 *  f(t) = cos(omega * t) + i * sin(omega * t)
 31 */
 32extern "C"
 33int32_t tdCoefComplex64(double time,             //in: time point
 34                        int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
 35                        int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, omega)
 36                        const double * params,   //in: params[0:numParams-1][0:batchSize-1]: User-provided Hamiltonian parameters for all instances of the batch
 37                        cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this function)
 38                        void * scalarStorage,    //inout: CPU-accessible storage for the returned function value(s) of shape [0:batchSize-1]
 39                        cudaStream_t stream)     //in: CUDA stream (default is 0x0)
 40{
 41  auto * tdCoef = static_cast<cuDoubleComplex*>(scalarStorage); // casting to cuDoubleComplex because the function returns CUDA_C_64F data type
 42  for (int64_t i = 0; i < batchSize; ++i) {
 43    const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
 44    tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time));
 45  }
 46  return 0; // error code (0: Success)
 47}
 48
 49
 50/** Convenience class which encapsulates the user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 51 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 52 *  - Method get() returns a reference to the constructed Liouvillian operator
 53 *  - Destructor releases all resources used by the Liouvillian operator
 54 */
 55class UserDefinedLiouvillian final
 56{
 57private:
 58  // Data members
 59  cudensitymatHandle_t handle;             // library context handle
 60  const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
 61  void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM
 62  void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM
 63  void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM
 64  cudensitymatElementaryOperator_t spinX;  // X spin operator
 65  cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator
 66  cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator
 67  cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i * X_i}
 68  cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
 69  cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix
 70  cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * rho) + (i * rho * (H1 + H2)) + D1
 71
 72public:
 73
 74  // Constructor constructs a user-defined Liouvillian operator
 75  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,              // library context handle
 76                         const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
 77    handle(contextHandle), spaceShape(hilbertSpaceShape)
 78  {
 79    // Define the necessary operator tensors in GPU memory (F-order storage!)
 80    spinXelems = createArrayGPU<std::complex<double>>(   // X[i0; j0]
 81                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 82                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 83
 84    spinYYelems = createArrayGPU<std::complex<double>>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
 85                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
 86                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
 87                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
 88                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
 89
 90    spinZZelems = createArrayGPU<std::complex<double>>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
 91                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
 92                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
 93                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
 94                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
 95
 96    // Construct the necessary Elementary Tensor Operators
 97    //   X_i operator
 98    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
 99                        1,                                   // one-body operator
100                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
101                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
102                        0,                                   // 0 for dense tensors
103                        nullptr,                             // nullptr for dense tensors
104                        CUDA_C_64F,                          // data type
105                        spinXelems,                          // tensor elements in GPU memory
106                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
107                        &spinX));                            // the created elementary tensor operator
108    //  ZZ_ij = Z_i * Z_j fused operator
109    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
110                        2,                                   // two-body operator
111                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
112                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
113                        0,                                   // 0 for dense tensors
114                        nullptr,                             // nullptr for dense tensors
115                        CUDA_C_64F,                          // data type
116                        spinZZelems,                         // tensor elements in GPU memory
117                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
118                        &spinZZ));                           // the created elementary tensor operator
119    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
120    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
121                        2,                                   // two-body operator
122                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
123                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
124                        0,                                   // 0 for dense tensors
125                        nullptr,                             // nullptr for dense tensors
126                        CUDA_C_64F,                          // data type
127                        spinYYelems,                         // tensor elements in GPU memory
128                        cudensitymatTensorCallbackNone,      // no tensor callback function (tensor is not time-dependent)
129                        &spinYY));                           // the created elementary tensor operator
130
131    // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
132    //  Create an empty operator term
133    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
134                        spaceShape.size(),                   // Hilbert space rank (number of modes)
135                        spaceShape.data(),                   // Hilbert space shape (mode extents)
136                        &oneBodyTerm));                      // the created empty operator term
137    //  Define the operator term: H1 = sum_{i} {h_i * X_i}
138    for (int32_t i = 0; i < spaceShape.size(); ++i) {
139      const double h_i = 1.0 / static_cast<double>(i+1); // just assign some value (time-independent h_i coefficient)
140      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
141                          oneBodyTerm,
142                          1,                                                             // number of elementary tensor operators in the product
143                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
144                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
145                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
146                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
147                          cudensitymatScalarCallbackNone));                              // no time-dependent coefficient associated with this operator product
148    }
149    //  Create an empty operator term
150    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
151                        spaceShape.size(),                   // Hilbert space rank (number of modes)
152                        spaceShape.data(),                   // Hilbert space shape (mode extents)
153                        &twoBodyTerm));                      // the created empty operator term
154    //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
155    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
156      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
157        const double g_ij = -1.0 / static_cast<double>(i + j + 1); // just assign some value (time-independent g_ij coefficient)
158        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
159                            twoBodyTerm,
160                            1,                                                              // number of elementary tensor operators in the product
161                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
162                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
163                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
164                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
165                            cudensitymatScalarCallbackNone));                               // no time-dependent coefficient associated with this operator product
166      }
167    }
168    //  Create an empty operator term
169    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
170                        spaceShape.size(),                   // Hilbert space rank (number of modes)
171                        spaceShape.data(),                   // Hilbert space shape (mode extents)
172                        &noiseTerm));                        // the created empty operator term
173    //  Define the operator term: D1 = d * sum_{i} {YY_ii}
174    for (int32_t i = 0; i < spaceShape.size(); ++i) {
175      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
176                          noiseTerm,
177                          1,                                                              // number of elementary tensor operators in the product
178                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
179                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
180                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
181                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
182                          cudensitymatScalarCallbackNone));                               // no time-dependent coefficient associated with this operator product
183    }
184
185    // Construct the full Liouvillian operator as a sum of the operator terms
186    //  Create an empty operator (super-operator)
187    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
188                        spaceShape.size(),                // Hilbert space rank (number of modes)
189                        spaceShape.data(),                // Hilbert space shape (modes extents)
190                        &liouvillian));                   // the created empty operator (super-operator)
191    //  Append an operator term to the operator (super-operator)
192    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193                        liouvillian,
194                        oneBodyTerm,                      // appended operator term
195                        0,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196                        make_cuDoubleComplex(0.0, -1.0),  // -i constant
197                        cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
198    //  Append an operator term to the operator (super-operator)
199    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200                        liouvillian,
201                        twoBodyTerm,                     // appended operator term
202                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203                        make_cuDoubleComplex(0.0, -1.0), // -i constant
204                        {tdCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
205    //  Append an operator term to the operator (super-operator)
206    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207                        liouvillian,
208                        oneBodyTerm,                      // appended operator term
209                        1,                                // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210                        make_cuDoubleComplex(0.0, 1.0),   // i constant
211                        cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
212    //  Append an operator term to the operator (super-operator)
213    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
214                        liouvillian,
215                        twoBodyTerm,                     // appended operator term
216                        1,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
217                        make_cuDoubleComplex(0.0, 1.0),  // i constant
218                        {tdCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
219    //  Append an operator term to the operator (super-operator)
220    const double d = 0.42; // just assign some value (time-independent coefficient)
221    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
222                        liouvillian,
223                        noiseTerm,                        // appended operator term
224                        0,                                // operator term action duality as a whole (no duality reversing in this case)
225                        make_cuDoubleComplex(d, 0.0),     // constant coefficient associated with the operator term as a whole
226                        cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
227  }
228
229  // Destructor destructs the user-defined Liouvillian operator
230  ~UserDefinedLiouvillian()
231  {
232    // Destroy the Liouvillian operator
233    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
234
235    // Destroy operator terms
236    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
237    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
238    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
239
240    // Destroy elementary tensor operators
241    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
242    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
243    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
244
245    // Destroy operator tensors
246    destroyArrayGPU(spinYYelems);
247    destroyArrayGPU(spinZZelems);
248    destroyArrayGPU(spinXelems);
249  }
250
251  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
252  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
253  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
254  UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
255  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
256
257  // Get access to the constructed Liouvillian operator
258  cudensitymatOperator_t & get()
259  {
260    return liouvillian;
261  }
262
263};

Now we can use this quantum many-body operator in our main code.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15#include <cmath>
 16#include <complex>
 17#include <vector>
 18#include <chrono>
 19#include <iostream>
 20#include <cassert>
 21
 22
 23// Number of times to perform operator action on a quantum state
 24constexpr int NUM_REPEATS = 2;
 25
 26// Logging verbosity
 27bool verbose = true;
 28
 29
 30// Example workflow
 31void exampleWorkflow(cudensitymatHandle_t handle)
 32{
 33  // Define the composite Hilbert space shape and
 34  // quantum state batch size (number of individual quantum states in a batched simulation)
 35  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 36  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 37  const int32_t numParams = 1;                              // number of external user-provided Hamiltonian parameters
 38
 39  if (verbose) {
 40    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 41    for (const auto & dimsn: spaceShape)
 42      std::cout << dimsn << ",";
 43    std::cout << ")" << std::endl;
 44    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 45  }
 46
 47  // Place external user-provided Hamiltonian parameters in GPU memory
 48  std::vector<double> cpuHamParams(numParams * batchSize, 13.42); // each parameter can have a different value, of course
 49  auto * hamiltonianParams = static_cast<double *>(createArrayGPU(cpuHamParams));
 50
 51  // Construct a user-defined Liouvillian operator using a convenience C++ class
 52  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 53  if (verbose)
 54    std::cout << "Constructed the Liouvillian operator\n";
 55
 56  // Declare the input quantum state
 57  cudensitymatState_t inputState;
 58  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 59                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 60                      spaceShape.size(),
 61                      spaceShape.data(),
 62                      batchSize,
 63                      CUDA_C_64F,  // data type must match that of the operators created above
 64                      &inputState));
 65
 66  // Query the size of the quantum state storage
 67  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 68  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 69                      inputState,
 70                      1,               // only one storage component (tensor)
 71                      &storageSize));  // storage size in bytes
 72  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 73  if (verbose)
 74    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 75
 76  // Prepare some initial value for the input quantum state
 77  std::vector<std::complex<double>> inputStateValue(stateVolume);
 78  for (std::size_t i = 0; i < stateVolume; ++i) {
 79    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 80  }
 81
 82  // Allocate initialized GPU storage for the input quantum state with prepared values
 83  auto * inputStateElems = createArrayGPU(inputStateValue);
 84
 85  // Attach initialized GPU storage to the input quantum state
 86  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 87                      inputState,
 88                      1,                                                 // only one storage component (tensor)
 89                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 90                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 91  if (verbose)
 92    std::cout << "Constructed input quantum state\n";
 93
 94  // Declare the output quantum state of the same shape
 95  cudensitymatState_t outputState;
 96  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 97                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 98                      spaceShape.size(),
 99                      spaceShape.data(),
100                      batchSize,
101                      CUDA_C_64F,  // data type must match that of the operators created above
102                      &outputState));
103
104  // Allocate initialized GPU storage for the output quantum state
105  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107  // Attach initialized GPU storage to the output quantum state
108  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109                      outputState,
110                      1,                                                 // only one storage component (tensor)
111                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
112                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
113  if (verbose)
114    std::cout << "Constructed output quantum state\n";
115
116  // Declare a workspace descriptor
117  cudensitymatWorkspaceDescriptor_t workspaceDescr;
118  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120  // Query free GPU memory
121  std::size_t freeMem = 0, totalMem = 0;
122  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124  if (verbose)
125    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128  const auto startTime = std::chrono::high_resolution_clock::now();
129  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130                      liouvillian.get(),
131                      inputState,
132                      outputState,
133                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
134                      freeMem,                   // max available GPU free memory for the workspace
135                      workspaceDescr,            // workspace descriptor
136                      0x0));                     // default CUDA stream
137  const auto finishTime = std::chrono::high_resolution_clock::now();
138  const std::chrono::duration<double> timeSec = finishTime - startTime;
139  if (verbose)
140    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142  // Query the required workspace buffer size (bytes)
143  std::size_t requiredBufferSize {0};
144  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145                      workspaceDescr,
146                      CUDENSITYMAT_MEMSPACE_DEVICE,
147                      CUDENSITYMAT_WORKSPACE_SCRATCH,
148                      &requiredBufferSize));
149  if (verbose)
150    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152  // Allocate GPU storage for the workspace buffer
153  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155  if (verbose)
156    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158  // Attach the workspace buffer to the workspace descriptor
159  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160                      workspaceDescr,
161                      CUDENSITYMAT_MEMSPACE_DEVICE,
162                      CUDENSITYMAT_WORKSPACE_SCRATCH,
163                      workspaceBuffer,
164                      requiredBufferSize));
165  if (verbose)
166    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168  // Zero out the output quantum state
169  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170                      outputState,
171                      0x0));
172  if (verbose)
173    std::cout << "Initialized the output quantum state to zero\n";
174
175  // Apply the Liouvillian operator to the input quatum state
176  // and accumulate its action into the output quantum state (note += semantics)
177  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179    const auto startTime = std::chrono::high_resolution_clock::now();
180    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181                        liouvillian.get(),
182                        0.01,                                  // time point
183                        batchSize,                             // user-defined batch size
184                        numParams,                             // number of external user-defined Hamiltonian parameters
185                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
186                        inputState,                            // input quantum state
187                        outputState,                           // output quantum state
188                        workspaceDescr,                        // workspace descriptor
189                        0x0));                                 // default CUDA stream
190    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
191    const auto finishTime = std::chrono::high_resolution_clock::now();
192    const std::chrono::duration<double> timeSec = finishTime - startTime;
193    if (verbose)
194      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
195  }
196
197  // Compute the squared norm of the output quantum state
198  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
199  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
200                      outputState,
201                      norm2,
202                      0x0));
203  if (verbose)
204    std::cout << "Computed the output quantum state norm\n";
205  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
206  destroyArrayGPU(norm2);
207
208  // Destroy workspace descriptor
209  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
210
211  // Destroy workspace buffer storage
212  destroyArrayGPU(workspaceBuffer);
213
214  // Destroy quantum states
215  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
216  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
217
218  // Destroy quantum state storage
219  destroyArrayGPU(outputStateElems);
220  destroyArrayGPU(inputStateElems);
221
222  // Destroy external Hamiltonian parameters
223  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
224
225  if (verbose)
226    std::cout << "Destroyed resources\n" << std::flush;
227}
228
229
230int main(int argc, char ** argv)
231{
232  // Assign a GPU to the process
233  HANDLE_CUDA_ERROR(cudaSetDevice(0));
234  if (verbose)
235    std::cout << "Set active device\n";
236
237  // Create a library handle
238  cudensitymatHandle_t handle;
239  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
240  if (verbose)
241    std::cout << "Created a library handle\n";
242
243  // Run the example
244  exampleWorkflow(handle);
245
246  // Destroy the library handle
247  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
248  if (verbose)
249    std::cout << "Destroyed the library handle\n";
250
251  // Done
252  return 0;
253}

Code example (parallel execution on multiple GPUs)

It is straightforward to adapt main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.

The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).

Here is the updated main code for multi-GPU runs.

  1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// MPI library (optional)
 16#ifdef MPI_ENABLED
 17#include <mpi.h>
 18#endif
 19
 20#include <cmath>
 21#include <complex>
 22#include <vector>
 23#include <chrono>
 24#include <iostream>
 25#include <cassert>
 26
 27
 28// Number of times to perform operator action on a quantum state
 29constexpr int NUM_REPEATS = 2;
 30
 31// Logging verbosity
 32bool verbose = true;
 33
 34
 35// Example workflow
 36void exampleWorkflow(cudensitymatHandle_t handle)
 37{
 38  // Define the composite Hilbert space shape and
 39  // quantum state batch size (number of individual quantum states in a batched simulation)
 40  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 41  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 42  const int32_t numParams = 1;                              // number of external user-provided Hamiltonian parameters
 43
 44  if (verbose) {
 45    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 46    for (const auto & dimsn: spaceShape)
 47      std::cout << dimsn << ",";
 48    std::cout << ")" << std::endl;
 49    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 50  }
 51
 52  // Place external user-provided Hamiltonian parameters in GPU memory
 53  std::vector<double> cpuHamParams(numParams * batchSize, 13.42); // each parameter can have a different value, of course
 54  auto * hamiltonianParams = static_cast<double *>(createArrayGPU(cpuHamParams));
 55
 56  // Construct a user-defined Liouvillian operator using a convenience C++ class
 57  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 58  if (verbose)
 59    std::cout << "Constructed the Liouvillian operator\n";
 60
 61  // Declare the input quantum state
 62  cudensitymatState_t inputState;
 63  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 64                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 65                      spaceShape.size(),
 66                      spaceShape.data(),
 67                      batchSize,
 68                      CUDA_C_64F,  // data type must match that of the operators created above
 69                      &inputState));
 70
 71  // Query the size of the quantum state storage
 72  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
 73  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 74                      inputState,
 75                      1,               // only one storage component (tensor)
 76                      &storageSize));  // storage size in bytes
 77  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 78  if (verbose)
 79    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 80
 81  // Prepare some initial value for the input quantum state
 82  std::vector<std::complex<double>> inputStateValue(stateVolume);
 83  for (std::size_t i = 0; i < stateVolume; ++i) {
 84    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 85  }
 86
 87  // Allocate initialized GPU storage for the input quantum state with prepared values
 88  auto * inputStateElems = createArrayGPU(inputStateValue);
 89
 90  // Attach initialized GPU storage to the input quantum state
 91  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 92                      inputState,
 93                      1,                                                 // only one storage component (tensor)
 94                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 95                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 96  if (verbose)
 97    std::cout << "Constructed input quantum state\n";
 98
 99  // Declare the output quantum state of the same shape
100  cudensitymatState_t outputState;
101  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
102                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
103                      spaceShape.size(),
104                      spaceShape.data(),
105                      batchSize,
106                      CUDA_C_64F,  // data type must match that of the operators created above
107                      &outputState));
108
109  // Allocate initialized GPU storage for the output quantum state
110  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
111
112  // Attach initialized GPU storage to the output quantum state
113  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
114                      outputState,
115                      1,                                                 // only one storage component (tensor)
116                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
117                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
118  if (verbose)
119    std::cout << "Constructed output quantum state\n";
120
121  // Declare a workspace descriptor
122  cudensitymatWorkspaceDescriptor_t workspaceDescr;
123  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
124
125  // Query free GPU memory
126  std::size_t freeMem = 0, totalMem = 0;
127  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
128  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
129  if (verbose)
130    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
131
132  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
133  const auto startTime = std::chrono::high_resolution_clock::now();
134  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
135                      liouvillian.get(),
136                      inputState,
137                      outputState,
138                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
139                      freeMem,                   // max available GPU free memory for the workspace
140                      workspaceDescr,            // workspace descriptor
141                      0x0));                     // default CUDA stream
142  const auto finishTime = std::chrono::high_resolution_clock::now();
143  const std::chrono::duration<double> timeSec = finishTime - startTime;
144  if (verbose)
145    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
146
147  // Query the required workspace buffer size (bytes)
148  std::size_t requiredBufferSize {0};
149  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
150                      workspaceDescr,
151                      CUDENSITYMAT_MEMSPACE_DEVICE,
152                      CUDENSITYMAT_WORKSPACE_SCRATCH,
153                      &requiredBufferSize));
154  if (verbose)
155    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
156
157  // Allocate GPU storage for the workspace buffer
158  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
159  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
160  if (verbose)
161    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163  // Attach the workspace buffer to the workspace descriptor
164  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
165                      workspaceDescr,
166                      CUDENSITYMAT_MEMSPACE_DEVICE,
167                      CUDENSITYMAT_WORKSPACE_SCRATCH,
168                      workspaceBuffer,
169                      requiredBufferSize));
170  if (verbose)
171    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
172
173  // Zero out the output quantum state
174  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
175                      outputState,
176                      0x0));
177  if (verbose)
178    std::cout << "Initialized the output quantum state to zero\n";
179
180  // Apply the Liouvillian operator to the input quatum state
181  // and accumulate its action into the output quantum state (note += semantics)
182  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
183    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
184    const auto startTime = std::chrono::high_resolution_clock::now();
185    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
186                        liouvillian.get(),
187                        0.01,                                  // time point
188                        batchSize,                             // user-defined batch size
189                        numParams,                             // number of external user-defined Hamiltonian parameters
190                        hamiltonianParams,                     // external Hamiltonian parameters in GPU memory
191                        inputState,                            // input quantum state
192                        outputState,                           // output quantum state
193                        workspaceDescr,                        // workspace descriptor
194                        0x0));                                 // default CUDA stream
195    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
196    const auto finishTime = std::chrono::high_resolution_clock::now();
197    const std::chrono::duration<double> timeSec = finishTime - startTime;
198    if (verbose)
199      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
200  }
201
202  // Compute the squared norm of the output quantum state
203  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
204  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
205                      outputState,
206                      norm2,
207                      0x0));
208  if (verbose)
209    std::cout << "Computed the output quantum state norm\n";
210  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
211  destroyArrayGPU(norm2);
212
213  // Destroy workspace descriptor
214  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
215
216  // Destroy workspace buffer storage
217  destroyArrayGPU(workspaceBuffer);
218
219  // Destroy quantum states
220  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
221  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
222
223  // Destroy quantum state storage
224  destroyArrayGPU(outputStateElems);
225  destroyArrayGPU(inputStateElems);
226
227  // Destroy external Hamiltonian parameters
228  destroyArrayGPU(static_cast<void *>(hamiltonianParams));
229
230  if (verbose)
231    std::cout << "Destroyed resources\n" << std::flush;
232}
233
234
235int main(int argc, char ** argv)
236{
237  // Initialize MPI library (if needed)
238#ifdef MPI_ENABLED
239  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
240  int procRank {-1};
241  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
242  int numProcs {0};
243  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
244  if (procRank != 0) verbose = false;
245  if (verbose)
246    std::cout << "Initialized MPI library\n";
247#else
248  const int procRank {0};
249  const int numProcs {1};
250#endif
251
252  // Assign a GPU to the process
253  int numDevices {0};
254  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
255  const int deviceId = procRank % numDevices;
256  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
257  if (verbose)
258    std::cout << "Set active device\n";
259
260  // Create a library handle
261  cudensitymatHandle_t handle;
262  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
263  if (verbose)
264    std::cout << "Created a library handle\n";
265
266  // Reset distributed configuration (once)
267#ifdef MPI_ENABLED
268  MPI_Comm comm;
269  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
270  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
271                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
272                      &comm, sizeof(comm)));
273#endif
274
275  // Run the example
276  exampleWorkflow(handle);
277
278  // Synchronize MPI processes
279#ifdef MPI_ENABLED
280  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
281#endif
282
283  // Destroy the library handle
284  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
285  if (verbose)
286    std::cout << "Destroyed the library handle\n";
287
288  // Finalize the MPI library
289#ifdef MPI_ENABLED
290  HANDLE_MPI_ERROR(MPI_Finalize());
291  if (verbose)
292    std::cout << "Finalized MPI library\n";
293#endif
294
295  // Done
296  return 0;
297}

Useful tips

  • For debugging, one can set the environment variable CUDENSITYMAT_LOG_LEVEL=n. The level n = 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout.

Level

Summary

Long Description

0

Off

Logging is disabled (default)

1

Errors

Only errors will be logged

2

Performance Trace

API calls that launch CUDA kernels will log their parameters and important information

3

Performance Hints

Hints that can potentially improve the application’s performance

4

Heuristics Trace

Provides general information about the library execution, may contain details about heuristic status

5

API Trace

API calls will log their parameter and important information