Examples¶
In this section, we show basic examples of how to define a quantum operator and a quantum state, compute the action of the quantum operator on the quantum state, and, optionally, backward-differentiate the operator action (compute gradients) with respect to user-provided real parameters parameterizing the operator. For clarity, the quantum operator for each example is defined inside a separate C++ header, specifically transverse_ising_full_fused_noisy.h and transverse_ising_full_fused_noisy_grad.h, where it is wrapped in a helper C++ class UserDefinedLiouvillian. We also provide a utility header helpers.h containing convenient GPU array instantiation, initialization, and printing functions.
Compiling code¶
Assuming cuQuantum has been extracted in CUQUANTUM_ROOT and cuTENSOR is in CUTENSOR_ROOT, we update the library path as follows:
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}
Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11).
The serial sample code discussed below (operator_action_example.cpp) can be compiled via the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensornet -lcutensor -lcublas -o operator_action_example
For static linking against the cuDensityMat library, use the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a ${CUQUANTUM_ROOT}/lib/libcutensornet_static.a -L${CUTENSOR_ROOT}/lib/12 -lcutensor -lcublas -o operator_action_example
In order to build a parallel (MPI) version of the example (operator_action_mpi_example.cpp), one will need to have a CUDA-aware MPI library installed (e.g., a recent OpenMPI, MPICH, or MVAPICH) and then set the environment variable $CUDENSITYMAT_COMM_LIB to the full path of the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so (e.g., export CUDENSITYMAT_COMM_LIB=${CUQUANTUM_ROOT}/distributed_interfaces/libcudensitymat_distributed_interface_mpi.so). This wrapper library can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by calling the provided build script there. In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:
nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -L${MPI_PATH}/lib -lcudensitymat -lcutensornet -lcutensor -lcublas -lmpi -o operator_action_mpi_example
Warning
When running operator_action_mpi_example.cpp with a non-CUDA-aware MPI library, the program will crash.
Note
Depending on which folder name is used inside your cuQuantum SDK package, you may need to replace lib above with lib64.
Code example (serial execution on a single GPU)¶
The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).
First let’s introduce a helper class to construct a specific quantum many-body operator, for example, the transverse-field Ising Hamiltonian with fused ZZ terms and an additional noise term. Here we choose to make the f(t) coefficient depend on time and a single user-provided real parameter Omega. We use a CPU-side user-defined scalar callback function to define the dependence of the f(t) coefficient on time and the user-provided real parameter Omega. Note that inside the callback function definition, we explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers), which applies to the scalar coefficient f(t) set by the callback function in-place.
/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#pragma once

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"      // GPU helper functions

#include <cmath>
#include <complex>
#include <vector>
#include <iostream>
#include <cassert>


/* DESCRIPTION:
   Time-dependent transverse-field Ising Hamiltonian operator
   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
    H = sum_{i} {h_i * X_i}               // transverse field sum of X_i operators with static h_i coefficients
      + f(t) * sum_{i < j} {g_ij * ZZ_ij} // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
      + d * sum_{i} {Y_i * {..} * Y_i}    // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
*/

/** Define the numerical type and data type for the GPU computations (same) */
using NumericalType = std::complex<double>;     // do not change
constexpr cudaDataType_t dataType = CUDA_C_64F; // do not change


/** Example of a user-provided scalar CPU callback C function
 *  defining a time-dependent coefficient inside the Hamiltonian:
 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 */
extern "C"
int32_t fCoefComplex64(
    double time,             //in: time point
    int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
    int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
    const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
    cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
    void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
    cudaStream_t stream)     //in: CUDA stream (default is 0x0)
{
    if (dataType == CUDA_C_64F) {
        auto * tdCoef = static_cast<cuDoubleComplex*>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
        for (int64_t i = 0; i < batchSize; ++i) {
            const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
            tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
        }
    } else {
        return 1; // error code (1: Error)
    }
    return 0; // error code (0: Success)
}


/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 *  - Method `get()` returns a reference to the constructed Liouvillian operator
 *  - Destructor releases all resources used by the Liouvillian operator
 */
class UserDefinedLiouvillian final
{
private:
    // Data members
    cudensitymatHandle_t handle;             // library context handle
    int64_t stateBatchSize;                  // quantum state batch size
    const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
    void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
    void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
    void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
    cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
    cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
    cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
    cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i * X_i} (one-body term)
    cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
    cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii} // Y_i operators act from different sides on the density matrix (two-body mixed term)
    cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)

public:

    // Constructor constructs a user-defined Liouvillian operator
    UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
                           const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
                           int64_t batchSize):                             // batch size for the quantum state
        handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
    {
        // Define the necessary operator tensors in GPU memory (F-order storage!)
        spinXelems = createInitializeArrayGPU<NumericalType>( // X[i0; j0]
            {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
             {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X

        spinYYelems = createInitializeArrayGPU<NumericalType>( // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
            {{0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},   // 1st column of matrix YY
             {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},    // 2nd column of matrix YY
             {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},    // 3rd column of matrix YY
             {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY

        spinZZelems = createInitializeArrayGPU<NumericalType>( // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
            {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},    // 1st column of matrix ZZ
             {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix ZZ
             {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
             {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}});  // 4th column of matrix ZZ

        // Construct the necessary Elementary Tensor Operators
        //  X_i operator
        HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                          1,                                      // one-body operator
                          std::vector<int64_t>({2}).data(),       // acts in tensor space of shape {2}
                          CUDENSITYMAT_OPERATOR_SPARSITY_NONE,    // dense tensor storage
                          0,                                      // 0 for dense tensors
                          nullptr,                                // nullptr for dense tensors
                          dataType,                               // data type
                          spinXelems,                             // tensor elements in GPU memory
                          cudensitymatTensorCallbackNone,         // no tensor callback function (tensor is not time-dependent)
                          cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
                          &spinX));                               // the created elementary tensor operator
        //  ZZ_ij = Z_i * Z_j fused operator
        HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                          2,                                      // two-body operator
                          std::vector<int64_t>({2,2}).data(),     // acts in tensor space of shape {2,2}
                          CUDENSITYMAT_OPERATOR_SPARSITY_NONE,    // dense tensor storage
                          0,                                      // 0 for dense tensors
                          nullptr,                                // nullptr for dense tensors
                          dataType,                               // data type
                          spinZZelems,                            // tensor elements in GPU memory
                          cudensitymatTensorCallbackNone,         // no tensor callback function (tensor is not time-dependent)
                          cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
                          &spinZZ));                              // the created elementary tensor operator
        //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
        HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                          2,                                      // two-body operator
                          std::vector<int64_t>({2,2}).data(),     // acts in tensor space of shape {2,2}
                          CUDENSITYMAT_OPERATOR_SPARSITY_NONE,    // dense tensor storage
                          0,                                      // 0 for dense tensors
                          nullptr,                                // nullptr for dense tensors
                          dataType,                               // data type
                          spinYYelems,                            // tensor elements in GPU memory
                          cudensitymatTensorCallbackNone,         // no tensor callback function (tensor is not time-dependent)
                          cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
                          &spinYY));                              // the created elementary tensor operator

        // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
        //  Create an empty operator term
        HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                          spaceShape.size(), // Hilbert space rank (number of modes)
                          spaceShape.data(), // Hilbert space shape (mode extents)
                          &oneBodyTerm));    // the created empty operator term
        //  Define the operator term: H1 = sum_{i} {h_i * X_i}
        for (int32_t i = 0; i < spaceShape.size(); ++i) {
            const double h_i = 1.0 / static_cast<double>(i+1); // assign some value to the time-independent h_i coefficient
            HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                              oneBodyTerm,
                              1,                                                             // number of elementary tensor operators in the product
                              std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
                              std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
                              std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
                              make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
                              cudensitymatScalarCallbackNone,                                // no time-dependent coefficient associated with this operator product
                              cudensitymatScalarGradientCallbackNone));                      // no coefficient gradient associated with this operator product
        }
        //  Create an empty operator term
        HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                          spaceShape.size(), // Hilbert space rank (number of modes)
                          spaceShape.data(), // Hilbert space shape (mode extents)
                          &twoBodyTerm));    // the created empty operator term
        //  Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
        for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
            for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
                const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
                HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                                  twoBodyTerm,
                                  1,                                                              // number of elementary tensor operators in the product
                                  std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
                                  std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
                                  std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
                                  make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
                                  cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
                                  cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
            }
        }
        //  Create an empty operator term
        HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                          spaceShape.size(), // Hilbert space rank (number of modes)
                          spaceShape.data(), // Hilbert space shape (mode extents)
                          &noiseTerm));      // the created empty operator term
        //  Define the operator term: D1 = d * sum_{i} {YY_ii}
        for (int32_t i = 0; i < spaceShape.size(); ++i) {
            HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                              noiseTerm,
                              1,                                                              // number of elementary tensor operators in the product
                              std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
                              std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
                              std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
                              make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
                              cudensitymatScalarCallbackNone,                                 // no time-dependent coefficient associated with this operator product
                              cudensitymatScalarGradientCallbackNone));                       // no coefficient gradient associated with this operator product
        }

        // Construct the full Liouvillian operator as a sum of the operator terms
        //  Create an empty operator (super-operator)
        HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
                          spaceShape.size(), // Hilbert space rank (number of modes)
                          spaceShape.data(), // Hilbert space shape (mode extents)
                          &liouvillian));    // the created empty operator (super-operator)
        //  Append an operator term to the operator (super-operator)
        HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                          liouvillian,
                          oneBodyTerm,                              // appended operator term
                          0,                                        // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                          make_cuDoubleComplex(0.0, -1.0),          // -i constant
                          cudensitymatScalarCallbackNone,           // no time-dependent coefficient associated with the operator term as a whole
                          cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
        //  Append an operator term to the operator (super-operator)
        HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                          liouvillian,
                          twoBodyTerm,                                                 // appended operator term
                          0,                                                           // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                          make_cuDoubleComplex(0.0, -1.0),                             // -i constant
                          {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
                          cudensitymatScalarGradientCallbackNone));                    // no coefficient gradient associated with this operator term as a whole
        //  Append an operator term to the operator (super-operator)
        HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                          liouvillian,
                          oneBodyTerm,                              // appended operator term
                          1,                                        // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                          make_cuDoubleComplex(0.0, 1.0),           // i constant
                          cudensitymatScalarCallbackNone,           // no time-dependent coefficient associated with the operator term as a whole
                          cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
        //  Append an operator term to the operator (super-operator)
        HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                          liouvillian,
                          twoBodyTerm,                                                 // appended operator term
                          1,                                                           // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                          make_cuDoubleComplex(0.0, 1.0),                              // i constant
                          {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
                          cudensitymatScalarGradientCallbackNone));                    // no coefficient gradient associated with this operator term as a whole
        //  Append an operator term to the operator (super-operator)
        const double d = 0.42; // assign some value to the time-independent coefficient
        HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                          liouvillian,
                          noiseTerm,                                // appended operator term
                          0,                                        // operator term action duality as a whole (no duality reversing in this case)
                          make_cuDoubleComplex(d, 0.0),             // constant coefficient associated with the operator term as a whole
                          cudensitymatScalarCallbackNone,           // no time-dependent coefficient associated with the operator term as a whole
                          cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
    }

    // Destructor destructs the user-defined Liouvillian operator
    ~UserDefinedLiouvillian()
    {
        // Destroy the Liouvillian operator
        HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));

        // Destroy operator terms
        HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
        HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
        HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));

        // Destroy elementary tensor operators
        HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
        HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
        HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));

        // Destroy operator tensors
        destroyArrayGPU(spinYYelems);
        destroyArrayGPU(spinZZelems);
        destroyArrayGPU(spinXelems);
    }

    // Disable copy constructor/assignment (GPU resources are private, no deep copy)
    UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
    UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
    UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
    UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;

    /** Returns the number of externally provided Hamiltonian parameters. */
    int32_t getNumParameters() const
    {
        return 1; // one parameter Omega
    }

    /** Get access to the constructed Liouvillian operator. */
    cudensitymatOperator_t & get()
    {
        return liouvillian;
    }

};
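Since the scalar callback above is a plain C function with a fully specified contract, it can be sanity-checked on the host before being handed to the library. The following minimal sketch is not part of the shipped sample (the Omega value and tolerance are made up, and it assumes the same build setup as the sample); it calls fCoefComplex64 directly and checks the produced coefficient against the closed form f(t) = cos(Omega * t) + i * sin(Omega * t):

#include "transverse_ising_full_fused_noisy.h" // fCoefComplex64 (assumes the sample build setup)
#include <cmath>
#include <cassert>

int main()
{
    const double time = 0.3;       // arbitrary time point
    const double params[] = {2.5}; // Omega: one parameter, batch size 1 (made-up value)
    cuDoubleComplex coef = make_cuDoubleComplex(0.0, 0.0);
    // Invoke the callback directly, exactly as the library would do once per evaluation
    assert(fCoefComplex64(time, 1, 1, params, CUDA_C_64F, &coef, 0x0) == 0);
    // Compare against the closed form f(t) = cos(Omega*t) + i*sin(Omega*t)
    assert(std::abs(cuCreal(coef) - std::cos(2.5 * 0.3)) < 1e-12);
    assert(std::abs(cuCimag(coef) - std::sin(2.5 * 0.3)) < 1e-12);
    return 0;
}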
Now we can use this parameterized quantum many-body operator in our main code to compute the action of the operator on a mixed quantum state.
/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"      // helper functions


// Transverse Ising Hamiltonian with double summation ordering
// and spin-operator fusion, plus fused dissipation terms
#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example

#include <cmath>
#include <complex>
#include <cstdlib>
#include <type_traits>
#include <vector>
#include <chrono>
#include <iostream>
#include <cassert>


// Number of times to perform operator action on a quantum state
constexpr int NUM_REPEATS = 2;

// Logging verbosity
bool verbose = true;


// Example workflow
void exampleWorkflow(cudensitymatHandle_t handle)
{
    // Define the composite Hilbert space shape and
    // quantum state batch size (number of individual quantum states in a batched simulation)
    const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
    const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)

    if (verbose) {
        std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
        for (const auto & dimsn: spaceShape)
            std::cout << dimsn << ",";
        std::cout << ")" << std::endl;
        std::cout << "Quantum state batch size = " << batchSize << std::endl;
    }

    // Construct a user-defined Liouvillian operator using a convenience C++ class
    UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
    if (verbose)
        std::cout << "Constructed the Liouvillian operator\n";

    // Set and place external user-provided Hamiltonian parameters in GPU memory
    const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
    std::vector<double> cpuHamParams(numParams * batchSize);
    for (int64_t j = 0; j < batchSize; ++j) {
        for (int32_t i = 0; i < numParams; ++i) {
            cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
        }
    }
    auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));

    // Declare the input quantum state
    cudensitymatState_t inputState;
    HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      dataType,
                      &inputState));

    // Query the size of the quantum state storage
    std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
    HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
                      inputState,
                      1,              // only one storage component (tensor)
                      &storageSize)); // storage size in bytes
    const std::size_t stateVolume = storageSize / sizeof(NumericalType); // quantum state tensor volume (number of elements)
    if (verbose)
        std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;

    // Prepare some initial value for the input quantum state batch
    std::vector<NumericalType> inputStateValue(stateVolume);
    if constexpr (std::is_same_v<NumericalType, float>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = 1.0f / float(i+1); // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, double>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = 1.0 / double(i+1); // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
        }
    } else {
        std::cerr << "Error: Unsupported data type!\n";
        std::exit(1);
    }
    // Allocate initialized GPU storage for the input quantum state with prepared values
    auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
    if (verbose)
        std::cout << "Allocated input quantum state storage and initialized it to some value\n";

    // Attach initialized GPU storage to the input quantum state
    HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      inputState,
                      1,                                                // only one storage component (tensor)
                      std::vector<void*>({inputStateElems}).data(),     // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
    if (verbose)
        std::cout << "Constructed input quantum state\n";

    // Declare the output quantum state of the same shape
    cudensitymatState_t outputState;
    HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      dataType,
                      &outputState));

    // Allocate GPU storage for the output quantum state
    auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
    if (verbose)
        std::cout << "Allocated output quantum state storage\n";

    // Attach GPU storage to the output quantum state
    HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      outputState,
                      1,                                                // only one storage component (tensor)
                      std::vector<void*>({outputStateElems}).data(),    // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
    if (verbose)
        std::cout << "Constructed output quantum state\n";

    // Declare a workspace descriptor
    cudensitymatWorkspaceDescriptor_t workspaceDescr;
    HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));

    // Query free GPU memory
    std::size_t freeMem = 0, totalMem = 0;
    HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
    freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
    if (verbose)
        std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;

    // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
    const auto startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
                      liouvillian.get(),
                      inputState,
                      outputState,
                      CUDENSITYMAT_COMPUTE_64F, // GPU compute type
                      freeMem,                  // max available GPU free memory for the workspace
                      workspaceDescr,           // workspace descriptor
                      0x0));                    // default CUDA stream
    const auto finishTime = std::chrono::high_resolution_clock::now();
    const std::chrono::duration<double> timeSec = finishTime - startTime;
    if (verbose)
        std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;

    // Query the required workspace buffer size (bytes)
    std::size_t requiredBufferSize {0};
    HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      &requiredBufferSize));
    if (verbose)
        std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

    // Allocate GPU storage for the workspace buffer
    const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
    auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
    if (verbose)
        std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

    // Attach the workspace buffer to the workspace descriptor
    HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      workspaceBuffer,
                      requiredBufferSize));
    if (verbose)
        std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

    // Apply the Liouvillian operator to the input quantum state
    // and accumulate its action into the output quantum state (note accumulative += semantics)
    for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
        // Zero out the output quantum state
        HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
                          outputState,
                          0x0));
        if (verbose)
            std::cout << "Initialized the output quantum state to zero\n";
        HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
        const auto startTime = std::chrono::high_resolution_clock::now();
        HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
                          liouvillian.get(),
                          0.3,               // time point (some value)
                          batchSize,         // user-defined batch size
                          numParams,         // number of external user-defined Hamiltonian parameters
                          hamiltonianParams, // external Hamiltonian parameters in GPU memory
                          inputState,        // input quantum state
                          outputState,       // output quantum state
                          workspaceDescr,    // workspace descriptor
                          0x0));             // default CUDA stream
        HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
        const auto finishTime = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double> timeSec = finishTime - startTime;
        if (verbose)
            std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
    }

    // Compute the squared norm of the output quantum state
    void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
    HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
                      outputState,
                      norm2,
                      0x0));
    if (verbose) {
        std::cout << "Computed the output quantum state norm:\n";
        printArrayGPU<double>(norm2, batchSize);
    }

    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());

    // Destroy the norm2 array
    destroyArrayGPU(norm2);

    // Destroy workspace descriptor
    HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));

    // Destroy workspace buffer storage
    destroyArrayGPU(workspaceBuffer);

    // Destroy quantum states
    HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
    HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));

    // Destroy quantum state storage
    destroyArrayGPU(outputStateElems);
    destroyArrayGPU(inputStateElems);

    // Destroy external Hamiltonian parameters
    destroyArrayGPU(static_cast<void *>(hamiltonianParams));

    if (verbose)
        std::cout << "Destroyed resources\n" << std::flush;
}


int main(int argc, char ** argv)
{
    // Assign a GPU to the process
    HANDLE_CUDA_ERROR(cudaSetDevice(0));
    if (verbose)
        std::cout << "Set active device\n";

    // Create a library handle
    cudensitymatHandle_t handle;
    HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
    if (verbose)
        std::cout << "Created a library handle\n";

    // Run the example
    exampleWorkflow(handle);

    // Destroy the library handle
    HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
    if (verbose)
        std::cout << "Destroyed the library handle\n";

    HANDLE_CUDA_ERROR(cudaDeviceReset());

    // Done
    return 0;
}
Code example (parallel execution on multiple GPUs)¶
It is straightforward to adapt the main serial code to enable parallel execution across multiple GPU devices, potentially spanning many nodes. We will illustrate this with an example that uses the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions needed to enable distributed parallel execution without changing the original serial workflow code.
The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).
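Compared to the serial version, the only cuDensityMat-specific addition is a single call that resets the distributed configuration of the library handle right after the handle is created (excerpted from the full listing below):

#ifdef MPI_ENABLED
    MPI_Comm comm;
    HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
    HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
                      &comm, sizeof(comm)));
#endif

The remaining additions are standard MPI initialization/finalization and a round-robin GPU assignment (deviceId = procRank % numDevices).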
Here is the updated main code for multi-GPU runs.
/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"      // helper functions


// Transverse Ising Hamiltonian with double summation ordering
// and spin-operator fusion, plus fused dissipation terms
#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example


// MPI library (optional)
#ifdef MPI_ENABLED
#include <mpi.h>
#endif

#include <cmath>
#include <complex>
#include <cstdlib>
#include <type_traits>
#include <vector>
#include <chrono>
#include <iostream>
#include <cassert>


// Number of times to perform operator action on a quantum state
constexpr int NUM_REPEATS = 2;

// Logging verbosity
bool verbose = true;


// Example workflow
void exampleWorkflow(cudensitymatHandle_t handle)
{
    // Define the composite Hilbert space shape and
    // quantum state batch size (number of individual quantum states in a batched simulation)
    const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
    const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)

    if (verbose) {
        std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
        for (const auto & dimsn: spaceShape)
            std::cout << dimsn << ",";
        std::cout << ")" << std::endl;
        std::cout << "Quantum state batch size = " << batchSize << std::endl;
    }

    // Construct a user-defined Liouvillian operator using a convenience C++ class
    UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
    if (verbose)
        std::cout << "Constructed the Liouvillian operator\n";

    // Set and place external user-provided Hamiltonian parameters in GPU memory
    const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
    std::vector<double> cpuHamParams(numParams * batchSize);
    for (int64_t j = 0; j < batchSize; ++j) {
        for (int32_t i = 0; i < numParams; ++i) {
            cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
        }
    }
    auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));

    // Declare the input quantum state
    cudensitymatState_t inputState;
    HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      dataType,
                      &inputState));

    // Query the size of the quantum state storage
    std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
    HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
                      inputState,
                      1,              // only one storage component (tensor)
                      &storageSize)); // storage size in bytes
    const std::size_t stateVolume = storageSize / sizeof(NumericalType); // quantum state tensor volume (number of elements)
    if (verbose)
        std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;

    // Prepare some initial value for the input quantum state batch
    std::vector<NumericalType> inputStateValue(stateVolume);
    if constexpr (std::is_same_v<NumericalType, float>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = 1.0f / float(i+1); // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, double>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = 1.0 / double(i+1); // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
        }
    } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
        for (std::size_t i = 0; i < stateVolume; ++i) {
            inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
        }
    } else {
        std::cerr << "Error: Unsupported data type!\n";
        std::exit(1);
    }
    // Allocate initialized GPU storage for the input quantum state with prepared values
    auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
    if (verbose)
        std::cout << "Allocated input quantum state storage and initialized it to some value\n";

    // Attach initialized GPU storage to the input quantum state
    HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      inputState,
                      1,                                                // only one storage component (tensor)
                      std::vector<void*>({inputStateElems}).data(),     // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
    if (verbose)
        std::cout << "Constructed input quantum state\n";

    // Declare the output quantum state of the same shape
    cudensitymatState_t outputState;
    HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      dataType,
                      &outputState));

    // Allocate GPU storage for the output quantum state
    auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
    if (verbose)
        std::cout << "Allocated output quantum state storage\n";

    // Attach GPU storage to the output quantum state
    HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      outputState,
                      1,                                                // only one storage component (tensor)
                      std::vector<void*>({outputStateElems}).data(),    // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
    if (verbose)
        std::cout << "Constructed output quantum state\n";

    // Declare a workspace descriptor
    cudensitymatWorkspaceDescriptor_t workspaceDescr;
    HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));

    // Query free GPU memory
    std::size_t freeMem = 0, totalMem = 0;
    HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
    freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
    if (verbose)
        std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;

    // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
    const auto startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
                      liouvillian.get(),
                      inputState,
                      outputState,
                      CUDENSITYMAT_COMPUTE_64F, // GPU compute type
                      freeMem,                  // max available GPU free memory for the workspace
                      workspaceDescr,           // workspace descriptor
                      0x0));                    // default CUDA stream
    const auto finishTime = std::chrono::high_resolution_clock::now();
    const std::chrono::duration<double> timeSec = finishTime - startTime;
    if (verbose)
        std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;

    // Query the required workspace buffer size (bytes)
    std::size_t requiredBufferSize {0};
    HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      &requiredBufferSize));
    if (verbose)
        std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

    // Allocate GPU storage for the workspace buffer
    const std::size_t bufferVolume = requiredBufferSize / sizeof(NumericalType);
    auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
    if (verbose)
        std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

    // Attach the workspace buffer to the workspace descriptor
    HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      workspaceBuffer,
                      requiredBufferSize));
    if (verbose)
        std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

    // Apply the Liouvillian operator to the input quantum state
    // and accumulate its action into the output quantum state (note accumulative += semantics)
    for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
        // Zero out the output quantum state
        HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
                          outputState,
                          0x0));
        if (verbose)
            std::cout << "Initialized the output quantum state to zero\n";
        HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
        const auto startTime = std::chrono::high_resolution_clock::now();
        HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
                          liouvillian.get(),
                          0.3,               // time point (some value)
                          batchSize,         // user-defined batch size
                          numParams,         // number of external user-defined Hamiltonian parameters
                          hamiltonianParams, // external Hamiltonian parameters in GPU memory
                          inputState,        // input quantum state
                          outputState,       // output quantum state
                          workspaceDescr,    // workspace descriptor
                          0x0));             // default CUDA stream
        HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
        const auto finishTime = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double> timeSec = finishTime - startTime;
        if (verbose)
            std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
    }

    // Compute the squared norm of the output quantum state
    void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
    HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
                      outputState,
                      norm2,
                      0x0));
    if (verbose) {
        std::cout << "Computed the output quantum state norm:\n";
        printArrayGPU<double>(norm2, batchSize);
    }

    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());

    // Destroy the norm2 array
    destroyArrayGPU(norm2);

    // Destroy workspace descriptor
    HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));

    // Destroy workspace buffer storage
    destroyArrayGPU(workspaceBuffer);

    // Destroy quantum states
    HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
    HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));

    // Destroy quantum state storage
    destroyArrayGPU(outputStateElems);
    destroyArrayGPU(inputStateElems);

    // Destroy external Hamiltonian parameters
    destroyArrayGPU(static_cast<void *>(hamiltonianParams));

    if (verbose)
        std::cout << "Destroyed resources\n" << std::flush;
}


int main(int argc, char ** argv)
{
    // Initialize MPI library (if needed)
#ifdef MPI_ENABLED
    HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
    int procRank {-1};
    HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
    int numProcs {0};
    HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
    if (procRank != 0) verbose = false;
    if (verbose)
        std::cout << "Initialized MPI library\n";
#else
    const int procRank {0};
    const int numProcs {1};
#endif

    // Assign a GPU to the process
    int numDevices {0};
    HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
    const int deviceId = procRank % numDevices;
    HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
    if (verbose)
        std::cout << "Set active device\n";

    // Create a library handle
    cudensitymatHandle_t handle;
    HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
    if (verbose)
        std::cout << "Created a library handle\n";

    // Reset distributed configuration (once)
#ifdef MPI_ENABLED
    MPI_Comm comm;
    HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
    HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
                      &comm, sizeof(comm)));
#endif

    // Run the example
    exampleWorkflow(handle);

    // Synchronize MPI processes
#ifdef MPI_ENABLED
    HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
#endif

    // Destroy the library handle
    HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
    if (verbose)
        std::cout << "Destroyed the library handle\n";

    HANDLE_CUDA_ERROR(cudaDeviceReset());

    // Finalize the MPI library
#ifdef MPI_ENABLED
    HANDLE_MPI_ERROR(MPI_Finalize());
    if (verbose)
        std::cout << "Finalized MPI library\n";
#endif

    // Done
    return 0;
}
Code example (serial execution with backward differentiation)¶
The following code example illustrates how to use the cuDensityMat library to not only compute the action of a quantum many-body operator on a quantum state, but also backward-differentiate it (compute gradients) with respect to user-provided real parameters parameterizing the operator (one real parameter Omega in this example). The full sample code can be found in the NVIDIA/cuQuantum repository (main serial gradient code and operator definition for gradient as well as the utility code).
First let’s construct a specific quantum many-body operator which, in this case, is a slightly modified version of the quantum many-body operator used in the main serial code. Here we make both the h(t) and f(t) scalar coefficients depend on time and a single user-provided real parameter Omega via different (made-up) functional forms. In order to backward-differentiate the operator action with respect to this single user-provided real parameter Omega, we need to manually define a gradient callback function for each regular callback function we have (for h(t) and f(t) in this example). In our example, we define two CPU-side scalar gradient callback functions which compute the vector-Jacobian product (VJP) of the scalar adjoint of h(t) and f(t), respectively, with respect to the user-provided real parameter Omega. A gradient callback function is expected to accumulate the VJP result into the paramsGrad output array, whose final value will be the gradient(s) of the user-defined cost function with respect to the user-provided real parameters parameterizing the operator. As before, all regular and gradient callback functions used in our example explicitly expect the data type to be CUDA_C_64F (double-precision complex numbers). Updating the data type of the operator and quantum states would require updating the callback functions accordingly (CUDA_C_32F).
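Because each gradient callback is again a plain C function, its VJP contract can be checked on the host against a finite-difference approximation. The following minimal sketch is not part of the shipped sample (the adjoint value, Omega, and step size are made up, and it assumes the sample build setup); it compares the value accumulated by fCoefGradComplex64, defined in the header below, with a central finite difference of fCoefComplex64:

#include "transverse_ising_full_fused_noisy_grad.h" // fCoefComplex64 and fCoefGradComplex64 (assumes the sample build setup)
#include <cmath>
#include <cassert>

int main()
{
    const double t = 0.3, omega = 2.5, eps = 1e-6;         // made-up time point, parameter value, and FD step
    cuDoubleComplex adj = make_cuDoubleComplex(0.7, -0.4); // made-up coefficient adjoint dCost/dCoef

    // Analytic VJP accumulated by the gradient callback into paramsGrad
    double params[] = {omega};
    double grad = 0.0;
    assert(fCoefGradComplex64(t, 1, 1, params, CUDA_C_64F, &adj, &grad, 0x0) == 0);

    // Central finite-difference reference: 2*Re(adj * (f(Omega+eps) - f(Omega-eps)) / (2*eps))
    cuDoubleComplex fp, fm;
    double pp[] = {omega + eps}, pm[] = {omega - eps};
    assert(fCoefComplex64(t, 1, 1, pp, CUDA_C_64F, &fp, 0x0) == 0);
    assert(fCoefComplex64(t, 1, 1, pm, CUDA_C_64F, &fm, 0x0) == 0);
    const cuDoubleComplex df = make_cuDoubleComplex((cuCreal(fp) - cuCreal(fm)) / (2.0 * eps),
                                                    (cuCimag(fp) - cuCimag(fm)) / (2.0 * eps));
    const double gradRef = 2.0 * cuCreal(cuCmul(adj, df));
    assert(std::abs(grad - gradRef) < 1e-6); // analytic VJP matches the finite-difference estimate
    return 0;
}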
/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#pragma once

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"      // GPU helper functions

#include <cmath>
#include <complex>
#include <vector>
#include <iostream>
#include <cassert>


/* DESCRIPTION:
   Time-dependent transverse-field Ising Hamiltonian operator
   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
    H = sum_{i} {h_i(t) * X_i}            // transverse field sum of X_i operators with time-dependent h_i(t) coefficients
      + f(t) * sum_{i < j} {g_ij * ZZ_ij} // modulated sum of the fused ordered {Z_i * Z_j} terms with static g_ij coefficients
      + d * sum_{i} {Y_i * {..} * Y_i}    // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
   where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
*/

/** Define the numerical type and data type for the GPU computations (same) */
using NumericalType = std::complex<double>;     // do not change
constexpr cudaDataType_t dataType = CUDA_C_64F; // do not change


/** Example of a user-provided scalar CPU callback C function
 *  defining a time-dependent coefficient h_i(t) inside the Hamiltonian:
 *  h_i(t) = exp(-Omega * t)
 */
extern "C"
int32_t hCoefComplex64(
    double time,             //in: time point
    int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
    int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
    const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
    cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
    void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
    cudaStream_t stream)     //in: CUDA stream (default is 0x0)
{
    if (dataType == CUDA_C_64F) {
        auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
        for (int64_t i = 0; i < batchSize; ++i) {
            const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
            tdCoef[i] = make_cuDoubleComplex(std::exp((-omega) * time), 0.0); // value of the i-th instance of the coefficients batch
        }
    } else {
        return 1; // error code (1: Error)
    }
    return 0; // error code (0: Success)
}


/** User-provided gradient callback function for the user-provided
 *  scalar callback function with respect to its single parameter Omega.
 *  It accumulates a partial derivative 2*Re(dCost/dOmega) = 2*Re(dCost/dCoef * dCoef/dOmega),
 *  where:
 *   - Cost is some user-defined real scalar cost function,
 *   - dCost/dCoef is the adjoint of the cost function with respect to the coefficient (or their batch) associated with the callback function,
 *   - dCoef/dOmega is the gradient of the coefficient (or their batch) with respect to the parameter Omega:
 *     dCoef/dOmega = -time * exp(-Omega * time)
 */
extern "C"
int32_t hCoefGradComplex64(
    double time,             //in: time point
    int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
    int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
    const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
    cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
    void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
    double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
    cudaStream_t stream)     //in: CUDA stream (default is 0x0)
{
    if (dataType == CUDA_C_64F) {
        const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
        for (int64_t i = 0; i < batchSize; ++i) {
            const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
            paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
                2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(std::exp((-omega) * time) * (-time), 0.0)));
        }
    } else {
        return 1; // error code (1: Error)
    }
    return 0; // error code (0: Success)
}


/** Example of a user-provided scalar CPU callback C function
 *  defining a time-dependent coefficient f(t) inside the Hamiltonian:
 *  f(t) = exp(i * Omega * t) = cos(Omega * t) + i * sin(Omega * t)
 */
extern "C"
int32_t fCoefComplex64(
    double time,             //in: time point
    int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
    int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
    const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
    cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
    void * scalarStorage,    //inout: CPU-accessible storage for the returned coefficient value(s) of shape [0:batchSize-1]
    cudaStream_t stream)     //in: CUDA stream (default is 0x0)
{
    if (dataType == CUDA_C_64F) {
        auto * tdCoef = static_cast<cuDoubleComplex *>(scalarStorage); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
        for (int64_t i = 0; i < batchSize; ++i) {
            const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
            tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time)); // value of the i-th instance of the coefficients batch
        }
    } else {
        return 1; // error code (1: Error)
    }
    return 0; // error code (0: Success)
}


/** User-provided gradient callback function for the user-provided
 *  scalar callback function with respect to its single parameter Omega.
 *  It accumulates a partial derivative 2*Re(dCost/dOmega) = 2*Re(dCost/dCoef * dCoef/dOmega),
 *  where:
 *   - Cost is some user-defined real scalar cost function,
 *   - dCost/dCoef is the adjoint of the cost function with respect to the coefficient associated with the callback function,
 *   - dCoef/dOmega is the gradient of the coefficient with respect to the parameter Omega:
 *     dCoef/dOmega = i * time * exp(i * Omega * time)
 *                  = i * time * (cos(Omega * time) + i * sin(Omega * time))
 *                  = -time * sin(Omega * time) + i * time * cos(Omega * time)
 */
extern "C"
int32_t fCoefGradComplex64(
    double time,             //in: time point
    int64_t batchSize,       //in: user-defined batch size (number of coefficients in the batch)
    int32_t numParams,       //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, Omega)
    const double * params,   //in: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of user-provided Hamiltonian parameters for all instances of the batch
    cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this specific callback function)
    void * scalarGrad,       //inout: CPU-accessible storage for the adjoint(s) of the coefficient(s) of shape [0:batchSize-1]
    double * paramsGrad,     //inout: params[0:numParams-1][0:batchSize-1]: GPU-accessible F-ordered array of the returned gradient(s) of the parameter(s)
    cudaStream_t stream)     //in: CUDA stream (default is 0x0)
{
    if (dataType == CUDA_C_64F) {
        const auto * tdCoefAdjoint = static_cast<const cuDoubleComplex *>(scalarGrad); // casting to cuDoubleComplex because this callback function expects CUDA_C_64F data type
        for (int64_t i = 0; i < batchSize; ++i) {
            const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
            paramsGrad[i * numParams + 0] += // IMPORTANT: Accumulate the partial derivative for the i-th instance of the batch, not overwrite it!
                2.0 * cuCreal(cuCmul(tdCoefAdjoint[i], make_cuDoubleComplex(-std::sin(omega * time) * time, std::cos(omega * time) * time)));
        }
    } else {
        return 1; // error code (1: Error)
    }
    return 0; // error code (0: Success)
}


/** Convenience class which encapsulates a user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
 *  - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 *  - Method `get()` returns a reference to the constructed Liouvillian operator
 *  - Destructor releases all resources used by the Liouvillian operator
 */
class UserDefinedLiouvillian final
{
private:
    // Data members
    cudensitymatHandle_t handle;             // library context handle
    int64_t stateBatchSize;                  // quantum state batch size
    const std::vector<int64_t> spaceShape;   // Hilbert space shape (extents of the modes of the composite Hilbert space)
    void * spinXelems {nullptr};             // elements of the X spin operator in GPU RAM (F-order storage)
    void * spinYYelems {nullptr};            // elements of the fused YY two-spin operator in GPU RAM (F-order storage)
    void * spinZZelems {nullptr};            // elements of the fused ZZ two-spin operator in GPU RAM (F-order storage)
    cudensitymatElementaryOperator_t spinX;  // X spin operator (elementary tensor operator)
    cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator (elementary tensor operator)
    cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator (elementary tensor operator)
    cudensitymatOperatorTerm_t oneBodyTerm;  // operator term: H1 = sum_{i} {h_i(t) * X_i} (one-body term)
    cudensitymatOperatorTerm_t twoBodyTerm;  // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij} (two-body term)
    cudensitymatOperatorTerm_t noiseTerm;    // operator term: D1 = d * sum_{i} {YY_ii} // Y_i operators act from different sides on the density matrix (two-body mixed term)
    cudensitymatOperator_t liouvillian;      // full operator: (-i * (H1 + H2) * {..}) + (i * {..} * (H1 + H2)) + D1{..} (super-operator)

public:

    // Constructor constructs a user-defined Liouvillian operator
    UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,             // library context handle
                           const std::vector<int64_t> & hilbertSpaceShape, // Hilbert space shape
                           int64_t batchSize):                             // batch size for the quantum state
        handle(contextHandle), stateBatchSize(batchSize), spaceShape(hilbertSpaceShape)
    {
        // Define the necessary operator tensors in GPU memory (F-order storage!)
        spinXelems = createInitializeArrayGPU<NumericalType>( // X[i0; j0]
            {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
             {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X

        spinYYelems = createInitializeArrayGPU<NumericalType>( // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
            {{0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},   // 1st column of matrix YY
             {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},    // 2nd column of matrix YY
             {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},    // 3rd column of matrix YY
             {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY

        spinZZelems = createInitializeArrayGPU<NumericalType>( // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
            {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},    // 1st column of matrix ZZ
             {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix ZZ
             {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
             {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}});  // 4th column of matrix ZZ

        // Construct the necessary Elementary Tensor Operators
        //  X_i operator
        HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
207 1, // one-body operator
208 std::vector<int64_t>({2}).data(), // acts in tensor space of shape {2}
209 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
210 0, // 0 for dense tensors
211 nullptr, // nullptr for dense tensors
212 dataType, // data type
213 spinXelems, // tensor elements in GPU memory
214 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
215 cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
216 &spinX)); // the created elementary tensor operator
217 // ZZ_ij = Z_i * Z_j fused operator
218 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
219 2, // two-body operator
220 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
221 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
222 0, // 0 for dense tensors
223 nullptr, // nullptr for dense tensors
224 dataType, // data type
225 spinZZelems, // tensor elements in GPU memory
226 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
227 cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
228 &spinZZ)); // the created elementary tensor operator
229 // YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
230 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
231 2, // two-body operator
232 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
233 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
234 0, // 0 for dense tensors
235 nullptr, // nullptr for dense tensors
236 dataType, // data type
237 spinYYelems, // tensor elements in GPU memory
238 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
239 cudensitymatTensorGradientCallbackNone, // no tensor gradient callback function
240 &spinYY)); // the created elementary tensor operator
241
242 // Construct the necessary Operator Terms from tensor products of Elementary Tensor Operators
243 // Create an empty operator term
244 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
245 spaceShape.size(), // Hilbert space rank (number of modes)
246 spaceShape.data(), // Hilbert space shape (mode extents)
247 &oneBodyTerm)); // the created empty operator term
248 // Define the operator term: H1 = sum_{i} {h_i(t) * X_i}
249 for (int32_t i = 0; i < spaceShape.size(); ++i) {
250 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
251 oneBodyTerm,
252 1, // number of elementary tensor operators in the product
253 std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
254 std::vector<int32_t>({i}).data(), // space modes acted on by the operator product
255 std::vector<int32_t>({0}).data(), // space mode action duality (0: from the left; 1: from the right)
256 make_cuDoubleComplex(1.0, 0.0), // static coefficient part: Always 64-bit-precision complex number
257 {hCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator product
258 {hCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
259 CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
260 }
261 // Create an empty operator term
262 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
263 spaceShape.size(), // Hilbert space rank (number of modes)
264 spaceShape.data(), // Hilbert space shape (mode extents)
265 &twoBodyTerm)); // the created empty operator term
266 // Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
267 for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
268 for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
269 const double g_ij = -1.0 / static_cast<double>(i + j + 1); // assign some value to the time-independent g_ij coefficient
270 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
271 twoBodyTerm,
272 1, // number of elementary tensor operators in the product
273 std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
274 std::vector<int32_t>({i, j}).data(), // space modes acted on by the operator product
275 std::vector<int32_t>({0, 0}).data(), // space mode action duality (0: from the left; 1: from the right)
276 make_cuDoubleComplex(g_ij, 0.0), // g_ij static coefficient: Always 64-bit-precision complex number
277 cudensitymatScalarCallbackNone, // no time-dependent coefficient associated with this operator product
278 cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator product
279 }
280 }
281 // Create an empty operator term
282 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
283 spaceShape.size(), // Hilbert space rank (number of modes)
284 spaceShape.data(), // Hilbert space shape (mode extents)
285 &noiseTerm)); // the created empty operator term
286 // Define the operator term: D1 = d * sum_{i} {YY_ii}
287 for (int32_t i = 0; i < spaceShape.size(); ++i) {
288 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
289 noiseTerm,
290 1, // number of elementary tensor operators in the product
291 std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
292 std::vector<int32_t>({i, i}).data(), // space modes acted on by the operator product (from different sides)
293 std::vector<int32_t>({0, 1}).data(), // space mode action duality (0: from the left; 1: from the right)
294 make_cuDoubleComplex(1.0, 0.0), // default coefficient: Always 64-bit-precision complex number
295 cudensitymatScalarCallbackNone, // no time-dependent coefficient associated with this operator product
296 cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with this operator product
297 }
298
299 // Construct the full Liouvillian operator as a sum of the operator terms
300 // Create an empty operator (super-operator)
301 HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
302 spaceShape.size(), // Hilbert space rank (number of modes)
303 spaceShape.data(), // Hilbert space shape (modes extents)
304 &liouvillian)); // the created empty operator (super-operator)
305 // Append an operator term to the operator (super-operator)
306 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
307 liouvillian,
308 oneBodyTerm, // appended operator term
309 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
310 make_cuDoubleComplex(0.0, -1.0), // -i constant
311 cudensitymatScalarCallbackNone, // no time-dependent coefficient associated with the operator term as a whole
312 cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
313 // Append an operator term to the operator (super-operator)
314 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
315 liouvillian,
316 twoBodyTerm, // appended operator term
317 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
318 make_cuDoubleComplex(0.0, -1.0), // -i constant
319 {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
320 {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
321 CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
322 // Append an operator term to the operator (super-operator)
323 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
324 liouvillian,
325 oneBodyTerm, // appended operator term
326 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
327 make_cuDoubleComplex(0.0, 1.0), // +i constant
328 cudensitymatScalarCallbackNone, // no time-dependent coefficient associated with the operator term as a whole
329 cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
330 // Append an operator term to the operator (super-operator)
331 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
332 liouvillian,
333 twoBodyTerm, // appended operator term
334 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
335 make_cuDoubleComplex(0.0, 1.0), // +i constant
336 {fCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr}, // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
337 {fCoefGradComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr,
338 CUDENSITYMAT_DIFFERENTIATION_DIR_BACKWARD})); // CPU scalar gradient callback function defining the gradient of the coefficient with respect to the parameter Omega
339 // Append an operator term to the operator (super-operator)
340 const double d = 1.0; // assign some value to the time-independent coefficient
341 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
342 liouvillian,
343 noiseTerm, // appended operator term
344 0, // operator term action duality as a whole (no duality reversing in this case)
345 make_cuDoubleComplex(d, 0.0), // static coefficient associated with the operator term as a whole
346 cudensitymatScalarCallbackNone, // no time-dependent coefficient associated with the operator term as a whole
347 cudensitymatScalarGradientCallbackNone)); // no coefficient gradient associated with the operator term as a whole
348 }
349
350 // Destructor destructs the user-defined Liouvillian operator
351 ~UserDefinedLiouvillian()
352 {
353 // Destroy the Liouvillian operator
354 HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
355
356 // Destroy operator terms
357 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
358 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
359 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
360
361 // Destroy elementary tensor operators
362 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
363 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
364 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
365
366 // Destroy operator tensors
367 destroyArrayGPU(spinYYelems);
368 destroyArrayGPU(spinZZelems);
369 destroyArrayGPU(spinXelems);
370 }
371
372 // Disable copy constructor/assignment (GPU resources are private, no deep copy)
373 UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
374 UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
375 UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
376 UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;
377
378 /** Returns the number of externally provided Hamiltonian parameters. */
379 int32_t getNumParameters() const
380 {
381 return 1; // one parameter Omega
382 }
383
384 /** Get access to the constructed Liouvillian operator. */
385 cudensitymatOperator_t & get()
386 {
387 return liouvillian;
388 }
389
390};
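Sign conventions in hand-derived gradients like the dCoef/dOmega expression above are easy to get wrong, so it can be worth sanity-checking the analytic derivative against a finite difference before registering the gradient callback. The following standalone snippet is a minimal sketch of such a check; it is independent of cuDensityMat, and the values chosen for time, Omega, and the step size are arbitrary:

#include <complex>
#include <cmath>
#include <cstdio>

int main()
{
  const double time = 0.3;  // time point (same value as used in the example below)
  const double omega = 1.0; // some value for the parameter Omega
  const double h = 1e-6;    // finite-difference step

  const std::complex<double> I(0.0, 1.0);
  auto coef = [&](double w) { return std::exp(I * w * time); }; // the coefficient f(t) = exp(i * Omega * t)

  // Analytic derivative used by fCoefGradComplex64: i * time * exp(i * Omega * time)
  const std::complex<double> analytic = I * time * coef(omega);

  // Central finite difference of the coefficient with respect to Omega
  const std::complex<double> numeric = (coef(omega + h) - coef(omega - h)) / (2.0 * h);

  std::printf("analytic = (%g, %g)\n", analytic.real(), analytic.imag());
  std::printf("numeric  = (%g, %g)\n", numeric.real(), numeric.imag());
  return 0;
}

For a correct derivative, the two printed values should agree to roughly the accuracy of the central difference (here about 1e-12).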
Now we can use the defined quantum many-body operator in our main code to compute its action on a mixed quantum state and then backward-differentiate it (compute gradients) with respect to the user-provided real parameter Omega. For the sake of simplicity, we pass a made-up adjoint of the output quantum state to the cudensitymatOperatorComputeActionBackwardDiff() call, namely the output quantum state itself (in real scenarios, the adjoint of the output quantum state depends on the user-chosen cost function and is provided by the user). Upon completion of the cudensitymatOperatorComputeActionBackwardDiff() call, the paramsGrad output argument will contain the gradient of the user-defined cost function with respect to the user-provided real parameter Omega. Additionally, the backward-differentiation API call also returns the adjoint of the input quantum state for cases where the input quantum state implicitly depends on the user-provided real parameters (for example, when the input quantum state comes from a previous operator action step, which is typical for time integration of quantum dynamics master equations). Note that both output arguments, namely paramsGrad and stateInAdj, are accumulative, i.e., they are accumulated into rather than overwritten (it is the user's responsibility to zero them out before the first call).
/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"      // helper functions


// Transverse Ising Hamiltonian with double summation ordering
// and spin-operator fusion, plus fused dissipation terms
#include "transverse_ising_full_fused_noisy_grad.h" // user-defined Liouvillian operator example

#include <cmath>
#include <complex>
#include <vector>
#include <chrono>
#include <iostream>
#include <cassert>


// Number of times to perform operator action on a quantum state
constexpr int NUM_REPEATS = 2;

// Logging verbosity
bool verbose = true;


// Example workflow
void exampleWorkflow(cudensitymatHandle_t handle)
{
  // Define the composite Hilbert space shape and
  // quantum state batch size (number of individual quantum states in a batched simulation)
  const std::vector<int64_t> spaceShape({2,2,2,2}); // dimensions of quantum degrees of freedom
  const int64_t batchSize = 1;                      // number of quantum states per batch

  if (verbose) {
    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
    for (const auto & dimsn: spaceShape)
      std::cout << dimsn << ",";
    std::cout << ")" << std::endl;
    std::cout << "Quantum state batch size = " << batchSize << std::endl;
  }

  // Construct a user-defined Liouvillian operator using a convenience C++ class
  // Note that the constructed Liouvillian operator has some batched coefficients
  UserDefinedLiouvillian liouvillian(handle, spaceShape, batchSize);
  if (verbose)
    std::cout << "Constructed the Liouvillian operator\n";

  // Set and place external user-provided Hamiltonian parameters in GPU memory
  const int32_t numParams = liouvillian.getNumParameters(); // number of external user-provided Hamiltonian parameters
  if (verbose)
    std::cout << "Number of external user-provided Hamiltonian parameters = " << numParams << std::endl;
  std::vector<double> cpuHamParams(numParams * batchSize);
  for (int64_t j = 0; j < batchSize; ++j) {
    for (int32_t i = 0; i < numParams; ++i) {
      cpuHamParams[j * numParams + i] = double(i+1) / double(j+1); // just setting some parameter values for each instance of the batch
    }
  }
  auto * hamiltonianParams = static_cast<double *>(createInitializeArrayGPU(cpuHamParams));
  if (verbose)
    std::cout << "Created an array of external user-provided Hamiltonian parameters in GPU memory\n";

  // Create an array of gradients for the user-provided Hamiltonian parameters in GPU memory
  std::vector<double> cpuHamParamsGrad(numParams * batchSize, 0.0);
  auto * hamiltonianParamsGrad = static_cast<double *>(createInitializeArrayGPU(cpuHamParamsGrad));
  if (verbose)
    std::cout << "Created an array of gradients for the external user-provided Hamiltonian parameters in GPU memory\n";

  // Declare the input quantum state
  cudensitymatState_t inputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
    CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
    spaceShape.size(),
    spaceShape.data(),
    batchSize,
    dataType,
    &inputState));

  // Query the size of the quantum state storage
  std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
    inputState,
    1,              // only one storage component (tensor)
    &storageSize)); // storage size in bytes
  const std::size_t stateVolume = storageSize / sizeof(NumericalType); // quantum state tensor volume (number of elements)
  if (verbose)
    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;

  // Prepare some initial value for the input quantum state batch
  std::vector<NumericalType> inputStateValue(stateVolume);
  if constexpr (std::is_same_v<NumericalType, float>) {
    for (std::size_t i = 0; i < stateVolume; ++i) {
      inputStateValue[i] = 1.0f / float(i+1); // just some value
    }
  } else if constexpr (std::is_same_v<NumericalType, double>) {
    for (std::size_t i = 0; i < stateVolume; ++i) {
      inputStateValue[i] = 1.0 / double(i+1); // just some value
    }
  } else if constexpr (std::is_same_v<NumericalType, std::complex<float>>) {
    for (std::size_t i = 0; i < stateVolume; ++i) {
      inputStateValue[i] = NumericalType{1.0f / float(i+1), -1.0f / float(i+2)}; // just some value
    }
  } else if constexpr (std::is_same_v<NumericalType, std::complex<double>>) {
    for (std::size_t i = 0; i < stateVolume; ++i) {
      inputStateValue[i] = NumericalType{1.0 / double(i+1), -1.0 / double(i+2)}; // just some value
    }
  } else {
    std::cerr << "Error: Unsupported data type!\n";
    std::exit(1);
  }
  // Allocate initialized GPU storage for the input quantum state with prepared values
  auto * inputStateElems = createInitializeArrayGPU(inputStateValue);
  if (verbose)
    std::cout << "Allocated input quantum state storage and initialized it to some value\n";

  // Attach initialized GPU storage to the input quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
    inputState,
    1,                                                // only one storage component (tensor)
    std::vector<void*>({inputStateElems}).data(),     // pointer to the GPU storage for the quantum state
    std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed input quantum state\n";

  // Declare the output quantum state of the same shape
  cudensitymatState_t outputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
    CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
    spaceShape.size(),
    spaceShape.data(),
    batchSize,
    dataType,
    &outputState));

  // Allocate GPU storage for the output quantum state
  auto * outputStateElems = createArrayGPU<NumericalType>(stateVolume);
  if (verbose)
    std::cout << "Allocated output quantum state storage\n";

  // Attach GPU storage to the output quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
    outputState,
    1,                                                // only one storage component (tensor)
    std::vector<void*>({outputStateElems}).data(),    // pointer to the GPU storage for the quantum state
    std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed output quantum state\n";

  // Declare the adjoint input quantum state of the same shape
  cudensitymatState_t inputStateAdj;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
    CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
    spaceShape.size(),
    spaceShape.data(),
    batchSize,
    dataType, // data type must match that of the operators created above
    &inputStateAdj));

  // Allocate GPU storage for the adjoint input quantum state
  auto * inputStateAdjElems = createArrayGPU<NumericalType>(stateVolume);
  if (verbose)
    std::cout << "Allocated adjoint input quantum state storage\n";

  // Attach GPU storage to the adjoint input quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
    inputStateAdj,
    1,                                                // only one storage component (tensor)
    std::vector<void*>({inputStateAdjElems}).data(),  // pointer to the GPU storage for the quantum state
    std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed adjoint input quantum state\n";

  // Declare a workspace descriptor
  cudensitymatWorkspaceDescriptor_t workspaceDescr;
  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));

  // Query free GPU memory
  std::size_t freeMem = 0, totalMem = 0;
  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.45); // take 45% of the free memory for the workspace buffer
  if (verbose)
    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;

  // Allocate GPU storage for the workspace buffer
  const std::size_t bufferVolume = freeMem / sizeof(NumericalType);
  auto * workspaceBuffer = createArrayGPU<NumericalType>(bufferVolume);
  if (verbose)
    std::cout << "Allocated workspace buffer of size (bytes) = " << freeMem << std::endl;

  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
  auto startTime = std::chrono::high_resolution_clock::now();
  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
    liouvillian.get(),
    inputState,
    outputState,
    CUDENSITYMAT_COMPUTE_64F, // GPU compute type
    freeMem,                  // max available GPU free memory for the workspace
    workspaceDescr,           // workspace descriptor
    0x0));                    // default CUDA stream
  auto finishTime = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> timeSec = finishTime - startTime;
  if (verbose)
    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;

  // Query the required workspace buffer size (bytes)
  std::size_t requiredBufferSize {0};
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
    workspaceDescr,
    CUDENSITYMAT_MEMSPACE_DEVICE,
    CUDENSITYMAT_WORKSPACE_SCRATCH,
    &requiredBufferSize));
  if (verbose)
    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

  if (requiredBufferSize > freeMem) {
    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
    std::exit(1);
  }

  // Attach the workspace buffer to the workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
    workspaceDescr,
    CUDENSITYMAT_MEMSPACE_DEVICE,
    CUDENSITYMAT_WORKSPACE_SCRATCH,
    workspaceBuffer,
    requiredBufferSize));
  if (verbose)
    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Apply the Liouvillian operator to the input quantum state
  // and accumulate its action into the output quantum state (note the accumulative += semantics)
  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
    // Zero out the output quantum state
    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
      outputState,
      0x0));
    if (verbose)
      std::cout << "Initialized the output quantum state to zero\n";
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
      liouvillian.get(),
      0.3,               // time point (some value)
      batchSize,         // user-defined batch size
      numParams,         // number of external user-defined Hamiltonian parameters
      hamiltonianParams, // external Hamiltonian parameters in GPU memory
      inputState,        // input quantum state
      outputState,       // output quantum state
      workspaceDescr,    // workspace descriptor
      0x0));             // default CUDA stream
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    finishTime = std::chrono::high_resolution_clock::now();
    timeSec = finishTime - startTime;
    if (verbose)
      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
  }

  // Compute the squared norm of the output quantum state
  void * norm2 = createInitializeArrayGPU(std::vector<double>(batchSize, 0.0));
  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
    outputState,
    norm2,
    0x0));
  if (verbose) {
    std::cout << "Computed the output quantum state norm:\n";
    printArrayGPU<double>(norm2, batchSize);
  }

  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());

  // Prepare the Liouvillian operator action backward differentiation (needs to be done only once)
  startTime = std::chrono::high_resolution_clock::now();
  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareActionBackwardDiff(handle,
    liouvillian.get(),
    inputState,
    outputState,              // adjoint output quantum state is always congruent to the output quantum state
    CUDENSITYMAT_COMPUTE_64F, // GPU compute type
    freeMem,                  // max available GPU free memory for the workspace buffer
    workspaceDescr,           // workspace descriptor
    0x0));                    // default CUDA stream
  finishTime = std::chrono::high_resolution_clock::now();
  timeSec = finishTime - startTime;
  if (verbose)
    std::cout << "Operator action backward differentiation preparation time (sec) = " << timeSec.count() << std::endl;

  // Query the required workspace buffer size (bytes)
  requiredBufferSize = 0;
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
    workspaceDescr,
    CUDENSITYMAT_MEMSPACE_DEVICE,
    CUDENSITYMAT_WORKSPACE_SCRATCH,
    &requiredBufferSize));
  if (verbose)
    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

  if (requiredBufferSize > freeMem) {
    std::cerr << "Error: Required workspace buffer size is greater than the available GPU free memory!\n";
    std::exit(1);
  }

  // Attach the workspace buffer to the workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
    workspaceDescr,
    CUDENSITYMAT_MEMSPACE_DEVICE,
    CUDENSITYMAT_WORKSPACE_SCRATCH,
    workspaceBuffer,
    requiredBufferSize));
  if (verbose)
    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Liouvillian operator action backward differentiation:
  // The adjoint output quantum state, which is always congruent to the output quantum state,
  // depends on the user-defined cost function, so here we simply pass the previously computed output quantum state.
  // In real-life applications, the user will pass their adjoint output quantum state, computed for their cost function.
  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
    // Zero out the adjoint input quantum state and gradients
    HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
      inputStateAdj,
      0x0));
    initializeArrayGPU(std::vector<double>(numParams * batchSize, 0.0), hamiltonianParamsGrad);
    if (verbose)
      std::cout << "Initialized the adjoint input quantum state and gradients to zero\n";
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeActionBackwardDiff(handle,
      liouvillian.get(),
      0.3,                   // time point (some value)
      batchSize,             // user-defined batch size
      numParams,             // number of external user-defined Hamiltonian parameters
      hamiltonianParams,     // external Hamiltonian parameters in GPU memory
      inputState,            // input quantum state
      outputState,           // adjoint output quantum state (here we just pass the previously computed output quantum state for simplicity)
      inputStateAdj,         // adjoint input quantum state
      hamiltonianParamsGrad, // partial gradients with respect to the user-defined real parameters
      workspaceDescr,        // workspace descriptor
      0x0));                 // default CUDA stream
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    finishTime = std::chrono::high_resolution_clock::now();
    timeSec = finishTime - startTime;
    if (verbose)
      std::cout << "Operator action backward differentiation computation time (sec) = " << timeSec.count() << std::endl;
  }

  // Compute the squared norm of the adjoint input quantum state
  initializeArrayGPU(std::vector<double>(batchSize, 0.0), norm2);
  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
    inputStateAdj,
    norm2,
    0x0));
  if (verbose) {
    std::cout << "Computed the adjoint input quantum state norm:\n";
    printArrayGPU<double>(norm2, batchSize);
    std::cout << "Hamiltonian parameters gradients:\n";
    printArrayGPU<double>(hamiltonianParamsGrad, numParams * batchSize);
  }

  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());

  // Destroy the norm2 array
  destroyArrayGPU(norm2);

  // Destroy workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));

  // Destroy workspace buffer storage
  destroyArrayGPU(workspaceBuffer);

  // Destroy quantum states
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputStateAdj));
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));

  // Destroy quantum state storage
  destroyArrayGPU(inputStateAdjElems);
  destroyArrayGPU(outputStateElems);
  destroyArrayGPU(inputStateElems);

  // Destroy external Hamiltonian parameters
  destroyArrayGPU(static_cast<void *>(hamiltonianParamsGrad));
  destroyArrayGPU(static_cast<void *>(hamiltonianParams));

  if (verbose)
    std::cout << "Destroyed resources\n" << std::flush;
}


int main(int argc, char ** argv)
{
  // Assign a GPU to the process
  HANDLE_CUDA_ERROR(cudaSetDevice(0));
  if (verbose)
    std::cout << "Set active device\n";

  // Create a library handle
  cudensitymatHandle_t handle;
  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
  if (verbose)
    std::cout << "Created a library handle\n";

  // Run the example
  exampleWorkflow(handle);

  // Destroy the library handle
  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
  if (verbose)
    std::cout << "Destroyed the library handle\n";

  HANDLE_CUDA_ERROR(cudaDeviceReset());

  // Done
  return 0;
}
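The gradient example is compiled in the same way as the serial example from the Compiling code section. Assuming the main code above is saved as operator_action_gradient_example.cpp (an illustrative file name; use the actual file name from the NVIDIA/cuQuantum repository), the command would read:

nvcc operator_action_gradient_example.cpp -I${CUQUANTUM_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensornet -lcutensor -lcublas -o operator_action_gradient_example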
Useful tips¶
For debugging, one can set the environment variable CUDENSITYMAT_LOG_LEVEL=n. The level n = 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout.
| Level | Summary           | Long Description                                                                                      |
|-------|-------------------|-------------------------------------------------------------------------------------------------------|
| 0     | Off               | Logging is disabled (default)                                                                           |
| 1     | Errors            | Only errors will be logged                                                                              |
| 2     | Performance Trace | API calls that launch CUDA kernels will log their parameters and important information                  |
| 3     | Performance Hints | Hints that can potentially improve the application's performance                                        |
| 4     | Heuristics Trace  | Provides general information about the library execution, may contain details about heuristic status    |
| 5     | API Trace         | API calls will log their parameters and important information                                           |
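For example, to capture a full API trace in a custom log file while running the serial example (the log file path here is arbitrary):

export CUDENSITYMAT_LOG_LEVEL=5
export CUDENSITYMAT_LOG_FILE=./cudensitymat_api_trace.log
./operator_action_example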