Examples¶
In this section, we show an example on how to define a quantum operator, quantum states,
and then compute the action of the quantum operator on a quantum state. For clarity,
the quantum operator is defined inside a separate header transverse_ising_full_fused_noisy.h
where it is wrapped in a helper C++ class UserDefinedLiouvillian
. We also provide
a utility header helpers.h
containing convenient GPU array instantiation functions.
Compiling code¶
Assuming cuQuantum has been extracted in CUQUANTUM_ROOT
and cuTENSOR in CUTENSOR_ROOT
, we update the library path as follows:
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}
Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11
).
The serial sample code discussed below (operator_action_example.cpp
) can be compiled via the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example
For static linking against the cuDensityMat library, use the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example
In order to build parallel (MPI) versions of the example operator_action_mpi_example.cpp
,
one will need to have an CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then
set the environment variable $CUDENSITYMAT_COMM_LIB
to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so
.
The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so
can be built
inside the ${CUQUANTUM_ROOT}/distributed_interfaces
folder by calling the provided build script there.
In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include
and -L${MPI_PATH}/lib -lmpi
to the build command:
nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example
Warning
When running operator_action_mpi_example.cpp
without CUDA-aware MPI, the program will crash.
Note
Depending on the source of the cuQuantum package, you may need to replace lib
above by lib64
,
depending which folder name is used inside your cuQuantum package.
Code example (serial execution on a single GPU)¶
The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).
First let’s introduce a helper class to construct a quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and and additional noise term.
1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#pragma once
7
8#include <cudensitymat.h> // cuDensityMat library header
9#include "helpers.h" // GPU helper functions
10
11#include <cmath>
12#include <complex>
13#include <vector>
14#include <iostream>
15#include <cassert>
16
17
18/* DESCRIPTION:
19 Time-dependent transverse-field Ising Hamiltonian operator
20 with ordered and fused ZZ terms, plus fused unitary dissipation terms:
21 H = sum_{i} {h_i * X_i} // transverse field sum of X_i operators with h_i coefficients
22 + f(t) * sum_{i < j} {g_ij * ZZ_ij} // modulated sum of the fused ordered {Z_i * Z_j} terms with g_ij coefficients
23 + d * sum_{i} {Y_i * {..} * Y_i} // scaled sum of the dissipation terms {Y_i * {..} * Y_i} fused into the YY_ii super-operators
24 where {..} is the placeholder for the density matrix to show that the Y_i operators act from different sides.
25*/
26
27
28/** Example of a user-provided scalar CPU callback C function
29 * defining a time-dependent coefficient inside the Hamiltonian:
30 * f(t) = cos(omega * t) + i * sin(omega * t)
31 */
32extern "C"
33int32_t tdCoefComplex64(double time, //in: time point
34 int64_t batchSize, //in: user-defined batch size (number of coefficients in the batch)
35 int32_t numParams, //in: number of external user-provided Hamiltonian parameters (this function expects one parameter, omega)
36 const double * params, //in: params[0:numParams-1][0:batchSize-1]: User-provided Hamiltonian parameters for all instances of the batch
37 cudaDataType_t dataType, //in: data type (expecting CUDA_C_64F in this function)
38 void * scalarStorage, //inout: CPU-accessible storage for the returned function value(s) of shape [0:batchSize-1]
39 cudaStream_t stream) //in: CUDA stream (default is 0x0)
40{
41 auto * tdCoef = static_cast<cuDoubleComplex*>(scalarStorage); // casting to cuDoubleComplex because the function returns CUDA_C_64F data type
42 for (int64_t i = 0; i < batchSize; ++i) {
43 const auto omega = params[i * numParams + 0]; // params[0][i]: 0-th parameter for i-th instance of the batch
44 tdCoef[i] = make_cuDoubleComplex(std::cos(omega * time), std::sin(omega * time));
45 }
46 return 0; // error code (0: Success)
47}
48
49
50/** Convenience class which encapsulates the user-defined Liouvillian operator (system Hamiltonian + dissipation terms):
51 * - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
52 * - Method get() returns a reference to the constructed Liouvillian operator
53 * - Destructor releases all resources used by the Liouvillian operator
54 */
55class UserDefinedLiouvillian final
56{
57private:
58 // Data members
59 cudensitymatHandle_t handle; // library context handle
60 const std::vector<int64_t> spaceShape; // Hilbert space shape (extents of the modes of the composite Hilbert space)
61 void * spinXelems {nullptr}; // elements of the X spin operator in GPU RAM
62 void * spinYYelems {nullptr}; // elements of the fused YY two-spin operator in GPU RAM
63 void * spinZZelems {nullptr}; // elements of the fused ZZ two-spin operator in GPU RAM
64 cudensitymatElementaryOperator_t spinX; // X spin operator
65 cudensitymatElementaryOperator_t spinYY; // fused YY two-spin operator
66 cudensitymatElementaryOperator_t spinZZ; // fused ZZ two-spin operator
67 cudensitymatOperatorTerm_t oneBodyTerm; // operator term: H1 = sum_{i} {h_i * X_i}
68 cudensitymatOperatorTerm_t twoBodyTerm; // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
69 cudensitymatOperatorTerm_t noiseTerm; // operator term: D1 = d * sum_{i} {YY_ii} // Y_i operators act from different sides on the density matrix
70 cudensitymatOperator_t liouvillian; // full operator: (-i * (H1 + H2) * rho) + (i * rho * (H1 + H2)) + D1
71
72public:
73
74 // Constructor constructs a user-defined Liouvillian operator
75 UserDefinedLiouvillian(cudensitymatHandle_t contextHandle, // library context handle
76 const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
77 handle(contextHandle), spaceShape(hilbertSpaceShape)
78 {
79 // Define the necessary operator tensors in GPU memory (F-order storage!)
80 spinXelems = createArrayGPU<std::complex<double>>( // X[i0; j0]
81 {{0.0, 0.0}, {1.0, 0.0}, // 1st column of matrix X
82 {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
83
84 spinYYelems = createArrayGPU<std::complex<double>>( // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
85 {{0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, // 1st column of matrix YY
86 {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix YY
87 {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix YY
88 {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
89
90 spinZZelems = createArrayGPU<std::complex<double>>( // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
91 {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 1st column of matrix ZZ
92 {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix ZZ
93 {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix ZZ
94 {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}}); // 4th column of matrix ZZ
95
96 // Construct the necessary Elementary Tensor Operators
97 // X_i operator
98 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
99 1, // one-body operator
100 std::vector<int64_t>({2}).data(), // acts in tensor space of shape {2}
101 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
102 0, // 0 for dense tensors
103 nullptr, // nullptr for dense tensors
104 CUDA_C_64F, // data type
105 spinXelems, // tensor elements in GPU memory
106 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
107 &spinX)); // the created elementary tensor operator
108 // ZZ_ij = Z_i * Z_j fused operator
109 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
110 2, // two-body operator
111 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
112 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
113 0, // 0 for dense tensors
114 nullptr, // nullptr for dense tensors
115 CUDA_C_64F, // data type
116 spinZZelems, // tensor elements in GPU memory
117 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
118 &spinZZ)); // the created elementary tensor operator
119 // YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
120 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
121 2, // two-body operator
122 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
123 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
124 0, // 0 for dense tensors
125 nullptr, // nullptr for dense tensors
126 CUDA_C_64F, // data type
127 spinYYelems, // tensor elements in GPU memory
128 cudensitymatTensorCallbackNone, // no tensor callback function (tensor is not time-dependent)
129 &spinYY)); // the created elementary tensor operator
130
131 // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
132 // Create an empty operator term
133 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
134 spaceShape.size(), // Hilbert space rank (number of modes)
135 spaceShape.data(), // Hilbert space shape (mode extents)
136 &oneBodyTerm)); // the created empty operator term
137 // Define the operator term: H1 = sum_{i} {h_i * X_i}
138 for (int32_t i = 0; i < spaceShape.size(); ++i) {
139 const double h_i = 1.0 / static_cast<double>(i+1); // just assign some value (time-independent h_i coefficient)
140 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
141 oneBodyTerm,
142 1, // number of elementary tensor operators in the product
143 std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
144 std::vector<int32_t>({i}).data(), // space modes acted on by the operator product
145 std::vector<int32_t>({0}).data(), // space mode action duality (0: from the left; 1: from the right)
146 make_cuDoubleComplex(h_i, 0.0), // h_i constant coefficient: Always 64-bit-precision complex number
147 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with this operator product
148 }
149 // Create an empty operator term
150 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
151 spaceShape.size(), // Hilbert space rank (number of modes)
152 spaceShape.data(), // Hilbert space shape (mode extents)
153 &twoBodyTerm)); // the created empty operator term
154 // Define the operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
155 for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
156 for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
157 const double g_ij = -1.0 / static_cast<double>(i + j + 1); // just assign some value (time-independent g_ij coefficient)
158 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
159 twoBodyTerm,
160 1, // number of elementary tensor operators in the product
161 std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
162 std::vector<int32_t>({i, j}).data(), // space modes acted on by the operator product
163 std::vector<int32_t>({0, 0}).data(), // space mode action duality (0: from the left; 1: from the right)
164 make_cuDoubleComplex(g_ij, 0.0), // g_ij constant coefficient: Always 64-bit-precision complex number
165 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with this operator product
166 }
167 }
168 // Create an empty operator term
169 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
170 spaceShape.size(), // Hilbert space rank (number of modes)
171 spaceShape.data(), // Hilbert space shape (mode extents)
172 &noiseTerm)); // the created empty operator term
173 // Define the operator term: D1 = d * sum_{i} {YY_ii}
174 for (int32_t i = 0; i < spaceShape.size(); ++i) {
175 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
176 noiseTerm,
177 1, // number of elementary tensor operators in the product
178 std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
179 std::vector<int32_t>({i, i}).data(), // space modes acted on by the operator product (from different sides)
180 std::vector<int32_t>({0, 1}).data(), // space mode action duality (0: from the left; 1: from the right)
181 make_cuDoubleComplex(1.0, 0.0), // default coefficient: Always 64-bit-precision complex number
182 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with this operator product
183 }
184
185 // Construct the full Liouvillian operator as a sum of the operator terms
186 // Create an empty operator (super-operator)
187 HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
188 spaceShape.size(), // Hilbert space rank (number of modes)
189 spaceShape.data(), // Hilbert space shape (modes extents)
190 &liouvillian)); // the created empty operator (super-operator)
191 // Append an operator term to the operator (super-operator)
192 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193 liouvillian,
194 oneBodyTerm, // appended operator term
195 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196 make_cuDoubleComplex(0.0, -1.0), // -i constant
197 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
198 // Append an operator term to the operator (super-operator)
199 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200 liouvillian,
201 twoBodyTerm, // appended operator term
202 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203 make_cuDoubleComplex(0.0, -1.0), // -i constant
204 {tdCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
205 // Append an operator term to the operator (super-operator)
206 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207 liouvillian,
208 oneBodyTerm, // appended operator term
209 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210 make_cuDoubleComplex(0.0, 1.0), // i constant
211 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
212 // Append an operator term to the operator (super-operator)
213 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
214 liouvillian,
215 twoBodyTerm, // appended operator term
216 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
217 make_cuDoubleComplex(0.0, 1.0), // i constant
218 {tdCoefComplex64, CUDENSITYMAT_CALLBACK_DEVICE_CPU, nullptr})); // CPU scalar callback function defining the time-dependent coefficient associated with this operator term as a whole
219 // Append an operator term to the operator (super-operator)
220 const double d = 0.42; // just assign some value (time-independent coefficient)
221 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
222 liouvillian,
223 noiseTerm, // appended operator term
224 0, // operator term action duality as a whole (no duality reversing in this case)
225 make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
226 cudensitymatScalarCallbackNone)); // no time-dependent coefficient associated with the operator term as a whole
227 }
228
229 // Destructor destructs the user-defined Liouvillian operator
230 ~UserDefinedLiouvillian()
231 {
232 // Destroy the Liouvillian operator
233 HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
234
235 // Destroy operator terms
236 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
237 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
238 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
239
240 // Destroy elementary tensor operators
241 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
242 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
243 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
244
245 // Destroy operator tensors
246 destroyArrayGPU(spinYYelems);
247 destroyArrayGPU(spinZZelems);
248 destroyArrayGPU(spinXelems);
249 }
250
251 // Disable copy constructor/assignment (GPU resources are private, no deep copy)
252 UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
253 UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
254 UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
255 UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
256
257 // Get access to the constructed Liouvillian operator
258 cudensitymatOperator_t & get()
259 {
260 return liouvillian;
261 }
262
263};
Now we can use this quantum many-body operator in our main code.
1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15#include <cmath>
16#include <complex>
17#include <vector>
18#include <chrono>
19#include <iostream>
20#include <cassert>
21
22
23// Number of times to perform operator action on a quantum state
24constexpr int NUM_REPEATS = 2;
25
26// Logging verbosity
27bool verbose = true;
28
29
30// Example workflow
31void exampleWorkflow(cudensitymatHandle_t handle)
32{
33 // Define the composite Hilbert space shape and
34 // quantum state batch size (number of individual quantum states in a batched simulation)
35 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
36 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
37 const int32_t numParams = 1; // number of external user-provided Hamiltonian parameters
38
39 if (verbose) {
40 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
41 for (const auto & dimsn: spaceShape)
42 std::cout << dimsn << ",";
43 std::cout << ")" << std::endl;
44 std::cout << "Quantum state batch size = " << batchSize << std::endl;
45 }
46
47 // Place external user-provided Hamiltonian parameters in GPU memory
48 std::vector<double> cpuHamParams(numParams * batchSize, 13.42); // each parameter can have a different value, of course
49 auto * hamiltonianParams = static_cast<double *>(createArrayGPU(cpuHamParams));
50
51 // Construct a user-defined Liouvillian operator using a convenience C++ class
52 UserDefinedLiouvillian liouvillian(handle, spaceShape);
53 if (verbose)
54 std::cout << "Constructed the Liouvillian operator\n";
55
56 // Declare the input quantum state
57 cudensitymatState_t inputState;
58 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
59 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
60 spaceShape.size(),
61 spaceShape.data(),
62 batchSize,
63 CUDA_C_64F, // data type must match that of the operators created above
64 &inputState));
65
66 // Query the size of the quantum state storage
67 std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
68 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
69 inputState,
70 1, // only one storage component (tensor)
71 &storageSize)); // storage size in bytes
72 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
73 if (verbose)
74 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
75
76 // Prepare some initial value for the input quantum state
77 std::vector<std::complex<double>> inputStateValue(stateVolume);
78 for (std::size_t i = 0; i < stateVolume; ++i) {
79 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
80 }
81
82 // Allocate initialized GPU storage for the input quantum state with prepared values
83 auto * inputStateElems = createArrayGPU(inputStateValue);
84
85 // Attach initialized GPU storage to the input quantum state
86 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
87 inputState,
88 1, // only one storage component (tensor)
89 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
90 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
91 if (verbose)
92 std::cout << "Constructed input quantum state\n";
93
94 // Declare the output quantum state of the same shape
95 cudensitymatState_t outputState;
96 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
97 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
98 spaceShape.size(),
99 spaceShape.data(),
100 batchSize,
101 CUDA_C_64F, // data type must match that of the operators created above
102 &outputState));
103
104 // Allocate initialized GPU storage for the output quantum state
105 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107 // Attach initialized GPU storage to the output quantum state
108 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109 outputState,
110 1, // only one storage component (tensor)
111 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
112 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
113 if (verbose)
114 std::cout << "Constructed output quantum state\n";
115
116 // Declare a workspace descriptor
117 cudensitymatWorkspaceDescriptor_t workspaceDescr;
118 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120 // Query free GPU memory
121 std::size_t freeMem = 0, totalMem = 0;
122 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124 if (verbose)
125 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128 const auto startTime = std::chrono::high_resolution_clock::now();
129 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130 liouvillian.get(),
131 inputState,
132 outputState,
133 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
134 freeMem, // max available GPU free memory for the workspace
135 workspaceDescr, // workspace descriptor
136 0x0)); // default CUDA stream
137 const auto finishTime = std::chrono::high_resolution_clock::now();
138 const std::chrono::duration<double> timeSec = finishTime - startTime;
139 if (verbose)
140 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142 // Query the required workspace buffer size (bytes)
143 std::size_t requiredBufferSize {0};
144 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145 workspaceDescr,
146 CUDENSITYMAT_MEMSPACE_DEVICE,
147 CUDENSITYMAT_WORKSPACE_SCRATCH,
148 &requiredBufferSize));
149 if (verbose)
150 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152 // Allocate GPU storage for the workspace buffer
153 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155 if (verbose)
156 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158 // Attach the workspace buffer to the workspace descriptor
159 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160 workspaceDescr,
161 CUDENSITYMAT_MEMSPACE_DEVICE,
162 CUDENSITYMAT_WORKSPACE_SCRATCH,
163 workspaceBuffer,
164 requiredBufferSize));
165 if (verbose)
166 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168 // Zero out the output quantum state
169 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170 outputState,
171 0x0));
172 if (verbose)
173 std::cout << "Initialized the output quantum state to zero\n";
174
175 // Apply the Liouvillian operator to the input quatum state
176 // and accumulate its action into the output quantum state (note += semantics)
177 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179 const auto startTime = std::chrono::high_resolution_clock::now();
180 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181 liouvillian.get(),
182 0.01, // time point
183 batchSize, // user-defined batch size
184 numParams, // number of external user-defined Hamiltonian parameters
185 hamiltonianParams, // external Hamiltonian parameters in GPU memory
186 inputState, // input quantum state
187 outputState, // output quantum state
188 workspaceDescr, // workspace descriptor
189 0x0)); // default CUDA stream
190 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
191 const auto finishTime = std::chrono::high_resolution_clock::now();
192 const std::chrono::duration<double> timeSec = finishTime - startTime;
193 if (verbose)
194 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
195 }
196
197 // Compute the squared norm of the output quantum state
198 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
199 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
200 outputState,
201 norm2,
202 0x0));
203 if (verbose)
204 std::cout << "Computed the output quantum state norm\n";
205 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
206 destroyArrayGPU(norm2);
207
208 // Destroy workspace descriptor
209 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
210
211 // Destroy workspace buffer storage
212 destroyArrayGPU(workspaceBuffer);
213
214 // Destroy quantum states
215 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
216 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
217
218 // Destroy quantum state storage
219 destroyArrayGPU(outputStateElems);
220 destroyArrayGPU(inputStateElems);
221
222 // Destroy external Hamiltonian parameters
223 destroyArrayGPU(static_cast<void *>(hamiltonianParams));
224
225 if (verbose)
226 std::cout << "Destroyed resources\n" << std::flush;
227}
228
229
230int main(int argc, char ** argv)
231{
232 // Assign a GPU to the process
233 HANDLE_CUDA_ERROR(cudaSetDevice(0));
234 if (verbose)
235 std::cout << "Set active device\n";
236
237 // Create a library handle
238 cudensitymatHandle_t handle;
239 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
240 if (verbose)
241 std::cout << "Created a library handle\n";
242
243 // Run the example
244 exampleWorkflow(handle);
245
246 // Destroy the library handle
247 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
248 if (verbose)
249 std::cout << "Destroyed the library handle\n";
250
251 // Done
252 return 0;
253}
Code example (parallel execution on multiple GPUs)¶
It is straightforward to adapt main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.
The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).
Here is the updated main code for multi-GPU runs.
1/* Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15// MPI library (optional)
16#ifdef MPI_ENABLED
17#include <mpi.h>
18#endif
19
20#include <cmath>
21#include <complex>
22#include <vector>
23#include <chrono>
24#include <iostream>
25#include <cassert>
26
27
28// Number of times to perform operator action on a quantum state
29constexpr int NUM_REPEATS = 2;
30
31// Logging verbosity
32bool verbose = true;
33
34
35// Example workflow
36void exampleWorkflow(cudensitymatHandle_t handle)
37{
38 // Define the composite Hilbert space shape and
39 // quantum state batch size (number of individual quantum states in a batched simulation)
40 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
41 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
42 const int32_t numParams = 1; // number of external user-provided Hamiltonian parameters
43
44 if (verbose) {
45 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
46 for (const auto & dimsn: spaceShape)
47 std::cout << dimsn << ",";
48 std::cout << ")" << std::endl;
49 std::cout << "Quantum state batch size = " << batchSize << std::endl;
50 }
51
52 // Place external user-provided Hamiltonian parameters in GPU memory
53 std::vector<double> cpuHamParams(numParams * batchSize, 13.42); // each parameter can have a different value, of course
54 auto * hamiltonianParams = static_cast<double *>(createArrayGPU(cpuHamParams));
55
56 // Construct a user-defined Liouvillian operator using a convenience C++ class
57 UserDefinedLiouvillian liouvillian(handle, spaceShape);
58 if (verbose)
59 std::cout << "Constructed the Liouvillian operator\n";
60
61 // Declare the input quantum state
62 cudensitymatState_t inputState;
63 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
64 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
65 spaceShape.size(),
66 spaceShape.data(),
67 batchSize,
68 CUDA_C_64F, // data type must match that of the operators created above
69 &inputState));
70
71 // Query the size of the quantum state storage
72 std::size_t storageSize {0}; // only one storage component (tensor) is needed (no tensor factorization)
73 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
74 inputState,
75 1, // only one storage component (tensor)
76 &storageSize)); // storage size in bytes
77 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
78 if (verbose)
79 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
80
81 // Prepare some initial value for the input quantum state
82 std::vector<std::complex<double>> inputStateValue(stateVolume);
83 for (std::size_t i = 0; i < stateVolume; ++i) {
84 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
85 }
86
87 // Allocate initialized GPU storage for the input quantum state with prepared values
88 auto * inputStateElems = createArrayGPU(inputStateValue);
89
90 // Attach initialized GPU storage to the input quantum state
91 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
92 inputState,
93 1, // only one storage component (tensor)
94 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
95 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
96 if (verbose)
97 std::cout << "Constructed input quantum state\n";
98
99 // Declare the output quantum state of the same shape
100 cudensitymatState_t outputState;
101 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
102 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
103 spaceShape.size(),
104 spaceShape.data(),
105 batchSize,
106 CUDA_C_64F, // data type must match that of the operators created above
107 &outputState));
108
109 // Allocate initialized GPU storage for the output quantum state
110 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
111
112 // Attach initialized GPU storage to the output quantum state
113 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
114 outputState,
115 1, // only one storage component (tensor)
116 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
117 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
118 if (verbose)
119 std::cout << "Constructed output quantum state\n";
120
121 // Declare a workspace descriptor
122 cudensitymatWorkspaceDescriptor_t workspaceDescr;
123 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
124
125 // Query free GPU memory
126 std::size_t freeMem = 0, totalMem = 0;
127 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
128 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
129 if (verbose)
130 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
131
132 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
133 const auto startTime = std::chrono::high_resolution_clock::now();
134 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
135 liouvillian.get(),
136 inputState,
137 outputState,
138 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
139 freeMem, // max available GPU free memory for the workspace
140 workspaceDescr, // workspace descriptor
141 0x0)); // default CUDA stream
142 const auto finishTime = std::chrono::high_resolution_clock::now();
143 const std::chrono::duration<double> timeSec = finishTime - startTime;
144 if (verbose)
145 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
146
147 // Query the required workspace buffer size (bytes)
148 std::size_t requiredBufferSize {0};
149 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
150 workspaceDescr,
151 CUDENSITYMAT_MEMSPACE_DEVICE,
152 CUDENSITYMAT_WORKSPACE_SCRATCH,
153 &requiredBufferSize));
154 if (verbose)
155 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
156
157 // Allocate GPU storage for the workspace buffer
158 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
159 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
160 if (verbose)
161 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163 // Attach the workspace buffer to the workspace descriptor
164 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
165 workspaceDescr,
166 CUDENSITYMAT_MEMSPACE_DEVICE,
167 CUDENSITYMAT_WORKSPACE_SCRATCH,
168 workspaceBuffer,
169 requiredBufferSize));
170 if (verbose)
171 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
172
173 // Zero out the output quantum state
174 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
175 outputState,
176 0x0));
177 if (verbose)
178 std::cout << "Initialized the output quantum state to zero\n";
179
180 // Apply the Liouvillian operator to the input quatum state
181 // and accumulate its action into the output quantum state (note += semantics)
182 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
183 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
184 const auto startTime = std::chrono::high_resolution_clock::now();
185 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
186 liouvillian.get(),
187 0.01, // time point
188 batchSize, // user-defined batch size
189 numParams, // number of external user-defined Hamiltonian parameters
190 hamiltonianParams, // external Hamiltonian parameters in GPU memory
191 inputState, // input quantum state
192 outputState, // output quantum state
193 workspaceDescr, // workspace descriptor
194 0x0)); // default CUDA stream
195 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
196 const auto finishTime = std::chrono::high_resolution_clock::now();
197 const std::chrono::duration<double> timeSec = finishTime - startTime;
198 if (verbose)
199 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
200 }
201
202 // Compute the squared norm of the output quantum state
203 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
204 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
205 outputState,
206 norm2,
207 0x0));
208 if (verbose)
209 std::cout << "Computed the output quantum state norm\n";
210 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
211 destroyArrayGPU(norm2);
212
213 // Destroy workspace descriptor
214 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
215
216 // Destroy workspace buffer storage
217 destroyArrayGPU(workspaceBuffer);
218
219 // Destroy quantum states
220 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
221 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
222
223 // Destroy quantum state storage
224 destroyArrayGPU(outputStateElems);
225 destroyArrayGPU(inputStateElems);
226
227 // Destroy external Hamiltonian parameters
228 destroyArrayGPU(static_cast<void *>(hamiltonianParams));
229
230 if (verbose)
231 std::cout << "Destroyed resources\n" << std::flush;
232}
233
234
235int main(int argc, char ** argv)
236{
237 // Initialize MPI library (if needed)
238#ifdef MPI_ENABLED
239 HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
240 int procRank {-1};
241 HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
242 int numProcs {0};
243 HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
244 if (procRank != 0) verbose = false;
245 if (verbose)
246 std::cout << "Initialized MPI library\n";
247#else
248 const int procRank {0};
249 const int numProcs {1};
250#endif
251
252 // Assign a GPU to the process
253 int numDevices {0};
254 HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
255 const int deviceId = procRank % numDevices;
256 HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
257 if (verbose)
258 std::cout << "Set active device\n";
259
260 // Create a library handle
261 cudensitymatHandle_t handle;
262 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
263 if (verbose)
264 std::cout << "Created a library handle\n";
265
266 // Reset distributed configuration (once)
267#ifdef MPI_ENABLED
268 MPI_Comm comm;
269 HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
270 HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
271 CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
272 &comm, sizeof(comm)));
273#endif
274
275 // Run the example
276 exampleWorkflow(handle);
277
278 // Synchronize MPI processes
279#ifdef MPI_ENABLED
280 HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
281#endif
282
283 // Destroy the library handle
284 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
285 if (verbose)
286 std::cout << "Destroyed the library handle\n";
287
288 // Finalize the MPI library
289#ifdef MPI_ENABLED
290 HANDLE_MPI_ERROR(MPI_Finalize());
291 if (verbose)
292 std::cout << "Finalized MPI library\n";
293#endif
294
295 // Done
296 return 0;
297}
Useful tips¶
For debugging, one can set the environment variable
CUDENSITYMAT_LOG_LEVEL=n
. The leveln
= 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variableCUDENSITYMAT_LOG_FILE=<filepath>
can be used to redirect the log output to a custom file at<filepath>
instead ofstdout
.
Level |
Summary |
Long Description |
0 |
Off |
Logging is disabled (default) |
1 |
Errors |
Only errors will be logged |
2 |
Performance Trace |
API calls that launch CUDA kernels will log their parameters and important information |
3 |
Performance Hints |
Hints that can potentially improve the application’s performance |
4 |
Heuristics Trace |
Provides general information about the library execution, may contain details about heuristic status |
5 |
API Trace |
API calls will log their parameter and important information |