Examples¶
In this section, we show an example on how to define a quantum operator, quantum states,
and then compute the action of the quantum operator on a quantum state. For clarity,
the quantum operator is defined inside a separate header transverse_ising_full_fused_noisy.h
where it is wrapped in a helper C++ class UserDefinedLiouvillian
. We also provide
a utility header helpers.h
containing convenient GPU array instantiation functions.
Compiling Code¶
Assuming cuQuantum has been extracted in CUQUANTUM_ROOT
and cuTENSOR in CUTENSOR_ROOT
, we update the library path as follows:
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}
Depending on your CUDA Toolkit, you might have to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11
).
The serial sample code discussed below (operator_action_example.cpp
) can be compiled via the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example
For static linking against the cuDensityMat library, use the following command:
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example
In order to build parallel (MPI) versions of the example operator_action_mpi_example.cpp
,
one will need to have an CUDA-aware MPI library installed (e.g., recent OpenMPI, MPICH or MVAPICH) and then
set the environment variable $CUDENSITYMAT_COMM_LIB
to the path to the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so
.
The MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so
can be built
inside the ${CUQUANTUM_ROOT}/distributed_interfaces
folder by calling the provided build script there.
In order to link the executable to a CUDA-aware MPI library, one will need to add -I${MPI_PATH}/include
and -L${MPI_PATH}/lib -lmpi
to the build command:
nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example
Warning
When running operator_action_mpi_example.cpp
without CUDA-aware MPI, the program will crash.
Note
Depending on the source of the cuQuantum package, you may need to replace lib
above by lib64
,
depending which folder name is used inside your cuQuantum package.
Code Example (Serial Execution on a Single GPU)¶
The following code example illustrates the common steps necessary to use the cuDensityMat library to compute the action of a quantum many-body operator on a quantum state. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code and operator definition as well as the utility code).
First let’s introduce a helper class to construct a quantum many-body operator, for example, the transverse field Ising Hamiltonian with fused ZZ terms and and additional noise term.
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#pragma once
7
8#include <cudensitymat.h> // cuDensityMat library header
9#include "helpers.h" // helper functions
10
11#include <cmath>
12#include <complex>
13#include <vector>
14#include <iostream>
15#include <cassert>
16
17
18/* Time-dependent transverse-field Ising Hamiltonian operator
19 with ordered and fused ZZ terms, plus fused unitary dissipation terms:
20 H = sum_{i} {h_i * X_i} // transverse field sum of X_i
21 + f(t) * sum_{i < j} {g_ij * ZZ_ij} // modulated sum of fused {Z_i * Z_j} terms
22 + d * sum_{i} {Y_i * {..} * Y_i} // dissipation terms {Y_i * {..} * Y_i} will be fused into the YY_ii super-operator
23 where {..} is the placeholder for the density matrix to show that the operators act from a different side
24*/
25
26
27// User-defined C++ callback function defining a time-dependent coefficient inside the Hamiltonian:
28// f(t) = cos(omega * t) + i * sin(omega * t)
29extern "C"
30int32_t tdCoefComplex64(double time, // time point
31 int32_t numParams, // number of external user-defined Liouvillian parameters (= 1 here)
32 const double params[], // params[0] is omega (user-defined Liouvillian parameter)
33 cudaDataType_t dataType, // data type (CUDA_C_64F here)
34 void * scalarStorage) // CPU storage for the returned function value
35{
36 const auto omega = params[0];
37 auto * tdCoef = static_cast<std::complex<double>*>(scalarStorage); // casting to complex<double> because it returns CUDA_C_64F data type
38 *tdCoef = {std::cos(omega * time), std::sin(omega * time)};
39 return 0; // error code (0: Success)
40}
41
42
43/** Convenience class to encapsulate the Liouvillian operator:
44 * - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
45 * - get() method returns a reference to the constructed Liouvillian operator
46 * - Destructor releases all resources used by the Liouvillian operator
47 */
48class UserDefinedLiouvillian final
49{
50private:
51 // Data members
52 cudensitymatHandle_t handle; // library context handle
53 const std::vector<int64_t> spaceShape; // Hilbert space shape
54 void * spinXelems {nullptr}; // elements of the X spin operator in GPU RAM
55 void * spinYYelems {nullptr}; // elements of the YY two-spin operator in GPU RAM
56 void * spinZZelems {nullptr}; // elements of the ZZ two-spin operator in GPU RAM
57 cudensitymatElementaryOperator_t spinX; // X spin operator
58 cudensitymatElementaryOperator_t spinYY; // YY two-spin operator
59 cudensitymatElementaryOperator_t spinZZ; // ZZ two-spin operator
60 cudensitymatOperatorTerm_t oneBodyTerm; // operator term: H1 = sum_{i} {h_i * X_i}
61 cudensitymatOperatorTerm_t twoBodyTerm; // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
62 cudensitymatOperatorTerm_t noiseTerm; // operator term: D1 = d * sum_{i} {YY_ii} // Y_i operators act from different sides on the density matrix
63 cudensitymatOperator_t liouvillian; // (-i * (H1 + f(t) * H2) * rho) + (i * rho * (H1 + f(t) * H2)) + D1
64
65public:
66
67 // Constructor constructs a user-defined Liouvillian operator
68 UserDefinedLiouvillian(cudensitymatHandle_t contextHandle, // library context handle
69 const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
70 handle(contextHandle), spaceShape(hilbertSpaceShape)
71 {
72 // Define the necessary elementary tensors in GPU memory (F-order storage!)
73 spinXelems = createArrayGPU<std::complex<double>>(
74 {{0.0, 0.0}, {1.0, 0.0}, // 1st column of matrix X
75 {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
76
77 spinYYelems = createArrayGPU<std::complex<double>>( // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
78 {{0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, // 1st column of matrix YY
79 {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix YY
80 {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix YY
81 {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
82
83 spinZZelems = createArrayGPU<std::complex<double>>( // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
84 {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 1st column of matrix ZZ
85 {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix ZZ
86 {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix ZZ
87 {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}}); // 4th column of matrix ZZ
88
89 // Construct the necessary Elementary Tensor Operators
90 // X_i operator
91 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
92 1, // one-body operator
93 std::vector<int64_t>({2}).data(), // acts in tensor space of shape {2}
94 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
95 0, // 0 for dense tensors
96 nullptr, // nullptr for dense tensors
97 CUDA_C_64F, // data type
98 spinXelems, // tensor elements in GPU memory
99 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
100 &spinX)); // the created elementary tensor operator
101 // ZZ_ij = Z_i * Z_j fused operator
102 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
103 2, // two-body operator
104 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
105 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
106 0, // 0 for dense tensors
107 nullptr, // nullptr for dense tensors
108 CUDA_C_64F, // data type
109 spinZZelems, // tensor elements in GPU memory
110 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
111 &spinZZ)); // the created elementary tensor operator
112 // YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
113 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
114 2, // two-body operator
115 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
116 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
117 0, // 0 for dense tensors
118 nullptr, // nullptr for dense tensors
119 CUDA_C_64F, // data type
120 spinYYelems, // tensor elements in GPU memory
121 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
122 &spinYY)); // the created elementary tensor operator
123
124 // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
125 // Create an empty operator term
126 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
127 spaceShape.size(), // Hilbert space rank (number of dimensions)
128 spaceShape.data(), // Hilbert space shape
129 &oneBodyTerm)); // the created empty operator term
130 // Define the operator term
131 for (int32_t i = 0; i < spaceShape.size(); ++i) {
132 const double h_i = 1.0 / static_cast<double>(i+1); // just some value (time-independent h_i coefficient)
133 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
134 oneBodyTerm,
135 1, // number of elementary tensor operators in the product
136 std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
137 std::vector<int32_t>({i}).data(), // space modes acted on by the operator product
138 std::vector<int32_t>({0}).data(), // space mode action duality (0: from the left; 1: from the right)
139 make_cuDoubleComplex(h_i, 0.0), // h_i constant coefficient: Always 64-bit-precision complex number
140 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
141 }
142 // Create an empty operator term
143 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
144 spaceShape.size(), // Hilbert space rank (number of dimensions)
145 spaceShape.data(), // Hilbert space shape
146 &twoBodyTerm)); // the created empty operator term
147 // Define the operator term
148 for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
149 for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
150 const double g_ij = -1.0 / static_cast<double>(i + j + 1); // just some value (time-independent g_ij coefficient)
151 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
152 twoBodyTerm,
153 1, // number of elementary tensor operators in the product
154 std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
155 std::vector<int32_t>({i, j}).data(), // space modes acted on by the operator product
156 std::vector<int32_t>({0, 0}).data(), // space mode action duality (0: from the left; 1: from the right)
157 make_cuDoubleComplex(g_ij, 0.0), // g_ij constant coefficient: Always 64-bit-precision complex number
158 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
159 }
160 }
161 // Create an empty operator term
162 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
163 spaceShape.size(), // Hilbert space rank (number of dimensions)
164 spaceShape.data(), // Hilbert space shape
165 &noiseTerm)); // the created empty operator term
166 // Define the operator term
167 for (int32_t i = 0; i < spaceShape.size(); ++i) {
168 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
169 noiseTerm,
170 1, // number of elementary tensor operators in the product
171 std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
172 std::vector<int32_t>({i, i}).data(), // space modes acted on by the operator product (from different sides)
173 std::vector<int32_t>({0, 1}).data(), // space mode action duality (0: from the left; 1: from the right)
174 make_cuDoubleComplex(1.0, 0.0), // default coefficient: Always 64-bit-precision complex number
175 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
176 }
177
178 // Construct the full Liouvillian operator as a sum of the operator terms
179 // Create an empty operator (super-operator)
180 HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
181 spaceShape.size(), // Hilbert space rank (number of dimensions)
182 spaceShape.data(), // Hilbert space shape
183 &liouvillian)); // the created empty operator (super-operator)
184 // Append an operator term to the operator (super-operator)
185 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
186 liouvillian,
187 oneBodyTerm, // appended operator term
188 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
189 make_cuDoubleComplex(0.0, -1.0), // -i constant
190 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
191 // Append an operator term to the operator (super-operator)
192 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193 liouvillian,
194 twoBodyTerm, // appended operator term
195 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196 make_cuDoubleComplex(0.0, -1.0), // -i constant
197 {tdCoefComplex64, nullptr})); // function callback defining the time-dependent coefficient associated with this operator term as a whole
198 // Append an operator term to the operator (super-operator)
199 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200 liouvillian,
201 oneBodyTerm, // appended operator term
202 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203 make_cuDoubleComplex(0.0, 1.0), // i constant
204 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
205 // Append an operator term to the operator (super-operator)
206 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207 liouvillian,
208 twoBodyTerm, // appended operator term
209 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210 make_cuDoubleComplex(0.0, 1.0), // i constant
211 {tdCoefComplex64, nullptr})); // function callback defining the time-dependent coefficient associated with this operator term as a whole
212 // Append an operator term to the operator (super-operator)
213 const double d = 0.42; // just some value (time-independent coefficient)
214 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
215 liouvillian,
216 noiseTerm, // appended operator term
217 0, // operator term action duality as a whole (no duality reversing in this case)
218 make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
219 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
220 }
221
222 // Destructor destructs the user-defined Liouvillian operator
223 ~UserDefinedLiouvillian()
224 {
225 // Destroy the Liouvillian operator
226 HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
227
228 // Destroy operator terms
229 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
230 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
231 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
232
233 // Destroy elementary tensor operators
234 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
235 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
236 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
237
238 // Destroy elementary tensors
239 destroyArrayGPU(spinYYelems);
240 destroyArrayGPU(spinZZelems);
241 destroyArrayGPU(spinXelems);
242 }
243
244 // Disable copy constructor/assignment (GPU resources are private, no deep copy)
245 UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
246 UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
247 UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
248 UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
249
250 // Get access to the constructed Liouvillian
251 cudensitymatOperator_t & get()
252 {
253 return liouvillian;
254 }
255
256};
Now we can use this quantum many-body operator in our main code.
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15#include <cmath>
16#include <complex>
17#include <vector>
18#include <chrono>
19#include <iostream>
20#include <cassert>
21
22
23// Number of times to perform operator action on a quantum state
24constexpr int NUM_REPEATS = 2;
25
26// Logging verbosity
27bool verbose = true;
28
29
30// Example workflow
31void exampleWorkflow(cudensitymatHandle_t handle)
32{
33 // Define the composite Hilbert space shape and
34 // quantum state batch size (number of individual quantum states)
35 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
36 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
37
38 if (verbose) {
39 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
40 for (const auto & dimsn: spaceShape)
41 std::cout << dimsn << ",";
42 std::cout << ")" << std::endl;
43 std::cout << "Quantum state batch size = " << batchSize << std::endl;
44 }
45
46 // Construct a user-defined Liouvillian operator using a convenience C++ class
47 UserDefinedLiouvillian liouvillian(handle, spaceShape);
48 if (verbose)
49 std::cout << "Constructed the Liouvillian operator\n";
50
51 // Declare the input quantum state
52 cudensitymatState_t inputState;
53 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
54 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
55 spaceShape.size(),
56 spaceShape.data(),
57 batchSize,
58 CUDA_C_64F, // data type must match that of the operators created above
59 &inputState));
60
61 // Query the size of the quantum state storage
62 std::size_t storageSize {0}; // only one storage component (tensor) is needed
63 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
64 inputState,
65 1, // only one storage component
66 &storageSize)); // storage size in bytes
67 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
68 if (verbose)
69 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
70
71 // Prepare some initial value for the input quantum state
72 std::vector<std::complex<double>> inputStateValue(stateVolume);
73 for (std::size_t i = 0; i < stateVolume; ++i) {
74 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
75 }
76
77 // Allocate initialized GPU storage for the input quantum state with prepared values
78 auto * inputStateElems = createArrayGPU(inputStateValue);
79
80 // Attach initialized GPU storage to the input quantum state
81 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
82 inputState,
83 1, // only one storage component (tensor)
84 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
85 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
86 if (verbose)
87 std::cout << "Constructed input quantum state\n";
88
89 // Declare the output quantum state of the same shape
90 cudensitymatState_t outputState;
91 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
92 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
93 spaceShape.size(),
94 spaceShape.data(),
95 batchSize,
96 CUDA_C_64F, // data type must match that of the operators created above
97 &outputState));
98
99 // Allocate initialized GPU storage for the output quantum state
100 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
101
102 // Attach initialized GPU storage to the output quantum state
103 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
104 outputState,
105 1, // only one storage component (no tensor factorization)
106 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
107 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
108 if (verbose)
109 std::cout << "Constructed output quantum state\n";
110
111 // Declare a workspace descriptor
112 cudensitymatWorkspaceDescriptor_t workspaceDescr;
113 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
114
115 // Query free GPU memory
116 std::size_t freeMem = 0, totalMem = 0;
117 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
118 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
119 if (verbose)
120 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
121
122 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
123 const auto startTime = std::chrono::high_resolution_clock::now();
124 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
125 liouvillian.get(),
126 inputState,
127 outputState,
128 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
129 freeMem, // max available GPU free memory for the workspace
130 workspaceDescr, // workspace descriptor
131 0x0)); // default CUDA stream
132 const auto finishTime = std::chrono::high_resolution_clock::now();
133 const std::chrono::duration<double> timeSec = finishTime - startTime;
134 if (verbose)
135 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
136
137 // Query the required workspace buffer size (bytes)
138 std::size_t requiredBufferSize {0};
139 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
140 workspaceDescr,
141 CUDENSITYMAT_MEMSPACE_DEVICE,
142 CUDENSITYMAT_WORKSPACE_SCRATCH,
143 &requiredBufferSize));
144 if (verbose)
145 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
146
147 // Allocate GPU storage for the workspace buffer
148 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
149 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
150 if (verbose)
151 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
152
153 // Attach the workspace buffer to the workspace descriptor
154 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
155 workspaceDescr,
156 CUDENSITYMAT_MEMSPACE_DEVICE,
157 CUDENSITYMAT_WORKSPACE_SCRATCH,
158 workspaceBuffer,
159 requiredBufferSize));
160 if (verbose)
161 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163 // Zero out the output quantum state
164 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
165 outputState,
166 0x0));
167 if (verbose)
168 std::cout << "Initialized the output state to zero\n";
169
170 // Apply the Liouvillian operator to the input quatum state
171 // and accumulate its action into the output quantum state (note += semantics)
172 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
173 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
174 const auto startTime = std::chrono::high_resolution_clock::now();
175 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
176 liouvillian.get(),
177 0.01, // time point
178 1, // number of external user-defined Hamiltonian parameters
179 std::vector<double>({13.42}).data(), // Hamiltonian parameter(s)
180 inputState, // input quantum state
181 outputState, // output quantum state
182 workspaceDescr, // workspace descriptor
183 0x0)); // default CUDA stream
184 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
185 const auto finishTime = std::chrono::high_resolution_clock::now();
186 const std::chrono::duration<double> timeSec = finishTime - startTime;
187 if (verbose)
188 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
189 }
190
191 // Compute the squared norm of the output quantum state
192 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
193 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
194 outputState,
195 norm2,
196 0x0));
197 if (verbose)
198 std::cout << "Computed the output state norm\n";
199 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
200 destroyArrayGPU(norm2);
201
202 // Destroy workspace descriptor
203 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
204
205 // Destroy workspace buffer storage
206 destroyArrayGPU(workspaceBuffer);
207
208 // Destroy quantum states
209 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
210 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
211
212 // Destroy quantum state storage
213 destroyArrayGPU(outputStateElems);
214 destroyArrayGPU(inputStateElems);
215
216 if (verbose)
217 std::cout << "Destroyed resources\n" << std::flush;
218}
219
220
221int main(int argc, char ** argv)
222{
223 // Assign a GPU to the process
224 HANDLE_CUDA_ERROR(cudaSetDevice(0));
225 if (verbose)
226 std::cout << "Set active device\n";
227
228 // Create a library handle
229 cudensitymatHandle_t handle;
230 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
231 if (verbose)
232 std::cout << "Created a library handle\n";
233
234 // Run the example
235 exampleWorkflow(handle);
236
237 // Destroy the library handle
238 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
239 if (verbose)
240 std::cout << "Destroyed the library handle\n";
241
242 // Done
243 return 0;
244}
Code Example (Parallel Execution on Multiple GPUs)¶
It is straightforward to adapt main serial code and enable parallel execution across multiple/many GPU devices (across multiple/many nodes). We will illustrate this with an example using the Message Passing Interface (MPI) as the communication layer. Below we show the minor additions that need to be made in order to enable distributed parallel execution without making any changes to the original serial source code.
The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code and operator definition as well as the utility code).
Here is the updated main code for multi-GPU runs.
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15// MPI library (optional)
16#ifdef MPI_ENABLED
17#include <mpi.h>
18#endif
19
20#include <cmath>
21#include <complex>
22#include <vector>
23#include <chrono>
24#include <iostream>
25#include <cassert>
26
27
28// Number of times to perform operator action on a quantum state
29constexpr int NUM_REPEATS = 2;
30
31// Logging verbosity
32bool verbose = true;
33
34
35// Example workflow
36void exampleWorkflow(cudensitymatHandle_t handle)
37{
38 // Define the composite Hilbert space shape and
39 // quantum state batch size (number of individual quantum states)
40 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
41 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
42
43 if (verbose) {
44 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
45 for (const auto & dimsn: spaceShape)
46 std::cout << dimsn << ",";
47 std::cout << ")" << std::endl;
48 std::cout << "Quantum state batch size = " << batchSize << std::endl;
49 }
50
51 // Construct a user-defined Liouvillian operator using a convenience C++ class
52 UserDefinedLiouvillian liouvillian(handle, spaceShape);
53 if (verbose)
54 std::cout << "Constructed the Liouvillian operator\n";
55
56 // Declare the input quantum state
57 cudensitymatState_t inputState;
58 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
59 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
60 spaceShape.size(),
61 spaceShape.data(),
62 batchSize,
63 CUDA_C_64F, // data type must match that of the operators created above
64 &inputState));
65
66 // Query the size of the quantum state storage
67 std::size_t storageSize {0}; // only one storage component (tensor) is needed
68 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
69 inputState,
70 1, // only one storage component
71 &storageSize)); // storage size in bytes
72 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
73 if (verbose)
74 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
75
76 // Prepare some initial value for the input quantum state
77 std::vector<std::complex<double>> inputStateValue(stateVolume);
78 for (std::size_t i = 0; i < stateVolume; ++i) {
79 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
80 }
81
82 // Allocate initialized GPU storage for the input quantum state with prepared values
83 auto * inputStateElems = createArrayGPU(inputStateValue);
84
85 // Attach initialized GPU storage to the input quantum state
86 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
87 inputState,
88 1, // only one storage component (tensor)
89 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
90 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
91 if (verbose)
92 std::cout << "Constructed input quantum state\n";
93
94 // Declare the output quantum state of the same shape
95 cudensitymatState_t outputState;
96 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
97 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
98 spaceShape.size(),
99 spaceShape.data(),
100 batchSize,
101 CUDA_C_64F, // data type must match that of the operators created above
102 &outputState));
103
104 // Allocate initialized GPU storage for the output quantum state
105 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107 // Attach initialized GPU storage to the output quantum state
108 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109 outputState,
110 1, // only one storage component (no tensor factorization)
111 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
112 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
113 if (verbose)
114 std::cout << "Constructed output quantum state\n";
115
116 // Declare a workspace descriptor
117 cudensitymatWorkspaceDescriptor_t workspaceDescr;
118 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120 // Query free GPU memory
121 std::size_t freeMem = 0, totalMem = 0;
122 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124 if (verbose)
125 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128 const auto startTime = std::chrono::high_resolution_clock::now();
129 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130 liouvillian.get(),
131 inputState,
132 outputState,
133 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
134 freeMem, // max available GPU free memory for the workspace
135 workspaceDescr, // workspace descriptor
136 0x0)); // default CUDA stream
137 const auto finishTime = std::chrono::high_resolution_clock::now();
138 const std::chrono::duration<double> timeSec = finishTime - startTime;
139 if (verbose)
140 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142 // Query the required workspace buffer size (bytes)
143 std::size_t requiredBufferSize {0};
144 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145 workspaceDescr,
146 CUDENSITYMAT_MEMSPACE_DEVICE,
147 CUDENSITYMAT_WORKSPACE_SCRATCH,
148 &requiredBufferSize));
149 if (verbose)
150 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152 // Allocate GPU storage for the workspace buffer
153 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155 if (verbose)
156 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158 // Attach the workspace buffer to the workspace descriptor
159 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160 workspaceDescr,
161 CUDENSITYMAT_MEMSPACE_DEVICE,
162 CUDENSITYMAT_WORKSPACE_SCRATCH,
163 workspaceBuffer,
164 requiredBufferSize));
165 if (verbose)
166 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168 // Zero out the output quantum state
169 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170 outputState,
171 0x0));
172 if (verbose)
173 std::cout << "Initialized the output state to zero\n";
174
175 // Apply the Liouvillian operator to the input quatum state
176 // and accumulate its action into the output quantum state (note += semantics)
177 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179 const auto startTime = std::chrono::high_resolution_clock::now();
180 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181 liouvillian.get(),
182 0.01, // time point
183 1, // number of external user-defined Hamiltonian parameters
184 std::vector<double>({13.42}).data(), // Hamiltonian parameter(s)
185 inputState, // input quantum state
186 outputState, // output quantum state
187 workspaceDescr, // workspace descriptor
188 0x0)); // default CUDA stream
189 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
190 const auto finishTime = std::chrono::high_resolution_clock::now();
191 const std::chrono::duration<double> timeSec = finishTime - startTime;
192 if (verbose)
193 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
194 }
195
196 // Compute the squared norm of the output quantum state
197 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
198 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
199 outputState,
200 norm2,
201 0x0));
202 if (verbose)
203 std::cout << "Computed the output state norm\n";
204 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
205 destroyArrayGPU(norm2);
206
207 // Destroy workspace descriptor
208 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
209
210 // Destroy workspace buffer storage
211 destroyArrayGPU(workspaceBuffer);
212
213 // Destroy quantum states
214 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
215 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
216
217 // Destroy quantum state storage
218 destroyArrayGPU(outputStateElems);
219 destroyArrayGPU(inputStateElems);
220
221 if (verbose)
222 std::cout << "Destroyed resources\n" << std::flush;
223}
224
225
226int main(int argc, char ** argv)
227{
228 // Initialize MPI library (if needed)
229#ifdef MPI_ENABLED
230 HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
231 int procRank {-1};
232 HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
233 int numProcs {0};
234 HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
235 if (procRank != 0) verbose = false;
236 if (verbose)
237 std::cout << "Initialized MPI library\n";
238#else
239 const int procRank {0};
240 const int numProcs {1};
241#endif
242
243 // Assign a GPU to the process
244 int numDevices {0};
245 HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
246 const int deviceId = procRank % numDevices;
247 HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
248 if (verbose)
249 std::cout << "Set active device\n";
250
251 // Create a library handle
252 cudensitymatHandle_t handle;
253 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
254 if (verbose)
255 std::cout << "Created a library handle\n";
256
257 // Reset distributed configuration (once)
258#ifdef MPI_ENABLED
259 MPI_Comm comm;
260 HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
261 HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
262 CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
263 &comm, sizeof(comm)));
264#endif
265
266 // Run the example
267 exampleWorkflow(handle);
268
269 // Synchronize MPI processes
270#ifdef MPI_ENABLED
271 HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
272#endif
273
274 // Destroy the library handle
275 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276 if (verbose)
277 std::cout << "Destroyed the library handle\n";
278
279 // Finalize the MPI library
280#ifdef MPI_ENABLED
281 HANDLE_MPI_ERROR(MPI_Finalize());
282 if (verbose)
283 std::cout << "Finalized MPI library\n";
284#endif
285
286 // Done
287 return 0;
288}
Useful Tips¶
For debugging, one can set the environment variable
CUDENSITYMAT_LOG_LEVEL=n
. The leveln
= 0, 1, …, 5 corresponds to the logger level as described in the table below. The environment variableCUDENSITYMAT_LOG_FILE=<filepath>
can be used to redirect the log output to a custom file at<filepath>
instead ofstdout
.
Level |
Summary |
Long Description |
0 |
Off |
Logging is disabled (default) |
1 |
Errors |
Only errors will be logged |
2 |
Performance Trace |
API calls that launch CUDA kernels will log their parameters and important information |
3 |
Performance Hints |
Hints that can potentially improve the application’s performance |
4 |
Heuristics Trace |
Provides general information about the library execution, may contain details about heuristic status |
5 |
API Trace |
API calls will log their parameter and important information |