Performing a gate split operation with cuTensorNet adopts a workflow very similar to the QR example and the SVD example. Here we highlight the notable differences between the APIs. The full code can be found in the NVIDIA/cuQuantum repository (here).
Define tensor operands¶
As with QR/SVD decomposition, we first define all the tensor operands by specifying the data type, the mode partition, and the extents. In this example, we choose to perform a fixed extent truncation to a size of 16.
/************************************************************************************
 * Gate Split: A_{i,j,k,l} B_{k,o,p,q} G_{m,n,l,o} -> A'_{i,j,x,m} S_{x} B'_{x,n,p,q}
 *************************************************************************************/
typedef float floatType;
cudaDataType_t typeData = CUDA_R_32F;
cutensornetComputeType_t typeCompute = CUTENSORNET_COMPUTE_32F;

// Create vectors of modes
std::vector<int32_t> modesAIn{'i','j','k','l'};
std::vector<int32_t> modesBIn{'k','o','p','q'};
std::vector<int32_t> modesGIn{'m','n','l','o'}; // input, G is the gate operator

std::vector<int32_t> modesAOut{'i','j','x','m'};
std::vector<int32_t> modesBOut{'x','n','p','q'}; // SVD output

// Extents
std::unordered_map<int32_t, int64_t> extent;
extent['i'] = 16;
extent['j'] = 16;
extent['k'] = 16;
extent['l'] = 2;
extent['m'] = 2;
extent['n'] = 2;
extent['o'] = 2;
extent['p'] = 16;
extent['q'] = 16;

const int64_t maxExtent = 16; // truncate to a maximal extent of 16
extent['x'] = maxExtent;

// Create a vector of extents for each tensor
std::vector<int64_t> extentAIn;
for (auto mode : modesAIn)
   extentAIn.push_back(extent[mode]);
std::vector<int64_t> extentBIn;
for (auto mode : modesBIn)
   extentBIn.push_back(extent[mode]);
std::vector<int64_t> extentGIn;
for (auto mode : modesGIn)
   extentGIn.push_back(extent[mode]);
std::vector<int64_t> extentAOut;
for (auto mode : modesAOut)
   extentAOut.push_back(extent[mode]);
std::vector<int64_t> extentBOut;
for (auto mode : modesBOut)
   extentBOut.push_back(extent[mode]);
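For reference, the tensor descriptors consumed by the execution snippet below can be created from these modes and extents along the following lines. This is a minimal sketch assuming the library handle and the HANDLE_ERROR macro from the QR/SVD examples; host data initialization and the device buffers (D_AIn, D_BIn, D_GIn, D_AOut, D_S, D_BOut) are set up the same way as in those examples and are omitted here.

// Sketch: create a library handle and one descriptor per tensor operand.
cutensornetHandle_t handle;
HANDLE_ERROR( cutensornetCreate(&handle) );

const int32_t numModesAIn  = modesAIn.size();
const int32_t numModesBIn  = modesBIn.size();
const int32_t numModesGIn  = modesGIn.size();
const int32_t numModesAOut = modesAOut.size();
const int32_t numModesBOut = modesBOut.size();

const int64_t* strides = NULL; // NULL strides means F-contiguous (column-major) layout

cutensornetTensorDescriptor_t descTensorAIn, descTensorBIn, descTensorGIn;
cutensornetTensorDescriptor_t descTensorAOut, descTensorBOut;
HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesAIn,  extentAIn.data(),  strides, modesAIn.data(),  typeData, &descTensorAIn) );
HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesBIn,  extentBIn.data(),  strides, modesBIn.data(),  typeData, &descTensorBIn) );
HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesGIn,  extentGIn.data(),  strides, modesGIn.data(),  typeData, &descTensorGIn) );
HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesAOut, extentAOut.data(), strides, modesAOut.data(), typeData, &descTensorAOut) );
HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesBOut, extentBOut.data(), strides, modesBOut.data(), typeData, &descTensorBOut) );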
Execution¶
Similar to the SVD example, we can specify the SVD options in cutensornetTensorSVDConfig_t. The workspace size can be queried by calling cutensornetWorkspaceComputeGateSplitSizes() with the chosen cutensornetGateSplitAlgo_t. Finally, we execute the gate split computation by calling cutensornetGateSplit().
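Before the execution loop, the truncation settings, gate algorithm, and workspace need to be prepared. The following is a minimal sketch: the absolute cutoff value is illustrative, and the REDUCED variant is just one of the available cutensornetGateSplitAlgo_t choices; the full example in the repository also handles host workspace requirements.

// Sketch: configure value-based truncation for the SVD.
cutensornetTensorSVDConfig_t svdConfig;
HANDLE_ERROR( cutensornetCreateTensorSVDConfig(handle, &svdConfig) );
const double absCutoff = 1e-2; // illustrative cutoff; singular values below it are discarded
HANDLE_ERROR( cutensornetTensorSVDConfigSetAttribute(handle, svdConfig,
                  CUTENSORNET_TENSOR_SVD_CONFIG_ABS_CUTOFF, &absCutoff, sizeof(absCutoff)) );

// Create an info object to record what the SVD actually did.
cutensornetTensorSVDInfo_t svdInfo;
HANDLE_ERROR( cutensornetCreateTensorSVDInfo(handle, &svdInfo) );

// Choose a gate split algorithm.
cutensornetGateSplitAlgo_t gateAlgo = CUTENSORNET_GATE_SPLIT_ALGO_REDUCED;

// Query and attach the device scratch workspace.
cutensornetWorkspaceDescriptor_t workDesc;
HANDLE_ERROR( cutensornetCreateWorkspaceDescriptor(handle, &workDesc) );
HANDLE_ERROR( cutensornetWorkspaceComputeGateSplitSizes(handle,
                  descTensorAIn, descTensorBIn, descTensorGIn,
                  descTensorAOut, descTensorBOut,
                  gateAlgo, svdConfig, typeCompute, workDesc) );
int64_t requiredWorkspaceSize{0};
HANDLE_ERROR( cutensornetWorkspaceGetMemorySize(handle, workDesc,
                  CUTENSORNET_WORKSIZE_PREF_MIN, CUTENSORNET_MEMSPACE_DEVICE,
                  CUTENSORNET_WORKSPACE_SCRATCH, &requiredWorkspaceSize) );
void* devWork = nullptr;
cudaMalloc(&devWork, requiredWorkspaceSize);
HANDLE_ERROR( cutensornetWorkspaceSetMemory(handle, workDesc,
                  CUTENSORNET_MEMSPACE_DEVICE, CUTENSORNET_WORKSPACE_SCRATCH,
                  devWork, requiredWorkspaceSize) );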
/**********************
 * Execution
 **********************/

GPUTimer timer{stream};
double minTimeCUTENSOR = 1e100;
const int numRuns = 3; // to get stable perf results
for (int i = 0; i < numRuns; ++i)
{
   // restore output
   cudaMemsetAsync(D_AOut, 0, sizeAOut, stream);
   cudaMemsetAsync(D_S, 0, sizeS, stream);
   cudaMemsetAsync(D_BOut, 0, sizeBOut, stream);

   // With value-based truncation, `cutensornetGateSplit` can potentially update the shared extent in descTensorA/BOut.
   // We here restore descTensorA/BOut to the original problem.
   HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorAOut) );
   HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorBOut) );
   HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesAOut, extentAOut.data(), strides, modesAOut.data(), typeData, &descTensorAOut) );
   HANDLE_ERROR( cutensornetCreateTensorDescriptor(handle, numModesBOut, extentBOut.data(), strides, modesBOut.data(), typeData, &descTensorBOut) );

   cudaDeviceSynchronize();
   timer.start();
   HANDLE_ERROR( cutensornetGateSplit(handle,
                                      descTensorAIn, D_AIn,
                                      descTensorBIn, D_BIn,
                                      descTensorGIn, D_GIn,
                                      descTensorAOut, D_AOut,
                                      D_S,
                                      descTensorBOut, D_BOut,
                                      gateAlgo,
                                      svdConfig, typeCompute, svdInfo,
                                      workDesc, stream) );
   // Synchronize and measure timing
   auto time = timer.seconds();
   minTimeCUTENSOR = (minTimeCUTENSOR < time) ? minTimeCUTENSOR : time;
}

printf("Performing Gate Split\n");
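After the call, the effect of the truncation can be inspected through the cutensornetTensorSVDInfo_t object. A minimal sketch, assuming the svdInfo object created during setup:

// Sketch: query how the shared extent 'x' was actually reduced by truncation.
int64_t fullExtent{0}, reducedExtent{0};
double discardedWeight{0.0};
HANDLE_ERROR( cutensornetTensorSVDInfoGetAttribute(handle, svdInfo,
                  CUTENSORNET_TENSOR_SVD_INFO_FULL_EXTENT, &fullExtent, sizeof(fullExtent)) );
HANDLE_ERROR( cutensornetTensorSVDInfoGetAttribute(handle, svdInfo,
                  CUTENSORNET_TENSOR_SVD_INFO_REDUCED_EXTENT, &reducedExtent, sizeof(reducedExtent)) );
HANDLE_ERROR( cutensornetTensorSVDInfoGetAttribute(handle, svdInfo,
                  CUTENSORNET_TENSOR_SVD_INFO_DISCARDED_WEIGHT, &discardedWeight, sizeof(discardedWeight)) );
printf("Shared extent reduced from %lld to %lld (discarded weight %.6e)\n",
       static_cast<long long>(fullExtent), static_cast<long long>(reducedExtent), discardedWeight);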
Note

Like in cutensornetTensorSVD(), since we turned on weighted truncation options in this example, the tensor descriptors for the outputs A and B need to be restored if we wish to perform the same computation multiple times.
After the computation, we always need to free up all resources.
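For instance, the teardown may look as follows (a minimal sketch; devWork is the scratch buffer from the setup sketch above, and the D_* pointers are the device data buffers):

// Sketch: release cuTensorNet objects and device memory.
HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorAIn) );
HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorBIn) );
HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorGIn) );
HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorAOut) );
HANDLE_ERROR( cutensornetDestroyTensorDescriptor(descTensorBOut) );
HANDLE_ERROR( cutensornetDestroyTensorSVDConfig(svdConfig) );
HANDLE_ERROR( cutensornetDestroyTensorSVDInfo(svdInfo) );
HANDLE_ERROR( cutensornetDestroyWorkspaceDescriptor(workDesc) );
HANDLE_ERROR( cutensornetDestroy(handle) );
cudaFree(D_AIn);  cudaFree(D_BIn);  cudaFree(D_GIn);
cudaFree(D_AOut); cudaFree(D_S);    cudaFree(D_BOut);
cudaFree(devWork);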