Warning
Executions of exactly the same Solver function with exactly the same inputs are not guaranteed to produce bit-identical results if they differ in any of the following:
CUDA architecture (SM),
number of threads (BlockDim), or
number of batches per block (BatchesPerBlock).
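
The warning does not state why results can differ. One common cause in parallel numerical code is that floating-point addition is not associative, so changing how work is split across threads can change the rounding; treating this as the cause here is an assumption. The host-side C++ sketch below only illustrates that general effect. It does not use the library, and `chunked_sum` and `bit_identical` are hypothetical helpers introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a library function): sums `values` by splitting
// them into `chunks` contiguous pieces, accumulating each piece separately
// (as one thread might), then adding the partial sums in order.
float chunked_sum(const std::vector<float>& values, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = values.size();
    const std::size_t per_chunk = (n + chunks - 1) / chunks;  // ceiling division
    float total = 0.0f;
    for (std::size_t c = 0; c < chunks; ++c) {
        float partial = 0.0f;
        const std::size_t begin = c * per_chunk;
        const std::size_t end = std::min(begin + per_chunk, n);
        for (std::size_t i = begin; i < end; ++i) {
            partial += values[i];
        }
        total += partial;
    }
    return total;
}

// Hypothetical helper: compares the raw 32-bit patterns of two floats,
// which is stricter than comparing within a tolerance.
bool bit_identical(float a, float b) {
    std::uint32_t ua = 0, ub = 0;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}
```

For example, calling `chunked_sum` on `{1e8f, 1.0f, -1e8f, 1.0f}` with one chunk and then with two chunks adds the same four numbers in a different grouping, and `bit_identical` can be used to check whether the two results match. When validating Solver results across configurations, comparing against a tolerance is usually more appropriate than requiring identical bits.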