Memory requirements
cuFFTMp requires a certain amount of scratch memory. The amount depends on the size of each dimension and on the type of plan.
Assume we are using built-in Slabs (i.e., CUFFT_XT_FORMAT_INPLACE and/or CUFFT_XT_FORMAT_INPLACE_SHUFFLED).
Also assume we are computing a transform with a total of S=X*Y*Z elements over n GPUs. Beyond the input/output descriptor (of approximately S/n elements per GPU), cuFFT requires between S/n and 2S/n elements of scratch per GPU.
When using custom data distributions (cufftXtSetDistribution with CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and/or CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT), the amount of scratch can be larger.
For a transform of size S=X*Y*Z, the amount of scratch is approximately S/n to 2S/n when all GPUs are connected through peer-to-peer, and 3S/n to 4S/n otherwise.
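The ranges above can be turned into a rough per-GPU estimate in bytes. The following is a minimal sketch, not part of cuFFTMp: the names Layout, ScratchRange and estimate_scratch_per_gpu are hypothetical, and the sketch assumes that the custom-distribution figures are counted in elements per GPU, as in the Slab case. The actual requirement depends on the plan and should be taken from the library itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; these are not cuFFTMp types.
enum class Layout { BuiltInSlabs, CustomDistribution };

struct ScratchRange {
    std::uint64_t min_bytes;  // lower end of the estimate, per GPU
    std::uint64_t max_bytes;  // upper end of the estimate, per GPU
};

// Rough per-GPU scratch estimate based on the multiples of S/n quoted above.
// bytes_per_element is, for example, 8 for single-precision complex data.
// all_peer_to_peer only matters for custom distributions.
inline ScratchRange estimate_scratch_per_gpu(std::uint64_t x, std::uint64_t y,
                                             std::uint64_t z, std::uint64_t n_gpus,
                                             std::uint64_t bytes_per_element,
                                             Layout layout, bool all_peer_to_peer) {
    assert(n_gpus > 0);
    const std::uint64_t total = x * y * z;  // S
    // ceil(S / n) elements per GPU, converted to bytes.
    const std::uint64_t per_gpu_bytes =
        (total + n_gpus - 1) / n_gpus * bytes_per_element;

    // S/n to 2S/n by default; 3S/n to 4S/n for custom distributions
    // when not all GPUs are connected through peer-to-peer.
    std::uint64_t lo = 1, hi = 2;
    if (layout == Layout::CustomDistribution && !all_peer_to_peer) {
        lo = 3;
        hi = 4;
    }
    return {per_gpu_bytes * lo, per_gpu_bytes * hi};
}
```

For example, a 1024 x 1024 x 1024 single-precision complex transform over 8 GPUs holds 2^27 elements (1 GiB) of data per GPU, so the estimate is 1 GiB to 2 GiB of scratch per GPU with built-in Slabs, and 3 GiB to 4 GiB with a custom distribution when the GPUs are not all peer-to-peer connected.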
Generally, minimum scratch usage can be obtained for pure powers of 2, 3, 5 and 7 and some composite sizes.
When an application has multiple plans (such as a C2R and an R2C plan), scratch can be shared between the plans using cufftSetWorkArea and cufftGetSize.
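The sizing step behind scratch sharing can be sketched on the host side as follows. This is an illustrative sketch only: shared_scratch_bytes and bytes_saved_by_sharing are hypothetical helpers, the per-plan sizes are assumed to have been queried with cufftGetSize after each plan was created, and the plans are assumed never to execute at the same time, since they would all use the same buffer. Allocating the buffer and attaching it to each plan with cufftSetWorkArea is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size of a single work area large enough for every plan in the list.
// Each entry is the scratch size, in bytes, reported for one plan.
inline std::size_t shared_scratch_bytes(const std::vector<std::size_t>& per_plan_bytes) {
    assert(!per_plan_bytes.empty());
    return *std::max_element(per_plan_bytes.begin(), per_plan_bytes.end());
}

// Bytes saved compared with giving each plan its own work area.
inline std::size_t bytes_saved_by_sharing(const std::vector<std::size_t>& per_plan_bytes) {
    std::size_t sum = 0;
    for (std::size_t b : per_plan_bytes) {
        sum += b;
    }
    return sum - shared_scratch_bytes(per_plan_bytes);
}
```

With an R2C plan and a C2R plan of similar size, the shared buffer is about as large as one plan's scratch, so sharing saves roughly half of the scratch memory that two separate work areas would need.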
Note
The C++ sample located in r2c_c2r_shared_scratch shows an example of scratch sharing between plans.