API usage¶
Using LTO callbacks in cuFFT LTO EA is divided into two parts:
Generating the LTO callback (i.e. compiling the callback to LTO-IR).
Associating the LTO callback with the cuFFT plan.
Generating the LTO callback¶
cuFFT LTO EA currently supports two ways of generating the LTO-callback (i.e. callback code compiled to LTO-IR).
Offline compilation¶
The callback code can be compiled to LTO-IR using nvcc with any of the supported flags (such as -dlto or -gencode=arch=compute_XX,code=lto_XX, with XX indicating the target GPU architecture).
Note that PTX JIT is part of the JIT LTO kernel finalization trajectory, so LTO-IR built for an architecture older than that of the GPU in the system is supported; users can compile their callback function to LTO-IR for target arch XX and it should work on GPUs with arch YY, with XX <= YY. Please see Just-in-Time Compilation for more details.
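The flag shape and the XX <= YY rule above can be written down as two small helpers. This is only an illustrative sketch: the function names lto_gencode_flag and lto_ir_usable_on are hypothetical and not part of nvcc or cuFFT, and the architecture numbers are assumed to be plain integers such as 70 or 80.

```cpp
#include <cassert>
#include <string>

// Build the nvcc -gencode flag that requests LTO-IR for architecture XX
// (for example 80). The flag shape is the one quoted in the text above.
std::string lto_gencode_flag(int xx) {
    assert(xx > 0 && "architecture number must be positive");
    const std::string cc = std::to_string(xx);
    return "-gencode=arch=compute_" + cc + ",code=lto_" + cc;
}

// Compatibility rule from the text: LTO-IR built for arch XX is expected
// to work on a GPU of arch YY when XX <= YY.
bool lto_ir_usable_on(int built_for_xx, int device_yy) {
    return built_for_xx <= device_yy;
}
```

For instance, lto_gencode_flag(80) yields -gencode=arch=compute_80,code=lto_80, and lto_ir_usable_on(70, 80) is true while lto_ir_usable_on(90, 80) is false.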
Once compiled to LTO-IR, the binary containing the callback can be turned into a header file containing a C array with the data using the bin2c application included in the CUDA Toolkit. The header can then be included in the application to pass the array to cuFFT.
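As a rough sketch of the application side, the embedded array can be wrapped in a pointer and size pair before it is handed to cuFFT. Everything named here is an assumption made for illustration: the array name callback_lto_ir, the four placeholder bytes standing in for real LTO-IR, and the LtoBlob helper type. The exact layout of a bin2c-generated header may differ.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the array a bin2c-generated header would define. The name
// and the bytes are placeholders (hypothetical), not real LTO-IR.
static const unsigned char callback_lto_ir[] = {0xde, 0xad, 0xbe, 0xef};

// Pointer and size describing the embedded image (hypothetical helper type).
struct LtoBlob {
    const void* data;
    std::size_t size_bytes;
};

LtoBlob embedded_callback() {
    // sizeof applied to the array itself (not to a pointer) gives the byte count.
    LtoBlob blob{callback_lto_ir, sizeof(callback_lto_ir)};
    assert(blob.size_bytes > 0);  // an empty image would not hold a callback
    return blob;
}
```

Keeping the size next to the pointer avoids recomputing it from a decayed pointer later, where sizeof would no longer report the array length.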
Please see the samples included in the cuFFT LTO EA tarball or the public CUDA Library Samples GitHub repository for more details.
Using NVRTC¶
Another option to generate the LTO-callback is to use NVRTC to do runtime compilation.
NVRTC supports the -dlto flag to compile the callback to LTO-IR at runtime.
As stated in Offline compilation, PTX JIT is part of the JIT LTO kernel finalization trajectory, so the callback can be compiled for any architecture that is the same as or older than that of the GPU it will run on.
Please see the samples included in the cuFFT LTO EA tarball for more details.
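A minimal sketch of assembling the option list for such a runtime compile is shown here. The helper names nvrtc_lto_options and as_c_strings are hypothetical. The -dlto flag is the one named above; pairing it with --gpu-architecture=compute_XX is an assumption based on NVRTC's general architecture option. The sketch stops short of calling NVRTC itself, so it builds without the CUDA Toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Options for an NVRTC compile that should emit LTO-IR for architecture XX.
// Combining "-dlto" with an explicit architecture is an assumption here.
std::vector<std::string> nvrtc_lto_options(int xx) {
    assert(xx > 0 && "architecture number must be positive");
    return {"-dlto", "--gpu-architecture=compute_" + std::to_string(xx)};
}

// View the options as C strings, the form nvrtcCompileProgram takes.
// The returned pointers are valid only while `opts` is alive and unmodified.
std::vector<const char*> as_c_strings(const std::vector<std::string>& opts) {
    std::vector<const char*> out;
    out.reserve(opts.size());
    for (const auto& o : opts) out.push_back(o.c_str());
    return out;
}
```

In an application, the size and data pointer of the second vector would be passed as the option count and option array of the compile call.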
Associating the LTO callback with the cuFFT plan¶
Associating the LTO callback with cuFFT is done using the API extension in cuFFT LTO EA. Specifically, the new function cufftXtSetJITCallback works similarly to cufftXtSetCallback(…), with a few caveats.
First, cufftXtSetJITCallback must be called after plan creation with cufftCreate(…), and before calling the plan initialization function with cufftMakePlan…(…).
Second, removing the LTO callback from the plan (using cufftXtClearCallback(…)) is currently not supported. A new plan must be created.
Setting the callback shared memory size via cufftXtSetCallbackSharedSize(…) is still supported.
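The ordering caveats can be pictured with a toy model of the plan lifecycle. This is deliberately not the cuFFT API: PlanModel and its methods are hypothetical stand-ins that only mimic the rules stated above, and the real function signatures are not reproduced here.

```cpp
#include <cassert>

// Toy model of a plan (hypothetical, not a cufftHandle).
struct PlanModel {
    bool made = false;          // becomes true after the make-plan step
    bool has_callback = false;  // becomes true once an LTO callback is attached

    // Stands in for cufftXtSetJITCallback: rejected once the plan is made.
    bool set_jit_callback() {
        if (made) return false;
        has_callback = true;
        return true;
    }

    // Stands in for cufftMakePlan...(...): finalizes the plan once.
    bool make_plan() {
        if (made) return false;
        made = true;
        return true;
    }

    // Stands in for clearing an LTO callback, which is not supported.
    bool clear_jit_callback() const { return false; }
};

// Intended order: create, attach the callback, then make the plan.
PlanModel build_plan_with_callback() {
    PlanModel plan;                       // corresponds to cufftCreate(...)
    bool ok = plan.set_jit_callback();    // attach before initialization
    assert(ok);
    ok = plan.make_plan();                // initialize the plan last
    assert(ok);
    (void)ok;                             // silence unused warnings under NDEBUG
    return plan;
}
```

In this model, attaching a callback after make_plan() fails, and the only way to get a plan without the callback is to construct a new PlanModel, mirroring the requirement to create a new plan.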