compile_epilog
nvmath.fft.compile_epilog(epilog_fn, element_dtype, user_info_dtype, *, compute_capability=None)
Compile a Python function to LTO-IR to provide as an epilog function for fft() and plan().
- Parameters:
epilog_fn – The epilog function to be compiled to LTO-IR. It must have the signature epilog_fn(data_out, offset, data, user_info, reserved_for_future_use); it stores the transformed data into data_out at offset.
element_dtype – The data type of the data_in argument, one of ['float32', 'float64', 'complex64', 'complex128']. It must be the same as the data type of the FFT operand for prolog functions or of the FFT result for epilog functions.
user_info_dtype – The data type of the user_info argument. It must be one of ['float32', 'float64', 'complex64', 'complex128'] or an object of type numba.types.Type. The offset is computed based on the memory layout (shape and strides) of the operand (input for prolog, output for epilog). If the user would like to pass an additional tensor as user_info and access it based on the offset, it is crucial to know the memory layout of the operand. Please note that the actual layout of the input tensor may differ from the layout of the tensor passed to the fft call. To learn the memory layout of the input or output, please use the stateful FFT API and nvmath.fft.FFT.get_input_layout() and nvmath.fft.FFT.get_output_layout(), respectively.
Note
Currently, in the callback, the position of an element in the input and output operands is described with a single flat offset, even if the original operand is a multi-dimensional tensor.
compute_capability – The target compute capability, specified as a string ('80', '89', …). The default is the compute capability of the current device.
- Returns:
The function compiled to LTO-IR as a bytes object.
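To make the flat-offset convention described for user_info_dtype more concrete, here is a minimal host-side sketch in plain Python. It assumes that strides are counted in elements (not bytes) and that the flat offset is the dot product of the index with the strides. The helper names flat_offset and unravel_offset and the per-batch scale table are hypothetical and not part of nvmath. In real code the shape and strides should come from get_input_layout() or get_output_layout() rather than being hard-coded.

```python
def flat_offset(index, strides):
    """Flat element offset of a multi-dimensional index (strides in elements)."""
    return sum(i * s for i, s in zip(index, strides))


def unravel_offset(offset, shape, strides):
    """Recover the multi-dimensional index from a flat offset.

    Assumes a dense, non-overlapping layout with positive strides.
    """
    index = [0] * len(shape)
    # Visit the axes from the largest stride to the smallest.
    for axis in sorted(range(len(shape)), key=lambda a: strides[a], reverse=True):
        index[axis], offset = divmod(offset, strides[axis])
    return tuple(index)


# Hypothetical layout: a batch of 4 signals of length 8, stored row-major.
shape, strides = (4, 8), (8, 1)

# A per-batch scale factor of the kind that could be passed as user_info.
batch_scale = [1.0, 0.5, 0.25, 0.125]

# Inside a callback only the flat offset is available, so the batch index
# has to be recovered from it before the side table can be consulted.
offset = flat_offset((2, 5), strides)
batch, position = unravel_offset(offset, shape, strides)
scale = batch_scale[batch]
```

If the operand's actual layout differs from the one assumed here (for example, column-major strides), the same offset maps to a different batch index, which is why the layout has to be queried rather than guessed.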
See also
Examples
The cuFFT library expects the end user to manage scaling of the outputs, so in order to replicate the norm option found in other Python FFT libraries we can define an epilog which performs the scaling.
>>> import cupy as cp
>>> import nvmath
>>> import math
Create the data for a batched 1-D FFT.
>>> B, N = 256, 1024
>>> a = cp.random.rand(B, N, dtype=cp.float64) + 1j * cp.random.rand(B, N, dtype=cp.float64)
Compute a normalization factor that will create unitary transforms.
>>> norm_factor = 1.0 / math.sqrt(N)
Define the epilog function for the FFT.
>>> def rescale(data_out, offset, data, user_info, unused):
...     data_out[offset] = data * norm_factor
Compile the epilog to LTO-IR. In a system with GPUs that have different compute capabilities, the compute_capability option must be specified to the compile_prolog or compile_epilog helpers. Alternatively, the epilog can be compiled in the context of the device where the FFT to which the epilog is provided is executed. In this case we use the current device context, where the operands have been created.
>>> with cp.cuda.Device():
...     epilog = nvmath.fft.compile_epilog(rescale, "complex128", "complex128")
Perform the forward FFT, applying the rescaling as an epilog.
>>> r = nvmath.fft.fft(a, axes=[-1], epilog=dict(ltoir=epilog))
Test that the fused FFT run result matches the result of other libraries.
>>> s = cp.fft.fftn(a, axes=[-1], norm="ortho")
>>> assert cp.allclose(r, s)
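As a cross-check of the normalization factor used above, the following pure-Python sketch emulates the epilog calling convention on the host. A naive DFT produces each output element and hands it to an epilog with the same five-argument signature, which stores the scaled value at the flat offset. The driver naive_dft_with_epilog is hypothetical and only illustrates the argument order; it makes no claim about how cuFFT invokes the callback on the device. With the 1/sqrt(N) factor the transform preserves the signal energy (Parseval's identity), which is the property that norm="ortho" provides.

```python
import cmath
import math


def naive_dft_with_epilog(x, epilog_fn, user_info=None):
    """Compute a forward DFT of x, storing each result through epilog_fn."""
    n = len(x)
    out = [0j] * n
    for k in range(n):
        value = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Same argument order as the documented epilog signature:
        # (data_out, offset, data, user_info, reserved_for_future_use).
        epilog_fn(out, k, value, user_info, None)
    return out


N = 16
norm_factor = 1.0 / math.sqrt(N)


def rescale(data_out, offset, data, user_info, unused):
    # Store the transformed element, scaled to make the transform unitary.
    data_out[offset] = data * norm_factor


signal = [complex(math.sin(0.3 * t), math.cos(1.1 * t)) for t in range(N)]
spectrum = naive_dft_with_epilog(signal, rescale)

# For a unitary transform, the energy is the same in both domains.
energy_in = sum(abs(v) ** 2 for v in signal)
energy_out = sum(abs(v) ** 2 for v in spectrum)
```

The two energies agree up to floating-point rounding, so a comparison such as math.isclose(energy_in, energy_out) is a cheap sanity test for a scaling epilog before it is compiled for the GPU.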
Notes
The user must ensure that the specified argument types meet the requirements listed above.
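One way to catch a mismatch early is a small pre-flight check before calling compile_epilog. The sketch below is only an illustration under stated assumptions: check_element_dtype is a hypothetical helper, not part of nvmath, and it covers only the string dtype names listed in the parameter description (not numba.types.Type objects).

```python
ALLOWED_DTYPES = ("float32", "float64", "complex64", "complex128")


def check_element_dtype(element_dtype, result_dtype):
    """Raise if element_dtype is unsupported or differs from the FFT result dtype.

    Both arguments are dtype names given as strings.
    """
    if element_dtype not in ALLOWED_DTYPES:
        raise ValueError(f"unsupported element_dtype: {element_dtype!r}")
    if element_dtype != result_dtype:
        raise TypeError(
            f"epilog element_dtype {element_dtype!r} does not match "
            f"the FFT result dtype {result_dtype!r}"
        )
    return element_dtype


# The example above transforms complex128 data, so the result is complex128.
checked = check_element_dtype("complex128", "complex128")
```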