Fast Fourier Transform#
Overview#
The Fast Fourier Transform (FFT) module nvmath.fft in nvmath-python leverages the
NVIDIA cuFFT library and provides a powerful suite of APIs that can be directly called from
the host to efficiently perform discrete Fourier transforms. Both stateless
function-form APIs and stateful class-form APIs are provided to support a spectrum of
N-dimensional FFT operations. These include forward and inverse transformations, as well as
complex-to-complex (C2C), complex-to-real (C2R), and real-to-complex (R2C) transforms:
N-dimensional forward C2C FFT transform by nvmath.fft.fft()
N-dimensional inverse C2C FFT transform by nvmath.fft.ifft()
N-dimensional forward R2C FFT transform by nvmath.fft.rfft()
N-dimensional inverse C2R FFT transform by nvmath.fft.irfft()
All types of N-dimensional FFT by stateful nvmath.fft.FFT
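To make the difference between the transform types concrete, here is a minimal sketch that uses NumPy's reference N-D routines as a stand-in, so that it runs without a GPU or nvmath-python installed. It is not the nvmath.fft API itself; the array shape and variable names are arbitrary choices for illustration. It shows that an R2C transform keeps only the non-redundant half of the last transformed axis, and that the C2R inverse recovers the real operand once the original shape is supplied.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 9))   # real operand; the last extent is odd on purpose

# C2C: complex input, complex output of the same shape.
xc = x.astype(np.complex128)
fwd = np.fft.fftn(xc)
c2c_ok = np.allclose(np.fft.ifftn(fwd), xc)

# R2C: real input; the last axis shrinks to n // 2 + 1 because the other
# half of the spectrum is the complex conjugate of the stored half.
half = np.fft.rfftn(x)
r2c_shape = half.shape
r2c_matches_c2c = np.allclose(half, fwd[..., : x.shape[-1] // 2 + 1])

# C2R: the half spectrum alone does not say whether the original last extent
# was 8 or 9, so the original shape is passed explicitly.
restored = np.fft.irfftn(half, s=x.shape)
c2r_ok = np.allclose(restored, x)

print(r2c_shape, c2c_ok, r2c_matches_c2c, c2r_ok)
```

The even/odd ambiguity of the last axis is inherent to C2R transforms in general, so any C2R call needs that information from somewhere.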
Furthermore, the nvmath.fft.FFT class includes utility APIs designed to help users
cache FFT plans, facilitating the efficient execution of repeated calculations across
various computational tasks (see create_key()).
The FFT transforms performed on the GPU can be fused with other operations using FFT callbacks. This enables users to write custom functions in Python for pre- or post-processing, while leveraging Just-In-Time (JIT) compilation and Link-Time Optimization (LTO).
Users can also choose CPU execution to utilize all available computational resources.
Note
The API fft() and related function-form APIs perform N-D FFT operations, similar to
numpy.fft.fftn(). There are no special 1-D (numpy.fft.fft()) or 2-D FFT
(numpy.fft.fft2()) APIs. This not only reduces the API surface, but also avoids the
potential for incorrect use because the number of batch dimensions is \(N - 1\) for
numpy.fft.fft() and \(N - 2\) for numpy.fft.fft2(), where \(N\) is the operand dimension.
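The batch-dimension point in the note can be checked directly with NumPy. The sketch below uses only NumPy (not nvmath-python) and an arbitrary 3-D operand. It shows that the 1-D and 2-D convenience functions are the N-D transform restricted to the last one or two axes, which is the behavior an N-D API expresses through an explicit axes argument.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 4, 8)) + 1j * rng.standard_normal((3, 4, 8))
N = a.ndim  # operand dimension, 3 here

# numpy.fft.fft transforms only the last axis: N - 1 = 2 batch dimensions.
one_d = np.fft.fft(a)
# numpy.fft.fft2 transforms the last two axes: N - 2 = 1 batch dimension.
two_d = np.fft.fft2(a)

# The N-D form states the transformed axes explicitly instead.
same_1d = np.allclose(one_d, np.fft.fftn(a, axes=[-1]))
same_2d = np.allclose(two_d, np.fft.fftn(a, axes=[-2, -1]))

# With no axes given, the N-D form transforms every axis (no batch dimensions),
# which is not what the 1-D convenience function computes.
differs = not np.allclose(one_d, np.fft.fftn(a))

print(same_1d, same_2d, differs)
```

With a single N-D entry point, the set of transformed axes is always visible at the call site rather than implied by the function name.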
FFT Callbacks#
User-defined functions can be compiled to the LTO-IR format and provided as epilog or prolog to the FFT operation, allowing for Link-Time Optimization and fusing. This can be used to implement DFT-based convolutions or scale the FFT output, for example.
The FFT module comes with convenient helper functions nvmath.fft.compile_prolog() and
nvmath.fft.compile_epilog() that compile functions written in Python to LTO-IR format.
Under the hood, the helpers rely on Numba as the compiler. The compiled callbacks can be
passed to functional or stateful FFT APIs as DeviceCallable.
Alternatively, users can compile the callbacks to LTO-IR format with a compiler of their
choice and pass them as DeviceCallable to the FFT call.
Examples illustrating use of prolog and epilog functions can be found in the FFT examples directory.
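As a reference for what a DFT-based convolution computes, the following sketch uses plain NumPy on the host. It does not use the callback API or LTO-IR; the function names are hypothetical and chosen for illustration. The element-wise product between the two transforms is the kind of per-element step that a prolog or epilog could fuse into the FFT itself.

```python
import numpy as np


def circular_convolve_direct(x, h):
    """O(n^2) circular convolution, used only as a ground truth."""
    n = len(x)
    out = np.zeros(n, dtype=np.result_type(x, h))
    for i in range(n):
        for j in range(n):
            out[i] += x[j] * h[(i - j) % n]
    return out


def circular_convolve_dft(x, h):
    """Circular convolution via the convolution theorem."""
    spectrum = np.fft.fft(x)
    filter_spectrum = np.fft.fft(h)
    # Per-element multiply in the frequency domain: the step a callback
    # attached to the forward or inverse transform could perform in place.
    filtered = spectrum * filter_spectrum
    # numpy.fft.ifft applies the 1/n normalization itself.
    return np.fft.ifft(filtered)


rng = np.random.default_rng(2)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
h = rng.standard_normal(16) + 1j * rng.standard_normal(16)

match = np.allclose(circular_convolve_direct(x, h), circular_convolve_dft(x, h))
print(match)
```

In this host-side version the product is a separate pass over the data; fusing it into the transform is what removes that extra pass.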
Note
FFT Callbacks are not currently supported on Windows.
Setting-up#
The fastest way to start using cuFFT LTO with nvmath is to install it with device API dependencies. Pip users should run the following command:
pip install nvmath-python[cu12,dx]
Required dependencies#
For those who need to collect the required dependencies manually:
LTO callbacks are supported by cuFFT 11.3 which is shipped with CUDA Toolkit 12.6 Update 2 and newer.
Using cuFFT LTO callbacks requires nvJitLink from the same CUDA toolkit or newer (within the same major CUDA release, for example version 12).
Compiling the callbacks with the nvmath.fft.compile_prolog() and nvmath.fft.compile_epilog() helpers requires Numba 0.59+ and nvcc/nvvm from the same CUDA toolkit as nvJitLink or older (within the same major CUDA release). The helpers require the target device to have compute capability 7.0 or higher.
For further details, refer to the cuFFT LTO documentation.
Older CTKs#
Adventurous users who want to try callback functionality and cannot upgrade the CUDA Toolkit
to 12.6U2 can download and install the older preview release, cuFFT LTO EA version 11.1.3.0, which
requires at least CUDA Toolkit 12.2. When using LTO EA, setting environment variables may
be needed for nvmath to pick the desired cuFFT version. Users should adjust the
LD_PRELOAD variable so that the right cuFFT shared library is used:
export LD_PRELOAD="/path_to_cufft_lto_ea/libcufft.so"
Execution space#
FFT transforms can be executed on either an NVIDIA GPU or a CPU. By default, the execution space
is selected based on the memory space of the operand passed to the FFT call, but it can be
explicitly controlled with ExecutionCUDA and ExecutionCPU passed as the execution
option to the call (for example, FFT or fft()).
Note
CPU execution is not currently supported on Windows.
Required dependencies#
On ARM CPUs, such as NVIDIA Grace, nvmath-python can utilize NVPL (NVIDIA Performance Libraries) FFT to run the transform. On the x86_64 architecture, the MKL library can be used.
For pip users, the fastest way to get the required dependencies is to use the 'cu12'/'cu11' and 'cpu' extras:
# for CPU-only dependencies
pip install nvmath-python[cpu]
# for CUDA-only dependencies (assuming CUDA 12)
pip install nvmath-python[cu12]
# for CUDA 12 and CPU dependencies
pip install nvmath-python[cu12,cpu]
Custom CPU library#
Other libraries that conform to the FFTW3 API and ship single- and double-precision symbols in
a single .so file can be used to back the CPU FFT execution. Users who would like to
use a different library for CPU FFT, or point to a custom installation of the NVPL or MKL library,
can do so by including the library path in LD_LIBRARY_PATH and specifying the library
name with NVMATH_FFT_CPU_LIBRARY. For example:
# nvpl
export LD_LIBRARY_PATH=/path/to/nvpl/:$LD_LIBRARY_PATH
export NVMATH_FFT_CPU_LIBRARY=libnvpl_fftw.so.0
# mkl
export LD_LIBRARY_PATH=/path/to/mkl/:$LD_LIBRARY_PATH
export NVMATH_FFT_CPU_LIBRARY=libmkl_rt.so.2
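Before pointing nvmath-python at a custom library, it can help to check that the dynamic loader can find it and that it exposes both precisions. The sketch below uses only the Python standard library. The helper name probe_fftw3_library is hypothetical, and the two symbols probed are standard FFTW3 planner names chosen as representatives; the exact set of symbols nvmath-python resolves is an assumption not confirmed here.

```python
import ctypes
import os

# Representative FFTW3 planner symbols (assumption: the presence of these
# suggests the double- and single-precision interfaces live in one library).
DOUBLE_SYMBOL = "fftw_plan_many_dft"
SINGLE_SYMBOL = "fftwf_plan_many_dft"


def probe_fftw3_library(name):
    """Return which precisions a shared library exposes, or None if it cannot be loaded."""
    if not name:
        return None
    try:
        # Resolution honors LD_LIBRARY_PATH as it was set when the process started.
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    return {
        "double": hasattr(lib, DOUBLE_SYMBOL),
        "single": hasattr(lib, SINGLE_SYMBOL),
    }


# Probe whatever NVMATH_FFT_CPU_LIBRARY currently names; None if it is unset
# or the library cannot be loaded.
result = probe_fftw3_library(os.environ.get("NVMATH_FFT_CPU_LIBRARY"))
print(result)
```

A result of None usually points to a path problem, while a False entry suggests the library ships that precision in a separate file.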
Host API Reference#
FFT support (nvmath.fft)#
|
fft(operand, axes=None, direction=None, options=None, execution=None, prolog=None, epilog=None, stream=None) |
|
ifft(operand, axes=None, options=None, execution=None, prolog=None, epilog=None, stream=None) |
|
rfft(operand, axes=None, options=None, execution=None, prolog=None, epilog=None, stream=None) |
|
irfft(operand, axes=None, options=None, execution=None, prolog=None, epilog=None, stream=None) |
|
Create a stateful object that encapsulates the specified FFT computations and required resources. |
|
Compile a Python function to LTO-IR to provide as a prolog function for |
|
Compile a Python function to LTO-IR to provide as an epilog function for |
|
Error type for layouts not supported by the library. |
|
A data class for providing options to the |
|
An IntEnum class specifying the direction of the transform. |
|
A data class for providing GPU execution options to the |
|
A data class for providing CPU execution options to the |
|
A data class capturing LTO-IR callables. |