FFT
class nvmath.fft.FFT(operand, *, axes=None, options=None, execution=None, stream=None)
Create a stateful object that encapsulates the specified FFT computations and required resources. This object ensures the validity of resources during use and releases them when they are no longer needed to prevent misuse.
This object encompasses all functionalities of the function-form APIs fft(), ifft(), rfft(), and irfft(), which are convenience wrappers around it. The stateful object also allows for the amortization of preparatory costs when the same FFT operation is to be performed on multiple operands with the same problem specification (see reset_operand() and create_key() for more details).
Using the stateful object typically involves the following steps:
Problem Specification: Initialize the object with a defined operation and options.
Preparation: Use plan() to determine the best algorithmic implementation for this specific FFT operation.
Execution: Perform the FFT computation with execute(), which can be either a forward or an inverse FFT transformation.
Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.
Detailed information on each step described above can be obtained by passing in a logging.Logger object to FFTOptions or by setting the appropriate options in the root logger object, which is used by default:
>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )
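As a hedged sketch of the first option, a dedicated logger can be built with the standard library and then handed to FFTOptions. The logger name fft_demo and the in-memory stream are illustrative choices, and the keyword name logger in the commented-out lines is an assumption about the FFTOptions constructor; the nvmath calls are left commented so the snippet runs without a GPU.

```python
import io
import logging

# Capture log records in memory so the output can be inspected;
# a real application would more likely log to the console or a file.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(levelname)-8s %(message)s", datefmt="%m-%d %H:%M:%S"
    )
)

# "fft_demo" is a hypothetical logger name chosen for this sketch.
logger = logging.getLogger("fft_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep these records out of the root logger

# Assumed usage (requires nvmath, so it is not executed here):
# options = nvmath.fft.FFTOptions(logger=logger)
# f = nvmath.fft.FFT(a, options=options)

logger.info("planning started")
logged_text = log_stream.getvalue()
```

A dedicated logger keeps the FFT messages separate from the rest of the application's logging, which the root-logger approach does not.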
- Parameters:
operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. axes[-1] is the ‘last transformed’ axis for rffts. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as a FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
execution – Specify execution space options for the FFT as a ExecutionCUDA or ExecutionCPU object. Alternatively, a string (‘cuda’ or ‘cpu’) or a dict with the ‘name’ key set to ‘cpu’ or ‘cuda’ and optional parameters relevant to the given execution space can be provided. If not specified, the execution space will be selected to match the operand’s storage (in GPU or host memory), and the corresponding ExecutionCUDA or ExecutionCPU object will be default-constructed.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
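The constraint on axes can be made concrete with a small checker. This is only one reading of the rule stated above, written as a hypothetical helper (axes_supported is not part of nvmath, and the library's own validation may differ, for instance in how it treats axis ordering, which this sketch ignores).

```python
def axes_supported(ndim: int, axes: tuple[int, ...]) -> bool:
    """Check the stated rule: at most three axes, contiguous, and touching
    the first or the last dimension.

    Hypothetical helper for illustration; not part of nvmath.
    """
    if not 1 <= len(axes) <= 3:
        return False
    if any(not -ndim <= ax < ndim for ax in axes):
        return False  # axis out of range for this operand
    # Normalize negative axes (e.g. -1 -> ndim - 1) and ignore ordering.
    normalized = sorted(ax % ndim for ax in axes)
    if len(set(normalized)) != len(normalized):
        return False  # repeated axis
    contiguous = all(b - a == 1 for a, b in zip(normalized, normalized[1:]))
    touches_edge = normalized[0] == 0 or normalized[-1] == ndim - 1
    return contiguous and touches_edge


ok_first_two = axes_supported(3, (0, 1))      # contiguous, includes first dim
ok_last_two = axes_supported(3, (1, 2))       # contiguous, includes last dim
bad_gap = axes_supported(3, (0, 2))           # not contiguous
bad_middle = axes_supported(4, (1, 2))        # touches neither edge
bad_4d = axes_supported(4, (0, 1, 2, 3))      # more than 3 transformed axes
```

Under this reading, a 2-D transform over the middle two axes of a 4-D operand would need the operand to be transposed or reshaped first.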
Examples
>>> import cupy as cp
>>> import nvmath
Create a 3-D complex128 ndarray on the GPU:
>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
We will define a 2-D C2C FFT operation along the first two dimensions, batched along the last dimension:
>>> axes = 0, 1
Create an FFT object encapsulating the problem specification above:
>>> f = nvmath.fft.FFT(a, axes=axes)
Options can be provided above to control the behavior of the operation using the options argument (see FFTOptions). Similarly, the execution space (CUDA or CPU) and execution options can be passed using the execution argument (see ExecutionCUDA, ExecutionCPU).
Next, plan the FFT. Load and/or store callback functions can be provided to plan() using the prolog and epilog options:
>>> f.plan()
Now execute the FFT, and obtain the result r1 as a CuPy ndarray. The transform will be performed on the GPU, because execution was not explicitly specified and a resides in GPU memory.
>>> r1 = f.execute()
Finally, free the FFT object’s resources. To avoid this explicit call, it’s recommended to use the FFT object as a context manager as shown below, if possible.
>>> f.free()
Note that all FFT methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.
Let’s now look at the same problem with NumPy ndarrays on the CPU.
Create a 3-D complex128 NumPy ndarray on the CPU:
>>> import numpy as np
>>> shape = 128, 128, 128
>>> a = np.random.rand(*shape) + 1j * np.random.rand(*shape)
Create an FFT object encapsulating the problem specification described earlier and use it as a context manager.
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
All the resources used by the object are released at the end of the block.
The operation was performed on the CPU because a resides in host memory. With execution specified to ‘cuda’, the NumPy array would be temporarily copied to device memory and transformed on the GPU:
>>> with nvmath.fft.FFT(a, axes=axes, execution="cuda") as f:
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
Further examples can be found in the nvmath/examples/fft directory.
Notes
The input must be Hermitian-symmetric when FFTOptions.fft_type is 'C2R'; otherwise the result is undefined. As a specific example, if the input for a C2R FFT was generated using an R2C FFT with an odd last axis size, then FFTOptions.last_axis_parity must be set to 'odd' to recover the original signal.
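Why the parity has to be stated explicitly can be illustrated with numpy.fft as a stand-in (this is NumPy's API, not nvmath's). An R2C transform of length n keeps n // 2 + 1 values, so lengths 8 and 9 both yield 5 outputs and the C2R step cannot infer the original length by itself. In NumPy the length is supplied through the n argument of irfft; per the note above, last_axis_parity plays the corresponding role for this class.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(9)                # real signal with an odd length

X = np.fft.rfft(x)               # R2C: keeps 9 // 2 + 1 = 5 complex values

# Without a length hint, irfft assumes an even output length 2 * (5 - 1) = 8,
# so the original 9-sample signal is not recovered.
y_default = np.fft.irfft(X)

# Supplying the odd length recovers the original signal.
y_odd = np.fft.irfft(X, n=9)

recovered = np.allclose(y_odd, x)
```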
Methods
- __init__(operand, *, axes=None, options: FFTOptions | None = None, execution: ExecutionCPU | ExecutionCUDA | None = None, stream=None)
- static create_key(operand, *, axes=None, options=None, execution=None, prolog=None, epilog=None)
Create a key as a compact representation of the FFT problem specification based on the given operand, axes and the FFT options. Note that different combinations of operand layout, axes and options can potentially correspond to the same underlying problem specification (key). Users may reuse the FFT objects when different input problems map to an identical key.
- Parameters:
operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. axes[-1] is the ‘last transformed’ axis for rffts. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as a FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
execution – Specify execution space options for the FFT as a ExecutionCUDA or ExecutionCPU object. Alternatively, a string (‘cuda’ or ‘cpu’) or a dict with the ‘name’ key set to ‘cpu’ or ‘cuda’ and optional parameters relevant to the given execution space can be provided. If not specified, the execution space will be selected to match the operand’s storage (in GPU or host memory), and the corresponding ExecutionCUDA or ExecutionCPU object will be default-constructed.
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.
- Returns:
A tuple as the key to represent the input FFT problem.
Notes
Users may take advantage of this method to create a cached version of fft() based on the stateful object APIs (see caching.py for an example implementation).
This key is meant for runtime use only and is not designed to be serialized or used on a different machine.
It is the user’s responsibility to augment this key with the stream in case they use stream-ordered memory pools.
- execute(direction=None, stream=None, release_workspace=False)
Execute the FFT operation.
- Parameters:
direction – Specify whether forward or inverse FFT is performed (as an FFTDirection object, as a string from [‘forward’, ‘inverse’], or as an int from [-1, 1] denoting forward and inverse directions respectively).
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but it incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.
- Returns:
The transformed operand, which remains on the same device and utilizes the same package as the input operand. The data type and shape of the transformed operand depend on the type of input operand:
For C2C FFT, the data type and shape remain identical to the input.
For R2C and C2R FFT, both data type and shape differ from the input.
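How data type and shape change across the three transform kinds can be previewed with numpy.fft, used here purely as a stand-in. The half-spectrum convention shown (last axis of length n // 2 + 1 for R2C) is NumPy's; this section does not spell out the exact output shapes for this class, so treat the numbers as an illustration of the usual convention.

```python
import numpy as np

real_in = np.zeros((4, 8), dtype=np.float64)
complex_in = np.zeros((4, 8), dtype=np.complex128)

c2c = np.fft.fft(complex_in, axis=-1)    # C2C: complex in, complex out
r2c = np.fft.rfft(real_in, axis=-1)      # R2C: real in, complex out
c2r = np.fft.irfft(r2c, n=8, axis=-1)    # C2R: complex in, real out

shapes = {"C2C": c2c.shape, "R2C": r2c.shape, "C2R": c2r.shape}
dtypes = {"C2C": c2c.dtype, "R2C": r2c.dtype, "C2R": c2r.dtype}
```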
- free()
Free FFT resources.
It is recommended that the FFT object be used within a context, but if that is not possible then this method must be called explicitly to ensure that the FFT resources (especially internal library objects) are properly cleaned up.
- get_input_layout()
Returns a pair of tuples: shape and strides of the FFT input.
Note
In some cases, the FFT operation requires taking a copy of the input tensor (e.g. for a C2R FFT with cuFFT, or when the provided tensor resides on the CPU but the FFT is executed on the GPU). The strides of the copied tensor may differ from those of the input tensor passed by the user if the original tensor’s strides do not conform to a dense C-like layout.
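The effect described in the note can be seen with NumPy alone: a dense C-ordered copy of a strided view has the same shape but different strides. This sketch makes no nvmath call and does not claim which units get_input_layout() uses; NumPy reports strides in bytes, so they are converted to element counts here for readability.

```python
import numpy as np

base = np.zeros((4, 6), dtype=np.complex128)
view = base[:, ::2]                      # non-contiguous view, shape (4, 3)

dense = np.ascontiguousarray(view)       # dense C-ordered copy of the view

# NumPy reports strides in bytes; divide by the item size to count elements.
view_strides = tuple(s // view.itemsize for s in view.strides)
dense_strides = tuple(s // dense.itemsize for s in dense.strides)
```

Here the view steps over every other element of the last axis, while the dense copy has unit stride along it.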
- get_key(*, prolog=None, epilog=None)
Get the key for this object’s data supplemented with the callbacks.
- Parameters:
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.
- Returns:
A tuple as the key to represent the input FFT problem.
- plan(*, prolog=None, epilog=None, stream: AnyStream | None = None, direction=None)
Plan the FFT.
- Parameters:
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
direction – If specified, the same direction must be passed to subsequent execute() calls. It may be used as a hint to optimize C2C planning for CPU FFT calls.
- reset_operand(operand=None, *, stream=None)
Reset the operand held by this FFT instance. This method has two use cases:
It can be used to provide a new operand for execution.
It can be used to release the internal reference to the previous operand and potentially make its memory available for other use by passing operand=None.
- Parameters:
operand – A tensor (ndarray-like object) compatible with the previous one or None (default). A value of None will release the internal reference to the previous operand, and the user is expected to set a new operand before calling execute() again. The new operand is considered compatible if all the following properties match with the previous one:
The problem specification key for the new operand. Generally the keys will match if the operand shares the same layout (shape, strides and data type). The keys may still match for certain operands with a different layout; see create_key() for details.
The package that the new operand belongs to.
The memory space of the new operand (CPU or GPU).
The device that the new operand belongs to if it is on GPU.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
Examples
>>> import cupy as cp
>>> import nvmath
Create a 3-D complex128 ndarray on the GPU:
>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
Create an FFT object as a context manager:
>>> axes = 0, 1
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     # Plan the FFT
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
...
...     # Reset the operand to a new CuPy ndarray.
...     b = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
...     f.reset_operand(b)
...
...     # Execute to get the new result corresponding to the updated operand.
...     r2 = f.execute()
With reset_operand(), minimal overhead is achieved as problem specification and planning are only performed once.
For the particular example above, explicitly calling reset_operand() is equivalent to updating the operand in-place, i.e., replacing f.reset_operand(b) with a[:] = b. Note that updating the operand in-place should be adopted with caution, as it yields the expected result and incurs no additional copies only under the constraints below:
The operation is not a complex-to-real (C2R) FFT.
The operand’s memory matches the FFT execution space. More precisely, the operand memory space should be accessible from the execution space (CPU or CUDA).
For more details, please refer to the in-place update example.
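What "in-place" means here can be shown with NumPy alone, without any nvmath call. The slice assignment a[:] = b overwrites the buffer that an existing reference already points to, whereas rebinding the name leaves that reference pointing at the old array. The variable held is a hypothetical stand-in for the operand reference kept inside the FFT object.

```python
import numpy as np

a = np.zeros(4, dtype=np.complex128)
b = np.arange(4, dtype=np.complex128)

held = a                   # stands in for the reference kept by the FFT object
address_before = a.ctypes.data

a[:] = b                   # in-place update: same buffer, new contents
same_buffer = a.ctypes.data == address_before
held_sees_update = np.array_equal(held, b)

a = b.copy()               # rebinding: `a` now names a different buffer
held_is_a = held is a
```

Only the slice assignment changes what the held reference sees, which is why a plain a = b would not act as a substitute for reset_operand(b).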