nvmath.fft.FFT
- class nvmath.fft.FFT(operand, *, axes=None, options=None, stream=None)
Create a stateful object that encapsulates the specified FFT computations and required resources. This object ensures the validity of resources during use and releases them when they are no longer needed to prevent misuse.
This object encompasses all the functionality of the function-form APIs fft(), ifft(), rfft(), and irfft(), which are convenience wrappers around it. The stateful object also allows preparatory costs to be amortized when the same FFT operation is performed on multiple operands with the same problem specification (see reset_operand() and create_key() for more details).
Using the stateful object typically involves the following steps:
Problem Specification: Initialize the object with a defined operation and options.
Preparation: Use plan() to determine the best algorithmic implementation for this specific FFT operation.
Execution: Perform the FFT computation with execute(), which can be either a forward or an inverse FFT transformation.
Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.
Detailed information on each step described above can be obtained by passing in a logging.Logger object to FFTOptions or by setting the appropriate options in the root logger object, which is used by default:
>>> import logging
>>> logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M:%S')
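As a minimal sketch of the first route, a dedicated logger can be configured with the standard library and then handed to the options. The logger name nvmath_fft_demo is arbitrary, and the "logger" key shown in the trailing comment is an assumption about the FFTOptions parameter name, so check the FFTOptions reference before relying on it.

```python
import io
import logging

# A dedicated logger keeps FFT diagnostics separate from the root logger.
# The name "nvmath_fft_demo" is arbitrary and used only for illustration.
logger = logging.getLogger("nvmath_fft_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # do not duplicate records in the root logger

# Records go to an in-memory buffer here; a StreamHandler on stderr or a
# FileHandler would be configured the same way.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)-8s %(message)s", datefmt="%m-%d %H:%M:%S")
)
logger.addHandler(handler)

logger.info("logger configured")

# Assumed usage (not executed here; requires nvmath and a GPU, and the
# "logger" key is an assumption about the FFTOptions parameter name):
#     f = nvmath.fft.FFT(a, axes=axes, options={"logger": logger})
```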
- Parameters:
operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as an FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
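The constraints on axes can be illustrated with a small check. This is a sketch of the rule as stated above, not the library's own validation: axes_are_supported is a hypothetical helper, it accepts negative axes by normalizing them, and it ignores the order in which the axes are listed, which the text above does not address.

```python
def axes_are_supported(ndim, axes):
    """Check the documented constraints on ``axes`` for an ``ndim``-D operand.

    Hypothetical helper for illustration; it is not part of nvmath.
    """
    # Reject axes that do not exist for this operand.
    if any(not -ndim <= ax < ndim for ax in axes):
        return False

    # Normalize negative axes and sort (listing order is ignored in this sketch).
    normalized = sorted(ax % ndim for ax in axes)

    # Each axis may appear only once.
    if len(set(normalized)) != len(normalized):
        return False

    # Only 1-D, 2-D and 3-D FFTs are supported.
    if not 1 <= len(normalized) <= 3:
        return False

    # The axes must be consecutive and must include the first or last dimension.
    contiguous = normalized[-1] - normalized[0] == len(normalized) - 1
    touches_edge = normalized[0] == 0 or normalized[-1] == ndim - 1
    return contiguous and touches_edge


print(axes_are_supported(3, (0, 1)))    # first two dimensions
print(axes_are_supported(3, (-2, -1)))  # last two dimensions
print(axes_are_supported(3, (0, 2)))    # not contiguous
print(axes_are_supported(4, (1, 2)))    # contiguous, but touches neither end
```

Under this reading of the rule, the first two calls return True and the last two return False.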
Examples
>>> import numpy as np
>>> import nvmath
Create a 3-D complex128 ndarray on the CPU:
>>> shape = 128, 128, 128
>>> a = np.random.rand(*shape) + 1j * np.random.rand(*shape)
We will define a 2-D C2C FFT operation along the first two dimensions, batched along the last dimension:
>>> axes = 0, 1
Create an FFT object encapsulating the problem specification above:
>>> f = nvmath.fft.FFT(a, axes=axes)
Options can be provided above to control the behavior of the operation using the options argument (see FFTOptions).
Next, plan the FFT. Load and/or store callback functions can be provided to plan() using the prolog and epilog options:
>>> f.plan()
Now execute the FFT, and obtain the result r1 as a NumPy ndarray.
>>> r1 = f.execute()
Finally, free the FFT object’s resources. To avoid having to make this call explicitly, it’s recommended to use the FFT object as a context manager as shown below, if possible.
>>> f.free()
Note that all FFT methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.
Let’s now look at the same problem with CuPy ndarrays on the GPU.
Create a 3-D complex128 CuPy ndarray on the GPU:
>>> import cupy as cp
>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
Create an FFT object encapsulating the problem specification described earlier and use it as a context manager.
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
All the resources used by the object are released at the end of the block.
Further examples can be found in the nvmath/examples/fft directory.
Notes
The input must be Hermitian-symmetric when FFTOptions.fft_type is 'C2R', otherwise the result is undefined. As a specific example, if the input for a C2R FFT was generated using an R2C FFT with an odd last axis size, then FFTOptions.last_axis_size must be set to odd to recover the original signal.
Methods
- static create_key(operand, *, axes=None, options=None, prolog=None, epilog=None)
Create a key as a compact representation of the FFT problem specification based on the given operand, axes and the FFT options. Note that different combinations of operand layout, axes and options can potentially correspond to the same underlying problem specification (key). Users may reuse the FFT objects when different input problems map to an identical key.
- Parameters:
operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as an FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.
- Returns:
A tuple as the key to represent the input FFT problem.
Notes
Users may take advantage of this method to create a cached version of fft() based on the stateful object APIs (see caching.py for an example implementation).
This key is meant for runtime use only and is not designed to be serialized or used on a different machine.
It is the user’s responsibility to augment this key with the stream in case they use stream-ordered memory pools.
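The caching pattern described in these notes can be sketched without a GPU. PlanCache, make_stand_in, and the literal key tuple below are hypothetical stand-ins, and this is not the caching.py implementation. In real use the key would come from create_key(), the factory would construct and plan an FFT object, a cache hit would be followed by reset_operand() before execute(), and the release callback would call free().

```python
class PlanCache:
    """Minimal cache of prepared objects keyed by problem specification.

    Hypothetical helper for illustration; it is not part of nvmath.
    """

    def __init__(self):
        self._entries = {}

    def get_or_create(self, key, stream, factory):
        # Augment the problem key with the stream, as the note above advises
        # when stream-ordered memory pools are in use. The stream must be
        # hashable (for example, its integer handle).
        full_key = (key, stream)
        if full_key not in self._entries:
            self._entries[full_key] = factory()
        return self._entries[full_key]

    def clear(self, release=lambda entry: None):
        # Give every cached object a chance to release its resources.
        for entry in self._entries.values():
            release(entry)
        self._entries.clear()


created = []


def make_stand_in():
    # Stand-in for "construct an FFT object and call plan()".
    obj = object()
    created.append(obj)
    return obj


cache = PlanCache()
key = ("complex128", (128, 128, 128), (0, 1))  # stand-in for a create_key() result

first = cache.get_or_create(key, 0, make_stand_in)
second = cache.get_or_create(key, 0, make_stand_in)  # same key and stream: reused
third = cache.get_or_create(key, 1, make_stand_in)   # different stream: new entry

print(first is second, third is first, len(created))
```

Keeping the stream in the cache key means an object prepared on one stream is never handed to work running on another.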
- execute(direction=None, stream=None, release_workspace=False)
Execute the FFT operation.
- Parameters:
direction – Specify whether forward or inverse FFT is performed (FFTDirection object, or as a string from ['forward', 'inverse'], or as an int from [-1, 1] denoting forward and inverse directions respectively).
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but it incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.
- Returns:
The transformed operand, which remains on the same device and utilizes the same package as the input operand. The data type and shape of the transformed operand depend on the type of input operand:
For C2C FFT, the data type and shape remain identical to the input.
For R2C and C2R FFT, both data type and shape differ from the input.
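How the shape and data type change can be sketched with plain arithmetic. The helpers below are hypothetical and encode the usual half-spectrum convention for real transforms, in which the last transformed axis of length n shrinks to n // 2 + 1. That nvmath applies this convention to the last entry of axes is an assumption here, as are the data type pairs in R2C_DTYPES.

```python
# Assumed real-to-complex data type pairs (C2R maps them the other way).
R2C_DTYPES = {"float32": "complex64", "float64": "complex128"}


def r2c_result_shape(shape, axes):
    """Result shape of an R2C FFT under the half-spectrum convention.

    Hypothetical helper for illustration; it is not part of nvmath.
    """
    result = list(shape)
    last = axes[-1]
    result[last] = shape[last] // 2 + 1
    return tuple(result)


def c2r_result_shape(shape, axes, last_axis_odd=False):
    """Result shape of a C2R FFT; the parity of the last axis must be given."""
    result = list(shape)
    last = axes[-1]
    result[last] = 2 * (shape[last] - 1) + (1 if last_axis_odd else 0)
    return tuple(result)


real_shape = (128, 128, 128)
axes = (0, 1)

half = r2c_result_shape(real_shape, axes)
print(half)                          # (128, 65, 128)
print(c2r_result_shape(half, axes))  # (128, 128, 128)
print(R2C_DTYPES["float64"])         # complex128
```

For C2C transforms neither helper is needed, since shape and data type are unchanged.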
- free()
Free FFT resources.
It is recommended that the FFT object be used within a context manager. If that is not possible, this method must be called explicitly to ensure that the FFT resources (especially internal library objects) are properly cleaned up.
- get_key(*, prolog=None, epilog=None)
Get the key for this object’s data supplemented with the callbacks.
- Parameters:
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.
- Returns:
A tuple as the key to represent the input FFT problem.
- plan(*, prolog=None, epilog=None, stream=None)
Plan the FFT.
- Parameters:
prolog – Provide a device-callable function in LTO-IR format to use as the load callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.
epilog – Provide a device-callable function in LTO-IR format to use as the store callback, as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
- reset_operand(operand=None, *, stream=None)
Reset the operand held by this FFT instance. This method has two use cases: (1) it can be used to provide a new operand for execution when the operand is on the CPU, and (2) it can be used to release the internal reference to the previous operand and potentially make its memory available for other use by passing operand=None.
- Parameters:
operand – A tensor (ndarray-like object) compatible with the previous one or None (default). A value of None will release the internal reference to the previous operand, and the user is expected to set a new operand before calling execute() again. The new operand is considered compatible if all of the following properties match those of the previous one:
The problem specification key for the new operand. Generally the keys will match if the operand shares the same layout (shape, strides and data type). The keys may still match for certain operands with a different layout; see create_key() for details.
The package that the new operand belongs to.
The device that the new operand belongs to if it is on the GPU.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
Examples
>>> import cupy as cp
>>> import nvmath
Create a 3-D complex128 ndarray on the GPU:
>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
Create an FFT object as a context manager:
>>> axes = 0, 1
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     # Plan the FFT
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
...
...     # Reset the operand to a new CuPy ndarray.
...     b = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
...     f.reset_operand(b)
...
...     # Execute to get the new result corresponding to the updated operand.
...     r2 = f.execute()
With reset_operand(), minimal overhead is achieved as problem specification and planning are only performed once.
For the particular example above, explicitly calling reset_operand() is equivalent to updating the operand in-place, i.e., replacing f.reset_operand(b) with a[:]=b. Note that updating the operand in-place should be adopted with caution, as it yields the expected result only under the additional constraints below:
The operation is not a complex-to-real (C2R) FFT.
The operand is on the GPU (more precisely, the operand memory space should be accessible from the execution space).
For more details, please refer to inplace update example.