FFT

Create a stateful object that encapsulates the specified FFT computations and required resources. This object ensures the validity of resources during use and releases them when they are no longer needed to prevent misuse.

This object encompasses all the functionality of the function-form APIs fft(), ifft(), rfft(), and irfft(), which are convenience wrappers around it. The stateful object also allows preparatory costs to be amortized when the same FFT operation is performed on multiple operands with the same problem specification (see reset_operand(), reset_operand_unchecked(), and create_key() for more details).

Using the stateful object typically involves the following steps:

Problem Specification: Initialize the object with a defined operation and options.
Preparation: Use plan() to determine the best algorithmic implementation for this specific FFT operation.
Execution: Perform the FFT computation with execute(), which can be either forward or inverse FFT transformation.
Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.

Detailed information on each step described above can be obtained by passing in a logging.Logger object to FFTOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s %(levelname)-8s %(message)s",
...     datefmt="%m-%d %H:%M:%S",
... )

Parameters:

operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. axes[-1] is the ‘last transformed’ axis for rffts. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as an FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
execution – Specify execution space options for the FFT as an ExecutionCUDA or ExecutionCPU object. Alternatively, a string (‘cuda’ or ‘cpu’) or a dict with the ‘name’ key set to ‘cpu’ or ‘cuda’ and optional parameters relevant to the given execution space can be provided. If not specified, the execution space will be selected to match the operand’s storage (in GPU or host memory), and the corresponding ExecutionCUDA or ExecutionCPU object will be default-constructed.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Examples

>>> import cupy as cp
>>> import nvmath

Create a 3-D complex128 ndarray on the GPU:

>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)

We will define a 2-D C2C FFT operation along the first two dimensions, batched along the last dimension:

>>> axes = 0, 1

Create an FFT object encapsulating the problem specification above:

>>> f = nvmath.fft.FFT(a, axes=axes)

Options can be provided above to control the behavior of the operation using the options argument (see FFTOptions). Similarly, the execution space (CUDA or CPU) and execution options can be passed using the execution argument (see ExecutionCUDA, ExecutionCPU).

Next, plan the FFT. Load and/or store callback functions can be provided to plan() using the prolog and epilog options:

>>> f.plan()

Now execute the FFT and obtain the result r1 as a CuPy ndarray. The transform is performed on the GPU because execution was not explicitly specified and a resides in GPU memory.

>>> r1 = f.execute()

Finally, free the FFT object’s resources. To avoid this explicit call, it’s recommended to use the FFT object as a context manager as shown below, if possible.

>>> f.free()

Note that all FFT methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.

Let’s now look at the same problem with NumPy ndarrays on the CPU.

Create a 3-D complex128 NumPy ndarray on the CPU:

>>> import numpy as np
>>> shape = 128, 128, 128
>>> a = np.random.rand(*shape) + 1j * np.random.rand(*shape)

Create an FFT object encapsulating the problem specification described earlier and use it as a context manager.

>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()

All the resources used by the object are released at the end of the block.

The operation was performed on the CPU because a resides in host memory. With execution specified to ‘cuda’, the NumPy array would be temporarily copied to device memory and transformed on the GPU:

>>> with nvmath.fft.FFT(a, axes=axes, execution="cuda") as f:
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()

Further examples can be found in the nvmath/examples/fft directory.

Notes

The input must be Hermitian-symmetric when FFTOptions.fft_type is 'C2R'; otherwise, the result is undefined. As a specific example, if the input for a C2R FFT was generated using an R2C FFT with an odd last axis size, then FFTOptions.last_axis_parity must be set to 'odd' to recover the original signal.

Methods

__init__( operand, *, axes: Sequence[int] | None = None, options: FFTOptions | None = None, execution: ExecutionCPU | ExecutionCUDA | None = None, stream: AnyStream | None = None, )

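static create_key( operand, *, axes: Sequence[int] | None = None, options: FFTOptions | None = None, execution: ExecutionCPU | ExecutionCUDA | None = None, prolog: DeviceCallable | None = None, epilog: DeviceCallable | None = None, )

Create a key as a compact representation of the FFT problem specification based on the given operand, axes and the FFT options. Note that different combinations of operand layout, axes and options can potentially correspond to the same underlying problem specification (key). Users may reuse the FFT objects when different input problems map to an identical key.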

Parameters:

operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.
axes – The dimensions along which the FFT is performed. axes[-1] is the ‘last transformed’ axis for rffts. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.
options – Specify options for the FFT as an FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.
execution – Specify execution space options for the FFT as an ExecutionCUDA or ExecutionCPU object. Alternatively, a string (‘cuda’ or ‘cpu’) or a dict with the ‘name’ key set to ‘cpu’ or ‘cuda’ and optional parameters relevant to the given execution space can be provided. If not specified, the execution space will be selected to match the operand’s storage (in GPU or host memory), and the corresponding ExecutionCUDA or ExecutionCPU object will be default-constructed.
prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.

Returns:

A tuple as the key to represent the input FFT problem.

Notes

Users may take advantage of this method to create a cached version of fft() based on the stateful object APIs (see caching.py for an example implementation).
This key is meant for runtime use only and not designed to be serialized or used on a different machine.
It is the user’s responsibility to augment this key with the stream in case they use stream-ordered memory pools.

execute( direction: FFTDirection | None = None, stream: AnyStream | None = None, release_workspace: bool = False, )

Execute the FFT operation.

Parameters:

direction – Specify whether a forward or inverse FFT is performed, as an FFTDirection object, a string from [‘forward’, ‘inverse’], or an int from [-1, 1] denoting the forward and inverse directions, respectively.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.

Returns:

The transformed operand, which remains on the same device and belongs to the same package as the input operand. The data type and shape of the transformed operand depend on the FFT type:

For C2C FFT, the data type and shape remain identical to the input.
For R2C and C2R FFT, both data type and shape differ from the input.

free()

Free FFT resources.

It is recommended that the FFT object be used within a context, but if it is not possible then this method must be called explicitly to ensure that the FFT resources (especially internal library objects) are properly cleaned up.

get_input_layout()

Returns a pair of tuples: the shape and strides of the FFT input.

Note

In some cases, the FFT operation requires a copy of the input tensor (e.g., a C2R FFT with cuFFT, or a tensor that resides on the CPU while the FFT is executed on the GPU). The strides of the copy may differ from those of the tensor passed by the user if the original tensor’s strides do not conform to a dense C-like layout.

get_key( *, prolog: DeviceCallable | None = None, epilog: DeviceCallable | None = None, )

Get the key for this object’s data supplemented with the callbacks.

Parameters:

prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.

Returns:

A tuple as the key to represent the input FFT problem.

See also

create_key()

get_output_layout()

Returns a pair of tuples: the shape and strides of the FFT output.

plan( *, prolog: DeviceCallable | None = None, epilog: DeviceCallable | None = None, stream: AnyStream | None = None, direction: FFTDirection | None = None, )

Plan the FFT.

Parameters:

prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog. Currently, callbacks are supported only with CUDA execution.
epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog. Currently, callbacks are supported only with CUDA execution.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
direction – If specified, the same direction must be passed to subsequent execute() calls. It may be used as a hint to optimize C2C planning for CPU FFT calls.

reset_operand( operand=None, *, stream: AnyStream | None = None, )

Reset the operand held by this FFT instance. This method has two use cases:

it can be used to provide a new operand for execution
it can be used to release the internal reference to the previous operand and potentially make its memory available for other use by passing operand=None.

Parameters:

operand –
A tensor (ndarray-like object) compatible with the previous one or None (default). A value of None releases the internal reference to the previous operand, and the user is expected to set a new operand before calling execute() again. The new operand is considered compatible if all the following properties match those of the previous one:
- The problem specification key for the new operand. Generally, the keys match if the operand has the same layout (shape, strides, and data type). The keys may still match for certain operands with a different layout; see create_key() for details.
- The package that the new operand belongs to.
- The memory space of the new operand (CPU or GPU).
- The device that the new operand belongs to, if it is on the GPU.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Semantics:

When used for the use case of providing a new valid operand, the following scenarios apply:

If execution space == memory space and the FFT is not a C2R transform: operand reference update with no data copying.
If execution space == memory space and the FFT is a C2R transform: one data copy to an auxiliary tensor, required to prevent cuFFT from overwriting the user’s input.
If execution space != memory space: data must be copied between different memory spaces.

Examples

>>> import cupy as cp
>>> import nvmath

Create a 3-D complex128 ndarray on the GPU:

>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)

Create an FFT object as a context manager:

>>> axes = 0, 1
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...     # Plan the FFT
...     f.plan()
...
...     # Execute the FFT to get the first result.
...     r1 = f.execute()
...
...     # Reset the operand to a new CuPy ndarray.
...     b = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
...     f.reset_operand(b)
...
...     # Execute to get the new result corresponding to the updated operand.
...     r2 = f.execute()

With reset_operand(), minimal overhead is achieved, as problem specification and planning are performed only once. However, it still performs validation to ensure that the operand is compatible with the original and, if enabled, logging. See reset_operand_unchecked() for an alternative when the caller has already validated the operand or chooses to skip validation and logging.

For the particular example above, explicitly calling reset_operand() is equivalent to updating the operand in-place, i.e., replacing f.reset_operand(b) with a[:] = b. Note that updating the operand in-place should be adopted with caution, as it yields the expected result and incurs no additional copies only under the additional constraints below:

The operation is not a complex-to-real (C2R) FFT.

The operand’s memory matches the FFT execution space. More precisely, the operand memory space should be accessible from the execution space (CPU or CUDA).

For more details, please refer to the inplace update example.

reset_operand_unchecked( operand, *, stream: AnyStream | None = None, )

This method is experimental and potentially subject to future changes.

This method is a performance-optimized alternative to reset_operand() that eliminates validation and logging overhead, making it ideal for performance-critical loops where operand compatibility is guaranteed by the caller.

Parameters:

operand – A tensor (ndarray-like object) that is guaranteed by the user to be compatible with the original operand used during planning. See the operand parameter in reset_operand() for the definition of compatibility.
stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Returns:

None

Semantics:

The semantics are the same as in reset_operand(), except for the following differences:

This method does not perform any validation (e.g., package match, data type match, key match, etc.) or logging.
This method does not support releasing the operand by passing None as an argument. To release the operand, use reset_operand() instead.

When to Use:

Performance-critical loops with repeated FFT executions on different operands
After verifying correctness with reset_operand() during development
When operand compatibility is guaranteed by construction or invariant

Examples

Example 1: Optimizing a processing loop

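import cupy as cp
import nvmath

shape = (1024, 1024)

def process_data(i):
    # Hypothetical stand-in for user code that produces an operand with the
    # same shape, dtype, and device as the original operand.
    # cupy.random.rand() only generates real floats, so build complex64 explicitly.
    real = cp.random.rand(*shape, dtype=cp.float32)
    imag = cp.random.rand(*shape, dtype=cp.float32)
    return (real + 1j * imag).astype(cp.complex64)

operand = process_data(0)

fft = nvmath.fft.FFT(operand, execution="cuda")
with fft:
    fft.plan()
    for i in range(10000):
        new_operand = process_data(i)
        fft.reset_operand_unchecked(new_operand)
        result = fft.execute()
        # Block until the result is ready.
        cp.cuda.get_current_stream().synchronize()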

Example 2: Streaming data processing

Processing a stream of incoming data operands with identical layout:

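import cupy as cp
import nvmath

# Create a stateful FFT object and prepare it once.
shape = (512, 512)
initial_operand = cp.empty(shape, dtype=cp.complex64)

def incoming_data_stream(n=100):
    # Hypothetical data source yielding operands with the planned layout.
    for _ in range(n):
        yield cp.random.rand(*shape, dtype=cp.float32).astype(cp.complex64)

def process_spectrum(spectrum):
    # Hypothetical consumer of each transformed result.
    return float(cp.abs(spectrum).max())

fft = nvmath.fft.FFT(initial_operand, execution="cuda")
with fft:
    fft.plan()

    # Process stream of incoming operands
    for operand in incoming_data_stream():
        # The user guarantees that the operand is compatible
        # with the original (same shape, dtype, device, ...).
        fft.reset_operand_unchecked(operand)
        result = fft.execute()
        # Block until the result is ready.
        cp.cuda.get_current_stream().synchronize()
        process_spectrum(result)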