nvmath.fft.FFT

class nvmath.fft.FFT(operand, *, axes=None, options=None, stream=None)[source]

Create a stateful object that encapsulates the specified FFT computations and required resources. This object ensures the validity of resources during use and releases them when they are no longer needed to prevent misuse.

This object encompasses all functionalities of function-form APIs fft(), ifft(), rfft(), and irfft(), which are convenience wrappers around it. The stateful object also allows for the amortization of preparatory costs when the same FFT operation is to be performed on multiple operands with the same problem specification (see reset_operand() and create_key() for more details).

Using the stateful object typically involves the following steps:

  1. Problem Specification: Initialize the object with a defined operation and options.

  2. Preparation: Use plan() to determine the best algorithmic implementation for this specific FFT operation.

  3. Execution: Perform the FFT computation with execute(), which can be either a forward or an inverse transformation.

  4. Resource Management: Ensure all resources are released either by explicitly calling free() or by managing the stateful object within a context manager.

Detailed information on each step described above can be obtained by passing in a logging.Logger object to FFTOptions or by setting the appropriate options in the root logger object, which is used by default:

>>> import logging
>>> logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M:%S')
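As an alternative to configuring the root logger, a dedicated logger can be prepared and handed to the options. The following is a minimal sketch using only the standard library; the logger name "fft_demo" is hypothetical, and the `logger` keyword shown in the trailing comment is an assumption about how FFTOptions receives the object.

```python
import logging

# Dedicated logger so FFT messages can be filtered separately from other output.
logger = logging.getLogger("fft_demo")  # hypothetical name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)-8s %(message)s", datefmt="%m-%d %H:%M:%S")
)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines through the root logger

# Assumed usage (not executed here): supply the logger through the options,
# for example as a dict of FFTOptions constructor parameters:
# options = {"logger": logger}
```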
Parameters:
  • operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • axes – The dimensions along which the FFT is performed. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.

  • options – Specify options for the FFT as a FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
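The constraint on axes stated above can be checked before constructing the object. The helper below is a hypothetical sketch, not part of nvmath: it assumes that the order of the axes is irrelevant, that negative indices count from the end, and it does not handle the axes=None default.

```python
def axes_supported(axes, ndim):
    """Return True if `axes` meets the stated constraints (hypothetical helper)."""
    # Reject indices outside the valid range for an operand with `ndim` dimensions.
    if any(not -ndim <= ax < ndim for ax in axes):
        return False
    # Normalize negative indices and sort; ignoring order is an assumption.
    normalized = sorted(ax % ndim for ax in axes)
    # Only 1-D, 2-D, and 3-D FFTs are supported, with no repeated axes.
    if not 1 <= len(normalized) <= 3 or len(set(normalized)) != len(normalized):
        return False
    contiguous = all(b - a == 1 for a, b in zip(normalized, normalized[1:]))
    touches_edge = normalized[0] == 0 or normalized[-1] == ndim - 1
    return contiguous and touches_edge


print(axes_supported((0, 1), ndim=3))  # first two dimensions of a 3-D operand
print(axes_supported((1,), ndim=3))    # middle dimension only
print(axes_supported((0, 2), ndim=3))  # gap between the axes
```

Under these assumptions the first call is accepted, while the second fails the first-or-last requirement and the third fails the contiguity requirement.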

Examples

>>> import numpy as np
>>> import nvmath

Create a 3-D complex128 ndarray on the CPU:

>>> shape = 128, 128, 128
>>> a = np.random.rand(*shape) + 1j * np.random.rand(*shape)

We will define a 2-D C2C FFT operation along the first two dimensions, batched along the last dimension:

>>> axes = 0, 1

Create an FFT object encapsulating the problem specification above:

>>> f = nvmath.fft.FFT(a, axes=axes)

Options can be provided above to control the behavior of the operation using the options argument (see FFTOptions).

Next, plan the FFT. Load and/or store callback functions can be provided to plan() using the prolog and epilog options:

>>> f.plan()

Now execute the FFT, and obtain the result r1 as a NumPy ndarray.

>>> r1 = f.execute()

Finally, free the FFT object’s resources. To avoid having to make this call explicitly, it is recommended to use the FFT object as a context manager as shown below, if possible.

>>> f.free()

Note that all FFT methods execute on the current stream by default. Alternatively, the stream argument can be used to run a method on a specified stream.

Let’s now look at the same problem with CuPy ndarrays on the GPU.

Create a 3-D complex128 CuPy ndarray on the GPU:

>>> import cupy as cp
>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)

Create an FFT object encapsulating the problem specification described earlier and use it as a context manager.

>>> with nvmath.fft.FFT(a, axes=axes) as f:
...    f.plan()
...
...    # Execute the FFT to get the first result.
...    r1 = f.execute()

All the resources used by the object are released at the end of the block.

Further examples can be found in the nvmath/examples/fft directory.

Notes

  • The input must be Hermitian-symmetric when FFTOptions.fft_type is 'C2R', otherwise the result is undefined. As a specific example, if the input for a C2R FFT was generated using an R2C FFT with an odd last axis size, then FFTOptions.last_axis_size must be set to odd to recover the original signal.
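The reason the last axis size must be stated can be illustrated on the CPU with NumPy, used here only as a stand-in for the R2C and C2R transforms. In this sketch the `n` argument of numpy.fft.irfft plays a role analogous to FFTOptions.last_axis_size; it is an illustration of the ambiguity, not of nvmath's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(7)  # real signal with an odd length

X = np.fft.rfft(x)  # real-to-complex: keeps 7 // 2 + 1 = 4 complex values

# A half-spectrum of 4 values could come from 6 or 7 real samples,
# so the inverse assumes an even length unless told otherwise.
x_even = np.fft.irfft(X)       # assumes n = 2 * (4 - 1) = 6
x_odd = np.fft.irfft(X, n=7)   # explicit odd length

print(x_even.shape, x_odd.shape)
print(np.allclose(x_odd, x))
```

Only the call with the explicit odd length returns an array of the original size that matches the input signal.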

Methods

__init__(operand, *, axes=None, options=None, stream=None)[source]
static create_key(operand, *, axes=None, options=None, prolog=None, epilog=None)[source]

Create a key as a compact representation of the FFT problem specification based on the given operand, axes and the FFT options. Note that different combinations of operand layout, axes and options can potentially correspond to the same underlying problem specification (key). Users may reuse the FFT objects when different input problems map to an identical key.

Parameters:
  • operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  • axes – The dimensions along which the FFT is performed. Currently, it is required that the axes are contiguous and include the first or the last dimension. Only up to 3D FFTs are supported.

  • options – Specify options for the FFT as a FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.

  • prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.

  • epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.

Returns:

A tuple as the key to represent the input FFT problem.

Notes

  • Users may take advantage of this method to create a cached version of fft() based on the stateful object APIs (see caching.py for an example implementation).

  • This key is meant for runtime use only and is not designed to be serialized or used on a different machine.

  • It is the user’s responsibility to augment this key with the stream if they use stream-ordered memory pools.
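The caching idea in the notes above can be sketched independently of the GPU. Everything named below (cached_execute, make_key, make_plan, StandInPlan) is hypothetical; with nvmath, make_key would wrap FFT.create_key() and make_plan would construct an FFT object and call plan() on it. The stream is appended to the key, as the last note suggests.

```python
def cached_execute(operand, cache, make_key, make_plan, stream=None):
    """Run a planned transform, reusing a cached plan when the keys match."""
    # Augment the problem key with the stream so plans are not shared across streams.
    key = (make_key(operand), stream)
    plan = cache.get(key)
    if plan is None:
        plan = cache[key] = make_plan(operand)  # pay the planning cost once
    else:
        plan.reset_operand(operand)             # reuse: only swap the operand
    return plan.execute()


class StandInPlan:
    """Toy replacement for a planned FFT object so the sketch runs anywhere."""

    created = 0

    def __init__(self, operand):
        StandInPlan.created += 1
        self.operand = operand

    def reset_operand(self, operand):
        self.operand = operand

    def execute(self):
        return [2 * v for v in self.operand]  # placeholder computation


cache = {}
key_fn = len  # toy key: operands of equal length share a plan
r1 = cached_execute([1, 2, 3], cache, key_fn, StandInPlan)
r2 = cached_execute([4, 5, 6], cache, key_fn, StandInPlan)
print(r1, r2, StandInPlan.created)
```

Both calls map to the same key, so only one plan is created. Objects kept in such a cache continue to hold their resources until free() is called on them.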

execute(direction=None, stream=None, release_workspace=False)[source]

Execute the FFT operation.

Parameters:
  • direction – Specify whether forward or inverse FFT is performed (FFTDirection object, or as a string from [‘forward’, ‘inverse’], or as an int from [-1, 1] denoting forward and inverse directions respectively).

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

  • release_workspace – A value of True specifies that the stateful object should release workspace memory back to the package memory pool on function return, while a value of False specifies that the object should retain the memory. This option may be set to True if the application performs other operations that consume a lot of memory between successive calls to the (same or different) execute() API, but incurs a small overhead due to obtaining and releasing workspace memory from and to the package memory pool on every call. The default is False.

Returns:

The transformed operand, which remains on the same device and utilizes the same package as the input operand. The data type and shape of the transformed operand depend on the type of input operand:

  • For C2C FFT, the data type and shape remain identical to the input.

  • For R2C and C2R FFT, both data type and shape differ from the input.
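As a rough CPU illustration of the shape and data type behavior listed above, NumPy's transforms can be used as a stand-in. This is a sketch under the assumption that the same general pattern applies; the exact shapes and precisions returned by nvmath are not shown here.

```python
import numpy as np

c = np.ones((4, 8), dtype=np.complex128)
r = np.ones((4, 8), dtype=np.float64)

c2c = np.fft.fft(c, axis=-1)          # complex in, complex out
r2c = np.fft.rfft(r, axis=-1)         # real in, complex half-spectrum out
c2r = np.fft.irfft(r2c, n=8, axis=-1) # complex half-spectrum in, real out

print(c2c.shape, c2c.dtype)
print(r2c.shape, r2c.dtype)
print(c2r.shape, c2r.dtype)
```

The complex-to-complex result keeps the input's shape and data type, whereas the real-to-complex result has a shortened last axis (8 // 2 + 1 = 5) and a complex data type, and the complex-to-real result restores the real data type and the full last axis.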

free()[source]

Free FFT resources.

It is recommended that the FFT object be used within a context manager, but if that is not possible then this method must be called explicitly to ensure that the FFT resources (especially internal library objects) are properly cleaned up.

get_key(*, prolog=None, epilog=None)[source]

Get the key for this object’s data supplemented with the callbacks.

Parameters:
  • prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.

  • epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.

Returns:

A tuple as the key to represent the input FFT problem.

See also

create_key()

plan(*, prolog=None, epilog=None, stream=None)[source]

Plan the FFT.

Parameters:
  • prolog – Provide device-callable function in LTO-IR format to use as load-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no prolog.

  • epilog – Provide device-callable function in LTO-IR format to use as store-callback as an object of type DeviceCallable. Alternatively, a dict containing the parameters for the DeviceCallable constructor can also be provided. The default is no epilog.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

reset_operand(operand=None, *, stream=None)[source]

Reset the operand held by this FFT instance. This method has two use cases: (1) it can be used to provide a new operand for execution when the operand is on the CPU, and (2) it can be used to release the internal reference to the previous operand and potentially make its memory available for other use by passing operand=None.

Parameters:
  • operand

    A tensor (ndarray-like object) compatible with the previous one or None (default). A value of None will release the internal reference to the previous operand, and the user is expected to set a new operand before calling execute() again. The new operand is considered compatible if all of the following properties match those of the previous one:

    • The problem specification key for the new operand. Generally the keys will match if the operand shares the same layout (shape, strides and data type). The keys may still match for certain operands with different layout, see create_key() for details.

    • The package that the new operand belongs to.

    • The device that the new operand belongs to, if it is on the GPU.

  • stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.

Examples

>>> import cupy as cp
>>> import nvmath

Create a 3-D complex128 ndarray on the GPU:

>>> shape = 128, 128, 128
>>> a = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)

Create an FFT object as a context manager:

>>> axes = 0, 1
>>> with nvmath.fft.FFT(a, axes=axes) as f:
...    # Plan the FFT
...    f.plan()
...
...    # Execute the FFT to get the first result.
...    r1 = f.execute()
...
...    # Reset the operand to a new CuPy ndarray.
...    b = cp.random.rand(*shape) + 1j * cp.random.rand(*shape)
...    f.reset_operand(b)
...
...    # Execute to get the new result corresponding to the updated operand.
...    r2 = f.execute()

With reset_operand(), minimal overhead is achieved as problem specification and planning are only performed once.

For the particular example above, explicitly calling reset_operand() is equivalent to updating the operand in-place, i.e., replacing f.reset_operand(b) with a[:]=b. Note that updating the operand in-place should be adopted with caution, as it yields the expected result only under the additional constraints below:

  • The operation is not a complex-to-real (C2R) FFT.

  • The operand is on the GPU (more precisely, the operand memory space should be accessible from the execution space).

For more details, please refer to the in-place update example.