rfft

nvmath.distributed.fft.rfft(operand, distribution: Slab | Sequence[Box], sync_symmetric_memory: bool = True, options: FFTOptions | None = None, stream: AnyStream | None = None)
Perform an N-D real-to-complex (R2C) distributed FFT on the provided real operand.
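A minimal sketch of a typical call with CuPy operands follows. The initialization and allocation calls shown here (initialize(device_id, communicator), allocate_symmetric_memory(shape, package, dtype=...)) and the Slab.X member are assumptions based on the surrounding nvmath.distributed API, not confirmed by this page:

```python
from mpi4py import MPI
import cupy as cp
import nvmath.distributed
from nvmath.distributed.fft import rfft, Slab

# Bootstrap the distributed runtime (assumed signature: device id + MPI communicator).
comm = MPI.COMM_WORLD
rank, nranks = comm.Get_rank(), comm.Get_size()
nvmath.distributed.initialize(rank % cp.cuda.runtime.getDeviceCount(), comm)

# Each process allocates only its X-slab of a 256**3 global real operand,
# on the symmetric heap as required for GPU operands (assumes nranks divides 256).
local_shape = (256 // nranks, 256, 256)
a = nvmath.distributed.allocate_symmetric_memory(local_shape, cp, dtype=cp.float32)
a[:] = cp.random.rand(*local_shape, dtype=cp.float32)

# R2C FFT of the X-slab-distributed operand; the result is complex.
b = rfft(a, distribution=Slab.X)
```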
Parameters:

- operand – A tensor (ndarray-like object). The currently supported types are numpy.ndarray, cupy.ndarray, and torch.Tensor.

  Important: GPU operands must be on the symmetric heap (for example, allocated with nvmath.distributed.allocate_symmetric_memory(), as in the sketch above).

- distribution – Specifies the distribution of the input and output operands across processes, which can be: (i) a Slab distribution (see Slab), or (ii) a custom box distribution (see the sketch after this parameter list). With a Slab distribution, this indicates the distribution of the input operand (the output operand will use the complementary Slab distribution). With a box distribution, this indicates the input and output boxes.

- sync_symmetric_memory – Indicates whether to issue a symmetric memory synchronization operation on the execute stream before the FFT. Before the FFT starts executing, the input operand must be ready on all processes. A symmetric memory synchronization ensures completion and visibility, across all processes, of previously issued local stores to symmetric memory. Advanced users who manage the synchronization themselves using the appropriate NVSHMEM API, or who know that the GPUs are already synchronized on the source operand, can set this to False.
- options – Specify options for the FFT as an FFTOptions object. Alternatively, a dict containing the parameters for the FFTOptions constructor can also be provided. If not specified, the value will be set to the default-constructed FFTOptions object.

- stream – Provide the CUDA stream to use for executing the operation. Acceptable inputs include cudaStream_t (as Python int), cupy.cuda.Stream, and torch.cuda.Stream. If a stream is not provided, the current stream from the operand package will be used.
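Since the box format is only summarized above, here is a hedged 2-D sketch of a custom box distribution, continuing from the sketch above (rank, nranks). It assumes a box is a pair of (lower, upper) corner coordinates, that each process passes [input_box, output_box], and that the output boxes tile the R2C-reduced global shape; a2d is a hypothetical real operand on the symmetric heap:

```python
# Global real operand of shape (128, 128): X-slabs on input,
# Y-slabs of the R2C-reduced (128, 65) complex result on output.
X, Y = 128, 128
Yc = Y // 2 + 1  # 65: global last-axis extent of the complex result

lo, hi = rank * X // nranks, (rank + 1) * X // nranks
input_box = [(lo, 0), (hi, Y)]          # this process's rows of the input
clo, chi = rank * Yc // nranks, (rank + 1) * Yc // nranks
output_box = [(0, clo), (X, chi)]       # this process's columns of the output

b = rfft(
    a2d,  # hypothetical real operand of local shape (hi - lo, Y) on the symmetric heap
    distribution=[input_box, output_box],
    sync_symmetric_memory=True,  # pass False only if you synchronize via NVSHMEM yourself
)
```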
Returns:

A complex tensor whose shape will depend on the choice of distribution and reshape option. The result remains on the same device and belongs to the same package as the input operand. The global extent of the last transformed axis in the result will be global_extent[-1] // 2 + 1.
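For example, a global real operand of shape (64, 64, 64) produces a complex result with global shape (64, 64, 33), since 64 // 2 + 1 == 33; the per-process local shape then follows from the output distribution and the reshape option.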