Release Notes#

This section lists significant changes, new features, performance improvements, and known issues for each release. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, a workaround is provided if one is available.

NVPL FFT 0.4.2 EA (nvpl-25.5-beta)#

New features#

  • Improved single- and multi-threaded performance of complex-to-complex, complex-to-real and real-to-complex transforms for sizes from 2 to 1024.

Table: Geomean speedup of NVPL FFT 0.4.2 vs NVPL FFT 0.4.1 for sizes \(2^i\times 3^j\times 5^k\times 7^l \leq 1024\).#

type   size          complex-to-complex   real-to-complex   complex-to-real
FP32   [2, 32)       1.09                 1.02              1.01
FP32   [32, 512]     1.06                 1.05              1.05
FP32   (512, 1024]   2.51                 6.72              7.78
FP64   [2, 32)       1.03                 1.02              1.02
FP64   [32, 512]     1.06                 1.06              1.05
FP64   (512, 1024]   2.80                 4.94              5.55
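As an illustration of how the table is organized, the following minimal sketch enumerates the sizes whose only prime factors are 2, 3, 5, and 7, groups them into the three size ranges used above, and computes a geometric mean. The helper names (`smooth_sizes`, `geomean`, `size_range`) are hypothetical, the example speedups are placeholders rather than measured data, and the exact averaging procedure used to produce the published numbers is an assumption.

```python
import math


def smooth_sizes(limit, primes=(2, 3, 5, 7), start=2):
    """Return sorted sizes in [start, limit] whose only prime factors are in `primes`."""
    sizes = {1}
    for p in primes:
        # Multiply every size found so far by successive powers of p.
        for n in list(sizes):
            m = n * p
            while m <= limit:
                sizes.add(m)
                m *= p
    return sorted(n for n in sizes if n >= start)


def geomean(values):
    """Geometric mean, computed as exp(mean(log(v)))."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def size_range(n):
    """Map a size to one of the three ranges used in the table."""
    if n < 32:
        return "[2, 32)"
    if n <= 512:
        return "[32, 512]"
    return "(512, 1024]"


sizes = smooth_sizes(1024)
by_range = {}
for n in sizes:
    by_range.setdefault(size_range(n), []).append(n)

# Hypothetical per-size speedups, for illustration only (not measured data).
example_speedups = [2.0, 8.0]
print(geomean(example_speedups))  # approximately 4.0
```

A geometric mean is the usual choice for averaging ratios such as speedups, because a 2x gain and a 2x loss cancel out to 1.0.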

Known issues#

  • For in-place real-to-complex and complex-to-real transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
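For context, FFTW's documented convention for in-place real-data transforms is that the complex side has `n/2 + 1` elements along the last dimension (integer division), and the real array is padded to `2 * (n/2 + 1)` elements along that dimension so both views fit in the same buffer. The sketch below only illustrates that FFTW baseline; the additional NVPL FFT constraints are not spelled out in these notes and are not modeled here. The function name `r2c_inplace_shapes` is hypothetical.

```python
def r2c_inplace_shapes(dims):
    """Return (complex_shape, padded_real_shape) for an in-place r2c transform.

    Follows FFTW's padding convention for the last dimension; any extra
    NVPL FFT layout constraints are NOT represented here.
    """
    if not dims:
        raise ValueError("dims must contain at least one dimension")
    *outer, last = dims
    n_complex = last // 2 + 1  # non-redundant complex outputs in the last dimension
    complex_shape = (*outer, n_complex)
    padded_real_shape = (*outer, 2 * n_complex)  # reals needed to hold the complex output
    return complex_shape, padded_real_shape


complex_shape, padded_real_shape = r2c_inplace_shapes((4, 6))
print(complex_shape, padded_real_shape)  # (4, 4) (4, 8)
```

For a logical 4 x 6 real input, each row therefore occupies 8 reals in memory, of which the last 2 are padding.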

NVPL FFT 0.4.1 EA (nvpl-25.1.1-beta)#

Minor bug-fix release of the NVPL FFT library.

New features#

  • N/A

Resolved issues#

  • Fixed a bug where a size-1 kernel was not selected during planning for the innermost dimension of a complex-to-real transform.

Known issues#

  • For in-place real-to-complex and complex-to-real transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.

NVPL FFT 0.4.0 EA (nvpl-25.1-beta)#

The 4th early access release of the NVPL FFT library.

New features#

  • Added support for modern and legacy FFTW Fortran interfaces.

  • nvpl_fftw.h can now also be found in include/nvpl_fftw/ under the name fftw3.h.

  • Improved single- and multi-threaded performance of complex-to-complex, complex-to-real and real-to-complex transforms for sizes ranging from 2 to 512.

Table: Geomean speedup of NVPL FFT 0.4.0 vs NVPL FFT 0.3.0 for sizes \(2^i\times 3^j\times 5^k\times 7^l \leq 512\).#

type   complex-to-complex   real-to-complex   complex-to-real
FP32   1.04                 1.20              1.20
FP64   1.05                 5.65              6.58

Figure: NVPL FFT 0.4.0 performance plot (_images/nvpl_fft_0_4_0_perf.png)

Known issues#

  • For in-place real-to-complex and complex-to-real transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.

NVPL FFT 0.3.0 EA (nvpl-24.7-beta)#

The 3rd early access release of the NVPL FFT library.

New features#

  • Improved single- and multi-threaded performance of complex-to-complex transforms in double precision for sizes ranging from 2 to 512.

  • Improved single- and multi-threaded performance of complex-to-real and real-to-complex transforms in single precision for sizes ranging from 2 to 512.

Known issues#

  • N/A

NVPL FFT 0.2.0 EA (nvpl-24.03-beta)#

The 2nd early access release of the NVPL FFT library.

New features#

  • Improved single- and multi-threaded performance of complex-to-complex transforms in both single and double precision.

  • Improved scalability of the multi-threaded NVPL FFT.

Known issues#

  • N/A

Resolved issues#

  • NVPL FFT now uses a different threading implementation (see OpenMP-based Threading). Setting the OMP_PROC_BIND or OMP_PLACES environment variable no longer negatively impacts multi-threaded performance.

NVPL FFT 0.1.0 EA (nvpl-23.11-beta)#

The first early access release of the NVPL FFT library.

New features#

  • Supports computation of one-, two-, and three-dimensional complex-to-complex, real-to-complex, and complex-to-real DFTs in single and double precision, with arbitrary sizes and strides, using the FFTW APIs.

  • Supports single- and multi-threaded FFT computation.

Known issues#

  • Some supported FFT sizes, including composite sizes and sizes greater than 50K elements, are not yet fully optimized.

  • NVPL FFT respects the original thread affinity mask. For applications built with an OpenMP runtime, thread affinity controls (the OMP_PROC_BIND or OMP_PLACES environment variables) could negatively impact multi-threaded performance.

Resolved issues#

  • N/A