Release Notes#
This section describes significant changes, new features, performance improvements, and known and resolved issues for each release. Unless noted otherwise, the listed issues should not impact functionality. When functionality is impacted, we offer a workaround to avoid the issue (if available).
NVPL FFT 0.6.0 (nvpl-26.5)#
New features#
Improved single- and multi-threaded performance of in-place complex-to-complex transforms.
Relaxed pointer alignment for R2C and C2R transforms: real buffers now only need sizeof(Real) alignment instead of sizeof(Complex).
Full compatibility with FFTW3 API - all single and double precision functions and auxiliary APIs are added.
New APIs are added as stubs - planning will always return a null plan, execution will be a no-op.
They are provided for build-time and link-time compatibility, enabling safe drop-in replacement via LD_PRELOAD.
See Unsupported FFTW APIs for more details.
The APIs are exposed in C, Fortran 77 and Fortran 2003 interfaces.
New FFTW3-compatible fftw3.h header at include/nvpl_compat/. The previous location include/nvpl_fftw/ is deprecated.
The header is propagated automatically via CMake (nvpl::fftw) and pkg-config - no changes are required.
If the old location was manually included, it should be replaced with the new location.
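The relaxed alignment rule for real buffers can be illustrated with a small check. This is only a sketch under stated assumptions: it takes single precision, with Real as float and Complex as a pair of floats (the layout FFTW uses for fftwf_complex). The type names real_t and complex_t and the helper functions are hypothetical and are not part of the NVPL FFT API.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types (assumptions, not NVPL FFT definitions). */
typedef float real_t;        /* Real = float */
typedef float complex_t[2];  /* Complex = two floats, as in fftwf_complex */

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

/* Real-buffer requirement before 0.6.0: sizeof(Complex) alignment. */
static int real_buffer_ok_old(const real_t *p) {
    return is_aligned(p, sizeof(complex_t));
}

/* Real-buffer requirement as of 0.6.0: sizeof(Real) alignment. */
static int real_buffer_ok_new(const real_t *p) {
    return is_aligned(p, sizeof(real_t));
}
```

Under these assumptions, a real buffer that starts one float past a Complex-aligned address fails the old check and passes the new one. This is the case that arises when a real sub-array begins at an odd element offset.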
Resolved issues#
Fix Fortran 2003 interfaces to accept arrays of rank > 1.
Remove fftw(f)_execute from the Fortran 2003 interface file, matching FFTW behavior.
Known issues#
For real-to-complex and complex-to-real in-place transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
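For context, the baseline FFTW layout for in-place real-to-complex transforms pads the last dimension of the real array to 2 * (n_last / 2 + 1) elements, so that the complex output fits in the same buffer. The sketch below computes that padded size. It shows only the standard FFTW convention and does not capture the additional NVPL FFT constraints, which this note does not spell out. The helper names are hypothetical, and rank is assumed to be at least 1.

```c
#include <stddef.h>

/* Number of complex outputs along the last dimension of an r2c transform. */
static size_t r2c_last_dim_complex(size_t n_last) {
    return n_last / 2 + 1;
}

/* Row length, in reals, of the padded in-place buffer (FFTW convention). */
static size_t r2c_inplace_row_reals(size_t n_last) {
    return 2 * r2c_last_dim_complex(n_last);
}

/* Total number of reals to allocate for an in-place r2c transform
   with the given row-major dimensions. Assumes rank >= 1. */
static size_t r2c_inplace_total_reals(const size_t *dims, int rank) {
    size_t total = 1;
    for (int i = 0; i < rank - 1; ++i) {
        total *= dims[i];
    }
    return total * r2c_inplace_row_reals(dims[rank - 1]);
}
```

For a 4 x 6 transform, each row holds 6 / 2 + 1 = 4 complex values, so the in-place buffer needs 8 reals per row and 32 reals in total, rather than the 24 reals of the unpadded input.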
NVPL FFT 0.5.0 (nvpl-25.11)#
The first general availability (GA) release of the NVPL FFT library.
New features#
Add legacy Fortran symbols with double trailing underscore.
Resolved issues#
Fix illegal instruction when executing on an ARMv8.2a CPU with FCMA instructions.
Known issues#
For real-to-complex and complex-to-real in-place transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
NVPL FFT 0.4.2 EA (nvpl-25.5-beta)#
New features#
Improved single- and multi-threaded performance of complex-to-complex, complex-to-real and real-to-complex transforms for sizes from 2 to 1024.
| type | size | complex-to-complex | real-to-complex | complex-to-real |
|---|---|---|---|---|
| FP32 | [2, 32) | 1.09 | 1.02 | 1.01 |
| FP32 | [32, 512] | 1.06 | 1.05 | 1.05 |
| FP32 | (512, 1024] | 2.51 | 6.72 | 7.78 |
| FP64 | [2, 32) | 1.03 | 1.02 | 1.02 |
| FP64 | [32, 512] | 1.06 | 1.06 | 1.05 |
| FP64 | (512, 1024] | 2.80 | 4.94 | 5.55 |
Known issues#
For real-to-complex and complex-to-real in-place transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
NVPL FFT 0.4.1 EA (nvpl-25.1.1-beta)#
Minor bug-fix release of the NVPL FFT library.
New features#
N/A
Resolved issues#
Fix a bug where a size-1 kernel was not selected during planning for the innermost dimension of a complex-to-real transform.
Known issues#
For real-to-complex and complex-to-real in-place transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
NVPL FFT 0.4.0 EA (nvpl-25.1-beta)#
The 4th early access release of the NVPL FFT library.
New features#
Added support for modern and legacy FFTW Fortran interfaces.
nvpl_fftw.h can now also be found in include/nvpl_fftw/ under the name fftw3.h.
Improved single- and multi-threaded performance of complex-to-complex, complex-to-real and real-to-complex transforms for sizes ranging from 2 to 512.
|  | complex-to-complex | real-to-complex | complex-to-real |
|---|---|---|---|
| FP32 | 1.04 | 1.20 | 1.20 |
| FP64 | 1.05 | 5.65 | 6.58 |
Known issues#
For real-to-complex and complex-to-real in-place transforms of rank 2 and higher, there are additional constraints on the data layout compared to FFTW.
NVPL FFT 0.3.0 EA (nvpl-24.7-beta)#
The 3rd early access release of the NVPL FFT library.
New features#
Improved single- and multi-threaded performance of complex-to-complex transforms in double precision for sizes ranging from 2 to 512.
Improved single- and multi-threaded performance of complex-to-real and real-to-complex transforms in single precision for sizes ranging from 2 to 512.
Known issues#
N/A
NVPL FFT 0.2.0 EA (nvpl-24.03-beta)#
The 2nd early access release of the NVPL FFT library.
New features#
Improved single- and multi-threaded performance of complex-to-complex transforms in both single and double precision.
Improved scalability of the multi-threaded NVPL FFT.
Known issues#
N/A
Resolved issues#
NVPL FFT adopts a different threading implementation (see OpenMP-based Threading). Setting the OMP_PROC_BIND environment variable (or OMP_PLACES) will no longer negatively impact the multi-threaded performance.
NVPL FFT 0.1.0 EA (nvpl-23.11-beta)#
The first early access release of the NVPL FFT library.
New features#
Supports computation of one-, two-, and three-dimensional complex-to-complex, real-to-complex, and complex-to-real DFTs in single and double precision with arbitrary sizes and strides using FFTW APIs.
Supports single- and multi-threaded FFT computation.
Known issues#
Some of the supported FFT sizes, including composite sizes and sizes greater than 50K elements, are not optimized to the full extent.
NVPL FFT respects the original thread affinity mask. For applications built with OpenMP runtime, controls of thread affinity (either via the OMP_PROC_BIND or the OMP_PLACES environment variables) could negatively impact the multi-threaded performance.
Resolved issues#
N/A