Multi-threaded NVPL FFT

This section describes the multithreading support and implementation of NVPL FFT.

OpenMP-based Threading

Multithreading support in NVPL FFT is based on the GNU OpenMP runtime.

The following runtimes are ABI-compatible with the GNU OpenMP runtime:

  • GNU OpenMP runtime: libgomp.so.1.

  • LLVM OpenMP runtime: libomp.so, libomp.so.5, etc.

  • NVIDIA’s OpenMP runtime: libnvomp.so.

ABI compatibility here means that an application or library built with GNU OpenMP, including NVPL FFT, works transparently with any of these runtimes.

Since different OpenMP runtimes use different library names, NVPL FFT doesn’t explicitly depend on any of them. Instead, the library implements lazy dynamic symbol resolution.

  • Lazy means that symbol resolution happens at run time, on first use.

  • Dynamic means that the symbols are resolved using the dlopen() and dlsym() APIs (or their analogues).

NVPL FFT will first attempt to resolve the OpenMP symbols in the address space of the current process; if it fails to do so, it will attempt to load (dlopen()) the default runtime, which is GNU OpenMP.
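The resolution order described above can be sketched in C as follows. This is only an illustration of the general technique, not NVPL FFT's actual implementation: the helper names resolve_symbol and lazy_omp_get_max_threads are hypothetical, and the sketch looks up a single OpenMP entry point (omp_get_max_threads) for brevity.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look for `name` among the objects already loaded
   into the current process; if it is not there, load `fallback_lib` and
   look again. Returns NULL if the symbol cannot be found either way.
   Handles are deliberately never closed, so resolved symbols stay valid. */
void *resolve_symbol(const char *name, const char *fallback_lib)
{
    /* Step 1: dlopen(NULL, ...) yields a handle to the main program; dlsym
       on it also searches the libraries loaded along with the program. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self != NULL) {
        void *sym = dlsym(self, name);
        if (sym != NULL)
            return sym;
    }

    /* Step 2: fall back to loading the default runtime explicitly. */
    void *lib = dlopen(fallback_lib, RTLD_LAZY | RTLD_GLOBAL);
    return lib != NULL ? dlsym(lib, name) : NULL;
}

typedef int (*omp_get_max_threads_fn)(void);

/* "Lazy": the lookup runs on the first call only and the result is cached.
   The cache is not synchronized; a real library would guard it.
   Falls back to 1 thread when no OpenMP runtime can be found. */
int lazy_omp_get_max_threads(void)
{
    static omp_get_max_threads_fn fn = NULL;
    static int resolved = 0;

    if (!resolved) {
        fn = (omp_get_max_threads_fn)resolve_symbol("omp_get_max_threads",
                                                    "libgomp.so.1");
        resolved = 1;
    }
    return fn != NULL ? fn() : 1;
}
```

If the application already links an OpenMP runtime, step 1 finds its symbols and the fallback library is never loaded. On older C libraries the program may need to be linked with -ldl.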

Warning

Since NVPL FFT does not explicitly depend on any particular OpenMP runtime (with the default being the GNU OpenMP runtime), it is strongly recommended to always link the appropriate OpenMP runtime into the final application or library that uses NVPL FFT. This ensures that the appropriate OpenMP runtime is loaded at run time.

Thread Safety

NVPL FFT, like cuFFT and cuFFTW, is thread-safe as long as the output data are disjoint and each thread executes FFTs using its own plan. Note that this guarantee does not cover fftw_plan_with_nthreads, which is not thread-safe. Therefore, to create plans that use different numbers of threads, the plan creations need to be done sequentially.

In addition to the above scenario, as in FFTW, the execute APIs that take the input and output data as arguments are thread-safe on their own. This means that threads can share a plan and execute FFTs in parallel, again provided that the output data are disjoint. Note that calling fftw_destroy_plan in parallel on the same plan is not thread-safe; a shared plan must be destroyed outside of the parallel region.

Two examples, c2c_single_withomp_example and r2c_c2r_single_withomp_example, are available in NVPLSamples; they demonstrate how to use OpenMP with NVPL FFT in the two scenarios described above.