Other Methods#
Frontend to Backend Traits Conversion#
cufftdx::utils::frontend_to_backend(...) converts a frontend FFT description into the effective backend traits that
are used to query the cuFFTDx databases.
In combination with the cuFFT Device API, this function can be used to generate the LTO database, which contains both the device function code and the metadata (a C++ header file) for a specified FFT operation. See Custom LTO Helper for an example.
#include "cufftdx/utils.hpp"
namespace cufftdx {
namespace utils {
struct backend_impl_traits {
unsigned int size;
fft_type type;
fft_direction direction;
unsigned int sm;
unsigned int elements_per_thread;
unsigned int min_elements_per_thread;
};
enum class algorithm {
ct,
bluestein,
};
enum class execution_type {
thread,
block,
};
backend_impl_traits backend_traits =
frontend_to_backend(
algorithm algo,
execution_type exec_type,
unsigned int fft_size
/* size_of<FFT>::value */,
fft_type type
/* type_of<FFT>::value */,
fft_direction direction
/* direction_of<FFT>::value */,
unsigned int sm
/* sm_of<FFT>::value */,
real_mode real_mode
/* real_mode_of<FFT>::value */,
unsigned int elements_per_thread
/* elements_per_thread_of<FFT>::value or
* 0 if not set */,
unsigned int block_dim_x
/* block_dim_of<FFT>::x or 0 if not set */,
experimental::code_type code_type
/* experimental::code_type_of<FFT>::value */
);
} // namespace utils
} // namespace cufftdx
Runtime Database Query#
cufftdx::experimental::utils::query_database(...) and cufftdx::experimental::utils::get_all_implementations(...) provide runtime access to the cuFFTDx internal database.
Both functions are useful when the FFT configuration is selected dynamically, for example based on user input. They can be used to check whether a given FFT configuration is supported
without a workspace. For the difference between FFTs that require a workspace and those that do not, see Supported Functionality.
cufftdx::experimental::utils::query_database(...) returns the optimal implementation as std::optional<cufftdx::utils::frontend_impl_traits>.
If no implementation is found for the provided configuration, it returns std::nullopt.
std::optional<cufftdx::utils::frontend_impl_traits>
cufftdx::experimental::utils::query_database(
unsigned int fft_size,
cufftdx::fft_direction dir,
cufftdx::fft_type type,
unsigned int sm,
cufftdx::utils::execution_type execution,
cufftdx::precision prec =
cufftdx::precision::f32,
unsigned int fft_ept = 0,
unsigned int ffts_per_block = 0,
std::tuple<unsigned int, unsigned int, unsigned int>
block_dim = {0, 0, 0},
cufftdx::complex_layout layout =
cufftdx::complex_layout::natural,
cufftdx::real_mode rmode =
cufftdx::real_mode::normal,
cufftdx::experimental::code_type code_type =
cufftdx::experimental::code_type::ptx);
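The std::optional return value described above lends itself to a simple "use cuFFTDx if supported, otherwise fall back" decision. The following is a minimal host-only sketch of that pattern, not the library's own code: impl_traits_stub and query_database_stub are hypothetical stand-ins for cufftdx::utils::frontend_impl_traits and query_database(...), and the "powers of two up to 4096" rule is invented for illustration and does not describe actual cuFFTDx support.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for cufftdx::utils::frontend_impl_traits.
// The members of the real type are not assumed here.
struct impl_traits_stub {
    unsigned int fft_size;
};

// Hypothetical stub that mimics the documented contract of query_database(...):
// a value when an implementation exists, std::nullopt otherwise.
// The support rule below is made up for this sketch.
std::optional<impl_traits_stub> query_database_stub(unsigned int fft_size) {
    const bool is_pow2 = (fft_size & (fft_size - 1)) == 0;
    if (fft_size >= 2 && fft_size <= 4096 && is_pow2) {
        return impl_traits_stub{fft_size};
    }
    return std::nullopt;
}

enum class fft_path { cufftdx_kernel, fallback };

// Pick an execution path for a size chosen at run time.
fft_path choose_path(unsigned int fft_size) {
    assert(fft_size > 0);  // an FFT of size 0 is not meaningful
    if (query_database_stub(fft_size).has_value()) {
        return fft_path::cufftdx_kernel;
    }
    return fft_path::fallback;  // e.g. a host-side library call instead
}
```

In real code, the stub call would be replaced by cufftdx::experimental::utils::query_database(...) with the arguments listed in the signature above.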
cufftdx::experimental::utils::get_all_implementations(...) returns all matching implementations as std::vector<cufftdx::utils::frontend_impl_traits>.
If no implementation is found, it returns an empty vector.
std::vector<cufftdx::utils::frontend_impl_traits>
cufftdx::experimental::utils::get_all_implementations(
unsigned int fft_size,
cufftdx::fft_direction dir,
cufftdx::fft_type type,
unsigned int sm,
cufftdx::utils::execution_type execution,
cufftdx::precision prec =
cufftdx::precision::f32,
unsigned int fft_ept = 0,
unsigned int ffts_per_block = 0,
std::tuple<unsigned int, unsigned int, unsigned int>
block_dim = {0, 0, 0},
cufftdx::complex_layout layout =
cufftdx::complex_layout::natural,
cufftdx::real_mode rmode =
cufftdx::real_mode::normal,
cufftdx::experimental::code_type code_type =
cufftdx::experimental::code_type::ptx);
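Because an empty vector signals that no implementation was found, a list of candidate sizes can be filtered with a plain emptiness check. The sketch below assumes a hypothetical get_all_implementations_stub in place of the real function and a placeholder impl_traits_stub type. The support rule inside the stub is invented for illustration only.

```cpp
#include <vector>

// Hypothetical stand-in for cufftdx::utils::frontend_impl_traits.
struct impl_traits_stub {
    unsigned int fft_size;
    unsigned int variant;  // illustrative field only, not a real member
};

// Hypothetical stub that mimics the documented contract of
// get_all_implementations(...): all matches, or an empty vector if none.
// The support rule (powers of two up to 4096, two variants each) is made up.
std::vector<impl_traits_stub> get_all_implementations_stub(unsigned int fft_size) {
    const bool is_pow2 = (fft_size & (fft_size - 1)) == 0;
    if (fft_size >= 2 && fft_size <= 4096 && is_pow2) {
        return {impl_traits_stub{fft_size, 0}, impl_traits_stub{fft_size, 1}};
    }
    return {};
}

// Keep only the requested sizes that have at least one implementation.
std::vector<unsigned int> supported_sizes(const std::vector<unsigned int>& requested) {
    std::vector<unsigned int> supported;
    for (unsigned int size : requested) {
        if (!get_all_implementations_stub(size).empty()) {
            supported.push_back(size);
        }
    }
    return supported;
}
```

For example, with this stub, supported_sizes({64, 100, 128}) keeps 64 and 128 and drops 100.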
For an example of how to use these APIs, see the nvrtc_query_database_fft_block example.
Note
These APIs are available only when CUFFTDX_ENABLE_RUNTIME_DATABASE is defined.
As of cuFFTDx 1.7.0, you can query only for FFT sizes that are supported without a workspace and are present in the cuFFTDx package.
Additional sizes made available through offline LTO database creation cannot currently be queried.
Warning
Compilation time can significantly increase when using these APIs.
(online) LTO Database Creation#
cufftdx::utils::get_database_and_ltoir() returns a tuple containing:
Database string (std::string) to be inserted into the cuFFTDx headers.
Vector of LTOIRs (std::vector<std::vector<char>>) for building device functions for the specified FFT operation.
Required CUDA block dimensions (Dim3) for executing the FFT operation.
Required shared memory size (unsigned, in bytes) for executing the FFT operation.
std::tuple<std::string, std::vector<std::vector<char>>, Dim3,
unsigned int>
cufftdx::utils::get_database_and_ltoir(
unsigned int fft_size,
cufftdx::fft_direction dir,
cufftdx::fft_type type,
unsigned int sm,
cufftdx::detail::execution_type execution,
cufftdx::precision prec =
cufftdx::precision::f32,
cufftdx::complex_layout layout =
cufftdx::complex_layout::natural,
cufftdx::real_mode rmode =
cufftdx::real_mode::normal,
unsigned int fft_ept = 0
/* use heuristic */,
unsigned int ffts_per_block = 1
/* 0: use suggested ffts_per_block */);
Example
// Assuming the following FFT operator is defined in the NVRTC-compiled code:
// using FFT = decltype(cufftdx::Block() +
// cufftdx::Size<128>() +
// cufftdx::Type<cufftdx::fft_type::c2c>() +
// cufftdx::Direction<cufftdx::fft_direction::forward>() +
// cufftdx::Precision<float>() +
// cufftdx::ElementsPerThread<8>() +
// cufftdx::FFTsPerBlock<2>() +
// cufftdx::SM<750>());
// You can get the database string and LTOIRs for the FFT operation by calling:
auto [lto_db, ltoirs, block_dim, sm_size] =
cufftdx::utils::get_database_and_ltoir(128,
cufftdx::fft_direction::forward,
cufftdx::fft_type::c2c,
750,
cufftdx::detail::execution_type::block,
cufftdx::precision::f32,
cufftdx::complex_layout::natural,
cufftdx::real_mode::normal,
8,
2);
After obtaining the database string and LTOIRs, you can insert the database string into the cuFFTDx header file and link the LTOIRs to the user code as shown in Use Case II: Online Kernel Generation.
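The four tuple elements can be summarized on the host before the kernel is built and launched. The following sketch only illustrates unpacking a tuple of the documented shape. Dim3 here is a stand-in struct assumed to have x, y, and z members, get_database_and_ltoir_stub is a hypothetical stub returning made-up values, and launch_plan and summarize are names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for the Dim3 type in the documented return tuple
// (x/y/z members are an assumption of this sketch).
struct Dim3 {
    unsigned int x, y, z;
};

using lto_result =
    std::tuple<std::string, std::vector<std::vector<char>>, Dim3, unsigned int>;

// Hypothetical stub with the same return shape as get_database_and_ltoir();
// every value below is made up.
lto_result get_database_and_ltoir_stub() {
    std::vector<std::vector<char>> ltoirs = {std::vector<char>(16, 0),
                                             std::vector<char>(8, 0)};
    return lto_result{std::string("// database entries"), ltoirs, Dim3{16, 2, 1}, 2048u};
}

// Illustrative summary of what the host needs for linking and launching.
struct launch_plan {
    unsigned int threads_per_block;
    unsigned int shared_bytes;
    std::size_t  ltoir_bytes;
    bool         has_database;
};

launch_plan summarize(const lto_result& result) {
    const auto& [lto_db, ltoirs, block_dim, shared_bytes] = result;
    // A zero dimension would describe an empty thread block.
    assert(block_dim.x != 0 && block_dim.y != 0 && block_dim.z != 0);
    std::size_t total = 0;
    for (const auto& blob : ltoirs) {
        total += blob.size();  // total size of all LTOIR blobs to link
    }
    return launch_plan{block_dim.x * block_dim.y * block_dim.z, shared_bytes, total,
                       !lto_db.empty()};
}
```

The block dimensions and shared memory size would then feed the kernel launch configuration, while the database string and LTOIR blobs go to the compile and link steps.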
Note
The cufftdx::utils::get_database_and_ltoir() function is a wrapper around cuFFT Device APIs (see cuFFT Device API Reference). To use this function:
Define CUFFTDX_ENABLE_CUFFT_DEPENDENCY.
Link against the cuFFT library.