FFT Runtime Library APIs
This section describes the Fortran interfaces to the cuFFT library. The FFT functions are only accessible from host code. All of the runtime API routines are integer functions that return an error code; they return a value of CUFFT_SUCCESS if the call was successful, or another cuFFT status return value if there was an error.
Chapter 10 contains examples of accessing the cuFFT library routines from OpenACC and CUDA Fortran. In both cases, the interfaces to the library can be exposed by adding the line
use cufft
to your program unit.
Beginning with our 21.9 release, we also support a cufftXt module, which provides interfaces to the multi-GPU support available in the cuFFT library. These interfaces can be used within any Fortran program by adding the line
use cufftxt
to your program unit. The cufftXt interfaces are documented beginning in section 4 of this chapter.
Unless a specific kind is provided in the following interfaces, the plain integer type implies integer(4) and the plain real type implies real(4).
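As a minimal sketch of the calling pattern described above (not taken from Chapter 10), the following CUDA Fortran program creates a plan, runs one forward transform, and checks each integer return value against CUFFT_SUCCESS. The program name, the array names, and the size n are illustrative assumptions. It assumes a build along the lines of nvfortran -cuda -cudalib=cufft.

```fortran
program cufft_intro_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: n = 16          ! illustrative transform size
  complex(4) :: a(n), b(n)              ! host input and output
  complex(4), device :: a_d(n), b_d(n)  ! device input and output
  integer :: plan, ierr

  a = cmplx(1.0, 0.0)                   ! constant signal
  a_d = a                               ! host-to-device copy

  ! Every cuFFT routine is an integer function returning a status code.
  ierr = cufftPlan1d(plan, n, CUFFT_C2C, 1)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftPlan1d failed'

  ierr = cufftExecC2C(plan, a_d, b_d, CUFFT_FORWARD)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecC2C failed'

  b = b_d                               ! device-to-host copy
  ! cuFFT transforms are unnormalized, so b(1) should be close to (n, 0).
  print *, 'b(1) = ', b(1)

  ierr = cufftDestroy(plan)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftDestroy failed'
end program cufft_intro_sketch
```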
CUFFT Definitions and Helper Functions
This section contains definitions and data types used in the cuFFT library and interfaces to the cuFFT helper functions.
The cuFFT module contains the following constants and enumerations:
integer, parameter :: CUFFT_FORWARD = -1
integer, parameter :: CUFFT_INVERSE = 1
! CUFFT Status
enum, bind(C)
enumerator :: CUFFT_SUCCESS = 0
enumerator :: CUFFT_INVALID_PLAN = 1
enumerator :: CUFFT_ALLOC_FAILED = 2
enumerator :: CUFFT_INVALID_TYPE = 3
enumerator :: CUFFT_INVALID_VALUE = 4
enumerator :: CUFFT_INTERNAL_ERROR = 5
enumerator :: CUFFT_EXEC_FAILED = 6
enumerator :: CUFFT_SETUP_FAILED = 7
enumerator :: CUFFT_INVALID_SIZE = 8
enumerator :: CUFFT_UNALIGNED_DATA = 9
end enum
! CUFFT Transform Types
enum, bind(C)
enumerator :: CUFFT_R2C = z'2a' ! Real to Complex (interleaved)
enumerator :: CUFFT_C2R = z'2c' ! Complex (interleaved) to Real
enumerator :: CUFFT_C2C = z'29' ! Complex to Complex, interleaved
enumerator :: CUFFT_D2Z = z'6a' ! Double to Double-Complex
enumerator :: CUFFT_Z2D = z'6c' ! Double-Complex to Double
enumerator :: CUFFT_Z2Z = z'69' ! Double-Complex to Double-Complex
end enum
! CUFFT Data Layouts
enum, bind(C)
enumerator :: CUFFT_COMPATIBILITY_NATIVE = 0
enumerator :: CUFFT_COMPATIBILITY_FFTW_PADDING = 1
enumerator :: CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC = 2
enumerator :: CUFFT_COMPATIBILITY_FFTW_ALL = 3
end enum
integer, parameter :: CUFFT_COMPATIBILITY_DEFAULT = CUFFT_COMPATIBILITY_FFTW_PADDING
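Because the status values above are ordinary named constants, they can be used directly in a select case construct. The module below is a hypothetical helper, not part of the cufft module: the names cufft_check_sketch, cufft_status_name, and cufft_check are assumptions made for illustration. It maps a returned status to its enumerator name and stops on any value other than CUFFT_SUCCESS.

```fortran
module cufft_check_sketch
  use cufft
  implicit none
contains
  ! Hypothetical helper: translate a cuFFT status code into its name.
  function cufft_status_name(ierr) result(name)
    integer, intent(in) :: ierr
    character(len=24) :: name
    select case (ierr)
    case (CUFFT_SUCCESS);        name = 'CUFFT_SUCCESS'
    case (CUFFT_INVALID_PLAN);   name = 'CUFFT_INVALID_PLAN'
    case (CUFFT_ALLOC_FAILED);   name = 'CUFFT_ALLOC_FAILED'
    case (CUFFT_INVALID_TYPE);   name = 'CUFFT_INVALID_TYPE'
    case (CUFFT_INVALID_VALUE);  name = 'CUFFT_INVALID_VALUE'
    case (CUFFT_INTERNAL_ERROR); name = 'CUFFT_INTERNAL_ERROR'
    case (CUFFT_EXEC_FAILED);    name = 'CUFFT_EXEC_FAILED'
    case (CUFFT_SETUP_FAILED);   name = 'CUFFT_SETUP_FAILED'
    case (CUFFT_INVALID_SIZE);   name = 'CUFFT_INVALID_SIZE'
    case (CUFFT_UNALIGNED_DATA); name = 'CUFFT_UNALIGNED_DATA'
    case default;                name = 'unknown cuFFT status'
    end select
  end function cufft_status_name

  ! Hypothetical helper: stop with a message if a cuFFT call failed.
  subroutine cufft_check(ierr, label)
    integer, intent(in) :: ierr
    character(len=*), intent(in) :: label
    if (ierr /= CUFFT_SUCCESS) then
      print *, trim(label), ' failed: ', trim(cufft_status_name(ierr))
      stop 1
    end if
  end subroutine cufft_check
end module cufft_check_sketch
```

A call site would then read, for example, call cufft_check(cufftPlan1d(plan, n, CUFFT_C2C, 1), 'cufftPlan1d').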
cufftSetCompatibilityMode
This function configures the layout of cuFFT output in FFTW-compatible modes.
integer(4) function cufftSetCompatibilityMode( plan, mode )
integer :: plan
integer :: mode
cufftSetStream
This function sets the stream to be used by the cuFFT library to execute its routines.
integer(4) function cufftSetStream(plan, stream)
integer :: plan
integer(kind=cuda_stream_kind) :: stream
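The following sketch shows one way a plan might be bound to a user-created stream. It assumes the cudaStreamCreate, cudaStreamSynchronize, and cudaStreamDestroy interfaces from the cudafor module; the program and variable names are illustrative. After cufftSetStream, the transform is queued on that stream, so the host synchronizes on it before using the result.

```fortran
program cufft_stream_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: n = 1024        ! illustrative transform size
  complex(4) :: a(n)
  complex(4), device :: a_d(n)
  integer :: plan, ierr
  integer(kind=cuda_stream_kind) :: stream

  a = cmplx(1.0, 0.0)
  a_d = a                               ! blocking host-to-device copy

  ierr = cudaStreamCreate(stream)
  ierr = cufftPlan1d(plan, n, CUFFT_C2C, 1)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftPlan1d failed'

  ! Associate the plan with the stream; later executions use it.
  ierr = cufftSetStream(plan, stream)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftSetStream failed'

  ! In-place forward transform, queued on the stream.
  ierr = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecC2C failed'

  ! Wait for the transform to finish before reading the result.
  ierr = cudaStreamSynchronize(stream)
  a = a_d

  ierr = cufftDestroy(plan)
  ierr = cudaStreamDestroy(stream)
end program cufft_stream_sketch
```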
cufftGetVersion
This function returns the version number of cuFFT.
integer(4) function cufftGetVersion( version )
integer :: version
cufftSetAutoAllocation
This function indicates that the caller intends to allocate and manage work areas for plans that have been generated. cuFFT default behavior is to allocate the work area at plan generation time. If cufftSetAutoAllocation() has been called with autoAllocate set to 0 prior to one of the cufftMakePlan*() calls, cuFFT does not allocate the work area. This is the preferred sequence for callers wishing to manage work area allocation.
integer(4) function cufftSetAutoAllocation(plan, autoAllocate)
integer(4) :: plan, autoallocate
cufftSetWorkArea
This function overrides the work area pointer associated with a plan. If the work area was auto-allocated, cuFFT frees the auto-allocated space. The cufftExec*() calls assume that the work area pointer is valid and that it points to a contiguous region in device memory that does not overlap with any other work area. If this is not the case, results are indeterminate.
integer(4) function cufftSetWorkArea(plan, workArea)
integer(4) :: plan
integer, device :: workArea(*) ! Can be integer, real, complex
! or a type(c_devptr)
cufftDestroy
This function frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure.
integer(4) function cufftDestroy( plan )
integer :: plan
CUFFT Plans and Estimated Size Functions
This section contains functions from the cuFFT library used to create plans and estimate work buffer size.
cufftPlan1d
This function creates a 1D FFT plan configuration for a specified signal size and data type. The argument nx is the size of the transform; batch is the number of transforms of size nx.
integer(4) function cufftPlan1d(plan, nx, ffttype, batch)
integer :: plan
integer :: nx
integer :: ffttype
integer :: batch
cufftPlan2d
This function creates a 2D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny), nx is the size of the 1st dimension in the transform, but the 2nd size argument to the function; ny is the size of the 2nd dimension, and the 1st size argument to the function.
integer(4) function cufftPlan2d( plan, ny, nx, ffttype )
integer :: plan
integer :: ny, nx
integer :: ffttype
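The reversed argument order is easy to get wrong, so here is a small sketch under the assumption of a complex array declared as a(nx,ny). The sizes and names are illustrative. Note that ny is passed before nx, matching the interface above.

```fortran
program cufft_plan2d_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: nx = 64, ny = 32   ! illustrative sizes
  complex(4) :: a(nx, ny)
  complex(4), device :: a_d(nx, ny)
  integer :: plan, ierr

  a = cmplx(1.0, 0.0)
  a_d = a

  ! The Fortran array is a(nx,ny), but the sizes are passed as ny, nx.
  ierr = cufftPlan2d(plan, ny, nx, CUFFT_C2C)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftPlan2d failed'

  ! In-place forward transform.
  ierr = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecC2C failed'

  a = a_d
  ! For a constant input of 1, a(1,1) should be close to nx*ny,
  ! because cuFFT transforms are unnormalized.
  print *, 'a(1,1) = ', a(1,1)

  ierr = cufftDestroy(plan)
end program cufft_plan2d_sketch
```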
cufftPlan3d
This function creates a 3D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny,nz), nx is the size of the 1st dimension in the transform, but the 3rd size argument to the function; nz is the size of the 3rd dimension, and the 1st size argument to the function.
integer(4) function cufftPlan3d( plan, nz, ny, nx, ffttype )
integer :: plan
integer :: nz, ny, nx
integer :: ffttype
cufftPlanMany
This function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch argument is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist. In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.
integer(4) function cufftPlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch )
integer :: plan
integer :: rank
integer :: n(rank)
integer :: inembed(rank), onembed(rank)
integer :: istride, idist, ostride, odist
integer :: ffttype, batch
cufftCreate
This function creates an opaque handle for further cuFFT calls and allocates some small data structures on the host. In C, the handle type is currently typedef’ed to an int, so in Fortran we use an integer*4 to hold the plan.
integer(4) function cufftCreate(plan)
integer(4) :: plan
cufftMakePlan1d
Following a call to cufftCreate(), this function creates a 1D FFT plan configuration for a specified signal size and data type. The argument nx is the size of the transform; batch is the number of transforms of size nx. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.
integer(4) function cufftMakePlan1d(plan, nx, ffttype, batch, worksize)
integer(4) :: plan
integer(4) :: nx
integer(4) :: ffttype
integer(4) :: batch
integer(kind=int_ptr_kind()) :: workSize(*)
cufftMakePlan2d
Following a call to cufftCreate(), this function creates a 2D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny), nx is the size of the 1st dimension in the transform, but the 2nd size argument to the function; ny is the size of the 2nd dimension, and the 1st size argument to the function. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.
integer(4) function cufftMakePlan2d(plan, ny, nx, ffttype, workSize)
integer(4) :: plan
integer(4) :: ny, nx
integer(4) :: ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftMakePlan3d
Following a call to cufftCreate(), this function creates a 3D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny,nz), nx is the size of the 1st dimension in the transform, but the 3rd size argument to the function; nz is the size of the 3rd dimension, and the 1st size argument to the function. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.
integer(4) function cufftMakePlan3d(plan, nz, ny, nx, ffttype, workSize)
integer(4) :: plan
integer(4) :: nz, ny, nx
integer(4) :: ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftMakePlanMany
Following a call to cufftCreate(), this function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch argument is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist.
In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.
If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.
integer(4) function cufftMakePlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
integer(4) :: plan
integer(4) :: rank
integer :: n(rank)
integer :: inembed(rank), onembed(rank)
integer(4) :: istride, idist, ostride, odist
integer(4) :: ffttype, batch
integer(kind=int_ptr_kind()) :: workSize(*)
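The NULL-passing technique described above can be sketched as follows for a batch of contiguous 1D transforms. This is an illustration under stated assumptions, not a definitive usage: the pointer declarations as rank-1 integer pointers, the program name, and the sizes are choices made here. The stride and distance arguments are still given sensible values, although they are ignored when inembed and onembed are NULL.

```fortran
program cufft_planmany_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: nx = 256, batch = 8   ! illustrative sizes
  complex(4) :: a(nx, batch)
  complex(4), device :: a_d(nx, batch)
  integer :: plan, ierr
  integer :: n(1)
  ! Assumption: rank-1 integer pointers stand in for the C NULL arguments.
  integer, pointer :: inembed(:), onembed(:)
  integer(kind=int_ptr_kind()) :: workSize(1)

  n(1) = nx
  nullify(inembed)      ! disassociated pointer is passed as NULL
  nullify(onembed)

  a = cmplx(1.0, 0.0)
  a_d = a

  ierr = cufftCreate(plan)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftCreate failed'

  ! Rank-1 plan, batch transforms of length nx, stored back to back.
  ierr = cufftMakePlanMany(plan, 1, n, inembed, 1, nx, &
                           onembed, 1, nx, CUFFT_C2C, batch, workSize)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftMakePlanMany failed'

  ierr = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecC2C failed'

  a = a_d
  print *, 'work area size in bytes: ', workSize(1)

  ierr = cufftDestroy(plan)
end program cufft_planmany_sketch
```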
cufftEstimate1d
This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.
integer(4) function cufftEstimate1d(nx, ffttype, batch, workSize)
integer(4) :: nx
integer(4) :: ffttype
integer(4) :: batch
integer(kind=int_ptr_kind()) :: workSize(*)
cufftEstimate2d
This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.
integer(4) function cufftEstimate2d(ny, nx, ffttype, workSize)
integer(4) :: ny, nx
integer(4) :: ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftEstimate3d
This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.
integer(4) function cufftEstimate3d(nz, ny, nx, ffttype, workSize)
integer(4) :: nz, ny, nx
integer(4) :: ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftEstimateMany
This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.
integer(4) function cufftEstimateMany(rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
integer(4) :: rank, istride, idist, ostride, odist
integer(4), dimension(rank) :: n, inembed, onembed
integer(4) :: ffttype
integer(4) :: batch
integer(kind=int_ptr_kind()) :: workSize(*)
cufftGetSize1d
This function gives a more accurate estimate than cufftEstimate1d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.
integer(4) function cufftGetSize1d(plan, nx, ffttype, batch, workSize)
integer(4) :: plan, nx, ffttype, batch
integer(kind=int_ptr_kind()) :: workSize(*)
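To compare the two kinds of estimate, a program might call cufftEstimate1d without a plan and cufftGetSize1d with a handle obtained from cufftCreate. The sketch below assumes illustrative sizes and variable names and simply prints both values; it makes no claim about how they relate for any particular size.

```fortran
program cufft_size_sketch
  use cufft
  implicit none
  integer, parameter :: nx = 4096, batch = 4   ! illustrative sizes
  integer :: plan, ierr
  integer(kind=int_ptr_kind()) :: estimate(1), refined(1)

  ! Rough estimate: needs no plan and assumes default plan settings.
  ierr = cufftEstimate1d(nx, CUFFT_C2C, batch, estimate)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftEstimate1d failed'

  ! Refined estimate: takes a plan handle and accounts for its settings.
  ierr = cufftCreate(plan)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftCreate failed'
  ierr = cufftGetSize1d(plan, nx, CUFFT_C2C, batch, refined)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftGetSize1d failed'

  print *, 'cufftEstimate1d (bytes): ', estimate(1)
  print *, 'cufftGetSize1d  (bytes): ', refined(1)

  ierr = cufftDestroy(plan)
end program cufft_size_sketch
```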
cufftGetSize2d
This function gives a more accurate estimate than cufftEstimate2d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.
integer(4) function cufftGetSize2d(plan, ny, nx, ffttype, workSize)
integer(4) :: plan, ny, nx, ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftGetSize3d
This function gives a more accurate estimate than cufftEstimate3d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.
integer(4) function cufftGetSize3d(plan, nz, ny, nx, ffttype, workSize)
integer(4) :: plan, nz, ny, nx, ffttype
integer(kind=int_ptr_kind()) :: workSize(*)
cufftGetSizeMany
This function gives a more accurate estimate than cufftEstimateMany() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.
integer(4) function cufftGetSizeMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
integer(4) :: plan, rank, istride, idist, ostride, odist
integer(4), dimension(rank) :: n, inembed, onembed
integer(4) :: ffttype
integer(4) :: batch
integer(kind=int_ptr_kind()) :: workSize(*)
cufftGetSize
Once plan generation has been done, either with the original API or the extensible API, this call returns the actual size of the work area required, in bytes, to support the plan. Callers who choose to manage work area allocation within their application must use this call after plan generation, and after any cufftSet*() calls subsequent to plan generation, if those calls might alter the required work space size.
integer(4) function cufftGetSize(plan, workSize)
integer(4) :: plan
integer(kind=int_ptr_kind()) :: workSize(*)
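Putting the work area routines together, a caller-managed work area might be set up as in the sketch below. The sequence (cufftCreate, cufftSetAutoAllocation with 0, cufftMakePlan1d, cufftGetSize, cufftSetWorkArea) follows the descriptions in this chapter; the buffer type, the rounding of bytes to 4-byte words, and all names are assumptions made for illustration.

```fortran
program cufft_workarea_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: nx = 4096, batch = 4   ! illustrative sizes
  complex(4) :: a(nx*batch)
  complex(4), device :: a_d(nx*batch)
  integer(4), device, allocatable :: work_d(:) ! caller-owned work area
  integer :: plan, ierr
  integer(kind=int_ptr_kind()) :: workSize(1)
  integer(8) :: nwords

  a = cmplx(1.0, 0.0)
  a_d = a

  ierr = cufftCreate(plan)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftCreate failed'

  ! Tell cuFFT not to allocate the work area during plan generation.
  ierr = cufftSetAutoAllocation(plan, 0)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftSetAutoAllocation failed'

  ierr = cufftMakePlan1d(plan, nx, CUFFT_C2C, batch, workSize)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftMakePlan1d failed'

  ! Ask for the actual size, in bytes, now that the plan exists.
  ierr = cufftGetSize(plan, workSize)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftGetSize failed'

  ! Round bytes up to 4-byte words; keep at least one element.
  nwords = max(1_8, (int(workSize(1), 8) + 3_8) / 4_8)
  allocate(work_d(nwords))

  ierr = cufftSetWorkArea(plan, work_d)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftSetWorkArea failed'

  ierr = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecC2C failed'
  a = a_d                 ! blocking copy; the transform has finished

  ierr = cufftDestroy(plan)
  deallocate(work_d)      ! the caller, not cuFFT, frees this buffer
end program cufft_workarea_sketch
```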
CUFFT Execution Functions
This section contains the execution functions, which perform the actual Fourier transform, in the cuFFT library.
cufftExecC2C
This function executes a single precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. If idata and odata are the same, this function does an in-place transform.
integer(4) function cufftExecC2C( plan, idata, odata, direction )
integer :: plan
complex(4), device, dimension(*) :: idata, odata
integer :: direction
cufftExecR2C
This function executes a single precision real-to-complex, implicitly forward, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform, but note there are data layout differences between in-place and out-of-place transforms for real-to-complex FFTs in cuFFT.
integer(4) function cufftExecR2C( plan, idata, odata )
integer :: plan
real(4), device, dimension(*) :: idata
complex(4), device, dimension(*) :: odata
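For the out-of-place case, a short sketch may help with array sizing. A real-to-complex transform of n real values produces n/2+1 non-redundant complex values, so the output array below is declared with that length. The names and the size n are illustrative assumptions.

```fortran
program cufft_r2c_sketch
  use cudafor
  use cufft
  implicit none
  integer, parameter :: n = 16            ! illustrative transform size
  real(4) :: x(n)
  complex(4) :: y(n/2 + 1)                ! non-redundant half of the spectrum
  real(4), device :: x_d(n)
  complex(4), device :: y_d(n/2 + 1)
  integer :: plan, ierr

  x = 1.0
  x_d = x

  ierr = cufftPlan1d(plan, n, CUFFT_R2C, 1)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftPlan1d failed'

  ! Out-of-place: separate real input and complex output arrays.
  ierr = cufftExecR2C(plan, x_d, y_d)
  if (ierr /= CUFFT_SUCCESS) stop 'cufftExecR2C failed'

  y = y_d
  ! For a constant input of 1, y(1) should be close to (n, 0).
  print *, 'y(1) = ', y(1)

  ierr = cufftDestroy(plan)
end program cufft_r2c_sketch
```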
cufftExecC2R
This function executes a single precision complex-to-real, implicitly inverse, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform.
integer(4) function cufftExecC2R( plan, idata, odata )
integer :: plan
complex(4), device, dimension(*) :: idata
real(4), device, dimension(*) :: odata
cufftExecZ2Z
This function executes a double precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. If idata and odata are the same, this function does an in-place transform.
integer(4) function cufftExecZ2Z( plan, idata, odata, direction )
integer :: plan
complex(8), device, dimension(*) :: idata, odata
integer :: direction
cufftExecD2Z
This function executes a double precision real-to-complex, implicitly forward, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform, but note there are data layout differences between in-place and out-of-place transforms for real-to-complex FFTs in cuFFT.
integer(4) function cufftExecD2Z( plan, idata, odata )
integer :: plan
real(8), device, dimension(*) :: idata
complex(8), device, dimension(*) :: odata
cufftExecZ2D
This function executes a double precision complex-to-real, implicitly inverse, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform.
integer(4) function cufftExecZ2D( plan, idata, odata )
integer :: plan
complex(8), device, dimension(*) :: idata
real(8), device, dimension(*) :: odata
CUFFTXT Definitions and Helper Functions
This section contains definitions and data types used in the cufftXt library and interfaces to helper functions. Beginning with NVHPC version 22.5, this module also contains some interfaces and definitions used with the cuFFTMp library.
The cufftXt module contains the following constants and enumerations:
integer, parameter :: MAX_CUDA_DESCRIPTOR_GPUS = 64
! libFormat enum is used for the library member of cudaLibXtDesc
enum, bind(C)
enumerator :: LIB_FORMAT_CUFFT = 0
enumerator :: LIB_FORMAT_UNDEFINED = 1
end enum
! cufftXtSubFormat identifies the data layout of a memory descriptor
enum, bind(C)
! by default input is in linear order across GPUs
enumerator :: CUFFT_XT_FORMAT_INPUT = 0
! by default output is in scrambled order depending on transform
enumerator :: CUFFT_XT_FORMAT_OUTPUT = 1
! by default inplace is input order, which is linear across GPUs
enumerator :: CUFFT_XT_FORMAT_INPLACE = 2
! shuffled output order after execution of the transform
enumerator :: CUFFT_XT_FORMAT_INPLACE_SHUFFLED = 3
! shuffled input order prior to execution of 1D transforms
enumerator :: CUFFT_XT_FORMAT_1D_INPUT_SHUFFLED = 4
! distributed input order
enumerator :: CUFFT_XT_FORMAT_DISTRIBUTED_INPUT = 5
! distributed output order
enumerator :: CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT = 6
enumerator :: CUFFT_FORMAT_UNDEFINED = 7
end enum
! cufftXtCopyType specifies the type of copy for cufftXtMemcpy
enum, bind(C)
enumerator :: CUFFT_COPY_HOST_TO_DEVICE = 0
enumerator :: CUFFT_COPY_DEVICE_TO_HOST = 1
enumerator :: CUFFT_COPY_DEVICE_TO_DEVICE = 2
enumerator :: CUFFT_COPY_UNDEFINED = 3
end enum
! cufftXtQueryType specifies the type of query for cufftXtQueryPlan
enum, bind(c)
enumerator :: CUFFT_QUERY_1D_FACTORS = 0
enumerator :: CUFFT_QUERY_UNDEFINED = 1
end enum
! cufftXtWorkAreaPolicy specifies the policy for cufftXtSetWorkAreaPolicy
enum, bind(c)
enumerator :: CUFFT_WORKAREA_MINIMAL = 0 ! maximum reduction
enumerator :: CUFFT_WORKAREA_USER = 1 ! use workSize parameter as limit
enumerator :: CUFFT_WORKAREA_PERFORMANCE = 2 ! default - 1x overhead or more, max perf
end enum
! cufftMpCommType specifies how to initialize cuFFTMp
enum, bind(c)
enumerator :: CUFFT_COMM_MPI = 0
enumerator :: CUFFT_COMM_NVSHMEM = 1
enumerator :: CUFFT_COMM_UNDEFINED = 2
end enum
The cufftXt module contains the following derived type definitions:
! cufftXt1dFactors type
type, bind(c) :: cufftXt1dFactors
integer(8) :: size
integer(8) :: stringCount
integer(8) :: stringLength
integer(8) :: subStringLength
integer(8) :: factor1
integer(8) :: factor2
integer(8) :: stringMask
integer(8) :: subStringMask
integer(8) :: factor1Mask
integer(8) :: factor2Mask
integer(4) :: stringShift
integer(4) :: subStringShift
integer(4) :: factor1Shift
integer(4) :: factor2Shift
end type cufftXt1dFactors
type, bind(C) :: cudaXtDesc
integer(4) :: version
integer(4) :: nGPUs
integer(4) :: GPUs(MAX_CUDA_DESCRIPTOR_GPUS)
type(c_devptr) :: data(MAX_CUDA_DESCRIPTOR_GPUS)
integer(8) :: size(MAX_CUDA_DESCRIPTOR_GPUS)
type(c_ptr) :: cudaXtState
end type cudaXtDesc
type, bind(C) :: cudaLibXtDesc
integer(4) :: version
type(c_ptr) :: descriptor ! cudaXtDesc *descriptor
integer(4) :: library ! libFormat library
integer(4) :: subFormat
type(c_ptr) :: libDescriptor ! void *libDescriptor
end type cudaLibXtDesc
type, bind(C) :: cufftBox3d
integer(8) :: lower(3)
integer(8) :: upper(3)
integer(8) :: strides(3)
end type cufftBox3d
cufftXtSetGPUs
This function identifies which GPUs are to be used with the plan. The call to cufftXtSetGPUs must occur after the call to cufftCreate but before the call to cufftMakePlan*.
integer(4) function cufftXtSetGPUs( plan, nGPUs, whichGPUs )
integer(4) :: plan
integer(4) :: nGPUs
integer(4) :: whichGPUs(*)
cufftXtMalloc
This function allocates a cufftXt descriptor, and memory for data in the GPUs associated with the plan. The value of cufftXtSubFormat determines if the buffer will be used for input or output. Fortran programmers should declare and pass a pointer to a type(cudaLibXtDesc) variable so the entire information can be stored, and also freed in subsequent calls to cufftXtFree. For programmers comfortable with the C interface, a variant of this function can take a type(c_ptr) for the 2nd argument.
integer(4) function cufftXtMalloc( plan, descriptor, format )
integer(4) :: plan
type(cudaLibXtDesc), pointer :: descriptor ! A type(c_ptr) is also accepted.
integer(4) :: format ! cufftXtSubFormat value
cufftXtFree
This function frees the cufftXt descriptor, and all memory associated with it. The descriptor and memory must have been allocated by a previous call to cufftXtMalloc. Fortran programmers should declare and pass a pointer to a type(cudaLibXtDesc) variable. For programmers comfortable with the C interface, a variant of this function can take a type(c_ptr) as the only argument.
integer(4) function cufftXtFree( descriptor )
type(cudaLibXtDesc), pointer :: descriptor ! A type(c_ptr) is also accepted.
cufftXtMemcpy
This function copies data between buffers on the host and GPUs, or between GPUs. The value of the type argument determines the copy direction. In addition, this Fortran function is overloaded to take a type(cudaLibXtDesc) variable for the destination (H2D transfer), for the source (D2H transfer), or for both (D2D transfer), in which case the type argument is not required.
integer(4) function cufftXtMemcpy( plan, dst, src, type )
integer(4) :: plan
type(cudaLibXtDesc) :: dst ! Or any host buffer, depending on the type
type(cudaLibXtDesc) :: src ! Or any host buffer, depending on the type
integer(4) :: type ! optional cufftXtCopyType value
CUFFTXT Plans and Work Area Functions
This section contains functions from the cufftXt library used to create plans and manage work buffers.
cufftXtMakePlanMany
Following a call to cufftCreate(), this function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch argument is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist. In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.
integer(4) function cufftXtMakePlanMany(plan, rank, n, inembed, istride, &
idist, inputType, onembed, ostride, odist, outputType, batch, workSize, &
executionType)
integer(4) :: plan
integer(4) :: rank
integer(8) :: n(*)
integer(8) :: inembed(*), onembed(*)
integer(8) :: istride, idist, ostride, odist
type(cudaDataType) :: inputType, outputType, executionType
integer(4) :: batch
integer(8) :: workSize(*)
cufftXtQueryPlan
This function only supports multi-GPU 1D transforms. It returns a derived type, factors, which contains the number of strings, the decomposition of factors, and (in the case of power of 2 sizes) some other useful mask and shift elements, used in converting between permuted and linear indexes.
integer(4) function cufftXtQueryPlan(plan, factors, queryType)
integer(4) :: plan
type(cufftXt1DFactors) :: factors
integer(4) :: queryType
cufftXtSetWorkAreaPolicy
This function overrides the work area associated with a plan. Currently, the workAreaPolicy can be specified as CUFFT_WORKAREA_MINIMAL and cuFFT will attempt to re-plan to use zero bytes of work area memory. See the cuFFT documentation for support of other features.
integer(4) function cufftXtSetWorkAreaPolicy(plan, workAreaPolicy, workSize)
integer(4) :: plan
integer(4) :: workAreaPolicy
integer(8) :: workSize
cufftXtGetSizeMany
This function gives a more accurate estimate than cufftEstimateMany() of the size of the work area required, in bytes, given the specified plan parameters used for cufftXtMakePlanMany and taking into account any plan settings which may have been made.
integer(4) function cufftXtGetSizeMany(plan, rank, n, inembed, istride, &
idist, inputType, onembed, ostride, odist, outputType, batch, workSize, &
executionType)
integer(4) :: plan
integer(4) :: rank
integer(8) :: n(*)
integer(8) :: inembed(*), onembed(*)
integer(8) :: istride, idist, ostride, odist
type(cudaDataType) :: inputType, outputType, executionType
integer(4) :: batch
integer(8) :: workSize(*)
cufftXtSetWorkArea
This function overrides the work areas associated with a plan. If the work area was auto-allocated, cuFFT frees the auto-allocated space. The cufftXtExec*() calls assume that the work area pointer is valid and that it points to a contiguous region in device memory that does not overlap with any other work area. If this is not the case, results are indeterminate.
integer(4) function cufftXtSetWorkArea(plan, workArea)
integer(4) :: plan
type(c_devptr) :: workArea(*)
cufftXtSetDistribution
This function registers and describes the data distribution for a subsequent FFT operation. The call to cufftXtSetDistribution must occur after the call to cufftCreate but before the call to cufftMakePlan*.
integer(4) function cufftXtSetDistribution( plan, boxIn, boxOut )
integer(4) :: plan
type(cufftBox3d) :: boxIn
type(cufftBox3d) :: boxOut
CUFFTXT Execution Functions
This section contains the execution functions, which perform the actual Fourier transform, in the cufftXt library.
cufftXtExec
This function executes any Fourier transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction argument is ignored. Otherwise, the transform direction is specified by the direction parameter. This function uses the GPU memory pointed to by input as input data, and stores the computed Fourier coefficients in the output array. If those are the same, this function does an in-place transform. Any valid data type for the input and output arrays is accepted.
integer(4) function cufftXtExec( plan, input, output, direction )
integer :: plan
real, dimension(*) :: input, output ! Any data type is allowed
integer :: direction
cufftXtExecDescriptor
This function executes any Fourier transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction argument is ignored. Otherwise, the transform direction is specified by the direction parameter. This function stores the result in the specified output arrays.
integer(4) function cufftXtExecDescriptor( plan, input, output, direction )
integer :: plan
type(cudaLibXtDesc) :: input, output
integer :: direction
cufftXtExecDescriptorC2C
This function executes a single precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorC2C( plan, input, output, direction )
integer :: plan
type(cudaLibXtDesc) :: input, output
integer :: direction
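The multi-GPU routines are typically used together in a fixed order, which the following sketch illustrates for a 2D in-place transform on two GPUs. It is an illustration under several assumptions: that two GPUs with device numbers 0 and 1 are present, that the cufftxt module also makes the base cufft interfaces visible (add use cufft if your compiler reports them as undefined), and that a host complex array is accepted as the host buffer in cufftXtMemcpy. The names check, a_desc, and the sizes are hypothetical.

```fortran
program cufftxt_sketch
  use cudafor
  use cufftxt
  implicit none
  integer, parameter :: nx = 256, ny = 256    ! illustrative sizes
  complex(4), allocatable :: a(:,:)
  type(cudaLibXtDesc), pointer :: a_desc      ! filled in by cufftXtMalloc
  integer :: plan, ierr, nDevices
  integer :: whichGPUs(2)
  integer(kind=int_ptr_kind()) :: workSize(2) ! one size per GPU

  ierr = cudaGetDeviceCount(nDevices)
  if (nDevices < 2) stop 'this sketch needs two GPUs'
  whichGPUs = [0, 1]                          ! assumed device numbers

  allocate(a(nx, ny))
  a = cmplx(1.0, 0.0)

  ! Required order: cufftCreate, cufftXtSetGPUs, then cufftMakePlan*.
  call check(cufftCreate(plan), 'cufftCreate')
  call check(cufftXtSetGPUs(plan, 2, whichGPUs), 'cufftXtSetGPUs')
  call check(cufftMakePlan2d(plan, ny, nx, CUFFT_C2C, workSize), &
             'cufftMakePlan2d')

  ! Allocate the descriptor and per-GPU memory, then copy the data in.
  call check(cufftXtMalloc(plan, a_desc, CUFFT_XT_FORMAT_INPLACE), &
             'cufftXtMalloc')
  call check(cufftXtMemcpy(plan, a_desc, a, CUFFT_COPY_HOST_TO_DEVICE), &
             'cufftXtMemcpy H2D')

  ! In-place transform: the same descriptor is input and output.
  call check(cufftXtExecDescriptorC2C(plan, a_desc, a_desc, CUFFT_FORWARD), &
             'cufftXtExecDescriptorC2C')

  call check(cufftXtMemcpy(plan, a, a_desc, CUFFT_COPY_DEVICE_TO_HOST), &
             'cufftXtMemcpy D2H')

  call check(cufftXtFree(a_desc), 'cufftXtFree')
  call check(cufftDestroy(plan), 'cufftDestroy')
contains
  ! Hypothetical helper: stop on any status other than CUFFT_SUCCESS.
  subroutine check(istat, label)
    integer, intent(in) :: istat
    character(len=*), intent(in) :: label
    if (istat /= CUFFT_SUCCESS) then
      print *, label, ' returned status ', istat
      stop 1
    end if
  end subroutine check
end program cufftxt_sketch
```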
cufftXtExecDescriptorZ2Z
This function executes a double precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorZ2Z( plan, input, output, direction )
integer :: plan
type(cudaLibXtDesc) :: input, output
integer :: direction
cufftXtExecDescriptorR2C
This function executes a single precision real-to-complex transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorR2C( plan, input, output )
integer :: plan
type(cudaLibXtDesc) :: input, output
cufftXtExecDescriptorD2Z
This function executes a double precision real-to-complex transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorD2Z( plan, input, output )
integer :: plan
type(cudaLibXtDesc) :: input, output
cufftXtExecDescriptorC2R
This function executes a single precision complex-to-real transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorC2R( plan, input, output )
integer :: plan
type(cudaLibXtDesc) :: input, output
cufftXtExecDescriptorZ2D
This function executes a double precision complex-to-real transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.
integer(4) function cufftXtExecDescriptorZ2D( plan, input, output )
integer :: plan
type(cudaLibXtDesc) :: input, output
CUFFTMP Functions
This section contains the cuFFTMp functions which extend the cuFFTXt library functionality to multiple processes and multiple GPUs.
cufftMpNvshmemMalloc
This function allocates space from the NVSHMEM symmetric heap. The cuFFTMp library is based on NVSHMEM. However, the user is not allowed to link and use NVSHMEM in their own application. This may cause a crash at application start time. This limitation will be lifted in a future release of cuFFTMp.
However, some functionality of cuFFTMp requires NVSHMEM-allocated memory, so this function is currently exposed and supported. This function requires that at least one cuFFTMp plan is active prior to its use.
integer(4) function cufftMpNvshmemMalloc( size, workArea )
integer(8) :: size ! Size is in bytes
type(c_devptr) :: workArea
cufftMpNvshmemFree
This function frees the space previously allocated from the NVSHMEM symmetric heap. The cuFFTMp library is based on NVSHMEM. However, the user is not allowed to link and use NVSHMEM in their own application. This may cause a crash at application start time. This limitation will be lifted in a future release of cuFFTMp.
However, some functionality of cuFFTMp requires NVSHMEM-allocated memory, so this function is currently exposed and supported. This function requires that at least one cuFFTMp plan is active prior to its use.
integer(4) function cufftMpNvshmemFree( workArea )
type(c_devptr) :: workArea
cufftMpAttachComm
This function attaches a communicator, such as MPI_COMM_WORLD, to a cuFFT plan, for later application of a distributed FFT operation.
integer(4) function cufftMpAttachComm( plan, commType, fcomm )
integer(4) :: plan
integer(4) :: commType
integer(4) :: fcomm
cufftMpCreateReshape
This function creates a cuFFTMp reshape handle for later application of a distributed FFT operation.
integer(4) function cufftMpCreateReshape( reshapeHandle )
type(c_ptr) :: reshapeHandle
cufftMpAttachReshapeComm
This function attaches a communicator, such as MPI_COMM_WORLD, to a cuFFTMp reshape handle, for later application of a distributed FFT operation.
integer(4) function cufftMpAttachReshapeComm( reshapeHandle, commType, fcomm )
type(c_ptr) :: reshapeHandle
integer(4) :: commType
integer(4) :: fcomm
cufftMpGetReshapeSize
This function returns the size needed for work space in the subsequent cuFFTMp reshape execution. Currently, a work area is not required, but that may change in future releases.
integer(4) function cufftMpGetReshapeSize( reshapeHandle, workSize )
type(c_ptr) :: reshapeHandle
integer(8) :: workSize
cufftMpMakeReshape
This function creates a cuFFTMp reshape plan based on the input and output boxes. Note that the boxes use C conventions for bounds and strides.
integer(4) function cufftMpMakeReshape( reshapeHandle, &
elementSize, boxIn, boxOut )
type(c_ptr) :: reshapeHandle
integer(8) :: elementSize
type(cufftBox3d) :: boxIn
type(cufftBox3d) :: boxOut
cufftMpExecReshapeAsync
This function executes a cuFFTMp reshape plan on the specified stream.
integer(4) function cufftMpExecReshapeAsync( reshapeHandle, &
dataOut, dataIn, workSpace, stream )
type(c_ptr) :: reshapeHandle
type(c_devptr) :: dataOut
type(c_devptr) :: dataIn
type(c_devptr) :: workSpace
integer(kind=cuda_stream_kind) :: stream
cufftMpDestroyReshape
This function destroys a cuFFTMp reshape handle.
integer(4) function cufftMpDestroyReshape( reshapeHandle )
type(c_ptr) :: reshapeHandle