FFT Runtime Library APIs

This section describes the Fortran interfaces to the cuFFT library. The FFT functions are only accessible from host code. All of the runtime API routines are integer functions that return an error code; they return a value of CUFFT_SUCCESS if the call was successful, or another cuFFT status return value if there was an error.

Chapter 10 contains examples of accessing the cuFFT library routines from OpenACC and CUDA Fortran. In both cases, the interfaces to the library can be exposed by adding the line

use cufft

to your program unit.

Beginning with our 21.9 release, we also support a cufftXt module, which provides interfaces to the multi-GPU support available in the cuFFT library. These interfaces can be used within any Fortran program by adding the line

use cufftxt

to your program unit. The cufftXt interfaces are documented beginning in section 4 of this chapter.

Unless a specific kind is provided in the following interfaces, the plain integer type implies integer(4) and the plain real type implies real(4).
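
As an illustration of the conventions above, the following minimal sketch creates and destroys a small plan and checks each return value against CUFFT_SUCCESS. The program name and the transform length are arbitrary choices made for this example.

```fortran
program check_cufft_status
  use cufft
  implicit none
  integer :: plan    ! plain integer, i.e. integer(4)
  integer :: istat   ! receives the cuFFT status code

  ! Create a single 1D complex-to-complex plan of length 1024.
  istat = cufftPlan1d(plan, 1024, CUFFT_C2C, 1)
  if (istat /= CUFFT_SUCCESS) then
    print *, 'cufftPlan1d failed, status = ', istat
    stop 1
  end if

  ! Release the plan and check the status again.
  istat = cufftDestroy(plan)
  if (istat /= CUFFT_SUCCESS) then
    print *, 'cufftDestroy failed, status = ', istat
  end if
end program check_cufft_status
```

The later sketches in this chapter omit most of these checks for brevity; production code would normally test every returned status.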

CUFFT Definitions and Helper Functions

This section contains definitions and data types used in the cuFFT library and interfaces to the cuFFT helper functions.

The cuFFT module contains the following constants and enumerations:

integer, parameter :: CUFFT_FORWARD = -1
integer, parameter :: CUFFT_INVERSE = 1

! CUFFT Status
enum, bind(C)
    enumerator :: CUFFT_SUCCESS        = 0
    enumerator :: CUFFT_INVALID_PLAN   = 1
    enumerator :: CUFFT_ALLOC_FAILED   = 2
    enumerator :: CUFFT_INVALID_TYPE   = 3
    enumerator :: CUFFT_INVALID_VALUE  = 4
    enumerator :: CUFFT_INTERNAL_ERROR = 5
    enumerator :: CUFFT_EXEC_FAILED    = 6
    enumerator :: CUFFT_SETUP_FAILED   = 7
    enumerator :: CUFFT_INVALID_SIZE   = 8
    enumerator :: CUFFT_UNALIGNED_DATA = 9
end enum

! CUFFT Transform Types
enum, bind(C)
    enumerator :: CUFFT_R2C = z'2a'     ! Real to Complex (interleaved)
    enumerator :: CUFFT_C2R = z'2c'     ! Complex (interleaved) to Real
    enumerator :: CUFFT_C2C = z'29'     ! Complex to Complex, interleaved
    enumerator :: CUFFT_D2Z = z'6a'     ! Double to Double-Complex
    enumerator :: CUFFT_Z2D = z'6c'     ! Double-Complex to Double
    enumerator :: CUFFT_Z2Z = z'69'     ! Double-Complex to Double-Complex
end enum

! CUFFT Data Layouts
enum, bind(C)
    enumerator :: CUFFT_COMPATIBILITY_NATIVE          = 0
    enumerator :: CUFFT_COMPATIBILITY_FFTW_PADDING    = 1
    enumerator :: CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC = 2
    enumerator :: CUFFT_COMPATIBILITY_FFTW_ALL        = 3
end enum

integer, parameter :: CUFFT_COMPATIBILITY_DEFAULT = CUFFT_COMPATIBILITY_FFTW_PADDING

cufftSetCompatibilityMode

This function configures the layout of cuFFT output in FFTW-compatible modes.

integer(4) function cufftSetCompatibilityMode( plan, mode )
  integer :: plan
  integer :: mode

cufftSetStream

This function sets the stream to be used by the cuFFT library to execute its routines.

integer(4) function cufftSetStream(plan, stream)
  integer :: plan
  integer(kind=cuda_stream_kind) :: stream
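
A hedged sketch of how a plan might be bound to a user-created stream follows. It assumes the cudafor module is used for the stream routines and for cuda_stream_kind; the array size and program name are illustrative only.

```fortran
program fft_on_stream
  use cudafor
  use cufft
  implicit none
  integer, parameter :: n = 256
  complex(4), device :: a_d(n)
  integer :: plan, istat
  integer(kind=cuda_stream_kind) :: stream

  a_d = (1.0, 0.0)

  ! Create a stream and associate it with the plan.
  istat = cudaStreamCreate(stream)
  istat = cufftPlan1d(plan, n, CUFFT_C2C, 1)
  istat = cufftSetStream(plan, stream)

  ! The transform is now issued on the user stream.
  istat = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)

  ! Wait for the transform to finish before cleaning up.
  istat = cudaStreamSynchronize(stream)
  istat = cufftDestroy(plan)
  istat = cudaStreamDestroy(stream)
end program fft_on_stream
```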

cufftGetVersion

This function returns the version number of cuFFT.

integer(4) function cufftGetVersion( version )
  integer :: version
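
For example, a short program along these lines could report the library version. How the returned integer encodes the version is not described in this section, so the sketch simply prints the raw value.

```fortran
program show_cufft_version
  use cufft
  implicit none
  integer :: version, istat

  istat = cufftGetVersion(version)
  if (istat == CUFFT_SUCCESS) then
    print *, 'cuFFT version value: ', version
  else
    print *, 'cufftGetVersion failed, status = ', istat
  end if
end program show_cufft_version
```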

cufftSetAutoAllocation

This function indicates that the caller intends to allocate and manage work areas for plans that have been generated. cuFFT default behavior is to allocate the work area at plan generation time. If cufftSetAutoAllocation() has been called with autoAllocate set to 0 prior to one of the cufftMakePlan*() calls, cuFFT does not allocate the work area. This is the preferred sequence for callers wishing to manage work area allocation.

integer(4) function cufftSetAutoAllocation(plan, autoAllocate)
  integer(4) :: plan, autoAllocate

cufftSetWorkArea

This function overrides the work area pointer associated with a plan. If the work area was auto-allocated, cuFFT frees the auto-allocated space. The cufftExecute*() calls assume that the work area pointer is valid and that it points to a contiguous region in device memory that does not overlap with any other work area. If this is not the case, results are indeterminate.

integer(4) function cufftSetWorkArea(plan, workArea)
  integer(4) :: plan
  integer, device :: workArea(*) ! Can be integer, real, complex
                                 ! or a type(c_devptr)
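
The caller-managed work area sequence described for cufftSetAutoAllocation and cufftSetWorkArea can be sketched as follows. This is one possible arrangement rather than a required one: it uses cufftCreate and cufftMakePlan1d (documented later in this chapter), and it assumes that allocating the work area as an integer(4) device array of sufficient length is acceptable. The names work_d and nwords are hypothetical.

```fortran
program manual_work_area
  use cudafor
  use cufft
  implicit none
  integer, parameter :: nx = 4096
  integer(4) :: plan, istat
  integer(kind=int_ptr_kind()) :: workSize(1)
  integer(kind=int_ptr_kind()) :: nwords
  integer(4), device, allocatable :: work_d(:)
  complex(4), device :: a_d(nx)

  a_d = (1.0, 0.0)

  istat = cufftCreate(plan)
  ! Turn off automatic allocation before the plan is made.
  istat = cufftSetAutoAllocation(plan, 0)
  ! workSize(1) receives the required work area size in bytes.
  istat = cufftMakePlan1d(plan, nx, CUFFT_C2C, 1, workSize)

  ! Allocate at least workSize(1) bytes as 4-byte integers.
  nwords = (workSize(1) + 3) / 4
  if (nwords < 1) nwords = 1
  allocate(work_d(nwords))
  istat = cufftSetWorkArea(plan, work_d)

  istat = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  istat = cudaDeviceSynchronize()

  ! The caller owns the work area, so the caller frees it.
  istat = cufftDestroy(plan)
  deallocate(work_d)
end program manual_work_area
```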

cufftDestroy

This function frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure.

integer(4) function cufftDestroy( plan )
  integer :: plan

CUFFT Plans and Estimated Size Functions

This section contains functions from the cuFFT library used to create plans and estimate work buffer size.

cufftPlan1d

This function creates a 1D FFT plan configuration for a specified signal size and data type. The nx argument is the size of the transform; batch is the number of transforms of size nx.

integer(4) function cufftPlan1d(plan, nx, ffttype, batch)
  integer :: plan
  integer :: nx
  integer :: ffttype
  integer :: batch
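
As a sketch of a batched 1D plan, the program below transforms each column of a two-dimensional device array independently. It assumes the batch members are stored contiguously, one after another, which is how the columns of a Fortran array(nx,batch) are laid out. Sizes and names are illustrative.

```fortran
program batched_fft1d
  use cufft
  implicit none
  integer, parameter :: nx = 256, batch = 16
  complex(4), device :: a_d(nx, batch)
  integer :: plan, istat

  a_d = (1.0, 0.0)

  ! One plan covers all "batch" transforms of length nx.
  istat = cufftPlan1d(plan, nx, CUFFT_C2C, batch)

  ! In-place forward transform of every column.
  istat = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)

  istat = cufftDestroy(plan)
end program batched_fft1d
```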

cufftPlan2d

This function creates a 2D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny), nx is the size of the 1st dimension in the transform, but the 2nd size argument to the function; ny is the size of the 2nd dimension, and the 1st size argument to the function.

integer(4) function cufftPlan2d( plan, ny, nx, ffttype )
  integer :: plan
  integer :: ny, nx
  integer :: ffttype
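
The reversed argument order is easy to get wrong, so a brief sketch may help. Here the device array is declared as a_d(nx,ny) and the plan is created with ny first and nx second; the sizes are arbitrary.

```fortran
program fft2d_order
  use cufft
  implicit none
  integer, parameter :: nx = 64, ny = 32
  complex(4), device :: a_d(nx, ny)
  integer :: plan, istat

  a_d = (1.0, 0.0)

  ! Fortran array is (nx,ny); the plan takes ny first, then nx.
  istat = cufftPlan2d(plan, ny, nx, CUFFT_C2C)
  istat = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)

  istat = cufftDestroy(plan)
end program fft2d_order
```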

cufftPlan3d

This function creates a 3D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny,nz), nx is the size of the 1st dimension in the transform, but the 3rd size argument to the function; nz is the size of the 3rd dimension, and the 1st size argument to the function.

integer(4) function cufftPlan3d( plan, nz, ny, nx, ffttype )
  integer :: plan
  integer :: nz, ny, nx
  integer :: ffttype
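
The same ordering applies in three dimensions. The following sketch, with arbitrary sizes, plans a double precision complex transform for a Fortran array declared as a_d(nx,ny,nz).

```fortran
program fft3d_order
  use cufft
  implicit none
  integer, parameter :: nx = 32, ny = 16, nz = 8
  complex(8), device :: a_d(nx, ny, nz)
  integer :: plan, istat

  a_d = (1.0d0, 0.0d0)

  ! Fortran array is (nx,ny,nz); the plan takes nz, ny, nx.
  istat = cufftPlan3d(plan, nz, ny, nx, CUFFT_Z2Z)
  istat = cufftExecZ2Z(plan, a_d, a_d, CUFFT_FORWARD)

  istat = cufftDestroy(plan)
end program fft3d_order
```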

cufftPlanMany

This function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch argument is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist. In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.

integer(4) function cufftPlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch )
  integer :: plan
  integer :: rank
  integer :: n(rank)
  integer :: inembed(rank), onembed(rank)
  integer :: istride, idist, ostride, odist
  integer :: ffttype, batch

cufftCreate

This function creates an opaque handle for further cuFFT calls and allocates some small data structures on the host. In C, the handle type is currently a typedef for int, so the Fortran interface uses an integer(4) to hold the plan.

integer(4) function cufftCreate(plan)
  integer(4) :: plan

cufftMakePlan1d

Following a call to cufftCreate(), this function creates a 1D FFT plan configuration for a specified signal size and data type. The nx argument is the size of the transform; batch is the number of transforms of size nx. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.

integer(4) function cufftMakePlan1d(plan, nx, ffttype, batch, workSize)
  integer(4) :: plan
  integer(4) :: nx
  integer(4) :: ffttype
  integer(4) :: batch
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftMakePlan2d

Following a call to cufftCreate(), this function creates a 2D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny), nx is the size of the 1st dimension in the transform, but the 2nd size argument to the function; ny is the size of the 2nd dimension, and the 1st size argument to the function. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.

integer(4) function cufftMakePlan2d(plan, ny, nx, ffttype, workSize)
  integer(4) :: plan
  integer(4) :: ny, nx
  integer(4) :: ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftMakePlan3d

Following a call to cufftCreate(), this function creates a 3D FFT plan configuration according to a specified signal size and data type. For a Fortran array(nx,ny,nz), nx is the size of the 1st dimension in the transform, but the 3rd size argument to the function; nz is the size of the 3rd dimension, and the 1st size argument to the function. If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.

integer(4) function cufftMakePlan3d(plan, nz, ny, nx, ffttype, workSize)
  integer(4) :: plan
  integer(4) :: nz, ny, nx
  integer(4) :: ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftMakePlanMany

Following a call to cufftCreate(), this function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. The batch argument is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist.

In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.

If cufftXtSetGPUs was called prior to this call with multiple GPUs, then workSize is an array containing multiple sizes. The workSize values are in bytes.

integer(4) function cufftMakePlanMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
  integer(4) :: plan
  integer(4) :: rank
  integer :: n(rank)
  integer :: inembed(rank), onembed(rank)
  integer(4) :: istride, idist, ostride, odist
  integer(4) :: ffttype, batch
  integer(kind=int_ptr_kind()) :: workSize(*)
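
The null-pointer technique described above might look like the following sketch, which uses the nullify statement. The stride and distance values passed here (1 and nx) are placeholders chosen for a contiguous batched layout; the text above indicates that stride information is ignored when inembed and onembed are NULL. All sizes and names are illustrative.

```fortran
program plan_many_null_embed
  use cufft
  implicit none
  integer, parameter :: nx = 128, batch = 8
  integer(4) :: plan, istat
  integer :: n(1)
  integer, pointer :: inembed(:), onembed(:)
  integer(kind=int_ptr_kind()) :: workSize(1)
  complex(4), device :: a_d(nx, batch)

  n(1) = nx
  a_d = (1.0, 0.0)

  ! Disassociated F90 pointers stand in for C NULL.
  nullify(inembed, onembed)

  istat = cufftCreate(plan)
  istat = cufftMakePlanMany(plan, 1, n, inembed, 1, nx, &
                            onembed, 1, nx, CUFFT_C2C, batch, workSize)
  if (istat /= CUFFT_SUCCESS) print *, 'plan failed, status = ', istat

  istat = cufftExecC2C(plan, a_d, a_d, CUFFT_FORWARD)
  istat = cufftDestroy(plan)
end program plan_many_null_embed
```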

cufftEstimate1d

This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.

integer(4) function cufftEstimate1d(nx, ffttype, batch, workSize)
  integer(4) :: nx
  integer(4) :: ffttype
  integer(4) :: batch
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftEstimate2d

This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.

integer(4) function cufftEstimate2d(ny, nx, ffttype, workSize)
  integer(4) :: ny, nx
  integer(4) :: ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftEstimate3d

This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.

integer(4) function cufftEstimate3d(nz, ny, nx, ffttype, workSize)
  integer(4) :: nz, ny, nx
  integer(4) :: ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftEstimateMany

This function returns an estimate for the size of the work area required, in bytes, given the specified size and data type, and assuming default plan settings.

integer(4) function cufftEstimateMany(rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
  integer(4) :: rank, istride, idist, ostride, odist
  integer(4), dimension(rank) :: n, inembed, onembed
  integer(4) :: ffttype
  integer(4) :: batch
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftGetSize1d

This function gives a more accurate estimate than cufftEstimate1d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.

integer(4) function cufftGetSize1d(plan, nx, ffttype, batch, workSize)
  integer(4) :: plan, nx, ffttype, batch
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftGetSize2d

This function gives a more accurate estimate than cufftEstimate2d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.

integer(4) function cufftGetSize2d(plan, ny, nx, ffttype, workSize)
  integer(4) :: plan, ny, nx, ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftGetSize3d

This function gives a more accurate estimate than cufftEstimate3d() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.

integer(4) function cufftGetSize3d(plan, nz, ny, nx, ffttype, workSize)
  integer(4) :: plan, nz, ny, nx, ffttype
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftGetSizeMany

This function gives a more accurate estimate than cufftEstimateMany() of the size of the work area required, in bytes, given the specified plan parameters and taking into account any plan settings which may have been made.

integer(4) function cufftGetSizeMany(plan, rank, n, inembed, istride, idist, onembed, ostride, odist, ffttype, batch, workSize)
  integer(4) :: plan, rank, istride, idist, ostride, odist
  integer(4), dimension(rank) :: n, inembed, onembed
  integer(4) :: ffttype
  integer(4) :: batch
  integer(kind=int_ptr_kind()) :: workSize(*)

cufftGetSize

Once plan generation has been done, either with the original API or the extensible API, this call returns the actual size of the work area required, in bytes, to support the plan. Callers who choose to manage work area allocation within their application must use this call after plan generation, and after any cufftSet*() calls subsequent to plan generation, if those calls might alter the required work space size.

integer(4) function cufftGetSize(plan, workSize)
  integer(4) :: plan
  integer(kind=int_ptr_kind()) :: workSize(*)
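
To show how the three kinds of size query relate, the sketch below calls cufftEstimate1d, cufftGetSize1d, and cufftGetSize for the same transform and prints the results. The variable names are hypothetical, and no particular relationship between the printed values is claimed here.

```fortran
program work_size_queries
  use cufft
  implicit none
  integer, parameter :: nx = 8192, batch = 4
  integer(4) :: plan, istat
  integer(kind=int_ptr_kind()) :: estimate(1), refined(1)
  integer(kind=int_ptr_kind()) :: planned(1), actual(1)

  ! Rough estimate; no plan handle is needed.
  istat = cufftEstimate1d(nx, CUFFT_C2C, batch, estimate)

  ! Refined estimate for a handle obtained from cufftCreate().
  istat = cufftCreate(plan)
  istat = cufftGetSize1d(plan, nx, CUFFT_C2C, batch, refined)

  ! Generate the plan, then ask for the actual work area size.
  istat = cufftMakePlan1d(plan, nx, CUFFT_C2C, batch, planned)
  istat = cufftGetSize(plan, actual)

  print *, 'cufftEstimate1d (bytes): ', estimate(1)
  print *, 'cufftGetSize1d  (bytes): ', refined(1)
  print *, 'cufftGetSize    (bytes): ', actual(1)

  istat = cufftDestroy(plan)
end program work_size_queries
```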

CUFFT Execution Functions

This section contains the execution functions, which perform the actual Fourier transform, in the cuFFT library.

cufftExecC2C

This function executes a single precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. If idata and odata are the same, this function does an in-place transform.

integer(4) function cufftExecC2C( plan, idata, odata, direction )
  integer :: plan
  complex(4), device, dimension(*) :: idata, odata
  integer :: direction
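
A round trip through the forward and inverse transforms can be sketched as below. cuFFT transforms are unnormalized, so the example divides by the transform length after the inverse step before comparing with the input. The test signal and names are arbitrary.

```fortran
program c2c_roundtrip
  use cufft
  implicit none
  integer, parameter :: n = 512
  complex(4) :: a(n), b(n)
  complex(4), device :: a_d(n), b_d(n)
  integer :: plan, istat, i

  do i = 1, n
    a(i) = cmplx(real(i), 0.0)
  end do
  a_d = a   ! host-to-device copy

  istat = cufftPlan1d(plan, n, CUFFT_C2C, 1)

  ! Out-of-place forward transform, then in-place inverse.
  istat = cufftExecC2C(plan, a_d, b_d, CUFFT_FORWARD)
  istat = cufftExecC2C(plan, b_d, b_d, CUFFT_INVERSE)

  b = b_d              ! device-to-host copy
  b = b / real(n)      ! undo the unnormalized scaling

  print *, 'max abs round-trip error: ', maxval(abs(a - b))

  istat = cufftDestroy(plan)
end program c2c_roundtrip
```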

cufftExecR2C

This function executes a single precision real-to-complex, implicitly forward, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform, but note that the data layout differs between in-place and out-of-place transforms for real-to-complex FFTs in cuFFT.

integer(4) function cufftExecR2C( plan, idata, odata )
  integer :: plan
  real(4), device, dimension(*) :: idata
  complex(4), device, dimension(*) :: odata
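
The following sketch shows the out-of-place case only. It relies on the standard cuFFT layout in which a real input of length nx produces nx/2+1 complex outputs, the remaining coefficients being implied by Hermitian symmetry. The input signal is arbitrary.

```fortran
program r2c_out_of_place
  use cufft
  implicit none
  integer, parameter :: nx = 1024
  real(4) :: x(nx)
  complex(4) :: y(nx/2+1)
  real(4), device :: x_d(nx)
  complex(4), device :: y_d(nx/2+1)
  integer :: plan, istat, i

  do i = 1, nx
    x(i) = real(mod(i, 8))
  end do
  x_d = x

  ! The plan size is the length of the real signal.
  istat = cufftPlan1d(plan, nx, CUFFT_R2C, 1)
  istat = cufftExecR2C(plan, x_d, y_d)

  y = y_d
  print *, 'first coefficient (sum of inputs): ', y(1)

  istat = cufftDestroy(plan)
end program r2c_out_of_place
```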

cufftExecC2R

This function executes a single precision complex-to-real, implicitly inverse, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform.

integer(4) function cufftExecC2R( plan, idata, odata )
  integer :: plan
  complex(4), device, dimension(*) :: idata
  real(4), device, dimension(*) :: odata

cufftExecZ2Z

This function executes a double precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. If idata and odata are the same, this function does an in-place transform.

integer(4) function cufftExecZ2Z( plan, idata, odata, direction )
  integer :: plan
  complex(8), device, dimension(*) :: idata, odata
  integer :: direction

cufftExecD2Z

This function executes a double precision real-to-complex, implicitly forward, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform, but note that the data layout differs between in-place and out-of-place transforms for real-to-complex FFTs in cuFFT.

integer(4) function cufftExecD2Z( plan, idata, odata )
  integer :: plan
  real(8), device, dimension(*) :: idata
  complex(8), device, dimension(*) :: odata

cufftExecZ2D

This function executes a double precision complex-to-real, implicitly inverse, cuFFT transform plan. If idata and odata are the same, this function does an in-place transform.

integer(4) function cufftExecZ2D( plan, idata, odata )
  integer :: plan
  complex(8), device, dimension(*) :: idata
  real(8), device, dimension(*) :: odata
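
A double precision real round trip needs two plans, one of type CUFFT_D2Z and one of type CUFFT_Z2D. The sketch below uses out-of-place transforms and divides by nx afterwards, because cuFFT does not normalize its results. Names and the test signal are illustrative.

```fortran
program d2z_z2d_roundtrip
  use cufft
  implicit none
  integer, parameter :: nx = 1000
  real(8) :: x(nx), y(nx)
  real(8), device :: x_d(nx), y_d(nx)
  complex(8), device :: f_d(nx/2+1)
  integer :: planf, plani, istat, i

  do i = 1, nx
    x(i) = sin(0.01d0 * i)
  end do
  x_d = x

  ! Separate plans for the forward and inverse directions.
  istat = cufftPlan1d(planf, nx, CUFFT_D2Z, 1)
  istat = cufftPlan1d(plani, nx, CUFFT_Z2D, 1)

  istat = cufftExecD2Z(planf, x_d, f_d)
  istat = cufftExecZ2D(plani, f_d, y_d)

  y = y_d
  y = y / real(nx, 8)   ! undo the unnormalized scaling

  print *, 'max abs round-trip error: ', maxval(abs(x - y))

  istat = cufftDestroy(planf)
  istat = cufftDestroy(plani)
end program d2z_z2d_roundtrip
```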

CUFFTXT Definitions and Helper Functions

This section contains definitions and data types used in the cufftXt library and interfaces to helper functions. Beginning with NVHPC version 22.5, this module also contains some interfaces and definitions used with the cuFFTMp library.

The cufftXt module contains the following constants and enumerations:

integer, parameter :: MAX_CUDA_DESCRIPTOR_GPUS = 64

! libFormat enum is used for the library member of cudaLibXtDesc
enum, bind(C)
    enumerator :: LIB_FORMAT_CUFFT     = 0
    enumerator :: LIB_FORMAT_UNDEFINED = 1
end enum

! cufftXtSubFormat identifies the data layout of a memory descriptor
enum, bind(C)
    ! by default input is in linear order across GPUs
    enumerator :: CUFFT_XT_FORMAT_INPUT = 0

    ! by default output is in scrambled order depending on transform
    enumerator :: CUFFT_XT_FORMAT_OUTPUT = 1

    ! by default inplace is input order, which is linear across GPUs
    enumerator :: CUFFT_XT_FORMAT_INPLACE = 2

    ! shuffled output order after execution of the transform
    enumerator :: CUFFT_XT_FORMAT_INPLACE_SHUFFLED = 3

    ! shuffled input order prior to execution of 1D transforms
    enumerator :: CUFFT_XT_FORMAT_1D_INPUT_SHUFFLED = 4

    ! distributed input order
    enumerator :: CUFFT_XT_FORMAT_DISTRIBUTED_INPUT = 5

    ! distributed output order
    enumerator :: CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT = 6

    enumerator :: CUFFT_FORMAT_UNDEFINED = 7
end enum

! cufftXtCopyType specifies the type of copy for cufftXtMemcpy
enum, bind(C)
    enumerator :: CUFFT_COPY_HOST_TO_DEVICE   = 0
    enumerator :: CUFFT_COPY_DEVICE_TO_HOST   = 1
    enumerator :: CUFFT_COPY_DEVICE_TO_DEVICE = 2
    enumerator :: CUFFT_COPY_UNDEFINED        = 3
end enum

! cufftXtQueryType specifies the type of query for cufftXtQueryPlan
enum, bind(c)
    enumerator :: CUFFT_QUERY_1D_FACTORS = 0
    enumerator :: CUFFT_QUERY_UNDEFINED  = 1
end enum

! cufftXtWorkAreaPolicy specifies the policy for cufftXtSetWorkAreaPolicy
enum, bind(c)
    enumerator :: CUFFT_WORKAREA_MINIMAL     = 0 ! maximum reduction
    enumerator :: CUFFT_WORKAREA_USER        = 1 ! use workSize parameter as limit
    enumerator :: CUFFT_WORKAREA_PERFORMANCE = 2 ! default - 1x overhead or more, max perf
end enum

! cufftMpCommType specifies how to initialize cuFFTMp
enum, bind(c)
    enumerator :: CUFFT_COMM_MPI       = 0
    enumerator :: CUFFT_COMM_NVSHMEM   = 1
    enumerator :: CUFFT_COMM_UNDEFINED = 2
end enum

The cufftXt module contains the following derived type definitions:

! cufftXt1dFactors type
type, bind(c) :: cufftXt1dFactors
    integer(8) :: size
    integer(8) :: stringCount
    integer(8) :: stringLength
    integer(8) :: subStringLength
    integer(8) :: factor1
    integer(8) :: factor2
    integer(8) :: stringMask
    integer(8) :: subStringMask
    integer(8) :: factor1Mask
    integer(8) :: factor2Mask
    integer(4) :: stringShift
    integer(4) :: subStringShift
    integer(4) :: factor1Shift
    integer(4) :: factor2Shift
end type cufftXt1dFactors

type, bind(C) :: cudaXtDesc
    integer(4) :: version
    integer(4) :: nGPUs
    integer(4) :: GPUs(MAX_CUDA_DESCRIPTOR_GPUS)
    type(c_devptr) :: data(MAX_CUDA_DESCRIPTOR_GPUS)
    integer(8) :: size(MAX_CUDA_DESCRIPTOR_GPUS)
    type(c_ptr) :: cudaXtState
end type cudaXtDesc

type, bind(C) :: cudaLibXtDesc
    integer(4) :: version
    type(c_ptr) :: descriptor     ! cudaXtDesc *descriptor
    integer(4) :: library         ! libFormat library
    integer(4) :: subFormat
    type(c_ptr) :: libDescriptor  ! void *libDescriptor
end type cudaLibXtDesc

type, bind(C) :: cufftBox3d
    integer(8) :: lower(3)
    integer(8) :: upper(3)
    integer(8) :: strides(3)
end type cufftBox3d

cufftXtSetGPUs

This function identifies which GPUs are to be used with the plan. The call to cufftXtSetGPUs must occur after the call to cufftCreate but before the call to cufftMakePlan*.

integer(4) function cufftXtSetGPUs( plan, nGPUs, whichGPUs )
  integer(4) :: plan
  integer(4) :: nGPUs
  integer(4) :: whichGPUs(*)

cufftXtMalloc

This function allocates a cufftXt descriptor, and memory for data in the GPUs associated with the plan. The value of cufftXtSubFormat determines if the buffer will be used for input or output. Fortran programmers should declare and pass a pointer to a type(cudaLibXtDesc) variable so the entire information can be stored, and also freed in subsequent calls to cufftXtFree. For programmers comfortable with the C interface, a variant of this function can take a type(c_ptr) for the 2nd argument.

integer(4) function cufftXtMalloc( plan, descriptor, format )
  integer(4) :: plan
  type(cudaLibXtDesc), pointer :: descriptor  ! A type(c_ptr) is also accepted.
  integer(4) :: format ! cufftXtSubFormat value

cufftXtFree

This function frees the cufftXt descriptor, and all memory associated with it. The descriptor and memory must have been allocated by a previous call to cufftXtMalloc. Fortran programmers should declare and pass a pointer to a type(cudaLibXtDesc) variable. For programmers comfortable with the C interface, a variant of this function can take a type(c_ptr) as the only argument.

integer(4) function cufftXtFree( descriptor )
  type(cudaLibXtDesc), pointer :: descriptor  ! A type(c_ptr) is also accepted.

cufftXtMemcpy

This function copies data between buffers on the host and GPUs, or between GPUs. The value of the type argument determines the copy direction. In addition, this Fortran function is overloaded to take a type(cudaLibXtDesc) variable for the destination (H2D transfer), for the source (D2H transfer), or for both (D2D transfer), in which case the type argument is not required.

integer(4) function cufftXtMemcpy( plan, dst, src, type )
  integer(4) :: plan
  type(cudaLibXtDesc) :: dst  ! Or any host buffer, depending on the type
  type(cudaLibXtDesc) :: src  ! Or any host buffer, depending on the type
  integer(4) :: type          ! optional cufftXtCopyType value

CUFFTXT Plans and Work Area Functions

This section contains functions from the cufftXt library used to create plans and manage work buffers.

cufftXtMakePlanMany

Following a call to cufftCreate(), this function creates an FFT plan configuration of dimension rank, with sizes specified in the array n. Batch is the number of transforms to configure. This function supports more complicated input and output data layouts using the arguments inembed, istride, idist, onembed, ostride, and odist. In the C function, if inembed and onembed are set to NULL, all other stride information is ignored. Fortran programmers can pass NULL when using the NVIDIA cufft module by setting an F90 pointer to null(), either through direct assignment, using c_f_pointer() with c_null_ptr as the first argument, or the nullify statement, then passing the nullified F90 pointer as the actual argument for the inembed and onembed dummies.

integer(4) function cufftXtMakePlanMany(plan, rank, n, inembed, istride, &
    idist, inputType, onembed, ostride, odist, outputType, batch, workSize, &
    executionType)
  integer(4) :: plan
  integer(4) :: rank
  integer(8) :: n(*)
  integer(8) :: inembed(*), onembed(*)
  integer(8) :: istride, idist, ostride, odist
  type(cudaDataType) :: inputType, outputType, executionType
  integer(4) :: batch
  integer(8) :: workSize(*)

cufftXtQueryPlan

This function only supports multi-GPU 1D transforms. It returns a derived type, factors, which contains the number of strings, the decomposition of factors, and (in the case of power of 2 sizes) some other useful mask and shift elements, used in converting between permuted and linear indexes.

integer(4) function cufftXtQueryPlan(plan, factors, queryType)
  integer(4) :: plan
  type(cufftXt1dFactors) :: factors
  integer(4) :: queryType

cufftXtSetWorkAreaPolicy

This function overrides the work area associated with a plan. Currently, the workAreaPolicy can be specified as CUFFT_WORKAREA_MINIMAL and cuFFT will attempt to re-plan to use zero bytes of work area memory. See the cuFFT documentation for support of other features.

integer(4) function cufftXtSetWorkAreaPolicy(plan, workAreaPolicy, workSize)
  integer(4) :: plan
  integer(4) :: workAreaPolicy
  integer(8) :: workSize

cufftXtGetSizeMany

This function gives a more accurate estimate than cufftEstimateMany() of the size of the work area required, in bytes, given the specified plan parameters used for cufftXtMakePlanMany and taking into account any plan settings which may have been made.

integer(4) function cufftXtGetSizeMany(plan, rank, n, inembed, istride, &
    idist, inputType, onembed, ostride, odist, outputType, batch, workSize, &
    executionType)
  integer(4) :: plan
  integer(4) :: rank
  integer(8) :: n(*)
  integer(8) :: inembed(*), onembed(*)
  integer(8) :: istride, idist, ostride, odist
  type(cudaDataType) :: inputType, outputType, executionType
  integer(4) :: batch
  integer(8) :: workSize(*)

cufftXtSetWorkArea

This function overrides the work areas associated with a plan. If the work area was auto-allocated, cuFFT frees the auto-allocated space. The cufftExecute*() calls assume that the work area pointer is valid and that it points to a contiguous region in device memory that does not overlap with any other work area. If this is not the case, results are indeterminate.

integer(4) function cufftXtSetWorkArea(plan, workArea)
  integer(4) :: plan
  type(c_devptr) :: workArea(*)

cufftXtSetDistribution

This function registers and describes the data distribution for a subsequent FFT operation. The call to cufftXtSetDistribution must occur after the call to cufftCreate but before the call to cufftMakePlan*.

integer(4) function cufftXtSetDistribution( plan, boxIn, boxOut )
  integer(4) :: plan
  type(cufftBox3d) :: boxIn
  type(cufftBox3d) :: boxOut

CUFFTXT Execution Functions

This section contains the execution functions, which perform the actual Fourier transform, in the cufftXt library.

cufftXtExec

This function executes any Fourier transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction argument is ignored. Otherwise, the transform direction is specified by the direction parameter. This function uses the GPU memory pointed to by input as input data, and stores the computed Fourier coefficients in the output array. If those are the same, this method does an in-place transform. Any valid data type for the input and output arrays is accepted.

integer(4) function cufftXtExec( plan, input, output, direction )
  integer :: plan
  real, dimension(*) :: input, output  ! Any data type is allowed
  integer :: direction

cufftXtExecDescriptor

This function executes any Fourier transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction argument is ignored. Otherwise, the transform direction is specified by the direction parameter. This function stores the result in the specified output arrays.

integer(4) function cufftXtExecDescriptor( plan, input, output, direction )
  integer :: plan
  type(cudaLibXtDesc) :: input, output
  integer :: direction

cufftXtExecDescriptorC2C

This function executes a single precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorC2C( plan, input, output, direction )
  integer :: plan
  type(cudaLibXtDesc) :: input, output
  integer :: direction
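
Putting the multi-GPU pieces together, a two-GPU in-place round trip might be sketched as follows. It assumes a system with at least two GPUs, numbered 0 and 1, and uses both the cufft and cufftxt modules. A forward transform followed by an inverse is used so that the check does not depend on the intermediate data ordering across GPUs. Sizes and names are illustrative.

```fortran
program multi_gpu_c2c
  use cufft
  use cufftxt
  implicit none
  integer, parameter :: nx = 256, ny = 256
  complex(4) :: a(nx, ny), b(nx, ny)
  type(cudaLibXtDesc), pointer :: a_d
  integer(4) :: plan, istat
  integer(4) :: gpus(2)
  integer(kind=int_ptr_kind()) :: workSize(2)   ! one size per GPU

  a = (1.0, 0.0)
  gpus = [0, 1]   ! assumption: devices 0 and 1 exist

  ! cufftXtSetGPUs goes between cufftCreate and cufftMakePlan*.
  istat = cufftCreate(plan)
  istat = cufftXtSetGPUs(plan, 2, gpus)
  istat = cufftMakePlan2d(plan, ny, nx, CUFFT_C2C, workSize)

  ! Allocate the descriptor and copy the host data to the GPUs.
  istat = cufftXtMalloc(plan, a_d, CUFFT_XT_FORMAT_INPLACE)
  istat = cufftXtMemcpy(plan, a_d, a, CUFFT_COPY_HOST_TO_DEVICE)

  ! In-place forward and inverse transforms.
  istat = cufftXtExecDescriptorC2C(plan, a_d, a_d, CUFFT_FORWARD)
  istat = cufftXtExecDescriptorC2C(plan, a_d, a_d, CUFFT_INVERSE)

  istat = cufftXtMemcpy(plan, b, a_d, CUFFT_COPY_DEVICE_TO_HOST)
  b = b / real(nx * ny)   ! undo the unnormalized scaling

  print *, 'max abs round-trip error: ', maxval(abs(a - b))

  istat = cufftXtFree(a_d)
  istat = cufftDestroy(plan)
end program multi_gpu_c2c
```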

cufftXtExecDescriptorZ2Z

This function executes a double precision complex-to-complex transform plan in the transform direction as specified by the direction parameter. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorZ2Z( plan, input, output, direction )
  integer :: plan
  type(cudaLibXtDesc) :: input, output
  integer :: direction

cufftXtExecDescriptorR2C

This function executes a single precision real-to-complex transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorR2C( plan, input, output )
  integer :: plan
  type(cudaLibXtDesc) :: input, output

cufftXtExecDescriptorD2Z

This function executes a double precision real-to-complex transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorD2Z( plan, input, output )
  integer :: plan
  type(cudaLibXtDesc) :: input, output

cufftXtExecDescriptorC2R

This function executes a single precision complex-to-real transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorC2R( plan, input, output )
  integer :: plan
  type(cudaLibXtDesc) :: input, output

cufftXtExecDescriptorZ2D

This function executes a double precision complex-to-real transform plan. This multiple GPU function currently supports in-place transforms only; the result will be stored in the input arrays.

integer(4) function cufftXtExecDescriptorZ2D( plan, input, output )
  integer :: plan
  type(cudaLibXtDesc) :: input, output

CUFFTMP Functions

This section contains the cuFFTMp functions which extend the cuFFTXt library functionality to multiple processes and multiple GPUs.

cufftMpNvshmemMalloc

This function allocates space from the NVSHMEM symmetric heap. The cuFFTMp library is based on NVSHMEM, but users are not allowed to link and use NVSHMEM in their own applications; doing so may cause a crash at application start time. This limitation will be lifted in a future release of cuFFTMp.

However, some functionality of cuFFTMp requires NVSHMEM-allocated memory, so this function is currently exposed and supported. This function requires that at least one cuFFTMp plan is active prior to its use.

integer(4) function cufftMpNvshmemMalloc( size, workArea )
  integer(8) :: size  ! Size is in bytes
  type(c_devptr) :: workArea

cufftMpNvshmemFree

This function frees the space previously allocated from the NVSHMEM symmetric heap. The cuFFTMp library is based on NVSHMEM, but users are not allowed to link and use NVSHMEM in their own applications; doing so may cause a crash at application start time. This limitation will be lifted in a future release of cuFFTMp.

However, some functionality of cuFFTMp requires NVSHMEM-allocated memory, so this function is currently exposed and supported. This function requires that at least one cuFFTMp plan is active prior to its use.

integer(4) function cufftMpNvshmemFree( workArea )
  type(c_devptr) :: workArea

cufftMpAttachComm

This function attaches a communicator, such as MPI_COMM_WORLD, to a cuFFT plan, for later application of a distributed FFT operation.

integer(4) function cufftMpAttachComm( plan, commType, fcomm )
  integer(4) :: plan
  integer(4) :: commType
  integer(4) :: fcomm
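
A minimal sketch of attaching an MPI communicator to a plan is shown below. It assumes an MPI installation that provides the mpi module, a cuFFTMp-enabled build, and a simple round-robin assignment of GPUs to ranks; real applications often select the device from a node-local rank instead. Only plan setup and teardown are shown.

```fortran
program attach_mpi_comm
  use mpi
  use cudafor
  use cufft
  use cufftxt
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  integer(4) :: plan, istat
  integer :: ierr, rank, ndev
  integer(kind=int_ptr_kind()) :: workSize(1)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Assumption: pick a GPU for this rank in round-robin fashion.
  istat = cudaGetDeviceCount(ndev)
  istat = cudaSetDevice(mod(rank, ndev))

  ! Attach the communicator after cufftCreate, before planning.
  istat = cufftCreate(plan)
  istat = cufftMpAttachComm(plan, CUFFT_COMM_MPI, MPI_COMM_WORLD)
  istat = cufftMakePlan3d(plan, nz, ny, nx, CUFFT_C2C, workSize)
  if (istat /= CUFFT_SUCCESS) then
    print *, 'rank ', rank, ': plan failed, status = ', istat
  end if

  istat = cufftDestroy(plan)
  call MPI_Finalize(ierr)
end program attach_mpi_comm
```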

cufftMpCreateReshape

This function creates a cuFFTMp reshape handle for later application of a distributed FFT operation.

integer(4) function cufftMpCreateReshape( reshapeHandle )
  type(c_ptr) :: reshapeHandle

cufftMpAttachReshapeComm

This function attaches a communicator, such as MPI_COMM_WORLD, to a cuFFTMp reshape handle, for later application of a distributed FFT operation.

integer(4) function cufftMpAttachReshapeComm( reshapeHandle, commType, fcomm )
  type(c_ptr) :: reshapeHandle
  integer(4) :: commType
  integer(4) :: fcomm

cufftMpGetReshapeSize

This function returns the size needed for work space in the subsequent cuFFTMp reshape execution. Currently, a work area is not required, but that may change in future releases.

integer(4) function cufftMpGetReshapeSize( reshapeHandle, workSize )
   type(c_ptr) :: reshapeHandle
   integer(8)  :: workSize

cufftMpMakeReshape

This function creates a cuFFTMp reshape plan based on the input and output boxes. Note that the boxes use C conventions for bounds and strides.

integer(4) function cufftMpMakeReshape( reshapeHandle, &
       elementSize, boxIn, boxOut )
   type(c_ptr) :: reshapeHandle
   integer(8)  :: elementSize
   type(cufftBox3d) :: boxIn
   type(cufftBox3d) :: boxOut

cufftMpExecReshapeAsync

This function executes a cuFFTMp reshape plan on the specified stream.

integer(4) function cufftMpExecReshapeAsync( reshapeHandle, &
       dataOut, dataIn, workSpace, stream )
   type(c_ptr) :: reshapeHandle
   type(c_devptr) :: dataOut
   type(c_devptr) :: dataIn
   type(c_devptr) :: workSpace
   integer(kind=cuda_stream_kind) :: stream

cufftMpDestroyReshape

This function destroys a cuFFTMp reshape handle.

integer(4) function cufftMpDestroyReshape( reshapeHandle )
  type(c_ptr) :: reshapeHandle