API reference
Welcome to the cuFFTMp API documentation. This page describes all functions used with cuFFTMp.
Warning
As described in Versioning, the single-GPU and single-process, multi-GPU functionalities of cuFFT and cuFFTMp are identical when their versions match. However, multi-process functionalities are only available on cuFFTMp. This section documents only the APIs relevant for cuFFTMp.
Plan creation, execution and destruction
cufftCreate and cufftDestroy
type cufftHandle
An opaque handle to a cuFFTMp plan.
cufftResult cufftCreate(cufftHandle *plan)
Creates only an opaque handle and allocates small data structures on the host. The cufftMakePlan*() calls perform the actual plan generation.
- Parameters
plan[In] – Pointer to a cufftHandle object.
plan[Out] – Contains a cuFFT plan handle value.
- Return values
CUFFT_SUCCESS – cuFFTMp successfully created the FFT plan.
CUFFT_ALLOC_FAILED – The allocation of resources for the plan failed.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFTMp library failed to initialize.
cufftResult cufftDestroy(cufftHandle plan)
Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. Call this function once a plan is no longer needed, to avoid wasting GPU memory.
- Parameters
plan[In] – The cufftHandle object of the plan to be destroyed.
- Return values
CUFFT_SUCCESS – cuFFT successfully destroyed the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
cufftSetStream
cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream)
Associates a CUDA stream with a cuFFT plan. All kernel launches made during plan execution then go through the associated stream, enabling overlap with activity in other streams (e.g. data copying). The association remains until the plan is destroyed or the stream is changed with another call to cufftSetStream().
- Parameters
plan[In] – The cufftHandle object to associate with the stream.
stream[In] – A valid CUDA stream created with cudaStreamCreate(); 0 for the default stream.
- Return values
CUFFT_SUCCESS – The stream was associated with the plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
cufftMpAttachComm
enum cufftMpCommType
An enumeration describing the kind of a communication handle. Only CUFFT_COMM_MPI is valid, in which case the associated communication handle needs to be a pointer to an MPI communicator.
enumerator CUFFT_COMM_MPI
Indicates that the communication handle is a pointer to an MPI communicator.
cufftResult cufftMpAttachComm(cufftHandle plan, cufftMpCommType comm_type, void *comm_handle)
cufftMpAttachComm attaches a communication handle to the plan and enables the multi-process API. comm_type is an enum indicating the type of the communication handle, and comm_handle is a pointer to the handle. The only valid value for comm_type is CUFFT_COMM_MPI.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
comm_type[In] – An enum indicating the type of the communication handle. Only CUFFT_COMM_MPI is supported.
comm_handle[In] – A pointer to a communication handle. The pointed-to object must outlive plan creation, execution and destruction.
- Return values
CUFFT_SUCCESS – cuFFT successfully associated the communication handle with the plan.
CUFFT_INVALID_PLAN – The plan is not valid.
CUFFT_INVALID_VALUE – comm_type is not CUFFT_COMM_MPI.
Warning
The underlying MPI_Comm type of comm_handle needs to be consistent with the NVSHMEM MPI bootstrap. If the MPI bootstrap was built with an incompatible MPI implementation, behaviour is undefined. Note that the extra_bootstraps directory in the code samples shows how to build a custom MPI bootstrap for a custom MPI implementation.
cufftXtSetDistribution
type cufftBox3d
size_t lower[3]
size_t upper[3]
size_t strides[3]
Consider a global array of size X*Y*Z and a cufftBox3d box that describes the current data distribution across processes. Let p be a point of the global X*Y*Z array. Then p belongs to the current process if and only if box.lower[i] <= p[i] < box.upper[i] for i = 0, 1 and 2. The strides, on the other hand, describe the local data layout: box.strides[i] is the distance, in number of elements, between successive elements in dimension i.
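The ownership rule and the role of the strides can be illustrated with a small sketch. It is not part of the cuFFTMp API: `Box3D` is a stand-in struct that only mirrors the three documented fields of cufftBox3d so the code compiles without the library headers, and `box_contains` and `local_offset` are hypothetical helper names. The sketch also assumes that the local buffer starts at the lower corner of the box.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in mirroring the documented fields of cufftBox3d (assumption: used
// here only so the sketch compiles without the cuFFTMp headers).
struct Box3D {
    std::size_t lower[3];
    std::size_t upper[3];
    std::size_t strides[3];
};

// True if global point p belongs to the box, i.e.
// lower[i] <= p[i] < upper[i] for i = 0, 1 and 2.
bool box_contains(const Box3D& box, const std::size_t p[3]) {
    for (int i = 0; i < 3; ++i) {
        if (p[i] < box.lower[i] || p[i] >= box.upper[i]) return false;
    }
    return true;
}

// Offset, in elements, of global point p inside the local buffer.
// Assumes the buffer's first element is the box's lower corner.
std::size_t local_offset(const Box3D& box, const std::size_t p[3]) {
    assert(box_contains(box, p));
    std::size_t offset = 0;
    for (int i = 0; i < 3; ++i) {
        offset += (p[i] - box.lower[i]) * box.strides[i];
    }
    return offset;
}
```

For example, with lower = {2, 0, 0}, upper = {4, 3, 5} and contiguous strides = {15, 5, 1}, the point (3, 2, 4) is owned by the process and sits at offset (3 - 2) * 15 + 2 * 5 + 4 = 29 in the local buffer.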
cufftResult cufftXtSetDistribution(cufftHandle plan, const cufftBox3d *box_in, const cufftBox3d *box_out)
cufftXtSetDistribution indicates to the plan that the input and output descriptors may use the CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT subformats. In that case, the input and output data are assumed to be distributed according to box_in and box_out, respectively.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
box_in[In] – Pointer to the input box. For an R2C or C2R plan, box_in should describe the real data distribution.
box_out[In] – Pointer to the output box. For an R2C or C2R plan, box_out should describe the complex data distribution.
- Return values
CUFFT_SUCCESS – cuFFTMp successfully associated the plan with the input and output boxes.
CUFFT_INVALID_PLAN – The plan is not valid.
CUFFT_INVALID_VALUE – Either box_in or box_out is not a valid pointer.
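As an illustration of what box_in and box_out might contain, the sketch below computes slab-shaped boxes for one rank: a slab along X for the input and a slab along Y for the output. It is only a sketch under stated assumptions: `Box3D` is a stand-in for cufftBox3d, `chunk_start` and `make_slab` are hypothetical helpers, the rule that the first n % size ranks receive one extra plane is an assumed splitting convention, and the local data is assumed to be stored contiguously with Z fastest.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in mirroring the documented fields of cufftBox3d (assumption).
struct Box3D {
    std::size_t lower[3];
    std::size_t upper[3];
    std::size_t strides[3];
};

// First index owned by `rank` when n planes are split over `size` ranks.
// Assumed convention: the first (n % size) ranks get one extra plane.
std::size_t chunk_start(std::size_t n, std::size_t rank, std::size_t size) {
    assert(size > 0 && rank <= size);
    const std::size_t base = n / size;
    const std::size_t extra = n % size;
    return rank * base + (rank < extra ? rank : extra);
}

// Box for the slab owned by `rank` in an nx*ny*nz global array that is
// split along `axis` (0 = X, 1 = Y). Strides describe a contiguous local
// layout with Z as the fastest-changing dimension.
Box3D make_slab(std::size_t nx, std::size_t ny, std::size_t nz,
                int axis, std::size_t rank, std::size_t size) {
    const std::size_t n[3] = {nx, ny, nz};
    Box3D box;
    for (int i = 0; i < 3; ++i) {
        box.lower[i] = 0;
        box.upper[i] = n[i];
    }
    box.lower[axis] = chunk_start(n[axis], rank, size);
    box.upper[axis] = chunk_start(n[axis], rank + 1, size);
    const std::size_t extent_y = box.upper[1] - box.lower[1];
    const std::size_t extent_z = box.upper[2] - box.lower[2];
    box.strides[0] = extent_y * extent_z;
    box.strides[1] = extent_z;
    box.strides[2] = 1;
    return box;
}
```

With a 5*4*3 global array on 2 ranks, `make_slab(5, 4, 3, 0, 0, 2)` covers x in [0, 3) with strides {12, 3, 1}, and `make_slab(5, 4, 3, 1, 1, 2)` covers y in [2, 4) with strides {6, 3, 1}. The fields of such boxes could then be copied into the cufftBox3d objects passed as box_in and box_out.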
cufftMakePlan
cufftResult cufftMakePlan2d(cufftHandle plan, int nx, int ny, cufftType type, size_t *workSize)
cufftResult cufftMakePlan3d(cufftHandle plan, int nx, int ny, int nz, cufftType type, size_t *workSize)
Following a call to cufftCreate, makes a 2D (resp. 3D) FFT plan configuration according to the specified signal sizes and data type. This call can only be used once for a given handle. It fails and returns CUFFT_INVALID_PLAN if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufftMakePlan call.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
nx[In] – The transform size in the x dimension. This is the slowest-changing dimension of a transform (strided in memory).
ny[In] – The transform size in the y dimension.
nz[In] – The transform size in the z dimension. This is the fastest-changing dimension of a transform (contiguous in memory).
type[In] – The transform data type (e.g., CUFFT_R2C for single precision real to complex).
*workSize[Out] – Pointer to the size(s), in bytes, of the work areas.
- Return values
CUFFT_SUCCESS – cuFFT successfully created the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a supported size.
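The dimension ordering described for nx, ny and nz corresponds to a row-major layout. As a minimal sketch, assuming a fully contiguous, unpadded 3D array (`linear_index_3d` is a hypothetical helper, not a cuFFTMp function), the position of element (x, y, z) in memory could be computed as:

```cpp
#include <cassert>
#include <cstddef>

// Linear index of element (x, y, z) in a contiguous nx*ny*nz array in which
// x is the slowest-changing dimension and z the fastest (contiguous) one.
// nx does not appear in the formula; it only bounds x.
std::size_t linear_index_3d(std::size_t x, std::size_t y, std::size_t z,
                            std::size_t ny, std::size_t nz) {
    assert(y < ny && z < nz);
    return (x * ny + y) * nz + z;
}
```

Incrementing z moves to the adjacent element, while incrementing x jumps by ny * nz elements.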
cufftXtExecDescriptor
cufftResult cufftXtExecDescriptor(cufftHandle plan, cudaLibXtDesc *input, cudaLibXtDesc *output, int direction)
cufftXtExecDescriptor executes any cuFFT transform regardless of precision and type. For complex-to-real and real-to-complex transforms, the direction parameter is ignored. cuFFT uses the GPU memory pointed to by the cudaLibXtDesc *input descriptor as input data and by the cudaLibXtDesc *output descriptor as output data.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
input[In] – Pointer to the complex input data (in GPU memory) to transform.
output[In] – Pointer to the complex output data (in GPU memory).
direction[In] – The transform direction: CUFFT_FORWARD or CUFFT_INVERSE. Ignored for complex-to-real and real-to-complex transforms.
- Return values
CUFFT_SUCCESS – cuFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE – At least one of the parameters input and output is not valid.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_EXEC_FAILED – cuFFT failed to execute the transform on the GPU.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in a descriptor.
Descriptors
enum cufftXtSubFormat
enumerator CUFFT_XT_FORMAT_INPLACE
Describes a built-in slab data distribution along the X axis.
enumerator CUFFT_XT_FORMAT_INPLACE_SHUFFLED
Describes a built-in slab data distribution along the Y axis.
enumerator CUFFT_XT_FORMAT_DISTRIBUTED_INPUT
Describes a data distribution given by the box_in argument of cufftXtSetDistribution.
enumerator CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT
Describes a data distribution given by the box_out argument of cufftXtSetDistribution.
enum cufftXtCopyType
enumerator CUFFT_COPY_HOST_TO_DEVICE
Copies data from a host CPU buffer to the device descriptor. Data should be distributed according to the descriptor’s subformat. This does not redistribute data across processes.
enumerator CUFFT_COPY_DEVICE_TO_HOST
Copies data from the device descriptor to a host CPU buffer. Data will be distributed according to the descriptor’s subformat. This does not redistribute data across processes.
enumerator CUFFT_COPY_DEVICE_TO_DEVICE
Redistributes data from one device descriptor to another.
cufftXtMalloc and cufftXtFree
cufftResult cufftXtMalloc(cufftHandle plan, cudaLibXtDesc **descriptor, cufftXtSubFormat format)
cufftXtMalloc allocates a descriptor, and all memory for data in the GPUs associated with the plan, and returns a pointer to the descriptor. Note that the descriptor contains an array of device pointers so that the application may preprocess or postprocess the data on the GPUs. The format parameter, of enumerated type cufftXtSubFormat, indicates whether the buffer will be used for input or output.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
descriptor[In/Out] – Pointer to a pointer to a cudaLibXtDesc object.
format[In] – cufftXtSubFormat value.
- Return values
CUFFT_SUCCESS – cuFFT successfully allocated the descriptor and GPU memory.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle or it is not a multiple GPU plan.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in the descriptor.
cufftResult cufftXtFree(cudaLibXtDesc *descriptor)
cufftXtFree frees the descriptor and all memory associated with it. The descriptor and memory must have been returned by a previous call to cufftXtMalloc.
- Parameters
descriptor[In] – Pointer to a cudaLibXtDesc object.
- Return values
CUFFT_SUCCESS – cuFFT successfully freed the descriptor and associated GPU memory.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
cufftXtMemcpy
cufftResult cufftXtMemcpy(cufftHandle plan, void *dstPointer, void *srcPointer, cufftXtCopyType type)
cufftXtMemcpy copies data between buffers on the host and GPUs or between GPUs. The type parameter, of enumerated type cufftXtCopyType, indicates the type and direction of the transfer.
This function is synchronous with respect to the host. In particular, if a stream was associated with the plan, the stream should be synchronized before calling cufftXtMemcpy.
- Parameters
plan[In] – cufftHandle returned by cufftCreate.
dstPointer[Out] – Pointer to the destination address(es).
srcPointer[In] – Pointer to the source address(es).
type[In] – cufftXtCopyType value.
- Return values
CUFFT_SUCCESS – cuFFT successfully copied memory between host and GPUs or between GPUs.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in a descriptor.
Standalone Reshape
type cufftReshapeHandle
An opaque handle to a reshape operation.
cufftMpCreateReshape
cufftResult cufftMpCreateReshape(cufftReshapeHandle *handle)
This function initializes a reshape handle for future use. This function is not collective.
- Parameters
handle[In/Out] – A pointer to an opaque cufftReshapeHandle object.
- Return values
CUFFT_SUCCESS – cuFFT successfully created a reshape handle.
CUFFT_ALLOC_FAILED – cuFFT failed to allocate enough host memory for the handle.
cufftMpAttachReshapeComm
cufftResult cufftMpAttachReshapeComm(cufftReshapeHandle handle, cufftMpCommType comm_type, void *comm_handle)
This function attaches a communication handle to a reshape. This function is not collective.
- Parameters
handle[In] – A handle to a reshape operation, following cufftMpCreateReshape.
comm_type[In] – An enum describing the communication type of the handle. Should be CUFFT_COMM_MPI.
comm_handle[In] – A pointer to the communication handle. If comm_type is CUFFT_COMM_MPI, this should be a pointer to an MPI communicator. The pointer should remain valid until destruction of the handle.
- Return values
CUFFT_SUCCESS – cuFFT successfully associated a communication handle with the reshape.
CUFFT_INVALID_VALUE – comm_handle is null or the handle is invalid.
cufftMpMakeReshape
cufftResult cufftMpMakeReshape(cufftReshapeHandle handle, size_t element_size, const cufftBox3d *box_in, const cufftBox3d *box_out)
This function creates a reshape intended to redistribute a global array of data. The data is initially distributed, on the current process, according to *box_in. After the reshape, the data will be distributed according to *box_out. *box_in and *box_out represent the portion of the global array that the current process owns, before and after the reshape. Each element is of size element_size, in bytes. This function is collective and should be called by all processes together.
- Parameters
handle[In] – The reshape handle.
element_size[In] – The size of the individual elements, in bytes. Allowed values are 4, 8 and 16.
box_in[In] – A pointer to the box describing the input data distribution on the current process.
box_out[In] – A pointer to the box describing the output data distribution on the current process.
- Return values
CUFFT_SUCCESS – cuFFT successfully created the reshape operation.
CUFFT_INVALID_VALUE – The handle is invalid or cufftMpAttachReshapeComm was not called.
CUFFT_ALLOC_FAILED – cuFFT failed to allocate enough host and/or device memory for the handle.
CUFFT_INTERNAL_ERROR – cuFFT failed to initialize the underlying communication library.
Note
The new experimental multi-node implementation can be chosen by defining CUFFT_RESHAPE_USE_PACKING=1 in the environment. This requires scratch space but provides improved performance over InfiniBand.
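When preparing the boxes for a reshape, it can help to check element_size and to estimate how many bytes each local box spans. The sketch below is not part of the cuFFTMp API: `Box3D` is a stand-in for cufftBox3d, the helper names are hypothetical, and it assumes that every box extent is non-zero and that the local buffer starts at the lower corner of the box.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in mirroring the documented fields of cufftBox3d (assumption).
struct Box3D {
    std::size_t lower[3];
    std::size_t upper[3];
    std::size_t strides[3];
};

// The parameter description above allows element sizes of 4, 8 and 16 bytes.
bool is_valid_element_size(std::size_t element_size) {
    return element_size == 4 || element_size == 8 || element_size == 16;
}

// Number of elements between the first and last element of the box in
// local memory, inclusive. Handles padded (non-contiguous) strides.
std::size_t box_span_elements(const Box3D& box) {
    std::size_t span = 1;
    for (int i = 0; i < 3; ++i) {
        assert(box.upper[i] > box.lower[i]);  // non-empty extent assumed
        span += (box.upper[i] - box.lower[i] - 1) * box.strides[i];
    }
    return span;
}

// Bytes spanned by the local box for a given element size.
std::size_t box_span_bytes(const Box3D& box, std::size_t element_size) {
    assert(is_valid_element_size(element_size));
    return box_span_elements(box) * element_size;
}
```

For a contiguous 2*3*4 box with strides {12, 4, 1}, the span is 24 elements, i.e. 192 bytes with 8-byte elements. If the X stride is padded to 16, the span grows to 28 elements.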
cufftMpGetReshapeSize
cufftResult cufftMpGetReshapeSize(cufftReshapeHandle handle, size_t *workspace_size)
Returns the amount (in bytes) of workspace required to execute the handle. There is no guarantee that workspace_size will or will not change between versions of cuFFTMp.
- Parameters
handle[In] – A handle created using cufftMpCreateReshape.
workspace_size[Out] – The size, in bytes, of the workspace required during reshape execution.
- Return values
CUFFT_SUCCESS – cuFFT successfully returned the workspace size.
cufftMpExecReshapeAsync
cufftResult cufftMpExecReshapeAsync(cufftReshapeHandle handle, void *data_out, const void *data_in, void *workspace, cudaStream_t stream)
Executes the reshape, redistributing data_in into data_out using the workspace in workspace. This function executes in the stream stream. This function is collective and stream-ordered. The user is responsible for ensuring that all GPUs involved in the communication are able to synchronize in the stream(s); otherwise deadlocks may occur.
- Parameters
handle[In] – The reshape handle.
data_out[Out] – A symmetric-heap pointer to the output data. This memory should be NVSHMEM allocated and identical on all processes.
data_in[In] – A symmetric-heap pointer to the input data. This memory should be NVSHMEM allocated and identical on all processes.
workspace[Out] – A symmetric-heap pointer to the workspace data. This memory should be NVSHMEM allocated and identical on all processes.
stream[In] – The CUDA stream in which to run the reshape operation.
- Return values
CUFFT_SUCCESS – cuFFT successfully executed the reshape operation.
CUFFT_INVALID_VALUE – cufftMpMakeReshape was not called prior to this function.
CUFFT_INTERNAL_ERROR – An error occurred during kernel execution.
cufftMpDestroyReshape
cufftResult cufftMpDestroyReshape(cufftReshapeHandle handle)
Destroys a reshape and all its associated data.
- Parameters
handle[In] – The reshape handle to destroy.
- Return values
CUFFT_SUCCESS – The handle was successfully destroyed.