API reference#
This section describes all the cuFFTMp functions.
Warning
As described in Versioning, the single-GPU and single-process, multi-GPU functionalities of cuFFT and cuFFTMp are identical when their versions match. However, multi-process functionalities are only available on cuFFTMp. This section documents only the APIs relevant for cuFFTMp.
Plan creation, execution and destruction#
cufftCreate and cufftDestroy#
-
type cufftHandle#
An opaque handle to a cuFFTMp plan.
-
cufftResult cufftCreate(cufftHandle *plan)#
Creates only an opaque handle, and allocates small data structures on the host. The cufftMpMakePlan*() calls do the actual plan generation.
- Parameters:
plan[In] – Pointer to a cufftHandle object
plan[Out] – Contains a cuFFT plan handle value
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully created the FFT plan
CUFFT_ALLOC_FAILED – The allocation of resources for the plan failed
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API
CUFFT_INTERNAL_ERROR – An internal driver error was detected
CUFFT_SETUP_FAILED – The cuFFTMp library failed to initialize.
-
cufftResult cufftDestroy(cufftHandle plan)#
Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. This function should be called once a plan is no longer needed, to avoid wasting GPU memory.
- Parameters:
plan[In] – The cufftHandle object of the plan to be destroyed.
- Return values:
CUFFT_SUCCESS – cuFFT successfully destroyed the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
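For illustration, a minimal sketch of the handle lifecycle, assuming the cufftMp.h header shipped with cuFFTMp; error handling is shortened and the multi-process plan creation and execution steps are elided:
#include <cufftMp.h>
#include <stdio.h>

int main(void) {
    cufftHandle plan;
    // Creates only the opaque handle; plan generation happens later in a cufftMpMakePlan*() call.
    cufftResult status = cufftCreate(&plan);
    if (status != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftCreate failed: %d\n", (int)status);
        return 1;
    }
    // ... attach a communicator / make the plan / execute ...
    // Free all GPU resources associated with the plan.
    cufftDestroy(plan);
    return 0;
}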
cufftSetStream#
-
cufftResult cufftSetStream(cufftHandle plan, cudaStream_t stream)#
Associates a CUDA stream with a cuFFT plan. All kernel launches made during plan execution are now done through the associated stream, enabling overlap with activity in other streams (e.g. data copying). The association remains until the plan is destroyed or the stream is changed with another call to cufftSetStream().
- Parameters:
plan[In] – The cufftHandle object to associate with the stream
stream[In] – A valid CUDA stream created with cudaStreamCreate(); 0 for the default stream
- Return values:
CUFFT_SUCCESS – cuFFT successfully associated the stream with the plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
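A short sketch of associating a user-created stream with a plan; plan is assumed to come from cufftCreate and the surrounding CUDA/MPI setup is elided:
cudaStream_t stream;
cudaStreamCreate(&stream);

// All kernel launches made while executing this plan now go through `stream`.
cufftSetStream(plan, stream);

// ... make the plan and execute; work is enqueued in `stream` ...

cudaStreamSynchronize(stream);   // wait for the plan's work before cleanup
cufftDestroy(plan);
cudaStreamDestroy(stream);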
cufftMpAttachComm#
-
enum cufftMpCommType#
An enumeration describing the kind of a communication handle and how to initialize cuFFTMp and NVSHMEM.
-
enumerator CUFFT_COMM_MPI#
Indicates that the communication handle is a pointer to an MPI communicator. In this case cuFFTMp will initialize NVSHMEM only for the processes belonging to the MPI communicator. This is equivalent to calling nvshmem_init_attr with NVSHMEMX_INIT_WITH_MPI_COMM (see the NVSHMEM documentation).
-
enumerator CUFFT_COMM_NONE#
Indicates that the communication handle is NULL. In this case, cuFFTMp will initialize NVSHMEM for all processes in the program. This is equivalent to calling nvshmem_init (see the NVSHMEM documentation).
NVSHMEM, and cuFFTMp, will be bootstrapped and initialized according to the value of the environment variable NVSHMEM_BOOTSTRAP. The default is NVSHMEM_BOOTSTRAP=PMI, in which case PMI will be used to bootstrap NVSHMEM and cuFFTMp. In that case, all cuFFTMp APIs (cufftMpAttachComm, cufftMakePlan, etc.) need to be called by all processes managed by PMI. If NVSHMEM_BOOTSTRAP=MPI, then all cuFFTMp APIs must be called by all processes in MPI_COMM_WORLD. More information on bootstrapping and initialization can be found in the NVSHMEM documentation.
- cufftResult cufftMpAttachComm(
- cufftHandle plan,
- cufftMpCommType comm_type,
- void *comm_handle)
Warning
This function is deprecated and will be removed in a future release. Use the new cufftMpMakePlan functions and cufftMpMakePlanDecomposition function, which combine communicator attachment and plan creation steps in a single API.
cufftMpAttachComm attaches a communication handle to the plan and enables the multi-process API. comm_type is an enum indicating the type of the communication handle, and comm_handle is a pointer to the handle. Both the pointer and the underlying handle should remain valid up until cufftDestroy is called.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
comm_type[In] – An enum indicating the type of the communication handle.
comm_handle[In] – A pointer to a communication handle. The lifetime of the pointed-to object needs to exceed plan creation, execution and destruction.
- Return values:
CUFFT_SUCCESS – cuFFT successfully associated the communication handle with the plan.
CUFFT_INVALID_PLAN – The plan is not valid.
CUFFT_INVALID_VALUE – comm_handle is null for CUFFT_COMM_MPI, or comm_handle is not null for CUFFT_COMM_NONE.
Warning
When using comm_type == CUFFT_COMM_MPI, comm_handle should point to an MPI communicator of type MPI_Comm.
The MPI implementation should be consistent with the NVSHMEM MPI bootstrap, which is built for OpenMPI.
Using another MPI implementation requires a different NVSHMEM MPI bootstrap, otherwise behaviour is undefined.
The extra_bootstraps directory in the code samples shows how to build a custom MPI bootstrap for a custom
MPI implementation.
Warning
When using comm_type == CUFFT_COMM_MPI, the communicator should remain valid from plan creation to destruction.
This means that if the communicator is MPI_COMM_WORLD, MPI_Finalize needs to be called after cufftDestroy.
If the communicator is a custom-built communicator, MPI_Comm_free needs to be called after cufftDestroy.
Warning
When comm_handle is a pointer to a communicator, the pointer’s lifetime should exceed the plan creation and destruction.
This means the following is invalid:
{
    MPI_Comm comm = MPI_COMM_WORLD;
    void* comm_handle = &comm;
    cufftMpAttachComm(plan, CUFFT_COMM_MPI, comm_handle);
} // comm goes out of scope and &comm is dangling
cufftMakePlan(...) // &comm is now dangling and behaviour is undefined
...
cufftDestroy(...) // &comm is now dangling and behaviour is undefined
In cuFFTMp 11.4.0, the new cufftMpMakePlan functions and the cufftMpMakePlanDecomposition function were introduced to eliminate this issue.
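For comparison, a sketch of valid scoping with the deprecated two-step API: the communicator outlives the plan, and MPI is finalized only after cufftDestroy (nx, ny, nz are illustrative global sizes):
MPI_Comm comm = MPI_COMM_WORLD;   // stays in scope for the whole plan lifetime
cufftHandle plan;
size_t workspace;

cufftCreate(&plan);
cufftMpAttachComm(plan, CUFFT_COMM_MPI, &comm);            // &comm remains valid
cufftMakePlan3d(plan, nx, ny, nz, CUFFT_C2C, &workspace);  // actual plan generation
// ... execution ...
cufftDestroy(plan);   // destroy the plan first
MPI_Finalize();       // only after cufftDestroy, per the warning above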
cufftXtSetDistribution#
Warning
This function is deprecated and will be removed in a future release. Use the new cufftMpMakePlanDecomposition function, which combines communicator attachment, distribution setup, and plan creation in a single API.
- cufftResult cufftXtSetDistribution(
- cufftHandle plan,
- int rank,
- const long long int *lower_input,
- const long long int *upper_input,
- const long long int *lower_output,
- const long long int *upper_output,
- const long long int *strides_input,
- const long long int *strides_output)
cufftXtSetDistribution indicates to the plan that the input and output descriptors will be of type CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT. In that case, the input and output data will be assumed to be distributed according to (lower_input, upper_input) and (lower_output, upper_output), respectively. (lower_input, upper_input) describes the section of the global nx * ny (if rank is 2) or nx * ny * nz (if rank is 3) space owned by the current process, and similarly for (lower_output, upper_output). strides_input and strides_output describe the data layout in memory of the input and output, respectively. The local data layout needs to be in row-major order, possibly with padding between dimensions. The number of elements should be at least (upper_input[0] - lower_input[0]) * strides_input[0] for the input and (upper_output[0] - lower_output[0]) * strides_output[0] for the output. All six input arrays of the function may be freed immediately after the function returns.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
rank[In] – The rank of the transform, and the length of the lower_input, upper_input, lower_output, upper_output, strides_input and strides_output arrays. rank should be 2 or 3.
lower_input[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
upper_input[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
lower_output[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
upper_output[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
strides_input[In] – An array of length rank, representing the local data layout of the input descriptor in memory. All entries must be positive and decreasing.
strides_output[In] – An array of length rank, representing the local data layout of the output descriptor in memory. All entries must be positive and decreasing.
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully associated the plan with the input and output boxes.
CUFFT_INVALID_PLAN – The plan is not valid.
CUFFT_INVALID_VALUE – Either rank is not 2 or 3, the strides are not positive and decreasing, or the lower/upper arrays are not valid.
cufftMpMakePlanDecomposition#
- cufftResult cufftMpMakePlanDecomposition(
- cufftHandle plan,
- int rank,
- int *n,
- const long long int *lower_input,
- const long long int *upper_input,
- const long long int *strides_input,
- const long long int *lower_output,
- const long long int *upper_output,
- const long long int *strides_output,
- cufftType type,
- void *comm_handle,
- cufftMpCommType comm_type,
- size_t *workSize)
cufftMpMakePlanDecomposition indicates to the plan that the input and output descriptors will be of type CUFFT_XT_FORMAT_DISTRIBUTED_INPUT and CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT. In that case, the input and output data will be assumed to be distributed according to (lower_input, upper_input, strides_input) and (lower_output, upper_output, strides_output), respectively. (lower_input, upper_input, strides_input) describes the section of the global nx * ny (if rank is 2) or nx * ny * nz (if rank is 3) space owned by the current process, and similarly for (lower_output, upper_output, strides_output). The local data layout needs to be in row-major order, possibly with padding between dimensions. The number of elements should be at least (upper_input[0] - lower_input[0]) * strides_input[0] for the input and (upper_output[0] - lower_output[0]) * strides_output[0] for the output. All six input arrays of the function may be freed immediately after the function returns.
cufftMpMakePlanDecomposition is a convenience function that combines cufftXtSetDistribution, cufftMpAttachComm, and cufftMakePlan2d/cufftMakePlan3d into a single API call. It sets up a custom data distribution across processes, attaches the communication handle to the plan, and creates a plan for FFT execution. This helps ensure the communicator does not go out of scope during the planning phase and simplifies the API usage.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate.
rank[In] – The dimensionality of the transform (should be 2 or 3).
n[In] – Array specifying the size of the transform in each dimension.
lower_input[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
upper_input[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
strides_input[In] – An array of length rank, representing the local data layout of the input descriptor in memory. All entries must be positive and decreasing.
lower_output[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
upper_output[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
strides_output[In] – An array of length rank, representing the local data layout of the output descriptor in memory. All entries must be positive and decreasing.
type[In] – The transform type (e.g., CUFFT_R2C, CUFFT_C2C).
comm_handle[In] – A pointer to a communication handle.
comm_type[In] – An enum indicating the type of the communication handle.
*workSize[Out] – Pointer to the size(s), in bytes, of the work areas.
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully created the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_SIZE – One or more parameters is not supported.
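As an illustration, a sketch of a custom slab decomposition built with this API: X-slabs on input, Y-slabs on output, row-major unpadded layouts. It assumes MPI is initialized and that nx and ny are divisible by the number of processes; all names and sizes are illustrative:
long long nx = 256, ny = 256, nz = 256;   // global grid (illustrative)
int n[3] = {(int)nx, (int)ny, (int)nz};

int nranks, me;
MPI_Comm_size(MPI_COMM_WORLD, &nranks);
MPI_Comm_rank(MPI_COMM_WORLD, &me);

// Portion of the global array owned by this process: X-slab in, Y-slab out.
long long lower_in[3]  = {nx / nranks * me,       0,                      0};
long long upper_in[3]  = {nx / nranks * (me + 1), ny,                     nz};
long long lower_out[3] = {0,                      ny / nranks * me,       0};
long long upper_out[3] = {nx,                     ny / nranks * (me + 1), nz};

// Row-major, no padding: each stride is the product of the faster-varying local extents.
long long strides_in[3]  = {(upper_in[1] - lower_in[1]) * (upper_in[2] - lower_in[2]),
                            upper_in[2] - lower_in[2], 1};
long long strides_out[3] = {(upper_out[1] - lower_out[1]) * (upper_out[2] - lower_out[2]),
                            upper_out[2] - lower_out[2], 1};

MPI_Comm comm = MPI_COMM_WORLD;
cufftHandle plan;
size_t workspace;
cufftCreate(&plan);
cufftMpMakePlanDecomposition(plan, 3, n,
                             lower_in,  upper_in,  strides_in,
                             lower_out, upper_out, strides_out,
                             CUFFT_C2C, &comm, CUFFT_COMM_MPI, &workspace);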
cufftXtSetSubformatDefault#
- cufftResult cufftXtSetSubformatDefault(
- cufftHandle plan,
- cufftXtSubFormat subformat_forward,
- cufftXtSubFormat subformat_inverse)
cufftXtSetSubformatDefault indicates the data distribution expected by cufftXtExec or cufftExec*. cufftXtSetSubformatDefault must be called prior to using the cufftXtExec or cufftExec* APIs.
When doing a forward transform (e.g., cufftExecC2C(..., CUFFT_FORWARD) or cufftExecR2C), the input data distribution is described by subformat_forward and the output by subformat_inverse. When doing an inverse transform (e.g., cufftExecC2C(..., CUFFT_INVERSE) or cufftExecC2R), the input data distribution is described by subformat_inverse and the output by subformat_forward.
subformat_forward and subformat_inverse must be opposite from each other. The opposite of CUFFT_XT_FORMAT_INPLACE is CUFFT_XT_FORMAT_INPLACE_SHUFFLED (and vice-versa). The opposite of CUFFT_XT_FORMAT_DISTRIBUTED_INPUT is CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT (and vice-versa).
cufftXtExecDescriptor and cufftXtExec (or cufftExec*) can both be used when cufftXtSetSubformatDefault has been applied to a plan.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
subformat_forward[In] – The input subformat for a forward transform. Must be CUFFT_XT_FORMAT_INPLACE, CUFFT_XT_FORMAT_INPLACE_SHUFFLED, CUFFT_XT_FORMAT_DISTRIBUTED_INPUT or CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT.
subformat_inverse[In] – The input subformat for an inverse transform. Must be the opposite of subformat_forward.
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully associated the default subformats with the plan.
CUFFT_INVALID_PLAN – The plan is not valid.
CUFFT_INVALID_VALUE – subformat_forward is not one of the four accepted values, or subformat_inverse is not the opposite of subformat_forward.
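A brief sketch, assuming plan is an existing multi-process plan that uses the built-in Slab distributions:
// Forward input is distributed as X-slabs (CUFFT_XT_FORMAT_INPLACE); forward output,
// and therefore inverse input, as Y-slabs (CUFFT_XT_FORMAT_INPLACE_SHUFFLED).
cufftXtSetSubformatDefault(plan, CUFFT_XT_FORMAT_INPLACE, CUFFT_XT_FORMAT_INPLACE_SHUFFLED);

// cufftXtExec / cufftExec* can now be called on NVSHMEM-allocated buffers (see below).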
cufftMakePlan#
- cufftResult cufftMakePlan2d(
- cufftHandle plan,
- int nx,
- int ny,
- cufftType type,
- size_t *workSize)
- cufftResult cufftMakePlan3d(
- cufftHandle plan,
- int nx,
- int ny,
- int nz,
- cufftType type,
- size_t *workSize)
Warning
These functions are deprecated for multi-process plans. Use the new cufftMpMakePlan functions below, which combine communicator attachment and plan creation in a single API.
Following a call to cufftCreate, makes a 2D (resp. 3D) FFT plan configuration according to specified signal sizes and data type. This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufftMakePlan call. For more details on memory buffer management, please also refer to NVSHMEM memory buffer in cuFFTMp.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory).
ny[In] – The transform size in the y dimension.
nz[In] – The transform size in the z dimension. This is the fastest changing dimension of a transform (contiguous in memory).
type[In] – The transform data type (e.g., CUFFT_R2C for single precision real to complex).
*workSize[Out] – Pointer to the size(s), in bytes, of the work areas.
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully created the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a supported size.
cufftMpMakePlan#
- cufftResult cufftMpMakePlan2d(
- cufftHandle plan,
- int nx,
- int ny,
- cufftType type,
- void *comm_handle,
- cufftMpCommType comm_type,
- size_t *workSize)
- cufftResult cufftMpMakePlan3d(
- cufftHandle plan,
- int nx,
- int ny,
- int nz,
- cufftType type,
- void *comm_handle,
- cufftMpCommType comm_type,
- size_t *workSize)
Following a call to cufftCreate, makes a 2D (resp. 3D) FFT plan configuration according to specified signal sizes and data type. cufftMpMakePlan2d and cufftMpMakePlan3d are convenience functions that combine cufftMpAttachComm and the original cufftMakePlan2d and cufftMakePlan3d into two APIs. This helps ensure the communicator does not go out of scope during the planning phase. This call can only be used once for a given handle. It will fail and return CUFFT_INVALID_PLAN if the plan is locked, i.e. the handle was previously used with a different cufftPlan or cufftMpMakePlan call. For more details on memory buffer management, please also refer to NVSHMEM memory buffer in cuFFTMp.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate.
nx[In] – The transform size in the x dimension. This is the slowest changing dimension of a transform (strided in memory).
ny[In] – The transform size in the y dimension.
nz[In] – The transform size in the z dimension. This is the fastest changing dimension of a transform (contiguous in memory).
type[In] – The transform data type (e.g., CUFFT_R2C for single precision real to complex).
comm_handle[In] – A pointer to a communication handle.
comm_type[In] – An enum indicating the type of the communication handle.
*workSize[Out] – Pointer to the size(s), in bytes, of the work areas.
- Return values:
CUFFT_SUCCESS – cuFFTMp successfully created the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_SIZE – One or more of the nx, ny, or nz parameters is not a supported size.
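A sketch of the combined API, assuming MPI_Init and cudaSetDevice were already called; nx, ny, nz are illustrative global sizes and stream is an optional user CUDA stream:
MPI_Comm comm = MPI_COMM_WORLD;
cufftHandle plan;
size_t workspace;

cufftCreate(&plan);
cufftSetStream(plan, stream);                 // optional
cufftMpMakePlan3d(plan, nx, ny, nz, CUFFT_C2C,
                  &comm, CUFFT_COMM_MPI, &workspace);

// ... allocate data (cufftXtMalloc or nvshmem_malloc), copy, execute ...

cufftDestroy(plan);
MPI_Finalize();   // after cufftDestroy, since comm is MPI_COMM_WORLD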
cufftXtExecDescriptor#
- cufftResult cufftXtExecDescriptor(
- cufftHandle plan,
- cudaLibXtDesc *input,
- cudaLibXtDesc *output,
- int direction)
Function cufftXtExecDescriptor executes any cuFFT transform regardless of precision and type. In the case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. cuFFT uses the GPU memory pointed to by the cudaLibXtDesc *input descriptor as input data and cudaLibXtDesc *output as output data.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
input[In] – Pointer to the complex input data (in GPU memory) to transform
output[In] – Pointer to the complex output data (in GPU memory)
direction[In] – The transform direction: CUFFT_FORWARD or CUFFT_INVERSE. Ignored for complex-to-real and real-to-complex transforms.
- Return values:
CUFFT_SUCCESS – cuFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE – At least one of the parameters input and output is not valid
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_EXEC_FAILED – cuFFT failed to execute the transform on the GPU.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in a descriptor.
cufftXtExec, cufftExec*#
- cufftResult cufftXtExec(
- cufftHandle plan,
- void *idata,
- void *odata,
- int direction)
Executes a plan on a distributed array.
idata and odata must both be the start of an NVSHMEM-allocated buffer. Can only be called if cufftXtSetSubformatDefault was previously called on the plan. The same conditions apply to the cufftExec* APIs. For strided input/output data (set by cufftMpMakePlanDecomposition), the number of elements allocated in the NVSHMEM symmetric heap should be at least (upper_input[0] - lower_input[0]) * strides_input[0] for the input buffer and (upper_output[0] - lower_output[0]) * strides_output[0] for the output buffer. If the number of elements on each GPU is not the same, the maximum buffer size across all GPUs should be used. As this executes communication calls that write to memory buffers on remote GPUs, the user is responsible for making sure idata, odata and the workspace buffer (if any) are available on all other GPUs before kernel execution (for example by placing synchronization points/barriers such as nvshmemx_sync_all_on_stream(stream) before the API call) to avoid race conditions. Upon return, the memory buffers on all GPUs are available.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
idata[In/Out] – Pointer to the input data (in GPU memory and NVSHMEM allocated) to transform
odata[In/Out] – Pointer to the output data (in GPU memory and NVSHMEM allocated)
direction[In] – The transform direction: CUFFT_FORWARD or CUFFT_INVERSE. Ignored for complex-to-real and real-to-complex transforms.
- Return values:
CUFFT_SUCCESS – cuFFT successfully executed the FFT plan.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE – At least one of the parameters input and output is not valid
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_EXEC_FAILED – cuFFT failed to execute the transform on the GPU.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in a descriptor.
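A sketch of executing directly on an NVSHMEM-allocated buffer. It assumes a plan using the built-in Slab distributions (created as above) on which cufftXtSetSubformatDefault(plan, CUFFT_XT_FORMAT_INPLACE, CUFFT_XT_FORMAT_INPLACE_SHUFFLED) was called; nx, ny, nz, nranks and stream are as in the earlier examples and the element count is an illustrative upper bound covering both distributions:
// Large enough for both the X-slab (input) and Y-slab (output) distributions.
size_t elems_in  = (size_t)(nx / nranks + 1) * ny * nz;
size_t elems_out = (size_t)nx * (ny / nranks + 1) * nz;
size_t local_elements = elems_in > elems_out ? elems_in : elems_out;
cufftComplex *d_data = (cufftComplex *)nvshmem_malloc(local_elements * sizeof(cufftComplex));

// ... fill this process's local portion of the input in d_data ...

nvshmemx_sync_all_on_stream(stream);   // make sure buffers are ready on all GPUs
cufftXtExec(plan, d_data, d_data, CUFFT_FORWARD);
cudaStreamSynchronize(stream);

nvshmem_free(d_data);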
Descriptors#
-
enum cufftXtSubFormat#
-
enumerator CUFFT_XT_FORMAT_INPLACE#
Describes a built-in Slab data distribution distributed along the X axis.
-
enumerator CUFFT_XT_FORMAT_INPLACE_SHUFFLED#
Describes a built-in Slab data distribution distributed along the Y axis.
-
enumerator CUFFT_XT_FORMAT_DISTRIBUTED_INPUT#
Describes a data distribution distributed according to the input box (lower_input, upper_input) of cufftXtSetDistribution.
-
enumerator CUFFT_XT_FORMAT_DISTRIBUTED_OUTPUT#
Describes a data distribution distributed according to the output box (lower_output, upper_output) of cufftXtSetDistribution.
-
enum cufftXtCopyType#
-
enumerator CUFFT_COPY_HOST_TO_DEVICE#
Copies data from a host CPU buffer to the device descriptor. Data should be distributed according to the descriptor’s subformat. This does not redistribute data across processes.
-
enumerator CUFFT_COPY_DEVICE_TO_HOST#
Copies data from the device descriptor to a host CPU buffer. Data will be distributed according to the descriptor’s subformat. This does not redistribute data across processes.
-
enumerator CUFFT_COPY_DEVICE_TO_DEVICE#
Redistributes data from one device descriptor to another.
cufftXtMalloc and cufftXtFree#
- cufftResult cufftXtMalloc(
- cufftHandle plan,
- cudaLibXtDesc **descriptor,
- cufftXtSubFormat format)
cufftXtMalloc allocates a descriptor, and all memory for data in GPUs associated with the plan, and returns a pointer to the descriptor. Note the descriptor contains an array of device pointers so that the application may preprocess or postprocess the data on the GPUs. The enumerated parameter cufftXtSubFormat_t indicates if the buffer will be used for input or output. For more details on memory buffer management, please also refer to NVSHMEM memory buffer in cuFFTMp.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
descriptor[In/Out] – Pointer to a pointer to a cudaLibXtDesc object
format[In] – cufftXtSubFormat value
- Return values:
CUFFT_SUCCESS – cuFFT successfully allows user to allocate descriptor and GPU memory.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle or it is not a multiple GPU plan.
CUFFT_ALLOC_FAILED – The allocation of GPU resources for the plan failed.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in the descriptor.
-
cufftResult cufftXtFree(cudaLibXtDesc *descriptor)#
cufftXtFree frees the descriptor and all memory associated with it. The descriptor and memory must have been returned by a previous call to cufftXtMalloc.
- Parameters:
descriptor[In] – Pointer to a cudaLibXtDesc object
- Return values:
CUFFT_SUCCESS – cuFFT successfully allows user to free descriptor and associated GPU memory.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
cufftXtMemcpy#
- cufftResult cufftXtMemcpy(
- cufftHandle plan,
- void *dstPointer,
- void *srcPointer,
- cufftXtCopyType type)
cufftXtMemcpy copies data between buffers on the host and GPUs or between GPUs. The enumerated parameter cufftXtCopyType_t indicates the type and direction of transfer.
This function is synchronous with respect to the host. In particular, if a stream was associated with the plan, the stream should be synchronized before calling cufftXtMemcpy.
- Parameters:
plan[In] – cufftHandle returned by cufftCreate
dstPointer[Out] – Pointer to the destination address(es)
srcPointer[In] – Pointer to the source address(es)
type[In] – cufftXtCopyType value
- Return values:
CUFFT_SUCCESS – cuFFT successfully allows user to copy memory between host and GPUs or between GPUs.
CUFFT_INVALID_PLAN – The plan parameter is not a valid handle.
CUFFT_INVALID_VALUE – One or more invalid parameters were passed to the API.
CUFFT_INTERNAL_ERROR – An internal driver error was detected.
CUFFT_SETUP_FAILED – The cuFFT library failed to initialize.
CUFFT_INVALID_DEVICE – An invalid GPU index was specified in a descriptor.
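A sketch of the descriptor-based workflow tying cufftXtMalloc, cufftXtMemcpy and cufftXtExecDescriptor together. It assumes a multi-process plan using the built-in Slab distributions and host buffers h_in / h_out holding this process's local slab; error checks are elided:
cudaLibXtDesc *desc;
cufftXtMalloc(plan, &desc, CUFFT_XT_FORMAT_INPLACE);

// Host -> device: h_in must already be laid out according to the descriptor's subformat.
cufftXtMemcpy(plan, desc, h_in, CUFFT_COPY_HOST_TO_DEVICE);

// In-place forward transform on the descriptor.
cufftXtExecDescriptor(plan, desc, desc, CUFFT_FORWARD);

// Device -> host: the output is now distributed as CUFFT_XT_FORMAT_INPLACE_SHUFFLED.
cufftXtMemcpy(plan, h_out, desc, CUFFT_COPY_DEVICE_TO_HOST);

cufftXtFree(desc);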
Standalone Reshape#
-
type cufftReshapeHandle#
An opaque handle to a reshape operation.
cufftMpCreateReshape#
-
cufftResult cufftMpCreateReshape(cufftReshapeHandle *handle)#
This function initializes a reshape handle for future use. This function is not collective.
- Parameters:
handle[In/Out] – A pointer to an opaque cufftReshapeHandle object.
- Return values:
CUFFT_SUCCESS – cuFFT successfully created a reshape handle.
CUFFT_ALLOC_FAILED – cuFFT failed to allocate enough host memory for the handle.
cufftMpAttachReshapeComm#
- cufftResult cufftMpAttachReshapeComm(
- cufftReshapeHandle handle,
- cufftMpCommType comm_type,
- void *comm_handle)
Warning
This function is deprecated. Use the updated cufftMpMakeReshape function which now accepts communicator parameters directly.
This function attaches a communication handle to a reshape. This function is not collective.
- Parameters:
handle[In] – A handle to a reshape operation, following cufftMpCreateReshape
comm_type[In] – An enum describing the communication type of the handle.
comm_handle[In] – If comm_type is CUFFT_COMM_MPI, this should be a pointer to an MPI communicator. The pointer should remain valid until destruction of the handle. Otherwise, this should be NULL.
- Return values:
CUFFT_SUCCESS – cuFFT successfully associated a communication handle to the reshape.
CUFFT_INVALID_VALUE – comm_handle is NULL for CUFFT_COMM_MPI, or comm_handle is not NULL for CUFFT_COMM_NONE.
cufftMpMakeReshape#
- cufftResult cufftMpMakeReshape(
- cufftReshapeHandle handle,
- size_t element_size,
- int rank,
- const long long int *lower_input,
- const long long int *upper_input,
- const long long int *strides_input,
- const long long int *lower_output,
- const long long int *upper_output,
- const long long int *strides_output,
- void *comm_handle,
- cufftMpCommType comm_type)
Warning
The API signature of this function changed at version 11.4.0. The previous function signature without communicator parameters is now deprecated.
This function creates a reshape intended to re-distribute a global array of 3D data.
The data is initially distributed, on the current process, according to (lower_input, upper_input, strides_input).
After the reshape, the data will be distributed according to (lower_output, upper_output, strides_output).
The meaning of rank, lower_input, upper_input, strides_input, lower_output, upper_output and strides_output is identical to the cufftMpMakePlanDecomposition function.
Each element is of size element_size, in bytes. This function is collective and should be called by all processes together.
All input arrays may be freed immediately after this function returns.
- Parameters:
handle[In] – The reshape handle.
element_size[In] – The size of the individual elements, in bytes. Allowed values are 4, 8 and 16.
rank[In] – The length of the lower_input, upper_input, lower_output, upper_output, strides_input and strides_output arrays. rank should be 3.
lower_input[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
upper_input[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the input descriptor.
strides_input[In] – An array of length rank, representing the local data layout of the input descriptor in memory. All entries must be positive and decreasing.
lower_output[In] – An array of length rank, representing the lower corner (inclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
upper_output[In] – An array of length rank, representing the upper corner (exclusive) of the portion of the global nx * ny * nz array owned by the current process in the output descriptor.
strides_output[In] – An array of length rank, representing the local data layout of the output descriptor in memory. All entries must be positive and decreasing.
comm_handle[In] – A pointer to a communication handle. For CUFFT_COMM_MPI, this should be a pointer to an MPI communicator.
comm_type[In] – An enum indicating the type of the communication handle.
- Return values:
CUFFT_SUCCESS – cuFFT successfully created the reshape operation.
CUFFT_INVALID_VALUE – The handle is invalid, rank is not 3, or any of the arrays is incorrect.
CUFFT_ALLOC_FAILED – cuFFT failed to allocate enough host and/or device memory for the handle.
CUFFT_INTERNAL_ERROR – cuFFT failed to initialize the underlying communication library.
Note
The new experimental multi-node implementation can be chosen by defining CUFFT_RESHAPE_USE_PACKING=1 in the environment.
This requires scratch space but provides improved performance over InfiniBand.
cufftMpGetReshapeSize#
- cufftResult cufftMpGetReshapeSize(
- cufftReshapeHandle handle,
- size_t *workspace_size)
Returns the amount (in bytes) of workspace required to execute the handle. There is no guarantee that the workspace_size will or will not change between versions of cuFFTMp.
- Parameters:
handle[In] – A handle created using cufftMpCreateReshape.
workspace_size[Out] – The size, in bytes, of the workspace required during reshape execution
- Return values:
CUFFT_SUCCESS – cuFFT successfully returned the workspace size.
cufftMpExecReshapeAsync#
- cufftResult cufftMpExecReshapeAsync(
- cufftReshapeHandle handle,
- void *data_out,
- const void *data_in,
- void *workspace,
- cudaStream_t stream)
Executes the reshape, redistributing data_in into data_out using the workspace in workspace. This function executes in the stream stream. This function is collective and stream-ordered. The user is responsible for ensuring that all GPUs involved in the communication will be able to synchronize in the stream(s), otherwise deadlocks may occur. For strided input/output data, the number of elements allocated in the NVSHMEM symmetric heap should be at least (upper_input[0] - lower_input[0]) * strides_input[0] for the input buffer and (upper_output[0] - lower_output[0]) * strides_output[0] for the output buffer. If the number of elements on each GPU is not the same (asymmetric reshape), the maximum buffer size across all GPUs should be used. As this executes communication calls that write to memory buffers on remote GPUs, the user is responsible for making sure data_in, data_out and workspace are available on all other GPUs before kernel execution (for example by placing synchronization points/barriers such as nvshmemx_sync_all_on_stream(stream) before the API call) to avoid race conditions. Upon return, the memory buffers on all GPUs are available.
- Parameters:
handle[In] – The reshape handle.
data_out[Out] – A symmetric-heap pointer to the output data. This memory should be NVSHMEM allocated and identical on all processes.
data_in[In] – A symmetric-heap pointer to the input data. This memory should be NVSHMEM allocated and identical on all processes.
workspace[Out] – A symmetric-heap pointer to the workspace data. This memory should be NVSHMEM allocated and identical on all processes.
stream[In] – The CUDA stream in which to run the reshape operation.
- Return values:
CUFFT_SUCCESS – cuFFT successfully executed the reshape operation.
CUFFT_INVALID_VALUE – cufftMpMakeReshape was not called prior to this function.
CUFFT_INTERNAL_ERROR – An error occurred during kernel execution.
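A sketch of a full standalone reshape, reusing the box and stride arrays from the decomposition example above and assuming d_in and d_out are NVSHMEM-allocated cufftComplex buffers and stream is a user CUDA stream:
cufftReshapeHandle reshape;
size_t workspace_size;

cufftMpCreateReshape(&reshape);
cufftMpMakeReshape(reshape, sizeof(cufftComplex), 3,
                   lower_in,  upper_in,  strides_in,
                   lower_out, upper_out, strides_out,
                   &comm, CUFFT_COMM_MPI);
cufftMpGetReshapeSize(reshape, &workspace_size);

void *workspace = nvshmem_malloc(workspace_size);   // symmetric-heap workspace
nvshmemx_sync_all_on_stream(stream);                // buffers ready on all GPUs
cufftMpExecReshapeAsync(reshape, d_out, d_in, workspace, stream);
cudaStreamSynchronize(stream);

nvshmem_free(workspace);
cufftMpDestroyReshape(reshape);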
cufftMpDestroyReshape#
-
cufftResult cufftMpDestroyReshape(cufftReshapeHandle handle)#
Destroys a reshape and all its associated data.
- Parameters:
handle[In] – The reshape handle to destroy.
- Return values:
CUFFT_SUCCESS – The handle was successfully destroyed.