Declaration of functions for CUDA interoperability.
Functions | |
VPIStatus | vpiArrayCreateCudaMemWrapper (const VPIArrayData *arrayData, uint32_t flags, VPIArray *array) |
Create an array object by wrapping an existing device (CUDA) memory block. | |
VPIStatus | vpiArraySetWrappedCudaMem (VPIArray array, const VPIArrayData *arrayData) |
Redefines the wrapped device (CUDA) memory in an existing VPIArray wrapper. | |
VPIStatus | vpiImageCreateCudaMemWrapper (const VPIImageData *cudaData, uint32_t flags, VPIImage *img) |
Create an image object by wrapping around an existing device (CUDA) memory block. | |
VPIStatus | vpiImageSetWrappedCudaMem (VPIImage img, const VPIImageData *hostData) |
Redefines the wrapped device (CUDA) memory in an existing VPIImage wrapper. | |
VPIStatus | vpiStreamCreateCudaStreamWrapper (CUstream cudaStream, uint32_t flags, VPIStream *stream) |
Wraps an existing cudaStream_t into a VPI stream. | |
Declaration of functions for CUDA interoperability.
The provided functions allow wrapping CUDA objects created outside VPI so that they can be used efficiently in VPI compute pipelines.
VPIStatus vpiArrayCreateCudaMemWrapper | ( | const VPIArrayData * | arrayData, |
uint32_t | flags, | ||
VPIArray * | array | ||
) |
#include <vpi/Array.h>
Create an array object by wrapping an existing device (CUDA) memory block.
Stride between elements has to be at least as large as the element structure size. It also has to respect alignment requirements of the element data structure.
The returned handle must be destroyed by calling vpiArrayDestroy when it is no longer needed.
The object doesn't own the wrapped memory. The user remains responsible for the lifetime of the wrapped memory, which must stay valid until the array object is destroyed.
[in] | arrayData | VPIArrayData pointing to the device (CUDA) memory block to be wrapped. |
[in] | flags | Array flags. Specify the backends in which the array can be used by OR-ing together VPIBackend flags. Set flags to 0 to enable it in all backends supported by the active VPI context. |
[out] | array | Pointer to memory that will receive the created array handle. |
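The stride requirement stated above for vpiArrayCreateCudaMemWrapper can be checked on the caller's side before wrapping. The following is a minimal sketch, not part of VPI: `stride_is_valid` and `Point2f` are hypothetical names, and the alignment rule is interpreted here as "the stride is a multiple of the element alignment", which is an assumption about what the requirement means in practice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element type used only for illustration. */
typedef struct
{
    float x;
    float y;
} Point2f;

/* Hypothetical helper (not a VPI function).
 * Returns true when the stride between consecutive elements
 *  - is at least as large as the element size, and
 *  - is a multiple of the element alignment (assumed interpretation
 *    of "respects the alignment requirements"). */
static bool stride_is_valid(size_t stride_bytes, size_t elem_size, size_t elem_align)
{
    if (elem_size == 0 || elem_align == 0)
    {
        return false; /* degenerate input, nothing sensible to check */
    }
    if (stride_bytes < elem_size)
    {
        return false; /* consecutive elements would overlap */
    }
    return (stride_bytes % elem_align) == 0;
}
```

For a tightly packed array of `Point2f`, a call such as `stride_is_valid(sizeof(Point2f), sizeof(Point2f), _Alignof(Point2f))` is expected to pass, while a stride smaller than the element size is rejected.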
VPIStatus vpiArraySetWrappedCudaMem | ( | VPIArray | array, |
const VPIArrayData * | arrayData | ||
) |
#include <vpi/Array.h>
Redefines the wrapped device (CUDA) memory in an existing VPIArray wrapper.
The old wrapped memory and the new one must have the same capacity and element type, and both must point to device-side memory. The VPIArray must have been created by vpiArrayCreateCudaMemWrapper.
This operation is efficient and does not allocate memory. The wrapped memory will be accessible to the same backends specified during wrapper creation.
The wrapped memory must not be deallocated while it's still being wrapped.
[in] | array | Handle to array created by vpiArrayCreateCudaMemWrapper. |
[in] | arrayData | VPIArrayData pointing to the new device (CUDA) memory block to be wrapped. |
VPIStatus vpiImageCreateCudaMemWrapper | ( | const VPIImageData * | cudaData, |
uint32_t | flags, | ||
VPIImage * | img | ||
) |
#include <vpi/Image.h>
Create an image object by wrapping around an existing device (CUDA) memory block.
Only pitch-linear format is supported. The underlying image object does not own/claim the memory block.
[in] | cudaData | Pointer to structure with CUDA memory to be wrapped. |
[in] | flags | Image flags. Specify the backends in which the image can be used by OR-ing together VPIBackend flags. Set flags to 0 to enable it in all backends supported by the active VPI context. |
[out] | img | Pointer to memory that will receive the created image handle. |
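Since only pitch-linear memory can be wrapped by vpiImageCreateCudaMemWrapper, it may help to recall how such a layout is addressed: rows are stored one after another, each row starting a fixed number of bytes (the pitch) after the previous one. The sketch below is illustrative only; `pitch_linear_offset` and `pitch_is_valid` are hypothetical helpers, not VPI or CUDA functions, and a single-plane image with a fixed number of bytes per pixel is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of pixel (x, y) from the start of a
 * single-plane pitch-linear buffer. */
static size_t pitch_linear_offset(size_t x, size_t y, size_t pitch_bytes, size_t bytes_per_pixel)
{
    return y * pitch_bytes + x * bytes_per_pixel;
}

/* Hypothetical helper: a row pitch must be large enough to hold one full
 * row of pixels; it may be larger when rows are padded. */
static bool pitch_is_valid(size_t width, size_t bytes_per_pixel, size_t pitch_bytes)
{
    return pitch_bytes >= width * bytes_per_pixel;
}
```

A buffer allocated with a padded pitch (for example, one returned by a pitched CUDA allocation) passes the pitch through unchanged; only the pitch value, not the image width, determines where each row starts.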
VPIStatus vpiImageSetWrappedCudaMem | ( | VPIImage | img, |
const VPIImageData * | hostData | ||
) |
#include <vpi/Image.h>
Redefines the wrapped device (CUDA) memory in an existing VPIImage wrapper.
The old wrapped memory and the new one must have the same dimensions and format, and both must point to device-side (CUDA-accessible) memory.
The VPIImage must have been created by vpiImageCreateCudaMemWrapper.
This operation is efficient and does not allocate memory. The wrapped memory will be accessible to the same backends specified during wrapper creation.
The wrapped memory must not be deallocated while it's still being wrapped.
[in] | img | Handle to image created by vpiImageCreateCudaMemWrapper. |
[in] | hostData | VPIImageData pointing to the new device memory block to be wrapped. |
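The compatibility rule for vpiImageSetWrappedCudaMem (same dimensions and same format for the old and new memory) can be expressed as a simple pre-check. This is a sketch under stated assumptions: `WrappedImageDesc` and `can_rewrap` are hypothetical and do not mirror the layout of VPIImageData; the `format` field is an opaque placeholder tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor kept by the caller for each wrapped buffer.
 * It is NOT the VPIImageData structure. */
typedef struct
{
    int32_t  width;
    int32_t  height;
    uint32_t format; /* opaque placeholder for the image format */
} WrappedImageDesc;

/* Hypothetical helper: returns true when the new buffer may replace the
 * old one in an existing wrapper, i.e. dimensions and format match. */
static bool can_rewrap(const WrappedImageDesc *old_desc, const WrappedImageDesc *new_desc)
{
    if (old_desc == NULL || new_desc == NULL)
    {
        return false;
    }
    return old_desc->width == new_desc->width &&
           old_desc->height == new_desc->height &&
           old_desc->format == new_desc->format;
}
```

Running such a check before the call keeps the mismatch case in the caller's own code path rather than relying on the returned VPIStatus alone.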
VPIStatus vpiStreamCreateCudaStreamWrapper | ( | CUstream | cudaStream, |
uint32_t | flags, | ||
VPIStream * | stream | ||
) |
#include <vpi/Stream.h>
Wraps an existing cudaStream_t into a VPI stream.
CUDA algorithms are submitted for execution in the wrapped cudaStream_t. This allows VPI-driven processing to be inserted into an existing CUDA pipeline. Algorithms can still be submitted to other backends.
The VPIStream doesn't own the cudaStream_t. It must remain valid for the lifetime of the VPIStream.
CUDA kernels can only be submitted directly to the wrapped cudaStream_t if it's guaranteed that all tasks submitted to the VPIStream have finished.
[in] | cudaStream | The CUDA stream handle to be wrapped. |
[in] | flags | Stream flags. VPI_BACKEND_CUDA is always added, but other backends can be specified as well by or-ing together VPIBackend flags. |
[out] | stream | Pointer that will receive the newly created VPIStream. |