VPI - Vision Programming Interface

1.2 Release

CUDA Interoperability

Declaration of functions for CUDA interoperability.

VPIStatus vpiImageCreateCUDAMemWrapper (const VPIImageData *cudaData, uint32_t flags, VPIImage *img)
 Create an image object by wrapping around an existing device (CUDA) memory block.
 
VPIStatus vpiArrayCreateCUDAMemWrapper (const VPIArrayData *arrayData, uint32_t flags, VPIArray *array)
 Create an array object by wrapping an existing device (CUDA) memory block.
 
VPIStatus vpiArraySetWrappedCUDAMem (VPIArray array, const VPIArrayData *arrayData)
 Redefines the wrapped device (CUDA) memory in an existing VPIArray wrapper.
 
VPIStatus vpiStreamCreateCUDAStreamWrapper (CUstream cudaStream, uint32_t flags, VPIStream *stream)
 Wraps an existing cudaStream_t into a VPI stream.
 
VPIStatus vpiImageSetWrappedCUDAMem (VPIImage img, const VPIImageData *hostData)
 Redefines the wrapped device (CUDA) memory in an existing VPIImage wrapper.
 
VPIStatus vpiEventCreateCUDAEventWrapper (CUevent cudaEvent, VPIEvent *event)
 Create an event object by wrapping around an existing CUDA CUevent object.
 

Detailed Description

Declaration of functions for CUDA interoperability.

The provided functions allow wrapping CUDA objects created outside of VPI. The wrapped objects can then be used efficiently in VPI compute pipelines.

Function Documentation

◆ vpiArrayCreateCUDAMemWrapper()

VPIStatus vpiArrayCreateCUDAMemWrapper (const VPIArrayData *arrayData, uint32_t flags, VPIArray *array)

#include <vpi/CUDAInterop.h>

Create an array object by wrapping an existing device (CUDA) memory block.

The stride between elements must be at least as large as the element structure size and must respect the alignment requirements of the element data structure.

The returned handle must be destroyed by calling vpiArrayDestroy when it is no longer needed.

The array object doesn't own the wrapped memory. The user remains responsible for the wrapped memory's lifetime; the memory must stay valid until the array object is destroyed.

Parameters
[in] arrayData  VPIArrayData pointing to the device (CUDA) memory block to be wrapped.
[in] flags  Array flags. The backends in which the array can be used are specified by or-ing together VPIBackend flags. Set flags to 0 to enable the array in all backends supported by the active VPI context.
[out] array  Pointer to memory that will receive the created array handle.
Returns
VPI_SUCCESS on success, otherwise an error code.
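
As an illustration, the sketch below wraps a cudaMalloc-allocated keypoint buffer as a VPIArray. It is a minimal sketch only: the VPIArrayData member names (type, size, capacity, strideBytes, data), the VPIKeypoint element type, and the header names are assumptions based on the VPI 1.x headers and should be checked against vpi/Array.h and vpi/Types.h in the installed release.

#include <string.h>
#include <cuda_runtime.h>
#include <vpi/Array.h>
#include <vpi/CUDAInterop.h>
#include <vpi/Types.h>

#define CAPACITY 128

int wrap_keypoints(VPIArray *array, VPIKeypoint **devPtr)
{
    /* Device (CUDA) buffer that will back the VPI array. */
    if (cudaMalloc((void **)devPtr, CAPACITY * sizeof(VPIKeypoint)) != cudaSuccess)
        return -1;

    VPIArrayData arrayData;
    memset(&arrayData, 0, sizeof(arrayData));
    arrayData.type        = VPI_ARRAY_TYPE_KEYPOINT; /* element type (member name assumed) */
    arrayData.size        = 0;                       /* no valid elements yet */
    arrayData.capacity    = CAPACITY;                /* maximum number of elements */
    arrayData.strideBytes = sizeof(VPIKeypoint);     /* >= element size, suitably aligned */
    arrayData.data        = *devPtr;                 /* the CUDA pointer being wrapped */

    /* flags == 0: enable the array in all backends supported by the active context. */
    if (vpiArrayCreateCUDAMemWrapper(&arrayData, 0, array) != VPI_SUCCESS)
    {
        cudaFree(*devPtr);
        return -1;
    }

    /* Tear-down order: vpiArrayDestroy(*array) first, then cudaFree(*devPtr);
     * the wrapper never owns the CUDA memory. */
    return 0;
}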

◆ vpiArraySetWrappedCUDAMem()

VPIStatus vpiArraySetWrappedCUDAMem (VPIArray array, const VPIArrayData *arrayData)

#include <vpi/CUDAInterop.h>

Redefines the wrapped device (CUDA) memory in an existing VPIArray wrapper.

The old wrapped memory and the new one must have the same capacity and element format, and both must point to device-side memory. The VPIArray must have been created by vpiArrayCreateCUDAMemWrapper.

This operation is efficient and does not allocate memory. The wrapped memory will be accessible to the same backends specified during wrapper creation.

The wrapped memory must not be deallocated while it's still being wrapped.

Parameters
[in] array  Handle to the array created by vpiArrayCreateCUDAMemWrapper.
[in] arrayData  VPIArrayData pointing to the new device (CUDA) memory block to be wrapped.
Returns
VPI_SUCCESS on success, otherwise an error code.
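
The sketch below retargets an existing CUDA-array wrapper at a second device buffer of identical capacity and element type, e.g. for double buffering. As in the previous sketch, the VPIArrayData member names and the VPIKeypoint element type are assumptions taken from the VPI 1.x headers.

#include <string.h>
#include <vpi/Array.h>
#include <vpi/CUDAInterop.h>
#include <vpi/Types.h>

VPIStatus rewrap_keypoints(VPIArray array, VPIKeypoint *newDevPtr, int32_t capacity)
{
    VPIArrayData newData;
    memset(&newData, 0, sizeof(newData));
    newData.type        = VPI_ARRAY_TYPE_KEYPOINT; /* must match the original element type */
    newData.size        = 0;
    newData.capacity    = capacity;                /* must match the original capacity */
    newData.strideBytes = sizeof(VPIKeypoint);
    newData.data        = newDevPtr;               /* new device (CUDA) buffer */

    /* No allocation happens; the wrapper simply starts referring to newDevPtr.
     * The previously wrapped buffer may only be freed once it is no longer wrapped. */
    return vpiArraySetWrappedCUDAMem(array, &newData);
}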

◆ vpiEventCreateCUDAEventWrapper()

VPIStatus vpiEventCreateCUDAEventWrapper (CUevent cudaEvent, VPIEvent *event)

#include <vpi/CUDAInterop.h>

Create an event object by wrapping around an existing CUDA CUevent object.

The created event can be passed to vpiEventSync / vpiStreamWaitEvent to synchronize on a previously recorded CUDA event. Conversely, CUDA synchronization functions can be used to wait on events captured with vpiEventRecord.

Warning
This function is currently not implemented.
Parameters
[in] cudaEvent  CUDA event handle to be wrapped.
[out] event  Pointer to memory that will receive the created event handle.
Returns
Always returns VPI_ERROR_NOT_IMPLEMENTED.

◆ vpiImageCreateCUDAMemWrapper()

VPIStatus vpiImageCreateCUDAMemWrapper (const VPIImageData *cudaData, uint32_t flags, VPIImage *img)

#include <vpi/CUDAInterop.h>

Create an image object by wrapping around an existing device (CUDA) memory block.

Only pitch-linear layout is supported. The created image object does not own or claim the wrapped memory block.

Parameters
[in] cudaData  Pointer to a VPIImageData structure describing the CUDA memory to be wrapped.
[in] flags  Image flags. The backends in which the image can be used are specified by or-ing together VPIBackend flags. Set flags to 0 to enable the image in all backends supported by the active VPI context.
[out] img  Pointer to memory that will receive the created image handle.
Returns
VPI_SUCCESS on success, otherwise an error code.
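
A minimal sketch of wrapping a pitch-linear buffer allocated with cudaMallocPitch as a single-plane 8-bit image. The VPIImageData and VPIImagePlane member names used below are assumptions based on the VPI 1.x headers; verify them against vpi/Image.h in the installed release.

#include <string.h>
#include <cuda_runtime.h>
#include <vpi/Image.h>
#include <vpi/CUDAInterop.h>

int wrap_u8_image(int32_t width, int32_t height, VPIImage *img, void **devPtr, size_t *pitch)
{
    /* Pitch-linear device allocation; *pitch receives the row stride in bytes. */
    if (cudaMallocPitch(devPtr, pitch, (size_t)width, (size_t)height) != cudaSuccess)
        return -1;

    VPIImageData imgData;
    memset(&imgData, 0, sizeof(imgData));
    imgData.format               = VPI_IMAGE_FORMAT_U8; /* single-plane, 8-bit, pitch-linear */
    imgData.numPlanes            = 1;
    imgData.planes[0].width      = width;
    imgData.planes[0].height     = height;
    imgData.planes[0].pitchBytes = (int32_t)*pitch;      /* row stride from cudaMallocPitch */
    imgData.planes[0].data       = *devPtr;              /* device (CUDA) pointer being wrapped */

    /* Restrict the wrapper to the CUDA backend; passing 0 instead would enable it
     * in all backends supported by the active context. */
    if (vpiImageCreateCUDAMemWrapper(&imgData, VPI_BACKEND_CUDA, img) != VPI_SUCCESS)
    {
        cudaFree(*devPtr);
        return -1;
    }

    /* Destroy with vpiImageDestroy before freeing the CUDA buffer. */
    return 0;
}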

◆ vpiImageSetWrappedCUDAMem()

VPIStatus vpiImageSetWrappedCUDAMem (VPIImage img, const VPIImageData *hostData)

#include <vpi/CUDAInterop.h>

Redefines the wrapped device (CUDA) memory in an existing VPIImage wrapper.

The old wrapped memory and the new one must have the same dimensions and format, and both must point to device-side (CUDA-accessible) memory.

The VPIImage must have been created by vpiImageCreateCUDAMemWrapper.

This operation is efficient and does not allocate memory. The wrapped memory will be accessible to the same backends specified during wrapper creation.

The wrapped memory must not be deallocated while it's still being wrapped.

Parameters
[in] img  Handle to the image created by vpiImageCreateCUDAMemWrapper.
[in] hostData  VPIImageData pointing to the new device (CUDA) memory block to be wrapped.
Returns
VPI_SUCCESS on success, otherwise an error code.
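
The sketch below points a CUDA-image wrapper created by vpiImageCreateCUDAMemWrapper at a different device buffer with the same dimensions and format. As before, the VPIImageData / VPIImagePlane member names are assumptions taken from the VPI 1.x headers.

#include <string.h>
#include <vpi/Image.h>
#include <vpi/CUDAInterop.h>

VPIStatus rewrap_u8_image(VPIImage img, void *newDevPtr,
                          int32_t width, int32_t height, int32_t pitchBytes)
{
    VPIImageData newData;
    memset(&newData, 0, sizeof(newData));
    newData.format               = VPI_IMAGE_FORMAT_U8; /* must match the original format */
    newData.numPlanes            = 1;
    newData.planes[0].width      = width;               /* must match the original dimensions */
    newData.planes[0].height     = height;
    newData.planes[0].pitchBytes = pitchBytes;
    newData.planes[0].data       = newDevPtr;            /* new device (CUDA) buffer */

    /* No reallocation; img now refers to newDevPtr, which must stay valid while wrapped. */
    return vpiImageSetWrappedCUDAMem(img, &newData);
}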

◆ vpiStreamCreateCUDAStreamWrapper()

VPIStatus vpiStreamCreateCUDAStreamWrapper (CUstream cudaStream, uint32_t flags, VPIStream *stream)

#include <vpi/CUDAInterop.h>

Wraps an existing cudaStream_t into a VPI stream.

CUDA algorithms are submitted for execution on the wrapped cudaStream_t. This allows VPI-driven processing to be inserted into an existing CUDA pipeline. Algorithms can still be submitted to other backends.

The VPIStream doesn't own the cudaStream_t, which must remain valid for the VPIStream's lifetime.

CUDA kernels can be submitted directly to the cudaStream_t only if it is guaranteed that all tasks submitted to the VPIStream have finished.

Parameters
[in] cudaStream  The CUDA stream handle to be wrapped.
[in] flags  Stream flags. VPI_BACKEND_CUDA is always added, but other backends can be specified as well by or-ing together VPIBackend flags.
[out] stream  Pointer that will receive the newly created VPIStream.
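
The sketch below wraps an application-owned cudaStream_t and submits a CUDA-backend algorithm to it. vpiSubmitGaussianFilter is used purely for illustration; its header (vpi/algo/GaussianFilter.h) and exact parameter list are assumptions and should be checked against the installed release.

#include <cuda_runtime.h>
#include <vpi/Stream.h>
#include <vpi/CUDAInterop.h>
#include <vpi/algo/GaussianFilter.h>

int run_on_cuda_stream(cudaStream_t cudaStream, VPIImage input, VPIImage output)
{
    VPIStream stream = NULL;

    /* CUstream (driver API) and cudaStream_t (runtime API) refer to the same object. */
    if (vpiStreamCreateCUDAStreamWrapper((CUstream)cudaStream, 0, &stream) != VPI_SUCCESS)
        return -1;

    /* CUDA-backend work submitted to 'stream' executes on 'cudaStream',
     * interleaving with whatever the application already runs there. */
    VPIStatus status = vpiSubmitGaussianFilter(stream, VPI_BACKEND_CUDA, input, output,
                                               5, 5, 1.0f, 1.0f, VPI_BORDER_ZERO);

    /* Make sure all VPI tasks have finished before using cudaStream directly again. */
    vpiStreamSync(stream);

    vpiStreamDestroy(stream); /* destroys the wrapper only, not the wrapped cudaStream_t */
    return status == VPI_SUCCESS ? 0 : -1;
}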