Sample Application: Erode Operation#

The following example demonstrates the basic usage of PVA Operators to perform an erode operation. This step-by-step tutorial will guide you through the necessary steps to create and submit the operator for execution on the PVA. Note that this example is a simplified version of the actual implementation in the repository. For clarity, error handling code has been omitted.

Step 1: Create an Allocator Handle

The setup phase of the application involves creating PVA-accessible input and output image buffers. PVA Operators utilize NVCV Tensor data structures to store the image attributes and data. First, you need to construct an allocator instance. In the next steps, the allocator handle will be passed to Tensor Construction APIs to allocate tensors with PVA-accessible memory.

NVCVAllocatorHandle allocatorHandle = NULL;
nvcvAllocatorConstructPva(&allocatorHandle);

Step 2: Calculate Tensor Requirements

Next, calculate the requirements for the input and output tensors. The NVCVTensorRequirements structure holds all the information about the tensor, including the shape, stride, data type, and layout. The nvcvTensorCalcRequirementsPva API initializes this structure, which will be passed to the tensor construction APIs and operator creation APIs.

NVCVTensorLayout tensorLayout;
nvcvTensorLayoutMake("HWC", &tensorLayout);

int64_t tensorShape[] = {imgHeight, imgWidth, channelCount};
NVCVTensorRequirements tensorRequirements;
nvcvTensorCalcRequirementsPva(tensorRank, tensorShape, NVCV_DATA_TYPE_U8, tensorLayout,
                                                             0, 0, &tensorRequirements);

Step 3: Construct Input/Output Tensors

With the tensor requirements calculated, you can now construct the input (inTensorHandle) and output (outTensorHandle) tensors. The allocator handle created in Step 1 is passed to the tensor construction APIs to allocate the tensors with PVA-accessible memory. Please note that if the tensors were constructed with the default NVCV allocator, the tensor data would not be accessible to the PVA.

NVCVTensorHandle inTensorHandle;
NVCVTensorHandle outTensorHandle;
nvcvTensorConstruct(&tensorRequirements, allocatorHandle, &inTensorHandle);
nvcvTensorConstruct(&tensorRequirements, allocatorHandle, &outTensorHandle);

Step 4: Create the Morphology Operator

Create the morphology operator, which will perform the erode operation. This involves setting up the morphology mask parameters and initializing the operator handle with the pvaMorphologyCreate API. Specify the tensor requirements, border type, and border value for the operator creation. The Create API initializes the CUPVA Executables and CmdPrograms required to schedule the operator task on the PVA. It also sets up CUPVA DataFlows that will be used to transfer image data in and out of the internal VPU memory (VMEM) using the DMA engine.

PvaMorphologyMaskParams maskParams;
maskParams.maskWidth  = knlWidth;
maskParams.maskHeight = knlHeight;
maskParams.maskShape  = RECTANGLE_MASK;
NVCVOperatorHandle operatorHandle;
pvaMorphologyCreate(&operatorHandle, &tensorRequirements, PVA_ERODE, &maskParams,
                                               NVCV_BORDER_CONSTANT, borderValue);

Step 5: Submit the Operator to a CUPVA Stream

Create a CUPVA Stream to manage the scheduling of the operator task on the PVA. The operator can be submitted to the CUPVA Stream using the pvaMorphologySubmit API. Ensure that the tensor parameters are consistent with the tensor requirements specified during the operator creation. The operator submit API calls are non-blocking and return immediately after the operator is submitted to the stream. You can submit the same operator instance to multiple streams. This is especially useful for processing multiple video streams concurrently using both VPU cores.

cupvaStream_t stream;
CupvaStreamCreate(&stream, CUPVA_PVA0, CUPVA_VPU0);
pvaMorphologySubmit(operatorHandle, stream, inTensorHandle, outTensorHandle);

Step 6: Wait for the Operator to Complete

CUPVA synchronization APIs (Sync Objects, Fences, and RequestFence Commands) are used to manage synchronization between the host and the PVA. A fence is created and submitted to the stream using a RequestFence command to signal when the operator has completed its execution. After submission, the host waits for the fence to be signaled before reading the output tensor data. You can submit multiple operators to a stream and wait for all of them to complete using a single fence.

cupvaSyncObj_t sync;
CupvaSyncObjCreate(&sync, false, CUPVA_SIGNALER_WAITER, CUPVA_SYNC_YIELD);
cupvaFence_t fence;
CupvaFenceInit(&fence, sync);
cupvaCmd_t requestfFences;
CupvaCmdRequestFencesInit(&requestfFences, &fence, 1);
cupvaCmd_t const *cmds[1]  = {&requestfFences};
CupvaStreamSubmit(stream, cmds, NULL, 1, CUPVA_IN_ORDER, -1, -1);
CupvaFenceWait(&fence, -1, NULL);

Step 7: Free Resources

The destruction phase of the application involves freeing the resources used by the operator and tensors. Ensure that all CUPVA streams and SyncObjects are properly destroyed in this phase to prevent resource leaks and ensure clean application termination.

CupvaStreamDestroy(stream);
CupvaSyncObjDestroy(sync);
nvcvTensorDecRef(inTensorHandle, NULL);
nvcvTensorDecRef(outTensorHandle, NULL);
nvcvOperatorDestroy(operatorHandle);
nvcvAllocatorDecRef(allocatorHandle, NULL);