SyncObject, Fence, and Stream

In the Hello World tutorial, the mechanisms for submitting tasks to the PVA and synchronizing with them were hidden inside the RunPVAProgram function.

This tutorial expands the RunPVAProgram function and introduces the following concepts:

How to submit a task wrapped in a CmdProgram to a VPU through a Stream.
How the VPU signals the completion of a task to the host using SyncObjects and Fences.

These APIs enable us to use PVA in complex processing pipelines.

This tutorial walks you through the C++ and C API versions of the RunPVAProgram function.

Host Code

C++

The RunPVAProgram function submits a CmdProgram to the PVA and waits for its completion.
```
int RunPVAProgram(cupva::CmdProgram &program)
{
```
We first create a SyncObj that stores data needed for synchronization of PVA with the CPU. A SyncObj can be thought of as a monotonically increasing integer which increases each time it is signaled.
```
    cupva::SyncObj sync = cupva::SyncObj::Create();
```
Fences are used to synchronize between PVA programs. In this example, we use one to signal completion of a VPU task to the host application. A Fence represents a future state of a SyncObj. Waiters on the Fence are released when the SyncObj reaches or exceeds the state represented by the Fence. A newly created Fence is not yet associated with any future SyncObj state, so waiting on it is an error. The next step shows how to populate a Fence with a future SyncObj state.
```
    cupva::Fence fence{sync};
```
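The relationship between a SyncObj and a Fence described above can be pictured with a small toy model. This is only a sketch in standard C++: `ToySyncObj`, `ToyFence`, `setTarget`, and `isSignaled` are hypothetical names invented for illustration and are not part of the cuPVA API, and the real objects are backed by hardware and driver state rather than a plain integer.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy stand-in for a SyncObj: a counter that can only move forward.
struct ToySyncObj {
    std::uint64_t value = 0;

    // Signal the object by advancing it to newValue; it never moves backwards.
    void signal(std::uint64_t newValue) {
        if (newValue > value) value = newValue;
    }
};

// Toy stand-in for a Fence: a SyncObj plus the future value it waits for.
class ToyFence {
public:
    explicit ToyFence(const ToySyncObj &sync) : sync_(&sync) {}

    // Attach the future SyncObj state this fence represents.
    void setTarget(std::uint64_t target) {
        target_ = target;
        hasTarget_ = true;
    }

    // True once the SyncObj has reached or exceeded the target.
    // Querying a fence that has no target yet is treated as an error.
    bool isSignaled() const {
        if (!hasTarget_) throw std::logic_error("fence has no target state");
        return sync_->value >= target_;
    }

private:
    const ToySyncObj *sync_;
    std::uint64_t target_ = 0;
    bool hasTarget_ = false;
};

// Usage: the fence only becomes signaled once the counter reaches its target.
inline void demoFence() {
    ToySyncObj sync;
    ToyFence fence(sync);
    fence.setTarget(1);
    assert(!fence.isSignaled());  // counter is still 0
    sync.signal(1);
    assert(fence.isSignaled());   // counter reached the target
}
```

In this model a fence constructed from a SyncObj starts without a target, which mirrors the rule that waiting on a freshly created Fence is an error.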
We wrap the Fence in a Cmd using the CmdRequestFences API so that we can submit it to the PVA along with the CmdProgram. When CmdRequestFences is submitted, it populates its Fence objects with a future state for their associated SyncObjs. From that point on, the Fences can be used to wait for this state. When CmdRequestFences is executed on the PVA, it writes the future states to the SyncObjs, which signals all associated Fences. Expiration of a Fence therefore means that all tasks submitted to the PVA stream before the CmdRequestFences have completed.
```
    cupva::CmdRequestFences rf{fence};
```
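The two phases described above (populating the Fence at submission, signaling the SyncObj at execution) can be separated in a toy model. This is a sketch under stated assumptions: all type and function names are hypothetical, and the rule used to pick the future state (the next unused counter value, tracked in `promised`) is an assumption of this example, not documented cuPVA behavior.

```cpp
#include <cassert>
#include <cstdint>

// Toy SyncObj: a forward-only counter plus the last value promised to a fence.
struct ToySyncObj {
    std::uint64_t value = 0;     // current state, advanced at execution time
    std::uint64_t promised = 0;  // highest future state handed out so far (assumption)
};

// Toy Fence: remembers which future value of the SyncObj it waits for.
struct ToyFence {
    ToySyncObj *sync = nullptr;
    std::uint64_t target = 0;
    bool populated = false;

    bool isSignaled() const { return populated && sync->value >= target; }
};

// Toy CmdRequestFences with its two phases split into two functions.
struct ToyRequestFences {
    ToyFence *fence;

    // Submission time (host side): reserve the next future state for the fence.
    void onSubmit() {
        fence->target = ++fence->sync->promised;
        fence->populated = true;
    }

    // Execution time (device side): write the promised state to the SyncObj.
    void onExecute() {
        if (fence->target > fence->sync->value) fence->sync->value = fence->target;
    }
};

// Usage: walk one fence through both phases.
inline void demoRequestFences() {
    ToySyncObj sync;
    ToyFence fence;
    fence.sync = &sync;
    ToyRequestFences rf{&fence};

    assert(!fence.populated);     // freshly created: no future state yet
    rf.onSubmit();
    assert(fence.populated);      // can be waited on from now on
    assert(!fence.isSignaled());  // but the command has not executed yet
    rf.onExecute();
    assert(fence.isSignaled());   // SyncObj reached the promised state
}
```

The gap between `onSubmit()` and `onExecute()` is the window in which a host thread would be blocked inside a wait call.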
We now have a CmdProgram describing the PVA task and a Fence Cmd to signal completion of the task. The next step is to submit these Cmds to the PVA using a Stream object. A Stream can be seen as a sequence of Cmds which execute on the PVA. The cupva::Stream::Create() API also allows setting the destination for the Cmds, which is covered in upcoming tutorials.
```
    cupva::Stream stream = cupva::Stream::Create();
```
The PVA engine starts processing tasks when it receives the Cmds submitted to the Stream. The cupva::Stream::submit() call is non-blocking. It returns as soon as it can, usually before its Cmds have even started executing on the PVA.
```
    stream.submit({&program, &rf}, nullptr);
```
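The non-blocking, in-order behavior of a submission can be illustrated with a toy queue. This is a sketch in standard C++ only: `ToyStream`, `runPending`, and `pendingCount` are hypothetical names, and the explicit `runPending()` call stands in for the PVA engine, which in reality drains the stream asynchronously without any host call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>

// Toy stand-in for a Stream: an ordered queue of commands.
class ToyStream {
public:
    using Cmd = std::function<void()>;

    // Enqueue commands and return immediately; nothing executes here.
    void submit(std::initializer_list<Cmd> cmds) {
        for (const Cmd &cmd : cmds) pending_.push_back(cmd);
    }

    // Stand-in for the engine draining the queue in submission order.
    void runPending() {
        while (!pending_.empty()) {
            Cmd cmd = pending_.front();
            pending_.pop_front();
            cmd();
        }
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::deque<Cmd> pending_;
};

// Usage: record the order in which things happen around a submission.
inline std::vector<std::string> demoSubmitOrder() {
    std::vector<std::string> log;
    ToyStream stream;
    stream.submit({[&log] { log.push_back("program"); },
                   [&log] { log.push_back("request fences"); }});
    assert(stream.pendingCount() == 2);  // both commands are queued, none has run
    log.push_back("submit returned");
    stream.runPending();
    return log;
}
```

Because the fence request is queued after the program, it cannot execute until the program has finished, which is why its completion implies that the earlier work in the stream is done.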
Now that we have submitted the tasks to the PVA, we wait until the fence is triggered. cupva::Fence::wait() blocks until the future state of the fence is reached. An optional timeout value (in microseconds) can be provided to put an upper limit on how long to wait for the fence.
```
    fence.wait();
    return 0;
}
```
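The effect of a timeout on a wait can be sketched with a toy polling loop. Everything here is an assumption made for illustration: `toyFenceWait` is a hypothetical helper, the boolean return value and the "negative means wait forever" convention are choices of this sketch, and a real wait would block on the driver rather than poll an atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Poll counter until it reaches target or timeoutUs microseconds have passed.
// A negative timeout means "no upper limit" in this sketch.
// Returns true if the target was reached, false if the wait timed out.
inline bool toyFenceWait(const std::atomic<std::uint64_t> &counter,
                         std::uint64_t target,
                         std::int64_t timeoutUs = -1) {
    using Clock = std::chrono::steady_clock;
    const auto deadline =
        Clock::now() + std::chrono::microseconds(timeoutUs < 0 ? 0 : timeoutUs);
    while (counter.load() < target) {
        if (timeoutUs >= 0 && Clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::microseconds(50));
    }
    return true;
}

// Usage: a bounded wait gives up, and a satisfied wait returns at once.
inline void demoTimedWait() {
    std::atomic<std::uint64_t> counter(0);
    assert(!toyFenceWait(counter, 1, 1000));  // nobody signals: times out
    counter.store(1);                         // stand-in for the device signaling
    assert(toyFenceWait(counter, 1, 1000));   // target already reached
}
```

A bounded wait lets the host detect a task that never completes instead of blocking indefinitely.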

C

The RunPVAProgram function submits a CmdProgram to the PVA and waits for its completion.
```
int RunPVAProgram(cupvaCmd_t *program)
{
    int32_t err = 0;
```
We first create a SyncObj that stores data needed for synchronization of PVA with the CPU. A SyncObj can be thought of as a monotonically increasing integer which increases each time it is signaled. We call the CupvaSyncObjCreate() API using common default options, which are discussed in more detail in future tutorials.
```
    cupvaSyncObj_t sync;
    CHECK_ERROR_GOTO(CupvaSyncObjCreate(&sync, false, CUPVA_SIGNALER_WAITER, CUPVA_SYNC_YIELD), err,
                     SyncObjCreateFailed);
```
Fences are used to synchronize between PVA programs. In this example, we use one to signal completion of a VPU task to the host application. A Fence represents a future state of a SyncObj. Waiters on the Fence are released when the SyncObj reaches or exceeds the state represented by the Fence. A newly created Fence is not yet associated with any future SyncObj state, so waiting on it is an error. The next step shows how to populate a Fence with a future SyncObj state.
```
    cupvaFence_t fence;
    CHECK_ERROR_GOTO(CupvaFenceInit(&fence, sync), err, StreamCreateFailed);
```
We wrap the Fence in a Cmd using the CmdRequestFences API so that we can submit it to the PVA along with the CmdProgram. When CmdRequestFences is submitted, it populates its Fence objects with a future state for their associated SyncObjs. From that point on, the Fences can be used to wait for this state. When CmdRequestFences is executed on the PVA, it writes the future states to the SyncObjs, which signals all associated Fences. Expiration of a Fence therefore means that all tasks submitted to the PVA stream before the CmdRequestFences have completed.
```
    cupvaCmd_t rf;
    CHECK_ERROR_GOTO(CupvaCmdRequestFencesInit(&rf, &fence, 1), err, StreamCreateFailed);
```
We now have a CmdProgram describing the PVA task and a Fence Cmd to signal completion of the task. The next step is to submit these Cmds to the PVA using a Stream object. A Stream can be seen as a sequence of Cmds which execute on the PVA. The CupvaStreamCreate() API allows setting the destination for the Cmds, which is covered in upcoming tutorials.
```
    cupvaStream_t stream;
    CHECK_ERROR_GOTO(CupvaStreamCreate(&stream, CUPVA_PVA0, CUPVA_VPU_ANY), err, StreamCreateFailed);
```
The PVA engine starts processing tasks when it receives the Cmds submitted to the Stream. The CupvaStreamSubmit() call is non-blocking. It returns as soon as it can, usually before its Cmds have even started executing on the PVA. We are using common default options, which are discussed in more detail in future tutorials.
```
    cupvaCmd_t const *cmds[2]  = {program, &rf};
    cupvaCmdStatus_t status[2] = {NULL, NULL};
    CHECK_ERROR_GOTO(CupvaStreamSubmit(stream, cmds, status, 2, CUPVA_IN_ORDER, -1, -1), err, DeAllocateAllResources);
```
Now that we have submitted the tasks to the PVA, we wait until the fence is triggered. CupvaFenceWait() blocks until the future state of the fence is reached. An optional timeout value (in microseconds) can be provided to put an upper limit on how long to wait for the fence.
```
    CHECK_ERROR_GOTO(CupvaFenceWait(&fence, -1, NULL), err, DeAllocateAllResources);
```

Make sure to release the resources allocated with Create calls to prevent leaks. The jump labels handle API call failures that occur at different stages of execution: each label deallocates only the resources that were allocated before that specific failure.

```
DeAllocateAllResources: /* clean up all allocated resources */
    CupvaStreamDestroy(stream);
StreamCreateFailed: /* clean up resources allocated prior to StreamCreate */
    CupvaSyncObjDestroy(sync);
SyncObjCreateFailed: /* clean up resources allocated prior to SyncObjCreate */
    return err;
}
```