VMEM Scalars and CheckCmdStatus

Let's tweak the “Hello World!” application we developed earlier. This time, the VPU prints a different message depending on the value of an index variable passed by the host. The VPU application returns an error code if the message index is out of bounds.

This tutorial demonstrates the following concepts:

- How the host CPU sets scalar values in PVA internal memory, i.e., VMEM.
- How the VPU returns error codes to the host application if a problem arises when executing the CmdProgram.
- How to submit multiple CmdPrograms in a batch.
- How to reuse CmdProgram objects and submit them multiple times.

Let's walk through the code, starting with the device code.

Device Code

The VPU task prints a message depending on the message index provided by the host-side application.

Include required device side (PVA side) header files first.

#include <cupva_device.h>       /* Main device-side header file */
#include <cupva_device_debug.h> /* Header file containing the printf function */

VMEM is the internal Vector Memory of the PVA. VPU engines can load from and store to this low-latency, high-bandwidth memory. VPUs cannot directly access data stored in external DRAM. Upcoming tutorials show how large chunks of data can be moved to and from VMEM.

In this tutorial, we declare a scalar variable in the internal memory using the cuPVA “VMEM” macro. VMEM consists of multiple superbanks, namely A, B, and C. The first argument of the “VMEM” macro selects the superbank to use for the variable. The second and third arguments provide the type of the scalar and its name. The host can set the scalar by its name, as shown in the host code.
```
VMEM(A, int32_t, messageIndex);
```
Define the set of messages that the VPU can print depending on messageIndex. NUM_MESSAGES is used to check whether the requested messageIndex is within bounds.
```
char const *g_messages[] = {"Hello World!", "Welcome to PVA!"};
#define NUM_MESSAGES ((int32_t)(sizeof(g_messages) / sizeof(g_messages[0])))
```

Again, the main function is very simple. It prints the selected message. If the provided messageIndex is out of bounds, it returns a non-zero value. A non-zero return value triggers Error::VpuApplicationError in the CmdStatus that the host application receives.

CUPVA_VPU_MAIN()
{
    if (messageIndex >= 0 && messageIndex < NUM_MESSAGES) /* messageIndex is signed, so check both ends */
    {
        printf("%s\n", g_messages[messageIndex]);
    }
    else
    {
        printf("messageIndex (%d) is out of bounds (max %d)\n", messageIndex, NUM_MESSAGES - 1);
        return 1;
    }
    return 0;
}
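
The bounds check and return-code convention above can be exercised on an ordinary host compiler before running on the VPU. The following is a minimal sketch under that assumption: `kMessages`, `kNumMessages`, `selectMessage`, and `vpuMainReturnCode` are hypothetical names introduced for illustration and are not part of cuPVA. Because messageIndex is a signed 32-bit integer, the helper checks both ends of the valid range.

```cpp
#include <cstdint>

// Same message table as the device code, reproduced for a host-side check.
static char const *const kMessages[] = {"Hello World!", "Welcome to PVA!"};
constexpr int32_t kNumMessages = static_cast<int32_t>(sizeof(kMessages) / sizeof(kMessages[0]));

// Hypothetical helper (not a cuPVA API): returns the message for a valid
// index, or nullptr when the index is negative or too large.
char const *selectMessage(int32_t messageIndex)
{
    if (messageIndex < 0 || messageIndex >= kNumMessages)
    {
        return nullptr;
    }
    return kMessages[messageIndex];
}

// Mirrors the return-code convention of the VPU main function:
// 0 on success, non-zero when the index is out of bounds.
int32_t vpuMainReturnCode(int32_t messageIndex)
{
    return selectMessage(messageIndex) != nullptr ? 0 : 1;
}
```

With two messages in the table, indices 0 and 1 map to a message, while 2 and any negative value produce the non-zero return code.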

Host Code

C++

In the host-side “main” function, we first create an executable object, as in the “Hello World!” example in the Hello World tutorial.

#include <cupva_host.hpp>           // Main host-side C++-API header file
#include <cupva_host_nonsafety.hpp> // Header file for VPU printf functionality.

#include <cupva_platform.h> // Header that includes macros for specifying PVA executables

#include <iostream>

using namespace cupva;

PVA_DECLARE_EXECUTABLE(vmem_and_batch_submit_dev)

int main()
{
    try
    {
        Executable exec = Executable::Create(PVA_EXECUTABLE_DATA(vmem_and_batch_submit_dev),
                                             PVA_EXECUTABLE_SIZE(vmem_and_batch_submit_dev));

In this example, we submit three CmdPrograms to a VPU as a batch. The submitted CmdPrograms are executed on the VPU consecutively. Batch submission is useful when the user wants multiple passes of the same or different processing kernels without additional involvement from the host. Applications that need host-side intervention between VPU tasks may choose to submit the CmdPrograms one by one instead.
```
        constexpr uint32_t programCount{3};
        CmdProgram programs[programCount];
        for (uint32_t i = 0; i < programCount; i++)
        {
            programs[i] = CmdProgram::Create(exec);
```
The same VPU executable is reused to create all of the CmdPrograms, but each one is configured with different parameter settings. The messageIndex integer variable, the VMEM scalar defined in the device code, can be set using the variable name as the key. Here, messageIndex is set to the submission-order index of the CmdProgram. The VPU task reads the messageIndex value when the CmdProgram is executed on the VPU.
```
            programs[i]["messageIndex"] = i;
        }
```
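
One way to picture this is that every CmdProgram carries its own copy of the named scalars, so setting messageIndex on one program does not affect the others. The following is a minimal sketch of that idea only; `MockProgram` is a hypothetical stand-in written for illustration and says nothing about how cuPVA stores parameters internally.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a CmdProgram: each instance owns its own copy of
// the named scalar parameters declared by the (mock) executable.
class MockProgram
{
public:
    // The set of valid names is fixed at construction time.
    explicit MockProgram(std::map<std::string, int32_t> defaults) : m_params(std::move(defaults))
    {
    }

    // Name-keyed access; unknown names are rejected rather than created.
    int32_t &operator[](std::string const &name)
    {
        auto it = m_params.find(name);
        if (it == m_params.end())
        {
            throw std::out_of_range("unknown parameter: " + name);
        }
        return it->second;
    }

private:
    std::map<std::string, int32_t> m_params;
};
```

In this model, two programs built from the same defaults can hold different messageIndex values, and a misspelled parameter name fails loudly instead of silently creating a new entry.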

The synchronization objects and the Stream are set up as in the previous tutorials.

        SetVPUPrintBufferSize(64 * 1024);
        SyncObj sync = SyncObj::Create();
        Fence fence{sync};
        CmdRequestFences rf{fence};
        Stream stream = Stream::Create();

Three CmdPrograms and a trailing CmdRequestFences, which signals their completion, are submitted to the stream. This time, we also submit a CmdStatus array along with the Cmds. The status of each device-side Cmd execution is reported through the CmdStatus objects. The size of the submitted CmdStatus array should equal the number of Cmds so that we get a status update for each Cmd. Here, we submit three CmdPrograms and one CmdRequestFences, so the CmdStatus array size is four.
```
        CmdStatus status[programCount + 1];
        stream.submit({&programs[0], &programs[1], &programs[2], &rf}, status);
```
When the fence expires, the messages requested by the first two CmdPrograms should have been printed successfully, and the CmdStatus code for both is Error::None. For the third CmdProgram, messageIndex was out of bounds, so cuPVA reports this by setting the CmdStatus code to Error::VpuApplicationError. See the cupva::Error API reference for other error codes, such as VpuIllegalInstruction and VpuDivideByZero.

Note that in submissions containing multiple commands, only the failing program reports the actual reason for failure in its output status. Other programs submitted in the same batch may report cupva::Error::AbortedCmdBuffer. In cupva::OrderType::OUT_OF_ORDER mode, this does not necessarily indicate that the command itself was aborted, as it may have been partially or fully executed before the abort occurred.
```
        fence.wait();
        for (uint32_t i = 0; i < programCount; i++)
        {
            cupva::Error statusCode = CheckCommandStatus(status[i]);
            if (statusCode != cupva::Error::AbortedCmdBuffer)
            {
                std::cout << "VPU Program-" << i << " that caused the abort returned an Error Code: " << (int32_t)statusCode << std::endl;
            }
            else
            {
                std::cout << "VPU Program-" << i << " was aborted with an Error Code: " << (int32_t)statusCode << std::endl;
            }
        }
```
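
When scanning a batch of statuses, it can help to separate the command that actually failed from commands that were merely aborted as a consequence. The following is a minimal sketch of that classification. `MockError` and `findRootCause` are hypothetical names, and the enumerator values are placeholders that do not correspond to the real cupva::Error values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical status codes for illustration only; the real codes are
// defined by cupva::Error.
enum class MockError : int32_t
{
    None,
    VpuApplicationError,
    AbortedCmdBuffer
};

// Returns the index of the first command that reports a genuine failure
// (neither None nor AbortedCmdBuffer), or -1 if there is none.
int32_t findRootCause(std::vector<MockError> const &statuses)
{
    for (std::size_t i = 0; i < statuses.size(); i++)
    {
        if (statuses[i] != MockError::None && statuses[i] != MockError::AbortedCmdBuffer)
        {
            return static_cast<int32_t>(i);
        }
    }
    return -1;
}
```

For a batch whose statuses are None, None, VpuApplicationError, AbortedCmdBuffer, this helper returns 2, which is the position of the third CmdProgram in this tutorial.
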
cuPVA also allows reconfiguring and reusing the same CmdProgram object and submitting it multiple times. Let’s set a different messageIndex value for programs[0] and submit it one more time.
```
        programs[0]["messageIndex"] = 1;
        stream.submit({&programs[0], &rf}, nullptr);
        fence.wait();
```

We simply print the cupva::Exception error message and then exit with an error if an exception is caught. In a real application, exceptions should be handled more carefully.

    }
    catch (cupva::Exception const &e)
    {
        std::cout << "Caught a cuPVA exception with message: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}

C

In the host-side “main” function, we first create an executable object, as in the “Hello World!” example in the Hello World tutorial.

#include <cupva_host.h>           /* Main host-side C-API header file */
#include <cupva_host_nonsafety.h> /* Header file for VPU printf functionality. */
#include <cupva_platform.h>       /* Header that includes macros for specifying PVA executables */
#include <stdio.h>

#define CHECK_ERROR_GOTO(__v, __e, __l)                  \
    __e = __v;                                           \
    if (__e != CUPVA_ERROR_NONE)                         \
    {                                                    \
        printf("cuPVA C-API return error: %d\n", (__e)); \
        goto __l;                                        \
    }

PVA_DECLARE_EXECUTABLE(vmem_and_batch_submit_dev)

#define PROGRAM_COUNT 3

int main(int argc, char **argv)
{
    int32_t err                 = 0;
    int32_t createdProgramCount = 0;

    cupvaExecutable_t exec;
    CHECK_ERROR_GOTO(CupvaExecutableCreate(&exec, PVA_EXECUTABLE_DATA(vmem_and_batch_submit_dev),
                                           PVA_EXECUTABLE_SIZE(vmem_and_batch_submit_dev)),
                     err, ExecutableCreateFailed);

In this example, we submit three CmdPrograms to a VPU as a batch. The submitted CmdPrograms are executed on the VPU consecutively. Batch submission is useful when the user wants multiple passes of the same or different processing kernels without additional involvement from the host. Applications that need host-side intervention between VPU tasks may choose to submit the CmdPrograms one by one instead.
```
    cupvaCmd_t programs[PROGRAM_COUNT];

    cupvaParameter_t messageIndexParameter;
    for (; createdProgramCount < PROGRAM_COUNT; createdProgramCount++)
    {
        CHECK_ERROR_GOTO(CupvaCmdProgramCreate(&programs[createdProgramCount], exec), err, CmdProgramCreateFailed);
```
The same VPU executable is reused to create all of the CmdPrograms, but each one is configured with different parameter settings. The messageIndex integer variable, the VMEM scalar defined in the device code, can be set by first getting a parameter handle with the CupvaCmdProgramGetParameter API and then setting the value with the CupvaParameterSetValueScalar API. Here, messageIndex is set to the submission-order index of the CmdProgram. The VPU task reads the messageIndex value when the CmdProgram is executed on the VPU.
```
        CHECK_ERROR_GOTO(
            CupvaCmdProgramGetParameter(&programs[createdProgramCount], &messageIndexParameter, "messageIndex"), err,
            CmdProgramCreateFailed);
        CHECK_ERROR_GOTO(CupvaParameterSetValueScalar(&messageIndexParameter, &createdProgramCount, sizeof(int32_t)),
                         err, CmdProgramCreateFailed);
    }
```
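
The scalar is passed as a pointer plus a byte count, so the size argument should match the type declared on the device side (int32_t for messageIndex). The following is a minimal sketch of a size-checked copy that shows why a mismatch matters. `MockScalarSlot` and `setScalar` are hypothetical, and the sketch makes no claim about the checks that CupvaParameterSetValueScalar itself performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a scalar parameter slot; not a cuPVA data structure.
struct MockScalarSlot
{
    unsigned char storage[8]; // backing bytes for the scalar
    std::size_t size;         // size of the declared device-side type
};

// Copies `size` bytes from `value` into the slot only when `size` matches the
// declared size. Returns true on success, false on a null pointer or mismatch.
bool setScalar(MockScalarSlot &slot, void const *value, std::size_t size)
{
    if (value == nullptr || size != slot.size || size > sizeof(slot.storage))
    {
        return false;
    }
    std::memcpy(slot.storage, value, size);
    return true;
}
```

In this model, writing an int32_t into a slot declared as int32_t succeeds, while passing a 64-bit value with sizeof(int64_t) is rejected instead of overwriting neighboring bytes.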

The synchronization objects and the Stream are set up as in the previous tutorials.

    CHECK_ERROR_GOTO(CupvaSetVPUPrintBufferSize(64 * 1024), err, SyncObjCreateFailed);

    cupvaSyncObj_t sync;
    CHECK_ERROR_GOTO(CupvaSyncObjCreate(&sync, false, CUPVA_SIGNALER_WAITER, CUPVA_SYNC_YIELD), err,
                     SyncObjCreateFailed);

    cupvaFence_t fence;
    CHECK_ERROR_GOTO(CupvaFenceInit(&fence, sync), err, StreamCreateFailed);

    cupvaCmd_t rf;
    CHECK_ERROR_GOTO(CupvaCmdRequestFencesInit(&rf, &fence, 1), err, StreamCreateFailed);

    cupvaStream_t stream;
    CHECK_ERROR_GOTO(CupvaStreamCreate(&stream, CUPVA_PVA0, CUPVA_VPU_ANY), err, StreamCreateFailed);

Three CmdPrograms and a trailing CmdRequestFences, which signals their completion, are submitted to the stream. This time, we also submit a CmdStatus array along with the Cmds. The status of each device-side Cmd execution is reported through the CmdStatus objects. The size of the submitted CmdStatus array should equal the number of Cmds so that we get a status update for each Cmd. Here, we submit three CmdPrograms and one CmdRequestFences, so the CmdStatus array size is four.
```
    cupvaCmd_t const *cmds[PROGRAM_COUNT + 1]  = {&programs[0], &programs[1], &programs[2], &rf};
    cupvaCmdStatus_t status[PROGRAM_COUNT + 1] = {NULL, NULL, NULL, NULL};
    CHECK_ERROR_GOTO(CupvaStreamSubmit(stream, cmds, status, PROGRAM_COUNT + 1, CUPVA_IN_ORDER, -1, -1), err,
                     DeAllocateAllResources);
```
When the fence expires, the messages requested by the first two CmdPrograms should have been printed successfully, and the CmdStatus code for both is CUPVA_ERROR_NONE. For the third CmdProgram, messageIndex was out of bounds, so cuPVA reports this by setting the CmdStatus code to CUPVA_VPU_APPLICATION_ERROR. See the cupvaError_t API reference for other error codes, such as CUPVA_VPU_ILLEGAL_INSTRUCTION and CUPVA_VPU_DIVIDE_BY_ZERO.

Note that in submissions containing multiple commands, only the failing program reports the actual reason for failure in its output status. Other programs submitted in the same batch may report CUPVA_ABORTED_CMD_BUFFER (a cupvaError_t value). In CUPVA_OUT_OF_ORDER mode (a cupvaOrderType_t value), this does not necessarily indicate that the command itself was aborted, as it may have been partially or fully executed before the abort occurred.
```
    bool waitSuccess;
    CHECK_ERROR_GOTO(CupvaFenceWait(&fence, -1, &waitSuccess), err, DeAllocateAllResources);

    for (int i = 0; i < PROGRAM_COUNT; i++)
    {
        cupvaError_t statusCode = CUPVA_ABORTED_CMD_BUFFER;
        CupvaCheckCommandStatus(status[i], &statusCode);
        if (statusCode != CUPVA_ABORTED_CMD_BUFFER)
        {
            printf("VPU Program-%d that caused the abort returned an Error Code: %d\n", i, (int32_t)statusCode);
        }
        else
        {
            printf("VPU Program-%d was aborted with an Error Code: %d\n", i, (int32_t)statusCode);
        }
    }
```

cuPVA also allows reconfiguring and reusing the same CmdProgram object and submitting it multiple times. Let’s set a different messageIndex value for programs[0] and submit it one more time.

    /* The stream and sync object already exist here, so a failure must take the full cleanup path. */
    CHECK_ERROR_GOTO(CupvaCmdProgramGetParameter(&programs[0], &messageIndexParameter, "messageIndex"), err,
                     DeAllocateAllResources);
    int32_t messageIndex = 1;
    CHECK_ERROR_GOTO(CupvaParameterSetValueScalar(&messageIndexParameter, &messageIndex, sizeof(int32_t)), err,
                     DeAllocateAllResources);

    cmds[0] = &programs[0];
    cmds[1] = &rf;

    CHECK_ERROR_GOTO(CupvaStreamSubmit(stream, cmds, NULL, 2, CUPVA_IN_ORDER, -1, -1), err, DeAllocateAllResources);

    CHECK_ERROR_GOTO(CupvaFenceWait(&fence, -1, &waitSuccess), err, DeAllocateAllResources);

Make sure to clean up the resources allocated with Create calls to prevent leaks.

DeAllocateAllResources: /* clean up all allocated resources */
    CupvaStreamDestroy(stream);
StreamCreateFailed: /* clean up resources allocated prior to StreamCreate */
    CupvaSyncObjDestroy(sync);
SyncObjCreateFailed:    /* clean up resources allocated prior to SyncObjCreate */
CmdProgramCreateFailed: /* clean up resources allocated prior to CmdProgramCreate */
    for (int i = 0; i < createdProgramCount; i++)
    {
        CupvaCmdDestroy(&programs[i]);
    }
    CupvaExecutableDestroy(exec);
ExecutableCreateFailed: /* clean up resources allocated prior to ExecutableCreate */
    return err;
}

Output

C++

The output of the application should look similar to the following; the exact wording of the host-side status line depends on the message strings in the host code:

$ ./vmem_and_batch_submit_cpp
Hello World!
Welcome to PVA!
messageIndex (2) is out of bounds (max 1)
VPU Program-2 returned an Error Code: 11
Welcome to PVA!

C

The output of the application should look similar to the following; the exact wording of the host-side status line depends on the message strings in the host code:

$ ./vmem_and_batch_submit_c
Hello World!
Welcome to PVA!
messageIndex (2) is out of bounds (max 1)
VPU Program-2 returned an Error Code: 11
Welcome to PVA!