Hello World! - Host, Device, Executable and CmdProgram#
The first tutorial is a simple “Hello World!” application. This example shows how PVA code is structured and how it is built.
In this first tutorial, we learn:
How to compile device and host side code to create a PVA application.
How to create an Executable object to hold the device side VPU binary.
How to wrap the Executable object in a CmdProgram that can be submitted to the PVA for execution.
Let’s walk through the code starting with the device code and steps to build it.
Device Code#
This is the simplest device-side code you can get. The VPU writes the “Hello World!” message to the printf buffer and returns.
Include required device-side (PVA side) header files.
#include <cupva_device.h>       /* Main device-side header file */
#include <cupva_device_debug.h> /* Header file containing the printf function */
CUPVA_VPU_MAIN defines the name of the entry-point function on the device. In other words, it is our main function.
CUPVA_VPU_MAIN()
{
printf is a useful feature for debugging the code running on the PVA side. Printed strings are directed to the standard output of the host-side application. In this first example, we just print the “Hello World!” message and exit the VPU program.
    printf("Hello World!\n");
    return 0;
}
Building the Device Code#
Device-side code can be built easily using the CMake commands installed with the PVA SDK. It is recommended that you first follow the instructions in installation and PVA SDK samples to ensure that the PVA SDK is installed and configured correctly.
CMake File#
The CMake files for the host and device sides are separate in this first tutorial, but it is perfectly fine to merge them. Let’s start with the device-side CMake file:
We first call the find_package(pva-sdk) command to detect the installed PVA SDK package before setting the name of the project.
cmake_minimum_required(VERSION ${CMAKE_MINIMUM_REQUIRED_VERSION})
find_package(pva-sdk REQUIRED)
project(hello_world)
The pva_device command is used to specify a target for the device-side code. The first argument is the name of the target; subsequent arguments are the list of files to build. As you will see shortly, we embed this target in the final executable binary while building the host-side application.
pva_device(hello_world_dev hello_world_top.c)
To submit device code to a VPU, the cuPVA host API is used. The cuPVA host API is exposed in functionally identical C and C++ variants. In each of the following sections, the C++ API version of the host code and its CMake file is presented first, followed by the C API version.
Host Code#
For the C++ API, first include the required host-side header files.
#include <cupva_host.hpp>           // Main host-side C++-API header file
#include <cupva_host_nonsafety.hpp> // Header file for VPU printf functionality.
#include <cupva_platform.h>         // Header that includes macros for specifying PVA executables
#include <iostream>
The PVA_DECLARE_EXECUTABLE macro is used to declare the PVA executable that is created when the device code is built. The input argument to the macro should match the device target specified with the pva_device command call in the CMake file.
PVA_DECLARE_EXECUTABLE(hello_world_dev)
In this first example, the host process submits a simple printf task to the PVA engine and waits for it to complete. The cuPVA C++ API reports errors that occur during initialization or at run time through C++ exceptions. It is good practice to call cuPVA APIs within a try/catch block and handle the exceptions.
int main()
{
    try
    {
The binary code that runs on the device (VPU) side is stored in an Executable object. cuPVA defines macros for getting the data pointer and size of the VPU binaries created by the build system. The pointer and size of the device executable declared above are passed as arguments to create the Executable object.
        cupva::Executable exec = cupva::Executable::Create(PVA_EXECUTABLE_DATA(hello_world_dev),
                                                           PVA_EXECUTABLE_SIZE(hello_world_dev));
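The declare, data, and size macros follow a common pattern for referring to a binary blob that the build system links into the host program. The sketch below only illustrates that pattern under stated assumptions: DEMO_DECLARE_BLOB, DEMO_BLOB_DATA, DEMO_BLOB_SIZE, the demo_dev symbols, and BlobByteSum are hypothetical names invented here, and nothing in it is meant to describe how the PVA SDK actually implements PVA_DECLARE_EXECUTABLE or its companion macros.

```cpp
#include <cstddef>

// Hypothetical macros, NOT the cuPVA ones. They use token pasting to derive
// a data symbol and a size symbol from a single target name.
#define DEMO_DECLARE_BLOB(name)               \
    extern const unsigned char name##_data[]; \
    extern const std::size_t name##_size;
#define DEMO_BLOB_DATA(name) (name##_data)
#define DEMO_BLOB_SIZE(name) (name##_size)

// Declare the blob by name, as the host code would.
DEMO_DECLARE_BLOB(demo_dev)

// In a real build these definitions would be generated by the build system.
// Here they are written by hand (assumption) so the sketch is self-contained.
const unsigned char demo_dev_data[] = {1, 2, 3, 4};
const std::size_t demo_dev_size = sizeof(demo_dev_data);

// Hypothetical consumer that only needs a pointer and a size.
std::size_t BlobByteSum(const unsigned char *data, std::size_t size)
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        sum += data[i];
    }
    return sum;
}
```

Calling BlobByteSum(DEMO_BLOB_DATA(demo_dev), DEMO_BLOB_SIZE(demo_dev)) passes the blob by name, in the same way the target name hello_world_dev is passed to the cuPVA macros.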
CmdProgram is the basic unit of work for submission to the PVA engine. cuPVA packs all the code and parameters needed to execute a PVA task in a CmdProgram. To execute a complex algorithm, a user may create multiple CmdPrograms that contain kernels corresponding to different stages of the task. Subsequent tutorials contain more examples to demonstrate ways to run and synchronize multiple CmdPrograms. The CmdProgram::Create call takes the Executable object as input and returns the created CmdProgram.
        cupva::CmdProgram prog = cupva::CmdProgram::Create(exec);
The output of printf functions executed on the VPU is directed to the standard output (stdout) of the host-side application. This redirection is buffered, so we need to set the size of the buffer that holds the printed message strings. In this example we set the VPUPrintBufferSize to 64 kilobytes, which is enough for most debugging tasks.
        cupva::SetVPUPrintBufferSize(64 * 1024);
It is now time to run the PVA task and wait for its completion. cuPVA provides APIs for fine-grained communication and synchronization between the host CPU and the PVA. To keep this first tutorial simple, we pack the synchronization API calls into the RunPVAProgram function. We expand the contents of the RunPVAProgram function in the next tutorial. The RunPVAProgram call blocks the host-side thread until the PVA task is completed.
int RunPVAProgram(cupva::CmdProgram &program)
{
We simply print the cupva::Exception error message and then exit with an error if an exception is caught.
    catch (cupva::Exception const &e)
    {
        std::cout << "Caught a cuPVA exception with message: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
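The try/catch shape of main() can be tried out without the SDK. The following is a minimal sketch, assuming a hypothetical SubmitWork function in place of the cuPVA calls and std::runtime_error in place of cupva::Exception; neither name comes from the SDK, and the sketch says nothing about how cupva::Exception itself is defined.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for a cuPVA call that may fail. Not part of the SDK.
void SubmitWork(bool shouldFail)
{
    if (shouldFail)
    {
        throw std::runtime_error("simulated submission failure");
    }
}

// Same shape as main() above: the work runs inside the try block, and the
// catch block reports the message and returns a non-zero status.
int RunGuarded(bool shouldFail)
{
    try
    {
        SubmitWork(shouldFail);
    }
    catch (std::exception const &e)
    {
        std::cerr << "Caught an exception with message: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
```

RunGuarded(false) returns 0 and RunGuarded(true) returns 1, so the return value can be used directly as the exit status of the process.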
For the C API, first include the required host-side header files.
#include <cupva_host.h>           /* Main host-side C-API header file */
#include <cupva_host_nonsafety.h> /* Header file for VPU printf functionality. */
#include <cupva_platform.h>       /* Header that includes macros for specifying PVA executables */
#include <stdio.h>
The CHECK_ERROR_GOTO macro defined below checks the error code returned by a cuPVA C-API call and stores it in the provided error-code variable. If the call results in an error, the macro prints the code and jumps to a label for resource deallocation.
/* Print the stored code (__e): expanding __v a second time would repeat the API call. */
#define CHECK_ERROR_GOTO(__v, __e, __l)                  \
    __e = __v;                                           \
    if (__e != CUPVA_ERROR_NONE)                         \
    {                                                    \
        printf("cuPVA C-API return error: %d\n", (__e)); \
        goto __l;                                        \
    }
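One thing to keep in mind with a macro of this kind is that every textual use of the call argument expands into another invocation of the call. The sketch below shows one possible way to guarantee a single evaluation and to make the macro behave as one statement. CHECK_ONCE_GOTO, FakeApiCall, g_callCount, and Demo are hypothetical names used only for illustration, and 0 is assumed to mean success; it is written as C-style code that also compiles as C++.

```cpp
#include <cstdio>

int g_callCount = 0; // counts how many times the fake call really runs

// Hypothetical API call: records the invocation and returns the given code.
int FakeApiCall(int result)
{
    ++g_callCount;
    return result;
}

// Hypothetical macro, not the SDK's. The call expression appears exactly
// once, and do/while(0) makes the expansion a single statement.
#define CHECK_ONCE_GOTO(call, errVar, label)                       \
    do                                                             \
    {                                                              \
        (errVar) = (call); /* only expansion of the call */        \
        if ((errVar) != 0)                                         \
        {                                                          \
            std::printf("call failed with error: %d\n", (errVar)); \
            goto label;                                            \
        }                                                          \
    } while (0)

// Resets the counter, performs one checked call, and returns its error code.
int Demo(int result)
{
    int err = 0;
    g_callCount = 0;
    CHECK_ONCE_GOTO(FakeApiCall(result), err, Failed);
    std::printf("call succeeded\n");
Failed:
    return err;
}
```

After Demo(7) returns, g_callCount is 1 even though the failure path printed the error code, because the stored value is printed rather than the call expression.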
The PVA_DECLARE_EXECUTABLE macro is used to declare the PVA executable that is created when the device code is built. The input argument to the macro should match the device target specified with the pva_device call in the CMake file.
PVA_DECLARE_EXECUTABLE(hello_world_dev)
In this first example, the host process submits a simple printf task to the PVA engine and waits for it to complete.
int main(int argc, char **argv)
{
    int32_t err = 0;
The binary code that runs on the device (VPU) side is stored in an Executable object. cuPVA defines macros for getting the data pointer and size of the VPU binaries created by the build system. The pointer and size of the device executable declared above are passed as arguments to create the Executable object.
    cupvaExecutable_t exec;
    CHECK_ERROR_GOTO(CupvaExecutableCreate(&exec, PVA_EXECUTABLE_DATA(hello_world_dev),
                                           PVA_EXECUTABLE_SIZE(hello_world_dev)),
                     err, ExecutableCreateFailed);
CmdProgram is the basic unit of work for submission to the PVA engine. cuPVA packs all the code and parameters needed to execute a PVA task in a CmdProgram. To execute a complex algorithm, a user may create multiple CmdPrograms that contain kernels corresponding to different stages of the task. Subsequent tutorials contain more examples to demonstrate ways to run and synchronize multiple CmdPrograms. The CupvaCmdProgramCreate() call takes the Executable as input and fills in the cupvaCmd_t structure passed to it by pointer.
    cupvaCmd_t prog;
    CHECK_ERROR_GOTO(CupvaCmdProgramCreate(&prog, exec), err, CmdProgramCreateFailed);
The output of printf functions executed on the VPU is directed to the standard output (stdout) of the host-side application. This redirection is buffered, so we need to set the size of the buffer that holds the printed message strings. In this example we set the VPUPrintBufferSize to 64 kilobytes, which is enough for most debugging tasks.
    CHECK_ERROR_GOTO(CupvaSetVPUPrintBufferSize(64 * 1024), err, DeAllocateAllResources);
It is now time to run the PVA task and wait for its completion. cuPVA provides APIs for fine-grained communication and synchronization between the host CPU and the PVA. To keep this first tutorial simple, we pack the synchronization API calls into the RunPVAProgram function. We expand the contents of the RunPVAProgram function in the next tutorial. The RunPVAProgram call blocks the host-side thread until the PVA task is completed.
int RunPVAProgram(cupvaCmd_t *program)
{
    int32_t err = 0;
Make sure to clean up the resources allocated with Create calls to prevent leaks. Jump labels are created to handle API call failures that occur at different stages of the execution and to deallocate all resources allocated prior to a specific failure.
DeAllocateAllResources: /* clean up all allocated resources */
    CupvaCmdDestroy(&prog);
CmdProgramCreateFailed: /* clean up resources allocated prior to CmdProgramCreate */
    CupvaExecutableDestroy(exec);
ExecutableCreateFailed: /* clean up resources allocated prior to ExecutableCreate */
    return err;
}
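The label ordering above is the key to the cleanup scheme: labels appear in the reverse order of allocation, so a jump to any label releases exactly the resources created before the failing step. The following is a self-contained sketch of that idea, assuming hypothetical CreateResource and DestroyResource helpers backed by malloc and free in place of the cuPVA Create and Destroy calls. It is C-style code that also compiles as C++.

```cpp
#include <cstdio>
#include <cstdlib>

int g_liveResources = 0; // number of resources currently allocated

// Hypothetical allocator standing in for a Create call. Returns 0 on success.
int CreateResource(void **out, bool fail)
{
    if (fail)
    {
        return -1;
    }
    *out = std::malloc(16);
    if (*out == nullptr)
    {
        return -1;
    }
    ++g_liveResources;
    return 0;
}

// Hypothetical counterpart standing in for a Destroy call.
void DestroyResource(void *resource)
{
    std::free(resource);
    --g_liveResources;
}

// Creates two resources in order. On failure, jumps to the label that
// releases only what was created before the failing step.
int RunStaged(bool failFirst, bool failSecond)
{
    int err = 0;
    void *first = nullptr;
    void *second = nullptr;

    err = CreateResource(&first, failFirst);
    if (err != 0)
    {
        goto FirstCreateFailed;
    }
    err = CreateResource(&second, failSecond);
    if (err != 0)
    {
        goto SecondCreateFailed;
    }
    std::printf("both resources created\n");

    DestroyResource(second); /* normal path falls through all cleanup steps */
SecondCreateFailed:
    DestroyResource(first);
FirstCreateFailed:
    return err;
}
```

Whichever step fails, g_liveResources is back to zero when RunStaged returns, which is the property the jump labels in the tutorial code are meant to provide.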
Building the Host Code#
The PVA SDK includes CMake scripts to enable building an application for target device/OS.
PVA SDK build scripts can handle cross compilation of host code using the argument -DPVA_BUILD_MODE. PVA_BUILD_MODE can be set to QNX or L4T (Linux for Tegra).
To build a PVA application for Linux, run:
$ cmake -DPVA_BUILD_MODE=L4T ..
$ make
CMake File#
The host-side CMake file for the C++ API version of the first tutorial includes the following steps:
We first call the find_package(pva-sdk) command to detect the installed PVA SDK package before setting the name of the project.
cmake_minimum_required(VERSION ${CMAKE_MINIMUM_REQUIRED_VERSION})
find_package(pva-sdk REQUIRED)
project(hello_world)
Let’s create an executable by building the host-side application code. add_executable_pva is a custom build command provided by the PVA SDK. Host source files listed under the HOST keyword are added to the target. Device targets are embedded in the host target as data blobs. The add_executable_pva command also handles linking against the cuPVA run-time libraries.
add_executable_pva(hello_world_cpp HOST hello_world.cpp DEVICE hello_world_dev)
Specify the STATIC_HOST option in the add_executable_pva command to link against the cuPVA run-time static libraries instead.
add_executable_pva(hello_world_cpp_static STATIC_HOST HOST ../host_cpp_api/hello_world.cpp DEVICE hello_world_dev)
The host-side CMake file for the C API version of the first tutorial includes the following steps:
We first call the find_package(pva-sdk) command to detect the installed PVA SDK package before setting the name of the project.
cmake_minimum_required(VERSION ${CMAKE_MINIMUM_REQUIRED_VERSION})
find_package(pva-sdk REQUIRED)
project(hello_world)
Let’s create an executable by building the host-side application code. add_executable_pva is a custom build command provided by cuPVA. Host source files listed under the HOST keyword are added to the target. Device targets are embedded in the host target as data blobs. The add_executable_pva command also handles linking against the cuPVA run-time libraries.
add_executable_pva(hello_world_c HOST hello_world.c DEVICE hello_world_dev)
Output#
Running the C++ application on the target should print the “Hello World!” message.
$ ./hello_world_cpp
Hello World!
Running the C application on the target should print the “Hello World!” message.
$ ./hello_world_c
Hello World!