Developing, Optimizing and Profiling a Complete PVA Program

This section provides a comprehensive guide to programming the Vector Processing Unit (VPU).

It begins with an unsharp masking kernel (an image enhancement algorithm) that demonstrates how to configure DMA for halo regions in Raster DataFlows and how to manage memory with circular buffers. Subsequent tutorials cover core VPU programming concepts, such as using the multi-dimensional Address Generators (AGEN) and vector operations for efficient data processing, by converting a scalar convolution to vector code.
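For orientation, the following is a minimal scalar sketch of unsharp masking in plain C. It is not the tutorial's PVA kernel: the function names, the 3x3 box blur, the Q8 fixed-point `amount_q8` parameter, and the replicate-border handling are all assumptions chosen for illustration. The border clamp stands in for the neighboring pixels that a halo would otherwise supply around each tile.

```c
#include <stdint.h>

/* Clamp an intermediate result to the 8-bit pixel range. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Rounded 3x3 box blur at (x, y). Out-of-range coordinates are clamped
 * to the image border (replicate padding, an assumption of this sketch). */
static int box_blur_3x3(const uint8_t *src, int width, int height, int x, int y)
{
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int yy = y + dy;
            int xx = x + dx;
            if (yy < 0) yy = 0;
            if (yy >= height) yy = height - 1;
            if (xx < 0) xx = 0;
            if (xx >= width) xx = width - 1;
            sum += src[yy * width + xx];
        }
    }
    return (sum + 4) / 9;
}

/* Hypothetical scalar reference: dst = src + amount * (src - blur(src)).
 * amount_q8 is a Q8 fixed-point gain, so 256 means a gain of 1.0. */
void unsharp_mask_ref(const uint8_t *src, uint8_t *dst,
                      int width, int height, int amount_q8)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int orig = src[y * width + x];
            int blur = box_blur_3x3(src, width, height, x, y);
            int detail = orig - blur; /* high-frequency component */
            dst[y * width + x] = clamp_u8(orig + (amount_q8 * detail) / 256);
        }
    }
}
```

A scalar reference like this is mainly useful as a correctness baseline against which an optimized kernel's output can be compared.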

You then learn advanced optimization techniques, including Speed-of-Light (SOL) performance analysis, examining disassembled code, and applying methods like loop unrolling and software pipelining.
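As a rough illustration of loop unrolling, the sketch below shows a dot product in plain C, first as a simple loop and then unrolled by four with independent accumulators. The function names are hypothetical, and the code uses no VPU intrinsics; it only shows the general transformation, which shortens the dependency chain on a single accumulator and reduces loop overhead per element.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one multiply-accumulate per iteration, one accumulator. */
int32_t dot_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Unrolled by four: each accumulator forms its own dependency chain,
 * so the four multiply-accumulates in the body are independent. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i + 0] * b[i + 0];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }

    /* Remainder loop for lengths that are not a multiple of four. */
    for (; i < n; ++i) {
        acc0 += (int32_t)a[i] * b[i];
    }

    return acc0 + acc1 + acc2 + acc3;
}
```

Whether manual unrolling helps depends on the compiler and target; examining the disassembled code is the way to confirm what was actually generated.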

Finally, the section covers profiling tools and techniques for measuring application performance and identifying bottlenecks in your VPU programs.

This section includes the following tutorials: