ImageBlend

Overview

In image processing, blending combines two or more images into a new image. It is widely used in graphic design, photography, and computer vision, both for artistic effects and to enhance visual information.

Reference Implementation

The blending operation is defined by the following equation:

\[\text{dst} = (1 - \alpha) \cdot \text{img1} + \alpha \cdot \text{img2}\]

Where:

  • \(\text{dst}\) is the resulting image after blending.

  • \(\text{img1}\) is the first source image.

  • \(\text{img2}\) is the second source image.

  • \(\alpha\) is the blending factor, ranging from 0 to 1.
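To make the equation concrete, here is a minimal floating-point sketch of the per-sample computation in C. The function names are hypothetical, and rounding to nearest is an illustrative assumption. This is not the library's C reference, which works on a quantized \(\alpha\).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: floating-point blend of one 8-bit sample following
 * dst = (1 - alpha) * img1 + alpha * img2, rounded to nearest. */
static uint8_t blend_sample_f32(uint8_t p1, uint8_t p2, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    float v = (1.0f - alpha) * (float)p1 + alpha * (float)p2;
    /* With alpha in [0, 1], v is a convex combination of p1 and p2, so it
     * stays in [0, 255]; add 0.5 to round, then clamp the upper end. */
    v += 0.5f;
    if (v > 255.0f)
        v = 255.0f;
    return (uint8_t)v;
}

/* Blend n samples, applying the same scalar alpha to every sample. */
static void blend_f32(const uint8_t *img1, const uint8_t *img2, uint8_t *dst,
                      size_t n, float alpha)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = blend_sample_f32(img1[i], img2[i], alpha);
}
```

For example, blending a black sample (0) with a white sample (255) at \(\alpha = 0.5\) gives 127.5, which rounds to 128.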

For both the C reference and the VPU implementation, the blending factor \(\alpha\) is quantized to the \(Q1.7\) fixed-point format, which ensures that the two produce bit-exact results.

For more information, see [1].
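A fixed-point version of the same blend can be sketched as follows. It assumes that \(Q1.7\) means \(\alpha\) is stored as the integer \(a = \text{round}(\alpha \cdot 2^7)\), so \(\alpha = 1.0\) maps to 128, and that the product is rounded by adding \(2^6 = 64\) before the right shift. Both the rounding mode and the helper names are assumptions for illustration; the library's actual rounding may differ.

```c
#include <assert.h>
#include <stdint.h>

#define ALPHA_FRAC_BITS 7                  /* Q1.7: 7 fractional bits (assumed) */
#define ALPHA_ONE (1 << ALPHA_FRAC_BITS)   /* 1.0 in Q1.7 == 128 */

/* Hypothetical helper: quantize alpha in [0, 1] to Q1.7, rounding to nearest. */
static uint8_t quantize_alpha_q17(float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    return (uint8_t)(alpha * ALPHA_ONE + 0.5f);
}

/* Hypothetical helper: integer-only blend of one 8-bit sample.
 * The accumulator peaks at 128 * 255 = 32640, so 32 bits are ample.
 * The +64 rounding offset is an assumption. */
static uint8_t blend_sample_q17(uint8_t p1, uint8_t p2, uint8_t a)
{
    assert(a <= ALPHA_ONE);
    uint32_t acc = (uint32_t)(ALPHA_ONE - a) * p1 + (uint32_t)a * p2;
    return (uint8_t)((acc + (ALPHA_ONE >> 1)) >> ALPHA_FRAC_BITS);
}
```

Because only integer operations are involved, any two implementations that agree on the quantized \(a\) and the rounding rule produce identical output. Quantization does change the effective factor slightly: \(\alpha = 0.3\) becomes \(a = 38\), i.e. \(38/128 = 0.296875\).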

Implementation Details

Limitations

The supported ranges of the image blend parameters are as follows:

  • The source images \(\text{img1}\) and \(\text{img2}\) and the destination image \(\text{dst}\) must have the same resolution and image format.

The supported image formats are shown in the following table.

Supported Image Formats

| Image format | Allowed |
|--------------|---------|
| U8           | Yes     |
| YUYV         | Yes     |
| UYVY         | Yes     |
| VYUY         | Yes     |
| YUV8p        | Yes     |
| BGR8         | Yes     |
| RGB8         | Yes     |
| BGRA8        | Yes     |
| RGBA8        | Yes     |
| BGR8p        | Yes     |
| RGB8p        | Yes     |

Dataflow Configuration

Each image plane requires three RasterDataFlows (RDFs):

  • One input RDF splits the source image \(\text{img1}\) into 64x64-pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.

  • One input RDF splits the source image \(\text{img2}\) into 64x64-pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.

  • One output RDF transfers the tiles of the destination image \(\text{dst}\) from a ping-pong buffer in VMEM back to DRAM.

The number of planes in an image is determined by its format, and a single image can have at most 4 planes. An image blend therefore uses at most 12 RDFs (3 RDFs per plane for 4 planes).
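The tile and RDF counts follow from simple arithmetic, sketched below with hypothetical helper names. The sketch assumes that partial tiles at the right and bottom edges count as full tiles, and that the plane has the full image resolution; subsampled chroma planes, if any, would have smaller dimensions.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W 64
#define TILE_H 64
#define RDFS_PER_PLANE 3   /* img1 input, img2 input, dst output */
#define MAX_PLANES 4

/* Number of 64x64 tiles covering a plane; partial edge tiles count as full. */
static int32_t tiles_per_plane(int32_t width, int32_t height)
{
    assert(width > 0 && height > 0);
    int32_t tiles_x = (width + TILE_W - 1) / TILE_W;   /* ceiling division */
    int32_t tiles_y = (height + TILE_H - 1) / TILE_H;
    return tiles_x * tiles_y;
}

/* Total RDFs for an image with the given number of planes. */
static int32_t rdf_count(int32_t num_planes)
{
    assert(num_planes >= 1 && num_planes <= MAX_PLANES);
    return RDFS_PER_PLANE * num_planes;
}
```

For a 1920x1080 plane this gives 30 tiles across and 17 tiles down (1080 / 64 = 16.875, rounded up), i.e. 510 tiles per RDF, and a 4-plane image needs `rdf_count(4) == 12` RDFs.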

Buffer Allocation

Three VMEM buffers are needed:

  • One double-buffered input buffer for the tiles of the source image \(\text{img1}\).

  • One double-buffered input buffer for the tiles of the source image \(\text{img2}\).

  • One double-buffered output buffer for the tiles of the destination image \(\text{dst}\).
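The VMEM footprint of these buffers for a single-plane image can be estimated as below. This is a rough sketch under stated assumptions: each buffer holds two 64x64-pixel tiles, and any padding or alignment that the real allocator may add is ignored. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_W 64
#define TILE_H 64
#define NUM_BUFFERS 3   /* img1 input, img2 input, dst output */
#define PING_PONG 2     /* double buffering: two tile slots per buffer */

/* Estimated VMEM bytes for the three double-buffered tile buffers of a
 * single-plane image; ignores padding and alignment. */
static size_t vmem_bytes_single_plane(size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (size_t)NUM_BUFFERS * PING_PONG * TILE_W * TILE_H * bytes_per_pixel;
}
```

Under these assumptions, a U8 image (1 byte per pixel) needs 3 × 2 × 4096 = 24576 bytes, and an interleaved RGBA8 image (assumed 4 bytes per pixel in one plane) needs 98304 bytes.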

Kernel Implementation

The kernel computes the blend with the dvblend instruction, which combines the scalar \(\alpha\) value with corresponding pixels of \(\text{img1}\) and \(\text{img2}\) to produce the output pixels.

Ping-pong buffers allow the VPU to process one tile while DMA transfers the next, overlapping computation with data movement to hide transfer latency and improve throughput.
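The ping-pong scheme can be sketched in plain C as follows. This is a hypothetical, sequential model rather than VPU code. `memcpy` stands in for the input and output DMA transfers. The image is treated as a flat array split into 4096-sample chunks (the size of one 64x64 U8 tile) instead of true 2D tiles. The per-sample arithmetic uses an assumed Q1.7 rounding (\(\alpha = 1.0\) encoded as 128, rounding offset 64) instead of dvblend. On the PVA, the fetch of the next chunk would run concurrently with the computation on the current one. Here the code only shows how the two buffer halves alternate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 4096  /* samples per chunk: one 64x64 U8 tile */

/* Assumed Q1.7 blend of one sample (a == 128 means alpha == 1.0). */
static uint8_t blend_q17(uint8_t p1, uint8_t p2, uint8_t a)
{
    uint32_t acc = (uint32_t)(128 - a) * p1 + (uint32_t)a * p2;
    return (uint8_t)((acc + 64) >> 7);
}

/* Stand-in for an input DMA: copy chunk idx (possibly partial) into buf. */
static size_t dma_in(const uint8_t *src, size_t n, size_t idx, uint8_t *buf)
{
    size_t off = idx * CHUNK;
    size_t len = (n - off < CHUNK) ? n - off : CHUNK;
    memcpy(buf, src + off, len);
    return len;
}

/* Blend n samples of img1 and img2 into dst through ping-pong buffers.
 * The buffers are static (about 24 KB), so this is not reentrant. */
static void blend_ping_pong(const uint8_t *img1, const uint8_t *img2,
                            uint8_t *dst, size_t n, uint8_t alpha_q17)
{
    static uint8_t in1[2][CHUNK], in2[2][CHUNK], out[2][CHUNK];
    size_t len[2] = {0, 0};
    size_t chunks = (n + CHUNK - 1) / CHUNK;
    assert(alpha_q17 <= 128);
    if (chunks == 0)
        return;

    /* Prologue: fetch chunk 0 into the "ping" half. */
    len[0] = dma_in(img1, n, 0, in1[0]);
    dma_in(img2, n, 0, in2[0]);

    for (size_t i = 0; i < chunks; ++i) {
        int cur = (int)(i & 1), nxt = cur ^ 1;
        /* Fetch chunk i + 1 into the idle half (concurrent on hardware). */
        if (i + 1 < chunks) {
            len[nxt] = dma_in(img1, n, i + 1, in1[nxt]);
            dma_in(img2, n, i + 1, in2[nxt]);
        }
        /* Compute on the current half. */
        for (size_t k = 0; k < len[cur]; ++k)
            out[cur][k] = blend_q17(in1[cur][k], in2[cur][k], alpha_q17);
        /* Stand-in for the output DMA back to DRAM. */
        memcpy(dst + i * CHUNK, out[cur], len[cur]);
    }
}
```

Because computation always reads from half `i & 1` while the next fetch writes to the other half, a transfer never overwrites data that is still being processed. The result is identical to blending the whole array in one pass; only the order of data movement changes.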

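Performance

| Image Size | Image Format | Execution Time | Submit Latency | Total Power |
|------------|--------------|----------------|----------------|-------------|
| 1920x1080  | U8           | 0.227ms        | 0.021ms        | 16.846W     |
| 1920x1080  | RGBA8        | 0.891ms        | 0.024ms        | 16.763W     |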

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
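As a rough sanity check, the execution times in the table can be converted into an effective data rate. The sketch assumes that each call reads both full input images and writes one full output image (3 × width × height × bytes per pixel), and that interleaved RGBA8 uses 4 bytes per pixel. The result is a derived estimate, not a figure reported by the benchmark, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective data rate in decimal GB/s for one blend call: two inputs read
 * plus one output written, each width * height * bytes_per_pixel bytes. */
static double effective_bandwidth_gbps(int32_t width, int32_t height,
                                       int32_t bytes_per_pixel, double exec_ms)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && exec_ms > 0.0);
    double bytes = 3.0 * width * height * bytes_per_pixel;
    return bytes / (exec_ms * 1e-3) / 1e9;
}
```

With the table's values, this gives roughly 27.4 GB/s for U8 (6,220,800 bytes in 0.227 ms) and roughly 27.9 GB/s for RGBA8 (24,883,200 bytes in 0.891 ms).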

Reference

  1. OpenCV Documentation: https://docs.opencv.org/4.10.0/d0/d86/tutorial_py_image_arithmetics.html#autotoc_md1200