ImageBlend#

Overview#

In image processing, blending refers to the technique of combining two or more images to create a new image. This process is often used in graphic design, photography, and computer vision to achieve various artistic effects or to enhance visual information.

Reference Implementation#

The blending operation can be mathematically expressed with the following equation:

\[\text{dst} = (1 - \alpha) \cdot \text{img1} + \alpha \cdot \text{img2}\]

Where:

\(\text{dst}\) is the resulting image after blending.
\(\text{img1}\) is the first source image.
\(\text{img2}\) is the second source image.
\(\alpha\) is the blending factor, ranging from 0 to 1.

The blending factor \(\alpha\) is quantized to \(Q1.7\) fixed-point format in both the C reference and the VPU implementation, so the two produce bit-exact results.
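The equation and the \(Q1.7\) quantization can be sketched in integer arithmetic as follows. This is only an illustration, not the C reference itself: the function names are hypothetical, and it assumes 7 fractional bits with 1.0 represented as 128, round-to-nearest when quantizing \(\alpha\), and round-half-up when scaling the result back to 8 bits. The rounding modes actually used by the reference are not stated here.

```python
def quantize_alpha_q1_7(alpha: float) -> int:
    """Convert alpha in [0, 1] to a fixed-point value with 7 fractional bits."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Assumption: round to nearest; 1.0 maps to 128.
    return int(alpha * 128 + 0.5)


def blend_pixel_u8(p1: int, p2: int, alpha_q: int) -> int:
    """dst = (1 - alpha) * p1 + alpha * p2 for 8-bit samples."""
    acc = (128 - alpha_q) * p1 + alpha_q * p2
    # Assumption: add half an LSB, then drop the 7 fractional bits.
    return (acc + 64) >> 7


def blend_u8(img1, img2, alpha: float):
    """Blend two equally sized 2-D lists of 8-bit samples."""
    alpha_q = quantize_alpha_q1_7(alpha)
    return [
        [blend_pixel_u8(a, b, alpha_q) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]
```

Under these assumptions, \(\alpha = 0.3\) is stored as 38, so the effective blending factor is \(38/128 = 0.296875\). Results can therefore differ slightly from a floating-point blend with the unquantized \(\alpha\).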

More information can be found in [1].

Implementation Details#

Limitations#

The image blend parameters are subject to the following constraints:

The source images \(\text{img1}\) and \(\text{img2}\) and the destination image \(\text{dst}\) must all have the same resolution and image format.

The supported image formats are shown in the following table.

Supported Image Formats#
Image format	Allowed
U8	Yes
YUYV	Yes
UYVY	Yes
VYUY	Yes
YUV8p	Yes
BGR8	Yes
RGB8	Yes
BGRA8	Yes
RGBA8	Yes
BGR8p	Yes
RGB8p	Yes

Dataflow Configuration#

Each image plane requires 3 RasterDataFlows (RDFs):

1 input image RDF splits the source image \(\text{img1}\) into 64x64 pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.
1 input image RDF splits the source image \(\text{img2}\) into 64x64 pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.
1 output image RDF transfers the destination image \(\text{dst}\) in tiles from a ping-pong buffer in VMEM to DRAM.

The number of planes in an image is determined by its format, up to a maximum of 4. An image with 4 planes therefore uses 12 RDFs in total.
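The tile and RDF arithmetic above can be checked with a short sketch. The helper names are hypothetical. How partial tiles at the image edge are handled is not described here, so the grid below simply rounds up to whole tiles.

```python
TILE_W = TILE_H = 64   # tile size used by the input RDFs
RDFS_PER_PLANE = 3     # img1 input, img2 input, dst output
MAX_PLANES = 4         # maximum number of planes per image


def tile_grid(width: int, height: int) -> tuple:
    """Number of tile columns and rows needed to cover one plane."""
    cols = (width + TILE_W - 1) // TILE_W
    rows = (height + TILE_H - 1) // TILE_H
    return cols, rows


def rdf_count(num_planes: int) -> int:
    """Total RDFs for an image with the given number of planes."""
    if not 1 <= num_planes <= MAX_PLANES:
        raise ValueError("num_planes must be between 1 and 4")
    return RDFS_PER_PLANE * num_planes
```

For a 1920x1080 plane, `tile_grid(1920, 1080)` gives 30 columns and 17 rows. The last row covers only 56 image rows, since 1080 is not a multiple of 64. `rdf_count(4)` gives the 12 RDFs mentioned above.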

Buffer Allocation#

3 VMEM buffers are needed:

1 input image buffer with double buffering for each tile of the source image \(\text{img1}\).
1 input image buffer with double buffering for each tile of the source image \(\text{img2}\).
1 output image buffer with double buffering for each tile of the destination image \(\text{dst}\).
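As a rough sizing example, the VMEM footprint of these buffers can be estimated as below. The function is hypothetical, and the estimate assumes tightly packed 64x64 tiles with no padding or alignment overhead, which the actual allocation may add.

```python
def vmem_bytes(tile_w: int = 64, tile_h: int = 64,
               bytes_per_pixel: int = 1,
               num_buffers: int = 3, copies: int = 2) -> int:
    """Estimated VMEM bytes: 3 buffers, each double buffered (2 copies)."""
    # Assumption: no padding or alignment overhead per tile.
    return num_buffers * copies * tile_w * tile_h * bytes_per_pixel
```

With one byte per pixel this comes to `3 * 2 * 64 * 64 = 24576` bytes. A tile with 4 bytes per pixel would need four times as much under the same assumptions.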

Kernel Implementation#

The kernel uses the dvblend instruction to compute the blend efficiently. A scalar \(\alpha\) value is combined with the pixels of \(\text{img1}\) and \(\text{img2}\) to produce each output pixel.

Ping-pong buffers allow VPU computation to overlap with DMA transfers, which hides most of the data-movement latency.
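The ping-pong scheme can be illustrated with a sequential Python model. It is not the VPU kernel: `load` stands in for an input DMA transfer, the inner expression stands in for dvblend (with an assumed round-half-up), and the prefetch that would run concurrently on hardware is simply executed before the compute step. All names are hypothetical.

```python
def blend_tiled(img1, img2, alpha_q: int, tile: int = 64):
    """Blend two equally sized 2-D lists tile by tile with two buffer slots."""
    height, width = len(img1), len(img1[0])
    dst = [[0] * width for _ in range(height)]
    tiles = [(y, x) for y in range(0, height, tile)
                    for x in range(0, width, tile)]

    def load(src, y, x):
        # Stand-in for a DMA transfer of one tile from DRAM to VMEM.
        return [row[x:x + tile] for row in src[y:y + tile]]

    slots = [None, None]  # slot 0 is "ping", slot 1 is "pong"
    if tiles:
        slots[0] = (load(img1, *tiles[0]), load(img2, *tiles[0]))

    for i, (y, x) in enumerate(tiles):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(tiles):
            # On hardware this prefetch overlaps with the compute below.
            slots[nxt] = (load(img1, *tiles[i + 1]),
                          load(img2, *tiles[i + 1]))
        t1, t2 = slots[cur]
        for dy, (r1, r2) in enumerate(zip(t1, t2)):
            # Assumed integer blend with 7 fractional bits for alpha.
            dst[y + dy][x:x + len(r1)] = [
                ((128 - alpha_q) * a + alpha_q * b + 64) >> 7
                for a, b in zip(r1, r2)
            ]
    return dst
```

Because the next tile is loaded into the slot that is not being read, the compute step never waits on a slot that is still being filled. That property is what lets the transfers be hidden behind computation on real hardware.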

Performance#

Image Size	Image Format	Execution Time	Submit Latency	Total Power
1920x1080	U8	0.227ms	0.021ms	16.846W
1920x1080	RGBA8	0.891ms	0.024ms	16.763W

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.

Reference#

[1] OpenCV Documentation: https://docs.opencv.org/4.10.0/d0/d86/tutorial_py_image_arithmetics.html#autotoc_md1200