ImageBlend#
Overview#
In image processing, blending refers to the technique of combining two or more images to create a new image. This process is often used in graphic design, photography, and computer vision to achieve various artistic effects or to enhance visual information.
Reference Implementation#
The blending operation can be expressed with the following equation:
\[\text{dst} = \alpha \cdot \text{img1} + (1 - \alpha) \cdot \text{img2}\]
Where:
\(\text{dst}\) is the resulting image after blending.
\(\text{img1}\) is the first source image.
\(\text{img2}\) is the second source image.
\(\alpha\) is the blending factor, ranging from 0 to 1.
The blending factor \(\alpha\) is quantized to \(Q1.7\) fixed-point format in both the C reference and the VPU implementation, so the two produce bit-exact results.
More information can be found in [1].
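The fixed-point arithmetic can be sketched in a few lines of Python. This is an illustrative model, not the C reference itself. It assumes that \(\alpha\) weights \(\text{img1}\) and \(1 - \alpha\) weights \(\text{img2}\), that \(Q1.7\) means 7 fractional bits (so \(\alpha = 1\) maps to 128), and that both the quantization and the final shift round to nearest. All function names are hypothetical.

```python
FRAC_BITS = 7             # assumed: Q1.7 keeps 7 fractional bits
ONE_Q17 = 1 << FRAC_BITS  # 128 represents alpha = 1.0


def quantize_alpha(alpha):
    """Convert alpha in [0, 1] to an assumed Q1.7 integer in [0, 128]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return int(alpha * ONE_Q17 + 0.5)  # round to nearest (assumed rounding mode)


def blend_pixel(p1, p2, alpha_q):
    """Blend two 8-bit samples using integer arithmetic only."""
    acc = alpha_q * p1 + (ONE_Q17 - alpha_q) * p2
    return (acc + (ONE_Q17 >> 1)) >> FRAC_BITS  # add half an LSB, then shift


def blend_plane(img1, img2, alpha):
    """Blend two equally sized planes given as lists of rows of 8-bit values."""
    alpha_q = quantize_alpha(alpha)
    return [
        [blend_pixel(a, b, alpha_q) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]
```

Because every step is integer arithmetic on the quantized factor, two implementations that agree on the quantization and rounding rules produce identical outputs, which is the property the text describes as bit-exact.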
Implementation Details#
Limitations#
The image blend parameters are subject to the following constraints:
The source images \(\text{img1}\) and \(\text{img2}\) and the destination image \(\text{dst}\) must all have the same resolution and image format.
The supported image formats are shown in the following table.
Image format | Allowed |
---|---|
U8 | Yes |
YUYV | Yes |
UYVY | Yes |
VYUY | Yes |
YUV8p | Yes |
BGR8 | Yes |
RGB8 | Yes |
BGRA8 | Yes |
RGBA8 | Yes |
BGR8p | Yes |
RGB8p | Yes |
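A caller might check these constraints before submitting the operator. The sketch below is only an illustration: the `(width, height, format)` tuple and the function name are hypothetical and are not part of the operator's API. The format names are taken from the table above.

```python
SUPPORTED_FORMATS = frozenset({
    "U8", "YUYV", "UYVY", "VYUY", "YUV8p",
    "BGR8", "RGB8", "BGRA8", "RGBA8", "BGR8p", "RGB8p",
})


def check_blend_inputs(img1, img2, dst):
    """Validate three hypothetical (width, height, format) descriptors."""
    fmt = img1[2]
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {fmt}")
    # All three images must agree on width, height and format.
    if not (img1 == img2 == dst):
        raise ValueError("img1, img2 and dst must share resolution and format")
```

For example, `check_blend_inputs((1920, 1080, "U8"), (1920, 1080, "U8"), (1920, 1080, "U8"))` returns without error, while a destination of a different size raises `ValueError`.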
Dataflow Configuration#
Each image plane requires three RasterDataFlows (RDFs):
1 input image RDF splits the source image \(\text{img1}\) into 64x64 pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.
1 input image RDF splits the source image \(\text{img2}\) into 64x64 pixel tiles and transfers them from DRAM into a ping-pong buffer in VMEM.
1 output image RDF transfers the destination image \(\text{dst}\) in tiles from a ping-pong buffer in VMEM to DRAM.
The number of planes in an image is determined by its format, up to a maximum of 4. An image with 4 planes therefore requires 12 RDFs in total.
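The bookkeeping above reduces to simple arithmetic, sketched here with hypothetical helper names. The tile count assumes that a partial tile at the right or bottom edge still occupies one tile slot, which the text does not state explicitly.

```python
TILE_W = TILE_H = 64   # tile size stated in the text
RDFS_PER_PLANE = 3     # two input RDFs (img1, img2) plus one output RDF (dst)
MAX_PLANES = 4         # maximum number of planes per image


def rdf_count(num_planes):
    """Total RDFs needed for an image with the given number of planes."""
    if not 1 <= num_planes <= MAX_PLANES:
        raise ValueError("num_planes must be between 1 and 4")
    return RDFS_PER_PLANE * num_planes


def tiles_per_plane(width, height):
    """Number of 64x64 tiles covering one plane (edge tiles rounded up)."""
    tiles_x = (width + TILE_W - 1) // TILE_W
    tiles_y = (height + TILE_H - 1) // TILE_H
    return tiles_x * tiles_y
```

Under these assumptions a single 1920x1080 plane is covered by 30 x 17 = 510 tiles, and a 4-plane image needs `rdf_count(4) == 12` RDFs.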
Buffer Allocation#
Three VMEM buffers are needed:
1 input image buffer with double buffering for each tile of the source image \(\text{img1}\).
1 input image buffer with double buffering for each tile of the source image \(\text{img2}\).
1 output image buffer with double buffering for each tile of the destination image \(\text{dst}\).
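A rough estimate of the VMEM footprint follows from the tile size and the double buffering. The helper below is a back-of-the-envelope sketch with a hypothetical name. It assumes one byte per sample and ignores any padding or alignment that the actual allocation may add.

```python
def vmem_bytes(tile_w=64, tile_h=64, bytes_per_sample=1,
               num_buffers=3, copies_per_buffer=2):
    """Estimate VMEM use: 3 buffers (img1, img2, dst), each double-buffered."""
    tile_bytes = tile_w * tile_h * bytes_per_sample
    return tile_bytes * num_buffers * copies_per_buffer
```

With the defaults this gives 64 x 64 x 3 x 2 = 24576 bytes for one plane of 8-bit samples.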
Kernel Implementation#
The kernel uses the dvblend instruction to compute the blending operation efficiently.
A scalar \(\alpha\) value is combined with pixels from both images \(\text{img1}\) and \(\text{img2}\) to perform the blending calculation.
Ping-pong buffers overlap VPU computation with DMA transfers, which hides most of the data-movement latency.
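The ping-pong pattern can be illustrated with a small scheduling model. It is a conceptual sketch only: it covers the input side, uses a hypothetical function name, and does not model the dvblend instruction or the real DMA API. Each step pairs the tile being computed with the tile being loaded into the other buffer.

```python
def ping_pong_schedule(num_tiles):
    """List (compute_tile, compute_buf, load_tile, load_buf) per step.

    None marks an idle unit: the first step only loads, the last only computes.
    """
    if num_tiles < 1:
        return []
    steps = [(None, None, 0, 0)]  # prologue: prefetch tile 0 into buffer 0
    for tile in range(num_tiles):
        nxt = tile + 1
        if nxt < num_tiles:
            # Compute on one buffer while the DMA fills the other one.
            steps.append((tile, tile % 2, nxt, nxt % 2))
        else:
            steps.append((tile, tile % 2, None, None))  # epilogue: no more loads
    return steps
```

In every step where both units are busy, the compute buffer and the load buffer differ, so the transfer of the next tile never touches the data currently being blended.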
Performance#
Image Size | Image Format | Execution Time | Submit Latency | Total Power |
---|---|---|---|---|
1920x1080 | U8 | 0.227ms | 0.021ms | 16.846W |
1920x1080 | RGBA8 | 0.891ms | 0.024ms | 16.763W |
For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
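As a quick way to compare the rows, the execution times can be converted into pixel throughput. The helper below is a hypothetical convenience, not part of the benchmark tooling. It uses only the image size and execution time from the table.

```python
def pixels_per_second(width, height, exec_time_ms):
    """Pixel throughput implied by an execution time given in milliseconds."""
    return width * height / (exec_time_ms * 1e-3)


u8_rate = pixels_per_second(1920, 1080, 0.227)     # U8 row
rgba8_rate = pixels_per_second(1920, 1080, 0.891)  # RGBA8 row
slowdown = u8_rate / rgba8_rate                    # RGBA8 time relative to U8
```

The U8 row corresponds to roughly 9.1 billion pixels per second, and the RGBA8 row takes a little under four times as long for the same resolution.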
Reference#
[1] OpenCV Documentation: https://docs.opencv.org/4.10.0/d0/d86/tutorial_py_image_arithmetics.html#autotoc_md1200