ConvertImageFormat

Overview

The ConvertImageFormat operator converts an input image in one image format into an output image in another image format.

Algorithm description

The algorithm is implemented as a pixel-wise conversion function: it reads each pixel from the input image, applies a series of transformations that depends on the conversion type, and writes the result to the same position in the output image.
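The pixel-wise structure described above can be modelled on the host as a map over pixel positions. This is only a sketch: the image is assumed to be a list of rows of channel tuples, and `convert_pixelwise` is a hypothetical helper, not part of the operator's API.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, ...]
Image = List[List[Pixel]]  # assumed layout: rows of per-pixel channel tuples


def convert_pixelwise(src: Image, convert: Callable[[Pixel], Pixel]) -> Image:
    """Apply `convert` to every input pixel and write the result to the same (row, col)."""
    return [[convert(px) for px in row] for row in src]
```

For example, `convert_pixelwise(img, lambda p: p[::-1])` reverses the channel order of every pixel, which is the simplest kind of per-pixel conversion.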

YUV to YUV and RGBA to RGBA conversions are handled as channel-order swizzling.
For YUV (studio range) to RGB conversion, the following equations are used:

\[\begin{split}& R = 1.164 \times (Y - 16) + 1.596 \times (V - 128) \\ & G = 1.164 \times (Y - 16) - 0.392 \times (U - 128) - 0.813 \times (V - 128) \\ & B = 1.164 \times (Y - 16) + 2.017 \times (U - 128)\end{split}\]
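As a floating-point reference for the equations above (not the fixed-point VPU implementation), the conversion can be sketched in Python. Rounding to nearest before clamping to [0, 255] is an assumption of this sketch, and `yuv_to_rgb` and `_to_u8` are hypothetical names.

```python
import math
from typing import Tuple


def _to_u8(x: float) -> int:
    """Round to nearest (assumed) and clamp to the 8-bit unsigned range."""
    return min(255, max(0, math.floor(x + 0.5)))


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Studio-range YUV -> RGB using the equations above."""
    c, d, e = y - 16, u - 128, v - 128
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    return _to_u8(r), _to_u8(g), _to_u8(b)
```

Studio-range black (16, 128, 128) maps to (0, 0, 0) and studio-range white (235, 128, 128) maps to (255, 255, 255); inputs outside the studio range produce values that must be clamped.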

For RGB to YUV (studio range) conversion, the following equations are used:

\[\begin{split}& Y = \ \ \ 0.257 \times R + 0.504 \times G + 0.098 \times B + 16 \\ & U = -0.148 \times R - 0.291 \times G + 0.439 \times B + 128 \\ & V = \ \ \ 0.439 \times R - 0.368 \times G - 0.071 \times B + 128\end{split}\]
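A matching floating-point reference for the RGB to YUV equations is sketched below. As before, round-to-nearest with clamping is an assumption, and `rgb_to_yuv` is a hypothetical name; the VPU kernel itself uses fixed-point arithmetic.

```python
import math
from typing import Tuple


def _to_u8(x: float) -> int:
    """Round to nearest (assumed) and clamp to the 8-bit unsigned range."""
    return min(255, max(0, math.floor(x + 0.5)))


def rgb_to_yuv(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """RGB -> studio-range YUV using the equations above."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    u = -0.148 * r - 0.291 * g + 0.439 * b + 128
    v = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return _to_u8(y), _to_u8(u), _to_u8(v)
```

Because each row of U and V coefficients sums to zero, any gray input (r = g = b) yields U = V = 128.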

Implementation Details

Parameters

Input image in the source image format
Output image in the destination image format

Dataflow Configuration

RasterDataflow (RDF) is used to split each input/output image into tiles and to transfer the tiles between DRAM and VMEM one at a time.

The number of input/output RDFs equals the number of planes in the input/output image format; at most 4 planes are supported.
The tile size of each input/output RDF is adjusted so that all RDFs take the same number of iterations.
Because the VPU functions use transpose load/store operations to load and store each channel of the image format separately, the transposition mode of each RDF should be set to TRANS_MODE_1, which applies a tile line pitch that meets the requirements of these transpose load/store operations.
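One way to equalize iteration counts across planes is to scale a base tile by each plane's subsampling factors. The sketch below assumes that approach; the operator's actual tile-size selection is not described here, and `Plane` and `plane_tiles` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Plane:
    name: str
    width_div: int   # horizontal subsampling relative to the full image
    height_div: int  # vertical subsampling relative to the full image


def plane_tiles(img_w: int, img_h: int, tile_w: int, tile_h: int,
                planes: List[Plane]) -> Dict[str, Tuple[Tuple[int, int], int]]:
    """Scale a base tile (in full-resolution pixels) to each plane and
    return each plane's (tile size, number of tile iterations)."""
    result = {}
    for p in planes:
        if tile_w % p.width_div or tile_h % p.height_div:
            raise ValueError(f"base tile not divisible by {p.name} subsampling")
        plane_w, plane_h = img_w // p.width_div, img_h // p.height_div
        tw, th = tile_w // p.width_div, tile_h // p.height_div
        iterations = math.ceil(plane_w / tw) * math.ceil(plane_h / th)
        result[p.name] = ((tw, th), iterations)
    return result
```

For a 1920x1080 NV12 image and a 64x32 base tile, the Y plane uses 64x32 tiles and the interleaved UV plane uses 32x16 tiles (counted in U/V pairs), and both need 1020 iterations. This holds as long as the image dimensions are multiples of the subsampling factors.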

VMEM Buffer Allocation

8 VMEM buffers are allocated:

4 buffers for the 1st to 4th planes of the input image
4 buffers for the 1st to 4th planes of the output image

VPU function implementation

Fixed-point coefficients and the vdotp4x2_bbh()/vdotp4_bbh() instructions are used to optimize the performance of the image format conversion functions.

Because the arithmetic in the image format conversion functions consists of a series of multiply-accumulate (MAC) operations, the vdotp4x2_bbh()/vdotp4_bbh() instructions provide 4x/2x the MAC throughput of dvmaddbh().
Since conversions between RGB and YUV involve 3 channels, one vdotp4x2_bbh() instruction handles 2 of them and one vdotp4_bbh() instruction handles the 3rd.
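To illustrate how two dot-product instructions cover three output channels, here is a Python model of RGB to YUV with fixed-point coefficients. Several assumptions are made: coefficients are quantized to signed Q7 so that they fit in 8-bit operands, products are accumulated into a 16-bit value, and the +16/+128 offsets are pre-scaled into the accumulator. `dotp4` and `dotp4x2` are hypothetical stand-ins that model only the arithmetic of a four-lane byte dot product, not the real operand layout or semantics of vdotp4_bbh()/vdotp4x2_bbh().

```python
from typing import Sequence, Tuple

SHIFT = 7  # assumed Q7 coefficient format


def q7(c: float) -> int:
    """Quantize a coefficient to signed Q7 and check that it fits in int8."""
    v = round(c * (1 << SHIFT))
    if not -128 <= v <= 127:
        raise ValueError(f"coefficient {c} does not fit in int8 as Q7")
    return v


# Lanes: R, G, B and a fourth lane left unused in this model (assumption).
COEF_Y = [q7(0.257), q7(0.504), q7(0.098), 0]
COEF_U = [q7(-0.148), q7(-0.291), q7(0.439), 0]
COEF_V = [q7(0.439), q7(-0.368), q7(-0.071), 0]


def _check_i16(acc: int) -> int:
    """Fail loudly if the accumulator would not fit in a 16-bit halfword."""
    if not -32768 <= acc <= 32767:
        raise OverflowError(acc)
    return acc


def dotp4(x: Sequence[int], c: Sequence[int], init: int = 0) -> int:
    """Hypothetical model: 4-lane byte x byte dot product into a 16-bit accumulator."""
    return _check_i16(init + sum(xi * ci for xi, ci in zip(x, c)))


def dotp4x2(x: Sequence[int], c0: Sequence[int], c1: Sequence[int],
            init0: int = 0, init1: int = 0) -> Tuple[int, int]:
    """Hypothetical model: two dot products over the same inputs (two channels)."""
    return dotp4(x, c0, init0), dotp4(x, c1, init1)


def rgb_to_yuv_acc(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Return the Y, U, V accumulators in Q7 with offsets already added."""
    x = (r, g, b, 0)
    acc_y, acc_u = dotp4x2(x, COEF_Y, COEF_U, 16 << SHIFT, 128 << SHIFT)  # 2 channels
    acc_v = dotp4(x, COEF_V, 128 << SHIFT)                                # 3rd channel
    return acc_y, acc_u, acc_v
```

The 8-bit outputs are the accumulators shifted right by 7, with rounding and saturation applied on store. In this model, Q7 rounding of the coefficients keeps the result within 3 code values of the floating-point equations.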

Truncating and clamping the output to the 8-bit unsigned integer range [0, 255] is handled by the rounding/truncation and saturation options in the agen configurations, so these operations add no performance overhead.
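The store-side post-processing can be modelled as an arithmetic right shift of the accumulator, an optional half-LSB offset for rounding, and saturation to [0, 255]. This is a sketch only: `store_u8` and its `rounding` flag are hypothetical, and the exact rounding mode selected by the agen configuration is not specified here.

```python
def store_u8(acc: int, shift: int, rounding: bool) -> int:
    """Model of converting a fixed-point accumulator to uint8 on store.

    rounding=True adds half an LSB before the shift (round half up);
    rounding=False truncates (floors, as an arithmetic shift does).
    The shifted value is then saturated to [0, 255].
    """
    if rounding and shift > 0:
        acc += 1 << (shift - 1)
    value = acc >> shift  # Python's >> on ints floors, like an arithmetic shift
    return max(0, min(255, value))
```

For example, a Q7 accumulator of 2112 (16.5) stores as 17 with rounding and as 16 with truncation, while negative or oversized intermediate values, which can occur in YUV to RGB conversion, saturate to 0 or 255.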

Four functions are implemented for the image format conversion:

For YUV channel-order swizzling, convert_image_format_yuv2yuv_knl_exec() is called.
- If the input image format is NV12, convert_image_format_sampling_knl_exec() should be called first to upsample the input UV plane.
- If the output image format is NV12, convert_image_format_sampling_knl_exec() should be called last to downsample the output UV plane.
- Whether convert_image_format_sampling_knl_exec() performs upsampling or downsampling is controlled by the agen configurations.
For RGBA channel-order swizzling, convert_image_format_rgba2rgba_knl_exec() is called.
- If the output image format has an alpha channel but the input image format does not, the output alpha value is set to the maximum representable value of its type, e.g., 255 for 8-bit unsigned integers.
For YUV to RGB conversion, convert_image_format_yuv2rgb_knl_exec() is called.
- If the input image format is NV12, convert_image_format_sampling_knl_exec() should be called first to upsample the input UV plane.
For RGB to YUV conversion, convert_image_format_rgb2yuv_knl_exec() is called.
- If the output image format is NV12, convert_image_format_sampling_knl_exec() should be called last to downsample the output UV plane.
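Two of the data movements listed above, per-pixel channel reordering with alpha fill and NV12 UV-plane resampling, can be modelled on the host as follows. This sketch assumes channel orders given as strings such as "RGB" or "BGRA", an alpha fill of 255 for 8-bit data, nearest-neighbour replication for upsampling, and keeping the top-left sample of each 2x2 block for downsampling. The operator's actual sampling pattern is defined by its agen configuration and may differ; `swizzle`, `upsample_uv` and `downsample_uv` are hypothetical helpers.

```python
from typing import List, Tuple


def swizzle(px: Tuple[int, ...], src: str, dst: str, alpha_max: int = 255) -> Tuple[int, ...]:
    """Reorder one pixel's channels from order `src` to order `dst`.

    A channel present in `dst` but missing from `src` must be alpha ('A')
    and is filled with alpha_max (assumed 255 for 8-bit data).
    """
    out = []
    for ch in dst:
        if ch in src:
            out.append(px[src.index(ch)])
        elif ch == "A":
            out.append(alpha_max)
        else:
            raise ValueError(f"channel {ch!r} missing from source order {src!r}")
    return tuple(out)


UVPlane = List[List[Tuple[int, int]]]  # rows of interleaved (U, V) pairs


def upsample_uv(uv: UVPlane) -> UVPlane:
    """Half-resolution NV12 UV -> full resolution by replicating each pair 2x2."""
    out = []
    for row in uv:
        wide = [pair for pair in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))                           # repeat vertically
    return out


def downsample_uv(uv: UVPlane) -> UVPlane:
    """Full-resolution UV -> NV12 half resolution by keeping every other row and column."""
    return [row[::2] for row in uv[::2]]
```

For example, `swizzle((10, 20, 30), "RGB", "BGRA")` gives `(30, 20, 10, 255)`, matching the RGB8 to BGRA8 case, and passing a UV plane through `upsample_uv` and then `downsample_uv` returns the original plane under these assumptions.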

Performance

ImageSize	InImageFormat	OutImageFormat	Execution Time	Submit Latency	Total Power
1920x1080	BGR8	NV12	0.491ms	0.019ms	15.458W
1920x1080	BGR8	YUYV	0.418ms	0.019ms	16.241W
1920x1080	BGR8p	NV12	0.495ms	0.022ms	14.875W
1920x1080	BGR8p	YUYV	0.451ms	0.021ms	16.042W
1920x1080	NV12	BGR8	0.457ms	0.022ms	14.676W
1920x1080	NV12	BGR8p	0.435ms	0.026ms	14.676W
1920x1080	RGB8	BGRA8	0.564ms	0.020ms	14.594W
1920x1080	YUV8p	NV12	0.302ms	0.020ms	15.88W
1920x1080	YUV8p	YUYV	0.305ms	0.021ms	15.981W
1920x1080	YUYV	BGR8	0.447ms	0.022ms	14.876W
1920x1080	YUYV	BGR8p	0.434ms	0.025ms	15.359W
1920x1080	YUYV	NV12	0.244ms	0.019ms	15.578W

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.