ConvertImageFormat
Overview
The ConvertImageFormat operator converts an input image with one format into an output image with another format.
Algorithm description
The algorithm is implemented as a pixel-wise conversion function that reads the pixels from the input image, applies a conversion-dependent series of transformations and writes the results to the output image in the same position.
The YUV to YUV and RGBA to RGBA image format conversions are handled as channel order swizzling.
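As a minimal sketch of channel order swizzling, the scalar reference below copies interleaved pixels while reordering channels through an index map. It is not the VPU kernel: `swizzle_pixels` and its `map` parameter are hypothetical names introduced for illustration. A negative map entry marks an output channel with no source; it is filled with 255, matching the alpha handling this page describes for 8-bit formats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for channel order swizzling.
 * map[c] is the input channel copied to output channel c;
 * a negative entry means "no source", so the channel is filled with 255. */
static void swizzle_pixels(const uint8_t *src, int src_channels,
                           uint8_t *dst, int dst_channels,
                           const int *map, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; ++i) {
        const uint8_t *in = src + i * (size_t)src_channels;
        uint8_t *out = dst + i * (size_t)dst_channels;
        for (int c = 0; c < dst_channels; ++c) {
            out[c] = (map[c] >= 0) ? in[map[c]] : 255u;
        }
    }
}
```

For an RGB8 to BGRA8 conversion, the map would be `{2, 1, 0, -1}`: blue, green and red are read in reverse order and alpha is filled.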
The equations for YUV with studio range to RGB image format conversion are listed as follows:
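The coefficients depend on the color standard, which this section does not name. As an illustrative assumption only, the widely used BT.601 studio-range form (Y in [16, 235], U and V centered at 128, coefficients rounded to three decimals) is:

$$
\begin{aligned}
R &= 1.164\,(Y - 16) + 1.596\,(V - 128) \\
G &= 1.164\,(Y - 16) - 0.392\,(U - 128) - 0.813\,(V - 128) \\
B &= 1.164\,(Y - 16) + 2.017\,(U - 128)
\end{aligned}
$$

Results are clamped to [0, 255] for 8-bit output.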
The equations for RGB to YUV with studio range image format conversion are listed as follows:
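The coefficients again depend on the color standard, which this section does not name. As an illustrative assumption only, the BT.601 studio-range form with coefficients rounded to three decimals is:

$$
\begin{aligned}
Y &= 0.257\,R + 0.504\,G + 0.098\,B + 16 \\
U &= -0.148\,R - 0.291\,G + 0.439\,B + 128 \\
V &= 0.439\,R - 0.368\,G - 0.071\,B + 128
\end{aligned}
$$

With 8-bit RGB inputs, Y falls in [16, 235] and U, V fall in [16, 240].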
Implementation Details
Parameters
Input image with one image format
Output image with another image format
Dataflow Configuration
The RasterDataflow (RDF) is used to split the whole input/output image into tiles and transfer them between DRAM and VMEM one by one.
The number of input/output RDFs equals the number of planes in the input/output image format; at most 4 planes are supported.
The input/output tile size is adjusted so that the number of iterations is the same for every input/output RDF.
The VPU functions use transpose load/store operations as a simple way to load and store each channel of an image format separately. The transposition mode should therefore be set to TRANS_MODE_1 for each RDF, so that the tile line pitch meets the requirements of those transpose load/store operations.
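The tile size adjustment can be sketched as follows. This is a simplified model under stated assumptions, not the operator's actual logic: `PlaneDim`, `tiles_along` and `matching_tile` are hypothetical names, and each plane's dimensions are assumed to scale the reference tile to whole numbers.

```c
#include <stdint.h>

/* Hypothetical plane or tile size, in elements (bytes for 8-bit data). */
typedef struct {
    uint32_t width;
    uint32_t height;
} PlaneDim;

/* Number of tiles needed to cover `extent` elements with tiles of size `tile`. */
static uint32_t tiles_along(uint32_t extent, uint32_t tile)
{
    return (extent + tile - 1u) / tile;
}

/* Scales a reference tile size to another plane so that both planes are
 * covered by the same number of tiles in each direction. Assumes the scaled
 * sizes are whole numbers (for example, the NV12 UV plane has the same row
 * width in bytes as the Y plane and half as many rows). */
static PlaneDim matching_tile(PlaneDim ref_plane, PlaneDim ref_tile, PlaneDim plane)
{
    PlaneDim tile;
    tile.width = ref_tile.width * plane.width / ref_plane.width;
    tile.height = ref_tile.height * plane.height / ref_plane.height;
    return tile;
}
```

For a 1920x1080 NV12 image with a 64x64 tile on the Y plane, the UV plane (1920x540 bytes) gets a 64x32 tile, and both planes are covered by 30x17 tiles.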
VMEM Buffer Allocation
8 VMEM buffers are allocated:
- 4 buffers for the 1st to 4th planes of the input image
- 4 buffers for the 1st to 4th planes of the output image
VPU function implementation
- Fixed-point coefficients and the `vdotp4x2_bbh()`/`vdotp4_bbh()` instructions are used to optimize the performance of the image format conversion functions. Since the computation part of an image format conversion function is composed of a series of MAC operations, the `vdotp4x2_bbh()`/`vdotp4_bbh()` instructions can provide 4x/2x the MAC throughput of `dvmaddbh()`. Since 3 channels are involved in the conversion between RGB and YUV, one `vdotp4x2_bbh()` instruction handles 2 of them and one `vdotp4_bbh()` instruction handles the 3rd channel.
- Truncation and clamping of the 8-bit unsigned integer output data to the range [0, 255] are handled by setting the rounding/truncation option and the saturation option in the agen configurations, so these operations add no performance overhead.
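The fixed-point arithmetic described above can be modeled in scalar C. This sketch does not use the VPU intrinsics or the agen options; `dot4`, `round_shift_saturate_u8` and `rgb_to_luma` are hypothetical helpers, and both the Q8 scale and the BT.601 studio-range luma coefficients (0.257, 0.504, 0.098 scaled by 256) are assumptions made for illustration.

```c
#include <stdint.h>

#define FRAC_BITS 8 /* assumed Q8 fixed-point scale for this sketch */

/* Scalar model of a 4-element dot product (not the VPU instruction). */
static int32_t dot4(const uint8_t px[4], const int16_t coef[4])
{
    int32_t acc = 0;
    for (int i = 0; i < 4; ++i) {
        acc += (int32_t)px[i] * coef[i];
    }
    return acc;
}

/* Round to nearest, shift back to integer scale, then saturate to [0, 255].
 * On the VPU these steps are described as agen options, not extra code. */
static uint8_t round_shift_saturate_u8(int32_t acc)
{
    int32_t biased = acc + (1 << (FRAC_BITS - 1));
    if (biased < 0) {
        return 0; /* saturate low without shifting a negative value */
    }
    int32_t v = biased >> FRAC_BITS;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Luma of one RGB pixel with assumed coefficients 66, 129, 25 (Q8).
 * The 4th lane is zero padding; the +16 offset is added in Q8. */
static uint8_t rgb_to_luma(uint8_t r, uint8_t g, uint8_t b)
{
    const uint8_t px[4] = { r, g, b, 0 };
    const int16_t coef[4] = { 66, 129, 25, 0 };
    int32_t acc = dot4(px, coef) + (16 << FRAC_BITS);
    return round_shift_saturate_u8(acc);
}
```

With these assumed coefficients, black maps to luma 16 and white to luma 235, the ends of the studio range. Selecting truncation instead of rounding would correspond to dropping the half-unit bias before the shift.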
There are 4 functions implemented for image format conversion:
- For YUV channel order swizzling, `convert_image_format_yuv2yuv_knl_exec()` is called. If the input image format is NV12, `convert_image_format_sampling_knl_exec()` should be called first to upsample the input UV plane. If the output image format is NV12, `convert_image_format_sampling_knl_exec()` should be called last to downsample the output UV plane. Whether `convert_image_format_sampling_knl_exec()` performs upsampling or downsampling is controlled by the agen configurations.
- For RGBA channel order swizzling, `convert_image_format_rgba2rgba_knl_exec()` is called. If the output image format has an alpha channel but the input image format doesn’t, the alpha channel is set to the maximum representable value of its type, e.g., 255 for an 8-bit unsigned integer.
- For YUV to RGB conversion, `convert_image_format_yuv2rgb_knl_exec()` is called. If the input image format is NV12, `convert_image_format_sampling_knl_exec()` should be called first to upsample the input UV plane.
- For RGB to YUV conversion, `convert_image_format_rgb2yuv_knl_exec()` is called. If the output image format is NV12, `convert_image_format_sampling_knl_exec()` should be called last to downsample the output UV plane.
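The UV plane upsampling and downsampling steps for NV12 can be illustrated with a scalar reference. The text above does not state which resampling filter the kernel applies, so this sketch assumes the simplest choice: sample replication for upsampling and keeping the top-left sample of each 2x2 block for downsampling. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Upsample an interleaved half-resolution UV plane (NV12 layout) to full
 * resolution by sample replication. width and height are the full-resolution
 * dimensions and are assumed even. Output holds one UV pair per pixel. */
static void upsample_uv_nearest(const uint8_t *uv_half, uint8_t *uv_full,
                                size_t width, size_t height)
{
    size_t half_w = width / 2;
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            const uint8_t *in = uv_half + ((y / 2) * half_w + (x / 2)) * 2;
            uint8_t *out = uv_full + (y * width + x) * 2;
            out[0] = in[0]; /* U */
            out[1] = in[1]; /* V */
        }
    }
}

/* Downsample a full-resolution interleaved UV plane by keeping the
 * top-left sample of each 2x2 block (an assumed filter choice). */
static void downsample_uv_nearest(const uint8_t *uv_full, uint8_t *uv_half,
                                  size_t width, size_t height)
{
    size_t half_w = width / 2;
    for (size_t y = 0; y < height / 2; ++y) {
        for (size_t x = 0; x < half_w; ++x) {
            const uint8_t *in = uv_full + ((2 * y) * width + 2 * x) * 2;
            uint8_t *out = uv_half + (y * half_w + x) * 2;
            out[0] = in[0]; /* U */
            out[1] = in[1]; /* V */
        }
    }
}
```

Under this assumption, upsampling followed by downsampling returns the original UV plane, which matches the call order in the list: upsample first when the input is NV12, convert at full resolution, and downsample last when the output is NV12.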
Performance
| ImageSize | InImageFormat | OutImageFormat | Execution Time | Submit Latency | Total Power |
|---|---|---|---|---|---|
| 1920x1080 | BGR8 | NV12 | 0.491ms | 0.019ms | 15.458W |
| 1920x1080 | BGR8 | YUYV | 0.418ms | 0.019ms | 16.241W |
| 1920x1080 | BGR8p | NV12 | 0.495ms | 0.022ms | 14.875W |
| 1920x1080 | BGR8p | YUYV | 0.451ms | 0.021ms | 16.042W |
| 1920x1080 | NV12 | BGR8 | 0.457ms | 0.022ms | 14.676W |
| 1920x1080 | NV12 | BGR8p | 0.435ms | 0.026ms | 14.676W |
| 1920x1080 | RGB8 | BGRA8 | 0.564ms | 0.020ms | 14.594W |
| 1920x1080 | YUV8p | NV12 | 0.302ms | 0.020ms | 15.88W |
| 1920x1080 | YUV8p | YUYV | 0.305ms | 0.021ms | 15.981W |
| 1920x1080 | YUYV | BGR8 | 0.447ms | 0.022ms | 14.876W |
| 1920x1080 | YUYV | BGR8p | 0.434ms | 0.025ms | 15.359W |
| 1920x1080 | YUYV | NV12 | 0.244ms | 0.019ms | 15.578W |
For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.