ConvertImageFormat

Overview

The ConvertImageFormat operator converts an input image with one format into an output image with another format.

Algorithm description

The algorithm is implemented as a pixel-wise conversion function: it reads each pixel from the input image, applies a conversion-dependent series of transformations, and writes the result to the same position in the output image.

  • The YUV to YUV and RGBA to RGBA image format conversions are handled as channel order swizzling.

  • The equations for converting studio-range YUV to RGB are:

\[\begin{split}& R = 1.164 \times (Y - 16) + 1.596 \times (V - 128) \\ & G = 1.164 \times (Y - 16) - 0.392 \times (U - 128) - 0.813 \times (V - 128) \\ & B = 1.164 \times (Y - 16) + 2.017 \times (U - 128)\end{split}\]
  • The equations for converting RGB to studio-range YUV are:

\[\begin{split}& Y = \ \ \ 0.257 \times R + 0.504 \times G + 0.098 \times B + 16 \\ & U = -0.148 \times R - 0.291 \times G + 0.439 \times B + 128 \\ & V = \ \ \ 0.439 \times R - 0.368 \times G - 0.071 \times B + 128\end{split}\]

Implementation Details

Parameters

  • Input image with one image format

  • Output image with another image format

Dataflow Configuration

The RasterDataflow (RDF) is used to split the whole input/output image into tiles and transfer them between DRAM and VMEM one by one.

  • The number of input (output) RDFs equals the number of planes in the input (output) image format; at most 4 planes are supported.

  • The tile size of each input and output RDF is adjusted so that all RDFs run the same number of iterations.

  • Because the VPU functions use transpose load/store operations to load and store each channel of the image format separately, the transposition mode of every RDF is set to TRANS_MODE_1, which applies a tile line pitch that meets the requirements of those transpose load/store operations.

VMEM Buffer Allocation

Eight VMEM buffers are allocated:

  • 4 buffers for the 1st to 4th planes of the input image

  • 4 buffers for the 1st to 4th planes of the output image

VPU function implementation

Fixed-point coefficients and the vdotp4x2_bbh()/vdotp4_bbh() instructions are used to optimize the performance of the image format conversion functions.

  • Because the computation in an image format conversion function is a series of multiply-accumulate (MAC) operations, vdotp4x2_bbh() and vdotp4_bbh() provide 4x and 2x the MAC throughput of dvmaddbh(), respectively.

  • Because conversion between RGB and YUV involves 3 channels, one vdotp4x2_bbh() instruction handles 2 of them and one vdotp4_bbh() instruction handles the third.

Truncation and clamping of the 8-bit unsigned integer output to the range [0, 255] are handled by the rounding/truncation and saturation options in the agen configurations, so these operations add no performance overhead.

Four functions implement the image format conversions:

  1. For YUV channel order swizzling, convert_image_format_yuv2yuv_knl_exec() is called.

    • If the input image format is NV12, convert_image_format_sampling_knl_exec() should be called first to upsample the input UV plane.

    • If the output image format is NV12, convert_image_format_sampling_knl_exec() should be called last to downsample the output UV plane.

    • Whether convert_image_format_sampling_knl_exec() upsamples or downsamples is controlled by the agen configurations.

  2. For RGBA channel order swizzling, convert_image_format_rgba2rgba_knl_exec() is called.

    • If the output image format has an alpha channel but the input format does not, the alpha channel is set to the maximum representable value of its type, e.g., 255 for 8-bit unsigned integers.

  3. For YUV to RGB conversion, convert_image_format_yuv2rgb_knl_exec() is called.

    • If the input image format is NV12, convert_image_format_sampling_knl_exec() should be called first to upsample the input UV plane.

  4. For RGB to YUV conversion, convert_image_format_rgb2yuv_knl_exec() is called.

    • If the output image format is NV12, convert_image_format_sampling_knl_exec() should be called last to downsample the output UV plane.

Performance

| ImageSize | InImageFormat | OutImageFormat | Execution Time | Submit Latency | Total Power |
|---|---|---|---|---|---|
| 1920x1080 | BGR8 | NV12 | 0.491ms | 0.019ms | 15.458W |
| 1920x1080 | BGR8 | YUYV | 0.418ms | 0.019ms | 16.241W |
| 1920x1080 | BGR8p | NV12 | 0.495ms | 0.022ms | 14.875W |
| 1920x1080 | BGR8p | YUYV | 0.451ms | 0.021ms | 16.042W |
| 1920x1080 | NV12 | BGR8 | 0.457ms | 0.022ms | 14.676W |
| 1920x1080 | NV12 | BGR8p | 0.435ms | 0.026ms | 14.676W |
| 1920x1080 | RGB8 | BGRA8 | 0.564ms | 0.020ms | 14.594W |
| 1920x1080 | YUV8p | NV12 | 0.302ms | 0.020ms | 15.88W |
| 1920x1080 | YUV8p | YUYV | 0.305ms | 0.021ms | 15.981W |
| 1920x1080 | YUYV | BGR8 | 0.447ms | 0.022ms | 14.876W |
| 1920x1080 | YUYV | BGR8p | 0.434ms | 0.025ms | 15.359W |
| 1920x1080 | YUYV | NV12 | 0.244ms | 0.019ms | 15.578W |

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
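
As a rough way to compare rows, execution time can be converted to pixel throughput. Assuming the reported execution time covers one full 1920x1080 frame and excludes submit latency, the BGR8 to NV12 row gives:

\[\frac{1920 \times 1080\ \text{pixels}}{0.491\ \text{ms}} = \frac{2\,073\,600}{4.91 \times 10^{-4}\ \text{s}} \approx 4.22 \times 10^{9}\ \text{pixels/s}\]

By the same arithmetic, the YUYV to NV12 row (0.244 ms) corresponds to roughly \(8.50 \times 10^{9}\) pixels/s.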