ImageFlip#

Overview#

ImageFlip mirrors a single-channel grayscale image horizontally, vertically, or along both axes. Pixel values are unchanged; only their positions change, through an address mapping from output coordinates to input coordinates.

Three flip modes are supported:

  • Horizontal Flip: Mirrors the image about its vertical axis, reversing the column order

  • Vertical Flip: Mirrors the image about its horizontal axis, reversing the row order

  • Combined Flip: Reverses both the row and column order, equivalent to a 180-degree rotation

The following example demonstrates the effect of image flipping on a grayscale image:

  • Input Image: ../../_images/imageflip-input.png

  • Horizontal Flip: ../../_images/imageflip-horizontal.png

  • Vertical Flip: ../../_images/imageflip-vertical.png

  • Combined Flip: ../../_images/imageflip-both.png

Algorithm Description#

The ImageFlip operator applies one of three coordinate mappings, selected by the flip direction:

Horizontal Flip:

\[\text{dst}(x, y) = \text{src}(W - 1 - x, y) \tag{1}\]

Vertical Flip:

\[\text{dst}(x, y) = \text{src}(x, H - 1 - y) \tag{2}\]

Combined Flip:

\[\text{dst}(x, y) = \text{src}(W - 1 - x, H - 1 - y) \tag{3}\]

Where:

  • \(\text{dst}(x, y)\) is the pixel value at coordinates (x, y) in the output image

  • \(\text{src}(x, y)\) is the pixel value at coordinates (x, y) in the input image

  • \(W\) is the image width

  • \(H\) is the image height
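The three mappings can be checked with a small reference model. The sketch below is plain Python on a list of rows, not the PVA implementation; the function name `flip` and the mode strings are hypothetical and chosen only for illustration.

```python
def flip(src, mode):
    """Reference flip of a grayscale image stored as a list of rows (src[y][x]).

    mode is one of "horizontal", "vertical", "both" (illustrative names only).
    """
    h = len(src)
    w = len(src[0])
    dst = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Equation (1) reverses x, equation (2) reverses y, equation (3) both.
            sx = w - 1 - x if mode in ("horizontal", "both") else x
            sy = h - 1 - y if mode in ("vertical", "both") else y
            dst[y][x] = src[sy][sx]
    return dst


image = [
    [1, 2, 3],
    [4, 5, 6],
]
print(flip(image, "horizontal"))  # [[3, 2, 1], [6, 5, 4]]
print(flip(image, "vertical"))    # [[4, 5, 6], [1, 2, 3]]
print(flip(image, "both"))        # [[6, 5, 4], [3, 2, 1]]
```

Each output pixel is read from a mapped source address, so the destination can be filled in any order, which is what makes the operation easy to split into tiles.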

Parameters#

  • Input/Output image type: 8-bit unsigned single-channel grayscale

  • Flip Direction: Runtime selectable (horizontal, vertical, combined)

Implementation#

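The implementation processes the input image tile by tile. Tile(i,j) denotes the tile at horizontal index i and vertical index j. In the steps below, tw and th are the tile width and height in pixels (written \(TW \times TH\) for buffer dimensions), and w and h are the number of tile columns and tile rows in the image.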

RDF configurations use tile dimensions of \(256 \times 128\) pixels for optimal performance and employ double buffering to overlap computation and data transfer.
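As a rough illustration of the tiling arithmetic and the double-buffering idea, the sketch below counts tiles for a given image size and lists a ping-pong schedule. It is plain Python, not RDF code. Rounding the tile count up for partial edge tiles is an assumption, and `tile_grid` and `ping_pong_schedule` are hypothetical helper names.

```python
TILE_W, TILE_H = 256, 128  # tile dimensions stated in the text


def tile_grid(width, height, tile_w=TILE_W, tile_h=TILE_H):
    """Return (tile columns, tile rows), rounding up for partial edge tiles.

    Rounding up is an assumption; the source does not describe edge handling.
    """
    cols = -(-width // tile_w)   # ceiling division
    rows = -(-height // tile_h)
    return cols, rows


def ping_pong_schedule(num_tiles):
    """List, per step, which tile is loaded and which is computed.

    Each entry is (load, compute), where load and compute are
    (tile_index, buffer_index) pairs or None. Tile k uses buffer k % 2, so the
    transfer of tile k overlaps with the computation on tile k - 1.
    """
    schedule = []
    for step in range(num_tiles + 1):
        load = (step, step % 2) if step < num_tiles else None
        compute = (step - 1, (step - 1) % 2) if step > 0 else None
        schedule.append((load, compute))
    return schedule


print(tile_grid(1920, 1080))  # (8, 9)
for load, compute in ping_pong_schedule(3):
    print("load:", load, "compute:", compute)
```

With two buffers, the load and compute entries in any single step always refer to different buffers, which is the property that lets transfer and computation overlap.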

  1. Horizontal Flip:

    • Input RDF: Reads Tile(i,j)
      • Handle: srcHdl

      • Tile buffer: input_v in VMEM

      • Dimensions: (\(TW \times TH\))

      • Configuration: .scanOrder(HORIZONTAL_REVERSED) for tile column reversal

    • VPU Processing: Horizontal pixel transformation within the tile, out(x,y) = in(tw-x-1,y)
      • Vector operations: 32-wide parallel processing

      • Vector Permute Instruction (vpermute): Performs efficient pixel reordering within vector registers for horizontal flipping using a reverse indexed pattern.

      • Reverse address generation for pixel column reversal

    • Output RDF: Writes Tile(i,j) to Tile(w-i-1,j)
      • Handle: dstHdl

      • Tile buffer: output_v in VMEM

      • Dimensions: (\(TW \times TH\))

  2. Vertical Flip:

    • Input RDF: Reads Tile(i,j)
      • Handle: srcHdl1

      • Tile buffer: input_v in VMEM

      • Dimensions: (\(TW \times TH\))

      • Configuration: .scanOrder(VERTICAL_REVERSED) for tile row reversal

    • VPU Processing: Vertical pixel transformation within the tile, out(x,y) = in(x,th-y-1)
      • Vector operations: 32-wide parallel processing

      • Reverse address generation for pixel row reversal

    • Output RDF: Writes Tile(i,j) to Tile(i,h-j-1)
      • Handle: dstHdl1

      • Tile buffer: output_v in VMEM

      • Dimensions: (\(TW \times TH\))

  3. Combined Flip:

    • Input RDF: Reads Tile(i,j)
      • Handle: srcHdl2

      • Tile buffer: input_v in VMEM

      • Dimensions: (\(TW \times TH\))

      • Configuration: .scanOrder(HORIZONTAL_REVERSED | VERTICAL_REVERSED) for dual tile reversal

    • VPU Processing: Combined pixel transformation within the tile, out(x,y) = in(tw-x-1,th-y-1)
      • Vector operations: 32-wide parallel processing

      • Vector Permute Instruction (vpermute): Performs efficient pixel reordering within vector registers for horizontal flipping using a reverse indexed pattern

      • Compound address generation for dual pixel transformation

    • Output RDF: Writes Tile(i,j) to Tile(w-i-1,h-j-1)
      • Handle: dstHdl2

      • Tile buffer: output_v in VMEM

      • Dimensions: (\(TW \times TH\))
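The three variants above share one idea: flip the pixels inside each tile, then place the tile at the mirrored tile position. The sketch below models only that net effect in plain Python and checks it against a whole-image flip. It does not use RDF or VPU APIs. `reverse_lanes` is a stand-in for a reverse-index permute and `flip_tiled` is a hypothetical name. The image size is assumed to be an exact multiple of the tile size.

```python
def reverse_lanes(vec):
    """Stand-in for a reverse-index permute: lane k takes the value of lane n-1-k."""
    n = len(vec)
    index = [n - 1 - k for k in range(n)]
    return [vec[i] for i in index]


def flip_tiled(src, tw, th, horizontal, vertical):
    """Flip src (list of rows) tile by tile, using tiles of tw x th pixels."""
    h, w = len(src), len(src[0])
    assert w % tw == 0 and h % th == 0, "sketch assumes whole tiles only"
    n_cols, n_rows = w // tw, h // th
    dst = [[0] * w for _ in range(h)]
    for j in range(n_rows):
        for i in range(n_cols):
            # Read Tile(i,j) into a local buffer.
            tile = [src[j * th + y][i * tw:(i + 1) * tw] for y in range(th)]
            if horizontal:
                tile = [reverse_lanes(row) for row in tile]  # reverse columns
            if vertical:
                tile = tile[::-1]                            # reverse rows
            # Write to the mirrored tile position.
            di = n_cols - 1 - i if horizontal else i
            dj = n_rows - 1 - j if vertical else j
            for y in range(th):
                dst[dj * th + y][di * tw:(di + 1) * tw] = tile[y]
    return dst


src = [[y * 8 + x for x in range(8)] for y in range(4)]
tiled = flip_tiled(src, tw=4, th=2, horizontal=True, vertical=True)
direct = [row[::-1] for row in src[::-1]]
print(tiled == direct)  # True
```

Reversing pixels within a tile and reversing the tile order are both needed: either step alone leaves the image only partially mirrored.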

Performance#

Execution Time is the average time required to execute the operator on a single VPU core. Each PVA contains two VPU cores, which can either process two streams in parallel or split one workload between them to reduce execution time by approximately half.

Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores. Idle power is approximately 7W when the PVA is not processing data.

For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.

| ImageSize | Format | Direction  | Execution Time | Submit Latency | Total Power |
|-----------|--------|------------|----------------|----------------|-------------|
| 256x128   | U8     | HORIZONTAL | 0.016ms        | 0.016ms        | 12.021W     |
| 1280x720  | U8     | HORIZONTAL | 0.077ms        | 0.020ms        | 15.85W      |
| 1920x1080 | U8     | HORIZONTAL | 0.153ms        | 0.025ms        | 15.77W      |
| 3840x2160 | U8     | HORIZONTAL | 0.576ms        | 0.028ms        | 16.254W     |
| 1920x1080 | U8     | VERTICAL   | 0.153ms        | 0.028ms        | 15.77W      |
| 1920x1080 | U8     | BOTH       | 0.152ms        | 0.027ms        | 16.152W     |
| 1920x1080 | Y8     | HORIZONTAL | 0.153ms        | 0.024ms        | 16.152W     |
| 1920x1080 | Y8     | HORIZONTAL | 0.153ms        | 0.025ms        | 16.152W     |
| 1920x1080 | Y8     | VERTICAL   | 0.153ms        | 0.027ms        | 15.77W      |
| 1920x1080 | Y8     | VERTICAL   | 0.153ms        | 0.027ms        | 15.77W      |
| 1920x1080 | Y8     | BOTH       | 0.152ms        | 0.027ms        | 15.77W      |
| 1920x1080 | Y8     | BOTH       | 0.152ms        | 0.028ms        | 15.77W      |
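To compare rows of different image sizes, the execution times can be converted to a pixel rate. The sketch below does this arithmetic for the U8 HORIZONTAL rows. It is a derived figure rather than a measured one, and the helper name `gigapixels_per_second` is hypothetical.

```python
# (width, height, execution time in ms) taken from the U8 HORIZONTAL rows.
measurements = [
    (256, 128, 0.016),
    (1280, 720, 0.077),
    (1920, 1080, 0.153),
    (3840, 2160, 0.576),
]


def gigapixels_per_second(width, height, time_ms):
    """Output pixels produced per second on one VPU core, in units of 1e9."""
    return (width * height) / (time_ms * 1e-3) / 1e9


for width, height, time_ms in measurements:
    rate = gigapixels_per_second(width, height, time_ms)
    print(f"{width}x{height}: {rate:.2f} Gpixel/s")
```

On these figures the single-tile 256x128 case reaches a much lower rate than the larger images, which suggests that fixed per-launch overhead dominates at that size.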