Conv2d#

Overview#

The Conv2d primitive performs general 2D convolution with an arbitrary-valued (user-supplied) filter.

Requirements#

The Conv2d primitive has the following requirements:

  1. The source image height must be an even number.

  2. The kernel (filter) size must be 3x3 or 5x5.

  3. Users should manually reformat kernel coefficients using the reformat function.

  4. To reach speed-of-light (SOL) performance, odd and even rows of the source image must be stored in two different superbanks.

  5. If the source image has padding, users should manually set the buffer starting address.

  6. If the border mode is replicate, users should manually copy padding between odd and even buffers.

Given the above requirements, the recommended buffer allocation is:

  • src_even_v : input circular buffer with even rows of the input image.

  • src_odd_v : input circular buffer with odd rows of the input image.

  • knl_v : kernel buffer for the reformatted convolution kernel coefficients.

  • dst_v : output buffer with double buffering for the convolution result.
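The even/odd row split behind src_even_v and src_odd_v can be sketched on the host side as follows. This is a minimal scalar model, not PVA SDK code: `SplitRows` and `split_even_odd_rows` are hypothetical names, and plain `std::vector` storage stands in for the circular VMEM buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the even/odd row split (not a PVA SDK API).
struct SplitRows {
    std::vector<uint8_t> even;  // rows 0, 2, 4, ...
    std::vector<uint8_t> odd;   // rows 1, 3, 5, ...
};

SplitRows split_even_odd_rows(const std::vector<uint8_t>& img, int width, int height) {
    assert(height % 2 == 0);  // requirement 1: the image height must be even
    assert(static_cast<int>(img.size()) == width * height);
    SplitRows out;
    out.even.reserve(img.size() / 2);
    out.odd.reserve(img.size() / 2);
    for (int y = 0; y < height; ++y) {
        // Route each row to the buffer that matches its parity.
        std::vector<uint8_t>& dst = (y % 2 == 0) ? out.even : out.odd;
        dst.insert(dst.end(), img.begin() + y * width, img.begin() + (y + 1) * width);
    }
    return out;
}
```

Because the height is even, both buffers end up with the same number of rows.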

Implementation Details#

Conv_U8#

The vfilt4x2x2 instruction is used for u8/s8 convolution.

void vfilt4x2x2_bbh(dvcharx src1a, dvcharx src1b, dvcharx src2, dvshortx src3_0, dvshortx src3_1, int pred, dvshortx & dst_0, dvshortx & dst_1);

Take conv3x3 u8 as an example. The convolution is done in two steps: reformat and compute.

Reformat#

Use the kernel coefficients to construct a 16x2 array, where the left and right parts are two 4x4 kernels.

This effectively reformats the 3x3 kernel into two 4x4 kernels, with zero padding around the coefficients.

../../_images/conv2d-3x3-reformat.png

Figure 1: reformat 3x3 kernel into two 4x4 kernels.#

Then load coefficients with permutations from the 16x2 array: set coef_01 to the first row, coef_23 to the second row.
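The zero-padding idea can be sketched in scalar code. The exact coefficient placement is defined by Figure 1 and the SDK reformat function, so the layout below is an assumption for illustration only: the first 4x4 kernel holds the 3x3 coefficients in rows 0 to 2, the second holds them in rows 1 to 3, and the last column of both is zero. `embed_3x3_in_4x4` is a hypothetical helper, and the permuted load into coef_01 and coef_23 is not modeled.

```cpp
#include <array>
#include <cassert>

using Kernel3x3 = std::array<std::array<int, 3>, 3>;
using Kernel4x4 = std::array<std::array<int, 4>, 4>;

// Embed a 3x3 kernel into a zero-filled 4x4 kernel, shifted down by
// row_offset rows (0 or 1). Hypothetical helper; not the SDK reformat function.
Kernel4x4 embed_3x3_in_4x4(const Kernel3x3& k, int row_offset) {
    assert(row_offset == 0 || row_offset == 1);
    Kernel4x4 out{};  // value-initialized: every coefficient starts at zero
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[r + row_offset][c] = k[r][c];
    return out;
}
```

Under this assumed layout, the two padded kernels correspond to two vertically adjacent output rows that share the same four input rows.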

Compute#

First, use vfilt4x2x2 to convolve row 0 and row 1 with coef_01, and save the results in two accumulators.

vfilt4x2x2_bbh(row0, row1, coef01, vacc0, vacc1, 0, vacc0, vacc1);

../../_images/conv2d-3x3u8-01.png

Figure 2: convolve row 0 and row 1 with coef_01.#

Then, use vfilt4x2x2 to convolve row 2 and row 3 with coef_23, accumulating into the same accumulators.

vfilt4x2x2_bbh(row2, row3, coef23, vacc0, vacc1, -1, vacc0, vacc1);

../../_images/conv2d-3x3u8-23.png

Figure 3: convolve row 2 and row 3 with coef_23.#

Taken together, the two steps amount to a 4x4 kernel convolution:

../../_images/conv2d-3x3u8-all.png

Figure 4: 4x4 kernel convolution.#

Write the results from the accumulators to VMEM, then move down to the next two rows.
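The two-step accumulation can be checked against a direct 3x3 filter with a scalar reference. This is a sketch only: it models one output column, uses the correlation form (no kernel flip), and assumes the two 4x4 kernels are the 3x3 kernel zero-padded at row offsets 0 and 1, so that the two accumulators hold two vertically adjacent outputs. All function names are hypothetical, and the vector lane layout and the `pred` argument of vfilt4x2x2 are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Kernel3x3 = std::array<std::array<int, 3>, 3>;
using Kernel4x4 = std::array<std::array<int, 4>, 4>;

// Add kernel rows [kr, kr + 1] applied to image rows [y + kr, y + kr + 1],
// starting at column x. Correlation form (no kernel flip).
void accumulate_two_rows(const std::vector<uint8_t>& img, int width, int x, int y,
                         const Kernel4x4& k, int kr, int& acc) {
    assert(x + 4 <= width);
    assert(static_cast<std::size_t>((y + kr + 2) * width) <= img.size());
    for (int r = kr; r < kr + 2; ++r)
        for (int c = 0; c < 4; ++c)
            acc += k[r][c] * img[(y + r) * width + (x + c)];
}

// Two passes over four input rows produce two output rows at column x.
std::array<int, 2> conv_two_output_rows(const std::vector<uint8_t>& img, int width,
                                        int x, int y,
                                        const Kernel4x4& k_a, const Kernel4x4& k_b) {
    int acc0 = 0, acc1 = 0;
    // Pass 1: image rows y and y + 1 with the first two rows of each kernel.
    accumulate_two_rows(img, width, x, y, k_a, 0, acc0);
    accumulate_two_rows(img, width, x, y, k_b, 0, acc1);
    // Pass 2: image rows y + 2 and y + 3, added to the same accumulators.
    accumulate_two_rows(img, width, x, y, k_a, 2, acc0);
    accumulate_two_rows(img, width, x, y, k_b, 2, acc1);
    return {acc0, acc1};
}

// Direct 3x3 reference used for comparison.
int conv3x3_direct(const std::vector<uint8_t>& img, int width, int x, int y,
                   const Kernel3x3& k) {
    int acc = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            acc += k[r][c] * img[(y + r) * width + (x + c)];
    return acc;
}
```

In this model, each call to `conv_two_output_rows` consumes four input rows and yields outputs for rows y and y + 1, which is consistent with advancing two rows per iteration.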

Conv_U16#

The vfilt4x2 instruction is used for u16/s16 convolution.

void vfilt4x2_hhw(vshortx src1a, vshortx src1b, dvshortx src2, dvintx src3_0, dvintx src3_1, int pred, dvintx & dst_0, dvintx & dst_1);

Take conv5x5 u16 as an example. The kernel implementation is similar to conv_u8.

First, use the same reformat function to reformat the 5x5 kernel into two 8x6 kernels.

../../_images/conv2d-5x5-reformat.png

Figure 5: reformat 5x5 kernel into two 8x6 kernels.#

Load the coefficients with permutations, then use vfilt4x2 to convolve the first 4 pixels in row 0 with coef_00:

vfilt4x2_hhw(row0, row0_ofst4, vcoef00, vacc0, vacc1, 0, vacc0, vacc1);

../../_images/conv2d-5x5u16-00.png

Figure 6: convolve first 4 pixels in row 0 with coef_00.#

Then continue by convolving the next 4 pixels in row 0 with coef_01:

vfilt4x2_hhw(row0_ofst4, row0, vcoef01, vacc0, vacc1, -1, vacc0, vacc1);

../../_images/conv2d-5x5u16-01.png

Figure 7: convolve next 4 pixels in row 0 with coef_01.#

Strictly speaking, src1b in this step should be row0_ofst8:

vfilt4x2_hhw(row0_ofst4, row0_ofst8, vcoef01, vacc0, vacc1, -1, vacc0, vacc1);

However, since the last 3 elements in each row of the 8x6 kernel are all zeros, the contents of src1b do not affect the result, so row0 can be passed instead.
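The reason src1b is irrelevant in the second step can be illustrated with a scalar model of one 4-pixel group. This is a sketch under stated assumptions, not the actual vfilt4x2 semantics: `filt4_accumulate` is a hypothetical model in which a 4-tap window slides over the concatenation of src1a and src1b, and the lane layout, row pairing, and `pred` argument are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Group4 = std::array<int, 4>;

// Hypothetical model: acc[i] += sum over k < 4 of taps[k] * concat(a, b)[i + k].
void filt4_accumulate(const Group4& a, const Group4& b, const Group4& taps, Group4& acc) {
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t k = 0; k < 4; ++k) {
            const std::size_t j = i + k;
            const int px = (j < 4) ? a[j] : b[j - 4];  // spill over into b
            acc[i] += taps[k] * px;
        }
}

// Load 4 consecutive pixels starting at the given offset.
Group4 load4(const std::vector<int>& row, std::size_t offset) {
    assert(offset + 4 <= row.size());
    return {row[offset], row[offset + 1], row[offset + 2], row[offset + 3]};
}

// One 5-tap row of a 5x5 kernel, split into taps 0..3 and taps 4..7 (4 plus 3 zeros).
Group4 row_filter_5tap(const std::vector<int>& row, const std::array<int, 5>& c,
                       bool reuse_row0_as_src1b) {
    const Group4 coef00 = {c[0], c[1], c[2], c[3]};
    const Group4 coef01 = {c[4], 0, 0, 0};  // the last 3 taps are zero padding
    Group4 acc = {0, 0, 0, 0};
    filt4_accumulate(load4(row, 0), load4(row, 4), coef00, acc);
    // Second step: src1b is multiplied only by zero taps, so either choice works.
    const Group4 src1b = reuse_row0_as_src1b ? load4(row, 0) : load4(row, 8);
    filt4_accumulate(load4(row, 4), src1b, coef01, acc);
    return acc;
}
```

In this model, only the first tap of the second coefficient group is non-zero, and for every output position that tap lands inside src1a, so any value read from src1b is multiplied by zero.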

Continue to convolve row 1 through row 5. Taken together, these steps amount to an 8x6 kernel convolution:

../../_images/conv2d-5x5u16-all.png

Figure 8: 8x6 kernel convolution.#

Copy Padding#

Users should manually copy padding between odd and even buffers when the following conditions are true:

  1. The source image has padding, and its border mode is set to replicate (BPE).

  2. Two RDFs are used to transfer odd rows and even rows of the source image separately.

Take conv3x3 as an example. Suppose the image height is 64, the border mode is BPE, and two RDFs are used to transfer the data. Because the border mode is BPE, the top padding of the even buffer is a copy of row 0, while the top padding of the odd buffer is a copy of row 1. In Figure 9, the blue part represents the source image data, and the green part represents the correct padding (same as the source image) that we want.

The conv3x3 kernel first reads the top padding from the odd buffer, then reads row 0 (the first blue row) from the even buffer. However, the top padding of the odd buffer is not correct. Users should therefore copy the top padding from the even buffer to the odd buffer (shown by the arrow). Users should also copy the bottom padding for the same reason.

../../_images/conv2d-copy-padding.png

Figure 9: copy padding between odd and even buffers.#
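The padding fix for conv3x3 can be sketched with a host-side model. Everything here is an assumption for illustration: each buffer is modeled as one top padding row, the data rows, and one bottom padding row; `load_with_bpe` imitates a replicate-mode transfer that pads each buffer from its own first and last rows; and `RowBuffer`, `load_with_bpe`, and `fix_replicate_padding_3x3` are hypothetical names, not PVA SDK APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: [top padding row, data rows..., bottom padding row].
struct RowBuffer {
    int width;
    std::vector<uint8_t> rows;
    uint8_t* row(int slot) { return rows.data() + slot * width; }
    int slots() const { return static_cast<int>(rows.size()) / width; }
};

// Model of a replicate-mode transfer of the rows with the given parity (0 or 1):
// the padding rows replicate this buffer's own first and last data rows.
RowBuffer load_with_bpe(const std::vector<uint8_t>& img, int width, int height, int parity) {
    assert(height % 2 == 0 && (parity == 0 || parity == 1));
    RowBuffer buf{width, {}};
    auto append_row = [&](int y) {
        buf.rows.insert(buf.rows.end(), img.begin() + y * width, img.begin() + (y + 1) * width);
    };
    append_row(parity);                        // top padding
    for (int y = parity; y < height; y += 2) append_row(y);
    append_row(height - 2 + parity);           // bottom padding
    return buf;
}

// Copy the correct padding rows across buffers for a 3x3 kernel (1 padding row).
void fix_replicate_padding_3x3(RowBuffer& even, RowBuffer& odd) {
    assert(even.width == odd.width && even.rows.size() == odd.rows.size());
    const int last = even.slots() - 1;
    // Row -1 is read from the odd buffer but must replicate row 0 (even buffer).
    std::copy(even.row(0), even.row(0) + even.width, odd.row(0));
    // Row H is read from the even buffer but must replicate row H - 1 (odd buffer).
    std::copy(odd.row(last), odd.row(last) + odd.width, even.row(last));
}
```

For an image of height 64, this copies the replica of row 0 into the odd buffer's top padding and the replica of row 63 into the even buffer's bottom padding.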

Compatibility#

Requires PVA SDK 2.6.0 or later.