BoxFilter#

Overview#

The BoxFilter operator (also known as a box linear filter or box blur) is a low-pass filter that replaces the value of each pixel with the average value of the pixels in a surrounding neighborhood defined by a kernel. The current implementation supports 3x3, 5x5, and 7x7 kernels.

Input Image: ../../_images/in1.png
Output Image: ../../_images/out1.png

Box Filter Parameters: kernel size = 5×5

Algorithm Description#

The box filter is implemented as a convolution of the input image with a kernel in which every weight has the same value:

\[w_{float} = \frac{1}{n^2}\]

where \(n\) is the width and height of the kernel (3, 5, or 7).
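As an illustration of the formula above, the following minimal sketch builds the floating point kernel and evaluates one output pixel. It is not the PVA implementation; the helper names `box_kernel` and `filter_pixel` are hypothetical, and the example only covers a pixel whose whole neighborhood lies inside the image.

```python
def box_kernel(n):
    """Return an n x n kernel whose weights are all 1 / n^2."""
    if n not in (3, 5, 7):
        raise ValueError("kernel size must be 3, 5, or 7")
    weight = 1.0 / (n * n)
    return [[weight] * n for _ in range(n)]


def filter_pixel(image, row, col, kernel):
    """Weighted sum of the neighborhood centered on (row, col).

    Assumes the full neighborhood lies inside the image, so no border
    handling is needed for this illustration.
    """
    n = len(kernel)
    radius = n // 2
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += kernel[dy + radius][dx + radius] * image[row + dy][col + dx]
    return total


image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
kernel = box_kernel(3)
center = filter_pixel(image, 1, 1, kernel)  # average of all nine pixels
```

Because every weight is identical, the weighted sum is simply the mean of the neighborhood, which is why the filter acts as a blur.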

Implementation Details#

Parameters#

  • KernelSize (int) : the size of the kernel used for filtering. The current implementation only supports 3x3, 5x5 and 7x7 kernels.

  • BorderMode (NVCVBorderType) : the border mode used for padding the source image. It can be constant or replicate.

  • BorderValue (int) : the value used for constant border padding.
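The two border modes can be illustrated on a single row of pixels. This is a sketch only: `pad_row` and the mode strings `"constant"` and `"replicate"` are hypothetical names chosen for the example and are not the NVCVBorderType enumerators or any SDK call.

```python
def pad_row(row, radius, mode, border_value=0):
    """Pad a 1-D row of pixels by `radius` samples on each side.

    mode == "constant":  pad with `border_value`.
    mode == "replicate": repeat the nearest edge pixel.
    """
    if mode == "constant":
        left = [border_value] * radius
        right = [border_value] * radius
    elif mode == "replicate":
        left = [row[0]] * radius
        right = [row[-1]] * radius
    else:
        raise ValueError("mode must be 'constant' or 'replicate'")
    return left + list(row) + right


row = [1, 2, 3]
constant_padded = pad_row(row, 2, "constant", border_value=9)
replicate_padded = pad_row(row, 2, "replicate")
```

A kernel of size n needs a padding radius of n // 2 on each side, so a 5x5 kernel corresponds to the radius of 2 used here. The border value only matters in constant mode.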

Kernel Quantization#

The box filter kernel is originally generated in floating point format. The kernel coefficients are then quantized to integers using the following formula:

\[w_{fixed} = \lfloor w_{float} \cdot 2^{qbits} \rfloor\]

where \(w_{fixed}\) is the quantized kernel coefficient, \(w_{float}\) is the original floating point kernel coefficient, and \(qbits\) is the number of bits used for quantization. \(qbits\) is set to 8 for uint8/int8 types and 16 for uint16/int16 types.
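To make the quantization formula concrete, the sketch below evaluates it for every supported kernel size and both values of \(qbits\). The function name `quantize_weight` is hypothetical. Since \(w_{float} = 1/n^2\), the floor is computed here with integer division, which gives the same result as the formula while avoiding floating point rounding.

```python
def quantize_weight(n, qbits):
    """Return floor((1 / n^2) * 2^qbits) using exact integer arithmetic."""
    return (1 << qbits) // (n * n)


# Quantized weight and the resulting kernel sum for each configuration.
table = {}
for qbits in (8, 16):
    for n in (3, 5, 7):
        w_fixed = quantize_weight(n, qbits)
        table[(n, qbits)] = (w_fixed, w_fixed * n * n)
```

For example, a 3x3 kernel with \(qbits = 8\) gives \(w_{fixed} = 28\), so the nine coefficients sum to 252 rather than 256.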

To ensure that the sum of the quantized kernel coefficients equals \(2^{qbits}\), the quantization uses the floor operation and then distributes the remaining difference across the kernel elements. The adjustment process can be described as follows:

Let \(\Delta = 2^{qbits} - \sum_{i=0}^{n^2-1} w_{fixed}\) be the difference between the target sum and the current sum. Because the floor operation never increases a value and lowers each coefficient by less than 1, we have \(0 \leq \Delta < n^2\).

The adjustment is distributed across kernel elements using a step-based algorithm that increments elements at indices \(0, \text{step}, 2\text{step}, \ldots\) until exactly \(\Delta\) adjustments are made, where:

\[\text{step} = \max(1, \lfloor n^2/\Delta \rfloor)\]

The algorithm ensures exactly \(\Delta\) elements are incremented by using a counter that stops when the target number of adjustments is reached, distributing the adjustments as evenly as possible across the kernel.
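The adjustment described above can be sketched as follows. This is an interpretation of the text, not the SDK source: the function name `quantized_box_kernel` is hypothetical, the kernel is assumed to be stored as a flat list of \(n^2\) elements, and the case \(\Delta = 0\) is assumed to need no adjustment (which also avoids dividing by zero when computing the step).

```python
def quantized_box_kernel(n, qbits):
    """Return n*n integer weights that sum to exactly 2^qbits."""
    target = 1 << qbits
    count = n * n
    weights = [target // count] * count  # floor quantization
    delta = target - sum(weights)        # shortfall left by the floor
    if delta > 0:
        step = max(1, count // delta)
        adjusted = 0
        index = 0
        # Increment indices 0, step, 2*step, ... and stop after delta hits.
        while adjusted < delta and index < count:
            weights[index] += 1
            adjusted += 1
            index += step
    return weights


kernel_3x3_q8 = quantized_box_kernel(3, 8)
```

For a 3x3 kernel with \(qbits = 8\), the floor weight is 28, \(\Delta = 4\), and the step is 2, so the elements at indices 0, 2, 4, and 6 become 29 and the sum is 256. Since \(\Delta\) is smaller than \(n^2\), the last incremented index \((\Delta - 1) \cdot \text{step}\) always stays inside the kernel.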

Dataflow Configuration#

The BoxFilter operator uses the same dataflow configuration as the GaussianFilter operator. Refer to the GaussianFilter operator documentation for details.

Buffer Allocation#

The BoxFilter operator uses the same buffer allocation as the GaussianFilter operator. Refer to the GaussianFilter operator documentation for details.

Kernel Implementation#

The BoxFilter operator uses the same kernel implementation as the GaussianFilter operator. Refer to the GaussianFilter operator documentation for details.

Performance#

Execution Time is the average time required to execute the operator on a single VPU core. Note that each PVA contains two VPU cores, which can operate in parallel to process two streams simultaneously, or reduce execution time by approximately half by splitting the workload between the two cores.

Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores. Idle power is approximately 7W when the PVA is not processing data.

For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.

| ImageSize | DataType | KernelSize | Execution Time | Submit Latency | Total Power |
|-----------|----------|------------|----------------|----------------|-------------|
| 1920x1080 | U8       | 3x3        | 0.158ms        | 0.023ms        | 16.252W     |
| 1920x1080 | U8       | 5x5        | 0.161ms        | 0.023ms        | 17.295W     |
| 1920x1080 | U8       | 7x7        | 0.166ms        | 0.024ms        | 17.777W     |
| 1920x1080 | S8       | 3x3        | 0.158ms        | 0.024ms        | 16.632W     |
| 1920x1080 | S8       | 5x5        | 0.161ms        | 0.025ms        | 17.295W     |
| 1920x1080 | S8       | 7x7        | 0.166ms        | 0.025ms        | 18.056W     |
| 1920x1080 | U16      | 3x3        | 0.301ms        | 0.023ms        | 16.913W     |
| 1920x1080 | U16      | 5x5        | 0.356ms        | 0.023ms        | 17.858W     |
| 1920x1080 | U16      | 7x7        | 0.457ms        | 0.023ms        | 16.592W     |
| 1920x1080 | S16      | 3x3        | 0.301ms        | 0.023ms        | 17.295W     |
| 1920x1080 | S16      | 5x5        | 0.356ms        | 0.025ms        | 17.856W     |
| 1920x1080 | S16      | 7x7        | 0.458ms        | 0.024ms        | 16.972W     |

Compatibility#

Requires PVA SDK 2.6.0 or later.