GaussianFilter#

Overview#

The GaussianFilter operator is a low-pass discrete Gaussian filter that smooths an image by replacing each pixel with a Gaussian-weighted average of its neighboring pixels. The current implementation supports 3x3, 5x5 and 7x7 kernels.

Input Image: ../../_images/in1.png

Output Image: ../../_images/out1.png

Gaussian Filter Parameters: sigmaX = 1.7, sigmaY = 1.7, kernel size = 5×5

Algorithm Description#

The Gaussian filter is implemented as a convolution of the input image with a kernel whose weights are given by the following formula, where \(x\) and \(y\) are offsets from the kernel center:

\[w_g[x, y]=\frac{1}{2\pi\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}}\]
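As a minimal sketch of how such weights could be generated, the Python snippet below evaluates a separable form of the formula with independent `sigma_x` and `sigma_y` (it reduces to the formula above when the two are equal) and normalizes the result so the weights sum to 1. The function name `gaussian_kernel` and the normalization step are assumptions made for illustration; they are not taken from the operator's source.

```python
import math

def gaussian_kernel(size, sigma_x, sigma_y):
    """Return a size x size list of Gaussian weights that sum to 1 (illustrative)."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    r = size // 2
    # Unnormalized weights; x and y are offsets from the kernel center.
    raw = [[math.exp(-(x * x) / (2 * sigma_x ** 2) - (y * y) / (2 * sigma_y ** 2))
            for x in range(-r, r + 1)]
           for y in range(-r, r + 1)]
    total = sum(sum(row) for row in raw)
    # Assumption: divide by the sum instead of applying the 1/(2*pi*sigma^2)
    # factor, so that the truncated finite kernel still sums to exactly 1.
    return [[w / total for w in row] for row in raw]

# Same settings as the example image above: sigmaX = sigmaY = 1.7, 5x5 kernel.
kernel = gaussian_kernel(5, 1.7, 1.7)
```

The center weight is the largest and the weights fall off symmetrically with distance, which is what gives the filter its low-pass behavior.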

Implementation Details#

Parameters#

  • SigmaX (float) : the standard deviation of the Gaussian kernel in the X direction. Higher values result in more blurring, while lower values result in less blurring.

  • SigmaY (float) : the standard deviation of the Gaussian kernel in the Y direction. Higher values result in more blurring, while lower values result in less blurring.

  • KernelSize (int) : the size of the kernel used for filtering. The current implementation only supports 3x3, 5x5 and 7x7 kernels.

  • BorderMode (NVCVBorderType) : the border mode used for padding the source image. Supported modes are constant and replicate.

  • BorderValue (int) : the value used for constant border padding.
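To illustrate the difference between the two border modes, here is a small Python sketch that pads a single row of pixels. The helper `pad_row` and the string mode names are hypothetical stand-ins for the NVCVBorderType values; they are not part of the operator's API.

```python
def pad_row(row, halo, mode="replicate", border_value=0):
    """Pad a 1-D row of pixels with `halo` extra samples on each side (illustrative)."""
    if mode == "constant":
        # Constant mode: every padded sample takes the border value.
        left = right = [border_value] * halo
    elif mode == "replicate":
        # Replicate mode: padded samples repeat the nearest edge pixel.
        left, right = [row[0]] * halo, [row[-1]] * halo
    else:
        raise ValueError("mode must be 'constant' or 'replicate'")
    return left + list(row) + right

row = [10, 20, 30]
constant_padded = pad_row(row, 2, "constant", border_value=0)
replicate_padded = pad_row(row, 2, "replicate")
```

With a halo of 2 (as a 5x5 kernel would need), constant padding surrounds the row with the border value, while replicate padding extends the first and last pixels outward.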

Kernel Quantization#

The Gaussian filter kernel is originally generated in floating point format. The kernel coefficients are then quantized to unsigned integers using the following formula:

\[k_{int} = \text{truncate}(k_{float} \cdot 2^{qbits})\]

where \(k_{int}\) is the quantized kernel coefficient, \(k_{float}\) is the original floating point kernel coefficient, and \(qbits\) is the number of bits used for quantization. \(qbits\) is set to 8 for uint8/int8 types and 16 for uint16/int16 types.

To ensure that the sum of the quantized kernel coefficients equals \(2^{qbits}\), we use truncation in quantization and then adjust the center coefficient of the quantized kernel with the following formula:

\[k_{int}[center] = k_{int}[center] + (2^{qbits} - \sum_{i=0}^{n} k_{int}[i])\]

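where \(k_{int}[center]\) is the center coefficient of the quantized kernel and the sum runs over all coefficients of the truncated kernel. Because truncation never rounds up, the correction term is non-negative.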
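The two quantization steps can be sketched in Python as follows. This is an illustrative model only: the function name `quantize_kernel`, the flat row-major kernel layout, and the 3x3 example with sigma = 1.0 are assumptions, not the operator's actual code.

```python
import math

def quantize_kernel(k_float, qbits):
    """Quantize a flat list of non-negative float coefficients that sum to ~1."""
    scale = 1 << qbits
    # Truncation: int() rounds toward zero, so the integer sum never exceeds scale.
    k_int = [int(k * scale) for k in k_float]
    # Add the truncation deficit to the center coefficient so the sum is exactly scale.
    center = len(k_int) // 2
    k_int[center] += scale - sum(k_int)
    return k_int

# Hypothetical 3x3 kernel with sigma = 1.0, flattened row by row and normalized.
sigma = 1.0
raw = [math.exp(-(x * x + y * y) / (2 * sigma ** 2))
       for y in (-1, 0, 1) for x in (-1, 0, 1)]
total = sum(raw)
k_float = [w / total for w in raw]

k_int = quantize_kernel(k_float, qbits=8)
```

In this example the truncated coefficients sum to 252, so the center coefficient is raised by 4 to bring the total to 256. Keeping the sum at exactly \(2^{qbits}\) means a constant-valued image passes through the filter unchanged.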

Dataflow Configuration#

The RasterDataFlow (RDF) with halo and circular buffer is an ideal fit for the Conv2d primitive. The RDF halo API is used to configure the boundary padding and the size of overlapped pixels. RDF automatically handles the padding of border pixels based on the borderMode and borderValue parameters.

Two RDFs transfer the input image from DRAM to VMEM: one carries the even rows and the other carries the odd rows.

One RDF is used to transfer the convolution result from VMEM to DRAM.
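The row split and the overlap size can be pictured with a short Python sketch. It models only the data partitioning, not the RDF API; the helper names, the 0-based row numbering, and the assumption that the halo equals half the kernel size (rounded down) are all illustrative.

```python
def split_even_odd_rows(image):
    """Split a list of rows into (even_rows, odd_rows) using 0-based row indices."""
    return image[0::2], image[1::2]

def halo_size(kernel_size):
    """Assumed overlap per side so that every output pixel sees a full window."""
    return kernel_size // 2

# Tiny 5x4 test image whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(4)] for r in range(5)]
even_rows, odd_rows = split_even_odd_rows(image)
```

Under these assumptions a 7x7 kernel needs a halo of 3 pixels on each side, a 5x5 kernel needs 2, and a 3x3 kernel needs 1.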

Buffer Allocation#

Four VMEM buffers are needed:

  • src_even_v : input circular buffer with even rows of the input image.

  • src_odd_v : input circular buffer with odd rows of the input image.

  • knl_v : kernel buffer for the reformatted convolution kernel coefficients.

  • dst_v : output buffer with double buffering for the convolution result.

Kernel Implementation#

The GaussianFilter operator uses the Conv2d primitive to perform the convolution. Please refer to the Conv2d primitive documentation for more details.
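For readers who want a reference model to compare against, the following Python sketch applies an integer kernel to an image with replicate padding and scales the result back with a right shift. It is a plain scalar model, not the Conv2d primitive: the function name, the flat kernel layout, and the truncating shift (rather than some other rounding rule) are assumptions.

```python
def convolve_quantized(image, k_int, ksize, qbits):
    """Convolve a 2-D list of ints with a flat integer kernel (replicate border)."""
    h, w = len(image), len(image[0])
    r = ksize // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(ksize):
                for kx in range(ksize):
                    # Clamp coordinates to emulate replicate padding at the borders.
                    sy = min(max(y + ky - r, 0), h - 1)
                    sx = min(max(x + kx - r, 0), w - 1)
                    acc += image[sy][sx] * k_int[ky * ksize + kx]
            # The kernel sums to 2**qbits, so a right shift restores the input scale.
            # Assumption: the shift truncates; the real primitive may round differently.
            out[y][x] = acc >> qbits
    return out

# Example 3x3 integer kernel whose coefficients sum to 256 (qbits = 8).
k_int_3x3 = [19, 31, 19, 31, 56, 31, 19, 31, 19]
flat_image = [[100] * 4 for _ in range(3)]
result = convolve_quantized(flat_image, k_int_3x3, ksize=3, qbits=8)
```

Because the coefficients sum to exactly 256, a constant image of value 100 comes out as 100 everywhere, which is a quick sanity check for any quantized kernel.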

Performance#

Execution Time is the average time required to execute the operator on a single VPU core. Note that each PVA contains two VPU cores, which can operate in parallel to process two streams simultaneously, or reduce execution time by approximately half by splitting the workload between the two cores.

Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores.

For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.

ImageSize   DataType   FilterSize   Execution Time   Submit Latency   Total Power
1920x1080   U8         3x3          0.159ms          0.019ms          16.256W
1920x1080   U8         5x5          0.162ms          0.019ms          17.018W
1920x1080   U8         7x7          0.167ms          0.018ms          17.784W
1920x1080   S8         3x3          0.158ms          0.019ms          16.256W
1920x1080   S8         5x5          0.162ms          0.019ms          17.401W
1920x1080   S8         7x7          0.166ms          0.018ms          18.166W
1920x1080   U16        3x3          0.301ms          0.018ms          17.402W
1920x1080   U16        5x5          0.356ms          0.018ms          17.481W
1920x1080   U16        7x7          0.457ms          0.018ms          16.98W
1920x1080   S16        3x3          0.302ms          0.019ms          17.402W
1920x1080   S16        5x5          0.356ms          0.018ms          17.863W
1920x1080   S16        7x7          0.457ms          0.017ms          17.361W
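To put the execution times in the table above into perspective, the sketch below converts two of them into pixel throughput for a single VPU core. The helper `pixel_rate_gpix_per_s` is hypothetical, and the result is simple arithmetic on the published figures, not an additional measurement.

```python
def pixel_rate_gpix_per_s(width, height, exec_time_ms):
    """Pixels processed per second, in gigapixels, for one execution of the operator."""
    return width * height / (exec_time_ms * 1e-3) / 1e9

# U8, 3x3 kernel: 0.159 ms for a 1920x1080 image.
rate_u8_3x3 = pixel_rate_gpix_per_s(1920, 1080, 0.159)
# U16, 7x7 kernel: 0.457 ms for a 1920x1080 image.
rate_u16_7x7 = pixel_rate_gpix_per_s(1920, 1080, 0.457)
```

This works out to roughly 13.0 Gpixel/s for the U8 3x3 case and roughly 4.5 Gpixel/s for the U16 7x7 case.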

Compatibility#

Requires PVA SDK 2.6.0 or later.