GaussianFilter#

Overview#

The GaussianFilter operator is a low-pass discrete Gaussian filter that smooths an image by replacing each pixel with a Gaussian-weighted average of its neighboring pixels. The current implementation supports 3x3 and 5x5 kernels.

Input Image: ../../_images/in.png

Output Image: ../../_images/out.png

Gaussian Filter Parameters: sigmaX = 1.7, sigmaY = 1.7, kernel size = 5×5

Algorithm Description#

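The Gaussian filter is implemented as a convolution of the input image with a kernel whose weights are given by the following expression, where (x, y) is the offset from the kernel center and σ is the standard deviation: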

\[w_g[x, y]=\frac{1}{2\pi\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}}\]
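The weight formula above can be sketched in a few lines of Python. This is a minimal illustration, not the operator's actual coefficient generation: the function name `gaussian_kernel` is hypothetical, the separate `sigma_x` and `sigma_y` terms are the standard anisotropic generalization of the single-σ formula, and normalizing the weights so they sum to 1 (which replaces the constant factor 1/(2πσ²)) is a common convention that the source does not confirm.

```python
import math

def gaussian_kernel(size, sigma_x, sigma_y):
    """Return a size x size list of Gaussian weights that sum to 1.

    Hypothetical helper for illustration only. Indexing is kernel[y][x].
    """
    if size not in (3, 5):
        raise ValueError("only 3x3 and 5x5 kernels are supported")
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("sigma values must be positive")
    r = size // 2  # offsets run from -r to +r around the kernel center
    raw = [
        [
            math.exp(-(x * x) / (2 * sigma_x ** 2) - (y * y) / (2 * sigma_y ** 2))
            for x in range(-r, r + 1)
        ]
        for y in range(-r, r + 1)
    ]
    # Assumption: normalize so the filter preserves mean brightness.
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

kernel = gaussian_kernel(5, 1.7, 1.7)
```

With `sigma_x == sigma_y`, the unnormalized weights match the formula above up to the constant factor, and the resulting kernel is symmetric about its center.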

Implementation Details#

Parameters#

  • SigmaX (float) : the standard deviation of the Gaussian kernel in the X direction. Higher values result in more blurring, while lower values result in less blurring.

  • SigmaY (float) : the standard deviation of the Gaussian kernel in the Y direction. Higher values result in more blurring, while lower values result in less blurring.

  • KernelSize (int) : the size of the kernel used for filtering. The current implementation only supports 3x3 and 5x5 kernels.

  • BorderMode (NVCVBorderType) : the border mode used for padding the source image. It can be constant or replicate.

  • BorderValue (int) : the value used for constant border padding.
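The difference between the two border modes can be shown on a single row of pixels. The sketch below is illustrative only: `pad_row` is a hypothetical helper, and the strings "constant" and "replicate" stand in for the corresponding NVCVBorderType values rather than reproducing the real enum.

```python
def pad_row(row, halo, mode, border_value=0):
    """Pad a row of pixels with `halo` extra pixels on each side.

    Hypothetical helper; `mode` is a stand-in for NVCVBorderType.
    """
    if halo < 0:
        raise ValueError("halo must be non-negative")
    if mode == "constant":
        # Every padded pixel takes the user-supplied border value.
        left = [border_value] * halo
        right = [border_value] * halo
    elif mode == "replicate":
        # Padded pixels repeat the nearest edge pixel.
        left = [row[0]] * halo
        right = [row[-1]] * halo
    else:
        raise ValueError("mode must be 'constant' or 'replicate'")
    return left + list(row) + right

print(pad_row([10, 20, 30], 2, "constant", 0))  # [0, 0, 10, 20, 30, 0, 0]
print(pad_row([10, 20, 30], 2, "replicate"))    # [10, 10, 10, 20, 30, 30, 30]
```

BorderValue only matters in constant mode; in replicate mode it is ignored.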

Dataflow Configuration#

The RasterDataFlow (RDF) with halo and circular buffer is an ideal fit for the Conv2d primitive. The RDF halo API is used to configure the boundary padding and the size of the overlapped pixel region. RDF automatically pads the border pixels according to the BorderMode and BorderValue parameters.

Two RDFs are used to transfer the input image from DRAM to VMEM: one for the odd rows and one for the even rows.

One RDF is used to transfer the convolution result from VMEM to DRAM.
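As a rough illustration of the quantities involved, the sketch below computes a halo width and splits an image into even and odd rows. It does not use the RDF API. Both helpers are hypothetical, the halo of `kernel_size // 2` pixels per side is the usual requirement for a centered convolution rather than a value taken from the source, and rows are assumed to be numbered from zero.

```python
def halo_size(kernel_size):
    """Pixels needed beyond each tile edge for a centered square kernel.

    Assumption: halo = floor(kernel_size / 2), the usual convolution case.
    """
    if kernel_size not in (3, 5):
        raise ValueError("only 3x3 and 5x5 kernels are supported")
    return kernel_size // 2

def split_rows(image):
    """Split a list of rows into (even_rows, odd_rows), zero-based."""
    return image[0::2], image[1::2]

# A 5-row toy image where every pixel stores its own row index.
rows = [[r] * 4 for r in range(5)]
even, odd = split_rows(rows)

print(halo_size(3), halo_size(5))  # 1 2
print([row[0] for row in even])    # [0, 2, 4]
print([row[0] for row in odd])     # [1, 3]
```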

Buffer Allocation#

Four VMEM buffers are needed:

  • src_even_v : input circular buffer with even rows of the input image.

  • src_odd_v : input circular buffer with odd rows of the input image.

  • knl_v : kernel buffer for the reformatted convolution kernel coefficients.

  • dst_v : output buffer with double buffering for the convolution result.

Kernel Implementation#

The GaussianFilter operator uses the Conv2d primitive to perform the convolution. Refer to the Conv2d primitive documentation for more details.
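For checking results on small inputs, a plain scalar reference of the same operation can be handy. The sketch below is not the Conv2d primitive and makes several assumptions: the function name is hypothetical, only the replicate border mode is modeled (by clamping indices), the output stays in floating point with no rounding or saturation to the input data type, and the kernel is applied without flipping, which gives the same result as convolution for a symmetric kernel such as a Gaussian.

```python
def filter2d_replicate(image, kernel):
    """Apply a square, odd-sized kernel to a 2D list with replicate borders.

    Hypothetical reference helper; returns floats, no quantization.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                # Clamp the row index to emulate replicate padding.
                yy = min(max(y + ky, 0), h - 1)
                for kx in range(-r, r + 1):
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + r][kx + r] * image[yy][xx]
            out[y][x] = acc
    return out

# A flat image stays flat under any kernel whose weights sum to 1.
flat = [[7] * 4 for _ in range(3)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
smoothed = filter2d_replicate(flat, box)
```

A kernel with a single 1 at its center returns the input unchanged, which makes a convenient sanity check.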

Performance#

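=========  ========  ==========  ==============  ==============  ===========
ImageSize  DataType  FilterSize  Execution Time  Submit Latency  Total Power
=========  ========  ==========  ==============  ==============  ===========
1920x1080  U8        3x3         0.157ms         0.019ms         16.04W
1920x1080  U8        5x5         0.161ms         0.019ms         17.188W
1920x1080  S8        3x3         0.157ms         0.020ms         16.524W
1920x1080  S8        5x5         0.160ms         0.019ms         17.188W
1920x1080  U16       3x3         0.300ms         0.017ms         16.825W
1920x1080  U16       5x5         0.356ms         0.018ms         16.886W
1920x1080  S16       3x3         0.301ms         0.020ms         17.29W
1920x1080  S16       5x5         0.356ms         0.018ms         17.269W
=========  ========  ==========  ==============  ==============  ===========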

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
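One way to compare rows of the table is to convert execution time into pixel throughput. The helper below is a hypothetical convenience, not part of the benchmark tooling, and it assumes that the reported execution time covers the whole 1920x1080 frame.

```python
def throughput_gpix_per_s(width, height, exec_time_ms):
    """Return processed gigapixels per second for one frame."""
    if exec_time_ms <= 0:
        raise ValueError("execution time must be positive")
    pixels = width * height
    seconds = exec_time_ms / 1000.0
    return pixels / seconds / 1e9

# Values taken from the table: U8 3x3 and U16 5x5.
u8_3x3 = throughput_gpix_per_s(1920, 1080, 0.157)
u16_5x5 = throughput_gpix_per_s(1920, 1080, 0.356)
print(f"U8 3x3:  {u8_3x3:.2f} Gpix/s")
print(f"U16 5x5: {u16_5x5:.2f} Gpix/s")
```

By this measure the 8-bit cases process roughly twice as many pixels per second as the 16-bit cases, consistent with the execution times in the table.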

Compatibility#

Requires PVA SDK 2.6.0 or later.