GaussianFilter

Overview

The GaussianFilter operator is a low-pass discrete Gaussian filter that smooths an image by replacing each pixel with a Gaussian-weighted average of its neighboring pixels. The current implementation supports 3x3 and 5x5 kernels.

Figure: input image and output image of GaussianFilter with sigmaX = 1.7, sigmaY = 1.7, kernel size = 5×5.

Algorithm Description

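The Gaussian filter is implemented as a convolution of the input image with a kernel whose weights are:

\[w_g[x, y]=\frac{1}{2\pi\sigma_x\sigma_y}\cdot e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}\]

where x and y are the horizontal and vertical offsets from the kernel center, and \(\sigma_x\) and \(\sigma_y\) are the SigmaX and SigmaY parameters. When \(\sigma_x=\sigma_y=\sigma\), this reduces to \(\frac{1}{2\pi\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}}\).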

Implementation Details

Parameters

SigmaX (float) : the standard deviation of the Gaussian kernel in the X (horizontal) direction. Higher values produce more blurring; lower values produce less.
SigmaY (float) : the standard deviation of the Gaussian kernel in the Y (vertical) direction. Higher values produce more blurring; lower values produce less.
KernelSize (int) : the size of the filter kernel. The current implementation supports only 3x3 and 5x5 kernels.
BorderMode (NVCVBorderType) : the border mode used to pad the source image; either constant or replicate.
BorderValue (int) : the pixel value used for padding when BorderMode is constant.
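The parameter constraints listed above can be captured in a small validation sketch. The struct, enum, and function names here are hypothetical and are not the PVA SDK or NVCV API; requiring positive standard deviations is an assumption, not a documented constraint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the documented parameters; illustrative names only. */
typedef enum { BORDER_CONSTANT, BORDER_REPLICATE } border_mode_t;

typedef struct {
    float sigma_x;              /* SigmaX */
    float sigma_y;              /* SigmaY */
    int kernel_size;            /* KernelSize: 3 means 3x3, 5 means 5x5 */
    border_mode_t border_mode;  /* BorderMode */
    int border_value;           /* BorderValue, used only with BORDER_CONSTANT */
} gaussian_params_t;

/* Returns true if the parameters fall within the documented limits.
 * Assumption: both standard deviations must be strictly positive. */
bool gaussian_params_valid(const gaussian_params_t *p)
{
    assert(p != 0);
    if (p->kernel_size != 3 && p->kernel_size != 5)
        return false;           /* only 3x3 and 5x5 are supported */
    if (p->border_mode != BORDER_CONSTANT && p->border_mode != BORDER_REPLICATE)
        return false;           /* only constant or replicate padding */
    return p->sigma_x > 0.0f && p->sigma_y > 0.0f;
}
```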

Dataflow Configuration

RasterDataFlow (RDF) with a halo and a circular buffer is a natural fit for the Conv2d primitive. The RDF halo API configures the border padding and the width of the overlap between adjacent tiles. RDF pads the border pixels automatically according to the BorderMode and BorderValue parameters.
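The halo arithmetic and the two padding modes can be emulated on the CPU as a sketch. This is not the RDF API: the function names are hypothetical, and "replicate" is assumed to mean clamping to the nearest edge pixel, its usual meaning. A k x k window centered on an output pixel reaches k / 2 pixels beyond it, so each input tile needs that many extra pixels on every side.

```c
#include <assert.h>

typedef enum { BORDER_CONSTANT, BORDER_REPLICATE } border_mode_t;

/* Halo on each side of a tile: a ksize x ksize window centered on an
 * output pixel reaches ksize / 2 pixels past it in every direction. */
int halo_for_kernel(int ksize)
{
    assert(ksize > 0 && ksize % 2 == 1);
    return ksize / 2;
}

/* Input extent (width or height) needed to produce out_extent outputs. */
int input_tile_extent(int out_extent, int ksize)
{
    return out_extent + 2 * halo_for_kernel(ksize);
}

/* Reads pixel (x, y) of a width x height U8 image, padding coordinates
 * outside the image according to the border mode. */
int read_padded(const unsigned char *src, int width, int height, int x, int y,
                border_mode_t mode, int border_value)
{
    assert(width > 0 && height > 0);
    if (x >= 0 && x < width && y >= 0 && y < height)
        return src[y * width + x];      /* inside the image */
    if (mode == BORDER_CONSTANT)
        return border_value;            /* constant: fixed BorderValue */
    /* replicate (assumed): reuse the nearest edge pixel */
    if (x < 0) x = 0;
    if (x >= width) x = width - 1;
    if (y < 0) y = 0;
    if (y >= height) y = height - 1;
    return src[y * width + x];
}
```

For example, an arbitrary 64-pixel-wide output tile filtered with a 5x5 kernel needs a halo of 2 pixels per side, so 68 input pixels per row.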

Two RDFs transfer the input image from DRAM to VMEM: one carries the even rows and the other the odd rows.

A third RDF transfers the convolution result from VMEM back to DRAM.
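The even/odd row partition performed by the two input RDFs can be illustrated with a CPU sketch. It only shows how rows are distributed between the two streams and how to locate an original row afterwards; the document does not describe how the kernel consumes the two streams, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Splits a width x height U8 image into its even rows (0, 2, 4, ...) and
 * odd rows (1, 3, 5, ...), packed densely into separate buffers.
 * `even` needs (height + 1) / 2 rows and `odd` needs height / 2 rows. */
void split_rows(const unsigned char *src, int width, int height,
                unsigned char *even, unsigned char *odd)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; ++y) {
        unsigned char *dst = (y % 2 == 0) ? even : odd;
        /* row y becomes row y / 2 of its destination buffer */
        memcpy(dst + (size_t)(y / 2) * width, src + (size_t)y * width, (size_t)width);
    }
}

/* Locates row y of the original image inside the split buffers. */
const unsigned char *split_row(const unsigned char *even, const unsigned char *odd,
                               int width, int y)
{
    assert(y >= 0);
    return ((y % 2 == 0) ? even : odd) + (size_t)(y / 2) * width;
}
```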

Buffer Allocation

Four VMEM buffers are required:

src_even_v : input circular buffer holding the even rows of the input image.
src_odd_v : input circular buffer holding the odd rows of the input image.
knl_v : kernel buffer holding the reformatted convolution kernel coefficients.
dst_v : double-buffered output buffer for the convolution result.
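The indexing behind the circular input buffers and the double-buffered output can be sketched as follows. The capacity parameter and both function names are hypothetical, since the document does not give buffer sizes; the sketch assumes the usual purpose of double buffering, which is letting the write-back of one tile overlap the computation of the next.

```c
#include <assert.h>

/* Circular row buffer: row `row` of the input stream lands in slot
 * row % capacity, so once the buffer wraps, each new row overwrites
 * the oldest one. `capacity` is a hypothetical parameter. */
int circular_slot(int row, int capacity)
{
    assert(row >= 0 && capacity > 0);
    return row % capacity;
}

/* Double buffering: tile t is written into half t % 2 of dst_v, so the
 * other half, holding tile t - 1, can be transferred to DRAM meanwhile. */
int output_half(int tile)
{
    assert(tile >= 0);
    return tile % 2;
}
```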

Kernel Implementation

The GaussianFilter operator performs the convolution with the Conv2d primitive. For more details, see the Conv2d primitive documentation.
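A CPU reference for the convolution step can help when checking results. This sketch assumes U8 data, float weights, round-to-nearest, and saturation to [0, 255], and it takes a tile whose halo is already filled in, since RDF handles the padding. The Conv2d primitive may use different arithmetic (for example fixed-point coefficients), so its output can differ by rounding; conv2d_valid_u8 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Convolves a padded U8 tile with a ksize x ksize kernel. `padded` holds
 * (out_w + 2r) x (out_h + 2r) pixels with r = ksize / 2, i.e. the halo is
 * already present. Output pixel (x, y) corresponds to padded (x + r, y + r).
 * The loop computes a correlation, which equals convolution for a
 * symmetric kernel such as a Gaussian. */
void conv2d_valid_u8(const uint8_t *padded, int out_w, int out_h,
                     const float *w, int ksize, uint8_t *out)
{
    assert(ksize > 0 && ksize % 2 == 1);
    const int r = ksize / 2;
    const int pw = out_w + 2 * r;              /* padded row stride */
    for (int y = 0; y < out_h; ++y) {
        for (int x = 0; x < out_w; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < ksize; ++ky)
                for (int kx = 0; kx < ksize; ++kx)
                    acc += w[ky * ksize + kx] * padded[(y + ky) * pw + (x + kx)];
            int v = (int)(acc + 0.5f);         /* round to nearest */
            out[y * out_w + x] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}
```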

Performance

Image Size	Data Type	Filter Size	Execution Time	Submit Latency	Total Power
1920x1080	U8	3x3	0.157ms	0.019ms	16.04W
1920x1080	U8	5x5	0.161ms	0.019ms	17.188W
1920x1080	S8	3x3	0.157ms	0.020ms	16.524W
1920x1080	S8	5x5	0.160ms	0.019ms	17.188W
1920x1080	U16	3x3	0.300ms	0.017ms	16.825W
1920x1080	U16	5x5	0.356ms	0.018ms	16.886W
1920x1080	S16	3x3	0.301ms	0.020ms	17.29W
1920x1080	S16	5x5	0.356ms	0.018ms	17.269W

For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
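The Execution Time column can be converted into throughput for comparison across data types. The figures this produces are derived arithmetic from the table, not separate measurements, and pixels_per_second is a hypothetical helper.

```c
#include <assert.h>

/* Throughput in pixels per second for one width x height frame
 * processed in time_ms milliseconds. */
double pixels_per_second(int width, int height, double time_ms)
{
    assert(width > 0 && height > 0 && time_ms > 0.0);
    return (double)width * (double)height / (time_ms * 1e-3);
}
```

For a 1920x1080 frame, 0.157 ms (U8, 3x3) corresponds to about 13.2 billion pixels per second, and 0.356 ms (U16, 5x5) to about 5.8 billion.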

Compatibility

Requires PVA SDK 2.6.0 or later.