GaussianFilter#

Overview#

The GaussianFilter operator is a low-pass discrete Gaussian filter that smooths an image by replacing each pixel with a Gaussian-weighted average of its neighboring pixels. The current implementation supports 3x3, 5x5 and 7x7 kernels.

Figure: input image (left) and output image (right). Gaussian filter parameters: sigmaX = 1.7, sigmaY = 1.7, kernel size = 5×5.

Algorithm Description#

The Gaussian filter is implemented as a convolution of the input image with a kernel whose weight at offset \((x, y)\) from the kernel center is:

\[w_g[x, y]=\frac{1}{2\pi\sigma^2}\cdot e^{-\frac{x^2+y^2}{2\sigma^2}}\]
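As an illustration of the weight formula above, the following minimal Python sketch generates a floating-point kernel. It rests on two assumptions that the formula does not state: the single \(\sigma\) is split into separate `sigma_x` and `sigma_y` terms (matching the formula when both are equal), and the weights are normalized to sum to 1, which makes the constant \(\frac{1}{2\pi\sigma^2}\) factor cancel. The function name `gaussian_kernel` is hypothetical and is not part of the PVA SDK.

```python
import math


def gaussian_kernel(size, sigma_x, sigma_y):
    """Return a size x size list of float Gaussian weights that sum to 1.

    Hypothetical helper for illustration only, not a PVA SDK function.
    """
    if size not in (3, 5, 7):
        raise ValueError("only 3x3, 5x5 and 7x7 kernels are supported")
    r = size // 2
    # Unnormalized weights; x and y are offsets from the kernel center.
    raw = [
        [
            math.exp(-(x * x / (2.0 * sigma_x ** 2) + y * y / (2.0 * sigma_y ** 2)))
            for x in range(-r, r + 1)
        ]
        for y in range(-r, r + 1)
    ]
    # Assumption: normalize so the weights sum to 1 (preserves mean brightness).
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


# Parameters taken from the figure caption: sigmaX = sigmaY = 1.7, 5x5 kernel.
kernel = gaussian_kernel(5, 1.7, 1.7)
```

The center weight is always the largest, and with equal sigmas the kernel is symmetric under swapping x and y.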

Implementation Details#

Parameters#

SigmaX (float) : the standard deviation of the Gaussian kernel in the X direction. Higher values result in more blurring, while lower values result in less blurring.
SigmaY (float) : the standard deviation of the Gaussian kernel in the Y direction. Higher values result in more blurring, while lower values result in less blurring.
KernelSize (int) : the size of the kernel used for filtering. The current implementation only supports 3x3, 5x5 and 7x7 kernels.
BorderMode (NVCVBorderType) : the border mode used for padding the source image; either constant or replicate.
BorderValue (int) : the value used for constant border padding.
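The difference between the two border modes can be sketched on a single image row. The snippet below is a plain-Python illustration, not SDK code: the function `pad_row` and the strings `"constant"` and `"replicate"` are hypothetical stand-ins for the corresponding NVCVBorderType values, and `radius` is assumed to be half the kernel size (for example 2 for a 5x5 kernel).

```python
def pad_row(row, radius, mode, border_value=0):
    """Pad one image row on both sides by `radius` pixels.

    Hypothetical helper; `mode` strings stand in for NVCVBorderType values.
    """
    if mode == "constant":
        # Every padded pixel takes the fixed border value.
        left = [border_value] * radius
        right = [border_value] * radius
    elif mode == "replicate":
        # Every padded pixel repeats the nearest edge pixel.
        left = [row[0]] * radius
        right = [row[-1]] * radius
    else:
        raise ValueError("mode must be 'constant' or 'replicate'")
    return left + list(row) + right


print(pad_row([10, 20, 30], 2, "constant", border_value=0))
# [0, 0, 10, 20, 30, 0, 0]
print(pad_row([10, 20, 30], 2, "replicate"))
# [10, 10, 10, 20, 30, 30, 30]
```

Constant padding with a value far from the image content tends to darken or brighten the filtered border, whereas replicate padding keeps the border close to the edge pixels.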

Kernel Quantization#

The Gaussian filter kernel is first generated in floating-point format. The kernel coefficients are then quantized to unsigned integers using the following formula:

\[k_{int} = \text{truncate}(k_{float} \cdot 2^{qbits})\]

where \(k_{int}\) is the quantized kernel coefficient, \(k_{float}\) is the original floating point kernel coefficient, and \(qbits\) is the number of bits used for quantization. \(qbits\) is set to 8 for uint8/int8 types and 16 for uint16/int16 types.

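To ensure that the sum of the quantized kernel coefficients equals \(2^{qbits}\), quantization uses truncation, and the center coefficient of the quantized kernel is then adjusted by the following formula. Because truncation never rounds a coefficient up, the truncated sum of a kernel normalized to 1 does not exceed \(2^{qbits}\), so the correction is non-negative: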

\[k_{int}[center] = k_{int}[center] + (2^{qbits} - \sum_{i=0}^{n} k_{int}[i])\]

where \(k_{int}[center]\) is the center coefficient of the quantized kernel.
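The two quantization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SDK implementation: the function name `quantize_kernel` is hypothetical, the input is assumed to be a square list of non-negative floats that sum to 1, and a flat 3x3 kernel is used in the example only because its arithmetic is easy to check by hand.

```python
def quantize_kernel(k_float, qbits):
    """Quantize float kernel weights to integers that sum to 2**qbits.

    Hypothetical helper for illustration only, not a PVA SDK function.
    """
    scale = 1 << qbits
    # int() truncates toward zero, matching truncate(k_float * 2^qbits).
    k_int = [[int(w * scale) for w in row] for row in k_float]
    # Add whatever truncation lost to the center coefficient.
    residual = scale - sum(sum(row) for row in k_int)
    center = len(k_int) // 2
    k_int[center][center] += residual
    return k_int


# Flat 3x3 kernel (each weight 1/9), quantized with qbits = 8:
# each weight truncates to 28, the sum is 252, so the center gains 256 - 252 = 4.
print(quantize_kernel([[1.0 / 9.0] * 3 for _ in range(3)], 8))
# [[28, 28, 28], [28, 32, 28], [28, 28, 28]]
```

Forcing the sum to exactly \(2^{qbits}\) means a uniform image region keeps its value after the integer accumulation is scaled back down.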

Dataflow Configuration#

The RasterDataFlow (RDF) with halo and circular buffer is an ideal fit for the Conv2d primitive. The RDF halo API is used to configure the boundary padding and the size of overlapped pixels. RDF automatically handles the padding of border pixels based on the borderMode and borderValue parameters.

Two RDFs are used to transfer the odd rows and the even rows of the input image from DRAM to VMEM, respectively.

One RDF is used to transfer the convolution result from VMEM to DRAM.

Buffer Allocation#

Four VMEM buffers are needed:

src_even_v : input circular buffer with even rows of the input image.
src_odd_v : input circular buffer with odd rows of the input image.
knl_v : kernel buffer for the reformatted convolution kernel coefficients.
dst_v : output buffer with double buffering for the convolution result.

Kernel Implementation#

The GaussianFilter operator uses the Conv2d primitive to perform the convolution. Please refer to the Conv2d primitive documentation for more details.
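For readers who want a scalar reference to compare against, the following Python sketch shows what an integer convolution with a quantized kernel could look like. It is not the Conv2d primitive and makes several assumptions that this page does not confirm: the accumulated sum is scaled back with a right shift by `qbits` (no rounding), borders use replicate padding implemented by clamping indices, and the name `conv2d_reference` is hypothetical.

```python
def conv2d_reference(image, k_int, qbits):
    """Scalar reference convolution with an integer kernel.

    Hypothetical illustration; assumes replicate borders and a plain
    right shift by qbits to undo the kernel scaling.
    """
    h, w = len(image), len(image[0])
    r = len(k_int) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)  # clamp row index (replicate)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += k_int[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc >> qbits  # assumption: truncating scale-down
    return out


# A quantized 3x3 kernel whose coefficients sum to 2**8 = 256.
k_int = [[28, 28, 28], [28, 32, 28], [28, 28, 28]]
flat = [[100] * 4 for _ in range(3)]
print(conv2d_reference(flat, k_int, 8) == flat)
# True
```

The flat-image check shows why the quantized coefficients are forced to sum to \(2^{qbits}\): a uniform region passes through the filter unchanged.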

Performance#

Execution Time is the average time required to execute the operator on a single VPU core. Note that each PVA contains two VPU cores, which can operate in parallel to process two streams simultaneously, or reduce execution time by approximately half by splitting the workload between the two cores.

Total Power represents the average total power consumed by the module when the operator is executed concurrently on both VPU cores.

For detailed information on interpreting the performance table below and understanding the benchmarking setup, see Performance Benchmark.

ImageSize	DataType	FilterSize	Execution Time	Submit Latency	Total Power
1920x1080	U8	3x3	0.159ms	0.019ms	16.256W
1920x1080	U8	5x5	0.162ms	0.019ms	17.018W
1920x1080	U8	7x7	0.167ms	0.018ms	17.784W
1920x1080	S8	3x3	0.158ms	0.019ms	16.256W
1920x1080	S8	5x5	0.162ms	0.019ms	17.401W
1920x1080	S8	7x7	0.166ms	0.018ms	18.166W
1920x1080	U16	3x3	0.301ms	0.018ms	17.402W
1920x1080	U16	5x5	0.356ms	0.018ms	17.481W
1920x1080	U16	7x7	0.457ms	0.018ms	16.98W
1920x1080	S16	3x3	0.302ms	0.019ms	17.402W
1920x1080	S16	5x5	0.356ms	0.018ms	17.863W
1920x1080	S16	7x7	0.457ms	0.017ms	17.361W
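To put the execution times in the table into perspective, they can be converted into pixel throughput. The sketch below is simple arithmetic on the table values; the function names are hypothetical, and the dual-core figure is only an estimate based on the "approximately half" statement above, not a measured result.

```python
def throughput_mpix_per_s(width, height, exec_time_ms):
    """Single-core throughput in megapixels per second."""
    pixels = width * height
    seconds = exec_time_ms * 1e-3
    return pixels / seconds / 1e6


def estimated_dual_core_time_ms(exec_time_ms):
    """Rough estimate when the workload is split across both VPU cores."""
    return exec_time_ms / 2.0


# 1920x1080 U8 with a 3x3 kernel: 0.159 ms on a single VPU core.
u8_3x3 = throughput_mpix_per_s(1920, 1080, 0.159)
# 1920x1080 U16 with a 7x7 kernel: 0.457 ms on a single VPU core.
u16_7x7 = throughput_mpix_per_s(1920, 1080, 0.457)
print(round(u8_3x3), round(u16_7x7))
```

For the 8-bit types the execution time barely changes with kernel size, while for the 16-bit types it grows from about 0.30 ms at 3x3 to about 0.46 ms at 7x7.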

Compatibility#

Requires PVA SDK 2.6.0 or later.