GaussianFilter#
Overview#
The GaussianFilter operator is a low-pass discrete Gaussian filter that smooths an image by replacing each pixel with a Gaussian-weighted average of its neighboring pixels. The current implementation supports 3x3 and 5x5 kernels.
Gaussian Filter Parameters: sigmaX = 1.7, sigmaY = 1.7, kernel size = 5×5
Algorithm Description#
The Gaussian filter is implemented as a convolution of the input image with a kernel whose weights are samples of a 2D Gaussian function parameterized by SigmaX and SigmaY.
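As a reference point, a 2D Gaussian with separate standard deviations is commonly written as w(x, y) ∝ exp(-(x² / (2·sigmaX²) + y² / (2·sigmaY²))), where x and y are offsets from the kernel center. The sketch below samples that function on a 3x3 or 5x5 grid and normalizes the result so the weights sum to 1. The function name `gaussian_kernel` and the normalization step are assumptions made for illustration; the operator may quantize or reformat its coefficients differently.

```python
import math


def gaussian_kernel(kernel_size, sigma_x, sigma_y):
    """Return a kernel_size x kernel_size grid of normalized Gaussian weights.

    Hypothetical helper, not part of the PVA SDK.
    """
    if kernel_size not in (3, 5):
        raise ValueError("only 3x3 and 5x5 kernels are supported")
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("sigma_x and sigma_y must be positive")

    radius = kernel_size // 2
    # Sample the unnormalized Gaussian at integer offsets from the center.
    raw = [
        [
            math.exp(-(x * x / (2 * sigma_x ** 2) + y * y / (2 * sigma_y ** 2)))
            for x in range(-radius, radius + 1)
        ]
        for y in range(-radius, radius + 1)
    ]
    # Normalize so the filter preserves the mean brightness of the image.
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


# Example: the parameter set quoted in the overview.
kernel = gaussian_kernel(5, 1.7, 1.7)
```

Larger sigma values flatten the weights toward a box filter, which is why higher SigmaX or SigmaY produces more blurring.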
Implementation Details#
Parameters#
SigmaX (float): the standard deviation of the Gaussian kernel in the X direction. Higher values produce more blurring; lower values produce less.
SigmaY (float): the standard deviation of the Gaussian kernel in the Y direction. Higher values produce more blurring; lower values produce less.
KernelSize (int): the size of the kernel used for filtering. The current implementation supports only 3x3 and 5x5 kernels.
BorderMode (NVCVBorderType): the border mode used for padding the source image. It can be constant or replicate.
BorderValue (int): the value used for constant border padding.
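To illustrate how BorderMode and BorderValue affect the result, here is a minimal pure-Python reference of a 2D convolution with constant and replicate padding. The names `fetch_pixel` and `filter_image` and the string values `"constant"` and `"replicate"` are hypothetical stand-ins and are not the operator's API. The 3x3 kernel is a common binomial approximation of a Gaussian chosen for the example, and rounding or saturation to the output data type is not modeled.

```python
def fetch_pixel(image, row, col, border_mode="replicate", border_value=0):
    """Read image[row][col], synthesizing a value for out-of-range coordinates."""
    height, width = len(image), len(image[0])
    if 0 <= row < height and 0 <= col < width:
        return image[row][col]
    if border_mode == "constant":
        # Every pixel outside the image takes the same fixed value.
        return border_value
    if border_mode == "replicate":
        # Clamp to the nearest edge pixel.
        clamped_row = min(max(row, 0), height - 1)
        clamped_col = min(max(col, 0), width - 1)
        return image[clamped_row][clamped_col]
    raise ValueError("unsupported border mode: " + str(border_mode))


def filter_image(image, kernel, border_mode="replicate", border_value=0):
    """Apply a square, odd-sized kernel to every pixel of a 2D list image."""
    radius = len(kernel) // 2
    output = []
    for row in range(len(image)):
        out_row = []
        for col in range(len(image[0])):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    acc += weight * fetch_pixel(
                        image, row + ky - radius, col + kx - radius,
                        border_mode, border_value)
            out_row.append(acc)
        output.append(out_row)
    return output


# Illustrative 3x3 binomial kernel (weights sum to 1).
KERNEL_3X3 = [[w / 16 for w in row] for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]

flat = [[10] * 3 for _ in range(3)]
replicated = filter_image(flat, KERNEL_3X3, "replicate")
zero_padded = filter_image(flat, KERNEL_3X3, "constant", border_value=0)
```

On a flat image, replicate padding leaves every pixel unchanged, while constant padding with a value of 0 darkens the pixels near the border because the padded zeros enter the weighted average.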
Dataflow Configuration#
The RasterDataFlow (RDF) with halo and circular buffer is an ideal fit for the Conv2d primitive. The RDF halo API is used to configure the boundary padding and the number of overlapping pixels between adjacent tiles. RDF automatically pads the border pixels according to the BorderMode and BorderValue parameters.
Two RDFs transfer the odd rows and the even rows of the input image, respectively, from DRAM to VMEM.
One RDF transfers the convolution result from VMEM to DRAM.
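The row bookkeeping implied by a halo and by the odd/even split can be sketched as follows. This is not the RDF API: the function `tiles_with_halo`, the row-only tiling, and the tile height are hypothetical and chosen only to show which input rows each output tile needs. The one fact relied on is that a convolution with a KernelSize x KernelSize kernel needs KernelSize // 2 extra rows on each side of a tile.

```python
def tiles_with_halo(image_height, tile_height, kernel_size):
    """List the input rows needed for each output tile, split by row parity.

    Rows below 0 or at/after image_height lie outside the image; in this
    sketch they stand for the rows that border padding would supply.
    """
    halo = kernel_size // 2
    tiles = []
    for out_start in range(0, image_height, tile_height):
        out_end = min(out_start + tile_height, image_height)
        # Input rows cover the output rows plus a halo above and below.
        in_rows = list(range(out_start - halo, out_end + halo))
        tiles.append({
            "out_rows": (out_start, out_end),
            "even_rows": [r for r in in_rows if r % 2 == 0],
            "odd_rows": [r for r in in_rows if r % 2 != 0],
        })
    return tiles


# Example: 8 image rows, tiles of 4 output rows, 5x5 kernel (halo of 2 rows).
tiles = tiles_with_halo(8, 4, 5)
```

Consecutive tiles share 2 x halo input rows. That reuse is the usual reason for a circular input buffer, since only the rows that are new to the next tile have to be fetched.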
Buffer Allocation#
Four VMEM buffers are needed:
src_even_v: input circular buffer with even rows of the input image.
src_odd_v: input circular buffer with odd rows of the input image.
knl_v: kernel buffer for the reformatted convolution kernel coefficients.
dst_v: output buffer with double buffering for the convolution result.
Kernel Implementation#
The GaussianFilter operator uses the Conv2d primitive to perform the convolution. Refer to the Conv2d primitive documentation for more details.
Performance#
Image Size | Data Type | Filter Size | Execution Time | Submit Latency | Total Power |
---|---|---|---|---|---|
1920x1080 | U8 | 3x3 | 0.157 ms | 0.019 ms | 16.04 W |
1920x1080 | U8 | 5x5 | 0.161 ms | 0.019 ms | 17.188 W |
1920x1080 | S8 | 3x3 | 0.157 ms | 0.020 ms | 16.524 W |
1920x1080 | S8 | 5x5 | 0.160 ms | 0.019 ms | 17.188 W |
1920x1080 | U16 | 3x3 | 0.300 ms | 0.017 ms | 16.825 W |
1920x1080 | U16 | 5x5 | 0.356 ms | 0.018 ms | 16.886 W |
1920x1080 | S16 | 3x3 | 0.301 ms | 0.020 ms | 17.29 W |
1920x1080 | S16 | 5x5 | 0.356 ms | 0.018 ms | 17.269 W |
For detailed information on interpreting the performance table above and understanding the benchmarking setup, see Performance Benchmark.
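For a quick comparison across rows, the execution times can be converted to pixel throughput. The helper below is a hypothetical convenience, not part of any benchmarking tool. It assumes that Execution Time covers the processing of one full frame and ignores Submit Latency.

```python
def throughput_mpix_per_s(width, height, execution_time_ms):
    """Convert a per-frame execution time into megapixels per second."""
    if execution_time_ms <= 0:
        raise ValueError("execution_time_ms must be positive")
    pixels = width * height
    seconds = execution_time_ms * 1e-3
    return pixels / seconds / 1e6


# Values taken from the table: U8 3x3 and U16 5x5 at 1920x1080.
u8_3x3 = throughput_mpix_per_s(1920, 1080, 0.157)
u16_5x5 = throughput_mpix_per_s(1920, 1080, 0.356)
```

Under that assumption, the U8 3x3 case works out to roughly 13.2 gigapixels per second and the U16 5x5 case to roughly 5.8 gigapixels per second.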
Compatibility#
Requires PVA SDK 2.6.0 or later.