Overview

The Image Convolver algorithm performs a 2D convolution of the input image with the provided 2D kernel. It is useful when the kernel isn't separable and its dimensions are smaller than 5x5. In other cases, the separable image convolver algorithm is usually preferable because it is faster.

Example kernel:
\[ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix} \]

Implementation

Discrete 2D convolution is implemented using the following discrete function:

\[ I'[x,y] = \sum_{m=0}^{k_h-1} \sum_{n=0}^{k_w-1} K[m,n] \times I[x-(n-\lfloor k_w/2 \rfloor), y-(m-\lfloor k_h/2 \rfloor) ] \]

Where:

\(I\) is the input image.
\(I'\) is the result image.
\(K\) is the convolution kernel.
\(k_w,k_h\) are the kernel's width and height, respectively.
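
To make the indexing in the formula above concrete, here is a minimal reference sketch in plain C. It is not VPI's implementation. The function name `convolve2d_ref`, the row-major `float` buffers, and the choice to read out-of-image pixels as zero are all assumptions made for illustration; `m` and `n` run over the \(k_h\) kernel rows and \(k_w\) kernel columns.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: pixels outside the image read as zero. */
static float pixel_or_zero(const float *img, int w, int h, int x, int y)
{
    return (x < 0 || y < 0 || x >= w || y >= h) ? 0.0f : img[(size_t)y * w + x];
}

/* Hypothetical reference routine (not a VPI function).
 * in, out: row-major float images of size w*h.
 * kernel:  row-major float kernel with kh rows and kw columns. */
void convolve2d_ref(const float *in, float *out, int w, int h,
                    const float *kernel, int kw, int kh)
{
    assert(in && out && kernel && w > 0 && h > 0 && kw > 0 && kh > 0);

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int m = 0; m < kh; ++m) {     /* kernel row    */
                for (int n = 0; n < kw; ++n) { /* kernel column */
                    /* Source coordinates exactly as in the formula. */
                    int sx = x - (n - kw / 2);
                    int sy = y - (m - kh / 2);
                    acc += kernel[m * kw + n] * pixel_or_zero(in, w, h, sx, sy);
                }
            }
            out[(size_t)y * w + x] = acc;
        }
    }
}
```

Convolving a 3x3 image that holds a single 1 at its center with any 3x3 kernel reproduces the kernel unchanged in the output, which is a quick way to check that the routine computes a convolution rather than a cross-correlation.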

Note: Most computer vision libraries expect the kernel to be reversed before calling their convolution functions. VPI does not: it implements an actual convolution, not a cross-correlation. Naturally, the distinction is irrelevant if the kernel is symmetric.
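
When a kernel was designed for a correlation-based routine, it can be reused with a true convolution by rotating it 180 degrees, that is, reversing it along both axes. The sketch below shows one way to do this. `flip_kernel_180` is a hypothetical helper, not part of VPI, and it assumes a row-major kernel with odd dimensions so that the kernel center stays in place.

```c
#include <assert.h>

/* Hypothetical helper (not a VPI function): writes into dst the kernel src
 * rotated by 180 degrees. Both buffers hold kh rows of kw floats, row-major,
 * and must not alias each other. */
void flip_kernel_180(const float *src, float *dst, int kw, int kh)
{
    assert(src && dst && src != dst && kw > 0 && kh > 0);

    for (int m = 0; m < kh; ++m) {
        for (int n = 0; n < kw; ++n) {
            /* Element (m, n) comes from the diagonally opposite position. */
            dst[m * kw + n] = src[(kh - 1 - m) * kw + (kw - 1 - n)];
        }
    }
}
```

A kernel such as {1, 0, -1, 0, 0, 0, -1, 0, 1} is left unchanged by this rotation, so it produces the same result under convolution and cross-correlation.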

Usage

Initialization phase
1. Include the header that defines the needed functions and structures.
  #include <vpi/algo/ImageConvolver.h>
2. Define the stream on which the algorithm will be executed and the input image.
  VPIStream stream = /*...*/;
  
  VPIImage input = /*...*/;
3. Create the output image.
  uint32_t w, h;
  
  vpiImageGetSize(input, &w, &h);
  
  VPIImageType type;
  
  vpiImageGetType(input, &type);
  
  VPIImage output;
  
  vpiImageCreate(w, h, type, 0, &output);
Processing phase
1. Define the kernel to be used. In this case, a simple 3x3 edge detector.
  float kernel[3 * 3] = {1, 0, -1, 0, 0, 0, -1, 0, 1};
2. Submit the algorithm to the stream, passing the input and output images, the kernel with its dimensions, and the boundary condition.
  vpiSubmitImageConvolver(stream, input, output, kernel, 3, 3, VPI_BOUNDARY_COND_ZERO);
3. Optionally, wait until the processing is done.
  vpiStreamSync(stream);

Consult the Image Convolution sample for a complete example.

For more details, consult the API reference.

Limitations and Constraints

Constraints for specific backends supersede the ones specified for all backends.

All Backends

Input and output images must have the same dimensions and type.
The following image types are accepted:
Minimum convolution kernel size is 1x1, maximum is 11x11.
The following boundary conditions are accepted:
- VPI_BOUNDARY_COND_ZERO
- VPI_BOUNDARY_COND_CLAMP
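
The two boundary conditions differ only in what is read when the kernel reaches past the image edge. The sketch below illustrates the usual meaning of each mode, assuming that the zero condition treats outside pixels as 0 and that the clamp condition repeats the nearest edge pixel. The `BorderMode` enum and `sample_with_border` are hypothetical names used for illustration; they are not VPI identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two accepted boundary conditions. */
typedef enum { BORDER_ZERO, BORDER_CLAMP } BorderMode;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Reads pixel (x, y) from a row-major u8 image of size w*h, applying the
 * chosen boundary handling when (x, y) lies outside the image. */
uint8_t sample_with_border(const uint8_t *img, int w, int h,
                           int x, int y, BorderMode mode)
{
    assert(img && w > 0 && h > 0);

    int inside = x >= 0 && y >= 0 && x < w && y < h;
    if (!inside && mode == BORDER_ZERO) {
        return 0; /* outside pixels contribute nothing */
    }

    /* In-range coordinates are unchanged by clamping; out-of-range ones
     * snap to the nearest edge pixel. */
    x = clamp_int(x, 0, w - 1);
    y = clamp_int(y, 0, h - 1);
    return img[y * w + x];
}
```

With a zero boundary, outputs near the border are attenuated because part of the kernel sees zeros. Clamping avoids that darkening at the cost of repeating edge values.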

PVA

Input and output dimensions must be between 65x33 and 3264x2448.
Minimum convolution kernel size is 2x2.
Maximum convolution kernel size is 11x11.
Kernel weights are restricted to \(|weight| < 1\).
Only VPI_BOUNDARY_COND_ZERO is accepted.

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	0.652 ms	0.0651 ms	1.005 ms
1920x1080	u8	5x5	1.029 ms	0.0757 ms	1.310 ms
1920x1080	u8	7x7	1.45 ms	0.1193 ms	1.842 ms
1920x1080	u8	11x11	3.59 ms	0.2504 ms	3.341 ms
1920x1080	u16	3x3	0.84 ms	0.1064 ms	1.111 ms
1920x1080	u16	5x5	1.10 ms	0.1264 ms	1.592 ms
1920x1080	u16	7x7	1.52 ms	0.1915 ms	2.431 ms
1920x1080	u16	11x11	3.42 ms	0.4168 ms	4.732 ms

Jetson TX2
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	1.7 ms	0.260 ms	n/a
1920x1080	u8	5x5	2.6 ms	0.306 ms	n/a
1920x1080	u8	7x7	4.06 ms	0.484 ms	n/a
1920x1080	u8	11x11	11.78 ms	0.8022 ms	n/a
1920x1080	u16	3x3	1.72 ms	0.387 ms	n/a
1920x1080	u16	5x5	2.9 ms	0.390 ms	n/a
1920x1080	u16	7x7	4.2 ms	0.578 ms	n/a
1920x1080	u16	11x11	11.59 ms	0.9992 ms	n/a

Jetson Nano
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	3.305 ms	0.6744 ms	n/a
1920x1080	u8	5x5	5.64 ms	0.8808 ms	n/a
1920x1080	u8	7x7	8.809 ms	1.3282 ms	n/a
1920x1080	u8	11x11	25.60 ms	2.2274 ms	n/a
1920x1080	u16	3x3	4.04 ms	0.973 ms	n/a
1920x1080	u16	5x5	5.989 ms	1.0316 ms	n/a
1920x1080	u16	7x7	9.15 ms	1.568 ms	n/a
1920x1080	u16	11x11	25.91 ms	2.8112 ms	n/a

VPI - Vision Programming Interface