Overview

The Separable Image Convolver algorithm performs a 2D convolution, taking advantage of the fact that the 2D kernel is separable: the user passes one horizontal and one vertical 1D kernel instead of a full 2D kernel. This usually leads to better performance, especially for kernels larger than 5x5. For smaller kernels, it's preferable to use the Image Convolver algorithm with a 2D kernel directly.
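To make the separability argument concrete, the sketch below builds the 2D kernel that a pair of 1D kernels stands for (their outer product) and counts the multiplications per output pixel for both approaches. The function names are hypothetical helpers written for this illustration; they are not part of the VPI API, and the multiplication count is only a rough proxy for real performance.

```c
#include <stddef.h>

/* Hypothetical helper: build the equivalent 2D kernel as the outer product
 * col[r] * row[c]. out must hold n_col * n_row floats, stored row-major. */
static void outer_product_kernel(const float *col, size_t n_col,
                                 const float *row, size_t n_row,
                                 float *out)
{
    for (size_t r = 0; r < n_col; ++r)
        for (size_t c = 0; c < n_row; ++c)
            out[r * n_row + c] = col[r] * row[c];
}

/* Multiplications per output pixel with a full 2D kernel. */
static size_t mults_full_2d(size_t kw, size_t kh)
{
    return kw * kh;
}

/* Multiplications per output pixel with two 1D passes. */
static size_t mults_separable(size_t kw, size_t kh)
{
    return kw + kh;
}
```

For a 7x7 kernel this gives 49 multiplications per pixel for the full 2D kernel versus 14 for the two 1D passes, while for 3x3 the gap is only 9 versus 6.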

Input	Sobel kernel	Output
	\begin{eqnarray} k_{col} &=& \frac{1}{64} \begin{bmatrix} 1 \\ 6 \\ 15 \\ 20 \\ 15 \\ 6 \\ 1 \end{bmatrix} \\ k_{row} &=& \begin{bmatrix} -1 & -5 & -6 & 0 & 6 & 5 & 1 \end{bmatrix} \end{eqnarray}

Implementation

The discrete 2D convolution is implemented as two successive 1D convolutions, first along the rows and then along the columns:

\begin{eqnarray*} I'[x,y] &=& \sum_{m=0}^{k_w-1} K_{row}[m] \times I[x,y-(m - \lfloor k_w/2 \rfloor)] \\ I''[x,y] &=& \sum_{m=0}^{k_h-1} K_{col}[m] \times I'[x-(m - \lfloor k_h/2 \rfloor),y] \end{eqnarray*}

Where:

\(I\) is the input image.
\(I'\) is the temporary image with convolution along the rows.
\(I''\) is the final result.
\(K_{row}\) is the row convolution kernel.
\(K_{col}\) is the column convolution kernel.
\(k_w,k_h\) are the kernel's width and height, respectively.
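The two passes can be sketched in plain C as follows. This is a reference-style illustration only, not VPI's implementation. It assumes that the first image index in the formula selects the row and the second the column, that pixels are stored as row-major floats, and that pixels outside the image read as zero. The names fetch_zero and separable_convolve are hypothetical.

```c
#include <stddef.h>

/* Fetch a pixel, returning 0 outside the image (zero boundary condition). */
static float fetch_zero(const float *img, int rows, int cols, int r, int c)
{
    if (r < 0 || r >= rows || c < 0 || c >= cols)
        return 0.0f;
    return img[(size_t)r * (size_t)cols + (size_t)c];
}

/* Two-pass separable convolution.
 * src, tmp and dst are row-major buffers of rows * cols floats.
 * k_row has kw taps and is applied along each row;
 * k_col has kh taps and is applied along each column. */
static void separable_convolve(const float *src, float *tmp, float *dst,
                               int rows, int cols,
                               const float *k_row, int kw,
                               const float *k_col, int kh)
{
    /* First pass: I'[r,c] = sum over m of K_row[m] * I[r, c - (m - kw/2)] */
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (int m = 0; m < kw; ++m)
                acc += k_row[m] * fetch_zero(src, rows, cols, r, c - (m - kw / 2));
            tmp[(size_t)r * (size_t)cols + (size_t)c] = acc;
        }
    }

    /* Second pass: I''[r,c] = sum over m of K_col[m] * I'[r - (m - kh/2), c] */
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (int m = 0; m < kh; ++m)
                acc += k_col[m] * fetch_zero(tmp, rows, cols, r - (m - kh / 2), c);
            dst[(size_t)r * (size_t)cols + (size_t)c] = acc;
        }
    }
}
```

Feeding this sketch an image containing a single 1 surrounded by zeros returns the outer product of the two kernels centered on that pixel, which is a convenient sanity check.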

Note: Most computer vision libraries expect the kernel to be reversed before calling their convolution functions. This is not the case with VPI, which implements an actual convolution, not a cross-correlation. Naturally, this is irrelevant if the kernel is symmetric.
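The difference between the two operations is easiest to see in 1D. The sketch below is an illustration with hypothetical function names, assuming zero padding and an odd kernel length. Convolution walks the kernel against the signal direction, cross-correlation walks it along the signal direction, and reversing the kernel turns one into the other.

```c
/* Read a sample, returning 0 outside [0, n). */
static float at_zero(const float *s, int n, int i)
{
    return (i < 0 || i >= n) ? 0.0f : s[i];
}

/* True convolution: the kernel index runs against the signal index. */
static void convolve_1d(const float *in, float *out, int n,
                        const float *k, int kn)
{
    for (int i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (int m = 0; m < kn; ++m)
            acc += k[m] * at_zero(in, n, i - (m - kn / 2));
        out[i] = acc;
    }
}

/* Cross-correlation: the kernel index runs with the signal index. */
static void correlate_1d(const float *in, float *out, int n,
                         const float *k, int kn)
{
    for (int i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (int m = 0; m < kn; ++m)
            acc += k[m] * at_zero(in, n, i + (m - kn / 2));
        out[i] = acc;
    }
}

/* Reverse a kernel; correlating with the result equals convolving with k
 * (for odd kn, where the center tap stays in place). */
static void reverse_kernel(const float *k, float *rev, int kn)
{
    for (int m = 0; m < kn; ++m)
        rev[m] = k[kn - 1 - m];
}
```

With the input {0, 0, 1, 0, 0} and the kernel {1, 2, 3}, convolution reproduces the kernel in order around the impulse, while cross-correlation reproduces it mirrored.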

Usage

Initialization phase
1. Include the header that defines the needed functions and structures.
  #include <vpi/algo/SeparableImageConvolver.h>
2. Define the stream on which the algorithm will be executed and the input image.
  VPIStream stream = /*...*/;
  VPIImage input = /*...*/;
3. Create the output image.
  uint32_t w, h;
  vpiImageGetSize(input, &w, &h);
  VPIImageType type;
  vpiImageGetType(input, &type);
  VPIImage output;
  vpiImageCreate(w, h, type, 0, &output);
Processing phase
1. Define the kernels to be used. In this case, a 7x7 Sobel edge detector, given as separate row and column kernels.
  float sobel_row[7] = {-1, -5, -6, 0, +6, +5, +1};
  float sobel_col[7] = {1 / 64.f, 6 / 64.f, 15 / 64.f, 20 / 64.f, 15 / 64.f, 6 / 64.f, 1 / 64.f};
2. Submit the algorithm to the stream, passing the input and output images, the row and column kernels with their sizes, and the boundary condition.
  VPI_CHECK_STATUS(
      vpiSubmitSeparableImageConvolver(stream, input, output, sobel_row, 7, sobel_col, 7, VPI_BOUNDARY_COND_ZERO));
3. Optionally, wait until the processing is done.
  vpiStreamSync(stream);

Consult the Image Convolution for a complete example.

For more details, consult the API reference.

Limitations and Constraints

Constraints for specific backends supersede the ones specified for all backends.

All Backends

Input and output images must have the same dimensions and type.
The following image types are accepted:
Minimum 1D convolution kernel size is 1, maximum is 11.
The following boundary conditions are accepted:
- VPI_BOUNDARY_COND_ZERO
- VPI_BOUNDARY_COND_CLAMP
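As an illustration of what the two boundary conditions are generally understood to mean, the sketch below reads a pixel that may lie outside the image. It assumes that the zero condition treats outside pixels as 0 and that the clamp condition repeats the nearest border pixel. The enum and function names are hypothetical stand-ins, not the VPI types.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two boundary conditions listed above. */
typedef enum { BOUNDARY_ZERO, BOUNDARY_CLAMP } boundary_t;

/* Clamp an index into the valid range [0, n - 1]. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Read pixel (r, c) from a row-major float image, resolving
 * out-of-bounds coordinates according to the boundary condition. */
static float fetch(const float *img, int rows, int cols,
                   int r, int c, boundary_t boundary)
{
    int inside = (r >= 0 && r < rows && c >= 0 && c < cols);

    if (!inside && boundary == BOUNDARY_ZERO)
        return 0.0f;                 /* outside pixels read as zero */

    r = clamp_index(r, rows);        /* no-op when already inside */
    c = clamp_index(c, cols);
    return img[(size_t)r * (size_t)cols + (size_t)c];
}
```

The choice matters mostly near the image border: with a smoothing kernel, zero padding darkens the edges, whereas clamping keeps them close to the border values.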

PVA

Input and output dimensions must be between 160x92 and 3264x2448.
Minimum 1D convolution kernel size is 2, maximum is 11.
Horizontal and vertical kernel sizes must be equal, i.e., only square kernels can be used.
Kernel weights are restricted to \(|weight| < 1\).
The following image types are accepted:
- VPI_IMAGE_TYPE_S16
The following boundary conditions are accepted:
- VPI_BOUNDARY_COND_ZERO
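The numeric PVA constraints above can be checked before submitting work, as in the sketch below. The helper names are hypothetical and not part of VPI. The sketch reads the dimension limits as width in [160, 3264] and height in [92, 2448], which is an assumption about how the range is meant, and it leaves out the image type and boundary condition checks.

```c
#include <stdbool.h>

/* Hypothetical check of one 1D kernel: size in [2, 11] and |weight| < 1. */
static bool pva_kernel_ok(const float *k, int n)
{
    if (n < 2 || n > 11)
        return false;
    for (int i = 0; i < n; ++i) {
        /* Written this way so that NaN weights are rejected too. */
        if (!(k[i] > -1.0f && k[i] < 1.0f))
            return false;
    }
    return true;
}

/* Hypothetical check of the numeric PVA constraints listed above. */
static bool pva_constraints_ok(int width, int height,
                               const float *k_row, int kw,
                               const float *k_col, int kh)
{
    if (width < 160 || width > 3264 || height < 92 || height > 2448)
        return false;
    if (kw != kh)                    /* only square kernels are allowed */
        return false;
    return pva_kernel_ok(k_row, kw) && pva_kernel_ok(k_col, kh);
}
```

Under this reading, a row kernel such as {-1, -5, -6, 0, 6, 5, 1} fails the weight check because several of its weights are 1 or larger in magnitude, so it would need to be rescaled before being used with this backend.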

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	0.362 ms	0.0652 ms	n/a
1920x1080	u8	5x5	0.442 ms	0.0689 ms	n/a
1920x1080	u8	7x7	0.98 ms	0.0875 ms	n/a
1920x1080	u8	11x11	1.283 ms	0.0988 ms	n/a
1920x1080	s16	3x3	0.456 ms	0.1062 ms	3.267 ms
1920x1080	s16	5x5	0.59 ms	0.1154 ms	3.915 ms
1920x1080	s16	7x7	1.12 ms	0.1342 ms	3.864 ms
1920x1080	s16	11x11	1.30 ms	0.1589 ms	4.791 ms

Jetson TX2
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	1.44 ms	0.260 ms	n/a
1920x1080	u8	5x5	1.74 ms	0.289 ms	n/a
1920x1080	u8	7x7	2.17 ms	0.396 ms	n/a
1920x1080	u8	11x11	3.00 ms	0.472 ms	n/a
1920x1080	s16	3x3	2.0 ms	0.392 ms	n/a
1920x1080	s16	5x5	2.09 ms	0.429 ms	n/a
1920x1080	s16	7x7	2.61 ms	0.594 ms	n/a
1920x1080	s16	11x11	3.42 ms	0.692 ms	n/a

Jetson Nano
size	type	kernel	CPU	CUDA	PVA
1920x1080	u8	3x3	3.06 ms	0.671 ms	n/a
1920x1080	u8	5x5	3.83 ms	0.7470 ms	n/a
1920x1080	u8	7x7	4.70 ms	1.027 ms	n/a
1920x1080	u8	11x11	6.85 ms	1.239 ms	n/a
1920x1080	s16	3x3	3.60 ms	1.001 ms	n/a
1920x1080	s16	5x5	4.242 ms	1.051 ms	n/a
1920x1080	s16	7x7	5.51 ms	1.426 ms	n/a
1920x1080	s16	11x11	7.36 ms	1.692 ms	n/a

VPI - Vision Programming Interface