Overview

The Separable Convolution algorithm performs a 2D convolution operation, but takes advantage of the fact that the 2D kernel is separable. The user passes one horizontal and one vertical 1D kernel. This usually leads to better performance, especially for kernels larger than 5x5. For smaller kernels, it's preferable to use the Convolution algorithm with a 2D kernel directly.

Example (input and output images omitted), using the following Sobel kernel:
\begin{eqnarray} k_{col} &=& \frac{1}{64} \begin{bmatrix} 1 \\ 6 \\ 15 \\ 20 \\ 15 \\ 6 \\ 1 \end{bmatrix} \\ k_{row} &=& \begin{bmatrix} -1 & -5 & -6 & 0 & 6 & 5 & 1 \end{bmatrix} \end{eqnarray}
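
A kernel is separable when it equals the outer product of a column vector and a row vector. The following C sketch is illustrative only and is not part of VPI; the names `outer_product`, `mults_2d` and `mults_separable` are hypothetical. It expands two 1D kernels into the equivalent 2D kernel and counts multiplications per output pixel for each approach, which is where the performance benefit comes from.

```c
#include <stddef.h>

/* Expand a separable kernel into its equivalent 2D form:
 * K[r][c] = col[r] * row[c]. `out` must hold kh * kw floats (row-major). */
static void outer_product(const float *col, size_t kh,
                          const float *row, size_t kw, float *out)
{
    for (size_t r = 0; r < kh; ++r)
        for (size_t c = 0; c < kw; ++c)
            out[r * kw + c] = col[r] * row[c];
}

/* Multiplications per output pixel for a direct 2D convolution. */
static size_t mults_2d(size_t kw, size_t kh) { return kw * kh; }

/* Multiplications per output pixel for two successive 1D passes. */
static size_t mults_separable(size_t kw, size_t kh) { return kw + kh; }
```

For the 7x7 Sobel kernel above, the direct 2D form needs 49 multiplications per pixel, while the two 1D passes need 14.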

Implementation

The separable 2D convolution is implemented as two successive discrete 1D convolutions, first along the rows and then along the columns:

\begin{eqnarray*} I'[x,y] &=& \sum_{m=0}^{k_w-1} K_{row}[m] \times I[x,y-(m - \lfloor k_w/2 \rfloor)] \\ I''[x,y] &=& \sum_{m=0}^{k_h-1} K_{col}[m] \times I'[x-(m - \lfloor k_h/2 \rfloor),y] \end{eqnarray*}

Where:

\(I\) is the input image.
\(I'\) is the temporary image with convolution along the rows.
\(I''\) is the final result.
\(K_{row}\) is the row convolution kernel.
\(K_{col}\) is the column convolution kernel.
\(k_w,k_h\) are the kernel's width and height, respectively.
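
The two passes can be sketched as a plain CPU reference in C. This is not VPI's implementation, only a direct transcription of the formulas under stated assumptions: the first index `x` is read as the row and the second index `y` as the column, each kernel has `kw` (or `kh`) entries indexed from 0, images are row-major `float` buffers, and pixels outside the image are treated as zero. The names `at_zero` and `separable_conv_ref` are hypothetical.

```c
#include <stddef.h>

/* Read img[x][y] (x = row, y = column), returning 0 outside the image.
 * This mimics a zero boundary condition. */
static float at_zero(const float *img, int rows, int cols, int x, int y)
{
    if (x < 0 || x >= rows || y < 0 || y >= cols)
        return 0.0f;
    return img[(size_t)x * cols + y];
}

/* Two-pass separable convolution following the formulas above.
 * in, tmp and out are row-major buffers of rows * cols floats. */
static void separable_conv_ref(const float *in, float *tmp, float *out,
                               int rows, int cols,
                               const float *k_row, int kw,
                               const float *k_col, int kh)
{
    /* Pass 1: I'[x,y] = sum_m K_row[m] * I[x, y - (m - kw/2)] */
    for (int x = 0; x < rows; ++x)
        for (int y = 0; y < cols; ++y) {
            float acc = 0.0f;
            for (int m = 0; m < kw; ++m)
                acc += k_row[m] * at_zero(in, rows, cols, x, y - (m - kw / 2));
            tmp[(size_t)x * cols + y] = acc;
        }

    /* Pass 2: I''[x,y] = sum_m K_col[m] * I'[x - (m - kh/2), y] */
    for (int x = 0; x < rows; ++x)
        for (int y = 0; y < cols; ++y) {
            float acc = 0.0f;
            for (int m = 0; m < kh; ++m)
                acc += k_col[m] * at_zero(tmp, rows, cols, x - (m - kh / 2), y);
            out[(size_t)x * cols + y] = acc;
        }
}
```

Feeding this sketch an image that is zero everywhere except for a single 1 at the center yields the outer product of the two kernels, unflipped, centered on that pixel.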

Note: Most computer vision libraries expect the kernel to be reversed before calling their convolution functions. This is not the case with VPI, which implements an actual convolution, not a cross-correlation. Naturally, this is irrelevant if the kernel is symmetric.
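
The difference only shows up with asymmetric kernels. The 1D C sketch below is illustrative and not VPI code; `conv1d_at`, `corr1d_at` and `reverse_kernel` are hypothetical helpers, and samples outside the signal are assumed to be zero. For odd kernel sizes, it shows that cross-correlating with the reversed kernel gives the same result as convolving with the original one.

```c
/* 1D convolution at index i, zero outside the signal:
 * sum over m of k[m] * s[i - (m - n/2)]. */
static float conv1d_at(const float *s, int len, const float *k, int n, int i)
{
    float acc = 0.0f;
    for (int m = 0; m < n; ++m) {
        int j = i - (m - n / 2);
        if (j >= 0 && j < len)
            acc += k[m] * s[j];
    }
    return acc;
}

/* 1D cross-correlation at index i, zero outside the signal:
 * sum over m of k[m] * s[i + (m - n/2)]. */
static float corr1d_at(const float *s, int len, const float *k, int n, int i)
{
    float acc = 0.0f;
    for (int m = 0; m < n; ++m) {
        int j = i + (m - n / 2);
        if (j >= 0 && j < len)
            acc += k[m] * s[j];
    }
    return acc;
}

/* Write the reversed kernel into out (n entries). */
static void reverse_kernel(const float *k, int n, float *out)
{
    for (int m = 0; m < n; ++m)
        out[m] = k[n - 1 - m];
}
```

With the signal {1, 2, 4, 8, 16} and the kernel {-1, 0, 1}, convolution at the middle sample gives -6, while cross-correlation gives +6. Reversing the kernel before cross-correlating restores -6.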

Usage

Initialization phase
1. Include the header that defines the needed functions and structures.
  #include <vpi/algo/Convolution.h>
2. Define the input image object.
  VPIImage input = /*...*/;
3. Create the output image. It gets its dimensions and format from the input image.
  uint32_t w, h;
  vpiImageGetSize(input, &w, &h);

  VPIImageFormat type;
  vpiImageGetType(input, &type);

  VPIImage output;
  vpiImageCreate(w, h, type, 0, &output);
4. Create the stream where the algorithm will be submitted for execution.
  VPIStream stream;
  vpiStreamCreate(0, &stream);
Processing phase
1. Define the kernel to be used. In this case, a simple 7x7 Sobel filter.
  float sobel_row[7] = {-1, -5, -6, 0, +6, +5, +1};
  float sobel_col[7] = {1/64.f, 6/64.f, 15/64.f, 20/64.f, 15/64.f, 6/64.f, 1/64.f};
2. Submit the algorithm to the stream, passing the 1D kernels and remaining arguments. It'll be executed by the CUDA backend.
  vpiSubmitSeparableConvolution(stream, VPI_BACKEND_CUDA, input, output, sobel_row, 7, sobel_col, 7, VPI_BOUNDARY_COND_ZERO);
3. Optionally, wait until the processing is done.
  vpiStreamSync(stream);
Cleanup phase
1. Free resources held by the stream and the input and output images.
  vpiStreamDestroy(stream);
  vpiImageDestroy(input);
  vpiImageDestroy(output);

For more details, consult the Convolution API reference.

Limitations and Constraints

Constraints for specific backends supersede the ones specified for all backends.

All Backends

Input and output images must have the same dimensions and type.
The following image formats are accepted:
Minimum 1D convolution kernel size is 1, maximum is 11.
The following boundary conditions are accepted:
- VPI_BOUNDARY_COND_ZERO
- VPI_BOUNDARY_COND_CLAMP
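
The two boundary conditions differ only in what a kernel tap reads when it falls outside the image. The 1D C sketch below illustrates this under an assumption taken from the usual meaning of these names: the zero condition treats outside pixels as 0, and the clamp condition repeats the nearest border pixel. The enum `BoundarySketch` and the function `sample_1d` are hypothetical and are not VPI types or functions.

```c
/* Hypothetical enum for illustration only; not the VPI boundary type. */
typedef enum { BOUNDARY_ZERO, BOUNDARY_CLAMP } BoundarySketch;

/* Read sample i from a signal of `len` entries, applying the chosen
 * boundary rule when i lies outside [0, len). */
static float sample_1d(const float *s, int len, int i, BoundarySketch b)
{
    if (i >= 0 && i < len)
        return s[i];           /* inside the signal: plain read */
    if (b == BOUNDARY_ZERO)
        return 0.0f;           /* outside: assumed to be zero */
    return s[i < 0 ? 0 : len - 1]; /* outside: repeat nearest border sample */
}
```

For the signal {5, 6, 7}, reading index -2 gives 0 with the zero rule and 5 with the clamp rule.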

PVA

Only available on Jetson Xavier devices.
Input and output dimensions must be between 160x92 and 3264x2448.
Minimum 1D convolution kernel size is 2, maximum is 11.
Horizontal and vertical kernel sizes must be equal, i.e., only square kernels can be used.
Kernel weights are restricted to \(|weight| < 1\).
The following image formats are the only ones accepted:
- VPI_IMAGE_FORMAT_S16
The following boundary conditions are accepted:
- VPI_BOUNDARY_COND_ZERO
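
The numeric PVA constraints above can be checked before submitting work. The C sketch below is a hypothetical pre-flight helper, not a VPI function. It assumes the dimension range applies to width and height independently (width in 160..3264, height in 92..2448), and it leaves out the image format and boundary condition checks, which depend on VPI types.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the image size and both 1D kernels
 * satisfy the numeric PVA constraints listed above. */
static bool pva_params_ok(int width, int height,
                          const float *k_row, int kw,
                          const float *k_col, int kh)
{
    /* Assumed reading: each dimension is range-checked on its own. */
    if (width < 160 || width > 3264 || height < 92 || height > 2448)
        return false;

    /* Only square kernels, with 1D size between 2 and 11. */
    if (kw != kh || kw < 2 || kw > 11)
        return false;

    /* Every weight must satisfy |weight| < 1. */
    for (int i = 0; i < kw; ++i) {
        if (k_row[i] <= -1.0f || k_row[i] >= 1.0f)
            return false;
        if (k_col[i] <= -1.0f || k_col[i] >= 1.0f)
            return false;
    }
    return true;
}
```

Under these rules, the `sobel_row` kernel from the Usage section would be rejected as written, since its weights are not below 1 in magnitude.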

VIC

Not implemented.

Performance

For information on how to use the performance table below, see Algorithm Performance Tables.
Before comparing measurements, consult Comparing Algorithm Elapsed Times.
For further information on how performance was benchmarked, see Performance Measurement.

VPI - Vision Programming Interface

0.4.4 Release