The Separable Image Convolver algorithm performs a 2D convolution, taking advantage of the fact that the 2D kernel is separable. Instead of a full 2D kernel, the user passes one horizontal and one vertical 1D kernel, so each output pixel needs roughly k_w + k_h multiplications rather than k_w x k_h. This usually leads to better performance, especially for kernels larger than 5x5. For smaller kernels, it is preferable to use the Image Convolver algorithm with a 2D kernel directly.
Input | Sobel kernel | Output |
---|---|---|
![]() | \begin{eqnarray*} k_{col} &=& \frac{1}{64} \begin{bmatrix} 1 \\ 6 \\ 15 \\ 20 \\ 15 \\ 6 \\ 1 \end{bmatrix} \\ k_{row} &=& \begin{bmatrix} -1 & -5 & -6 & 0 & 6 & 5 & 1 \end{bmatrix} \end{eqnarray*} | ![]() |
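To make the separability claim concrete, the following NumPy sketch rebuilds the full 2D kernel from the two 1D kernels in the table above and compares the number of kernel taps per output pixel. It is only an illustration; the variable names are chosen here and are not part of the library's API.

```python
import numpy as np

# 1D kernels taken from the table above.
k_col = np.array([1, 6, 15, 20, 15, 6, 1], dtype=np.float64) / 64.0  # vertical (smoothing)
k_row = np.array([-1, -5, -6, 0, 6, 5, 1], dtype=np.float64)         # horizontal (derivative)

# A separable 2D kernel is the outer product of its column and row kernels.
kernel_2d = np.outer(k_col, k_row)

# Multiplications per output pixel: full 2D kernel vs. two 1D passes.
taps_2d = kernel_2d.size                   # k_h * k_w
taps_separable = k_col.size + k_row.size   # k_h + k_w

# A separable kernel has rank 1 by construction.
rank = np.linalg.matrix_rank(kernel_2d)

print(kernel_2d.shape, taps_2d, taps_separable, rank)
```

For this 7x7 kernel the two 1D passes use 14 taps per pixel instead of 49, which is why the gap widens as the kernel grows.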
The separable 2D convolution is implemented as two 1D passes, one along the rows and one along the columns, using the following discrete functions:
\begin{eqnarray*} I'[x,y] &=& \sum_{m=0}^{k_w-1} K_{row}[m] \times I[x,y-(m - \lfloor k_w/2 \rfloor)] \\ I''[x,y] &=& \sum_{m=0}^{k_h-1} K_{col}[m] \times I'[x-(m - \lfloor k_h/2 \rfloor),y] \end{eqnarray*}
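Where:
- \(I\) is the input image.
- \(I'\) is the temporary image holding the result of the convolution along the rows.
- \(I''\) is the final result.
- \(K_{row}\) is the row convolution kernel.
- \(K_{col}\) is the column convolution kernel.
- \(k_w\) and \(k_h\) are the kernel's width and height, respectively.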
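The two passes can be sketched in NumPy as follows. This is a minimal reference sketch, not the library's implementation: it assumes a zero border outside the image, uses floating-point arithmetic throughout, and treats the first array index as x and the second as y, exactly as the formulas are written. The helper names `shifted`, `separable_convolve`, and `direct_convolve` are hypothetical. The sums run over the kernel taps m = 0 .. k_w-1 and m = 0 .. k_h-1.

```python
import numpy as np

def shifted(img, shift, axis):
    """Return out with out[i] = img[i - shift] along `axis`, zero outside the image."""
    out = np.zeros_like(img)
    n = img.shape[axis]
    if abs(shift) >= n:
        return out
    src = [slice(None)] * img.ndim
    dst = [slice(None)] * img.ndim
    if shift >= 0:
        src[axis], dst[axis] = slice(0, n - shift), slice(shift, n)
    else:
        src[axis], dst[axis] = slice(-shift, n), slice(0, n + shift)
    out[tuple(dst)] = img[tuple(src)]
    return out

def separable_convolve(image, k_row, k_col):
    """Two 1D passes: rows first (I'), then columns (I''). Zero border assumed."""
    img = np.asarray(image, dtype=np.float64)
    k_row = np.asarray(k_row, dtype=np.float64)
    k_col = np.asarray(k_col, dtype=np.float64)
    kw, kh = k_row.size, k_col.size

    # I'[x, y] = sum_m K_row[m] * I[x, y - (m - kw // 2)]
    tmp = np.zeros_like(img)
    for m in range(kw):
        tmp += k_row[m] * shifted(img, m - kw // 2, axis=1)

    # I''[x, y] = sum_m K_col[m] * I'[x - (m - kh // 2), y]
    out = np.zeros_like(img)
    for m in range(kh):
        out += k_col[m] * shifted(tmp, m - kh // 2, axis=0)
    return out

def direct_convolve(image, kernel):
    """Full 2D convolution with the same indexing and border, for comparison."""
    img = np.asarray(image, dtype=np.float64)
    kh, kw = kernel.shape
    out = np.zeros_like(img)
    for i in range(kh):
        rows = shifted(img, i - kh // 2, axis=0)
        for j in range(kw):
            out += kernel[i, j] * shifted(rows, j - kw // 2, axis=1)
    return out

# Check that both approaches agree, using the kernels from the table above.
k_col = np.array([1, 6, 15, 20, 15, 6, 1], dtype=np.float64) / 64.0
k_row = np.array([-1, -5, -6, 0, 6, 5, 1], dtype=np.float64)
image = np.random.default_rng(0).random((32, 48))

two_pass = separable_convolve(image, k_row, k_col)
full_2d = direct_convolve(image, np.outer(k_col, k_row))
print(np.allclose(two_pass, full_2d))
```

With a zero border the two results match up to floating-point rounding. Other border modes and integer image types may need extra care near the image edges and when rounding the intermediate image.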
Consult the Image Convolution sample for a complete example.
For more details, consult the API reference.
Constraints for specific backends supersede the ones specified for all backends.
For further information on how performance was benchmarked, see Performance Measurement.
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 0.362 ms | 0.0652 ms | n/a |
1920x1080 | u8 | 5x5 | 0.442 ms | 0.0689 ms | n/a |
1920x1080 | u8 | 7x7 | 0.98 ms | 0.0875 ms | n/a |
1920x1080 | u8 | 11x11 | 1.283 ms | 0.0988 ms | n/a |
1920x1080 | s16 | 3x3 | 0.456 ms | 0.1062 ms | 3.267 ms |
1920x1080 | s16 | 5x5 | 0.59 ms | 0.1154 ms | 3.915 ms |
1920x1080 | s16 | 7x7 | 1.12 ms | 0.1342 ms | 3.864 ms |
1920x1080 | s16 | 11x11 | 1.30 ms | 0.1589 ms | 4.791 ms |
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 1.44 ms | 0.260 ms | n/a |
1920x1080 | u8 | 5x5 | 1.74 ms | 0.289 ms | n/a |
1920x1080 | u8 | 7x7 | 2.17 ms | 0.396 ms | n/a |
1920x1080 | u8 | 11x11 | 3.00 ms | 0.472 ms | n/a |
1920x1080 | s16 | 3x3 | 2.0 ms | 0.392 ms | n/a |
1920x1080 | s16 | 5x5 | 2.09 ms | 0.429 ms | n/a |
1920x1080 | s16 | 7x7 | 2.61 ms | 0.594 ms | n/a |
1920x1080 | s16 | 11x11 | 3.42 ms | 0.692 ms | n/a |
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 3.06 ms | 0.671 ms | n/a |
1920x1080 | u8 | 5x5 | 3.83 ms | 0.7470 ms | n/a |
1920x1080 | u8 | 7x7 | 4.70 ms | 1.027 ms | n/a |
1920x1080 | u8 | 11x11 | 6.85 ms | 1.239 ms | n/a |
1920x1080 | s16 | 3x3 | 3.60 ms | 1.001 ms | n/a |
1920x1080 | s16 | 5x5 | 4.242 ms | 1.051 ms | n/a |
1920x1080 | s16 | 7x7 | 5.51 ms | 1.426 ms | n/a |
1920x1080 | s16 | 11x11 | 7.36 ms | 1.692 ms | n/a |