The Separable Image Convolver algorithm performs a 2D convolution, taking advantage of the fact that the 2D kernel is separable. The user passes one horizontal and one vertical 1D kernel, so each output pixel costs k_w + k_h multiplications instead of k_w × k_h. This usually leads to better performance, especially for kernels larger than 5x5. For smaller kernels, it's preferable to use the Image Convolver algorithm with a 2D kernel directly.
Input | Sobel kernel | Output |
---|---|---|
![]() | \begin{eqnarray*} k_{col} &=& \frac{1}{64} \begin{bmatrix} 1 \\ 6 \\ 15 \\ 20 \\ 15 \\ 6 \\ 1 \end{bmatrix} \\ k_{row} &=& \begin{bmatrix} -1 & -5 & -6 & 0 & 6 & 5 & 1 \end{bmatrix} \end{eqnarray*} | ![]() |
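To make the separability concrete, the following minimal sketch builds the equivalent 2D kernel as the outer product of the two 1D kernels from the table above and compares the per-pixel multiplication counts. The variable names are illustrative only and are not part of any library API.

```python
from fractions import Fraction

# 1D kernels from the table above; k_col carries the 1/64 normalization.
k_col = [Fraction(c, 64) for c in (1, 6, 15, 20, 15, 6, 1)]
k_row = [-1, -5, -6, 0, 6, 5, 1]

# Equivalent 2D kernel: K[i][j] = k_col[i] * k_row[j] (outer product).
kernel_2d = [[c * r for r in k_row] for c in k_col]

# Multiplications needed per output pixel.
mults_2d = len(k_col) * len(k_row)         # direct 2D convolution: 7 * 7
mults_separable = len(k_col) + len(k_row)  # two 1D passes: 7 + 7
```

For this 7x7 kernel the direct approach needs 49 multiplications per pixel and the separable one 14, and the gap widens as the kernel grows.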
The discrete 2D convolution is implemented as two 1D passes, one per kernel:
\begin{eqnarray*} I'[x,y] &=& \sum_{m=0}^{k_w-1} K_{row}[m] \times I[x,y-(m - \lfloor k_w/2 \rfloor)] \\ I''[x,y] &=& \sum_{m=0}^{k_h-1} K_{col}[m] \times I'[x-(m - \lfloor k_h/2 \rfloor),y] \end{eqnarray*}
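Where I is the input image, I' is the intermediate image after the pass with the row kernel, I'' is the final result, K_row and K_col are the row and column kernels, and k_w and k_h are the kernel's width and height, respectively.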
Consult Image Convolution for a complete example.
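The two-pass formulation given by the equations above can be sketched in plain Python as follows. This is an illustrative reference, not the library's implementation: the function name `convolve_separable` is hypothetical, the image is indexed as `image[x][y]` exactly as in the equations, and samples outside the image are assumed to be zero, since the equations do not specify a border policy.

```python
def convolve_separable(image, k_row, k_col):
    """Two-pass separable convolution following the equations above.

    image is a list of lists indexed as image[x][y]. Samples outside the
    image are treated as zero (an assumption made for this sketch).
    """
    n_x, n_y = len(image), len(image[0])
    half_w, half_h = len(k_row) // 2, len(k_col) // 2

    def at(img, x, y):
        # Zero border: out-of-range coordinates contribute nothing.
        return img[x][y] if 0 <= x < n_x and 0 <= y < n_y else 0

    # First pass (I'): K_row applied along the second index.
    tmp = [[sum(k * at(image, x, y - (m - half_w)) for m, k in enumerate(k_row))
            for y in range(n_y)] for x in range(n_x)]

    # Second pass (I''): K_col applied along the first index of I'.
    return [[sum(k * at(tmp, x - (m - half_h), y) for m, k in enumerate(k_col))
             for y in range(n_y)] for x in range(n_x)]


# Usage: convolving a unit impulse reproduces the outer product of the kernels.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
response = convolve_separable(impulse, [1, 2, 3], [4, 5, 6])
# response[1][1:4] == [4, 8, 12]
```

Feeding a unit impulse through the function is a convenient sanity check: the response around the impulse equals the 2D kernel that the two 1D kernels represent.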
Constraints for specific backends supersede the ones specified for all backends.
For further information on how performance was benchmarked, see Performance Measurement.
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 0.668 ms | 0.0649 ms | n/a |
1920x1080 | u8 | 5x5 | 0.831 ms | 0.0688 ms | n/a |
1920x1080 | u8 | 7x7 | 1.00 ms | 0.0878 ms | n/a |
1920x1080 | u8 | 11x11 | 1.290 ms | 0.0994 ms | n/a |
1920x1080 | s16 | 3x3 | 0.452 ms | 0.1069 ms | 3.1794 ms |
1920x1080 | s16 | 5x5 | 0.58 ms | 0.1153 ms | 3.8299 ms |
1920x1080 | s16 | 7x7 | 1.119 ms | 0.1348 ms | 3.7808 ms |
1920x1080 | s16 | 11x11 | 1.30 ms | 0.1596 ms | 4.7061 ms |
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 1.39 ms | 0.258 ms | n/a |
1920x1080 | u8 | 5x5 | 1.82 ms | 0.288 ms | n/a |
1920x1080 | u8 | 7x7 | 2.19 ms | 0.397 ms | n/a |
1920x1080 | u8 | 11x11 | 3.00 ms | 0.473 ms | n/a |
1920x1080 | s16 | 3x3 | 1.83 ms | 0.383 ms | n/a |
1920x1080 | s16 | 5x5 | 2.05 ms | 0.422 ms | n/a |
1920x1080 | s16 | 7x7 | 2.58 ms | 0.585 ms | n/a |
1920x1080 | s16 | 11x11 | 3.47 ms | 0.686 ms | n/a |
size | type | kernel | CPU | CUDA | PVA |
---|---|---|---|---|---|
1920x1080 | u8 | 3x3 | 3.14 ms | 0.670 ms | n/a |
1920x1080 | u8 | 5x5 | 3.80 ms | 0.7460 ms | n/a |
1920x1080 | u8 | 7x7 | 4.84 ms | 1.027 ms | n/a |
1920x1080 | u8 | 11x11 | 6.80 ms | 1.235 ms | n/a |
1920x1080 | s16 | 3x3 | 3.540 ms | 0.985 ms | n/a |
1920x1080 | s16 | 5x5 | 4.24 ms | 1.050 ms | n/a |
1920x1080 | s16 | 7x7 | 5.38 ms | 1.424 ms | n/a |
1920x1080 | s16 | 11x11 | 7.32 ms | 1.693 ms | n/a |