The Image Format Converter converts an image from one format into another. It handles color space, format, and depth conversions. The algorithm also supports input range conversion, for instance when an unsigned char \([0,255]\) image must be mapped into the signed short \([-32768,32767]\) range.
Figure: color input image and the corresponding grayscale output image.
The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:
Several types of conversion are available:
The grayscale (or single channel) formats available are:
The color formats available are:
The following table shows which combinations of input and output image types are available for conversion.
out \ in | U8 | S8 | U16 | S16 | F32 | NV12 | RGB8 | RGBA8 | BGR8 | BGRA8 | 2F32 |
---|---|---|---|---|---|---|---|---|---|---|---|
U8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
S8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
U16 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
S16 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
F32 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
NV12 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
RGB8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
RGBA8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
BGR8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
BGRA8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
2F32 | no | no | no | no | no | no | no | no | no | no | yes¹ |
¹ Only available when scale == 1 and offset == 0.
The following sections describe how an input value is converted into an output value. In general, these conversions combine color space, depth, and channel order (swizzle) changes, adding or removing an alpha channel, and down- or up-sampling. They are represented as conversion pipelines built from the basic processing blocks defined below.
Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:
depth | \(=\) | range | \(\rightarrow\) | round | \(\rightarrow\) | clamp/cast |
range: input is converted to floating point (fp32) and the following formula is applied:
\[ f(x) = \text{scale} \times x + \text{offset} \]
If scale==1 and offset==0, a shortcut is taken and no operation (not even the conversion to floating point) is performed.
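round: the result is rounded to the nearest integer, with halfway cases rounded away from zero, so that round(0.5) == 1.0 and round(-0.5) == -1.0.
clamp/cast: the rounded value is converted to the output type in one of two ways. With clamp, underflows and overflows are clamped to the minimum and maximum values of the output type. With cast, the value is converted as a static_cast would do. Underflows and overflows then behave as described by the C specification (including undefined behavior). Cast is meant for cases where the input range is known to fit into the output and maximum performance is needed.
When applied to multiple channels such as RGB, the operation is performed on each channel independently.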
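As an illustration of the depth block, the following minimal Python sketch chains range, round, and clamp for a single value. The helper names `round_half_away` and `depth_clamp` are hypothetical and not part of the API. The sketch assumes that clamp saturates to the output type's limits, and it uses Python's double-precision floats where the text specifies fp32.

```python
import math

def round_half_away(x: float) -> float:
    """Round to the nearest integer, halfway cases away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

def depth_clamp(x, scale, offset, lo, hi):
    """Sketch of range -> round -> clamp for an integer output in [lo, hi]."""
    if scale == 1 and offset == 0:
        y = x                           # shortcut: range step is skipped
    else:
        y = scale * float(x) + offset   # f(x) = scale * x + offset
    y = round_half_away(y)
    return int(min(max(y, lo), hi))     # saturate to the output limits

# Mapping unsigned char [0, 255] onto signed short [-32768, 32767]:
# scale = 65535 / 255 = 257 and offset = -32768.
S16_MIN, S16_MAX = -32768, 32767
print(depth_clamp(0, 257, -32768, S16_MIN, S16_MAX))    # -32768
print(depth_clamp(255, 257, -32768, S16_MIN, S16_MAX))  # 32767
```

With scale 2 and offset 128 on an 8-bit output, an input of 200 maps to 528 and is saturated to 255 under this reading of clamp.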
This is represented by the following block:
swizzle |
It permutes (or swizzles) the channel order of the input type. It is used in conversions from or to color spaces that can be represented in multiple ways, like RGB and BGR. The color space conversion functions assume a predetermined channel order, so channels must be reordered before these functions are applied.
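A swizzle can be pictured as an index permutation applied to each pixel. The sketch below is only an illustration. The function `swizzle` and the order tuples are hypothetical names, not part of the API.

```python
def swizzle(pixel, order):
    """Return a pixel whose i-th channel is pixel[order[i]]."""
    return tuple(pixel[i] for i in order)

RGB_TO_BGR = (2, 1, 0)        # swap first and third channels
RGBA_TO_BGRA = (2, 1, 0, 3)   # same swap, alpha stays in place

print(swizzle((10, 20, 30), RGB_TO_BGR))          # (30, 20, 10)
print(swizzle((10, 20, 30, 255), RGBA_TO_BGRA))   # (30, 20, 10, 255)
```

Applying the same RGB/BGR permutation twice restores the original channel order.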
For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by JPEG File Interchange Format (JFIF).
To precisely establish the conversion, let's define the following constants:
\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \\ \end{align}
For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.
The conversion blocks can be defined as:
rgb2yuv |
\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}
These functions expect \(r,g,b \in [0,255]\)
yuv2rgb |
\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}
These functions expect \(y,c_b,c_r \in [0,255]\)
The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.
The round function follows the definition given in the description of the depth block above.
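The rgb2yuv and yuv2rgb formulas can be transcribed almost directly into code. The following Python sketch does so for a single pixel, assuming round-half-away-from-zero rounding. The helper `_to_u8` is a hypothetical name, and Python floats are double precision, so results near a rounding boundary may differ from an fp32 implementation.

```python
import math

K_R, K_G, K_B = 0.299, 0.587, 0.114
K_CB, K_CR = 1.772, 1.402

def _to_u8(x):
    """Round half away from zero, then clamp to [0, 255]."""
    r = math.copysign(math.floor(abs(x) + 0.5), x)
    return int(min(max(r, 0), 255))

def rgb2yuv(r, g, b):
    y = r * K_R + g * K_G + b * K_B
    cb = (-r * K_R - g * K_G + b * (1 - K_B)) / K_CB + 128
    cr = (r * (1 - K_R) - g * K_G - b * K_B) / K_CR + 128
    return _to_u8(y), _to_u8(cb), _to_u8(cr)

def yuv2rgb(y, cb, cr):
    r = y + K_CR * (cr - 128)
    g = y - (K_B * K_CB * (cb - 128) + K_R * K_CR * (cr - 128)) / K_G
    b = y + K_CB * (cb - 128)
    return _to_u8(r), _to_u8(g), _to_u8(b)

print(rgb2yuv(255, 0, 0))      # (76, 85, 255)
print(yuv2rgb(128, 128, 128))  # (128, 128, 128)
```

Because each direction rounds and clamps to 8 bits, a round trip is not guaranteed to return the exact input. For pure red, the \(C_r\) value 255.5 saturates at 255.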
Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the same constants defined for the RGB \(\leftrightarrow\) YUV conversion.
rgb2gray |
\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]
For grayscale to RGB the conversion is simply:
gray2rgb |
\[ f(x) = (x,x,x) \]
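The two grayscale blocks are small enough to sketch in a few lines. This Python version is only an illustration of the formulas. It relies on the luma sum being non-negative for inputs in \([0,255]\), so `floor(y + 0.5)` matches round-half-away-from-zero.

```python
import math

K_R, K_G, K_B = 0.299, 0.587, 0.114

def rgb2gray(r, g, b):
    """Luma of an RGB pixel, rounded and clamped to [0, 255]."""
    y = math.floor(K_R * r + K_G * g + K_B * b + 0.5)
    return int(min(max(y, 0), 255))

def gray2rgb(x):
    """Replicate the gray value into all three channels."""
    return (x, x, x)

print(rgb2gray(255, 0, 0))   # 76
print(rgb2gray(0, 255, 0))   # 150
print(rgb2gray(0, 0, 255))   # 29
print(gray2rgb(100))         # (100, 100, 100)
```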
For image formats that include subsampled planes, like VPI_IMAGE_TYPE_NV12, the following block definitions are needed:
2x downsample |
\[ D[x,y] = S[2x,2y] \]
2x upsample |
\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]
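The two resampling definitions amount to dropping or replicating samples. The sketch below is illustrative Python operating on a plane stored as a list of rows, so `S[x, y]` in the formulas corresponds to `src[y][x]`. The function names are hypothetical.

```python
def downsample2x(src):
    """D[x, y] = S[2x, 2y]: keep every other sample in both directions."""
    return [row[::2] for row in src[::2]]

def upsample2x(src):
    """D[x, y] = S[floor(x/2), floor(y/2)]: replicate each sample 2x2."""
    out = []
    for row in src:
        wide = [v for v in row for _ in range(2)]  # duplicate horizontally
        out.append(wide)
        out.append(list(wide))                     # duplicate vertically
    return out

plane = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(downsample2x(plane))            # [[1, 3], [9, 11]]
print(upsample2x([[1, 2], [3, 4]]))   # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Downsampling an upsampled plane returns the original plane, while the reverse order discards three out of every four samples.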
Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:
alpha |
This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.
input | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | Y plane | \(\searrow\) |
(128,128) | \(\rightarrow\) | UV plane | \(\nearrow\) |
output |
input | \(\rightarrow\) | Y plane | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | gray2rgb | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | rgb2gray | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | depth | \(\rightarrow\) | rgb2yuv |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x downsample | \(\nearrow\) |
output |
input |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x upsample | \(\nearrow\) |
yuv2rgb | \(\rightarrow\) | depth | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
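To make the composition of blocks concrete, the following Python sketch implements two of the pipelines above on small images. One is the pipeline that feeds rgb2yuv into a Y plane and a 2x-downsampled UV plane. The other writes the input to the Y plane and fills the UV plane with (128,128). Reading these as conversions to a two-plane, NV12-style output is an assumption, as are all function names. The sketch also assumes RGB channel order, no alpha channel, scale == 1 and offset == 0 (so swizzle, alpha, and depth are no-ops), and even image dimensions.

```python
import math

K_R, K_G, K_B = 0.299, 0.587, 0.114
K_CB, K_CR = 1.772, 1.402

def _to_u8(x):
    """Round half away from zero, then clamp to [0, 255]."""
    r = math.copysign(math.floor(abs(x) + 0.5), x)
    return int(min(max(r, 0), 255))

def rgb2yuv(r, g, b):
    y = r * K_R + g * K_G + b * K_B
    cb = (-r * K_R - g * K_G + b * (1 - K_B)) / K_CB + 128
    cr = (r * (1 - K_R) - g * K_G - b * K_B) / K_CR + 128
    return _to_u8(y), _to_u8(cb), _to_u8(cr)

def rgb_to_two_planes(img):
    """img: rows of (r, g, b) tuples. Returns (Y plane, UV plane)."""
    yuv = [[rgb2yuv(*px) for px in row] for row in img]
    y_plane = [[p[0] for p in row] for row in yuv]
    uv_full = [[(p[1], p[2]) for p in row] for row in yuv]
    uv_plane = [row[::2] for row in uv_full[::2]]   # 2x downsample
    return y_plane, uv_plane

def gray_to_two_planes(img):
    """img: rows of gray values. UV plane is filled with (128, 128)."""
    h, w = len(img), len(img[0])
    y_plane = [list(row) for row in img]
    uv_plane = [[(128, 128)] * (w // 2) for _ in range(h // 2)]
    return y_plane, uv_plane

img = [[(255, 0, 0), (0, 0, 0)],
       [(0, 0, 0), (0, 0, 0)]]
print(rgb_to_two_planes(img))   # ([[76, 0], [0, 0]], [[(85, 255)]])
```

Note that the 2x downsample keeps only the chroma of the top-left pixel in each 2x2 block, so the three black pixels in the example do not influence the UV plane.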
For more details, consult the API reference.
For further information on how performance was benchmarked, see Performance Measurement.
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.188 ms | 0.135 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 1.38 ms | 0.1132 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.8 ms | 0.1225 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 0.24 ms | 0.1602 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 0.749 ms | 0.1168 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 0.3 ms | 0.1898 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 0.378 ms | 0.1612 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 0.677 ms | 0.1104 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 0.3518 ms | 0.193 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 0.923 ms | 0.1402 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 0.886 ms | 0.1669 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 0.758 ms | 0.1237 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 0.3 ms | 0.1931 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 0.29 ms | 0.1666 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 0.883 ms | 0.1203 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 0.684 ms | 0.1345 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 0.73 ms | 0.3035 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.029 ms | 0.1633 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 0.980 ms | 0.1330 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 0.345 ms | 0.2038 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 0.42 ms | 0.1642 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 0.611 ms | 0.1041 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.8 ms | 0.1225 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 0.221 ms | 0.1587 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 0.25 ms | 0.163 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 1.69 ms | 0.1506 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 3.6 ms | 0.1890 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 3.1 ms | 0.1792 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 3.549 ms | 0.1240 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 3.64 ms | 0.1407 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 3.93 ms | 0.1645 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 6.6 ms | 0.1453 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 0.586 ms | 0.2488 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 1.456 ms | 0.18074 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 0.324 ms | 0.2022 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 0.354 ms | 0.1582 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 4.34 ms | 0.1267 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 4.45 ms | 0.1409 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 4.84 ms | 0.1683 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 7.4 ms | 0.1447 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 0.346 ms | 0.2019 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 0.773 ms | 0.303 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 4.19 ms | 0.1714 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 0.4 ms | 0.1683 ms | n/a |
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.585 ms | 1.769 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 2.5 ms | 0.445 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.47 ms | 0.424 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 0.64 ms | 0.498 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 0.869 ms | 0.430 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 0.50 ms | 0.555 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 0.708 ms | 0.496 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 1.17 ms | 0.428 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 1.089 ms | 0.623 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 1.67 ms | 0.489 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 1.032 ms | 0.521 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 1.243 ms | 0.450 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 1.14 ms | 0.572 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 1.15 ms | 0.521 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 1.83 ms | 0.461 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 1.049 ms | 0.491 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 2.083 ms | 1.116 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.35 ms | 0.549 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 1.90 ms | 0.491 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 1.31 ms | 0.61 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 1.41 ms | 0.56 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 0.80 ms | 0.400 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.448 ms | 0.428 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 0.637 ms | 0.499 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 0.981 ms | 2.575 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 3.01 ms | 0.577 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 8.70 ms | 0.661 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 8.7 ms | 0.64 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 11.7 ms | 0.510 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 11.9 ms | 0.534 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 12.7 ms | 0.60 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 13.3 ms | 0.65 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 1.577 ms | 2.319 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 2.91 ms | 0.67 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 1.22 ms | 0.599 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 1.33 ms | 0.57 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 11.8 ms | 0.497 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 11.9 ms | 0.523 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 13.62 ms | 0.593 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 13.6 ms | 0.588 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 1.19 ms | 0.60 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 2.091 ms | 1.126 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 15.56 ms | 0.69 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 1.32 ms | 0.561 ms | n/a |
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.58 ms | 2.429 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 5.511 ms | 1.122 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.815 ms | 1.023 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 1.170 ms | 1.115 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 1.744 ms | 1.046 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 0.949 ms | 1.307 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 1.116 ms | 1.134 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 2.69 ms | 1.030 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 1.1713 ms | 0.7099 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 4.26 ms | 1.195 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 2.02 ms | 1.175 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 2.78 ms | 1.089 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 2.503 ms | 1.335 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 2.59 ms | 1.184 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 3.16 ms | 1.096 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 1.755 ms | 1.146 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 2.341 ms | 1.2967 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.901 ms | 1.250 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 3.27 ms | 1.151 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 2.079 ms | 1.407 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 2.134 ms | 1.279 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 1.628 ms | 0.989 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.825 ms | 1.022 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 1.167 ms | 1.116 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 1.028 ms | 3.439 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 6.85 ms | 1.457 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 15.17 ms | 1.826 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 14.96 ms | 1.765 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 24.44 ms | 1.334 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 24.72 ms | 1.379 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 25.34 ms | 1.448 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 25.54 ms | 1.7445 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 1.7994 ms | 3.057 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 6.0 ms | 1.738 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 2.67 ms | 1.474 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 2.70 ms | 1.337 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 24.66 ms | 1.213 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 25.04 ms | 1.275 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 25.65 ms | 1.378 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 25.69 ms | 1.577 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 1.826 ms | 1.382 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 2.321 ms | 1.3002 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 29.55 ms | 1.701 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 1.920 ms | 1.294 ms | n/a |