The Image Format Converter converts an image from one format into another. It handles color space, format and depth conversions. The algorithm also supports input range conversion, for instance when an unsigned char \([0,255]\) image must be mapped into the signed short \([-32768,32767]\) range.
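As a worked example of this range mapping, and assuming the linear form \(f(x) = \text{scale} \times x + \text{offset}\) used by the range step described further down, the two parameters follow from matching the endpoints of both intervals:
\[ \text{scale} = \frac{32767 - (-32768)}{255 - 0} = \frac{65535}{255} = 257, \qquad \text{offset} = -32768 \]
With these values, \(f(0) = -32768\) and \(f(255) = 257 \times 255 - 32768 = 32767\), so the full unsigned char range maps exactly onto the full signed short range.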
[Figure: color input image (left) and its grayscale output (right)]
The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:
Several types of conversion are available:
The grayscale (or single channel) formats available are:
The color formats available are:
The following table shows which combinations of input and output image types are available for conversion.
output ↓ / input → | U8 | S8 | U16 | S16 | F32 | NV12 | RGB8 | RGBA8 | BGR8 | BGRA8 | 2F32 |
---|---|---|---|---|---|---|---|---|---|---|---|
U8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
S8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
U16 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
S16 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
F32 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
NV12 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
RGB8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
RGBA8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
BGR8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
BGRA8 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
2F32 | no | no | no | no | no | no | no | no | no | no | yes¹ |

¹ Only available when scale == 1 and offset == 0.
The following sections describe how an input value is converted into an output value. In general, these conversions combine color space, depth, channel order (swizzle), alpha channel addition or removal, and down- or up-sampling transformations. They are represented as conversion pipelines built from the basic processing blocks defined below.
Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:
depth | \(=\) | range | \(\rightarrow\) | round | \(\rightarrow\) | clamp/cast |
range: input is converted to floating point (fp32) and the following formula is applied:
\[ f(x) = \text{scale} \times x + \text{offset} \]
If scale==1 and offset==0, a shortcut is taken and no operation (not even conversion to floating point) is performed.
round: the value is rounded to the nearest integer, with halfway cases rounded away from zero, i.e. round(0.5) == 1.0 and round(-0.5) == -1.0.
clamp/cast: the value is converted to the output type in one of two ways. With clamp, underflows and overflows are clamped to the limits of the output type. With cast, the value is converted as a C static_cast would do. Underflows and overflows will behave as described by the C specification (including undefined behavior). Cast is used when it's known that the input range fits into the output and maximum performance is needed.
When applied to multiple channels such as RGB, the operation is performed on each channel independently.
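The depth sub-pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VPI's implementation: the names `LIMITS`, `round_half_away` and `depth` are hypothetical, Python floats are 64-bit rather than fp32, and only the clamp variant is modeled because the cast variant may invoke undefined behavior on overflow.

```python
import math

# Hypothetical table of output-type limits, for illustration only.
LIMITS = {
    "u8": (0, 255),
    "s8": (-128, 127),
    "u16": (0, 65535),
    "s16": (-32768, 32767),
}


def round_half_away(x):
    """Round to the nearest integer, halfway cases away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def depth(x, out_type, scale=1.0, offset=0.0):
    """Apply range -> round -> clamp to a single channel value."""
    if scale == 1 and offset == 0:
        y = x  # shortcut: the range step is skipped entirely
    else:
        y = scale * float(x) + offset  # range step
    y = round_half_away(y)  # round step
    lo, hi = LIMITS[out_type]
    return int(min(max(y, lo), hi))  # clamp step
```

For example, `depth(255, "s16", scale=257, offset=-32768)` maps the largest unsigned char value to 32767, while `depth(200, "u8", scale=2, offset=128)` overflows to 528 and is clamped to 255.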
This is represented by the following block:
swizzle |
This block permutes (or swizzles) the channel order of the input type. It is used in conversions from/to color spaces that can be represented in multiple ways, like RGB and BGR. The color space conversion functions assume a pre-determined channel order, so channels must be reordered before those functions are applied.
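A swizzle is a fixed permutation of channel indices. The sketch below is a minimal illustration; the function name `swizzle` and the two permutation constants are hypothetical and not part of the VPI API.

```python
def swizzle(pixel, order):
    """Return a new pixel whose channel i is pixel[order[i]]."""
    return tuple(pixel[i] for i in order)


# RGB <-> BGR: reverse the three color channels.
RGB_TO_BGR = (2, 1, 0)
# RGBA <-> BGRA: swap the first and third channels, keep alpha in place.
RGBA_TO_BGRA = (2, 1, 0, 3)

bgr = swizzle((10, 20, 30), RGB_TO_BGR)  # (30, 20, 10)
```

Both permutations are their own inverse, so applying the same swizzle twice restores the original channel order.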
For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by JPEG File Interchange Format (JFIF).
To precisely establish the conversion, let's define the following constants:
\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \\ \end{align}
For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.
The conversion blocks can be defined as:
rgb2yuv |
\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}
These functions expect \(r,g,b \in [0,255]\)
yuv2rgb |
\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}
These functions expect \(y,c_b,c_r \in [0,255]\)
The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.
The round function follows the definition given for the round step of the depth block.
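The two conversion blocks translate directly into code. The following sketch transcribes the formulas above using the same constants. The helper names `round_half_away` and `clamp_u8` are hypothetical, and Python's 64-bit floats stand in for whatever precision the library uses internally.

```python
import math

K_R, K_G, K_B = 0.299, 0.587, 0.114
K_CB, K_CR = 1.772, 1.402


def round_half_away(x):
    """Round to the nearest integer, halfway cases away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def clamp_u8(x):
    """Round, then clamp underflows to 0 and overflows to 255."""
    return int(min(max(round_half_away(x), 0), 255))


def rgb2yuv(r, g, b):
    y = clamp_u8(r * K_R + g * K_G + b * K_B)
    cb = clamp_u8((-r * K_R - g * K_G + b * (1 - K_B)) / K_CB + 128)
    cr = clamp_u8((r * (1 - K_R) - g * K_G - b * K_B) / K_CR + 128)
    return y, cb, cr


def yuv2rgb(y, cb, cr):
    r = clamp_u8(y + K_CR * (cr - 128))
    g = clamp_u8(y - (K_B * K_CB * (cb - 128) + K_R * K_CR * (cr - 128)) / K_G)
    b = clamp_u8(y + K_CB * (cb - 128))
    return r, g, b
```

Neutral inputs behave as expected: black maps to `(0, 128, 128)` and white to `(255, 128, 128)`. For pure red, `rgb2yuv(255, 0, 0)` yields `(76, 85, 255)`: the exact \(C_r\) value is 255.5, which exceeds the range after rounding and is clamped to 255.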
Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the same constants \(K_r\), \(K_g\) and \(K_b\) defined for the RGB \(\leftrightarrow\) YUV conversion.
rgb2gray |
\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]
For grayscale to RGB the conversion is simply:
gray2rgb |
\[ f(x) = (x,x,x) \]
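A short sketch of both grayscale blocks, with hypothetical function names, shows that they are consistent with each other: because \(K_r + K_g + K_b = 1\), converting a gray value to RGB and back returns the original value.

```python
import math

K_R, K_G, K_B = 0.299, 0.587, 0.114


def rgb2gray(r, g, b):
    luma = K_R * r + K_G * g + K_B * b
    # Inputs are non-negative, so floor(x + 0.5) rounds halves away from zero.
    rounded = math.floor(luma + 0.5)
    return min(max(rounded, 0), 255)


def gray2rgb(x):
    return (x, x, x)


# Round trip: gray -> RGB -> gray leaves every 8-bit value unchanged.
round_trip_ok = all(rgb2gray(*gray2rgb(x)) == x for x in range(256))
```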
For image formats that include subsampled planes, like VPI_IMAGE_TYPE_NV12, the following block definitions are needed:
2x downsample |
\[ D[x,y] = S[2x,2y] \]
2x upsample |
\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]
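Both resampling blocks are simple index mappings with no filtering. The sketch below assumes a plane stored as a list of rows, indexed as `plane[y][x]`; the function names are hypothetical.

```python
def downsample_2x(src):
    """D[x, y] = S[2x, 2y]: keep every other sample in both directions."""
    return [row[::2] for row in src[::2]]


def upsample_2x(src):
    """D[x, y] = S[x // 2, y // 2]: replicate each sample into a 2x2 block."""
    out = []
    for row in src:
        wide = [v for v in row for _ in range(2)]  # duplicate horizontally
        out.append(wide)
        out.append(list(wide))  # duplicate vertically
    return out
```

Downsampling after upsampling returns the original plane, while the reverse order discards three out of every four samples.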
Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:
alpha |
This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.
input | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | Y plane | \(\searrow\) |
(128,128) | \(\rightarrow\) | UV plane | \(\nearrow\) |
output |
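The pipeline above appears to describe a grayscale input written to an NV12 output: the depth-converted input becomes the Y plane and the UV plane is filled with the constant (128,128). A minimal sketch follows, assuming the usual NV12 layout (full-resolution Y plane, half-resolution interleaved UV plane), even image dimensions, and an identity depth step (scale 1, offset 0). The function name is hypothetical.

```python
def gray_to_nv12(gray):
    """Return (y_plane, uv_plane) for a grayscale image given as rows."""
    height, width = len(gray), len(gray[0])
    assert height % 2 == 0 and width % 2 == 0, "sketch assumes even dimensions"
    # Y plane: the (identity) depth-converted input, copied row by row.
    y_plane = [list(row) for row in gray]
    # UV plane: half resolution in both directions, neutral chroma everywhere.
    uv_plane = [[(128, 128)] * (width // 2) for _ in range(height // 2)]
    return y_plane, uv_plane
```

The value 128 is the neutral chroma: with \(c_b = c_r = 128\), the yuv2rgb formulas reduce to \(R = G = B = y\), so the image stays gray when converted back.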
input | \(\rightarrow\) | Y plane | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | gray2rgb | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | rgb2gray | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | depth | \(\rightarrow\) | rgb2yuv |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x downsample | \(\nearrow\) |
output |
input |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x upsample | \(\nearrow\) |
yuv2rgb | \(\rightarrow\) | depth | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
For further information on how performance was benchmarked, see Performance Measurement.
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.1452 ms | 0.0603 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 1.37 ms | 0.1157 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.79 ms | 0.1234 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 0.211 ms | 0.1604 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 0.749 ms | 0.1186 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 0.4 ms | 0.1929 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 0.2 ms | 0.1620 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 0.669 ms | 0.1120 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 0.322 ms | 0.1230 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 0.919 ms | 0.1402 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 0.882 ms | 0.1688 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 0.737 ms | 0.1253 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 0.40 ms | 0.1961 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 0.3 ms | 0.1675 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 0.867 ms | 0.1219 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 0.694 ms | 0.1368 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 0.786 ms | 0.2505 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.036 ms | 0.1651 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 0.93 ms | 0.1351 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 0.534 ms | 0.2057 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 0.414 ms | 0.1658 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 0.449 ms | 0.1057 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.84 ms | 0.1237 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 0.233 ms | 0.1608 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 0.222 ms | 0.0927 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 1.70 ms | 0.1534 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 3.75 ms | 0.1918 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 3.21 ms | 0.1821 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 3.56 ms | 0.1263 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 3.65 ms | 0.1410 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 3.85 ms | 0.1663 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 4.6 ms | 0.1480 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 0.569 ms | 0.1875 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 1.463 ms | 0.1817 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 1.0 ms | 0.2042 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 0.941 ms | 0.1601 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 4.34 ms | 0.1292 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 4.44 ms | 0.1417 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 4.85 ms | 0.1706 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 5.3 ms | 0.1473 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 0.574 ms | 0.2038 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 0.797 ms | 0.2509 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 4.29 ms | 0.1743 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 0.41 ms | 0.1702 ms | n/a |
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.553 ms | 1.770 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 2.4 ms | 0.448 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.45 ms | 0.430 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 0.645 ms | 0.50 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 1.087 ms | 0.430 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 0.57 ms | 0.56 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 0.701 ms | 0.50 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 1.25 ms | 0.431 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 1.078 ms | 0.479 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 1.68 ms | 0.50 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 1.042 ms | 0.53 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 1.29 ms | 0.457 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 1.132 ms | 0.58 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 1.16 ms | 0.53 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 1.83 ms | 0.47 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 1.053 ms | 0.50 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 2.060 ms | 0.804 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.35 ms | 0.56 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 1.90 ms | 0.496 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 1.20 ms | 0.61 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 1.40 ms | 0.57 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 0.84 ms | 0.405 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.438 ms | 0.428 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 0.65 ms | 0.50 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 0.895 ms | 2.572 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 3.04 ms | 0.579 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 8.84 ms | 0.67 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 8.6 ms | 0.65 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 11.9 ms | 0.52 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 13.39 ms | 0.54 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 13.98 ms | 0.60 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 13.93 ms | 0.63 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 1.595 ms | 2.088 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 3.34 ms | 0.69 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 1.13 ms | 0.61 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 1.30 ms | 0.57 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 13.39 ms | 0.50 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 13.59 ms | 0.53 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 14.30 ms | 0.60 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 14.08 ms | 0.59 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 1.15 ms | 0.60 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 2.088 ms | 0.804 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 11.1 ms | 0.70 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 1.310 ms | 0.57 ms | n/a |
size | input type | output type | conv. | scale | offset | CPU | CUDA | PVA |
---|---|---|---|---|---|---|---|---|
1920x1080 | u8 | u8 | cast | 1 | 0 | 0.679 ms | 2.25 ms | n/a |
1920x1080 | u8 | u8 | clamp | 2 | 128 | 5.63 ms | 1.120 ms | n/a |
1920x1080 | u8 | u16 | cast | 1 | 0 | 0.855 ms | 1.023 ms | n/a |
1920x1080 | u8 | f32 | cast | 1 | 0 | 1.156 ms | 1.117 ms | n/a |
1920x1080 | u8 | nv12 | cast | 1 | 0 | 1.731 ms | 1.044 ms | n/a |
1920x1080 | u8 | rgb8 | cast | 1 | 0 | 1.007 ms | 1.303 ms | n/a |
1920x1080 | u8 | rgba8 | cast | 1 | 0 | 1.112 ms | 1.137 ms | n/a |
1920x1080 | u16 | u8 | cast | 1 | 0 | 2.65 ms | 1.029 ms | n/a |
1920x1080 | u16 | u16 | cast | 1 | 0 | 1.2304 ms | 0.4015 ms | n/a |
1920x1080 | u16 | u16 | clamp | 2 | 128 | 4.26 ms | 1.194 ms | n/a |
1920x1080 | u16 | f32 | cast | 1 | 0 | 2.45 ms | 1.174 ms | n/a |
1920x1080 | u16 | nv12 | cast | 1 | 0 | 2.83 ms | 1.086 ms | n/a |
1920x1080 | u16 | rgb8 | cast | 1 | 0 | 2.43 ms | 1.331 ms | n/a |
1920x1080 | u16 | rgba8 | cast | 1 | 0 | 2.57 ms | 1.187 ms | n/a |
1920x1080 | f32 | u8 | cast | 1 | 0 | 3.199 ms | 1.095 ms | n/a |
1920x1080 | f32 | u16 | cast | 1 | 0 | 1.746 ms | 1.145 ms | n/a |
1920x1080 | f32 | f32 | cast | 1 | 0 | 2.290 ms | 0.8010 ms | n/a |
1920x1080 | f32 | f32 | clamp | 2 | 128 | 1.909 ms | 1.239 ms | n/a |
1920x1080 | f32 | nv12 | cast | 1 | 0 | 3.32 ms | 1.150 ms | n/a |
1920x1080 | f32 | rgb8 | cast | 1 | 0 | 2.072 ms | 1.410 ms | n/a |
1920x1080 | f32 | rgba8 | cast | 1 | 0 | 2.122 ms | 1.278 ms | n/a |
1920x1080 | nv12 | u8 | cast | 1 | 0 | 1.622 ms | 0.988 ms | n/a |
1920x1080 | nv12 | u16 | cast | 1 | 0 | 0.808 ms | 1.024 ms | n/a |
1920x1080 | nv12 | f32 | cast | 1 | 0 | 0.910 ms | 1.116 ms | n/a |
1920x1080 | nv12 | nv12 | cast | 1 | 0 | 0.963 ms | 3.285 ms | n/a |
1920x1080 | nv12 | nv12 | clamp | 2 | 128 | 6.95 ms | 1.458 ms | n/a |
1920x1080 | nv12 | rgb8 | cast | 1 | 0 | 15.14 ms | 1.826 ms | n/a |
1920x1080 | nv12 | rgba8 | cast | 1 | 0 | 14.929 ms | 1.766 ms | n/a |
1920x1080 | rgb8 | u8 | cast | 1 | 0 | 24.41 ms | 1.331 ms | n/a |
1920x1080 | rgb8 | u16 | cast | 1 | 0 | 24.667 ms | 1.378 ms | n/a |
1920x1080 | rgb8 | f32 | cast | 1 | 0 | 25.28 ms | 1.452 ms | n/a |
1920x1080 | rgb8 | nv12 | cast | 1 | 0 | 25.46 ms | 1.7450 ms | n/a |
1920x1080 | rgb8 | rgb8 | cast | 1 | 0 | 1.772 ms | 2.48 ms | n/a |
1920x1080 | rgb8 | rgb8 | clamp | 2 | 128 | 7.2 ms | 1.737 ms | n/a |
1920x1080 | rgb8 | bgr8 | cast | 1 | 0 | 1.88 ms | 1.472 ms | n/a |
1920x1080 | rgb8 | rgba8 | cast | 1 | 0 | 2.5 ms | 1.332 ms | n/a |
1920x1080 | rgba8 | u8 | cast | 1 | 0 | 24.67 ms | 1.214 ms | n/a |
1920x1080 | rgba8 | u16 | cast | 1 | 0 | 25.14 ms | 1.272 ms | n/a |
1920x1080 | rgba8 | f32 | cast | 1 | 0 | 25.68 ms | 1.375 ms | n/a |
1920x1080 | rgba8 | nv12 | cast | 1 | 0 | 25.81 ms | 1.577 ms | n/a |
1920x1080 | rgba8 | rgb8 | cast | 1 | 0 | 1.820 ms | 1.380 ms | n/a |
1920x1080 | rgba8 | rgba8 | cast | 1 | 0 | 2.323 ms | 0.8022 ms | n/a |
1920x1080 | rgba8 | rgba8 | clamp | 2 | 128 | 29.57 ms | 1.696 ms | n/a |
1920x1080 | rgba8 | bgra8 | cast | 1 | 0 | 1.892 ms | 1.292 ms | n/a |