The Convert Image Format algorithm converts an image from one format into another. It handles color spec, format, and depth conversions. The algorithm also supports input range conversion, for instance when an unsigned char \([0,255]\) image must be mapped onto the signed short \([-32768,32767]\) range.
Color Input | Grayscale Output |
---|---|
For the list of limitations, constraints, and backends that implement the algorithm, consult the reference documentation of the following functions:
Function | Description |
---|---|
vpiSubmitConvertImageFormat | Converts the image contents to the desired format, with optional scaling and offset. |
The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:
Several types of conversion are available:
The color formats available are:
The grayscale formats available are:
Non-color formats are to be used when the information stored isn't represented as a color, such as temperature, speed, etc. For conversions, they are considered to be grayscale with extended range.
The following table shows which combinations of input and output image types are available for conversion.
in |
---|
out |
The following sections describe how an input value is converted into an output value. In general, these conversions combine color spec, depth, and channel order (swizzle) transformations, addition or removal of an alpha channel, and down- or up-sampling. They are represented as conversion pipelines built from the basic processing blocks defined below.
Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:
depth | \(=\) | range | \(\rightarrow\) | round | \(\rightarrow\) | clamp/cast |
range: input is converted to floating point (fp32) and the following formula is applied:
\[ f(x) = \text{scale} \times x + \text{offset} \]
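If scale==1 and offset==0, a shortcut is taken and no operation (not even conversion to floating point) is performed.
round: the result is rounded to the nearest integer, with halfway cases rounded away from zero, so that round(0.5) == 1.0 and round(-0.5) == -1.0.
clamp/cast: the rounded value is converted to the output type according to the conversion policy. With the clamp policy, underflows and overflows are clamped to the limits of the output type. With the cast policy, the value is converted as a static_cast would do. Underflows and overflows will behave as described by the C specification (including undefined behavior). This policy is used when it's known that the input range fits into the output and maximum performance is needed.

When applied to multiple channels such as RGB, the operation is performed on each channel independently.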
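The depth block can be sketched for a single channel value as follows. This is a minimal illustration, not VPI's implementation. The names `round_half_away` and `depth` and the `out_min`/`out_max` parameters are hypothetical, the arithmetic uses Python floats rather than fp32, and only clamping is modeled because a cast may involve undefined behavior. Rounding halfway cases away from zero is inferred from the round(0.5) and round(-0.5) examples.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, halfway cases away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


def depth(x, scale=1.0, offset=0.0, out_min=-32768, out_max=32767):
    """Hypothetical range -> round -> clamp chain for one channel value."""
    if scale == 1 and offset == 0:
        y = x  # shortcut: the range step is skipped entirely
    else:
        y = scale * float(x) + offset  # f(x) = scale * x + offset
    y = round_half_away(y)
    # Clamp underflows and overflows to the limits of the output type.
    return int(min(max(y, out_min), out_max))


# Map unsigned char [0, 255] onto signed short [-32768, 32767].
scale = 65535 / 255  # 257.0
offset = -32768.0
lowest = depth(0, scale, offset)
highest = depth(255, scale, offset)
```

With scale = 257 and offset = -32768, the input extremes 0 and 255 land exactly on -32768 and 32767, so no clamping is needed for that mapping.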
This is represented by the following block:
swizzle |
It's used to permute (or swizzle) input type's channel order. Used in conversions from/to color specs that can be represented in multiple ways, like RGB and BGR. The color spec conversion functions assume a pre-determined channel order. In order to use them, channels must be reordered.
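A swizzle can be pictured as an index permutation applied to each pixel. The sketch below is illustrative only. The function name `swizzle` and the `RGB_TO_BGR` table are hypothetical and do not correspond to VPI identifiers.

```python
def swizzle(pixel, order):
    """Return a pixel whose channel i is taken from pixel[order[i]]."""
    return tuple(pixel[i] for i in order)


# Reversing the three channels turns RGB into BGR and back again.
RGB_TO_BGR = (2, 1, 0)

rgb = (10, 20, 30)
bgr = swizzle(rgb, RGB_TO_BGR)
```

Applying the same permutation twice restores the original order, which is why a single table serves both RGB to BGR and BGR to RGB.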
For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by JPEG File Interchange Format (JFIF).
To precisely establish the conversion, let's define the following constants:
\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \\ \end{align}
For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.
The conversion blocks can be defined as:
rgb2yuv |
\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}
These functions expect \(r,g,b \in [0,255]\)
yuv2rgb |
\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}
These functions expect \(y,c_b,c_r \in [0,255]\)
The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.
The round function follows the definition given for the depth block.
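The rgb2yuv and yuv2rgb formulas can be transcribed directly, which is handy for checking individual pixel values by hand. This is a sketch under stated assumptions: the helper `_round_clamp` is a hypothetical name, rounding is taken to be half away from zero, and Python floats stand in for whatever precision the backends use.

```python
import math

KR, KG, KB = 0.299, 0.587, 0.114
KCB, KCR = 1.772, 1.402


def _round_clamp(x, lo=0, hi=255):
    """Round half away from zero, then clamp to [lo, hi]."""
    r = math.copysign(math.floor(abs(x) + 0.5), x)
    return int(min(max(r, lo), hi))


def rgb2yuv(r, g, b):
    y = _round_clamp(r * KR + g * KG + b * KB)
    cb = _round_clamp((-r * KR - g * KG + b * (1 - KB)) / KCB + 128)
    cr = _round_clamp((r * (1 - KR) - g * KG - b * KB) / KCR + 128)
    return y, cb, cr


def yuv2rgb(y, cb, cr):
    r = _round_clamp(y + KCR * (cr - 128))
    g = _round_clamp(y - (KB * KCB * (cb - 128) + KR * KCR * (cr - 128)) / KG)
    b = _round_clamp(y + KCB * (cb - 128))
    return r, g, b


red_yuv = rgb2yuv(255, 0, 0)      # (76, 85, 255)
red_back = yuv2rgb(*red_yuv)      # (254, 0, 0)
```

Pure red does not survive the round trip exactly: its \(C_r\) value is clamped at 255, so converting back yields 254 rather than 255 in the red channel.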
Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the same constants \(K_r\), \(K_g\) and \(K_b\) defined for that conversion.
rgb2gray |
\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]
For grayscale to RGB the conversion is simply:
gray2rgb |
\[ f(x) = (x,x,x) \]
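Because \(K_r + K_g + K_b = 1\), replicating a gray value into three channels and converting it back to grayscale returns the original value. The short sketch below checks this. The function names mirror the block names but are otherwise hypothetical, and rounding half away from zero is assumed.

```python
import math

KR, KG, KB = 0.299, 0.587, 0.114


def rgb2gray(r, g, b):
    """Luma only: round and clamp Kr*r + Kg*g + Kb*b to [0, 255]."""
    x = KR * r + KG * g + KB * b
    rounded = math.copysign(math.floor(abs(x) + 0.5), x)
    return int(min(max(rounded, 0), 255))


def gray2rgb(x):
    """Replicate the gray value into all three channels."""
    return (x, x, x)


primaries = [rgb2gray(255, 0, 0), rgb2gray(0, 255, 0), rgb2gray(0, 0, 255)]
round_trip_ok = all(rgb2gray(*gray2rgb(v)) == v for v in range(256))
```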
For image formats that include subsampled planes, like VPI_IMAGE_FORMAT_NV12, the following block definitions are needed:
2x downsample |
\[ D[x,y] = S[2x,2y] \]
2x upsample |
\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]
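The two resampling blocks amount to sample dropping and sample replication. The sketch below models a plane as a list of rows, indexed as `plane[y][x]`. The function names are hypothetical, and even plane dimensions are assumed for the downsample.

```python
def downsample2x(src):
    """D[x,y] = S[2x,2y]: keep every other sample in both directions."""
    return [row[::2] for row in src[::2]]


def upsample2x(src):
    """D[x,y] = S[x//2, y//2]: replicate each sample into a 2x2 block."""
    out = []
    for row in src:
        wide = [v for v in row for _ in range(2)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out


plane = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
small = downsample2x(plane)
large = upsample2x(small)
```

Downsampling after upsampling returns the original plane, while the reverse order discards three out of every four samples.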
Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:
alpha |
This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.
input | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | Y plane | \(\searrow\) |
(128,128) | \(\rightarrow\) | UV plane | \(\nearrow\) |
output |
input | \(\rightarrow\) | Y plane | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | depth | \(\rightarrow\) | gray2rgb | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | rgb2gray | \(\rightarrow\) | depth | \(\rightarrow\) | output |
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | depth | \(\rightarrow\) | rgb2yuv |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x downsample | \(\nearrow\) |
output |
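To show how the blocks compose, the pipeline above (swizzle, rgb2yuv, then a full-resolution Y plane and a 2x downsampled UV plane) can be sketched end to end. Several simplifications are assumed: the input is 8-bit BGR without alpha, so the alpha and depth blocks are no-ops and are omitted. The image has even width and height, and the UV plane is modeled as rows of \((C_b, C_r)\) pairs. The function name `bgr_to_y_uv_planes` is hypothetical.

```python
import math

KR, KG, KB = 0.299, 0.587, 0.114
KCB, KCR = 1.772, 1.402


def _round_clamp(x):
    r = math.copysign(math.floor(abs(x) + 0.5), x)
    return int(min(max(r, 0), 255))


def rgb2yuv(r, g, b):
    y = _round_clamp(r * KR + g * KG + b * KB)
    cb = _round_clamp((-r * KR - g * KG + b * (1 - KB)) / KCB + 128)
    cr = _round_clamp((r * (1 - KR) - g * KG - b * KB) / KCR + 128)
    return y, cb, cr


def bgr_to_y_uv_planes(image):
    """image: rows of (b, g, r) tuples; returns (Y plane, subsampled UV plane)."""
    y_plane, uv_full = [], []
    for row in image:
        # swizzle BGR -> RGB, then apply rgb2yuv per pixel
        yuv = [rgb2yuv(px[2], px[1], px[0]) for px in row]
        y_plane.append([p[0] for p in yuv])
        uv_full.append([(p[1], p[2]) for p in yuv])
    # 2x downsample of the chroma samples: D[x,y] = S[2x,2y]
    uv_plane = [row[::2] for row in uv_full[::2]]
    return y_plane, uv_plane


# One red pixel (in BGR order) at the top-left of a 2x2 black image.
image = [[(0, 0, 255), (0, 0, 0)],
         [(0, 0, 0), (0, 0, 0)]]
y_plane, uv_plane = bgr_to_y_uv_planes(image)
```

Because the downsample keeps only the sample at even coordinates, the single chroma pair of this 2x2 image comes from the red pixel alone. The three black pixels contribute nothing to it.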
input | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | depth | \(\rightarrow\) | rgb2yuv |
\(\nearrow\) | Y plane | \(\searrow\) |
\(\searrow\) | UV plane | \(\nearrow\) |
output |
input |
\(\nearrow\) | Y plane | \(\searrow\) | ||
\(\searrow\) | UV plane | \(\rightarrow\) | 2x upsample | \(\nearrow\) |
yuv2rgb | \(\rightarrow\) | depth | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
input |
\(\nearrow\) | Y plane | \(\searrow\) |
\(\searrow\) | UV plane | \(\nearrow\) |
yuv2rgb | \(\rightarrow\) | depth | \(\rightarrow\) | swizzle | \(\rightarrow\) | alpha | \(\rightarrow\) | output |
For more information, see Convert Image Format in the "C API Reference" section of VPI - Vision Programming Interface.
For information on how to use the performance table below, see Algorithm Performance Tables.
Before comparing measurements, consult Comparing Algorithm Elapsed Times.
For further information on how performance was benchmarked, see Performance Benchmark.
Although the VIC backend supports the clamp conversion policy only, the table below shows performance numbers even for the cast policy. Internally, it still uses clamp. On VIC, it wouldn't make a difference even if it supported cast.