Overview

The Image Format Converter converts an image from one format into another. It handles color space, format and depth conversions. The algorithm also supports input range conversion, for instance when an unsigned char \([0,255]\) image must be mapped into the signed short \([-32768,32767]\) range.

[Figure: a color input image and the corresponding grayscale output]

Implementation

The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:

input image created with requested input type
output image created with requested output type
flags specify how type casts will be performed, see clamp/cast
scale and offset to be used in range conversions, see range.

Several types of conversion are available:

grayscale \(\leftrightarrow\) color
grayscale \(\leftrightarrow\) grayscale (useful in depth and range conversions)
color \(\leftrightarrow\) color (e.g. YUV to RGB and vice-versa)

The grayscale (or single channel) formats available are:

integral types:
- VPI_IMAGE_TYPE_U8
- VPI_IMAGE_TYPE_S8
- VPI_IMAGE_TYPE_U16
- VPI_IMAGE_TYPE_S16
floating point types:
- VPI_IMAGE_TYPE_F32

The color formats available are:

YUV color space:
- VPI_IMAGE_TYPE_NV12
RGB color space:
- without alpha channel:
  - VPI_IMAGE_TYPE_RGB8
  - VPI_IMAGE_TYPE_BGR8
- with alpha channel:
  - VPI_IMAGE_TYPE_RGBA8
  - VPI_IMAGE_TYPE_BGRA8

The following table shows which combinations of input and output image types are available for conversion.

out \ in	U8	S8	U16	S16	F32	NV12	RGB8	RGBA8	BGR8	BGRA8	2F32
U8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
S8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
U16	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
S16	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
F32	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
NV12	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
RGB8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
RGBA8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
BGR8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
BGRA8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
2F32	no	no	no	no	no	no	no	no	no	no	yes¹

1 - Only available when scale == 1 and offset == 0

Conversion Formulas

The following sections describe how an input value is converted into an output value. In general, these conversions amount to color space, depth and channel order (swizzle) transformations, adding or removing an alpha channel, and down- or up-sampling. They are represented as conversion pipelines built from the basic processing blocks defined below.

Channel depth conversion

Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:

depth \(=\) range \(\rightarrow\) round \(\rightarrow\) clamp/cast

range: input is converted to floating point (fp32) and the following formula is applied:
\[ f(x) = \text{scale} \times x + \text{offset} \]

If scale==1 and offset==0, a shortcut is taken and no operation (not even conversion to floating point) is performed.
round: round to the nearest integer, with halfway cases being rounded away from zero, e.g. round(0.5) == 1.0 and round(-0.5) == -1.0.
clamp/cast: operation controlled by the passed flags:
- VPI_CONVERSION_CAST : cast input to output type like a regular C cast or C++'s static_cast would do. Underflows and overflows behave as described by the C specification (including undefined behavior). This is used when it's known that the input range fits into the output type and maximum performance is needed.
- VPI_CONVERSION_CLAMP : the value is clamped so that overflows and underflows map to the output type's maximum and minimum values, respectively. The result is then cast to the output type. When the output type is floating point, clamp behaves like cast.

When applied to multiple channels such as RGB, the operation is performed on each channel independently.
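
The depth block described above can be sketched in C for one concrete case, unsigned 8-bit input to signed 16-bit output. This is only an illustration of the range, round and clamp/cast stages, not VPI's implementation: `SketchConversion`, `sketch_round` and `sketch_depth_u8_to_s16` are hypothetical names introduced here, and the rounding helper is written by hand so the block needs no math library.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VPI_CONVERSION_CAST and VPI_CONVERSION_CLAMP. */
typedef enum { SKETCH_CAST, SKETCH_CLAMP } SketchConversion;

/* round: nearest integer, halfway cases away from zero (C99 roundf() has the
 * same semantics). Assumes the value fits in a long long. */
static float sketch_round(float v)
{
    float t = (float)(long long)v; /* truncate toward zero */
    float frac = v - t;
    if (frac >= 0.5f)  return t + 1.0f;
    if (frac <= -0.5f) return t - 1.0f;
    return t;
}

/* depth = range -> round -> clamp/cast, for u8 input and s16 output. */
static int16_t sketch_depth_u8_to_s16(uint8_t in, SketchConversion conv,
                                      float scale, float offset)
{
    /* Shortcut: with scale==1 and offset==0 no floating-point work is done. */
    if (scale == 1.0f && offset == 0.0f) {
        return (int16_t)in; /* every u8 value fits in s16 */
    }

    /* range: f(x) = scale * x + offset, evaluated in fp32 */
    float v = scale * (float)in + offset;

    /* round */
    v = sketch_round(v);

    /* clamp: saturate to the output type's limits */
    if (conv == SKETCH_CLAMP) {
        if (v > (float)INT16_MAX) v = (float)INT16_MAX;
        if (v < (float)INT16_MIN) v = (float)INT16_MIN;
    }

    /* cast: plain C conversion; without clamping, an out-of-range value is
     * undefined behavior, as the text notes */
    return (int16_t)v;
}
```

With scale 257 and offset -32768, inputs 0 and 255 map to -32768 and 32767. With scale 1000 and clamping, input 200 saturates to 32767.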

Channel order conversion

This is represented by the following block:

swizzle

It permutes (or swizzles) the channel order of the input type. It is used in conversions from/to color spaces that can be represented in multiple ways, like RGB and BGR. The color space conversion functions assume a predetermined channel order, so channels must be reordered in order to use them.
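
As a minimal sketch of what a swizzle does, the following C function permutes the channels of a 3-channel 8-bit pixel according to an index table. The type `SketchPixel3` and the function `sketch_swizzle3` are hypothetical names used only for illustration; the source does not say how VPI implements this block.

```c
#include <stdint.h>

/* Hypothetical 3-channel, 8-bit pixel; the channel meaning depends on the format. */
typedef struct { uint8_t c[3]; } SketchPixel3;

/* Permute channels: output channel i takes the value of input channel order[i]. */
static SketchPixel3 sketch_swizzle3(SketchPixel3 in, const int order[3])
{
    SketchPixel3 out;
    for (int i = 0; i < 3; ++i) {
        out.c[i] = in.c[order[i]];
    }
    return out;
}
```

For example, the order {2, 1, 0} turns a BGR pixel into an RGB pixel, and applying it again restores the original order.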

Conversion between YUV and RGB

For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by JPEG File Interchange Format (JFIF).

To precisely establish the conversion, let's define the following constants:

\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \\ \end{align}

For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.

The conversion blocks can be defined as:

rgb2yuv

\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}

These functions expect \(r,g,b \in [0,255]\)

yuv2rgb

\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}

These functions expect \(y,c_b,c_r \in [0,255]\)

The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.

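The round function follows the definition given in Channel depth conversion: round to the nearest integer, with halfway cases rounded away from zero.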
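
The rgb2yuv and yuv2rgb formulas above can be transcribed into C roughly as follows. This is a sketch for 8-bit channels under the stated constants, not VPI's actual code; the type and function names (`SketchRgb`, `SketchYuv`, `sketch_round_clamp_u8`, `sketch_rgb2yuv`, `sketch_yuv2rgb`) are hypothetical.

```c
#include <stdint.h>

/* Constants as defined in the text. */
static const float KR = 0.299f, KG = 0.587f, KB = 0.114f;
static const float KCB = 1.772f, KCR = 1.402f;

typedef struct { uint8_t r, g, b; } SketchRgb;   /* hypothetical pixel types */
typedef struct { uint8_t y, cb, cr; } SketchYuv;

/* Round half away from zero, then clamp to [0,255]. */
static uint8_t sketch_round_clamp_u8(float v)
{
    int t = (int)v;              /* truncate toward zero */
    float frac = v - (float)t;
    if (frac >= 0.5f)       t += 1;
    else if (frac <= -0.5f) t -= 1;
    if (t < 0)   t = 0;
    if (t > 255) t = 255;
    return (uint8_t)t;
}

static SketchYuv sketch_rgb2yuv(SketchRgb p)
{
    float r = p.r, g = p.g, b = p.b;
    SketchYuv o;
    o.y  = sketch_round_clamp_u8(r * KR + g * KG + b * KB);
    o.cb = sketch_round_clamp_u8((-r * KR - g * KG + b * (1.0f - KB)) / KCB + 128.0f);
    o.cr = sketch_round_clamp_u8((r * (1.0f - KR) - g * KG - b * KB) / KCR + 128.0f);
    return o;
}

static SketchRgb sketch_yuv2rgb(SketchYuv p)
{
    float y  = p.y;
    float cb = (float)p.cb - 128.0f;
    float cr = (float)p.cr - 128.0f;
    SketchRgb o;
    o.r = sketch_round_clamp_u8(y + KCR * cr);
    o.g = sketch_round_clamp_u8(y - (KB * KCB * cb + KR * KCR * cr) / KG);
    o.b = sketch_round_clamp_u8(y + KCB * cb);
    return o;
}
```

Neutral colors land on \((C_b, C_r) = (128,128)\), while a saturated red pushes \(C_r\) to the upper clamp limit.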

Conversion between RGB and Grayscale

Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the same constants defined in Conversion between YUV and RGB.

rgb2gray

\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]

For grayscale to RGB the conversion is simply:

gray2rgb

\[ f(x) = (x,x,x) \]
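
As a worked example of the two blocks above, take the arbitrarily chosen pixel \((r,g,b) = (200,100,50)\):

\[ Y(200,100,50) = \text{round}(0.299 \times 200 + 0.587 \times 100 + 0.114 \times 50)\big|^{255}_{0} = \text{round}(59.8 + 58.7 + 5.7)\big|^{255}_{0} = \text{round}(124.2)\big|^{255}_{0} = 124 \]

Converting that gray value back with gray2rgb gives \(f(124) = (124,124,124)\), so the round trip keeps the luma but discards the original chroma.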

Up-/Down-sampling

For image formats that include subsampled planes, like VPI_IMAGE_TYPE_NV12, the following block definitions are needed:

2x downsample

\[ D[x,y] = S[2x,2y] \]

2x upsample

\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]

Note: VPI is effectively upsampling using nearest-neighbor sampling. In a future version it'll use bilinear upsampling.
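
The two sampling blocks can be sketched in C as below. For simplicity the sketch assumes a single-channel, tightly packed, row-major 8-bit plane; NV12's interleaved UV plane would apply the same indexing per channel pair. The function names `sketch_downsample2x` and `sketch_upsample2x` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 2x downsample: D[x,y] = S[2x,2y].
 * dst must hold (src_w/2) * (src_h/2) samples. */
static void sketch_downsample2x(const uint8_t *src, int src_w, int src_h,
                                uint8_t *dst)
{
    int dst_w = src_w / 2;
    int dst_h = src_h / 2;
    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            dst[(size_t)y * dst_w + x] = src[(size_t)(2 * y) * src_w + 2 * x];
        }
    }
}

/* 2x upsample (nearest neighbor): D[x,y] = S[floor(x/2), floor(y/2)].
 * src_w is the width of the source plane; dst is dst_w x dst_h. */
static void sketch_upsample2x(const uint8_t *src, int src_w,
                              uint8_t *dst, int dst_w, int dst_h)
{
    for (int y = 0; y < dst_h; ++y) {
        for (int x = 0; x < dst_w; ++x) {
            dst[(size_t)y * dst_w + x] = src[(size_t)(y / 2) * src_w + x / 2];
        }
    }
}
```

Downsampling keeps only the samples at even coordinates, so upsampling the result replicates each kept sample into a 2x2 block rather than recovering the dropped ones.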

Alpha Channel Handling

Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:

alpha

add alpha: append an opaque alpha channel to input pixel, e.g. RGB becomes RGBA. For integral channel types, the new alpha channel's value is the maximum representable of its type, e.g. 255 for 8-bit unsigned integer. Currently VPI doesn't support alpha channel on other channel types.
remove alpha: simply discards the alpha channel, e.g. RGBA becomes RGB.
do nothing: when input and output don't have alpha channel.
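
A minimal C sketch of the add and remove cases for 8-bit channels follows. The pixel types and function names (`SketchRgb8`, `SketchRgba8`, `sketch_add_alpha`, `sketch_remove_alpha`) are hypothetical and only illustrate the behavior described above.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } SketchRgb8;      /* hypothetical pixel types */
typedef struct { uint8_t r, g, b, a; } SketchRgba8;

/* add alpha: append an opaque alpha channel, i.e. the maximum value
 * representable by the channel type (255 for 8-bit unsigned). */
static SketchRgba8 sketch_add_alpha(SketchRgb8 p)
{
    SketchRgba8 o = { p.r, p.g, p.b, UINT8_MAX };
    return o;
}

/* remove alpha: discard the alpha channel and keep the color channels. */
static SketchRgb8 sketch_remove_alpha(SketchRgba8 p)
{
    SketchRgb8 o = { p.r, p.g, p.b };
    return o;
}
```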

Conversion Pipelines

This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.

Grayscale from/to Grayscale

input \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to NV12

input \(\rightarrow\) depth \(\rightarrow\) Y plane \(\rightarrow\) output
(128,128) \(\rightarrow\) UV plane \(\rightarrow\) output

Note: Since NV12's pixel depth is 8-bit unsigned, \((u,v) = (128,128)\) corresponds to zero saturation.
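
As a rough illustration of this pipeline, the C sketch below fills both NV12 planes from an 8-bit grayscale image. It assumes scale 1 and offset 0 (so the depth block is the identity for 8-bit input), even width and height, and tightly packed planes without row padding. The function name `sketch_gray_u8_to_nv12` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* y_plane must hold w*h bytes; uv_plane must hold (w/2)*(h/2) interleaved
 * (U,V) pairs, i.e. (w/2)*(h/2)*2 bytes. */
static void sketch_gray_u8_to_nv12(const uint8_t *gray, int w, int h,
                                   uint8_t *y_plane, uint8_t *uv_plane)
{
    /* Y plane: the grayscale values are used directly as luma. */
    memcpy(y_plane, gray, (size_t)w * (size_t)h);

    /* UV plane: constant (128,128), which corresponds to zero saturation. */
    memset(uv_plane, 128, (size_t)(w / 2) * (size_t)(h / 2) * 2);
}
```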

NV12 to Grayscale

input \(\rightarrow\) Y plane \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to RGB space

input \(\rightarrow\) depth \(\rightarrow\) gray2rgb \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output

RGB space to Grayscale

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) rgb2gray \(\rightarrow\) depth \(\rightarrow\) output

RGB space to NV12

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) depth \(\rightarrow\) rgb2yuv
rgb2yuv \(\nearrow\) Y plane \(\rightarrow\) output
rgb2yuv \(\searrow\) UV plane \(\rightarrow\) 2x downsample \(\rightarrow\) output

NV12 to RGB space

input \(\nearrow\) Y plane \(\rightarrow\) yuv2rgb
input \(\searrow\) UV plane \(\rightarrow\) 2x upsample \(\rightarrow\) yuv2rgb
yuv2rgb \(\rightarrow\) depth \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output

Usage

Initialization phase
1. Include the header that defines the image format converter function.
  #include <vpi/algo/ImageFormatConverter.h>
2. Define the stream on which the algorithm will be executed.
  VPIStream stream = /*...*/;
3. Define the input image. Here as an example we're creating a color image with dimensions \(w \times h\) and NV12 image type.
  VPIImage input;
  
  vpiImageCreate(w, h, VPI_IMAGE_TYPE_NV12, 0, &input);
4. Create the output image with the destination image type. In this case, we want to convert the input to 16-bit signed integer grayscale.
  VPIImage output;
  
  vpiImageCreate(w, h, VPI_IMAGE_TYPE_S16, 0, &output);
Processing phase
1. Submit the algorithm to the stream, passing the input and output images. This example requests clamping and maps the range from \([0,255]\) to \([-32768,32767]\), which corresponds to a scale of 257 and an offset of -32768.
  vpiSubmitImageFormatConverter(stream, input, output, VPI_CONVERSION_CLAMP, 257, -32768);
2. Optionally, wait until the processing is done.
  vpiStreamSync(stream);
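
The scale and offset passed in the processing phase follow from requiring that \(f(x) = \text{scale} \times x + \text{offset}\) maps the ends of the input range onto the ends of the output range. The helper below is a sketch of that arithmetic only; `SketchRange` and `sketch_range_params` are hypothetical names and not part of the VPI API.

```c
/* Hypothetical container for the two range parameters. */
typedef struct {
    float scale;
    float offset;
} SketchRange;

/* Solve scale*in_min + offset = out_min and scale*in_max + offset = out_max.
 * Assumes in_max != in_min. */
static SketchRange sketch_range_params(float in_min, float in_max,
                                       float out_min, float out_max)
{
    SketchRange r;
    r.scale  = (out_max - out_min) / (in_max - in_min);
    r.offset = out_min - r.scale * in_min;
    return r;
}
```

For \([0,255] \rightarrow [-32768,32767]\) this gives scale \(= 65535 / 255 = 257\) and offset \(= -32768\), the values used in the call above.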

For more details, consult the API reference.

Limitations and Constraints

PVA

Not implemented.

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.188 ms	0.135 ms	n/a
1920x1080	u8	u8	clamp	2	128	1.38 ms	0.1132 ms	n/a
1920x1080	u8	u16	cast	1	0	0.8 ms	0.1225 ms	n/a
1920x1080	u8	f32	cast	1	0	0.24 ms	0.1602 ms	n/a
1920x1080	u8	nv12	cast	1	0	0.749 ms	0.1168 ms	n/a
1920x1080	u8	rgb8	cast	1	0	0.3 ms	0.1898 ms	n/a
1920x1080	u8	rgba8	cast	1	0	0.378 ms	0.1612 ms	n/a
1920x1080	u16	u8	cast	1	0	0.677 ms	0.1104 ms	n/a
1920x1080	u16	u16	cast	1	0	0.3518 ms	0.193 ms	n/a
1920x1080	u16	u16	clamp	2	128	0.923 ms	0.1402 ms	n/a
1920x1080	u16	f32	cast	1	0	0.886 ms	0.1669 ms	n/a
1920x1080	u16	nv12	cast	1	0	0.758 ms	0.1237 ms	n/a
1920x1080	u16	rgb8	cast	1	0	0.3 ms	0.1931 ms	n/a
1920x1080	u16	rgba8	cast	1	0	0.29 ms	0.1666 ms	n/a
1920x1080	f32	u8	cast	1	0	0.883 ms	0.1203 ms	n/a
1920x1080	f32	u16	cast	1	0	0.684 ms	0.1345 ms	n/a
1920x1080	f32	f32	cast	1	0	0.73 ms	0.3035 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.029 ms	0.1633 ms	n/a
1920x1080	f32	nv12	cast	1	0	0.980 ms	0.1330 ms	n/a
1920x1080	f32	rgb8	cast	1	0	0.345 ms	0.2038 ms	n/a
1920x1080	f32	rgba8	cast	1	0	0.42 ms	0.1642 ms	n/a
1920x1080	nv12	u8	cast	1	0	0.611 ms	0.1041 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.8 ms	0.1225 ms	n/a
1920x1080	nv12	f32	cast	1	0	0.221 ms	0.1587 ms	n/a
1920x1080	nv12	nv12	cast	1	0	0.25 ms	0.163 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	1.69 ms	0.1506 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	3.6 ms	0.1890 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	3.1 ms	0.1792 ms	n/a
1920x1080	rgb8	u8	cast	1	0	3.549 ms	0.1240 ms	n/a
1920x1080	rgb8	u16	cast	1	0	3.64 ms	0.1407 ms	n/a
1920x1080	rgb8	f32	cast	1	0	3.93 ms	0.1645 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	6.6 ms	0.1453 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	0.586 ms	0.2488 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	1.456 ms	0.18074 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	0.324 ms	0.2022 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	0.354 ms	0.1582 ms	n/a
1920x1080	rgba8	u8	cast	1	0	4.34 ms	0.1267 ms	n/a
1920x1080	rgba8	u16	cast	1	0	4.45 ms	0.1409 ms	n/a
1920x1080	rgba8	f32	cast	1	0	4.84 ms	0.1683 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	7.4 ms	0.1447 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	0.346 ms	0.2019 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	0.773 ms	0.303 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	4.19 ms	0.1714 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	0.4 ms	0.1683 ms	n/a

Jetson TX2
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.585 ms	1.769 ms	n/a
1920x1080	u8	u8	clamp	2	128	2.5 ms	0.445 ms	n/a
1920x1080	u8	u16	cast	1	0	0.47 ms	0.424 ms	n/a
1920x1080	u8	f32	cast	1	0	0.64 ms	0.498 ms	n/a
1920x1080	u8	nv12	cast	1	0	0.869 ms	0.430 ms	n/a
1920x1080	u8	rgb8	cast	1	0	0.50 ms	0.555 ms	n/a
1920x1080	u8	rgba8	cast	1	0	0.708 ms	0.496 ms	n/a
1920x1080	u16	u8	cast	1	0	1.17 ms	0.428 ms	n/a
1920x1080	u16	u16	cast	1	0	1.089 ms	0.623 ms	n/a
1920x1080	u16	u16	clamp	2	128	1.67 ms	0.489 ms	n/a
1920x1080	u16	f32	cast	1	0	1.032 ms	0.521 ms	n/a
1920x1080	u16	nv12	cast	1	0	1.243 ms	0.450 ms	n/a
1920x1080	u16	rgb8	cast	1	0	1.14 ms	0.572 ms	n/a
1920x1080	u16	rgba8	cast	1	0	1.15 ms	0.521 ms	n/a
1920x1080	f32	u8	cast	1	0	1.83 ms	0.461 ms	n/a
1920x1080	f32	u16	cast	1	0	1.049 ms	0.491 ms	n/a
1920x1080	f32	f32	cast	1	0	2.083 ms	1.116 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.35 ms	0.549 ms	n/a
1920x1080	f32	nv12	cast	1	0	1.90 ms	0.491 ms	n/a
1920x1080	f32	rgb8	cast	1	0	1.31 ms	0.61 ms	n/a
1920x1080	f32	rgba8	cast	1	0	1.41 ms	0.56 ms	n/a
1920x1080	nv12	u8	cast	1	0	0.80 ms	0.400 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.448 ms	0.428 ms	n/a
1920x1080	nv12	f32	cast	1	0	0.637 ms	0.499 ms	n/a
1920x1080	nv12	nv12	cast	1	0	0.981 ms	2.575 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	3.01 ms	0.577 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	8.70 ms	0.661 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	8.7 ms	0.64 ms	n/a
1920x1080	rgb8	u8	cast	1	0	11.7 ms	0.510 ms	n/a
1920x1080	rgb8	u16	cast	1	0	11.9 ms	0.534 ms	n/a
1920x1080	rgb8	f32	cast	1	0	12.7 ms	0.60 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	13.3 ms	0.65 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	1.577 ms	2.319 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	2.91 ms	0.67 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	1.22 ms	0.599 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	1.33 ms	0.57 ms	n/a
1920x1080	rgba8	u8	cast	1	0	11.8 ms	0.497 ms	n/a
1920x1080	rgba8	u16	cast	1	0	11.9 ms	0.523 ms	n/a
1920x1080	rgba8	f32	cast	1	0	13.62 ms	0.593 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	13.6 ms	0.588 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	1.19 ms	0.60 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	2.091 ms	1.126 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	15.56 ms	0.69 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	1.32 ms	0.561 ms	n/a

Jetson Nano
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.58 ms	2.429 ms	n/a
1920x1080	u8	u8	clamp	2	128	5.511 ms	1.122 ms	n/a
1920x1080	u8	u16	cast	1	0	0.815 ms	1.023 ms	n/a
1920x1080	u8	f32	cast	1	0	1.170 ms	1.115 ms	n/a
1920x1080	u8	nv12	cast	1	0	1.744 ms	1.046 ms	n/a
1920x1080	u8	rgb8	cast	1	0	0.949 ms	1.307 ms	n/a
1920x1080	u8	rgba8	cast	1	0	1.116 ms	1.134 ms	n/a
1920x1080	u16	u8	cast	1	0	2.69 ms	1.030 ms	n/a
1920x1080	u16	u16	cast	1	0	1.1713 ms	0.7099 ms	n/a
1920x1080	u16	u16	clamp	2	128	4.26 ms	1.195 ms	n/a
1920x1080	u16	f32	cast	1	0	2.02 ms	1.175 ms	n/a
1920x1080	u16	nv12	cast	1	0	2.78 ms	1.089 ms	n/a
1920x1080	u16	rgb8	cast	1	0	2.503 ms	1.335 ms	n/a
1920x1080	u16	rgba8	cast	1	0	2.59 ms	1.184 ms	n/a
1920x1080	f32	u8	cast	1	0	3.16 ms	1.096 ms	n/a
1920x1080	f32	u16	cast	1	0	1.755 ms	1.146 ms	n/a
1920x1080	f32	f32	cast	1	0	2.341 ms	1.2967 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.901 ms	1.250 ms	n/a
1920x1080	f32	nv12	cast	1	0	3.27 ms	1.151 ms	n/a
1920x1080	f32	rgb8	cast	1	0	2.079 ms	1.407 ms	n/a
1920x1080	f32	rgba8	cast	1	0	2.134 ms	1.279 ms	n/a
1920x1080	nv12	u8	cast	1	0	1.628 ms	0.989 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.825 ms	1.022 ms	n/a
1920x1080	nv12	f32	cast	1	0	1.167 ms	1.116 ms	n/a
1920x1080	nv12	nv12	cast	1	0	1.028 ms	3.439 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	6.85 ms	1.457 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	15.17 ms	1.826 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	14.96 ms	1.765 ms	n/a
1920x1080	rgb8	u8	cast	1	0	24.44 ms	1.334 ms	n/a
1920x1080	rgb8	u16	cast	1	0	24.72 ms	1.379 ms	n/a
1920x1080	rgb8	f32	cast	1	0	25.34 ms	1.448 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	25.54 ms	1.7445 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	1.7994 ms	3.057 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	6.0 ms	1.738 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	2.67 ms	1.474 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	2.70 ms	1.337 ms	n/a
1920x1080	rgba8	u8	cast	1	0	24.66 ms	1.213 ms	n/a
1920x1080	rgba8	u16	cast	1	0	25.04 ms	1.275 ms	n/a
1920x1080	rgba8	f32	cast	1	0	25.65 ms	1.378 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	25.69 ms	1.577 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	1.826 ms	1.382 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	2.321 ms	1.3002 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	29.55 ms	1.701 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	1.920 ms	1.294 ms	n/a

VPI - Vision Programming Interface

0.3.7 Release
