Overview

The Image Format Converter converts an image from one format into another. It handles color space, format, and depth conversions. The algorithm also supports input range conversion, which is needed, for instance, to map an unsigned char \([0,255]\) image into the signed short \([-32768,32767]\) range.

Figure: a color input image and its grayscale output.

Implementation

The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:

input image created with requested input type
output image created with requested output type
flags that specify how type casts will be performed, see clamp/cast
scale and offset to be used in range conversions, see range.

Several types of conversion are available:

grayscale \(\leftrightarrow\) color
grayscale \(\leftrightarrow\) grayscale (useful in depth and range conversions)
color \(\leftrightarrow\) color (e.g. YUV to RGB and vice-versa)

The grayscale (or single channel) formats available are:

integral types:
- VPI_IMAGE_TYPE_U8
- VPI_IMAGE_TYPE_S8
- VPI_IMAGE_TYPE_U16
- VPI_IMAGE_TYPE_S16
floating point types:
- VPI_IMAGE_TYPE_F32

The color formats available are:

YUV color space:
- VPI_IMAGE_TYPE_NV12
RGB color space:
- without alpha channel:
  - VPI_IMAGE_TYPE_RGB8
  - VPI_IMAGE_TYPE_BGR8
- with alpha channel:
  - VPI_IMAGE_TYPE_RGBA8
  - VPI_IMAGE_TYPE_BGRA8

The following table shows which combinations of input and output image types are available for conversion.

in \ out	U8	S8	U16	S16	F32	NV12	RGB8	RGBA8	BGR8	BGRA8	2F32
U8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
S8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
U16	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
S16	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
F32	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
NV12	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
RGB8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
RGBA8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
BGR8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
BGRA8	yes	yes	yes	yes	yes	yes	yes	yes	yes	yes	no
2F32	no	no	no	no	no	no	no	no	no	no	yes¹

¹ Only available when scale == 1 and offset == 0.

Conversion Formulas

The following sections describe how an input value is converted into an output value. In general, these conversions amount to a combination of color space, depth, and channel order (swizzle) transformations, adding or removing an alpha channel, and down- or up-sampling. They are represented as conversion pipelines built from the basic processing blocks defined below.

Channel depth conversion

Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:

depth \(=\) range \(\rightarrow\) round \(\rightarrow\) clamp/cast

range: input is converted to floating point (fp32) and the following formula is applied:
\[ f(x) = \text{scale} \times x + \text{offset} \]

If scale==1 and offset==0, a shortcut is taken and no operation (not even conversion to floating point) is performed.
round: round to the nearest integer, with halfway cases rounded away from zero, e.g. round(0.5) == 1.0 and round(-0.5) == -1.0.
clamp/cast: operation controlled by the passed flags:
- VPI_CONVERSION_CAST : cast input to output type like a regular C cast or C++'s static_cast would do. Underflows and overflows behave as described by the C specification (including undefined behavior). This is used when it's known that the input range fits into the output type and maximum performance is needed.
- VPI_CONVERSION_CLAMP : the value is clamped so that overflows and underflows map to the output type's maximum and minimum values, respectively. The result is then cast to the output type. When the output type is floating point, clamp behaves like cast.

When applied to multiple channels such as RGB, the operation is performed on each channel independently.
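The depth block can be sketched in plain C for one specific case, an unsigned 8-bit input channel converted to a signed 16-bit output. This is only an illustration of the range, round and clamp/cast steps described above, not VPI's implementation: `depth_u8_to_s16`, `round_half_away` and the `SketchConversion` enum are hypothetical names standing in for the real flags, and the rounding helper assumes the scaled value fits in a `long long`.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VPI_CONVERSION_CAST / VPI_CONVERSION_CLAMP. */
typedef enum { SKETCH_CAST, SKETCH_CLAMP } SketchConversion;

/* Round to nearest, halfway cases away from zero. Gives the same result as
 * C's roundf for values that fit in a long long. */
static float round_half_away(float v)
{
    float t = (float)(long long)v;   /* truncate toward zero */
    float frac = v - t;              /* fractional part, keeps the sign of v */
    if (frac >= 0.5f)  return t + 1.0f;
    if (frac <= -0.5f) return t - 1.0f;
    return t;
}

/* depth = range -> round -> clamp/cast, for a U8 input and an S16 output. */
static int16_t depth_u8_to_s16(uint8_t in, float scale, float offset,
                               SketchConversion conv)
{
    /* Shortcut: no range conversion requested, and U8 always fits in S16. */
    if (scale == 1.0f && offset == 0.0f)
        return (int16_t)in;

    float v = scale * (float)in + offset;   /* range */
    v = round_half_away(v);                 /* round */

    if (conv == SKETCH_CLAMP) {             /* clamp */
        if (v > (float)INT16_MAX) v = (float)INT16_MAX;
        if (v < (float)INT16_MIN) v = (float)INT16_MIN;
    }
    /* cast: with SKETCH_CAST an out-of-range v is undefined behavior in C,
     * so the caller must know the result fits. */
    return (int16_t)v;
}
```

For example, with scale 257 and offset -32768 the inputs 0 and 255 map to -32768 and 32767, the two ends of the signed 16-bit range.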

Channel order conversion

This is represented by the following block:

swizzle

It's used to permute (or swizzle) the input type's channel order. It's needed in conversions from/to color spaces that can be represented in multiple ways, like RGB and BGR. The color space conversion functions assume a pre-determined channel order, so channels must be reordered before these functions can be applied.
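As an illustration, a swizzle on a 3-channel 8-bit pixel amounts to an index permutation. The sketch below is not VPI code; `Pixel3`, `swizzle3` and `SWAP_RB` are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical 3-channel, 8-bit pixel; channel meaning depends on the format. */
typedef struct { uint8_t c[3]; } Pixel3;

/* Permute channels so that out.c[i] = in.c[order[i]]. */
static Pixel3 swizzle3(Pixel3 in, const int order[3])
{
    Pixel3 out;
    for (int i = 0; i < 3; ++i)
        out.c[i] = in.c[order[i]];
    return out;
}

/* Swapping the first and last channel converts RGB to BGR and back. */
static const int SWAP_RB[3] = {2, 1, 0};
```

Applying `SWAP_RB` twice returns the original pixel, which is why the same permutation serves both the RGB to BGR and the BGR to RGB direction.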

Conversion between YUV and RGB

For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by the JPEG File Interchange Format (JFIF).

To precisely establish the conversion, let's define the following constants:

\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \end{align}

For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.

The conversion blocks can be defined as:

rgb2yuv

\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}

These functions expect \(r,g,b \in [0,255]\).

yuv2rgb

\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}

These functions expect \(y,c_b,c_r \in [0,255]\).

The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.

The round function follows the definition given in Channel depth conversion.
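The two blocks can be transcribed to C almost term by term. The following is a sketch under the definitions above, not VPI's implementation; the function names and the `round_clamp_u8` helper are hypothetical. The helper clamps before rounding, which gives the same result as rounding first and then clamping to \([0,255]\).

```c
#include <stdint.h>

static const float K_R = 0.299f, K_G = 0.587f, K_B = 0.114f;
static const float K_CB = 1.772f, K_CR = 1.402f;

/* round(v) clamped to [0,255], halfway cases rounded away from zero. */
static uint8_t round_clamp_u8(float v)
{
    if (v <= 0.0f)   return 0;
    if (v >= 255.0f) return 255;
    uint8_t t = (uint8_t)v;                       /* truncate, 0 <= t <= 254 */
    return (v - (float)t >= 0.5f) ? (uint8_t)(t + 1) : t;
}

/* rgb2yuv block: U and V stand for Cb and Cr. */
static void rgb2yuv(uint8_t r, uint8_t g, uint8_t b,
                    uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    float fr = r, fg = g, fb = b;
    *y  = round_clamp_u8(fr * K_R + fg * K_G + fb * K_B);
    *cb = round_clamp_u8((-fr * K_R - fg * K_G + fb * (1.0f - K_B)) / K_CB + 128.0f);
    *cr = round_clamp_u8((fr * (1.0f - K_R) - fg * K_G - fb * K_B) / K_CR + 128.0f);
}

/* yuv2rgb block. */
static void yuv2rgb(uint8_t y, uint8_t cb, uint8_t cr,
                    uint8_t *r, uint8_t *g, uint8_t *b)
{
    float fy = y;
    float u = (float)cb - 128.0f;   /* centered Cb */
    float v = (float)cr - 128.0f;   /* centered Cr */
    *r = round_clamp_u8(fy + K_CR * v);
    *g = round_clamp_u8(fy - (K_B * K_CB * u + K_R * K_CR * v) / K_G);
    *b = round_clamp_u8(fy + K_CB * u);
}
```

Note that the round trip is not lossless in general: pure red (255,0,0) becomes (76,85,255) in YUV, and converting that back yields (254,0,0) because \(C_r\) was clamped to 255.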

Conversion between RGB and Grayscale

Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the same constants defined in Conversion between YUV and RGB.

rgb2gray

\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]

For grayscale to RGB the conversion is simply:

gray2rgb

\[ f(x) = (x,x,x) \]
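As a worked example, take the pixel \((r,g,b) = (200,100,50)\), chosen arbitrarily for illustration. With \(K_r = 0.299\), \(K_g = 0.587\) and \(K_b = 0.114\), rgb2gray gives the value below, and gray2rgb applied to that result replicates it into all three channels:

\begin{align} Y(200,100,50) &= \text{round}(0.299 \times 200 + 0.587 \times 100 + 0.114 \times 50)\big|^{255}_{0} \\ &= \text{round}(59.8 + 58.7 + 5.7)\big|^{255}_{0} \\ &= \text{round}(124.2)\big|^{255}_{0} = 124 \\ f(124) &= (124,124,124) \end{align}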

Up-/Down-sampling

For image formats that include subsampled planes, like VPI_IMAGE_TYPE_NV12, the following block definitions are needed:

2x downsample

\[ D[x,y] = S[2x,2y] \]

2x upsample

\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]

Note: VPI is effectively upsampling using nearest-neighbor sampling. In a future version it'll use bilinear upsampling.
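The two sampling blocks map directly to index arithmetic. The sketch below assumes a single-channel, row-major, tightly packed 8-bit plane, which is a simplification (real image planes may have a row pitch, and NV12's UV plane holds two interleaved channels); `downsample2x` and `upsample2x` are hypothetical names.

```c
#include <stdint.h>

/* D[x,y] = S[2x,2y]. src is w x h, dst is (w/2) x (h/2). */
static void downsample2x(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x)
            dst[y * dw + x] = src[(2 * y) * w + 2 * x];
}

/* D[x,y] = S[floor(x/2), floor(y/2)]. src is sw x sh, dst is (2*sw) x (2*sh).
 * Integer division of non-negative values is the floor. */
static void upsample2x(const uint8_t *src, int sw, int sh, uint8_t *dst)
{
    int dw = 2 * sw, dh = 2 * sh;
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x)
            dst[y * dw + x] = src[(y / 2) * sw + x / 2];
}
```

Downsampling keeps the top-left sample of every 2x2 block, and upsampling replicates each source sample into a 2x2 block, so upsampling a downsampled plane does not recover the discarded samples.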

Alpha Channel Handling

Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:

alpha

add alpha: append an opaque alpha channel to the input pixel, e.g. RGB becomes RGBA. For integral channel types, the new alpha channel's value is the maximum value representable by its type, e.g. 255 for 8-bit unsigned integer. Currently VPI doesn't support an alpha channel on other channel types.
remove alpha: simply discards the alpha channel, e.g. RGBA becomes RGB.
do nothing: when input and output don't have alpha channel.
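For 8-bit channels, the add and remove cases can be sketched as follows. The `Rgb8` and `Rgba8` structs and both function names are hypothetical, introduced only to illustrate the block.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb8;
typedef struct { uint8_t r, g, b, a; } Rgba8;

/* add alpha: append an opaque alpha, i.e. the maximum value of the channel type. */
static Rgba8 add_alpha(Rgb8 p)
{
    Rgba8 out = { p.r, p.g, p.b, UINT8_MAX };
    return out;
}

/* remove alpha: discard the alpha channel, keep the color channels as-is. */
static Rgb8 remove_alpha(Rgba8 p)
{
    Rgb8 out = { p.r, p.g, p.b };
    return out;
}
```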

Conversion Pipelines

This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.

Grayscale from/to Grayscale

input \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to NV12

The output's two planes are produced by separate branches:
- input \(\rightarrow\) depth \(\rightarrow\) Y plane \(\rightarrow\) output
- (128,128) \(\rightarrow\) UV plane \(\rightarrow\) output

Note: Since NV12's pixel depth is 8-bit unsigned, \((u,v) = (128,128)\) corresponds to zero saturation.
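A minimal sketch of this pipeline for an 8-bit grayscale input with scale 1 and offset 0, where the depth block reduces to a copy. It assumes tightly packed planes and even image dimensions; `gray_u8_to_nv12` is a hypothetical helper, not a VPI function.

```c
#include <stdint.h>
#include <string.h>

/* y_plane must hold w*h bytes. uv_plane must hold (w/2)*(h/2) interleaved
 * (U,V) pairs, i.e. w*h/2 bytes. */
static void gray_u8_to_nv12(const uint8_t *gray, int w, int h,
                            uint8_t *y_plane, uint8_t *uv_plane)
{
    /* Y plane: depth is a no-op for U8 -> U8 with scale 1 and offset 0. */
    memcpy(y_plane, gray, (size_t)w * (size_t)h);

    /* UV plane: constant (128,128), i.e. zero saturation. */
    memset(uv_plane, 128, (size_t)(w / 2) * (size_t)(h / 2) * 2);
}
```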

NV12 to Grayscale

input \(\rightarrow\) Y plane \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to RGB space

input \(\rightarrow\) depth \(\rightarrow\) gray2rgb \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output

RGB space to Grayscale

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) rgb2gray \(\rightarrow\) depth \(\rightarrow\) output

RGB space to NV12

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) depth \(\rightarrow\) rgb2yuv

The rgb2yuv result is then split into the output's two planes:
- Y plane \(\rightarrow\) output
- UV plane \(\rightarrow\) 2x downsample \(\rightarrow\) output

NV12 to RGB space

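The input's two planes are first brought to the same resolution:
- input \(\rightarrow\) Y plane
- input \(\rightarrow\) UV plane \(\rightarrow\) 2x upsample

Both branches then feed the rest of the pipeline:

yuv2rgb \(\rightarrow\) depth \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output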
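The first stage of this pipeline, reading the Y plane and the nearest-neighbor upsampled UV plane for a given output pixel, comes down to the index arithmetic sketched below. It assumes the usual NV12 layout (a full-resolution Y plane plus a half-resolution plane of interleaved U,V pairs), tightly packed rows and an even width; `Yuv8` and `nv12_fetch` are hypothetical names. The fetched triple would then go through yuv2rgb, depth, swizzle and alpha.

```c
#include <stdint.h>

typedef struct { uint8_t y, u, v; } Yuv8;

/* Fetch the (Y,U,V) triple for pixel (x,y) of a w-pixel-wide NV12 image. */
static Yuv8 nv12_fetch(const uint8_t *y_plane, const uint8_t *uv_plane,
                       int w, int x, int y)
{
    /* Each UV row holds w/2 interleaved (U,V) pairs, i.e. w bytes.
     * The 2x upsample reads the pair at (floor(x/2), floor(y/2)). */
    const uint8_t *uv = uv_plane + (y / 2) * w + (x / 2) * 2;

    Yuv8 p = { y_plane[y * w + x], uv[0], uv[1] };
    return p;
}
```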

Usage

Initialization phase
1. Include the header that defines the image format converter function.
  #include <vpi/algo/ImageFormatConverter.h>
2. Define the stream on which the algorithm will be executed.
  VPIStream stream = /*...*/;
3. Define the input image. Here as an example we're creating a color image with dimensions \(w \times h\) and NV12 image type.
  VPIImage input;
  
  vpiImageCreate(w, h, VPI_IMAGE_TYPE_NV12, 0, &input);
4. Create the output image with the destination image type. In this case, we want to convert the input to 16-bit signed integer grayscale.
  VPIImage output;
  
  vpiImageCreate(w, h, VPI_IMAGE_TYPE_S16, 0, &output);
Processing phase
1. Submit the algorithm to the stream, passing the input and output images. Here we request clamping and map the range from \([0,255]\) to \([-32768,32767]\), which corresponds to scale = 257 and offset = -32768.
  vpiSubmitImageFormatConverter(stream, input, output, VPI_CONVERSION_CLAMP, 257, -32768);
2. Optionally, wait until the processing is done.
  vpiStreamSync(stream);
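The scale and offset passed above follow from requiring that the linear map \(f(x) = \text{scale} \times x + \text{offset}\) sends the ends of the input range to the ends of the output range. A small helper can derive them; `RangeParams` and `range_params` are hypothetical names used for illustration and are not part of VPI.

```c
typedef struct { float scale, offset; } RangeParams;

/* Linear map sending [in_min, in_max] onto [out_min, out_max]. */
static RangeParams range_params(float in_min, float in_max,
                                float out_min, float out_max)
{
    RangeParams p;
    p.scale  = (out_max - out_min) / (in_max - in_min);
    p.offset = out_min - p.scale * in_min;
    return p;
}
```

For \([0,255] \rightarrow [-32768,32767]\) this gives scale = 65535 / 255 = 257 and offset = -32768, the values used in the call above.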

Limitations and Constraints

PVA

Not implemented.

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.1452 ms	0.0603 ms	n/a
1920x1080	u8	u8	clamp	2	128	1.37 ms	0.1157 ms	n/a
1920x1080	u8	u16	cast	1	0	0.79 ms	0.1234 ms	n/a
1920x1080	u8	f32	cast	1	0	0.211 ms	0.1604 ms	n/a
1920x1080	u8	nv12	cast	1	0	0.749 ms	0.1186 ms	n/a
1920x1080	u8	rgb8	cast	1	0	0.4 ms	0.1929 ms	n/a
1920x1080	u8	rgba8	cast	1	0	0.2 ms	0.1620 ms	n/a
1920x1080	u16	u8	cast	1	0	0.669 ms	0.1120 ms	n/a
1920x1080	u16	u16	cast	1	0	0.322 ms	0.1230 ms	n/a
1920x1080	u16	u16	clamp	2	128	0.919 ms	0.1402 ms	n/a
1920x1080	u16	f32	cast	1	0	0.882 ms	0.1688 ms	n/a
1920x1080	u16	nv12	cast	1	0	0.737 ms	0.1253 ms	n/a
1920x1080	u16	rgb8	cast	1	0	0.40 ms	0.1961 ms	n/a
1920x1080	u16	rgba8	cast	1	0	0.3 ms	0.1675 ms	n/a
1920x1080	f32	u8	cast	1	0	0.867 ms	0.1219 ms	n/a
1920x1080	f32	u16	cast	1	0	0.694 ms	0.1368 ms	n/a
1920x1080	f32	f32	cast	1	0	0.786 ms	0.2505 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.036 ms	0.1651 ms	n/a
1920x1080	f32	nv12	cast	1	0	0.93 ms	0.1351 ms	n/a
1920x1080	f32	rgb8	cast	1	0	0.534 ms	0.2057 ms	n/a
1920x1080	f32	rgba8	cast	1	0	0.414 ms	0.1658 ms	n/a
1920x1080	nv12	u8	cast	1	0	0.449 ms	0.1057 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.84 ms	0.1237 ms	n/a
1920x1080	nv12	f32	cast	1	0	0.233 ms	0.1608 ms	n/a
1920x1080	nv12	nv12	cast	1	0	0.222 ms	0.0927 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	1.70 ms	0.1534 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	3.75 ms	0.1918 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	3.21 ms	0.1821 ms	n/a
1920x1080	rgb8	u8	cast	1	0	3.56 ms	0.1263 ms	n/a
1920x1080	rgb8	u16	cast	1	0	3.65 ms	0.1410 ms	n/a
1920x1080	rgb8	f32	cast	1	0	3.85 ms	0.1663 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	4.6 ms	0.1480 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	0.569 ms	0.1875 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	1.463 ms	0.1817 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	1.0 ms	0.2042 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	0.941 ms	0.1601 ms	n/a
1920x1080	rgba8	u8	cast	1	0	4.34 ms	0.1292 ms	n/a
1920x1080	rgba8	u16	cast	1	0	4.44 ms	0.1417 ms	n/a
1920x1080	rgba8	f32	cast	1	0	4.85 ms	0.1706 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	5.3 ms	0.1473 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	0.574 ms	0.2038 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	0.797 ms	0.2509 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	4.29 ms	0.1743 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	0.41 ms	0.1702 ms	n/a

Jetson TX2
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.553 ms	1.770 ms	n/a
1920x1080	u8	u8	clamp	2	128	2.4 ms	0.448 ms	n/a
1920x1080	u8	u16	cast	1	0	0.45 ms	0.430 ms	n/a
1920x1080	u8	f32	cast	1	0	0.645 ms	0.50 ms	n/a
1920x1080	u8	nv12	cast	1	0	1.087 ms	0.430 ms	n/a
1920x1080	u8	rgb8	cast	1	0	0.57 ms	0.56 ms	n/a
1920x1080	u8	rgba8	cast	1	0	0.701 ms	0.50 ms	n/a
1920x1080	u16	u8	cast	1	0	1.25 ms	0.431 ms	n/a
1920x1080	u16	u16	cast	1	0	1.078 ms	0.479 ms	n/a
1920x1080	u16	u16	clamp	2	128	1.68 ms	0.50 ms	n/a
1920x1080	u16	f32	cast	1	0	1.042 ms	0.53 ms	n/a
1920x1080	u16	nv12	cast	1	0	1.29 ms	0.457 ms	n/a
1920x1080	u16	rgb8	cast	1	0	1.132 ms	0.58 ms	n/a
1920x1080	u16	rgba8	cast	1	0	1.16 ms	0.53 ms	n/a
1920x1080	f32	u8	cast	1	0	1.83 ms	0.47 ms	n/a
1920x1080	f32	u16	cast	1	0	1.053 ms	0.50 ms	n/a
1920x1080	f32	f32	cast	1	0	2.060 ms	0.804 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.35 ms	0.56 ms	n/a
1920x1080	f32	nv12	cast	1	0	1.90 ms	0.496 ms	n/a
1920x1080	f32	rgb8	cast	1	0	1.20 ms	0.61 ms	n/a
1920x1080	f32	rgba8	cast	1	0	1.40 ms	0.57 ms	n/a
1920x1080	nv12	u8	cast	1	0	0.84 ms	0.405 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.438 ms	0.428 ms	n/a
1920x1080	nv12	f32	cast	1	0	0.65 ms	0.50 ms	n/a
1920x1080	nv12	nv12	cast	1	0	0.895 ms	2.572 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	3.04 ms	0.579 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	8.84 ms	0.67 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	8.6 ms	0.65 ms	n/a
1920x1080	rgb8	u8	cast	1	0	11.9 ms	0.52 ms	n/a
1920x1080	rgb8	u16	cast	1	0	13.39 ms	0.54 ms	n/a
1920x1080	rgb8	f32	cast	1	0	13.98 ms	0.60 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	13.93 ms	0.63 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	1.595 ms	2.088 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	3.34 ms	0.69 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	1.13 ms	0.61 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	1.30 ms	0.57 ms	n/a
1920x1080	rgba8	u8	cast	1	0	13.39 ms	0.50 ms	n/a
1920x1080	rgba8	u16	cast	1	0	13.59 ms	0.53 ms	n/a
1920x1080	rgba8	f32	cast	1	0	14.30 ms	0.60 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	14.08 ms	0.59 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	1.15 ms	0.60 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	2.088 ms	0.804 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	11.1 ms	0.70 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	1.310 ms	0.57 ms	n/a

Jetson Nano
size	input type	output type	conv.	scale	offset	CPU	CUDA	PVA
1920x1080	u8	u8	cast	1	0	0.679 ms	2.25 ms	n/a
1920x1080	u8	u8	clamp	2	128	5.63 ms	1.120 ms	n/a
1920x1080	u8	u16	cast	1	0	0.855 ms	1.023 ms	n/a
1920x1080	u8	f32	cast	1	0	1.156 ms	1.117 ms	n/a
1920x1080	u8	nv12	cast	1	0	1.731 ms	1.044 ms	n/a
1920x1080	u8	rgb8	cast	1	0	1.007 ms	1.303 ms	n/a
1920x1080	u8	rgba8	cast	1	0	1.112 ms	1.137 ms	n/a
1920x1080	u16	u8	cast	1	0	2.65 ms	1.029 ms	n/a
1920x1080	u16	u16	cast	1	0	1.2304 ms	0.4015 ms	n/a
1920x1080	u16	u16	clamp	2	128	4.26 ms	1.194 ms	n/a
1920x1080	u16	f32	cast	1	0	2.45 ms	1.174 ms	n/a
1920x1080	u16	nv12	cast	1	0	2.83 ms	1.086 ms	n/a
1920x1080	u16	rgb8	cast	1	0	2.43 ms	1.331 ms	n/a
1920x1080	u16	rgba8	cast	1	0	2.57 ms	1.187 ms	n/a
1920x1080	f32	u8	cast	1	0	3.199 ms	1.095 ms	n/a
1920x1080	f32	u16	cast	1	0	1.746 ms	1.145 ms	n/a
1920x1080	f32	f32	cast	1	0	2.290 ms	0.8010 ms	n/a
1920x1080	f32	f32	clamp	2	128	1.909 ms	1.239 ms	n/a
1920x1080	f32	nv12	cast	1	0	3.32 ms	1.150 ms	n/a
1920x1080	f32	rgb8	cast	1	0	2.072 ms	1.410 ms	n/a
1920x1080	f32	rgba8	cast	1	0	2.122 ms	1.278 ms	n/a
1920x1080	nv12	u8	cast	1	0	1.622 ms	0.988 ms	n/a
1920x1080	nv12	u16	cast	1	0	0.808 ms	1.024 ms	n/a
1920x1080	nv12	f32	cast	1	0	0.910 ms	1.116 ms	n/a
1920x1080	nv12	nv12	cast	1	0	0.963 ms	3.285 ms	n/a
1920x1080	nv12	nv12	clamp	2	128	6.95 ms	1.458 ms	n/a
1920x1080	nv12	rgb8	cast	1	0	15.14 ms	1.826 ms	n/a
1920x1080	nv12	rgba8	cast	1	0	14.929 ms	1.766 ms	n/a
1920x1080	rgb8	u8	cast	1	0	24.41 ms	1.331 ms	n/a
1920x1080	rgb8	u16	cast	1	0	24.667 ms	1.378 ms	n/a
1920x1080	rgb8	f32	cast	1	0	25.28 ms	1.452 ms	n/a
1920x1080	rgb8	nv12	cast	1	0	25.46 ms	1.7450 ms	n/a
1920x1080	rgb8	rgb8	cast	1	0	1.772 ms	2.48 ms	n/a
1920x1080	rgb8	rgb8	clamp	2	128	7.2 ms	1.737 ms	n/a
1920x1080	rgb8	bgr8	cast	1	0	1.88 ms	1.472 ms	n/a
1920x1080	rgb8	rgba8	cast	1	0	2.5 ms	1.332 ms	n/a
1920x1080	rgba8	u8	cast	1	0	24.67 ms	1.214 ms	n/a
1920x1080	rgba8	u16	cast	1	0	25.14 ms	1.272 ms	n/a
1920x1080	rgba8	f32	cast	1	0	25.68 ms	1.375 ms	n/a
1920x1080	rgba8	nv12	cast	1	0	25.81 ms	1.577 ms	n/a
1920x1080	rgba8	rgb8	cast	1	0	1.820 ms	1.380 ms	n/a
1920x1080	rgba8	rgba8	cast	1	0	2.323 ms	0.8022 ms	n/a
1920x1080	rgba8	rgba8	clamp	2	128	29.57 ms	1.696 ms	n/a
1920x1080	rgba8	bgra8	cast	1	0	1.892 ms	1.292 ms	n/a

VPI - Vision Programming Interface

0.2.0 Release
