VPI - Vision Programming Interface

0.3.7 Release

Image Format Converter

Overview

The Image Format Converter converts an image from one format into another. It handles color space, format, and depth conversions. The algorithm also supports input range conversion, for instance when an unsigned char \([0,255]\) image must be mapped into the signed short \([-32768,32767]\) range.

Figure: color input and grayscale output.

Implementation

The algorithm is implemented as a pixel-wise conversion function that reads in the input pixels, applies a conversion-dependent series of transformations and writes the result to the output image in the same position. User inputs are:

  • input image created with requested input type
  • output image created with requested output type
  • flags that specify how type casts are performed (see clamp/cast)
  • scale and offset to be used in range conversions (see range)

Several types of conversion are available:

  • grayscale \(\leftrightarrow\) color
  • grayscale \(\leftrightarrow\) grayscale (useful in depth and range conversions)
  • color \(\leftrightarrow\) color (e.g. YUV to RGB and vice-versa)

The grayscale (or single channel) formats available are:

The color formats available are:

The following table shows which combinations of input and output image types are available for conversion.

out \ in  U8   S8   U16  S16  F32  NV12  RGB8  RGBA8  BGR8  BGRA8  2F32
U8        yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
S8        yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
U16       yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
S16       yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
F32       yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
NV12      yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
RGB8      yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
RGBA8     yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
BGR8      yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
BGRA8     yes  yes  yes  yes  yes  yes   yes   yes    yes   yes    no
2F32      no   no   no   no   no   no    no    no     no    no     yes (1)

(1) Only available when scale == 1 and offset == 0.

Conversion Formulas

The following sections describe how an input value is converted into an output value. In general, these conversions combine color space, depth, and channel order (swizzle) changes, adding or removing an alpha channel, and down- or up-sampling. They are represented as conversion pipelines built from the basic processing blocks defined below.

Channel depth conversion

Channel depth conversion is represented by the block aptly named "depth" and is defined by the following sub-pipeline:

depth \(=\) range \(\rightarrow\) round \(\rightarrow\) clamp/cast

  • range: input is converted to floating point (fp32) and the following formula is applied:

    \[ f(x) = \text{scale} \times x + \text{offset} \]

    If scale==1 and offset==0, a shortcut is taken and no operation (not even conversion to floating point) is performed.

  • round: round to the nearest integer, with halfway cases rounded away from zero, e.g. round(0.5) == 1.0 and round(-0.5) == -1.0.
  • clamp/cast: operation controlled by the passed flags:
    • VPI_CONVERSION_CAST : cast input to output type as a regular C cast or C++'s static_cast would do. Underflows and overflows behave as described by the C specification (including undefined behavior). Use this when the input range is known to fit into the output type and maximum performance is needed.
    • VPI_CONVERSION_CLAMP : the value is clamped so that overflows and underflows map to the output type's maximum and minimum values, respectively. The result is then cast to the output type. When the output type is floating point, clamp behaves like cast.

When applied to multiple channels such as RGB, the operation is performed on each channel independently.
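As an illustration, the depth block with the clamp policy can be sketched for one specific case, 8-bit unsigned input and 16-bit signed output. The function names below are hypothetical and are not part of the VPI API; the sketch assumes finite scale and offset values and does not model the scale==1, offset==0 shortcut.

```c
#include <stdint.h>

/* Hypothetical helper: rounds to the nearest integer, halfway cases away
 * from zero (the same rule as C's roundf). Assumes v is finite. */
float round_half_away(float v)
{
    /* Floats with magnitude >= 2^23 have no fractional part. */
    if (v >= 8388608.0f || v <= -8388608.0f)
        return v;
    /* Adding +/-0.5 in double is exact here; the cast truncates toward zero. */
    double shifted = (double)v + (v >= 0.0f ? 0.5 : -0.5);
    return (float)(int32_t)shifted;
}

/* Hypothetical helper: depth = range -> round -> clamp for U8 -> S16. */
int16_t depth_u8_to_s16_clamp(uint8_t in, float scale, float offset)
{
    /* range: convert to fp32 and apply f(x) = scale * x + offset */
    float v = scale * (float)in + offset;

    /* round */
    v = round_half_away(v);

    /* clamp: saturate to the output type's limits, then cast */
    if (v < (float)INT16_MIN) v = (float)INT16_MIN;
    if (v > (float)INT16_MAX) v = (float)INT16_MAX;
    return (int16_t)v;
}
```

With scale 257 and offset -32768, an input of 0 maps to -32768 and an input of 255 maps to 32767. A cast policy would skip the two saturation checks and rely on the plain C conversion instead.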

Channel order conversion

This is represented by the following block:

swizzle

It permutes (or swizzles) the channel order of the input type. It is used in conversions from/to color spaces that can be represented with several channel orders, like RGB and BGR. The color space conversion functions assume a predetermined channel order, so channels must be reordered before those functions are applied.
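A minimal sketch of a swizzle on one interleaved pixel follows. The function name and the order-table convention are assumptions made for illustration, not VPI's internal representation.

```c
#include <stdint.h>

/* Hypothetical helper: reorders the channels of one pixel.
 * order[i] is the index of the input channel copied to output channel i.
 * The in and out buffers must not overlap. */
void swizzle(const uint8_t *in, uint8_t *out, const int *order, int channels)
{
    for (int i = 0; i < channels; ++i)
        out[i] = in[order[i]];
}
```

For example, the order table {2, 1, 0} turns a BGR pixel into RGB, and the same table converts RGB back to BGR.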

Conversion between YUV and RGB

For RGB \(\leftrightarrow\) YUV conversions, VPI uses the ITU-R BT.601 625-line specification. It's the same standard used by JPEG File Interchange Format (JFIF).

To precisely establish the conversion, let's define the following constants:

\begin{align} K_r &= 0.299 \\ K_g &= 0.587 \\ K_b &= 0.114 \\ K_{c_b} &= 1.772 \\ K_{c_r} &= 1.402 \\ \end{align}

For notation convenience, we're assuming that \(U\) and \(V\) correspond to \(C_b\) and \(C_r\) respectively. This assumption doesn't hold in general.

The conversion blocks can be defined as:

rgb2yuv

\begin{align} Y(r,g,b) &= \text{round}(r K_r + g K_g + b K_b)\big|^{255}_{0} \\ C_b(r,g,b) &= \text{round}((-r K_r - g K_g + b (1 - K_b )) / K_ {c_b} + 128)\big|^{255}_{0} \\ C_r(r,g,b) &= \text{round}((r (1-K_r) - g K_g - b K_b) / K_{c_r} + 128)\big|^{255}_{0} \end{align}

These functions expect \(r,g,b \in [0,255]\)
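The rgb2yuv formulas can be sketched in C as below. The function and constant names are hypothetical, and the arithmetic is done in double precision for simplicity, which is an assumption: an implementation working in fp32 could differ in exact halfway cases.

```c
#include <stdint.h>

/* Constants from the BT.601 definition above. */
static const double KR = 0.299, KG = 0.587, KB = 0.114;
static const double KCB = 1.772, KCR = 1.402;

/* Hypothetical helper: round half away from zero, then clamp to [0,255]. */
uint8_t round_clamp_u8(double v)
{
    long r = (long)(v + (v >= 0.0 ? 0.5 : -0.5));
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return (uint8_t)r;
}

/* Hypothetical helper: converts one RGB pixel to Y, Cb, Cr. */
void rgb2yuv(uint8_t r, uint8_t g, uint8_t b,
             uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = round_clamp_u8(r * KR + g * KG + b * KB);
    *cb = round_clamp_u8((-r * KR - g * KG + b * (1.0 - KB)) / KCB + 128.0);
    *cr = round_clamp_u8((r * (1.0 - KR) - g * KG - b * KB) / KCR + 128.0);
}
```

Pure red (255,0,0) gives Y = 76 and Cb = 85, while Cr evaluates to about 255.5 before clamping and therefore saturates at 255. Any gray input yields Cb = Cr = 128.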

yuv2rgb

\begin{align} R(y,c_b,c_r) &= \text{round}(y+K_{c_r}(c_r-128))\big|^{255}_{0} \\ G(y,c_b,c_r) &= \text{round}(y-[K_b K_{c_b} (c_b-128) + K_r K_{c_r} (c_r-128)] / K_g)\big|^{255}_{0} \\ B(y,c_b,c_r) &= \text{round}(y + K_{c_b} (c_b - 128))\big|^{255}_{0} \end{align}

These functions expect \(y,c_b,c_r \in [0,255]\)

The notation \(X\big|^{N} _ {M} \) represents clamping X's underflows and overflows to M and N respectively.

The round function follows the definition given for the round block in Channel depth conversion.
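The yuv2rgb formulas admit a similar sketch. Names are hypothetical and double precision is assumed for simplicity; the rounding helper applies the round-half-away-from-zero rule before clamping to \([0,255]\).

```c
#include <stdint.h>

/* Constants from the BT.601 definition above. */
static const double KR = 0.299, KG = 0.587, KB = 0.114;
static const double KCB = 1.772, KCR = 1.402;

/* Hypothetical helper: round half away from zero, then clamp to [0,255]. */
uint8_t round_clamp_u8(double v)
{
    long r = (long)(v + (v >= 0.0 ? 0.5 : -0.5));
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return (uint8_t)r;
}

/* Hypothetical helper: converts one Y, Cb, Cr triple to RGB. */
void yuv2rgb(uint8_t y, uint8_t cb, uint8_t cr,
             uint8_t *r, uint8_t *g, uint8_t *b)
{
    double u = cb - 128.0; /* centered Cb */
    double v = cr - 128.0; /* centered Cr */

    *r = round_clamp_u8(y + KCR * v);
    *g = round_clamp_u8(y - (KB * KCB * u + KR * KCR * v) / KG);
    *b = round_clamp_u8(y + KCB * u);
}
```

The triple (76, 85, 255), which is what pure red maps to under rgb2yuv, converts back to (254, 0, 0) rather than (255, 0, 0), because Cr was saturated at 255 on the forward pass.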

Conversion between RGB and Grayscale

Conversion from RGB to grayscale follows the same specification used for conversion from RGB to YUV, but returns only the luma component. It therefore uses the constants defined in Conversion between YUV and RGB.

rgb2gray

\[ Y(r,g,b) = \text{round}(K_r \times r + K_g \times g + K_b \times b)\big|^{255}_{0} \]

For grayscale to RGB the conversion is simply:

gray2rgb

\[ f(x) = (x,x,x) \]
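Both blocks are small enough to sketch together. Function names are hypothetical, and double precision is an assumption made for simplicity.

```c
#include <stdint.h>

/* Hypothetical helper: luma of one RGB pixel, rounded half away from zero
 * and clamped to [0,255]. The weighted sum is never negative here. */
uint8_t rgb2gray(uint8_t r, uint8_t g, uint8_t b)
{
    double luma = 0.299 * r + 0.587 * g + 0.114 * b;
    long rounded = (long)(luma + 0.5);
    if (rounded > 255) rounded = 255;
    return (uint8_t)rounded;
}

/* Hypothetical helper: replicates a gray value into three channels. */
void gray2rgb(uint8_t x, uint8_t rgb[3])
{
    rgb[0] = x;
    rgb[1] = x;
    rgb[2] = x;
}
```

Pure red, green and blue map to 76, 150 and 29 respectively, which reflects the relative weight of each channel in the luma.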

Up-/Down-sampling

For image formats that include subsampled planes, like VPI_IMAGE_TYPE_NV12, the following block definitions are needed:

2x downsample

\[ D[x,y] = S[2x,2y] \]

2x upsample

\[ D[x,y] = S[\lfloor x/2 \rfloor, \lfloor y/2 \rfloor] \]

Note
VPI is effectively upsampling using nearest-neighbor sampling. In a future version it'll use bilinear upsampling.
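The two sampling blocks can be sketched on a single-channel plane as follows. The function names are hypothetical, and the planes are assumed to be row-major and tightly packed (no row padding), which real image buffers need not be.

```c
#include <stdint.h>

/* Hypothetical helper: D[x,y] = S[2x,2y].
 * dst must hold (src_w / 2) * (src_h / 2) samples. */
void downsample2x(const uint8_t *src, int src_w, int src_h, uint8_t *dst)
{
    int dst_w = src_w / 2;
    int dst_h = src_h / 2;
    for (int y = 0; y < dst_h; ++y)
        for (int x = 0; x < dst_w; ++x)
            dst[y * dst_w + x] = src[(2 * y) * src_w + 2 * x];
}

/* Hypothetical helper: D[x,y] = S[floor(x/2), floor(y/2)] (nearest neighbor).
 * dst must hold (2 * src_w) * (2 * src_h) samples. */
void upsample2x(const uint8_t *src, int src_w, int src_h, uint8_t *dst)
{
    int dst_w = 2 * src_w;
    int dst_h = 2 * src_h;
    for (int y = 0; y < dst_h; ++y)
        for (int x = 0; x < dst_w; ++x)
            dst[y * dst_w + x] = src[(y / 2) * src_w + x / 2];
}
```

Downsampling keeps the top-left sample of every 2x2 block and drops the other three, so a downsample followed by an upsample does not in general restore the original plane.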

Alpha Channel Handling

Depending on input and output pixel type, i.e. whether it's required to remove or add an alpha channel, the following block might be used:

alpha
  • add alpha: append an opaque alpha channel to input pixel, e.g. RGB becomes RGBA. For integral channel types, the new alpha channel's value is the maximum representable of its type, e.g. 255 for 8-bit unsigned integer. Currently VPI doesn't support alpha channel on other channel types.
  • remove alpha: simply discards the alpha channel, e.g. RGBA becomes RGB.
  • do nothing: when input and output don't have alpha channel.
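For 8-bit unsigned channels, the add and remove operations can be sketched as below. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: appends an opaque alpha channel.
 * For 8-bit unsigned channels the opaque value is 255. */
void add_alpha_u8(const uint8_t rgb[3], uint8_t rgba[4])
{
    rgba[0] = rgb[0];
    rgba[1] = rgb[1];
    rgba[2] = rgb[2];
    rgba[3] = UINT8_MAX;
}

/* Hypothetical helper: discards the alpha channel. */
void remove_alpha_u8(const uint8_t rgba[4], uint8_t rgb[3])
{
    rgb[0] = rgba[0];
    rgb[1] = rgba[1];
    rgb[2] = rgba[2];
}
```

Removing alpha is lossy: adding it back afterwards always yields an opaque pixel, whatever the original alpha value was.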

Conversion Pipelines

This section defines how an input pixel is converted to an output pixel. It uses the basic conversion blocks defined in the previous section.

Grayscale from/to Grayscale

input \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to NV12

input \(\rightarrow\) depth \(\rightarrow\) Y plane \(\searrow\)
(128,128) \(\rightarrow\) UV plane \(\nearrow\)
output

Note
Since NV12's pixel depth is 8-bit unsigned, \((u,v) = (128,128)\) corresponds to zero saturation.
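A sketch of this pipeline for an 8-bit unsigned grayscale input with scale 1 and offset 0, where the depth block reduces to a copy. The function name is hypothetical; the sketch assumes even width and height, tightly packed planes, and the usual NV12 layout of a full-resolution Y plane followed by a half-resolution plane of interleaved U,V pairs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills the two NV12 planes from a U8 grayscale image.
 * y_plane holds w * h bytes; uv_plane holds (w / 2) * (h / 2) * 2 bytes. */
void gray_u8_to_nv12(const uint8_t *gray, int w, int h,
                     uint8_t *y_plane, uint8_t *uv_plane)
{
    /* Y plane: the gray values become the luma unchanged. */
    memcpy(y_plane, gray, (size_t)w * (size_t)h);

    /* UV plane: every (u,v) pair is (128,128), i.e. zero saturation. */
    memset(uv_plane, 128, (size_t)(w / 2) * (size_t)(h / 2) * 2);
}
```

A 4x2 input therefore produces 8 luma bytes and 4 chroma bytes, all of the latter equal to 128.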

NV12 to Grayscale

input \(\rightarrow\) Y plane \(\rightarrow\) depth \(\rightarrow\) output

Grayscale to RGB space

input \(\rightarrow\) depth \(\rightarrow\) gray2rgb \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output

RGB space to Grayscale

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) rgb2gray \(\rightarrow\) depth \(\rightarrow\) output

RGB space to NV12

input \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) depth \(\rightarrow\) rgb2yuv
\(\nearrow\) Y plane \(\searrow\)
\(\searrow\) UV plane \(\rightarrow\) 2x downsample \(\nearrow\)
output

NV12 to RGB space

input
\(\nearrow\) Y plane \(\searrow\)
\(\searrow\) UV plane \(\rightarrow\) 2x upsample \(\nearrow\)
yuv2rgb \(\rightarrow\) depth \(\rightarrow\) swizzle \(\rightarrow\) alpha \(\rightarrow\) output

Usage

  1. Initialization phase
    1. Include the header that defines the image format converter function (ImageFormatConverter.h).
    2. Define the stream on which the algorithm will be executed.
      VPIStream stream = /*...*/;
    3. Define the input image. Here, as an example, we're creating a color image with dimensions \(w \times h\) and NV12 image type.
      VPIImage input;
      vpiImageCreate(w, h, VPI_IMAGE_TYPE_NV12, 0, &input);
    4. Create the output image with the destination image type. In this case, we want to convert the input to 16-bit signed integer grayscale.
      VPIImage output;
      vpiImageCreate(w, h, VPI_IMAGE_TYPE_S16, 0, &output);
  2. Processing phase
    1. Submit the algorithm to the stream along with the input and output images, requesting clamping and mapping the range from \([0,255]\) to \([-32768,32767]\).
      vpiSubmitImageFormatConverter(stream, input, output, VPI_CONVERSION_CLAMP, 257, -32768);
    2. Optionally, wait until the processing is done.
      vpiStreamSync(stream);
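The scale and offset used in the submit call (257 and -32768) follow from requiring that the linear map \(f(x) = \text{scale} \times x + \text{offset}\) sends the input range's endpoints to the output range's endpoints. A small sketch of that derivation follows; the helper is hypothetical and not part of VPI.

```c
/* Hypothetical helper: finds scale and offset such that
 *   scale * in_min + offset == out_min
 *   scale * in_max + offset == out_max
 * Requires in_max != in_min. */
void range_map(float in_min, float in_max, float out_min, float out_max,
               float *scale, float *offset)
{
    *scale  = (out_max - out_min) / (in_max - in_min);
    *offset = out_min - *scale * in_min;
}
```

For \([0,255] \rightarrow [-32768,32767]\) this gives scale = 65535 / 255 = 257 and offset = -32768.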

For more details, consult the API reference.

Limitations and Constraints

PVA

  • Not implemented.

Performance

For further information on how performance was benchmarked, see Performance Measurement.

Jetson AGX Xavier
size       input type  output type  conv.  scale  offset  CPU        CUDA        PVA
1920x1080  u8          u8           cast   1      0       0.188 ms   0.135 ms    n/a
1920x1080  u8          u8           clamp  2      128     1.38 ms    0.1132 ms   n/a
1920x1080  u8          u16          cast   1      0       0.8 ms     0.1225 ms   n/a
1920x1080  u8          f32          cast   1      0       0.24 ms    0.1602 ms   n/a
1920x1080  u8          nv12         cast   1      0       0.749 ms   0.1168 ms   n/a
1920x1080  u8          rgb8         cast   1      0       0.3 ms     0.1898 ms   n/a
1920x1080  u8          rgba8        cast   1      0       0.378 ms   0.1612 ms   n/a
1920x1080  u16         u8           cast   1      0       0.677 ms   0.1104 ms   n/a
1920x1080  u16         u16          cast   1      0       0.3518 ms  0.193 ms    n/a
1920x1080  u16         u16          clamp  2      128     0.923 ms   0.1402 ms   n/a
1920x1080  u16         f32          cast   1      0       0.886 ms   0.1669 ms   n/a
1920x1080  u16         nv12         cast   1      0       0.758 ms   0.1237 ms   n/a
1920x1080  u16         rgb8         cast   1      0       0.3 ms     0.1931 ms   n/a
1920x1080  u16         rgba8        cast   1      0       0.29 ms    0.1666 ms   n/a
1920x1080  f32         u8           cast   1      0       0.883 ms   0.1203 ms   n/a
1920x1080  f32         u16          cast   1      0       0.684 ms   0.1345 ms   n/a
1920x1080  f32         f32          cast   1      0       0.73 ms    0.3035 ms   n/a
1920x1080  f32         f32          clamp  2      128     1.029 ms   0.1633 ms   n/a
1920x1080  f32         nv12         cast   1      0       0.980 ms   0.1330 ms   n/a
1920x1080  f32         rgb8         cast   1      0       0.345 ms   0.2038 ms   n/a
1920x1080  f32         rgba8        cast   1      0       0.42 ms    0.1642 ms   n/a
1920x1080  nv12        u8           cast   1      0       0.611 ms   0.1041 ms   n/a
1920x1080  nv12        u16          cast   1      0       0.8 ms     0.1225 ms   n/a
1920x1080  nv12        f32          cast   1      0       0.221 ms   0.1587 ms   n/a
1920x1080  nv12        nv12         cast   1      0       0.25 ms    0.163 ms    n/a
1920x1080  nv12        nv12         clamp  2      128     1.69 ms    0.1506 ms   n/a
1920x1080  nv12        rgb8         cast   1      0       3.6 ms     0.1890 ms   n/a
1920x1080  nv12        rgba8        cast   1      0       3.1 ms     0.1792 ms   n/a
1920x1080  rgb8        u8           cast   1      0       3.549 ms   0.1240 ms   n/a
1920x1080  rgb8        u16          cast   1      0       3.64 ms    0.1407 ms   n/a
1920x1080  rgb8        f32          cast   1      0       3.93 ms    0.1645 ms   n/a
1920x1080  rgb8        nv12         cast   1      0       6.6 ms     0.1453 ms   n/a
1920x1080  rgb8        rgb8         cast   1      0       0.586 ms   0.2488 ms   n/a
1920x1080  rgb8        rgb8         clamp  2      128     1.456 ms   0.18074 ms  n/a
1920x1080  rgb8        bgr8         cast   1      0       0.324 ms   0.2022 ms   n/a
1920x1080  rgb8        rgba8        cast   1      0       0.354 ms   0.1582 ms   n/a
1920x1080  rgba8       u8           cast   1      0       4.34 ms    0.1267 ms   n/a
1920x1080  rgba8       u16          cast   1      0       4.45 ms    0.1409 ms   n/a
1920x1080  rgba8       f32          cast   1      0       4.84 ms    0.1683 ms   n/a
1920x1080  rgba8       nv12         cast   1      0       7.4 ms     0.1447 ms   n/a
1920x1080  rgba8       rgb8         cast   1      0       0.346 ms   0.2019 ms   n/a
1920x1080  rgba8       rgba8        cast   1      0       0.773 ms   0.303 ms    n/a
1920x1080  rgba8       rgba8        clamp  2      128     4.19 ms    0.1714 ms   n/a
1920x1080  rgba8       bgra8        cast   1      0       0.4 ms     0.1683 ms   n/a

Jetson TX2
size       input type  output type  conv.  scale  offset  CPU        CUDA        PVA
1920x1080  u8          u8           cast   1      0       0.585 ms   1.769 ms    n/a
1920x1080  u8          u8           clamp  2      128     2.5 ms     0.445 ms    n/a
1920x1080  u8          u16          cast   1      0       0.47 ms    0.424 ms    n/a
1920x1080  u8          f32          cast   1      0       0.64 ms    0.498 ms    n/a
1920x1080  u8          nv12         cast   1      0       0.869 ms   0.430 ms    n/a
1920x1080  u8          rgb8         cast   1      0       0.50 ms    0.555 ms    n/a
1920x1080  u8          rgba8        cast   1      0       0.708 ms   0.496 ms    n/a
1920x1080  u16         u8           cast   1      0       1.17 ms    0.428 ms    n/a
1920x1080  u16         u16          cast   1      0       1.089 ms   0.623 ms    n/a
1920x1080  u16         u16          clamp  2      128     1.67 ms    0.489 ms    n/a
1920x1080  u16         f32          cast   1      0       1.032 ms   0.521 ms    n/a
1920x1080  u16         nv12         cast   1      0       1.243 ms   0.450 ms    n/a
1920x1080  u16         rgb8         cast   1      0       1.14 ms    0.572 ms    n/a
1920x1080  u16         rgba8        cast   1      0       1.15 ms    0.521 ms    n/a
1920x1080  f32         u8           cast   1      0       1.83 ms    0.461 ms    n/a
1920x1080  f32         u16          cast   1      0       1.049 ms   0.491 ms    n/a
1920x1080  f32         f32          cast   1      0       2.083 ms   1.116 ms    n/a
1920x1080  f32         f32          clamp  2      128     1.35 ms    0.549 ms    n/a
1920x1080  f32         nv12         cast   1      0       1.90 ms    0.491 ms    n/a
1920x1080  f32         rgb8         cast   1      0       1.31 ms    0.61 ms     n/a
1920x1080  f32         rgba8        cast   1      0       1.41 ms    0.56 ms     n/a
1920x1080  nv12        u8           cast   1      0       0.80 ms    0.400 ms    n/a
1920x1080  nv12        u16          cast   1      0       0.448 ms   0.428 ms    n/a
1920x1080  nv12        f32          cast   1      0       0.637 ms   0.499 ms    n/a
1920x1080  nv12        nv12         cast   1      0       0.981 ms   2.575 ms    n/a
1920x1080  nv12        nv12         clamp  2      128     3.01 ms    0.577 ms    n/a
1920x1080  nv12        rgb8         cast   1      0       8.70 ms    0.661 ms    n/a
1920x1080  nv12        rgba8        cast   1      0       8.7 ms     0.64 ms     n/a
1920x1080  rgb8        u8           cast   1      0       11.7 ms    0.510 ms    n/a
1920x1080  rgb8        u16          cast   1      0       11.9 ms    0.534 ms    n/a
1920x1080  rgb8        f32          cast   1      0       12.7 ms    0.60 ms     n/a
1920x1080  rgb8        nv12         cast   1      0       13.3 ms    0.65 ms     n/a
1920x1080  rgb8        rgb8         cast   1      0       1.577 ms   2.319 ms    n/a
1920x1080  rgb8        rgb8         clamp  2      128     2.91 ms    0.67 ms     n/a
1920x1080  rgb8        bgr8         cast   1      0       1.22 ms    0.599 ms    n/a
1920x1080  rgb8        rgba8        cast   1      0       1.33 ms    0.57 ms     n/a
1920x1080  rgba8       u8           cast   1      0       11.8 ms    0.497 ms    n/a
1920x1080  rgba8       u16          cast   1      0       11.9 ms    0.523 ms    n/a
1920x1080  rgba8       f32          cast   1      0       13.62 ms   0.593 ms    n/a
1920x1080  rgba8       nv12         cast   1      0       13.6 ms    0.588 ms    n/a
1920x1080  rgba8       rgb8         cast   1      0       1.19 ms    0.60 ms     n/a
1920x1080  rgba8       rgba8        cast   1      0       2.091 ms   1.126 ms    n/a
1920x1080  rgba8       rgba8        clamp  2      128     15.56 ms   0.69 ms     n/a
1920x1080  rgba8       bgra8        cast   1      0       1.32 ms    0.561 ms    n/a

Jetson Nano
size       input type  output type  conv.  scale  offset  CPU        CUDA        PVA
1920x1080  u8          u8           cast   1      0       0.58 ms    2.429 ms    n/a
1920x1080  u8          u8           clamp  2      128     5.511 ms   1.122 ms    n/a
1920x1080  u8          u16          cast   1      0       0.815 ms   1.023 ms    n/a
1920x1080  u8          f32          cast   1      0       1.170 ms   1.115 ms    n/a
1920x1080  u8          nv12         cast   1      0       1.744 ms   1.046 ms    n/a
1920x1080  u8          rgb8         cast   1      0       0.949 ms   1.307 ms    n/a
1920x1080  u8          rgba8        cast   1      0       1.116 ms   1.134 ms    n/a
1920x1080  u16         u8           cast   1      0       2.69 ms    1.030 ms    n/a
1920x1080  u16         u16          cast   1      0       1.1713 ms  0.7099 ms   n/a
1920x1080  u16         u16          clamp  2      128     4.26 ms    1.195 ms    n/a
1920x1080  u16         f32          cast   1      0       2.02 ms    1.175 ms    n/a
1920x1080  u16         nv12         cast   1      0       2.78 ms    1.089 ms    n/a
1920x1080  u16         rgb8         cast   1      0       2.503 ms   1.335 ms    n/a
1920x1080  u16         rgba8        cast   1      0       2.59 ms    1.184 ms    n/a
1920x1080  f32         u8           cast   1      0       3.16 ms    1.096 ms    n/a
1920x1080  f32         u16          cast   1      0       1.755 ms   1.146 ms    n/a
1920x1080  f32         f32          cast   1      0       2.341 ms   1.2967 ms   n/a
1920x1080  f32         f32          clamp  2      128     1.901 ms   1.250 ms    n/a
1920x1080  f32         nv12         cast   1      0       3.27 ms    1.151 ms    n/a
1920x1080  f32         rgb8         cast   1      0       2.079 ms   1.407 ms    n/a
1920x1080  f32         rgba8        cast   1      0       2.134 ms   1.279 ms    n/a
1920x1080  nv12        u8           cast   1      0       1.628 ms   0.989 ms    n/a
1920x1080  nv12        u16          cast   1      0       0.825 ms   1.022 ms    n/a
1920x1080  nv12        f32          cast   1      0       1.167 ms   1.116 ms    n/a
1920x1080  nv12        nv12         cast   1      0       1.028 ms   3.439 ms    n/a
1920x1080  nv12        nv12         clamp  2      128     6.85 ms    1.457 ms    n/a
1920x1080  nv12        rgb8         cast   1      0       15.17 ms   1.826 ms    n/a
1920x1080  nv12        rgba8        cast   1      0       14.96 ms   1.765 ms    n/a
1920x1080  rgb8        u8           cast   1      0       24.44 ms   1.334 ms    n/a
1920x1080  rgb8        u16          cast   1      0       24.72 ms   1.379 ms    n/a
1920x1080  rgb8        f32          cast   1      0       25.34 ms   1.448 ms    n/a
1920x1080  rgb8        nv12         cast   1      0       25.54 ms   1.7445 ms   n/a
1920x1080  rgb8        rgb8         cast   1      0       1.7994 ms  3.057 ms    n/a
1920x1080  rgb8        rgb8         clamp  2      128     6.0 ms     1.738 ms    n/a
1920x1080  rgb8        bgr8         cast   1      0       2.67 ms    1.474 ms    n/a
1920x1080  rgb8        rgba8        cast   1      0       2.70 ms    1.337 ms    n/a
1920x1080  rgba8       u8           cast   1      0       24.66 ms   1.213 ms    n/a
1920x1080  rgba8       u16          cast   1      0       25.04 ms   1.275 ms    n/a
1920x1080  rgba8       f32          cast   1      0       25.65 ms   1.378 ms    n/a
1920x1080  rgba8       nv12         cast   1      0       25.69 ms   1.577 ms    n/a
1920x1080  rgba8       rgb8         cast   1      0       1.826 ms   1.382 ms    n/a
1920x1080  rgba8       rgba8        cast   1      0       2.321 ms   1.3002 ms   n/a
1920x1080  rgba8       rgba8        clamp  2      128     29.55 ms   1.701 ms    n/a
1920x1080  rgba8       bgra8        cast   1      0       1.920 ms   1.294 ms    n/a