Data Format Descriptions

TensorRT supports different data formats. There are two aspects to consider: data type and layout.

Data Type Format

The data type is the representation of each value. Its size determines the range of values and the precision of the representation. The supported data types are:

  • FP32 (32-bit floating point or single precision)

  • FP16 (16-bit floating point or half precision)

  • BF16 (1-bit sign, 8-bit exponent, 7-bit mantissa)

  • FP8 (1-bit sign, 4-bit exponent, 3-bit mantissa)

  • INT64 (64-bit integer)

  • INT32 (32-bit integer)

  • INT8 (8-bit integer)

  • UINT8 (unsigned 8-bit integer)

  • INT4 (4-bit integer)
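To make the effect of element size concrete, the following is a minimal sketch that estimates the storage needed for a densely packed tensor in several of the types listed above. The helper name `dense_size_bytes` and the lookup table are illustrative and not part of the TensorRT API. The sketch assumes no layout padding and that two INT4 values share one byte.

```python
from math import prod

# Bits per element for each data type in the list above.
BITS_PER_ELEMENT = {
    "FP32": 32, "FP16": 16, "BF16": 16, "FP8": 8,
    "INT64": 64, "INT32": 32, "INT8": 8, "UINT8": 8, "INT4": 4,
}

def dense_size_bytes(shape, dtype):
    """Bytes for a densely packed tensor (hypothetical helper, no layout padding)."""
    total_bits = prod(shape) * BITS_PER_ELEMENT[dtype]
    # Round up to whole bytes; this only matters for sub-byte types such as INT4.
    return (total_bits + 7) // 8

shape = (1, 3, 224, 224)  # N, C, H, W
for dtype in ("FP32", "FP16", "INT8", "INT4"):
    print(dtype, dense_size_bytes(shape, dtype))
```

Halving the element width halves the memory footprint, which is one reason reduced-precision types are attractive for inference.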

Layout Format

The layout format determines the order in which values are stored. Typically, batch dimensions are the leftmost dimensions, and the other dimensions describe aspects of each data item; for images, C is the channel, H is the height, and W is the width. Ignoring batch dimensions, which always precede these, C, H, and W are typically ordered as CHW or HWC.

Table 20. TensorFormat Enum Quick Reference

| Format Name | TensorFormat Enum | Description |
|---|---|---|
| Linear (row-major) | kLINEAR | Default CHW ordering with no vectorization. |
| NC/2HW2 | kCHW2 | Channel pairs packed per HxW element (FP16, BF16). |
| NC/4HW4 | kCHW4 | 4-channel vectors per HxW element (INT8). |
| NHWC8 | kHWC8 | 8-channel vectors in HWC order (FP16, BF16). |
| NC/16HW16 | kCHW16 | 16-channel vectors (DLA FP16). |
| NC/32HW32 | kCHW32 | 32-channel vectors (FP32, FP16, INT8). |
| NDHWC8 | kDHWC8 | 3D variant of kHWC8 (FP16, BF16). |
| NC/32DHW32 | kCDHW32 | 3D variant of kCHW32 (FP16, INT8). |
| NHWC | kHWC | Channel-last without vectorization (FP32, UINT8). |
| DLA Linear | kDLA_LINEAR | DLA-specific row-major (FP16, INT8). |
| DLA HWC4 | kDLA_HWC4 | DLA-specific 4-channel vectors (FP16, INT8). |
| NHWC16 | kHWC16 | 16-channel vectors in HWC order (FP16, INT8, FP8). |
| NDHWC | kDHWC | 3D channel-last without vectorization (FP32). |

For supported data type combinations per format, refer to the I/O Formats table in the Advanced Topics section.
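As a rough illustration of what the vector widths in the table imply for memory layout, the sketch below rounds a channel count up to a whole number of channel vectors. The dictionary and the helper `padded_channels` are hypothetical and not part of the TensorRT API. The widths are read from the format names in the table, the DLA-specific formats are omitted, and the sketch assumes that every vectorized format pads C to a multiple of its vector width.

```python
# Channel-vector width per format, read from the quick-reference table.
# Formats without vectorization are given a width of 1 (an illustrative convention).
VECTOR_WIDTH = {
    "kLINEAR": 1, "kCHW2": 2, "kCHW4": 4, "kHWC8": 8, "kCHW16": 16,
    "kCHW32": 32, "kDHWC8": 8, "kCDHW32": 32, "kHWC": 1, "kHWC16": 16,
    "kDHWC": 1,
}

def padded_channels(c, fmt):
    """Round the channel count up to a multiple of the format's vector width."""
    v = VECTOR_WIDTH[fmt]
    return -(-c // v) * v  # ceiling division, then scale back up

for fmt in ("kLINEAR", "kCHW4", "kHWC8", "kCHW32"):
    print(fmt, padded_channels(3, fmt))
```

For a 3-channel image, wide vector formats leave most of each vector unused, so they tend to pay off when the channel count is close to a multiple of the vector width.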

In the CHW layout, the image is divided into HxW matrices, one per channel, and the matrices are stored in sequence; all the values of a channel are stored contiguously.

Layout Format for CHW

In the HWC layout, the image is stored as a single HxW matrix whose entries are C-tuples, with one value per channel; all the values of a point (pixel) are stored contiguously.

Layout format for HWC
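The difference between the two layouts can be expressed as index arithmetic. The following sketch, which ignores the batch dimension, computes the linear offset of element (c, h, w) under each layout and lists the resulting storage order. The function names are illustrative only.

```python
from itertools import product

def chw_offset(c, h, w, C, H, W):
    # One HxW matrix per channel, with the matrices stored one after another.
    return (c * H + h) * W + w

def hwc_offset(c, h, w, C, H, W):
    # One HxW matrix of C-tuples; the channels of a pixel are adjacent.
    return (h * W + w) * C + c

def storage_order(offset_fn, C, H, W):
    """Return all (c, h, w) coordinates sorted by their position in memory."""
    coords = product(range(C), range(H), range(W))
    return sorted(coords, key=lambda chw: offset_fn(*chw, C, H, W))

C, H, W = 2, 1, 2
print("CHW:", storage_order(chw_offset, C, H, W))
print("HWC:", storage_order(hwc_offset, C, H, W))
```

In the CHW order the channel index changes slowest, while in the HWC order it changes fastest.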

Additional formats pack channel values together and use reduced precision to enable faster computation. For this reason, TensorRT also supports formats such as NC/2HW2 and NHWC8.

In NC/2HW2 (TensorFormat::kCHW2), pairs of channel values are packed together in each HxW matrix (with an empty value in the case of an odd number of channels). The result is a format in which the values of the ⌈C/2⌉ HxW matrices are pairs of values of two consecutive channels. Note that this ordering interleaves the channel dimension: values of two channels in the same pair are at stride 1, while corresponding values in consecutive pairs are at stride 2xHxW.

Values of ⌈C/2⌉ HxW Matrices are Pairs of Values of Two Consecutive Channels
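The pairing rule can be sketched as an offset computation. In the example below, which ignores the batch dimension, `chw2_offset` and `pack_chw2` are hypothetical helpers rather than TensorRT functions, and the value 0 used for the empty slot of an odd channel count is an assumption.

```python
def chw2_offset(c, h, w, C, H, W):
    # Channel pair index selects the HxW matrix; c % 2 selects the slot in the pair.
    return ((c // 2) * H * W + h * W + w) * 2 + c % 2

def pack_chw2(chw, pad=0):
    """Repack a nested [C][H][W] list into a flat list in CHW2 order."""
    C, H, W = len(chw), len(chw[0]), len(chw[0][0])
    pairs = (C + 1) // 2  # ceil(C / 2) matrices of channel pairs
    out = [pad] * (pairs * H * W * 2)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                out[chw2_offset(c, h, w, C, H, W)] = chw[c][h][w]
    return out

# Three channels, H=1, W=2; the fourth channel slot is padding.
image = [[[10, 11]], [[20, 21]], [[30, 31]]]
print(pack_chw2(image))  # [10, 20, 11, 21, 30, 0, 31, 0]
```

Channels 0 and 1 sit next to each other (stride 1), whereas channel 2 starts 2xHxW = 4 elements after channel 0.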

In NHWC8 (TensorFormat::kHWC8), the entries of an HxW matrix include the values of all the channels. In addition, these values are packed together in ⌈C/8⌉ 8-tuples, and C is rounded up to the nearest multiple of 8.

In NHWC8 Format, the Entries of an HxW Matrix Include the Values of all the Channels
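A comparable sketch for NHWC8, again ignoring the batch dimension: each pixel stores its channels contiguously, and the per-pixel stride is C rounded up to a multiple of 8. The helpers `hwc8_offset` and `pack_hwc8` are illustrative names, and filling the unused channel slots with 0 is an assumption.

```python
def hwc8_offset(c, h, w, C, H, W):
    c_padded = -(-C // 8) * 8  # C rounded up to the nearest multiple of 8
    return (h * W + w) * c_padded + c

def pack_hwc8(chw, pad=0):
    """Repack a nested [C][H][W] list into a flat list in HWC8 order."""
    C, H, W = len(chw), len(chw[0]), len(chw[0][0])
    c_padded = -(-C // 8) * 8
    out = [pad] * (H * W * c_padded)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                out[hwc8_offset(c, h, w, C, H, W)] = chw[c][h][w]
    return out

# Three channels, H=1, W=2: each pixel occupies one 8-tuple.
image = [[[1, 4]], [[2, 5]], [[3, 6]]]
packed = pack_hwc8(image)
print(packed[:8])   # first pixel:  [1, 2, 3, 0, 0, 0, 0, 0]
print(packed[8:])   # second pixel: [4, 5, 6, 0, 0, 0, 0, 0]
```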

Other TensorFormat values follow rules similar to those described above for TensorFormat::kCHW2 and TensorFormat::kHWC8.

See also

Understanding Formats Printed in Logs

How to interpret tensor format and stride information in TensorRT log output.

Glossary

Definitions of TensorRT terms including tensor formats and precision types.