Data Format Descriptions

TensorRT supports different data formats. There are two aspects to consider: data type and layout.

Data Type Format

The data type is the representation of each value. Its size determines the range of values and the precision of the representation. The supported data types are:

FP32 (32-bit floating point or single precision)
FP16 (16-bit floating point or half precision)
BF16 (1-bit sign, 8-bit exponent, 7-bit mantissa)
FP8 (1-bit sign, 4-bit exponent, 3-bit mantissa)
INT64 (64-bit integer)
INT32 (32-bit integer)
INT8 (8-bit integer)
UINT8 (unsigned 8-bit integer)
INT4 (4-bit integer)
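
To make the relationship between size, range, and precision concrete, the following is a minimal sketch in plain Python. The BF16 and FP8 field widths come from the list above; the FP32 and FP16 field widths are the standard IEEE 754 binary32 and binary16 layouts. The helper names (`FLOAT_FIELDS`, `total_bits`, `signed_int_range`, `unsigned_int_range`) are illustrative and not part of the TensorRT API, and the integer ranges assume two's-complement encoding for the signed types.

```python
# (sign bits, exponent bits, mantissa bits) for each floating-point type.
FLOAT_FIELDS = {
    "FP32": (1, 8, 23),  # IEEE 754 binary32
    "FP16": (1, 5, 10),  # IEEE 754 binary16
    "BF16": (1, 8, 7),   # same exponent width as FP32, shorter mantissa
    "FP8": (1, 4, 3),    # as listed above
}


def total_bits(name):
    """Total storage width in bits of a floating-point type."""
    return sum(FLOAT_FIELDS[name])


def signed_int_range(bits):
    """(min, max) of a signed integer, assuming two's-complement encoding."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_int_range(bits):
    """(min, max) of an unsigned integer of the given width."""
    return 0, (1 << bits) - 1


if __name__ == "__main__":
    for name in FLOAT_FIELDS:
        print(name, total_bits(name), "bits")
    print("INT8 ", signed_int_range(8))
    print("UINT8", unsigned_int_range(8))
    print("INT4 ", signed_int_range(4))
```

BF16 keeps the 8-bit exponent of FP32, so it covers a similar range of magnitudes with less precision, whereas FP16 trades exponent bits for mantissa bits.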

Layout Format

The layout format determines the order in which values are stored. Typically, the batch dimensions are the leftmost dimensions, and the remaining dimensions describe each data item; for images, C is the channel, H is the height, and W is the width. Ignoring batch dimensions, which always precede these, C, H, and W are typically ordered as CHW or HWC.

Table 20 TensorFormat Enum Quick Reference
Format Name	`TensorFormat` Enum	Description
Linear (row-major)	`kLINEAR`	Default CHW ordering with no vectorization.
NC/2HW2	`kCHW2`	Channel pairs packed per HxW element (FP16, BF16).
NC/4HW4	`kCHW4`	4-channel vectors per HxW element (INT8).
NHWC8	`kHWC8`	8-channel vectors in HWC order (FP16, BF16).
NC/16HW16	`kCHW16`	16-channel vectors (DLA FP16).
NC/32HW32	`kCHW32`	32-channel vectors (FP32, FP16, INT8).
NDHWC8	`kDHWC8`	3D variant of kHWC8 (FP16, BF16).
NC/32DHW32	`kCDHW32`	3D variant of kCHW32 (FP16, INT8).
NHWC	`kHWC`	Channel-last without vectorization (FP32, UINT8).
DLA Linear	`kDLA_LINEAR`	DLA-specific row-major (FP16, INT8).
DLA HWC4	`kDLA_HWC4`	DLA-specific 4-channel vectors (FP16, INT8).
NHWC16	`kHWC16`	16-channel vectors in HWC order (FP16, INT8, FP8).
NDHWC	`kDHWC`	3D channel-last without vectorization (FP32).

For supported data type combinations per format, refer to the I/O Formats table in the Advanced Topics section.

In the CHW layout, the image is divided into HxW matrices, one per channel, and the matrices are stored in sequence; all the values of a channel are stored contiguously.

In the HWC layout, the image is stored as a single HxW matrix whose entries are C-tuples, with one value per channel; all the values of a point (pixel) are stored contiguously.
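
The difference between the two orderings can be expressed as a linear offset computation. The following is a minimal sketch for a single image (batch dimension ignored); the function names `chw_offset` and `hwc_offset` are illustrative and not part of the TensorRT API.

```python
def chw_offset(c, h, w, C, H, W):
    """Linear element offset of (c, h, w) in CHW order.

    Each channel is a contiguous HxW matrix; channels follow one another.
    """
    return (c * H + h) * W + w


def hwc_offset(c, h, w, C, H, W):
    """Linear element offset of (c, h, w) in HWC order.

    Each pixel is a contiguous C-tuple; pixels follow in row-major order.
    """
    return (h * W + w) * C + c


if __name__ == "__main__":
    C, H, W = 3, 2, 2
    # Moving to the next channel at the same pixel:
    print(chw_offset(1, 0, 0, C, H, W) - chw_offset(0, 0, 0, C, H, W))  # H*W
    print(hwc_offset(1, 0, 0, C, H, W) - hwc_offset(0, 0, 0, C, H, W))  # 1
```

In CHW order, stepping to the next channel skips a whole HxW matrix, whereas in HWC order the next channel of the same pixel is the adjacent element.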

Additional formats pack channel values together and use reduced precision to enable faster computation. For this reason, TensorRT also supports formats such as NC2HW2 and NHWC8.

In NC2HW2 (TensorFormat::kCHW2), pairs of channel values are packed together in each HxW matrix (with an empty value in the case of an odd number of channels). The result is a format in which the values of ⌈C/2⌉ HxW matrices are pairs of values of two consecutive channels. Note that this ordering interleaves the channel dimension: two channel values have a stride of 1 if they are in the same pair, and consecutive pairs have a stride of 2xHxW.
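
The pair packing described above can be sketched as an offset computation. This is an illustration for a single image in units of elements, assuming the padding slot of an odd channel count occupies storage like a real value; `chw2_offset` and `chw2_volume` are hypothetical helper names, not TensorRT functions.

```python
def chw2_offset(c, h, w, C, H, W):
    """Linear element offset of (c, h, w) in NC2HW2 order (batch ignored)."""
    pair, lane = divmod(c, 2)  # which channel pair, and position in the pair
    return ((pair * H + h) * W + w) * 2 + lane


def chw2_volume(C, H, W):
    """Number of element slots, including padding for an odd C."""
    pairs = (C + 1) // 2  # ceil(C / 2)
    return pairs * H * W * 2


if __name__ == "__main__":
    C, H, W = 3, 2, 2
    print(chw2_offset(1, 0, 0, C, H, W))  # same pair as channel 0: stride 1
    print(chw2_offset(2, 0, 0, C, H, W))  # next pair: stride 2*H*W
    print(chw2_volume(C, H, W))           # C is padded from 3 to 4
```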

In NHWC8 (TensorFormat::kHWC8), the entries of an HxW matrix include the values of all the channels. In addition, these values are packed together in ⌈C/8⌉ 8-tuples, and C is rounded up to the nearest multiple of 8.
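
As a rough illustration of the NHWC8 rule, the sketch below rounds C up to a multiple of 8 and computes the element offset for a single image. The names `hwc8_padded_channels` and `hwc8_offset` are illustrative and not part of the TensorRT API.

```python
def hwc8_padded_channels(C):
    """C rounded up to the nearest multiple of 8."""
    return -(-C // 8) * 8  # ceil(C / 8) * 8


def hwc8_offset(c, h, w, C, H, W):
    """Linear element offset of (c, h, w) in NHWC8 order (batch ignored)."""
    padded_c = hwc8_padded_channels(C)
    return (h * W + w) * padded_c + c


if __name__ == "__main__":
    C, H, W = 3, 2, 2
    print(hwc8_padded_channels(C))        # 3 channels occupy 8 slots per pixel
    print(hwc8_offset(2, 0, 0, C, H, W))  # channels of one pixel are adjacent
    print(hwc8_offset(0, 0, 1, C, H, W))  # next pixel starts 8 elements later
```

With C = 3, each pixel still occupies 8 element slots, so 5 of every 8 slots are padding; the format pays off when C is a multiple of 8 or close to one.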

The other TensorFormat values follow rules similar to those described above for TensorFormat::kCHW2 and TensorFormat::kHWC8.
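
Under the assumption that the other vectorized formats pad the channel dimension in the same way, the padded size of a single image can be estimated from the vector width given in Table 20. The mapping `VECTOR_WIDTH` and the helpers below are a hypothetical sketch, not TensorRT API; the DLA-specific formats are left out because this section does not describe their layout rules.

```python
# Channel vector width per format, taken from the quick-reference table.
VECTOR_WIDTH = {
    "kCHW2": 2,
    "kCHW4": 4,
    "kHWC8": 8,
    "kCHW16": 16,
    "kHWC16": 16,
    "kCHW32": 32,
}


def padded_channels(fmt, C):
    """C rounded up to a multiple of the format's vector width."""
    v = VECTOR_WIDTH[fmt]
    return -(-C // v) * v  # ceil(C / v) * v


def padded_volume(fmt, C, H, W):
    """Element slots needed for one CxHxW image, including channel padding."""
    return padded_channels(fmt, C) * H * W


if __name__ == "__main__":
    for fmt in VECTOR_WIDTH:
        print(fmt, padded_volume(fmt, 3, 224, 224))
```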