Data Format Descriptions
TensorRT supports different data formats. There are two aspects to consider: data type and layout.
Data Type Format
The data type is the representation of each value. Its size determines the range of values and the precision of the representation. The supported data types are:
FP32 (32-bit floating point or single precision)
FP16 (16-bit floating point or half precision)
BF16 (1-bit sign, 8-bit exponent, 7-bit mantissa)
FP8 (1-bit sign, 4-bit exponent, 3-bit mantissa)
INT64 (64-bit integer)
INT32 (32-bit integer)
INT8 (8-bit integer)
UINT8 (unsigned 8-bit integer)
INT4 (4-bit integer)
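To make the relationship between these bit layouts concrete, the following minimal sketch derives a BF16 bit pattern from an FP32 value by keeping only the upper 16 bits (the sign bit, the 8-bit exponent, and the top 7 mantissa bits). The helper names are hypothetical, and the conversion truncates instead of rounding, so it should be read as an illustration of the layout rather than as the conversion TensorRT performs.

```python
import struct


def fp32_to_bf16_bits(value: float) -> int:
    """Return a 16-bit BF16 pattern for value by truncating the FP32 encoding."""
    # Reinterpret the FP32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    # Keep sign (1 bit), exponent (8 bits), and the top 7 mantissa bits.
    return bits >> 16


def bf16_bits_to_fp32(bits: int) -> float:
    """Expand a 16-bit BF16 pattern back to a float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))
    return value
```

Because BF16 keeps the full FP32 exponent, the round trip preserves the range of the value and loses only mantissa precision.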
Layout Format
The layout format determines the ordering in which values are stored. Typically, batch dimensions are the leftmost dimensions, and the other dimensions refer to aspects of each data item, such as C for channel, H for height, and W for width in images. Ignoring batch sizes, which always precede these, C, H, and W are typically sorted as CHW or HWC.
In CHW, the image is divided into HxW matrices, one per channel, and the matrices are stored in sequence; all the values of a channel are stored contiguously.
In HWC, the image is stored as a single HxW matrix whose entries are C-tuples, with one value per channel; all the values of a point (pixel) are stored contiguously.
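The difference between the two orderings can be sketched as a pair of index computations. The functions below are hypothetical helpers, not part of any TensorRT API; they ignore the batch dimension and return the element offset of the value at channel c, row h, column w in a densely stored buffer.

```python
def chw_offset(c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Offset in CHW order: one HxW matrix per channel, stored in sequence."""
    return (c * H + h) * W + w


def hwc_offset(c: int, h: int, w: int, C: int, H: int, W: int) -> int:
    """Offset in HWC order: one HxW matrix whose entries are C-tuples."""
    return (h * W + w) * C + c
```

For a 3-channel image with H = 2 and W = 4, moving to the next channel of the same pixel advances the offset by H x W = 8 in CHW but by only 1 in HWC, while moving to the next pixel in a row advances it by 1 in CHW and by C = 3 in HWC.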
More formats are defined to pack together channel values and use reduced precision to enable faster computations. For this reason, TensorRT also supports formats like NC/2HW2 and NHWC8.
In NC/2HW2 (TensorFormat::kCHW2), pairs of channel values are packed together in each HxW matrix (with an empty value in the case of an odd number of channels). The result is a format in which the values of the ⌈C/2⌉ HxW matrices are pairs of values of two consecutive channels. Note that this ordering interleaves the channel dimension: two channel values have stride 1 if they are in the same pair, and the stride from one pair to the next is 2xHxW.
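The packing rule can be sketched as an offset computation, assuming a dense buffer and ignoring the batch dimension. The helper names are hypothetical; the formula simply treats the buffer as an array of shape [⌈C/2⌉][H][W][2], which matches the description of channel pairs stored inside each HxW matrix.

```python
def chw2_offset(c: int, h: int, w: int, H: int, W: int) -> int:
    """Offset of (c, h, w) when consecutive channel pairs share an HxW matrix."""
    pair = c // 2   # which of the ceil(C/2) matrices holds this channel
    lane = c % 2    # position of the channel inside its pair
    return ((pair * H + h) * W + w) * 2 + lane


def chw2_size(C: int, H: int, W: int) -> int:
    """Number of elements stored, including the padding slot for odd C."""
    return ((C + 1) // 2) * H * W * 2
```

With H = 2 and W = 4, channels 0 and 1 of the same pixel are one element apart, whereas channels 0 and 2 are 2 x H x W = 16 elements apart. An image with 3 channels occupies the same 32 elements as one with 4 channels, because the last pair carries one unused slot.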
In NHWC8 (TensorFormat::kHWC8), the entries of an HxW matrix include the values of all the channels. In addition, these values are packed together in ⌈C/8⌉ 8-tuples, and C is rounded up to the nearest multiple of 8.
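As a minimal sketch of this rule, again with hypothetical helper names and the batch dimension ignored, the channel count is first rounded up to a multiple of 8 and the buffer is then indexed like an HWC array with that padded channel count.

```python
def hwc8_padded_channels(C: int) -> int:
    """Round the channel count up to the nearest multiple of 8."""
    return ((C + 7) // 8) * 8


def hwc8_offset(c: int, h: int, w: int, C: int, W: int) -> int:
    """Offset of (c, h, w) when each pixel stores ceil(C/8) 8-tuples."""
    return (h * W + w) * hwc8_padded_channels(C) + c
```

For example, a 3-channel image stores 8 values per pixel, of which 5 are padding, so the second pixel of the first row starts at offset 8 rather than 3.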
Other TensorFormat values follow rules similar to those of TensorFormat::kCHW2 and TensorFormat::kHWC8 described previously.