Data Format Descriptions
TensorRT supports different data formats. There are two aspects to consider: data type and layout.
Data Type Format
The data type is the representation of each value. Its size determines the range of values and the precision of the representation. The supported data types are:
FP32 (32-bit floating point or single precision)
FP16 (16-bit floating point or half precision)
BF16 (1-bit sign, 8-bit exponent, 7-bit mantissa)
FP8 (1-bit sign, 4-bit exponent, 3-bit mantissa)
FP4 (1-bit sign, 2-bit exponent, 1-bit mantissa)
E8M0 (0-bit sign, 8-bit exponent, 0-bit mantissa)
INT64 (64-bit integer)
INT32 (32-bit integer)
INT8 (8-bit integer)
UINT8 (unsigned 8-bit integer)
INT4 (4-bit integer)
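As a rough illustration of how the bit splits in the list above add up to each type's size, the following sketch tabulates them. The names `FLOAT_LAYOUTS`, `INT_BITS`, `bits_per_value`, and `packed_bytes` are hypothetical helpers, not part of the TensorRT API. The FP32 and FP16 splits are the standard IEEE 754 ones; the others come from the list. `packed_bytes` assumes values are packed tightly with no layout padding, which real buffers may not satisfy.

```python
# (sign bits, exponent bits, mantissa bits) for the floating-point types listed above.
# FP32 and FP16 use the standard IEEE 754 splits; the rest are taken from the list.
FLOAT_LAYOUTS = {
    "FP32": (1, 8, 23),
    "FP16": (1, 5, 10),
    "BF16": (1, 8, 7),
    "FP8": (1, 4, 3),
    "FP4": (1, 2, 1),
    "E8M0": (0, 8, 0),
}

# Integer types, with their widths as given in the list.
INT_BITS = {"INT64": 64, "INT32": 32, "INT8": 8, "UINT8": 8, "INT4": 4}


def bits_per_value(dtype):
    """Return the number of bits one value of the given type occupies."""
    if dtype in FLOAT_LAYOUTS:
        return sum(FLOAT_LAYOUTS[dtype])
    return INT_BITS[dtype]


def packed_bytes(dtype, count):
    """Bytes needed for `count` tightly packed values (assumption: no padding)."""
    return (bits_per_value(dtype) * count + 7) // 8
```

Under these assumptions, sub-byte types such as FP4 and INT4 share a byte between two values, so three INT4 values round up to two bytes.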
Layout Format
The layout format determines the order in which values are stored. Typically, batch dimensions are the leftmost dimensions, and the remaining dimensions describe each data item. For images, C is the channel, H is the height, and W is the width. Ignoring the batch dimensions, which always come first, C, H, and W are typically ordered as CHW or HWC.
| Format Name | Description |
|---|---|
| Linear (row-major) | Default CHW ordering with no vectorization. |
| NC/2HW2 | Channel pairs packed per HxW element (FP16, BF16). |
| NC/4HW4 | 4-channel vectors per HxW element (INT8). |
| NHWC8 | 8-channel vectors in HWC order (FP16, BF16). |
| NC/16HW16 | 16-channel vectors (DLA FP16). |
| NC/32HW32 | 32-channel vectors (FP32, FP16, INT8). |
| NDHWC8 | 3D variant of kHWC8 (FP16, BF16). |
| NC/32DHW32 | 3D variant of kCHW32 (FP16, INT8). |
| NHWC | Channel-last without vectorization (FP32, UINT8). |
| DLA Linear | DLA-specific row-major (FP16, INT8). |
| DLA HWC4 | DLA-specific 4-channel vectors (FP16, INT8). |
| NHWC16 | 16-channel vectors in HWC order (FP16, INT8, FP8). |
| NDHWC | 3D channel-last without vectorization (FP32). |
For supported data type combinations per format, refer to the I/O Formats table.
The default CHW (kLINEAR) layout stores one HxW matrix per channel: all values of channel 0 are stored first, then all of channel 1, and so on.
In HWC (kHWC), a single HxW matrix holds the data, and each entry is a C-tuple containing all channel values for that pixel. All channel values for one pixel are stored consecutively before moving to the next pixel.
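The difference between the CHW and HWC orderings described above can be sketched as two offset formulas. This is a minimal illustration for a single data item (batch dimensions ignored); `chw_offset`, `hwc_offset`, and `storage_order` are hypothetical names, not TensorRT functions.

```python
def chw_offset(c, h, w, C, H, W):
    """Element offset in CHW order: one full HxW plane per channel."""
    return (c * H + h) * W + w


def hwc_offset(c, h, w, C, H, W):
    """Element offset in HWC order: all C values of a pixel are adjacent."""
    return (h * W + w) * C + c


def storage_order(offset_fn, C, H, W):
    """List the (c, h, w) coordinates in the order they appear in memory."""
    coords = [(c, h, w) for c in range(C) for h in range(H) for w in range(W)]
    return sorted(coords, key=lambda p: offset_fn(p[0], p[1], p[2], C, H, W))
```

For a 3-channel tensor with H=1 and W=2, `storage_order(chw_offset, 3, 1, 2)` walks both pixels of channel 0 before touching channel 1, while `storage_order(hwc_offset, 3, 1, 2)` walks all three channels of the first pixel before moving to the second.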
TensorRT also defines formats that pack channel values into fixed-width groups per pixel, which lets kernels load multiple channels with one vector instruction. NC/2HW2 (kCHW2) and NHWC8 (kHWC8) below are two examples; the table above lists the full set.
In NC/2HW2 (kCHW2), channel values are packed into pairs per pixel; the tensor is stored as ceil(C/2) HxW matrices of two-channel pairs. If C is odd, the last pair contains one padded slot. Within a pair, the two channels have stride 1; between consecutive pairs, the stride is 2*H*W elements. For example, a 3-channel input is stored as two pair-planes: [C=0, C=1] and [C=2, pad].
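The pair packing described above can be sketched as an offset computation plus a small packing routine. This is an illustration under the assumption of a single data item with no batch dimension; `chw2_offset` and `pack_chw2` are hypothetical helper names, and `None` stands in for a padded slot.

```python
def chw2_offset(c, h, w, H, W):
    """Element offset of channel c at pixel (h, w) in a two-channel-packed layout."""
    pair, lane = divmod(c, 2)  # which pair-plane, and which slot inside the pair
    return ((pair * H + h) * W + w) * 2 + lane


def pack_chw2(tensor):
    """Pack tensor[c][h][w] (nested lists) into a flat list; None marks padding."""
    C, H, W = len(tensor), len(tensor[0]), len(tensor[0][0])
    num_pairs = (C + 1) // 2  # ceil(C / 2) pair-planes
    flat = [None] * (num_pairs * H * W * 2)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                flat[chw2_offset(c, h, w, H, W)] = tensor[c][h][w]
    return flat
```

With a 3-channel, 1x2 input, the first four slots interleave channels 0 and 1 pixel by pixel, and the second pair-plane holds channel 2 followed by a padded slot for each pixel.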
In NHWC8 (kHWC8), each pixel of the HxW matrix stores its channel values packed into 8-tuples, and C is padded up to a multiple of 8. For example, a 3-channel input has one 8-tuple per pixel containing 3 real values plus 5 padded slots.
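The padding rule above can be sketched the same way. This illustration assumes a single data item with no batch dimension; `hwc8_offset` and `pack_hwc8` are hypothetical helper names, and `None` stands in for a padded slot.

```python
def hwc8_offset(c, h, w, C, H, W):
    """Element offset of channel c at pixel (h, w) with C padded to a multiple of 8."""
    padded_c = (C + 7) // 8 * 8  # round the channel count up to a multiple of 8
    return (h * W + w) * padded_c + c


def pack_hwc8(tensor):
    """Pack tensor[c][h][w] (nested lists) into a flat list; None marks padding."""
    C, H, W = len(tensor), len(tensor[0]), len(tensor[0][0])
    padded_c = (C + 7) // 8 * 8
    flat = [None] * (H * W * padded_c)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                flat[hwc8_offset(c, h, w, C, H, W)] = tensor[c][h][w]
    return flat
```

For a 3-channel input, each pixel occupies 8 slots, so the second pixel starts at offset 8 and 5 of every 8 slots are padding. A 10-channel input would pad to 16 slots per pixel.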
Other TensorFormat values follow similar packing rules to kCHW2 and kHWC8.
See also
- Understanding Formats Printed in Logs
How to interpret tensor format and stride information in TensorRT log output.
- Glossary
Definitions of TensorRT terms including tensor formats and precision types.