Layer Fusion Catalog

This page catalogs the fusion patterns TensorRT applies at build time. For an overview of why fusion matters and how to inspect fusion decisions in logs, refer to Enabling Layer Fusion.

Layer Fusion

TensorRT attempts many different types of optimizations on a network during the build phase. In the first phase, layers are fused whenever possible. Fusion transforms the network into a simpler form while preserving the same overall behavior. Internally, many layer implementations have extra parameters and options that are not directly accessible when creating the network. Instead, the fusion optimization step detects supported patterns of operations and fuses multiple layers into one layer with the internal options set accordingly.

Consider the common case of a convolution followed by ReLU activation. Creating a network with these operations involves adding a Convolution layer with addConvolutionNd and following it with an Activation layer using addActivation with an ActivationType of kRELU. The unoptimized graph contains separate layers for convolution and activation. The internal implementation of convolution can compute the ReLU function on the output directly in the convolution kernel, without requiring a second kernel call. The fusion optimization step detects the convolution followed by ReLU, verifies that the implementation supports the operations, and then fuses them into one layer.
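To make the "one step" idea concrete, the following is a minimal sketch in plain Python, not TensorRT code. It assumes a toy 1-D convolution without padding; the helper names (`conv1d`, `relu`, `conv1d_relu_fused`) are hypothetical. It only illustrates that applying ReLU while each output element is produced gives the same result as running a separate activation pass afterwards.

```python
from typing import List


def _output_at(x: List[float], w: List[float], bias: float, i: int) -> float:
    """Accumulate one output element of a valid (unpadded) 1-D convolution."""
    acc = bias
    for k, wk in enumerate(w):
        acc += x[i + k] * wk
    return acc


def conv1d(x: List[float], w: List[float], bias: float = 0.0) -> List[float]:
    """Unfused step 1: produce every convolution output."""
    return [_output_at(x, w, bias, i) for i in range(len(x) - len(w) + 1)]


def relu(values: List[float]) -> List[float]:
    """Unfused step 2: a second pass over the intermediate result."""
    return [max(0.0, v) for v in values]


def conv1d_relu_fused(x: List[float], w: List[float], bias: float = 0.0) -> List[float]:
    """Fused: ReLU is applied as each output element is computed (single pass)."""
    return [max(0.0, _output_at(x, w, bias, i)) for i in range(len(x) - len(w) + 1)]


x = [1.0, -2.0, 3.0, -4.0, 5.0]
w = [0.5, -1.0]
unfused = relu(conv1d(x, w))
fused = conv1d_relu_fused(x, w)
print(unfused == fused)  # the two paths agree element for element
```

The fused variant never materializes the intermediate convolution output, which is the saving a real fused kernel aims for.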

To investigate which fusions have occurred, inspect the messages the builder writes to the logger object provided during construction. Optimization steps are reported at the kINFO log level. To view these messages, make sure your ILogger callback does not filter out kINFO messages.
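The severity filtering can be sketched without a TensorRT installation. The stand-in below is hypothetical: `Severity` is assumed to mirror the numeric order of the ILogger severity levels (lower value means more severe), and `CollectingLogger` is an illustrative class, not part of the TensorRT API. In a real application the same `log` logic would live in your ILogger implementation.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed to mirror the ILogger severity order: lower value = more severe.
    INTERNAL_ERROR = 0
    ERROR = 1
    WARNING = 2
    INFO = 3
    VERBOSE = 4


class CollectingLogger:
    """Hypothetical stand-in for an ILogger implementation."""

    def __init__(self, threshold: Severity = Severity.INFO):
        self.threshold = threshold
        self.messages = []

    def log(self, severity: Severity, msg: str) -> None:
        # Keep the message only if it is at least as severe as the threshold.
        # With threshold INFO, optimization-step messages are kept.
        if severity <= self.threshold:
            self.messages.append((Severity(severity), msg))


logger = CollectingLogger(Severity.INFO)
logger.log(Severity.WARNING, "hypothetical warning text")
logger.log(Severity.INFO, "hypothetical fusion message")
logger.log(Severity.VERBOSE, "hypothetical verbose detail")
print(len(logger.messages))  # WARNING and INFO kept, VERBOSE dropped
```

A logger whose threshold is set to WARNING would silently discard the kINFO messages that describe fusion decisions.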

Fusions are normally handled by creating a new layer whose name contains the names of both fused layers. For example, when a MatrixMultiply layer (InnerProduct) named ip1 is fused with a ReLU Activation layer named relu1, the new layer is named ip1 + relu1.
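The detect-then-fuse behavior and the naming convention can be illustrated with a toy pass over a linear chain of layers. This is a sketch under stated assumptions, not how the builder is implemented: the `Layer` record, the `kind` labels, and the `RELU_CAPABLE` set are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    kind: str  # hypothetical labels such as "conv", "matmul", "relu", "pool"


# Hypothetical set of layer kinds whose implementation can apply ReLU inline.
RELU_CAPABLE = {"conv", "matmul"}


def fuse_relu(layers: List[Layer]) -> List[Layer]:
    """Fuse each RELU_CAPABLE layer with a ReLU that directly follows it."""
    fused: List[Layer] = []
    i = 0
    while i < len(layers):
        cur = layers[i]
        nxt = layers[i + 1] if i + 1 < len(layers) else None
        if cur.kind in RELU_CAPABLE and nxt is not None and nxt.kind == "relu":
            # The fused layer's name contains both original names.
            fused.append(Layer(f"{cur.name} + {nxt.name}", f"{cur.kind}+relu"))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


network = [Layer("ip1", "matmul"), Layer("relu1", "relu"), Layer("pool1", "pool")]
fused_names = [layer.name for layer in fuse_relu(network)]
print(fused_names)
```

Because the fused name embeds both original names, a fused layer reported by the builder can be traced back to the layers you created.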

Types of Fusions

The following table summarizes the main fusion patterns. Refer to the dropdown sections below for full descriptions and constraints.

Fusion	Pattern	Key constraints
ReLU Activation	ReLU → ReLU	Both activations must be ReLU
Convolution + ReLU	Conv → ReLU	Any conv type; activation must be ReLU
Convolution + GELU	Conv → GELU	FP16 or INT8 I/O; Turing+ GPU
Convolution + Clip	Conv → Clip	Any conv type; activation must be Clip
Scale + Activation	Scale → Activation	Fused into single activation
Convolution + ElementWise	Conv → sum/min/max ElementWise	No broadcasting unless it is across the batch size
Padding + Convolution	Padding → Conv/Deconv	Non-negative padding only
Shuffle + Reduce	Shuffle (permute only) → Reduce	`keepDimensions` required on Reduce
Shuffle + Shuffle	Shuffle → Shuffle	Reshape fusion only when transpose inverses match
Convolution + Scale	Conv → `kUNIFORM`/`kCHANNEL` Scale	Disabled if scale has non-constant power
Convolution + Generic Activation	Conv → pointwise activation	After pointwise fusion pass
Reduce → Pooling	Reduce (avg, CHW, `keepDimensions`) → Pooling	Reduce is replaced by an average Pooling layer
Convolution + Pooling	Conv (+ optional fused act) → Pooling	Same precision on both layers
Depthwise Separable Convolution	Depthwise conv+act → conv+act	INT8 only; compute capability ≥ 7.2
Softmax + Log / TopK	Softmax → Log or TopK	Optional Log fused into Softmax+TopK
GELU / L1Norm / L2Norm / LogSum / LogSumExp	Unary + ElementWise + Reduce chains	See reduction-operation dropdown below

The following list describes the types of supported fusions in detail.

Supported Layer Fusions

Pointwise Fusion

Multiple adjacent Pointwise layers can be fused into a single Pointwise layer to improve performance.

The following types of Pointwise layers are supported, with some limitations:

Activation: Every ActivationType is supported.
Constant: Only constants with a single value (size == 1) are supported.
ElementWise: Every ElementWiseOperation is supported.
Pointwise: Pointwise itself is also a Pointwise layer.
Scale: Only ScaleMode::kUNIFORM is supported.
Unary: Every UnaryOperation is supported.

The size of a fused Pointwise layer is limited, so some layers might not be fused.

Fusion creates a new layer whose name is built from the names of both fused layers. For example, when an ElementWise layer named add1 is fused with a ReLU Activation layer named relu1, the new layer is named fusedPointwiseNode(add1, relu1).
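The eligibility rules and the size limit above can be sketched as a small greedy grouping over a linear chain. Everything here is illustrative: the `Layer` record and its fields are hypothetical, the cap `MAX_FUSED` is an invented number (the actual limit is not stated), and extending the `fusedPointwiseNode(...)` name to more than two layers is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    name: str
    kind: str                         # hypothetical labels, e.g. "activation", "scale"
    size: Optional[int] = None        # number of values; used only for "constant"
    scale_mode: Optional[str] = None  # e.g. "kUNIFORM"; used only for "scale"


def is_pointwise_candidate(layer: Layer) -> bool:
    """Apply the per-type restrictions listed above."""
    if layer.kind in {"activation", "elementwise", "pointwise", "unary"}:
        return True
    if layer.kind == "constant":
        return layer.size == 1
    if layer.kind == "scale":
        return layer.scale_mode == "kUNIFORM"
    return False


MAX_FUSED = 4  # hypothetical cap standing in for the unspecified size limit


def group_pointwise(layers: List[Layer], max_fused: int = MAX_FUSED) -> List[str]:
    """Greedily merge runs of adjacent candidates and return the resulting names."""
    result: List[str] = []
    run: List[Layer] = []

    def flush() -> None:
        if len(run) >= 2:
            names = ", ".join(item.name for item in run)
            result.append(f"fusedPointwiseNode({names})")
        else:
            result.extend(item.name for item in run)
        run.clear()

    for layer in layers:
        if is_pointwise_candidate(layer):
            run.append(layer)
            if len(run) == max_fused:
                flush()  # the fused layer is full; start a new one
        else:
            flush()
            result.append(layer.name)
    flush()
    return result


chain = [
    Layer("conv1", "convolution"),
    Layer("add1", "elementwise"),
    Layer("relu1", "activation"),
    Layer("scale1", "scale", scale_mode="kCHANNEL"),  # not kUNIFORM, so it breaks the run
    Layer("exp1", "unary"),
]
grouped = group_pointwise(chain)
print(grouped)
```

In this chain the kCHANNEL Scale layer is not a candidate, so add1 and relu1 fuse on their own and exp1 is left as a single layer.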

Q/DQ Fusion

Refer to the Explicit Quantization section for suggestions on optimizing INT8 and FP8 networks containing QuantizeLinear and DequantizeLinear layers.