Dynamic Shapes: Advanced Topics#

Layer Extensions For Dynamic Shapes#

Some layers have optional inputs that allow specifying dynamic shape information, and IShapeLayer can access a tensor’s shape at runtime. Some layers can also calculate new shapes. The next section goes into semantic details and restrictions. Here is a summary of what you might find useful in conjunction with dynamic shapes.

IShapeLayer outputs a 1D tensor containing the dimensions of the input tensor. For example, if the input tensor has dimensions [2,3,5,7], the output tensor is a four-element 1D tensor containing {2,3,5,7}. If the input tensor is a scalar, it has dimensions [], and the output tensor is a zero-element 1D tensor containing {}.

IResizeLayer accepts an optional second input containing the desired dimensions of the output.

IShuffleLayer accepts an optional second input containing the reshaped dimensions before the second transpose is applied. For example, the following network reshapes a tensor Y to have the same dimensions as X:

auto* reshape = networkDefinition.addShuffle(Y);
reshape->setInput(1, networkDefinition.addShape(X)->getOutput(0));

reshape = network_definition.add_shuffle(y)
reshape.set_input(1, network_definition.add_shape(x).get_output(0))

ISliceLayer accepts an optional second, third, and fourth input containing the start, size, and stride.
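The per-axis slice semantics can be sketched in plain Python (a toy model, not TensorRT code): output element i is taken from input position start + i * stride, where start, size, and stride are the values the optional inputs supply at runtime.

```python
# Toy 1D model of ISliceLayer: the optional second, third, and fourth
# inputs supply start, size, and stride as shape tensors at runtime.
def slice_1d(data, start, size, stride):
    # Element i of the output comes from input position start + i * stride.
    return [data[start + i * stride] for i in range(size)]
```

For example, `slice_1d(list(range(8)), start=1, size=3, stride=2)` yields `[1, 3, 5]`.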

IConcatenationLayer, IElementWiseLayer, IGatherLayer, IIdentityLayer, and IReduceLayer can calculate shapes and create new shape tensors.
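As a toy illustration of the kind of shape arithmetic these layers enable (plain Python stand-ins, not TensorRT code), here is how a [N, C, H, W] shape could be turned into the [N, C, H*W] shape needed to flatten the spatial axes:

```python
# Plain-Python stand-ins for shape calculations that IShapeLayer,
# ISliceLayer, IElementWiseLayer, and IConcatenationLayer can express.
def shape_of(dims):            # IShapeLayer: dimensions as a 1D tensor
    return list(dims)

def flatten_hw(dims):
    n_c = dims[:2]             # ISliceLayer on the shape tensor: [N, C]
    hw = [dims[2] * dims[3]]   # IElementWiseLayer PROD: [H * W]
    return n_c + hw            # IConcatenationLayer: [N, C, H * W]
```

The resulting shape tensor could then feed the second input of an IShuffleLayer to perform the flattening.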

Restrictions for Dynamic Shapes#

The following layer restrictions arise because the layer’s weights have a fixed size:

  • IConvolutionLayer and IDeconvolutionLayer require that the channel dimension be a build-time constant.

  • Int8 requires that the channel dimension be a build-time constant.

  • Layers accepting additional shape inputs (IResizeLayer, IShuffleLayer, ISliceLayer) require that the additional shape inputs be compatible with the dimensions of the minimum and maximum optimization profiles as well as with the dimensions of the runtime data input; otherwise, it can lead to either a build time or runtime error.

Not all required build-time constants need to be set manually. TensorRT will infer shapes through the network layers, and only those that cannot be inferred to be build-time constants must be set manually.

For more information regarding layers, refer to the TensorRT Operator documentation.

Execution Tensors vs Shape Tensors#

TensorRT 8.5 largely erased the distinctions between execution tensors and shape tensors. However, when designing a network or analyzing performance, it can help to understand the internals and where internal synchronization is incurred.

Engines using dynamic shapes employ a ping-pong execution strategy.

  1. Compute the shapes of tensors on the CPU until a shape requiring GPU results is reached.

  2. Stream work to the GPU until running out of work or reaching an unknown shape. In the latter case, synchronize and go back to step 1.
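The two steps above can be sketched as a loop (a toy pure-Python model of the control flow, not TensorRT internals); each return to step 1 while GPU work is still in flight costs a synchronization:

```python
# Toy model of the ping-pong strategy: count how many times execution must
# synchronize because a tensor's shape depends on a GPU result.
def count_syncs(layers):
    # `layers` is a list of booleans: does this layer's shape computation
    # need a GPU result? (A simplification for illustration only.)
    syncs = 0
    gpu_work_in_flight = False
    for shape_needs_gpu_result in layers:
        if shape_needs_gpu_result and gpu_work_in_flight:
            syncs += 1                 # wait for the GPU, return to step 1
            gpu_work_in_flight = False
        gpu_work_in_flight = True      # step 2: stream this layer to the GPU
    return syncs
```

A network whose shapes are all computable on the CPU runs with no synchronization; a single data-dependent shape in the middle costs one.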

An execution tensor is a traditional TensorRT tensor. A shape tensor is a tensor that is related to shape calculations. It must have type Int32, Int64, Float, or Bool, its shape must be determinable at build time, and it must have no more than 64 elements. Refer to Shape Tensor I/O (Advanced) for additional restrictions for shape tensors at network I/O boundaries. For example, there is an IShapeLayer whose output is a 1D tensor containing the dimensions of the input tensor. The output is a shape tensor. IShuffleLayer accepts an optional second input that can specify reshaping dimensions. The second input must be a shape tensor.
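The restrictions above can be captured in a small checker (a toy sketch; the real classification happens inside TensorRT at build time):

```python
from math import prod

# Toy validity check for shape tensors: an allowed element type, all
# dimensions known at build time, and at most 64 elements.
def could_be_shape_tensor(dtype, dims):
    if dtype not in {"Int32", "Int64", "Float", "Bool"}:
        return False
    if any(d < 0 for d in dims):       # -1 marks a build-time-unknown dim
        return False
    return prod(dims) <= 64            # prod(()) == 1 covers scalars
```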

When TensorRT needs a shape tensor, but the tensor has been classified as an execution tensor, the runtime copies the tensor from the GPU to the CPU, which incurs synchronization overhead.

Some layers are polymorphic in terms of the kinds of tensors they handle. For example, IElementWiseLayer can sum two INT32 execution tensors or two INT32 shape tensors. The type of the output tensor depends on its ultimate use. If the sum is used to reshape another tensor, it is a shape tensor.

Formal Inference Rules#

The formal inference rules used by TensorRT for classifying tensors are based on a type-inference algebra. Let E denote an execution tensor, and S denote a shape tensor.

IActivationLayer has the signature:

IActivationLayer: E → E

since it takes an execution tensor as an input and an execution tensor as an output. IElementWiseLayer is polymorphic in this respect, with two signatures:

IElementWiseLayer: S × S → S, E × E → E

For brevity, let us adopt the convention that t is a variable denoting either class of tensor, and that every t in a signature refers to the same class of tensor. Then, the two previous signatures can be written as a single polymorphic signature:

IElementWiseLayer: t × t → t

The two-input IShuffleLayer has a shape tensor as the second input and is polymorphic concerning the first input:

IShuffleLayer (two inputs): t × S → t

IConstantLayer has no inputs but can produce a tensor of either kind, so its signature is:

IConstantLayer: → t

The signature for IShapeLayer allows all four possible combinations E→E, E→S, S→E, and S→S, so it can be written with two independent variables:

IShapeLayer: t1 → t2

Here is the complete set of rules, which also serves as a reference for which layers can be used to manipulate shape tensors:

IAssertionLayer: S →
IConcatenationLayer: t × t × ... → t
ICumulativeLayer: t × t → t
IIfConditionalInputLayer: t → t
IIfConditionalOutputLayer: t → t
IConstantLayer: → t
IActivationLayer: t → t
IElementWiseLayer: t × t → t
IFillLayer: S → t
IFillLayer: S × t × t → t
IGatherLayer: t × t → t
IIdentityLayer: t → t
IReduceLayer: t → t
IResizeLayer (one input): E → E
IResizeLayer (two inputs): E × S → E
ISelectLayer: t × t × t → t
IShapeLayer: t1 → t2
IShuffleLayer (one input): t → t
IShuffleLayer (two inputs): t × S → t
ISliceLayer (one input): t → t
ISliceLayer (two inputs): t × S → t
ISliceLayer (three inputs): t × S × S → t
ISliceLayer (four inputs): t × S × S × S → t
IUnaryLayer: t → t
all other layers: E × ... → E × ...

The inferred types are not exclusive because the output can be the input of more than one subsequent layer. For example, an IConstantLayer might feed into one use that requires an execution tensor and another use that requires a shape tensor. The output of IConstantLayer is classified as both and can be used in both phase 1 and phase 2 of the two-phase execution.

The requirement that the size of a shape tensor be known at build time limits how ISliceLayer can be used to manipulate a shape tensor. Specifically, if the third input, which specifies the size of the result, is not a build-time constant, the length of the resulting tensor is unknown at build time, breaking the restriction that shape tensors have constant shapes. The slice will still work but will incur synchronization overhead at runtime because the tensor is considered an execution tensor that has to be copied back to the CPU to do further shape calculations.

The rank of any tensor has to be known at build time. For example, if the output of ISliceLayer is a 1D tensor of unknown length that is used as the reshape dimensions for IShuffleLayer, the output of the shuffle would have an unknown rank at build time, and hence such a composition is prohibited.

TensorRT’s inferences can be inspected using methods ITensor::isShapeTensor(), which returns true for a shape tensor, and ITensor::isExecutionTensor(), which returns true for an execution tensor. Build the entire network first before calling these methods because their answer can change depending on what uses of the tensor have been added.

For example, if a partially built network sums two tensors, T1 and T2, to create tensor T3, and none are yet needed as shape tensors, isShapeTensor() returns false for all three tensors. Setting the second input of IShuffleLayer to T3 would cause all three tensors to become shape tensors because IShuffleLayer requires its second optional input to be a shape tensor. If the output of IElementWiseLayer is a shape tensor, its inputs are, too.
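The T1/T2/T3 example can be simulated with a small propagation pass over the `t` variables in the signatures above (a toy sketch of the classification, not TensorRT's implementation):

```python
# Toy back-propagation of the shape-tensor property: if a layer output is
# needed as a shape tensor, every input tied to it by a `t` variable in its
# signature becomes a shape tensor too (IElementWiseLayer: t x t -> t).
def propagate(tied_inputs, seeds):
    # tied_inputs maps an output tensor to the inputs that share its class;
    # seeds are tensors already known to be shape tensors from their use.
    shape_tensors, stack = set(), list(seeds)
    while stack:
        t = stack.pop()
        if t not in shape_tensors:
            shape_tensors.add(t)
            stack.extend(tied_inputs.get(t, []))
    return shape_tensors
```

With `tied_inputs = {"T3": ["T1", "T2"]}` (T3 = T1 + T2), no seeds classifies nothing, while seeding "T3" (using it as IShuffleLayer's second input) classifies all three tensors as shape tensors.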

Shape Tensor I/O (Advanced)#

Sometimes, the need arises to use a shape tensor as a network I/O tensor. For example, consider a network consisting solely of an IShuffleLayer. TensorRT infers that the second input is a shape tensor. ITensor::isShapeTensor returns true for it. Because it is an input shape tensor, TensorRT requires two things for it:

  1. At build time: the optimization profile values of the shape tensor.

  2. At run time: the values of the shape tensor.

The shape of an input shape tensor is always known at build time. However, its values must be supplied, since they can be used to specify the dimensions of execution tensors.

The optimization profile values can be set using IOptimizationProfile::setShapeValues. Analogous to how min, max, and optimization dimensions must be supplied for execution tensors with runtime dimensions, min, max, and optimization values must be provided for shape tensors at build time.
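At run time, the supplied shape-tensor values must then fall within the min and max values given in the profile, which can be sketched as follows (a toy check, not TensorRT code):

```python
# Toy check mirroring the constraint that setShapeValues establishes: every
# runtime value of an input shape tensor must lie within [min, max] from
# the optimization profile.
def values_in_profile(values, min_values, max_values):
    return all(lo <= v <= hi
               for v, lo, hi in zip(values, min_values, max_values))
```

For example, with profile values min = [1, 128] and max = [8, 512], the runtime values [4, 256] are accepted, while [16, 256] are rejected.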

The corresponding runtime method is IExecutionContext::setTensorAddress, which tells TensorRT where to look for the shape tensor values.

Because the inference of execution tensor or shape tensor is based on ultimate use, TensorRT cannot infer whether a network output is a shape tensor. You must tell it using the method INetworkDefinition::markOutputForShapes.

Besides letting you output shape information for debugging, this feature is useful for composing engines. For example, consider building three engines, one each for sub-networks A, B, and C, where a connection from A to B or B to C might involve a shape tensor. Build the networks in reverse order: C, B, and A. After constructing network C, you can use ITensor::isShapeTensor to determine if an input is a shape tensor and use INetworkDefinition::markOutputForShapes to mark the corresponding output tensor in network B. Then check which inputs of B are shape tensors and mark the corresponding output tensor in network A.

Shape tensors at network boundaries must have the type Int32 or Int64. They cannot have type Float or Bool. A workaround for Bool is to use Int32 for the I/O tensor, with zeros and ones, and convert to/from Bool using IIdentityLayer.
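The Bool workaround amounts to a zeros-and-ones encoding at the boundary (a toy sketch of the conversion that IIdentityLayer performs inside the real network):

```python
# Represent a boolean shape tensor as Int32 zeros and ones at network I/O
# and cast at the edges; inside the network, IIdentityLayer does this cast.
def bool_to_int32(values):
    return [1 if v else 0 for v in values]

def int32_to_bool(values):
    return [v != 0 for v in values]
```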

At runtime, whether a tensor is an I/O shape tensor can be determined using ICudaEngine::isShapeInferenceIO().

INT8 Calibration with Dynamic Shapes#

A calibration optimization profile must be set to run INT8 calibration for a network with dynamic shapes. Calibration is performed using the profile’s kOPT values, and the calibration input data size must match this profile.

To create a calibration optimization profile, first construct an IOptimizationProfile the same way as a general optimization profile. Then, set the profile on the builder configuration:

config->setCalibrationProfile(profile);

config.set_calibration_profile(profile)

The calibration profile must be valid or nullptr. Its kMIN and kMAX values are overwritten by kOPT. To check the current calibration profile, use IBuilderConfig::getCalibrationProfile(), which returns a pointer to the current calibration profile or nullptr if none is set. The calibrator's getBatchSize() method must return 1 when running calibration for a dynamic-shaped network.
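The kOPT rule above can be modeled in a few lines (a toy sketch of the stated behavior, not TensorRT code):

```python
# Toy model of the calibration profile rule: calibration runs at the
# profile's kOPT values, and kMIN/kMAX are overwritten by kOPT.
def effective_calibration_profile(profile):
    opt = profile["kOPT"]
    return {"kMIN": opt, "kOPT": opt, "kMAX": opt}
```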

Note

If the calibration optimization profile is not set, the first network optimization profile is used as a calibration optimization profile.