Working with Loops#
NVIDIA TensorRT supports loop-like constructs, which can be useful for recurrent networks. TensorRT loops support scanning over input tensors, recurrent definitions of tensors, and both scan outputs and last-value outputs.
Defining a Loop#
Loop boundary layers define a loop:

- ITripLimitLayer specifies how many times the loop iterates.
- IIteratorLayer enables a loop to iterate over a tensor.
- IRecurrenceLayer specifies a recurrent definition.
- ILoopOutputLayer specifies an output from the loop.
Each boundary layer inherits from the class ILoopBoundaryLayer, which has a method getLoop() for getting its associated ILoop. The ILoop object identifies the loop, and all loop boundary layers with the same ILoop belong to that loop.
The following figure depicts a loop structure and data flow at the boundary. Loop-invariant tensors can be used directly inside the loop, as shown for FooLayer.

As explained later, a loop can have multiple IIteratorLayer, IRecurrenceLayer, and ILoopOutputLayer layers and, at most, two ITripLimitLayer layers. A loop with no ILoopOutputLayer has no output and is optimized away by TensorRT.
Interior layers are free to use tensors defined inside or outside the loop. The interior can contain other loops (refer to Nested Loops) and other conditional constructs (refer to Nesting and Loops).
To define a loop, first create an ILoop object using the method INetworkDefinition::addLoop. Then, add the boundary and interior layers. The rest of this section describes the features of the boundary layers, using loop to denote the ILoop* returned by INetworkDefinition::addLoop.
ITripLimitLayer supports both counted loops and while loops:

- loop->addTripLimit(t, TripLimit::kCOUNT) creates an ITripLimitLayer whose input t is a 0D INT32 tensor that specifies the number of loop iterations.
- loop->addTripLimit(t, TripLimit::kWHILE) creates an ITripLimitLayer whose input t is a 0D Bool tensor that specifies whether an iteration should occur. Typically, t is either the output of an IRecurrenceLayer or a calculation based on said output.
A loop can have at most one of each kind of limit.
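For example, a counted loop could be set up as in the following sketch. This is a minimal illustration, not a prescribed pattern: the network n and the variable names are assumptions, and the trip-count weights must remain valid until the engine is built.

// Hypothetical sketch: a counted loop that iterates 5 times.
// "n" is assumed to be an existing INetworkDefinition.
static int32_t const tripCount = 5; // must outlive engine building
ITensor* count = n.addConstant(Dims{0, {}},
    Weights{DataType::kINT32, &tripCount, 1})->getOutput(0);
ILoop* loop = n.addLoop();
loop->addTripLimit(*count, TripLimit::kCOUNT); // 0D INT32 tensor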
IIteratorLayer supports iterating forwards or backward over any axis:

- loop->addIterator(t) adds an IIteratorLayer that iterates over axis 0 of tensor t. For example, if the input is the matrix [[2, 3, 5], [4, 6, 8]], the output is the 1D tensor {2, 3, 5} on the first iteration and {4, 6, 8} on the second iteration. It is invalid to iterate beyond the tensor’s bounds.
- loop->addIterator(t, axis) is similar, but the layer iterates over the given axis. For example, if axis=1 and the input is a matrix, each iteration delivers a matrix column.
- loop->addIterator(t, axis, reverse) is similar, but the layer produces its output in reverse order if reverse=true.
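As a sketch, visiting the columns of a matrix from last to first could look like the following, where loop and t are assumed to be defined already:

// Hypothetical sketch: iterate backward over the columns of t.
IIteratorLayer* iter = loop->addIterator(*t, /*axis=*/1, /*reverse=*/true);
ITensor* column = iter->getOutput(0); // one column of t per iteration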
ILoopOutputLayer supports three forms of loop output:

- loop->addLoopOutput(t, LoopOutput::kLAST_VALUE) outputs the last value of t, where t must be the output of an IRecurrenceLayer.
- loop->addLoopOutput(t, LoopOutput::kCONCATENATE, axis) outputs the concatenation of each iteration’s input to t. For example, if the input is a 1D tensor with value {a,b,c} on the first iteration and {d,e,f} on the second iteration, and axis=0, the output is the matrix [[a, b, c], [d, e, f]]. If axis=1, the output is [[a, d], [b, e], [c, f]].
- loop->addLoopOutput(t, LoopOutput::kREVERSE, axis) is similar, but reverses the order.
Both the kCONCATENATE and kREVERSE forms of ILoopOutputLayer require a second input: a 0D INT32 shape tensor specifying the length of the new output dimension. When the length exceeds the number of iterations, the extra elements contain arbitrary values. The second input, for example u, should be set using ILoopOutputLayer::setInput(1, u).
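For example, collecting each iteration’s value of t along a new axis 0 might look like the following sketch, where loop, t, and the 0D INT32 shape tensor u are assumed to be defined already:

// Hypothetical sketch: concatenate per-iteration values of t along axis 0.
ILoopOutputLayer* out = loop->addLoopOutput(*t, LoopOutput::kCONCATENATE, 0);
out->setInput(1, *u); // length of the new output dimension
ITensor* result = out->getOutput(0);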
Finally, there is IRecurrenceLayer. Its first input specifies the initial output value, and its second input specifies the next output value. The first input must come from outside the loop; the second usually comes from inside. For example, the TensorRT analog of this C++ fragment:
for (int32_t i = j; ...; i += k) ...
could be created by these calls, where j and k are ITensor*:
ILoop* loop = n.addLoop();
IRecurrenceLayer* iRec = loop->addRecurrence(*j);
ITensor* i = iRec->getOutput(0);
ITensor* iNext = n.addElementWise(*i, *k,
    ElementWiseOperation::kSUM)->getOutput(0);
iRec->setInput(1, *iNext);
The second input to IRecurrenceLayer is the only case where TensorRT allows a back edge. If such inputs are removed, the remaining network must be acyclic.
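Putting the boundary layers together, the following sketch sums the rows of a 2D tensor x. It is an illustrative assumption, not sample code from TensorRT: n is an INetworkDefinition, count is a 0D INT32 tensor holding the row count, and zeros holds the initial accumulator value.

// Hypothetical sketch: sum the rows of x with a counted loop.
ILoop* loop = n.addLoop();
loop->addTripLimit(*count, TripLimit::kCOUNT);
ITensor* row = loop->addIterator(*x)->getOutput(0);   // one row per iteration
IRecurrenceLayer* acc = loop->addRecurrence(*zeros);  // running sum
ITensor* next = n.addElementWise(*acc->getOutput(0), *row,
    ElementWiseOperation::kSUM)->getOutput(0);
acc->setInput(1, *next);                              // back edge
ITensor* sum = loop->addLoopOutput(*acc->getOutput(0),
    LoopOutput::kLAST_VALUE)->getOutput(0);           // final accumulator value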
Formal Semantics#
TensorRT has applicative semantics, meaning there are no visible side effects besides engine inputs and outputs. Because there are no side effects, intuitions about loops from imperative languages do not always work. This section defines formal semantics for TensorRT’s loop constructs.
The formal semantics is based on lazy sequences of tensors. Each iteration of a loop corresponds to an element in the sequence. The sequence for a tensor X inside the loop is denoted \(\left\langle X_{0},X_{1},X_{2}, ... \right\rangle\). Elements of the sequence are evaluated lazily, meaning as needed.
The output from IIteratorLayer(X) is \(\left\langle X\left[ 0 \right],X\left[ 1 \right],X\left[ 2 \right], ... \right\rangle\), where X[i] denotes subscripting on the axis specified for the IIteratorLayer.
The output from IRecurrenceLayer(X,Y) is \(\left\langle X,Y_{0},Y_{1},Y_{2}, ... \right\rangle\).
The input and output from an ILoopOutputLayer depend on the kind of LoopOutput:
- kLAST_VALUE: Input is a single tensor X, and output is \(X_{n}\) for an n-trip loop.
- kCONCATENATE: The first input is a tensor X, and the second is a scalar shape tensor Y. The result is the concatenation of \(X_{0},X_{1},X_{2}, ... X_{n-1}\) with post padding, if necessary, to the length specified by Y. It is a runtime error if Y < n. Y is a build time constant. Note the inverse relationship with IIteratorLayer: IIteratorLayer maps a tensor to a sequence of subtensors; ILoopOutputLayer with kCONCATENATE maps a sequence of subtensors to a tensor.
- kREVERSE: Similar to kCONCATENATE, but the output is in the reverse direction.
The value of n in the definitions for the output of ILoopOutputLayer is determined by the ITripLimitLayer for the loop:

- For counted loops, it is the iteration count, meaning the input to the ITripLimitLayer.
- For while loops, it is the least n such that \(X_{n}\) is false, where X is the sequence for the ITripLimitLayer input tensor.
The output from a non-loop layer is a sequence-wise application of the layer’s function. For example, for a two-input non-loop layer, \(F\left( X,Y \right)=\left\langle f\left( X_{0},Y_{0} \right),f\left( X_{1},Y_{1} \right),f\left( X_{2},Y_{2} \right), ... \right\rangle\). If a tensor comes from outside the loop, that is, a loop invariant, then the sequence for it is created by replicating the tensor.
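To make these definitions concrete, consider a constructed example (illustrative, not from the API): a 2-trip loop iterates over X = [[2,3,5], [4,6,8]] and doubles each element. IIteratorLayer(X) yields \(\left\langle \left\{ 2,3,5 \right\},\left\{ 4,6,8 \right\} \right\rangle\), the doubling layer maps this sequence to \(\left\langle \left\{ 4,6,10 \right\},\left\{ 8,12,16 \right\} \right\rangle\), and an ILoopOutputLayer with kCONCATENATE, axis 0, and length input 2 maps the sequence back to the matrix [[4, 6, 10], [8, 12, 16]].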
Nested Loops#
TensorRT infers the nesting of the loops from the data flow. For instance, if loop B uses values defined inside loop A, then B is considered to be nested inside of A.
TensorRT rejects networks where the loops are not cleanly nested, such as if loop A uses values defined in the interior of loop B and vice versa.
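For illustration, the following fragment (with assumed names n, outerCount, innerCount, and init) is cleanly nested: the inner loop’s initial recurrence value is produced inside the outer loop, so TensorRT infers that inner is nested in outer.

// Hypothetical fragment; back edges and loop outputs omitted for brevity.
ILoop* outer = n.addLoop();
outer->addTripLimit(*outerCount, TripLimit::kCOUNT);
IRecurrenceLayer* rec = outer->addRecurrence(*init);
ILoop* inner = n.addLoop();
inner->addTripLimit(*innerCount, TripLimit::kCOUNT);
// rec's output is defined inside outer, so inner is nested inside outer.
IRecurrenceLayer* innerRec = inner->addRecurrence(*rec->getOutput(0));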
Limitations#
A loop that refers to more than one dynamic dimension can take an unexpected amount of memory. In a loop, memory is allocated as if all dynamic dimensions take on the maximum value of any of those dimensions. For example, if a loop refers to two tensors with dimensions [4,x,y] and [6,y], memory allocation for those tensors is as if their dimensions were [4,max(x,y),max(x,y)] and [6,max(x,y)].
The input to an ILoopOutputLayer with kLAST_VALUE must be the output from an IRecurrenceLayer.
The loop API supports only FP32 and FP16 precision.
Replacing IRNNv2Layer with Loops#
IRNNv2Layer was deprecated in TensorRT 7.2.1 and was removed in TensorRT 10.0. Use the loop API to synthesize a recurrent sub-network. For an example, refer to sampleCharRNN, method SampleCharRNNLoop::addLSTMCell. Using the loop API, you can express general recurrent networks instead of being limited to the prefabricated cells in IRNNLayer and IRNNv2Layer.
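As a minimal illustration of the idea (not the sampleCharRNN code), the following sketch synthesizes a toy recurrence h[t+1] = relu(h[t] + x[t]) over a sequence tensor. All names (n, xSeq, h0, seqLen, outLen) are assumptions for the sketch.

// Hypothetical sketch: h[t+1] = relu(h[t] + x[t]), scanning axis 0 of xSeq.
ILoop* loop = n.addLoop();
loop->addTripLimit(*seqLen, TripLimit::kCOUNT);      // number of time steps
ITensor* x = loop->addIterator(*xSeq)->getOutput(0); // x[t]
IRecurrenceLayer* h = loop->addRecurrence(*h0);      // h[0] = h0
ITensor* sum = n.addElementWise(*h->getOutput(0), *x,
    ElementWiseOperation::kSUM)->getOutput(0);
ITensor* hNext = n.addActivation(*sum,
    ActivationType::kRELU)->getOutput(0);
h->setInput(1, *hNext);                              // back edge: h[t+1]
ILoopOutputLayer* states =
    loop->addLoopOutput(*hNext, LoopOutput::kCONCATENATE, 0);
states->setInput(1, *outLen);                        // 0D INT32 length tensor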