Working with Loops#

NVIDIA TensorRT supports loop-like constructs, which can be useful for recurrent networks. TensorRT loops support scanning over input tensors, recurrent definitions of tensors, and both scan outputs and last-value outputs.

Defining a Loop#

Loop boundary layers define a loop.

  • ITripLimitLayer specifies how many times the loop iterates.

  • IIteratorLayer enables a loop to iterate over a tensor.

  • IRecurrenceLayer specifies a recurrent definition.

  • ILoopOutputLayer specifies an output from the loop.

Each boundary layer inherits from the class ILoopBoundaryLayer, which has a method getLoop() for getting its associated ILoop. The ILoop object identifies the loop, and all loop boundary layers with the same ILoop belong to that loop.

The following figure depicts a loop structure and data flow at the boundary. Loop-invariant tensors can be used directly inside the loop, as shown for FooLayer.

A TensorRT loop is defined by loop boundary layers. Data flow can leave the loop only through ILoopOutputLayer. The only back edges allowed are the second inputs to IRecurrenceLayer.

As explained later, a loop can have multiple IIteratorLayer, IRecurrenceLayer, and ILoopOutputLayer layers and at most two ITripLimitLayers. A loop with no ILoopOutputLayer has no output and is optimized away by TensorRT.

Interior layers are free to use tensors defined inside or outside the loop. The interior can contain other loops (refer to Nested Loops) and other conditional constructs (refer to Nesting and Loops).

To define a loop, first create an ILoop object using the INetworkDefinition::addLoop method. Then, add the boundary and interior layers. The rest of this section describes the features of the boundary layers, using loop to denote the ILoop* returned by INetworkDefinition::addLoop.

ITripLimitLayer supports both counted loops and while loops.

  • loop->addTripLimit(t,TripLimit::kCOUNT) creates an ITripLimitLayer whose input t is a 0D INT32 tensor that specifies the number of loop iterations.

  • loop->addTripLimit(t,TripLimit::kWHILE) creates an ITripLimitLayer whose input t is a 0D Bool tensor that specifies whether an iteration should occur. Typically, t is either the output of an IRecurrenceLayer or a calculation based on said output.

A loop can have at most one of each kind of limit.
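
For illustration, here is a minimal sketch of adding trip limits. It assumes n is an INetworkDefinition and that count and cond are ITensor* values already in the network (count a 0D INT32 tensor, cond a 0D Bool tensor); the names are hypothetical.

ILoop* loop = n.addLoop();

// Counted loop: iterate exactly the number of times given by count.
loop->addTripLimit(*count, TripLimit::kCOUNT);

// While loop (alternative): iterate while cond is true. cond is typically
// derived from the output of an IRecurrenceLayer inside the loop.
// loop->addTripLimit(*cond, TripLimit::kWHILE);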

IIteratorLayer supports iterating forward or backward over any axis.

  • loop->addIterator(t) adds an IIteratorLayer that iterates over axis 0 of tensor t. For example, if the input is the matrix:

    [[2,3,5],
     [4,6,8]]
    

    The output is the 1D tensor {2, 3, 5} on the first iteration and {4, 6, 8} on the second iteration. It is invalid to iterate beyond the tensor’s bounds.

  • loop->addIterator(t,axis) is similar, but the layer iterates over the given axis. For example, if axis=1 and the input is a matrix, each iteration delivers a matrix column.

  • loop->addIterator(t,axis,reverse) is similar, but the layer produces its output in reverse order if reverse=true.
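
As a sketch of these three forms, assuming t is an ITensor* holding the 2x3 matrix above:

// Deliver one row of t per iteration: {2,3,5}, then {4,6,8}.
IIteratorLayer* rows = loop->addIterator(*t);

// Deliver one column of t per iteration: {2,4}, {3,6}, then {5,8}.
IIteratorLayer* cols = loop->addIterator(*t, 1);

// Deliver the rows in reverse order: {4,6,8}, then {2,3,5}.
IIteratorLayer* revRows = loop->addIterator(*t, 0, true);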

ILoopOutputLayer supports three forms of loop output:

  • loop->addLoopOutput(t,LoopOutput::kLAST_VALUE) outputs the last value of t, where t must be the output of an IRecurrenceLayer.

  • loop->addLoopOutput(t,LoopOutput::kCONCATENATE,axis) outputs the concatenation of each iteration’s input to t. For example, if the input is a 1D tensor, with value {a,b,c} on the first iteration and {d,e,f} on the second iteration, and axis=0, the output is the matrix:

    [[a, b, c],
     [d, e, f]]
    

    If axis=1, the output is:

    [[a, d],
     [b, e],
     [c, f]]
    
  • loop->addLoopOutput(t,LoopOutput::kREVERSE,axis) is similar, but reverses the order.

Both the kCONCATENATE and kREVERSE forms of ILoopOutputLayer require a second input: a 0D INT32 shape tensor specifying the length of the new output dimension. When the length exceeds the number of iterations, the extra elements contain arbitrary values. The second input, for example u, should be set using ILoopOutputLayer::setInput(1,u).
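
For example, a sketch of a kCONCATENATE output, assuming v is an ITensor* computed inside the loop and u is a 0D INT32 shape tensor (both hypothetical names):

ILoopOutputLayer* out = loop->addLoopOutput(*v, LoopOutput::kCONCATENATE, 0);
// The second input gives the length of the new output dimension; it is
// required for kCONCATENATE and kREVERSE, but not for kLAST_VALUE.
out->setInput(1, *u);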

Finally, there is IRecurrenceLayer. Its first input specifies the initial output value, and its second input specifies the next output value. The first input must come from outside the loop; the second usually comes from inside. For example, the TensorRT analog of this C++ fragment:

for (int32_t i = j; ...; i += k) ...

could be created by these calls, where j and k are ITensor*:

ILoop* loop = n.addLoop();
// i's initial value is j.
IRecurrenceLayer* iRec = loop->addRecurrence(*j);
ITensor* i = iRec->getOutput(0);
// Compute i + k for the next iteration.
ITensor* iNext = n.addElementWise(*i, *k,
    ElementWiseOperation::kSUM)->getOutput(0);
// The second input forms the back edge: i's value on the next iteration.
iRec->setInput(1, *iNext);

The second input to IRecurrenceLayer is the only case where TensorRT allows a back edge. If such inputs are removed, the remaining network must be acyclic.
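
Putting the boundary layers together, here is a minimal sketch of a counted loop that sums a tensor x over axis 0. It assumes n is an INetworkDefinition, x is an ITensor*, count is a 0D INT32 tensor holding x's leading dimension, and zero is an ITensor* of zeros shaped like one slice of x (all hypothetical names).

ILoop* loop = n.addLoop();
loop->addTripLimit(*count, TripLimit::kCOUNT);

// Deliver one slice of x along axis 0 per iteration.
ITensor* slice = loop->addIterator(*x)->getOutput(0);

// Running sum: starts at zero; each iteration adds the current slice.
IRecurrenceLayer* sum = loop->addRecurrence(*zero);
ITensor* next = n.addElementWise(*sum->getOutput(0), *slice,
    ElementWiseOperation::kSUM)->getOutput(0);
sum->setInput(1, *next);

// kLAST_VALUE requires the output of the IRecurrenceLayer; after the loop
// runs, the result is the sum of all slices of x.
ITensor* result = loop->addLoopOutput(*sum->getOutput(0),
    LoopOutput::kLAST_VALUE)->getOutput(0);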

Formal Semantics#

TensorRT has applicative semantics, meaning there are no visible side effects besides engine inputs and outputs. Because there are no side effects, intuitions about loops from imperative languages do not always work. This section defines formal semantics for TensorRT’s loop constructs.

The formal semantics is based on lazy sequences of tensors. Each iteration of a loop corresponds to an element in the sequence. The sequence for a tensor X inside the loop is denoted \(\left\langle X_{0},X_{1},X_{2}, ... \right\rangle\). Elements of the sequence are evaluated lazily, meaning as needed.

The output from IIteratorLayer(X) is \(\left\langle X\left[ 0 \right],X\left[ 1 \right],X\left[ 2 \right], ... \right\rangle\) where X[i] denotes subscripting on the axis specified for the IIteratorLayer.

The output from IRecurrenceLayer(X,Y) is \(\left\langle X,Y_{0},Y_{1},Y_{2}, ... \right\rangle\).

The input and output from an ILoopOutputLayer depend on the kind of LoopOutput.

  • kLAST_VALUE: Input is a single tensor X, and output is \(X_{n}\) for an n-trip loop.

  • kCONCATENATE: The first input is a tensor X, and the second is a scalar shape tensor Y. The result is the concatenation of \(X_{0},X_{1},X_{2}, ... ,X_{n-1}\), with post padding, if necessary, to the length specified by Y. It is a runtime error if Y < n. Y is a build time constant. Note the inverse relationship with IIteratorLayer: IIteratorLayer maps a tensor to a sequence of subtensors, while ILoopOutputLayer with kCONCATENATE maps a sequence of subtensors to a tensor.

  • kREVERSE: Similar to kCONCATENATE, but the output is in the reverse direction.

The value of n in the definitions for the output of ILoopOutputLayer is determined by the ITripLimitLayer for the loop:

  • For counted loops, it is the iteration count, meaning the input to the ITripLimitLayer.

  • For while loops, it is the least n such that \(X_{n}\) is false, where X is the sequence for the ITripLimitLayer input tensor.
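
For example, if the sequence for the ITripLimitLayer input tensor is \(\left\langle true,true,false, ... \right\rangle\), then n = 2, the least index at which the condition is false.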

The output from a non-loop layer is a sequence-wise application of the layer’s function. For example, for a two-input non-loop layer, \(F\left( X,Y \right)=\left\langle f\left( X_{0},Y_{0} \right),f\left( X_{1},Y_{1} \right),f\left( X_{2},Y_{2} \right), ... \right\rangle\). If a tensor comes from outside the loop, that is, if it is loop-invariant, then the sequence for it is created by replicating the tensor.
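
As a concrete illustration of these rules (constructed here for exposition), consider a two-trip counted loop over the matrix from Defining a Loop, with an iterator I over axis 0 and a recurrence R whose initial value is {0,0,0} and whose next value is the elementwise sum of R and I:

\(I = \left\langle \{ 2,3,5 \},\{ 4,6,8 \} \right\rangle\)

\(R = \left\langle \{ 0,0,0 \},\{ 2,3,5 \},\{ 6,9,13 \} \right\rangle\)

With n = 2, a kLAST_VALUE output of R yields \(R_{2} = \{ 6,9,13 \}\), the column-wise sum of the matrix.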

Nested Loops#

TensorRT infers the nesting of the loops from the data flow. For instance, if loop B uses values defined inside loop A, then B is considered to be nested inside A.

TensorRT rejects networks where the loops are not cleanly nested, such as if loop A uses values defined in the interior of loop B and vice versa.
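
Because nesting is inferred rather than declared, a hypothetical sketch may help. Assume n is an INetworkDefinition, x is an ITensor* of shape [r, c, d], and rCount and cCount are 0D INT32 tensors holding r and c (all names assumed).

ILoop* outer = n.addLoop();
outer->addTripLimit(*rCount, TripLimit::kCOUNT);
// One [c, d] slice of x per outer trip.
ITensor* slab = outer->addIterator(*x)->getOutput(0);

// The inner loop consumes slab, a value defined inside the outer loop, so
// TensorRT infers that it is nested inside the outer loop.
ILoop* inner = n.addLoop();
inner->addTripLimit(*cCount, TripLimit::kCOUNT);
// One [d] row of slab per inner trip.
ITensor* row = inner->addIterator(*slab)->getOutput(0);

// Each loop needs its own ILoopOutputLayer to produce any output: here the
// inner loop reassembles its slab with the rows in reverse order, and the
// outer loop concatenates the results back into an [r, c, d] tensor.
ILoopOutputLayer* innerOut = inner->addLoopOutput(*row, LoopOutput::kREVERSE, 0);
innerOut->setInput(1, *cCount);
ILoopOutputLayer* outerOut =
    outer->addLoopOutput(*innerOut->getOutput(0), LoopOutput::kCONCATENATE, 0);
outerOut->setInput(1, *rCount);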

Limitations#

A loop that refers to more than one dynamic dimension can take an unexpected amount of memory. In a loop, memory is allocated as if all dynamic dimensions take on the maximum value of any of those dimensions. For example, if a loop refers to two tensors with dimensions [4,x,y] and [6,y], memory allocation for those tensors is as if their dimensions were [4,max(x,y),max(x,y)] and [6,max(x,y)].

The input to an ILoopOutputLayer with kLAST_VALUE must be the output from an IRecurrenceLayer.

The loop API supports only FP32 and FP16 precision.

Replacing IRNNv2Layer with Loops#

IRNNv2Layer was deprecated in TensorRT 7.2.1 and was removed in TensorRT 10.0. Use the loop API to synthesize a recurrent sub-network. For an example, refer to sampleCharRNN, method SampleCharRNNLoop::addLSTMCell. Using the loop API, you can express general recurrent networks instead of being limited to the prefabricated cells in IRNNLayer and IRNNv2Layer.