Layers

PaddingMode

tensorrt.PaddingMode
Enumerates types of padding available in convolution, deconvolution and pooling layers.

Padding mode takes precedence if both padding_mode and pre_padding are set.

EXPLICIT* corresponds to explicit padding.
SAME* implicitly calculates padding such that the output dimensions are the same as the input dimensions. For convolution and pooling, the output dimensions are determined by ceil(input dimensions / stride).
CAFFE* corresponds to symmetric padding.

Members:

EXPLICIT_ROUND_DOWN : Use explicit padding, rounding the output size down

EXPLICIT_ROUND_UP : Use explicit padding, rounding the output size up

SAME_UPPER : Use SAME padding, with pre_padding <= post_padding

SAME_LOWER : Use SAME padding, with pre_padding >= post_padding

CAFFE_ROUND_DOWN : Use CAFFE padding, rounding the output size down

CAFFE_ROUND_UP : Use CAFFE padding, rounding the output size up
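The SAME modes can be illustrated for a single spatial axis. This is a minimal sketch in plain Python, not TensorRT code: the helper name `same_padding` is hypothetical, dilation is ignored, and the total-padding formula is the usual SAME convention, assumed here rather than taken from this page.

```python
import math

def same_padding(input_size, kernel_size, stride, upper=True):
    """Return (output_size, pre_padding, post_padding) for one spatial axis."""
    output_size = math.ceil(input_size / stride)
    # Total padding needed so that every output position sees a full window.
    total = max((output_size - 1) * stride + kernel_size - input_size, 0)
    small, large = total // 2, total - total // 2
    # SAME_UPPER: pre_padding <= post_padding; SAME_LOWER: pre_padding >= post_padding.
    pre, post = (small, large) if upper else (large, small)
    return output_size, pre, post

print(same_padding(6, 3, 2, upper=True))
print(same_padding(6, 3, 2, upper=False))
```

When the total padding is odd, the two modes differ only in which side receives the extra element.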

IConvolutionLayer

class tensorrt.IConvolutionLayer

A convolution layer in an INetworkDefinition.

This layer performs a correlation operation between a 3-dimensional filter and a 4-dimensional tensor to produce another 4-dimensional tensor.

An optional bias argument is supported, which adds a per-channel constant to each value in the output.

Variables
  • kernel_size (DimsHW) – The HW kernel size of the convolution.

  • num_output_maps (int) – The number of output maps for the convolution.

  • stride (DimsHW) – The stride of the convolution. Default: (1, 1)

  • padding (DimsHW) – The padding of the convolution. The input will be zero-padded by this number of elements in the height and width directions. If the padding is asymmetric, this value corresponds to the pre-padding. Default: (0, 0)

  • pre_padding (DimsHW) – The pre-padding. The start of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • post_padding (DimsHW) – The post-padding. The end of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • padding_mode (PaddingMode) – The padding mode. Padding mode takes precedence if both IConvolutionLayer.padding_mode and either IConvolutionLayer.pre_padding or IConvolutionLayer.post_padding are set.

  • num_groups (int) – The number of groups for a convolution. The input tensor channels are divided into this many groups, and a convolution is executed for each group, using a filter per group. The results of the group convolutions are concatenated to form the output. Note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output. Default: 1

  • kernel (Weights) – The kernel weights for the convolution. The weights are specified as a contiguous array in GKCRS order, where G is the number of groups, K the number of output feature maps, C the number of input channels, and R and S are the height and width of the filter.

  • bias (Weights) – The bias weights for the convolution. Bias is optional. To omit bias, set this to an empty Weights object. The bias is applied per-channel, so the number of weights (if non-zero) must be equal to the number of output feature maps.

  • dilation (DimsHW) – The dilation for a convolution. Default: (1, 1)

  • kernel_size_nd (Dims) – The multi-dimension kernel size of the convolution.

  • stride_nd (Dims) – The multi-dimension stride of the convolution. Default: (1, …, 1)

  • padding_nd (Dims) – The multi-dimension padding of the convolution. The input will be zero-padded by this number of elements in each dimension. If the padding is asymmetric, this value corresponds to the pre-padding. Default: (0, …, 0)

  • dilation_nd (Dims) – The multi-dimension dilation for the convolution. Default: (1, …, 1)
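How kernel size, stride, padding and dilation combine into an output size can be sketched for one axis. The helper `conv_output_size` is hypothetical, and the formula is the common explicit-padding relation, assumed here to correspond to the EXPLICIT_ROUND_DOWN and EXPLICIT_ROUND_UP modes.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, pre_padding=0,
                     post_padding=0, dilation=1, round_up=False):
    """Output size of a convolution along one axis with explicit padding."""
    padded = input_size + pre_padding + post_padding
    # A dilated kernel spans 1 + dilation * (kernel_size - 1) input elements.
    dilated_kernel = 1 + dilation * (kernel_size - 1)
    steps = (padded - dilated_kernel) / stride
    return (math.ceil(steps) if round_up else math.floor(steps)) + 1

print(conv_output_size(224, 7, stride=2, pre_padding=3, post_padding=3))
```

The two rounding modes differ only when the stride does not evenly divide the padded extent minus the dilated kernel size.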

IFullyConnectedLayer

class tensorrt.IFullyConnectedLayer

A fully connected layer in an INetworkDefinition.

This layer expects an input tensor of three or more non-batch dimensions. The input is automatically reshaped into an MxV tensor X, where V is the product of the last three dimensions and M is the product of the remaining dimensions (where the product over 0 dimensions is defined as 1). For example:

  • If the input tensor has shape {C, H, W}, then the tensor is reshaped into {1, C*H*W}.

  • If the input tensor has shape {P, C, H, W}, then the tensor is reshaped into {P, C*H*W}.

The layer then performs:

\(Y := matmul(X, W^T) + bias\)

Where X is the MxV tensor defined above, W is the KxV weight tensor of the layer, and bias is a row vector of size K that is broadcast to MxK. K is the number of output channels, and is configurable via IFullyConnectedLayer.num_output_channels. If bias is not specified, it is implicitly 0.

The MxK result Y is then reshaped such that the last three dimensions are {K, 1, 1} and the remaining dimensions match the dimensions of the input tensor. For example:

  • If the input tensor has shape {C, H, W}, then the output tensor will have shape {K, 1, 1}.

  • If the input tensor has shape {P, C, H, W}, then the output tensor will have shape {P, K, 1, 1}.
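The reshaping rules above can be checked with a small shape calculation. This is a sketch in plain Python; `fully_connected_shapes` is a hypothetical helper, not part of the TensorRT API.

```python
from math import prod

def fully_connected_shapes(input_shape, num_output_channels):
    """Return the (M, V) shape of X and the final output shape."""
    if len(input_shape) < 3:
        raise ValueError("expected at least three non-batch dimensions")
    lead = tuple(input_shape[:-3])
    m = prod(lead)              # the product over zero dimensions is 1
    v = prod(input_shape[-3:])  # product of the last three dimensions
    return (m, v), lead + (num_output_channels, 1, 1)

print(fully_connected_shapes((3, 4, 5), 10))
print(fully_connected_shapes((2, 3, 4, 5), 10))
```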

Variables
  • num_output_channels (int) – The number of output channels K from the fully connected layer.

  • kernel (Weights) – The kernel weights, given as a KxC matrix in row-major order.

  • bias (Weights) – The bias weights. Bias is optional. To omit bias, set this to an empty Weights object.

IActivationLayer

tensorrt.ActivationType

The type of activation to perform.

Members:

RELU : Rectified Linear activation

SIGMOID : Sigmoid activation

TANH : Hyperbolic Tangent activation

LEAKY_RELU : Leaky Relu activation: f(x) = x if x >= 0, f(x) = alpha * x if x < 0

ELU : Elu activation: f(x) = x if x >= 0, f(x) = alpha * (exp(x) - 1) if x < 0

SELU : Selu activation: f(x) = beta * x if x > 0, f(x) = beta * (alpha * exp(x) - alpha) if x <= 0

SOFTSIGN : Softsign activation: f(x) = x / (1 + abs(x))

SOFTPLUS : Softplus activation: f(x) = alpha * log(exp(beta * x) + 1)

CLIP : Clip activation: f(x) = max(alpha, min(beta, x))

HARD_SIGMOID : Hard sigmoid activation: f(x) = max(0, min(1, alpha * x + beta))

SCALED_TANH : Scaled Tanh activation: f(x) = alpha * tanh(beta * x)

THRESHOLDED_RELU : Thresholded Relu activation: f(x) = x if x > alpha, f(x) = 0 if x <= alpha
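The formulas above can be written as scalar reference functions. This is a plain-Python sketch for checking values by hand, not TensorRT code; the `ACTIVATIONS` table and `activate` helper are hypothetical names, and the RELU entry uses the usual max(x, 0) definition.

```python
import math

# Every entry takes (x, alpha, beta); entries ignore parameters they do not use.
ACTIVATIONS = {
    "RELU": lambda x, a, b: max(x, 0.0),
    "SIGMOID": lambda x, a, b: 1.0 / (1.0 + math.exp(-x)),
    "TANH": lambda x, a, b: math.tanh(x),
    "LEAKY_RELU": lambda x, a, b: x if x >= 0 else a * x,
    "ELU": lambda x, a, b: x if x >= 0 else a * (math.exp(x) - 1),
    "SELU": lambda x, a, b: b * x if x > 0 else b * (a * math.exp(x) - a),
    "SOFTSIGN": lambda x, a, b: x / (1 + abs(x)),
    "SOFTPLUS": lambda x, a, b: a * math.log(math.exp(b * x) + 1),
    "CLIP": lambda x, a, b: max(a, min(b, x)),
    "HARD_SIGMOID": lambda x, a, b: max(0.0, min(1.0, a * x + b)),
    "SCALED_TANH": lambda x, a, b: a * math.tanh(b * x),
    "THRESHOLDED_RELU": lambda x, a, b: x if x > a else 0.0,
}

def activate(kind, x, alpha=0.0, beta=0.0):
    """Evaluate one activation on a scalar."""
    return ACTIVATIONS[kind](x, alpha, beta)

print(activate("CLIP", 5.0, alpha=0.0, beta=1.0))
```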

class tensorrt.IActivationLayer

An Activation layer in an INetworkDefinition. This layer applies a per-element activation function to its input. The output has the same shape as the input.

Variables
  • type (ActivationType) – The type of activation to be performed.

  • alpha (float) – The alpha parameter that is used by some parametric activations (LEAKY_RELU, ELU, SELU, SOFTPLUS, CLIP, HARD_SIGMOID, SCALED_TANH, THRESHOLDED_RELU). Other activations ignore this parameter.

  • beta (float) – The beta parameter that is used by some parametric activations (SELU, SOFTPLUS, CLIP, HARD_SIGMOID, SCALED_TANH). Other activations ignore this parameter.

IPoolingLayer

tensorrt.PoolingType

The type of pooling to perform in a pooling layer.

Members:

MAX : Maximum over elements

AVERAGE : Average over elements. If the tensor is padded, the count includes the padding.

MAX_AVERAGE_BLEND : Blending between the max pooling and average pooling: (1-blendFactor)*maxPool + blendFactor*avgPool

class tensorrt.IPoolingLayer

A Pooling layer in an INetworkDefinition. The layer applies a reduction operation within a window over the input.

Variables
  • type (PoolingType) – The type of pooling to be performed.

  • window_size (DimsHW) – The window size for pooling.

  • stride (DimsHW) – The stride for pooling. Default: (1, 1)

  • padding (DimsHW) – The padding for pooling. Default: (0, 0)

  • pre_padding (DimsHW) – The pre-padding. The start of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • post_padding (DimsHW) – The post-padding. The end of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • padding_mode (PaddingMode) – The padding mode. Padding mode takes precedence if both IPoolingLayer.padding_mode and either IPoolingLayer.pre_padding or IPoolingLayer.post_padding are set.

  • blend_factor (float) – The blending factor for the max_average_blend mode: \(max_average_blendPool = (1-blendFactor)*maxPool + blendFactor*avgPool\). blend_factor is a user value in [0,1] with the default value of 0.0. This value only applies for the PoolingType.MAX_AVERAGE_BLEND mode.

  • average_count_excludes_padding (bool) – Whether average pooling uses as a denominator the overlap area between the window and the unpadded input. If this is not set, the denominator is the overlap between the pooling window and the padded input. Default: True

  • window_size_nd (Dims) – The multi-dimension window size for pooling.

  • stride_nd (Dims) – The multi-dimension stride for pooling. Default: (1, …, 1)

  • padding_nd (Dims) – The multi-dimension padding for pooling. Default: (0, …, 0)
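The effect of blend_factor and average_count_excludes_padding on a single pooling window can be sketched as follows. The helper names are hypothetical. The sketch assumes padded positions contribute 0 to the sum and are never chosen as the maximum.

```python
def average_pool_window(values, num_padded, exclude_padding=True):
    """Average one window; `values` holds only the unpadded elements."""
    count = len(values) if exclude_padding else len(values) + num_padded
    return sum(values) / count

def blend_pool_window(values, num_padded, blend_factor, exclude_padding=True):
    """(1 - blendFactor) * maxPool + blendFactor * avgPool for one window."""
    max_pool = max(values)
    avg_pool = average_pool_window(values, num_padded, exclude_padding)
    return (1 - blend_factor) * max_pool + blend_factor * avg_pool

print(average_pool_window([2, 4, 6], num_padded=1, exclude_padding=True))
print(average_pool_window([2, 4, 6], num_padded=1, exclude_padding=False))
print(blend_pool_window([1, 2, 3, 6], num_padded=0, blend_factor=0.25))
```

A blend factor of 0.0 reduces to max pooling and 1.0 to average pooling.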

ILRNLayer

class tensorrt.ILRNLayer

An LRN layer in an INetworkDefinition. The output size is the same as the input size.

Variables
  • window_size (int) – The LRN window size. The window size must be odd and in the range of [1, 15].

  • alpha (float) – The LRN alpha value. The valid range is [-1e20, 1e20].

  • beta (float) – The LRN beta value. The valid range is [0.01, 1e5].

  • k (float) – The LRN K value. The valid range is [1e-5, 1e10].

IScaleLayer

tensorrt.ScaleMode

Controls how scale is applied in a Scale layer.

Members:

UNIFORM : Identical coefficients across all elements of the tensor.

CHANNEL : Per-channel coefficients. The channel dimension is assumed to be the third to last dimension.

ELEMENTWISE : Elementwise coefficients.

class tensorrt.IScaleLayer

A Scale layer in an INetworkDefinition.

This layer applies a per-element computation to its input:

\(output = (input * scale + shift) ^ power\)

The coefficients can be applied on a per-tensor, per-channel, or per-element basis.

Note: If the number of weights is 0, then a default value is used for shift, power, and scale. The default shift is 0, the default power is 1, and the default scale is 1.

The output size is the same as the input size.

Note: The input tensor for this layer is required to have a minimum of 3 dimensions.

Variables
  • mode (ScaleMode) – The scale mode.

  • shift (Weights) – The shift value.

  • scale (Weights) – The scale value.

  • power (Weights) – The power value.

  • channel_axis (int) – The channel axis.
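The per-element computation in ScaleMode.CHANNEL can be sketched with nested lists. The helper `scale_channelwise` is hypothetical, and each channel is represented as a flat list of values purely for illustration.

```python
def scale_channelwise(data, scale, shift, power):
    """Apply (x * scale[c] + shift[c]) ** power[c] to every x in channel c."""
    return [
        [(x * scale[c] + shift[c]) ** power[c] for x in channel]
        for c, channel in enumerate(data)
    ]

data = [[1.0, 2.0], [3.0, 4.0]]  # two channels, two elements each
print(scale_channelwise(data, scale=[2.0, 3.0], shift=[1.0, 0.0], power=[1.0, 2.0]))
```

UNIFORM mode would use one coefficient triple for the whole tensor, and ELEMENTWISE mode one triple per element.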

ISoftMaxLayer

class tensorrt.ISoftMaxLayer

A Softmax layer in an INetworkDefinition.

This layer applies a per-channel softmax to its input.

The output size is the same as the input size.

Variables

axes (int) – The axis along which softmax is computed. Currently, only one axis can be set.

The axis is specified as a bit mask: set the bit corresponding to the axis to 1.
For example, consider an NCHW tensor as input (three non-batch dimensions).

In implicit batch mode:
Bit 0 corresponds to the C dimension.
Bit 1 corresponds to the H dimension.
Bit 2 corresponds to the W dimension.

By default, softmax is performed on the axis whose index is the number of non-batch axes minus three, or 0 if there are fewer than 3 non-batch axes. For example, if the input is NCHW, the default axis is C. If the input is NHW, then the default axis is H.

In explicit batch mode:
Bit 0 corresponds to the N dimension.
Bit 1 corresponds to the C dimension.
Bit 2 corresponds to the H dimension.
Bit 3 corresponds to the W dimension.

By default, softmax is performed on the axis whose index is the number of axes minus three, or 0 if there are fewer than 3 axes. For example, if the input is NCHW, the default axis is C. If the input is NHW, then the default axis is N.

For example, to perform softmax on axis R of an NPQRCHW input, set bit 2 in implicit batch mode, or bit 3 in explicit batch mode.
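The bit-mask convention and the default-axis rule can be reproduced with a few lines of plain Python. Both helpers are hypothetical, and layouts are written as strings whose first letter is the batch dimension.

```python
def softmax_axes_mask(layout, axis_name, implicit_batch):
    """Bit mask selecting `axis_name`, e.g. layout='NPQRCHW', axis_name='R'."""
    dims = layout[1:] if implicit_batch else layout  # implicit mode drops N
    return 1 << dims.index(axis_name)

def default_softmax_axis(layout, implicit_batch):
    """Name of the axis used when no axis is set explicitly."""
    dims = layout[1:] if implicit_batch else layout
    return dims[max(0, len(dims) - 3)]

print(softmax_axes_mask("NPQRCHW", "R", implicit_batch=True))   # bit 2 set
print(softmax_axes_mask("NPQRCHW", "R", implicit_batch=False))  # bit 3 set
```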

IConcatenationLayer

class tensorrt.IConcatenationLayer

A concatenation layer in an INetworkDefinition.

The output channel size is the sum of the channel sizes of the inputs. The other output sizes are the same as the other input sizes, which must all match.

Variables

axis (int) – The axis along which concatenation occurs. 0 is the major axis (excluding the batch dimension). The default is the number of non-batch axes in the tensor minus three (e.g. for an NCHW input it would be 0), or 0 if there are fewer than 3 non-batch axes.
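The output-shape rule for concatenation can be sketched as a shape calculation. The helper `concatenation_shape` is hypothetical and works on non-batch shapes given as tuples.

```python
def concatenation_shape(shapes, axis):
    """Output shape when concatenating tensors of the given shapes along `axis`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != axis and a != b:
                raise ValueError("non-concatenated dimensions must match")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)  # sizes add up along the axis
    return tuple(out)

print(concatenation_shape([(3, 4, 4), (5, 4, 4)], axis=0))
```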

IDeconvolutionLayer

class tensorrt.IDeconvolutionLayer

A deconvolution layer in an INetworkDefinition.

Variables
  • kernel_size (DimsHW) – The HW kernel size of the deconvolution.

  • num_output_maps (int) – The number of output feature maps for the deconvolution.

  • stride (DimsHW) – The stride of the deconvolution. Default: (1, 1)

  • padding (DimsHW) – The padding of the deconvolution. The input will be zero-padded by this number of elements in the height and width directions. Padding is symmetric. Default: (0, 0)

  • pre_padding (DimsHW) – The pre-padding. The start of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • post_padding (DimsHW) – The post-padding. The end of input will be zero-padded by this number of elements in the height and width directions. Default: (0, 0)

  • padding_mode (PaddingMode) – The padding mode. Padding mode takes precedence if both IDeconvolutionLayer.padding_mode and either IDeconvolutionLayer.pre_padding or IDeconvolutionLayer.post_padding are set.

  • num_groups (int) – The number of groups for a deconvolution. The input tensor channels are divided into this many groups, and a deconvolution is executed for each group, using a filter per group. The results of the group deconvolutions are concatenated to form the output. Note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output. Default: 1

  • kernel (Weights) – The kernel weights for the deconvolution. The weights are specified as a contiguous array in CKRS order, where C is the number of input channels, K the number of output feature maps, and R and S are the height and width of the filter.

  • bias (Weights) – The bias weights for the deconvolution. Bias is optional. To omit bias, set this to an empty Weights object. The bias is applied per-feature-map, so the number of weights (if non-zero) must be equal to the number of output feature maps.

  • kernel_size_nd (Dims) – The multi-dimension kernel size of the deconvolution.

  • stride_nd (Dims) – The multi-dimension stride of the deconvolution. Default: (1, …, 1)

  • padding_nd (Dims) – The multi-dimension padding of the deconvolution. The input will be zero-padded by this number of elements in each dimension. Padding is symmetric. Default: (0, …, 0)

IElementWiseLayer

tensorrt.ElementWiseOperation

The binary operations that may be performed by an ElementWise layer.

Members:

SUM : Sum of the two elements

PROD : Product of the two elements

MAX : Max of the two elements

MIN : Min of the two elements

SUB : Subtract the second element from the first

DIV : Divide the first element by the second

POW : The first element to the power of the second element

FLOOR_DIV : Floor division of the first element by the second

AND : Logical AND of two elements

OR : Logical OR of two elements

XOR : Logical XOR of two elements

EQUAL : Check if two elements are equal

GREATER : Check if element in first tensor is greater than corresponding element in second tensor

LESS : Check if element in first tensor is less than corresponding element in second tensor

class tensorrt.IElementWiseLayer

An elementwise layer in an INetworkDefinition.

This layer applies a per-element binary operation between corresponding elements of two tensors.

The input dimensions of the two input tensors must be equal, and the output tensor is the same size as each input.

Variables

op (ElementWiseOperation) – The binary operation for the layer.
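Several of the ElementWiseOperation members map directly onto Python operators, which makes the operand order easy to check. This is a plain-Python sketch over flat lists; the `ELEMENTWISE_OPS` table and `elementwise` helper are hypothetical, and the logical operations are omitted.

```python
import operator

ELEMENTWISE_OPS = {
    "SUM": operator.add,
    "PROD": operator.mul,
    "MAX": max,
    "MIN": min,
    "SUB": operator.sub,             # first - second
    "DIV": operator.truediv,         # first / second
    "POW": operator.pow,             # first ** second
    "FLOOR_DIV": operator.floordiv,  # floor(first / second)
    "EQUAL": operator.eq,
    "GREATER": operator.gt,          # first > second
    "LESS": operator.lt,             # first < second
}

def elementwise(op, first, second):
    """Apply a binary operation to corresponding elements of two lists."""
    if len(first) != len(second):
        raise ValueError("inputs must have equal dimensions")
    fn = ELEMENTWISE_OPS[op]
    return [fn(a, b) for a, b in zip(first, second)]

print(elementwise("SUB", [5, 7], [2, 10]))
```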

IGatherLayer

class tensorrt.IGatherLayer

A gather layer in an INetworkDefinition.

Variables
  • axis (int) – The non-batch dimension axis to gather on. The axis must be less than the number of non-batch dimensions in the data input.

  • num_elementwise_dims (int) – The number of leading dimensions of the indices tensor to be handled elementwise. For GatherMode.DEFAULT, it must be 0 if there is an implicit batch dimension. It can be 0 or 1 if there is not an implicit batch dimension. For GatherMode.ND, it can be between 0 and one less than rank(data). For GatherMode.ELEMENT, it must be 0.

  • mode (GatherMode) – The gather mode.

RNN Layers

tensorrt.RNNOperation

The RNN operations that may be performed by an RNN layer.

Equation definitions

In the equations below, we use the following naming convention:

t := current time step
i := input gate
o := output gate
f := forget gate
z := update gate
r := reset gate
c := cell gate
h := hidden gate
g[t] denotes the output of gate g at timestep t, e.g. f[t] is the output of the forget gate f.
X[t] := input tensor for timestep t
C[t] := cell state for timestep t
H[t] := hidden state for timestep t
W[g] := W (input) parameter weight matrix for gate g
R[g] := U (recurrent) parameter weight matrix for gate g
Wb[g] := W (input) parameter bias vector for gate g
Rb[g] := U (recurrent) parameter bias vector for gate g

Unless otherwise specified, all operations apply pointwise to elements of each operand tensor.

ReLU(X) := max(X, 0)
tanh(X) := hyperbolic tangent of X
sigmoid(X) := 1 / (1 + exp(-X))
exp(X) := e^X
A.B denotes matrix multiplication of A and B.
A*B denotes pointwise multiplication of A and B.

Equations

Depending on the value of RNNOperation chosen, each sub-layer of the RNN layer will perform one of the following operations:

RELU

\(H[t] := ReLU(W[i].X[t] + R[i].H[t-1] + Wb[i] + Rb[i])\)

TANH

\(H[t] := tanh(W[i].X[t] + R[i].H[t-1] + Wb[i] + Rb[i])\)

LSTM

\(i[t] := sigmoid(W[i].X[t] + R[i].H[t-1] + Wb[i] + Rb[i])\)
\(f[t] := sigmoid(W[f].X[t] + R[f].H[t-1] + Wb[f] + Rb[f])\)
\(o[t] := sigmoid(W[o].X[t] + R[o].H[t-1] + Wb[o] + Rb[o])\)
\(c[t] := tanh(W[c].X[t] + R[c].H[t-1] + Wb[c] + Rb[c])\)
\(C[t] := f[t]*C[t-1] + i[t]*c[t]\)
\(H[t] := o[t]*tanh(C[t])\)

GRU

\(z[t] := sigmoid(W[z].X[t] + R[z].H[t-1] + Wb[z] + Rb[z])\)
\(r[t] := sigmoid(W[r].X[t] + R[r].H[t-1] + Wb[r] + Rb[r])\)
\(h[t] := tanh(W[h].X[t] + r[t]*(R[h].H[t-1] + Rb[h]) + Wb[h])\)
\(H[t] := (1 - z[t])*h[t] + z[t]*H[t-1]\)
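The LSTM equations can be traced with a single time step in plain Python. This is a sketch for following the arithmetic, not TensorRT code: `matvec`, `gate` and `lstm_step` are hypothetical helpers, and the parameters for each gate are passed as a tuple (W, R, Wb, Rb) of nested lists.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product, written A.B in the equations."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(act, W, R, Wb, Rb, x, h_prev):
    """act(W.X[t] + R.H[t-1] + Wb + Rb), applied pointwise."""
    wx, rh = matvec(W, x), matvec(R, h_prev)
    return [act(a + b + c + d) for a, b, c, d in zip(wx, rh, Wb, Rb)]

def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step; params maps 'i', 'f', 'o', 'c' to (W, R, Wb, Rb)."""
    i = gate(sigmoid, *params["i"], x, h_prev)
    f = gate(sigmoid, *params["f"], x, h_prev)
    o = gate(sigmoid, *params["o"], x, h_prev)
    c = gate(math.tanh, *params["c"], x, h_prev)
    # C[t] := f[t]*C[t-1] + i[t]*c[t]
    c_new = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c)]
    # H[t] := o[t]*tanh(C[t])
    h_new = [ot * math.tanh(cn) for ot, cn in zip(o, c_new)]
    return h_new, c_new

zero = ([[0.0]], [[0.0]], [0.0], [0.0])  # input length 1, hidden size 1
params = {name: zero for name in "ifoc"}
print(lstm_step(params, x=[1.0], h_prev=[0.0], c_prev=[2.0]))
```

With all parameters set to zero, each sigmoid gate outputs 0.5 and the cell gate outputs 0, so the cell state is simply halved.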

Members:

RELU : Single gate RNN w/ ReLU activation

TANH : Single gate RNN w/ TANH activation

LSTM : Four-gate LSTM network w/o peephole connections

GRU : Three-gate network consisting of Gated Recurrent Units

tensorrt.RNNDirection

The RNN direction that may be performed by an RNN layer.

Members:

UNIDIRECTION : Network iterates from first input to last input

BIDIRECTION : Network iterates from first to last (and vice versa) and the outputs are concatenated

tensorrt.RNNInputMode

The RNN input modes that may occur with an RNN layer.

If the RNN is configured with RNNInputMode.LINEAR, then for each gate g in the first layer of the RNN, the input vector X[t] (length E) is left-multiplied by the gate’s corresponding weight matrix W[g] (dimensions HxE) as usual, before being used to compute the gate output as described by RNNOperation.

If the RNN is configured with RNNInputMode.SKIP, then this initial matrix multiplication is “skipped” and W[g] is conceptually an identity matrix. In this case, the input vector X[t] must have length H (the size of the hidden state).

Members:

LINEAR : Perform the normal matrix multiplication in the first recurrent layer

SKIP : No operation is performed on the first recurrent layer

IRNNv2Layer

tensorrt.RNNGateType

The gate types that may occur within an RNN layer. See RNNOperation for the equations in which each gate is used.

Members:

INPUT : Input Gate

OUTPUT : Output Gate

FORGET : Forget Gate

UPDATE : Update Gate

RESET : Reset Gate

CELL : Cell Gate

HIDDEN : Hidden Gate

class tensorrt.IRNNv2Layer

An RNN layer in an INetworkDefinition, version 2.

Variables
  • num_layers (int) – The layer count of the RNN.

  • hidden_size (int) – The hidden size of the RNN.

  • max_seq_length (int) – The maximum sequence length of the RNN.

  • data_length (int) – The length of the data being processed by the RNN for use in computing other values.

  • seq_lengths (ITensor) – Individual sequence lengths in the batch. The seq_lengths ITensor should be a {N1, …, Np} tensor, where N1..Np are the index dimensions of the input tensor to the RNN. If seq_lengths is not specified, then the RNN layer assumes all sequences are of size max_seq_length. All sequence lengths in seq_lengths should be in the range [1, max_seq_length]. Zero-length sequences are not supported. This tensor must be of type int32.

  • op (RNNOperation) – The operation of the RNN layer.

  • input_mode (int) – The input mode of the RNN layer.

  • direction (int) – The direction of the RNN layer.

  • hidden_state (ITensor) – The initial hidden state of the RNN. The hidden_state ITensor should have the dimensions {N1, …, Np, L, H}, where N1..Np are the index dimensions specified by the input tensor; L is the number of layers in the RNN, equal to num_layers; and H is the hidden state for each layer, equal to hidden_size if direction is RNNDirection.UNIDIRECTION, and 2x hidden_size otherwise.

  • cell_state (ITensor) – The initial cell state of the LSTM. The cell_state ITensor should have the dimensions {N1, …, Np, L, H}, where N1..Np are the index dimensions specified by the input tensor; L is the number of layers in the RNN, equal to num_layers; and H is the hidden state for each layer, equal to hidden_size if direction is RNNDirection.UNIDIRECTION, and 2x hidden_size otherwise. It is an error to set this on an RNN layer that is not configured with RNNOperation.LSTM.

get_bias_for_gate(self: tensorrt.tensorrt.IRNNv2Layer, layer_index: int, gate: tensorrt.tensorrt.RNNGateType, is_w: bool) → numpy.ndarray

Get the bias parameters for an individual gate in the RNN.

Parameters
  • layer_index – The index of the layer that contains this gate.

  • gate – The name of the gate within the RNN layer.

  • is_w – True if the bias parameters are for the input bias Wb[g] and False if they are for the recurrent input bias Rb[g].

Returns

The bias parameters.

get_weights_for_gate(self: tensorrt.tensorrt.IRNNv2Layer, layer_index: int, gate: tensorrt.tensorrt.RNNGateType, is_w: bool) → numpy.ndarray

Get the weight parameters for an individual gate in the RNN.

Parameters
  • layer_index – The index of the layer that contains this gate.

  • gate – The name of the gate within the RNN layer.

  • is_w – True if the weight parameters are for the input matrix W[g] and False if they are for the recurrent input matrix R[g].

Returns

The weight parameters.

set_bias_for_gate(self: tensorrt.tensorrt.IRNNv2Layer, layer_index: int, gate: tensorrt.tensorrt.RNNGateType, is_w: bool, bias: tensorrt.tensorrt.Weights) → None

Set the bias parameters for an individual gate in the RNN.

Parameters
  • layer_index – The index of the layer that contains this gate.

  • gate – The name of the gate within the RNN layer. The gate name must correspond to one of the gates used by this layer’s RNNOperation.

  • is_w – True if the bias parameters are for the input bias Wb[g] and False if they are for the recurrent input bias Rb[g]. See RNNOperation for equations showing how these bias vectors are used in the RNN gate.

  • bias – The weight structure holding the bias parameters, which should be an array of size hidden_size.

set_weights_for_gate(self: tensorrt.tensorrt.IRNNv2Layer, layer_index: int, gate: tensorrt.tensorrt.RNNGateType, is_w: bool, weights: tensorrt.tensorrt.Weights) → None

Set the weight parameters for an individual gate in the RNN.

Parameters
  • layer_index – The index of the layer that contains this gate.

  • gate – The name of the gate within the RNN layer. The gate name must correspond to one of the gates used by this layer’s RNNOperation.

  • is_w – True if the weight parameters are for the input matrix W[g] and False if they are for the recurrent input matrix R[g]. See RNNOperation for equations showing how these matrices are used in the RNN gate.

  • weights – The weight structure holding the weight parameters, which are stored as a row-major 2D matrix. For more information, see IRNNv2Layer::setWeights().

IPluginV2Layer

class tensorrt.IPluginV2Layer

A plugin layer in an INetworkDefinition.

Variables

plugin (IPluginV2) – The plugin for the layer.

IUnaryLayer

tensorrt.UnaryOperation

The unary operations that may be performed by a Unary layer.

Members:

EXP : Exponentiation

LOG : Log (base e)

SQRT : Square root

RECIP : Reciprocal

ABS : Absolute value

NEG : Negation

SIN : Sine

COS : Cosine

TAN : Tangent

SINH : Hyperbolic sine

COSH : Hyperbolic cosine

ASIN : Inverse sine

ACOS : Inverse cosine

ATAN : Inverse tangent

ASINH : Inverse hyperbolic sine

ACOSH : Inverse hyperbolic cosine

ATANH : Inverse hyperbolic tangent

CEIL : Ceiling

FLOOR : Floor

ERF : Gauss error function

NOT : Not

SIGN : Sign. If input > 0, output 1; if input < 0, output -1; if input == 0, output 0.

ROUND : Round to nearest even for float datatype.
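The SIGN and ROUND members have behaviour that is easy to misread, so a scalar sketch may help. The helper names are hypothetical. Python's built-in round() also rounds halves to the nearest even integer, so it is used here to illustrate the ROUND rule.

```python
def sign(x):
    """1 if x > 0, -1 if x < 0, and 0 if x == 0."""
    return (x > 0) - (x < 0)

def round_half_even(x):
    """Round to the nearest integer, with ties going to the even neighbour."""
    return round(x)

print([sign(v) for v in (-3.5, 0, 2)])
print([round_half_even(v) for v in (0.5, 1.5, 2.5)])
```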

class tensorrt.IUnaryLayer

A unary layer in an INetworkDefinition.

Variables

op (UnaryOperation) – The unary operation for the layer.

IReduceLayer

tensorrt.ReduceOperation

The reduce operations that may be performed by a Reduce layer.

Members:

SUM :

PROD :

MAX :

MIN :

AVG :

class tensorrt.IReduceLayer

A reduce layer in an INetworkDefinition.

Variables
  • op (ReduceOperation) – The reduce operation for the layer.

  • axes (int) – The axes over which to reduce.

  • keep_dims (bool) – Specifies whether or not to keep the reduced dimensions for the layer.
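The interaction between axes and keep_dims can be sketched as a shape calculation. The helper `reduce_output_shape` is hypothetical, and the sketch assumes axes is a bit mask in which bit i selects dimension i.

```python
def reduce_output_shape(shape, axes, keep_dims):
    """Output shape after reducing the dimensions selected by the `axes` mask."""
    out = []
    for i, dim in enumerate(shape):
        if axes & (1 << i):
            if keep_dims:
                out.append(1)  # reduced dimension is kept with length 1
        else:
            out.append(dim)
    return tuple(out)

print(reduce_output_shape((2, 3, 4), axes=0b010, keep_dims=True))
print(reduce_output_shape((2, 3, 4), axes=0b010, keep_dims=False))
```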

IPaddingLayer

class tensorrt.IPaddingLayer

A padding layer in an INetworkDefinition.

Variables
  • pre_padding (DimsHW) – The padding that is applied at the start of the tensor. Negative padding results in trimming the edge by the specified amount.

  • post_padding (DimsHW) – The padding that is applied at the end of the tensor. Negative padding results in trimming the edge by the specified amount.

  • pre_padding_nd (Dims) – The padding that is applied at the start of the tensor. Negative padding results in trimming the edge by the specified amount. Only 2 dimensions currently supported.

  • post_padding_nd (Dims) – The padding that is applied at the end of the tensor. Negative padding results in trimming the edge by the specified amount. Only 2 dimensions currently supported.
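Positive and negative padding can be combined. The effect on the height and width is sketched below with a hypothetical helper, `padded_hw`.

```python
def padded_hw(hw, pre_padding, post_padding):
    """Height and width after padding; negative values trim the edge."""
    out = tuple(d + pre + post for d, pre, post in zip(hw, pre_padding, post_padding))
    if any(d < 0 for d in out):
        raise ValueError("trimming removes more elements than the tensor has")
    return out

print(padded_hw((5, 5), pre_padding=(1, -1), post_padding=(0, 2)))
```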

IParametricReLULayer

class tensorrt.IParametricReLULayer

A parametric ReLU layer in an INetworkDefinition.

This layer applies a parametric ReLU activation to an input tensor (first input), with slopes taken from a slopes tensor (second input). This can be viewed as a leaky ReLU operation where the negative slope differs from element to element (and can in fact be learned).

The slopes tensor must be unidirectional broadcastable to the input tensor: the rank of the two tensors must be the same, and all dimensions of the slopes tensor must either equal the input tensor or be 1. The output tensor has the same shape as the input tensor.
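The broadcasting rule for the slopes tensor can be sketched on 2-dimensional nested lists. The helper `parametric_relu` is hypothetical; each slopes dimension either matches the input or has length 1.

```python
def parametric_relu(data, slopes):
    """f(x) = x if x >= 0 else slope * x, with slopes broadcast to data."""
    out = []
    for r, row in enumerate(data):
        slope_row = slopes[r if len(slopes) > 1 else 0]
        out.append([
            x if x >= 0 else slope_row[c if len(slope_row) > 1 else 0] * x
            for c, x in enumerate(row)
        ])
    return out

data = [[-1.0, 2.0], [-4.0, -6.0]]
slopes = [[0.5], [0.25]]  # one slope per row, broadcast along the columns
print(parametric_relu(data, slopes))
```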

ISelectLayer

class tensorrt.ISelectLayer

A select layer in an INetworkDefinition.

This layer implements an element-wise ternary conditional operation. Wherever condition is True, elements are taken from the first input, and wherever condition is False, elements are taken from the second input.
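The ternary selection can be written in one line over flat lists. The helper `select` is hypothetical and only illustrates which input each output element is taken from.

```python
def select(condition, then_values, else_values):
    """Take from then_values where condition is True, else from else_values."""
    return [t if c else e for c, t, e in zip(condition, then_values, else_values)]

print(select([True, False, True], [1, 2, 3], [10, 20, 30]))
```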

IShuffleLayer

class tensorrt.Permutation(*args, **kwargs)

The elements of the permutation. The permutation is applied as outputDimensionIndex = permutation[inputDimensionIndex], so to permute from CHW order to HWC order, the required permutation is [1, 2, 0], and to permute from HWC to CHW, the required permutation is [2, 0, 1].

It supports iteration and indexing and is implicitly convertible to/from Python iterables (like tuple or list). Therefore, you can use those classes in place of Permutation.

Overloaded function.

  1. __init__(self: tensorrt.tensorrt.Permutation) -> None

  2. __init__(self: tensorrt.tensorrt.Permutation, arg0: List[int]) -> None
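The two permutations quoted above can be checked with a short sketch. It assumes the convention those examples imply, namely that output dimension i is taken from input dimension permutation[i]. The helper `permute_dims` is hypothetical.

```python
def permute_dims(dims, permutation):
    """Reorder `dims` so that output dimension i is dims[permutation[i]]."""
    if sorted(permutation) != list(range(len(dims))):
        raise ValueError("not a valid permutation of the dimension indices")
    return tuple(dims[p] for p in permutation)

print(permute_dims(("C", "H", "W"), [1, 2, 0]))
print(permute_dims(("H", "W", "C"), [2, 0, 1]))
```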

class tensorrt.IShuffleLayer

A shuffle layer in an INetworkDefinition.

This class shuffles data by applying in sequence: a transpose operation, a reshape operation and a second transpose operation. The dimension types of the output are those of the reshape dimension.

Variables
  • first_transpose (Permutation) – The permutation applied by the first transpose operation. Default: Identity Permutation

  • reshape_dims (Dims) – The reshaped dimensions. Two special values can be used as dimensions. Value 0 copies the corresponding dimension from input. This special value can be used more than once in the dimensions. If the number of reshape dimensions is less than the input, 0s are resolved by aligning the most significant dimensions of input. Value -1 infers that particular dimension by looking at the input and the rest of the reshape dimensions. Note that only a maximum of one dimension is permitted to be specified as -1. The product of the new dimensions must be equal to the product of the old.

  • second_transpose (Permutation) – The permutation applied by the second transpose operation. Default: Identity Permutation

  • zero_is_placeholder (bool) – The meaning of 0 in reshape dimensions. If true, then a 0 in the reshape dimensions denotes copying the corresponding dimension from the first input tensor. If false, then a 0 in the reshape dimensions denotes a zero-length dimension.
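The handling of the special values 0 and -1 in reshape_dims can be sketched as a small resolver. The helper `resolve_reshape_dims` is hypothetical and follows the description above: a 0 copies the input dimension at the same position, and a single -1 is inferred from the remaining volume.

```python
from math import prod

def resolve_reshape_dims(input_dims, reshape_dims, zero_is_placeholder=True):
    """Replace 0 and -1 in reshape_dims with concrete sizes."""
    if list(reshape_dims).count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    out = []
    for i, d in enumerate(reshape_dims):
        if d == 0 and zero_is_placeholder:
            out.append(input_dims[i])  # copy the input dimension at position i
        else:
            out.append(d)
    volume = prod(input_dims)
    if -1 in out:
        known = prod(d for d in out if d != -1)
        if known == 0 or volume % known:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = volume // known
    if prod(out) != volume:
        raise ValueError("reshape must preserve the number of elements")
    return tuple(out)

print(resolve_reshape_dims((2, 3, 4, 5), (0, -1, 5)))
```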

set_input(self: tensorrt.tensorrt.IShuffleLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Sets the input tensor for the given index. The index must be 0 for a static shuffle layer. A static shuffle layer is converted to a dynamic shuffle layer by calling set_input() with an index 1. A dynamic shuffle layer cannot be converted back to a static shuffle layer.

For a dynamic shuffle layer, the values 0 and 1 are valid. The indices in the dynamic case are as follows:

Index   Description
0       Data or Shape tensor to be shuffled.
1       The dimensions for the reshape operation, as a 1D Int32 shape tensor.

If this function is called with a value 1, then num_inputs changes from 1 to 2.

Parameters
  • index – The index of the input tensor.

  • tensor – The input tensor.

ISliceLayer

class tensorrt.ISliceLayer

A slice layer in an INetworkDefinition.

Variables
  • start (Dims) – The start offset.

  • shape (Dims) – The output dimensions.

  • stride (Dims) – The slicing stride.

  • mode (SliceMode) – Controls how ISliceLayer handles out of bounds coordinates.
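How start, shape and stride relate can be sketched in one dimension. The helper `slice_1d` is hypothetical. It assumes that output element i is read from input position start + i * stride and that every coordinate is in bounds, so the out-of-bounds modes are not modelled.

```python
def slice_1d(data, start, shape, stride):
    """Return `shape` elements of `data`, beginning at `start`, `stride` apart."""
    positions = [start + i * stride for i in range(shape)]
    if any(p < 0 or p >= len(data) for p in positions):
        raise IndexError("slice reads outside the input")
    return [data[p] for p in positions]

print(slice_1d(list(range(10)), start=1, shape=4, stride=2))
```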

set_input(self: tensorrt.tensorrt.ISliceLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Sets the input tensor for the given index. The index must be 0 or 4 for a static slice layer. A static slice layer is converted to a dynamic slice layer by calling set_input() with an index between 1 and 3. A dynamic slice layer cannot be converted back to a static slice layer.

The indices are as follows:

Index   Description
0       Data or Shape tensor to be sliced.
1       The start tensor to begin slicing, N-dimensional for Data, and 1-D for Shape.
2       The size tensor of the resulting slice, N-dimensional for Data, and 1-D for Shape.
3       The stride of the slicing operation, N-dimensional for Data, and 1-D for Shape.
4       Value for the kFILL slice mode. Disallowed for other modes.

If this function is called with a value greater than 0, then num_inputs changes from 1 to index + 1.

Parameters
  • index – The index of the input tensor.

  • tensor – The input tensor.
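The roles of the start, size, and stride inputs can be illustrated with a plain-Python model of the indexing arithmetic. This is a sketch only, not TensorRT code: the helper name slice_nd is hypothetical, tensors are represented as nested lists, and the out-of-bounds behaviour controlled by the slice mode is not modelled.

```python
def slice_nd(data, start, size, stride):
    """Slice a nested list: along each axis, read `size` elements from `start`, stepping by `stride`."""
    if not start:
        return data  # all axes consumed; `data` is now a single element
    # Output element i along this axis is read from input coordinate start + i * stride.
    picked = [data[start[0] + i * stride[0]] for i in range(size[0])]
    return [slice_nd(item, start[1:], size[1:], stride[1:]) for item in picked]


grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11]]

# Rows 0 and 2, columns 1 and 3.
corners = slice_nd(grid, start=[0, 1], size=[2, 2], stride=[2, 2])
# One-dimensional case: coordinates 1, 3, 5.
odds = slice_nd([10, 11, 12, 13, 14, 15], start=[1], size=[3], stride=[2])
```

Here the size argument plays the role of the output dimensions, so the result always has exactly size[k] elements along axis k.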

IShapeLayer

class tensorrt.IShapeLayer

A shape layer in an INetworkDefinition . Used for getting the shape of a tensor. This class sets the output to a one-dimensional tensor with the dimensions of the input tensor.

For example, if the input is a four-dimensional tensor (of any type) with dimensions [2,3,5,7], the output tensor is a one-dimensional Int32 tensor of length 4 containing the sequence 2, 3, 5, 7.

ITopKLayer

tensorrt.TopKOperation

The operations that may be performed by a TopK layer.

Members:

MAX : Maximum of the elements

MIN : Minimum of the elements

class tensorrt.ITopKLayer

A TopK layer in an INetworkDefinition .

Variables
  • op (TopKOperation) – The operation for the layer.

  • k (int) – The k value for the layer. Currently only values up to 25 are supported.

  • axes (int) – The axes along which to reduce.
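The effect of the MAX and MIN operations can be sketched in plain Python for a single 1-D sequence. This is an illustrative model rather than TensorRT code: the helper name top_k is hypothetical, returning both the selected values and their indices is an assumption about the layer outputs that is not stated above, and tie-breaking is not modelled.

```python
def top_k(values, k, op="MAX"):
    """Return (selected values, their indices) for the k largest (MAX) or smallest (MIN) entries."""
    largest_first = (op == "MAX")
    # Sort indices by the value they point at, then keep the first k.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest_first)[:k]
    return [values[i] for i in order], order


scores = [0.1, 0.7, 0.3, 0.9]
max_vals, max_idx = top_k(scores, k=2, op="MAX")
min_vals, min_idx = top_k(scores, k=2, op="MIN")
```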

IMatrixMultiplyLayer

tensorrt.MatrixOperation

The matrix operations that may be performed by a Matrix layer.

Members:

NONE :

TRANSPOSE : Transpose each matrix

VECTOR : Treat operand as collection of vectors

class tensorrt.IMatrixMultiplyLayer

A matrix multiply layer in an INetworkDefinition .

Let A be op(get_input(0)) and B be op(get_input(1)) where op(x) denotes the corresponding MatrixOperation.

When A and B are matrices or vectors, computes the inner product A * B:

matrix * matrix -> matrix
matrix * vector -> vector
vector * matrix -> vector
vector * vector -> scalar

Inputs of higher rank are treated as collections of matrices or vectors. The output will be a corresponding collection of matrices, vectors, or scalars.
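The four operand combinations listed above can be modelled in plain Python. This is a sketch of the arithmetic only, not TensorRT code: the helper name matmul is hypothetical, a matrix is a list of rows, a vector is a flat list, and the TRANSPOSE operation and higher-rank batching are not modelled.

```python
def matmul(a, b):
    """Multiply operands that are each either a matrix (list of rows) or a vector (flat list)."""
    a_is_vec = not isinstance(a[0], list)
    b_is_vec = not isinstance(b[0], list)
    rows = [a] if a_is_vec else a               # a left-hand vector acts as a single row
    cols = [b] if b_is_vec else list(zip(*b))   # a right-hand vector acts as a single column
    product = [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in rows]
    if a_is_vec and b_is_vec:
        return product[0][0]                    # vector * vector -> scalar
    if a_is_vec:
        return product[0]                       # vector * matrix -> vector
    if b_is_vec:
        return [row[0] for row in product]      # matrix * vector -> vector
    return product                              # matrix * matrix -> matrix


m = [[1, 2], [3, 4]]
v = [5, 6]
mm, mv, vm, vv = matmul(m, m), matmul(m, v), matmul(v, m), matmul(v, v)
```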

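Variables
  • op0 (MatrixOperation) – How to treat the first input.

  • op1 (MatrixOperation) – How to treat the second input.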

IRaggedSoftMaxLayer

class tensorrt.IRaggedSoftMaxLayer

A ragged softmax layer in an INetworkDefinition .

This layer takes a ZxS input tensor and an additional Zx1 bounds tensor holding the lengths of the Z sequences.

This layer computes a softmax across each of the Z sequences.

The output tensor is of the same size as the input tensor.
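A plain-Python model may help clarify how the bounds tensor interacts with the input. This is a sketch, not TensorRT code: the helper name ragged_softmax is hypothetical, and writing zeros to positions beyond each sequence length is an assumption made here for illustration.

```python
import math


def ragged_softmax(rows, bounds):
    """Apply softmax to the first bounds[z] entries of row z; keep the output the same size as the input."""
    out = []
    for row, length in zip(rows, bounds):
        exps = [math.exp(x) for x in row[:length]]
        total = sum(exps)
        # Assumption: entries past the sequence length are emitted as zeros.
        out.append([e / total for e in exps] + [0.0] * (len(row) - length))
    return out


# Z = 2 sequences of maximum length S = 3; the first sequence only has 2 valid entries.
probs = ragged_softmax([[1.0, 1.0, 5.0], [0.0, 0.0, 0.0]], bounds=[2, 3])
```

In this example the value 5.0 lies outside the first sequence, so it does not influence the softmax of that row.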

IIdentityLayer

class tensorrt.IIdentityLayer

A layer that represents the identity function.

If tensor precision is explicitly specified, it can be used to transform from one precision to another.

IConstantLayer

class tensorrt.IConstantLayer

A constant layer in an INetworkDefinition .

Note: This layer does not support boolean types.

Variables
  • weights (Weights) – The weights for the layer.

  • shape (Dims) – The shape of the layer.

IResizeLayer

tensorrt.ResizeMode

Various modes of resize in the resize layer.

Members:

NEAREST : 1D, 2D, and 3D nearest neighbor resizing.

LINEAR : Can handle linear, bilinear, trilinear resizing.

class tensorrt.IResizeLayer

A resize layer in an INetworkDefinition .

The resize layer can be used for resizing an N-D tensor.

Resize layer currently supports the following configurations:

  • ResizeMode.NEAREST - resizes innermost m dimensions of N-D, where 0 < m <= min(3, N) and N > 0.

  • ResizeMode.LINEAR - resizes innermost m dimensions of N-D, where 0 < m <= min(3, N) and N > 0.

Default resize mode is ResizeMode.NEAREST.

Resize layer provides two ways to resize tensor dimensions:

  • Set output dimensions directly. It can be done for static as well as dynamic resize layer.

    Static resize layer requires output dimensions to be known at build-time. Dynamic resize layer requires output dimensions to be set as one of the input tensors.

  • Set scales for resize. Each output dimension is calculated as floor(input dimension * scale).

    Only static resize layer allows setting scales where the scales are known at build-time.
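The scale-based rule above, floor(input dimension * scale), can be checked with a few lines of plain Python. The helper name resized_dims is hypothetical and the code is only an arithmetic sketch, not a call into TensorRT.

```python
import math


def resized_dims(input_dims, scales):
    """Compute each output dimension as floor(input dimension * scale)."""
    return [math.floor(d * s) for d, s in zip(input_dims, scales)]


# Scale the two innermost dimensions by 2.0 and 1.5; leave the outer two unchanged.
out_dims = resized_dims([1, 3, 5, 7], [1.0, 1.0, 2.0, 1.5])
```

Note that 7 * 1.5 = 10.5 is rounded down to 10.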

Variables
  • shape (Dims) – The output dimensions. Must have the same number of dimensions as the input.

  • scales (List[float]) – List of resize scales.

  • resize_mode (ResizeMode) – The resize mode, either LINEAR or NEAREST.

  • coordinate_transformation (ResizeCoordinateTransformation) – Supported resize coordinate transformation modes are ALIGN_CORNERS, ASYMMETRIC and HALF_PIXEL.

  • selector_for_single_pixel (ResizeSelector) – Supported resize selector modes are FORMULA and UPPER.

  • nearest_rounding (ResizeRoundMode) – Supported resize round modes are HALF_UP, HALF_DOWN, FLOOR and CEIL.

set_input(self: tensorrt.tensorrt.IResizeLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Sets the input tensor for the given index.

If index == 1, num_inputs == 1, and there is no implicit batch dimension, the tensor is appended as a second input and num_inputs changes to 2. Once this additional input is set, the resize layer works in dynamic mode: the output dimensions are taken from the second input tensor, overriding the dimensions supplied by shape.

Parameters
  • index – The index of the input tensor.

  • tensor – The input tensor.

ILoop

class tensorrt.ILoop

Helper for creating a recurrent subgraph.

Variables

name – The name of the loop. The name is used in error diagnostics.

add_iterator(self: tensorrt.tensorrt.ILoop, tensor: tensorrt.tensorrt.ITensor, axis: int = 0, reverse: bool = False) → tensorrt.tensorrt.IIteratorLayer

Returns a layer that subscripts tensor by loop iteration.

For reverse=False, this is equivalent to add_gather(tensor, I, 0) where I is a scalar tensor containing the loop iteration number. For reverse=True, this is equivalent to add_gather(tensor, M-1-I, 0) where M is the trip count computed from TripLimits of kind COUNT.

Parameters
  • tensor – The tensor to iterate over.

  • axis – The axis along which to iterate.

  • reverse – Whether to iterate in the reverse direction.

Returns

The IIteratorLayer , or None if it could not be created.

add_loop_output(self: tensorrt.tensorrt.ILoop, tensor: tensorrt.tensorrt.ITensor, kind: tensorrt.tensorrt.LoopOutput, axis: int = 0) → tensorrt.tensorrt.ILoopOutputLayer

Make an output for this loop, based on the given tensor.

If kind is CONCATENATE or REVERSE, a second input specifying the concatenation length must be added via the method ILoopOutputLayer.set_input() .

Parameters
  • kind – The kind of loop output. See LoopOutput

  • axis – The axis for concatenation (if using kind of CONCATENATE or REVERSE).

Returns

The added ILoopOutputLayer , or None if it could not be created.

add_recurrence(self: tensorrt.tensorrt.ILoop, initial_value: tensorrt.tensorrt.ITensor) → tensorrt.tensorrt.IRecurrenceLayer

Create a recurrence layer for this loop with initial_value as its first input.

Parameters

initial_value – The initial value of the recurrence layer.

Returns

The added IRecurrenceLayer , or None if it could not be created.

add_trip_limit(self: tensorrt.tensorrt.ILoop, tensor: tensorrt.tensorrt.ITensor, kind: tensorrt.tensorrt.TripLimit) → tensorrt.tensorrt.ITripLimitLayer

Add a trip-count limiter, based on the given tensor.

There may be at most one COUNT and one WHILE limiter for a loop. When both trip limits exist, the loop exits when the count is reached or the condition becomes false. It is an error to not add at least one trip limiter.

For WHILE, the input tensor must be the output of a subgraph that contains only layers that are not ITripLimitLayer , IIteratorLayer or ILoopOutputLayer . Any IRecurrenceLayer s in the subgraph must belong to the same loop as the ITripLimitLayer . A trivial example of this rule is that the input to the WHILE is the output of an IRecurrenceLayer for the same loop.

Parameters
  • tensor – The input tensor. Must be available before the loop starts.

  • kind – The kind of trip limit. See TripLimit

Returns

The added ITripLimitLayer , or None if it could not be created.
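The interaction of the COUNT and WHILE limiters can be modelled in plain Python. This is a simplified sketch, not TensorRT code: the helper name run_loop and its arguments are hypothetical, the recurrence is represented by a step function, and the exact point at which the WHILE condition is sampled relative to the loop body is an assumption of this model.

```python
def run_loop(initial, step, count=None, while_cond=None):
    """Iterate `step` from `initial` until either the COUNT or the WHILE limit stops the loop."""
    if count is None and while_cond is None:
        raise ValueError("at least one trip limiter is required")
    value, history, trips = initial, [], 0
    # The loop continues only while every limiter that exists still allows another trip.
    while (count is None or trips < count) and (while_cond is None or while_cond(value)):
        value = step(value)
        history.append(value)
        trips += 1
    return value, history


# Both limiters: the WHILE condition ends the loop before the count of 10 is reached.
last, history = run_loop(1, lambda v: v * 2, count=10, while_cond=lambda v: v < 20)
# COUNT only: exactly three trips.
counted, counted_history = run_loop(1, lambda v: v * 2, count=3)
```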

ILoopBoundaryLayer

class tensorrt.ILoopBoundaryLayer
Variables

loop – The ILoop associated with this boundary layer.

ITripLimitLayer

tensorrt.TripLimit

Describes kinds of trip limits.

Members:

COUNT : Tensor is a scalar of type INT32 that contains the trip count.

WHILE : Tensor is a scalar of type BOOL. Loop terminates when value is false.

class tensorrt.ITripLimitLayer
Variables

kind – The kind of trip limit. See TripLimit

IRecurrenceLayer

class tensorrt.IRecurrenceLayer
set_input(self: tensorrt.tensorrt.IRecurrenceLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Set the first or second input. If index==1 and the number of inputs is one, the input is appended. The first input specifies the initial output value, and must come from outside the loop. The second input specifies the next output value, and must come from inside the loop. The two inputs must have the same dimensions.

Parameters
  • index – The index of the input to set.

  • tensor – The input tensor.

IIteratorLayer

class tensorrt.IIteratorLayer
Variables
  • axis – The axis to iterate over

  • reverse – For reverse=False, the layer is equivalent to add_gather(tensor, I, 0) where I is a scalar tensor containing the loop iteration number. For reverse=True, the layer is equivalent to add_gather(tensor, M-1-I, 0) where M is the trip count computed from TripLimits of kind COUNT. The default is reverse=False.

ILoopOutputLayer

tensorrt.LoopOutput

Describes kinds of loop outputs.

Members:

LAST_VALUE : Output value is value of tensor for last iteration.

CONCATENATE : Output value is concatenation of values of tensor for each iteration, in forward order.

REVERSE : Output value is concatenation of values of tensor for each iteration, in reverse order.

class tensorrt.ILoopOutputLayer

An ILoopOutputLayer is the sole way to get output from a loop.

The first input tensor must be defined inside the loop; the output tensor is outside the loop. The second input tensor, if present, must be defined outside the loop.

If kind is LAST_VALUE, a single input must be provided.

If kind is CONCATENATE or REVERSE, a second input must be provided. The second input must be a scalar “shape tensor”, defined before the loop commences, that specifies the concatenation length of the output.

The output tensor has j more dimensions than the input tensor, where j == 0 if kind is LAST_VALUE, and j == 1 if kind is CONCATENATE or REVERSE.

Variables
  • axis – The concatenation axis. Ignored if kind is LAST_VALUE. For example, if the input tensor has dimensions [b,c,d], and kind is CONCATENATE, the output has four dimensions. Let a be the value of the second input. axis=0 causes the output to have dimensions [a,b,c,d]. axis=1 causes the output to have dimensions [b,a,c,d]. axis=2 causes the output to have dimensions [b,c,a,d]. axis=3 causes the output to have dimensions [b,c,d,a]. The default axis is 0.

  • kind – The kind of loop output. See LoopOutput
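The relationship between kind, axis, and the output dimensions described above can be expressed as a small plain-Python function. The helper name loop_output_dims is hypothetical, and the code models only the shape rule, not the layer itself.

```python
def loop_output_dims(input_dims, kind, length=None, axis=0):
    """Return the output dimensions of a loop output for the given kind and concatenation axis."""
    if kind == "LAST_VALUE":
        return list(input_dims)          # no extra dimension; axis is ignored
    if kind in ("CONCATENATE", "REVERSE"):
        dims = list(input_dims)
        dims.insert(axis, length)        # the concatenation length becomes a new dimension at `axis`
        return dims
    raise ValueError("unknown loop output kind: " + kind)


# Input [b, c, d] = [2, 3, 4] and concatenation length a = 5.
at_front = loop_output_dims([2, 3, 4], "CONCATENATE", length=5, axis=0)
at_back = loop_output_dims([2, 3, 4], "REVERSE", length=5, axis=3)
unchanged = loop_output_dims([2, 3, 4], "LAST_VALUE")
```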

set_input(self: tensorrt.tensorrt.ILoopOutputLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Like ILayer.set_input(), but additionally works if index == 1 and num_inputs == 1, in which case num_inputs changes to 2.

IFillLayer

tensorrt.FillOperation

The tensor fill operations that may be performed by a Fill layer.

Members:

LINSPACE : Generate evenly spaced numbers over a specified interval

RANDOM_UNIFORM : Generate a tensor with random values drawn from a uniform distribution

class tensorrt.IFillLayer

A fill layer in an INetworkDefinition .

set_input(self: tensorrt.tensorrt.IFillLayer, index: int, tensor: tensorrt.tensorrt.ITensor) → None

Replaces an input of this layer with a specific tensor.

Index   Description for kLINSPACE
0       Shape tensor, represents the output tensor’s dimensions.
1       Start, a scalar, represents the start value.
2       Delta, a 1D tensor whose length equals the shape tensor’s nbDims, represents the delta value for each dimension.

Index   Description for kRANDOM_UNIFORM
0       Shape tensor, represents the output tensor’s dimensions.
1       Minimum, a scalar, represents the minimum random value.
2       Maximum, a scalar, represents the maximum random value.

Parameters
  • index – the index of the input to modify.

  • tensor – the input tensor.
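One way to read the LINSPACE inputs is that each output element equals the start value plus the sum, over all dimensions, of the per-dimension delta times the element's index in that dimension. The plain-Python sketch below implements that reading; the formula is an assumption made for illustration, and the helper name linspace_fill is hypothetical.

```python
import itertools


def linspace_fill(shape, start, delta):
    """Map every index tuple of `shape` to start + sum(delta[k] * index[k])."""
    return {
        idx: start + sum(d * i for d, i in zip(delta, idx))
        for idx in itertools.product(*(range(n) for n in shape))
    }


# A 2 x 3 output: stepping one row adds 10.0, stepping one column adds 1.0.
grid = linspace_fill(shape=[2, 3], start=1.0, delta=[10.0, 1.0])
```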

IQuantizeLayer

class tensorrt.IQuantizeLayer

A Quantize layer in an INetworkDefinition .

This layer accepts a floating-point data input tensor, and uses the scale and zeroPt inputs to quantize the data to an 8-bit signed integer according to:

\(output = clamp(round(input / scale) + zeroPt)\)

Rounding type is rounding-to-nearest ties-to-even (https://en.wikipedia.org/wiki/Rounding#Round_half_to_even).

Clamping is in the range [-128, 127].
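The formula, rounding rule, and clamping range above can be reproduced in plain Python for per-tensor quantization. This is an arithmetic sketch, not TensorRT code, and the helper name quantize is hypothetical. It relies on the fact that Python's built-in round() rounds ties to the nearest even integer.

```python
def quantize(values, scale, zero_pt=0):
    """Compute clamp(round(x / scale) + zero_pt) with the clamp range [-128, 127]."""
    # round() with a single argument returns an int and rounds halves to even.
    return [max(-128, min(127, round(x / scale) + zero_pt)) for x in values]


# 0.25 / 0.5 = 0.5 rounds to 0 (even); 0.75 / 0.5 = 1.5 rounds to 2 (even).
q = quantize([0.25, 0.75, -1.0, 100.0], scale=0.5)
```

The last value, 100.0 / 0.5 = 200, exceeds the representable range and is clamped to 127.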

The first input (index 0) is the tensor to be quantized. The second (index 1) and third (index 2) are the scale and zero point respectively. Each of scale and zeroPt must be either a scalar, or a 1D tensor.

The zeroPt tensor is optional, and if not set, will be assumed to be zero. Its data type must be tensorrt.int8. zeroPt must only contain zero-valued coefficients, because only symmetric quantization is supported. The scale value must be either a scalar for per-tensor quantization, or a 1D tensor for per-axis quantization. The size of the 1-D scale tensor must match the size of the quantization axis. The size of the scale must match the size of the zeroPt.

The subgraph which terminates with the scale tensor must be a build-time constant. The same restrictions apply to the zeroPt. The output type, if constrained, must be constrained to tensorrt.int8. The input type, if constrained, must be constrained to tensorrt.float32 (FP16 input is not supported). The output size is the same as the input size.

IQuantizeLayer only supports tensorrt.float32 precision and will default to this precision during instantiation. IQuantizeLayer only supports tensorrt.int8 output.

Variables

axis (int) – The axis along which quantization occurs. The quantization axis is in reference to the input tensor’s dimensions.

IDequantizeLayer

class tensorrt.IDequantizeLayer

A Dequantize layer in an INetworkDefinition .

This layer accepts a signed 8-bit integer input tensor, and uses the configured scale and zeroPt inputs to dequantize the input according to: \(output = (input - zeroPt) * scale\)

The first input (index 0) is the tensor to be dequantized. The second (index 1) and third (index 2) are the scale and zero point respectively. Each of scale and zeroPt must be either a scalar, or a 1D tensor.

The zeroPt tensor is optional, and if not set, will be assumed to be zero. Its data type must be tensorrt.int8. zeroPt must only contain zero-valued coefficients, because only symmetric quantization is supported. The scale value must be either a scalar for per-tensor quantization, or a 1D tensor for per-axis quantization. The size of the 1-D scale tensor must match the size of the quantization axis. The size of the scale must match the size of the zeroPt.

The subgraph which terminates with the scale tensor must be a build-time constant. The same restrictions apply to the zeroPt. The input type, if constrained, must be constrained to tensorrt.int8. The output type, if constrained, must be constrained to tensorrt.float32 (FP16 output is not supported). The output size is the same as the input size.

IDequantizeLayer only supports tensorrt.int8 precision and will default to this precision during instantiation. IDequantizeLayer only supports tensorrt.float32 output.

Variables

axis (int) – The axis along which dequantization occurs. The dequantization axis is in reference to the input tensor’s dimensions.
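Per-axis dequantization can be sketched in plain Python for a 2-D tensor whose dequantization axis is 0, so that each row has its own scale. This illustrates the formula output = (input - zeroPt) * scale only; the helper name dequantize_per_axis is hypothetical, and it is not TensorRT code.

```python
def dequantize_per_axis(rows, scales, zero_pts=None):
    """Dequantize a 2-D INT8 tensor along axis 0, using one scale (and zero point) per row."""
    if zero_pts is None:
        zero_pts = [0] * len(scales)     # an unset zeroPt is assumed to be zero
    return [[(q - zp) * s for q in row] for row, s, zp in zip(rows, scales, zero_pts)]


# The size of the scale list matches the size of the dequantization axis (2 rows).
real = dequantize_per_axis([[2, -4], [10, 127]], scales=[0.5, 0.25])
```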

IScatterLayer

class tensorrt.IScatterLayer

A Scatter layer in an INetworkDefinition .

Variables
  • axis – The axis to scatter on when using Scatter Element mode (ignored in ND mode).

  • mode (ScatterMode) – The operation mode of the scatter.

IIfConditional

class tensorrt.IIfConditional

Helper for constructing conditionally-executed subgraphs.

An If-conditional conditionally executes (lazy evaluation) part of the network according to the following pseudo-code:

If condition is true Then:
    output = trueSubgraph(trueInputs);
Else:
    output = falseSubgraph(falseInputs);
Emit output

Condition is a 0D boolean tensor (representing a scalar). trueSubgraph represents a network subgraph that is executed when condition is evaluated to True. falseSubgraph represents a network subgraph that is executed when condition is evaluated to False.

The following constraints apply to If-conditionals:

  • Both the trueSubgraph and falseSubgraph must be defined.

  • The number of output tensors in both subgraphs is the same.

  • The type and shape of each output tensor from true/false subgraphs are the same.
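The lazy-evaluation behaviour in the pseudo-code can be mirrored in plain Python by passing the two subgraphs as callables, so that only the selected one runs. This is an illustrative model, not TensorRT code; the names if_conditional, double, and negate are hypothetical.

```python
def if_conditional(condition, true_subgraph, false_subgraph, true_inputs, false_inputs):
    """Execute exactly one of the two subgraphs and return its output."""
    if condition:
        return true_subgraph(*true_inputs)
    return false_subgraph(*false_inputs)


executed = []  # records which subgraph actually ran


def double(x):
    executed.append("true")
    return x * 2


def negate(x):
    executed.append("false")
    return -x


result = if_conditional(True, double, negate, (3,), (3,))
```

Both callables return a single value of the same type, which corresponds to the constraint that the two subgraphs produce matching outputs.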

add_input(self: tensorrt.tensorrt.IIfConditional, input: tensorrt.tensorrt.ITensor) → tensorrt.tensorrt.IIfConditionalInputLayer

Make an input for this if-conditional, based on the given tensor.

Parameters

input – An input to the conditional that can be used by either or both of the conditional’s subgraphs.

add_output(self: tensorrt.tensorrt.IIfConditional, true_subgraph_output: tensorrt.tensorrt.ITensor, false_subgraph_output: tensorrt.tensorrt.ITensor) → tensorrt.tensorrt.IIfConditionalOutputLayer

Make an output for this if-conditional, based on the given tensors.

Each output layer of the if-conditional represents a single output of either the true-subgraph or the false-subgraph of the if-conditional, depending on which subgraph was executed.

Parameters
  • true_subgraph_output – The output of the subgraph executed when this conditional’s condition input evaluates to true.

  • false_subgraph_output – The output of the subgraph executed when this conditional’s condition input evaluates to false.

Returns

The IIfConditionalOutputLayer , or None if it could not be created.

set_condition(self: tensorrt.tensorrt.IIfConditional, condition: tensorrt.tensorrt.ITensor) → tensorrt.tensorrt.IConditionLayer

Set the condition tensor for this If-Conditional construct.

The condition tensor must be a 0D data tensor (scalar) with type tensorrt.bool.

Parameters

condition – The condition tensor that will determine which subgraph to execute.

Returns

The IConditionLayer , or None if it could not be created.

IConditionLayer

class tensorrt.IConditionLayer

Describes the boolean condition of an if-conditional.

IIfConditionalOutputLayer

class tensorrt.IIfConditionalOutputLayer

Describes kinds of if-conditional outputs.

IIfConditionalInputLayer

class tensorrt.IIfConditionalInputLayer

Describes kinds of if-conditional inputs.

IEinsumLayer

class tensorrt.IEinsumLayer

An Einsum layer in an INetworkDefinition .

This layer implements a summation over the elements of the inputs along dimensions specified by the equation parameter, based on the Einstein summation convention. The layer can have one or more inputs of rank >= 0. All the inputs must be of same data type. This layer supports all TensorRT data types except trt.bool. There is one output tensor of the same type as the input tensors. The shape of output tensor is determined by the equation.

The equation specifies ASCII lower-case letters for each dimension in the inputs in the same order as the dimensions, separated by comma for each input. The dimensions labeled with the same subscript must match or be broadcastable. Repeated subscript labels in one input take the diagonal. Repeating a label across multiple inputs means that those axes will be multiplied. Omitting a label from the output means values along those axes will be summed. In implicit mode, the indices which appear once in the expression will be part of the output in increasing alphabetical order. In explicit mode, the output can be controlled by specifying output subscript labels by adding an arrow (‘->’) followed by subscripts for the output. For example, “ij,jk->ik” is equivalent to “ij,jk”. Ellipsis (‘…’) can be used in place of subscripts to broadcast the dimensions. See the TensorRT Developer Guide for more details on equation syntax.

Many common operations can be expressed using the Einsum equation. For example:

  • Matrix Transpose: ij->ji

  • Sum: ij->

  • Matrix-Matrix Multiplication: ik,kj->ij

  • Dot Product: i,i->

  • Matrix-Vector Multiplication: ik,k->i

  • Batch Matrix Multiplication: ijk,ikl->ijl

  • Batch Diagonal: …ii->…i

Note that TensorRT does not support ellipsis or diagonal operations.
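The equation semantics can be made concrete with a small reference evaluator in plain Python operating on nested lists. This is a sketch for illustration, not how TensorRT evaluates the layer: the function names are hypothetical, broadcasting and ellipsis are not handled, and no attention is paid to efficiency.

```python
import itertools


def _shape(t):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims


def _get(t, idx):
    """Read the element of nested list `t` at the index sequence `idx`."""
    for i in idx:
        t = t[i]
    return t


def einsum(equation, *operands):
    lhs, arrow, out = equation.replace(" ", "").partition("->")
    terms = lhs.split(",")
    if not arrow:
        # Implicit mode: labels that appear exactly once, in alphabetical order.
        joined = "".join(terms)
        out = "".join(sorted(c for c in set(joined) if joined.count(c) == 1))
    size = {}
    for term, op in zip(terms, operands):
        for label, n in zip(term, _shape(op)):
            size[label] = n
    # Labels omitted from the output are summed over.
    summed = sorted(set("".join(terms)) - set(out))

    def cell(out_idx):
        fixed = dict(zip(out, out_idx))
        total = 0
        for sum_idx in itertools.product(*(range(size[c]) for c in summed)):
            env = {**fixed, **dict(zip(summed, sum_idx))}
            prod = 1
            for term, op in zip(terms, operands):
                prod *= _get(op, [env[c] for c in term])
            total += prod
        return total

    def build(prefix, labels):
        if not labels:
            return cell(prefix)
        return [build(prefix + [i], labels[1:]) for i in range(size[labels[0]])]

    return build([], out)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = einsum("ik,kj->ij", a, b)   # matrix-matrix multiplication
implicit = einsum("ij,jk", a, b)      # implicit mode, same as "ij,jk->ik"
```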

Variables

equation (str) – The Einsum equation of the layer. The equation is a comma-separated list of subscript labels, where each label refers to a dimension of the corresponding tensor.

IAssertionLayer

class tensorrt.IAssertionLayer

An assertion layer in an INetworkDefinition .

This layer implements assertions. The input must be a boolean shape tensor. If any element of it is False, a build-time or run-time error occurs. Asserting equality of input dimensions may help the optimizer.

Variables

message (str) – Message to print if the assertion fails.