tensorrt.infer

The infer package provides the Python interface to libnvinfer. It is used for graph definition, engine building, and inference execution.

Foundational Types

DataType

class tensorrt.infer.DataType

Derived From C++ Class nvinfer1::DataType

Available data types

Base Class:
IntEnum

Weights

class tensorrt.infer.Weights

Derived From C++ Class nvinfer1::Weights

An array of weights used as a layer parameter.

The weights are held by reference until the engine has been built. Therefore the data referenced by values field should be preserved until the build is complete.

type

DataType – The type of the weights.

values

const void * – The weight values, in a contiguous array.

count

int64_t – The number of weights in the array.

C++ includes

NvInfer.h

Dims

class tensorrt.infer.Dims

Derived From C++ Class nvinfer1::Dims

Structure to define the dimensions of a tensor.

note: Currently the following formats are supported for layer inputs and outputs:

  • zero or more index dimensions followed by one channel and two spatial dimensions (e.g. CHW)
  • one time series dimension followed by one index dimension followed by one channel dimension (i.e. TNC)
MAX_DIMS

const int – The maximum number of dimensions supported for a tensor.

nbDims

int – The number of dimensions.

d

int – The extent of each dimension.

type

DimensionType – The type of each dimension.

C++ includes

NvInfer.h

DimsHW

class tensorrt.infer.DimsHW

Derived From C++ Class nvinfer1::DimsHW

Descriptor for two-dimensional spatial data.

C++ includes: NvInfer.h

h()

h() const  -> int

Get the height.

Returns:The height.
w()

w() const  -> int

Get the width.

Returns:The width.

DimsCHW

class tensorrt.infer.DimsCHW

Derived From C++ Class nvinfer1::DimsCHW

Descriptor for data with one channel dimension and two spatial dimensions.

C++ includes: NvInfer.h

c()

c() const  -> int

Get the channel count.

Returns:The channel count.
h()

h() const  -> int

Get the height.

Returns:The height.
w()

w() const  -> int

Get the width.

Returns:The width.

DimsNCHW

class tensorrt.infer.DimsNCHW

Derived From C++ Class nvinfer1::DimsNCHW

Descriptor for data with one index dimension, one channel dimension and two spatial dimensions.

C++ includes: NvInfer.h

c()

c() const  -> int

Get the channel count.

Returns:The channel count.
h()

h() const  -> int

Get the height.

Returns:The height.
n()

n() const  -> int

Get the index count.

Returns:The index count.
w()

w() const  -> int

Get the width.

Returns:The width.

DimensionType

class tensorrt.infer.DimensionType

Derived From C++ Class nvinfer1::DimensionType

Available dimension types

Base Class:
IntEnum

Engine and Inference

Builder

class tensorrt.infer.Builder

Derived From C++ Class nvinfer1::IBuilder

Builds an engine from a network definition.

C++ includes: NvInfer.h

build_cuda_engine()

buildCudaEngine(nvinfer1::INetworkDefinition &network)=0 -> nvinfer1::ICudaEngine *

Build a CUDA engine from a network definition.

create_network()

createNetwork()=0 -> nvinfer1::INetworkDefinition *

Create a network definition object.

destroy()

destroy()=0

Destroy this object.

get_average_find_iterations()

getAverageFindIterations() const =0 -> int

Query the number of averaging iterations.

get_debug_sync()

getDebugSync() const =0 -> bool

Query whether the builder will use debug synchronization.

get_fp16_mode()

getFp16Mode() const =0 -> bool

Query whether 16-bit kernels are permitted.

get_half2_mode()

getHalf2Mode() const =0 -> bool

Query whether half2 mode is used.

Deprecated: This function is superseded by getFp16Mode.

get_int8_mode()

getInt8Mode() const =0 -> bool

Query whether Int8 mode is used.

get_max_batch_size()

getMaxBatchSize() const =0 -> int

Get the maximum batch size.

Returns:
  • The maximum batch size.
get_max_workspace_size()

getMaxWorkspaceSize() const =0 -> std::size_t

Get the maximum workspace size.

Returns:
  • The maximum workspace size.
get_min_find_iterations()

getMinFindIterations() const =0 -> int

Query the number of minimization iterations.

platform_has_fast_fp16()

platformHasFastFp16() const =0 -> bool

Determine whether the platform has fast native fp16.

platform_has_fast_int8()

platformHasFastInt8() const =0 -> bool

Determine whether the platform has fast native int8.

set_average_find_iterations()

setAverageFindIterations(int avgFind)=0

Set the number of averaging iterations used when timing layers.

When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in averaging.

set_debug_sync()

setDebugSync(bool sync)=0

Set whether the builder should use debug synchronization.

If this flag is true, the builder will synchronize after timing each layer, and report the layer name. It can be useful when diagnosing issues at build time.

set_fp16_mode()

setFp16Mode(bool mode)=0

Set whether or not 16-bit kernels are permitted.

When this mode is enabled, FP16 kernels will also be tried during the engine build.

Parameters:
  • mode (*) – Whether 16-bit kernels are permitted.
set_gpu_allocator()

setGpuAllocator(IGpuAllocator *allocator)=0

Set the GPU allocator.

Parameters:
  • allocator (*) – Set the GPU allocator to be used by the builder. All GPU memory acquired will use this allocator. If None is passed, the default allocator will be used.
set_half2_mode()

setHalf2Mode(bool mode)=0

Set whether half2 mode is used.

half2 mode is a paired-image mode that is significantly faster for batch sizes greater than one on platforms with fp16 support.

Deprecated: This function is superseded by setFp16Mode.

Parameters:
  • mode (*) – Whether half2 mode is used.
set_int8_calibrator()

setInt8Calibrator(IInt8Calibrator *calibrator)=0

Set Int8 Calibration interface.

set_int8_mode()

setInt8Mode(bool mode)=0

Set whether Int8 mode is used.

set_max_batch_size()

setMaxBatchSize(int batchSize)=0

Set the maximum batch size.

Parameters:
  • batchSize (*) – The maximum batch size which can be used at execution time, and also the batch size for which the engine will be optimized.
set_max_workspace_size()

setMaxWorkspaceSize(std::size_t workspaceSize)=0

Set the maximum workspace size.

Parameters:
  • workspaceSize (*) – The maximum GPU temporary memory which the engine can use at execution time.
set_min_find_iterations()

setMinFindIterations(int minFind)=0

Set the number of minimization iterations used when timing layers.

When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in minimization.

create_infer_builder

tensorrt.infer.create_infer_builder()

Create a Builder instance. Corresponds to the C++ function nvinfer1::createInferBuilder().
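
The following is a minimal build sketch using the Builder methods documented above. The ConsoleLogger helper, the LogSeverity enum, and passing a logger to create_infer_builder() are assumptions about this binding; the batch-size and workspace values are arbitrary.

    import tensorrt as trt

    # Any ILogger implementation accepted by create_infer_builder() would do;
    # ConsoleLogger/LogSeverity are assumed helper names.
    logger = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)

    builder = trt.infer.create_infer_builder(logger)
    network = builder.create_network()

    # ... populate the network (see NetworkDefinition below) ...

    builder.set_max_batch_size(1)            # batch size the engine is optimized for
    builder.set_max_workspace_size(1 << 20)  # 1 MiB of temporary GPU memory

    engine = builder.build_cuda_engine(network)

    # The network and builder are no longer needed once the engine is built.
    network.destroy()
    builder.destroy()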

CudaEngine

class tensorrt.infer.CudaEngine

Derived From C++ Class nvinfer1::ICudaEngine

An engine for executing inference on a built network.

C++ includes: NvInfer.h

binding_is_input()

bindingIsInput(int bindingIndex) const =0 -> bool

Determine whether a binding is an input binding.

Parameters:bindingIndex (*) – The binding index.
Returns:
  • True if the index corresponds to an input binding and the index is in range.
create_execution_context()

createExecutionContext()=0 -> IExecutionContext *

Create an execution context.

create_execution_context_without_device_memory()

createExecutionContextWithoutDeviceMemory()=0 -> IExecutionContext *

Create an execution context without any device memory allocated.

The memory for execution of this device context must be supplied by the application.

destroy()

destroy()=0

Destroy this object.

get_binding_data_type()

getBindingDataType(int bindingIndex) const =0 -> DataType

Determine the required data type for a buffer from its binding index.

Parameters:bindingIndex (*) – The binding index.
Returns:
  • The type of the data in the buffer.
get_binding_dimensions()

getBindingDimensions(int bindingIndex) const =0 -> Dims

Get the dimensions of a binding.

Parameters:bindingIndex (*) – The binding index.
Returns:
  • The dimensions of the binding if the index is in range, otherwise (0,0,0).
get_binding_index()

getBindingIndex(const char *name) const =0 -> int

Retrieve the binding index for a named tensor.

IExecutionContext::enqueue() and IExecutionContext::execute() require an array of buffers.

Engine bindings map from tensor names to indices in this array. Binding indices are assigned at engine build time, and take values in the range [0 … n-1] where n is the total number of inputs and outputs.

Parameters:name (*) – The tensor name.
Returns:
  • The binding index for the named tensor, or -1 if the name is not found.
  • See also getNbBindings().
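
A short sketch of mapping tensor names to binding slots, assuming engine is an already-built CudaEngine. The tensor names "data" and "prob" are hypothetical; a real network defines its own input and output names.

    n = engine.get_nb_bindings()
    buffers = [None] * n                          # one device buffer per binding index

    data_idx = engine.get_binding_index("data")   # -1 if the name is not found
    prob_idx = engine.get_binding_index("prob")

    assert engine.binding_is_input(data_idx)
    assert not engine.binding_is_input(prob_idx)
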
get_binding_name()

getBindingName(int bindingIndex) const =0 -> const char *

Retrieve the name corresponding to a binding index.

This is the reverse mapping to that provided by getBindingIndex().

Parameters:bindingIndex (*) – The binding index.
Returns:
  • The name corresponding to the index, or None if the index is out of range.
get_device_memory_size()

getDeviceMemorySize() const =0 -> size_t

Return the amount of device memory required by an execution context.

get_location()

getLocation(int bindingIndex) const =0 -> TensorLocation

Get location of binding.

This lets you know whether the binding should be a pointer to device or host memory.

Parameters:bindingIndex (*) – The binding index.
Returns:The location of the bound tensor with given index.
get_max_batch_size()

getMaxBatchSize() const =0 -> int

Get the maximum batch size which can be used for inference.

Returns:The maximum batch size for this engine.
get_nb_bindings()

getNbBindings() const =0 -> int

Get the number of binding indices.

get_nb_layers()

getNbLayers() const =0 -> int

Get the number of layers in the network.

The number of layers in the network is not necessarily the number in the original network definition, as layers may be combined or eliminated as the engine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.

Returns:The number of layers in the network.
get_workspace_size()

getWorkspaceSize() const =0 -> std::size_t

Get the amount of workspace the engine uses.

The workspace size will be no greater than the value provided to the builder when the engine was built, and will typically be smaller. Workspace will be allocated for each execution context.

serialize()

serialize() const =0 -> IHostMemory *

Serialize the network to a stream.

Returns:
  • An IHostMemory object that contains the serialized engine.
  • The network may be deserialized with IRuntime::deserializeCudaEngine().

ExecutionContext

class tensorrt.infer.ExecutionContext

Derived From C++ Class nvinfer1::IExecutionContext

Context for executing inference using an engine.

Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously.

C++ includes: NvInfer.h

destroy()

destroy()=0

Destroy this object.

enqueue()

enqueue(int batchSize, void **bindings, cudaStream_t stream, cudaEvent_t *inputConsumed)=0 -> bool

Asynchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters:
  • batchSize (*) – The batch size. This is at most the value supplied when the engine was built.
  • bindings (*) – An array of pointers to input and output buffers for the network.
  • stream (*) – A CUDA stream on which the inference kernels will be enqueued.
  • inputConsumed (*) – An optional event which will be signaled when the input buffers can be refilled with new data.
Returns:

  • True if the kernels were enqueued successfully.

execute()

execute(int batchSize, void **bindings)=0 -> bool

Synchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters:
  • batchSize (*) – The batch size. This is at most the value supplied when the engine was built.
  • bindings (*) – An array of pointers to input and output buffers for the network.
Returns:

  • True if execution succeeded.
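
A minimal synchronous-execution sketch, assuming engine is an already-built CudaEngine, that pycuda is used for device allocations, and that the binding accepts the bindings as a list of device pointers (exact argument handling may differ by version). Buffer shapes are placeholders.

    import numpy as np
    import pycuda.autoinit          # creates a CUDA context
    import pycuda.driver as cuda

    h_input = np.random.rand(3, 224, 224).astype(np.float32)   # placeholder input
    h_output = np.empty(1000, dtype=np.float32)                 # placeholder output size

    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    cuda.memcpy_htod(d_input, h_input)

    context = engine.create_execution_context()
    # Buffers must be ordered by binding index (see get_binding_index()).
    context.execute(1, [int(d_input), int(d_output)])

    cuda.memcpy_dtoh(h_output, d_output)
    context.destroy()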

get_debug_sync()

getDebugSync() const =0 -> bool

Get the debug sync flag.

get_engine()

getEngine() const =0 -> const ICudaEngine &

Get the associated engine.

get_name()

getName() const =0 -> const char *

Return the name of the execution context.

get_profiler()

getProfiler() const =0 -> IProfiler *

Get the profiler.

set_debug_sync()

setDebugSync(bool sync)=0

Set the debug sync flag.

If this flag is set to true, the engine will log the successful execution for each kernel during execute(). It has no effect when using enqueue().

set_device_memory()

setDeviceMemory(void *memory)=0

Set the device memory for use by this execution context.

The memory must be aligned on a 256-byte boundary, and its size must be at least that returned by getDeviceMemorySize(). If using enqueue() to run the network, the memory is in use from the invocation of enqueue() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.

set_name()

setName(const char *name)=0

Set the name of the execution context.

This method copies the name string.

set_profiler()

setProfiler(IProfiler *)=0

Set the profiler.

Runtime

class tensorrt.infer.Runtime

Derived From C++ Class nvinfer1::IRuntime

Allows a serialized engine to be deserialized.

C++ includes: NvInfer.h

deserialize_cuda_engine()

deserializeCudaEngine(const void *blob, std::size_t size, IPluginFactory *pluginFactory)=0 -> nvinfer1::ICudaEngine *

Deserialize an engine from a stream.

Parameters:
  • blob (*) – The memory that holds the serialized engine.
  • size (*) – The size of the memory.
  • pluginFactory (*) – The plugin factory, if any plugins are used by the network, otherwise None.
Returns:

The engine, or None if it could not be deserialized.
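
A minimal deserialization sketch, reusing a logger as in the build sketch above. The file name "engine.plan" is arbitrary, and the way the serialized bytes and size are passed to deserialize_cuda_engine() is an assumption; some versions of the binding accept a single buffer object instead of a pointer/size pair.

    with open("engine.plan", "rb") as f:
        blob = f.read()

    runtime = trt.infer.create_infer_runtime(logger)
    engine = runtime.deserialize_cuda_engine(blob, len(blob), None)   # None: no plugin factory
    if engine is None:
        raise RuntimeError("engine could not be deserialized")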

destroy()

destroy()=0

Destroy this object.

set_gpu_allocator()

setGpuAllocator(IGpuAllocator *allocator)=0

Set the GPU allocator.

Parameters:
  • allocator (*) – Set the GPU allocator to be used by the runtime. All GPU memory acquired will use this allocator. If None is passed, the default allocator will be used.

create_infer_runtime

tensorrt.infer.create_infer_runtime()

Create a Runtime instance. Corresponds to the C++ function nvinfer1::createInferRuntime().

HostMemory

class tensorrt.infer.HostMemory

Derived From C++ Class nvinfer1::IHostMemory

Class to handle library allocated memory that is accessible to the user.

The memory allocated via the host memory object is owned by the library and will be de-allocated when the destroy method is called.

C++ includes: NvInfer.h

data()

data() const =0 -> void *

A pointer to the raw data that is owned by the library.

destroy()

destroy()=0

Destroy the allocated memory.

size()

size() const =0 -> std::size_t

The size in bytes of the data that was allocated.

type()

type() const =0 -> DataType

The type of the memory that was allocated.

Graph Definition

NetworkDefinition

class tensorrt.infer.NetworkDefinition

Derived From C++ Class nvinfer1::INetworkDefinition

A network definition for input to the builder.

C++ includes: NvInfer.h

add_activation()

addActivation(ITensor &input, ActivationType type)=0 -> IActivationLayer *

Add an activation layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • type (*) – The type of activation function to apply.
Returns:

The new activation layer, or None if it could not be created.

add_concatenation()

addConcatenation(ITensor *const *inputs, int nbInputs)=0 -> IConcatenationLayer *

Add a concatenation layer to the network.

Parameters:
  • inputs (*) – The input tensors to the layer.
  • nbInputs (*) – The number of input tensors.
Returns:

  • The new concatenation layer, or None if it could not be created.
  • Warning: All tensors must have the same dimensions for all dimensions except the channel dimension.

add_constant()

addConstant(Dims dimensions, Weights weights)=0 -> IConstantLayer *

Add a constant layer to the network.

Parameters:
  • dimensions (*) – The dimensions of the constant.
  • weights (*) – The constant value, represented as weights.
Returns:

The new constant layer, or None if it could not be created.

add_convolution()

addConvolution(ITensor &input, int nbOutputMaps, DimsHW kernelSize, Weights kernelWeights, Weights biasWeights)=0 -> IConvolutionLayer *

Add a convolution layer to the network.

Parameters:
  • input (*) – The input tensor to the convolution.
  • nbOutputMaps (*) – The number of output feature maps for the convolution.
  • kernelSize (*) – The HW-dimensions of the convolution kernel.
  • kernelWeights (*) – The kernel weights for the convolution.
  • biasWeights (*) – The optional bias weights for the convolution.
Returns:

The new convolution layer, or None if it could not be created.

add_deconvolution()

addDeconvolution(ITensor &input, int nbOutputMaps, DimsHW kernelSize, Weights kernelWeights, Weights biasWeights)=0 -> IDeconvolutionLayer *

Add a deconvolution layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • nbOutputMaps (*) – The number of output feature maps.
  • kernelSize (*) – The HW-dimensions of the convolution kernel.
  • kernelWeights (*) – The kernel weights for the convolution.
  • biasWeights (*) – The optional bias weights for the convolution.
Returns:

The new deconvolution layer, or None if it could not be created.

add_element_wise()

addElementWise(ITensor &input1, ITensor &input2, ElementWiseOperation op)=0 -> IElementWiseLayer *

Add an elementwise layer to the network.

Parameters:
  • input1 (*) – The first input tensor to the layer.
  • input2 (*) – The second input tensor to the layer.
  • op (*) – The binary operation that the layer applies.

The input tensors must have the same number of dimensions. For each dimension, their lengths must match, or one of them must be one. In the latter case, the tensor is broadcast along that axis. The output tensor has the same number of dimensions as the inputs. For each dimension, its length is the maximum of the lengths of the corresponding input dimension.

Returns:

The new elementwise layer, or None if it could not be created.

add_fully_connected()

addFullyConnected(ITensor &input, int nbOutputs, Weights kernelWeights, Weights biasWeights)=0 -> IFullyConnectedLayer *

Add a fully connected layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • nbOutputs (*) – The number of outputs of the layer.
  • kernelWeights (*) – The kernel weights for the convolution.
  • biasWeights (*) – The optional bias weights for the convolution.
Returns:

The new fully connected layer, or None if it could not be created.

add_gather()

addGather(ITensor &data, ITensor &indices, int axis)=0 -> IGatherLayer *

Add a gather layer to the network.

Parameters:
  • data (*) – The tensor to gather values from.
  • indices (*) – The tensor to get indices from to populate the output tensor.
  • axis (*) – The non-batch dimension axis in the data tensor to gather on.
Returns:

The new gather layer, or None if it could not be created.

add_input()

addInput(const char *name, DataType type, Dims dimensions)=0 -> ITensor *

Add an input tensor to the network.

The name of the input tensor is used to find the index into the buffer array for an engine built from the network.

Parameters:
  • name (*) – The name of the tensor.
  • type (*) – The type of the data held in the tensor.
  • dimensions (*) – The dimensions of the tensor.

Only DataType::kFLOAT, DataType::kHALF and DataType::kINT32 are valid input tensor types. The volume of the dimensions, including the maximum batch size, must be less than 2^30 elements.

Returns:The new tensor or None if there is an error.
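
A minimal graph-definition sketch using add_input(), add_softmax() and mark_output(). The tensor name "data", the CHW shape, and the enum member spelling DataType.FLOAT are illustrative assumptions.

    network = builder.create_network()

    data = network.add_input("data", trt.infer.DataType.FLOAT,
                             trt.infer.DimsCHW(3, 224, 224))

    softmax = network.add_softmax(data)
    softmax.get_output(0).set_name("prob")     # name used later for get_binding_index()
    network.mark_output(softmax.get_output(0))
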
add_lrn()

addLRN(ITensor &input, int window, float alpha, float beta, float k)=0 -> ILRNLayer *

Add a LRN layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • window (*) – The size of the window.
  • alpha (*) – The alpha value for the LRN computation.
  • beta (*) – The beta value for the LRN computation.
  • k (*) – The k value for the LRN computation.
Returns:

The new LRN layer, or None if it could not be created.

add_matrix_multiply()

addMatrixMultiply(ITensor &input0, bool transpose0, ITensor &input1, bool transpose1)=0 -> IMatrixMultiplyLayer *

Add a MatrixMultiply layer to the network.

Parameters:
  • input0 (*) – The first input tensor (commonly A).
  • transpose0 (*) – If true, op(input0)=transpose(input0), else op(input0)=input0.
  • input1 (*) – The second input tensor (commonly B).
  • transpose1 (*) – If true, op(input1)=transpose(input1), else op(input1)=input1.
Returns:

The new matrix multiply layer, or None if it could not be created.

add_padding()

addPadding(ITensor &input, DimsHW prePadding, DimsHW postPadding)=0 -> IPaddingLayer *

Add a padding layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • prePadding (*) – The padding to apply to the start of the tensor.
  • postPadding (*) – The padding to apply to the end of the tensor.
Returns:

the new padding layer, or None if it could not be created.

add_plugin()

addPlugin(ITensor *const *inputs, int nbInputs, IPlugin &plugin)=0 -> IPluginLayer *

Add a plugin layer to the network.

Parameters:
  • inputs (*) – The input tensors to the layer.
  • nbInputs (*) – The number of input tensors.
  • plugin (*) – The layer plugin.
Returns:

the new plugin layer, or None if it could not be created.

add_plugin_ext()

addPluginExt(ITensor *const *inputs, int nbInputs, IPluginExt &plugin)=0 -> IPluginLayer *

Add a plugin layer to the network using an IPluginExt interface.

Parameters:
  • inputs (*) – The input tensors to the layer.
  • nbInputs (*) – The number of input tensors.
  • plugin (*) – The layer plugin.
Returns:

The new plugin layer, or None if it could not be created.

add_pooling()

addPooling(ITensor &input, PoolingType type, DimsHW windowSize)=0 -> IPoolingLayer *

Add a pooling layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • type (*) – The type of pooling to apply.
  • windowSize (*) – The size of the pooling window.
Returns:

The new pooling layer, or None if it could not be created.

add_ragged_soft_max()

addRaggedSoftMax(ITensor &input, ITensor &bounds)=0 -> IRaggedSoftMaxLayer *

Add a RaggedSoftMax layer to the network.

Parameters:
  • input (*) – The ZxS input tensor.
  • bounds (*) – The Zx1 bounds tensor.
Returns:

The new RaggedSoftMax layer, or None if it could not be created.

add_reduce()

addReduce(ITensor &input, ReduceOperation operation, uint32_t reduceAxes, bool keepDimensions)=0 -> IReduceLayer *

Add a reduce layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • operation (*) – The reduction operation to perform.
  • reduceAxes (*) – The reduction dimensions, given as a bit mask. Bit 0 of the uint32_t corresponds to non-batch dimension 0, bit 1 to non-batch dimension 1, and so on. If a bit is set, the corresponding dimension is reduced. For example, for an NCHW input (three non-batch dimensions), bit 0 corresponds to C, bit 1 to H, and bit 2 to W. Note that reduction is not permitted over the batch dimension. A short bit-mask sketch follows this entry.
  • keepDimensions (*) – The boolean that specifies whether or not to keep the reduced dimensions in the output of the layer.
Returns:

The new reduce layer, or None if it could not be created.
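
A short sketch of building the reduceAxes bit mask described above, continuing the graph-definition sketch earlier in this section. The enum member spelling ReduceOperation.SUM is an assumption.

    # Reduce over C (bit 0) and W (bit 2) of a CHW tensor, keeping the reduced
    # dimensions so the output remains 3-dimensional.
    reduce_axes = (1 << 0) | (1 << 2)
    reduce_layer = network.add_reduce(data, trt.infer.ReduceOperation.SUM,
                                      reduce_axes, True)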

add_rnn()

addRNN(ITensor &inputs, int layerCount, std::size_t hiddenSize, int maxSeqLen, RNNOperation op, RNNInputMode mode, RNNDirection dir, Weights weights, Weights bias)=0 -> IRNNLayer *

Add a layerCount-deep RNN layer to the network, with a sequence length of maxSeqLen and hiddenSize internal state per layer.

Parameters:
  • inputs (*) – The input tensor to the layer.
  • layerCount (*) – The number of layers in the RNN.
  • hiddenSize (*) – The size of the internal hidden state for each layer.
  • maxSeqLen (*) – The maximum length of the time sequence.
  • op (*) – The type of RNN to execute.
  • mode (*) – The input mode for the RNN.
  • dir (*) – The direction to run the RNN.
  • weights (*) – The weights for the weight matrix parameters of the RNN.
  • bias (*) – The weights for the bias vectors parameters of the RNN.

The input tensors must be of the type DataType::kFLOAT or DataType::kHALF.

The layout for the input tensor should be {1, S_max, N, E}, where:

  • S_max is the maximum allowed sequence length (number of RNN iterations)
  • N is the batch size
  • E specifies the embedding length (unless kSKIP is set, in which case it should match getHiddenSize())
The first output tensor is the output of the final RNN layer across all timesteps, with dimensions {S_max, N, H}, where:

  • S_max is the maximum allowed sequence length (number of RNN iterations)
  • N is the batch size
  • H is an output hidden state (equal to getHiddenSize() or 2x getHiddenSize())
The second tensor is the final hidden state of the RNN across all layers, and if the RNN is an LSTM (i.e. getOperation() is kLSTM), then the third tensor is the final cell state of the RNN across all layers. Both the second and third output tensors have dimensions {L, N, H}:

  • L is equal to getLayerCount() if getDirection is kUNIDIRECTION, and 2*getLayerCount() if getDirection is kBIDIRECTION. In the bi-directional case, layer l’s final forward hidden state is stored in L = 2*l, and its final backward hidden state is stored in L = 2*l + 1
  • N is the batch size
  • H is getHiddenSize()
Note that in bidirectional RNNs, the full “hidden state” for a layer l is the concatenation of its forward hidden state and its backward hidden state, and its size is 2*H.
Returns:
  • The new RNN layer, or None if it could not be created.
add_rnnv2()

addRNNv2(ITensor &input, int32_t layerCount, int32_t hiddenSize, int32_t maxSeqLen, RNNOperation op)=0 -> IRNNv2Layer *

Add a layerCount-deep RNN layer to the network, with hiddenSize internal state per layer, that can take a batch with fixed or variable sequence lengths.

Parameters:
  • input (*) – The input tensor to the layer (see below).
  • layerCount (*) – The number of layers in the RNN.
  • hiddenSize (*) – Size of the internal hidden state for each layer.
  • maxSeqLen (*) – Maximum sequence length for the input.
  • op (*) – The type of RNN to execute.
Returns:
  • The new RNN layer, or None if it could not be created.
add_scale()

addScale(ITensor &input, ScaleMode mode, Weights shift, Weights scale, Weights power)=0 -> IScaleLayer *

Add a Scale layer to the network.

Parameters:
  • input (*) – The input tensor to The layer. This tensor is required to have a minimum of 3 dimensions.
  • mode (*) – The scaling mode.
  • shift (*) – The shift value.
  • scale (*) – The scale value.
  • power (*) – The power value.

If the weights are available, then the number of weights depends on the ScaleMode. For kUNIFORM, the number of weights is 1. For kCHANNEL, the number of weights equals the channel dimension. For kELEMENTWISE, the number of weights equals the volume of the input.

Returns:

The new Scale layer, or None if it could not be created.

add_shuffle()

addShuffle(ITensor &input)=0 -> IShuffleLayer *

Add a shuffle layer to the network.

Parameters:input (*) – The input tensor to the layer.
Returns: The new shuffle layer, or None if it could not be created.
add_softmax()

addSoftMax(ITensor &input)=0 -> ISoftMaxLayer *

Add a SoftMax layer to the network.

Returns:The new SoftMax layer, or None if it could not be created.
add_top_k()

addTopK(ITensor &input, TopKOperation op, int k, uint32_t reduceAxes)=0 -> ITopKLayer *

Add a TopK layer to the network.

The TopK layer has two outputs of the same dimensions. The first contains data values, the second contains index positions for the values. Output values are sorted, largest first for operation kMAX and smallest first for operation kMIN.

Currently only values of K up to 1024 are supported.

Parameters:
  • input (*) – The input tensor to the layer.
  • op (*) – Operation to perform.
  • k (*) – Number of elements to keep.
  • reduceAxes (*) – The reduction dimensions, given as a bit mask. Bit 0 of the uint32_t corresponds to non-batch dimension 0, and so on. If a bit is set, the corresponding dimension is reduced. For example, for an NCHW input (three non-batch dimensions), bit 0 corresponds to C, bit 1 to H, and bit 2 to W. Note that TopK reduction is currently only permitted over one dimension.
Returns:

The new TopK layer, or None if it could not be created.
add_unary()

addUnary(ITensor &input, UnaryOperation operation)=0 -> IUnaryLayer *

Add a unary layer to the network.

Parameters:
  • input (*) – The input tensor to the layer.
  • operation (*) – The operation to apply.
Returns:

The new unary layer, or None if it could not be created

destroy()

destroy()=0

Destroy this INetworkDefinition object.

get_convolution_output_dimensions_formula()

getConvolutionOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &

Get the convolution output dimensions formula.

Deprecated: This method does not currently work reliably and will be removed in a future release.

Returns:
  • The formula for computing the convolution output dimensions.
get_deconvolution_output_dimensions_formula()

getDeconvolutionOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &

Get the deconvolution output dimensions formula.

Deprecated: This method does not currently work reliably and will be removed in a future release.

Returns:
  • The formula for computing the deconvolution output dimensions.
get_input()

getInput(int index) const =0 -> ITensor *

Get the input tensor specified by the given index.

Parameters:index (*) – The index of the input tensor.
Returns:
  • The input tensor, or None if the index is out of range.
get_layer()

getLayer(int index) const =0 -> ILayer *

Get the layer specified by the given index.

Parameters:index (*) – The index of the layer.
Returns:
  • The layer, or None if the index is out of range.
get_nb_inputs()

getNbInputs() const =0 -> int

Get the number of inputs in the network.

Returns:
  • The number of inputs in the network.
get_nb_layers()

getNbLayers() const =0 -> int

Get the number of layers in the network.

Returns:
  • The number of layers in the network.
get_nb_outputs()

getNbOutputs() const =0 -> int

Get the number of outputs in the network.

Returns:
  • The number of outputs in the network.
get_output()

getOutput(int index) const =0 -> ITensor *

Get the output tensor specified by the given index.

Parameters:index (*) – The index of the output tensor.
Returns:
  • The output tensor, or None if the index is out of range.
get_pooling_output_dimensions_formula()

getPoolingOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &

Get the pooling output dimensions formula.

Returns:
  • The formula for computing the pooling output dimensions.
mark_output()

markOutput(ITensor &tensor)=0

Mark a tensor as a network output.

Parameters:tensor (*) – The tensor to mark as an output tensor.
set_convolution_output_dimensions_formula()

setConvolutionOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0

Set the convolution output dimensions formula.

Deprecated: This method does not currently work reliably and will be removed in a future release.

Parameters:
  • formula (*) – The formula for computing the convolution output dimensions. If None is passed, the default formula is used.

The default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1.

set_deconvolution_output_dimensions_formula()

setDeconvolutionOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0

Set the deconvolution output dimensions formula.

Deprecated: This method does not currently work reliably and will be removed in a future release.

Parameters:
  • formula (*) – The formula for computing the deconvolution output dimensions. If None is passed, the default formula is used.

The default formula in each dimension is (inputDim - 1) * stride + kernelSize - 2 * padding.

set_pooling_output_dimensions_formula()

setPoolingOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0

Set the pooling output dimensions formula.

Parameters:
  • formula (*) – The formula for computing the pooling output dimensions. If None is passed, the default formula is used.

The default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1.

LayerType

class tensorrt.infer.LayerType

Derived From C++ Class nvinfer1::LayerType

Available layer types

Base Class:
IntEnum

Tensor

class tensorrt.infer.Tensor

Derived From C++ Class nvinfer1::ITensor

A tensor in a network definition.

C++ includes: NvInfer.h

get_broadcast_across_batch()

getBroadcastAcrossBatch() const =0 -> bool

Check if tensor is broadcast across the batch.

When a tensor is broadcast across a batch, it has the same value for every member in the batch. Memory is only allocated once for the single member.

Returns:
  • True if tensor is broadcast across the batch, false otherwise.
get_dimensions()

getDimensions() const =0 -> Dims

Get the dimensions of a tensor.

Returns:
  • The dimensions of the layer.
get_location()

getLocation() const =0 -> TensorLocation

Get the storage location of a tensor.

Returns:
  • The location of tensor data.
get_name()

getName() const =0 -> const char *

Get the tensor name.

Returns:
  • The name, as a pointer to a null-terminated character sequence.
get_type()

getType() const =0 -> DataType

Get the data type of a tensor.

Returns:
  • The data type of the tensor.
is_network_input()

isNetworkInput() const =0 -> bool

Whether the tensor is a network input.

is_network_output()

isNetworkOutput() const =0 -> bool

Whether the tensor is a network output.

set_broadcast_across_batch()

setBroadcastAcrossBatch(bool broadcastAcrossBatch)=0

Set whether to enable broadcast of tensor across the batch.

When a tensor is broadcast across a batch, it has the same value for every member in the batch. Memory is only allocated once for the single member.

This method is only valid for network input tensors, since the flags of layer output tensors are inferred based on layer inputs and parameters. If this state is modified for a tensor in the network, the states of all dependent tensors will be recomputed.

Parameters:
  • broadcastAcrossBatch (*) – Whether to enable broadcast of tensor across the batch.
set_dimensions()

setDimensions(Dims dimensions)=0

Set the dimensions of a tensor.

For a network input, the dimensions are assigned by the application. For a network output, they are computed based on the layer parameters and the inputs to the layer. If a tensor size or a parameter is modified in the network, the dimensions of all dependent tensors will be recomputed.

This call is only legal for network input tensors, since the dimensions of layer output tensors are inferred based on layer inputs and parameters.

Parameters:
  • dimensions (*) – The dimensions of the tensor.
set_location()

setLocation(TensorLocation location)=0

Set the storage location of a tensor.

Parameters:
  • location (*) – the location of tensor data

Only input tensors for storing sequence lengths for RNNv2 are supported. Using host storage for layers that do not support it will generate errors at build time.

set_name()

setName(const char *name)=0

Set the tensor name.

For a network input, the name is assigned by the application. For tensors which are layer outputs, a default name is assigned consisting of the layer name followed by the index of the output in brackets.

This method copies the name string.

Parameters:
  • name (*) – The name.
set_type()

setType(DataType type)=0

Set the data type of a tensor.

Parameters:
  • type (*) – The data type of the tensor.

The type is unchanged if the type is invalid for the given tensor. If the tensor is a network input or output, then the tensor type cannot be DataType::kINT8.

Layer

class tensorrt.infer.Layer

Derived From C++ Class nvinfer1::ILayer

Base class for all layer classes in a network definition.

C++ includes: NvInfer.h

get_core()

getCore() const =0 -> COREID

Get the DLA core that this layer executes on.

get_input()

getInput(int index) const =0 -> ITensor *

Get the layer input corresponding to the given index.

Parameters:index (*) – The index of the input tensor.
Returns: The input tensor, or None if the index is out of range.
get_name()

getName() const =0 -> const char *

Return the name of a layer.

get_nb_inputs()

getNbInputs() const =0 -> int

Get the number of inputs of a layer.

get_nb_outputs()

getNbOutputs() const =0 -> int

Get the number of outputs of a layer.

get_output()

getOutput(int index) const =0 -> ITensor *

Get the layer output corresponding to the given index.

Returns:The indexed output tensor, or None if the index is out of range.
get_type()

getType() const =0 -> LayerType

Return the type of a layer.

set_core()

setCore(COREID core)=0 -> bool

Set the DLA core that this layer must execute on.

Returns:
  • True if the core is valid for the layer, false otherwise.
set_name()

setName(const char *name)=0

Set the name of a layer.

This method copies the name string.

ConvolutionLayer

class tensorrt.infer.ConvolutionLayer

Derived From C++ Class nvinfer1::IConvolutionLayer

A convolution layer in a network definition.

This layer performs a correlation operation between a 3-dimensional filter and a 4-dimensional tensor to produce another 4-dimensional tensor.

The HW output size of the convolution is determined using the formula set by INetworkDefinition::setConvolutionOutputDimensionsFormula().

An optional bias argument is supported, which adds a per-channel constant to each value in the output.

C++ includes: NvInfer.h

get_bias_weights()

getBiasWeights() const =0 -> Weights

Get the bias weights for the convolution.

get_dilation()

getDilation() const =0 -> DimsHW

Get the dilation for a convolution.

get_kernel_size()

getKernelSize() const =0 -> DimsHW

Get the HW kernel size of the convolution.

get_kernel_weights()

getKernelWeights() const =0 -> Weights

Get the kernel weights for the convolution.

get_nb_groups()

getNbGroups() const =0 -> int

Get the number of groups for a convolution.

get_nb_output_maps()

getNbOutputMaps() const =0 -> int

Get the number of output maps for the convolution.

get_padding()

getPadding() const =0 -> DimsHW

Get the padding of the convolution.

get_stride()

getStride() const =0 -> DimsHW

Get the stride of the convolution.

set_bias_weights()

setBiasWeights(Weights weights)=0

Set the bias weights for the convolution.

Bias is optional. To omit bias, set the count value of the weights structure to zero.

The bias is applied per-channel, so the number of weights (if non-zero) must be equal to the number of output feature maps.

set_dilation()

setDilation(DimsHW dims)=0

Set the dilation for a convolution.

Default: (1,1)

set_kernel_size()

setKernelSize(DimsHW kernelSize)=0

Set the HW kernel size of the convolution.

set_kernel_weights()

setKernelWeights(Weights weights)=0

Set the kernel weights for the convolution.

The weights are specified as a contiguous array in GKCRS order, where G is the number of groups, K the number of output feature maps, C the number of input channels, and R and S are the height and width of the filter.
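
A numpy sketch of the expected memory layout; how a Weights object is constructed from Python may differ by version, so only the ordering is illustrated here.

    import numpy as np

    G, K, C, R, S = 1, 16, 3, 5, 5   # groups, output maps, input channels, height, width
    kernel = np.random.randn(G, K, C, R, S).astype(np.float32)

    # Flattening a C-contiguous (G, K, C, R, S) array yields GKCRS order.
    flat_kernel = np.ascontiguousarray(kernel).ravel()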

set_nb_groups()

setNbGroups(int nbGroups)=0

Set the number of groups for a convolution.

The input tensor channels are divided into nbGroups groups, and a convolution is executed for each group, using a filter per group. The results of the group convolutions are concatenated to form the output.

note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output.

Default: 1

set_nb_output_maps()

setNbOutputMaps(int nbOutputMaps)=0

Set the number of output maps for the convolution.

set_padding()

setPadding(DimsHW padding)=0

Set the padding of the convolution.

The input will be zero-padded by this number of elements in the height and width directions. Padding is symmetric.

Default: (0,0)

set_stride()

setStride(DimsHW stride)=0

Set the stride of the convolution.

Default: (1,1)

FullyConnectedLayer

class tensorrt.infer.FullyConnectedLayer

Derived From C++ Class nvinfer1::IFullyConnectedLayer

A fully connected layer in a network definition. This layer expects an input tensor of three or more non-batch dimensions. The input is automatically reshaped into an MxV tensor X, where V is a product of the last three dimensions and M is a product of the remaining dimensions (where the product over 0 dimensions is defined as 1). For example:

  • If the input tensor has shape {C, H, W}, then the tensor is reshaped into {1, C*H*W}.
  • If the input tensor has shape {P, C, H, W}, then the tensor is reshaped into {P, C*H*W}.

The layer then performs the following operation:

Y = X * transpose(W) + bias

where X is the MxV tensor defined above, W is the KxV weight tensor of the layer, and bias is a row vector of size K that is broadcast to MxK. K is the number of output channels, and is configurable via setNbOutputChannels(). If bias is not specified, it is implicitly 0.

The MxK result Y is then reshaped such that the last three dimensions are {K, 1, 1} and the remaining dimensions match the dimensions of the input tensor. For example:

  • If the input tensor has shape {C, H, W}, then the output tensor will have shape {K, 1, 1}.
  • If the input tensor has shape {P, C, H, W}, then the output tensor will have shape {P, K, 1, 1}.

C++ includes: NvInfer.h
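
A numpy sketch of the equivalent computation for a {C, H, W} input, following the description above (all sizes are arbitrary):

    import numpy as np

    C, H, W, K = 8, 4, 4, 10
    x = np.random.randn(C, H, W).astype(np.float32)
    w = np.random.randn(K, C * H * W).astype(np.float32)   # KxV kernel weights
    b = np.random.randn(K).astype(np.float32)               # bias of length K

    X = x.reshape(1, C * H * W)   # input reshaped to MxV (M = 1 here)
    Y = X @ w.T + b               # MxK result
    out = Y.reshape(K, 1, 1)      # output reshaped to {K, 1, 1}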

get_bias_weights()

getBiasWeights() const =0 -> Weights

Get the bias weights.

get_kernel_weights()

getKernelWeights() const =0 -> Weights

Get the kernel weights.

get_nb_output_channels()

getNbOutputChannels() const =0 -> int

Get the number of output channels K from the fully connected layer.

set_bias_weights()

setBiasWeights(Weights weights)=0

Set the bias weights.

Bias is optional. To omit bias, set the count value in the weights structure to zero.

set_kernel_weights()

setKernelWeights(Weights weights)=0

Set the kernel weights, given as a KxC matrix in row-major order.

set_nb_output_channels()

setNbOutputChannels(int nbOutputs)=0

Set the number of output channels K from the fully connected layer.

ActivationLayer

class tensorrt.infer.ActivationLayer

Derived From C++ Class nvinfer1::IActivationLayer

An Activation layer in a network definition.

This layer applies a per-element activation function to its input.

The output has the same shape as the input.

C++ includes: NvInfer.h

get_activation_type()

getActivationType() const =0 -> ActivationType

Get the type of activation to be performed.

set_activation_type()

setActivationType(ActivationType type)=0

Set the type of activation to be performed.

ActivationType

class tensorrt.infer.ActivationType

Derived From C++ Class nvinfer1::ActivationType

Type of activation function

Base Class:
IntEnum

PoolingLayer

class tensorrt.infer.PoolingLayer

Derived From C++ Class nvinfer1::IPoolingLayer

A Pooling layer in a network definition.

The layer applies a reduction operation within a window over the input.

The output size is determined from the input size using the formula set by INetworkDefinition::setPoolingOutputDimensionsFormula().

C++ includes: NvInfer.h

get_average_count_excludes_padding()

getAverageCountExcludesPadding() const =0 -> bool

Get whether average pooling uses as a denominator the overlap area between the window and the unpadded input.

get_blend_factor()

getBlendFactor() const =0 -> float

Get the blending factor for the max_average_blend mode: max_average_blendPool = (1-blendFactor)*maxPool + blendFactor*avgPool. blendFactor is a user value in [0,1] with a default value of 0.0. In modes other than kMAX_AVERAGE_BLEND, blendFactor is ignored.

get_padding()

getPadding() const =0 -> DimsHW

Get the padding for pooling.

Default: 0

get_pooling_type()

getPoolingType() const =0 -> PoolingType

Get the type of pooling to be performed.

get_stride()

getStride() const =0 -> DimsHW

Get the stride for pooling.

get_window_size()

getWindowSize() const =0 -> DimsHW

Get the window size for pooling.

set_average_count_excludes_padding()

setAverageCountExcludesPadding(bool exclusive)=0

Set whether average pooling uses as a denominator the overlap area between the window and the unpadded input. If this is not set, the denominator is the overlap between the pooling window and the padded input.

Default: true

set_blend_factor()

setBlendFactor(float blendFactor)=0

Set the blending factor for the max_average_blend mode: max_average_blendPool = (1-blendFactor)*maxPool + blendFactor*avgPool. blendFactor is a user value in [0,1] with a default value of 0.0. This value only applies for the kMAX_AVERAGE_BLEND mode.

set_padding()

setPadding(DimsHW padding)=0

Set the padding for pooling.

Default: 0

set_pooling_type()

setPoolingType(PoolingType type)=0

Set the type of pooling to be performed.

set_stride()

setStride(DimsHW stride)=0

Set the stride for pooling.

Default: 1

set_window_size()

setWindowSize(DimsHW windowSize)=0

Set the window size for pooling.

PoolingType

class tensorrt.infer.PoolingType

Derived From C++ Class nvinfer1::PoolingType

Type of pooling layer

Base Class:
IntEnum

LRNLayer

class tensorrt.infer.LRNLayer

Derived From C++ Class nvinfer1::ILRNLayer

A LRN layer in a network definition.

The output size is the same as the input size.

C++ includes: NvInfer.h

get_alpha()

getAlpha() const =0 -> float

Get the LRN alpha value.

get_beta()

getBeta() const =0 -> float

Get the LRN beta value.

get_k()

getK() const =0 -> float

Get the LRN K value.

get_window_size()

getWindowSize() const =0 -> int

Get the LRN window size.

set_alpha()

setAlpha(float alpha)=0

Set the LRN alpha value.

The valid range is [-1e20, 1e20].

set_beta()

setBeta(float beta)=0

Set the LRN beta value.

The valid range is [0.01, 1e5].

set_k()

setK(float k)=0

Set the LRN K value.

The valid range is [1e-5, 1e10].

set_window_size()

setWindowSize(int windowSize)=0

Set the LRN window size.

The window size must be odd and in the range of [1, 15].

ScaleLayer

class tensorrt.infer.ScaleLayer

Derived From C++ Class nvinfer1::IScaleLayer

A Scale layer in a network definition.

This layer applies a per-element computation to its input:

output = (input * scale + shift) ^ power

The coefficients can be applied on a per-tensor, per-channel, or per-element basis.

Note: If the number of weights is 0, then a default value is used for shift, power, and scale. The default shift is 0, the default power is 1, and the default scale is 1.

The output size is the same as the input size.

note: The input tensor for this layer is required to have a minimum of 3 dimensions.

C++ includes: NvInfer.h
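
A numpy sketch of the per-channel (kCHANNEL) case for a CHW input, with arbitrary coefficient values:

    import numpy as np

    C, H, W = 3, 8, 8
    inp = np.random.rand(C, H, W).astype(np.float32)
    scale = np.array([1.0, 2.0, 0.5], dtype=np.float32)   # one coefficient per channel
    shift = np.zeros(C, dtype=np.float32)
    power = np.ones(C, dtype=np.float32)

    # output = (input * scale + shift) ^ power, broadcast over H and W
    out = (inp * scale[:, None, None]
           + shift[:, None, None]) ** power[:, None, None]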

get_mode()

getMode() const =0 -> ScaleMode

Get the scale mode.

get_power()

getPower() const =0 -> Weights

Get the power value.

get_scale()

getScale() const =0 -> Weights

Get the scale value.

get_shift()

getShift() const =0 -> Weights

Get the shift value.

set_mode()

setMode(ScaleMode mode)=0

Set the scale mode.

set_power()

setPower(Weights power)=0

Set the power value.

set_scale()

setScale(Weights scale)=0

Set the scale value.

set_shift()

setShift(Weights shift)=0

Set the shift value.

ScaleMode

class tensorrt.infer.ScaleMode

Derived From C++ Class nvinfer1::ScaleMode

Scale mode

Base Class:
IntEnum

SoftmaxLayer

class tensorrt.infer.SoftmaxLayer

Derived From C++ Class nvinfer1::ISoftMaxLayer

A Softmax layer in a network definition.

This layer applies a per-channel softmax to its input.

The output size is the same as the input size.

C++ includes: NvInfer.h

ConcatenationLayer

class tensorrt.infer.ConcatenationLayer

Derived From C++ Class nvinfer1::IConcatenationLayer

A concatenation layer in a network definition.

The output channel size is the sum of the channel sizes of the inputs. The other output sizes are the same as the other input sizes, which must all match.

C++ includes: NvInfer.h

get_axis()

getAxis() const =0 -> int

Get the axis along which concatenation occurs.

set_axis()

setAxis(int axis)=0

Set the axis along which concatenation occurs.

0 is the major axis (excluding the batch dimension). The default is the number of non-batch axes in the tensor minus three (e.g. for an NCHW input it would be 0), or 0 if there are fewer than 3 non-batch axes.

Parameters:axis (*) – The axis along which concatenation occurs.

DeconvolutionLayer

class tensorrt.infer.DeconvolutionLayer

Derived From C++ Class nvinfer1::IDeconvolutionLayer

A deconvolution layer in a network definition.

The output size is defined using the formula set by INetworkDefinition::setDeconvolutionOutputDimensionsFormula().

C++ includes: NvInfer.h

get_bias_weights()

getBiasWeights() const =0 -> Weights

Get the bias weights for the deconvolution.

get_kernel_size()

getKernelSize() const =0 -> DimsHW

Get the HW kernel size of the deconvolution.

get_kernel_weights()

getKernelWeights() const =0 -> Weights

Get the kernel weights for the deconvolution.

get_nb_groups()

getNbGroups() const =0 -> int

Get the number of groups for a deconvolution.

get_nb_output_maps()

getNbOutputMaps() const =0 -> int

Get the number of output feature maps for the deconvolution.

get_padding()

getPadding() const =0 -> DimsHW

Get the padding of the deconvolution.

get_stride()

getStride() const =0 -> DimsHW

Get the stride of the deconvolution.

Default: (1,1)

set_bias_weights()

setBiasWeights(Weights weights)=0

Set the bias weights for the deconvolution.

Bias is optional. To omit bias, set the count value of the weights structure to zero.

The bias is applied per-feature-map, so the number of weights (if non-zero) must be equal to the number of output feature maps.

set_kernel_size()

setKernelSize(DimsHW kernelSize)=0

Set the HW kernel size of the deconvolution.

set_kernel_weights()

setKernelWeights(Weights weights)=0

Set the kernel weights for the deconvolution.

The weights are specified as a contiguous array in CKRS order, where C the number of input channels, K the number of output feature maps, and R and S are the height and width of the filter.

set_nb_groups()

setNbGroups(int nbGroups)=0

Set the number of groups for a deconvolution.

The input tensor channels are divided into nbGroups groups, and a deconvolution is executed for each group, using a filter per group. The results of the group convolutions are concatenated to form the output.

note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output.

Default: 1

set_nb_output_maps()

setNbOutputMaps(int nbOutputMaps)=0

Set the number of output feature maps for the deconvolution.

set_padding()

setPadding(DimsHW padding)=0

Set the padding of the deconvolution.

The input will be zero-padded by this number of elements in the height and width directions. Padding is symmetric.

Default: (0,0)

set_stride()

setStride(DimsHW stride)=0

Set the stride of the deconvolution.

GatherLayer

class tensorrt.infer.GatherLayer

Derived from C++ class nvinfer1::IGatherLayer

C++ includes: NvInfer.h

set_gather_axis()

setGatherAxis(int axis)=0

Set the non-batch dimension axis to gather on. The axis must be less than the number of non-batch dimensions in the data input.

get_gather_axis()

getGatherAxis() const =0 -> int

Get the non-batch dimension axis to gather on.

ReduceLayer

class tensorrt.infer.ReduceLayer

Derived from C++ class nvinfer1::IReduceLayer

C++ includes: NvInfer.h

Layer that represents a reduction operator

get_keep_dimensions()

getKeepDimensions() const =0

Get the boolean that specifies whether or not to keep the reduced dimensions for the layer.

get_operation()

getOperation() =0

Get the reduce operation for the layer.

get_reduce_axes()

getReduceAxes() =0

Get the axes over which to reduce for the layer.

set_keep_dimensions()

setKeepDimensions(bool keepDimensions) =0

Set the boolean that specifies whether or not to keep the reduced dimensions for the layer.

set_operation()

setOperation(ReduceOperation op) =0

Set the reduce operation for the layer.

set_reduce_axes()

setReduceAxes(int reduceAxes) =0

Set the axes over which to reduce.

ConstantLayer

class tensorrt.infer.ConstantLayer

Derived from C++ class nvinfer1::IConstantLayer

C++ includes: NvInfer.h

Layer that represents a constant value

set_weights()

setWeights(Weights weights) =0

Set the weights for the layer.

get_weights()

getWeights()const=0

Get the weights for the layer.

set_dimensions()

setDimensions(Dims dimensions) =0

Set the dimensions for the layer.

get_dimensions()

getDimensions() const=0

Get the dimensions for the layer.

MatrixMultiply

class tensorrt.infer.MatrixMultiply

Derived from C++ class nvinfer1::IMatrixMultiplyLayer

C++ includes: NvInfer.h

Layer that represents a matrix multiplication

Let A be getInput(0) and B be getInput(1).

Tensors A and B must have equal rank, which must be at least 2.

When A and B are matrices, this layer computes op(A) * op(B), where op(x) = x if the corresponding transpose flag is false, and op(x) = transpose(x) if it is true. Transposition is of the last two dimensions. Inputs of higher rank are treated as collections of matrices.

For a dimension that is not one of the last two dimensions: if the dimension is 1 for one of the tensors but not the other tensor, the former tensor is broadcast along that dimension to match the dimension of the latter tensor.
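
A numpy sketch of these semantics for rank-3 inputs with both transpose flags false; numpy's matmul broadcasts the leading (non-matrix) dimension in the same way:

    import numpy as np

    A = np.random.randn(1, 4, 5).astype(np.float32)   # dimension 0 is broadcast
    B = np.random.randn(6, 5, 7).astype(np.float32)

    C = np.matmul(A, B)   # op(A) * op(B); result has shape (6, 4, 7)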

set_transpose()

setTranspose(int index, bool val) =0

Set the transpose flag for an input tensor.

get_transpose()

getTranspose(int index) const =0 -> bool

Get the transpose flag for an input tensor.

RaggedSoftMax

class tensorrt.infer.RaggedSoftMax

Derived From C++ Class nvinfer1::IRaggedSoftMaxLayer

C++ includes: NvInfer.h

A RaggedSoftmax layer in a network definition.

This layer takes a ZxS input tensor and an additional Zx1 bounds tensor holding the lengths of the Z sequences.

This layer computes a softmax across each of the Z sequences.

The output tensor is of the same size as the input tensor.

RNNv2Layer

class tensorrt.infer.RNNv2Layer

Derived From C++ Class nvinfer1::IRNNv2Layer

C++ includes: NvInfer.h

An RNN layer in a network definition, version 2.

This layer supersedes IRNNLayer.

get_layer_count()

getLayerCount() const =0

Get the layer count for the RNN.

get_hidden_size()

getHiddenSize() const =0

Get the hidden size for the RNN.

get_max_seq_length()

getMaxSeqLength() const =0

Get the maximum sequence length for the RNN.

get_data_length()

getDataLength() const =0

Get the data length for the RNN.

set_sequence_lengths()

setSequenceLengths(ITensor &seqLengths) =0

Specify individual sequence lengths in the batch with the ITensor pointed to by seqLengths.

The seqLengths ITensor should be a {N1, ..., Np} tensor, where N1..Np are the index dimensions of the input tensor to the RNN.

If this is not specified, then the RNN layer assumes all sequences are size getMaxSeqLength().

All sequence lengths in seqLengths should be in the range [1, getMaxSeqLength()]. Zero-length sequences are not supported.

This tensor must be of type DataType::kINT32.

get_sequence_lengths()

getSequenceLengths() const=0

Get the sequence lengths specified for the RNN.

set_operation()

setOperation(RNNOperation op) =0

Set the operation of the RNN layer.

get_operation()

getOperation() const=0

Get the operation of the RNN layer.

set_input_mode()

setInputMode(RNNInputMode op) =0

Set the input mode of the RNN layer.

get_input_mode()

getInputMode() const=0

Get the input mode of the RNN layer.

set_direction()

setDirection(RNNDirection op) =0

Set the direction of the RNN layer.

get_direction()

getDirection() const=0

Get the direction of the RNN layer.

set_weights_for_gate()

setWeightsForGate(int layerIndex, RNNGateType gate, bool isW, Weights weights) =0

Set the weight parameters for an individual gate in the RNN.

get_weights_for_gate()

getWeightsForGate(int layerindex, RNNGateType gate, bool isW) const=0

Get the weight parameters for an individual gate in the RNN.

set_bias_for_gate()

setBiasForGate(int layerIndex, RNNGateType gate, bool isW, Weights bias) =0

Set the bias parameters for an individual gate in the RNN.

get_bias_for_gate()

getBiasForGate(int layerIndex, RNNGateType gate, bool isW) const=0

Get the bias parameters for an individual gate in the RNN.

set_hidden_state()

setHiddenState(ITensor &hidden) =0

Set the initial hidden state of the RNN with the provided hidden ITensor.

The hidden ITensor should have the dimensions {N1, ..., Np, L, H}, where:

  • N1..Np are the index dimensions specified by the input tensor
  • L is the number of layers in the RNN, equal to getLayerCount()
  • H is the hidden state for each layer, equal to getHiddenSize() if getDirection() is kUNIDIRECTION, and 2x getHiddenSize() otherwise.

get_hidden_state()

getHiddenState() const=0

Get the initial hidden state of the RNN.

set_cell_state()

setCellState(ITensor &cell) =0

Set the initial cell state of the LSTM with the provided cell ITensor.

get_cell_state()

getCellState() const=0

Get the initial cell state of the LSTM.

TopKLayer

class tensorrt.infer.TopKLayer

Derived From C++ Class nvinfer1::TopKLayer

Layer that represents a TopK reduction

C++ includes: NvInfer.h

set_operation()

setOperation(TopKOperation op) =0

Set the operation for the layer.

get_operation()

getOperation() =0

Get the operation for the layer.

set_k()

setK(int k) =0

Set the k value for the layer.

get_k()

getK() =0

Get the k value for the layer.

set_reduce_axes()

setReduceAxes(int reduceAxes)=0

Set which axes to reduce for the layer.

get_reduce_axes()

getReduceAxes()=0

Get the axes to reduce for the layer.
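
In the C++ API the reduceAxes argument is a bitmask in which bit i selects axis i of the input; assuming the Python binding forwards the same integer, the mask can be built as follows (the helper and the snake_case call are hypothetical):

  # Hypothetical helper to build the bitmask passed to set_reduce_axes().
  def axes_mask(*axes):
      mask = 0
      for axis in axes:
          mask |= 1 << axis
      return mask

  # Reduce over the channel axis of a CHW tensor (axis 0 in this sketch).
  reduce_axes = axes_mask(0)                 # == 0b001
  # topk_layer.set_reduce_axes(reduce_axes)  # assumed snake_case binding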

ElementWiseLayer

class tensorrt.infer.ElementWiseLayer

Derived From C++ Class nvinfer1::ElementWiseLayer

An elementwise layer in a network definition.

This layer applies a per-element binary operation between corresponding elements of two tensors.

The input dimensions of the two input tensors must be equal, and the output tensor is the same size as each input.

C++ includes: NvInfer.h

get_operation()

getOperation() const =0 -> ElementWiseOperation

Get the binary operation for the layer.

set_operation()

setOperation(ElementWiseOperation type)=0

Set the binary operation for the layer.

ElementWiseOperation

class tensorrt.infer.ElementWiseOperation

Derived From C++ Class nvinfer1::ElementWiseOperation

Type of operation for the layer

Base Class:
IntEnum

ShuffleLayer

class tensorrt.infer.ShuffleLayer

Derived From C++ Class nvinfer1::ShuffleLayer

Layer type for shuffling data.

This class shuffles data by applying in sequence: a transpose operation, a reshape operation and a second transpose operation. The dimension types of the output are those of the reshape dimension.

C++ includes: NvInfer.h

get_first_transpose()

getFirstTranspose() const =0 -> Permutation

Get the permutation applied by the first transpose operation.

Returns:
  • The dimension permutation applied before the reshape.
get_reshape_dimensions()

getReshapeDimensions() const =0 -> Dims

Get the reshaped dimensions.

Returns:The reshaped dimensions.
get_second_transpose()

getSecondTranspose() const =0 -> Permutation

Get the permutation applied by the second transpose operation.

Returns:
  • The dimension permutation applied after the reshape.
set_first_transpose()

setFirstTranspose(Permutation permutation)=0

Set the permutation applied by the first transpose operation.

Parameters:
  • permutation (*) – The dimension permutation applied before the reshape.

The default is the identity permutation.

set_reshape_dimensions()

setReshapeDimensions(Dims dimensions)=0

Set the reshaped dimensions.

Parameters:
  • dimensions (*) – The reshaped dimensions.

Two special values can be used as dimensions.

Value 0 copies the corresponding dimension from the input. This special value can be used more than once in the dimensions. If the number of reshape dimensions is less than the number of input dimensions, the 0s are resolved by aligning them with the most significant dimensions of the input.

Value -1 infers that particular dimension by looking at the input and the rest of the reshape dimensions. Note that at most one dimension may be specified as -1.

The product of the new dimensions must be equal to the product of the old.
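
A quick NumPy check of how the special values resolve (names are illustrative):

  import numpy as np

  x = np.zeros((3, 32, 32), dtype=np.float32)     # input dims, e.g. CHW

  # Reshape dims (0, -1): 0 copies the input's first dimension and -1 is
  # inferred, resolving to (3, 1024).
  assert x.reshape(3, -1).shape == (3, 1024)

  # Reshape dims (0, 0, 16, 2) resolve to (3, 32, 16, 2); the product
  # 3*32*16*2 equals the input product 3*32*32, as required.
  assert x.reshape(3, 32, 16, 2).shape == (3, 32, 16, 2)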

set_second_transpose()

setSecondTranspose(Permutation permutation)=0

Set the permutation applied by the second transpose operation.

Parameters:
  • permutation (*) – The dimension permutation applied after the reshape.

The default is the identity permutation.

The permutation is applied as outputDimensionIndex = permutation.order[inputDimensionIndex], so to permute from CHW order to HWC order, the required permutation is [1, 2, 0].

Permutation

class tensorrt.infer.Permutation

Derived From C++ Class nvinfer1::Permutation

order

int – The elements of the permutation. The permutation is applied as outputDimensionIndex = permutation.order[inputDimensionIndex], so to permute from CHW order to HWC order, the required permutation is [1, 2, 0], and to permute from HWC to CHW, the required permutation is [2, 0, 1].
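
The documented CHW/HWC examples match the semantics of NumPy's transpose axes argument, so they can be checked directly (names are illustrative):

  import numpy as np

  chw = np.zeros((3, 224, 224), dtype=np.float32)   # C, H, W
  hwc = chw.transpose(1, 2, 0)                      # order [1, 2, 0]
  assert hwc.shape == (224, 224, 3)

  back = hwc.transpose(2, 0, 1)                     # order [2, 0, 1]
  assert back.shape == (3, 224, 224)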

UnaryLayer

class tensorrt.infer.UnaryLayer

Derived From C++ Class nvinfer1::UnaryLayer

Layer that represents a unary operation.

C++ includes: NvInfer.h

get_operation()

getOperation() const =0 -> UnaryOperation

Get the unary operation for the layer.

set_operation()

setOperation(UnaryOperation op)=0

Set the unary operation for the layer.

UnaryOperation

class tensorrt.infer.UnaryOperation

Derived From C++ Class nvinfer1::UnaryOperation

Type of operation for the layer

Base Class:
IntEnum

PluginLayer

class tensorrt.infer.PluginLayer

Derived From C++ Class nvinfer1::PluginLayer

Layer type for plugins.

C++ includes: NvInfer.h

get_plugin()

getPlugin()=0 -> IPlugin &

Get the plugin for the layer.

PaddingLayer

class tensorrt.infer.PaddingLayer

Derived From C++ Class nvinfer1::PaddingLayer

Layer that represents a padding operation.

The padding layer adds zero-padding at the start and end of the input tensor. It only supports padding along the two innermost dimensions. Applying negative padding results in cropping of the input.

C++ includes: NvInfer.h

get_post_padding()

getPostPadding() const =0 -> DimsHW

Get the padding that is applied at the end of the tensor.

get_pre_padding()

getPrePadding() const =0 -> DimsHW

Get the padding that is applied at the start of the tensor.

set_post_padding()

setPostPadding(DimsHW padding)=0

Set the padding that is applied at the end of the tensor.

Negative padding results in trimming the edge by the specified amount.

set_pre_padding()

setPrePadding(DimsHW padding)=0

Set the padding that is applied at the start of the tensor.

Negative padding results in trimming the edge by the specified amount.
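
A NumPy sketch of the effect along the two innermost dimensions (names and the NCHW layout are illustrative): positive pre/post padding adds zeros, negative padding crops.

  import numpy as np

  x = np.ones((1, 3, 8, 8), dtype=np.float32)

  # Pre-padding (1, 2) and post-padding (1, 2) pad H by 1+1 and W by 2+2,
  # giving shape (1, 3, 10, 12).
  padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (2, 2)))
  assert padded.shape == (1, 3, 10, 12)

  # Negative padding trims instead: pre (-1, 0) and post (-1, 0) remove one
  # row from the top and one from the bottom, giving (1, 3, 6, 8).
  cropped = x[:, :, 1:-1, :]
  assert cropped.shape == (1, 3, 6, 8)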

RNNLayer

class tensorrt.infer.RNNLayer

Derived From C++ Class nvinfer1::RNNLayer

An RNN layer in a network definition.

This layer applies an RNN operation on the inputs.

Deprecated: This interface is superseded by IRNNv2Layer.

C++ includes: NvInfer.h

get_bias()

getBias() const =0 -> Weights

Get the bias parameter vector for the RNN.

get_cell_state()

getCellState() const =0 -> ITensor *

Get the initial cell state of the RNN.

Returns: None if no initial cell tensor was specified, the initial cell data otherwise.
get_data_length()

getDataLength() const =0 -> int

Get the length of the data being processed by the RNN for use in computing other values.

get_direction()

getDirection() const =0 -> RNNDirection

Get the direction of the RNN layer.

get_hidden_size()

getHiddenSize() const =0 -> std::size_t

Get the size of the hidden layers.

The hidden size is the value of the hiddenSize parameter passed into addRNN().

Returns:
  • The internal hidden layer size for the RNN.
get_hidden_state()

getHiddenState() const =0 -> ITensor *

Get the initial hidden state of the RNN.

Returns: None if no initial hidden tensor was specified, the initial hidden data otherwise.
get_input_mode()

getInputMode() const =0 -> RNNInputMode

Get the input mode of the RNN layer.

get_layer_count()

getLayerCount() const =0 -> unsigned

Get the number of layers in the RNN.

Returns:The number of layers in the RNN.
get_operation()

getOperation() const =0 -> RNNOperation

Get the operation of the RNN layer.

get_seq_length()

getSeqLength() const =0 -> int

Get the sequence length.

The sequence length is the maximum number of time steps passed into the addRNN() function. This is also the maximum number of input tensors that the RNN can process at once.

Returns:The maximum number of time steps that can be executed by a single call to the RNN layer.
get_weights()

getWeights() const =0 -> Weights

Get the W weights for the RNN.

set_bias()

setBias(Weights bias)=0

Set the bias parameters for the RNN.

Parameters:
  • bias (*) – The weight structure holding the bias parameters.

The trained weights for the bias parameter vectors of the RNN. The DataType for this structure must be kFLOAT or kHALF, and must be the same datatype as the input tensor.

The layout of the weight structure depends on the RNNOperation, RNNInputMode, and RNNDirection of the layer. The array specified by weights.values contains a sequence of bias vectors, where each bias vector is linearly appended after the previous without padding; e.g. if bias vector 0 and 1 have M and N elements respectively, then the layout of weights.values in memory looks like:

index | 0 1 2 3 4 ...  M-2 M-1 | M M+1  ... M+N-2 M+N-1 | M+N M+N+1 M+N+2 ...   | ...
data  |--   bias vector 0    --|--   bias vector 1    --|--   bias vector 2   --| ...

The ordering of bias vectors is similar to the ordering of weight matrices as described in setWeights(). To determine the order of bias vectors for a given RNN configuration, determine the ordered list of weight matrices [ W0, W1, …, Wn ]. Then replace each weight matrix with its corresponding bias vector, i.e. apply the following transform (for layer l, gate g):

  • Wl[g] becomes Wbl[g]
  • Rl[g] becomes Rbl[g]

For example:

  • an RNN with getLayerCount() == 3, getDirection() == kUNIDIRECTION, and getOperation() == kRELU has the following order:

    [ Wb0[i], Rb0[i], Wb1[i], Rb1[i], Wb2[i], Rb2[i] ]

  • an RNN with getLayerCount() == 2, getDirection() == kUNIDIRECTION, and getOperation() == kGRU has the following order:

    [ Wb0[z], Wb0[r], Wb0[h], Rb0[z], Rb0[r], Rb0[h], Wb1[z], Wb1[r], Wb1[h], Rb1[z], Rb1[r], Rb1[h] ]

  • an RNN with getLayerCount() == 2, getDirection() == kBIDIRECTION, and getOperation() == kRELU has the following order:

    [ Wb0_fw[i], Rb0_fw[i], Wb0_bw[i], Rb0_bw[i], Wb1_fw[i], Rb1_fw[i], Wb1_bw[i], Rb1_bw[i] ]

    (fw = “forward”, bw = “backward”)

Each bias vector has a fixed size, getHiddenSize().
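
A sketch of the resulting flat layout for the 2-layer unidirectional GRU example above (names are illustrative; every bias vector holds getHiddenSize() values):

  import numpy as np

  hidden_size = 256
  bias_order = ["Wb0[z]", "Wb0[r]", "Wb0[h]", "Rb0[z]", "Rb0[r]", "Rb0[h]",
                "Wb1[z]", "Wb1[r]", "Wb1[h]", "Rb1[z]", "Rb1[r]", "Rb1[h]"]

  # Each vector occupies hidden_size contiguous floats, appended in order,
  # so the buffer passed as weights.values has this many elements:
  total = len(bias_order) * hidden_size                  # 12 * 256 == 3072
  offsets = {name: i * hidden_size for i, name in enumerate(bias_order)}
  # e.g. offsets["Rb0[z]"] == 3 * 256 == 768
  flat = np.zeros(total, dtype=np.float32)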

set_cell_state()

setCellState(ITensor &cell)=0

Set the initial cell state of the RNN with the provided cell ITensor.

Parameters:
  • cell (*) – The initial cell state of the RNN.

The layout for cell is a linear layout of a 3D matrix:

  • C - The number of layers in the RNN, it must match getLayerCount().
  • H - The number of mini-batches for each time sequence.
  • W - The size of the per layer hidden states, it must match getHiddenSize().

If cell is not specified, then the initial cell state is set to zero.

The amount of space required is doubled if getDirection() is kBIDIRECTION with the bidirectional states coming after the unidirectional states.

The cell state only affects LSTM RNNs.

set_direction()

setDirection(RNNDirection op)=0

Set the direction of the RNN layer.

The direction determines whether the RNN runs unidirectionally (left to right) or bidirectionally (left to right and right to left). In the kBIDIRECTION case the outputs are concatenated, resulting in an output size of 2x getHiddenSize().

set_hidden_state()

setHiddenState(ITensor &hidden)=0

Set the initial hidden state of the RNN with the provided hidden ITensor.

Parameters:
  • hidden (*) – The initial hidden state of the RNN.

The layout for hidden is a linear layout of a 3D matrix:

  • C - The number of layers in the RNN, it must match getLayerCount()
  • H - The number of mini-batches for each time sequence.
  • W - The size of the per layer hidden states, it must match getHiddenSize()

The amount of space required is doubled if getDirection() is kBIDIRECTION with the bidirectional states coming after the unidirectional states.

If hidden is not specified, then the initial hidden state is set to zero.

set_input_mode()

setInputMode(RNNInputMode op)=0

Set the input mode of the RNN layer.

set_operation()

setOperation(RNNOperation op)=0

Set the operation of the RNN layer.

set_weights()

setWeights(Weights weights)=0

Set the weight parameters for the RNN.

Parameters:
  • weights (*) – The weight structure holding the weight parameters.

The trained weights for the weight parameter matrices of the RNN. The DataType for this structure must be kFLOAT or kHALF, and must be the same datatype as the input tensor.

The layout of the weight structure depends on the RNNOperation, RNNInputMode, and RNNDirection of the layer. The array specified by weights.values contains a sequence of parameter matrices, where each parameter matrix is linearly appended after the previous without padding; e.g., if parameter matrix 0 and 1 have M and N elements respectively, then the layout of weights.values in memory looks like:

index | 0 1 2 3 4 ...  M-2 M-1 | M M+1  ... M+N-2 M+N-1 | M+N M+N+1 M+N+2 ...    | ...
data  |-- parameter matrix 0 --|-- parameter matrix 1 --|-- parameter matrix 2 --| ...

The following paragraphs describe the order of weight matrices and the layout of elements within a weight matrix.

Order of weight matrices: the parameter matrices are ordered as described below.

For example:

  • an RNN with getLayerCount() == 3, getDirection() == kUNIDIRECTION, and getOperation() == kRELU has the following order:

    [ W0[i], R0[i], W1[i], R1[i], W2[i], R2[i] ]

  • an RNN with getLayerCount() == 2, getDirection() == kUNIDIRECTION, and getOperation() == kGRU has the following order:

    [ W0[z], W0[r], W0[h], R0[z], R0[r], R0[h], W1[z], W1[r], W1[h], R1[z], R1[r], R1[h] ]

  • an RNN with getLayerCount() == 2, getDirection() == kBIDIRECTION, and getOperation() == kRELU has the following order:

    [ W0_fw[i], R0_fw[i], W0_bw[i], R0_bw[i], W1_fw[i], R1_fw[i], W1_bw[i], R1_bw[i] ]

    (fw = “forward”, bw = “backward”)

Layout of elements within a weight matrix: each parameter matrix is row-major in memory.

The input weights of the first layer of the RNN (if not skipped) transform a getDataLength()-size column vector into a getHiddenSize()-size column vector. The input weights of subsequent layers transform a K*getHiddenSize()-size column vector into a getHiddenSize()-size column vector. K=2 in the bidirectional case to account for the full hidden state being the concatenation of the forward and backward RNN hidden states.

The recurrent weight matrices for all layers all have shape (H, H), both in the unidirectional and bidirectional cases. (In the bidirectional case, each recurrent weight matrix for the (forward or backward) RNN cell operates on the previous (forward or backward) RNN cell’s hidden state, which is size H).
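
A sketch of the matrix shapes implied by the description above (H = getHiddenSize(), E = getDataLength(), K = 2 if bidirectional else 1; each matrix is row-major and the matrices are appended back to back in weights.values):

  import numpy as np

  H, E = 256, 128
  K = 1                                          # 2 for kBIDIRECTION

  W0 = np.zeros((H, E), dtype=np.float32)        # layer 0 input weights
  W1 = np.zeros((H, K * H), dtype=np.float32)    # input weights, later layers
  R0 = np.zeros((H, H), dtype=np.float32)        # recurrent weights (all layers)
  R1 = np.zeros((H, H), dtype=np.float32)

  # For a 2-layer unidirectional kRELU RNN the documented order is
  # [ W0[i], R0[i], W1[i], R1[i] ], concatenated row-major:
  flat = np.concatenate([m.ravel() for m in (W0, R0, W1, R1)])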

RNNOperation

class tensorrt.infer.RNNOperation

Derived From C++ Class nvinfer1::RNNOperation

Type of operation for the layer

Base Class:
IntEnum

RNNDirection

class tensorrt.infer.RNNDirection

Derived From C++ Class nvinfer1::RNNDirection

Direction for the RNN Layer

Base Class:
IntEnum

RNNInputMode

class tensorrt.infer.RNNInputMode

Derived From C++ Class nvinfer1::RNNInputMode

Input mode for RNN Layer

Base Class:
IntEnum

Int8 Calibration

Int8Calibrator

class tensorrt.infer.Int8Calibrator

Derived From C++ Class nvinfer1::Int8Calibrator

Application-implemented interface for calibration.

Calibration is a step performed by the builder when deciding suitable scale factors for 8-bit inference.

The calibrator must provide a method for retrieving representative images which the calibration process can use to examine the distribution of activations. It may optionally implement a method for caching the calibration result for reuse on subsequent runs.

C++ includes: NvInfer.h

get_algorithm()

getAlgorithm()=0 -> CalibrationAlgoType

Get the algorithm used by this calibrator.

Returns:The algorithm used by the calibrator.
get_batch()

getBatch(void *bindings[], const char *names[], int nbBindings)=0 -> bool

Get a batch of input for calibration.

The batch size of the input must match the batch size returned by getBatchSize().

Parameters:
  • bindings (*) – An array of pointers to device memory that must be set to the memory containing each network input data.
  • names (*) – The names of the network input for each pointer in the binding array.
  • nbBindings (*) – The number of pointers in the bindings array.
Returns:

  • False if there are no more batches for calibration.

get_batch_size()

getBatchSize() const =0 -> int

Get the batch size used for calibration batches.

Returns:The batch size.
read_calibration_cache()

readCalibrationCache(std::size_t &length)=0 -> const void *

Load a calibration cache.

Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

Parameters:length (*) – The length of the cached data, which should be set by the called function. If there is no data, this should be zero.
Returns:A pointer to the cache, or None if there is no data.
write_calibration_cache()

writeCalibrationCache(const void *ptr, std::size_t length)=0

Save a calibration cache.

Parameters:
  • ptr (*) – A pointer to the data to cache.
  • length (*) – The length in bytes of the data to cache.

Int8EntropyCalibrator

class tensorrt.infer.Int8EntropyCalibrator

Derived From C++ Class nvinfer1::Int8EntropyCalibrator

Entropy calibrator. This is the preferred calibrator, as it is less complicated than the legacy calibrator and produces better results.

C++ includes: NvInfer.h
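
A minimal application-side calibrator sketch. It assumes the Python bindings allow subclassing this entropy calibrator and overriding the snake_case methods documented for Int8Calibrator above; the exact base class, method signatures and return conventions vary between releases and are assumptions here, as is the use of PyCUDA for the device buffer.

  import os
  import numpy as np
  import pycuda.driver as cuda
  import pycuda.autoinit                    # creates a CUDA context
  import tensorrt as trt

  class MyCalibrator(trt.infer.Int8EntropyCalibrator):       # assumed base class
      def __init__(self, batches, cache_path="calibration.cache"):
          trt.infer.Int8EntropyCalibrator.__init__(self)
          self.batches = iter(batches)      # list of NCHW NumPy arrays
          self.cache_path = cache_path
          first = batches[0]
          self.batch_size = first.shape[0]
          self.device_input = cuda.mem_alloc(first.nbytes)

      def get_batch_size(self):
          return self.batch_size

      def get_batch(self, bindings, names):                  # assumed signature
          # Copy the next host batch to device memory, point the first binding
          # at it and return the bindings; None signals no more batches.
          try:
              batch = next(self.batches)
          except StopIteration:
              return None
          cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
          bindings[0] = int(self.device_input)
          return bindings

      def read_calibration_cache(self, length=None):         # assumed signature
          if os.path.exists(self.cache_path):
              with open(self.cache_path, "rb") as f:
                  return f.read()
          return None

      def write_calibration_cache(self, ptr, size=None):     # assumed signature
          with open(self.cache_path, "wb") as f:
              f.write(ptr)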

Int8LegacyCalibrator

class tensorrt.infer.Int8LegacyCalibrator

Derived From C++ Class nvinfer1::Int8LegacyCalibrator

Deprecated: Legacy calibrator for compatibility with 2.0 EA. Will be removed in 2.2.

C++ includes: NvInfer.h

get_algorithm()

getAlgorithm() -> CalibrationAlgoType

Signal that this is the legacy calibrator.

get_quantile()

getQuantile() const =0 -> double

The quantile (between 0 and 1) that will be used to select the region maximum when the quantile method is in use.

See the user guide for more details on how the quantile is used.

get_regression_cutoff()

getRegressionCutoff() const =0 -> double

The fraction (between 0 and 1) of the maximum used to define the regression cutoff when using regression to determine the region maximum.

See the user guide for more details on how the regression cutoff is used.

read_histogram_cache()

readHistogramCache(std::size_t &length)=0 -> const void *

Load a histogram.

Histogram generation is potentially expensive, so it can be useful to generate the histograms once, then use them when exploring the space of calibrations. The histograms should be regenerated if the network structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.

Parameters:length (*) – The length of the cached data, which should be set by the called function. If there is no data, this should be zero.
Returns:A pointer to the cache, or None if there is no data.
write_histogram_cache()

writeHistogramCache(const void *ptr, std::size_t length)=0

Save a histogram cache.

Parameters:
  • ptr (*) – A pointer to the data to cache.
  • length (*) – The length in bytes of the data to cache.

CalibrationAlgoType

class tensorrt.infer.CalibrationAlgoType

Derived From C++ Class nvinfer1::CalibrationAlgoType

Type of int8 calibration algorithm

Base Class:
IntEnum

Logger

Logger

class tensorrt.infer.Logger

Derived From C++ Class nvinfer1::Logger

Application-implemented logging interface for the builder, engine and runtime.

Note that although a logger is passed on creation to each instance of an IBuilder or IRuntime interface, the logger is internally considered a singleton, and thus multiple instances of IRuntime and/or IBuilder must all use the same logger.

C++ includes: NvInfer.h

log()

log(Severity severity, const char *msg)=0

A callback implemented by the application to handle logging messages.

Parameters:
  • severity (*) – The severity of the message.
  • msg (*) – The log message, null-terminated.
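
A minimal application-side logger sketch, assuming the Python bindings allow subclassing this class and overriding log() with the (severity, msg) arguments documented above:

  import tensorrt as trt

  class PrintLogger(trt.infer.Logger):          # assumed subclassing pattern
      def __init__(self):
          trt.infer.Logger.__init__(self)

      def log(self, severity, msg):
          # severity is a LogSeverity value; msg is the message text.
          print("[TensorRT][{}] {}".format(severity, msg))

Per the note above, a single logger instance should be shared by every IBuilder and IRuntime the application creates.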

ConsoleLogger

class tensorrt.infer.ConsoleLogger

Derived From C++ Class nvinfer1::ConsoleLogger

log()

log(Severity severity, const char *msg)=0

A callback implemented by the application to handle logging messages.

Parameters:
  • severity (*) – The severity of the message.
  • msg (*) – The log message, null-terminated.

LogSeverity

class tensorrt.infer.LogSeverity

Derived From C++ Class nvinfer1::LogSeverity

Log level specifier

Base Class:
IntEnum

Profiler

Profiler

class tensorrt.infer.Profiler

Derived From C++ Class nvinfer1::Profiler

Application-implemented interface for profiling.

When this class is added to an execution context, the profiler will be called once per layer for each invocation of execute(). Note that enqueue() does not currently support profiling.

The profiler will only be called after execution is complete. It has a small impact on execution time.

C++ includes: NvInfer.h

report_layer_time()

reportLayerTime(const char *layerName, float ms)=0

Layer time reporting callback.

Parameters:
  • layerName (*) – The name of the layer, set when constructing the network definition.
  • ms (*) – The time in milliseconds to execute the layer.
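
A minimal profiler sketch, assuming the Python bindings allow subclassing this class and overriding report_layer_time() with the (layerName, ms) arguments documented above:

  import tensorrt as trt

  class LayerTimeProfiler(trt.infer.Profiler):  # assumed subclassing pattern
      def __init__(self):
          trt.infer.Profiler.__init__(self)
          self.layer_times = {}

      def report_layer_time(self, layer_name, ms):
          # Accumulate per-layer execution time across execute() calls.
          self.layer_times[layer_name] = self.layer_times.get(layer_name, 0.0) + ms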

ConsoleProfiler

class tensorrt.infer.ConsoleProfiler

Derived From C++ Class nvinfer1::ConsoleProfiler

mProfile
report_layer_time()

reportLayerTime(const char *layerName, float ms)=0

Layer time reporting callback.

Parameters:
  • layerName (*) – The name of the layer, set when constructing the network definition.
  • ms (*) – The time in milliseconds to execute the layer.
timing_iterations