tensorrt.infer
The infer package contains an interface for libnvinfer. This module is used for graph definition, engine building, and inference execution.
Foundational Types
DataType

class tensorrt.infer.DataType
Available data types.
Base Class: IntEnum
Derived From C++ Class nvinfer1::DataType
Weights

class tensorrt.infer.Weights
An array of weights used as a layer parameter.
The weights are held by reference until the engine has been built. Therefore, the data referenced by the values field must be preserved until the build is complete.

type
DataType – The type of the weights.

values
const void * – The weight values, in a contiguous array.

count
int64_t – The number of weights in the array.

C++ includes: NvInfer.h

Derived From C++ Class nvinfer1::Weights
Dims

class tensorrt.infer.Dims
Structure to define the dimensions of a tensor.
Note: Currently the following formats are supported for layer inputs and outputs:
 zero or more index dimensions followed by one channel and two spatial dimensions (e.g. CHW)
 one time series dimension followed by one index dimension followed by one channel dimension (i.e. TNC)

MAX_DIMS
const int – The maximum number of dimensions supported for a tensor.

nbDims
int – The number of dimensions.

d
int – The extent of each dimension.

type
DimensionType – The type of each dimension.

C++ includes: NvInfer.h
Derived From C++ Class nvinfer1::Dims
DimsHW
DimsCHW

class tensorrt.infer.DimsCHW
Descriptor for data with one channel dimension and two spatial dimensions.
C++ includes: NvInfer.h

c()
c() const -> int
Get the channel count.
Returns: The channel count.

h()
h() const -> int
Get the height.
Returns: The height.

w()
w() const -> int
Get the width.
Returns: The width.

Derived From C++ Class nvinfer1::DimsCHW
DimsNCHW

class tensorrt.infer.DimsNCHW
Descriptor for data with one index dimension, one channel dimension and two spatial dimensions.
C++ includes: NvInfer.h

c()
c() const -> int
Get the channel count.
Returns: The channel count.

h()
h() const -> int
Get the height.
Returns: The height.

n()
n() const -> int
Get the index count.
Returns: The index count.

w()
w() const -> int
Get the width.
Returns: The width.

Derived From C++ Class nvinfer1::DimsNCHW
Engine and Inference
Builder

class tensorrt.infer.Builder
Builds an engine from a network definition.
C++ includes: NvInfer.h

build_cuda_engine()
buildCudaEngine(nvinfer1::INetworkDefinition &network)=0 -> nvinfer1::ICudaEngine *
Build a CUDA engine from a network definition.

create_network()
createNetwork()=0 -> nvinfer1::INetworkDefinition *
Create a network definition object.

destroy()
destroy()=0
Destroy this object.

get_average_find_iterations()
getAverageFindIterations() const =0 -> int
Query the number of averaging iterations.

get_debug_sync()
getDebugSync() const =0 -> bool
Query whether the builder will use debug synchronization.

get_fp16_mode()
getFp16Mode() const =0 -> bool
Query whether 16-bit kernels are permitted.

get_half2_mode()
getHalf2Mode() const =0 -> bool
Query whether half2 mode is used.
Deprecated: This function is superseded by getFp16Mode.

get_int8_mode()
getInt8Mode() const =0 -> bool
Query whether Int8 mode is used.

get_max_batch_size()
getMaxBatchSize() const =0 -> int
Get the maximum batch size.
Returns: The maximum batch size.

get_max_workspace_size()
getMaxWorkspaceSize() const =0 -> std::size_t
Get the maximum workspace size.
Returns: The maximum workspace size.

get_min_find_iterations()
getMinFindIterations() const =0 -> int
Query the number of minimization iterations.

platform_has_fast_fp16()
platformHasFastFp16() const =0 -> bool
Determine whether the platform has fast native fp16.

platform_has_fast_int8()
platformHasFastInt8() const =0 -> bool
Determine whether the platform has fast native int8.

set_average_find_iterations()
setAverageFindIterations(int avgFind)=0
Set the number of averaging iterations used when timing layers.
When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in averaging.

set_debug_sync()
setDebugSync(bool sync)=0
Set whether the builder should use debug synchronization.
If this flag is true, the builder will synchronize after timing each layer and report the layer name. This can be useful when diagnosing issues at build time.

set_fp16_mode()
setFp16Mode(bool mode)=0
Set whether or not 16-bit kernels are permitted.
During engine build, fp16 kernels will also be tried when this mode is enabled.
Parameters: mode (*) – Whether 16-bit kernels are permitted.

set_gpu_allocator()
setGpuAllocator(IGpuAllocator *allocator)=0
Set the GPU allocator.
Parameters: allocator (*) – The GPU allocator to be used by the builder. All GPU memory acquired will use this allocator. If None is passed, the default allocator will be used.

set_half2_mode()
setHalf2Mode(bool mode)=0
Set whether half2 mode is used.
half2 mode is a paired-image mode that is significantly faster for batch sizes greater than one on platforms with fp16 support.
Deprecated: This function is superseded by setFp16Mode.
Parameters: mode (*) – Whether half2 mode is used.

set_int8_calibrator()
setInt8Calibrator(IInt8Calibrator *calibrator)=0
Set the Int8 calibration interface.

set_int8_mode()
setInt8Mode(bool mode)=0
Set whether Int8 mode is used.

set_max_batch_size()
setMaxBatchSize(int batchSize)=0
Set the maximum batch size.
Parameters: batchSize (*) – The maximum batch size which can be used at execution time, and also the batch size for which the engine will be optimized.

set_max_workspace_size()
setMaxWorkspaceSize(std::size_t workspaceSize)=0
Set the maximum workspace size.
Parameters: workspaceSize (*) – The maximum GPU temporary memory which the engine can use at execution time.

set_min_find_iterations()
setMinFindIterations(int minFind)=0
Set the number of minimization iterations used when timing layers.
When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in minimization.
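
As a rough illustration of how the averaging and minimization iteration counts interact, the sketch below takes the minimum over `min_find` averages, each computed from `avg_find` timings. It is plain Python, not TensorRT code: `pick_layer_time` and the sample timings are hypothetical, and the builder's actual timing procedure may differ in detail.

```python
def pick_layer_time(timings, min_find, avg_find):
    """Minimum over `min_find` averages, each taken over `avg_find` timings.

    `timings` is a flat list of per-run times (hypothetical measurements).
    """
    if len(timings) < min_find * avg_find:
        raise ValueError("not enough timings for the requested iteration counts")
    averages = []
    for i in range(min_find):
        chunk = timings[i * avg_find:(i + 1) * avg_find]
        averages.append(sum(chunk) / avg_find)
    return min(averages)


# Two minimization iterations, each averaging two runs.
sample = [4.0, 2.0, 1.0, 2.0]
print(pick_layer_time(sample, min_find=2, avg_find=2))
```

Averaging smooths out noise within a group of runs, while minimizing across groups discards groups that were slowed by transient interference.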

Derived From C++ Class nvinfer1::Builder
CudaEngine

class tensorrt.infer.CudaEngine
An engine for executing inference on a built network.
C++ includes: NvInfer.h

binding_is_input()
bindingIsInput(int bindingIndex) const =0 -> bool
Determine whether a binding is an input binding.
Parameters: bindingIndex (*) – The binding index.
Returns: True if the index corresponds to an input binding and the index is in range.

create_execution_context()
createExecutionContext()=0 -> IExecutionContext *
Create an execution context.

create_execution_context_without_device_memory()
createExecutionContextWithoutDeviceMemory()=0 -> IExecutionContext *
Create an execution context without any device memory allocated.
The memory for execution of this device context must be supplied by the application.

destroy()
destroy()=0
Destroy this object.

get_binding_data_type()
getBindingDataType(int bindingIndex) const =0 -> DataType
Determine the required data type for a buffer from its binding index.
Parameters: bindingIndex (*) – The binding index.
Returns: The type of the data in the buffer.

get_binding_dimensions()
getBindingDimensions(int bindingIndex) const =0 -> Dims
Get the dimensions of a binding.
Parameters: bindingIndex (*) – The binding index.
Returns: The dimensions of the binding if the index is in range, otherwise (0,0,0).

get_binding_index()
getBindingIndex(const char *name) const =0 -> int
Retrieve the binding index for a named tensor.
IExecutionContext::enqueue() and IExecutionContext::execute() require an array of buffers.
Engine bindings map from tensor names to indices in this array. Binding indices are assigned at engine build time, and take values in the range [0 … n-1] where n is the total number of inputs and outputs.
Parameters: name (*) – The tensor name.
Returns: The binding index for the named tensor, or -1 if the name is not found.
See also: getNbBindings(), getBindingIndex()
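
The name-to-index mapping can be pictured with a small stand-alone sketch. It does not call TensorRT: `order_buffers`, the tensor names and the integer "buffer handles" are hypothetical stand-ins, and the list `binding_names` plays the role of the engine's index-to-name table.

```python
def order_buffers(binding_names, buffers_by_name):
    """Return buffers ordered by binding index, as the bindings array expects."""

    def binding_index(name):
        # Mirrors the documented contract: -1 when the name is not found.
        return binding_names.index(name) if name in binding_names else -1

    bindings = [None] * len(binding_names)
    for name, buf in buffers_by_name.items():
        index = binding_index(name)
        if index == -1:
            raise KeyError("no binding named %r" % name)
        bindings[index] = buf
    return bindings


# Hypothetical engine with one input ("data") and one output ("prob").
names = ["data", "prob"]
print(order_buffers(names, {"prob": 2, "data": 1}))
```

Looking up each index by name, rather than assuming inputs come first, keeps the buffer array correct even if the engine assigns indices in a different order.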

get_binding_name()
getBindingName(int bindingIndex) const =0 -> const char *
Retrieve the name corresponding to a binding index.
This is the reverse mapping to that provided by getBindingIndex().
Parameters: bindingIndex (*) – The binding index.
Returns: The name corresponding to the index, or None if the index is out of range.

get_device_memory_size()
getDeviceMemorySize() const =0 -> size_t
Return the amount of device memory required by an execution context.

get_location()
getLocation(int bindingIndex) const =0 -> TensorLocation
Get the location of a binding.
This lets you know whether the binding should be a pointer to device or host memory.
Parameters: bindingIndex (*) – The binding index.
Returns: The location of the bound tensor with the given index.

get_max_batch_size()
getMaxBatchSize() const =0 -> int
Get the maximum batch size which can be used for inference.
Returns: The maximum batch size for this engine.

get_nb_bindings()
getNbBindings() const =0 -> int
Get the number of binding indices.

get_nb_layers()
getNbLayers() const =0 -> int
Get the number of layers in the network.
The number of layers in the network is not necessarily the number in the original network definition, as layers may be combined or eliminated as the engine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.
Returns: The number of layers in the network.

get_workspace_size()
getWorkspaceSize() const =0 -> std::size_t
Get the amount of workspace the engine uses.
The workspace size will be no greater than the value provided to the builder when the engine was built, and will typically be smaller. Workspace will be allocated for each execution context.

serialize()
serialize() const =0 -> IHostMemory *
Serialize the network to a stream.
Returns: An IHostMemory object that contains the serialized engine.
 The network may be deserialized with IRuntime::deserializeCudaEngine().

Derived From C++ Class nvinfer1::CudaEngine
ExecutionContext

class tensorrt.infer.ExecutionContext
Context for executing inference using an engine.
Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously.
C++ includes: NvInfer.h

destroy()
destroy()=0
Destroy this object.

enqueue()
enqueue(int batchSize, void **bindings, cudaStream_t stream, cudaEvent_t *inputConsumed)=0 -> bool
Asynchronously execute inference on a batch.
This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().
Parameters: batchSize (*) – The batch size. This is at most the value supplied when the engine was built.
 bindings (*) – An array of pointers to input and output buffers for the network.
 stream (*) – A CUDA stream on which the inference kernels will be enqueued.
 inputConsumed (*) – An optional event which will be signaled when the input buffers can be refilled with new data.
Returns: True if the kernels were enqueued successfully.

execute()
execute(int batchSize, void **bindings)=0 -> bool
Synchronously execute inference on a batch.
This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().
Parameters: batchSize (*) – The batch size. This is at most the value supplied when the engine was built.
 bindings (*) – An array of pointers to input and output buffers for the network.
Returns: True if execution succeeded.

get_debug_sync()
getDebugSync() const =0 -> bool
Get the debug sync flag.

get_engine()
getEngine() const =0 -> const ICudaEngine &
Get the associated engine.

get_name()
getName() const =0 -> const char *
Return the name of the execution context.

get_profiler()
getProfiler() const =0 -> IProfiler *
Get the profiler.

set_debug_sync()
setDebugSync(bool sync)=0
Set the debug sync flag.
If this flag is set to true, the engine will log the successful execution for each kernel during execute(). It has no effect when using enqueue().

set_device_memory()
setDeviceMemory(void *memory)=0
Set the device memory for use by this execution context.
The memory must be aligned on a 256-byte boundary, and its size must be at least that returned by getDeviceMemorySize(). If using enqueue() to run the network, the memory is in use from the invocation of enqueue() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.
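
The alignment and size requirements can be checked with simple integer arithmetic. The following is a minimal sketch in plain Python, assuming addresses are available as integers; `align_up`, `check_device_memory` and the sample values are hypothetical helpers, not part of this module.

```python
ALIGNMENT = 256  # bytes, per the requirement above


def align_up(address, alignment=ALIGNMENT):
    """Round `address` up to the next multiple of `alignment`."""
    return (address + alignment - 1) // alignment * alignment


def check_device_memory(address, size, required_size, alignment=ALIGNMENT):
    """Return True if the block is aligned and at least `required_size` bytes."""
    return address % alignment == 0 and size >= required_size


# Over-allocate by one alignment unit so an aligned start always fits.
required = 1000        # stand-in for the value of get_device_memory_size()
raw_address = 0x1001   # hypothetical, unaligned allocation
raw_size = required + ALIGNMENT
aligned = align_up(raw_address)
usable = raw_size - (aligned - raw_address)
print(hex(aligned), check_device_memory(aligned, usable, required))
```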

set_name()
setName(const char *name)=0
Set the name of the execution context.
This method copies the name string.

set_profiler()
setProfiler(IProfiler *)=0
Set the profiler.

Derived From C++ Class nvinfer1::ExecutionContext
Runtime

class tensorrt.infer.Runtime
Allows a serialized engine to be deserialized.
C++ includes: NvInfer.h

deserialize_cuda_engine()
deserializeCudaEngine(const void *blob, std::size_t size, IPluginFactory *pluginFactory)=0 -> nvinfer1::ICudaEngine *
Deserialize an engine from a stream.
Parameters: blob (*) – The memory that holds the serialized engine.
 size (*) – The size of the memory.
 pluginFactory (*) – The plugin factory, if any plugins are used by the network, otherwise None.
Returns: The engine, or None if it could not be deserialized.

destroy()
destroy()=0
Destroy this object.

set_gpu_allocator()
setGpuAllocator(IGpuAllocator *allocator)=0
Set the GPU allocator.
Parameters: allocator (*) – The GPU allocator to be used by the runtime. All GPU memory acquired will use this allocator. If None is passed, the default allocator will be used.

Derived From C++ Class nvinfer1::Runtime
HostMemory

class tensorrt.infer.HostMemory
Class to handle library-allocated memory that is accessible to the user.
The memory allocated via the host memory object is owned by the library and will be deallocated when the destroy method is called.
C++ includes: NvInfer.h

data()
data() const =0 -> void *
A pointer to the raw data that is owned by the library.

destroy()
destroy()=0
Destroy the allocated memory.

size()
size() const =0 -> std::size_t
The size in bytes of the data that was allocated.

type()
type() const =0 -> DataType
The type of the memory that was allocated.

Derived From C++ Class nvinfer1::HostMemory
Graph Definition
NetworkDefinition

class tensorrt.infer.NetworkDefinition
A network definition for input to the builder.
C++ includes: NvInfer.h

add_activation()
addActivation(ITensor &input, ActivationType type)=0 -> IActivationLayer *
Add an activation layer to the network.
Parameters: input (*) – The input tensor to the layer.
 type (*) – The type of activation function to apply.
Returns: The new activation layer, or None if it could not be created.

add_concatenation()
addConcatenation(ITensor *const *inputs, int nbInputs)=0 -> IConcatenationLayer *
Add a concatenation layer to the network.
Parameters: inputs (*) – The input tensors to the layer.
 nbInputs (*) – The number of input tensors.
Returns: The new concatenation layer, or None if it could not be created.
Warning: All tensors must have the same dimensions in every dimension except the channel dimension.

add_constant()
addConstant(Dims dimensions, Weights weights)=0 -> IConstantLayer *
Add a constant layer to the network.
Parameters: dimensions (*) – The dimensions of the constant.
 weights (*) – The constant value, represented as weights.
Returns: The new constant layer, or None if it could not be created.

add_convolution()
addConvolution(ITensor &input, int nbOutputMaps, DimsHW kernelSize, Weights kernelWeights, Weights biasWeights)=0 -> IConvolutionLayer *
Add a convolution layer to the network.
Parameters: input (*) – The input tensor to the convolution.
 nbOutputMaps (*) – The number of output feature maps for the convolution.
 kernelSize (*) – The HW-dimensions of the convolution kernel.
 kernelWeights (*) – The kernel weights for the convolution.
 biasWeights (*) – The optional bias weights for the convolution.
Returns: The new convolution layer, or None if it could not be created.

add_deconvolution()
addDeconvolution(ITensor &input, int nbOutputMaps, DimsHW kernelSize, Weights kernelWeights, Weights biasWeights)=0 -> IDeconvolutionLayer *
Add a deconvolution layer to the network.
Parameters: input (*) – The input tensor to the layer.
 nbOutputMaps (*) – The number of output feature maps.
 kernelSize (*) – The HW-dimensions of the deconvolution kernel.
 kernelWeights (*) – The kernel weights for the deconvolution.
 biasWeights (*) – The optional bias weights for the deconvolution.
Returns: The new deconvolution layer, or None if it could not be created.

add_element_wise()
addElementWise(ITensor &input1, ITensor &input2, ElementWiseOperation op)=0 -> IElementWiseLayer *
Add an elementwise layer to the network.
Parameters: input1 (*) – The first input tensor to the layer.
 input2 (*) – The second input tensor to the layer.
 op (*) – The binary operation that the layer applies.
The input tensors must have the same number of dimensions. For each dimension, their lengths must match, or one of them must be one. In the latter case, the tensor is broadcast along that axis. The output tensor has the same number of dimensions as the inputs. For each dimension, its length is the maximum of the lengths of the corresponding input dimensions.
Returns: The new elementwise layer, or None if it could not be created.
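
The broadcasting rule for the two inputs can be written out as a short shape calculation. This is a stand-alone sketch operating on plain tuples; `elementwise_output_dims` is a hypothetical helper and does not call into TensorRT.

```python
def elementwise_output_dims(dims1, dims2):
    """Apply the broadcast rule: lengths must match or one of them must be 1."""
    if len(dims1) != len(dims2):
        raise ValueError("inputs must have the same number of dimensions")
    out = []
    for a, b in zip(dims1, dims2):
        if a != b and a != 1 and b != 1:
            raise ValueError("incompatible lengths %d and %d" % (a, b))
        # The output length is the larger of the two input lengths.
        out.append(max(a, b))
    return tuple(out)


print(elementwise_output_dims((3, 1, 5), (3, 4, 5)))  # (3, 4, 5)
```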

add_fully_connected()
addFullyConnected(ITensor &input, int nbOutputs, Weights kernelWeights, Weights biasWeights)=0 -> IFullyConnectedLayer *
Add a fully connected layer to the network.
Parameters: input (*) – The input tensor to the layer.
 nbOutputs (*) – The number of outputs of the layer.
 kernelWeights (*) – The kernel weights for the fully connected layer.
 biasWeights (*) – The optional bias weights for the fully connected layer.
Returns: The new fully connected layer, or None if it could not be created.

add_gather()
addGather(ITensor &data, ITensor &indices, int axis)=0 -> IGatherLayer *
Add a gather layer to the network.
Parameters: data (*) – The tensor to gather values from.
 indices (*) – The tensor to get indices from to populate the output tensor.
 axis (*) – The non-batch dimension axis in the data tensor to gather on.
Returns: The new gather layer, or None if it could not be created.

add_input()
addInput(const char *name, DataType type, Dims dimensions)=0 -> ITensor *
Add an input tensor to the network.
The name of the input tensor is used to find the index into the buffer array for an engine built from the network.
Parameters: name (*) – The name of the tensor.
 type (*) – The type of the data held in the tensor.
 dimensions (*) – The dimensions of the tensor.
Only DataType::kFLOAT, DataType::kHALF and DataType::kINT32 are valid input tensor types. The volume of the dimensions, including the maximum batch size, must be less than 2^30 elements.
Returns: The new tensor, or None if there is an error.
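
The volume limit on input tensors can be checked before the input is added. A minimal sketch in plain Python follows; `input_volume` and `fits_volume_limit` are hypothetical helpers, and the sketch assumes the volume is the product of the dimensions multiplied by the maximum batch size.

```python
from functools import reduce
from operator import mul

MAX_VOLUME = 2 ** 30  # the limit is exclusive: the volume must be below this


def input_volume(dims, max_batch_size):
    """Number of elements across all dimensions, including the batch."""
    return reduce(mul, dims, 1) * max_batch_size


def fits_volume_limit(dims, max_batch_size):
    return input_volume(dims, max_batch_size) < MAX_VOLUME


print(fits_volume_limit((3, 224, 224), 32))       # True
print(fits_volume_limit((1024, 1024, 1024), 1))   # False: exactly 2^30
```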

add_lrn()
addLRN(ITensor &input, int window, float alpha, float beta, float k)=0 -> ILRNLayer *
Add an LRN layer to the network.
Parameters: input (*) – The input tensor to the layer.
 window (*) – The size of the window.
 alpha (*) – The alpha value for the LRN computation.
 beta (*) – The beta value for the LRN computation.
 k (*) – The k value for the LRN computation.
Returns: The new LRN layer, or None if it could not be created.

add_matrix_multiply()
addMatrixMultiply(ITensor &input0, bool transpose0, ITensor &input1, bool transpose1)=0 -> IMatrixMultiplyLayer *
Add a MatrixMultiply layer to the network.
Parameters: input0 (*) – The first input tensor (commonly A).
 transpose0 (*) – If true, op(input0)=transpose(input0), else op(input0)=input0.
 input1 (*) – The second input tensor (commonly B).
 transpose1 (*) – If true, op(input1)=transpose(input1), else op(input1)=input1.
Returns: The new matrix multiply layer, or None if it could not be created.
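
The effect of the two transpose flags can be illustrated on small nested lists. The sketch below assumes the layer computes op(input0) * op(input1) for 2D matrices; `matrix_multiply` is a hypothetical pure-Python reference, not the layer's implementation, and it ignores batching.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matrix_multiply(a, transpose_a, b, transpose_b):
    """Compute op(a) * op(b), where op() optionally transposes its argument."""
    op_a = transpose(a) if transpose_a else a
    op_b = transpose(b) if transpose_b else b
    if len(op_a[0]) != len(op_b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*op_b)]
        for row in op_a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matrix_multiply(a, False, b, False))  # [[19, 22], [43, 50]]
print(matrix_multiply(a, True, b, False))   # [[26, 30], [38, 44]]
```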

add_padding()
addPadding(ITensor &input, DimsHW prePadding, DimsHW postPadding)=0 -> IPaddingLayer *
Add a padding layer to the network.
Parameters: input (*) – The input tensor to the layer.
 prePadding (*) – The padding to apply to the start of the tensor.
 postPadding (*) – The padding to apply to the end of the tensor.
Returns: The new padding layer, or None if it could not be created.

add_plugin()
addPlugin(ITensor *const *inputs, int nbInputs, IPlugin &plugin)=0 -> IPluginLayer *
Add a plugin layer to the network.
Parameters: inputs (*) – The input tensors to the layer.
 nbInputs (*) – The number of input tensors.
 plugin (*) – The layer plugin.
Returns: The new plugin layer, or None if it could not be created.

add_plugin_ext()
addPluginExt(ITensor *const *inputs, int nbInputs, IPluginExt &plugin)=0 -> IPluginLayer *
Add a plugin layer to the network using an IPluginExt interface.
Parameters: inputs (*) – The input tensors to the layer.
 nbInputs (*) – The number of input tensors.
 plugin (*) – The layer plugin.
Returns: The new plugin layer, or None if it could not be created.

add_pooling()
addPooling(ITensor &input, PoolingType type, DimsHW windowSize)=0 -> IPoolingLayer *
Add a pooling layer to the network.
Parameters: input (*) – The input tensor to the layer.
 type (*) – The type of pooling to apply.
 windowSize (*) – The size of the pooling window.
Returns: The new pooling layer, or None if it could not be created.

add_ragged_soft_max()
addRaggedSoftMax(ITensor &input, ITensor &bounds)=0 -> IRaggedSoftMaxLayer *
Add a RaggedSoftMax layer to the network.
Parameters: input (*) – The ZxS input tensor.
 bounds (*) – The Zx1 bounds tensor.
Returns: The new RaggedSoftMax layer, or None if it could not be created.

add_reduce()
addReduce(ITensor &input, ReduceOperation operation, uint32_t reduceAxes, bool keepDimensions)=0 -> IReduceLayer *
Add a reduce layer to the network.
Parameters: input (*) – The input tensor to the layer.
 operation (*) – The reduction operation to perform.
 reduceAxes (*) – The reduction dimensions, given as a bitmask. Bit 0 of the uint32_t value corresponds to non-batch dimension 0, bit 1 to non-batch dimension 1, and so on. If a bit is set, the corresponding dimension will be reduced. For example, for an NCHW input tensor (three non-batch dimensions), bit 0 corresponds to the C dimension, bit 1 to the H dimension, and bit 2 to the W dimension. Note that reduction is not permitted over the batch size dimension.
 keepDimensions (*) – The boolean that specifies whether or not to keep the reduced dimensions in the output of the layer.
Returns: The new reduce layer, or None if it could not be created.
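
Building the reduceAxes bitmask from a list of axis indices is a one-line bit operation per axis. The sketch below is plain Python with hypothetical helper names; it assumes that a kept reduced dimension appears with extent 1 in the output, which is the usual convention but is not stated in the text above.

```python
def reduce_axes_mask(axes):
    """Set bit i for every non-batch axis i that should be reduced."""
    mask = 0
    for axis in axes:
        if not 0 <= axis < 32:
            raise ValueError("axis must fit in a 32-bit mask")
        mask |= 1 << axis
    return mask


def reduced_dims(dims, mask, keep_dimensions):
    """Output dims for non-batch `dims` (assumes kept axes have extent 1)."""
    out = []
    for i, extent in enumerate(dims):
        if mask & (1 << i):
            if keep_dimensions:
                out.append(1)
        else:
            out.append(extent)
    return tuple(out)


# Reduce over H and W of a CHW tensor: bits 1 and 2 are set.
mask = reduce_axes_mask([1, 2])
print(mask, reduced_dims((3, 4, 5), mask, keep_dimensions=True))
```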

add_rnn()
addRNN(ITensor &inputs, int layerCount, std::size_t hiddenSize, int maxSeqLen, RNNOperation op, RNNInputMode mode, RNNDirection dir, Weights weights, Weights bias)=0 -> IRNNLayer *
Add a layerCount-deep RNN layer to the network with a sequence length of maxSeqLen and hiddenSize internal state per layer.
Parameters: inputs (*) – The input tensor to the layer.
 layerCount (*) – The number of layers in the RNN.
 hiddenSize (*) – The size of the internal hidden state for each layer.
 maxSeqLen (*) – The maximum length of the time sequence.
 op (*) – The type of RNN to execute.
 mode (*) – The input mode for the RNN.
 dir (*) – The direction to run the RNN.
 weights (*) – The weights for the weight matrix parameters of the RNN.
 bias (*) – The weights for the bias vector parameters of the RNN.
The input tensors must be of the type DataType::kFLOAT or DataType::kHALF.
The layout for the input tensor should be {1, S_max, N, E}, where:
 S_max is the maximum allowed sequence length (number of RNN iterations)
 N is the batch size
 E specifies the embedding length (unless kSKIP is set, in which case it should match getHiddenSize())
The first output tensor is the output of the final RNN layer across all timesteps, with dimensions {S_max, N, H}, where:
 S_max is the maximum allowed sequence length (number of RNN iterations)
 N is the batch size
 H is an output hidden state (equal to getHiddenSize() or 2x getHiddenSize())
The second output tensor is the final hidden state of the RNN across all layers and, if the RNN is an LSTM, the third output tensor is the final cell state across all layers. Both have dimensions {L, N, H}, where:
 L is equal to getLayerCount() if getDirection is kUNIDIRECTION, and 2*getLayerCount() if getDirection is kBIDIRECTION. In the bidirectional case, layer l's final forward hidden state is stored in L = 2*l, and its final backward hidden state is stored in L = 2*l + 1
 N is the batch size
 H is getHiddenSize()
Returns:  The new RNN layer, or None if it could not be created.
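
The indexing of the L dimension can be expressed as two small functions. This is an illustrative sketch in plain Python; `hidden_state_extent` and `hidden_state_slot` are hypothetical names that simply restate the arithmetic given for the unidirectional and bidirectional cases.

```python
def hidden_state_extent(layer_count, bidirectional):
    """Extent of the L dimension of the final hidden state."""
    return 2 * layer_count if bidirectional else layer_count


def hidden_state_slot(layer, bidirectional, backward=False):
    """Index along L where a layer's final hidden state is stored."""
    if not bidirectional:
        if backward:
            raise ValueError("a unidirectional RNN has no backward state")
        return layer
    # Forward state at 2*l, backward state at 2*l + 1.
    return 2 * layer + (1 if backward else 0)


print(hidden_state_extent(3, bidirectional=True))              # 6
print(hidden_state_slot(1, bidirectional=True, backward=True)) # 3
```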

add_rnnv2()
addRNNv2(ITensor &input, int32_t layerCount, int32_t hiddenSize, int32_t maxSeqLen, RNNOperation op)=0 -> IRNNv2Layer *
Add a layerCount-deep RNN layer to the network with hiddenSize internal states that can take a batch with fixed or variable sequence lengths.
Parameters: input (*) – The input tensor to the layer (see below).
 layerCount (*) – The number of layers in the RNN.
 hiddenSize (*) – Size of the internal hidden state for each layer.
 maxSeqLen (*) – Maximum sequence length for the input.
 op (*) – The type of RNN to execute.
Returns: The new RNN layer, or None if it could not be created.

add_scale()
addScale(ITensor &input, ScaleMode mode, Weights shift, Weights scale, Weights power)=0 -> IScaleLayer *
Add a Scale layer to the network.
Parameters: input (*) – The input tensor to the layer. This tensor is required to have a minimum of 3 dimensions.
 mode (*) – The scaling mode.
 shift (*) – The shift value.
 scale (*) – The scale value.
 power (*) – The power value.
If the weights are available, their size depends on the ScaleMode. For kUNIFORM, the number of weights is equal to 1. For kCHANNEL, the number of weights is equal to the channel dimension. For kELEMENTWISE, the number of weights is equal to the volume of the input.
Returns: The new Scale layer, or None if it could not be created.
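
The expected number of weights for each scale mode can be computed from the input dimensions. The sketch below is plain Python and uses string mode names in place of the ScaleMode enum; `expected_weight_count` is a hypothetical helper, and it assumes the input dimensions are given in CHW order with the channel first.

```python
def expected_weight_count(mode, dims):
    """Number of shift/scale/power weights for non-batch CHW `dims`."""
    if mode == "kUNIFORM":
        return 1
    if mode == "kCHANNEL":
        return dims[0]  # assumes the channel is the first dimension
    if mode == "kELEMENTWISE":
        volume = 1
        for extent in dims:
            volume *= extent
        return volume
    raise ValueError("unknown scale mode: %r" % mode)


for mode in ("kUNIFORM", "kCHANNEL", "kELEMENTWISE"):
    print(mode, expected_weight_count(mode, (3, 4, 5)))
```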

add_shuffle()
addShuffle(ITensor &input)=0 -> IShuffleLayer *
Add a shuffle layer to the network.
Parameters: input (*) – The input tensor to the layer.
Returns: The new shuffle layer, or None if it could not be created.

add_softmax()
addSoftMax(ITensor &input)=0 -> ISoftMaxLayer *
Add a SoftMax layer to the network.
Returns: The new SoftMax layer, or None if it could not be created.

add_top_k()
addTopK(ITensor &input, TopKOperation op, int k, uint32_t reduceAxes)=0 -> ITopKLayer *
Add a TopK layer to the network.
The TopK layer has two outputs of the same dimensions. The first contains data values, the second contains index positions for the values. Output values are sorted, largest first for operation kMAX and smallest first for operation kMIN.
Currently only values of K up to 1024 are supported.
Parameters: input (*) – The input tensor to the layer.
 op (*) – Operation to perform.
 k (*) – Number of elements to keep.
 reduceAxes (*) – The reduction dimensions, given as a bitmask. Bit 0 of the uint32_t value corresponds to non-batch dimension 0, bit 1 to non-batch dimension 1, and so on. If a bit is set, the corresponding dimension will be reduced. For example, for an NCHW input tensor (three non-batch dimensions), bit 0 corresponds to the C dimension, bit 1 to the H dimension, and bit 2 to the W dimension. Note that TopK reduction is currently only permitted over one dimension.
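
The two outputs of a TopK operation along a single dimension can be illustrated on a flat list. This is a pure-Python reference sketch, not the layer's implementation; `top_k` is a hypothetical helper, with `largest=True` standing in for kMAX and `largest=False` for kMIN.

```python
MAX_K = 1024  # upper bound on K stated above


def top_k(values, k, largest=True):
    """Return (sorted values, their original index positions)."""
    if not 0 < k <= MAX_K:
        raise ValueError("k must be between 1 and %d" % MAX_K)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)
    order = order[:k]
    return [values[i] for i in order], order


scores = [0.1, 0.7, 0.2, 0.9]
print(top_k(scores, 2))                 # ([0.9, 0.7], [3, 1])
print(top_k(scores, 2, largest=False))  # ([0.1, 0.2], [0, 2])
```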

add_unary()
addUnary(ITensor &input, UnaryOperation operation)=0 -> IUnaryLayer *
Add a unary layer to the network.
Parameters: input (*) – The input tensor to the layer.
 operation (*) – The operation to apply.
Returns: The new unary layer, or None if it could not be created.

destroy()
destroy()=0
Destroy this INetworkDefinition object.

get_convolution_output_dimensions_formula()
getConvolutionOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &
Get the convolution output dimensions formula.
Deprecated: This method does not currently work reliably and will be removed in a future release.
Returns: The formula for computing the convolution output dimensions.

get_deconvolution_output_dimensions_formula()
getDeconvolutionOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &
Get the deconvolution output dimensions formula.
Deprecated: This method does not currently work reliably and will be removed in a future release.
Returns: The formula for computing the deconvolution output dimensions.

get_input()
getInput(int index) const =0 -> ITensor *
Get the input tensor specified by the given index.
Parameters: index (*) – The index of the input tensor.
Returns: The input tensor, or None if the index is out of range.

get_layer()
getLayer(int index) const =0 -> ILayer *
Get the layer specified by the given index.
Parameters: index (*) – The index of the layer.
Returns: The layer, or None if the index is out of range.

get_nb_inputs()
getNbInputs() const =0 -> int
Get the number of inputs in the network.
Returns: The number of inputs in the network.

get_nb_layers()
getNbLayers() const =0 -> int
Get the number of layers in the network.
Returns: The number of layers in the network.

get_nb_outputs()
getNbOutputs() const =0 -> int
Get the number of outputs in the network.
Returns: The number of outputs in the network.

get_output()
getOutput(int index) const =0 -> ITensor *
Get the output tensor specified by the given index.
Parameters: index (*) – The index of the output tensor.
Returns: The output tensor, or None if the index is out of range.

get_pooling_output_dimensions_formula()
getPoolingOutputDimensionsFormula() const =0 -> IOutputDimensionsFormula &
Get the pooling output dimensions formula.
Returns: The formula for computing the pooling output dimensions.

mark_output()
markOutput(ITensor &tensor)=0
Mark a tensor as a network output.
Parameters: tensor (*) – The tensor to mark as an output tensor.

set_convolution_output_dimensions_formula()
setConvolutionOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0
Set the convolution output dimensions formula.
Deprecated: This method does not currently work reliably and will be removed in a future release.
Parameters: formula (*) – The formula for computing the convolution output dimensions. If None is passed, the default formula is used.
The default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1.
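
The default convolution formula, (inputDim + padding * 2 - kernelSize) / stride + 1, can be evaluated per dimension as follows. This is a minimal sketch in plain Python; `conv_output_dim` is a hypothetical helper, and it assumes the division is integer division that rounds down.

```python
def conv_output_dim(input_dim, kernel_size, stride=1, padding=0):
    """Default output extent of one spatial dimension (floor division assumed)."""
    return (input_dim + padding * 2 - kernel_size) // stride + 1


# A 3x3 kernel with padding 1 and stride 1 preserves the input size.
print(conv_output_dim(224, kernel_size=3, stride=1, padding=1))  # 224
# A 7x7 kernel with padding 3 and stride 2 halves it.
print(conv_output_dim(224, kernel_size=7, stride=2, padding=3))  # 112
```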

set_deconvolution_output_dimensions_formula()
setDeconvolutionOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0
Set the deconvolution output dimensions formula.
Deprecated: This method does not currently work reliably and will be removed in a future release.
Parameters: formula (*) – The formula for computing the deconvolution output dimensions. If None is passed, the default formula is used.
The default formula in each dimension is (inputDim - 1) * stride + kernelSize - 2 * padding.
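
The default deconvolution formula, (inputDim - 1) * stride + kernelSize - 2 * padding, can be evaluated the same way. The helper name `deconv_output_dim` is hypothetical and the sketch is plain Python, independent of TensorRT.

```python
def deconv_output_dim(input_dim, kernel_size, stride=1, padding=0):
    """Default output extent of one spatial dimension of a deconvolution."""
    return (input_dim - 1) * stride + kernel_size - 2 * padding


# A 4x4 kernel with stride 2 and padding 1 doubles the input size.
print(deconv_output_dim(112, kernel_size=4, stride=2, padding=1))  # 224
```

With matching kernel, stride and padding, this is the inverse of the default convolution formula whenever the convolution's division is exact.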

set_pooling_output_dimensions_formula()
setPoolingOutputDimensionsFormula(IOutputDimensionsFormula *formula)=0
Set the pooling output dimensions formula.
Parameters: formula (*) – The formula for computing the pooling output dimensions. If None is passed, the default formula is used.
The default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1.

Derived From C++ Class nvinfer1::NetworkDefinition
LayerType

class tensorrt.infer.LayerType
Available layer types.
Base Class: IntEnum
Derived From C++ Class nvinfer1::LayerType
Tensor

class tensorrt.infer.Tensor
A tensor in a network definition.
C++ includes: NvInfer.h

get_broadcast_across_batch
()¶ getBroadcastAcrossBatch() const =0 > bool
Check if tensor is broadcast across the batch.
When a tensor is broadcast across a batch, it has the same value for every member in the batch. Memory is only allocated once for the single member.
Returns:  True if tensor is broadcast across the batch, false otherwise.

get_dimensions
()¶ getDimensions() const =0 > Dims
Get the dimensions of a tensor.
Returns:  The dimensions of the layer.

get_location
()¶ getLocation() const =0 > TensorLocation
Get the storage location of a tensor.
Returns:  The location of tensor data.

get_name
()¶ getName() const =0 > const char *
Get the tensor name.
Returns:  The name, as a pointer to a null-terminated character sequence.

get_type
()¶ getType() const =0 > DataType
Get the data type of a tensor.
Returns:  The data type of the tensor.

is_network_input
()¶ isNetworkInput() const =0 > bool
Whether the tensor is a network input.

is_network_output
()¶ isNetworkOutput() const =0 > bool
Whether the tensor is a network output.

set_broadcast_across_batch
()¶ setBroadcastAcrossBatch(bool broadcastAcrossBatch)=0
Set whether to enable broadcast of tensor across the batch.
When a tensor is broadcast across a batch, it has the same value for every member in the batch. Memory is only allocated once for the single member.
This method is only valid for network input tensors, since the flags of layer output tensors are inferred based on layer inputs and parameters. If this state is modified for a tensor in the network, the states of all dependent tensors will be recomputed.
Parameters:  broadcastAcrossBatch (*) – Whether to enable broadcast of tensor across the batch.

set_dimensions
()¶ setDimensions(Dims dimensions)=0
Set the dimensions of a tensor.
For a network input, the dimensions are assigned by the application. For a network output, they are computed based on the layer parameters and the inputs to the layer. If a tensor size or a parameter is modified in the network, the dimensions of all dependent tensors will be recomputed.
This call is only legal for network input tensors, since the dimensions of layer output tensors are inferred based on layer inputs and parameters.
Parameters:  dimensions (*) – The dimensions of the tensor.

set_location
()¶ setLocation(TensorLocation location)=0
Set the storage location of a tensor.
Parameters:  location (*) – The location of tensor data.
Only input tensors that store sequence lengths for RNNv2 are supported. Using host storage for layers that do not support it will generate errors at build time.

set_name
()¶ setName(const char *name)=0
Set the tensor name.
For a network input, the name is assigned by the application. For tensors which are layer outputs, a default name is assigned consisting of the layer name followed by the index of the output in brackets.
This method copies the name string.
Parameters:  name (*) – The name.

set_type
()¶ setType(DataType type)=0
Set the data type of a tensor.
Parameters:  type (*) – The data type of the tensor.
The type is unchanged if the type is invalid for the given tensor. If the tensor is a network input or output, then the tensor type cannot be DataType::kINT8.

Derived From C++ Class nvinfer1::Tensor
Layer¶

class
tensorrt.infer.
Layer
¶ Base class for all layer classes in a network definition.
C++ includes: NvInfer.h

get_core
()¶ getCore() const =0 > COREID
Get the DLA core that this layer executes on.

get_input
()¶ getInput(int index) const =0 > ITensor *
Get the layer input corresponding to the given index.
Parameters: index (*) – The index of the input tensor.
Returns: The input tensor, or None if the index is out of range.

get_name
()¶ getName() const =0 > const char *
Return the name of a layer.

get_nb_inputs
()¶ getNbInputs() const =0 > int
Get the number of inputs of a layer.

get_nb_outputs
()¶ getNbOutputs() const =0 > int
Get the number of outputs of a layer.

get_output
()¶ getOutput(int index) const =0 > ITensor *
Get the layer output corresponding to the given index.
Returns: The indexed output tensor, or None if the index is out of range.

get_type
()¶ getType() const =0 > LayerType
Return the type of a layer.

set_core
()¶ setCore(COREID core)=0 > bool
Set the DLA core that this layer must execute on.
Returns:  True if the core is valid for the layer, false otherwise.

set_name
()¶ setName(const char *name)=0
Set the name of a layer.
This method copies the name string.

Derived From C++ Class nvinfer1::Layer
ConvolutionLayer¶

class
tensorrt.infer.
ConvolutionLayer
¶ A convolution layer in a network definition.
This layer performs a correlation operation between a 3-dimensional filter and a 4-dimensional tensor to produce another 4-dimensional tensor.
The HW output size of the convolution is set according to the INetworkCustomDimensions set in INetworkDefinition::setCustomConvolutionDimensions().
An optional bias argument is supported, which adds a per-channel constant to each value in the output.
C++ includes: NvInfer.h

get_bias_weights
()¶ getBiasWeights() const =0 > Weights
Get the bias weights for the convolution.

get_dilation
()¶ getDilation() const =0 > DimsHW
Get the dilation for a convolution.

get_kernel_size
()¶ getKernelSize() const =0 > DimsHW
Get the HW kernel size of the convolution.

get_kernel_weights
()¶ getKernelWeights() const =0 > Weights
Get the kernel weights for the convolution.

get_nb_groups
()¶ getNbGroups() const =0 > int
Get the number of groups for a convolution.

get_nb_output_maps
()¶ getNbOutputMaps() const =0 > int
Get the number of output maps for the convolution.

get_padding
()¶ getPadding() const =0 > DimsHW
Get the padding of the convolution.

get_stride
()¶ getStride() const =0 > DimsHW
Get the stride of the convolution.

set_bias_weights
()¶ setBiasWeights(Weights weights)=0
Set the bias weights for the convolution.
Bias is optional. To omit bias, set the count value of the weights structure to zero.
The bias is applied per-channel, so the number of weights (if nonzero) must be equal to the number of output feature maps.

set_dilation
()¶ setDilation(DimsHW dims)=0
Set the dilation for a convolution.
Default: (1,1)

set_kernel_size
()¶ setKernelSize(DimsHW kernelSize)=0
Set the HW kernel size of the convolution.

set_kernel_weights
()¶ setKernelWeights(Weights weights)=0
Set the kernel weights for the convolution.
The weights are specified as a contiguous array in GKCRS order, where G is the number of groups, K the number of output feature maps, C the number of input channels, and R and S are the height and width of the filter.

set_nb_groups
()¶ setNbGroups(int nbGroups)=0
Set the number of groups for a convolution.
The input tensor channels are divided into nbGroups groups, and a convolution is executed for each group, using a filter per group. The results of the group convolutions are concatenated to form the output.
note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output.
Default: 1
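The int8 constraint on grouped convolutions can be checked before building. The sketch below is a plain-Python illustration only; the function name is hypothetical, and it additionally assumes that both channel counts must divide evenly by the group count.

```python
def groups_valid_for_int8(in_channels, out_channels, nb_groups):
    """Return True if each group's input and output channel counts
    are multiples of 4, as required for groups in int8 mode.

    Assumption: channel counts must be evenly divisible by nb_groups.
    """
    if in_channels % nb_groups or out_channels % nb_groups:
        return False
    in_per_group = in_channels // nb_groups
    out_per_group = out_channels // nb_groups
    return in_per_group % 4 == 0 and out_per_group % 4 == 0


print(groups_valid_for_int8(64, 128, 8))  # 8 and 16 channels per group
print(groups_valid_for_int8(24, 24, 8))   # 3 channels per group
```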

set_nb_output_maps
()¶ setNbOutputMaps(int nbOutputMaps)=0
Set the number of output maps for the convolution.

set_padding
()¶ setPadding(DimsHW padding)=0
Set the padding of the convolution.
The input will be zero-padded by this number of elements in the height and width directions. Padding is symmetric.
Default: (0,0)

set_stride
()¶ setStride(DimsHW stride)=0
Set the stride of the convolution.
Default: (1,1)

Derived From C++ Class nvinfer1::ConvolutionLayer
FullyConnectedLayer¶

class
tensorrt.infer.
FullyConnectedLayer
¶ A fully connected layer in a network definition. This layer expects an input tensor of three or more non-batch dimensions. The input is automatically reshaped into an MxV tensor X, where V is the product of the last three dimensions and M is the product of the remaining dimensions (where the product over 0 dimensions is defined as 1). For example:
 If the input tensor has shape {C, H, W}, then the tensor is reshaped into {1, C*H*W}.
 If the input tensor has shape {P, C, H, W}, then the tensor is reshaped into {P, C*H*W}.
The layer then performs the following operation:
Y = matmul(X, W^T) + bias
Where X is the MxV tensor defined above, W is the KxV weight tensor of the layer, and bias is a row vector of size K that is broadcast to MxK. K is the number of output channels, and is configurable via setNbOutputChannels(). If bias is not specified, it is implicitly 0.
The MxK result Y is then reshaped such that the last three dimensions are {K, 1, 1} and the remaining dimensions match the dimensions of the input tensor. For example:
 If the input tensor has shape {C, H, W}, then the output tensor will have shape {K, 1, 1}.
 If the input tensor has shape {P, C, H, W}, then the output tensor will have shape {P, K, 1, 1}.
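The shape bookkeeping described above can be sketched in plain Python. The function fully_connected_shapes is a hypothetical helper, not part of tensorrt.infer; it only computes the MxV view of the input and the resulting output shape, without doing any arithmetic on weights.

```python
from math import prod


def fully_connected_shapes(input_shape, nb_output_channels):
    """Return ((M, V), output_shape) for a non-batch input shape."""
    if len(input_shape) < 3:
        raise ValueError("expected three or more non-batch dimensions")
    leading = tuple(input_shape[:-3])
    m = prod(leading)            # product over zero dimensions is 1
    v = prod(input_shape[-3:])   # product of the last three dimensions
    output_shape = leading + (nb_output_channels, 1, 1)
    return (m, v), output_shape


print(fully_connected_shapes((3, 4, 5), 10))
print(fully_connected_shapes((2, 3, 4, 5), 10))
```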
C++ includes: NvInfer.h

get_bias_weights
()¶ getBiasWeights() const =0 > Weights
Get the bias weights.

get_kernel_weights
()¶ getKernelWeights() const =0 > Weights
Get the kernel weights.

get_nb_output_channels
()¶ getNbOutputChannels() const =0 > int
Get the number of output channels K from the fully connected layer.

set_bias_weights
()¶ setBiasWeights(Weights weights)=0
Set the bias weights.
Bias is optional. To omit bias, set the count value in the weights structure to zero.

set_kernel_weights
()¶ setKernelWeights(Weights weights)=0
Set the kernel weights, given as a KxC matrix in row-major order.

set_nb_output_channels
()¶ setNbOutputChannels(int nbOutputs)=0
Set the number of output channels K from the fully connected layer.
Derived From C++ Class nvinfer1::FullyConnectedLayer
ActivationLayer¶

class
tensorrt.infer.
ActivationLayer
¶ An Activation layer in a network definition.
This layer applies a per-element activation function to its input.
The output has the same shape as the input.
C++ includes: NvInfer.h

get_activation_type
()¶ getActivationType() const =0 > ActivationType
Get the type of activation to be performed.

set_activation_type
()¶ setActivationType(ActivationType type)=0
Set the type of activation to be performed.

Derived From C++ Class nvinfer1::ActivationLayer
ActivationType¶

class
tensorrt.infer.
ActivationType
¶ Type of activation function
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::ActivationType
PoolingLayer¶

class
tensorrt.infer.
PoolingLayer
¶ A Pooling layer in a network definition.
The layer applies a reduction operation within a window over the input.
The output size is determined from the input size using the formula set by INetworkDefinition::setCustomPoolingDimensions().
C++ includes: NvInfer.h

get_average_count_excludes_padding
()¶ getAverageCountExcludesPadding() const =0 > bool
Get whether average pooling uses as a denominator the overlap area between the window and the unpadded input.

get_blend_factor
()¶ getBlendFactor() const =0 > float
Get the blending factor for the max_average_blend mode: max_average_blendPool = (1 - blendFactor)*maxPool + blendFactor*avgPool. blendFactor is a user value in [0,1] with the default value of 0.0. In modes other than kMAX_AVERAGE_BLEND, blendFactor is ignored.

get_padding
()¶ getPadding() const =0 > DimsHW
Get the padding for pooling.
Default: 0

get_pooling_type
()¶ getPoolingType() const =0 > PoolingType
Get the type of pooling to be performed.

get_stride
()¶ getStride() const =0 > DimsHW
Get the stride for pooling.

get_window_size
()¶ getWindowSize() const =0 > DimsHW
Get the window size for pooling.

set_average_count_excludes_padding
()¶ setAverageCountExcludesPadding(bool exclusive)=0
Set whether average pooling uses as a denominator the overlap area between the window and the unpadded input. If this is not set, the denominator is the overlap between the pooling window and the padded input.
Default: true
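To make the two denominators concrete, here is a one-dimensional average pooling sketch in plain Python. The function average_pool_1d and its parameters are illustrative assumptions; it also assumes every window overlaps at least one real input element.

```python
def average_pool_1d(values, window, stride=1, padding=0, exclude_padding=True):
    """Average pooling over a zero-padded 1D input.

    exclude_padding=True divides by the number of real (unpadded)
    elements under the window; False divides by the full window size.
    """
    n = len(values)
    out_len = (n + 2 * padding - window) // stride + 1
    result = []
    for i in range(out_len):
        start = i * stride - padding
        covered = [values[j] for j in range(start, start + window) if 0 <= j < n]
        denominator = len(covered) if exclude_padding else window
        result.append(sum(covered) / denominator)  # padded zeros add nothing
    return result


data = [1, 2, 3, 4]
print(average_pool_1d(data, window=3, padding=1, exclude_padding=True))
print(average_pool_1d(data, window=3, padding=1, exclude_padding=False))
```

Only the border outputs differ between the two settings, because interior windows do not touch the padding.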

set_blend_factor
()¶ setBlendFactor(float blendFactor)=0
Set the blending factor for the max_average_blend mode: max_average_blendPool = (1 - blendFactor)*maxPool + blendFactor*avgPool. blendFactor is a user value in [0,1] with the default value of 0.0. This value only applies for the kMAX_AVERAGE_BLEND mode.
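The blend, read as (1 - blendFactor)*maxPool + blendFactor*avgPool, can be sketched for a single pooling window as follows. The function max_average_blend is a hypothetical name used only for illustration.

```python
def max_average_blend(window_values, blend_factor=0.0):
    """Blend max pooling and average pooling over one window."""
    if not 0.0 <= blend_factor <= 1.0:
        raise ValueError("blend_factor must be in [0, 1]")
    max_pool = max(window_values)
    avg_pool = sum(window_values) / len(window_values)
    return (1.0 - blend_factor) * max_pool + blend_factor * avg_pool


window = [1.0, 2.0, 3.0, 6.0]
print(max_average_blend(window, 0.0))  # pure max pooling
print(max_average_blend(window, 1.0))  # pure average pooling
print(max_average_blend(window, 0.5))  # halfway between the two
```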

set_padding
()¶ setPadding(DimsHW padding)=0
Set the padding for pooling.
Default: 0

set_pooling_type
()¶ setPoolingType(PoolingType type)=0
Set the type of pooling to be performed.

set_stride
()¶ setStride(DimsHW stride)=0
Set the stride for pooling.
Default: 1

set_window_size
()¶ setWindowSize(DimsHW windowSize)=0
Set the window size for pooling.

Derived From C++ Class nvinfer1::PoolingLayer
PoolingType¶

class
tensorrt.infer.
PoolingType
¶ Type of pooling layer
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::PoolingType
LRNLayer¶

class
tensorrt.infer.
LRNLayer
¶ An LRN layer in a network definition.
The output size is the same as the input size.
C++ includes: NvInfer.h

get_alpha
()¶ getAlpha() const =0 > float
Get the LRN alpha value.

get_beta
()¶ getBeta() const =0 > float
Get the LRN beta value.

get_k
()¶ getK() const =0 > float
Get the LRN K value.

get_window_size
()¶ getWindowSize() const =0 > int
Get the LRN window size.

set_alpha
()¶ setAlpha(float alpha)=0
Set the LRN alpha value.
The valid range is [-1e20, 1e20].

set_beta
()¶ setBeta(float beta)=0
Set the LRN beta value.
The valid range is [0.01, 1e5f].

set_k
()¶ setK(float k)=0
Set the LRN K value.
The valid range is [1e-5, 1e10].

set_window_size
()¶ setWindowSize(int windowSize)=0
Set the LRN window size.
The window size must be odd and in the range of [1, 15].

Derived From C++ Class nvinfer1::LRNLayer
ScaleLayer¶

class
tensorrt.infer.
ScaleLayer
¶ A Scale layer in a network definition.
This layer applies a per-element computation to its input:
output = (input * scale + shift) ^ power
The coefficients can be applied on a per-tensor, per-channel, or per-element basis.
Note: If the number of weights is 0, then a default value is used for shift, power, and scale. The default shift is 0, the default power is 1, and the default scale is 1.
The output size is the same as the input size.
note: The input tensor for this layer is required to have a minimum of 3 dimensions.
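As an illustration of the per-channel case, the following plain-Python sketch applies output = (input * scale + shift) ^ power to a CHW tensor held as nested lists. The function scale_per_channel is a hypothetical helper and takes one coefficient per channel.

```python
def scale_per_channel(tensor_chw, scale, shift, power):
    """Apply (x * scale[c] + shift[c]) ** power[c] to every element.

    tensor_chw is a nested list indexed as [channel][row][column].
    """
    return [
        [[(x * scale[c] + shift[c]) ** power[c] for x in row] for row in channel]
        for c, channel in enumerate(tensor_chw)
    ]


data = [[[1.0, 2.0]], [[3.0, 4.0]]]  # shape: C=2, H=1, W=2
result = scale_per_channel(data, scale=[2.0, 1.0], shift=[1.0, 0.0], power=[1.0, 2.0])
print(result)
```

With the documented defaults (scale 1, shift 0, power 1) the function returns its input unchanged.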
C++ includes: NvInfer.h

get_mode
()¶ getMode() const =0 > ScaleMode
Get the scale mode.

get_power
()¶ getPower() const =0 > Weights
Get the power value.

get_scale
()¶ getScale() const =0 > Weights
Get the scale value.

get_shift
()¶ getShift() const =0 > Weights
Get the shift value.

set_mode
()¶ setMode(ScaleMode mode)=0
Set the scale mode.

set_power
()¶ setPower(Weights power)=0
Set the power value.

set_scale
()¶ setScale(Weights scale)=0
Set the scale value.

set_shift
()¶ setShift(Weights shift)=0
Set the shift value.

Derived From C++ Class nvinfer1::ScaleLayer
ScaleMode¶

class
tensorrt.infer.
ScaleMode
¶ Scale mode
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::ScaleMode
SoftmaxLayer¶

class
tensorrt.infer.
SoftmaxLayer
¶ A Softmax layer in a network definition.
This layer applies a per-channel softmax to its input.
The output size is the same as the input size.
C++ includes: NvInfer.h
Derived From C++ Class nvinfer1::SoftmaxLayer
ConcatenationLayer¶

class
tensorrt.infer.
ConcatenationLayer
¶ A concatenation layer in a network definition.
The output channel size is the sum of the channel sizes of the inputs. The other output sizes are the same as the other input sizes, which must all match.
C++ includes: NvInfer.h

get_axis
()¶ getAxis() const =0 > int
Get the axis along which concatenation occurs.

set_axis
()¶ setAxis(int axis)=0
Set the axis along which concatenation occurs.
0 is the major axis (excluding the batch dimension). The default is the number of non-batch axes in the tensor minus three (e.g. for an NCHW input it would be 0), or 0 if there are fewer than 3 non-batch axes.
Parameters: axis (*) – The axis along which concatenation occurs.
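The default axis rule and the resulting output dimensions can be sketched as below. Both helper names are hypothetical, and dimensions are given without the batch dimension.

```python
def default_concat_axis(nb_non_batch_axes):
    """Number of non-batch axes minus three, or 0 if fewer than 3."""
    return max(nb_non_batch_axes - 3, 0)


def concat_output_dims(input_dims, axis):
    """Sum the extents along axis; all other extents must match."""
    first = input_dims[0]
    for dims in input_dims[1:]:
        if len(dims) != len(first):
            raise ValueError("inputs must have the same number of dimensions")
        for i, (a, b) in enumerate(zip(first, dims)):
            if i != axis and a != b:
                raise ValueError("non-concatenated dimensions must match")
    out = list(first)
    out[axis] = sum(dims[axis] for dims in input_dims)
    return tuple(out)


axis = default_concat_axis(3)  # CHW input, so the channel axis
print(axis, concat_output_dims([(3, 8, 8), (5, 8, 8)], axis))
```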

Derived From C++ Class nvinfer1::ConcatenationLayer
DeconvolutionLayer¶

class
tensorrt.infer.
DeconvolutionLayer
¶ A deconvolution layer in a network definition.
The output size is defined using the formula set by INetworkDefinition::setDeconvolutionOutputDimensionsFormula().
C++ includes: NvInfer.h

get_bias_weights
()¶ getBiasWeights() const =0 > Weights
Get the bias weights for the deconvolution.

get_kernel_size
()¶ getKernelSize() const =0 > DimsHW
Get the HW kernel size of the deconvolution.

get_kernel_weights
()¶ getKernelWeights() const =0 > Weights
Get the kernel weights for the deconvolution.

get_nb_groups
()¶ getNbGroups() const =0 > int
Get the number of groups for a deconvolution.

get_nb_output_maps
()¶ getNbOutputMaps() const =0 > int
Get the number of output feature maps for the deconvolution.

get_padding
()¶ getPadding() const =0 > DimsHW
Get the padding of the deconvolution.

get_stride
()¶ getStride() const =0 > DimsHW
Get the stride of the deconvolution.
Default: (1,1)

set_bias_weights
()¶ setBiasWeights(Weights weights)=0
Set the bias weights for the deconvolution.
Bias is optional. To omit bias, set the count value of the weights structure to zero.
The bias is applied per-feature-map, so the number of weights (if nonzero) must be equal to the number of output feature maps.

set_kernel_size
()¶ setKernelSize(DimsHW kernelSize)=0
Set the HW kernel size of the deconvolution.

set_kernel_weights
()¶ setKernelWeights(Weights weights)=0
Set the kernel weights for the deconvolution.
The weights are specified as a contiguous array in CKRS order, where C is the number of input channels, K the number of output feature maps, and R and S are the height and width of the filter.

set_nb_groups
()¶ setNbGroups(int nbGroups)=0
Set the number of groups for a deconvolution.
The input tensor channels are divided into nbGroups groups, and a deconvolution is executed for each group, using a filter per group. The results of the group convolutions are concatenated to form the output.
note: When using groups in int8 mode, the size of the groups (i.e. the channel count divided by the group count) must be a multiple of 4 for both input and output.
Default: 1

set_nb_output_maps
()¶ setNbOutputMaps(int nbOutputMaps)=0
Set the number of output feature maps for the deconvolution.

set_padding
()¶ setPadding(DimsHW padding)=0
Set the padding of the deconvolution.
The input will be zero-padded by this number of elements in the height and width directions. Padding is symmetric.
Default: (0,0)

set_stride
()¶ setStride(DimsHW stride)=0
Set the stride of the deconvolution.

Derived From C++ Class nvinfer1::DeconvolutionLayer
GatherLayer¶

class
tensorrt.infer.
GatherLayer
¶ 
set_gather_axis
()¶ setGatherAxis(int axis) =0
Set the non-batch dimension axis to gather on. The axis must be less than the number of non-batch dimensions in the data input.

get_gather_axis
()¶ getGatherAxis() const =0
Get the non-batch dimension axis to gather on.
Derived from C++ class nvinfer1::IGatherLayer
C++ includes: NvInfer.h
ReduceLayer¶

class
tensorrt.infer.
ReduceLayer
¶ 
get_keep_dimensions
()¶ getKeepDimensions() const =0
Get the boolean that specifies whether or not to keep the reduced dimensions for the layer.

get_operation
()¶ getOperation() =0
Get the reduce operation for the layer.

get_reduce_axes
()¶ getReduceAxes() =0
Get the axes over which to reduce for the layer.

set_keep_dimensions
()¶ setKeepDimensions(bool keepDimensions) =0
Set the boolean that specifies whether or not to keep the reduced dimensions for the layer.

set_operation
()¶ setOperation(ReduceOperation op) =0
Set the reduce operation for the layer.

set_reduce_axes
()¶ setReduceAxes(int reduceAxes) =0
Set the axes over which to reduce.
Derived from C++ class nvinfer1::IReduceLayer
C++ includes: NvInfer.h
Layer that represents a reduction operator
ConstantLayer¶

class
tensorrt.infer.
ConstantLayer
¶ 
set_weights
()¶ setWeights(Weights weights) =0
Set the weights for the layer.

get_weights
()¶ getWeights()const=0
Get the weights for the layer.

set_dimensions
()¶ setDimensions(Dims dimensions) =0
Set the dimensions for the layer.

get_dimensions
()¶ getDimensions() const=0
Get the dimensions for the layer.
Derived from C++ class nvinfer1::IConstantLayer
C++ includes: NvInfer.h
Layer that represents a constant value
MatrixMultiply¶

class
tensorrt.infer.
MatrixMultiply
¶ 
set_transpose
()¶ setTranspose(int index, bool val) =0
Set the transpose flag for an input tensor.

get_transpose
()¶ getTranspose(int index)const
Get the transpose flag for an input tensor.
Derived from C++ class nvinfer1::IMatrixMultiplyLayer
C++ includes: NvInfer.h
Layer that represents a matrix multiplication
Let A be getInput(0) and B be getInput(1).
Tensors A and B must have equal rank, which must be at least 2.
When A and B are matrices, computes op(A) * op(B), where:
 op(x) = x if transpose == false
 op(x) = transpose(x) if transpose == true
Transposition is of the last two dimensions. Inputs of higher rank are treated as collections of matrices.
For a dimension that is not one of the last two dimensions: If the dimension is 1 for one of the tensors but not the other tensor, the former tensor is broadcast along that dimension to match the dimension of the latter tensor.
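The rank, transpose and broadcast rules above can be illustrated with a shape-only sketch in plain Python. The function matmul_output_dims is a hypothetical helper; it assumes that leading dimensions must either match or be 1 in one of the tensors.

```python
def matmul_output_dims(a_dims, b_dims, transpose_a=False, transpose_b=False):
    """Output dimensions of op(A) * op(B) for equal-rank inputs."""
    if len(a_dims) != len(b_dims) or len(a_dims) < 2:
        raise ValueError("A and B must have equal rank of at least 2")

    def op(dims, transpose):
        dims = list(dims)
        if transpose:  # transposition swaps the last two dimensions
            dims[-2], dims[-1] = dims[-1], dims[-2]
        return dims

    a, b = op(a_dims, transpose_a), op(b_dims, transpose_b)
    if a[-1] != b[-2]:
        raise ValueError("inner matrix dimensions do not match")
    leading = []
    for x, y in zip(a[:-2], b[:-2]):
        if x != y and 1 not in (x, y):
            raise ValueError("leading dimensions cannot be broadcast")
        leading.append(max(x, y))  # a dimension of 1 is broadcast
    return tuple(leading) + (a[-2], b[-1])


print(matmul_output_dims((1, 2, 3), (5, 3, 4)))
print(matmul_output_dims((5, 3, 2), (5, 3, 4), transpose_a=True))
```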
RaggedSoftMax¶

class
tensorrt.infer.
RaggedSoftMax
¶
Derived From C++ Class nvinfer1::IRaggedSoftMaxLayer
C++ includes: NvInfer.h
A RaggedSoftmax layer in a network definition.
This layer takes a ZxS input tensor and an additional Zx1 bounds tensor holding the lengths of the Z sequences.
This layer computes a softmax across each of the Z sequences.
The output tensor is of the same size as the input tensor.
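A rough plain-Python sketch of a ragged softmax over a ZxS input follows. The function name is hypothetical, and two points are assumptions rather than documented behaviour: positions beyond each sequence length are written as 0.0, and every length is at least 1.

```python
import math


def ragged_softmax(rows, bounds):
    """Softmax over the first bounds[z] entries of each row.

    Assumption: entries past the sequence length are set to 0.0.
    """
    output = []
    for row, length in zip(rows, bounds):
        valid = row[:length]
        peak = max(valid)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in valid]
        total = sum(exps)
        output.append([e / total for e in exps] + [0.0] * (len(row) - length))
    return output


rows = [[0.0, 0.0, 9.0], [1.0, 2.0, 3.0]]
print(ragged_softmax(rows, bounds=[2, 3]))
```

The output keeps the ZxS shape of the input, matching the description above.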
RNNv2Layer¶

class
tensorrt.infer.
RNNv2Layer
¶ 
get_layer_count
()¶ getLayerCount() const =0
Get the layer count for the RNN.

get_hidden_size
()¶ getHiddenSize() const =0
Get the hidden size for the RNN.

get_max_seq_length
()¶ getMaxSeqLength() const =0
Get the maximum sequence length for the RNN.

get_data_length
()¶ getDataLength() const =0
Get the data length for the RNN.

set_sequence_lengths
()¶ setSequenceLengths(ITensor &seqLengths) =0
Specify individual sequence lengths in the batch with the ITensor pointed to by seqLengths.
The seqLengths ITensor should be a {N1, ..., Np} tensor, where N1..Np are the index dimensions of the input tensor to the RNN.
If this is not specified, then the RNN layer assumes all sequences are size getMaxSeqLength().
All sequence lengths in seqLengths should be in the range [1, getMaxSeqLength()]. Zero-length sequences are not supported.
This tensor must be of type DataType::kINT32.

get_sequence_lengths
()¶ getSequenceLengths() const=0
Get the sequence lengths specified for the RNN.

set_operation
()¶ setOperation(RNNOperation op) =0
Set the operation of the RNN layer.

get_operation
()¶ getOperation() const=0
Get the operation of the RNN layer.

set_input_mode
()¶ setInputMode(RNNInputMode op) =0
Set the input mode of the RNN layer.

get_input_mode
()¶ getInputMode() const=0
Get the input mode of the RNN layer.

set_direction
()¶ setDirection(RNNDirection op) =0
Set the direction of the RNN layer.

get_direction
()¶ getDirection() const=0
Get the direction of the RNN layer.

set_weights_for_gate
()¶ setWeightsForGate(int layerindex, RNNGateType gate, bool isW, Weights weights) const=0
Set the weight parameters for an individual gate in the RNN.

get_weights_for_gate
()¶ getWeightsForGate(int layerindex, RNNGateType gate, bool isW) const=0
Get the weight parameters for an individual gate in the RNN.

set_bias_for_gate
()¶ setBiasForGate(int layerindex, RNNGateType gate, bool isW, Weights bias) const=0
Set the bias parameters for an individual gate in the RNN.

get_bias_for_gate
()¶ getBiasForGate(int layerindex, RNNGateType gate, bool isW) const=0
Get the bias parameters for an individual gate in the RNN.

set_hidden_state
()¶ setHiddenState(ITensor &hidden) =0
Set the initial hidden state of the RNN with the provided hidden ITensor.
The hidden ITensor should have the dimensions {N1, ..., Np, L, H}, where:
 N1..Np are the index dimensions specified by the input tensor
 L is the number of layers in the RNN, equal to getLayerCount()
 H is the hidden state for each layer, equal to getHiddenSize() if getDirection is kUNIDIRECTION, and 2x getHiddenSize() otherwise.
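The expected hidden state dimensions {N1, ..., Np, L, H} can be computed with a small helper. This is a sketch only; hidden_state_dims and its bidirectional flag are illustrative names standing in for the values returned by getLayerCount(), getHiddenSize() and getDirection().

```python
def hidden_state_dims(index_dims, layer_count, hidden_size, bidirectional=False):
    """Return (N1, ..., Np, L, H) for the initial hidden state tensor."""
    # H doubles when the RNN is not unidirectional.
    h = 2 * hidden_size if bidirectional else hidden_size
    return tuple(index_dims) + (layer_count, h)


print(hidden_state_dims((4,), layer_count=2, hidden_size=128))
print(hidden_state_dims((4,), layer_count=2, hidden_size=128, bidirectional=True))
```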

get_hidden_state
()¶ getHiddenState() const=0
Get the initial hidden state of the RNN.

set_cell_state
()¶ setCellState(ITensor &cell) =0
Set the initial cell state of the LSTM with the provided cell ITensor.

get_cell_state
()¶ getCellState() const=0
Get the initial cell state of the LSTM.
Derived From C++ Class nvinfer1::RNNv2Layer
C++ includes: NvInfer.h
An RNN layer in a network definition, version 2.
This layer supersedes IRNNLayer.
TopKLayer¶

class
tensorrt.infer.
TopKLayer
¶ 
set_operation
()¶ setOperation(TopKOperation op) =0
Set the operation for the layer.

get_operation
()¶ getOperation() =0
Get the operation for the layer.

set_k
()¶ setK(int k) =0
Set the k value for the layer.

get_k
()¶ getK() =0
Get the k value for the layer.

set_reduce_axes
()¶ setReduceAxes(int reduceAxes) =0
Set which axes to reduce for the layer.

get_reduce_axes
()¶ getReduceAxes() =0
Get the axes to reduce for the layer.
Derived From C++ Class nvinfer1::TopKLayer
Layer that represents a TopK reduction
C++ includes: NvInfer.h
ElementWiseLayer¶

class
tensorrt.infer.
ElementWiseLayer
¶ An elementwise layer in a network definition.
This layer applies a per-element binary operation between corresponding elements of two tensors.
The input dimensions of the two input tensors must be equal, and the output tensor is the same size as each input.
C++ includes: NvInfer.h

get_operation
()¶ getOperation() const =0 > ElementWiseOperation
Get the binary operation for the layer.

set_operation
()¶ setOperation(ElementWiseOperation type)=0
Set the binary operation for the layer.

Derived From C++ Class nvinfer1::ElementWiseLayer
ElementWiseOperation¶

class
tensorrt.infer.
ElementWiseOperation
¶ Type of operation for the layer
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::ElementWiseOperation
ShuffleLayer¶

class
tensorrt.infer.
ShuffleLayer
¶ Layer type for shuffling data.
This class shuffles data by applying in sequence: a transpose operation, a reshape operation and a second transpose operation. The dimension types of the output are those of the reshape dimension.
C++ includes: NvInfer.h

get_first_transpose
()¶ getFirstTranspose() const =0 > Permutation
Get the permutation applied by the first transpose operation.
Returns:  The dimension permutation applied before the reshape.

get_reshape_dimensions
()¶ getReshapeDimensions() const =0 > Dims
Get the reshaped dimensions.
Returns: The reshaped dimensions.

get_second_transpose
()¶ getSecondTranspose() const =0 > Permutation
Get the permutation applied by the second transpose operation.
Returns:  The dimension permutation applied after the reshape.

set_first_transpose
()¶ setFirstTranspose(Permutation permutation)=0
Set the permutation applied by the first transpose operation.
Parameters:  permutation (*) – The dimension permutation applied before the reshape.
The default is the identity permutation.

set_reshape_dimensions
()¶ setReshapeDimensions(Dims dimensions)=0
Set the reshaped dimensions.
Parameters:  dimensions (*) – The reshaped dimensions.
Two special values can be used as dimensions.
Value 0 copies the corresponding dimension from input. This special value can be used more than once in the dimensions. If number of reshape dimensions is less than input, 0s are resolved by aligning the most significant dimensions of input.
Value -1 infers that particular dimension by looking at the input and the rest of the reshape dimensions. Note that only a maximum of one dimension is permitted to be specified as -1.
The product of the new dimensions must be equal to the product of the old.
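How the two special values resolve can be sketched in plain Python. The helper resolve_reshape_dims is hypothetical; it treats 0 as "copy the input dimension at the same position" and -1 as the single dimension inferred from the remaining element count.

```python
from math import prod


def resolve_reshape_dims(input_dims, reshape_dims):
    """Resolve 0 and -1 placeholders in a reshape specification."""
    resolved = []
    for i, d in enumerate(reshape_dims):
        if d == 0:
            if i >= len(input_dims):
                raise ValueError("0 has no matching input dimension")
            resolved.append(input_dims[i])  # copy from the input
        else:
            resolved.append(d)
    if resolved.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = prod(input_dims)
    if -1 in resolved:
        known = prod(d for d in resolved if d != -1)
        if known == 0 or total % known:
            raise ValueError("cannot infer the -1 dimension")
        resolved[resolved.index(-1)] = total // known
    if prod(resolved) != total:
        raise ValueError("element count must be preserved")
    return tuple(resolved)


print(resolve_reshape_dims((3, 4, 5), (0, -1)))
```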

set_second_transpose
()¶ setSecondTranspose(Permutation permutation)=0
Set the permutation applied by the second transpose operation.
Parameters:  permutation (*) – The dimension permutation applied after the reshape.
The default is the identity permutation.
The permutation is applied as outputDimensionIndex = permutation.order[inputDimensionIndex], so to permute from CHW order to HWC order, the required permutation is [1, 2, 0]

Derived From C++ Class nvinfer1::ShuffleLayer
Permutation¶

class
tensorrt.infer.
Permutation
¶ 
order
int – The elements of the permutation. The permutation is applied as outputDimensionIndex = permutation.order[inputDimensionIndex], so to permute from CHW order to HWC order, the required permutation is [1, 2, 0], and to permute from HWC to CHW, the required permutation is [2, 0, 1].
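The two worked examples can be reproduced with a short plain-Python sketch. The function permute_dims is a hypothetical helper, and it follows the examples by taking output dimension i from input dimension order[i]; that reading is an assumption chosen because it matches both [1, 2, 0] and [2, 0, 1] as given.

```python
def permute_dims(dims, order):
    """Reorder dims so that output[i] = dims[order[i]]."""
    if sorted(order) != list(range(len(dims))):
        raise ValueError("order must be a permutation of the dimension indices")
    return tuple(dims[i] for i in order)


print(permute_dims(("C", "H", "W"), [1, 2, 0]))  # CHW to HWC
print(permute_dims(("H", "W", "C"), [2, 0, 1]))  # HWC to CHW
```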

Derived From C++ Class nvinfer1::Permutation
UnaryLayer¶

class
tensorrt.infer.
UnaryLayer
¶ Layer that represents a unary operation.
C++ includes: NvInfer.h

get_operation
()¶ getOperation() const =0 > UnaryOperation
Get the unary operation for the layer.

set_operation
()¶ setOperation(UnaryOperation op)=0
Set the unary operation for the layer.

Derived From C++ Class nvinfer1::UnaryLayer
UnaryOperation¶

class
tensorrt.infer.
UnaryOperation
¶ Type of operation for the layer
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::UnaryOperation
PluginLayer¶
PaddingLayer¶

class
tensorrt.infer.
PaddingLayer
¶ Layer that represents a padding operation.
The padding layer adds zero-padding at the start and end of the input tensor. It only supports padding along the two innermost dimensions. Applying negative padding results in cropping of the input.
C++ includes: NvInfer.h

get_post_padding
()¶ getPostPadding() const =0 > DimsHW
Get the padding that is applied at the end of the tensor.

get_pre_padding
()¶ getPrePadding() const =0 > DimsHW
Get the padding that is applied at the start of the tensor.

set_post_padding
()¶ setPostPadding(DimsHW padding)=0
Set the padding that is applied at the end of the tensor.
Negative padding results in trimming the edge by the specified amount.

set_pre_padding
()¶ setPrePadding(DimsHW padding)=0
Set the padding that is applied at the start of the tensor.
Negative padding results in trimming the edge by the specified amount.
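The effect of pre- and post-padding on the two innermost dimensions, including negative values that crop, can be sketched as follows. The helper padded_hw is a hypothetical name; padding tuples are given as (height, width).

```python
def padded_hw(height, width, pre_padding=(0, 0), post_padding=(0, 0)):
    """Output (H, W) after applying pre- and post-padding.

    Negative padding values trim the corresponding edge.
    """
    out_h = height + pre_padding[0] + post_padding[0]
    out_w = width + pre_padding[1] + post_padding[1]
    if out_h < 0 or out_w < 0:
        raise ValueError("cropping removes more than the available extent")
    return out_h, out_w


print(padded_hw(8, 8, pre_padding=(1, 2), post_padding=(1, 2)))
print(padded_hw(8, 8, pre_padding=(-1, 0), post_padding=(-1, 0)))
```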

Derived From C++ Class nvinfer1::PaddingLayer
RNNLayer¶

class
tensorrt.infer.
RNNLayer
¶ An RNN layer in a network definition.
This layer applies an RNN operation on the inputs.
Deprecated: This interface is superseded by IRNNv2Layer.
C++ includes: NvInfer.h

get_bias
()¶ getBias() const =0 > Weights
Get the bias parameter vector for the RNN.

get_cell_state
()¶ getCellState() const =0 > ITensor *
Get the initial cell state of the RNN.
Returns: None if no initial cell tensor was specified, the initial cell data otherwise.

get_data_length
()¶ getDataLength() const =0 > int
Get the length of the data being processed by the RNN for use in computing other values.

get_direction
()¶ getDirection() const =0 > RNNDirection
Get the direction of the RNN layer.

get_hidden_size
()¶ getHiddenSize() const =0 > std::size_t
Get the size of the hidden layers.
The hidden size is the value of hiddenSize parameter passed into addRNN().
Returns:  The internal hidden layer size for the RNN.

get_hidden_state
()¶ getHiddenState() const =0 > ITensor *
Get the initial hidden state of the RNN.
Returns: None if no initial hidden tensor was specified, the initial hidden data otherwise.

get_input_mode
()¶ getInputMode() const =0 > RNNInputMode
Get the input mode of the RNN layer.

get_layer_count
()¶ getLayerCount() const =0 > unsigned
Get the number of layers in the RNN.
Returns: The number of layers in the RNN.

get_operation
()¶ getOperation() const =0 > RNNOperation
Get the operation of the RNN layer.

get_seq_length
()¶ getSeqLength() const =0 > int
Get the sequence length.
The sequence length is the maximum number of time steps passed into the addRNN() function. This is also the maximum number of input tensors that the RNN can process at once.
Returns: The maximum number of time steps that can be executed by a single call to the RNN layer.

get_weights
()¶ getWeights() const =0 > Weights
Get the W weights for the RNN.

set_bias
()¶ setBias(Weights bias)=0
Set the bias parameters for the RNN.
Parameters:  bias (*) – The weight structure holding the bias parameters.
The trained weights for the bias parameter vectors of the RNN. The DataType for this structure must be kFLOAT or kHALF, and must be the same datatype as the input tensor.
The layout of the weight structure depends on the RNNOperation, RNNInputMode, and RNNDirection of the layer. The array specified by weights.values contains a sequence of bias vectors, where each bias vector is linearly appended after the previous without padding; e.g. if bias vector 0 and 1 have M and N elements respectively, then the layout of weights.values in memory looks like:
index | 0 1 2 3 4 ... M-2 M-1 | M M+1 ... M+N-2 M+N-1 | M+N M+N+1 M+N+2 ... | ...
data  | bias vector 0         | bias vector 1         | bias vector 2       | ...
The ordering of bias vectors is similar to the ordering of weight matrices as described in setWeights(). To determine the order of bias vectors for a given RNN configuration, determine the ordered list of weight matrices [ W0, W1, …, Wn ]. Then replace each weight matrix with its corresponding bias vector, i.e. apply the following transform (for layer l, gate g):
 Wl[g] becomes Wbl[g]
 Rl[g] becomes Rbl[g]
For example:
an RNN with getLayerCount() == 3, getDirection() == kUNIDIRECTION, and getOperation() == kRELU has the following order:
[ Wb0[i], Rb0[i], Wb1[i], Rb1[i], Wb2[i], Rb2[i] ]
an RNN with getLayerCount() == 2, getDirection() == kUNIDIRECTION, and getOperation() == kGRU has the following order:
[ Wb0[z], Wb0[r], Wb0[h], Rb0[z], Rb0[r], Rb0[h], Wb1[z], Wb1[r], Wb1[h], Rb1[z], Rb1[r], Rb1[h] ]
an RNN with getLayerCount() == 2, getDirection() == kBIDIRECTION, and getOperation() == kRELU has the following order:
[ Wb0_fw[i], Rb0_fw[i], Wb0_bw[i], Rb0_bw[i], Wb1_fw[i], Rb1_fw[i], Wb1_bw[i], Rb1_bw[i] ]
(fw = “forward”, bw = “backward”)
Each bias vector has a fixed size, getHiddenSize().
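The ordering rule above can be sketched in plain Python. This is an illustration only, not part of the tensorrt.infer API: the helper names bias_vector_order and bias_offsets are hypothetical, and the gate lists (["i"] for kRELU, ["z", "r", "h"] for kGRU) are taken from the examples above.

```python
def bias_vector_order(layer_count, gates, bidirectional=False):
    """Return bias vector names in the order shown in the examples above.

    Hypothetical helper. `gates` lists the gate names for the operation,
    e.g. ["i"] for kRELU or ["z", "r", "h"] for kGRU.
    """
    suffixes = ["_fw", "_bw"] if bidirectional else [""]
    order = []
    for layer in range(layer_count):
        for suffix in suffixes:
            # For each cell: all input-side biases (Wb), then all recurrent-side biases (Rb).
            for prefix in ("Wb", "Rb"):
                for gate in gates:
                    order.append(f"{prefix}{layer}{suffix}[{gate}]")
    return order


def bias_offsets(order, hidden_size):
    """Map each bias vector name to its start index in the flat values array.

    Every bias vector has hidden_size elements and vectors are appended
    without padding, so vector i starts at i * hidden_size.
    """
    return {name: i * hidden_size for i, name in enumerate(order)}
```

For a two-layer GRU with a hidden size of 4, bias_offsets(bias_vector_order(2, ["z", "r", "h"]), 4) places Rb0[z] at index 12, because three Wb vectors of four elements each precede it.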

set_cell_state
()¶ setCellState(ITensor &cell)=0
Set the initial cell state of the RNN with the provided cell ITensor.
Parameters:  cell (*) – The initial cell state of the RNN.
The layout for cell is a linear layout of a 3D matrix:
 C: The number of layers in the RNN; it must match getLayerCount().
 H: The number of minibatches for each time sequence.
 W: The size of the per-layer hidden states; it must match getHiddenSize().
If cell is not specified, then the initial cell state is set to zero.
The amount of space required is doubled if getDirection() is kBIDIRECTION, with the bidirectional states coming after the unidirectional states.
The cell state only affects LSTM RNNs.

set_direction
()¶ setDirection(RNNDirection op)=0
Set the direction of the RNN layer.
The direction determines whether the RNN runs unidirectionally (left to right) or bidirectionally (left to right and right to left). In the kBIDIRECTION case the outputs are concatenated, resulting in an output size of 2x getHiddenSize().
setHiddenState(ITensor &hidden)=0
Set the initial hidden state of the RNN with the provided hidden ITensor.
Parameters:  hidden (*) – The initial hidden state of the RNN.
The layout for hidden is a linear layout of a 3D matrix:
 C: The number of layers in the RNN; it must match getLayerCount().
 H: The number of minibatches for each time sequence.
 W: The size of the per-layer hidden states; it must match getHiddenSize().
The amount of space required is doubled if getDirection() is kBIDIRECTION with the bidirectional states coming after the unidirectional states.
If hidden is not specified, then the initial hidden state is set to zero.
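As a rough check on the space requirement described above, the element count of an initial state tensor can be computed as follows. The function name is hypothetical, and the sketch only counts elements; it makes no claim about which dimension carries the doubling.

```python
def initial_state_element_count(layer_count, batch_size, hidden_size, bidirectional=False):
    """Number of elements in an initial hidden (or cell) state tensor.

    Hypothetical helper: C * H * W elements (layers * minibatches * hidden size),
    doubled when the RNN is bidirectional.
    """
    count = layer_count * batch_size * hidden_size
    return 2 * count if bidirectional else count
```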

set_input_mode
()¶ setInputMode(RNNInputMode op)=0
Set the input mode of the RNN layer.

set_operation
()¶ setOperation(RNNOperation op)=0
Set the operation of the RNN layer.

set_weights
()¶ setWeights(Weights weights)=0
Set the weight parameters for the RNN.
Parameters:  weights (*) – The weight structure holding the weight parameters.
The trained weights for the weight parameter matrices of the RNN. The DataType for this structure must be kFLOAT or kHALF, and must be the same datatype as the input tensor.
The layout of the weight structure depends on the RNNOperation, RNNInputMode, and RNNDirection of the layer. The array specified by weights.values contains a sequence of parameter matrices, where each parameter matrix is linearly appended after the previous without padding; e.g., if parameter matrix 0 and 1 have M and N elements respectively, then the layout of weights.values in memory looks like:
index | 0 1 2 3 4 ... M-2 M-1 | M M+1 ... M+N-2 M+N-1 | M+N M+N+1 M+N+2 ... | ...
data  | parameter matrix 0    | parameter matrix 1    | parameter matrix 2  | ...
The following sections describe the order of weight matrices and the layout of elements within a weight matrix.
Order of weight matrices
The parameter matrices are ordered as described below:
For example:
an RNN with getLayerCount() == 3, getDirection() == kUNIDIRECTION, and getOperation() == kRELU has the following order:
[ W0[i], R0[i], W1[i], R1[i], W2[i], R2[i] ]
an RNN with getLayerCount() == 2, getDirection() == kUNIDIRECTION, and getOperation() == kGRU has the following order:
[ W0[z], W0[r], W0[h], R0[z], R0[r], R0[h], W1[z], W1[r], W1[h], R1[z], R1[r], R1[h] ]
an RNN with getLayerCount() == 2, getDirection() == kBIDIRECTION, and getOperation() == kRELU has the following order:
[ W0_fw[i], R0_fw[i], W0_bw[i], R0_bw[i], W1_fw[i], R1_fw[i], W1_bw[i], R1_bw[i] ]
(fw = “forward”, bw = “backward”)
Layout of elements within a weight matrix
Each parameter matrix is row-major in memory, and has the following dimensions:
In other words, the input weights of the first layer of the RNN (if not skipped) transform a getDataLength()-size column vector into a getHiddenSize()-size column vector. The input weights of subsequent layers transform a K*getHiddenSize()-size column vector into a getHiddenSize()-size column vector. K=2 in the bidirectional case to account for the full hidden state being the concatenation of the forward and backward RNN hidden states.
The recurrent weight matrices for all layers all have shape (H, H), both in the unidirectional and bidirectional cases. (In the bidirectional case, each recurrent weight matrix for the (forward or backward) RNN cell operates on the previous (forward or backward) RNN cell’s hidden state, which is size H).
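The shapes and ordering described above can be combined into a short sketch that computes the total number of weights to supply. It is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the first-layer input weights are assumed not to be skipped, and shapes are written as (rows, columns) with E standing for getDataLength() and H for getHiddenSize().

```python
def weight_matrix_shapes(layer_count, hidden_size, data_length, gates, bidirectional=False):
    """List (name, (rows, cols)) for each weight matrix, in packing order.

    Hypothetical helper. Assumes the first-layer input weights are present
    (input mode does not skip them).
    """
    k = 2 if bidirectional else 1
    suffixes = ["_fw", "_bw"] if bidirectional else [""]
    shapes = []
    for layer in range(layer_count):
        # Layer 0 consumes the input data; later layers consume the
        # (possibly concatenated) hidden state of the previous layer.
        in_cols = data_length if layer == 0 else k * hidden_size
        for suffix in suffixes:
            for gate in gates:
                shapes.append((f"W{layer}{suffix}[{gate}]", (hidden_size, in_cols)))
            for gate in gates:
                # Recurrent matrices are always (H, H).
                shapes.append((f"R{layer}{suffix}[{gate}]", (hidden_size, hidden_size)))
    return shapes


def total_weight_count(shapes):
    """Total number of elements across all matrices (no padding between them)."""
    return sum(rows * cols for _, (rows, cols) in shapes)
```

For a bidirectional kRELU RNN with 2 layers, H = 4 and E = 3, the first layer contributes 2 * (12 + 16) = 56 elements and the second 2 * (32 + 16) = 96, for 152 in total.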

Derived From C++ Class nvinfer1::RNNLayer
RNNOperation¶

class
tensorrt.infer.
RNNOperation
¶ Type of operation for the layer
 Base Class:
 IntEnum
Derived From C++ Class nvinfer1::RNNOperation
Int8 Calibration¶
Int8Calibrator¶

class
tensorrt.infer.
Int8Calibrator
¶ Application-implemented interface for calibration.
Calibration is a step performed by the builder when deciding suitable scale factors for 8-bit inference.
The calibrator must also provide a method for retrieving representative images which the calibration process can use to examine the distribution of activations. It may optionally implement a method for caching the calibration result for reuse on subsequent runs.
C++ includes: NvInfer.h

get_algorithm
()¶ getAlgorithm()=0 > CalibrationAlgoType
Get the algorithm used by this calibrator.
Returns: The algorithm used by the calibrator.

get_batch
()¶ getBatch(void *bindings[], const char *names[], int nbBindings)=0 > bool
Get a batch of input for calibration.
The batch size of the input must match the batch size returned by getBatchSize().
Parameters:  bindings (*) – An array of pointers to device memory that must be set to the memory containing each network input data.
 names (*) – The names of the network input for each pointer in the binding array.
 nbBindings (*) – The number of pointers in the bindings array.
Returns:  False if there are no more batches for calibration.
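The batching contract above (every batch has the size reported by getBatchSize(), and False signals that no batches remain) can be sketched with a plain Python helper. The BatchStream class and its next_batch method are hypothetical and do not touch device memory; a real calibrator would copy each batch to the device and fill in the bindings.

```python
class BatchStream:
    """Hypothetical helper that hands out fixed-size calibration batches."""

    def __init__(self, samples, batch_size):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.position = 0

    def get_batch_size(self):
        return self.batch_size

    def next_batch(self):
        """Return the next full batch, or False when no full batch remains."""
        end = self.position + self.batch_size
        if end > len(self.samples):
            # A trailing partial batch is dropped so every batch matches batch_size.
            return False
        batch = self.samples[self.position:end]
        self.position = end
        return batch
```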

get_batch_size
()¶ getBatchSize() const =0 > int
Get the batch size used for calibration batches.
Returns: The batch size.

read_calibration_cache
()¶ readCalibrationCache(std::size_t &length)=0 > const void *
Load a calibration cache.
Calibration is potentially expensive, so it can be useful to generate the calibration data once, then use it on subsequent builds of the network. The cache includes the regression cutoff and quantile values used to generate it, and will not be used if these do not match the settings of the current calibrator. However, the network should also be recalibrated if its structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
Parameters: length (*) – The length of the cached data, that should be set by the called function. If there is no data, this should be zero.
Returns: A pointer to the cache, or None if there is no data.

write_calibration_cache
()¶ writeCalibrationCache(const void *ptr, std::size_t length)=0
Save a calibration cache.
Parameters:  ptr (*) – A pointer to the data to cache.
 length (*) – The length in bytes of the data to cache.
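One possible way to back the read and write cache callbacks is a file on disk. The sketch below is a standalone helper, not a subclass of Int8Calibrator; the class name FileCalibrationCache and its read/write methods are hypothetical, and it treats the cache as opaque bytes.

```python
import os


class FileCalibrationCache:
    """Hypothetical file-backed store for an opaque calibration cache."""

    def __init__(self, path):
        self.path = path

    def read(self):
        """Return the cached bytes, or None if there is no data."""
        if not os.path.exists(self.path):
            return None
        with open(self.path, "rb") as f:
            data = f.read()
        # An empty file is treated the same as a missing cache.
        return data if data else None

    def write(self, data):
        """Persist the cache bytes, replacing any previous contents."""
        with open(self.path, "wb") as f:
            f.write(bytes(data))
```

The application remains responsible for deleting the file when the network structure or the input data set changes.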

Derived From C++ Class nvinfer1::Int8Calibrator
Int8EntropyCalibrator¶

class
tensorrt.infer.
Int8EntropyCalibrator
¶ Entropy calibrator. This is the preferred calibrator, as it is less complicated than the legacy calibrator and produces better results.
C++ includes: NvInfer.h
Derived From C++ Class nvinfer1::Int8EntropyCalibrator
Int8LegacyCalibrator¶

class
tensorrt.infer.
Int8LegacyCalibrator
¶ Legacy calibrator for compatibility with 2.0 EA. Will be removed in 2.2. Deprecated
C++ includes: NvInfer.h

get_algorithm
()¶ getAlgorithm() > CalibrationAlgoType
Signal that this is the legacy calibrator.

get_quantile
()¶ getQuantile() const =0 > double
The quantile (between 0 and 1) that will be used to select the region maximum when the quantile method is in use.
See the user guide for more details on how the quantile is used.

get_regression_cutoff
()¶ getRegressionCutoff() const =0 > double
The fraction (between 0 and 1) of the maximum used to define the regression cutoff when using regression to determine the region maximum.
See the user guide for more details on how the regression cutoff is used.

read_histogram_cache
()¶ readHistogramCache(std::size_t &length)=0 > const void *
Load a histogram.
Histogram generation is potentially expensive, so it can be useful to generate the histograms once, then use them when exploring the space of calibrations. The histograms should be regenerated if the network structure changes, or the input data set changes, and it is the responsibility of the application to ensure this.
Parameters: length (*) – The length of the cached data, that should be set by the called function. If there is no data, this should be zero.
Returns: A pointer to the cache, or None if there is no data.

write_histogram_cache
()¶ writeHistogramCache(const void *ptr, std::size_t length)=0
Save a histogram cache.
Parameters:  ptr (*) – A pointer to the data to cache.
 length (*) – The length in bytes of the data to cache.

Derived From C++ Class nvinfer1::Int8LegacyCalibrator
Logger¶
Logger¶

class
tensorrt.infer.
Logger
¶ Application-implemented logging interface for the builder, engine and runtime.
Note that although a logger is passed on creation to each instance of an IBuilder or IRuntime interface, the logger is internally considered a singleton, and thus multiple instances of IRuntime and/or IBuilder must all use the same logger.
C++ includes: NvInfer.h

log
()¶ log(Severity severity, const char *msg)=0
A callback implemented by the application to handle logging messages.
Parameters:  severity (*) – The severity of the message.
 msg (*) – The log message, null-terminated.
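A logging callback of this shape might filter by severity before writing. The sketch below is self-contained and does not subclass Logger: the Severity enum is a stand-in whose member names and numeric ordering (lower value means more severe) are assumptions, and FilteringLogger is a hypothetical name.

```python
import sys
from enum import IntEnum


class Severity(IntEnum):
    # Stand-in levels; the names and ordering are assumptions for illustration.
    INTERNAL_ERROR = 0
    ERROR = 1
    WARNING = 2
    INFO = 3


class FilteringLogger:
    """Hypothetical logger that drops messages less severe than a threshold."""

    def __init__(self, min_severity=Severity.WARNING, stream=None):
        self.min_severity = min_severity
        self.stream = stream if stream is not None else sys.stderr

    def log(self, severity, msg):
        # Larger values are less severe, so skip anything above the threshold.
        if severity > self.min_severity:
            return
        self.stream.write(f"[{Severity(severity).name}] {msg}\n")
```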

Derived From C++ Class nvinfer1::Logger
ConsoleLogger¶

class
tensorrt.infer.
ConsoleLogger
¶ 
log
()¶ log(Severity severity, const char *msg)=0
A callback implemented by the application to handle logging messages.
Parameters:  severity (*) – The severity of the message.
 msg (*) – The log message, null-terminated.

Derived From C++ Class nvinfer1::ConsoleLogger
Profiler¶
Profiler¶

class
tensorrt.infer.
Profiler
¶ Application-implemented interface for profiling.
When this class is added to an execution context, the profiler will be called once per layer for each invocation of execute(). Note that enqueue() does not currently support profiling.
The profiler will only be called after execution is complete. It has a small impact on execution time.
C++ includes: NvInfer.h

report_layer_time
()¶ reportLayerTime(const char *layerName, float ms)=0
Layer time reporting callback.
Parameters:  layerName (*) – The name of the layer, set when constructing the network definition.
 ms (*) – The time in milliseconds to execute the layer.
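Because the callback fires once per layer for each invocation of execute(), a profiler typically accumulates times across runs. A minimal sketch, assuming a standalone class (LayerTimeRecorder is a hypothetical name and does not subclass Profiler):

```python
from collections import defaultdict


class LayerTimeRecorder:
    """Hypothetical accumulator for per-layer timings across executions."""

    def __init__(self):
        self.total_ms = defaultdict(float)
        self.calls = defaultdict(int)

    def report_layer_time(self, layer_name, ms):
        # Called once per layer per execution; accumulate totals and call counts.
        self.total_ms[layer_name] += ms
        self.calls[layer_name] += 1

    def average_ms(self):
        """Return the mean time per layer over all recorded executions."""
        return {name: self.total_ms[name] / self.calls[name] for name in self.total_ms}
```

Averaging over several executions smooths out the variation of any single run.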

Derived From C++ Class nvinfer1::Profiler
ConsoleProfiler¶

class
tensorrt.infer.
ConsoleProfiler
¶ 
mProfile
¶

report_layer_time
()¶ reportLayerTime(const char *layerName, float ms)=0
Layer time reporting callback.
Parameters:  layerName (*) – The name of the layer, set when constructing the network definition.
 ms (*) – The time in milliseconds to execute the layer.

timing_iterations
¶

Derived From C++ Class nvinfer1::ConsoleProfiler