ICudaEngine

tensorrt.TensorIOMode

IO tensor modes for TensorRT.

Members:

NONE : Tensor is not an input or output.

INPUT : Tensor is an input to the engine.

OUTPUT : Tensor is an output of the engine.

class tensorrt.ICudaEngine

An ICudaEngine for executing inference on a built network.

The engine can be indexed with []. When indexed with an integer, it returns the name of the corresponding binding; when indexed with a string, it returns the corresponding binding index.
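The dual indexing can be sketched as a small helper that builds both lookup tables at once. The helper itself is illustrative and not part of the TensorRT API; `engine` is assumed to be a deserialized ICudaEngine:

```python
def binding_maps(engine):
    """Build index->name and name->index tables using the engine's
    dual [] indexing: engine[int] -> binding name, engine[str] -> binding index."""
    names = [engine[i] for i in range(engine.num_bindings)]
    indices = {name: engine[name] for name in names}
    return names, indices
```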

Variables
  • num_bindings – int The number of binding indices.

  • num_io_tensors – int The number of IO tensors.

  • max_batch_size – int [DEPRECATED] The maximum batch size which can be used for inference for an engine built from an INetworkDefinition with implicit batch dimension. For an engine built from an INetworkDefinition with explicit batch dimension, this will always be 1.

  • has_implicit_batch_dimension – bool Whether the engine was built with an implicit batch dimension. This is an engine-wide property: either all tensors in the engine have an implicit batch dimension or none of them do. This is True if and only if the INetworkDefinition from which this engine was built was created without the NetworkDefinitionCreationFlag.EXPLICIT_BATCH flag.

  • num_layers – int The number of layers in the network. This is not necessarily the number of layers in the original INetworkDefinition, as layers may be combined or eliminated while the ICudaEngine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.

  • max_workspace_size – int The amount of workspace the ICudaEngine uses. The workspace size will be no greater than the value provided to the Builder when the ICudaEngine was built, and will typically be smaller. Workspace will be allocated for each IExecutionContext.

  • device_memory_size – int The amount of device memory required by an IExecutionContext.

  • refittable – bool Whether the engine can be refit.

  • name – str The name of the network associated with the engine. The name is set during network creation and is retrieved after building or deserialization.

  • num_optimization_profiles – int The number of optimization profiles defined for this engine. This is always at least 1.

  • error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.

  • engine_capability – EngineCapability The engine capability. See EngineCapability for details.

  • tactic_sources – int The tactic sources required by this engine.

  • profiling_verbosity – The profiling verbosity the builder config was set to when the engine was built.

  • hardware_compatibility_level – The hardware compatibility level of the engine.

  • num_aux_streams – int Read-only. The number of auxiliary streams used by this engine. This is at most the limit set via builder_config.max_aux_streams when the engine was built.
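num_io_tensors pairs naturally with the name-based accessors documented further below (get_tensor_name(), get_tensor_mode(), get_tensor_dtype(), get_tensor_shape()). A sketch of a helper that summarizes every IO tensor; the helper itself is illustrative, and `engine` is assumed to be a deserialized ICudaEngine:

```python
def describe_io_tensors(engine):
    """Summarize each IO tensor via the name-based API as
    (name, mode, dtype, shape) tuples, in tensor-index order."""
    info = []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        info.append((name,
                     engine.get_tensor_mode(name),
                     engine.get_tensor_dtype(name),
                     engine.get_tensor_shape(name)))
    return info
```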

__del__(self: tensorrt.tensorrt.ICudaEngine) None
__exit__(exc_type, exc_value, traceback)

Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.

__getitem__(*args, **kwargs)

Overloaded function.

  1. __getitem__(self: tensorrt.tensorrt.ICudaEngine, arg0: str) -> int

  2. __getitem__(self: tensorrt.tensorrt.ICudaEngine, arg0: int) -> str

__init__(*args, **kwargs)
__len__(self: tensorrt.tensorrt.ICudaEngine) int
binding_is_input(*args, **kwargs)

Overloaded function.

  1. binding_is_input(self: tensorrt.tensorrt.ICudaEngine, index: int) -> bool

    Determine whether a binding is an input binding.

    index

    The binding index.

    returns

    True if the index corresponds to an input binding and the index is in range.

  2. binding_is_input(self: tensorrt.tensorrt.ICudaEngine, name: str) -> bool

    Determine whether a binding is an input binding.

    name

    The name of the tensor corresponding to an engine binding.

    returns

True if the named tensor is an input binding.
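binding_is_input() makes it easy to partition the binding indices into inputs and outputs. A sketch of such a helper (illustrative, not part of the API; `engine` is assumed to be a deserialized ICudaEngine):

```python
def split_bindings(engine):
    """Partition binding indices into input and output lists
    using binding_is_input()."""
    inputs, outputs = [], []
    for i in range(engine.num_bindings):
        (inputs if engine.binding_is_input(i) else outputs).append(i)
    return inputs, outputs
```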

create_engine_inspector(self: tensorrt.tensorrt.ICudaEngine) nvinfer1::IEngineInspector

Create an IEngineInspector which prints out the layer information of an engine or an execution context.

Returns

The IEngineInspector.

create_execution_context(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IExecutionContext

Create an IExecutionContext .

Returns

The newly created IExecutionContext .

create_execution_context_without_device_memory(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IExecutionContext

Create an IExecutionContext without any device memory allocated. The memory for execution of this execution context must be supplied by the application.

Returns

An IExecutionContext without device memory allocated.

get_binding_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, index: int) int

Return the number of bytes per component of an element. The vector component size is returned if get_binding_vectorized_dim() != -1.

Parameters

index – The binding index.

get_binding_components_per_element(self: tensorrt.tensorrt.ICudaEngine, index: int) int

Return the number of components included in one element.

The number of elements in the vectors is returned if get_binding_vectorized_dim() != -1.

Parameters

index – The binding index.

get_binding_dtype(*args, **kwargs)

Overloaded function.

  1. get_binding_dtype(self: tensorrt.tensorrt.ICudaEngine, index: int) -> tensorrt.tensorrt.DataType

    Determine the required data type for a buffer from its binding index.

    index

    The binding index.

    Returns

    The type of data in the buffer.

  2. get_binding_dtype(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.DataType

Determine the required data type for a buffer from its tensor name.

    name

    The name of the tensor corresponding to an engine binding.

    Returns

    The type of data in the buffer.

get_binding_format(self: tensorrt.tensorrt.ICudaEngine, index: int) tensorrt.tensorrt.TensorFormat

Return the binding format.

Parameters

index – The binding index.

get_binding_format_desc(self: tensorrt.tensorrt.ICudaEngine, index: int) str

Return the human readable description of the tensor format.

The description includes the order, vectorization, data type, strides, etc. For example:

Example 1: kCHW + FP32
“Row major linear FP32 format”
Example 2: kCHW2 + FP16
“Two wide channel vectorized row major FP16 format”
Example 3: kHWC8 + FP16 + Line Stride = 32
“Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
Parameters

index – The binding index.

get_binding_index(self: tensorrt.tensorrt.ICudaEngine, name: str) int

Retrieve the binding index for a named tensor.

You can also use the engine’s __getitem__() with engine[name]. When invoked with a str, this will return the corresponding binding index.

IExecutionContext.execute_async_v2() and IExecutionContext.execute_v2() require an array of buffers. Engine bindings map from tensor names to indices in this array. Binding indices are assigned at ICudaEngine build time, and take values in the range [0 … n-1] where n is the total number of inputs and outputs.

Parameters

name – The tensor name.

Returns

The binding index for the named tensor, or -1 if the name is not found.
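Because execute_v2() and execute_async_v2() take buffers positionally, a common pattern is to use get_binding_index() to place each named buffer at its binding slot. A sketch of such a helper (illustrative; `engine` is assumed to be a deserialized ICudaEngine, and `device_ptrs` maps tensor names to device addresses):

```python
def ordered_buffers(engine, device_ptrs):
    """Arrange device pointers in binding-index order, as required by
    IExecutionContext.execute_v2() / execute_async_v2()."""
    buffers = [0] * engine.num_bindings
    for name, ptr in device_ptrs.items():
        idx = engine.get_binding_index(name)
        if idx == -1:  # get_binding_index() returns -1 for unknown names
            raise KeyError(f"no binding named {name!r}")
        buffers[idx] = ptr
    return buffers
```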

get_binding_name(self: tensorrt.tensorrt.ICudaEngine, index: int) str

Retrieve the name corresponding to a binding index.

You can also use the engine’s __getitem__() with engine[index]. When invoked with an int, this will return the corresponding binding name.

This is the reverse mapping to that provided by get_binding_index() .

Parameters

index – The binding index.

Returns

The name corresponding to the binding index.

get_binding_shape(*args, **kwargs)

Overloaded function.

  1. get_binding_shape(self: tensorrt.tensorrt.ICudaEngine, index: int) -> tensorrt.tensorrt.Dims

    Get the shape of a binding.

    index

    The binding index.

    Returns

    The shape of the binding if the index is in range, otherwise Dims()

  2. get_binding_shape(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.Dims

    Get the shape of a binding.

    name

    The name of the tensor corresponding to an engine binding.

    Returns

    The shape of the binding if the tensor is present, otherwise Dims()

get_binding_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, index: int) int

Return the dimension index along which the buffer is vectorized.

-1 is returned if the number of scalars per vector is 1.

Parameters

index – The binding index.

get_location(*args, **kwargs)

Overloaded function.

  1. get_location(self: tensorrt.tensorrt.ICudaEngine, index: int) -> tensorrt.tensorrt.TensorLocation

    Get location of binding. This lets you know whether the binding should be a pointer to device or host memory.

    index

    The binding index.

    returns

The location of the bound tensor with the given index.

  2. get_location(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorLocation

    Get location of binding. This lets you know whether the binding should be a pointer to device or host memory.

    name

    The name of the tensor corresponding to an engine binding.

    returns

The location of the bound tensor with the given name.

get_profile_shape(*args, **kwargs)

Overloaded function.

  1. get_profile_shape(self: tensorrt.tensorrt.ICudaEngine, profile_index: int, binding: int) -> List[tensorrt.tensorrt.Dims]

    Get the minimum/optimum/maximum dimensions for a particular binding under an optimization profile.

    profile_index

    The index of the profile.

    binding

    The binding index.

    returns

    A List[Dims] of length 3, containing the minimum, optimum, and maximum shapes, in that order.

  2. get_profile_shape(self: tensorrt.tensorrt.ICudaEngine, profile_index: int, binding: str) -> List[tensorrt.tensorrt.Dims]

    Get the minimum/optimum/maximum dimensions for a particular binding under an optimization profile.

    profile_index

    The index of the profile.

    binding

    The binding name.

    returns

    A List[Dims] of length 3, containing the minimum, optimum, and maximum shapes, in that order.
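Since the returned list is always ordered [min, opt, max], it can be unpacked directly. A sketch of a helper that collects the ranges for a binding across all profiles (illustrative; `engine` is assumed to be a deserialized ICudaEngine):

```python
def profile_ranges(engine, binding):
    """Collect the (min, opt, max) shapes for `binding` under every
    optimization profile defined in the engine."""
    ranges = []
    for p in range(engine.num_optimization_profiles):
        min_shape, opt_shape, max_shape = engine.get_profile_shape(p, binding)
        ranges.append({"min": min_shape, "opt": opt_shape, "max": max_shape})
    return ranges
```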

get_profile_shape_input(*args, **kwargs)

Overloaded function.

  1. get_profile_shape_input(self: tensorrt.tensorrt.ICudaEngine, profile_index: int, binding: int) -> List[List[int]]

    Get minimum/optimum/maximum values for an input shape binding under an optimization profile. If the specified binding is not an input shape binding, an exception is raised.

    profile_index

    The index of the profile.

    binding

    The binding index.

    returns

    A List[List[int]] of length 3, containing the minimum, optimum, and maximum values, in that order. If the values have not been set yet, an empty list is returned.

  2. get_profile_shape_input(self: tensorrt.tensorrt.ICudaEngine, profile_index: int, binding: str) -> List[List[int]]

    Get minimum/optimum/maximum values for an input shape binding under an optimization profile. If the specified binding is not an input shape binding, an exception is raised.

    profile_index

    The index of the profile.

    binding

    The binding name.

    returns

    A List[List[int]] of length 3, containing the minimum, optimum, and maximum values, in that order. If the values have not been set yet, an empty list is returned.

get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str) int

Return the number of bytes per component of an element.

The vector component size is returned if get_tensor_vectorized_dim() != -1.

Parameters

name – The tensor name.

get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str) int

Return the number of components included in one element.

The number of elements in the vectors is returned if get_tensor_vectorized_dim() != -1.

Parameters

name – The tensor name.

get_tensor_dtype(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.DataType

Return the required data type for a buffer from its tensor name.

Parameters

name – The tensor name.

get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorFormat

Return the tensor format.

Parameters

name – The tensor name.

get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str) str

Return the human readable description of the tensor format.

The description includes the order, vectorization, data type, strides, etc. For example:

Example 1: kCHW + FP32
“Row major linear FP32 format”
Example 2: kCHW2 + FP16
“Two wide channel vectorized row major FP16 format”
Example 3: kHWC8 + FP16 + Line Stride = 32
“Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
Parameters

name – The tensor name.

get_tensor_location(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorLocation

Determine whether an input or output tensor must be on GPU or CPU.

Parameters

name – The tensor name.
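A typical use of get_tensor_location() is deciding where each buffer must be allocated before inference. A sketch (the helper and its device_location parameter are illustrative; in real code device_location would be tensorrt.TensorLocation.DEVICE, and `engine` a deserialized ICudaEngine):

```python
def partition_by_location(engine, names, device_location):
    """Split tensor names by where their buffers must live:
    device (GPU) memory vs. host (CPU) memory."""
    device, host = [], []
    for name in names:
        if engine.get_tensor_location(name) == device_location:
            device.append(name)
        else:
            host.append(name)
    return device, host
```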

get_tensor_mode(self: tensorrt.tensorrt.ICudaEngine, name: str) nvinfer1::TensorIOMode

Determine whether a tensor is an input or output tensor.

Parameters

name – The tensor name.

get_tensor_name(self: tensorrt.tensorrt.ICudaEngine, index: int) str

Return the name of an input or output tensor.

Parameters

index – The tensor index.

get_tensor_profile_shape(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) List[tensorrt.tensorrt.Dims]

Get the minimum/optimum/maximum dimensions for a particular tensor under an optimization profile.

Parameters
  • name – The tensor name.

  • profile_index – The index of the profile.

get_tensor_shape(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.Dims

Return the shape of an input or output tensor.

Parameters

name – The tensor name.

get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str) int

Return the dimension index along which the buffer is vectorized.

-1 is returned if the number of scalars per vector is 1.

Parameters

name – The tensor name.
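The three vectorization queries combine to give the size of one (possibly vectorized) element. A sketch of the arithmetic (illustrative helper; `engine` is assumed to be a deserialized ICudaEngine), relying on get_tensor_components_per_element() returning 1 when get_tensor_vectorized_dim() == -1:

```python
def element_bytes(engine, name):
    """Size in bytes of one element of the named tensor.
    For non-vectorized tensors this reduces to bytes-per-component,
    since components-per-element is 1 in that case."""
    per_component = engine.get_tensor_bytes_per_component(name)
    components = engine.get_tensor_components_per_element(name)
    return per_component * components
```

For example, an FP16 tensor in an 8-wide vectorized format would yield 2 * 8 = 16 bytes per element.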

is_execution_binding(self: tensorrt.tensorrt.ICudaEngine, binding: int) bool

Returns True if the tensor is required for the execution phase, False otherwise.

For example, if a network uses an input tensor with binding i ONLY as the reshape dimensions for an IShuffleLayer , then is_execution_binding(i) == False, and a binding of 0 can be supplied for it when calling IExecutionContext.execute_v2() or IExecutionContext.execute_async_v2() .

Parameters

binding – The binding index.

is_shape_binding(self: tensorrt.tensorrt.ICudaEngine, binding: int) bool

Returns True if the tensor is required as an input for shape calculations, or is an output of them.

TensorRT evaluates a network in two phases:

  1. Compute shape information required to determine memory allocation requirements and validate that runtime sizes make sense.

  2. Process tensors on the device.

Some tensors are required in phase 1. These tensors are called “shape tensors”, and always have type tensorrt.int32 and no more than one dimension. These tensors are not always shapes themselves, but might be used to calculate tensor shapes for phase 2.

is_shape_binding() returns True if the tensor is a required input or an output computed in phase 1. is_execution_binding() returns True if the tensor is a required input or an output computed in phase 2.

For example, if a network uses an input tensor with binding i as an input to an IElementWiseLayer that computes the reshape dimensions for an IShuffleLayer , is_shape_binding(i) == True

It’s possible to have a tensor be required by both phases. For instance, a tensor can be used as a shape in an IShuffleLayer and as the indices for an IGatherLayer collecting floating-point data.

It’s also possible to have a tensor required by neither phase that shows up in the engine’s inputs. For example, if an input tensor is used only as an input to an IShapeLayer , only its shape matters and its values are irrelevant.

Parameters

binding – The binding index.
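The two-phase model above gives four possible categories per binding: shape-only, execution-only, both, or neither. A sketch of a classifier (illustrative helper; `engine` is assumed to be a deserialized ICudaEngine):

```python
def classify_bindings(engine):
    """Bucket each binding index by which evaluation phases need it,
    combining is_shape_binding() and is_execution_binding()."""
    buckets = {"shape_only": [], "execution_only": [], "both": [], "neither": []}
    for i in range(engine.num_bindings):
        shape = engine.is_shape_binding(i)
        execution = engine.is_execution_binding(i)
        key = ("both" if shape and execution else
               "shape_only" if shape else
               "execution_only" if execution else "neither")
        buckets[key].append(i)
    return buckets
```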

is_shape_inference_io(self: tensorrt.tensorrt.ICudaEngine, name: str) bool

Determine whether a tensor is read or written by IExecutionContext.infer_shapes().

Parameters

name – The tensor name.

serialize(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IHostMemory

Serialize the engine to a stream.

Returns

An IHostMemory object containing the serialized ICudaEngine .
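A minimal sketch of persisting the serialized plan to disk; the save_engine helper is illustrative (the serialized IHostMemory buffer can be written like bytes), and `engine` is assumed to be a built or deserialized ICudaEngine:

```python
def save_engine(engine, path):
    """Write the serialized engine plan to a file.
    serialize() returns an IHostMemory buffer, which the
    binary file API accepts via the buffer protocol."""
    with open(path, "wb") as f:
        f.write(engine.serialize())
```

The saved plan can later be reloaded by reading the file back and deserializing it through a tensorrt.Runtime.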