ICudaEngine
- tensorrt.TensorIOMode
IO tensor modes for TensorRT.
Members:
NONE : Tensor is not an input or output.
INPUT : Tensor is an input to the engine.
OUTPUT : Tensor is an output of the engine.
- class tensorrt.ICudaEngine
An ICudaEngine for executing inference on a built network.
The engine can be indexed with []. When indexed with an integer, it returns the corresponding binding name; when indexed with a string, it returns the corresponding binding index. A short usage sketch follows the variable list below.
- Variables
num_io_tensors – int The number of IO tensors.
has_implicit_batch_dimension – bool [DEPRECATED] Deprecated in TensorRT 10.0. Always false since support for implicit batch dimensions has been removed.
num_layers – int The number of layers in the network. This is not necessarily the number of layers in the original INetworkDefinition, as layers may be combined or eliminated while the ICudaEngine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.
max_workspace_size – int The amount of workspace the ICudaEngine uses. The workspace size will be no greater than the value provided to the Builder when the ICudaEngine was built, and will typically be smaller. Workspace will be allocated for each IExecutionContext.
device_memory_size – int The amount of device memory required by an IExecutionContext.
device_memory_size_v2 – int The amount of device memory required by an IExecutionContext. The return value depends on the weight streaming budget if enabled.
refittable – bool Whether the engine can be refit.
name – str The name of the network associated with the engine. The name is set during network creation and is retrieved after building or deserialization.
num_optimization_profiles – int The number of optimization profiles defined for this engine. This is always at least 1.
error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.
engine_capability – EngineCapability The engine capability. See EngineCapability for details.
tactic_sources – int The tactic sources required by this engine.
profiling_verbosity – The profiling verbosity the builder config was set to when the engine was built.
hardware_compatibility_level – The hardware compatibility level of the engine.
num_aux_streams – Read-only. The number of auxiliary streams used by this engine, which will be less than or equal to the maximum number of auxiliary streams set via builder_config.max_aux_streams when the engine was built.
weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Set and get the current weight streaming budget for inference. The budget may be set to -1 (disabling weight streaming at runtime), 0 (the default, letting TensorRT choose whether to stream weights), or a positive value in the inclusive range [minimum_weight_streaming_budget, streamable_weights_size - 1].
minimum_weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Returns the minimum weight streaming budget in bytes required to run the network successfully. The engine must have been built with kWEIGHT_STREAMING.
streamable_weights_size – Returns the size of the streamable weights in the engine. This may not include all the weights.
weight_streaming_budget_v2 – Set and get the current weight streaming budget for inference. The budget may be set to any non-negative value. A value of 0 streams the most weights. Values equal to streamable_weights_size (the default) or larger disable weight streaming.
weight_streaming_scratch_memory_size – The amount of scratch memory required by a TensorRT IExecutionContext to perform inference. This value may change based on the current weight streaming budget. Please use the V2 memory APIs, engine.device_memory_size_v2 and IExecutionContext.set_device_memory(), to provide memory that includes the current weight streaming scratch memory. Memory provided through the V1 APIs does not include this scratch memory, so TensorRT will allocate it itself.
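As a usage sketch for the indexing behavior described above: the snippet below deserializes an engine and round-trips between binding indices and names. It assumes a previously serialized engine at the placeholder path engine.plan.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    # "engine.plan" is a placeholder path to a previously serialized engine.
    with open("engine.plan", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_io_tensors):
        name = engine[i]          # integer index -> binding name
        assert engine[name] == i  # string index -> binding index
        print(i, name, engine.get_tensor_mode(name))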
- __del__(self: tensorrt.tensorrt.ICudaEngine) None
- __exit__(exc_type, exc_value, traceback)
Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.
- __getitem__(self: tensorrt.tensorrt.ICudaEngine, arg0: int) str
- __init__(*args, **kwargs)
- create_engine_inspector(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.EngineInspector
Create an IEngineInspector which prints out the layer information of an engine or an execution context.
- Returns
The IEngineInspector.
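A minimal inspection sketch, assuming an existing ICudaEngine bound to engine; note that per-layer detail beyond the defaults requires the engine to have been built with detailed profiling verbosity.
    import tensorrt as trt

    inspector = engine.create_engine_inspector()
    # Whole-engine report; LayerInformationFormat.JSON yields structured output.
    print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
    # Single-layer report; layer index 0 is used purely for illustration.
    print(inspector.get_layer_information(0, trt.LayerInformationFormat.ONELINE))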
- create_execution_context(self: tensorrt.tensorrt.ICudaEngine, strategy: tensorrt.tensorrt.ExecutionContextAllocationStrategy = <ExecutionContextAllocationStrategy.STATIC: 0>) tensorrt.tensorrt.IExecutionContext
Create an IExecutionContext and specify the device memory allocation strategy.
- Returns
The newly created IExecutionContext.
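A sketch of two allocation strategies, assuming an existing ICudaEngine bound to engine and the ON_PROFILE_CHANGE enum member available in current releases.
    import tensorrt as trt

    # Default STATIC strategy: device memory is sized once for the worst
    # case across all optimization profiles.
    ctx_static = engine.create_execution_context()

    # ON_PROFILE_CHANGE: memory is resized when the active optimization
    # profile changes.
    ctx_dynamic = engine.create_execution_context(
        trt.ExecutionContextAllocationStrategy.ON_PROFILE_CHANGE)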
- create_execution_context_without_device_memory(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IExecutionContext
Create an IExecutionContext without any device memory allocated. The memory for execution of this context must be supplied by the application.
- Returns
An IExecutionContext without device memory allocated.
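A sketch of supplying the device memory yourself, assuming an existing ICudaEngine bound to engine and the cuda-python bindings for allocation; the two-argument set_device_memory call follows the V2 convention mentioned in the variable list above and is an assumption here.
    from cuda import cudart

    context = engine.create_execution_context_without_device_memory()

    # Size the buffer with the V2 API so any weight streaming scratch
    # memory is included.
    size = engine.device_memory_size_v2
    err, ptr = cudart.cudaMalloc(size)
    context.set_device_memory(ptr, size)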
- create_serialization_config(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.ISerializationConfig
Create a serialization configuration object.
- get_device_memory_size_for_profile(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) int
Return the device memory size required for a certain profile.
- Parameters
profile_index – The index of the profile.
- get_device_memory_size_for_profile_v2(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) int
Return the device memory size required for a certain profile.
The return value may change after the weight streaming budget is changed via weight_streaming_budget_v2 (setWeightStreamingBudgetV2 in C++).
- Parameters
profile_index – The index of the profile.
- get_tensor_bytes_per_component(*args, **kwargs)
Overloaded function.
get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the number of bytes per component of an element.
The vector component size is returned if get_tensor_vectorized_dim() != -1.
- arg name
The tensor name.
get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the number of bytes per component of an element.
The vector component size is returned if get_tensor_vectorized_dim(name, profile_index) != -1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_components_per_element(*args, **kwargs)
Overloaded function.
get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the number of components included in one element.
The number of elements in the vectors is returned if get_tensor_vectorized_dim() != -1.
- arg name
The tensor name.
get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the number of components included in one element.
The number of elements in the vectors is returned if get_tensor_vectorized_dim(name, profile_index) != -1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_dtype(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.DataType
Return the required data type for a buffer from its tensor name.
- Parameters
name – The tensor name.
- get_tensor_format(*args, **kwargs)
Overloaded function.
get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorFormat
Return the tensor format.
- arg name
The tensor name.
get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> tensorrt.tensorrt.TensorFormat
Return the tensor format.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_format_desc(*args, **kwargs)
Overloaded function.
get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str) -> str
Return the human readable description of the tensor format.
The description includes the order, vectorization, data type, strides, etc. For example:
Example 1: CHW + FP32: “Row major linear FP32 format”
Example 2: CHW2 + FP16: “Two wide channel vectorized row major FP16 format”
Example 3: HWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
- arg name
The tensor name.
get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> str
Return the human readable description of the tensor format.
The description includes the order, vectorization, data type, strides, etc. For example:
Example 1: CHW + FP32: “Row major linear FP32 format”
Example 2: CHW2 + FP16: “Two wide channel vectorized row major FP16 format”
Example 3: HWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
- arg name
The tensor name.
- arg profile_index
The index of the profile.
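For example, a small loop (assuming an existing ICudaEngine bound to engine) that prints the format description of every IO tensor:
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        print(f"{name}: {engine.get_tensor_format_desc(name)}")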
- get_tensor_location(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorLocation
Determine whether an input or output tensor must be on GPU or CPU.
- Parameters
name – The tensor name.
- get_tensor_mode(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorIOMode
Determine whether a tensor is an input or output tensor.
- Parameters
name – The tensor name.
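A sketch of partitioning tensors by IO mode with the TensorIOMode enum documented at the top of this page, assuming an existing ICudaEngine bound to engine:
    import tensorrt as trt

    names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    inputs = [n for n in names
              if engine.get_tensor_mode(n) == trt.TensorIOMode.INPUT]
    outputs = [n for n in names
               if engine.get_tensor_mode(n) == trt.TensorIOMode.OUTPUT]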
- get_tensor_name(self: tensorrt.tensorrt.ICudaEngine, index: int) str
Return the name of an input or output tensor.
- Parameters
index – The tensor index.
- get_tensor_profile_shape(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) List[tensorrt.tensorrt.Dims]
Get the minimum/optimum/maximum dimensions for a particular tensor under an optimization profile.
- Parameters
name – The tensor name.
profile_index – The index of the profile.
- get_tensor_profile_values(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) List[List[int]]
Get minimum/optimum/maximum values for an input shape binding under an optimization profile. If the specified binding is not an input shape binding, an exception is raised.
- Parameters
name – The tensor name.
profile_index – The index of the profile.
- Returns
A
List[List[int]]
of length 3, containing the minimum, optimum, and maximum values, in that order. If the values have not been set yet, an empty list is returned.
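A sketch of reading a profile's shape range and running at the optimum shape; it assumes an existing ICudaEngine bound to engine with a dynamically shaped input whose name is the placeholder "input".
    # min/opt/max dimensions are returned in that order for profile 0.
    min_shape, opt_shape, max_shape = engine.get_tensor_profile_shape("input", 0)

    context = engine.create_execution_context()
    context.set_input_shape("input", opt_shape)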
- get_tensor_shape(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.Dims
Return the shape of an input or output tensor.
- Parameters
name – The tensor name.
- get_tensor_vectorized_dim(*args, **kwargs)
Overloaded function.
get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the dimension index along which the buffer is vectorized.
Specifically, -1 is returned if the number of scalars per vector is 1.
- arg name
The tensor name.
get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the dimension index along which the buffer is vectorized.
Specifically, -1 is returned if the number of scalars per vector is 1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
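Taken together with get_tensor_bytes_per_component() and get_tensor_components_per_element(), this allows computing the size in bytes of one element; a sketch assuming an existing ICudaEngine bound to engine and an IO tensor name bound to name:
    vec_dim = engine.get_tensor_vectorized_dim(name)
    if vec_dim != -1:
        # Vectorized format: component size times components per element.
        elem_bytes = (engine.get_tensor_bytes_per_component(name)
                      * engine.get_tensor_components_per_element(name))
    else:
        elem_bytes = engine.get_tensor_bytes_per_component(name)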
- get_weight_streaming_automatic_budget(self: tensorrt.tensorrt.ICudaEngine) int
Get an automatic weight streaming budget based on available device memory. This value may change between TensorRT major and minor versions. Please use ICudaEngine.weight_streaming_budget_v2 to apply the returned budget.
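A sketch of applying the automatic budget, assuming an engine built with the weight streaming build flag:
    budget = engine.get_weight_streaming_automatic_budget()
    engine.weight_streaming_budget_v2 = budget
    # Setting the budget to engine.streamable_weights_size or larger would
    # disable weight streaming instead.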
- is_debug_tensor(self: tensorrt.tensorrt.ICudaEngine, name: str) bool
Determine whether the given name corresponds to a debug tensor.
- Parameters
name – The tensor name.
- is_shape_inference_io(self: tensorrt.tensorrt.ICudaEngine, name: str) bool
Determine whether a tensor is read or written by infer_shapes.
- Parameters
name – The tensor name.
- serialize(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IHostMemory
Serialize the engine to a stream.
- Returns
An
IHostMemory
object containing the serializedICudaEngine
.
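A round-trip sketch, assuming an existing ICudaEngine bound to engine: write the serialized engine to a placeholder path and reload it with a Runtime.
    import tensorrt as trt

    with open("engine.plan", "wb") as f:
        f.write(engine.serialize())

    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    with open("engine.plan", "rb") as f:
        engine2 = runtime.deserialize_cuda_engine(f.read())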
- serialize_with_config(self: tensorrt.tensorrt.ICudaEngine, arg0: tensorrt.tensorrt.ISerializationConfig) tensorrt.tensorrt.IHostMemory
Serialize the engine to a stream using the given serialization configuration.
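A sketch of configurable serialization; it assumes the SerializationFlag.EXCLUDE_WEIGHTS flag available in recent releases (the engine must be refittable for the excluded weights to be supplied again later).
    import tensorrt as trt

    config = engine.create_serialization_config()
    # EXCLUDE_WEIGHTS is an assumed flag here; it shrinks the plan by
    # omitting refittable weights from the serialized blob.
    config.set_flag(trt.SerializationFlag.EXCLUDE_WEIGHTS)
    blob = engine.serialize_with_config(config)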