

IO tensor modes for TensorRT.


NONE : Tensor is not an input or output.

INPUT : Tensor is input to the engine.

OUTPUT : Tensor is output by the engine.

class tensorrt.ICudaEngine

An ICudaEngine for executing inference on a built network.

The engine can be indexed with [] . When indexed in this way with an integer, it will return the corresponding binding name. When indexed with a string, it will return the corresponding binding index.

  • num_io_tensors (int) – The number of IO tensors.

  • has_implicit_batch_dimension (bool) – [DEPRECATED] Deprecated in TensorRT 10.0. Always False, since support for implicit batch dimensions has been removed.

  • num_layers (int) – The number of layers in the network. This is not necessarily the number in the original INetworkDefinition, as layers may be combined or eliminated as the ICudaEngine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.

  • max_workspace_size (int) – The amount of workspace the ICudaEngine uses. The workspace size will be no greater than the value provided to the Builder when the ICudaEngine was built, and will typically be smaller. Workspace will be allocated for each IExecutionContext.

  • device_memory_size (int) – The amount of device memory required by an IExecutionContext.

  • device_memory_size_v2 (int) – The amount of device memory required by an IExecutionContext. The return value depends on the weight streaming budget if enabled.

  • refittable (bool) – Whether the engine can be refit.

  • name (str) – The name of the network associated with the engine. The name is set during network creation and is retrieved after building or deserialization.

  • num_optimization_profiles (int) – The number of optimization profiles defined for this engine. This is always at least 1.

  • error_recorder (IErrorRecorder) – Application-implemented error reporting interface for TensorRT objects.

  • engine_capability (EngineCapability) – The engine capability. See EngineCapability for details.

  • tactic_sources (int) – The tactic sources required by this engine.

  • profiling_verbosity – The profiling verbosity the builder config was set to when the engine was built.

  • hardware_compatibility_level – The hardware compatibility level of the engine.

  • num_aux_streams – Read-only. The number of auxiliary streams used by this engine. This is less than or equal to the maximum number of auxiliary streams set through builder_config.max_aux_streams when the engine was built.

  • weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Set and get the current weight streaming budget for inference. The budget may be set to -1 to disable weight streaming at runtime, to 0 (default) to let TensorRT choose whether to stream weights, or to a positive value in the inclusive range [minimum_weight_streaming_budget, streamable_weights_size - 1].

  • minimum_weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Returns the minimum weight streaming budget in bytes required to run the network successfully. The engine must have been built with kWEIGHT_STREAMING.

  • streamable_weights_size – Returns the size of the streamable weights in the engine. This may not include all the weights.

  • weight_streaming_budget_v2 – Set and get the current weight streaming budget for inference. The budget may be set to any non-negative value. A value of 0 streams the most weights. Values equal to streamable_weights_size (default) or larger will disable weight streaming.

  • weight_streaming_scratch_memory_size – The amount of scratch memory required by a TensorRT ExecutionContext to perform inference. This value may change based on the current weight streaming budget. Please use the V2 memory APIs, engine.device_memory_size_v2 and ExecutionContext.set_device_memory(), to provide memory that includes the current weight streaming scratch memory. If these APIs are not used, or the V1 APIs are used instead, this memory is not included, so TensorRT will allocate it itself.
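The weight streaming attributes above can be combined into a small helper. The following is a minimal sketch, assuming `engine` is an ICudaEngine built with weight streaming enabled and that the budget counts the bytes of streamable weights kept on the device; the helper name `set_resident_weight_fraction` is hypothetical and not part of TensorRT.

```python
def set_resident_weight_fraction(engine, fraction):
    """Set weight_streaming_budget_v2 to a fraction of streamable_weights_size.

    `engine` is assumed to behave like tensorrt.ICudaEngine. Only the
    attributes documented above are used.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")

    streamable = engine.streamable_weights_size
    # A budget of 0 streams the most weights; a budget equal to
    # streamable_weights_size (or larger) disables weight streaming.
    budget = int(streamable * fraction)
    engine.weight_streaming_budget_v2 = budget
    return budget
```

Because device_memory_size_v2 depends on the weight streaming budget, it is probably safest to set the budget before querying memory sizes or creating execution contexts.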

__del__(self: tensorrt.tensorrt.ICudaEngine) -> None
__exit__(exc_type, exc_value, traceback)

Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.

__getitem__(self: tensorrt.tensorrt.ICudaEngine, arg0: int) -> str
__init__(*args, **kwargs)
create_engine_inspector(self: tensorrt.tensorrt.ICudaEngine) -> tensorrt.tensorrt.EngineInspector

Create an IEngineInspector which prints out the layer information of an engine or an execution context.


Returns: The IEngineInspector.

create_execution_context(self: tensorrt.tensorrt.ICudaEngine, strategy: tensorrt.tensorrt.ExecutionContextAllocationStrategy = <ExecutionContextAllocationStrategy.STATIC: 0>) -> tensorrt.tensorrt.IExecutionContext

Create an IExecutionContext and specify the device memory allocation strategy.


Returns: The newly created IExecutionContext.

create_execution_context_without_device_memory(self: tensorrt.tensorrt.ICudaEngine) -> tensorrt.tensorrt.IExecutionContext

Create an IExecutionContext without any device memory allocated. The memory for execution of this context must be supplied by the application.


Returns: An IExecutionContext without device memory allocated.

create_serialization_config(self: tensorrt.tensorrt.ICudaEngine) -> tensorrt.tensorrt.ISerializationConfig

Create a serialization configuration object.

get_device_memory_size_for_profile(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) -> int

Return the device memory size required for a certain profile.


profile_index – The index of the profile.

get_device_memory_size_for_profile_v2(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) -> int

Return the device memory size required for a certain profile.

The return value changes depending on calls to the following API: setWeightStreamingBudgetV2 (exposed in Python as the weight_streaming_budget_v2 attribute).


profile_index – The index of the profile.
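When one device buffer is shared across several optimization profiles, the per-profile query above can be used to size it for the largest profile. This is a minimal sketch, assuming `engine` behaves like an ICudaEngine; the helper name `max_device_memory_across_profiles` is hypothetical.

```python
def max_device_memory_across_profiles(engine):
    """Return the largest per-profile device memory requirement, in bytes."""
    # num_optimization_profiles is documented as always being at least 1,
    # so the list below is never empty.
    sizes = [
        engine.get_device_memory_size_for_profile_v2(profile_index)
        for profile_index in range(engine.num_optimization_profiles)
    ]
    return max(sizes)
```

Since the returned sizes depend on the weight streaming budget, the result should be recomputed after the budget changes.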

get_tensor_bytes_per_component(*args, **kwargs)

Overloaded function.

  1. get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int

    Return the number of bytes per component of an element.

    The vector component size is returned if get_tensor_vectorized_dim() != -1.

    name – The tensor name.

  2. get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int

    Return the number of bytes per component of an element.

    The vector component size is returned if get_tensor_vectorized_dim() != -1.

    name – The tensor name.

    profile_index – The index of the profile.

get_tensor_components_per_element(*args, **kwargs)

Overloaded function.

  1. get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int

    Return the number of components included in one element.

    The number of elements in the vectors is returned if get_tensor_vectorized_dim() != -1.

    name – The tensor name.

  2. get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int

    Return the number of components included in one element.

    The number of elements in the vectors is returned if get_tensor_vectorized_dim() != -1.

    name – The tensor name.

    profile_index – The index of the profile.

get_tensor_dtype(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.DataType

Return the required data type for a buffer from its tensor name.


name – The tensor name.

get_tensor_format(*args, **kwargs)

Overloaded function.

  1. get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorFormat

    Return the tensor format.

    name – The tensor name.

  2. get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> tensorrt.tensorrt.TensorFormat

    Return the tensor format.

    name – The tensor name.

    profile_index – The index of the profile.

get_tensor_format_desc(*args, **kwargs)

Overloaded function.

  1. get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str) -> str

    Return the human-readable description of the tensor format.

    The description includes the order, vectorization, data type, strides, etc. For example:

    Example 1: CHW + FP32
    “Row major linear FP32 format”
    Example 2: CHW2 + FP16
    “Two wide channel vectorized row major FP16 format”
    Example 3: HWC8 + FP16 + Line Stride = 32
    “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
    name – The tensor name.

  2. get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> str

    Return the human-readable description of the tensor format.

    The description includes the order, vectorization, data type, strides, etc. For example:

    Example 1: CHW + FP32
    “Row major linear FP32 format”
    Example 2: CHW2 + FP16
    “Two wide channel vectorized row major FP16 format”
    Example 3: HWC8 + FP16 + Line Stride = 32
    “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
    name – The tensor name.

    profile_index – The index of the profile.

get_tensor_location(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorLocation

Determine whether an input or output tensor must be on GPU or CPU.


name – The tensor name.

get_tensor_mode(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorIOMode

Determine whether a tensor is an input or output tensor.


name – The tensor name.

get_tensor_name(self: tensorrt.tensorrt.ICudaEngine, index: int) -> str

Return the name of an input or output tensor.


index – The tensor index.
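Together, num_io_tensors, get_tensor_name and get_tensor_mode allow walking over every IO tensor of an engine. The sketch below assumes `engine` behaves like an ICudaEngine and that the TensorIOMode values expose a `name` attribute ("INPUT", "OUTPUT", "NONE"), which avoids importing tensorrt in the helper itself; `split_io_tensor_names` is a hypothetical name.

```python
def split_io_tensor_names(engine):
    """Return (input_names, output_names) for all IO tensors of the engine."""
    inputs, outputs = [], []
    for index in range(engine.num_io_tensors):
        name = engine.get_tensor_name(index)
        mode = engine.get_tensor_mode(name)
        # Assumption: the mode enum exposes its member name as `.name`.
        if mode.name == "INPUT":
            inputs.append(name)
        elif mode.name == "OUTPUT":
            outputs.append(name)
    return inputs, outputs
```

With tensorrt imported, comparing against tensorrt.TensorIOMode.INPUT and tensorrt.TensorIOMode.OUTPUT directly is an equivalent and arguably clearer check.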

get_tensor_profile_shape(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> List[tensorrt.tensorrt.Dims]

Get the minimum/optimum/maximum dimensions for a particular tensor under an optimization profile.

  • name – The tensor name.

  • profile_index – The index of the profile.
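The minimum and maximum dimensions of a profile can be used to validate an input shape before running inference. This is a minimal sketch, assuming the returned list holds the minimum, optimum and maximum dimensions in that order and that each entry can be converted to a tuple of ints; `shape_fits_profile` is a hypothetical helper name.

```python
def shape_fits_profile(engine, name, profile_index, shape):
    """Check whether `shape` lies within the profile's min/max dimensions."""
    min_dims, _opt_dims, max_dims = engine.get_tensor_profile_shape(
        name, profile_index
    )
    lower, upper = tuple(min_dims), tuple(max_dims)
    # A shape with a different rank can never match the profile.
    if len(shape) != len(lower):
        return False
    return all(lo <= dim <= hi for lo, dim, hi in zip(lower, shape, upper))
```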

get_tensor_profile_values(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> List[List[int]]

Get minimum/optimum/maximum values for an input shape binding under an optimization profile. If the specified binding is not an input shape binding, an exception is raised.

  • name – The tensor name.

  • profile_index – The index of the profile.


Returns: A List[List[int]] of length 3, containing the minimum, optimum, and maximum values, in that order. If the values have not been set yet, an empty list is returned.

get_tensor_shape(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.Dims

Return the shape of an input or output tensor.


name – The tensor name.

get_tensor_vectorized_dim(*args, **kwargs)

Overloaded function.

  1. get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int

    Return the index of the dimension along which the buffer is vectorized.

    Specifically, -1 is returned if the number of scalars per vector is 1.

    name – The tensor name.

  2. get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int

    Return the index of the dimension along which the buffer is vectorized.

    Specifically, -1 is returned if the number of scalars per vector is 1.

    name – The tensor name.

    profile_index – The index of the profile.
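The vectorization queries above matter when sizing buffers for vectorized formats. The following sketch rests on two assumptions: the vectorized dimension is padded up to a multiple of the components per element, and for non-vectorized tensors get_tensor_bytes_per_component returns the size of one scalar. The helper name `padded_buffer_size` is hypothetical, and `shape` is expected to be fully resolved (no -1 entries).

```python
import math


def padded_buffer_size(engine, name, shape):
    """Estimate the buffer size in bytes for a tensor with the given shape."""
    dims = list(shape)
    vec_dim = engine.get_tensor_vectorized_dim(name)
    if vec_dim != -1:
        per_element = engine.get_tensor_components_per_element(name)
        # Round the vectorized dimension up to a whole number of vectors.
        dims[vec_dim] = (dims[vec_dim] + per_element - 1) // per_element * per_element
    return math.prod(dims) * engine.get_tensor_bytes_per_component(name)
```

For example, under these assumptions a (1, 3, 4, 4) FP16 tensor vectorized over dimension 1 with 8 components per element occupies 1 * 8 * 4 * 4 * 2 bytes.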

get_weight_streaming_automatic_budget(self: tensorrt.tensorrt.ICudaEngine) -> int

Get an automatic weight streaming budget based on available device memory. This value may change between TensorRT major and minor versions. Please use ICudaEngine.weight_streaming_budget_v2 to set the returned budget.

is_debug_tensor(self: tensorrt.tensorrt.ICudaEngine, name: str) -> bool

Determine whether the given name corresponds to a debug tensor.


name – The tensor name.

is_shape_inference_io(self: tensorrt.tensorrt.ICudaEngine, name: str) -> bool

Determine whether a tensor is read or written by infer_shapes.


name – The tensor name.

serialize(self: tensorrt.tensorrt.ICudaEngine) -> tensorrt.tensorrt.IHostMemory

Serialize the engine to a stream.


Returns: An IHostMemory object containing the serialized ICudaEngine.
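A serialized engine is typically written to disk so it can be deserialized later without rebuilding. This is a minimal sketch, assuming the object returned by serialize() supports the Python buffer protocol so that bytes() can copy it; `save_engine` is a hypothetical helper name.

```python
from pathlib import Path


def save_engine(engine, path):
    """Serialize the engine, write it to `path`, and return the byte count."""
    serialized = engine.serialize()
    # Assumption: the returned host memory can be copied with bytes().
    data = bytes(serialized)
    Path(path).write_bytes(data)
    return len(data)
```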

serialize_with_config(self: tensorrt.tensorrt.ICudaEngine, arg0: tensorrt.tensorrt.ISerializationConfig) -> tensorrt.tensorrt.IHostMemory

Serialize the engine to a stream using the provided serialization configuration.