ICudaEngine
- tensorrt.TensorIOMode
IO tensor modes for TensorRT.
Members:
NONE : Tensor is not an input or output.
INPUT : Tensor is an input to the engine.
OUTPUT : Tensor is an output of the engine.
- class tensorrt.ICudaEngine
An ICudaEngine for executing inference on a built network.
The engine can be indexed with []. When indexed with an integer, it returns the corresponding binding name; when indexed with a string, it returns the corresponding binding index. A short usage sketch follows the variable list below.
- Variables
num_io_tensors – int The number of IO tensors.
has_implicit_batch_dimension – bool [DEPRECATED] Deprecated in TensorRT 10.0. Always false since support for implicit batch dimensions has been removed.
num_layers – int The number of layers in the network. This is not necessarily the number of layers in the original INetworkDefinition, as layers may be combined or eliminated while the ICudaEngine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.
max_workspace_size – int The amount of workspace the ICudaEngine uses. The workspace size will be no greater than the value provided to the Builder when the ICudaEngine was built, and will typically be smaller. Workspace will be allocated for each IExecutionContext.
device_memory_size – int The amount of device memory required by an IExecutionContext.
device_memory_size_v2 – int The amount of device memory required by an IExecutionContext. The return value depends on the weight streaming budget if enabled.
refittable – bool Whether the engine can be refit.
name – str The name of the network associated with the engine. The name is set during network creation and is retrieved after building or deserialization.
num_optimization_profiles – int The number of optimization profiles defined for this engine. This is always at least 1.
error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.
engine_capability – EngineCapability The engine capability. See EngineCapability for details.
tactic_sources – int The tactic sources required by this engine.
profiling_verbosity – The profiling verbosity the builder config was set to when the engine was built.
hardware_compatibility_level – The hardware compatibility level of the engine.
num_aux_streams – Read-only. The number of auxiliary streams used by this engine, which will be less than or equal to the maximum number of auxiliary streams set via builder_config.max_aux_streams when the engine was built.
weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Set and get the current weight streaming budget for inference. The budget may be set to -1 (disabling weight streaming at runtime), 0 (the default, letting TensorRT choose whether to stream weights), or a positive value in the inclusive range [minimum_weight_streaming_budget, streamable_weights_size - 1].
minimum_weight_streaming_budget – [DEPRECATED] Deprecated in TensorRT 10.1, superseded by weight_streaming_budget_v2. Returns the minimum weight streaming budget in bytes required to run the network successfully. The engine must have been built with kWEIGHT_STREAMING.
streamable_weights_size – Returns the size of the streamable weights in the engine. This may not include all the weights.
weight_streaming_budget_v2 – Set and get the current weight streaming budget for inference. The budget may be set to any non-negative value. A value of 0 streams the most weights. Values equal to streamable_weights_size (the default) or larger disable weight streaming.
weight_streaming_scratch_memory_size – The amount of scratch memory required by a TensorRT IExecutionContext to perform inference. This value may change based on the current weight streaming budget. Please use the V2 memory APIs, engine.device_memory_size_v2 and IExecutionContext.set_device_memory(), to provide memory that includes the current weight streaming scratch memory. Memory provided through the V1 APIs does not include this scratch memory, so TensorRT will allocate it itself.
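As a usage sketch for the indexing behavior described above: the snippet below deserializes an engine and round-trips between binding indices and names. It assumes a previously serialized engine at the placeholder path engine.plan.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    # "engine.plan" is a placeholder path to a previously serialized engine.
    with open("engine.plan", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_io_tensors):
        name = engine[i]          # integer index -> binding name
        assert engine[name] == i  # string index -> binding index
        print(i, name, engine.get_tensor_mode(name))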
- __del__(self: tensorrt.tensorrt.ICudaEngine) None
- __exit__(exc_type, exc_value, traceback)
Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.
- __getitem__(self: tensorrt.tensorrt.ICudaEngine, arg0: int) str
- __init__(*args, **kwargs)
- create_engine_inspector(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.EngineInspector
Create an IEngineInspector which prints out the layer information of an engine or an execution context.
- Returns
The IEngineInspector.
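A minimal inspection sketch, assuming an existing ICudaEngine bound to engine; note that per-layer detail beyond the defaults requires the engine to have been built with detailed profiling verbosity.
    import tensorrt as trt

    inspector = engine.create_engine_inspector()
    # Whole-engine report; LayerInformationFormat.JSON yields structured output.
    print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
    # Single-layer report; layer index 0 is used purely for illustration.
    print(inspector.get_layer_information(0, trt.LayerInformationFormat.ONELINE))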
- create_execution_context(self: tensorrt.tensorrt.ICudaEngine, strategy: tensorrt.tensorrt.ExecutionContextAllocationStrategy = <ExecutionContextAllocationStrategy.STATIC: 0>) tensorrt.tensorrt.IExecutionContext
Create an IExecutionContext and specify the device memory allocation strategy.
- Returns
The newly created IExecutionContext.
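A sketch of two allocation strategies, assuming an existing ICudaEngine bound to engine and the ON_PROFILE_CHANGE enum member available in current releases.
    import tensorrt as trt

    # Default STATIC strategy: device memory is sized once for the worst
    # case across all optimization profiles.
    ctx_static = engine.create_execution_context()

    # ON_PROFILE_CHANGE: memory is resized when the active optimization
    # profile changes.
    ctx_dynamic = engine.create_execution_context(
        trt.ExecutionContextAllocationStrategy.ON_PROFILE_CHANGE)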
- create_execution_context_without_device_memory(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IExecutionContext
Create an IExecutionContext without any device memory allocated. The memory for execution of this context must be supplied by the application.
- Returns
An IExecutionContext without device memory allocated.
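A sketch of supplying the device memory yourself, assuming an existing ICudaEngine bound to engine and the cuda-python bindings for allocation; the two-argument set_device_memory call follows the V2 convention mentioned in the variable list above and is an assumption here.
    from cuda import cudart

    context = engine.create_execution_context_without_device_memory()

    # Size the buffer with the V2 API so any weight streaming scratch
    # memory is included.
    size = engine.device_memory_size_v2
    err, ptr = cudart.cudaMalloc(size)
    context.set_device_memory(ptr, size)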
- create_serialization_config(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.ISerializationConfig
Create a serialization configuration object.
- get_device_memory_size_for_profile(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) int
Return the device memory size required for a certain profile.
- Parameters
profile_index – The index of the profile.
- get_device_memory_size_for_profile_v2(self: tensorrt.tensorrt.ICudaEngine, profile_index: int) int
Return the device memory size required for a certain profile.
The return value may change after the weight streaming budget is changed via weight_streaming_budget_v2 (setWeightStreamingBudgetV2 in C++).
- Parameters
profile_index – The index of the profile.
- get_tensor_bytes_per_component(*args, **kwargs)
Overloaded function.
get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the number of bytes per component of an element.
The vector component size is returned if get_tensor_vectorized_dim() != -1.
- arg name
The tensor name.
get_tensor_bytes_per_component(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the number of bytes per component of an element.
The vector component size is returned if get_tensor_vectorized_dim(name, profile_index) != -1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_components_per_element(*args, **kwargs)
Overloaded function.
get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the number of components included in one element.
The number of elements in the vectors is returned if get_tensor_vectorized_dim() != -1.
- arg name
The tensor name.
get_tensor_components_per_element(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the number of components included in one element.
The number of elements in the vectors is returned if get_tensor_vectorized_dim(name, profile_index) != -1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_dtype(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.DataType
Return the required data type for a buffer from its tensor name.
- Parameters
name – The tensor name.
- get_tensor_format(*args, **kwargs)
Overloaded function.
get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str) -> tensorrt.tensorrt.TensorFormat
Return the tensor format.
- arg name
The tensor name.
get_tensor_format(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> tensorrt.tensorrt.TensorFormat
Return the tensor format.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
- get_tensor_format_desc(*args, **kwargs)
Overloaded function.
get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str) -> str
Return the human readable description of the tensor format.
The description includes the order, vectorization, data type, strides, etc. For example:
Example 1: CHW + FP32: “Row major linear FP32 format”
Example 2: CHW2 + FP16: “Two wide channel vectorized row major FP16 format”
Example 3: HWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
- arg name
The tensor name.
get_tensor_format_desc(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> str
Return the human readable description of the tensor format.
The description includes the order, vectorization, data type, strides, etc. For example:
Example 1: CHW + FP32: “Row major linear FP32 format”
Example 2: CHW2 + FP16: “Two wide channel vectorized row major FP16 format”
Example 3: HWC8 + FP16 + Line Stride = 32: “Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0”
- arg name
The tensor name.
- arg profile_index
The index of the profile.
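For example, a small loop (assuming an existing ICudaEngine bound to engine) that prints the format description of every IO tensor:
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        print(f"{name}: {engine.get_tensor_format_desc(name)}")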
- get_tensor_location(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorLocation
Determine whether an input or output tensor must be on GPU or CPU.
- Parameters
name – The tensor name.
- get_tensor_mode(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.TensorIOMode
Determine whether a tensor is an input or output tensor.
- Parameters
name – The tensor name.
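A sketch of partitioning tensors by IO mode with the TensorIOMode enum documented at the top of this page, assuming an existing ICudaEngine bound to engine:
    import tensorrt as trt

    names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    inputs = [n for n in names
              if engine.get_tensor_mode(n) == trt.TensorIOMode.INPUT]
    outputs = [n for n in names
               if engine.get_tensor_mode(n) == trt.TensorIOMode.OUTPUT]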
- get_tensor_name(self: tensorrt.tensorrt.ICudaEngine, index: int) str
Return the name of an input or output tensor.
- Parameters
index – The tensor index.
- get_tensor_profile_shape(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) List[tensorrt.tensorrt.Dims]
Get the minimum/optimum/maximum dimensions for a particular tensor under an optimization profile.
- Parameters
name – The tensor name.
profile_index – The index of the profile.
- get_tensor_profile_values(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) List[List[int]]
Get minimum/optimum/maximum values for an input shape binding under an optimization profile. If the specified binding is not an input shape binding, an exception is raised.
- Parameters
name – The tensor name.
profile_index – The index of the profile.
- Returns
A
List[List[int]]
of length 3, containing the minimum, optimum, and maximum values, in that order. If the values have not been set yet, an empty list is returned.
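A sketch of reading a profile's shape range and running at the optimum shape; it assumes an existing ICudaEngine bound to engine with a dynamically shaped input whose name is the placeholder "input".
    # min/opt/max dimensions are returned in that order for profile 0.
    min_shape, opt_shape, max_shape = engine.get_tensor_profile_shape("input", 0)

    context = engine.create_execution_context()
    context.set_input_shape("input", opt_shape)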
- get_tensor_shape(self: tensorrt.tensorrt.ICudaEngine, name: str) tensorrt.tensorrt.Dims
Return the shape of an input or output tensor.
- Parameters
name – The tensor name.
- get_tensor_vectorized_dim(*args, **kwargs)
Overloaded function.
get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str) -> int
Return the dimension index along which the buffer is vectorized.
Specifically, -1 is returned if the number of scalars per vector is 1.
- arg name
The tensor name.
get_tensor_vectorized_dim(self: tensorrt.tensorrt.ICudaEngine, name: str, profile_index: int) -> int
Return the dimension index along which the buffer is vectorized.
Specifically, -1 is returned if the number of scalars per vector is 1.
- arg name
The tensor name.
- arg profile_index
The index of the profile.
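Taken together with get_tensor_bytes_per_component() and get_tensor_components_per_element(), this allows computing the size in bytes of one element; a sketch assuming an existing ICudaEngine bound to engine and an IO tensor name bound to name:
    vec_dim = engine.get_tensor_vectorized_dim(name)
    if vec_dim != -1:
        # Vectorized format: component size times components per element.
        elem_bytes = (engine.get_tensor_bytes_per_component(name)
                      * engine.get_tensor_components_per_element(name))
    else:
        elem_bytes = engine.get_tensor_bytes_per_component(name)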
- get_weight_streaming_automatic_budget(self: tensorrt.tensorrt.ICudaEngine) int
Get an automatic weight streaming budget based on available device memory. This value may change between TensorRT major and minor versions. Please use ICudaEngine.weight_streaming_budget_v2 to apply the returned budget.
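A sketch of applying the automatic budget, assuming an engine built with the weight streaming build flag:
    budget = engine.get_weight_streaming_automatic_budget()
    engine.weight_streaming_budget_v2 = budget
    # Setting the budget to engine.streamable_weights_size or larger would
    # disable weight streaming instead.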
- is_debug_tensor(self: tensorrt.tensorrt.ICudaEngine, name: str) bool
Determine whether the given name corresponds to a debug tensor.
- Parameters
name – The tensor name.
- is_shape_inference_io(self: tensorrt.tensorrt.ICudaEngine, name: str) bool
Determine whether a tensor is read or written by infer_shapes.
- Parameters
name – The tensor name.
- serialize(self: tensorrt.tensorrt.ICudaEngine) tensorrt.tensorrt.IHostMemory
Serialize the engine to a stream.
- Returns
An
IHostMemory
object containing the serializedICudaEngine
.
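A round-trip sketch, assuming an existing ICudaEngine bound to engine: write the serialized engine to a placeholder path and reload it with a Runtime.
    import tensorrt as trt

    with open("engine.plan", "wb") as f:
        f.write(engine.serialize())

    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    with open("engine.plan", "rb") as f:
        engine2 = runtime.deserialize_cuda_engine(f.read())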
- serialize_with_config(self: tensorrt.tensorrt.ICudaEngine, arg0: tensorrt.tensorrt.ISerializationConfig) tensorrt.tensorrt.IHostMemory
Serialize the engine to a stream using the given serialization configuration.
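A sketch of configurable serialization; it assumes the SerializationFlag.EXCLUDE_WEIGHTS flag available in recent releases (the engine must be refittable for the excluded weights to be supplied again later).
    import tensorrt as trt

    config = engine.create_serialization_config()
    # EXCLUDE_WEIGHTS is an assumed flag here; it shrinks the plan by
    # omitting refittable weights from the serialized blob.
    config.set_flag(trt.SerializationFlag.EXCLUDE_WEIGHTS)
    blob = engine.serialize_with_config(config)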