IExecutionContext¶
- class tensorrt.IExecutionContext¶
  Context for executing inference using an ICudaEngine. Multiple IExecutionContexts may exist for one ICudaEngine instance, allowing the same ICudaEngine to be used for the execution of multiple batches simultaneously.
  - Variables
    - debug_sync – bool The debug sync flag. If this flag is set to true, the ICudaEngine will log the successful execution for each kernel during execute(). It has no effect when using execute_async().
    - profiler – IProfiler The profiler in use by this IExecutionContext.
    - engine – ICudaEngine The associated ICudaEngine.
    - name – str The name of the IExecutionContext.
    - device_memory – capsule The device memory for use by this execution context. The memory must be aligned on a 256-byte boundary, and its size must be at least engine.device_memory_size. If using execute_async() to run the network, the memory is in use from the invocation of execute_async() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.
    - active_optimization_profile – int The active optimization profile for the context. The selected profile will be used in subsequent calls to execute() or execute_async(). Profile 0 is selected by default. Changing this value will invalidate all dynamic bindings for the current execution context, so that they have to be set again using set_binding_shape() before calling either execute() or execute_async().
    - all_binding_shapes_specified – bool Whether all dynamic dimensions of input tensors have been specified by calling set_binding_shape(). Trivially true if the network has no dynamically shaped input tensors.
    - all_shape_inputs_specified – bool Whether values for all input shape tensors have been specified by calling set_shape_input(). Trivially true if the network has no input shape bindings.
    - error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.
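The 256-byte alignment requirement on device_memory is commonly met by over-allocating and rounding the raw pointer up. The helper below is a hypothetical illustration, not part of the TensorRT API; it works on plain integer addresses/sizes:

```python
def align_up(address, alignment=256):
    """Round an address (or size) up to the next multiple of `alignment`.

    device_memory must be 256-byte aligned and at least
    engine.device_memory_size bytes; one common approach is to
    over-allocate by `alignment - 1` bytes and round the raw
    pointer up with a helper like this.
    """
    return (address + alignment - 1) & ~(alignment - 1)

# A raw allocation starting at address 1000 rounds up to 1024.
print(align_up(1000))
```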
- __del__(self: tensorrt.tensorrt.IExecutionContext) → None¶
- __exit__(exc_type, exc_value, traceback)¶
  Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.
- __init__()¶
  Initialize self. See help(type(self)) for accurate signature.
- execute(self: tensorrt.tensorrt.IExecutionContext, batch_size: int = 1, bindings: List[int]) → bool¶
  Synchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().
  - Parameters
    - batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.
    - bindings – A list of integers representing input and output buffer addresses for the network.
  - Returns
    True if execution succeeded.
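The bindings list must place each device buffer address at the index that ICudaEngine.get_binding_index() reports for the corresponding tensor name. A minimal sketch of that ordering, using a hypothetical name-to-index mapping and placeholder addresses in place of a real engine and real device allocations:

```python
# Hypothetical stand-in for ICudaEngine.get_binding_index() results;
# a real engine reports these indices for its tensor names.
binding_index = {"input": 0, "output": 1}

# Device buffer addresses as plain integers (in practice obtained from
# a CUDA allocator); the values here are placeholders.
device_buffers = {"input": 0x700000000, "output": 0x700001000}

# Order the addresses by binding index to form the bindings list.
bindings = [0] * len(binding_index)
for name, index in binding_index.items():
    bindings[index] = device_buffers[name]

print(bindings)
```

With a real engine and context, this list is what would be passed as the bindings argument to execute() or execute_async().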
- execute_async(self: tensorrt.tensorrt.IExecutionContext, batch_size: int = 1, bindings: List[int], stream_handle: int, input_consumed: capsule = None) → bool¶
  Asynchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().
  - Parameters
    - batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.
    - bindings – A list of integers representing input and output buffer addresses for the network.
    - stream_handle – A handle for a CUDA stream on which the inference kernels will be executed.
    - input_consumed – An optional event which will be signaled when the input buffers can be refilled with new data.
  - Returns
    True if the kernels were executed successfully.
- execute_async_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int], stream_handle: int, input_consumed: capsule = None) → bool¶
  Asynchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index(). This method only works for execution contexts built from networks with no implicit batch dimension.
  - Parameters
    - bindings – A list of integers representing input and output buffer addresses for the network.
    - stream_handle – A handle for a CUDA stream on which the inference kernels will be executed.
    - input_consumed – An optional event which will be signaled when the input buffers can be refilled with new data.
  - Returns
    True if the kernels were executed successfully.
- execute_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int]) → bool¶
  Synchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index(). This method only works for execution contexts built from networks with no implicit batch dimension.
  - Parameters
    - bindings – A list of integers representing input and output buffer addresses for the network.
  - Returns
    True if execution succeeded.
- get_binding_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int) → tensorrt.tensorrt.Dims¶
  Get the dynamic shape of a binding.
  If set_binding_shape() has been called on this binding (or if there are no dynamic dimensions), all dimensions will be positive. Otherwise, it is necessary to call set_binding_shape() before execute_async() or execute() may be called.
  If the binding is out of range, an invalid Dims with nbDims == -1 is returned.
  If ICudaEngine.binding_is_input(binding) is False, then both all_binding_shapes_specified and all_shape_inputs_specified must be True before calling this method.
  - Parameters
    - binding – The binding index.
  - Returns
    A Dims object representing the currently selected shape.
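The rule above — all dimensions positive once the shape is resolved, a remaining -1 meaning set_binding_shape() is still required — can be expressed as a small predicate. This is a pure-Python illustration; the tuple argument is an assumption standing in for the returned Dims, and treating an empty sequence as the invalid (nbDims == -1) case is part of that assumption:

```python
def shape_is_resolved(dims):
    """True when a binding shape is fully specified.

    Models the get_binding_shape() contract: all-positive dimensions
    once set_binding_shape() has been called (or the binding has no
    dynamic dimensions). A remaining -1 means the shape must still be
    set; an empty sequence stands in for the invalid nbDims == -1
    result returned for an out-of-range binding index.
    """
    return len(dims) > 0 and all(d > 0 for d in dims)

print(shape_is_resolved((1, 3, 224, 224)))   # fully specified
print(shape_is_resolved((-1, 3, 224, 224)))  # dynamic dim not yet set
```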
- get_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int) → List[int]¶
  Get the values of an input shape tensor required for shape calculations, or of an output tensor produced by shape calculations.
  - Parameters
    - binding – The binding index of an input tensor for which ICudaEngine.is_shape_binding(binding) is true.
  If ICudaEngine.binding_is_input(binding) == False, then both all_binding_shapes_specified and all_shape_inputs_specified must be True before calling this method.
  - Returns
    An iterable containing the values of the shape tensor.
- get_strides(self: tensorrt.tensorrt.IExecutionContext, binding: int) → tensorrt.tensorrt.Dims¶
  Return the strides of the buffer for the given binding.
  Note that strides can be different for different execution contexts with dynamic shapes.
  - Parameters
    - binding – The binding index.
- set_binding_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int, shape: tensorrt.tensorrt.Dims) → bool¶
  Set the dynamic shape of a binding.
  Requires the engine to be built without an implicit batch dimension. The binding must be an input tensor, and all dimensions must be compatible with the network definition (i.e. only the wildcard dimension -1 can be replaced with a new dimension > 0). Furthermore, the dimensions must be in the valid range for the currently selected optimization profile.
  For all dynamic non-output bindings (which have at least one wildcard dimension of -1), this method needs to be called after setting active_optimization_profile and before either execute_async() or execute() may be called. When all input shapes have been specified, all_binding_shapes_specified is set to True.
  - Parameters
    - binding – The binding index.
    - shape – The shape to set.
  - Returns
    False if an error occurs (e.g. index out of range), else True.
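The compatibility rule above — only a -1 wildcard in the network definition may be replaced, and only with a dimension > 0 — can be sketched as a checker. This is a hypothetical pure-Python illustration, not a TensorRT call; it ignores the optimization-profile min/max bounds, which would also apply:

```python
def shape_compatible(network_dims, new_dims):
    """Check a proposed binding shape against network-definition dims.

    Only a -1 wildcard in the network definition may be replaced, and
    only with a dimension > 0; fixed dimensions must match exactly.
    (Optimization-profile min/max bounds are not modeled here.)
    """
    if len(network_dims) != len(new_dims):
        return False
    for net_d, new_d in zip(network_dims, new_dims):
        if net_d == -1:
            if new_d <= 0:  # wildcard must become a concrete positive dim
                return False
        elif new_d != net_d:  # fixed dims may not change
            return False
    return True

print(shape_compatible((-1, 3, 224, 224), (8, 3, 224, 224)))  # wildcard filled in
print(shape_compatible((-1, 3, 224, 224), (8, 4, 224, 224)))  # fixed dim changed
```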
- set_optimization_profile_async(self: tensorrt.tensorrt.IExecutionContext, profile_index: int, stream_handle: int) → bool¶
  Set the optimization profile with async semantics.
  - Parameters
    - profile_index – The index of the optimization profile.
    - stream_handle – The CUDA stream on which the work to switch the optimization profile can be enqueued.
  When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.
  - Returns
    True if the optimization profile was set successfully.
- set_shape_input(self: tensorrt.tensorrt.IExecutionContext, binding: int, shape: List[int]) → bool¶
  Set the values of an input shape tensor required by shape calculations.
  - Parameters
    - binding – The binding index of an input tensor for which ICudaEngine.is_shape_binding(binding) and ICudaEngine.binding_is_input(binding) are both true.
    - shape – An iterable containing the values of the input shape tensor. The number of values should be the product of the dimensions returned by get_binding_shape(binding).
  If ICudaEngine.is_shape_binding(binding) and ICudaEngine.binding_is_input(binding) are both true, this method must be called before execute_async() or execute() may be called. Additionally, this method must not be called if either ICudaEngine.is_shape_binding(binding) or ICudaEngine.binding_is_input(binding) is false.
  - Returns
    True if the values were set successfully.
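The required number of values — the product of the dimensions returned by get_binding_shape(binding) — can be computed with a small helper. A pure-Python sketch; the helper name and the example dims are hypothetical:

```python
from functools import reduce
import operator

def shape_tensor_value_count(dims):
    """Number of values a shape-tensor input must supply:
    the product of the dimensions reported by
    get_binding_shape(binding) for that binding.
    """
    return reduce(operator.mul, dims, 1)

# A shape tensor whose binding shape is (4,) carries 4 values,
# e.g. the target dimensions for a 4-D reshape.
print(shape_tensor_value_count((4,)))
```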