IExecutionContext

class tensorrt.IExecutionContext

Context for executing inference using an ICudaEngine. Multiple IExecutionContext instances may exist for one ICudaEngine, allowing the same ICudaEngine to be used for the execution of multiple batches simultaneously.

Variables
  • debug_sync – bool The debug sync flag. If this flag is set to True, the ICudaEngine will log the successful execution of each kernel during execute(). It has no effect when using execute_async().

  • profiler – IProfiler The profiler in use by this IExecutionContext.

  • engine – ICudaEngine The associated ICudaEngine.

  • name – str The name of the IExecutionContext.

  • device_memory – capsule The device memory for use by this execution context. The memory must be aligned on a 256-byte boundary, and its size must be at least engine.device_memory_size. If using execute_async() to run the network, the memory is in use from the invocation of execute_async() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.

  • active_optimization_profile – int The active optimization profile for the context. The selected profile will be used in subsequent calls to execute() or execute_async(). Profile 0 is selected by default. Changing this value invalidates all dynamic bindings for the current execution context, so they must be set again using set_binding_shape() before calling either execute() or execute_async().

  • all_binding_shapes_specified – bool Whether all dynamic dimensions of input tensors have been specified by calling set_binding_shape(). Trivially True if the network has no dynamically shaped input tensors.

  • all_shape_inputs_specified – bool Whether values for all input shape tensors have been specified by calling set_shape_input(). Trivially True if the network has no input shape bindings.

  • error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.

__del__(self: tensorrt.tensorrt.IExecutionContext) → None
__exit__(exc_type, exc_value, traceback)

Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.

__init__()

Initialize self. See help(type(self)) for accurate signature.

execute(self: tensorrt.tensorrt.IExecutionContext, batch_size: int = 1, bindings: List[int]) → bool

Synchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().

Parameters
  • batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.

  • bindings – A list of integers representing input and output buffer addresses for the network.

Returns

True if execution succeeded.

execute_async(self: tensorrt.tensorrt.IExecutionContext, batch_size: int = 1, bindings: List[int], stream_handle: int, input_consumed: capsule = None) → bool

Asynchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().

Parameters
  • batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.

  • bindings – A list of integers representing input and output buffer addresses for the network.

  • stream_handle – A handle for a CUDA stream on which the inference kernels will be executed.

  • input_consumed – An optional event which will be signaled when the input buffers can be refilled with new data.

Returns

True if the kernels were executed successfully.

execute_async_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int], stream_handle: int, input_consumed: capsule = None) → bool

Asynchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index(). This method only works for execution contexts built from networks with no implicit batch dimension.

Parameters
  • bindings – A list of integers representing input and output buffer addresses for the network.

  • stream_handle – A handle for a CUDA stream on which the inference kernels will be executed.

  • input_consumed – An optional event which will be signaled when the input buffers can be refilled with new data.

Returns

True if the kernels were executed successfully.
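Each device buffer passed through bindings must be large enough for the binding's fully specified shape and data type. A sketch of the sizing arithmetic, with the TensorRT and pycuda calls shown only as hedged comments (the engine, context, and stream objects are assumed):

```python
# Sketch: sizing device buffers for execute_async_v2(). Buffer sizes come
# from the binding shapes; with dynamic shapes, query them only after
# set_binding_shape() has been called.
import numpy as np

def buffer_nbytes(shape, dtype):
    """Bytes needed for a fully specified binding shape."""
    n = 1
    for d in shape:
        assert d > 0, "shape must be fully specified before sizing buffers"
        n *= d
    return n * np.dtype(dtype).itemsize

# With a live context (requires TensorRT and, here, pycuda -- both assumed):
#   for i in range(engine.num_bindings):
#       nbytes = buffer_nbytes(context.get_binding_shape(i),
#                              trt.nptype(engine.get_binding_dtype(i)))
#       bindings.append(int(cuda.mem_alloc(nbytes)))
#   ok = context.execute_async_v2(bindings, stream.handle)
#   stream.synchronize()

print(buffer_nbytes((1, 3, 224, 224), np.float32))  # → 602112
```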

execute_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int]) → bool

Synchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index(). This method only works for execution contexts built from networks with no implicit batch dimension.

Parameters

bindings – A list of integers representing input and output buffer addresses for the network.

Returns

True if execution succeeded.

get_binding_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int) → tensorrt.tensorrt.Dims

Get the dynamic shape of a binding.

If set_binding_shape() has been called on this binding (or if there are no dynamic dimensions), all dimensions will be positive. Otherwise, it is necessary to call set_binding_shape() before execute_async() or execute() may be called.

If the binding is out of range, an invalid Dims with nbDims == -1 is returned.

If ICudaEngine.binding_is_input(binding) is False , then both all_binding_shapes_specified and all_shape_inputs_specified must be True before calling this method.

Parameters

binding – The binding index.

Returns

A Dims object representing the currently selected shape.
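The returned Dims behaves like a sequence of ints, so the three outcomes described above can be told apart with plain Python checks. A sketch, where an invalid Dims (nbDims == -1) is modeled as an empty sequence (an assumption about how it surfaces in Python):

```python
# Sketch: distinguishing the outcomes of get_binding_shape().

def classify_shape(dims):
    """'invalid' (bad binding index), 'unspecified' (set_binding_shape still
    needed), or 'ready' (all dimensions positive)."""
    dims = list(dims)
    if len(dims) == 0:   # assumption: nbDims == -1 surfaces as empty here
        return "invalid"
    if any(d < 0 for d in dims):
        return "unspecified"
    return "ready"

print(classify_shape((1, 3, 224, 224)))   # fully specified
print(classify_shape((-1, 3, 224, 224)))  # wildcard still present
```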

get_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int) → List[int]

Get values of an input shape tensor required for shape calculations or an output tensor produced by shape calculations.

Parameters

binding – The binding index of an input tensor for which ICudaEngine.is_shape_binding(binding) is true.

If ICudaEngine.binding_is_input(binding) == False, then both all_binding_shapes_specified and all_shape_inputs_specified must be True before calling this method.

Returns

An iterable containing the values of the shape tensor.

get_strides(self: tensorrt.tensorrt.IExecutionContext, binding: int) → tensorrt.tensorrt.Dims

Return the strides of the buffer for the given binding.

Note that strides can be different for different execution contexts with dynamic shapes.

Parameters

binding – The binding index.
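One practical use of the reported strides is detecting padded or vectorized formats: if they differ from the dense row-major strides implied by the binding shape, the layout is not contiguous. A sketch of the dense-stride computation, with the live-context comparison shown only as a hedged comment:

```python
# Sketch: comparing a binding's reported strides with dense row-major strides.
# TensorRT may pick padded or vectorized layouts, in which case the reported
# strides differ from the dense ones computed here.

def dense_strides(shape):
    """Row-major strides, in elements, for a contiguous tensor."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# With a live context (assumed):
#   reported = tuple(context.get_strides(i))
#   if reported != tuple(dense_strides(context.get_binding_shape(i))):
#       ...  # layout has padding; copy through a staging buffer accordingly

print(dense_strides([2, 3, 4]))  # → [12, 4, 1]
```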

set_binding_shape(self: tensorrt.tensorrt.IExecutionContext, binding: int, shape: tensorrt.tensorrt.Dims) → bool

Set the dynamic shape of a binding.

Requires the engine to be built without an implicit batch dimension. The binding must be an input tensor, and all dimensions must be compatible with the network definition (i.e. only the wildcard dimension -1 can be replaced with a new dimension > 0). Furthermore, the dimensions must be in the valid range for the currently selected optimization profile.

For all dynamic non-output bindings (which have at least one wildcard dimension of -1), this method needs to be called after setting active_optimization_profile before either execute_async() or execute() may be called. When all input shapes have been specified, all_binding_shapes_specified is set to True.

Parameters
  • binding – The binding index.

  • shape – The shape to set.

Returns

False if an error occurs (e.g. specified binding is out of range for the currently selected optimization profile or specified shape is inconsistent with min-max range of the optimization profile), else True.

Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid shape using get_binding_shape() on the output binding.
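The compatibility rule described above (only -1 wildcards may be replaced, and only with positive dimensions) can be written out as a plain predicate. A sketch, with the typical dynamic-shape call sequence against a live context shown as hedged comments; the profile index and shapes are illustrative:

```python
# Sketch: the rule set_binding_shape() enforces against the network
# definition, written out as a plain predicate.

def compatible(network_shape, runtime_shape):
    """A runtime shape may only replace -1 wildcards, and only with dims > 0."""
    if len(network_shape) != len(runtime_shape):
        return False
    return all(n == r or (n == -1 and r > 0)
               for n, r in zip(network_shape, runtime_shape))

# Typical dynamic-shape flow with a live context (assumed):
#   context.active_optimization_profile = 0
#   context.set_binding_shape(0, (8, 3, 224, 224))  # batch dim was -1
#   assert context.all_binding_shapes_specified

print(compatible((-1, 3, 224, 224), (8, 3, 224, 224)))  # → True
print(compatible((-1, 3, 224, 224), (8, 3, 112, 112)))  # → False: fixed dims changed
```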

set_optimization_profile_async(self: tensorrt.tensorrt.IExecutionContext, profile_index: int, stream_handle: int) → bool

Set the optimization profile with async semantics.

Parameters
  • profile_index – The index of the optimization profile.

  • stream_handle – A handle for the CUDA stream on which the work to switch optimization profiles can be enqueued.

When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.

Returns

True if the optimization profile was set successfully.
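When an engine is built with multiple optimization profiles, each profile has its own copy of the bindings, so binding indices must be offset after switching profiles. A sketch of that arithmetic (the per-profile replication of bindings is an assumption about this TensorRT version's binding layout; verify against your engine):

```python
# Sketch: with multiple optimization profiles, each profile gets its own copy
# of the bindings, so indices are offset after switching profiles.

def profile_binding_index(base_index, profile_index, num_bindings, num_profiles):
    """Binding index of ``base_index`` under the given profile."""
    bindings_per_profile = num_bindings // num_profiles
    return profile_index * bindings_per_profile + base_index

# With a live context (assumed):
#   context.set_optimization_profile_async(1, stream.handle)
#   idx = profile_binding_index(0, 1, engine.num_bindings,
#                               engine.num_optimization_profiles)

print(profile_binding_index(0, 1, 4, 2))  # two bindings per profile → index 2
```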

set_shape_input(self: tensorrt.tensorrt.IExecutionContext, binding: int, shape: List[int]) → bool

Set values of an input shape tensor required by shape calculations.

Parameters
  • binding – The binding index of an input tensor for which ICudaEngine.is_shape_binding(binding) and ICudaEngine.binding_is_input(binding) are both true.

  • shape – An iterable containing the values of the input shape tensor. The number of values should be the product of the dimensions returned by get_binding_shape(binding).

If ICudaEngine.is_shape_binding(binding) and ICudaEngine.binding_is_input(binding) are both true, this method must be called before execute_async() or execute() may be called. Additionally, this method must not be called if either ICudaEngine.is_shape_binding(binding) or ICudaEngine.binding_is_input(binding) are false.

Returns

False if an error occurs (e.g. specified binding is out of range for the currently selected optimization profile or specified shape values are inconsistent with min-max range of the optimization profile), else True.

Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid shape using get_binding_shape() on the output binding.
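The number of values set_shape_input() expects is the product of the dimensions that get_binding_shape() reports for the shape binding, as noted in the shape parameter description above. A sketch of that count, with the live-context call shown as a hedged comment (the binding index and values are illustrative):

```python
# Sketch: how many values set_shape_input() expects -- the product of the
# dimensions reported by get_binding_shape() for the shape binding.

def num_shape_values(binding_shape):
    n = 1
    for d in binding_shape:
        n *= d
    return n

# With a live context (assumed), a 1-D shape tensor of length 4 takes 4 values:
#   assert num_shape_values(context.get_binding_shape(idx)) == 4
#   context.set_shape_input(idx, [1, 3, 224, 224])

print(num_shape_values((4,)))  # → 4
```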