IExecutionContext¶

class tensorrt.IExecutionContext¶

Context for executing inference using an ICudaEngine. Multiple IExecutionContext instances may exist for one ICudaEngine instance, allowing the same ICudaEngine to be used for the execution of multiple batches simultaneously.

Variables:

- debug_sync – bool The debug sync flag. If this flag is set to true, the ICudaEngine will log the successful execution for each kernel during execute(). It has no effect when using execute_async().
- profiler – IProfiler The profiler in use by this IExecutionContext.
- engine – ICudaEngine The associated ICudaEngine.
- name – str The name of the IExecutionContext.
- device_memory – capsule The device memory for use by this execution context. The memory must be aligned on a 256-byte boundary, and its size must be at least engine.device_memory_size. If using execute_async() to run the network, the memory is in use from the invocation of execute_async() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.
execute(self: tensorrt.tensorrt.IExecutionContext, batch_size: int, bindings: List[int]) → bool¶

Synchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().

Parameters:

- batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.
- bindings – A list of integers representing input and output buffer addresses for the network.

Returns: True if execution succeeded.
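A minimal synchronous-inference sketch, assuming a serialized engine file and host buffers allocated by the caller; the file path, buffer names, and binding order (input first, output second) are illustrative assumptions, and PyCUDA is used here for device allocations:

```python
# Illustrative sketch; requires TensorRT and PyCUDA at call time.
def run_sync(engine_path, h_input, h_output):
    import tensorrt as trt
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)

    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    context = engine.create_execution_context()

    # Device buffers; the bindings list holds their addresses as ints,
    # ordered by binding index.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    bindings = [int(d_input), int(d_output)]

    cuda.memcpy_htod(d_input, h_input)
    ok = context.execute(batch_size=1, bindings=bindings)
    cuda.memcpy_dtoh(h_output, d_output)
    return ok
```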
execute_async(self: tensorrt.tensorrt.IExecutionContext, batch_size: int = 1, bindings: List[int], stream_handle: int, input_consumed: capsule = None) → bool¶

Asynchronously execute inference on a batch. This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine.get_binding_index().

Parameters:

- batch_size – The batch size. This is at most the value supplied when the ICudaEngine was built.
- bindings – A list of integers representing input and output buffer addresses for the network.
- stream_handle – A handle for a CUDA stream on which the inference kernels will be executed.
- input_consumed – An optional event which will be signaled when the input buffers can be refilled with new data.

Returns: True if the kernels were executed successfully.
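A minimal asynchronous sketch, assuming a context and bindings list built from a deserialized engine and PyCUDA device allocations (all names are illustrative); passing the PyCUDA event handle for input_consumed is an assumption about how the capsule argument is supplied:

```python
# Illustrative sketch; requires PyCUDA and a live TensorRT context at call time.
def run_async(context, bindings, h_input, h_output, d_input, d_output):
    import pycuda.driver as cuda

    stream = cuda.Stream()
    # Optional event, signaled once the input buffers may be refilled.
    input_consumed = cuda.Event()

    cuda.memcpy_htod_async(d_input, h_input, stream)
    ok = context.execute_async(batch_size=1,
                               bindings=bindings,
                               stream_handle=stream.handle,
                               input_consumed=input_consumed.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()  # block until the kernels and copies finish
    return ok
```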