IExecutionContext¶
- class tensorrt.IOutputAllocator(self: tensorrt.tensorrt.IOutputAllocator)¶
- Application-implemented class for controlling output tensor allocation. - To implement a custom output allocator, ensure that you explicitly instantiate the base class in __init__():

      class MyOutputAllocator(trt.IOutputAllocator):
          def __init__(self):
              trt.IOutputAllocator.__init__(self)

          def reallocate_output(self, tensor_name, memory, size, alignment):
              ...  # Your implementation here

          def reallocate_output_async(self, tensor_name, memory, size, alignment, stream):
              ...  # Your implementation here

          def notify_shape(self, tensor_name, shape):
              ...  # Your implementation here

- __init__(self: tensorrt.tensorrt.IOutputAllocator) None¶
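One common policy for such an allocator is to reuse a buffer until a larger one is requested. The following is a minimal sketch of that policy only, under stated assumptions: `GrowOnlyOutputAllocator`, `alloc`, and `free` are hypothetical names, the class deliberately does not subclass `trt.IOutputAllocator` so that it runs without TensorRT, and it assumes `reallocate_output` should return the address of the buffer to use. A real implementation would derive from `trt.IOutputAllocator` and call the base `__init__` as described above.

```python
class GrowOnlyOutputAllocator:
    """Reallocation policy sketch: keep a buffer until a larger size is requested."""

    def __init__(self, alloc, free):
        self._alloc = alloc   # hypothetical callable: size in bytes -> device address
        self._free = free     # hypothetical callable: device address -> None
        self.buffers = {}     # tensor name -> (address, capacity in bytes)
        self.shapes = {}      # tensor name -> last shape passed to notify_shape

    def reallocate_output(self, tensor_name, memory, size, alignment):
        # `memory` (the previously registered address) is ignored here because
        # this sketch tracks its own buffers. `alloc` is assumed to return
        # addresses that already satisfy `alignment`.
        address, capacity = self.buffers.get(tensor_name, (0, 0))
        needed = max(size, 1)  # never hand back a null address for empty outputs
        if needed > capacity:
            if address:
                self._free(address)
            address = self._alloc(needed)
            self.buffers[tensor_name] = (address, needed)
        return address

    def notify_shape(self, tensor_name, shape):
        # Remember the final output shape so the caller can interpret the buffer.
        self.shapes[tensor_name] = tuple(shape)
```

Injecting `alloc` and `free` keeps the policy independent of any particular CUDA binding; with a real binding they would wrap its device allocation and release calls.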
 
- class tensorrt.IExecutionContext¶
- Context for executing inference using an ICudaEngine. Multiple IExecutionContexts may exist for one ICudaEngine instance, allowing the same ICudaEngine to be used for the execution of multiple batches simultaneously. - Variables:
- debug_sync – bool. The debug sync flag. If this flag is set to true, the ICudaEngine will log the successful execution of each kernel during execute_v2().
- profiler – IProfiler. The profiler in use by this IExecutionContext.
- engine – ICudaEngine. The associated ICudaEngine.
- name – str. The name of the IExecutionContext.
- device_memory – capsule. The device memory for use by this execution context. The memory must be aligned to the CUDA memory alignment property (obtained using cuda.cudart.cudaGetDeviceProperties()), and its size must be large enough for performing inference with the given network inputs. engine.device_memory_size() and engine.get_device_memory_size_for_profile() report upper bounds of the size. Setting memory to nullptr is acceptable if the reported size is 0. If using execute_async_v3() to run the network, the memory is in use from the invocation of execute_async_v3() until network execution is complete. If using execute_v2(), it is in use until execute_v2() returns. Releasing the memory or using it for other purposes during this time, including using it in another execution context running in parallel, results in undefined behavior.
- active_optimization_profile – int. The active optimization profile for the context. The selected profile will be used in subsequent calls to execute_v2(). Profile 0 is selected by default. This is a read-only property; the active optimization profile can be changed with set_optimization_profile_async(). Changing the profile invalidates all dynamic bindings for the current execution context, so they must be set again using set_input_shape() before calling execute_v2().
- all_binding_shapes_specified – bool. Whether all dynamic dimensions of input tensors have been specified by calling set_input_shape(). Trivially true if the network has no dynamically shaped input tensors. Does not work with name-based interfaces, e.g. set_input_shape(). Use infer_shapes() instead.
- all_shape_inputs_specified – bool. Whether values for all input shape tensors have been specified by calling set_shape_input(). Trivially true if the network has no input shape bindings. Does not work with name-based interfaces, e.g. set_input_shape(). Use infer_shapes() instead.
- error_recorder – IErrorRecorder. Application-implemented error reporting interface for TensorRT objects.
- enqueue_emits_profile – bool. Whether enqueue emits layer timing to the profiler. The default value is True. If set to False, enqueue will be asynchronous if there is a profiler attached. An extra method, report_to_profiler(), needs to be called to obtain the profiling data and report it to the attached profiler.
- persistent_cache_limit – The maximum size of persistent L2 cache that this execution context may use for activation caching. Activation caching is not supported on all architectures - see “How TensorRT uses Memory” in the developer guide for details. The default is 0 Bytes. 
- nvtx_verbosity – The NVTX verbosity of the execution context. Building with DETAILED verbosity will generally increase latency in execute_async_v3(). Set this property to select the NVTX verbosity in this execution context at runtime. The default is the verbosity with which the engine was built, and the verbosity may not be raised above that level. This setting does not affect how IEngineInspector interacts with the engine. 
- temporary_allocator – IGpuAllocator. The GPU allocator used for internal temporary storage.
 
 - __del__(self: tensorrt.tensorrt.IExecutionContext) None¶
 - __exit__(exc_type, exc_value, traceback)¶
- Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0. 
 - __init__(*args, **kwargs)¶
 - execute_async_v3(self: tensorrt.tensorrt.IExecutionContext, stream_handle: int) bool¶
- Asynchronously execute inference. - Modifying or releasing memory that has been registered for the tensors before stream synchronization, or before the event passed to set_input_consumed_event() has been triggered, results in undefined behavior. - Input tensors can be released once the event passed to set_input_consumed_event() has been triggered, whereas output tensors require stream synchronization. - Parameters:
- stream_handle – The CUDA stream on which the inference kernels will be enqueued. Using the default stream may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronization. Use a non-default stream instead. 
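The call sequence around execute_async_v3() can be sketched as a small helper. This is an illustration rather than a prescribed pattern: `enqueue_inference` and `tensor_addresses` are hypothetical names, and `context` is only assumed to expose set_tensor_address() and execute_async_v3() with the signatures documented on this page.

```python
def enqueue_inference(context, tensor_addresses, stream_handle):
    """Register every tensor address, then enqueue inference on the stream.

    `tensor_addresses` (hypothetical) maps tensor names to device addresses.
    """
    for name, address in tensor_addresses.items():
        if not context.set_tensor_address(name, address):
            raise RuntimeError(f"could not set address for tensor {name!r}")
    if not context.execute_async_v3(stream_handle):
        raise RuntimeError("execute_async_v3() reported failure")
    # The call only enqueues work. The caller must still synchronize the
    # stream before reading outputs or releasing output buffers.
```

The helper leaves stream synchronization to the caller, which lets several launches be enqueued before a single wait.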
 
 - execute_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int]) bool¶
- Synchronously execute inference on a batch. This method requires an array of input and output buffers. - Parameters:
- bindings – A list of integers representing the input and output buffer addresses for the network. 
- Returns:
- True if execution succeeded. 
 
 - get_debug_listener(self: tensorrt.tensorrt.IExecutionContext) tensorrt.tensorrt.IDebugListener¶
- Get debug listener for execution context. - Returns:
- The IDebugListener of the execution context.
 
 - get_debug_state(self: tensorrt.tensorrt.IExecutionContext, name: str) bool¶
- Get the debug state of the tensor. - Parameters:
- name – The name of the tensor. 
 
 - get_input_consumed_event(self: tensorrt.tensorrt.IExecutionContext) int¶
- Return the event associated with consuming the input tensors. 
 - get_max_output_size(self: tensorrt.tensorrt.IExecutionContext, name: str) int¶
- Return the upper bound on an output tensor’s size, in bytes, based on the current optimization profile. - If the profile or input shapes are not yet set, or the provided name does not map to an output, returns -1. - Parameters:
- name – The tensor name. 
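Because -1 signals that the size cannot be determined yet, callers typically check the return value before allocating. A minimal sketch follows, assuming only the documented behavior of get_max_output_size(); the helper name `max_output_sizes` is hypothetical.

```python
def max_output_sizes(context, output_names):
    """Return {output name: upper bound in bytes}, rejecting unknown sizes."""
    sizes = {}
    for name in output_names:
        size = context.get_max_output_size(name)
        if size < 0:
            # -1: profile or input shapes not set, or `name` is not an output.
            raise ValueError(f"no size bound available for tensor {name!r}")
        sizes[name] = size
    return sizes
```

The returned upper bounds can then be used to preallocate one buffer per output.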
 
 - get_output_allocator(self: tensorrt.tensorrt.IExecutionContext, name: str) tensorrt.tensorrt.IOutputAllocator¶
- Return the output allocator associated with the given output tensor, or None if the provided name does not map to an output tensor. - Parameters:
- name – The tensor name. 
 
 - get_runtime_config(self: tensorrt.tensorrt.IExecutionContext) nvinfer1::IRuntimeConfig¶
- Get the runtime configuration from the execution context. - Returns:
- The runtime configuration. 
 
 - get_tensor_address(self: tensorrt.tensorrt.IExecutionContext, name: str) int¶
- Get memory address for the given input or output tensor. - Parameters:
- name – The tensor name. 
 
 - get_tensor_shape(self: tensorrt.tensorrt.IExecutionContext, name: str) tensorrt.tensorrt.Dims¶
- Return the shape of the given input or output tensor. - Parameters:
- name – The tensor name. 
 
 - get_tensor_strides(self: tensorrt.tensorrt.IExecutionContext, name: str) tensorrt.tensorrt.Dims¶
- Return the strides of the buffer for the given tensor name. - Note that strides can be different for different execution contexts with dynamic shapes. - Parameters:
- name – The tensor name. 
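To illustrate how strides locate an element in a buffer, the sketch below computes a linear offset from an index and a stride tuple. It assumes the strides are expressed in elements rather than bytes, which this page does not state, and that the returned Dims can be converted to a tuple first. Both helper names are hypothetical.

```python
def element_offset(index, strides):
    """Linear offset (in elements, by assumption) of `index` given `strides`."""
    if len(index) != len(strides):
        raise ValueError("index and strides must have the same rank")
    return sum(i * s for i, s in zip(index, strides))


def contiguous_strides(shape):
    """Strides of a densely packed row-major buffer, for comparison."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))
```

Comparing the strides reported by the context with `contiguous_strides(shape)` is one way to detect a padded or otherwise non-dense layout.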
 
 - infer_shapes(self: tensorrt.tensorrt.IExecutionContext) List[str]¶
- Infer shapes and return the names of any tensors that are insufficiently specified. - An input tensor is insufficiently specified if either of the following is true:
- It has dynamic dimensions and its runtime dimensions have not yet been specified via set_input_shape().
- is_shape_inference_io(t) is True and the tensor’s address has not yet been set. 
 - Returns:
- A List[str] indicating the names of any tensors which have not been sufficiently specified, or an empty list on success.
- Raises:
- RuntimeError if shape inference fails due to reasons other than insufficiently specified tensors. 
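A typical use is to set all known input shapes and then ask the context which tensors are still underspecified. The following sketch assumes only the documented set_input_shape() and infer_shapes() behavior; `apply_shapes_and_check` and `input_shapes` are hypothetical names.

```python
def apply_shapes_and_check(context, input_shapes):
    """Set each input shape, then return names still insufficiently specified.

    `input_shapes` (hypothetical) maps input tensor names to shapes.
    An empty result means shape inference succeeded.
    """
    for name, shape in input_shapes.items():
        if not context.set_input_shape(name, tuple(shape)):
            raise ValueError(f"shape {tuple(shape)} rejected for input {name!r}")
    # infer_shapes() may raise RuntimeError for failures other than
    # missing shapes; that error is deliberately left to propagate.
    return list(context.infer_shapes())
```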
 
 - report_to_profiler(self: tensorrt.tensorrt.IExecutionContext) bool¶
- Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch. - If the enqueue_emits_profile flag is set to true, the enqueue function will calculate layer timing implicitly if a profiler is provided, and there is no need to call this function. If the enqueue_emits_profile flag is set to false, the enqueue function will record the CUDA event timers if a profiler is provided, but it will not perform the layer timing calculation. This function then needs to be called explicitly to calculate layer timing for the previous inference launch. - In the CUDA graph launch scenario, it will record the same set of CUDA events as in regular enqueue functions if the graph is captured from an IExecutionContext with profiler enabled. This function needs to be called after graph launch to report the layer timing info to the profiler. - Profiling CUDA graphs is only available from CUDA 11.1 onwards. - Returns:
- True if the call succeeded, else False (e.g. profiler not provided, in CUDA graph capture mode, etc.)
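The explicit reporting flow with enqueue_emits_profile disabled might look like the sketch below. Several parts are assumptions: `profiled_launch` and `synchronize` are hypothetical names, and waiting for the stream before calling report_to_profiler() is a conservative choice of this sketch, not a requirement stated on this page.

```python
def profiled_launch(context, stream_handle, synchronize):
    """Run one inference with deferred profiling, then report layer timing.

    `synchronize` (hypothetical) is a zero-argument callable that blocks
    until work on the stream has finished.
    """
    # Defer the layer timing calculation so enqueue stays asynchronous.
    context.enqueue_emits_profile = False
    if not context.execute_async_v3(stream_handle):
        raise RuntimeError("execute_async_v3() reported failure")
    synchronize()  # assumption: wait for the launch before reporting
    # Returns False if, for example, no profiler is attached.
    return context.report_to_profiler()
```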
 
 - set_all_tensors_debug_state(self: tensorrt.tensorrt.IExecutionContext, flag: bool) bool¶
- Turn the debug state of all debug tensors on or off. - Parameters:
- flag – True if turning on debug state of tensor. False if turning off. 
 
 - set_aux_streams(self: tensorrt.tensorrt.IExecutionContext, aux_streams: List[int]) None¶
- Set the auxiliary streams that TensorRT should launch kernels on in the next execute_async_v3() call. - If set, TensorRT will launch the kernels that are supposed to run on the auxiliary streams using the streams provided by the user with this API. If this API is not called before the execute_async_v3() call, then TensorRT will use the auxiliary streams created by TensorRT internally. - TensorRT will always insert event synchronizations between the main stream provided via execute_async_v3() call and the auxiliary streams:
- At the beginning of the execute_async_v3() call, TensorRT will make sure that all the auxiliary streams wait on the activities on the main stream. 
- At the end of the execute_async_v3() call, TensorRT will make sure that the main stream waits on the activities on all the auxiliary streams. 
 
 - The provided auxiliary streams must not be the default stream and must all be different to avoid deadlocks. - Parameters:
- aux_streams – A list of CUDA streams. If the length of the list is greater than engine.num_aux_streams, only the first engine.num_aux_streams streams will be used. If the length is less than engine.num_aux_streams (for example, an empty list), TensorRT will use the provided streams for the first few auxiliary streams and will create additional streams internally for the rest. 
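The list-length rule and the constraints on the provided streams can be modelled in plain Python. This sketch only mirrors the behavior described above and does not call TensorRT; `plan_aux_streams` is a hypothetical name, and treating handle 0 as the default stream is an assumption.

```python
def plan_aux_streams(provided, num_aux_streams):
    """Return (streams TensorRT would use, number it would create internally).

    `provided` is a list of stream handles; `num_aux_streams` stands for
    engine.num_aux_streams.
    """
    if 0 in provided:  # assumption: handle 0 denotes the default stream
        raise ValueError("auxiliary streams must not be the default stream")
    if len(set(provided)) != len(provided):
        raise ValueError("auxiliary streams must all be different")
    used = list(provided[:num_aux_streams])          # extra streams are ignored
    created_internally = num_aux_streams - len(used)  # shortfall filled by TensorRT
    return used, created_internally
```

Validating the list before calling set_aux_streams() surfaces the deadlock-prone cases (default or duplicated streams) as ordinary exceptions.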
 
 - set_debug_listener(self: tensorrt.tensorrt.IExecutionContext, listener: tensorrt.tensorrt.IDebugListener) bool¶
- Set debug listener for execution context. - Parameters:
- listener – The IDebugListener.
 
 - set_device_memory(self: tensorrt.tensorrt.IExecutionContext, memory: int, size: int) None¶
- Set the device memory for use by this IExecutionContext. - Parameters:
- memory – 256-byte aligned device memory. 
- size – Size of the provided memory. This must be at least as large as ICudaEngine.get_device_memory_size_v2. 
 
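 - If using execute_async_v3() to run the network, the memory is in use from the invocation of execute_async_v3() until network execution is complete. If using execute_v2(), it is in use until execute_v2() returns. Releasing the memory or using it for other purposes during this time results in undefined behavior. This includes using the same memory for a parallel execution context.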
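The alignment and size requirements can be checked before calling set_device_memory(). The helpers below are a minimal sketch with hypothetical names; the 256-byte figure comes from the parameter description above, and `required_size` stands for whatever size the engine reports.

```python
DEVICE_MEMORY_ALIGNMENT = 256  # bytes, per the `memory` parameter description


def is_aligned(address, alignment=DEVICE_MEMORY_ALIGNMENT):
    """True if `address` is a multiple of `alignment`."""
    return address % alignment == 0


def round_up(size, alignment=DEVICE_MEMORY_ALIGNMENT):
    """Smallest multiple of `alignment` that is at least `size`."""
    return (size + alignment - 1) // alignment * alignment


def check_device_memory(address, size, required_size):
    """Raise ValueError if the buffer is misaligned or too small."""
    if not is_aligned(address):
        raise ValueError(f"address {address:#x} is not 256-byte aligned")
    if size < required_size:
        raise ValueError(f"buffer of {size} bytes is smaller than {required_size}")
```

Rounding the allocation size up with `round_up` is optional; it simply keeps consecutive sub-allocations from one larger pool aligned.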
 - set_input_consumed_event(self: tensorrt.tensorrt.IExecutionContext, event: int) bool¶
- Mark all input tensors as consumed. - Parameters:
- event – The CUDA event that is triggered after all input tensors have been consumed. 
 
 - set_input_shape(*args, **kwargs)¶
- Overloaded function. - set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: tuple) -> bool - Set shape for the given input tensor. - arg name:
- The input tensor name. 
- arg shape:
- The input tensor shape. 
 
- set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: list) -> bool - Set shape for the given input tensor. - arg name:
- The input tensor name. 
- arg shape:
- The input tensor shape. 
 
- set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: tensorrt.tensorrt.Dims) -> bool - Set shape for the given input tensor. - arg name:
- The input tensor name. 
- arg shape:
- The input tensor shape. 
 
 
 - set_optimization_profile_async(self: tensorrt.tensorrt.IExecutionContext, profile_index: int, stream_handle: int) bool¶
- Set the optimization profile with async semantics. - Parameters:
- profile_index – The index of the optimization profile. 
- stream_handle – The CUDA stream on which the work to switch the optimization profile can be enqueued. 
 
 - When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs. - Returns:
- True if the optimization profile was set successfully.
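Since switching profiles invalidates the dynamic input shapes of the context, a switch is usually followed by setting the shapes again. The following is a sketch of that sequence, assuming only the methods documented on this page; `switch_profile` and `input_shapes` are hypothetical names.

```python
def switch_profile(context, profile_index, stream_handle, input_shapes):
    """Select an optimization profile, then re-apply the input shapes.

    `input_shapes` (hypothetical) maps input tensor names to shapes that
    are valid for the newly selected profile.
    """
    if not context.set_optimization_profile_async(profile_index, stream_handle):
        raise RuntimeError(f"could not select optimization profile {profile_index}")
    # Shapes set under the previous profile no longer apply.
    for name, shape in input_shapes.items():
        if not context.set_input_shape(name, tuple(shape)):
            raise ValueError(f"shape {tuple(shape)} rejected for input {name!r}")
    # If inference is enqueued on a different stream, the application must
    # still synchronize that stream with `stream_handle`.
```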
 
 - set_output_allocator(self: tensorrt.tensorrt.IExecutionContext, name: str, output_allocator: tensorrt.tensorrt.IOutputAllocator) bool¶
- Set the output allocator to use for the given output tensor. - Pass None to unset the output allocator. - The allocator is called by execute_async_v3(). - Parameters:
- name – The tensor name. 
- output_allocator – The output allocator. 
 
 
 - set_tensor_address(self: tensorrt.tensorrt.IExecutionContext, name: str, memory: int) bool¶
- Set memory address for the given input or output tensor. - Parameters:
- name – The tensor name. 
- memory – The memory address. 
 
 
 - set_tensor_debug_state(self: tensorrt.tensorrt.IExecutionContext, name: str, flag: bool) bool¶
- Turn the debug state of a tensor on or off. The tensor must have been marked as a debug tensor at build time. - Parameters:
- name – The name of the target tensor. 
- flag – True if turning on debug state of tensor. False if turning off. 
 
 
 - update_device_memory_size_for_shapes(self: tensorrt.tensorrt.IExecutionContext) int¶
- Recompute the internal activation buffer sizes based on the current input shapes, and return the total amount of memory required. - Users can allocate the device memory based on the size returned and provide the memory to TensorRT with an assignment to IExecutionContext.device_memory. All input shapes and the optimization profile to use must be specified before calling this function, otherwise the partition will be invalidated.