Runners
Module: polygraphy.backend.trt
- class TrtRunner(engine, name: Optional[str] = None, optimization_profile: Optional[int] = None)[source]
Bases: polygraphy.backend.base.runner.BaseRunner
Runs inference using TensorRT.
Note that runners are not designed for production deployment and should generally be used only for prototyping, testing, and debugging.
- Parameters
engine (Union[Union[trt.ICudaEngine, trt.IExecutionContext], Callable() -> Union[trt.ICudaEngine, trt.IExecutionContext]]) – A TensorRT engine or execution context or a callable that returns one. If an engine is provided, the runner will create a context automatically.
name (str) – The human-readable name prefix to use for this runner. A runner count and timestamp will be appended to this prefix.
optimization_profile (int) – The index of the optimization profile to set each time this runner is activated. When this is not provided, the profile is not set explicitly and will default to the 0th profile. You can also change the profile after the runner is active using the set_profile() method.
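For example, a minimal sketch of typical usage; the model.engine path and the input name "x" are placeholders, and the engine loader is deserialized lazily when the runner is activated:

import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

# Lazy loader: the engine is deserialized when the runner is activated.
load_engine = EngineFromBytes(BytesFromPath("model.engine"))  # placeholder path

with TrtRunner(load_engine) as runner:
    # "x" is a placeholder input name; use runner.get_input_metadata() to discover real names.
    outputs = runner.infer(feed_dict={"x": np.ones((1, 3, 224, 224), dtype=np.float32)})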
- set_profile(index: int)[source]
Sets the active optimization profile for this runner. The runner must already be active (see __enter__() or activate()). This only applies if your engine was built with multiple optimization profiles.
In TensorRT 8.0 and newer, the profile will be set asynchronously using this runner’s CUDA stream (runner.stream).
By default, the runner uses the first profile (profile 0).
- Parameters
index (int) – The index of the optimization profile to use.
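For example, a sketch that switches profiles while the runner is active, assuming the engine was built with at least two optimization profiles; load_engine and the input name "x" are the placeholders from the example above:

import numpy as np

with TrtRunner(load_engine) as runner:
    runner.set_profile(1)  # select the second profile; only valid while the runner is active
    outputs = runner.infer(feed_dict={"x": np.ones((1, 3, 224, 224), dtype=np.float32)})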
- infer_impl(feed_dict, copy_outputs_to_host=None)[source]
Implementation for running inference with TensorRT. Do not call this method directly - use infer() instead, which will forward unrecognized arguments to this method.
In addition to accepting NumPy arrays in the feed_dict, this runner can also accept Polygraphy DeviceViews. In that case, no host-to-device copy is necessary for the inputs.
- Parameters
feed_dict (OrderedDict[str, Union[numpy.ndarray, DeviceView]]) – A mapping of input tensor names to corresponding input NumPy arrays or Polygraphy DeviceViews.
copy_outputs_to_host (bool) – Whether to copy inference outputs back to host memory. If this is False, Polygraphy DeviceViews are returned instead of NumPy arrays. Defaults to True.
- Returns
A mapping of output tensor names to corresponding output NumPy arrays or Polygraphy DeviceViews.
- Return type
OrderedDict[str, Union[numpy.ndarray, DeviceView]]
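For example, a sketch of feeding device memory directly and keeping outputs on the device; copy_outputs_to_host is forwarded from infer(), and load_engine and the input name "x" are the placeholders from the first example:

import numpy as np
from polygraphy.cuda import DeviceArray

host_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
device_input = DeviceArray(shape=host_input.shape, dtype=host_input.dtype)
device_input.copy_from(host_input)  # one-time host-to-device copy

with TrtRunner(load_engine) as runner:
    # Outputs come back as DeviceViews; no device-to-host copy is performed.
    outputs = runner.infer(feed_dict={"x": device_input}, copy_outputs_to_host=False)

device_input.free()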
- __enter__()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
- __exit__(exc_type, exc_value, traceback)
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
- activate()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
with RunnerType(...) as runner:
    runner.infer(...)
- deactivate()
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
with RunnerType(...) as runner:
    runner.infer(...)
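If manual control is unavoidable, a sketch pairing the calls with try/finally so resources are always released; load_engine and the input name "x" are the placeholders from the first example:

import numpy as np

runner = TrtRunner(load_engine)
runner.activate()
try:
    outputs = runner.infer(feed_dict={"x": np.ones((1, 3, 224, 224), dtype=np.float32)})
finally:
    runner.deactivate()  # always free CPU/GPU resources, even if inference raises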
- get_input_metadata()
Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None.
Must be called only after activate() and before deactivate().
- Returns
Input names, shapes, and data types.
- Return type
TensorMetadata
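For example, a sketch that inspects the expected inputs while the runner is active; dynamic dimensions appear as None in the shapes, and load_engine is the placeholder loader from the first example:

with TrtRunner(load_engine) as runner:
    for name, (dtype, shape) in runner.get_input_metadata().items():
        print(f"{name}: dtype={dtype}, shape={shape}")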
- infer(feed_dict, check_inputs=True, *args, **kwargs)
Runs inference using the provided feed_dict.
Must be called only after activate() and before deactivate().
NOTE: Some runners may accept additional parameters in infer(). For details on these, see the documentation for their infer_impl() methods.
- Parameters
feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.
check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes. Disabling this may improve performance. Defaults to True.
- Returns
A mapping of output tensor names to their corresponding NumPy arrays.
IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with copy.deepcopy(outputs), as in the sketch below.
- Return type
OrderedDict[str, numpy.ndarray]
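For example, a sketch of collecting outputs from several inferences; the copies are needed because the runner may reuse its output buffers, and load_engine and the input name "x" are the placeholders from the first example:

import copy

import numpy as np

all_outputs = []
with TrtRunner(load_engine) as runner:
    for _ in range(4):
        outputs = runner.infer(feed_dict={"x": np.ones((1, 3, 224, 224), dtype=np.float32)})
        all_outputs.append(copy.deepcopy(outputs))  # buffers may be overwritten by the next infer()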
- inference_time
The time required to run inference in seconds. Derived classes should set this so that performance metrics are accurate.
- Type
float
- last_inference_time()
Returns the total inference time in seconds required during the last call to infer().
Must be called only after activate() and before deactivate().
- Returns
The time in seconds, or None if runtime was not measured by the runner.
- Return type
float
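For example, a sketch that reports the runtime of the most recent inference; load_engine and the input name "x" are the placeholders from the first example:

import numpy as np

with TrtRunner(load_engine) as runner:
    runner.infer(feed_dict={"x": np.ones((1, 3, 224, 224), dtype=np.float32)})
    print(f"Inference took {runner.last_inference_time()} seconds")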
- is_active
Whether this runner has been activated, either via context manager or by calling activate().
- Type
bool