Runners
Module: polygraphy.backend.trt
class TrtRunner(engine, name=None)
Bases: polygraphy.backend.base.runner.BaseRunner
Runs inference using TensorRT.
Note that runners are not designed for production deployment and should generally be used only for prototyping, testing, and debugging.
- Parameters
engine (Callable() -> Union[trt.ICudaEngine, trt.IExecutionContext]) –
A callable that can supply either a TensorRT engine or execution context. If an engine is provided, the runner will create a context automatically. This callable is invoked whenever the runner is activated.
Alternatively, the engine or context may be supplied directly instead of through a callable, in which case the runner will not take ownership of it, and therefore will not destroy it.
name (str) – The human-readable name prefix to use for this runner. A runner count and timestamp will be appended to this prefix.
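The ownership rule above (a callable-supplied engine is owned and destroyed by the runner; a directly-supplied one is not) can be sketched in plain Python. MiniRunner and FakeEngine are hypothetical names for illustration only, not the Polygraphy implementation:

```python
# Hypothetical MiniRunner/FakeEngine sketch of the ownership rule:
# objects the runner creates itself (via a callable) are destroyed by it;
# objects passed in directly remain owned by the caller.
class FakeEngine:
    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


class MiniRunner:
    def __init__(self, engine):
        self._engine_or_callable = engine
        self._owns_engine = callable(engine)  # owns only what it creates
        self.engine = None

    def activate(self):
        # The callable is invoked on every activation.
        if self._owns_engine:
            self.engine = self._engine_or_callable()
        else:
            self.engine = self._engine_or_callable

    def deactivate(self):
        if self._owns_engine:
            self.engine.destroy()  # the runner frees what it owns
        self.engine = None


# Engine supplied via a callable: the runner owns and destroys it.
owned = MiniRunner(lambda: FakeEngine())
owned.activate()
engine_ref = owned.engine
owned.deactivate()
print(engine_ref.destroyed)  # True

# Engine supplied directly: the caller retains ownership.
shared = FakeEngine()
not_owned = MiniRunner(shared)
not_owned.activate()
not_owned.deactivate()
print(shared.destroyed)  # False
```

Passing a callable also defers expensive engine construction until the runner is actually activated.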
set_profile(index)
Sets the active optimization profile for this runner. The runner must already be active (see __enter__() or activate()).
This only applies if your engine was built with multiple optimization profiles.
In TensorRT 8.0 and newer, the profile will be set asynchronously using this runner's CUDA stream (runner.stream).
By default, the runner uses the first profile (profile 0).
- Parameters
index (int) – The index of the optimization profile to use.
infer(feed_dict, check_inputs=None)
Runs inference using the provided feed_dict.
- Parameters
feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.
check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes.
- Returns
A mapping of output tensor names to their corresponding NumPy arrays.
IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with copy.deepcopy(outputs).
- Return type
OrderedDict[str, numpy.ndarray]
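The buffer-reuse pitfall warned about above can be demonstrated without TensorRT. Here fake_infer is a hypothetical stand-in for runner.infer() that reuses one output dict, the way a runner may reuse its output buffers:

```python
# Sketch of the buffer-reuse pitfall: outputs from an earlier call are
# silently overwritten by a later one unless deep-copied first.
import copy
from collections import OrderedDict

import numpy as np

outputs = OrderedDict(scores=np.zeros(3))  # buffer reused across inferences


def fake_infer(value):
    outputs["scores"][:] = value  # overwrite in place, as a runner may do
    return outputs


first = fake_infer(1.0)                      # reference to the live buffer
first_copy = copy.deepcopy(fake_infer(1.0))  # independent snapshot
fake_infer(2.0)                              # next inference overwrites buffer

print(first["scores"][0])       # 2.0 -- stale reference sees the new data
print(first_copy["scores"][0])  # 1.0 -- the deep copy preserved the result
```

This is why outputs from multiple inferences should be saved with copy.deepcopy(outputs) rather than by keeping the returned dict.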
__enter__()
Activate the runner for inference. This may involve allocating GPU buffers, for example.
__exit__(exc_type, exc_value, traceback)
Deactivate the runner.
If the POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS environment variable is set to 1, this will also check that the runner was reset to its state prior to activation.
activate()
Activate the runner for inference. This may involve allocating GPU buffers, for example.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
deactivate()
Deactivate the runner.
If the POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS environment variable is set to 1, this will also check that the runner was reset to its state prior to activation.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
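If you do call activate()/deactivate() manually, they should be paired with try/finally so deactivation runs even when inference raises, mirroring what __exit__() guarantees under a context manager. A sketch with a hypothetical StubRunner (not the Polygraphy class):

```python
# Hypothetical StubRunner showing why manual activate()/deactivate()
# belongs in try/finally: deactivate() must run even if infer() raises.
class StubRunner:
    def __init__(self):
        self.active = False

    def activate(self):
        self.active = True   # e.g. allocate GPU buffers

    def deactivate(self):
        self.active = False  # e.g. free GPU buffers

    def infer(self, feed_dict):
        if not self.active:
            raise RuntimeError("runner is not active")
        return {"out": sum(feed_dict.values())}


runner = StubRunner()
runner.activate()
try:
    result = runner.infer({"a": 1, "b": 2})
finally:
    runner.deactivate()  # always restores the pre-activation state

print(result["out"])  # 3
print(runner.active)  # False
```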
get_input_metadata()
Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None. Must be called only after activate() and before deactivate().
- Returns
Input names, shapes, and data types.
- Return type
TensorMetadata
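Dynamic dimensions reported as None must be replaced with concrete values before building a feed_dict. The sketch below models the metadata as a plain dict of name -> (dtype, shape) for illustration; the actual return object carries the same names, shapes, and data types:

```python
# Sketch: substitute concrete values for dynamic (None) dimensions in
# input metadata, then allocate matching input arrays.
import numpy as np

# Hypothetical metadata for a model with a dynamic batch dimension.
metadata = {"images": (np.float32, (None, 3, 224, 224))}


def make_feed_dict(meta, dynamic_dim=1):
    """Build a feed_dict, replacing each dynamic (None) dimension."""
    feed = {}
    for name, (dtype, shape) in meta.items():
        concrete = tuple(dynamic_dim if d is None else d for d in shape)
        feed[name] = np.zeros(concrete, dtype=dtype)
    return feed


feed_dict = make_feed_dict(metadata, dynamic_dim=4)
print(feed_dict["images"].shape)  # (4, 3, 224, 224)
```

Any concrete shape chosen this way must still fall within the bounds of the active optimization profile.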
last_inference_time()
Returns the total inference time required during the last call to infer().
- Returns
The time in seconds, or None if runtime was not measured by the runner.
- Return type
float