Runners
Module: polygraphy.backend.onnxrt
-
class OnnxrtRunner(sess, name=None)
Bases: polygraphy.backend.base.runner.BaseRunner
Runs inference using an ONNX-Runtime inference session.
- Parameters
sess (Union[onnxruntime.InferenceSession, Callable() -> onnxruntime.InferenceSession]) – An ONNX-Runtime inference session or a callable that returns one.
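The sess parameter accepts either a ready session or a callable that returns one. The sketch below illustrates that general pattern only. The resolve() helper, the FakeSession class, and make_session() are hypothetical names invented for this illustration; they are not part of Polygraphy or ONNX-Runtime, and how the runner handles sess internally is not shown here.

```python
class FakeSession:
    """Hypothetical placeholder for an inference session (not a real API)."""

    instances = 0  # counts how many sessions have been built

    def __init__(self):
        FakeSession.instances += 1


def resolve(obj_or_factory):
    """Return the object itself, or call the factory to build it (assumed pattern)."""
    if callable(obj_or_factory):
        return obj_or_factory()
    return obj_or_factory


def make_session():
    """Hypothetical factory: the session is built only when this is called."""
    return FakeSession()


eager_session = FakeSession()           # built immediately
count_before = FakeSession.instances    # the factory has not run yet

resolved_eager = resolve(eager_session)  # returned unchanged
resolved_lazy = resolve(make_session)    # built at this point
count_after = FakeSession.instances
```

Passing a callable defers construction until the object is actually needed, which can be useful when building a session is expensive.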
-
__enter__()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
-
__exit__(exc_type, exc_value, traceback)
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
-
activate()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
-
deactivate()
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
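To illustrate why the context manager is usually preferable, the following sketch compares the two forms. ToyRunner is a hypothetical stand-in that only mimics the activate()/deactivate() protocol described above; it is not a Polygraphy class, and its is_active flag is an assumption made for the demonstration.

```python
class ToyRunner:
    """Hypothetical stand-in that mimics the activate/deactivate protocol."""

    def __init__(self):
        self.is_active = False

    def activate(self):
        self.is_active = True   # a real runner might allocate memory here

    def deactivate(self):
        self.is_active = False  # a real runner might free memory here

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()
        # Returning None lets any exception propagate to the caller.


def manual_use(runner):
    """Manual form: the caller must guarantee deactivation with try/finally."""
    runner.activate()
    try:
        return runner.is_active
    finally:
        runner.deactivate()


def managed_use(runner):
    """Context-manager form: deactivation happens when the block exits."""
    with runner as active_runner:
        return active_runner.is_active
```

Both functions leave the runner deactivated afterwards, but the with form does so without the explicit try/finally, including when the body raises.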
-
get_input_metadata()
Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None. Must be called only after activate() and before deactivate().
- Returns
Input names, shapes, and data types.
- Return type
-
infer(feed_dict, check_inputs=True, *args, **kwargs)
Runs inference using the provided feed_dict.
Must be called only after activate() and before deactivate().
NOTE: Some runners may accept additional parameters in infer(). For details on these, see the documentation for their infer_impl() methods.
- Parameters
feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.
check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes. Disabling this may improve performance. Defaults to True.
-
inference_time
The time required to run inference. Derived classes should set this so that performance metrics are accurate.
- Type
float
- Returns
A mapping of output tensor names to their corresponding NumPy arrays.
IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with copy.deepcopy(outputs).
- Return type
OrderedDict[str, numpy.ndarray]
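The buffer-reuse warning can be demonstrated with a small sketch. BufferReusingRunner is a hypothetical stand-in, not a Polygraphy class, and plain Python lists stand in for NumPy arrays so the example runs without extra dependencies. It assumes a runner that overwrites a single output buffer on every call.

```python
import copy
from collections import OrderedDict


class BufferReusingRunner:
    """Hypothetical stand-in whose infer() reuses one output buffer."""

    def __init__(self):
        self._outputs = OrderedDict(y=[0.0])

    def infer(self, feed_dict):
        # Overwrite the existing buffer in place instead of allocating a new one.
        self._outputs["y"][0] = feed_dict["x"][0] * 2.0
        return self._outputs


runner = BufferReusingRunner()

# Without a copy, every saved result aliases the same buffer,
# so earlier results are silently overwritten by later inferences.
aliased = [runner.infer({"x": [v]}) for v in (1.0, 2.0)]

# With a deep copy, each result keeps its own values.
saved = [copy.deepcopy(runner.infer({"x": [v]})) for v in (1.0, 2.0)]
```

In this sketch, both entries of aliased end up holding the last result, while saved keeps one distinct result per inference.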
-
last_inference_time()
Returns the total inference time required during the last call to infer().
Must be called only after activate() and before deactivate().
- Returns
The time in seconds, or None if runtime was not measured by the runner.
- Return type
float
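As a rough sketch of how these methods fit together, the helper below runs several inferences inside one activation, copies each output, and reads last_inference_time() while the runner is still active. The helper names timed_inferences() and mean_time(), and the StubRunner class used to exercise them, are hypothetical; only the method names infer() and last_inference_time() and the context-manager protocol come from the documentation above.

```python
import copy


def timed_inferences(runner, feed_dicts):
    """Run each feed_dict through the runner; collect (outputs, time) pairs."""
    results = []
    with runner:  # activates on entry, deactivates on exit
        for feed_dict in feed_dicts:
            # Copy the outputs, since a runner may reuse its buffers.
            outputs = copy.deepcopy(runner.infer(feed_dict))
            # Query the time before the runner is deactivated.
            elapsed = runner.last_inference_time()
            results.append((outputs, elapsed))
    return results


def mean_time(results):
    """Average the measured times, skipping runs where the time is None."""
    times = [t for _, t in results if t is not None]
    return sum(times) / len(times) if times else None


class StubRunner:
    """Hypothetical stand-in exposing only the methods used above."""

    def __init__(self, times):
        self._times = iter(times)  # canned timings, one per infer() call
        self._last = None
        self.is_active = False

    def __enter__(self):
        self.is_active = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_active = False

    def infer(self, feed_dict):
        self._last = next(self._times)
        return {"y": [feed_dict["x"][0] + 1]}

    def last_inference_time(self):
        return self._last


stub = StubRunner([0.5, None, 1.5])
results = timed_inferences(stub, [{"x": [1]}, {"x": [2]}, {"x": [3]}])
average = mean_time(results)
```

Handling None explicitly matters because the documented return value is None when the runner did not measure its runtime.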