Runners

Module: polygraphy.backend.onnxrt

class OnnxrtRunner(sess, name=None)

Bases: polygraphy.backend.base.runner.BaseRunner

Runs inference using an ONNX-Runtime inference session.

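Parameters
  • sess (Union[onnxruntime.InferenceSession, Callable() -> onnxruntime.InferenceSession]) – An ONNX-Runtime inference session or a callable that returns one.

  • name (str) – The name to use for this runner. Defaults to None, in which case a name is generated automatically.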
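Because sess may be either a session or a callable that returns one, session creation can be deferred. The sketch below is plain Python that illustrates this pattern under stated assumptions: FakeSession stands in for onnxruntime.InferenceSession, and resolve_session is a hypothetical helper, not a Polygraphy function. In Polygraphy itself, loaders such as SessionFromOnnx play the role of the callable.

```python
from typing import Callable, List, Union


class FakeSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path


# A session, or a zero-argument callable that builds one.
SessionOrLoader = Union[FakeSession, Callable[[], FakeSession]]


def resolve_session(sess: SessionOrLoader) -> FakeSession:
    # Call the loader if we were given one; otherwise return the session as-is.
    return sess() if callable(sess) else sess


build_log: List[str] = []


def load_session() -> FakeSession:
    # Hypothetical loader: nothing is built until this function is called.
    build_log.append("built")
    return FakeSession("model.onnx")


eager = resolve_session(FakeSession("model.onnx"))  # session passed directly
lazy = resolve_session(load_session)                # callable invoked here
```

Passing a callable keeps model loading out of the runner's constructor, so the cost is paid only when the session is actually needed.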

__enter__()

Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.

__exit__(exc_type, exc_value, traceback)

Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
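A minimal sketch of how __enter__ and __exit__ can delegate to activate() and deactivate(), assuming that is all they do. ToyRunner is hypothetical and is not Polygraphy's BaseRunner; its is_active flag mirrors the property of the same name.

```python
class ToyRunner:
    """Hypothetical runner illustrating the activate/deactivate contract."""

    def __init__(self) -> None:
        self.is_active = False
        self.buffers = None

    def activate(self) -> None:
        # Acquire resources; a real runner might allocate CPU or GPU memory here.
        self.buffers = {}
        self.is_active = True

    def deactivate(self) -> None:
        # Release whatever activate() acquired.
        self.buffers = None
        self.is_active = False

    def __enter__(self) -> "ToyRunner":
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always deactivate, even if the with-block raised. Returning None
        # (falsy) lets any exception propagate to the caller.
        self.deactivate()


runner = ToyRunner()
with runner:
    active_inside = runner.is_active
active_after = runner.is_active
```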

activate()

Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.

Generally, you should use a context manager instead of manually activating and deactivating. For example:

with OnnxrtRunner(sess) as runner:
    outputs = runner.infer(feed_dict)

deactivate()

Deactivate the runner. For example, this may involve freeing CPU or GPU memory.

Generally, you should use a context manager instead of manually activating and deactivating. For example:

with OnnxrtRunner(sess) as runner:
    outputs = runner.infer(feed_dict)

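When a context manager is not practical and you call activate() and deactivate() yourself, pairing them with try/finally keeps deactivation from being skipped if inference raises, which is the guarantee the with statement gives you. ToyRunner below is hypothetical; its infer() always fails so that the cleanup path runs.

```python
class ToyRunner:
    """Hypothetical runner whose infer() always fails, to exercise cleanup."""

    def __init__(self) -> None:
        self.is_active = False

    def activate(self) -> None:
        self.is_active = True

    def deactivate(self) -> None:
        self.is_active = False

    def infer(self, feed_dict):
        raise RuntimeError("inference failed")


runner = ToyRunner()
runner.activate()
try:
    runner.infer({})
except RuntimeError as err:
    error_message = str(err)
finally:
    # Runs whether or not infer() raised, so resources are always released.
    runner.deactivate()
```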
get_input_metadata()

Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None. Must be called only after activate() and before deactivate().

Returns

Input names, shapes, and data types.

Return type

TensorMetadata
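Since shapes returned by get_input_metadata() may contain None for dynamic dimensions, comparing them with concrete input shapes has to treat None as a wildcard. A minimal sketch, assuming shapes are plain tuples and that only None marks a dynamic dimension; shape_matches is a hypothetical helper, not part of Polygraphy.

```python
from typing import Optional, Sequence


def shape_matches(expected: Sequence[Optional[int]], actual: Sequence[int]) -> bool:
    """Return True if `actual` is compatible with `expected`.

    A None entry in `expected` marks a dynamic dimension and matches any size.
    """
    if len(expected) != len(actual):
        return False  # rank mismatch
    return all(e is None or e == a for e, a in zip(expected, actual))


# Hypothetical metadata shape for an input with a dynamic batch dimension.
expected_shape = (None, 3, 224, 224)
```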

infer(feed_dict, check_inputs=True, *args, **kwargs)

Runs inference using the provided feed_dict.

Must be called only after activate() and before deactivate().

NOTE: Some runners may accept additional parameters in infer(). For details on these, see the documentation for their infer_impl() methods.

Parameters
  • feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.

  • check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes. Disabling this may improve performance. Defaults to True.
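The checks that check_inputs enables can be sketched as follows. This is an illustration, not Polygraphy's implementation: it assumes metadata is a plain dict mapping each input name to a (dtype, shape) pair, with None for dynamic dimensions, and check_feed_dict is a hypothetical helper that reports missing inputs and dtype or shape mismatches.

```python
import numpy as np


def check_feed_dict(feed_dict, metadata):
    """Return a list of problems found in feed_dict (empty if it looks valid).

    `metadata` maps input name -> (dtype, shape); None in a shape marks a
    dynamic dimension that accepts any size.
    """
    problems = []
    for name, (dtype, shape) in metadata.items():
        if name not in feed_dict:
            problems.append(f"missing input: {name}")
            continue
        array = feed_dict[name]
        if array.dtype != np.dtype(dtype):
            problems.append(f"{name}: expected dtype {np.dtype(dtype)}, got {array.dtype}")
        if len(array.shape) != len(shape) or any(
            dim is not None and dim != actual for dim, actual in zip(shape, array.shape)
        ):
            problems.append(f"{name}: expected shape {shape}, got {array.shape}")
    return problems


metadata = {"x": (np.float32, (None, 3))}
print(check_feed_dict({"x": np.zeros((2, 3), dtype=np.float32)}, metadata))  # []
```

Work like this runs on every call, which is one reason disabling check_inputs may improve performance once you know your inputs are well-formed.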

Attributes
  • inference_time (float) – The time required to run inference, in seconds. Derived classes should set this so that performance metrics are accurate.

Returns

A mapping of output tensor names to their corresponding NumPy arrays.

IMPORTANT: Runners may reuse these output buffers, so their contents can be overwritten by the next call to infer(). If you need to keep outputs from multiple inferences, make a copy with copy.deepcopy(outputs).

Return type

OrderedDict[str, numpy.ndarray]
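The warning about reused output buffers is easiest to see with a toy runner that writes every result into the same preallocated array. BufferReusingRunner is hypothetical; it only demonstrates why copy.deepcopy(outputs) is needed when outputs must outlive the next call.

```python
import copy
from collections import OrderedDict

import numpy as np


class BufferReusingRunner:
    """Hypothetical runner that writes every result into one preallocated buffer."""

    def __init__(self) -> None:
        self._out = np.zeros(3, dtype=np.float32)

    def infer(self, feed_dict):
        # Compute y = 2 * x in place, overwriting the previous result.
        np.multiply(feed_dict["x"], 2, out=self._out)
        return OrderedDict(y=self._out)


runner = BufferReusingRunner()
kept = runner.infer({"x": np.ones(3, dtype=np.float32)})  # no copy
saved = copy.deepcopy(kept)                                 # independent copy
runner.infer({"x": np.full(3, 5, dtype=np.float32)})        # reuses the buffer

print(kept["y"])   # [10. 10. 10.]  (silently overwritten)
print(saved["y"])  # [2. 2. 2.]     (preserved)
```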

last_inference_time()

Returns the total time, in seconds, spent on inference during the last call to infer().

Must be called only after activate() and before deactivate().

Returns

The time in seconds, or None if runtime was not measured by the runner.

Return type

float
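The relationship between the inference_time attribute and last_inference_time() can be sketched as follows, assuming a runner that times its own work with time.perf_counter() and reports None before any timed call. TimedRunner is hypothetical, and real runners may measure differently.

```python
import time
from typing import Optional


class TimedRunner:
    """Hypothetical runner that records how long each infer() call takes."""

    def __init__(self) -> None:
        self.inference_time: Optional[float] = None  # not measured yet

    def infer(self, feed_dict):
        start = time.perf_counter()
        outputs = {name: value * 2 for name, value in feed_dict.items()}  # stand-in work
        self.inference_time = time.perf_counter() - start
        return outputs

    def last_inference_time(self) -> Optional[float]:
        # Seconds spent in the most recent infer() call, or None if none was timed.
        return self.inference_time


runner = TimedRunner()
before = runner.last_inference_time()
outputs = runner.infer({"x": 21})
after = runner.last_inference_time()
```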

is_active

Whether this runner has been activated, either via a context manager or by calling activate().

Type

bool