Runners

Module: polygraphy.backend.base

class BaseRunner(name=None, prefix=None)

Bases: object

Base class for Polygraphy runners. All runners should override the functions and attributes specified here.

Parameters:
  • name (str) – The name to use for this runner.

  • prefix (str) – The human-readable name prefix to use for this runner. A runner count and timestamp will be appended to this prefix. Only used if name is not provided.
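To make the runner contract concrete, the following is a minimal sketch of the lifecycle described below, using a hypothetical stand-in class (IdentityRunner is not a real Polygraphy class; a real runner subclasses BaseRunner and may allocate GPU memory in activate()):

```python
import time
from collections import OrderedDict

class IdentityRunner:
    """Toy stand-in illustrating the BaseRunner contract; not a real Polygraphy class."""
    def __init__(self, name=None):
        self.name = name or "identity-runner"
        self.is_active = False
        self.inference_time = None

    def activate(self):
        # A real runner might allocate CPU or GPU memory here.
        self.is_active = True

    def deactivate(self):
        # A real runner might free CPU or GPU memory here.
        self.is_active = False

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()

    def infer(self, feed_dict):
        # Passes inputs straight through as outputs, recording elapsed time.
        start = time.time()
        outputs = OrderedDict((name, value) for name, value in feed_dict.items())
        self.inference_time = time.time() - start
        return outputs

    def last_inference_time(self):
        return self.inference_time

with IdentityRunner() as runner:
    outputs = runner.infer(OrderedDict(x=[1.0, 2.0]))
```

The context manager guarantees that deactivate() runs even if infer() raises, which is why it is preferred over calling activate()/deactivate() manually.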

is_active

Whether this runner has been activated, either via the context manager or by calling activate().

Type:

bool

__enter__()

Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.

__exit__(exc_type, exc_value, traceback)

Deactivate the runner. For example, this may involve freeing CPU or GPU memory.

activate()

Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.

Generally, you should use a context manager instead of manually activating and deactivating. For example:

with RunnerType(...) as runner:
    runner.infer(...)

get_input_metadata(use_numpy_dtypes=None)

Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None. Must be called only after activate() and before deactivate().

Parameters:

use_numpy_dtypes (bool) – [DEPRECATED] Whether to return NumPy data types instead of Polygraphy DataType instances. This is provided to retain backwards compatibility. In the future, this parameter will be removed and Polygraphy DataType instances will always be returned; these can be converted to NumPy data types by calling their numpy() method. Defaults to True.

Returns:

Input names, shapes, and data types.

Return type:

TensorMetadata
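A common use of this metadata is allocating input buffers, which requires replacing dynamic (None) dimensions with concrete values first. The sketch below uses a plain dict of name -> (dtype, shape) pairs as a stand-in for the mapping a TensorMetadata provides; the names and the resolve_shape helper are illustrative, not part of the Polygraphy API:

```python
# Stand-in for input metadata: name -> (dtype, shape).
# Dynamic dimensions are represented by None, as described above.
input_metadata = {"input_ids": ("int64", (None, 128))}

def resolve_shape(shape, dynamic_default=1):
    # Replace each dynamic (None) dimension with a concrete value so a
    # buffer of that shape can be allocated. dynamic_default is arbitrary.
    return tuple(dynamic_default if dim is None else dim for dim in shape)

for name, (dtype, shape) in input_metadata.items():
    concrete = resolve_shape(shape)
```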

infer(feed_dict, check_inputs=True, *args, **kwargs)

Runs inference using the provided feed_dict.

Must be called only after activate() and before deactivate().

NOTE: Some runners may accept additional parameters in infer(). For details on these, see the documentation for their infer_impl() methods.

Parameters:
  • feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.

  • check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes. Disabling this may improve performance. Defaults to True.

inference_time

The time, in seconds, taken by the most recent call to infer().

Type:

float

Returns:

A mapping of output tensor names to their corresponding NumPy arrays.

IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with copy.deepcopy(outputs).

Return type:

OrderedDict[str, numpy.ndarray]
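The buffer-reuse caveat above is easy to trip over. The toy runner below (a hypothetical stand-in, not a real Polygraphy class) reuses one output buffer across calls, showing why a deep copy is needed to preserve outputs from an earlier inference:

```python
import copy

class BufferReusingRunner:
    """Toy stand-in: reuses one output buffer across infer() calls,
    mimicking the buffer-reuse behavior described above."""
    def __init__(self):
        self._buffer = [0.0]

    def infer(self, feed_dict):
        # Overwrites the same buffer on every call.
        self._buffer[0] = feed_dict["x"]
        return {"y": self._buffer}

runner = BufferReusingRunner()
first = runner.infer({"x": 1.0})
saved = copy.deepcopy(first)   # safe snapshot of the first outputs
second = runner.infer({"x": 2.0})

# Without the deep copy, 'first' now aliases the reused buffer:
assert first["y"][0] == 2.0
# The deep copy still holds the original value:
assert saved["y"][0] == 1.0
```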

last_inference_time()

Returns the total inference time in seconds required during the last call to infer().

Must be called only after activate() and before deactivate().

Returns:

The time in seconds, or None if runtime was not measured by the runner.

Return type:

Optional[float]
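A sketch of the contract this implies, using a hypothetical stand-in class (TimedRunner is not part of Polygraphy): the method returns None until a measured inference has run, and a non-negative duration afterwards.

```python
import time

class TimedRunner:
    """Toy stand-in illustrating the last_inference_time() contract."""
    def __init__(self):
        self.inference_time = None  # None until infer() has measured a run

    def infer(self, feed_dict):
        start = time.perf_counter()
        outputs = {name: value for name, value in feed_dict.items()}
        self.inference_time = time.perf_counter() - start
        return outputs

    def last_inference_time(self):
        # None is returned when runtime was not measured (e.g. no infer() call yet).
        return self.inference_time

runner = TimedRunner()
assert runner.last_inference_time() is None
runner.infer({"x": 1})
assert runner.last_inference_time() >= 0.0
```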

deactivate()

Deactivate the runner. For example, this may involve freeing CPU or GPU memory.

Generally, you should use a context manager instead of manually activating and deactivating. For example:

with RunnerType(...) as runner:
    runner.infer(...)
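If a context manager is impractical (for example, when the runner's lifetime spans several functions), a try/finally block keeps manual activation balanced. The sketch below uses a hypothetical stand-in class (ToyRunner is not a real Polygraphy runner):

```python
class ToyRunner:
    """Minimal stand-in with activate()/deactivate(), for illustration only."""
    def __init__(self):
        self.is_active = False

    def activate(self):
        self.is_active = True

    def deactivate(self):
        self.is_active = False

    def infer(self, feed_dict):
        return dict(feed_dict)

runner = ToyRunner()
runner.activate()
try:
    outputs = runner.infer({"x": 3})
finally:
    # deactivate() runs even if infer() raises, mirroring __exit__().
    runner.deactivate()
```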