Runners
Module: polygraphy.backend.onnxrt
- class OnnxrtRunner(sess, name=None)
Bases: BaseRunner
Runs inference using an ONNX-Runtime inference session.
- Parameters:
sess (Union[onnxruntime.InferenceSession, Callable() -> onnxruntime.InferenceSession]) – An ONNX-Runtime inference session or a callable that returns one.
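Because `sess` may be either a session or a callable that returns one, building the session can be deferred until the runner is activated. The sketch below shows only that resolution step. `FakeSession` is a hypothetical stand-in for `onnxruntime.InferenceSession` (so the sketch runs without ONNX-Runtime), and `LazySessionHolder` is hypothetical as well; neither is Polygraphy's implementation. In real code, a Polygraphy loader such as `SessionFromOnnx` is typically what gets passed as the callable.

```python
from typing import Callable, Optional, Union


class FakeSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path


SessionOrFactory = Union[FakeSession, Callable[[], FakeSession]]


class LazySessionHolder:
    """Hypothetical holder that resolves a session (or a session factory) on activate()."""

    def __init__(self, sess: SessionOrFactory) -> None:
        self._sess = sess
        self.sess: Optional[FakeSession] = None

    def activate(self) -> None:
        # Call the argument if it is a factory; otherwise use the session object as-is.
        self.sess = self._sess() if callable(self._sess) else self._sess


build_count = 0


def build_session() -> FakeSession:
    """Hypothetical factory that counts how often it is invoked."""
    global build_count
    build_count += 1
    return FakeSession("model.onnx")


lazy = LazySessionHolder(build_session)           # nothing is built yet
eager = LazySessionHolder(FakeSession("eager.onnx"))
lazy.activate()                                    # the factory runs here, once
eager.activate()
```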
- infer_impl(feed_dict)
Implementation for running inference with ONNX-Runtime. Do not call this method directly; use infer() instead, which will forward unrecognized arguments to this method.
- Parameters:
feed_dict (OrderedDict[str, Union[numpy.ndarray, torch.Tensor]]) – A mapping of input tensor names to the corresponding input NumPy arrays or PyTorch tensors. If PyTorch tensors are provided in the feed_dict, the outputs are also returned as PyTorch tensors.
- Returns:
A mapping of output tensor names to corresponding output NumPy arrays or PyTorch tensors.
- Return type:
OrderedDict[str, Union[numpy.ndarray, torch.Tensor]]
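The split between infer() and infer_impl() follows a template-method pattern: the public method does the shared work and forwards any extra arguments to the runner-specific implementation. A minimal sketch with hypothetical `SketchBaseRunner` and `ScalingRunner` classes; the `scale` keyword is invented for illustration and is not an `OnnxrtRunner` option.

```python
from collections import OrderedDict
from collections.abc import Mapping


class SketchBaseRunner:
    """Hypothetical base class illustrating the infer() / infer_impl() split."""

    def infer(self, feed_dict: Mapping, check_inputs: bool = True, *args, **kwargs):
        if check_inputs and not isinstance(feed_dict, Mapping):
            raise TypeError("feed_dict must map input names to values")
        # Extra positional and keyword arguments are forwarded untouched.
        return self.infer_impl(feed_dict, *args, **kwargs)

    def infer_impl(self, feed_dict, *args, **kwargs):
        raise NotImplementedError


class ScalingRunner(SketchBaseRunner):
    """Toy subclass whose infer_impl() accepts a runner-specific `scale` keyword."""

    def infer_impl(self, feed_dict, scale: int = 1):
        return OrderedDict((f"{name}_out", value * scale) for name, value in feed_dict.items())


# `scale` is not a parameter of infer(), so it is forwarded to infer_impl().
outputs = ScalingRunner().infer(OrderedDict(x=2), scale=3)
```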
- __enter__()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
- __exit__(exc_type, exc_value, traceback)
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
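The context-manager protocol can be implemented by delegating to activate() and deactivate(), so cleanup also happens when inference raises. A minimal sketch with a hypothetical `SketchRunner` that records calls instead of managing real memory:

```python
class SketchRunner:
    """Hypothetical runner: __enter__/__exit__ delegate to activate()/deactivate()."""

    def __init__(self) -> None:
        self.events = []

    def activate(self) -> None:
        self.events.append("activate")      # e.g. allocate CPU/GPU memory

    def deactivate(self) -> None:
        self.events.append("deactivate")    # e.g. free CPU/GPU memory

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()  # runs on normal exit and when the body raises
        return False       # do not suppress exceptions


runner = SketchRunner()
try:
    with runner:
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass  # the exception propagates, but deactivate() has already run
```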
- activate()
Activate the runner for inference. For example, this may involve allocating CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
- deactivate()
Deactivate the runner. For example, this may involve freeing CPU or GPU memory.
Generally, you should use a context manager instead of manually activating and deactivating. For example:
    with RunnerType(...) as runner:
        runner.infer(...)
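When a context manager is not convenient, the same guarantee can be obtained by pairing activate() and deactivate() with `try`/`finally`. A sketch using a hypothetical `ManualRunner`:

```python
class ManualRunner:
    """Hypothetical runner used to show manual activate()/deactivate()."""

    def __init__(self) -> None:
        self.is_active = False

    def activate(self) -> None:
        self.is_active = True   # e.g. allocate buffers

    def deactivate(self) -> None:
        self.is_active = False  # e.g. free buffers

    def infer(self, feed_dict):
        return {name: value * 10 for name, value in feed_dict.items()}


runner = ManualRunner()
runner.activate()
try:
    outputs = runner.infer({"x": 4})
finally:
    # Equivalent to what leaving a `with` block does: always deactivate.
    runner.deactivate()
```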
- get_input_metadata(use_numpy_dtypes=None)
Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by None. Must be called only after activate() and before deactivate().
- Parameters:
use_numpy_dtypes (bool) – [DEPRECATED] Whether to return NumPy data types instead of Polygraphy DataTypes. This is provided to retain backwards compatibility. In the future, this parameter will be removed and Polygraphy DataTypes will always be returned. These can be converted to NumPy data types by calling the numpy() method. Defaults to True.
- Returns:
Input names, shapes, and data types.
- Return type:
TensorMetadata
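Because input shapes may contain None for dynamic dimensions, a caller that synthesizes inputs has to choose concrete sizes for those dimensions. A minimal sketch, assuming the metadata has been reduced to a plain dict of name to shape (the values are hypothetical; the real method returns Polygraphy's own metadata object), with a hypothetical helper `concretize_shape`:

```python
from typing import Dict, Optional, Sequence, Tuple


def concretize_shape(shape: Sequence[Optional[int]], fill: int = 1) -> Tuple[int, ...]:
    """Replace every dynamic dimension (None) with `fill`; keep static dimensions."""
    return tuple(fill if dim is None else int(dim) for dim in shape)


# Hypothetical metadata reduced to name -> shape; None marks a dynamic dimension.
input_shapes: Dict[str, Tuple[Optional[int], ...]] = {
    "images": (None, 3, 224, 224),
    "lengths": (None,),
}
concrete = {name: concretize_shape(shape, fill=8) for name, shape in input_shapes.items()}
```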
- infer(feed_dict, check_inputs=True, *args, **kwargs)
Runs inference using the provided feed_dict.
Must be called only after activate() and before deactivate().
NOTE: Some runners may accept additional parameters in infer(). For details on these, see the documentation for their infer_impl() methods.
- Parameters:
feed_dict (OrderedDict[str, numpy.ndarray]) – A mapping of input tensor names to corresponding input NumPy arrays.
check_inputs (bool) – Whether to check that the provided feed_dict includes the expected inputs with the expected data types and shapes. Disabling this may improve performance. Defaults to True.
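The check controlled by check_inputs compares the feed against the model's expected inputs. The sketch below shows the kind of comparison involved, treating None in an expected shape as a wildcard. It is an illustration, not Polygraphy's exact rules: `check_feed`, `shape_matches`, and the `(dtype string, shape)` format are all hypothetical, and a real feed_dict holds arrays from which the dtype and shape would be read.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical formats for this sketch only.
ExpectedInputs = Dict[str, Tuple[str, Sequence[Optional[int]]]]  # name -> (dtype, shape with None wildcards)
FeedSummary = Dict[str, Tuple[str, Sequence[int]]]               # name -> (dtype, concrete shape)


def shape_matches(actual: Sequence[int], expected: Sequence[Optional[int]]) -> bool:
    """Ranks must agree; each static expected dimension must match exactly."""
    if len(actual) != len(expected):
        return False
    return all(exp is None or act == exp for act, exp in zip(actual, expected))


def check_feed(feed: FeedSummary, expected: ExpectedInputs) -> List[str]:
    """Return human-readable problems; an empty list means the feed looks valid."""
    problems: List[str] = []
    missing = sorted(set(expected) - set(feed))
    if missing:
        problems.append(f"missing inputs: {missing}")
    for name, (dtype, shape) in feed.items():
        if name not in expected:
            problems.append(f"unexpected input: {name}")
            continue
        exp_dtype, exp_shape = expected[name]
        if dtype != exp_dtype:
            problems.append(f"{name}: dtype {dtype} != expected {exp_dtype}")
        if not shape_matches(shape, exp_shape):
            problems.append(f"{name}: shape {tuple(shape)} incompatible with {tuple(exp_shape)}")
    return problems
```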
- Returns:
A mapping of output tensor names to their corresponding NumPy arrays.
IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with copy.deepcopy(outputs).
- Return type:
OrderedDict[str, numpy.ndarray]
- inference_time
The time required to run inference in seconds.
- Type:
float
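The buffer-reuse warning on infer() can be demonstrated with a toy runner that writes every result into the same output object. `BufferReusingRunner` is hypothetical, and lists stand in for NumPy arrays so the sketch has no dependencies.

```python
import copy
from collections import OrderedDict


class BufferReusingRunner:
    """Hypothetical runner that writes every result into one shared output buffer."""

    def __init__(self) -> None:
        self._outputs = OrderedDict(y=[0])  # a list stands in for a NumPy array

    def infer(self, feed_dict):
        self._outputs["y"][0] = feed_dict["x"] * 2  # overwrite the buffer in place
        return self._outputs


runner = BufferReusingRunner()
aliased = [runner.infer({"x": i}) for i in range(3)]                   # three references to one buffer
snapshots = [copy.deepcopy(runner.infer({"x": i})) for i in range(3)]  # independent copies
```

In this sketch a shallow copy such as `dict(outputs)` would not be enough: it copies the mapping but still shares the underlying buffers, which is why a deep copy is needed.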
- last_inference_time()
Returns the total inference time, in seconds, required during the last call to infer(). Must be called only after activate() and before deactivate().
- Returns:
The time in seconds, or None if runtime was not measured by the runner.
- Return type:
float
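One way a runner might populate the value reported by last_inference_time() is to time the implementation call with a monotonic clock. A minimal sketch, assuming timing wraps the whole infer_impl() call; real runners may time a narrower region, or not measure at all, in which case None is returned. `TimedRunner` is hypothetical.

```python
import time
from typing import Optional


class TimedRunner:
    """Hypothetical runner that records how long its last inference took."""

    def __init__(self) -> None:
        self.inference_time: Optional[float] = None  # None until something is measured

    def infer_impl(self, feed_dict):
        return dict(feed_dict)  # identity "model" for the sketch

    def infer(self, feed_dict):
        start = time.perf_counter()  # monotonic, high-resolution clock
        outputs = self.infer_impl(feed_dict)
        self.inference_time = time.perf_counter() - start
        return outputs

    def last_inference_time(self) -> Optional[float]:
        return self.inference_time


runner = TimedRunner()
before = runner.last_inference_time()  # None: nothing measured yet
runner.infer({"x": 1})
after = runner.last_inference_time()   # elapsed seconds as a float
```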
- is_active
Whether this runner has been activated, either via context manager or by calling activate().
- Type:
bool
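is_active gives a runner a simple way to enforce the "only after activate() and before deactivate()" rule stated for methods such as infer(). A sketch with a hypothetical `GuardedRunner`; raising `RuntimeError` is this sketch's choice, not necessarily what Polygraphy does.

```python
class GuardedRunner:
    """Hypothetical runner that checks is_active before running inference."""

    def __init__(self) -> None:
        self.is_active = False

    def activate(self) -> None:
        self.is_active = True

    def deactivate(self) -> None:
        self.is_active = False

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()

    def _require_active(self, method: str) -> None:
        if not self.is_active:
            raise RuntimeError(f"{method}() called on an inactive runner; call activate() first")

    def infer(self, feed_dict):
        self._require_active("infer")
        return dict(feed_dict)


runner = GuardedRunner()
try:
    runner.infer({"x": 1})
except RuntimeError as err:
    message = str(err)  # infer() before activate() is rejected

with runner:
    outputs = runner.infer({"x": 1})
```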