Loaders

Module: polygraphy.backend.trt

class LoadPlugins(plugins=None, obj=None)

Bases: polygraphy.backend.base.loader.BaseLoader

A passthrough loader that loads plugins from the specified paths. Passthrough means that it can wrap any other loader. Wrapping another loader lets you control the order of execution under lazy evaluation, so that the plugins are loaded before the wrapped loader runs.

For immediate evaluation, use load_plugins instead:

load_plugins(plugins=["/path/to/my/plugin.so", "/path/to/my/other_plugin.so"])

Loads plugins from the specified paths.

Parameters
  • plugins (List[str]) – A list of paths to plugin libraries to load before inference.

  • obj (object) – An object or callable to return or call respectively. If obj is callable, extra parameters will be forwarded to obj. If obj is not callable, it will be returned.

call_impl(*args, **kwargs)
Returns

The provided obj argument, or its return value if it is callable. Returns None if obj was not set.

Return type

object

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
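The passthrough contract described above can be modeled in a few lines of plain Python. This is a stand-in sketch, not Polygraphy's implementation: `PassthroughSketch` and `load_log` are hypothetical names, and recording paths in a list replaces the real plugin loading. It shows only the documented handling of obj.

```python
class PassthroughSketch:
    """Hypothetical stand-in that mimics the documented contract of LoadPlugins."""

    def __init__(self, plugins=None, obj=None):
        self.plugins = plugins or []
        self.obj = obj
        self.load_log = []  # records paths instead of loading real libraries

    def __call__(self, *args, **kwargs):
        # The side effect happens first, when the loader is invoked ...
        self.load_log.extend(self.plugins)
        # ... and only then is the wrapped object called or returned.
        if callable(self.obj):
            return self.obj(*args, **kwargs)
        return self.obj


wrapped = PassthroughSketch(plugins=["/path/to/my/plugin.so"], obj=lambda x: x + 1)
plain = PassthroughSketch(plugins=[], obj="not callable")
empty = PassthroughSketch()
```

Constructing the wrapper does nothing by itself. The recorded "load" and the call to obj both happen only when the wrapper is invoked, which is the ordering guarantee that lazy evaluation relies on.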

class CreateNetwork(explicit_precision=None, explicit_batch=None)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that creates an empty TensorRT network.

Creates an empty TensorRT network.

Parameters
  • explicit_precision (bool) – Whether to create the network with explicit precision enabled. Defaults to False.

  • explicit_batch (bool) – Whether to create the network with explicit batch mode. Defaults to True.

call_impl()
Returns

The builder and empty network.

Return type

(trt.Builder, trt.INetworkDefinition)

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
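As a rough illustration, the empty network returned by this loader can be populated with the regular TensorRT network-definition API. The helper name `build_identity_network`, the tensor name, and the shape below are arbitrary choices for the sketch. Running the function requires TensorRT and Polygraphy to be installed.

```python
def build_identity_network():
    """Creates an empty network, then adds a single identity layer to it."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    import tensorrt as trt
    from polygraphy.backend.trt import CreateNetwork

    # Construct the loader with its defaults, then invoke it.
    builder, network = CreateNetwork()()

    # Standard TensorRT calls; the name "x" and the shape are placeholders.
    inp = network.add_input(name="x", dtype=trt.float32, shape=(1, 3))
    identity = network.add_identity(inp)
    network.mark_output(identity.get_output(0))
    return builder, network
```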

class NetworkFromOnnxBytes(model_bytes, explicit_precision=None)

Bases: polygraphy.backend.trt.loader.BaseNetworkFromOnnx

Functor that parses an ONNX model to create a trt.INetworkDefinition.

Parses an ONNX model.

Parameters
  • model_bytes (Union[bytes, Callable() -> bytes]) – A serialized ONNX model or a callable that returns one.

  • explicit_precision (bool) – Whether to construct the TensorRT network with explicit precision enabled.

call_impl()
Returns

A TensorRT network, as well as the builder used to create it, and the parser used to populate it.

Return type

(trt.Builder, trt.INetworkDefinition, trt.OnnxParser)

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
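Because model_bytes may be a callable, reading the model file can itself be deferred until the loader is invoked. The sketch below illustrates that idea. `deferred_bytes` and `network_loader_from_file` are hypothetical helper names, not part of Polygraphy.

```python
def deferred_bytes(path):
    """Returns a zero-argument callable; the file is read only when it is called."""

    def read():
        with open(path, "rb") as f:
            return f.read()

    return read


def network_loader_from_file(path):
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy.backend.trt import NetworkFromOnnxBytes

    # Nothing is read or parsed here; both steps run when the returned loader is invoked.
    return NetworkFromOnnxBytes(deferred_bytes(path))
```

Invoking the returned loader, for example `builder, network, parser = network_loader_from_file("model.onnx")()`, reads the file and parses it in one step. The file name is a placeholder.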

class NetworkFromOnnxPath(path, explicit_precision=None)

Bases: polygraphy.backend.trt.loader.BaseNetworkFromOnnx

Functor that parses an ONNX model to create a trt.INetworkDefinition. This loader supports models with weights stored in an external location.

Parses an ONNX model from a file.

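Parameters
  • path (str) – The path from which to load the model.

  • explicit_precision (bool) – Whether to construct the TensorRT network with explicit precision enabled.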

call_impl()
Returns

A TensorRT network, as well as the builder used to create it, and the parser used to populate it.

Return type

(trt.Builder, trt.INetworkDefinition, trt.OnnxParser)

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.

class ModifyNetworkOutputs(network, outputs=None, exclude_outputs=None)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that modifies outputs in a TensorRT INetworkDefinition.

Modifies outputs in a TensorRT INetworkDefinition.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • outputs (Sequence[str]) – Names of tensors to mark as outputs. If provided, this will override the outputs already marked in the network. If a value of constants.MARK_ALL is used instead of a list, all tensors in the network are marked.

  • exclude_outputs (Sequence[str]) – Names of tensors to exclude as outputs. This can be useful in conjunction with outputs=constants.MARK_ALL to omit outputs.

call_impl()
Returns

The modified network.

Return type

trt.INetworkDefinition

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
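A minimal sketch of combining outputs=constants.MARK_ALL with exclude_outputs, chained onto an ONNX parser loader. The helper name `mark_all_but` is hypothetical, and the import path `polygraphy.constants` is an assumption about where the constants module lives in this release.

```python
def mark_all_but(onnx_path, excluded):
    """Returns a loader that marks every tensor as an output except those in `excluded`."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy import constants  # assumed location of MARK_ALL
    from polygraphy.backend.trt import ModifyNetworkOutputs, NetworkFromOnnxPath

    parse_network = NetworkFromOnnxPath(onnx_path)
    # The parser loader is passed uninvoked, so parsing is deferred as well.
    return ModifyNetworkOutputs(
        parse_network,
        outputs=constants.MARK_ALL,
        exclude_outputs=excluded,
    )
```

This pattern is typically useful when comparing intermediate tensors for debugging, where a few tensors need to be left out.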

class CreateConfig(max_workspace_size=None, tf32=None, fp16=None, int8=None, profiles=None, calibrator=None, strict_types=None, load_timing_cache=None, algorithm_selector=None, sparse_weights=None, tactic_sources=None, restricted=None, use_dla=None, allow_gpu_fallback=None)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that creates a TensorRT IBuilderConfig.

Creates a TensorRT IBuilderConfig that can be used by EngineFromNetwork.

Parameters
  • max_workspace_size (int) – The maximum workspace size, in bytes, when building the engine. Defaults to 16 MiB.

  • tf32 (bool) – Whether to build the engine with TF32 precision enabled. Defaults to False.

  • fp16 (bool) – Whether to build the engine with FP16 precision enabled. Defaults to False.

  • int8 (bool) – Whether to build the engine with INT8 precision enabled. Defaults to False.

  • profiles (List[Profile]) – A list of optimization profiles to add to the configuration. Only needed for networks with dynamic input shapes. If this is omitted for a network with dynamic shapes, a default profile is created, where dynamic dimensions are replaced with Polygraphy’s DEFAULT_SHAPE_VALUE (defined in constants.py). A partially populated profile will be automatically filled using values from Profile.fill_defaults(). See Profile for details.

  • calibrator (trt.IInt8Calibrator) – An int8 calibrator. Only required in int8 mode when the network does not have explicit precision. For networks with dynamic shapes, the last profile provided (or default profile if no profiles are provided) is used during calibration.

  • strict_types (bool) – Whether to enable strict types in the builder. This will constrain the builder from using data types other than those specified in the network. Defaults to False.

  • load_timing_cache (Union[str, file-like]) – A path or file-like object from which to load a tactic timing cache. Providing a tactic timing cache can speed up the engine building process. Caches can be generated while building an engine with, for example, EngineFromNetwork.

  • algorithm_selector (trt.IAlgorithmSelector) – An algorithm selector. Allows the user to control how tactics are selected instead of letting TensorRT select them automatically.

  • sparse_weights (bool) – Whether to enable optimizations for sparse weights. Defaults to False.

  • tactic_sources (List[trt.TacticSource]) – The tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Use an empty list to disable all tactic sources. Defaults to TensorRT’s default tactic sources.

  • restricted (bool) – Whether to enable safety scope checking in the builder. This will check if the network and builder configuration are compatible with safety scope. Defaults to False.

  • use_dla (bool) – [EXPERIMENTAL] Whether to enable DLA as the default device type. Defaults to False.

  • allow_gpu_fallback (bool) – [EXPERIMENTAL] When DLA is enabled, whether to allow layers to fall back to GPU if they cannot be run on DLA. Has no effect if DLA is not enabled. Defaults to False.

call_impl(builder, network)
Parameters
  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns

The TensorRT builder configuration.

Return type

trt.IBuilderConfig

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
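Since max_workspace_size is given in bytes, it helps to spell out the arithmetic. The sketch below computes the documented 16 MiB default and builds a config loader with a larger workspace and FP16 enabled. The 1 GiB value and the helper name `fp16_config_loader` are arbitrary choices for illustration.

```python
MIB = 1024 * 1024              # bytes per mebibyte
DEFAULT_WORKSPACE = 16 * MIB   # the documented default of 16 MiB, in bytes
LARGER_WORKSPACE = 1 << 30     # 1 GiB, an arbitrary example value


def fp16_config_loader(timing_cache_path=None):
    """Returns a CreateConfig loader; the config is created when the loader is invoked."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy.backend.trt import CreateConfig

    return CreateConfig(
        max_workspace_size=LARGER_WORKSPACE,
        fp16=True,
        load_timing_cache=timing_cache_path,  # None leaves the timing cache unset
    )
```

Note that the resulting loader still needs a builder and a network when it is invoked, as described under call_impl.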

class EngineBytesFromNetwork(network, config=None, save_timing_cache=None)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that uses a TensorRT INetworkDefinition to build a serialized engine.

Builds and serializes a TensorRT engine.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • config (Callable(trt.Builder, trt.INetworkDefinition) -> trt.IBuilderConfig) – A TensorRT builder configuration or a callable that returns one. If not supplied, a CreateConfig instance with default parameters is used.

  • save_timing_cache (Union[str, file-like]) – A path or file-like object at which to save a tactic timing cache. Any existing cache will be overwritten. Note that if the provided config includes a tactic timing cache, the data from that cache will be copied into the new cache.

call_impl()
Returns

The serialized engine that was created.

Return type

bytes

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.

class EngineFromNetwork(network, config=None, save_timing_cache=None)

Bases: polygraphy.backend.trt.loader.EngineBytesFromNetwork

Similar to EngineBytesFromNetwork, but returns an ICudaEngine instance instead of a serialized engine.

Builds a TensorRT engine.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • config (Callable(trt.Builder, trt.INetworkDefinition) -> trt.IBuilderConfig) – A TensorRT builder configuration or a callable that returns one. If not supplied, a CreateConfig instance with default parameters is used.

  • save_timing_cache (Union[str, file-like]) – A path or file-like object at which to save a tactic timing cache. Any existing cache will be overwritten. Note that if the provided config includes a tactic timing cache, the data from that cache will be copied into the new cache.

call_impl()
Returns

The engine that was created.

Return type

trt.ICudaEngine

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
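The loaders documented so far are designed to be chained. A minimal sketch of a lazy build pipeline follows, assuming an ONNX model on disk. The helper name `lazy_engine_loader` and its argument names are hypothetical.

```python
def lazy_engine_loader(onnx_path, cache_path=None):
    """Composes parser, config, and engine loaders without running any of them."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy.backend.trt import CreateConfig, EngineFromNetwork, NetworkFromOnnxPath

    parse_network = NetworkFromOnnxPath(onnx_path)
    create_config = CreateConfig(fp16=True)

    # Each argument is an uninvoked loader. The engine loader calls them
    # when it is itself invoked, so no parsing or building happens here.
    return EngineFromNetwork(
        parse_network,
        config=create_config,
        save_timing_cache=cache_path,
    )
```

Calling the returned loader, for example `engine = lazy_engine_loader("model.onnx")()`, triggers parsing, configuration, and the build. The file name is a placeholder.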

class EngineFromBytes(serialized_engine)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that deserializes an engine from a buffer.

Deserializes an engine from a buffer.

Parameters

serialized_engine (Union[Union[str, bytes], Callable() -> Union[str, bytes]]) – The serialized engine bytes or a callable that returns them.

call_impl()
Returns

The deserialized engine.

Return type

trt.ICudaEngine

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.

class BytesFromEngine(engine)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that serializes an engine.

Serializes an engine.

Parameters

engine (Union[trt.ICudaEngine, Callable() -> trt.ICudaEngine]) – An engine or a callable that returns one.

call_impl()
Returns

The serialized engine.

Return type

bytes

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.

class SaveEngine(engine, path)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that saves an engine to the provided path.

Saves an engine to the provided path.

Parameters
  • engine (Union[trt.ICudaEngine, Callable() -> trt.ICudaEngine]) – An engine or a callable that returns one.

  • path (str) – The path at which to save the engine.

call_impl()
Returns

The engine that was saved.

Return type

trt.ICudaEngine

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
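SaveEngine and EngineFromBytes together give a save and reload round trip. The sketch below is one way this might look. `save_then_reload` and `read_engine` are hypothetical names, and the engine file is read with plain `open` rather than any Polygraphy helper.

```python
def save_then_reload(engine, path):
    """Saves `engine` (an engine or a callable returning one), then loads it back from disk."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy.backend.trt import EngineFromBytes, SaveEngine

    # Invoke the loader immediately so the file exists before it is read back.
    SaveEngine(engine, path=path)()

    def read_engine():
        with open(path, "rb") as f:
            return f.read()

    # EngineFromBytes accepts a callable that returns the serialized bytes.
    return EngineFromBytes(read_engine)()
```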

class OnnxLikeFromNetwork(network)

Bases: polygraphy.backend.base.loader.BaseLoader

Functor that creates an ONNX-like, but not valid ONNX, model based on a TensorRT network.

[HIGHLY EXPERIMENTAL] Creates an ONNX-like, but not valid ONNX, model from a TensorRT network. This uses the ONNX format, but generates nodes that are not valid ONNX operators. Hence, this should be used only for visualization or debugging purposes.

The resulting model does not include enough information to faithfully reconstruct the TensorRT network, but does preserve the structure of the network and many of the layer parameters.

Parameters

network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

call_impl()
Returns

The ONNX-like, but not valid ONNX, representation of the TensorRT network.

Return type

onnx.ModelProto

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
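For visualization, the returned onnx.ModelProto can be written to disk with `onnx.save` and opened in a model viewer. A minimal sketch, with the hypothetical helper name `export_for_visualization` and placeholder paths supplied by the caller:

```python
def export_for_visualization(onnx_path, out_path):
    """Parses an ONNX model into a TensorRT network and writes its ONNX-like form to out_path."""
    # Imported lazily so this sketch can be defined without TensorRT or ONNX installed.
    import onnx
    from polygraphy.backend.trt import NetworkFromOnnxPath, OnnxLikeFromNetwork

    onnx_like = OnnxLikeFromNetwork(NetworkFromOnnxPath(onnx_path))()
    # The result is not valid ONNX, so it is saved for inspection only.
    onnx.save(onnx_like, out_path)
    return out_path
```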

load_plugins(plugins=None, obj=None, *args, **kwargs)

Immediately evaluated functional variant of LoadPlugins.

Loads plugins from the specified paths.

Parameters
  • plugins (List[str]) – A list of paths to plugin libraries to load before inference.

  • obj (object) – An object or callable to return or call respectively. If obj is callable, extra parameters will be forwarded to obj. If obj is not callable, it will be returned.

Returns

The provided obj argument, or its return value if it is callable. Returns None if obj was not set.

Return type

object

create_network(explicit_precision=None, explicit_batch=None)

Immediately evaluated functional variant of CreateNetwork.

Creates an empty TensorRT network.

Parameters
  • explicit_precision (bool) – Whether to create the network with explicit precision enabled. Defaults to False.

  • explicit_batch (bool) – Whether to create the network with explicit batch mode. Defaults to True.

Returns

The builder and empty network.

Return type

(trt.Builder, trt.INetworkDefinition)

network_from_onnx_bytes(model_bytes, explicit_precision=None)

Immediately evaluated functional variant of NetworkFromOnnxBytes.

Parses an ONNX model.

Parameters
  • model_bytes (Union[bytes, Callable() -> bytes]) – A serialized ONNX model or a callable that returns one.

  • explicit_precision (bool) – Whether to construct the TensorRT network with explicit precision enabled.

Returns

A TensorRT network, as well as the builder used to create it, and the parser used to populate it.

Return type

(trt.Builder, trt.INetworkDefinition, trt.OnnxParser)

network_from_onnx_path(path, explicit_precision=None)

Immediately evaluated functional variant of NetworkFromOnnxPath.

Parses an ONNX model from a file.

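Parameters
  • path (str) – The path from which to load the model.

  • explicit_precision (bool) – Whether to construct the TensorRT network with explicit precision enabled.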

Returns

A TensorRT network, as well as the builder used to create it, and the parser used to populate it.

Return type

(trt.Builder, trt.INetworkDefinition, trt.OnnxParser)

modify_network_outputs(network, outputs=None, exclude_outputs=None)

Immediately evaluated functional variant of ModifyNetworkOutputs.

Modifies outputs in a TensorRT INetworkDefinition.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • outputs (Sequence[str]) – Names of tensors to mark as outputs. If provided, this will override the outputs already marked in the network. If a value of constants.MARK_ALL is used instead of a list, all tensors in the network are marked.

  • exclude_outputs (Sequence[str]) – Names of tensors to exclude as outputs. This can be useful in conjunction with outputs=constants.MARK_ALL to omit outputs.

Returns

The modified network.

Return type

trt.INetworkDefinition

create_config(builder, network, max_workspace_size=None, tf32=None, fp16=None, int8=None, profiles=None, calibrator=None, strict_types=None, load_timing_cache=None, algorithm_selector=None, sparse_weights=None, tactic_sources=None, restricted=None, use_dla=None, allow_gpu_fallback=None)

Immediately evaluated functional variant of CreateConfig.

Creates a TensorRT IBuilderConfig that can be used by EngineFromNetwork.

Parameters
  • max_workspace_size (int) – The maximum workspace size, in bytes, when building the engine. Defaults to 16 MiB.

  • tf32 (bool) – Whether to build the engine with TF32 precision enabled. Defaults to False.

  • fp16 (bool) – Whether to build the engine with FP16 precision enabled. Defaults to False.

  • int8 (bool) – Whether to build the engine with INT8 precision enabled. Defaults to False.

  • profiles (List[Profile]) – A list of optimization profiles to add to the configuration. Only needed for networks with dynamic input shapes. If this is omitted for a network with dynamic shapes, a default profile is created, where dynamic dimensions are replaced with Polygraphy’s DEFAULT_SHAPE_VALUE (defined in constants.py). A partially populated profile will be automatically filled using values from Profile.fill_defaults(). See Profile for details.

  • calibrator (trt.IInt8Calibrator) – An int8 calibrator. Only required in int8 mode when the network does not have explicit precision. For networks with dynamic shapes, the last profile provided (or default profile if no profiles are provided) is used during calibration.

  • strict_types (bool) – Whether to enable strict types in the builder. This will constrain the builder from using data types other than those specified in the network. Defaults to False.

  • load_timing_cache (Union[str, file-like]) – A path or file-like object from which to load a tactic timing cache. Providing a tactic timing cache can speed up the engine building process. Caches can be generated while building an engine with, for example, EngineFromNetwork.

  • algorithm_selector (trt.IAlgorithmSelector) – An algorithm selector. Allows the user to control how tactics are selected instead of letting TensorRT select them automatically.

  • sparse_weights (bool) – Whether to enable optimizations for sparse weights. Defaults to False.

  • tactic_sources (List[trt.TacticSource]) – The tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Use an empty list to disable all tactic sources. Defaults to TensorRT’s default tactic sources.

  • restricted (bool) – Whether to enable safety scope checking in the builder. This will check if the network and builder configuration are compatible with safety scope. Defaults to False.

  • use_dla (bool) – [EXPERIMENTAL] Whether to enable DLA as the default device type. Defaults to False.

  • allow_gpu_fallback (bool) – [EXPERIMENTAL] When DLA is enabled, whether to allow layers to fall back to GPU if they cannot be run on DLA. Has no effect if DLA is not enabled. Defaults to False.

  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns

The TensorRT builder configuration.

Return type

trt.IBuilderConfig

engine_bytes_from_network(network, config=None, save_timing_cache=None)

Immediately evaluated functional variant of EngineBytesFromNetwork.

Builds and serializes a TensorRT engine.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • config (Callable(trt.Builder, trt.INetworkDefinition) -> trt.IBuilderConfig) – A TensorRT builder configuration or a callable that returns one. If not supplied, a CreateConfig instance with default parameters is used.

  • save_timing_cache (Union[str, file-like]) – A path or file-like object at which to save a tactic timing cache. Any existing cache will be overwritten. Note that if the provided config includes a tactic timing cache, the data from that cache will be copied into the new cache.

Returns

The serialized engine that was created.

Return type

bytes

engine_from_network(network, config=None, save_timing_cache=None)

Immediately evaluated functional variant of EngineFromNetwork.

Builds a TensorRT engine.

Parameters
  • network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

  • config (Callable(trt.Builder, trt.INetworkDefinition) -> trt.IBuilderConfig) – A TensorRT builder configuration or a callable that returns one. If not supplied, a CreateConfig instance with default parameters is used.

  • save_timing_cache (Union[str, file-like]) – A path or file-like object at which to save a tactic timing cache. Any existing cache will be overwritten. Note that if the provided config includes a tactic timing cache, the data from that cache will be copied into the new cache.

Returns

The engine that was created.

Return type

trt.ICudaEngine
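The functional variants run each step as soon as they are called, which suits short scripts. A minimal sketch of an immediate build follows, with the hypothetical helper name `build_engine_now`. Note that create_config takes the builder and network as its first two arguments.

```python
def build_engine_now(onnx_path):
    """Parses, configures, and builds in sequence; each call does its work immediately."""
    # Imported lazily so this sketch can be defined without TensorRT installed.
    from polygraphy.backend.trt import (
        create_config,
        engine_from_network,
        network_from_onnx_path,
    )

    builder, network, parser = network_from_onnx_path(onnx_path)
    config = create_config(builder, network, fp16=True)

    # The parser is optional in the network tuple, so it is omitted here.
    # It stays referenced by the local variable until the function returns.
    return engine_from_network((builder, network), config=config)
```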

engine_from_bytes(serialized_engine)

Immediately evaluated functional variant of EngineFromBytes.

Deserializes an engine from a buffer.

Parameters

serialized_engine (Union[Union[str, bytes], Callable() -> Union[str, bytes]]) – The serialized engine bytes or a callable that returns them.

Returns

The deserialized engine.

Return type

trt.ICudaEngine

bytes_from_engine(engine)

Immediately evaluated functional variant of BytesFromEngine.

Serializes an engine.

Parameters

engine (Union[trt.ICudaEngine, Callable() -> trt.ICudaEngine]) – An engine or a callable that returns one.

Returns

The serialized engine.

Return type

bytes

save_engine(engine, path)

Immediately evaluated functional variant of SaveEngine.

Saves an engine to the provided path.

Parameters
  • engine (Union[trt.ICudaEngine, Callable() -> trt.ICudaEngine]) – An engine or a callable that returns one.

  • path (str) – The path at which to save the engine.

Returns

The engine that was saved.

Return type

trt.ICudaEngine

onnx_like_from_network(network)

Immediately evaluated functional variant of OnnxLikeFromNetwork.

[HIGHLY EXPERIMENTAL] Creates an ONNX-like, but not valid ONNX, model from a TensorRT network. This uses the ONNX format, but generates nodes that are not valid ONNX operators. Hence, this should be used only for visualization or debugging purposes.

The resulting model does not include enough information to faithfully reconstruct the TensorRT network, but does preserve the structure of the network and many of the layer parameters.

Parameters

network (Union[Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]], Callable() -> Tuple[trt.Builder, trt.INetworkDefinition, Optional[parser]]) – A tuple containing a TensorRT builder, network and optionally parser or a callable that returns one. To omit the parser, return a tuple containing just the builder and network.

Returns

The ONNX-like, but not valid ONNX, representation of the TensorRT network.

Return type

onnx.ModelProto