Config

Module: polygraphy.backend.trt

class CreateConfig(tf32=None, fp16=None, int8=None, profiles=None, calibrator=None, precision_constraints=None, load_timing_cache=None, algorithm_selector=None, sparse_weights=None, tactic_sources=None, restricted=None, use_dla=None, allow_gpu_fallback=None, profiling_verbosity=None, memory_pool_limits=None, refittable=None, strip_plan=None, preview_features=None, engine_capability=None, direct_io=None, builder_optimization_level=None, fp8=None, hardware_compatibility_level=None, max_aux_streams=None, version_compatible=None, exclude_lean_runtime=None, quantization_flags=None, error_on_timing_cache_miss=None, bf16=None, disable_compilation_cache=None, progress_monitor=None, weight_streaming=None)[source]

Bases: BaseLoader

Functor that creates a TensorRT IBuilderConfig.

Creates a TensorRT IBuilderConfig that can be used by EngineFromNetwork.

Parameters:
  • tf32 (bool) – Whether to build the engine with TF32 precision enabled. Defaults to False.

  • fp16 (bool) – Whether to build the engine with FP16 precision enabled. Defaults to False.

  • int8 (bool) – Whether to build the engine with INT8 precision enabled. Defaults to False.

  • profiles (List[Profile]) – A list of optimization profiles to add to the configuration. Only needed for networks with dynamic input shapes. If this is omitted for a network with dynamic shapes, a default profile is created, where dynamic dimensions are replaced with Polygraphy’s DEFAULT_SHAPE_VALUE (defined in constants.py). A partially populated profile will be automatically filled using values from Profile.fill_defaults(). See Profile for details.

  • calibrator (trt.IInt8Calibrator) – An int8 calibrator. Only required in int8 mode when the network does not have explicit precision. For networks with dynamic shapes, the last profile provided (or default profile if no profiles are provided) is used during calibration.

  • precision_constraints (Optional[str]) – If set to “obey”, require that layers execute in specified precisions. If set to “prefer”, prefer that layers execute in specified precisions but allow TRT to fall back to other precisions if no implementation exists for the requested precision. Otherwise, precision constraints are ignored. Defaults to None.

  • load_timing_cache (Union[str, file-like]) – A path or file-like object from which to load a tactic timing cache. Providing a tactic timing cache can speed up the engine building process. Caches can be generated while building an engine with, for example, EngineFromNetwork. If a path is provided, the file will be locked for exclusive access so that other processes cannot update the cache while it is being read. If the file specified by the path does not exist, CreateConfig will emit a warning and fall back to using an empty timing cache.

  • algorithm_selector (trt.IAlgorithmSelector) – An algorithm selector. Allows the user to control how tactics are selected instead of letting TensorRT select them automatically.

  • sparse_weights (bool) – Whether to enable optimizations for sparse weights. Defaults to False.

  • tactic_sources (List[trt.TacticSource]) – The tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Use an empty list to disable all tactic sources. Defaults to TensorRT’s default tactic sources.

  • restricted (bool) – Whether to enable safety scope checking in the builder. This will check if the network and builder configuration are compatible with safety scope. Defaults to False.

  • use_dla (bool) – [EXPERIMENTAL] Whether to enable DLA as the default device type. Defaults to False.

  • allow_gpu_fallback (bool) – [EXPERIMENTAL] When DLA is enabled, whether to allow layers to fall back to GPU if they cannot be run on DLA. Has no effect if DLA is not enabled. Defaults to False.

  • profiling_verbosity (trt.ProfilingVerbosity) – The verbosity of NVTX annotations in the generated engine. Higher verbosity allows you to determine more information about the engine. Defaults to trt.ProfilingVerbosity.VERBOSE.

  • memory_pool_limits (Dict[trt.MemoryPoolType, int]) – Limits for different memory pools. This should be a mapping of pool types to their respective limits in bytes.

  • refittable (bool) – Enables the engine to be refitted with new weights after it is built. Defaults to False.

  • strip_plan (bool) – Strips the refittable weights from the engine plan file. Defaults to False.

  • preview_features (List[trt.PreviewFeature]) – The preview features to enable. Use an empty list to disable all preview features. Defaults to TensorRT’s default preview features.

  • engine_capability (trt.EngineCapability) – The engine capability to build for. Defaults to the default TensorRT engine capability.

  • direct_io (bool) – Whether to disallow reformatting layers at network input/output tensors with user-specified formats. Defaults to False.

  • builder_optimization_level (int) – The builder optimization level. A higher optimization level allows the optimizer to spend more time searching for optimization opportunities. The resulting engine may have better performance compared to an engine built with a lower optimization level. Refer to the TensorRT API documentation for details. Defaults to TensorRT’s default optimization level.

  • fp8 (bool) – Whether to build the engine with FP8 precision enabled. Defaults to False.

  • hardware_compatibility_level (trt.HardwareCompatibilityLevel) – The hardware compatibility level. This allows engines built on one GPU architecture to work on GPUs of other architectures. Defaults to TensorRT’s default hardware compatibility level.

  • max_aux_streams (int) – The maximum number of auxiliary streams that TensorRT is allowed to use. If the network contains operators that can run in parallel, TRT can execute them using auxiliary streams in addition to the one provided to the IExecutionContext::enqueueV3() call. The default maximum number of auxiliary streams is determined heuristically by TensorRT based on whether enabling multi-stream execution would improve performance.

  • version_compatible (bool) – Whether to build an engine that is version compatible.

  • exclude_lean_runtime (bool) – Whether to exclude the lean runtime in version compatible engines. Requires that version compatibility is enabled.

  • quantization_flags (List[trt.QuantizationFlag]) – The quantization flags to enable. Use an empty list to disable all quantization flags. Defaults to TensorRT’s default quantization flags.

  • error_on_timing_cache_miss (bool) – Emit an error when a tactic being timed is not present in the timing cache. This flag has an effect only when IBuilderConfig has an associated ITimingCache. Defaults to False.

  • bf16 (bool) – Whether to build the engine with BF16 precision enabled. Defaults to False.

  • disable_compilation_cache (bool) – Whether to disable caching JIT-compiled code. Defaults to False.

  • progress_monitor (trt.IProgressMonitor) – A progress monitor. Allows users to view engine-building progress, for example through the CLI.

  • weight_streaming (bool) – Whether to enable weight streaming for the TensorRT engine.

call_impl(builder, network)[source]
Parameters:
  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns:

The TensorRT builder configuration.

Return type:

trt.IBuilderConfig

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.
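For example, a minimal sketch of using CreateConfig in a lazy loader chain with EngineFromNetwork (assuming TensorRT is installed; "model.onnx" and the input tensor name "input" are hypothetical placeholders):

```python
from polygraphy.backend.trt import (
    CreateConfig,
    EngineFromNetwork,
    NetworkFromOnnxPath,
    Profile,
)

# An optimization profile for a network with a dynamic batch dimension
# on a tensor named "input" (hypothetical name and shapes).
profile = Profile().add(
    "input", min=(1, 3, 224, 224), opt=(8, 3, 224, 224), max=(32, 3, 224, 224)
)

# CreateConfig is a functor: no TensorRT objects are created until the
# loader chain is invoked.
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("model.onnx"),
    config=CreateConfig(fp16=True, profiles=[profile]),
)

engine = build_engine()  # The engine (and config) are built here.
```

Because the configuration is created only when `build_engine()` is called, the loader chain itself can be constructed cheaply up front and reused.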

create_config(builder, network, tf32=None, fp16=None, int8=None, profiles=None, calibrator=None, precision_constraints=None, load_timing_cache=None, algorithm_selector=None, sparse_weights=None, tactic_sources=None, restricted=None, use_dla=None, allow_gpu_fallback=None, profiling_verbosity=None, memory_pool_limits=None, refittable=None, strip_plan=None, preview_features=None, engine_capability=None, direct_io=None, builder_optimization_level=None, fp8=None, hardware_compatibility_level=None, max_aux_streams=None, version_compatible=None, exclude_lean_runtime=None, quantization_flags=None, error_on_timing_cache_miss=None, bf16=None, disable_compilation_cache=None, progress_monitor=None, weight_streaming=None)

Immediately evaluated functional variant of CreateConfig.

Creates a TensorRT IBuilderConfig that can be used by EngineFromNetwork.

Parameters:
  • tf32 (bool) – Whether to build the engine with TF32 precision enabled. Defaults to False.

  • fp16 (bool) – Whether to build the engine with FP16 precision enabled. Defaults to False.

  • int8 (bool) – Whether to build the engine with INT8 precision enabled. Defaults to False.

  • profiles (List[Profile]) – A list of optimization profiles to add to the configuration. Only needed for networks with dynamic input shapes. If this is omitted for a network with dynamic shapes, a default profile is created, where dynamic dimensions are replaced with Polygraphy’s DEFAULT_SHAPE_VALUE (defined in constants.py). A partially populated profile will be automatically filled using values from Profile.fill_defaults(). See Profile for details.

  • calibrator (trt.IInt8Calibrator) – An int8 calibrator. Only required in int8 mode when the network does not have explicit precision. For networks with dynamic shapes, the last profile provided (or default profile if no profiles are provided) is used during calibration.

  • precision_constraints (Optional[str]) – If set to “obey”, require that layers execute in specified precisions. If set to “prefer”, prefer that layers execute in specified precisions but allow TRT to fall back to other precisions if no implementation exists for the requested precision. Otherwise, precision constraints are ignored. Defaults to None.

  • load_timing_cache (Union[str, file-like]) – A path or file-like object from which to load a tactic timing cache. Providing a tactic timing cache can speed up the engine building process. Caches can be generated while building an engine with, for example, EngineFromNetwork. If a path is provided, the file will be locked for exclusive access so that other processes cannot update the cache while it is being read. If the file specified by the path does not exist, CreateConfig will emit a warning and fall back to using an empty timing cache.

  • algorithm_selector (trt.IAlgorithmSelector) – An algorithm selector. Allows the user to control how tactics are selected instead of letting TensorRT select them automatically.

  • sparse_weights (bool) – Whether to enable optimizations for sparse weights. Defaults to False.

  • tactic_sources (List[trt.TacticSource]) – The tactic sources to enable. This controls which libraries (e.g. cudnn, cublas, etc.) TensorRT is allowed to load tactics from. Use an empty list to disable all tactic sources. Defaults to TensorRT’s default tactic sources.

  • restricted (bool) – Whether to enable safety scope checking in the builder. This will check if the network and builder configuration are compatible with safety scope. Defaults to False.

  • use_dla (bool) – [EXPERIMENTAL] Whether to enable DLA as the default device type. Defaults to False.

  • allow_gpu_fallback (bool) – [EXPERIMENTAL] When DLA is enabled, whether to allow layers to fall back to GPU if they cannot be run on DLA. Has no effect if DLA is not enabled. Defaults to False.

  • profiling_verbosity (trt.ProfilingVerbosity) – The verbosity of NVTX annotations in the generated engine. Higher verbosity allows you to determine more information about the engine. Defaults to trt.ProfilingVerbosity.VERBOSE.

  • memory_pool_limits (Dict[trt.MemoryPoolType, int]) – Limits for different memory pools. This should be a mapping of pool types to their respective limits in bytes.

  • refittable (bool) – Enables the engine to be refitted with new weights after it is built. Defaults to False.

  • strip_plan (bool) – Strips the refittable weights from the engine plan file. Defaults to False.

  • preview_features (List[trt.PreviewFeature]) – The preview features to enable. Use an empty list to disable all preview features. Defaults to TensorRT’s default preview features.

  • engine_capability (trt.EngineCapability) – The engine capability to build for. Defaults to the default TensorRT engine capability.

  • direct_io (bool) – Whether to disallow reformatting layers at network input/output tensors with user-specified formats. Defaults to False.

  • builder_optimization_level (int) – The builder optimization level. A higher optimization level allows the optimizer to spend more time searching for optimization opportunities. The resulting engine may have better performance compared to an engine built with a lower optimization level. Refer to the TensorRT API documentation for details. Defaults to TensorRT’s default optimization level.

  • fp8 (bool) – Whether to build the engine with FP8 precision enabled. Defaults to False.

  • hardware_compatibility_level (trt.HardwareCompatibilityLevel) – The hardware compatibility level. This allows engines built on one GPU architecture to work on GPUs of other architectures. Defaults to TensorRT’s default hardware compatibility level.

  • max_aux_streams (int) – The maximum number of auxiliary streams that TensorRT is allowed to use. If the network contains operators that can run in parallel, TRT can execute them using auxiliary streams in addition to the one provided to the IExecutionContext::enqueueV3() call. The default maximum number of auxiliary streams is determined heuristically by TensorRT based on whether enabling multi-stream execution would improve performance.

  • version_compatible (bool) – Whether to build an engine that is version compatible.

  • exclude_lean_runtime (bool) – Whether to exclude the lean runtime in version compatible engines. Requires that version compatibility is enabled.

  • quantization_flags (List[trt.QuantizationFlag]) – The quantization flags to enable. Use an empty list to disable all quantization flags. Defaults to TensorRT’s default quantization flags.

  • error_on_timing_cache_miss (bool) – Emit an error when a tactic being timed is not present in the timing cache. This flag has an effect only when IBuilderConfig has an associated ITimingCache. Defaults to False.

  • bf16 (bool) – Whether to build the engine with BF16 precision enabled. Defaults to False.

  • disable_compilation_cache (bool) – Whether to disable caching JIT-compiled code. Defaults to False.

  • progress_monitor (trt.IProgressMonitor) – A progress monitor. Allows users to view engine-building progress, for example through the CLI.

  • weight_streaming (bool) – Whether to enable weight streaming for the TensorRT engine.

  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns:

The TensorRT builder configuration.

Return type:

trt.IBuilderConfig
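For example, a sketch of the immediately evaluated form (assuming TensorRT is installed; "model.onnx" is a hypothetical path):

```python
from polygraphy.backend.trt import create_config, network_from_onnx_path

# The functional network loaders return (builder, network, parser).
builder, network, parser = network_from_onnx_path("model.onnx")

# Unlike CreateConfig, create_config requires an existing builder and
# network, and returns the trt.IBuilderConfig immediately rather than
# a loader.
config = create_config(builder, network, fp16=True, sparse_weights=True)
```

This form is convenient when you already manage the builder and network yourself and do not need lazy evaluation.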

class PostprocessConfig(config, func)[source]

Bases: BaseLoader

[EXPERIMENTAL] Functor that applies a given post-processing function to a TensorRT IBuilderConfig.

Applies a given post-processing function to a TensorRT IBuilderConfig.

Parameters:
  • config (Union[trt.IBuilderConfig, Callable[[trt.Builder, trt.INetworkDefinition], trt.IBuilderConfig]]) – A TensorRT IBuilderConfig or a callable that accepts a TensorRT builder and network and returns a config.

  • func (Callable[[trt.Builder, trt.INetworkDefinition, trt.IBuilderConfig], None]) – A callable which takes a builder, network, and config parameter and modifies the config in place.

call_impl(builder, network)[source]
Parameters:
  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns:

The modified builder configuration.

Return type:

trt.IBuilderConfig

__call__(*args, **kwargs)

Invokes the loader by forwarding arguments to call_impl.

Note: call_impl should not be called directly - use this function instead.

postprocess_config(config, func, builder, network)

Immediately evaluated functional variant of PostprocessConfig.

Applies a given post-processing function to a TensorRT IBuilderConfig.

Parameters:
  • config (Union[trt.IBuilderConfig, Callable[[trt.Builder, trt.INetworkDefinition], trt.IBuilderConfig]]) – A TensorRT IBuilderConfig or a callable that accepts a TensorRT builder and network and returns a config.

  • func (Callable[[trt.Builder, trt.INetworkDefinition, trt.IBuilderConfig], None]) – A callable which takes a builder, network, and config parameter and modifies the config in place.

  • builder (trt.Builder) – The TensorRT builder to use to create the configuration.

  • network (trt.INetworkDefinition) – The TensorRT network for which to create the config. The network is used to automatically create a default optimization profile if none are provided.

Returns:

The modified builder configuration.

Return type:

trt.IBuilderConfig
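For example, a sketch that uses a post-processing function to set an option CreateConfig does not expose directly (assuming TensorRT is installed; the DEBUG builder flag is chosen only for illustration):

```python
import tensorrt as trt

from polygraphy.backend.trt import CreateConfig, PostprocessConfig


def enable_debug(builder, network, config):
    # Modify the config in place; the return value is ignored.
    config.set_flag(trt.BuilderFlag.DEBUG)


# Lazy variant: wraps a config loader and applies `enable_debug` when
# the loader is invoked with a builder and network, i.e.
# `load_config(builder, network)` returns the modified IBuilderConfig.
load_config = PostprocessConfig(CreateConfig(fp16=True), func=enable_debug)
```

The immediately evaluated `postprocess_config(config, func, builder, network)` applies the same function but returns the modified configuration right away instead of a loader.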