IBuilderConfig¶
- class tensorrt_rtx.IBuilderConfig¶
- Variables:
avg_timing_iterations –
int
The number of averaging iterations used when timing layers. When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in averaging. By default, the number of averaging iterations is 1.
int8_calibrator –
IInt8Calibrator
[DEPRECATED] Deprecated in TensorRT 10.1. Superseded by explicit quantization. Int8 calibration interface. The calibrator is used to minimize information loss during the INT8 quantization process.
flags –
int
The build mode flags to turn on builder options for this network. The flags are listed in the BuilderFlag enum. The flags set configuration options to build the network. This should be an integer consisting of one or more BuilderFlag values, combined via binary OR, for example, 1 << BuilderFlag.FP16 | 1 << BuilderFlag.DEBUG (see the sketch after the optimization level descriptions below).
profile_stream –
int
The handle for the CUDA stream that is used to profile this network.
num_optimization_profiles –
int
The number of optimization profiles.
default_device_type –
tensorrt.DeviceType
The default DeviceType to be used by the Builder.
DLA_core –
int
The DLA core that the engine executes on. Must be between 0 and N-1, where N is the number of available DLA cores.
profiling_verbosity – Profiling verbosity in NVTX annotations.
engine_capability – The desired engine capability. See EngineCapability for details.
algorithm_selector – [DEPRECATED] Deprecated in TensorRT 10.8. Please use editable mode in ITimingCache instead. The IAlgorithmSelector to use.
builder_optimization_level – The builder optimization level at which TensorRT should build the engine. Setting a higher optimization level allows TensorRT to spend longer engine building time searching for more optimization options. The resulting engine may have better performance compared to an engine built with a lower optimization level. The default optimization level is 3. Valid values are integers from 0 to the maximum optimization level, which is currently 5. Setting it greater than the maximum level results in behavior identical to the maximum level.
max_num_tactics – The maximum number of tactics to time when there is a choice of tactics. Setting a larger number allows TensorRT to spend longer engine building time searching for more optimization options. The resulting engine may have better performance compared to an engine built with a smaller number of tactics. Valid values are integers from -1 to the maximum 32-bit integer. The default value of -1 indicates that TensorRT can decide the number of tactics based on its own heuristics.
hardware_compatibility_level – Hardware compatibility allows an engine to be compatible with GPU architectures other than that of the GPU on which it was built.
plugins_to_serialize – The plugin libraries to be serialized with forward-compatible engines.
max_aux_streams – The maximum number of auxiliary streams that TRT is allowed to use. If the network contains operators that can run in parallel, TRT can execute them using auxiliary streams in addition to the one provided to the IExecutionContext::enqueueV3() call. The default maximum number of auxiliary streams is determined by the heuristics in TensorRT on whether enabling multi-stream would improve the performance. This behavior can be overridden by calling this API to set the maximum number of auxiliary streams explicitly. Set this to 0 to enforce single-stream inference. The resulting engine may use fewer auxiliary streams than the maximum if the network does not contain enough parallelism or if TensorRT determines that using more auxiliary streams does not help improve the performance. Allowing more auxiliary streams does not always give better performance since there will be synchronization overhead between streams. Using CUDA graphs at runtime can help reduce the overhead caused by cross-stream synchronizations. Using more auxiliary streams also leads to more memory usage at runtime since some activation memory blocks will not be able to be reused.
progress_monitor – The IProgressMonitor to use.
tiling_optimization_level – The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy.
l2_limit_for_tiling – The target L2 cache usage for tiling optimization.
Below are descriptions of each builder optimization level:
Level 0: This enables the fastest compilation by disabling dynamic kernel generation and selecting the first tactic that succeeds in execution. This will also not respect a timing cache.
Level 1: Available tactics are sorted by heuristics, but only the top tactics are tested to select the best. If a dynamic kernel is generated, its compilation optimization level is low.
Level 2: Available tactics are sorted by heuristics, but only the fastest tactics are tested to select the best.
Level 3: Apply heuristics to see if a static precompiled kernel is applicable or if a new one has to be compiled dynamically.
Level 4: Always compiles a dynamic kernel.
Level 5: Always compiles a dynamic kernel and compares it to static kernels.
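As a rough illustration, the flags and the optimization level might be configured as follows. This is a minimal sketch, assuming the usual TensorRT-style Logger and Builder entry points; the specific flags shown are only examples.
import tensorrt_rtx as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Combine BuilderFlag values via binary OR, as described for `flags` above.
config.flags = 1 << int(trt.BuilderFlag.FP16) | 1 << int(trt.BuilderFlag.DEBUG)

# Trade longer build time for a potentially faster engine (default level is 3).
config.builder_optimization_level = 4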
- __del__(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig) None ¶
- __exit__(exc_type, exc_value, traceback)¶
Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.
- __init__(*args, **kwargs)¶
- add_optimization_profile(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, profile: tensorrt_rtx.tensorrt_rtx.IOptimizationProfile) int ¶
Add an optimization profile.
This function must be called at least once if the network has dynamic or shape input tensors.
- Parameters:
profile – The new optimization profile, which must satisfy
bool(profile) == True
- Returns:
The index of the optimization profile (starting from 0) if the input is valid, or -1 if the input is not valid.
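A minimal sketch of registering a profile for a network with a dynamic batch dimension, assuming `builder` and `config` from the sketch above; the tensor name and shapes are illustrative:
profile = builder.create_optimization_profile()
# Arguments are the input name followed by the min, opt, and max shapes.
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
index = config.add_optimization_profile(profile)
assert index >= 0  # -1 would indicate an invalid profile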
- can_run_on_DLA(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, layer: tensorrt_rtx.tensorrt_rtx.ILayer) bool ¶
Check if the layer can run on DLA.
- Parameters:
layer – The layer to check
- Returns:
A bool indicating whether the layer can run on DLA
- clear_flag(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, flag: tensorrt_rtx.tensorrt_rtx.BuilderFlag) None ¶
Clears the builder mode flag from the enabled flags.
- Parameters:
flag – The flag to clear.
- create_timing_cache(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, serialized_timing_cache: buffer) tensorrt_rtx.tensorrt_rtx.ITimingCache ¶
Create timing cache
Create an ITimingCache instance from serialized raw data. The created timing cache doesn't belong to a specific builder config. It can be shared by multiple builder instances.
- Parameters:
serialized_timing_cache – The serialized timing cache. If an empty cache is provided (i.e. b""), a new cache will be created.
- Returns:
The created ITimingCache object.
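For example, a previously serialized cache might be restored from disk, falling back to a fresh cache if none exists; the file name is illustrative:
try:
    with open("timing.cache", "rb") as f:
        cache = config.create_timing_cache(f.read())
except FileNotFoundError:
    cache = config.create_timing_cache(b"")  # empty buffer creates a new cache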
- get_compute_capability(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, index: int) nvinfer1::ComputeCapability ¶
Get one target compute capability for building the engine.
- Parameters:
index – The index of the compute capability to get. Must be between 0 and the number of compute capabilities - 1.
- Returns:
The compute capability at the specified index.
- get_device_type(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, layer: tensorrt_rtx.tensorrt_rtx.ILayer) tensorrt_rtx.tensorrt_rtx.DeviceType ¶
Get the device that the layer executes on.
- Parameters:
layer – The layer to get the DeviceType for
- Returns:
The DeviceType of the layer
- get_flag(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, flag: tensorrt_rtx.tensorrt_rtx.BuilderFlag) bool ¶
Check if a build mode flag is set.
- Parameters:
flag – The flag to check.
- Returns:
A bool indicating whether the flag is set.
- get_memory_pool_limit(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, pool: tensorrt_rtx.tensorrt_rtx.MemoryPoolType) int ¶
Retrieve the memory size limit of the corresponding pool in bytes. If set_memory_pool_limit() for the pool has not been called, this returns the default value used by TensorRT. This default value is not necessarily the maximum possible value for that configuration.
- Parameters:
pool – The memory pool to get the limit for.
- Returns:
The size of the memory limit, in bytes, for the corresponding pool.
- get_preview_feature(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, feature: tensorrt_rtx.tensorrt_rtx.PreviewFeature) bool ¶
Check if a preview feature is enabled.
- Parameters:
feature – the feature to query
- Returns:
True if the feature is enabled, False otherwise.
- get_tactic_sources(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig) int ¶
Get the tactic sources currently set in the engine build configuration.
- get_timing_cache(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig) tensorrt_rtx.tensorrt_rtx.ITimingCache ¶
Get the timing cache from the current IBuilderConfig.
- Returns:
The timing cache used in the current IBuilderConfig, or None if no timing cache is set.
- is_device_type_set(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, layer: tensorrt_rtx.tensorrt_rtx.ILayer) bool ¶
Check if the DeviceType for a layer is explicitly set.
- Parameters:
layer – The layer to check for DeviceType
- Returns:
True if the DeviceType is not the default, False otherwise.
- property num_compute_capabilities¶
The number of target compute capabilities for building the engine.
- Type:
int
- reset(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig) None ¶
Resets the builder configuration to defaults. Call this function to restore a builder config object to its initial state.
- reset_device_type(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, layer: tensorrt_rtx.tensorrt_rtx.ILayer) None ¶
Reset the DeviceType for the given layer.
- Parameters:
layer – The layer to reset the DeviceType for
- set_compute_capability(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, compute_capability: tensorrt_rtx.ComputeCapability, index: int) bool ¶
Set one target compute capability for building the engine.
- Parameters:
compute_capability – The compute capability to set. Cannot be kNONE. When set to kCURRENT, the index must be 0 and there must be only one compute capability.
index – The index at which to set the compute capability. Must be between 0 and the number of compute capabilities - 1.
- Returns:
True if successful, False otherwise.
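A hedged sketch of targeting the GPU present at build time; it assumes the ComputeCapability enum exposes a CURRENT member corresponding to kCURRENT above, and that the num_compute_capabilities property is writable. Per the note above, CURRENT requires index 0 and exactly one compute capability.
config.num_compute_capabilities = 1  # assumption: property is settable
ok = config.set_compute_capability(trt.ComputeCapability.CURRENT, 0)
assert ok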
- set_device_type(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, layer: tensorrt_rtx.tensorrt_rtx.ILayer, device_type: tensorrt_rtx.tensorrt_rtx.DeviceType) None ¶
Set the device that this layer must execute on. If DeviceType is not set or is reset, TensorRT will use the default DeviceType set in the builder.
The DeviceType for a layer must be compatible with the safety flow (if specified). For example, a layer cannot be marked for DLA execution while the builder is configured for SAFE_GPU.
- Parameters:
layer – The layer to set the DeviceType of
device_type – The DeviceType the layer must execute on
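A sketch combining can_run_on_DLA() with set_device_type() to pin layers to DLA where possible; it assumes `network` is a populated INetworkDefinition and that DLA hardware is available on the target:
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if config.can_run_on_DLA(layer):
        config.set_device_type(layer, trt.DeviceType.DLA)
    else:
        config.set_device_type(layer, trt.DeviceType.GPU)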
- set_flag(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, flag: tensorrt_rtx.tensorrt_rtx.BuilderFlag) None ¶
Add the input builder mode flag to the already enabled flags.
- Parameters:
flag – The flag to set.
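The three flag methods compose as a simple round-trip; BuilderFlag.DEBUG is used purely as an illustration:
config.set_flag(trt.BuilderFlag.DEBUG)
assert config.get_flag(trt.BuilderFlag.DEBUG)
config.clear_flag(trt.BuilderFlag.DEBUG)
assert not config.get_flag(trt.BuilderFlag.DEBUG)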
- set_memory_pool_limit(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, pool: tensorrt_rtx.tensorrt_rtx.MemoryPoolType, pool_size: int) None ¶
Set the memory size for the memory pool.
TensorRT layers access different memory pools depending on the operation. This function sets in the IBuilderConfig the size limit, specified by pool_size, for the corresponding memory pool, specified by pool. TensorRT will build a plan file that is constrained by these limits, or report which constraint caused the failure.
If the size of the pool, specified by pool_size, fails to meet the size requirements for the pool, this function does nothing and emits the recoverable error, ErrorCode.INVALID_ARGUMENT, to the registered IErrorRecorder.
If the size of the pool is larger than the maximum possible value for the configuration, this function does nothing and emits ErrorCode.UNSUPPORTED_STATE.
If the pool does not exist on the requested device type when building the network, a warning is emitted to the logger, and the memory pool value is ignored.
Refer to MemoryPoolType to see the size requirements for each pool.
- Parameters:
pool – The memory pool to limit the available memory for.
pool_size – The size of the pool in bytes.
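For instance, the workspace pool might be capped at 1 GiB and the limit read back with get_memory_pool_limit(); a sketch assuming MemoryPoolType.WORKSPACE is available in this build:
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
limit = config.get_memory_pool_limit(trt.MemoryPoolType.WORKSPACE)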
- set_preview_feature(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, feature: tensorrt_rtx.tensorrt_rtx.PreviewFeature, enable: bool) None ¶
Enable or disable a specific preview feature.
Allows enabling or disabling experimental features, which are not enabled by default in the current release. Preview Features have been fully tested but are not yet as stable as other features in TensorRT. They are provided as opt-in features for at least one release.
Refer to PreviewFeature for additional information, and a list of the available features.
- Parameters:
feature – the feature to enable
enable – whether to enable or disable
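A sketch of toggling a preview feature without assuming any particular enum member name; the first exposed member is used purely for illustration:
# Enumerate whatever preview features this build exposes.
features = list(trt.PreviewFeature.__members__.values())
if features:
    config.set_preview_feature(features[0], True)
    assert config.get_preview_feature(features[0])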
- set_tactic_sources(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, tactic_sources: int) bool ¶
Set tactic sources.
This bitset controls which tactic sources TensorRT is allowed to use for tactic selection.
Multiple tactic sources may be combined with a bitwise OR operation. For example, to enable cublas and cublasLt as tactic sources, use a value of:
1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)
- Parameters:
tactic_sources – The tactic sources to set
- Returns:
A bool indicating whether the tactic sources in the build configuration were updated. The tactic sources in the build configuration will not be updated if the provided value is invalid.
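Putting the expression above to use; this assumes the CUBLAS tactic sources exist in this build's TacticSource enum:
sources = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)
updated = config.set_tactic_sources(sources)  # False if the value is invalid
current = config.get_tactic_sources()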
- set_timing_cache(self: tensorrt_rtx.tensorrt_rtx.IBuilderConfig, cache: tensorrt_rtx.tensorrt_rtx.ITimingCache, ignore_mismatch: bool) bool ¶
Attach a timing cache to IBuilderConfig.
The timing cache has a verification header to make sure the provided cache can be used in the current environment. A failure will be reported if the CUDA device property in the provided cache differs from the current environment. Passing bool(ignore_mismatch) == True skips strict verification and allows loading a cache created on a different device. The cache must not be destroyed until after the engine is built.
- Parameters:
cache – The timing cache to be used.
ignore_mismatch – Whether to allow using a cache that contains a different CUDA device property.
- Returns:
A bool indicating whether the operation completed successfully.
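A sketch of the full timing-cache round-trip: attach the cache created earlier, build, then persist the cache for the next build. `network` is assumed to be a populated INetworkDefinition, and the file name is illustrative:
config.set_timing_cache(cache, ignore_mismatch=False)
engine_bytes = builder.build_serialized_network(network, config)
with open("timing.cache", "wb") as f:
    f.write(cache.serialize())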