IBuilderConfig
- tensorrt.QuantizationFlag
List of valid flags for quantizing the network to int8.
Members:
CALIBRATE_BEFORE_FUSION : Run int8 calibration pass before layer fusion. Only valid for IInt8LegacyCalibrator and IInt8EntropyCalibrator. We always run int8 calibration pass before layer fusion for IInt8MinMaxCalibrator and IInt8EntropyCalibrator2. Disabled by default.
- tensorrt.DeviceType
Device types that TensorRT can execute on
Members:
GPU : GPU device
DLA : DLA core
- tensorrt.ProfilingVerbosity
Profiling verbosity in NVTX annotations and the engine inspector
Members:
LAYER_NAMES_ONLY : Print only the layer names. This is the default setting.
DETAILED : Print detailed layer information including layer names and layer parameters.
NONE : Do not print any layer information.
DEFAULT : DEPRECATED. Same as LAYER_NAMES_ONLY.
VERBOSE : DEPRECATED. Same as DETAILED.
- tensorrt.TacticSource
Tactic sources that can provide tactics for TensorRT.
Members:
CUBLAS : Enables cuBLAS tactics. NOTE: Disabling this value will cause the cuBLAS handle passed to plugins in attachToContext to be null.
CUBLAS_LT : Enables cuBLAS LT tactics.
CUDNN : Enables cuDNN tactics.
- tensorrt.EngineCapability
List of supported engine capability flows.
The EngineCapability determines the restrictions of a network during build time and what runtime it targets. When BuilderFlag::kSAFETY_SCOPE is not set (by default), EngineCapability.STANDARD does not provide any restrictions on functionality and the resulting serialized engine can be executed with TensorRT’s standard runtime APIs in the nvinfer1 namespace. EngineCapability.SAFETY provides a restricted subset of network operations that are safety certified and the resulting serialized engine can be executed with TensorRT’s safe runtime APIs in the nvinfer1::safe namespace. EngineCapability.DLA_STANDALONE provides a restricted subset of network operations that are DLA compatible and the resulting serialized engine can be executed using standalone DLA runtime APIs. See sampleNvmedia for an example of integrating NvMediaDLA APIs with TensorRT APIs.
Members:
DEFAULT : Deprecated: Unrestricted: TensorRT mode without any restrictions using TensorRT nvinfer1 APIs.
SAFE_GPU : Deprecated: Safety-restricted: TensorRT mode for GPU devices using TensorRT safety APIs. See safety documentation for list of supported layers and formats.
SAFE_DLA : Deprecated: DLA-restricted: TensorRT mode for DLA devices using NvMediaDLA APIs. Only FP16 and Int8 modes are supported.
STANDARD : Standard: TensorRT flow without targeting the safety runtime. This flow supports both DeviceType::kGPU and DeviceType::kDLA.
SAFETY : Safety: TensorRT flow with restrictions targeting the safety runtime. See safety documentation for list of supported layers and formats. This flow supports only DeviceType::kGPU.
DLA_STANDALONE : DLA Standalone: TensorRT flow with restrictions targeting DLA runtimes that are external to TensorRT. See DLA documentation for list of supported layers and formats. This flow supports only DeviceType::kDLA.
- tensorrt.BuilderFlag
Valid modes that the builder can enable when creating an engine from a network definition.
Members:
FP16 : Enable FP16 layer selection
INT8 : Enable Int8 layer selection
DEBUG : Enable debugging of layers via synchronizing after every layer
GPU_FALLBACK : Enable layers marked to execute on GPU if layer cannot execute on DLA
STRICT_TYPES : Deprecated: Enables strict type constraints. Equivalent to setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS.
REFIT : Enable building a refittable engine
DISABLE_TIMING_CACHE : Disable reuse of timing information across identical layers.
TF32 : Allow (but not require) computations on tensors of type DataType.FLOAT to use TF32. TF32 computes inner products by rounding the inputs to 10-bit mantissas before multiplying, but accumulates the sum using 23-bit mantissas. Enabled by default.
SPARSE_WEIGHTS : Allow the builder to examine weights and use optimized functions when weights have suitable sparsity.
SAFETY_SCOPE : Change the allowed parameters in the EngineCapability.STANDARD flow to match the restrictions that EngineCapability.SAFETY checks against for DeviceType.GPU and that EngineCapability.DLA_STANDALONE checks against for DeviceType.DLA. If this flag is unset, it is forced to true at build time when EngineCapability.SAFETY is used.
OBEY_PRECISION_CONSTRAINTS : Require that layers execute in specified precisions. Build fails otherwise.
PREFER_PRECISION_CONSTRAINTS : Prefer that layers execute in specified precisions. Fall back (with warning) to another precision if build would otherwise fail.
DIRECT_IO : Require that no reformats be inserted between a layer and a network I/O tensor for which ITensor.allowed_formats was set. Build fails if a reformat is required for functional correctness.
REJECT_EMPTY_ALGORITHMS : Fail if IAlgorithmSelector.select_algorithms returns an empty set of algorithms.
- class tensorrt.IBuilderConfig
- Variables
min_timing_iterations –
int
The number of minimization iterations used when timing layers. When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in minimization.
avg_timing_iterations –
int
The number of averaging iterations used when timing layers. When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in averaging.
int8_calibrator –
IInt8Calibrator
Int8 calibration interface. The calibrator's purpose is to minimize the information loss during the INT8 quantization process.
max_workspace_size –
int
The maximum workspace size: the maximum GPU temporary memory which the engine can use at execution time.
flags –
int
The build mode flags to turn on builder options for this network. The flags are listed in the BuilderFlag enum and set configuration options for building the network. This should be an integer consisting of one or more BuilderFlag values, combined via binary OR. For example, 1 << BuilderFlag.FP16 | 1 << BuilderFlag.DEBUG.
profile_stream –
int
The handle for the CUDA stream that is used to profile this network.
num_optimization_profiles –
int
The number of optimization profiles.
default_device_type –
tensorrt.DeviceType
The default DeviceType to be used by the Builder.
DLA_core –
int
The DLA core that the engine executes on. Must be between 0 and N-1 where N is the number of available DLA cores.
profiling_verbosity – Profiling verbosity in NVTX annotations.
engine_capability – The desired engine capability. See EngineCapability for details.
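The bitmask form described for flags can be sketched as a small helper that ORs together one bit per flag. The name flags_to_mask is hypothetical and not part of TensorRT. The commented usage line assumes the module is imported as trt and that a config object already exists.

```python
from functools import reduce


def flags_to_mask(flags):
    """Combine enum-like flags into one integer bitmask.

    Each flag contributes the bit at position int(flag), mirroring the
    `1 << BuilderFlag.X | 1 << BuilderFlag.Y` form described for `flags`.
    flags_to_mask is a hypothetical helper, not a TensorRT function.
    """
    return reduce(lambda mask, flag: mask | (1 << int(flag)), flags, 0)


# Assumed usage, given `import tensorrt as trt` and an existing `config`:
# config.flags = flags_to_mask([trt.BuilderFlag.FP16, trt.BuilderFlag.DEBUG])
```

Assigning to flags writes the whole mask at once, so a flag left out of the list (for example one that was enabled by default) would presumably end up cleared. To change a single bit, set_flag and clear_flag are the narrower tools.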
- __del__(self: tensorrt.tensorrt.IBuilderConfig) None
- __exit__(exc_type, exc_value, traceback)
Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.
- __init__(*args, **kwargs)
- add_optimization_profile(self: tensorrt.tensorrt.IBuilderConfig, profile: tensorrt.tensorrt.IOptimizationProfile) int
Add an optimization profile.
This function must be called at least once if the network has dynamic or shape input tensors.
- Parameters
profile – The new optimization profile, which must satisfy
bool(profile) == True
- Returns
The index of the optimization profile (starting from 0) if the input is valid, or -1 if the input is not valid.
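A minimal sketch of registering a profile for one dynamic input, and of checking the -1 return value, might look like the following. It assumes builder.create_optimization_profile() and profile.set_shape(name, min, opt, max) from the wider TensorRT Python API, which this section does not document. The helper name add_dynamic_profile is hypothetical.

```python
def add_dynamic_profile(builder, config, input_name, min_shape, opt_shape, max_shape):
    """Create a profile for one dynamic input and register it on the config.

    Assumes builder.create_optimization_profile() and
    profile.set_shape(name, min, opt, max) from the wider TensorRT Python
    API; neither is documented in this section.
    """
    profile = builder.create_optimization_profile()
    profile.set_shape(input_name, min_shape, opt_shape, max_shape)

    index = config.add_optimization_profile(profile)
    if index < 0:
        # -1 signals that the profile was rejected as invalid.
        raise ValueError(f"invalid optimization profile for input {input_name!r}")
    return index


# Assumed usage, with an existing `builder` and `config`:
# add_dynamic_profile(builder, config, "input", (1, 3, 224, 224),
#                     (8, 3, 224, 224), (32, 3, 224, 224))
```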
- can_run_on_DLA(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) bool
Check if the layer can run on DLA.
- Parameters
layer – The layer to check
- Returns
A bool indicating whether the layer can run on DLA
- clear_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) None
Clears the builder mode flag from the enabled flags.
- Parameters
flag – The flag to clear.
- clear_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) None
Clears the quantization flag from the enabled quantization flags.
- Parameters
flag – The flag to clear.
- create_timing_cache(self: tensorrt.tensorrt.IBuilderConfig, serialized_timing_cache: buffer) tensorrt.tensorrt.ITimingCache
Create a timing cache.
Create an ITimingCache instance from serialized raw data. The created timing cache doesn’t belong to a specific builder config. It can be shared by multiple builder instances.
- Parameters
serialized_timing_cache – The serialized timing cache. If an empty cache is provided (i.e. b""), a new cache will be created.
- Returns
The created ITimingCache object.
- get_calibration_profile(self: tensorrt.tensorrt.IBuilderConfig) tensorrt.tensorrt.IOptimizationProfile
Get the current calibration profile.
- Returns
The current calibration profile, or None if the calibration profile is unset.
- get_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) tensorrt.tensorrt.DeviceType
Get the device that the layer executes on.
- Parameters
layer – The layer to get the DeviceType for
- Returns
The DeviceType of the layer
- get_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) bool
Check if a build mode flag is set.
- Parameters
flag – The flag to check.
- Returns
A bool indicating whether the flag is set.
- get_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) bool
Check if a quantization flag is set.
- Parameters
flag – The flag to check.
- Returns
A bool indicating whether the flag is set.
- get_tactic_sources(self: tensorrt.tensorrt.IBuilderConfig) int
Get the tactic sources currently set in the engine build configuration.
- get_timing_cache(self: tensorrt.tensorrt.IBuilderConfig) tensorrt.tensorrt.ITimingCache
Get the timing cache from current IBuilderConfig
- Returns
The timing cache used in current IBuilderConfig, or None if no timing cache is set.
- is_device_type_set(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) bool
Check if the DeviceType for a layer is explicitly set.
- Parameters
layer – The layer to check for DeviceType
- Returns
True if DeviceType is not default, False otherwise
- reset(self: tensorrt.tensorrt.IBuilderConfig) None
Resets the builder configuration to defaults. This function can be called when initializing a builder config object.
- reset_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) None
Reset the DeviceType for the given layer.
- Parameters
layer – The layer to reset the DeviceType for
- set_calibration_profile(self: tensorrt.tensorrt.IBuilderConfig, profile: tensorrt.tensorrt.IOptimizationProfile) bool
Set a calibration profile.
Calibration optimization profile must be set if int8 calibration is used to set scales for a network with runtime dimensions.
- Parameters
profile – The new calibration profile, which must satisfy bool(profile) == True or be None. MIN and MAX values will be overwritten by kOPT.
- Returns
True if the calibration profile was set correctly.
- set_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer, device_type: tensorrt.tensorrt.DeviceType) None
Set the device that this layer must execute on. If DeviceType is not set or is reset, TensorRT will use the default DeviceType set in the builder.
The DeviceType for a layer must be compatible with the safety flow (if specified). For example a layer cannot be marked for DLA execution while the builder is configured for kSAFE_GPU.
- Parameters
layer – The layer to set the DeviceType of
device_type – The DeviceType the layer must execute on
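One way to combine can_run_on_DLA with set_device_type is sketched below: every layer that reports DLA support is pinned to DLA, and the rest are returned so the caller can decide what to do with them. The helper name place_on_dla is hypothetical. The commented usage assumes network.num_layers and network.get_layer(i) from the wider TensorRT Python API, which this section does not document.

```python
def place_on_dla(config, layers, dla_device_type):
    """Pin every DLA-capable layer to DLA.

    Returns the layers that cannot run on DLA; they keep the default
    device type. place_on_dla is a hypothetical helper.
    """
    left_on_default = []
    for layer in layers:
        if config.can_run_on_DLA(layer):
            config.set_device_type(layer, dla_device_type)
        else:
            left_on_default.append(layer)
    return left_on_default


# Assumed usage, given `import tensorrt as trt`, a `network`, and a `config`:
# layers = [network.get_layer(i) for i in range(network.num_layers)]
# remaining = place_on_dla(config, layers, trt.DeviceType.DLA)
```

If the default device type is DLA, the layers returned here are the ones for which the GPU_FALLBACK builder flag becomes relevant.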
- set_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) None
Add the input builder mode flag to the already enabled flags.
- Parameters
flag – The flag to set.
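set_flag, clear_flag, and get_flag operate on one flag at a time, so toggling a flag from a boolean setting can be sketched as follows. The helper name set_flag_enabled is hypothetical. The commented usage assumes the module is imported as trt.

```python
def set_flag_enabled(config, flag, enabled):
    """Turn one builder flag on or off and return its resulting state.

    set_flag_enabled is a hypothetical helper built only on the
    set_flag / clear_flag / get_flag methods documented here.
    """
    if enabled:
        config.set_flag(flag)
    else:
        config.clear_flag(flag)
    return config.get_flag(flag)


# Assumed usage, given `import tensorrt as trt` and an existing `config`:
# set_flag_enabled(config, trt.BuilderFlag.FP16, True)
```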
- set_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) None
Add the input quantization flag to the already enabled quantization flags.
- Parameters
flag – The flag to set.
- set_tactic_sources(self: tensorrt.tensorrt.IBuilderConfig, tactic_sources: int) bool
Set tactic sources.
This bitset controls which tactic sources TensorRT is allowed to use for tactic selection. By default, kCUBLAS and kCUDNN are always enabled, and kCUBLAS_LT is enabled for x86 platforms as well as non-x86 platforms when CUDA >= 11.0.
Multiple tactic sources may be combined with a bitwise OR operation. For example, to enable cublas and cublasLt as tactic sources, use a value of:
1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)
- Parameters
tactic_sources – The tactic sources to set
- Returns
A bool indicating whether the tactic sources in the build configuration were updated. The tactic sources in the build configuration will not be updated if the provided value is invalid.
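Because the tactic sources are a bitmask, a single source can be switched off by reading the current mask, clearing one bit, and writing the result back. The sketch below uses only get_tactic_sources and set_tactic_sources. The helper name disable_tactic_source is hypothetical.

```python
def disable_tactic_source(config, source):
    """Clear one tactic source bit while leaving the others unchanged.

    Returns the bool reported by set_tactic_sources.
    disable_tactic_source is a hypothetical helper.
    """
    mask = config.get_tactic_sources() & ~(1 << int(source))
    return config.set_tactic_sources(mask)


# Assumed usage, given `import tensorrt as trt` and an existing `config`:
# disable_tactic_source(config, trt.TacticSource.CUDNN)
```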
- set_timing_cache(self: tensorrt.tensorrt.IBuilderConfig, cache: tensorrt.tensorrt.ITimingCache, ignore_mismatch: bool) bool
Attach a timing cache to IBuilderConfig.
The timing cache has a verification header to make sure the provided cache can be used in the current environment. A failure will be reported if the CUDA device property in the provided cache is different from the current environment. bool(ignore_mismatch) == True skips strict verification and allows loading a cache created from a different device. The cache must not be destroyed until after the engine is built.
- Parameters
cache – The timing cache to be used
ignore_mismatch – Whether or not to allow using a cache that contains a different CUDA device property
- Returns
A bool indicating whether the operation completed successfully.
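Creating a timing cache and attaching it are two separate calls, which the sketch below chains together. It uses only create_timing_cache and set_timing_cache. The helper name attach_timing_cache is hypothetical, as is the choice to raise RuntimeError when attaching fails.

```python
def attach_timing_cache(config, serialized=b"", ignore_mismatch=False):
    """Create a timing cache from raw bytes and attach it to the config.

    An empty bytes object asks for a new, empty cache. The returned cache
    must be kept alive by the caller until the engine has been built.
    attach_timing_cache is a hypothetical helper.
    """
    cache = config.create_timing_cache(serialized)
    if not config.set_timing_cache(cache, ignore_mismatch):
        raise RuntimeError("timing cache was rejected by the builder config")
    return cache


# Assumed usage, with an existing `config` and a previously saved cache file:
# with open("timing.cache", "rb") as f:
#     cache = attach_timing_cache(config, f.read())
```

Returning the cache rather than discarding it is deliberate: holding the reference in the caller is a simple way to keep the object from being freed before the build finishes.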