IBuilderConfig

tensorrt.EngineCapability

List of supported engine capability flows.

The EngineCapability determines the restrictions placed on a network at build time, and therefore what can be executed at runtime. EngineCapability.DEFAULT imposes no restrictions on functionality, and the resulting serialized engine can be executed with TensorRT’s standard runtime APIs. EngineCapability.SAFE_GPU provides a restricted subset of network operations that are safety certified, and the resulting serialized engine can be executed with TensorRT’s safe runtime APIs. EngineCapability.SAFE_DLA provides a restricted subset of network operations that are DLA compatible, and the resulting serialized engine can be executed using NvMediaDLA’s runtime APIs. See sampleNvmedia for an example of integrating NvMediaDLA APIs with TensorRT APIs.

Members:

SAFE_GPU : Safety-restricted: TensorRT mode for GPU devices using TensorRT safety APIs. See the safety documentation for the list of supported layers and formats.

DEFAULT : Unrestricted: TensorRT mode without any restrictions using TensorRT nvinfer1 APIs.

SAFE_DLA : DLA-restricted: TensorRT mode for DLA devices using NvMediaDLA APIs. Only FP16 and Int8 modes are supported.

tensorrt.BuilderFlag

Valid modes that the builder can enable when creating an engine from a network definition.

Members:

DEBUG : Enable debugging of layers by synchronizing after every layer

GPU_FALLBACK : Allow a layer to fall back to executing on the GPU if it cannot execute on DLA

INT8 : Enable Int8 layer selection

TF32 : Allow (but not require) computations on tensors of type DataType.FLOAT to use TF32. TF32 computes inner products by rounding the inputs to 10-bit mantissas before multiplying, but accumulates the sum using 23-bit mantissas. Enabled by default.

DISABLE_TIMING_CACHE : Disable reuse of timing information across identical layers.

SPARSE_WEIGHTS : Allow the builder to examine weights and use optimized functions when weights have suitable sparsity.

FP16 : Enable FP16 layer selection

REFIT : Enable building a refittable engine

STRICT_TYPES : Enables strict type constraints
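
The mantissa rounding mentioned for TF32 can be illustrated in plain Python. The sketch below is not TensorRT code: round_to_tf32 is a hypothetical helper that keeps the top 10 of float32's 23 mantissa bits, and it assumes round-to-nearest-even, which the description above does not specify.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Round a float32 value to a 10-bit mantissa (hypothetical helper).

    Assumes round-to-nearest-even; the rounding mode used by real hardware
    is not stated in this document. NaN and infinity are not handled.
    """
    # Reinterpret the value as the 32 raw bits of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    dropped = 13  # float32 has 23 mantissa bits; dropping 13 leaves 10
    lsb = (bits >> dropped) & 1                  # lowest mantissa bit that is kept
    bits += (1 << (dropped - 1)) - 1 + lsb       # round half to even
    bits &= ~((1 << dropped) - 1) & 0xFFFFFFFF   # zero the dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 1 + 2**-10 fits in a 10-bit mantissa and is unchanged;
# 1 + 2**-12 does not fit and rounds back to 1.0.
print(round_to_tf32(1.0 + 2**-10) == 1.0 + 2**-10)
print(round_to_tf32(1.0 + 2**-12) == 1.0)
```

Only the inputs of each multiplication are rounded this way; per the description above, the sum is still accumulated with a 23-bit mantissa.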

class tensorrt.IBuilderConfig

Variables

min_timing_iterations – int The number of minimization iterations used when timing layers. When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in minimization.
avg_timing_iterations – int The number of averaging iterations used when timing layers. When timing layers, the builder minimizes over a set of average times for layer execution. This parameter controls the number of iterations used in averaging.
int8_calibrator – IInt8Calibrator Int8 Calibration interface. The calibrator minimizes the information loss during the INT8 quantization process.
max_workspace_size – int The maximum workspace size. The maximum GPU temporary memory which the engine can use at execution time.
flags – int The build mode flags to turn on builder options for this network. The flags are listed in the BuilderFlag enum and set configuration options for building the network. This should be an integer consisting of one or more BuilderFlag values, combined via bitwise OR. For example, 1 << int(BuilderFlag.FP16) | 1 << int(BuilderFlag.DEBUG).
profile_stream – int The handle for the CUDA stream that is used to profile this network.
num_optimization_profiles – int The number of optimization profiles.
default_device_type – tensorrt.DeviceType The default DeviceType to be used by the Builder.
DLA_core – int The DLA core that the engine executes on. Must be between 0 and N-1 where N is the number of available DLA cores.
profiling_verbosity – Profiling verbosity in NVTX annotations.
engine_capability – The desired engine capability. See EngineCapability for details.
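
The bitmask layout of the flags variable can be sketched without TensorRT installed. In the example below, BuilderFlag is a stand-in enum whose numeric values are assumed for illustration only, and make_flags and is_set are hypothetical helpers, not part of the library.

```python
from enum import IntEnum


class BuilderFlag(IntEnum):
    """Stand-in for tensorrt.BuilderFlag; the numeric values are assumed."""
    FP16 = 0
    INT8 = 1
    DEBUG = 2


def make_flags(*enabled):
    """OR together one bit per enabled flag (hypothetical helper)."""
    mask = 0
    for flag in enabled:
        mask |= 1 << int(flag)
    return mask


def is_set(mask, flag):
    """Return True if the bit for `flag` is set in `mask`."""
    return bool(mask & (1 << int(flag)))


flags = make_flags(BuilderFlag.FP16, BuilderFlag.DEBUG)
print(bin(flags))                       # 0b101
print(is_set(flags, BuilderFlag.INT8))  # False
```

With the real library, a mask built this way would be assigned to the flags variable of an IBuilderConfig; set_flag, clear_flag, and get_flag operate on one flag at a time instead.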

__del__(self: tensorrt.tensorrt.IBuilderConfig) → None

__exit__(exc_type, exc_value, traceback)

Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.

__init__()

Initialize self. See help(type(self)) for accurate signature.

add_optimization_profile(self: tensorrt.tensorrt.IBuilderConfig, profile: tensorrt.tensorrt.IOptimizationProfile) → int

Add an optimization profile.

This function must be called at least once if the network has dynamic or shape input tensors.

Parameters: profile – The new optimization profile, which must satisfy bool(profile) == True
Returns: The index of the optimization profile (starting from 0) if the input is valid, or -1 if the input is not valid.
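
A minimal sketch of registering a profile for one dynamic input follows. add_dynamic_input_profile is a hypothetical helper; it assumes the builder object exposes create_optimization_profile() and the profile exposes set_shape(input, min, opt, max), as in the wider TensorRT Python API, neither of which is documented in this section.

```python
def add_dynamic_input_profile(builder, config, input_name,
                              min_shape, opt_shape, max_shape):
    """Create a profile for one dynamic input and register it on the config.

    Hypothetical helper. `builder` and `config` are expected to behave like
    tensorrt.Builder and tensorrt.IBuilderConfig (assumption).
    """
    # Assumed API: Builder.create_optimization_profile()
    profile = builder.create_optimization_profile()
    # Assumed API: IOptimizationProfile.set_shape(input, min, opt, max)
    profile.set_shape(input_name, min_shape, opt_shape, max_shape)

    # add_optimization_profile returns the profile index, or -1 if invalid.
    index = config.add_optimization_profile(profile)
    if index < 0:
        raise ValueError(f"optimization profile for {input_name!r} was rejected")
    return index
```

A call might look like add_dynamic_input_profile(builder, config, "input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224)), where the input name and shapes are placeholders.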

can_run_on_DLA(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) → bool

Check if the layer can run on DLA.

Parameters: layer – The layer to check
Returns: A bool indicating whether the layer can run on DLA

clear_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) → None

Clears the builder mode flag from the enabled flags.

Parameters: flag – The flag to clear.

clear_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) → None

Clears the quantization flag from the enabled quantization flags.

Parameters: flag – The flag to clear.

create_timing_cache(self: tensorrt.tensorrt.IBuilderConfig, serialized_timing_cache: buffer) → tensorrt.tensorrt.ITimingCache

Create a timing cache.

Create an ITimingCache instance from serialized raw data. The created timing cache doesn’t belong to a specific builder config. It can be shared by multiple builder instances.

Parameters: serialized_timing_cache – The serialized timing cache. If an empty cache is provided (i.e. b""), a new cache will be created.
Returns: The created ITimingCache object.

get_calibration_profile(self: tensorrt.tensorrt.IBuilderConfig) → tensorrt.tensorrt.IOptimizationProfile

Get the current calibration profile.

Returns: The current calibration profile, or None if the calibration profile is unset.

get_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) → tensorrt.tensorrt.DeviceType

Get the device that the layer executes on.

Parameters: layer – The layer to get the DeviceType for
Returns: The DeviceType of the layer

get_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) → bool

Check if a build mode flag is set.

Parameters: flag – The flag to check.
Returns: A bool indicating whether the flag is set.

get_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) → bool

Check if a quantization flag is set.

Parameters: flag – The flag to check.
Returns: A bool indicating whether the flag is set.

get_tactic_sources(self: tensorrt.tensorrt.IBuilderConfig) → int

Get the tactic sources currently set in the engine build configuration.

get_timing_cache(self: tensorrt.tensorrt.IBuilderConfig) → tensorrt.tensorrt.ITimingCache

Get the timing cache from the current IBuilderConfig.

Returns: The timing cache used in the current IBuilderConfig, or None if no timing cache is set.

is_device_type_set(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) → bool

Check if the DeviceType for a layer is explicitly set.

Parameters: layer – The layer to check for DeviceType
Returns: True if DeviceType is not default, False otherwise

reset(self: tensorrt.tensorrt.IBuilderConfig) → None

Resets the builder configuration to defaults. This function can be called when initializing a builder config object.

reset_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer) → None

Reset the DeviceType for the given layer.

Parameters: layer – The layer to reset the DeviceType for

set_calibration_profile(self: tensorrt.tensorrt.IBuilderConfig, profile: tensorrt.tensorrt.IOptimizationProfile) → bool

Set a calibration profile.

A calibration optimization profile must be set if INT8 calibration is used to set scales for a network with runtime dimensions.

Parameters: profile – The new calibration profile, which must satisfy bool(profile) == True or be None. The MIN and MAX values will be overwritten by the OPT values.
Returns: True if the calibration profile was set correctly.

set_device_type(self: tensorrt.tensorrt.IBuilderConfig, layer: tensorrt.tensorrt.ILayer, device_type: tensorrt.tensorrt.DeviceType) → None

Set the device that this layer must execute on. If DeviceType is not set or is reset, TensorRT will use the default DeviceType set in the builder.

The DeviceType for a layer must be compatible with the safety flow (if specified). For example, a layer cannot be marked for DLA execution while the builder is configured for EngineCapability.SAFE_GPU.

Parameters

layer – The layer to set the DeviceType of
device_type – The DeviceType the layer must execute on
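
One way to combine can_run_on_DLA with set_device_type is to pin only the layers that DLA supports. The sketch below is a hypothetical helper, not part of the library; it assumes the network object exposes num_layers and get_layer(i) and that each layer has a name attribute, as in the wider TensorRT Python API. The DLA device type is passed in by the caller (with the real library this would presumably be tensorrt.DeviceType.DLA).

```python
def assign_layers_to_dla(network, config, dla_device_type):
    """Pin every DLA-capable layer to DLA; return the names of the others.

    Hypothetical helper. `network` is expected to behave like
    tensorrt.INetworkDefinition (assumption: num_layers, get_layer).
    """
    not_on_dla = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if config.can_run_on_DLA(layer):
            # This layer must now execute on the given device.
            config.set_device_type(layer, dla_device_type)
        else:
            # Left unset, so the builder's default DeviceType applies.
            not_on_dla.append(layer.name)
    return not_on_dla
```

Layers reported in the returned list keep the default device type, so the caller can decide whether to run them on the GPU or treat them as an error.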

set_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.BuilderFlag) → None

Add the input builder mode flag to the already enabled flags.

Parameters: flag – The flag to set.

set_quantization_flag(self: tensorrt.tensorrt.IBuilderConfig, flag: tensorrt.tensorrt.QuantizationFlag) → None

Add the input quantization flag to the already enabled quantization flags.

Parameters: flag – The flag to set.

set_tactic_sources(self: tensorrt.tensorrt.IBuilderConfig, tactic_sources: int) → bool

Set tactic sources.

This bitset controls which tactic sources TensorRT is allowed to use for tactic selection. By default, TacticSource.CUBLAS is always enabled, and TacticSource.CUBLAS_LT is enabled for x86 platforms, as well as for non-x86 platforms if CUDA >= 11.0.

Multiple tactic sources may be combined with a bitwise OR operation. For example, to enable cuBLAS and cuBLASLt as tactic sources, use a value of: 1 << int(tensorrt.TacticSource.CUBLAS) | 1 << int(tensorrt.TacticSource.CUBLAS_LT)

Parameters: tactic_sources – The tactic sources to set
Returns: A bool indicating whether the tactic sources in the build configuration were updated. The tactic sources in the build configuration will not be updated if the provided value is invalid.
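
Because the tactic sources form a bitset, individual sources can be switched on or off starting from the current value. The sketch below is a hypothetical helper built only on get_tactic_sources and set_tactic_sources; the source arguments are expected to be enum members or integers that convert with int().

```python
def update_tactic_sources(config, enable=(), disable=()):
    """Enable or disable individual tactic sources on a builder config.

    Hypothetical helper. Returns the new bitmask, or raises ValueError if
    the config rejects it.
    """
    mask = config.get_tactic_sources()
    for source in enable:
        mask |= 1 << int(source)       # set the bit for this source
    for source in disable:
        mask &= ~(1 << int(source))    # clear the bit for this source
    # set_tactic_sources returns False when the value is invalid,
    # in which case the configuration is left unchanged.
    if not config.set_tactic_sources(mask):
        raise ValueError(f"invalid tactic source mask: {mask:#x}")
    return mask
```

Starting from the current mask rather than from zero keeps any sources that were already enabled by default.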

set_timing_cache(self: tensorrt.tensorrt.IBuilderConfig, cache: tensorrt.tensorrt.ITimingCache, ignore_mismatch: bool) → bool

Attach a timing cache to IBuilderConfig.

The timing cache has a verification header to make sure the provided cache can be used in the current environment. A failure will be reported if the CUDA device property in the provided cache differs from that of the current environment. Setting ignore_mismatch to True skips strict verification and allows loading a cache created on a different device. The cache must not be destroyed until after the engine is built.

Parameters

cache – The timing cache to be used
ignore_mismatch – Whether to allow using a cache that contains a different CUDA device property

Returns

A bool indicating whether the operation completed successfully.
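
create_timing_cache and set_timing_cache are typically used together. The sketch below is a hypothetical helper that loads a serialized cache from a file if one exists, falls back to an empty cache otherwise, and attaches the result to the config. The file path and the helper name are illustrative assumptions.

```python
import os


def attach_timing_cache(config, cache_path, ignore_mismatch=False):
    """Load (or create) a timing cache and attach it to a builder config.

    Hypothetical helper. Returns the cache object; the caller must keep it
    alive until after the engine is built.
    """
    data = b""  # an empty buffer asks create_timing_cache for a new cache
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            data = f.read()

    cache = config.create_timing_cache(data)
    if not config.set_timing_cache(cache, ignore_mismatch):
        raise RuntimeError(f"timing cache from {cache_path!r} was rejected")
    return cache
```

Returning the cache rather than discarding it reflects the requirement above that the cache must outlive the engine build.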