
Optimizing Builder Performance

The TensorRT builder profiles each layer’s available tactics to search for the fastest inference engine plan. Build time can be long if the model has many layers or a complicated topology. The following sections describe options for reducing build time.

Timing Cache

TensorRT creates a layer-timing cache that stores layer-profiling information so that later builds can reuse it and finish faster. The information it contains is specific to the target device, the CUDA and TensorRT versions, and any BuilderConfig parameters that can change the layer implementation, such as BuilderFlag::kTF32 or BuilderFlag::kREFIT.

When a layer has the same IO tensor configuration and layer parameters as a layer that has already been timed, the TensorRT builder skips profiling and reuses the cached result. If a timing query misses in the cache, the builder times the layer and adds the result to the cache.
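The hit-or-miss behavior described above can be pictured with a toy model. This is only an illustrative sketch, not TensorRT's internal data structure: the class name ToyTimingCache, the shape of the key, and the profile callback are all assumptions made for the example.

```python
from typing import Callable, Dict, Hashable, Tuple


class ToyTimingCache:
    """Toy stand-in for a layer-timing cache (hypothetical, not TensorRT code)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[Hashable, Hashable, Hashable], float] = {}
        self.hits = 0
        self.misses = 0

    def query(
        self,
        layer_type: Hashable,
        io_config: Hashable,
        params: Hashable,
        profile: Callable[[], float],
    ) -> float:
        # Layers with the same type, IO tensor configuration, and parameters
        # share one cache entry.
        key = (layer_type, io_config, params)
        if key in self._entries:
            self.hits += 1  # hit: skip profiling and reuse the stored timing
            return self._entries[key]
        self.misses += 1  # miss: time the layer once, then remember the result
        timing = profile()
        self._entries[key] = timing
        return timing
```

In this model, a network that repeats the same convolution block many times calls profile only once for that block; every later occurrence is a cache hit.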

The timing cache can be serialized and deserialized. You can load a serialized cache from a buffer using IBuilderConfig::createTimingCache:

ITimingCache* cache =
 config->createTimingCache(cacheFile.data(), cacheFile.size());

Setting the buffer size to 0 creates a new empty timing cache.

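Then attach the cache to the builder configuration before building. The second argument, ignoreMismatch, is false here, so a cache whose recorded device properties do not match the current device is not accepted: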

config->setTimingCache(*cache, false);

During the build, each cache miss adds a new timing entry, so the cache can grow. After the build, serialize the cache so that another builder can reuse it:

IHostMemory* serializedCache = cache->serialize();

If a builder does not have a timing cache attached, it creates its own temporary local cache and destroys it when the build is done.
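The load, attach, and serialize steps can be combined into two small helpers. The following is a minimal sketch that assumes the TensorRT Python bindings expose create_timing_cache and set_timing_cache on the builder config and serialize on the cache, mirroring the C++ calls shown in this section. The helper names and the cache file path are hypothetical.

```python
from pathlib import Path


def attach_timing_cache(config, cache_path):
    """Load a timing cache from cache_path (or start empty) and attach it.

    `config` is assumed to behave like a TensorRT IBuilderConfig.
    """
    path = Path(cache_path)
    # An empty buffer requests a new, empty timing cache.
    blob = path.read_bytes() if path.exists() else b""
    cache = config.create_timing_cache(blob)
    # Second argument is ignore_mismatch; False keeps the mismatch check on.
    if not config.set_timing_cache(cache, False):
        raise RuntimeError("timing cache was rejected by the builder config")
    return cache


def save_timing_cache(cache, cache_path):
    """Serialize the (possibly augmented) cache to disk after the build."""
    Path(cache_path).write_bytes(bytes(cache.serialize()))
```

A typical flow would call attach_timing_cache before building the engine and save_timing_cache afterwards, so that entries added by cache misses are available to the next build.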

The compilation cache, which stores JIT-compiled code, is part of the timing cache and is serialized along with it by default. To disable it, set the following BuilderFlag:

config->setFlag(BuilderFlag::kDISABLE_COMPILATION_CACHE);

Note

The timing cache supports the most frequently used layer types: Convolution, Deconvolution, Pooling, SoftMax, MatrixMultiply, ElementWise, Shuffle, and tensor memory layout conversion. More layer types will be added in future releases.

Builder Optimization Level

Set the optimization level in the builder config to adjust how long TensorRT spends searching for tactics with potentially better performance. The default optimization level is 3. A smaller value shortens engine build time considerably, but the resulting engine can perform worse. A larger value lengthens engine build time, but the resulting engine can perform better if TensorRT finds better tactics.

For example, to set the optimization level to 0 (the fastest):

C++

config->setBuilderOptimizationLevel(0);

Python

config.builder_optimization_level = 0
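To judge the trade-off for a particular model, it can help to time a build at more than one level. The helper below is a rough sketch: time_builds and its build_engine argument are hypothetical names, and build_engine is assumed to be a user-supplied callable that sets config.builder_optimization_level to the given level and then builds the engine.

```python
import time


def time_builds(build_engine, levels=(0, 3)):
    """Return {level: build seconds} for each optimization level in `levels`.

    `build_engine(level)` is a caller-provided function (an assumption of this
    sketch) that configures the level and performs the build.
    """
    timings = {}
    for level in levels:
        start = time.perf_counter()
        build_engine(level)
        timings[level] = time.perf_counter() - start
    return timings
```

Build time is only half of the comparison; the inference latency of each resulting engine should be measured separately before settling on a level.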