TensorRT API Capture and Replay#

TensorRT API Capture and Replay streamlines the process of reproducing and debugging issues within your applications. It allows you to record the engine-building phase of an application and later replay the engine-building steps, without needing to re-run the original application or access the model’s source code.

This process is facilitated by two key components:

Capture Shim (libtensorrt_shim.so): This is a library that you can preload or drop into your application. It works by intercepting all TensorRT API calls made during the network-build phase. These intercepted calls, along with any associated constants, are then saved as a pair of files: a JSON file for the API calls and a BIN file for the constants.

Player (tensorrt_player): This is a standalone executable that takes the recorded JSON and BIN files generated by the Capture Shim and uses them to rebuild the TensorRT engine. This lets you recreate the engine that was built during the original application run, differing only in details caused by timing variations during auto-tuning, which significantly aids troubleshooting and debugging. You can also use it to reproduce an engine-build failure.

Getting Started#

The feature is currently restricted to Linux. There are two ways to run the capture step.

  1. Capture using LD_PRELOAD. However, if the application uses dlopen and dlsym to load the TensorRT library and map its C functions (exposed via extern "C") into the process address space, you must use the drop-in replacement approach instead.

  2. Capture using a drop-in replacement. Use this approach to capture trtexec. Capturing the TensorRT Python API and the Python ONNX parser does not require it. The Capture Shim is implemented in a separate library that is installed as part of libnvinfer-dev.

Capture using LD_PRELOAD

export TRT_SHIM_NVINFER_LIB_NAME=<path to libnvinfer.so>  # optional
export TRT_SHIM_OUTPUT_JSON_FILE=<output JSON path>
LD_PRELOAD=libtensorrt_shim.so <your application's command-line>
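As a concrete sketch of the LD_PRELOAD approach (all paths and the application name below are illustrative, not part of the TensorRT distribution):

```shell
# Hypothetical paths -- adjust for your installation.
export TRT_SHIM_OUTPUT_JSON_FILE=/tmp/capture/my_app.json
export TRT_SHIM_NVINFER_LIB_NAME=/usr/lib/x86_64-linux-gnu/libnvinfer.so  # optional

# Preload the shim so it intercepts TensorRT API calls during the build phase.
# The captured constants are written to a .bin file next to the JSON file.
LD_PRELOAD=libtensorrt_shim.so ./my_app --onnx=model.onnx
```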

Drop-in replacement

In this approach, you replace the libnvinfer.so that the application loads with libtensorrt_shim.so (renaming the original aside first), and point the shim at the original TensorRT library using an environment variable.

mv <path to the TRT lib that the app loads>/libnvinfer.so.<major version> libnvinfer_orig.so.<major version>
cp build/x86_64-gnu/libtensorrt_shim.so <path to the TRT lib that the app loads>/libnvinfer.so.<major version>
TRT_SHIM_OUTPUT_JSON_FILE=<JSON file path> TRT_SHIM_NVINFER_LIB_NAME=<path to the original libnvinfer.so>/libnvinfer_orig.so.<major version> <your application's command-line>
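The same steps can be put together as a script. The library directory, major version, and application name below are illustrative assumptions; remember to restore the original library once the capture is done:

```shell
TRT_LIB_DIR=/usr/lib/x86_64-linux-gnu   # hypothetical location the app loads from
MAJOR=10                                # TensorRT major version in use

# Rename the real library aside, then drop the shim in its place.
mv $TRT_LIB_DIR/libnvinfer.so.$MAJOR $TRT_LIB_DIR/libnvinfer_orig.so.$MAJOR
cp build/x86_64-gnu/libtensorrt_shim.so $TRT_LIB_DIR/libnvinfer.so.$MAJOR

# The shim forwards every call to the renamed original library.
TRT_SHIM_OUTPUT_JSON_FILE=/tmp/capture/my_app.json \
TRT_SHIM_NVINFER_LIB_NAME=$TRT_LIB_DIR/libnvinfer_orig.so.$MAJOR \
./my_app --onnx=model.onnx

# Undo the swap afterwards.
mv $TRT_LIB_DIR/libnvinfer_orig.so.$MAJOR $TRT_LIB_DIR/libnvinfer.so.$MAJOR
```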

Player

tensorrt_player -j <output JSON file> -o <output engine file>

If the captured network uses a plugin, set LD_PRELOAD to the plugin library path when running the player so the plugin can be loaded.
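For example (the capture paths and plugin library name are illustrative):

```shell
# Replay the captured build; the JSON references constants in the .bin file
# saved alongside it.
tensorrt_player -j /tmp/capture/my_app.json -o /tmp/capture/my_app.engine

# If the network uses a custom plugin, preload the plugin library so the
# player can resolve its creator at replay time.
LD_PRELOAD=/opt/plugins/libmy_plugin.so \
tensorrt_player -j /tmp/capture/my_app.json -o /tmp/capture/my_app.engine
```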

Capture Tool Configuration#

Required Environment Variables#

TRT_SHIM_OUTPUT_JSON_FILE
    Path to save the captured JSON file.

Optional Environment Variables#

TRT_SHIM_NVINFER_LIB_NAME (string; default: libnvinfer.so)
    Intercepted TensorRT library name. If unset, dlopen(<lib_name>) finds the library using its normal search rules (a full path is also allowed).

TRT_SHIM_DUMP_API (bool; default: false)
    Print enter and exit messages for every API function call.

TRT_SHIM_PRINT_WELCOME (bool; default: false)
    Print "Welcome to TensorRT Shim" at the start of the run.

TRT_SHIM_FORCE_SINGLE_THREAD_API (bool; default: false)
    Lock every API call to enforce single-threaded execution. Ignored when TRT_SHIM_OUTPUT_JSON_FILE is set.

TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN (int; default: 8)
    Inline weights into the JSON instead of a separate .bin when their size (in elements) is ≤ this threshold.

TRT_SHIM_MARK_AS_RANDOM_WEIGHTS_GREATER_EQUAL_THAN (int; default: max int)
    Skip saving weights with an element count ≥ this threshold (they will be marked as random).

TRT_SHIM_FLUSH_AFTER_EVERY_CALL (bool; default: false)
    Flush captured calls to the file after every API call instead of aggregating them.

TRT_SHIM_SET_TACTIC_CACHE (string; default: "")
    Path to a tactic-cache file that will be loaded and applied to TensorRT's IBuilderConfig so tactic selection stays consistent across runs.
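A typical debugging-oriented capture might combine several of these variables. A sketch, assuming the shim accepts true/false for boolean values and using illustrative paths:

```shell
# Verbose capture: log every intercepted API call, flush after each call so a
# crash still leaves a usable capture on disk, and inline small weight tensors
# (up to 64 elements) directly into the JSON.
export TRT_SHIM_OUTPUT_JSON_FILE=/tmp/capture/my_app.json
export TRT_SHIM_DUMP_API=true
export TRT_SHIM_FLUSH_AFTER_EVERY_CALL=true
export TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN=64
LD_PRELOAD=libtensorrt_shim.so ./my_app --onnx=model.onnx
```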

Known Limitations#

  • Capturing custom layers is limited to the following:

      - Supported only on Linux x86_64 in TensorRT release 10.13.3.

      - PluginV2 only.

      - Plugins must be registered statically by calling REGISTER_TENSORRT_PLUGIN. Dynamic registration via registerCreator() is not supported.

      - C++ plugins only. Python plugins are not supported.

      - The plugin must be shipped externally and must not be part of the engine (that is, config->setPluginsToSerialize is not supported).

  • Capturing more than one network in a single process is not supported.

  • The trtexec --saveEngine flag is not supported.

  • The buildSerializedNetworkToStream and buildSerializedNetworkWithKernelText functions are not captured.