TensorRT API Capture and Replay#
TensorRT API Capture and Replay streamlines the process of reproducing and debugging issues within your applications. It allows you to record the engine-building phase of an application and later replay the engine-building steps, without needing to re-run the original application or access the model’s source code.
This process is facilitated by two key components:
Capture Shim (libtensorrt_shim.so): This is a library that you can preload or drop into your application. It works by intercepting all TensorRT API calls made during the network-build phase. These intercepted calls, along with any associated constants, are then saved as a pair of files: a JSON file for the API calls and a BIN file for the constants.
Player (tensorrt_player): This is a standalone executable that takes the recorded JSON and BIN files generated by the Capture Shim and uses them to rebuild the TensorRT engine. The rebuilt engine matches the one built during the original application run, differing only in details that depend on timing during auto-tuning, which helps significantly with troubleshooting and debugging. The player can also be used to reproduce an engine-build failure.
Getting Started#
Important
The capture-replay feature is currently restricted to Linux.
There are two ways to run the capture step:
LD_PRELOAD approach - simplest method, works for most applications.
Drop-in replacement approach - required for trtexec and any application that dynamically loads TensorRT via dlopen/dlsym.
Preload the Capture Shim into your application. If your application uses dlopen and dlsym to load the TensorRT library and map its C functions (exposed through extern "C") into the process address space, use the drop-in replacement approach instead.
export TRT_SHIM_NVINFER_LIB_NAME=<path to libnvinfer.so> # optional
export TRT_SHIM_OUTPUT_JSON_FILE=<output JSON path>
LD_PRELOAD=libtensorrt_shim.so <your-application-command-line>
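The same capture can be launched from a script. The following is a minimal sketch, assuming that libtensorrt_shim.so can be resolved by the dynamic loader. The helper names capture_env and run_with_capture are hypothetical; only the environment variable names come from the commands above.

```python
import os
import subprocess


def capture_env(output_json, nvinfer_lib=None, shim="libtensorrt_shim.so", base_env=None):
    """Return a copy of the environment with the capture variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TRT_SHIM_OUTPUT_JSON_FILE"] = output_json
    if nvinfer_lib is not None:
        # Optional, as in the shell example above.
        env["TRT_SHIM_NVINFER_LIB_NAME"] = nvinfer_lib
    # Like the shell one-liner, this replaces any existing LD_PRELOAD value.
    env["LD_PRELOAD"] = shim
    return env


def run_with_capture(command, output_json, nvinfer_lib=None):
    """Run the application (a list of arguments) with the Capture Shim preloaded."""
    return subprocess.run(command, env=capture_env(output_json, nvinfer_lib), check=True)
```

For example, run_with_capture(["./my_app", "--build"], "capture.json") would start a hypothetical application my_app with the shim preloaded.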
Replace the libnvinfer.so library with the Capture Shim and redirect it to the original library. This approach is required for trtexec and any application that dynamically loads TensorRT. Capturing the TensorRT Python API and Python ONNX parser does not require this approach.
mv <path to the TRT lib that the app loads>/libnvinfer.so.<major version> libnvinfer_orig.so.<major version>
cp build/x86_64-gnu/libtensorrt_shim.so <path to the TRT lib that the app loads>/libnvinfer.so.<major version>
TRT_SHIM_OUTPUT_JSON_FILE=<JSON file path> TRT_SHIM_NVINFER_LIB_NAME=<path to the original libnvinfer.so>/libnvinfer_orig.so.<major version> <your-application-command-line>
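The rename-and-copy step can also be scripted. This is a rough sketch rather than an official installer: the function name install_shim is hypothetical, and it assumes the renamed original library is kept in the same directory as the shim so that its path can be passed in TRT_SHIM_NVINFER_LIB_NAME.

```python
import shutil
from pathlib import Path


def install_shim(trt_lib_dir, shim_path, major_version):
    """Swap libnvinfer.so.<major> for the Capture Shim, keeping the original.

    Returns the path of the renamed original library.
    """
    lib_dir = Path(trt_lib_dir)
    target = lib_dir / f"libnvinfer.so.{major_version}"
    original = lib_dir / f"libnvinfer_orig.so.{major_version}"
    if original.exists():
        # Refuse to overwrite a previously saved original library.
        raise FileExistsError(f"{original} already exists")
    target.rename(original)          # mv libnvinfer.so.N libnvinfer_orig.so.N
    shutil.copy2(shim_path, target)  # cp libtensorrt_shim.so libnvinfer.so.N
    return original
```

Restoring the original setup afterwards amounts to renaming the saved file back over the shim.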
Player
tensorrt_player -j <output JSON file> -o <output engine file>
When replaying models that use TensorRT plugins, set LD_PRELOAD to the plugin library paths to load them.
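As an illustration, the replay step with plugin libraries might be wrapped as follows. The helper names are hypothetical; the -j and -o flags are the ones shown above, and the sketch relies on the dynamic loader accepting a colon-separated list in LD_PRELOAD.

```python
import os
import subprocess


def player_command(json_file, engine_file):
    """Build the tensorrt_player argument list."""
    return ["tensorrt_player", "-j", json_file, "-o", engine_file]


def player_env(plugin_libs=(), base_env=None):
    """Return an environment in which the plugin libraries are preloaded."""
    env = dict(os.environ if base_env is None else base_env)
    if plugin_libs:
        env["LD_PRELOAD"] = ":".join(plugin_libs)
    return env


def replay(json_file, engine_file, plugin_libs=()):
    """Run the player; raises CalledProcessError if the rebuild fails."""
    return subprocess.run(
        player_command(json_file, engine_file),
        env=player_env(plugin_libs),
        check=True,
    )
```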
Multi-Network Support#
TensorRT API Capture and Replay supports capturing multiple networks within a single process. This capability allows you to record the engine-building phase for applications that create and build multiple TensorRT networks, such as applications with ensemble models or multi-stage inference pipelines.
How It Works
Each network created via createNetworkV2 is assigned a unique network ID.
Objects (tensors, layers, etc.) are tracked per-network to ensure proper isolation.
Weights are stored with network-scoped identifiers, allowing the same weight pointer to be reused across different networks without conflicts.
The player automatically manages network context during replay.
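To make the idea of network-scoped identifiers concrete, here is a toy model. It is not the shim's actual data structure: the class CaptureRegistry and its key format are invented for illustration. It only shows why keying weights by (network ID, pointer) lets the same pointer appear in two networks without a clash.

```python
import itertools


class CaptureRegistry:
    """Toy per-network bookkeeping; hypothetical, not the shim's implementation."""

    def __init__(self):
        self._next_network_id = itertools.count()
        self._weights = {}  # (network_id, pointer) -> weight identifier

    def create_network(self):
        """Hand out a fresh network ID, as happens for each createNetworkV2 call."""
        return next(self._next_network_id)

    def register_weights(self, network_id, pointer):
        """Return a stable identifier scoped to the given network."""
        key = (network_id, pointer)
        return self._weights.setdefault(key, f"net{network_id}/weights@{pointer:#x}")
```

Registering pointer 0x1000 under two different network IDs yields two distinct identifiers, while registering it twice under the same network returns the same one.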
Capture Example
The following example demonstrates capturing multiple networks, including interleaved operations:
import tensorrt as trt
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Create first network
network1 = builder.create_network()
input1 = network1.add_input("input1", trt.float32, (1, 3, 224, 224))
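# ... add layers ... (a single identity layer stands in here so that output1 is defined)
output1 = network1.add_identity(input1).get_output(0)
network1.mark_output(output1)
# Create second network (interleaved operations are supported)
network2 = builder.create_network()
input2 = network2.add_input("input2", trt.float32, (1, 64))
# ... add layers ... (placeholder identity layer, as above)
output2 = network2.add_identity(input2).get_output(0)
network2.mark_output(output2)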
# Build both networks
config1 = builder.create_builder_config()
engine1 = builder.build_serialized_network(network1, config1)
config2 = builder.create_builder_config()
engine2 = builder.build_serialized_network(network2, config2)
Replay with Multiple Networks
When replaying a multi-network capture, the player generates one engine file per network. The output files are named with indices appended:
tensorrt_player -j capture.json -o output.engine
This produces:
output.engine0 - Engine for the first network
output.engine1 - Engine for the second network, and so on.
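A script that consumes the replayed engines can derive the expected file names from this naming rule. The helper below is a small sketch with a hypothetical name, engine_paths. It assumes a multi-network capture and that the caller already knows how many networks were captured.

```python
def engine_paths(output, network_count):
    """List the per-network engine files produced for `-o <output>`."""
    return [f"{output}{index}" for index in range(network_count)]


paths = engine_paths("output.engine", 2)
print(paths)  # ['output.engine0', 'output.engine1']
```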
Best Practices
Use descriptive output file names when capturing multiple networks to help identify which capture session produced the files.
If you need to replay only specific networks, consider capturing them in separate processes or sessions.
Verify that all required plugins are available when replaying, as each network may use different custom layers.
Capture Tool Configuration#
| Environment Variable | Description | Type | Default Value |
|---|---|---|---|
| TRT_SHIM_OUTPUT_JSON_FILE | Path to save the captured JSON file. | | |
| TRT_SHIM_NVINFER_LIB_NAME | Intercepted TensorRT library name. If unset, […] | | |
| | Print […] | | |
| | Print […] | | |
| | Lock every API call to enforce single-threaded execution. Ignored when […] | | |
| | Inline weights into the […] | | |
| | Skip saving weights with an element count ≥ this threshold (they will be marked as random). | | |
| TRT_SHIM_FLUSH_AFTER_EVERY_CALL | Flush captured calls to the file after every API call instead of aggregating them. | | |
| TRT_SHIM_FLUSH_ON_BUILD | Flush captured calls to the file after every buildSerializedNetwork call. | | |
| | Path to a tactic-cache file that will be loaded and applied to TensorRT’s […] | | |
Known Limitations#
Supported
Linux x86_64 and AArch64 platforms.
C++ plugins of type IPluginV3 (registered statically by calling REGISTER_TENSORRT_PLUGIN). IPluginV2 and the rest of the V2 plugin family were removed in TensorRT 11.0; migrate to IPluginV3 per the Migration Guide.
Not supported
Dynamic plugin registration using registerCreator().
Python plugins.
Plugins shipped as part of the engine (that is, config->setPluginsToSerialize is not supported); the plugin must be shipped externally.
The trtexec --saveEngine flag.
The buildSerializedNetworkToStream and buildSerializedNetworkWithKernelText functions are not captured.
Flush Behavior#
The captured data is flushed (written to disk) under any of the following conditions:
Process exit - flush occurs automatically when the process completes.
Every API call - flush after each captured call when TRT_SHIM_FLUSH_AFTER_EVERY_CALL is set.
On build - flush after each buildSerializedNetwork call when TRT_SHIM_FLUSH_ON_BUILD is set.
On each flush, the entire .json file is overwritten, while the .bin file is appended.
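The overwrite-versus-append behavior can be pictured with a toy recorder. This sketch does not reproduce the shim's file format: the class CaptureLog, the list-of-names JSON layout, and the choice to place the .bin file next to the .json file under the same stem are all assumptions made for illustration.

```python
import json
from pathlib import Path


class CaptureLog:
    """Toy recorder mimicking the described flush semantics; not the shim itself."""

    def __init__(self, json_path):
        self.json_path = Path(json_path)
        self.bin_path = self.json_path.with_suffix(".bin")  # assumed naming
        self.calls = []      # every call recorded so far
        self.pending = b""   # constants not yet written to disk

    def record(self, name, constants=b""):
        self.calls.append(name)
        self.pending += constants

    def flush(self):
        # JSON: rewritten in full from all calls recorded so far.
        self.json_path.write_text(json.dumps(self.calls))
        # BIN: only the bytes gathered since the last flush are appended.
        with open(self.bin_path, "ab") as handle:
            handle.write(self.pending)
        self.pending = b""
```

After two flushes, the JSON file holds the full call list once, and the BIN file holds the constants from both flushes back to back.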
See also
- Engine Inspector and Debug Tensors
Additional tools for inspecting and debugging TensorRT engines.
- Troubleshooting
Error resolution and diagnostic guidance.