TensorRT-RTX API Capture and Replay#

TensorRT-RTX API Capture and Replay streamlines the process of reproducing and debugging issues within your applications. It allows you to record the engine-building phase of an application and later replay the engine-building steps, without needing to re-run the original application or access the model’s source code.

This process is facilitated by two key components:

Capture Shim (libtensorrt_shim.so): This is a library that you can preload or drop into your application. It works by intercepting all TensorRT-RTX API calls made during the network-build phase. These intercepted calls, along with any associated constants, are then saved as a pair of files: a JSON file for the API calls and a BIN file for the constants.

Player (tensorrt_player): This is a standalone executable that takes the JSON and BIN files generated by the Capture Shim and uses them to rebuild the TensorRT-RTX engine. The rebuilt engine matches the one built during the original application run, differing only in details that depend on timing during auto-tuning. You can also use it to reproduce an engine-build failure, which makes it a significant aid in troubleshooting and debugging.

Getting Started#

The feature is currently restricted to Linux. There are two ways to run the capture step.

  1. Capture using LD_PRELOAD. If the user application uses dlopen and dlsym to load the TensorRT-RTX library and map its C functions (exposed via extern "C") into the process address space, LD_PRELOAD cannot intercept those calls, and you must use the drop-in replacement approach instead.

  2. Capture using the drop-in replacement approach. This approach is required for capturing tensorrt_rtx itself. Capturing the TensorRT-RTX Python API and the Python ONNX parser does not require it. The Capture Shim (libtensorrt_shim.so) is installed alongside the TensorRT-RTX libraries.

Capture using LD_PRELOAD

export TRT_SHIM_NVINFER_LIB_NAME=<path to libtensorrt_rtx.so> # optional
export TRT_SHIM_OUTPUT_JSON_FILE=<output JSON path>
LD_PRELOAD=libtensorrt_shim.so <your-application-command-line>
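If you launch the target application from a wrapper script, the same environment can be assembled programmatically. The sketch below builds the capture environment for a child process; the helper name and the application command are placeholders for illustration, not part of the TensorRT-RTX API.

```python
import os
import subprocess

def capture_env(json_path, shim="libtensorrt_shim.so", trt_lib=None):
    """Build the environment for an LD_PRELOAD capture run (sketch)."""
    env = dict(os.environ)
    env["TRT_SHIM_OUTPUT_JSON_FILE"] = json_path    # where the capture lands
    if trt_lib is not None:
        env["TRT_SHIM_NVINFER_LIB_NAME"] = trt_lib  # optional, see above
    env["LD_PRELOAD"] = shim                        # interpose the shim
    return env

# Usage (hypothetical application command line):
# subprocess.run(["./my_app", "--build"], env=capture_env("capture.json"), check=True)
```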

Drop-in replacement

In this approach, you overwrite the libtensorrt_rtx.so that the application loads with libtensorrt_shim.so, and point the shim to the original (renamed) TensorRT-RTX library through an environment variable.

mv <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx.so.<major version> <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx_orig.so.<major version>
cp build/x86_64-gnu/libtensorrt_shim.so <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx.so.<major version>
TRT_SHIM_OUTPUT_JSON_FILE=<JSON file path> TRT_SHIM_NVINFER_LIB_NAME=<path to the TRT-RTX lib that the app loads>/libtensorrt_rtx_orig.so.<major version> <your-application-command-line>

Player

tensorrt_player -j <output JSON file> -o <output engine file>

Multi-Network Support#

TensorRT-RTX API Capture and Replay supports capturing multiple networks within a single process. This capability allows you to record the engine-building phase for applications that create and build multiple TensorRT-RTX networks, such as applications with ensemble models or multi-stage inference pipelines.

How It Works

  • Each network created via createNetworkV2 is assigned a unique network ID.

  • Objects (tensors, layers, etc.) are tracked per-network to ensure proper isolation.

  • Weights are stored with network-scoped identifiers, allowing the same weight pointer to be reused across different networks without conflicts.

  • The player automatically manages network context during replay.
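The per-network weight isolation described above can be pictured as a store keyed by both a network ID and the weight pointer. This is an illustrative sketch, not the shim's actual internals; the point is that the same host pointer can carry different data under different network IDs without conflict.

```python
class WeightStore:
    """Sketch of network-scoped weight tracking (illustrative only)."""

    def __init__(self):
        self._weights = {}  # (network_id, ptr) -> bytes

    def record(self, network_id, ptr, data):
        # The same host pointer may be recorded under several network IDs.
        self._weights[(network_id, ptr)] = data

    def lookup(self, network_id, ptr):
        return self._weights[(network_id, ptr)]
```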

Capture Example

The following example demonstrates capturing multiple networks, including interleaved operations:

import tensorrt_rtx as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Create first network
network1 = builder.create_network()
input1 = network1.add_input("input1", trt.float32, (1, 3, 224, 224))
layer1 = network1.add_activation(input1, trt.ActivationType.RELU)  # example layer
output1 = layer1.get_output(0)
network1.mark_output(output1)

# Create second network (interleaved operations are supported)
network2 = builder.create_network()
input2 = network2.add_input("input2", trt.float32, (1, 64))
layer2 = network2.add_activation(input2, trt.ActivationType.RELU)  # example layer
output2 = layer2.get_output(0)
network2.mark_output(output2)

# Build both networks
config1 = builder.create_builder_config()
engine1 = builder.build_serialized_network(network1, config1)

config2 = builder.create_builder_config()
engine2 = builder.build_serialized_network(network2, config2)

Replay with Multiple Networks

When replaying a multi-network capture, the player generates one engine file per network. The output files are named with indices appended:

tensorrt_player -j capture.json -o output.engine

This produces:

  • output.engine0 – Engine for the first network

  • output.engine1 – Engine for the second network, and so on.
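The indexed naming convention above can be expressed as a small helper. The function is purely illustrative (it is not part of tensorrt_player); it simply mirrors how the index is appended to the -o argument.

```python
def replayed_engine_paths(base, num_networks):
    """File names the player is described to emit for a multi-network capture."""
    return [f"{base}{i}" for i in range(num_networks)]
```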

Best Practices

  • Use descriptive output file names when capturing multiple networks to help identify which capture session produced the files.

  • If you need to replay only specific networks, consider capturing them in separate processes or sessions.

Capture Tool Configuration#

Table 16 Optional Environment Variables#

| Environment Variable | Description | Type | Default Value |
|---|---|---|---|
| TRT_SHIM_OUTPUT_JSON_FILE | Path to save the captured .json file. A corresponding .bin file will be created in the same directory. | string | capture.json |
| TRT_SHIM_NVINFER_LIB_NAME | Intercepted TensorRT-RTX library name. If unset, dlopen(<lib_name>) finds the library using its normal search rules (a full path is also allowed). | string | libtensorrt_rtx.so |
| TRT_SHIM_DUMP_API | Print enter and exit messages for every API function call. | bool | false |
| TRT_SHIM_PRINT_WELCOME | Print "Welcome to TensorRT Shim" at the start of the run. | bool | false |
| TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN | Inline weights into the .json instead of a separate .bin when their size (in elements) is ≤ this threshold. | int | 8 |
| TRT_SHIM_MARK_AS_RANDOM_WEIGHTS_GREATER_EQUAL_THAN | Skip saving weights with an element count ≥ this threshold (they are marked as random). | int | Max int |
| TRT_SHIM_FLUSH_AFTER_EVERY_CALL | Flush captured calls to the file after every API call instead of aggregating them. | bool | false |
| TRT_SHIM_FLUSH_ON_BUILD | Flush captured calls to the file after every buildSerializedNetwork call. Useful for debugging crashes during engine building or capturing incremental progress in multi-network sessions. | bool | false |
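The two weight thresholds interact as sketched below. The function and its return labels are illustrative, not shim internals, and the "Max int" default is assumed here to be a 32-bit maximum.

```python
def weight_disposition(num_elements,
                       inline_max=8,           # TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN
                       random_min=2**31 - 1):  # TRT_SHIM_MARK_AS_RANDOM_WEIGHTS_GREATER_EQUAL_THAN
    """Sketch: where a weight blob ends up, per the thresholds above."""
    if num_elements >= random_min:
        return "marked-random"   # not saved at all
    if num_elements <= inline_max:
        return "inline-json"     # stored inside the .json
    return "bin-file"            # stored in the companion .bin
```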

Known Limitations#

  • Supports Linux x86_64. Windows is not currently supported.

  • The tensorrt_rtx --saveEngine flag is not supported.

Flush Behavior#

The captured data is flushed (written to disk) under any of the following conditions:

  • Process exit - flush occurs automatically when the process completes.

  • Every API call - flush after each captured call when TRT_SHIM_FLUSH_AFTER_EVERY_CALL is set.

  • On build - flush after each buildSerializedNetwork call when TRT_SHIM_FLUSH_ON_BUILD is set.

On each flush, the entire .json file is overwritten, while the .bin file is only appended to.
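A minimal sketch of this flush rule, assuming the shim keeps the full call list in memory (the function is illustrative, not the shim's actual code):

```python
def flush(json_path, bin_path, all_calls_json, new_weight_bytes):
    """Overwrite the .json in full; only append to the .bin."""
    with open(json_path, "w") as f:
        f.write(all_calls_json)    # full rewrite on every flush
    with open(bin_path, "ab") as f:
        f.write(new_weight_bytes)  # the .bin only ever grows
```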