TensorRT-RTX API Capture and Replay#

TensorRT-RTX API Capture and Replay streamlines the process of reproducing and debugging issues within your applications. It allows you to record the engine-building phase of an application and later replay the engine-building steps, without needing to re-run the original application or access the model’s source code.

This process is facilitated by two key components:

Capture Shim (libtensorrt_shim.so): This is a library that you can preload or drop into your application. It works by intercepting all TensorRT-RTX API calls made during the network-build phase. These intercepted calls, along with any associated constants, are then saved as a pair of files: a JSON file for the API calls and a BIN file for the constants.

Player (tensorrt_player): This is a standalone executable that takes the JSON and BIN files generated by the Capture Shim and uses them to rebuild the TensorRT-RTX engine. The rebuilt engine matches the one built during the original application run, differing only in details that depend on timing during auto-tuning. You can also use it to reproduce an engine-build failure, which makes it a significant aid in troubleshooting and debugging.

Getting Started#

The feature is currently restricted to Linux. There are two ways to run the capture step.

  1. Capture using LD_PRELOAD. If the user application uses dlopen and dlsym to load the TensorRT-RTX library and map its C functions (exposed via extern "C") into the process address space, LD_PRELOAD cannot intercept those calls, and you must use the drop-in replacement approach instead.

  2. Capture using the drop-in replacement approach. This approach is required for capturing tensorrt_rtx itself. Capturing the TensorRT-RTX Python API and the Python ONNX parser does not require it. The Capture Shim (libtensorrt_shim.so) is installed alongside the TensorRT-RTX libraries.

Capture using LD_PRELOAD

export TRT_SHIM_NVINFER_LIB_NAME=<path to libtensorrt_rtx.so> # optional
export TRT_SHIM_OUTPUT_JSON_FILE=<output JSON path>
LD_PRELOAD=libtensorrt_shim.so <your-application-command-line>
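If you launch the target application from a wrapper script, the same environment can be assembled programmatically. The sketch below builds the capture environment for a child process; the helper name and the application command are placeholders for illustration, not part of the TensorRT-RTX API.

```python
import os
import subprocess

def capture_env(json_path, shim="libtensorrt_shim.so", trt_lib=None):
    """Build the environment for an LD_PRELOAD capture run (sketch)."""
    env = dict(os.environ)
    env["TRT_SHIM_OUTPUT_JSON_FILE"] = json_path    # where the capture lands
    if trt_lib is not None:
        env["TRT_SHIM_NVINFER_LIB_NAME"] = trt_lib  # optional, see above
    env["LD_PRELOAD"] = shim                        # interpose the shim
    return env

# Usage (hypothetical application command line):
# subprocess.run(["./my_app", "--build"], env=capture_env("capture.json"), check=True)
```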

Drop-in replacement

In this approach, you overwrite the libtensorrt_rtx.so that the application loads with libtensorrt_shim.so, and point the shim to the original (renamed) TensorRT-RTX library through an environment variable.

mv <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx.so.<major version> <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx_orig.so.<major version>
cp build/x86_64-gnu/libtensorrt_shim.so <path to the TRT-RTX lib that the app loads>/libtensorrt_rtx.so.<major version>
TRT_SHIM_OUTPUT_JSON_FILE=<JSON file path> TRT_SHIM_NVINFER_LIB_NAME=<path to the TRT-RTX lib that the app loads>/libtensorrt_rtx_orig.so.<major version> <your-application-command-line>

Player

tensorrt_player -j <output JSON file> -o <output engine file>

Multi-Network Support#

TensorRT-RTX API Capture and Replay supports capturing multiple networks within a single process. This capability allows you to record the engine-building phase for applications that create and build multiple TensorRT-RTX networks, such as applications with ensemble models or multi-stage inference pipelines.

How It Works

  • Each network created via createNetworkV2 is assigned a unique network ID.

  • Objects (tensors, layers, etc.) are tracked per-network to ensure proper isolation.

  • Weights are stored with network-scoped identifiers, allowing the same weight pointer to be reused across different networks without conflicts.

  • The player automatically manages network context during replay.
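The per-network weight isolation described above can be pictured as a store keyed by both a network ID and the weight pointer. This is an illustrative sketch, not the shim's actual internals; the point is that the same host pointer can carry different data under different network IDs without conflict.

```python
class WeightStore:
    """Sketch of network-scoped weight tracking (illustrative only)."""

    def __init__(self):
        self._weights = {}  # (network_id, ptr) -> bytes

    def record(self, network_id, ptr, data):
        # The same host pointer may be recorded under several network IDs.
        self._weights[(network_id, ptr)] = data

    def lookup(self, network_id, ptr):
        return self._weights[(network_id, ptr)]
```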

Capture Example

The following example demonstrates capturing multiple networks, including interleaved operations:

import tensorrt_rtx as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Create first network
network1 = builder.create_network()
input1 = network1.add_input("input1", trt.float32, (1, 3, 224, 224))
layer1 = network1.add_activation(input1, trt.ActivationType.RELU)  # example layer
output1 = layer1.get_output(0)
network1.mark_output(output1)

# Create second network (interleaved operations are supported)
network2 = builder.create_network()
input2 = network2.add_input("input2", trt.float32, (1, 64))
layer2 = network2.add_activation(input2, trt.ActivationType.RELU)  # example layer
output2 = layer2.get_output(0)
network2.mark_output(output2)

# Build both networks
config1 = builder.create_builder_config()
engine1 = builder.build_serialized_network(network1, config1)

config2 = builder.create_builder_config()
engine2 = builder.build_serialized_network(network2, config2)

Replay with Multiple Networks

When replaying a multi-network capture, the player generates one engine file per network. The output files are named with indices appended:

tensorrt_player -j capture.json -o output.engine

This produces:

  • output.engine0 – Engine for the first network

  • output.engine1 – Engine for the second network, and so on.
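The indexed naming convention above can be expressed as a small helper. The function is purely illustrative (it is not part of tensorrt_player); it simply mirrors how the index is appended to the -o argument.

```python
def replayed_engine_paths(base, num_networks):
    """File names the player is described to emit for a multi-network capture."""
    return [f"{base}{i}" for i in range(num_networks)]
```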

Best Practices

  • Use descriptive output file names when capturing multiple networks to help identify which capture session produced the files.

  • If you need to replay only specific networks, consider capturing them in separate processes or sessions.

Capture Tool Configuration#

Table 16 Optional Environment Variables#

| Environment Variable | Description | Type | Default Value |
|---|---|---|---|
| TRT_SHIM_OUTPUT_JSON_FILE | Path to save the captured .json file. A corresponding .bin file will be created in the same directory. | string | capture.json |
| TRT_SHIM_NVINFER_LIB_NAME | Intercepted TensorRT-RTX library name. If unset, dlopen(<lib_name>) finds the library using its normal search rules (a full path is also allowed). | string | libtensorrt_rtx.so |
| TRT_SHIM_DUMP_API | Print enter and exit messages for every API function call. | bool | false |
| TRT_SHIM_PRINT_WELCOME | Print "Welcome to TensorRT Shim" at the start of the run. | bool | false |
| TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN | Inline weights into the .json instead of a separate .bin when their size (in elements) is ≤ this threshold. | int | 8 |
| TRT_SHIM_MARK_AS_RANDOM_WEIGHTS_GREATER_EQUAL_THAN | Skip saving weights with an element count ≥ this threshold (they are marked as random). | int | Max int |
| TRT_SHIM_FLUSH_AFTER_EVERY_CALL | Flush captured calls to the file after every API call instead of aggregating them. | bool | false |
| TRT_SHIM_FLUSH_ON_BUILD | Flush captured calls to the file after every buildSerializedNetwork call. Useful for debugging crashes during engine building or capturing incremental progress in multi-network sessions. | bool | false |
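The two weight thresholds interact as sketched below. The function and its return labels are illustrative, not shim internals, and the "Max int" default is assumed here to be a 32-bit maximum.

```python
def weight_disposition(num_elements,
                       inline_max=8,           # TRT_SHIM_INLINE_WEIGHTS_LOWER_EQUAL_THAN
                       random_min=2**31 - 1):  # TRT_SHIM_MARK_AS_RANDOM_WEIGHTS_GREATER_EQUAL_THAN
    """Sketch: where a weight blob ends up, per the thresholds above."""
    if num_elements >= random_min:
        return "marked-random"   # not saved at all
    if num_elements <= inline_max:
        return "inline-json"     # stored inside the .json
    return "bin-file"            # stored in the companion .bin
```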

Known Limitations#

  • Supports Linux x86_64. Windows is not currently supported.

  • The tensorrt_rtx --saveEngine flag is not supported.

Flush Behavior#

The captured data is flushed (written to disk) under any of the following conditions:

  • Process exit - flush occurs automatically when the process completes.

  • Every API call - flush after each captured call when TRT_SHIM_FLUSH_AFTER_EVERY_CALL is set.

  • On build - flush after each buildSerializedNetwork call when TRT_SHIM_FLUSH_ON_BUILD is set.

On each flush, the entire .json file is overwritten, while the .bin file is only appended to.
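A minimal sketch of this flush rule, assuming the shim keeps the full call list in memory (the function is illustrative, not the shim's actual code):

```python
def flush(json_path, bin_path, all_calls_json, new_weight_bytes):
    """Overwrite the .json in full; only append to the .bin."""
    with open(json_path, "w") as f:
        f.write(all_calls_json)    # full rewrite on every flush
    with open(bin_path, "ab") as f:
        f.write(new_weight_bytes)  # the .bin only ever grows
```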