Working with Runtime Cache#

By default, TensorRT-RTX compiles GPU kernels at runtime. Runtime caching reduces this startup overhead by storing compiled kernels on disk for reuse.

Overview#

When runtime cache is enabled, kernels compiled at runtime can be saved to a local cache file, allowing future runs to load them directly instead of recompiling. This can significantly improve performance in workflows with frequent kernel reuse or repeated application runs, resulting in faster startup times and a smoother user experience. Runtime caching is especially beneficial in production environments or iterative development workflows where minimizing latency is critical.

Compatibility Checks#

A pre-populated runtime cache may have been created in a different or outdated environment. TensorRT-RTX therefore performs the following checks against the runtime environment to ensure the cache is reusable:

  • The runtime environment’s GPU SM version must match that of the cached version.

  • The runtime environment’s TensorRT-RTX version must be greater than or equal to that of the cached version.

  • The runtime environment’s CUDA version must be greater than or equal to that of the cached version.

If any of these checks fail, the pre-populated runtime cache is not used and its contents are replaced after the current execution.

APIs#

Runtime caching is configured when creating a TensorRT-RTX execution context. Ensure the application has completed the necessary steps up to deserializing the TensorRT-RTX engine, so that the deserialized engine is available for creating an execution context.
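
For reference, the prerequisite step might look like the following. This is a minimal sketch, assuming a standard TensorRT-style runtime API (createInferRuntime and deserializeCudaEngine); the header name, the logger object gLogger, and the engine file name are placeholders to adapt to your application.

    #include <fstream>
    #include <iterator>
    #include <vector>
    #include "NvInfer.h"

    using namespace nvinfer1;

    // Read a previously serialized engine from disk (file name is illustrative).
    std::ifstream engineFile("sample.engine", std::ios::binary);
    std::vector<char> engineData((std::istreambuf_iterator<char>(engineFile)),
                                  std::istreambuf_iterator<char>());

    // Deserialize the engine. The runtime cache is configured afterwards,
    // when the execution context is created (see the steps below).
    // gLogger is any ILogger implementation provided by the application.
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData.data(), engineData.size());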

Creating the Runtime Cache#

During context creation and inference, TensorRT-RTX compiles the kernels needed for inference. Each compiled kernel is added to the runtime cache without duplication.

  1. Create a runtimeConfig object from the engine. Set the appropriate allocation strategy for the execution context.

    IRuntimeConfig* runtimeConfig = engine->createRuntimeConfig();
    runtimeConfig->setExecutionContextAllocationStrategy(ExecutionContextAllocationStrategy::kSTATIC);

    runtime_config = engine.create_runtime_config()
    runtime_config.set_execution_context_allocation_strategy(trt.ExecutionContextAllocationStrategy.STATIC)
    
  2. Create the runtimeCache object and set it to the runtimeConfig object.

    IRuntimeCache* runtimeCache = runtimeConfig->createRuntimeCache();
    runtimeConfig->setRuntimeCache(*runtimeCache);

    runtime_cache = runtime_config.create_runtime_cache()
    runtime_config.set_runtime_cache(runtime_cache)
    
  3. Create the execution context with the configured runtimeConfig object.

    IExecutionContext* context = engine->createExecutionContext(runtimeConfig);

    context = engine.create_execution_context(runtime_config)
    

Load and Save the Runtime Cache#

Your application may need to run inference on the same or similar models repeatedly. In such cases, saving the runtime cache to disk allows previously compiled GPU kernels to be loaded and reused across runs.

TensorRT provides sample utility functions to load the cache file from disk into memory for reuse. The utility function loadTimingCacheFile, originally used for the build-time timing cache, can be shared by runtime caches as well. The loaded bytes can be used for deserialization before the runtime cache is placed into the runtimeConfig object.

  1. Load the runtime cache and run inference.

     std::vector<char> loadedCacheBytes
         = samplesCommon::loadTimingCacheFile(sample::gLogger, ".\\runtime.cache");

     if (!loadedCacheBytes.empty())
     {
         std::vector<uint8_t> runtimeCacheBytes(
                             loadedCacheBytes.begin(), loadedCacheBytes.end());

         runtimeCache->deserialize(
                      runtimeCacheBytes.data(), runtimeCacheBytes.size());

         runtimeConfig->setRuntimeCache(*runtimeCache);
     }
    
     # Use TensorRT’s polygraphy library to load and deserialize cache files
     from polygraphy import util
     from polygraphy.logger import G_LOGGER

     runtime_cache_file = r".\runtime.cache"
     with util.LockFile(runtime_cache_file):
         try:
             loaded_cache_bytes = util.load_file(runtime_cache_file)
             if loaded_cache_bytes:
                 runtime_cache.deserialize(loaded_cache_bytes)
         except Exception:
             G_LOGGER.warning(
                 f"Did not find runtime cache at: {runtime_cache_file}. ")

     runtime_config.set_runtime_cache(runtime_cache)
    
  2. After inference is complete, serialize the runtime cache and save it to disk. Save the runtime cache in binary format (for example, std::ios::binary) rather than as a text file; a minimal std::ofstream sketch is shown after the code below.

     // get the runtime config from the execution context
     IRuntimeConfig* runtimeConfig = context->getRuntimeConfig();

     // get the runtime cache from the runtime config
     IRuntimeCache* runtimeCache = runtimeConfig->getRuntimeCache();

     // serialize the cache into a memory blob
     IHostMemory* hostMemory = runtimeCache->serialize();
     assert(hostMemory != nullptr);

     // save the serialized cache to disk
     samplesCommon::saveTimingCacheFile(
                   sample::gLogger, ".\\runtime.cache", hostMemory);
    
     # get the runtime config from the execution context
     runtime_config = context.get_runtime_config()

     # get the runtime cache from the runtime config
     runtime_cache = runtime_config.get_runtime_cache()

     # serialize the cache into a memory blob and save to disk
     with util.LockFile(runtime_cache_file):
         with runtime_cache.serialize() as buffer:
             util.save_file(buffer, runtime_cache_file,
                            description="runtime cache")
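
     As an alternative to the sample utility, the serialized blob can also be written directly with a std::ofstream opened in binary mode. This is a minimal sketch, assuming the hostMemory object from the C++ snippet above; the file name is illustrative.

      #include <fstream>

      // Write the serialized runtime cache to disk in binary mode
      // (std::ios::binary), as recommended above.
      std::ofstream cacheFile("runtime.cache", std::ios::binary);
      cacheFile.write(static_cast<const char*>(hostMemory->data()),
                      static_cast<std::streamsize>(hostMemory->size()));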
    

tensorrt_rtx Example#

Runtime caching is exposed in tensorrt_rtx through the --runtimeCacheFile flag, which takes a path to the runtime cache file on disk. Ensure that the provided file path has read and write permissions.

# sample command on Windows
tensorrt_rtx --onnx=sample.onnx --runtimeCacheFile=.\runtime.cache
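
On Linux, the same flag applies; only the path separator differs (file names are illustrative):

# sample command on Linux
tensorrt_rtx --onnx=sample.onnx --runtimeCacheFile=./runtime.cache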

The first tensorrt_rtx run fills the cache with the compilation information and serializes it to the specified file. Subsequent tensorrt_rtx runs can reuse the cache file to speed up inference. The acceleration is greatest when the runtime cache is used for the same or similarly structured models.