Core Concepts

TensorRT Workflow

The general TensorRT workflow consists of three steps:

  1. Populate a tensorrt.INetworkDefinition, either with a parser or with the TensorRT Network API (see tensorrt.INetworkDefinition for more details). The tensorrt.Builder can be used to create the empty tensorrt.INetworkDefinition that you then populate.

  2. Use the tensorrt.Builder to build a tensorrt.ICudaEngine from the populated tensorrt.INetworkDefinition.

  3. Create a tensorrt.IExecutionContext from the tensorrt.ICudaEngine and use it to perform optimized inference.
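The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes an ONNX model on disk and a TensorRT release in which Builder.build_serialized_network and Runtime.deserialize_cuda_engine are available. The function name build_context, its network_flags parameter, and the model path are hypothetical. The network-creation flags are version dependent; some releases require the EXPLICIT_BATCH flag before the ONNX parser will accept the network.

```python
def build_context(onnx_path, network_flags=0):
    """Sketch of the three workflow steps; returns (engine, context).

    `network_flags` is a hypothetical parameter. On releases that need it,
    pass 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH).
    """
    # Imported lazily so this file can be loaded without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Step 1: create an empty network and populate it with a parser.
    network = builder.create_network(network_flags)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    # Step 2: build an engine from the populated network.
    config = builder.create_builder_config()
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("engine build failed")
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(plan)

    # Step 3: create an execution context for inference.
    context = engine.create_execution_context()
    return engine, context
```

A call such as build_context("model.onnx") would return the engine together with a context. Running inference additionally requires device buffers, which are omitted here.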

Classes Overview


Logger

Most other TensorRT classes use a logger to report errors, warnings, and informational messages. TensorRT provides a basic tensorrt.Logger implementation; for more advanced functionality, you can write your own by deriving from tensorrt.ILogger.
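As an illustration of deriving from tensorrt.ILogger, the sketch below defines a logger that stores messages in a list instead of printing them. The names make_collecting_logger, CollectingLogger, and records are hypothetical; the pattern of calling the base-class initializer explicitly and overriding log(severity, msg) is assumed to match the installed TensorRT release.

```python
def make_collecting_logger():
    """Return a custom logger that keeps every message in memory.

    Hypothetical helper; the class is defined inside the function so that
    TensorRT is only imported when a logger is actually requested.
    """
    import tensorrt as trt

    class CollectingLogger(trt.ILogger):
        def __init__(self):
            # The base class must be initialised explicitly.
            trt.ILogger.__init__(self)
            self.records = []

        def log(self, severity, msg):
            # Called by TensorRT for every message it reports.
            self.records.append((severity, msg))

    return CollectingLogger()
```

The returned object can be passed wherever a logger is expected, for example to tensorrt.Builder, and logger.records can be inspected afterwards.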


Parsers

Parsers populate a tensorrt.INetworkDefinition from a model trained in a deep learning framework.


Network

The tensorrt.INetworkDefinition represents a computational graph. To populate the network, TensorRT provides a suite of parsers for a variety of deep learning frameworks. The network can also be populated manually using the Network API.
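To make manual population concrete, here is a minimal sketch that builds a one-layer network (a single ReLU activation) with the Network API. The function name build_relu_network, the tensor name "input", the default shape, and the network_flags parameter are all hypothetical choices for illustration, and the appropriate flags depend on the TensorRT release.

```python
def build_relu_network(builder, input_shape=(1, 3, 8, 8), network_flags=0):
    """Populate an empty network by hand: input -> ReLU -> output.

    `builder` is an existing tensorrt.Builder; the other arguments are
    hypothetical defaults chosen for this sketch.
    """
    import tensorrt as trt

    # Start from the empty network created by the builder.
    network = builder.create_network(network_flags)

    # Declare the input tensor of the graph.
    x = network.add_input(name="input", dtype=trt.float32, shape=input_shape)

    # Add one layer and mark its result as the network output.
    relu = network.add_activation(input=x, type=trt.ActivationType.RELU)
    network.mark_output(relu.get_output(0))
    return network
```

The returned network can then be handed to the builder in the same way as one populated by a parser.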


Builder

The tensorrt.Builder is used to build a tensorrt.ICudaEngine. To do so, it must be given a populated tensorrt.INetworkDefinition.

Engine and Context

The tensorrt.ICudaEngine is the output of the TensorRT optimizer. It is used to generate a tensorrt.IExecutionContext that can perform inference.
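The relationship between engine and context can be sketched as below, assuming an engine that was previously serialized to a file. The function name load_context and the file path are hypothetical, and the sketch relies on tensorrt.Runtime.deserialize_cuda_engine accepting the file contents as bytes and returning None on failure, which may differ between releases.

```python
def load_context(plan_path):
    """Load a serialized engine from disk and create a context from it.

    Hypothetical helper; returns (engine, context) so the engine stays
    referenced for as long as the caller keeps the tuple.
    """
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    # Recreate the engine from its serialized form.
    with open(plan_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError("could not deserialize engine from " + plan_path)

    # The context holds the per-inference state for this engine.
    context = engine.create_execution_context()
    return engine, context
```

The inference call itself is left out because it needs device memory management and its exact method name varies across TensorRT versions.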