Core Concepts

TensorRT Workflow

The general TensorRT workflow consists of three steps:
- Populate a tensorrt.INetworkDefinition, either with a parser or by using the TensorRT Network API (see tensorrt.INetworkDefinition for more details). The tensorrt.Builder can be used to generate an empty tensorrt.INetworkDefinition.
- Use the tensorrt.Builder to build a tensorrt.ICudaEngine using the populated tensorrt.INetworkDefinition.
- Create a tensorrt.IExecutionContext from the tensorrt.ICudaEngine and use it to perform optimized inference.
Classes Overview

Logger

Most other TensorRT classes use a logger to report errors, warnings, and informative messages. TensorRT provides a basic tensorrt.Logger implementation, but it can be extended for more advanced functionality.
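A custom logger typically filters messages by severity. The sketch below mirrors, in plain Python, the log(severity, msg) method that a tensorrt.ILogger subclass would override; ListLogger and its string severities are illustrative stand-ins, not TensorRT API:

```python
# Illustrative sketch only: mimics the log(self, severity, msg) interface
# that a tensorrt.ILogger subclass overrides, without requiring TensorRT.
class ListLogger:
    # Ordered from most to least severe, matching TensorRT's severity levels.
    SEVERITIES = ("INTERNAL_ERROR", "ERROR", "WARNING", "INFO", "VERBOSE")

    def __init__(self, min_severity="WARNING"):
        self._threshold = self.SEVERITIES.index(min_severity)
        self.records = []

    def log(self, severity, msg):
        # Keep only messages at or above the configured severity.
        if self.SEVERITIES.index(severity) <= self._threshold:
            self.records.append((severity, msg))

logger = ListLogger("WARNING")
logger.log("INFO", "timing details")          # below threshold, dropped
logger.log("ERROR", "failed to parse model")  # recorded
```

In a real program you would subclass tensorrt.ILogger (or pass a tensorrt.Logger) instead; the point is that only severity and message text arrive at log(), so routing to files or monitoring systems is a straightforward extension.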
Engine and Context

The tensorrt.ICudaEngine is the primary element of TensorRT. It is used to generate a tensorrt.IExecutionContext that can perform inference.
Builder

The tensorrt.Builder is used to build a tensorrt.ICudaEngine. In order to do so, it must be provided a populated tensorrt.INetworkDefinition.
Network

The tensorrt.INetworkDefinition represents a computational graph. To populate the network, TensorRT provides a suite of parsers for a variety of Deep Learning frameworks. It is also possible to populate the network manually using the Network API.
Parsers

Parsers are used to populate a tensorrt.INetworkDefinition from a model trained in a Deep Learning framework.
TensorRT Object Lifetime Management

The legacy bindings required destroy() calls to properly deallocate memory. With the new Python API, the preferred method of object lifetime management is to scope objects with "with ... as ..." clauses. For example, a typical inference pipeline using the ONNX parser might look something like this (some code omitted for clarity):
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 2**30
        # In this example, we use the ONNX parser, but this should be replaced according to your needs.
        # This step might instead use the Caffe/UFF parser, or even the Network API to build a TensorRT network manually.
        with open(onnx_model_file, 'rb') as model:
            parser.parse(model.read())
        # Build and return the engine.
        return builder.build_cuda_engine(network)

with build_engine(onnx_model_file) as engine:
    # Allocate buffers and create a CUDA stream here.
    with engine.create_execution_context() as context:
        # Preprocess input (if required) and copy it to the GPU, run inference,
        # then copy the output back to the host.
        ...
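The "with ... as ..." pattern works because the bindings implement Python's context-manager protocol: native memory is released deterministically when the block exits, even on exceptions. A minimal sketch of that mechanism in plain Python (ManagedResource is a hypothetical stand-in, not a TensorRT class):

```python
class ManagedResource:
    """Hypothetical stand-in showing the context-manager protocol that
    lets `with ... as ...` replace explicit destroy() calls."""

    def __init__(self):
        self.freed = False

    def __enter__(self):
        # The object itself is bound to the `as` name.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on block exit, success or failure; this is where the
        # bindings would release native (GPU/host) memory.
        self.freed = True
        return False  # do not suppress exceptions

with ManagedResource() as r:
    assert not r.freed  # still alive inside the block
assert r.freed          # deterministically freed on exit
```

This is why scoping engines, contexts, and parsers with "with" blocks, as in the pipeline above, is preferred over relying on the garbage collector.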