Core Concepts¶
TensorRT Workflow¶
The general TensorRT workflow consists of three steps:

1. Populate a tensorrt.INetworkDefinition, either with a parser or by using the TensorRT Network API (see tensorrt.INetworkDefinition for more details). The tensorrt.Builder can be used to generate an empty tensorrt.INetworkDefinition.
2. Use the tensorrt.Builder to build a tensorrt.ICudaEngine using the populated tensorrt.INetworkDefinition.
3. Create a tensorrt.IExecutionContext from the tensorrt.ICudaEngine and use it to perform optimized inference.
Classes Overview¶
Logger¶
Most other TensorRT classes use a logger to report errors, warnings and informative messages. TensorRT provides a basic tensorrt.Logger
implementation, but it can be extended for more advanced functionality.
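As a sketch of what extending the logger can look like: recent TensorRT Python bindings allow subclassing tensorrt.ILogger and overriding its log method (the class name MyLogger and the filtering policy below are illustrative, not part of the API):

```python
import tensorrt as trt

class MyLogger(trt.ILogger):
    """Illustrative logger that filters TensorRT messages."""

    def __init__(self):
        # The base-class initializer must run before the logger is used.
        trt.ILogger.__init__(self)

    def log(self, severity, msg):
        # Redirect or filter messages as needed; here we only surface
        # problems and drop informational chatter.
        if severity in (trt.ILogger.INTERNAL_ERROR,
                        trt.ILogger.ERROR,
                        trt.ILogger.WARNING):
            print(f"[TensorRT] {msg}")

# Anywhere a trt.Logger is accepted, the custom logger can be passed instead:
# builder = trt.Builder(MyLogger())
```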
Engine and Context¶
The tensorrt.ICudaEngine
is the primary element of TensorRT. It is used to generate a tensorrt.IExecutionContext
that can perform inference.
Builder¶
The tensorrt.Builder
is used to build a tensorrt.ICudaEngine. In order to do so, it must be provided a populated tensorrt.INetworkDefinition.
Network¶
The tensorrt.INetworkDefinition
represents a computational graph. In order to populate the network, TensorRT provides a suite of parsers for a variety of Deep Learning frameworks. It is also possible to populate the network manually using the Network API.
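A minimal sketch of the manual route, assuming the legacy Python API (the input name, shape, and choice of layer are illustrative):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
    # Declare a single network input (name and shape are arbitrary here).
    inp = network.add_input("input", trt.float32, (1, 28, 28))
    # Add a layer by hand -- a ReLU activation in this sketch.
    relu = network.add_activation(inp, trt.ActivationType.RELU)
    # Mark the layer's output so the built engine exposes it.
    network.mark_output(relu.get_output(0))
```

Each add_* call returns the created layer, whose outputs can be fed into further layers to grow the graph.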
Parsers¶
Parsers are used to populate a tensorrt.INetworkDefinition
from a model trained in a Deep Learning framework.
TensorRT Object Lifetime Management¶
The legacy bindings required destroy()
calls in order to properly deallocate memory. The new API automatically frees memory when objects go out of scope. Even so, it is generally desirable to destroy objects as soon as they are no longer required. The preferred method of object lifetime management with the new Python API is to use with ... as ...
clauses to scope objects. For example, a typical inference pipeline using the ONNX parser might look something like this (inference code omitted for clarity):
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
ONNX_MODEL = "mnist.onnx"

def build_engine():
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        # Configure the builder here.
        builder.max_workspace_size = 2**30
        # In this example, we use the ONNX parser, but this should be replaced
        # according to your needs. This step might instead use the Caffe/UFF
        # parser, or even the Network API to build a TensorRT network manually.
        with open(ONNX_MODEL, 'rb') as model:
            parser.parse(model.read())
        # Build and return the engine. Note that the builder,
        # network and parser are destroyed when this function returns.
        return builder.build_cuda_engine(network)

def do_inference():
    with build_engine() as engine, engine.create_execution_context() as context:
        # Allocate buffers and create a CUDA stream before inference.
        # This should only be done once.
        pass
        # Preprocess input (if required), then copy to the GPU, do inference,
        # and copy the output back to the host.
        pass
```
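The with-based scoping above is ordinary Python context management; a minimal, TensorRT-free sketch (the Resource class is hypothetical) of why it yields deterministic cleanup:

```python
class Resource:
    """Stand-in for an object (engine, context, ...) that owns native memory."""

    def __init__(self, name):
        self.name = name
        self.freed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs as soon as the with-block ends, even if an exception was
        # raised -- no waiting on the garbage collector.
        self.freed = True

with Resource("engine") as r:
    assert not r.freed  # usable inside the block
assert r.freed          # deterministically released on exit
```

This is why the builder, network, and parser in build_engine above are guaranteed to be destroyed when the function returns, rather than whenever the interpreter happens to collect them.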