Core Concepts

TensorRT Workflow

The general TensorRT workflow consists of three steps (a minimal code sketch follows the list):

  1. Populate a tensorrt.INetworkDefinition either with a parser or by using the TensorRT Network API (see tensorrt.INetworkDefinition for more details). The tensorrt.Builder can be used to generate an empty tensorrt.INetworkDefinition.
  2. Use the tensorrt.Builder to build a tensorrt.ICudaEngine using the populated tensorrt.INetworkDefinition.
  3. Create a tensorrt.IExecutionContext from the tensorrt.ICudaEngine and use it to perform optimized inference.
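
A minimal sketch of how these three steps map onto the Python API might look as follows (a fuller pipeline with proper lifetime management appears at the end of this section):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# 1. Create an empty network definition, then populate it
#    with a parser or with the Network API.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
# ... populate `network` here ...

# 2. Build an engine from the populated network.
engine = builder.build_cuda_engine(network)

# 3. Create an execution context to perform optimized inference.
context = engine.create_execution_context()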

Classes Overview

Logger

Most other TensorRT classes use a logger to report errors, warnings and informative messages. TensorRT provides a basic tensorrt.Logger implementation, but it can be extended for more advanced functionality.
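For example, a custom logger can be created by subclassing tensorrt.ILogger and implementing its log method. A minimal sketch (the severity filtering shown here is purely illustrative):

import tensorrt as trt

class MyLogger(trt.ILogger):
    def __init__(self):
        trt.ILogger.__init__(self)

    def log(self, severity, msg):
        # Route or filter messages as desired; here, only warnings
        # and errors are printed.
        if severity in (trt.ILogger.Severity.INTERNAL_ERROR,
                        trt.ILogger.Severity.ERROR,
                        trt.ILogger.Severity.WARNING):
            print("[TensorRT] {}".format(msg))

logger = MyLogger()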

Engine and Context

The tensorrt.ICudaEngine is the primary element of TensorRT. It is used to generate a tensorrt.IExecutionContext that can perform inference.
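For example, a built engine can be serialized to disk and later deserialized with a tensorrt.Runtime before creating a context. A minimal sketch ("model.engine" is a placeholder path, and engine is assumed to be an already-built tensorrt.ICudaEngine):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Serialize a previously built engine (`engine`) to disk.
with open("model.engine", "wb") as f:
    f.write(engine.serialize())

# Later: deserialize it and create an execution context.
with trt.Runtime(TRT_LOGGER) as runtime, open("model.engine", "rb") as f:
    with runtime.deserialize_cuda_engine(f.read()) as engine:
        with engine.create_execution_context() as context:
            pass  # Perform inference here.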

Builder

The tensorrt.Builder is used to build a tensorrt.ICudaEngine. In order to do so, it must be provided with a populated tensorrt.INetworkDefinition.
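The builder also exposes build-time options. A minimal sketch of common settings in this (legacy) builder API, with illustrative values:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder:
    builder.max_batch_size = 1            # Largest batch size the engine will support.
    builder.max_workspace_size = 2 ** 30  # Up to 1 GiB of scratch memory for layer algorithms.
    builder.fp16_mode = True              # Permit FP16 kernels where supported.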

Network

The tensorrt.INetworkDefinition represents a computational graph. In order to populate the network, TensorRT provides a suite of parsers for a variety of Deep Learning frameworks. It is also possible to populate the network manually using the Network API.
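As a small illustration of the Network API, layers can be added to the network one at a time. A minimal sketch (the input shape and the single ReLU layer are arbitrary choices):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
    # Declare an input tensor for the network.
    input_tensor = network.add_input(name="input", dtype=trt.float32, shape=(1, 28, 28))
    # Add a layer; here, a simple ReLU activation.
    relu = network.add_activation(input=input_tensor, type=trt.ActivationType.RELU)
    # Mark the layer's output as a network output.
    network.mark_output(relu.get_output(0))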

Parsers

Parsers are used to populate a tensorrt.INetworkDefinition from a model trained in a Deep Learning framework.
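Since parsing can fail, it is worth checking the parser's return value and error list. A minimal sketch, assuming a network and parser created as in the pipeline below:

with open(onnx_model_file, 'rb') as model:
    if not parser.parse(model.read()):
        # Surface any errors the parser recorded.
        for i in range(parser.num_errors):
            print(parser.get_error(i))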

TensorRT Object Lifetime Management

The legacy bindings required destroy() calls in order to properly deallocate memory. The preferred method of object lifetime management with the new Python API is to use with ... as ... clauses to scope objects. For example, a typical inference pipeline using the ONNX parser might look something like this (some code omitted for clarity):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 2**30
        # In this example, we use the ONNX parser, but this step could instead
        # use the Caffe/UFF parser, or even the Network API to build the
        # TensorRT network manually; replace it according to your needs.
        with open(onnx_model_file, 'rb') as model:
            parser.parse(model.read())
        # Build and return the engine.
        return builder.build_cuda_engine(network)

with build_engine(onnx_model_file) as engine:
    # Allocate buffers and create a CUDA stream here.
    with engine.create_execution_context() as context:
        # Preprocess input (if required) and copy to the GPU, do inference, then copy the output back to the host.
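
One common way to fill in the omitted buffer management and inference steps uses PyCUDA. A minimal sketch, assuming a single float32 input binding at index 0 and a single float32 output binding at index 1:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

# Page-locked host buffers sized from the engine's binding shapes.
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
# Matching device buffers and a stream for asynchronous execution.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

# h_input[:] = ...  # Write preprocessed input data here.
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
# h_output now holds the inference result.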