Sample Support Guide#

The following samples show how to use NVIDIA TensorRT in numerous use cases while highlighting the different capabilities of the interface.

Note

The TensorRT samples are provided for illustrative purposes only and are not meant to be used or taken as production-quality code examples.

TensorRT Samples#
Sample Title	TensorRT Sample Name	Description
trtexec	`trtexec`	A tool to quickly utilize TensorRT without having to develop your own application.
“Hello World” for TensorRT from ONNX	sampleOnnxMNIST	Converts a model trained on the MNIST dataset in ONNX format to a TensorRT network.
Building an RNN Network Layer by Layer	sampleCharRNN	It uses the TensorRT API to build an RNN network layer by layer, sets weights and inputs/outputs, and then performs inference.
Performing Inference in INT8 Precision	sampleINT8API	Sets per tensor dynamic range and computation precision of a layer.
Specifying I/O Formats	sampleIOFormats	Uses an ONNX model trained on the MNIST dataset and performs engine building and inference using TensorRT. The correctness of outputs is then compared to the golden reference.
Digit Recognition with Dynamic Shapes in TensorRT	sampleDynamicReshape	Demonstrates how to use dynamic input dimensions in TensorRT by creating an engine for resizing dynamically shaped inputs to the correct size for an ONNX MNIST model.
Create a Deterministic Build using an Editable Timing Cache	sampleEditableTimingCache	Demonstrates how to build an engine with the desired tactics by modifying the timing cache.
Introduction to Importing ONNX Models into TensorRT using Python	introductory_parser_samples	Uses TensorRT and its included ONNX parser to perform inference with ResNet-50 models trained with various frameworks.
“Hello World” for TensorRT using PyTorch and Python	network_api_pytorch_mnist	An end-to-end sample that trains a model in PyTorch, recreates the network in TensorRT, imports weights from the trained model, and finally runs inference with a TensorRT engine.
Writing a TensorRT Plugin to Use a Custom Layer in your ONNX Model	onnx_custom_plugin	Implements a Hardmax Layer as a TensorRT plugin and uses it to run an ONNX BiDAF question-answering model in TensorRT.
Object Detection with the ONNX TensorRT Backend in Python	yolov3_onnx	Implements a full ONNX-based pipeline for performing inference with the YOLOv3-608 network, including pre and post-processing.
TensorRT Inference of ONNX Models with Custom Layers in Python	onnx_packnet	Uses TensorRT to perform inference with a PackNet network. This sample demonstrates using custom layers in ONNX graphs and processing them using the ONNX GraphSurgeon API.
Refitting an Engine Built from an ONNX Model in Python	engine_refit_onnx_bidaf	Builds an engine from the ONNX BiDAF model and refits the TensorRT engine with weights from the model.
Scalable and Efficient Object Detection with EfficientDet Networks in Python	efficientdet	Sample application to demonstrate conversion and execution of Google EfficientDet models with TensorRT.
Scalable and Efficient Image Classification with EfficientNet Networks in Python	efficientnet	Sample application to demonstrate conversion and execution of a Google EfficientNet model with TensorRT.
Implementing CoordConv in TensorRT with a Custom Plugin using sampleOnnxMnistCoordConvAC in TensorRT	sampleOnnxMnistCoordConvAC	Contains custom CoordConv layers. It converts a model trained on the MNIST dataset in ONNX format to a TensorRT network and runs inference on the network.
Object Detection with TensorFlow Object Detection API Model Zoo Networks in Python	tensorflow_object_detection_api	Demonstrates the conversion and execution of the TensorFlow Object Detection API Model Zoo models with TensorRT.
Object Detection with Detectron 2 Mask R-CNN R50-FPN 3x Network in Python	detectron2	Demonstrates the conversion and execution of the Detectron 2 Model Zoo Mask R-CNN R50-FPN 3x model with TensorRT.
Working with ONNX Models with Named Input Dimensions	sampleNamedDimensions	An example of parsing an ONNX model with named input dimensions and building the engine for it.
Usage of Progress Monitor During Engine Build	sampleProgressMonitor (C++) simple_progress_reporter (Python)	C++ and Python examples for using Progress Monitor during engine build.
Python-Based TensorRT Plugins	python_plugin	Showcases a Python-based plugin definition in TensorRT.
Building and Refitting Weight-Stripping Engines	sample_weight_stripping	Showcases building and refitting weight-stripped engines from ONNX models.
Plugin with Data-Dependent Output Shapes: NonZero	sampleNonZeroPlugin	Demonstrates a plugin with data-dependent output shapes.
Python Plugin with Data-Dependent Output Shapes: NonZero	non_zero_plugin	Demonstrates a Python-based plugin with data-dependent output shapes.
Using a Plugin with Aliased I/O to Realize In-Place Updates	aliased_io_plugin	Demonstrates a plugin with aliased I/O.
Quickly Deployable TensorRT Python Plugins	quickly_deployable_plugins	Decorator-based approach to defining TensorRT Python plugins with simpler semantics requiring less code.
DDS Faster R-CNN Object Detection in TensorRT	dds_faster_rcnn	Demonstrates how to deal with data-dependent output shapes with native TensorRT.
Run ONNX with TensorRT	1_run_onnx_with_tensorrt	Demonstrates how to use TensorRT to run an ONNX model.
Construct an LSTM Network with TensorRT Layer APIs	2_construct_lstm_with_layer_apis	Demonstrates how to build an LSTM network with TensorRT layer APIs.

Getting Started with C++ Samples#

You can find the C++ samples in the /usr/src/tensorrt/samples package directory and on GitHub. The following C++ samples are shipped with TensorRT:

“Hello World” for TensorRT from ONNX
Building an RNN Network Layer by Layer
Performing Inference in INT8 Precision
Specifying I/O Formats
Digit Recognition with Dynamic Shapes in TensorRT
Create a Deterministic Build using an Editable Timing Cache
Implementing CoordConv in TensorRT with a Custom Plugin using sampleOnnxMnistCoordConvAC in TensorRT
Working with ONNX Models with Named Input Dimensions
Usage of Progress Monitor During Engine Build
Plugin with Data-Dependent Output Shapes: NonZero


Every C++ sample includes a README.md file in GitHub that provides detailed information about how the sample works, sample code, and step-by-step instructions on how to run and verify its output.

Running C++ Samples on Linux

If you installed TensorRT using the Debian files, copy /usr/src/tensorrt to a new directory before building the C++ samples. If you installed TensorRT using the tar file, the samples are in {TAR_EXTRACT_PATH}/samples. To build all the samples and then run one of the samples, use the following commands:

$ cd <samples_dir>
$ make -j4
$ cd ../bin
$ ./<sample_bin>

Running C++ Samples on Windows

All C++ samples on Windows are provided as Visual Studio Solution files. To build a sample, open its corresponding Visual Studio Solution file and build the solution. The output executable will be generated in (ZIP_EXTRACT_PATH)\bin. You can then run the executable directly or through Visual Studio.

Getting Started with Python Samples#

You can find the Python samples in the /usr/src/tensorrt/samples/python package directory. The following Python samples are shipped with TensorRT:

Introduction to Importing ONNX Models into TensorRT using Python
“Hello World” for TensorRT using PyTorch and Python
Writing a TensorRT Plugin to Use a Custom Layer in your ONNX Model
Object Detection with the ONNX TensorRT Backend in Python
TensorRT Inference of ONNX Models with Custom Layers in Python
Refitting an Engine Built from an ONNX Model in Python
Scalable and Efficient Object Detection with EfficientDet Networks in Python
Scalable and Efficient Image Classification with EfficientNet Networks in Python
Object Detection with TensorFlow Object Detection API Model Zoo Networks in Python
Object Detection with Detectron 2 Mask R-CNN R50-FPN 3x Network in Python
Usage of Progress Monitor During Engine Build
Python-Based TensorRT Plugins
Building and Refitting Weight-Stripping Engines
Python Plugin with Data-Dependent Output Shapes: NonZero
Using a Plugin with Aliased I/O to Realize In-Place Updates
Quickly Deployable TensorRT Python Plugins
DDS Faster R-CNN Object Detection in TensorRT
Run ONNX with TensorRT
Construct an LSTM Network with TensorRT Layer APIs


Every Python sample includes a README.md file in GitHub that provides detailed information about how the sample works, sample code, and step-by-step instructions on how to run and verify its output.

Running the Python Samples

To run one of the Python samples, the process typically involves two steps:

Install the sample requirements.
```
python<x> -m pip install -r requirements.txt
```
Where python<x> is either python2 or python3.
Run the sample code with the data directory provided if the TensorRT sample data is not in the default location. For example:
```
python<x> sample.py [-d DATA_DIR]
```

For more information on running samples, refer to the README.md file included with the sample.
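The two steps above can also be scripted. The following is a minimal sketch, assuming the script is launched with the same interpreter that should run the sample. The helper name `build_sample_commands` and the example paths are hypothetical; only the `pip install -r requirements.txt` and `sample.py [-d DATA_DIR]` invocations come from the steps above.

```python
import sys
from pathlib import Path


def build_sample_commands(sample_dir, data_dir=None, python=sys.executable):
    """Return the install and run command lines (as argument lists) for a sample.

    sample_dir is the directory holding requirements.txt and sample.py.
    data_dir is only passed when the sample data is not in the default location.
    """
    sample_dir = Path(sample_dir)
    install = [python, "-m", "pip", "install", "-r", str(sample_dir / "requirements.txt")]
    run = [python, str(sample_dir / "sample.py")]
    if data_dir is not None:
        run += ["-d", str(data_dir)]
    return install, run


# Hypothetical paths, shown only to illustrate the resulting command lines.
install_cmd, run_cmd = build_sample_commands("/path/to/sample", data_dir="/path/to/data")
print(" ".join(install_cmd))
print(" ".join(run_cmd))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`.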

Cross Compiling Samples#

The following sections show how to cross-compile TensorRT samples for AArch64 QNX and Linux platforms under x86_64 Linux.

Prerequisites#

This section provides step-by-step instructions to ensure you meet the minimum requirements to cross-compile.

Install the CUDA cross-platform toolkit for the corresponding target and set the environment variable CUDA_INSTALL_DIR.
```
$ export CUDA_INSTALL_DIR="your cuda install dir"
```
Where CUDA_INSTALL_DIR is set to /usr/local/cuda by default.

Note

If you are installing TensorRT using the network repository, then it’s best if you install the cuda-toolkit-X-Y and cuda-cross-<arch>-X-Y packages first to ensure you have all CUDA dependencies required to build the TensorRT samples.
Install the TensorRT cross-compilation Debian packages for the corresponding target.

Note

You can safely skip this step if you are using the target platform’s tar file release. The tar file release already includes the cross-compile libraries, so no additional packages are required.
- QNX AArch64: tensorrt-dev-cross-qnx
- Linux AArch64: tensorrt-dev-cross-aarch64
- Linux SBSA: tensorrt-dev-cross-sbsa

Building Samples for QNX AArch64#

This section provides step-by-step instructions on how to build samples for QNX users.

Download the QNX toolchain and export the following environment variables.

$ export QNX_HOST=/path/to/your/qnx/toolchain/host/linux/x86_64
$ export QNX_TARGET=/path/to/your/qnx/toolchain/target/qnx7

Build the samples.

$ cd /path/to/TensorRT/samples
$ make TARGET=qnx

Building Samples for Linux AArch64#

This section provides step-by-step instructions on how to build samples for JetPack users.

Install the corresponding GCC compiler, aarch64-linux-gnu-g++.
```
$ sudo apt-get install g++-aarch64-linux-gnu
```

Build the samples.

$ cd /path/to/TensorRT/samples
$ make TARGET=aarch64

Building Samples for Linux SBSA#

This section provides step-by-step instructions on how to build samples for Linux SBSA users.

Install the corresponding GCC compiler, aarch64-linux-gnu-g++.
```
$ sudo apt-get install g++-aarch64-linux-gnu
```

Build the samples.

$ cd /path/to/TensorRT/samples
$ make TARGET=aarch64 ARMSERVER=1 DLSW_TRIPLE=aarch64-linux-gnu CUDA_TRIPLE=sbsa-linux CUDA_INSTALL_DIR=<cuda-cross-dir>

Building Samples using Static Libraries#

This section demonstrates how to build the TensorRT samples using the TensorRT static libraries, including other CUDA libraries that are statically linked. The TensorRT samples can be used as a guideline for how to build your application using the TensorRT static libraries if you choose.

Note

You must use the tar package if you wish to build the TensorRT samples statically because some libraries, including some required dependent static libraries and linker scripts, are not included in the Debian or RPM packages. Also, building the TensorRT samples statically is only supported on Linux x86 platforms and not AArch64 or PowerPC.

To build the TensorRT samples using the TensorRT static libraries, you can use the following command.

$ make TRT_STATIC=1

You should append any other Make arguments you would normally include, such as TARGET to indicate the CPU architecture or CUDA_INSTALL_DIR to indicate where CUDA has been installed on your system. The static sample binaries created by the TRT_STATIC make option will have the suffix _static appended to the filename in the output directory to distinguish them from the dynamic sample binaries.

Limitations#

The same major.minor.patch version of the CUDA toolkit that was used to build TensorRT must be used to build your application. Because symbols cannot be hidden or duplicated in a static binary as they can in dynamic libraries, using the same CUDA toolkit version reduces the chance of symbol conflicts, incompatibilities, or undesired behaviors.

If you are including libnvinfer_static.a and libnvinfer_plugin_static.a in your linker command line, consider using the following linker flags to ensure that all CUDA kernels and TensorRT plugins are included in your final application.

-Wl,-whole-archive -lnvinfer_static -Wl,-no-whole-archive
-Wl,-whole-archive -lnvinfer_plugin_static -Wl,-no-whole-archive

If you build the TensorRT samples with a GCC version less than 11.x, you may require the RedHat GCC Toolset 11 non-shared libstdc++ library to avoid missing C++ standard library symbols during linking. You can use the following one-line command to obtain this additional static library, assuming the programs required by this command are already installed on your system.

$ curl -s https://dl.rockylinux.org/pub/rocky/8/AppStream/x86_64/os/Packages/g/gcc-toolset-11-libstdc%2B%2B-devel-11.2.1-9.2.el8_9.x86_64.rpm | rpm2cpio - | bsdtar --strip-components=10 -xf - '*/libstdc++_nonshared.a'

If you are building TensorRT applications with a GCC version less than 11.x, you may require the linker options below to ensure you use the correct C++ standard library symbols in your application. When linking, place your application object files after the TensorRT static libraries and whole-archive all TensorRT static libraries so that the newer C++ standard library symbols from the RedHat GCC Toolset are used. This is required to avoid undefined behavior within TensorRT that may lead to a crash.

-Wl,--start-group -Wl,-whole-archive -lnvinfer_static -lnvinfer_plugin_static -lnvonnxparser_static -Wl,-no-whole-archive <object_files> -Wl,--end-group
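For build scripts that assemble the link line programmatically, the ordering above can be sketched as follows. This is only an illustration: the function name `static_link_args` is hypothetical, and the flags and library names are copied from the option line above rather than from any TensorRT build tooling.

```python
def static_link_args(object_files,
                     libs=("nvinfer_static", "nvinfer_plugin_static", "nvonnxparser_static")):
    """Return linker arguments with the whole-archived TensorRT static
    libraries placed before the application object files, inside one group."""
    args = ["-Wl,--start-group", "-Wl,-whole-archive"]
    args += [f"-l{lib}" for lib in libs]      # TensorRT static libraries first
    args += ["-Wl,-no-whole-archive"]
    args += list(object_files)                # application objects come after
    args += ["-Wl,--end-group"]
    return args


# "main.o" is a placeholder object file name.
print(" ".join(static_link_args(["main.o"])))
```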

You may observe relocation issues during linking if the resulting binary exceeds 2 GB. This can occur if you statically link TensorRT and all its dependencies into your application. To work around this issue, move the GPU code to the end of the binary. This may require the linker script below and the options -mcmodel=large or -Wl,<path/to/fatbin.ld>. The contents of fatbin.ld are listed below.

SECTIONS
{
  nvFatBinSegment : { *(.nvFatBinSegment) }
  .nv_fatbin : { *(.nv_fatbin) }
}

Note

Due to a bug in the gcc-toolset-11 linker, you may need to use gcc-toolset-13 to link your application. This bug most frequently occurs when linking with libcudart_static.a using ld.gold, which breaks exception handling and instead causes an abort to be raised.

Machine Comprehension#

Machine comprehension systems translate text from one language to another, make predictions, or answer questions based on a specific context. Recurrent neural networks (RNNs) are one of the most popular deep-learning solutions for machine comprehension.

Building an RNN Network Layer by Layer#

What does this sample do?

This sample, sampleCharRNN, uses the TensorRT API to build an RNN network layer by layer, sets up weights and inputs/outputs, and then performs inference. Specifically, this sample creates a CharRNN network that has been trained on the Tiny Shakespeare dataset. For more information about character-level modeling, refer to char-rnn.

TensorFlow has a useful RNN tutorial, which can be used to train a word-level model. Word-level models learn a probability distribution over all possible word sequences. Since we aim to train a char-level model, which learns a probability distribution over a set of all possible characters, a few modifications will need to be made to get the TensorFlow sample to work. These modifications can be seen here.
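To make the difference between the two granularities concrete, the sketch below tokenizes the same text at the word level and at the character level. It is a plain-Python illustration and is not taken from the sample or from the TensorFlow tutorial; the example sentence is arbitrary.

```python
text = "to be or not to be"

# Word-level view: the model would predict over a vocabulary of distinct words.
word_vocab = sorted(set(text.split()))

# Character-level view: the model predicts over a vocabulary of distinct characters.
char_vocab = sorted(set(text))

# Encode the text as character ids, the kind of sequence a char-level RNN consumes.
char_to_id = {ch: i for i, ch in enumerate(char_vocab)}
char_ids = [char_to_id[ch] for ch in text]

# Decoding the ids recovers the original text.
decoded = "".join(char_vocab[i] for i in char_ids)

print(len(word_vocab), "distinct words:", word_vocab)
print(len(char_vocab), "distinct characters:", char_vocab)
print(decoded == text)
```

A character-level vocabulary is typically much smaller than a word-level one on real corpora, at the cost of longer input sequences.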

Where is this sample located?

This sample is maintained under the samples/sampleCharRNN directory in the GitHub: sampleCharRNN repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleCharRNN. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleCharRNN.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleCharRNN/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Refitting an Engine Built from an ONNX Model in Python#

What does this sample do?

This sample, engine_refit_onnx_bidaf, builds an engine from the ONNX BiDAF model and refits the TensorRT engine with weights from the model. The new refit APIs allow users to locate the weights via names from ONNX models instead of layer names and weight roles.

In the first pass, the weights “Parameter576_B_0” are refitted with empty values, resulting in an incorrect inference result. We refit the engine with the actual weights in the second pass and run inference again. With the weights now set correctly, inference should provide correct results.

By default, the engine will be refitted using GPU weights. This behavior can be changed using the option --weights-location CPU.
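The two-pass flow can be illustrated with a toy model. The sketch below does not use the TensorRT refit API: `ToyEngine` is a hypothetical stand-in whose weights are looked up by name, and the numeric values are made up. Only the weight name "Parameter576_B_0" comes from the description above.

```python
class ToyEngine:
    """Stand-in for an engine: one linear layer with weights addressed by name."""

    def __init__(self, weights):
        self.weights = {name: list(values) for name, values in weights.items()}

    def refit(self, name, values):
        # Refitting replaces values in place; names and shapes must stay the same.
        if name not in self.weights:
            raise KeyError(f"unknown weight name: {name}")
        if len(values) != len(self.weights[name]):
            raise ValueError("refit must keep the weight shape unchanged")
        self.weights[name] = list(values)

    def infer(self, x):
        w = self.weights["W"]
        b = self.weights["Parameter576_B_0"]
        return sum(wi * xi for wi, xi in zip(w, x)) + b[0]


trained = {"W": [0.5, -1.0], "Parameter576_B_0": [2.0]}  # made-up values
engine = ToyEngine(trained)
x = [4.0, 1.0]
expected = engine.infer(x)

# First pass: refit the bias with empty (zero) values, which breaks the result.
engine.refit("Parameter576_B_0", [0.0])
first_pass = engine.infer(x)

# Second pass: refit with the actual weights, which restores the result.
engine.refit("Parameter576_B_0", trained["Parameter576_B_0"])
second_pass = engine.infer(x)

print(expected, first_pass, second_pass)
```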

Where is this sample located?

This sample is maintained under the samples/python/engine_refit_onnx_bidaf directory in the GitHub: engine_refit_onnx_bidaf repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/engine_refit_onnx_bidaf. If using the tar or zip package, the sample is at <extracted_path>/samples/python/engine_refit_onnx_bidaf.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: engine_refit_onnx_bidaf/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Writing a TensorRT Plugin to Use a Custom Layer in Your ONNX Model#

What does this sample do?

This sample, onnx_custom_plugin, demonstrates how to use plugins written in C++ to run TensorRT on ONNX models with custom or unsupported layers. This sample implements a Hardmax layer and uses it to run a BiDAF question-answering model using the TensorRT ONNX Parser and Python API.

Where is this sample located?

This sample is maintained under the samples/python/onnx_custom_plugin directory in the GitHub: onnx_custom_plugin repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/onnx_custom_plugin. If using the tar or zip package, the sample is at <extracted_path>/samples/python/onnx_custom_plugin.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: onnx_custom_plugin/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Character Recognition#

Character recognition, especially on the MNIST dataset, is a classic machine learning problem. The MNIST problem involves recognizing the digit in an image of a handwritten digit.

“Hello World” for TensorRT from ONNX#

What does this sample do?

This sample, sampleOnnxMNIST, converts a model trained on the MNIST dataset in ONNX format to a TensorRT network and runs inference on the network.

ONNX is a standard for representing deep learning models that enables models to be transferred between frameworks.

Where is this sample located?

This sample is maintained under the samples/sampleOnnxMNIST directory in the GitHub: sampleOnnxMNIST repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleOnnxMNIST. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleOnnxMNIST.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleOnnxMNIST/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Digit Recognition with Dynamic Shapes in TensorRT#

What does this sample do?

This sample, sampleDynamicReshape, demonstrates how to use dynamic input dimensions in TensorRT by creating an engine for resizing dynamically shaped inputs to the correct size for an ONNX MNIST model. For more information, refer to the Working With Dynamic Shapes section.

This sample creates an engine for resizing an input with dynamic dimensions to a size that an ONNX MNIST model can consume.

Specifically, this sample demonstrates how to:

Create a network with dynamic input dimensions to act as a preprocessor for the model
Parse an ONNX MNIST model to create a second network
Build engines for both networks and start calibration if running in INT8
Run inference using both engines
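The two-stage idea, a preprocessor that accepts any input size followed by a model with a fixed input size, can be sketched in plain Python. This is a conceptual illustration only: `resize_nearest` and `toy_model` are hypothetical stand-ins for the two engines, and nearest-neighbor resizing is an assumption rather than the sample's actual resize mode.

```python
MODEL_H, MODEL_W = 28, 28  # MNIST models consume 28x28 images


def resize_nearest(image, out_h=MODEL_H, out_w=MODEL_W):
    """Stage 1: accept an image of any height and width, emit a fixed size."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def toy_model(image):
    """Stage 2: only accepts the fixed 28x28 shape (returns the mean pixel)."""
    if len(image) != MODEL_H or any(len(row) != MODEL_W for row in image):
        raise ValueError("model expects a 28x28 input")
    return sum(sum(row) for row in image) / (MODEL_H * MODEL_W)


# Inputs with different shapes all flow through the same two stages.
for in_h, in_w in [(14, 14), (28, 28), (56, 40)]:
    image = [[(r + c) % 256 for c in range(in_w)] for r in range(in_h)]
    resized = resize_nearest(image)
    print((in_h, in_w), "->", (len(resized), len(resized[0])), toy_model(resized))
```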

Where is this sample located?

This sample is maintained under the samples/sampleDynamicReshape directory in the GitHub: sampleDynamicReshape repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleDynamicReshape. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleDynamicReshape.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleDynamicReshape/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Specifying I/O Formats#

What does this sample do?

This sample, sampleIOFormats, uses an ONNX model trained on the MNIST dataset and performs engine building and inference using TensorRT. The correctness of outputs is then compared to the golden reference. Specifically, it shows how to explicitly specify I/O formats for TensorFormat::kLINEAR, TensorFormat::kCHW2, and TensorFormat::kHWC8 for Float16 and INT8 precision.

ITensor::setAllowedFormats is invoked to specify which format is used.
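To see what the three formats mean for memory layout, the sketch below computes where a single element lands in each one. It assumes a single image (batch size 1) and follows the layouts as described in the TensorFormat documentation to the best of our understanding: kLINEAR as row-major CHW, kCHW2 as `[ceil(C/2)][H][W][2]`, and kHWC8 as `[H][W][C padded to a multiple of 8]`. The function names are hypothetical.

```python
def linear_offset(c, h, w, C, H, W):
    """Element offset in a row-major CHW buffer (kLINEAR)."""
    return (c * H + h) * W + w


def chw2_offset(c, h, w, C, H, W):
    """Element offset when channels are stored in pairs: [ceil(C/2)][H][W][2]."""
    return (((c // 2) * H + h) * W + w) * 2 + c % 2


def hwc8_offset(c, h, w, C, H, W):
    """Element offset in channels-last layout with C padded to a multiple of 8."""
    c_padded = (C + 7) // 8 * 8
    return (h * W + w) * c_padded + c


# A 3-channel 2x2 tensor; look up the element at channel 2, row 1, column 0.
C, H, W = 3, 2, 2
coords = (2, 1, 0)
print("kLINEAR:", linear_offset(*coords, C, H, W))
print("kCHW2:  ", chw2_offset(*coords, C, H, W))
print("kHWC8:  ", hwc8_offset(*coords, C, H, W))
```

The padded formats also need larger buffers: for this shape, 12 elements for kLINEAR, 16 for kCHW2, and 32 for kHWC8.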

Where is this sample located?

This sample is maintained under the directory samples/sampleIOFormats in the GitHub: sampleIOFormats repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleIOFormats. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleIOFormats.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. Refer to the GitHub: sampleIOFormats/README.md file for specifics about this sample. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

“Hello World” for TensorRT using PyTorch and Python#

What does this sample do?

This sample, network_api_pytorch_mnist, trains a convolutional model on the MNIST dataset and runs inference with a TensorRT engine.

Where is this sample located?

This sample is maintained under the samples/python/network_api_pytorch_mnist directory in the GitHub: network_api_pytorch_mnist repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/network_api_pytorch_mnist. If using the tar or zip package, the sample is at <extracted_path>/samples/python/network_api_pytorch_mnist.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: network_api_pytorch_mnist/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Implementing CoordConv in TensorRT with a Custom Plugin using sampleOnnxMnistCoordConvAC in TensorRT#

What does this sample do?

This sample, sampleOnnxMnistCoordConvAC, converts a model trained on the MNIST dataset in Open Neural Network Exchange (ONNX) format to a TensorRT network and runs inference on the network. This model was trained in PyTorch and contains custom CoordConv layers instead of Conv layers.

The training script for the model with the CoordConvAC layers and the PyTorch code for the CoordConv layers are here. The original model with the Conv layers is here.

This sample creates and runs a TensorRT engine on an ONNX model of MNIST trained with CoordConv layers. It demonstrates how TensorRT can parse and import ONNX models and use plugins to run custom layers in neural networks.

Where is this sample located?

This sample is maintained under the samples/sampleOnnxMnistCoordConvAC directory in the GitHub: sampleOnnxMnistCoordConvAC repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleOnnxMnistCoordConvAC. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleOnnxMnistCoordConvAC.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleOnnxMnistCoordConvAC/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Create a Deterministic Build using an Editable Timing Cache#

What does this sample do?

This sample, sampleEditableTimingCache, illustrates how to modify the timing cache to build an engine with the desired tactics.

In TensorRT, some layers may have multiple implementations, which are called tactics. When building an engine, all of the tactics will be profiled, and the fastest one will be chosen and written into the TimingCache. In some circumstances, the expected tactic is not the fastest, and the user must replace the best tactic with another tactic. This requirement can be satisfied by editing the timing cache. This sample demonstrates how to achieve this using the Timing Cache editing API and the profiling log.
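The selection and override logic can be illustrated with a toy model. The sketch below does not use the TensorRT timing cache API: the `build` function, the layer and tactic names, and the timings are all hypothetical. It only shows why editing a cached choice changes which tactic the next build picks.

```python
def build(layer_tactics, timing_cache):
    """Pick one tactic per layer.

    A cached choice is reused as is; otherwise the fastest profiled tactic
    is chosen and written into the cache.
    """
    chosen = {}
    for layer, timings in layer_tactics.items():
        if layer not in timing_cache:
            timing_cache[layer] = min(timings, key=timings.get)
        chosen[layer] = timing_cache[layer]
    return chosen


# Made-up profiling results in milliseconds for two layers.
layer_tactics = {
    "conv1": {"tactic_a": 0.12, "tactic_b": 0.10},
    "matmul1": {"tactic_c": 0.30, "tactic_d": 0.45},
}

cache = {}
first = build(layer_tactics, cache)   # fastest tactics are chosen and cached

# Edit the cache so that conv1 uses the desired tactic instead of the fastest.
cache["conv1"] = "tactic_a"
second = build(layer_tactics, cache)  # the edited entry now decides the build

print(first)
print(second)
```

Because the second build reads its choices from the cache instead of from fresh measurements, repeating it yields the same tactics every time.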

Where is this sample located?

This sample is maintained under the samples/sampleEditableTimingCache directory in the GitHub: sampleEditableTimingCache repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleEditableTimingCache. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleEditableTimingCache.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleEditableTimingCache/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Image Classification#

Image classification is the problem of identifying one or more objects present in an image. Convolutional neural networks (CNNs) are popular for solving this problem. They are typically composed of convolution and pooling layers.

Performing Inference in INT8 Precision#

What does this sample do?

This sample, sampleINT8API, performs INT8 inference without using the INT8 calibrator, relying instead on a user-provided dynamic range for each activation tensor. INT8 inference is available only on GPUs with compute capability 6.1 or 7.x. The sample supports image classification ONNX models such as ResNet-50, VGG19, and MobileNet.

Specifically, this sample demonstrates how to:

Use nvinfer1::ITensor::setDynamicRange to set the per-tensor dynamic range
Use nvinfer1::ILayer::setPrecision to set the computation precision of a layer
Use nvinfer1::ILayer::setOutputType to set the output tensor data type of a layer
Perform INT8 inference without using INT8 calibration
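To show what a per-tensor dynamic range implies numerically, the sketch below derives an INT8 scale from a range and quantizes a few values. It is a plain-Python illustration, not the sample's code. It assumes symmetric quantization, where the scale comes from the larger absolute bound of the range divided by 127, and the function names are hypothetical.

```python
def int8_scale(dyn_min, dyn_max):
    """Scale for symmetric INT8 quantization of the range [dyn_min, dyn_max]."""
    return max(abs(dyn_min), abs(dyn_max)) / 127.0


def quantize(x, scale):
    """Map a float to INT8: round to nearest (ties to even), then clamp."""
    q = round(x / scale)  # Python's round() rounds ties to even
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Map an INT8 value back to an approximate float."""
    return q * scale


scale = int8_scale(-63.5, 63.5)  # a made-up range; gives a scale of 0.5
for value in (10.0, 0.75, 100.0, -100.0):
    q = quantize(value, scale)
    print(value, "->", q, "->", dequantize(q, scale))
```

Values outside the dynamic range saturate at the INT8 limits, which is why a range that is too narrow clips activations and a range that is too wide wastes resolution.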

Where is this sample located?

This sample is maintained under the samples/sampleINT8API directory in the GitHub: sampleINT8API repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleINT8API. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleINT8API.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleINT8API/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Introduction to Importing ONNX Models into TensorRT using Python#

What does this sample do?

This sample, introductory_parser_samples, is a Python sample that uses TensorRT and its included ONNX parser to perform inference with ResNet-50 models trained with various frameworks.

Where is this sample located?

This sample is maintained under the samples/python/introductory_parser_samples directory in the GitHub: introductory_parser_samples repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/introductory_parser_samples. If using the tar or zip package, the sample is at <extracted_path>/samples/python/introductory_parser_samples.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: introductory_parser_samples/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

TensorRT Inference of ONNX Models with Custom Layers in Python#

What does this sample do?

This sample, onnx_packnet, uses TensorRT to perform inference with the PackNet network. PackNet is a self-supervised monocular depth estimation network used in autonomous driving.

This sample converts the PyTorch graph into ONNX and uses an ONNX parser included in TensorRT to parse the ONNX graph. The sample also demonstrates how to:

Use custom layers (plugins) in an ONNX graph. The REGISTER_TENSORRT_PLUGIN API automatically registers plugins in TensorRT.
Use the ONNX GraphSurgeon (ONNX-GS) API to modify layers or subgraphs in the ONNX graph. For this network, we transform Group Normalization, upsample, and pad layers to remove unnecessary nodes for inference with TensorRT.

Where is this sample located?

This sample is maintained under the samples/python/onnx_packnet directory in the GitHub: onnx_packnet repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/onnx_packnet. If using the tar or zip package, the sample is at <extracted_path>/samples/python/onnx_packnet.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: onnx_packnet/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Scalable and Efficient Image Classification with EfficientNet Networks in Python#

What does this sample do?

This efficientnet sample shows how to convert and execute a Google EfficientNet model with TensorRT. The sample supports models from the original EfficientNet implementation and newer EfficientNet V2 models. The sample code converts a TensorFlow saved model to ONNX and then builds a TensorRT engine. Inference and accuracy validation can also be performed with the helper scripts provided in the sample.

Where is this sample located?

This sample is maintained under the samples/python/efficientnet directory in the GitHub: efficientnet repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/efficientnet. If using the tar or zip package, the sample is at <extracted_path>/samples/python/efficientnet.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: efficientnet/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Object Detection#

Object detection is one of the classic computer vision problems. The task for a given image is to detect, classify, and localize all objects of interest. For example, imagine that you are developing a self-driving car and need to do pedestrian detection. The object detection algorithm would then return bounding box coordinates for each pedestrian in the image.
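To make the task concrete, the following is a minimal sketch of what a detector's output might look like and how the pedestrian boxes could be selected from it. The Detection type, its field names, the box layout, and the 0.5 score threshold are illustrative assumptions and are not taken from any TensorRT sample.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detected object."""
    label: str    # predicted class name (classification)
    score: float  # confidence in [0, 1]
    box: Box      # where the object is in the image (localization)


def select_boxes(detections: List[Detection], label: str,
                 min_score: float = 0.5) -> List[Box]:
    """Return the boxes of all detections of `label` scoring at least `min_score`."""
    return [d.box for d in detections if d.label == label and d.score >= min_score]


# Made-up detections for a single image.
detections = [
    Detection("pedestrian", 0.92, (34.0, 50.0, 80.0, 210.0)),
    Detection("car", 0.88, (120.0, 90.0, 400.0, 260.0)),
    Detection("pedestrian", 0.31, (410.0, 60.0, 440.0, 150.0)),
]
pedestrian_boxes = select_boxes(detections, "pedestrian")
```

Only the first pedestrian passes the assumed threshold; real pipelines typically add further post-processing on top of a filter like this.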

There have been many advances in designing models for object detection in recent years.

Object Detection with the ONNX TensorRT Backend in Python#

What does this sample do?

This sample, yolov3_onnx, implements a full ONNX-based pipeline for performing inference with the YOLOv3 network at an input size of 608x608 pixels, including pre- and post-processing. This sample is based on the YOLOv3-608 paper.

Note

This sample is not supported on Ubuntu 14.04 and older. Additionally, the yolov3_to_onnx.py script does not support Python 3.

Where is this sample located?

This sample is maintained under the samples/python/yolov3_onnx directory in the GitHub: yolov3_onnx repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/yolov3_onnx. If using the tar or zip package, the sample is at <extracted_path>/samples/python/yolov3_onnx.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: yolov3_onnx/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Scalable and Efficient Object Detection with EfficientDet Networks in Python#

What does this sample do?

This sample, efficientdet, demonstrates the conversion and execution of Google EfficientDet models with NVIDIA TensorRT. The code converts a TensorFlow checkpoint or saved model to ONNX, adapts the ONNX graph for TensorRT compatibility, and then builds a TensorRT engine. The corresponding scripts provided in the sample can then be used for inference and accuracy validation.

Where is this sample located?

This sample is maintained under the samples/python/efficientdet directory in the GitHub: efficientdet repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/efficientdet. If using the tar or zip package, the sample is at <extracted_path>/samples/python/efficientdet.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: efficientdet/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Object Detection with TensorFlow Object Detection API Model Zoo Networks in Python#

What does this sample do?

This sample, tensorflow_object_detection_api, demonstrates the conversion and execution of the TensorFlow Object Detection API Model Zoo models with NVIDIA TensorRT. The code converts a TensorFlow checkpoint or saved model to ONNX, adapts the ONNX graph for TensorRT compatibility, and then builds a TensorRT engine. Inference and accuracy validation can then be performed using the corresponding scripts provided in the sample.

Where is this sample located?

This sample is maintained under the samples/python/tensorflow_object_detection_api directory in the GitHub: tensorflow_object_detection_api repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/tensorflow_object_detection_api. If using the tar or zip package, the sample is at <extracted_path>/samples/python/tensorflow_object_detection_api.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: tensorflow_object_detection_api/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Object Detection with Detectron 2 Mask R-CNN R50-FPN 3x Network in Python#

What does this sample do?

This sample, detectron2, demonstrates the conversion and execution of the Detectron 2 Model Zoo Mask R-CNN R50-FPN 3x model with NVIDIA TensorRT. The sample provides steps to export the Detectron 2 model to ONNX; its code then adapts the ONNX graph for TensorRT compatibility and builds a TensorRT engine. Inference and accuracy validation can then be performed using the corresponding scripts provided in the sample.

Where is this sample located?

This sample is maintained under the samples/python/detectron2 directory in the GitHub: detectron2 repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/detectron2. If using the tar or zip package, the sample is at <extracted_path>/samples/python/detectron2.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: detectron2/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Note

This sample cannot be run on Jetson platforms as torch.distributed is unavailable. To check whether your platform supports torch.distributed, open a Python shell, and confirm that torch.distributed.is_available() returns True.
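The check described in this note can also be scripted. The following is a minimal sketch that works whether or not PyTorch is installed; the helper name torch_distributed_available is hypothetical, while torch.distributed.is_available() is the call named above.

```python
import importlib


def torch_distributed_available() -> bool:
    """Return True only if torch.distributed can be imported and reports itself available."""
    try:
        dist = importlib.import_module("torch.distributed")
    except ImportError:
        # PyTorch (or its distributed package) is not installed on this platform.
        return False
    return bool(dist.is_available())


if torch_distributed_available():
    print("torch.distributed is available")
else:
    print("torch.distributed is unavailable on this platform")
```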

Other Features#

Working with ONNX Models with Named Input Dimensions#

What does this sample do?

This sample, sampleNamedDimensions, illustrates the feature of named input dimensions. Specifically, a simple one-layer ONNX model with named dimension parameters in the model input is generated and then passed to TensorRT for parsing and engine building.
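The idea behind named dimensions can be sketched without TensorRT. The example below assumes that dimensions sharing a name are expected to have equal sizes at runtime and checks concrete input shapes against that rule. The function bind_named_dims, the tensor names, and the dimension name n_rows are hypothetical and are not taken from the sample.

```python
from typing import Dict, Sequence, Union

Dim = Union[int, str]  # an int is a fixed size; a str is a dimension name


def bind_named_dims(specs: Dict[str, Sequence[Dim]],
                    shapes: Dict[str, Sequence[int]]) -> Dict[str, int]:
    """Resolve each dimension name to one size, rejecting inconsistent shapes."""
    bindings: Dict[str, int] = {}
    for tensor, spec in specs.items():
        shape = shapes[tensor]
        if len(shape) != len(spec):
            raise ValueError(f"{tensor}: expected rank {len(spec)}, got {len(shape)}")
        for dim, size in zip(spec, shape):
            if isinstance(dim, int):
                if dim != size:
                    raise ValueError(f"{tensor}: fixed dimension {dim} got size {size}")
            elif bindings.setdefault(dim, size) != size:
                # The same name was already bound to a different size.
                raise ValueError(f"{tensor}: '{dim}' is {bindings[dim]} elsewhere, got {size}")
    return bindings


# Two inputs that share the named dimension "n_rows".
specs = {"input0": ["n_rows", 4], "input1": ["n_rows", 4]}
bindings = bind_named_dims(specs, {"input0": (3, 4), "input1": (3, 4)})
```

Passing shapes such as (3, 4) and (5, 4) would raise ValueError in this sketch, because n_rows cannot be both 3 and 5.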

Where is this sample located?

This sample is maintained under the samples/sampleNamedDimensions directory in the GitHub: sampleNamedDimensions repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleNamedDimensions. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleNamedDimensions.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleNamedDimensions/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Usage of Progress Monitor During Engine Build#

sampleProgressMonitor and simple_progress_reporter use the Progress Monitor during engine build.

What do these samples do?

sampleProgressMonitor is a C++ sample that shows an example of how to use the progress monitor API. This sample demonstrates the usage of IProgressMonitor to report the status of TensorRT engine-building operations.

simple_progress_reporter is a Python sample that uses TensorRT and its included ONNX parser to perform inference with ResNet-50 models saved in ONNX format. It displays animated progress bars while TensorRT builds the engine.
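The callback flow of a progress monitor can be sketched in plain Python. The class below is a stand-in, not a TensorRT class: its method names mirror the phase_start, step_complete, and phase_finish callbacks as we understand them from the TensorRT Python API, and simulate_build is a hypothetical driver that plays the role of the builder. In real use, the class would presumably derive from tensorrt.IProgressMonitor and be attached to the builder configuration; refer to the sample READMEs for the authoritative usage.

```python
class ConsoleProgressMonitor:
    """Pure-Python stand-in that mirrors the progress monitor callbacks."""

    def __init__(self):
        self.progress = {}            # phase name -> (steps done, total steps)
        self.finished = []            # phases that ran to completion, in order
        self.cancel_requested = False

    def phase_start(self, phase_name, parent_phase, num_steps):
        self.progress[phase_name] = (0, num_steps)

    def step_complete(self, phase_name, step):
        _, total = self.progress[phase_name]
        self.progress[phase_name] = (step + 1, total)
        print(f"{phase_name}: {step + 1}/{total}")
        # Returning False asks the caller to stop the build early.
        return not self.cancel_requested

    def phase_finish(self, phase_name):
        self.finished.append(phase_name)


def simulate_build(monitor, phases):
    """Hypothetical driver standing in for the builder; returns False if cancelled."""
    for name, num_steps in phases:
        monitor.phase_start(name, None, num_steps)
        for step in range(num_steps):
            if not monitor.step_complete(name, step):
                return False
        monitor.phase_finish(name)
    return True


monitor = ConsoleProgressMonitor()
completed = simulate_build(monitor, [("Parse", 2), ("Optimize", 3)])
```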

Where are these samples located?

sampleProgressMonitor is maintained under the samples/sampleProgressMonitor directory in the GitHub: sampleProgressMonitor repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleProgressMonitor. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleProgressMonitor.

The simple_progress_reporter sample is maintained under the samples/python/simple_progress_reporter directory in the GitHub: simple_progress_reporter repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/simple_progress_reporter. If using the tar or zip package, the sample is at <extracted_path>/samples/python/simple_progress_reporter.

How do I get started?

For more information, refer to the sections Getting Started with C++ Samples and Getting Started with Python Samples. For specifics about these samples, refer to the files GitHub: sampleProgressMonitor/README.md and GitHub: simple_progress_reporter/README.md. They provide detailed information about how the samples work, sample code, and step-by-step instructions on running the samples and verifying their output.

Python-Based TensorRT Plugins#

What does this sample do?

python_plugin showcases the definitions of Python-based plugins in TensorRT. No changes to existing TensorRT APIs have been made to deliver this feature, so using the updated bindings should not break any existing code.

circ_pad_plugin_multi_tactic.py demonstrates the custom tactic functionality and timing caching functionality provided by IPluginV3.

Where is this sample located?

This sample is maintained under the samples/python/python_plugin directory in the GitHub: python_plugin repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/python_plugin. If using the tar or zip package, the sample is at <extracted_path>/samples/python/python_plugin.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: python_plugin/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Building and Refitting Weight-Stripping Engines#

What does this sample do?

This sample, sample_weight_stripping, is a Python sample that showcases building and refitting weight-stripping engines from ONNX models in TensorRT.

Where is this sample located?

This sample is maintained under the samples/python/sample_weight_stripping directory in the GitHub: sample_weight_stripping repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/sample_weight_stripping. If using the tar or zip package, the sample is at <extracted_path>/samples/python/sample_weight_stripping.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: sample_weight_stripping/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Plugin with Data-Dependent Output Shapes: NonZero#

What does this sample do?

This sample, sampleNonZeroPlugin, is a C++ sample that showcases, using the NonZero operator as an example, how to implement a TensorRT plugin with data-dependent output shapes using the IPluginV3 interface.
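To see why the output shape of NonZero is data-dependent, consider a plain-Python reference of the operator for 2D inputs. This is only a sketch of the operator's semantics, written in Python for brevity; it is not the plugin code, and the function name nonzero_2d is hypothetical.

```python
from typing import List, Sequence


def nonzero_2d(matrix: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return [row_indices, col_indices] of the non-zero entries, in row-major order."""
    rows: List[int] = []
    cols: List[int] = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
    return [rows, cols]


# Both inputs have shape 2x3, but the outputs have shapes 2x3 and 2x1:
# the second output dimension is only known after inspecting the data.
indices_a = nonzero_2d([[0, 1, 0], [2, 0, 3]])
indices_b = nonzero_2d([[0, 0, 0], [0, 5, 0]])
```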

Where is this sample located?

This sample is maintained under the samples/sampleNonZeroPlugin directory in the GitHub: sampleNonZeroPlugin repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/sampleNonZeroPlugin. If using the tar or zip package, the sample is at <extracted_path>/samples/sampleNonZeroPlugin.

How do I get started?

For more information, refer to the Getting Started with C++ Samples section. For specifics about this sample, refer to the GitHub: sampleNonZeroPlugin/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Python Plugin with Data-Dependent Output Shapes: NonZero#

What does this sample do?

This sample, non_zero_plugin, is a Python sample that showcases, by taking the NonZero operator as an example, how to implement a TensorRT plugin with data-dependent output shapes using the IPluginV3 interface. It is a Python-based version of the C++ sample sampleNonZeroPlugin.

Where is this sample located?

This sample is maintained under the samples/python/non_zero_plugin directory in the GitHub: non_zero_plugin repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/non_zero_plugin. If using the tar or zip package, the sample is at <extracted_path>/samples/python/non_zero_plugin.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: non_zero_plugin/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Using a Plugin with Aliased I/O to Realize In-Place Updates#

What does this sample do?

This sample, aliased_io_plugin, is a Python sample that showcases, by taking a plugin for an in-place scatter-add operation as an example, how to use aliased I/O with TensorRT plugins.
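The semantics of an in-place scatter-add can be sketched in plain Python. The function below illustrates the operation on a list, where the output is the same object as the input, which is the effect that aliased I/O is meant to achieve. It is not the plugin implementation, and the name scatter_add_inplace is hypothetical.

```python
from typing import List, Sequence


def scatter_add_inplace(data: List[float], indices: Sequence[int],
                        updates: Sequence[float]) -> List[float]:
    """Add updates[k] to data[indices[k]] for every k, modifying `data` itself."""
    if len(indices) != len(updates):
        raise ValueError("indices and updates must have the same length")
    for index, update in zip(indices, updates):
        data[index] += update  # repeated indices accumulate
    return data  # the same object that was passed in; no copy is made


buffer = [0.0, 0.0, 0.0]
result = scatter_add_inplace(buffer, [0, 2, 0], [1.0, 2.0, 3.0])
```

After the call, result and buffer refer to the same list, so no second buffer is needed to hold the output.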

Where is this sample located?

This sample is maintained under the samples/python/aliased_io_plugin directory in the GitHub: aliased_io_plugin repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/aliased_io_plugin. If using the tar or zip package, the sample is at <extracted_path>/samples/python/aliased_io_plugin.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: aliased_io_plugin/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Quickly Deployable TensorRT Python Plugins#

What does this sample do?

This Python sample, quickly_deployable_plugins, showcases quickly deployable Python-based plugin definitions (QDPs) in TensorRT. QDPs are a simple and intuitive decorator-based approach to defining TensorRT plugins, requiring drastically less code.

Two types of QDPs are demonstrated in this sample: just-in-time (JIT) QDPs and ahead-of-time (AOT) QDPs. JIT QDPs are often simpler to write; however, they require the plugin source (and hence Python) to be available at runtime. AOT QDPs allow the plugin to be fully embedded into the TensorRT engine such that there is no plugin source, library, or Python dependency at runtime.
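The general decorator-based registration pattern can be illustrated without TensorRT. The sketch below is a generic Python registry and is not the TensorRT plugin API: register_plugin, PLUGIN_REGISTRY, and the ID example::scale are hypothetical names chosen for illustration. Refer to the sample README for the actual decorators.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a plugin ID to the function that implements it.
PLUGIN_REGISTRY: Dict[str, Callable[..., List[float]]] = {}


def register_plugin(plugin_id: str):
    """Decorator factory: store the decorated function under `plugin_id`."""
    def decorator(fn):
        if plugin_id in PLUGIN_REGISTRY:
            raise ValueError(f"plugin '{plugin_id}' is already registered")
        PLUGIN_REGISTRY[plugin_id] = fn
        return fn  # the function itself is left unchanged
    return decorator


@register_plugin("example::scale")
def scale(values: List[float], factor: float = 2.0) -> List[float]:
    """Toy plugin body: multiply every element by `factor`."""
    return [v * factor for v in values]


# A framework can now look the plugin up by ID instead of importing it by name.
result = PLUGIN_REGISTRY["example::scale"]([1.0, 2.0], factor=3.0)
```

The point of the pattern is that a single decorator line replaces the class and registration boilerplate that would otherwise be written by hand.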

Where is this sample located?

This sample is maintained under the samples/python/quickly_deployable_plugins directory in the GitHub: quickly_deployable_plugins repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/quickly_deployable_plugins. If using the tar or zip package, the sample is at <extracted_path>/samples/python/quickly_deployable_plugins.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: quickly_deployable_plugins/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

DDS Faster R-CNN Object Detection in TensorRT#

What does this sample do?

The dds_faster_rcnn sample demonstrates the usage of tensorrt.IOutputAllocator in TensorRT to execute networks with data-dependent shape (DDS) outputs. This sample showcases an end-to-end workflow for building and running the Faster R-CNN object detection model.
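The role of an output allocator can be sketched in plain Python. The class below is a stand-in, not a TensorRT class: its method names mirror the reallocate_output and notify_shape callbacks as we understand them from tensorrt.IOutputAllocator, but it manages host bytearray objects rather than device memory, and the tensor name and sizes are made up.

```python
class GrowableOutputAllocator:
    """Pure-Python stand-in that sizes output buffers on demand."""

    def __init__(self):
        self.buffers = {}  # tensor name -> bytearray (real code would hold device memory)
        self.shapes = {}   # tensor name -> final output shape

    def reallocate_output(self, tensor_name, memory, size, alignment):
        """Called when the required byte size is known; reuse the buffer if it is big enough."""
        buffer = self.buffers.get(tensor_name)
        if buffer is None or len(buffer) < size:
            buffer = bytearray(size)
            self.buffers[tensor_name] = buffer
        return buffer

    def notify_shape(self, tensor_name, shape):
        """Called once the data-dependent output shape has been determined."""
        self.shapes[tensor_name] = tuple(shape)


allocator = GrowableOutputAllocator()
# Simulated run: 5 detected boxes, 4 float32 coordinates each (values are made up).
first = allocator.reallocate_output("boxes", None, 5 * 4 * 4, 256)
allocator.notify_shape("boxes", (5, 4))
# A later run with fewer detections can reuse the existing buffer.
second = allocator.reallocate_output("boxes", first, 2 * 4 * 4, 256)
allocator.notify_shape("boxes", (2, 4))
```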

Where is this sample located?

This sample is maintained under the samples/python/dds_faster_rcnn directory in the GitHub: dds_faster_rcnn repository. If using the Debian or RPM package, the sample is located at /usr/src/tensorrt/samples/python/dds_faster_rcnn. If using the tar or zip package, the sample is at <extracted_path>/samples/python/dds_faster_rcnn.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: dds_faster_rcnn/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Refactored Samples#

Explore our refactored TensorRT samples in this section, designed to be standalone and easier to understand. Each sample includes improved readability and comprehensive comments. Python examples are provided as Jupyter Notebooks for an interactive experience.

Run ONNX with TensorRT#

What does this sample do?

This sample demonstrates converting a pre-trained EfficientNet-B0 ONNX model to a TensorRT engine and performing inference with performance comparison. Specifically, it shows how to use TensorRT’s ONNX parser to build an optimized engine, handle input/output tensors, and compare inference performance between ONNX Runtime and TensorRT with proper memory management and resource cleanup.
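A latency comparison of this kind usually follows a warm-up-then-measure pattern. The following is a minimal, framework-agnostic sketch of such a harness. The function names benchmark and speedup, the warm-up and iteration counts, and the placeholder workloads are assumptions and are not taken from the notebook.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 3,
              iterations: int = 20) -> Dict[str, float]:
    """Time `run_once` after a few untimed warm-up calls; latencies are in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs absorb one-time costs such as lazy initialization
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()  # for GPU work, run_once must block until the results are ready
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(latencies),
        "mean_ms": statistics.fmean(latencies),
        "iterations": float(iterations),
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Ratio of median latencies; values above 1 mean `candidate` is faster."""
    return baseline["median_ms"] / candidate["median_ms"]


# Placeholder workloads standing in for the two inference calls being compared.
stats_a = benchmark(lambda: sum(range(20000)), warmup=1, iterations=5)
stats_b = benchmark(lambda: sum(range(1000)), warmup=1, iterations=5)
```

The median is reported because it is less sensitive to occasional slow iterations than the mean.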

Where is this sample located?

This sample is maintained under the samples/python/refactored/1_run_onnx_with_tensorrt directory in the TensorRT repository. The sample includes a Jupyter notebook (main.ipynb) that provides an interactive walkthrough of the ONNX to TensorRT conversion process.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: 1_run_onnx_with_tensorrt/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.

Construct an LSTM Network with TensorRT Layer APIs#

What does this sample do?

This sample demonstrates how to build a TensorRT network definition from scratch using the TensorRT Layer APIs, focusing on constructing a recurrent neural network (LSTM). Specifically, it shows how to define individual network layers and their connections programmatically, implement recurrent logic using TensorRT’s loop constructs, monitor engine build progress, configure version-compatible engines, and verify correctness against a NumPy reference implementation.
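For orientation, the recurrence that such a network implements can be written as a small reference in plain Python. This sketch uses only the standard library rather than NumPy, and the gate order (input, forget, cell, output), the weight layout, and all function names are assumptions that may differ from the sample's reference implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_cell(x: Vector, h_prev: Vector, c_prev: Vector,
              W: Matrix, R: Matrix, b: Vector) -> Tuple[List[float], List[float]]:
    """One LSTM step. W is (4H x I), R is (4H x H), b has 4H entries; H = len(h_prev)."""
    H = len(h_prev)
    z = [wx + rh + bias for wx, rh, bias in zip(matvec(W, x), matvec(R, h_prev), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def lstm_sequence(xs: Sequence[Vector], h0: Vector, c0: Vector,
                  W: Matrix, R: Matrix, b: Vector) -> Tuple[List[float], List[float]]:
    """Apply lstm_cell over a sequence; this loop is the recurrent part of the network."""
    h, c = list(h0), list(c0)
    for x in xs:
        h, c = lstm_cell(x, h, c, W, R, b)
    return h, c


# With all-zero weights every gate is 0.5 and the candidate is 0,
# so each step halves the cell state.
W0 = [[0.0]] * 4
R0 = [[0.0]] * 4
b0 = [0.0] * 4
h1, c1 = lstm_cell([1.0], [0.0], [1.0], W0, R0, b0)
```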

Where is this sample located?

This sample is maintained under the samples/python/refactored/2_construct_network_with_layer_apis directory in the TensorRT repository. The sample includes a Jupyter notebook (main.ipynb) that provides an interactive walkthrough of building neural networks from scratch using TensorRT’s Layer APIs.

How do I get started?

For more information, refer to the Getting Started with Python Samples section. For specifics about this sample, refer to the GitHub: 2_construct_network_with_layer_apis/README.md file. It provides detailed information about how it works, sample code, and step-by-step instructions on running and verifying its output.