Build Your First Engine#

This tutorial walks you through building and running your first NVIDIA TensorRT engine end-to-end in about 10 minutes. It is intentionally narrow: it picks one model, one build command, and one inference command. For the full menu of workflows (PyTorch source models, ONNX models, multiple runtimes, dynamic shapes, quantization), refer to the Quick Start Guide after you finish this tutorial.

This is a tutorial, not a how-to guide. The goal is to give you a working engine on disk and a successful inference run, not to teach you the TensorRT API. After you finish, you will know that your install works and what a successful build looks like end-to-end.

Prerequisites#

You should already have:

NVIDIA TensorRT 11.2.1 installed with the trtexec tool on your PATH. The Debian, RPM, tar, zip, and container installs include trtexec. The pip wheel installs Python bindings and libraries only, so use one of the other install methods for this tutorial. Recommended path: Installing TensorRT → Debian/RPM, tar/zip, or container → verify with trtexec --help.
A supported NVIDIA GPU (refer to the Support Matrix).
CUDA Toolkit 13.3 update 1 on your PATH (required for the 11.2.1 Debian/RPM/tar/zip packages used with this tutorial). If you only installed CUDA 12.x for pip wheels, switch to a non-pip TensorRT package plus CUDA 13.3 update 1 before continuing. Refer to Prerequisites (Which CUDA should I install?).

You do not need to know the TensorRT API. You do need a working Python environment if you plan to run the optional Step 4 sample.
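If you would rather confirm the first prerequisite from a script than by running trtexec --help by hand, the following is a minimal sketch. It only checks that an executable named trtexec is discoverable on your PATH; it does not check the TensorRT version, the GPU, or the CUDA Toolkit. The helper name missing_tools is hypothetical and is not part of TensorRT.

```python
import shutil


def missing_tools(tools=("trtexec",)):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


# Example use:
#   missing = missing_tools()
#   if missing:
#       print("Not found on PATH:", ", ".join(missing))
```

An empty list means every named tool was found. Python itself is only required for this check and for the optional Step 4.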

What You Will Build#

By the end of this tutorial, you will have:

A ResNet-50 ONNX model on disk.
A TensorRT engine compiled from that model, saved as resnet50.engine.
A single inference run printed to your terminal.

Total time: about 10 minutes. Total commands: 5.

Step 1: Get the Model#

Download a ResNet-50 ONNX model:

wget https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx \
    -O resnet50.onnx

If wget is not available on your system, download the file with your browser or with curl -L -o resnet50.onnx <url>.

After this step, ls should show resnet50.onnx in the current directory.
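If neither wget nor curl is available, the same download can be sketched with the Python standard library. This is an illustration under assumptions, not part of the official workflow: the function name fetch and the empty-file check are hypothetical, and the check only catches a zero-byte result, not a corrupted or incomplete model.

```python
import shutil
import urllib.request
from pathlib import Path

# Same URL as the wget command in this step.
MODEL_URL = (
    "https://github.com/onnx/models/raw/main/validated/vision/"
    "classification/resnet/model/resnet50-v2-7.onnx"
)


def fetch(url, dest="resnet50.onnx"):
    """Download `url` to `dest` and return the file size in bytes."""
    dest = Path(dest)
    # urlopen follows HTTP redirects by default, similar to curl -L.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    size = dest.stat().st_size
    if size == 0:
        raise RuntimeError(f"{dest} is empty; the download did not succeed")
    return size


# Example use: fetch(MODEL_URL) writes resnet50.onnx to the current directory.
```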

Step 2: Build the Engine#

Note

This step requires a supported NVIDIA GPU and a working CUDA driver. Refer to the Support Matrix if the build fails with GPU or driver errors.

Build an engine from the ONNX file using the trtexec command-line tool that ships with TensorRT:

trtexec --onnx=resnet50.onnx --saveEngine=resnet50.engine

The build typically takes 30 to 90 seconds on a modern NVIDIA GPU. You will see TensorRT log lines describing layer fusions, kernel selection, and profiling. When the build finishes, trtexec reports the engine size on disk and the time taken to build.

After this step, ls should show both resnet50.onnx and resnet50.engine.
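To run this build step from a script, for example in a CI smoke test, a thin wrapper around the same trtexec command might look like the sketch below. The function names build_command and build_engine are hypothetical. The sketch assumes that trtexec signals a failed build through a nonzero exit status, and it uses only the two flags shown in this step.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(onnx_path, engine_path):
    """Return the Step 2 trtexec invocation as an argument list."""
    return ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]


def build_engine(onnx_path="resnet50.onnx", engine_path="resnet50.engine"):
    """Run the build and confirm that an engine file was written."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec is not on PATH")
    # check=True raises CalledProcessError on a nonzero exit status
    # (assumed here to indicate a failed build).
    subprocess.run(build_command(onnx_path, engine_path), check=True)
    engine = Path(engine_path)
    if not engine.is_file() or engine.stat().st_size == 0:
        raise RuntimeError(f"{engine} was not created")
    return engine
```

Passing the arguments as a list avoids shell quoting problems if the file paths contain spaces.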

Step 3: Verify the Engine Runs#

Note

You built this engine locally in Step 2, so loading it here is safe. For engines from other sources, refer to Engine Deserialization and the Trust Boundary.

Run inference with random inputs to confirm the engine loads and executes:

trtexec --loadEngine=resnet50.engine --shapes=data:1x3x224x224

Tip

data is the ONNX input tensor name for this ResNet-50 model, and 1x3x224x224 is batch × channels × height × width. For a different ONNX file, inspect input names and shapes (for example with Netron or python -c "import onnx; m=onnx.load('model.onnx'); print([(i.name, [d.dim_value for d in i.type.tensor_type.shape.dim]) for i in m.graph.input])") and substitute them in --shapes. A dimension that this one-liner prints as 0 has no fixed value in the model (it is dynamic), so choose a concrete size for it in --shapes.

You should see latency numbers (mean, median, and percentiles) and a PASSED summary at the end of the output. If you see FAILED or any error, refer to Troubleshooting.
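The --shapes value used in this step follows a name:dims pattern, with dimensions joined by x. For a model with several inputs, trtexec accepts multiple specs separated by commas. The helper below is a small sketch for assembling that string from names and shapes you have inspected. The function name format_shapes is hypothetical, and rejecting non-positive dimensions is an assumption that suits this tutorial's fixed-size input.

```python
def format_shapes(inputs):
    """Build a --shapes value from a {tensor_name: dims} mapping."""
    parts = []
    for name, dims in inputs.items():
        # A 0 or negative entry usually means a dynamic dimension that
        # still needs a concrete size before it can be passed to --shapes.
        if any(d <= 0 for d in dims):
            raise ValueError(f"{name}: dimensions must be positive, got {list(dims)}")
        parts.append(f"{name}:{'x'.join(str(d) for d in dims)}")
    return ",".join(parts)


# Example use:
#   "--shapes=" + format_shapes({"data": [1, 3, 224, 224]})
```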

What You Just Did#

You proved three things in 5 commands:

Your TensorRT install can read an ONNX file.
The build phase can compile a model into a GPU-specific engine.
The runtime can load and execute that engine.

This is the minimum viable inference loop. Everything else in the docs builds on this foundation.

Step 4 (Optional): Run Inference from Python#

Steps 1 through 3 are an installation smoke test built around trtexec. For the Python API path, use the maintained ONNX ResNet-50 sample. The sample demonstrates parsing an ONNX model, building an engine, preparing real image input, running inference, and interpreting the result.

The TensorRT samples are the source of truth for API usage and are updated alongside product API changes. This tutorial intentionally avoids duplicating that Python implementation.

Where to Go Next#

You have a working engine. From here:

TensorRT Capabilities summarizes what the library supports and links to deeper how-to pages.
C++ API Documentation and Python API Documentation walk through the full build and run workflow.
Quick Start Guide covers the same build and run flow with C++ runtime examples, PyTorch source models, and more options.
Working with Dynamic Shapes removes the fixed batch size and resolution you used above.
Performance Benchmarking shows how to measure your engine’s real latency and throughput on your hardware.
Accuracy Considerations explains how to validate numerical results when you move beyond random inputs.
Working with Quantized Types covers INT8 and FP8 paths once you want lower precision than FP16.