Migrating trtexec Usage from TensorRT 10.x to 11.x

This page describes how to update trtexec usage when migrating from TensorRT 10.x to 11.x. It gives before/after command examples and lists the removed and deprecated options along with their replacements.

Migrating Precision Flags

TensorRT 11.x removes all precision flags (--fp16, --int8, --bf16, --fp8, --int4, --best) because all networks are now strongly typed: the data types recorded in the model determine the precision of each layer. To get a mixed-precision engine, use ModelOpt AutoCast to convert the model before building.

Before (TensorRT 10.x)

trtexec \
    --onnx=model.onnx \
    --saveEngine=engine.plan \
    --fp16

This command fails in TensorRT 11.x because --fp16 has been removed. Precision selection moves into an offline step: ModelOpt AutoCast writes a mixed-precision ONNX model, and trtexec then builds that model as a strongly typed network.

After (TensorRT 11.x)

# Step 1: Convert model to mixed precision offline
python -m modelopt.onnx.autocast --onnx_path model.onnx

# Step 2: Build with trtexec (strong typing is always on)
trtexec \
    --onnx=model_autocast.onnx \
    --saveEngine=engine.plan

Summary of Changes

  • Removed --fp16 (and --bf16, --fp8, --int4, --best)

  • Added an offline preprocessing step using ModelOpt AutoCast

  • --stronglyTyped is no longer needed; strong typing is always enabled
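The two steps above can be wrapped in a small script so that existing build pipelines switch over in one place. The following is a minimal POSIX shell sketch, not an official tool: the functions `run` and `build_autocast_engine` and the `DRY_RUN` variable are hypothetical, and it assumes AutoCast writes its output as `<name>_autocast.onnx`, the file name used in the example above. Check the output path your ModelOpt version actually reports before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper for the TensorRT 11.x mixed-precision build flow (a sketch).
set -eu

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

# Usage: build_autocast_engine MODEL.onnx ENGINE.plan
build_autocast_engine() {
  local onnx="$1" engine="$2"
  # Assumption: AutoCast output is named <stem>_autocast.onnx, as in the example above.
  local mixed="${onnx%.onnx}_autocast.onnx"
  # Step 1: offline mixed-precision conversion (replaces --fp16/--bf16/--best).
  run python -m modelopt.onnx.autocast --onnx_path "$onnx"
  # Step 2: plain build; no precision flags and no --stronglyTyped are needed.
  run trtexec --onnx="$mixed" --saveEngine="$engine"
}
```

Running `DRY_RUN=1 build_autocast_engine model.onnx engine.plan` only prints the two commands, which is a cheap way to review the migration before executing it.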

Migrating INT8 Calibration Workflow

TensorRT 11.x removes the --int8 and --calib flags. Quantization is now applied to the model offline using ModelOpt before building.

Before (TensorRT 10.x)

trtexec \
    --onnx=model.onnx \
    --saveEngine=engine.plan \
    --int8 \
    --calib=calibration_cache.bin

After (TensorRT 11.x)

# Step 1: Quantize the model offline using ModelOpt
python -m modelopt.onnx.quantization \
    --onnx_path model.onnx \
    --calibration_data data.npz

# Step 2: Build with trtexec
trtexec \
    --onnx=model_quantized.onnx \
    --saveEngine=engine.plan

Summary of Changes

  • Removed --int8 and --calib flags

  • Quantization is applied to the model offline, not during engine build
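The quantization flow can be scripted the same way. This is a minimal POSIX shell sketch under stated assumptions: `run`, `build_quantized_engine`, and `DRY_RUN` are hypothetical names, and the quantized model is assumed to be written as `<name>_quantized.onnx`, matching the example above. Since the 11.x flow hands calibration data (data.npz in the example) to ModelOpt instead of passing a calibration cache to trtexec, the sketch checks up front that the calibration data file exists.

```shell
#!/bin/sh
# Hypothetical wrapper for the TensorRT 11.x quantized build flow (a sketch).
set -eu

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

# Usage: build_quantized_engine MODEL.onnx CALIB_DATA.npz ENGINE.plan
build_quantized_engine() {
  local onnx="$1" calib_data="$2" engine="$3"
  # Assumption: the quantized model is named <stem>_quantized.onnx, as in the example above.
  local quantized="${onnx%.onnx}_quantized.onnx"
  # Fail early if the calibration data is missing (skipped in dry runs).
  if [ "${DRY_RUN:-0}" != 1 ] && [ ! -f "$calib_data" ]; then
    echo "calibration data not found: $calib_data" >&2
    return 1
  fi
  # Step 1: offline quantization (replaces --int8 and --calib).
  run python -m modelopt.onnx.quantization --onnx_path "$onnx" --calibration_data "$calib_data"
  # Step 2: build the already-quantized model; no precision flags.
  run trtexec --onnx="$quantized" --saveEngine="$engine"
}
```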

Removed trtexec Flags and Replacements

Warning

The flags listed below have been removed in TensorRT 11.x. Using them will cause trtexec to exit with an error. Review each entry for its replacement before upgrading.

  • --fp16: Use ModelOpt AutoCast to produce a mixed-precision model

  • --bf16: Use ModelOpt AutoCast to produce a mixed-precision model

  • --int8: Use ModelOpt quantization to produce a quantized model

  • --fp8: Use ModelOpt quantization to produce a quantized model

  • --int4: Use ModelOpt quantization to produce a quantized model

  • --best: Use ModelOpt AutoCast and/or quantization

  • --precisionConstraints: Removed (strong typing is always enforced)

  • --layerPrecisions: Removed (use strong typing in the model)

  • --layerOutputTypes: Removed (use strong typing in the model)

  • --calib: Use ModelOpt quantization offline

  • --calibProfile: Removed
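Because a removed flag makes trtexec 11.x exit with an error, it can help to scan existing command lines before upgrading. A minimal POSIX shell sketch follows; `check_removed_trtexec_flags` is a hypothetical helper, not part of trtexec, and its messages paraphrase the list above. It prints one `flag: replacement` line per removed flag it finds and returns a nonzero status if any were found, so it can gate a CI job.

```shell
#!/bin/sh
# Hypothetical pre-upgrade check: report TensorRT 10.x trtexec flags removed in 11.x.
# Prints "flag: replacement" for each removed flag; returns 1 if any were found.
check_removed_trtexec_flags() {
  local arg flag hint found=0
  for arg in "$@"; do
    flag="${arg%%=*}"   # "--calib=cache.bin" -> "--calib"
    case "$flag" in
      --fp16|--bf16) hint="use ModelOpt AutoCast to produce a mixed-precision model" ;;
      --int8|--fp8|--int4) hint="use ModelOpt quantization to produce a quantized model" ;;
      --best) hint="use ModelOpt AutoCast and/or quantization" ;;
      --precisionConstraints) hint="removed (strong typing is always enforced)" ;;
      --layerPrecisions|--layerOutputTypes) hint="removed (use strong typing in the model)" ;;
      --calib) hint="use ModelOpt quantization offline" ;;
      --calibProfile) hint="removed" ;;
      *) continue ;;   # not a removed flag; keep scanning
    esac
    printf '%s: %s\n' "$flag" "$hint"
    found=1
  done
  return "$found"
}
```

For example, `check_removed_trtexec_flags --onnx=model.onnx --fp16` prints `--fp16: use ModelOpt AutoCast to produce a mixed-precision model` and returns 1.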

Deprecated trtexec Flags and Replacements

The following trtexec flags have been deprecated in 11.x but are still accepted. Each entry shows the deprecated flag and its replacement.

  • --stronglyTyped: Always enabled; flag accepted but has no effect

  • --inputIOFormats (type+format): Format-only specification (type portion ignored)

  • --outputIOFormats (type+format): Format-only specification (type portion ignored)

  • --useCudaGraph: Enabled by default; flag accepted but has no effect. Use --noCudaGraph to disable CUDA graphs.

  • --useSpinWait: Enabled by default; flag accepted but has no effect. Use --noSpinWait to disable spin wait for CUDA synchronizations.

  • --noDataTransfers: Data transfers are disabled by default; flag accepted but has no effect. Use --includeDataTransfers to include data transfer latencies in the benchmarking result.

  • --separateProfileRun: Always enabled; flag accepted but has no effect. When --dumpProfile or --exportProfile is set, trtexec runs end-to-end performance benchmarking first and then per-layer performance benchmarking as a separate run.
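Several of these flags flip a default rather than simply going away. In TensorRT 10.x, --useCudaGraph and --useSpinWait were opt-in and data transfers were included unless --noDataTransfers was given, so an unchanged 10.x benchmarking command measures something different under 11.x. The sketch below uses a hypothetical helper, `normalize_trtexec_args` (not part of trtexec), that drops the deprecated no-op flags and appends --noCudaGraph, --noSpinWait, or --includeDataTransfers wherever the old command relied on the 10.x default. It leaves --inputIOFormats and --outputIOFormats untouched, since 11.x ignores their type portion anyway.

```shell
#!/bin/sh
# Hypothetical helper: rewrite TensorRT 10.x trtexec arguments for 11.x,
# one argument per line, keeping 10.x benchmarking behavior.
normalize_trtexec_args() {
  local arg cuda_graph=0 spin_wait=0 no_transfers=0
  for arg in "$@"; do
    case "$arg" in
      # Always enabled in 11.x: drop the no-op flags.
      --stronglyTyped|--separateProfileRun) ;;
      # Default in 11.x: drop, but remember that the 10.x command opted in.
      --useCudaGraph) cuda_graph=1 ;;
      --useSpinWait) spin_wait=1 ;;
      # Data transfers are disabled by default in 11.x: drop, but remember.
      --noDataTransfers) no_transfers=1 ;;
      # Everything else, including --inputIOFormats/--outputIOFormats, passes through.
      *) printf '%s\n' "$arg" ;;
    esac
  done
  # Opt back out of the new 11.x defaults where the 10.x command used the old ones.
  [ "$cuda_graph" = 1 ] || printf '%s\n' --noCudaGraph
  [ "$spin_wait" = 1 ] || printf '%s\n' --noSpinWait
  [ "$no_transfers" = 1 ] || printf '%s\n' --includeDataTransfers
}
```

The helper prints one argument per line, so splicing its output back into a command with `$(...)` only works when no argument contains whitespace. If you want the new 11.x defaults instead of comparable 10.x numbers, omit the three appended opt-out flags.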