Migrating trtexec Usage from TensorRT 10.x to 11.x#
This page describes how to update trtexec usage when migrating from TensorRT 10.x to 11.x. It gives before-and-after command examples and lists the removed and deprecated options.
Migrating Precision Flags#
TensorRT 11.x removes all precision flags (--fp16, --int8, --bf16, --fp8, --int4, --best) because all networks are strongly typed. Use ModelOpt AutoCast to produce a mixed-precision model before building.
Before (TensorRT 10.x)#
```shell
trtexec \
    --onnx=model.onnx \
    --saveEngine=engine.plan \
    --fp16
```
In TensorRT 10.x, --fp16 allowed the builder to pick FP16 kernels for any layer at build time. In 11.x, the precision of each layer follows the tensor types stored in the model, so the conversion to mixed precision has to happen before trtexec runs.
After (TensorRT 11.x)#
```shell
# Step 1: Convert model to mixed precision offline
python -m modelopt.onnx.autocast --onnx_path model.onnx

# Step 2: Build with trtexec (strong typing is always on)
trtexec \
    --onnx=model_autocast.onnx \
    --saveEngine=engine.plan
```
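If the build is scripted, the two steps can be wrapped in one shell function so that trtexec only runs when the AutoCast step has succeeded. The following is a minimal sketch, not a ModelOpt or TensorRT tool: the function name autocast_and_build is hypothetical, and it assumes, as in the example above, that AutoCast writes its output next to the input as <name>_autocast.onnx. Confirm the output path your ModelOpt version actually produces.

```shell
# Hypothetical helper: run ModelOpt AutoCast, then build a strongly typed engine.
# ASSUMPTION: AutoCast writes <name>_autocast.onnx next to the input model,
# matching the file name used in the example above.
autocast_and_build() {
    local onnx="$1" engine="$2"
    local mixed="${onnx%.onnx}_autocast.onnx"

    # Step 1: convert the model to mixed precision offline.
    python -m modelopt.onnx.autocast --onnx_path "$onnx" || return 1

    # Stop with a clear message if the assumed output name is wrong.
    if [ ! -f "$mixed" ]; then
        echo "error: expected AutoCast output $mixed not found" >&2
        return 1
    fi

    # Step 2: build. No precision flags and no --stronglyTyped in 11.x.
    trtexec --onnx="$mixed" --saveEngine="$engine"
}
```

For example, autocast_and_build model.onnx engine.plan reproduces the two commands above. The explicit file check turns a wrong output-name assumption into an immediate error instead of a confusing trtexec failure.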
Summary of Changes#
Removed: --fp16 (and --bf16, --fp8, --int4, --best)
Added: an offline preprocessing step using ModelOpt AutoCast
No longer needed: --stronglyTyped, which is always enabled
Migrating INT8 Calibration Workflow#
TensorRT 11.x removes the --int8 and --calib flags. Quantization is now applied to the model offline using ModelOpt before building.
Before (TensorRT 10.x)#
```shell
trtexec \
    --onnx=model.onnx \
    --saveEngine=engine.plan \
    --int8 \
    --calib=calibration_cache.bin
```
After (TensorRT 11.x)#
```shell
# Step 1: Quantize the model offline using ModelOpt
python -m modelopt.onnx.quantization \
    --onnx_path model.onnx \
    --calibration_data data.npz

# Step 2: Build with trtexec
trtexec \
    --onnx=model_quantized.onnx \
    --saveEngine=engine.plan
```
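Because calibration now happens offline rather than inside the engine build, one quantized model can be reused across several trtexec builds, for example when trying different build or inference options. The sketch below is one way to script that, under stated assumptions: the function name quantize_and_build is hypothetical, and it assumes, as in the example above, that ModelOpt writes <name>_quantized.onnx next to the input. It re-runs quantization only when the quantized model is missing or older than the ONNX model or the calibration data.

```shell
# Hypothetical helper: quantize once with ModelOpt, then build with trtexec.
# ASSUMPTION: the quantized model is written as <name>_quantized.onnx next to
# the input, matching the file name used in the example above.
quantize_and_build() {
    local onnx="$1" calib="$2" engine="$3"
    local quantized="${onnx%.onnx}_quantized.onnx"

    # Re-quantize only if the output is missing or stale relative to
    # either the model or the calibration data.
    if [ ! -f "$quantized" ] || [ "$onnx" -nt "$quantized" ] \
            || [ "$calib" -nt "$quantized" ]; then
        python -m modelopt.onnx.quantization \
            --onnx_path "$onnx" \
            --calibration_data "$calib" || return 1
    fi

    if [ ! -f "$quantized" ]; then
        echo "error: expected quantized model $quantized not found" >&2
        return 1
    fi

    # No --int8 or --calib: precision comes from the quantized model itself.
    trtexec --onnx="$quantized" --saveEngine="$engine"
}
```

The freshness check relies on file modification times, so touching or replacing model.onnx or data.npz is enough to force a new quantization pass on the next call.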
Summary of Changes#
Removed: the --int8 and --calib flags
Changed: quantization is applied to the model offline, not during engine build
Removed trtexec Flags and Replacements#
Warning
The flags listed below have been removed in TensorRT 11.x. Using them will cause trtexec to exit with an error. Review each entry for its replacement before upgrading.
--fp16: Use ModelOpt AutoCast to produce a mixed-precision model.
--bf16: Use ModelOpt AutoCast to produce a mixed-precision model.
--int8: Use ModelOpt quantization to produce a quantized model.
--fp8: Use ModelOpt quantization to produce a quantized model.
--int4: Use ModelOpt quantization to produce a quantized model.
--best: Use ModelOpt AutoCast and/or quantization.
--precisionConstraints: Removed; strong typing is always enforced.
--layerPrecisions: Removed; use strong typing in the model.
--layerOutputTypes: Removed; use strong typing in the model.
--calib: Use ModelOpt quantization offline.
--calibProfile: Removed.
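Since any of these flags makes trtexec 11.x exit with an error, it can help to audit existing build scripts before upgrading. The following is a minimal sketch of such a check; the function name check_removed_flags is hypothetical and not part of trtexec. It matches only the flags listed above, handles the --name=value form by looking at the part before "=", prints each removed flag with its replacement, and returns a non-zero status if any were found.

```shell
# Hypothetical pre-upgrade check: report TensorRT 10.x trtexec flags that
# TensorRT 11.x no longer accepts, using the list above.
check_removed_flags() {
    local status=0 arg name hint
    for arg in "$@"; do
        name="${arg%%=*}"    # "--calib=cache.bin" -> "--calib"
        case "$name" in
            --fp16|--bf16) hint="use ModelOpt AutoCast" ;;
            --int8|--fp8|--int4|--calib) hint="use ModelOpt quantization offline" ;;
            --best) hint="use ModelOpt AutoCast and/or quantization" ;;
            --precisionConstraints|--layerPrecisions|--layerOutputTypes)
                hint="remove; use strong typing in the model" ;;
            --calibProfile) hint="remove" ;;
            *) continue ;;   # not a removed flag
        esac
        printf '%s: %s\n' "$name" "$hint"
        status=1
    done
    return "$status"
}
```

For example, check_removed_flags --onnx=model.onnx --fp16 --calib=cache.bin prints one line each for --fp16 and --calib and returns 1, which makes it usable as a guard in CI before a build step.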
Deprecated trtexec Flags and Replacements#
The following trtexec flags are deprecated in TensorRT 11.x but are still accepted. Each entry shows the deprecated flag and its replacement or the new default behavior.
--stronglyTyped: Always enabled; the flag is accepted but has no effect.
--inputIOFormats (type+format): Use a format-only specification; the type portion is ignored.
--outputIOFormats (type+format): Use a format-only specification; the type portion is ignored.
--useCudaGraph: Enabled by default; the flag is accepted but has no effect. Use --noCudaGraph to disable CUDA graphs.
--useSpinWait: Enabled by default; the flag is accepted but has no effect. Use --noSpinWait to disable spin wait for CUDA synchronizations.
--noDataTransfers: Data transfers are disabled by default; the flag is accepted but has no effect. Use --includeDataTransfers to include data transfer latencies in the benchmarking result.
--separateProfileRun: Always enabled; the flag is accepted but has no effect. When --dumpProfile or --exportProfile is set, trtexec runs end-to-end performance benchmarking first and then per-layer performance benchmarking as a separate run.
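Removing these flags is harmless, but the new defaults are the opposite of the 10.x defaults: in 10.x, CUDA graphs and spin wait were off unless requested, and data transfers were included unless --noDataTransfers was given. A 10.x benchmark command that relied on those defaults will therefore measure something different in 11.x. The sketch below, with the hypothetical function name translate_deprecated_flags, rewrites a 10.x argument list so that 11.x keeps the 10.x benchmarking behavior. It drops the no-op flags, adds the explicit opt-out flags listed above where the 10.x default applied, and passes IO format specifications through unchanged with a warning, since this sketch does not attempt to rewrite their syntax. --separateProfileRun is simply dropped, because 11.x always profiles in a separate run.

```shell
# Hypothetical helper: translate TensorRT 10.x trtexec arguments into 11.x
# arguments that keep 10.x benchmarking behavior (CUDA graph off, spin wait
# off, data transfers included). Prints one argument per line.
translate_deprecated_flags() {
    local arg graph=0 spin=0 nodata=0
    for arg in "$@"; do
        case "$arg" in
            --useCudaGraph) graph=1 ;;
            --useSpinWait) spin=1 ;;
            --noDataTransfers) nodata=1 ;;
            --stronglyTyped|--separateProfileRun) ;;   # no effect in 11.x
            --inputIOFormats=*|--outputIOFormats=*)
                echo "warning: type portion of ${arg%%=*} is ignored in 11.x" >&2
                printf '%s\n' "$arg" ;;
            *) printf '%s\n' "$arg" ;;
        esac
    done
    # Omitting these flags in 10.x meant the opposite of the 11.x default,
    # so request the 10.x behavior explicitly.
    [ "$graph" = 1 ] || printf '%s\n' --noCudaGraph
    [ "$spin" = 1 ] || printf '%s\n' --noSpinWait
    [ "$nodata" = 1 ] || printf '%s\n' --includeDataTransfers
}
```

In bash, the output can be read into an array with mapfile -t args < <(translate_deprecated_flags "$@") and passed on as trtexec "${args[@]}". If you would rather adopt the 11.x defaults, delete the deprecated flags and skip the opt-out flags.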