Date: 2025-07-18

NVIDIA TensorRT is an SDK for high-performance deep learning inference. It provides APIs and parsers to import trained models from all major deep learning frameworks; it then generates optimized runtime engines deployable in a data center, as well as in automotive and embedded environments. To understand TensorRT and its capabilities better, refer to the official TensorRT documentation.

Models trained in TAO are deployed to NVIDIA inference SDKs, like DeepStream, via TensorRT. Computer vision models trained by TAO can be consumed by TensorRT via tao deploy, which is included as part of the TAO Launcher. TAO Deploy parses the exported .onnx model file and generates an optimized TensorRT engine. These engines can be generated to support inference at low precision (e.g. FP16 or INT8). While most of the TAO models support direct integration of the .onnx files with DeepStream, DeepStream can also consume the optimized engine generated by tao deploy.

TAO Deploy separates the model training and optimization steps from deployment by parsing a .onnx file to generate an optimized TensorRT engine. TAO Deploy also provides tools to run evaluation and inference using the original TAO spec file. With TAO Deploy, you can perform the following tasks:

gen_trt_engine

evaluate

inference

Like other TAO commands, the TAO Deploy CLI follows a cascaded structure:

tao deploy <task> <sub-task> <args>
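
The cascaded structure above can be sketched as a small Python helper that assembles the argument list in the order tao deploy, task, sub-task, args. This is only an illustration, not part of TAO: build_tao_deploy_command and SUBTASKS are hypothetical names, and the sub-task check assumes only the three sub-tasks listed in this section. It builds the command line but does not run it.

```python
import shlex

# Sub-tasks named in this section; assumed here to be the full set for a task.
SUBTASKS = ("gen_trt_engine", "evaluate", "inference")


def build_tao_deploy_command(task, subtask, args=()):
    """Return the argv list for `tao deploy <task> <sub-task> <args>`.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if subtask not in SUBTASKS:
        raise ValueError(f"unknown sub-task: {subtask!r}")
    # Order follows the cascaded structure: launcher, deploy, task, sub-task, args.
    return ["tao", "deploy", task, subtask, *args]


if __name__ == "__main__":
    argv = build_tao_deploy_command("detectnet_v2", "gen_trt_engine", ["--help"])
    # Print the shell-quoted command line instead of executing it.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later handed to subprocess.run.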

Currently, TAO Deploy only supports computer vision models. For example, DetectNet_v2 is a computer vision task for object detection in TAO and supports the gen_trt_engine, evaluate, and inference subtasks. When you execute a command like tao deploy detectnet_v2 gen_trt_engine --help, the TAO Launcher does the following: