The deep learning and computer vision models that you’ve trained can be deployed on edge devices, such as Jetson Xavier or Jetson Nano, on discrete GPUs, or in the cloud with NVIDIA GPUs. TAO is designed to integrate with the DeepStream SDK. Models trained with TAO work out of the box with DeepStream.

DeepStream SDK is a streaming analytics toolkit that accelerates building AI-based video analytics applications. This section describes how to deploy a TAO-trained model to DeepStream.

TAO model skills export their trained checkpoints to ONNX. Build the device-specific TensorRT engine from that ONNX with the trtexec tool (or ask the agent to run the model’s gen_trt_engine action), then feed the engine to DeepStream:

  • Refer to the Exporting the Model section for how to export the trained model to ONNX.

  • Refer to Integrating TAO Models into DeepStream for how to wire the engine into a DeepStream pipeline.
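
The engine-build step can be sketched in Python as a thin wrapper around trtexec. This is a minimal illustration, not the TAO tooling itself: the helper names build_trtexec_command and build_engine and the file names are hypothetical, and it assumes trtexec is on the PATH of the target device. The --onnx, --saveEngine, and --fp16 options are standard trtexec flags.

```python
import shlex
import subprocess
from pathlib import Path


def build_trtexec_command(onnx_path, engine_path, fp16=False):
    """Return the trtexec argument list that builds an engine from an ONNX file.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",          # exported TAO model
        f"--saveEngine={engine_path}",  # where the serialized engine is written
    ]
    if fp16:
        cmd.append("--fp16")            # allow FP16 kernels where supported
    return cmd


def build_engine(onnx_path, engine_path, fp16=False):
    """Run trtexec on the target device (assumes trtexec is on PATH).

    Raises FileNotFoundError if the ONNX file is missing and
    subprocess.CalledProcessError if trtexec exits with an error.
    """
    if not Path(onnx_path).is_file():
        raise FileNotFoundError(onnx_path)
    cmd = build_trtexec_command(onnx_path, engine_path, fp16)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; run this on the machine that will serve inference.
    print(shlex.join(build_trtexec_command("model.onnx", "model.engine", fp16=True)))
```

Run the build on the same device that will host DeepStream, since the resulting engine is tied to that hardware and software stack. Models exported with dynamic input shapes may also need trtexec's --minShapes, --optShapes, and --maxShapes options; the values depend on the model and are not shown here.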

Machine-specific optimizations are applied during engine creation, so a distinct engine must be generated for each target environment and hardware configuration. If the TensorRT or CUDA libraries on the inference host change (including minor versions), or if a new model is generated, the engine must be regenerated. Running an engine built against a different TensorRT or CUDA version is not supported and results in undefined behavior: the engine may refuse to load, or it may load and run with degraded accuracy.
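
One way to enforce the regeneration rule is to store a small fingerprint next to each engine and rebuild whenever it no longer matches. The sketch below is an assumption-laden illustration, not part of TAO or DeepStream: the function names and the sidecar JSON file are hypothetical, and the caller is expected to supply the TensorRT version, CUDA version, and GPU name from whatever source is available on the host.

```python
import hashlib
import json
from pathlib import Path


def environment_fingerprint(onnx_path, tensorrt_version, cuda_version, gpu_name):
    """Describe everything the engine depends on (hypothetical helper).

    Version strings and GPU name are passed in by the caller; how they are
    obtained on a given host is outside the scope of this sketch.
    """
    digest = hashlib.sha256(Path(onnx_path).read_bytes()).hexdigest()
    return {
        "onnx_sha256": digest,         # changes when a new model is exported
        "tensorrt": tensorrt_version,  # full version, including minor releases
        "cuda": cuda_version,
        "gpu": gpu_name,
    }


def _sidecar_path(engine_path):
    """Sidecar file kept beside the engine, e.g. model.engine.json."""
    engine = Path(engine_path)
    return engine.with_name(engine.name + ".json")


def engine_is_stale(engine_path, fingerprint):
    """Return True if the engine is missing or was built for another setup."""
    sidecar = _sidecar_path(engine_path)
    if not Path(engine_path).is_file() or not sidecar.is_file():
        return True
    return json.loads(sidecar.read_text()) != fingerprint


def record_fingerprint(engine_path, fingerprint):
    """Write the fingerprint after a successful engine build."""
    _sidecar_path(engine_path).write_text(json.dumps(fingerprint, sort_keys=True))
```

A deployment script could call engine_is_stale before starting the DeepStream pipeline, rebuild the engine when it returns True, and then call record_fingerprint. Comparing full version strings is deliberately strict, so even a minor-version change triggers a rebuild.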