TRTEXEC with CLIP#

The trtexec tool is a command-line wrapper included as part of the TensorRT samples. TAO 5.0.0 exposes the trtexec tool in the TAO Deploy container (or task group when run via the launcher) for deploying models on x86-based systems with discrete GPUs. To run trtexec on other platforms, such as Jetson devices, or with TensorRT versions other than those used by default in the TAO containers, follow the official TensorRT documentation on how to get trtexec.

This section describes how to generate a TensorRT engine using trtexec, which allows you to deploy TAO-trained models on TensorRT, Triton, and DeepStream.

To generate an ONNX file for CLIP, refer to Exporting the Model.

Sample Command: Combined Engine#

A combined CLIP engine contains both the vision and text encoders in a single TensorRT engine. Use this format when you run vision and text encoding together at inference time.

Warning

attention_mask is currently accepted as an explicit graph input for backward compatibility only. This input is deprecated and scheduled for removal. Remove it from your shape profiles and inference code to avoid a future breaking change.

trtexec \
  --onnx=/path/to/clip_model.onnx \
  --minShapes=image:1x3x256x256,input_ids:1x77,attention_mask:1x77 \
  --optShapes=image:8x3x256x256,input_ids:8x77,attention_mask:8x77 \
  --maxShapes=image:16x3x256x256,input_ids:16x77,attention_mask:16x77 \
  --fp16 \
  --saveEngine=/path/to/clip_model.engine
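The command above fixes the engine's optimization profile: batch sizes 1 through 16, 3x256x256 images, and 77-token sequences. A minimal sketch of assembling engine-ready input arrays that match this profile, using NumPy (the helper name `make_combined_inputs` is illustrative, not part of TAO; int64 token ids are typical for ONNX exports but check your own graph):

```python
import numpy as np

# Shape profile from the trtexec command above: batch 1..16,
# 3x256x256 images, length-77 token sequences.
MIN_BATCH, MAX_BATCH = 1, 16
IMG_SHAPE = (3, 256, 256)
SEQ_LEN = 77

def make_combined_inputs(images, token_ids):
    """Stack preprocessed images and token ids into engine-ready arrays.

    `images` is a list of float32 CHW arrays; `token_ids` is a list of
    length-77 int sequences (one per image). Raises if the batch size
    falls outside the engine's optimization profile.
    """
    batch = len(images)
    if not MIN_BATCH <= batch <= MAX_BATCH:
        raise ValueError(f"batch size {batch} outside profile [1, 16]")
    image = np.stack(images).astype(np.float32)        # (N, 3, 256, 256)
    input_ids = np.asarray(token_ids, dtype=np.int64)  # (N, 77)
    # attention_mask is deprecated (see the warning above) but still a
    # graph input here; all-ones is a placeholder -- match whatever your
    # exported model actually expects.
    attention_mask = np.ones_like(input_ids)
    assert image.shape[1:] == IMG_SHAPE and input_ids.shape[1] == SEQ_LEN
    return {"image": image, "input_ids": input_ids,
            "attention_mask": attention_mask}
```

The returned dict keys match the tensor names in the shape profiles, so the arrays can be fed directly to whichever TensorRT runtime or Triton client you use for inference.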

Sample Command: Separate Vision Engine#

Export the vision encoder as a standalone engine when you want to pre-compute image embeddings independently of text encoding.

trtexec \
  --onnx=/path/to/clip_model_vision.onnx \
  --minShapes=image:1x3x256x256 \
  --optShapes=image:8x3x256x256 \
  --maxShapes=image:16x3x256x256 \
  --fp16 \
  --saveEngine=/path/to/clip_model_vision.engine
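The vision engine expects float32 NCHW input in the 256x256 resolution baked into the profile above. A sketch of the preprocessing step, assuming the image is already resized to 256x256 RGB (the helper name is illustrative, and the default mean/std are the commonly used OpenAI CLIP statistics; substitute the values your training pipeline used):

```python
import numpy as np

def preprocess_image(img_hwc_uint8,
                     mean=(0.48145466, 0.4578275, 0.40821073),
                     std=(0.26862954, 0.26130258, 0.27577711)):
    """Convert a 256x256 HWC uint8 RGB image to the 1x3x256x256 float32
    layout the vision engine expects."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - np.asarray(mean)) / np.asarray(std)   # per-channel normalize
    x = x.transpose(2, 0, 1)[None]                 # HWC -> 1xCxHxW
    return np.ascontiguousarray(x, dtype=np.float32)
```

Batch multiple preprocessed images along the first axis, up to the maxShapes batch of 16, to amortize per-launch overhead when pre-computing embeddings.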

Sample Command: Separate Text Engine#

Export the text encoder as a standalone engine when you want to pre-compute text embeddings independently, for example to index a fixed set of captions or class names offline.

trtexec \
  --onnx=/path/to/clip_model_text.onnx \
  --minShapes=input_ids:1x77,attention_mask:1x77 \
  --optShapes=input_ids:8x77,attention_mask:8x77 \
  --maxShapes=input_ids:16x77,attention_mask:16x77 \
  --fp16 \
  --saveEngine=/path/to/clip_model_text.engine
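The text engine's profile fixes the sequence length at 77 tokens, so every tokenized caption must be padded or truncated to that length. A sketch of that step, with NumPy (the helper name and the `pad_id=0` default are illustrative; use your tokenizer's actual pad token):

```python
import numpy as np

SEQ_LEN = 77  # context length baked into the shape profiles above

def pad_token_ids(sequences, pad_id=0):
    """Pad or truncate tokenized captions to the fixed 77-token context
    the text engine expects, and build the matching attention mask."""
    batch = len(sequences)
    input_ids = np.full((batch, SEQ_LEN), pad_id, dtype=np.int64)
    attention_mask = np.zeros((batch, SEQ_LEN), dtype=np.int64)
    for i, seq in enumerate(sequences):
        seq = list(seq)[:SEQ_LEN]            # truncate overlong captions
        input_ids[i, :len(seq)] = seq
        attention_mask[i, :len(seq)] = 1     # mark real (non-pad) tokens
    return input_ids, attention_mask
```

The returned attention mask corresponds to the deprecated graph input noted in the warning above; drop it from this helper once your exported model no longer takes that input.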