CLIP with TAO Deploy

The gen_trt_engine action generates an optimized TensorRT™ engine for CLIP from an ONNX file produced by the CLIP export action. For more information about training and exporting a CLIP model, refer to CLIP Training and Deployment.

TAO Deploy supports both combined and separate encoder formats. Refer to Exporting the Model for guidance on which to choose.

gen_trt_engine

gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model.onnx
  trt_engine: /results/clip_experiment/deploy/clip_model.engine
  batch_size: -1
  tensorrt:
    workspace_size: 4096
    data_type: fp16
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 16

Field	Data type	Description	Default value
`gen_trt_engine.onnx_file`	string	Path to the input ONNX file. For separate encoders, pass the base path without the `_vision` or `_text` suffix; TAO detects both files automatically.
`gen_trt_engine.trt_engine`	string	Output path for the TRT engine. For separate encoders, TAO writes one `_vision.engine` file and one `_text.engine` file.
`gen_trt_engine.batch_size`	int	Engine batch size. Set to `-1` for dynamic batch size.	`-1`
`gen_trt_engine.tensorrt.workspace_size`	int	TRT workspace size in megabytes.	`4096`
`gen_trt_engine.tensorrt.data_type`	string	TRT inference precision. Supported values: `fp16`, `fp32`.	`fp16`
`gen_trt_engine.tensorrt.min_batch_size`	int	Minimum batch size for dynamic batch optimization.	`1`
`gen_trt_engine.tensorrt.opt_batch_size`	int	Optimal batch size for dynamic batch optimization.	`1`
`gen_trt_engine.tensorrt.max_batch_size`	int	Maximum batch size for dynamic batch optimization.	`16`
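
Before building an engine, it can help to sanity-check the spec values listed in the table. The following is a minimal sketch, not part of TAO Deploy: the function name `check_gen_trt_engine_spec` is hypothetical, the spec is written as a plain Python dictionary that mirrors the YAML above, and the `min <= opt <= max` ordering is the usual requirement for a TensorRT optimization profile rather than a rule stated on this page.

```python
# Hypothetical pre-flight check; this helper is not part of TAO Deploy.
SUPPORTED_DATA_TYPES = {"fp16", "fp32"}  # values listed in the field table


def check_gen_trt_engine_spec(spec):
    """Return a list of problems found in a gen_trt_engine spec dict."""
    problems = []
    section = spec.get("gen_trt_engine", {})
    trt = section.get("tensorrt", {})

    data_type = trt.get("data_type", "fp16")
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data_type: {data_type!r}")

    # Only check the batch profile when the batch size is dynamic (-1).
    if section.get("batch_size", -1) == -1:
        lo = trt.get("min_batch_size", 1)
        opt = trt.get("opt_batch_size", 1)
        hi = trt.get("max_batch_size", 16)
        if lo < 1:
            problems.append("min_batch_size must be at least 1")
        # Assumption: TensorRT profiles need min <= opt <= max.
        if not lo <= opt <= hi:
            problems.append(f"expected min <= opt <= max, got {lo}, {opt}, {hi}")
    return problems


# Same values as the sample spec above, written as a dictionary.
spec = {
    "gen_trt_engine": {
        "onnx_file": "/results/clip_experiment/export/clip_model.onnx",
        "trt_engine": "/results/clip_experiment/deploy/clip_model.engine",
        "batch_size": -1,
        "tensorrt": {
            "workspace_size": 4096,
            "data_type": "fp16",
            "min_batch_size": 1,
            "opt_batch_size": 8,
            "max_batch_size": 16,
        },
    }
}

print(check_gen_trt_engine_spec(spec))  # prints []
```

An empty list means the check found nothing to flag. It does not guarantee that the engine build itself succeeds.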

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for CLIP from the exported ONNX at
``s3://my-bucket/clip/clip_model.onnx`` using ``trt-spec.yaml``. Write the
engine to ``s3://my-bucket/clip/clip_model.engine``. Run on the local Docker backend.

Running Evaluation with a TensorRT Engine

TRT evaluation reports the same bidirectional retrieval metrics as training-time evaluation (R@1, R@5, R@10, mAP, Median Rank, Mean Rank, AUC). TAO supports both combined and separate engine formats. For separate engines, set evaluate.trt_engine to the base path, without the _vision or _text suffix. TAO then loads <base>_vision.engine and <base>_text.engine automatically.
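
The base-path naming for separate engines can be illustrated with a short sketch. It mirrors only the `<base>_vision.engine` and `<base>_text.engine` pattern described above. The helper names `separate_engine_paths` and `missing_engines` are hypothetical, and this is not how TAO resolves the files internally. A check like this may be useful for confirming that both engine files exist before starting an evaluation run.

```python
import os


def separate_engine_paths(base):
    """Return the (vision, text) engine paths derived from a base path."""
    return base + "_vision.engine", base + "_text.engine"


def missing_engines(base):
    """Return the derived engine paths that do not exist on disk."""
    return [p for p in separate_engine_paths(base) if not os.path.isfile(p)]


# Illustrative base path; substitute the value you set in evaluate.trt_engine.
base = "/results/clip_experiment/deploy/clip_model"
vision_path, text_path = separate_engine_paths(base)
print(vision_path)  # /results/clip_experiment/deploy/clip_model_vision.engine
print(text_path)    # /results/clip_experiment/deploy/clip_model_text.engine
```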

You can reuse the TAO evaluation specification for evaluation through a TensorRT engine.

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the CLIP TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` against ``eval-spec.yaml``.
Run on local Docker.

Running Inference with a TensorRT Engine

TRT inference extracts image and text embeddings in the same HDF5 format as PyTorch inference. TAO supports both combined and separate engine formats. When using separate engines, each tower is optional: set only inference.trt_engine (the vision engine) to extract image embeddings, or set only inference.text_file with a text engine to extract text embeddings. Omitting a tower skips its embeddings entirely.

You can reuse the TAO inference specification for inference through a TensorRT engine.

Ask the agent to run the inference action against the engine you built. For example:

Run CLIP inference with the TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` using ``infer-spec.yaml`` and
the prompts in ``s3://my-bucket/clip/prompts.txt``. Run on your chosen
backend.