CLIP with TAO Deploy

To generate an optimized TensorRT engine for CLIP, the gen_trt_engine action takes an ONNX file previously produced by the CLIP export action. For more information about training and exporting a CLIP model, refer to CLIP Training and Deployment.

TAO Deploy supports both combined and separate encoder formats. Refer to Exporting the Model for guidance on which to choose.

gen_trt_engine

gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model.onnx
  trt_engine: /results/clip_experiment/deploy/clip_model.engine
  batch_size: -1
  tensorrt:
    workspace_size: 4096
    data_type: fp16
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 16

| Field | Data type | Description | Default value |
|---|---|---|---|
| gen_trt_engine.onnx_file | string | Path to the input ONNX file. For separate encoders, pass the base path without the _vision or _text suffix; TAO detects both files automatically. | |
| gen_trt_engine.trt_engine | string | Output path for the TRT engine. For separate encoders, TAO writes _vision.engine and _text.engine files. | |
| gen_trt_engine.batch_size | int | Engine batch size. Set to -1 for dynamic batch size. | -1 |
| tensorrt.workspace_size | int | TRT workspace size in megabytes. | 4096 |
| tensorrt.data_type | string | TRT inference precision. Supported values: fp16, fp32. | fp16 |
| tensorrt.min_batch_size | int | Minimum batch size for dynamic batch optimization. | 1 |
| tensorrt.opt_batch_size | int | Optimal batch size for dynamic batch optimization. | 1 |
| tensorrt.max_batch_size | int | Maximum batch size for dynamic batch optimization. | 16 |
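Before building an engine, it can help to sanity-check the spec values. The following is a minimal Python sketch, not part of TAO: the function name check_gen_trt_engine_spec is hypothetical, and the spec is given as a plain dictionary instead of being loaded from the YAML file. It assumes the three batch-size fields feed a TensorRT dynamic-shape profile, where the minimum, optimal, and maximum values must be ordered min <= opt <= max. The fallback defaults are taken from the table above.

```python
from typing import Any, Dict, List

# Precision values listed as supported in the table above.
SUPPORTED_DATA_TYPES = {"fp16", "fp32"}


def check_gen_trt_engine_spec(spec: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems: List[str] = []
    section = spec.get("gen_trt_engine", {})
    trt = section.get("tensorrt", {})

    # Both paths have no default in the table, so they must be provided.
    for key in ("onnx_file", "trt_engine"):
        if not section.get(key):
            problems.append(f"gen_trt_engine.{key} is not set")

    data_type = trt.get("data_type", "fp16")
    if data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data_type: {data_type}")

    # Assumption: these map to a TensorRT optimization profile,
    # which requires min <= opt <= max.
    lo = trt.get("min_batch_size", 1)
    opt = trt.get("opt_batch_size", 1)
    hi = trt.get("max_batch_size", 16)
    if not (1 <= lo <= opt <= hi):
        problems.append(
            f"batch sizes must satisfy 1 <= min <= opt <= max, got {lo}, {opt}, {hi}"
        )
    return problems


# Same values as the example spec at the top of this section.
example_spec = {
    "gen_trt_engine": {
        "onnx_file": "/results/clip_experiment/export/clip_model.onnx",
        "trt_engine": "/results/clip_experiment/deploy/clip_model.engine",
        "batch_size": -1,
        "tensorrt": {
            "workspace_size": 4096,
            "data_type": "fp16",
            "min_batch_size": 1,
            "opt_batch_size": 8,
            "max_batch_size": 16,
        },
    }
}

print(check_gen_trt_engine_spec(example_spec))  # prints []
```

An empty list means the sketch found nothing to flag; it does not guarantee that the engine build itself will succeed.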

Ask the agent to run the gen_trt_engine action against your spec. For example:

Build an FP16 TensorRT engine for CLIP from the exported ONNX at
``s3://my-bucket/clip/clip_model.onnx`` using ``trt-spec.yaml``. Write the
engine to ``s3://my-bucket/clip/clip_model.engine``. Run on the local Docker backend.

Running Evaluation with a TensorRT Engine

TRT evaluation reports the same bidirectional retrieval metrics as training-time evaluation (R@1, R@5, R@10, mAP, Median Rank, Mean Rank, AUC). TAO supports both combined and separate engine formats. For separate engines, set evaluate.trt_engine to the base path (without _vision or _text suffix). TAO loads <base>_vision.engine and <base>_text.engine automatically.
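To illustrate the base-path convention for separate engines, here is a small Python sketch. It assumes the suffix is inserted between the file stem and the extension, so that clip_model.engine resolves to clip_model_vision.engine and clip_model_text.engine. That reading is an assumption about the naming, and the helper separate_engine_paths is hypothetical, not a TAO function. Check the files the gen_trt_engine action actually writes before relying on it.

```python
from pathlib import PurePosixPath
from typing import Tuple


def separate_engine_paths(base_engine: str) -> Tuple[str, str]:
    """Hypothetical helper: derive (vision, text) engine paths from a base path.

    Assumes the naming <stem>_vision<ext> and <stem>_text<ext>.
    """
    base = PurePosixPath(base_engine)
    vision = base.with_name(f"{base.stem}_vision{base.suffix}")
    text = base.with_name(f"{base.stem}_text{base.suffix}")
    return str(vision), str(text)


vision_path, text_path = separate_engine_paths(
    "/results/clip_experiment/deploy/clip_model.engine"
)
print(vision_path)  # /results/clip_experiment/deploy/clip_model_vision.engine
print(text_path)    # /results/clip_experiment/deploy/clip_model_text.engine
```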

You can reuse the TAO evaluation specification for evaluation through a TensorRT engine.

Ask the agent to run the evaluate action against the engine you built. For example:

Evaluate the CLIP TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` against ``eval-spec.yaml``.
Run on local Docker.

Running Inference with a TensorRT Engine

TRT inference extracts image and text embeddings in the same HDF5 format as PyTorch inference. TAO supports both combined and separate engine formats. When using separate engines, each tower is optional: set only inference.trt_engine (vision engine) to extract image embeddings, or set only inference.text_file with a text engine to extract text embeddings. TAO skips the embedding for whichever tower you omit.

You can reuse the TAO inference specification for inference through a TensorRT engine.

Ask the agent to run the inference action against the engine you built. For example:

Run CLIP inference with the TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` using ``infer-spec.yaml`` and
the prompts in ``s3://my-bucket/clip/prompts.txt``. Run on your chosen
backend.