CLIP with TAO Deploy
The gen_trt_engine action generates an optimized TensorRT™ engine for CLIP from an
ONNX file previously produced by the CLIP export action. For more information about
training and exporting a CLIP model, refer to CLIP Training and Deployment.
TAO Deploy supports both combined and separate encoder formats. Refer to Exporting the Model for guidance on which to choose.
gen_trt_engine
gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model.onnx
  trt_engine: /results/clip_experiment/deploy/clip_model.engine
  batch_size: -1
  tensorrt:
    workspace_size: 4096
    data_type: fp16
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 16
| Field | Data type | Description | Default value |
|---|---|---|---|
| onnx_file | string | Path to the input ONNX file. For separate encoders, pass the base path without | |
| trt_engine | string | Output path for the TRT engine. For separate encoders, TAO writes | |
| batch_size | int | Engine batch size. Set to | |
| tensorrt.workspace_size | int | TRT workspace size in megabytes. | |
| tensorrt.data_type | string | TRT inference precision. Supported values: | |
| tensorrt.min_batch_size | int | Minimum batch size for dynamic batch optimization. | |
| tensorrt.opt_batch_size | int | Optimal batch size for dynamic batch optimization. | |
| tensorrt.max_batch_size | int | Maximum batch size for dynamic batch optimization. | |
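TensorRT optimization profiles expect the minimum, optimal, and maximum sizes of a dynamic dimension to be ordered, so it can help to check the three batch-size fields before starting a build. The sketch below is a minimal pre-flight check, assuming the spec has already been loaded into a plain dictionary. The helper name check_batch_profile is hypothetical and is not part of TAO Deploy.

```python
def check_batch_profile(tensorrt_cfg):
    """Return (min, opt, max) batch sizes after checking their ordering.

    Hypothetical helper for illustration; TAO Deploy performs its own checks.
    """
    min_bs = tensorrt_cfg["min_batch_size"]
    opt_bs = tensorrt_cfg["opt_batch_size"]
    max_bs = tensorrt_cfg["max_batch_size"]

    if min_bs < 1:
        raise ValueError("min_batch_size must be at least 1")
    # A dynamic-shape profile needs min <= opt <= max.
    if not (min_bs <= opt_bs <= max_bs):
        raise ValueError(
            f"expected min <= opt <= max, got {min_bs}, {opt_bs}, {max_bs}"
        )
    return min_bs, opt_bs, max_bs


# Values copied from the tensorrt section of the example spec.
tensorrt_cfg = {
    "workspace_size": 4096,
    "data_type": "fp16",
    "min_batch_size": 1,
    "opt_batch_size": 8,
    "max_batch_size": 16,
}

profile = check_batch_profile(tensorrt_cfg)
print(profile)
```

With the example values, the check passes and the three sizes are returned unchanged. Swapping opt_batch_size and max_batch_size would raise a ValueError before any engine build is attempted.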
Ask the agent to run the gen_trt_engine action against your spec. For example:
Build an FP16 TensorRT engine for CLIP from the exported ONNX at
``s3://my-bucket/clip/clip_model.onnx`` using ``trt-spec.yaml``. Write the
engine to ``s3://my-bucket/clip/clip_model.engine``. Run on the local Docker backend.
Running Evaluation with a TensorRT Engine
TRT evaluation reports the same bidirectional retrieval metrics as training-time
evaluation (R@1, R@5, R@10, mAP, Median Rank, Mean Rank, AUC). TAO supports
both combined and separate engine formats. For separate engines, set
evaluate.trt_engine to the base path (without _vision or _text
suffix). TAO loads <base>_vision.engine and <base>_text.engine
automatically.
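The naming convention for separate engines can be sketched as a small path helper. This is an illustration only: the function name separate_engine_paths is hypothetical, and tolerating a base path that still ends in .engine is an assumption made here, not documented TAO behavior.

```python
def separate_engine_paths(base):
    """Derive the vision and text engine paths from a shared base path.

    Hypothetical helper; it mirrors the <base>_vision.engine and
    <base>_text.engine naming described in the text.
    """
    base = str(base)
    # Assumption: accept a base path that still carries the .engine extension.
    if base.endswith(".engine"):
        base = base[: -len(".engine")]
    return f"{base}_vision.engine", f"{base}_text.engine"


vision_path, text_path = separate_engine_paths(
    "/results/clip_experiment/deploy/clip_model"
)
print(vision_path)
print(text_path)
```

A check like this can be useful before launching evaluation, for example to confirm that both files exist at the derived locations.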
You can reuse the TAO evaluation specification for evaluation through a TensorRT engine.
Ask the agent to run the evaluate action against the engine you built. For example:
Evaluate the CLIP TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` against ``eval-spec.yaml``.
Run on local Docker.
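For readers unfamiliar with the retrieval metrics listed in this section, the following sketch shows how R@K, Median Rank, and Mean Rank can be derived from an image-text similarity matrix in both directions. It is a simplified illustration, assuming exactly one matching caption per image (item i matches item i). It is not TAO's evaluation code, and the function names are hypothetical.

```python
from statistics import mean, median


def retrieval_ranks(similarity):
    """For each query row i, return the 1-based rank of its match (column i)."""
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank is 1 plus the number of candidates scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose match appears in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy similarity matrix: rows are images, columns are captions.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]

# Image-to-text uses the rows; text-to-image uses the transposed matrix.
i2t_ranks = retrieval_ranks(sim)
t2i_ranks = retrieval_ranks([list(col) for col in zip(*sim)])

print("image-to-text ranks:", i2t_ranks)  # [1, 2, 1]
print("text-to-image ranks:", t2i_ranks)  # [1, 1, 2]
print("R@1 (image-to-text):", recall_at_k(i2t_ranks, 1))
print("Median Rank:", median(i2t_ranks), "Mean Rank:", mean(i2t_ranks))
```

Real datasets often pair several captions with each image, in which case the rank of the best-scoring matching caption is typically used instead of a single diagonal entry.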
Running Inference with a TensorRT Engine
TRT inference extracts image and text embeddings in the same HDF5 format as
PyTorch inference. TAO supports both combined and separate engine formats.
When using separate engines, each tower is optional. Set only
inference.trt_engine for image embeddings (vision engine) or only
inference.text_file with a text engine to extract text embeddings.
Omitting one tower skips that embedding entirely.
You can reuse the TAO inference specification for inference through a TensorRT engine.
Ask the agent to run the inference action against the engine you built. For example:
Run CLIP inference with the TensorRT engine at
``s3://my-bucket/clip/clip_model.engine`` using ``infer-spec.yaml`` and
the prompts in ``s3://my-bucket/clip/prompts.txt``. Run on your chosen
backend.