CLIP with TAO Deploy#
Pass a CLIP .onnx file—exported using tao model clip export—to
tao deploy clip gen_trt_engine to generate an optimized TensorRT™ engine. For more information about training and
exporting a CLIP model, refer to CLIP Training and Deployment.
TAO Deploy supports both combined and separate encoder formats. Refer to Exporting the Model for guidance on which to choose.
gen_trt_engine#
gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model.onnx
  trt_engine: /results/clip_experiment/deploy/clip_model.engine
  batch_size: -1
  tensorrt:
    workspace_size: 4096
    data_type: fp16
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 16
| Field | Data type | Description | Default value |
|---|---|---|---|
| `onnx_file` | string | Path to the input ONNX file. For separate encoders, pass the base path without the `_vision`/`_text` suffix. | |
| `trt_engine` | string | Output path for the TRT engine. For separate encoders, TAO writes `<base>_vision.engine` and `<base>_text.engine`. | |
| `batch_size` | int | Engine batch size. Set to `-1` to use the dynamic batch size range below. | |
| `workspace_size` | int | TRT workspace size in megabytes. | |
| `data_type` | string | TRT inference precision. Supported values include `fp32` and `fp16`. | |
| `min_batch_size` | int | Minimum batch size for dynamic batch optimization. | |
| `opt_batch_size` | int | Optimal batch size for dynamic batch optimization. | |
| `max_batch_size` | int | Maximum batch size for dynamic batch optimization. | |
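For separate encoders, a spec might look like the sketch below. The base paths follow the `<base>_vision`/`<base>_text` naming convention described under evaluation; the exact file names TAO reads and writes here are an assumption:

```yaml
# Hypothetical separate-encoder spec: base paths carry no _vision/_text
# suffix. Assumed behavior: TAO reads the per-tower ONNX files and writes
# <base>_vision.engine and <base>_text.engine next to the base path.
gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model    # base path, no suffix
  trt_engine: /results/clip_experiment/deploy/clip_model   # base path, no suffix
  batch_size: -1
  tensorrt:
    data_type: fp16
```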
tao deploy clip gen_trt_engine \
--workspace-id $WORKSPACE_ID \
--specs @gen_trt_engine_spec.yaml
tao deploy clip gen_trt_engine -e /path/to/experiment_spec.yaml
Sample Usage
tao deploy clip gen_trt_engine -e /path/to/experiment_spec.yaml \
gen_trt_engine.onnx_file=/results/clip_experiment/export/clip_model.onnx \
gen_trt_engine.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
gen_trt_engine.tensorrt.data_type=fp16
Running Evaluation with a TensorRT Engine#
TRT evaluation reports the same bidirectional retrieval metrics as training-time
evaluation (R@1, R@5, R@10, mAP, Median Rank, Mean Rank, AUC). TAO supports
both combined and separate engine formats. For separate engines, set
evaluate.trt_engine to the base path (without _vision or _text
suffix). TAO loads <base>_vision.engine and <base>_text.engine
automatically.
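The retrieval metrics named above are all derived from a ranked similarity matrix. A minimal pure-Python sketch (toy data, not TAO's implementation) of R@K and Median Rank for the image-to-text direction, assuming row `i` scores image `i` against every text and the matching text sits in column `i`:

```python
import statistics

# Toy image-to-text similarity matrix: sims[i][j] scores image i
# against text j; the matching text for image i sits in column i.
sims = [
    [0.90, 0.20, 0.10],   # correct text ranked 1st
    [0.30, 0.40, 0.80],   # correct text ranked 2nd
    [0.70, 0.65, 0.60],   # correct text ranked 3rd
]

def rank_of_match(scores, gt):
    # 1-based rank of the ground-truth score in descending order (ties ignored)
    return 1 + sum(s > scores[gt] for s in scores)

def recall_at_k(ranks, k):
    # fraction of queries whose correct match appears in the top k
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [rank_of_match(row, i) for i, row in enumerate(sims)]
print("ranks:", ranks)                            # [1, 2, 3]
print("R@1:", round(recall_at_k(ranks, 1), 3))    # 0.333
print("Median Rank:", statistics.median(ranks))   # 2
```

The text-to-image direction is the same computation on the transposed matrix; TAO reports both.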
You can reuse the TAO evaluation specification for evaluation through a TensorRT engine.
tao deploy clip evaluate \
--workspace-id $WORKSPACE_ID \
--specs @eval_spec.yaml
tao deploy clip evaluate -e /path/to/experiment_spec.yaml
Sample Usage
tao deploy clip evaluate -e /path/to/experiment_spec.yaml \
evaluate.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
results_dir=/results/clip_experiment/trt_eval
Running Inference with a TensorRT Engine#
TRT inference extracts image and text embeddings in the same HDF5 format as
PyTorch inference. TAO supports both combined and separate engine formats.
When using separate engines, each tower is optional. Set only
inference.trt_engine for image embeddings (vision engine) or only
inference.text_file with a text engine to extract text embeddings.
Omitting one tower skips that embedding entirely.
You can reuse the TAO inference specification for inference through a TensorRT engine.
tao deploy clip inference \
--workspace-id $WORKSPACE_ID \
--specs @inference_spec.yaml
tao deploy clip inference -e /path/to/experiment_spec.yaml
Sample Usage
tao deploy clip inference -e /path/to/experiment_spec.yaml \
inference.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
inference.text_file=/data/prompts.txt \
results_dir=/results/clip_experiment/trt_inference
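Downstream, the extracted image and text embeddings are typically compared by cosine similarity, e.g. for zero-shot prompt ranking. A minimal pure-Python sketch with toy vectors (reading the real vectors from the HDF5 output is left out, and the embedding values here are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings standing in for vectors read from the HDF5 output.
image_emb = [0.8, 0.1, 0.6]
text_embs = {
    "a photo of a cat": [0.7, 0.2, 0.7],
    "a photo of a dog": [0.1, 0.9, 0.2],
}

# Rank prompts by similarity to the image embedding, best first.
ranking = sorted(text_embs, key=lambda p: cosine(image_emb, text_embs[p]),
                 reverse=True)
print(ranking[0])  # prints "a photo of a cat"
```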