CLIP with TAO Deploy#

Pass a CLIP `.onnx` file, exported using `tao model clip export`, to `tao deploy clip gen_trt_engine` to generate an optimized TensorRT engine. For more information about training and exporting a CLIP model, refer to CLIP Training and Deployment.

TAO Deploy supports both combined and separate encoder formats. Refer to Exporting the Model for guidance on which to choose.
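For separate encoders, the base-path convention means that a single base path resolves to two files with `_vision` and `_text` suffixes. A minimal sketch of that naming rule (plain Python; the helper name is illustrative and not part of TAO):

```python
from pathlib import Path

def separate_encoder_paths(base: str) -> tuple[str, str]:
    """Derive the vision/text ONNX paths from a suffix-free base path.

    Illustrative helper only: it mirrors the documented naming convention
    (<base>_vision.onnx and <base>_text.onnx) and accepts the base path
    with or without a trailing ".onnx".
    """
    stem = Path(base).with_suffix("")  # drop ".onnx" if present
    vision = stem.parent / f"{stem.name}_vision.onnx"
    text = stem.parent / f"{stem.name}_text.onnx"
    return str(vision), str(text)

vision, text = separate_encoder_paths("/results/clip_experiment/export/clip_model.onnx")
```

Here `vision` and `text` resolve to `clip_model_vision.onnx` and `clip_model_text.onnx` in the export directory, which is the pair TAO detects automatically when you pass the base path.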

gen_trt_engine#

```yaml
gen_trt_engine:
  onnx_file: /results/clip_experiment/export/clip_model.onnx
  trt_engine: /results/clip_experiment/deploy/clip_model.engine
  batch_size: -1
  tensorrt:
    workspace_size: 4096
    data_type: fp16
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 16
```

| Field | Data type | Description | Default value |
|---|---|---|---|
| `gen_trt_engine.onnx_file` | string | Path to the input ONNX file. For separate encoders, pass the base path without the `_vision` or `_text` suffix; TAO detects both files automatically. | |
| `gen_trt_engine.trt_engine` | string | Output path for the TensorRT engine. For separate encoders, TAO writes `_vision.engine` and `_text.engine` files. | |
| `gen_trt_engine.batch_size` | int | Engine batch size. Set to -1 for a dynamic batch size. | -1 |
| `tensorrt.workspace_size` | int | TensorRT workspace size in megabytes. | 4096 |
| `tensorrt.data_type` | string | TensorRT inference precision. Supported values: `fp16`, `fp32`. | fp16 |
| `tensorrt.min_batch_size` | int | Minimum batch size for dynamic batch optimization. | 1 |
| `tensorrt.opt_batch_size` | int | Optimal batch size for dynamic batch optimization. | 1 |
| `tensorrt.max_batch_size` | int | Maximum batch size for dynamic batch optimization. | 16 |
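The three dynamic-batch fields together describe a TensorRT optimization profile, which requires min ≤ opt ≤ max (with `batch_size` set to -1). A quick sanity check of such a profile, as a plain-Python sketch (the function is illustrative, not a TAO or TensorRT API):

```python
def check_profile(min_bs: int, opt_bs: int, max_bs: int) -> None:
    # A dynamic-batch optimization profile must satisfy 1 <= min <= opt <= max.
    if not (1 <= min_bs <= opt_bs <= max_bs):
        raise ValueError(
            f"invalid batch profile: need 1 <= {min_bs} <= {opt_bs} <= {max_bs}"
        )

check_profile(1, 8, 16)  # the values from the sample spec above
```

Choosing `opt_batch_size` close to your typical serving batch lets TensorRT tune kernels for the common case while still accepting any batch in the [min, max] range.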

Use the following command to generate a TensorRT engine:

```shell
tao deploy clip gen_trt_engine -e /path/to/experiment_spec.yaml
```

Sample Usage

```shell
tao deploy clip gen_trt_engine -e /path/to/experiment_spec.yaml \
  gen_trt_engine.onnx_file=/results/clip_experiment/export/clip_model.onnx \
  gen_trt_engine.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
  gen_trt_engine.tensorrt.data_type=fp16
```

Running Evaluation with a TensorRT Engine#

TensorRT evaluation reports the same bidirectional retrieval metrics as training-time evaluation (R@1, R@5, R@10, mAP, Median Rank, Mean Rank, and AUC). TAO supports both combined and separate engine formats. For separate engines, set `evaluate.trt_engine` to the base path (without the `_vision` or `_text` suffix); TAO loads `<base>_vision.engine` and `<base>_text.engine` automatically.
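The rank-based metrics above can be computed from an image-text similarity matrix whose diagonal holds the matched pairs. A sketch of the standard definitions (NumPy; this illustrates R@K, median/mean rank, and single-relevant-item mAP in general, not TAO's exact implementation, and it omits AUC):

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Retrieval metrics for an [N, N] similarity matrix where row i's
    ground-truth match is column i (one relevant item per query)."""
    n = sim.shape[0]
    # For each query row, sort candidates by descending similarity and
    # find the 0-based position (rank) of the ground-truth match.
    order = np.argsort(-sim, axis=1)
    ranks = np.array([int(np.where(order[i] == i)[0][0]) for i in range(n)])
    metrics = {f"R@{k}": float(np.mean(ranks < k)) for k in ks}
    metrics["MedianRank"] = float(np.median(ranks) + 1)  # 1-based rank
    metrics["MeanRank"] = float(np.mean(ranks) + 1)
    # With exactly one relevant item, average precision reduces to 1/rank.
    metrics["mAP"] = float(np.mean(1.0 / (ranks + 1)))
    return metrics
```

Running this in both directions (image-to-text on `sim`, text-to-image on `sim.T`) gives the bidirectional numbers reported above.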

You can reuse the TAO evaluation specification for evaluation through a TensorRT engine.

Use the following command to run evaluation with a TensorRT engine:

```shell
tao deploy clip evaluate -e /path/to/experiment_spec.yaml
```

Sample Usage

```shell
tao deploy clip evaluate -e /path/to/experiment_spec.yaml \
  evaluate.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
  results_dir=/results/clip_experiment/trt_eval
```

Running Inference with a TensorRT Engine#

TensorRT inference extracts image and text embeddings in the same HDF5 format as PyTorch inference. TAO supports both combined and separate engine formats. With separate engines, each tower is optional: set only `inference.trt_engine` (the vision engine) to extract image embeddings, or only `inference.text_file` together with a text engine to extract text embeddings. Omitting a tower skips that embedding entirely.
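Once both embedding sets are extracted, a typical downstream step is cosine-similarity matching between image and text embeddings. A minimal sketch (NumPy; the random arrays below are stand-ins for embeddings loaded from the HDF5 output, since the exact dataset names inside that file are not specified here):

```python
import numpy as np

# Stand-ins for embeddings read from the inference HDF5 output
# (3 images, 5 text prompts, 512-dim embeddings; all values illustrative).
rng = np.random.default_rng(0)
image_emb = rng.standard_normal((3, 512)).astype(np.float32)
text_emb = rng.standard_normal((5, 512)).astype(np.float32)

# CLIP-style embeddings are compared by cosine similarity:
# L2-normalize each row, then take the dot product.
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)
similarity = image_emb @ text_emb.T        # shape (num_images, num_texts)
best_prompt = similarity.argmax(axis=1)    # best-matching prompt per image
```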

You can reuse the TAO inference specification for inference through a TensorRT engine.

Use the following command to run inference with a TensorRT engine:

```shell
tao deploy clip inference -e /path/to/experiment_spec.yaml
```

Sample Usage

```shell
tao deploy clip inference -e /path/to/experiment_spec.yaml \
  inference.trt_engine=/results/clip_experiment/deploy/clip_model.engine \
  inference.text_file=/data/prompts.txt \
  results_dir=/results/clip_experiment/trt_inference
```