Model Export to TensorRT-LLM

To enable the export stage for a CLIP model, configure the following files:

  1. In the defaults section of conf/config.yaml, update the export field to point to the desired CLIP configuration file. For example, if you want to use the clip/export_clip configuration, change the export field to clip/export_clip.

    defaults:
      - export: clip/export_clip
    ...


  2. In the stages field of conf/config.yaml, make sure the export stage is included. For example,

    stages:
      - export
    ...


  3. In conf/export/clip/export_clip.yaml, set infer.max_batch_size to the maximum batch size for the generated ONNX and NVIDIA TensorRT models.

  4. Set the model resolution with max_dim in the infer field. You can also set infer.max_text, the maximum text sequence length for the text encoder. These values are used when generating the ONNX and NVIDIA TensorRT engines.
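Taken together, steps 3 and 4 might yield an infer section in conf/export/clip/export_clip.yaml along the following lines. The field names (max_batch_size, max_dim, max_text) come from the steps above; the values shown are illustrative assumptions, not documented defaults:

```yaml
infer:
  max_batch_size: 8   # max batch size for the ONNX and TensorRT models (example value)
  max_dim: 224        # image resolution (example value)
  max_text: 64        # max text sequence length for the text encoder (example value)
```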

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/export/clip/export_clip.yaml to the path of the pretrained checkpoint in .nemo format.
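A minimal sketch of that model section follows; the checkpoint path is a hypothetical placeholder, to be replaced with the actual location of your .nemo file:

```yaml
model:
  restore_from_path: /path/to/checkpoints/clip.nemo  # hypothetical path to the pretrained .nemo checkpoint
```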

© Copyright 2023-2024, NVIDIA. Last updated on Feb 22, 2024.