Model Export to TensorRT

For text-to-image models, the export script generates three optimized inference models: the VAE Decoder, the UNet, and the CLIP Encoder.

  1. In the defaults section of conf/config.yaml, update the export field to point to the desired Stable Diffusion export configuration file. For example, if you want to use the stable_diffusion/export_stable_diffusion.yaml configuration, change the export field to stable_diffusion/export_stable_diffusion.

    defaults:
      - export: stable_diffusion/export_stable_diffusion
      ...

  2. In the stages field of conf/config.yaml, make sure the export stage is included. For example:

    stages:
      - export
      ...

  3. Set infer.num_images_per_prompt in conf/export/stable_diffusion/export_stable_diffusion.yaml to the batch size to use for the ONNX and NVIDIA TensorRT models, as shown in the sketch below.
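
    A minimal sketch of the relevant field follows; the value shown and any surrounding keys are illustrative and may differ in your configuration file:

    infer:
      num_images_per_prompt: 4  # batch size used for the exported ONNX and TensorRT models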

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/export/stable_diffusion/export_stable_diffusion.yaml to the path of the pretrained checkpoint in .nemo format, for example as sketched below.
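
    A minimal sketch; the checkpoint path is a placeholder:

    model:
      restore_from_path: /path/to/stable_diffusion.nemo  # pretrained checkpoint in .nemo format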
