Model Export to TensorRT-LLM

For text-to-image models, the export script generates two optimized inference models: the UNet and the T5 encoder. The script generates a separate UNet model for each target resolution (e.g., 64x64, 256x256, 1024x1024).

  1. In the defaults section of conf/config.yaml, update the export field to point to the desired Imagen inference configuration file. For example, to use the imagen/export_imagen.yaml configuration, change the export field to imagen/export_imagen.


    defaults:
      - export: imagen/export_imagen
      ...


  2. In the stages field of conf/config.yaml, make sure the export stage is included. For example,


    stages:
      - export
      ...


  3. Configure infer.num_images_per_prompt in conf/export/imagen/export_imagen.yaml to set the batch size used for the ONNX and NVIDIA TensorRT models, as shown in the sketch below.
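
A minimal sketch of the relevant portion of conf/export/imagen/export_imagen.yaml; the surrounding key layout is an assumption and may differ between releases, but infer.num_images_per_prompt is the field that determines the batch size of the exported engines:

    infer:
      num_images_per_prompt: 4  # batch size used when building the ONNX and TensorRT models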

Remarks:

  1. To load a pretrained checkpoint for inference, set the base_ckpt, sr256_ckpt, and sr1024_ckpt fields in the model.customized_model section of conf/export/imagen/export_imagen.yaml to the paths of the pretrained checkpoints in .nemo format. Also make sure model.target_resolution is set to the desired resolution. See the sketch below.
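
A hedged example of these fields in conf/export/imagen/export_imagen.yaml; the checkpoint paths are placeholders, and the exact key layout should be verified against the configuration file shipped with your release:

    model:
      target_resolution: 256                 # desired resolution, e.g. 64, 256, or 1024
      customized_model:
        base_ckpt: /path/to/base64.nemo      # 64x64 base model checkpoint
        sr256_ckpt: /path/to/sr256.nemo      # 256x256 super-resolution checkpoint
        sr1024_ckpt: /path/to/sr1024.nemo    # 1024x1024 super-resolution checkpoint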
