Training with Predefined Configurations


InstructPix2Pix fine-tunes an existing Stable Diffusion checkpoint. The recommended configuration can be found in the conf/training/instruct_pix2pix directory. You can modify its parameters to customize the hyperparameters for your specific training requirements.

To enable the training stage with an InstructPix2Pix model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the training field to point to the desired InstructPix2Pix configuration file. For example, to use 860m_sd_edit, change the training field to instruct_pix2pix/860m_sd_edit.


    defaults:
      - _self_
      - cluster: bcm
      - data_preparation: null
      - training: instruct_pix2pix/860m_sd_edit
      ...

  2. In the stages field of conf/config.yaml, make sure the training stage is included. For example:


    stages:
      - data_preparation
      - training
      ...

  3. Execute the launcher pipeline: python3
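As an illustration of this step (the entry-point script name below is an assumption, not stated in this section; check your launcher installation for the actual script):

```shell
# Hypothetical invocation -- substitute the launcher's real entry-point script.
python3 main.py
```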


  1. You can feed a trained Stable Diffusion checkpoint into InstructPix2Pix training by specifying training.model.ckpt_path (or by setting the ckpt_path field in the model section of 860m_sd_edit.yaml). The checkpoint can be sourced from either NeMo or Hugging Face as a .ckpt file.
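For example, the override can be written directly into the model section of the training config. The checkpoint path below is a placeholder, not a real file:

```yaml
# conf/training/instruct_pix2pix/860m_sd_edit.yaml (placeholder path shown)
model:
  ckpt_path: /path/to/stable_diffusion.ckpt
```

Equivalently, the same value can be passed as a Hydra-style override (training.model.ckpt_path=...) when invoking the launcher.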

  2. In order to train InstructPix2Pix, a pretrained Stable Diffusion model is required. Note, however, that only the UNet component needs to be fine-tuned, while AutoencoderKL and CLIP remain unaltered. We recommend training the base Stable Diffusion model with AutoencoderKL and CLIP, using the pretrained checkpoints for initialization. Please be advised that the download scripts NVIDIA provides are optional to use and will download models based on public data, which may contain copyrighted material. Consult your legal department before using these scripts.
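The freezing pattern described above can be sketched in plain PyTorch. This is an illustrative sketch only, not the NeMo implementation: the module and function names are hypothetical stand-ins, and the tiny linear layers merely mark where the real UNet, AutoencoderKL, and CLIP components would sit.

```python
# Illustrative sketch (assumes PyTorch; not NeMo's actual API):
# fine-tune only the UNet while AutoencoderKL and CLIP stay frozen.
import torch.nn as nn


class TinySDModel(nn.Module):
    """Hypothetical stand-in for a Stable Diffusion model's three parts."""

    def __init__(self):
        super().__init__()
        self.unet = nn.Linear(8, 8)          # stand-in for the UNet
        self.vae = nn.Linear(8, 8)           # stand-in for AutoencoderKL
        self.text_encoder = nn.Linear(8, 8)  # stand-in for CLIP


def freeze_all_but_unet(model: nn.Module) -> list:
    """Disable gradients everywhere except the `unet` submodule and
    return only the parameters the optimizer should update."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("unet.")
    return [p for p in model.parameters() if p.requires_grad]


model = TinySDModel()
trainable = freeze_all_but_unet(model)
# Only the UNet's weight and bias remain trainable.
print(len(trainable))  # 2
```

An optimizer would then be constructed over `trainable` only, so gradient updates never touch the frozen autoencoder or text encoder.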

Last updated on May 30, 2024.