Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Training with Predefined Configurations
InstructPix2Pix training is fine-tuning on top of an existing Stable Diffusion checkpoint. The recommended configuration can be found in the conf/training/instruct_pix2pix directory. You can modify its parameters to adjust the hyperparameters to your specific training requirements.
To enable the training stage with an InstructPix2Pix model, update the configuration files as follows:
In the defaults section of conf/config.yaml, update the training field to point to the desired InstructPix2Pix configuration file. For example, if you want to use 860m_sd_edit, change the training field to instruct_pix2pix/860m_sd_edit.

defaults:
  - _self_
  - cluster: bcm
  - data_preparation: null
  - training: instruct_pix2pix/860m_sd_edit
  ...
In the stages field of conf/config.yaml, make sure the training stage is included. For example:

stages:
  - data_preparation
  - training
  ...
Execute the launcher pipeline: python3 main.py.
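As an alternative to editing conf/config.yaml by hand, the same two settings can be sketched as command-line overrides. The snippet below is a minimal illustration that assumes main.py accepts Hydra-style key=value overrides; the helper name build_launcher_command is hypothetical and not part of the launcher. It only assembles and prints the command, so check the result against your launcher version before running it.

```python
import shlex


def build_launcher_command(training_config, stages, extra_overrides=None):
    """Assemble an argv list for the launcher.

    Assumption: main.py accepts Hydra-style key=value overrides.
    This helper is hypothetical and only builds the command; it does not run it.
    """
    args = ["python3", "main.py", f"training={training_config}"]
    # Hydra-style list override, e.g. stages=[training]
    args.append("stages=[" + ",".join(stages) + "]")
    # Any further overrides are appended as key=value pairs.
    for key, value in (extra_overrides or {}).items():
        args.append(f"{key}={value}")
    return args


command = build_launcher_command(
    training_config="instruct_pix2pix/860m_sd_edit",
    stages=["training"],
)

# shlex.join quotes arguments that contain shell-special characters
# such as the square brackets in the stages override.
print(shlex.join(command))
```

Returning a list instead of a single string keeps each override as one argument, which avoids quoting problems if the list is later passed to subprocess.run.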
Remarks:
You can feed a trained Stable Diffusion checkpoint into InstructPix2Pix training by specifying training.model.ckpt_path (or by setting the ckpt_path field in the model section of 860m_sd_edit.yaml). The checkpoint can be sourced from either NeMo or Hugging Face in the form of a .ckpt file.

Training InstructPix2Pix requires a pretrained Stable Diffusion model. However, only the UNet component needs to be fine-tuned, while AutoencoderKL and CLIP remain unaltered. We recommend training the base Stable Diffusion model with AutoencoderKL and CLIP, using the pretrained checkpoints for initialization.

Please be advised that the download scripts NVIDIA provides are optional to use and will download models that are based on public data, which may contain copyrighted material. Consult your legal department before using these scripts.
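To illustrate the remark that only the UNet is fine-tuned while AutoencoderKL and CLIP stay fixed, the sketch below sorts parameter names into a trainable and a frozen group. The key prefixes follow the layout of the original Stable Diffusion .ckpt files and are an assumption here; a NeMo checkpoint may name its components differently. The helper names are hypothetical, and this is not how NeMo implements freezing.

```python
# Assumed key prefixes, taken from the original Stable Diffusion checkpoint
# layout. Inspect your own checkpoint's keys before relying on them.
COMPONENT_PREFIXES = {
    "unet": "model.diffusion_model.",
    "autoencoder_kl": "first_stage_model.",
    "clip": "cond_stage_model.",
}

# Only the UNet is updated during InstructPix2Pix training.
TRAINABLE_COMPONENTS = {"unet"}


def component_of(param_name):
    """Return the component a parameter belongs to, or "other" if unknown."""
    for component, prefix in COMPONENT_PREFIXES.items():
        if param_name.startswith(prefix):
            return component
    return "other"


def split_trainable(param_names):
    """Split names into (trainable, frozen); unknown names count as frozen."""
    trainable, frozen = [], []
    for name in param_names:
        if component_of(name) in TRAINABLE_COMPONENTS:
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


# Illustrative parameter names only, not read from a real checkpoint.
example_names = [
    "model.diffusion_model.input_blocks.0.0.weight",
    "first_stage_model.encoder.conv_in.weight",
    "cond_stage_model.transformer.text_model.embeddings.position_ids",
]

trainable, frozen = split_trainable(example_names)
print("trainable:", trainable)
print("frozen:", frozen)
```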