Training with Predefined Configurations


InstructPix2Pix fine-tunes an existing Stable Diffusion checkpoint. The recommended configuration can be found in the conf/training/instruct_pix2pix directory. You can modify its parameters to customize the hyperparameters for your specific training requirements.

To enable the training stage with an InstructPix2Pix model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the training field to point to the desired InstructPix2Pix configuration file. For example, to use 860m_sd_edit, change the training field to instruct_pix2pix/860m_sd_edit.


    defaults:
      - _self_
      - cluster: bcm
      - data_preparation: null
      - training: instruct_pix2pix/860m_sd_edit
      ...

  2. In the stages field of conf/config.yaml, make sure the training stage is included. For example:


    stages:
      - data_preparation
      - training
      ...

  3. Execute the launcher pipeline: python3
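As an illustration of this step (the entry-point script name below is an assumption, not stated in this section; check your launcher installation for the actual script):

```shell
# Hypothetical invocation -- substitute the launcher's real entry-point script.
python3 main.py
```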


  1. You can feed a trained Stable Diffusion checkpoint into InstructPix2Pix training by specifying training.model.ckpt_path (or by setting the ckpt_path field in the model section of 860m_sd_edit.yaml). The checkpoint can be sourced from either NeMo or Hugging Face as a .ckpt file.
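For example, the override can be written directly into the model section of the training config. The checkpoint path below is a placeholder, not a real file:

```yaml
# conf/training/instruct_pix2pix/860m_sd_edit.yaml (placeholder path shown)
model:
  ckpt_path: /path/to/stable_diffusion.ckpt
```

Equivalently, the same value can be passed as a Hydra-style override (training.model.ckpt_path=...) when invoking the launcher.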

  2. In order to train InstructPix2Pix, a pretrained Stable Diffusion model is required. Note, however, that only the UNet component needs to be fine-tuned, while AutoencoderKL and CLIP remain unaltered. We recommend training the base Stable Diffusion model with AutoencoderKL and CLIP, using the pretrained checkpoints for initialization. Please be advised that the download scripts NVIDIA provides are optional to use and will download models based on public data, which may contain copyrighted material. Consult your legal department before using these scripts.
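The freezing pattern described above can be sketched in plain PyTorch. This is an illustrative sketch only, not the NeMo implementation: the module and function names are hypothetical stand-ins, and the tiny linear layers merely mark where the real UNet, AutoencoderKL, and CLIP components would sit.

```python
# Illustrative sketch (assumes PyTorch; not NeMo's actual API):
# fine-tune only the UNet while AutoencoderKL and CLIP stay frozen.
import torch.nn as nn


class TinySDModel(nn.Module):
    """Hypothetical stand-in for a Stable Diffusion model's three parts."""

    def __init__(self):
        super().__init__()
        self.unet = nn.Linear(8, 8)          # stand-in for the UNet
        self.vae = nn.Linear(8, 8)           # stand-in for AutoencoderKL
        self.text_encoder = nn.Linear(8, 8)  # stand-in for CLIP


def freeze_all_but_unet(model: nn.Module) -> list:
    """Disable gradients everywhere except the `unet` submodule and
    return only the parameters the optimizer should update."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("unet.")
    return [p for p in model.parameters() if p.requires_grad]


model = TinySDModel()
trainable = freeze_all_but_unet(model)
# Only the UNet's weight and bias remain trainable.
print(len(trainable))  # 2
```

An optimizer would then be constructed over `trainable` only, so gradient updates never touch the frozen autoencoder or text encoder.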

Last updated on May 30, 2024.