Training with Predefined Configurations

NeMo DreamBooth fine-tunes an existing Stable Diffusion checkpoint. The recommended configuration is in the conf/training/dreambooth directory. You can modify its parameters to adjust the hyperparameters to your specific training requirements.

The instance dataset should contain several pictures of the object you want to inject into the model. For better quality, use 3-5 pictures taken from different angles.

To enable the training stage with a DreamBooth model, complete the following steps:

In the defaults section, update the training field to point to the desired configuration file. For example, dreambooth/860m.yaml.

```
defaults:
  - _self_
  - cluster: bcm
  - data_preparation: null
  - training: dreambooth/860m.yaml
  ...
```

In the stages field, make sure the training stage is included. For example:
```
stages:
  ...
  - training
  ...
```
DreamBooth training can be optimized by using cached latents. This approach boosts training throughput by 75% while reducing GPU memory consumption. To activate this feature, append training.model.use_cached_latents=True to your launch command or set the same value in the config file.

When you specify instance_dir, leave cached_instance_dir null, and set use_cached_latents to True, the latent representations of your input images are computed and stored locally at the path {instance_dir}_cached before training. This preprocessing may take a short time. Once caching is complete, training automatically continues using the cached latents.
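
As a rough sketch of the cached-latents settings described above, the relevant part of the training config might look like the following. The placement of instance_dir and cached_instance_dir under a data block and the /datasets/instance_dir path are assumptions for illustration. Check the YAML file in conf/training/dreambooth for the actual layout.

```yaml
# Sketch of a fragment of the training config (for example dreambooth/860m.yaml).
model:
  use_cached_latents: True      # same effect as training.model.use_cached_latents=True
  data:                         # assumed location of the dataset keys
    instance_dir: /datasets/instance_dir   # hypothetical path to the instance images
    cached_instance_dir: null   # left null, so latents go to {instance_dir}_cached
```

With these values, the latents would be cached at /datasets/instance_dir_cached before training starts.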

Remarks:

To train DreamBooth with a prior preservation loss, you need to prepare a regularization dataset. The regularization dataset is usually populated with images generated from a similar prompt without the special token, using the original Stable Diffusion checkpoint that you are fine-tuning. For example, if the instance prompt you are training on is “a photo of a sks dog”, the regularization data could be generated with a prompt like “a photo of a dog”.
To generate regularization images, pass the Stable Diffusion checkpoint you want to use to training.model.restore_from_path. Note that a .nemo checkpoint is required here. The U-Net weights you want to fine-tune should be set in training.model.unet_config.from_pretrained. You can follow the same procedure as described above in the Stable Diffusion Training section.
Training DreamBooth requires a pretrained Stable Diffusion model. Note, however, that only the U-Net component is fine-tuned, while AutoencoderKL and CLIP remain unaltered. We recommend training the base Stable Diffusion model with AutoencoderKL and CLIP, using the pretrained checkpoints for initialization. See Stable Diffusion Training for details. Please be advised that the scripts NVIDIA provides are optional to use and will download models that are based on public data, which may contain copyrighted material. Consult your legal department before using these scripts.
By default, DreamBooth training results are not stored in the NeMo checkpoint format. However, a customized conversion stage is available to convert DreamBooth checkpoint files to ‘.nemo’, enabling compatibility with Stable Diffusion inference pipelines.
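
The checkpoint settings from the remarks above can be sketched as a config fragment. Only restore_from_path and unet_config.from_pretrained are named in this section. Both paths are placeholders, and the data block with its prompt key names is an assumption used to illustrate the prior preservation example, so verify the names against the shipped config.

```yaml
# Sketch of a fragment of the training config; the paths are placeholders.
model:
  restore_from_path: /path/to/stable_diffusion.nemo   # must be a .nemo checkpoint
  unet_config:
    from_pretrained: /path/to/unet_weights            # U-Net weights to fine-tune
  data:                                               # assumed location and key names
    instance_prompt: a photo of a sks dog             # prompt with the special token
    regularization_prompt: a photo of a dog           # same prompt without the token
```

The same values can also be passed on the launch command as overrides, for example training.model.restore_from_path=/path/to/stable_diffusion.nemo.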