Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Data Preparation

Note

It is the responsibility of each user to check the content of the dataset, review the applicable licenses, and determine if it is suitable for their intended use. Users should review any applicable links associated with the dataset before placing the data on their machine.

ControlNet requires an extra conditioning input, supplied as an image. Following Stable Diffusion Dataset Preparation, organize the dataset into tar files in the following way:

controlnet0001.tar
|---- 00000.png (conditioning image)
|---- 00000.jpg (target image)
|---- 00000.txt (text prompt)
|---- 00001.png (conditioning image)
|---- 00001.jpg (target image)
|---- 00001.txt (text prompt)
...
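
As an illustration of the layout above, the following is a minimal sketch that packs samples into one tar file using only the Python standard library. The helper names `pack_shard` and `list_shard` and the input format (a list of conditioning image path, target image path, and prompt string) are assumptions made for this example and are not part of NeMo. It assumes the conditioning images are already PNG files and the target images are already JPEG files; no image conversion is performed.

```python
import io
import tarfile


def pack_shard(samples, tar_path):
    """Write (conditioning_png, target_jpg, prompt) triples into one tar file.

    Each sample gets a shared zero-padded key so its three files stay grouped:
    00000.png (conditioning image), 00000.jpg (target image), 00000.txt (text prompt).
    """
    with tarfile.open(tar_path, "w") as tar:
        for index, (cond_png, target_jpg, prompt) in enumerate(samples):
            key = f"{index:05d}"
            # Image files are copied from disk under their new archive names.
            tar.add(str(cond_png), arcname=f"{key}.png")
            tar.add(str(target_jpg), arcname=f"{key}.jpg")
            # The prompt lives in memory, so build its tar entry by hand.
            data = prompt.encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def list_shard(tar_path):
    """Return member names in archive order, for a quick layout check."""
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

Calling `pack_shard` with two samples and then `list_shard` on the result should list the members in the order shown in the layout above: the `.png`, `.jpg`, and `.txt` files for `00000`, followed by those for `00001`.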

To use segmentation maps as the conditioning input, the conditioning images can be generated with a detector model, and the text prompts can be derived from BLIP captioning. For further guidance on preparing your own dataset, you may find the [ControlNet](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md) training documentation helpful.