Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Data Preparation

Note

It is the responsibility of each user to check the content of the dataset, review the applicable licenses, and determine if it is suitable for their intended use. Users should review any applicable links associated with the dataset before placing the data on their machine.

ControlNet requires an extra conditioning input, supplied as an image. Following the conventions in Stable Diffusion Dataset Preparation, organize the dataset into tar files as follows:

controlnet0001.tar
|---- 00000.png (conditioning image)
|---- 00000.jpg (target image)
|---- 00000.txt (text prompt)
|---- 00001.png (conditioning image)
|---- 00001.jpg (target image)
|---- 00001.txt (text prompt)
...
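
The layout above can be produced with Python's standard `tarfile` module. The following is a minimal sketch, not part of NeMo: the function names `pack_controlnet_shard` and `incomplete_keys` are hypothetical, and the sketch assumes each sample is already available as raw PNG bytes, raw JPEG bytes, and a prompt string. It writes the three members of each sample under a shared five-digit key, then checks that no sample is missing a member.

```python
import io
import tarfile
from pathlib import Path


def pack_controlnet_shard(samples, tar_path):
    """Write samples into one tar shard and return the number written.

    `samples` is an iterable of (png_bytes, jpg_bytes, prompt_text) tuples
    (an assumed input format for this sketch). Members are named
    00000.png / 00000.jpg / 00000.txt, then 00001.*, and so on.
    """
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for index, (png_bytes, jpg_bytes, prompt) in enumerate(samples):
            key = f"{index:05d}"
            members = (
                (f"{key}.png", png_bytes),               # conditioning image
                (f"{key}.jpg", jpg_bytes),               # target image
                (f"{key}.txt", prompt.encode("utf-8")),  # text prompt
            )
            for name, payload in members:
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
            count += 1
    return count


def incomplete_keys(tar_path):
    """Return the sample keys that lack a .png, .jpg, or .txt member."""
    required = {".png", ".jpg", ".txt"}
    seen = {}
    with tarfile.open(tar_path, "r") as tar:
        for name in tar.getnames():
            path = Path(name)
            seen.setdefault(path.stem, set()).add(path.suffix)
    return sorted(key for key, exts in seen.items() if not required <= exts)
```

Running `incomplete_keys` on a shard before training is a cheap way to catch samples whose conditioning image, target image, or prompt was dropped during preprocessing.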

To use segmentation maps as the conditioning input, generate each conditioning image with a detector model; the text prompts can be derived from BLIP captioning. For further guidance on preparing your own dataset, the [ControlNet](lllyasviel/ControlNet) documentation may be helpful.