Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Data Preparation

Note

It is the responsibility of each user to check the content of the dataset, review the applicable licenses, and determine if it is suitable for their intended use. Users should review any applicable links associated with the dataset before placing the data on their machine.

The NSFW system operates on a simple directory-based dataset. The dataset is organized into two class directories, safe and nsfw, each containing JPEG files. A concepts.txt file lists the concepts to emphasize during detection. An example concept list can be found at data/nsfw/concepts.txt.

To use a custom dataset for fine-tuning:

  1. Download the dataset.

  2. Move each downloaded image into the directory for its class: ${data_dir}/nsfw/safe for safe images and ${data_dir}/nsfw/nsfw for NSFW images.

The directory structure should look like this:

├── concepts.txt
├── train
│   ├── nsfw  # Folder containing NSFW images
│   └── safe  # Folder containing safe images
└── val
    ├── nsfw  # Folder containing NSFW images
    └── safe  # Folder containing safe images
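
Before launching fine-tuning, it can help to confirm that the dataset matches the layout above. The following is a minimal sketch, not part of NeMo: the function name check_dataset_layout is hypothetical, and treating .jpg and .jpeg (case-insensitive) as the JPEG extensions is an assumption. It checks that concepts.txt and the four class directories exist, then counts the JPEG files in each.

```python
from pathlib import Path

# Layout taken from the directory tree above.
SPLITS = ("train", "val")
CLASSES = ("nsfw", "safe")
# Assumption: JPEG files use one of these extensions (case-insensitive).
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def check_dataset_layout(root):
    """Hypothetical helper: return {(split, class): jpeg_count}.

    Raises FileNotFoundError if concepts.txt or a class directory is missing.
    """
    root = Path(root)
    concepts = root / "concepts.txt"
    if not concepts.is_file():
        raise FileNotFoundError(f"missing {concepts}")

    counts = {}
    for split in SPLITS:
        for cls in CLASSES:
            class_dir = root / split / cls
            if not class_dir.is_dir():
                raise FileNotFoundError(f"missing directory {class_dir}")
            # Count only regular files with a JPEG extension.
            counts[(split, cls)] = sum(
                1
                for p in class_dir.iterdir()
                if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES
            )
    return counts
```

Calling check_dataset_layout on the dataset root returns a dictionary keyed by (split, class), so an empty or unexpectedly small class directory is easy to spot before training starts.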