This guide provides step-by-step instructions for setting up NeMo Curator’s image curation capabilities. Follow these instructions to prepare your environment and execute your first image curation pipeline.
Ensure your environment meets the following prerequisites for NeMo Curator image curation modules:
If uv is not installed, refer to the Installation Guide for setup instructions, or install it quickly with:
You can install NeMo Curator using one of the following methods:
Install the image modules from PyPI:
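After installation, you can run a quick sanity check to confirm that uv is on your PATH and that the package can be imported. The sketch below uses only the Python standard library. The helper name check_environment is hypothetical and is not part of NeMo Curator. It assumes the package's import name is nemo_curator.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether uv and the nemo_curator package are available (hypothetical helper)."""
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH
        "uv_on_path": shutil.which("uv") is not None,
        # find_spec returns None when the top-level package cannot be located
        "nemo_curator_importable": importlib.util.find_spec("nemo_curator") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```

The check does not import the package, so it stays fast and has no side effects. It only confirms that the package is discoverable, not that GPU dependencies are configured.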
NeMo Curator provides a working image curation example in the Image Curation Tutorial. You can adapt this pipeline for your own datasets.
Create directories to store your image datasets and models:
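If you prefer to do this from Python, the following sketch creates the folders with pathlib. The subfolder names (data/input_tars, data/output, models) and the helper make_workspace are hypothetical. Rename them to match the paths your pipeline script expects.

```python
from pathlib import Path

# Hypothetical layout: rename to match the paths your pipeline uses.
SUBDIRS = ("data/input_tars", "data/output", "models")


def make_workspace(base: Path) -> list[Path]:
    """Create the dataset and model directories under base and return their paths."""
    created = []
    for sub in SUBDIRS:
        path = base / sub
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, calling make_workspace(Path.home() / "nemo_curator") creates the three folders under your home directory. Running it again leaves them unchanged.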
For this example, you’ll need:
.tar files (text and JSON files are ignored during loading)
Here’s a simple example to get started with NeMo Curator’s image curation pipeline:
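Before you point the pipeline at your own shards, it can help to check what each .tar file contains. The standard-library sketch below is a pre-flight check only and is not part of NeMo Curator's API. The helper summarize_shard is hypothetical. It lists the .jpg members and counts the .txt and .json members, which the loader ignores.

```python
import tarfile
from collections import Counter
from pathlib import Path


def summarize_shard(tar_path: Path) -> dict:
    """Count members of a .tar shard by extension (hypothetical pre-flight helper)."""
    counts = Counter()
    images = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            suffix = Path(member.name).suffix.lower()
            counts[suffix] += 1
            if suffix == ".jpg":
                images.append(member.name)
    return {
        "images": images,
        # .txt and .json sidecars are skipped when images are loaded
        "ignored": counts[".txt"] + counts[".json"],
        "by_extension": dict(counts),
    }
```

To inspect a whole directory, loop over Path(your_dir).glob("*.tar") and print len(info["images"]) for each shard. A shard that reports zero images contributes nothing to the pipeline.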
After running the pipeline, you’ll have:
Output Format Details:
.jpg files that passed both aesthetic and NSFW filtering
images-a1b2c3d4e5f6-000000.tar) for uniqueness across distributed processing
aesthetic_score and nsfw_score stored in the Parquet files
For a more comprehensive example with data download and additional configuration options, see:
Explore the Image Curation documentation for more advanced processing techniques: