Process Data for Image Curation

Process image data that you have loaded from tar archives with NeMo Curator’s suite of tools. These tools help you generate embeddings, filter images, and prepare a high-quality dataset for downstream AI tasks such as generative model training, dataset analysis, or quality control.

How it Works

Image processing in NeMo Curator follows a pipeline-based approach with these stages:

  1. Partition files using FilePartitioningStage to distribute tar files
  2. Read images using ImageReaderStage with DALI acceleration
  3. Generate embeddings using ImageEmbeddingStage with CLIP models
  4. Apply filters using ImageAestheticFilterStage and ImageNSFWFilterStage
  5. Save results using ImageWriterStage to export curated datasets

Each stage processes ImageBatch objects containing images, metadata, and processing results. You can use built-in stages or create custom stages for advanced use cases.
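
To make the stage-by-stage flow concrete, here is a minimal, library-free sketch of the pattern in plain Python. It does not use the NeMo Curator API: `ToyImageBatch`, `partition_files`, `read_images`, `embed`, `score_filter`, and `run_pipeline` are hypothetical stand-ins, the "embedding" is a toy checksum rather than a CLIP vector, and the writing step is left out. It only illustrates how a batch object carries images and results from one stage to the next, and how a custom stage can be any callable that takes a batch and returns one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImageBatch:
    """Hypothetical stand-in for an ImageBatch: image ids plus per-image results."""
    image_ids: list
    embeddings: dict = field(default_factory=dict)
    scores: dict = field(default_factory=dict)


def partition_files(tar_paths, files_per_partition):
    """Split the list of tar paths into fixed-size groups (assumed partitioning rule)."""
    return [tar_paths[i:i + files_per_partition]
            for i in range(0, len(tar_paths), files_per_partition)]


def read_images(tar_group):
    """Pretend each tar holds three images; real decoding is out of scope here."""
    ids = [f"{path}#{n}" for path in tar_group for n in range(3)]
    return ToyImageBatch(image_ids=ids)


def embed(batch):
    """Attach a toy 2-D 'embedding' per image (a checksum, NOT a CLIP embedding)."""
    for image_id in batch.image_ids:
        checksum = sum(ord(ch) for ch in image_id)
        batch.embeddings[image_id] = [checksum % 10 / 10.0, checksum % 7 / 7.0]
    return batch


def score_filter(threshold):
    """Build a custom filter stage that keeps images scoring at or above threshold."""
    def stage(batch):
        for image_id in batch.image_ids:
            # Toy score: first embedding component (a real filter would run a model).
            batch.scores[image_id] = batch.embeddings[image_id][0]
        kept = [i for i in batch.image_ids if batch.scores[i] >= threshold]
        return ToyImageBatch(
            image_ids=kept,
            embeddings={i: batch.embeddings[i] for i in kept},
            scores={i: batch.scores[i] for i in kept},
        )
    return stage


def run_pipeline(tar_paths, stages, files_per_partition=2):
    """Partition, read, then pass each batch through every stage in order."""
    results = []
    for group in partition_files(tar_paths, files_per_partition):
        batch = read_images(group)
        for stage in stages:
            batch = stage(batch)
        results.append(batch)
    return results


curated = run_pipeline(["a.tar", "b.tar", "c.tar"], [embed, score_filter(0.5)])
```

Because every stage shares the same batch-in, batch-out shape, reordering stages or adding a new one only changes the list passed to `run_pipeline` in this sketch.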


Embedding Options

Filter Options