---
description: >-
  Process image data using embeddings and filters for high-quality dataset
  curation
categories:
  - workflows
tags:
  - data-processing
  - embedding
  - filtering
  - gpu-accelerated
personas:
  - data-scientist-focused
  - mle-focused
difficulty: intermediate
content_type: workflow
modality: image-only
---

# Process Data for Image Curation

Process image data you've loaded from tar archives using NeMo Curator's suite of tools. These tools generate embeddings, filter images, and produce high-quality data for downstream AI tasks such as generative model training, dataset analysis, or quality control.

## How it Works

Image processing in NeMo Curator follows a pipeline-based approach with these stages:

1. **Partition files** using `FilePartitioningStage` to distribute tar files
2. **Read images** using `ImageReaderStage` with DALI acceleration
3. **Generate embeddings** using `ImageEmbeddingStage` with CLIP models
4. **Apply filters** using `ImageAestheticFilterStage` and `ImageNSFWFilterStage`
5. **Save results** using `ImageWriterStage` to export curated datasets

Each stage processes `ImageBatch` objects containing images, metadata, and processing results. You can use built-in stages or create custom stages for advanced use cases.
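
The stage-by-stage flow described above can be illustrated with a small, self-contained sketch. It does not use the NeMo Curator API: `ToyImageBatch`, `ToyStage`, `ToyEmbeddingStage`, `MinResolutionFilterStage`, and `run_pipeline` are hypothetical stand-ins, and the "embedding" is a placeholder computed from image dimensions rather than a CLIP model. The point is only to show how a batch passes through an ordered list of stages, and where a custom stage would slot in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyImageBatch:
    """Hypothetical stand-in for an ImageBatch: image records plus results."""
    images: list[dict] = field(default_factory=list)


class ToyStage:
    """Hypothetical base class: each stage takes a batch and returns a batch."""

    def process(self, batch: ToyImageBatch) -> ToyImageBatch:
        raise NotImplementedError


class MinResolutionFilterStage(ToyStage):
    """Example custom stage: keep images whose shorter side is >= min_side."""

    def __init__(self, min_side: int) -> None:
        self.min_side = min_side

    def process(self, batch: ToyImageBatch) -> ToyImageBatch:
        kept = [
            img for img in batch.images
            if min(img["width"], img["height"]) >= self.min_side
        ]
        return ToyImageBatch(images=kept)


class ToyEmbeddingStage(ToyStage):
    """Placeholder embedder: a real stage would run a CLIP model here."""

    def process(self, batch: ToyImageBatch) -> ToyImageBatch:
        for img in batch.images:
            # Fake 2-dimensional "embedding" derived from the image size.
            img["embedding"] = [img["width"] / 1000, img["height"] / 1000]
        return batch


def run_pipeline(stages: list[ToyStage], batches: list[ToyImageBatch]) -> list[ToyImageBatch]:
    """Pass every batch through the stages in order."""
    results = []
    for batch in batches:
        for stage in stages:
            batch = stage.process(batch)
        results.append(batch)
    return results


batches = [
    ToyImageBatch(images=[
        {"id": "a", "width": 1024, "height": 768},
        {"id": "b", "width": 200, "height": 150},
    ])
]
curated = run_pipeline([MinResolutionFilterStage(256), ToyEmbeddingStage()], batches)
kept_ids = [img["id"] for img in curated[0].images]
print(kept_ids)
```

Placing the filter before the embedder in this sketch means the more expensive step only runs on images that survive the cheap check. Whether that ordering suits a real pipeline depends on which stages need the embeddings as input.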

***

## Embedding Options

<Cards>
  <Card title="CLIP Embedding Stage" href="/curate-images/process-data/embeddings/clip-embedder">
    Generate image embeddings using CLIP models with GPU acceleration. Supports various CLIP architectures and automatic model downloading.
    `ImageEmbeddingStage` `CLIP` `GPU-accelerated`
  </Card>
</Cards>

## Filter Options

<Cards>
  <Card title="Aesthetic Filter Stage" href="/curate-images/process-data/filters/aesthetic">
    Assess the subjective quality of images using a model trained on human aesthetic preferences. Filters images based on aesthetic score thresholds.
    `ImageAestheticFilterStage` `aesthetic_score`
  </Card>

  <Card title="NSFW Filter Stage" href="/curate-images/process-data/filters/nsfw">
    Detect not-safe-for-work (NSFW) content in images using a CLIP-based filter. Filters explicit material from your datasets.
    `ImageNSFWFilterStage` `nsfw_score`
  </Card>
</Cards>
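
Both filters come down to comparing a per-image score against a threshold. The sketch below shows that logic in plain Python, independent of NeMo Curator. The field names `aesthetic_score` and `nsfw_score` follow the labels above, but `ScoredImage`, `filter_by_scores`, the 0 to 1 score range, the 0.5 default thresholds, and the comparison directions (keep high aesthetic scores, drop high NSFW scores) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredImage:
    """Hypothetical record holding the two scores for one image."""
    image_id: str
    aesthetic_score: float  # assumed: higher means more visually appealing
    nsfw_score: float       # assumed: higher means more likely explicit


def filter_by_scores(images, min_aesthetic=0.5, max_nsfw=0.5):
    """Keep images that meet the aesthetic floor and stay under the NSFW ceiling."""
    return [
        img for img in images
        if img.aesthetic_score >= min_aesthetic and img.nsfw_score < max_nsfw
    ]


samples = [
    ScoredImage("img-001", aesthetic_score=0.82, nsfw_score=0.03),
    ScoredImage("img-002", aesthetic_score=0.31, nsfw_score=0.02),  # fails aesthetic floor
    ScoredImage("img-003", aesthetic_score=0.77, nsfw_score=0.91),  # fails NSFW ceiling
]

kept = filter_by_scores(samples)
kept_ids = [img.image_id for img in kept]
print(kept_ids)
```

Raising `min_aesthetic` or lowering `max_nsfw` makes the filter stricter and shrinks the curated set, so it is worth checking how many images survive at a few threshold values before committing to one.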
