Aesthetic Filter#

The Aesthetic Filter predicts the subjective visual quality of images using a model trained on human aesthetic preferences. It outputs an aesthetic score (higher values indicate more aesthetically pleasing images), making it useful for filtering or ranking images in generative pipelines and dataset curation.
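
For example, given per-image values in the aesthetic_score output field, downstream filtering or ranking reduces to a threshold or a sort. The snippet below is a minimal sketch on made-up scores; the image_id column is illustrative and not part of the stage's output schema.

import pandas as pd

# Made-up scores; in practice these come from the filter's aesthetic_score field.
df = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg", "c.jpg"],
    "aesthetic_score": [0.81, 0.42, 0.67],
})

kept = df[df["aesthetic_score"] >= 0.5]                      # filter out low-scoring images
ranked = df.sort_values("aesthetic_score", ascending=False)  # rank best-first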

Model Details#

  • Architecture: Multilayer perceptron (MLP) trained on OpenAI CLIP ViT-L/14 image embeddings

  • Source: Improved Aesthetic Predictor

  • Output Field: aesthetic_score

  • Score Range: Continuous values (higher is more aesthetic)

  • Embedding Input: CLIP ViT-L/14 embeddings (see Image Embedding)

How It Works#

The filter takes pre-computed CLIP ViT-L/14 image embeddings from a previous pipeline stage and predicts an aesthetic score for each image. The lightweight MLP processes batches of embeddings efficiently on the GPU.
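
A minimal PyTorch sketch of this scoring step is shown below. The layer sizes follow the publicly available Improved Aesthetic Predictor (a small MLP head on 768-dimensional CLIP ViT-L/14 embeddings); the actual stage implementation and weight layout in NeMo Curator may differ.

import torch
import torch.nn as nn

class AestheticHead(nn.Module):
    """Illustrative MLP head mapping a 768-dim CLIP embedding to a single score.

    Layer sizes mirror the public Improved Aesthetic Predictor; this is a
    sketch, not the exact ImageAestheticFilterStage implementation.
    """

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(embed_dim, 1024), nn.Dropout(0.2),
            nn.Linear(1024, 128), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.Dropout(0.1),
            nn.Linear(64, 16),
            nn.Linear(16, 1),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # One score per embedding in the batch.
        return self.layers(embeddings).squeeze(-1)

# Score a batch of pre-computed, L2-normalized CLIP embeddings (random here).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AestheticHead().eval().to(device)
with torch.no_grad():
    embeddings = torch.randn(32, 768, device=device)
    embeddings = embeddings / embeddings.norm(dim=-1, keepdim=True)
    scores = model(embeddings)  # shape: (32,)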

Usage#

from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.file_partitioning import FilePartitioningStage
from nemo_curator.stages.image.io.image_reader import ImageReaderStage
from nemo_curator.stages.image.embedders.clip_embedder import ImageEmbeddingStage
from nemo_curator.stages.image.filters.aesthetic_filter import ImageAestheticFilterStage

# Create pipeline
pipeline = Pipeline(name="aesthetic_filtering", description="Filter images by aesthetic quality")

# Stage 1: Partition tar files
pipeline.add_stage(FilePartitioningStage(
    file_paths="/path/to/tar_dataset",
    files_per_partition=1,
    file_extensions=[".tar"],
))

# Stage 2: Read images
pipeline.add_stage(ImageReaderStage(
    task_batch_size=100,
    num_gpus_per_worker=0.25,
))

# Stage 3: Generate CLIP embeddings
pipeline.add_stage(ImageEmbeddingStage(
    model_dir="/path/to/models",
    model_inference_batch_size=32,
    num_gpus_per_worker=0.25,
))

# Stage 4: Apply aesthetic filtering
pipeline.add_stage(ImageAestheticFilterStage(
    model_dir="/path/to/models",
    score_threshold=0.5,
    model_inference_batch_size=32,
    num_gpus_per_worker=0.25,
))

# Run the pipeline (uses XennaExecutor by default)
results = pipeline.run()

Parameters#

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model_dir | str | None | Path to directory containing model weights |
| score_threshold | float | 0.5 | Aesthetic score threshold for filtering (images scoring below this threshold are filtered out) |
| model_inference_batch_size | int | 32 | Batch size for model inference |
| num_gpus_per_worker | float | 0.25 | GPU allocation per worker (0.25 = 1/4 GPU) |
| verbose | bool | False | Enable verbose logging for debugging |

Performance Notes#

  • The model is small and processes pre-computed embeddings efficiently on the GPU.

  • Increase model_inference_batch_size for higher throughput if GPU memory allows.

Best Practices#

  • Use CLIP ViT-L/14 embeddings generated by ImageEmbeddingStage for best results.

  • Run the aesthetic filter after embedding generation in the same pipeline to avoid extra I/O.

  • The filter requires pre-computed embeddings and cannot extract embeddings from raw images.

  • Review a sample of scores to calibrate score_threshold for your dataset (see the calibration sketch after this list).

  • Adjust model_inference_batch_size based on available GPU memory.
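
To calibrate the threshold, one option is to run the pipeline with a permissive score_threshold, collect the resulting scores, and inspect their distribution before settling on a cutoff. The sketch below assumes the scored metadata is available as a Parquet file with an aesthetic_score column; the output path and file layout are assumptions, not the stage's documented output format.

import pandas as pd

# Hypothetical path; point this at wherever your pipeline writes scored metadata.
scores = pd.read_parquet("/path/to/output/metadata.parquet")["aesthetic_score"]

# Inspect the distribution to see where a sensible cutoff lies.
print(scores.describe())
print(scores.quantile([0.1, 0.25, 0.5, 0.75, 0.9]))

# Example: pick a threshold that drops roughly the bottom 20% of images.
threshold = scores.quantile(0.2)
print(f"Suggested score_threshold: {threshold:.3f}")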

Resources#