Aesthetic Classifier

The Aesthetic Classifier predicts the subjective visual quality of images using a model trained on human aesthetic preferences. It outputs a score from 0 (least aesthetic) to 10 (most aesthetic), making it useful for filtering or ranking images in generative pipelines and dataset curation.

Model Details

Architecture: Linear MLP trained on OpenAI CLIP ViT-L/14 image embeddings
Source: Improved Aesthetic Predictor
Output Field: aesthetic_score
Score Range: 0–10 (higher is more aesthetic)
Embedding Requirement: CLIP ViT-L/14 (see Image Embedding)

How It Works

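The classifier reads normalized CLIP ViT-L/14 image embeddings from the embedding column and writes a predicted aesthetic score to the output column (`aesthetic_score` by default). Because the model is small, it can run on the same GPU as the embedding model, which keeps batch processing efficient.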
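As a rough illustration of the scoring step, the sketch below L2-normalizes an embedding and passes it through a stack of linear layers to produce a single number. It is plain Python with made-up toy weights, and a 4-dimensional vector stands in for the real CLIP ViT-L/14 embedding. The helper names (`l2_normalize`, `linear`, `score`) and `TOY_LAYERS` are hypothetical and not part of NeMo Curator.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


def linear(vec, weights, bias):
    """Apply one linear layer: out[i] = dot(weights[i], vec) + bias[i]."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def score(embedding, layers):
    """Normalize the embedding, then run it through each linear layer in turn."""
    hidden = l2_normalize(embedding)
    for weights, bias in layers:
        hidden = linear(hidden, weights, bias)
    return hidden[0]  # the final layer has a single output unit


# Toy weights for illustration only (4 -> 2 -> 1); real weights come from the trained model.
TOY_LAYERS = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[2.0, 2.0]], [5.0]),
]

print(score([3.0, 4.0, 0.0, 0.0], TOY_LAYERS))
```

Since every layer here is linear, the whole stack is cheap to evaluate, which is consistent with running it right next to the much larger embedding model.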

Usage

Python

from nemo_curator import get_client
from nemo_curator.datasets import ImageTextPairDataset
from nemo_curator.image.embedders import TimmImageEmbedder
from nemo_curator.image.classifiers import AestheticClassifier

# Start a GPU-backed client and load the image-text pairs from WebDataset shards.
client = get_client(cluster_type="gpu")
dataset = ImageTextPairDataset.from_webdataset(path="/path/to/dataset", id_col="key")

# CLIP ViT-L/14 embedder; the classifier expects normalized embeddings.
embedding_model = TimmImageEmbedder(
    "vit_large_patch14_clip_quickgelu_224.openai",
    pretrained=True,
    batch_size=1024,
    num_threads_per_worker=16,
    normalize_embeddings=True,
)
aesthetic_classifier = AestheticClassifier()

# Compute embeddings first, then score them.
dataset_with_embeddings = embedding_model(dataset)
dataset_with_aesthetic_scores = aesthetic_classifier(dataset_with_embeddings)

# Persist the metadata, which now includes the aesthetic_score column.
dataset_with_aesthetic_scores.save_metadata()

Key Parameters

Parameter	Default	Description
`embedding_column`	`image_embedding`	Name of the column with image embeddings
`pred_column`	`aesthetic_score`	Name of the output column for scores
`batch_size`	`-1`	Batch size for inference; `-1` processes all at once
`model_path`	auto	Path to model weights; downloads if not provided
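
The `batch_size` convention in the table can be pictured with a small stand-alone helper. `iter_batches` is a hypothetical function written for this example, not the library's implementation; it only shows how `-1` differs from a positive value.

```python
def iter_batches(rows, batch_size=-1):
    """Yield consecutive slices of rows; -1 yields everything as one batch."""
    if batch_size == -1:
        yield rows
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive or -1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Ten toy 4-dimensional embeddings (values are arbitrary).
embeddings = [[0.1 * i] * 4 for i in range(10)]

print([len(b) for b in iter_batches(embeddings)])     # a single batch with all rows
print([len(b) for b in iter_batches(embeddings, 4)])  # batches of at most 4 rows
```

A positive value caps how many embeddings are scored at once, which is the knob to reach for if a single all-at-once pass does not fit in memory.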

Performance Notes

The model is small and can be loaded onto the GPU with the embedding model for fast, in-place scoring.
Increasing `batch_size` can improve throughput, as long as each batch still fits in GPU memory.
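
For a sense of scale, the back-of-the-envelope sketch below estimates the raw size of a batch of embeddings. It assumes 768-dimensional CLIP ViT-L/14 embeddings stored as float32; it counts only the embedding tensor itself, not the embedding model's own activations or any framework overhead. The function name `embedding_batch_bytes` is made up for this example.

```python
EMBEDDING_DIM = 768   # width of CLIP ViT-L/14 image embeddings
BYTES_PER_VALUE = 4   # assumption: embeddings are stored as float32


def embedding_batch_bytes(batch_size, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Return the raw size in bytes of a batch of dense embeddings."""
    return batch_size * dim * bytes_per_value


for batch_size in (1024, 8192, 65536):
    mib = embedding_batch_bytes(batch_size) / 2**20
    print(f"{batch_size:>6} embeddings ~ {mib:.0f} MiB")
```

Under these assumptions, the classifier's input is small next to typical GPU memory, so the image embedding step is usually the part that limits batch size.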

Best Practices

Use normalized CLIP ViT-L/14 embeddings for best results.
Run the classifier immediately after embedding to avoid extra I/O.
Review a sample of scores to calibrate thresholds for your use case.
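
One possible way to calibrate a threshold from a reviewed sample is to pick the cutoff that keeps a target fraction of images. The sketch below uses only the standard library; the scores are invented for illustration, and `threshold_for_keep_fraction` is a hypothetical helper, not a NeMo Curator function.

```python
import statistics


def threshold_for_keep_fraction(scores, keep_fraction):
    """Return the score cutoff that keeps roughly the top `keep_fraction` of samples."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be between 0 and 1 (exclusive)")
    ranked = sorted(scores, reverse=True)
    keep_count = max(1, round(len(ranked) * keep_fraction))
    return ranked[keep_count - 1]


# Made-up scores standing in for a reviewed sample of aesthetic_score values.
sample_scores = [3.1, 4.8, 5.2, 5.9, 6.4, 6.6, 7.0, 7.3, 4.1, 5.5]

cutoff = threshold_for_keep_fraction(sample_scores, keep_fraction=0.3)
kept = [s for s in sample_scores if s >= cutoff]

print("median:", statistics.median(sample_scores))
print("cutoff:", cutoff, "kept:", len(kept))
```

Looking at the images just above and just below the cutoff is a quick check on whether the chosen fraction matches what you consider acceptable quality.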