Aesthetic Classifier#
The Aesthetic Classifier predicts the subjective visual quality of images using a model trained on human aesthetic preferences. It outputs a score from 0 (least aesthetic) to 10 (most aesthetic), making it useful for filtering or ranking images in generative pipelines and dataset curation.
Model Details#
Architecture: Linear MLP trained on OpenAI CLIP ViT-L/14 image embeddings
Source: Improved Aesthetic Predictor
Output Field:
aesthetic_score
Score Range: 0–10 (higher is more aesthetic)
Embedding Requirement: CLIP ViT-L/14 (see Image Embedding)
How It Works#
The classifier takes normalized image embeddings and predicts an aesthetic score. It is lightweight and can be run on the GPU alongside embedding computation for efficient batch processing.
Usage#
from nemo_curator import get_client
from nemo_curator.datasets import ImageTextPairDataset
from nemo_curator.image.embedders import TimmImageEmbedder
from nemo_curator.image.classifiers import AestheticClassifier
client = get_client(cluster_type="gpu")
dataset = ImageTextPairDataset.from_webdataset(path="/path/to/dataset", id_col="key")
embedding_model = TimmImageEmbedder(
"vit_large_patch14_clip_quickgelu_224.openai",
pretrained=True,
batch_size=1024,
num_threads_per_worker=16,
normalize_embeddings=True,
)
aesthetic_classifier = AestheticClassifier()
dataset_with_embeddings = embedding_model(dataset)
dataset_with_aesthetic_scores = aesthetic_classifier(dataset_with_embeddings)
dataset_with_aesthetic_scores.save_metadata()
Key Parameters#
Parameter |
Default |
Description |
---|---|---|
|
|
Name of the column with image embeddings |
|
|
Name of the output column for scores |
|
|
Batch size for inference; |
|
auto |
Path to model weights; downloads if not provided |
Performance Notes#
The model is small and can be loaded onto the GPU with the embedding model for fast, in-place scoring.
Batch size can be increased for faster throughput if memory allows.
Best Practices#
Use normalized CLIP ViT-L/14 embeddings for best results.
Run the classifier immediately after embedding to avoid extra I/O.
Review a sample of scores to calibrate thresholds for your use case.