Aesthetic Filter
The Aesthetic Filter predicts the subjective visual quality of images using a model trained on human aesthetic preferences. It outputs an aesthetic score, where higher values indicate more aesthetically pleasing images, which makes it useful for filtering or ranking images in generative pipelines and dataset curation.
Model Details
- Architecture: Multilayer perceptron (MLP) trained on OpenAI CLIP ViT-L/14 image embeddings
- Source: Improved Aesthetic Predictor
- Output Field: aesthetic_score
- Score Range: Continuous values (higher is more aesthetic)
- Embedding Input: CLIP ViT-L/14 embeddings (see Image embeddings)
How It Works
The filter takes pre-computed CLIP ViT-L/14 image embeddings from a previous pipeline stage and predicts an aesthetic score. The lightweight model processes batches of embeddings efficiently on the GPU.
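As a rough illustration of this flow, the sketch below scores pre-computed 768-dimensional embeddings in batches with a toy MLP. The layer sizes, the random weights, and the helper names (make_toy_mlp, score_embedding, score_in_batches) are assumptions made for the example. They are not the actual predictor's architecture or the stage's API, and the scores produced here carry no aesthetic meaning.

```python
import random

EMBEDDING_DIM = 768  # CLIP ViT-L/14 embedding size stated in this document


def make_toy_mlp(hidden_dim=8, seed=0):
    """Build random weights for a toy 768 -> hidden -> 1 MLP (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.05) for _ in range(EMBEDDING_DIM)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.gauss(0, 0.05) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def score_embedding(embedding, mlp):
    """Map one embedding to a single scalar score (linear -> ReLU -> linear)."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    w1, b1, w2, b2 = mlp
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def score_in_batches(embeddings, mlp, batch_size=4):
    """Score embeddings batch by batch, preserving the input order."""
    scores = []
    for start in range(0, len(embeddings), batch_size):
        batch = embeddings[start:start + batch_size]
        scores.extend(score_embedding(e, mlp) for e in batch)
    return scores
```

Because each embedding is scored independently, the batch size in this sketch changes only how the work is grouped, not the resulting scores.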
Prerequisites
Before using ImageAestheticFilterStage, make sure the following are in place.
Model Setup
The aesthetic predictor model weights are downloaded automatically from Hugging Face on first use. The stage will:
- Download the improved aesthetic predictor model (~20MB) to the specified model_dir
- Cache the model for subsequent runs
- Load the model onto the GPU (or the CPU if no GPU is available)
First-time setup: The initial model download is quick (under 1 minute on most connections). Subsequent runs will use the cached model.
Required Input
- CLIP Embeddings: Images must have embeddings already generated by ImageEmbeddingStage
- Embedding Format: CLIP ViT-L/14 768-dimensional vectors stored in ImageObject.embedding
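A quick pre-flight check can catch missing or wrongly sized embeddings before the filter runs. The sketch below is a minimal example of such a check. ImageRecord is a hypothetical stand-in that only mimics an object with an embedding attribute; it is not the library's ImageObject class, and find_invalid_embeddings is likewise an illustrative helper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

EXPECTED_DIM = 768  # CLIP ViT-L/14 embedding size stated in this document


@dataclass
class ImageRecord:
    """Hypothetical stand-in for an image object carrying an `embedding` attribute."""
    image_id: str
    embedding: Optional[Sequence[float]] = None


def find_invalid_embeddings(records: List[ImageRecord]) -> List[str]:
    """Return IDs of records whose embedding is missing or not 768-dimensional."""
    return [
        r.image_id
        for r in records
        if r.embedding is None or len(r.embedding) != EXPECTED_DIM
    ]


records = [
    ImageRecord("ok", [0.0] * 768),
    ImageRecord("missing"),
    ImageRecord("wrong_dim", [0.0] * 512),
]
print(find_invalid_embeddings(records))  # prints ['missing', 'wrong_dim']
```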
Usage
Python
Parameters
Performance Notes
- The model is small and processes pre-computed embeddings efficiently on the GPU.
- Increase the batch size for higher throughput if memory allows.
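To get a feel for how little memory the inputs themselves need, the following back-of-the-envelope sketch computes the size of one batch of embeddings. It assumes the embeddings are stored as float32 (4 bytes per value), which this document does not state, and it counts only the input tensor, not model weights or activations.

```python
EMBEDDING_DIM = 768      # CLIP ViT-L/14 embedding size stated in this document
BYTES_PER_FLOAT32 = 4    # assumption: embeddings are stored as float32


def embedding_batch_bytes(batch_size: int) -> int:
    """Bytes needed to hold one batch of embeddings (inputs only)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return batch_size * EMBEDDING_DIM * BYTES_PER_FLOAT32


for batch_size in (32, 256, 1024):
    mib = embedding_batch_bytes(batch_size) / (1024 ** 2)
    print(f"batch_size={batch_size}: {mib:.2f} MiB")
```

Under that assumption, a batch of 1024 embeddings occupies 1024 × 768 × 4 = 3,145,728 bytes, or 3 MiB, so the embedding inputs are rarely the limiting factor.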
Best Practices
- Use CLIP ViT-L/14 embeddings generated by ImageEmbeddingStage for best results.
- Run the aesthetic filter after embedding generation in the same pipeline to avoid extra I/O.
- The filter requires pre-computed embeddings and cannot extract embeddings from raw images.
- Review a sample of scores to calibrate thresholds for your use case.
- Adjust model_inference_batch_size based on available GPU memory.
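One way to calibrate a threshold from a sample of scores is to look at how much of the sample each candidate threshold would keep. The sketch below is a minimal example of that idea. The sample scores are made-up values for illustration, the helper names are hypothetical, and the keep rule (score at or above the threshold) is an assumption rather than the filter's documented behavior.

```python
def fraction_kept(scores, threshold):
    """Fraction of images whose score is at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_keep_fraction(scores, keep_fraction):
    """Lowest observed score that keeps roughly `keep_fraction` of the sample."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    keep_count = max(1, round(keep_fraction * len(ranked)))
    return ranked[keep_count - 1]


# Made-up sample of aesthetic scores, for illustration only.
sample_scores = [4.1, 5.3, 6.0, 4.8, 5.9, 6.4, 3.7, 5.1, 5.6, 4.5]

threshold = threshold_for_keep_fraction(sample_scores, keep_fraction=0.3)
print(threshold, fraction_kept(sample_scores, threshold))
```

Inspecting a few images just above and just below the chosen threshold is a useful sanity check before applying it to a full dataset.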