Classifiers#

Base Class#

class nemo_curator.image.classifiers.ImageClassifier(
model_name: str,
embedding_column: str,
pred_column: str,
pred_type: str | type,
batch_size: int,
embedding_size: int,
)#

An abstract base class that represents a classifier on top of embeddings generated by a CLIP vision encoder.

Subclasses only need to define how a model is loaded. They may also override the postprocess method if they would like to modify the output series of predictions before it is combined into the dataset. The classifier must be able to fit on a single GPU.

abstract load_model(device: str) Callable#

Loads the classifier model.

Parameters:

device (str) – A PyTorch device identifier that specifies which GPU to load the model on.

Returns:

A callable model, usually a torch.nn.Module. The input to this model will be batches of image embeddings taken from the embedding_column of the dataset’s metadata.

Return type:

Callable

postprocess(series: cudf.Series) cudf.Series#

Postprocesses the predictions of the classifier before saving them to the metadata.

Parameters:

series (cudf.Series) – The cuDF series of raw model predictions.

Returns:

The same series, unmodified. Override in your classifier if needed.

Return type:

cudf.Series
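
As an illustration, here is a minimal sketch of a subclass. The class name, column names, and the tiny MLP with randomly initialized weights are placeholders invented for this example, not part of NeMo Curator; a real classifier would load trained weights from disk.

import torch
import torch.nn as nn

from nemo_curator.image.classifiers import ImageClassifier

class ToyScoreClassifier(ImageClassifier):
    def __init__(self):
        super().__init__(
            model_name="toy_score_classifier",   # hypothetical name
            embedding_column="image_embedding",
            pred_column="toy_score",             # hypothetical output column
            pred_type=float,
            batch_size=-1,
            embedding_size=768,  # CLIP ViT-L/14 embedding dimension
        )

    def load_model(self, device: str) -> torch.nn.Module:
        # Any callable that maps a batch of embeddings to predictions works.
        # The weights here are random, purely for illustration.
        model = nn.Sequential(
            nn.Linear(768, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )
        return model.eval().to(device)

    def postprocess(self, series):
        # Optional hook: clamp raw scores to [0, 1] before they are
        # written into the dataset metadata.
        return series.clip(lower=0.0, upper=1.0)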

Image Classifiers#

class nemo_curator.image.classifiers.AestheticClassifier(
embedding_column: str = 'image_embedding',
pred_column: str = 'aesthetic_score',
batch_size: int = -1,
model_path: str | None = None,
)#

LAION-Aesthetics Predictor V2 is a linear classifier trained on top of OpenAI CLIP ViT-L/14 image embeddings. It is used to assess the aesthetic quality of images. More information on the model can be found at https://laion.ai/blog/laion-aesthetics/.

load_model(device)#

Loads the classifier model.

Parameters:

device (str) – A PyTorch device identifier that specifies which GPU to load the model on.

Returns:

A callable model, usually a torch.nn.Module. The input to this model will be batches of image embeddings taken from the embedding_column of the dataset’s metadata.

Return type:

Callable

postprocess(series)#

Postprocesses the predictions of the classifier before saving them to the metadata.

Parameters:

series (cudf.Series) – The cuDF series of raw model predictions.

Returns:

The same series, unmodified. Override in your classifier if needed.

Return type:

cudf.Series
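
A minimal end-to-end sketch of typical usage follows. The dataset path and id column are placeholders, and the TimmImageEmbedder configuration shown is one reasonable way to produce the CLIP ViT-L/14 embeddings this classifier expects, not a requirement.

from nemo_curator import get_client
from nemo_curator.datasets import ImageTextPairDataset
from nemo_curator.image.embedders import TimmImageEmbedder
from nemo_curator.image.classifiers import AestheticClassifier

client = get_client(cluster_type="gpu")

# Placeholder path and id column for your own WebDataset.
dataset = ImageTextPairDataset.from_webdataset(path="/path/to/dataset", id_col="key")

embedding_model = TimmImageEmbedder(
    "vit_large_patch14_clip_quickgelu_224.openai",
    pretrained=True,
    batch_size=1024,
    normalize_embeddings=True,
)
aesthetic_classifier = AestheticClassifier()

# The embedder adds the "image_embedding" column; the classifier then
# adds the "aesthetic_score" column to the metadata.
dataset_with_embeddings = embedding_model(dataset)
dataset_with_scores = aesthetic_classifier(dataset_with_embeddings)

dataset_with_scores.save_metadata()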

class nemo_curator.image.classifiers.NsfwClassifier(
embedding_column: str = 'image_embedding',
pred_column: str = 'nsfw_score',
batch_size: int = -1,
model_path: str | None = None,
)#

NSFW Classifier is a small MLP trained on top of OpenAI’s CLIP ViT-L image embeddings. It is used to assess the likelihood of images containing sexually explicit material. More information on the model can be found at https://github.com/LAION-AI/CLIP-based-NSFW-Detector.

load_model(device)#

Loads the classifier model.

Parameters:

device (str) – A PyTorch device identifier that specifies which GPU to load the model on.

Returns:

A callable model, usually a torch.nn.Module. The input to this model will be batches of image embeddings taken from the embedding_column of the dataset’s metadata.

Return type:

Callable

postprocess(series)#

Postprocesses the predictions of the classifier before saving them to the metadata.

Parameters:

series (cudf.Series) – The cuDF series of raw model predictions.

Returns:

The same series, unmodified. Override in your classifier if needed.

Return type:

cudf.Series
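
Usage mirrors the aesthetic classifier. A short sketch, assuming the dataset’s metadata already contains an "image_embedding" column produced by an image embedder (the path below is a placeholder):

from nemo_curator import get_client
from nemo_curator.datasets import ImageTextPairDataset
from nemo_curator.image.classifiers import NsfwClassifier

client = get_client(cluster_type="gpu")

# Placeholder path; embeddings are assumed to already be present.
dataset = ImageTextPairDataset.from_webdataset(path="/path/to/dataset", id_col="key")

nsfw_classifier = NsfwClassifier()
dataset_with_scores = nsfw_classifier(dataset)  # adds the "nsfw_score" column

dataset_with_scores.save_metadata()

The resulting nsfw_score column in the metadata can then be used to filter the dataset, for example by keeping only rows whose score falls below a chosen threshold.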