nemo_curator.models.clip

Module Contents

Classes

Name	Description
`CLIPAestheticScorer`	A model that chains CLIPImageEmbeddings and AestheticScorer models.
`CLIPImageEmbeddings`	Interface for generating CLIP image embeddings from input images.

Data

_CLIP_MODEL_ID

_CLIP_MODEL_REVISION

API

class nemo_curator.models.clip.CLIPAestheticScorer(
    model_dir: str
)

Bases: ModelInterface

A model that chains CLIPImageEmbeddings and AestheticScorer models.

_aesthetic_model

AestheticScorer | None = None

_clip_model

CLIPImageEmbeddings | None = None

model_id_names

list[str]

Get the model ID names.

nemo_curator.models.clip.CLIPAestheticScorer.__call__(
    images: torch.Tensor | numpy.typing.NDArray[numpy.uint8]
) -> torch.Tensor

Score the input images by generating their CLIP image embeddings and passing those embeddings to the aesthetic scorer.

Parameters:

images

torch.Tensor | numpy.typing.NDArray[numpy.uint8]

The images to score.

Returns: torch.Tensor

The aesthetic scores for the input images.

nemo_curator.models.clip.CLIPAestheticScorer.download_weights_on_node(
    model_dir: str
) -> None

classmethod

Download the weights for the CLIPAestheticScorer model on the node.

nemo_curator.models.clip.CLIPAestheticScorer.setup() -> None

Set up the CLIPAestheticScorer model.
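
The chaining described for this class can be illustrated with a minimal, self-contained sketch. It does not use the real models: `StubEmbedder`, `StubScorer`, and `ChainedScorer` are hypothetical stand-ins with toy arithmetic, and the `/tmp/models` path is a placeholder. Only the method names (`setup`, `__call__`) and the `_clip_model` / `_aesthetic_model` attribute names are taken from the reference above. The assumption that `setup()` must run before `__call__` is inferred from the `None` defaults, not confirmed by the source.

```python
from __future__ import annotations


class StubEmbedder:
    """Hypothetical stand-in for CLIPImageEmbeddings (not the real model)."""

    def __init__(self, model_dir: str) -> None:
        self.model_dir = model_dir
        self._ready = False

    def setup(self) -> None:
        # A real model would load its weights from model_dir here.
        self._ready = True

    def __call__(self, images: list[list[float]]) -> list[list[float]]:
        if not self._ready:
            raise RuntimeError("call setup() before using the embedder")
        # Toy "embedding": the mean and the max of each image's pixel values.
        return [[sum(img) / len(img), max(img)] for img in images]


class StubScorer:
    """Hypothetical stand-in for AestheticScorer (not the real model)."""

    def __init__(self, model_dir: str) -> None:
        self.model_dir = model_dir
        self._ready = False

    def setup(self) -> None:
        self._ready = True

    def __call__(self, embeddings: list[list[float]]) -> list[float]:
        if not self._ready:
            raise RuntimeError("call setup() before using the scorer")
        # Toy "score": the sum of the embedding components.
        return [sum(emb) for emb in embeddings]


class ChainedScorer:
    """Chains an embedder and a scorer behind a single call."""

    def __init__(self, model_dir: str) -> None:
        self.model_dir = model_dir
        # Both sub-models start as None and are created in setup().
        self._clip_model: StubEmbedder | None = None
        self._aesthetic_model: StubScorer | None = None

    def setup(self) -> None:
        self._clip_model = StubEmbedder(self.model_dir)
        self._clip_model.setup()
        self._aesthetic_model = StubScorer(self.model_dir)
        self._aesthetic_model.setup()

    def __call__(self, images: list[list[float]]) -> list[float]:
        if self._clip_model is None or self._aesthetic_model is None:
            raise RuntimeError("call setup() before scoring images")
        # Images -> embeddings -> scores.
        return self._aesthetic_model(self._clip_model(images))


scorer = ChainedScorer(model_dir="/tmp/models")  # placeholder path
scorer.setup()
scores = scorer([[0.0, 0.5, 1.0], [0.25, 0.25, 0.25]])
```

With the real classes, the same order would presumably apply: construct with `model_dir`, call `setup()`, then call the instance on a batch of images.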

class nemo_curator.models.clip.CLIPImageEmbeddings(
    model_dir: str
)

Bases: ModelInterface

Interface for generating CLIP image embeddings from input images.

device

= 'cuda' if torch.cuda.is_available() else 'cpu'

dtype

= torch.float32

model_id_names

list[str]

Get the model ID names.

nemo_curator.models.clip.CLIPImageEmbeddings.__call__(
    images: torch.Tensor | numpy.typing.NDArray[numpy.uint8] | list[numpy.ndarray]
) -> torch.Tensor

Generate CLIP image embeddings for the input images.

Parameters:

images

torch.Tensor | numpy.typing.NDArray[numpy.uint8] | list[numpy.ndarray]

The images to embed.

Returns: torch.Tensor

The CLIP image embeddings for the input images.

nemo_curator.models.clip.CLIPImageEmbeddings.download_weights_on_node(
    model_dir: str
) -> None

classmethod

Download the weights for the CLIPImageEmbeddings model on the node.

nemo_curator.models.clip.CLIPImageEmbeddings.encode_text(
    texts: list[str]
) -> torch.Tensor

Encode a list of texts into normalized CLIP text embeddings.

Parameters:

texts

list[str]

List of strings to encode.

Returns: torch.Tensor

Normalized text embeddings of shape `(len(texts), dim)`.
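
To show why normalized embeddings are convenient, here is a small pure-Python sketch. It assumes "normalized" means unit L2 norm, which the reference does not state explicitly, and it uses toy 2-D vectors in place of real CLIP outputs. The helpers `l2_normalize` and `dot` are hypothetical and not part of `nemo_curator`. Under that assumption, the cosine similarity between two embeddings reduces to a plain dot product.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs have unit norm."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for text and image embeddings (assumption:
# real embeddings have many more dimensions).
text_embeddings = [l2_normalize(v) for v in ([3.0, 4.0], [1.0, 0.0])]
image_embedding = l2_normalize([6.0, 8.0])

# One similarity per text; the highest value marks the closest text.
similarities = [dot(image_embedding, t) for t in text_embeddings]
best_match = max(range(len(similarities)), key=similarities.__getitem__)
```

Here the first text vector points in the same direction as the image vector, so it is selected as the best match.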

nemo_curator.models.clip.CLIPImageEmbeddings.setup() -> None

Set up the CLIPImageEmbeddings model.

nemo_curator.models.clip._CLIP_MODEL_ID: Final = 'openai/clip-vit-large-patch14'

nemo_curator.models.clip._CLIP_MODEL_REVISION: Final = '32bd642'