nemo_curator.models.cosmos_embed1


Module Contents

Classes

Name | Description
CosmosEmbed1 | Cosmos-Embed1 embedding model.

Data

COSMOS_EMBED1_MODEL_REVISION_INFO

_COSMOS_EMBED1_VARIANTS_INFO

API

class nemo_curator.models.cosmos_embed1.CosmosEmbed1(
variant: typing.Literal['224p', '336p', '448p'] = '336p',
utils_only: bool = False,
model_dir: str | None = None
)

Bases: ModelInterface

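Cosmos-Embed1 embedding model, which produces embeddings for video frames and for text. The variant argument selects the 224p, 336p, or 448p checkpoint.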

_model: AutoModel | None = None
_weights_dir = str(Path(model_dir) / self._weights_name)
_weights_name = _COSMOS_EMBED1_VARIANTS_INFO[variant]
model_id_names: list[str]

Get the model ID names.
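The `_weights_dir` and `_weights_name` attributes above determine where the weights for a variant are expected on disk. The sketch below mirrors those two expressions; `weights_dir` and `KNOWN_WEIGHTS_NAMES` are hypothetical names, and only the 224p and 336p repository names are reproduced because the 448p entry of `_COSMOS_EMBED1_VARIANTS_INFO` is not shown in full here.

```python
from pathlib import Path

# Hypothetical copy of the fully listed entries of _COSMOS_EMBED1_VARIANTS_INFO.
KNOWN_WEIGHTS_NAMES = {
    "224p": "nvidia/Cosmos-Embed1-224p",
    "336p": "nvidia/Cosmos-Embed1-336p",
}


def weights_dir(model_dir: str, variant: str = "336p") -> str:
    """Mirror of `_weights_dir = str(Path(model_dir) / self._weights_name)`."""
    weights_name = KNOWN_WEIGHTS_NAMES[variant]  # same lookup as _weights_name
    return str(Path(model_dir) / weights_name)
```

With `model_dir="/models"` and the default variant, the weights are therefore expected under `/models/nvidia/Cosmos-Embed1-336p` on a POSIX system.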

nemo_curator.models.cosmos_embed1.CosmosEmbed1.download_processor_config_on_node(
model_dir: str,
variant: typing.Literal['224p', '336p', '448p'] = '336p'
) -> None
classmethod

Download the processor configuration for the given CosmosEmbed1 variant into model_dir on the current node.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.download_weights_on_node(
model_dir: str,
variant: typing.Literal['224p', '336p', '448p'] = '336p'
) -> None
classmethod

Download the model weights for the given CosmosEmbed1 variant into model_dir on the current node.
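Both download methods are classmethods, so they can run on a worker node before any `CosmosEmbed1` instance exists. A minimal sketch that validates the variant against the documented `Literal` values and then calls both methods; `prepare_node` is a hypothetical helper, and the assumption that a node needs both artifacts is not stated in this reference. Pass `CosmosEmbed1` itself as `model_cls`.

```python
from typing import Literal, get_args

# Same allowed values as the documented `variant` parameter.
Variant = Literal["224p", "336p", "448p"]
VARIANTS = get_args(Variant)


def prepare_node(model_cls: type, model_dir: str, variant: str = "336p") -> None:
    """Hypothetical helper: fetch weights and processor config for one variant."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    # Documented classmethods; keyword names match the reference above.
    model_cls.download_weights_on_node(model_dir=model_dir, variant=variant)
    model_cls.download_processor_config_on_node(model_dir=model_dir, variant=variant)
```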

nemo_curator.models.cosmos_embed1.CosmosEmbed1.encode_video_frames(
frames: numpy.typing.NDArray[numpy.float32]
) -> torch.Tensor

Encode float32 video frames (such as the output of formulate_input_frames) with the model.

Parameters:

frames
npt.NDArray[np.float32]

The input video frames.

Returns: torch.Tensor

The encoded video frames.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.evaluate(
video_embd: torch.Tensor,
text_embds: list[torch.Tensor]
) -> tuple[list[float], list[int]]

Score a video embedding against a list of text embeddings.

Parameters:

video_embd
torch.Tensor

The video embedding.

text_embds
list[torch.Tensor]

The text embeddings.

Returns: tuple[list[float], list[int]]

The predicted probabilities and indices.
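One common way to turn a video embedding and several text embeddings into probabilities and indices is a softmax over cosine similarities, sorted from most to least likely. The sketch below shows that technique only; it is an assumption, not the method's confirmed implementation, which may scale the logits differently or return only the top entries. `rank_texts` and `logit_scale` are hypothetical, and plain lists stand in for `torch.Tensor` to keep it dependency-free.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_texts(
    video_embd: Sequence[float],
    text_embds: Sequence[Sequence[float]],
    logit_scale: float = 1.0,  # hypothetical temperature-like factor
) -> tuple[list[float], list[int]]:
    """Softmax over scaled cosine similarities, most likely text first."""
    logits = [logit_scale * cosine(video_embd, t) for t in text_embds]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [probs[i] for i in order], order
```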

nemo_curator.models.cosmos_embed1.CosmosEmbed1.formulate_input_frames(
frames: list[numpy.typing.NDArray[numpy.uint8]]
) -> numpy.typing.NDArray[numpy.float32] | None

Convert a list of uint8 video frames into the float32 input array expected by the model.

Parameters:

frames
list[npt.NDArray[np.uint8]]

List of video frames.

Returns: npt.NDArray[np.float32] | None

The formulated input frames, or None.
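Because the return type of formulate_input_frames includes None, a caller presumably has to check for it before calling encode_video_frames. A minimal sketch of that flow; `embed_clip` and `FrameEncoder` are hypothetical names, and the conditions under which None is returned are not specified in this reference. With a real `CosmosEmbed1` instance (after `setup()`), `frames` would be a list of uint8 NumPy arrays and the result a `torch.Tensor`.

```python
from typing import Any, Optional, Protocol, Sequence


class FrameEncoder(Protocol):
    """Structural type for the two CosmosEmbed1 methods this sketch calls."""

    def formulate_input_frames(self, frames: list[Any]) -> Optional[Any]: ...

    def encode_video_frames(self, frames: Any) -> Any: ...


def embed_clip(model: FrameEncoder, frames: Sequence[Any]) -> Optional[Any]:
    """Hypothetical helper: prepare raw frames, then encode them.

    Returns None when formulate_input_frames returns None, so the caller can skip the clip.
    """
    inputs = model.formulate_input_frames(list(frames))
    if inputs is None:
        return None  # documented return type allows None; nothing to encode
    return model.encode_video_frames(inputs)
```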

nemo_curator.models.cosmos_embed1.CosmosEmbed1.get_target_num_frames() -> int

Get the target number of frames for the model.

Returns: int

The target number of frames.
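When a clip has more or fewer frames than get_target_num_frames reports, a caller has to pick that many. A minimal sketch using uniform sampling; `sample_frame_indices` is hypothetical, and whether CosmosEmbed1 itself samples frames this way is not stated here.

```python
def sample_frame_indices(num_frames: int, target: int) -> list[int]:
    """Pick `target` evenly spaced indices from a clip of `num_frames` frames.

    Indices repeat when the clip is shorter than `target`.
    """
    if num_frames < 1 or target < 1:
        raise ValueError("num_frames and target must both be positive")
    if target == 1:
        return [(num_frames - 1) // 2]  # a single frame: take the middle one
    step = (num_frames - 1) / (target - 1)
    return [round(i * step) for i in range(target)]
```

For example, `frames = [clip[i] for i in sample_frame_indices(len(clip), model.get_target_num_frames())]`, where `clip` is a hypothetical list of decoded frames.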

nemo_curator.models.cosmos_embed1.CosmosEmbed1.get_text_embedding(
text: str
) -> torch.Tensor

Get the text embedding for the given text.

Parameters:

text
str

The input text.

Returns: torch.Tensor

The text embedding.
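get_text_embedding and evaluate can be combined to rank candidate captions for one video embedding. A minimal sketch; `rank_captions` and `CaptionScorer` are hypothetical, and it assumes each index returned by evaluate points into the list of text embeddings and lines up with the probability at the same position.

```python
from typing import Any, Protocol, Sequence


class CaptionScorer(Protocol):
    """Structural type for the two CosmosEmbed1 methods this sketch calls."""

    def get_text_embedding(self, text: str) -> Any: ...

    def evaluate(self, video_embd: Any, text_embds: list[Any]) -> tuple[list[float], list[int]]: ...


def rank_captions(model: CaptionScorer, video_embd: Any, captions: Sequence[str]) -> list[tuple[str, float]]:
    """Hypothetical helper: embed each caption and pair it with its predicted probability."""
    text_embds = [model.get_text_embedding(caption) for caption in captions]
    probs, indices = model.evaluate(video_embd, text_embds)
    # Assumes indices[k] refers to captions[indices[k]] and matches probs[k].
    return [(captions[i], p) for p, i in zip(probs, indices)]
```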

nemo_curator.models.cosmos_embed1.CosmosEmbed1.setup() -> None

Set up the Cosmos-Embed1 model.

This method initializes the model and its configuration for processing video and text data.
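A minimal sketch of the load sequence, assuming the weights and processor config are already under model_dir (for example, from download_weights_on_node and download_processor_config_on_node) and that setup() must run before the encoding methods are used. `load_cosmos_embed1` is a hypothetical helper; pass `CosmosEmbed1` as `model_cls`.

```python
from typing import Any


def load_cosmos_embed1(model_cls: type, model_dir: str, variant: str = "336p") -> Any:
    """Hypothetical helper: construct the model and run setup() once."""
    # Keyword names follow the documented constructor signature.
    model = model_cls(variant=variant, model_dir=model_dir)
    model.setup()  # initializes the model and its configuration
    return model
```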

nemo_curator.models.cosmos_embed1.COSMOS_EMBED1_MODEL_REVISION_INFO: Final = {'224p': '85f5627', '336p': '5d8309d', '448p': '9f4ff4d'}
nemo_curator.models.cosmos_embed1._COSMOS_EMBED1_VARIANTS_INFO: Final = {'224p': 'nvidia/Cosmos-Embed1-224p', '336p': 'nvidia/Cosmos-Embed1-336p', '448p...
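The values in COSMOS_EMBED1_MODEL_REVISION_INFO look like abbreviated commit hashes, presumably pinning each variant's Hugging Face repository to a fixed revision; that reading is an assumption. The sketch below copies the mapping and adds a lookup that fails with a clear message on unknown variants; `REVISIONS` and `pinned_revision` are hypothetical names.

```python
from typing import Final

# Values copied from COSMOS_EMBED1_MODEL_REVISION_INFO.
REVISIONS: Final = {"224p": "85f5627", "336p": "5d8309d", "448p": "9f4ff4d"}


def pinned_revision(variant: str) -> str:
    """Return the pinned revision for a variant, raising ValueError for unknown names."""
    try:
        return REVISIONS[variant]
    except KeyError:
        supported = ", ".join(sorted(REVISIONS))
        raise ValueError(f"unknown Cosmos-Embed1 variant {variant!r}; expected one of: {supported}") from None
```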