nemo_curator.models.cosmos_embed1

Module Contents

Classes

Name	Description
`CosmosEmbed1`	Cosmos-Embed1 embedding model.

Data

COSMOS_EMBED1_MODEL_REVISION_INFO

_COSMOS_EMBED1_VARIANTS_INFO

API

class nemo_curator.models.cosmos_embed1.CosmosEmbed1(
    variant: typing.Literal['224p', '336p', '448p'] = '336p',
    utils_only: bool = False,
    model_dir: str | None = None
)

Bases: ModelInterface

Cosmos-Embed1 embedding model.

_model

AutoModel | None = None

_weights_dir

= str(Path(model_dir) / self._weights_name)

_weights_name

= _COSMOS_EMBED1_VARIANTS_INFO[variant]
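As a rough illustration of how `_weights_name` and `_weights_dir` fit together, the sketch below rebuilds the weights directory from a model directory and a variant. The `VARIANTS` dict copies only the two entries that are fully visible in this reference, and `weights_dir_for` is a hypothetical helper, not part of `nemo_curator`.

```python
from pathlib import Path

# Copied from the entries of _COSMOS_EMBED1_VARIANTS_INFO that are fully
# shown in this reference; other entries are intentionally left out.
VARIANTS = {
    "224p": "nvidia/Cosmos-Embed1-224p",
    "336p": "nvidia/Cosmos-Embed1-336p",
}


def weights_dir_for(model_dir: str, variant: str = "336p") -> str:
    """Hypothetical helper mirroring str(Path(model_dir) / weights_name)."""
    weights_name = VARIANTS[variant]
    return str(Path(model_dir) / weights_name)
```

Under these assumptions, each variant resolves to its own subdirectory of `model_dir`, so several variants can share one model directory.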

model_id_names

list[str]

Get the model ID names.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.download_processor_config_on_node(
    model_dir: str,
    variant: typing.Literal['224p', '336p', '448p'] = '336p'
) -> None

classmethod

Download the processor config for the CosmosEmbed1 model on the node.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.download_weights_on_node(
    model_dir: str,
    variant: typing.Literal['224p', '336p', '448p'] = '336p'
) -> None

classmethod

Download the weights for the CosmosEmbed1 model on the node.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.encode_video_frames(
    frames: numpy.typing.NDArray[numpy.float32]
) -> torch.Tensor

Encode video frames for the model.

Parameters:

frames

npt.NDArray[np.float32]

The input video frames.

Returns: torch.Tensor

The encoded video frames.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.evaluate(
    video_embd: torch.Tensor,
    text_embds: list[torch.Tensor]
) -> tuple[list[float], list[int]]

Evaluate the model by scoring a video embedding against a list of text embeddings.

Parameters:

video_embd

torch.Tensor

The video embedding.

text_embds

list[torch.Tensor]

The text embeddings.

Returns: tuple[list[float], list[int]]

The predicted probabilities and indices.
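This reference does not say how the probabilities are derived. One common approach for video-text embedding models is a softmax over similarity scores followed by ranking, and the sketch below shows that idea in plain Python. It is not the library's implementation: `rank_texts`, `scale`, and `top_k` are hypothetical names, and plain lists stand in for `torch.Tensor`.

```python
import math


def rank_texts(video_embd, text_embds, scale=1.0, top_k=5):
    """Hypothetical stand-in: softmax over dot-product scores, best first."""
    if not text_embds:
        return [], []
    # Dot product between the video embedding and each text embedding.
    scores = [
        scale * sum(v * t for v, t in zip(video_embd, text))
        for text in text_embds
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the highest-probability texts, in descending order.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [probs[i] for i in order], order
```

The return shape matches the documented `tuple[list[float], list[int]]`: probabilities first, then the indices of the corresponding text embeddings.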

nemo_curator.models.cosmos_embed1.CosmosEmbed1.formulate_input_frames(
    frames: list[numpy.typing.NDArray[numpy.uint8]]
) -> numpy.typing.NDArray[numpy.float32] | None

Formulate input frames for the model.

Parameters:

frames

list[npt.NDArray[np.uint8]]

List of video frames.

Returns: npt.NDArray[np.float32] | None

The formulated input frames, or None when no input could be produced.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.get_target_num_frames() -> int

Get the target number of frames for the model.

Returns: int

The target number of frames.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.get_text_embedding(
    text: str
) -> torch.Tensor

Get the text embedding for the given text.

Parameters:

text

str

The input text.

Returns: torch.Tensor

The text embedding.

nemo_curator.models.cosmos_embed1.CosmosEmbed1.setup() -> None

Set up the Cosmos-Embed1 model.

This method initializes the model and its configuration for processing video and text data.
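The documented methods suggest a pipeline of frames, video embedding, text embeddings, and evaluation. The sketch below chains them on any object exposing those methods. `rank_captions` is a hypothetical helper, and it assumes that the indices returned by `evaluate` refer to positions in `text_embds`, which this reference does not confirm.

```python
def rank_captions(model, frames, captions):
    """Hypothetical helper: chain the documented methods on a set-up model."""
    batch = model.formulate_input_frames(frames)
    if batch is None:
        # The signature allows None, so callers need to handle that case.
        return None
    video_embd = model.encode_video_frames(batch)
    text_embds = [model.get_text_embedding(caption) for caption in captions]
    probs, idxs = model.evaluate(video_embd, text_embds)
    # Assumption: each index points back into the caption list.
    return [(captions[i], p) for p, i in zip(probs, idxs)]
```

With the real class, `model` would presumably be a `CosmosEmbed1` instance on which `setup()` has already been called.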

nemo_curator.models.cosmos_embed1.COSMOS_EMBED1_MODEL_REVISION_INFO: Final = {'224p': '787e0b9', '336p': '0e8a28f', '448p': 'f60ec73'}
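The mapping above presumably pins each variant to a specific model revision. A minimal sketch of a guarded lookup follows. The dict values are copied from the constant above, while `REVISIONS` and `revision_for` are hypothetical names used only for illustration.

```python
from typing import Final

# Values copied from COSMOS_EMBED1_MODEL_REVISION_INFO in this reference.
REVISIONS: Final = {"224p": "787e0b9", "336p": "0e8a28f", "448p": "f60ec73"}


def revision_for(variant: str) -> str:
    """Hypothetical helper: return the pinned revision or raise a clear error."""
    try:
        return REVISIONS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(REVISIONS)}"
        ) from None
```

Raising `ValueError` with the list of valid variants is a design choice of this sketch; the library itself may validate the variant differently.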

nemo_curator.models.cosmos_embed1._COSMOS_EMBED1_VARIANTS_INFO: Final = {'224p': 'nvidia/Cosmos-Embed1-224p', '336p': 'nvidia/Cosmos-Embed1-336p', '448p...