nemo_automodel.components.datasets.diffusion.text_to_video_dataset

Module Contents

Classes

Name	Description
`TextToVideoDataset`	Text-to-Video dataset with multiresolution bucket organization.

Functions

Name	Description
`collate_optional_video_fields`	Concatenate optional video fields present in batch into result dict.
`load_optional_video_fields`	Extract optional model-specific fields, moving to device.

Data

`VIDEO_OPTIONAL_FIELDS`

API

class nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset(
    cache_dir: str,
    model_type: str = 'wan',
    device: str = 'cpu'
)

Bases: BaseMultiresolutionDataset

Text-to-Video dataset with multiresolution bucket organization.

Loads preprocessed .meta files organized by resolution bucket. Compatible with SequentialBucketSampler for multiresolution training.
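
The bucket idea can be illustrated without the library. Samples that share a resolution are grouped so that every batch drawn from one bucket has a uniform shape. The sketch below is not the implementation in `BaseMultiresolutionDataset`; the helper `group_by_bucket` and the example resolutions are hypothetical, and the real layout of the .meta files is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Resolution = Tuple[int, int]  # (height, width); an assumed bucket key


def group_by_bucket(resolutions: List[Resolution]) -> Dict[Resolution, List[int]]:
    """Map each resolution to the indices of the samples that have it."""
    buckets: Dict[Resolution, List[int]] = defaultdict(list)
    for idx, res in enumerate(resolutions):
        buckets[res].append(idx)
    return dict(buckets)


# Hypothetical per-sample resolutions, as they might be read from metadata.
sample_resolutions = [(480, 832), (720, 1280), (480, 832), (720, 1280), (480, 832)]
buckets = group_by_bucket(sample_resolutions)

for res, indices in buckets.items():
    print(res, indices)
```

A bucket-aware sampler can then draw each batch from a single bucket's index list, which is presumably the role `SequentialBucketSampler` plays here.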

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset.__getitem__(
    idx: int
) -> typing.Dict[str, torch.Tensor]

Load a single video sample from its .meta file.

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.collate_optional_video_fields(
    batch: typing.List[typing.Dict],
    result: dict
) -> None

Concatenate each optional video field present in `batch` and store it in `result`, which is modified in place.
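
A minimal sketch of this collate pattern, under stated assumptions: a field counts as present when the first sample contains it, and plain nested lists stand in for tensors so the example runs without `torch`. The names `collate_optional_fields_sketch` and `concat_lists` and the key `video_latents` are hypothetical; the library's actual function may check presence or concatenate differently.

```python
from typing import Any, Callable, Dict, List, Sequence

# Copied from the documented module constant.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def concat_lists(parts: Sequence[list]) -> list:
    """Stand-in for concatenating tensors along the batch dimension."""
    merged: list = []
    for part in parts:
        merged.extend(part)
    return merged


def collate_optional_fields_sketch(
    batch: List[Dict[str, Any]],
    result: Dict[str, Any],
    concat: Callable[[Sequence[Any]], Any] = concat_lists,
) -> None:
    """Add each optional field found in the batch to `result`, in place."""
    if not batch:
        return
    for key in VIDEO_OPTIONAL_FIELDS:
        # Assumption: a field is "present" when the first sample carries it.
        if key in batch[0]:
            result[key] = concat([sample[key] for sample in batch])


# Each sample holds a leading batch dimension of size 1.
batch = [
    {"video_latents": [[0.1]], "text_mask": [[1, 1, 0]]},
    {"video_latents": [[0.2]], "text_mask": [[1, 0, 0]]},
]
result: Dict[str, Any] = {}
collate_optional_fields_sketch(batch, result)
print(result)
```

Only `text_mask` ends up in `result`: `video_latents` is not an optional field, and the other optional fields are absent from the batch. With real tensors, `concat` would likely be a call such as `torch.cat(parts, dim=0)`.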

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.load_optional_video_fields(
    data: dict,
    device: str = 'cpu'
) -> dict

Extract the optional model-specific fields from `data`, move them to `device`, and return them as a dict.
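
The extraction step can be sketched as follows, assuming the optional fields are exactly those named in `VIDEO_OPTIONAL_FIELDS` and that missing fields are skipped. `FakeTensor` is a hypothetical stand-in for `torch.Tensor` that only tracks its device, and `text_embeddings` is an assumed non-optional key; neither comes from the library.

```python
from typing import Any, Dict

# Copied from the documented module constant.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


class FakeTensor:
    """Minimal tensor stand-in: holds values and remembers its device."""

    def __init__(self, values: list, device: str = "cpu") -> None:
        self.values = values
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        # Mirrors the tensor convention of returning an object on the target device.
        return FakeTensor(self.values, device)


def load_optional_fields_sketch(data: Dict[str, Any], device: str = "cpu") -> Dict[str, Any]:
    """Return only the optional fields found in `data`, moved to `device`."""
    out: Dict[str, Any] = {}
    for key in VIDEO_OPTIONAL_FIELDS:
        if key in data:
            out[key] = data[key].to(device)
    return out


data = {
    "text_embeddings": FakeTensor([0.5]),    # not optional, so it is ignored
    "image_embeds": FakeTensor([1.0, 2.0]),  # optional, so it is extracted
}
loaded = load_optional_fields_sketch(data, device="cuda:0")
print(sorted(loaded), loaded["image_embeds"].device)
```

Because absent keys are skipped rather than filled with placeholders, callers can test `key in loaded` to learn which model-specific inputs a sample provides.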

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.VIDEO_OPTIONAL_FIELDS = ('text_mask', 'text_embeddings_2', 'text_mask_2', 'image_embeds')