nemo_automodel.components.datasets.diffusion.text_to_video_dataset

Module Contents

Classes

Name                 Description
TextToVideoDataset   Text-to-Video dataset with multiresolution bucket organization.

Functions

Name                           Description
collate_optional_video_fields  Concatenate optional video fields present in batch into result dict.
load_optional_video_fields     Extract optional model-specific fields, moving to device.

Data

VIDEO_OPTIONAL_FIELDS

API

class nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset(
cache_dir: str,
model_type: str = 'wan',
device: str = 'cpu'
)

Bases: BaseMultiresolutionDataset

Text-to-Video dataset with multiresolution bucket organization.

Loads preprocessed .meta files organized by resolution bucket. Compatible with SequentialBucketSampler for multiresolution training.
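
The general idea behind resolution buckets can be sketched in plain Python. This is not the library's implementation: the helper names `group_by_bucket` and `iter_bucket_batches`, the `(height, width)` bucket key, and the example resolutions are all assumptions made for illustration, and the behavior of `SequentialBucketSampler` itself is not documented on this page. The sketch only shows why bucketing helps: every batch is drawn from a single bucket, so all samples in a batch share one resolution.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Resolution = Tuple[int, int]  # (height, width); hypothetical bucket key


def group_by_bucket(resolutions: List[Resolution]) -> Dict[Resolution, List[int]]:
    """Map each resolution to the indices of the samples that have it."""
    buckets: Dict[Resolution, List[int]] = defaultdict(list)
    for idx, res in enumerate(resolutions):
        buckets[res].append(idx)
    return dict(buckets)


def iter_bucket_batches(
    buckets: Dict[Resolution, List[int]], batch_size: int
) -> Iterator[List[int]]:
    """Yield index batches one bucket at a time, so a batch never mixes resolutions."""
    for res in sorted(buckets):
        indices = buckets[res]
        for start in range(0, len(indices), batch_size):
            yield indices[start:start + batch_size]


# Made-up resolutions for five samples.
resolutions = [(480, 832), (720, 1280), (480, 832), (480, 832), (720, 1280)]
buckets = group_by_bucket(resolutions)
batches = list(iter_bucket_batches(buckets, batch_size=2))
```

Because each batch holds samples of one resolution, their tensors can be stacked without padding or resizing.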

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset.__getitem__(
idx: int
) -> typing.Dict[str, torch.Tensor]

Load a single video sample from its .meta file.

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.collate_optional_video_fields(
batch: typing.List[typing.Dict],
result: dict
) -> None

Concatenate optional video fields present in batch into result dict.
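
A minimal sketch of the pattern this description suggests follows. It is a hypothetical re-creation, not the library's code: the name `collate_optional_fields_sketch`, the injectable `concat` parameter, the `video_latents` key, and the rule that a field is collated only when every sample carries it are all assumptions. To keep the example free of dependencies, nested lists stand in for tensors; with real tensors one would pass a function wrapping `torch.cat(values, dim=0)` as `concat`.

```python
from typing import Callable, Dict, List, Sequence

# Copied from the VIDEO_OPTIONAL_FIELDS constant documented on this page.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def _concat_lists(values: Sequence[list]) -> list:
    """Stand-in for tensor concatenation along the batch dimension."""
    out: list = []
    for value in values:
        out.extend(value)
    return out


def collate_optional_fields_sketch(
    batch: List[Dict],
    result: dict,
    concat: Callable[[Sequence], object] = _concat_lists,
) -> None:
    """Add each optional field found in the batch to `result`, in place."""
    if not batch:
        return
    for name in VIDEO_OPTIONAL_FIELDS:
        # Assumption: a field is collated only when every sample carries it.
        if all(name in sample for sample in batch):
            result[name] = concat([sample[name] for sample in batch])


# Two samples, each with a leading batch dimension of 1.
# "video_latents" is a hypothetical non-optional key and is ignored here.
batch = [
    {"video_latents": [[0.1]], "text_mask": [[1, 1, 0]]},
    {"video_latents": [[0.2]], "text_mask": [[1, 0, 0]]},
]
result: dict = {}
collate_optional_fields_sketch(batch, result)
```

The function returns `None` and mutates `result`, matching the documented signature; fields absent from the batch are simply not added.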

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.load_optional_video_fields(
data: dict,
device: str = 'cpu'
) -> dict

Extract optional model-specific fields, moving to device.
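
The behavior described here can be illustrated with a short sketch. It is an assumption about how such a helper might work, not the library's code: `load_optional_fields_sketch` and `FakeTensor` are hypothetical names, and the choice to skip missing or `None` values is a guess. The function relies only on the `.to(device)` method that `torch.Tensor` exposes, so a tiny stand-in class lets the example run without PyTorch.

```python
from typing import Any, Dict

# Copied from the VIDEO_OPTIONAL_FIELDS constant documented on this page.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def load_optional_fields_sketch(data: Dict[str, Any], device: str = "cpu") -> Dict[str, Any]:
    """Return only the optional fields present in `data`, moved to `device`."""
    out: Dict[str, Any] = {}
    for name in VIDEO_OPTIONAL_FIELDS:
        value = data.get(name)
        if value is None:
            continue  # Assumption: absent fields are skipped, not filled in.
        # Tensor-like objects are moved; anything else passes through unchanged.
        out[name] = value.to(device) if hasattr(value, "to") else value
    return out


class FakeTensor:
    """Stand-in for torch.Tensor that only records its device."""

    def __init__(self, device: str = "cpu") -> None:
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device)


# "caption" is a hypothetical key that is not an optional video field.
sample = {"text_mask": FakeTensor(), "caption": "a cat"}
fields = load_optional_fields_sketch(sample, device="cuda:0")
```

Only keys listed in `VIDEO_OPTIONAL_FIELDS` appear in the returned dict, so the same call works for model types that store different subsets of these fields.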

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.VIDEO_OPTIONAL_FIELDS = ('text_mask', 'text_embeddings_2', 'text_mask_2', 'image_embeds')