`nemo_automodel.components.datasets.diffusion.text_to_video_dataset`#

Module Contents#

Classes#

`TextToVideoDataset`	Text-to-Video dataset with multiresolution bucket organization.

Functions#

`load_optional_video_fields`	Extract optional model-specific fields, moving to device.
`collate_optional_video_fields`	Concatenate optional video fields present in batch into result dict.

Data#

VIDEO_OPTIONAL_FIELDS

API#

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.VIDEO_OPTIONAL_FIELDS#: ('text_mask', 'text_embeddings_2', 'text_mask_2', 'image_embeds')

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.load_optional_video_fields(data: dict, device: str = 'cpu') → dict#: Extract optional model-specific fields, moving to device.
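The one-line description above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `load_optional_fields_sketch` is hypothetical, and the behavior shown (absent fields are skipped, values exposing a `.to(device)` method are moved, anything else is passed through) is an assumption drawn only from the documented summary and the `VIDEO_OPTIONAL_FIELDS` tuple.

```python
from typing import Any, Dict

# Mirrors the documented VIDEO_OPTIONAL_FIELDS tuple.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def load_optional_fields_sketch(data: Dict[str, Any], device: str = "cpu") -> Dict[str, Any]:
    """Hypothetical stand-in for load_optional_video_fields (assumed behavior)."""
    out: Dict[str, Any] = {}
    for key in VIDEO_OPTIONAL_FIELDS:
        if key not in data:
            continue  # the field is optional, so a missing key is simply skipped
        value = data[key]
        # torch.Tensor exposes .to(device); values without it pass through unchanged.
        out[key] = value.to(device) if hasattr(value, "to") else value
    return out
```

Under these assumptions, keys of `data` that are not listed in `VIDEO_OPTIONAL_FIELDS` never appear in the returned dict, so the caller remains responsible for loading the required fields itself.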

nemo_automodel.components.datasets.diffusion.text_to_video_dataset.collate_optional_video_fields(batch: List[Dict], result: dict) → None#: Concatenate optional video fields present in batch into result dict.
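A rough sketch of what such a collate helper might look like follows. Several details are assumptions rather than documented behavior: the name `collate_optional_fields_sketch` is hypothetical, a field is treated as "present in batch" only when every sample carries it, concatenation is along dimension 0, and the `concat` parameter exists only so the sketch can run without PyTorch (when omitted it falls back to `torch.cat`). The in-place update of `result` follows from the documented `→ None` return type.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Mirrors the documented VIDEO_OPTIONAL_FIELDS tuple.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def collate_optional_fields_sketch(
    batch: List[Dict[str, Any]],
    result: Dict[str, Any],
    concat: Optional[Callable[[Sequence[Any]], Any]] = None,
) -> None:
    """Hypothetical stand-in for collate_optional_video_fields (assumed behavior)."""
    if concat is None:
        import torch  # imported lazily; only needed for real tensors

        def concat(tensors: Sequence[Any]) -> Any:
            return torch.cat(list(tensors), dim=0)  # assumed: join along the batch dim

    for key in VIDEO_OPTIONAL_FIELDS:
        # Assumption: collate a field only if every sample in the batch has it.
        if batch and all(key in item for item in batch):
            result[key] = concat([item[key] for item in batch])
```

Because `result` is mutated rather than returned, a caller would typically build the required entries of the batch dict first and then pass that dict in to pick up whichever optional fields the samples provide.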

class nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset(cache_dir: str, model_type: str = 'wan', device: str = 'cpu')#

Bases: nemo_automodel.components.datasets.diffusion.base_dataset.BaseMultiresolutionDataset

Text-to-Video dataset with multiresolution bucket organization.

Loads preprocessed .meta files organized by resolution bucket. Compatible with SequentialBucketSampler for multiresolution training.

Initialization

Parameters:

cache_dir – Directory containing preprocessed cache (metadata.json + shards + WxH/*.meta)
model_type – Model type for model-specific fields ("wan", "hunyuan", etc.)
device – Device to load tensors to
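To make the `WxH/*.meta` layout named in `cache_dir` more concrete, here is a small sketch that groups `.meta` files by resolution bucket. It is not how the class itself indexes the cache (the class may rely on `metadata.json` instead). The helper name `group_meta_files_by_bucket` is hypothetical, and the sketch assumes bucket directories sit directly under `cache_dir` and are named as width and height joined by a lowercase `x` (for example `832x480`).

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed bucket directory naming: "<width>x<height>", e.g. "832x480".
_BUCKET_DIR = re.compile(r"^(\d+)x(\d+)$")


def group_meta_files_by_bucket(cache_dir: str) -> Dict[Tuple[int, int], List[Path]]:
    """Map (width, height) to the sorted .meta files found in that bucket directory."""
    buckets: Dict[Tuple[int, int], List[Path]] = defaultdict(list)
    for meta_path in sorted(Path(cache_dir).glob("*/*.meta")):
        match = _BUCKET_DIR.match(meta_path.parent.name)
        if match is None:
            continue  # not a WxH bucket directory (e.g. a shards folder)
        width, height = int(match.group(1)), int(match.group(2))
        buckets[(width, height)].append(meta_path)
    return dict(buckets)
```

Grouping by bucket in this way reflects why the dataset pairs with a bucket-aware sampler: samples drawn from a single bucket share one resolution, so they can be batched without resizing or padding.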

__getitem__(idx: int) → Dict[str, torch.Tensor]#: Load a single video sample from its .meta file.
