nemo_automodel.components.datasets.diffusion.text_to_video_dataset
Module Contents
Classes
TextToVideoDataset | Text-to-Video dataset with multiresolution bucket organization. |
Functions
load_optional_video_fields | Extract optional model-specific fields, moving to device. |
|
collate_optional_video_fields | Concatenate optional video fields present in batch into result dict. |
Data
VIDEO_OPTIONAL_FIELDS |
API
- nemo_automodel.components.datasets.diffusion.text_to_video_dataset.VIDEO_OPTIONAL_FIELDS
('text_mask', 'text_embeddings_2', 'text_mask_2', 'image_embeds')
- nemo_automodel.components.datasets.diffusion.text_to_video_dataset.load_optional_video_fields(data: dict, device: str = 'cpu') -> dict
Extract optional model-specific fields, moving to device.
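A minimal sketch of the behavior described above, assuming each optional field is copied only when it is present in the sample and is moved with `.to(device)`. This is an illustration rather than the library's implementation: `load_optional_fields_sketch` and `StandInTensor` are hypothetical names, and the stand-in class replaces `torch.Tensor` only to keep the example free of dependencies.

```python
from dataclasses import dataclass

# Field names copied from VIDEO_OPTIONAL_FIELDS in this module.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


@dataclass
class StandInTensor:
    """Hypothetical stand-in for torch.Tensor that only tracks its device."""

    name: str
    device: str = "cpu"

    def to(self, device: str) -> "StandInTensor":
        # Mimics Tensor.to(device) by returning a copy on the target device.
        return StandInTensor(self.name, device)


def load_optional_fields_sketch(data: dict, device: str = "cpu") -> dict:
    """Return only the optional fields present in `data`, moved to `device`."""
    return {key: data[key].to(device) for key in VIDEO_OPTIONAL_FIELDS if key in data}


sample = {
    "video_latents": StandInTensor("latents"),  # hypothetical non-optional field, ignored here
    "text_mask": StandInTensor("mask"),
    "image_embeds": StandInTensor("clip"),
}
fields = load_optional_fields_sketch(sample, device="cuda:0")
```

In this sketch, fields that a given model does not produce are simply absent from the returned dict, so callers can merge the result into a sample without special-casing each model type.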
- nemo_automodel.components.datasets.diffusion.text_to_video_dataset.collate_optional_video_fields(
- batch: List[Dict],
- result: dict,
Concatenate optional video fields present in batch into result dict.
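The collation step can be sketched in the same spirit. The version below is an assumption-laden illustration, not the library's code: `collate_optional_fields_sketch` is a hypothetical name, nested lists with a leading batch dimension of 1 stand in for tensors, list concatenation stands in for concatenation along the batch dimension, and the sketch assumes a field is collated only when every sample in the batch carries it.

```python
from typing import Dict, List

# Field names copied from VIDEO_OPTIONAL_FIELDS in this module.
VIDEO_OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def collate_optional_fields_sketch(batch: List[Dict], result: dict) -> dict:
    """Concatenate optional fields found in `batch` into `result` (mutated in place)."""
    for key in VIDEO_OPTIONAL_FIELDS:
        # Assumption: skip a field unless every sample in the batch provides it.
        if batch and all(key in item for item in batch):
            # Each sample holds a list with a leading batch dimension of 1;
            # flattening one level mimics concatenation along dim 0.
            result[key] = [row for item in batch for row in item[key]]
    # Returning `result` is a convenience of this sketch, not a documented behavior.
    return result


batch = [
    {"text_mask": [[1, 1, 0]], "text_embeddings": [[0.1, 0.2]]},
    {"text_mask": [[1, 0, 0]], "text_embeddings": [[0.3, 0.4]]},
]
collated = collate_optional_fields_sketch(batch, result={"batch_size": 2})
```

Here `text_embeddings` is a hypothetical required field that the sketch deliberately leaves alone, since only the names in `VIDEO_OPTIONAL_FIELDS` are handled.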
- class nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset(
- cache_dir: str,
- model_type: str = 'wan',
- device: str = 'cpu',
Bases: nemo_automodel.components.datasets.diffusion.base_dataset.BaseMultiresolutionDataset
Text-to-Video dataset with multiresolution bucket organization.
Loads preprocessed .meta files organized by resolution bucket. Compatible with SequentialBucketSampler for multiresolution training.
Initialization
- Parameters:
cache_dir – Directory containing preprocessed cache (metadata.json + shards + WxH/*.meta)
model_type – Model type for model-specific fields (“wan”, “hunyuan”, etc.)
device – Device to load tensors to
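To make the cache layout in the `cache_dir` description concrete, the sketch below groups `.meta` files by their `WxH` bucket directory. It is an illustration under stated assumptions: `group_meta_files_by_bucket` is a hypothetical helper, the bucket names and file names are invented for the demo, and the real dataset may well read `metadata.json` instead of scanning directories.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Matches bucket directory names of the form "<width>x<height>", e.g. "832x480".
BUCKET_DIR = re.compile(r"^(\d+)x(\d+)$")


def group_meta_files_by_bucket(cache_dir) -> dict:
    """Map (width, height) to the sorted list of .meta files in that bucket."""
    buckets = defaultdict(list)
    for meta_path in sorted(Path(cache_dir).glob("*/*.meta")):
        match = BUCKET_DIR.match(meta_path.parent.name)
        if match:  # ignore directories that are not resolution buckets
            width, height = int(match.group(1)), int(match.group(2))
            buckets[(width, height)].append(meta_path)
    return dict(buckets)


# Build a throwaway cache directory with two invented buckets for the demo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "metadata.json").write_text("{}")
    for bucket, names in {"832x480": ["a.meta", "b.meta"], "480x832": ["c.meta"]}.items():
        (root / bucket).mkdir()
        for name in names:
            (root / bucket / name).touch()
    buckets = group_meta_files_by_bucket(root)
    bucket_sizes = {key: len(paths) for key, paths in buckets.items()}
```

Grouping samples by resolution in this way is what lets a bucket-aware sampler draw each batch from a single bucket, so every tensor in the batch shares the same spatial shape.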
- __getitem__(idx: int) -> Dict[str, torch.Tensor]
Load a single video sample from its .meta file.