nemo_automodel.components.datasets.diffusion.collate_fns
Collate functions and dataloader builders for multiresolution diffusion training.
Supports both image and video pipelines by producing batches in the format that FlowMatchingPipeline expects.
Module Contents
Functions
collate_fn_production | Production collate function with verification.
collate_fn_text_to_image | Text-to-image collate function that transforms multiresolution batch output to match the format FlowMatchingPipeline expects.
_build_multiresolution_dataloader_core | Internal helper: create sampler + DataLoader from dataset and collate fn.
build_text_to_image_multiresolution_dataloader | Build a text-to-image multiresolution dataloader for TrainDiffusionRecipe.
collate_fn_video | Video-compatible collate function for multiresolution video training.
build_video_multiresolution_dataloader | Build a multiresolution video dataloader for TrainDiffusionRecipe.
Data
API
- nemo_automodel.components.datasets.diffusion.collate_fns.logger
'getLogger(…)'
- nemo_automodel.components.datasets.diffusion.collate_fns.collate_fn_production(batch: List[Dict]) -> Dict
Production collate function with verification.
- nemo_automodel.components.datasets.diffusion.collate_fns.collate_fn_text_to_image(
- batch: List[Dict],
- )
Text-to-image collate function that transforms multiresolution batch output to match the format FlowMatchingPipeline expects.
- Parameters:
batch – List of samples from TextToImageDataset
- Returns:
Dict compatible with FlowMatchingPipeline.step()
- nemo_automodel.components.datasets.diffusion.collate_fns._build_multiresolution_dataloader_core(
- *,
- dataset,
- collate_fn: Callable,
- batch_size: int,
- dp_rank: int,
- dp_world_size: int,
- base_resolution: Tuple[int, int] = (512, 512),
- drop_last: bool = True,
- shuffle: bool = True,
- dynamic_batch_size: bool = False,
- num_workers: int = 4,
- pin_memory: bool = True,
- prefetch_factor: int = 2,
- )
Internal helper: create sampler + DataLoader from dataset and collate fn.
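The `dp_rank` and `dp_world_size` parameters suggest that each data-parallel rank receives its own slice of the data. The page does not describe how SequentialBucketSampler partitions samples, so the following is only a minimal sketch that assumes a strided split. The function name `shard_batches` is hypothetical and not part of this module.

```python
from typing import List, Sequence


def shard_batches(
    indices: Sequence[int],
    batch_size: int,
    dp_rank: int,
    dp_world_size: int,
    drop_last: bool = True,
) -> List[List[int]]:
    """Hypothetical helper: split sample indices into per-rank batches.

    Assumption: a strided split (rank r takes samples r, r + world, ...).
    The real SequentialBucketSampler may partition differently.
    """
    if not 0 <= dp_rank < dp_world_size:
        raise ValueError("dp_rank must be in [0, dp_world_size)")
    # Trim the tail so every rank gets the same number of samples.
    usable = len(indices) - len(indices) % dp_world_size
    local = list(indices[:usable][dp_rank::dp_world_size])
    # Chunk this rank's samples into batches.
    batches = [local[i:i + batch_size] for i in range(0, len(local), batch_size)]
    # Optionally drop a trailing batch that is smaller than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches
```

With ten samples, two ranks, and `batch_size=2`, each rank holds five samples, so `drop_last=True` leaves two full batches per rank and discards the fifth sample.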
- nemo_automodel.components.datasets.diffusion.collate_fns.build_text_to_image_multiresolution_dataloader(
- *,
- cache_dir: str,
- train_text_encoder: bool = False,
- batch_size: int = 1,
- dp_rank: int = 0,
- dp_world_size: int = 1,
- base_resolution: Tuple[int, int] = (256, 256),
- drop_last: bool = True,
- shuffle: bool = True,
- dynamic_batch_size: bool = False,
- num_workers: int = 4,
- pin_memory: bool = True,
- prefetch_factor: int = 2,
- )
Build a text-to-image multiresolution dataloader for TrainDiffusionRecipe.
This wraps the existing TextToImageDataset and SequentialBucketSampler with a text-to-image collate function.
- Parameters:
cache_dir – Directory containing preprocessed cache (metadata.json, shards, and resolution subdirs)
train_text_encoder – If True, returns tokens instead of embeddings
batch_size – Batch size per GPU
dp_rank – Data parallel rank
dp_world_size – Data parallel world size
base_resolution – Base resolution for dynamic batch sizing
drop_last – Drop incomplete batches
shuffle – Shuffle data
dynamic_batch_size – Scale batch size by resolution
num_workers – DataLoader workers
pin_memory – Pin memory for GPU transfer
prefetch_factor – Prefetch batches per worker
- Returns:
Tuple of (DataLoader, SequentialBucketSampler)
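A call to this builder might look like the sketch below. Every argument is keyword-only, as the `*` in the signature indicates. The helper names `t2i_loader_kwargs` and `build_t2i_loader` are hypothetical, and the values chosen (batch size 4, dynamic batch sizing on) are illustrative rather than recommended settings. The import is deferred so the argument-building part can be exercised without the package or a preprocessed cache.

```python
from typing import Any, Dict


def t2i_loader_kwargs(cache_dir: str, dp_rank: int, dp_world_size: int) -> Dict[str, Any]:
    """Collect keyword arguments for the text-to-image builder (illustrative values)."""
    return {
        "cache_dir": cache_dir,
        "train_text_encoder": False,  # per the docs: embeddings rather than tokens
        "batch_size": 4,
        "dp_rank": dp_rank,
        "dp_world_size": dp_world_size,
        "base_resolution": (256, 256),
        "dynamic_batch_size": True,
        "num_workers": 4,
    }


def build_t2i_loader(cache_dir: str, dp_rank: int = 0, dp_world_size: int = 1):
    """Build the dataloader; needs nemo_automodel and a preprocessed cache on disk."""
    # Imported lazily so the kwargs helper above works without the package installed.
    from nemo_automodel.components.datasets.diffusion.collate_fns import (
        build_text_to_image_multiresolution_dataloader,
    )

    dataloader, sampler = build_text_to_image_multiresolution_dataloader(
        **t2i_loader_kwargs(cache_dir, dp_rank, dp_world_size)
    )
    return dataloader, sampler
```

The builder returns the sampler alongside the DataLoader, so the caller keeps a handle on both objects.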
- nemo_automodel.components.datasets.diffusion.collate_fns.collate_fn_video(
- batch: List[Dict],
- model_type: str = 'wan',
- )
Video-compatible collate function for multiresolution video training.
Concatenates video_latents (5D) and text_embeddings (3D) along the batch dim, matching the format expected by FlowMatchingPipeline with SimpleAdapter.
- Parameters:
batch – List of samples from TextToVideoDataset
model_type – Model type for model-specific field handling
- Returns:
Dict compatible with FlowMatchingPipeline.step()
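Concatenating along the batch dimension only works when every sample agrees on all other dimensions. The shape-only sketch below makes that contract explicit without a tensor library. The helper name `cat_shape_dim0` is hypothetical, and the per-sample shapes in the usage lines are assumed values, not ones taken from TextToVideoDataset.

```python
from typing import Sequence, Tuple


def cat_shape_dim0(shapes: Sequence[Tuple[int, ...]], ndim: int) -> Tuple[int, ...]:
    """Shape that results from concatenating arrays of these shapes along dim 0."""
    if not shapes:
        raise ValueError("empty batch")
    first = shapes[0]
    for s in shapes:
        if len(s) != ndim:
            raise ValueError(f"expected {ndim}D, got shape {s}")
        # All non-batch dimensions must match for concatenation to be valid.
        if s[1:] != first[1:]:
            raise ValueError(f"non-batch dims differ: {s} vs {first}")
    # Batch sizes add up; the remaining dimensions are unchanged.
    return (sum(s[0] for s in shapes),) + tuple(first[1:])


# Assumed per-sample shapes: latents (1, C, T, H, W), embeddings (1, S, D).
video_latents_shape = cat_shape_dim0([(1, 16, 8, 32, 32)] * 3, ndim=5)
text_embeddings_shape = cat_shape_dim0([(1, 77, 64)] * 3, ndim=3)
```

Mixing samples of different spatial size would raise an error here. That constraint is presumably why the video dataloader pairs this collate function with a bucket-based sampler.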
- nemo_automodel.components.datasets.diffusion.collate_fns.build_video_multiresolution_dataloader(
- *,
- cache_dir: str,
- model_type: str = 'wan',
- device: str = 'cpu',
- batch_size: int = 1,
- dp_rank: int = 0,
- dp_world_size: int = 1,
- base_resolution: Tuple[int, int] = (512, 512),
- drop_last: bool = True,
- shuffle: bool = True,
- dynamic_batch_size: bool = False,
- num_workers: int = 2,
- pin_memory: bool = True,
- prefetch_factor: int = 2,
- )
Build a multiresolution video dataloader for TrainDiffusionRecipe.
Uses TextToVideoDataset with SequentialBucketSampler for bucket-based multiresolution video training (e.g. Wan, Hunyuan).
- Parameters:
cache_dir – Directory containing preprocessed cache (metadata.json + shards + WxH/*.meta)
model_type – Model type ('wan', 'hunyuan', etc.)
device – Device to load tensors to
batch_size – Batch size per GPU
dp_rank – Data parallel rank
dp_world_size – Data parallel world size
base_resolution – Base resolution for dynamic batch sizing
drop_last – Drop incomplete batches
shuffle – Shuffle data
dynamic_batch_size – Scale batch size by resolution
num_workers – DataLoader workers
pin_memory – Pin memory for GPU transfer
prefetch_factor – Prefetch batches per worker
- Returns:
Tuple of (DataLoader, SequentialBucketSampler)
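The `base_resolution` and `dynamic_batch_size` parameters together imply that the per-bucket batch size depends on how a bucket's resolution compares to the base. The page does not give the exact rule, so the sketch below assumes one plausible choice: keep the pixel count per batch roughly constant. The function name `scaled_batch_size` is hypothetical.

```python
from typing import Tuple


def scaled_batch_size(
    batch_size: int,
    base_resolution: Tuple[int, int],
    bucket_resolution: Tuple[int, int],
) -> int:
    """Hypothetical rule: scale batch size inversely with pixel area.

    Assumption: the library's actual scaling formula may differ.
    """
    base_area = base_resolution[0] * base_resolution[1]
    bucket_area = bucket_resolution[0] * bucket_resolution[1]
    if base_area <= 0 or bucket_area <= 0:
        raise ValueError("resolutions must be positive")
    # Integer division rounds down; never go below one sample per batch.
    return max(1, (batch_size * base_area) // bucket_area)
```

Under this assumed rule, a bucket at twice the base width and height has four times the area and gets a quarter of the batch size. A bucket at half the width and height gets four times as many samples.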