nemo_automodel.components.datasets.multimodal.interleave


Interleaved-image parquet datasets for BAGEL editing + joint recipes.

Provides:

  • InterleavedBaseIterableDataset – mixin that exposes the _init_data / _add_text / _add_image / _add_video builders for per-row assembly of the packed-sequence plan.
  • ParquetStandardIterableDataset – base class that iterates per row group over a list of parquet files; subclasses override parse_row to turn a pandas row into a dict compatible with packing.PackedDataset.
  • UnifiedEditIterableDataset – concrete parse_row that emits interleaved (input-image, instruction, output-image) samples from an image-editing parquet schema (image_list + instruction_list).

When visual_gen=False, these samples can still flow through packing; the model simply ignores the VAE / flow-matching tensors. Stage 2 consumes the same yielded sample dicts for the edit-generation loss.

Module Contents

Classes

Name | Description
--- | ---
InterleavedBaseIterableDataset | Builder mixin for interleaved image/text/video sequence plans.
ParquetStandardIterableDataset | Base class: iterate per-(file, row_group) across a list of parquet shards.
UnifiedEditIterableDataset | Image-editing dataset: (input, instruction, output) chains over parquet.

Data

_MaximumDecompressedSize

_MegaByte

logger

API

class nemo_automodel.components.datasets.multimodal.interleave.InterleavedBaseIterableDataset()

Bases: DistributedIterableDataset

Builder mixin for interleaved image/text/video sequence plans.

Subclasses still provide __init__, __iter__, and parse_row (via ParquetStandardIterableDataset); this class only holds the per-item append helpers used inside parse_row.

nemo_automodel.components.datasets.multimodal.interleave.InterleavedBaseIterableDataset._add_image(
data,
image,
need_loss,
need_vae,
need_vit,
enable_cfg = True
)
nemo_automodel.components.datasets.multimodal.interleave.InterleavedBaseIterableDataset._add_text(
data,
text,
need_loss,
enable_cfg = True
)
nemo_automodel.components.datasets.multimodal.interleave.InterleavedBaseIterableDataset._add_video(
data,
frames,
frame_indexes,
need_loss,
need_vae,
enable_cfg = True
)
nemo_automodel.components.datasets.multimodal.interleave.InterleavedBaseIterableDataset._init_data()
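
To make the builder pattern concrete, here is a toy stand-in that mirrors the helper names and argument order listed above. It is a sketch rather than the library's implementation: the ToyPlanBuilder class, the dict keys (sequence_plan, texts, images), the per-entry fields, and the choice to return data from each helper are all assumptions made for illustration. The tokenizer and image transforms that the real class relies on are left out, and _add_video is omitted for brevity.

```python
from typing import Any, Dict


class ToyPlanBuilder:
    """Hypothetical stand-in for the append helpers; NOT the real mixin."""

    def _init_data(self) -> Dict[str, Any]:
        # Start an empty sample: an ordered plan plus the payloads it refers to.
        # These key names are assumptions, not the library's schema.
        return {"sequence_plan": [], "texts": [], "images": []}

    def _add_text(self, data, text, need_loss, enable_cfg=True):
        # Store the raw text and record one plan entry describing it.
        data["texts"].append(text)
        data["sequence_plan"].append(
            {"type": "text", "loss": int(need_loss), "enable_cfg": int(enable_cfg)}
        )
        return data

    def _add_image(self, data, image, need_loss, need_vae, need_vit, enable_cfg=True):
        # Store the image payload and record which encoders/losses it needs.
        data["images"].append(image)
        data["sequence_plan"].append(
            {
                "type": "image",
                "loss": int(need_loss),
                "vae": int(need_vae),
                "vit": int(need_vit),
                "enable_cfg": int(enable_cfg),
            }
        )
        return data


# Assemble one (input image, instruction, output image) sample.
builder = ToyPlanBuilder()
sample = builder._init_data()
sample = builder._add_image(sample, b"<input image>", need_loss=False, need_vae=True, need_vit=True)
sample = builder._add_text(sample, "make the sky purple", need_loss=False)
sample = builder._add_image(sample, b"<output image>", need_loss=True, need_vae=False, need_vit=False)
plan_types = [step["type"] for step in sample["sequence_plan"]]
```

In this toy version only the final image carries a loss flag, which reflects the idea of supervising the edited output while treating the input image and instruction as conditioning. Which flags the real parse_row sets is not specified here.
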
class nemo_automodel.components.datasets.multimodal.interleave.ParquetStandardIterableDataset(
dataset_name,
transform,
tokenizer,
vit_transform,
data_dir_list,
num_used_data,
parquet_info,
local_rank = 0,
world_size = 1,
num_workers = 8,
data_status = None
)

Bases: DistributedIterableDataset

Base class: iterate per-(file, row_group) across a list of parquet shards.

Subclasses override parse_row to turn one pandas row into the dict schema consumed by packing.PackedDataset.
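
The per-(file, row_group) iteration can be pictured with the following minimal sketch. It uses only the standard library and does not call into nemo_automodel. The row_groups_per_file mapping, the shard_row_groups helper, and the round-robin split across ranks are assumptions for illustration; they are not the library's actual sharding logic, nor the real format of parquet_info.

```python
from typing import Dict, List, Tuple


def shard_row_groups(
    row_groups_per_file: Dict[str, int],
    rank: int,
    world_size: int,
) -> List[Tuple[str, int]]:
    """Return the (file, row_group) pairs one rank would visit.

    Hypothetical helper: the real dataset may split work differently.
    """
    # Flatten every shard into (file, row_group) work units in a stable order.
    units = [
        (path, group_id)
        for path in sorted(row_groups_per_file)
        for group_id in range(row_groups_per_file[path])
    ]
    # Round-robin assignment: rank r takes units r, r + world_size, ...
    return units[rank::world_size]


# Hypothetical shard listing: file name -> number of row groups.
example_info = {"edit-00000.parquet": 3, "edit-00001.parquet": 2}
rank0 = shard_row_groups(example_info, rank=0, world_size=2)
rank1 = shard_row_groups(example_info, rank=1, world_size=2)
```

Treating the row group, rather than the whole file, as the unit of work keeps memory bounded to one row group at a time and lets ranks share a file without reading the same rows.
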

data_paths
nemo_automodel.components.datasets.multimodal.interleave.ParquetStandardIterableDataset.__iter__()
nemo_automodel.components.datasets.multimodal.interleave.ParquetStandardIterableDataset.get_data_paths(
data_dir_list,
num_used_data,
parquet_info
)
nemo_automodel.components.datasets.multimodal.interleave.ParquetStandardIterableDataset.parse_row(
row
)
class nemo_automodel.components.datasets.multimodal.interleave.UnifiedEditIterableDataset()

Bases: InterleavedBaseIterableDataset, ParquetStandardIterableDataset

Image-editing dataset: (input, instruction, output) chains over parquet.

Row schema (upstream BAGEL seedxedit_multi + compatibles):

  • image_list: list of raw image bytes (at least 2).
  • instruction_list: list of lists; instruction_list[i] is a set of equivalent phrasings for the edit that turns image_list[i] into image_list[i+1].
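
The row schema can be read as a chain of edits. The sketch below is an illustration, not the actual parse_row: it walks a hypothetical row and pairs each image_list[i] with one phrasing drawn from instruction_list[i] and the resulting image_list[i+1]. The expand_edit_chain helper, the placeholder byte strings, and the use of random.Random to pick a phrasing are assumptions; the real method may sample sub-chains or decode images differently.

```python
import random
from typing import Dict, List, Tuple


def expand_edit_chain(
    row: Dict[str, list], rng: random.Random
) -> List[Tuple[bytes, str, bytes]]:
    """Turn one row into (input image, instruction, output image) steps.

    Hypothetical helper for illustration only.
    """
    images = row["image_list"]
    instructions = row["instruction_list"]
    if len(images) < 2:
        raise ValueError("image_list needs at least 2 images")
    if len(instructions) < len(images) - 1:
        raise ValueError("need one instruction set per consecutive image pair")

    steps = []
    for i in range(len(images) - 1):
        # instruction_list[i] holds equivalent phrasings; pick one of them.
        phrasing = rng.choice(instructions[i])
        steps.append((images[i], phrasing, images[i + 1]))
    return steps


# Placeholder row: real rows would carry encoded image bytes.
row = {
    "image_list": [b"img0", b"img1", b"img2"],
    "instruction_list": [
        ["add a hat", "put a hat on the person"],
        ["make it night"],
    ],
}
steps = expand_edit_chain(row, random.Random(0))
```

A row with N images therefore yields N - 1 edit steps, and the output image of one step is the input image of the next.
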

nemo_automodel.components.datasets.multimodal.interleave.UnifiedEditIterableDataset.parse_row(
row
)
nemo_automodel.components.datasets.multimodal.interleave._MaximumDecompressedSize = 1024
nemo_automodel.components.datasets.multimodal.interleave._MegaByte = 2 ** 20
nemo_automodel.components.datasets.multimodal.interleave.logger = logging.getLogger(__name__)