# nemo_automodel.components.datasets.multimodal.packing

Defines `PackedDataset` and `DataConfig`, which together provide a packed-sequence iterable dataset for BAGEL training.

## Module Contents

### Classes

| Name                                                                                    | Description                                                                    |
| --------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| [`DataConfig`](#nemo_automodel-components-datasets-multimodal-packing-DataConfig)       | Container for packing-level settings and the grouped-dataset YAML dict.        |
| [`PackedDataset`](#nemo_automodel-components-datasets-multimodal-packing-PackedDataset) | Greedy pack of samples drawn from weighted groups into token-budgeted batches. |

### Data

[`logger`](#nemo_automodel-components-datasets-multimodal-packing-logger)

### API

```python
class nemo_automodel.components.datasets.multimodal.packing.DataConfig(
    grouped_datasets,
    text_cond_dropout_prob = 0.1,
    vit_cond_dropout_prob = 0.4,
    vae_cond_dropout_prob = 0.1,
    vae_image_downsample = 16,
    max_latent_size = 32,
    vit_patch_size = 14,
    max_num_patch_per_side = 70
)
```

Container for packing-level settings and the grouped-dataset YAML dict.
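
As a rough guide to what the default values imply, the arithmetic below multiplies the paired defaults together. It assumes that `vae_image_downsample` is the pixel-to-latent stride of the VAE and that `vit_patch_size` is the edge length of one ViT patch in pixels. That reading comes from the parameter names only and is not confirmed by this page. The derived variable names are hypothetical.

```python
# Defaults copied from the DataConfig signature above.
vae_image_downsample = 16
max_latent_size = 32
vit_patch_size = 14
max_num_patch_per_side = 70

# Assumption: one latent cell covers `vae_image_downsample` pixels per side,
# so the largest image side seen by the VAE path is stride * latent size.
max_vae_image_side = vae_image_downsample * max_latent_size

# Assumption: one ViT patch covers `vit_patch_size` pixels per side.
max_vit_image_side = vit_patch_size * max_num_patch_per_side
max_vit_patches = max_num_patch_per_side ** 2

print(max_vae_image_side)  # 512
print(max_vit_image_side)  # 980
print(max_vit_patches)     # 4900
```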

```python
class nemo_automodel.components.datasets.multimodal.packing.PackedDataset(
    data_config,
    tokenizer,
    special_tokens,
    local_rank,
    world_size,
    num_workers,
    expected_num_tokens = 32768,
    max_num_tokens_per_sample = 16384,
    max_num_tokens = 36864,
    prefer_buffer_before = 16384,
    max_buffer_size = 50,
    interpolate_pos = False,
    use_flex = False,
    data_status = None,
    dataset_info = None
)
```

**Bases:** `IterableDataset`

Greedy pack of samples drawn from weighted groups into token-budgeted batches.

The dataset reseeds when iteration starts, so Automodel (AM) sees a
deterministic, BAGEL-compatible packed-data stream regardless of how much RNG
state was consumed earlier, for example during model construction or
checkpoint loading.
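
The idea of greedily packing samples from weighted groups into token-budgeted batches can be illustrated with a minimal sketch. It is not the `PackedDataset` implementation: `greedy_pack` is a hypothetical helper, samples are represented only by their token counts, and the way the three budget parameters interact is an assumption based on their names. The sketch leaves out `prefer_buffer_before` and `max_buffer_size`, whose roles are not described on this page.

```python
import itertools
import random
from typing import Dict, Iterator, List


def greedy_pack(
    groups: Dict[str, List[int]],
    weights: Dict[str, float],
    expected_num_tokens: int = 32768,
    max_num_tokens_per_sample: int = 16384,
    max_num_tokens: int = 36864,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches (lists of sample token counts) packed greedily.

    Hypothetical illustration only; not the library's algorithm.
    """
    rng = random.Random(seed)
    names = list(groups)
    group_weights = [weights[name] for name in names]
    batch: List[int] = []
    used = 0
    while True:
        # Pick a group in proportion to its weight, then a sample from it.
        name = rng.choices(names, weights=group_weights, k=1)[0]
        length = rng.choice(groups[name])
        # Assumed behavior: samples above the per-sample cap are dropped.
        if length > max_num_tokens_per_sample:
            continue
        # Assumed behavior: never exceed the hard cap; close the batch first.
        if batch and used + length > max_num_tokens:
            yield batch
            batch, used = [], 0
        batch.append(length)
        used += length
        # Assumed behavior: a batch is complete once it reaches the target.
        if used >= expected_num_tokens:
            yield batch
            batch, used = [], 0


# Token counts per group; 20000 exceeds the per-sample cap and is dropped.
groups = {"text": [512, 2048, 20000], "image": [4096, 8192]}
weights = {"text": 0.7, "image": 0.3}
batches = list(itertools.islice(greedy_pack(groups, weights, seed=0), 5))
for batch in batches:
    print(len(batch), sum(batch))
```

With the default budgets, every batch in this sketch totals at most `max_num_tokens`, and most batches close shortly after crossing `expected_num_tokens`.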

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.__iter__()
```
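
The effect of reseeding at iterator start can be shown with a toy stream. This is a sketch of the general technique, not the `PackedDataset` code: `ReseedingStream` and `take` are hypothetical names, and the sketch uses Python's global `random` module, whereas the real class may seed other generators as well.

```python
import itertools
import random
from typing import Iterator, List


class ReseedingStream:
    """Toy iterable that reseeds the global RNG whenever iteration begins."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    def __iter__(self) -> Iterator[int]:
        # Reseed here, not in __init__, so RNG calls made between
        # construction and iteration do not change the stream.
        random.seed(self.seed)
        while True:
            yield random.randint(0, 9)


def take(stream: ReseedingStream, n: int) -> List[int]:
    """Return the first n items of a fresh iteration over the stream."""
    return list(itertools.islice(stream, n))


stream = ReseedingStream(seed=42)
first = take(stream, 5)

# Simulate unrelated RNG consumption, e.g. during model construction.
for _ in range(100):
    random.random()

second = take(stream, 5)
print(first == second)  # True
```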

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._grouped_dataset_state_dicts()
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._load_grouped_dataset_state_dicts(
    states
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._load_rng_state_dict(
    state
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._log_drop(
    reason,
    message,
    args = (),
    every = 100
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._rng_state_dict()
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset._set_resume_point(
    buffer,
    yielded_batches
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.build_datasets(
    datasets_metainfo,
    data_status
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.load_state_dict(
    state_dict
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.pack_sequence(
    sample,
    sequence_status
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.set_epoch(
    seed
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.set_sequence_status()
```

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.state_dict()
```
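
The `state_dict()` and `load_state_dict(state_dict)` pair suggests a checkpoint-and-resume workflow. The sketch below shows that pattern on a toy class. `ResumableStream`, its `next_item` method, and the keys of the returned dict are all hypothetical; the actual contents of the `PackedDataset` state dict are not documented on this page.

```python
import random
from typing import Any, Dict


class ResumableStream:
    """Toy random stream whose position can be saved and restored."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._yielded = 0

    def next_item(self) -> int:
        self._yielded += 1
        return self._rng.randint(0, 999)

    def state_dict(self) -> Dict[str, Any]:
        # Capture the RNG state and a progress counter (assumed layout).
        return {"rng_state": self._rng.getstate(), "yielded": self._yielded}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self._rng.setstate(state_dict["rng_state"])
        self._yielded = state_dict["yielded"]


original = ResumableStream(seed=7)
for _ in range(3):
    original.next_item()

checkpoint = original.state_dict()
expected = [original.next_item() for _ in range(5)]

# A new instance with a different seed continues from the checkpoint.
resumed_stream = ResumableStream(seed=123)
resumed_stream.load_state_dict(checkpoint)
resumed = [resumed_stream.next_item() for _ in range(5)]
print(resumed == expected)  # True
```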

```python
nemo_automodel.components.datasets.multimodal.packing.PackedDataset.to_tensor(
    sequence_status
)
```

```python
nemo_automodel.components.datasets.multimodal.packing.logger = logging.getLogger(__name__)
```