# nemo_automodel.components.datasets.multimodal.distributed_iterable

Provides the `DistributedIterableDataset` base class for BAGEL-style data pipelines.

## Module Contents

### Classes

| Name                                                                                                                           | Description                                         |
| ------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------- |
| [`DistributedIterableDataset`](#nemo_automodel-components-datasets-multimodal-distributed_iterable-DistributedIterableDataset) | Base class for rank/worker-aware iterable datasets. |

### Data

[`logger`](#nemo_automodel-components-datasets-multimodal-distributed_iterable-logger)

### API

```python
class nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset(
    dataset_name,
    local_rank = 0,
    world_size = 1,
    num_workers = 8
)
```

**Bases:** `IterableDataset`

Base class for rank/worker-aware iterable datasets.

Owns a private `rng` used only to shuffle file paths deterministically
in `set_epoch`; it is not used for per-sample randomness. Per-sample
randomness still goes through the Python global `random` module (see
the `packing` module for the reseed hook).
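
The separation between a private generator and the global `random` module can be illustrated with a minimal sketch. This is not the library's implementation: `ShuffledPaths`, its `paths` attribute, and the shard file names are hypothetical, and seeding the private generator from `seed` alone is an assumption made for illustration.

```python
import random


class ShuffledPaths:
    """Hypothetical stand-in for illustration; not the NeMo AutoModel class."""

    def __init__(self, paths):
        self.paths = sorted(paths)
        # Private generator: its state is independent of the global `random` module.
        self.rng = random.Random()

    def set_epoch(self, seed=42):
        # Assumption: the shuffle order depends only on `seed`, so we reseed the
        # private generator and shuffle a freshly sorted copy of the paths.
        self.rng.seed(seed)
        shuffled = sorted(self.paths)
        self.rng.shuffle(shuffled)
        self.paths = shuffled
        return list(shuffled)


paths = [f"shard-{i:03d}.parquet" for i in range(8)]
ds = ShuffledPaths(paths)

# Same seed gives the same file order on every call (and on every rank).
order_a = ds.set_epoch(seed=42)
order_b = ds.set_epoch(seed=42)

# Shuffling with the private generator leaves the global stream untouched.
random.seed(0)
expected = [random.random() for _ in range(3)]

random.seed(0)
first = random.random()
ds.set_epoch(seed=7)  # uses only ds.rng
interleaved = [first, random.random(), random.random()]
```

Keeping the file-order shuffle on its own generator means that reseeding the global `random` module for per-sample augmentation does not change which files a worker reads, and vice versa.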

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.__iter__()
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset._get_worker_data_status(
    worker_id
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset._log_drop(
    reason,
    message,
    args = (),
    every = 100,
    exc_info = False
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset._set_worker_resume_data_status(
    worker_id,
    status
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.get_data_paths(
    args = (),
    kwargs = {}
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.get_data_paths_per_worker()
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.load_state_dict(
    state_dict
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.set_data_status(
    data_status
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.set_epoch(
    seed = 42
)
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.DistributedIterableDataset.state_dict()
```

```python
nemo_automodel.components.datasets.multimodal.distributed_iterable.logger = logging.getLogger(__name__)
```